Bug#1002623: nltk: CVE-2021-43854

Salvatore Bonaccorso carnil at debian.org
Sat Dec 25 20:00:25 GMT 2021


Source: nltk
Version: 3.6.5-1
Severity: important
Tags: security upstream
Forwarded: https://github.com/nltk/nltk/issues/2866
X-Debbugs-Cc: carnil at debian.org, Debian Security Team <team at security.debian.org>

Hi,

The following vulnerability was published for nltk.

CVE-2021-43854[0]:
| NLTK (Natural Language Toolkit) is a suite of open source Python
| modules, data sets, and tutorials supporting research and development
| in Natural Language Processing. Versions prior to 3.6.5 are vulnerable
| to regular expression denial of service (ReDoS) attacks. The
| vulnerability is present in PunktSentenceTokenizer, sent_tokenize and
| word_tokenize. Any users of this class, or these two functions, are
| vulnerable to the ReDoS attack. In short, a specifically crafted long
| input to any of these vulnerable functions will cause them to take a
| significant amount of execution time. If your program relies on any of
| the vulnerable functions for tokenizing unpredictable user input, then
| we would strongly recommend upgrading to a version of NLTK without the
| vulnerability. For users unable to upgrade the execution time can be
| bounded by limiting the maximum length of an input to any of the
| vulnerable functions. Our recommendation is to implement such a limit.
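
For installations that cannot upgrade yet, the input-length limit that the advisory recommends could look roughly like the Python sketch below. The wrapper name bounded_tokenize and the 100,000-character default are illustrative assumptions; they are not part of NLTK and are not values taken from the advisory. The sketch rejects oversized input instead of truncating it, so callers never silently tokenize only part of the text.

```python
from typing import Callable, List

# Hypothetical default cap (an assumption, not an NLTK setting); tune it
# to the longest input your application legitimately needs to handle.
MAX_INPUT_CHARS = 100_000


def bounded_tokenize(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_chars: int = MAX_INPUT_CHARS,
) -> List[str]:
    """Run `tokenize` on `text` only if it is at most `max_chars` long.

    Rejecting oversized input before tokenizing bounds the worst-case time
    spent in a tokenizer that may be vulnerable to ReDoS on long crafted
    strings. This wrapper (hypothetical, not an NLTK API) does not fix the
    underlying regular expressions; upgrading remains the real fix.
    """
    if len(text) > max_chars:
        raise ValueError(
            f"input of {len(text)} characters exceeds limit of {max_chars}"
        )
    return tokenize(text)
```

With NLTK installed, a caller would pass one of the affected functions, e.g. `from nltk.tokenize import sent_tokenize` followed by `bounded_tokenize(user_text, sent_tokenize)`, and handle the ValueError for over-long input.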


If you fix the vulnerability please also make sure to include the
CVE (Common Vulnerabilities & Exposures) id in your changelog entry.

For further information see:

[0] https://security-tracker.debian.org/tracker/CVE-2021-43854
    https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-43854
[1] https://github.com/nltk/nltk/issues/2866
[2] https://github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x

Please adjust the affected versions in the BTS as needed.

Regards,
Salvatore
