Bug#1074234: scikit-learn: CVE-2024-5206

Moritz Mühlenhoff jmm at inutil.org
Mon Jun 24 22:36:11 BST 2024


Source: scikit-learn
X-Debbugs-CC: team at security.debian.org
Severity: important
Tags: security

Hi,

The following vulnerability was published for scikit-learn.

CVE-2024-5206[0]:
| A sensitive data leakage vulnerability was identified in
| scikit-learn's TfidfVectorizer, specifically in versions up to and
| including 1.4.1.post1, which was fixed in version 1.5.0. The
| vulnerability arises from the unexpected storage of all tokens
| present in the training data within the `stop_words_` attribute,
| rather than only storing the subset of tokens required for the
| TF-IDF technique to function. This behavior leads to the potential
| leakage of sensitive information, as the `stop_words_` attribute
| could contain tokens that were meant to be discarded and not stored,
| such as passwords or keys. The impact of this vulnerability varies
| based on the nature of the data being processed by the vectorizer.
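
For anyone who has to keep shipping models fitted with an affected
version, the description above suggests one possible stopgap: clear
`stop_words_` on the fitted object before it is pickled or shared. The
sketch below is only an illustration, not the upstream fix. The helper
name `scrub_stop_words` is made up for this mail, and a
`SimpleNamespace` stands in for a fitted TfidfVectorizer so the snippet
runs without scikit-learn installed. It relies on the note in the
pre-1.5.0 scikit-learn documentation that the attribute is for
introspection only and can be set to None before pickling.

```python
from types import SimpleNamespace


def scrub_stop_words(vectorizer):
    """Clear a fitted vectorizer's stop_words_; return how many tokens it held.

    Hypothetical helper, not part of scikit-learn.
    """
    leaked = getattr(vectorizer, "stop_words_", None)
    if leaked is None:
        # Attribute absent (as on fixed releases) or already cleared.
        return 0
    # The attribute is not needed for transform(), so drop its contents.
    vectorizer.stop_words_ = None
    return len(leaked)


# Stand-in for a fitted TfidfVectorizer (assumption: the real object exposes
# the same attribute names on affected versions).
fitted = SimpleNamespace(
    vocabulary_={"hello": 0, "world": 1},
    stop_words_={"hunter2", "s3cr3t-key"},
)
removed = scrub_stop_words(fitted)
```

Calling the helper right before `pickle.dump()` would keep the discarded
tokens out of the serialized model. Upgrading to a fixed version remains
the real remedy.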

https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 (1.5.0rc1)


If you fix the vulnerability, please also make sure to include the
CVE (Common Vulnerabilities & Exposures) ID in your changelog entry.

For further information see:

[0] https://security-tracker.debian.org/tracker/CVE-2024-5206
    https://www.cve.org/CVERecord?id=CVE-2024-5206

Please adjust the affected versions in the BTS as needed.


