Bug#925294: Does not work without extra downloads
Enrico Zini
enrico at debian.org
Fri Mar 22 15:18:53 GMT 2019
Package: python3-nltk
Version: 3.4-1
Severity: normal
Hello,
I tried to use nltk for simple word tokenization, but it fails:
>>> import nltk
>>> nltk.word_tokenize("foo")
…
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/home/enrico/nltk_data'
- '/usr/nltk_data'
- '/usr/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
I am extremely reluctant to run unreviewed code that downloads random
data from the internet in some unspecified way, and does unspecified
things with it, to the point that I decided to give up using the library
altogether.
It would have been an entirely different story if the datasets that nltk
needs were also packaged in Debian, so that it could have worked out of
the box.
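For reference, the lookup from the error above can be approximated offline.
The following is only a minimal sketch using the standard library: the
function name find_local_resource is hypothetical (it is not part of nltk),
the directory list is copied from the error message with duplicates and the
empty entry dropped, and it only checks for the unpacked file, which may not
cover every layout nltk itself accepts.

```python
import os

# Search path copied from the LookupError above (duplicates and '' dropped).
SEARCH_PATH = [
    os.path.expanduser("~/nltk_data"),
    "/usr/nltk_data",
    "/usr/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/local/lib/nltk_data",
]

# Resource named in the error message.
RESOURCE = "tokenizers/punkt/PY3/english.pickle"


def find_local_resource(resource=RESOURCE, search_path=SEARCH_PATH):
    """Hypothetical helper: return the first existing path, or None.

    Performs no network access and does not import nltk.
    """
    for directory in search_path:
        candidate = os.path.join(directory, resource)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_local_resource()
    print(found if found else "punkt is not installed locally")
```

If the datasets were packaged, a check like this would presumably find the
file under one of the system-wide directories without any download step.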
Enrico
-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 4.19.0-2-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_IE.UTF-8, LC_CTYPE=en_IE.UTF-8 (charmap=UTF-8), LANGUAGE=en_IE:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages python3-nltk depends on:
ii python3 3.7.2-1
ii python3-six 1.12.0-1
Versions of packages python3-nltk recommends:
ii prover9 0.0.200911a-2.1+b2
ii python3-numpy 1:1.16.1-1
ii python3-tk 3.7.2-3
python3-nltk suggests no packages.
-- no debconf information