[Python-modules-team] Bug#720341: [python-sphinxcontrib.spelling] Spellchecker is not unicode aware in PythonBuiltinsFilter class
Slavko
linux at slavino.sk
Tue Aug 20 18:08:42 UTC 2013
Package: python-sphinxcontrib.spelling
Version: 1.4-1
Severity: normal
Tags: patch
Hi,

the package has a problem with unicode strings/words in the
PythonBuiltinsFilter's _skip method. When I try to check rst documents
written in Slovak, I get:
Exception occurred:
  File "/usr/lib/pymodules/python2.7/sphinx/application.py", line 204, in build
    self.builder.build_update()
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line 191, in build_update
    self.build(['__all__'], to_build)
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line 252, in build
    self.write(docnames, list(updated_docnames), method)
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line 292, in write
    self.write_doc(docname, doctree)
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 295, in write_doc
    for word, suggestions in self.checker.check(node.astext()):
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 203, in check
    for word, pos in self.tokenizer(text):
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 389, in next
    (word,pos) = next(self._tokenizer)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 389, in next
    (word,pos) = next(self._tokenizer)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 390, in next
    while self._skip(word):
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 150, in _skip
    return hasattr(__builtin__, word)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)
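
The underlying failure can be reproduced in a plain Python 2.7 shell,
independent of Sphinx; the word below is just an example containing a
non-ASCII character:

    >>> import __builtin__
    >>> hasattr(__builtin__, u'pr\xedklad')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 2: ordinal not in range(128)
    >>> hasattr(__builtin__, u'pr\xedklad'.encode("utf-8"))
    False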
After some inspection I found that Sphinx passes every word as a unicode
string (<type 'unicode'>), no matter whether it contains non-ASCII
characters or not. Words that do contain non-ASCII characters are a
problem, because hasattr() wants the attribute name as a *str*, and the
implicit conversion uses the ascii codec. The solution seems to be to add
an encode() call to convert the word from unicode to str (at line 150 of
spelling.py, shown in the traceback above):

return hasattr(__builtin__, word.encode("utf-8"))
I am not sure whether this is a workaround or a proper solution, but it
seems to work for English texts too. Patch attached.
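
For context, the patched method would look roughly like this (the class
skeleton and the enchant import are my reconstruction from the traceback
and the upstream module; only the encode() call is new):

    import __builtin__

    from enchant.tokenize import Filter

    class PythonBuiltinsFilter(Filter):
        """Ignore words that are names of Python built-ins."""

        def _skip(self, word):
            # Sphinx hands the word over as a unicode object; encode it
            # to a byte string first so that hasattr() does not choke on
            # non-ASCII characters under Python 2.
            return hasattr(__builtin__, word.encode("utf-8"))

An alternative would be to wrap the hasattr() call in a
try/except UnicodeEncodeError and return False, since a word with
non-ASCII characters can never be a builtin name, but the encode()
variant is the smaller change.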
regards
--- System information. ---
Architecture: amd64
Kernel: Linux 3.10-2-amd64
Debian Release: jessie/sid
--- Package information. ---
Depends                (Version) | Installed
================================-+-=============
python                           | 2.7.5-2
python-support       (>= 0.90.0) | 1.0.15
python-docutils                  | 0.10-3
python-enchant                   | 1.6.5-2
python-sphinx                    | 1.1.3+dfsg-8
--
Slavko
http://slavino.sk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spelling_unicode.patch
Type: text/x-diff
Size: 447 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/python-modules-team/attachments/20130820/79cf5ace/attachment.patch>