[Python-modules-team] Bug#671842: python-html5lib: lxml builder: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
Jakub Wilk
jwilk at debian.org
Mon May 7 11:48:05 UTC 2012
Package: python-html5lib
Version: 0.90-2
Severity: normal
The lxml tree builder raises a ValueError when parsing a string that
contains a control character (here \b, i.e. U+0008):
>>> import html5lib
>>> html5lib.parse('foo\bfoo', treebuilder='lxml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 38, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 211, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 111, in _parse
    self.mainLoop()
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 174, in mainLoop
    self.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 572, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 611, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 652, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 711, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 804, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 948, in processCharacters
    self.tree.insertText(token["data"])
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/_base.py", line 288, in insertText
    parent.insertText(data)
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree_lxml.py", line 225, in insertText
    builder.Element.insertText(self, data, insertBefore)
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree.py", line 114, in insertText
    self._element.text += data
  File "lxml.etree.pyx", line 904, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:37110)
  File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:16855)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22060)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
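
Until the builder handles such input itself, one possible caller-side
workaround is to drop the offending characters before parsing. The sketch
below is only an illustration, not a fix in html5lib: the helper name
strip_xml_incompatible is hypothetical, it is written for Python 3 (not the
Python 2.7 used above), and it assumes that removing every character outside
the XML 1.0 "Char" production is acceptable for the document at hand.

```python
import re

# Matches any character that is NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: everything outside these ranges may simply be discarded.
_XML_INCOMPATIBLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_incompatible(text, replacement=""):
    """Return text with XML-incompatible characters replaced.

    Hypothetical helper for illustration; by default the characters
    are removed, but a visible replacement string can be supplied.
    """
    return _XML_INCOMPATIBLE.sub(replacement, text)


if __name__ == "__main__":
    cleaned = strip_xml_incompatible("foo\bfoo")
    print(repr(cleaned))  # 'foofoo'
    # The cleaned string could then be passed on, e.g.:
    #   html5lib.parse(cleaned, treebuilder='lxml')
```

Note that this changes the parsed text, so it only suits callers who do not
need the control characters preserved in the resulting tree.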
-- System Information:
Debian Release: wheezy/sid
APT prefers unstable
APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)
Kernel: Linux 3.3.0-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages python-html5lib depends on:
ii python 2.7.2-10
ii python-support 1.0.14
Versions of packages python-html5lib suggests:
ii python-beautifulsoup <none>
ii python-chardet 2.0.1-2
ii python-genshi <none>
ii python-lxml 2.3.2-1
--
Jakub Wilk