[Python-modules-team] Bug#639390: python-html5lib: lxml builder: ValueError: Invalid attribute name

Jakub Wilk jwilk at debian.org
Fri Aug 26 17:41:03 UTC 2011


Package: python-html5lib
Version: 0.90-2

>>> import html5lib
>>> html5lib.parse('<div><div><a/</div></div>\n', treebuilder='lxml')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 38, in parse
     return p.parse(doc, encoding=encoding)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 211, in parse
     parseMeta=parseMeta, useChardet=useChardet)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 111, in _parse
     self.mainLoop()
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 176, in mainLoop
     self.phase.processSpaceCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 952, in processSpaceCharacters
     self.tree.reconstructActiveFormattingElements()
   File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/_base.py", line 181, in reconstructActiveFormattingElements
     clone = entry.cloneNode() #Mainly to get a new copy of the attributes
   File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree.py", line 136, in cloneNode
     element.attributes[name] = value
   File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
   File "apihelpers.pxi", line 558, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15734)
   File "apihelpers.pxi", line 1554, in lxml.etree._attributeValidOrRaise (src/lxml/lxml.etree.c:24197)
ValueError: Invalid attribute name u'<'
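
A possible explanation for the odd attribute name, assuming html5lib follows the HTML tokenizer rules for malformed start tags: in `<a/</div>`, the `/` is not followed by `>`, so the tokenizer goes back to reading attribute names, and `<` is accepted as a name character. The sketch below is a simplified stand-in for those rules, not html5lib's code. The function `attribute_names` is hypothetical, and it ignores attribute values and case folding.

```python
from __future__ import print_function

WHITESPACE = " \t\n\f"


def attribute_names(start_tag):
    """Return the attribute names a simplified tokenizer sees in one start tag.

    Assumption: the tag contains no '=' (attribute values are not modelled).
    """
    if not (start_tag.startswith("<") and start_tag.endswith(">")):
        raise ValueError("expected a single start tag")
    names = []
    current = None       # attribute name being collected, if any
    in_tag_name = True
    for ch in start_tag[1:-1]:
        if in_tag_name:
            # The tag name ends at whitespace or '/'.
            if ch in WHITESPACE or ch == "/":
                in_tag_name = False
            continue
        if ch in WHITESPACE or ch == "/":
            # Whitespace or '/' terminates the current attribute name.
            if current is not None:
                names.append(current)
                current = None
        else:
            # Any other character, including '<', starts or extends a name.
            current = (current or "") + ch
    if current is not None:
        names.append(current)
    return names


print(attribute_names("<a/</div>"))  # prints ['<', 'div']
```

Under this reading the `a` element carries an attribute literally named `<`, which is the name lxml rejects in the traceback above when the element is cloned.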


Funnily enough, the problem goes away if I remove the trailing newline.
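
A small script to compare the two inputs side by side. This is only a sketch: the helper `try_parse` is made up for this report, and it reports "skipped" when html5lib or lxml cannot be imported. With the versions listed below, the first input should fail and the second should parse. Other versions may behave differently.

```python
from __future__ import print_function


def try_parse(markup):
    """Parse markup with the lxml tree builder and describe the outcome."""
    try:
        import html5lib
        import lxml.etree  # noqa: F401 (only checks that lxml is importable)
    except ImportError as exc:
        return "skipped: %s" % exc
    try:
        html5lib.parse(markup, treebuilder="lxml")
    except Exception as exc:
        # Report whatever the parser raised, e.g. the ValueError above.
        return "%s: %s" % (type(exc).__name__, exc)
    return "ok"


MARKUP = "<div><div><a/</div></div>"

for label, doc in [("with newline", MARKUP + "\n"), ("without newline", MARKUP)]:
    print(label, "->", try_parse(doc))
```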

-- System Information:
Debian Release: wheezy/sid
   APT prefers unstable
   APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)

Kernel: Linux 3.0.0-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-html5lib depends on:
ii  python                        2.7.2-5    interactive high-level object-orie
ii  python-support                1.0.14     automated rebuilding support for P

Versions of packages python-html5lib suggests:
pn  python-beautifulsoup          <none>     (no description available)
ii  python-chardet                2.0.1-2    universal character encoding detec
pn  python-genshi                 <none>     (no description available)
ii  python-lxml                   2.3-0.1+b2 pythonic binding for the libxml2 a

-- 
Jakub Wilk