[xml/sgml-pkgs] Bug#574104: libxml2: considers null bytes as EOF markers
Mike Hommey
mh at glandium.org
Tue Mar 16 12:04:29 UTC 2010
On Tue, Mar 16, 2010 at 12:37:05PM +0100, Jakub Wilk wrote:
> * Mike Hommey <mh at glandium.org>, 2010-03-16, 12:23:
> >>libxml2 ignores null bytes (and following bytes) in an XML file:
> >>
> >>$ printf '<test/>\0junk' | xmlwf
> >>STDIN:1:7: not well-formed (invalid token)
> >>
> >>$ printf '<test/>\0junk' | xmllint -
> >><?xml version="1.0"?>
> >><test/>
> >
> >For starters, libxml2 treats your data as UTF-8 and therefore uses
> >null-terminated strings, so this behaviour is not unexpected.
>
> Huh? Why should I care about such implementation details? I care
> about behaviour, which is broken. (Anyway, UTF-8 and null-terminated
> strings are *unrelated* concepts.)
>
> >Secondly, the null character is not allowed in an XML file.
>
> That's my point. It is not allowed, yet xmllint happily accepts files
> containing it as well-formed.
Oh, sorry for the misunderstanding.
Interestingly, it *does* reject a null character in some positions,
for example inside element content:
$ printf '<test>ju\0nk</test>' | xmllint -
-:1: parser error : Char 0x0 out of allowed range
<test>ju
^
-:1: parser error : Premature end of data in tag test line 1
<test>ju
^
Mike