[xml/sgml-pkgs] Bug#1037063: libxml-libxml-perl: Seemingly incorrect handling of escaped characters in patterns
Niko Tyni
ntyni at debian.org
Mon Jun 5 20:23:47 BST 2023
reassign 1037063 libxml2 2.9.10+dfsg-6.7
found 1037063 2.9.10+dfsg-6.7+deb11u4
close 1037063 2.9.12+dfsg-1
forwarded 1037063 https://gitlab.gnome.org/GNOME/libxml2/-/issues/188
thanks
Hi, thanks for the report. See below for some analysis.
(TL;DR: It's a bug in libxml2 that was fixed upstream after the version
in bullseye, and the fix will be in the upcoming bookworm release.)
On Fri, Jun 02, 2023 at 10:38:02PM -0500, Xan Charbonnet wrote:
> Package: libxml-libxml-perl
> Version: 2.0134+dfsg-2+b1
> Severity: normal
>
> Dear Maintainer,
>
> I use XML::LibXML::Reader to work with files that validate against the Library
> of Congress's MARCXML Schema, available here:
> https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
>
> That schema includes a pattern:
> [\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}
> or, with the XML escaping processed:
> [\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}
>
> That regex requires a single character, any one of a long list of allowable
> characters. Note how three of the characters require escaping because they
> would have meaning in the regex itself: the two square brackets [ and ], and
> the backslash \.
>
> An online XML Schema validator that I found with a quick search:
> https://www.liquid-technologies.com/online-xsd-validator
> shows that those three characters are valid. The problem is that
> XML::LibXML::Reader seems to believe that they are not.
[...]
> test.xml:1: Schemas validity error : Element 'root', attribute 'code': [facet
> 'pattern'] The value '[' is not accepted by the pattern
> '[\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}'.
>
> I believe that value in fact should match that pattern. The online schema
> validator from earlier validates this pair of files. If you replace the data
> in the "code" attribute with any of the other characters, validation passes.
> It only fails for the three characters that are escaped.
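This last part is not quite correct: validation also fails for the
backtick (`) and the tilde (~) characters. The five rejected characters
are exactly the ones that follow the caret (^) in the character class,
so it's not about the escaping; it's about the caret in the middle of
the class being mistreated.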
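The check below is a small Python sketch of that reasoning. It uses Python's `re` as a stand-in for an XSD regex engine. That is an assumption: the two syntaxes agree for this particular class, but not in general. The sketch checks two things. First, with the intended semantics every listed punctuation character is accepted. Second, the characters written after the caret are exactly the five reported as rejected. It does not model libxml2's own parser.

```python
import re

# Body of the character class from the MARC21slim.xsd pattern, with the
# XML escaping already processed (i.e. as the regex engine sees it).
CLASS_BODY = r"""\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\"""

# XSD patterns are implicitly anchored to the whole value, hence fullmatch.
# Assumption: Python's re reads this class the way an XSD regex engine
# should, i.e. a caret that is not first in the class is a literal "^".
PATTERN = re.compile("[" + CLASS_BODY + "]{1}")

# Every non-alphanumeric character the class is meant to allow.
PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?{}_^`~[]\\"


def literal_chars(run):
    """Decode a run of plain class characters, treating a backslash as an
    escape for the next character. This is enough for the tail of this
    class, which contains no ranges or class escapes."""
    chars, i = [], 0
    while i < len(run):
        if run[i] == "\\" and i + 1 < len(run):
            chars.append(run[i + 1])
            i += 2
        else:
            chars.append(run[i])
            i += 1
    return chars


# Characters written after the (only) caret in the class.
AFTER_CARET = literal_chars(CLASS_BODY[CLASS_BODY.index("^") + 1:])

if __name__ == "__main__":
    rejected = [c for c in PUNCTUATION if not PATTERN.fullmatch(c)]
    print("rejected with the intended semantics:", rejected)
    print("written after the caret:", AFTER_CARET)
```

With correct handling nothing in PUNCTUATION is rejected. The caret-tail list comes out as the backtick, tilde, [, ] and \, which matches the set of values that fail.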
The fault actually lies in libxml2, which libxml-libxml-perl uses,
as can be seen with `xmllint --schema test.xsd test.xml` (xmllint is in
the libxml2-utils package).
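The reporter's test.xml and test.xsd were elided from the quote above. The script below therefore writes a hypothetical minimal pair, reconstructed only from the error message (element `root`, attribute `code`). The file names match the xmllint command. Python is used so that the XML escaping of the pattern (`&quot;`, `&amp;`, `&lt;`, `&gt;`) is done mechanically rather than by hand. On an affected libxml2 (bullseye's 2.9.10) xmllint should reject code="["; with 2.9.11 or later it should validate.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The pattern as the regex engine should see it (XML escaping processed).
PATTERN = r"""[\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}"""
XS = "{http://www.w3.org/2001/XMLSchema}"


def make_schema(pattern: str) -> str:
    # quoteattr escapes &, < and >; since the value holds both quote
    # characters, it wraps it in double quotes and writes " as &quot;.
    return f"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="code">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value={quoteattr(pattern)}/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def make_instance(value: str) -> str:
    # Hypothetical instance document: a single <root> with a code attribute.
    return f'<?xml version="1.0"?>\n<root code={quoteattr(value)}/>\n'


def pattern_in_schema(xsd_text: str) -> str:
    # Sanity check: parse the schema back and return the pattern as an XML
    # parser (and hence a schema validator) sees it after unescaping.
    node = ET.fromstring(xsd_text).find(f".//{XS}pattern")
    return node.get("value")


if __name__ == "__main__":
    with open("test.xsd", "w", encoding="utf-8") as f:
        f.write(make_schema(PATTERN))
    with open("test.xml", "w", encoding="utf-8") as f:
        f.write(make_instance("["))
    # Then run: xmllint --noout --schema test.xsd test.xml
```

Passing `` ` ``, `~`, `]` or `\` to make_instance instead of `[` exercises the other affected characters.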
It looks like it was fixed upstream in libxml2 2.9.11 with
https://gitlab.gnome.org/GNOME/libxml2/-/commit/7d6837ba0e282e94eb8630ad791f427e44a57491
and the fix entered Debian with 2.9.12.
I'm reassigning and closing the bug as it's fixed in current versions
in unstable and testing. Not sure if it's something that should be
backported to current Debian stable (bullseye). Feel free to discuss
that with the libxml2 maintainers (cc'd) if you like.
--
Niko Tyni ntyni at debian.org
More information about the debian-xml-sgml-pkgs mailing list