[xml/sgml-pkgs] Bug#993638: Bug#993638: libxml2: XHTML 1.0 validation is broken

Vincent Lefevre vincent at vinc17.net
Mon Sep 20 14:57:35 BST 2021


On 2021-09-20 03:18:46 +0200, Vincent Lefevre wrote:
> Hmm... there seems to be a subtle difference in xhtml-special.ent:
> 
> With the file from w3c-dtd-xhtml:
> 
> <!ENTITY quot    "&#34;" ><!-- quotation mark = APL quote, U+0022 ISOnum -->
> <!ENTITY amp     "&#38;" ><!-- ampersand, U+0026 ISOnum -->
> <!ENTITY lt      "&#60;" ><!-- less-than sign, U+003C ISOnum -->
> <!ENTITY gt      "&#62;" ><!-- greater-than sign, U+003E ISOnum -->
> 
> But with the file from w3c-sgml-lib:
> 
> <!ENTITY lt      "&#38;#60;" ><!-- less-than sign, U+003C ISOnum -->
> <!ENTITY gt      "&#62;" ><!-- greater-than sign, U+003E ISOnum -->
> <!ENTITY amp     "&#38;#38;" ><!-- ampersand, U+0026 ISOnum -->
> <!ENTITY apos    "&#39;" ><!-- The Apostrophe (Apostrophe Quote, APL Quote), U+0027 ISOnum -->
> <!ENTITY quot    "&#34;" ><!-- quotation mark (Quote Double), U+0022 ISOnum -->

On this subject, I can see in an old CVS repository of mine that
I committed a correct file (I don't remember where it came from,
but probably from Debian), i.e. with

<!ENTITY quot    "&#34;"> <!--  quotation mark, U+0022 ISOnum -->
<!ENTITY amp     "&#38;#38;"> <!--  ampersand, U+0026 ISOnum -->
<!ENTITY lt      "&#38;#60;"> <!--  less-than sign, U+003C ISOnum -->
<!ENTITY gt      "&#62;"> <!--  greater-than sign, U+003E ISOnum -->
<!ENTITY apos    "&#39;"> <!--  apostrophe = APL quote, U+0027 ISOnum -->

in August 2002.
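
The reason the doubly escaped form matters: a character reference in an
entity declaration is expanded when the declaration is read, so a value
written as "&#60;" leaves a bare "<" as replacement text, which is then
parsed as markup when the entity is used. A minimal sketch of this,
assuming Python's standard xml.etree.ElementTree (expat) instead of
libxml2, and using the hypothetical entity names good_lt and bad_lt so
that the parser's built-in handling of lt does not interfere:

```python
import xml.etree.ElementTree as ET

# Hypothetical entity names: "lt" itself is predefined in XML, so custom
# names are used to show how a declared entity value is processed.
GOOD = '<!DOCTYPE r [<!ENTITY good_lt "&#38;#60;">]><r>a&good_lt;b</r>'
BAD = '<!DOCTYPE r [<!ENTITY bad_lt "&#60;">]><r>a&bad_lt;b</r>'


def parse_text(doc):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


good_result = parse_text(GOOD)  # replacement text "&#60;" is re-parsed to "<"
bad_result = parse_text(BAD)    # replacement text "<" is parsed as markup
print(repr(good_result), repr(bad_result))
```

With the doubly escaped declaration the text should come out as a<b,
while the singly escaped one is expected to be rejected as not
well-formed. The exact diagnostics from libxml2 will differ.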

On https://snapshot.debian.org/package/w3c-dtd-xhtml/1.1-5/
I can see that the Debian package (released on 2004-08-08)
was correct (for the XHTML 1.0 xhtml-special.ent file; the
XHTML 1.1 one was incorrect).

But on https://snapshot.debian.org/package/w3c-sgml-lib/1.2-2/
(which gave the w3c-dtd-xhtml binary package in this version),
released on 2012-04-14, while the upstream part was correct,
the w3c-sgml-lib_1.2-2.debian.tar.gz file has

  debian/legacy/basic/xhtml-special.ent

with the incorrect entity definitions. So, if I understand correctly,
this was a Debian-specific bug. I suspect that the incorrect XHTML 1.1
definitions were taken from the old w3c-dtd-xhtml source and then
shared by both the XHTML 1.0 and XHTML 1.1 DTDs. This would explain
how the bug was introduced in Debian in 2012 and remained until 2016
(and it still persists on users' machines until the w3c-dtd-xhtml
package is removed).

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


