[xml/sgml-pkgs] Bug#993638: Bug#993638: libxml2: XHTML 1.0 validation is broken

Vincent Lefevre vincent at vinc17.net
Mon Sep 20 01:21:40 BST 2021


On 2021-09-19 22:33:09 +0200, Thorsten Glaser wrote:
> It probably contains the ones for 1.0, but I found w3c-sgml-lib to
> not be sufficient in many ways and now use local files only…

which has always been the case, AFAIK. And its XHTML 1.0 related files
seem to be identical to the w3c-dtd-xhtml ones, except for comments
and spacing. For instance, the comment in xhtml-lat1.ent differs as
follows:

      Typical invocation:
 
        <!ENTITY % xhtml-lat1
            PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
-                  "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent" >
+                  "xhtml-lat1.ent" >
        %xhtml-lat1;

but /usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd from
w3c-dtd-xhtml is using:

<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "xhtml-special.ent">
%HTMLspecial;

(which has never caused any issue). So the difference in the comment
probably comes from an old documentation bug (though it doesn't matter
when one uses only public identifiers and catalogs).

> which means validating involves copying the file, changing the http
> link in the DOCTYPE with a local file:// link, then validating…
> working but suboptimal.

Everything should be available through the public identifiers via
catalogs. Perhaps w3c-sgml-lib doesn't set up its catalogs correctly.
For instance, with w3c-dtd-xhtml, /etc/xml/w3c-dtd-xhtml.xml
contains:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
  "file:///usr/share/xml/schema/xml-core/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<delegatePublic publicIdStartString="-//W3C//DTD XHTML//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Transitional//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML Basic 1.0//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/basic/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.1//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.1/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Strict//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0 Frameset//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Symbols for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Special for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//ENTITIES Latin 1 for XHTML//EN" catalog="file:///usr/share/xml/entities/xhtml/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD XHTML Basic//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
<delegatePublic publicIdStartString="-//W3C//DTD HTML//EN" catalog="file:///usr/share/xml/xhtml/schema/dtd/catalog.xml"/>
</catalog>

and /usr/share/xml/entities/xhtml/catalog.xml contains:

[...]
<group prefer="public">
<!-- ISO latin 1 entity set for Extensible HTML (XML 1.0 format) -->
<public publicId="-//W3C//ENTITIES Latin 1 for XHTML//EN" uri="xhtml-lat1.ent"/>
<public publicId="-//W3C//ENTITIES Symbols for XHTML//EN" uri="xhtml-symbol.ent"/>
<public publicId="-//W3C//ENTITIES Special for XHTML//EN" uri="xhtml-special.ent"/>
</group>
[...]

so that libxml2 finds the right files using the public identifiers alone.
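
To illustrate the lookup chain, here is a rough Python sketch of how a
public identifier could be followed through the two catalogs quoted
above. It is not libxml2's actual code: the catalogs are trimmed
in-memory copies, the names (CATALOGS, TOP_URI, resolve_public) are made
up for the example, and it ignores system identifiers, xml:base, the
"prefer" attribute and public identifier normalization. Trying the
longest matching delegatePublic prefix first is my reading of the OASIS
XML Catalogs rules.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
TOP_URI = "file:///etc/xml/w3c-dtd-xhtml.xml"
ENT_URI = "file:///usr/share/xml/entities/xhtml/catalog.xml"

# Trimmed, in-memory stand-ins for the catalog files (assumption: the
# real files contain more entries than shown here).
CATALOGS = {
    TOP_URI: f"""<catalog xmlns="{NS}">
<delegatePublic publicIdStartString="-//W3C//ENTITIES Latin 1 for XHTML//EN"
                catalog="{ENT_URI}"/>
</catalog>""",
    ENT_URI: f"""<catalog xmlns="{NS}">
<group prefer="public">
<public publicId="-//W3C//ENTITIES Latin 1 for XHTML//EN" uri="xhtml-lat1.ent"/>
<public publicId="-//W3C//ENTITIES Symbols for XHTML//EN" uri="xhtml-symbol.ent"/>
<public publicId="-//W3C//ENTITIES Special for XHTML//EN" uri="xhtml-special.ent"/>
</group>
</catalog>""",
}


def resolve_public(public_id, catalogs, catalog_uri):
    """Return the URI mapped to public_id, or None if nothing matches."""
    root = ET.fromstring(catalogs[catalog_uri])

    # 1. Exact <public> entries, including those nested inside <group>.
    for entry in root.iter(f"{{{NS}}}public"):
        if entry.get("publicId") == public_id:
            # A relative uri is resolved against the catalog's own location.
            return urljoin(catalog_uri, entry.get("uri"))

    # 2. <delegatePublic> entries whose prefix matches, longest prefix first.
    delegates = [
        d for d in root.iter(f"{{{NS}}}delegatePublic")
        if public_id.startswith(d.get("publicIdStartString"))
    ]
    delegates.sort(key=lambda d: len(d.get("publicIdStartString")), reverse=True)
    for d in delegates:
        target = d.get("catalog")
        if target in catalogs:
            found = resolve_public(public_id, catalogs, target)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    print(resolve_public("-//W3C//ENTITIES Latin 1 for XHTML//EN",
                         CATALOGS, TOP_URI))
```

The point is that the system identifier ("xhtml-lat1.ent" or the http
URL) never enters the lookup: the public identifier alone selects the
delegated catalog, and the relative uri found there is resolved against
that catalog's location.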

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


