[xml/sgml-pkgs] Bug#993638: Bug#993638: libxml2: XHTML 1.0 validation is broken

Mattia Rizzolo mattia at debian.org
Sun Sep 19 21:59:31 BST 2021


On Sun, Sep 19, 2021 at 09:45:19PM +0200, Vincent Lefevre wrote:
> On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> > I can never manage to download DTDs from w3.org (how could you?!), so,
> > taking your testcase and a copy of the same DTD:
> 
> The DTD is provided by Debian, no need to download it.

But you need to instruct xmllint to use said DTD; it won't decide on
its own to pick a random DTD from the filesystem.  I also know how to
use apt-file myself:
| % apt-file search xhtml1-strict.dtd
| dita-ot: /usr/share/dita-ot/demo/h2d/dtd/xhtml1-strict.dtd
| erlang-erl-docgen: /usr/lib/erlang/lib/erl_docgen-1.1.1/priv/dtd/xhtml1-strict.dtd
| kate5-data: /usr/share/katexmltools/xhtml1-strict.dtd.xml
| libpxp-ocaml-dev: /usr/share/doc/libpxp-ocaml-dev/examples/namespaces/xhtml1-strict.dtd.gz
| librdf-rdfa-parser-perl: /usr/share/perl5/auto/share/dist/RDF-RDFa-Parser/catalogue/www.w3.org/MarkUp/DTD/xhtml1-strict.dtd
| w3-recs: /usr/share/doc/w3-recs/html/www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd.gz
| w3c-sgml-lib: /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xhtml1-20020801/xhtml1-strict.dtd
| xemacs21-basesupport: /usr/share/xemacs21/xemacs-packages/etc/psgml-dtds/xhtml1-strict.dtd
| xmlcopyeditor: /usr/share/xmlcopyeditor/dtd/xhtml1-strict.dtd
| %

Indeed, the one I used is the one from xmlcopyeditor (I picked a package
at random, trusting that its .dtd is actually the same as all of the
above).
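
That assumption can be checked rather than trusted. A minimal sketch,
assuming two of the packages listed above are installed; same_dtd is a
hypothetical helper name, not part of any package:

```shell
# Hypothetical helper: report whether two DTD copies are byte-identical.
# cmp -s is silent and returns 0 only when the files have the same content;
# a missing or unreadable file is therefore reported as "different".
same_dtd() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example with two of the paths reported by apt-file above
# (both packages must be installed for this to be meaningful):
# same_dtd /usr/share/xmlcopyeditor/dtd/xhtml1-strict.dtd \
#          /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xhtml1-20020801/xhtml1-strict.dtd
```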

> > mattia@warren /tmp/tmp/xml % xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
> > I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> > test.html:2: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> > C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> >                                                                                ^
> > mattia@warren /tmp/tmp/xml %
> > 
> > which looks good to me.
> 
> An I/O error is not good. Your system appears to be broken.

My system is fine.  That error message is only a red herring due to
--nonet, and indeed the return code of xmllint is 0.
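
To look at the return code separately from whatever is printed on
stderr, something like the following works. This is only a sketch;
exit_status_of is a hypothetical helper name:

```shell
# Hypothetical helper: run a command with its stderr discarded and
# print only the exit status it returned.
exit_status_of() {
    "$@" 2>/dev/null
    echo "exit status: $?"
}

# Example (needs xmllint, test.html and the DTD in the current directory):
# exit_status_of xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
```

A message on stderr does not by itself mean the command failed; the
xmllint man page lists the nonzero codes it uses for DTD and validation
errors.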

If you prefer, I can modify the DOCTYPE and do this instead, so there
won't be "I/O error"s and the return code is clear:

mattia@warren /tmp/tmp/xml % cat test.html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "file:///tmp/tmp/xml/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>title</title></head>
<body><p>text</p></body>
</html>
mattia@warren /tmp/tmp/xml % xmllint --noout --nonet test.html ; echo $?
0
mattia@warren /tmp/tmp/xml % dpkg -l libxml2|tail -n1
ii  libxml2:amd64  2.9.12+dfsg-4 amd64        GNOME XML library
mattia@warren /tmp/tmp/xml %
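
The DOCTYPE change above can also be scripted instead of done by hand.
A rough sketch, assuming the document uses the standard XHTML 1.0 Strict
system identifier and that the DTD path is absolute and contains no "|"
or "&" characters; use_local_dtd is a hypothetical helper name:

```shell
# Hypothetical helper: print FILE with the XHTML 1.0 Strict system
# identifier replaced by a file:// URL pointing at a local DTD copy.
# The input file itself is not modified.
use_local_dtd() {
    dtd_path=$1   # absolute path to the local DTD (assumption)
    file=$2       # XHTML document to rewrite
    sed "s|http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd|file://$dtd_path|" "$file"
}

# Example:
# use_local_dtd /tmp/tmp/xml/xhtml1-strict.dtd test.html > test-local.html
# xmllint --noout --nonet test-local.html ; echo $?
```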

-- 
regards,
                        Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540      .''`.
More about me:  https://mapreri.org                             : :'  :
Launchpad user: https://launchpad.net/~mapreri                  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-