Bug#1093338: libbio-eutilities-perl: FTBFS: Failed 8/20 test programs. 0/174 subtests failed.
David Miguel Susano Pinto
carandraug+dev at gmail.com
Thu Jan 23 23:46:51 GMT 2025
David Miguel Susano Pinto <carandraug+dev at gmail.com> writes:
> Étienne Mollier <emollier at debian.org> writes:
>> [...]
>> This is bugging me by the way, because I'm not supposed to have
>> much more network access from my build environment than from my
>> autopkgtest environment, so I don't understand why the build
>> time test goes through. I guess there is a resource available
>> from within the source code directory which is not captured
>> anymore once files are dispatched in the file system tree.
>
> What's happening here is that the XML parsers download the DTD
> associated with the XML files (the DTD declares the structure of the XML
> file which is used to validate it). To avoid requiring internet
> connection during the tests in Debian, those DTDs are inlined.
>
> Still don't know why it was working before and not anymore.
>
> As a side note, the PurePerl XML parser is being used which is the
> fallback, slowest, not recommended (from the Debian package description)
> parser. So maybe other parsers should be recommended.
I believe I have figured out what's happening. This is a bug in the DTD
parser of the XML::SAX::PurePerl parser. Here's how to demonstrate that
it doesn't happen with other parsers:
    git clone https://salsa.debian.org/perl-team/modules/packages/libbio-eutilities-perl.git
    cd libbio-eutilities-perl
    patch -p 1 < debian/patches/inline-DTDs-on-testsuite
    XML_SIMPLE_PREFERRED_PARSER="XML::LibXML::SAX" prove -I . t # no failures
    XML_SIMPLE_PREFERRED_PARSER="XML::SAX::Expat" prove -I . t # no failures
    XML_SIMPLE_PREFERRED_PARSER="XML::SAX::PurePerl" prove -I . t # this fails
Note that BioPerl doesn't care which XML parser is used; it leaves that
to XML::Simple. By default, XML::Simple follows XML::SAX::ParserFactory's
normal rules, which typically means the last SAX parser installed.
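To see which parser those rules select on a given system, something
like the sketch below may help. It assumes perl is on the PATH and
relies on XML::SAX->parsers and XML::SAX::ParserFactory->parser; the
function name list_sax_parsers is made up for this example.

```shell
# Print the SAX parsers registered with XML::SAX, in registration order,
# then the class that XML::SAX::ParserFactory hands out by default.
# list_sax_parsers is a hypothetical helper name, not part of any package.
list_sax_parsers() {
    # Bail out politely when perl or XML::SAX is missing.
    if ! perl -MXML::SAX -e 1 2>/dev/null; then
        echo "XML::SAX is not installed"
        return 0
    fi
    perl -MXML::SAX -e 'print "registered: $_->{Name}\n" for @{ XML::SAX->parsers }'
    perl -MXML::SAX::ParserFactory -e 'print "default: ", ref(XML::SAX::ParserFactory->parser), "\n"'
}

list_sax_parsers
```

If the "default:" line names XML::SAX::PurePerl while other parsers are
listed as registered, the fallback parser is being preferred over them.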
So I think there are two issues here, an XML::SAX bug and a Debian
packaging bug of the XML::SAX parsers.
1. An XML::SAX bug (the root of this bug?):
The root of the failures is that the PurePerl parser is unable to
parse a choice or sequence list where one of the elements has the "+"
quantifier, e.g.:
<!ELEMENT IdUrlSet (Id,(ObjUrl+|Info))>
<!ELEMENT DocSum (Id , Item+)>
These can be seen on:
https://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd
https://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd
Those two element declarations are the ones that cause this FTBFS.
If I modify the DTDs to remove the "+" from those lines, then the
tests no longer fail.
2. A Debian packaging bug?
After installing all these Debian packages, the default XML parser
ends up being XML::SAX::PurePerl. It shouldn't. The default
behaviour of XML::SAX is to use the most recently installed parser.
Since the PurePerl parser ships with XML::SAX itself, it should be the
first to be installed and therefore used only when no other parser is
available.
I'm not sure how XML::SAX knows which parser was last installed but
this seems like a Debian package issue to me.
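Regarding item 1, a rough way to look for other element declarations of
the same shape in a DTD is a grep along these lines. This is only a
sketch: the sample file reuses the two declarations quoted above plus
one made-up declaration (Example) for contrast, the function name
find_plus_in_lists is hypothetical, and the pattern is a crude filter
that also matches a lone "(Name+)", not a real DTD parser.

```shell
# Sample DTD: the two declarations quoted in item 1, plus a made-up
# declaration without "+" that should not be reported.
dtd=$(mktemp)
cat > "$dtd" <<'EOF'
<!ELEMENT IdUrlSet (Id,(ObjUrl+|Info))>
<!ELEMENT DocSum (Id , Item+)>
<!ELEMENT Example (Id, Info)>
EOF

# Report element declarations in which a name that follows "(", "," or "|"
# carries the "+" quantifier. Hypothetical helper, crude by design.
find_plus_in_lists() {
    grep -nE '<!ELEMENT[^>]*[,|(][[:space:]]*[A-Za-z_][A-Za-z0-9_.:-]*\+' "$1"
}

find_plus_in_lists "$dtd"
```

Running the same function against the inlined DTDs in the test suite
should point at the declarations worth checking first.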
I am unable to fix either of those bugs. But if we avoid the use of the
PurePerl parser then this FTBFS bug goes away. Here's what the
developer of the PurePerl parser has to say in its own documentation:
XML::SAX::PurePerl is slow. Very slow. I suggest you use something
else in fact. However it is great as a fallback parser for
XML::SAX, where the user might not be able to install an XS based
parser or C library.
Currently lots [of bugs], probably. At the moment the weakest area
is parsing DOCTYPE declarations, though the code is in place to
start doing this. Also parsing parameter entity references is
causing me much confusion, since it's not exactly what I would call
trivial, or well documented in the XML grammar. XML documents with
internal subsets are likely to fail.
Best wishes
David