Bug#1093338: libbio-eutilities-perl: FTBFS: Failed 8/20 test programs. 0/174 subtests failed.

David Miguel Susano Pinto carandraug+dev at gmail.com
Thu Jan 23 23:46:51 GMT 2025


David Miguel Susano Pinto <carandraug+dev at gmail.com> writes:
> Étienne Mollier <emollier at debian.org> writes:
>> [...]
>> This is bugging me by the way, because I'm not supposed to have
>> much more network access from my build environment than from my
>> autopkgtest environment, so I don't understand why the build
>> time test goes through.  I guess there is a resource available
>> from within the source code directory which is not captured
>> anymore once files are dispatched in the file system tree.
>
> What's happening here is that the XML parsers download the DTD
> associated with the XML files (the DTD declares the structure of the XML
> file which is used to validate it).  To avoid requiring an internet
> connection during the tests in Debian, those DTDs are inlined.
>
> I still don't know why it worked before but no longer does.
>
> As a side note, the XML parser being used is PurePerl, which is the
> fallback parser: the slowest one, and not recommended according to its
> Debian package description.  So maybe other parsers should be recommended.

I believe I have figured out what's happening.  This is a bug in the DTD
parser of the XML::SAX::PurePerl parser.  Here's how to demonstrate that
it doesn't happen with other parsers:

    git clone https://salsa.debian.org/perl-team/modules/packages/libbio-eutilities-perl.git
    cd libbio-eutilities-perl
    patch -p 1 < debian/patches/inline-DTDs-on-testsuite
    XML_SIMPLE_PREFERRED_PARSER="XML::LibXML::SAX" prove -I . t  # no failures
    XML_SIMPLE_PREFERRED_PARSER="XML::SAX::Expat" prove -I . t  # no failures
    XML_SIMPLE_PREFERRED_PARSER="XML::SAX::PurePerl" prove -I . t  # this fails

Note that BioPerl doesn't care which XML parser is used; it leaves that
to XML::Simple.  By default, XML::Simple follows XML::SAX::ParserFactory's
normal rules, which typically means using the last SAX parser installed.

So I think there are two issues here: an XML::SAX bug and a Debian
packaging bug in the XML::SAX parsers.


1. An XML::SAX bug (the root of this bug?):

   The root of the failures is that the PurePerl parser is unable to
   parse a choice or sequence list where one of the elements has the "+"
   quantifier, e.g.:

       <!ELEMENT IdUrlSet (Id,(ObjUrl+|Info))>
       <!ELEMENT DocSum (Id , Item+)>

   These can be seen in:

       https://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd
       https://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd

   Those two element declarations are the ones that cause this FTBFS.
   If I modify the DTDs to remove the "+" from those lines, then the
   tests no longer fail.
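A standalone reproducer may be handy when reporting this upstream.  The
sketch below is not taken from the package: it writes a small document
whose internal DTD subset carries the two declarations quoted above (the
eSummaryResult root and the leaf declarations are made up only to keep it
well-formed) and hands it to each SAX driver's parse_string().  If the
diagnosis above is right, only XML::SAX::PurePerl should print FAILED;
drivers that are not installed are reported as skipped.

```shell
#!/bin/sh
# Hypothetical reproducer: parse one document whose internal DTD subset
# uses "+" inside a sequence and inside a nested choice, once per driver.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT

cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE eSummaryResult [
<!ELEMENT eSummaryResult (DocSum+)>
<!ELEMENT DocSum (Id , Item+)>
<!ELEMENT IdUrlSet (Id,(ObjUrl+|Info))>
<!ELEMENT Id (#PCDATA)>
<!ELEMENT Item (#PCDATA)>
<!ELEMENT ObjUrl (#PCDATA)>
<!ELEMENT Info (#PCDATA)>
]>
<eSummaryResult><DocSum><Id>1</Id><Item>x</Item></DocSum></eSummaryResult>
EOF

# Prints "<driver>: ok", "<driver>: FAILED" or "<driver>: skipped (not installed)".
check_parser() {
    if ! perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: skipped (not installed)"
        return 0
    fi
    # Slurp the file and parse it; a die() inside the driver makes perl
    # exit non-zero, which is reported as FAILED.
    if perl -M"$1" -e '
        my ($class, $file) = @ARGV;
        open my $fh, "<", $file or die "open $file: $!";
        my $xml = do { local $/; <$fh> };
        $class->new->parse_string($xml);
    ' "$1" "$xml_file" 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}

for driver in XML::LibXML::SAX XML::SAX::Expat XML::SAX::PurePerl; do
    check_parser "$driver"
done
```

Dropping stderr keeps the output to one line per driver; removing the
2>/dev/null on the second perl call shows the actual PurePerl error.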


2. A Debian packaging bug?

   After installing all these Debian packages, the default XML parser
   ends up being XML::SAX::PurePerl.  It shouldn't.  The default
   behaviour of XML::SAX is to use the most recently installed parser.
   Since the PurePerl parser comes with XML::SAX itself, it should be
   the first one installed and therefore only used when no other is
   available.

   I'm not sure how XML::SAX knows which parser was last installed but
   this seems like a Debian package issue to me.
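To see what XML::SAX itself believes is installed, something like the
following sketch could be run on the build machine.  It lists the drivers
returned by XML::SAX->parsers() in registration order, then asks
XML::SAX::ParserFactory which class it would hand out; with no override
set, the factory normally takes the last registered entry.  The function
name show_sax_default is only for illustration.

```shell
#!/bin/sh
# Report the SAX drivers XML::SAX knows about and the one it picks by default.
show_sax_default() {
    if ! perl -MXML::SAX -e 1 2>/dev/null; then
        echo "XML::SAX not installed"
        return 0
    fi
    # XML::SAX->parsers returns an array ref of hashes with a Name key;
    # ParserFactory->parser returns a parser object, so ref() gives its class.
    perl -MXML::SAX -MXML::SAX::ParserFactory -e '
        print "registered: $_->{Name}\n" for @{ XML::SAX->parsers };
        print "default: ", ref(XML::SAX::ParserFactory->parser), "\n";
    ' || echo "lookup failed"
}

show_sax_default
```

If the "default:" line says XML::SAX::PurePerl while faster drivers appear
earlier in the "registered:" list, the registration order is the thing to
look at.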


I am unable to fix either of those bugs, but if we avoid using the
PurePerl parser then this FTBFS goes away.  Here's what the developer of
the PurePerl parser has to say in its own documentation:

    XML::SAX::PurePerl is slow.  Very slow.  I suggest you use something
    else in fact.  However it is great as a fallback parser for
    XML::SAX, where the user might not be able to install an XS based
    parser or C library.

    Currently lots [of bugs], probably.  At the moment the weakest area
    is parsing DOCTYPE declarations, though the code is in place to
    start doing this.  Also parsing parameter entity references is
    causing me much confusion, since it's not exactly what I would call
    trivial, or well documented in the XML grammar.  XML documents with
    internal subsets are likely to fail.


Best wishes
David


