Bug#420636: [xml/sgml-pkgs] Bug#420636: crashes on feeds that contain
invalid utf-8 sequences
Mike Hommey
mh at glandium.org
Mon Apr 23 17:51:14 UTC 2007
severity 420636 wishlist
thanks
On Mon, Apr 23, 2007 at 01:19:02PM -0400, Joey Hess <joeyh at debian.org> wrote:
> Package: libxml-parser-perl
> Version: 2.34-4.2
> Severity: normal
>
> XML::Parser is not robust enough to handle all the broken RSS feeds out
> there. The most common breakage that it fails on is a feed that contains
> an invalid UTF-8 sequence:
>
> not well-formed (invalid token) at line 86, column 165, byte 4698 at
> /usr/lib/perl5/XML/Parser.pm line 187
>
> I've attached a copy of this feed.
>
> The approach taken in other languages' XML parsers, such as Python's
> feedparser, is to attempt to be as robust as possible, to be forgiving in
> what is accepted. They also set a bozo bit if a feed is not well-formed,
> so that tools that care can detect this.
Such a lax XML parser is not an XML parser: the XML specification
requires a parser to treat well-formedness errors as fatal. This bug is
therefore a wishlist bug.
Mike
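
For illustration, the lenient behaviour requested in the quoted report
(parse strictly first, fall back to a repaired copy, and record a "bozo"
flag) could be sketched roughly as below. This is a minimal sketch in
Python using only the standard library, not how feedparser or XML::Parser
is actually implemented. The function name parse_feed and the choice to
replace bad bytes with U+FFFD are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def parse_feed(raw: bytes):
    """Hypothetical helper: return (root, bozo).

    bozo is False when the input parsed as well-formed XML, and True
    when it only parsed after invalid UTF-8 bytes were replaced.
    """
    try:
        # Strict pass: a conforming parser rejects invalid UTF-8 here.
        return ET.fromstring(raw), False
    except ET.ParseError:
        # Assumption: the only damage is invalid UTF-8. Replace each bad
        # byte with U+FFFD, re-encode, and retry. Any other
        # well-formedness error still raises ParseError.
        repaired = raw.decode("utf-8", errors="replace").encode("utf-8")
        return ET.fromstring(repaired), True


if __name__ == "__main__":
    # \xe9 is Latin-1 "e acute", which is not a valid UTF-8 sequence here.
    broken = b"<rss><title>caf\xe9</title></rss>"
    root, bozo = parse_feed(broken)
    print(bozo, repr(root.find("title").text))
```

Keeping the strict pass first means well-formed feeds are never altered,
and callers that care about correctness can simply reject any result
whose bozo flag is set.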