[xml/sgml-pkgs] Bug#420636: crashes on feeds that contain invalid
utf-8 sequences
Joey Hess
joeyh at debian.org
Mon Apr 23 17:19:02 UTC 2007
Package: libxml-parser-perl
Version: 2.34-4.2
Severity: normal
XML::Parser is not robust enough to handle all the broken RSS feeds out
there. The most common breakage it fails on is a feed that contains
an invalid UTF-8 sequence:
not well-formed (invalid token) at line 86, column 165, byte 4698 at
/usr/lib/perl5/XML/Parser.pm line 187
I've attached a copy of this feed.
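For reference, here is a minimal sketch of the same failure outside Perl, using Python's xml.parsers.expat binding to the expat library that XML::Parser also wraps. The two-line feed is a made-up stand-in for the attached one (a lone Latin-1 0xE9 byte in a document declared as UTF-8), and try_parse is a hypothetical helper name.

```python
import xml.parsers.expat

# Hypothetical stand-in for the broken feed: 0xE9 is Latin-1 "e acute",
# which is not a valid UTF-8 sequence on its own.
bad_feed = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<rss><title>caf\xe9</title></rss>'
)


def try_parse(data):
    """Parse data strictly with expat; return the error, or None on success."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return err
    return None


err = try_parse(bad_feed)
print(err)
```

A strict parser has no choice here: a byte sequence that is not valid in the declared encoding makes the document not well-formed, so expat should stop with the same "not well-formed (invalid token)" complaint quoted above.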
The approach taken by parsers in other languages, such as Python's
feedparser, is to be as robust as possible and forgiving in what they
accept. They also set a bozo bit if a feed is not well-formed, so that
tools that care can detect this.
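A rough sketch of the bozo-bit idea, in Python with only the standard library. This is not feedparser's actual implementation; parse_lenient is a hypothetical helper, and replacing undecodable bytes with U+FFFD is just one possible repair strategy, assumed here for illustration.

```python
import xml.etree.ElementTree as ET


def parse_lenient(data):
    """Return (root, bozo); bozo is True if the input had to be repaired."""
    try:
        # First try a normal, strict parse.
        return ET.fromstring(data), False
    except ET.ParseError:
        # Assumed repair: swap bytes that are not valid UTF-8 for U+FFFD,
        # then parse again. Other well-formedness errors still raise.
        repaired = data.decode("utf-8", errors="replace").encode("utf-8")
        return ET.fromstring(repaired), True


# Hypothetical broken feed with a stray Latin-1 byte in the title.
bad_feed = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<rss><title>caf\xe9</title></rss>'
)

root, bozo = parse_lenient(bad_feed)
print(bozo, repr(root.find("title").text))
```

The caller still gets a usable tree, and anything that cares about strictness can check the flag and reject or log the feed.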
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.20-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages libxml-parser-perl depends on:
ii  libc6                      2.5-2       GNU C Library: Shared libraries
ii  libexpat1                  1.95.8-3.4  XML parsing C library - runtime li
ii  liburi-perl                1.35-2      Manipulates and accesses URI strin
ii  libwww-perl                5.805-1     WWW client/server library for Perl
ii  perl                       5.8.8-7     Larry Wall's Practical Extraction
ii  perl-base [perlapi-5.8.8]  5.8.8-7     The Pathologically Eclectic Rubbis
libxml-parser-perl recommends no packages.
-- no debconf information
--
see shy jo