[xml/sgml-pkgs] Bug#420636: crashes on feeds that contain invalid utf-8 sequences

Joey Hess joeyh at debian.org
Mon Apr 23 17:19:02 UTC 2007


Package: libxml-parser-perl
Version: 2.34-4.2
Severity: normal

XML::Parser is not robust enough to handle all the broken RSS feeds out
there. The most common breakage it fails on is a feed that contains
an invalid UTF-8 sequence:

not well-formed (invalid token) at line 86, column 165, byte 4698 at
/usr/lib/perl5/XML/Parser.pm line 187

I've attached a copy of this feed.

The approach taken by XML parsers in other languages, such as Python's
feedparser, is to be as robust as possible and forgiving in what they
accept. They also set a "bozo" bit when a feed is not well-formed, so
that tools that care can detect this.
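As a rough illustration of the "forgiving parse plus bozo bit" idea, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than feedparser itself. The helper name forgiving_parse is hypothetical, and the sketch assumes the feed is meant to be UTF-8. It tries a strict parse first. If that fails, it replaces invalid UTF-8 byte sequences with U+FFFD, retries, and reports that the feed was not well-formed.

```python
import xml.etree.ElementTree as ET


def forgiving_parse(raw: bytes):
    """Hypothetical helper: return (root_element, bozo_flag).

    Assumes the feed is intended to be UTF-8. Only invalid UTF-8 is
    repaired; other well-formedness errors (e.g. an unescaped '&')
    will still raise ET.ParseError on the retry.
    """
    try:
        # Strict parse: expat rejects invalid UTF-8 with
        # "not well-formed (invalid token)".
        return ET.fromstring(raw), False
    except ET.ParseError:
        # Replace each invalid byte sequence with U+FFFD, re-encode as
        # valid UTF-8, and retry; set the bozo flag so callers can tell.
        cleaned = raw.decode("utf-8", errors="replace").encode("utf-8")
        return ET.fromstring(cleaned), True


root, bozo = forgiving_parse(b"<rss><title>caf\xe9</title></rss>")
print(bozo, root.find("title").text)
```

This sketch sets the flag instead of raising, so a tool that cares about strictness can still reject the feed, while a feed reader can display what it recovered.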

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.20-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml-parser-perl depends on:
ii  libc6                         2.5-2      GNU C Library: Shared libraries
ii  libexpat1                     1.95.8-3.4 XML parsing C library - runtime li
ii  liburi-perl                   1.35-2     Manipulates and accesses URI strin
ii  libwww-perl                   5.805-1    WWW client/server library for Perl
ii  perl                          5.8.8-7    Larry Wall's Practical Extraction 
ii  perl-base [perlapi-5.8.8]     5.8.8-7    The Pathologically Eclectic Rubbis

libxml-parser-perl recommends no packages.

-- no debconf information

-- 
see shy jo