Bug#586021: python-debian: deb822.Sources can not handle Sources file with mixed data

sean finney seanius at debian.org
Tue Jun 15 19:20:31 UTC 2010

Package: python-debian
Version: 0.1.16
Severity: important

I was updating the codebase for the Debian patch tracker and have stumbled
across what I believe is a regression.  Now that python-debian uses unicode
internally (since 0.1.15, it seems), if a Sources file contains both UTF-8
and latin-1 encoded maintainer names (as the etch Sources file does),
it seems impossible to produce output from the resulting Sources instance.

The following code should illustrate it:


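from debian import deb822
import sys

# Python 2 script: argv[1] is the Sources file to read, argv[2] the output file.
fh = file(sys.argv[1], "r")
outf = file(sys.argv[2], "w")

# Iterate over the paragraphs (one per source package) and write each one out.
slist = deb822.Sources.iter_paragraphs(fh)
for ent in slist:
    print ent['Package']
    # Not in the posted snippet; inferred from the traceback below, which fails inside dump().
    ent.dump(outf)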

./testit.py /srv/patch-tracker/archive/dists/etch/main/source/Sources /dev/null
<snip lots of output>
Traceback (most recent call last):
  File "./testit.py", line 12, in <module>
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 387, in dump
    value = self.get_as_string(key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 904, in get_as_string
    return Deb822.get_as_string(self, key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 362, in get_as_string
    return unicode(self[key])
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 179, in __getitem__
    value = value.decode(self.encoding)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-5: invalid data

(The maintainer name for cadubi is in latin-1.)
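
The decode failure itself can be reproduced without python-debian. The sketch below is Python 3 and uses a made-up maintainer line (the name and address are hypothetical, not taken from the etch Sources file). It only shows that latin-1 bytes for an accented character are not valid UTF-8, which is the kind of error reported above.

```python
# Hypothetical maintainer field; the real cadubi entry is not reproduced here.
name = "Maintainer: Jos\u00e9 Example <jose@example.org>"

# The same text as it would appear in a UTF-8 file and in a latin-1 file.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")


def try_decode_utf8(raw):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


ok_text, ok_err = try_decode_utf8(utf8_bytes)
bad_text, bad_err = try_decode_utf8(latin1_bytes)

print(ok_text)   # decodes cleanly
print(bad_err)   # the latin-1 byte 0xe9 is rejected by the UTF-8 codec
```

A parser that assumes a single encoding for the whole file will hit this as soon as it reaches the first latin-1 paragraph.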

I've tried a few variants of dump(), str(), and unicode(), catching
the UnicodeDecodeError exception and re-encoding in latin-1, but the problem
seems to be within the iterator code.
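
One possible workaround, sketched here in Python 3 with only the standard library, is to decode the raw file paragraph by paragraph before any parser sees it, trying UTF-8 first and falling back to latin-1. The helper names `decode_with_fallback` and `iter_decoded_paragraphs` and the sample data are hypothetical. This is not python-debian's API, and it assumes paragraphs are separated by blank lines and that each paragraph uses a single encoding.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes with the first encoding that accepts them."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reachable if a custom encodings tuple rejects the input;
    # latin-1 itself accepts every byte sequence.
    return raw.decode(encodings[-1], errors="replace")


def iter_decoded_paragraphs(data):
    """Split raw bytes on blank lines and decode each paragraph separately."""
    for chunk in data.split(b"\n\n"):
        chunk = chunk.strip(b"\n")
        if chunk:
            yield decode_with_fallback(chunk)


# Hypothetical two-paragraph file: the first paragraph is UTF-8, the second latin-1.
sample = (
    "Package: foo\nMaintainer: Jos\u00e9 Example <jose@example.org>\n".encode("utf-8")
    + b"\n"
    + "Package: bar\nMaintainer: Ren\u00e9 Example <rene@example.org>\n".encode("latin-1")
)

paragraphs = list(iter_decoded_paragraphs(sample))
for text in paragraphs:
    print(text)
    print()
```

Because latin-1 accepts any byte sequence, the fallback never fails, but it can silently mis-decode data that is really in some third encoding. The decoded text could then be re-encoded consistently as UTF-8 before parsing; whether that is enough for the deb822 iterator is untested here.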

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.34-rc5minime-00800-g198000a (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-debian depends on:
ii  python                        2.5.4-9    An interactive high-level object-o
ii  python-support                1.0.8      automated rebuilding support for P

Versions of packages python-debian recommends:
ii  python-apt                    0.7.95     Python interface to libapt-pkg

Versions of packages python-debian suggests:
ii  gpgv                          1.4.10-3   GNU privacy guard - signature veri

-- no debconf information
