Bug#586021: python-debian: deb822.Sources cannot handle Sources file with mixed-encoding data
    sean finney 
    seanius at debian.org
       
    Tue Jun 15 19:20:31 UTC 2010
    
    
  
Package: python-debian
Version: 0.1.16
Severity: important
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I was updating the codebase for the Debian patch tracker and have stumbled
across what I believe is a regression.  Now that python-debian uses unicode
internally (since 0.1.15, it seems), if a Sources file contains both UTF-8
and Latin-1 encoded maintainer names (as the etch Sources file does),
it seems impossible to produce output from the resulting Sources instance.
The following code should illustrate it:
#!/usr/bin/python
# Reproducer (Python 2): copy each paragraph of argv[1] to argv[2].
from debian import deb822
import sys

fh = file(sys.argv[1], "r")
outf = file(sys.argv[2], "w")
slist = deb822.Sources.iter_paragraphs(fh)

for ent in slist:
    print ent['Package']
    outf.write(ent.dump().encode('utf-8'))
    outf.write("\n")
./testit.py /srv/patch-tracker/archive/dists/etch/main/source/Sources /dev/null
<snip lots of output>
cadaver
cadubi
Traceback (most recent call last):
  File "./testit.py", line 12, in <module>
    outf.write(ent.dump().encode('utf-8'))
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 387, in dump
    value = self.get_as_string(key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 904, in get_as_string
    return Deb822.get_as_string(self, key)
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 362, in get_as_string
    return unicode(self[key])
  File "/usr/lib/pymodules/python2.5/debian/deb822.py", line 179, in __getitem__
    value = value.decode(self.encoding)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-5: invalid data
(The maintainer name for cadubi is in Latin-1.)
I've tried a few variants of dump(), str(), and unicode(), catching
the UnicodeDecodeError exception and re-encoding in Latin-1, but the problem
seems to be within the iterator code.
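For illustration, here is a minimal sketch of the kind of fallback described
above, applied to the raw bytes before they reach the parser rather than
around dump(). It is not part of python-debian: decode_mixed() and
normalise_lines() are hypothetical helper names, the sample maintainer names
are made up, and the sketch is written for Python 3, whereas the script above
is Python 2. It assumes each line is entirely in one encoding, and it will
misread any Latin-1 line that happens to also be valid UTF-8.

```python
def decode_mixed(raw, primary="utf-8", fallback="latin-1"):
    """Decode bytes as `primary`, falling back to `fallback` on failure."""
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode(fallback)


def normalise_lines(byte_lines):
    """Yield each input line re-encoded as UTF-8 bytes."""
    for line in byte_lines:
        yield decode_mixed(line).encode("utf-8")


if __name__ == "__main__":
    # Made-up sample data: one UTF-8 and one Latin-1 maintainer name.
    sample = [
        b"Package: cadaver\n",
        b"Maintainer: Jos\xc3\xa9 Example <jose@example.org>\n",  # UTF-8
        b"\n",
        b"Package: cadubi\n",
        b"Maintainer: Jos\xe9 Example <jose@example.org>\n",      # Latin-1
    ]
    for line in normalise_lines(sample):
        print(line.decode("utf-8"), end="")
```

The normalised lines could in principle be passed to
deb822.Sources.iter_paragraphs() in place of the file handle, but I have not
tested whether that works with this version.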
- -- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.34-rc5minime-00800-g198000a (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages python-debian depends on:
ii  python                        2.5.4-9    An interactive high-level object-o
ii  python-support                1.0.8      automated rebuilding support for P
Versions of packages python-debian recommends:
ii  python-apt                    0.7.95     Python interface to libapt-pkg
Versions of packages python-debian suggests:
ii  gpgv                          1.4.10-3   GNU privacy guard - signature veri
- -- no debconf information
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iD8DBQFMF9J7ynjLPm522B0RAib+AJ0U6R4WSsqd3kdz5gtOMZlkqimHCACfZmAa
Tu+6uN9WwvU/AxMqI0SWrrA=
=r73W
-----END PGP SIGNATURE-----
    
    