Better encoding support for python 2

Stuart Prescott stuart at debian.org
Sat Aug 4 11:29:20 BST 2018


Hi Mihai,

Many thanks for your patch. I think the str handling in particular looks
correct and I'll merge it before our next release.

I don't think I understand how the key lookup part of your patch actually
solves the specific problem you mention, however.

> We have run across an older deb file:
> 
> http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb
> 
> One of its files, usr/lib/aspell/íslenska.alias, is not utf8-encoded in
> the control file.

The encoding problem you have highlighted is found in the md5sums control
file, which is not a Deb822 format file, so the Deb822 class isn't a
natural fit. (Also, for quite some time, Deb822 format files have been
required to restrict keys to a limited subset of ASCII, and that requirement
only documented what had been standard practice for a long time before
that.) Further, Deb822 keys are case-insensitive, so it seems bad to use
filenames as Deb822 keys since that will cause unwanted collisions.
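
To illustrate the collision concern, here is a minimal sketch using a toy
case-insensitive mapping. It is not python-debian's Deb822 implementation;
the class name, file names and checksum values are all made up for the
example.

```python
class CaseInsensitiveDict(dict):
    """Toy mapping that folds keys to lower case (hypothetical, not Deb822)."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


sums = CaseInsensitiveDict()

# Two distinct files on a case-sensitive filesystem (made-up names/values).
sums["usr/share/doc/README"] = "checksum-a"
sums["usr/share/doc/readme"] = "checksum-b"

# The second assignment silently replaced the first entry.
print(len(sums))
print(sums["usr/share/doc/README"])
```

Any structure that treats keys case-insensitively would lose one of the two
entries in the same way, which is why filenames make poor keys there.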

My approach would be to use the `md5sums` parser within debfile to get this 
value:


from debian.debfile import DebFile

deb = DebFile('aspell-is_0.51-0-4_all.deb')

for encoding in [None, 'latin1']:
    for fname, md5sum in deb.md5sums(encoding=encoding).items():
        print(fname, md5sum)

(where None means the data is not decoded and is left as bytes, and
'latin1' decodes the data as Latin-1)
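
As a rough picture of what the two encoding choices mean, here is a
self-contained sketch in Python 3 that does not need the .deb file. The
parse_md5sums helper is hypothetical (it is not python-debian's parser), the
checksum is a placeholder, and it assumes the filename was stored as Latin-1.

```python
def parse_md5sums(raw, encoding=None):
    """Parse md5sums-style bytes into {filename: checksum}.

    Hypothetical helper for illustration only. With encoding=None the
    filenames stay as bytes; otherwise they are decoded with that encoding.
    """
    result = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Each line is "<md5sum>  <filename>"; split on the first whitespace run.
        md5sum, fname = line.split(None, 1)
        if encoding is not None:
            fname = fname.decode(encoding)
        result[fname] = md5sum.decode("ascii")
    return result


# Placeholder checksum; the filename is encoded as Latin-1 (an assumption).
raw = (b"0123456789abcdef0123456789abcdef  usr/lib/aspell/"
       + "íslenska.alias".encode("latin1") + b"\n")

as_bytes = parse_md5sums(raw)                    # keys are bytes
as_text = parse_md5sums(raw, encoding="latin1")  # keys are str

try:
    parse_md5sums(raw, encoding="utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # The Latin-1 byte for the accented letter is not valid UTF-8 here.
    utf8_ok = False

print(as_bytes, as_text, utf8_ok)
```

Leaving the names as bytes always works; decoding only works when the
encoding is guessed correctly for that particular package.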


Can you share what you are doing that is combining Deb822 files with 
md5sums?

> If there is a process I need to follow in order to submit the patch (i.e.
> for a repo, sign a contributor agreement etc) please let me know and I
> will do that too.

I don't think we know what our 'preferred' approach is right now. Currently
we seem to hold discussions on the mailing list, take merge requests on
salsa and track bug reports in the BTS... which is rather messy but does
seem to be working just fine.

Thanks,
Stuart


-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart at nanonanonano.net
Debian Developer   http://www.debian.org/         stuart at debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7



