Better encoding support for python 2
Mihai Ibanescu
mihai.ibanescu at gmail.com
Tue Jul 10 20:32:55 BST 2018
Hi,
We have run across an older deb file:
http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb
One of its files, usr/lib/aspell/íslenska.alias, is not utf8-encoded in the
control file.
This exposed what I think is a bug in deb822.Deb822: in python 2, I cannot
load a sequence (dictionary) in one encoding and dump it in a different
encoding. This works fine in python 3. The difference is that keys are
stored internally as native strings (str) in both PY2 and PY3, but str means
different things in each. In PY3, str is unicode text, so the original
encoding is irrelevant once the data is loaded. In PY2, str is binary (bytes,
in PY3 parlance), so the original encoding still matters at dump time.
To simplify the problem, I will only use the first offending letter of the
file that has problems, í (\xed in iso-8859-1). Here is my test script:
from debian import deb822
obj = deb822.Deb822({'\xed': 'i'}, encoding='iso-8859-1')
print(obj.dump(encoding='utf-8'))
Running it in python3:
python3.6 test.py
í: i
Running it in python 2.7:
python2.7 test.py
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 0:
ordinal not in range(128)
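The error above looks like the usual PY2 implicit ASCII decode of a byte string. A minimal sketch of the explicit alternative follows: decode with the source encoding, then encode with the target one. The helper name transcode is hypothetical and is not part of deb822 or of the attached patch.

```python
def transcode(raw, source_encoding, target_encoding):
    """Decode raw bytes from source_encoding and re-encode them in target_encoding."""
    text = raw.decode(source_encoding)   # bytes -> unicode text
    return text.encode(target_encoding)  # unicode text -> bytes

# The offending key from the test script, as iso-8859-1 bytes.
latin1_key = b'\xed'

# Decoding it as ASCII fails, which matches the error reported above.
try:
    latin1_key.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# Decoding with the declared source encoding first works for any target.
utf8_key = transcode(latin1_key, 'iso-8859-1', 'utf-8')
```

The same two-step conversion behaves identically in PY2 and PY3, because neither step relies on the interpreter's default codec.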
Another bug in PY2 is related to the implementation of __str__: in PY2 it
should return a str (byte string) object, but self.dump() returns unicode.
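One way to satisfy the __str__ contract on both interpreters is sketched below. The Paragraph class is a hypothetical stand-in, not the real Deb822 class, and this is not necessarily what the attached patch does; it only assumes that dump() returns unicode text and that the object knows its output encoding.

```python
import sys

PY2 = sys.version_info[0] == 2

class Paragraph(object):
    """Hypothetical stand-in for an object whose dump() returns unicode text."""

    def __init__(self, text, encoding='utf-8'):
        self._text = text
        self.encoding = encoding

    def dump(self):
        # Always unicode text, on both PY2 and PY3.
        return self._text

    def __str__(self):
        text = self.dump()
        if PY2:
            # PY2: str is a byte string, so encode before returning.
            return text.encode(self.encoding)
        # PY3: str is already unicode text.
        return text

p = Paragraph(u'\xed: i\n')
rendered = str(p)
```

On either interpreter, str(p) then returns the native str type, so callers never receive a unicode object from __str__ in PY2.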
The attached patch fixes both of those problems.
I will be happy to write a test but I wanted to get some feedback about the
correctness of the patch first.
There are also a lot of unreleased patches in git, and it would be nice if
they were tagged as a release.
If there is a process I need to follow in order to submit the patch (e.g.
fork a repo, sign a contributor agreement, etc.), please let me know and I
will do that too.
Thanks!
Mihai
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x.patch
Type: text/x-patch
Size: 1561 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-python-debian-maint/attachments/20180710/72dba53c/attachment.bin>