Bug#898022: diffoscope: Traceback when comparing paths with invalid unicode characters
Mattia Rizzolo
mattia at debian.org
Thu May 10 16:36:22 BST 2018
Control: tag -1 patch
On Sun, May 06, 2018 at 01:38:58AM +0100, Chris Lamb wrote:
> This is via <https://github.com/lamby/trydiffoscope/issues/35>, but I
> think the bug is in diffoscope itself.
It is, although one could say it's a bug in argparse.
> However, I can't seem to minimally reproduce with file by itself:
>
> import magic
> filename = b'\xf0\x28\x8c\x28'
> with open(filename, 'w'):
>     pass
> m = magic.open(magic.NONE)
> m.load()
> m.file(filename)
That's because argparse decodes the arguments. You can get the same
traceback by replacing the last command above with:
|>>> m.file(filename.decode('utf-8', errors='surrogateescape'))
|Traceback (most recent call last):
|  File "<stdin>", line 1, in <module>
|  File "/usr/lib/python3/dist-packages/magic/compat.py", line 148, in file
|    return Magic.__tostr(_file(self._magic_t, Magic.__tobytes(filename)))
|  File "/usr/lib/python3/dist-packages/magic/compat.py", line 138, in __tobytes
|    return bytes(b, 'utf-8')
|UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf0' in position 0: surrogates not allowed
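To see the mechanism without libmagic involved, here is a minimal sketch. It assumes the argument was decoded as UTF-8 with the surrogateescape handler (PEP 383); `strict_encode_fails` is a hypothetical helper that only exists for this illustration.

```python
raw = b'\xf0\x28\x8c\x28'  # the invalid UTF-8 filename from the report

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xf0 -> U+DCF0, 0x8c -> U+DC8C); the two '(' bytes decode normally.
decoded = raw.decode('utf-8', errors='surrogateescape')


def strict_encode_fails(s):
    """Return True if a strict UTF-8 encode of s raises UnicodeEncodeError."""
    try:
        bytes(s, 'utf-8')  # same call as in magic/compat.py's __tobytes
    except UnicodeEncodeError:
        return True
    return False


# Encoding with the same handler restores the original bytes.
roundtrip = decoded.encode('utf-8', errors='surrogateescape')
```

The strict encode is what fails in the traceback above, while the surrogateescape encode gives back exactly the bytes that are on disk.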
What do you think about using:
|>>> m.file(f.encode('utf-8', errors='surrogateescape'))
in that place?
I.e., the following patch would fix this bug for me.
See also:
https://www.python.org/dev/peps/pep-0383/
https://bugs.python.org/issue21416
|diff --git a/diffoscope/comparators/utils/file.py b/diffoscope/comparators/utils/file.py
|index 4fd49ac..0638ef4 100644
|--- a/diffoscope/comparators/utils/file.py
|+++ b/diffoscope/comparators/utils/file.py
|@@ -68,7 +68,7 @@ class File(object, metaclass=abc.ABCMeta):
|             if not hasattr(self, '_mimedb'):
|                 self._mimedb = magic.open(magic.NONE)
|                 self._mimedb.load()
|-            return self._mimedb.file(path)
|+            return self._mimedb.file(path.encode('utf-8', errors='surrogateescape'))
|
|         @classmethod
|         def guess_encoding(self, path):
Do you think this would be fine?
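
For illustration only, the conversion the patched line performs could be sketched as a small standalone function. `path_to_bytes` is a hypothetical name, not something that exists in diffoscope, and the bytes pass-through is an assumption of mine that the patch itself does not need.

```python
def path_to_bytes(path):
    """Return the on-disk bytes for a path that may contain lone surrogates.

    Hypothetical helper: mirrors the encode() call in the patch, and
    additionally passes through paths that are already bytes.
    """
    if isinstance(path, bytes):
        return path
    return path.encode('utf-8', errors='surrogateescape')


# A name decoded the PEP 383 way comes back as the original bytes.
original = b'\xf0\x28\x8c\x28'
as_str = original.decode('utf-8', errors='surrogateescape')
recovered = path_to_bytes(as_str)
```

On POSIX, os.fsencode() does a similar conversion using the filesystem encoding instead of a hard-coded UTF-8, which might be worth considering as an alternative.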
--
regards,
Mattia Rizzolo
GPG Key: 66AE 2B4A FCCF 3F52 DA18 4D18 4B04 3FCD B944 4540 .''`.
more about me: https://mapreri.org : :' :
Launchpad user: https://launchpad.net/~mapreri `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia `-