Bug#898022: diffoscope: Traceback when comparing paths with invalid unicode characters
Mattia Rizzolo
mattia at debian.org
Thu May 10 17:01:50 BST 2018
On Thu, May 10, 2018 at 04:43:37PM +0100, Chris Lamb wrote:
> > Do you think this would be fine?
>
> Whilst this works, would it not be better if we could use bytes for
> filenames throughout? I mean, AIUI there is no assumption that
> filesystems need to have any form of valid encoding whatsoever, let
> alone UTF-8.
That was my initial idea as well, but apparently the Python developers
are of a different opinion. Check out the PEP I linked in my previous
email: https://www.python.org/dev/peps/pep-0383/
Together with the argparse bug I also linked:
https://bugs.python.org/issue21416 - apparently it's "hard" (more like
impossible?) to get bytes from the CLI...
I believe that, as that bug suggests, we should just specify
type=os.fsencode # https://docs.python.org/3/library/os.html#os.fsencode
in the parser.add_argument() calls that take a filename (to make sure
argparse doesn't alter them), and then re-encode them before passing
them to functions that can't handle surrogate-escaped strings, like
this magic module.
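
A rough sketch of what I mean, not the actual diffoscope code:
build_parser() and printable() are hypothetical names made up for
illustration, and this assumes a POSIX system where Python decodes
sys.argv with the surrogateescape handler from PEP 383.

```python
import argparse
import os


def build_parser():
    # Hypothetical parser; the argument names are illustrative only.
    parser = argparse.ArgumentParser()
    # argparse hands the type= callable a str that may contain lone
    # surrogates (PEP 383); os.fsencode turns it back into the raw bytes.
    parser.add_argument("path1", type=os.fsencode)
    parser.add_argument("path2", type=os.fsencode)
    return parser


def printable(path):
    # Hypothetical helper: produce a str that is safe to hand to code
    # that cannot cope with surrogate-escaped strings. Undecodable
    # bytes become U+FFFD, so this is lossy and for display only.
    return path.decode("utf-8", errors="replace")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(printable(args.path1), printable(args.path2))
```

The bytes in args.path1 and args.path2 can go straight to open() or
os.stat(); only the callers that insist on a clean str need something
like printable().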
> However, somewhat happy to see this in diffoscope as it certainly
> improves the current state of affairs. If you do commit it, please
> include my testcase (or something based on it) that I added in:
>
> https://bugs.debian.org/898022#5
Of course.
--
regards,
Mattia Rizzolo
GPG Key: 66AE 2B4A FCCF 3F52 DA18 4D18 4B04 3FCD B944 4540 .''`.
more about me: https://mapreri.org : :' :
Launchpad user: https://launchpad.net/~mapreri `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia `-