Bug#898022: diffoscope: Traceback when comparing paths with invalid unicode characters

Mattia Rizzolo mattia at debian.org
Thu May 10 20:12:34 BST 2018


Control: clone -1 -2
Control: tag -2 - patch
Control: severity -2 wishlist
Control: retitle -2 diffoscope: should handle the filenames as bytes

On Thu, May 10, 2018 at 06:01:50PM +0200, Mattia Rizzolo wrote:
> I believe that, like that bug is showing, we should just specify
>     type=os.fsencode    # https://docs.python.org/3/library/os.html#os.fsencode
> in the parser.add_argument() calls using a filename (to make sure
> argparse doesn't change output)

This indeed makes the filename be a .bytes through all the code, and
therefore requires a bunch of changes pretty much everywhere in the
comparators (str.endsiwth → bytes.endswith and with them all the
comparing strings needs to be become bytes as well).

I'm cloning this bug to keep track of this thing, as I'm not going to do
that now.
Initial patch to start (from there just try run it and it will crash)
  |--- a/diffoscope/main.py
  |+++ b/diffoscope/main.py
  |@@ -74,11 +74,13 @@ def create_parser():
  |     parser = argparse.ArgumentParser(
  |         description='Calculate differences between two files or directories',
  |         add_help=False, formatter_class=HelpFormatter)
  |-    parser.add_argument('path1', nargs='?', help='First file or directory to '
  |-                        'compare. If omitted, tries to read a diffoscope diff from stdin.')
  |-    parser.add_argument('path2', nargs='?', help='Second file or directory to '
  |-                        'compare. If omitted, no comparison is done but instead we read a '
  |-                        'diffoscope diff from path1 and will output this in the formats '
  |+    parser.add_argument('path1', nargs='?', type=os.fsencode,
  |+                        help='First file or directory to compare. If omitted, '
  |+                        'tries to read a diffoscope diff from stdin.')
  |+    parser.add_argument('path2', nargs='?', type=os.fsencode,
  |+                        help='Second file or directory to compare. If omitted, '
  |+                        'no comparison is done but instead we read a diffoscope '
  |+                        'diff from path1 and will output this in the formats '
  |                         'specified by the rest of the command line.')
  |     parser.add_argument('--debug', action='store_true',
  |                         default=False, help='Display debug messages')
  |


Also, I believe doing that change also requires changing the handling of
the filenames in the Container objects, thanks to diffoscope's
recursivity.  So really, a quite big change.


In the meantime to fix #898022 I'll apply the patch I posted earlier.

-- 
regards,
                        Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540      .''`.
more about me:  https://mapreri.org                             : :'  :
Launchpad user: https://launchpad.net/~mapreri                  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/reproducible-builds/attachments/20180510/eaced9d4/attachment.sig>


More information about the Reproducible-builds mailing list