Bug#879217: diffoscope: container multilevel comparison

Juliana Oliveira juliana.orod at gmail.com
Fri Oct 20 15:43:08 UTC 2017

Package: diffoscope
Version: 87
Severity: wishlist

Dear maintainer,

As of today, it is not possible to properly compare container files when they
have different nesting depths. For example, we can compare sample.tar.gz to
sample.tar.bz, but we can't compare sample.tar.gz to sample.tar.

This happens because of how the .compare method is dispatched: when called on a
container, it expects to be compared to another container, and when called on a
file, it expects to be compared to a file. When comparing multi-level containers
of different depths, we eventually reach a File <> Container comparison, which
falls back to a binary comparison without extracting the remaining container
layers. Since containers can nest up to 50 levels deep, this gets worse the
deeper we go.

At the moment, comparison works like this:

sample.tar.gz (A)               A.compare(B) (Container <> Container)
└── sample.tar (C)                - extracts sample.tar and sample2.txt
    └── sample1.txt             C.compare(D) (Container <> File)
                                  - returns binary comparison
sample.tar (B)
└── sample2.txt (D)
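
To make the current behaviour concrete, here is a minimal sketch in Python. It
is a toy model and not diffoscope's actual code: Node, is_container and
compare_current are hypothetical names, and members are simply paired by
position, which is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a file; `members` is None for plain files."""
    name: str
    content: bytes = b""
    members: Optional[List["Node"]] = None

    @property
    def is_container(self) -> bool:
        return self.members is not None


def compare_current(a: Node, b: Node) -> List[str]:
    """Recurse only when both sides are containers (the behaviour described above)."""
    if a.is_container and b.is_container:
        results: List[str] = []
        # Assumption: pair members by position to keep the sketch small.
        for x, y in zip(a.members, b.members):
            results.extend(compare_current(x, y))
        return results
    if a.is_container != b.is_container:
        # File <> Container: give up on extraction, fall back to binary diff.
        return [f"binary diff: {a.name} <> {b.name}"]
    if a.content != b.content:
        return [f"text diff: {a.name} <> {b.name}"]
    return []


A = Node("sample.tar.gz", members=[
    Node("sample.tar", members=[Node("sample1.txt", content=b"one")]),
])
B = Node("sample.tar", members=[Node("sample2.txt", content=b"two")])

print(compare_current(A, B))
# prints ['binary diff: sample.tar <> sample2.txt']
```

The inner sample.tar is never unpacked, so sample1.txt and sample2.txt are
never compared as text.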

A simple solution would be to extract all the way down to a File, but that may
lose metadata and the tree visualization. A more complex solution would be a
level-by-level comparison in which, at each level, metadata is attached to the
Difference tree. This normalizes container levels. For example:

sample.tar.gz (A)               A.compare(B) (Container <> Container)
└── sample.tar (C)                - both are containers. attach
    └── sample1.txt (E)             metadata diff to tree
                                  - extracts sample.tar and sample2.txt
                                C.compare(D) (Container <> File)
sample.tar (B)                    - one is container. attach C metadata
└── sample2.txt (D)                 to diff
                                  - extracts sample1.txt
                                E.compare(D) (File <> File)
                                  - both files. attach diff
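
The level-by-level idea above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not a proposed patch: Node and
compare_by_level are hypothetical names, the "difference tree" is reduced to a
list of indented strings, and container members are paired by position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a file; `members` is None for plain files."""
    name: str
    content: bytes = b""
    members: Optional[List["Node"]] = None

    @property
    def is_container(self) -> bool:
        return self.members is not None


def compare_by_level(a: Node, b: Node, depth: int = 0) -> List[str]:
    """Keep unpacking whichever side is still a container, recording metadata."""
    pad = "  " * depth
    lines: List[str] = []
    if a.is_container and b.is_container:
        lines.append(f"{pad}metadata diff: {a.name} <> {b.name}")
        # Assumption: pair members by position to keep the sketch small.
        for x, y in zip(a.members, b.members):
            lines.extend(compare_by_level(x, y, depth + 1))
    elif a.is_container:
        # Only the left side is a container: note its metadata, go one level down.
        lines.append(f"{pad}metadata of {a.name}")
        for x in a.members:
            lines.extend(compare_by_level(x, b, depth + 1))
    elif b.is_container:
        # Mirror case: only the right side is a container.
        lines.append(f"{pad}metadata of {b.name}")
        for y in b.members:
            lines.extend(compare_by_level(a, y, depth + 1))
    elif a.content != b.content:
        lines.append(f"{pad}diff: {a.name} <> {b.name}")
    return lines


A = Node("sample.tar.gz", members=[
    Node("sample.tar", members=[Node("sample1.txt", content=b"one")]),
])
B = Node("sample.tar", members=[Node("sample2.txt", content=b"two")])

for line in compare_by_level(A, B):
    print(line)
# prints:
# metadata diff: sample.tar.gz <> sample.tar
#   metadata of sample.tar
#     diff: sample1.txt <> sample2.txt
```

With this approach the File <> File comparison is reached even though the two
inputs have different depths, and each intermediate level leaves a trace in
the output.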

Possible resulting diff:

--- sample.tar.gz
+++ sample.tar
├── file list
│ --rw-r--r--   0 user  ...  sample.tar
│ +-rw-r--r--   0 user  ...  sample2.txt
├── filetype from diffoscope
│ -GzipFile
│ +TarFile
├── --- sample.tar
│ ├── file list
│ │ +-rw-r--r--   0 user ... sample1.txt
│ │   --- sample1.txt
│ ├── +++ sample2.txt
│ │ @@ -1,6 +1,12 @@
│ │ +A common form of lorem ipsum reads:
│ ...
