Bug#1053668: diffoscope: Consider using `file -i` as fallback for unknown file output

Niels Thykier niels at thykier.net
Sun Oct 8 12:50:20 BST 2023


Package: diffoscope
Version: 250
Severity: wishlist
X-Debbugs-Cc: niels at thykier.net

Hi,

I noticed that `diffoscope` used `hexdump -C` based diffs for the 
debian/changelog in the `mscgen` package.

My first bet was that `file` would produce incorrect output and indeed, 
`file` classifies that changelog as a `Message Sequence Chart` rather 
than text.  This is now filed as Bug#1053666.

Digging a bit deeper, it turns out that `file -i` correctly classifies 
the changelog as `text/plain; charset=utf-8`.  That is, `file` knows it 
is text and I suspect `diffoscope` should try `file -i` as well when it 
gets an unknown result from `file`.
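
A minimal sketch of the suggested fallback, in Python. This is not diffoscope's actual code: the function names and the `known_descriptions` parameter are hypothetical, and the sketch assumes the `file` binary is on the PATH and accepts `--brief` and `-i`.

```python
import subprocess


def run_file(path, mime=False):
    """Run file(1) on path; with mime=True, pass -i to get MIME type and charset."""
    args = ["file", "--brief"]
    if mime:
        args.append("-i")
    args.append(str(path))
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def looks_like_text(mime_output):
    """Return True if `file -i` output such as 'text/plain; charset=utf-8' denotes text."""
    mime_type = mime_output.split(";", 1)[0].strip()
    return mime_type.startswith("text/")


def should_diff_as_text(path, known_descriptions):
    """Hypothetical decision helper: consult `file -i` only for unknown descriptions.

    known_descriptions is an assumed list of substrings of `file` output that
    the caller already knows how to handle with a dedicated comparison.
    """
    description = run_file(path)
    if any(known in description for known in known_descriptions):
        return False  # a dedicated handler exists, no fallback needed
    return looks_like_text(run_file(path, mime=True))
```

With this shape, a description such as `Message Sequence Chart` that no handler recognizes would lead to a second `file -i` call, and a `text/*` answer would select a text diff instead of a hex dump.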

This bug report obviously assumes that the `hexdump -C`-like output is 
triggered because `diffoscope` uses `file` for determining how to 
analyze the changelog.  If it uses something else, then there is some 
other bug in play that makes `diffoscope` treat the `mscgen` changelog 
as a binary file.

Here are two sample files that `file` classifies as `Message Sequence 
Chart (chart)` and, with `-i`, as `text/plain; charset=us-ascii`, in case 
they are useful for a test:

```
msc {
   a, b;
}
```
```
msc {
   c, d;
}
```
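
As a starting point for such a test, the following Python sketch writes the two samples to disk and queries `file` with and without `-i`. The file names `sample1.txt` and `sample2.txt` are made up for illustration, and the output of `file` may differ between versions, so no expected values are hard-coded here.

```python
import subprocess
from pathlib import Path

# The two samples from this report; the file names are hypothetical.
SAMPLES = {
    "sample1.txt": "msc {\n   a, b;\n}\n",
    "sample2.txt": "msc {\n   c, d;\n}\n",
}


def write_samples(directory):
    """Write the sample files into directory and return their paths."""
    paths = []
    for name, content in SAMPLES.items():
        path = Path(directory) / name
        path.write_text(content, encoding="ascii")
        paths.append(path)
    return paths


def describe(path):
    """Return (plain description, MIME description) as reported by file(1)."""
    def run(*flags):
        result = subprocess.run(
            ["file", "--brief", *flags, str(path)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    return run(), run("-i")
```

Calling `describe(path)` on each path returned by `write_samples` should, on an affected version of `file`, show the mismatch described above between the plain description and the MIME type.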



Best regards,
Niels


