Bug#1053668: diffoscope: Consider using `file -i` as fallback for unknown file output
Chris Lamb
lamby at debian.org
Wed Oct 11 13:51:18 BST 2023
Niels Thykier wrote:
> Digging a bit deeper, it turns out that `file -i` correctly classifies
> the changelog as `text/plain; charset=utf-8`. That is, `file` knows it
> is text and I suspect `diffoscope` should try `file -i` as well when it
> gets an unknown result from `file`.
By "unknown result" I assume you mean that diffoscope cannot match
the file type with any known comparator. :) Indeed, diffoscope
doesn't recognise the bogus "Message Sequence Chart" type, so it falls
back to using a hexdump as you intuited.
I've got some WIP code that will treat unknown file types as text if
they have a MIME type of text/plain. This avoids the use of hexdump
with the examples you sent over at least.
Do you think I should be further limiting that conditional to a
whitelist of safe encodings, too? (e.g. "utf-8" and "us-ascii")
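
To make the question concrete, here is a rough sketch of what such a check
might look like. This is not the WIP code itself: the function names and
the contents of SAFE_CHARSETS are assumptions for illustration only, and
it shells out to `file --brief --mime` rather than using whatever
interface diffoscope uses internally.

```python
import subprocess

# Hypothetical allowlist; the thread only suggests "utf-8" and "us-ascii".
SAFE_CHARSETS = {"utf-8", "us-ascii"}


def parse_mime(output):
    """Split `file -i`-style output such as 'text/plain; charset=utf-8'."""
    mime_type, _, params = output.strip().partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.lower()
    return mime_type.strip().lower(), charset


def should_treat_as_text(output, safe_charsets=SAFE_CHARSETS):
    """True only for text/plain with a charset on the allowlist."""
    mime_type, charset = parse_mime(output)
    return mime_type == "text/plain" and charset in safe_charsets


def mime_of(path):
    """Ask file(1) for the MIME type and charset of `path`."""
    # --brief omits the filename, so only "type; charset=..." is printed.
    result = subprocess.run(
        ["file", "--brief", "--mime", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With this shape, should_treat_as_text(mime_of(path)) would only be
consulted once no comparator has matched, so the hexdump stays the last
resort for anything outside the allowlist.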
Regards,
--
,''`.
: :' : Chris Lamb
`. `'` lamby at debian.org 🍥 chris-lamb.co.uk
`-