Bug#1022210: diffoscope: highlight whitespace-only differences in text data

Paul Wise pabs at debian.org
Sat Oct 22 03:59:30 BST 2022


Package: diffoscope
Version: 224
Severity: wishlist

It would be nice if diffoscope could highlight whether two pieces of
text data differ only in whitespace or also differ in the text itself.

The proposal is that, by default, diffoscope would use wdiff (or
similar) to compare all line-based text buffers (including those
converted from other formats) in order to check for whitespace-only
differences. When there are non-whitespace differences, it would display
the wdiff (or similar) output, with a comment saying that these are the
differences in the text after ignoring whitespace. When the text
buffers differ only in whitespace, diffoscope would do a line diff of
the text buffers themselves, with a comment saying that the text of the
two buffers is the same and only whitespace changes are present.
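
As a rough illustration of the check described above, here is a minimal
Python sketch. It is not diffoscope code: the function names are
hypothetical, and splitting on whitespace plus difflib stand in for
wdiff, on the assumption that comparing whitespace-separated words is
close enough for a first pass.

```python
import difflib


def classify_text_difference(a: str, b: str) -> str:
    """Return 'identical', 'whitespace-only' or 'text' (hypothetical helper)."""
    if a == b:
        return "identical"
    # str.split() with no arguments splits on any run of whitespace, so
    # equal word lists mean the buffers differ only in whitespace.
    if a.split() == b.split():
        return "whitespace-only"
    return "text"


def report(a: str, b: str) -> list:
    """Return a comment line followed by a diff, as the proposal describes."""
    kind = classify_text_difference(a, b)
    if kind == "identical":
        return []
    if kind == "whitespace-only":
        # Only whitespace changed: show a normal line diff of the buffers.
        comment = "Text is the same; only whitespace changes are present."
        left, right = a.splitlines(), b.splitlines()
    else:
        # Text changed: show a word-level diff that ignores whitespace.
        comment = "Differences in the text after ignoring whitespace."
        left, right = a.split(), b.split()
    return [comment, *difflib.unified_diff(left, right, lineterm="")]


if __name__ == "__main__":
    print("\n".join(report("hello  world\n", "hello world\n")))
    print("\n".join(report("hello world\n", "hello there\n")))
```

Note that in this sketch a change that joins or splits words (such as
"foo bar" versus "foobar") counts as a text difference, since the word
lists differ.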

Since this check probably won't be useful for all diffs of line-based
text buffers, there will likely need to be various heuristics for
when to apply it, perhaps starting with applying it to everything
and then adding exceptions over time.
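
One way such an "apply everywhere, then add exceptions" heuristic could
look is sketched below in Python. The exception lists are purely
illustrative assumptions (formats where whitespace is commonly
significant), and the names are hypothetical; none of this is taken
from diffoscope itself.

```python
from pathlib import PurePath

# Hypothetical exception lists; these would grow over time as cases
# turn up where the whitespace check is unhelpful.
WHITESPACE_SIGNIFICANT_SUFFIXES = {".py", ".yaml", ".yml", ".mk"}
WHITESPACE_SIGNIFICANT_NAMES = {"Makefile"}


def should_check_whitespace(name: str) -> bool:
    """Apply the whitespace check by default, except for listed exceptions."""
    path = PurePath(name)
    if path.name in WHITESPACE_SIGNIFICANT_NAMES:
        return False
    return path.suffix.lower() not in WHITESPACE_SIGNIFICANT_SUFFIXES
```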

This would be useful in situations such as comparing older versions of
a document with newer ones. In particular, it would have been useful
when preparing this mail:

https://lists.debian.org/msgid-search/197a4671e7694c24424b91b4d7288867c0c85d9b.camel@debian.org

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-debug
  APT policy: (900, 'testing-debug'), (900, 'testing'), (800, 'unstable-debug'), (800, 'unstable'), (790, 'buildd-unstable'), (700, 'experimental-debug'), (700, 'experimental'), (690, 'buildd-experimental')
merged-usr: no
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-1-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages diffoscope depends on:
ii  diffoscope-minimal  224

Versions of packages diffoscope recommends:
ii  abootimg                         0.6-1+b2
ii  acl                              2.3.1-1
ii  androguard                       3.4.0~a1-5
ii  apksigner                        31.0.2-1
ii  apktool                          2.6.1+dfsg.1-2
ii  binutils-multiarch               2.39-8
ii  bzip2                            1.0.8-5+b1
ii  caca-utils                       0.99.beta20-3
ii  colord                           1.4.6-1
ii  coreboot-utils                   4.15~dfsg-2
ii  db-util                          5.3.1+nmu1
ii  default-jdk [java-sdk]           2:1.11-72
ii  default-jdk-headless             2:1.11-72
pn  device-tree-compiler             <none>
pn  docx2txt                         <none>
ii  e2fsprogs                        1.46.6~rc1-1+b1
ii  enjarify                         1:1.0.3-5
ii  ffmpeg                           7:5.1.2-1
ii  fontforge-extras                 1:20220308~dfsg-1
pn  fp-utils                         <none>
ii  genisoimage                      9:1.1.11-3.4
ii  gettext                          0.21-9
ii  ghc                              9.0.2-4
ii  ghostscript                      9.56.1~dfsg-1
ii  giflib-tools                     5.2.1-2.5
ii  gnumeric                         1.12.52-1
ii  gnupg                            2.2.39-1
ii  gnupg-utils                      2.2.39-1+b1
pn  hdf5-tools                       <none>
ii  imagemagick                      8:6.9.11.60+dfsg-1.3+b3
ii  imagemagick-6.q16 [imagemagick]  8:6.9.11.60+dfsg-1.3+b3
ii  jsbeautifier                     1.14.4-1
ii  libarchive-tools                 3.6.0-1
pn  libxmlb-dev                      <none>
ii  llvm                             1:14.0-55.2+b1
ii  lz4 [liblz4-tool]                1.9.4-1
pn  mono-utils                       <none>
ii  ocaml-nox                        4.13.1-3
pn  odt2txt                          <none>
pn  oggvideotools                    <none>
ii  openjdk-11-jdk [java-sdk]        11.0.17+8-2
ii  openssh-client                   1:9.0p1-1+b2
ii  openssl                          3.0.5-4
ii  pgpdump                          0.34-1
ii  poppler-utils                    22.08.0-2.1
pn  procyon-decompiler               <none>
ii  python3-argcomplete              2.0.0-1
ii  python3-binwalk                  2.3.3+dfsg1-2
ii  python3-debian                   0.1.48
ii  python3-defusedxml               0.7.1-2
ii  python3-guestfs                  1:1.48.4-2+b1
hi  python3-jsondiff                 1.1.1-4
ii  python3-pdfminer                 20220319+dfsg-1
ii  python3-progressbar              2.5-3
ii  python3-pypdf2                   2.11.0-1
ii  python3-pyxattr                  0.7.2-2+b1
ii  python3-rpm                      4.17.1.1+dfsg-1
ii  python3-tlsh                     3.4.4+20151206-1.4+b2
pn  r-base-core                      <none>
pn  radare2                          <none>
ii  rpm2cpio                         4.17.1.1+dfsg-1
ii  sng                              1.1.0-4
ii  sqlite3                          3.39.4-1
ii  squashfs-tools                   1:4.5.1-1
ii  tcpdump                          4.99.1-4+b1
ii  u-boot-tools                     2022.10+dfsg-1
ii  unzip                            6.0-27
pn  wabt                             <none>
pn  xmlbeans                         <none>
ii  xxd                              2:9.0.0626-1
ii  xz-utils                         5.2.5-2.1
ii  zip                              3.0-12
ii  zstd                             1.5.2+dfsg-1

Versions of packages diffoscope suggests:
ii  libjs-jquery  3.6.1+dfsg+~3.5.14-1

-- no debconf information

-- 
bye,
pabs

https://wiki.debian.org/PaulWise