Bug#999438: diffoscope: issues with XML files not named *.xml

Paul Wise pabs at debian.org
Thu Nov 11 01:05:41 GMT 2021


Package: diffoscope
Version: 190
Severity: normal

There are two issues with XML files not named *.xml:

1. They don't get reformatted before comparison, resulting in a diff of
   the plain text instead of a diff of the reformatted XML.

2. When comparing them with XML files named *.xml, diffoscope compares
   the raw bytes, resulting in a diff of two hex dumps instead of a diff
   of the reformatted XML or of the plain text. The reformatted XML
   would be the best thing to diff, with plain text as the fallback.

The xmllint tool can reformat them just fine and the file tool can
detect them as XML and detect their MIME type, so this issue is likely
to be a problem in the diffoscope code.

   $ head -vn-0 test-{old,new}.xml 
   ==> test-old.xml <==
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
   <foo>
   <bar>
   </bar>
   </foo>
   </test>
   
   ==> test-new.xml <==
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
   <foo>
   <bar>
   <baz>
   </baz>
   </bar>
   </foo>
   </test>
   
   $ diffoscope test-{old,new}.xml 
   --- test-old.xml
   +++ test-new.xml
   │   --- test-old.xml
   ├── +++ test-new.xml
   │ @@ -1,6 +1,8 @@
   │  <?xml version="1.0" encoding="utf-8"?>
   │  <test>
   │    <foo>
   │ -    <bar/>
   │ +    <bar>
   │ +      <baz/>
   │ +    </bar>
   │    </foo>
   │  </test>
   
   $ cp test-new.xml test-new.not-xml
   
   $ cp test-old.xml test-old.not-xml
   
   $ diffoscope test-{old,new}.not-xml 
   --- test-old.not-xml
   +++ test-new.not-xml
   @@ -1,7 +1,9 @@
    <?xml version="1.0" encoding="UTF-8"?>
    <test>
    <foo>
    <bar>
   +<baz>
   +</baz>
    </bar>
    </foo>
    </test>
   
   $ diffoscope test-old.xml test-new.not-xml 
   --- test-old.xml
   +++ test-new.not-xml
   @@ -1,5 +1,6 @@
    00000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
    00000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554  .0" encoding="UT
    00000020: 462d 3822 3f3e 0a3c 7465 7374 3e0a 3c66  F-8"?>.<test>.<f
   -00000030: 6f6f 3e0a 3c62 6172 3e0a 3c2f 6261 723e  oo>.<bar>.</bar>
   -00000040: 0a3c 2f66 6f6f 3e0a 3c2f 7465 7374 3e0a  .</foo>.</test>.
   +00000030: 6f6f 3e0a 3c62 6172 3e0a 3c62 617a 3e0a  oo>.<bar>.<baz>.
   +00000040: 3c2f 6261 7a3e 0a3c 2f62 6172 3e0a 3c2f  </baz>.</bar>.</
   +00000050: 666f 6f3e 0a3c 2f74 6573 743e 0a         foo>.</test>.
   
   $ diffoscope test-old.not-xml test-new.xml 
   --- test-old.not-xml
   +++ test-new.xml
   @@ -1,5 +1,6 @@
    00000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
    00000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554  .0" encoding="UT
    00000020: 462d 3822 3f3e 0a3c 7465 7374 3e0a 3c66  F-8"?>.<test>.<f
   -00000030: 6f6f 3e0a 3c62 6172 3e0a 3c2f 6261 723e  oo>.<bar>.</bar>
   -00000040: 0a3c 2f66 6f6f 3e0a 3c2f 7465 7374 3e0a  .</foo>.</test>.
   +00000030: 6f6f 3e0a 3c62 6172 3e0a 3c62 617a 3e0a  oo>.<bar>.<baz>.
   +00000040: 3c2f 6261 7a3e 0a3c 2f62 6172 3e0a 3c2f  </baz>.</bar>.</
   +00000050: 666f 6f3e 0a3c 2f74 6573 743e 0a         foo>.</test>.
   
   $ xmllint --format test-old.xml
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
     <foo>
       <bar>
   </bar>
     </foo>
   </test>
   
   $ xmllint --format test-new.xml
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
     <foo>
       <bar>
         <baz>
   </baz>
       </bar>
     </foo>
   </test>
   
   $ xmllint --format test-old.not-xml
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
     <foo>
       <bar>
   </bar>
     </foo>
   </test>
   
   $ xmllint --format test-new.not-xml
   <?xml version="1.0" encoding="UTF-8"?>
   <test>
     <foo>
       <bar>
         <baz>
   </baz>
       </bar>
     </foo>
   </test>
    
   $ file test-*
   test-new.not-xml: XML 1.0 document, ASCII text
   test-new.xml:     XML 1.0 document, ASCII text
   test-old.not-xml: XML 1.0 document, ASCII text
   test-old.xml:     XML 1.0 document, ASCII text
   
   $ file --mime test-*
   test-new.not-xml: text/xml; charset=us-ascii
   test-new.xml:     text/xml; charset=us-ascii
   test-old.not-xml: text/xml; charset=us-ascii
   test-old.xml:     text/xml; charset=us-ascii
   
-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-debug
  APT policy: (900, 'testing-debug'), (900, 'testing'), (860, 'testing-proposed-updates-debug'), (860, 'testing-proposed-updates'), (800, 'unstable-debug'), (800, 'unstable'), (790, 'buildd-unstable'), (700, 'experimental-debug'), (700, 'experimental'), (690, 'buildd-experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.14.0-4-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages diffoscope depends on:
ii  diffoscope-minimal  190

Versions of packages diffoscope recommends:
ii  abootimg                         0.6-1+b2
ii  acl                              2.3.1-1
ii  androguard                       3.4.0~a1-1
ii  apksigner                        30.0.3-4
ii  apktool                          2.5.0+dfsg.1-2
ii  binutils-multiarch               2.37-7
ii  bzip2                            1.0.8-4
ii  caca-utils                       0.99.beta19-2.2
ii  colord                           1.4.5-3
ii  db-util                          5.3.1+nmu1
ii  default-jdk [java-sdk]           2:1.11-72
ii  default-jdk-headless             2:1.11-72
pn  device-tree-compiler             <none>
pn  docx2txt                         <none>
ii  e2fsprogs                        1.46.4-1
ii  enjarify                         1:1.0.3-5
ii  ffmpeg                           7:4.4.1-1+b1
ii  fontforge-extras                 1:20201107~dfsg-4
pn  fp-utils                         <none>
ii  genisoimage                      9:1.1.11-3.2
ii  gettext                          0.21-4
ii  ghc                              8.8.4-3
ii  ghostscript                      9.54.0~dfsg-5
ii  giflib-tools                     5.1.9-2
ii  gnumeric                         1.12.50-1
ii  gnupg                            2.2.27-2
ii  gnupg-utils                      2.2.27-2
pn  hdf5-tools                       <none>
ii  imagemagick                      8:6.9.11.60+dfsg-1.3
ii  imagemagick-6.q16 [imagemagick]  8:6.9.11.60+dfsg-1.3
ii  jsbeautifier                     1.14.0-1
ii  libarchive-tools                 3.4.3-2+b1
ii  llvm                             1:11.0-51+nmu5
ii  lz4 [liblz4-tool]                1.9.3-2
pn  mono-utils                       <none>
ii  ocaml-nox                        4.11.1-4
pn  odt2txt                          <none>
pn  oggvideotools                    <none>
ii  openjdk-11-jdk [java-sdk]        11.0.13+8-1
ii  openssh-client                   1:8.7p1-1
ii  openssl                          1.1.1l-1
ii  pgpdump                          0.33-2
ii  poppler-utils                    20.09.0-3.1
pn  procyon-decompiler               <none>
ii  python3-argcomplete              1.12.3-0.1
ii  python3-binwalk                  2.3.2+dfsg1-1
ii  python3-debian                   0.1.42
ii  python3-defusedxml               0.7.1-1
ii  python3-guestfs                  1:1.44.2-1+b1
ii  python3-jsondiff                 1.1.1-4
ii  python3-pdfminer                 20201018+dfsg-1
ii  python3-progressbar              2.5-2
ii  python3-pypdf2                   1.26.0-4
ii  python3-pyxattr                  0.7.2-1+b1
ii  python3-rpm                      4.16.1.2+dfsg1-3
ii  python3-tlsh                     3.4.4+20151206-1.4
pn  r-base-core                      <none>
pn  radare2                          <none>
ii  rpm2cpio                         4.16.1.2+dfsg1-3
ii  sng                              1.1.0-4
ii  sqlite3                          3.36.0-2
ii  squashfs-tools                   1:4.5-3
ii  tcpdump                          4.99.1-3
ii  u-boot-tools                     2021.10+dfsg-1
ii  unzip                            6.0-26
ii  vim-common                       2:8.2.3565-1
pn  wabt                             <none>
pn  xmlbeans                         <none>
ii  xxd                              2:8.2.3565-1+b1
ii  xz-utils                         5.2.5-2
ii  zip                              3.0-12
ii  zstd                             1.4.8+dfsg-3

Versions of packages diffoscope suggests:
ii  libjs-jquery  3.5.1+dfsg+~3.5.5-8

-- no debconf information

-- 
bye,
pabs

https://wiki.debian.org/PaulWise