Bug#999438: diffoscope: issues with XML files not named *.xml
Paul Wise
pabs at debian.org
Thu Nov 11 01:05:41 GMT 2021
Package: diffoscope
Version: 190
Severity: normal
There are two issues with XML files not named *.xml:

1. They don't get reformatted before comparison, resulting in a diff of
   the plain text instead of a diff of the reformatted XML.

2. When comparing them with XML files named *.xml, a comparison of the
   bytes is done, resulting in a diff of two hex dumps instead of a diff
   of the reformatted XML or a diff of the plain text. The reformatted
   XML would be the best thing to diff, but plain text should be a
   fallback.
The xmllint tool can reformat them just fine and the file tool can
detect them as XML and detect their MIME type, so this issue is likely
to be a problem in the diffoscope code.
$ head -vn-0 test-{old,new}.xml
==> test-old.xml <==
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
</bar>
</foo>
</test>
==> test-new.xml <==
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
<baz>
</baz>
</bar>
</foo>
</test>
$ diffoscope test-{old,new}.xml
--- test-old.xml
+++ test-new.xml
│ --- test-old.xml
├── +++ test-new.xml
│ @@ -1,6 +1,8 @@
│ <?xml version="1.0" encoding="utf-8"?>
│ <test>
│ <foo>
│ - <bar/>
│ + <bar>
│ + <baz/>
│ + </bar>
│ </foo>
│ </test>
$ cp test-new.xml test-new.not-xml
$ cp test-old.xml test-old.not-xml
$ diffoscope test-{old,new}.not-xml
--- test-old.not-xml
+++ test-new.not-xml
@@ -1,7 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
+<baz>
+</baz>
</bar>
</foo>
</test>
$ diffoscope test-old.xml test-new.not-xml
--- test-old.xml
+++ test-new.not-xml
@@ -1,5 +1,6 @@
00000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231 <?xml version="1
00000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554 .0" encoding="UT
00000020: 462d 3822 3f3e 0a3c 7465 7374 3e0a 3c66 F-8"?>.<test>.<f
-00000030: 6f6f 3e0a 3c62 6172 3e0a 3c2f 6261 723e oo>.<bar>.</bar>
-00000040: 0a3c 2f66 6f6f 3e0a 3c2f 7465 7374 3e0a .</foo>.</test>.
+00000030: 6f6f 3e0a 3c62 6172 3e0a 3c62 617a 3e0a oo>.<bar>.<baz>.
+00000040: 3c2f 6261 7a3e 0a3c 2f62 6172 3e0a 3c2f </baz>.</bar>.</
+00000050: 666f 6f3e 0a3c 2f74 6573 743e 0a foo>.</test>.
$ diffoscope test-old.not-xml test-new.xml
--- test-old.not-xml
+++ test-new.xml
@@ -1,5 +1,6 @@
00000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231 <?xml version="1
00000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554 .0" encoding="UT
00000020: 462d 3822 3f3e 0a3c 7465 7374 3e0a 3c66 F-8"?>.<test>.<f
-00000030: 6f6f 3e0a 3c62 6172 3e0a 3c2f 6261 723e oo>.<bar>.</bar>
-00000040: 0a3c 2f66 6f6f 3e0a 3c2f 7465 7374 3e0a .</foo>.</test>.
+00000030: 6f6f 3e0a 3c62 6172 3e0a 3c62 617a 3e0a oo>.<bar>.<baz>.
+00000040: 3c2f 6261 7a3e 0a3c 2f62 6172 3e0a 3c2f </baz>.</bar>.</
+00000050: 666f 6f3e 0a3c 2f74 6573 743e 0a foo>.</test>.
$ xmllint --format test-old.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
</bar>
</foo>
</test>
$ xmllint --format test-new.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
<baz>
</baz>
</bar>
</foo>
</test>
$ xmllint --format test-old.not-xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
</bar>
</foo>
</test>
$ xmllint --format test-new.not-xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<foo>
<bar>
<baz>
</baz>
</bar>
</foo>
</test>
$ file test-*
test-new.not-xml: XML 1.0 document, ASCII text
test-new.xml: XML 1.0 document, ASCII text
test-old.not-xml: XML 1.0 document, ASCII text
test-old.xml: XML 1.0 document, ASCII text
$ file --mime test-*
test-new.not-xml: text/xml; charset=us-ascii
test-new.xml: text/xml; charset=us-ascii
test-old.not-xml: text/xml; charset=us-ascii
test-old.xml: text/xml; charset=us-ascii
-- System Information:
Debian Release: bookworm/sid
APT prefers testing-debug
APT policy: (900, 'testing-debug'), (900, 'testing'), (860, 'testing-proposed-updates-debug'), (860, 'testing-proposed-updates'), (800, 'unstable-debug'), (800, 'unstable'), (790, 'buildd-unstable'), (700, 'experimental-debug'), (700, 'experimental'), (690, 'buildd-experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 5.14.0-4-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages diffoscope depends on:
ii diffoscope-minimal 190
Versions of packages diffoscope recommends:
ii abootimg 0.6-1+b2
ii acl 2.3.1-1
ii androguard 3.4.0~a1-1
ii apksigner 30.0.3-4
ii apktool 2.5.0+dfsg.1-2
ii binutils-multiarch 2.37-7
ii bzip2 1.0.8-4
ii caca-utils 0.99.beta19-2.2
ii colord 1.4.5-3
ii db-util 5.3.1+nmu1
ii default-jdk [java-sdk] 2:1.11-72
ii default-jdk-headless 2:1.11-72
pn device-tree-compiler <none>
pn docx2txt <none>
ii e2fsprogs 1.46.4-1
ii enjarify 1:1.0.3-5
ii ffmpeg 7:4.4.1-1+b1
ii fontforge-extras 1:20201107~dfsg-4
pn fp-utils <none>
ii genisoimage 9:1.1.11-3.2
ii gettext 0.21-4
ii ghc 8.8.4-3
ii ghostscript 9.54.0~dfsg-5
ii giflib-tools 5.1.9-2
ii gnumeric 1.12.50-1
ii gnupg 2.2.27-2
ii gnupg-utils 2.2.27-2
pn hdf5-tools <none>
ii imagemagick 8:6.9.11.60+dfsg-1.3
ii imagemagick-6.q16 [imagemagick] 8:6.9.11.60+dfsg-1.3
ii jsbeautifier 1.14.0-1
ii libarchive-tools 3.4.3-2+b1
ii llvm 1:11.0-51+nmu5
ii lz4 [liblz4-tool] 1.9.3-2
pn mono-utils <none>
ii ocaml-nox 4.11.1-4
pn odt2txt <none>
pn oggvideotools <none>
ii openjdk-11-jdk [java-sdk] 11.0.13+8-1
ii openssh-client 1:8.7p1-1
ii openssl 1.1.1l-1
ii pgpdump 0.33-2
ii poppler-utils 20.09.0-3.1
pn procyon-decompiler <none>
ii python3-argcomplete 1.12.3-0.1
ii python3-binwalk 2.3.2+dfsg1-1
ii python3-debian 0.1.42
ii python3-defusedxml 0.7.1-1
ii python3-guestfs 1:1.44.2-1+b1
ii python3-jsondiff 1.1.1-4
ii python3-pdfminer 20201018+dfsg-1
ii python3-progressbar 2.5-2
ii python3-pypdf2 1.26.0-4
ii python3-pyxattr 0.7.2-1+b1
ii python3-rpm 4.16.1.2+dfsg1-3
ii python3-tlsh 3.4.4+20151206-1.4
pn r-base-core <none>
pn radare2 <none>
ii rpm2cpio 4.16.1.2+dfsg1-3
ii sng 1.1.0-4
ii sqlite3 3.36.0-2
ii squashfs-tools 1:4.5-3
ii tcpdump 4.99.1-3
ii u-boot-tools 2021.10+dfsg-1
ii unzip 6.0-26
ii vim-common 2:8.2.3565-1
pn wabt <none>
pn xmlbeans <none>
ii xxd 2:8.2.3565-1+b1
ii xz-utils 5.2.5-2
ii zip 3.0-12
ii zstd 1.4.8+dfsg-3
Versions of packages diffoscope suggests:
ii libjs-jquery 3.5.1+dfsg+~3.5.5-8
-- no debconf information
--
bye,
pabs
https://wiki.debian.org/PaulWise