Bug#909122: diffoscope: MemoryError when comparing big ISO images
Marek Marczykowski-Górecki
marmarek at invisiblethingslab.com
Tue Sep 18 19:17:03 BST 2018
Package: diffoscope
Version: 101
Severity: normal
Dear Maintainer,
When comparing two 4.5GB ISO images, diffoscope tries to load each of
them entirely into memory, which fails with a MemoryError in the JSON
comparator:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 470, in main
    sys.exit(run_diffoscope(parsed_args))
  File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 442, in run_diffoscope
    difference = compare_root_paths(path1, path2)
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/compare.py", line 65, in compare_root_paths
    file1 = specialize(FilesystemFile(path1, container=container1))
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 49, in specialize
    if try_recognize(file, cls, cls.recognizes):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 36, in try_recognize
    if not recognizes(file):
  File "/usr/lib/python3/dist-packages/diffoscope/comparators/json.py", line 52, in recognizes
    f.read().decode('utf-8', errors='ignore'),
MemoryError
Obviously, an ISO file is not JSON.
The whole thing could be avoided if the earlier check (whether the first
10 bytes contain '[' or '{') were run not only on "text" files.
Is there any reason for the "is_text" condition there? Alternatively, if
is_text is False, maybe the function should return False early?
I can provide a patch for either option, but I'd like to know which one
of them you prefer.
The JSONFile.recognizes function, for context:
    @classmethod
    def recognizes(cls, file):
        with open(file.path, 'rb') as f:
            # Try fuzzy matching for JSON files
            is_text = any(
                file.magic_file_type.startswith(x)
                for x in ('ASCII text', 'UTF-8 Unicode text'),
            )
            if is_text and not file.name.endswith('.json'):
                buf = f.read(10)
                if not any(x in buf for x in b'{['):
                    return False
                f.seek(0)
            try:
                file.parsed = json.loads(
                    f.read().decode('utf-8', errors='ignore'),
                    object_pairs_hook=collections.OrderedDict,
                )
            except ValueError:
                return False
        return True
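To make the two options concrete, here is a rough standalone sketch. It is
not a patch against diffoscope: the function names (looks_like_json_start,
parses_as_json, recognizes_option_a, recognizes_option_b) are hypothetical,
and the file's path, name and magic type are passed as plain arguments
instead of a diffoscope file object. Option B follows the most literal
reading of "return False early", so it also rejects a file named *.json
whose magic type is not one of the two text types.

```python
import collections
import json

# Magic type prefixes treated as "text", copied from the function above.
TEXT_MAGIC_PREFIXES = ('ASCII text', 'UTF-8 Unicode text')


def looks_like_json_start(path, probe_size=10):
    """Cheap check: do the first probe_size bytes contain '{' or '['?"""
    with open(path, 'rb') as f:
        buf = f.read(probe_size)
    # Iterating over a bytes literal yields ints; "int in bytes" is valid.
    return any(x in buf for x in b'{[')


def parses_as_json(path):
    """Expensive check: read the whole file and try to parse it."""
    with open(path, 'rb') as f:
        try:
            json.loads(
                f.read().decode('utf-8', errors='ignore'),
                object_pairs_hook=collections.OrderedDict,
            )
        except ValueError:
            return False
    return True


def recognizes_option_a(path, name):
    """Option A: run the prefix check on every file not named *.json."""
    if not name.endswith('.json') and not looks_like_json_start(path):
        return False
    return parses_as_json(path)


def recognizes_option_b(path, name, magic_file_type):
    """Option B: return False early when the file is not text."""
    is_text = any(magic_file_type.startswith(x) for x in TEXT_MAGIC_PREFIXES)
    if not is_text:
        return False
    if not name.endswith('.json') and not looks_like_json_start(path):
        return False
    return parses_as_json(path)
```

With either variant, a file that starts with NUL bytes and is not reported
as text is rejected before the full read. Option A would still read a
binary file completely if '{' or '[' happens to appear in its first 10
bytes; option B depends entirely on the reported magic type.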
-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 4.14.67-1.pvops.qubes.x86_64 (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968), LANGUAGE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
Versions of packages diffoscope depends on:
ii libpython3.6-stdlib 3.6.6-1
ii python3 3.6.5-3
ii python3-distro 1.3.0-1
ii python3-distutils 3.6.6-1
ii python3-libarchive-c 2.1-3.1
ii python3-magic 2:0.4.15-2
ii python3-pkg-resources 40.2.0-1
Versions of packages diffoscope recommends:
ii abootimg 0.6-1+b2
ii acl 2.2.52-3+b1
pn apktool <none>
ii binutils-multiarch 2.31.1-5
ii bzip2 1.0.6-9
ii caca-utils 0.99.beta19-2+b3
ii colord 1.3.3-2
ii db-util 5.3.1
ii default-jdk-headless 2:1.10-68
ii device-tree-compiler 1.4.7-3
ii docx2txt 1.4-1
ii e2fsprogs 1.44.4-2
ii enjarify 1:1.0.3-4
ii fontforge-extras 0.3-4
ii fp-utils 3.0.4+dfsg-20
ii fp-utils-3.0.4 [fp-utils] 3.0.4+dfsg-20
ii genisoimage 9:1.1.11-3+b2
ii gettext 0.19.8.1-7
ii ghc 8.2.2-4
ii ghostscript 9.25~dfsg-2
ii giflib-tools 5.1.4-3
ii gnumeric 1.12.41-1
ii gnupg 2.2.10-1
ii imagemagick 8:6.9.10.8+dfsg-1
ii imagemagick-6.q16 [imagemagick] 8:6.9.10.8+dfsg-1
ii jsbeautifier 1.6.4-7
ii libarchive-tools 3.2.2-5
ii llvm 1:6.0-43
ii lz4 1.8.2-1
ii mono-utils 4.6.2.7+dfsg-1
ii odt2txt 0.5-1+b2
pn oggvideotools <none>
ii openssh-client 1:7.8p1-1
ii pgpdump 0.33-1
ii poppler-utils 0.63.0-2
ii procyon-decompiler 0.5.32-4
ii python3-argcomplete 1.8.1-1
ii python3-binwalk 2.1.2~git20180830+dfsg1-1
ii python3-debian 0.1.33
ii python3-defusedxml 0.5.0-1
ii python3-guestfs 1:1.38.4-1
ii python3-jsondiff 1.1.1-2
ii python3-progressbar 2.3-4
ii python3-pyxattr 0.6.0-2+b2
ii python3-tlsh 3.4.4+20151206-1+b4
ii r-base-core 3.5.1-1+b1
ii rpm2cpio 4.14.1+dfsg1-4
ii sng 1.1.0-1+b1
ii sqlite3 3.24.0-1
ii squashfs-tools 1:4.3-6
ii tcpdump 4.9.2-3
ii unzip 6.0-21
ii vim-common 2:8.1.0320-1
ii xmlbeans 2.6.0+dfsg-4
ii xxd 2:8.1.0320-1
ii xz-utils 5.2.2-1.3
Versions of packages diffoscope suggests:
ii libjs-jquery 3.2.1-1
-- no debconf information
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?