Bug#991059: diffoscope: out-of-memory

Roland Clobus rclobus at rclobus.nl
Sat Aug 14 15:01:34 BST 2021


Hello Chris,

On 13/08/2021 18:52, Chris Lamb wrote:
>> I think I've found the cause for the different code paths.
>> The squashfs image contains devices, which can only be extracted as root.
> 
> Ah, bravo — well discovered! Alas, I'm afraid I should have been able
> to help you come to this earlier, for I clearly encountered precisely
> this issue before and had completely forgotten about it:
> 
>   https://salsa.debian.org/reproducible-builds/diffoscope/commit/95dbe95a471e127798614727deea637186c1364f
>   https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/63
> 
> (I will be the first to admit that I did not really resolve the
> underlying problem, merely prevented it from coming up in the
> testsuite.)

This looks exactly like the issue at hand.
As you wrote, the commit avoids the issue instead of resolving it.
The wishlist ticket to handle this issue was closed:
https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/65

The hard part is that unsquashfs has only two possible return values: 0
and 1. It makes no distinction between the possible causes of an error.

> One question though — why would the character devices existing or not
> be relevant to it OOMing? Or rather, why aren't they simply compared
> in the normal way? Sure, if character devices exist they will take
> extra time and resources to be compared, but surely your ISO does not
> contain so many character devices that it adds a significant burden to
> the comparison process?

There are only 8 devices in the image.

> Let's take stock. What do we want this diffoscope invocation on
> Jenkins to actually do? In the first instance, we obviously don't want
> it to OOM. But do we want it to extract these character devices or
> not? And if we want or can skip over them, what should we do in that
> situation? And is that going to be helpful at all in this OOM
> situation anyway?

The diffoscope invocation on Jenkins should primarily show whether the
content of the squashfs image is identical between two build runs, and
should list the differences of the files within the image. Attached is
the output that I can get running diffoscope as root.

The chance of the character devices being different is nearly zero, so
it would be ok to have these as a blind spot.
Since the character devices are embedded in the squashfs image, they are
not active (not connected to a device). If they were created, a basic
comparison would suffice.

The difference that diffoscope finds (when running as root) lies in
regular files within the squashfs image. The two squashfs images have
different lengths and (due to the compression) are totally different at
the byte level.

However, because unsquashfs returns a non-zero value, diffoscope assumes
that the extraction failed and falls back to a binary comparison (using
xxd). The output of xxd is piped directly into memory, so the 2.6GiB
squashfs image becomes a 9.5GiB xxd dump. Running 'diff -u' on these xxd
dumps results in an OOM on my computer (with 32GB).
In any case, a binary comparison of squashfs files of this kind is not
very meaningful.
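
To illustrate why a hex dump is so much larger than the image itself,
here is a rough back-of-the-envelope sketch in Python. It assumes the
default xxd layout, where every 16 input bytes become one 68-character
line (offset, hex columns, ASCII column and newline). The function name
is made up for illustration, and the exact figure in practice depends on
the units used and on any filtering applied to the xxd output.

```python
def xxd_output_size(input_bytes, bytes_per_line=16, chars_per_line=68):
    """Estimate the size in bytes of a default-layout xxd dump."""
    # Ceiling division: a trailing partial line is counted as a full
    # line, which overestimates slightly.
    lines = -(-input_bytes // bytes_per_line)
    return lines * chars_per_line


GIB = 1024 ** 3
image_size = int(2.6 * GIB)  # size of the squashfs image mentioned above
ratio = xxd_output_size(image_size) / image_size
print(f"expansion factor: {ratio:.2f}")
```

Under these assumptions the dump is roughly four times the size of the
input, and 'diff -u' has to handle two such dumps.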

On jenkins.debian.net, the amount of memory is limited with 'ulimit -v'
to 10GB, so that limit is reached rather quickly.


Having written all this, I noticed that by focussing on the crash
itself, I lost sight of the overall goal: having the differences within
the squashfs image listed.

If possible, I would like to see something like:
* If the return value of unsquashfs is non-zero, check whether stderr
only contains lines like
'create_inode: could not create character device ./dev/console, because you're not superuser!'
* If that is the case, resume normal operation, treating the return
code as zero
* If not, then something else happened, which is out of scope for this
ticket and is handled by the current code
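
A minimal sketch of the three steps above, in Python since diffoscope is
written in Python. This is not diffoscope's actual code: the function
name and the regular expression are hypothetical, and the pattern is
modelled only on the message quoted above (I assume block devices
produce the same message with 'block' in place of 'character', and the
wording may differ between unsquashfs versions).

```python
import re

# Hypothetical pattern, modelled on the stderr line quoted above.
# Assumption: block devices yield the same message with "block".
_DEVICE_ERROR_RE = re.compile(
    r"^create_inode: could not create (character|block) device .+, "
    r"because you're not superuser!$"
)


def extraction_usable(returncode, stderr):
    """Return True if the unsquashfs result can be used for comparison.

    returncode: exit status of unsquashfs
    stderr: everything unsquashfs wrote to stderr, as a str
    """
    if returncode == 0:
        return True
    lines = [line.strip() for line in stderr.splitlines() if line.strip()]
    # A non-zero exit without any recognisable message is a real failure.
    return bool(lines) and all(_DEVICE_ERROR_RE.match(line) for line in lines)
```

A caller would run unsquashfs with stderr captured, pass the exit status
and the captured text to this function, and only fall back to the binary
comparison when it returns False.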

> (A side question: can you confirm whether diffoscope is running as
> root or not in your particular Jenkins test? I don't want to
> misinterpret the logs.)

I'm 99.9% sure that I'm not running as root, because I would have needed
a sudo invocation (and there would not have been an OOM). Holger, can
you confirm this?

With kind regards,
Roland Clobus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/reproducible-builds/attachments/20210814/5bc16aac/attachment.html>