Bug#991059: diffoscope: out-of-memory

Chris Lamb chris at reproducible-builds.org
Fri Aug 13 17:52:03 BST 2021


Hi Roland,

> > So it appears to me that different code is activated for regular users
> > and root.

In addition to the difference in how filesystem devices are handled
(discussed below), the other highly relevant difference is that
processes running as root are given a lower priority by the OOM killer
when it selects a process to terminate.

This is unlikely to be the underlying issue, of course, but it will
introduce uncertainty into any experiment or test case.
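To reduce that uncertainty, it may help to record the effective UID and
the kernel's OOM bookkeeping alongside each run. Below is a minimal,
Linux-only sketch. It assumes the standard /proc/self/oom_score and
/proc/self/oom_score_adj files are available, and the function name is
hypothetical (not part of diffoscope):

```python
import os
from pathlib import Path


def oom_context() -> dict:
    """Collect details that affect how the OOM killer treats this process.

    Linux-only: reads /proc/self/oom_score and /proc/self/oom_score_adj.
    Files that are missing or unreadable (e.g. on non-Linux systems) are
    reported as None rather than raising.
    """
    def read_int(path: str):
        try:
            return int(Path(path).read_text().strip())
        except (OSError, ValueError):
            return None

    euid = os.geteuid()
    return {
        "euid": euid,
        "is_root": euid == 0,
        "oom_score": read_int("/proc/self/oom_score"),
        "oom_score_adj": read_int("/proc/self/oom_score_adj"),
    }


if __name__ == "__main__":
    # Print once at the start of each experiment so runs can be compared.
    print(oom_context())
```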

> I think I've found the cause for the different code paths.
> The squashfs image contains devices, which can only be extracted as root.

Ah, bravo, well discovered! Alas, I'm afraid I should have been able to
help you reach this conclusion sooner, as I clearly encountered
precisely this issue before and had completely forgotten about it:

  https://salsa.debian.org/reproducible-builds/diffoscope/commit/95dbe95a471e127798614727deea637186c1364f
  https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/63

(I will be the first to admit that I did not really resolve the
underlying problem, merely prevented it from coming up in the
testsuite.)
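The root-only behaviour is presumably down to device nodes needing
privilege (on Linux, CAP_MKNOD) to be created, so an unprivileged
extraction cannot reproduce those entries. A minimal sketch that probes
this directly follows. The helper name is hypothetical, and the
major/minor numbers (1, 3, as used by /dev/null on Linux) were chosen
only for illustration:

```python
import os
import stat
import tempfile


def can_create_char_device() -> bool:
    """Return True if this process is allowed to create a character device.

    Attempts os.mknod() for a character device (major 1, minor 3) inside a
    throwaway directory, which is removed afterwards. Unprivileged users,
    and often root inside containers without CAP_MKNOD, get an error here.
    """
    with tempfile.TemporaryDirectory() as tmp:
        node = os.path.join(tmp, "probe")
        try:
            os.mknod(node, 0o600 | stat.S_IFCHR, os.makedev(1, 3))
        except OSError:
            # Typically PermissionError (EPERM) when not privileged.
            return False
        return stat.S_ISCHR(os.lstat(node).st_mode)


if __name__ == "__main__":
    print("can create device nodes:", can_create_char_device())
```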

One question, though: why would the presence or absence of these
character devices be relevant to diffoscope running out of memory? Or
rather, why aren't they simply compared in the normal way? Sure, if
character devices exist they will take extra time and resources to
compare, but surely your ISO does not contain so many of them that they
add a significant burden to the comparison process?
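One way to answer that empirically is to walk the extracted tree and
count the device nodes. Here is a minimal sketch. The function name is
hypothetical, and the root argument is wherever unsquashfs extracted
the image (by default a "squashfs-root" directory). Symlinks are
inspected, not followed:

```python
import os
import stat
from collections import Counter


def count_device_nodes(root: str) -> Counter:
    """Count character and block device nodes under an extracted tree.

    Uses os.lstat() so symlinks are examined rather than followed; device
    nodes are not directories, so os.walk() reports them in `filenames`.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mode = os.lstat(os.path.join(dirpath, name)).st_mode
            if stat.S_ISCHR(mode):
                counts["char"] += 1
            elif stat.S_ISBLK(mode):
                counts["block"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path: the default unsquashfs output directory.
    print(count_device_nodes("squashfs-root"))
```

If the counts come back in the single or low double digits, the devices
themselves are unlikely to explain the memory use.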

> With squashfuse (mounting the image in userspace) to replace the
> unpacking to disc with unsquashfs, this issue might be avoided.

Oh, that's an interesting idea. However, let's keep it in our back
pocket for now: filesystem mounting (particularly of the FUSE variety)
would not be a trivial addition to diffoscope, so we should be sure the
effort and complexity are justified first.
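Here is a rough idea of what each approach involves as external
commands. This is a minimal sketch, assuming the unsquashfs,
squashfuse and fusermount binaries are installed. The function names
are hypothetical and are not diffoscope APIs. The mount variant needs a
guaranteed unmount even when the comparison fails, which is part of the
added complexity:

```python
import contextlib
import subprocess
import tempfile
from typing import Iterator, List, Tuple


def unsquashfs_cmd(image: str, dest: str) -> List[str]:
    # Current approach: extract the whole image to disk under `dest`.
    return ["unsquashfs", "-d", dest, image]


def squashfuse_cmds(image: str, mountpoint: str) -> Tuple[List[str], List[str]]:
    # Proposed approach: mount in userspace, later unmount via fusermount -u.
    return (["squashfuse", image, mountpoint], ["fusermount", "-u", mountpoint])


@contextlib.contextmanager
def mounted_squashfs(image: str) -> Iterator[str]:
    """Mount `image` with squashfuse for the duration of the with-block."""
    with tempfile.TemporaryDirectory() as mountpoint:
        mount, unmount = squashfuse_cmds(image, mountpoint)
        subprocess.run(mount, check=True)
        try:
            yield mountpoint
        finally:
            # Always unmount, even if the code inside the with-block raised,
            # so the temporary mountpoint can be removed afterwards.
            subprocess.run(unmount, check=True)
```

Usage would look like `with mounted_squashfs("filesystem.squashfs") as
root: ...`, where the image name is only an example.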

Let's take stock. What do we actually want this diffoscope invocation
on Jenkins to do? In the first instance, we obviously don't want it to
OOM. But do we want it to extract these character devices or not? And
if we want to (or can) skip over them, what should we do in that
situation? And would that even help with the OOM problem?

(A side question: can you confirm whether diffoscope is running as
root or not in your particular Jenkins test? I don't want to
misinterpret the logs.)


Best wishes,

--
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org 💠
    ⬊   ⬋
      o
