Bug#1012318: diffoscope 214 produced no output and was killed after running into timeout after 150m

Chris Lamb chris at reproducible-builds.org
Wed Jun 8 13:15:17 BST 2022


Hey Mattia,

> Oh, yes, that's probably what it ought to do indeed (logging that it
> received a signal and leave some traces of itself on stdout, at least).

Hm, by "leave some traces of itself on stdout", what exactly do you
mean? You appear to be implying it should do more than simply log
that it has received a signal and is cleaning up after itself?

> Before raising the KILL limit diffoscope should learn how to treat TERM
> (and/or probably other signals too?) in a way that makes it at least
> terminating its current task (i.e. killing subprocesses, and start to
> render the output).  But indeed, should diffoscope really do that?

Hmm. Okay, let me spell out some of the downsides of doing this, if
only as a way of explaining them back to myself (!) and seeing how
much of a problem they really are:

* It's unclear whether this fits the semantics of the TERM signal.
  As you yourself ask in your reply, *should* diffoscope actually do
  this? (If I were on the command line and hit CTRL+C, I'm not
  entirely sure I'd want it to stop performing a diff and start
  opening files to write HTML...)

* Writing "signal-safe" code is fraught with weirdness, and this is
  compounded by our use of a scripting language that partly abstracts
  away signal handling. (In other words, *could* we reliably do this?
  I'm not sure.)

* Related to the previous point, the code paths for this particular
  mode would be difficult to test, and even if we had good test
  coverage, they would not be exercised very often in real life.
  Re-creating issues arising from supporting this would likely be
  almost impossible: something breaking on Jenkins would be difficult
  to reproduce locally.

* Things like subprocesses should be reaped anyway, at least according
  to my understanding of how processes work.

* It's all a bit of a workaround for "diffoscope being slow". Or, rather,
  there are higher-priority things that would stop us reaching the
  timeout in the first place. (For instance, improving the overall
  speed of ELF handling.)
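For illustration only, here is a minimal Python sketch of the narrowest version of the idea: turn SIGTERM into an ordinary exception so that the normal `finally` cleanup kills and reaps a child process. This is not diffoscope's code. `Terminated` and `run_with_cleanup` are hypothetical names, and the sketch deliberately does not attempt to render any partial output. Note that a Python signal handler can raise at almost any bytecode boundary in the main thread, which is exactly the "signal-safe" weirdness mentioned above.

```python
import signal
import subprocess
import sys


class Terminated(Exception):
    """Hypothetical exception used to surface SIGTERM as normal control flow."""


def _on_sigterm(signum, frame):
    # Do as little as possible in the handler: just convert the signal
    # into an exception raised in the main thread.
    raise Terminated(signum)


def run_with_cleanup(cmd):
    """Run cmd; return its exit status, or None if we received SIGTERM.

    Hypothetical helper: in either case the child is killed (if still
    running) and reaped before returning.
    """
    proc = subprocess.Popen(cmd)
    # Install our handler only once the child exists, so the cleanup
    # below always has a process to act on.
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        return proc.wait()
    except Terminated:
        print("received SIGTERM, cleaning up", file=sys.stderr)
        return None
    finally:
        # Restore whatever handler was there before us.
        signal.signal(signal.SIGTERM, previous)
        if proc.poll() is None:
            proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
```

Even in this tiny case, the `except` branch runs with program state as it was whenever the signal happened to land, which is where attempting to write HTML from that point would get hairy.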

(Sorry for the sloppy braindump; again, it's not intended as a rejoinder
to what you wrote...)


Best wishes,

-- 
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org 💠
    ⬊   ⬋
      o


