Bug#852013: Patch to prevent segfaults on signal

Brett Smith debbug at brettcsmith.org
Sat Jan 28 16:53:02 UTC 2017

tags -1 +patch

I believe I have tracked down the problem.  The changes introduced in
fec9e97c51b3a8ff226a4b3b2b0563a4a680ac68 cause a file object to be
manipulated on both sides of a thread boundary, which seems dangerous. 
The attached patch restructures things to avoid that problem, and a few
other implementation issues.  The commit message has more details about
the rationale.

I've also attached a script I used to help reproduce the issue.  It
doesn't do so reliably, and it's not totally robust itself, but I found
it more reliable than trying to time timeout or a ^C right.

fec9d97 was meant to improve portability.  It's possible that my patch
has its own portability problems: it opens a FIFO in nonblocking mode,
and fifo(7) makes it sound like this behavior is undefined by POSIX.  We
might want to check with some BSD folks to see what they do in that
case.  But even if the patch doesn't work as intended outside Linux,
hopefully fixing that is a matter of checking for more error codes or
something like that, and won't require as much restructuring as fec9d97 did.

I should note that the cleanup that we do after receiving a signal is a
little incomplete in the current state, too.  What happens now is: when
we get a signal, we raise a SystemExit exception in the main thread. 
That bubbles all the way back up to diffoscope.main.main, which cleans
up all the temporary files and directories in a finally block, and then
the Python process exits.  However, subprocesses and all of our threads
(which are marked as daemon threads) are still running, and have all the
file descriptors open.  So the files still "exist" in some sense and are
taking up space, even if they're no longer addressable.  We're also
still spending plenty of CPU doing the real diff work.  Eventually the
processes will all go away because they finished their work naturally,
or because they encounter an error because their working directories
have disappeared out from under them.

It would be nice to move to a system where, when we get a signal, we
send it to all our children, then wait for them to finish before doing
our own cleanup.  But that's a bigger job and less urgent, so I didn't
go that far here.

Brett Smith

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/reproducible-builds/attachments/20170128/34ef4fe0/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-diffoscope.diff-Improve-FIFO-writing-robustness.patch
Type: text/x-diff
Size: 5759 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/reproducible-builds/attachments/20170128/34ef4fe0/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sh
Type: application/x-shellscript
Size: 213 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/reproducible-builds/attachments/20170128/34ef4fe0/attachment.bin>

More information about the Reproducible-builds mailing list