[pkg-gnupg-maint] Bug#841143: False assumptions about nPth (was: Bug#841143: Suspected race in gpg1 to gpg2 conversion or agent startup [and 1 more messages]) [and 1 more messages]
Ian Jackson
ijackson at chiark.greenend.org.uk
Mon Jan 9 12:07:04 UTC 2017
Werner Koch writes ("False assumptions about nPth (was: Bug#841143: Suspected race in gpg1 to gpg2 conversion or agent startup [and 1 more messages])"):
> Please point out a single threading bug in gpg-agent or any other part
> of GnuPG. But before you point me to your patches please learn about
> nPth (and its predecessor GNU Pth) and understand why we are not using
> Posix threads directly.
You are right that I was confused about pth. It would have been very
helpful if you had mentioned at some earlier point in this
conversation that npth is a non-preemptive threading library and that
that is why you thought there aren't threading bugs. I thought it was
a simple wrapper around pthreads with some signal handling support.
Use of a non-concurrent threading library is part of the kind of
"systematic and effective way to avoid threading bugs" which I was
hoping to find. Sorry for missing that.
I think that at least my patch
  [PATCH 4/4] gpg agent lockup fix: Interrupt main loop when active_connections_value==0
is very likely a fix to an actual race.
During debugging I several times had a gdb attached to a stuck
gpg-agent process. I found the process stuck in select, selecting
only on the inotify fd, with `shutdown_pending' having the value 1 and
`active_connections' having the value 0. Because of difficulties
collecting logging, and the fact that adding logging (once I figured
out how to do so) seemed to dramatically reduce the failure
probability, I can't be 100% sure of the history of those stuck
gpg-agents.
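
The stuck state described above (waiting in select with shutdown pending and no active connections) is what a lost wakeup would look like. The following is a minimal sketch of that pattern in Python, not gpg-agent's actual C code. The class, the method names, and the self-pipe used as the wakeup mechanism are all hypothetical; the only things taken from the report are the two variables and the idea of interrupting the main loop when the connection count reaches zero. It assumes a POSIX system where select works on pipe descriptors, and it uses a timeout purely so that the stuck case can be observed.

```python
import os
import select


class ToyMainLoop:
    """Toy stand-in for an agent main loop (hypothetical, not gpg-agent code)."""

    def __init__(self, wake_on_idle):
        self.wake_on_idle = wake_on_idle
        self.shutdown_pending = False
        self.active_connections = 0
        # Self-pipe: a byte written to wake_w makes select() on wake_r return.
        self.wake_r, self.wake_w = os.pipe()

    def connection_finished(self):
        """Run by a worker when its connection closes."""
        self.active_connections -= 1
        if self.wake_on_idle and self.active_connections == 0:
            os.write(self.wake_w, b"x")  # interrupt the main loop

    def run(self, after_check, timeout=0.2):
        """Return True if the loop exits cleanly, False if it stays stuck."""
        try:
            while True:
                if self.shutdown_pending and self.active_connections == 0:
                    return True
                after_check()  # work that happens once the check has been made
                ready, _, _ = select.select([self.wake_r], [], [], timeout)
                if not ready:
                    # A loop without a timeout would block here forever.
                    return False
                os.read(self.wake_r, 1)  # drain the wakeup byte, then re-check
        finally:
            os.close(self.wake_r)
            os.close(self.wake_w)


def demo(wake_on_idle):
    loop = ToyMainLoop(wake_on_idle)
    loop.shutdown_pending = True
    loop.active_connections = 1  # one connection still open at the check
    return loop.run(loop.connection_finished)
```

Calling demo(False) models a loop that nobody wakes: the last connection closes after the check, the counter drops to zero, and the loop keeps waiting. Calling demo(True) models the patched behaviour, where reaching zero connections interrupts the wait so the exit condition is evaluated again.
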
At the very least empirically that patch reduces the failure
probability of a run of the complete dgit test suite on my laptop from
about 100% (I guess that represents a failure probability of 0.1% per
gnupg run) to about 5-10%.
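
The relation between a per-run and a whole-suite failure probability can be made explicit. This is a sketch under the assumption that gnupg runs fail independently with the same probability; the function names are hypothetical, and the number of gnupg runs in the suite is not stated in this message, so it is left as a parameter.

```python
import math


def suite_failure(p_run, n_runs):
    """Probability that at least one of n_runs independent runs fails."""
    return 1.0 - (1.0 - p_run) ** n_runs


def runs_needed(p_run, p_suite):
    """Smallest number of runs giving a suite failure probability >= p_suite."""
    return math.ceil(math.log(1.0 - p_suite) / math.log(1.0 - p_run))


def per_run_failure(p_suite, n_runs):
    """Per-run failure probability implied by an observed suite failure rate."""
    return 1.0 - (1.0 - p_suite) ** (1.0 / n_runs)
```

Under these assumptions, runs_needed(0.001, 0.99) shows that a 0.1% per-run failure rate only yields a near-certain suite failure if the suite invokes gnupg on the order of a few thousand times. For the same run count, per_run_failure(0.10, n) gives the per-run rate that corresponds to the 5-10% figure after the patch.
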
Thanks for your logging tips. Unfortunately, however, they came
rather late. Yesterday this problem got me completely blocked on dgit
development so I had to fight the bug alone. It took me many hours
which could probably have been significantly shortened with your help.
Next time someone reports a bug like this, it would be better if you
mentioned the reasons why you think it's not a bug (npth's special
properties, in this case). You could have linked to npth's
documentation. Earlier instructions for collecting debug logs would
have been helpful. Speculation as to where the bug might or might not
be, rather than blanket denials, would have been welcome.
I'm afraid this has made me somewhat tetchy as you can probably tell.
Do you intend to rework my patch(es) and apply the ones that make
sense? Do you intend to fix the remaining bug?
Ian.
PS: npth is also not bug-free. For example, see #850686, just
reported.
--
Ian Jackson <ijackson at chiark.greenend.org.uk> These opinions are my own.
If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.