[Debian-ha-maintainers] Bug#953944: crmsh: flaky autopkgtest that times out regularly: missing isolation-machine restriction?

Paul Gevers elbrus at debian.org
Sat Mar 14 22:14:05 GMT 2020


Hi Valentin,

On 14-03-2020 23:02, Valentin Vidić wrote:
> On Sat, Mar 14, 2020 at 10:55:29PM +0100, Paul Gevers wrote:
>> Can you rephrase your question, I don't understand what you're asking.
>> Everything I can provide you is already available from ci.debian.net.
> 
> Right, so when the test times out it hangs on something, but this is not
> visible in the logs or anywhere else. You mentioned that it causes problems
> for the whole host when this happens, so I was thinking you could send
> some more info, like the 'ps aufx' output, so I can get an idea of what it
> is doing when it hangs? If not, I can just disable the last test and it
> should be fine again.

We are having issues with the infrastructure, and the end of the log
hints that this test may be one of the tests that cause them. If I catch
such a failure live, I'll send you the output of $(ps auxf), but I'm not
inspecting every issue at the moment and regularly just restart *all*
the workers when many hang. E.g. here [1] you can see the number of
running lxc containers per worker. It should toggle between 0 and 1 for
the amd64 workers (ci-worker-[0-9]*), but the counts are ramping up
because the lxc containers don't always close. Some workers stay at one
level for longer, while others jump multiple times per day. All workers
are identical from the provisioning point of view.

[1] https://ci.debian.net/munin/debci-day.html
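
To make the "should toggle between 0 and 1" check concrete, here is a
minimal sketch in Python. It is not how the munin graph or debci works
internally; the function name, the worker names and the sample counts
are all made up for illustration. It only shows the rule of thumb: a
worker whose latest count of running lxc containers is above 1 has
probably leaked a container.

```python
from typing import Dict, List

# From the description above: an amd64 worker should have 0 or 1
# running lxc containers at any time.
EXPECTED_MAX = 1


def stuck_workers(samples: Dict[str, List[int]],
                  expected_max: int = EXPECTED_MAX) -> Dict[str, int]:
    """Map each worker whose latest count exceeds expected_max to that count.

    `samples` maps a worker name to its container counts, oldest first.
    Workers with no samples are ignored.
    """
    flagged = {}
    for worker, counts in samples.items():
        if counts and counts[-1] > expected_max:
            flagged[worker] = counts[-1]
    return flagged


if __name__ == "__main__":
    # Hypothetical data, not taken from ci.debian.net.
    samples = {
        "ci-worker-01": [0, 1, 0, 1],     # healthy: toggles between 0 and 1
        "ci-worker-02": [0, 1, 2, 2, 3],  # ramping up
        "ci-worker-03": [1, 1, 2, 4],     # jumping several levels
    }
    for worker, count in sorted(stuck_workers(samples).items()):
        print(f"{worker}: {count} running containers")
```

Only the latest sample is checked, so a worker that was restarted and
is back at 0 or 1 is no longer reported.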

Paul
