Bug#920546: dolfin: flaky autopkgtest as it times out sometimes

Drew Parsons dparsons at debian.org
Sat Jun 1 07:25:17 BST 2019


Source: dolfin
Followup-For: Bug #920546


I have a hunch the timeout problem might be related to
oversubscription of CPUs in MPI runs.

(In principle the same would apply to the Python MPI tests, but
presumably the Python/MPI interface "slows down" messages enough to
avoid the race condition.)

I've uploaded 2018.1.0.post1-18 to print the number of available CPUs
at test time, to test if oversubscription is a plausible explanation.
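
For illustration only, here is one way a test could report the CPU
count. This is a sketch, not the code in the upload: the helper name
available_cpus is hypothetical, and it assumes Python 3 is available
at test time.

```python
import os

def available_cpus():
    """Return the number of CPUs this process may run on (at least 1)."""
    # sched_getaffinity exists only on some platforms (e.g. Linux); it
    # respects CPU affinity masks, which os.cpu_count() does not.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() can return None if the count is undetermined.
    return os.cpu_count() or 1

print("available CPUs:", available_cpus())
```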

Currently oversubscription is permitted at up to 2 jobs per CPU. The
demos use 3 processes each.  So if 4 CPUs are available then 2 jobs
(6 processes) are run, which would be 50% oversubscribed.
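
The arithmetic in the paragraph above can be sketched as follows. This
is an illustration, not the actual test script: the function name is
hypothetical, and it assumes the limit works out as a cap of 2
processes per CPU, which is the reading that reproduces the 4-CPU
figures given above.

```python
def planned_jobs(cpus, procs_per_job=3, max_procs_per_cpu=2):
    """Return (jobs, processes, processes per CPU) under the assumed cap."""
    # Assumption: the cap is a total of max_procs_per_cpu * cpus processes.
    # Assumption: at least one job always runs, even if it exceeds the cap.
    jobs = max(1, (cpus * max_procs_per_cpu) // procs_per_job)
    procs = jobs * procs_per_job
    return jobs, procs, procs / cpus

jobs, procs, ratio = planned_jobs(4)
# prints "2 jobs, 6 processes, 50% oversubscribed"
print(f"{jobs} jobs, {procs} processes, {ratio - 1:.0%} oversubscribed")
```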

If that is the case and correlates with MPI C++ timeouts, then the
next step is to strictly never oversubscribe (but if only 1 or 2 CPUs
are available then the first job of 3 processes must still be
oversubscribed).
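
A minimal sketch of that stricter policy, for comparison. The function
name is hypothetical, and the rule of always running at least one job
is taken from the parenthetical above; none of this is the packaged
test code.

```python
def strict_jobs(cpus, procs_per_job=3):
    """Return (jobs, oversubscribed) when oversubscription is avoided."""
    # Run only as many whole jobs as fit on the CPUs, but never fewer
    # than one, so 1 or 2 CPUs still run a single oversubscribed job.
    jobs = max(1, cpus // procs_per_job)
    oversubscribed = jobs * procs_per_job > cpus
    return jobs, oversubscribed

for cpus in (1, 2, 3, 4, 6):
    print(cpus, strict_jobs(cpus))
```

With 4 CPUs this runs a single job of 3 processes, where the current
policy runs 2 jobs.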


