Bug#920546: dolfin: flaky autopkgtest as it times out sometimes
Drew Parsons
dparsons at debian.org
Sat Jun 1 07:25:17 BST 2019
Source: dolfin
Followup-For: Bug #920546
I have a hunch the timeout problem might be related to
oversubscription of CPUs in MPI runs.
(In principle the same would apply to the Python MPI tests, but
presumably the Python/MPI interface "slows down" messages enough to
avoid the race condition.)
I've uploaded 2018.1.0.post1-18 to print the number of available CPUs
at test time, to test if oversubscription is a plausible explanation.
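For illustration only, here is a minimal Python sketch of how a test script could report the CPU count. It is not what the uploaded package does, and the helper name available_cpus is hypothetical.

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process may run on (at least 1)."""
    try:
        # Linux: honours the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity does not exist on every platform;
        # fall back to the total CPU count (which may be None).
        return os.cpu_count() or 1


print(f"available CPUs: {available_cpus()}")
```

Logging this value next to each test run makes it possible to check afterwards whether timeouts cluster on machines with few CPUs.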
Currently oversubscription is permitted at up to 2 jobs per CPU. The
demos use 3 processes each. So if 4 CPUs are available, 2 jobs
(6 processes) are run, which is 50% oversubscribed.
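The arithmetic above can be sketched in Python. This assumes the limit amounts to at most 2 processes per CPU, with whole jobs of 3 processes packed into that budget. That reading reproduces the 4-CPU example but is an assumption, and the function names are hypothetical.

```python
PROCS_PER_JOB = 3     # each demo runs with 3 MPI processes (from the text)
MAX_LOAD_PER_CPU = 2  # assumed reading of the current limit: 2 processes per CPU


def concurrent_jobs(ncpus: int) -> int:
    """Whole jobs that fit in the process budget, but always at least one."""
    slots = ncpus * MAX_LOAD_PER_CPU
    return max(1, slots // PROCS_PER_JOB)


def oversubscription(ncpus: int) -> float:
    """Fraction by which running processes exceed CPUs (0.0 if they fit)."""
    procs = concurrent_jobs(ncpus) * PROCS_PER_JOB
    return max(0.0, procs / ncpus - 1.0)


# 4 CPUs -> 8 slots -> 2 jobs -> 6 processes on 4 CPUs
print(concurrent_jobs(4), oversubscription(4))
```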
If that is the case and it correlates with the MPI C++ timeouts, then
the next step is to strictly never oversubscribe (although if only 1
or 2 CPUs are available, the first job of 3 processes must still be
oversubscribed).
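A minimal sketch of the proposed strict policy, assuming 3 processes per job as stated above. The function names are hypothetical and this is not the actual test-runner code.

```python
PROCS_PER_JOB = 3  # each demo runs with 3 MPI processes (from the text)


def strict_jobs(ncpus: int) -> int:
    """Jobs to run concurrently without oversubscribing, minimum one job."""
    return max(1, ncpus // PROCS_PER_JOB)


def is_oversubscribed(ncpus: int) -> bool:
    """True only when even a single job needs more processes than CPUs."""
    return strict_jobs(ncpus) * PROCS_PER_JOB > ncpus


for n in (1, 2, 3, 4, 6):
    print(n, strict_jobs(n), is_oversubscribed(n))
```

With this rule only the 1-CPU and 2-CPU cases remain oversubscribed, since a single 3-process job cannot be made smaller.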