[Debian-med-packaging] Bug#1087184: examl: fails to run with tight cores count
Étienne Mollier
emollier at debian.org
Sat Nov 9 11:45:13 GMT 2024
Package: examl
Version: 3.0.22-5
Severity: normal
While I was helping with the openmpi 5 transition, the Salsa CI
autopkgtest for examl[1] failed with the following error:
+ examl -t testData/49.tree -m GAMMA -s 49.unpartitioned.binary -n T1
Run MPI on 2 of 2 available processors
Use /usr/lib/examl/bin/examl-avx2 and 2 processors
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
/usr/lib/examl/bin/examl-avx2
Either request fewer procs for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[1]: https://salsa.debian.org/med-team/examl/-/jobs/6561673
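For reference, the two options named in the message above could be
passed roughly as follows. This is only a minimal sketch: it assumes
the launcher is Open MPI's mpirun and that the examl wrapper ends up
invoking it, neither of which I have verified here. The helper name
build_mpirun_command is hypothetical, and the sketch only builds the
command line without running anything.

```python
import shlex


def build_mpirun_command(binary, args, nprocs,
                         use_hwthreads=False, oversubscribe=False):
    """Build an mpirun argument list (hypothetical helper, nothing is run).

    Assumption: the launcher is Open MPI's mpirun. The two optional flags
    are the ones quoted in the PRRTE error message.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    if use_hwthreads:
        # Count hardware threads as slots instead of processor cores.
        cmd.append("--use-hwthread-cpus")
    if oversubscribe:
        # Ignore the number of available slots entirely.
        cmd += ["--map-by", ":OVERSUBSCRIBE"]
    cmd.append(binary)
    cmd += list(args)
    return cmd


# Arguments taken from the failing autopkgtest invocation.
examl_args = ["-t", "testData/49.tree", "-m", "GAMMA",
              "-s", "49.unpartitioned.binary", "-n", "T1"]
cmd = build_mpirun_command("/usr/lib/examl/bin/examl-avx2", examl_args, 2,
                           use_hwthreads=True)
print(shlex.join(cmd))
```

Whether either option is appropriate for the autopkgtest is a separate
question; oversubscribing in particular only hides the slot mismatch.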
I initially thought that examl needed a minimum of 2 cores, but the
error is not as clear as I first thought. It seems that examl
did detect two cores and tried to make use of both, but for some
reason only one was usable. As far as I can tell, the error
was repeatable over two consecutive runs in Salsa CI. I did not
hit problems when testing locally, and there have been no visible
problems in debci since April[2], so this is not
too catastrophic either.
[2]: https://ci.debian.net/packages/e/examl/unstable/amd64/
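One possible explanation, which is an unconfirmed assumption on my
side, is that the CI runner exposes 2 hardware threads backed by a
single processor core: the wrapper would then see 2 processors while
PRRTE, which counts cores by default according to the message above,
would see 1 slot. A minimal sketch to compare both counts on a given
machine follows. It parses /proc/cpuinfo, relies on the "physical id"
and "core id" fields (present on x86 Linux, possibly absent elsewhere),
and the function names are mine.

```python
import os


def count_logical_cpus(cpuinfo_text):
    """Count 'processor' stanzas, i.e. logical CPUs (hardware threads)."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.partition(":")[0].strip() == "processor")


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs.

    Returns 0 when the fields are missing, which happens on some
    architectures; treat that as "unknown", not as "no cores".
    """
    cores = set()
    physical_id = core_id = None
    # The trailing "" makes sure the last stanza is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the stanza of one logical CPU.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        text = handle.read()
    print("logical CPUs:  ", count_logical_cpus(text))
    print("physical cores:", count_physical_cores(text))
```

If the first number is 2 and the second is 1 on the Salsa CI runner,
that would match the "2 of 2 available processors" line followed by
the slot shortage, but I have not checked this on the runner itself.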
For information,
--
.''`. Étienne Mollier <emollier at debian.org>
: :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
`. `' sent from /dev/pts/4, please excuse my verbosity
`- on air: Erik Norlander - Music Machine