Bug#1100120: libopenmpi-dev: mpi4py spawn tests get OPAL ERROR: Unreachable in file ../../../ompi/runtime/ompi_mpi_finalize.c at line 286
Drew Parsons
dparsons at debian.org
Tue Mar 11 13:52:42 GMT 2025
Package: libopenmpi-dev
Version: 5.0.7-1
Severity: serious
Justification: FTBFS (dependencies)
Build-time tests of mpi4py 4.0.3-1 are failing, and the failures point to
problems in openmpi.
debci tests from its last build are still passing for now.
I'm assuming the bug is in openmpi, not mpi4py itself, since mpi4py
passes its tests with mpich (on the 32-bit arches).
The first problem comes from PMIx:
  An error occurred in PMIx Event Notification
The error is reproducible,
cf. https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/mpi4py.html
https://tests.reproducible-builds.org/debian/rbuild/unstable/amd64/mpi4py_4.0.3-1.rbuild.log.gz
It is triggered by test_util_pkl5, and also test_util_pool,
test_util_sync and test_win.
It is associated with a kernel general protection fault from prte.
That bug is reported in Bug#1098576, currently assigned to pmix though
I suspect it might be an openmpi issue.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1098576
Here I'm reporting a second problem: spawn is failing,
for instance:
ERROR: testNoArgs (test_spawn.TestSpawnSingleWorldMany.testNoArgs)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/drew/projects/python/build/mpi4py/test/test_spawn.py", line 175, in testNoArgs
    child = self.COMM.Spawn(
        script, None, self.MAXPROCS,
        info=self.INFO, root=self.ROOT,
    )
  File "src/mpi4py/MPI.src/Comm.pyx", line 2544, in mpi4py.MPI.Intracomm.Spawn
    with nogil: CHKERR( MPI_Comm_spawn(
mpi4py.MPI.Exception: MPI_ERR_UNKNOWN: unknown error
----------------------------------------------------------------------
Ran 1857 tests in 84.632s
FAILED (errors=40, skipped=162)
[sandy:272668] OPAL ERROR: Unreachable in file ../../../ompi/runtime/ompi_mpi_finalize.c at line 286
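For triage across several build logs, the two signatures quoted above (the unittest summary line and the OPAL ERROR line) could be picked out mechanically. The following is only a sketch using the Python standard library: the function name scan_log and both regular expressions are illustrative assumptions written against the lines quoted in this report, not part of mpi4py, Open MPI or any Debian tooling.

```python
import re

# Illustrative patterns, written against the log lines quoted in this report.
OPAL_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] OPAL ERROR: (?P<status>.+?) "
    r"in file (?P<file>\S+) at line (?P<line>\d+)$"
)
SUMMARY_RE = re.compile(r"^FAILED \((?P<counts>[^)]*)\)$")


def scan_log(text):
    """Return (opal_errors, summary) found in a test log (hypothetical helper)."""
    opal_errors = []
    summary = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = OPAL_RE.match(line)
        if m:
            opal_errors.append({
                "host": m["host"],
                "pid": int(m["pid"]),
                "status": m["status"],
                "file": m["file"],
                "line": int(m["line"]),
            })
            continue
        m = SUMMARY_RE.match(line)
        if m:
            # "errors=40, skipped=162" becomes {"errors": 40, "skipped": 162}
            for item in m["counts"].split(","):
                key, _, value = item.strip().partition("=")
                summary[key] = int(value)
    return opal_errors, summary


# Sample taken from the tail of the log quoted above.
SAMPLE = """\
Ran 1857 tests in 84.632s
FAILED (errors=40, skipped=162)
[sandy:272668] OPAL ERROR: Unreachable in file ../../../ompi/runtime/ompi_mpi_finalize.c at line 286
"""

if __name__ == "__main__":
    errors, summary = scan_log(SAMPLE)
    print(errors)
    print(summary)
```

A log where the spawn tests are skipped but the OPAL ERROR line still appears would suggest the finalize error is independent of the MPI_ERR_UNKNOWN failures.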
I've marked this bug as severity serious because of the OPAL error in
ompi_mpi_finalize.c reported at the end of the run (as well as the
MPI_ERR_UNKNOWN errors in the spawn tests). If the OPAL message is a
red herring, please downgrade the severity.
We could just skip the failing tests in mpi4py (in fact I will for now),
but the underlying problem should be fixed in any case.
For mpi4py, I will first upload 4.0.3-2, which skips the tests hitting
the pmix failures, in order to get a reproducible record of the spawn
failure. After that I will upload a release of mpi4py that skips the
spawn tests until the issue is fixed in openmpi (or pmix).
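One generic way to skip a group of tests temporarily is a guard in setUp, sketched below with the standard unittest module. This is not how the mpi4py package actually implements the skip. The environment variable SKIP_SPAWN_TESTS, the helper names and the placeholder test class are all hypothetical, and the real test body would call self.COMM.Spawn(...) as in the traceback above.

```python
import os
import unittest


def spawn_tests_disabled():
    # Hypothetical switch (not an mpi4py or Debian convention): "1" means skip.
    return os.environ.get("SKIP_SPAWN_TESTS", "0") == "1"


class TestSpawnPlaceholder(unittest.TestCase):
    def setUp(self):
        # Checked at run time, so the switch can be flipped without code changes.
        if spawn_tests_disabled():
            self.skipTest("spawn fails with openmpi 5.0.7-1, see Bug#1100120")

    def testNoArgs(self):
        # Placeholder body standing in for the real Spawn call.
        self.assertTrue(True)


def run_suite():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestSpawnPlaceholder)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("skipped:", len(result.skipped))
```

Keeping the bug number in the skip reason makes it easier to find and re-enable the tests once the underlying issue is fixed.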
-- System Information:
Debian Release: trixie/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.12.17-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libopenmpi-dev depends on:
ii gfortran [gfortran-mod-15] 4:14.2.0-1
ii gfortran-11 [gfortran-mod-15] 11.5.0-2
ii gfortran-12 [gfortran-mod-15] 12.4.0-4
ii gfortran-13 [gfortran-mod-15] 13.3.0-12
ii gfortran-14 [gfortran-mod-15] 14.2.0-17
ii libevent-dev 2.1.12-stable-10+b1
ii libhwloc-dev 2.12.0-1
ii libibverbs-dev 56.0-2
ii libjs-jquery 3.6.1+dfsg+~3.5.14-1
ii libjs-jquery-ui 1.13.2+dfsg-1
ii libopenmpi40 5.0.7-1
ii libpmix-dev 5.0.6-5
ii openmpi-bin 5.0.7-1
ii openmpi-common 5.0.7-1
ii zlib1g-dev 1:1.3.dfsg+really1.3.1-1+b1
Versions of packages libopenmpi-dev recommends:
ii libcoarrays-openmpi-dev 2.10.2+ds-4
Versions of packages libopenmpi-dev suggests:
pn openmpi-doc <none>
-- no debconf information