Bug#1131533: mpich breaks mpi4py autopkgtest str2bool[supported] == False
Drew Parsons
dparsons at debian.org
Mon Mar 23 18:08:22 GMT 2026
Source: mpi4py
Followup-For: Bug #1131533
Control: tags -1 help
The str2bool error reported here appears to arise from a mismatch
between the old mpi4py build (against mpich 4) and the new mpich 5.
MPICH is supposed to have ABI compatibility, so perhaps the error is a
bug in that compatibility. But at the same time, MPI-5 (supported by mpich 5)
introduces a common MPI ABI (shared with openmpi). Perhaps mpi4py
was getting confused over which ABI it was working with. I'm not sure
exactly.
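One way to check for this kind of build/runtime mismatch is to compare what mpi4py recorded at build time with what the loaded MPI library reports at run time. The following is only a sketch: the helper name describe_mpi is made up for illustration, it has not been run on the affected systems, and it assumes that MPI.get_vendor() reflects the MPI that mpi4py was compiled against while MPI.Get_library_version() reflects the library actually loaded.

```python
def describe_mpi():
    """Return build-time and run-time MPI details, or None if mpi4py is unavailable.

    Hypothetical helper for illustration; not part of mpi4py or its test suite.
    """
    try:
        import mpi4py
        from mpi4py import MPI  # importing MPI initializes the MPI library
    except ImportError:
        return None

    vendor_name, vendor_version = MPI.get_vendor()
    return {
        "mpi4py": mpi4py.__version__,
        # Build configuration recorded when mpi4py was compiled.
        "build_config": mpi4py.get_config(),
        # Assumed to describe the MPI implementation seen at compile time.
        "vendor": (vendor_name, vendor_version),
        # MPI standard version reported by the running library.
        "standard": MPI.Get_version(),
        # Free-form version string of the library loaded at run time.
        "library": MPI.Get_library_version().strip(),
    }


if __name__ == "__main__":
    info = describe_mpi()
    if info is None:
        print("mpi4py could not be imported")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If the vendor version and the library string disagree (for example mpich 4 against mpich 5), the installed mpi4py was built against a different MPICH than the one it is running with.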
To keep the build consistent, I uploaded mpi4py 4.1.1-3 to rebuild
against mpich 5. With the rebuild, testGetFortranInfo is now passing
on i386, as tested on porterbox (barriere):
testGetFortranInfo (test_mpiabi.TestMPIABI.testGetFortranInfo) ... testGetFortranInfo (test_mpiabi.TestMPIABI.testGetFortranInfo) ... testGetFortranInfo (test_mpiabi.TestMPIABI.testGetFortranInfo) ... testGetFortranInfo (test_mpiabi.TestMPIABI.testGetFortranInfo) ... ok
ok
testGetInfo (test_mpiabi.TestMPIABI.testGetInfo) ... ok
testGetVersion (test_mpiabi.TestMPIABI.testGetVersion) ... ok
In fact all tests are passing successfully on barriere, with
autopkgtest ending with:
autopkgtest [16:29:40]: test mpi4py-test: -----------------------]
autopkgtest [16:29:40]: test mpi4py-test: - - - - - - - - - - results - - - - - - - - - -
mpi4py-test PASS
autopkgtest [16:29:40]: @@@@@@@@@@@@@@@@@@@@ summary
command1 SKIP unknown restriction hint-testsuite-triggers
command1 SKIP unknown restriction hint-testsuite-triggers
mpi4py-test PASS
So in that sense, this bug is resolved by rebuilding mpi4py against mpich 5.
However, on debci the tests now hang at test_cco_buf.TestCCOBufWorld.testAllreduce
after hitting an ERROR in test_cco_buf.TestCCOBufWorld.testAllgather,
so the debci run fails on timeout:
https://ci.debian.net/data/autopkgtest/testing/i386/m/mpi4py/69692323/log.gz
I suspect the timeout in testAllreduce is indirectly triggered by the
error in testAllgather.
I can't see what the substantial difference between the two test
environments is. Why are the same tests passing on barriere,
but hitting an error and failing with timeout on debci?
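To narrow down what differs, the same small script could be run in both environments and the outputs compared. This is a rough sketch using only the Python standard library; the function name and the list of environment variable prefixes are guesses on my part, not a known list of what MPICH actually reads.

```python
import os
import platform


def environment_fingerprint(env_prefixes=("MPI", "MPICH", "HYDRA", "PMI", "OMPI")):
    """Collect a few facts that may differ between a porterbox and a CI worker.

    Hypothetical helper; env_prefixes is an assumed, non-exhaustive list.
    """
    # CPUs this process may actually use (can be fewer than cpu_count in containers).
    if hasattr(os, "sched_getaffinity"):
        usable_cpus = len(os.sched_getaffinity(0))
    else:
        usable_cpus = None

    return {
        "machine": platform.machine(),
        "kernel": platform.release(),
        "cpu_count": os.cpu_count(),
        "usable_cpus": usable_cpus,
        # MPI-related environment variables set in this environment.
        "mpi_env": {
            key: value
            for key, value in sorted(os.environ.items())
            if key.startswith(env_prefixes)
        },
    }


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

A difference in usable CPUs relative to the number of MPI processes launched, or in the kernel (an i386 chroot on an amd64 kernel, say), would be a first thing to look at.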
Help wanted.