Bug#1102612: mpich 4.3 not initialising multiple processes
Drew Parsons
dparsons at debian.org
Fri Apr 11 00:53:45 BST 2025
Package: mpich
Version: 4.3.0-5
Severity: serious
Justification: debci
I apologise for another serious bug, but mpich 4.3 is doing weird
things that we don't want in trixie. I see the problem in mpich test
errors in armci-mpi
(https://buildd.debian.org/status/fetch.php?pkg=armci-mpi&arch=amd64&ver=0.4-5&stamp=1744327219&raw=0 )
but can reproduce in a trivial test.
The problem is that mpich is not initialising multiple processes.
Instead it is simply launching multiple single processes (each with
MPI_Comm_size = 1).
You can see the problem in the armci-mpi test errors, e.g.
FAIL: benchmarks/ping-pong
==========================
[1744327153.607644] [sbuild:19884:0] sock.c:513 UCX WARN unable to read somaxconn value from /proc/sys/net/core/somaxconn file
[0] ARMCI Error: This benchmark should be run on at least two processes
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
[0] ARMCI Error: This benchmark should be run on at least two processes
[1744327153.612861] [sbuild:19883:0] sock.c:513 UCX WARN unable to read somaxconn value from /proc/sys/net/core/somaxconn file
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
FAIL benchmarks/ping-pong (exit status: 1)
The error message "at least two processes" is issued by armci-mpi's ping-pong.c
when it detects MPI_Comm_size = 1. But the test is launched with
mpiexec.mpich -np 2, which is why the error appears twice (once per process).
I can reproduce the issue with a trivial test:
```
$ cat mpich_test.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int me, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    printf("mpi test rank %d of %d\n", me, nproc);
    MPI_Finalize();
    return 0;
}
$ mpicc.mpich -o mpich_test mpich_test.c
$ mpiexec.mpich -n 4 ./mpich_test
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
```
It should instead report something like this (rank order may vary):
mpi test rank 3 of 4
mpi test rank 1 of 4
mpi test rank 0 of 4
mpi test rank 2 of 4
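
Since the rank lines can arrive in any order, comparing the output against a fixed string is fragile. Here is a minimal sketch of a checker that accepts any ordering, as long as every rank from 0 to N-1 appears exactly once with size N. The function name check_rank_lines and the MAX_RANKS limit are hypothetical, introduced only for this illustration; the line format is the one printed by mpich_test.c above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RANKS 1024  /* arbitrary upper bound chosen for this sketch */

/*
 * Scan `output` line by line for "mpi test rank R of N".
 * Return 0 if every rank in [0, expected) appears exactly once and
 * every matching line reports N == expected; return -1 otherwise.
 * Lines that do not match the pattern (e.g. UCX warnings) are ignored.
 */
int check_rank_lines(const char *output, int expected)
{
    int seen[MAX_RANKS] = {0};
    int matched = 0;
    const char *p = output;

    if (expected < 1 || expected > MAX_RANKS)
        return -1;

    while (*p != '\0') {
        int rank, size;
        if (sscanf(p, "mpi test rank %d of %d", &rank, &size) == 2) {
            /* Wrong communicator size, out-of-range rank, or duplicate rank. */
            if (size != expected || rank < 0 || rank >= expected || seen[rank])
                return -1;
            seen[rank] = 1;
            matched++;
        }
        /* Advance to the start of the next line, if there is one. */
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return matched == expected ? 0 : -1;
}
```

With the captured output of mpiexec.mpich -n 4 ./mpich_test passed in as a string, check_rank_lines(output, 4) would return 0 for the expected output and -1 for the four "rank 0 of 1" lines shown earlier.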
There is even more weirdness, however. The first time I compiled and
ran this trivial test, it did report having 4 processes, but that
correct output was accompanied by pmix warnings:
Query for unrecognized attribute: pmix.qry.node
Query for unrecognized attribute: pmix.qry.peers
But after recompiling the same way, it no longer gave the correct output,
and it no longer gave the pmix warnings either.
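
To help narrow this down, it might be useful to see what process-manager environment each launched process receives. The following is only a diagnostic sketch, not a statement about the cause. The variable names PMI_RANK, PMI_SIZE, PMIX_RANK and PMIX_NAMESPACE are my assumption about what a PMI or PMIx launcher typically exports and should be confirmed against the launcher actually in use; the function names report_env and report_pm_env are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed variable names; confirm against the process manager in use. */
static const char *const pm_vars[] = {
    "PMI_RANK", "PMI_SIZE", "PMIX_RANK", "PMIX_NAMESPACE"
};

/*
 * Print each named environment variable with its value, or "(unset)",
 * one per line, and return how many of them are set.
 */
int report_env(FILE *out, const char *const *names, size_t count)
{
    int set = 0;
    for (size_t i = 0; i < count; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL)
            set++;
        fprintf(out, "%s=%s\n", names[i], value != NULL ? value : "(unset)");
    }
    return set;
}

/* Convenience wrapper that reports the assumed PMI/PMIx variables. */
int report_pm_env(FILE *out)
{
    return report_env(out, pm_vars, sizeof pm_vars / sizeof pm_vars[0]);
}
```

Calling report_pm_env(stderr) at the start of main in the test program, before MPI_Init, would show per process which of these variables the launcher provided. If a process reports MPI_Comm_size = 1 while such variables indicate a larger job, that would at least suggest the library and the launcher are not talking to each other the way they should.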
Can you reproduce this problem?
-- System Information:
Debian Release: trixie/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.12.21-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages mpich depends on:
ii hwloc 2.12.0-1
ii libc6 2.41-6
ii libhwloc15 2.12.0-1
ii libmpich12 4.3.0-5
ii libslurm42t64 24.11.3-2
ii perl 5.40.1-2
Versions of packages mpich recommends:
ii libmpich-dev 4.3.0-5
Versions of packages mpich suggests:
ii mpich-doc 4.3.0-5
-- no debconf information