Bug#1102612: mpich 4.3 not initialising multiple processes

Drew Parsons dparsons at debian.org
Fri Apr 11 00:53:45 BST 2025


Package: mpich
Version: 4.3.0-5
Severity: serious
Justification: debci

I apologise for another serious bug, but mpich 4.3 is doing weird
things that we don't want in trixie. I see the problem in the armci-mpi
test failures
(https://buildd.debian.org/status/fetch.php?pkg=armci-mpi&arch=amd64&ver=0.4-5&stamp=1744327219&raw=0 )
but can reproduce it with a trivial test.

The problem is that mpich is not initialising a multi-process job.
Instead it simply launches several independent single processes (each
with MPI_Comm_size = 1).

You can see the problem in the armci-mpi test errors, e.g.

FAIL: benchmarks/ping-pong
==========================

[1744327153.607644] [sbuild:19884:0]            sock.c:513  UCX  WARN  unable to read somaxconn value from /proc/sys/net/core/somaxconn file
[0] ARMCI Error: This benchmark should be run on at least two processes
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
[0] ARMCI Error: This benchmark should be run on at least two processes
[1744327153.612861] [sbuild:19883:0]            sock.c:513  UCX  WARN  unable to read somaxconn value from /proc/sys/net/core/somaxconn file
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
FAIL benchmarks/ping-pong (exit status: 1)

The error message "at least two processes" is issued by armci-mpi's ping-pong.c
when it detects MPI_Comm_size = 1. But the test is launched with
mpiexec.mpich -np 2 (which is why the error appears twice).

I can reproduce the issue with a trivial test:
```
$ cat mpich_test.c 
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int me, nproc;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &me);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  printf("mpi test rank %d of %d\n", me, nproc);
  MPI_Finalize();

  return 0;
}
$ mpicc.mpich -o mpich_test mpich_test.c
$ mpiexec.mpich -n 4 ./mpich_test
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
```

It should instead report the following (the order of the lines may vary):
mpi test rank 3 of 4
mpi test rank 1 of 4
mpi test rank 0 of 4
mpi test rank 2 of 4


There is even more weirdness, however. The first time I compiled and
ran this trivial test, it did report 4 processes, but that correct
output was accompanied by PMIx warnings:
Query for unrecognized attribute: pmix.qry.node
Query for unrecognized attribute: pmix.qry.peers

But after recompiling it the same way, it no longer gave the correct
output, and the PMIx warnings were also gone.

Can you reproduce this problem?


-- System Information:
Debian Release: trixie/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.12.21-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages mpich depends on:
ii  hwloc          2.12.0-1
ii  libc6          2.41-6
ii  libhwloc15     2.12.0-1
ii  libmpich12     4.3.0-5
ii  libslurm42t64  24.11.3-2
ii  perl           5.40.1-2

Versions of packages mpich recommends:
ii  libmpich-dev  4.3.0-5

Versions of packages mpich suggests:
ii  mpich-doc  4.3.0-5

-- no debconf information



More information about the debian-science-maintainers mailing list