Bug#1102612: mpich 4.3 not initialising multiple processes
Drew Parsons
dparsons at debian.org
Fri Apr 11 16:14:39 BST 2025
Package: mpich
Followup-For: Bug #1102612
There is evidence in the ga (libglobalarrays-dev) build logs that
libucx0 might be the problem, or at least a problem:
https://buildd.debian.org/status/fetch.php?pkg=ga&arch=amd64&ver=5.9.1-1&stamp=1744381093&raw=0
e.g. FAIL: global/testing/pgtest
copy is OK
> Checking scatter/gather (might be slow)...
[sbuild:90178:0:90178] ucp_request.c:212 Assertion `ucs_async_check_owner_thread(&(worker)->async)' failed
==== backtrace (tid: 90178) ====
0 /lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2bc) [0x7fc32aa2564c]
1 /lib/x86_64-linux-gnu/libucs.so.0(ucs_fatal_error_message+0xb6) [0x7fc32aa231f6]
2 /lib/x86_64-linux-gnu/libucs.so.0(ucs_fatal_error_format+0x11a) [0x7fc32aa2331a]
3 /lib/x86_64-linux-gnu/libucp.so.0(ucp_request_release+0x1a7) [0x7fc32aab7487]
4 /lib/x86_64-linux-gnu/libmpich.so.12(+0x349457) [0x7fc32b3ff457]
5 /lib/x86_64-linux-gnu/libmpich.so.12(+0x349685) [0x7fc32b3ff685]
6 /lib/x86_64-linux-gnu/libucp.so.0(ucp_am_rndv_process_rts+0x17f) [0x7fc32aa9fb9f]
7 /lib/x86_64-linux-gnu/libucp.so.0(ucp_rndv_rts_handler+0xbd) [0x7fc32ab20b2d]
8 /lib/x86_64-linux-gnu/libuct.so.0(+0x21fa9) [0x7fc32a93cfa9]
9 /lib/x86_64-linux-gnu/libuct.so.0(uct_self_ep_am_bcopy+0x7e) [0x7fc32a93d6ce]
10 /lib/x86_64-linux-gnu/libucp.so.0(ucp_am_rndv_proto_progress+0x468) [0x7fc32aa8b1a8]
11 /lib/x86_64-linux-gnu/libucp.so.0(ucp_am_send_nbx+0x9aa) [0x7fc32aa9b01a]
12 /lib/x86_64-linux-gnu/libmpich.so.12(+0x17d593) [0x7fc32b233593]
13 /lib/x86_64-linux-gnu/libmpich.so.12(+0x17f378) [0x7fc32b235378]
14 /lib/x86_64-linux-gnu/libmpich.so.12(MPI_Get_accumulate+0x2fc) [0x7fc32b235e3c]
15 ./global/testing/pgtest.x(+0xcd40f) [0x55c644f5140f]
16 ./global/testing/pgtest.x(+0xd3693) [0x55c644f57693]
17 ./global/testing/pgtest.x(+0xd38ba) [0x55c644f578ba]
18 ./global/testing/pgtest.x(+0xd3aaf) [0x55c644f57aaf]
19 ./global/testing/pgtest.x(+0x929db) [0x55c644f169db]
20 ./global/testing/pgtest.x(+0xfa8e) [0x55c644e93a8e]
21 ./global/testing/pgtest.x(+0x111e5) [0x55c644e951e5]
22 ./global/testing/pgtest.x(+0x117c3) [0x55c644e957c3]
23 ./global/testing/pgtest.x(+0x4d05) [0x55c644e88d05]
24 /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7fc32aebcca8]
25 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fc32aebcd65]
26 ./global/testing/pgtest.x(+0x4d31) [0x55c644e88d31]
=================================
Program received signal SIGABRT: Process abort signal.
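
As a reading aid for the backtrace above, here is a small sketch that parses frames in this format and collapses them into the sequence of shared objects the call passes through. It is not part of the build log: the names FRAME_RE, parse_frames, library_sequence and SAMPLE are hypothetical, and the regular expression only assumes the frame layout seen in this log (index, object path, symbol+offset in parentheses, address in brackets). SAMPLE is a handful of frames copied from the trace.

```python
import re

# Assumed frame layout, taken from the log above, e.g.:
#   3 /lib/x86_64-linux-gnu/libucp.so.0(ucp_request_release+0x1a7) [0x7fc32aab7487]
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+?)\(([^)]*)\)\s+\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_frames(text):
    """Return a list of (index, object_path, symbol_or_None), one per frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # skip headers, separators and other non-frame lines
        index, path, location, _addr = m.groups()
        # location is "symbol+0xoff", or "+0xoff" when no symbol was resolved
        symbol = location.split("+", 1)[0] or None
        frames.append((int(index), path, symbol))
    return frames


def library_sequence(frames):
    """Collapse consecutive frames from the same object into a single entry."""
    seq = []
    for _index, path, _symbol in frames:
        name = path.rsplit("/", 1)[-1]
        if not seq or seq[-1] != name:
            seq.append(name)
    return seq


# A few frames copied verbatim from the backtrace above.
SAMPLE = """\
3 /lib/x86_64-linux-gnu/libucp.so.0(ucp_request_release+0x1a7) [0x7fc32aab7487]
4 /lib/x86_64-linux-gnu/libmpich.so.12(+0x349457) [0x7fc32b3ff457]
6 /lib/x86_64-linux-gnu/libucp.so.0(ucp_am_rndv_process_rts+0x17f) [0x7fc32aa9fb9f]
14 /lib/x86_64-linux-gnu/libmpich.so.12(MPI_Get_accumulate+0x2fc) [0x7fc32b235e3c]
15 ./global/testing/pgtest.x(+0xcd40f) [0x55c644f5140f]
"""

if __name__ == "__main__":
    print(library_sequence(parse_frames(SAMPLE)))
```

Run on the full trace, this kind of summary makes it easier to see that the failing ucp_request_release call is reached from libmpich code that was itself called back from libucp while MPI_Get_accumulate was still inside ucp_am_send_nbx. Whether that re-entry is what trips the owner-thread assertion is not established by the log alone.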