Bug#1003079: scotch v7 MPI dgord dgpart tests fail

Drew Parsons dparsons at debian.org
Mon Jan 3 12:15:23 GMT 2022


Package: scotch
Version: 7.0.0-1exp1
Severity: normal

autopkgtest fails for the new scotch v7 package on the MPI tests of
dgord and dgpart (the check_prog_dgord and check_prog_dgpart rules in
src/check/Makefile).

The failure is intermittent. The test usually fails with a segfault:

$ mpirun -n 3 --oversubscribe /usr/bin/dgord data/bump.grf /dev/null -Cu -vt
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x55aa224dfee0
[ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c910)[0x7f7bcf819910]
[ 1] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x154)[0x7f7bccb93be4]
[ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x4fc1)[0x7f7bccb93fc1]
[ 3] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x2c)[0x7f7bcf4d596c]
[ 4] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x55)[0x7f7bcf9f8075]
[ 5] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xcf)[0x7f7bcfa5174f]
[ 6] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_allgather_intra_bruck+0x140)[0x7f7bcfa4f9b0]
[ 7] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_dec_fixed+0x4a)[0x7f7bcc554aca]
[ 8] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Allgather+0x121)[0x7f7bcfa0be61]
[ 9] /lib/x86_64-linux-gnu/libptscotch-7.0.so(_SCOTCHhdgraphInduceList+0x22f)[0x7f7bcfb307ff]
[10] /lib/x86_64-linux-gnu/libptscotch-7.0.so(+0x49e02)[0x7f7bcfb30e02]
[11] /lib/x86_64-linux-gnu/libptscotch-7.0.so(+0x4a6f8)[0x7f7bcfb316f8]
[12] /lib/x86_64-linux-gnu/libptscotch-7.0.so(+0x6597b)[0x7f7bcfb4c97b]
[13] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8d80)[0x7f7bcf765d80]
[14] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f7bcf8d9b6f]
*** End of error message ***

but sometimes passes.
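
To put a number on "sometimes", the test could be repeated in a loop and
the failures counted. The following is only a sketch: count_failures is a
hypothetical helper name, not part of scotch or autopkgtest, and the run
count of 20 in the usage comment is arbitrary.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status (including death by SIGSEGV) counts as a failure.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Example with the command from this report (run from the directory that
# contains data/bump.grf):
# count_failures 20 mpirun -n 3 --oversubscribe /usr/bin/dgord data/bump.grf /dev/null -Cu -vt
```

Comparing the failure count before and after a change in build flags would
give a rough indication of whether the change helps.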

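The intermittent nature of the failure suggests a race condition, likely
related to the new dynamic thread management in v7. The backtrace is
consistent with this: the MPI_Allgather call in _SCOTCHhdgraphInduceList
(frames 8-9) is made from a thread started via libpthread (frames 13-14),
not from the main thread. Perhaps we need a different set of scotch
PTHREAD flags when compiling?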

Review further once v7 has passed the NEW queue to experimental.



-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.15.0-2-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages scotch depends on:
ii  libc6          2.33-1
ii  libscotch-7.0  7.0.0-1exp1

scotch recommends no packages.

scotch suggests no packages.

-- no debconf information


