Bug#1087800: pmix: apply upstream patch for bigendian alignment
Drew Parsons
dparsons at debian.org
Mon Nov 18 19:03:14 GMT 2024
Source: pmix
Version: 5.0.3-2
Severity: normal
Control: affects -1 src:mpi4py
Control: forwarded -1 https://github.com/openpmix/prrte/pull/2075
OpenMPI occasionally crashes via PMIX, especially on the less common
architectures, in ways that are hard to debug.
For instance, mpi4py currently FTBFS on s390x (and ppc64) with a test
failure in DPM:
[sbuild:00340] PMIX ERROR: PMIX_ERR_BAD_PARAM in file ../../../../../3rd-party/prrte/src/runtime/prte_data_server.c at line 270
[sbuild:00340] PMIX ERROR: PMIX_ERR_BAD_PARAM in file ../../../../../3rd-party/prrte/src/runtime/prte_data_server.c at line 270
[sbuild:00340] PRTE ERROR: Bad parameter in file ../../../../../3rd-party/prrte/src/runtime/prte_data_server.c at line 487
[sbuild:00340] PRTE ERROR: Bad parameter in file ../../../../../3rd-party/prrte/src/runtime/prte_data_server.c at line 487
testJoin (test_dynproc.TestDPM.testJoin) ... [sbuild:00344] *** Process received signal ***
[sbuild:00343] *** Process received signal ***
[sbuild:00343] Signal: Segmentation fault (11)
[sbuild:00343] Signal code: Address not mapped (1)
[sbuild:00343] Failing at address: (nil)
[sbuild:00344] Signal: Segmentation fault (11)
[sbuild:00344] Signal code: Address not mapped (1)
[sbuild:00344] Failing at address: (nil)
[sbuild:00343] [ 0] linux-vdso64.so.1(__kernel_rt_sigreturn+0x0) [0x3fffbbfe480]
[sbuild:00343] [ 1] /lib/s390x-linux-gnu/libmpi.so.40(ompi_dpm_connect_accept+0x4d4) [0x3ffba1f431c]
[sbuild:00343] [ 2] /lib/s390x-linux-gnu/libmpi.so.40(PMPI_Comm_join+0x270) [0x3ffba22cfd0]
[sbuild:00343] [ 3] /<<PKGBUILDDIR>>/.pybuild/cpython3_3.13/build/mpi4py/MPI.cpython-313-s390x-linux-gnu.so(+0x176990) [0x3ffba6f6990]
see
https://buildd.debian.org/status/fetch.php?pkg=mpi4py&arch=s390x&ver=4.0.1-3&stamp=1731894519&raw=0
https://github.com/mpi4py/mpi4py/issues/586
https://github.com/openpmix/openpmix/issues/3447
It seems to be a big-endian issue, and not many upstream developers are
able to test on big-endian systems.
For this issue, however, upstream suspects that prrte PR#2075 might fix
the problem. It's a small patch and should be safe to apply in any case.
Can we apply it and see if it fixes some of the s390x PMIX problems?
https://github.com/openpmix/prrte/pull/2075
https://patch-diff.githubusercontent.com/raw/openpmix/prrte/pull/2075.patch
--- a/src/runtime/prte_data_server.c
+++ b/src/runtime/prte_data_server.c
@@ -15,7 +15,7 @@
* Copyright (c) 2015-2020 Intel, Inc. All rights reserved.
* Copyright (c) 2017-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
- * Copyright (c) 2021-2022 Nanook Consulting. All rights reserved.
+ * Copyright (c) 2021-2024 Nanook Consulting All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -182,7 +182,8 @@ void prte_data_server(int status, pmix_proc_t *sender,
prte_data_object_t *data;
pmix_data_buffer_t *answer, *reply;
int rc, k;
- uint32_t ninfo, i;
+ size_t ninfo;
+ uint32_t i;
char **keys = NULL, *str;
bool wait = false;
int room_number;
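
For illustration only, here is a minimal sketch of why widening ninfo from
uint32_t to size_t could matter on a big-endian host. It assumes the count is
stored as a 64-bit value while the receiving variable is only 32 bits wide;
that is a guess at the mechanism, not something confirmed by the PR. The
helper names (store_count64, count_seen_as_u32, count_seen_as_u64) are
hypothetical and are not part of PMIx or PRRTE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an unpack routine that always stores a
 * 64-bit count at dest. This is NOT the PMIx API. */
static void store_count64(void *dest, uint64_t value)
{
    memcpy(dest, &value, sizeof value);
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}

/* What a 32-bit variable located at the start of the slot would hold
 * after the 64-bit store: only the first four bytes are read back. */
static uint32_t count_seen_as_u32(uint64_t value)
{
    unsigned char slot[sizeof(uint64_t)];
    uint32_t narrow;
    store_count64(slot, value);
    memcpy(&narrow, slot, sizeof narrow);
    return narrow;
}

/* What a 64-bit variable would hold: the full value on any host. */
static uint64_t count_seen_as_u64(uint64_t value)
{
    unsigned char slot[sizeof(uint64_t)];
    uint64_t wide;
    store_count64(slot, value);
    memcpy(&wide, slot, sizeof wide);
    assert(wide == value);
    return wide;
}
```

Under these assumptions, count_seen_as_u32(3) gives 3 on a little-endian host
but 0 on a big-endian one such as s390x, because the first four bytes of the
64-bit value are the high-order bytes there. count_seen_as_u64(3) gives 3 on
both. The sketch uses a full-width buffer to stay well defined; in real code a
64-bit store through a pointer to a 32-bit variable would also overwrite
adjacent memory.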