Bug#918369: FTBFS for armhf on arm64, fails MPI-based tests
Steve McIntyre
steve at einval.com
Tue Jan 8 14:49:13 GMT 2019
On Tue, Jan 08, 2019 at 02:12:06PM +0000, Steve McIntyre wrote:
>On Tue, Jan 08, 2019 at 10:42:01AM +0000, Alastair McKinstry wrote:
>>Hi Steve
>>
>>On 05/01/2019 15:40, Steve McIntyre wrote:
>>> Package: src:p4est
>>> Version: 1.1-5
>>> Severity: important
>>>
>>> I've been doing a full rebuild of the Debian archive, building all
>>> source packages targeting armel and armhf using arm64 hardware. We are
>>> planning in future to move all of our 32-bit armel/armhf builds to
>>> using arm64 machines, so this rebuild is to identify packages that
>>> might have problems with this configuration.
>>>
>>You submitted this and a number of other MPI-test based bugs. I believe they
>>may be due to #918157, a regression in pmix that breaks MPI on 32-bit archs.
>>
>>I've uploaded a new openmpi (3.1.3-8) that uses its internal pmix until a fix
>>is found for #918157, so these should now work. Can you re-test ?
>
>Sure, no problem. I've just queued all those builds to rebuild again
>now. I'll let you know how that goes.
Not looking good, I'm afraid. The errors are different, but the tests
are still failing. From the dune-common rebuild:
...
1/91 Test #1: indexsettest ........................... Passed 0.01 sec
Start 2: remoteindicestest
2/91 Test #2: remoteindicestest ......................***Failed 0.05 sec
[mjolnir:16218] mca_base_component_repository_open: unable to open mca_pmix_pmix2x: /usr/lib/arm-linux-gnueabihf/openmpi/lib/openmpi3/mca_pmix_pmix2x.so: undefined symbol: OPAL_MCA_PMIX2X_PMIx_Get_version (ignored)
[mjolnir:16218] [[42547,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 325
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_pmix_base_select failed
--> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[mjolnir:16217] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 532
[mjolnir:16217] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[mjolnir:16217] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
...
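
When going through many rebuild logs like the one above, it can help to pull out just the failed tests and any missing symbols. What follows is only a minimal sketch, assuming the ctest and Open MPI output format seen in this log; the names summarise_log, FAILED_TEST and UNDEFINED_SYMBOL are hypothetical and not part of any Debian or Open MPI tool.

```python
import re
import sys

# Assumed ctest result line format, e.g.
# "2/91 Test #2: remoteindicestest ......................***Failed 0.05 sec"
FAILED_TEST = re.compile(r"Test\s+#(\d+):\s+(\S+)\s+\.+\s*\*\*\*Failed")

# Assumed loader message format, e.g.
# "...so: undefined symbol: OPAL_MCA_PMIX2X_PMIx_Get_version (ignored)"
UNDEFINED_SYMBOL = re.compile(r"undefined symbol:\s+(\S+)")


def summarise_log(text):
    """Return (failed_tests, missing_symbols) found in a build log.

    failed_tests is a list of (test_number, test_name) tuples;
    missing_symbols is a list of symbol names in first-seen order,
    with repeats dropped.
    """
    failed = [(int(num), name) for num, name in FAILED_TEST.findall(text)]
    symbols = list(dict.fromkeys(UNDEFINED_SYMBOL.findall(text)))
    return failed, symbols


if __name__ == "__main__":
    # Hypothetical usage: python3 summarise.py < build.log
    failed, symbols = summarise_log(sys.stdin.read())
    for num, name in failed:
        print(f"FAILED test #{num}: {name}")
    for sym in symbols:
        print(f"undefined symbol: {sym}")
```

A missing symbol reported alongside the first failed test, as here, suggests the failure is in loading the MPI component rather than in the test itself.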
--
Steve McIntyre, Cambridge, UK. steve at einval.com
"Because heaters aren't purple!" -- Catherine Pitt
More information about the debian-science-maintainers mailing list