Bug#918364: FTBFS for armhf on arm64, fails MPI-based tests
Steve McIntyre
steve at einval.com
Sat Jan 5 14:43:10 GMT 2019
Package: src:dune-pdelab
Version: 2.6~20180302-1
Severity: important
Hi!
I've been doing a full rebuild of the Debian archive, building all
source packages targeting armel and armhf on arm64 hardware. We are
planning to move all of our 32-bit armel/armhf builds onto arm64
machines in the future, so this rebuild is intended to identify
packages that might have problems with that configuration.
I've tried to build dune-pdelab for armhf on top of arm64, and it's
failing some of its tests. It looks like a problem with MPI_Init() at
various points, but I don't know enough to do even basic debugging
here - sorry!
(A similar bug has shown up when building dune-common, and it has
been suggested that the two might share a common root cause in
#918157 against openmpi.)
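
For anyone trying to narrow this down, the failure should be
reproducible outside the dune-pdelab test suite with a minimal MPI
program along the lines of the sketch below (the file name is just a
placeholder):

  /* mpi-hello.c (placeholder name) - minimal check that MPI_Init()
     works at all in the armhf-on-arm64 environment.

     Build and run with two ranks, matching the failing *-mpi-2 tests:
       mpicc -o mpi-hello mpi-hello.c
       mpirun -np 2 ./mpi-hello
  */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);   /* the call that aborts in the log below */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &len);
      printf("rank %d of %d on %s\n", rank, size, name);
      MPI_Finalize();
      return 0;
  }

If that aborts the same way with -np 2, the problem is clearly in the
MPI stack rather than in dune-pdelab itself.
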
...
      Start 40: testpoisson-periodic-2d-deg1-dg0-parallel
40/95 Test #40: testpoisson-periodic-2d-deg1-dg0-parallel .............   Passed    0.18 sec
      Start 41: testpoisson-periodic-2d-deg1-dg0-parallel-mpi-2
41/95 Test #41: testpoisson-periodic-2d-deg1-dg0-parallel-mpi-2 .......***Failed    0.16 sec
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[28203,1],1]) is on host: maul
Process 2 ([[28203,1],0]) is on host: maul
BTLs attempted: self tcp vader
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[maul:25992] *** An error occurred in MPI_Init
[maul:25992] *** reported by process [1848311809,1]
[maul:25992] *** on a NULL communicator
[maul:25992] *** Unknown error
[maul:25992] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[maul:25992] *** and potentially your MPI job)
      Start 42: testpoisson-periodic-3d-deg1-dg0-parallel
42/95 Test #42: testpoisson-periodic-3d-deg1-dg0-parallel .............   Passed    0.20 sec
...
Full build log is online at
https://www.einval.com/debian/arm/rebuild-logs/armhf/FAIL/dune-pdelab_2.6~20180302-1_armhf.log
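
Following the suggestions in the error message above, it might also
help to turn up the BTL verbosity and confirm exactly which Open MPI
build the armhf chroot is actually using -- again just a sketch, with
a placeholder file name:

  /* mpi-version.c (placeholder name) - print the MPI library
     identification string, to confirm which Open MPI build is in
     use in the armhf environment (relevant if this shares a root
     cause with #918157).

     Per the message above, a verbose run shows which BTL plugins
     were considered and/or discarded:
       mpirun -np 2 --mca btl_base_verbose 100 ./mpi-version
  */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      char version[MPI_MAX_LIBRARY_VERSION_STRING];
      int len;

      MPI_Init(&argc, &argv);
      MPI_Get_library_version(version, &len);
      printf("%s\n", version);
      MPI_Finalize();
      return 0;
  }

The output of ompi_info in the chroot should likewise list which
BTL/MTL plugins are available, as the error message suggests.
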
-- System Information:
Debian Release: 9.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)