Bug#816101: petsc: FTBFS on mipsel - broken openmpi breaks petsc build
Drew Parsons
dparsons at debian.org
Wed Apr 13 10:59:22 UTC 2016
reopen 816101
found 816101 3.6.3.dfsg2-4
thanks
A workaround for openmpi has now been applied (1.10.2-12+b1) enabling
petsc to build on mips64el (and mips).
But the petsc build still fails on mipsel.
The log does not reveal any failure mode. The build itself completes
successfully, and the test for mipsel-linux-gnu-real-debug (i.e. for
the debug package) passes.
The failure happens during the test for mipsel-linux-gnu-real (the main
build version). It passes the first test using 1 MPI process. It
fails during the second test using 2 MPI processes. The test is ex19 in
petsc-3.6.3.dfsg2/src/snes/examples/tutorials.
I can reproduce the problem in a schroot on etler.debian.org. I first
build the library (debian/rules build), then run the tests by hand:
$ cd ~/petsc/petsc-3.6.3.dfsg2/src/snes/examples/tutorials
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib make clean
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib make ex19
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib mpirun -n 1 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres
This is the first test (1 process), which succeeds. The second test
with two processes simply uses -n 2:
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib mpirun -n 2 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres
This test doesn't "fail" as such; it just never completes, so it is
presumably caught in a deadlock. I think the failure seen in the
automated build is a timeout: apparently the buildd halts the build
after 5 hours.
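When reproducing by hand, the hang can be turned into a visible failure
instead of waiting indefinitely. A minimal sketch, assuming GNU
coreutils timeout is available in the schroot; run_with_limit is a
hypothetical helper name, not part of petsc or the Debian packaging:

```shell
# Run a command under a time limit and report how it ended.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s"
    else
        echo "exit status $status"
    fi
    return "$status"
}
```

With PETSC_DIR, PETSC_ARCH and LD_LIBRARY_PATH exported as in the
commands above, the second test could then be run as
"run_with_limit 600 mpirun -n 2 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres".
The 600 second limit is an arbitrary choice for illustration.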
The strange thing is that this problem only happens for the normal real
build (with mipsel-linux-gnu-real).
The other three build permutations (real-debug, complex and
complex-debug) run the same 2-process test just fine.
Any ideas what to do next? The openmpi bug (#818909) still has some
unfinished business: the chrpath problem on mips architectures has to
be resolved before the workaround applied in 1.10.2-12 can be removed.
Could this cause a deadlock in one petsc build but not the others?
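One possible next step would be to ask the hung ranks where they are
stuck. A rough sketch, assuming gdb and pgrep are installed in the
schroot and that attaching to one's own processes is permitted there;
dump_backtraces is a hypothetical helper name:

```shell
# Print a backtrace of every thread in each running process whose
# name is exactly "$1" (for example ex19 while the -n 2 test hangs).
dump_backtraces() {
    pids=$(pgrep -x "$1") || { echo "no process named $1"; return 1; }
    for pid in $pids; do
        echo "=== backtrace for PID $pid ==="
        # -batch makes gdb run the command, detach and exit
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
    done
}
```

Running "dump_backtraces ex19" from a second shell while the test hangs
should show which call each rank is waiting in. Without debug symbols
the frames may be hard to read.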
Drew