Bug#816101: petsc: FTBFS on mipsel - broken openmpi breaks petsc build

Drew Parsons dparsons at debian.org
Wed Apr 13 10:59:22 UTC 2016


reopen 816101
found 816101 3.6.3.dfsg2-4
thanks

A workaround for openmpi has now been applied (1.10.2-12+b1) enabling
petsc to build on mips64el (and mips).

But the petsc build still fails on mipsel.

The log does not show how the failure occurs.  The build itself
completes, and the test for mipsel-linux-gnu-real-debug (i.e. for the
debug package) passes.

The failure happens during the test for mipsel-linux-gnu-real (the main
build version).  It passes the first test using 1 MPI process.  It
fails during the second test using 2 MPI processes. The test is ex19 in
petsc-3.6.3.dfsg2/src/snes/examples/tutorials.

I can reproduce the problem in a schroot on etler.debian.org.  I first
build the library (debian/rules build), then run the tests by hand:

$ cd ~/petsc/petsc-3.6.3.dfsg2/src/snes/examples/tutorials
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib  make clean
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib  make ex19
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib  mpirun -n 1 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres

This is the first test (1 process), which succeeds. The second test
with two processes simply uses -n 2:
$ PETSC_DIR=/home/dparsons/petsc/petsc-3.6.3.dfsg2 PETSC_ARCH=mipsel-linux-gnu-real LD_LIBRARY_PATH=:/home/dparsons/petsc/petsc-3.6.3.dfsg2/mipsel-linux-gnu-real/lib  mpirun -n 2 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres

This test doesn't "fail" as such; it just never completes. It is
presumably caught in a deadlock.  I think the failure seen in the
automated build is a timeout: apparently the buildd halts the build
after 5 hours.
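
When reproducing by hand, the hang could be made to show up as a quick,
distinct failure rather than an open-ended wait.  The following is only
a sketch: it assumes GNU coreutils timeout is available in the schroot,
run_limited is a hypothetical helper name, and the 600 second limit in
the usage comment is an arbitrary choice.

```shell
# Sketch: run a command under a time limit so that a hang is reported
# explicitly. run_limited is a hypothetical helper, not part of petsc.
run_limited() {
    limit=$1; shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with status 124 when the time limit is reached.
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Possible use with the two-process test (same directory and environment
# variables as in the commands above; 600 is an assumed limit):
# run_limited 600 mpirun -n 2 ./ex19 -da_refine 3 -pc_type mg -ksp_type fgmres
```

A return status of 124 then marks a run that was cut off by the limit,
as opposed to one that exited with an error of its own.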

The strange thing is that this problem only happens for the normal real
build (with mipsel-linux-gnu-real).

The other three build permutations (real-debug, complex and
complex-debug) run the same 2-process test just fine.
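
One way to check the deadlock guess would be to attach a debugger to the
two hung ex19 processes and see where each is waiting.  This is a sketch
only: it assumes gdb and pgrep are installed in the schroot and that
attaching to one's own processes is permitted there.  dump_stacks is a
hypothetical helper name, and the GDB variable exists just so the
debugger command can be swapped out for a dry run.

```shell
# Sketch: print a backtrace of all threads for every running process
# with the given name. dump_stacks is a hypothetical helper.
dump_stacks() {
    name=$1
    for pid in $(pgrep -x "$name"); do
        echo "=== $name pid $pid ==="
        # -batch makes gdb exit after running the -ex command.
        "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
    done
}

# Possible use from a second shell while the -n 2 run is stuck:
# dump_stacks ex19
```

Without debug symbols the backtraces from the non-debug build may be
sparse, but the MPI call each process is blocked in may still be
visible.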

Any ideas what to do next?  The openmpi bug (#818909) still has some
unfinished business with chrpath on the mips architectures, which is
needed before the workaround applied in 1.10.2-12 can be removed. Could
that workaround cause a deadlock in one petsc build but not the others?

Drew


