[Debian-med-packaging] Bug#896886: openmpi: upstream version 3.0.1 makes lots of autopkgtests flaky

Paul Gevers elbrus at debian.org
Wed Apr 25 13:12:05 BST 2018


Source: openmpi
Version: 3.0.1-1
Severity: normal
User: debian-ci at lists.debian.org
Usertags: breaks
Control: affects -1 src:lammps
Control: affects -1 src:esys-particle
Control: affects -1 src:liggghts
Control: affects -1 src:gerris
Control: affects -1 src:gmsh
Control: affects -1 src:ray

With the upload of upstream version 3.0.1 of openmpi to Debian, the
autopkgtest of lammps¹, esys-particle², liggghts³, gerris⁴, gmsh⁵, ray⁶
started to regularly fail with an error similar to the one copied below.
(ray is also seeing another issue)

Unfortunately, the transition and some issues with the CI
infrastructure were intermixed (those give different errors though), so not
all failures are due to this issue. I also cannot exclude that
all these issues are due to packages mixing different versions of
libopenmpi* during the transition. However, if they can't be mixed, I
think openmpi should be blocked from migrating to testing until the
transition is finished (I thought that, as a library, it would be allowed
to migrate before all reverse dependencies are rebuilt if it doesn't
break installability, as the old library will stay in the archive until
all reverse dependencies are rebuilt and migrated).

It has been pointed out in a previous issue that autopkgtests may be
sensitive to the hardware they run on, so I tried to check which workers
pass and which workers fail with this error (for most, I also note the
version of openmpi that was involved). Unfortunately, there are workers
where tests both pass and fail (7, 8).

I hope you can investigate the issue.

Paul

¹ https://ci.debian.net/packages/l/lammps
² https://ci.debian.net/packages/e/esys-particle
³ https://ci.debian.net/packages/l/liggghts
⁴ https://ci.debian.net/packages/g/gerris
⁵ https://ci.debian.net/packages/g/gmsh
⁶ https://ci.debian.net/packages/r/ray

fail:
worker#5 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/g/gmsh/204541/log.gz
worker#7 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/g/gerris/204539/log.gz
worker#3 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/r/ray/204537/log.gz
worker#3 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/204527/log.gz
worker#2
https://ci.debian.net/data/autopkgtest/testing/amd64/l/lammps/201744/log.gz
worker#3 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/g/gerris/201739/log.gz
worker#1 -8
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/201735/log.gz
worker#6
https://ci.debian.net/data/autopkgtest/testing/amd64/l/lammps/189185/log.gz
worker#7 -6
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/189181/log.gz
worker#2 -6
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/liggghts/188523/log.gz
worker#8 -6
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/184333/log.gz
worker#3 -6
https://ci.debian.net/data/autopkgtest/unstable/amd64/e/esys-particle/180310/log.gz
worker#3
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/173837/log.gz

pass:
worker#4 -8
https://ci.debian.net/data/autopkgtest/unstable/amd64/g/gerris/205247/log.gz
worker#4
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/202788/log.gz
worker#9 -8
https://ci.debian.net/data/autopkgtest/unstable/amd64/g/gmsh/202759/log.gz
worker#9
https://ci.debian.net/data/autopkgtest/testing/amd64/l/lammps/195323/log.gz
worker#7 -6
https://ci.debian.net/data/autopkgtest/testing/amd64/g/gerris/195319/log.gz
worker#1 -6
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/190183/log.gz
worker#8 -6
https://ci.debian.net/data/autopkgtest/unstable/amd64/g/gmsh/189823/log.gz
worker#8
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/185784/log.gz
worker#10
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/180338/log.gz
worker#8
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/177294/log.gz

transition issue:
worker#1
https://ci.debian.net/data/autopkgtest/testing/amd64/l/lammps/201064/log.gz
worker#7 -7
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/201044/log.gz
worker#7
https://ci.debian.net/data/autopkgtest/testing/amd64/l/lammps/196312/log.gz
worker#9 -7
https://ci.debian.net/data/autopkgtest/testing/amd64/e/esys-particle/196303/log.gz
worker#7
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/168795/log.gz
worker#9
https://ci.debian.net/data/autopkgtest/unstable/amd64/l/lammps/156573/log.gz
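
The cross-check between the "fail:" and "pass:" lists above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the worker entries are copied from the two lists (log URLs omitted), while the names FAIL, PASS, ENTRY, parse and workers are made up for this sketch, and the "-N" suffix is treated simply as the openmpi version note attached to an entry.

```python
import re

# Worker entries copied from the "fail:" and "pass:" lists (URLs omitted).
FAIL = ("worker#5 -8, worker#7 -8, worker#3 -8, worker#3 -8, worker#2, "
        "worker#3 -8, worker#1 -8, worker#6, worker#7 -6, worker#2 -6, "
        "worker#8 -6, worker#3 -6, worker#3")
PASS = ("worker#4 -8, worker#4, worker#9 -8, worker#9, worker#7 -6, "
        "worker#1 -6, worker#8 -6, worker#8, worker#10, worker#8")

# "worker#N" optionally followed by a " -V" version note.
ENTRY = re.compile(r"worker#(\d+)(?: (-\d+))?")


def parse(text):
    """Return a set of (worker, version_note) pairs; note is None if absent."""
    return {(int(w), v or None) for w, v in ENTRY.findall(text)}


def workers(pairs):
    """Reduce (worker, version_note) pairs to the set of worker numbers."""
    return {w for w, _ in pairs}


failed, passed = parse(FAIL), parse(PASS)

# Workers that appear in both lists, regardless of the version note.
mixed_any = sorted(workers(failed) & workers(passed))

# Workers with both a pass and a fail under the same noted version.
mixed_same_version = sorted({w for w, v in failed & passed if v is not None})

print("both lists:", mixed_any)
print("both lists, same version note:", mixed_same_version)
```

Restricting the comparison to entries with the same version note gives workers 7 and 8, as stated above. Worker 1 also shows up in both lists, but with different version notes (-8 for the failure, -6 for the pass).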


It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[lammps-1523947718:1047] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able
to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[lammps-1523947718:1048] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able
to guarantee that all other processes were killed!
