[Debian-med-packaging] openmpi/3.1.1.real-3 breaks 10 autopkgtests in a similar fashion: bug?

Alastair McKinstry alastair.mckinstry at sceal.ie
Wed Jul 18 11:49:37 BST 2018


Dear all,

This regression appears to be an issue with openmpi / pmix.

Note that openmpi now uses an external pmix library, and there was a
regression: openmpi 3.1.1 was supposed to work with pmix 3.0.0, but
uploading the new pmix broke openmpi. That was fixed in
openmpi-3.1.1-3.

Unfortunately this library (or rather libopenmpi3) has an unversioned
dependency on libpmix2, and judging from the tests below it was pulling
in pmix 2.1.2 rather than the latest 3.0.0.
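
To illustrate what "unversioned" means here, a minimal Python sketch
follows. It is not part of any Debian tooling: the function name and the
example Depends string are hypothetical, made up for illustration. It
lists the entries of a Depends field that carry no version constraint,
which are the ones that can be satisfied by an older library already
present in testing.

```python
def unversioned_deps(depends: str) -> list[str]:
    """Return package names in a Depends field that have no version constraint."""
    result = []
    # Clauses are comma-separated; alternatives within a clause use "|".
    for clause in depends.split(","):
        for alt in clause.split("|"):
            alt = alt.strip()
            if not alt:
                continue
            # A versioned relation looks like "name (>= 1.2)".
            if "(" not in alt:
                # Drop any ":arch" qualifier or "[arch]" restriction.
                result.append(alt.split(":")[0].split()[0])
    return result


# Hypothetical Depends line, not copied from the real libopenmpi3 package.
example = "libc6 (>= 2.14), libpmix2, libhwloc5 (>= 1.11.0)"
print(unversioned_deps(example))
```

With a constraint such as "libpmix2 (>= 3.0.0)" the entry would no
longer be reported, and the older pmix could not satisfy it.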

I'm uploading a new release of openmpi3 to fix this; it works locally
for me (including building dune-grid), but I don't have a local install
of autopkgtest to work against, so I will await the CI results.

(How do I run autopkgtests locally? I build using pbuilder.)

regards
Alastair



On 17/07/2018 20:29, Paul Gevers wrote:
> Dear maintainers,
>
> With a recent upload of openmpi the autopkgtest of
> dune-grid
> esys-particle
> examl
> gerris
> gmsh
> lammps
> liggghts
> mrbayes
> phyml
> ray
> started to fail in testing. See:
> https://qa.debian.org/excuses.php?package=openmpi and links therein.
>
> Without knowing openmpi, it appears that all these failing cases have
> the same error (example from dune-grid below). Maybe it would be useful
> if there is communication between the maintainers of the packages
> involved on how to fix/improve this situation.
>
> Currently this regression is delaying the migration to testing by 10
> days. It seems to me that the maintainers of the reverse depends would
> want to file a bug (one!) to prevent migration if the solution isn't
> clear by the end of the current age period.
>
> More information about this email and the reason of it can be found on
> https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation
>
> Paul
>
> https://ci.debian.net/data/autopkgtest/testing/amd64/d/dune-grid/624176/log.gz
>
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>    getting local rank failed
>    --> Returned value No permission (-17) instead of ORTE_SUCCESS
>

-- 
Alastair McKinstry, <alastair at sceal.ie>, <mckinstry at debian.org>, https://diaspora.sceal.ie/u/amckinstry
Commander Vimes didn’t like the phrase “The innocent have nothing to fear,”
  believing the innocent had everything to fear, mostly from the guilty but in the longer term
  even more from those who say things like “The innocent have nothing to fear.”
  - T. Pratchett, Snuff



