Bug#944769: python3-h5py fails to import if offline due to apparent MPI failure

Thibaut Paumard thibaut at debian.org
Fri Nov 15 10:59:34 GMT 2019


On 15/11/2019 at 11:21, Thibaut Paumard wrote:
> There are ways to detect whether a job is running under MPI (at least as
> far as OpenMPI is concerned; I did not find the equivalent for MPICH).
> 
> What I do in my own code is check for the environment variable
> OMPI_COMM_WORLD_SIZE. If it exists, I know I'm running under OpenMPI and
> I set up the MPI environment accordingly. If it does not, I proceed with
> the serial version of my code.

I just tested, and MPICH also sets three environment variables (very few
compared to what OpenMPI sets). I did not find any documentation on those
variables, so I assume they may change without prior notice. It looks like
the two implementations set two common environment variables (under
different names):

OpenMPI                      MPICH
OMPI_COMM_WORLD_SIZE         MPI_LOCALNRANKS
OMPI_COMM_WORLD_LOCAL_RANK   MPI_LOCALRANKID

I mention that because MPICH is still the default MPI implementation on
m68k and h5py is linked with it on this platform.

So h5py could check e.g. OMPI_COMM_WORLD_SIZE and MPI_LOCALNRANKS and
only import mpi4py if one of the two is set.
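For illustration, here is a minimal sketch of that check. The variable
names are the two listed above; the helper name launched_by_mpirun and the
serial fallback are only assumptions about how such import logic could be
wired, not h5py's actual code:

import os

# Launcher-set variables: OMPI_COMM_WORLD_SIZE for OpenMPI,
# MPI_LOCALNRANKS for MPICH (see the table above).
_MPI_LAUNCHER_VARS = ("OMPI_COMM_WORLD_SIZE", "MPI_LOCALNRANKS")

def launched_by_mpirun():
    """Return True if one of the known launcher variables is set."""
    return any(name in os.environ for name in _MPI_LAUNCHER_VARS)

if launched_by_mpirun():
    # Only pull in mpi4py (and hence MPI_Init) when mpirun started the job.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
else:
    comm = None  # serial fallback: never touch MPI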

Note that OpenMPI is implemented in such a way that it is possible to
initialize it from within an application. So it's technically possible
for libhdf5 to use OpenMPI parallelization even if the job has not been
started by mpirun.openmpi.
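For example, assuming mpi4py is built against OpenMPI, a script like the
following can be run directly with python3, without mpirun.openmpi, and
still initializes MPI as a singleton (one rank, world size 1):

# Importing mpi4py calls MPI_Init; with OpenMPI this also works when the
# job was not started by mpirun (singleton initialization).
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("world size:", comm.Get_size(), "rank:", comm.Get_rank())
# Launched without mpirun under OpenMPI, this prints world size 1, rank 0.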

On the other hand, MPICH jobs *must* be started by mpirun.mpich.

Kind regards, Thibaut.


