[Debian-med-packaging] Bug#896492: examl: autopkgtest started failing for unclear reasons, but old test logs contain error code 255 as well

Paul Gevers elbrus at debian.org
Sat Apr 21 18:46:00 BST 2018


Source: examl
Version: 3.0.20-1
Severity: wishlist
User: debian-ci at lists.debian.org
Usertags: issue

In preparation for using autopkgtest results for unstable-to-testing
migration, I am currently running autopkgtests for migration candidates
in testing. While investigating the results for openmpi, I looked into
the failures of examl. The autopkgtest in unstable also started to fail
recently (which is not a total surprise as openmpi started a transition,
hence the severity wishlist), but the output (copied below) doesn't make
clear why. However, I noticed that all the old PASS results contain
errors as well, so I wonder whether those runs should have failed too.
The point of error doesn't always seem to be the same, so I suspect
races and/or hardware/setup requirements that aren't always fulfilled on
the ci.debian.net infrastructure.

Message often spotted in PASSing tests:
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[18372,1],0]
  Exit code:    255
-------------------------------------------------------

Could you please investigate what is going on? If nothing else, could
you make the output clearer for a bystander? Error codes that don't
result in failure are confusing.
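
To illustrate what clearer output could look like, here is a minimal
sketch. It is not the actual examl test script: the helper names
run_step and run_all are hypothetical, and the commands passed in are
placeholders for whatever the test really runs. The idea is only that
every step reports its own exit code and that any non-zero code makes
the whole run fail.

```python
import subprocess
import sys


def run_step(description, command):
    """Run one test step and report its exit status explicitly.

    Hypothetical helper: returns the exit code of `command` so the
    caller can decide what to do with it.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        # Say which step failed and with which code, on stderr.
        print(f"FAILED: {description} (exit code {result.returncode})",
              file=sys.stderr)
    else:
        print(f"ok: {description}")
    return result.returncode


def run_all(steps):
    """Run every (description, command) pair.

    Returns 1 if any step exited non-zero, else 0, so a failing step
    can no longer end up in a PASS result.
    """
    failed = [desc for desc, cmd in steps if run_step(desc, cmd) != 0]
    return 1 if failed else 0
```

A test wrapper could then end with sys.exit(run_all(steps)), where
steps lists the parser and mpirun invocations, so that an exit code
such as 255 shows up as a named failure instead of being buried in the
log.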

Feel free to ask for my help to clarify the situation.

Paul

¹ https://ci.debian.net/packages/e/examl/

autopkgtest [15:00:53]: test run-unit-test: [-----------------------
gappyness: 0.074048
Pattern compression: ON

Alignment has 51 completely undetermined sites that will be
automatically removed from the binary alignment file



Your alignment has 628 unique patterns


Under CAT the memory required by ExaML for storing CLVs and tip vectors
will be
1015476 bytes
991 kiloBytes
0 MegaBytes
0 GigaBytes


Under GAMMA the memory required by ExaML for storing CLVs and tip
vectors will be
3969588 bytes
3876 kiloBytes
3 MegaBytes
0 GigaBytes

Please note that, these are just the memory requirements for doing
likelihood calculations!
To be on the safe side, we recommend that you execute ExaML on a system
with twice that memory.


Binary and compressed alignment file written to file 49.unpartitioned.binary

Parsing completed, exiting now ...

gappyness: 0.074048
Pattern compression: ON

Alignment has 51 completely undetermined sites that will be
automatically removed from the binary alignment file



Your alignment has 642 unique patterns


Under CAT the memory required by ExaML for storing CLVs and tip vectors
will be
1038114 bytes
1013 kiloBytes
0 MegaBytes
0 GigaBytes


Under GAMMA the memory required by ExaML for storing CLVs and tip
vectors will be
4058082 bytes
3962 kiloBytes
3 MegaBytes
0 GigaBytes

Please note that, these are just the memory requirements for doing
likelihood calculations!
To be on the safe side, we recommend that you execute ExaML on a system
with twice that memory.


Binary and compressed alignment file written to file 49.partitioned.binary

Parsing completed, exiting now ...

Run MPI on 2 of 2 available processors
Use examl with AVX support and 2 processors
[examl-1523718023:01317] mca_base_component_repository_open: unable to
open mca_pmix_pmix2x: libpmix.so.2: cannot open shared object file: No
such file or directory (ignored)
[examl-1523718023:01317] [[732,0],0] ORTE_ERROR_LOG: Not found in file
ess_hnp_module.c at line 649
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
autopkgtest [15:00:53]: test run-unit-test: -----------------------]
autopkgtest [15:00:54]: test run-unit-test:  - - - - - - - - - - results
- - - - - - - - - -
run-unit-test        FAIL non-zero exit status 1
