[Debichem-devel] Bug#1006788: Bug#1006788: bagel: autopkgtest failure with new mpich.

Michael Banck mbanck at debian.org
Sun Nov 27 09:46:23 GMT 2022


Hi,

On Wed, Aug 17, 2022 at 10:25:38PM +0200, Paul Gevers wrote:
> Control: severity -1 serious
> Control: retitle -1 autopkgtest fails on hosts with lots of RAM/cores
> 
> Hi,
> 
> On Sun, 3 Apr 2022 19:42:42 +0200 Michael Banck <mbanck at debian.org> wrote:
> > Hrm, it seems that test case passed now on the latest upload:
> > https://ci.debian.net/data/autopkgtest/unstable/amd64/b/bagel/20573831/log.gz
> > 
> > |Get:14 http://deb.debian.org/debian unstable/main amd64 libmpich12 amd64 4.0.1-1 [4,924 kB]
> > [...]
> > |running test case 'he3_svp_asd-dmrg'... PASSED.
> > 
> > So I'm a bit at a loss about what's going on here, perhaps that test
> > case really is just flakey.
> 
> Yes, this test looks flaky (I came here because it was blocking glibc). The
> good news is however, it seems related to the host that runs the test. I.e.
> the test fails on our beefy amd64 host (ci-worker13) with 64 cores and 256GB
> RAM, but seems to pass on the others.
> 
> The error on s390x is the same by the way (that has 10 cores and 32GB RAM).

I can reproduce this again on my developer (amd64) notebook.

If I downgrade mpich from 4.0.2 to 3.x, it passes fine:

|(unstable-amd64-sbuild)mba at curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii  libmpich12:amd64                   3.4.1-5                           amd64        Shared libraries for MPICH
|(unstable-amd64-sbuild)mba at curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh 
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed
|(unstable-amd64-sbuild)mba at curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii  libmpich12:amd64                   4.0.2-2                           amd64        Shared libraries for MPICH
|(unstable-amd64-sbuild)mba at curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh 
|running test case 'he3_svp_asd-dmrg'... FAILED.
|                 * broadcast                                   0.00
|[... 41 more identical "broadcast" lines ...]
|                 * dmrg block                                  0.00
|  >> ** ..             0.17
|
| ===== Starting sweeps =====
|
|  o convergence threshold: 1.0000e-08
|  iter state         sweep average     sweep range      dE average
|  ERROR: EXCEPTION RAISED:  dsyev/pdsyevd failed in Matrix
|1 tests failed

If I set BAGEL_NUM_THREADS as Graham suggested, it also passes, so I'll
upload that now:

|(unstable-amd64-sbuild)mba at curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ BAGEL_NUM_THREADS=4 ./debian/tests/testsuite.sh 
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed
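
For reference, here is a minimal sketch of how a test wrapper could apply the same workaround without hard-coding it. It keeps any BAGEL_NUM_THREADS the caller has already exported. Otherwise it uses the number of online cores, capped at 4. The helper name `bagel_threads` and the cap of 4 are illustrative assumptions: 4 is simply the value used in the run above, not a verified safe limit. This is not necessarily what the actual upload does.

```shell
#!/bin/sh
# Hypothetical helper (not part of bagel): choose a thread count for the test run.
# An explicitly exported BAGEL_NUM_THREADS always wins; otherwise use the
# number of online cores, capped at 4 (an assumed value taken from the manual
# run above, not from any known limit).
bagel_threads() {
    if [ -n "${BAGEL_NUM_THREADS:-}" ]; then
        echo "$BAGEL_NUM_THREADS"
        return 0
    fi
    cores=$(nproc 2>/dev/null || echo 1)   # fall back to 1 if nproc is unavailable
    if [ "$cores" -gt 4 ]; then
        echo 4
    else
        echo "$cores"
    fi
}

BAGEL_NUM_THREADS=$(bagel_threads)
export BAGEL_NUM_THREADS
```

Sourcing this before running ./debian/tests/testsuite.sh would keep the thread count from growing with the host's core count. That growth is the variable that seems to separate the failing CI hosts (ci-worker13 and the s390x box) from the passing ones. Using min(nproc, 4) rather than a flat 4 also avoids oversubscribing small machines.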


Michael
