Bug#1114459: pmix breaks slurm-wlm autopkgtest: causes test to time out
Paul Gevers
elbrus at debian.org
Fri Sep 5 20:11:38 BST 2025
Source: pmix, slurm-wlm
Control: found -1 pmix/6.0.0-3
Control: found -1 slurm-wlm/24.11.5-4
Severity: serious
Tags: sid trixie
User: debian-ci at lists.debian.org
Usertags: breaks needs-update
Dear maintainer(s),
With a recent upload of pmix the autopkgtest of slurm-wlm fails in
testing when that autopkgtest is run with the binary packages of pmix
from unstable. It times out after 2:47h, whereas it normally takes only
minutes. It passes when run with only packages from testing. In tabular
form:
                       pass            fail
pmix                   from testing    6.0.0-3
slurm-wlm              from testing    24.11.5-4
all others             from testing    from testing
I copied some of the output at the bottom of this report.
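The 2:47h figure can be cross-checked against the relative timestamps in
the log excerpt at the bottom of this report. The sketch below is only
that arithmetic: it assumes the 383s lines are the last output before
the test stalled and that the 10374s line is where autopkgtest reported
the timeout. The variable and function names are illustrative and not
part of any tool.

```python
from datetime import timedelta

# Relative timestamps (seconds since the start of the run), copied from
# the log excerpt. Assumption: 383s is the last output before the stall.
last_output_s = 383
timeout_reported_s = 10374


def stalled_for(start_s: int, end_s: int) -> timedelta:
    """Return the time elapsed between two relative log timestamps."""
    return timedelta(seconds=end_s - start_s)


stall = stalled_for(last_output_s, timeout_reported_s)
print(stall)  # 2:46:31, i.e. roughly 2:47h
```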
Currently this regression is blocking the migration of pmix to testing
[1]. Due to the nature of this issue, I filed this bug report against
both packages. Can you please investigate the situation and reassign the
bug to the right package?
More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation
Paul
[1] https://qa.debian.org/excuses.php?package=pmix
https://ci.debian.net/data/autopkgtest/testing/amd64/s/slurm-wlm/64130829/log.gz
383s ● slurmctld.service - Slurm controller daemon
383s Loaded: loaded (/usr/lib/systemd/system/slurmctld.service;
enabled; preset: enabled)
383s Active: active (running) since Fri 2025-09-05 03:51:07 UTC;
10s ago
383s Invocation: 612aa5cddd6f46faaa0671f23b1f95eb
383s Docs: man:slurmctld(8)
383s Main PID: 3312 (slurmctld)
383s Tasks: 88
383s Memory: 5.2M (peak: 9M)
383s CPU: 84ms
383s CGroup: /system.slice/slurmctld.service
383s ├─3312 /usr/sbin/slurmctld --systemd
383s └─3379 "slurmctld: slurmscriptd"
383s
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No
job state file (/var/lib/slurm/slurmctld/job_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error:
Could not open reservation state file
/var/lib/slurm/slurmctld/resv_state: No such file or directory
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error:
NOTE: Trying backup state save file. Reservations may be lost
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No
reservation state file (/var/lib/slurm/slurmctld/resv_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error:
Could not open trigger state file
/var/lib/slurm/slurmctld/trigger_state: No such file or directory
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error:
NOTE: Trying backup state save file. Triggers may be lost!
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No
trigger state file (/var/lib/slurm/slurmctld/trigger_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld:
read_slurm_conf: backup_controller not specified
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld:
Reinitializing job accounting state
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: Running
as primary controller
383s ● slurmd.service - Slurm node daemon
383s Loaded: loaded (/usr/lib/systemd/system/slurmd.service;
enabled; preset: enabled)
383s Active: active (running) since Fri 2025-09-05 03:51:07 UTC;
10s ago
383s Invocation: 91f78f38727e43a1b6b612ea9ff72296
383s Docs: man:slurmd(8)
383s Main PID: 3406 (slurmd)
383s Tasks: 12
383s Memory: 2.2M (peak: 3.8M)
383s CPU: 62ms
383s CGroup: /system.slice/slurmd.service
383s └─3406 /usr/sbin/slurmd --systemd
383s
383s Sep 05 03:51:07 ci-248-6c8bbe56 systemd[1]: Starting
slurmd.service - Slurm node daemon...
383s Sep 05 03:51:07 ci-248-6c8bbe56 (slurmd)[3406]: slurmd.service:
Referenced but unset environment variable evaluates to an empty string:
SLURMD_OPTIONS
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd:
_read_slurm_cgroup_conf: No cgroup.conf file (/etc/slurm/cgroup.conf),
using defaults
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]:
_read_slurm_cgroup_conf: No cgroup.conf file (/etc/slurm/cgroup.conf),
using defaults
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: error: Node
configuration differs from hardware: CPUs=1:64(hw) Boards=1:1(hw)
SocketsPerBoard=1:1(hw) CoresPerSocket=1:32(hw) ThreadsPerCore=1:2(hw)
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: CPU frequency
setting not configured for this node
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: slurmd
version 24.11.5 started
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: slurmd
started on Fri, 05 Sep 2025 03:51:07 +0000
383s Sep 05 03:51:07 ci-248-6c8bbe56 systemd[1]: Started slurmd.service
- Slurm node daemon.
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: CPUs=1
Boards=1 Sockets=1 Cores=1 Threads=1 Memory=257333 TmpDisk=256000
Uptime=1279 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
383s PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
383s test* up infinite 1 idle localhost
383s NODELIST NODES PARTITION STATE
383s localhost 1 test* idle
10374s autopkgtest [06:37:48]: ERROR: timed out on command "su -s
/bin/bash root -c set -e; exec
/tmp/autopkgtest-lxc.a9795u61/downtmp/wrapper.sh
--artifacts=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-artifacts
--chdir=/tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src
--env=AUTOPKGTEST_TESTBED_ARCH=amd64 --env=AUTOPKGTEST_TEST_ARCH=amd64
--env=DEB_BUILD_OPTIONS=parallel=64 --env=DEBIAN_FRONTEND=noninteractive
--env=LANG=C.UTF-8 --unset-env=LANGUAGE --unset-env=LC_ADDRESS
--unset-env=LC_ALL --unset-env=LC_COLLATE --unset-env=LC_CTYPE
--unset-env=LC_IDENTIFICATION --unset-env=LC_MEASUREMENT
--unset-env=LC_MESSAGES --unset-env=LC_MONETARY --unset-env=LC_NAME
--unset-env=LC_NUMERIC --unset-env=LC_PAPER --unset-env=LC_TELEPHONE
--unset-env=LC_TIME --script-pid-file=/tmp/autopkgtest_script_pid
--source-profile
--stderr=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-stderr
--stdout=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-stdout
--tmp=/tmp/autopkgtest-lxc.a9795u61/downtmp/autopkgtest_tmp
--env=AUTOPKGTEST_NORMAL_USER=debci --env=ADT_NORMAL_USER=debci
--make-executable=/tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src/debian/tests/mpi
-- /tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src/debian/tests/mpi"
(kind: test)
10374s autopkgtest [06:37:48]: test mpi