Bug#980033: libucx0: UCX ERROR rdma_create_event_channel failed: No such device

Drew Parsons dparsons at debian.org
Wed Jan 13 09:16:05 GMT 2021


Package: libucx0
Version: 1.10.0~rc1-2
Severity: serious
Justification: debci

Our next round of whack-a-mole comes from the new UCX.
pmix 4.0.0-3 seems to have fixed the pmix error from bug#979744.

The debci tests now report a problem with UCX, with the following versions:
  openmpi 4.1.0-5
  pmix 4.0.0-3
  ucx 1.10.0~rc1-2

The openmpi debci test at
https://ci.debian.net/data/autopkgtest/testing/arm64/o/openmpi/9650495/log.gz
reports:

autopkgtest [15:16:16]: test hello4: [-----------------------
[1610522176.588740] [ci-013-36a60f22:1417 :0]      rdmacm_cm.c:638  UCX  ERROR rdma_create_event_channel failed: No such device
[1610522176.588779] [ci-013-36a60f22:1417 :0]     ucp_worker.c:1432 UCX  ERROR failed to open CM on component rdmacm with status Input/output error
[ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
 node           0 : Hello world
autopkgtest [15:16:17]: test hello4: -----------------------]
autopkgtest [15:16:18]: test hello4:  - - - - - - - - - - results - - - - - - - - - -
hello4               FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
autopkgtest [15:16:18]: test hello4:  - - - - - - - - - - stderr - - - - - - - - - -
[ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
autopkgtest [15:16:18]: @@@@@@@@@@@@@@@@@@@@ summary
hello1               FAIL stderr: [ci-013-36a60f22:01292] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
hello2               FAIL stderr: [ci-013-36a60f22:01218] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
hello4               FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker



Other client applications fail with the same error.
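
For triaging other debci logs for the same failure, a small filter that pulls out only the UCX diagnostics may help. The following is a minimal sketch, not part of any Debian or UCX tooling: the names UCX_LINE, UcxMessage and ucx_messages are hypothetical, and the regular expression is inferred solely from the two UCX lines quoted above, so other UCX versions or log levels may format their output differently.

```python
import re
from typing import Iterator, NamedTuple

# Pattern inferred from the two UCX lines in the log above (an assumption,
# not a documented UCX log format):
#   [<epoch>] [<host>:<pid> :<tid>]   <file>:<line>  UCX  <LEVEL> <message>
UCX_LINE = re.compile(
    r"^\[(?P<ts>\d+\.\d+)\]\s+"
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\s*:(?P<tid>\d+)\]\s+"
    r"(?P<source>\S+):(?P<line>\d+)\s+"
    r"UCX\s+(?P<level>\w+)\s+(?P<text>.*)$"
)


class UcxMessage(NamedTuple):
    timestamp: float
    host: str
    pid: int
    source: str
    line: int
    level: str
    text: str


def ucx_messages(log: str) -> Iterator[UcxMessage]:
    """Yield one UcxMessage per log line that matches UCX_LINE."""
    for raw in log.splitlines():
        m = UCX_LINE.match(raw.strip())
        if m is None:
            continue  # autopkgtest framing, Open MPI output, program output
        yield UcxMessage(
            timestamp=float(m["ts"]),
            host=m["host"],
            pid=int(m["pid"]),
            source=m["source"],
            line=int(m["line"]),
            level=m["level"],
            text=m["text"],
        )


# Excerpt copied from the hello4 test output quoted in this report.
SAMPLE = """\
autopkgtest [15:16:16]: test hello4: [-----------------------
[1610522176.588740] [ci-013-36a60f22:1417 :0]      rdmacm_cm.c:638  UCX  ERROR rdma_create_event_channel failed: No such device
[1610522176.588779] [ci-013-36a60f22:1417 :0]     ucp_worker.c:1432 UCX  ERROR failed to open CM on component rdmacm with status Input/output error
[ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
 node           0 : Hello world
"""

if __name__ == "__main__":
    for msg in ucx_messages(SAMPLE):
        if msg.level == "ERROR":
            print(f"{msg.source}:{msg.line}: {msg.text}")
```

Run against the excerpt, this reports the rdmacm_cm.c and ucp_worker.c errors and skips the Open MPI line from pml_ucx.c, which uses a different prefix.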


-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-1-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libucx0 depends on:
ii  ibverbs-providers  33.0-1
ii  libbinutils        2.35.1-7
ii  libc6              2.31-9
ii  libibverbs1        33.0-1
ii  libnuma1           2.0.12-1+b1
ii  librdmacm1         33.0-1

libucx0 recommends no packages.

libucx0 suggests no packages.

-- no debconf information


