Bug#980033: libucx0: UCX ERROR rdma_create_event_channel failed: No such device

Alastair McKinstry alastair.mckinstry at sceal.ie
Wed Jan 13 10:07:42 GMT 2021


On 13/01/2021 09:16, Drew Parsons wrote:
> Package: libucx0
> Version: 1.10.0~rc1-2
> Severity: serious
> Justification: debci
>
> Our next round of whack-a-mole comes from the new UCX.
> pmix 4.0.0-3 seems to have fixed the pmix error from bug#979744.
>
> debci tests next report a problem with UCX, with
>    openmpi 4.1.0-5
>    pmix 4.0.0-3
>    ucx 1.10.0~rc1-2

Thanks. These appear to be unwanted warnings from UCX that no RDMA 
device is present.

I'm looking at silencing them via Open MPI configuration parameters.
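
As a rough sketch of what such a parameter change could look like: Open MPI reads MCA parameters from the environment as OMPI_MCA_<name>, and a leading "^" excludes a component from selection. The helper below (env_without_ucx is a hypothetical name) builds an environment that asks Open MPI to skip its UCX pml and osc components, so no UCP worker is created on hosts without RDMA hardware. This is only one possible approach under that assumption, not necessarily the change that ends up in the package.

```python
import os


def env_without_ucx(base=None):
    """Return a copy of the environment that excludes Open MPI's UCX components.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base is None else base)
    # OMPI_MCA_<name> is the environment form of an Open MPI MCA parameter.
    # A leading '^' means "every component except the ones listed".
    env["OMPI_MCA_pml"] = "^ucx"
    env["OMPI_MCA_osc"] = "^ucx"
    return env
```

The result can be passed as env= to subprocess.run() when launching mpirun. The same settings could instead be written as "pml = ^ucx" and "osc = ^ucx" lines in an openmpi-mca-params.conf file. Note that this avoids UCX entirely rather than only lowering its log level.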

Alastair



> The openmpi debci test at
> https://ci.debian.net/data/autopkgtest/testing/arm64/o/openmpi/9650495/log.gz
> reports:
>
> autopkgtest [15:16:16]: test hello4: [-----------------------
> [1610522176.588740] [ci-013-36a60f22:1417 :0]      rdmacm_cm.c:638  UCX  ERROR rdma_create_event_channel failed: No such device
> [1610522176.588779] [ci-013-36a60f22:1417 :0]     ucp_worker.c:1432 UCX  ERROR failed to open CM on component rdmacm with status Input/output error
> [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
>   node           0 : Hello world
> autopkgtest [15:16:17]: test hello4: -----------------------]
> autopkgtest [15:16:18]: test hello4:  - - - - - - - - - - results - - - - - - - - - -
> hello4               FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
> autopkgtest [15:16:18]: test hello4:  - - - - - - - - - - stderr - - - - - - - - - -
> [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
> autopkgtest [15:16:18]: @@@@@@@@@@@@@@@@@@@@ summary
> hello1               FAIL stderr: [ci-013-36a60f22:01292] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
> hello2               FAIL stderr: [ci-013-36a60f22:01218] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
> hello4               FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273  Error: Failed to create UCP worker
>
>
>
> Other client applications fail with the same error.
>
>
> -- System Information:
> Debian Release: bullseye/sid
>    APT prefers unstable
>    APT policy: (500, 'unstable'), (1, 'experimental')
> Architecture: amd64 (x86_64)
> Foreign Architectures: i386
>
> Kernel: Linux 5.10.0-1-amd64 (SMP w/8 CPU threads)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
> Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages libucx0 depends on:
> ii  ibverbs-providers  33.0-1
> ii  libbinutils        2.35.1-7
> ii  libc6              2.31-9
> ii  libibverbs1        33.0-1
> ii  libnuma1           2.0.12-1+b1
> ii  librdmacm1         33.0-1
>
> libucx0 recommends no packages.
>
> libucx0 suggests no packages.
>
> -- no debconf information
>
-- 
Alastair McKinstry, email: alastair at sceal.ie, matrix: @alastair:sceal.ie, phone: 087-6847928
Green Party Councillor, Galway County Council



More information about the debian-science-maintainers mailing list