Bug#980033: libucx0: UCX ERROR rdma_create_event_channel failed: No such device
Alastair McKinstry
alastair.mckinstry at sceal.ie
Wed Jan 13 10:07:42 GMT 2021
On 13/01/2021 09:16, Drew Parsons wrote:
> Package: libucx0
> Version: 1.10.0~rc1-2
> Severity: serious
> Justification: debci
>
> Our next round of whack-a-mole comes from the new UCX.
> pmix 4.0.0-3 seems to have fixed the pmix error from bug#979744.
>
> debci tests next report a problem with UCX, with
> openmpi 4.1.0-5
> pmix 4.0.0-3
> ucx 1.10.0~rc1-2
Thanks. These appear to be unwanted messages from UCX reporting that no
RDMA device is present.
I'm looking at silencing them via openmpi conf params.
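As a rough, untested sketch of what such parameters might look like (an
assumption for illustration, not necessarily the fix that ends up in the
package): Open MPI reads MCA parameters from environment variables
prefixed with OMPI_MCA_, and a leading ^ excludes the named component.
Excluding ucx from the pml and osc frameworks should stop Open MPI from
trying to create a UCP worker on hosts without an RDMA device, and
UCX_LOG_LEVEL is UCX's own log-level variable.

```shell
# Assumption: excluding the ucx components is acceptable on CI hosts
# that have no RDMA hardware. Not verified against the debci tests.
export OMPI_MCA_pml='^ucx'     # do not select the UCX point-to-point layer
export OMPI_MCA_osc='^ucx'     # do not select the UCX one-sided component
export UCX_LOG_LEVEL='fatal'   # raise UCX's logging threshold above "error"
```

The same settings could be written as "pml = ^ucx" style lines in an
openmpi-mca-params.conf file rather than exported per session.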
Alastair
> The openmpi debci test at
> https://ci.debian.net/data/autopkgtest/testing/arm64/o/openmpi/9650495/log.gz
> reports:
>
> autopkgtest [15:16:16]: test hello4: [-----------------------
> [1610522176.588740] [ci-013-36a60f22:1417 :0] rdmacm_cm.c:638 UCX ERROR rdma_create_event_channel failed: No such device
> [1610522176.588779] [ci-013-36a60f22:1417 :0] ucp_worker.c:1432 UCX ERROR failed to open CM on component rdmacm with status Input/output error
> [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
> node 0 : Hello world
> autopkgtest [15:16:17]: test hello4: -----------------------]
> autopkgtest [15:16:18]: test hello4: - - - - - - - - - - results - - - - - - - - - -
> hello4 FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
> autopkgtest [15:16:18]: test hello4: - - - - - - - - - - stderr - - - - - - - - - -
> [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
> autopkgtest [15:16:18]: @@@@@@@@@@@@@@@@@@@@ summary
> hello1 FAIL stderr: [ci-013-36a60f22:01292] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
> hello2 FAIL stderr: [ci-013-36a60f22:01218] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
> hello4 FAIL stderr: [ci-013-36a60f22:01417] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:273 Error: Failed to create UCP worker
>
> Other client applications fail with the same error.
>
> -- System Information:
> Debian Release: bullseye/sid
> APT prefers unstable
> APT policy: (500, 'unstable'), (1, 'experimental')
> Architecture: amd64 (x86_64)
> Foreign Architectures: i386
>
> Kernel: Linux 5.10.0-1-amd64 (SMP w/8 CPU threads)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
> Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages libucx0 depends on:
> ii ibverbs-providers 33.0-1
> ii libbinutils 2.35.1-7
> ii libc6 2.31-9
> ii libibverbs1 33.0-1
> ii libnuma1 2.0.12-1+b1
> ii librdmacm1 33.0-1
>
> libucx0 recommends no packages.
>
> libucx0 suggests no packages.
>
> -- no debconf information
>
--
Alastair McKinstry, email: alastair at sceal.ie, matrix: @alastair:sceal.ie, phone: 087-6847928
Green Party Councillor, Galway County Council