Bug#1074350: nvidia-kernel-dkms: Trying to modprobe nvidia-peermem to use NCCL/RDMA/Infiniband with GPUs

Jeffrey Mark Siskind qobi at qobi.org
Thu Jul 4 05:19:28 BST 2024


   So I don't know why the module doesn't load.

   Any ideas?

I figured it out. doca-ofed (aka MLNX_OFED) needs to have
openibd.service running. It failed because opensmd.service was
running. For some reason, stopping opensmd.service hung, so I
rebooted, and then nvidia-peermem loaded.
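
For anyone hitting the same thing, here is a rough sketch of the check
described above, not a tested fix. It only asks systemd whether the two
units named in this message are active and prints a hint. The function
names (is_active, unit_states, diagnose) are made up for illustration, and
the hint text only reflects what happened on this one machine.

```python
import subprocess

# Units named in this thread; nothing else is assumed about the OFED install.
UNITS = ("openibd.service", "opensmd.service")


def is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active --quiet UNIT` exits with status 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def unit_states(run=subprocess.run):
    """Map each unit in UNITS to whether systemd reports it as active."""
    return {unit: is_active(unit, run) for unit in UNITS}


def diagnose(state):
    """Turn the unit states into a short hint (hypothetical wording)."""
    if not state["openibd.service"] and state["opensmd.service"]:
        return ("openibd.service is not active while opensmd.service is; "
                "this matches the failing state described in this message")
    if not state["openibd.service"]:
        return ("openibd.service is not active; it needs to be running "
                "before nvidia-peermem will load")
    return "openibd.service is active; try: modprobe nvidia-peermem"
```

On the affected host, print(diagnose(unit_states())) would show which case
applies. The run parameter exists only so the check can be exercised
without systemd present.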

    Jeff (http://engineering.purdue.edu/~qobi)


