Bug#1074350: nvidia-kernel-dkms: Trying to modprobe nvidia-peermem to use NCCL/RDMA/Infiniband with GPUs

Jeffrey Mark Siskind qobi at qobi.org
Thu Jun 27 17:24:16 BST 2024


   Could that be your problem?

That looks like it might be.

How can I (or we) compile the nvidia-peermem driver against the Mellanox ib_peer_mem symbols?

As an aside (and you might not be the right person to ask): Debian
12.5 ships OFED (the InfiniBand substrate). NVIDIA/Mellanox has
developed MOFED, which is much newer and has much better support for
current hardware. E.g., opensm in Debian 12.5 does not work with NDR;
the version in MOFED does. MOFED is available for Ubuntu (24.04 LTS),
but I strongly prefer to run Debian. It would be great if MOFED were
packaged for Debian. (NVIDIA has a version of MOFED for Debian, but
when I installed it, it uninstalled a lot of packages that I need,
such as OpenCV and ROS. So I uninstalled it, cleaned up the mess,
and reinstalled all of the standard Debian packages that I need. It
would be great if there were a properly packaged version that lived
in the Debian ecosystem.)

    Jeff (http://engineering.purdue.edu/~qobi)


