Bug#1074350: nvidia-kernel-dkms: Trying to modprobe nvidia-peermem to use NCCL/RDMA/Infiniband with GPUs

Andreas Beckmann anbe at debian.org
Thu Jun 27 19:36:58 BST 2024


On 27/06/2024 18.24, Jeffrey Mark Siskind wrote:
> How can we/I compile the nvidia-peermem driver with Mellanox ib_peer_mem symbols?

This is probably not a problem with the nvidia-peermem module itself;
the kernel (or a third-party module) needs to provide these symbols.
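
A rough way to check whether the running kernel provides them is to look
in /proc/kallsyms. The sketch below assumes the two symbol names shown
(they are my guess at what nvidia-peermem needs; the "Unknown symbol"
lines in dmesg are authoritative). A symbol missing from that list cannot
be resolved at modprobe time; one that is listed is not necessarily
exported, so treat a positive result only as a hint.

```python
# Symbol names below are assumptions for illustration; check the actual
# "Unknown symbol" lines in dmesg for the ones your build needs.
ASSUMED_SYMBOLS = (
    "ib_register_peer_memory_client",
    "ib_unregister_peer_memory_client",
)


def missing_symbols(kallsyms_text, wanted=ASSUMED_SYMBOLS):
    """Return the names in `wanted` that do not appear in kallsyms_text."""
    present = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line is "address type name [module]"; the name is field 3.
        if len(fields) >= 3:
            present.add(fields[2])
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    with open("/proc/kallsyms") as f:
        missing = missing_symbols(f.read())
    if missing:
        print("not provided by the running kernel:", ", ".join(missing))
    else:
        print("all assumed peer-memory symbols are present")
```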

In the next upload I'll apply the patch from the pull request I had
linked, so that there will be a message in dmesg if the module refuses
to load.

> As an aside (and you might not be the right person to ask): Debian
> 12.5 has OFED (the Infiniband substrate). NVidia/Mellanox has
> developed MOFED which is much newer and has much better support for
> current hardware. E.g. opensm in Debian 12.5 does not work with NDR.
> The version in MOFED does. MOFED is available for Ubuntu (24.04 LTS). But I
> strongly prefer to run Debian. It would be great if MOFED were
> packaged up for Debian. (NVidia has a version of MOFED for Debian. But
> when I installed it, it uninstalled a lot of packages that I
> need. Like OpenCV and ROS. So I uninstalled it, cleaned up the mess,
> and reinstalled all of the standard Debian packages that I need. It
> would be great if there was a properly packaged version that lived in
> the Debian ecosystem.)

This sounds like you want to file an RFP (Request For Package) bug
against wnpp:

https://www.debian.org/devel/wnpp/
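
Running "reportbug wnpp" will walk you through this interactively. If you
prefer to write the mail by hand, the report looks roughly like what the
sketch below produces. The helper name and all values passed to it are
hypothetical placeholders, not real package data.

```python
def rfp_report(name, short_desc, url, license_, long_desc):
    """Build the subject and body of an RFP report against wnpp."""
    subject = f"RFP: {name} -- {short_desc}"
    body = "\n".join([
        # Pseudo-header read by the bug tracking system.
        "Package: wnpp",
        "Severity: wishlist",
        "",
        f"* Package name    : {name}",
        f"* URL             : {url}",
        f"* License         : {license_}",
        f"  Description     : {short_desc}",
        "",
        long_desc,
    ])
    return subject, body


if __name__ == "__main__":
    # Placeholder values only; fill in the real upstream details.
    subject, body = rfp_report(
        "example-package",
        "short one-line description",
        "https://example.org/",
        "unknown",
        "Longer explanation of what the package does and why it is useful.",
    )
    print("Subject:", subject)
    print()
    print(body)
```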


Andreas


