How to debug nvidia-smi not detecting a GPU card

Thomas Lange lange at cs.uni-koeln.de
Tue Feb 21 18:52:47 GMT 2023


Hi,

I have a custom made Debian netboot environment to detect some Nvidia hardware.
Now I have the problem that the following GPU is shown by lspci, but nvidia-smi does not detect the card.
How can I debug what's missing? A list of the Debian packages I use in this environment is included below.
I'm using kernel 6.1.0-3-amd64.
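Because the nvidia-kernel-dkms and nvidia-tesla-kernel-dkms packages build the kernel module from source, one thing worth ruling out early is that no module was ever built for the running 6.1.0-3-amd64 kernel (for example, because matching linux-headers were not present when the netboot image was created). The sketch below assumes `dkms status` prints lines of the form `module/version, kernel, arch: installed` (older dkms releases use a comma instead of the slash). The helper name `dkms_built_for` and the module name in the comment are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if `dkms status` output (read on stdin)
# contains an "installed" entry for the given kernel release.
# Assumed line format (module name illustrative):
#   nvidia-current/525.60.13, 6.1.0-3-amd64, x86_64: installed
dkms_built_for() {
    kver=$1
    # Keep only lines for this kernel, then require the "installed" state.
    grep -F ", $kver, " | grep -q ': installed'
}

# Usage on the netboot system:
#   if dkms status | dkms_built_for "$(uname -r)"; then
#       echo "DKMS module is installed for $(uname -r)"
#   else
#       echo "no installed DKMS module for $(uname -r)"
#   fi
```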

43:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation AD102GL [L6000] [10de:16a1]
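lspci can also show whether any kernel driver has actually claimed the card. With `-k`, it adds a `Kernel driver in use:` line under a device once a driver is bound to it. If that line is missing for 43:00.0, the nvidia module is either not loaded or failed to attach to the card, and kernel messages prefixed with `NVRM:` are a good first place to look for the reason. A small sketch that extracts the bound driver; the function name `bound_driver` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -k` output for one device on stdin and
# prints the name after "Kernel driver in use:", or nothing if no driver
# is bound to the device.
bound_driver() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on the netboot system (bus address from the lspci output above):
#   drv=$(lspci -nnk -s 43:00.0 | bound_driver)
#   if [ "$drv" = nvidia ]; then
#       echo "nvidia driver is bound to 43:00.0"
#   else
#       echo "driver in use: ${drv:-none}"
#       lsmod | grep '^nvidia'    # is any nvidia module loaded at all?
#       dmesg | grep -i 'NVRM'    # kernel messages from the nvidia module
#   fi
```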

I did not have firmware-nvidia-gsp installed. Do I need it?
I do not understand which firmware or drivers nvidia-smi needs for
certain types of cards. Is there a mapping or a list of the components
an Ada Generation card needs?
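Independent of whether the GSP firmware turns out to be required for this card, in a netboot environment it can be worth confirming that the files shipped by the installed firmware package actually exist in the booted root filesystem (for example, if the image was trimmed after the packages were installed). A minimal sketch that reads paths, such as the output of `dpkg -L firmware-nvidia-tesla-gsp`, and reports any that are missing; the helper name `report_missing` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads one path per line on stdin and prints
# "missing: <path>" for every path that does not exist.
# Returns 1 if any path is missing, 0 otherwise.
report_missing() {
    status=0
    while IFS= read -r path; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Usage on the netboot system:
#   dpkg -L firmware-nvidia-tesla-gsp | report_missing && echo "all package files present"
```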


root@fai-server:/# dpkg -l | grep nvidia
ii  firmware-nvidia-tesla-gsp       525.60.13-1              NVIDIA GSP firmware (Tesla version)
ii  glx-alternative-nvidia          1.2.2                    allows the selection of NVIDIA as GLX provider
ii  libglx-nvidia0:amd64            525.60.13-1              NVIDIA binary GLX library
ii  libnvidia-egl-wayland1:amd64    1:1.1.10-1               Wayland EGL External Platform library -- shared library
ii  libnvidia-eglcore:amd64         525.60.13-1              NVIDIA binary EGL core libraries
ii  libnvidia-glcore:amd64          525.60.13-1              NVIDIA binary OpenGL/GLX core libraries
ii  libnvidia-glvkspirv:amd64       525.60.13-1              NVIDIA binary Vulkan Spir-V compiler library
ii  libnvidia-ml-dev:amd64          11.7.50~11.7.0-2         NVIDIA Management Library (NVML) development files
ii  libnvidia-ml1:amd64             525.60.13-1              NVIDIA Management Library (NVML) runtime library
ii  libnvidia-ptxjitcompiler1:amd64 525.60.13-1              NVIDIA PTX JIT Compiler library
ii  libnvidia-tesla-ml1:amd64       525.60.13-1              NVIDIA Management Library (NVML) runtime library (Tesla version)
ii  nvidia-alternative              525.60.13-1              allows the selection of NVIDIA as GLX provider
ii  nvidia-cuda-dev:amd64           11.7.60~11.7.0-2         NVIDIA CUDA development files
ii  nvidia-cuda-toolkit             11.7.64~11.7.0-2         NVIDIA CUDA development toolkit
ii  nvidia-detect                   525.60.13-1              NVIDIA GPU detection utility
ii  nvidia-egl-common               525.60.13-1              NVIDIA binary EGL driver - common files
ii  nvidia-installer-cleanup        20220217+2               cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common            20220217+2               NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms              525.60.13-1              NVIDIA binary kernel module DKMS source
ii  nvidia-kernel-support           525.60.13-1              NVIDIA binary kernel module support files
ii  nvidia-legacy-check             525.60.13-1              check for NVIDIA GPUs requiring a legacy driver
ii  nvidia-modprobe                 525.78.01-1              utility to load NVIDIA kernel modules and create device nodes
ii  nvidia-opencl-dev:amd64         11.7.60~11.7.0-2         NVIDIA OpenCL development files
ii  nvidia-profiler                 11.7.50~11.7.0-2         NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-smi                      525.60.13-1              NVIDIA System Management Interface
ii  nvidia-support                  20220217+2               NVIDIA binary graphics driver support files
ii  nvidia-tesla-alternative        525.60.13-1              allows the selection of NVIDIA as GLX provider (Tesla version)
ii  nvidia-tesla-driver-bin         525.60.13-1              NVIDIA driver support binaries (Tesla version)
ii  nvidia-tesla-kernel-dkms        525.60.13-1              NVIDIA binary kernel module DKMS source (Tesla version)
ii  nvidia-tesla-kernel-support     525.60.13-1              NVIDIA binary kernel module support files (Tesla version)
ii  nvidia-tesla-legacy-check       525.60.13-1              check for NVIDIA GPUs requiring a legacy driver (Tesla version)
ii  nvidia-tesla-smi                525.60.13-1              NVIDIA System Management Interface (Tesla version)
ii  nvidia-tesla-vdpau-driver:amd64 525.60.13-1              Video Decode and Presentation API for Unix - NVIDIA driver (Tesla version)
ii  nvidia-vdpau-driver:amd64       525.60.13-1              Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-vulkan-common            525.60.13-1              NVIDIA Vulkan driver - common files
ii  nvidia-vulkan-icd:amd64         525.60.13-1              NVIDIA Vulkan installable client driver (ICD)
ii  libcuda1:amd64                  525.60.13-1              NVIDIA CUDA Driver Library
ii  libcudart11.0:amd64             11.7.60~11.7.0-2         NVIDIA CUDA Runtime Library
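Most of the driver packages above are at 525.60.13, while others are versioned independently (the CUDA 11.7 packages, glx-alternative-nvidia, libnvidia-egl-wayland1 and the 20220217+2 support packages). Filtering the `dpkg -l` output for everything not at the driver version makes the remaining outliers easy to check by hand. In the list above, nvidia-modprobe (525.78.01-1) is the only driver-series package not at 525.60.13; whether that matters here is unclear. A sketch using awk; the helper name `version_outliers` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints
# "<package> <version>" for every installed ("ii") package whose
# version string does not contain the given driver version.
version_outliers() {
    awk -v want="$1" '$1 == "ii" && index($3, want) == 0 { print $2, $3 }'
}

# Usage:
#   dpkg -l | grep nvidia | version_outliers 525.60.13
```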

-- 
regards Thomas



More information about the pkg-nvidia-devel mailing list