How to debug nvidia-smi not detecting a GPU card
Thomas Lange
lange at cs.uni-koeln.de
Tue Feb 21 18:52:47 GMT 2023
Hi,
I have a custom-made Debian netboot environment to detect some Nvidia hardware.
Now I have the problem that the following GPU is shown by lspci, but nvidia-smi does not detect the card.
How can I debug what's missing? A list of the Debian packages I use in this environment is included below.
I'm using kernel 6.1.0-3-amd64.
43:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation AD102GL [L6000] [10de:16a1]
I did not have firmware-nvidia-gsp installed. Do I need this?
I do not understand which firmware or drivers nvidia-smi needs for
a given type of card. Is there a mapping or a list of the components
an Ada Generation card needs?
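Whatever form such a mapping takes, the key into it would presumably be the PCI vendor:device ID from the lspci output above (10de:26b1 for this card). As a minimal sketch, assuming `lspci -nn` style input on stdin, the ID can be pulled out like this. The function name nvidia_pci_ids is made up for illustration and is not part of any package.

```shell
# Hypothetical helper: print the vendor:device ID of each NVIDIA device
# found in `lspci -nn` style output read from stdin.
nvidia_pci_ids() {
    # Only lines that start with a PCI slot address (e.g. "43:00.0 ") are
    # considered, so an indented or bare "Subsystem:" line is skipped.
    # 10de is NVIDIA's PCI vendor ID.
    sed -n 's/^[0-9a-f:.]\{1,\} .*\[\(10de:[0-9a-f]\{4\}\)\].*$/\1/p'
}

# Intended use on a live system (not executed here):
#   lspci -nn | nvidia_pci_ids
```

The resulting ID can then be compared by hand against whatever supported-device list ships with the driver version in use.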
root@fai-server:/# dpkg -l|grep nvidia
ii firmware-nvidia-tesla-gsp 525.60.13-1 NVIDIA GSP firmware (Tesla version)
ii glx-alternative-nvidia 1.2.2 allows the selection of NVIDIA as GLX provider
ii libglx-nvidia0:amd64 525.60.13-1 NVIDIA binary GLX library
ii libnvidia-egl-wayland1:amd64 1:1.1.10-1 Wayland EGL External Platform library -- shared library
ii libnvidia-eglcore:amd64 525.60.13-1 NVIDIA binary EGL core libraries
ii libnvidia-glcore:amd64 525.60.13-1 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glvkspirv:amd64 525.60.13-1 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-ml-dev:amd64 11.7.50~11.7.0-2 NVIDIA Management Library (NVML) development files
ii libnvidia-ml1:amd64 525.60.13-1 NVIDIA Management Library (NVML) runtime library
ii libnvidia-ptxjitcompiler1:amd64 525.60.13-1 NVIDIA PTX JIT Compiler library
ii libnvidia-tesla-ml1:amd64 525.60.13-1 NVIDIA Management Library (NVML) runtime library (Tesla version)
ii nvidia-alternative 525.60.13-1 allows the selection of NVIDIA as GLX provider
ii nvidia-cuda-dev:amd64 11.7.60~11.7.0-2 NVIDIA CUDA development files
ii nvidia-cuda-toolkit 11.7.64~11.7.0-2 NVIDIA CUDA development toolkit
ii nvidia-detect 525.60.13-1 NVIDIA GPU detection utility
ii nvidia-egl-common 525.60.13-1 NVIDIA binary EGL driver - common files
ii nvidia-installer-cleanup 20220217+2 cleanup after driver installation with the nvidia-installer
ii nvidia-kernel-common 20220217+2 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 525.60.13-1 NVIDIA binary kernel module DKMS source
ii nvidia-kernel-support 525.60.13-1 NVIDIA binary kernel module support files
ii nvidia-legacy-check 525.60.13-1 check for NVIDIA GPUs requiring a legacy driver
ii nvidia-modprobe 525.78.01-1 utility to load NVIDIA kernel modules and create device nodes
ii nvidia-opencl-dev:amd64 11.7.60~11.7.0-2 NVIDIA OpenCL development files
ii nvidia-profiler 11.7.50~11.7.0-2 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-smi 525.60.13-1 NVIDIA System Management Interface
ii nvidia-support 20220217+2 NVIDIA binary graphics driver support files
ii nvidia-tesla-alternative 525.60.13-1 allows the selection of NVIDIA as GLX provider (Tesla version)
ii nvidia-tesla-driver-bin 525.60.13-1 NVIDIA driver support binaries (Tesla version)
ii nvidia-tesla-kernel-dkms 525.60.13-1 NVIDIA binary kernel module DKMS source (Tesla version)
ii nvidia-tesla-kernel-support 525.60.13-1 NVIDIA binary kernel module support files (Tesla version)
ii nvidia-tesla-legacy-check 525.60.13-1 check for NVIDIA GPUs requiring a legacy driver (Tesla version)
ii nvidia-tesla-smi 525.60.13-1 NVIDIA System Management Interface (Tesla version)
ii nvidia-tesla-vdpau-driver:amd64 525.60.13-1 Video Decode and Presentation API for Unix - NVIDIA driver (Tesla version)
ii nvidia-vdpau-driver:amd64 525.60.13-1 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-vulkan-common 525.60.13-1 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 525.60.13-1 NVIDIA Vulkan installable client driver (ICD)
ii firmware-nvidia-tesla-gsp 525.60.13-1 NVIDIA GSP firmware (Tesla version)
ii libcuda1:amd64 525.60.13-1 NVIDIA CUDA Driver Library
ii libcudart11.0:amd64 11.7.60~11.7.0-2 NVIDIA CUDA Runtime Library
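One thing that stands out in the list above is that nvidia-modprobe is at 525.78.01-1 while the other driver packages are at 525.60.13-1, and that both the Tesla and non-Tesla variants are installed. Whether either of these matters for the detection problem is not established here. As a sketch for spotting such differences quickly, the hypothetical helper below (driver_versions is an invented name) counts installed packages per upstream driver version from `dpkg -l` output. It only looks at versions of the form X.Y.Z-N, so the CUDA toolkit packages and the date-versioned support packages are ignored.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and count installed
# packages per upstream version (the part before the Debian "-N" revision).
driver_versions() {
    awk '$1 == "ii" && $3 ~ /^[0-9]+\.[0-9]+\.[0-9]+-/ {
             v = $3            # e.g. 525.60.13-1
             sub(/-.*/, "", v) # strip the Debian revision
             n[v]++
         }
         END { for (v in n) print n[v], v }' | sort -k2
}

# Intended use on a live system (not executed here):
#   dpkg -l | grep nvidia | driver_versions
```

More than one line of output means the installed driver packages do not all come from the same upstream release.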
--
regards Thomas
More information about the pkg-nvidia-devel
mailing list