cuda broken after upgrade (x86_64, Jessie, GTX980)

Alois Schloegl alois.schloegl at ist.ac.at
Sun Feb 28 23:53:40 UTC 2016




After I tried to upgrade CUDA on a Debian Jessie machine with four GTX 980
cards, CUDA became unusable. nvidia-smi reports this error:

# nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system

A short CUDA test program also fails, because it does not find any GPU.


Here is some diagnostic information about the system.
First, the installed nvidia packages:


# dpkg -l|grep -i nvidia
ii  glx-alternative-nvidia                 0.7.1~bpo8+1       amd64  allows the selection of NVIDIA as GLX provider
ii  libcublas6.5:amd64                     6.5.19-3~bpo8+1    amd64  NVIDIA cuBLAS Library
ii  libcuda1:amd64                         352.79-1~bpo8+1    amd64  NVIDIA CUDA Driver Library
ii  libcudart6.5:amd64                     6.5.19-3~bpo8+1    amd64  NVIDIA CUDA Runtime Library
ii  libcufft6.5:amd64                      6.5.19-3~bpo8+1    amd64  NVIDIA cuFFT Library
ii  libcufftw6.5:amd64                     6.5.19-3~bpo8+1    amd64  NVIDIA cuFFTW Library
ii  libcuinj64-6.5:amd64                   6.5.19-3~bpo8+1    amd64  NVIDIA CUINJ Library (64-bit)
ii  libcurand6.5:amd64                     6.5.19-3~bpo8+1    amd64  NVIDIA cuRAND Library
ii  libcusparse6.5:amd64                   6.5.19-3~bpo8+1    amd64  NVIDIA cuSPARSE Library
ii  libegl1-nvidia:amd64                   352.79-1~bpo8+1    amd64  NVIDIA binary EGL libraries
ii  libgl1-nvidia-glx:amd64                352.79-1~bpo8+1    amd64  NVIDIA binary OpenGL libraries
ii  libgles1-nvidia:amd64                  352.79-1~bpo8+1    amd64  NVIDIA binary OpenGL|ES 1.x libraries
ii  libgles2-nvidia:amd64                  352.79-1~bpo8+1    amd64  NVIDIA binary OpenGL|ES 2.x libraries
ii  libnppc6.5:amd64                       6.5.19-3~bpo8+1    amd64  NVIDIA Performance Primitives core runtime library
ii  libnppi6.5:amd64                       6.5.19-3~bpo8+1    amd64  NVIDIA Performance Primitives for image processing runtime library
ii  libnpps6.5:amd64                       6.5.19-3~bpo8+1    amd64  NVIDIA Performance Primitives for signal processing runtime library
ii  libnvidia-compiler:amd64               352.79-1~bpo8+1    amd64  NVIDIA runtime compiler library
ii  libnvidia-eglcore:amd64                352.79-1~bpo8+1    amd64  NVIDIA binary EGL core libraries
ii  libnvidia-ml1:amd64                    352.79-1~bpo8+1    amd64  NVIDIA Management Library (NVML) runtime library
ii  libnvtoolsext1:amd64                   6.5.19-3~bpo8+1    amd64  NVIDIA Tools Extension Library
ii  libnvvm2:amd64                         6.5.19-3~bpo8+1    amd64  NVIDIA NVVM Library
ii  nvidia-alternative                     352.79-1~bpo8+1    amd64  allows the selection of NVIDIA as GLX provider
ii  nvidia-cuda-dev                        6.5.19-3~bpo8+1    amd64  NVIDIA CUDA development files
ii  nvidia-cuda-doc                        6.5.19-3~bpo8+1    all    NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                        6.5.19-3~bpo8+1    amd64  NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-mps                        352.79-1~bpo8+1    amd64  NVIDIA CUDA Multi Process Service (MPS)
ii  nvidia-cuda-toolkit                    6.5.19-3~bpo8+1    amd64  NVIDIA CUDA development toolkit
ii  nvidia-detect                          352.79-1~bpo8+1    amd64  NVIDIA GPU detection utility
ii  nvidia-driver                          352.79-1~bpo8+1    amd64  NVIDIA metapackage
ii  nvidia-driver-bin                      352.79-1~bpo8+1    amd64  NVIDIA driver support binaries
ii  nvidia-installer-cleanup               20151021+1~bpo8+1  amd64  cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common                   20151021+1~bpo8+1  amd64  NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                     352.79-1~bpo8+1    amd64  NVIDIA binary kernel module DKMS source
ii  nvidia-kernel-source                   352.79-1~bpo8+1    amd64  NVIDIA binary kernel module source
ii  nvidia-kernel-support                  352.79-1~bpo8+1    amd64  NVIDIA binary kernel module support files
ii  nvidia-modprobe                        358.09-1~bpo8+1    amd64  utility to load NVIDIA kernel modules and create device nodes
ii  nvidia-opencl-common                   352.79-1~bpo8+1    amd64  NVIDIA OpenCL driver
ii  nvidia-profiler                        6.5.19-3~bpo8+1    amd64  NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                        340.93-1~bpo8+1    amd64  tool for configuring the NVIDIA graphics driver
ii  nvidia-smi                             352.79-1~bpo8+1    amd64  NVIDIA System Management Interface
ii  nvidia-support                         20151021+1~bpo8+1  amd64  NVIDIA binary graphics driver support files
ii  nvidia-vdpau-driver:amd64              352.79-1~bpo8+1    amd64  Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-xconfig                         340.46-1           amd64  X configuration tool for non-free NVIDIA drivers
ii  xserver-xorg-video-nvidia              352.79-1~bpo8+1    amd64  NVIDIA binary Xorg driver
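
The package list contains several different upstream versions (for example
nvidia-modprobe at 358.09 and nvidia-settings at 340.93 next to the 352.79
driver packages). Whether this spread has anything to do with the failure is
not established here. As a rough aid for seeing the spread at a glance, here
is a minimal sketch that counts installed packages per upstream version. The
helper name summarize_versions is hypothetical, and it assumes `dpkg -l`-style
input where the third field is the full, untruncated version string.

```shell
#!/bin/sh
# Sketch: count installed packages per upstream version.
# Reads `dpkg -l`-style lines on stdin. summarize_versions is a
# hypothetical helper name, not part of any Debian package.
summarize_versions() {
    awk '
        $1 == "ii" {                 # only fully installed packages
            v = $3                   # third field is the version
            sub(/-[^-]*$/, "", v)    # drop the Debian revision suffix
            count[v]++
        }
        END { for (v in count) print v, count[v] }
    ' | sort
}

# Example use on a live system (assumption: output is not truncated):
#   dpkg -l | grep -i nvidia | summarize_versions
```

Each output line is an upstream version followed by the number of packages
at that version. Versions without a Debian revision (no hyphen) are left
unchanged, and epochs are not handled.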




# nvidia-detect
Detected NVIDIA GPUs:
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
83:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)

Checking card:  NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
Your card is only supported by the updated drivers from jessie-backports.
See http://backports.debian.org for instructions how to use backports.
It is recommended to install the
    nvidia-driver/jessie-backports
package.

Checking card:  NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
Your card is only supported by the updated drivers from jessie-backports.
See http://backports.debian.org for instructions how to use backports.
It is recommended to install the
    nvidia-driver/jessie-backports
package.

Checking card:  NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
Your card is only supported by the updated drivers from jessie-backports.
See http://backports.debian.org for instructions how to use backports.
It is recommended to install the
    nvidia-driver/jessie-backports
package.

Checking card:  NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
Your card is only supported by the updated drivers from jessie-backports.
See http://backports.debian.org for instructions how to use backports.
It is recommended to install the
    nvidia-driver/jessie-backports
package.


Installing the package as suggested changes nothing, because it is already
at the newest version:

# apt-get install nvidia-driver/jessie-backports
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-driver is already the newest version.
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libgl1-nvidia-glx' because of 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'nvidia-alternative' because of 'libgl1-nvidia-glx'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libegl1-nvidia' because of 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libnvidia-eglcore' because of 'libegl1-nvidia'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'nvidia-driver-bin' because of 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libnvidia-ml1' because of 'nvidia-driver-bin'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'xserver-xorg-video-nvidia' because of 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'nvidia-vdpau-driver' because of 'xserver-xorg-video-nvidia'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libgles1-nvidia' because of 'nvidia-driver'
Selected version '352.79-1~bpo8+1' (Debian Backports:jessie-backports [amd64]) for 'libgles2-nvidia' because of 'nvidia-driver'
0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.


Even so, nvidia-detect still reports the same message as shown above,
although the recommended package is already installed. This looks like a
bug to me.

Do you have any recommendations on how to make CUDA and nvidia-smi usable
again?


Best,
  Alois




