cuda with gtx980 on Jessie

Alois Schloegl alois.schloegl at ist.ac.at
Thu Sep 10 10:17:46 UTC 2015


Hi,


I'm trying to set up a machine with two GeForce GTX 980 cards for use with
CUDA. I'd like to use the Debian packages because the NVIDIA release
does not support Debian, and the machine is part of a cluster running
Debian.

Because the GTX 980 does not seem to be supported by the 340.x drivers
[1] (I tried and it did not work), I had to go with the experimental
branch, basically:

  apt-get -t experimental install nvidia-driver nvidia-cuda-toolkit

I also followed the instructions on

http://linuxconfig.org/nvidia-geforce-driver-installation-on-debian-jessie-linux-8-64bit


I see the following issues:

(1) When trying to run a CUDA application, it already fails at
cudaGetDeviceCount(..), which returns the error code "cudaErrorNoDevice".

#include <cuda.h>
#include <stdio.h>

int main()
{
        printf("\nSimple GPU info query.\n\n");
        int ngpu;
        int status = cudaGetDeviceCount(&ngpu);
	...
}


(2) When running /usr/bin/nvidia-smi, it fails with:
Failed to initialize NVML: GPU access blocked by the operating system


(3) When running nvidia-settings, it fails with:
#
** (nvidia-settings:2685): WARNING **: Couldn't connect to accessibility
bus: Failed to connect to socket /tmp/dbus-G8Z2cGOBFu: Connection refused

ERROR: nvidia-settings could not find the registry key file. This file
should have been installed along with this driver at
/usr/share/nvidia/nvidia-application-profiles-key-documentation

However, I do have these files available:


-rw-r--r-- 1 root root 4579 Jul 22 03:26 /usr/share/nvidia/nvidia-application-profiles-352.30-rc
-rw-r--r-- 1 root root 6041 Jul 22 03:26 /usr/share/nvidia/nvidia-application-profiles-352.30-key-documentation
-r--r--r-- 1 root root 4579 Sep 10 00:00 /usr/share/nvidia/nvidia-application-profiles-352.41-rc
-r--r--r-- 1 root root 6041 Sep 10 00:00 /usr/share/nvidia/nvidia-application-profiles-352.41-key-documentation


* modinfo nvidia-current
filename:       /lib/modules/3.16.0-4-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        352.30
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm,i2c-core
vermagic:       3.16.0-4-amd64 SMP mod_unload modversions
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp


* nvidia-modprobe --version

nvidia-modprobe:  version 352.39  (buildmeister@vm-ubuntu1404-x64-001)
Fri Aug 14 21:53:43 PDT 2015


I assume that these issues are all related, indicating some
non-matching components. Did I miss anything? What else can I try?
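
A minimal sketch of how the suspected mismatch could be checked from the
shell. It assumes that "modinfo -F version nvidia-current" prints the
kernel module version and that "nvidia-modprobe --version" prints a line
of the form shown above. The helper names upstream_version,
modprobe_version and versions_match are made up for this illustration
and are not part of any NVIDIA or Debian tool.

```shell
#!/bin/sh
# Hypothetical helpers for comparing driver component versions.

# Strip an epoch and the Debian/Ubuntu revision:
# "352.39-0ubuntu1" becomes "352.39", "352.30" stays "352.30".
upstream_version() {
    v=${1#*:}                 # drop an epoch such as "1:" if present
    printf '%s\n' "${v%-*}"   # drop everything after the last hyphen
}

# Read "nvidia-modprobe --version" output on stdin and print only the
# version number from the "nvidia-modprobe:  version X.Y" line.
modprobe_version() {
    sed -n 's/^nvidia-modprobe:[[:space:]]*version[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) only if both upstream versions are identical.
versions_match() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Usage on the affected machine (assumes both commands are installed):
#   kmod=$(modinfo -F version nvidia-current)
#   tool=$(nvidia-modprobe --version | modprobe_version)
#   versions_match "$kmod" "$tool" ||
#       echo "mismatch: module $kmod, nvidia-modprobe $tool"
```

With the values quoted in this mail (module 352.30, nvidia-modprobe
352.39) the comparison would fail and the mismatch message would be
printed.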



The NVIDIA package list (obtained with "dpkg -l|grep -i nvidia") is:

ii  glx-alternative-nvidia    0.5.1            amd64  allows the selection of NVIDIA as GLX provider
ii  libcublas6.0:amd64        6.0.37-5         amd64  NVIDIA cuBLAS Library
ii  libcublas6.5:amd64        6.5.14-1         amd64  NVIDIA cuBLAS Library
ii  libcuda1:amd64            352.30-1         amd64  NVIDIA CUDA Driver Library
rc  libcuda1-352              352.39-0ubuntu1  amd64  NVIDIA CUDA runtime library
ii  libcudart6.0:amd64        6.0.37-5         amd64  NVIDIA CUDA Runtime Library
ii  libcudart6.5:amd64        6.5.14-1         amd64  NVIDIA CUDA Runtime Library
ii  libcufft6.0:amd64         6.0.37-5         amd64  NVIDIA cuFFT Library
ii  libcufft6.5:amd64         6.5.14-1         amd64  NVIDIA cuFFT Library
ii  libcufftw6.0:amd64        6.0.37-5         amd64  NVIDIA cuFFTW Library
ii  libcufftw6.5:amd64        6.5.14-1         amd64  NVIDIA cuFFTW Library
rc  libcuinj64-6.0:amd64      6.0.37-5         amd64  NVIDIA CUINJ Library (64-bit)
ii  libcuinj64-6.5:amd64      6.5.14-1         amd64  NVIDIA CUINJ Library (64-bit)
ii  libcurand6.0:amd64        6.0.37-5         amd64  NVIDIA cuRAND Library
ii  libcurand6.5:amd64        6.5.14-1         amd64  NVIDIA cuRAND Library
ii  libcusparse6.0:amd64      6.0.37-5         amd64  NVIDIA cuSPARSE Library
ii  libcusparse6.5:amd64      6.5.14-1         amd64  NVIDIA cuSPARSE Library
ii  libnppc6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives core runtime library
ii  libnppc6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives core runtime library
ii  libnppi6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives for image processing runtime library
ii  libnppi6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives for image processing runtime library
ii  libnpps6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives for signal processing runtime library
ii  libnpps6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives for signal processing runtime library
ii  libnvcuvid1:amd64         352.30-1         amd64  NVIDIA CUDA Video Decoder runtime library
ii  libnvidia-eglcore:amd64   352.30-1         amd64  NVIDIA binary EGL core libraries
ii  libnvidia-ml1:amd64       352.30-1         amd64  NVIDIA Management Library (NVML) runtime library
ii  libnvtoolsext1:amd64      6.5.14-1         amd64  NVIDIA Tools Extension Library
ii  libnvvm2:amd64            6.5.14-1         amd64  NVIDIA NVVM Library
ii  nvidia-alternative        352.30-1         amd64  allows the selection of NVIDIA as GLX provider
ii  nvidia-cuda-dev           6.5.14-1         amd64  NVIDIA CUDA development files
ii  nvidia-cuda-doc           6.5.14-1         all    NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb           6.5.14-1         amd64  NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit       6.5.14-1         amd64  NVIDIA CUDA development toolkit
ii  nvidia-detect             352.30-1         amd64  NVIDIA GPU detection utility
ii  nvidia-installer-cleanup  20141201+1       amd64  cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common      20141201+1       amd64  NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms        352.30-1         amd64  NVIDIA binary kernel module DKMS source
ii  nvidia-kernel-source      352.30-1         amd64  NVIDIA binary kernel module source
ii  nvidia-modprobe           352.39-0ubuntu1  amd64  Load the NVIDIA kernel driver and create device files
ii  nvidia-profiler           6.5.14-1         amd64  NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-smi                352.30-1         amd64  NVIDIA System Management Interface
ii  nvidia-support            20141201+1       amd64  NVIDIA binary graphics driver support files
ii  nvidia-visual-profiler    6.5.14-1         amd64  NVIDIA Visual Profiler for CUDA and OpenCL
ii  nvidia-xconfig            340.46-1         amd64  X configuration tool for non-free NVIDIA drivers
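
Mixed driver versions in a package list like the one above can also be
picked out mechanically. This is a rough sketch, assuming input lines of
the form "name version", for example from
dpkg-query -W -f='${Package} ${Version}\n'. The helper name
driver_mismatches is hypothetical, and so is the heuristic that a driver
version looks like "NNN.NN" (three-digit major number), which is only
meant to skip toolkit versions such as 6.5.14 and date-based versions
such as 20141201+1.

```shell
#!/bin/sh
# driver_mismatches EXPECTED
# Hypothetical helper: read "name version" lines on stdin and print every
# package whose upstream version looks like a driver version (NNN.NN)
# but differs from EXPECTED (e.g. 352.30).
driver_mismatches() {
    awk -v want="$1" '
        {
            up = $2
            sub(/^[0-9]+:/, "", up)   # drop an epoch if present
            sub(/-[^-]*$/, "", up)    # drop the Debian/Ubuntu revision
            # Heuristic (assumption): driver versions have a 3-digit major.
            if (up ~ /^[0-9][0-9][0-9]\.[0-9]+$/ && up != want)
                print $1, up
        }'
}

# Usage on the affected machine (the name patterns are only an example):
#   dpkg-query -W -f='${Package} ${Version}\n' '*nvidia*' 'libcuda*' |
#       driver_mismatches 352.30
```

Applied to the versions listed above with 352.30 as the expected driver
version, this heuristic would flag the packages at 352.39 and 340.46.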



* It seems like a bug that the Debian packages
nvidia-cuda-toolkit/nvidia-driver do not support the GTX 980; at least
it's a wishlist item. Is there any chance that the packages will be
upgraded soon?

* Do you think that the mismatch between the version of nvidia-modprobe
(352.39) and that of the nvidia drivers (352.30) is part of the problem?
  A web search seems to suggest that this is a possible cause [2,3].

* Do you think the fact that nvidia-xconfig 352.30-1 is not in
experimental yet is part of the problem? Are there any plans to upgrade
it soon?

Do you have any plans to upgrade the nvidia packages to some 352.xx
version anytime soon? Is there anything I can help you with (testing?)


Kind regards,
  Alois




[1] http://www.nvidia.com/object/unix.html
[2] https://github.com/Kaixhin/dockerfiles/issues/1
[3] https://devtalk.nvidia.com/default/topic/842147/problem-with-cuda-7-toolkit-on-centos-6-6/

