cuda with gtx980 on Jessie
Alois Schloegl
alois.schloegl at ist.ac.at
Thu Sep 10 10:17:46 UTC 2015
Hi,
I'm trying to set up a machine with two GeForce GTX 980 cards for use with
CUDA. I'd like to use the Debian packages because the NVIDIA release
does not support Debian, and the machine is part of a cluster running
Debian.
Because the GTX 980 does not seem to be supported by the 340.x drivers
[1] (I tried and it did not work), I had to go with the experimental
branch, and basically ran
    apt-get -t experimental install nvidia-driver nvidia-cuda-toolkit
I also followed the instructions on
http://linuxconfig.org/nvidia-geforce-driver-installation-on-debian-jessie-linux-8-64bit
I see the following issues:
(1) When running a CUDA application, it already fails at
cudaGetDeviceCount(..), which returns the error code "cudaErrorNoDevice".
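    #include <cuda_runtime.h>   /* runtime API: declares cudaGetDeviceCount() */
    #include <stdio.h>

    int main()
    {
        printf("\nSimple GPU info query.\n\n");
        int ngpu;
        /* this call returns cudaErrorNoDevice on the machine described here */
        cudaError_t status = cudaGetDeviceCount(&ngpu);
        ...
    }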
(2) When running /usr/bin/nvidia-smi, it fails with
Failed to initialize NVML: GPU access blocked by the operating system
(3) When running nvidia-settings, it fails with:
#
** (nvidia-settings:2685): WARNING **: Couldn't connect to accessibility
bus: Failed to connect to socket /tmp/dbus-G8Z2cGOBFu: Connection refused
ERROR: nvidia-settings could not find the registry key file. This file
should have been installed along with this driver at
/usr/share/nvidia/nvidia-application-profiles-key-documentation
However, I do have these files available:
-rw-r--r-- 1 root root 4579 Jul 22 03:26 /usr/share/nvidia/nvidia-application-profiles-352.30-rc
-rw-r--r-- 1 root root 6041 Jul 22 03:26 /usr/share/nvidia/nvidia-application-profiles-352.30-key-documentation
-r--r--r-- 1 root root 4579 Sep 10 00:00 /usr/share/nvidia/nvidia-application-profiles-352.41-rc
-r--r--r-- 1 root root 6041 Sep 10 00:00 /usr/share/nvidia/nvidia-application-profiles-352.41-key-documentation
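For issue (2), one thing that may be worth ruling out is that the device nodes are missing or not accessible to the current user, since nvidia-modprobe is the component that creates them. The following is only a sketch: it assumes the nodes follow the usual /dev/nvidia* naming (for example /dev/nvidia0 and /dev/nvidiactl), and the function name check_nvidia_nodes is made up for illustration. It does not prove that permissions are the cause of the NVML error.

```sh
# check_nvidia_nodes [DIR]
# Hypothetical helper: list every DIR/nvidia* entry and report whether the
# current user can both read and write it. DIR defaults to /dev.
check_nvidia_nodes() {
    dir="${1:-/dev}"
    found=0
    for node in "$dir"/nvidia*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$node" ] || continue
        found=1
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok $node"
        else
            echo "no-access $node"
        fi
    done
    [ "$found" -eq 1 ] || echo "missing: no nvidia device nodes in $dir"
}

check_nvidia_nodes /dev
```

If the nodes exist but show up as "no-access", the NVreg_DeviceFileUID, NVreg_DeviceFileGID and NVreg_DeviceFileMode module parameters listed by modinfo are the settings that control their ownership and mode.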
* modinfo nvidia-current
filename: /lib/modules/3.16.0-4-amd64/updates/dkms/nvidia-current.ko
alias: char-major-195-*
version: 352.30
supported: external
license: NVIDIA
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: drm,i2c-core
vermagic: 3.16.0-4-amd64 SMP mod_unload modversions
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_UpdateMemoryTypes:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_CheckPCIConfigSpace:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RmMsg:charp
parm: NVreg_AssignGpus:charp
* nvidia-modprobe --version
nvidia-modprobe: version 352.39 (buildmeister at vm-ubuntu1404-x64-001)
Fri Aug 14 21:53:43 PDT 2015
I assume that these issues are all related, indicating some
non-matching components. Did I miss anything? What else can I try?
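The suspected mismatch between the kernel module (352.30, from modinfo) and nvidia-modprobe (352.39) can be checked mechanically. A minimal sketch, assuming that the packaging revision is everything after the first "-" and that the version carries no epoch prefix; the helper names upstream_version and same_driver_version are hypothetical:

```sh
# upstream_version VERSION
# Print VERSION without its packaging revision, e.g. 352.39-0ubuntu1 -> 352.39.
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# same_driver_version A B
# Succeed only if A and B carry the same upstream driver version.
same_driver_version() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Values as reported in this mail. On a live system they could be taken from
#   modinfo -F version nvidia-current
#   nvidia-modprobe --version
kernel_module="352.30"
modprobe_tool="352.39-0ubuntu1"

if same_driver_version "$kernel_module" "$modprobe_tool"; then
    echo "match: $(upstream_version "$kernel_module")"
else
    echo "mismatch: kernel module $(upstream_version "$kernel_module")," \
         "nvidia-modprobe $(upstream_version "$modprobe_tool")"
fi
```

With the values above the script reports a mismatch. Whether such a mismatch is actually harmful for nvidia-modprobe is a separate question that this check cannot answer.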
The NVIDIA package list (obtained with "dpkg -l|grep -i nvidia") is:
ii  glx-alternative-nvidia    0.5.1            amd64  allows the selection of NVIDIA as GLX provider
ii  libcublas6.0:amd64        6.0.37-5         amd64  NVIDIA cuBLAS Library
ii  libcublas6.5:amd64        6.5.14-1         amd64  NVIDIA cuBLAS Library
ii  libcuda1:amd64            352.30-1         amd64  NVIDIA CUDA Driver Library
rc  libcuda1-352              352.39-0ubuntu1  amd64  NVIDIA CUDA runtime library
ii  libcudart6.0:amd64        6.0.37-5         amd64  NVIDIA CUDA Runtime Library
ii  libcudart6.5:amd64        6.5.14-1         amd64  NVIDIA CUDA Runtime Library
ii  libcufft6.0:amd64         6.0.37-5         amd64  NVIDIA cuFFT Library
ii  libcufft6.5:amd64         6.5.14-1         amd64  NVIDIA cuFFT Library
ii  libcufftw6.0:amd64        6.0.37-5         amd64  NVIDIA cuFFTW Library
ii  libcufftw6.5:amd64        6.5.14-1         amd64  NVIDIA cuFFTW Library
rc  libcuinj64-6.0:amd64      6.0.37-5         amd64  NVIDIA CUINJ Library (64-bit)
ii  libcuinj64-6.5:amd64      6.5.14-1         amd64  NVIDIA CUINJ Library (64-bit)
ii  libcurand6.0:amd64        6.0.37-5         amd64  NVIDIA cuRAND Library
ii  libcurand6.5:amd64        6.5.14-1         amd64  NVIDIA cuRAND Library
ii  libcusparse6.0:amd64      6.0.37-5         amd64  NVIDIA cuSPARSE Library
ii  libcusparse6.5:amd64      6.5.14-1         amd64  NVIDIA cuSPARSE Library
ii  libnppc6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives core runtime library
ii  libnppc6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives core runtime library
ii  libnppi6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives for image processing runtime library
ii  libnppi6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives for image processing runtime library
ii  libnpps6.0:amd64          6.0.37-5         amd64  NVIDIA Performance Primitives for signal processing runtime library
ii  libnpps6.5:amd64          6.5.14-1         amd64  NVIDIA Performance Primitives for signal processing runtime library
ii  libnvcuvid1:amd64         352.30-1         amd64  NVIDIA CUDA Video Decoder runtime library
ii  libnvidia-eglcore:amd64   352.30-1         amd64  NVIDIA binary EGL core libraries
ii  libnvidia-ml1:amd64       352.30-1         amd64  NVIDIA Management Library (NVML) runtime library
ii  libnvtoolsext1:amd64      6.5.14-1         amd64  NVIDIA Tools Extension Library
ii  libnvvm2:amd64            6.5.14-1         amd64  NVIDIA NVVM Library
ii  nvidia-alternative        352.30-1         amd64  allows the selection of NVIDIA as GLX provider
ii  nvidia-cuda-dev           6.5.14-1         amd64  NVIDIA CUDA development files
ii  nvidia-cuda-doc           6.5.14-1         all    NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb           6.5.14-1         amd64  NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit       6.5.14-1         amd64  NVIDIA CUDA development toolkit
ii  nvidia-detect             352.30-1         amd64  NVIDIA GPU detection utility
ii  nvidia-installer-cleanup  20141201+1       amd64  cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common      20141201+1       amd64  NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms        352.30-1         amd64  NVIDIA binary kernel module DKMS source
ii  nvidia-kernel-source      352.30-1         amd64  NVIDIA binary kernel module source
ii  nvidia-modprobe           352.39-0ubuntu1  amd64  Load the NVIDIA kernel driver and create device files
ii  nvidia-profiler           6.5.14-1         amd64  NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-smi                352.30-1         amd64  NVIDIA System Management Interface
ii  nvidia-support            20141201+1       amd64  NVIDIA binary graphics driver support files
ii  nvidia-visual-profiler    6.5.14-1         amd64  NVIDIA Visual Profiler for CUDA and OpenCL
ii  nvidia-xconfig            340.46-1         amd64  X configuration tool for non-free NVIDIA drivers
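To see at a glance which installed packages belong to which driver release, the list can be filtered by version. This is a rough sketch, assuming unwrapped "dpkg -l" output (one package per line) and assuming that a driver release always looks like three digits, a dot, and more digits; the function name driver_versions is hypothetical. Packages in state "rc" (removed, config files remaining) are ignored.

```sh
# driver_versions
# Read `dpkg -l` style lines on stdin and print "<upstream-version> <package>"
# for every installed ("ii") package whose version looks like a driver release.
driver_versions() {
    awk '$1 == "ii" {
        v = $3
        sub(/-.*/, "", v)    # drop the packaging revision, e.g. "-1"
        if (v ~ /^[0-9][0-9][0-9]\.[0-9]+$/) print v, $2
    }' | sort
}

# A few lines taken from the list above; on a live system one could run
#   dpkg -l | driver_versions
driver_versions <<'EOF'
ii  libcuda1:amd64       352.30-1         amd64  NVIDIA CUDA Driver Library
rc  libcuda1-352         352.39-0ubuntu1  amd64  NVIDIA CUDA runtime library
ii  nvidia-kernel-dkms   352.30-1         amd64  NVIDIA binary kernel module DKMS source
ii  nvidia-modprobe      352.39-0ubuntu1  amd64  Load the NVIDIA kernel driver and create device files
ii  nvidia-xconfig       340.46-1         amd64  X configuration tool for non-free NVIDIA drivers
ii  nvidia-cuda-toolkit  6.5.14-1         amd64  NVIDIA CUDA development toolkit
EOF
```

For the sample lines this shows three different driver releases among the installed packages (340.46, 352.30 and 352.39), while the CUDA toolkit package is filtered out because its version does not match the assumed pattern.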
* It seems like a bug that the Debian packages
nvidia-cuda-toolkit/nvidia-driver do not support the GTX 980; at the very
least it's a wishlist item. Is there any chance that the packages will be
upgraded soon?
* Do you think that the non-matching version numbers of nvidia-modprobe
(352.39) and the NVIDIA driver (352.30) are part of the problem?
A web search seems to suggest that this is also a possible cause [2,3].
* Do you think the fact that nvidia-xconfig 352.30-1 is not in experimental
yet is part of the problem? Any plans to upgrade it soon?
Do you have any plans to upgrade the NVIDIA packages to some 352.xx
version anytime soon? Is there anything I can help you with (testing?)
Kind regards,
Alois
[1] http://www.nvidia.com/object/unix.html
[2] https://github.com/Kaixhin/dockerfiles/issues/1
[3] https://devtalk.nvidia.com/default/topic/842147/problem-with-cuda-7-toolkit-on-centos-6-6/