Bug#879792: nvidia-detect: misses nvidia tesla p100 (GP100GL)

Vincent McIntyre vincent.mcintyre at csiro.au
Thu Oct 26 01:22:25 UTC 2017


Package: nvidia-detect
Version: 375.82-6
Severity: normal
Tags: patch
thanks

Also found in
375.82-1~deb9u1
375.82-4~bpo9+1
384.90-1

I have a system with an NVIDIA Tesla P100 GPU:
# lspci -v -s 04:00

04:00.0 3D controller: NVIDIA Corporation GP100GL (rev a1)
        Subsystem: NVIDIA Corporation GP100GL
        Flags: fast devsel, IRQ 113, NUMA node 0
        Memory at 91000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3b800000000 (64-bit, prefetchable) [size=16G]
        Memory at 3bc00000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel modules: nouveau

# lspci -n -s 04:00
04:00.0 0302: 10de:15f8 (rev a1)

The nvidia-detect script does not find this GPU, and so does not report
that it is supported, unless I supply the PCI ID by hand, which is
inconvenient for automation.

# nvidia-detect
No NVIDIA GPU detected.

# nvidia-detect 10de:15f8
Checking driver support for PCI ID [10de:15f8]
Your card is supported by the default drivers.
It is recommended to install the
    nvidia-driver
package.
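
Until detection is fixed, one possible workaround for automation is to
pull the IDs out of lspci -n and pass them to nvidia-detect explicitly.
This is only a rough sketch: the function name list_nvidia_ids and the
sample input are made up for illustration, and it assumes the lspci -n
line format shown above and the two vendor IDs the script already checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-detect): read "lspci -n"-style
# lines on stdin and print the unique vendor:device IDs of devices in any
# display class (03xx) whose vendor is 10de or 12d2.
list_nvidia_ids() {
    awk '$2 ~ /^03[0-9a-f][0-9a-f]:$/ && $3 ~ /^(10de|12d2):/ { print $3 }' | sort -u
}

# Sample input modelled on the lspci -n output above, plus the Matrox
# VGA device from the same machine (which must not be reported).
sample='04:00.0 0302: 10de:15f8 (rev a1)
0a:00.0 0300: 102b:0534 (rev 01)
82:00.0 0302: 10de:15f8 (rev a1)'

printf '%s\n' "$sample" | list_nvidia_ids

# On a real system (untested sketch):
#   lspci -n | list_nvidia_ids | while read -r id; do nvidia-detect "$id"; done
```

For the sample input this prints 10de:15f8 once, since both P100 cards
share the same ID.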

The fix looks straightforward: let the filter applied to the lspci -mn
output match the appropriate device classes, not just 0300:

# diff -u /usr/bin/nvidia-detect{,.new}
--- /usr/bin/nvidia-detect      2017-10-26 08:43:02.568875942 +0800
+++ /usr/bin/nvidia-detect.new  2017-10-26 08:42:42.792830644 +0800
@@ -215,7 +215,7 @@
                exit 1
        fi

-       NV_DEVICES=$(lspci -mn | awk '{ gsub("\"",""); if ($2 == "0300" && ($3 == "10de" || $3 == "12d2")) { print $1 } }')
+       NV_DEVICES=$(lspci -mn | sed -e's/"//g'| awk '$2 ~ /^030[0-2]$/{ if ( ($3 == "10de" || $3 == "12d2")) { print $1 } }')

        if [ -z "$NV_DEVICES" ]; then
                echo "No NVIDIA GPU detected."

# /usr/bin/nvidia-detect.new
Detected NVIDIA GPUs:
04:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [10de:15f8] (rev a1)
82:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [10de:15f8] (rev a1)

Checking card:  NVIDIA Corporation GP100GL (rev a1)
Your card is supported by the default drivers.
It is recommended to install the
    nvidia-driver
package.

Checking card:  NVIDIA Corporation GP100GL (rev a1)
Your card is supported by the default drivers.
It is recommended to install the
    nvidia-driver
package.



The bit I am unsure about is which classes should be matched.
https://pci-ids.ucw.cz/read/PD/03 lists these subclasses:
  00  VGA compatible controller
  01  XGA compatible controller
  02  3D controller
  80  Display controller

My suggested change matches 0300 through 0302 but not 0380. I think that
is OK, but there may be other considerations I am not aware of.
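
If 0380 should be matched as well, the filter could list the four
subclasses explicitly. Again only a sketch, not a tested patch: the
function name filter_nvidia_display is hypothetical, the sample lines
imitate lspci -mn output, and the subsystem IDs for the P100 are
placeholders.

```shell
#!/bin/sh
# Hypothetical variant of the filter in nvidia-detect: read "lspci -mn"-style
# lines on stdin and print the slot of every device whose class is one of
# the display subclasses listed above and whose vendor is 10de or 12d2.
filter_nvidia_display() {
    awk '{
        gsub("\"", "")   # strip the quotes that lspci -m puts around fields
        if ($2 ~ /^03(00|01|02|80)$/ && ($3 == "10de" || $3 == "12d2")) {
            print $1
        }
    }'
}

# Sample input; the P100 subsystem IDs ("0000") are placeholders.
sample='04:00.0 "0302" "10de" "15f8" -ra1 "10de" "0000"
0a:00.0 "0300" "102b" "0534" -r01 "1028" "0600"
82:00.0 "0302" "10de" "15f8" -ra1 "10de" "0000"'

printf '%s\n' "$sample" | filter_nvidia_display
```

For the sample input this prints the two P100 slots (04:00.0 and 82:00.0)
and skips the Matrox VGA device.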

Comments?

Kind regards
Vince


-- Package-specific info:
uname -a:
Linux testbox 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

/proc/version:
Linux version 4.9.0-4-amd64 (debian-kernel at lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.51-1 (2017-09-28)

lspci 'VGA compatible controller [0300]':
0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01) (prog-if 00 [VGA controller])
	Subsystem: Dell G200eR2 [1028:0600]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (4000ns min, 8000ns max), Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 19
	NUMA node: 0
	Region 0: Memory at 90000000 (32-bit, prefetchable) [size=16M]
	Region 1: Memory at 92800000 (32-bit, non-prefetchable) [size=16K]
	Region 2: Memory at 92000000 (32-bit, non-prefetchable) [size=8M]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: mgag200
	Kernel modules: mgag200

dmesg:

Device node permissions:
crw-rw---- 1 root video 226,  0 Oct 25 11:36 /dev/dri/card0
crw-rw---- 1 root video 226, 64 Oct 25 11:36 /dev/dri/controlD64
video:x:44:

OpenGL and NVIDIA library files installed:

/etc/modprobe.d:
total 12
drwxr-xr-x  2 root root 4096 Oct 25 12:19 .
drwxr-xr-x 95 root root 4096 Oct 26 08:40 ..


/etc/modules-load.d:
-rw-r--r-- 1 root root  195 Oct 25 11:20 /etc/modules

/etc/modules-load.d/:
total 8
drwxr-xr-x  2 root root 4096 Oct 25 11:20 .
drwxr-xr-x 95 root root 4096 Oct 26 08:40 ..
lrwxrwxrwx  1 root root   10 Jul  6 04:31 modules.conf -> ../modules


Files from nvidia-installer:

Config and logfiles:

<<<<<<<<<< Xorg (journald) >>>>>>>>>>
^^^^^^^^^^ Xorg (journald) ^^^^^^^^^^

Kernel modules: nvidia.ko


lsmod:
Module                  Size  Used by
mpt3sas               217088  1
raid_class             16384  1 mpt3sas
scsi_transport_sas     45056  1 mpt3sas
mptctl                 36864  1
mptbase                77824  1 mptctl
dell_rbu               16384  0
ipmi_devintf           20480  2
intel_rapl             20480  0
sb_edac                24576  0
edac_core              57344  1 sb_edac
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
coretemp               16384  0
kvm_intel             192512  0
kvm                   589824  1 kvm_intel
irqbypass              16384  1 kvm
crct10dif_pclmul       16384  0
crc32_pclmul           16384  0
ghash_clmulni_intel    16384  0
nouveau              1544192  0
intel_cstate           16384  0
iTCO_wdt               16384  0
iTCO_vendor_support    16384  1 iTCO_wdt
lpc_ich                24576  0
intel_uncore          118784  0
joydev                 20480  0
evdev                  24576  1
mxm_wmi                16384  1 nouveau
intel_rapl_perf        16384  0
mfd_core               16384  1 lpc_ich
video                  40960  1 nouveau
mei_me                 36864  0
mgag200                45056  1
ttm                    98304  2 mgag200,nouveau
sg                     32768  0
ipmi_si                57344  1
ipmi_msghandler        49152  2 ipmi_devintf,ipmi_si
dcdbas                 16384  0
pcspkr                 16384  0
mei                   102400  1 mei_me
drm_kms_helper        155648  2 mgag200,nouveau
drm                   360448  5 mgag200,nouveau,ttm,drm_kms_helper
i2c_algo_bit           16384  2 mgag200,nouveau
shpchp                 36864  0
wmi                    16384  2 mxm_wmi,nouveau
acpi_power_meter       20480  0
button                 16384  1 nouveau
ip_tables              24576  0
x_tables               36864  1 ip_tables
autofs4                40960  3
ext4                  585728  8
crc16                  16384  1 ext4
jbd2                  106496  1 ext4
crc32c_generic         16384  0
fscrypto               28672  1 ext4
ecb                    16384  0
mbcache                16384  9 ext4
mlx4_en               114688  0
dm_mod                118784  27
sr_mod                 24576  0
cdrom                  61440  1 sr_mod
hid_generic            16384  0
usbhid                 53248  0
hid                   122880  2 hid_generic,usbhid
sd_mod                 45056  2
crc32c_intel           24576  16
aesni_intel           167936  1
aes_x86_64             20480  1 aesni_intel
glue_helper            16384  1 aesni_intel
lrw                    16384  1 aesni_intel
gf128mul               16384  1 lrw
ablk_helper            16384  1 aesni_intel
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
mlx4_core             303104  1 mlx4_en
devlink                28672  2 mlx4_en,mlx4_core
ahci                   36864  0
ehci_pci               16384  0
libahci                32768  1 ahci
ehci_hcd               81920  1 ehci_pci
tg3                   159744  0
ptp                    20480  2 tg3,mlx4_en
libata                249856  2 ahci,libahci
usbcore               249856  3 usbhid,ehci_hcd,ehci_pci
megaraid_sas          131072  3
pps_core               16384  1 ptp
usb_common             16384  1 usbcore
libphy                 49152  1 tg3
scsi_mod              225280  9 sd_mod,megaraid_sas,mptctl,scsi_transport_sas,libata,mpt3sas,raid_class,sr_mod,sg

xrandr:

OpenCL ICDs:


-- System Information:
Debian Release: 9.1
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'stable-debug')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-4-amd64 (SMP w/32 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nvidia-detect depends on:
ii  pciutils  1:3.5.2-1

nvidia-detect recommends no packages.

nvidia-detect suggests no packages.

Versions of packages nvidia-detect is related to:
pn  bumblebee                      <none>
pn  bumblebee-nvidia               <none>
pn  ccache                         <none>
pn  libcuda1                       <none>
pn  libcuda1-any                   <none>
pn  libdrm-nouveau1                <none>
pn  libdrm-nouveau1a               <none>
pn  libdrm-nouveau2                <none>
pn  libgl1-nvidia-glx-any          <none>
pn  libopencl1                     <none>
pn  libvulkan1                     <none>
pn  linux-headers                  <none>
ii  make                           4.1-9.1
pn  nvidia-driver                  <none>
pn  nvidia-glx                     <none>
pn  nvidia-glx-any                 <none>
pn  nvidia-kernel-common           <none>
pn  nvidia-kernel-dkms             <none>
pn  nvidia-kernel-source           <none>
pn  nvidia-kernel-support-any      <none>
pn  nvidia-modprobe                <none>
pn  nvidia-settings                <none>
pn  nvidia-support                 <none>
pn  nvidia-xconfig                 <none>
pn  opencl-icd                     <none>
pn  vulkan-icd                     <none>
pn  xserver-xorg                   <none>
pn  xserver-xorg-core              <none>
pn  xserver-xorg-legacy            <none>
pn  xserver-xorg-video-nouveau     <none>
pn  xserver-xorg-video-nvidia-any  <none>

-- debconf-show failed
