Bug#1032563: nvidia-driver 525.89.02 fails to initialize with Quadro M2200M
Matvey Soloviev
blackhole89 at gmail.com
Wed Mar 8 23:20:19 GMT 2023
Package: nvidia-driver
Version: 525.89.02-1
NVIDIA drivers newer than the 510 series fail to initialize on my system,
a Lenovo ThinkPad P51 with a Quadro M2200 GPU, BIOS 1.60 and ECP 1.10. I
have encountered this bug with driver versions 515, 520 and now the 525
that landed in testing, as well as with a version of 525 installed using
NVIDIA's official installer, on kernels including 6.0.7 and 6.2.2 from
xanmod and 6.1.0-5-amd64 from Debian's official repository. My system is a
mixture of packages from stable and testing, with libc6=2.36-7. Driver
version 510.108.03-1 works (but is unstable across sleep and broken with
hibernation).
Below is an excerpt from the journalctl output containing the clusters of
lines that appear pertinent to me. All logs are from a boot on the xanmod
6.2.2 kernel, but there is no appreciable difference in the relevant
output when running Debian's 6.1.0-5. The actual failure points seem to be
the RmInitAdapter and NvKmsKapiDevice errors, but I'm not sure what, if
any, causal relationship there is between the two.
Mar 08 21:28:46 tangerine kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.2-x64v1-xanmod1 root=(***) ro quiet mitigations=off psi=1 nvidia-drm.modeset=1
(...)
Mar 08 21:28:46 tangerine kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Mar 08 21:28:46 tangerine kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Mar 08 21:28:46 tangerine kernel:
Mar 08 21:28:46 tangerine kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Mar 08 21:28:46 tangerine kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 525.89.02 Wed Feb 1 23:23:25 UTC 2023
Mar 08 21:28:46 tangerine kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.89.02 Wed Feb 1 23:09:40 UTC 2023
Mar 08 21:28:46 tangerine systemd[1]: Finished Rebuild Hardware Database.
Mar 08 21:28:46 tangerine systemd[1]: Starting Rule-based Manager for Device Events and Files...
Mar 08 21:28:46 tangerine kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Mar 08 21:28:46 tangerine kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-61)
Mar 08 21:28:46 tangerine systemd[1]: Started Rule-based Manager for Device Events and Files.
Mar 08 21:28:46 tangerine systemd[1]: Starting Show Plymouth Boot Screen...
(...)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)
Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Mar 08 21:28:55 tangerine systemd-modules-load[306]: Inserted module 'nvidia_drm'
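
For anyone comparing against their own journal, here is a minimal Python
sketch that picks the failure lines quoted above out of a list of log
lines and splits the RmInitAdapter code into its three fields. The names
find_failures, init_failure_code and FAILURE_PATTERNS are made up for this
illustration, the sample lines are the ones from the excerpt with the mail
line wrapping undone, and no meaning is assigned to the three fields.

```python
import re

# Substrings taken from the failure lines in the journal excerpt above.
FAILURE_PATTERNS = [
    re.compile(r"RmInitAdapter failed"),
    re.compile(r"rm_init_adapter failed"),
    re.compile(r"Failed to allocate NvKmsKapiDevice"),
    re.compile(r"Failed to register device"),
]

# Matches e.g. "RmInitAdapter failed! (0x25:0x65:1457)".
INIT_CODE = re.compile(
    r"RmInitAdapter failed! \((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(\d+)\)"
)


def find_failures(lines):
    """Return the lines matching any failure pattern, in original order."""
    return [ln for ln in lines if any(p.search(ln) for p in FAILURE_PATTERNS)]


def init_failure_code(line):
    """Return the three numeric fields of an RmInitAdapter failure, or None."""
    m = INIT_CODE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), int(m.group(3))


SAMPLE = [
    "Mar 08 21:28:46 tangerine kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver",
    "Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)",
    "Mar 08 21:28:55 tangerine kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
    "Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice",
    "Mar 08 21:28:55 tangerine kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device",
    "Mar 08 21:28:55 tangerine systemd-modules-load[306]: Inserted module 'nvidia_drm'",
]

failures = find_failures(SAMPLE)
for line in failures:
    print(line)
print(init_failure_code(SAMPLE[1]))
```

The same two functions could be fed the lines of a saved journalctl dump
instead of SAMPLE.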
Some possibly pertinent information from nvidia-bug-report.log.gz:
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.703945619 +0100
/sys/bus/pci/devices/0000:01:00.0/power/control
on
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.711945656 +0100
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.715945675 +0100
/sys/bus/pci/devices/0000:01:00.0/power/runtime_usage
3
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/control
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 22:17:25.749945832 +0100
/sys/bus/pci/devices/0000:01:00.1/power/control
auto
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.761945887 +0100
/sys/bus/pci/devices/0000:01:00.1/power/runtime_status
suspended
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:25.807946100 +0100
/sys/bus/pci/devices/0000:01:00.1/power/runtime_usage
0
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/power
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.819946156 +0100
/proc/driver/nvidia/./gpus/0000:01:00.0/power
Runtime D3 status: ?
Video Memory: ?
GPU Hardware Support:
Video Memory Self Refresh: ?
Video Memory Off: ?
____________________________________________
/usr/bin/lspci -d "10de:*" -v -xxx
01:00.0 VGA compatible controller: NVIDIA Corporation GM206GLM [Quadro
M2200 Mobile] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo GM206GLM [Quadro M2200 Mobile]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at eb000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at d000 [size=128]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidia
00: de 10 36 14 07 00 10 00 a1 00 00 03 00 00 80 00
10: 00 00 00 eb 0c 00 00 c0 00 00 00 00 0c 00 00 d0
20: 00 00 00 00 01 d0 00 00 00 00 00 00 aa 17 51 22
30: 00 00 00 00 60 00 00 00 00 00 00 00 0a 01 00 00
40: aa 17 51 22 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 21 00 00 03 3d 45 00 40 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 04 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 10 80 00 00 00 00
c0: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio
Controller (rev a1)
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at ec000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
00: de 10 ba 0f 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 ec 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 0b 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 29 09 00 03 3d 45 00 43 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 08 04 00
a0: 00 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
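
As a cross-check of the hex dump against the decoded lspci output, here is
a small Python sketch that reads the first configuration-space row of
01:00.0. It relies only on the standard PCI header layout (vendor ID at
offset 0x00, device ID at 0x02, command register at 0x04, revision at
0x08, subclass and class at 0x0a and 0x0b, all little-endian); the helper
names parse_config_row and le16 are invented for this example.

```python
def parse_config_row(row):
    """Parse one 'lspci -xxx' row ('00: de 10 ...') into (offset, bytes)."""
    offset, _, hexbytes = row.partition(":")
    return int(offset, 16), bytes.fromhex(hexbytes.replace(" ", ""))


def le16(data, pos):
    """Read a little-endian 16-bit value starting at byte position pos."""
    return int.from_bytes(data[pos:pos + 2], "little")


# First row of the 01:00.0 dump above.
ROW0 = "00: de 10 36 14 07 00 10 00 a1 00 00 03 00 00 80 00"

offset, data = parse_config_row(ROW0)
vendor_id = le16(data, 0x00)
device_id = le16(data, 0x02)
command = le16(data, 0x04)
revision = data[0x08]
class_code = (data[0x0B] << 8) | data[0x0A]

print(f"{vendor_id:04x}:{device_id:04x} rev {revision:02x} class {class_code:04x}")
# Bit 2 of the command register is the bus master enable bit.
print("bus master enabled:", bool(command & 0x4))
```

This agrees with the [10de:1436] (rev a1) and [0300] fields that lspci -nn
reports for the same device.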
____________________________________________
/usr/bin/lspci -d "10b5:*" -v -xxx
____________________________________________
/usr/bin/lspci -t
-[0000:00]-+-00.0
+-01.0-[01]--+-00.0
| \-00.1
+-08.0
+-14.0
+-14.2
+-15.0
+-16.0
+-16.3
+-17.0
+-1c.0-[03]--
+-1c.2-[04]----00.0
+-1c.4-[05-3d]--
+-1d.0-[3e]----00.0
+-1d.4-[3f]----00.0
+-1f.0
+-1f.2
+-1f.3
+-1f.4
\-1f.6
____________________________________________
/usr/bin/lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core
Processor Host Bridge/DRAM Registers [8086:5918] (rev 05)
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor
PCIe Controller (x16) [8086:1901] (rev 05)
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 /
E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
[8086:1911]
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series
Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation 100
Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
00:15.0 Signal processing controller [1180]: Intel Corporation 100
Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160]
(rev 31)
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230
Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
00:16.3 Serial controller [0700]: Intel Corporation 100 Series/C230 Series
Chipset Family KT Redirection [8086:a13d] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation
Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode]
[8086:a102] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset
Family PCI Express Root Port #1 [8086:a110] (rev f1)
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset
Family PCI Express Root Port #3 [8086:a112] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset
Family PCI Express Root Port #5 [8086:a114] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset
Family PCI Express Root Port #9 [8086:a118] (rev f1)
00:1d.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset
Family PCI Express Root Port #13 [8086:a11c] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation CM238 Chipset LPC/eSPI
Controller [8086:a154] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series
Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation CM238 HD Audio Controller
[8086:a171] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset
Family SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection
(5) I219-LM [8086:15e3] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206GLM
[Quadro M2200 Mobile] [10de:1436] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM206 High Definition Audio
Controller [10de:0fba] (rev a1)
04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275
[8086:24fd] (rev 78)
3e:00.0 Non-Volatile memory controller [0108]: Lenovo Device [17aa:0004]
3f:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
PCI Express Card Reader [10ec:525a] (rev 01)
____________________________________________
____________________________________________
*** /sys/devices/system/node/has_cpu
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.447958149 +0100
/sys/devices/system/node/has_cpu
0
____________________________________________
*** /sys/devices/system/node/has_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100
/sys/devices/system/node/has_memory
0
____________________________________________
*** /sys/devices/system/node/has_normal_memory
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.449958159 +0100
/sys/devices/system/node/has_normal_memory
0
____________________________________________
*** /sys/devices/system/node/online
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.451958167 +0100
/sys/devices/system/node/online
0
____________________________________________
*** /sys/devices/system/node/possible
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 22:17:28.453958177 +0100
/sys/devices/system/node/possible
0
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/local_cpulist
*** ls: -r--r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100
/sys/bus/pci/devices/0000:01:00.0/local_cpulist
0-7
____________________________________________
*** /sys/bus/pci/devices/0000:01:00.0/numa_node
*** ls: -rw-r--r-- 1 root root 4096 2023-03-08 21:54:55.862015759 +0100
/sys/bus/pci/devices/0000:01:00.0/numa_node
-1
____________________________________________
*** /proc/driver/nvidia/./version
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:25.667945453 +0100
/proc/driver/nvidia/./version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.89.02 Wed Feb 1
23:23:25 UTC 2023
GCC version: gcc version 11.3.0 (Debian 11.3.0-8)
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/information
*** ls: -r--r--r-- 1 root root 0 2023-03-08 22:17:41.954016177 +0100
/proc/driver/nvidia/./gpus/0000:01:00.0/information
Model: Quadro M2200
IRQ: 140
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 40 bits
DMA Mask: 0xffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Excluded: No
____________________________________________
*** /proc/driver/nvidia/./gpus/0000:01:00.0/registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.072016657 +0100
/proc/driver/nvidia/./gpus/0000:01:00.0/registry
Binary: ""
____________________________________________
*** /proc/driver/nvidia/./params
*** ls: -r--r--r-- 1 root root 0 2023-03-08 21:55:09.912015263 +0100
/proc/driver/nvidia/./params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""
____________________________________________
*** /proc/driver/nvidia/./registry
*** ls: -rw-r--r-- 1 root root 0 2023-03-08 22:17:42.076016674 +0100
/proc/driver/nvidia/./registry
Binary: ""
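
Several values in the /proc/driver/nvidia/params dump above are printed in
decimal although they are easier to read in another base. The following
Python sketch parses a few of those lines and converts them; parse_params
is a made-up helper, the sample is copied from the dump, and the
conversions are plain arithmetic that say nothing about how the driver
interprets each value.

```python
def parse_params(text):
    """Parse 'Name: value' lines into a dict; integers become int."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Name: value' shape
        value = value.strip()
        if value.startswith('"'):
            params[name.strip()] = value.strip('"')
        else:
            try:
                params[name.strip()] = int(value)
            except ValueError:
                params[name.strip()] = value  # keep unparsed text as-is
    return params


SAMPLE = """\
ResmanDebugLevel: 4294967295
DeviceFileMode: 438
UsePageAttributeTable: 4294967295
DynamicPowerManagement: 3
EnableGpuFirmware: 18
RegistryDwords: ""
"""

params = parse_params(SAMPLE)
print(oct(params["DeviceFileMode"]))      # octal, as used for file modes
print(hex(params["ResmanDebugLevel"]))    # all 32 bits set
print(hex(params["EnableGpuFirmware"]))
```

So DeviceFileMode 438 is the familiar file mode 0666, and 4294967295 is
0xffffffff.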
If it would help, I can try to provide the more complete information
gathered by reportbug, though that would be somewhat burdensome since I
have just reverted my machine to 510.108.03-1 to restore functionality. I
also have a full nvidia-bug-report.log.gz gathered in the broken
configuration, but wasn't sure whether the bug tracker supports
attachments.