Bug#994942: nvidia-driver: does not load on kernel 5.14.6

Julian Gilbey jdg at debian.org
Tue Oct 5 19:42:53 BST 2021


found 994942 470.63.01-1
thanks

On Thu, Sep 23, 2021 at 05:26:34PM +0200, Michael Rasmussen wrote:
> Package: nvidia-driver
> Version: 470.57.02-3
> Severity: important
> 
> Dear Maintainer,
> 
> When loading the kernel modules the following is shown in syslog:
> [  283.076623] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
> [  283.076628] caller _nv000722rm+0x1ad/0x200 [nvidia] mapping multiple BARs
> [  283.078369] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x23:0xffff:1204)
> [  283.078395] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
> [...]

I don't know whether it's related, but I am also seeing this bug with
version 470.63.01-1, and the kernel is also reporting buffer overflows,
for example:

Oct  5 17:29:44 euler kernel: [   66.779884] ------------[ cut here ]------------
Oct  5 17:29:44 euler kernel: [   66.779887] Buffer overflow detected (8 < 192)!
Oct  5 17:29:44 euler kernel: [   66.779898] WARNING: CPU: 9 PID: 3150 at include/linux/thread_info.h:200 ethtool_rxnfc_copy_to_user+0x2b/0xb0
Oct  5 17:29:44 euler kernel: [   66.779906] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_cgroup cpufreq_ondemand cpufreq_conservative cpufreq_userspace cpufreq_powersave xt_mark xt_owner xt_tcpudp nft_compat nft_counter nf_tables libcrc32c nfnetlink cmac overlay algif_hash algif_skcipher af_alg bnep binfmt_misc sd_mod eeepc_wmi sg asus_wmi btusb btrtl btbcm btintel bluetooth hid_generic jitterentropy_rng uvcvideo videobuf2_vmalloc sha512_ssse3 videobuf2_memops sha512_generic snd_usb_audio videobuf2_v4l2 videobuf2_common drbg usbhid videodev snd_usbmidi_lib uas ansi_cprng snd_rawmidi hid snd_seq_device usb_storage mc ecdh_generic ecc intel_rapl_msr intel_rapl_common dm_crypt rtw88_8822be snd_hda_codec_realtek rtw88_8822b snd_hda_codec_generic rtw88_pci edac_mce_amd snd_hda_codec_hdmi nls_ascii ledtrig_audio rtw88_core snd_hda_intel kvm_amd battery sparse_keymap snd_intel_dspcfg wmi_bmof video snd_intel_sdw_acpi mxm_wmi kvm mac80211 evdev snd_hda_codec nls_cp437 vfat
Oct  5 17:29:44 euler kernel: [   66.779952]  snd_hda_core nvidia_drm(POE) irqbypass snd_hwdep crc32_pclmul fat ghash_clmulni_intel snd_pcm_oss snd_mixer_oss cfg80211 rapl igb snd_pcm ptp drm_kms_helper pps_core dca ccp pcspkr snd_timer i2c_algo_bit ahci rfkill efi_pstore cec libarc4 snd libahci rng_core rc_core soundcore sp5100_tco libata watchdog nvidia_modeset(POE) k10temp i2c_piix4 wmi button acpi_cpufreq nvidia(POE) nfsd drivetemp auth_rpcgss scsi_mod nfs_acl lockd parport_pc grace ppdev lp drm parport sunrpc fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod crc32c_intel xhci_pci aesni_intel xhci_hcd nvme crypto_simd cryptd nvme_core usbcore t10_pi crc_t10dif crct10dif_generic gpio_amdpt crct10dif_pclmul crct10dif_common usb_common gpio_generic
Oct  5 17:29:44 euler kernel: [   66.779997] CPU: 9 PID: 3150 Comm: nmbd Tainted: P           OE     5.14.0-1-amd64 #1  Debian 5.14.6-2
Oct  5 17:29:44 euler kernel: [   66.779999] Hardware name: System manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 1205 05/12/2020
Oct  5 17:29:44 euler kernel: [   66.780000] RIP: 0010:ethtool_rxnfc_copy_to_user+0x2b/0xb0
Oct  5 17:29:44 euler kernel: [   66.780003] Code: 1f 44 00 00 41 55 65 48 8b 04 25 c0 7b 01 00 41 54 55 53 f6 40 10 02 75 23 be 08 00 00 00 48 c7 c7 28 16 70 8b e8 64 4c 14 00 <0f> 0b 41 bc f2 ff ff ff 5b 44 89 e0 5d 41 5c 41 5d c3 48 89 fb 49
Oct  5 17:29:44 euler kernel: [   66.780005] RSP: 0018:ffffa8da03c5fbf8 EFLAGS: 00010282
Oct  5 17:29:44 euler kernel: [   66.780007] RAX: 0000000000000000 RBX: ffffffffc0e24ec0 RCX: ffff9cd5bd458888
Oct  5 17:29:44 euler kernel: [   66.780008] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9cd5bd458880
Oct  5 17:29:44 euler kernel: [   66.780009] RBP: ffff9cc699a6c000 R08: 0000000000000000 R09: ffffa8da03c5fa20
Oct  5 17:29:44 euler kernel: [   66.780010] R10: ffffa8da03c5fa18 R11: ffff9cd5bd0fffe8 R12: 0000000000000000
Oct  5 17:29:44 euler kernel: [   66.780010] R13: 00007fff0cf24fb0 R14: 0000000000000000 R15: ffffa8da03c5fc28
Oct  5 17:29:44 euler kernel: [   66.780012] FS:  00007f792e382a40(0000) GS:ffff9cd5bd440000(0000) knlGS:0000000000000000
Oct  5 17:29:44 euler kernel: [   66.780013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  5 17:29:44 euler kernel: [   66.780014] CR2: 0000556c89b8e098 CR3: 0000800151d32000 CR4: 00000000003506e0
Oct  5 17:29:44 euler kernel: [   66.780015] Call Trace:
Oct  5 17:29:44 euler kernel: [   66.780020]  ethtool_get_rxnfc+0xcb/0x1b0
Oct  5 17:29:44 euler kernel: [   66.780023]  dev_ethtool+0xb4b/0x28f0
Oct  5 17:29:44 euler kernel: [   66.780026]  ? tomoyo_init_request_info+0x8f/0xb0
Oct  5 17:29:44 euler kernel: [   66.780029]  ? tomoyo_path_number_perm+0x66/0x1d0
Oct  5 17:29:44 euler kernel: [   66.780031]  ? obj_cgroup_charge_pages+0xdf/0x180
Oct  5 17:29:44 euler kernel: [   66.780035]  dev_ioctl+0x156/0x480
Oct  5 17:29:44 euler kernel: [   66.780038]  sock_do_ioctl+0x9b/0x130
Oct  5 17:29:44 euler kernel: [   66.780042]  sock_ioctl+0x23a/0x320
Oct  5 17:29:44 euler kernel: [   66.780043]  __x64_sys_ioctl+0x83/0xb0
Oct  5 17:29:44 euler kernel: [   66.780046]  do_syscall_64+0x3b/0xc0
Oct  5 17:29:44 euler kernel: [   66.780050]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Oct  5 17:29:44 euler kernel: [   66.780052] RIP: 0033:0x7f7931d4e957
Oct  5 17:29:44 euler kernel: [   66.780054] Code: 3c 1c 48 f7 d8 4c 39 e0 77 b9 e8 24 ff ff ff 85 c0 78 be 4c 89 e0 5b 5d 41 5c c3 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 94 0c 00 f7 d8 64 89 01 48
Oct  5 17:29:44 euler kernel: [   66.780056] RSP: 002b:00007fff0cf24f48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct  5 17:29:44 euler kernel: [   66.780057] RAX: ffffffffffffffda RBX: 0000556c89b8d010 RCX: 00007f7931d4e957
Oct  5 17:29:44 euler kernel: [   66.780058] RDX: 00007fff0cf24f80 RSI: 0000000000008946 RDI: 000000000000000f
Oct  5 17:29:44 euler kernel: [   66.780058] RBP: 000000000000000f R08: 0000000000000000 R09: 0000556c89b8d1d8
Oct  5 17:29:44 euler kernel: [   66.780059] R10: 0000556c89b8e3e4 R11: 0000000000000246 R12: 0000556c89b8d1c0
Oct  5 17:29:44 euler kernel: [   66.780060] R13: 00007fff0cf24f80 R14: 0000556c89b8e380 R15: 0000556c89b8e424
Oct  5 17:29:44 euler kernel: [   66.780061] ---[ end trace 40007612b4540696 ]---

In my case, downgrading the kernel to 5.10.0-8 also allowed this newer
nvidia-graphics-driver collection of packages to work, and I'm no
longer getting kernel errors.

So it seems to be some interaction between the newer kernel and the
NVIDIA drivers.

Best wishes,

   Julian



More information about the pkg-nvidia-devel mailing list