Bug#1104190: nvidia-kernel-dkms: package postinst failing due to module overwrite

Andreas Beckmann anbe at debian.org
Sun Apr 27 21:32:02 BST 2025


Control: reassign -1 dkms
Control: close -1

On 4/27/25 10:25, Russell Coker wrote:
> On Sunday, 27 April 2025 17:04:25 AEST Andreas Beckmann wrote:
>>> The error about the module version not being newer would be because the
>>> postinst has been run many times (every time I install packages) and
>>> compiles the same files.  Maybe there should be a --force to address that
>>> case.
>> dkms fails to get the version from the module because of that weird modinfo
>> failure and then uses an empty version string (notice the double space in
>> the error message where the version should have been printed)
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104199
> 
> OK thanks for pointing that out.  It turned out that there was EACCES which

Good. I don't think dkms could do much in that case...
And having no version still makes a valid module.

> After I fixed that error I still got the following (which happens even if
> running in permissive mode so it's not SE Linux at fault) so it looks like the
> --force is needed:
> 
> Signing module /var/lib/dkms/nvidia-current/535.216.03/build/nvidia-uvm.ko
> Signing module /var/lib/dkms/nvidia-current/535.216.03/build/nvidia-peermem.ko
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current.ko.xz already
> installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-modeset.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-drm.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-uvm.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-peermem.ko.xz
> already installed at version 535.216.03, override by specifying --force

At some point dkms got confused and lost track of the fact that the 
module was already installed. Or could it be that it couldn't delete the 
previously installed module because of some permission error?
(I've never used SE Linux, and I doubt I could test these issues in a 
chroot on a "normal" host kernel.)
This could also have been caused by an earlier dkms version that made it 
easier to get dkms into a messy state.
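Here is a minimal shell sketch for spotting the inconsistent state described above: dkms reports the module as only "built", yet module files it installed earlier are still present under /lib/modules/<kernel>/updates/dkms. The helper name state_is_stale and its arguments are hypothetical. The parsing assumes the dkms 3.x status line format "<module>/<version>, <kernel>, <arch>: <state>", and older dkms releases print this differently.

```shell
# Hypothetical helper (not part of dkms): succeeds when a dkms status line
# says "built" although matching module files are still installed.
# Assumes the dkms 3.x format "<module>/<version>, <kernel>, <arch>: <state>".
state_is_stale() {
    status_line=$1  # one line of `dkms status` output
    moddir=$2       # e.g. /lib/modules/6.12.22-amd64/updates/dkms
    prefix=$3       # installed module name prefix, e.g. nvidia-current
    state=${status_line##*: }  # everything after the last ": "
    state=${state%% *}         # drop trailing notes such as "(WARNING ...)"
    [ "$state" = "built" ] || return 1
    for f in "$moddir/$prefix"*.ko*; do
        # An unmatched glob stays literal, so check the file really exists
        if [ -e "$f" ]; then
            return 0  # files on disk that dkms no longer tracks as installed
        fi
    done
    return 1
}
```

You would feed it a line from "dkms status -m nvidia-current -v 535.216.03 -k 6.12.22-amd64". A stale result would be consistent with a later 'dkms install' refusing with "already installed at version 535.216.03, override by specifying --force".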

>> Maybe add --force
> 
> # dkms install -k 6.12.22-amd64 nvidia-current/535.216.03 --force
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-modeset.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-modeset.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-drm.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-drm.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-uvm.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-uvm.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-peermem.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-peermem.ko.xz
> Running depmod..... done.
> # echo $?
> 0

Yes, dkms stomps over the not-cleaned-up module of the same version.
Reinstalling on 6.12.22 would need --force again, but 6.12.25 was 
recently uploaded and you should get clean results without manual steps 
there. Upon removing 6.12.22 you may have some leftover stray modules in 
/lib/modules/6.12.22-* but it will be hard for dkms to clean that up 
properly.
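Finding such leftovers by hand is straightforward. Below is a minimal sketch that assumes Debian's layout: kernel images in /boot/vmlinuz-<kernel> and dkms-installed modules in /lib/modules/<kernel>/updates/dkms. The helper name list_stray_dkms_modules and its two optional directory arguments are hypothetical, and the arguments exist only to make the sketch testable. It only prints candidates for review and deletes nothing.

```shell
# Hypothetical helper (not part of dkms): print dkms-installed module files
# that remain for kernels whose image is gone.
# Assumes Debian's /boot/vmlinuz-<kernel> and
# /lib/modules/<kernel>/updates/dkms layout; both roots can be overridden.
list_stray_dkms_modules() {
    modroot=${1:-/lib/modules}
    bootdir=${2:-/boot}
    for dir in "$modroot"/*/updates/dkms; do
        [ -d "$dir" ] || continue   # unmatched glob stays literal; skip it
        kver=${dir#"$modroot"/}     # "<kernel>/updates/dkms"
        kver=${kver%%/*}            # "<kernel>"
        if [ ! -e "$bootdir/vmlinuz-$kver" ]; then
            for f in "$dir"/*.ko*; do
                if [ -e "$f" ]; then
                    printf '%s\n' "$f"
                fi
            done
        fi
    done
}
```

Run without arguments after the 6.12.22 kernel package has been removed, it would list whatever is left in /lib/modules/6.12.22-amd64/updates/dkms.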

I'm reassigning to dkms and closing this bug report for now, but I'm 
open to trying to fix this in dkms if there is a reproducible way for 
dkms (3.1.8+) to get into this state (even if it involves "user errors", 
e.g. getting the SE Linux label wrong). Please reopen if more action is 
needed.

Andreas

PS: If you want to experiment, you could try the dkms-test-dkms 
package. It builds a single trivial module that does nothing. ;-)

PPS: I have an idea of what could have happened:
- the initial 'dkms install' succeeded
- the SE Linux label gets corrupted
- upon 'dkms uninstall', the "inaccessible" built module (in 
/var/lib/dkms) has version '' (empty) and is thus older than the version 
found in /lib/modules, so dkms concludes it didn't install its 
"outdated" module and therefore does not delete it from /lib/modules
- after processing all modules (and removing none, since the version 
check always failed), dkms moves the module state from 'installed' to 'built'
- on a subsequent 'dkms install', dkms stomps over the supposedly 
"in-kernel" module of the same version.

==> https://github.com/dell/dkms/issues/525
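The decision in the third step can be modelled in a few lines of shell. This is a simplified, hypothetical model of the behaviour hypothesized above, not dkms's actual code. The function names are made up, and version ordering uses GNU sort -V. It shows why an empty version string leads to the installed copy being left in place.

```shell
# Hypothetical model of the uninstall-time version check; not dkms code.
# version_lt A B: true if A sorts strictly before B (GNU `sort -V`).
# An empty string always sorts first, i.e. counts as "older".
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print what uninstall would do with the copy found in /lib/modules.
uninstall_action() {
    built_ver=$1      # version read from the built module ('' if unreadable)
    installed_ver=$2  # version of the module found in /lib/modules
    if version_lt "$built_ver" "$installed_ver"; then
        echo keep     # treated as "not ours": a newer module seems installed
    else
        echo delete
    fi
}
```

Under this model, uninstall_action '' 535.216.03 prints keep, while uninstall_action 535.216.03 535.216.03 prints delete. That matches the stray module being left behind only when the version could not be read.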



More information about the pkg-nvidia-devel mailing list