Bug#1104190: nvidia-kernel-dkms: package postinst failing due to module overwrite
Andreas Beckmann
anbe at debian.org
Sun Apr 27 21:32:02 BST 2025
Control: reassign -1 dkms
Control: close -1
On 4/27/25 10:25, Russell Coker wrote:
> On Sunday, 27 April 2025 17:04:25 AEST Andreas Beckmann wrote:
>>> The error about the module version not being newer would be because the
>>> postinst has been run many times (every time I install packages) and
>>> compiles the same files. Maybe there should be a --force to address that
>>> case.
>> dkms fails to get the version from the module because of that weird modinfo
>> failure and then uses an empty version string (notice the double space in
>> the error message where the version should have been printed)
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104199
>
> OK thanks for pointing that out. It turned out that there was EACCES which
Good. I don't think dkms could do much in that case...
And having no version still makes a valid module.
> After I fixed that error I still got the following (which happens even if
> running in permissive mode so it's not SE Linux at fault) so it looks like the
> --force is needed:
>
> Signing module /var/lib/dkms/nvidia-current/535.216.03/build/nvidia-uvm.ko
> Signing module /var/lib/dkms/nvidia-current/535.216.03/build/nvidia-peermem.ko
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current.ko.xz already
> installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-modeset.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-drm.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-uvm.ko.xz
> already installed at version 535.216.03, override by specifying --force
> Module /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-peermem.ko.xz
> already installed at version 535.216.03, override by specifying --force
At some point dkms got confused and lost the information that the module
was already installed. Or could it be that it couldn't delete the
previously installed module because of some permission error?
(I've never used SE Linux and I doubt I could do tests for these issues
in a chroot on a "normal" host kernel.)
This could also be caused by an earlier dkms version that made it easier
to get dkms into a messier state.
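A quick way to test the permission theory is to look for files under the dkms tree that the current process cannot read. The following is only a minimal sketch: the helper name `unreadable_files` is made up for illustration, and it is an assumption that an ordinary access check reflects the denial seen here; the audit log remains the authoritative source for SE Linux denials.

```python
import os


def unreadable_files(directory):
    """Return a sorted list of files below `directory` that the current
    process cannot open for reading (hypothetical diagnostic helper)."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.access() asks the kernel whether a read would be permitted.
            if not os.access(path, os.R_OK):
                blocked.append(path)
    return sorted(blocked)
```

For example, `unreadable_files("/var/lib/dkms/nvidia-current")` run as the user that executes the postinst would list any built modules that modinfo could not open.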
>> Maybe add --force
>
> # dkms install -k 6.12.22-amd64 nvidia-current/535.216.03 --force
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-
> current.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> modeset.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> modeset.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> drm.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-drm.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> uvm.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-uvm.ko.xz
> Found pre-existing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> peermem.ko.xz, archiving for uninstallation
> Installing /lib/modules/6.12.22-amd64/updates/dkms/nvidia-current-
> peermem.ko.xz
> Running depmod..... done.
> # echo $?
> 0
Yes, dkms stomps over the not-cleaned-up module of the same version.
Reinstalling on 6.12.22 would need --force again, but 6.12.25 was
recently uploaded and you should get clean results without manual steps
there. Upon removing 6.12.22 you may have some leftover stray modules in
/lib/modules/6.12.22-* but it will be hard for dkms to clean that up
properly.
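To see what would be left behind, one could list the module files under updates/dkms for the affected kernel before removing it. A minimal sketch, assuming the layout shown in the messages above (/lib/modules/<kernel>/updates/dkms/*.ko.xz); the function name and its parameters are hypothetical:

```python
from pathlib import Path


def find_dkms_modules(modules_root, kernel_prefix):
    """List module files (*.ko, *.ko.xz, ...) found in updates/dkms for
    every kernel directory whose name starts with `kernel_prefix`."""
    found = []
    for kernel_dir in sorted(Path(modules_root).glob(kernel_prefix + "*")):
        dkms_dir = kernel_dir / "updates" / "dkms"
        if dkms_dir.is_dir():
            # Keep only files that carry a ".ko" suffix, compressed or not.
            found.extend(
                sorted(p for p in dkms_dir.iterdir() if ".ko" in p.suffixes)
            )
    return found
```

Calling `find_dkms_modules("/lib/modules", "6.12.22-")` only reports candidates; whether a given file is really stray still has to be checked against `dkms status`.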
I'm reassigning to dkms and closing this bug report for now, but I'm
open to try fixing this in dkms if there is a reproducible way for dkms
(3.1.8+) to get into this state. (Even if it involves "user errors" to
e.g. get the SE Linux label wrong.) Please reopen if there is more
action needed.
Andreas
PS: If you want to experiment, you could try with the dkms-test-dkms
package. Builds a single trivial module which does nothing. ;-)
PPS: I have an idea what could have happened:
- initial 'dkms install' succeeded
- SE label gets corrupted
- upon 'dkms uninstall', the "unaccessible" built module (in
/var/lib/dkms) has version '' (empty) and is thus older than the version
found in /lib/modules - so dkms concludes it didn't install its
"outdated" module and therefore does not delete it from /lib/modules
- after processing all modules (and removing none after the version
check always failed) dkms moves the module state from 'installed' to 'built'
- on a subsequent 'dkms install' dkms stomps over the supposedly
"in-kernel" module of the same version.
==> https://github.com/dell/dkms/issues/525
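The sequence above can be played through with a toy model. This is not dkms's actual code: the comparison rule in `uninstall_removes` and all function names are assumptions that merely restate the hypothesis, namely that an empty version string compares as older than any real version.

```python
# Toy model of the hypothesised sequence; NOT taken from dkms itself.

def version_key(version):
    """Turn '535.216.03' into (535, 216, 3); an empty string becomes ()."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def uninstall_removes(built_version, installed_version):
    """Assumed rule: the file in /lib/modules is deleted only if the built
    module is not older than the installed one."""
    return version_key(built_version) >= version_key(installed_version)


def simulate(label_corrupted):
    """Return (dkms state after uninstall, stray module left behind?)."""
    installed = "535.216.03"
    # An unreadable built module yields an empty version string.
    built = "" if label_corrupted else "535.216.03"
    removed = uninstall_removes(built, installed)
    state = "built"  # the state moves from 'installed' to 'built' either way
    return state, not removed
```

With an intact label `simulate(False)` leaves nothing behind, while `simulate(True)` ends in state 'built' with the old file still in /lib/modules, which is the situation where the next 'dkms install' asks for --force.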