Bug#889669: nvidia-graphics-drivers: solve the upgrade problem

Tue Mar 20 20:22:31 UTC 2018

Hi,

On 2/5/18 4:26 PM, Philipp Kern wrote:
> NVIDIA users on Debian have always accepted that package upgrades
> break newly spawned binaries, because the interface between the client
> library and the kernel driver is strictly versioned. The kernel module
> will emit an API mismatch error into the kernel log and GLX requests
> will fail. A reboot is required to remediate this situation.
> 
> I would propose the following model:
> 
> * All binary packages that require strict versioning with NVRM are
> shipped in versioned packages. This means that the library package names
> reflect both the major and minor version of the driver (i.e. the version
> the driver checks against). The resulting packages should be co-installable
> with each other.
> * A script updates the symlinks for the currently active libraries to
> point to the version of the currently loaded nvidia module (as read
> from /sys/module/nvidia/version in sysfs). This script is called on
> installation but, more crucially, on every boot. This ties the
> libraries to the module loaded at boot-up.
> * The kernel module itself does not have to be versioned. The kernel
> module can be upgraded and it will end up in the initrd automatically.
> 
> Assuming that we have a metapackage that pulls in the most recent driver
> (like linux-image does), this model would allow upgrading the driver at
> any point in time while only making it live with the next reboot.
> Applications continue to function in the meantime.
> 
> This approach has the drawback that every update from NVIDIA needs to go
> through NEW. However, I think this is only a theoretical disadvantage at
> this point, as NEW latency for ABI version changes has decreased a lot.
> 
> The thing I'm not sure about is how this proposal interacts with the
> legacy modules. I suppose they can all use the same mechanism, but the
> script would need to be aware of which library stack to choose. The
> NVIDIA kernel shim already uses rm_is_supported_device to check whether
> the installed device is supported. Together with modalias, that should
> already load the correct module, and the script could then simply check
> which of the modules (a legacy one or the current one) is loaded and act
> accordingly.
> 
> Do you think this would be workable? The NVIDIA packaging is quite a
> beast to handle, I know (and I'm very grateful for your work!). So we
> should reach some consensus on whether this is something you'd be
> interested in. :)
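A rough sketch of the boot-time symlink switch described in the proposal, written in Python purely for illustration. Only the sysfs path comes from the proposal; everything else is an assumption: the library layout (one directory per driver version under a hypothetical root, plus a `current` symlink), the function names, and the list of candidate module names, to which the legacy module names would have to be added.

```python
import os
from pathlib import Path

# Hypothetical layout (an assumption, not Debian's actual paths):
#   <lib_root>/<version>/  co-installable library stack for one driver version
#   <lib_root>/current     symlink pointing at the active version directory
SYSFS_MODULE_ROOT = Path("/sys/module")
# Hypothetical: sysfs names of the modules to probe, in order of preference.
# Legacy module names would be appended here.
CANDIDATE_MODULES = ("nvidia",)


def loaded_driver_version(sysfs_root: Path = SYSFS_MODULE_ROOT,
                          modules=CANDIDATE_MODULES):
    """Return (module_name, version) for the first candidate module that is
    loaded, or None if none of them exposes a version file in sysfs."""
    for name in modules:
        version_file = sysfs_root / name / "version"
        if version_file.is_file():
            return name, version_file.read_text().strip()
    return None


def activate_libraries(lib_root: Path, version: str) -> Path:
    """Point <lib_root>/current at <lib_root>/<version> atomically."""
    target = lib_root / version
    if not target.is_dir():
        # Refuse to switch to a library stack that is not installed.
        raise FileNotFoundError(f"no library stack installed for driver {version}")
    link = lib_root / "current"
    tmp = lib_root / ".current.tmp"
    tmp.unlink(missing_ok=True)   # clean up a leftover from an interrupted run
    tmp.symlink_to(version)       # relative target, resolved against lib_root
    os.replace(tmp, link)         # rename(2) swaps the symlink in one step
    return link
```

Called on installation and again early during boot, it would check that `loaded_driver_version()` did not return `None` and then pass the version to `activate_libraries()` with the (hypothetical) library root. Creating a temporary symlink and renaming it over the old one keeps the switch atomic, so a process starting mid-update never sees a missing link.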

Is there anything I could help with to reach a consensus here?
Anything? :)

(I just ran into this again: I had to reboot when all I wanted was to
install the i386 driver.)

Kind regards and thanks
Philipp Kern