Bug#889669: nvidia-graphics-drivers: solve the upgrade problem

Michael Schaller misch at google.com
Wed Mar 21 08:17:15 UTC 2018


Please reconsider the assessment that this is merely an annoyance and a
wishlist item.
If an NVIDIA driver security update is pushed and security updates are
installed unattended, then all NVIDIA user space components will stop
working immediately after the respective packages are updated, because the
loaded kernel module and the user space components now have a version
mismatch. The consequences are not immediately visible to the user, as
NVIDIA components already in memory are still properly matched and hence
still work. The real issue is with new processes: for instance, no OpenGL
applications or CUDA workloads can be launched anymore. This is especially
severe for CUDA server farms, as they currently can't enable unattended
security updates unless they specifically exclude NVIDIA driver updates.
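To make the failure mode concrete, here is a minimal Python sketch of a check that compares the version of the loaded kernel module with the version of the installed user space components. It assumes that the loaded module reports its version in `/proc/driver/nvidia/version` (a first line containing "Kernel Module  <version>") and that the user space version is supplied by the caller, e.g. taken from the package manager. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed location where the loaded NVIDIA kernel module reports its version.
PROC_VERSION = Path("/proc/driver/nvidia/version")

# Assumed format, e.g.:
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.42  Sat Mar  3 04:10:22 PST 2018"
_KMOD_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_kernel_module_version(text: str) -> Optional[str]:
    """Return the driver version reported by the loaded module, or None."""
    match = _KMOD_RE.search(text)
    return match.group(1) if match else None


def versions_match(kernel_version: Optional[str], userspace_version: str) -> bool:
    """New processes can only use the driver if both versions are identical."""
    return kernel_version is not None and kernel_version == userspace_version


def check(userspace_version: str, proc_path: Path = PROC_VERSION) -> str:
    """Return 'ok', 'mismatch' or 'not-loaded' for the running system."""
    if not proc_path.exists():
        return "not-loaded"
    kernel_version = parse_kernel_module_version(proc_path.read_text())
    # Processes started before the upgrade keep working; only new ones hit this.
    return "ok" if versions_match(kernel_version, userspace_version) else "mismatch"
```

For example, calling `check("390.48")` right after an unattended upgrade from 390.42 to 390.48 would be expected to return `"mismatch"` until the module is reloaded or the machine is rebooted.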

On Wed, Mar 21, 2018 at 9:00 AM Philipp Kern <pkern at debian.org> wrote:

> On 03/20/2018 10:59 PM, Luca Boccassi wrote:
> > The problems I see are that it would make an already quite complex
> > packaging system, over which we have very little control (most of it
> > is binary blobs) even more complicated. We already have 2 layers of
> > update-alternatives (mesa vs nvidia and then current vs legacy).
> >
> > It would also mean we have to start maintaining multiple versions at
> > the same time - again being all binary blobs, which will multiply the
> > source of problems. Basically, it would mean that instead of having
> > current vs legacy340xx (up until a few months ago also legacy304xx),
> > every single driver update would have to be maintained separately.

> I don't propose this as the solution, though. I think that'd indeed be
> infeasible. What I'm saying is that the *binary* packages are versioned
> like this, not the source packages. It's like the kernel in a way, where
> every ABI version gets its own binary package name. Although in Debian
> the hesitance to change the ABI is much higher than in Ubuntu, for
> reasons that I assume have to do with the NEW queue. Cleaning up older
> versions is something we'd find a solution for, just like people clean
> up their old kernels.
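The idea of per-version binary packages, with old versions cleaned up the way old kernels are, can be illustrated with a small Python sketch of one possible cleanup policy. The package name `nvidia-driver-<version>`, the helper functions, and the rule "keep the running version plus the two newest" are hypothetical assumptions for illustration, not an existing Debian scheme. Keeping more than one version installed is also what would allow rolling back to a previous driver.

```python
from typing import Iterable, List, Set, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Sort dotted driver versions numerically, e.g. 390.42 < 390.48."""
    return tuple(int(part) for part in version.split("."))


def package_name(version: str) -> str:
    # Hypothetical per-version binary package name, analogous to linux-image-<abi>.
    return f"nvidia-driver-{version}"


def versions_to_remove(installed: Iterable[str], running: str, keep_newest: int = 2) -> List[str]:
    """Pick installed versions that are safe to purge (hypothetical policy).

    Never removes the version whose kernel module is currently loaded, and
    keeps the `keep_newest` most recent versions for rollbacks/rollforwards.
    """
    ordered = sorted(set(installed), key=version_key, reverse=True)
    keep: Set[str] = set(ordered[:keep_newest])
    keep.add(running)
    return sorted((v for v in ordered if v not in keep), key=version_key)
```

With 384.111, 390.25, 390.42 and 390.48 installed and 390.25 loaded, this policy would only mark `nvidia-driver-384.111` for removal.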

> So please separate out maintenance from the proposal. ;-)

> I get the point about the two layers of alternatives. Is the reason for mesa vs.
> nvidia because we don't put Nvidia into the library search path first
> and need to deal with the corresponding file conflicts in a sane way? Or
> because we want to keep co-installability between mesa and nvidia?

> > In the end the problem is an annoyance but not a deal breaker - updates
> > can be scheduled and delayed (unlike some other OSes...), and on top of
> > that, version bumps are not that common - at most once a month, and
> > only for those running unstable or testing - in stable we just ship LTS
> > versions.

> Actually it's a real deal breaker in mass deployments. If your users are
> hesitant to reboot because rebooting resets their work environment, you
> really need to detach nvidia updates from the rest of the package
> updates, which means having a custom-built solution to do that. That has
> turned out to be brittle: you end up installing pre-downloaded modules
> at boot, which blocks booting for about ten minutes. (It has gotten
> better with SSDs, but still.)

> Even if you just ship LTS versions there are sometimes updates needed,
> be it for Meltdown/Spectre or new hardware. In our case we actually do
> use testing, but even then we had the need to push updates to drivers. I
> think a setup that separates out binaries for every version that allows
> for consistent rollbacks[1] and rollforwards would be beneficial not
> just for us but also for the whole userbase of Debian.

> We'd be willing to invest some time into a solution, as our own
> workaround for the flaws in the packaging has turned out to be a
> maintenance headache. But that only works if we at least agree on a
> plan. I'm also happy to clarify anything I missed in the proposal. :)

> Kind regards and thanks
> Philipp Kern

> [1] We had a bunch of regressions with newer drivers in the past that
> made them dead on arrival, like missing repaints in terminals for a
> fraction of the cards.



More information about the pkg-nvidia-devel mailing list