Bug#889669: nvidia-graphics-drivers: solve the upgrade problem

Luca Boccassi bluca at debian.org
Wed Mar 21 13:09:01 UTC 2018


Control: severity -1 normal

On Wed, 2018-03-21 at 08:17 +0000, Michael Schaller wrote:
> Please reconsider that this is merely an annoyance and that this is a
> wishlist item.
> If an NVIDIA driver security update is pushed and security updates are
> installed unattendedly then all NVIDIA user space components will stop
> working immediately after the respective package updates as the loaded
> kernel module and the user space components have a version mismatch.
> The consequences are not immediately visible to the user as NVIDIA
> components in memory are still properly matched and hence still work.
> The real issue is with new processes as for instance no OpenGL
> applications or CUDA workloads can be launched anymore. This is
> especially severe for CUDA server farms as they currently can't enable
> unattended security updates unless they specifically exclude NVIDIA
> driver updates.
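
For illustration, the mismatch described above can be detected with a
minimal Python sketch. It assumes the loaded kernel module reports its
version in /proc/driver/nvidia/version in the classic "Kernel Module
<version>" form, and that the installed user space libraries carry the
full driver version as a file name suffix (for example
libGLX_nvidia.so.390.42). The function names and the sample version
numbers are hypothetical and only serve the example.

```python
import re
from typing import Optional


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Extract the driver version from the text of /proc/driver/nvidia/version.

    Assumes the classic "Kernel Module  <version>" wording; returns None
    if that pattern is not found.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def library_version(filename: str) -> Optional[str]:
    """Extract the driver version from a library file name suffix.

    Only dotted versions count, so a plain soname such as libcuda.so.1
    yields None.
    """
    match = re.search(r"\.so\.(\d+(?:\.\d+)+)$", filename)
    return match.group(1) if match else None


def versions_match(proc_text: str, filename: str) -> bool:
    """True only if both versions were found and are identical."""
    loaded = kernel_module_version(proc_text)
    installed = library_version(filename)
    return loaded is not None and loaded == installed
```

A monitoring job could read /proc/driver/nvidia/version, pass its
contents together with the name of an installed library to
versions_match(), and flag the host as needing a module reload or
reboot when the result is False.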

That's fine. I didn't grok that you had large installations where this
was already causing issues; personally I'm fine with talking about
possible solutions.

Seeing your email address domain - any chance your company could use
its gargantuan soft power to get Nvidia to publish the specs for the
missing parts of Nouveau (reclocking, power management, etc.)? That
would solve all our problems once and for all :-P

> On Wed, Mar 21, 2018 at 9:00 AM Philipp Kern <pkern at debian.org>
> wrote:
> 
> > On 03/20/2018 10:59 PM, Luca Boccassi wrote:
> > > The problems I see are that it would make an already quite complex
> > > packaging system, over which we have very little control (most of
> > > it is binary blobs), even more complicated. We already have 2
> > > layers of update-alternatives (mesa vs nvidia and then current vs
> > > legacy).
> > >
> > > It would also mean we have to start maintaining multiple versions
> > > at the same time - again being all binary blobs, which will
> > > multiply the source of problems. Basically, it would mean that
> > > instead of having current vs legacy340xx (up until a few months ago
> > > also legacy304xx), every single driver update would have to be
> > > maintained separately.
> > I don't propose this as the solution, though. I think that'd indeed
> > be infeasible. What I'm saying is that the *binary* packages are
> > versioned like this, not the source packages. It's like the kernel in
> > a way, where every ABI version gets its own binary package name.
> > Although in Debian the hesitance to change the ABI is much higher
> > than in Ubuntu, for reasons that I assume have to do with the NEW
> > queue. Cleaning up older versions is something we'd find a solution
> > for, just like people clean up their old kernels.
> >
> > So please separate out maintenance from the proposal. ;-)
> >
> > I get it with the two layers of alternatives. Is the reason for mesa
> > vs. nvidia because we don't put Nvidia into the library search path
> > first and need to deal with the corresponding file conflicts in a
> > sane way? Or because we want to keep co-installability between mesa
> > and nvidia?
> >
> > > In the end the problem is an annoyance but not a deal breaker -
> > > updates can be scheduled and delayed (unlike some other OSes...),
> > > and on top of that, version bumps are not that common - at most
> > > once a month, and only for those running unstable or testing - in
> > > stable we just ship LTS versions.
> >
> > Actually it's a real deal breaker in mass deployments. If your users
> > are hesitant to do reboots because it resets their work environment,
> > you really need to detach nvidia updates from the rest of the package
> > updates, which means having a custom-built solution to do that. That
> > has turned out to be brittle, as you end up installing pre-downloaded
> > modules at boot, blocking it for about ten minutes. (It has gotten
> > better with SSDs, but still.)
> >
> > Even if you just ship LTS versions there are sometimes updates
> > needed, be it for Meltdown/Spectre or new hardware. In our case we
> > actually do use testing, but even then we had the need to push
> > updates to drivers. I think a setup that separates out binaries for
> > every version that allows for consistent rollbacks[1] and
> > rollforwards would be beneficial not just for us but also for the
> > whole userbase of Debian.
> >
> > We'd be willing to invest some time into a solution - as our own
> > solution to work around the flaws in the packaging has turned out to
> > be a maintenance headache. But that only works if we at least agree
> > on a plan. I'm also happy to clarify anything that I probably missed
> > in the proposal. :)
> >
> > Kind regards and thanks
> > Philipp Kern
> >
> > [1] We had a bunch of regressions with newer drivers in the past that
> > made them dead on arrival, like missing repaints in terminals for a
> > fraction of the cards.

-- 
Kind regards,
Luca Boccassi