Bug#889669: nvidia-graphics-drivers: solve the upgrade problem

Luca Boccassi bluca at debian.org
Wed Mar 21 13:01:22 UTC 2018


On Wed, 2018-03-21 at 08:56 +0100, Philipp Kern wrote:
> On 03/20/2018 10:59 PM, Luca Boccassi wrote:
> > The problems I see are that it would make an already quite complex
> > packaging system, over which we have very little control (most of it
> > is binary blobs), even more complicated. We already have 2 layers of
> > update-alternatives (mesa vs nvidia and then current vs legacy).
> >
> > It would also mean we have to start maintaining multiple versions at
> > the same time - again being all binary blobs, which will multiply the
> > source of problems. Basically, it would mean that instead of having
> > current vs legacy340xx (up until a few months ago also legacy304xx),
> > every single driver update would have to be maintained separately.
> 
> I don't propose this as the solution, though. I think that'd indeed be
> infeasible. What I'm saying is that the *binary* packages are versioned
> like this, not the source packages. It's like the kernel in a way, where
> every ABI version gets its own binary package name. Although in Debian
> the hesitance to change the ABI is much higher than in Ubuntu, for
> reasons that I assume have to do with the NEW queue. Cleaning up older
> versions is something we'd find a solution for, just like people clean
> up their old kernels.
>
> So please separate out maintenance from the proposal. ;-)

Ah I see - one issue I can foresee is that it's binary blobs all the
way down - so there's really no way to know that libnvidia-foo from
version 1.1 can work with libnvidia-bar from version 2.2. So all the
packages would have to be versioned.
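
To make that concrete, here is a rough sketch of what "all the packages
versioned" could look like: every binary package carries the major
driver version in its name and depends on its siblings at exactly the
same version. The package names, the version string and the naming
scheme are made up for illustration and are not our actual packaging.

```python
# Illustrative sketch only: package names, versions and the naming scheme
# are hypothetical, not the actual Debian packaging.

def versioned_name(base: str, version: str) -> str:
    """Append the major driver version to a binary package name."""
    major = version.split(".", 1)[0]
    return f"{base}-{major}"


def strict_depends(siblings: list[str], version: str) -> str:
    """Build a Depends value pinning every sibling to the exact same version."""
    return ", ".join(
        f"{versioned_name(pkg, version)} (= {version})" for pkg in siblings
    )


if __name__ == "__main__":
    version = "390.48-1"  # hypothetical package version
    print("Package:", versioned_name("libnvidia-foo", version))
    print("Depends:", strict_depends(["libnvidia-bar"], version))
```

The exact-version pin is the point: since we cannot tell which blobs are
compatible with which, the only safe combination is the one upstream
shipped together.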

Isn't this sort of like what Ubuntu does? IIRC they lump everything
together into a single package, unlike us, and the packages are named
after the major revision.

How would the switch-at-boot mechanism work?

> I get it with the two layers of alternatives. Is the reason for mesa vs.
> nvidia because we don't put Nvidia into the library search path first
> and need to deal with the corresponding file conflicts in a sane way? Or
> because we want to keep co-installability between mesa and nvidia?

co-installability - it used to be that each vendor had its own version
of libGL, and they were all incompatible with each other. With libglvnd
this is changing - but sadly we need to keep shipping the non-glvnd
versions as there are often regressions (and some use cases don't work
with the glvnd versions yet, like switchable graphics on laptops).
So in reality what glvnd is doing for us right now is multiplying the
maintenance effort rather than reducing it. But I digress...

> > In the end the problem is an annoyance but not a deal breaker - updates
> > can be scheduled and delayed (unlike some other OSes...), and on top of
> > that, version bumps are not that common - at most once a month, and
> > only for those running unstable or testing - in stable we just ship LTS
> > versions.
> 
> Actually it's a real deal breaker in mass deployments. If your users are
> hesitant to do reboots because it resets their work environment, you
> really need to detach nvidia updates from the rest of the package
> updates, which means having a custom-built solution to do that. That has
> turned out to be brittle, as it turns out that you end up installing
> pre-downloaded modules at boot, blocking it for about ten minutes. (It
> has gotten better with SSDs, but still.)
>
> Even if you just ship LTS versions there are sometimes updates needed,
> be it for Meltdown/Spectre or new hardware. In our case we actually do
> use testing, but even then we had the need to push updates to drivers. I
> think a setup that separates out binaries for every version that allows
> for consistent rollbacks[1] and rollforwards would be beneficial not
> just for us but also for the whole userbase of Debian.
>
> We'd be willing to invest some time into a solution - as our own to work
> around the flaws in the packaging has turned out to be a maintenance
> headache. But that only works if we at least agree on a plan. I'm also
> happy to clarify more that I probably missed in the proposal. :)
>
> Kind regards and thanks
> Philipp Kern
>
> [1] We had a bunch of regressions with newer drivers in the past that
> made them dead on arrival, like missing repaints in terminals for a
> fraction of the cards.

Ok so now I understand you have some large deployments where this is an
actual issue - I didn't get it immediately, sorry.

I'm up for talking about proposals - Andreas, what do you think?

-- 
Kind regards,
Luca Boccassi