Bug#889669: nvidia-graphics-drivers: solve the upgrade problem

Luca Boccassi bluca at debian.org
Tue Mar 20 21:59:41 UTC 2018


Control: severity -1 wishlist

On Tue, 2018-03-20 at 21:22 +0100, Philipp Kern wrote:
> Hi,
> 
> On 2/5/18 4:26 PM, Philipp Kern wrote:
> > Since forever, users of NVIDIA on Debian have accepted that package
> > upgrades break newly spawned binaries because the interface between
> > the client library and the kernel driver is strictly versioned. The
> > kernel module will emit an API mismatch error into the kernel log and
> > GLX requests will fail. A reboot is required to remediate this
> > situation.
> > 
> > I would propose the following model:
> > 
> > * All binary packages that require strict versioning with NVRM are
> > shipped in versioned packages. This means that the library package
> > names reflect both major and minor version (= the version on which
> > the driver checks) of the driver. The resulting packages should be
> > co-installable with each other.
> > * A script modifies the symlink for the currently active libraries
> > to point to the version of the currently loaded nvidia module (as
> > fetched from sysfs's /sys/module/nvidia/version). This script is
> > called on installation but more crucially on every boot. This will
> > tie the libraries to the module loaded at boot-up.
> > * The kernel module itself does not have to be versioned. The kernel
> > module can be upgraded and it will end up in the initrd
> > automatically.
> > 
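As a minimal sketch of the boot-time step in the second bullet above,
the following Python reads the loaded module's version and repoints a
symlink at the matching library directory. Only the sysfs path
/sys/module/nvidia/version comes from the proposal; the per-version
directories under lib_root, the "current" link name and both function
names are assumptions made for illustration, and real packaging would
more likely drive update-alternatives than a bare symlink.

```python
import os
from pathlib import Path


def read_loaded_version(sysfs_version=Path("/sys/module/nvidia/version")):
    """Return the version of the loaded nvidia module, or None if absent."""
    try:
        return Path(sysfs_version).read_text().strip()
    except FileNotFoundError:
        # Module not loaded (or no NVIDIA driver on this machine).
        return None


def activate_libraries(version, lib_root, link_name="current"):
    """Point lib_root/link_name at lib_root/version.

    lib_root and link_name are hypothetical; they stand in for whatever
    layout the versioned, co-installable library packages would use.
    """
    lib_root = Path(lib_root)
    target = lib_root / version
    if not target.is_dir():
        raise FileNotFoundError(f"no library stack installed for {version}")

    # Build the new link under a temporary name, then rename it over the
    # old one. rename() is atomic on POSIX, so the link never disappears.
    tmp = lib_root / f".{link_name}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(version)  # relative target inside lib_root
    os.replace(tmp, lib_root / link_name)
    return target
```

A boot-time hook would call read_loaded_version() and, if it returns a
version, pass it to activate_libraries(). Upgrading the packages only
adds a new versioned directory; the link keeps pointing at the version
of the module that is actually loaded until the next boot.
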
> > Assuming that we have a metapackage that pulls in the most recent
> > driver (like linux-image does), this model would allow upgrading the
> > driver at any point in time and only make it live with the next
> > reboot. This allows applications to continue to function.
> > 
> > This approach has the drawback that every update from NVIDIA needs
> > to go through NEW. However I think this is just a theoretical
> > disadvantage at this point as NEW latency for ABI version changes has
> > decreased a lot.
> > 
> > The thing I'm not sure about is how this proposal interacts with the
> > legacy modules. I suppose they can all use the same mechanism but the
> > script would need to be aware of which library stack needs to be
> > chosen. The NVIDIA kernel shim already checks using
> > rm_is_supported_device if the currently installed device is
> > supported. That together with modalias should supposedly already load
> > the correct module and then the script could just check which of the
> > modules (legacy or the normal one) is loaded and act accordingly.
> > 
> > Do you think this would be workable? The NVIDIA packaging is quite a
> > beast to handle, I know (and I'm very grateful for your work!). So we
> > should have some consensus if this is something you'd be interested
> > in. :)
> 
> is there something I could help with to get to a consensus here?
> Anything? :)
> 
> (I just ran into this again: I needed to reboot when all I wanted was
> to install the i386 driver.)
> 
> Kind regards and thanks
> Philipp Kern

Hi,

Thanks for your proposal, I understand the need to reboot is an
annoyance.

The problem I see is that it would make an already quite complex
packaging system, over which we have very little control (most of it
is binary blobs), even more complicated. We already have 2 layers of
update-alternatives (mesa vs nvidia and then current vs legacy).

It would also mean we have to start maintaining multiple versions at
the same time, again all of them binary blobs, which would multiply the
sources of problems. Basically, it would mean that instead of having
current vs legacy340xx (up until a few months ago also legacy304xx),
every single driver update would have to be maintained separately.

In the end the problem is an annoyance but not a deal breaker - updates
can be scheduled and delayed (unlike some other OSes...), and on top of
that, version bumps are not that common - at most once a month, and
only for those running unstable or testing - in stable we just ship LTS
versions.

Sorry, my personal opinion is that I'm just not sure it would be really
worth the additional time and hassle :-/

-- 
Kind regards,
Luca Boccassi