Bug#908568: nvidia-driver: build error

Vincent Lefevre vincent at vinc17.net
Wed Sep 12 02:31:17 BST 2018


On 2018-09-11 15:29:02 -0700, Russ Allbery wrote:
> Vincent Lefevre <vincent at vinc17.net> writes:
> 
> > This would mean that a breakage is possible after any patch (in
> > particular with those "Update to SVN..." in the changelog).  Thus this
> > means that the kernel module build system must check that the GCC Debian
> > package version number matches the one used to build the kernel. For
> > sid, GCC is often not in sync with the one used to build the
> > kernel. This is a really big problem.
> 
> I don't think it's an NVIDIA-specific problem, though, right?  Doesn't
> this happen with any kernel module build?

I assume that this depends on which features of the interface are
used.

> Or am I confused and this is an NVIDIA-specific check?

I don't know. It would be interesting to know the causes of the
failures.

I tried to search for information and found:

  https://www.kernel.org/doc/html/v4.14/process/stable-api-nonsense.html

Some issues may come from the kernel build options, and concerning
the compiler:

  "Depending on the version of the C compiler you use, different
  kernel data structures will contain different alignment of
  structures, and possibly include different functions in different
  ways (putting functions inline or not.) The individual function
  organization isn’t that important, but the different data structure
  padding is very important."

Well, concerning the alignment of structures and internal padding,
this is a much more general problem that could affect any library
that exposes structures in its public .h header files. I assume
that this will never change, otherwise one would see breakage
in various places. So I'm wondering why this is mentioned.
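
To make the padding point concrete, here is a minimal sketch (not taken
from the kernel or from NVIDIA's code). It uses Python's ctypes, which
follows the platform's native C layout rules, to show where padding
ends up in a structure. The struct Example and the helper layout() are
hypothetical names made up for this illustration.

```python
import ctypes


class Example(ctypes.Structure):
    # Hypothetical struct, equivalent to the C declaration:
    #   struct example { char tag; int value; char flag; };
    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_int),
        ("flag", ctypes.c_char),
    ]


def layout(struct_type):
    """Return ([(name, offset, size), ...], total_size) for a ctypes struct."""
    fields = []
    for name, _ctype in struct_type._fields_:
        descriptor = getattr(struct_type, name)  # field descriptor with .offset/.size
        fields.append((name, descriptor.offset, descriptor.size))
    return fields, ctypes.sizeof(struct_type)


if __name__ == "__main__":
    fields, total = layout(Example)
    for name, offset, size in fields:
        print(f"{name}: offset={offset} size={size}")
    # Any difference between the total and the sum of the field sizes is padding.
    print(f"total size={total}, padding={total - sum(s for _, _, s in fields)}")
```

If two builds disagreed on any of these offsets, code compiled against
one layout would read the wrong bytes from a structure produced by the
other, which is presumably the breakage the kernel document has in mind.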

Concerning inline functions, this would need an explanation. In any
case, I don't see why this would change between minor versions.
And this is more likely to be affected by compiler options.

Moreover, it appears that NVIDIA's compiler version check predates
the change of numbering in GCC. See for instance:

  https://devtalk.nvidia.com/default/topic/833061/nvidia-drivers-352-09-dont-install/

"Compiler version check failed:

The major and minor number of the compiler used to
compile the kernel:

gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

does not match the compiler used here:

cc (Ubuntu 4.9.2-0ubuntu1~14.04) 4.9.2"

In the past, GCC 4.6, 4.7, 4.8 and 4.9 were indeed different releases,
but now only the major number corresponds to a given release:
5, 6, 7, 8, 9. GCC used to number versions X.Y.Z, where Z was a
patchlevel, and now uses X.Y, where Y is a patchlevel. Thus, to be
consistent with the past test, NVIDIA should now just check X.
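
As a rough sketch of what such a check could look like (this is not
NVIDIA's actual code; the function names release_series() and
same_series() are hypothetical, and the version parsing is deliberately
naive), one could compare X.Y for GCC 4 and earlier, and only X from
GCC 5 on:

```python
import re


def release_series(version_string):
    """Return a tuple identifying the GCC release series.

    Assumption: the first X.Y or X.Y.Z number found in the string is the
    compiler version. Up to GCC 4 the series is (X, Y); from GCC 5 on it
    is (X,) only.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.\d+)?", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) if major < 5 else (major,)


def same_series(kernel_cc, local_cc):
    """True if both compiler version strings belong to the same release series."""
    return release_series(kernel_cc) == release_series(local_cc)


if __name__ == "__main__":
    # The two strings from the forum report quoted above: 4.6 vs 4.9.
    print(same_series("gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)",
                      "cc (Ubuntu 4.9.2-0ubuntu1~14.04) 4.9.2"))
    # Two patchlevels of the same post-5 series.
    print(same_series("gcc version 8.1.0", "gcc version 8.2.0"))
```

With this rule, the 4.6.3 vs 4.9.2 case is still rejected, as in the
old test, while two GCC 8 builds that differ only in Y are accepted.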

https://devtalk.nvidia.com/default/topic/960778/linux/having-trouble-with-the-340-96-driver-on-kali-linux-failing-cc-version-check/

says: "That's because the ABI (i.e. how parameters are passed, or
how the stack is laid out) changes between gcc versions, especially
between major version changes (here, v5's ABI is incompatible with
V6's)."

Thus only the *major* version matters.

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


