Bug#883615: Acknowledgement ([CRITICAL] Stretch p-u 9.3 breaks NVidia driver and X.org)

Aurelien Jarno aurelien at aurel32.net
Sun Dec 17 13:56:04 UTC 2017


On 2017-12-17 10:10, Andreas Beckmann wrote:
> I did dig further. An easier target for debugging is glxinfo, which can be further minimized to:
> 
> #include <X11/Xlib.h>
> #include <GL/glx.h>
> #include <pthread.h>
> int main()
> {
>         pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>         pthread_mutex_lock(&mutex);
>         pthread_mutex_unlock(&mutex);
> 
>         Display * dpy ;
>         dpy = XOpenDisplay ( NULL ) ;
> 
>         pthread_mutex_lock(&mutex);
>         pthread_mutex_unlock(&mutex);
> 
>         int fbAttribSingle[] = {
>                 GLX_RENDER_TYPE,   GLX_RGBA_BIT,
>                 GLX_RED_SIZE,      1,
>                 GLX_GREEN_SIZE,    1,
>                 GLX_BLUE_SIZE,     1,
>                 GLX_DOUBLEBUFFER,  False,
>                 None };
>         GLXFBConfig * configs ;
>         int nConfigs ;
>         configs = glXChooseFBConfig ( dpy , 0 , fbAttribSingle , & nConfigs ) ;
> 
>         pthread_mutex_lock(&mutex);
>         pthread_mutex_unlock(&mutex);
> }
> 
> (link with -lGL -lX11)
> 
> that dies at some point in pthread_mutex_lock after several
> calls succeeded:
> 
> (gdb) bt
> #0  0x00007ffff754b1d4 in pthread_mutex_lock (mutex=0x7ffff7001180 <dispatchLock>) at forward.c:192
> #1  0x00007ffff6dab007 in LockDispatch () at ../../../src/GLdispatch/GLdispatch.c:144
> #2  __glDispatchNewVendorID () at ../../../src/GLdispatch/GLdispatch.c:198
> #3  0x00007ffff702c3c2 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX.so.0
> #4  0x00007ffff702d1ac in ?? () from /usr/lib/x86_64-linux-gnu/libGLX.so.0
> #5  0x00007ffff7026251 in glXChooseFBConfig () from /usr/lib/x86_64-linux-gnu/libGLX.so.0
> #6  0x0000555555554964 in main () at mwe.c:25
> (gdb) info shared
> From                To                  Syms Read   Shared Object Library
> 0x00007ffff7dd9aa0  0x00007ffff7df5340  Yes         /lib64/ld-linux-x86-64.so.2
> 0x00007ffff7b745d0  0x00007ffff7b78c1b  Yes (*)     /usr/lib/x86_64-linux-gnu/libGL.so.1
> 0x00007ffff7812da0  0x00007ffff789a434  Yes (*)     /usr/lib/x86_64-linux-gnu/libX11.so.6
> 0x00007ffff7475910  0x00007ffff759f403  Yes         /lib/x86_64-linux-gnu/libc.so.6
> 0x00007ffff7252d80  0x00007ffff725394e  Yes         /lib/x86_64-linux-gnu/libdl.so.2
> 0x00007ffff7024a20  0x00007ffff702ef9d  Yes (*)     /usr/lib/x86_64-linux-gnu/libGLX.so.0
> 0x00007ffff6daabb0  0x00007ffff6dada37  Yes         /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0
> 0x00007ffff6b4fb40  0x00007ffff6b619f5  Yes (*)     /usr/lib/x86_64-linux-gnu/libxcb.so.1
> 0x00007ffff6935700  0x00007ffff693f49f  Yes (*)     /usr/lib/x86_64-linux-gnu/libXext.so.6
> 0x00007ffff672f010  0x00007ffff672fc8c  Yes (*)     /usr/lib/x86_64-linux-gnu/libXau.so.6
> 0x00007ffff6529340  0x00007ffff652ac48  Yes (*)     /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
> 0x00007ffff63153d0  0x00007ffff63225df  Yes (*)     /lib/x86_64-linux-gnu/libbsd.so.0
> 0x00007ffff610c0e0  0x00007ffff610eecf  Yes         /lib/x86_64-linux-gnu/librt.so.1
> 0x00007ffff5ef2ab0  0x00007ffff5eff811  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
> 0x00007ffff5c00f00  0x00007ffff5c76291  Yes (*)     /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
> 0x00007ffff59ab810  0x00007ffff59ad5a3  Yes (*)     /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.375.82
> 0x00007ffff3ed7600  0x00007ffff4fbac77  Yes (*)     /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.375.82
> 0x00007ffff38d7680  0x00007ffff39438da  Yes         /lib/x86_64-linux-gnu/libm.so.6
> (gdb) disassemble
> Dump of assembler code for function pthread_mutex_lock:
>    0x00007ffff754b1b0 <+0>:     mov    0x2a957a(%rip),%eax        # 0x7ffff77f4730 <__libc_pthread_functions_init>
>    0x00007ffff754b1b6 <+6>:     test   %eax,%eax
>    0x00007ffff754b1b8 <+8>:     jne    0x7ffff754b1c0 <pthread_mutex_lock+16>
>    0x00007ffff754b1ba <+10>:    xor    %eax,%eax
>    0x00007ffff754b1bc <+12>:    retq   
>    0x00007ffff754b1bd <+13>:    nopl   (%rax)
>    0x00007ffff754b1c0 <+16>:    mov    0x2a94c1(%rip),%rax        # 0x7ffff77f4688 <__libc_pthread_functions+264>
>    0x00007ffff754b1c7 <+23>:    ror    $0x11,%rax
>    0x00007ffff754b1cb <+27>:    xor    %fs:0x30,%rax
> => 0x00007ffff754b1d4 <+36>:    jmpq   *%rax
> 
> After finally understanding that the fs segment is used for TLS storage
> addressing, I actually saw the difference in the linked libraries:
> /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.375.82 vs.
> /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.375.82

Oh, it's strange this library didn't show up in the ldd tests that
appear earlier in this bug report. I guess it is (indirectly) dlopened.

> From the documentation:
> 
> The nvidia-tls libraries (/usr/lib/libnvidia-tls.so.384.98 and /usr/lib/tls/libnvidia-tls.so.384.98); these files provide thread local storage support for the NVIDIA OpenGL libraries (libGL, libnvidia-glcore, and libglx). Each nvidia-tls library provides support for a particular thread local storage model (such as ELF TLS), and the one appropriate for your system will be loaded at run time.
> 
> and from the source code of nvidia-installer (which we don't use):
> 
>       "NVIDIA's OpenGL libraries are compiled with one of two "
>       "different thread local storage (TLS) mechanisms: 'classic tls' "
>       "which is used on systems with glibc 2.2 or older, and 'new tls' "
>       "which is used on systems with tls-enabled glibc 2.3 or newer.  "
> 
Yes, exactly. The "new" TLS mechanism is implemented in NPTL (as opposed
to LinuxThreads) and requires at least a 2.6 kernel (as opposed to 2.4)
to work. The hardware capabilities mechanism was slightly abused to
export the kernel's ability to support NPTL. It was done that way
because Red Hat had backported all the NPTL support to its 2.4 kernel.

The Debian glibc package therefore provided two different libcs, selected
according to the running kernel: one in /lib and the other in /lib/tls.
Debian dropped 2.4 kernel support in Lenny, so from then on only the glibc
with the "new" TLS mechanism was provided. As a consequence, all packages
stopped using the tls/ directory, since the new mechanism was guaranteed to
be supported. IIRC we made sure that all libraries had been moved out
of the tls/ directory, but I guess we missed the nvidia library as it
was in non-free.

> So we probably shouldn't ship the classic ones at all and move the new
> ones to the regular library directory (nvidia seems to be the only package
> still shipping stuff in tls/)

Indeed, that is the correct fix. I am actually surprised that NVIDIA
still provides a library built for such an old glibc, and I wonder how
they build it.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien at aurel32.net                 http://www.aurel32.net


