Bug#807244: Bug#800574: Bug#807244: libegl1-nvidia: Programs crash due to elisian-unlock on skylake processor with nvidia driver 352.63-1 (experimental)
Andreas Beckmann
anbe at debian.org
Tue Dec 8 18:25:38 UTC 2015
Hi Aurelien,
thanks for your analysis.
On 2015-12-08 10:23, Aurelien Jarno wrote:
> I disagree it is supposed to be fixed. Intel got a few bugs in their
> TSX-NI implementation for Haswell and Broadwell and possibly early
> versions of Skylake, and to avoid data loss we have therefore disabled
> lock elision for some CPU revisions.
That's what I meant by "fixed". But obviously there are two problems
here: buggy hardware (blacklisted, #800574) and ...
> That said the bugs in the Intel
> implementation are corner cases, and it took quite some time for them to
> get discovered. If your program crashes reproducibly, it's definitely not
> an issue with the TSX-NI implementation. Disabling --enable-lock-elision
> is just a workaround for the real issue. People now start to have CPUs
> with a working TSX-NI implementation which is therefore not blacklisted
> and thus the problem is appearing again.
... buggy software (#807244), which is only exposed by running on
hardware with working TSX-NI.
That could also explain the fact that the bug was introduced in 352+.
Jelle, I didn't dig through the nvidia forums, but if this info isn't
mentioned there already, maybe you could post it:
> According to the backtrace the problem is typical of a call to
> mutex_unlock() on a mutex which hasn't been locked with mutex_lock()
> before.
(or was already unlocked.)
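
For anyone who wants to see this class of bug without TSX hardware, here is
a minimal sketch using an error-checking mutex, for which POSIX requires
pthread_mutex_unlock() to report the misuse instead of leaving the behaviour
undefined. The helper names unlock_without_lock() and double_unlock() are
made up for this illustration; this is not code from the nvidia driver or
from glibc, and as far as I know the elision path is only taken for default
mutexes, so nothing here touches TSX.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create an error-checking mutex so that misuse is
 * reported through the return value instead of being undefined behaviour. */
static int init_errorcheck_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Case 1: unlock a mutex that was never locked. */
int unlock_without_lock(void)
{
    pthread_mutex_t mutex;
    int rc;

    if (init_errorcheck_mutex(&mutex) != 0)
        return -1;
    rc = pthread_mutex_unlock(&mutex);  /* misuse: no matching lock */
    pthread_mutex_destroy(&mutex);
    return rc;
}

/* Case 2: unlock a mutex twice after a single lock. */
int double_unlock(void)
{
    pthread_mutex_t mutex;
    int rc;

    if (init_errorcheck_mutex(&mutex) != 0)
        return -1;
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);       /* correct, matches the lock */
    rc = pthread_mutex_unlock(&mutex);  /* misuse: already unlocked */
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Both helpers should return EPERM. With a default mutex the same calls are
undefined behaviour, which typically goes unnoticed without lock elision and
can crash with it.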
Andreas