Bug#949767: clblas: *gemm wrong answers in out-of-order queues
Rebecca N. Palmer
rebecca_palmer at zoho.com
Mon Jan 27 19:37:49 GMT 2020
Control: retitle -1 clblas: *gemm wrong answers in out-of-order queues
Control: reassign -1 src:clblas
Control: found -1 2.12-1
I think I've found the actual bug, in clblas src/library/blas/xgemm.cc:
clblasGemm (with a single command queue) enqueues up to 4 kernels and
returns an event that depends on only the last of them, so if the queue
is out-of-order, waiting on this event doesn't necessarily wait for all
of them to finish.
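
To illustrate the reasoning above, here is a toy model in plain C. It is not OpenCL code and not taken from clblas: each kernel is reduced to a single finish time, and all function names are made up for this sketch. It compares waiting on an event tied only to the last-enqueued kernel with waiting on an event that depends on every kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only (assumption: a kernel is fully described by the time at
   which it finishes). In an out-of-order queue, a kernel enqueued earlier
   may finish later than one enqueued after it. */

/* Completion time of an event tied only to the last-enqueued kernel. */
double last_event_time(const double *finish, size_t n)
{
    assert(n > 0);
    return finish[n - 1];
}

/* Completion time of an event that depends on every kernel. */
double all_events_time(const double *finish, size_t n)
{
    assert(n > 0);
    double t = finish[0];
    for (size_t i = 1; i < n; i++)
        if (finish[i] > t)
            t = finish[i];
    return t;
}

/* Number of kernels still running when the caller's wait returns at time t. */
size_t still_running(const double *finish, size_t n, double t)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (finish[i] > t)
            count++;
    return count;
}
```

With finish times {5, 2, 7, 3} for four kernels, a wait on the last kernel's event returns at time 3 while two kernels are still running; a wait on an event covering all four returns at time 7 with none running.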
This problem was previously noticed in
https://github.com/clMathLibraries/clBLAS/issues/269#issuecomment-225453543
but was never actually reported as a bug.
clblas includes a client/performance tester that creates an out-of-order
queue (at src/client/clfunc_common.hpp:306), implying that it intends to
allow such queues. (We don't run clblas' own tests, possibly because of
https://github.com/clMathLibraries/clBLAS/issues/338.)
The real fix would be to return an event that depends on all the
kernels' events (e.g. created with clEnqueueMarkerWithWaitList).
As a workaround for now, I intend to disable out-of-order queues in
libgpuarray. (It appears to be the only reverse dependency of clblas
that also uses out-of-order queues.)