[Pkg-opencl-devel] Bug#974797: pocl: Please upgrade to llvm-toolchain-11

Andreas Beckmann anbe at debian.org
Thu Nov 19 14:39:04 GMT 2020


POCL built against LLVM 10 (sid) or LLVM 11 (experimental) causes an
autopkgtest regression on armhf in libgpuarray, while it succeeded with
LLVM 9.
https://ci.debian.net/packages/libg/libgpuarray/testing/armhf/
(The autopkgtest cannot be run in pure testing because the RC-buggy
libclblas is missing there; it only works (and previously passed) in sid
(or rather testing+sid). There are no problems on x86.)

The failing test can be called with

POCL_CACHE_DIR=$(mktemp -d)/pocl-cache \
DEVICE=opencl0:0 python3.9 -m nose -v pygpu.tests.test_blas

It terminates with a segmentation fault in LLVM.
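
For scripted or repeated runs, the invocation above can be wrapped as in
the following minimal sketch. The helper names (run_with_fresh_cache,
describe) are hypothetical and not part of pocl or libgpuarray. It assumes
a POSIX system, where subprocess reports a child killed by signal N as
return code -N. Using a new cache directory for every run mirrors the
mktemp call above, so a previously cached kernel binary is not reused.

```python
import os
import signal
import subprocess
import sys
import tempfile


def run_with_fresh_cache(cmd):
    """Run cmd with POCL_CACHE_DIR inside a new temporary directory.

    Returns the child's return code (negative means killed by a signal).
    """
    with tempfile.TemporaryDirectory() as tmp:
        env = dict(os.environ)
        # Same layout as the shell command: $(mktemp -d)/pocl-cache
        env["POCL_CACHE_DIR"] = os.path.join(tmp, "pocl-cache")
        env["DEVICE"] = "opencl0:0"
        return subprocess.run(cmd, env=env).returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


# Only runs when a command is given on the command line, e.g.
#   python3 run_fresh.py python3.9 -m nose -v pygpu.tests.test_blas
if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe(run_with_fresh_cache(sys.argv[1:])))
```

A segmentation fault in the child would be reported as "killed by SIGSEGV".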

The CL kernel is generated source code created by the (simplified)
stack python -> libgpuarray -> libclblas before it gets handed over to
pocl. While I managed to extract the CL kernel source, I couldn't
produce an OpenCL program that builds the kernel in the same way such
that it triggers the segmentation fault.

Backtraces from coredumps:

#0  getEmissionKind () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/include/llvm/IR/DebugInfoMetadata.h:1244
#1  initialize () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LexicalScopes.cpp:53
#2  0xf827f2f0 in computeIntervals () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:979
#3  runOnMachineFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:996
#4  runOnMachineFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
#5  0xf82f46c8 in runOnFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6  0xf816e494 in runOnFunction () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#7  0xf816e750 in runOnModule () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1517
#8  0xf816eba8 in runOnModule () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1582
#9  run () at /build/llvm-toolchain-10-hVI0Qp/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1694
#10 0xfdcd2446 in pocl_llvm_codegen (Device=Device@entry=0x839f60, Modp=0x321bdc0, Output=Output@entry=0xfffea5b4, OutputSize=OutputSize@entry=0xfffea5c8) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xfdc9669e in llvm_codegen (output=output@entry=0x3763c40 "/tmp/tmp.hvljjDK8aD/pocl-cache/EG/BKJEEKFFENDHPDCNOBDADIAOJNAPPBJKDBOEM/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xfffebf88, device=0x839f60, 
    command=command@entry=0xfffebfc0, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xfdc98304 in pocl_check_kernel_disk_cache (command=command@entry=0xfffebfc0, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xfdc98722 in pocl_check_kernel_dlhandle_cache (command=0xfffebfc0, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xfdc70534 in program_compile_dynamic_wg_binaries (program=program@entry=0x31af008) at ./lib/CL/pocl_build.c:179
#15 0xfdc8153c in get_binary_sizes (sizes=0xfffec0b8, program=0x31af008) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0x31af008, param_name=<optimized out>, param_value_size=4, param_value=0xfffec0b8, param_value_size_ret=0x0) at ./lib/CL/clGetProgramInfo.c:116
#17 0xcaf53722 in getSingleBinaryFromProgram (binary=std::vector of length 0, capacity 0, program=0x31af008) at ./src/library/blas/generic/binary_lookup.cc:392
#18 BinaryLookup::populateCache (this=this@entry=0xfffec138) at ./src/library/blas/generic/binary_lookup.cc:466
#19 0xcaf4f738 in makeKernelCached (device=0x839f60, context=0x820cd0, sid=sid@entry=320, key=key@entry=0xfffec2bc, kernelGenerator=kernelGenerator@entry=0xcaf7aad9 <generator(char*, size_t, SubproblemDim const*, PGranularity const*, void*)>, 
    dims=0x2fb03d0, pgran=pgran@entry=0x2fb040c, extra=extra@entry=0xfffec304, buildOpts=buildOpts@entry=0xfffec55c "-g -DINCX_NONUNITY -DINCY_NONUNITY", error=error@entry=0xfffec240) at ./src/library/blas/generic/common2.cc:90
#20 0xcaf52662 in makeSolutionSeq (funcID=funcID@entry=CLBLAS_DOT, args=args@entry=0xfffec820, numCommandQueues=numCommandQueues@entry=1, commandQueues=commandQueues@entry=0x635598, numEventsInWaitList=numEventsInWaitList@entry=0, 
    eventWaitList=eventWaitList@entry=0x0, events=events@entry=0xfffec6c4, seq=seq@entry=0xfffec6c8) at ./src/library/blas/generic/solution_seq_make.c:587
#21 0xcaf3e9b6 in doDot (kargs=kargs@entry=0xfffec820, N=1, dotProduct=<optimized out>, offDP=0, X=0xe2afe8, offx=1, incx=2, Y=0xab71a8, offy=1, incy=2, scratchBuff=0x9d7ff0, doConj=0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, 
    eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:132
#22 0xcaf3eac8 in clblasSdot (N=<optimized out>, dotProduct=<optimized out>, offDP=<optimized out>, X=0xe2afe8, offx=1, incx=2, Y=0xab71a8, offy=1, incy=2, scratchBuff=0x9d7ff0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, 
    eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:193
#23 0xfea574c2 in sdot (N=<optimized out>, X=0xde9630, offX=1, incX=2, Y=0x4ccd20, offY=1, incY=2, Z=0xdf7410, offZ=0) at ./src/gpuarray_blas_opencl_clblas.c:212
#24 0xfea4425c in GpuArray_rdot (X=X@entry=0xca593174, Y=Y@entry=0xca593134, Z=Z@entry=0xca5931b4, nocopy=nocopy@entry=0) at ./src/gpuarray_array_blas.c:77
#25 0xca38e7d4 in __pyx_f_5pygpu_4blas_pygpu_blas_rdot (__pyx_v_X=__pyx_v_X@entry=0xca593168, __pyx_v_Y=__pyx_v_Y@entry=0xca593128, __pyx_v_Z=__pyx_v_Z@entry=0xca5931a8, __pyx_v_nocopy=__pyx_v_nocopy@entry=0) at pygpu/blas.c:1931
#26 0xca38edb4 in __pyx_pf_5pygpu_4blas_dot (__pyx_self=<optimized out>, __pyx_v_overwrite_z=<optimized out>, __pyx_v_Z=0xca5931a8, __pyx_v_Y=<optimized out>, __pyx_v_X=<optimized out>) at pygpu/blas.c:2871
#27 __pyx_pw_5pygpu_4blas_1dot (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at pygpu/blas.c:2757
#28 0x0009fff4 in cfunction_call (func=<built-in function dot>, args=<optimized out>, kwargs={'overwrite_z': True}) at ../Objects/methodobject.c:539
#29 0x00084ef8 in _PyObject_MakeTpCall (tstate=0x3eb0d8, callable=<built-in function dot>, args=0xfddde4b4, nargs=<optimized out>, keywords=<optimized out>) at ../Objects/call.c:191
#30 0x0007e618 in _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:116
#31 _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:103
#32 PyObject_Vectorcall (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>) at ../Include/cpython/abstract.h:127
[...]

#0  getEmissionKind () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/include/llvm/IR/DebugInfoMetadata.h:1282
#1  initialize () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LexicalScopes.cpp:54
#2  0xf7b02dfc in computeIntervals () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:971
#3  runOnMachineFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:988
#4  runOnMachineFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/LiveDebugVariables.cpp:1015
#5  0xf7b7c198 in runOnFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6  0xf79d43e4 in runOnFunction () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1516
#7  0xf79d990c in runOnModule () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1552
#8  0xf79d494c in runOnModule () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1617
#9  run () at /build/llvm-toolchain-11-xvRkgA/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:614
#10 0xfdcd1b52 in pocl_llvm_codegen (Device=Device@entry=0x81dca8, Modp=0x36e3150, Output=Output@entry=0xfffea5b4, OutputSize=OutputSize@entry=0xfffea5c8) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xfdc95bb6 in llvm_codegen (output=output@entry=0xe5ce88 "/tmp/tmp.9VVoi1yAx0/pocl-cache/EK/PMEOKDJDCAGIGHNLHHOJHFMFBJEIDPNFKHIHE/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xfffebf88, device=0x81dca8, 
    command=command@entry=0xfffebfc0, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xfdc9781c in pocl_check_kernel_disk_cache (command=command@entry=0xfffebfc0, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xfdc97c3a in pocl_check_kernel_dlhandle_cache (command=0xfffebfc0, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xfdc6fe20 in program_compile_dynamic_wg_binaries (program=program@entry=0x36df0f8) at ./lib/CL/pocl_build.c:179
#15 0xfdc80998 in get_binary_sizes (sizes=0xfffec0b8, program=0x36df0f8) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0x36df0f8, param_name=4453, param_value_size=4, param_value=0xfffec0b8, param_value_size_ret=0x0) at ./lib/CL/clGetProgramInfo.c:115
#17 0xca659722 in getSingleBinaryFromProgram (binary=std::vector of length 0, capacity 0, program=0x36df0f8) at ./src/library/blas/generic/binary_lookup.cc:392
#18 BinaryLookup::populateCache (this=this@entry=0xfffec138) at ./src/library/blas/generic/binary_lookup.cc:466
#19 0xca655738 in makeKernelCached (device=0x81dca8, context=0x821cd8, sid=sid@entry=320, key=key@entry=0xfffec2bc, kernelGenerator=kernelGenerator@entry=0xca680ad9 <generator(char*, size_t, SubproblemDim const*, PGranularity const*, void*)>, dims=0xe1b0c8, 
    pgran=pgran@entry=0xe1b104, extra=extra@entry=0xfffec304, buildOpts=buildOpts@entry=0xfffec55c "-g -DINCX_NONUNITY -DINCY_NONUNITY", error=error@entry=0xfffec240) at ./src/library/blas/generic/common2.cc:90
#20 0xca658662 in makeSolutionSeq (funcID=funcID@entry=CLBLAS_DOT, args=args@entry=0xfffec820, numCommandQueues=numCommandQueues@entry=1, commandQueues=commandQueues@entry=0x635598, numEventsInWaitList=numEventsInWaitList@entry=0, 
    eventWaitList=eventWaitList@entry=0x0, events=events@entry=0xfffec6c4, seq=seq@entry=0xfffec6c8) at ./src/library/blas/generic/solution_seq_make.c:587
#21 0xca6449b6 in doDot (kargs=kargs@entry=0xfffec820, N=1, dotProduct=<optimized out>, offDP=0, X=0xe5fc08, offx=1, incx=2, Y=0xe5f890, offy=1, incy=2, scratchBuff=0xe5d458, doConj=0, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, 
    eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:132
#22 0xca644ac8 in clblasSdot (N=<optimized out>, dotProduct=<optimized out>, offDP=<optimized out>, X=0xe5fc08, offx=1, incx=2, Y=0xe5f890, offy=1, incy=2, scratchBuff=0xe5d458, numCommandQueues=1, commandQueues=0x635598, numEventsInWaitList=0, 
    eventWaitList=0x0, events=0xfffec974) at ./src/library/blas/xdot.c:193
#23 0xfea574c2 in sdot (N=<optimized out>, X=0xe5ce08, offX=1, incX=2, Y=0x4ccd20, offY=1, incY=2, Z=0x7ba990, offZ=0) at ./src/gpuarray_blas_opencl_clblas.c:212
#24 0xfea4425c in GpuArray_rdot (X=X@entry=0xc9c99174, Y=Y@entry=0xc9c99134, Z=Z@entry=0xc9c991b4, nocopy=nocopy@entry=0) at ./src/gpuarray_array_blas.c:77
#25 0xc9a947d4 in __pyx_f_5pygpu_4blas_pygpu_blas_rdot (__pyx_v_X=__pyx_v_X@entry=0xc9c99168, __pyx_v_Y=__pyx_v_Y@entry=0xc9c99128, __pyx_v_Z=__pyx_v_Z@entry=0xc9c991a8, __pyx_v_nocopy=__pyx_v_nocopy@entry=0) at pygpu/blas.c:1931
#26 0xc9a94db4 in __pyx_pf_5pygpu_4blas_dot (__pyx_self=<optimized out>, __pyx_v_overwrite_z=<optimized out>, __pyx_v_Z=0xc9c991a8, __pyx_v_Y=<optimized out>, __pyx_v_X=<optimized out>) at pygpu/blas.c:2871
#27 __pyx_pw_5pygpu_4blas_1dot (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at pygpu/blas.c:2757
#28 0x0009fff4 in cfunction_call (func=<built-in function dot>, args=<optimized out>, kwargs={'overwrite_z': True}) at ../Objects/methodobject.c:539
#29 0x00084ef8 in _PyObject_MakeTpCall (tstate=0x3eb0d8, callable=<built-in function dot>, args=0xfddde4b4, nargs=<optimized out>, keywords=<optimized out>) at ../Objects/call.c:191
#30 0x0007e618 in _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:116
#31 _PyObject_VectorcallTstate (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>, tstate=0x3eb0d8) at ../Include/cpython/abstract.h:103
#32 PyObject_Vectorcall (kwnames=('overwrite_z',), nargsf=<optimized out>, args=<optimized out>, callable=<built-in function dot>) at ../Include/cpython/abstract.h:127
[...]
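
The two backtraces above pass through the same functions and differ only
in addresses and source line numbers. A minimal sketch for checking this
mechanically follows; frame_functions is a hypothetical helper, and the
sample frames are the first three of each backtrace with the source paths
shortened to the file name for brevity.

```python
import re

# Start of a gdb frame line: "#N", an optional "0xADDR in", then the
# function name (everything up to the first space or parenthesis).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def frame_functions(backtrace):
    """Return the function names of a gdb backtrace, in frame order.

    Continuation lines and markers such as "[...]" are skipped because
    they do not start with "#N".
    """
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(2))
    return names


# Sample input, abbreviated from the LLVM 10 and LLVM 11 backtraces.
llvm10 = """\
#0  getEmissionKind () at DebugInfoMetadata.h:1244
#1  initialize () at LexicalScopes.cpp:53
#2  0xf827f2f0 in computeIntervals () at LiveDebugVariables.cpp:979
"""
llvm11 = """\
#0  getEmissionKind () at DebugInfoMetadata.h:1282
#1  initialize () at LexicalScopes.cpp:54
#2  0xf7b02dfc in computeIntervals () at LiveDebugVariables.cpp:971
"""

same_frames = frame_functions(llvm10) == frame_functions(llvm11)
print(same_frames)
```

Feeding the full backtraces from two core dumps into frame_functions gives
a quick way to tell whether a crash with a different LLVM version takes
the same code path.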


Andreas


