Bug#975931: libgpuarray autopkgtest using pocl on armhf triggers segfault in LLVM
Andreas Beckmann
anbe at debian.org
Thu Dec 10 00:33:26 GMT 2020
Control: reassign -1 libllvm10,libllvm11
Control: found -1 1:10.0.1-8
Control: found -1 1:11.0.0-5
Control: retitle -1 libgpuarray autopkgtest using pocl on armhf triggers segfault in LLVM
Control: affects -1 + src:pocl
This bug is reproducible with pocl built against llvm-10 (in sid)
and with pocl built against llvm-11 (in experimental), while no error
occurs with pocl built against llvm-9 (in testing).
I managed to create a reproducer in C with an embedded OpenCL kernel
that will hopefully help with debugging the issue. It removes python,
libgpuarray, libclblas, ... from the path that triggers the issue.
Installing the following packages should be sufficient:
pocl-opencl-icd libpocl2-dbgsym libllvm10-dbgsym ocl-icd-opencl-dev ocl-icd-libopencl1-dbgsym
The 975931.sh script builds the ./975931 binary and runs it with an
empty pocl kernel cache, resulting in a segmentation fault with this backtrace:
#0 getEmissionKind () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/include/llvm/IR/DebugInfoMetadata.h:1244
#1 initialize () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LexicalScopes.cpp:53
#2 0xb14102f0 in computeIntervals () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:979
#3 runOnMachineFunction () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:996
#4 runOnMachineFunction () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/LiveDebugVariables.cpp:1023
#5 0xb14856c8 in runOnFunction () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6 0xb12ff494 in runOnFunction () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#7 0xb12ff750 in runOnModule () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1517
#8 0xb12ffba8 in runOnModule () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1582
#9 run () at /build/llvm-toolchain-10-cW4tHW/llvm-toolchain-10-10.0.1/llvm/lib/IR/LegacyPassManager.cpp:1694
#10 0xb6e64c82 in pocl_llvm_codegen (Device=Device@entry=0xdb0010, Modp=0x1361838, Output=Output@entry=0xbefde86c, OutputSize=OutputSize@entry=0xbefde880) at ./lib/CL/pocl_llvm_wg.cc:624
#11 0xb6e291de in llvm_codegen (output=output@entry=0xdeb898 "pocl-kernel-cache-2020-12-10T00:06:19+00:00-hPVZwM/AP/PNFEAPBKBFEAKGGNMALGHGJEEKGMJFBFBMDHA/Sdot_kernel/0-0-0/Sdot_kernel.so", device_i=device_i@entry=0, kernel=kernel@entry=0xbefe0240, device=0xdb0010, command=command@entry=0xbefe0278, specialize=specialize@entry=0) at ./lib/CL/devices/common.c:158
#12 0xb6e2ae44 in pocl_check_kernel_disk_cache (command=command@entry=0xbefe0278, specialized=specialized@entry=0) at ./lib/CL/devices/common.c:958
#13 0xb6e2b262 in pocl_check_kernel_dlhandle_cache (command=0xbefe0278, initial_refcount=0, specialize=0) at ./lib/CL/devices/common.c:1081
#14 0xb6e033d4 in program_compile_dynamic_wg_binaries (program=program@entry=0xd8ab88) at ./lib/CL/pocl_build.c:179
#15 0xb6e13f20 in get_binary_sizes (sizes=0xbefe0384, program=0xd8ab88) at ./lib/CL/clGetProgramInfo.c:36
#16 POclGetProgramInfo (program=0xd8ab88, param_name=4453, param_value_size=128, param_value=0xbefe0384, param_value_size_ret=0xbefe0380) at ./lib/CL/clGetProgramInfo.c:115
#17 0x00473070 in main () at 975931.c:238
Then it runs the binary again, this time with the pocl kernel cache contents
from the previous failure, which results in:
inlinable function call in a function with debug info must have a !dbg location
%11 = call i32 @_Z12get_local_idj(i32 0)
inlinable function call in a function with debug info must have a !dbg location
%19 = call i32 @_Z12get_local_idj(i32 1)
inlinable function call in a function with debug info must have a !dbg location
%27 = call i32 @_Z12get_local_idj(i32 2)
binary size: 52077
OK
It may well be that pocl passes some invalid input to LLVM
(the fact that the second run does not segfault, but instead
reports verifier errors about missing !dbg locations, seems to
indicate something like this), but a compiler library should
still not segfault in this case.
I hope you can shed some light on whether LLVM or pocl is to
blame here.
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 975931.c
Type: text/x-csrc
Size: 9414 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20201210/d92c5bf0/attachment-0001.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 975931.sh
Type: application/x-sh
Size: 155 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20201210/d92c5bf0/attachment-0001.sh>
More information about the debian-science-maintainers mailing list