Bug#946422: silx: autopkgtest regression: pocl error

Andreas Beckmann anbe at debian.org
Thu Dec 19 13:17:21 GMT 2019


On 19/12/2019 11.59, PICCA Frederic-Emmanuel wrote:
> I found that commenting out this line
> 
> # self.d_array_5 = pyopencl.array.zeros_like(self.d_array_img) - 5
> 
> removes the pocl issue.

I think that's a red herring. Without that line, I get Python errors because d_array_5 is missing.

That's the backtrace I get in gdb for the original failure:

(gdb) bt
#0  0x00007ffff7e41081 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7e2c535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fffe67c46d1 in pocl_check_kernel_dlhandle_cache (cmd=cmd@entry=0x13e1bb0, initial_refcount=initial_refcount@entry=1) at ./lib/CL/devices/common.c:1097
#3  0x00007fffe67ca327 in pocl_pthread_prepare_kernel (cmd=0x13e1bb0, data=0x1335df0) at ./lib/CL/devices/pthread/pthread_scheduler.c:413
#4  pocl_pthread_exec_command (td=0x133cb80, cmd=0x13e1bb0) at ./lib/CL/devices/pthread/pthread_scheduler.c:450
#5  pocl_pthread_driver_thread (p=<optimized out>) at ./lib/CL/devices/pthread/pthread_scheduler.c:496
#6  0x00007ffff7deefb7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007ffff7f012df in clone () from /lib/x86_64-linux-gnu/libc.so.6

We are failing in pocl_check_kernel_dlhandle_cache() here:

    if (ci->dlhandle == NULL || ci->wg == NULL || dl_error != NULL)
      { 
        POCL_ABORT (
            "pocl error: lt_dlopen(\"%s\") or lt_dlsym() failed with '%s'.\n"
            "note: missing symbols in the kernel binary might be"
            " reported as 'file not found' errors.\n",
            module_fn, dl_error);
      }

(gdb) print *ci
$11 = {hash = "p\374|\271.\364b0\217\036\361?\271\327~\372", local_wgs = {32, 1, 1}, wg = 0x0, dlhandle = 0x7fff890010f0, next = 0x0, prev = 0x0, ref_count = 1}
(gdb) print dl_error
$15 = 0x7ffff53433e0 "can't close resident module"
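To make the failing condition concrete, here is a small Python sketch that plugs the values gdb printed into the three-way check above. CacheEntryModel and abort_reasons are hypothetical names used only for illustration, not pocl code, and None stands in for a NULL pointer. Under this model the handle returned by lt_dlopen() is valid; the abort is triggered by wg being NULL together with a non-NULL dl_error. The error text is libltdl's message for trying to close a resident module, not an open or symbol-lookup failure, which is at least consistent with some other lt_*() user leaving error state behind.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntryModel:
    """Hypothetical stand-in for the two cache-entry fields pocl tests.

    None plays the role of a NULL pointer; any other value means "set".
    This is not pocl's real struct, only the fields visible in the check.
    """
    dlhandle: Optional[int]
    wg: Optional[int]


def abort_reasons(ci: CacheEntryModel, dl_error: Optional[str]) -> List[str]:
    """List which clauses of the abort condition are true.

    Mirrors: if (ci->dlhandle == NULL || ci->wg == NULL || dl_error != NULL)
    An empty list means pocl would not abort.
    """
    reasons = []
    if ci.dlhandle is None:
        reasons.append("dlhandle is NULL (lt_dlopen failed)")
    if ci.wg is None:
        reasons.append("wg is NULL (no work-group function pointer)")
    if dl_error is not None:
        reasons.append(f"dl_error is set: {dl_error!r}")
    return reasons


# Values transcribed from the gdb session: the module handle is valid,
# wg is NULL, and dl_error holds a libltdl message string.
ci = CacheEntryModel(dlhandle=0x7FFF890010F0, wg=None)
dl_error = "can't close resident module"

for reason in abort_reasons(ci, dl_error):
    print(reason)
```

Running it lists the two clauses that hold: wg is NULL, and dl_error is set.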

I suspect that openmpi (which gets loaded by the io import) somehow messes up some state,
causing the lt_*() failures.

Andreas



More information about the debian-science-maintainers mailing list