Bug#903514: gimp won't launch

Alexis Murzeau amubtdx at gmail.com
Tue Aug 7 23:13:10 BST 2018


Hi,

On Fri, 3 Aug 2018 22:53:08 -0400 James Van Zandt
<jim.vanzandt at gmail.com> wrote:
> Thanks, Benedict - the same solution worked for me.
> 
> Specifically:
> 
>    sudo apt-get install libopenblas-base- libopenblas-dev- \
>                      libblas3 liblapack3 libblas-dev liblapack-dev
> 
> Unfortunately julia and libjulia0.6 were also removed here, since they
> depend on libopenblas-base.  I intend to report this as a bug, and request
> that they depend instead on the virtual packages libblas.so.3 and
> liblapack.so.3 (which can also be provided by liblapack3 and libblas3,
> resp.).

After checking what could cause the gimp issues, I found that on my machine,
gimp almost always hangs without showing anything (not even the splash
screen) when libopenblas-base is installed.

Using gdb to find where it hangs (gimp-gdb.txt) shows three openblas worker
threads blocked on a lock while accessing thread-local storage (TLS), and
the main thread in the middle of dlclose-ing openblas, waiting with
pthread_join for those threads to exit.

It seems that the lock taken in `tls_get_addr_tail` [0] is the same one
held by `_dl_close` [1].
It is a recursive lock, but that does not help here, because the threads
calling `tls_get_addr_tail` are not the thread that called `_dl_close`.

This deadlock may not happen every time; in my case, the openblas threads
are still initializing when dlclose is called.

Given this, I think the offending commit in openblas is bf40f806 [2],
which adds TLS variables to avoid locking. However, many changes have been
made since then.

A related bug report is [3], which seems to indicate that lock handling
inside glibc is not easy.

There was an attempt to fix deadlocks between tls_get_addr and a
dlclose of a module whose finalizer joins with that thread [4].

So I see these possible solutions:
 * Add a Breaks relationship between gimp and openblas
 * Disable TLS in the openblas build (if possible, but this would cause a
performance loss for users who use openblas without gimp)
 * Patch glibc to not deadlock (but this does not seem easy to do at all)

Also, this deadlock might not be the only cause of the issues reported in
this bug.

Reassigning to glibc, marked as affecting openblas and gimp, since this is
caused by a deadlock inside glibc.

[0] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-tls.c#L761
[1] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-close.c#L812
[2] https://github.com/xianyi/OpenBLAS/commit/bf40f806efa55c7a7c7ec57535919598eaeb569d#diff-31f8d4e8863583d95bf2f9529f83844e
[4] https://sourceware.org/ml/libc-alpha/2015-06/msg00062.html

-- 
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC  2787 E7BD 1904 F480 937F
-------------- next part --------------
(gdb) thr a a bt

Thread 4 (Thread 0x7f727a990700 (LWP 26238)):
#0  0x00007f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x00007f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf85706b0, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x00007f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x00007f7276d86400 in get_memory_table () at memory.c:1147
#5  0x00007f7276d86400 in blas_memory_alloc (procpos=procpos@entry=2) at memory.c:1147
#6  0x00007f7276d86bbb in blas_thread_server (arg=0x2) at blas_server.c:297
#7  0x00007f7283acdf2a in start_thread (arg=0x7f727a990700) at pthread_create.c:463
#8  0x00007f7283a00edf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f727b191700 (LWP 26237)):
#0  0x00007f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x00007f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf85704b0, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x00007f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x00007f7276d86400 in get_memory_table () at memory.c:1147
#5  0x00007f7276d86400 in blas_memory_alloc (procpos=procpos@entry=2) at memory.c:1147
#6  0x00007f7276d86bbb in blas_thread_server (arg=0x1) at blas_server.c:297
#7  0x00007f7283acdf2a in start_thread (arg=0x7f727b191700) at pthread_create.c:463
#8  0x00007f7283a00edf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f727b992700 (LWP 26236)):
#0  0x00007f7283ad711c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f7283ad06c6 in __GI___pthread_mutex_lock (mutex=0x7f7287775968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:113
#2  0x00007f728775e5b7 in tls_get_addr_tail (ti=0x7f7278c2fc70, dtv=0x55edf8556c10, the_map=0x55edf8567980) at ../elf/dl-tls.c:761
#3  0x00007f7287764288 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#4  0x00007f7276d86400 in get_memory_table () at memory.c:1147
#5  0x00007f7276d86400 in blas_memory_alloc (procpos=procpos@entry=2) at memory.c:1147
#6  0x00007f7276d86bbb in blas_thread_server (arg=0x0) at blas_server.c:297
#7  0x00007f7283acdf2a in start_thread (arg=0x7f727b992700) at pthread_create.c:463
#8  0x00007f7283a00edf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f727fb9ce00 (LWP 26235)):
#0  0x00007f7283acf38d in __GI___pthread_timedjoin_ex (threadid=140129676633856, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true)
    at pthread_join_common.c:89
#1  0x00007f7283acf1cc in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:24
#2  0x00007f7276d8764f in blas_thread_shutdown_ () at blas_server.c:972
#3  0x00007f7276d865e7 in blas_shutdown () at memory.c:1274
#4  0x00007f7276b60011 in gotoblas_quit () at memory.c:1470
#5  0x00007f7287760c54 in _dl_close_worker (map=<optimized out>, force=<optimized out>) at dl-close.c:288
#6  0x00007f728776183e in _dl_close (_map=0x55edf855fc40) at dl-close.c:842
#7  0x00007f7283a3cadf in __GI__dl_catch_exception (exception=exception@entry=0x7fff21e5ac60, operate=operate@entry=0x7f7280ef8350 <dlclose_doit>, args=args@entry=0x55edf855fc40)
    at dl-error-skeleton.c:196
#8  0x00007f7283a3cb6f in __GI__dl_catch_error (objname=objname@entry=0x55edf84a7c70, errstring=errstring@entry=0x55edf84a7c78, mallocedp=mallocedp@entry=0x55edf84a7c68, operate=operate@entry=0x7f7280ef8350 <dlclose_doit>, args=args@entry=0x55edf855fc40) at dl-error-skeleton.c:215
#9  0x00007f7280ef8975 in _dlerror_run (operate=operate@entry=0x7f7280ef8350 <dlclose_doit>, args=0x55edf855fc40) at dlerror.c:162
#10 0x00007f7280ef8393 in __dlclose (handle=<optimized out>) at dlclose.c:46
#11 0x00007f7283704366 in g_module_close () at /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0
#12 0x00007f7284f75ecc in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#13 0x00007f7284f765d2 in gegl_module_new () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#14 0x00007f7284f76aac in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#15 0x00007f7284f75aa1 in gegl_datafiles_read_directories () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#16 0x00007f7283ce5ced in g_slist_foreach () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007f7284f48608 in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#18 0x00007f7283cd35d8 in g_option_context_parse () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007f7283cd4574 in g_option_context_parse_strv () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x000055edf72b2baa in main ()


More information about the pkg-gnome-maintainers mailing list