Bug#1049407: gnome-shell: build-time tests crash when using Mesa softpipe
Simon McVittie
smcv at debian.org
Tue Aug 15 11:31:38 BST 2023
Source: gnome-shell
Version: 44.3-3
Severity: normal
Steps to reproduce:
1. Get a source tree and build-dependencies
2. Remove this workaround from d/rules if you are using mips(64)el:
ifneq ($(filter mips%,$(DEB_HOST_ARCH_CPU)),)
# gnome-shell on mips(64)el works on a real GPU (in practice usually an
# AMD GPU), but crashes when using llvmpipe or softpipe, which is all that
# is available on the buildds, so we only run the unit tests at build time
# and skip the tests that would run the whole Shell. See discussion in
# https://salsa.debian.org/gnome-team/gnome-shell/-/merge_requests/71
meson_test_options += --no-suite shell
endif
3. Add this instead:
export GALLIUM_DRIVER=softpipe
export LIBGL_ALWAYS_SOFTWARE=true
4. debuild
Expected result: tests pass.
Actual result: tests fail with a gnome-shell
crash. According to mips porter YunQiang Su on
<https://salsa.debian.org/gnome-team/gnome-shell/-/merge_requests/71>,
this is also reproducible on arm64 (I have not verified this).
Impact: nobody intentionally uses softpipe in practice, but this
prevents us from using it as a workaround when llvmpipe has issues
(such as #1049404).
YunQiang Su writes:
> The reason is that in gjs/gi/function.cpp (Function::invoke),
> the value of ffi_arg_pointers.get() has no TOPLEVEL Stage, so
> shell_wm_completed_map segfaults.
...
> On my ARM64 machine, if no breakpoint is set, the segfault always
> happens. If 2 breakpoints are set, on both
> "function.cpp:1050 if function=shell_wm_completed_map" and on
> "shell_wm_completed_map", the test always passes.
>
> So I guess some other thread change the data to shell_wm_completed_map.
...
> Sleeping for some time with nanosleep (1<<23 ns on my arm64 server)
> before the ffi_call lets the test pass.
>
> Using taskset also increases the chance of the test passing.
This suggests that there is some timing or multi-threading issue that is
triggering this when using softpipe.
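One way to probe the timing hypothesis outside debuild might be to run only the Shell test suite repeatedly, first unpinned and then pinned to a single CPU with taskset. This is a sketch, assuming an already-configured Meson build directory in a hypothetical BUILDDIR variable; the suite name "shell" is the one skipped by the d/rules workaround above, and the environment variables are the ones from step 3.

```shell
#!/bin/sh
# Sketch only: BUILDDIR is a hypothetical, already-configured Meson build
# directory for gnome-shell. "shell" is the suite that the mips d/rules
# workaround skips with --no-suite shell.
set -eu
: "${BUILDDIR:?set BUILDDIR to a configured gnome-shell build directory}"

# Force Mesa's softpipe driver, as in step 3 of the reproducer.
export GALLIUM_DRIVER=softpipe
export LIBGL_ALWAYS_SOFTWARE=true

# Run the Shell tests several times with no CPU affinity. Intermittent
# failures here would be consistent with a race.
meson test -C "$BUILDDIR" --suite shell --repeat 10 --print-errorlogs \
    || echo "unpinned: failures seen"

# Repeat with the test process and its children pinned to CPU 0. If the
# timing hypothesis is right, failures should become rarer, although
# pinning alone is not expected to make them disappear.
taskset -c 0 meson test -C "$BUILDDIR" --suite shell --repeat 10 --print-errorlogs \
    || echo "pinned to CPU 0: failures seen"
```

Comparing how often each run fails gives a rough signal only; a clean pinned run does not prove a threading bug, and a failing one does not rule it out.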
shell_wm_completed_map() is a gnome-shell function, nothing to do with
Mesa or LLVM, and similarly gjs/gi/function.cpp is part of gjs, so I
think this is more likely to be a bug in gobject-introspection, gjs,
gnome-shell or mutter than a bug in LLVM or Mesa.
My guess would be that there's some fallback rendering path that is
rarely tested and therefore contains bugs, because all real-world GNOME
Shell users are using either a hardware GPU or llvmpipe, and nobody uses
softpipe in practice.
smcv