Bug#1109107: libsoup3: intermittent test failure: memory corruption in multithread-test
Simon McVittie
smcv at debian.org
Fri Jul 11 14:51:16 BST 2025
Source: libsoup3
Version: 3.6.4-2
Severity: important
Tags: ftbfs help moreinfo
Control: block 1035983 by -1
In a previous build of libsoup3 on the official buildds,
multithread-test failed with evidence of memory corruption:
https://buildd.debian.org/status/fetch.php?pkg=libsoup3&arch=amd64&ver=3.6.4-2&stamp=1737574120&raw=0
> 17/38 multithread-test RUNNING
> >>> MALLOC_PERTURB_=181 G_TEST_SRCDIR=/build/reproducible-path/libsoup3-3.6.4/tests MESON_TEST_ITERATION=1 LD_LIBRARY_PATH=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests:/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/libsoup MALLOC_CHECK_=2 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 G_TEST_BUILDDIR=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests G_DEBUG=gc-friendly UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests/multithread-test --debug
> ▶ 15/38 /misc/cancel-while-reading/msg OK
> ▶ 15/38 /misc/cancel-while-reading/req/immediate OK
> ▶ 17/38 /multithread/no-main-context OK
> ▶ 17/38 /multithread/basic/async OK
> ▶ 17/38 /multithread/basic/sync OK
> ▶ 17/38 /multithread/basic-ssl/async OK
> ▶ 17/38 /multithread/basic-ssl/sync OK
> ▶ 17/38 /multithread/basic-proxy/async OK
> ▶ 17/38 /multithread/basic-proxy/sync OK
> ▶ 17/38 /multithread/basic-no-main-thread/async OK
> ▶ 17/38 /multithread/basic-no-main-thread/sync OK
> ▶ 17/38 /multithread/basic-ssl-proxy/async OK
> ▶ 17/38 /multithread/basic-ssl-proxy/sync OK
> ▶ 17/38 /multithread/basic-http2/async OK
> 17/38 multithread-test ERROR 0.09s killed by signal 6 SIGABRT
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> malloc(): unsorted double linked list corrupted
>
> (test program exited with status code -6)
When the build was retried, all tests succeeded, so this is presumably
intermittent or otherwise unreproducible.
This is **not** the same as the failure mode that has been the most common
in the past, where tests that use Apache fail with "Address already in
use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx".
Similarly, when I tried to add Salsa-CI to this package, my first attempt
failed with a different indication of memory corruption:
https://salsa.debian.org/gnome-team/libsoup3/-/jobs/7814730
> 17/38 multithread-test ERROR 15.76s killed by signal 6 SIGABRT
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> tcache_thread_shutdown(): unaligned tcache chunk detected
> (test program exited with status code -6)
> TAP parsing error: Too few tests run (expected 21, got 11)
I think we can probably treat any evidence of memory corruption in this
test as being essentially equivalent - if we corrupt the heap, then
glibc can fail in several different ways as a result, none of which are
meaningfully different.
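To apply that grouping mechanically when going through logs, something like the following could be used. This is a hypothetical helper (the name is_heap_corruption is my own, not part of meson, glibc or libsoup3). The patterns cover the two glibc messages quoted above plus a couple of other common glibc heap-check messages, so the list is probably not exhaustive. It deliberately does not match the Apache "Address already in use" failure.

```sh
#!/bin/sh
# is_heap_corruption LOGFILE: hypothetical helper. Succeeds (exit status 0)
# if LOGFILE contains one of a few glibc heap-corruption abort messages,
# e.g. "malloc(): unsorted double linked list corrupted" or
# "tcache_thread_shutdown(): unaligned tcache chunk detected".
is_heap_corruption() {
    grep -q -F \
        -e 'corrupted' \
        -e 'unaligned tcache chunk' \
        -e 'double free' \
        -e 'invalid pointer' \
        -- "$1"
}
```

For example, is_heap_corruption obj-x86_64-linux-gnu/meson-logs/testlog.txt && echo "heap corruption" would flag a log from either of the failures above.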
There seems to be a second failure mode where multithread-test times
out (the default timeout is 60 seconds, but we use a 6x multiplier in
the Debian packaging to accommodate slower architectures). That
failure mode should be treated as a separate bug and is out of scope for
this particular bug report, although it's possible that it has the same
root cause. I will report that failure mode as a separate bug.
To get an idea of how frequent this is, I tried these steps on the amd64
porterbox, barriere:
1. build libsoup3 (from unstable):
schroot -c $chroot -r -- \
env DEB_BUILD_PROFILES=noudeb \
debuild -e CCACHE_DIR=$HOME/.ccache -e PATH=/usr/lib/ccache:$PATH -us -uc -B
2. run multithread-test repeatedly:
schroot -c $chroot -r -- \
env -C obj-x86_64-linux-gnu \
DEB_BUILD_PROFILES=noudeb CCACHE_DIR=$HOME/.ccache PATH=/usr/lib/ccache:$PATH \
DEB_PYTHON_INSTALL_LAYOUT=deb LC_ALL=C.UTF-8 \
meson test --repeat 100 -j1 multithread-test
(I tried this 3 times; optionally add --timeout-multiplier=6 to the
`meson test` command-line to emulate the original package build more
accurately)
3. read obj-x86_64-linux-gnu/meson-logs/testlog.txt for details of the
failures, if any
and my results were as follows:
- 7 successes, 1 timeout, 1 failure with memory corruption
- 19 successes, 1 timeout, 6 more successes, 1 more timeout, I cancelled
the run at this point
- 10 successes, 1 timeout, 15 more successes, 1 failure with
memory corruption
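Adding these up gives a rough per-run failure rate. The sketch below is only arithmetic on the counts above (the cancelled run is counted up to the point where I stopped it). The last figure assumes that runs fail independently, which may not hold for a timing-sensitive bug, and with only two corruption failures every number here is a very rough estimate.

```sh
#!/bin/sh
# Outcome counts copied from the three `meson test --repeat 100` runs above.
successes=$((7 + 19 + 6 + 10 + 15))
timeouts=$((1 + 1 + 1 + 1))
corruptions=$((1 + 1))
total=$((successes + timeouts + corruptions))

echo "runs: $total"
# Observed per-run rates (small sample, so rough point estimates only).
awk -v c="$corruptions" -v t="$timeouts" -v n="$total" 'BEGIN {
    p = c / n
    printf "memory corruption: %.1f%% of runs\n", 100 * p
    printf "timeout: %.1f%% of runs\n", 100 * t / n
    # Chance of at least one corruption in 100 runs, assuming independence.
    printf "P(>=1 corruption in 100 runs): %.1f%%\n", 100 * (1 - (1 - p) ^ 100)
}'
```

This prints 63 runs, about 3.2% memory corruption and 6.3% timeouts, and about a 96% chance of seeing at least one corruption in a 100-iteration run. That would suggest --repeat 100 is usually, but not always, enough to reproduce it.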
Anyone who wants libsoup3 tests to pass more often is invited to help to
debug and fix this. If the failure is reproducible under valgrind,
probably the easiest way is to build it in an environment that is
suitable for interactive debugging, then run multithread-test repeatedly
under valgrind, using something like
meson test --repeat 100 --wrapper=./valgrind.sh multithread-test
to get a backtrace for the memory corruption and figure out how it is
happening. But this might not be possible if using valgrind perturbs the
timing enough that the failure mode never actually happens.
Or it might be possible to build libsoup3 (and ideally GLib too) with
-fsanitize=address,undefined, and then run multithread-test repeatedly,
as above; but, again, AddressSanitizer slows down the binaries, which
could perturb the timing enough that the failure mode never actually
happens.
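With meson, the sanitizer flags should not need to be added by hand: the built-in b_sanitize option accepts address,undefined, which corresponds to -fsanitize=address,undefined. A possible sequence is sketched below as a hypothetical shell function, assumed to be run from the top of the unpacked libsoup3 source in the chroot. It uses a separate build directory so the Debian build in obj-x86_64-linux-gnu is left alone. It only instruments libsoup3 itself; building GLib the same way would need a separate build, which is not shown.

```sh
#!/bin/sh
# asan_build [BUILDDIR]: hypothetical helper, not part of the Debian packaging.
# Run from the top of the libsoup3 source tree.
asan_build() {
    builddir=${1:-build-asan}
    # b_sanitize=address,undefined compiles and links with
    # -fsanitize=address,undefined (clang may also need -Db_lundef=false).
    meson setup "$builddir" -Db_sanitize=address,undefined || return 1
    meson compile -C "$builddir" || return 1
    # Sanitizers slow the test down, so allow more time than the default.
    meson test -C "$builddir" --repeat 100 -j1 \
        --timeout-multiplier=6 multithread-test
}
```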
Annoyingly, it is not possible to run two or more copies of this test in
parallel, so that cannot be used to get to a failure sooner (this is
because each run of this test uses the same fixed filenames and port
numbers).
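Since parallel runs are ruled out, serial repetition is the only option. A hypothetical helper (not part of meson or libsoup3) that repeats a command until its first failure is sketched below. As far as I know each meson test invocation rewrites meson-logs/testlog.txt, so after a failure the log left behind should be the one for the failing iteration. Note that it stops on any failure, including the timeouts that are out of scope for this bug.

```sh
#!/bin/sh
# run_until_fail MAX CMD [ARGS...]: hypothetical helper, not part of meson.
# Runs CMD up to MAX times in sequence and stops at the first failure.
run_until_fail() {
    ruf_max=$1
    shift
    ruf_i=1
    while [ "$ruf_i" -le "$ruf_max" ]; do
        if ! "$@"; then
            echo "failed on iteration $ruf_i of $ruf_max" >&2
            return 1
        fi
        ruf_i=$((ruf_i + 1))
    done
    echo "no failure in $ruf_max iterations" >&2
    return 0
}
```

For example, from obj-x86_64-linux-gnu: run_until_fail 100 meson test -j1 --timeout-multiplier=6 multithread-test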
I am a member of the GNOME team, but not an Uploader of this particular
package. I am aware that some project members believe that, because I
have solved test issues in the past, I should be held personally
responsible for every test failure that occurs in GNOME. As per the
Debian Social Contract §2.1.1, I decline that responsibility: I am not
able to fix everything all of the time, and I'm sorry if the project
considers my contributions to be inadequate.
smcv