Bug#1109107: libsoup3: intermittent test failure: memory corruption in multithread-test

Simon McVittie smcv at debian.org
Fri Jul 11 14:51:16 BST 2025


Source: libsoup3
Version: 3.6.4-2
Severity: important
Tags: ftbfs help moreinfo
Control: block 1035983 by -1

In a previous build of libsoup3 on the official buildds, 
multithread-test failed with evidence of memory corruption:

https://buildd.debian.org/status/fetch.php?pkg=libsoup3&arch=amd64&ver=3.6.4-2&stamp=1737574120&raw=0
> 17/38 multithread-test         RUNNING
> >>> MALLOC_PERTURB_=181 G_TEST_SRCDIR=/build/reproducible-path/libsoup3-3.6.4/tests MESON_TEST_ITERATION=1 LD_LIBRARY_PATH=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests:/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/libsoup MALLOC_CHECK_=2 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 G_TEST_BUILDDIR=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests G_DEBUG=gc-friendly UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests/multithread-test --debug
> ▶ 15/38 /misc/cancel-while-reading/msg OK
> ▶ 15/38 /misc/cancel-while-reading/req/immediate OK
> ▶ 17/38 /multithread/no-main-context OK
> ▶ 17/38 /multithread/basic/async OK
> ▶ 17/38 /multithread/basic/sync OK
> ▶ 17/38 /multithread/basic-ssl/async OK
> ▶ 17/38 /multithread/basic-ssl/sync OK
> ▶ 17/38 /multithread/basic-proxy/async OK
> ▶ 17/38 /multithread/basic-proxy/sync OK
> ▶ 17/38 /multithread/basic-no-main-thread/async OK
> ▶ 17/38 /multithread/basic-no-main-thread/sync OK
> ▶ 17/38 /multithread/basic-ssl-proxy/async OK
> ▶ 17/38 /multithread/basic-ssl-proxy/sync OK
> ▶ 17/38 /multithread/basic-http2/async OK
> 17/38 multithread-test         ERROR             0.09s   killed by signal 6 SIGABRT
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> malloc(): unsorted double linked list corrupted
> 
> (test program exited with status code -6)

When the build was retried, all tests succeeded, so this is presumably 
intermittent or otherwise unreproducible.

This is **not** the same as the failure mode that has been the most common 
in the past, where tests that use Apache fail with "Address already in 
use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx".

Similarly, when I tried to add Salsa-CI to this package, my first attempt 
failed with a different indication of memory corruption:

https://salsa.debian.org/gnome-team/libsoup3/-/jobs/7814730
> 17/38 multithread-test         ERROR            15.76s   killed by signal 6 SIGABRT
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> tcache_thread_shutdown(): unaligned tcache chunk detected
> (test program exited with status code -6)
> TAP parsing error: Too few tests run (expected 21, got 11)

I think we can probably treat any evidence of memory corruption in this 
test as being essentially equivalent - if we corrupt the heap, then 
glibc can fail in several different ways as a result, none of which are 
meaningfully different.

There seems to be a second failure mode where multithread-test times 
out (the default timeout is 60 seconds, but we use a 6x multiplier in 
the Debian packaging to accommodate slower architectures). That 
failure mode should be treated as a separate bug and is out of scope for 
this particular bug report, although it's possible that it has the same 
root cause. I will report that failure mode as a separate bug.

To get an idea of how frequent this is, I tried these steps on the amd64 
porterbox, barriere:

1. build libsoup3 (from unstable):

   schroot -c $chroot -r -- \
   env DEB_BUILD_PROFILES=noudeb \
   debuild -e CCACHE_DIR=$HOME/.ccache -e PATH=/usr/lib/ccache:$PATH -us -uc -B

2. run multithread-test repeatedly:

   schroot -c $chroot -r -- \
   env -C obj-x86_64-linux-gnu \
   DEB_BUILD_PROFILES=noudeb CCACHE_DIR=$HOME/.ccache PATH=/usr/lib/ccache:$PATH \
   DEB_PYTHON_INSTALL_LAYOUT=deb LC_ALL=C.UTF-8 \
   meson test --repeat 100 -j1 multithread-test

   (I tried this 3 times; optionally add --timeout-multiplier=6 to the
   `meson test` command-line to emulate the original package build more
   accurately)

3. read obj-x86_64-linux-gnu/meson-logs/testlog.txt for details of the
   failures, if any

and my results were as follows:

- 7 successes, 1 timeout, 1 failure with memory corruption
- 19 successes, 1 timeout, 6 more successes, 1 more timeout, I cancelled
  the run at this point
- 10 successes, 1 timeout, 15 more successes, 1 failure with
  memory corruption
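
Adding those three runs together gives a rough idea of the failure rate.
A small sketch of the arithmetic (the `tally` helper is hypothetical; each
input line is one run, given as successes, timeouts, and memory-corruption
failures):

```shell
# tally: sum per-run counts read from stdin and print overall rates.
# Input columns: successes, timeouts, memory-corruption failures.
tally() {
    awk '
        { ok += $1; timeout += $2; corrupt += $3 }
        END {
            total = ok + timeout + corrupt
            printf "runs=%d ok=%d timeout=%d corrupt=%d\n", total, ok, timeout, corrupt
            printf "timeout_rate=%.1f%% corrupt_rate=%.1f%%\n", 100 * timeout / total, 100 * corrupt / total
        }
    '
}

# The three runs reported above (the second run was cancelled early).
tally <<'EOF'
7 1 1
25 2 0
25 1 1
EOF
```

That is 63 iterations in total, with 4 timeouts (about 6.3%) and 2
memory-corruption failures (about 3.2%). With so few samples, and one run
cancelled early, these are only ballpark figures.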

Anyone who wants libsoup3 tests to pass more often is invited to help to 
debug and fix this. If the failure is reproducible under valgrind, 
probably the easiest way is to build it in an environment that is 
suitable for interactive debugging, then run multithread-test repeatedly 
under valgrind, using something like

    meson test --repeat 100 --wrapper=./valgrind.sh multithread-test

to get a backtrace for the memory corruption and figure out how it is 
happening. But this might not be possible if using valgrind perturbs the 
timing enough that the failure mode never actually happens.
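
The `./valgrind.sh` wrapper in that command is not shown in this report.
A minimal sketch of what such a wrapper might look like, assuming valgrind
is installed in the build environment (the file name and the particular
options are illustrative choices, not something taken from the package):

```shell
# Create a hypothetical wrapper script. meson appends the test's own
# command line after the wrapper, so the wrapper receives it as "$@".
cat > valgrind.sh <<'EOF'
#!/bin/sh
# --error-exitcode makes errors detected by valgrind fail the test;
# --track-origins and --num-callers make the backtraces more useful.
exec valgrind \
    --error-exitcode=1 \
    --track-origins=yes \
    --num-callers=40 \
    "$@"
EOF
chmod +x valgrind.sh
```

Because valgrind slows the test down considerably, a larger
--timeout-multiplier is likely to be needed to avoid spurious timeouts.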

Or it might be possible to build libsoup3 (and ideally GLib too) with 
-fsanitize=address,undefined, and then run multithread-test repeatedly, 
as above; but, again, AddressSanitizer slows down the binaries, which 
could perturb the timing enough that the failure mode never actually 
happens.
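
For an upstream-style Meson build outside the Debian packaging, Meson's
built-in `b_sanitize` option is one way to add those flags. A sketch,
assuming the current directory is a libsoup3 source tree; the `_asan`
build directory name and the `run` and `sanitizer_run` helpers are
hypothetical, and GLib would have to be rebuilt separately:

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sanitizer_run: configure a sanitizer-enabled build directory, then run
# multithread-test repeatedly in it. Argument: build directory name.
sanitizer_run() {
    builddir=${1:-_asan}
    run meson setup "$builddir" -Db_sanitize=address,undefined || return
    run meson test -C "$builddir" --repeat 100 -j1 multithread-test
}
```

Calling `DRY_RUN=1 sanitizer_run` only prints the two commands;
`sanitizer_run` on its own runs them.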

Annoyingly, it is not possible to run two or more copies of this test in 
parallel, so that cannot be used to get to a failure sooner (this is 
because each run of this test uses the same fixed filenames and port 
numbers).

I am a member of the GNOME team, but not an Uploader of this particular 
package. I am aware that some project members believe that, because I 
have solved test issues in the past, I should be held personally 
responsible for every test failure that occurs in GNOME. As per the 
Debian Constitution §2.1.1, I decline that responsibility: I am not 
able to fix everything all of the time, and I'm sorry if the project 
considers my contributions to be inadequate.

    smcv


