Bug#1035983: libsoup3 (and libsoup2) autopkgtests are flaky: Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:47524

Sat Jul 12 18:11:05 BST 2025

Control: clone 1035983 -2
Control: retitle 1035983 libsoup3: intermittent test failures: Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx
Control: retitle -2 libsoup3: [metabug] several intermittent test failures resulting in flaky autopkgtests and FTBFS
Control: unblock 1035983 by 1109107 1109108
Control: block -2 by 1035983

On Mon, 19 May 2025 at 17:57:50 +0200, Santiago Vila wrote:
>On 19/5/25 at 16:43, Simon McVittie wrote:
>>Is this still the same failure mode described in the bug title, with "Address already in use" and "could not bind to address ..." being reported by Apache?
>
>That's a very good question and I'm glad that you asked :-)
>
>In some cases, yes, but not always.

Bug #1035983 has always mentioned the AH00072 issue in its title, so I 
think it's probably best if we consider any other sources of FTBFS or 
autopkgtest failures as out-of-scope for #1035983.

Regarding the topic of flaky tests in general:

Unfortunately I suspect that what's happening here is that we have a 
series of different test failures, each of them individually quite rare 
(therefore hard to reproduce or debug), which add up to a significant 
probability that at least one of the rare failures will happen at least 
once in any given test run and therefore the overall test suite fails. 
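
To put illustrative numbers on that: if the failure modes are assumed to 
be independent, the chance that a run fails is 1 minus the product of 
the individual pass probabilities. The probabilities below are made up 
for illustration and are not measurements from libsoup.

```python
from math import prod


def overall_failure_probability(per_cause):
    """Chance that at least one independent failure mode fires in a run."""
    return 1.0 - prod(1.0 - p for p in per_cause)


# Hypothetical per-run probabilities for four rare failure modes
# (illustrative only, not measured).
rare = [0.01, 0.02, 0.005, 0.015]
print(f"{overall_failure_probability(rare):.4f}")
```

With these made-up inputs, roughly one run in twenty fails even though 
no single cause exceeds 2%.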

I've cloned a "metabug" (-2 above) to be blocked by #1035983 and other 
concrete and potentially actionable causes of test failures, but that 
metabug is not going to be directly actionable, because issues that 
can't be identified can't be fixed: the only way it can be solved is to 
chip away at its actionable dependencies until the failure rate becomes 
sufficiently low. I am not an expert on this package and I cannot commit 
to being able to achieve that.

Individual tests that are sufficiently flaky can be worked around by 
disabling or ignoring the test if necessary (as was done for the 
tls_interaction test already), but the cost of disabling tests is that 
we can no longer use them to detect RC-severity regressions 
(particularly on architectures with few users where the buildds and 
autopkgtest are basically the only tools we have), so there's a 
trade-off here between breakage caused by false-positive failures and 
breakage caused by regressions that could have been caught by running 
the tests. As a non-expert trying to keep this package afloat, I don't 
feel that I am able to make high-quality uploads without automated tests 
to detect my inevitable mistakes. I'm sorry that this is disappointing, 
and I would be delighted to stop contributing to libsoup when someone 
can do a better job, but until then all I can do is to try to have a 
net-positive impact to the best of my limited ability.

As mentioned previously, the AH00072 issue, #1035983, is a particularly 
bad candidate for that workaround because it affects several tests 
equally, and disabling all of them would lose a lot of the overall test 
coverage.

>I've put a collection
>of failed build logs here:
>
>https://people.debian.org/~sanvila/build-logs/libsoup3/

Thanks, hopefully someone can analyze those at some point and pick out 
the actionable equivalence classes. I cannot commit to being able to do 
this myself.

I've reported some other sources of intermittent test failures as 
#1109107 (no solution known, help welcome), #1109108 (no solution known, 
help welcome) and #1109120 (fixed in the latest upload to unstable by an 
upstream change). Individually, none of these has a high probability of 
failure, but they add up.

When I tried running the test suite repeatedly on barriere, the failure 
modes I saw intermittently were #1109107 and #1109108. I don't think I 
saw #1109120 or #1035983, so those might be less common, at least on 
that particular machine (if the failures are timing-dependent then they 
might behave differently elsewhere).
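
A minimal sketch of that kind of repeated run, for anyone who wants to 
try it elsewhere: it reruns a command and tallies failures by the first 
known marker string found in the output. The function names and the 
marker table are hypothetical; the only marker taken from this bug is 
"AH00072", and the command to run is left to the caller.

```python
import subprocess
from collections import Counter


def classify(output, markers):
    """Return the label of the first marker string found in the output."""
    for needle, label in markers.items():
        if needle in output:
            return label
    return "unclassified"


def tally_runs(command, runs, markers):
    """Run `command` `runs` times and count passes and failure labels."""
    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True,
                                text=True, errors="replace")
        if result.returncode == 0:
            counts["pass"] += 1
        else:
            counts[classify(result.stdout + result.stderr, markers)] += 1
    return counts


# Hypothetical marker table: substring in the log -> bug it is filed under.
MARKERS = {"AH00072": "#1035983"}
```

Calling tally_runs(cmd, 100, MARKERS) with the real test command as cmd 
gives a rough per-bug failure count for that machine.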

Regarding #1035983 (the AH00072 issue) specifically:

>Last time I looked at the libsoup* test suite, the actual tests were 
>each reasonably reliable, but the reliability issue was with their 
>setup/teardown. They run a temporary Apache web server, in order to 
>have a realistic server to test against. I think what's happening is 
>that sometimes, the web server port from one test (let's say test 
>number 5) is still considered by the kernel to be in use by the time 
>we reach the setup stage of the next test (let's say test number 6).
>
>As a result, the Apache for test number 6 can't listen on the port it 
>has been configured to use, and testing fails at that point.

I tried applying the attached patch as a brute-force attempt to solve 
the port-still-in-use problem (#1035983). (FYI this will not apply 
cleanly to upstream code: it requires other changes already in 
debian/patches that add more debug info, which I added the last time I 
spent time trying to figure this out.)
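
The general shape of a wait-and-retry approach, as a minimal sketch: 
this is not the attached patch, which changes the test harness itself, 
and every name here (start_with_retries, the start callable, the 
defaults) is hypothetical.

```python
import time


def start_with_retries(start, attempts=5, delay=1.0, sleep=time.sleep):
    """Call start() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None if
    every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if start():
            return attempt
        if attempt < attempts:
            # Give the kernel time to release the port before retrying.
            sleep(delay)
    return None
```

The sleep parameter is injectable only so that the loop can be 
exercised without real delays.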

Unfortunately it didn't work: the test made multiple attempts to start 
Apache, but they all failed with the same error message shown in the 
Subject, until the overall test timed out. That suggests that my theory 
about the web server port being in TIME_WAIT state might not have been 
correct. I don't know what else to try there.
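
One possible diagnostic, offered as a sketch rather than a known fix: 
probe the port at the moment of failure with and without SO_REUSEADDR. 
As far as I know, on Linux a socket with SO_REUSEADDR set can usually 
bind to a port that only has connections in TIME_WAIT, but not to one 
that another socket is still listening on, so the two results together 
would hint at which case applies. The helper name try_bind is 
hypothetical.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1", reuse_addr=True):
    """Return None if host:port can be bound, else the errno name."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if reuse_addr:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            # e.g. "EADDRINUSE" when the address is already taken
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

If try_bind(port) still reports EADDRINUSE with reuse_addr=True, that 
would be consistent with something other than TIME_WAIT holding the 
port, though it would not say what.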

In 3.6.5-2 I added a patch fixing an upstream issue where one of the 
tests that used Apache was not marked "don't run in parallel", so it 
could end up being run in parallel with other tests - that could have 
resulted in a similar failure mode. We can see whether that helps. I 
think I've still seen the AH00072 error occasionally even after making 
that change, though, so it can't be the whole story.

     smcv
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tests-If-we-can-t-start-Apache-wait-a-bit-and-try-again.patch
Type: text/x-diff
Size: 1442 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-gnome-maintainers/attachments/20250712/6e61aa45/attachment.patch>