[Pkg-utopia-maintainers] Bug#1082570: libportal: intermittent FTBFS: Segmentation fault in pytest

Sat Feb 8 15:10:27 GMT 2025

Control: clone 1082570 -2
Control: retitle -2 libportal: FTBFS: TestRemoteDesktop::test_create_session: assert create_session_done_invoked

On Fri, 07 Feb 2025 at 19:26:17 +0100, Santiago Vila wrote:
> Now this is failing 100% of the time at least
> on AWS instances of type m7a.large and r7a.large
> (having 2 CPUs)

Please could you share a log for the particular test failure you are
experiencing here? It might be a recurrence of #1082570 (a segmentation
fault in the Python process while running pytest) or it might be
something different.

> and there are also test failures
> in reproducible builds:
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/libportal.html

Thank you for reporting this, but this is not the same test failure
that was already tracked in #1082570; cloned as -2. The failure here is
that in TestRemoteDesktop::test_create_session, the assertion "assert
create_session_done_invoked" is failing. That's clearly undesired, and
is a test failure like #1082570, but the symptoms don't match. Looking
at the reproducible-builds history, it seems to have started somewhat
recently (late January), most likely triggered by a change in some other
package. The timing suggests that the switch to python3.13 is one
possible trigger.

I'll send a failing log from reproducible-builds to the cloned bug for
reference when I get a bug number for the clone back from the BTS.

If the test failure you are experiencing on your AWS instances has
the same assertion-failure traceback as seen on reproducible-builds,
we should use the cloned bug -2 to represent it; or, if it's a segfault
in the python3 interpreter during pytest with symptoms similar to those
described in #1082570, we can use #1082570; or if it's some third thing
with different symptoms, please use a third clone or a new bug report
(a new bug report might be simplest).
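As a rough illustration of that triage, here is a minimal sketch. It
assumes that the two strings taken from the bug titles ("Segmentation
fault" and "assert create_session_done_invoked") appear verbatim in the
build log, which I have not verified for every log; classify_log is a
hypothetical helper, not part of any existing tool.

```python
def classify_log(log_text: str) -> str:
    """Suggest which bug a libportal build log belongs to (heuristic only)."""
    # Assumption: pytest prints the failing assertion text in its traceback.
    if "assert create_session_done_invoked" in log_text:
        return "clone -2"
    # Assumption: a crash of the python3 interpreter is reported with this text.
    if "Segmentation fault" in log_text:
        return "#1082570"
    # Anything else has different symptoms and deserves its own report.
    return "new bug"
```

A log showing both symptoms would be filed under the clone by this
sketch; a human should look at such a log rather than trust the helper.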

If the specific symptom that was previously seen in #1082570 no longer
happens, we should probably assume that #1082570 was fixed by upstream
changes and close it, leaving other bug reports open to represent
different test failures.

> and also in Salsa CI:
> https://salsa.debian.org/debian/libportal/-/jobs/6329361

This was a different build/test failure, distinct from either #1082570
or the clone -2. In #1082570 the python3 interpreter crashed during
testing, in the clone -2 an assertion failed, and in the Salsa-CI failure,
testing did not even start (xvfb-run failed before we got that far).

I believe the root cause for this was a limitation of the build chroot
(a missing or incomplete /proc, which is at least arguably a bug in the
build environment), but it has been avoided in newer versions of xvfb-run
(#921657, #1087418) by a change whose author name you might recognise.

I retried the pipeline on the same source code now that xvfb-run has been
fixed, and the build jobs have run successfully, so I'm considering this
specific issue to be a solved problem and not cloning a bug to represent it.

> If you think that the best course of action is to fix the test
> please take a look at this bug in giza which looks similar
> because the way xvfb-run is used:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1094102

If xvfb-run is starting its child process before the X server is ready
to receive connections, then I think that's a bug in xvfb-run, which in
fact was already reported (#1095028).

If that issue persists, then it would be possible to sprinkle workarounds
involving sleep(1) into every package that wants to run tests with an
X11 display (so that we start up the X display, then wait a bit for it
to become ready, and finally run the actual tests). However, that seems
like it scales poorly, and I think it would be better if someone can
solve the underlying issue centrally, in xvfb-run. If your expectation is
that the "someone" should be me, I'm sorry that I have not yet done so.
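For comparison with a fixed sleep(1), a per-package workaround could
instead poll until the X server accepts connections. This is only a
sketch under stated assumptions: wait_for_unix_socket is a hypothetical
helper, and it assumes the server listens on a filesystem Unix socket
such as /tmp/.X11-unix/X99 (the display number 99 is an assumption and
depends on how xvfb-run was invoked). It is not how xvfb-run itself
works, and it does not replace a central fix.

```python
import socket
import time


def wait_for_unix_socket(path: str, timeout: float = 10.0,
                         interval: float = 0.1) -> bool:
    """Return True once something accepts connections on path.

    Returns False if nothing is listening by the time timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            try:
                sock.connect(path)
                return True
            except OSError:
                # Socket missing or not yet listening: retry below.
                pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test wrapper might call wait_for_unix_socket("/tmp/.X11-unix/X99")
before starting pytest and fail early if it returns False.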

Neither #1082570 nor -2 looks like the failure mode described in
#1095028, though: the libportal tests seem to start slowly enough that,
in practice, they do not trigger #1095028. Do the failures in your AWS
testing show evidence of that failure mode? If they do, adding a
sleep-based workaround would be one way to address it.

    smcv