[Pkg-utopia-maintainers] Bug#1005889: dbus: flaky autopkgtest on ppc64el: dbus/integration/transient-services.sh.test

Paul Gevers elbrus at debian.org
Mon Feb 21 19:03:45 GMT 2022


Hi Simon,

On 21-02-2022 12:10, Simon McVittie wrote:
> Is there anything unusual about the ppc64el CI-runners compared with other
> architectures? (For example: lots of CPUs, few CPUs, lots of RAM, less RAM,
> lots of I/O bandwidth, running on tmpfs, using qemu, using lxc, running
> many tests in parallel, ...)

Our ppc64el runners are quite similar in terms of CPU, RAM etc. to most
of our amd64/i386/arm64 workers. The one difference I noticed is that
they seem to run in a virtual environment:
debian@ci-worker-ppc64el-01:~$ lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:04.0 Communication controller: Red Hat, Inc. Virtio console
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)

> From https://ci.debian.net/packages/d/dbus/testing/ppc64el/ it looks like
> this is failing about 25% of the time, does that match your experience?

I was judging purely from this page, so yes.

>> Bail out! /run/user/1000/dbus-1/services is not a directory
> 
> My best guess at the root cause for this is that when
> gnome-desktop-testing-runner schedules lots of unit tests in a
> newly-opened user session, if the integration test for transient
> services happens to be one of the first ones to be run, then the session
> dbus-daemon will not necessarily have been started by systemd socket
> activation just yet. If the test runner has a large number of CPU cores,
> then that makes it more likely that the test will win the race with the
> dbus-daemon, resulting in failure.

I don't experience our ppc64el hosts as extremely fast, but who knows.
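
For illustration only: the race described above (the test checking for the
transient services directory before the session dbus-daemon has created it)
is commonly avoided by polling with a timeout instead of checking once. The
sketch below is not the patch Simon mentions. The helper names, the timeout
and the polling interval are all made up for this example, and it assumes
the directory lives under $XDG_RUNTIME_DIR, falling back to the
/run/user/1000 path seen in the failure message.

```python
import os
import time


def transient_services_dir():
    """Return the expected transient services directory (hypothetical helper).

    Assumes XDG_RUNTIME_DIR is set in a user session; the fallback is the
    path from the failure message in this bug report.
    """
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", "/run/user/1000")
    return os.path.join(runtime_dir, "dbus-1", "services")


def wait_for_directory(path, timeout=10.0, interval=0.1):
    """Poll until `path` is a directory; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isdir(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Give the daemon a moment to start before checking again.
        time.sleep(interval)
```

A test could call wait_for_directory(transient_services_dir()) and bail out
only if it returns False, which turns "lost the race" into "waited a bit
longer" without hiding a genuinely missing directory.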

> I have a possible patch which I'll upload soon. Would you be able to
> schedule several consecutive runs on the affected hardware to make
> sure it's really fixed? 10 runs should be enough for a reasonable level
> of confidence.

Sure, but anybody (with Salsa credentials) can schedule those jobs. Just
hitting the retry button will do. Results should come back quickly too, as
they are scheduled with higher priority.
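
As a rough check on the "10 runs" figure: if the test still failed about 25%
of the time and runs were independent (an assumption, since runs sharing a
worker may well be correlated), the chance of seeing 10 passes in a row
purely by luck would be 0.75^10, roughly 5.6%. The function names below are
made up for this sketch.

```python
import math


def chance_all_pass(failure_rate, runs):
    """Probability that `runs` independent runs all pass despite the bug."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, max_luck):
    """Smallest number of consecutive passes for which the probability of
    passing them all by luck is at most `max_luck`."""
    return math.ceil(math.log(max_luck) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # 25% is the approximate failure rate quoted from the ci.debian.net page.
    print(f"P(10 lucky passes) = {chance_all_pass(0.25, 10):.4f}")
    print(f"runs for <= 5% luck: {runs_needed(0.25, 0.05)}")
    print(f"runs for <= 1% luck: {runs_needed(0.25, 0.01)}")
```

Under these assumptions, 10 clean runs leave a bit more than a 5% chance
that the bug is still there, 11 runs bring that below 5%, and 17 runs bring
it below 1%.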

Paul