Bug#1057562: Some severities to reconsider due to flaky tests
Paul Gevers
elbrus at debian.org
Thu Apr 10 08:29:46 BST 2025
Hi,
On 09-04-2025 21:42, Jeremy Bícha wrote:
> On Wed, Apr 9, 2025 at 1:33 PM Paul Gevers <elbrus at debian.org> wrote:
>>
>> My personal standard (but with Release Team hat in mind) is that I file
>> RC bugs about flakiness if a test fails more than about 1 out of 6 times
>> (on a particular architecture if it's architecture specific).
Sorry, I didn't have the full context of the thread when I replied
yesterday. I should have delayed sending the response until I did.
> What about if the failures happen 100% on someone's AWS instances but
> are reliably passing on official Debian infrastructure?
That's not what I mean by flaky.
For some background: in private communications that I've had with
Santiago, he has been advocating that FTBFS bugs that happen reliably
on 1-cpu hosts be declared RC. From what I've seen so far in his
reports, the 1-cpu case occasionally exposes bugs that are otherwise
hidden on the official buildds and on whatever the maintainer uses for
their test builds (own machine, salsa, etc.). Thus, using 1-cpu hosts is
a valuable way to test. On the other hand, 1-cpu hosts are not what most
developers (and, I assume, users) use, and also not what we use on the
buildds. Hence I can also relate to maintainers who think the 1-cpu
case is just odd. As a result, I have refused to back him up in filing
1-cpu FTBFS bugs at RC level, and I suggested filing them at severity
important instead. I've told him, however, that I do expect
maintainers to take reasonable (and hence maintainable) patches, which
ideally should just go upstream of course. So I suggested that he work
on providing patches with the 1-cpu reports that he files, as the 1-cpu
case is important to him. I've told Santiago multiple times that I
appreciate his QA rebuilds (including the 1-cpu ones) a lot.
So, back to this case. The original report (1057562) was filed at
severity serious and didn't mention the 1-cpu case. Jeremy claimed
flakiness and lowered the severity, which was later raised again by a
Release Manager with the request to avoid the flaky test (fix it or
disable it) because it was seen on the buildds. Later on (after message
86) the severity discussion becomes more difficult, because of changes
that probably lowered the chance of the bug on more-than-1-cpu hosts,
the definition of flakiness, and statistics on the buildds.
My (Release Team member) position is the following. As mentioned
earlier, flakiness in my book is a serious problem if the failure rate
is above 1 out of 6 (roughly). It's an important problem if it occurs
less often. On their own, 1-cpu FTBFS bugs are important issues but not
serious ones. In this case, the FTBFS is due to a particular test, and
luckily tests can be disabled during the build. The test fails reliably
in the 1-cpu case, and when Santiago tested it on a 2-cpu system, it
failed in 8% of the cases. According to my limits above, that 8% is not
RC, but because the FTBFS is caused by a single test, I do ask the
maintainers of gcr4 (and gcr) to disable that particular test during the
build until the underlying problem has been fixed. The patch in message
121 is supposed to do exactly that.
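As a rough illustration of the rule of thumb above, here is a minimal
sketch that compares an observed failure rate with the "about 1 out of 6"
limit. The function name, the exact cutoff, and the run counts are
assumptions made for illustration; this is not an official Release Team
tool, and it deliberately ignores the separate 1-cpu consideration.

```python
from fractions import Fraction

# Assumed cutoff, taken from the "about 1 out of 6" rule of thumb.
RC_THRESHOLD = Fraction(1, 6)


def flaky_severity(failures: int, runs: int) -> str:
    """Hypothetical helper: map an observed failure rate to a severity.

    Returns "serious" when the rate is above the threshold and
    "important" otherwise.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    rate = Fraction(failures, runs)
    return "serious" if rate > RC_THRESHOLD else "important"


# 8 failures in 100 builds (8%) is below 1/6 (about 16.7%).
print(flaky_severity(8, 100))  # important
# 1 failure in 4 builds (25%) is above 1/6.
print(flaky_severity(1, 4))    # serious
```

Because the limit is only approximate, results close to 1/6 would still
need human judgement.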
Paul