Bug#1057562: Some severities to reconsider due to flaky tests

Paul Gevers elbrus at debian.org
Thu Apr 10 08:29:46 BST 2025


Hi,

On 09-04-2025 21:42, Jeremy Bícha wrote:
> On Wed, Apr 9, 2025 at 1:33 PM Paul Gevers <elbrus at debian.org> wrote:
>>
>> My personal standard (but with Release Team hat in mind) is that I file
>> RC bugs about flakiness if a test fails more than about 1 out of 6 times
>> (on a particular architecture if it's architecture specific).


Sorry, I didn't have the full context of the thread when I replied 
yesterday. I should have delayed sending the response until I did.

> What about if the failures happen 100% on someone's AWS instances but
> are reliably passing on official Debian infrastructure?


That's not what I mean by flaky.

For some background: in private communications I've had with Santiago,
he has been advocating for declaring FTBFS that happen reliably on
1-cpu hosts as RC. From what I've seen so far in his reports, the 1-cpu
case occasionally exposes bugs that otherwise stay hidden on the
official buildds and on whatever the maintainer uses for their test
builds (own machine, salsa, etc.). Thus, using 1-cpu hosts is a
valuable way to test. On the other hand, 1-cpu hosts are not what most
developers (and, I assume, users) use, and also not what we use on the
buildds. Hence I can also relate to maintainers who think the 1-cpu
case is just odd. As a result, I have refused to back him up in filing
the 1-cpu FTBFS type of bugs at RC level, and I suggested filing these
bugs at severity important. I have told him, however, that I do expect
maintainers to take reasonable (and hence maintainable) patches, which
ideally should just go upstream, of course. So I suggested that he work
on providing patches with the 1-cpu reports he files, as the 1-cpu case
is important to him. I've told Santiago multiple times that I
appreciate his QA rebuilds (including the 1-cpu ones) a lot.

So, back to this case. The original report (1057562) was filed at
severity serious and didn't mention the 1-cpu case. Jeremy claimed
flakiness and lowered the severity, which was later raised again by a
Release Manager with the request to avoid the flaky test (fix it or
disable it) because the failure was seen on the buildds. Later on
(after message 86) the severity discussion became more difficult,
because of changes that probably lowered the chance of the bug on hosts
with more than 1 cpu, and because of the definition of flakiness and
the statistics on the buildds. My position (as a Release Team member)
is the following. As mentioned earlier, flakiness in my book is a
serious problem if the failure rate is above (roughly) 1 out of 6. It's
an important problem if failures are less frequent. On their own, 1-cpu
FTBFS are important issues, but not serious ones. In this case, the
FTBFS are due to a particular test, and luckily tests can be disabled
during the build. The test fails reliably in the 1-cpu case, and when
tested by Santiago on a 2-cpu system it failed in 8% of the runs.
According to my limits above, that 8% is not RC, but because the FTBFS
is caused by a single test, I do ask the maintainers of gcr4 (and gcr)
to disable that particular test during the build until the underlying
problem has been fixed. The patch in message 121 is supposed to do
exactly that.
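To make the arithmetic of the rule of thumb concrete, here is a minimal
sketch in Python. Only the "more than about 1 out of 6" cutoff comes
from the message above; the helper name flaky_test_severity and the idea
of reducing the decision to a single observed failures/runs ratio are
hypothetical, for illustration only, and this is not an official Debian
or Release Team tool. The 8% figure is expressed as 8 failures in 100
runs purely as an assumed example; the actual number of runs wasn't
stated.

```python
from fractions import Fraction

# Rough cutoff from the message: a test failing more than about
# 1 out of 6 runs (per architecture, if architecture specific) is RC.
# The real rule is "roughly"; a hard cutoff is an assumption here.
RC_THRESHOLD = Fraction(1, 6)


def flaky_test_severity(failures: int, runs: int) -> str:
    """Hypothetical helper: map an observed failure rate of a flaky
    test to a bug severity, following the rule of thumb above."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    rate = Fraction(failures, runs)
    if rate > RC_THRESHOLD:
        # Above roughly 1 in 6: serious, i.e. release critical.
        return "serious"
    if rate > 0:
        # Any lower, non-zero failure rate: important.
        return "important"
    return "no flakiness observed"


# Assumed example: 8% on a 2-cpu host as 8 failures in 100 runs.
print(flaky_test_severity(8, 100))  # important
```

Note that this only models flakiness. As explained above, a test that
fails reliably on 1-cpu hosts is treated as a separate case and filed at
severity important, so it should not be fed through this helper.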

Paul


