Bug#848859: FTBFS randomly (failing tests)

Thu Jan 5 10:06:50 UTC 2017

Hi all,

On 04.01.2017 20:57, Santiago Vila wrote:
> I still want to build all packages and have 0 or 1 failures,
> so in this case the probability should be 1/50/2, i.e. 1%.
> 
> I think this is still feasible.

My experience is that the buildds currently in use cause more build
problems than the packages themselves. BTW, why don't you count this
as RC?

[statistical]
>> The "fix" for such cases is the increasing of the threshold or disabling
>> the test completely. Because you can do nothing with it due to the
>> nature of numerical simulations.

Just disabling it would be bad, since the test can still point to some
problem in the implementation, such as optimization problems.

IMO the correct solution here would be to remove the randomness and to
seed with a fixed value (which is known to give a result within the
expectations).
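
A minimal sketch of what I mean, in Python. The function estimate_pi,
the seed 12345 and the tolerance are made up for illustration and are
not taken from any affected package:

```python
import random


def estimate_pi(samples, rng):
    """Monte Carlo estimate of pi: fraction of random points in the
    unit square that fall inside the quarter circle, times 4."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def test_estimate_pi():
    # Fixed seed (hypothetical value): the generator yields the same
    # sequence on every run, so the test either always passes or
    # always fails.
    rng = random.Random(12345)
    assert abs(estimate_pi(100_000, rng) - 3.14159) < 0.05


test_estimate_pi()
```

The test no longer fails at random, but it still breaks if a change in
the implementation moves the result outside the tolerance.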

> But as far as there are people in this project who consider that a
> package which FTBFS on single-CPU machines more than 50% of the time
> is ok "because it does not usually fail in buildd.debian.org",
> we are doomed.

We have 10 archs, and with a 50% chance of failure per build you will
hardly ever get a version built on all of them, even when the buildds
retry two or three times.
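
As a rough sketch of this arithmetic, assuming that every build attempt
fails independently with the same probability (a simplification, and
the function name is made up for illustration):

```python
def prob_all_archs_build(p_fail, archs, attempts):
    """Probability that every arch succeeds within `attempts` tries,
    assuming independent attempts with failure probability p_fail."""
    p_arch_ok = 1.0 - p_fail ** attempts  # at least one try succeeds
    return p_arch_ok ** archs


for attempts in (1, 2, 3):
    p = prob_all_archs_build(0.5, 10, attempts)
    print(f"{attempts} attempt(s): {p:.1%}")
```

Under these assumptions the chance of getting all 10 archs built is
about 0.1% with one attempt, about 5.6% with two and about 26% with
three.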

> The problem I see with this threshold thing is that every maintainer
> seems to have his own threshold, different from the others.
> 
> In case we decide about RC-ness depending on probability of failure:
> What threshold do you think we should use for a single package and why?
> 
> [ I say that 1% of failure is the maximum we should allow, and I've
>   explained why, but I would love to hear your opinion on this ].

I would not put any direct number here, but be pragmatic: We need to
build the package from source, and this has to be done on all supported
architectures, and on the buildds. As long as this is done, I see no
reason for an RC bug. If a package fails on a supported arch on the
buildds, this is RC, independent of whether this came from an
architectural difference or from a random build failure. Your tests with
repeated builds help the maintainer to find the cause in that case, and
for this they are helpful. But they are not release critical by
themselves.

Cheers

Ole