usrmerge testing in our CI

Sat Jun 17 22:08:20 BST 2023

On 2023-06-17, Holger Levsen wrote:
> first: i've temporarily disabled testing usrmerge variation last
> night as this broke our builds, because the .buildinfo files vary
> (usrmerge installed or not), causing basically all builds to fail.

Ah well!

From what I recall looking at the log posted in irc it might be
sufficient to "apt autoremove usrmerge" after the fact, as usrmerge is
not really reversible... e.g. the /bin -> /usr/bin and other symlinks
should continue to persist.
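
A minimal sketch of how one could check whether those symlinks are
still in place after removing the package. This is an illustration
only, not what the CI does: the is_usrmerged helper and the list of
directories are assumptions, and the real set of merged directories
varies by architecture (e.g. /lib64, /lib32).

```python
import os

# Top-level directories that a merged-/usr layout replaces with symlinks.
# Assumed subset for illustration; real systems may have more (lib64, ...).
MERGED_DIRS = ("bin", "sbin", "lib")


def is_usrmerged(root="/"):
    """Return True if each directory in MERGED_DIRS under root is a
    symlink pointing at the matching directory below usr/."""
    for name in MERGED_DIRS:
        path = os.path.join(root, name)
        if not os.path.islink(path):
            return False
        # Accept both relative ("usr/bin") and absolute ("/usr/bin") targets.
        target = os.path.normpath(os.readlink(path)).lstrip("/")
        if target != "usr/" + name:
            return False
    return True
```

Calling is_usrmerged("/") inside the build chroot before and after the
"apt autoremove usrmerge" step would show whether the layout survived.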

> second: this has been happening since a few weeks, I still don't
> get why this suddenly stopped working as we have varied usrmerge
> since 2020.

I think that several times during the bookworm cycle the techniques had
to be adjusted in order to actually test usrmerge variations... and this
just seems like a new surprise in a long list of surprises...

> third: the code which was in use (since then) was varying usrmerge
> everywhere except buster & bullseye!

I am confident that during the bullseye release cycle it was actually
enabled, and then once bullseye was released it was changed or just not
tested for some reason.

> On Thu, Jun 15, 2023 at 09:55:38AM -0700, Vagrant Cascadian wrote:
>> On 2023-06-15, Chris Lamb wrote:
>> Off the top of my head, I do not know how many, but definitely some of
>> the usrmerge related bugs I have found will successfully build in a
>> usrmerge environment, but in a way that quietly breaks the package
>> (e.g. junk in manpages or other documentation, entirely missing
>> documentation, embedding an entirely sensical path, etc.). It would be
>> hard to systematically find these bugs without testing builds in
>> usrmerge and non-usrmerge environments. And I am not sure any other
>> project makes any more sense to do that sort of testing.
>
> as said originally: we are the reproducible builds project, not the 
> Debian QA project trying to find as many issues as we can.

It is a judgement call just how much of the variations are "just" QA and
how much are important reproducible builds issues...

I mean, sure, the argument can be made that a usrmerge and non-usrmerge
environment are not the same build environment, even if they otherwise
have the same packages installed!

How many times have we had people ask us to "just ignore timestamps"? Of
course, timestamps are impractical to actually ignore, due to checksum
mismatches or the difficulty of maintaining a consistent system clock.

How about /bin/sh -> bash vs. /bin/sh -> dash? The transition is
arguably complete.

Build paths? These in theory are easy to reproduce and build in the same
path, but in practice there are a lot of real-world build path
variations on Debian infrastructure...

I see the purpose of testing exceptional things as actual reproducible
builds issues, and even if unlikely to occur in Debian, they might help
out some other project that makes different choices than Debian.

> I'd rather have us focus on real reproducibility issues, than issues
> with a variation we, Debian, don't care about.

I would also like that we focus on real reproducibility issues, with
variations that we, Reproducible Builds and Debian, care about. :P

>> > I'm less clear about whether we should cease testing bookworm. It
>> > doesn't seem right for the CI to claim that various [bookworm]
>> > packages are "reproducible in bookworm", when the presence of usrmerge
>> > (or lack thereof) in bookworm means that they can still vary.
>
> bookworm wasn't built with usrmerge variation, but rather usrmerge
> was explicitly disabled on the builds.

For clarity, you mean on buildd.debian.org?

> so one can also say that it doesn't seem right to introduce a variation
> which never occurred.

I am near-absolutely-positive that it did occur on many developer
machines, many of whom hopefully signed and uploaded their .buildinfo
files...

Even some builds on buildd.debian.org were done with usrmerge enabled,
as debootstrap at various points defaulted to creating a usrmerge
chroot. And then did not. And then did. And I am not even sure what
the default is anymore. So even if people were using sbuild, pbuilder,
etc. for a clean build environment, they very likely uploaded packages
and corresponding .buildinfo files throughout the bookworm release cycle
that were built in both usrmerge and non-usrmerge environments...

Yes, most of those packages did not actually make it into bookworm,
thanks to an awesome release team policy, but I would not be surprised
if some packages that were built using some of these maintainer-built
packages as build-depends ended up in bookworm... and of course a few
builds that happened on buildds with usrmerge enabled might have also
slipped through?

So, it is significantly more than never, at the very least. Never is a
pretty low bar to disprove. :)

>> I find it a bit weird to change the variations after the release, as
>> then the stats will gradually change as packages get retested, without
>> any actual work done to change them... feels a bit too much like moving
>> the goalposts retroactively.
>
> iirc the amount of affected packages is something like 50, so hardly 0.1%,
> so hardly noticeable.

I suspect it is much higher than that, as in jenkins.git it was broken
for quite a while... (not sure exactly how long)

commit 474a47c31e7217f0afb015d67d70975d2d84d813
Author: Mattia Rizzolo <mattia at debian.org>
Date:   Wed Oct 12 13:16:16 2022 +0200

    reproducible debian: only re-configure usrmerge on bookworm+

Somewhere along the way the technique we used to enable usrmerge testing
was broken (probably due to changes in usrmerge, debootstrap, etc.), and
for at least as long as it took to cycle through and rebuild all
packages, that was effectively not being tested... and then usrmerge
bugs started showing up again once that was fixed.

And apparently it recently broke even harder...

> also we are not testing against what was released anyway, but rather
> doing CI tests with arbitrary variations. bookworm also wasn't built in 2024,
> yet we are testing exactly this right now. ;)

Well, yes... we are not testing against packages in the archive at all,
really, we are stress-testing in an intentionally challenging
environment with the intention to find reproducibility bugs... and
reproducible builds testing also tends to unveil real bugs that might be
hard to find any other way than through applying reproducible builds.

I know we would all love to (also?) be able to have tests that attempt
to rebuild what is in the archive, but as I understand that is a bit
stalled out right now...

So, I guess now (at the start of the trixie development cycle) is as
good a time as ever to decide if we want to shift gears and test fewer
variations going forward? That still would not get us testing against
packages in the archive, but it might actually get closer... while not
really getting what we really want (e.g. real tests against the
archive). I am of mixed mind on that...

live well,
  vagrant