Build with DEB_BUILD_OPTIONS="nocheck" when testing for reproducibility?

Sat Mar 22 23:41:01 GMT 2025

Hi Santiago, Otto,

On Sat, 22 Mar 2025 at 19:52, Santiago Vila <sanvila at debian.org> wrote:
>
> El 22/3/25 a las 20:34, Otto Kekäläinen escribió:
> > Hi!
> >
> > I noticed https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/mariadb.html
> > is failing on a single test failing in the test suite that runs after
> > the actual build. While the test should of course be made more robust,
> > I am not sure running those tests are relevant for testing
> > reproducibility as the tests don't affect the artifacts, so it would
> > actually make sense to skip those tests when testing for
> > reproducibility.
> >
> > Is there some general technique to run specific packages with
> > DEB_BUILD_OPTIONS="nocheck" in reproducible builds?

Not yet, as far as I'm aware; however, there are a couple of existing
(similar, arguably duplicate) feature requests for reprotest in the
Debian bug tracker:

  - https://bugs.debian.org/786644
  - https://bugs.debian.org/1019742

I like the idea, because I think that it could help expose
insufficient isolation between the build and test phases of software.
That's based on my opinion that ideally, builds should not be able to
alter any of the tests that will subsequently run -- and vice-versa,
that testing should not affect any of the built artifacts nor the
sources.
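
To illustrate the kind of check I have in mind, here is a rough Python
sketch (not something reprotest or the tests.reproducible-builds.org
infra does today, as far as I know).  It hashes every file under a
directory before and after running a command, and reports what changed.
The names snapshot and changed_by are made up for this example.

```python
import hashlib
import os
import subprocess


def snapshot(root):
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are compared here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def changed_by(command, root):
    """Run command, then list paths under root that were added, removed or modified."""
    before = snapshot(root)
    # The exit status is deliberately ignored: a failing test suite is a
    # separate question from whether it touched the files under root.
    subprocess.run(command, check=False)
    after = snapshot(root)
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

Something like changed_by(["make", "check"], "build") (command and path
both hypothetical) would return an empty list if the test phase leaves
the build output alone.  It only compares file contents, so changes to
permissions or timestamps would go unnoticed.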

> There are two schools of thought about that.
>
> The official definition in Policy says this:
>
> "repeatedly building the source package [...] will produce bit-for-bit identical binary packages"
>
> It does not say "building the package with nocheck".

If packages are (or could be) built in complete isolation from their
subsequent testing, then the stated goal might be easier to achieve -
and perhaps without any rephrasing of policy.

However: allowing the software to be built and tested _without_ that
kind of isolation is a good way to discover and note software that
doesn't obey the isolation best-practice by itself (e.g. it allows
detecting build/test bugs that could affect the software when packaged
in less stringent ecosystems).  That seems like a good candidate for
reprotest and/or tests.reproducible-builds.org infrastructure.

(as an aside: I would be glad for any feedback on whether the term
"variance testing" is an effective way to communicate what the
tests.reproducible-builds.org infra provides -- it is slightly
distinct from rebuilding)

> It follows from the official definition that "reproducible" implies "buildable", i.e. you can't have
> a reproducible package if it fails to build in the normal way.
>
> The other school of thought says that a package which fails to build 80% of the time because
> of the tests but produces identical *.deb packages the remaining 20% of the time would be reproducible. In my opinion, that's not the kind of reproducibility we should be aiming for.

If the build from a given source and set of inputs (e.g. build
dependencies) produces bit-for-bit identical results, then that
software is reproducible.  That doesn't necessarily indicate that it
is _reliable_ software -- and a 20% failure rate across multiple runs
of a test suite would seem to indicate that it is in fact unreliable.
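
To make that distinction concrete, here is a small Python sketch under
my reading of the terms (it is not the Policy definition, and the
function name summarize is invented for this example).  Each build
attempt is recorded as the digest of its output, or None if the build
failed; reproducibility and reliability are then reported separately.

```python
def summarize(attempts):
    """Summarize build attempts; each entry is an artifact digest, or None for a failed build."""
    successes = [digest for digest in attempts if digest is not None]
    return {
        # Reproducible (in this sketch): at least two builds succeeded
        # and all of them produced the same digest.
        "reproducible": len(successes) >= 2 and len(set(successes)) == 1,
        # Reliability is a separate measure: the fraction of attempts that built at all.
        "success_rate": len(successes) / len(attempts) if attempts else 0.0,
    }
```

For the scenario quoted above, with identical output on the two
attempts in ten that succeed, this reports the package as reproducible
with a success rate of 0.2; whether that should count as "reproducible"
for Debian's purposes is exactly the point under discussion.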

NB: I think it's important to include the test suite with distributed
sources when possible; as a developer, running and writing tests are
both part of the software development process.  However, as noted
previously, tests can in principle write to the sources and/or build
artifacts unless they are sufficiently isolated or other precautions
are taken to prevent that.

Regards,
James