Bug#876055: Environment variable handling for reproducible builds
Ximin Luo
infinity0 at debian.org
Tue Sep 19 10:15:00 UTC 2017
Russ Allbery:
> [..] It does mean that discovery of any new
> such environment variable would require a change to our whitelist in
> approach (3), so there would be some lag and the whitelist would become
> long over time (with a corresponding testing load). But (3) does try to
> achieve that use case without trying to anticipate any possible
> environment variable setting. It lets us be reactive to newly-discovered
> environment variables across which we want to stay reproducible.
>
I can also see the merits of your suggestion (3), but I don't think it would be appropriate to hard-code the list in Policy. It would be too hard to change, and people might end up relying on a very incomplete list and then doing things that run counter to the original intention of the discussions around the policy. It would be better to find a generic wording (with some examples), similar to what I suggested elsewhere.
>> Does everything in policy need to be rigorously testable? or is it ok
>> to have Policy state the desired outcome even if we don't know how (or
>> don't have the resources) to test it fully today.
>
> I don't think everything has to be rigorously testable, but I do think
> it's a useful canary. If I can't test something, I start wondering
> whether that means I have problems with my underlying assumptions.
>
> [..]
The "strict" interpretation is in principle testable though - we just have to collect enough environment variables and decide which category they fall under, and add that logic to our build tools.
I think in these early days, it would be fine for public package builders and reproducibility testers to do (3) as you suggested, i.e.
- clean the environment
- set certain variables to a fixed value (the "whitelist") and record these in buildinfo
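
As a rough sketch of those two steps (not how any existing builder does it): the variable names and values in FIXED_ENV, and the helper names, are assumptions chosen for illustration, and the output is only loosely modelled on the Environment field of a .buildinfo file.

```python
import subprocess

# Hypothetical "whitelist": the only variables the build gets to see,
# each pinned to a fixed value. The real list is exactly what is under
# discussion in this thread; these entries are placeholders.
FIXED_ENV = {
    "PATH": "/usr/bin:/bin",
    "LC_ALL": "C.UTF-8",
    "TZ": "UTC",
}


def clean_build_env(fixed=FIXED_ENV):
    """Return a fresh environment holding only the pinned variables.

    Nothing is inherited from os.environ, which is the "clean the
    environment" step.
    """
    return dict(fixed)


def buildinfo_environment_field(env):
    """Render env as a multi-line field, one NAME="value" per line."""
    lines = ["Environment:"]
    for name in sorted(env):
        lines.append(f' {name}="{env[name]}"')
    return "\n".join(lines)


def run_build(command, env):
    """Run the build command (a list of arguments) with exactly env."""
    return subprocess.run(command, env=env, check=True)
```

Sorting the names keeps the recorded field stable regardless of the order in which the variables were defined.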
This "loose" interpretation of reproducibility still gives us some useful results, as well as testable reproducibility for end users. But as I said, I don't think this should be Policy, since the whitelist is likely to expand quite quickly, especially early on.
OTOH, developer reproducibility checkers (such as reprotest) can be a little bit more strict. I can imagine something like:
- reprotest runs 3 builds:
  - build 0 with current env
  - build 1 with current env + varying some "blacklist" envvars
  - build 2 with current env + varying some "non-whitelist" envvars
If there are differences between build 1 and build 2, then reprotest reports "unexpected envvar $XXX affected the build" and the developer can then submit it for inclusion on either the "whitelist" or the "blacklist", based on the Policy wording. If it ends up on the blacklist, then they would also have to fix their own package to be invariant under that envvar.
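
The three-build comparison could be sketched roughly as follows. This is not reprotest's actual code or interface: the `build` callable (taking an environment and returning some digest of the build output), the `vary` helper, and the returned messages are all assumptions made for illustration.

```python
def vary(env, names, suffix="-varied"):
    """Return a copy of env with each named variable changed."""
    varied = dict(env)
    for name in names:
        varied[name] = varied.get(name, "") + suffix
    return varied


def check_envvars(build, env, blacklist, whitelist):
    """Run the three builds and compare their results.

    build     -- hypothetical callable: env dict -> digest of the output
    blacklist -- envvars the package must be invariant under
    whitelist -- envvars that are allowed to affect the build
    """
    non_whitelist = sorted(n for n in env if n not in whitelist)

    result0 = build(env)                        # build 0: current env
    result1 = build(vary(env, blacklist))       # build 1: vary blacklist
    result2 = build(vary(env, non_whitelist))   # build 2: vary non-whitelist

    if result1 != result0:
        return "blacklisted envvar affected the build"
    if result2 != result1:
        return "unexpected envvar affected the build"
    return "ok"
```

This sketch only says that some unclassified variable matters; naming the specific $XXX would need extra builds that vary the non-whitelist variables one at a time (or by bisection).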
So over time, we can build up a blacklist and a whitelist this way. But they shouldn't be in the original policy. And I don't think what I suggested above is a particularly disruptive or surprising process, especially since the "public" builders would only apply the "looser" interpretation, so people aren't bothered by bogus "unreproducible" reports.
X
--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git