Bug#876055: Environment variable handling for reproducible builds
Vagrant Cascadian
vagrant at debian.org
Tue Sep 19 03:37:55 UTC 2017
On 2017-09-18, Vagrant Cascadian wrote:
> On 2017-09-18, Russ Allbery wrote:
>> Daniel Kahn Gillmor <dkg at fifthhorseman.net> writes:
>>> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
>>> Does everything in policy need to be rigorously testable? or is it ok
>>> to have Policy state the desired outcome even if we don't know how (or
>>> don't have the resources) to test it fully today.
>>
>> I don't think everything has to be rigorously testable, but I do think
>> it's a useful canary. If I can't test something, I start wondering
>> whether that means I have problems with my underlying assumptions.
>>
>> In particular, for (1), we have no comprehensive list of environment
>> variables that affect the behavior of tools, and that list would be
>> difficult to create. Many pieces of software add their own environment
>> variables with little coordination, and many of those variables could
>> possibly affect tool output.
>
> There is a huge difference between variables that *might* affect the
> build as an unintended input that gets stored in the resulting package in
> some manner, and variables that are designed to change the behavior of
> parts of the build toolchain.
>
> I consider unintended variables that affect the build output a bug, and
> variables designed and intended to change the behavior of the toolchain
> expected, reasonable behavior.

Ok, after discussing this on IRC, I figured it might be worth expanding
on that point a bit...

The environment variables (and other variations) used by the
reproducible builds test infrastructure are listed at:

  https://tests.reproducible-builds.org/debian/index_variations.html

I'll try to summarize the rationale for each of the variables used,
many of which have had actual impacts on the results of builds:

  CAPTURE_ENVIRONMENT, BUILDUSERID, BUILDUSERNAME

    Some builds capture the entire environment, or most of it;
    setting arbitrary environment variables can help detect this.

  TZ

    The timezone used can change the results of embedded timestamps.

  LANG, LANGUAGE, LC_ALL

    The locale and language settings definitely change the strings
    embedded in some binaries, if tool output is translated.

  PATH, USER, HOME

    Some builds embed these.

  DEB_BUILD_OPTIONS=parallel=N

    The level of parallelism can change the build output, although other
    values in DEB_BUILD_OPTIONS might reasonably be expected to change
    the output (e.g. noautodbgsym).

None of the above variables should change the resulting built package,
with the possible exception of some other values of DEB_BUILD_OPTIONS.
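
As a rough illustration of how such variations can expose an unintended
input, the following Python sketch runs a stand-in "build" command twice
under two differing environments and compares a hash of its output. The
helper names (build_digest, is_reproducible), the variation values, and
the stand-in commands are assumptions made for illustration; this is not
how the test infrastructure itself is implemented.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical variation sets, loosely modelled on the variables listed above.
ENV_A = {"TZ": "UTC", "LANG": "C", "CAPTURE_ENVIRONMENT": "first build"}
ENV_B = {"TZ": "Pacific/Kiritimati", "LANG": "fr_CH.UTF-8",
         "CAPTURE_ENVIRONMENT": "second build"}


def build_digest(command, variations):
    """Run `command` with `variations` applied on top of the current
    environment and return the SHA-256 of whatever it wrote to stdout."""
    env = dict(os.environ)
    env.update(variations)
    result = subprocess.run(command, env=env, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(command):
    """True if the output is identical under both variation sets."""
    return build_digest(command, ENV_A) == build_digest(command, ENV_B)


# Stand-in "builds": one ignores the environment, the other embeds a variable.
CLEAN_BUILD = [sys.executable, "-c", "print('artifact')"]
LEAKY_BUILD = [sys.executable, "-c",
               "import os; print(os.environ['CAPTURE_ENVIRONMENT'])"]

if __name__ == "__main__":
    print("clean build reproducible:", is_reproducible(CLEAN_BUILD))
    print("leaky build reproducible:", is_reproducible(LEAKY_BUILD))
```

The second stand-in command embeds CAPTURE_ENVIRONMENT in its output, so
its two digests differ, which is the kind of leak that setting arbitrary
variables is meant to catch.
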

On the other hand, I would expect variables such as CC, MAKE,
CROSS_COMPILE, CFLAGS, etc. to reasonably and likely change the result
of the built package. They are, in a sense, part of the build toolchain
environment.

Without generating comprehensive blacklists and/or whitelists, is it
plausible to come up with a policy description of the above two classes
of variables? Given the above lists, it seems relatively obvious to me
that there are basically two classes of variables, but I'm at a loss for
how to really describe it in policy.

You could give a reasonable test of:

  Is this variable intended to change the results of the binary, or is
  it changing the build as an unintended side-effect?

That does require reasoned interpretation, though. I envision such tests
being used in bug reports relating to reproducibility issues, on a
case-by-case basis.
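
To make the two classes concrete, here is a small Python sketch that
sorts a variable name into one of them, falling back to "needs
case-by-case judgement" for anything not named in this message. The set
names, the classify function, and the wording of the labels are
illustrative assumptions, not proposed policy text; the sets contain
only the variables mentioned above and are deliberately not exhaustive.

```python
# Variables named earlier in this message; the grouping is illustrative only.
UNINTENDED_INPUTS = {
    "CAPTURE_ENVIRONMENT", "BUILDUSERID", "BUILDUSERNAME",
    "TZ", "LANG", "LANGUAGE", "LC_ALL",
    "PATH", "USER", "HOME",
}
TOOLCHAIN_INPUTS = {"CC", "MAKE", "CROSS_COMPILE", "CFLAGS"}


def classify(variable):
    """Return a rough label for how a change in `variable` should be treated."""
    if variable in UNINTENDED_INPUTS:
        # Should never influence the built package.
        return "bug if output changes"
    if variable in TOOLCHAIN_INPUTS:
        # Designed to alter what the toolchain produces.
        return "expected to change output"
    # Everything else, including DEB_BUILD_OPTIONS (whose effect depends
    # on the value set), needs a human to apply the test above.
    return "needs case-by-case judgement"


if __name__ == "__main__":
    for name in ("TZ", "CFLAGS", "DEB_BUILD_OPTIONS", "SOME_TOOL_DEBUG"):
        print(name, "->", classify(name))
```

The fallback branch is the point of the sketch: without comprehensive
lists, most variables end up there and still require reasoned
interpretation.
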
It doesn't solve the testability issue on a policy level, but that could
possibly be addressed outside of policy through best practices for
reproducibility documentation.
live well,
vagrant