[Reproducible-builds] Bug#138409: dpkg-dev: please add support for .buildinfo files

Guillem Jover guillem at debian.org
Thu Jan 28 17:29:52 UTC 2016


Hi!

On Tue, 2016-01-05 at 14:32:51 +0100, Jérémy Bobbio wrote:
> Control: retitle -1 dpkg-dev: please add support for .buildinfo files
> Control: tag -1 + patch

> The attached patch will enable dpkg-buildpackage to create .buildinfo
> files as specified on the Debian wiki [1]. They have two main purposes:
> 
>  * recording information about the system environment used during a
>    particular build—versions of the build dependencies installed, system
>    architecture, etc. for easier forensics/debugging;
>  * describe how to recreate (partially or in full) the original
>    environment when trying to reproduce a particular build.
> 
> Since Guillem's preliminary review in February 2015 [2], the
> specification has slightly elvolved to be a bit more relaxed and the
> code have been improved.

Cool, thanks!

> One of the main change is that `.buildinfo` should now be named with an
> arbitrary identifier. By default this defaults to $HOSTNAME-$TIMESTAMP
> but can be set to an arbitrary value by the `--buildinfo-identifier`
> command line flag.

Hmmm, leaking the hostname seems slightly privacy-concerning? If the
information therein is not relevant I'd rather use something like an
UUID (although that would require increasing the pseudo-build-essential
set), or just hashing the hostname-timestamp with something like md5
or sha1 or similar.

> To address privacy concerns, the Build-Path field is now only included
> when either the build path starts by `/build/` or
> `--always-include-path` has been specified on the command line of
> `dpkg-genbuildinfo`.

Thanks!

> .buildinfo files are now accepted (although discarded) by the Debian
> archive [3]. This change should thus not affect Debian developpers in
> their daily work.

Ah perfect, thanks for pushing for this. I'm planning on including
most of the patches that look fine for 1.18.5 or .6 ideally.

I've some pending changes I'll be committing to master or a separate
branch, that I'd like to be tested on the reproducible setup (ideally
against the already generated and pre-existing reproducible binaries),
if that's possible, I'll ask about that when those land, I just need
to finish up fewm more unit tests.

Here's a quick review:

> From 7b6d953f834b1e8800d3f8af4570d57d86e5c592 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=A9my=20Bobbio?= <lunar at debian.org>
> Date: Fri, 6 Nov 2015 12:17:39 +0000
> Subject: [PATCH] Add support for .buildinfo files

> diff --git a/man/dpkg-buildpackage.1 b/man/dpkg-buildpackage.1
> index 13770ba..54cee7b 100644
> --- a/man/dpkg-buildpackage.1
> +++ b/man/dpkg-buildpackage.1
> @@ -317,6 +322,12 @@ The source package version (without the epoch).
>  The upstream version.
>  .RE
>  .TP
> +.BI \-\-buildinfo-identifier= identifier
> +By default, \fBdpkg\-buildpackage\fP put the system hostname and the

“puts”, but perhaps better “uses foo and bar to create the quux
filename”.

> +current time in the name of the \fB.buildinfo\fP file. An arbitrary
> +identifier can be specified as a replacement (since dpkg 1.18.5).

I'd probably describe first what the option actually does, so that the
version when it got intruduced makes sense to what it refers to. And
then how does that diverges from the default.

And please, place new sentences on a new line. (I should probably
update the "coding-style" about that.)

> It must
> +contain only alphanumeric characters and hyphens.
> +.TP
>  .BI \-p sign-command
>  When \fBdpkg\-buildpackage\fP needs to execute GPG to sign a source
>  control (\fB.dsc\fP) file or a \fB.changes\fP file it will run

> diff --git a/man/dpkg-genbuildinfo.1 b/man/dpkg-genbuildinfo.1
> new file mode 100644
> index 0000000..77f2a76
> --- /dev/null
> +++ b/man/dpkg-genbuildinfo.1
> @@ -0,0 +1,98 @@

> +.BI \-\-always\-include\-path
> +By default, the \fBBuild-Path\fR field will only be written if the current
> +directory starts with \fB/build/\fR. Specify this option to always write
> +a \fBBuild-Path\fR field when generating the \fB.buildinfo\fR.

Missing escaped dash in field names.

> diff --git a/scripts/Dpkg/Control/Types.pm b/scripts/Dpkg/Control/Types.pm
> index 09e12d1..ad6e11b 100644
> --- a/scripts/Dpkg/Control/Types.pm
> +++ b/scripts/Dpkg/Control/Types.pm
> @@ -51,16 +52,17 @@ between Dpkg::Control and Dpkg::Control::Fields.
>  
>  use constant {
>      CTRL_UNKNOWN => 0,
> -    CTRL_INFO_SRC => 1,      # First control block in debian/control
> -    CTRL_INFO_PKG => 2,      # Subsequent control blocks in debian/control
> -    CTRL_INDEX_SRC => 4,     # Entry in repository's Packages files
> -    CTRL_INDEX_PKG => 8,     # Entry in repository's Sources files
> -    CTRL_PKG_SRC => 16,      # .dsc file of source package
> -    CTRL_PKG_DEB => 32,      # DEBIAN/control in binary packages
> -    CTRL_FILE_CHANGES => 64, # .changes file
> -    CTRL_FILE_VENDOR => 128, # File in $Dpkg::CONFDIR/origins
> -    CTRL_FILE_STATUS => 256, # $Dpkg::ADMINDIR/status
> -    CTRL_CHANGELOG => 512,   # Output of dpkg-parsechangelog
> +    CTRL_INFO_SRC => 1,         # First control block in debian/control
> +    CTRL_INFO_PKG => 2,         # Subsequent control blocks in debian/control
> +    CTRL_INDEX_SRC => 4,        # Entry in repository's Packages files
> +    CTRL_INDEX_PKG => 8,        # Entry in repository's Sources files
> +    CTRL_PKG_SRC => 16,         # .dsc file of source package
> +    CTRL_PKG_DEB => 32,         # DEBIAN/control in binary packages
> +    CTRL_FILE_BUILDINFO => 64,  # .buildinfo file
> +    CTRL_FILE_CHANGES => 128,   # .changes file
> +    CTRL_FILE_VENDOR => 256,    # File in $Dpkg::CONFDIR/origins
> +    CTRL_FILE_STATUS => 512,    # $Dpkg::ADMINDIR/status
> +    CTRL_CHANGELOG => 1024,     # Output of dpkg-parsechangelog

Just add the new type at the end, and use the next bit. Add the
comment above the line, like the current code looks like in master. :)

>  };
>  
>  =head1 CHANGES
> diff --git a/scripts/dpkg-buildpackage.pl b/scripts/dpkg-buildpackage.pl
> index 17ada97..ef62297 100755
> --- a/scripts/dpkg-buildpackage.pl
> +++ b/scripts/dpkg-buildpackage.pl
> @@ -412,6 +428,15 @@ if (defined $parallel) {
>      $build_opts->export();
>  }
>  
> +if (defined $buildinfo_identifier) {
> +    error(g_('buildinfo identifiers must not be empty and only contain alphanumeric characters and hyphens'))
> +        unless $buildinfo_identifier =~ /\A[A-Za-z0-9-]+\z/;

Can we just simply use the package name rules instead? It also avoids
potential problems with case and similar. (There's a
pkg_name_is_illegal function in Dpkg::Package already.)

> +} else {
> +    my $hostname = hostname;
> +    my $timestamp = strftime('%Y%m%dT%H%M%SZ', gmtime());
> +    $buildinfo_identifier = "$hostname-$timestamp";

See my comment at the top.

> +}
> +
>  set_build_profiles(@build_profiles) if @build_profiles;
>  
>  my $cwd = cwd();
> @@ -569,6 +594,25 @@ if ($include & BUILD_BINARY) {
>      withecho(@debian_rules, $buildtarget);
>      run_hook('binary', 1);
>      withecho(@rootcommand, @debian_rules, $binarytarget);
> +
> +    if (-e "../$pv.dsc") {
> +        run_hook('buildinfo', 1);
> +
> +        push @buildinfo_opts, "--admindir=$admindir" if $admindir;
> +
> +        my $buildinfo = "${pv}_${buildinfo_identifier}.buildinfo";
> +        withecho("dpkg-genbuildinfo @buildinfo_opts >../$buildinfo");
> +
> +        my $control = Dpkg::Control::Info->new('debian/control');
> +        my $sec = $control->get_source->{'Section'} // '-';
> +        my $pri = $control->get_source->{'Priority'} // '-';
> +        warning(_g('missing Section for source files')) if $sec eq '-';
> +        warning(_g('missing Priority for source files')) if $pri eq '-';
> +        withecho('dpkg-distaddfile', $buildinfo, $sec, $pri);
> +
> +    } else {
> +        warning(_g('no .dsc file, skipping .buildinfo generation'));
> +    }
>  }

ISTR mentioning this before, but I don't see why generating the
buildinfo file is tied to existing a source package at all? The source
should be included if we are including a source in the upload, that's
it.

> diff --git a/scripts/dpkg-genbuildinfo.pl b/scripts/dpkg-genbuildinfo.pl
> new file mode 100755
> index 0000000..5c5122b
> --- /dev/null
> +++ b/scripts/dpkg-genbuildinfo.pl
> @@ -0,0 +1,322 @@
> +#!/usr/bin/perl

> +sub append_deps {
> +    my $env_pkgs = shift;
> +    my $deps;
> +
> +    foreach my $dep_str (@_) {
> +        next unless $dep_str;
> +        $deps = deps_parse($dep_str, reduce_restrictions => 1, build_dep => 1);
> +        # add packages as unseen if they were not there before
> +        deps_iterate $deps, sub { ${$env_pkgs}{$_[0]->{package}} //= 0; 1 };

For non-built-ins functions please use parenthesis.

> +    }
> +}


> +while (@ARGV) {
> +    $_=shift(@ARGV);

Spaces around assignment.

> +    if (m/^-B$/) {
> +	set_build_type(BUILD_ARCH_DEP, $_);
> +    } elsif (m/^-A$/) {
> +	set_build_type(BUILD_ARCH_INDEP, $_);
> +    } elsif (m/^-c(.*)$/) {
> +	$controlfile = $1;
> +    } elsif (m/^-l(.*)$/) {
> +	$changelogfile = $1;
> +    } elsif (m/^-f(.*)$/) {
> +	$fileslistfile = $1;
> +    } elsif (m/^-F([0-9a-z]+)$/) {
> +        $changelogformat = $1;
> +    } elsif (m/^-u(.*)$/) {
> +	$uploadfilesdir = $1;
> +    } elsif (m/^--admindir=(.*)$/) {
> +	$admindir = $1;
> +    } elsif (m/^--always-include-path$/) {
> +	$always_include_path = 1;
> +    } elsif (m/^-(?:\?|-help)$/) {
> +	usage();
> +	exit(0);
> +    } elsif (m/^--version$/) {
> +	version();
> +	exit(0);
> +    } else {
> +        usageerr(_g("unknown option \`%s'"), $_);
> +    }
> +}

For new perl code let's not use hard tabs (I'll update the
coding-style if this is not there yet).

(For vim that would be: set expandtab; set sts=4.)

> +$fields->{'Source'} = $spackage;
> +if ($changelog->{'Binary-Only'}) {
> +    $fields->{'Source'} .= ' (' . $sourceversion . ')';
> +    $fields->{'Changes'} = $changelog->{'Changes'} . "\n\n"
> +                         . ' -- ' . $changelog->{'Maintainer'}
> +                         . '  ' . $changelog->{'Date'};
> +}

Hmmm, it bothers me slightly that the Changes field here diverges in
form from the one in the .changes file.

I think I'd prefer to have the Date as its own field, maybe always
included. And also the Maintainer field. Any reason to not include
them all the time or on their own?

> +$fields->{'Binary'} = join(' ', map { $_->{'Package'} } $control->get_packages());
> +# Avoid overly long line by splitting over multiple lines
> +if (length($fields->{'Binary'}) > 980) {
> +    $fields->{'Binary'} =~ s/(.{0,980}) /$1\n/g;
> +}
> +$fields->{'Architecture'} = join ' ', sort @archvalues;
> +$fields->{'Version'} = $binaryversion;
> +
> +my $cwd = cwd();
> +# Only include build path if explicitely required or in the common location
> +if ($always_include_path or $cwd =~ /\A\/build\//) {
> +    $fields->{'Build-Path'} = $cwd;
> +}

Hmm, I think the «/build» policy should be Debian-specific thing. Let's
move that into a vendor hook. For example it could be named perhaps
'builtin-system-build-path' or something along those lines that
returns the path if defined or undef otherwise.

> +$checksums->export_to_control($fields);
> +
> +my @status = parse_status("$admindir/status");
> +my $facts = shift @status;
> +my %depends=%{shift @status};
> +my @essential_pkgs=@{shift @status};

Spaces surrounding assignment operator. And I'd probably unpack them
in a single assignment, like:

  my ($facts, $depends, $essential_pkgs) = parse...;

and then use references.

> +my $deps;
> +my %env_pkgs;
> +
> +# parse essential list
> +
> +append_deps(\%env_pkgs, @essential_pkgs, 'build-essential',
> +                        $control->get_source->{'Build-Depends'});

Please do not hardcode build-essential here (as that's a Debianism),
use the vendor hook to get any custom builtin build dependencies.

> +if ($include & BUILD_ARCH_DEP) {
> +    append_deps(\%env_pkgs, $control->get_source->{'Build-Depends-Arch'});
> +}
> +
> +if ($include & BUILD_ARCH_INDEP) {
> +    append_deps(\%env_pkgs, $control->get_source->{'Build-Depends-Indep'});
> +}
> +
> +while (my ($pkg, $seen) = each(%env_pkgs)) {
> +    next if $seen;
> +    $env_pkgs{$pkg} = 1; # mark as seen
> +    next unless defined $facts->{pkg}->{$pkg}->[0];
> +    append_deps(\%env_pkgs, @{$depends{$pkg}});
> +    keys %env_pkgs; # reset iterator
> +}

Resetting every time we append new deps is a bit suboptimal. Instead
I'd use both a hash to track if the packages have been seen, and an
array to add packages to it.

> +my $environment = Dpkg::Deps::AND->new();
> +foreach my $pkg (sort keys %env_pkgs) {
> +    foreach my $installed_pkg (@{$facts->{pkg}->{$pkg}}) {
> +        my $version = $installed_pkg->{version};
> +        my $architecture = $installed_pkg->{architecture};
> +        my $pkg_name = $pkg;

> +        if (defined $architecture &&
> +            $architecture ne 'all' && $architecture ne $build_arch) {
> +            $pkg_name = "$pkg_name:$architecture";
> +        }

To me the $pkg_name handling here seems a bit confusing, I'd probably
just move the first assignment to an else block, and use $pkg in the
arch-qualified case which seems clearer.

Also this will include all Multi-Arch instances for a given package
regardless of them being used or not, I don't think that's desirable.

> +        my $dep = Dpkg::Deps::Simple->new("$pkg_name (= $version)");
> +        $environment->add($dep);
> +    }
> +}
> +$environment = "\n" . $environment->output();
> +$environment =~ s/, /,\n/g;
> +$fields->{'Build-Environment'} = $environment;
> +
> +$fields->output(\*STDOUT);

Thanks,
Guillem



More information about the Reproducible-builds mailing list