[Reproducible-builds] Preliminary review of dpkg-genbuildinfo

Jérémy Bobbio lunar at debian.org
Wed Feb 11 20:00:43 UTC 2015


Hi!

Now back to your comments on the code:

Guillem Jover:
> > diff --git a/debian/usertags b/debian/usertags
> > index 0fc26f2..0419488 100644
> > --- a/debian/usertags
> > +++ b/debian/usertags
> > @@ -65,6 +65,7 @@ dpkg-checkbuilddeps	[DPKG-CHECKBUILDDEPS]
> >  dpkg-deb		[DPKG-DEB]
> >  dpkg-distaddfile	[DPKG-DISTADDFILE]
> >  dpkg-divert		[DPKG-DIVERT]
> > +dpkg-genbuildinfo	[DPKG-GENBUILDINFO]
> 
> The pseudo-tag is only needed for old tools for legacy reasons. I
> should just do a round of «bts retitle» at some point.

Removed.

> >  dpkg-genchanges		[DPKG-GENCHANGES]
> >  dpkg-gencontrol		[DPKG-GENCONTROL]
> >  dpkg-gensymbols		[DPKG-GENCSYMBOLS]
> > diff --git a/man/dpkg-genbuildinfo.1 b/man/dpkg-genbuildinfo.1
> > new file mode 100644
> > index 0000000..626bf66
> > --- /dev/null
> > +++ b/man/dpkg-genbuildinfo.1
> > @@ -0,0 +1,89 @@
> > +.\" dpkg manual page - dpkg-genbuildinfo(1)
> > +.\"
> > +.\" Copyright © 2015 Jérémy Bobbio <lunar at debian.org>
> 
> This probably needs to be completed with the © from the sources it's
> borrowing from.

Added.

> The changes to man/po/po4a.conf are missing.

Added.

> > diff --git a/scripts/Dpkg/Control/FieldsCore.pm b/scripts/Dpkg/Control/FieldsCore.pm
> > index 8a5695b..75de185 100644
> > --- a/scripts/Dpkg/Control/FieldsCore.pm
> > +++ b/scripts/Dpkg/Control/FieldsCore.pm
> > @@ -361,6 +373,11 @@ our %FIELD_ORDER = (
> >          Vcs-Svn Testsuite), &field_list_src_dep(), qw(Package-List),
> >          @checksum_fields, qw(Files)
> >      ],
> > +    CTRL_FILE_BUILDINFO() => [
> > +        qw(Format Build-Architecture Source Binary Architecture Version
> > +        Build-Environment Build-Path),
> > +        @checksum_fields,
> > +    ],
> 
> Probably pack all «Build-» fields together at the end after the checksum
> ones, and leave Build-Environment as the last of those, as it's the one
> taking the most vertical space.

Done.

> >      CTRL_FILE_CHANGES() => [
> >          qw(Format Date Source Binary Binary-Only Built-For-Profiles Architecture
> >          Version Distribution Urgency Maintainer Changed-By Description
> > diff --git a/scripts/dpkg-buildpackage.pl b/scripts/dpkg-buildpackage.pl
> > index a3cca87..018f37d 100755
> > --- a/scripts/dpkg-buildpackage.pl
> > +++ b/scripts/dpkg-buildpackage.pl
> > @@ -158,6 +158,7 @@ my $since;
> >  my $maint;
> >  my $changedby;
> >  my $desc;
> > +my @buildinfo_opts;
> >  my @changes_opts;
> >  my @hook_names = qw(
> >      init preclean source build binary changes postclean check sign done
> 
> The buildinfo hook name is missing in the @hook_names array.

Added.
 
> > @@ -567,6 +570,25 @@ if ($include & BUILD_BINARY) {
> >      withecho(@debian_rules, $buildtarget);
> >      run_hook('binary', 1);
> >      withecho(@rootcommand, @debian_rules, $binarytarget);
> > +
> > +    if (-e "../$pv.dsc") {
> 
> Why is the buildinfo file tied to whether there's or not a source
> package? I'd say it might make sense to not generate a buildinfo file
> only if we are not building binary packages. To check for that use the
> BUILD_ constants.

I've added this test after seeing build failures with
cross-binutils which calls `dpkg-buildpackage -B` without a
source previously generated:
https://sources.debian.net/src/cross-binutils/0.22/debian/rules/#L53

In my mind, the buildinfo files are tying a given source to binary
packages. So if they can't contain the hash of a .dsc, then they should
not be generated.

If we say buildinfo are of a more general interest, dpkg-genbuildinfo
could skip including the .dsc if there's none available.

The whole block you quoted is conditional `$include & BUILD_BINARY`. Is
there something more that needs to be added to only generate a buildinfo
file only when building binary packages?

> > +        run_hook('buildinfo', 1);
> 
> The hook should not be conditional to the code being executed. The
> second argument should represent that.

Something like the following, then?

    run_hook('buildinfo', $include & BUILD_BINARY);

I'm not sure why this has to be different than the 'binary' hook?

> > +        push @buildinfo_opts, "--admindir=$admindir" if $admindir;
> > +
> > +        my $buildinfo_path = "../$pva.buildinfo";
> > +        my $buildinfo = Dpkg::Control->new(type => CTRL_FILE_BUILDINFO);
> > +
> > +        print { *STDERR } " dpkg-genbuildinfo @buildinfo_opts >$buildinfo_path\n";
> > +
> > +        open my $buildinfo_fh, '-|', 'dpkg-genbuildinfo', @buildinfo_opts
> > +            or subprocerr('dpkg-genbuildinfo');
> > +        $buildinfo->parse($buildinfo_fh, _g('parse buildinfo file'));
> > +        $buildinfo->save($buildinfo_path);
> > +        close $buildinfo_fh or subprocerr(_g('dpkg-genbuildinfo'));
> 
> There's no need for all this, just call dpkg-genbuildinfo redirecting
> its output to the destination file. The $changes code where this is
> originating from is accessed later on, so that's why it's being parsed.

Changed.
 
> > diff --git a/scripts/dpkg-genbuildinfo.pl b/scripts/dpkg-genbuildinfo.pl
> > new file mode 100755
> > index 0000000..d7784ee
> > --- /dev/null
> > +++ b/scripts/dpkg-genbuildinfo.pl
> > @@ -0,0 +1,263 @@
> > +#!/usr/bin/perl
> > +#
> > +# dpkg-genbuildinfo
> > +#
> > +# Copyright © 2003-2013 Yann Dirson <dirson at debian.org>
> > +# Copyright © 2014 Niko Tyni <ntyni at debian.org>
> > +# Copyright © 2014-2015 Jérémy Bobbio <lunar at debian.org>
> 
> This also probably needs to be completed with the © from the other
> sources it's borrowing from.

Added.
 
> > +my $quiet = 0;
> 
> There's nothing to be quiet about here.

Removed.
 
> > +my $buildinfo_format = '1.9';
> 
> Why is this format 1.9?

Good question. I think I misread the policy while working on this during
DebConf14.

Now, I could invent a clever explaination: we could consider this format
as an evolution from what dh-buildinfo produced. This would buildinfo
“2.0”, and it's not stable yet, it's “1.9”.

Should we make this 1.0 instead? :)

> > +my $checksums = Dpkg::Checksums->new();
> > +my %archadded;
> > +my @archvalues;
> > +
> > +# There's almost the same function in dpkg-checkbuilddeps,
> > +# they probably should be factored out.
> > +sub parse_status {
> 
> I've a local commit switching the parse_status() in dpkg-checkbuilddeps
> to use Dpkg::Index, unfortunately due to the current implementation
> it incurs a significant speed regression, but I'm reworking the
> Dpkg::Control parser to speed it up, so this code will be able to
> switch to something similar too.

Nice. :)
 
> > +    my $status = shift;
> > +
> > +    my $facts = Dpkg::Deps::KnownFacts->new();
> > +    my %depends;
> > +    local $/ = '';
> > +    open(my $status_fh, '<', $status)
> > +        or syserr(_g('cannot open %s'), $status);
> > +    while (<$status_fh>) {
> > +        next unless /^Status: .*ok installed$/m;
> > +
> > +        my ($package) = /^Package: (.*)$/m;
> > +        my ($version) = /^Version: (.*)$/m;
> > +        my ($arch) = /^Architecture: (.*)$/m;
> > +        my ($multiarch) = /^Multi-Arch: (.*)$/m;
> > +        $facts->add_installed_package($package, $version, $arch,
> > +                                      $multiarch);
> > +
> > +        if (/^Provides: (.*)$/m) {
> > +            my $provides = deps_parse($1, reduce_arch => 1, union => 1);
> > +            next if not defined $provides;
> > +            foreach (grep { $_->isa('Dpkg::Deps::Simple') }
> > +                                 $provides->get_deps())
> > +            {
> > +                $facts->add_provided_package($_->{package},
> > +                                    $_->{relation}, $_->{version},
> > +                                    $package);
> > +            }
> > +        }
> > +
> > +        if (/^(?:Pre-)?Depends: (.*)$/m) {
> > +            foreach (split(/,\s*/, $1)) {
> > +                push @{$depends{$package}}, $_;
> 
> This will merge all dependencies from all Multi-Arch:same instances.
> But in any case why record the dependencies of the interesting packages?

Sorry, I'm not sure I understand what you are asking. We want to record
the version of all packages involved in the build.

> > +# include .dsc
> > +my $spackage = $changelog->{'Source'};
> > +(my $sversion = $changelog->{'Version'}) =~ s/^\d+://;
> 
> This will grab the binary version, not the source version, this will
> break on binNMUs.
> 
> We probably need to record both source and binary versions in the
> buildinfo file. And in case they differ the changelog entry, which
> tends to differ per arch.

I see. I wish handling binNMU could be simplier.

I think I've added enough support for binNMU:

 * Like for *.changes files, the source version is added in
   Source between parenthesis if it differs from binary version.
 * The latest changelog entry is recorded in an optional Changes field.

> > +foreach my $file (sort $dist->get_files()) {
> 
> Why the sort, the function is supposed to preserve the same insertion
> order.

Now I think I remember what was happening. In case of parallel builds,
it might be possible that files get added to debian/files in different
orders. To make the buildinfo file itself reproducible, sorting the file
list here is the easiest solution. Is there another option?

> > +# parse essential list
> > +open (RAWFILE, '/usr/share/build-essential/essential-packages-list')
> > +    or error("cannot read /usr/share/build-essential/essential-packages-list: $!\n");
> > +# skip header
> > +while (my $line = <RAWFILE>) {
> > +    chomp $line;
> > +    last if $line eq '';
> > +}
> > +while (my $line = <RAWFILE>) {
> > +    chomp $line;
> > +    $env_pkgs{$line} = 1;
> > +}
> > +close RAWFILE;
> 
> Hmm, please just record installed packages that have the Essential:yes
> field. I don't want the code to rely on such externally defined files.

Done.
 
> > +append_deps(\%env_pkgs, 'build-essential',
> > +                        $control->{source}->{'Build-Depends-Indep'},
> > +                        $control->{source}->{'Build-Depends'});
> 
> Missing Build-Depends-Arch field.

I missed it. I guess the policy and
<https://wiki.debian.org/Build-Depends-Indep> should be updated.

> But those fields should not be recorded if the build does not concern them.

Done.

> And this uses an undocumented access to the $control object.

Fixed.

> > +        if (defined $architecture &&
> > +            $architecture ne 'all' && $architecture ne $build_arch) {
> 
> This should probably be host_arch.

Sorry if I'm confused. Here's the reasoning:

dpkg-architecture(1) says “build machine: the machine the package is
built on”. We record this in the Build-Architecture field of the
buildinfo. We thus only need to record when the architecture of a
dependency differs from this default.

> I get the impression the code tracking the used installed packages is
> made much more difficult than it needs to be.

I agree. I'm not happy with it. I will wait for your other changes in
parse_status() before reworking it.

> And I don't feel very comfortable assuming the .buildinfo name here
> when it is not being generated in this same script, and when the
> script generating the output is not even defining the name itself.

Right. I've moved the logic to dpkg-buildpackage.

Thanks again for your review!

-- 
Lunar                                .''`. 
lunar at debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.alioth.debian.org/pipermail/reproducible-builds/attachments/20150211/a0a5c7dd/attachment.sig>


More information about the Reproducible-builds mailing list