[Reproducible-builds] Bug#807111: libperl-apireference-perl: Make the stored data reproducible between builds

Sat Dec 5 18:10:44 UTC 2015

On Sat, Dec 05, 2015 at 05:32:13PM +0100, Axel Beckert wrote:
> Niko Tyni wrote:
> > This module recently switched to using Sereal::Encoder instead of
> > Data::Dumper to store pre-parsed data. The stored data representation
> > now varies between builds.  The attached patch fixes this, rendering
> > the build reproducible again.

> >    my $dump = Sereal::Encoder->new({
> > +    canonical      => 1,

> I wonder if it's wise to patch the module itself in such a permanent
> way instead of maybe adding a switch and setting canonical=1 only
> during the build or the running of the test suite.
> 
> Maybe users of that module won't be happy if canonical=1 is hardcoded
> that way, e.g. for (guessed) performance reasons as the above likely
> includes sorting, which always has a performance impact at some scale.

This code path is in a private function that is only used during the
build (by Perl::APIReference::Generator) to serialize API documentation
structures inline into the module in a __DATA__ section, to avoid parsing
perlapi.pod files at runtime.

I doubt the canonical representation is much slower to decode, and in
any case that phase (API documentation lookups) doesn't seem
performance-critical to me.

A hypothetical performance-critical subclass of
Perl::APIReference::Generator might suffer, but IMO this is very
contrived.

The old Data::Dumper-based implementation used to set
$Data::Dumper::Sortkeys, so the loss of reproducibility is a regression.
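
To illustrate why key sorting matters for reproducibility, here is a
minimal sketch using only core Data::Dumper. The sample hash and the
helper name dump_sorted are hypothetical and not taken from the module;
they only stand in for the parsed perlapi structures.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data standing in for the parsed API documentation.
my %api = (
    sv_setiv => { text => 'Copies an integer into the given SV.' },
    av_push  => { text => 'Pushes an SV onto the end of the array.' },
    newSViv  => { text => 'Creates a new SV and copies an integer into it.' },
);

# Hypothetical helper: dump with hash keys sorted, so the output does not
# depend on Perl's randomized hash iteration order.
sub dump_sorted {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 1;
    return Dumper($data);
}

my $first  = dump_sorted(\%api);
my $second = dump_sorted({ %api });   # fresh hash with the same contents

print $first eq $second ? "identical\n" : "different\n";
```

Without Sortkeys, the two dumps are not guaranteed to match, because
hash iteration order can differ between hashes and between runs.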

I've also forwarded the patch upstream, so the author can protest
if he judges this loss of performance unacceptable.

I hope this addresses your concerns.
-- 
Niko Tyni   ntyni at debian.org