[Reproducible-builds] proposal: store information in one place instead of multiple ones

Jérémy Bobbio lunar at debian.org
Fri Jul 31 05:14:54 UTC 2015


Johannes Schauer:
> here are several questions I have which, for me boil down to information being
> duplicated and stored in different locations, leading to possible confusion for
> contributors and added work when adding new bugs and issues:

Before I go further with answering: it seems you assume there are
well-thought-out reasons for the current state of things. For most of
your questions, that is not the case. Things grew organically from
experiments and from different people making things better when they
saw they could.

> 1. Why is the set of bts usertags different from the set of r-b issues? The bts
>    usertags seem to be way more broad.

That was their point initially. I wanted to be able to make statistics
on which classes of issues were most prevalent.

>    A solution would be to ditch the current usertags and use the issue names
>    instead. This would allow a one-to-one mapping between issue and bug number.

This would make creating a new issue much harder. Usertags are not a
nice part of the BTS to interact with. We have been adding a couple of
issues every week for a good while. See the weekly reports.

> 2. Why does packages.yml store the bug number(s) for each package? This
>    information can easily be retrieved from the bts and then will also not be
>    outdated. packages.yml easily lags behind the actual bts information if not
>    regularly updated by someone.

packages.yml was meant to be self-contained at first. Some bugs
affecting reproducibility might not be reproducibility issues per se.
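
To make the lag mentioned above concrete, here is a minimal sketch of
how bug numbers recorded in the notes could be checked against the BTS.
It assumes the notes have already been loaded into a plain dict and that
the set of open bugs per package has been fetched separately; the record
layout (a `bugs` list per package), the function name and the sample
data are all hypothetical, not the actual packages.yml schema or an
existing tool.

```python
def stale_bug_entries(notes, bts_open_bugs):
    """Return {package: [bug numbers in the notes that the BTS no longer lists]}.

    notes:         hypothetical layout, {package: {"bugs": [int, ...], ...}}
    bts_open_bugs: hypothetical layout, {package: set of open bug numbers}
    """
    stale = {}
    for package, record in notes.items():
        known_open = bts_open_bugs.get(package, set())
        missing = [bug for bug in record.get("bugs", []) if bug not in known_open]
        if missing:
            stale[package] = missing
    return stale


# Made-up sample data, for illustration only.
notes = {
    "foo": {"bugs": [100001, 100002]},
    "bar": {"bugs": [100003]},
    "baz": {"issues": ["some_issue"]},  # no bug recorded at all
}
bts_open_bugs = {"foo": {100001}, "bar": {100003}}

print(stale_bug_entries(notes, bts_open_bugs))
```

Such a check only finds entries that went stale; it says nothing about
bugs that affect reproducibility without carrying our usertags, which is
the case described above.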

> 3. Why are the issues explained in issues.yml *and* in the wiki? There should
>    be one canonical place to describe them because currently, any new issue
>    that is identified requires to edit multiple resources and then link between
>    the two. This not only requires more work when creating the issue but when
>    looking up issues it is also unclear which resource is the authoritative one
>    and which one will give the desired information. Instead, the information
>    should be stored in one place only.

Here I can see a real reason: they have different audiences. issues.yml
is mainly for people involved in the whole effort, whereas the wiki page
should be accessible to maintainers of a single package. Some issues are
systemic, and individual maintainers should not really have to care
about these.

The wiki has a richer syntax and makes for nicer pages.

> So my proposal is:
> 
> 1. Instead of using the current usertags "toolchain", "infrastructure",
>    "timestamps" and so on, use issue names instead.
> 
>    Since each bug can have multiple usertags, the old tags could even be kept
>    and the issue names be added in addition.
> 
>    Since packages.yml exists, much of this conversion could probably be even
>    automated (except for packages with more than one bug open for them).
> 
>    Sometimes, reproducibility problems only affect a single package and in that
>    case it would create too much overhead to create a new issue for it. But in
>    that case, why not just create a dummy issue just for the purpose to
>    associate this kind of bugs to the reproducible builds team?

I tend to feel this would be much less flexible than how we currently do
things. We don't have an issue for every single type of patch.
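
For reference, the automated conversion suggested in the quoted proposal
could look roughly like the sketch below: packages with exactly one bug
get that bug tagged with the package's issue names, and packages with
more than one bug are set aside for manual review. The record layout
(`issues` and `bugs` lists per package) and the function name are
assumptions for illustration; nothing here talks to the BTS.

```python
def plan_usertags(notes):
    """Plan issue-name usertags from notes, without touching the BTS.

    notes: hypothetical layout, {package: {"issues": [str], "bugs": [int]}}
    Returns (plan, ambiguous) where plan maps bug number -> issue names
    and ambiguous lists packages with more than one bug.
    """
    plan = {}
    ambiguous = []
    for package, record in sorted(notes.items()):
        bugs = record.get("bugs", [])
        issues = record.get("issues", [])
        if len(bugs) == 1 and issues:
            # Exactly one bug: the mapping from issues to bug is unambiguous.
            plan[bugs[0]] = list(issues)
        elif len(bugs) > 1:
            # Several bugs: we cannot tell which issue belongs to which bug.
            ambiguous.append(package)
    return plan, ambiguous


# Made-up sample data, for illustration only.
sample_notes = {
    "foo": {"issues": ["issue_a", "issue_b"], "bugs": [100001]},
    "bar": {"issues": ["issue_a"], "bugs": [100002, 100003]},
    "baz": {"issues": ["issue_c"]},  # no bug filed yet
}

plan, ambiguous = plan_usertags(sample_notes)
print(plan)
print(ambiguous)
```

Packages without any bug, like "baz" in the sample, end up in neither
list, which is where the current notes still carry information the BTS
does not have.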

> 2. Do not add bug numbers to packages.yml. The bts already stores the
>    information which source package has which bugs by the reproducible builds
>    team.

That means we have to tag every bug that affects the build in our
environment. I don't like the idea that much, but since Faux started
adding `ftbfs`, I guess this opened the gates.

> 3. Use the wiki only to describe issues and ditch issues.yml. The advantages
>    are that the Debian wiki offers a much richer syntax and is also editable by
>    everybody in Debian and not only the reproducible builds team.

Creating a page on the wiki is much more work than adding a couple of
lines to issues.yml. Categorizing issues is not a super-fun task, and
the less friction there is, the better. I've caught myself being lazy:
even when I saw a pattern, I did not create an issue straight away
because I wanted to avoid interacting with the wiki.

> 4. After this is done, it is hard to say why the notes.git is useful in the
>    first place. The content of issues.yml is described in the Debian wiki and
>    the bug numbers are stored in the bts. One last task of packages.yml would
>    probably be to store some tiny notes for packages for which there doesn't
>    exist a bug. But I'd say to also move these notes into the bts. I think that
>    filing a bug about a package's unreproducibility should be done even without
>    having a fix for it. In fact many packages with such bugs exist simply for
>    the reason that at the time the bug was filed, jenkins did fewer checks than
>    it does now, so the patch which is currently in the bts does not make the
>    package fully reproducible anymore. Furthermore, storing these notes in the
>    bts might make the package maintainer aware of the issue and gives them a
>    chance to comment on these notes. I would say it gives maintainers more
>    incentive to react on the issue themselves that way.

I see your points, and at some point in the future this is probably the
way to go. But it is too soon for me. We still don't have a single
properly reproducible package in the main archive.

> On IRC the following problems were raised:
> […]
>  - "there is more information in packages.yml than in the bts"
> 
>     * true, and i think that's a bug. By having this information in
>       packages.yml and through that on reproducible.d.n, you are not informing
>       the maintainer of the package about the little info you just found
>       analyzing their package for reproducibility issues. Instead, I think you
>       should file a bug and write the small information you gathered there. If
>       you find out more later, reply to that bug. This way you actively engage
>       with the maintainers who might then even feel more compelled to help you
>       or are even made aware of the problem in the first place.

It's not a bug for me. We are *still* experimenting on a lot of fronts.
I strongly think we should not bother maintainers while we are still
hesitating between different solutions.

>  - "it is harder to keep track of packages affected by toolchain issues with
>    the bts"
>
>     * yes, but the bug against the toolchain package is also way more
>       impressive and way more likely to get attention if it is blocking 100
>       other bugs. Those blocking bugs also don't need to be empty placeholder
>       bugs. Many packages have more than one issue, and a generic "please make
>       this package reproducible" bug could then be blocked by a bug in one or
>       more toolchain packages.

I agree, but it is still too soon. There is still no easy way for
maintainers to test reproducibility by themselves.

-- 
Lunar                                .''`. 
lunar at debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   