[Debian-med-packaging] metastudent_1.0.7-1_amd64.changes REJECTED

Laszlo Kajan lkajan at debian.org
Sat Apr 20 16:01:05 UTC 2013


Dear Team, FTP Masters, Luca!

How do we handle packages that depend on large data for operation? See below.

On 19/04/13 17:00, Luca Falavigna wrote:
> 
> Hi,
> 
> according to README.Debian, this package requires the download of an
> external resource to work properly, so it must be targeted contrib.

I have a free gene ontology term predictor 'metastudent' from Tobias Hamp. It searches BLAST databases that were specially prepared for it.
These databases and some additional data files are packed up in a free (GPL-2+) tar.gz [1] that is over 400MB. In order to save space, we
decided not to package and upload the data (after initially packaging it). That now seems to force the package to 'contrib' [2].

[1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz
[2] http://www.debian.org/doc/debian-policy/ch-archive.html 2.2 Archive areas

* I doubt the package contains everything needed to /generate/ those data files (@Tobias: does it?).

* Without the data files the package would not be broken, but it would not be useful: it could not perform its function.

* predictprotein, by the way, also could not fully perform its function, and would not be useful, without large BLAST and other databases (that
have to be downloaded [3]). predictprotein does have all the tools and instructions packaged to obtain the required databases, though.

[3] http://wiki.debian.org/DebianMed/PredictProtein

My questions are:

* Do we have a team policy for such packages that depend on large data? Where should the data go?

* Should metastudent, and consequently predictprotein, go into 'contrib'?

* Do you see a way - apart from creating a 400MB package for the metastudent data, and a several-gigabyte one for predictprotein - to keep
these in 'main', and therefore in the distribution?

There was a discussion of this issue during the DPL vote [4]: whether software that is not useful without an Internet connection can be in
'main'. Bart Martens suggested the interpretation that if a package installs software from outside the distribution on the local system, then it
should not be in 'main' [5]. Russ Allbery wrote that point #1 of the social contract is relevant (and canonical) [6]. I interpret point #1 as
'as long as no non-free software is installed on the system by the package, a package can be part of the Debian system'.
Point 1 of the social contract [7] indeed seems to allow metastudent to be in 'main', in my interpretation, provided the data is DFSG-free (I
think it is).

[4] http://lists.debian.org/debian-vote/2013/03/msg00249.html
[5] http://lists.debian.org/debian-vote/2013/03/msg00276.html
[6] http://lists.debian.org/debian-vote/2013/03/msg00279.html
[7] http://www.debian.org/social_contract

Your thoughts and suggestions are welcome.

Best regards,
Laszlo


