[Debian-med-packaging] Question about proper archive area for packages that require big data for operation

Laszlo Kajan lkajan at rostlab.org
Wed Apr 24 13:43:39 UTC 2013


Hi Olivier!

On 24/04/13 08:20, Olivier Sallou wrote:
> 
> On 04/23/2013 11:48 AM, Laszlo Kajan wrote:
>> Dear Russ, Debian Med Team, Charles!
>>
>> (Please keep Tobias Hamp in replies.)
>>
>> @Russ: Please allow me to include you in a discussion about a few bioinformatics packages that depend on big, but free data [2]. I have cited
>> your opinion [3] in this discussion before. You are on the technical committee and on the policy team, so you, together with Charles, can help
>> substantially here.
>>
>> [2] http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/thread.html
>> [3] https://lists.debian.org/debian-vote/2013/03/msg00279.html
>>
>> This email is to continue the discussion about free packages that depend on big (e.g. >400MB) free data outside 'main'. These packages
>> apparently violate policy 2.2.1 [0] for inclusion in 'main' because they require software outside the 'main' area to function. They do not
>> violate point #1 of the social contract [1], which requires non-dependency on non-free components. For these big data packages, policy seems to
>> be overly restrictive compared to the social contract, leading to seemingly unfounded rejection from 'main'.
> Indeed, many bioinformatics programs rely on external data. But I am
> afraid that if we start to add some data packages, we will open the
> door to an endless stream of them. Bioinformatics datasets are large,
> and are becoming huge and numerous.
> This size will be an issue for Debian mirrors (mainly if some indexed
> data are system dependent) but will also be a pain for the user if
> installing a program (just to have a look) downloads GBs of dependent
> packaged data. It may be really slow and fill the user's disk (not to
> mention package updates).
> 
> Should those data dependencies not be clearly stated somewhere in the
> software package, along with a script to get them?

Yes, the former (clearly stating the large external data dependency in the long package description) is exactly what Charles Plessy recommended.
And your idea of a script to get the data is exactly what we implemented for this 'metastudent' package. So we clearly think along the same
lines... Now we just have to discuss it with the FTP master team as well, to see whether this is acceptable to them (or whether they would
prefer to have the data in the archive).
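
To illustrate the "script to get the data" idea, here is a minimal sketch of what such a fetch helper could look like. This is not the actual 'metastudent' script: the function names (fetch_data, sha256_of) and the URL, destination and checksum arguments are hypothetical, chosen only for illustration. It downloads the data on demand, skips the download when a verified copy is already present, and refuses data whose checksum does not match.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_data(url: str, dest: Path, sha256: str, chunk_size: int = 1 << 20) -> Path:
    """Download url to dest unless a verified copy is already there.

    Hypothetical helper: url, dest and sha256 would be supplied by the
    package (e.g. documented in its long description).
    """
    if dest.exists() and sha256_of(dest) == sha256:
        return dest  # verified copy already present, nothing to download

    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")

    # Stream to a temporary file so a large download never sits in memory
    # and an interrupted download never looks like a complete one.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)

    if sha256_of(partial) != sha256:
        partial.unlink()
        raise ValueError("checksum mismatch for " + url)

    partial.replace(dest)  # move the verified file into place
    return dest
```

The user would run this once, explicitly, after installing the program, so that installing the package "to have a look" does not pull in the data.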
Laszlo


