[Debian-med-packaging] Seeking Advice on 2.7G Package Data

Olivier Sallou olivier.sallou at irisa.fr
Mon Oct 8 12:59:27 UTC 2012


On 10/8/12 2:51 PM, "Steffen Möller" wrote:
> Hello,
>
> I think we all agree that this much data should not become part of the archive. With biomaj and getData we have two tools that can download those extra gigabytes from your FTP servers. I suggest using them for the job.
>
> With all the historical bias of mine, I had prepared such a solution for the autodock package. See 
> debian-med/packages/autodocksuite/trunk/debian/autodock-getdata.install
> which installs 
> debian-med/packages/autodocksuite/trunk/debian/autodock-zinc.getData 
>
> And the latter reads
> $ cat debian/autodock-zinc.getData 
> print STDERR "Reading autodock-zinc configuration file\n" if $verbose;
>
> # This file is Copyright (C) Steffen Moeller <moeller at debian.org>
> # and made available under the terms of the GPL version 2 or any
> # later version as presented in '/usr/share/common-licenses/GPL-2'.
>
> # No chemical post-processing required since all files are in pdbqt format
> # already. But one needs to untar the files.
>
> foreach $n (("asinex", "chembridge_buildingblocks_pdbqt_1000split", "drugbank_nutraceutics",
>              "drugbank_smallmol", "fda_approved", "human_metabolome_pdbqt_1000split", "otava",
>              "zinc_natural_products")) {
>
>         print "$n\n";
>
>         $toBeMirrored{"zinc.pdbqt.$n"}={
>           "name" => "ZINC - PDBQT formatted – $n",
>           "tags" => ["pdbqt","compounds"],
>           "source" => "wget $sharedWgetOptions http://zinc.docking.org/pdbqt/$n.tar.gz",
>           "post-download" => "tar --no-same-owner --exclude prepare_lig.log --exclude mol2 -xzvf $n.tar.gz && chmod -R go+r . && find . -type d -exec chmod +x {} \\; "
>         };
> }
>
> 1;
>
> Olivier, how would one do that for biomaj? 
A description file could be placed in /etc/biomaj/db_properties.
I can help with the content of the property file if needed, but it could
look something like this:

db.formats=fasta
db.fullname=My database
remote.dir=/myftppath/
release.regexp=
server=myftpserver
db.name=loctree
# Download all tgz
remote.files=.*.tgz
no.extract=false
local.files=.*
release.file=
db.type=MYDATATYPE
protocol=ftp

Then biomaj can download the db into /var/lib/biomaj/loctree, either
manually or with a cron task for regular updates.
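
To illustrate how the remote.files pattern in a property file like the one
above picks files, here is a minimal Python sketch. It is not BioMAJ code:
the parser, the function names and the sample listing are made up for this
example, and it assumes the pattern has to match the whole file name.

```python
import re

# Subset of the property file shown above, kept inline for the example.
SAMPLE_PROPERTIES = """\
db.name=loctree
protocol=ftp
server=myftpserver
remote.dir=/myftppath/
# Download all tgz
remote.files=.*.tgz
local.files=.*
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def select_remote_files(props, listing):
    """Return the names in listing that fully match the remote.files pattern.

    Full-match semantics are an assumption of this sketch, not a statement
    about how biomaj applies the pattern.
    """
    pattern = re.compile(props["remote.files"])
    return [name for name in listing if pattern.fullmatch(name)]


if __name__ == "__main__":
    props = parse_properties(SAMPLE_PROPERTIES)
    listing = ["loctree2-data.tgz", "README", "notes_tgz"]  # made-up listing
    print(select_remote_files(props, listing))
```

In plain regular-expression terms the unescaped dot in .*.tgz matches any
character, so the made-up name notes_tgz is selected as well; a pattern such
as .*\.tgz would be stricter.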
>
> Many greetings
>
> Steffen
>
> -------- Original Message --------
>> Date: Mon, 08 Oct 2012 14:12:53 +0200
>> From: Olivier Sallou <olivier.sallou at irisa.fr>
>> To: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
>> Subject: Re: [Debian-med-packaging] Seeking Advice on 2.7G Package Data
>> On 10/8/12 1:58 PM, Laszlo Kajan wrote:
>>> Dear Team, Andreas, Steffen!
>>>
>>> Our lab has a new free sub-cellular localization prediction method [1].
>>>
>>> We would like to package it, and it is almost done. The tool depends on
>>> an (arch indep) database that is 2.7GB (compressed, and it is used
>>> compressed). The data is (or will soon be) available as a tar.gz via
>>> FTP. The question is:
>>> * What to do with the data? - how to make sure it's available for the
>>> prediction method after it is installed?
>>>   1: We tried to create a 2.7GB loctree2-data package out of the data,
>>> and make loctree2 depend on it. Creating the package went well, but apt
>>> has problems with the size, it looks like some bug in the stable version.
>> This is not my preferred solution.
>> I think it should not be a Depends but a Suggests, to avoid an automatic
>> download. The user may already have the data, or use their own database.
>> A Recommends would trigger the download depending on the apt configuration,
>> so I would keep a Suggests relationship.
>>
>> Furthermore it can be painful depending on available bandwidth.
>>>   2: Create a loctree2-data-installer package that downloads the large
>>> data upon installation, flashplugin-installer style. I am worried this
>>> may be problematic for automatic testers (piuparts) at Debian, because
>>> of the large data it moves. Do I need to worry about this? Also I don't
>>> know how this behaves with interrupted downloads (continuing the
>>> download should be supported).
>>>   3: Have the executable download the data, or tell the user to download
>>> the data, when it is run and the data is not available, or outdated. My
>>> worry here is that this makes system-wide installation more complicated.
>>> The installation would be done by an admin, but the large data would be
>>> pulled in by an unprivileged user most likely, who can not install it
>>> into /usr/share/loctree2-data. The admin would have to be warned by the
>>> user that installation of the package is not enough.
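
On the interrupted-download concern under option 2, a rough Python sketch of
how a download could be continued. It assumes the data is also reachable
over HTTP from a server that honours Range requests (the thread only says
FTP), and the function names are made up for this example.

```python
import os
import urllib.request


def resume_offset(path):
    """Return the size of a partial download, or 0 if nothing is there yet."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def build_request(url, offset):
    """Build a request that asks the server for the bytes from offset onwards."""
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def download(url, path, chunk_size=1 << 20):
    """Fetch url into path, appending to an existing partial file if possible.

    If the local file is already complete, the server will normally answer
    with an error (416), which urlopen raises as HTTPError; a real installer
    would have to handle that case.
    """
    offset = resume_offset(path)
    with urllib.request.urlopen(build_request(url, offset)) as response:
        # 206 means the server honoured the Range header; anything else
        # (typically 200) means it sent the whole file, so start over.
        resumed = offset > 0 and response.status == 206
        with open(path, "ab" if resumed else "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

In a maintainer script, wget -c gives the same behaviour without custom
code.
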
>> I would suggest asking the user (in README.Debian and in the man page) to
>> download the database manually before using the tool.
>> When the tool is run and the database is not present, it could offer to
>> download the db.
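
The start-up check suggested above could look roughly like the following
Python sketch. The search path, the LOCTREE2_DATA variable and the function
names are assumptions for illustration; only /usr/share/loctree2-data and
/var/lib/biomaj/loctree come from this thread.

```python
import os
import sys

# Assumed search path: /usr/share/loctree2-data is the location named in the
# thread, /var/lib/biomaj/loctree is where biomaj would put the download.
CANDIDATE_DIRS = ["/usr/share/loctree2-data", "/var/lib/biomaj/loctree"]


def find_data_dir(candidates=CANDIDATE_DIRS, env=None):
    """Return the first existing data directory, or None if none exists."""
    env = os.environ if env is None else env
    override = env.get("LOCTREE2_DATA")  # hypothetical override variable
    paths = ([override] if override else []) + list(candidates)
    for path in paths:
        if os.path.isdir(path):
            return path
    return None


def missing_data_message(candidates=CANDIDATE_DIRS):
    """Build the hint shown when no database could be found."""
    searched = "\n".join(f"  {path}" for path in candidates)
    return (
        "The prediction database was not found in any of:\n"
        f"{searched}\n"
        "Please download it as described in README.Debian, or ask your\n"
        "administrator to install it system-wide."
    )


def main():
    """Return 0 if a database directory is available, 1 otherwise."""
    data_dir = find_data_dir()
    if data_dir is None:
        print(missing_data_message(), file=sys.stderr)
        return 1
    print(f"Using database in {data_dir}")
    return 0
```

An override variable of this kind would let an unprivileged user point the
tool at a private copy when the admin has not installed the data
system-wide.
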
>>
>> Olivier
>>
>>> Please advise, and thanks in advance.
>>>
>>> Best regards,
>>>
>>> Laszlo
>>>
>>> [1] http://www.ncbi.nlm.nih.gov/pubmed?term=22962467
>>>
>>> _______________________________________________
>>> Debian-med-packaging mailing list
>>> Debian-med-packaging at lists.alioth.debian.org
>>> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-packaging

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438



