[med-svn] [Debian Wiki] Update of "DebianMed/Meeting/Aberdeen2014_Report" by TimBooth

Debian Wiki debian-www at lists.debian.org
Wed Feb 19 22:48:10 UTC 2014


Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Debian Wiki" for change notification.

The "DebianMed/Meeting/Aberdeen2014_Report" page has been changed by TimBooth:
https://wiki.debian.org/DebianMed/Meeting/Aberdeen2014_Report?action=diff&rev1=2&rev2=3

Comment:
Done, at last

     * #4 Install “ARB” X11 GUI phylogenetics suite (part of Bio-Linux)
  
   * Looking at MrBayes MPI on Sunday.  There were some issues getting this to work.
+    * Diagnosed problems on Sunday - fix requires package rebuild
  
+ ==== Work on Edam/Debtags/Tools registry ====
+ ''Investigate possible mechanisms of data interchange''
+ 
+ Steffen Möller, Matúš Kalaš, Kristoffer Rapacki, Olivier Sallou, Piotr Chmura, Emil Rydza
+ (probably splitting into subgroups)
+ 
+  * '''Main activities'''
+    * Mapping of DebTags to EDAM
+    * Draft 4.0 of tool description model
+    * Synchronisation of Debian Med's packages with the Tool Registry
+    * Integration of EDAM annotations into Debian Med
+  * '''Achievements'''
+    * made sure that the Registry description model accommodates all the attributes needed by Debian/DebianMed
+    * made sure that all the attributes essential to the Registry exist in DebianMed
+    * established from where in DebianMed the tool descriptions can be obtained regularly via programmatic access
+      * Outcome: Integration of Debtags with EDAM and with the registry model '''without loss of data determined to be do-able.'''
+    * This meeting has sparked off significant tasks and collaborations in this area
+    * Production of status report - see below.
+ 
+ ==== Packaging demonstration ====
+ Brad Chapman, Daniel Barker, Andreas Tille, Detlef Wolf, Iain Learmonth
+ 
+    * Packaged seqTK as a demo packaging task
+    * Demo went well - seqTK and DNAclust both packaged
+    * Attendees started their own packaging tasks:
+      * python3-fitbitscraper now built - asked Deb Python about pushing it
+      * started on ngila - work in progress
+ 
+ ==== Packaging the PubMed search (C + Java) from BioinfoC ====
+ Detlef Wolf, with help from Olivier Sallou, Andreas Tille, Jorge Soares, Iain Learmonth, Steffen Moeller, Tim Booth
+ 
+   * Roadmap to PubmedSearch packaging
+     * libbioinfoc-0.1.0: setup of GNU build system (configure.ac, Makefile.am) 
+       to produce library (for shared and static linking)
+     * towards Debian initiation: gpg key & show passport, alioth account
+     * Java part of pubmed search: for next sprint
+     * Many contributors helped improve the build system.
+   * As a result, the AM build now works and an initial package has been produced.
+   * Gave an impromptu demo at 2pm (http://bioinfoc.ch)
+   * continued work to neaten up the package on Sunday:
+     * Steffen added the example prog to the bioinfoc package.
+     * The library also gets a debugging package.
+     * The package is close to ready for a Debian upload.
  
  === Personal Reports ===
  
@@ -71, +115 @@

   * Gave a demo of a Python application visualising personal health data from FitBit ([[http://i.imgur.com/0NUpGc2.png|Screenshot]]).
   * Took some pictures, link at bottom of main page.
  
+ ==== Steffen Möller ====
+ 
+   * Did a package of bio-parser-isatab (Perl lib) and it is nearly ready for commit
+ 
+ ==== Jorge Soares ====
+ 
+   * Commit fixes to snp-sites and tidy up
+     * Committed. snp-sites v1.5.0 has now been installed on all sid architectures
+     * Successful debugging of upstream issues
+   * Package Fastaq
+     * Initial debian git commit of Fastaq python package. 
+     * Initial editing of several debian files.
+ 
  ==== Tim Booth ====
  
   * Gave a short talk on Bio-Linux and recent updates
@@ -81, +138 @@

   * Discussed roadmap to making Bio-Linux love the Galaxy toolshed with Brad and Peter C
   * Planned with Kristoffer how to use the Tools Registry in BL and how to contribute
   * Connected to the Qlustar cluster and tried some basic ops
+ 
+ ==== Niall Beard ====
+ 
+  * Interested in new packages + involvement in the tools registry group (Biocatalogue)
+  * Maybe looking at external tools in taverna with Steffen
+  * Joined Andreas and started packaging Coot - work ongoing
+  * Proceeded to productive discussion on tool description related issues
+ 
+ ==== Olivier Sallou ====
+  
+  * Fix biojava and libgo-perl
+  * Package new upstream version biojava3
+  * New packages: discosnp and mapsembler2
+ 
+ ==== Peter Cock ====
+ 
+  * Worked with Brad and Tim - incl looking at Galaxy DEB package.
+  * Planned BOF at the next Galaxy conference after in-depth discussion on Galaxy toolshed issues and sane packaging.  
+  * Looked into packaging an astronomy package.
+ 
+ ==== Brad Chapman ====
+ 
+  * Gave talk and demo on Cloud BioLinux
+  * Participated in packaging demo
+  * Worked on the manifest idea – list installed progs in CBL
+ 
+ A critical missing component of CloudBioLinux full and flavor-based custom
+ installs is defining the full environment of packages and versions available on
+ the system. We worked at the 2011 [BOSC] Codefest hackathon to add minimal support for
+ creating this full manifest of packages and versions, but the script required
+ integration into production workflows and numerous cleanups. During the first
+ day of the DebianMed Sprint I focused on converting this manifest creation into
+ a [[https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/manifest.py|production ready importable module]]. It
+ now handles creation of YAML files with packages and versions for all install
+ methods supported by CloudBioLinux (Debian packages; Python, R and Ruby library
+ installs; Homebrew packages; and custom CloudBioLinux scripts). The Debian
+ version is 10x faster than previously thanks to tips on querying apt repos from
+ Tim Booth.
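
As a rough illustration of the Debian part of such a manifest, the sketch below collects package names and versions with a single `dpkg-query` call and renders them as YAML. It is not the code in `manifest.py`: the function names and the YAML layout are assumptions made for this example only.

```python
import subprocess


def parse_dpkg_listing(text):
    """Turn 'name<TAB>version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, version = line.partition("\t")
        packages[name.strip()] = version.strip()
    return packages


def to_yaml(packages):
    """Render the mapping as a small YAML document, sorted by package name.

    Versions are quoted because Debian versions may contain ':' (epochs).
    """
    lines = []
    for name in sorted(packages):
        lines.append('{}:\n  version: "{}"'.format(name, packages[name]))
    return "\n".join(lines) + "\n"


def installed_debian_packages():
    """Ask dpkg once for every package it knows about (Debian/Ubuntu only)."""
    out = subprocess.check_output(
        ["dpkg-query", "-W", "-f=${Package}\\t${Version}\\n"],
        universal_newlines=True,
    )
    return parse_dpkg_listing(out)
```

On a Debian-based system, `print(to_yaml(installed_debian_packages()))` would print the manifest. Querying all packages in one call, rather than one call per package, is the kind of change that tends to speed this up, although the report does not say which apt tips were actually used.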
+ 
+ These updates to manifest creation make it possible to integrate it into
+ existing tools that use CloudBioLinux for installation. The community developed
+ open source [[https://github.com/chapmanb/bcbio-nextgen|bcbio-nextgen]]
+ next-generation sequencing pipeline uses this, and we adjusted the build scripts
+ to 
+ [[https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/install.py#L231|generate manifests on installation]]
+ and then use these manifests to provide a list of the biological packages that
+ run as 
+ [[https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/provenance/programs.py#L184|part of the pipeline]].
+ This replaces brittle code existing in bcbio-nextgen and ties automated
+ installation to the new manifest feature, ensuring that manifest creation will be
+ regularly updated going forward for all CloudBioLinux installs.
+ 
+ Additionally, I worked to learn Debian package building thanks to help from
+ Andreas Tille. This resulted in creation of my first Debian package for
+ [[https://github.com/ekg/freebayes|FreeBayes]], a highly accurate variant caller
+ from Erik Garrison in the Marth Lab. I pushed a nearly completed version to
+ DebianMed, which Andreas helped to finalize and make available. The hope for
+ future versions of CloudBioLinux is to move back to Debian/Ubuntu based support
+ inside Docker containers, which will help this package replace a custom build
+ function in CloudBioLinux with a proper package.
+ 
+ === Report of tool registry working group (Reporting by Matúš) ===
+ 
+ 
+ ==== Summary ====
+ 
+ '''The following was achieved:'''
+  * we made sure that the Registry description model accommodates all the attributes needed by Debian/DebianMed
+  * we made sure that all the attributes essential to the Registry exist in DebianMed
+  * we established from where in DebianMed the tool descriptions can be obtained regularly via programmatic access
+ 
+ '''Motivation:'''
+  * The ToolRegistry and Debian Med communities are expected to overlap significantly -> effort should not be duplicated
+  * Expected Synergies
+    * increased visibility of Debian Med's efforts to scientific community
+    * head start for ToolRegistry with data provided
+    * facilitation of Debian packaging with tool descriptions, prioritization of efforts
+    * any mechanism for ensuring that tool description of desired 'tasks' is in the Tool Registry? (Not via harassment of Andreas)
+    * possibly in the longer term: test I/O data pairs for automated testing & benchmarking may be recorded in the registry and be useful for automated testing in Debian
+  * Best possible annotation for tools (and databases) in Computational Biology, resulting in improved accessibility, visibility & attribution, and provenance within the field
+ 
+ '''Constraints and challenges:'''
+  * Non-intrusive to ease acceptance in working communities
+    * maintainers are not forced to link to the Tool Registry
+    * maintainers may perhaps have a choice of ignoring the Tool Registry, importing information from the registry upon request, or some form of automatic updates with or without confirmation
+  * Licensing of debian-provided annotation - [[http://anonscm.debian.org/viewvc/debian-med/trunk/packages/dialign/trunk/debian/copyright?view=markup|Example here]]
+  * Difficulty in distinguishing "source" packages and their general annotation from Debian's more fine-grained separation of binaries, APIs/libs, data, scripts, debug information ... and many bits and pieces that should be considered intrinsic parts of one tool
+  * Other way round too, package being a collection or an ad hoc cluster of tools
+ 
+ '''Ideas for implementation:'''
+  * The Tool Registry harvesting information regularly from Debian Med and including references to the created Tool Registry entries back into Debian
+    *  ''into debian/control (probably not) and/or debian/upstream and/or 'tasks' (these 2 probably reasonable and possibly optional)''
+  * Debian Maintainers are encouraged to add tags to reference a possible Tool Registry entry and an option for possible future automated imports from the registry
+    *  ''into debian/control (probably not) and/or debian/upstream and/or 'tasks' (these 2 probably reasonable and possibly optional)''
+  * Tool information in Debian Med:
+    * [[UltimateDebianDatabase|Ultimate Debian Database]] should integrate all information from
+      * which packages are in Debian
+      * debian/control - [[http://anonscm.debian.org/viewvc/debian-med/trunk/packages/dialign/trunk/debian/control?view=markup|Example here]]
+      * debian/upstream [[http://anonscm.debian.org/viewvc/debian-med/trunk/packages/dialign/trunk/debian/upstream?view=markup|Example here]]
+         * ''[[http://anonscm.debian.org/viewvc/debian-med/trunk/package_template|recommended template of the debian files]]''
+      * See Also: [[Debtags|DebTags on the Wiki]], [[http://debtags.alioth.debian.org/paper-debtags.html|DebTags paper]], [[http://en.wikipedia.org/wiki/Faceted_classification|Faceted Classification]], [[http://debtags.debian.net|DebTags home]], [[https://wiki.debian.org/Debtags/FAQ|DebTags FAQ]]
+    * [[http://anonscm.debian.org/viewvc/blends/projects/med/trunk/debian-med/tasks|'tasks' page]]; [[http://blends.debian.org/med/tasks/bio|is shown here]]; [[http://blends.debian.org/blends/apa.html#staticwebpages|populated from here]]; [[http://anonscm.debian.org/gitweb/?p=blends/website.git;a=blob;f=webtools/tasks.py|and code is here]]
+      * description of unpackaged tools ignored until Tool Registry finds it useful to import them
+      * additional information about packages ignored until found useful
+      * matching of packages with registry entries may be implemented via information in the 'tasks' file (this may be likely in case the references are not desired in a package itself)
+      * '' '''Important:''' Andreas has recently made it so that almost all relevant stuff from the ‘tasks’ file is in UDD ''
+    * Contact information in the debian/copyright - [[http://anonscm.debian.org/viewvc/debian-med/trunk/packages/dialign/trunk/debian/copyright?view=markup|Example maintained here]]
+      * ''Andreas may be willing to include these into the UDD, as now they aren’t there''
+    * online manpages can be included in registry among documentation URLs [[http://manpages.debian.net/cgi-bin/man.cgi?query=<pkgname>]]
+  * Access to UDD via its public PostgreSQL interface, queried from Python
+  * First attempt:
+    1. Get all Deb Med descriptions from UDD
+    2a. Description of packages that had already been recorded in the registry will be fully overwritten
+    3a. Return registry accessions for newly created (or all) entries
+    4. Let Andreas et al decide in which form they want to get them and record them
+  * In a later iteration:
+    2b. Solve synchronisation to allow update of descriptions without overwriting (likely via timestamps of imported information)
+    3b. Let Debian people decide whether, how - and eventually with what options - updated information about packages is recorded back to Debian
+  * Integration of information (the “federated” model):
+    2c. Handle synchronisation of information updates from multiple sources (for simplicity start with Deb Med and SEQwiki?)
+  * In future, would it be of interest to automatically (optionally manually) populate debian/upstream with enriched scientific & semantic information? [[https://docs.google.com/document/d/19VpzwxZdlz1K4P1q1a-WYZUtiSXwUp2nafM716dzW8I/edit?pli=1#bookmark=id.v415n1pdyfl0|See also the sketch here]]
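
The "first attempt" (steps 1 to 3a) and the timestamp idea from step 2b can be sketched as follows. This is only one possible reading: the `Registry` class is an in-memory stand-in, and its method names, the `REG…` accession format and the integer timestamps are all hypothetical, not part of the Tool Registry or UDD.

```python
import itertools


class Registry:
    """In-memory stand-in for the Tool Registry (hypothetical interface)."""

    def __init__(self):
        self.entries = {}      # accession -> {"package", "description", "imported_at"}
        self.by_package = {}   # package name -> accession
        self._counter = itertools.count(1)

    def upsert(self, package, description, imported_at, overwrite=True):
        """Create or update one entry and return its accession."""
        accession = self.by_package.get(package)
        if accession is None:
            # New package: allocate a fresh (made-up) accession.
            accession = "REG{:04d}".format(next(self._counter))
            self.by_package[package] = accession
        elif not overwrite:
            # Step 2b: keep the stored entry unless the import is newer.
            if imported_at <= self.entries[accession]["imported_at"]:
                return accession
        # Step 2a: known entries are fully overwritten.
        self.entries[accession] = {
            "package": package,
            "description": description,
            "imported_at": imported_at,
        }
        return accession


def harvest(registry, descriptions, imported_at, overwrite=True):
    """Steps 1-3a: push descriptions in and return {package: accession}."""
    return {
        package: registry.upsert(package, text, imported_at, overwrite)
        for package, text in descriptions.items()
    }
```

Here `descriptions` stands for whatever step 1 retrieves from UDD, and the returned mapping is what step 4 would hand back to Debian Med in whatever form is agreed.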
+ 
+ '''Other sources:'''
+  * [[http://directory.fsf.org/wiki/Main_Page|Free software directory]] of FSF harvests and integrates information from multiple sources including but not limited to UDD
+    * Should certainly be heavily included in the Tool Registry effort
+    * Would be enormously useful to get design suggestion from FSF directory architects in particular about the information integration (“federated” model)
+  * [[http://taverna.nordugrid.org/sharedRepository/index.php|Nordugrid Taverna WF elements]] 
+    * Useful for import to registry, or are those tools anyway better described elsewhere?
+    * Registry accessions could be included into the Nordugrid XML description
+  * [[http://nebc.nerc.ac.uk/tools/bio-linux/package-list|Bio-Linux]] has only a few packages that aren’t in Deb Med or aren’t planned to be included in Deb, and still are well-defined software (these may be e.g. unfree or hard to package or too ad-hoc packages)
+    * Should be added to the Tool Registry manually, done person-to-person with Tim (who knows everything relevant about those tools that are relevant for registration)
+  * CloudBioLinux is full of various stuff
+    * thorough tool information starting to be in focus now: the “manifest” which is going to be a YAML about installed stuff. Expected with thrill! :-)
+    * Information from the Tool Registry may be of great benefit to CloudBioLinux
+  * Debian Nonfree: Is included in UDD and among tasks. Bits that aren’t included in those should be included in those :)
+ 
+ 
+ '''Integration of EDAM annotations into Debian Med'''
+  * DebTags, Enrico Zini [http://debtags.debian.net, https://wiki.debian.org/Debtags/FAQ]
+  * ‘tasks’ categorisation
+  * '''Challenges:'''
+    A. EDAM concepts need to be identified by alphanumeric IDs/URIs, because terms may & do change in time
+        -- At the same time, of course, the terms need to be presented to both the users and annotators
+    A. Lower priority but possibly high coolness: Search/filtering/grouping by EDAM DAG
+  * '''Solutions:'''
+    A. Separate mapping file (to be packaged) between DebTags and external vocabularies
+        -- Start with EDAM and Media types (in order to have more than EDAM only)
+        -- Before the larger mapping effort, DebTags need to be refactored (by us in accord with Enrico)
+        -- After the mapping, information about the external concepts should be shown to Debian taggers in the tagging Web app
+    A. Record information about available external vocabularies in the mapping files, at the Facet level
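
One way to picture such a mapping file is sketched below: concepts are stored by stable ID, a human-readable label is kept alongside for taggers, and the available vocabularies are recorded per facet. The structure and function names are assumptions for illustration, and `topic_XXXX` is a placeholder, not a real EDAM identifier.

```python
EDAM_BASE = "http://edamontology.org/"

# Hypothetical layout of a DebTags-to-external-vocabulary mapping.
MAPPING = {
    "facets": {
        # Vocabularies are declared once per facet.
        "field": {"vocabularies": {"EDAM": EDAM_BASE}},
    },
    "tags": {
        # Concepts are referenced by ID; the label is for display only
        # and may change without breaking the mapping.
        "field::biology": [
            {"vocabulary": "EDAM", "id": "topic_XXXX", "label": "placeholder label"},
        ],
    },
}


def facet_of(tag):
    """Return the facet part of a 'facet::tag' DebTags name."""
    return tag.split("::", 1)[0]


def vocabularies_for(mapping, tag):
    """List the external vocabularies declared for the tag's facet."""
    facet = mapping["facets"].get(facet_of(tag), {})
    return sorted(facet.get("vocabularies", {}))


def concept_uris(mapping, tag):
    """Resolve a tag to the URIs of its mapped external concepts."""
    vocabularies = mapping["facets"].get(facet_of(tag), {}).get("vocabularies", {})
    uris = []
    for concept in mapping["tags"].get(tag, []):
        base = vocabularies.get(concept["vocabulary"])
        if base is not None:
            uris.append(base + concept["id"])
    return uris
```

Keeping IDs and labels separate matches the first challenge above: the URI stays fixed while the displayed term can be refreshed from the vocabulary.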
+ 
+ 
+ '''Tool description model draft 4.0'''
+  * Alignment with Deb Med pkg description
+    * status: fully drafted except DebTags (todo during refactoring of DebTags - see also above)
+  * Compatibility of the tool description XSD with Emil’s & Piotr’s tooling: … 
+  * Finishing the tool description XSD and making it compatible with Emil’s & Piotr’s tooling & v.v.
+    * Main bits to finish: Interfaces, Versions
+    * Todo soon: Release new minor BioXSD version catering for the needs of the tool description XSD
+ 
+ === Miscellaneous ===
+ 
+ Keysigning all round.
+ 
  ----
  CategorySprint
  


