[med-svn] [debian-med-benchmarking-spec.git] 01/01: Updated spec with architecture

Kevin Murray daube-guest at moszumanska.debian.org
Fri Feb 5 14:22:57 UTC 2016


This is an automated email from the git hooks/post-receive script.

daube-guest pushed a commit to branch master
in repository debian-med-benchmarking-spec.git.

commit 2460dd3f52d8850a90fbd2adca1312badddc65c7
Author: Kevin Murray <spam at kdmurray.id.au>
Date:   Fri Feb 5 15:22:29 2016 +0100

    Updated spec with architecture
---
 benchmarking.md | 184 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 125 insertions(+), 59 deletions(-)

diff --git a/benchmarking.md b/benchmarking.md
index b49396e..94a3677 100644
--- a/benchmarking.md
+++ b/benchmarking.md
@@ -1,24 +1,118 @@
 Benchmarking CI Service
 =======================
 
-
 Brainstorm of Debian Med/SEQwiki/biotools benchmarking service
 
-This thing needs a name, ASAP!
-
-Possible datasets:
-    - https://sites.stanford.edu/abms/giab: Genome In A Bottle  is a NIST human
-      NGS resequencing dataset (Paper:
-http://biorxiv.org/content/early/2015/09/15/026468)
-    - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
-  https://public.etherpad-mozilla.org/p/debian-med-benchmarking
-  
-Similar to ReproducibleBuilds project
-Machine that automatically defines Debian benchmark packages based on a
-metadata repository whenever:
-    - There is a new version of the underlying package in debian med unstable
+**This thing needs a name, ASAP!**
+
+
+Similar to the ReproducibleBuilds project, but for scientific accuracy.
+
+Architecture:
+-------------
+
+  - Pre-package and publish to separate local archive (benchmarking-specific
+    code ONLY):
+    - Metric script .debs
+    - Real/published dataset .debs
+      - From public databases e.g. SRA or RefSeq
+      - *Not* for simulated datasets; their parameters are kept within CWL
+        files
+  - Benchmarking workflows are CWL workflows:
+    - Tests run in docker containers:
+    - Poll YAML or DB for build-deps and datasets
+    - Install tool and tool deps from ftp.debian.org/debian
+    - Install required evaluation/metric tools and dataset .debs from our own
+      archive
+      - Need a way to code these build-deps in a CWL file (see the sketch
+        after this list)
+    - Create Dockerfile for the workflow?
+      - Or is a CWL workflow with dockerised tools enough?
+    - Run container of above image, producing data file results
+    - Workflow steps:
+      - Obtain dataset:
+        - Either: a) run simulator steps (log seeds), or b) install data
+          package from local repo per above
+      - Do any pre-conversion to tool input format (auto-detected from EDAM
+        formats)
+      - Run tool on data
+      - Run any post-conversion tools to evaluation code input format (also
+        auto-detected)
+      - Run evaluation code
+      - Report full result
+        - Probably as YAML file, or similar.
+      - Report "benchmark status" to UDD
+        - Simple state-based update (fail, got worse, all ok)
+        - Report biggest change in metric (potentially biggest improvement and
+          regression) for all builds
+        - There may be many tools per package, report best/worst across all
+          tools
+        - Path to and checksum of tarball of all benchmark results
+        - Will need a new UDD table
+  - Docker images
+    - Could use CWL tool Docker containers through the CWL workflow
+      - These don't always use Debian, unfortunately
+    - Could run whole workflow within Debian Unstable docker container
+      - But still run CWL workflow within the container
+    - Debugging/reproducible containers
+      - auto-generated `Dockerfile` for an image containing all datasets,
+        metrics, and conversion tools, plus `RUN` steps for obtaining data and
+        running pre-conversion (but stopping before tool execution).
+      - If we use docker for actual workflow execution, then this is what would
+        be used for the test execution
+  - Could all this run on a new instance of debci? or Jenkins?
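+
+As a concrete (but purely hypothetical) sketch of the above, the following CWL
+workflow benchmarks `samtools sort`: the tool comes from the Debian archive,
+the dataset and metric script come from hypothetical .debs in our local
+archive, and the `SoftwareRequirement` hint is one candidate way to encode the
+build-deps in the CWL file. Pre-/post-conversion steps are omitted for
+brevity; `samtools-sort.cwl` is sketched under Requirements below, and
+`evaluate-sort.cwl` stands in for a metric-script wrapper.
+
+```yaml
+# benchmark-samtools-sort.cwl -- hypothetical benchmark workflow sketch
+cwlVersion: v1.0
+class: Workflow
+
+hints:
+  - class: SoftwareRequirement       # candidate encoding of build-deps
+    packages:
+      - package: samtools            # tool under test (ftp.debian.org/debian)
+      - package: benchmark-data-giab # hypothetical dataset .deb (local archive)
+      - package: benchmark-metrics-sort  # hypothetical metric-script .deb
+
+inputs:
+  input_bam: File                    # shipped by the dataset package
+
+steps:
+  run_tool:                          # run the tool under test on the data
+    run: samtools-sort.cwl
+    in: {input_bam: input_bam}
+    out: [sorted_bam]
+  evaluate:                          # run the evaluation/metric code
+    run: evaluate-sort.cwl
+    in: {sorted_bam: run_tool/sorted_bam}
+    out: [report]
+
+outputs:
+  report:                            # full result, e.g. a YAML file
+    type: File
+    outputSource: evaluate/report
+```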
+
+Requirements
+------------
+
+  - Have EDAM-compatible DebTags
+  - Have a CWL tool description for each tool in the package (sketched below)
+    - Should contain EDAM tags per operation
+    - Potentially one CWL tool file per subtool/operation (e.g. `samtools view`
+      vs `samtools sort`)
+  - Be in `main`
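+
+A minimal sketch of such a per-operation CWL tool description, here for
+`samtools sort`; the EDAM namespace binding and the Docker image name are
+assumptions, and the real EDAM terms would come from the package metadata:
+
+```yaml
+# samtools-sort.cwl -- one CWL file per subtool/operation, as required above
+cwlVersion: v1.0
+class: CommandLineTool
+baseCommand: [samtools, sort]
+
+hints:
+  - class: DockerRequirement
+    dockerPull: debian:unstable      # placeholder; real image has tool installed
+
+inputs:
+  input_bam:
+    type: File
+    format: edam:format_2572         # BAM
+    inputBinding: {position: 1}
+
+outputs:
+  sorted_bam:
+    type: stdout                     # samtools sort writes BAM to stdout
+
+$namespaces:
+  edam: http://edamontology.org/
+```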
+
+Operation
+---------
+
+  - New service that runs benchmarks when:
+    - There is a new version of the underlying package in Debian Med unstable
+      - Including any conversion utilities
     - There is a change in an applicable script for the calculation of metrics
     - There is a change in an applicable benchmark dataset
+    - There is an applicable transition in progress??
+      - Could catch subtle bugs e.g. py3.4 -> py3.5 issues
+
+Schema of ideas
+---------------
+
+  - There may be many tools per package
+  - Each tool may have many benchmarkable operations
+  - Each operation of the tool should be tested by many datasets
+  - Each test should (or may) report more than one metric
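+
+As a strawman, this schema could be written down in the YAML-style metadata
+repository discussed below roughly as follows; every name and metric here is a
+hypothetical example, not a fixed format:
+
+```yaml
+package: samtools                  # one Debian source package
+tools:
+  - name: samtools-sort            # a package may ship many tools
+    operations:
+      - cwl: samtools-sort.cwl     # one benchmarkable operation of the tool
+        datasets:                  # each operation is tested on many datasets
+          - benchmark-data-giab
+          - benchmark-data-1408mlgx6-3wgs
+        metrics:                   # each test may report several metrics
+          - sort-correctness
+          - wall-clock-time
+```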
+
+
+Possible datasets:
+------------------
+
+  - https://sites.stanford.edu/abms/giab: Genome In A Bottle is a NIST human
+    NGS resequencing dataset (Paper:
+    http://biorxiv.org/content/early/2015/09/15/026468)
+  - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
+  - https://public.etherpad-mozilla.org/p/debian-med-benchmarking
+
+`cat brain | less`
+------------------
 
 The EDAM classification of a tool lives in the DebTags, and the CWL description
 of a tool lives within the debian med package
@@ -50,56 +144,28 @@ file format to the metric calculation script's input file format
 
 Autopkgtest?
 
-    - It may be worth investigating using autopkgtest infrastructure (perhaps
-      run the service as a debci instance) that runs autopkgtest tests:
-
-    - each benchmark package contains a test (or tests) in autopkgtest format,
-      that we parse & use on our debci
+ - It may be worth investigating using autopkgtest infrastructure (perhaps
+   run the service as a debci instance) that runs autopkgtest tests:
+ - each benchmark package contains a test (or tests) in autopkgtest format,
+   that we parse & use on our debci
 
 
 
 Metadata storage:
-    - Repository of YAML-style markup parsed into SQL?
-    - Debtags?
-    - Just write a script?
-
-
-UDD:
-
-    - https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo
 
-    - Or, more sanely:
-      https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo
+ - Repository of YAML-style markup parsed into SQL?
+ - Debtags?
+ - Just write a script?
 
-    - New table required for debian benchmarks status data
 
-    - Errors in building a test or major changes in a metric are reported to
-      UDD in a fashion similar to how it is done by the ReproducibleBuilds
-system: We report whether the test could be computed at all and the largest
-positive and negative deviations of scores (in any dataset on any metric), plus
-a description of in which dataset and on which metric this deviation has
-occurred
-
-
-
-Architecture:
-    - Pre-package and publish to separate local archive (benchmarking specific
-      code ONLY):
-        - Metric script .debs
-        - dataset .debs
-    - Tests run in docker containers:
-        - Poll YAML or DB for build-deps and datasets
-
-    - Install tool and tool deps from ftp.debian.org/debian
-
-    - Install required evaluation/metric tools and dataset .debs from our own
-      archive
+UDD:
 
-        - Create dockerfile for image from above (saved and published for every
-          benchmark)
-        - Run container of above image, producing data file results
-        - Run evaluation code and report result
-        - Delete container and image (keeping Dockerfile)
-    - Publish result (either text file [TSV, CSV or YAML], or cgi script to
-      pull from DB)
-    - Builder pushes status to UDD
+ - [debmed's page](https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - Or, a saner example:
+   [KDM](https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - New table required for Debian benchmark status data
+ - Errors in building a test or major changes in a metric are reported to UDD
+   in a fashion similar to the ReproducibleBuilds system: we report whether
+   the test could be computed at all and the largest positive and negative
+   deviations of scores (in any dataset on any metric), plus a description of
+   the dataset and metric in which this deviation occurred.
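+
+A hedged sketch of what one record in that new UDD table (i.e. what a run
+would push) might contain; all field names and values are made up:
+
+```yaml
+source: samtools                     # package under test (hypothetical)
+version: 1.3-1
+computed: true                       # could the benchmark be run at all?
+status: ok                           # one of: fail / worse / ok
+largest_improvement:                 # biggest positive deviation across all
+  metric: sort-correctness           #   tools, datasets and metrics
+  dataset: benchmark-data-giab
+  delta: +0.02
+largest_regression:                  # biggest negative deviation, likewise
+  metric: sort-correctness
+  dataset: benchmark-data-1408mlgx6-3wgs
+  delta: -0.15
+results:
+  tarball: /path/to/results.tar.xz   # tarball of all benchmark results
+  sha256: "<checksum of the tarball>"
+```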

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/debian-med-benchmarking-spec.git.git



More information about the debian-med-commit mailing list