[med-svn] [debian-med-benchmarking-spec.git] 01/01: Updated spec with architechture
Kevin Murray
daube-guest at moszumanska.debian.org
Fri Feb 5 14:22:57 UTC 2016
This is an automated email from the git hooks/post-receive script.
daube-guest pushed a commit to branch master
in repository debian-med-benchmarking-spec.git.
commit 2460dd3f52d8850a90fbd2adca1312badddc65c7
Author: Kevin Murray <spam at kdmurray.id.au>
Date: Fri Feb 5 15:22:29 2016 +0100
Updated spec with architechture
---
benchmarking.md | 184 ++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 125 insertions(+), 59 deletions(-)
diff --git a/benchmarking.md b/benchmarking.md
index b49396e..94a3677 100644
--- a/benchmarking.md
+++ b/benchmarking.md
@@ -1,24 +1,118 @@
Benchmarking CI Service
=======================
-
Brainstorm of Debian Med/SEQwiki/biotools benchmarking service
-This thing needs a name, ASAP!
-
-Possible datsets:
- - https://sites.stanford.edu/abms/giab: Genome In A Bottle is a NIST human
- NGS resequencing dataset (Paper:
-http://biorxiv.org/content/early/2015/09/15/026468)
- - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
- https://public.etherpad-mozilla.org/p/debian-med-benchmarking
-
-Similar to ReproducibleBuilds project
-Machine that automatically defines a debian benchmark packages based on
-metadata repository whenever:
- - There is a new version of the underlying package in debian med unstable
+**This thing needs a name, ASAP!**
+
+
+Similar to the ReproducibleBuilds project, but for scientific accuracy
+
+Architecture:
+-------------
+
+ - Pre-package and publish to separate local archive (benchmarking specific
+ code ONLY):
+ - Metric script .debs
+ - Real/published dataset .debs
+ - From public databases e.g. SRA or RefSeq
+ - *Not* for simulated datasets; their parameters are kept within CWL
+ files
+ - Benchmarking workflows are CWL workflows:
+ - Tests run in docker containers:
+ - Poll YAML or DB for build-deps and datasets
+ - Install tool and tool deps from ftp.debian.org/debian
+ - Install required evaluation/metric tools and dataset .debs from our own
+ archive
+ - Need a way to code these build-deps in a CWL file
+ - Create Dockerfile for the workflow?
+ - Or is a CWL workflow with dockerised tools enough?
+ - Run container of above image, producing data file results
+ - Workflow steps (a rough CWL sketch follows this list):
+ - Obtain dataset:
+ - Either: a) run simulator steps (log seeds), or b) install data
+ package from local repo per above
+ - Do any pre-conversion to tool input format (auto-detected from EDAM
+ formats)
+ - Run tool on data
+ - Run any post-conversion tools to evaluation code input format (also
+ auto-detected)
+ - Run evaluation code
+ - Report full result
+ - Probably as a YAML file, or similar; a rough sketch follows this list.
+ - Report "benchmark status" to UDD
+ - Simple state-based update (fail, got worse, all ok)
+ - Report biggest change in metric (potentially biggest improvement and
+ regression) for all builds
+ - There may be many tools per package, report best/worst across all
+ tools
+ - Path to and checksum of tarball of all benchmark results
+ - Will need a new UDD table
+ - Docker images
+ - Could use the CWL tools' Docker containers through the CWL workflow
+ - These don't always use Debian, unfortunately
+ - Could run whole workflow within Debian Unstable docker container
+ - But still run CWL workflow within the container
+ - Debugging/reproducible containers
+ - auto-generated `Dockerfile` for an image containing all datasets, metrics,
+ conversion tools, and `RUN` steps for obtaining data and running
+ pre-conversion (but stopping before tool execution).
+ - If we use docker for actual workflow execution, then this is what would
+ be used for the test execution
+ - Could all this run on a new instance of debci? or Jenkins?
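+
+Rough sketch of one benchmark expressed as a CWL workflow following the steps
+above; every tool, file, and step name here is a placeholder, not a decision:
+
+```yaml
+#!/usr/bin/env cwl-runner
+# Hypothetical benchmark workflow; all tool/file names are placeholders.
+cwlVersion: v1.0
+class: Workflow
+
+inputs:
+  reads: File          # dataset shipped by a data .deb from our local archive
+  truth: File          # known-good result used by the metric script
+
+outputs:
+  report:
+    type: File
+    outputSource: evaluate/report
+
+steps:
+  pre_convert:         # convert to the tool's input format (EDAM-driven)
+    run: tools/convert-input.cwl
+    in: {data: reads}
+    out: [converted]
+  run_tool:            # the Debian-packaged tool under test
+    run: tools/tool-under-test.cwl
+    in: {input: pre_convert/converted}
+    out: [result]
+  post_convert:        # convert to the metric script's input format
+    run: tools/convert-output.cwl
+    in: {data: run_tool/result}
+    out: [converted]
+  evaluate:            # metric script installed from the benchmarking archive
+    run: tools/evaluate.cwl
+    in: {result: post_convert/converted, truth: truth}
+    out: [report]
+```
+
+Each step's tool description would carry its own `DockerRequirement` and EDAM
+format annotations, which is what would let the pre/post conversion steps be
+chosen automatically.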
+
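+A possible shape for the per-build result report mentioned above ("probably as
+YAML file, or similar"); the field names, metrics, and numbers are invented
+purely for illustration:
+
+```yaml
+# Hypothetical benchmark report for one package build; all values invented.
+package: tool-under-test
+version: 1.2.3-1
+status: ok                     # one of: fail, worse, ok
+results:
+  - tool: tool-under-test
+    operation: read_mapping
+    dataset: giab-subset
+    metrics:
+      precision: 0.993
+      recall: 0.961
+changes:                       # biggest improvement/regression vs previous build
+  best:  {tool: tool-under-test, metric: recall, delta: +0.004}
+  worst: {tool: tool-under-test, metric: precision, delta: -0.001}
+results_tarball:
+  path: results/tool-under-test_1.2.3-1.tar.xz
+  sha256: "<checksum>"
+```
+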
+Requirements
+------------
+
+ - Have EDAM-compatible DebTags
+ - Have a CWL tool description for each tool in the package (sketch after
+ this list)
+ - Should contain EDAM tags per operation
+ - Potentially one CWL tool file per subtool/operation (e.g. `samtools view`
+ vs `samtools sort`)
+ - Be in `main`
+
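+For illustration, a minimal CWL description of one subtool (`samtools sort`);
+the EDAM format IDs and the Docker image are guesses to be checked, not
+settled choices:
+
+```yaml
+#!/usr/bin/env cwl-runner
+# Hypothetical CWL description of `samtools sort`; EDAM IDs and the Docker
+# image are assumptions, not settled choices.
+cwlVersion: v1.0
+class: CommandLineTool
+baseCommand: [samtools, sort]
+
+hints:
+  DockerRequirement:
+    dockerPull: debian:unstable   # or an image built from our generated Dockerfile
+
+inputs:
+  alignments:
+    type: File
+    format: edam:format_2572      # BAM (ID assumed; verify against the ontology)
+    inputBinding: {position: 1}
+
+stdout: sorted.bam
+
+outputs:
+  sorted_alignments:
+    type: File
+    format: edam:format_2572      # BAM
+    outputBinding: {glob: sorted.bam}
+
+$namespaces:
+  edam: http://edamontology.org/
+```
+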
+Operation
+---------
+
+ - New service that runs benchmarks when:
+ - There is a new version of the underlying package in Debian Med unstable
+ - Including any conversion utilities
- There is a change in an applicable script for the calculation of metrics
- There is a change in an applicable benchmark dataset
+ - There is an applicable transition in progress??
+ - Could catch subtle bugs e.g. py3.4 -> py3.5 issues
+
+Schema of ideas
+---------------
+
+ - There may be many tools per package
+ - Each tool may have many benchmarkable operations
+ - Each operation of the tool should be tested by many datasets
+ - Each test should (or may) report more than one metric
+
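+One possible way to write that nesting down (every name below is a
+placeholder):
+
+```yaml
+# Hypothetical metadata entry showing the package -> tool -> operation
+# -> dataset -> metric nesting; all names are placeholders.
+package: samtools
+tools:
+  - name: samtools-sort
+    operations:
+      - name: coordinate_sort
+        datasets: [giab-subset, simulated-reads-v1]
+        metrics: [runtime_seconds, peak_memory_mb, output_identical]
+  - name: samtools-view
+    operations:
+      - name: sam_to_bam
+        datasets: [giab-subset]
+        metrics: [runtime_seconds, records_preserved]
+```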
+
+Possible datasets:
+------------------
+
+ - https://sites.stanford.edu/abms/giab: Genome In A Bottle is a NIST human
+ NGS resequencing dataset (Paper:
+ http://biorxiv.org/content/early/2015/09/15/026468)
+ - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
+ - https://public.etherpad-mozilla.org/p/debian-med-benchmarking
+
+`cat brain | less`
+------------------
The EDAM classification of a tool lives in the DebTags, and the CWL description
of a tool lives within the Debian Med package.
@@ -50,56 +144,28 @@ file format to the metric calculation script's input file format
Autopkgtest?
- - It may be worth investigating using autopkgtest infrastruture (perhaps
- run the service as a debci instance) that runs autopakcage tests:
-
- - each benchmark package contains a test (or tests) in autopkgtest format,
- that we parse & use on our debci
+ - It may be worth investigating using autopkgtest infrastructure (perhaps
+ run the service as a debci instance) that runs autopkgtest tests:
+ - each benchmark package contains a test (or tests) in autopkgtest format,
+ that we parse & use on our debci
Metadata storage:
- - Repository of YAML-style markup parsed into SQL?
- - Debtags?
- - Just write a script?
-
-
-UDD:
-
- - https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo
- - Or, more sanely:
- https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo
+ - Repository of YAML-style markup parsed into SQL?
+ - Debtags?
+ - Just write a script?
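+
+If we go the YAML route, each benchmark could be a small stanza along these
+lines (field and package names are placeholders, not a schema proposal):
+
+```yaml
+# Hypothetical per-benchmark stanza in the metadata repository.
+benchmark: samtools-sort-giab
+package-under-test: samtools                    # from ftp.debian.org/debian
+metric-packages: [benchmark-metric-alignment]   # from our local archive
+dataset-packages: [benchmark-data-giab-subset]  # from our local archive
+workflow: workflows/samtools-sort.cwl
+```
+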
- - New table required for debian benchmarks status data
- - Errors in building a test or major changes in a metric are reported to
- UDD in a fashion similar to how it is done by the ReproducibleBuilds
-system: We report whether the test could be computed at all and the largest
-positive and negative deviations of scores (in any dataset on any metric), plus
-a description of in which dataset and on which metric this deviation has
-occurred
-
-
-
-Architechture:
- - Pre-package and publish to separate local archive (benchmarking specific
- code ONLY):
- - Metric script .debs
- - dataset .debs
- -Tests run in docker containers:
- - Poll YAML or DB for build-deps and datasets
-
- - Install tool and tool deps from ftp.debian.org/debian
-
- - Install required evaluation/metric tools and dataset .debs from our own
- archive
+UDD:
- - Create dockerfile for image from above (saved and published for every
- benchmark)
- - Run container of above image, producing data file results
- - Run evaluation code and report result
- - Delete container and image (keeping Dockerfile)
- - Publish result (either text file [TSV, CSV or YAML], or cgi script to
- pull from DB)
- - Buider pushes status to UDD
+ - [debmed's page](https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - Or, a saner example:
+ [KDM](https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - New table required for Debian benchmark status data
+ - Errors in building a test or major changes in a metric are reported to
+ UDD in a fashion similar to the ReproducibleBuilds system: we report
+ whether the test could be computed at all and the largest positive and
+ negative deviations of scores (in any dataset on any metric), plus a note
+ of which dataset and which metric each deviation occurred in
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/debian-med-benchmarking-spec.git.git