[med-svn] [Git][med-team/grabix][master] 19 commits: Fix watch file

Andreas Tille gitlab at salsa.debian.org
Sun Oct 28 16:05:00 GMT 2018


Andreas Tille pushed to branch master at Debian Med / grabix


Commits:
e3643d43 by Andreas Tille at 2018-10-28T13:13:47Z
Fix watch file

- - - - -
bb2e5a92 by Andreas Tille at 2018-10-28T13:14:46Z
New upstream version 0.1.7
- - - - -
576aa358 by Andreas Tille at 2018-10-28T13:14:58Z
Update upstream source from tag 'upstream/0.1.7'

Update to upstream version '0.1.7'
with Debian dir e534837feed9a481db2d3afc588aaabe81c500ce
- - - - -
c5f7e05c by Andreas Tille at 2018-10-28T13:14:58Z
New upstream version

- - - - -
241cf3a3 by Andreas Tille at 2018-10-28T13:14:59Z
debhelper 11

- - - - -
76185eda by Andreas Tille at 2018-10-28T13:15:05Z
Point Vcs fields to salsa.debian.org

- - - - -
14da50dc by Andreas Tille at 2018-10-28T13:15:05Z
Standards-Version: 4.2.1

- - - - -
2ac39086 by Andreas Tille at 2018-10-28T13:15:11Z
Remove trailing whitespace in debian/control

- - - - -
4285fe26 by Andreas Tille at 2018-10-28T13:15:12Z
Remove trailing whitespace in debian/copyright

- - - - -
bcecce33 by Andreas Tille at 2018-10-28T13:15:12Z
Remove trailing whitespace in debian/rules

- - - - -
b8a411d9 by Andreas Tille at 2018-10-28T13:39:39Z
DEP3

- - - - -
11c5101e by Andreas Tille at 2018-10-28T15:07:10Z
typo

- - - - -
fd50f4b8 by Andreas Tille at 2018-10-28T15:28:27Z
Provide sensible build time test

- - - - -
dd579a48 by Andreas Tille at 2018-10-28T15:37:19Z
Do not confuse users by mentioning binaries from other packages in manpage

- - - - -
9b1e9569 by Andreas Tille at 2018-10-28T15:45:58Z
Add autopkgtest

- - - - -
c8cb30cf by Andreas Tille at 2018-10-28T15:50:33Z
Add unit test as user example

- - - - -
24dd5b41 by Andreas Tille at 2018-10-28T15:56:52Z
Assume that grabix can be found in PATH

- - - - -
aafffa30 by Andreas Tille at 2018-10-28T16:01:18Z
Deal with vcf files in different path

- - - - -
d8aeefaa by Andreas Tille at 2018-10-28T16:03:45Z
Upload to unstable

- - - - -


30 changed files:

- + debian/README.test
- debian/changelog
- debian/compat
- debian/control
- debian/copyright
- debian/docs
- + debian/examples
- debian/grabix.1
- debian/patches/Hardening.patch
- debian/patches/fix_assignment_of_char_to_pointer.patch
- + debian/patches/fix_test.patch
- debian/patches/introduceLTO.patch
- debian/patches/series
- − debian/patches/tests.patch
- debian/patches/warnings.patch
- debian/rules
- + debian/testdata/get
- + debian/testdata/test.PLs.vcf
- + debian/testdata/test.clinvar.vcf
- + debian/testdata/test.cosmic.vcf
- + debian/testdata/test.exac.vcf
- + debian/testdata/test.fusions.vcf
- + debian/testdata/test.sh
- + debian/tests/control
- + debian/tests/run-unit-test
- debian/watch
- grabix.cpp
- grabix.h
- test.sh
- − tests/empty.fastq.gz


Changes:

=====================================
debian/README.test
=====================================
@@ -0,0 +1,8 @@
+Notes on how this package can be tested.
+────────────────────────────────────────
+
+This package can be tested by running the provided test:
+
+    sh run-unit-test
+
+in order to confirm its integrity.


=====================================
debian/changelog
=====================================
@@ -1,3 +1,16 @@
+grabix (0.1.7-1) unstable; urgency=medium
+
+  * Team upload.
+  * Fix watch file
+  * debhelper 11
+  * Point Vcs fields to salsa.debian.org
+  * Standards-Version: 4.2.1
+  * Remove trailing whitespace in debian/control
+  * Remove trailing whitespace in debian/copyright
+  * Remove trailing whitespace in debian/rules
+
+ -- Andreas Tille <tille at debian.org>  Sun, 28 Oct 2018 17:01:35 +0100
+
 grabix (0.1.6+git20171023-1) unstable; urgency=low
 
   [ Steffen Moeller ]


=====================================
debian/compat
=====================================
@@ -1 +1 @@
-10
+11


=====================================
debian/control
=====================================
@@ -3,24 +3,29 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.
 Uploaders: Steffen Moeller <moeller at debian.org>
 Section: science
 Priority: optional
-Build-Depends: debhelper (>= 10),
-               zlib1g-dev
-Standards-Version: 4.1.3
-Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/grabix.git
-Vcs-Git: https://anonscm.debian.org/git/debian-med/grabix.git
+Build-Depends: debhelper (>= 11~),
+               zlib1g-dev,
+               python,
+               tabix,
+               time,
+               less
+Standards-Version: 4.2.1
+Vcs-Browser: https://salsa.debian.org/med-team/grabix
+Vcs-Git: https://salsa.debian.org/med-team/grabix.git
 Homepage: https://github.com/arq5x/grabix
 
 Package: grabix
 Architecture: any
 Depends: ${shlibs:Depends},
-         ${misc:Depends}
+         ${misc:Depends},
+         tabix
 Description: wee tool for random access into BGZF files
  In biomedical research it is increasing practice to study
  the genetic basis of disease. This now frequently comprises
  the sequencing of human sequences. The output of the machine
  however is redundant, and the real sequence is the best
  sequence to explain the redundancy. The exchange of data
- happens only with compressed files - to huge and redundant 
+ happens only with compressed files - to huge and redundant
  to perform otherwise. One should avoid uncompression whenever
  possible.
  .


=====================================
debian/copyright
=====================================
@@ -34,6 +34,6 @@ License: MIT
  OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
- CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, 
- TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
  SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


=====================================
debian/docs
=====================================
@@ -1 +1,2 @@
 README.md
+debian/tests/run-unit-test


=====================================
debian/examples
=====================================
@@ -0,0 +1,2 @@
+tests
+debian/testdata


=====================================
debian/grabix.1
=====================================
@@ -5,9 +5,6 @@
 .SH NAME
 grabix \- random access on large compressed sequence data
 .SH SYNOPSIS
-.B bgzip
-.RI bedfile
-.br
 .B grabix
 index
 .RI bedfile.gz
@@ -30,7 +27,7 @@ possible.
 .PP
 grabix leverages the fantastic BGZF library of the samtools
 package to provide random access into text files that have
-been compressed with bgzip. grabix creates it's own index
+been compressed with bgzip (from tabix package). grabix creates it's own index
 (.gbi) of the bgzipped file. Once indexed, one can extract
 arbitrary lines from the file with the grab command. Or
 choose random lines with the, well, random command.


=====================================
debian/patches/Hardening.patch
=====================================
@@ -1,3 +1,7 @@
+Author: Andreas Tille
+Last-Update: 2017-12-07 20:03:53 +0100
+Description: Propagate hardening options
+
 --- a/Makefile
 +++ b/Makefile
 @@ -1,2 +1,9 @@


=====================================
debian/patches/fix_assignment_of_char_to_pointer.patch
=====================================
@@ -1,3 +1,7 @@
+Author: Steffen Moeller
+Last-Update: 2018-04-27 14:25:12 +0200
+Description: fix assignment of char to pointer
+
 Index: grabix/grabix.cpp
 ===================================================================
 --- grabix.orig/grabix.cpp


=====================================
debian/patches/fix_test.patch
=====================================
@@ -0,0 +1,17 @@
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Sun, 28 Oct 2018 14:13:30 +0100
+Description: Assume that grabix can be found in PATH
+
+--- a/tests/test-fastq.py
++++ b/tests/test-fastq.py
+@@ -10,8 +10,8 @@ print "checking indexing at bounds:"
+ def check(gzname, start, end=None):
+     run = subprocess.check_output
+     exp = lines[start:start+1] if end is None else lines[start:end + 1]
+-    obs = run("./grabix grab %s %d" % (gzname, start), shell=True) if end is None \
+-            else run("./grabix grab %s %d %d" % (gzname, start, end), shell=True)
++    obs = run("grabix grab %s %d" % (gzname, start), shell=True) if end is None \
++            else run("grabix grab %s %d %d" % (gzname, start, end), shell=True)
+     obs = [x.strip() for x in obs.strip().split("\n")]
+     sys.stdout.write(".")
+     sys.stdout.flush()
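
Note on the patch above: with the `./` prefix dropped, the test depends on a PATH lookup to find grabix, and debian/rules puts the build directory first in PATH for the build-time test. A minimal Python sketch of that lookup follows. The helper name `find_tool` and its parameters are hypothetical and not part of grabix or the packaging.

```python
import os
import shutil


def find_tool(name, build_dir, base_path=None):
    """Return the file a PATH lookup would resolve for `name`, or None.

    Hypothetical helper for illustration only. The build directory is
    searched first, mirroring PATH=$(CURDIR):$(PATH) in debian/rules.
    """
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    # Drop empty entries so an empty base path does not imply the cwd.
    entries = [build_dir] + [p for p in base_path.split(os.pathsep) if p]
    return shutil.which(name, path=os.pathsep.join(entries))
```

During a package build this would resolve to the freshly compiled binary in the source tree. In the autopkgtest it would presumably resolve to the installed binary instead, since no build directory is prepended there.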


=====================================
debian/patches/introduceLTO.patch
=====================================
@@ -1,12 +1,14 @@
-Index: grabix/Makefile
-===================================================================
---- grabix.orig/Makefile
-+++ grabix/Makefile
+Author: Steffen Moeller
+Last-Update: 2018-04-27 14:25:12 +0200
+Description: Add lto option
+
+--- a/Makefile
++++ b/Makefile
 @@ -1,5 +1,7 @@
  LDFLAGS+= -lstdc++ -lz
  CFLAGS+=-Wall
 +CFLAGS+= -flto
 +LDFLAGS+= -Wl,-flto
  
- all:	grabix
- 
+ all:
+ 	gcc $(CFLAGS) -o grabix grabix_main.cpp grabix.cpp bgzf.c $(LDFLAGS)


=====================================
debian/patches/series
=====================================
@@ -1,5 +1,5 @@
 fix_assignment_of_char_to_pointer.patch
 Hardening.patch
 warnings.patch
-tests.patch
 introduceLTO.patch
+fix_test.patch


=====================================
debian/patches/tests.patch deleted
=====================================
@@ -1,31 +0,0 @@
-Index: grabix/Makefile
-===================================================================
---- grabix.orig/Makefile
-+++ grabix/Makefile
-@@ -1,9 +1,15 @@
- LDFLAGS+= -lstdc++ -lz
- CFLAGS+=-Wall
- 
--all:
-+all:	grabix
-+
-+grabix:
- 	gcc $(CFLAGS) -o grabix grabix_main.cpp grabix.cpp bgzf.c $(LDFLAGS)
- 
-+test:
-+	bash ./test.sh
-+
- clean:
- 	rm -f grabix
-+	rm -f tests/empty.fastq.gz.gbi
- 
-Index: grabix/test.sh
-===================================================================
---- grabix.orig/test.sh
-+++ grabix/test.sh
-@@ -1,5 +1,3 @@
--make
--
- FQ=test.cnt.gz
- rm -f ${FQ}{,.gbi}
- 


=====================================
debian/patches/warnings.patch
=====================================
@@ -1,3 +1,7 @@
+Author: Steffen Moeller
+Last-Update: 2018-04-27 14:25:12 +0200
+Description: Avoid warnings about wrong type
+
 Index: grabix/grabix.cpp
 ===================================================================
 --- grabix.orig/grabix.cpp


=====================================
debian/rules
=====================================
@@ -8,4 +8,9 @@ DPKG_EXPORT_BUILDFLAGS = 1
 include /usr/share/dpkg/buildflags.mk
 
 %:
-	dh $@ 
+	dh $@
+
+override_dh_auto_test:
+ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
+	PATH=$(CURDIR):$(PATH) sh debian/testdata/test.sh
+endif


=====================================
debian/testdata/get
=====================================
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# Get five VCF files from test suite which have the smallest file size
+
+BASEDOWNLOADURL=https://raw.githubusercontent.com/arq5x/gemini/master/test/
+VCF="test.PLs.vcf
+     test.clinvar.vcf
+     test.cosmic.vcf
+     test.exac.vcf
+     test.fusions.vcf
+    "
+
+for V in $VCF ; do
+    wget --quiet ${BASEDOWNLOADURL}/$V
+done
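
Note on the script above: BASEDOWNLOADURL already ends in a slash and the loop appends another one, so each constructed URL contains `test//`. A minimal Python sketch of the URL construction, first exactly as the shell script concatenates it and then with the base normalised. The names `download_urls` and `normalised_urls` are hypothetical, and whether the server treats both forms alike is not checked here.

```python
BASE_DOWNLOAD_URL = "https://raw.githubusercontent.com/arq5x/gemini/master/test/"
VCF_FILES = [
    "test.PLs.vcf",
    "test.clinvar.vcf",
    "test.cosmic.vcf",
    "test.exac.vcf",
    "test.fusions.vcf",
]


def download_urls(base=BASE_DOWNLOAD_URL, names=VCF_FILES):
    """Build the URLs the way the script does: ${BASEDOWNLOADURL}/$V."""
    return [base + "/" + name for name in names]


def normalised_urls(base=BASE_DOWNLOAD_URL, names=VCF_FILES):
    """Same list, but with a single slash between base and file name."""
    return [base.rstrip("/") + "/" + name for name in names]


if __name__ == "__main__":
    for url in download_urls():
        print(url)
```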


=====================================
debian/testdata/test.PLs.vcf
=====================================
@@ -0,0 +1,43 @@
+##fileformat=VCFv4.1
+##FILTER=<ID=PASS,Description="All filters passed">
+##FILTER=<ID=LowQual,Description="Low quality">
+##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
+##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
+##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
+##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
+##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
+##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
+##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
+##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
+##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
+##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
+##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
+##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score >From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
+##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
+##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
+##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
+##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
+##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
+##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[bam/all.novoalign.rg.sorted.realigned.bam] read_buffer_size=null phone_home=STANDARD gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/home/arq5x/cphg-home/shared/genomes/hg19/bwa/gatk/hg19_gatk.fa nonDeterministicRandomSeed=false disableRandomization=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false defaultBaseQualities=-1 validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null num_threads=20 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false genotype_likelihoods_model=BOTH pcr_error_rate=1.0E-4 computeSLOD=false annotateNDA=false pair_hmm_implementation=ORIGINAL min_base_quality_score=17 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null sample_ploidy=2 min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 min_reference_depth=100 exclude_filtered_reference_sites=false heterozygosity=0.0010 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 alleles=(RodBinding name= source=UNBOUND) max_alternate_alleles=6 contamination_fraction_to_filter=0.05 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT logRemovedReadsFromContaminationFiltering=null exactcallslog=null dbsnp=(RodBinding name= source=UNBOUND) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"
+##reference=file:///home/arq5x/cphg-home/shared/genomes/hg19/bwa/gatk/hg19_gatk.fa
+##bcftools_viewVersion=1.1-82-g4f3a265+htslib-1.1-87-g3c4f33a
+##INFO=<ID=OLD_MULTIALLELIC,Number=1,Type=String,Description="Original chr:pos:ref:alt encoding">
+##INFO=<ID=OLD_VARIANT,Number=1,Type=String,Description="Original chr:pos:ref:alt encoding">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A	B	C	D	E	F	G	H	I	J	K	L
+chr1	13273	.	G	C	1685.64	.	AC=5;AF=0.208;AN=24;BaseQRankSum=-1.06;DP=701;Dels=0;FS=0.887;HaplotypeScore=1.5983;InbreedingCoeff=-0.2652;MLEAC=5;MLEAF=0.208;MQ=25.13;MQ0=20;MQRankSum=6.545;QD=6.99;ReadPosRankSum=1.826	GT:AD:DP:GQ:PL	0/1:44,29:70:99:561,0,317	0/0:56,0:56:99:0,99,1176	0/0:158,0:158:99:0,280,3426	0/0:91,2:91:78:0,78,892	0/1:28,12:38:99:282,0,254	0/1:41,17:56:99:360,0,494	0/0:35,0:35:48:0,48,572	0/0:47,0:47:42:0,42,491	0/0:21,0:21:15:0,15,150	0/1:18,6:23:99:137,0,171	0/0:47,1:47:66:0,66,748	0/1:30,16:44:99:401,0,154
+chr1	13656	.	CAG	C	1065.21	.	AC=10;AF=0.5;AN=20;BaseQRankSum=3.156;DP=227;FS=0;InbreedingCoeff=-0.9974;MLEAC=10;MLEAF=0.5;MQ=15.83;MQ0=0;MQRankSum=2.299;QD=2.45;ReadPosRankSum=1.342	GT:AD:DP:GQ:PL	0/1:5,3:9:99:142,0,251	0/1:6,1:7:34:34,0,311	0/1:16,2:18:57:57,0,803	0/1:5,8:13:99:404,0,238	0/1:2,1:5:46:46,0,51	0/1:1,4:5:19:207,0,19	0/1:2,2:4:77:99,0,77	0/1:1,1:2:49:49,0,49	./.:.:.:.:.	./.:.:.:.:.	0/1:2,1:3:46:46,0,102	0/1:5,1:6:37:37,0,259
+chr1	14464	.	A	T	7711.47	.	AC=7;AF=0.292;AN=24;BaseQRankSum=-0.671;DP=1873;Dels=0;FS=8.549;HaplotypeScore=3.6331;InbreedingCoeff=-0.4118;MLEAC=7;MLEAF=0.292;MQ=28.21;MQ0=70;MQRankSum=12.577;QD=6.92;ReadPosRankSum=4.608	GT:AD:DP:GQ:PL	0/0:245,5:239:99:0,247,2833	0/1:198,51:238:99:744,0,2519	0/1:106,144:238:99:3261,0,959	0/0:227,3:222:99:0,184,2148	0/0:105,1:103:99:0,159,1940	0/1:163,50:203:99:898,0,1851	0/1:64,36:95:99:842,0,192	0/1:47,67:109:45:1397,0,45	0/1:46,6:50:99:103,0,137	0/0:79,3:79:99:0,105,1241	0/1:103,33:130:99:525,0,1103	0/0:90,0:90:69:0,69,840
+chr1	14653	.	C	T	5639.59	.	AC=12;AF=0.5;AN=24;BaseQRankSum=8.24;DP=2657;Dels=0;FS=311.609;HaplotypeScore=2.3028;InbreedingCoeff=-1;MLEAC=12;MLEAF=0.5;MQ=28.88;MQ0=22;MQRankSum=2.317;QD=2.12;ReadPosRankSum=-0.665	GT:AD:DP:GQ:PL	0/1:184,66:238:99:1119,0,3133	0/1:208,42:238:99:438,0,4373	0/1:219,31:238:87:87,0,5113	0/1:194,56:238:99:583,0,3872	0/1:211,37:236:99:383,0,3687	0/1:210,40:238:99:125,0,4458	0/1:142,46:179:99:755,0,1869	0/1:216,34:238:88:88,0,3878	0/1:77,66:136:63:1649,0,63	0/1:126,26:145:99:201,0,2385	0/1:210,40:238:99:161,0,3825	0/1:154,22:168:99:103,0,2315
+chr1	14907	.	A	G	31850.9	.	AC=11;AF=0.458;AN=24;BaseQRankSum=3.666;DP=2992;Dels=0;FS=17.089;HaplotypeScore=3.5152;InbreedingCoeff=-0.8462;MLEAC=11;MLEAF=0.458;MQ=44.71;MQ0=23;MQRankSum=-1.503;QD=11.62;ReadPosRankSum=6.988	GT:AD:DP:GQ:PL	0/1:113,135:236:99:3658,0,2804	0/1:170,80:238:99:1456,0,4749	0/0:218,32:238:65:0,65,6850	0/1:129,120:237:99:3479,0,2750	0/1:82,168:238:99:5081,0,1536	0/1:161,88:237:99:2326,0,4131	0/1:137,113:238:99:3063,0,3536	0/1:120,129:243:99:3505,0,3157	0/1:149,101:238:99:2614,0,3510	0/1:144,106:238:99:2902,0,3511	0/1:207,43:238:99:444,0,5850	0/1:120,127:241:99:3378,0,3064
+chr1	724714	.	A	T	119.73	.	AC=6;AF=0.5;AN=12;BaseQRankSum=-0.529;DP=13;Dels=0;FS=0;HaplotypeScore=0.4999;MLEAC=6;MLEAF=0.5;MQ=52.23;MQ0=0;MQRankSum=0.38;QD=17.1;ReadPosRankSum=-1.309	GT:AD:DP:GQ:PL	1/1:0,2:2:3:45,3,0	./.:.:.:.:.	0/0:4,0:4:6:0,6,90	0/1:1,1:2:36:36,0,36	1/1:0,1:1:3:39,3,0	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	0/1:1,1:2:35:36,0,35	0/0:1,0:1:3:0,3,42
+chr1	724728	.	T	A	168.12	.	AC=7;AF=0.7;AN=10;BaseQRankSum=-1.067;DP=9;Dels=0;FS=3.332;HaplotypeScore=0.3999;MLEAC=7;MLEAF=0.7;MQ=57.14;MQ0=0;MQRankSum=-1.589;QD=24.02;ReadPosRankSum=1.067	GT:AD:DP:GQ:PL	1/1:0,2:2:3:45,3,0	./.:.:.:.:.	./.:.:.:.:.	1/1:0,2:2:6:80,6,0	1/1:0,1:1:3:39,3,0	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	0/1:1,1:2:34:36,0,34	0/0:1,0:1:3:0,3,42
+chr1	724748	.	A	T	53.58	.	AC=4;AF=0.5;AN=8;BaseQRankSum=-0.358;DP=6;Dels=0;FS=0;HaplotypeScore=0.25;MLEAC=4;MLEAF=0.5;MQ=59.23;MQ0=0;MQRankSum=-0.358;QD=17.86;ReadPosRankSum=0.358	GT:AD:DP:GQ:PL	1/1:0,2:2:3:41,3,0	./.:.:.:.:.	./.:.:.:.:.	0/0:1,0:1:3:0,3,41	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	1/1:0,1:1:3:42,3,0	0/0:1,0:1:3:0,3,42
+chr1	724749	.	A	T	53.58	.	AC=4;AF=0.5;AN=8;BaseQRankSum=-0.358;DP=6;Dels=0;FS=0;HaplotypeScore=0.25;MLEAC=4;MLEAF=0.5;MQ=59.23;MQ0=0;MQRankSum=-0.358;QD=17.86;ReadPosRankSum=0.358	GT:AD:DP:GQ:PL	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.	./.:.:.:.:.


=====================================
debian/testdata/test.clinvar.vcf
=====================================
@@ -0,0 +1,79 @@
+##fileformat=VCFv4.0
+##fileStatus=!!!! This is a provisional file !!!!
+##fileDate=20130118
+##source=dbSNP
+##dbSNP_BUILD_ID=137
+##reference=GRCh37.p5
+##phasing=partial
+##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf	
+##INFO=<ID=RSPOS,Number=1,Type=Integer,Description="Chr position reported in dbSNP">
+##INFO=<ID=RV,Number=0,Type=Flag,Description="RS orientation is reversed">
+##INFO=<ID=VP,Number=1,Type=String,Description="Variation Property.  Documentation is at ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf">
+##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id.  The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
+##INFO=<ID=dbSNPBuildID,Number=1,Type=Integer,Description="First dbSNP Build for RS">
+##INFO=<ID=SAO,Number=1,Type=Integer,Description="Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">
+##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes (may be more than one value added together) 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
+##INFO=<ID=GMAF,Number=1,Type=Float,Description="Global Minor Allele Frequency [0, 0.5]; global population is 1000GenomesProject phase 1 genotype data from 629 individuals, released in the 11-23-2012 dataset">
+##INFO=<ID=WGT,Number=1,Type=Integer,Description="Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 or more">
+##INFO=<ID=VC,Number=1,Type=String,Description="Variation Class">
+##INFO=<ID=PM,Number=0,Type=Flag,Description="Variant is Precious(Clinical,Pubmed Cited)">
+##INFO=<ID=TPA,Number=0,Type=Flag,Description="Provisional Third Party Annotation(TPA) (currently rs from PHARMGKB who will give phenotype data)">
+##INFO=<ID=PMC,Number=0,Type=Flag,Description="Links exist to PubMed Central article">
+##INFO=<ID=S3D,Number=0,Type=Flag,Description="Has 3D structure - SNP3D table">
+##INFO=<ID=SLO,Number=0,Type=Flag,Description="Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out">
+##INFO=<ID=NSF,Number=0,Type=Flag,Description="Has non-synonymous frameshift A coding region variation where one allele in the set changes all downstream amino acids. FxnClass = 44">
+##INFO=<ID=NSM,Number=0,Type=Flag,Description="Has non-synonymous missense A coding region variation where one allele in the set changes protein peptide. FxnClass = 42">
+##INFO=<ID=NSN,Number=0,Type=Flag,Description="Has non-synonymous nonsense A coding region variation where one allele in the set changes to STOP codon (TER). FxnClass = 41">
+##INFO=<ID=REF,Number=0,Type=Flag,Description="Has reference A coding region variation where one allele in the set is identical to the reference sequence. FxnCode = 8">
+##INFO=<ID=SYN,Number=0,Type=Flag,Description="Has synonymous A coding region variation where one allele in the set does not change the encoded amino acid. FxnCode = 3">
+##INFO=<ID=U3,Number=0,Type=Flag,Description="In 3' UTR Location is in an untranslated region (UTR). FxnCode = 53">
+##INFO=<ID=U5,Number=0,Type=Flag,Description="In 5' UTR Location is in an untranslated region (UTR). FxnCode = 55">
+##INFO=<ID=ASS,Number=0,Type=Flag,Description="In acceptor splice site FxnCode = 73">
+##INFO=<ID=DSS,Number=0,Type=Flag,Description="In donor splice-site FxnCode = 75">
+##INFO=<ID=INT,Number=0,Type=Flag,Description="In Intron FxnCode = 6">
+##INFO=<ID=R3,Number=0,Type=Flag,Description="In 3' gene region FxnCode = 13">
+##INFO=<ID=R5,Number=0,Type=Flag,Description="In 5' gene region FxnCode = 15">
+##INFO=<ID=OTH,Number=0,Type=Flag,Description="Has other variant with exactly the same set of mapped positions on NCBI refernce assembly.">
+##INFO=<ID=CFL,Number=0,Type=Flag,Description="Has Assembly conflict. This is for weight 1 and 2 variant that maps to different chromosomes on different assemblies.">
+##INFO=<ID=ASP,Number=0,Type=Flag,Description="Is Assembly specific. This is set if the variant only maps to one assembly">
+##INFO=<ID=MUT,Number=0,Type=Flag,Description="Is mutation (journal citation, explicit fact): a low frequency variation that is cited in journal and other reputable sources">
+##INFO=<ID=VLD,Number=0,Type=Flag,Description="Is Validated.  This bit is set if the variant has 2+ minor allele count based on frequency or genotype data.">
+##INFO=<ID=G5A,Number=0,Type=Flag,Description=">5% minor allele frequency in each and all populations">
+##INFO=<ID=G5,Number=0,Type=Flag,Description=">5% minor allele frequency in 1+ populations">
+##INFO=<ID=HD,Number=0,Type=Flag,Description="Marker is on high density genotyping kit (50K density or greater).  The variant may have phenotype associations present in dbGaP.">
+##INFO=<ID=GNO,Number=0,Type=Flag,Description="Genotypes available. The variant has individual genotype (in SubInd table).">
+##INFO=<ID=KGValidated,Number=0,Type=Flag,Description="1000 Genome validated">
+##INFO=<ID=KGPhase1,Number=0,Type=Flag,Description="1000 Genome phase 1 (incl. June Interim phase 1)">
+##INFO=<ID=KGPilot123,Number=0,Type=Flag,Description="1000 Genome discovery all pilots 2010(1,2,3)">
+##INFO=<ID=KGPROD,Number=0,Type=Flag,Description="Has 1000 Genome submission">
+##INFO=<ID=OTHERKG,Number=0,Type=Flag,Description="non-1000 Genome submission">
+##INFO=<ID=PH3,Number=0,Type=Flag,Description="HAP_MAP Phase 3 genotyped: filtered, non-redundant">
+##INFO=<ID=CDA,Number=0,Type=Flag,Description="Variation is interrogated in a clinical diagnostic assay">
+##INFO=<ID=LSD,Number=0,Type=Flag,Description="Submitted from a locus-specific database">
+##INFO=<ID=MTP,Number=0,Type=Flag,Description="Microattribution/third-party annotation(TPA:GWAS,PAGE)">
+##INFO=<ID=OM,Number=0,Type=Flag,Description="Has OMIM/OMIA">
+##INFO=<ID=NOC,Number=0,Type=Flag,Description="Contig allele not present in variant allele list. The reference sequence allele at the mapped position is not present in the variant allele list, adjusted for orientation.">
+##INFO=<ID=WTD,Number=0,Type=Flag,Description="Is Withdrawn by submitter If one member ss is withdrawn by submitter, then this bit is set.  If all member ss' are withdrawn, then the rs is deleted to SNPHistory">
+##INFO=<ID=NOV,Number=0,Type=Flag,Description="Rs cluster has non-overlapping allele sets. True when rs set has more than 2 alleles from different submissions and these sets share no alleles in common.">
+##INFO=<ID=GCF,Number=0,Type=Flag,Description="Has Genotype Conflict Same (rs, ind), different genotype.  N/N is not included.">
+##FILTER=<ID=NC,Description="Inconsistent Genotype Submission For At Least One Sample">
+##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Variant names from HGVS.    The order of these variants corresponds to the order of the info in the other clinical  INFO tags.">
+##INFO=<ID=CLNALLE,Number=.,Type=Integer,Description="Variant alleles from REF or ALT columns.  0 is REF, 1 is the first ALT allele, etc.  This is used to match alleles with other corresponding clinical (CLN) INFO tags.  A value of -1 indicates that no allele was found to match a corresponding HGVS allele name.">
+##INFO=<ID=CLNSRC,Number=.,Type=String,Description="Variant Clinical Chanels">
+##INFO=<ID=CLNORIGIN,Number=.,Type=String,Description="Allele Origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
+##INFO=<ID=CLNSRCID,Number=.,Type=String,Description="Variant Clinical Channel IDs">
+##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Variant Clinical Significance, 0 - unknown, 1 - untested, 2 - non-pathogenic, 3 - probable-non-pathogenic, 4 - probable-pathogenic, 5 - pathogenic, 6 - drug-response, 7 - histocompatibility, 255 - other">
+##INFO=<ID=CLNDSDB,Number=.,Type=String,Description="Variant disease database name">
+##INFO=<ID=CLNDSDBID,Number=.,Type=String,Description="Variant disease database ID">
+##INFO=<ID=CLNDBN,Number=.,Type=String,Description="Variant disease name">
+##INFO=<ID=CLNACC,Number=.,Type=String,Description="Variant Accession and Versions">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	JOE
+1	985955	rs199476396	G	C	.	.	RS=199476396;RSPOS=985955;dbSNPBuildID=136;SSR=0;SAO=1;VP=0x050260000000000002110100;GENEINFO=AGRN:375790;WGT=0;VC=SNV;PM;S3D;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.985955G>C;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=103320.0001;CLNSIG=5;CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet;CLNDSDBID=NBK1168:C1850792:254300:590;CLNDBN=Myasthenia\x2c_limb-girdle\x2c_familial;CLNACC=RCV000019902.26	GT	0/0
+1	1199489	rs207460006	G	A	.	.	RSPOS=1199489;dbSNPBuildID=136;SSR=0;SAO=0;VP=050060080000000002110100;GENEINFO=UBE2J2:118424;WGT=0;VC=SNV;PM;INT;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1199489G>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.	GT	0/0
+1	1959699	rs41307846	G	A	.	.	RS=41307846;RSPOS=1959699;dbSNPBuildID=127;SSR=0;SAO=1;VP=0x050260000000040116110100;GENEINFO=GABRD:2563;WGT=0;VC=SNV;PM;S3D;VLD;GNO;KGPhase1;KGPROD;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1959699G>A;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=137163.0002;CLNSIG=255|255|255;CLNDSDB=MedGen|MedGen|MedGen:OMIM;CLNDSDBID=C3150401|CN043549|C2751603:613060;CLNDBN=Generalized_epilepsy_with_febrile_seizures_plus_type_5|Epilepsy\x2c_juvenile_myoclonic_7|Epilepsy\x2c_idiopathic_generalized_10;CLNACC=RCV000017599.1|RCV000017600.1|RCV000022558.1;CAF=[0.9904,0.009642];COMMON=1	GT	0/0
+1	161276553	rs121913599	G	T	.	.	RS=121913599;RSPOS=161276553;RV;dbSNPBuildID=133;SSR=0;SAO=1;VP=0x050260000000000002110100;GENEINFO=MPZ:4359;WGT=0;VC=SNV;PM;S3D;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.161276553G>T;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=159440.0021;CLNSIG=5;CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0205713:180800:45853006;CLNDBN=Roussy-Lévy_syndrome;CLNACC=RCV000015250.24	GT	0/0
+1	235891375	rs80338665	CCGAACTTTCAACTGTATCAGAAGCATTATCTTCCACAAAATACATTCCACATTTAC	C	.	.	RS=80338665;RSPOS=235891376;RV;dbSNPBuildID=131;SSR=0;SAO=0;VP=0x050360000000000002110200;GENEINFO=LYST:1130;WGT=0;VC=DIV;PM;S3D;SLO;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.235891376_235891431del56;CLNSRC=GeneReviews;CLNORIGIN=0;CLNSRCID=NBK5188;CLNSIG=5;CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet:SNOMED_CT;CLNDSDBID=NBK5188:C0007965:214500:167:111396008;CLNDBN=Chédiak-Higashi_syndrome;CLNACC=RCV000033871.2	GT	0/0
+1	247587093	rs180177455	C	T	.	.	RS=180177455;RSPOS=247587093;dbSNPBuildID=135;SSR=0;SAO=0;VP=0x050160000000000002110100;GENEINFO=NLRP3:114548;WGT=0;VC=SNV;PM;SLO;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.247587093C>T;CLNSRC=Unité_médicale_des_maladies_autoinflammatoires;CLNORIGIN=.;CLNSRCID=363;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:SNOMED_CT;CLNDSDBID=C0343068:120100:47045:238687000;CLNDBN=Familial_cold_urticaria;CLNACC=RCV000084222.1	GT	0/0
+3	33063141	rs397515614	T	A	.	.	RS=397515614;RSPOS=33063141;RV;dbSNPBuildID=136;SSR=0;SAO=3;VP=0x050060000000000002110120;GENEINFO=GLB1:2720;WGT=0;VC=SNV;PM;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000003.11:g.33063141T>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:Orphanet:SNOMED_CT;CLNDSDBID=C0268272:230600:354:79256:18756002;CLNDBN=Juvenile_GM>1<_gangliosidosis;CLNACC=RCV000056404.1	GT	0/0
+X	31200832	rs3833412	A	AT	.	.	RS=3833412;RSPOS=31200832;RV;dbSNPBuildID=107;SSR=0;SAO=0;VP=0x050160080005000002100200;GENEINFO=DMD:1756;WGT=1;VC=DIV;PM;SLO;INT;ASP;OTHERKG;LSD;CLNALLE=0,1;CLNHGVS=NC_000023.10:g.31200833delT,NC_000023.10:g.31200833dupT;CLNSRC=.,.;CLNORIGIN=1,1;CLNSRCID=.,.;CLNSIG=2,2;CLNDSDB=.,.;CLNDSDBID=.,.;CLNDBN=AllHighlyPenetrant,AllHighlyPenetrant;CLNACC=RCV000080866.1,RCV000080867.1	GT	0/0
+X	89963314	rs2756884	T	G	.	.	RS=397515614;RSPOS=33063141;RV;dbSNPBuildID=136;SSR=0;SAO=3;VP=0x050060000000000002110120;GENEINFO=GLB1:2720;WGT=0;VC=SNV;PM;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000003.11:g.33063141T>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:Orphanet:SNOMED_CT;CLNDSDBID=C0268272:230600:354:79256:18756002;CLNDBN=Juvenile_GM>1<_gangliosidosis;CLNACC=RCV000056404.1	GT	0/0


=====================================
debian/testdata/test.cosmic.vcf
=====================================
@@ -0,0 +1,59 @@
+##fileformat=VCFv4.1
+##fileDate=20140729
+##source=freeBayes v0.9.14-17-g7696787-dirty
+##reference=/home/udp3f/cphg-home/udp_new/hg19_gatk.notag.fa
+##phasing=none
+##commandline="/home/udp3f/cphg-home/bin/freebayes -f /home/udp3f/cphg-home/udp_new/hg19_gatk.notag.fa --region 1 -b C12-8149.bwamem.sort.dedup.abra.realigned.sorted.bam S07-3816.bwamem.sort.dedup.abra.realigned.sorted.bam S12-13382.bwamem.sort.dedup.abra.realigned.sorted.bam S95-11305.bwamem.sort.dedup.abra.realigned.sorted.bam S95-4467.bwamem.sort.dedup.abra.realigned.sorted.bam --pooled-discrete --pooled-continuous --min-alternate-fraction 0.1 --genotype-qualities --report-genotype-likelihood-max --standard-filters --min-coverage 5 -v /m/cphg-quinlan3/cphg-quinlan3/udp/FM/abra/var/fm.1.minaaf.pc.vcf"
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
+##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
+##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
+##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
+##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
+##INFO=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
+##INFO=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
+##INFO=<ID=PRO,Number=1,Type=Float,Description="Reference allele observation count, with partial observations recorded fractionally">
+##INFO=<ID=PAO,Number=A,Type=Float,Description="Alternate allele observations, with partial observations recorded fractionally">
+##INFO=<ID=QR,Number=1,Type=Integer,Description="Reference allele quality sum in phred">
+##INFO=<ID=QA,Number=A,Type=Integer,Description="Alternate allele quality sum in phred">
+##INFO=<ID=PQR,Number=1,Type=Float,Description="Reference allele quality sum in phred for partial observations">
+##INFO=<ID=PQA,Number=A,Type=Float,Description="Alternate allele quality sum in phred for partial observations">
+##INFO=<ID=SRF,Number=1,Type=Integer,Description="Number of reference observations on the forward strand">
+##INFO=<ID=SRR,Number=1,Type=Integer,Description="Number of reference observations on the reverse strand">
+##INFO=<ID=SAF,Number=A,Type=Integer,Description="Number of alternate observations on the forward strand">
+##INFO=<ID=SAR,Number=A,Type=Integer,Description="Number of alternate observations on the reverse strand">
+##INFO=<ID=SRP,Number=1,Type=Float,Description="Strand balance probability for the reference allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SRF and SRR given E(SRF/SRR) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=SAP,Number=A,Type=Float,Description="Strand balance probability for the alternate allele: Phred-scaled upper-bounds estimate of the probability of observing the deviation between SAF and SAR given E(SAF/SAR) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous">
+##INFO=<ID=ABP,Number=A,Type=Float,Description="Allele balance probability at heterozygous sites: Phred-scaled upper-bounds estimate of the probability of observing the deviation between ABR and ABA given E(ABR/ABA) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=RUN,Number=A,Type=Integer,Description="Run length: the number of consecutive repeats of the alternate allele in the reference genome">
+##INFO=<ID=RPP,Number=A,Type=Float,Description="Read Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=RPPR,Number=1,Type=Float,Description="Read Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between RPL and RPR given E(RPL/RPR) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=EPP,Number=A,Type=Float,Description="End Placement Probability: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=EPPR,Number=1,Type=Float,Description="End Placement Probability for reference observations: Phred-scaled upper-bounds estimate of the probability of observing the deviation between EL and ER given E(EL/ER) ~ 0.5, derived using Hoeffding's inequality">
+##INFO=<ID=DPRA,Number=A,Type=Float,Description="Alternate allele depth ratio.  Ratio between depth in samples with each called alternate allele and those without.">
+##INFO=<ID=ODDS,Number=1,Type=Float,Description="The log odds ratio of the best genotype combination to the second-best.">
+##INFO=<ID=GTI,Number=1,Type=Integer,Description="Number of genotyping iterations required to reach convergence or bailout.">
+##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
+##INFO=<ID=CIGAR,Number=A,Type=String,Description="The extended CIGAR representation of each alternate allele, with the exception that '=' is replaced by 'M' to ease VCF parsing.  Note that INDEL alleles do not have the first matched base (which is provided by default, per the spec) referred to by the CIGAR.">
+##INFO=<ID=NUMALT,Number=1,Type=Integer,Description="Number of unique non-reference alleles in called genotypes at this position.">
+##INFO=<ID=MEANALT,Number=A,Type=Float,Description="Mean number of unique non-reference allele observations per sample with the corresponding alternate alleles.">
+##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
+##INFO=<ID=MQM,Number=A,Type=Float,Description="Mean mapping quality of observed alternate alleles">
+##INFO=<ID=MQMR,Number=1,Type=Float,Description="Mean mapping quality of observed reference alleles">
+##INFO=<ID=PAIRED,Number=A,Type=Float,Description="Proportion of observed alternate alleles which are supported by properly paired read fragments">
+##INFO=<ID=PAIREDR,Number=1,Type=Float,Description="Proportion of observed reference alleles which are supported by properly paired read fragments">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
+##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
+##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
+##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
+##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
+##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
+##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE|ALLELE_NUM">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	C12	S07	S12	S95	S97
+1	10379	.	C	G	31.9382	.	AB=0;ABP=0;AC=2;AF=0.2;AN=10;AO=2;CIGAR=1X;DP=10;DPB=10;DPRA=1;EPP=7.35324;EPPR=20.3821;GTI=0;LEN=1;MEANALT=1;MQM=40;MQMR=39.5;NS=5;NUMALT=1;ODDS=1.2224;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=78;QR=294;RO=8;RPP=7.35324;RPPR=20.3821;RUN=1;SAF=2;SAP=7.35324;SAR=0;SRF=8;SRP=20.3821;SRR=0;TYPE=snp;CSQ=upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000456328|||||processed_transcript|1,downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000488147|||||unprocessed_pseudogene|1,downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000541675|||||unprocessed_pseudogene|1,upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000450305|||||transcribed_unprocessed_pseudogene|1,upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000515242|||||transcribed_unprocessed_pseudogene|1,downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000538476|||||unprocessed_pseudogene|1,upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000518655|||||transcribed_unprocessed_pseudogene|1,downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000438504|||||unprocessed_pseudogene|1,downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000423562|||||unprocessed_pseudogene|1	GT:GQ:DP:RO:QR:AO:QA:GL	1/1:8.57144:2:0:0:2:78:-7.41,-0.60206,0	0/0:11.4604:2:2:82:0:0:0,-0.60206,-7.79	0/0:17.481:4:4:141:0:0:0,-1.20412,-10	0/0:8.4334:1:1:34:0:0:0,-0.30103,-3.4	0/0:8.44172:1:1:37:0:0:0,-0.30103,-3.7
+1	120612002	.	CGG	C	1349.07	.	AB=0.0937907;ABP=3303.72;AC=5;AF=0.5;AN=10;AO=216;CIGAR=1M2D1M;DP=2303;DPB=2347.75;DPRA=0;EPP=32.3252;EPPR=116.747;GTI=3;LEN=2;MEANALT=5.2;MQM=46.912;MQMR=47.8155;NS=5;NUMALT=1;ODDS=15.5839;PAIRED=0.962963;PAIREDR=0.998539;PAO=85;PQA=2094.5;PQR=6741.5;PRO=223;QA=7908;QR=77049;RO=2054;RPP=70.6074;RPPR=346.492;RUN=1;SAF=124;SAP=13.3047;SAR=92;SRF=881;SRP=93.1507;SRR=1173;TYPE=del;CSQ=non_coding_exon_variant&nc_transcript_variant&feature_truncation|||ENSG00000134250|NOTCH2|ENST00000479412|1/14||||retained_intron|1,frameshift_variant&feature_truncation|||ENSG00000134250|NOTCH2|ENST00000256646|1/34|||6/2471|protein_coding|1	GT:GQ:DP:RO:QR:AO:QA:GL	0/1:0:433:376:14215:50:1789:-10,0,-10	0/1:-0:429:391:14775:31:1116:-10,0,-10	0/1:7.40941e-07:449:402:14845:43:1568:-10,0,-10	0/1:0:562:487:18208:64:2375:-10,0,-10	0/1:-0:430:398:15006:28:1060:-10,0,-10
+5	112175901	.	CAG	C	10246.9	.	AB=0.627329;ABP=71.0273;AC=2;AF=0.2;AN=10;AO=303;CIGAR=1M2D1M;DP=1434;DPB=1387.25;DPRA=0.76183;EPP=3.18946;EPPR=3.78033;GTI=0;LEN=2;MEANALT=1;MQM=59.9406;MQMR=59.9379;NS=5;NUMALT=1;ODDS=193.008;PAIRED=0.993399;PAIREDR=0.988475;PAO=80.5;PQA=2624.5;PQR=4594.5;PRO=132.5;QA=11692;QR=43795;RO=1128;RPP=6.17076;RPPR=16.5936;RUN=1;SAF=136;SAP=9.89738;SAR=167;SRF=546;SRP=5.50518;SRR=582;TYPE=del;CSQ=downstream_gene_variant|||ENSG00000134982|APC|ENST00000502371|||||retained_intron|1,frameshift_variant&feature_truncation|||ENSG00000134982|APC|ENST00000257430|16/16|||1537-1538/2843|protein_coding|1,downstream_gene_variant|||ENSG00000134982|APC|ENST00000514164|||||retained_intron|1,intron_variant&NMD_transcript_variant&feature_truncation|||ENSG00000258864|CTC-554D6.1|ENST00000520401||||-/94|nonsense_mediated_decay|1,frameshift_variant&feature_truncation|||ENSG00000134982|APC|ENST00000457016|16/16|||1537-1538/2843|protein_coding|1,downstream_gene_variant|||ENSG00000134982|APC|ENST00000507379||||-/1135|protein_coding|1,frameshift_variant&feature_truncation|||ENSG00000134982|APC|ENST00000508376|17/17|||1537-1538/2843|protein_coding|1,downstream_gene_variant|||ENSG00000134982|APC|ENST00000512211||||-/1305|protein_coding|1,3_prime_UTR_variant&NMD_transcript_variant&feature_truncation|||ENSG00000134982|APC|ENST00000508624|17/17|||-/287|nonsense_mediated_decay|1,downstream_gene_variant|||ENSG00000134982|APC|ENST00000504915||||-/272|protein_coding|1	GT:GQ:DP:RO:QR:AO:QA:GL	0/0:0:281:280:10881:0:0:0,-10,-10	0/1:0:254:130:5069:124:4800:-10,0,-10	0/1:0:229:50:1928:179:6892:-10,0,-10	0/0:0:239:238:9198:0:0:0,-10,-10	0/0:0:431:430:16719:0:0:0,-10,-10
+9	139277994	.	CGCT	C	320.337	.	AB=0.216216;ABP=54.7735;AC=4;AF=0.4;AN=10;AO=17;CIGAR=1M3D25M;DP=93;DPB=220.931;DPRA=0;EPP=4.1599;EPPR=4.1599;GTI=0;LEN=3;MEANALT=2.6;MQM=55.2353;MQMR=59.75;NS=5;NUMALT=1;ODDS=7.30364;PAIRED=1;PAIREDR=0.985294;PAO=112.5;PQA=3983;PQR=4718;PRO=131.5;QA=585;QR=2617;RO=68;RPP=4.1599;RPPR=4.1599;RUN=1;SAF=7;SAP=4.1599;SAR=10;SRF=38;SRP=5.05404;SRR=30;TYPE=del;CSQ=inframe_deletion|ggGAGCAGCAGCAGCAGCAGCAGCAGCAGC/ggGAGCAGCAGCAGCAGCAGCAGCAGC|GSSSSSSSSS/GSSSSSSSS|ENSG00000165684|SNAPC4|ENST00000298532|15/23|||533-542/1469|protein_coding|1	GT:GQ:DP:RO:QR:AO:QA:GL	0/1:106.695:6:2:80:4:150:-10,0,-6.96991	0/1:42.1067:27:22:839:4:108:-5.97186,0,-10	0/1:31.7227:14:10:378:2:78:-5.23218,0,-10	0/1:137.807:27:20:772:6:217:-10,0,-10	0/0:42.8462:19:14:548:1:32:0,-0.472859,-10


=====================================
debian/testdata/test.exac.vcf
=====================================
@@ -0,0 +1,78 @@
+##fileformat=VCFv4.0
+##fileStatus=!!!! This is a provisional file !!!!
+##fileDate=20130118
+##source=dbSNP
+##dbSNP_BUILD_ID=137
+##reference=GRCh37.p5
+##phasing=partial
+##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf	
+##INFO=<ID=RSPOS,Number=1,Type=Integer,Description="Chr position reported in dbSNP">
+##INFO=<ID=RV,Number=0,Type=Flag,Description="RS orientation is reversed">
+##INFO=<ID=VP,Number=1,Type=String,Description="Variation Property.  Documentation is at ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf">
+##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id.  The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
+##INFO=<ID=dbSNPBuildID,Number=1,Type=Integer,Description="First dbSNP Build for RS">
+##INFO=<ID=SAO,Number=1,Type=Integer,Description="Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">
+##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes (may be more than one value added together) 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
+##INFO=<ID=GMAF,Number=1,Type=Float,Description="Global Minor Allele Frequency [0, 0.5]; global population is 1000GenomesProject phase 1 genotype data from 629 individuals, released in the 11-23-2012 dataset">
+##INFO=<ID=WGT,Number=1,Type=Integer,Description="Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 or more">
+##INFO=<ID=VC,Number=1,Type=String,Description="Variation Class">
+##INFO=<ID=PM,Number=0,Type=Flag,Description="Variant is Precious(Clinical,Pubmed Cited)">
+##INFO=<ID=TPA,Number=0,Type=Flag,Description="Provisional Third Party Annotation(TPA) (currently rs from PHARMGKB who will give phenotype data)">
+##INFO=<ID=PMC,Number=0,Type=Flag,Description="Links exist to PubMed Central article">
+##INFO=<ID=S3D,Number=0,Type=Flag,Description="Has 3D structure - SNP3D table">
+##INFO=<ID=SLO,Number=0,Type=Flag,Description="Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out">
+##INFO=<ID=NSF,Number=0,Type=Flag,Description="Has non-synonymous frameshift A coding region variation where one allele in the set changes all downstream amino acids. FxnClass = 44">
+##INFO=<ID=NSM,Number=0,Type=Flag,Description="Has non-synonymous missense A coding region variation where one allele in the set changes protein peptide. FxnClass = 42">
+##INFO=<ID=NSN,Number=0,Type=Flag,Description="Has non-synonymous nonsense A coding region variation where one allele in the set changes to STOP codon (TER). FxnClass = 41">
+##INFO=<ID=REF,Number=0,Type=Flag,Description="Has reference A coding region variation where one allele in the set is identical to the reference sequence. FxnCode = 8">
+##INFO=<ID=SYN,Number=0,Type=Flag,Description="Has synonymous A coding region variation where one allele in the set does not change the encoded amino acid. FxnCode = 3">
+##INFO=<ID=U3,Number=0,Type=Flag,Description="In 3' UTR Location is in an untranslated region (UTR). FxnCode = 53">
+##INFO=<ID=U5,Number=0,Type=Flag,Description="In 5' UTR Location is in an untranslated region (UTR). FxnCode = 55">
+##INFO=<ID=ASS,Number=0,Type=Flag,Description="In acceptor splice site FxnCode = 73">
+##INFO=<ID=DSS,Number=0,Type=Flag,Description="In donor splice-site FxnCode = 75">
+##INFO=<ID=INT,Number=0,Type=Flag,Description="In Intron FxnCode = 6">
+##INFO=<ID=R3,Number=0,Type=Flag,Description="In 3' gene region FxnCode = 13">
+##INFO=<ID=R5,Number=0,Type=Flag,Description="In 5' gene region FxnCode = 15">
+##INFO=<ID=OTH,Number=0,Type=Flag,Description="Has other variant with exactly the same set of mapped positions on NCBI refernce assembly.">
+##INFO=<ID=CFL,Number=0,Type=Flag,Description="Has Assembly conflict. This is for weight 1 and 2 variant that maps to different chromosomes on different assemblies.">
+##INFO=<ID=ASP,Number=0,Type=Flag,Description="Is Assembly specific. This is set if the variant only maps to one assembly">
+##INFO=<ID=MUT,Number=0,Type=Flag,Description="Is mutation (journal citation, explicit fact): a low frequency variation that is cited in journal and other reputable sources">
+##INFO=<ID=VLD,Number=0,Type=Flag,Description="Is Validated.  This bit is set if the variant has 2+ minor allele count based on frequency or genotype data.">
+##INFO=<ID=G5A,Number=0,Type=Flag,Description=">5% minor allele frequency in each and all populations">
+##INFO=<ID=G5,Number=0,Type=Flag,Description=">5% minor allele frequency in 1+ populations">
+##INFO=<ID=HD,Number=0,Type=Flag,Description="Marker is on high density genotyping kit (50K density or greater).  The variant may have phenotype associations present in dbGaP.">
+##INFO=<ID=GNO,Number=0,Type=Flag,Description="Genotypes available. The variant has individual genotype (in SubInd table).">
+##INFO=<ID=KGValidated,Number=0,Type=Flag,Description="1000 Genome validated">
+##INFO=<ID=KGPhase1,Number=0,Type=Flag,Description="1000 Genome phase 1 (incl. June Interim phase 1)">
+##INFO=<ID=KGPilot123,Number=0,Type=Flag,Description="1000 Genome discovery all pilots 2010(1,2,3)">
+##INFO=<ID=KGPROD,Number=0,Type=Flag,Description="Has 1000 Genome submission">
+##INFO=<ID=OTHERKG,Number=0,Type=Flag,Description="non-1000 Genome submission">
+##INFO=<ID=PH3,Number=0,Type=Flag,Description="HAP_MAP Phase 3 genotyped: filtered, non-redundant">
+##INFO=<ID=CDA,Number=0,Type=Flag,Description="Variation is interrogated in a clinical diagnostic assay">
+##INFO=<ID=LSD,Number=0,Type=Flag,Description="Submitted from a locus-specific database">
+##INFO=<ID=MTP,Number=0,Type=Flag,Description="Microattribution/third-party annotation(TPA:GWAS,PAGE)">
+##INFO=<ID=OM,Number=0,Type=Flag,Description="Has OMIM/OMIA">
+##INFO=<ID=NOC,Number=0,Type=Flag,Description="Contig allele not present in variant allele list. The reference sequence allele at the mapped position is not present in the variant allele list, adjusted for orientation.">
+##INFO=<ID=WTD,Number=0,Type=Flag,Description="Is Withdrawn by submitter If one member ss is withdrawn by submitter, then this bit is set.  If all member ss' are withdrawn, then the rs is deleted to SNPHistory">
+##INFO=<ID=NOV,Number=0,Type=Flag,Description="Rs cluster has non-overlapping allele sets. True when rs set has more than 2 alleles from different submissions and these sets share no alleles in common.">
+##INFO=<ID=GCF,Number=0,Type=Flag,Description="Has Genotype Conflict Same (rs, ind), different genotype.  N/N is not included.">
+##FILTER=<ID=NC,Description="Inconsistent Genotype Submission For At Least One Sample">
+##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Variant names from HGVS.    The order of these variants corresponds to the order of the info in the other clinical  INFO tags.">
+##INFO=<ID=CLNALLE,Number=.,Type=Integer,Description="Variant alleles from REF or ALT columns.  0 is REF, 1 is the first ALT allele, etc.  This is used to match alleles with other corresponding clinical (CLN) INFO tags.  A value of -1 indicates that no allele was found to match a corresponding HGVS allele name.">
+##INFO=<ID=CLNSRC,Number=.,Type=String,Description="Variant Clinical Chanels">
+##INFO=<ID=CLNORIGIN,Number=.,Type=String,Description="Allele Origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
+##INFO=<ID=CLNSRCID,Number=.,Type=String,Description="Variant Clinical Channel IDs">
+##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Variant Clinical Significance, 0 - unknown, 1 - untested, 2 - non-pathogenic, 3 - probable-non-pathogenic, 4 - probable-pathogenic, 5 - pathogenic, 6 - drug-response, 7 - histocompatibility, 255 - other">
+##INFO=<ID=CLNDSDB,Number=.,Type=String,Description="Variant disease database name">
+##INFO=<ID=CLNDSDBID,Number=.,Type=String,Description="Variant disease database ID">
+##INFO=<ID=CLNDBN,Number=.,Type=String,Description="Variant disease name">
+##INFO=<ID=CLNACC,Number=.,Type=String,Description="Variant Accession and Versions">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	JOE
+1	985955	rs199476396	G	C	.	.	RS=199476396;RSPOS=985955;dbSNPBuildID=136;SSR=0;SAO=1;VP=0x050260000000000002110100;GENEINFO=AGRN:375790;WGT=0;VC=SNV;PM;S3D;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.985955G>C;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=103320.0001;CLNSIG=5;CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet;CLNDSDBID=NBK1168:C1850792:254300:590;CLNDBN=Myasthenia\x2c_limb-girdle\x2c_familial;CLNACC=RCV000019902.26	GT	0/0
+1	1199489	rs207460006	G	A	.	.	RSPOS=1199489;dbSNPBuildID=136;SSR=0;SAO=0;VP=050060080000000002110100;GENEINFO=UBE2J2:118424;WGT=0;VC=SNV;PM;INT;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1199489G>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.	GT	0/0
+1	1959699	rs41307846	G	A	.	.	RS=41307846;RSPOS=1959699;dbSNPBuildID=127;SSR=0;SAO=1;VP=0x050260000000040116110100;GENEINFO=GABRD:2563;WGT=0;VC=SNV;PM;S3D;VLD;GNO;KGPhase1;KGPROD;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1959699G>A;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=137163.0002;CLNSIG=255|255|255;CLNDSDB=MedGen|MedGen|MedGen:OMIM;CLNDSDBID=C3150401|CN043549|C2751603:613060;CLNDBN=Generalized_epilepsy_with_febrile_seizures_plus_type_5|Epilepsy\x2c_juvenile_myoclonic_7|Epilepsy\x2c_idiopathic_generalized_10;CLNACC=RCV000017599.1|RCV000017600.1|RCV000022558.1;CAF=[0.9904,0.009642];COMMON=1	GT	0/0
+1	161276553	rs121913599	G	T	.	.	RS=121913599;RSPOS=161276553;RV;dbSNPBuildID=133;SSR=0;SAO=1;VP=0x050260000000000002110100;GENEINFO=MPZ:4359;WGT=0;VC=SNV;PM;S3D;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.161276553G>T;CLNSRC=OMIM_Allelic_Variant;CLNORIGIN=1;CLNSRCID=159440.0021;CLNSIG=5;CLNDSDB=MedGen:OMIM:SNOMED_CT;CLNDSDBID=C0205713:180800:45853006;CLNDBN=Roussy-Lévy_syndrome;CLNACC=RCV000015250.24	GT	0/0
+1	247587093	rs180177455	C	T	.	.	RS=180177455;RSPOS=247587093;dbSNPBuildID=135;SSR=0;SAO=0;VP=0x050160000000000002110100;GENEINFO=NLRP3:114548;WGT=0;VC=SNV;PM;SLO;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.247587093C>T;CLNSRC=Unité_médicale_des_maladies_autoinflammatoires;CLNORIGIN=.;CLNSRCID=363;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:SNOMED_CT;CLNDSDBID=C0343068:120100:47045:238687000;CLNDBN=Familial_cold_urticaria;CLNACC=RCV000084222.1	GT	0/0
+9	112185056	rs397515614	C	G	.	.	RS=397515614;RSPOS=33063141;RV;dbSNPBuildID=136;SSR=0;SAO=3;VP=0x050060000000000002110120;GENEINFO=GLB1:2720;WGT=0;VC=SNV;PM;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000003.11:g.33063141T>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:Orphanet:SNOMED_CT;CLNDSDBID=C0268272:230600:354:79256:18756002;CLNDBN=Juvenile_GM>1<_gangliosidosis;CLNACC=RCV000056404.1	GT	0/0
+12	121432117	rs3833412	GC	G	.	.	RS=3833412;RSPOS=31200832;RV;dbSNPBuildID=107;SSR=0;SAO=0;VP=0x050160080005000002100200;GENEINFO=DMD:1756;WGT=1;VC=DIV;PM;SLO;INT;ASP;OTHERKG;LSD;CLNALLE=0,1;CLNHGVS=NC_000023.10:g.31200833delT,NC_000023.10:g.31200833dupT;CLNSRC=.,.;CLNORIGIN=1,1;CLNSRCID=.,.;CLNSIG=2,2;CLNDSDB=.,.;CLNDSDBID=.,.;CLNDBN=AllHighlyPenetrant,AllHighlyPenetrant;CLNACC=RCV000080866.1,RCV000080867.1	GT	0/0
+14	105420590	rs2756884	C	T	.	.	RS=397515614;RSPOS=33063141;RV;dbSNPBuildID=136;SSR=0;SAO=3;VP=0x050060000000000002110120;GENEINFO=GLB1:2720;WGT=0;VC=SNV;PM;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000003.11:g.33063141T>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen:OMIM:Orphanet:Orphanet:SNOMED_CT;CLNDSDBID=C0268272:230600:354:79256:18756002;CLNDBN=Juvenile_GM>1<_gangliosidosis;CLNACC=RCV000056404.1	GT	0/0


=====================================
debian/testdata/test.fusions.vcf
=====================================
@@ -0,0 +1,57 @@
+##fileformat=VCFv4.1
+##fileDate=20141003
+##reference=/shared/genomes/b37/full/human_g1k_v37.fasta
+##INFO=<ID=TOOL,Number=1,Type=String,Description="Tool used to generate variant call">
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
+##INFO=<ID=STR,Number=.,Type=String,Description="Strand orientation of the adjacency in BEDPE format">
+##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
+##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
+##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
+##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
+##INFO=<ID=PARID,Number=1,Type=String,Description="ID of partner breakend">
+##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
+##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
+##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
+##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
+##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
+##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency, for each ALT allele, in the same order as listed">
+##INFO=<ID=AN,Number=1,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
+##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
+##INFO=<ID=SUP,Number=.,Type=Integer,Description="Number of pieces of evidence supporting the variant across all samples">
+##INFO=<ID=PESUP,Number=.,Type=Integer,Description="Number of paired-end reads supporting the variant across all samples">
+##INFO=<ID=SRSUP,Number=.,Type=Integer,Description="Number of split reads supporting the variant across all samples">
+##INFO=<ID=EVTYPE,Number=.,Type=String,Description="Type of LUMPY evidence contributing to the variant call">
+##INFO=<ID=PRIN,Number=0,Type=Flag,Description="Indicates variant as the principal variant in a BEDPE pair">
+##ALT=<ID=DEL,Description="Deletion">
+##ALT=<ID=DUP,Description="Duplication">
+##ALT=<ID=INV,Description="Inversion">
+##ALT=<ID=DUP:TANDEM,Description="Tandem duplication">
+##ALT=<ID=INS,Description="Insertion of novel sequence">
+##ALT=<ID=CNV,Description="Copy number variable region">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=SUP,Number=1,Type=Integer,Description="Number of pieces of evidence supporting the variant">
+##FORMAT=<ID=PE,Number=1,Type=Integer,Description="Number of paired-end reads supporting the variant">
+##FORMAT=<ID=SR,Number=1,Type=Integer,Description="Number of split reads supporting the variant">
+##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype quality">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
+##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
+##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
+##FORMAT=<ID=CNL,Number=.,Type=Float,Description="Copy number genotype likelihood form imprecise events">
+##FORMAT=<ID=NQ,Number=1,Type=Integer,Description="Phred style probability score that the variant is novel">
+##FORMAT=<ID=HAP,Number=1,Type=Integer,Description="Unique haplotype identifier">
+##FORMAT=<ID=AHAP,Number=1,Type=Integer,Description="Unique identifier of ancestral haplotype">
+##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count, with partial observations recorded fractionally">
+##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observations, with partial observations recorded fractionally">
+##FORMAT=<ID=SQ,Number=1,Type=Float,Description="Phred-scaled probability that this site is variant (non-reference in this sample">
+##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
+##VEP=v76 cache=/shared/external_bin/ensembl-tools-release-76/cache/homo_sapiens/76_GRCh37 db=.
+##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	H_LS-E2-A14P-01A-31D-A19H-09	H_LS-E2-A14P-10A-01D-A19H-09
+1	1866375	4	T	<DEL>	920.58	.	TOOL=LUMPY;SVTYPE=DEL;SVLEN=-619;END=1866994;STR=+-:49;IMPRECISE;CIPOS=-1,20;CIEND=0,0;EVENT=4;SUP=49;PESUP=49;SRSUP=0;EVTYPE=PE;PRIN;CSQ=intron_variant&feature_truncation|||ENSG00000142609|C1orf222|ENST00000493964||||-/867|protein_coding	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	1/1:26:26:0:0.00:28:2:26:221.74:-24,-8,-1	1/1:23:23:0:0.00:37:5:32:256.33:-27,-8,-1
+3	178906030	1233_2	T	[3:176909982[T	9.58	.	TOOL=LUMPY;SVTYPE=BND;STR=--:26;IMPRECISE;CIPOS=-39,0;CIEND=-28,0;MATEID=1233_1;EVENT=1233;SUP=15;PESUP=15;SRSUP=0;EVTYPE=PE;CSQ=intron_variant|||ENSG00000121879|PIK3CA|ENST00000468036||||-/118|protein_coding,intron_variant|||ENSG00000121879|PIK3CA|ENST00000477735||||-/21|protein_coding,intron_variant|||ENSG00000121879|PIK3CA|ENST00000263967||||-/1068|protein_coding	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:15:15:0:0.51:124:94:29:9.58:-5,-4,-40	0/0:0:0:0:-0.00:90:90:0:.:-4,-20,-62
+3	176909982	1233_1	G	[3:178906030[G	9.58	.	TOOL=LUMPY;SVTYPE=BND;STR=--:26;IMPRECISE;CIPOS=-28,0;CIEND=-39,0;MATEID=1233_2;EVENT=1233;SUP=15;PESUP=15;SRSUP=0;EVTYPE=PE;PRIN;CSQ=intron_variant|||ENSG00000177565|TBL1XR1|ENST00000443315||||-/81|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000457928||||-/514|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000431674||||-/96|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000413084||||-/33|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000427349||||-/62|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000422066||||-/87|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000430069||||-/514|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000431421||||-/55|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000352800||||-/142|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000422442||||-/68|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000424913||||-/73|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000450267||||-/120|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000428970||||-/76|protein_coding,intron_variant|||ENSG00000177565|TBL1XR1|ENST00000437738||||-/142|protein_coding	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:15:15:0:0.51:124:94:29:9.58:-5,-4,-40	0/0:0:0:0:-0.00:90:90:0:.:-4,-20,-62
+18	64284796	9629_1	A	A]X:6146702]	8.25	.	TOOL=LUMPY;SVTYPE=BND;STR=++:11;IMPRECISE;CIPOS=-1,1;CIEND=0,0;MATEID=9629_2;EVENT=9629;SUP=9;PESUP=8;SRSUP=1;EVTYPE=PE,SR;PRIN;CSQ=intergenic_variant||||||||||	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:9:8:1:0.70:64:47:16:8.25:-3,-2,-20	0/0:0:0:0:-0.00:112:112:0:.:-5,-25,-78
+X	6146702	9629_2	C	C]18:64284796]	8.25	.	TOOL=LUMPY;SVTYPE=BND;STR=++:11;IMPRECISE;CIPOS=0,0;CIEND=-1,1;MATEID=9629_1;EVENT=9629;SUP=9;PESUP=8;SRSUP=1;EVTYPE=PE,SR;CSQ=upstream_gene_variant|||ENSG00000146938|NLGN4X|ENST00000381092||||-/816|protein_coding,5_prime_UTR_variant|||ENSG00000146938|NLGN4X|ENST00000381095|1/6|||-/816|protein_coding,upstream_gene_variant|||ENSG00000146938|NLGN4X|ENST00000469740|||||processed_transcript,upstream_gene_variant|||ENSG00000146938|NLGN4X|ENST00000381093||||-/836|protein_coding,upstream_gene_variant|||ENSG00000146938|NLGN4X|ENST00000538097||||-/816|protein_coding,upstream_gene_variant|||ENSG00000146938|NLGN4X|ENST00000275857||||-/816|protein_coding	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:9:8:1:0.70:64:47:16:8.25:-3,-2,-20	0/0:0:0:0:-0.00:112:112:0:.:-5,-25,-78


=====================================
debian/testdata/test.sh
=====================================
@@ -0,0 +1,33 @@
+#!/bin/sh
+unset GZIP
+FQ=test.cnt.gz
+rm -f ${FQ}{,.gbi}
+
+lines=500000
+python tests/make-test-fastq.py $lines | bgzip -c > $FQ
+echo "indexing"
+time grabix index $FQ
+echo "indexed"
+python tests/test-fastq.py $FQ
+a=$(grabix grab test.cnt.gz $(($lines * 4)))
+b=$(zless $FQ | tail -1)
+if [ "$a" != "$b" ] ; then
+	echo FAIL last record
+fi
+rm -f ${FQ}{,.gbi}
+
+for V in `find . -name *.vcf -type f` ; do
+	rm -f ${V}.*
+	bgzip -f $V
+	grabix index ${V}.gz
+	sleep 1
+	exp=$(zgrep -cv "#" $V.gz)
+	obs=$(grabix size $V.gz)
+
+	if [ "$exp" != "$obs" ] ; then
+		echo "FAIL: $V: expected $exp lines found $obs"
+	else 
+		echo "OK $V"
+	fi
+	rm -f ${V}.*
+done
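
Two details of the script above are worth a sketch. It runs under `#!/bin/sh`, where brace expansion such as `${FQ}{,.gbi}` is not guaranteed to work, and it computes the expected record count with `zgrep -cv "#"`, which skips every line containing a `#` anywhere and not only header lines. The helpers below are hypothetical (the names `count_records` and `cleanup` are not part of the commit) and assume that the bgzip output can be read with plain `gzip -dc`:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the commit: a POSIX-sh sketch of the
# record count and the cleanup step used by the test script.

# Count the records of a gzip- or bgzip-compressed file, i.e. the lines
# that do not START with '#'. Anchoring the pattern keeps data lines that
# merely contain a '#' somewhere in a field.
count_records() {
    # grep -c exits non-zero when the count is 0; '|| true' keeps the
    # helper usable under 'set -e'.
    gzip -dc "$1" | grep -cv '^#' || true
}

# Remove a data file and its grabix index. Both names are spelled out
# instead of relying on brace expansion.
cleanup() {
    rm -f "$1" "$1.gbi"
}
```

With these in place, the comparison in the loop could read `exp=$(count_records "$V.gz")`, and the two `rm -f ${FQ}{,.gbi}` calls could become `cleanup "$FQ"`.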


=====================================
debian/tests/control
=====================================
@@ -0,0 +1,3 @@
+Tests: run-unit-test
+Depends: @, tabix, time, less
+Restrictions: allow-stderr


=====================================
debian/tests/run-unit-test
=====================================
@@ -0,0 +1,18 @@
+#!/bin/bash
+set -e
+
+pkg=grabix
+
+if [ "$AUTOPKGTEST_TMP" = "" ] ; then
+  AUTOPKGTEST_TMP=`mktemp -d /tmp/${pkg}-test.XXXXXX`
+  trap "rm -rf $AUTOPKGTEST_TMP" 0 INT QUIT ABRT PIPE TERM
+fi
+
+cp -a /usr/share/doc/${pkg}/examples/* $AUTOPKGTEST_TMP
+
+cd $AUTOPKGTEST_TMP
+
+unset GZIP
+gunzip -r *
+
+sh testdata/test.sh
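
The temporary-directory setup above can also be written with quoted expansions, so that a path containing spaces would still be handled. This is only a sketch under assumptions: the function name `setup_workdir` and the variable `workdir` are hypothetical and do not appear in the commit.

```shell
#!/bin/sh
# Hypothetical variant of the setup step in run-unit-test, not the
# committed code.
pkg=grabix

# Sets the global variable 'workdir'. It reuses $AUTOPKGTEST_TMP when the
# test runner provides it; otherwise it creates a private directory and
# registers its removal on exit and on the listed signals.
setup_workdir() {
    if [ -n "${AUTOPKGTEST_TMP:-}" ]; then
        workdir=$AUTOPKGTEST_TMP
    else
        workdir=$(mktemp -d "/tmp/${pkg}-test.XXXXXX")
        # Single quotes defer the expansion of $workdir until the trap
        # fires; the inner double quotes protect the path.
        trap 'rm -rf "$workdir"' 0 INT QUIT ABRT PIPE TERM
    fi
}
```

The function has to be called directly, as in `setup_workdir; cd "$workdir"`. Calling it inside a command substitution would register the trap in a subshell and remove the directory as soon as that subshell exits.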


=====================================
debian/watch
=====================================
@@ -3,4 +3,4 @@ version=4
 # Release tag 1.4 is older as 0.1.5 and 0.1.6 -> so this is probably
 # tagged wrongly which is fixed here
 opts="uversionmangle=s/1\.4/0.1.4/" \
-  https://github.com/arq5x/grabix/releases .*/archive/@ANY_VERSION@@ARCHIVE_EXT@
+  https://github.com/arq5x/grabix/releases .*/archive/v?@ANY_VERSION@@ARCHIVE_EXT@


=====================================
grabix.cpp
=====================================
@@ -86,7 +86,7 @@ int create_grabix_index(string bgzf_file)
     int64_t offset = 0;
     while ((status = bgzf_getline(bgzf_fp, '\n', line)) >= 0)
     {
-        offset = bgzf_tell(bgzf_fp);
+        offset = bgzf_tell (bgzf_fp);
         if (line->s[0] != '#')
             break;
         prev_offset = offset;
@@ -104,10 +104,10 @@ int create_grabix_index(string bgzf_file)
     {
         // grab the next line and store the offset
         eof = bgzf_getline(bgzf_fp, '\n', line);
-        offset = bgzf_tell(bgzf_fp);
+        offset = bgzf_tell (bgzf_fp);
         chunk_count++;
         // stop if we have encountered an empty line
-        if (eof < 0 || offset == prev_offset)
+        if (eof <= 0)
         {
             if (bgzf_check_EOF(bgzf_fp) == 1) {
                 if (offset > prev_offset) {
@@ -116,7 +116,7 @@ int create_grabix_index(string bgzf_file)
                 }
                 break;
             }
-            break;
+            //break;
         }
         // store the offset of this chunk start
         else if (chunk_count == CHUNK_SIZE)
@@ -214,7 +214,7 @@ int grab(string bgzf_file, int64_t from_line, int64_t to_line)
         line->l = 0;
         line->m = 0;
 
-        while ((status = bgzf_getline(bgzf_fp, '\n', line)) > 0)
+        while ((status = bgzf_getline(bgzf_fp, '\n', line)) != 0)
         {
             if (line->s[0] == '#')
                 printf("%s\n", line->s);


=====================================
grabix.h
=====================================
@@ -7,7 +7,7 @@ using namespace std;
 #include "bgzf.h"
 
 
-#define VERSION "0.1.8"
+#define VERSION "0.1.7"
 // we only want to store the offset for every 10000th
 // line. otherwise, were we to store the position of every
 // line in the file, the index could become very large for


=====================================
test.sh
=====================================
@@ -9,29 +9,13 @@ echo "indexing"
 time ./grabix index $FQ
 echo "indexed"
 python tests/test-fastq.py $FQ
-a=$(./grabix grab test.cnt.gz $(($lines * 4)))
+a=$(grabix grab test.cnt.gz $(($lines * 4)))
 b=$(zless $FQ | tail -1)
 if [[ "$a" != "$b" ]]; then
 	echo FAIL last record
-    exit 1
-else
-	echo OK last record
 fi
 rm -f ${FQ}{,.gbi}
 
-rm -f tests/empty.fastq.gz.gbi
-./grabix index tests/empty.fastq.gz
-
-a=$(cat tests/empty.fastq.gz.gbi | awk 'NR == 2')
-if [[ "$a" != "16" ]]; then
-    echo FAIL index wrong size
-    exit 1
-else
-    echo "OK index size"
-fi
-
-
-
 for V in  \
 	test.PLs.vcf \
 	test.auto_dom.no_parents.2.vcf \


=====================================
tests/empty.fastq.gz deleted
=====================================
Binary files a/tests/empty.fastq.gz and /dev/null differ



View it on GitLab: https://salsa.debian.org/med-team/grabix/compare/0d8b47a15fb24ca99e03bca92d2c557d5defd2f4...d8aeefaa6b81ddc92047caa1782dcf385a5a4d37
