[med-svn] [Git][med-team/sniffles][master] 9 commits: routine-update: New upstream version

Andreas Tille (@tille) gitlab at salsa.debian.org
Wed Jul 27 11:22:22 BST 2022



Andreas Tille pushed to branch master at Debian Med / sniffles


Commits:
c8596d8e by Steffen Moeller at 2022-04-09T13:40:56+02:00
routine-update: New upstream version

- - - - -
e743cb00 by Steffen Moeller at 2022-04-09T13:40:57+02:00
New upstream version 2.0.3
- - - - -
230a32a6 by Steffen Moeller at 2022-04-09T13:44:05+02:00
Update upstream source from tag 'upstream/2.0.3'

Update to upstream version '2.0.3'
with Debian dir 8ab1b3cbf0aada493c1d858b2d32eaf010937cb5

- - - - -
74d4d198 by Steffen Moeller at 2022-04-09T13:48:23+02:00
Clean-up of src/sniffles.egg-info after build

- - - - -
0d62850b by Andreas Tille at 2022-07-27T12:20:36+02:00
New upstream version 2.0.7
- - - - -
5fd154b7 by Andreas Tille at 2022-07-27T12:20:36+02:00
routine-update: New upstream version

- - - - -
912bc046 by Andreas Tille at 2022-07-27T12:20:44+02:00
Update upstream source from tag 'upstream/2.0.7'

Update to upstream version '2.0.7'
with Debian dir 9fd258805d2d2dec060d381c20724cc748f81a65
- - - - -
d8b7158e by Andreas Tille at 2022-07-27T12:20:44+02:00
routine-update: Standards-Version: 4.6.1

- - - - -
cb2a02ef by Andreas Tille at 2022-07-27T12:21:31+02:00
routine-update: Ready to upload to unstable

- - - - -


19 changed files:

- + .gitignore
- − .project
- README.md
- + annotations/human_GRCh38_no_alt_analysis_set.trf.bed
- + annotations/human_hs37d5.trf.bed
- + annotations/source.txt
- debian/changelog
- debian/control
- debian/rules
- setup.cfg
- src/sniffles/cluster.py
- src/sniffles/config.py
- src/sniffles/leadprov.py
- src/sniffles/parallel.py
- src/sniffles/postprocessing.py
- src/sniffles/sniffles
- src/sniffles/sv.py
- src/sniffles/util.py
- src/sniffles/vcf.py


Changes:

=====================================
.gitignore
=====================================
@@ -0,0 +1,3 @@
+.DS_Store
+src/sniffles.egg-info
+dist


=====================================
.project deleted
=====================================
@@ -1,11 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<projectDescription>
-	<name>Sniffles</name>
-	<comment></comment>
-	<projects>
-	</projects>
-	<buildSpec>
-	</buildSpec>
-	<natures>
-	</natures>
-</projectDescription>


=====================================
README.md
=====================================
@@ -6,7 +6,9 @@ To call SVs from long read alignments (PacBio / ONT), you can use:
 
 `sniffles -i mapped_input.bam -v output.vcf`
 
-(see sniffles --help or below for full usage information)
+For improved calling in repetitive regions, Sniffles2 accepts a tandem repeat annotations file using the option `--tandem-repeats annotations.bed`. Sniffles2 compatible tandem repeat annotations for human references can be downloaded from the [annotations/ folder](https://github.com/fritzsedlazeck/Sniffles/tree/master/annotations).
+
+(see sniffles --help or below for full usage information).
 
 ## Installation
 You can install Sniffles2 using pip or conda using:
@@ -15,7 +17,11 @@ You can install Sniffles2 using pip or conda using:
 
 or
 
-`conda install sniffles`
+`conda install sniffles=2.0 `
+
+If you previously installed Sniffles1 using conda and want to upgrade to Sniffles2, you can use:
+
+`conda update sniffles=2.0`
 
 ## Requirements
 * Python >= 3.7
@@ -25,24 +31,31 @@ or
 * python==3.9.5
 * pysam==0.16.0.1
 
+## Citation
+Please cite our paper at:
+
+https://www.nature.com/articles/s41592-018-0001-7
+
+A new preprint for the new methods and improvements introduced with Sniffles2 is here: https://www.biorxiv.org/content/10.1101/2022.04.04.487055v1 
+
 ## Use-Cases / Modes
 
 ### A. General (all Modes)
 * To output deletion (DEL SV) sequences, the reference genome (.fasta) must be specified using e.g. `--reference reference.fasta`.
-* Sniffles2 supports optionally specifying tandem repeat region annotations (.bed), which can improve calling in these regions `--tandem-repeats annotations.bed`. Sniffles2 tandem repeat annotations are compatible with those from pbsv, which for human references can be downloaded at their [GitHub repository](https://github.com/PacificBiosciences/pbsv/blob/master/annotations/).
+* Sniffles2 supports optionally specifying tandem repeat region annotations (.bed), which can improve calling in these regions `--tandem-repeats annotations.bed`. Sniffles2 compatible tandem repeat annotations for human references can be found in the [annotations/ folder](https://github.com/fritzsedlazeck/Sniffles/tree/master/annotations).
 * Sniffles2 is fully parallelized and uses 4 threads by default. This value can be adapted using e.g. `--threads 4` as option. Memory requirements will increase with the number of threads used.
-* To output read names in SNF and VCF files, the `--output-rnmaes` option is required.
+* To output read names in SNF and VCF files, the `--output-rnames` option is required.
 
 ### B. Multi-Sample SV Calling (Trios, Populations)
 Multi-sample SV calling using Sniffles2 population mode works in two steps:
 
-1. Call SV candidates and create an associated .snf file for each sample: `sniffles2 --input sample1.bam --snf sample1.snf`
-2. Combined calling using multiple .snf files into a single .vcf: `sniffles2 --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf`
+1. Call SV candidates and create an associated .snf file for each sample: `sniffles --input sample1.bam --snf sample1.snf`
+2. Combined calling using multiple .snf files into a single .vcf: `sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf`
 
 Alternatively, for step 2. you can supply a .tsv file, containing a list of .snf files, and custom sample ids in an optional second column (one sample per line), .e.g.:
-2. Combined calling using a .tsv as sample list: `sniffles2 --input snf_files_list.tsv --vcf multisample.vcf`
+2. Combined calling using a .tsv as sample list: `sniffles --input snf_files_list.tsv --vcf multisample.vcf`
 
-### C. Non-Germline SV Calling (Somatic) 
+### C. Non-Germline SV Calling (Somatic)
 To call non-germline SVs (i.e. somatic/mosaic) SVs, the *--non-germline* option should be added, i.e.:
 
 `sniffles --input mapped_input.bam --vcf output.vcf --non-germline`
@@ -57,5 +70,4 @@ Example command, to determine the genotype of each SV in *input_known_svs.vcf* f
 ### Input / Output
 * .bam or .cram files containing long read alignments (i.e. from minimap2 or ngmlr) are supported as input
 * .vcf.gz (bgzipped+tabix indexed) output is supported
-* Simultaneous output of both .vcf and .snf file (for multi-sample calling) is supported 
-
+* Simultaneous output of both .vcf and .snf file (for multi-sample calling) is supported


=====================================
annotations/human_GRCh38_no_alt_analysis_set.trf.bed
=====================================
The diff for this file was not included because it is too large.

=====================================
annotations/human_hs37d5.trf.bed
=====================================
The diff for this file was not included because it is too large.

=====================================
annotations/source.txt
=====================================
@@ -0,0 +1,2 @@
+Source for human_hs37d5.trf.bed: https://github.com/PacificBiosciences/pbsv/tree/master/annotations
+Source for human_GRCh38_no_alt_analysis_set.trf.bed: https://github.com/PacificBiosciences/pbsv/tree/master/annotations


=====================================
debian/changelog
=====================================
@@ -1,3 +1,18 @@
+sniffles (2.0.7-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * Standards-Version: 4.6.1 (routine-update)
+
+ -- Andreas Tille <tille at debian.org>  Wed, 27 Jul 2022 12:20:51 +0200
+
+sniffles (2.0.3-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version
+
+ -- Steffen Möller <moeller at debian.org>  Sat, 09 Apr 2022 13:41:24 +0200
+
 sniffles (2.0.2-1) unstable; urgency=medium
 
   * Team upload.


=====================================
debian/control
=====================================
@@ -9,7 +9,7 @@ Build-Depends: debhelper-compat (= 13),
                python3-pep517,
                pybuild-plugin-pyproject,
                python3-setuptools,
-Standards-Version: 4.6.0
+Standards-Version: 4.6.1
 Vcs-Browser: https://salsa.debian.org/med-team/sniffles
 Vcs-Git: https://salsa.debian.org/med-team/sniffles.git
 Homepage: https://github.com/fritzsedlazeck/Sniffles


=====================================
debian/rules
=====================================
@@ -11,3 +11,7 @@ export DEB_BUILD_MAINT_OPTIONS = hardening=+all
 #ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
 #	python3 build/scripts-`py3versions -vd`/sniffles -m test_set/reads_region.bam -v test.vcf
 #endif
+
+override_dh_auto_clean:
+	dh_auto_clean
+	rm -rf src/sniffles.egg-info


=====================================
setup.cfg
=====================================
@@ -1,6 +1,6 @@
 [metadata]
 name = sniffles
-version = 2.0.2
+version = 2.0.7
 author = Moritz Smolka
 author_email = moritz.g.smolka at gmail.com
 description = A fast structural variation caller for long-read sequencing data


=====================================
src/sniffles/cluster.py
=====================================
@@ -193,8 +193,6 @@ def resolve(svtype,leadtab_provider,config,tr):
                 tr_start,tr_end=tr[tr_index]
             if seed > tr_start and seed < tr_end:
                 within_tr=True
-            #print(within_tr,seed,tr_index,len(tr))
-        #within_tr=True
 
         if svtype=="INS":
             leads=[lead for lead in leadtab[seed] if lead.svlen!=None]
@@ -278,10 +276,10 @@ def resolve(svtype,leadtab_provider,config,tr):
 
                 merge_inner(cluster,merge_inner_threshold)
 
-            if not cluster.repeat and not config.dev_no_resplit:
+            if not config.dev_no_resplit_repeat and not config.dev_no_resplit:
                 for new_cluster in resplit(cluster,
                                            prop=lambda lead: lead.svlen,
-                                           binsize=config.resplit_binsize,
+                                           binsize=config.cluster_resplit_binsize,
                                            merge_threshold_min=config.minsvlen,
                                            merge_threshold_frac=config.cluster_merge_len):
                     yield new_cluster
@@ -311,7 +309,7 @@ def resolve_block_groups(svtype,svcands,groups_initial,config):
                 #TODO: Favor bigger groups in placement
                 dist=abs(group.pos_mean - svcand.pos) + abs(abs(group.len_mean) - abs(svcand.svlen))
                 minlen=float(min(abs(group.len_mean),abs(svcand.svlen)))
-                if minlen>0 and dist < best_dist and dist <= config.combine_match * math.sqrt(minlen):
+                if minlen>0 and dist < best_dist and dist <= config.combine_match * math.sqrt(minlen) and dist <= config.combine_match_max:
                     if not config.combine_separate_intra or not svcand.sample_internal_id in group.included_samples:
                         best_group=group
                         best_dist=dist


=====================================
src/sniffles/config.py
=====================================
@@ -16,8 +16,8 @@ import argparse
 from sniffles import util
 
 VERSION="Sniffles2"
-BUILD="2.0.2"
-SNF_VERSION="S2_rc3"
+BUILD="2.0.7"
+SNF_VERSION="S2_rc4"
 
 class ArgFormatter(argparse.ArgumentDefaultsHelpFormatter, argparse.RawDescriptionHelpFormatter):
     pass
@@ -77,9 +77,9 @@ def from_cmdline():
     filter_args.add_argument("--minsupport", metavar="auto", type=str, help="Minimum number of supporting reads for a SV to be reported (default: automatically choose based on coverage)", default="auto")
     filter_args.add_argument("--minsupport-auto-mult", metavar="0.1/0.025", type=float, help="Coverage based minimum support multiplier for germline/non-germline modes (only for auto minsupport) ", default=None)
     filter_args.add_argument("--minsvlen", metavar="N", type=int, help="Minimum SV length (in bp)", default=35)
-    filter_args.add_argument("--minsvlen-screen-ratio", metavar="N", type=float, help="Minimum length for SV candidates (as fraction of --minsvlen)", default=0.95)
+    filter_args.add_argument("--minsvlen-screen-ratio", metavar="N", type=float, help="Minimum length for SV candidates (as fraction of --minsvlen)", default=0.9)
     filter_args.add_argument("--mapq", metavar="N", type=int, help="Alignments with mapping quality lower than this value will be ignored", default=25)
-    filter_args.add_argument("--no-qc", help="Output all SV candidates, disregarding quality control steps.", default=False, action="store_true")
+    filter_args.add_argument("--no-qc", "--qc-output-all", help="Output all SV candidates, disregarding quality control steps.", default=False, action="store_true")
     filter_args.add_argument("--qc-stdev", help="Apply filtering based on SV start position and length standard deviation", metavar="True", type=tobool, default=True)
     filter_args.add_argument("--qc-stdev-abs-max", help="Maximum standard deviation for SV length and size (in bp)", metavar="N", type=int, default=500)
     filter_args.add_argument("--qc-strand", help="Apply filtering based on strand support of SV calls", metavar="False", type=tobool, default=False)
@@ -89,7 +89,7 @@ def from_cmdline():
     filter_args.add_argument("--long-del-coverage", help="Long deletions with central coverage (in relation to upstream/downstream coverage) higher than this value will be filtered (Not applicable for --non-germline)", metavar="0.66", type=float, default=0.66)
     filter_args.add_argument("--long-dup-length", help="Duplication SVs longer than this value are subjected to central coverage increase-based filtering (Not applicable for --non-germline)", metavar="50000", type=int, default=50000)
     filter_args.add_argument("--long-dup-coverage", help="Long duplications with central coverage (in relation to upstream/downstream coverage) lower than this value will be filtered (Not applicable for --non-germline)", metavar="1.33", type=float, default=1.33)
-    filter_args.add_argument("--max-splits-kb", metavar="N", type=int, help="Additional number of splits per kilobase read sequence allowed before reads are ignored", default=0.1)
+    filter_args.add_argument("--max-splits-kb", metavar="N", type=float, help="Additional number of splits per kilobase read sequence allowed before reads are ignored", default=0.1)
     filter_args.add_argument("--max-splits-base", metavar="N", type=int, help="Base number of splits allowed before reads are ignored (in addition to --max-splits-kb)", default=3)
     filter_args.add_argument("--min-alignment-length", metavar="N", type=int, help="Reads with alignments shorter than this length (in bp) will be ignored", default=1000)
     filter_args.add_argument("--phase-conflict-threshold", metavar="F", type=float, help="Maximum fraction of conflicting reads permitted for SV phase information to be labelled as PASS (only for --phase)", default=0.1)
@@ -114,10 +114,10 @@ def from_cmdline():
     multi_args = parser.add_argument_group("Multi-Sample Calling / Combine parameters")
     multi_args.add_argument("--combine-high-confidence", metavar="F", type=float, help="Minimum fraction of samples in which a SV needs to have individually passed QC for it to be reported in combined output (a value of zero will report all SVs that pass QC in at least one of the input samples)", default=0.0)
     multi_args.add_argument("--combine-low-confidence", metavar="F", type=float, help="Minimum fraction of samples in which a SV needs to be present (failed QC) for it to be reported in combined output", default=0.2)
-    multi_args.add_argument("--combine-low-confidence-abs", metavar="N", type=int, help="Minimum absolute number of samples in which a SV needs to be present (failed QC) for it to be reported in combined output", default=3)
+    multi_args.add_argument("--combine-low-confidence-abs", metavar="N", type=int, help="Minimum absolute number of samples in which a SV needs to be present (failed QC) for it to be reported in combined output", default=2)
     multi_args.add_argument("--combine-null-min-coverage", metavar="N", type=int, help="Minimum coverage for a sample genotype to be reported as 0/0 (sample genotypes with coverage below this threshold at the SV location will be output as ./.)", default=5)
-    multi_args.add_argument("--combine-match", metavar="N", type=int, help="Maximum deviation of multiple SV's start/end position for them to be combined across samples. Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter.", default=500)
-    multi_args.add_argument("--combine-consensus", help="Output the consensus genotype of all samples", default=False, action="store_true")
+    multi_args.add_argument("--combine-match", metavar="N", type=int, help="Multiplier for maximum deviation of multiple SV's start/end position for them to be combined across samples. Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter.", default=250)
+    multi_args.add_argument("--combine-match-max", metavar="N", type=int, help="Upper limit for the maximum deviation computed for --combine-match, in bp.", default=1000)
     multi_args.add_argument("--combine-separate-intra", help="Disable combination of SVs within the same sample", default=False, action="store_true")
     multi_args.add_argument("--combine-output-filtered", help="Include low-confidence / putative non-germline SVs in multi-calling", default=False, action="store_true")
     #multi_args.add_argument("--combine-exhaustive", help="(DEV) Disable performance optimization in multi-calling", default=False, action="store_true")
@@ -130,8 +130,9 @@ def from_cmdline():
     postprocess_args.add_argument("--no-sort", help="Do not sort output VCF by genomic coordinates (may slightly improve performance)", default=False, action="store_true")
     postprocess_args.add_argument("--no-progress", help="Disable progress display", default=False, action="store_true")
     postprocess_args.add_argument("--quiet", help="Disable all logging, except errors", default=False, action="store_true")
-    postprocess_args.add_argument("--max-del-seq-len", metavar="N", type=str, help="Maximum deletion sequence length to be output. Deletion SVs longer than this value will be written to the output as symbolic SVs.", default=50000)
+    postprocess_args.add_argument("--max-del-seq-len", metavar="N", type=int, help="Maximum deletion sequence length to be output. Deletion SVs longer than this value will be written to the output as symbolic SVs.", default=50000)
     postprocess_args.add_argument("--symbolic", help="Output all SVs as symbolic, including insertions and deletions, instead of reporting nucleotide sequences.", default=False, action="store_true")
+    postprocess_args.add_argument("--allow-overwrite", help="Allow overwriting output files if already existing", default=False, action="store_true")
 
     developer_args = parser.add_argument_group("Developer parameters")
     developer_args.add_argument("--dev-cache", default=False, action="store_true", help=argparse.SUPPRESS)
@@ -144,13 +145,20 @@ def from_cmdline():
     developer_args.add_argument("--dev-seq-cache-maxlen", metavar="N", type=int, default=50000, help=argparse.SUPPRESS)
     developer_args.add_argument("--consensus-max-reads", metavar="N", type=int, default=20, help=argparse.SUPPRESS)
     developer_args.add_argument("--consensus-max-reads-bin", metavar="N", type=int, default=10, help=argparse.SUPPRESS)
+    developer_args.add_argument("--combine-consensus", help="Output the consensus genotype of all samples", default=False, action="store_true")
     developer_args.add_argument("--dev-dump-coverage", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--dev-no-resplit", default=False, action="store_true", help=argparse.SUPPRESS)
+    developer_args.add_argument("--dev-no-resplit-repeat", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--dev-skip-snf-validation", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--low-memory", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--repeat", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--qc-nm", default=False, action="store_true", help=argparse.SUPPRESS)
     developer_args.add_argument("--qc-nm-max", metavar="F", type=float, default=0.2, help=argparse.SUPPRESS)
+    developer_args.add_argument("--coverage-updown-bins", metavar="N", type=int, default=5, help=argparse.SUPPRESS)
+    developer_args.add_argument("--coverage-shift-bins", metavar="N", type=int, default=3, help=argparse.SUPPRESS)
+    developer_args.add_argument("--coverage-shift-bins-min-aln-length", metavar="N", type=int, default=1000, help=argparse.SUPPRESS)
+    developer_args.add_argument("--cluster-binsize-combine-mult", metavar="N", type=int, default=5, help=argparse.SUPPRESS)
+    developer_args.add_argument("--cluster-resplit-binsize", metavar="N", type=int, default=20, help=argparse.SUPPRESS)
     #developer_args.add_argument("--qc-strand", help="(DEV)", default=False, action="store_true")
 
     config=parser.parse_args()
@@ -199,10 +207,8 @@ def from_cmdline():
         config.qc_nm=True
 
     config.coverage_binsize=config.cluster_binsize
-    config.coverage_binsize_combine=config.cluster_binsize*5
+    config.coverage_binsize_combine=config.cluster_binsize*config.cluster_binsize_combine_mult
 
-    config.coverage_updown_bins=5
-    config.coverage_shift_bins=3
 
     #INS Consensus parameters
     #config.consensus_max_reads=20
@@ -223,8 +229,8 @@ def from_cmdline():
 
     #Genotyping
     config.genotype_format="GT:GQ:DR:DV"
-    config.genotype_none=(".",".",0,0,0)
-    config.genotype_null=(0,0,0,0,0)
+    config.genotype_none=(".",".",0,0,0,None)
+    config.genotype_null=(0,0,0,0,0,None)
     config.genotype_min_z_score=5
     if config.genotype_ploidy!=2:
         util.fatal_error("Currently only --genotype-ploidy 2 is supported")
@@ -241,9 +247,9 @@ def from_cmdline():
 
     #Misc
     config.precise=25 #Max. sum of pos and length stdev for SVs to be labelled PRECISE
-    config.resplit_binsize=20
     config.tandem_repeat_region_pad=500
     config.id_prefix="Sniffles2."
+    config.phase_identifiers=["1","2"]
 
     config.dev_profile=False
 


=====================================
src/sniffles/leadprov.py
=====================================
@@ -212,50 +212,6 @@ def read_iterindels(read_id,read,contig,config,use_clips,read_nm):
         pos_read+=add_read*oplength
         pos_ref+=add_ref*oplength
 
-def read_iterindels_unoptimized(read_id,read,contig,config,use_clips):
-    minsvlen=config.minsvlen_screen
-    seq_cache_maxlen=config.dev_seq_cache_maxlen
-    qname=read.query_name
-    mapq=read.mapping_quality
-    strand="-" if read.is_reverse else "+"
-
-    #TODO: Parse CG tag (ultra long alignments), if present
-    pos_read=0
-    pos_ref=read.reference_start
-    for op,oplength in read.cigartuples:
-        if op==pysam.CMATCH or op==pysam.CEQUAL or op==pysam.CDIFF:
-            pos_read+=oplength
-            pos_ref+=oplength
-        elif op==pysam.CINS:
-            if oplength>=minsvlen:
-                #print(pos_read,pos_read+oplength)
-                #print(pos_read,pos_read+oplength,read.query_sequence[pos_read:pos_read+oplength])
-                if oplength <= seq_cache_maxlen:
-                    seq=read.query_sequence[pos_read:pos_read+oplength]
-                else:
-                    seq=None
-                yield Lead(read_id,qname,contig,pos_ref,pos_ref+0,pos_read,pos_read+oplength,strand,mapq,-1,"INLINE","INS",oplength,seq=seq)
-            pos_read+=oplength
-        elif op==pysam.CDEL:
-            pos_ref+=oplength
-            if oplength>=minsvlen:
-                yield Lead(read_id,qname,contig,pos_ref,pos_ref+oplength,pos_read,pos_read+0,strand,mapq,-1,"INLINE","DEL",-oplength)
-        elif op==pysam.CREF_SKIP:
-            pos_ref+=oplength
-        elif op==pysam.CSOFT_CLIP:
-            if use_clips and oplength >= config.long_ins_length:
-                yield Lead(read_id,qname,contig,pos_ref,pos_ref+0,pos_read,pos_read+oplength,strand,mapq,-1,"INLINE","INS",None)
-            pos_read+=oplength
-        elif op==pysam.CHARD_CLIP:
-            #pos_ref+=oplength
-            if use_clips and oplength >= config.long_ins_length:
-                yield Lead(read_id,qname,contig,pos_ref,pos_ref+0,pos_read,pos_read+oplength,strand,mapq,-1,"INLINE","INS",None)
-        elif op==pysam.CPAD:
-            pass
-        else:
-            print(f"Unknown OPType {op}")
-            return
-
 def read_itersplits_bnd(read_id,read,contig,config,read_nm):
     assert(read.is_supplementary)
     #SA:refname,pos,strand,CIGAR,MAPQ,NM
@@ -549,6 +505,7 @@ class LeadProvider:
         binsize=self.config.cluster_binsize
         coverage_binsize=self.config.coverage_binsize
         coverage_shift_bins=self.config.coverage_shift_bins
+        coverage_shift_min_aln_len=self.config.coverage_shift_bins_min_aln_length
         long_ins_threshold=self.config.long_ins_length*0.5
         qc_nm=self.config.qc_nm
         phase=self.config.phase
@@ -565,7 +522,8 @@ class LeadProvider:
             self.read_id+=1
             self.read_count+=1
 
-            if read.mapping_quality < mapq_min or read.is_secondary or read.query_alignment_length < alen_min:
+            alen=read.query_alignment_length
+            if read.mapping_quality < mapq_min or read.is_secondary or alen < alen_min:
                 continue
 
             has_sa=read.has_tag("SA")
@@ -602,9 +560,8 @@ class LeadProvider:
                 target_tab=self.covrtab_rev
             else:
                 target_tab=self.covrtab_fwd
-            covr_start_bin=(int(read.reference_start/coverage_binsize)+coverage_shift_bins)*coverage_binsize
-            covr_end_bin=(int(read_end/coverage_binsize)-coverage_shift_bins)*coverage_binsize
-
+            covr_start_bin=(int(read.reference_start/coverage_binsize)+coverage_shift_bins*(alen>=coverage_shift_min_aln_len))*coverage_binsize
+            covr_end_bin=(int(read_end/coverage_binsize)-coverage_shift_bins*(alen>=coverage_shift_min_aln_len))*coverage_binsize
 
             if covr_end_bin > covr_start_bin:
                 self.covrtab_min_bin=min(self.covrtab_min_bin,covr_start_bin)
@@ -621,40 +578,3 @@ class LeadProvider:
         else:
             cache_dir=self.config.dev_cache_dir
         return f"{cache_dir}/{os.path.basename(self.config.input)}_{contig}_{start}_{end}.pickle"
-
-    def dev_store_leadtab(self,contig,start,end,externals):
-        data={"externals":externals, "self": self}
-        filename=self.dev_leadtab_filename(contig,start,end)
-        with open(filename,"wb") as h:
-            pickle.dump(data,h)
-        print(f"(DEV/Cache) Dumped leadtab to {filename}")
-
-    def dev_load_leadtab(self,contig,start,end):
-        filename=self.dev_leadtab_filename(contig,start,end)
-
-        if not os.path.exists(filename):
-            return False
-
-        with open(filename,"rb") as h:
-            data=pickle.load(h)
-        for item in data["self"].__dict__:
-            self.__dict__[item]=data["self"].__dict__[item]
-        print(f"(DEV/Cache) Loaded leadtab from {filename}")
-        return data["externals"]
-
-    def dev_debug_graph(self,title):
-        import matplotlib.pyplot as plt
-        import seaborn as sns
-        print(title)
-        sns.set()
-        data=[]
-        for k,v in self.leadtab.items():
-            data.append(len(v))
-            if len(data)>50000:
-                break
-
-        plt.hist(data,bins=[i for i in range(0,20)])
-        #plt.savefig(filename)
-        plt.title(title)
-        plt.savefig(f"debug/{title}.png")
-        plt.close()
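
The coverage change above only applies the inward bin shift to alignments of at least --coverage-shift-bins-min-aln-length bp (default 1000). A rough Python sketch of that arithmetic, assuming an example bin size of 100 bp (simplified; not the package's exact code):

# Sketch of the start-bin computation from the hunk above. The shift defaults
# match the new --coverage-shift-bins / --coverage-shift-bins-min-aln-length
# options; coverage_binsize=100 is only an assumed example value.
def covr_start_bin(reference_start, aln_length,
                   coverage_binsize=100, shift_bins=3, shift_min_aln_len=1000):
    # Long alignments still have their coverage contribution shifted inward by
    # shift_bins bins; short alignments (< shift_min_aln_len bp) no longer are.
    shift = shift_bins * (aln_length >= shift_min_aln_len)
    return (int(reference_start / coverage_binsize) + shift) * coverage_binsize

print(covr_start_bin(10050, aln_length=5000))  # 10300: shifted inward by 3 bins
print(covr_start_bin(10050, aln_length=600))   # 10000: no shift for a short alignment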


=====================================
src/sniffles/parallel.py
=====================================
@@ -293,7 +293,7 @@ def Main_Internal(proc_id,config,pipe):
                     for genotype_sv in genotype_svs_svtypes_bins[cand.svtype][bin]:
                         dist=abs(genotype_sv.pos - cand.pos) + abs(abs(genotype_sv.svlen) - abs(cand.svlen))
                         minlen=float(min(abs(genotype_sv.svlen),abs(cand.svlen)))
-                        if minlen>0 and dist < genotype_sv.genotype_match_dist and dist <= config.combine_match * math.sqrt(minlen):
+                        if minlen>0 and dist < genotype_sv.genotype_match_dist and dist <= config.combine_match * math.sqrt(minlen) and dist <= config.combine_match_max:
                             genotype_sv.genotype_match_sv=cand
                             genotype_sv.genotype_match_dist=dist
 
@@ -308,7 +308,7 @@ def Main_Internal(proc_id,config,pipe):
                 coverage=round(sum(coverage_list)/len(coverage_list))
                 svcall.genotypes={}
                 if coverage>0:
-                    svcall.genotypes[0]=(0,0,0,coverage,0)
+                    svcall.genotypes[0]=(0,0,0,coverage,0,None)
                 else:
                     svcall.genotypes[0]=config.genotype_none
 


=====================================
src/sniffles/postprocessing.py
=====================================
@@ -19,10 +19,12 @@ from sniffles import consensus
 import math
 
 def annotate_sv(svcall,config):
-    genotype_sv(svcall,config)
-
     if config.phase:
-        phase_sv(svcall,config)
+        phase=phase_sv(svcall,config)
+    else:
+        phase=None
+
+    genotype_sv(svcall,config,phase)
 
     if svcall.svtype=="INS" and not config.symbolic:
         merged_leads=[l for l in svcall.postprocess.cluster.leads if l.seq!=None]
@@ -77,13 +79,8 @@ def coverage_build_requests(calls,lead_provider,config):
             end=start+1
         else:
             end=svcall.pos+abs(svcall.svlen)
-        #if start > end:
-        #    assert(svcall.svtype=="DEL")
-        #    start,end=end,start
-        #add_request(svcall,"coverage_start",start+config.coverage_binsize,requests_for_coverage,config)
         add_request(svcall,"coverage_start",start-config.coverage_binsize,requests_for_coverage,config)
         add_request(svcall,"coverage_center",int((start+end)/2),requests_for_coverage,config)
-        #add_request(svcall,"coverage_end",end-config.coverage_binsize,requests_for_coverage,config)
         add_request(svcall,"coverage_end",end+config.coverage_binsize,requests_for_coverage,config)
         add_request(svcall,"coverage_upstream",start-config.coverage_binsize*config.coverage_updown_bins,requests_for_coverage,config)
         add_request(svcall,"coverage_downstream",end+config.coverage_binsize*config.coverage_updown_bins,requests_for_coverage,config)
@@ -125,9 +122,11 @@ def coverage_fulfill(requests_for_coverage,calls,lead_provider,config):
 def qc_sv_support(svcall,coverage_global,config):
     if config.minsupport == "auto":
         if not qc_support_auto(svcall,coverage_global,config):
+            svcall.filter="SUPPORT_MIN"
             return False
     else:
         if not qc_support_const(svcall,config):
+            svcall.filter="SUPPORT_MIN"
             return False
     return True
 
@@ -145,7 +144,11 @@ def qc_support_auto(svcall,coverage_global,config):
     #    coverage_list=[svcall.coverage_center]
     #else:
     coverage_list=[svcall.coverage_upstream,svcall.coverage_downstream]
-    coverage_list=[c for c in coverage_list if c!=None]
+    coverage_list=[c for c in coverage_list if c!=None and c!=0]
+    if len(coverage_list)==0:
+        coverage_list=[svcall.coverage_start,svcall.coverage_center,svcall.coverage_end]
+        coverage_list=[c for c in coverage_list if c!=None and c!=0]
+
     if len(coverage_list)==0:
         coverage_regional=coverage_global
     else:
@@ -165,34 +168,47 @@ def qc_sv(svcall,config):
     if config.qc_stdev:
         stdev_pos=svcall.get_info("STDEV_POS")
         if stdev_pos > config.qc_stdev_abs_max:
+            svcall.filter="STDEV_POS"
             return False
         if svcall.svtype!="BND" and stdev_pos / abs(svcall.svlen) > 2.0:
+            svcall.filter="STDEV_POS"
             return False
 
         stdev_len = svcall.get_info("STDEV_LEN")
         if stdev_len != None:
             if svcall.svtype != "BND" and stdev_len / abs(svcall.svlen) > 1.0:
+                svcall.filter="STDEV_LEN"
                 return False
-            if stdev_len > stdev_len > config.qc_stdev_abs_max:
+            if stdev_len > config.qc_stdev_abs_max:
+                svcall.filter="STDEV_LEN"
                 return False
 
+    if abs(svcall.svlen) < config.minsvlen:
+        svcall.filter="SVLEN_MIN"
+        return False
+
     #if (svcall.coverage_upstream != None and svcall.coverage_upstream < config.qc_coverage) or (svcall.coverage_downstream != None and svcall.coverage_downstream < config.qc_coverage):
     if svcall.svtype != "DEL" and svcall.svtype != "INS" and (svcall.coverage_center != None and svcall.coverage_center < config.qc_coverage):
+        svcall.filter="COV_MIN"
         return False
 
-    if svcall.svtype == "DEL" and abs(svcall.svlen) >= config.long_del_length and not config.non_germline:
+    if svcall.svtype == "DEL" and config.long_del_length != -1 and abs(svcall.svlen) >= config.long_del_length and not config.non_germline:
         if svcall.coverage_center != None and svcall.coverage_upstream != None and svcall.coverage_downstream != None and svcall.coverage_center > (svcall.coverage_upstream+svcall.coverage_downstream)/2.0 * config.long_del_coverage:
+            svcall.filter="COV_CHANGE"
             return False
     elif svcall.svtype=="INS" and ( (svcall.coverage_upstream != None and svcall.coverage_upstream < config.qc_coverage) or (svcall.coverage_downstream != None and svcall.coverage_downstream < config.qc_coverage)):
+        svcall.filter="COV_CHANGE"
         return False
-    elif svcall.svtype == "DUP" and abs(svcall.svlen) >= config.long_dup_length and not config.non_germline:
+    elif svcall.svtype == "DUP" and config.long_dup_length != -1 and abs(svcall.svlen) >= config.long_dup_length and not config.non_germline:
         if svcall.coverage_center != None and svcall.coverage_upstream != None and svcall.coverage_downstream != None and svcall.coverage_center < (svcall.coverage_upstream+svcall.coverage_downstream)/2.0 * config.long_dup_coverage:
+            svcall.filter="COV_CHANGE"
             return False
 
     return True
 
 def qc_sv_post_annotate(svcall,config):
     if (len(svcall.genotypes)==0 or (svcall.genotypes[0][0]!="." and svcall.genotypes[0][0]+svcall.genotypes[0][1]<2)) and (svcall.coverage_center != None and svcall.coverage_center < config.qc_coverage):
+        svcall.filter="COV_MIN"
         return False
 
     return True
@@ -217,7 +233,7 @@ def likelihood_ratio(q1,q2):
     else:
         return 0
 
-def genotype_sv(svcall,config):
+def genotype_sv(svcall,config,phase):
     normalization_target=250
     hom_ref_p=config.genotype_error
     het_p=(1.0/config.genotype_ploidy) # - config.genotype_error
@@ -246,7 +262,7 @@ def genotype_sv(svcall,config):
         else:
             coverage_list=[svcall.coverage_start,svcall.coverage_center,svcall.coverage_end]
 
-    coverage_list=[c for c in coverage_list if c!=None]
+    coverage_list=[c for c in coverage_list if c!=None and c!=0]
     if len(coverage_list)==0:
         return
     coverage+=round(sum(coverage_list)/len(coverage_list))
@@ -286,13 +302,14 @@ def genotype_sv(svcall,config):
 
     is_long_ins=(svcall.svtype=="INS" and svcall.svlen >= config.long_ins_length)
     if genotype_z_score < config.genotype_min_z_score and not config.non_germline and not is_long_ins:
-        svcall.filter="GT"
+        if svcall.filter=="PASS":
+            svcall.filter="GT"
 
     if is_long_ins and gt1==(0,0):
         a,b=".","."
     else:
         a,b=gt1
-    svcall.genotypes[0]=(a,b,genotype_quality,coverage-support,support) #f"{a}/{b}:{genotype_quality}:{coverage-support}:{support}"
+    svcall.genotypes[0]=(a,b,genotype_quality,coverage-support,support,phase)
     svcall.set_info("AF",af)
 
 def phase_sv(svcall,config):
@@ -315,7 +332,8 @@ def phase_sv(svcall,config):
         hp_filter="PASS"
 
     ps_filter="FAIL"
-    if hp != "NULL" and ps_support > 0 and float(other_ps_support)/(ps_support+other_ps_support) < config.phase_conflict_threshold:
+    if ps != "NULL" and ps_support > 0 and float(other_ps_support)/(ps_support+other_ps_support) < config.phase_conflict_threshold:
         ps_filter="PASS"
 
     svcall.set_info("PHASE",f"{hp},{ps},{hp_support},{ps_support},{hp_filter},{ps_filter}")
+    return (config.phase_identifiers.index(hp) if hp in config.phase_identifiers else None if hp_filter=="PASS" else None)


=====================================
src/sniffles/sniffles
=====================================
@@ -20,6 +20,7 @@ import collections
 import math
 import time
 import os
+from pathlib import Path
 import json
 
 import pysam
@@ -181,15 +182,19 @@ def Sniffles2_Main(config,processes):
         else:
             vcf_output_info_str=f"({', '.join(vcf_output_info)})"
 
-
-        if config.vcf_output_bgz:
-            if not config.sort:
-                util.fatal_error_main(".gz (bgzip) output is only supported with sorting enabled")
-            vcf_handle=pysam.BGZFile(config.vcf,"w")
+        if os.path.exists(config.vcf) and not config.allow_overwrite:
+            util.fatal_error_main(f"Output file '{config.vcf}' already exists! Use --allow-overwrite to ignore this check and overwrite.")
         else:
-            vcf_handle=open(config.vcf,"w")
+            if not Path(config.vcf).parent.exists():
+                Path(config.vcf).parent.mkdir(parents=True)
+            if config.vcf_output_bgz:
+                if not config.sort:
+                    util.fatal_error_main(".gz (bgzip) output is only supported with sorting enabled")
+                vcf_handle=pysam.BGZFile(config.vcf,"w")
+            else:
+                vcf_handle=open(config.vcf,"w")
 
-        vcf_out=vcf.VCF(config,vcf_handle)
+            vcf_out=vcf.VCF(config,vcf_handle)
 
         if config.mode=="call_sample" or config.mode=="combine":
             if config.reference!=None:
@@ -200,8 +205,12 @@ def Sniffles2_Main(config,processes):
 
     if config.snf != None:
         print(f"Opening for writing: {config.snf}")
-        snf_out=snf.SNFile(config,open(config.snf,"wb"))
-        snf_parts=[]
+
+        if os.path.exists(config.snf) and not config.allow_overwrite:
+            util.fatal_error_main(f"Output file '{config.snf}' already exists! Use --allow-overwrite to ignore this check and overwrite.")
+        else:
+            snf_out=snf.SNFile(config,open(config.snf,"wb"))
+            snf_parts=[]
 
     #
     # Plan multiprocessing tasks
@@ -621,3 +630,4 @@ if __name__ == "__main__":
                 proc.process.join()
             except:
                 pass
+        exit(1)


=====================================
src/sniffles/sv.py
=====================================
@@ -99,7 +99,7 @@ def call_from(cluster,config,keep_qc_fails,task):
 
     svlen=util.center(v.svlen for v in leads)
 
-    if abs(svlen) < config.minsvlen:
+    if abs(svlen) < config.minsvlen_screen:
         return
 
     #Count inline events only once per read, but split events as individual alignments, as in coverage calculation
@@ -125,12 +125,15 @@ def call_from(cluster,config,keep_qc_fails,task):
     support_fwd=sum(lead.strand == "+" for lead in leads)
     support_rev=len(leads) - support_fwd
 
+    filter="PASS"
     if config.qc_strand and (support_fwd==0 or support_rev==0):
+        filter="STRAND"
         qc=False
 
     if config.qc_nm:
         nm_mean=util.mean(v.nm for v in leads)
         if nm_mean > config.qc_nm_max:
+            filter="NM"
             qc=False
     else:
         nm_mean=-1
@@ -151,7 +154,7 @@ def call_from(cluster,config,keep_qc_fails,task):
                   ref="N",
                   alt=f"<{svtype}>",
                   qual=qual,
-                  filter="PASS",
+                  filter=filter,
                   info=dict(),
                   svtype=svtype,
                   svlen=svlen,
@@ -176,6 +179,8 @@ def call_from(cluster,config,keep_qc_fails,task):
     if stdev_len!=None:
         svcall.set_info("STDEV_LEN",stdev_len)
 
+    #svcall.set_info("CLUSTER_ID",cluster.id)
+
     task.sv_id+=1
 
     yield svcall
@@ -253,19 +258,19 @@ def call_group(svgroup,config,task):
         if config.output_rnames and cand.rnames!=None:
             rnames.extend(cand.rnames)
         if not 0 in cand.genotypes:
-            cand.genotypes[0]=(".",".",0,0,cand.support)
+            cand.genotypes[0]=(".",".",0,0,cand.support,None)
         if cand.sample_internal_id in genotypes:
             #Intra-sample merging
-            a,b,gt_qual,dr,dv=cand.genotypes[0]
-            curr_a,curr_b,curr_gt_qual,curr_dr,curr_dv,curr_id=genotypes[cand.sample_internal_id]
+            a,b,gt_qual,dr,dv,ps=cand.genotypes[0]
+            curr_a,curr_b,curr_gt_qual,curr_dr,curr_dv,curr_ps,curr_id=genotypes[cand.sample_internal_id]
             new_id=curr_id+","+config.id_prefix+cand.id
             if (curr_a==".") or (a != "." and (a,b) >= (curr_a,curr_b)):
-                genotypes[cand.sample_internal_id]=(a,b,gt_qual,dr,dv,new_id)
+                genotypes[cand.sample_internal_id]=(a,b,gt_qual,dr,dv,ps,new_id)
             else:
-                genotypes[cand.sample_internal_id]=(curr_a,curr_b,curr_gt_qual,curr_dr,curr_dv,new_id)
+                genotypes[cand.sample_internal_id]=(curr_a,curr_b,curr_gt_qual,curr_dr,curr_dv,curr_ps,new_id)
         else:
-            a,b,gt_qual,dr,dv=cand.genotypes[0]
-            genotypes[cand.sample_internal_id]=(a,b,gt_qual,dr,dv,config.id_prefix+cand.id)
+            a,b,gt_qual,dr,dv,ps=cand.genotypes[0]
+            genotypes[cand.sample_internal_id]=(a,b,gt_qual,dr,dv,ps,config.id_prefix+cand.id)
         genotyped_count+=1
 
     for sample_internal_id in sample_internal_ids:
@@ -273,9 +278,9 @@ def call_group(svgroup,config,task):
             continue
         coverage=svgroup.coverages_nonincluded[sample_internal_id]
         if coverage >= config.combine_null_min_coverage:
-            genotypes[sample_internal_id]=(0,0,0,coverage,0,"NULL")
+            genotypes[sample_internal_id]=(0,0,0,coverage,0,None,"NULL")
         else:
-            genotypes[sample_internal_id]=(".",".",0,coverage,0,"NULL")
+            genotypes[sample_internal_id]=(".",".",0,coverage,0,None,"NULL")
 
     if config.combine_consensus:
         genotypes_consensus={}
@@ -294,17 +299,31 @@ def call_group(svgroup,config,task):
         if cons_a!=1 and cons_b!=1:
             return None
 
+    svcall_pos=int(util.median(cand.pos for cand in svgroup.candidates))
+    svcall_svlen=int(util.median(cand.svlen for cand in svgroup.candidates))
+    svcall_alt=first_cand.alt
+    svcall_alt_mindist=abs(len(svcall_alt)-svcall_svlen)
+    if first_cand.svtype=="INS":
+        svcall_end=svcall_pos
+        for cand in svgroup.candidates:
+            dist=abs(len(cand.alt)-svcall_svlen)
+            if dist < svcall_alt_mindist:
+                svcall_alt_mindist=dist
+                svcall_alt=cand.alt
+    else:
+        svcall_end=svcall_pos+abs(svcall_svlen)
+
     svcall=SVCall(contig=first_cand.contig,
-                  pos=util.center(cand.pos for cand in svgroup.candidates),
+                  pos=svcall_pos,
                   id=f"{first_cand.svtype}.{task.sv_id:X}M{task.id:X}",
                   ref="N",
-                  alt=first_cand.alt,
+                  alt=svcall_alt,
                   qual=round(util.mean(int(cand.qual) for cand in svgroup.candidates)),
                   filter="PASS",
                   info=dict(),
                   svtype=first_cand.svtype,
-                  svlen=util.center(cand.svlen for cand in svgroup.candidates),
-                  end=util.center(cand.end for cand in svgroup.candidates),
+                  svlen=svcall_svlen,
+                  end=svcall_end,
                   genotypes=genotypes,
                   precise=sum(int(cand.precise) for cand in svgroup.candidates)/float(len(svgroup.candidates)) > 0.5,
                   support=round(util.mean(cand.support for cand in svgroup.candidates)),
@@ -315,10 +334,15 @@ def call_group(svgroup,config,task):
                   fwd=sum(cand.fwd for cand in svgroup.candidates),
                   rev=sum(cand.rev for cand in svgroup.candidates),
                   coverage_upstream=util.mean_or_none_round(cand.coverage_upstream for cand in svgroup.candidates if cand.coverage_upstream!=None),
+                  coverage_start=util.mean_or_none_round(cand.coverage_start for cand in svgroup.candidates if cand.coverage_start!=None),
                   coverage_center=util.mean_or_none_round(cand.coverage_center for cand in svgroup.candidates if cand.coverage_center!=None),
+                  coverage_end=util.mean_or_none_round(cand.coverage_end for cand in svgroup.candidates if cand.coverage_end!=None),
                   coverage_downstream=util.mean_or_none_round(cand.coverage_downstream for cand in svgroup.candidates if cand.coverage_downstream!=None) )
 
-    if abs(svcall.svlen) < config.minsvlen:
+    svcall.set_info("STDEV_POS",util.stdev(cand.pos for cand in svgroup.candidates))
+    svcall.set_info("STDEV_LEN",util.stdev(cand.svlen for cand in svgroup.candidates))
+
+    if abs(svcall.svlen) < config.minsvlen_screen:
         return None
 
     task.sv_id+=1
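
The combined-calling change above now derives POS and SVLEN from the median across candidates and, for insertions, reuses the candidate ALT sequence whose length is closest to that median. A small illustrative sketch (not the package's code; util.median is approximated with statistics.median and dicts stand in for SVCall objects):

import statistics

def pick_ins_alt(candidates):
    svlen = int(statistics.median(c["svlen"] for c in candidates))
    # Pick the candidate insertion sequence whose length is closest to the
    # median SV length across samples, rather than always the first sample's.
    return min(candidates, key=lambda c: abs(len(c["alt"]) - svlen))["alt"]

candidates = [
    {"svlen": 48, "alt": "A" * 48},
    {"svlen": 52, "alt": "A" * 52},
    {"svlen": 60, "alt": "A" * 60},
]
print(len(pick_ins_alt(candidates)))  # 52 -- the middle candidate's sequence is reused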


=====================================
src/sniffles/util.py
=====================================
@@ -27,7 +27,6 @@ def median_or_mode(nums):
     nums=list(nums)
     top=most_common(nums)
     if len(top)>1 and (top[0][0]-top[1][0]<2):
-    #if len(nums)>3:
         return median_noavg(nums)
     else:
         return median_modes(nums)


=====================================
src/sniffles/vcf.py
=====================================
@@ -23,12 +23,24 @@ def format_info(k,v):
         return f"{k}={v}"
 
 def format_genotype(gt):
-    if len(gt)==5:
-        a,b,qual,dr,dv=gt
-        return f"{a}/{b}:{qual}:{dr}:{dv}"
+    if len(gt)==6:
+        a,b,qual,dr,dv,ps=gt
+        if ps!=None and (a,b)==(0,1):
+            gt_sep="|"
+            if ps==1:
+                a,b=b,a
+        else:
+            gt_sep="/"
+        return f"{a}{gt_sep}{b}:{qual}:{dr}:{dv}"
     else:
-        a,b,qual,dr,dv,id=gt
-        return f"{a}/{b}:{qual}:{dr}:{dv}:{id}"
+        a,b,qual,dr,dv,ps,id=gt
+        if ps!=None and (a,b)==(0,1):
+            gt_sep="|"
+            if ps==1:
+                a,b=b,a
+        else:
+            gt_sep="/"
+        return f"{a}{gt_sep}{b}:{qual}:{dr}:{dv}:{id}"
 
 class VCF:
     def __init__(self,config,handle):
@@ -79,7 +91,14 @@ class VCF:
 
         self.write_header_line('FILTER=<ID=PASS,Description="All filters passed">')
         self.write_header_line('FILTER=<ID=GT,Description="Genotype filter">')
-        self.write_header_line('FILTER=<ID=CONS,Description="Low support for insertion consensus sequence">')
+        self.write_header_line('FILTER=<ID=SUPPORT_MIN,Description="Minimum read support filter">')
+        self.write_header_line('FILTER=<ID=STDEV_POS,Description="SV Breakpoint standard deviation filter">')
+        self.write_header_line('FILTER=<ID=STDEV_LEN,Description="SV length standard deviation filter">')
+        self.write_header_line('FILTER=<ID=COV_MIN,Description="Minimum coverage filter">')
+        self.write_header_line('FILTER=<ID=COV_CHANGE,Description="Coverage change filter">')
+        self.write_header_line('FILTER=<ID=STRAND,Description="Strand support filter">')
+        self.write_header_line('FILTER=<ID=SVLEN_MIN,Description="SV length filter">')
+        self.write_header_line('FILTER=<ID=NM,Description="Alignment noise level filter">')
         self.write_header_line('INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">')
         self.write_header_line('INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">')
         self.write_header_line('INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">')
@@ -95,17 +114,11 @@ class VCF:
         self.write_header_line('INFO=<ID=STRAND,Number=1,Type=String,Description="Strands of supporting reads for structural variant">')
         self.write_header_line('INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count, summed up over all samples">')
         self.write_header_line('INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="List of read support for all samples">')
-        #self.write_header_line('INFO=<ID=COVERAGE_REGIONAL,Number=A,Type=Float,Description="Coverage for local region">')
-        #self.write_header_line('INFO=<ID=COVERAGE_GLOBAL,Number=A,Type=Float,Description="Coverage for global region">')
-        #self.write_header_line('INFO=<ID=MINSUPPORT,Number=A,Type=Integer,Description="Determined minimum support threshold for this region">')
-        #self.write_header_line('INFO=<ID=COVERAGE_UPSTREAM,Number=A,Type=Float,Description="Coverage upstream of structural variation start">')
-        #self.write_header_line('INFO=<ID=COVERAGE_DOWNSTREAM,Number=A,Type=Float,Description="Coverage downstream of structural variation end">')
         self.write_header_line('INFO=<ID=CONSENSUS_SUPPORT,Number=1,Type=Integer,Description="Number of reads that support the generated insertion (INS) consensus sequence">')
         self.write_header_line('INFO=<ID=RNAMES,Number=.,Type=String,Description="Names of supporting reads (if enabled with --output-rnames)">')
         self.write_header_line('INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">')
         self.write_header_line('INFO=<ID=NM,Number=.,Type=Float,Description="Mean number of query alignment length adjusted mismatches of supporting reads">')
         self.write_header_line('INFO=<ID=PHASE,Number=.,Type=String,Description="Phasing information derived from supporting reads, represented as list of: HAPLOTYPE,PHASESET,HAPLOTYPE_SUPPORT,PHASESET_SUPPORT,HAPLOTYPE_FILTER,PHASESET_FILTER">')
-        #self.write_header_line('INFO=<ID=DBG_CLUSTER_SPAN,Number=A,Type=Float,Description="Debug info">')
 
         samples_header="\t".join(sample_id for internal_id,sample_id in self.config.sample_ids_vcf)
         self.write_raw(f"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{samples_header}")
@@ -193,6 +206,8 @@ class VCF:
             call.ref="N"
             call.alt=f"<{call.svtype}>"
 
+        call.qual=max(0,min(60,call.qual))
+
         self.write_raw("\t".join(str(v) for v in [call.contig,pos,self.config.id_prefix+call.id,call.ref,call.alt,call.qual,call.filter,info_str,self.genotype_format]+sample_genotypes))
         self.call_count+=1
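
As a usage note for the format_genotype change above: heterozygous calls that carry a phase index from phase_sv are now written with a '|' separator, and the allele order is swapped when the index is 1. Expected output under the code in this hunk (assumes the sniffles 2.0.7 Python package is importable; example values are made up):

from sniffles.vcf import format_genotype

# (allele_a, allele_b, GQ, DR, DV, phase) -- phase is None when the call could
# not be phased, otherwise 0 or 1 for haplotype 1/2.
print(format_genotype((0, 1, 40, 6, 9, None)))  # 0/1:40:6:9  (unphased)
print(format_genotype((0, 1, 40, 6, 9, 0)))     # 0|1:40:6:9  (phased)
print(format_genotype((0, 1, 40, 6, 9, 1)))     # 1|0:40:6:9  (phased, alleles swapped)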
 



View it on GitLab: https://salsa.debian.org/med-team/sniffles/-/compare/8dbe88f681350cad411cf2bc4b3d577f30efd3a4...cb2a02efeac5ac958a7695a4906cadaaf737669d

