[med-svn] [Git][med-team/sniffles][upstream] New upstream version 1.0.8+ds

Afif Elghraoui gitlab at salsa.debian.org
Sat Mar 24 20:23:09 UTC 2018


Afif Elghraoui pushed to branch upstream at Debian Med / sniffles


Commits:
cd703265 by Afif Elghraoui at 2018-03-24T15:01:36-04:00
New upstream version 1.0.8+ds
- - - - -


20 changed files:

- CMakeLists.txt
- README.md
- src/Alignment.cpp
- src/Alignment.h
- + src/ArgParseOutput.h
- src/CMakeLists.txt
- src/Genotyper/Genotyper.cpp
- src/Paramer.h
- src/Sniffles.cpp
- src/cluster/Cluster_SVs.cpp
- src/force_calling/Force_calling.cpp
- src/force_calling/VCF_parser.cpp
- src/print/BedpePrinter.cpp
- src/print/VCFPrinter.cpp
- src/sub/Breakpoint.cpp
- src/sub/Breakpoint.h
- src/sub/Detect_Breakpoints.cpp
- + test_set/README
- + test_set/expected_output.vcf
- + test_set/reads_region.bam


Changes:

=====================================
CMakeLists.txt
=====================================
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,13 +1,15 @@
 cmake_minimum_required(VERSION 2.8)
 project(Sniffles)
 
+option(STATIC "Build static binary" OFF)
+
  set( SNIF_VERSION_MAJOR 1 )
  set( SNIF_VERSION_MINOR 0 )
 IF(CMAKE_BUILD_TYPE STREQUAL "Debug")
     message(STATUS "Building in debug mode!")
-    set( SNIF_VERSION_BUILD 7-debug )
+    set( SNIF_VERSION_BUILD 8-debug )
 else()
-	set( SNIF_VERSION_BUILD 7 )
+	set( SNIF_VERSION_BUILD 8 )
 ENDIF()
 
 


=====================================
README.md
=====================================
--- a/README.md
+++ b/README.md
@@ -1,17 +1,14 @@
 # Sniffles
-Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis. Please note the current version of Sniffles requires sorted output from BWA-MEM (use -M and -x parameter) or NGM-LR with the optional SAM attributes enabled! If you experience problems or have suggestions please contact: fritz.sedlazeck at gmail.com
+Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis. Please note the current version of Sniffles requires sorted output from BWA-MEM (use -M and -x parameter) or NGMLR with the optional SAM attributes enabled! If you experience problems or have suggestions please contact: fritz.sedlazeck at gmail.com
 
 
 Please see our github wiki for more information (https://github.com/fritzsedlazeck/Sniffles/wiki)
 
 **************************************
-
-## NextGenMap-LR: (NGM-LR)
-
-
-Sniffles performs best with the mappings of NGM-LR our novel long read mapping method. 
+## NGMLR
+Sniffles performs best with the mappings of NGMLR our novel long read mapping method. 
 Please see:
-https://github.com/philres/nextgenmap-lr
+https://github.com/philres/ngmlr
 
 ****************************************
 ## Citation:
@@ -24,7 +21,7 @@ http://www.biorxiv.org/content/early/2017/07/28/169557
 [Accurate and fast detection of complex and nested structural variations using long read technologies](http://schatzlab.cshl.edu/presentations/2016/2016.10.28.BIODATA.PacBioSV.pdf)
 Biological Data Science, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 26 - 29.10.2016
 
-[NextGenMap-LR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis](http://www.cibiv.at/~philipp_/files/gi2016_poster_phr.pdf) 
+[NGMLR: Highly accurate read mapping of third generation sequencing reads for improved structural variation analysis](http://www.cibiv.at/~philipp_/files/gi2016_poster_phr.pdf) 
 Genome Informatics 2016, Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK, 19.09.-2.09.2016
 
 **************************************


=====================================
src/Alignment.cpp
=====================================
--- a/src/Alignment.cpp
+++ b/src/Alignment.cpp
@@ -51,6 +51,81 @@ void add_event(int pos, size_t & i, vector<differences_str> & events) {
 	events.insert(events.begin() + i, ev);
 }
 
+/*
+ * :6-ata:10+gtc:4*at:3,
+ * -ata: del
+ * +gtc: ins
+ * *at a->t
+ * :10 match bp
+ * cs:Z::5+g:7+t:4+c:7-a:6*tg:15*tg:2+g:6+c:1+c:1+c:4+a:15+g:4+c:12*gt*ta:2+c:7+t:8+t:17+t:8+g:5+g:10-g:4+g*tc:5+t*ag:10+a:4-ttt
+ */
+
+vector<differences_str> Alignment::summarize_csstring(std::vector<indel_str> &dels) {
+	string cs = this->get_cs();
+	int pos = this->getPosition();
+	int corr = 0;
+	int ref_pos = 0;
+	size_t pos_events = 0;
+	int max_size = (this->getRefLength() * 0.9) + getPosition();
+	//	comp_aln = clock();
+
+	indel_str del;
+	del.sequence = "";
+	del.pos = -1;
+
+	vector<differences_str> events;
+	differences_str ev;
+	std::cout<<"CS: "<<cs<<std::endl;
+	for (size_t i = 0; i < cs.size() && pos < max_size; i++) {
+		ev.position = pos;
+		if (cs[i] == ':') { //match (can be numbers or bp!)
+			i++;
+			int num = atoi(&cs[i]); //is 0 if letter!
+			bool is_long = ((num == 0 && cs[i] != '0'));
+			while ((i < cs.size() && cs[i] != '+') && (cs[i] != '-' && cs[i] != '+')) {
+				i++;
+				if (is_long) {
+					num++;
+				}
+			}
+			cout<<"\tMatch: "<<num<<std::endl;
+			pos += num;
+		} else if (cs[i] == '*') { //mismatch (ref,alt) pairs
+			//only every second char counts!
+			add_event(pos, pos_events, events);
+			pos++;
+			i += 2;
+			ev.type = 0; //mismatch
+			cout<<"\tMiss: "<<1<<std::endl;
+		} else if (cs[i] == '-') { //del
+			//collet del seq in dels!
+			indel_str del;
+			del.sequence = "";
+			del.pos = pos;
+			while ((i < cs.size() && cs[i] != '+') && (cs[i] != '-' && cs[i] != '+')) {
+				del.sequence += cs[i];
+			}
+			dels.push_back(del);
+			pos += del.sequence.size();
+			ev.type = del.sequence.size();
+			cout<<"\tDEL: "<<del.sequence.size()<<std::endl;
+
+		} else if (cs[i] == '+') { //ins
+			int num=0;
+			while ((i < cs.size() && cs[i] != '+') && (cs[i] != '-' && cs[i] != '+')) {
+				i++;
+				num++;
+			}
+			ev.type =num*-1;
+			cout<<"\tINS: "<<num<<std::endl;
+		}
+		events.push_back(ev);
+
+	}
+	std::cout<<"end CS: "<<cs<<std::endl;
+	return events;
+}
+
 vector<differences_str> Alignment::summarizeAlignment(std::vector<indel_str> &dels) {
 //	clock_t comp_aln = clock();
 	vector<differences_str> events;
@@ -69,16 +144,19 @@ vector<differences_str> Alignment::summarizeAlignment(std::vector<indel_str> &de
 			ev.position = pos;
 			ev.type = al->CigarData[i].Length; //deletion
 			ev.readposition = read_pos;
+			ev.resolved = true;
 			events.push_back(ev);
 			pos += al->CigarData[i].Length;
 		} else if (al->CigarData[i].Type == 'I') {
 			ev.position = pos;
+			ev.resolved = true;
 			ev.readposition = read_pos;
 			ev.type = al->CigarData[i].Length * -1; //insertion
 			events.push_back(ev);
 			read_pos += al->CigarData[i].Length;
 		} else if (al->CigarData[i].Type == 'N') {
 			pos += al->CigarData[i].Length;
+			ev.resolved = true;
 			read_pos += al->CigarData[i].Length;
 		} else if (al->CigarData[i].Type == 'S' && al->CigarData[i].Length > Parameter::Instance()->huge_ins) { /// Used for reads ranging into an inser
 			string sa;
@@ -91,6 +169,7 @@ vector<differences_str> Alignment::summarizeAlignment(std::vector<indel_str> &de
 				} else {
 					ev.readposition = read_pos;
 				}
+				ev.resolved = false;
 				ev.type = Parameter::Instance()->huge_ins * -1; //insertion: WE have to fix the length since we cannot estimate it!]
 				events.push_back(ev);
 			}
@@ -202,7 +281,7 @@ void Alignment::computeAlignment() {
 	}
 	if (to_del > 0) {
 		alignment.second = alignment.second.substr(0, alignment.second.size() - to_del);
-		//alignment.second.erase(alignment.second.size() - to_del, to_del);
+//alignment.second.erase(alignment.second.size() - to_del, to_del);
 	}
 	Parameter::Instance()->meassure_time(comp_aln, "\t\tCIGAR opterations ");
 	comp_aln = clock();
@@ -248,8 +327,8 @@ void Alignment::computeAlignment() {
 
 		cout << this->get_md() << endl;
 
-		//	exit(0);
-		//	return;
+//	exit(0);
+//	return;
 	}
 }
 int32_t Alignment::getPosition() {
@@ -600,7 +679,7 @@ vector<aln_str> Alignment::getSA(RefVector ref) {
 	string sa;
 	vector<aln_str> entries;
 	if (al->GetTag("SA", sa) && !sa.empty()) {
-		//store the main aln:
+//store the main aln:
 		aln_str tmp;
 		tmp.RefID = this->getRefID();
 		tmp.cigar = this->getCigar();
@@ -842,9 +921,24 @@ std::string Alignment::get_md() {
 	std::string md;
 	if (al->GetTag("MD", md)) {
 		return md;
+	}else{
+		std::cerr<<"No MD string detected! Check bam file! Otherwise generate using e.g. samtools."<<std::endl;
+		exit(0);
 	}
 	return md;
 }
+
+std::string Alignment::get_cs() {
+	std::string cs;
+	if (al->GetTag("cs", cs)) {
+		return cs;
+	}else{
+		std::cerr<<"No CS string detected! Check bam file!"<<std::endl;
+		exit(0);
+	}
+	return cs;
+}
+
 vector<str_event> Alignment::get_events_MD(int min_mis) {
 	vector<str_event> events;
 	/*std::string md;
@@ -1009,11 +1103,17 @@ vector<int> Alignment::get_avg_diff(double & dist, double & avg_del, double & av
 
 vector<str_event> Alignment::get_events_Aln() {
 
-	bool flag = (strcmp(this->getName().c_str(),Parameter::Instance()->read_name.c_str()) == 0);
+	bool flag = (strcmp(this->getName().c_str(), Parameter::Instance()->read_name.c_str()) == 0);
 
 //clock_t comp_aln = clock();
 	std::vector<indel_str> dels;
-	vector<differences_str> event_aln = summarizeAlignment(dels);
+	vector<differences_str> event_aln;
+	if (Parameter::Instance()->cs_string) {
+		cout<<"run cs check "<<std::endl;
+		event_aln = summarize_csstring(dels);
+	} else {
+		event_aln = summarizeAlignment(dels);
+	}
 //double time2 = Parameter::Instance()->meassure_time(comp_aln, "\tcompAln Events: ");
 
 	vector<str_event> events;
@@ -1032,7 +1132,7 @@ vector<str_event> Alignment::get_events_Aln() {
 	for (size_t i = 0; i < event_aln.size(); i++) {
 		pair_str tmp;
 		tmp.position = -1;
-		if (event_aln[i].type == 0) {
+		if (event_aln[i].type == 0) { //substitutions.
 			tmp = plane->add_mut(event_aln[i].position, 1, Parameter::Instance()->window_thresh);
 		} else {
 			tmp = plane->add_mut(event_aln[i].position, 1, Parameter::Instance()->window_thresh);	// abs(event_aln[i].type)
@@ -1049,7 +1149,6 @@ vector<str_event> Alignment::get_events_Aln() {
 	int stop = 0;
 	size_t start = 0;
 	for (size_t i = 0; i < profile.size() && stop < event_aln.size(); i++) {
-
 		if (profile[i].position >= event_aln[stop].position) {
 			//find the postion:
 			size_t pos = 0;
@@ -1072,8 +1171,10 @@ vector<str_event> Alignment::get_events_Aln() {
 				}
 				prev += prev_type;
 			}
-			start++; //we are running one too far!
 
+			if (start + 1 < event_aln.size()) { //TODO do some testing!
+				start++; //we are running one too far!
+			}
 			//run forward to identify the stop:
 			prev = event_aln[pos].position;
 			stop = pos;
@@ -1091,8 +1192,11 @@ vector<str_event> Alignment::get_events_Aln() {
 			if (stop > 0) {
 				stop--;
 			}
+
+			//	cout<<start<<" events: "<<event_aln[start].type <<" pos "<<event_aln[start].readposition<<endl;
 			int insert_max_pos = 0;
 			int insert_max = 0;
+
 			if (event_aln[start].type < 0) {
 				insert_max_pos = event_aln[start].position;
 				insert_max = abs(event_aln[start].type);
@@ -1101,6 +1205,13 @@ vector<str_event> Alignment::get_events_Aln() {
 			int del_max = 0;
 			int del_max_pos = 0;
 
+			if (event_aln[start].type > 0) {
+				//	cout<<"HIT"<<endl;
+				del_max_pos = event_aln[start].position;
+				del_max = event_aln[start].type;
+
+			}
+
 			double insert = 0;
 			double del = 0;
 			double mismatch = 0;
@@ -1122,6 +1233,8 @@ vector<str_event> Alignment::get_events_Aln() {
 					}
 				}
 			}
+
+			//	cout << "DELMAX: " << del_max << " " << Parameter::Instance()->avg_del << endl;
 			str_event tmp;
 			tmp.pos = event_aln[start].position;
 
@@ -1148,11 +1261,8 @@ vector<str_event> Alignment::get_events_Aln() {
 					}
 					tmp.read_pos = event_aln[start].readposition;
 					if (Parameter::Instance()->print_seq) {
-						//if (tmp.read_pos + tmp.length > this->getAlignment()->QueryBases.size() || tmp.read_pos<0) {
-						//	cerr << "BUG! ALN event INS: " << this->getName() << " " << tmp.read_pos << " " << tmp.length << " " << this->getAlignment()->QueryBases.size() << endl;
-						//	}
-						if(flag){
-							std::cout<<"Seq+:"<<this->getAlignment()->QueryBases.substr(tmp.read_pos, tmp.length)<<std::endl;
+						if (flag) {
+							std::cout << "Seq+:" << this->getAlignment()->QueryBases.substr(tmp.read_pos, tmp.length) << std::endl;
 
 						}
 						tmp.sequence = this->getAlignment()->QueryBases.substr(tmp.read_pos, tmp.length);
@@ -1193,14 +1303,14 @@ vector<str_event> Alignment::get_events_Aln() {
 			}
 
 			if (flag) {
-			 cout << "Read: " << " " << (double) this->getRefLength() << " events: " << event_aln.size() << " " << this->al->Name << std::endl;
-			 cout << "INS max " << insert_max << " del_max " << del_max << std::endl;
-			 cout << "INS:" << insert << " DEL: " << del << " MIS: " << mismatch << endl;
-			 cout << event_aln[start].position << " " << event_aln[stop].position << endl;
-			 cout << "store: " << tmp.pos << " " << tmp.pos + abs(tmp.length) << " " << tmp.length << endl;
-			 cout << tmp.sequence<<endl;
-			 cout << endl;
-			 }
+				cout << "Read: " << " " << (double) this->getRefLength() << " events: " << event_aln.size() << " " << this->al->Name << std::endl;
+				cout << "INS max " << insert_max << " del_max " << del_max << std::endl;
+				cout << "INS:" << insert << " DEL: " << del << " MIS: " << mismatch << endl;
+				cout << event_aln[start].position << " " << event_aln[stop].position << endl;
+				cout << "store: " << tmp.pos << " " << tmp.pos + abs(tmp.length) << " " << tmp.length << endl;
+				cout << tmp.sequence << endl;
+				cout << endl;
+			}
 
 			if (tmp.type != 0) {
 				events.push_back(tmp);


=====================================
src/Alignment.h
=====================================
--- a/src/Alignment.h
+++ b/src/Alignment.h
@@ -40,6 +40,7 @@ struct differences_str{
 	int position;
 	int readposition;
 	short type;
+	bool resolved;
 };
 struct indel_str{
 	int pos;
@@ -81,6 +82,7 @@ private:
 	 size_t get_length(std::vector<CigarOp> CigarData);
 	 int get_id(RefVector ref, std::string chr);
 	 vector<differences_str> summarizeAlignment(std::vector<indel_str> &dels);
+	 vector<differences_str> summarize_csstring (std::vector<indel_str> &dels) ;
 	 void sort_insert(aln_str tmp, vector<aln_str> &entries);
 
 	 void sort_insert_ref(aln_str tmp, vector<aln_str> &entries);
@@ -141,6 +143,7 @@ public:
 	 double get_num_mismatches(std::string md);
 	 double get_scrore_ratio();
 	 std::string get_md();
+	 std::string get_cs();
 	 double get_avg_indel_length_Cigar();
 	 vector<int> get_avg_diff(double & dist,double & avg_del, double & avg_len);
 


=====================================
src/ArgParseOutput.h
=====================================
--- /dev/null
+++ b/src/ArgParseOutput.h
@@ -0,0 +1,53 @@
+/**
+ * Contact: philipp.rescheneder at gmail.com
+ */
+
+#include <iostream>
+#include <string>
+
+#include <tclap/CmdLine.h>
+
+using std::cerr;
+using std::cout;
+using std::endl;
+
+class ArgParseOutput : public TCLAP::StdOutput
+{
+private:
+
+	std::string usageStr;
+
+	std::string versionStr;
+
+public:
+
+	ArgParseOutput(std::string usage, std::string version) {
+		usageStr = usage;
+		versionStr = version;
+	}
+
+	virtual ~ArgParseOutput() {
+
+	}
+
+	virtual void failure(TCLAP::CmdLineInterface& c, TCLAP::ArgException& e) {
+		cerr << "Error:" << endl;
+		cerr << "         " << e.error() << endl;
+		cerr << endl;
+		cerr << "Short usage:" << endl;
+		cerr << "       sniffles [options] -m <sorted.bam> -v <output.vcf> " << endl;
+		cerr << endl;
+		cerr << "For complete USAGE and HELP type:" << endl;
+		cerr << "    sniffles --help" << endl;
+		cerr << endl;
+		exit(1);
+	}
+
+	virtual void usage(TCLAP::CmdLineInterface& c) {
+		cerr << usageStr << std::endl;
+	}
+
+	virtual void version(TCLAP::CmdLineInterface& c) {
+
+	}
+};


=====================================
src/CMakeLists.txt
=====================================
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -1,11 +1,21 @@
 cmake_minimum_required(VERSION 2.8)
 project(Sniffles)
 
+
 include_directories (../lib/bamtools-2.3.0/src)
 include_directories(../lib/tclap-1.2.1/include)
 
 configure_file( Version.h.in ${CMAKE_SOURCE_DIR}/src/Version.h )
 
+
+
+ IF(STATIC)
+        SET(CMAKE_FIND_LIBRARY_SUFFIXES ".a")
+        SET(BUILD_SHARED_LIBRARIES OFF)
+        SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static")
+ENDIF()
+
+
 add_executable(sniffles 	
                     tree/Breakpoint_Tree.cpp
                     Genotyper/Genotyper.cpp


=====================================
src/Genotyper/Genotyper.cpp
=====================================
--- a/src/Genotyper/Genotyper.cpp
+++ b/src/Genotyper/Genotyper.cpp
@@ -36,7 +36,6 @@ std::string Genotyper::mod_breakpoint_vcf(string buffer, int ref) {
 	//parse #reads supporting
 	//print #ref
 	string entry;
-
 	int pos = 0;
 
 	pos = buffer.find_last_of("GT");
@@ -46,12 +45,16 @@ std::string Genotyper::mod_breakpoint_vcf(string buffer, int ref) {
 	buffer = buffer.substr(pos + 1);		// the right part is only needed:
 	pos = buffer.find_last_of(':');
 	int support = atoi(buffer.substr(pos + 1).c_str());
-	entry += assess_genotype(ref, support);
+	string msg = assess_genotype(ref, support);
+	if (msg.empty()) {
+		return "";
+	}
+	entry += msg;
 	return entry;
 
 }
 
-std::string Genotyper::mod_breakpoint_bedpe( string buffer, int ref) {
+std::string Genotyper::mod_breakpoint_bedpe(string buffer, int ref) {
 
 	std::string tmp = buffer;
 	std::string entry = tmp;
@@ -180,7 +183,7 @@ void Genotyper::update_file(Breakpoint_Tree & tree, breakpoint_node *& node) {
 	}
 
 	string buffer;
-	getline(myfile,buffer);
+	getline(myfile, buffer);
 	//parse SVs breakpoints in file
 
 	while (!myfile.eof()) { // TODO:if first -> we need to define AF!
@@ -210,7 +213,7 @@ void Genotyper::update_file(Breakpoint_Tree & tree, breakpoint_node *& node) {
 			fprintf(file, "%c", '\n');
 		}
 
-		getline(myfile,buffer);
+		getline(myfile, buffer);
 	}
 	myfile.close();
 	fclose(file);
@@ -240,12 +243,12 @@ std::vector<std::string> Genotyper::read_SVs(Breakpoint_Tree & tree, breakpoint_
 	//size_t buffer_size = 250000000;
 	string buffer;
 
-	getline(myfile,buffer);
+	getline(myfile, buffer);
 	//char* buffer = new char[buffer_size];
 	//myfile.getline(buffer, buffer_size);
 	//parse SVs breakpoints in file
 
-	int num_sv=0;
+	int num_sv = 0;
 	int prev_pos1 = 0;
 	int prev_pos2 = 0;
 	while (!myfile.eof()) {
@@ -259,13 +262,13 @@ std::vector<std::string> Genotyper::read_SVs(Breakpoint_Tree & tree, breakpoint_
 			} else {
 				tmp = get_breakpoint_bedpe(buffer);
 			}
-		//	std::cout << "SV: " << tmp.pos << " " << tmp.pos2 << std::endl;
+			//	std::cout << "SV: " << tmp.pos << " " << tmp.pos2 << std::endl;
 			tree.insert(node, tmp.chr, tmp.pos, true); //true: start;
 			tree.insert(node, tmp.chr2, tmp.pos2, false); //false: stop;//
 			num_sv++;
-			if(num_sv%1000==0){
-				cout<<"\t\tRead in SV: "<<num_sv<<endl;
-				}
+			if (num_sv % 1000 == 0) {
+				cout << "\t\tRead in SV: " << num_sv << endl;
+			}
 		} else if (buffer[2] == 'c' && buffer[3] == 'o') { //##contig=<ID=chr1,length=699930>
 		//fill the refdict.
 			std::string id = "";
@@ -274,7 +277,7 @@ std::vector<std::string> Genotyper::read_SVs(Breakpoint_Tree & tree, breakpoint_
 			}
 			ref_dict.push_back(id);
 		}
-		getline(myfile,buffer);
+		getline(myfile, buffer);
 		//myfile.getline(buffer, buffer_size);
 	}
 	myfile.close();
@@ -293,7 +296,7 @@ void Genotyper::compute_cov(Breakpoint_Tree & tree, breakpoint_node *& node, std
 	int prev_id = -1;
 	while (nbytes != 0) {
 		//	 std::cout<<"Read: "<<" " <<tmp.chr_id<<":"<<ref_dict[tmp.chr_id]<<" " <<tmp.start<<" "<<tmp.length<<std::endl;
-			if (prev_id != tmp.chr_id) {
+		if (prev_id != tmp.chr_id) {
 			cout << "\t\tScanning CHR " << ref_dict[tmp.chr_id] << endl;
 			prev_id = tmp.chr_id;
 		}
@@ -315,7 +318,7 @@ void Genotyper::update_SVs() {
 	cout << "\tCleaning tmp files" << endl;
 	string del = "rm ";
 	del += Parameter::Instance()->tmp_genotyp;
-	//system(del.c_str());
+	system(del.c_str());
 }
 
 void Genotyper::update_SVs(std::vector<Breakpoint *> & svs, long ref_space) { //refspace for the ref reads!!


=====================================
src/Paramer.h
=====================================
--- a/src/Paramer.h
+++ b/src/Paramer.h
@@ -25,8 +25,8 @@ class Parameter {
 private:
 	Parameter() {
 		window_thresh=10;//TODO check!
-		version="1.0.7";
-		huge_ins = 2000;//TODO check??
+		version="1.0.8";
+		huge_ins = 999;//TODO check??
 	}
 	~Parameter() {
 
@@ -81,6 +81,9 @@ public:
 	bool ignore_std;
 	bool reportBND;
 	bool print_seq;
+	bool change_coords; //indicating for --Ivcf
+	bool skip_parameter_estimation;
+	bool cs_string;
 
 	void set_regions(std::string reg) {
 		size_t i = 0;


=====================================
src/Sniffles.cpp
=====================================
--- a/src/Sniffles.cpp
+++ b/src/Sniffles.cpp
@@ -22,11 +22,13 @@
 #include "Ignore_Regions.h"
 #include "plane-sweep/PlaneSweep_slim.h"
 #include "print/BedpePrinter.h"
+#include "ArgParseOutput.h"
 #include "force_calling/Force_calling.h"
 
 //cmake -D CMAKE_C_COMPILER=/opt/local/bin/gcc-mp-4.7 -D CMAKE_CXX_COMPILER=/opt/local/bin/g++-mp-4.7 ..
 
 //TODO:
+//check strand headers.
 // strand bias??
 // I think you could make your performance on PacBio reads even better with a few modifications:
 //b. In pbsv, I use a simply mononucleotide consistency check to determine whether to cluster insertions from different reads as supporting the "same" events.  In addition to looking at the similarity of length and breakpoints,
@@ -35,57 +37,141 @@
 //[min(A1,A2)+min(C1,C2)+min(G1,G2)+min(T1,T2)[/[max...]/
 Parameter* Parameter::m_pInstance = NULL;
 
+template<typename T>
+void printParameter(std::stringstream & usage, TCLAP::ValueArg<T> & arg) {
+
+	usage << "    " << arg.longID() << std::endl;
+	usage << "        " << arg.getDescription();
+	if (!arg.isRequired()) {
+		usage << " [" << arg.getValue() << "]";
+	}
+	usage << std::endl;
+}
+
+void printParameter(std::stringstream & usage, TCLAP::SwitchArg & arg) {
+
+	usage << "    " << arg.longID() << std::endl;
+	usage << "        " << arg.getDescription();
+	if (!arg.isRequired()) {
+		usage << " [" << (arg.getValue() ? "true" : "false") << "]";
+	}
+	usage << std::endl;
+}
+
 void read_parameters(int argc, char *argv[]) {
 
+//	TCLAP::CmdLine cmd("", ' ', "", true);
 	TCLAP::CmdLine cmd("Sniffles version ", ' ', Parameter::Instance()->version);
-	TCLAP::ValueArg<std::string> arg_bamfile("m", "mapped_reads", "Sorted bam File", true, "", "string");
-	TCLAP::ValueArg<std::string> arg_vcf("v", "vcf", "VCF output file name", false, "", "string");
-	TCLAP::ValueArg<std::string> arg_input_vcf("", "Ivcf", "Input VCF file name. Enable force calling", false, "", "string");
-	TCLAP::ValueArg<std::string> arg_bedpe("b", "bedpe", " bedpe output file name", false, "", "string");
+
+	TCLAP::ValueArg<std::string> arg_bamfile("m", "mapped_reads", "Sorted bam File", true, "", "string", cmd);
+	TCLAP::ValueArg<std::string> arg_vcf("v", "vcf", "VCF output file name", false, "", "string", cmd);
+	TCLAP::ValueArg<std::string> arg_input_vcf("", "Ivcf", "Input VCF file name. Enable force calling", false, "", "string", cmd);
+	TCLAP::ValueArg<std::string> arg_bedpe("b", "bedpe", " bedpe output file name", false, "", "string", cmd);
+	TCLAP::ValueArg<std::string> arg_tmp_file("", "tmp_file", "path to temporary file otherwise Sniffles will use the current directory.", false, "", "string", cmd);
+
 	//TCLAP::ValueArg<std::string> arg_chrs("c", "chrs", " comma seperated list of chrs to scan", false, "", "string");
-	TCLAP::ValueArg<int> arg_support("s", "min_support", "Minimum number of reads that support a SV. Default: 10", false, 10, "int");
-	TCLAP::ValueArg<int> arg_splits("", "max_num_splits", "Maximum number of splits per read to be still taken into account. Default: 7", false, 7, "int");
-	TCLAP::ValueArg<int> arg_dist("d", "max_distance", "Maximum distance to group SV together. Default: 1kb", false, 1000, "int");
-	TCLAP::ValueArg<int> arg_threads("t", "threads", "Number of threads to use. Default: 3", false, 3, "int");
-	TCLAP::ValueArg<int> arg_minlength("l", "min_length", "Minimum length of SV to be reported. Default: 30", false, 30, "int");
-	TCLAP::ValueArg<int> arg_mq("q", "minmapping_qual", "Minimum Mapping Quality. Default: 20", false, 20, "int");
-	TCLAP::ValueArg<int> arg_numreads("n", "num_reads_report", "Report up to N reads that support the SV in the vcf file. -1: report all. Default: 0", false, 0, "int");
-	TCLAP::ValueArg<int> arg_segsize("r", "min_seq_size", "Discard read if non of its segment is larger then this. Default: 2kb", false, 2000, "int");
-	TCLAP::ValueArg<int> arg_zmw("z", "min_zmw", "Discard SV that are not supported by at least x zmws. This applies only for PacBio recognizable reads. Default: 0", false, 0, "int");
-	TCLAP::ValueArg<std::string> arg_tmp_file("", "tmp_file", "path to temporary file otherwise Sniffles will use the current directory.", false, "", "string");
+	TCLAP::ValueArg<int> arg_support("s", "min_support", "Minimum number of reads that support a SV.", false, 10, "int", cmd);
+	TCLAP::ValueArg<int> arg_splits("", "max_num_splits", "Maximum number of splits per read to be still taken into account.", false, 7, "int", cmd);
+	TCLAP::ValueArg<int> arg_dist("d", "max_distance", "Maximum distance to group SV together.", false, 1000, "int", cmd);
+	TCLAP::ValueArg<int> arg_threads("t", "threads", "Number of threads to use.", false, 3, "int", cmd);
+	TCLAP::ValueArg<int> arg_minlength("l", "min_length", "Minimum length of SV to be reported.", false, 30, "int", cmd);
+	TCLAP::ValueArg<int> arg_mq("q", "minmapping_qual", "Minimum Mapping Quality.", false, 20, "int", cmd);
+	TCLAP::ValueArg<int> arg_numreads("n", "num_reads_report", "Report up to N reads that support the SV in the vcf file. -1: report all.", false, 0, "int", cmd);
+	TCLAP::ValueArg<int> arg_segsize("r", "min_seq_size", "Discard read if non of its segment is larger then this.", false, 2000, "int", cmd);
+	TCLAP::ValueArg<int> arg_zmw("z", "min_zmw", "Discard SV that are not supported by at least x zmws. This applies only for PacBio recognizable reads.", false, 0, "int", cmd);
+	TCLAP::ValueArg<int> arg_cluster_supp("", "cluster_support", "Minimum number of reads supporting clustering of SV.", false, 1, "int", cmd);
+	TCLAP::ValueArg<int> arg_parameter_maxdist("", "max_dist_aln_events", "Maximum distance between alignment (indel) events.", false, 4, "int", cmd);
+	TCLAP::ValueArg<int> arg_parameter_maxdiff("", "max_diff_per_window", "Maximum differences per 100bp.", false, 50, "int", cmd);
+
 	TCLAP::SwitchArg arg_genotype("", "genotype", "Enables Sniffles to compute the genotypes.", cmd, false);
 	TCLAP::SwitchArg arg_cluster("", "cluster", "Enables Sniffles to phase SVs that occur on the same reads", cmd, false);
-	TCLAP::SwitchArg arg_std("", "ignore_sd", "Ignores the sd based filtering. Default: false", cmd, false);
-	TCLAP::SwitchArg arg_bnd("", "report_BND", "Report BND instead of Tra in vcf output. Default: false", cmd, false);
-	TCLAP::SwitchArg arg_seq("", "report_seq", "Report sequences for indels in vcf output. (Beta version!) Default: false", cmd, false);
-	TCLAP::ValueArg<int> arg_cluster_supp("", "cluster_support", "Minimum number of reads supporting clustering of SV. Default: 1", false, 1, "int");
-	TCLAP::ValueArg<float> arg_allelefreq("f", "allelefreq", "Threshold on allele frequency (0-1). Default=0.0", false, 0.0, "float");
-
-	TCLAP::ValueArg<float> arg_hetfreq("", "min_het_af", "Threshold on allele frequency (0-1). Default=0.0", false, 0.3, "float");
-	TCLAP::ValueArg<float> arg_homofreq("", "min_homo_af", "Threshold on allele frequency (0-1). Default=0.0", false, 0.8, "float");
-
-	cmd.add(arg_homofreq);
-	cmd.add(arg_hetfreq);
-	cmd.add(arg_input_vcf);
-	cmd.add(arg_cluster_supp);
-	cmd.add(arg_numreads);
-	cmd.add(arg_zmw);
-	cmd.add(arg_segsize);
-	cmd.add(arg_tmp_file);
-	cmd.add(arg_dist);
-	cmd.add(arg_threads);
-	cmd.add(arg_minlength);
-	cmd.add(arg_mq);
-	cmd.add(arg_splits);
-	cmd.add(arg_bedpe);
-	cmd.add(arg_vcf);
-	cmd.add(arg_allelefreq);
-	cmd.add(arg_support);
-	cmd.add(arg_bamfile);
-//	cmd.add(arg_chrs);
+	TCLAP::SwitchArg arg_std("", "ignore_sd", "Ignores the sd based filtering. ", cmd, false);
+	TCLAP::SwitchArg arg_bnd("", "report_BND", "Report BND instead of Tra in vcf output. ", cmd, false);
+	TCLAP::SwitchArg arg_seq("", "report_seq", "Report sequences for indels in vcf output. (Beta version!) ", cmd, false);
+	TCLAP::SwitchArg arg_coords("", "change_coords", "Adopt coordinates for force calling if finding evidence. ", cmd, false);
+	TCLAP::SwitchArg arg_parameter("", "skip_parameter_estimation", "Enables the scan if only very few reads are present. ", cmd, false);
+	TCLAP::SwitchArg arg_cs_string("", "cs_string", "Enables the scan of CS string instead of Cigar and MD. ", cmd, false);
+
+
+	TCLAP::ValueArg<float> arg_allelefreq("f", "allelefreq", "Threshold on allele frequency (0-1). ", false, 0.0, "float", cmd);
+	TCLAP::ValueArg<float> arg_hetfreq("", "min_het_af", "Threshold on allele frequency (0-1). ", false, 0.3, "float", cmd);
+	TCLAP::ValueArg<float> arg_homofreq("", "min_homo_af", "Threshold on allele frequency (0-1). ", false, 0.8, "float", cmd);
+	TCLAP::ValueArg<float> arg_delratio("", "del_ratio", "Estimated ration of deletions per read (0-1). ", false, 0.0458369, "float", cmd);
+	TCLAP::ValueArg<float> arg_insratio("", "ins_ratio", "Estimated ratio of insertions per read (0-1). ", false, 0.049379, "float", cmd);
+
+	std::stringstream usage;
+	usage << "" << std::endl;
+	usage << "Usage: sniffles [options] -m <sorted.bam> -v <output.vcf> " << std::endl;
+
+	usage << "" << std::endl;
+	usage << "Input/Output:" << std::endl;
+	printParameter<std::string>(usage, arg_bamfile);
+	printParameter<std::string>(usage, arg_vcf);
+	printParameter<std::string>(usage, arg_bedpe);
+	printParameter<std::string>(usage, arg_input_vcf);
+	printParameter<std::string>(usage, arg_tmp_file);
+
+	usage << "" << std::endl;
+	usage << "General:" << std::endl;
+	printParameter<int>(usage, arg_support);
+	printParameter<int>(usage, arg_splits);
+	printParameter<int>(usage, arg_dist);
+	printParameter<int>(usage, arg_threads);
+	printParameter<int>(usage, arg_minlength);
+	printParameter<int>(usage, arg_mq);
+	printParameter<int>(usage, arg_numreads);
+	printParameter<int>(usage, arg_segsize);
+	printParameter<int>(usage, arg_zmw);
+	printParameter(usage,arg_cs_string);
+
+	usage << "" << std::endl;
+	usage << "Clustering/phasing and genotyping:" << std::endl;
+	printParameter(usage, arg_genotype);
+	printParameter(usage, arg_cluster);
+	printParameter<int>(usage, arg_cluster_supp);
+	printParameter<float>(usage, arg_allelefreq);
+	printParameter<float>(usage, arg_homofreq);
+	printParameter<float>(usage, arg_hetfreq);
+
+	usage << "" << std::endl;
+	usage << "Advanced:" << std::endl;
+	printParameter(usage, arg_bnd);
+	printParameter(usage, arg_seq);
+	printParameter(usage, arg_std);
+
+	usage << "" << std::endl;
+	usage << "Parameter estimation:" << std::endl;
+	printParameter(usage, arg_parameter);
+	printParameter<float>(usage, arg_delratio);
+	printParameter<float>(usage, arg_insratio);
+	printParameter<int>(usage, arg_parameter_maxdiff);
+	printParameter<int>(usage, arg_parameter_maxdist);
+
+	cmd.setOutput(new ArgParseOutput(usage.str(), ""));
+
+	/*	cmd.add(arg_homofreq);
+	 cmd.add(arg_hetfreq);
+	 cmd.add(arg_input_vcf);
+	 cmd.add(arg_cluster_supp);
+	 cmd.add(arg_numreads);
+	 cmd.add(arg_zmw);
+	 cmd.add(arg_segsize);
+	 cmd.add(arg_tmp_file);
+	 cmd.add(arg_dist);
+	 cmd.add(arg_threads);
+	 cmd.add(arg_minlength);
+	 cmd.add(arg_mq);
+	 cmd.add(arg_splits);
+	 cmd.add(arg_bedpe);
+	 cmd.add(arg_vcf);
+	 cmd.add(arg_allelefreq);
+	 cmd.add(arg_support);
+	 cmd.add(arg_bamfile);
+	 //	cmd.add(arg_chrs);*/
 	//parse cmd:
 	cmd.parse(argc, argv);
 
+	Parameter::Instance()->change_coords = arg_coords.getValue();
 	Parameter::Instance()->debug = true;
 	Parameter::Instance()->score_treshold = 10;
 	Parameter::Instance()->read_name = " "; //21_16296949_+";//21_40181680_-";//m151102_123142_42286_c100922632550000001823194205121665_s1_p0/80643/0_20394"; //"22_36746138"; //just for debuging reasons!
@@ -112,32 +198,43 @@ void read_parameters(int argc, char *argv[]) {
 	Parameter::Instance()->min_zmw = arg_zmw.getValue();
 	Parameter::Instance()->homfreq = arg_homofreq.getValue();
 	Parameter::Instance()->hetfreq = arg_hetfreq.getValue();
+	Parameter::Instance()->skip_parameter_estimation = arg_parameter.getValue();
+	Parameter::Instance()->cs_string = arg_cs_string.getValue();
+
+	if (Parameter::Instance()->skip_parameter_estimation) {
+		cout<<"\tSkip parameter estimation."<<endl;
+		Parameter::Instance()->score_treshold = 2;
+		Parameter::Instance()->window_thresh = arg_parameter_maxdiff.getValue();
+		Parameter::Instance()->max_dist_alns = arg_parameter_maxdist.getValue();
+		Parameter::Instance()->avg_del =arg_delratio.getValue();
+		Parameter::Instance()->avg_ins = arg_insratio.getValue();
+	}
 
 	//Parse IDS:
 	/*std::string buffer = arg_chrs.getValue();
-	int count = 0;
-	std::string name = "";
-	for (size_t i = 0; i < buffer.size(); i++) {
-		if (buffer[i] == ',') {
-			Parameter::Instance()->chr_names[name] = true;
-			name.clear();
-		} else {
-			name += buffer[i];
-		}
-	}
-	if (!name.empty()) {
-		Parameter::Instance()->chr_names[name] = true;
-	}
-*/
+	 int count = 0;
+	 std::string name = "";
+	 for (size_t i = 0; i < buffer.size(); i++) {
+	 if (buffer[i] == ',') {
+	 Parameter::Instance()->chr_names[name] = true;
+	 name.clear();
+	 } else {
+	 name += buffer[i];
+	 }
+	 }
+	 if (!name.empty()) {
+	 Parameter::Instance()->chr_names[name] = true;
+	 }
+	 */
 	if (Parameter::Instance()->min_allelel_frequency > 0 || !Parameter::Instance()->input_vcf.empty()) {
 		std::cerr << "Automatically enabling genotype mode" << std::endl;
 		Parameter::Instance()->genotype = true;
 	}
 
 	if (Parameter::Instance()->tmp_file.empty()) { //TODO change to genotyper file and phasing file!
-		if(Parameter::Instance()->output_bedpe.empty()){
+		if (Parameter::Instance()->output_bedpe.empty()) {
 			Parameter::Instance()->tmp_file = Parameter::Instance()->output_vcf;
-		}else{
+		} else {
 			Parameter::Instance()->tmp_file = Parameter::Instance()->output_bedpe;
 		}
 


=====================================
src/cluster/Cluster_SVs.cpp
=====================================
--- a/src/cluster/Cluster_SVs.cpp
+++ b/src/cluster/Cluster_SVs.cpp
@@ -151,6 +151,6 @@ void Cluster_SVS::update_SVs() {
 	std::cout << "\tCleaning tmp files" << std::endl;
 	std::string del = "rm ";
 	del += Parameter::Instance()->tmp_phasing;
-//	system(del.c_str());
+	system(del.c_str());
 
 }


=====================================
src/force_calling/Force_calling.cpp
=====================================
--- a/src/force_calling/Force_calling.cpp
+++ b/src/force_calling/Force_calling.cpp
@@ -31,35 +31,59 @@ void fill_tree(IntervallTree & final, TNode *& root_final, RefVector ref, std::m
 	long length = 0;
 	for (size_t i = 0; i < ref.size(); i++) {
 		ref_lens[ref[i].RefName.c_str()] = length;
-		length += ref[i].RefLength + Parameter::Instance()->max_dist;
+		length += (long) ref[i].RefLength + (long) Parameter::Instance()->max_dist;
 	}
 
+	//sometimes the stop coordinates are off especially for smaller chrs!??
+
 	//parse VCF file
 	std::vector<strvcfentry> entries = parse_vcf(Parameter::Instance()->input_vcf, 0);
 	std::cout << "\t\t" << entries.size() << " SVs found in input." << std::endl;
+	int invalid_svs=0;
 	for (size_t i = 0; i < entries.size(); i++) {
 		if (entries[i].type != -1) {
 			position_str svs;
+			//cout<<"start: "<<entries[i].start.chr << " stop "<<entries[i].stop.chr<<endl;
 			svs.start.min_pos = (long) entries[i].start.pos + ref_lens[entries[i].start.chr];
 			svs.stop.max_pos = (long) entries[i].stop.pos + ref_lens[entries[i].stop.chr];
 			read_str read;
+			if(ref_lens.find(entries[i].start.chr)==ref_lens.end()){
+				cerr<<"Warning undefined CHR in VCF vs. BAM header: "<<entries[i].start.chr<<endl;
+			}
+			if(ref_lens.find(entries[i].stop.chr)==ref_lens.end()){
+				cerr<<"Warning undefined CHR in VCF vs. BAM header: "<<entries[i].stop.chr<<endl;
+			}
+			read.coordinates.first = (long) entries[i].start.pos + ref_lens[entries[i].start.chr];
+			read.coordinates.second = (long) entries[i].stop.pos + ref_lens[entries[i].stop.chr];
+			if (entries[i].type == 4) { //ins?
+				if(entries[i].sv_len== Parameter::Instance()->huge_ins){
+					entries[i].sv_len++; // bad hack!
+				}
+				svs.stop.max_pos += (long) entries[i].sv_len;
+				read.coordinates.second+=(long) entries[i].sv_len;
+			//	cout << "Parse: " << entries[i].start.pos << " " << entries[i].stop.pos << " " << svs.start.min_pos  <<" "<<svs.stop.max_pos  << endl;
+			}
+
 			read.SV = assign_type(entries[i].type);
-			read.strand=entries[i].strands;
+			read.strand = entries[i].strands;
 			read.type = 2; //called
+			read.length = entries[i].sv_len; //svs.stop.max_pos-svs.start.min_pos;//try
 			svs.support["input"] = read;
-			Breakpoint * br = new Breakpoint(svs, (long) entries[i].sv_len);
+			Breakpoint * br = new Breakpoint(svs, (long) entries[i].sv_len, read.SV);
 			final.insert(br, root_final);
-			//std::cout << "Print:" << std::endl;
-			//final.print(root_final);
 		} else {
-			cerr << "Invalid type found skipping" << endl;
+			invalid_svs++;
+
 		}
 	}
+	cerr << "Invalid types found skipping " << invalid_svs    <<" entries." << endl;
+	//std::cout << "Print:" << std::endl;
+	//final.print(root_final);
 	entries.clear();
 }
 
 void force_calling(std::string bam_file, IPrinter *& printer) {
-	cout<<"Force calling SVs"<<endl;
+	cout << "Force calling SVs" << endl;
 	//parse reads
 	//only process reads overlapping SV
 	estimate_parameters(Parameter::Instance()->bam_files[0]);
@@ -81,9 +105,8 @@ void force_calling(std::string bam_file, IPrinter *& printer) {
 	std::map<std::string, long> ref_lens;
 	fill_tree(final, root_final, ref, ref_lens);
 
-	std::cout << "Start parsing..." << std::endl;
-
 	int current_RefID = 0;
+	std::cout << "Start parsing: Chr "<<ref[current_RefID].RefName << std::endl;
 
 	//FILE * alt_allel_reads;
 	FILE * ref_allel_reads;
@@ -91,6 +114,7 @@ void force_calling(std::string bam_file, IPrinter *& printer) {
 		ref_allel_reads = fopen(Parameter::Instance()->tmp_genotyp.c_str(), "wb");
 	}
 	Alignment * tmp_aln = mapped_file->parseRead(Parameter::Instance()->min_mq);
+
 	long ref_space = ref_lens[ref[tmp_aln->getRefID()].RefName];
 	long num_reads = 0;
 	while (!tmp_aln->getQueryBases().empty()) {
@@ -103,9 +127,9 @@ void force_calling(std::string bam_file, IPrinter *& printer) {
 			}
 
 			//check if overlap with any breakpoint!!
-			long read_start_pos = (long) tmp_aln->getPosition() - (long)Parameter::Instance()->max_dist;
+			long read_start_pos = (long) tmp_aln->getPosition() - (long) Parameter::Instance()->max_dist;
 			read_start_pos += ref_space;
-			long read_stop_pos = read_start_pos + (long) tmp_aln->getAlignment()->Length + (long)Parameter::Instance()->max_dist;	//getRefLength();//(long) tmp_aln->getPosition();
+			long read_stop_pos = read_start_pos + (long) tmp_aln->getAlignment()->Length + (long) Parameter::Instance()->max_dist;	//getRefLength();//(long) tmp_aln->getPosition();
 
 			if (final.overlaps(read_start_pos, read_stop_pos, root_final)) {
 				//SCAN read:
@@ -166,7 +190,7 @@ void force_calling(std::string bam_file, IPrinter *& printer) {
 		}
 	}
 
-	std::cout << "Print:" << std::endl;
+	//std::cout << "Print:" << std::endl;
 	//final.print(root_final);
 
 	//filter and copy results:
@@ -186,4 +210,5 @@ void force_calling(std::string bam_file, IPrinter *& printer) {
 		points[i]->predict_SV();
 		printer->printSV(points[i]); //redo! Ignore min support + STD etc.
 	}
+	std::cout << "Fin!" << std::endl;
 }


=====================================
src/force_calling/VCF_parser.cpp
=====================================
--- a/src/force_calling/VCF_parser.cpp
+++ b/src/force_calling/VCF_parser.cpp
@@ -96,6 +96,7 @@ short get_type(std::string type) {
 	} else if (strncmp(type.c_str(), "BND", 3) == 0) { //can be inv/inter/tra
 		return 5;
 	}
+	//std::cout << "type:" << type[0] << type[1] << type[2] << std::endl;
 	return -1;
 }
 
@@ -276,8 +277,8 @@ std::string get_most_effect(std::string alt, int ref) {
 }
 
 std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
-	size_t buffer_size = 200000000;
-	char*buffer = new char[buffer_size];
+	//size_t buffer_size = 200000000;
+	std::string buffer; //= new char[buffer_size];
 	std::ifstream myfile;
 	myfile.open(filename.c_str(), std::ifstream::in);
 	if (!myfile.good()) {
@@ -286,8 +287,8 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 	}
 
 	std::vector<strvcfentry> calls;
-	myfile.getline(buffer, buffer_size);
-
+	//myfile.getline(buffer, buffer_size);
+	getline(myfile, buffer);
 	int num = 0;
 	while (!myfile.eof()) {
 		if (buffer[0] != '#') {
@@ -306,7 +307,7 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 			tmp.num_reads.first = 0;
 			tmp.num_reads.second = 0;
 			tmp.sv_len = -1;
-			for (size_t i = 0; i < buffer_size && buffer[i] != '\0' && buffer[i] != '\n'; i++) {
+			for (size_t i = 0; i < buffer.size() && buffer[i] != '\0' && buffer[i] != '\n'; i++) {
 
 				if (count == 0 && buffer[i] != '\t') {
 					tmp.start.chr += buffer[i];
@@ -328,7 +329,7 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 					tmp.stop = parse_stop(&buffer[i]);
 					//std::cout<<"Stop:"<<tmp.stop.pos<<std::endl;
 				}
-				if (count == 7 && strncmp(&buffer[i], "SVTYPE=", 7) == 0) {
+				if (count == 7 && (tmp.type == -1 && strncmp(&buffer[i], "SVTYPE=", 7) == 0)) {
 					tmp.type = get_type(std::string(&buffer[i + 7]));
 				}
 				if (count == 7 && strncmp(&buffer[i], ";SU=", 4) == 0) { //for lumpy!
@@ -376,18 +377,13 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 					tmp.num_reads = parse_delly(&buffer[i]);
 				}
 
-				if (count == 4 && buffer[i - 1] == '<') {
+				if (count == 4 && (tmp.type == -1 && buffer[i - 1] == '<')) {
 					tmp.type = get_type(std::string(&buffer[i]));
 				}
 				if (tmp.stop.pos == -1 && (count == 4 && (buffer[i - 1] == '[' || buffer[i - 1] == ']'))) {
 					tmp.stop = parse_pos(&buffer[i - 1]);
 				}
 
-				if (count == 9 && buffer[i - 1] == '\t') {
-					tmp.calls[filename] = std::string(&buffer[i]);
-					break;
-				}
-
 				if (count < 9) {
 					tmp.header += buffer[i];
 				}
@@ -434,14 +430,7 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 				tmp.sv_len = abs(tmp.start.pos - tmp.stop.pos);
 			}
 			if ((strcmp(tmp.start.chr.c_str(), tmp.stop.chr.c_str()) != 0 || (tmp.sv_len >= min_svs))) { // || tmp.type==4
-				std::size_t found = tmp.stop.chr.find("chr");
-				if (found != std::string::npos) {
-					tmp.stop.chr.erase(tmp.stop.chr.begin() + found, tmp.stop.chr.begin() + found + 3);
-				}
-				found = tmp.start.chr.find("chr");
-				if (found != std::string::npos) {
-					tmp.start.chr.erase(tmp.start.chr.begin() + found, tmp.start.chr.begin() + found + 3);
-				}
+
 
 				if (tmp.type == 5) { //BND
 					if (strcmp(tmp.stop.chr.c_str(), tmp.start.chr.c_str()) == 0) {
@@ -452,11 +441,12 @@ std::vector<strvcfentry> parse_vcf(std::string filename, int min_svs) {
 				}
 				calls.push_back(tmp);
 			}
+
 			tmp.calls.clear();
 		} else {
 
 		}
-		myfile.getline(buffer, buffer_size);
+		getline(myfile, buffer);
 	}
 	myfile.close();
 //std::cout << calls.size() << std::endl;


=====================================
src/print/BedpePrinter.cpp
=====================================
--- a/src/print/BedpePrinter.cpp
+++ b/src/print/BedpePrinter.cpp
@@ -8,7 +8,7 @@
 #include "BedpePrinter.h"
 
 void BedpePrinter::print_header() {
-	fprintf(file, "%s", "#Chrom\tstart\tstop\tchrom2\tstart2\tstop2\tvariant_name/ID\tscore (smaller is better)\tstrand1\tstrand2\ttype\tnumber_of_split_reads\tbest_chr1\tbest_start\tbest_chr2\tbest_stop\tpredicted_length\n");
+	fprintf(file, "%s", "#Chrom\tstart\tstop\tchrom2\tstart2\tstop2\tvariant_name/ID\tscore (smaller is better)\tstrand1\tstrand2\ttype\tnumber_of_split_reads\tbest_chr1\tbest_start\tbest_chr2\tbest_stop\tpredicted_length\tFILTER\n");
 }
 void BedpePrinter::print_body(Breakpoint * &SV, RefVector ref) {
 	if (!this->bed_tree.is_in(SV->get_coordinates().start.most_support, this->root) && !this->bed_tree.is_in(SV->get_coordinates().stop.most_support, this->root)) {
@@ -29,18 +29,35 @@ void BedpePrinter::print_body(Breakpoint * &SV, RefVector ref) {
 			std::string chr;
 			std::string strands = SV->get_strand(2);
 			int pos = IPrinter::calc_pos(SV->get_coordinates().start.min_pos, ref, chr);
+
+			//start coordinates:
 			fprintf(file, "%s", chr.c_str());
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", pos);
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", IPrinter::calc_pos(SV->get_coordinates().start.max_pos, ref, chr));
 			fprintf(file, "%c", '\t');
-			pos = IPrinter::calc_pos(SV->get_coordinates().stop.min_pos, ref, chr);
+
+			//stop coordinates
+			string chr_start;
+			int start = IPrinter::calc_pos(SV->get_coordinates().start.most_support, ref, chr_start);
+
+			long end_coord = SV->get_coordinates().stop.min_pos;
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				end_coord = std::max((SV->get_coordinates().stop.min_pos -(long)SV->get_length()), (long)start);
+			}
+
+			pos = IPrinter::calc_pos(end_coord, ref, chr);
+
 			fprintf(file, "%s", chr.c_str());
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", pos);
 			fprintf(file, "%c", '\t');
-			fprintf(file, "%i", IPrinter::calc_pos(SV->get_coordinates().stop.max_pos, ref, chr));
+			end_coord = SV->get_coordinates().stop.max_pos;
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				end_coord = std::max((SV->get_coordinates().stop.max_pos - (long) SV->get_length()), (long)start);
+			}
+			fprintf(file, "%i", IPrinter::calc_pos(end_coord, ref, chr));
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", id);
 			id++;
@@ -55,21 +72,37 @@ void BedpePrinter::print_body(Breakpoint * &SV, RefVector ref) {
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", SV->get_support());
 			fprintf(file, "%c", '\t');
-			pos = IPrinter::calc_pos(SV->get_coordinates().start.most_support, ref, chr);
-			fprintf(file, "%s", chr.c_str());
+			fprintf(file, "%s", chr_start.c_str());
 			fprintf(file, "%c", '\t');
-			fprintf(file, "%i", pos);
+			fprintf(file, "%i", start);
 			fprintf(file, "%c", '\t');
-			pos = IPrinter::calc_pos(SV->get_coordinates().stop.most_support, ref, chr);
+
+			end_coord = SV->get_coordinates().stop.most_support;
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				end_coord =  std::max((SV->get_coordinates().stop.most_support - (long) SV->get_length()), (long)start);
+			}
+
+			pos = IPrinter::calc_pos(end_coord, ref, chr);
 			fprintf(file, "%s", chr.c_str());
 			fprintf(file, "%c", '\t');
 			fprintf(file, "%i", pos);
 			fprintf(file, "%c", '\t');
-			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && !SV->get_types().is_SR) {
+
+
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {//!
 				fprintf(file, "%s", "NA");
 			} else {
 				fprintf(file, "%i", SV->get_length());
 			}
+
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				fprintf(file, "%s", "\tUNRESOLVED");
+			} else 	if (std_quant.first < 10 && std_quant.second < 10) {
+				fprintf(file, "%s", "\tPRECISE");
+			} else {
+				fprintf(file, "%s", "\tIMPRECISE");
+			}
+
 			//fprintf(file, "%c", '\t');
 			//fprintf(file, "%i", SV->get_support());
 			fprintf(file, "%c", '\n');


=====================================
src/print/VCFPrinter.cpp
=====================================
--- a/src/print/VCFPrinter.cpp
+++ b/src/print/VCFPrinter.cpp
@@ -40,15 +40,18 @@ void VCFPrinter::print_header() {
 	fprintf(file, "%s", "##INFO=<ID=SVLEN,Number=1,Type=Integer,Description=\"Length of the SV\">\n");
 	fprintf(file, "%s", "##INFO=<ID=SVMETHOD,Number=1,Type=String,Description=\"Type of approach used to detect SV\">\n");
 	fprintf(file, "%s", "##INFO=<ID=SVTYPE,Number=1,Type=String,Description=\"Type of structural variant\">\n");
-	fprintf(file, "%s", "##INFO=<ID=STD_quant_start,Number=.,Type=Integer,Description=\"STD of the start breakpoints across the reads.\">\n");
-	fprintf(file, "%s", "##INFO=<ID=STD_quant_stop,Number=.,Type=Integer,Description=\"STD of the stop breakpoints across the reads.\">\n");
-	fprintf(file, "%s", "##INFO=<ID=Kurtosis_quant_start,Number=.,Type=Integer,Description=\"Kurtosis value of the start breakpoints accross the reads.\">\n");
-	fprintf(file, "%s", "##INFO=<ID=Kurtosis_quant_stop,Number=.,Type=Integer,Description=\"Kurtosis value of the stop breakpoints accross the reads.\">\n");
+	if (Parameter::Instance()->report_n_reads > 0 || Parameter::Instance()->report_n_reads == -1) {
+		fprintf(file, "%s", "##INFO=<ID=RNAMES,Number=1,Type=String,Description=\"Names of reads supporting SVs (comma seperated)\">\n");
+	}
+	fprintf(file, "%s", "##INFO=<ID=STD_quant_start,Number=A,Type=Integer,Description=\"STD of the start breakpoints across the reads.\">\n");
+	fprintf(file, "%s", "##INFO=<ID=STD_quant_stop,Number=A,Type=Integer,Description=\"STD of the stop breakpoints across the reads.\">\n");
+	fprintf(file, "%s", "##INFO=<ID=Kurtosis_quant_start,Number=A,Type=Integer,Description=\"Kurtosis value of the start breakpoints accross the reads.\">\n");
+	fprintf(file, "%s", "##INFO=<ID=Kurtosis_quant_stop,Number=A,Type=Integer,Description=\"Kurtosis value of the stop breakpoints accross the reads.\">\n");
 	fprintf(file, "%s", "##INFO=<ID=SUPTYPE,Number=1,Type=String,Description=\"Type by which the variant is supported.(SR,ALN)\">\n");
 	fprintf(file, "%s", "##INFO=<ID=SUPTYPE,Number=1,Type=String,Description=\"Type by which the variant is supported.(SR,ALN)\">\n");
-	fprintf(file, "%s", "##INFO=<ID=STRANDS,Number=.,Type=String,Description=\"Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)\">\n");
-	fprintf(file, "%s", "##INFO=<ID=AF,Number=.,Type=Integer,Description=\"Allele Frequency.\">\n");
-	fprintf(file, "%s", "##INFO=<ID=ZMW,Number=.,Type=Integer,Description=\"Number of ZMWs (Pacbio) supporting SV.\">\n");
+	fprintf(file, "%s", "##INFO=<ID=STRANDS,Number=A,Type=String,Description=\"Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)\">\n");
+	fprintf(file, "%s", "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Allele Frequency.\">\n");
+	fprintf(file, "%s", "##INFO=<ID=ZMW,Number=A,Type=Integer,Description=\"Number of ZMWs (Pacbio) supporting SV.\">\n");
 	fprintf(file, "%s", "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n");
 	fprintf(file, "%s", "##FORMAT=<ID=DR,Number=1,Type=Integer,Description=\"# high-quality reference reads\">\n");
 	fprintf(file, "%s", "##FORMAT=<ID=DV,Number=1,Type=Integer,Description=\"# high-quality variant reads\">\n");
@@ -86,7 +89,12 @@ void VCFPrinter::print_body(Breakpoint * &SV, RefVector ref) {
 			fprintf(file, "%i", id);
 			id++;
 
-			int end = IPrinter::calc_pos(SV->get_coordinates().stop.most_support, ref, chr);
+			long end_coord=SV->get_coordinates().stop.most_support;
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				end_coord=std::max((SV->get_coordinates().stop.most_support - (long)SV->get_length()),(long) start);
+			}
+
+			int end = IPrinter::calc_pos(end_coord, ref, chr);
 			std::string strands = SV->get_strand(1);
 			fprintf(file, "%s", "\tN\t");
 			if (Parameter::Instance()->reportBND && (SV->get_SVtype() & TRA)) {
@@ -112,7 +120,14 @@ void VCFPrinter::print_body(Breakpoint * &SV, RefVector ref) {
 				fprintf(file, "%s", IPrinter::get_type(SV->get_SVtype()).c_str());
 				fprintf(file, "%c", '>');
 			}
-			fprintf(file, "%s", "\t.\tPASS\t");
+
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {
+				fprintf(file, "%s", "\t.\tUNRESOLVED\t");
+			}else{
+				fprintf(file, "%s", "\t.\tPASS\t");
+			}
+
+
 			if (std_quant.first < 10 && std_quant.second < 10) {
 				fprintf(file, "%s", "PRECISE");
 			} else {
@@ -126,11 +141,11 @@ void VCFPrinter::print_body(Breakpoint * &SV, RefVector ref) {
 				fprintf(file, "%s", chr.c_str());
 				fprintf(file, "%s", ";END=");
 
-				if (SV->get_SVtype() & INS) {
-					fprintf(file, "%i", std::max((int) (end - SV->get_length()), start));
-				} else {
+				//if (SV->get_SVtype() & INS) {
+				//	fprintf(file, "%i", std::max((int) (end - SV->get_length()), start));
+				//} else {
 					fprintf(file, "%i", end);
-				}
+				//}
 			}
 			if (zmws != 0) {
 				fprintf(file, "%s", ";ZMW=");
@@ -160,8 +175,8 @@ void VCFPrinter::print_body(Breakpoint * &SV, RefVector ref) {
 			fprintf(file, "%s", SV->get_supporting_types().c_str());
 			fprintf(file, "%s", ";SVLEN=");
 
-			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && !SV->get_types().is_SR) {
-				fprintf(file, "%s", "NA");
+			if (((SV->get_SVtype() & INS) && SV->get_length() == Parameter::Instance()->huge_ins) && SV->get_types().is_ALN) {//!
+				fprintf(file, "%i", 999999999);
 			} else {
 				fprintf(file, "%i", SV->get_length());
 			}


=====================================
src/sub/Breakpoint.cpp
=====================================
--- a/src/sub/Breakpoint.cpp
+++ b/src/sub/Breakpoint.cpp
@@ -291,12 +291,11 @@ void Breakpoint::calc_support() {
 	this->sv_type = eval_type(sv);
 
 	if (get_SVtype() & TRA) { // we cannot make assumptions abut support yet.
-		set_valid((bool) (get_support() > 1)); // this is needed as we take each chr independently and just look at the primary alignment
+		set_valid((bool) (get_support() >= 1)); // this is needed as we take each chr independently and just look at the primary alignment
 	} else if (get_support() >= Parameter::Instance()->min_support) {
 		predict_SV();
 		set_valid((bool) (get_length() > Parameter::Instance()->min_length));
 	}
-
 }
 
 char Breakpoint::eval_type(ushort* SV) {
@@ -335,11 +334,15 @@ char Breakpoint::eval_type(ushort* SV) {
 }
 
 long get_median(std::vector<long> corrds) {
+
 	sort(corrds.begin(), corrds.end());
 	if (corrds.size() % 2 == 0) {
 		return (corrds[corrds.size() / 2 - 1] + corrds[corrds.size() / 2]) / 2;
 	}
-	return corrds[corrds.size() / 2];
+	if (corrds.size() == 1) {
+		return corrds[0];
+	}
+	return corrds[(int) corrds.size() / 2];
 }
 
 void Breakpoint::predict_SV() {
@@ -355,10 +358,14 @@ void Breakpoint::predict_SV() {
 	std::vector<long> stops2;
 	std::vector<long> lengths2;
 
-	for (std::map<std::string, read_str>::iterator i = positions.support.begin(); i != positions.support.end(); i++) {
+	bool scan_reads=true;
+	if(positions.support.find("input") != positions.support.end() && !Parameter::Instance()->change_coords){
+		scan_reads=false;
+	}
 
-		if (((*i).second.SV & this->sv_type) && strncmp((*i).first.c_str(), "input", 5) != 0) {			// && !((*i).second.SV & INS && (*i).second.length==Parameter::Instance()->huge_ins)) { ///check type
+	for (std::map<std::string, read_str>::iterator i = positions.support.begin(); i != positions.support.end() && scan_reads; i++) {
 
+		if (((*i).second.SV & this->sv_type) && strncmp((*i).first.c_str(), "input", 5) != 0) {			// && !((*i).second.SV & INS && (*i).second.length==Parameter::Instance()->huge_ins)) { ///check type
 			if ((*i).second.coordinates.first != -1) {
 				if ((*i).second.length != Parameter::Instance()->huge_ins) {
 					if (starts.find((*i).second.coordinates.first) == starts.end()) {
@@ -419,7 +426,7 @@ void Breakpoint::predict_SV() {
 	int maxim = 0;
 	long coord = 0;
 	if (num > 0) {
-
+		//find the start coordinate:
 		for (map<long, int>::iterator i = starts.begin(); i != starts.end(); i++) {
 			if ((*i).second > maxim) {
 				coord = (*i).first;
@@ -433,6 +440,7 @@ void Breakpoint::predict_SV() {
 		}
 		this->indel_sequence = "";
 
+		//find the stop coordinate:
 
 		maxim = 0;
 		coord = 0;
@@ -449,7 +457,7 @@ void Breakpoint::predict_SV() {
 		} else {
 			this->positions.stop.most_support = coord;
 		}
-
+		//find the length:
 		if (!(this->get_SVtype() & INS)) { //all types but Insertions:
 			this->length = this->positions.stop.most_support - this->positions.start.most_support;
 
@@ -463,20 +471,18 @@ void Breakpoint::predict_SV() {
 					coord = (*i).first;
 					maxim = (*i).second;
 				}
-
 			}
-			if (maxim < 3) {
+			if (maxim == 0) { //for forcecalling
+				this->length = this->positions.stop.most_support - this->positions.start.most_support;
+			} else if (maxim < 3) {
 				this->length = get_median(lengths2);
 			} else {
 				this->length = coord;
 			}
-
-			//	if(del)
+		//	cout << "third len: " << this->length << endl; // problem!
 		}
-
 		starts.clear();
 		stops.clear();
-
 		for (size_t i = 0; i < strands.size(); i++) {
 			maxim = 0;
 			std::string id;
@@ -494,27 +500,29 @@ void Breakpoint::predict_SV() {
 		}
 		strands.clear();
 
-
 		std::map<std::string, read_str>::iterator tmp = positions.support.begin();
-		int start_prev_dist=1000;
-		int stop_prev_dist=1000;
-		while(tmp != positions.support.end()) {
-			int start_dist=abs((*tmp).second.coordinates.first- this->positions.start.most_support);
-			int stop_dist=abs((*tmp).second.coordinates.second- this->positions.stop.most_support);
-			if (((*tmp).second.SV & this->sv_type) && ( (start_dist<start_prev_dist && stop_dist<stop_prev_dist)  && (strcmp((*tmp).second.sequence.c_str(), "NA") != 0 && !(*tmp).second.sequence.empty()))) {
+		int start_prev_dist = 1000;
+		int stop_prev_dist = 1000;
+		while (tmp != positions.support.end()) {
+			int start_dist = abs((*tmp).second.coordinates.first - this->positions.start.most_support);
+			int stop_dist = abs((*tmp).second.coordinates.second - this->positions.stop.most_support);
+			if (((*tmp).second.SV & this->sv_type) && ((start_dist < start_prev_dist && stop_dist < stop_prev_dist) && (strcmp((*tmp).second.sequence.c_str(), "NA") != 0 && !(*tmp).second.sequence.empty()))) {
 				this->indel_sequence = (*tmp).second.sequence;
 			}
 			tmp++;
 		}
 	}
-
-	if (num == 0 && positions.support.find("input") != positions.support.end()) {
-		this->positions.stop.most_support = this->positions.stop.max_pos;
-		this->positions.start.most_support = this->positions.start.min_pos;
-		this->length = this->positions.stop.max_pos - this->positions.start.min_pos;
+	this->supporting_types = "";
+	if (positions.support.find("input") != positions.support.end()) {
+		if (num == 0) {
+			//cout << "Force called: "<< this->positions.support["input"].length << endl;
+			this->positions.start.most_support = this->positions.support["input"].coordinates.first;
+			this->positions.stop.most_support = this->positions.support["input"].coordinates.second;
+			//this->length = this->positions.support["input"].length;
+		}
+		this->supporting_types += "input";
 	}
 
-	this->supporting_types = "";
 	if (aln) {
 		this->type.is_ALN = true;
 		this->supporting_types += "AL";
@@ -613,7 +621,7 @@ std::string Breakpoint::get_strand(int num_best) {
 #include "Detect_Breakpoints.h"
 std::string Breakpoint::to_string() {
 	std::stringstream ss;
-	if (positions.support.size() > 1) {
+	if (positions.support.size() >= 1) {
 		ss << "\t\tTREE: ";
 		ss << TRANS_type(this->get_SVtype());
 		ss << " ";


=====================================
src/sub/Breakpoint.h
=====================================
--- a/src/sub/Breakpoint.h
+++ b/src/sub/Breakpoint.h
@@ -113,6 +113,18 @@ public:
 		this->grouped_node=NULL;
 		this->length=len;
 	}
+	Breakpoint(position_str sv,long len,char sv_type) {
+		ref_allele=0;
+		should_be_stored=false;
+		this->sv_type =sv_type;
+		type.is_ALN=((*sv.support.begin()).second.type==0);
+		type.is_SR=((*sv.support.begin()).second.type==1);
+		type.is_Noise=((*sv.support.begin()).second.type==2);
+		type_support=-1;
+		this->positions = sv;
+		this->grouped_node=NULL;
+		this->length=len;
+	}
 	~Breakpoint() {
 
 	}


=====================================
src/sub/Detect_Breakpoints.cpp
=====================================
--- a/src/sub/Detect_Breakpoints.cpp
+++ b/src/sub/Detect_Breakpoints.cpp
@@ -184,8 +184,7 @@ void detect_breakpoints(std::string read_filename, IPrinter *& printer) {
 	}
 //Using PlaneSweep to comp coverage and iterate through reads:
 //PlaneSweep * sweep = new PlaneSweep();
-	std::cout << "Start parsing..." << std::endl;
-//Using Interval tree to store and manage breakpoints:
+	//Using Interval tree to store and manage breakpoints:
 
 	IntervallTree final;
 	IntervallTree bst;
@@ -207,6 +206,7 @@ void detect_breakpoints(std::string read_filename, IPrinter *& printer) {
 	if (Parameter::Instance()->genotype) {
 		go = new Genotyper();
 	}*/
+	std::cout << "Start parsing... "<<ref[tmp_aln->getRefID()].RefName  << std::endl;
 
 	while (!tmp_aln->getQueryBases().empty()) {
 
@@ -249,6 +249,9 @@ void detect_breakpoints(std::string read_filename, IPrinter *& printer) {
 			std::vector<aln_str> split_events;
 			if (tmp_aln->getMappingQual() > Parameter::Instance()->min_mq) {
 				double score = tmp_aln->get_scrore_ratio();
+
+				//
+
 #pragma omp parallel // starts a new team
 				{
 #pragma omp sections
@@ -654,6 +657,9 @@ void add_splits(Alignment *& tmp, std::vector<aln_str> events, short type, RefVe
 }
 
 void estimate_parameters(std::string read_filename) {
+	if(Parameter::Instance()->skip_parameter_estimation){
+		return;
+	}
 	cout << "Estimating parameter..." << endl;
 	BamParser * mapped_file = 0;
 	RefVector ref;


=====================================
test_set/README
=====================================
--- /dev/null
+++ b/test_set/README
@@ -0,0 +1,11 @@
+This is a test set to confirm the correct installation of Sniffles.
+Run Sniffles with default parameters:
+./sniffles -m reads_region.bam -v test.vcf
+
+You should expect to find one deletion at 21:21492142-21492648. You can also compare to the output provided in expected_output.vcf.
+
+If you have questions or concern please submit an issue on github: 
+https://github.com/fritzsedlazeck/Sniffles/issues
+
+or drop me an email:
+fritz.sedlazeck at gmail.com


=====================================
test_set/expected_output.vcf
=====================================
--- /dev/null
+++ b/test_set/expected_output.vcf
@@ -0,0 +1,118 @@
+##fileformat=VCFv4.2
+##source=Sniffles
+##fileDate=20180307
+##contig=<ID=1,length=249250621>
+##contig=<ID=2,length=243199373>
+##contig=<ID=3,length=198022430>
+##contig=<ID=4,length=191154276>
+##contig=<ID=5,length=180915260>
+##contig=<ID=6,length=171115067>
+##contig=<ID=7,length=159138663>
+##contig=<ID=8,length=146364022>
+##contig=<ID=9,length=141213431>
+##contig=<ID=10,length=135534747>
+##contig=<ID=11,length=135006516>
+##contig=<ID=12,length=133851895>
+##contig=<ID=13,length=115169878>
+##contig=<ID=14,length=107349540>
+##contig=<ID=15,length=102531392>
+##contig=<ID=16,length=90354753>
+##contig=<ID=17,length=81195210>
+##contig=<ID=18,length=78077248>
+##contig=<ID=19,length=59128983>
+##contig=<ID=20,length=63025520>
+##contig=<ID=21,length=48129895>
+##contig=<ID=22,length=51304566>
+##contig=<ID=X,length=155270560>
+##contig=<ID=Y,length=59373566>
+##contig=<ID=MT,length=16569>
+##contig=<ID=GL000207.1,length=4262>
+##contig=<ID=GL000226.1,length=15008>
+##contig=<ID=GL000229.1,length=19913>
+##contig=<ID=GL000231.1,length=27386>
+##contig=<ID=GL000210.1,length=27682>
+##contig=<ID=GL000239.1,length=33824>
+##contig=<ID=GL000235.1,length=34474>
+##contig=<ID=GL000201.1,length=36148>
+##contig=<ID=GL000247.1,length=36422>
+##contig=<ID=GL000245.1,length=36651>
+##contig=<ID=GL000197.1,length=37175>
+##contig=<ID=GL000203.1,length=37498>
+##contig=<ID=GL000246.1,length=38154>
+##contig=<ID=GL000249.1,length=38502>
+##contig=<ID=GL000196.1,length=38914>
+##contig=<ID=GL000248.1,length=39786>
+##contig=<ID=GL000244.1,length=39929>
+##contig=<ID=GL000238.1,length=39939>
+##contig=<ID=GL000202.1,length=40103>
+##contig=<ID=GL000234.1,length=40531>
+##contig=<ID=GL000232.1,length=40652>
+##contig=<ID=GL000206.1,length=41001>
+##contig=<ID=GL000240.1,length=41933>
+##contig=<ID=GL000236.1,length=41934>
+##contig=<ID=GL000241.1,length=42152>
+##contig=<ID=GL000243.1,length=43341>
+##contig=<ID=GL000242.1,length=43523>
+##contig=<ID=GL000230.1,length=43691>
+##contig=<ID=GL000237.1,length=45867>
+##contig=<ID=GL000233.1,length=45941>
+##contig=<ID=GL000204.1,length=81310>
+##contig=<ID=GL000198.1,length=90085>
+##contig=<ID=GL000208.1,length=92689>
+##contig=<ID=GL000191.1,length=106433>
+##contig=<ID=GL000227.1,length=128374>
+##contig=<ID=GL000228.1,length=129120>
+##contig=<ID=GL000214.1,length=137718>
+##contig=<ID=GL000221.1,length=155397>
+##contig=<ID=GL000209.1,length=159169>
+##contig=<ID=GL000218.1,length=161147>
+##contig=<ID=GL000220.1,length=161802>
+##contig=<ID=GL000213.1,length=164239>
+##contig=<ID=GL000211.1,length=166566>
+##contig=<ID=GL000199.1,length=169874>
+##contig=<ID=GL000217.1,length=172149>
+##contig=<ID=GL000216.1,length=172294>
+##contig=<ID=GL000215.1,length=172545>
+##contig=<ID=GL000205.1,length=174588>
+##contig=<ID=GL000219.1,length=179198>
+##contig=<ID=GL000224.1,length=179693>
+##contig=<ID=GL000223.1,length=180455>
+##contig=<ID=GL000195.1,length=182896>
+##contig=<ID=GL000212.1,length=186858>
+##contig=<ID=GL000222.1,length=186861>
+##contig=<ID=GL000200.1,length=187035>
+##contig=<ID=GL000193.1,length=189789>
+##contig=<ID=GL000194.1,length=191469>
+##contig=<ID=GL000225.1,length=211173>
+##contig=<ID=GL000192.1,length=547496>
+##contig=<ID=NC_007605,length=171823>
+##contig=<ID=hs37d5,length=35477943>
+##ALT=<ID=DEL,Description="Deletion">
+##ALT=<ID=DUP,Description="Duplication">
+##ALT=<ID=INV,Description="Inversion">
+##ALT=<ID=INVDUP,Description="InvertedDUP with unknown boundaries">
+##ALT=<ID=TRA,Description="Translocation">
+##ALT=<ID=INS,Description="Insertion">
+##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
+##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
+##INFO=<ID=RE,Number=1,Type=Integer,Description="read support">
+##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
+##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
+##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the SV">
+##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
+##INFO=<ID=STD_quant_start,Number=A,Type=Integer,Description="STD of the start breakpoints across the reads.">
+##INFO=<ID=STD_quant_stop,Number=A,Type=Integer,Description="STD of the stop breakpoints across the reads.">
+##INFO=<ID=Kurtosis_quant_start,Number=A,Type=Integer,Description="Kurtosis value of the start breakpoints accross the reads.">
+##INFO=<ID=Kurtosis_quant_stop,Number=A,Type=Integer,Description="Kurtosis value of the stop breakpoints accross the reads.">
+##INFO=<ID=SUPTYPE,Number=1,Type=String,Description="Type by which the variant is supported.(SR,ALN)">
+##INFO=<ID=SUPTYPE,Number=1,Type=String,Description="Type by which the variant is supported.(SR,ALN)">
+##INFO=<ID=STRANDS,Number=A,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)">
+##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency.">
+##INFO=<ID=ZMW,Number=A,Type=Integer,Description="Number of ZMWs (Pacbio) supporting SV.">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference reads">
+##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant reads">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	reads_region.bam
+21	21492142	0	N	<DEL>	.	PASS	PRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=21;END=21492648;STD_quant_start=0.000000;STD_quant_stop=0.000000;Kurtosis_quant_start=0.572582;Kurtosis_quant_stop=1.417662;SVTYPE=DEL;SUPTYPE=AL,SR;SVLEN=506;STRANDS=+-;RE=48	GT:DR:DV	./.:.:48


=====================================
test_set/reads_region.bam
=====================================
Binary files /dev/null and b/test_set/reads_region.bam differ



View it on GitLab: https://salsa.debian.org/med-team/sniffles/commit/cd703265953cb8501fd451120dc69252a7e0e60b

---
View it on GitLab: https://salsa.debian.org/med-team/sniffles/commit/cd703265953cb8501fd451120dc69252a7e0e60b
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/debian-med-commit/attachments/20180324/cda8ecf8/attachment-0001.html>


More information about the debian-med-commit mailing list