[med-svn] r19049 - in trunk/packages/mummer/trunk/debian: . patches
Andreas Tille
tille at moszumanska.debian.org
Tue Apr 14 12:39:04 UTC 2015
Author: tille
Date: 2015-04-14 12:39:02 +0000 (Tue, 14 Apr 2015)
New Revision: 19049
Added:
trunk/packages/mummer/trunk/debian/delta-filter.1
trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch
Modified:
trunk/packages/mummer/trunk/debian/README.Debian
trunk/packages/mummer/trunk/debian/changelog
trunk/packages/mummer/trunk/debian/delta2blocks.1
trunk/packages/mummer/trunk/debian/mummer.1
trunk/packages/mummer/trunk/debian/mummer.links
trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch
trunk/packages/mummer/trunk/debian/patches/series
Log:
Add some patches from mugsy to include delta-filter -b for reporting duplications
Modified: trunk/packages/mummer/trunk/debian/README.Debian
===================================================================
--- trunk/packages/mummer/trunk/debian/README.Debian 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/README.Debian 2015-04-14 12:39:02 UTC (rev 19049)
@@ -8,7 +8,15 @@
delta2blocks
delta2maf
-These tools are taken over to Debian's mummer code base to enable
-droping the extra code copy in mugsy.
+Moreover mugsy provides modifications to include delta-filter -b for
+reporting duplications.
+These tools and the additional option are taken over to Debian's mummer
+code base to enable droping the extra code copy in mugsy.
+
+Please test and report problems via
+
+ reportbug mummer
+
+
-- Andreas Tille <tille at debian.org> Mon, 13 Apr 2015 22:29:27 +0200
Modified: trunk/packages/mummer/trunk/debian/changelog
===================================================================
--- trunk/packages/mummer/trunk/debian/changelog 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/changelog 2015-04-14 12:39:02 UTC (rev 19049)
@@ -2,6 +2,8 @@
* Moved debian/upstream to debian/upstream/metadata
* Add some patches from mugsy enhancing functionality by adding two tools
+ * Add some patches from mugsy to include delta-filter -b for reporting
+ duplications
* cme fix dpkg-control
* Remove SF privacy breach script from docs
* Fix manpage syntax and add missing manpage
Added: trunk/packages/mummer/trunk/debian/delta-filter.1
===================================================================
--- trunk/packages/mummer/trunk/debian/delta-filter.1 (rev 0)
+++ trunk/packages/mummer/trunk/debian/delta-filter.1 2015-04-14 12:39:02 UTC (rev 19049)
@@ -0,0 +1,74 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.4.
+.TH DELTA-FILTER "1" "April 2015" "mummer 3.23" "User Commands"
+.SH NAME
+delta-filter \- read a delta alignment file from either nucmer or promer and filters the alignments
+.SH SYNOPSIS
+.B delta\-filter
+.RI [options] <deltafile>
+.SH DESCRIPTION
+.TP
+.BR \fB\-1\fR
+1\-to\-1 alignment allowing for rearrangements (intersection of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-g\fR
+1\-to\-1 global alignment not allowing rearrangements
+.TP
+\fB\-h\fR
+Display help information
+.TP
+\fB\-i\fR float
+Set the minimum alignment identity [0, 100], default 0
+.TP
+\fB\-l\fR int
+Set the minimum alignment length, default 0
+.TP
+\fB\-m\fR
+Many\-to\-many alignment allowing for rearrangements (union of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-q\fR
+Maps each position of each query to its best hit in
+the reference, allowing for reference overlaps
+.TP
+\fB\-r\fR
+Maps each position of each reference to its best hit
+in the query, allowing for query overlaps
+.TP
+\fB\-u\fR float Set the minimum alignment uniqueness, i.e. percent of
+.IP
+the alignment matching to unique reference AND query
+sequence [0, 100], default 0
+.TP
+\fB\-o\fR float
+Set the maximum alignment overlap for \fB\-r\fR and \fB\-q\fR options
+as a percent of the alignment length [0, 100], default 100
+.TP
+\fB\-v\fR
+Print the discarded alignments instead of those that pass filters
+.TP
+\fB\-b\fR
+Maps duplications
+(XOR of \fB\-r\fR and \fB\-q\fR alignments, one or the other but not both)
+.PP
+Reads a delta alignment file from either nucmer or promer and
+filters the alignments based on the command\-line switches, leaving
+only the desired alignments which are output to stdout in the same
+delta format as the input. For multiple switches, order of operations
+is as follows: \fB\-i\fR \fB\-l\fR \fB\-u\fR \fB\-q\fR \fB\-r\fR \fB\-g\fR \fB\-m\fR \fB\-1\fR \fB\-b\fR. If an alignment is excluded
+by a preceding operation, it will be ignored by the succeeding
+operations.
+.PP
+An important distinction between the \fB\-g\fR option and the \fB\-1\fR and \fB\-m\fR
+options is that \fB\-g\fR requires the alignments to be mutually consistent
+in their order, while the \fB\-1\fR and \fB\-m\fR options are not required to be
+mutually consistent and therefore tolerate translocations,
+inversions, etc. In general cases, the \fB\-m\fR option is the best choice,
+however \fB\-1\fR can be handy for applications such as SNP finding which
+require a 1\-to\-1 mapping. Finally, for mapping query contigs, or
+sequencing reads, to a reference genome, use \fB\-q\fR. The duplications
+printed with the \fB\-b\fR option are \fB\-r\fR and \fB\-q\fR alignments that are not
+present in the 1\-to\-1 alignment. These alignments are also the
+difference between the \fB\-1\fR and \fB\-m\fR alignments
+.SH SEE ALSO
+The \fB\-b\fR option originates from mugsy that provides a code copy of mummer
+with additional patches. The source can be found in SVN
+svn://svn.code.sf.net/p/mugsy/code/trunk
Modified: trunk/packages/mummer/trunk/debian/delta2blocks.1
===================================================================
--- trunk/packages/mummer/trunk/debian/delta2blocks.1 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/delta2blocks.1 2015-04-14 12:39:02 UTC (rev 19049)
@@ -12,17 +12,22 @@
.RI [options] <deltafile> <ref ID> <qry ID>
.SH DESCRIPTION
.TP
-\fB\-h\fR Display help information
+\fB\-h\fR
+Display help information
.TP
-\fB\-q\fR Sort alignments by the query start coordinate
+\fB\-q\fR
+Sort alignments by the query start coordinate
.TP
-\fB\-r\fR Sort alignments by the reference start coordinate
+\fB\-r\fR
+Sort alignments by the reference start coordinate
.TP
-\fB\-w\fR int Set the screen width \- default is 60
+\fB\-w\fR int
+Set the screen width \- default is 60
.TP
-\fB\-x\fR int Set the matrix type \- default is 2 (BLOSUM 62),
-.PP
+\fB\-x\fR int
+Set the matrix type \- default is 2 (BLOSUM 62),
other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
+.br
note: only has effect on amino acid alignments
.PP
Input is the .delta output of either the "nucmer" or the
Modified: trunk/packages/mummer/trunk/debian/mummer.1
===================================================================
--- trunk/packages/mummer/trunk/debian/mummer.1 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/mummer.1 2015-04-14 12:39:02 UTC (rev 19049)
@@ -24,9 +24,6 @@
.B combineMUMs
.RI <RefSequence> <MatchSequences> <GapsFile>
.br
-.B delta-filter
-.RI [options] <deltafile>
-.br
.B dnadiff
.RI [options] <reference> <query>
or
@@ -147,42 +144,6 @@
\-\-version Display the version information and exit
.br
-.B delta-filter
- \-e float For switches \-g \-r \-q, keep repeats within e percent
- of the best LIS score [0, 100], no repeats by default
- \-g Global alignment using length*identity weighted LIS.
- For every reference-query pair, leave only the aligns
- which form the longest mutually consistent set
- \-h Display help information
- \-i float Set the minimum alignment identity [0, 100], default 0
- \-l int Set the minimum alignment length, default 0
- \-q Query alignment using length*identity weighted LIS.
- For each query, leave only the aligns which form the
- longest consistent set for the query
- \-r Reference alignment using length*identity weighted LIS.
- For each reference, leave only the aligns which form
- the longest consistent set for the reference
- \-u float Set the minimum alignment uniqueness, i.e. percent of
- the alignment matching to unique reference AND query
- sequence [0, 100], default 0
- \-o float Set the maximum alignment overlap for \-r and \-q options
- as a percent of the alignment length [0, 100], default 100
-.PP
- Reads a delta alignment file from either nucmer or promer and
-filters the alignments based on the command-line switches, leaving
-only the desired alignments which are output to stdout in the same
-delta format as the input. For multiple switches, order of operations
-is as follows: \-i \-l \-u \-q \-r \-g. If an alignment is excluded by a
-preceding operation, it will be ignored by the succeeding operations
-.PP
- An important distinction between the \-g option and the \-r \-q
-options is that \-g requires the alignments to be mutually consistent
-in their order, while the \-r \-q options are not required to be
-mutually consistent and therefore tolerate translocations,
-inversions, etc. Thus, \-r provides a one-to-many, \-q a many-to-one,
-\-r \-q a one-to-one local mapping, and \-g a one-to-one global mapping
-of reference and query bases respectively.
-.br
.B mapview
.br
\-h
Modified: trunk/packages/mummer/trunk/debian/mummer.links
===================================================================
--- trunk/packages/mummer/trunk/debian/mummer.links 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/mummer.links 2015-04-14 12:39:02 UTC (rev 19049)
@@ -1,7 +1,6 @@
usr/share/man/man1/mummer.1 usr/share/man/man1/mummer-annotate.1
usr/share/man/man1/mummer.1 usr/share/man/man1/combineMUMs.1
usr/share/man/man1/mummer.1 usr/share/man/man1/dnadiff.1
-usr/share/man/man1/mummer.1 usr/share/man/man1/delta-filter.1
usr/share/man/man1/mummer.1 usr/share/man/man1/exact-tandems.1
usr/share/man/man1/mummer.1 usr/share/man/man1/gaps.1
usr/share/man/man1/mummer.1 usr/share/man/man1/mapview.1
Modified: trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/patches/addition_from_mugsy.patch 2015-04-14 12:39:02 UTC (rev 19049)
@@ -1,9 +1,9 @@
Author: Andreas Tille <tille at debian.org>
Last-Update: Mon, 13 Apr 2015 21:50:34 +0200
-Description: The tool mugsy provides a mummer code copy with an additional
- tool delta2maf. Since the mummer copy in mummy does not feature all the
- mummer patches we rather inject the additional tool right into the Debian
- package.
+Description: The tool mugsy provides an old mummer code copy (version 3.20)
+ with a additional tools delta2maf and delta2blocks.
+ Since the mummer copy in mugsy does not feature all the mummer patches we
+ rather inject the additional tool right into the Debian package.
.
The source can be found in
svn://svn.code.sf.net/p/mugsy/code/trunk
Added: trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch (rev 0)
+++ trunk/packages/mummer/trunk/debian/patches/addition_from_report_duplicates.patch 2015-04-14 12:39:02 UTC (rev 19049)
@@ -0,0 +1,756 @@
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Mon, 13 Apr 2015 21:50:34 +0200
+Description: The tool mugsy provides an old mummer code copy (version 3.20)
+ with modifications to include delta-filter -b for reporting duplications.
+ Since the mummer copy in mugsy does not feature all the mummer patches we
+ rather inject the additional tool right into the Debian package.
+ .
+ The source can be found in
+ svn://svn.code.sf.net/p/mugsy/code/trunk
+
+--- a/src/tigr/delta.cc
++++ b/src/tigr/delta.cc
+@@ -34,48 +34,6 @@ struct LIS_t
+ };
+
+
+-struct EdgeletQCmp_t
+-//!< Compares query lo coord
+-{
+- bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
+- {
+- //-- Sorting by score in the event of a tie ensures that when building
+- // LIS chains, the highest scoring ones get seen first, thus avoiding
+- // overlap problems
+-
+- if ( i->loQ < j->loQ )
+- return true;
+- else if ( i->loQ > j->loQ )
+- return false;
+- else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) >
+- ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) )
+- return true;
+- else
+- return false;
+- }
+-};
+-
+-
+-struct EdgeletRCmp_t
+-//!< Compares reference lo coord
+-{
+- bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
+- {
+- //-- Sorting by score in the event of a tie ensures that when building
+- // LIS chains, the highest scoring ones get seen first, thus avoiding
+- // overlap problems
+-
+- if ( i->loR < j->loR )
+- return true;
+- else if ( i->loR > j->loR )
+- return false;
+- else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) >
+- ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) )
+- return true;
+- else
+- return false;
+- }
+-};
+
+
+ struct NULLPred_t
+@@ -122,18 +80,6 @@ inline long RevC (const long & coord,
+ }
+
+
+-//------------------------------------------------------------ ScoreLocal ------
+-inline long ScoreLocal
+-(long scorej, long leni, long lenj,
+- long olap, float idyi, float maxolap)
+-{
+- if ( olap > 0 &&
+- ((float)olap / (float)leni * 100.0 > maxolap ||
+- (float)olap / (float)lenj * 100.0 > maxolap) )
+- return -1;
+- else
+- return (scorej + (long)((leni - olap) * pow (idyi, 2)));
+-}
+
+
+ //----------------------------------------------------------- ScoreGlobal ------
+@@ -450,7 +396,7 @@ void DeltaGraph_t::clean()
+ map<string, DeltaNode_t>::iterator i;
+ map<string, DeltaNode_t>::iterator ii;
+ vector<DeltaEdge_t *>::iterator j;
+- vector<DeltaEdgelet_t *>::iterator k;
++ EdgeletType::iterator k;
+
+ //-- For all ref nodes
+ for ( i = refnodes.begin(); i != refnodes.end(); )
+@@ -601,7 +547,7 @@ void DeltaGraph_t::flagGOOD()
+ {
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+ //-- All references
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -625,12 +571,13 @@ void DeltaGraph_t::flagGOOD()
+ //! \brief Intersection of flagQLIS and flagRLIS
+ void DeltaGraph_t::flag1to1(float epsilon, float maxolap)
+ {
++ //std::cerr << "Starting LIS" << std::endl;
+ flagRLIS(epsilon, maxolap, false);
+ flagQLIS(epsilon, maxolap, false);
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+ //-- All references
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -640,16 +587,56 @@ void DeltaGraph_t::flag1to1(float epsilo
+ ei != (mi->second).edges.end(); ++ ei )
+ {
+ //-- All alignments between reference and query
++ //SVA debugging
++ //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl;
+ for ( eli = (*ei)->edgelets.begin();
+ eli != (*ei)->edgelets.end(); ++ eli )
+ {
+ if ( !(*eli)->isRLIS || !(*eli)->isQLIS )
+ (*eli)->isGOOD = false;
+ }
++ //std::cerr << "Done" << std::endl;
+ }
+ }
+ }
+
++//---------------------------------------------------------------- flagDup ----
++//! \brief Exclusive OR of flagQLIS and flagRLIS
++void DeltaGraph_t::flagDup(float epsilon, float maxolap)
++{
++ //std::cerr << "Starting LIS" << std::endl;
++ flagRLIS(epsilon, maxolap, false);
++ flagQLIS(epsilon, maxolap, false);
++
++ map<string, DeltaNode_t>::const_iterator mi;
++ vector<DeltaEdge_t *>::const_iterator ei;
++ EdgeletType::iterator eli;
++
++ //-- All references
++ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
++ {
++ //-- All queries matching this reference
++ for ( ei = (mi->second).edges.begin();
++ ei != (mi->second).edges.end(); ++ ei )
++ {
++ //-- All alignments between reference and query
++ //SVA debugging
++ //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl;
++ for ( eli = (*ei)->edgelets.begin();
++ eli != (*ei)->edgelets.end(); ++ eli )
++ {
++ if ( (!(*eli)->isRLIS || !(*eli)->isQLIS)
++ && ((*eli)->isRLIS || (*eli)->isQLIS)){
++ //(*eli)->isGOOD = true;
++ }
++ else{
++ (*eli)->isGOOD = false;
++ }
++ }
++ //std::cerr << "Done" << std::endl;
++ }
++ }
++}
+
+ //---------------------------------------------------------------- flagMtoM ----
+ //! \brief Union of flagQLIS and flagRLIS
+@@ -660,7 +647,7 @@ void DeltaGraph_t::flagMtoM(float epsilo
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+ //-- All references
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+@@ -700,11 +687,11 @@ void DeltaGraph_t::flagGLIS (float epsil
+ long i, j, n;
+ long olap, olapQ, olapR, len, lenQ, lenR, score, diff;
+
+- vector<DeltaEdgelet_t *> edgelets;
++ EdgeletType edgelets;
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+
+ //-- For each reference sequence
+@@ -861,25 +848,29 @@ void DeltaGraph_t::flagQLIS (float epsil
+ long i, j, n;
+ long olap, leni, lenj, score, diff;
+
+- vector<DeltaEdgelet_t *> edgelets;
++ EdgeletType edgelets;
++ edgelets.reserve(30000);
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
+-
++ EdgeletType::iterator eli;
++ EdgeletType::iterator eli_end;
+
+ //-- For each query sequence
+ for ( mi = qrynodes.begin(); mi != qrynodes.end(); ++ mi )
+ {
+- //-- Collect all the good edgelets
+- edgelets.clear();
+ for ( ei = (mi->second).edges.begin();
+ ei != (mi->second).edges.end(); ++ ei )
++ {
++ //-- Collect all the good edgelets
++ edgelets.clear();
++ edgelets.reserve(30000);
++ lis_size=0;
++ eli_end = (*ei)->edgelets.end();
+ for ( eli = (*ei)->edgelets.begin();
+- eli != (*ei)->edgelets.end(); ++ eli )
++ eli != eli_end; ++ eli )
+ if ( (*eli)->isGOOD )
+ edgelets.push_back (*eli);
+-
+ //-- Resize and initialize
+ n = edgelets.size();
+ if ( n > lis_size )
+@@ -895,6 +886,7 @@ void DeltaGraph_t::flagQLIS (float epsil
+
+ //-- Continue until all equivalent repeats are extracted
+ vector<long> allbest;
++ allbest.reserve(30000);
+ do
+ {
+ //-- Dynamic
+@@ -956,8 +948,9 @@ void DeltaGraph_t::flagQLIS (float epsil
+ if ( ! (*eli)->isQLIS )
+ (*eli)->isGOOD = false;
+ }
+-
++ }
+ free (lis);
++
+ }
+
+
+@@ -982,25 +975,30 @@ void DeltaGraph_t::flagRLIS (float epsil
+ long i, j, n;
+ long olap, leni, lenj, score, diff;
+
+- vector<DeltaEdgelet_t *> edgelets;
++ EdgeletType edgelets;
++ edgelets.reserve(30000);
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
+-
++ EdgeletType::iterator eli;
++ EdgeletType::iterator eli_end;
+
+ //-- For each reference sequence
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+ {
+- //-- Collect all the good edgelets
+- edgelets.clear();
++ //-- For each query matching this reference
+ for ( ei = (mi->second).edges.begin();
+ ei != (mi->second).edges.end(); ++ ei )
++ {
++ //-- Collect all the good edgelets
++ edgelets.clear();
++ edgelets.reserve(30000);
++ lis_size=0;
++ eli_end = (*ei)->edgelets.end();
+ for ( eli = (*ei)->edgelets.begin();
+- eli != (*ei)->edgelets.end(); ++ eli )
++ eli != eli_end; ++ eli )
+ if ( (*eli)->isGOOD )
+ edgelets.push_back (*eli);
+-
+ //-- Resize
+ n = edgelets.size();
+ if ( n > lis_size )
+@@ -1016,6 +1014,7 @@ void DeltaGraph_t::flagRLIS (float epsil
+
+ //-- Continue until all equivalent repeats are extracted
+ vector<long> allbest;
++ allbest.reserve(30000);
+ do
+ {
+ //-- Dynamic
+@@ -1072,11 +1071,14 @@ void DeltaGraph_t::flagRLIS (float epsil
+ for ( i = allbest[beg]; i >= 0 && i < n; i = lis[i].from )
+ lis[i].a->isRLIS = true;
+
+- if ( flagbad )
+- for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli )
++ if ( flagbad ){
++ eli_end = edgelets.end();
++ for ( eli = edgelets.begin(); eli != eli_end; ++ eli )
+ if ( ! (*eli)->isRLIS )
+ (*eli)->isGOOD = false;
+ }
++ }
++ }
+
+ free (lis);
+ }
+@@ -1095,7 +1097,7 @@ void DeltaGraph_t::flagScore (long minle
+ {
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+ for ( ei = (mi->second).edges.begin();
+@@ -1128,11 +1130,11 @@ void DeltaGraph_t::flagUNIQ (float minun
+ {
+ long i, uniq, len;
+
+- vector<DeltaEdgelet_t *> edgelets;
++ EdgeletType edgelets;
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::iterator eli;
++ EdgeletType::iterator eli;
+
+
+ //-- For each reference sequence
+@@ -1141,6 +1143,10 @@ void DeltaGraph_t::flagUNIQ (float minun
+ unsigned char * ref_cov = NULL;
+ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi )
+ {
++ /*
++ //SVA treat each query, ref pair as distinct
++ //TODO make optional so can use this way for complete genomes
++ //and other way for draft
+ //-- Reset the reference coverage array
+ ref_len = (mi->second).len;
+ if ( ref_len > ref_size )
+@@ -1150,11 +1156,20 @@ void DeltaGraph_t::flagUNIQ (float minun
+ }
+ for ( i = 1; i <= ref_len; ++ i )
+ ref_cov[i] = 0;
+-
++ */
+ //-- Collect all the good edgelets
+ edgelets.clear();
+ for ( ei = (mi->second).edges.begin();
+- ei != (mi->second).edges.end(); ++ ei )
++ ei != (mi->second).edges.end(); ++ ei ){
++ ref_len = (mi->second).len;
++ if ( ref_len > ref_size )
++ {
++ ref_cov = (unsigned char *) Safe_realloc (ref_cov, ref_len + 1);
++ ref_size = ref_len;
++ }
++ for ( i = 1; i <= ref_len; ++ i )
++ ref_cov[i] = 0;
++
+ for ( eli = (*ei)->edgelets.begin();
+ eli != (*ei)->edgelets.end(); ++ eli )
+ if ( (*eli)->isGOOD )
+@@ -1166,7 +1181,7 @@ void DeltaGraph_t::flagUNIQ (float minun
+ if ( ref_cov[i] < UCHAR_MAX )
+ ref_cov[i] ++;
+ }
+-
++ }
+ //-- Calculate the uniqueness of each edgelet
+ for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli )
+ {
+@@ -1312,16 +1327,17 @@ void DeltaGraph_t::loadSequences ()
+ //! \brief Outputs the contents of the graph as a deltafile
+ //!
+ //! \param out The output stream to write to
++//! \param inverse Print the discarded alignments, !isGood
+ //! \return The output stream
+ //!
+-ostream & DeltaGraph_t::outputDelta (ostream & out)
++ostream & DeltaGraph_t::outputDelta (ostream & out, bool inverse)
+ {
+ bool header;
+ long s1, e1, s2, e2;
+
+ map<string, DeltaNode_t>::const_iterator mi;
+ vector<DeltaEdge_t *>::const_iterator ei;
+- vector<DeltaEdgelet_t *>::const_iterator eli;
++ EdgeletType::const_iterator eli;
+
+ //-- Print the file header
+ cout
+@@ -1338,9 +1354,40 @@ ostream & DeltaGraph_t::outputDelta (ost
+ for ( eli = (*ei)->edgelets.begin();
+ eli != (*ei)->edgelets.end(); ++ eli )
+ {
+- if ( ! (*eli)->isGOOD )
+- continue;
++ if ( ! (*eli)->isGOOD){
++ //Check if printing discard alns
++ if(inverse){
++ //-- Print the sequence header
++ if ( ! header )
++ {
++ cout
++ << '>'
++ << *((*ei)->refnode->id) << ' '
++ << *((*ei)->qrynode->id) << ' '
++ << (*ei)->refnode->len << ' '
++ << (*ei)->qrynode->len << '\n';
++ header = true;
++ }
++ //-- Print the alignment
++ s1 = (*eli)->loR;
++ e1 = (*eli)->hiR;
++ s2 = (*eli)->loQ;
++ e2 = (*eli)->hiQ;
++ if ( (*eli)->dirR == REVERSE_DIR )
++ Swap (s1, e1);
++ if ( (*eli)->dirQ == REVERSE_DIR )
++ Swap (s2, e2);
+
++ cout
++ << s1 << ' ' << e1 << ' ' << s2 << ' ' << e2 << ' '
++ << (*eli)->idyc << ' '
++ << (*eli)->simc << ' '
++ << (*eli)->stpc << '\n'
++ << (*eli)->delta;
++ }
++ }
++ else{
++ if(!inverse){
+ //-- Print the sequence header
+ if ( ! header )
+ {
+@@ -1371,5 +1418,7 @@ ostream & DeltaGraph_t::outputDelta (ost
+ }
+ }
+ }
++ }
++ }
+ return out;
+ }
+--- a/src/tigr/delta-filter.cc
++++ b/src/tigr/delta-filter.cc
+@@ -38,6 +38,8 @@ float OPT_MinIdentity = 0.0;
+ float OPT_MinUnique = 0.0; // minimum %unique
+ float OPT_MaxOverlap = 100.0; // maximum olap as % of align len
+ float OPT_Epsilon = -1.0; // negligible alignment score
++float OPT_Inverse = false; // output the discarded alignments, !isGood
++float OPT_Dup = false; // output apparent dups
+
+
+ //========================================================== Fuction Decs ====//
+@@ -94,8 +96,12 @@ int main(int argc, char ** argv)
+ if ( OPT_1to1 )
+ graph.flag1to1(OPT_Epsilon, OPT_MaxOverlap);
+
++ //-- Dups
++ if ( OPT_Dup )
++ graph.flagDup(OPT_Epsilon, OPT_MaxOverlap);
++
+ //-- Output the filtered delta file
+- graph.outputDelta(cout);
++ graph.outputDelta(cout,OPT_Inverse);
+
+ return EXIT_SUCCESS;
+ }
+@@ -110,7 +116,7 @@ void ParseArgs(int argc, char ** argv)
+ optarg = NULL;
+
+ while ( !errflg &&
+- ((ch = getopt(argc, argv, "e:ghi:l:o:qru:m1")) != EOF) )
++ ((ch = getopt(argc, argv, "bve:ghi:l:o:qru:m1")) != EOF) )
+ switch (ch)
+ {
+ case 'e':
+@@ -158,6 +164,14 @@ void ParseArgs(int argc, char ** argv)
+ OPT_1to1 = true;
+ break;
+
++ case 'v':
++ OPT_Inverse = true;
++ break;
++
++ case 'b':
++ OPT_Dup = true;
++ break;
++
+ default:
+ errflg ++;
+ }
+@@ -225,6 +239,9 @@ void PrintHelp(const char * s)
+ << "-o float Set the maximum alignment overlap for -r and -q options\n"
+ << " as a percent of the alignment length [0, 100], default "
+ << OPT_MaxOverlap << endl
++ << "-v Print the discarded alignments instead of those that pass filters\n"
++ << "-b Maps duplications\n"
++ << " (XOR of -r and -q alignments, one or the other but not both)\n"
+ << endl;
+
+ cerr
+@@ -232,7 +249,7 @@ void PrintHelp(const char * s)
+ << "filters the alignments based on the command-line switches, leaving\n"
+ << "only the desired alignments which are output to stdout in the same\n"
+ << "delta format as the input. For multiple switches, order of operations\n"
+- << "is as follows: -i -l -u -q -r -g -m -1. If an alignment is excluded\n"
++ << "is as follows: -i -l -u -q -r -g -m -1 -b. If an alignment is excluded\n"
+ << "by a preceding operation, it will be ignored by the succeeding\n"
+ << "operations.\n"
+ << " An important distinction between the -g option and the -1 and -m\n"
+@@ -242,7 +259,10 @@ void PrintHelp(const char * s)
+ << "inversions, etc. In general cases, the -m option is the best choice,\n"
+ << "however -1 can be handy for applications such as SNP finding which\n"
+ << "require a 1-to-1 mapping. Finally, for mapping query contigs, or\n"
+- << "sequencing reads, to a reference genome, use -q.\n"
++ << "sequencing reads, to a reference genome, use -q. The duplications\n"
++ << "printed with the -b option are -r and -q alignments that are not\n"
++ << "present in the 1-to-1 alignment. These alignments are also the\n"
++ << "difference between the -1 and -m alignments\n"
+ << endl;
+
+ return;
+--- a/src/tigr/delta.hh
++++ b/src/tigr/delta.hh
+@@ -350,6 +350,11 @@ struct DeltaEdgelet_t
+ long loQ, hiQ, loR, hiR; //!< alignment bounds
+ int frmQ, frmR; //!< reading frame
+
++ unsigned int v1;
++ unsigned int v2;
++ void * SEQAN_idx;
++ // SEQAN_idx; //mapping to edge/list of vertices in segment graph in Seqan library. Testing by SVA
++
+ std::string delta; //!< delta information
+ std::vector<SNP_t *> snps; //!< snps for this edgelet
+
+@@ -395,6 +400,7 @@ struct DeltaEdgelet_t
+ };
+
+
++typedef std::vector<DeltaEdgelet_t *> EdgeletType;
+
+ //===================================================== DeltaEdge_t ============
+ struct DeltaEdge_t
+@@ -402,14 +408,14 @@ struct DeltaEdge_t
+ {
+ DeltaNode_t * refnode; //!< the adjacent reference node
+ DeltaNode_t * qrynode; //!< the adjacent query node
+- std::vector<DeltaEdgelet_t *> edgelets; //!< the set of individual alignments
++ EdgeletType edgelets; //!< the set of individual alignments
+
+ DeltaEdge_t ( )
+ { refnode = qrynode = NULL; }
+
+ ~DeltaEdge_t ( )
+ {
+- std::vector<DeltaEdgelet_t *>::iterator i;
++ EdgeletType::iterator i;
+ for ( i = edgelets . begin( ); i != edgelets . end( ); ++ i )
+ delete (*i);
+ }
+@@ -477,6 +483,7 @@ public:
+
+ void flagGOOD();
+ void flag1to1(float epsilon = -1, float maxolap = 100.0);
++ void flagDup(float epsilon = -1, float maxolap = 100.0);
+ void flagMtoM(float epsilon = -1, float maxolap = 100.0);
+ void flagGLIS(float epsilon = -1);
+ void flagQLIS(float epsilon = -1,
+@@ -489,7 +496,64 @@ public:
+ void flagUNIQ(float minuniq);
+
+ void loadSequences();
+- std::ostream & outputDelta(std::ostream & out);
++ std::ostream & outputDelta(std::ostream & out, bool inverse);
++};
++
++//------------------------------------------------------------ ScoreLocal ------
++inline long ScoreLocal
++(long scorej, long leni, long lenj,
++ long olap, float idyi, float maxolap)
++{
++ if ( olap > 0 &&
++ ((float)olap / (float)leni * 100.0 > maxolap ||
++ (float)olap / (float)lenj * 100.0 > maxolap) )
++ return -1;
++ else
++ //return (scorej + (long)((leni - olap) * pow (idyi, 2)));
++ return (scorej + (long)((leni - olap) * (idyi*idyi)));
++}
++
++struct EdgeletQCmp_t
++//!< Compares query lo coord
++{
++ bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
++ {
++ //-- Sorting by score in the event of a tie ensures that when building
++ // LIS chains, the highest scoring ones get seen first, thus avoiding
++ // overlap problems
++
++ if ( i->loQ < j->loQ )
++ return true;
++ else if ( i->loQ > j->loQ )
++ return false;
++ else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) >
++ ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) )
++ return true;
++ else
++ return false;
++ }
++};
++
++
++struct EdgeletRCmp_t
++//!< Compares reference lo coord
++{
++ bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const
++ {
++ //-- Sorting by score in the event of a tie ensures that when building
++ // LIS chains, the highest scoring ones get seen first, thus avoiding
++ // overlap problems
++
++ if ( i->loR < j->loR )
++ return true;
++ else if ( i->loR > j->loR )
++ return false;
++ else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) >
++ ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) )
++ return true;
++ else
++ return false;
++ }
+ };
+
+ #endif // #ifndef __DELTA_HH
+--- a/src/tigr/postnuc.cc
++++ b/src/tigr/postnuc.cc
+@@ -148,7 +148,7 @@ void extendClusters
+ void flushAlignments
+ (vector<Alignment> & Alignments,
+ const FastaRecord * Af, const FastaRecord * Bf,
+- FILE * DeltaFile);
++ FILE * DeltaFile, int filter);
+
+ void flushSyntenys
+ (vector<Synteny> & Syntenys, FILE * ClusterFile);
+@@ -174,7 +174,7 @@ void parseDelta
+ void processSyntenys
+ (vector<Synteny> & Syntenys,
+ FastaRecord * Af, long int As,
+- FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile);
++ FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter);
+
+ inline long int revC
+ (long int Coord, long int Len);
+@@ -232,6 +232,8 @@ int main
+ setBreakLen ( 200 );
+ setBanding ( 0 );
+
++ int FILTER=0; //perform automatic delta-filter. SVA. trying to save time/space on large alignments
++
+ //-- Parse the command line arguments
+ {
+ optarg = NULL;
+@@ -268,6 +270,9 @@ int main
+ TO_SEQEND = true;
+ break;
+
++ case 'f' :
++ FILTER = 1;
++
+ default :
+ errflg ++;
+ }
+@@ -420,7 +425,7 @@ int main
+ {
+ //-- New B sequence header, process all the old synteny's
+ processSyntenys (Syntenys, Af, As,
+- QryFile, ClusterFile, DeltaFile);
++ QryFile, ClusterFile, DeltaFile, FILTER);
+ }
+
+ strcpy (IdA, Af[Seqi].Id);
+@@ -476,7 +481,7 @@ int main
+ if ( CurrSp->clusters.rbegin( )->matches.empty( ) )
+ CurrSp->clusters.pop_back( );
+
+- processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile);
++ processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile, FILTER);
+ fclose (QryFile);
+
+ //-- Free the reference sequences
+@@ -703,7 +708,7 @@ bool extendForward
+
+ void extendClusters
+ (vector<Cluster> & Clusters,
+- const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile)
++ const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile, int filter)
+
+ // Connect all the matches in every cluster between sequences A and B.
+ // Also, extend alignments off of the front and back of each cluster to
+@@ -864,7 +869,7 @@ void extendClusters
+ #endif
+
+ //-- Output the alignment data to the delta file
+- flushAlignments (Alignments, Af, Bf, DeltaFile);
++ flushAlignments (Alignments, Af, Bf, DeltaFile, filter);
+
+ if ( Brev != NULL )
+ free (Brev);
+@@ -878,7 +883,7 @@ void extendClusters
+ void flushAlignments
+ (vector<Alignment> & Alignments,
+ const FastaRecord * Af, const FastaRecord * Bf,
+- FILE * DeltaFile)
++ FILE * DeltaFile, int filter)
+
+ // Simply output the delta information stored in Alignments to the
+ // given delta file. Free the memory used by Alignments once the
+@@ -887,6 +892,11 @@ void flushAlignments
+ {
+ vector<Alignment>::iterator Ap; // alignment pointer
+ vector<long int>::iterator Dp; // delta pointer
++ //filter graph here
++
++ //DeltaGraph_t graph;
++ //srand(1);
++ //graph.flagMtoM(OPT_Epsilon, OPT_MaxOverlap);
+
+ fprintf (DeltaFile, ">%s %s %ld %ld\n", Af->Id, Bf->Id, Af->len, Bf->len);
+
+@@ -1348,7 +1358,7 @@ void parseDelta
+
+ void processSyntenys
+ (vector<Synteny> & Syntenys, FastaRecord * Af, long int As,
+- FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile)
++ FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter)
+
+ // For each syntenic region with clusters, read in the B sequence and
+ // extend the clusters to expand total alignment coverage. Only should
+@@ -1393,7 +1403,7 @@ void processSyntenys
+
+ //-- Extend clusters and create the alignment information
+ CurrSp->Bf.len = Bf.len;
+- extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile);
++ extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile,filter);
+ }
+
+ //-- Create the cluster information
Modified: trunk/packages/mummer/trunk/debian/patches/series
===================================================================
--- trunk/packages/mummer/trunk/debian/patches/series 2015-04-14 09:25:25 UTC (rev 19048)
+++ trunk/packages/mummer/trunk/debian/patches/series 2015-04-14 12:39:02 UTC (rev 19049)
@@ -5,3 +5,4 @@
hardening.patch
spelling.patch
addition_from_mugsy.patch
+addition_from_report_duplicates.patch
More information about the debian-med-commit
mailing list