[med-svn] [bowtie2] 04/06: Imported Upstream version 2.2.2

Alex Mestiashvili malex-guest at moszumanska.debian.org
Tue Apr 15 12:40:59 UTC 2014


This is an automated email from the git hooks/post-receive script.

malex-guest pushed a commit to branch master
in repository bowtie2.

commit 2eb8a65c54e6397184717dfad44d18e6f2046369
Author: Alexandre Mestiashvili <alex at biotec.tu-dresden.de>
Date:   Tue Apr 15 13:36:57 2014 +0200

    Imported Upstream version 2.2.2
---
 MANUAL          |  5 +++--
 MANUAL.markdown |  5 +++--
 NEWS            |  4 ++++
 VERSION         |  2 +-
 doc/manual.html |  2 +-
 reference.cpp   | 34 ++++++++++++++++++++++++++++------
 reference.h     |  9 ++++++---
 7 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/MANUAL b/MANUAL
index f943ce4..23c6999 100644
--- a/MANUAL
+++ b/MANUAL
@@ -182,8 +182,9 @@ your [PATH].
 
 If you would like to install Bowtie 2 by copying the Bowtie 2 executable files
 to an existing directory in your [PATH], make sure that you copy all the
-executables, including `bowtie2`, `bowtie2-align`, `bowtie2-build` and
-`bowtie2-inspect`.
+executables, including `bowtie2`, `bowtie2-align-s`, `bowtie2-align-l`,
+`bowtie2-build`, `bowtie2-build-s`, `bowtie2-build-l`, `bowtie2-inspect`,
+`bowtie2-inspect-s` and `bowtie2-inspect-l`.
 
 [PATH environment variable]: http://en.wikipedia.org/wiki/PATH_(variable)
 [PATH]: http://en.wikipedia.org/wiki/PATH_(variable)
diff --git a/MANUAL.markdown b/MANUAL.markdown
index 450899b..76074ca 100644
--- a/MANUAL.markdown
+++ b/MANUAL.markdown
@@ -192,8 +192,9 @@ your [PATH].
 
 If you would like to install Bowtie 2 by copying the Bowtie 2 executable files
 to an existing directory in your [PATH], make sure that you copy all the
-executables, including `bowtie2`, `bowtie2-align`, `bowtie2-build` and
-`bowtie2-inspect`.
+executables, including `bowtie2`, `bowtie2-align-s`, `bowtie2-align-l`,
+`bowtie2-build`, `bowtie2-build-s`, `bowtie2-build-l`, `bowtie2-inspect`,
+`bowtie2-inspect-s` and `bowtie2-inspect-l`.
 
 [PATH environment variable]: http://en.wikipedia.org/wiki/PATH_(variable)
 [PATH]: http://en.wikipedia.org/wiki/PATH_(variable)
diff --git a/NEWS b/NEWS
index 677320f..63e5074 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,10 @@ Please report any issues using the Sourceforge bug tracker:
 Version Release History
 =======================
 
+Version 2.2.2 - April 10, 2014
+   * Improved performance for cases where the reference contains ambiguous 
+     or masked nucleobases represented by Ns.  
+
 Version 2.2.1 - February 28, 2014
    * Improved way in which index files are loaded for alignment.  Should fix
      efficiency problems on some filesystems.
diff --git a/VERSION b/VERSION
index c043eea..b1b25a5 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-2.2.1
+2.2.2
diff --git a/doc/manual.html b/doc/manual.html
index 5f6a3f6..78141cd 100644
--- a/doc/manual.html
+++ b/doc/manual.html
@@ -140,7 +140,7 @@
 <p>Bowtie 2 is using the multithreading software model in order to speed up execution times on SMP architectures where this is possible. On POSIX platforms (like linux, Mac OS, etc) it needs the pthread library. Although it is possible to use pthread library on non-POSIX platform like Windows, due to performance reasons bowtie 2 will try to use Windows native multithreading if possible.</p>
 <h2 id="adding-to-path"><a href="#TOC">Adding to PATH</a></h2>
 <p>By adding your new Bowtie 2 directory to your <a href="http://en.wikipedia.org/wiki/PATH_(variable)">PATH environment variable</a>, you ensure that whenever you run <code>bowtie2</code>, <code>bowtie2-build</code> or <code>bowtie2-inspect</code> from the command line, you will get the version you just installed without having to specify the entire path. This is recommended for most users. To do this, follow your operating system's instructions for adding the directory to your <a href= [...]
-<p>If you would like to install Bowtie 2 by copying the Bowtie 2 executable files to an existing directory in your <a href="http://en.wikipedia.org/wiki/PATH_(variable)">PATH</a>, make sure that you copy all the executables, including <code>bowtie2</code>, <code>bowtie2-align</code>, <code>bowtie2-build</code> and <code>bowtie2-inspect</code>.</p>
+<p>If you would like to install Bowtie 2 by copying the Bowtie 2 executable files to an existing directory in your <a href="http://en.wikipedia.org/wiki/PATH_(variable)">PATH</a>, make sure that you copy all the executables, including <code>bowtie2</code>, <code>bowtie2-align-s</code>, <code>bowtie2-align-l</code>, <code>bowtie2-build</code>, <code>bowtie2-build-s</code>, <code>bowtie2-build-l</code>, <code>bowtie2-inspect</code>, <code>bowtie2-inspect-s</code> and <code>bowtie2-inspect- [...]
 <h1 id="the-bowtie2-aligner"><a href="#TOC">The <code>bowtie2</code> aligner</a></h1>
 <p><code>bowtie2</code> takes a Bowtie 2 index and a set of sequencing read files and outputs a set of alignments in SAM format.</p>
 <p>"Alignment" is the process by which we discover how and where the read sequences are similar to the reference sequence. An "alignment" is a result from this process, specifically: an alignment is a way of "lining up" some or all of the characters in the read with some characters from the reference in a way that reveals how they're similar. For example:</p>
diff --git a/reference.cpp b/reference.cpp
index 6b8f215..dcb8dd8 100644
--- a/reference.cpp
+++ b/reference.cpp
@@ -148,6 +148,8 @@ BitPairReference::BitPairReference(
 			     << "'first'" << endl;
 			throw 1;
 		}
+		cumUnambig_.push_back(cumsz);
+		cumRefOff_.push_back(cumlen);
 		cumsz += recs_.back().len;
 		cumlen += recs_.back().off;
 		cumlen += recs_.back().len;
@@ -161,6 +163,8 @@ BitPairReference::BitPairReference(
 	refRecOffs_.push_back((TIndexOffU)recs_.size());
 	refOffs_.push_back(cumsz);
 	refLens_.push_back(cumlen);
+	cumUnambig_.push_back(cumsz);
+	cumRefOff_.push_back(cumlen);
 	bufSz_ = cumsz;
 	assert_eq(nrefs_, refLens_.size());
 	assert_eq(sz, recs_.size());
@@ -412,10 +416,6 @@ int BitPairReference::getStretchNaive(
 
 /**
  * Load a stretch of the reference string into memory at 'dest'.
- *
- * This implementation scans linearly through the records for the
- * unambiguous stretches of the target reference sequence.  When
- * there are many records, binary search would be more appropriate.
  */
 int BitPairReference::getStretch(
 	uint32_t *destU32,
@@ -447,11 +447,31 @@ int BitPairReference::getStretch(
 	uint64_t off = 0;
 	int64_t offset = 4;
 	bool firstStretch = true;
+	bool binarySearched = false;
+	uint64_t left  = reci;
+	uint64_t right = recf;
+	uint64_t mid   = 0;
 	// For all records pertaining to the target reference sequence...
 	for(uint64_t i = reci; i < recf; i++) {
-		ASSERT_ONLY(uint64_t origBufOff = bufOff);
+		uint64_t origBufOff = bufOff;
 		assert_geq(toff, off);
-		off += recs_[i].off;
+		if (firstStretch && recf > reci + 16){
+			// binary search finds largest i s.t. cumRefOff_[i] <= toff
+			while (left < right-1) {
+				mid = left + ((right - left) >> 1);
+				if (cumRefOff_[mid] <= toff)
+					left = mid;
+				else
+					right = mid;
+			}
+			off = cumRefOff_[left];
+			bufOff = cumUnambig_[left];
+			origBufOff = bufOff;
+			i = left;
+			assert_gt(cumRefOff_[i+1], toff);
+			binarySearched = true;
+		}
+		off += recs_[i].off; // skip Ns at beginning of stretch
 		assert_gt(count, 0);
 		if(toff < off) {
 			size_t cpycnt = min((size_t)(off - toff), count);
@@ -468,6 +488,8 @@ int BitPairReference::getStretch(
 			bufOff += recs_[i].len;
 		}
 		off += recs_[i].len;
+		assert(off == cumRefOff_[i+1] || cumRefOff_[i+1] == 0);
+		assert(!binarySearched || toff < off);
 		if(toff < off) {
 			if(firstStretch) {
 				if(toff + 8 < off && count > 8) {
diff --git a/reference.h b/reference.h
index c955303..9b8e2e7 100644
--- a/reference.h
+++ b/reference.h
@@ -167,9 +167,12 @@ protected:
 	uint32_t byteToU32_[256];
 
 	EList<RefRecord> recs_;       /// records describing unambiguous stretches
-	EList<TIndexOffU>  refLens_;    /// approx lens of ref seqs (excludes trailing ambig chars)
-	EList<TIndexOffU>  refOffs_;    /// buf_ begin offsets per ref seq
-	EList<TIndexOffU>  refRecOffs_; /// record begin/end offsets per ref seq
+	// following two lists are purely for the binary search in getStretch
+	EList<TIndexOffU> cumUnambig_; // # unambig ref chars up to each record
+	EList<TIndexOffU> cumRefOff_;  // # ref chars up to each record
+	EList<TIndexOffU> refLens_;    /// approx lens of ref seqs (excludes trailing ambig chars)
+	EList<TIndexOffU> refOffs_;    /// buf_ begin offsets per ref seq
+	EList<TIndexOffU> refRecOffs_; /// record begin/end offsets per ref seq
 	uint8_t *buf_;      /// the whole reference as a big bitpacked byte array
 	uint8_t *sanityBuf_;/// for sanity-checking buf_
 	TIndexOffU bufSz_;    /// size of buf_
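
The reference.cpp hunks above replace the linear scan in getStretch with a binary search over cumulative offsets. The standalone sketch below illustrates that idea for a single reference sequence. It is not the upstream code: `Rec`, `CumTables`, `buildCumTables` and `findRecord` are hypothetical names chosen for illustration, and only the `off`/`len` fields and the two cumulative lists mirror what the diff shows. As in the patch, the loop settles on the largest record index whose cumulative reference offset is less than or equal to the target offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a reference record: 'off' ambiguous
// characters (Ns) followed by 'len' unambiguous characters.
struct Rec {
    uint64_t off;
    uint64_t len;
};

// Prefix sums analogous to cumRefOff_ and cumUnambig_ in the patch.
// Each list has one entry per record plus a final entry with the total.
struct CumTables {
    std::vector<uint64_t> cumRefOff;   // # ref chars before each record
    std::vector<uint64_t> cumUnambig;  // # unambiguous chars before each record
};

CumTables buildCumTables(const std::vector<Rec>& recs) {
    CumTables t;
    uint64_t cumsz = 0;   // running count of unambiguous chars
    uint64_t cumlen = 0;  // running count of all reference chars
    for (const Rec& r : recs) {
        t.cumUnambig.push_back(cumsz);
        t.cumRefOff.push_back(cumlen);
        cumsz += r.len;
        cumlen += r.off + r.len;
    }
    // Final entries hold the totals, so index i+1 is always valid.
    t.cumUnambig.push_back(cumsz);
    t.cumRefOff.push_back(cumlen);
    return t;
}

// Returns the largest i in [lo, hi) with cumRefOff[i] <= toff.
// Precondition (assumed, not checked): lo < hi and cumRefOff[lo] <= toff.
std::size_t findRecord(const std::vector<uint64_t>& cumRefOff,
                       std::size_t lo, std::size_t hi, uint64_t toff) {
    std::size_t left = lo;
    std::size_t right = hi;
    while (right - left > 1) {
        std::size_t mid = left + ((right - left) >> 1);
        if (cumRefOff[mid] <= toff) {
            left = mid;   // record 'mid' starts at or before toff
        } else {
            right = mid;  // record 'mid' starts after toff
        }
    }
    return left;
}
```

Given the returned index `i`, a caller can resume from `cumRefOff[i]` in reference coordinates and `cumUnambig[i]` in the packed buffer, which is how the patch sets `off` and `bufOff` before continuing its loop. The patch takes this path only when a reference has more than 16 records; shorter record lists still use the linear scan.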

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/bowtie2.git


