[med-svn] [e-mem] 03/05: New upstream version 1.0.0

Andreas Tille tille at debian.org
Wed May 10 13:10:33 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository e-mem.

commit 18d1b6bbcad2406c5b4b43aaca65f2c61a2b5139
Author: Andreas Tille <tille at debian.org>
Date:   Wed May 10 15:05:42 2017 +0200

    New upstream version 1.0.0
---
 README.md          | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 e-mem.cpp          | 12 ++++++---
 example/que1.fasta |  7 +++++
 example/ref.fasta  |  2 +-
 example/ref1.fasta |  7 +++++
 5 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1b36194
--- /dev/null
+++ b/README.md
@@ -0,0 +1,77 @@
+# E-MEM
+Efficient Computation of Maximal Exact Matches
+
+##-- DESCRIPTION --
+
+E-MEM is an efficient maximal exact match (MEM) computation program for large genomes. It can be used as a standalone application or as a drop-in replacement for the MEM computation program in MUMmer3. A detailed comparison of E-MEM with other leading programs can be found in the E-MEM paper (Nilesh Khiste and Lucian Ilie).
+
+   USAGE:
+   
+    e-mem  [options]  <reference>  <query> ...
+
+    [options]    type 'e-mem -h' for a list of options.
+    <reference>  reference file with one or more FastA sequences
+    <query>      query file with one or more FastA sequences
+
+   OUTPUT:
+   
+                 stdout  a list of exact matches
+
+##-- INSTALLATION --
+
+After extracting the files into the desired installation directory,
+change to the "e-mem" directory.  Once in this directory, type: "make"
+
+This command builds the e-mem binary. If you see any error messages, please
+contact the authors.
+
+For a sanity test, run the shell script run_example. On success, the script
+prints "Test passed".
+
+
+##-- RUNNING E-MEM STANDALONE --
+
+The E-MEM program can be used with FastA files containing one or more sequences. It can run in either serial or parallel mode. Parallel mode reduces running time compared with serial mode.
+
+The valid options for the e-mem program are:
+
+Options:
+
+-n	      match only the characters a, c, g, or t (in upper or lower case)
+   
+-l	      set the minimum length of a match. The default length is 50
+
+-b	      compute forward and reverse complement matches
+
+-r	      only compute reverse complement matches
+
+-c	      report the query-position of a reverse complement match relative to the original query sequence
+
+-F	      force 4 column output format regardless of the number of reference sequences in the input
+
+-L	      show the length of the query sequences on the header line
+
+-d	      set the split size. The default value is 1
+
+-t	      number of threads. The default is 1 thread
+
+-h       show possible options
+
+In addition to the options used in MUMmer, we provide two more. The -d option splits the sequences into two or more parts. The default value is 1, which means no splitting. A value greater than 1 reduces the overall memory requirement of the program at some cost in performance.
+
+The -t option runs the program in parallel mode. The default value is 1, which means serial mode. A value greater than 1 reduces the overall running time of the program at some cost in memory.
+
+##-- RUNNING E-MEM WITHIN MUMMER3 --
+
+MUMmer3 includes many scripts in which MEM computation is one of the key steps. In all of these scripts, the MEM computation program can easily be replaced with E-MEM for better performance.
+
+For example, to use NUCMER (all-vs-all comparison of nucleotide sequences contained in FastA files) with E-MEM, replace every reference to mummer (the MEM computation program) with e-mem. In the NUCMER script, search for "$BIN_DIR/mummer" and replace it with "< path >/e-mem", where < path > is the installation directory of the e-mem program.
+
+The other important script in MUMmer3 is run-mummer3 (the alignment program). To use this script with E-MEM, replace "$bindir/mummer" with "< path >/e-mem", where < path > is the e-mem installation directory.
+
+###CITE
+If you find the E-MEM program useful, please cite the E-MEM paper:
+
+N. Khiste, L. Ilie [E-MEM: efficient computation of maximal exact matches for very large genomes](http://bioinformatics.oxfordjournals.org/content/31/4/509.short) Bioinformatics, 2015
+
+ 
diff --git a/e-mem.cpp b/e-mem.cpp
index f4fdc79..e7f7671 100644
--- a/e-mem.cpp
+++ b/e-mem.cpp
@@ -156,9 +156,11 @@ void helperReportMem(uint64_t &currRPos, uint64_t &currQPos, uint64_t totalRBits
              * enter this loop.
              */
             currR = RefFile.binReads[offsetR?i:i-1];
-            currR >>= DATATYPE_WIDTH-offsetR;
+            if (offsetR)
+                currR >>= DATATYPE_WIDTH-offsetR;
             currQ = QueryFile.binReads[offsetQ?j:j-1];
-            currQ >>= DATATYPE_WIDTH-offsetQ;
+            if (offsetQ)
+                currQ >>= DATATYPE_WIDTH-offsetQ;
         } 
 
         if((currR & global_mask_right[matchSize/2 - 1]) != (currQ &  global_mask_right[matchSize/2 - 1])) {
@@ -485,7 +487,11 @@ void checkCommandLineOptions(uint32_t &options)
 
 void print_help_msg()
 {
-    cout << "e-mem finds and outputs the position and length of all maximal" << endl;
+    cout <<  endl;
+    cout << "E-MEM Version 1.0.0, Sep. 25, 2014" << endl;
+    cout << "© 2014 Nilesh Khiste, Lucian Ilie" << endl;
+    cout <<  endl;
+    cout << "E-MEM finds and outputs the position and length of all maximal" << endl;
     cout << "exact matches (MEMs) between <query-file> and <reference-file>" << endl;
     cout << endl;
     cout << "Usage: ../e-mem [options]  <reference-file>  <query-file>" << endl;
diff --git a/example/que1.fasta b/example/que1.fasta
new file mode 100644
index 0000000..e0ad4eb
--- /dev/null
+++ b/example/que1.fasta
@@ -0,0 +1,7 @@
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+TGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGACT
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+ATGAATGAATGACT
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+TGAATGAATGAATGA
+
diff --git a/example/ref.fasta b/example/ref.fasta
index 277ab89..2ec1d50 100644
--- a/example/ref.fasta
+++ b/example/ref.fasta
@@ -1,3 +1,3 @@
->gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+>giNilesh cytochrome b [Elephas maximus maximus]
 GAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAAT
 
diff --git a/example/ref1.fasta b/example/ref1.fasta
new file mode 100644
index 0000000..e762a03
--- /dev/null
+++ b/example/ref1.fasta
@@ -0,0 +1,7 @@
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+GAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAAT
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+GAATGAATGAAGAATGAATGAATGAA
+>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
+ATGAATGAATGAATGAATGAAT
+
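
Note on the e-mem.cpp hunk in helperReportMem: the patch skips the right shift when offsetR or offsetQ is 0. The commit message does not give a reason, but a plausible one is that the shift amount would then equal the full word width, and shifting an unsigned integer by its own width is undefined behavior in C++. The following is a minimal sketch of that guard, not the upstream code. It assumes a 64-bit word (kWidth stands in for DATATYPE_WIDTH, whose value is not shown in the diff), and shiftTail is a hypothetical helper name.

```cpp
#include <cstdint>

// Assumption: 64-bit words; stands in for DATATYPE_WIDTH from e-mem.cpp.
constexpr unsigned kWidth = 64;

// Hypothetical helper: return the top `offset` bits of `word`, right-aligned.
// When offset == 0 the shift amount would be kWidth, which is undefined
// behavior for a 64-bit operand, so the word is returned unshifted,
// mirroring the `if (offsetR)` guard added by the patch.
constexpr std::uint64_t shiftTail(std::uint64_t word, unsigned offset) {
    return offset ? (word >> (kWidth - offset)) : word;
}

// Compile-time checks of the two cases.
static_assert(shiftTail(0xF000000000000001ULL, 4) == 0xFULL,
              "offset 4 keeps the top four bits");
static_assert(shiftTail(0xF000000000000001ULL, 0) == 0xF000000000000001ULL,
              "offset 0 leaves the word unchanged");
```

Without the guard, the result for an offset of 0 depends on the compiler and CPU; on some platforms the shift count is taken modulo 64 and the word is left unchanged, on others it may not be.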

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/e-mem.git


