[med-svn] [Git][med-team/mhap][debian/jessie-backports] 12 commits: Bump minimum required libguava-java version

Andreas Tille gitlab at salsa.debian.org
Tue Jul 17 12:00:28 BST 2018


Andreas Tille pushed to branch debian/jessie-backports at Debian Med / mhap


Commits:
84391d0e by Afif Elghraoui at 2016-10-16T15:14:19-07:00
Bump minimum required libguava-java version

We otherwise get the compilation error:

[ERROR] /<<BUILDDIR>>/mhap-2.1+dfsg/src/main/java/edu/umd/marbl/mhap/sketch/FrequencyCounts.java:[104,27] error:
no suitable method found for create((value,sin[...]alue),long,double)

The MHAP pom file specifies v19 and we still get this build error with v18,
so it seems to be a necessary minimum version.

- - - - -
7cb51cbd by Afif Elghraoui at 2016-10-16T15:15:28-07:00
Prepare new revision

- - - - -
c8088c37 by Steffen Moeller at 2017-07-21T13:33:05+02:00
Added ref to OMICtools

- - - - -
3110815f by Andreas Tille at 2017-08-17T19:16:43+02:00
Fix yaml syntax

- - - - -
1cb04d5d by Andreas Tille at 2017-10-20T10:09:25+02:00
Fix OMICS entry

- - - - -
7520c502 by Afif Elghraoui at 2018-03-15T23:59:53-04:00
New upstream version 2.1.3+dfsg
- - - - -
24b67004 by Afif Elghraoui at 2018-03-15T23:59:54-04:00
Merge tag 'upstream/2.1.3+dfsg'

Upstream version 2.1.3+dfsg

- - - - -
88230ca9 by Afif Elghraoui at 2018-03-16T00:01:17-04:00
Standards-Version 4.1.3

- - - - -
eaf20dbf by Afif Elghraoui at 2018-03-16T00:07:21-04:00
releasing package mhap version 2.1.3+dfsg-1

- - - - -
32d6d678 by Andreas Tille at 2018-07-17T12:11:43+02:00
Merge branch 'master' into debian/jessie-backports

- - - - -
3ebd2ce7 by Andreas Tille at 2018-07-17T12:12:24+02:00
This is an unofficial backport since jessie-backports-sloppy will not be supported any more

- - - - -
ef042b49 by Andreas Tille at 2018-07-17T12:40:26+02:00
Drop unnecessary ending newline patch which does not apply in Jessie

- - - - -


22 changed files:

- README.md
- debian/changelog
- debian/control
- debian/patches/maven-artifacts.patch
- debian/upstream/metadata
- docs/source/installation.rst
- pom.xml
- src/main/java/edu/umd/marbl/mhap/impl/AbstractMatchSearch.java
- src/main/java/edu/umd/marbl/mhap/impl/FastaData.java
- src/main/java/edu/umd/marbl/mhap/impl/MinHashBitSequenceSubSketches.java
- src/main/java/edu/umd/marbl/mhap/impl/MinHashSearch.java
- src/main/java/edu/umd/marbl/mhap/impl/SequenceSketch.java
- src/main/java/edu/umd/marbl/mhap/impl/SequenceSketchStreamer.java
- src/main/java/edu/umd/marbl/mhap/main/KmerStatSimulator.java
- src/main/java/edu/umd/marbl/mhap/main/MhapMain.java
- src/main/java/edu/umd/marbl/mhap/sketch/BottomOverlapSketch.java
- src/main/java/edu/umd/marbl/mhap/sketch/BottomSketch.java
- src/main/java/edu/umd/marbl/mhap/sketch/FrequencyCounts.java
- src/main/java/edu/umd/marbl/mhap/sketch/HashUtils.java
- src/main/java/edu/umd/marbl/mhap/sketch/MinHashBitSketch.java
- src/main/java/edu/umd/marbl/mhap/sketch/MinHashSketch.java
- src/main/java/edu/umd/marbl/mhap/utils/Utils.java


Changes:

=====================================
README.md
=====================================
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@ You must have a recent  [JDK](http://www.oracle.com/technetwork/java/javase/down
 
     git clone https://github.com/marbl/MHAP.git
     cd MHAP
-    maven install
+    mvn install
     
-For a quick user-quide, run:
+Maven executables vary by system, it could also be maven/mvn/mv32, depending on your installation. For a quick user-quide, run:
 
     cd target
-    java -jar mhap-2.1.1.jar
+    java -jar mhap-2.1.3.jar
 
 ## Docs
 For the full documentation information please see http://mhap.readthedocs.io/en/latest/


=====================================
debian/changelog
=====================================
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,21 @@
+mhap (2.1.3+dfsg-1~bpo8+1) jessie-backports; urgency=medium
+
+  * Rebuild for jessie-backports.
+
+ -- Andreas Tille <tille at debian.org>  Tue, 17 Jul 2018 12:11:51 +0200
+
+mhap (2.1.3+dfsg-1) unstable; urgency=medium
+
+  [ Afif Elghraoui ]
+  * Bump minimum required libguava-java version
+  * New upstream version 2.1.3+dfsg
+  * Standards-Version 4.1.3
+
+  [ Steffen Moeller, Andreas Tille ]
+  * Added ref to OMICtools
+
+ -- Afif Elghraoui <afif at debian.org>  Fri, 16 Mar 2018 00:05:16 -0400
+
 mhap (2.1.1+dfsg-1~bpo8+1) jessie-backports; urgency=medium
 
   * Rebuild for jessie-backports.


=====================================
debian/control
=====================================
--- a/debian/control
+++ b/debian/control
@@ -15,7 +15,7 @@ Build-Depends:
 	libguava-java (>= 19),
 	jaligner,
 	libssw-java,
-Standards-Version: 3.9.8
+Standards-Version: 4.1.3
 Homepage: http://mhap.readthedocs.org/en/stable/
 Vcs-Git: https://anonscm.debian.org/git/debian-med/mhap.git
 Vcs-Browser: https://anonscm.debian.org/cgit/debian-med/mhap.git


=====================================
debian/patches/maven-artifacts.patch
=====================================
--- a/debian/patches/maven-artifacts.patch
+++ b/debian/patches/maven-artifacts.patch
@@ -2,8 +2,8 @@ Description: Tell maven how to find libraries without maven artifacts
  This change was made with help from Tony Mancill <tmancill at debian.org>
 Author: Afif Elghraoui <afif at debian.org>
 Forwarded: not-needed
---- mhap.orig/pom.xml
-+++ mhap/pom.xml
+--- a/pom.xml
++++ b/pom.xml
 @@ -110,6 +110,8 @@
  			<groupId>it.unimi.dsi</groupId>
  			<artifactId>fastutil</artifactId>
@@ -13,10 +13,3 @@ Forwarded: not-needed
  		</dependency>
  		<dependency>
  			<groupId>org.apache.commons</groupId>
-@@ -134,4 +136,4 @@
- 	</dependencies>
- 	<url>https://github.com/marbl/MHAP</url>
- 	<description>MinHash alignment process (MHAP pronounced MAP): locality sensitive hashing to detect overlaps and utilities.</description>
--</project>
-\ No newline at end of file
-+</project>


=====================================
debian/upstream/metadata
=====================================
--- a/debian/upstream/metadata
+++ b/debian/upstream/metadata
@@ -9,3 +9,6 @@ Reference:
   DOI: 10.1038/nbt.3238
   PMID: 26006009
   URL: http://www.nature.com/nbt/journal/v33/n6/full/nbt.3238.html
+Registry:
+  - Name: OMICtools
+    Entry: OMICS_13515


=====================================
docs/source/installation.rst
=====================================
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -28,19 +28,25 @@ The pre-compiled version is recommended to users who want to run MHAP, without d
 
 .. code-block:: bash
 
-    $ wget https://github.com/marbl/MHAP/releases/download/v2.1.1/mhap-2.1.1.tar.gz
+    $ wget https://github.com/marbl/MHAP/releases/download/v2.1.1/mhap-2.1.1.jar.gz
 
 And if ``wget`` not available, you can use ``curl`` instead:
 
 .. code-block:: bash
 
-    $ curl -L https://github.com/marbl/MHAP/releases/download/v2.1.1/mhap-2.1.1.tar.gz > mhap-2.1.1.tar.gz
+    $ curl -L https://github.com/marbl/MHAP/releases/download/v2.1.1/mhap-2.1.1.jar.gz
 
 Then run
 
 .. code-block:: bash
 
-   $ tar xvzf mhap-2.1.1.tar.gz
+   $ gunzip mhap-2.1.1.jar.gz
+   
+Now to run mhap
+
+.. code-block:: bash
+
+   $ java -jar mhap-2.1.1.jar
 
 Source
 -----------------


=====================================
pom.xml
=====================================
--- a/pom.xml
+++ b/pom.xml
@@ -3,7 +3,7 @@
 	<modelVersion>4.0.0</modelVersion>
 	<groupId>mhap</groupId>
 	<artifactId>mhap</artifactId>
-	<version>2.1.1</version>
+	<version>2.1.3</version>
 	<name>MinHash Alignment Process</name>
 	<build>
 		<resources>
@@ -134,4 +134,7 @@
 	</dependencies>
 	<url>https://github.com/marbl/MHAP</url>
 	<description>MinHash alignment process (MHAP pronounced MAP): locality sensitive hashing to detect overlaps and utilities.</description>
-</project>
\ No newline at end of file
+	<properties>
+		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+	</properties>
+</project>


=====================================
src/main/java/edu/umd/marbl/mhap/impl/AbstractMatchSearch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/AbstractMatchSearch.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/AbstractMatchSearch.java
@@ -64,7 +64,7 @@ public abstract class AbstractMatchSearch
 		this.sequencesSearched = new AtomicLong();
 	}
 
-	protected void addData(final SequenceSketchStreamer data)
+	protected void addData(final SequenceSketchStreamer data, boolean doReverseCompliment)
 	{
 		// figure out number of cores
 		ExecutorService execSvc = Executors.newFixedThreadPool(this.numThreads);
@@ -80,7 +80,7 @@ public abstract class AbstractMatchSearch
 					try
 					{
 						ReadBuffer buf = new ReadBuffer();
-						SequenceSketch seqHashes = data.dequeue(false, buf);
+						SequenceSketch seqHashes = data.dequeue(!doReverseCompliment, buf);
 						while (seqHashes != null)
 						{
 							addSequence(seqHashes);


=====================================
src/main/java/edu/umd/marbl/mhap/impl/FastaData.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/FastaData.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/FastaData.java
@@ -126,7 +126,8 @@ public class FastaData implements Cloneable
 	{
 		StringBuilder fastaSeq = new StringBuilder();
 		String header = null;
-
+		long index = -1 - this.offset;
+		
 		synchronized (this.fileReader)
 		{
 			if (this.readFullFile)
@@ -175,13 +176,14 @@ public class FastaData implements Cloneable
 				else
 					break;
 			}
+
+			if (fastaSeq.length()>0)
+				index = this.numberProcessed.incrementAndGet();			
 		}			
 		
 		String fastaSeqSring = fastaSeq.toString();
 		if (!fastaSeqSring.isEmpty())
 		{
-			long index = this.numberProcessed.incrementAndGet();
-			
 			//generate sequence id
 			SequenceId id;
 			if (SequenceId.STORE_FULL_ID)


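The FastaData change moves the numberProcessed.incrementAndGet() call inside the synchronized (this.fileReader) block. The diff does not state the reason. A plausible one is this: if the index is taken after the lock is released, two threads that read consecutive records can increment the counter in the opposite order, so sequence numbers no longer follow file order. The sketch below isolates that pattern. OrderedIndexReader is a hypothetical class, and a list iterator stands in for the shared file reader.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified reader (not MHAP's FastaData): an iterator stands in
// for the FASTA file reader that several worker threads share.
public class OrderedIndexReader {
    private final Iterator<String> reader;
    private final AtomicLong numberProcessed = new AtomicLong();

    public OrderedIndexReader(List<String> records) {
        this.reader = records.iterator();
    }

    /** Returns "index:record", or null once the input is exhausted. */
    public String next() {
        String record = null;
        long index = -1;
        synchronized (reader) {
            if (!reader.hasNext())
                return null;
            record = reader.next();
            // Take the index while still holding the lock, as the patched FastaData does,
            // so a thread that reads an earlier record always receives a smaller index.
            index = numberProcessed.incrementAndGet();
        }
        return index + ":" + record;
    }
}
```

Because reading the record and taking its index now happen inside the same critical section, the index always matches the record's position in the input, regardless of how threads are scheduled afterwards.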
=====================================
src/main/java/edu/umd/marbl/mhap/impl/MinHashBitSequenceSubSketches.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/MinHashBitSequenceSubSketches.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/MinHashBitSequenceSubSketches.java
@@ -61,7 +61,7 @@ public final class MinHashBitSequenceSubSketches
 			int currStart = Math.max(0, end-stepSize);			
 
 			//compute minhashes
-			int[] sketch = new MinHashSketch(seq.substring(currStart, end), nGramSize, numWords*64).getMinHashArray();
+			int[] sketch = new MinHashSketch(seq.substring(currStart, end), nGramSize, numWords*64, true).getMinHashArray();
 			
 			sequence[iter] = new MinHashBitSketch(sketch);
 			
@@ -91,7 +91,7 @@ public final class MinHashBitSequenceSubSketches
 			int currStart = Math.max(0, end-stepSize*2);			
 
 			//compute minhashes
-			sketches[iter] = new MinHashBitSketch(new MinHashSketch(seq.substring(currStart, end), nGramSize, numWords*64).getMinHashArray());
+			sketches[iter] = new MinHashBitSketch(new MinHashSketch(seq.substring(currStart, end), nGramSize, numWords*64, true).getMinHashArray());
 			
 			start += stepSize;
 		}


=====================================
src/main/java/edu/umd/marbl/mhap/impl/MinHashSearch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/MinHashSearch.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/MinHashSearch.java
@@ -61,7 +61,7 @@ public final class MinHashSearch extends AbstractMatchSearch
 	private final Map<SequenceId, SequenceSketch> sequenceVectorsHash;
 	
 	public MinHashSearch(SequenceSketchStreamer data, int numHashes, int numMinMatches, int numThreads, 
-			boolean storeResults, int minStoreLength, double maxShift, double acceptScore) throws IOException
+			boolean storeResults, int minStoreLength, double maxShift, double acceptScore, boolean doReverseCompliment) throws IOException
 	{
 		super(numThreads, storeResults);
 
@@ -91,8 +91,8 @@ public final class MinHashSearch extends AbstractMatchSearch
 			this.hashes.add(map);
 		}
 		
-		addData(data);
-		
+		//store both forward andd reverse
+		addData(data, doReverseCompliment);
 		
 		System.err.println("Stored "+this.sequenceVectorsHash.size()+" sequences in the index.");
 	}


=====================================
src/main/java/edu/umd/marbl/mhap/impl/SequenceSketch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/SequenceSketch.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/SequenceSketch.java
@@ -67,8 +67,10 @@ public final class SequenceSketch implements Serializable
 			// dos.writeBoolean(this.id.isForward());
 			boolean isFwd = input.readBoolean();
 
-			// dos.writeInt(this.id.getHeaderId());
-			SequenceId id = new SequenceId(input.readLong() + offset, isFwd);
+			// dos.writeLong(this.id.getHeaderId());
+			// dos.writeUTF(this.id.getHeader());
+			
+			SequenceId id = new SequenceId(input.readLong() + offset, isFwd, input.readUTF());
 			
 			//dos.writeInt(this.sequenceLength);
 			int sequenceLength = input.readInt();
@@ -101,13 +103,16 @@ public final class SequenceSketch implements Serializable
 		this.orderedHashes = orderedHashes;
 	}
 
-	public SequenceSketch(Sequence seq, int kmerSize, int numHashes, int orderedKmerSize, int orderedSketchSize, FrequencyCounts kmerFilter, double repeatWeight) throws ZeroNGramsFoundException
+	public SequenceSketch(Sequence seq, int kmerSize, int numHashes, int orderedKmerSize, int orderedSketchSize, FrequencyCounts kmerFilter, boolean doReverseCompliment, double repeatWeight) throws ZeroNGramsFoundException
 	{
 		this.sequenceLength = seq.length();
 		this.id = seq.getId();
-		this.mainHashes = new MinHashSketch(seq.getSquenceString(), kmerSize, numHashes, kmerFilter, repeatWeight);
 		
-		this.orderedHashes = new BottomOverlapSketch(seq.getSquenceString(), orderedKmerSize, orderedSketchSize);
+		//do not do reverse compliment for minhash, since unordered
+		this.mainHashes = new MinHashSketch(seq.getSquenceString(), kmerSize, numHashes, kmerFilter, false, repeatWeight);
+		
+		//do not do reverse compliment
+		this.orderedHashes = new BottomOverlapSketch(seq.getSquenceString(), orderedKmerSize, orderedSketchSize, false);
 	}
 
 	public SequenceSketch createOffset(int offset)
@@ -128,6 +133,7 @@ public final class SequenceSketch implements Serializable
 		{
 			dos.writeBoolean(this.id.isForward());
 			dos.writeLong(this.id.getHeaderId());
+			dos.writeUTF(this.id.getHeader());
 			dos.writeInt(this.sequenceLength);
 			dos.write(mainHashesBytes);
 			dos.write(orderedHashesBytes);


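With this change, each serialized sketch record also carries the FASTA header. The fields are written in this order: isForward, headerId, header (via writeUTF), then sequenceLength, followed by the hash payloads. The reader depends on Java evaluating method arguments left to right, so input.readLong() runs before input.readUTF() inside the SequenceId constructor call.

The sketch below round-trips only those header fields with DataOutputStream and DataInputStream and omits the MinHash and ordered-hash payloads. SketchHeaderDemo is a hypothetical class. Two caveats apply. First, writeUTF rejects strings whose modified UTF-8 encoding exceeds 65535 bytes. Second, files written with the old layout appear not to be readable by the new reader, since it expects the extra field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record holding only the header fields of a serialized sketch.
public class SketchHeaderDemo {
    final boolean isFwd;
    final long headerId;
    final String header;
    final int sequenceLength;

    SketchHeaderDemo(boolean isFwd, long headerId, String header, int sequenceLength) {
        this.isFwd = isFwd;
        this.headerId = headerId;
        this.header = header;
        this.sequenceLength = sequenceLength;
    }

    /** Writes the fields in the same order as the patched SequenceSketch. */
    byte[] toBytes() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeBoolean(isFwd);
            dos.writeLong(headerId);
            dos.writeUTF(header); // field added by this change; fails above 65535 encoded bytes
            dos.writeInt(sequenceLength);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Reads the fields back, adding offset to the stored header id as the reader does. */
    static SketchHeaderDemo fromBytes(byte[] bytes, long offset) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            boolean isFwd = in.readBoolean();
            // Arguments are evaluated left to right: readLong(), then readUTF(), then readInt().
            return new SketchHeaderDemo(isFwd, in.readLong() + offset, in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```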
=====================================
src/main/java/edu/umd/marbl/mhap/impl/SequenceSketchStreamer.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/impl/SequenceSketchStreamer.java
+++ b/src/main/java/edu/umd/marbl/mhap/impl/SequenceSketchStreamer.java
@@ -66,6 +66,7 @@ public class SequenceSketchStreamer
 
 	private final int orderedSketchSize;
 	private boolean readClosed;
+	private boolean doReverseCompliment;
 	private final boolean readingFasta;
 	private final double repeatWeight;
 	private final ConcurrentLinkedQueue<SequenceSketch> sequenceHashList;
@@ -79,6 +80,7 @@ public class SequenceSketchStreamer
 		this.kmerFilter = null;
 		this.repeatWeight = 0;
 		this.minOlapLength = minOlapLength;
+		this.doReverseCompliment = false;
 
 		this.kmerSize = 0;
 		this.numHashes = 0;
@@ -91,7 +93,7 @@ public class SequenceSketchStreamer
 	}
 
 	public SequenceSketchStreamer(String file, int minOlapLength, int kmerSize, int numHashes, int orderedKmerSize, int orderedSketchSize,
-			FrequencyCounts kmerFilter, double repeatWeight, int offset) throws IOException
+			FrequencyCounts kmerFilter, boolean doReverseCompliment, double repeatWeight, int offset) throws IOException
 	{
 		this.fastaData = new FastaData(file, offset);
 		this.readingFasta = true;
@@ -99,7 +101,8 @@ public class SequenceSketchStreamer
 		this.numberProcessed = new AtomicLong();
 		this.repeatWeight = repeatWeight;
 		this.minOlapLength = minOlapLength;
-
+		this.doReverseCompliment = doReverseCompliment;
+		
 		this.kmerFilter = kmerFilter;
 		this.kmerSize = kmerSize;
 		this.numHashes = numHashes;
@@ -259,7 +262,7 @@ public class SequenceSketchStreamer
 	public SequenceSketch getSketch(Sequence seq) throws ZeroNGramsFoundException
 	{
 		// compute the hashes
-		return new SequenceSketch(seq, this.kmerSize, this.numHashes, this.orderedKmerSize, this.orderedSketchSize, this.kmerFilter, this.repeatWeight);
+		return new SequenceSketch(seq, this.kmerSize, this.numHashes, this.orderedKmerSize, this.orderedSketchSize, this.kmerFilter, this.doReverseCompliment, this.repeatWeight);
 	}
 
 	protected void processAddition(SequenceSketch seqHashes)


=====================================
src/main/java/edu/umd/marbl/mhap/main/KmerStatSimulator.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/main/KmerStatSimulator.java
+++ b/src/main/java/edu/umd/marbl/mhap/main/KmerStatSimulator.java
@@ -187,15 +187,15 @@ public class KmerStatSimulator {
 	}
 	
 	public double compareMinHash(String first, String second) {
-		BottomSketch h1 = new BottomSketch(first, this.kmer, 1256);
-		BottomSketch h2 = new BottomSketch(second, this.kmer, 1256);
+		BottomSketch h1 = new BottomSketch(first, this.kmer, 1256, true);
+		BottomSketch h2 = new BottomSketch(second, this.kmer, 1256, true);
 		
 		return h1.jaccard(h2);
 	}
 	
 	public double compareMinHash2(String first, String second) throws ZeroNGramsFoundException {
-		MinHashSketch h1 = new MinHashSketch(first, this.kmer, 1256, null, 1.0);
-		MinHashSketch h2 = new MinHashSketch(second, this.kmer, 1256, null, 1.0);
+		MinHashSketch h1 = new MinHashSketch(first, this.kmer, 1256, null, true, 1.0);
+		MinHashSketch h2 = new MinHashSketch(second, this.kmer, 1256, null, true, 1.0);
 		
 		return h1.jaccard(h2);
 	}


=====================================
src/main/java/edu/umd/marbl/mhap/main/MhapMain.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/main/MhapMain.java
+++ b/src/main/java/edu/umd/marbl/mhap/main/MhapMain.java
@@ -62,11 +62,14 @@ public final class MhapMain
 	private final String processFile;
 	private final String toFile;
 	private final double repeatWeight;
+	private final boolean doReverseCompliment;
 
 	private static final double DEFAULT_OVERLAP_ACCEPT_SCORE = 0.78;
 
 	private static final double DEFAULT_REPEAT_WEIGHT= 0.9;
 
+	private static final double DEFAULT_REPEAT_IDF_SCALE = 3.0;
+
 	private static final double DEFAULT_FILTER_CUTOFF = 1.0e-5;
 
 	private static final int DEFAULT_KMER_SIZE = 16;
@@ -101,22 +104,24 @@ public final class MhapMain
 		options.addOption("-q", "Usage 1: The FASTA file of reads, or a directory of files, that will be compared to the set of reads in the box (see -s). Usage 2: The output directory for the binary formatted dat files.", "");
 		options.addOption("-p", "Usage 2 only. The directory containing FASTA files that should be converted to binary format for storage.", "");
 		options.addOption("-f", "k-mer filter file used for filtering out highly repetative k-mers. Must be sorted in descending order of frequency (second column).", "");
-		options.addOption("-k", "[int], k-mer size used for MinHashing. The k-mer size for second stage filter is seperate, and cannot be modified.", DEFAULT_KMER_SIZE);
-		options.addOption("--num-hashes", "[int], number of min-mers to be used in MinHashing.", DEFAULT_NUM_WORDS);
-		options.addOption("--threshold", "[double], the threshold cutoff for the second stage sort-merge filter. This is based on the identity score computed from the Jaccard distance of k-mers (size given by ordered-kmer-size) in the overlapping regions.", DEFAULT_OVERLAP_ACCEPT_SCORE);
-		options.addOption("--filter-threshold", "[double], the cutoff at which the k-mer in the k-mer filter file is considered repetitive. This value for a specific k-mer is specified in the second column in the filter file. If no filter file is provided, this option is ignored.", DEFAULT_FILTER_CUTOFF);
-		options.addOption("--max-shift", "[double], region size to the left and right of the estimated overlap, as derived from the median shift and sequence length, where a k-mer matches are still considered valid. Second stage filter only.", DEFAULT_MAX_SHIFT_PERCENT);
-		options.addOption("--num-min-matches", "[int], minimum # min-mer that must be shared before computing second stage filter. Any sequences below that value are considered non-overlapping.", DEFAULT_NUM_MIN_MATCHES);
-		options.addOption("--num-threads", "[int], number of threads to use for computation. Typically set to #cores.", DEFAULT_NUM_THREADS);
+		options.addOption("-k", "[int], k-mer size used for MinHashing. The k-mer size for second stage filter is seperate, and can also be modified.", DEFAULT_KMER_SIZE);
+		options.addOption("--num-hashes", "[int], Number of min-mers to be used in MinHashing.", DEFAULT_NUM_WORDS);
+		options.addOption("--threshold", "[double], The threshold cutoff for the second stage sort-merge filter. This is based on the identity score computed from the Jaccard distance of k-mers (size given by ordered-kmer-size) in the overlapping regions.", DEFAULT_OVERLAP_ACCEPT_SCORE);
+		options.addOption("--filter-threshold", "[double], The cutoff at which the k-mer in the k-mer filter file is considered repetitive. This value for a specific k-mer is specified in the second column in the filter file. If no filter file is provided, this option is ignored.", DEFAULT_FILTER_CUTOFF);
+		options.addOption("--max-shift", "[double], Region size to the left and right of the estimated overlap, as derived from the median shift and sequence length, where a k-mer matches are still considered valid. Second stage filter only.", DEFAULT_MAX_SHIFT_PERCENT);
+		options.addOption("--num-min-matches", "[int], Minimum # min-mer that must be shared before computing second stage filter. Any sequences below that value are considered non-overlapping.", DEFAULT_NUM_MIN_MATCHES);
+		options.addOption("--num-threads", "[int], nNumber of threads to use for computation. Typically set to #cores.", DEFAULT_NUM_THREADS);
 		options.addOption("--repeat-weight", "[double] Repeat suppression strength for tf-idf weighing. <0.0 do unweighted MinHash (version 1.0), >=1.0 do only the tf weighing. To perform no idf weighting, do no supply -f option. ", DEFAULT_REPEAT_WEIGHT);
+		options.addOption("--repeat-idf-scale", "[double] The upper range of the idf (from tf-idf) scale. The full scale will be [1,X], where X is the parameter.", DEFAULT_REPEAT_IDF_SCALE);
 		options.addOption("--ordered-kmer-size", "[int] The size of k-mers used in the ordered second stage filter.", DEFAULT_ORDERED_KMER_SIZE);
 		options.addOption("--ordered-sketch-size", "[int] The sketch size for second stage filter.", DEFAULT_ORDERED_SKETCH_SIZE);
 		options.addOption("--min-store-length", "[int], The minimum length of the read that is stored in the box. Used to filter out short reads from FASTA file.", DEFAULT_MIN_STORE_LENGTH);
 		options.addOption("--min-olap-length", "[int], The minimum length of the read that used for overlapping. Used to filter out short reads from FASTA file.", DEFAULT_MIN_OVL_LENGTH);
 		options.addOption("--no-self", "Do not compute the overlaps between sequences inside a box. Should be used when the to and from sequences are coming from different files.", false);
-		options.addOption("--store-full-id", "Store full IDs as seen in FASTA file, rather than storing just the sequence position in the file. Some FASTA files have long IDS, slowing output of results. This options is ignored when using compressed file format.", false);
+		options.addOption("--store-full-id", "Store full IDs as seen in FASTA files, rather than storing just the sequence position in the file. Some FASTA files have long IDS, slowing output of results. This options is ignored when using compressed file format. Indexed file (-s) is indexed first, followed by -q files in alphabetical order.", false);
 		options.addOption("--supress-noise", "[int] 0) Does nothing, 1) completely removes any k-mers not specified in the filter file, 2) supresses k-mers not specified in the filter file, similar to repeats. ", 0);
 		options.addOption("--no-tf", "Do not perform the tf weighing, in the tf-idf weighing.", false);
+		options.addOption("--no-rc", "Do not store or do comparison of the reverse compliment strings.", false);
 		options.addOption("--settings", "Set all unset parameters for the default settings. Same defaults are applied to Nanopore and Pacbio reads. 0) None, 1) Default, 2) Fast, 3) Sensitive.", 0);
 		
 		if (!options.process(args))
@@ -264,6 +269,13 @@ public final class MhapMain
 		}
 
 		//check range
+		if (options.get("--repeat-idf-scale").getDouble()<1.0)
+		{
+			System.out.println("The minimum repeat idf scale must be >=1.0.");
+			System.exit(1);
+		}
+
+		//check range
 		if (options.get("--max-shift").getDouble()<-1.0)
 		{
 			System.out.println("The minimum shift must be greater than -1.");
@@ -322,6 +334,7 @@ public final class MhapMain
 		this.repeatWeight = options.get("--repeat-weight").getDouble();
 		this.orderedKmerSize = options.get("--ordered-kmer-size").getInteger();
 		this.orderedSketchSize = options.get("--ordered-sketch-size").getInteger();
+		this.doReverseCompliment = !options.get("--no-rc").getBoolean();
 		
 		// read in the kmer filter set
 		String filterFile = options.get("-f").getString();
@@ -339,10 +352,11 @@ public final class MhapMain
 				double maxFraction = options.get("--filter-threshold").getDouble();
 				int removeUnique = options.get("--supress-noise").getInteger();
 				boolean noTf = options.get("--no-tf").getBoolean();
+				double range = options.get("--repeat-idf-scale").getDouble();
 			
 				try (BufferedReader bf = Utils.getFile(filterFile, null))
 				{
-					this.kmerFilter = new FrequencyCounts(bf, maxFraction, offset, removeUnique, noTf, this.numThreads);
+					this.kmerFilter = new FrequencyCounts(bf, maxFraction, offset, removeUnique, noTf, this.numThreads, range, this.doReverseCompliment);
 				}
 			}
 			catch (Exception e)
@@ -403,7 +417,10 @@ public final class MhapMain
 				
 				if (fileList!=null)
 					for (File cf : fileList)
-						processFiles.add(cf);				
+						processFiles.add(cf);	
+				
+				//sort the files in alphabetical order
+				Collections.sort(processFiles);
 			}
 			
 			for (File pf : processFiles)
@@ -537,7 +554,7 @@ public final class MhapMain
 	public MinHashSearch getMatchSearch(SequenceSketchStreamer hashStreamer) throws IOException
 	{
 		return new MinHashSearch(hashStreamer, this.numHashes, this.numMinMatches, this.numThreads, false,
-				this.minStoreLength, this.maxShift, this.acceptScore);
+				this.minStoreLength, this.maxShift, this.acceptScore, this.doReverseCompliment);
 	}
 	
 	public SequenceSketchStreamer getSequenceHashStreamer(String file, int offset) throws IOException
@@ -547,7 +564,7 @@ public final class MhapMain
 			seqStreamer = new SequenceSketchStreamer(file, this.minOlapLength, offset);
 		else
 			seqStreamer = new SequenceSketchStreamer(file, this.minOlapLength, this.kmerSize, this.numHashes,
-					this.orderedKmerSize, this.orderedSketchSize, this.kmerFilter, this.repeatWeight, offset);
+					this.orderedKmerSize, this.orderedSketchSize, this.kmerFilter, this.doReverseCompliment, this.repeatWeight, offset);
 
 		return seqStreamer;
 	}


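MhapMain now sorts the files found in the -q directory with Collections.sort(processFiles) before processing them. The new --store-full-id help text relies on this ordering ("-q files in alphabetical order"). java.io.File compares abstract pathnames lexicographically, and its Javadoc notes that the ordering depends on the system: case-sensitive on UNIX, case-insensitive on Windows.

The following is a minimal sketch of the list-then-sort step. processingOrder is a hypothetical helper name. The null check mirrors the original `if (fileList!=null)`, since File.listFiles() returns null when the path is not a directory or an I/O error occurs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ProcessOrderDemo {
    /** Hypothetical helper: collects a directory listing (which may be null) and sorts it. */
    static List<File> processingOrder(File[] fileList) {
        List<File> processFiles = new ArrayList<>();
        if (fileList != null)
            Collections.addAll(processFiles, fileList);
        // File implements Comparable<File>; the ordering is lexicographic by pathname
        // (case-sensitive on UNIX, case-insensitive on Windows).
        Collections.sort(processFiles);
        return processFiles;
    }
}
```

A typical call would be `processingOrder(new File(dir).listFiles())`. Mixed-case file names may therefore be processed in a different order on different platforms.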
=====================================
src/main/java/edu/umd/marbl/mhap/sketch/BottomOverlapSketch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/BottomOverlapSketch.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/BottomOverlapSketch.java
@@ -61,7 +61,7 @@ public final class BottomOverlapSketch
 		}
 	}
 	
-	private final static class MatchData
+	public final static class MatchData
 	{
 		private int absMaxShiftInOverlap;
 		private int count; 
@@ -207,7 +207,7 @@ public final class BottomOverlapSketch
 				else
 				{
 					this.medianShift = 0;
-					this.absMaxShiftInOverlap = Math.max(this.seqLength1, this.seqLength2);
+					this.absMaxShiftInOverlap = Math.max(this.seqLength1, this.seqLength2)+1;
 				}
 			}
 			
@@ -274,6 +274,27 @@ public final class BottomOverlapSketch
 			
 			return valid;
 		}
+		
+		public String matchesToString()
+		{
+			StringBuilder str = new StringBuilder();
+			str.append("MatchData matches (size="+this.count+"):\n");
+			for (int i=0; i<this.count; i++)
+			{
+				str.append("\t"+this.pos1Index[i]+" "+this.pos2Index[i]+" "+this.posShift[i]+"\n");
+			}
+			
+			return str.toString();
+		}
+
+		/* (non-Javadoc)
+		 * @see java.lang.Object#toString()
+		 */
+		@Override
+		public String toString()
+		{
+			return "MatchData [count=" + count + ", shift="+getMedianShift()+"]";
+		}
 	}
 	
 	private final int kmerSize;
@@ -432,11 +453,63 @@ public final class BottomOverlapSketch
 				{				
 					//record match
 					matchData.recordMatch(pos1, pos2, currShift);
-	
-					// don't rely on repeats in the first iteration
-					if (repeat == 0)
+
+					//we need to create symmetry for reverse compliment, so we will look at first and last matches
+					
+					//move the index to last point of same hash
+					int i1Last = i1;
+					int i1Try = i1+1;
+					if (i1Try<seq1KmerHashes.length)
+					{
+						int hash1Try = seq1KmerHashes[i1Try][0];
+						int pos1Try = seq1KmerHashes[i1Try][1];
+						while((hash1Try == hash1 && pos1Try >= valid1Lower && pos1Try < valid1Upper))
+						{
+							i1Last = i1Try;
+
+							i1Try++;
+							if (i1Try>=seq1KmerHashes.length)
+								break;
+							
+							hash1Try = seq1KmerHashes[i1Try][0];
+							pos1Try = seq1KmerHashes[i1Try][1];
+						}
+					}
+
+					//move the index to last point of same hash
+					int i2Last = i2;
+					int i2Try = i2+1;
+					if (i2Try<seq2KmerHashes.length)
+					{
+						int hash2Try = seq2KmerHashes[i2Try][0];
+						int pos2Try = seq2KmerHashes[i2Try][1];
+						while((hash2Try == hash2 && pos2Try >= valid2Lower && pos2Try < valid2Upper))
+						{
+							i2Last = i2Try;
+							i2Try++;
+							if (i2Try>=seq2KmerHashes.length)
+								break;
+
+							hash2Try = seq2KmerHashes[i2Try][0];
+							pos2Try = seq2KmerHashes[i2Try][1];
+						}
+					}
+
+					//store the match and update the counters
+					if (i1!=i1Last || i2!=i2Last)
+					{		
+						int pos1New =  seq1KmerHashes[i1Last][1];
+						int pos2New =  seq2KmerHashes[i2Last][1];
+						matchData.recordMatch(pos1New, pos2New, pos2New-pos1New);
+						i1 = i1Last+1;
+						i2 = i2Last+1;
+					}
+					else
+					{
+						//simply move on if they don't match
 						i1++;
-					i2++;
+						i2++;
+					}
 				}
 			}
 		}
@@ -449,7 +522,7 @@ public final class BottomOverlapSketch
 		this.kmerSize = kmerSize;
 	}
 
-	public BottomOverlapSketch(String seq, int kmerSize, int sketchSize) throws ZeroNGramsFoundException
+	public BottomOverlapSketch(String seq, int kmerSize, int sketchSize, boolean doReverseCompliment) throws ZeroNGramsFoundException
 	{
 		this.kmerSize = kmerSize;
 		this.seqLength = seq.length() - kmerSize + 1;
@@ -458,7 +531,7 @@ public final class BottomOverlapSketch
 			throw new ZeroNGramsFoundException("Sequence length must be greater or equal to n-gram size "+kmerSize+".", seq);
 		
 		// compute just direct hash of sequence
-		int[] hashes = HashUtils.computeSequenceHashes(seq, kmerSize);
+		int[] hashes = HashUtils.computeSequenceHashes(seq, kmerSize, doReverseCompliment);
 
 		int[] perm = new int[hashes.length];
 
@@ -525,7 +598,8 @@ public final class BottomOverlapSketch
 		MatchData matchData = new MatchData(this, toSequence, maxShiftPercent);
 
 		//get the initial matches
-		recordMatchingKmers(matchData, this.orderedHashes, toSequence.orderedHashes, 0);			
+		recordMatchingKmers(matchData, this.orderedHashes, toSequence.orderedHashes, 0);
+		//System.out.println(matchData.matchesToString());
 		if (matchData.isEmpty())
 			return OverlapInfo.EMPTY;
 


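The rewritten loop in recordMatchingKmers handles each shared hash in two steps. It always records the first matching pair. If either sequence has several consecutive entries with that hash inside the valid window, it also records the last pair. The in-code comment gives the motivation as creating symmetry for the reverse complement.

The sketch below is a simplified version that rests on two assumptions. First, both inputs are arrays of {hash, position} sorted by hash, then by position. Second, the valid-window bounds and the repeat parameter of the real method are left out. RunEndMatcher is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sort-merge over k-mer {hash, position} pairs (assumed sorted by hash, then position).
// For every shared hash it records the first pair and, if either side has a run of that hash,
// the last pair too. Window bounds and repeat passes of the real method are omitted.
public class RunEndMatcher {
    /** Each returned element is {pos1, pos2, shift = pos2 - pos1}. */
    static List<int[]> match(int[][] seq1, int[][] seq2) {
        List<int[]> matches = new ArrayList<>();
        int i1 = 0, i2 = 0;
        while (i1 < seq1.length && i2 < seq2.length) {
            int hash1 = seq1[i1][0];
            int hash2 = seq2[i2][0];
            if (hash1 < hash2) { i1++; continue; }
            if (hash2 < hash1) { i2++; continue; }

            // first matching pair for this hash
            int pos1 = seq1[i1][1], pos2 = seq2[i2][1];
            matches.add(new int[] {pos1, pos2, pos2 - pos1});

            // move each index to the last entry carrying the same hash
            int i1Last = i1;
            while (i1Last + 1 < seq1.length && seq1[i1Last + 1][0] == hash1) i1Last++;
            int i2Last = i2;
            while (i2Last + 1 < seq2.length && seq2[i2Last + 1][0] == hash2) i2Last++;

            // record the last pair only if either side actually had a run
            if (i1Last != i1 || i2Last != i2) {
                int p1 = seq1[i1Last][1], p2 = seq2[i2Last][1];
                matches.add(new int[] {p1, p2, p2 - p1});
            }

            // continue after the run (plain i + 1 when there was no run)
            i1 = i1Last + 1;
            i2 = i2Last + 1;
        }
        return matches;
    }
}
```

Recording both ends of a run means the match set no longer depends on which end of a repeated k-mer run is encountered first. That is presumably what keeps the forward and reverse-complement comparisons consistent.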
=====================================
src/main/java/edu/umd/marbl/mhap/sketch/BottomSketch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/BottomSketch.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/BottomSketch.java
@@ -11,9 +11,9 @@ public class BottomSketch implements Sketch<BottomSketch>
 	 */
 	private static final long serialVersionUID = 9035607728472270206L;
 
-	public BottomSketch(String str, int nGramSize, int k)
+	public BottomSketch(String str, int nGramSize, int k, boolean doReverseCompliment)
 	{
-		int[] hashes = HashUtils.computeSequenceHashes(str, nGramSize);
+		int[] hashes = HashUtils.computeSequenceHashes(str, nGramSize, doReverseCompliment);
 		
 		k = Math.min(k, hashes.length);
 		


=====================================
src/main/java/edu/umd/marbl/mhap/sketch/FrequencyCounts.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/FrequencyCounts.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/FrequencyCounts.java
@@ -56,17 +56,18 @@ public final class FrequencyCounts
 	private final double minValue;
 	private final boolean noTf;
 	private final double offset;
+	private final double range;
 	private final int removeUnique;
 	private final BloomFilter<Long> validMers;
 	
-	public static final double REPEAT_SCALE = 3.0;
-	
-	public FrequencyCounts(BufferedReader bf, double filterCutoff, double offset, int removeUnique, boolean noTf, int numThreads) throws IOException
+	public FrequencyCounts(BufferedReader bf, double filterCutoff, double offset, int removeUnique, boolean noTf, int numThreads, double range, boolean doReverseCompliment) throws IOException
 	{
 		//removeUnique = 0: do nothing extra to k-mers not specified in the file
 		//removeUnique = 1: remove k-mers not specified in the file from the sketch
 		//removeUnique = 2: supress k-mers not specified in the file the same as max supression
 		
+		this.range = range;
+		
 		if (removeUnique<0 || removeUnique>2)
 			throw new MhapRuntimeException("Unknown removeUnique option "+removeUnique+".");
 		
@@ -78,7 +79,7 @@ public final class FrequencyCounts
 		this.noTf = noTf;
 		
 		// generate hashset
-		Long2DoubleOpenHashMap validMap = new Long2DoubleOpenHashMap();
+		Long2DoubleOpenHashMap validMap;
 		BloomFilter<Long> validMers;
 
 		//the max value observed in the list
@@ -88,29 +89,52 @@ public final class FrequencyCounts
 		String line = bf.readLine();
 		try
 		{
-			long size;
+			long sizeBloom;
+			long sizeRepeat;
 			if (line==null)
 			{
 				System.err.println("Warning, k-mer filter file is empty. Assuming zero entries.");
-				size = 1L;
+				sizeBloom = sizeRepeat = 1L;
 			}
 			else
 			{
-				size = Long.parseLong(line);
+				// we assume the line has two entries, the first is the size of the bloom filter, the second is the size of the filter set
+				String[] splitLine = line.trim().split("\\s+");
+				sizeBloom = Long.parseLong(splitLine[0]);
+				sizeRepeat = Long.parseLong(splitLine[1]);
+				System.err.println("Read in values for repeat " + sizeRepeat + " and " + sizeBloom);
 			
-				if (size<0L)
+				if (sizeBloom<0L || sizeRepeat <0L)
 					throw new MhapRuntimeException("K-mer filter file size line must have positive long value.");
 				else
-				if (size==0L)
+				if (sizeBloom==0L)
 				{
 					System.err.println("Warning, k-mer filter file has zero elements.");
-					size = 1L;
+					sizeBloom = 1L;
 				}
 			}
 			
+			System.err.println("Initializing");
+			Long2DoubleOpenHashMap tempMap = null;
+			for (long i = sizeRepeat; i > 0; i /= 2) {
+				try {	
+					System.err.print("Trying size " + i);
+					tempMap = new Long2DoubleOpenHashMap((int)(i));
+					System.err.println(" and it was successfull");
+					break;
+				} catch (IllegalArgumentException e) {
+					System.err.println(" and it was too big, trying smaller");
+				}
+			}
+			if (tempMap == null)
+				validMap = new Long2DoubleOpenHashMap();
+			else
+				validMap = tempMap;
+			System.err.println("Initialized");
+
 			//if no nothing, no need to store the while list
 			if (removeUnique>0)
-				validMers = BloomFilter.create((value, sink) -> sink.putLong(value), size, 1.0e-5);
+				validMers = BloomFilter.create((value, sink) -> sink.putLong(value), sizeBloom, 1.0e-5);
 			else
 				validMers = null;
 		}
@@ -142,7 +166,7 @@ public final class FrequencyCounts
 						this.kmerSizes.add(str[0].length());
 					}					
 					
-					long[] hash = HashUtils.computeSequenceHashesLong(str[0], str[0].length(), 0);
+					long[] hash = HashUtils.computeSequenceHashesLong(str[0], str[0].length(), 0, doReverseCompliment);
 					
 					if (str.length >= 2)
 					{
@@ -265,7 +289,7 @@ public final class FrequencyCounts
 	
 	public double scaledIdf(long hash)
 	{
-		return scaledIdf(hash, REPEAT_SCALE);
+		return scaledIdf(hash, this.range);
 	}
 	
 	public double scaledIdf(long hash, double maxValue)

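The FrequencyCounts hunk changes how the k-mer filter file is read. The first line now holds two sizes: the Bloom filter size, then the repeat-set size. The hash map is pre-sized by trying a capacity and halving it each time construction throws IllegalArgumentException. Two details follow from the patch as written. First, `splitLine[1]` is indexed unconditionally, so a one-value header in the older format would throw ArrayIndexOutOfBoundsException there, unless the enclosing try block handles it (its catch clause is outside this excerpt). Second, `(int)(i)` silently wraps for sizes above Integer.MAX_VALUE.

Below is a minimal, dependency-free sketch of the same parse-then-halve logic. The class and method names are hypothetical. The allocator is passed in as a LongConsumer that stands in for the fastutil constructor. Two behaviours are assumptions of this sketch, not of the patch: a one-value header is accepted by reusing that value for both sizes, and the capacity is clamped before casting.

```java
import java.util.function.LongConsumer;

/** Hypothetical helper mirroring the header parsing and halving retry in the FrequencyCounts patch. */
final class FilterHeaderSketch
{
	/**
	 * Parses "sizeBloom sizeRepeat". Assumption (not in the patch): a header with a
	 * single value reuses it for both sizes instead of failing.
	 */
	static long[] parseSizes(String line)
	{
		if (line == null)
			return new long[] {1L, 1L}; // empty file: same fallback the patch uses
		String[] parts = line.trim().split("\\s+");
		long sizeBloom = Long.parseLong(parts[0]);
		long sizeRepeat = parts.length > 1 ? Long.parseLong(parts[1]) : sizeBloom;
		if (sizeBloom < 0L || sizeRepeat < 0L)
			throw new IllegalArgumentException("Header sizes must be non-negative.");
		if (sizeBloom == 0L)
			sizeBloom = 1L; // mirrors the patch's zero-element warning path
		return new long[] {sizeBloom, sizeRepeat};
	}

	/**
	 * Tries capacities start, start/2, start/4, ... and returns the first one the allocator
	 * accepts (does not throw IllegalArgumentException), or 0 if none is accepted.
	 * The start value is clamped to Integer.MAX_VALUE so the int cast cannot wrap.
	 */
	static int firstAcceptedCapacity(long start, LongConsumer allocate)
	{
		for (long i = Math.min(start, Integer.MAX_VALUE); i > 0; i /= 2)
		{
			try
			{
				allocate.accept(i);
				return (int) i;
			}
			catch (IllegalArgumentException e)
			{
				// too big: halve and retry, as the patch does
			}
		}
		return 0; // caller falls back to a default-sized map, as the patch does
	}

	public static void main(String[] args)
	{
		long[] sizes = parseSizes("1000000 250000");
		System.out.println(sizes[0] + " " + sizes[1]); // 1000000 250000
		// Pretend the allocator rejects anything above 100000 entries.
		int cap = firstAcceptedCapacity(sizes[1], n -> {
			if (n > 100_000)
				throw new IllegalArgumentException("too big");
		});
		System.out.println(cap); // 62500
	}
}
```

Halving trades a smaller initial table for guaranteed construction. The map can still grow afterwards, so the cost is extra rehashing rather than lost entries.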

=====================================
src/main/java/edu/umd/marbl/mhap/sketch/HashUtils.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/HashUtils.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/HashUtils.java
@@ -37,6 +37,7 @@ import com.google.common.hash.Hashing;
 
 import edu.umd.marbl.mhap.math.BasicMath;
 import edu.umd.marbl.mhap.utils.MersenneTwisterFast;
+import edu.umd.marbl.mhap.utils.Utils;
 
 public class HashUtils
 {
@@ -157,7 +158,7 @@ public class HashUtils
 		return hashes;
 	}
 
-	public final static long[][] computeNGramHashes(final String seq, final int nGramSize, final int numWords, final int seed)
+	public final static long[][] computeNGramHashes(final String seq, final int nGramSize, final int numWords, final int seed, boolean doReverseCompliment)
 	{
 		final int numberNGrams = seq.length()-nGramSize+1;
 	
@@ -165,7 +166,7 @@ public class HashUtils
 			throw new SketchRuntimeException("N-gram size bigger than string length.");
 	
 		// get the rabin hashes
-		final long[] rabinHashes = computeSequenceHashesLong(seq, nGramSize, seed);
+		final long[] rabinHashes = computeSequenceHashesLong(seq, nGramSize, seed, doReverseCompliment);
 	
 		final long[][] hashes = new long[rabinHashes.length][numWords];
 	
@@ -209,28 +210,47 @@ public class HashUtils
 		return hashes;
 	}
 
-	public final static int[] computeSequenceHashes(final String seq, final int nGramSize)
+	public final static int[] computeSequenceHashes(final String seq, final int nGramSize, boolean doReverseCompliment)
 	{
 		HashFunction hf = Hashing.murmur3_32(0);
 	
 		int[] hashes = new int[seq.length() - nGramSize + 1];
 		for (int iter = 0; iter < hashes.length; iter++)
 		{
-			HashCode hc = hf.newHasher().putUnencodedChars(seq.substring(iter, iter + nGramSize)).hash();
+			String str = seq.substring(iter, iter + nGramSize);
+			
+			String strReverse = null;
+			if (doReverseCompliment)
+			{
+				strReverse  = Utils.rc(str);
+				if (strReverse.compareTo(str)<0)
+					str = strReverse;
+			}
+
+			HashCode hc = hf.newHasher().putUnencodedChars(str).hash();
 			hashes[iter] = hc.asInt();
 		}
 	
 		return hashes;
 	}
 
-	public final static long[] computeSequenceHashesLong(final String seq, final int nGramSize, final int seed)
+	public final static long[] computeSequenceHashesLong(final String seq, final int nGramSize, final int seed, final boolean doReverseCompliment)
 	{
 		HashFunction hf = Hashing.murmur3_128(seed);
 	
 		long[] hashes = new long[seq.length() - nGramSize + 1];
 		for (int iter = 0; iter < hashes.length; iter++)
 		{
-			HashCode hc = hf.newHasher().putUnencodedChars(seq.substring(iter, iter + nGramSize)).hash();
+			String str = seq.substring(iter, iter + nGramSize);
+			String strReverse = null;
+			if (doReverseCompliment)
+			{
+				strReverse  = Utils.rc(str);
+				if (strReverse.compareTo(str)<0)
+					str = strReverse;
+			}
+			
+			HashCode hc = hf.newHasher().putUnencodedChars(str).hash();
 			hashes[iter] = hc.asLong();
 		}
 	

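The HashUtils hunk adds canonical k-mers. When `doReverseCompliment` is true, `computeSequenceHashes` and `computeSequenceHashesLong` hash whichever of the k-mer and its reverse complement sorts first under `String.compareTo`. As a result, a k-mer and its reverse complement on the opposite strand get the same hash.

The sketch below shows that selection step in isolation, under these assumptions:
- It uses `String.hashCode()` in place of Guava's Murmur3, so it has no dependencies.
- A local ACGT-only reverse-complement helper stands in for `Utils.rc`.
- The class name is hypothetical.

```java
/** Hypothetical illustration of canonical k-mer hashing as done in the HashUtils patch. */
final class CanonicalKmerSketch
{
	/** Reverse complement over A/C/G/T; any other character is copied unchanged (assumption). */
	static String reverseComplement(String s)
	{
		StringBuilder sb = new StringBuilder(s.length());
		for (int i = s.length() - 1; i >= 0; i--)
		{
			char c = Character.toUpperCase(s.charAt(i));
			switch (c)
			{
				case 'A': sb.append('T'); break;
				case 'C': sb.append('G'); break;
				case 'G': sb.append('C'); break;
				case 'T': sb.append('A'); break;
				default:  sb.append(c);
			}
		}
		return sb.toString();
	}

	/** Keeps the lexicographically smaller of the k-mer and its reverse complement. */
	static String canonical(String kmer)
	{
		String rc = reverseComplement(kmer);
		return rc.compareTo(kmer) < 0 ? rc : kmer;
	}

	/** Hashes every k-mer of seq; String.hashCode stands in for murmur3_32 here. */
	static int[] kmerHashes(String seq, int k, boolean doReverseComplement)
	{
		int[] hashes = new int[seq.length() - k + 1];
		for (int i = 0; i < hashes.length; i++)
		{
			String kmer = seq.substring(i, i + k);
			hashes[i] = (doReverseComplement ? canonical(kmer) : kmer).hashCode();
		}
		return hashes;
	}

	public static void main(String[] args)
	{
		String fwd = "ACGTTGCA";
		String rev = reverseComplement(fwd); // "TGCAACGT"
		int[] a = kmerHashes(fwd, 3, true);
		int[] b = kmerHashes(rev, 3, true);
		// k-mer i of fwd and k-mer (n-1-i) of rev are reverse complements, so they hash identically.
		boolean same = true;
		for (int i = 0; i < a.length; i++)
			same &= a[i] == b[a.length - 1 - i];
		System.out.println(same); // true
	}
}
```

The flag is threaded through every sketch constructor in this diff so that reads from either strand can share hashes. Each k-mer now requires one extra reverse complement and one string comparison.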

=====================================
src/main/java/edu/umd/marbl/mhap/sketch/MinHashBitSketch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/MinHashBitSketch.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/MinHashBitSketch.java
@@ -75,9 +75,9 @@ public final class MinHashBitSketch extends AbstractBitSketch<MinHashBitSketch>
 		super(getAsBits(minHashes));
 	}
 	
-	public MinHashBitSketch(String seq, int nGramSize, int numWords) throws ZeroNGramsFoundException
+	public MinHashBitSketch(String seq, int nGramSize, int numWords, boolean doReverseCompliment) throws ZeroNGramsFoundException
 	{
-		super(getAsBits(new MinHashSketch(seq, nGramSize, numWords*64).getMinHashArray()));
+		super(getAsBits(new MinHashSketch(seq, nGramSize, numWords*64, doReverseCompliment).getMinHashArray()));
 	}
 	
 	public final double jaccard(final MinHashBitSketch sh)


=====================================
src/main/java/edu/umd/marbl/mhap/sketch/MinHashSketch.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/sketch/MinHashSketch.java
+++ b/src/main/java/edu/umd/marbl/mhap/sketch/MinHashSketch.java
@@ -49,7 +49,7 @@ public final class MinHashSketch implements Sketch<MinHashSketch>
 	private static final long serialVersionUID = 8846482698636860862L;
 	
 	private final static int[] computeNgramMinHashesWeighted(String seq, final int nGramSize, final int numHashes,
-			FrequencyCounts kmerFilter, double repeatWeight) throws ZeroNGramsFoundException
+			FrequencyCounts kmerFilter, boolean doReverseCompliment, double repeatWeight) throws ZeroNGramsFoundException
 	{
 		final int numberNGrams = seq.length() - nGramSize + 1;
 	
@@ -60,7 +60,7 @@ public final class MinHashSketch implements Sketch<MinHashSketch>
 		//	throw new SketchRuntimeException("repeatWeight cannot be >=1.");
 
 		// get the kmer hashes
-		final long[] kmerHashes = HashUtils.computeSequenceHashesLong(seq, nGramSize, 0);
+		final long[] kmerHashes = HashUtils.computeSequenceHashesLong(seq, nGramSize, 0, doReverseCompliment);
 		
 		//now compute the counts of occurance
 		Long2ObjectLinkedOpenHashMap<HitCounter> hitMap = new Long2ObjectLinkedOpenHashMap<HitCounter>(kmerHashes.length);
@@ -205,14 +205,14 @@ public final class MinHashSketch implements Sketch<MinHashSketch>
 		this.minHashes = minHashes;
 	}
 	
-	public MinHashSketch(String str, int nGramSize, int numHashes) throws ZeroNGramsFoundException
+	public MinHashSketch(String str, int nGramSize, int numHashes, boolean doReverseCompliment) throws ZeroNGramsFoundException
 	{
-		this.minHashes = MinHashSketch.computeNgramMinHashesWeighted(str, nGramSize, numHashes, null, -1.0);
+		this.minHashes = MinHashSketch.computeNgramMinHashesWeighted(str, nGramSize, numHashes, null, doReverseCompliment, -1.0);
 	}
 	
-	public MinHashSketch(String seq, int nGramSize, int numHashes, FrequencyCounts freqFilter, double repeatWeight) throws ZeroNGramsFoundException
+	public MinHashSketch(String seq, int nGramSize, int numHashes, FrequencyCounts freqFilter, boolean doReverseCompliment, double repeatWeight) throws ZeroNGramsFoundException
 	{
-		this.minHashes = MinHashSketch.computeNgramMinHashesWeighted(seq, nGramSize, numHashes, freqFilter, repeatWeight);
+		this.minHashes = MinHashSketch.computeNgramMinHashesWeighted(seq, nGramSize, numHashes, freqFilter, doReverseCompliment, repeatWeight);
 	}
 
 	public byte[] getAsByteArray()


=====================================
src/main/java/edu/umd/marbl/mhap/utils/Utils.java
=====================================
--- a/src/main/java/edu/umd/marbl/mhap/utils/Utils.java
+++ b/src/main/java/edu/umd/marbl/mhap/utils/Utils.java
@@ -33,6 +33,7 @@ import java.text.DecimalFormat;
 import java.text.NumberFormat;
 import java.util.ArrayList;
 import java.util.HashMap;
+import java.util.Map;
 import java.util.Random;
 import java.io.BufferedInputStream;
 import java.io.BufferedReader;
@@ -44,10 +45,12 @@ import java.io.FileReader;
 import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
 import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
 
+import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap;
+
 public final class Utils
 {
 
-	public enum ToProtein
+	public static enum ToProtein
 	{
 		AAA("K"), AAC("N"), AAG("K"), AAT("N"), ACA("T"), ACC("T"), ACG("T"), ACT("T"), AGA("R"), AGC("S"), AGG("R"), AGT(
 				"S"), ATA("I"), ATC("I"), ATG("M"), ATT("I"), CAA("Q"), CAC("H"), CAG("Q"), CAT("H"), CCA("P"), CCC("P"), CCG(
@@ -78,22 +81,39 @@ public final class Utils
 		}
 	}
 
-	public enum Translate
+	public static class Translate
 	{
-		A("T"), B("V"), C("G"), D("H"), G("C"), H("D"), K("M"), M("K"), N("N"), R("Y"), S("S"), T("A"), V("B"), W("W"), Y(
-				"R");
-
-		private String other;
-
-		Translate(String other)
-		{
-			this.other = other;
+		private static Map<String,String> lookup = new Object2ObjectOpenHashMap<String,String>();
+		
+		static
+		{
+			lookup.put("A", "T");
+			lookup.put("B", "V");
+			lookup.put("C", "G");
+			lookup.put("D", "H");
+			lookup.put("G", "C");
+			lookup.put("H", "D");
+			lookup.put("K", "M");
+			lookup.put("M", "K");
+			lookup.put("N", "N");
+			lookup.put("R", "Y");
+			lookup.put("S", "S");
+			lookup.put("T", "A");
+			lookup.put("V", "B");
+			lookup.put("W", "W");
+			lookup.put("Y", "R");
+		}
+		
+		public static String getTranslation(String c)
+		{
+			String value = lookup.get(c);
+			if (value==null)
+				return c;
+			
+			return value;
 		}
 
-		public String getCompliment()
-		{
-			return this.other;
-		}
+		//A("T"), B("V"), C("G"), D("H"), G("C"), H("D"), K("M"), M("K"), N("N"), R("Y"), S("S"), T("A"), V("B"), W("W"), Y("R");
 	}
 
 	public static final int BUFFER_BYTE_SIZE = 8388608; // 8MB
@@ -479,16 +499,9 @@ public final class Utils
 		for (int i = supplied.length() - 1; i >= 0; i--)
 		{
 			char theChar = supplied.charAt(i);
-
-			if (theChar != '-')
-			{
-				Translate t = Translate.valueOf(Character.toString(theChar).toUpperCase());
-				st.append(t.getCompliment());
-			}
-			else
-			{
-				st.append("-");
-			}
+			
+			String c = Translate.getTranslation((Character.toString(theChar).toUpperCase()));
+			st.append(c);
 		}
 		return st.toString();
 	}

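The Utils hunk replaces the `Translate` enum with a static `Map<String,String>` lookup. `getTranslation` returns its input unchanged when no mapping exists. The old code special-cased `'-'`, and `Translate.valueOf` threw for any other unlisted character. With the new code, `'-'` and every unknown character simply pass through.

An alternative with the same pass-through behaviour is a char-indexed table, which avoids allocating a one-character String and doing a map lookup per base. The sketch below is that alternative design, not the patch's code. The class name is hypothetical, and the IUPAC pairs are copied from the lookup table above.

```java
/** Hypothetical array-based variant of Utils.rc with the same IUPAC pairs and pass-through. */
final class IupacReverseComplement
{
	// Complement table indexed by char code; identity for anything not listed,
	// matching getTranslation's "return the input if there is no mapping" behaviour.
	private static final char[] COMPLEMENT = new char[128];

	static
	{
		for (char c = 0; c < COMPLEMENT.length; c++)
			COMPLEMENT[c] = c;
		String from = "ABCDGHKMNRSTVWY";
		String to   = "TVGHCDMKNYSABWR";
		for (int i = 0; i < from.length(); i++)
			COMPLEMENT[from.charAt(i)] = to.charAt(i);
	}

	/** Reverse complement; each character is upper-cased first, as Utils.rc does. */
	static String rc(String supplied)
	{
		StringBuilder st = new StringBuilder(supplied.length());
		for (int i = supplied.length() - 1; i >= 0; i--)
		{
			char c = Character.toUpperCase(supplied.charAt(i));
			st.append(c < COMPLEMENT.length ? COMPLEMENT[c] : c);
		}
		return st.toString();
	}

	public static void main(String[] args)
	{
		System.out.println(rc("acgt-N")); // N-ACGT
	}
}
```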


View it on GitLab: https://salsa.debian.org/med-team/mhap/compare/78207da2ebb99aa68b63b9209835c7138aab225f...ef042b49e04879970d8537a1ef7c967f59f3f67f
