[med-svn] [Git][med-team/fastp][upstream] New upstream version 1.0.1+dfsg

Dylan Aïssi (@daissi) gitlab at salsa.debian.org
Sat Aug 16 09:54:21 BST 2025



Dylan Aïssi pushed to branch upstream at Debian Med / fastp


Commits:
75395a60 by Dylan Aïssi at 2025-08-16T10:53:31+02:00
New upstream version 1.0.1+dfsg
- - - - -


30 changed files:

- .github/workflows/ci.yml
- Makefile
- README.md
- src/adaptertrimmer.cpp
- src/adaptertrimmer.h
- src/cmdline.h
- src/common.h
- src/evaluator.cpp
- src/evaluator.h
- src/fastqreader.cpp
- src/filterresult.cpp
- src/filterresult.h
- src/htmlreporter.cpp
- src/htmlreporter.h
- src/main.cpp
- + src/matcher.cpp
- + src/matcher.h
- src/options.cpp
- src/options.h
- src/overlapanalysis.cpp
- src/overlapanalysis.h
- src/peprocessor.cpp
- src/peprocessor.h
- src/seprocessor.cpp
- src/stats.cpp
- src/stats.h
- src/writer.cpp
- src/writer.h
- src/writerthread.cpp
- src/writerthread.h


Changes:

=====================================
.github/workflows/ci.yml
=====================================
@@ -1,18 +1,16 @@
 name: fastp ci
 on:
-  push:
-    branches:
-      - master
   pull_request:
     branches:
       - master
 jobs:
   build:
     strategy:
+      fail-fast: false
       matrix:
         os:
-          - ubuntu-latest
-          - macos-latest
+          - ubuntu-24.04
+          - macos-12
     runs-on: ${{ matrix.os }}
     steps:
       - name: checkout scm
@@ -35,6 +33,7 @@ jobs:
         with:
           repository: ebiggers/libdeflate
           path: src/libs/deflate
+          ref: v1.22
 
       - name: build deflate
         run: |
@@ -49,6 +48,7 @@ jobs:
         with:
           repository: intel/isa-l
           path: src/libs/isa-l
+          ref: v2.31.0
 
       - name: build isa-l
         run: |


=====================================
Makefile
=====================================
@@ -15,7 +15,7 @@ TARGET := fastp
 BIN_TARGET := ${TARGET}
 
 CXX ?= g++
-CXXFLAGS := -std=c++11 -pthread -g -O3 -I${DIR_INC} $(foreach includedir,$(INCLUDE_DIRS),-I$(includedir)) ${CXXFLAGS}
+CXXFLAGS := -std=c++11 -pthread -g -O3 -MD -MP -I${DIR_INC} $(foreach includedir,$(INCLUDE_DIRS),-I$(includedir)) ${CXXFLAGS}
 LIBS := -lisal -ldeflate -lpthread
 STATIC_FLAGS := -static -Wl,--no-as-needed -pthread
 LD_FLAGS := $(foreach librarydir,$(LIBRARY_DIRS),-L$(librarydir)) $(LIBS) $(LD_FLAGS)
@@ -28,27 +28,18 @@ ${BIN_TARGET}:${OBJ}
 static:${OBJ}
 	$(CXX) $(OBJ) -o ${BIN_TARGET} $(STATIC_LD_FLAGS)
 
-${DIR_OBJ}/%.o:${DIR_SRC}/%.cpp make_obj_dir
+${DIR_OBJ}/%.o:${DIR_SRC}/%.cpp
+	@mkdir -p $(@D)
 	$(CXX) -c $< -o $@ $(CXXFLAGS)
 
 .PHONY:clean
 .PHONY:static
 clean:
-	@if test -d $(DIR_OBJ) ; \
-	then \
-		find $(DIR_OBJ) -name *.o -delete; \
-	fi
-	@if test -e $(TARGET) ; \
-	then \
-		rm $(TARGET) ; \
-	fi
-
-make_obj_dir:
-	@if test ! -d $(DIR_OBJ) ; \
-	then \
-		mkdir $(DIR_OBJ) ; \
-	fi
+	@rm -rf $(DIR_OBJ)
+	@rm -f $(TARGET)
 
 install:
 	install $(TARGET) $(BINDIR)/$(TARGET)
 	@echo "Installed."
+
+-include $(OBJ:.o=.d)


=====================================
README.md
=====================================
@@ -210,12 +210,13 @@ New filters are being implemented. If you have a new idea or new request, please
 
 # adapters
 Adapter trimming is enabled by default, but you can disable it by `-A` or `--disable_adapter_trimming`. Adapter sequences can be automatically detected for both PE/SE data.
-* For SE data, the adapters are evaluated by analyzing the tails of first ~1M reads. This evaluation may be inacurrate, and you can specify the adapter sequence by `-a` or `--adapter_sequence` option. If adapter sequence is specified, the auto detection for SE data will be disabled.
-* For PE data, the adapters can be detected by per-read overlap analysis, which seeks for the overlap of each pair of reads. This method is robust and fast, so normally you don't have to input the adapter sequence even you know it. But you can still specify the adapter sequences for read1 by `--adapter_sequence`, and for read2 by `--adapter_sequence_r2`. If `fastp` fails to find an overlap (i.e. due to low quality bases), it will use these sequences to trim adapters for read1 and read2 respectively.
-* For PE data, the adapter sequence auto-detection is disabled by default since the adapters can be trimmed by overlap analysis. However, you can specify `--detect_adapter_for_pe` to enable it.
-* For PE data, `fastp` will run a little slower if you specify the sequence adapters or enable adapter auto-detection, but usually result in a slightly cleaner output, since the overlap analysis may fail due to sequencing errors or adapter dimers.
-* The most widely used adapter is the Illumina TruSeq adapters. If your data is from the TruSeq library, you can add `--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT` to your command lines, or enable auto detection for PE data by specifing `detect_adapter_for_pe`.
-* `fastp` contains some built-in known adapter sequences for better auto-detection. If you want to make some adapters to be a part of the built-in adapters, please file an issue.
+* For SE data, the adapter can be detected in most cases, but if `fastp` fails to find the adapter sequence, you can specify it with the `-a` or `--adapter_sequence` option. If an adapter sequence is specified, the auto detection is disabled.
+* For PE data, the adapters can be trimmed automatically by per-read overlap analysis, which seeks the overlap of each pair of reads. This method is robust and fast, so normally you don't have to input the adapter sequence. But you can still specify the adapter sequences for read1 by `--adapter_sequence`, and for read2 by `--adapter_sequence_r2`. In case `fastp` fails to find an overlap for some pairs (e.g. due to low quality bases), it will use these sequences to trim adapters for read1 and read2 respectively.
+* For PE data, the auto adapter detection is disabled by default. You can enable it by specifying `-2` or `--detect_adapter_for_pe`. If you want to obtain ultra-clean data, this option is strongly suggested.
+* For PE data, `fastp` will run a little slower if you specify the adapter sequences or enable the adapter auto-detection, but this may result in a slightly cleaner output (usually finding 0.1% to 0.5% more adapters), since the overlap analysis may fail due to sequencing errors.
+* For PE data, you can specify `--allow_gap_overlap_trimming` to allow up to one gap when trimming adapters by overlap analysis. By default no gap is allowed. This may take more time and usually has a very limited effect (finds ~0.01% more adapters).
+* The most widely used adapters are the Illumina TruSeq adapters. If your data is from the TruSeq library, `fastp` should be able to detect it successfully; otherwise you can add `--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT` to your command lines.
+* `fastp` contains some built-in known adapter sequences for better auto-detection. If you want some adapters to become part of the built-in adapters, please file an issue or make a change in https://github.com/OpenGene/fastp/blob/master/src/knownadapters.h
 
 You can also specify `--adapter_fasta` to give a FASTA file to tell `fastp` to trim multiple adapters in this FASTA file. Here is a sample of such adapter FASTA file:
 ```
@@ -357,7 +358,7 @@ means that 150bp are from read1, and 15bp are from read2. `fastp` prefers the ba
 Same as the [base correction feature](#base-correction-for-pe-data), this function is also based on overlapping detection, which has adjustable parameters `overlap_len_require (default 30)`, `overlap_diff_limit (default 5)` and `overlap_diff_percent_limit (default 20%)`. Please note that the reads should meet these three conditions simultaneously.
 
 # duplication rate and deduplication
-For both SE and PE data, fastp supports evaluating its duplication rate and removing duplicated reads/pairs. fastp considers one read as duplicated only if its all base pairs are identical as another one. This meas if there is a sequencing error or an N base, the read will not be treated as duplicated.
+For both SE and PE data, fastp supports evaluating its duplication rate and removing duplicated reads/pairs. fastp considers one read as duplicated only if its all base pairs are identical as another one. This means if there is a sequencing error or an N base, the read will not be treated as duplicated.
 
 ## duplication rate evaluation
 By default, fastp evaluates duplication rate, and this module may use 1G memory and take 10% ~ 20% more running time. If you don't need the duplication rate information, you can set `--dont_eval_duplication` to disable the duplication evaluation. But please be noted that, if deduplication (`--dedup`) option is enabled, then `--dont_eval_duplication` option is ignored.
@@ -407,7 +408,8 @@ options:
   -a, --adapter_sequence               the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped. (string [=auto])
       --adapter_sequence_r2            the adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as <adapter_sequence> (string [=])
       --adapter_fasta                  specify a FASTA file to trim both read1 and read2 (if PE) by all the sequences in this FASTA file (string [=])
-      --detect_adapter_for_pe          by default, the adapter sequence auto-detection is enabled for SE data only, turn on this option to enable it for PE data.
+  -2, --detect_adapter_for_pe          enable adapter detection for PE data to get ultra-clean data. It takes more time to find just a little bit more adapters.
+      --allow_gap_overlap_trimming     allow up to one gap when trim adapters by overlap analysis for PE data. By default no gap is allowed.
 
   # global trimming options
   -f, --trim_front1                    trimming how many bases in front for read1, default is 0 (int [=0])
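
For a concrete illustration of the PE adapter options described above, a typical command line could look like the one below. The filenames are placeholders; `-2`/`--detect_adapter_for_pe` and `--allow_gap_overlap_trimming` are the options added in this release, and the remaining flags are fastp's usual input/output and report options.

    fastp -i in.R1.fq.gz -I in.R2.fq.gz \
          -o out.R1.fq.gz -O out.R2.fq.gz \
          -2 --allow_gap_overlap_trimming \
          -j fastp.json -h fastp.html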


=====================================
src/adaptertrimmer.cpp
=====================================
@@ -1,4 +1,5 @@
 #include "adaptertrimmer.h"
+#include "matcher.h"
 
 AdapterTrimmer::AdapterTrimmer(){
 }
@@ -89,6 +90,7 @@ bool AdapterTrimmer::trimBySequence(Read* r, FilterResult* fr, string& adapterse
     else if(alen >= 8)
         start = -2;
     // we start from negative numbers since the Illumina adapter dimer usually have the first A skipped as A-tailing
+    // try exact match with hamming distance (no insertion or deletion)
     for(pos = start; pos<rlen-matchReq; pos++) {
         int cmplen = min(rlen - pos, alen);
         int allowedMismatch = cmplen/allowOneMismatchForEach;
@@ -110,6 +112,41 @@ bool AdapterTrimmer::trimBySequence(Read* r, FilterResult* fr, string& adapterse
 
     }
 
+    // if the exact match failed, we try one gap
+    // to lower computational cost, we only allow one gap, which is usually enough for short reads
+    // we try insertion in the sequence
+    bool hasInsertion = false;
+    if(!found) {
+        for(pos = 0; pos<rlen-matchReq-1; pos++) {
+            int cmplen = min(rlen - pos - 1, alen);
+            int allowedMismatch = cmplen/allowOneMismatchForEach -1;
+            bool matched = Matcher::matchWithOneInsertion(rdata, adata, cmplen, allowedMismatch);
+            if(matched) {
+                found = true;
+                hasInsertion = true;
+                //cerr << ".";
+                break;
+            }
+        }
+    }
+
+    // if both the exact match and the match with one insertion in the sequence failed,
+    // we then try a deletion in the sequence
+    bool hasDeletion = false;
+    if(!found) {
+        for(pos = 0; pos<rlen-matchReq; pos++) {
+            int cmplen = min(rlen - pos, alen - 1);
+            int allowedMismatch = cmplen/allowOneMismatchForEach -1;
+            bool matched = Matcher::matchWithOneInsertion(adata, rdata, cmplen, allowedMismatch);
+            if(matched) {
+                found = true;
+                hasDeletion = true;
+                //cerr << "|";
+                break;
+            }
+        }
+    }
+
     if(found) {
         if(pos < 0) {
             string adapter = adapterseq.substr(0, alen+pos);
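
As background for the exact-match stage referenced above: the adapter is slid along the read and accepted at a position once the hamming mismatches stay within cmplen/allowOneMismatchForEach. The following standalone sketch (not fastp's own trimBySequence, and with an illustrative matchReq value) shows that budgeted scan in isolation:

    #include <algorithm>
    #include <cstring>
    #include <iostream>

    // Sketch: return the first read position where the adapter matches within a
    // mismatch budget of cmplen/16 (mirroring allowOneMismatchForEach), or -1.
    static int findAdapterPos(const char* read, int rlen, const char* adapter, int alen) {
        const int matchReq = 4;                  // illustrative minimum comparable length
        const int allowOneMismatchForEach = 16;  // one mismatch allowed per 16 bases
        for (int pos = 0; pos < rlen - matchReq; pos++) {
            int cmplen = std::min(rlen - pos, alen);
            int allowedMismatch = cmplen / allowOneMismatchForEach;
            int mismatch = 0;
            bool matched = true;
            for (int i = 0; i < cmplen; i++) {
                if (adapter[i] != read[pos + i] && ++mismatch > allowedMismatch) {
                    matched = false;
                    break;
                }
            }
            if (matched)
                return pos;
        }
        return -1;
    }

    int main() {
        const char* read = "ACGTACGTACGTAGATCGGAAGAGC";  // 12 bp insert + adapter start
        const char* adapter = "AGATCGGAAGAGC";
        std::cout << findAdapterPos(read, (int)std::strlen(read), adapter, (int)std::strlen(adapter)) << std::endl;  // prints 12
        return 0;
    }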


=====================================
src/adaptertrimmer.h
=====================================
@@ -21,7 +21,6 @@ public:
     static bool trimByMultiSequences(Read* r1, FilterResult* fr, vector<string>& adapterList, bool isR2 = false, bool incTrimmedCounter = true);
     static bool test();
 
-
 };
 
 


=====================================
src/cmdline.h
=====================================
@@ -588,7 +588,7 @@ private:
 
   void check(int argc, bool ok){
     if ((argc==1 && !ok) || exist("help")){
-      std::cerr<<usage();
+      std::cout<<usage();
       exit(0);
     }
 


=====================================
src/common.h
=====================================
@@ -1,7 +1,7 @@
 #ifndef COMMON_H
 #define COMMON_H
 
-#define FASTP_VER "0.24.0"
+#define FASTP_VER "1.0.1"
 
 #define _DEBUG false
 


=====================================
src/evaluator.cpp
=====================================
@@ -204,207 +204,92 @@ void Evaluator::evaluateReadNum(long& readNum) {
     }
 }
 
-// Depreciated
-string Evaluator::evalAdapterAndReadNumDepreciated(long& readNum) {
-    FastqReader reader(mOptions->in1);
-    // stat up to 1M reads
-    const long READ_LIMIT = 1024*1024;
-    const long BASE_LIMIT = 151 * READ_LIMIT;
-    long records = 0;
-    long bases = 0;
-    size_t firstReadPos = 0;
-
-    size_t bytesRead;
-    size_t bytesTotal;
+string Evaluator::checkKnownAdapters(Read** reads, long num) {
+    map<string, string> knownAdapters = getKnownAdapter();
+    map<string, int> possibleCounts;
+    map<string, int> mismatches;
 
-    // we have to shift last cycle for evaluation since it is so noisy, especially for Illumina data
-    const int shiftTail = max(1, mOptions->trim.tail1);
+    // for performance, up to 100k reads and 100M bases
+    const int MAX_CHECK_READS = 100000;
+    const int MAX_CHECK_BASES = MAX_CHECK_READS*1000;
+    // if we hit for 1000 times, then we exit
+    const int MAX_HIT = 1000;
 
-    // we count the last [2, 9] bp of each read
-    // why we add trim_tail here? since the last cycle are usually with low quality and should be trimmed
-    const int keylen = 10;
-    int size = 1 << (keylen*2 );
-    unsigned int* counts = new unsigned int[size];
-    memset(counts, 0, sizeof(unsigned int)*size);
-    bool reachedEOF = false;
-    bool first = true;
-    while(records < READ_LIMIT && bases < BASE_LIMIT) {
-        Read* r = reader.read();
-        if(!r) {
-            reachedEOF = true;
-            break;
-        }
-        if(first) {
-            reader.getBytes(bytesRead, bytesTotal);
-            firstReadPos = bytesRead;
-            first = false;
-        }
-        int rlen = r->length();
-        bases += rlen;
-        if(rlen < keylen + 1 + shiftTail)
-            continue;
+    const int matchReq = 8;
+    const int allowOneMismatchForEach = 16;
 
-        const char* data = r->mSeq->c_str();
-        bool valid = true;
-        unsigned int key = 0;
-        for(int i=0; i<keylen; i++) {
-            key = (key << 2);
-            char base = data[rlen - keylen - shiftTail + i];
-            switch (base) {
-                case 'A':
-                    key += 0;
-                    break;
-                case 'T':
-                    key += 1;
-                    break;
-                case 'C':
-                    key += 2;
-                    break;
-                case 'G':
-                    key += 3;
-                    break;
-                default:
-                    // N or anything else
-                    valid = false;
-                    break;
-            }
-            if(!valid)
-                break;
-        }
-        if(valid) {
-            counts[key]++;
-            records++;
-        }
-        delete r;
-    }
+    map<string, string>::iterator iter;
 
-    readNum = 0;
-    if(reachedEOF){
-        readNum = records;
-    } else if(records>0) {
-        // by the way, update readNum so we don't need to evaluate it if splitting output is enabled
-        reader.getBytes(bytesRead, bytesTotal);
-        double bytesPerRead = (double)(bytesRead - firstReadPos) / (double) records;
-        // increase it by 1% since the evaluation is usually a bit lower due to bad quality causes lower compression rate
-        readNum = (long) (bytesTotal*1.01 / bytesPerRead);
+    for(iter = knownAdapters.begin(); iter!= knownAdapters.end(); iter++) {
+        string adapter = iter->first;
+        possibleCounts[adapter] = 0;
+        mismatches[adapter] = 0;
     }
 
-    // we need at least 10000 valid records to evaluate
-    if(records < 10000) {
-        delete[] counts;
-        return "";
-    }
+    long checkedReads = 0;
+    long checkedBases = 0;
+    int curMaxCount = 0;
+    for (long i=0; i<num; i++) {
+        Read* r = reads[i];
+        const char* rdata = r->mSeq->c_str();
+        int rlen = r->length();
 
-    int repeatReq = 0.0001 * records;
-
-    // initialize candidates
-    map<string, unsigned int> candidates;
-    for(int i=0; i<size; i++) {
-        if(counts[i] >= repeatReq) {
-            string seq = int2seq(i, keylen);
-            // remove low complexity seq
-            int diff = 0;
-            for(int s=0; s<seq.length() - 1; s++) {
-                if(seq[s] != seq[s+1])
-                    diff++;
-            }
-            if(diff >=2){
-                candidates[seq] = counts[i];
-                //cerr << seq << ": " << candidates[seq] << endl;
+        checkedReads++;
+        checkedBases+= rlen;
+        if(checkedReads > MAX_CHECK_READS || checkedBases > MAX_CHECK_BASES)
+            break;
+        if(curMaxCount > MAX_HIT)
+            break;
+        for(iter = knownAdapters.begin(); iter!= knownAdapters.end(); iter++) {
+            string adapter = iter->first;
+            const char* adata = adapter.c_str();
+            int alen = adapter.length();
+            if(alen >= rlen)
+                continue;
+            // this one is not the candidate, skip it for speedup
+            if(curMaxCount > 20 && possibleCounts[adapter] <curMaxCount/10) {
+                continue; 
             }
-        }
-    }
-
-    map<string, unsigned int>::iterator iter;
-
-    // remove the fake ones have only first base different
-    vector<string> needToDelete;
-    for(iter = candidates.begin(); iter!=candidates.end(); iter++) {
-        string seq = iter->first;
-        char bases[4] = {'A', 'T', 'C', 'G'};
-        int num = 0;
-        for(int b=0; b<4; b++) {
-            seq[0] = bases[b];
-            if(candidates.count(seq) > 0)
-                num++;
-        }
-        if(num >=2 ) {
-            needToDelete.push_back(iter->first);
-        }
-    }
-    for(int i=0; i<needToDelete.size(); i++) {
-        candidates.erase(needToDelete[i]);
-    }
-
-    map<string, unsigned int>::iterator iter1;
-    map<string, unsigned int>::iterator iter2;
-
-    while(true) {
-        bool changed = false;
-        for(iter1 = candidates.begin(); iter1!=candidates.end(); iter1++) {
-            bool aligned = false;
-            for(iter2 = candidates.begin(); iter2!=candidates.end(); iter2++) {
-                if(iter1 == iter2)
-                    continue;
-
-                string a1 = iter1->first;
-                string a2 = iter2->first;
-                int len1 = a1.length();
-                int len2 = a2.length();
-                int overlap = keylen - 1;
-                //cerr << a1 << ":" << a2 << endl;
-
-                // check identidal
-                bool identical = true;
-                for(int o=0; o<overlap; o++) {
-                    identical &= (a1[len1 - overlap + o] == a2[o]);
+            for(int pos = 0; pos<rlen-matchReq; pos++) {
+                int cmplen = min(rlen - pos, alen);
+                int allowedMismatch = cmplen/allowOneMismatchForEach;
+                int mismatch = 0;
+                bool matched = true;
+                for(int i=0; i<cmplen; i++) {
+                    if( adata[i] != rdata[i+pos] ){
+                        mismatch++;
+                        if(mismatch > allowedMismatch) {
+                            matched = false;
+                            break;
+                        }
+                    }
                 }
-
-                if(identical) {
-                    // merge them
-                    string mergedAdapter = a1 + a2.substr(overlap, len2-overlap);
-                    int mergedCount = iter1->second + iter2->second;
-                    candidates.erase(a1);
-                    candidates.erase(a2);
-                    candidates[mergedAdapter] = mergedCount;
-                    aligned = true;
+                if(matched) {
+                    possibleCounts[adapter]++;
+                    if(curMaxCount < possibleCounts[adapter])
+                        curMaxCount = possibleCounts[adapter];
+                    mismatches[adapter] += mismatch;
                     break;
                 }
-
-            }
-            if(aligned) {
-                changed = true;
-                break;
             }
         }
-        if(changed == false)
-            break;
     }
 
-    // find the longest adapter
-    int largest = 0;
-    string finalAdapter = "";
-    for(iter = candidates.begin(); iter!=candidates.end(); iter++) {
-        if(iter->second > largest) {
-            largest = iter->second;
-            finalAdapter = iter->first;
+    string adapter = "";
+    int maxCount = 0;
+    map<string, int>::iterator iter2;
+    for(iter2 = possibleCounts.begin(); iter2 != possibleCounts.end(); iter2++) {
+        if(iter2->second > maxCount) {
+            adapter = iter2->first;
+            maxCount = iter2->second;
         }
     }
-
-    delete[] counts;
-
-    if(finalAdapter.length() > 60)
-        finalAdapter.resize(60);
-    string matchedAdapter = matchKnownAdapter(finalAdapter);
-    if(!matchedAdapter.empty()) {
-        map<string, string> knownAdapters = getKnownAdapter();
-        cerr << knownAdapters[matchedAdapter] << ": " << matchedAdapter << endl;
-        return matchedAdapter;
-    } else {
-        cerr << finalAdapter << endl;
-        return finalAdapter;
+    if(maxCount > checkedReads/50 || (maxCount > checkedReads/200 && mismatches[adapter] < checkedReads)) {
+        cerr << knownAdapters[adapter] << endl;
+        cerr << adapter << endl;
+        return adapter;
     }
-
+    return "";
 }
 
 string Evaluator::evalAdapterAndReadNum(long& readNum, bool isR2) {
@@ -465,6 +350,16 @@ string Evaluator::evalAdapterAndReadNum(long& readNum, bool isR2) {
         return "";
     }
 
+    string knownAdapter = checkKnownAdapters(loadedReads, records);
+     if(knownAdapter.size() > 8) {
+        for(int r=0; r<records; r++) {
+            delete loadedReads[r];
+            loadedReads[r] = NULL;
+        }
+        delete[] loadedReads;
+        return knownAdapter;
+    }
+
     // we have to shift last cycle for evaluation since it is so noisy, especially for Illumina data
     const int shiftTail = max(1, mOptions->trim.tail1);
 
@@ -577,13 +472,14 @@ string Evaluator::evalAdapterAndReadNum(long& readNum, bool isR2) {
 string Evaluator::getAdapterWithSeed(int seed, Read** loadedReads, long records, int keylen) {
     // we have to shift last cycle for evaluation since it is so noisy, especially for Illumina data
     const int shiftTail = max(1, mOptions->trim.tail1);
+    const int MAX_SEARCH_LENGTH = 500;
     NucleotideTree forwardTree(mOptions);
     // forward search
     for(int i=0; i<records; i++) {
         Read* r = loadedReads[i];
         const char* data = r->mSeq->c_str();
         int key = -1;
-        for(int pos = 20; pos <= r->length()-keylen-shiftTail; pos++) {
+        for(int pos = 20; pos <= r->length()-keylen-shiftTail && pos <MAX_SEARCH_LENGTH; pos++) {
             key = seq2int(r->mSeq, pos, keylen, key);
             if(key == seed) {
                 forwardTree.addSeq(r->mSeq->substr(pos+keylen, r->length()-keylen-shiftTail-pos));
@@ -599,7 +495,7 @@ string Evaluator::getAdapterWithSeed(int seed, Read** loadedReads, long records,
         Read* r = loadedReads[i];
         const char* data = r->mSeq->c_str();
         int key = -1;
-        for(int pos = 20; pos <= r->length()-keylen-shiftTail; pos++) {
+        for(int pos = 20; pos <= r->length()-keylen-shiftTail && pos <MAX_SEARCH_LENGTH; pos++) {
             key = seq2int(r->mSeq, pos, keylen, key);
             if(key == seed) {
                 string seq =  r->mSeq->substr(0, pos);
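
The acceptance rule at the end of checkKnownAdapters above can be read as follows: with up to 100,000 checked reads, a known adapter is reported when it is hit in more than 1/50 of the checked reads, or in more than 1/200 of them while the accumulated mismatches stay below the number of checked reads. A minimal standalone restatement (not the fastp code itself):

    #include <iostream>

    // Sketch of the decision rule applied after counting known-adapter hits.
    static bool acceptKnownAdapter(long maxCount, long mismatches, long checkedReads) {
        return maxCount > checkedReads / 50 ||
               (maxCount > checkedReads / 200 && mismatches < checkedReads);
    }

    int main() {
        // With 100000 checked reads: 2500 hits pass the 1/50 rule; 600 hits with
        // 40000 accumulated mismatches pass the relaxed 1/200 rule; 300 hits fail.
        std::cout << acceptKnownAdapter(2500, 0, 100000) << std::endl;     // 1
        std::cout << acceptKnownAdapter(600, 40000, 100000) << std::endl;  // 1
        std::cout << acceptKnownAdapter(300, 0, 100000) << std::endl;      // 0
        return 0;
    }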


=====================================
src/evaluator.h
=====================================
@@ -16,7 +16,6 @@ public:
     ~Evaluator();
     // evaluate how many reads are stored in the input file
     void evaluateReadNum(long& readNum);
-    string evalAdapterAndReadNumDepreciated(long& readNum);
     string evalAdapterAndReadNum(long& readNum, bool isR2);
     bool isTwoColorSystem();
     void evaluateSeqLen();
@@ -32,6 +31,7 @@ private:
     int seq2int(string* seq, int pos, int seqlen, int lastVal = -1);
     int seq2int(string& seq, int pos, int seqlen, int lastVal = -1);
     string getAdapterWithSeed(int seed, Read** loadedReads, long records, int keylen);
+    string checkKnownAdapters(Read** reads, long num);
 };
 
 


=====================================
src/fastqreader.cpp
=====================================
@@ -252,6 +252,12 @@ void FastqReader::getLine(string* line){
 		readToBuf();
 		start = 0;
 		end = 0;
+		// handle the case that \r or \n in the start of buf
+		if(line->empty()) {
+			while(start < mBufDataLen && (mFastqBuf[start] == '\r' || mFastqBuf[start] == '\n'))
+				start++;
+			end = start;
+		}
 		while(end < mBufDataLen) {
 			if(mFastqBuf[end] != '\r' && mFastqBuf[end] != '\n')
 				end++;
@@ -318,8 +324,10 @@ Read* FastqReader::read(){
 	getLine(quality);
 
 	if (strand->empty() || (*strand)[0]!='+') {
+		cerr << *name << endl;
 		cerr << "Expected '+', got " << *strand << endl;
-		error_exit("'+' expected");
+		cerr << "Your FASTQ may be invalid, please check the tail of your FASTQ file" << endl;
+		return NULL;
 	}
 
 	if(quality->length() != sequence->length()) {
@@ -328,7 +336,7 @@ Read* FastqReader::read(){
 		cerr << *sequence << endl;
 		cerr << *strand << endl;
 		cerr << *quality << endl;
-		error_exit("sequence and quality have different length");
+		cerr << "Your FASTQ may be invalid, please check the tail of your FASTQ file" << endl;
 		return NULL;
 	}
 


=====================================
src/filterresult.cpp
=====================================
@@ -4,6 +4,9 @@
 #include "htmlreporter.h"
 #include <memory.h>
 
+#define MAX_ADAPTER_REC 20000
+#define LOW_COMPLEXITY_SKIP 5000
+
 FilterResult::FilterResult(Options* opt, bool paired){
     mOptions = opt;
     mPaired = paired;
@@ -109,6 +112,15 @@ long FilterResult::getCorrectionNum(char from, char to) {
     return mCorrectionMatrix[f*8 + t];
 }
 
+bool FilterResult::isLowComplexity(string& adapter) {
+    int diff = 0;
+    for(int i=0; i<adapter.length()-1; i++) {
+        if(adapter[i] != adapter[i+1])
+            diff++;
+    }
+    return diff < adapter.length()/2;
+}
+
 void FilterResult::addAdapterTrimmed(string adapter, bool isR2, bool incTrimmedCounter ) {
     if(adapter.empty())
         return;
@@ -118,13 +130,25 @@ void FilterResult::addAdapterTrimmed(string adapter, bool isR2, bool incTrimmedC
     if(!isR2) {
         if(mAdapter1.count(adapter) >0 )
             mAdapter1[adapter]++;
-        else
+        else {
+            // to prevent possible memory explosion
+            // we skip this adapter when there are too many adapters recorded
+            // or when this adapter is with low complexity
+            if(mAdapter1.size() > MAX_ADAPTER_REC || (mAdapter1.size()>LOW_COMPLEXITY_SKIP && isLowComplexity(adapter)))
+                return;
             mAdapter1[adapter] = 1;
+        }
     } else {
         if(mAdapter2.count(adapter) >0 )
             mAdapter2[adapter]++;
-        else
+        else {
+            // to prevent possible memory explosion
+            // we skip this adapter when there are too many adapters recorded
+            // or when this adapter is with low complexity
+            if(mAdapter2.size() > MAX_ADAPTER_REC || (mAdapter2.size()>LOW_COMPLEXITY_SKIP && isLowComplexity(adapter)))
+                return;
             mAdapter2[adapter] = 1;
+        }
     }
 }
 
@@ -135,14 +159,26 @@ void FilterResult::addAdapterTrimmed(string adapter1, string adapter2) {
     if(!adapter1.empty()){
         if(mAdapter1.count(adapter1) >0 )
             mAdapter1[adapter1]++;
-        else
+        else {
+            // to prevent possible memory explosion
+            // we skip this adapter when there are too many adapters recorded
+            // or when this adapter is with low complexity
+            if(mAdapter1.size() > MAX_ADAPTER_REC || (mAdapter1.size()>LOW_COMPLEXITY_SKIP && isLowComplexity(adapter1)))
+                return;
             mAdapter1[adapter1] = 1;
+        }
     }
     if(!adapter2.empty()) {
         if(mAdapter2.count(adapter2) >0 )
             mAdapter2[adapter2]++;
-        else
+        else {
+            // to prevent possible memory explosion
+            // we skip this adapter when there are too many adapters recorded
+            // or when this adapter is with low complexity
+            if(mAdapter2.size() > MAX_ADAPTER_REC || (mAdapter2.size()>LOW_COMPLEXITY_SKIP && isLowComplexity(adapter2)))
+                return;
             mAdapter2[adapter2] = 1;
+        }
     }
 }
 
@@ -334,19 +370,49 @@ void FilterResult::reportHtml(ofstream& ofs, long totalReads, long totalBases) {
 }
 
 void FilterResult::reportAdapterHtml(ofstream& ofs, long totalBases) {
-    ofs << "<div class='subsection_title' onclick=showOrHide('read1_adapters')>Adapter or bad ligation of read1</div>\n";
-    ofs << "<div id='read1_adapters'>\n";
-    outputAdaptersHtml(ofs, mAdapter1, totalBases);
-    ofs << "</div>\n";
-    if(mOptions->isPaired()) {
+    if(!mOptions->isPaired()) {
+        ofs << "<div class='subsection_title' onclick=showOrHide('read_adapters')>Adapter or bad ligation</div>\n";
+        ofs << "<div id='read_adapters'>\n";
+        outputAdaptersHtml(ofs, mAdapter1, totalBases);
+        ofs << "</div>\n";
+    } else {
+        int limitCount = min(getAdapterReportCount(mAdapter1), getAdapterReportCount(mAdapter2));
+        ofs << "<table> <tr> <td>\n";
+        ofs << "<div class='subsection_title' onclick=showOrHide('read1_adapters')>Adapter or bad ligation of read1</div>\n";
+        ofs << "<div id='read1_adapters'>\n";
+        outputAdaptersHtml(ofs, mAdapter1, totalBases, limitCount);
+        ofs << "</div>\n";
+        ofs << "</td><td>\n";
         ofs << "<div class='subsection_title' onclick=showOrHide('read2_adapters')>Adapter or bad ligation of read2</div>\n";
         ofs << "<div id='read2_adapters'>\n";
-        outputAdaptersHtml(ofs, mAdapter2, totalBases);
+        outputAdaptersHtml(ofs, mAdapter2, totalBases, limitCount);
         ofs << "</div>\n";
+        ofs << "</td></tr></table>\n";
     }
 }
 
-void FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp>& adapterCounts, long totalBases) {
+int FilterResult::getAdapterReportCount(map<string, long, classcomp>& adapterCounts) {
+    map<string, long>::iterator iter;
+    long total = 0;
+    long totalAdapterBases = 0;
+    for(iter = adapterCounts.begin(); iter!=adapterCounts.end(); iter++) {
+        total += iter->second;
+        totalAdapterBases += iter->first.length() * iter->second;
+    }
+    if(total == 0)
+        return 0;
+    const double reportThreshold = 0.01;
+    const double dTotal = (double)total;
+    int count = 0;
+    for(iter = adapterCounts.begin(); iter!=adapterCounts.end(); iter++) {
+        if(iter->second /dTotal < reportThreshold )
+            continue;
+        count++;
+    }
+    return count;
+}
+
+int FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp>& adapterCounts, long totalBases, int limitCount) {
 
     map<string, long>::iterator iter;
 
@@ -366,7 +432,7 @@ void FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp
     }
 
     if(total == 0)
-        return ;
+        return 0;
 
     ofs << "<table class='summary_table'>\n";
     ofs << "<tr><td class='adapter_col' style='font-size:14px;color:#ffffff;background:#556699'>" << "Sequence" << "</td><td class='col2' style='font-size:14px;color:#ffffff;background:#556699'>" << "Occurrences" << "</td></tr>\n";
@@ -374,6 +440,7 @@ void FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp
     const double reportThreshold = 0.01;
     const double dTotal = (double)total;
     long reported = 0;
+    int count = 0;
     for(iter = adapterCounts.begin(); iter!=adapterCounts.end(); iter++) {
         if(iter->second /dTotal < reportThreshold )
             continue;
@@ -381,6 +448,9 @@ void FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp
         ofs << "<tr><td class='adapter_col'>" << iter->first << "</td><td class='col2'>" << iter->second << "</td></tr>\n";
 
         reported += iter->second;
+        count++;
+        if(count >= limitCount && limitCount>0)
+            break;
     }
 
     long unreported = total - reported;
@@ -392,4 +462,5 @@ void FilterResult::outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp
         ofs << "<tr><td class='adapter_col'>" << tag << "</td><td class='col2'>" << unreported << "</td></tr>\n";
     }
     ofs << "</table>\n";
+    return count;
 }
\ No newline at end of file
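
To illustrate the low-complexity guard added above (a standalone sketch of the same heuristic, not the FilterResult method itself): an adapter string is treated as low complexity when fewer than half of its adjacent base pairs differ, so repetitive sequences such as poly-A stop being recorded once many distinct adapters are already stored.

    #include <iostream>
    #include <string>

    // Sketch of the isLowComplexity() heuristic: count adjacent positions that
    // differ and call the sequence low complexity if fewer than half of them do.
    static bool isLowComplexity(const std::string& adapter) {
        int diff = 0;
        for (size_t i = 0; i + 1 < adapter.length(); i++) {
            if (adapter[i] != adapter[i + 1])
                diff++;
        }
        return diff < (int)adapter.length() / 2;
    }

    int main() {
        std::cout << isLowComplexity("AAAAAAAAAA") << std::endl;     // 1: 0 of 9 adjacent pairs differ
        std::cout << isLowComplexity("AGATCGGAAGAGC") << std::endl;  // 0: 10 of 12 adjacent pairs differ
        return 0;
    }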


=====================================
src/filterresult.h
=====================================
@@ -50,7 +50,8 @@ public:
     // a part of HTML report for adapters
     void reportAdapterHtml(ofstream& ofs, long totalBases);
     void outputAdaptersJson(ofstream& ofs, map<string, long, classcomp>& adapterCounts);
-    void outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp>& adapterCounts, long totalBases);
+    int outputAdaptersHtml(ofstream& ofs, map<string, long, classcomp>& adapterCounts, long totalBases, int limitCount = 0);
+    int getAdapterReportCount(map<string, long, classcomp>& adapterCounts);
     // deal with base correction results
     long* getCorrectionMatrix() {return mCorrectionMatrix;}
     long getTotalCorrectedBases();
@@ -58,6 +59,7 @@ public:
     long getCorrectionNum(char from, char to);
     void incCorrectedReads(int count);
     void addMergedPairs(int pairs);
+    bool isLowComplexity(string& adapter);
 
 
 public:


=====================================
src/htmlreporter.cpp
=====================================
@@ -1,6 +1,7 @@
 #include "htmlreporter.h"
 #include <chrono>
 #include <memory.h>
+#include "knownadapters.h"
 
 extern string command;
 
@@ -69,6 +70,10 @@ void HtmlReporter::printSummary(ofstream& ofs, FilterResult* result, Stats* preS
     if(preStats2)
         pre_q30_bases += preStats2->getQ30();
 
+    long pre_q40_bases = preStats1->getQ40();
+    if(preStats2)
+        pre_q40_bases += preStats2->getQ40();
+
     long pre_total_gc = preStats1->getGCNumber();
     if(preStats2)
         pre_total_gc += preStats2->getGCNumber();
@@ -89,6 +94,10 @@ void HtmlReporter::printSummary(ofstream& ofs, FilterResult* result, Stats* preS
     if(postStats2)
         post_q30_bases += postStats2->getQ30();
 
+    long post_q40_bases = postStats1->getQ40();
+    if(postStats2)
+        post_q40_bases += postStats2->getQ40();
+
     long post_total_gc = postStats1->getGCNumber();
     if(postStats2)
         post_total_gc += postStats2->getGCNumber();
@@ -132,10 +141,18 @@ void HtmlReporter::printSummary(ofstream& ofs, FilterResult* result, Stats* preS
         outputRow(ofs, "Insert size peak:", mInsertSizePeak);
     }
     if(mOptions->adapterCuttingEnabled()) {
-        if(!mOptions->adapter.detectedAdapter1.empty())
-            outputRow(ofs, "Detected read1 adapter:", mOptions->adapter.detectedAdapter1);
-        if(!mOptions->adapter.detectedAdapter2.empty())
-            outputRow(ofs, "Detected read2 adapter:", mOptions->adapter.detectedAdapter2);
+        map<string, string> knownAdapters = getKnownAdapter();
+        if(!mOptions->adapter.detectedAdapter1.empty()) {
+            string adapterinfo1 = mOptions->adapter.detectedAdapter1;
+            if(knownAdapters.count(adapterinfo1) > 0)
+                adapterinfo1 += " -" + knownAdapters[mOptions->adapter.detectedAdapter1];
+            outputRow(ofs, "Detected read1 adapter:", adapterinfo1);
+        } if(!mOptions->adapter.detectedAdapter2.empty()) {
+            string adapterinfo2 = mOptions->adapter.detectedAdapter2;
+            if(knownAdapters.count(adapterinfo2) > 0)
+                adapterinfo2 += " -" + knownAdapters[mOptions->adapter.detectedAdapter2];
+            outputRow(ofs, "Detected read2 adapter:", adapterinfo2);
+        }
     }
     ofs << "</table>\n";
     ofs << "</div>\n";
@@ -147,6 +164,7 @@ void HtmlReporter::printSummary(ofstream& ofs, FilterResult* result, Stats* preS
     outputRow(ofs, "total bases:", formatNumber(pre_total_bases));
     outputRow(ofs, "Q20 bases:", formatNumber(pre_q20_bases) + " (" + getPercents(pre_q20_bases,pre_total_bases) + "%)");
     outputRow(ofs, "Q30 bases:", formatNumber(pre_q30_bases) + " (" + getPercents(pre_q30_bases, pre_total_bases) + "%)");
+    outputRow(ofs, "Q40 bases:", formatNumber(pre_q40_bases) + " (" + getPercents(pre_q40_bases, pre_total_bases) + "%)");
     outputRow(ofs, "GC content:", getPercents(pre_total_gc,pre_total_bases) + "%");
     ofs << "</table>\n";
     ofs << "</div>\n";
@@ -158,6 +176,7 @@ void HtmlReporter::printSummary(ofstream& ofs, FilterResult* result, Stats* preS
     outputRow(ofs, "total bases:", formatNumber(post_total_bases));
     outputRow(ofs, "Q20 bases:", formatNumber(post_q20_bases) + " (" + getPercents(post_q20_bases, post_total_bases) + "%)");
     outputRow(ofs, "Q30 bases:", formatNumber(post_q30_bases) + " (" + getPercents(post_q30_bases, post_total_bases) + "%)");
+    outputRow(ofs, "Q40 bases:", formatNumber(post_q40_bases) + " (" + getPercents(post_q40_bases, post_total_bases) + "%)");
     outputRow(ofs, "GC content:", getPercents(post_total_gc,post_total_bases) + "%");
     ofs << "</table>\n";
     ofs << "</div>\n";
@@ -322,6 +341,89 @@ void HtmlReporter::reportDuplication(ofstream& ofs) {
     delete[] gc;
 }
 
+void HtmlReporter::reportQualHistogram(ofstream& ofs, string caption, Stats* stats1, Stats* stats2) {
+    long* hist1 = stats1->getQualHist();
+
+    string divName = replace(caption, " ", "_");
+    divName = replace(divName, ":", "_");
+
+    ofs << "<div class='figure' style='height:400' id='plot_" + divName + "'></div>\n";
+    ofs << "<script language='javascript'>" << endl;
+    ofs << " var hist1 = {" << endl;
+    ofs << " x: [";
+    bool first = true;
+    for(int q=33; q<128; q++) {
+        if(hist1[q]>0) {
+            if(!first) {
+                ofs << ",";
+            }
+            ofs << q-33;
+            first = false;
+        }
+    }
+    ofs << "]," << endl;
+    ofs << " y: [";
+    first = true;
+    for(int q=33; q<128; q++) {
+        if(hist1[q]>0) {
+            if(!first) {
+                ofs << ",";
+            }
+            ofs << hist1[q];
+            first = false;
+        }
+    }
+    ofs << "]," << endl;
+    ofs << "name: 'read1'," << endl;
+    ofs << "type: 'bar'," << endl;
+    ofs << "};" << endl;
+
+    if(stats2 != NULL) {
+        long* hist2 = stats2->getQualHist();
+        ofs << " var hist2 = {" << endl;
+        ofs << " x: [";
+        first = true;
+        for(int q=33; q<128; q++) {
+            if(hist2[q]>0) {
+                if(!first) {
+                    ofs << ",";
+                }
+                ofs << q-33;
+                first = false;
+            }
+        }
+        ofs << "]," << endl;
+        ofs << " y: [";
+        first = true;
+        for(int q=33; q<128; q++) {
+            if(hist2[q]>0) {
+                if(!first) {
+                    ofs << ",";
+                }
+                ofs << hist2[q];
+                first = false;
+            }
+        }
+        ofs << "]," << endl;
+        ofs << "name: 'read2'," << endl;
+        ofs << "type: 'bar'," << endl;
+        ofs << "};" << endl;
+    }
+
+    if(stats2 != NULL) {
+        ofs << "var data = [hist1, hist2];" << endl;
+    } else {
+        ofs << "var data = [hist1];" << endl;
+    }
+
+    ofs << "var layout = {barmode: 'group', ";
+    ofs << "title: '" << caption << "',";
+    ofs <<"xaxis:{title:'base quality score'}, yaxis:{title:'base count'}}; "<< endl;
+
+    ofs << "Plotly.newPlot('plot_" + divName + "', data, layout); " << endl;
+    ofs << "</script>" << endl;
+}
+
 void HtmlReporter::report(FilterResult* result, Stats* preStats1, Stats* postStats1, Stats* preStats2, Stats* postStats2) {
     ofstream ofs;
     ofs.open(mOptions->htmlFile, ifstream::out);
@@ -331,36 +433,123 @@ void HtmlReporter::report(FilterResult* result, Stats* preStats1, Stats* postSta
     printSummary(ofs, result, preStats1, postStats1, preStats2, postStats2);
 
     ofs << "<div class='section_div'>\n";
-    ofs << "<div class='section_title' onclick=showOrHide('before_filtering')><a name='summary'>Before filtering</a></div>\n";
-    ofs << "<div id='before_filtering'>\n";
+    //ofs << "<div class='section_title' onclick=showOrHide('before_filtering')><a name='summary'>Before filtering</a></div>\n";
+    //ofs << "<div id='before_filtering'>\n";
 
+    ofs << "<div class='section_title' onclick=showOrHide('quality_stat')><a name='summary'>Filtering statistics</a>";
+    if(mOptions->out1.empty() && ! mOptions->outputToSTDOUT) {
+        // no output, we give a hint here for clarification
+        ofs << "<font size=-2> (although no output file specified, the filtered statistics are still presented here) </font>" << endl;
+    }
+    ofs << "</div>\n";
+    ofs << "<table id='quality_stat' class='section_table'>\n";
+
+    string postRead1Name = "read1";
+    if(mOptions->merge.enabled)
+        postRead1Name = "merged";
+
+    // base quality scores
+    ofs << "<tr><td>\n";
     if(preStats1) {
-        preStats1 -> reportHtml(ofs, "Before filtering", "read1");
+        preStats1 -> reportHtmlQuality(ofs, "Before filtering", "read1");
+    }
+    ofs << "</td><td>\n";
+    if(postStats1) {
+        postStats1 -> reportHtmlQuality(ofs, "After filtering", postRead1Name);
     }
+    ofs << "</td></tr>\n";
+
 
     if(preStats2) {
-        preStats2 -> reportHtml(ofs, "Before filtering", "read2");
+        ofs << "<tr><td>\n";
+        preStats2 -> reportHtmlQuality(ofs, "Before filtering", "read2");
+        ofs << "</td><td>\n";
+        if(!mOptions->merge.enabled) {
+            postStats2 -> reportHtmlQuality(ofs, "After filtering", "read2");
+        }
+        ofs << "</td></tr>\n";
     }
 
-    ofs << "</div>\n";
-    ofs << "</div>\n";
+    // quality histogram
+    ofs << "<tr style='height:20px;background:#999999;'></tr>\n";
+    ofs << "<tr><td>\n";
+    reportQualHistogram(ofs, "Before filtering: quality score histogram", preStats1, preStats2);
+    ofs << "</td><td>\n";
+    reportQualHistogram(ofs, "After filtering: quality score histogram", postStats1, postStats2);
+    ofs << "</td></tr>\n";
+
+    // base contents
+    ofs << "<tr style='height:20px;background:#999999;'></tr>\n";
+    ofs << "<tr><td>\n";
+    if(preStats1) {
+        preStats1 -> reportHtmlContents(ofs, "Before filtering", "read1");
+    }
+    ofs << "</td><td>\n";
+    if(postStats1) {
+        postStats1 -> reportHtmlContents(ofs, "After filtering", postRead1Name);
+    }
+    ofs << "</td></tr>\n";
 
-    ofs << "<div class='section_div'>\n";
-    ofs << "<div class='section_title' onclick=showOrHide('after_filtering')><a name='summary'>After filtering</a></div>\n";
-    ofs << "<div id='after_filtering'>\n";
+    if(preStats2) {
+        ofs << "<tr><td>\n";
+        preStats2 -> reportHtmlContents(ofs, "Before filtering", "read2");
+        ofs << "</td><td>\n";
+        if(!mOptions->merge.enabled) {
+            postStats2 -> reportHtmlContents(ofs, "After filtering", "read2");
+        }
+        ofs << "</td></tr>\n";
+    }
 
+    // KMER
+    ofs << "<tr style='height:20px;background:#999999;'></tr>\n";
+    ofs << "<tr><td>\n";
+    if(preStats1) {
+        preStats1 -> reportHtmlKMER(ofs, "Before filtering", "read1");
+    }
+    ofs << "</td><td>\n";
     if(postStats1) {
-        string name = "read1";
-        if(mOptions->merge.enabled)
-            name = "merged";
-        postStats1 -> reportHtml(ofs, "After filtering", name);
+        postStats1 -> reportHtmlKMER(ofs, "After filtering", postRead1Name);
     }
+    ofs << "</td></tr>\n";
 
+    ofs << "<tr><td>\n";
+    if(preStats2) {
+        preStats2 -> reportHtmlKMER(ofs, "Before filtering", "read2");
+    }
+    ofs << "</td><td>\n";
     if(postStats2 && !mOptions->merge.enabled) {
-        postStats2 -> reportHtml(ofs, "After filtering", "read2");
+        postStats2 -> reportHtmlKMER(ofs, "After filtering", "read2");
     }
+    ofs << "</td></tr>\n";
+
+    // over represented seqs
+    if(mOptions->overRepAnalysis.enabled) {
+        ofs << "<tr style='height:20px;'></tr>\n";
+        ofs << "<tr style='vertical-align: top;'><td>\n";
+        if(preStats1) {
+            preStats1 -> reportHtmlORA(ofs, "Before filtering", "read1");
+        }
+        ofs << "</td><td>\n";
+        if(postStats1) {
+            postStats1 -> reportHtmlORA(ofs, "After filtering", postRead1Name);
+        }
+        ofs << "</td></tr>\n";
 
-    ofs << "</div>\n";
+        ofs << "<tr style='vertical-align: top;'><td>\n";
+        if(preStats2) {
+            preStats2 -> reportHtmlORA(ofs, "Before filtering", "read2");
+        }
+        ofs << "</td><td>\n";
+        if(postStats2 && !mOptions->merge.enabled) {
+            postStats2 -> reportHtmlORA(ofs, "After filtering", "read2");
+        }
+        ofs << "</td></tr>\n";
+    }
+
+
+    ofs << "</table>\n";
+
+    //ofs << "</div>\n";
     ofs << "</div>\n";
 
     printFooter(ofs);
@@ -379,7 +568,7 @@ void HtmlReporter::printHeader(ofstream& ofs){
 void HtmlReporter::printCSS(ofstream& ofs){
     ofs << "<style type=\"text/css\">" << endl;
     ofs << "td {border:1px solid #dddddd;padding:5px;font-size:12px;}" << endl;
-    ofs << "table {border:1px solid #999999;padding:2x;border-collapse:collapse; width:800px}" << endl;
+    ofs << "table {border:1px solid #999999;padding:2x;border-collapse:collapse;width:100%}" << endl;
     ofs << ".col1 {width:240px; font-weight:bold;}" << endl;
     ofs << ".adapter_col {width:500px; font-size:10px;}" << endl;
     ofs << "img {padding:30px;}" << endl;
@@ -388,9 +577,10 @@ void HtmlReporter::printCSS(ofstream& ofs){
     ofs << "a:visited {color: #999999}" << endl;
     ofs << ".alignleft {text-align:left;}" << endl;
     ofs << ".alignright {text-align:right;}" << endl;
-    ofs << ".figure {width:800px;height:600px;}" << endl;
+    ofs << ".figure {width:680px;height:600px;}" << endl;
     ofs << ".header {color:#ffffff;padding:1px;height:20px;background:#000000;}" << endl;
     ofs << ".section_title {color:#ffffff;font-size:20px;padding:5px;text-align:left;background:#663355; margin-top:10px;}" << endl;
+    ofs << ".section_table {width:100%;}" << endl;
     ofs << ".subsection_title {font-size:16px;padding:5px;margin-top:10px;text-align:left;color:#663355}" << endl;
     ofs << "#container {text-align:center;padding:3px 3px 3px 10px;font-family:Arail,'Liberation Mono', Menlo, Courier, monospace;}" << endl;
     ofs << ".menu_item {text-align:left;padding-top:5px;font-size:18px;}" << endl;


=====================================
src/htmlreporter.h
=====================================
@@ -33,6 +33,7 @@ private:
     void printCSS(ofstream& ofs);
     void printJS(ofstream& ofs);
     void printFooter(ofstream& ofs);
+    void reportQualHistogram(ofstream& ofs, string caption, Stats* stats1, Stats* stats2 = NULL);
     void reportDuplication(ofstream& ofs);
     void reportInsertSize(ofstream& ofs, int isizeLimit);
     void printSummary(ofstream& ofs, FilterResult* result, Stats* preStats1, Stats* postStats1, Stats* preStats2, Stats* postStats2);


=====================================
src/main.cpp
=====================================
@@ -26,7 +26,7 @@ int main(int argc, char* argv[]){
         return 0;
     }
     if (argc == 2 && (strcmp(argv[1], "-v")==0 || strcmp(argv[1], "--version")==0)){
-        cerr << "fastp " << FASTP_VER << endl;
+        cout << "fastp " << FASTP_VER << endl;
         return 0;
     }
     cmdline::parser cmd;
@@ -57,7 +57,8 @@ int main(int argc, char* argv[]){
     cmd.add<string>("adapter_sequence", 'a', "the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped.", false, "auto");
     cmd.add<string>("adapter_sequence_r2", 0, "the adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as <adapter_sequence>", false, "auto");
     cmd.add<string>("adapter_fasta", 0, "specify a FASTA file to trim both read1 and read2 (if PE) by all the sequences in this FASTA file", false, "");
-    cmd.add("detect_adapter_for_pe", 0, "by default, the auto-detection for adapter is for SE data input only, turn on this option to enable it for PE data.");
+    cmd.add("detect_adapter_for_pe", '2', "enable adapter detection for PE data to get ultra-clean data. It takes more time to find just a little bit more adapters.");
+    cmd.add("allow_gap_overlap_trimming", 0, "allow up to one gap when trim adapters by overlap analysis for PE data. By default no gap is allowed.");
 
     // trimming
     cmd.add<int>("trim_front1", 'f', "trimming how many bases in front for read1, default is 0", false, 0);
@@ -215,6 +216,7 @@ int main(int argc, char* argv[]){
     // adapter cutting
     opt.adapter.enabled = !cmd.exist("disable_adapter_trimming");
     opt.adapter.detectAdapterForPE = cmd.exist("detect_adapter_for_pe");
+    opt.adapter.allowGapOverlapTrimming = cmd.exist("allow_gap_overlap_trimming");
     opt.adapter.sequence = cmd.get<string>("adapter_sequence");
     opt.adapter.sequenceR2 = cmd.get<string>("adapter_sequence_r2");
     opt.adapter.fastaFile = cmd.get<string>("adapter_fasta");


=====================================
src/matcher.cpp
=====================================
@@ -0,0 +1,101 @@
+#include "matcher.h"
+
+Matcher::Matcher(){
+}
+
+
+Matcher::~Matcher(){
+}
+
+bool Matcher::matchWithOneInsertion(const char* insData, const char* normalData, int cmplen, int diffLimit) {
+    // accumulated mismatches from left/right
+    int accMismatchFromLeft[cmplen];
+    int accMismatchFromRight[cmplen];
+
+    // accMismatchFromLeft[0]: head vs. head
+    // accMismatchFromRight[cmplen-1]: tail vs. tail
+    accMismatchFromLeft[0] = insData[0] == normalData[0] ? 0 : 1;
+    accMismatchFromRight[cmplen-1] = insData[cmplen] == normalData[cmplen-1] ? 0 : 1;
+    for(int i=1; i<cmplen; i++) {
+        if(insData[i] != normalData[i])
+            accMismatchFromLeft[i] = accMismatchFromLeft[i-1]+1;
+        else
+            accMismatchFromLeft[i] = accMismatchFromLeft[i-1];
+        
+        if(accMismatchFromLeft[i] + accMismatchFromRight[cmplen-1] >diffLimit)
+            break;
+    }
+    for(int i=cmplen - 2; i>=0; i--) {
+        if(insData[i+1] != normalData[i])
+            accMismatchFromRight[i] = accMismatchFromRight[i+1]+1;
+        else
+            accMismatchFromRight[i] = accMismatchFromRight[i+1];
+        if(accMismatchFromRight[i] + accMismatchFromLeft[0]> diffLimit) {
+            for(int p=0; p<i; p++)
+                accMismatchFromRight[p] = diffLimit+1;
+            break;
+        }
+    }
+
+    //    insData:     XXXXXXXXXXXXXXXXXXXXXXX[i]XXXXXXXXXXXXXXXXXXXXXXXX
+    // normalData:     YYYYYYYYYYYYYYYYYYYYYYY   YYYYYYYYYYYYYYYYYYYYYYYY
+    //       diff:    accMismatchFromLeft[i-1] + accMismatchFromRight[i]
+
+    // insertion can be from pos = 1 to cmplen - 1
+    for(int i=1; i<cmplen; i++) {
+        if(accMismatchFromLeft[i-1] + accMismatchFromRight[cmplen-1]> diffLimit)
+            return false;
+        int diff = accMismatchFromLeft[i-1] + accMismatchFromRight[i];
+        if(diff <= diffLimit)
+            return true;
+    }
+
+    return false;
+}
+
+int Matcher::diffWithOneInsertion(const char* insData, const char* normalData, int cmplen, int diffLimit) {
+    // accumulated mismatches from left/right
+    int accMismatchFromLeft[cmplen];
+    int accMismatchFromRight[cmplen];
+
+    // accMismatchFromLeft[0]: head vs. head
+    // accMismatchFromRight[cmplen-1]: tail vs. tail
+    accMismatchFromLeft[0] = insData[0] == normalData[0] ? 0 : 1;
+    accMismatchFromRight[cmplen-1] = insData[cmplen] == normalData[cmplen-1] ? 0 : 1;
+    for(int i=1; i<cmplen; i++) {
+        if(insData[i] != normalData[i])
+            accMismatchFromLeft[i] = accMismatchFromLeft[i-1]+1;
+        else
+            accMismatchFromLeft[i] = accMismatchFromLeft[i-1];
+        
+        if(accMismatchFromLeft[i] + accMismatchFromRight[cmplen-1] >diffLimit)
+            break;
+    }
+    for(int i=cmplen - 2; i>=0; i--) {
+        if(insData[i+1] != normalData[i])
+            accMismatchFromRight[i] = accMismatchFromRight[i+1]+1;
+        else
+            accMismatchFromRight[i] = accMismatchFromRight[i+1];
+        if(accMismatchFromRight[i] + accMismatchFromLeft[0]> diffLimit) {
+            for(int p=0; p<i; p++)
+                accMismatchFromRight[p] = diffLimit+1;
+            break;
+        }
+    }
+
+    //    insData:     XXXXXXXXXXXXXXXXXXXXXXX[i]XXXXXXXXXXXXXXXXXXXXXXXX
+    // normalData:     YYYYYYYYYYYYYYYYYYYYYYY   YYYYYYYYYYYYYYYYYYYYYYYY
+    //       diff:    accMismatchFromLeft[i-1] + accMismatchFromRight[i]
+
+    int minDiff = 100000000;
+    // insertion can be from pos = 1 to cmplen - 1
+    for(int i=1; i<cmplen; i++) {
+        if(accMismatchFromLeft[i-1] + accMismatchFromRight[cmplen-1]> diffLimit)
+            return -1; // -1 means higher than diffLimit
+        int diff = accMismatchFromLeft[i-1] + accMismatchFromRight[i];
+        if(diff <= minDiff)
+            minDiff = diff;
+    }
+
+    return minDiff;
+}
\ No newline at end of file
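
A minimal usage sketch for the new matcher (assuming it is compiled together with src/matcher.cpp inside the fastp tree): insData is the sequence that carries one extra inserted base, so it must provide cmplen+1 readable characters, while normalData provides cmplen.

    #include <iostream>
    #include "matcher.h"

    int main() {
        // The same sequence with one extra 'T' inserted should still match the
        // reference when a single insertion is tolerated.
        const char* withInsertion = "AGATCGGTAAGAGC";  // 14 chars (one base inserted)
        const char* reference     = "AGATCGGAAGAGC";   // 13 chars
        int cmplen = 13;    // length compared on the normal (reference) side
        int diffLimit = 1;  // allow at most one mismatch besides the insertion
        bool ok = Matcher::matchWithOneInsertion(withInsertion, reference, cmplen, diffLimit);
        std::cout << (ok ? "matched with one insertion" : "no match") << std::endl;
        return 0;
    }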


=====================================
src/matcher.h
=====================================
@@ -0,0 +1,22 @@
+#ifndef MATCHER_H
+#define MATCHER_H
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string>
+
+using namespace std;
+
+class Matcher{
+public:
+    Matcher();
+    ~Matcher();
+
+    static bool matchWithOneInsertion(const char* insData, const char* normalData, int cmplen, int diffLimit);
+    static int diffWithOneInsertion(const char* insData, const char* normalData, int cmplen, int diffLimit);
+
+
+};
+
+
+#endif
\ No newline at end of file


=====================================
src/options.cpp
=====================================
@@ -94,6 +94,17 @@ bool Options::validate() {
         check_file_valid(in2);
     }
 
+    if(outputToSTDOUT) {
+        if(!out1.empty()) {
+            cerr << "In STDOUT mode, ignore the out1 filename " << out1 << endl;
+            out1 = "";
+        }
+        if(!out2.empty()) {
+            cerr << "In STDOUT mode, ignore the out2 filename " << out2 << endl;
+            out2 = "";
+        }
+    }
+
     if(merge.enabled) {
         if(split.enabled) {
             error_exit("splitting mode cannot work with merging mode");
@@ -278,9 +289,9 @@ bool Options::validate() {
 
     if(thread < 1) {
         thread = 1;
-    } else if(thread > 16) {
-        cerr << "WARNING: fastp uses up to 16 threads although you specified " << thread << endl;
-        thread = 16;
+    } else if(thread > 64) {
+        cerr << "WARNING: fastp uses up to 64 threads although you specified " << thread << endl;
+        thread = 64;
     }
 
     if(trim.front1 < 0 || trim.front1 > 30)


=====================================
src/options.h
=====================================
@@ -201,6 +201,7 @@ public:
         hasSeqR1 = false;
         hasSeqR2 = false;
         detectAdapterForPE = false;
+        allowGapOverlapTrimming = false;
     }
 public:
     bool enabled;
@@ -214,6 +215,7 @@ public:
     bool hasSeqR2;
     bool hasFasta;
     bool detectAdapterForPE;
+    bool allowGapOverlapTrimming;
 };
 
 class TrimmingOptions {


=====================================
src/overlapanalysis.cpp
=====================================
@@ -1,3 +1,4 @@
+#include "matcher.h"
 #include "overlapanalysis.h"
 
 OverlapAnalysis::OverlapAnalysis(){
@@ -7,12 +8,12 @@ OverlapAnalysis::OverlapAnalysis(){
 OverlapAnalysis::~OverlapAnalysis(){
 }
 
-OverlapResult OverlapAnalysis::analyze(Read* r1, Read* r2, int overlapDiffLimit, int overlapRequire, double diffPercentLimit) {
-    return analyze(r1->mSeq, r2->mSeq, overlapDiffLimit, overlapRequire, diffPercentLimit);
+OverlapResult OverlapAnalysis::analyze(Read* r1, Read* r2, int overlapDiffLimit, int overlapRequire, double diffPercentLimit, bool allowGap) {
+    return analyze(r1->mSeq, r2->mSeq, overlapDiffLimit, overlapRequire, diffPercentLimit, allowGap);
 }
 
 // ported from the python code of AfterQC
-OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit, int overlapRequire, double diffPercentLimit) {
+OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit, int overlapRequire, double diffPercentLimit, bool allowGap) {
     string rcr2 = Sequence::reverseComplement(r2);
     int len1 = r1->length();
     int len2 = rcr2.length();
@@ -26,7 +27,7 @@ OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit,
     int offset = 0;
     int diff = 0;
 
-    // forward
+    // forward with no gap
     // a match of less than overlapRequire is considered as unconfident
     while (offset < len1-overlapRequire) {
         // the overlap length of r1 & r2 when r2 is moved right for offset
@@ -49,6 +50,7 @@ OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit,
             ov.offset = offset;
             ov.overlap_len = overlap_len;
             ov.diff = diff;
+            ov.hasGap = false;
             return ov;
         }
 
@@ -56,7 +58,7 @@ OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit,
     }
 
 
-    // reverse
+    // reverse with no gap
     // in this case, the adapter is sequenced since TEMPLATE_LEN < SEQ_LEN
     // check if distance can get smaller if offset goes negative
     // this only happens when insert DNA is shorter than sequencing read length, and some adapter/primer is sequenced but not trimmed cleanly
@@ -83,15 +85,67 @@ OverlapResult OverlapAnalysis::analyze(string*  r1, string*  r2, int diffLimit,
             ov.offset = offset;
             ov.overlap_len = overlap_len;
             ov.diff = diff;
+            ov.hasGap = false;
             return ov;
         }
 
         offset -= 1;
     }
 
+    if(allowGap) {
+        // forward with one gap
+        offset = 0;
+        while (offset < len1-overlapRequire) {
+            // the overlap length of r1 & r2 when r2 is moved right for offset
+            overlap_len = min(len1 - offset, len2);
+            int overlapDiffLimit = min(diffLimit, (int)(overlap_len * diffPercentLimit));
+
+            int diff = Matcher::diffWithOneInsertion(str1 + offset, str2, overlap_len-1, overlapDiffLimit);
+            if(diff <0 || diff > overlapDiffLimit)
+                diff = Matcher::diffWithOneInsertion(str2, str1 + offset, overlap_len-1, overlapDiffLimit);
+            
+            if (diff <= overlapDiffLimit && diff >=0){
+                OverlapResult ov;
+                ov.overlapped = true;
+                ov.offset = offset;
+                ov.overlap_len = overlap_len;
+                ov.diff = diff;
+                ov.hasGap = true;
+                return ov;
+            }
+
+            offset += 1;
+        }
+
+        // reverse with one gap
+        offset = 0;
+        while (offset > -(len2-overlapRequire)){
+            // the overlap length of r1 & r2 when r2 is moved right for offset
+            overlap_len = min(len1,  len2- abs(offset));
+            int overlapDiffLimit = min(diffLimit, (int)(overlap_len * diffPercentLimit));
+
+            int diff = Matcher::diffWithOneInsertion(str1, str2-offset, overlap_len-1, overlapDiffLimit);
+            if(diff <0 || diff > overlapDiffLimit)
+                diff = Matcher::diffWithOneInsertion(str2-offset, str1, overlap_len-1, overlapDiffLimit);
+            
+            if (diff <= overlapDiffLimit && diff >=0){
+                OverlapResult ov;
+                ov.overlapped = true;
+                ov.offset = offset;
+                ov.overlap_len = overlap_len;
+                ov.diff = diff;
+                ov.hasGap = true;
+                return ov;
+            }
+
+            offset -= 1;
+        }
+    }
+
     OverlapResult ov;
     ov.overlapped = false;
     ov.offset = ov.overlap_len = ov.diff = 0;
+    ov.hasGap = false;
     return ov;
 }
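
The analyze() overloads gain an allowGap parameter (default false) and OverlapResult gains hasGap, so callers can tell whether the reported overlap relied on a single-base insertion found via Matcher::diffWithOneInsertion(). A minimal caller sketch (hypothetical read strings; the include of sequence.h for Sequence::reverseComplement is an assumption about the fastp source tree, and the whole sketch assumes linking against those sources):

    #include <iostream>
    #include <string>
    #include "sequence.h"            // assumed location of Sequence::reverseComplement
    #include "overlapanalysis.h"

    int main() {
        // Simulate a read pair whose insert is shorter than 2x read length, so the reads overlap:
        // r2 is built as the reverse complement of the last 20 bases of r1.
        std::string r1 = "TTGACCAGTACGGATCCAGTACGTTAGCAGTC";
        std::string tail = r1.substr(r1.length() - 20);
        std::string r2 = Sequence::reverseComplement(&tail);

        // diffLimit=5, overlapRequire=10, diffPercentLimit=0.2, allowGap=true (the new parameter)
        OverlapResult ov = OverlapAnalysis::analyze(&r1, &r2, 5, 10, 0.2, true);
        std::cout << "overlapped=" << ov.overlapped << " offset=" << ov.offset
                  << " overlap_len=" << ov.overlap_len << " diff=" << ov.diff
                  << " hasGap=" << ov.hasGap << std::endl;
        // No insertion is needed for this pair, so an overlap should be reported with hasGap == false;
        // hasGap only becomes true when the match is found in the allowGap branches above.
        return 0;
    }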
 


=====================================
src/overlapanalysis.h
=====================================
@@ -18,6 +18,7 @@ public:
     int offset;
     int overlap_len;
     int diff;
+    bool hasGap;
 };
 
 class OverlapAnalysis{
@@ -25,8 +26,8 @@ public:
     OverlapAnalysis();
     ~OverlapAnalysis();
 
-    static OverlapResult analyze(string*  r1, string*  r2, int diffLimit, int overlapRequire, double diffPercentLimit);
-    static OverlapResult analyze(Read* r1, Read* r2, int diffLimit, int overlapRequire, double diffPercentLimit);
+    static OverlapResult analyze(string*  r1, string*  r2, int diffLimit, int overlapRequire, double diffPercentLimit, bool allowGap = false);
+    static OverlapResult analyze(Read* r1, Read* r2, int diffLimit, int overlapRequire, double diffPercentLimit, bool allowGap = false);
     static Read* merge(Read* r1, Read* r2, OverlapResult ov);
 
 public:


=====================================
src/peprocessor.cpp
=====================================
@@ -30,6 +30,7 @@ PairEndProcessor::PairEndProcessor(Options* opt){
     mMergedWriter = NULL;
     mFailedWriter = NULL;
     mOverlappedWriter = NULL;
+    shouldStopReading = false;
 
     mDuplicate = NULL;
     if(mOptions->duplicate.enabled) {
@@ -363,8 +364,8 @@ bool PairEndProcessor::processPairEnd(ReadPack* leftPack, ReadPack* rightPack, T
         cerr << "WARNNIG: different read numbers of the " << mPackProcessedCounter << " pack" << endl;
         cerr << "Read1 pack size: " << leftPack->count << endl;
         cerr << "Read2 pack size: " << rightPack->count << endl;
-        cerr << endl;
-	error_exit("input files don't contain identical amount of reads");
+        cerr << "Ignoring the unmatched reads" << endl << endl;
+        shouldStopReading = true;
     }
     int tid = config->getThreadId();
 
@@ -430,13 +431,14 @@ bool PairEndProcessor::processPairEnd(ReadPack* leftPack, ReadPack* rightPack, T
         }
         bool isizeEvaluated = false;
         if(r1 != NULL && r2!=NULL && (mOptions->adapter.enabled || mOptions->correction.enabled)){
-            OverlapResult ov = OverlapAnalysis::analyze(r1, r2, mOptions->overlapDiffLimit, mOptions->overlapRequire, mOptions->overlapDiffPercentLimit/100.0);
+            OverlapResult ov = OverlapAnalysis::analyze(r1, r2, mOptions->overlapDiffLimit, mOptions->overlapRequire, mOptions->overlapDiffPercentLimit/100.0, mOptions->adapter.allowGapOverlapTrimming);
             // we only use thread 0 to evaluate ISIZE
             if(config->getThreadId() == 0) {
                 statInsertSize(r1, r2, ov, frontTrimmed1, frontTrimmed2);
                 isizeEvaluated = true;
             }
-            if(mOptions->correction.enabled) {
+            if(mOptions->correction.enabled && !ov.hasGap) {
+                // no gap allowed for overlap correction
                 BaseCorrector::correctByOverlapAnalysis(r1, r2, config->getFilterResult(), ov);
             }
             if(mOptions->adapter.enabled) {
@@ -459,7 +461,7 @@ bool PairEndProcessor::processPairEnd(ReadPack* leftPack, ReadPack* rightPack, T
         if(r1 != NULL && r2!=NULL && mOverlappedWriter) {
             OverlapResult ov = OverlapAnalysis::analyze(r1, r2, mOptions->overlapDiffLimit, mOptions->overlapRequire, 0);
             if(ov.overlapped) {
-                Read* overlappedRead = new Read(r1->mName, new string(r1->mSeq->substr(max(0,ov.offset)), ov.overlap_len), r1->mStrand, new string(r1->mQuality->substr(max(0,ov.offset)), ov.overlap_len));
+                Read* overlappedRead = new Read(new string(*r1->mName), new string(r1->mSeq->substr(max(0,ov.offset)), ov.overlap_len), new string(*r1->mStrand), new string(r1->mQuality->substr(max(0,ov.offset)), ov.overlap_len));
                 overlappedRead->appendToString(overlappedOut);
                 recycleToPool1(tid, overlappedRead);
             }
@@ -735,6 +737,8 @@ void PairEndProcessor::readerTask(bool isLeft)
     int count=0;
     bool needToBreak = false;
     while(true){
+        if(shouldStopReading)
+            break;
         Read* read = reader->read();
         if(!read || needToBreak){
             // the last pack
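
Two behavioural changes in this file: a run no longer aborts when the R1 and R2 packs disagree in size; instead the atomic shouldStopReading flag is raised and both readerTask() loops check it, so reading stops and the unmatched tail is dropped. Overlap-based base correction is also skipped whenever the overlap was only found by allowing a gap. The stop-flag pattern, reduced to a standalone sketch (names mirror the diff, but this is not fastp code):

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    std::atomic<bool> shouldStopReading(false);

    // Stands in for PairEndProcessor::readerTask(): keep producing packs until told to stop.
    void readerTask(int id) {
        int packs = 0;
        while (true) {
            if (shouldStopReading)
                break;                              // consumer asked us to stop; drop the unmatched tail
            ++packs;                                // placeholder for reader->read() and pack assembly
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        std::cout << "reader " << id << " stopped after " << packs << " packs\n";
    }

    int main() {
        std::thread r1(readerTask, 1), r2(readerTask, 2);
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        shouldStopReading = true;                   // e.g. processPairEnd() saw different pack sizes
        r1.join();
        r2.join();
        return 0;
    }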


=====================================
src/peprocessor.h
=====================================
@@ -66,6 +66,7 @@ private:
     atomic_long mPackProcessedCounter;
     ReadPool* mLeftReadPool;
     ReadPool* mRightReadPool;
+    atomic_bool shouldStopReading;
 };
 
 


=====================================
src/seprocessor.cpp
=====================================
@@ -47,9 +47,9 @@ SingleEndProcessor::~SingleEndProcessor() {
 void SingleEndProcessor::initOutput() {
     if(!mOptions->failedOut.empty())
         mFailedWriter = new WriterThread(mOptions, mOptions->failedOut);
-    if(mOptions->out1.empty())
+    if(mOptions->out1.empty() && !mOptions->outputToSTDOUT)
         return;
-    mLeftWriter = new WriterThread(mOptions, mOptions->out1);
+    mLeftWriter = new WriterThread(mOptions, mOptions->out1, mOptions->outputToSTDOUT);
 }
 
 void SingleEndProcessor::closeOutput() {
@@ -281,9 +281,7 @@ bool SingleEndProcessor::processSingleEnd(ReadPack* pack, ThreadConfig* config){
             recycleToPool(tid, r1);
     }
 
-    if(mOptions->outputToSTDOUT) {
-        fwrite(outstr->c_str(), 1, outstr->length(), stdout);
-    } else if(mOptions->split.enabled) {
+    if(mOptions->split.enabled) {
         // split output by each worker thread
         if(!mOptions->out1.empty())
             config->getWriter1()->writeString(outstr);
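
STDOUT output for single-end data is no longer written by each worker thread via fwrite(); initOutput() now creates a WriterThread targeting stdout, so all records are funnelled through a single writer and cannot interleave mid-record. The funnelling idea, reduced to a standalone sketch (not fastp code):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    std::queue<std::string> outQueue;
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;

    // Worker threads enqueue formatted records instead of writing to stdout themselves.
    void worker(int id) {
        for (int i = 0; i < 3; i++) {
            std::lock_guard<std::mutex> lk(mtx);
            outQueue.push("@read_" + std::to_string(id) + "_" + std::to_string(i) + "\n");
            cv.notify_one();
        }
    }

    // A single consumer drains the queue, so records never interleave on stdout.
    void writerThread() {
        std::unique_lock<std::mutex> lk(mtx);
        while (!done || !outQueue.empty()) {
            cv.wait(lk, [] { return done || !outQueue.empty(); });
            while (!outQueue.empty()) {
                std::fputs(outQueue.front().c_str(), stdout);
                outQueue.pop();
            }
        }
    }

    int main() {
        std::thread w(writerThread);
        std::vector<std::thread> workers;
        for (int id = 0; id < 4; id++) workers.emplace_back(worker, id);
        for (auto& t : workers) t.join();
        { std::lock_guard<std::mutex> lk(mtx); done = true; }
        cv.notify_one();
        w.join();
        return 0;
    }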


=====================================
src/stats.cpp
=====================================
@@ -23,6 +23,7 @@ Stats::Stats(Options* opt, bool isRead2, int guessedCycles, int bufferMargin){
     mBases = 0;
     mQ20Total = 0;
     mQ30Total = 0;
+    mQ40Total = 0;
     summarized = false;
     mKmerMin = 0;
     mKmerMax = 0;
@@ -57,6 +58,8 @@ Stats::Stats(Options* opt, bool isRead2, int guessedCycles, int bufferMargin){
     mKmer = new long[mKmerBufLen];
     memset(mKmer, 0, sizeof(long)*mKmerBufLen);
 
+    memset(mBaseQualHistogram, 0, sizeof(long)*128);
+
     initOverRepSeq();
 }
 
@@ -163,6 +166,10 @@ void Stats::summarize(bool forced) {
         mQ30Total += mQ30Bases[i];
     }
 
+    for(char c=40; c<127-33; c++) {
+        mQ40Total += mBaseQualHistogram[c+33];
+    }
+
 
     // quality curve for mean qual
     double* meanQualCurve = new double[mCycles];
@@ -244,6 +251,8 @@ void Stats::statRead(Read* r) {
         const char q20 = '5';
         const char q30 = '?';
 
+        mBaseQualHistogram[qual]++;
+
         if(qual >= q30) {
             mCycleQ30Bases[b][i]++;
             mCycleQ20Bases[b][i]++;
@@ -367,6 +376,16 @@ long Stats::getQ30() {
     return mQ30Total;
 }
 
+long Stats::getQ40() {
+    if(!summarized)
+        summarize();
+    return mQ40Total;
+}
+
+long* Stats::getQualHist() {
+    return mBaseQualHistogram;
+}
+
 long Stats::getGCNumber() {
     if(!summarized)
         summarize();
@@ -381,6 +400,7 @@ void Stats::print() {
     cerr << "total bases: " << mBases << endl;
     cerr << "Q20 bases: " << mQ20Total << "(" << (mQ20Total*100.0)/mBases << "%)" << endl;
     cerr << "Q30 bases: " << mQ30Total << "(" << (mQ30Total*100.0)/mBases << "%)" << endl;
+    cerr << "Q40 bases: " << mQ40Total << "(" << (mQ40Total*100.0)/mBases << "%)" << endl;
 }
 
 void Stats::reportJson(ofstream& ofs, string padding) {
@@ -390,6 +410,7 @@ void Stats::reportJson(ofstream& ofs, string padding) {
     ofs << padding << "\t" << "\"total_bases\": " << mBases << "," << endl;
     ofs << padding << "\t" << "\"q20_bases\": " << mQ20Total << "," << endl;
     ofs << padding << "\t" << "\"q30_bases\": " << mQ30Total << "," << endl;
+    ofs << padding << "\t" << "\"q40_bases\": " << mQ40Total << "," << endl;
     ofs << padding << "\t" << "\"total_cycles\": " << mCycles << "," << endl;
 
     // quality curves
@@ -921,6 +942,11 @@ Stats* Stats::merge(vector<Stats*>& list) {
             s->mKmer[i] += list[t]->mKmer[i];
         }
 
+        // merge base/read qual histogram
+        for(int i=0; i<128; i++) {
+            s->mBaseQualHistogram[i] += list[t]->mBaseQualHistogram[i];
+        }
+
         // merge over rep seq
         for(iter = s->mOverRepSeq.begin(); iter != s->mOverRepSeq.end(); iter++) {
             string seq = iter->first;
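
Stats now keeps a 128-bucket per-base quality histogram indexed by the raw quality character, i.e. Phred quality Q lands in bucket Q+33, and summarize() derives the new Q40 total by summing the buckets for Q = 40 up to the top of the printable range. The indexing, as a standalone sketch (not fastp code; same loop bounds as the hunk above):

    #include <cstring>
    #include <iostream>

    int main() {
        long hist[128];
        std::memset(hist, 0, sizeof(hist));
        // Count three bases: Q38 ('G'), Q40 ('I') and Q41 ('J') in Phred+33 encoding.
        hist['G']++; hist['I']++; hist['J']++;

        long q40Total = 0;
        for (char c = 40; c < 127 - 33; c++)   // same bounds as the Q40 loop in Stats::summarize()
            q40Total += hist[c + 33];
        std::cout << "Q40 bases: " << q40Total << std::endl;   // prints 2 (Q38 is excluded)
        return 0;
    }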


=====================================
src/stats.h
=====================================
@@ -21,7 +21,9 @@ public:
     long getBases();
     long getQ20();
     long getQ30();
+    long getQ40();
     long getGCNumber();
+    long* getQualHist();
     // by default the qualified qual score is Q20 ('5')
     void statRead(Read* r);
 
@@ -75,6 +77,7 @@ private:
     long *mCycleTotalBase;
     long *mCycleTotalQual;
     long *mKmer;
+    long mBaseQualHistogram[128];
 
     map<string, double*> mQualityCurves;
     map<string, double*> mContentCurves;
@@ -90,6 +93,7 @@ private:
     long mBaseContents[8];
     long mQ20Total;
     long mQ30Total;
+    long mQ40Total;
     bool summarized;
     long mKmerMax;
     long mKmerMin;


=====================================
src/writer.cpp
=====================================
@@ -26,7 +26,7 @@ SOFTWARE.
 #include "util.h"
 #include <string.h>
 
-Writer::Writer(Options* opt, string filename, int compression){
+Writer::Writer(Options* opt, string filename, int compression, bool isSTDOUT){
 	mCompression = compression;
 	mFilename = filename;
 	mCompressor = NULL;
@@ -36,6 +36,7 @@ Writer::Writer(Options* opt, string filename, int compression){
 	mBufDataLen = 0;
 	mOptions = opt;
 	mBufSize = mOptions->writerBufferSize;
+	mSTDOUT = isSTDOUT;
 	init();
 }
 
@@ -62,6 +63,10 @@ void Writer::init(){
 	if(mBuffer == NULL) {
 		error_exit("Failed to allocate write buffer with size: " + to_string(mBufSize));
 	}
+	if(mSTDOUT) {
+		mFP = stdout;
+		return ;
+	}
 	if (ends_with(mFilename, ".gz")){
 		mCompressor = libdeflate_alloc_compressor(mCompression);
 		if(mCompressor == NULL) {
@@ -138,7 +143,7 @@ void Writer::close(){
 		free(mBuffer);
 		mBuffer = NULL;
 	}
-	if(mFP) {
+	if(mFP && !mSTDOUT) {
 		fclose(mFP);
 		mFP = NULL;
 	}
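
Writer can now target stdout directly: with isSTDOUT set, init() points mFP at stdout and skips the compressor setup, and close() frees the buffer but never calls fclose() on stdout. The guard pattern, reduced to a standalone sketch (not fastp code):

    #include <cstdio>

    struct PlainWriter {
        FILE* fp = nullptr;
        bool  toStdout = false;

        void open(const char* filename, bool useStdout) {
            toStdout = useStdout;
            fp = toStdout ? stdout : std::fopen(filename, "wb");
        }
        void write(const char* data, std::size_t len) {
            if (fp) std::fwrite(data, 1, len, fp);
        }
        void close() {
            if (fp && !toStdout)   // never close stdout; the process still owns it
                std::fclose(fp);
            fp = nullptr;
        }
    };

    int main() {
        PlainWriter w;
        w.open(nullptr, true);     // write to stdout; no file is opened
        w.write("hello\n", 6);
        w.close();                 // cleanup only; stdout stays open
        return 0;
    }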


=====================================
src/writer.h
=====================================
@@ -38,7 +38,7 @@ using namespace std;
 
 class Writer{
 public:
-	Writer(Options* opt, string filename, int compression);
+	Writer(Options* opt, string filename, int compression, bool isSTDOUT = false);
 	~Writer();
 	bool isZipped();
 	bool writeString(const string& str);
@@ -67,6 +67,7 @@ private:
 	size_t mBufDataLen;
 	size_t mBufSize;
 	Options* mOptions;
+	bool mSTDOUT;
 };
 
 #endif
\ No newline at end of file


=====================================
src/writerthread.cpp
=====================================
@@ -3,7 +3,7 @@
 #include <memory.h>
 #include <unistd.h>
 
-WriterThread::WriterThread(Options* opt, string filename){
+WriterThread::WriterThread(Options* opt, string filename, bool isSTDOUT){
     mOptions = opt;
 
     mWriter1 = NULL;
@@ -11,7 +11,7 @@ WriterThread::WriterThread(Options* opt, string filename){
     mInputCompleted = false;
     mFilename = filename;
 
-    initWriter(filename);
+    initWriter(filename, isSTDOUT);
     initBufferLists();
     mWorkingBufferList = 0; // 0 ~ mOptions->thread-1
     mBufferLength = 0;
@@ -69,9 +69,9 @@ void WriterThread::deleteWriter() {
     }
 }
 
-void WriterThread::initWriter(string filename1) {
+void WriterThread::initWriter(string filename1, bool isSTDOUT) {
     deleteWriter();
-    mWriter1 = new Writer(mOptions, filename1, mOptions->compression);
+    mWriter1 = new Writer(mOptions, filename1, mOptions->compression, isSTDOUT);
 }
 
 void WriterThread::initBufferLists() {


=====================================
src/writerthread.h
=====================================
@@ -15,10 +15,10 @@ using namespace std;
 
 class WriterThread{
 public:
-    WriterThread(Options* opt, string filename);
+    WriterThread(Options* opt, string filename, bool isSTDOUT = false);
     ~WriterThread();
 
-    void initWriter(string filename1);
+    void initWriter(string filename1, bool isSTDOUT = false);
     void initBufferLists();
 
     void cleanup();



View it on GitLab: https://salsa.debian.org/med-team/fastp/-/commit/75395a60dc0c4274337f4da4c31149002d6aa530





More information about the debian-med-commit mailing list