[med-svn] [Git][med-team/vsearch][upstream] New upstream version 2.21.0

Andreas Tille (@tille) gitlab at salsa.debian.org
Mon Jan 17 14:49:14 GMT 2022



Andreas Tille pushed to branch upstream at Debian Med / vsearch


Commits:
e1364a48 by Andreas Tille at 2022-01-17T15:39:56+01:00
New upstream version 2.21.0
- - - - -


30 changed files:

- Dockerfile
- README.md
- configure.ac
- man/vsearch.1
- src/Makefile.am
- src/allpairs.cc
- src/chimera.cc
- src/cluster.cc
- src/derep.cc
- src/derep.h
- src/eestats.cc
- src/fasta.cc
- src/fastq.cc
- src/fastqops.cc
- src/mask.cc
- src/mergepairs.cc
- src/rerep.cc
- src/results.cc
- src/results.h
- src/search.cc
- src/searchexact.cc
- src/shuffle.cc
- src/sintax.cc
- src/sortbylength.cc
- src/sortbysize.cc
- + src/tax.cc
- + src/tax.h
- src/udb.cc
- src/vsearch.cc
- src/vsearch.h


Changes:

=====================================
Dockerfile
=====================================
@@ -5,7 +5,7 @@ RUN apk add --no-cache \
         libstdc++ zlib-dev bzip2-dev \
         autoconf automake make g++ && \
     ./autogen.sh && \
-    ./configure && \
+    ./configure CFLAGS="-O3" CXXFLAGS="-O3" && \
     make clean && \
     make && \
     make install && \


=====================================
README.md
=====================================
@@ -37,7 +37,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo
 
 ## Getting Help
 
-If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
+If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
 
 ## Example
 
@@ -50,16 +50,16 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
 **Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:
 
 ```
-wget https://github.com/torognes/vsearch/archive/v2.18.0.tar.gz
-tar xzf v2.18.0.tar.gz
-cd vsearch-2.18.0
+wget https://github.com/torognes/vsearch/archive/v2.21.0.tar.gz
+tar xzf v2.21.0.tar.gz
+cd vsearch-2.21.0
 ./autogen.sh
-./configure
+./configure CFLAGS="-O3" CXXFLAGS="-O3"
 make
 make install  # as root or sudo make install
 ```
 
-You may customize the installation directory using the `--prefix=DIR` option to `configure`. If the compression libraries [zlib](https://www.zlib.net) and/or [bzip2](https://www.sourceware.org/bzip2/) are installed on the system, they will be detected automatically and support for compressed files will be included in vsearch. Support for compressed files may be disabled using the `--disable-zlib` and `--disable-bzip2` options to `configure`. A PDF version of the manual will be created from the `vsearch.1` manual file if `ps2pdf` is available, unless disabled using the `--disable-pdfman` option to `configure`. Other  options may also be applied to `configure`, please run `configure -h` to see them all. GNU autoconf (version 2.63 or later), automake and the GCC C++ compiler is required to build vsearch.
+You may customize the installation directory using the `--prefix=DIR` option to `configure`. If the compression libraries [zlib](https://www.zlib.net) and/or [bzip2](https://www.sourceware.org/bzip2/) are installed on the system, they will be detected automatically and support for compressed files will be included in vsearch. Support for compressed files may be disabled using the `--disable-zlib` and `--disable-bzip2` options to `configure`. A PDF version of the manual will be created from the `vsearch.1` manual file if `ps2pdf` is available, unless disabled using the `--disable-pdfman` option to `configure`. It is recommended to run configure with the options `CFLAGS="-O3"` and `CXXFLAGS="-O3"`. Other options may also be applied to `configure`; please run `configure -h` to see them all. GNU autoconf (version 2.63 or later), automake and the GCC C++ compiler are required to build vsearch. Version 3.82 or later of Make may be required on Linux, while version 3.81 is sufficient on macOS.
 
 The distributed Linux ppc64le and aarch64 binaries and the Windows binary were compiled using the [Mingw-w64](http://mingw-w64.org/) C++ cross-compiler.
 
@@ -81,43 +81,43 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
 Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch-2.18.0-linux-x86_64.tar.gz
-tar xzf vsearch-2.18.0-linux-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch-2.21.0-linux-x86_64.tar.gz
+tar xzf vsearch-2.21.0-linux-x86_64.tar.gz
 ```
 
 Or these commands if you are using a Linux ppc64le system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch-2.18.0-linux-ppc64le.tar.gz
-tar xzf vsearch-2.18.0-linux-ppc64le.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch-2.21.0-linux-ppc64le.tar.gz
+tar xzf vsearch-2.21.0-linux-ppc64le.tar.gz
 ```
 
 Or these commands if you are using a Linux aarch64 (arm64) system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch-2.18.0-linux-aarch64.tar.gz
-tar xzf vsearch-2.18.0-linux-aarch64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch-2.21.0-linux-aarch64.tar.gz
+tar xzf vsearch-2.21.0-linux-aarch64.tar.gz
 ```
 
 Or these commands if you are using a Mac:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch-2.18.0-macos-x86_64.tar.gz
-tar xzf vsearch-2.18.0-macos-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch-2.21.0-macos-x86_64.tar.gz
+tar xzf vsearch-2.21.0-macos-x86_64.tar.gz
 ```
 
 Or if you are using Windows, download and extract (unzip) the contents of this file:
 
 ```
-https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch-2.18.0-win-x86_64.zip
+https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch-2.21.0-win-x86_64.zip
 ```
 
-Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.18.0-linux-x86_64` or `vsearch-2.18.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`. Versions with statically compiled libraries are available for Linux systems. These have "-static" in their name, and could be used on systems that do not have all the necessary libraries installed.
+Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.21.0-linux-x86_64` or `vsearch-2.21.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`. Versions with statically compiled libraries are available for Linux systems. These have "-static" in their name, and could be used on systems that do not have all the necessary libraries installed.
 
-Windows: You will now have the binary distribution in a folder called `vsearch-2.18.0-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
+Windows: You will now have the binary distribution in a folder called `vsearch-2.21.0-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
 
 
-**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.18.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
+**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A PDF version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.21.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
 
 
 ## Packages, plugins, and wrappers
@@ -191,7 +191,7 @@ VSEARCH may include code from the [bzip2](https://www.sourceware.org/bzip2/) lib
 
 ## Code
 
-The code is written in C++ but most of it is actually mostly C with some C++ syntax conventions.
+The code is written mostly in C++.
 
 File | Description
 ---|---
@@ -205,6 +205,7 @@ File | Description
 **city.cc** | CityHash code
 **cluster.cc** | Clustering (cluster\_fast and cluster\_smallmem)
 **cpu.cc** | Code dependent on specific cpu features (e.g. ssse3)
+**cut.cc** | Restriction site cutting
 **db.cc** | Handles the database file read, access etc
 **dbhash.cc** | Database hashing for exact searches
 **dbindex.cc** | Indexes the database by identifying unique kmers in the sequences
@@ -238,9 +239,11 @@ File | Description
 **sha1.c** | SHA1 message digest
 **showalign.cc** | Output an alignment in a human-readable way given a CIGAR-string and the sequences
 **shuffle.cc** | Shuffle sequences
+**sintax.cc** | Taxonomic classification using Sintax method
 **sortbylength.cc** | Code for sorting by length
 **sortbysize.cc** | Code for sorting by size (abundance)
 **subsample.cc** | Subsampling reads from a FASTA file
+**tax.cc** | Taxonomy information parsing
 **udb.cc** | UDB database file handling
 **unique.cc** | Find unique kmers in a sequence
 **userfields.cc** | Code for parsing the userfields option argument
@@ -316,6 +319,11 @@ doi:[10.1093/bioinformatics/btq461](https://doi.org/10.1093/bioinformatics/btq46
 *Bioinformatics*, 27 (16): 2194-2200.
 doi:[10.1093/bioinformatics/btr381](https://doi.org/10.1093/bioinformatics/btr381)
 
+* Edgar RC, Flyvbjerg H (2015)
+**Error filtering, pair assembly and error correction for next-generation sequencing reads.**
+*Bioinformatics*, 31 (21): 3476-3482.
+doi:[10.1093/bioinformatics/btv401](https://doi.org/10.1093/bioinformatics/btv401)
+
 * Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, Boutte C, Burgaud G, de Vargas C, Decelle J, del Campo J, Dolan J, Dunthorn M, Edvardsen B, Holzmann M, Kooistra W, Lara E, Lebescot N, Logares R, Mahé F, Massana R, Montresor M, Morard R, Not F, Pawlowski J, Probert I, Sauvadet A-L, Siano R, Stoeck T, Vaulot D, Zimmermann P & Christen R (2013)
 **The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy.**
 *Nucleic Acids Research*, 41 (D1), D597-D604.


=====================================
configure.ac
=====================================
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ([2.63])
-AC_INIT([vsearch], [2.18.0], [torognes at ifi.uio.no], [vsearch], [https://github.com/torognes/vsearch])
+AC_INIT([vsearch], [2.21.0], [torognes at ifi.uio.no], [vsearch], [https://github.com/torognes/vsearch])
 AC_CANONICAL_TARGET
 AM_INIT_AUTOMAKE([subdir-objects])
 AC_LANG([C++])
@@ -12,11 +12,6 @@ AC_CONFIG_HEADERS([config.h])
 AC_SUBST(MACOSX_DEPLOYMENT_TARGET)
 MACOSX_DEPLOYMENT_TARGET="10.9"
 
-# Set default gcc and g++ options
-
-CFLAGS='-g'
-CXXFLAGS='-g -std=c++11'
-
 # Checks for programs.
 AC_PROG_CXX
 AC_PROG_RANLIB


=====================================
man/vsearch.1
=====================================
@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH vsearch 1 "August 27, 2021" "version 2.18.0" "USER COMMANDS"
+.TH vsearch 1 "January 12, 2022" "version 2.21.0" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 vsearch \(em a versatile open-source tool for microbiome analysis,
@@ -36,6 +36,10 @@ Clustering:
 .RE
 Dereplication and rereplication:
 .RS
+\fBvsearch\fR \-\-fastx_uniques (\fIfastafile\fR | \fIfastqfile\fR)
+(\-\-fastaout | \-\-fastqout | \-\-tabbedout | \-\-uc) \fIoutputfile\fR
+[\fIoptions\fR]
+.PP
 \fBvsearch\fR (\-\-derep_fulllength | \-\-derep_id | \-\-derep_prefix)
 \fIfastafile\fR (\-\-output | \-\-uc) \fIoutputfile\fR [\fIoptions\fR]
 .PP
@@ -140,13 +144,14 @@ Searching:
 .RS
 \fBvsearch\fR \-\-search_exact \fIfastafile\fR \-\-db \fIfastafile\fR
 (\-\-alnout | \-\-biomout | \-\-blast6out | \-\-mothur_shared_out |
-\-\-otutabout | \-\-samout | \-\-uc | \-\-userout) \fIoutputfile\fR
-[\fIoptions\fR]
+\-\-otutabout | \-\-samout | \-\-uc | \-\-userout | \-\-lcaout)
+\fIoutputfile\fR [\fIoptions\fR]
 .PP
 \fBvsearch\fR \-\-usearch_global \fIfastafile\fR \-\-db
 \fIfastafile\fR (\-\-alnout | \-\-biomout | \-\-blast6out |
 \-\-mothur_shared_out | \-\-otutabout | \-\-samout | \-\-uc |
-\-\-userout) \fIoutputfile\fR \-\-id \fIreal\fR [\fIoptions\fR]
+\-\-userout | \-\-lcaout) \fIoutputfile\fR \-\-id \fIreal\fR
+[\fIoptions\fR]
 .PP
 .RE
 Shuffling and sorting:
@@ -346,6 +351,11 @@ the value to zero to eliminate the wrapping.
 When reading from a pipe streaming gzip-compressed data, decompress
 the data. This option is not needed when reading from a standard
 gzip-compressed file.
+.TAG label_suffix
+.TP
+.BI \-\-label_suffix\~ string
+When writing FASTA or FASTQ files, add the suffix \fIstring\fR to
+sequence headers.
 .TAG log
 .TP
 .BI \-\-log \0filename
@@ -383,6 +393,12 @@ the sintax command.
 .B \-\-quiet
 Suppress all messages to stdout and stderr except for warnings and
 fatal error messages.
+.TAG sample
+.TP
+.BI \-\-sample\~ string
+When writing FASTA or FASTQ files, add the given sample identifier
+\fIstring\fR to sequence headers. For instance, if the given string is
+ABC, the text ";sample=ABC" will be added to the header.
 .TAG threads
 .TP
 .BI \-\-threads\~ "positive integer"
@@ -449,11 +465,13 @@ order may vary when using multiple threads.
 .TAG db
 .TP
 .BI \-\-db \0filename
-When using \-\-uchime_ref, detect chimeras using the fasta-formatted
-reference sequences contained in \fIfilename\fR. Reference sequences
-are assumed to be chimera-free. Chimeras cannot be detected if their
-parents, or sufficiently close relatives, are not present in the
-database.
+When using \-\-uchime_ref, detect chimeras using the reference
+sequences contained in \fIfilename\fR. Reference sequences are assumed
+to be chimera-free. Chimeras cannot be detected if their parents, or
+sufficiently close relatives, are not present in the database. The
+file name must refer to a FASTA file or to a UDB file. If a UDB file
+is used, it should be created using the \-\-makeudb_usearch command
+with the \-\-dbmask dust option.
 .TAG dn
 .TP
 .BI \-\-dn\~ "strictly positive real number"
@@ -884,6 +902,11 @@ Mask regions in sequences using the
 \fIdust\fR or the \fIsoft\fR methods, or do not mask
 (\fInone\fR). Warning, when using \fIsoft\fR masking, clustering
 becomes case sensitive. The default is to mask using \fIdust\fR.
+.TAG qsegout
+.TP
+.BI \-\-qsegout \0filename
+Write the aligned part of each query sequence to \fIfilename\fR in
+FASTA format.
 .TAG relabel
 .TP
 .BI \-\-relabel \0string
@@ -945,6 +968,11 @@ per cluster for centroids.
 .BI \-\-strand\~ "plus|both"
 When comparing sequences with the cluster seed, check the \fIplus\fR
 strand only (default) or check \fIboth\fR strands.
+.TAG tsegout
+.TP
+.BI \-\-tsegout \0filename
+Write the aligned part of each target sequence to \fIfilename\fR in
+FASTA format.
 .TAG uc
 .TP
 .BI \-\-uc \0filename
@@ -1013,7 +1041,32 @@ definitions): \-\-alnout, \-\-blast6out, \-\-fastapairs, \-\-matched,
 .\" ----------------------------------------------------------------------------
 .TAG dereplication-and-rereplication-options
 Dereplication and rereplication options:
+.PP
 .RS
+VSEARCH can dereplicate sequences with the commands
+\-\-derep_fulllength, \-\-derep_id, \-\-derep_prefix and
+\-\-fastx_uniques. The \-\-derep_fulllength command is deprecated and
+is replaced by the new \-\-fastx_uniques command that can also handle
+FASTQ files in addition to FASTA files. The \-\-derep_fulllength and
+\-\-fastx_uniques commands require strictly identical sequences of
+the same length, but ignore upper/lower case and treat T and U as
+identical symbols. The \-\-derep_id command requires both identical
+sequences and identical headers/labels. The \-\-derep_prefix command
+will group sequences with a common prefix and does not require them to
+be equally long. The \-\-fastx_uniques command can write FASTQ output
+(specified with \-\-fastqout) or FASTA output (specified with
+\-\-fastaout) as well as a special tab-separated column text format
+(with \-\-tabbedout). The other commands can write FASTA output to the
+file specified with the \-\-output option. All dereplication commands
+can write output to a special UCLUST-like file specified with the
+\-\-uc option. The \-\-rereplicate command can duplicate sequences in
+the input file according to the abundance of each input
+sequence. Other valid options are \-\-fastq_ascii, \-\-fastq_asciiout,
+\-\-fastq_qmax, \-\-fastq_qmaxout, \-\-fastq_qmin, \-\-fastq_qminout,
+\-\-fastq_qout_max, \-\-maxuniquesize, \-\-minuniquesize, \-\-relabel,
+\-\-relabel_keep, \-\-relabel_md5, \-\-relabel_self, \-\-relabel_sha1,
+\-\-sizein, \-\-sizeout, \-\-strand, \-\-topn, and \-\-xsize.
+.PP
 .TAG derep_fulllength
 .TP 9
 .BI \-\-derep_fulllength \0filename
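
The matching rule described in the hunk above (same length, case ignored, T and U treated as identical) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the manual text, not vsearch's actual implementation; the function names `normalize` and `dereplicate` are hypothetical.

```python
def normalize(seq):
    """Uppercase the sequence and map U to T, so that case and T/U
    differences are ignored when comparing sequences."""
    return seq.upper().replace("U", "T")


def dereplicate(records):
    """Group strictly identical sequences.

    records: iterable of (header, sequence) pairs.
    Returns a list of (header, sequence, abundance) tuples sorted by
    decreasing abundance. Each group keeps the header and sequence of
    its first member, as the manual describes for the output files.
    """
    groups = {}  # normalized sequence -> [header, sequence, count]
    for header, seq in records:
        key = normalize(seq)
        if key in groups:
            groups[key][2] += 1
        else:
            groups[key] = [header, seq, 1]
    # sorted() is stable, so groups with equal abundance keep input order
    # (an assumption; the manual does not specify tie-breaking here).
    return sorted((tuple(g) for g in groups.values()), key=lambda g: -g[2])
```

For example, `dereplicate([("a", "ACGT"), ("b", "acgu")])` yields a single group labelled `a` with abundance 2.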
@@ -1024,7 +1077,7 @@ are considered the same). See the options \-\-sizein and \-\-sizeout
 to take into account and compute abundance values. This command does
 not support multithreading.
 .TAG derep_id
-.TP 9
+.TP
 .BI \-\-derep_id \0filename
 Merge strictly identical sequences contained in \fIfilename\fR, as
 with the \-\-derep_fulllength command, but the sequence labels
@@ -1041,6 +1094,88 @@ it is clustered with the most abundant. Remaining ties are solved
 using sequence headers and sequence input order. Sequence comparisons
 are case insensitive, and T and U are considered identical. This
 command does not support multithreading.
+.TAG fastaout
+.TP
+.BI \-\-fastaout \0filename
+Write the dereplicated sequences to \fIfilename\fR, in fasta format
+and sorted by decreasing abundance. Identical sequences receive the
+header of the first sequence of their group. If \-\-sizeout is used,
+the number of occurrences (i.e. abundance) of each sequence is
+indicated at the end of their fasta header using the
+pattern ';size=\fIinteger\fR;'. This option is only valid for
+\-\-fastx_uniques.
+.TAG fastqout
+.TP
+.BI \-\-fastqout \0filename
+Write the dereplicated sequences to \fIfilename\fR, in fastq format
+and sorted by decreasing abundance. Identical sequences receive the
+header of the first sequence of their group. If \-\-sizeout is used,
+the number of occurrences (i.e. abundance) of each sequence is
+indicated at the end of their fastq header using the
+pattern ';size=\fIinteger\fR;'. This option is only valid for
+\-\-fastx_uniques.
+.TAG fastq_ascii
+.TP
+.BI \-\-fastq_ascii\~ "positive integer"
+Define the ASCII character number used as the basis for the FASTQ
+quality score. The default is 33, which is used by the Sanger /
+Illumina 1.8+ FASTQ format (phred+33). The value 64 is used by the
+Solexa, Illumina 1.3+ and Illumina 1.5+ formats (phred+64). Only 33
+and 64 are valid arguments.
+.TAG fastq_asciiout
+.TP
+.BI \-\-fastq_asciiout\~ "positive integer"
+When using \-\-fastq_convert, \-\-sff_convert or \-\-fasta2fastq,
+define the ASCII character number used as the basis for the FASTQ
+quality score when writing FASTQ output files. The default is 33. Only
+33 and 64 are valid arguments.
+.TAG fastq_qmax
+.TP
+.BI \-\-fastq_qmax\~ "positive integer"
+Specify the maximum quality score accepted when reading FASTQ
+files. The default is 41, which is usual for recent Sanger/Illumina
+1.8+ files.
+.TAG fastq_qmaxout
+.TP
+.BI \-\-fastq_qmaxout\~ "positive integer"
+Specify the maximum quality score used when writing
+FASTQ files. The default
+is 41, which is usual for recent Sanger/Illumina 1.8+ files. Older
+formats may use a maximum quality score of 40.
+.TAG fastq_qmin
+.TP
+.BI \-\-fastq_qmin\~ "positive integer"
+Specify the minimum quality score accepted for FASTQ files. The
+default is 0, which is usual for recent Sanger/Illumina 1.8+
+files. Older formats may use scores between -5 and 2.
+.TAG fastq_qminout
+.TP
+.BI \-\-fastq_qminout\~ "positive integer"
+Specify the minimum quality score used when writing FASTQ files. The
+default is 0, which is usual for Sanger/Illumina 1.8+ files. Older
+versions of the format may use scores between -5 and 2.
+.TAG fastq_qout_max
+.TP
+.BI \-\-fastq_qout_max
+For \-\-fastx_uniques, indicate that the new quality scores computed
+when dereplicating FASTQ files should be equal to the maximum (best)
+of the input quality scores for each position (corresponding to the
+lowest error probability). The default is to output a quality score
+corresponding to the average of the error probabilities for each
+position.
+.TAG fastx_uniques
+.TP
+.BI \-\-fastx_uniques \0filename
+Merge strictly identical sequences contained in FASTA or FASTQ file
+\fIfilename\fR. Identical sequences are defined as having the same
+length and the same string of nucleotides (case insensitive, T and U
+are considered the same). See the options \-\-sizein and \-\-sizeout
+to take into account and compute abundance values. This command does
+not support multithreading. By default, the quality scores in FASTQ
+output files will correspond to the average error probability of the
+nucleotides in each position. If the \-\-fastq_qout_max option is
+given, the quality score will be the highest (best) quality score
+observed in each position.
 .TAG maxuniquesize
 .TP
 .BI \-\-maxuniquesize\~ "positive integer"
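
The two ways of combining quality scores described above for \-\-fastx_uniques (average of error probabilities by default, best score with \-\-fastq_qout_max) can be illustrated as follows. This is a sketch under stated assumptions: it uses the standard Phred relation Q = -10 log10(p), and the rounding to the nearest integer is an assumption, since the manual text does not say how vsearch rounds or clamps the result. The function names are hypothetical.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def merged_quality(scores, use_max=False):
    """Combine the quality scores observed at one position across
    identical reads.

    use_max=False: score corresponding to the mean error probability.
    use_max=True:  highest (best) observed score, as with --fastq_qout_max.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if use_max:
        return max(scores)
    mean_p = sum(phred_to_prob(q) for q in scores) / len(scores)
    return round(prob_to_phred(mean_p))
```

Note that averaging probabilities is dominated by the worst read: scores of 20 and 40 combine to about 23, not 30.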
@@ -1059,7 +1194,8 @@ and sorted by decreasing abundance. Identical sequences receive the
 header of the first sequence of their group. If \-\-sizeout is used,
 the number of occurrences (i.e. abundance) of each sequence is
 indicated at the end of their fasta header using the
-pattern ';size=\fIinteger\fR;'.
+pattern ';size=\fIinteger\fR;'. This option is not allowed for
+\-\-fastx_uniques.
 .TP
 .TAG relabel
 .BI \-\-relabel \0string
@@ -1116,6 +1252,20 @@ corresponding to its number of occurrences in the input file.
 .BI \-\-strand\~ "plus|both"
 When searching for strictly identical sequences, check the \fIplus\fR
 strand only (default) or check \fIboth\fR strands.
+.TAG tabbedout
+.TP
+.BI \-\-tabbedout \0filename
+Output clustering info to the specified tab-separated text file with 6
+columns and a row for each input sequence. Column 1 contains the
+original label/header of the sequence. Column 2 contains the label of
+the output sequence which is equal to the label/header of the first
+sequence in each cluster, but potentially relabelled. Column 3
+contains the cluster number, starting from 0. Column 4 contains the
+sequence number within each cluster, starting at 0. Column 5 contains
+the number of sequences in the cluster. Column 6 contains the original
+label/header of the first sequence in the cluster before any potential
+relabelling. This option is only valid for the \-\-fastx_uniques
+command.
 .TAG topn
 .TP
 .BI \-\-topn\~ "positive integer"
@@ -1861,11 +2011,6 @@ When running \-\-fastq_join, use the \fIstring\fR as a quality padding
 string. The default is a string of I's equal in length to the sequence
 padding string. The letter I corresponds to a base quality score of 40
 indicating a very high quality base with error probability of 0.0001.
-.TAG label_suffix
-.TP
-.BI \-\-label_suffix\~ string
-When using \-\-fastx_revcomp or \-\-fastq_mergepairs, add the suffix
-\fIstring\fR to sequence headers.
 .TAG maxsize
 .TP
 .BI \-\-maxsize\~ "positive integer"
@@ -2143,74 +2288,20 @@ number of matching words on the reverse complementary strand.
 .RE
 .PP
 .\" ----------------------------------------------------------------------------
-.TAG restriction-site-cutting-options
-Restriction site cutting options:
-.RS
-.PP
-The input sequences in the file specified with the \-\-cut command are
-cut into fragments at all restriction sites matching the pattern given
-with the \-\-cut_pattern option. The fragments on the forward strand
-are written to the file specified with the \-\-fastaout file and the
-fragments on the reverse strand are written to the file specified with
-the \-\-fastaout_rev option. Input sequences that do not match are
-written to the file specified with the option \-\-fastaout_discarded,
-and their reverse complement are also written to the file specfied
-with the \-\-fastaout_discarded_rev option. The relabel options
-(\-\-relabel, \-\-relabel_self, \-\-relabel_keep, \-\-relabel_md5, and
-\-\-relabel_sha1) may be used to relabel the output sequences).
-.TAG cut
-.TP 9
-.BI \-\-cut \0filename
-Specify the input file with sequences in FASTA format.
-.TAG cut_pattern
-.TP
-.BI \-\-cut_pattern \0string
-Specify the restriction site cutting pattern and positions. The
-pattern is a string of lower- or uppercase letters specifying the
-nucleotides that must match, and may include ambiguous nucleotide
-symbols. The special characters "^" (circumflex) and "_" (underscore)
-are used to indicate the cutting position on the forward and reverse
-strand, respectively. For example, the pattern "G^AATT_C" is the
-pattern for the EcoRI restriction site. For such palindromic patterns
-(identical to its reverse complement) the command will output all
-possible fragments on both strands. For non-palindromic sites, it may
-be necessary to run the command also on the reverse complemented input
-sequences. Exactly one cutting site on each strand must be indicated.
-.TAG fastaout
-.TP
-.BI \-\-fastaout \0filename
-Specify the output file for the resulting fragments on the forward
-strand.
-.TAG fastaout_rev
-.TP
-.BI \-\-fastaout_rev \0filename
-Specify the output file for the resulting fragments on the reverse
-strand.
-.TAG fastaout_discarded
-.TP
-.BI \-\-fastaout_discarded \0filename
-Specify the output file for the non-matching sequences.
-.TAG fastaout_discarded_rev
-.TP
-.BI \-\-fastaout_discarded_rev \0filename
-Specify the output file for the non-matching seqeunces, reverse
-complemented.
-.RE
-.PP
-.\" ----------------------------------------------------------------------------
 .TAG pairwise-alignment-options
 Pairwise alignment options:
 .RS
 .PP
 The results of the n * (n-1) / 2 pairwise alignments are written to
 the result files specified with \-\-alnout, \-\-blast6out,
-\-\-fastapairs \-\-matched, \-\-notmatched, \-\-samout, \-\-uc or
-\-\-userout (see Searching section below). Specify either the
-\-\-acceptall option to output all pairwise alignments, or specify an
-identity level with \-\-id to discard weak alignments. Most other
-accept/reject options (see Searching options below) may also be
-used. Sequences are aligned on their \fIplus\fR strand only. Masking
-is performed as usual and specified with \-\-qmask and \-\-hardmask.
+\-\-fastapairs, \-\-matched, \-\-notmatched, \-\-qsegout, \-\-samout,
+\-\-tsegout, \-\-uc or \-\-userout (see Searching section
+below). Specify either the \-\-acceptall option to output all pairwise
+alignments, or specify an identity level with \-\-id to discard weak
+alignments. Most other accept/reject options (see Searching options
+below) may also be used. Sequences are aligned on their \fIplus\fR
+strand only. Masking is performed as usual and specified with
+\-\-qmask and \-\-hardmask.
 .TAG acceptall
 .TP 9
 .B \-\-acceptall
@@ -2276,6 +2367,61 @@ Label of the target sequence.
 .RE
 .PP
 .\" ----------------------------------------------------------------------------
+.TAG restriction-site-cutting-options
+Restriction site cutting options:
+.RS
+.PP
+The input sequences in the file specified with the \-\-cut command are
+cut into fragments at all restriction sites matching the pattern given
+with the \-\-cut_pattern option. The fragments on the forward strand
+are written to the file specified with the \-\-fastaout file and the
+fragments on the reverse strand are written to the file specified with
+the \-\-fastaout_rev option. Input sequences that do not match are
+written to the file specified with the option \-\-fastaout_discarded,
+and their reverse complement are also written to the file specified
+with the \-\-fastaout_discarded_rev option. The relabel options
+(\-\-relabel, \-\-relabel_self, \-\-relabel_keep, \-\-relabel_md5, and
+\-\-relabel_sha1) may be used to relabel the output sequences.
+.TAG cut
+.TP 9
+.BI \-\-cut \0filename
+Specify the input file with sequences in FASTA format.
+.TAG cut_pattern
+.TP
+.BI \-\-cut_pattern \0string
+Specify the restriction site cutting pattern and positions. The
+pattern is a string of lower- or uppercase letters specifying the
+nucleotides that must match, and may include ambiguous nucleotide
+symbols. The special characters "^" (circumflex) and "_" (underscore)
+are used to indicate the cutting position on the forward and reverse
+strand, respectively. For example, the pattern "G^AATT_C" is the
+pattern for the EcoRI restriction site. For such palindromic patterns
+(identical to its reverse complement) the command will output all
+possible fragments on both strands. For non-palindromic sites, it may
+be necessary to run the command also on the reverse complemented input
+sequences. Exactly one cutting site on each strand must be indicated.
+.TAG fastaout
+.TP
+.BI \-\-fastaout \0filename
+Specify the output file for the resulting fragments on the forward
+strand.
+.TAG fastaout_rev
+.TP
+.BI \-\-fastaout_rev \0filename
+Specify the output file for the resulting fragments on the reverse
+strand.
+.TAG fastaout_discarded
+.TP
+.BI \-\-fastaout_discarded \0filename
+Specify the output file for the non-matching sequences.
+.TAG fastaout_discarded_rev
+.TP
+.BI \-\-fastaout_discarded_rev \0filename
+Specify the output file for the non-matching sequences, reverse
+complemented.
+.RE
+.PP
+.\" ----------------------------------------------------------------------------
 .TAG searching-options
 Searching options:
 .RS
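
The \-\-cut_pattern syntax described above ("^" marks the forward-strand cut, "_" the reverse-strand cut) can be illustrated with a small parser. This is a hypothetical sketch of how such a pattern could be interpreted, not the code in `cut.cc`; the function names and the returned offsets (number of nucleotides before each marker) are assumptions made for illustration.

```python
# Complement table for nucleotides, including IUPAC ambiguity symbols.
COMPLEMENT = str.maketrans("ACGTRYKMSWBVDHN", "TGCAYRMKSWVBHDN")


def parse_cut_pattern(pattern):
    """Split a pattern such as 'G^AATT_C' into the recognition site and
    the cut offsets on the forward and reverse strands.

    Returns (site, fwd_cut, rev_cut), where each offset is the number of
    nucleotides that precede the corresponding marker.
    """
    site = []
    fwd_cut = rev_cut = None
    for ch in pattern:
        if ch == "^":
            if fwd_cut is not None:
                raise ValueError("more than one forward cut position")
            fwd_cut = len(site)
        elif ch == "_":
            if rev_cut is not None:
                raise ValueError("more than one reverse cut position")
            rev_cut = len(site)
        else:
            site.append(ch.upper())
    if fwd_cut is None or rev_cut is None:
        raise ValueError("exactly one cut position per strand is required")
    return "".join(site), fwd_cut, rev_cut


def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]
```

For the EcoRI example from the manual, `parse_cut_pattern("G^AATT_C")` gives the site `GAATTC` with offsets 1 and 5, and `is_palindromic("GAATTC")` is true.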
@@ -2496,6 +2642,31 @@ the target do not match the query.
 .BI \-\-idsuffix\~ "positive integer"
 Reject the sequence match if the last \fIinteger\fR nucleotides of the
 target do not match the query.
+.TAG lca_cutoff
+.TP
+.BI \-\-lca_cutoff \0real
+Adjust the fraction of matching hits required for the last common
+ancestor (LCA) output with the \-\-lcaout option during searches. The
+default value is 1.0 which requires all hits to match at each
+taxonomic rank for that rank to be included. If a lower cutoff value
+is used, e.g. 0.95, a small fraction of non-matching hits is allowed
+while that rank will still be reported. The argument to this option
+must be larger than 0.5, but not larger than 1.0.
+.TAG lcaout
+.TP
+.BI \-\-lcaout \0filename
+Output last common ancestor (LCA) information about the hits of each
+query to a text file in a tab-separated format. The first column
+contains the query id, while the second column contains the taxonomic
+information. The headers of the sequences in the database must contain
+taxonomic information in the same format as used with the \-\-sintax
+command, e.g. "tax=k:Archaea,p:Euryarchaeota,c:Halobacteria". Only the
+initial parts of the taxonomy that are common to a large fraction of
+the hits of each query will be output. It is necessary to set the
+\-\-maxaccepts option to a value different from 1 for this
+information to be useful. The \-\-top_hits_only option may also be
+useful. The fraction of matching hits required may be adjusted by the
+\-\-lca_cutoff option (default 1.0).
 .TAG leftjust
 .TP
 .B \-\-leftjust
@@ -2665,6 +2836,11 @@ Mask regions in the query sequences
 using the dust or the soft algorithms, or do not mask
 (none). Warning, when using soft masking search commands
 become case sensitive. The default is to mask using \fIdust\fR.
+.TAG qsegout
+.TP
+.BI \-\-qsegout \0filename
+Write the aligned part of each query sequence to \fIfilename\fR in
+FASTA format.
 .TAG query_cov
 .TP
 .BI \-\-query_cov \0real
@@ -2797,15 +2973,21 @@ length.  Internal or terminal gaps are not taken into account.
 .TAG top_hits_only
 .TP
 .B \-\-top_hits_only
-Only the top hits between the query and database sequence sets are
-written to the output specified with the options \-\-alnout,
-\-\-samout, \-\-userout, \-\-blast6out, \-\-uc, \-\-fastapairs,
-\-\-matched or \-\-notmatched (but not \-\-dbmatched and
-\-\-dbnotmatched). For each query, the top hit is the one presenting
-the highest percentage of identity (see the \-\-iddef option to change
-the way identity is measured). For a given query, if several top hits
-present exactly the same percentage of identity, the number of hits
-reported is controlled by the \-\-maxaccepts value (1 by default).
+Only the top hits with an equally high percentage of identity between
+the query and database sequence sets are written to the output
+specified with the options \-\-lcaout, \-\-alnout, \-\-samout,
+\-\-userout, \-\-blast6out, \-\-uc, \-\-fastapairs, \-\-matched or
+\-\-notmatched (but not \-\-dbmatched and \-\-dbnotmatched). For each
+query, the top hit is the one presenting the highest percentage of
+identity (see the \-\-iddef option to change the way identity is
+measured). For a given query, if several top hits present exactly the
+same percentage of identity, the number of hits reported is controlled
+by the \-\-maxaccepts value (1 by default).
+.TAG tsegout
+.TP
+.BI \-\-tsegout \0filename
+Write the aligned part of each target sequence to \fIfilename\fR in
+FASTA format.
 .TAG uc
 .TP
 .BI \-\-uc \0filename
@@ -3203,11 +3385,14 @@ sequences, BioRxiv, 074161. Preprint. doi: 10.1101/074161
 .URL https://doi.org/10.1101/074161 (link)
 .PP
 The name of the fasta file containing the input sequences to be
-classified is given as an argument to the \-\-sintax command. The reference
-sequence database is specified with the \-\-db option. The results are
-written in a tab delimited text file whose name is specified with the
-\-\-tabbedout option. The \-\-sintax_cutoff option may be used to set a
-minimum level of bootstrap support for the taxonomic ranks to be reported.
+classified is given as an argument to the \-\-sintax command. The
+reference sequence database is specified with the \-\-db option. The
+results are written in a tab delimited text file whose name is
+specified with the \-\-tabbedout option. The \-\-sintax_cutoff option
+may be used to set a minimum level of bootstrap support for the
+taxonomic ranks to be reported. The \-\-randseed option may be
+included to specify a seed for initialisation of the random number
+generator used by the algorithm.
 .PP
 Multithreading is supported. Databases in UDB files are supported.
 The strand option may be specified.
@@ -3231,6 +3416,13 @@ allowing spaces in the taxonomic identifiers.
 .BI \-\-db \0filename
 Read the reference sequences from \fIfilename\fR, in FASTA, FASTQ or
 UDB format. These sequences need to be annotated with taxonomy.
+.TAG randseed
+.TP
+.BI \-\-randseed\~ "positive integer"
+Use \fIinteger\fR as seed for the random number generator used in the
+Sintax algorithm. A given seed always produces the same output order
+(useful for replicability). Set to 0 to use a pseudo-random seed
+(default behavior).
 .TAG sintax_cutoff
 .TP
 .BI \-\-sintax_cutoff\~ "real"
@@ -4417,6 +4609,24 @@ Modernized code. Minor changes to help info.
 Added the fasta2fastq command. Fixed search bug on ppc64le. Fixed bug
 with removal of size and ee info in uc files. Fixed compilation errors
 in some cases. Made some general code improvements. Updated manual.
+.TP
+.BR v2.19.0\~ "released December 21st, 2021"
+Added the lcaout and lca_cutoff options to enable the output of last
+common ancestor (LCA) information about hits when searching. The
+randseed option was added as a valid option to the sintax
+command. Code improvements.
+.TP
+.BR v2.20.0\~ "released January 10th, 2022"
+Added the fastx_uniques command and the fastq_qout_max option for
+dereplication of FASTQ files. Some code cleaning.
+.TP
+.BR v2.20.1\~ "released January 11th, 2022"
+Fixes a bug in fastq_mergepairs that caused an occasional hang at the
+end when using multiple threads.
+.TP
+.BR v2.21.0\~ "released January 12th, 2022"
+This version adds the sample, qsegout and tsegout options. It enables
+the use of UDB databases with uchime_ref.
 .LP
 .\" ============================================================================
 .\" TODO:
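
The \-\-cut_pattern text above describes a small notation: a nucleotide pattern with one "^" marking the forward-strand cut and one "_" marking the reverse-strand cut. The sketch below shows one way such a string could be split into its parts. It is not taken from the vsearch sources; the `CutPattern` struct and the `parse_cut_pattern` and `example_ecori` functions are hypothetical names used only for illustration, and the cut positions are assumed to be counted as the number of pattern bases preceding each marker.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for a parsed --cut_pattern value (illustrative names).
struct CutPattern
{
  std::string bases;  // pattern with the "^" and "_" markers removed
  int cut_fwd = -1;   // number of bases before the forward-strand cut
  int cut_rev = -1;   // number of bases before the reverse-strand cut
};

// Split a pattern such as "G^AATT_C" into bases and cut positions.
// Returns false unless exactly one "^" and exactly one "_" are present,
// mirroring the rule that one cutting site per strand must be indicated.
bool parse_cut_pattern(const std::string & text, CutPattern & out)
{
  CutPattern p;
  for (char c : text)
    {
      if (c == '^')
        {
          if (p.cut_fwd >= 0)
            return false;  // second forward marker
          p.cut_fwd = static_cast<int>(p.bases.size());
        }
      else if (c == '_')
        {
          if (p.cut_rev >= 0)
            return false;  // second reverse marker
          p.cut_rev = static_cast<int>(p.bases.size());
        }
      else
        {
          p.bases.push_back(c);
        }
    }
  if ((p.cut_fwd < 0) || (p.cut_rev < 0))
    return false;  // a marker is missing
  out = p;
  return true;
}

// Usage example: the EcoRI pattern quoted in the manual text.
void example_ecori()
{
  CutPattern p;
  assert(parse_cut_pattern("G^AATT_C", p));
  assert(p.bases == "GAATTC");
  assert(p.cut_fwd == 1);  // cut after the leading G on the forward strand
  assert(p.cut_rev == 5);  // cut before the trailing C on the reverse strand
}
```

Matching of ambiguous nucleotide symbols and the handling of non-palindromic sites are left out of this sketch.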


=====================================
src/Makefile.am
=====================================
@@ -1,16 +1,16 @@
 bin_PROGRAMS = $(top_builddir)/bin/vsearch
 
 if TARGET_PPC
-AM_CXXFLAGS=-Wall -Wsign-compare -O3 -g -mcpu=powerpc64le -maltivec
+AM_CFLAGS=-Wall -Wsign-compare -mcpu=powerpc64le -maltivec
 else
 if TARGET_AARCH64
-AM_CXXFLAGS=-Wall -Wsign-compare -O3 -g -march=armv8-a+simd -mtune=generic
+AM_CFLAGS=-Wall -Wsign-compare -march=armv8-a+simd -mtune=generic
 else
-AM_CXXFLAGS=-Wall -Wsign-compare -O3 -g -march=x86-64 -mtune=generic
+AM_CFLAGS=-Wall -Wsign-compare -march=x86-64 -mtune=generic
 endif
 endif
 
-AM_CFLAGS=$(AM_CXXFLAGS)
+AM_CXXFLAGS=$(AM_CFLAGS) -std=c++11
 
 export MACOSX_DEPLOYMENT_TARGET=10.9
 
@@ -64,6 +64,7 @@ sintax.h \
 sortbylength.h \
 sortbysize.h \
 subsample.h \
+tax.h \
 udb.h \
 unique.h \
 userfields.h \
@@ -158,6 +159,7 @@ sintax.cc \
 sortbylength.cc \
 sortbysize.cc \
 subsample.cc \
+tax.cc \
 udb.cc \
 unique.cc \
 userfields.cc \


=====================================
src/allpairs.cc
=====================================
@@ -80,6 +80,8 @@ static FILE * fp_uc = nullptr;
 static FILE * fp_fastapairs = nullptr;
 static FILE * fp_matched = nullptr;
 static FILE * fp_notmatched = nullptr;
+static FILE * fp_qsegout = nullptr;
+static FILE * fp_tsegout = nullptr;
 
 static int count_matched = 0;
 static int count_notmatched = 0;
@@ -171,6 +173,26 @@ void allpairs_output_results(int hit_count,
                                           qsequence_rc);
             }
 
+          if (fp_qsegout)
+            {
+              results_show_qsegout_one(fp_qsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
+          if (fp_tsegout)
+            {
+              results_show_tsegout_one(fp_tsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
           if (fp_uc)
             {
               if ((t==0) || opt_uc_allhits)
@@ -647,6 +669,24 @@ void allpairs_global(char * cmdline, char * progheader)
         }
     }
 
+  if (opt_qsegout)
+    {
+      fp_qsegout = fopen_output(opt_qsegout);
+      if (! fp_qsegout)
+        {
+          fatal("Unable to open qsegout output file for writing");
+        }
+    }
+
+  if (opt_tsegout)
+    {
+      fp_tsegout = fopen_output(opt_tsegout);
+      if (! fp_tsegout)
+        {
+          fatal("Unable to open tsegout output file for writing");
+        }
+    }
+
   if (opt_matched)
     {
       fp_matched = fopen_output(opt_matched);
@@ -738,6 +778,14 @@ void allpairs_global(char * cmdline, char * progheader)
     {
       fclose(fp_fastapairs);
     }
+  if (opt_qsegout)
+    {
+      fclose(fp_qsegout);
+    }
+  if (opt_tsegout)
+    {
+      fclose(fp_tsegout);
+    }
   if (fp_uc)
     {
       fclose(fp_uc);


=====================================
src/chimera.cc
=====================================
@@ -1771,19 +1771,29 @@ void chimera()
   /* prepare queries / database */
   if (opt_uchime_ref)
     {
-      db_read(opt_db, 0);
+      /* check if the reference database may be an UDB file */
 
-      if (opt_dbmask == MASK_DUST)
+      bool is_udb = udb_detect_isudb(opt_db);
+
+      if (is_udb)
         {
-          dust_all();
+          udb_read(opt_db, true, true);
         }
-      else if ((opt_dbmask == MASK_SOFT) && (opt_hardmask))
+      else
         {
-          hardmask_all();
+          db_read(opt_db, 0);
+          if (opt_dbmask == MASK_DUST)
+            {
+              dust_all();
+            }
+          else if ((opt_dbmask == MASK_SOFT) && (opt_hardmask))
+            {
+              hardmask_all();
+            }
+          dbindex_prepare(1, opt_dbmask);
+          dbindex_addallsequences(opt_dbmask);
         }
 
-      dbindex_prepare(1, opt_dbmask);
-      dbindex_addallsequences(opt_dbmask);
       query_fasta_h = fasta_open(opt_uchime_ref);
       progress_total = fasta_get_size(query_fasta_h);
     }


=====================================
src/cluster.cc
=====================================
@@ -91,6 +91,8 @@ static FILE * fp_notmatched = nullptr;
 static FILE * fp_otutabout = nullptr;
 static FILE * fp_mothur_shared_out = nullptr;
 static FILE * fp_biomout = nullptr;
+static FILE * fp_qsegout = nullptr;
+static FILE * fp_tsegout = nullptr;
 
 static pthread_attr_t attr;
 
@@ -454,6 +456,26 @@ void cluster_core_results_hit(struct hit * best,
                                   qsequence_rc);
     }
 
+  if (fp_qsegout)
+    {
+      results_show_qsegout_one(fp_qsegout,
+                               best,
+                               query_head,
+                               qsequence,
+                               qseqlen,
+                               qsequence_rc);
+    }
+
+  if (fp_tsegout)
+    {
+      results_show_tsegout_one(fp_tsegout,
+                               best,
+                               query_head,
+                               qsequence,
+                               qseqlen,
+                               qsequence_rc);
+    }
+
   if (fp_userout)
     {
       results_show_userout_one(fp_userout, best, query_head,
@@ -1202,6 +1224,24 @@ void cluster(char * dbname,
         }
     }
 
+  if (opt_qsegout)
+    {
+      fp_qsegout = fopen_output(opt_qsegout);
+      if (! fp_qsegout)
+        {
+          fatal("Unable to open qsegout output file for writing");
+        }
+    }
+
+  if (opt_tsegout)
+    {
+      fp_tsegout = fopen_output(opt_tsegout);
+      if (! fp_tsegout)
+        {
+          fatal("Unable to open tsegout output file for writing");
+        }
+    }
+
   if (opt_matched)
     {
       fp_matched = fopen_output(opt_matched);
@@ -1678,6 +1718,14 @@ void cluster(char * dbname,
     {
       fclose(fp_fastapairs);
     }
+  if (opt_qsegout)
+    {
+      fclose(fp_qsegout);
+    }
+  if (opt_tsegout)
+    {
+      fclose(fp_tsegout);
+    }
   if (fp_blast6out)
     {
       fclose(fp_blast6out);


=====================================
src/derep.cc
=====================================
@@ -68,9 +68,11 @@ struct bucket
   unsigned int seqno_first;
   unsigned int seqno_last;
   unsigned int size;
+  unsigned int count;
   bool deleted;
   char * header;
   char * seq;
+  char * qual;
 };
 
 int derep_compare_prefix(const void * a, const void * b)
@@ -244,21 +246,113 @@ void rehash(struct bucket * * hashtableref, int64_t alloc_clusters)
   * hashtableref = new_hashtable;
 }
 
+inline double convert_q_to_p(int q)
+{
+  int x = q - opt_fastq_ascii;
+  if (x < 2)
+    {
+      return 0.75;
+    }
+  else
+    {
+      return exp10(-x/10.0);
+    }
+}
+
+inline int convert_p_to_q(double p)
+{
+  // int q = round(-10.0 * log10(p));
+  int q = int(-10.0 * log10(p));
+  q = MIN(q, opt_fastq_qmaxout);
+  q = MAX(q, opt_fastq_qminout);
+  return opt_fastq_asciiout + q;
+}
+
 void derep(char * input_filename, bool use_header)
 {
   /* dereplicate full length sequences, optionally require identical headers */
 
+  /*
+    derep_fulllength output options: --output, --uc (only FASTA, deprecated)
+    fastx_uniques output options: --fastaout, --fastqout, --uc, --tabbedout
+  */
+
   show_rusage();
 
-  FILE * fp_output = nullptr;
+  fastx_handle h = fastx_open(input_filename);
+
+  if (!h)
+    {
+      fatal("Unrecognized input file type (not proper FASTA or FASTQ format)");
+    }
+
+  if (fastx_is_fastq(h))
+    {
+      if (!opt_fastx_uniques)
+        fatal("FASTQ input is only allowed with the fastx_uniques command");
+    }
+  else
+    {
+      if (opt_fastqout)
+        fatal("Cannot write FASTQ output when input file is not in FASTQ format");
+      if (opt_tabbedout)
+        fatal("Cannot write tab separated output file when input file is not in FASTQ format");
+    }
+
+  FILE * fp_fastaout = nullptr;
+  FILE * fp_fastqout = nullptr;
   FILE * fp_uc = nullptr;
+  FILE * fp_tabbedout = nullptr;
 
-  if (opt_output)
+  if (opt_fastx_uniques)
     {
-      fp_output = fopen_output(opt_output);
-      if (!fp_output)
+      if ((!opt_uc) && (!opt_fastaout) && (!opt_fastqout) && (!opt_tabbedout))
+        fatal("Output file for dereplication with fastx_uniques must be specified with --fastaout, --fastqout, --tabbedout, or --uc");
+    }
+  else
+    {
+      if ((!opt_output) && (!opt_uc))
+        fatal("Output file for dereplication must be specified with --output or --uc");
+    }
+
+  if (opt_fastx_uniques)
+    {
+      if (opt_fastaout)
         {
-          fatal("Unable to open output file for writing");
+          fp_fastaout = fopen_output(opt_fastaout);
+          if (!fp_fastaout)
+            {
+              fatal("Unable to open FASTA output file for writing");
+            }
+        }
+
+      if (opt_fastqout)
+        {
+          fp_fastqout = fopen_output(opt_fastqout);
+          if (!fp_fastqout)
+            {
+              fatal("Unable to open FASTQ output file for writing");
+            }
+        }
+
+      if (opt_tabbedout)
+        {
+          fp_tabbedout = fopen_output(opt_tabbedout);
+          if (!fp_tabbedout)
+            {
+              fatal("Unable to open tab delimited output file for writing");
+            }
+        }
+    }
+  else
+    {
+      if (opt_output)
+        {
+          fp_fastaout = fopen_output(opt_output);
+          if (!fp_fastaout)
+            {
+              fatal("Unable to open FASTA output file for writing");
+            }
         }
     }
 
@@ -271,15 +365,6 @@ void derep(char * input_filename, bool use_header)
         }
     }
 
-  fastx_handle h = fastx_open(input_filename);
-
-  show_rusage();
-
-  if (!h)
-    {
-      fatal("Unrecognized input file type (not proper FASTA or FASTQ format)");
-    }
-
   uint64_t filesize = fastx_get_size(h);
 
 
@@ -303,9 +388,12 @@ void derep(char * input_filename, bool use_header)
   char ** headertab = nullptr;
   char * match_strand = nullptr;
 
-  if (opt_uc)
+  bool extra_info = opt_uc || opt_tabbedout;
+
+  if (extra_info)
     {
-      /* If the uc option is in effect we need to keep some extra info.
+      /* If the uc or tabbedout option is in effect,
+         we need to keep some extra info.
          Allocate and init memory for this. */
 
       /* Links to other sequences in cluster */
@@ -384,7 +472,7 @@ void derep(char * input_filename, bool use_header)
           show_rusage();
         }
 
-      if (opt_uc && (sequencecount + 1 > alloc_seqs))
+      if (extra_info && (sequencecount + 1 > alloc_seqs))
         {
           uint64_t new_alloc_seqs = 2 * alloc_seqs;
 
@@ -423,6 +511,7 @@ void derep(char * input_filename, bool use_header)
       char * seq = fastx_get_sequence(h);
       char * header = fastx_get_header(h);
       int64_t headerlen = fastx_get_header_length(h);
+      char * qual = fastx_get_quality(h); // nullptr if FASTA
 
       /* normalize sequence: uppercase and replace U by T  */
       string_normalize(seq_up, seq, seqlen);
@@ -488,10 +577,9 @@ void derep(char * input_filename, bool use_header)
             {
               bp = rc_bp;
               j = k;
-              if (opt_uc)
+              if (extra_info)
                 {
                   match_strand[sequencecount] = 1;
-
                 }
             }
         }
@@ -503,15 +591,75 @@ void derep(char * input_filename, bool use_header)
       if (bp->size)
         {
           /* at least one identical sequence already */
-          bp->size += ab;
-
-          if (opt_uc)
+          if (extra_info)
             {
               unsigned int last = bp->seqno_last;
               nextseqtab[last] = sequencecount;
               bp->seqno_last = sequencecount;
               headertab[sequencecount] = xstrdup(header);
             }
+
+          int64_t s1 = bp->size;
+          int64_t s2 = ab;
+          int64_t s3 = s1 + s2;
+
+          if (opt_fastqout)
+            {
+              /* update quality scores */
+              for (int i = 0; i < seqlen; i++)
+                {
+                  int q1 = bp->qual[i];
+                  int q2 = qual[i];
+                  double p1 = convert_q_to_p(q1);
+                  double p2 = convert_q_to_p(q2);
+                  double p3;
+
+                  /* how to compute the new quality score? */
+
+                  if (opt_fastq_qout_max)
+                    {
+                      // fastq_qout_max
+                      /* min error prob, highest quality */
+                      p3 = MIN(p1, p2);
+                    }
+                  else
+                    {
+                      // fastq_qout_avg
+                      /* average, as in USEARCH */
+                      p3 = (p1 * s1 + p2 * s2) / s3;
+                    }
+
+                  // fastq_qout_min
+                  /* max error prob, lowest quality */
+                  // p3 = MAX(p1, p2);
+
+                  // fastq_qout_first
+                  /* keep first */
+                  // p3 = p1;
+
+                  // fastq_qout_last
+                  /* keep last */
+                  // p3 = p2;
+
+                  // fastq_qout_ef
+                  /* Compute as multiple independent observations
+                     Edgar & Flyvbjerg (2015)
+                     But what about s1 and s2? */
+                  // p3 = p1 * p2 / 3.0 / (1.0 - p1 - p2 + (4.0 * p1 * p2 / 3.0));
+
+                  /* always worst quality possible, certain error */
+                  // p3 = 1.0;
+
+                  /* always best quality possible, perfect, no errors */
+                  // p3 = 0.0;
+
+                  int q3 = convert_p_to_q(p3);
+                  bp->qual[i] = q3;
+                }
+            }
+
+          bp->size = s3;
+          bp->count++;
         }
       else
         {
@@ -522,6 +670,11 @@ void derep(char * input_filename, bool use_header)
           bp->seqno_last = sequencecount;
           bp->seq = xstrdup(seq);
           bp->header = xstrdup(header);
+          bp->count = 1;
+          if (qual)
+            bp->qual = xstrdup(qual);
+          else
+            bp->qual = nullptr;
           clusters++;
         }
 
@@ -700,9 +853,9 @@ void derep(char * input_filename, bool use_header)
 
   /* write output */
 
-  if (opt_output)
+  if (opt_output || opt_fastaout)
     {
-      progress_init("Writing output file", clusters);
+      progress_init("Writing FASTA output file", clusters);
 
       int64_t relabel_count = 0;
       for (uint64_t i=0; i<clusters; i++)
@@ -712,7 +865,7 @@ void derep(char * input_filename, bool use_header)
           if ((size >= opt_minuniquesize) && (size <= opt_maxuniquesize))
             {
               relabel_count++;
-              fasta_print_general(fp_output,
+              fasta_print_general(fp_fastaout,
                                   nullptr,
                                   bp->seq,
                                   strlen(bp->seq),
@@ -731,7 +884,40 @@ void derep(char * input_filename, bool use_header)
         }
 
       progress_done();
-      fclose(fp_output);
+      fclose(fp_fastaout);
+    }
+
+  if (opt_fastqout)
+    {
+      progress_init("Writing FASTQ output file", clusters);
+
+      int64_t relabel_count = 0;
+      for (uint64_t i=0; i<clusters; i++)
+        {
+          struct bucket * bp = hashtable + i;
+          int64_t size = bp->size;
+          if ((size >= opt_minuniquesize) && (size <= opt_maxuniquesize))
+            {
+              relabel_count++;
+              fastq_print_general(fp_fastqout,
+                                  bp->seq,
+                                  strlen(bp->seq),
+                                  bp->header,
+                                  strlen(bp->header),
+                                  bp->qual,
+                                  size,
+                                  relabel_count,
+                                  -1.0);
+              if (relabel_count == opt_topn)
+                {
+                  break;
+                }
+            }
+          progress_update(i);
+        }
+
+      progress_done();
+      fclose(fp_fastqout);
     }
 
   show_rusage();
@@ -775,6 +961,46 @@ void derep(char * input_filename, bool use_header)
       progress_done();
     }
 
+  if (opt_tabbedout)
+    {
+      progress_init("Writing tab separated file", clusters);
+      for (uint64_t i=0; i<clusters; i++)
+        {
+          struct bucket * bp = hashtable + i;
+          char * hh =  bp->header;
+
+          if (opt_relabel)
+            fprintf(fp_tabbedout,
+                    "%s\t%s%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%u\t%s\n",
+                    hh, opt_relabel, i + 1, i, (uint64_t) 0, bp->count, hh);
+          else
+            fprintf(fp_tabbedout,
+                    "%s\t%s\t%" PRIu64 "\t%" PRIu64 "\t%u\t%s\n",
+                    hh, hh, i, (uint64_t) 0, bp->count, hh);
+
+          uint64_t j = 1;
+          for (unsigned int next = nextseqtab[bp->seqno_first];
+               next != terminal;
+               next = nextseqtab[next])
+            {
+              if (opt_relabel)
+                fprintf(fp_tabbedout,
+                        "%s\t%s%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%u\t%s\n",
+                        headertab[next], opt_relabel, i + 1, i, j, bp->count, hh);
+              else
+                fprintf(fp_tabbedout,
+                        "%s\t%s\t%" PRIu64 "\t%" PRIu64 "\t%u\t%s\n",
+                        headertab[next], hh, i, j, bp->count, hh);
+              j++;
+            }
+
+          progress_update(i);
+        }
+      fclose(fp_tabbedout);
+      progress_done();
+    }
+
+
   show_rusage();
 
   if (selected < clusters)
@@ -809,6 +1035,8 @@ void derep(char * input_filename, bool use_header)
         {
           xfree(bp->seq);
           xfree(bp->header);
+          if (bp->qual)
+            xfree(bp->qual);
         }
     }
 
@@ -838,6 +1066,11 @@ void derep_prefix()
   FILE * fp_output = nullptr;
   FILE * fp_uc = nullptr;
 
+  if (opt_strand > 1)
+    {
+      fatal("Option '--strand both' not supported with --derep_prefix");
+    }
+
   if (opt_output)
     {
       fp_output = fopen_output(opt_output);
@@ -1233,13 +1466,3 @@ void derep_prefix()
   xfree(hashtable);
   db_free();
 }
-
-void derep_fulllength()
-{
-  derep(opt_derep_fulllength, false);
-}
-
-void derep_id()
-{
-  derep(opt_derep_id, true);
-}
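
The quality-score update in the derep.cc hunk above can be read in isolation as follows. This is a minimal sketch that restates the logic of `convert_q_to_p`, `convert_p_to_q` and the two active branches (minimum error probability with fastq_qout_max, abundance-weighted average otherwise). The `QualOptions` struct and its values are assumptions standing in for the `opt_fastq_*` globals, and `std::pow` replaces `exp10`, which is a GNU extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed stand-ins for opt_fastq_ascii, opt_fastq_asciiout,
// opt_fastq_qminout and opt_fastq_qmaxout; the values are illustrative.
struct QualOptions
{
  int ascii_in = 33;
  int ascii_out = 33;
  int qmin_out = 0;
  int qmax_out = 41;
};

// Quality symbol to error probability; Q below 2 is mapped to 0.75 as in the diff.
double q_to_p(int symbol, const QualOptions & o)
{
  int x = symbol - o.ascii_in;
  return (x < 2) ? 0.75 : std::pow(10.0, -x / 10.0);
}

// Error probability to quality symbol; truncates rather than rounds, as in the diff.
int p_to_q(double p, const QualOptions & o)
{
  int q = static_cast<int>(-10.0 * std::log10(p));
  q = std::min(q, o.qmax_out);
  q = std::max(q, o.qmin_out);
  return o.ascii_out + q;
}

// Merge one quality symbol of an incoming read (abundance s2) into the
// symbol stored for an existing cluster (abundance s1).
int merge_quality(int sym1, long s1, int sym2, long s2,
                  bool use_max, const QualOptions & o)
{
  assert((s1 > 0) && (s2 > 0));
  double p1 = q_to_p(sym1, o);
  double p2 = q_to_p(sym2, o);
  double p3 = use_max
    ? std::min(p1, p2)                   // lowest error probability, highest quality
    : (p1 * s1 + p2 * s2) / (s1 + s2);   // abundance-weighted average
  return p_to_q(p3, o);
}
```

With an offset of 33, merging '?' (Q30) and '5' (Q20) at equal abundance averages the error probabilities 0.001 and 0.01 to 0.0055, which truncates to Q22, i.e. the symbol '7'.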


=====================================
src/derep.h
=====================================
@@ -58,6 +58,5 @@
 
 */
 
-void derep_fulllength();
-void derep_id();
+void derep(char * input_filename, bool use_header);
 void derep_prefix();


=====================================
src/eestats.cc
=====================================
@@ -117,6 +117,9 @@ int64_t ee_start(int pos, int resolution)
 
 void fastq_eestats()
 {
+  if (!opt_output)
+    fatal("Output file for fastq_eestats must be specified with --output");
+
   fastx_handle h = fastq_open(opt_fastq_eestats);
 
   uint64_t filesize = fastq_get_size(h);
@@ -436,6 +439,9 @@ void fastq_eestats()
 
 void fastq_eestats2()
 {
+  if (!opt_output)
+    fatal("Output file for fastq_eestats2 must be specified with --output");
+
   fastx_handle h = fastq_open(opt_fastq_eestats2);
 
   uint64_t filesize = fastq_get_size(h);


=====================================
src/fasta.cc
=====================================
@@ -401,6 +401,11 @@ void fasta_print_general(FILE * fp,
       fprintf(fp, "%s", opt_label_suffix);
     }
 
+  if (opt_sample)
+    {
+      fprintf(fp, ";sample=%s", opt_sample);
+    }
+
   if (clustersize > 0)
     {
       fprintf(fp, ";seqs=%d", clustersize);


=====================================
src/fastq.cc
=====================================
@@ -557,6 +557,11 @@ void fastq_print_general(FILE * fp,
       fprintf(fp, "%s", opt_label_suffix);
     }
 
+  if (opt_sample)
+    {
+      fprintf(fp, ";sample=%s", opt_sample);
+    }
+
   if (opt_sizeout && (abundance > 0))
     {
       fprintf(fp, ";size=%u", abundance);


=====================================
src/fastqops.cc
=====================================
@@ -732,6 +732,9 @@ void fastx_revcomp()
   char * seq_buffer = (char*) xmalloc(buffer_alloc);
   char * qual_buffer = (char*) xmalloc(buffer_alloc);
 
+  if ((!opt_fastaout) && (!opt_fastqout))
+    fatal("No output files specified");
+
   fastx_handle h = fastx_open(opt_fastx_revcomp);
 
   if (!h)
@@ -867,6 +870,9 @@ void fastx_revcomp()
 
 void fastq_convert()
 {
+  if (! opt_fastqout)
+    fatal("No output file specified with --fastqout");
+
   fastx_handle h = fastq_open(opt_fastq_convert);
 
   if (!h)


=====================================
src/mask.cc
=====================================
@@ -254,6 +254,9 @@ void hardmask_all()
 
 void maskfasta()
 {
+  if (!opt_output)
+    fatal("Output file for masking must be specified with --output");
+
   FILE * fp_output = fopen_output(opt_output);
   if (!fp_output)
     {
@@ -293,6 +296,9 @@ void fastx_mask()
   FILE * fp_fastaout = nullptr;
   FILE * fp_fastqout = nullptr;
 
+  if ((!opt_fastaout) && (!opt_fastqout))
+    fatal("Specify output files for masking with --fastaout and/or --fastqout");
+
   if (opt_fastaout)
     {
       fp_fastaout = fopen_output(opt_fastaout);


=====================================
src/mergepairs.cc
=====================================
@@ -1125,6 +1125,10 @@ inline void chunk_perform_read()
       if (r < chunk_size)
         {
           finished_reading = true;
+          if (pairs_written >= pairs_read)
+            {
+              finished_all = true;
+            }
         }
       xpthread_cond_broadcast(&cond_chunks);
     }


=====================================
src/rerep.cc
=====================================
@@ -63,6 +63,9 @@
 
 void rereplicate()
 {
+  if (!opt_output)
+    fatal("FASTA output file for rereplicate must be specified with --output");
+
   opt_xsize = true;
 
   FILE * fp_output = nullptr;


=====================================
src/results.cc
=====================================
@@ -114,6 +114,65 @@ void results_show_fastapairs_one(FILE * fp,
 }
 
 
+void results_show_qsegout_one(FILE * fp,
+                              struct hit * hp,
+                              char * query_head,
+                              char * qsequence,
+                              int64_t qseqlen,
+                              char * rc)
+{
+  if (hp)
+    {
+      char * qseg = (hp->strand ? rc : qsequence) + hp->trim_q_left;
+      int qseglen = qseqlen
+        - hp->trim_q_left - hp->trim_q_right;
+
+      fasta_print_general(fp,
+                          nullptr,
+                          qseg,
+                          qseglen,
+                          query_head,
+                          strlen(query_head),
+                          0,
+                          0,
+                          -1.0,
+                          -1,
+                          -1,
+                          nullptr,
+                          0.0);
+    }
+}
+
+void results_show_tsegout_one(FILE * fp,
+                              struct hit * hp,
+                              char * query_head,
+                              char * qsequence,
+                              int64_t qseqlen,
+                              char * rc)
+{
+  if (hp)
+    {
+      char * tseg = db_getsequence(hp->target) + hp->trim_t_left;
+      int tseglen = db_getsequencelen(hp->target)
+        - hp->trim_t_left - hp->trim_t_right;
+
+      fasta_print_general(fp,
+                          nullptr,
+                          tseg,
+                          tseglen,
+                          db_getheader(hp->target),
+                          db_getheaderlen(hp->target),
+                          0,
+                          0,
+                          -1.0,
+                          -1,
+                          -1,
+                          nullptr,
+                          0.0);
+    }
+}
+
+
 void results_show_blast6out_one(FILE * fp,
                                 struct hit * hp,
                                 char * query_head,
@@ -455,6 +514,82 @@ void results_show_userout_one(FILE * fp, struct hit * hp,
   fprintf(fp, "\n");
 }
 
+void results_show_lcaout(FILE * fp,
+                         struct hit * hits,
+                         int hitcount,
+                         char * query_head,
+                         char * qsequence,
+                         int64_t qseqlen,
+                         char * rc)
+{
+  /* Output last common ancestor (LCA) of the hits,
+     in a similar way to the Sintax command */
+
+  int first_level_start[tax_levels];
+  int first_level_len[tax_levels];
+  int level_match[tax_levels];
+  char * first_h = nullptr;
+
+  fprintf(fp, "%s\t", query_head);
+
+  if (hitcount > 0)
+    {
+      for (int t = 0; t < hitcount; t++)
+        {
+          int seqno = hits[t].target;
+          if (t == 0)
+            {
+              tax_split(seqno, first_level_start, first_level_len);
+              first_h = db_getheader(seqno);
+              for (int j = 0; j < tax_levels; j++)
+                {
+                  level_match[j] = 1;
+                }
+            }
+          else
+            {
+              int level_start[tax_levels];
+              int level_len[tax_levels];
+              tax_split(seqno, level_start, level_len);
+              char * h = db_getheader(seqno);
+              for (int j = 0; j < tax_levels; j++)
+                {
+                  /* For each taxonomic level */
+                  if ((level_len[j] == first_level_len[j]) &&
+                      (strncmp(first_h + first_level_start[j],
+                               h + level_start[j],
+                               level_len[j]) == 0))
+                    {
+                      level_match[j]++;
+                    }
+                }
+            }
+        }
+
+      bool comma = false;
+      for (int j = 0; j < tax_levels; j++)
+        {
+          if (1.0 * level_match[j] / hitcount < opt_lca_cutoff)
+            {
+              break;
+            }
+
+          if (first_level_len[j] > 0)
+            {
+              fprintf(fp,
+                      "%s%c:%.*s",
+                      (comma ? "," : ""),
+                      tax_letters[j],
+                      first_level_len[j],
+                      first_h + first_level_start[j]);
+              comma = true;
+            }
+        }
+    }
+
+  fprintf(fp, "\n");
+}
+
 void results_show_alnout(FILE * fp,
                          struct hit * hits,
                          int hitcount,
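
Reviewer note on results_show_lcaout above: the per-level agreement logic can be sketched in isolation as follows. This is a minimal illustration, not the upstream code. The `Lineage` type and the `lca_consensus` function are hypothetical names introduced here; they stand in for the `level_start`/`level_len` arrays filled by tax_split and for the output written with fprintf. The cutoff plays the role of `opt_lca_cutoff`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-hit result of tax_split(): one name per
// taxonomic level, in the order of tax_letters ("dkpcofgs"). An empty string
// means the level is absent from the header.
using Lineage = std::vector<std::string>;

// Build the LCA string for a set of hits. A level is reported as long as the
// fraction of hits agreeing with the first hit stays at or above `cutoff`.
std::string lca_consensus(const std::vector<Lineage>& hits, double cutoff)
{
  const std::string letters = "dkpcofgs";
  std::string out;
  if (hits.empty())
    return out;

  const Lineage& first = hits[0];
  for (std::size_t level = 0; level < letters.size(); ++level)
    {
      // Count hits whose name at this level equals that of the first hit.
      // The first hit always agrees with itself.
      std::size_t matches = 0;
      for (const Lineage& h : hits)
        if (h.at(level) == first.at(level))
          ++matches;

      // Stop at the first level that falls below the cutoff.
      if (1.0 * matches / hits.size() < cutoff)
        break;

      // Levels absent from the first hit are skipped, not printed.
      if (!first.at(level).empty())
        {
          if (!out.empty())
            out += ',';
          out += letters[level];
          out += ':';
          out += first.at(level);
        }
    }
  return out;
}
```

As in the diff, two hits that both lack a level count as agreeing at that level, so a missing kingdom does not end the lineage early. With the default cutoff of 1.0 every reported level must be shared by all hits; lowering it allows a majority-style consensus anchored on the first hit.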


=====================================
src/results.h
=====================================
@@ -66,6 +66,14 @@ void results_show_alnout(FILE * fp,
                          int64_t qseqlen,
                          char * rc);
 
+void results_show_lcaout(FILE * fp,
+                         struct hit * hits,
+                         int hitcount,
+                         char * query_head,
+                         char * qsequence,
+                         int64_t qseqlen,
+                         char * rc);
+
 void results_show_blast6out_one(FILE * fp,
                                 struct hit * hp,
                                 char * query_head,
@@ -95,6 +103,20 @@ void results_show_fastapairs_one(FILE * fp,
                                  int64_t qseqlen,
                                  char * rc);
 
+void results_show_qsegout_one(FILE * fp,
+                              struct hit * hp,
+                              char * query_head,
+                              char * qsequence,
+                              int64_t qseqlen,
+                              char * rc);
+
+void results_show_tsegout_one(FILE * fp,
+                              struct hit * hp,
+                              char * query_head,
+                              char * qsequence,
+                              int64_t qseqlen,
+                              char * rc);
+
 void results_show_samheader(FILE * fp,
                             char * cmdline,
                             char * dbname);


=====================================
src/search.cc
=====================================
@@ -91,6 +91,9 @@ static FILE * fp_dbnotmatched = nullptr;
 static FILE * fp_otutabout = nullptr;
 static FILE * fp_mothur_shared_out = nullptr;
 static FILE * fp_biomout = nullptr;
+static FILE * fp_lcaout = nullptr;
+static FILE * fp_qsegout = nullptr;
+static FILE * fp_tsegout = nullptr;
 
 static int count_matched = 0;
 static int count_notmatched = 0;
@@ -119,6 +122,17 @@ void search_output_results(int hit_count,
                           qsequence_rc);
     }
 
+  if (fp_lcaout)
+    {
+      results_show_lcaout(fp_lcaout,
+                          hits,
+                          toreport,
+                          query_head,
+                          qsequence,
+                          qseqlen,
+                          qsequence_rc);
+    }
+
   if (fp_samout)
     {
       results_show_samout(fp_samout,
@@ -160,6 +174,26 @@ void search_output_results(int hit_count,
                                           qsequence_rc);
             }
 
+          if (fp_qsegout)
+            {
+              results_show_qsegout_one(fp_qsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
+          if (fp_tsegout)
+            {
+              results_show_tsegout_one(fp_tsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
           if (fp_uc)
             {
               if ((t==0) || opt_uc_allhits)
@@ -535,6 +569,15 @@ void search_prep(char * cmdline, char * progheader)
       fprintf(fp_alnout, "%s\n", progheader);
     }
 
+  if (opt_lcaout)
+    {
+      fp_lcaout = fopen_output(opt_lcaout);
+      if (! fp_lcaout)
+        {
+          fatal("Unable to open lca output file for writing");
+        }
+    }
+
   if (opt_samout)
     {
       fp_samout = fopen_output(opt_samout);
@@ -580,6 +623,24 @@ void search_prep(char * cmdline, char * progheader)
         }
     }
 
+  if (opt_qsegout)
+    {
+      fp_qsegout = fopen_output(opt_qsegout);
+      if (! fp_qsegout)
+        {
+          fatal("Unable to open qsegout output file for writing");
+        }
+    }
+
+  if (opt_tsegout)
+    {
+      fp_tsegout = fopen_output(opt_tsegout);
+      if (! fp_tsegout)
+        {
+          fatal("Unable to open tsegout output file for writing");
+        }
+    }
+
   if (opt_matched)
     {
       fp_matched = fopen_output(opt_matched);
@@ -689,6 +750,10 @@ void search_done()
   dbindex_free();
   db_free();
 
+  if (opt_lcaout)
+    {
+      fclose(fp_lcaout);
+    }
   if (opt_matched)
     {
       fclose(fp_matched);
@@ -701,6 +766,14 @@ void search_done()
     {
       fclose(fp_fastapairs);
     }
+  if (opt_qsegout)
+    {
+      fclose(fp_qsegout);
+    }
+  if (opt_tsegout)
+    {
+      fclose(fp_tsegout);
+    }
   if (fp_uc)
     {
       fclose(fp_uc);


=====================================
src/searchexact.cc
=====================================
@@ -89,6 +89,8 @@ static FILE * fp_dbnotmatched = nullptr;
 static FILE * fp_otutabout = nullptr;
 static FILE * fp_mothur_shared_out = nullptr;
 static FILE * fp_biomout = nullptr;
+static FILE * fp_qsegout = nullptr;
+static FILE * fp_tsegout = nullptr;
 
 static int count_matched = 0;
 static int count_notmatched = 0;
@@ -234,6 +236,26 @@ void search_exact_output_results(int hit_count,
                                           qsequence_rc);
             }
 
+          if (fp_qsegout)
+            {
+              results_show_qsegout_one(fp_qsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
+          if (fp_tsegout)
+            {
+              results_show_tsegout_one(fp_tsegout,
+                                       hp,
+                                       query_head,
+                                       qsequence,
+                                       qseqlen,
+                                       qsequence_rc);
+            }
+
           if (fp_uc)
             {
               if ((t==0) || opt_uc_allhits)
@@ -622,6 +644,24 @@ void search_exact_prep(char * cmdline, char * progheader)
         }
     }
 
+  if (opt_qsegout)
+    {
+      fp_qsegout = fopen_output(opt_qsegout);
+      if (! fp_qsegout)
+        {
+          fatal("Unable to open qsegout output file for writing");
+        }
+    }
+
+  if (opt_tsegout)
+    {
+      fp_tsegout = fopen_output(opt_tsegout);
+      if (! fp_tsegout)
+        {
+          fatal("Unable to open tsegout output file for writing");
+        }
+    }
+
   if (opt_matched)
     {
       fp_matched = fopen_output(opt_matched);
@@ -740,6 +780,14 @@ void search_exact_done()
     {
       fclose(fp_fastapairs);
     }
+  if (opt_qsegout)
+    {
+      fclose(fp_qsegout);
+    }
+  if (opt_tsegout)
+    {
+      fclose(fp_tsegout);
+    }
   if (fp_uc)
     {
       fclose(fp_uc);


=====================================
src/shuffle.cc
=====================================
@@ -62,6 +62,9 @@
 
 void shuffle()
 {
+  if (!opt_output)
+    fatal("Output file for shuffling must be specified with --output");
+
   FILE * fp_output = fopen_output(opt_output);
   if (!fp_output)
     {


=====================================
src/sintax.cc
=====================================
@@ -85,8 +85,6 @@ static int seqcount; /* number of database sequences */
 static pthread_attr_t attr;
 static fastx_handle query_fastx_h;
 
-const int tax_levels = 8;
-const char * tax_letters = "dkpcofgs";
 const int subset_size = 32;
 const int bootstrap_count = 100;
 
@@ -97,128 +95,6 @@ static FILE * fp_tabbedout;
 static int queries = 0;
 static int classified = 0;
 
-bool sintax_parse_tax(const char * header,
-                      int header_length,
-                      int * tax_start,
-                      int * tax_end)
-{
-  /*
-    Identify the first occurence of the pattern (^|;)tax=([^;]*)(;|$)
-  */
-
-  if (! header)
-    {
-      return false;
-    }
-
-  const char * attribute = "tax=";
-
-  int hlen = header_length;
-  int alen = strlen(attribute);
-
-  int i = 0;
-
-  while (i < hlen - alen)
-    {
-      char * r = (char *) strstr(header + i, attribute);
-
-      /* no match */
-      if (r == nullptr)
-        {
-          break;
-        }
-
-      i = r - header;
-
-      /* check for ';' in front */
-      if ((i > 0) && (header[i-1] != ';'))
-        {
-          i += alen + 1;
-          continue;
-        }
-
-      * tax_start = i;
-
-      /* find end (semicolon or end of header) */
-      const char * s = strchr(header+i+alen, ';');
-      if (s == nullptr)
-        {
-          * tax_end = hlen;
-        }
-      else
-        {
-          * tax_end = s - header;
-        }
-
-      return true;
-    }
-  return false;
-}
-
-void sintax_split(int seqno, int * level_start, int * level_len)
-{
-  /* Parse taxonomy string into the following parts
-     d domain
-     k kingdom
-     p phylum
-     c class
-     o order
-     f family
-     g genus
-     s species
-  */
-
-  for (int i = 0; i < tax_levels; i++)
-    {
-      level_start[i] = 0;
-      level_len[i] = 0;
-    }
-
-  int tax_start, tax_end;
-  char * h = db_getheader(seqno);
-  int hlen = db_getheaderlen(seqno);
-  if (sintax_parse_tax(h, hlen, & tax_start, & tax_end))
-    {
-      int t = tax_start + 4;
-
-      while (t < tax_end)
-        {
-          /* Is the next char a recogized tax level letter? */
-          const char * r = strchr(tax_letters, tolower(h[t]));
-          if (r)
-            {
-              int level = r - tax_letters;
-
-              /* Is there a colon after it? */
-              if (h[t + 1] == ':')
-                {
-                  level_start[level] = t + 2;
-
-                  char * z = strchr(h + t + 2, ',');
-                  if (z)
-                    {
-                      level_len[level] = z - h - t - 2;
-                    }
-                  else
-                    {
-                      level_len[level] = tax_end - t - 2;
-                    }
-                }
-            }
-
-          /* skip past next comma */
-          char * x = strchr(h + t, ',');
-          if (x)
-            {
-              t = x - h + 1;
-            }
-          else
-            {
-              t = tax_end;
-            }
-        }
-    }
-}
 
 void sintax_analyse(char * query_head,
                     int strand,
@@ -236,7 +112,7 @@ void sintax_analyse(char * query_head,
     {
       char * best_h = db_getheader(best_seqno);
 
-      sintax_split(best_seqno, best_level_start, best_level_len);
+      tax_split(best_seqno, best_level_start, best_level_len);
 
       for (int & j :
              level_match)
@@ -250,7 +126,7 @@ void sintax_analyse(char * query_head,
 
           int level_start[tax_levels];
           int level_len[tax_levels];
-          sintax_split(all_seqno[i], level_start, level_len);
+          tax_split(all_seqno[i], level_start, level_len);
 
           char * h = db_getheader(all_seqno[i]);
 


=====================================
src/sortbylength.cc
=====================================
@@ -118,6 +118,9 @@ int sortbylength_compare(const void * a, const void * b)
 
 void sortbylength()
 {
+  if (!opt_output)
+    fatal("FASTA output file for sortbylength must be specified with --output");
+
   FILE * fp_output = fopen_output(opt_output);
   if (!fp_output)
     {


=====================================
src/sortbysize.cc
=====================================
@@ -108,6 +108,9 @@ int sortbysize_compare(const void * a, const void * b)
 
 void sortbysize()
 {
+  if (!opt_output)
+    fatal("FASTA output file for sortbysize must be specified with --output");
+
   FILE * fp_output = fopen_output(opt_output);
   if (!fp_output)
     {


=====================================
src/tax.cc
=====================================
@@ -0,0 +1,186 @@
+/*
+
+  VSEARCH: a versatile open source tool for metagenomics
+
+  Copyright (C) 2014-2021, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
+  All rights reserved.
+
+  Contact: Torbjorn Rognes <torognes at ifi.uio.no>,
+  Department of Informatics, University of Oslo,
+  PO Box 1080 Blindern, NO-0316 Oslo, Norway
+
+  This software is dual-licensed and available under a choice
+  of one of two licenses, either under the terms of the GNU
+  General Public License version 3 or the BSD 2-Clause License.
+
+
+  GNU General Public License version 3
+
+  This program is free software: you can redistribute it and/or modify
+  it under the terms of the GNU General Public License as published by
+  the Free Software Foundation, either version 3 of the License, or
+  (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+  The BSD 2-Clause License
+
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions
+  are met:
+
+  1. Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+
+  2. Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimer in the
+  documentation and/or other materials provided with the distribution.
+
+  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+  COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+  INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+  CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+  ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+  POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#include "vsearch.h"
+
+const char * tax_letters = "dkpcofgs";
+
+bool tax_parse(const char * header,
+               int header_length,
+               int * tax_start,
+               int * tax_end)
+{
+  /*
+    Identify the first occurrence of the pattern (^|;)tax=([^;]*)(;|$)
+  */
+
+  if (! header)
+    {
+      return false;
+    }
+
+  const char * attribute = "tax=";
+
+  int hlen = header_length;
+  int alen = strlen(attribute);
+
+  int i = 0;
+
+  while (i < hlen - alen)
+    {
+      char * r = (char *) strstr(header + i, attribute);
+
+      /* no match */
+      if (r == nullptr)
+        {
+          break;
+        }
+
+      i = r - header;
+
+      /* check for ';' in front */
+      if ((i > 0) && (header[i-1] != ';'))
+        {
+          i += alen + 1;
+          continue;
+        }
+
+      * tax_start = i;
+
+      /* find end (semicolon or end of header) */
+      const char * s = strchr(header+i+alen, ';');
+      if (s == nullptr)
+        {
+          * tax_end = hlen;
+        }
+      else
+        {
+          * tax_end = s - header;
+        }
+
+      return true;
+    }
+  return false;
+}
+
+void tax_split(int seqno, int * level_start, int * level_len)
+{
+  /* Parse taxonomy string into the following parts
+     d domain
+     k kingdom
+     p phylum
+     c class
+     o order
+     f family
+     g genus
+     s species
+  */
+
+  for (int i = 0; i < tax_levels; i++)
+    {
+      level_start[i] = 0;
+      level_len[i] = 0;
+    }
+
+  int tax_start, tax_end;
+  char * h = db_getheader(seqno);
+  int hlen = db_getheaderlen(seqno);
+  if (tax_parse(h, hlen, & tax_start, & tax_end))
+    {
+      int t = tax_start + 4;
+
+      while (t < tax_end)
+        {
+          /* Is the next char a recognized tax level letter? */
+          const char * r = strchr(tax_letters, tolower(h[t]));
+          if (r)
+            {
+              int level = r - tax_letters;
+
+              /* Is there a colon after it? */
+              if (h[t + 1] == ':')
+                {
+                  level_start[level] = t + 2;
+
+                  char * z = strchr(h + t + 2, ',');
+                  if (z)
+                    {
+                      level_len[level] = z - h - t - 2;
+                    }
+                  else
+                    {
+                      level_len[level] = tax_end - t - 2;
+                    }
+                }
+            }
+
+          /* skip past next comma */
+          char * x = strchr(h + t, ',');
+          if (x)
+            {
+              t = x - h + 1;
+            }
+          else
+            {
+              t = tax_end;
+            }
+        }
+    }
+}
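
Reviewer note on tax_parse and tax_split above: the header format they expect can be illustrated with a small standalone sketch that works on a std::string. The function name `split_tax` is hypothetical and does not exist in vsearch. One deliberate difference from the code in the diff is an assumption of this sketch: the comma search here is bounded by the end of the tax attribute, whereas the original uses strchr on the rest of the header.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Split the "tax=" attribute of a header into eight levels (d,k,p,c,o,f,g,s).
// The result is indexed like tax_letters; missing levels stay empty.
std::array<std::string, 8> split_tax(const std::string& header)
{
  const std::string letters = "dkpcofgs";
  const std::string attribute = "tax=";
  std::array<std::string, 8> levels;

  // Find "tax=" at the start of the header or right after a ';'.
  std::size_t pos = header.find(attribute);
  while (pos != std::string::npos && pos > 0 && header[pos - 1] != ';')
    pos = header.find(attribute, pos + 1);
  if (pos == std::string::npos)
    return levels;

  // The attribute value runs to the next ';' or to the end of the header.
  std::size_t begin = pos + attribute.size();
  std::size_t end = header.find(';', begin);
  if (end == std::string::npos)
    end = header.size();

  // Walk the comma-separated "letter:name" entries.
  while (begin < end)
    {
      std::size_t comma = header.find(',', begin);
      if (comma == std::string::npos || comma > end)
        comma = end;

      // An entry needs a recognized level letter followed by a colon.
      if (comma - begin >= 2 && header[begin + 1] == ':')
        {
          const char letter = static_cast<char>(
            std::tolower(static_cast<unsigned char>(header[begin])));
          const std::size_t level = letters.find(letter);
          if (level != std::string::npos)
            levels[level] = header.substr(begin + 2, comma - begin - 2);
        }
      begin = comma + 1;
    }
  return levels;
}
```

For a header such as `seq1;tax=d:Bacteria,p:Firmicutes,g:Bacillus;size=10`, the sketch fills the domain, phylum and genus slots and leaves the other five empty, which mirrors the zero-length levels produced by tax_split.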


=====================================
src/tax.h
=====================================
@@ -0,0 +1,69 @@
+/*
+
+  VSEARCH: a versatile open source tool for metagenomics
+
+  Copyright (C) 2014-2021, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
+  All rights reserved.
+
+  Contact: Torbjorn Rognes <torognes at ifi.uio.no>,
+  Department of Informatics, University of Oslo,
+  PO Box 1080 Blindern, NO-0316 Oslo, Norway
+
+  This software is dual-licensed and available under a choice
+  of one of two licenses, either under the terms of the GNU
+  General Public License version 3 or the BSD 2-Clause License.
+
+
+  GNU General Public License version 3
+
+  This program is free software: you can redistribute it and/or modify
+  it under the terms of the GNU General Public License as published by
+  the Free Software Foundation, either version 3 of the License, or
+  (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+  The BSD 2-Clause License
+
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions
+  are met:
+
+  1. Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+
+  2. Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimer in the
+  documentation and/or other materials provided with the distribution.
+
+  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+  COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+  INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+  CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+  ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+  POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+const int tax_levels = 8;
+extern const char * tax_letters;
+
+bool tax_parse(const char * header,
+               int header_length,
+               int * tax_start,
+               int * tax_end);
+
+void tax_split(int seqno, int * level_start, int * level_len);


=====================================
src/udb.cc
=====================================
@@ -151,13 +151,16 @@ uint64_t largewrite(int fd, void * buf, uint64_t nbyte, uint64_t offset)
   return nbyte;
 }
 
-bool udb_detect_isudb(const char * filename)
+auto udb_detect_isudb(const char * filename) -> bool
 {
   /*
     Detect whether the given filename seems to refer to an UDB file.
     It must be an uncompressed regular file, not a pipe.
   */
 
+  constexpr uint32_t udb_file_signature {0x55444246};
+  constexpr uint64_t expected_n_bytes {sizeof(uint32_t)};
+
   xstat_t fs;
 
   if (xstat(filename, & fs))
@@ -179,10 +182,10 @@ bool udb_detect_isudb(const char * filename)
     }
 
   unsigned int magic = 0;
-  uint64_t bytesread = read(fd, & magic, 4);
+  uint64_t bytesread = read(fd, & magic, expected_n_bytes);
   close(fd);
 
-  if ((bytesread == 4) && (magic == 0x55444246))
+  if ((bytesread == expected_n_bytes) && (magic == udb_file_signature))
     {
       return true;
     }
@@ -602,6 +605,9 @@ void udb_read(const char * filename,
 
 void udb_fasta()
 {
+  if (!opt_output)
+    fatal("FASTA output file must be specified with --output");
+
   /* open FASTA file for writing */
 
   FILE * fp_output = fopen_output(opt_output);
@@ -864,6 +870,9 @@ void udb_stats()
 
 void udb_make()
 {
+  if (!opt_output)
+    fatal("UDB output file must be specified with --output");
+
   int fd_output = 0;
 
   fd_output = xopen_write(opt_output);
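
Reviewer note on the udb_detect_isudb hunk earlier in this file's diff: the change replaces the literals 4 and 0x55444246 with named constants. The check itself can be sketched on an in-memory buffer as below. The helper name `looks_like_udb` is hypothetical, and using memcpy instead of reading straight into an integer is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Signature value taken from the diff above.
constexpr std::uint32_t udb_file_signature {0x55444246};

// Return true if the first four bytes of `buf`, interpreted in host byte
// order, equal the UDB signature. `n` is the number of bytes actually read.
bool looks_like_udb(const unsigned char * buf, std::size_t n)
{
  std::uint32_t magic = 0;
  if (n < sizeof(magic))
    return false;               // short read: cannot be a UDB file
  std::memcpy(&magic, buf, sizeof(magic));
  return magic == udb_file_signature;
}
```

As in the original, the comparison is done in host byte order, so the on-disk byte sequence that matches depends on the endianness of the machine that wrote the file.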


=====================================
src/vsearch.cc
=====================================
@@ -73,6 +73,7 @@ bool opt_fastq_nostagger;
 bool opt_gzip_decompress;
 bool opt_label_substr_match;
 bool opt_no_progress;
+bool opt_fastq_qout_max;
 bool opt_quiet;
 bool opt_relabel_keep;
 bool opt_relabel_md5;
@@ -134,6 +135,7 @@ char * opt_fastx_getsubseq;
 char * opt_fastx_mask;
 char * opt_fastx_revcomp;
 char * opt_fastx_subsample;
+char * opt_fastx_uniques;
 char * opt_join_padgap;
 char * opt_join_padgapq;
 char * opt_label;
@@ -142,6 +144,7 @@ char * opt_label_suffix;
 char * opt_label_word;
 char * opt_label_words;
 char * opt_label_field;
+char * opt_lcaout;
 char * opt_log;
 char * opt_makeudb_usearch;
 char * opt_maskfasta;
@@ -156,10 +159,12 @@ char * opt_otutabout;
 char * opt_output;
 char * opt_pattern;
 char * opt_profile;
+char * opt_qsegout;
 char * opt_relabel;
 char * opt_rereplicate;
 char * opt_reverse;
 char * opt_samout;
+char * opt_sample;
 char * opt_search_exact;
 char * opt_sff_convert;
 char * opt_shuffle;
@@ -167,6 +172,7 @@ char * opt_sintax;
 char * opt_sortbylength;
 char * opt_sortbysize;
 char * opt_tabbedout;
+char * opt_tsegout;
 char * opt_udb2fasta;
 char * opt_udbinfo;
 char * opt_udbstats;
@@ -187,6 +193,7 @@ double opt_fastq_maxee;
 double opt_fastq_maxee_rate;
 double opt_fastq_truncee;
 double opt_id;
+double opt_lca_cutoff;
 double opt_max_unmasked_pct;
 double opt_maxid;
 double opt_maxqt;
@@ -803,6 +810,7 @@ void args_init(int argc, char **argv)
   opt_fastq_qmaxout = 41;
   opt_fastq_qmin = 0;
   opt_fastq_qminout = 0;
+  opt_fastq_qout_max = false;
   opt_fastq_stats = nullptr;
   opt_fastq_stripleft = 0;
   opt_fastq_stripright = 0;
@@ -855,6 +863,8 @@ void args_init(int argc, char **argv)
   opt_length_cutoffs_increment = 50;
   opt_length_cutoffs_longest = INT_MAX;
   opt_length_cutoffs_shortest = 50;
+  opt_lca_cutoff = 1.0;
+  opt_lcaout = nullptr;
   opt_log = nullptr;
   opt_makeudb_usearch = nullptr;
   opt_maskfasta = nullptr;
@@ -904,6 +914,7 @@ void args_init(int argc, char **argv)
   opt_pattern = nullptr;
   opt_profile = nullptr;
   opt_qmask = MASK_DUST;
+  opt_qsegout = nullptr;
   opt_query_cov = 0.0;
   opt_quiet = false;
   opt_randseed = 0;
@@ -918,6 +929,7 @@ void args_init(int argc, char **argv)
   opt_rowlen = 64;
   opt_samheader = false;
   opt_samout = nullptr;
+  opt_sample = nullptr;
   opt_sample_pct = 0;
   opt_sample_size = 0;
   opt_search_exact = nullptr;
@@ -942,6 +954,7 @@ void args_init(int argc, char **argv)
   opt_threads = 0;
   opt_top_hits_only = 0;
   opt_topn = LONG_MAX;
+  opt_tsegout = nullptr;
   opt_udb2fasta = nullptr;
   opt_udbinfo = nullptr;
   opt_udbstats = nullptr;
@@ -1039,6 +1052,7 @@ void args_init(int argc, char **argv)
       option_fastq_qmaxout,
       option_fastq_qmin,
       option_fastq_qminout,
+      option_fastq_qout_max,
       option_fastq_stats,
       option_fastq_stripleft,
       option_fastq_stripright,
@@ -1060,6 +1074,7 @@ void args_init(int argc, char **argv)
       option_fastx_mask,
       option_fastx_revcomp,
       option_fastx_subsample,
+      option_fastx_uniques,
       option_fulldp,
       option_gapext,
       option_gapopen,
@@ -1081,6 +1096,8 @@ void args_init(int argc, char **argv)
       option_label_word,
       option_label_words,
       option_labels,
+      option_lca_cutoff,
+      option_lcaout,
       option_leftjust,
       option_length_cutoffs,
       option_log,
@@ -1133,6 +1150,7 @@ void args_init(int argc, char **argv)
       option_pattern,
       option_profile,
       option_qmask,
+      option_qsegout,
       option_query_cov,
       option_quiet,
       option_randseed,
@@ -1147,6 +1165,7 @@ void args_init(int argc, char **argv)
       option_rowlen,
       option_samheader,
       option_samout,
+      option_sample,
       option_sample_pct,
       option_sample_size,
       option_search_exact,
@@ -1171,6 +1190,7 @@ void args_init(int argc, char **argv)
       option_threads,
       option_top_hits_only,
       option_topn,
+      option_tsegout,
       option_uc,
       option_uc_allhits,
       option_uchime2_denovo,
@@ -1270,6 +1290,7 @@ void args_init(int argc, char **argv)
       {"fastq_qmaxout",         required_argument, nullptr, 0 },
       {"fastq_qmin",            required_argument, nullptr, 0 },
       {"fastq_qminout",         required_argument, nullptr, 0 },
+      {"fastq_qout_max",        no_argument,       nullptr, 0 },
       {"fastq_stats",           required_argument, nullptr, 0 },
       {"fastq_stripleft",       required_argument, nullptr, 0 },
       {"fastq_stripright",      required_argument, nullptr, 0 },
@@ -1291,6 +1312,7 @@ void args_init(int argc, char **argv)
       {"fastx_mask",            required_argument, nullptr, 0 },
       {"fastx_revcomp",         required_argument, nullptr, 0 },
       {"fastx_subsample",       required_argument, nullptr, 0 },
+      {"fastx_uniques",         required_argument, nullptr, 0 },
       {"fulldp",                no_argument,       nullptr, 0 },
       {"gapext",                required_argument, nullptr, 0 },
       {"gapopen",               required_argument, nullptr, 0 },
@@ -1312,6 +1334,8 @@ void args_init(int argc, char **argv)
       {"label_word",            required_argument, nullptr, 0 },
       {"label_words",           required_argument, nullptr, 0 },
       {"labels",                required_argument, nullptr, 0 },
+      {"lca_cutoff",            required_argument, nullptr, 0 },
+      {"lcaout",                required_argument, nullptr, 0 },
       {"leftjust",              no_argument,       nullptr, 0 },
       {"length_cutoffs",        required_argument, nullptr, 0 },
       {"log",                   required_argument, nullptr, 0 },
@@ -1364,6 +1388,7 @@ void args_init(int argc, char **argv)
       {"pattern",               required_argument, nullptr, 0 },
       {"profile",               required_argument, nullptr, 0 },
       {"qmask",                 required_argument, nullptr, 0 },
+      {"qsegout",               required_argument, nullptr, 0 },
       {"query_cov",             required_argument, nullptr, 0 },
       {"quiet",                 no_argument,       nullptr, 0 },
       {"randseed",              required_argument, nullptr, 0 },
@@ -1378,6 +1403,7 @@ void args_init(int argc, char **argv)
       {"rowlen",                required_argument, nullptr, 0 },
       {"samheader",             no_argument,       nullptr, 0 },
       {"samout",                required_argument, nullptr, 0 },
+      {"sample",                required_argument, nullptr, 0 },
       {"sample_pct",            required_argument, nullptr, 0 },
       {"sample_size",           required_argument, nullptr, 0 },
       {"search_exact",          required_argument, nullptr, 0 },
@@ -1402,6 +1428,7 @@ void args_init(int argc, char **argv)
       {"threads",               required_argument, nullptr, 0 },
       {"top_hits_only",         no_argument,       nullptr, 0 },
       {"topn",                  required_argument, nullptr, 0 },
+      {"tsegout",               required_argument, nullptr, 0 },
       {"uc",                    required_argument, nullptr, 0 },
       {"uc_allhits",            no_argument,       nullptr, 0 },
       {"uchime2_denovo",        required_argument, nullptr, 0 },
@@ -2420,6 +2447,34 @@ void args_init(int argc, char **argv)
           opt_fasta2fastq = optarg;
           break;
 
+        case option_lcaout:
+          opt_lcaout = optarg;
+          break;
+
+        case option_lca_cutoff:
+          opt_lca_cutoff = args_getdouble(optarg);
+          break;
+
+        case option_fastx_uniques:
+          opt_fastx_uniques = optarg;
+          break;
+
+        case option_fastq_qout_max:
+          opt_fastq_qout_max = true;
+          break;
+
+        case option_sample:
+          opt_sample = optarg;
+          break;
+
+        case option_qsegout:
+          opt_qsegout = optarg;
+          break;
+
+        case option_tsegout:
+          opt_tsegout = optarg;
+          break;
+
         default:
           fatal("Internal error in option parsing");
         }
@@ -2466,6 +2521,7 @@ void args_init(int argc, char **argv)
       option_fastx_mask,
       option_fastx_revcomp,
       option_fastx_subsample,
+      option_fastx_uniques,
       option_h,
       option_help,
       option_makeudb_usearch,
@@ -2497,7 +2553,7 @@ void args_init(int argc, char **argv)
     The first line is the command and the lines below are the valid options.
   */
 
-  const int valid_options[][92] =
+  const int valid_options[][96] =
     {
       {
         option_allpairs_global,
@@ -2518,6 +2574,7 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
         option_leftjust,
         option_log,
         option_match,
@@ -2550,6 +2607,7 @@ void args_init(int argc, char **argv)
         option_output_no_hits,
         option_pattern,
         option_qmask,
+        option_qsegout,
         option_query_cov,
         option_quiet,
         option_relabel,
@@ -2561,6 +2619,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -2569,6 +2628,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_userfields,
         option_userout,
@@ -2603,6 +2663,7 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
         option_leftjust,
         option_log,
         option_match,
@@ -2639,6 +2700,7 @@ void args_init(int argc, char **argv)
         option_pattern,
         option_profile,
         option_qmask,
+        option_qsegout,
         option_query_cov,
         option_quiet,
         option_relabel,
@@ -2650,6 +2712,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -2660,6 +2723,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_userfields,
         option_userout,
@@ -2694,6 +2758,7 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
         option_leftjust,
         option_log,
         option_match,
@@ -2730,6 +2795,7 @@ void args_init(int argc, char **argv)
         option_pattern,
         option_profile,
         option_qmask,
+        option_qsegout,
         option_query_cov,
         option_quiet,
         option_relabel,
@@ -2741,6 +2807,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -2751,6 +2818,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_userfields,
         option_userout,
@@ -2785,6 +2853,7 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
         option_leftjust,
         option_log,
         option_match,
@@ -2821,6 +2890,7 @@ void args_init(int argc, char **argv)
         option_pattern,
         option_profile,
         option_qmask,
+        option_qsegout,
         option_query_cov,
         option_quiet,
         option_relabel,
@@ -2832,6 +2902,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -2842,6 +2913,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_userfields,
         option_userout,
@@ -2877,6 +2949,7 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
         option_leftjust,
         option_log,
         option_match,
@@ -2911,6 +2984,7 @@ void args_init(int argc, char **argv)
         option_notrunclabels,
         option_otutabout,
         option_output_no_hits,
+        option_qsegout,
         option_pattern,
         option_profile,
         option_qmask,
@@ -2925,6 +2999,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -2935,6 +3010,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_unoise_alpha,
         option_userfields,
@@ -2955,6 +3031,7 @@ void args_init(int argc, char **argv)
         option_fastaout_discarded_rev,
         option_fastaout_rev,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notrunclabels,
@@ -2964,6 +3041,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_xee,
@@ -2988,6 +3066,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_strand,
@@ -3002,6 +3081,7 @@ void args_init(int argc, char **argv)
         option_bzip2_decompress,
         option_fasta_width,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxseqlength,
         option_maxuniquesize,
@@ -3016,6 +3096,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_strand,
@@ -3030,6 +3111,7 @@ void args_init(int argc, char **argv)
         option_bzip2_decompress,
         option_fasta_width,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxseqlength,
         option_maxuniquesize,
@@ -3044,6 +3126,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_strand,
@@ -3060,6 +3143,7 @@ void args_init(int argc, char **argv)
         option_fastq_qmaxout,
         option_fastqout,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_quiet,
@@ -3068,6 +3152,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3095,6 +3180,7 @@ void args_init(int argc, char **argv)
         option_fastq_qminout,
         option_fastqout,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_quiet,
@@ -3103,6 +3189,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3166,6 +3253,7 @@ void args_init(int argc, char **argv)
         option_fastqout_discarded_rev,
         option_fastqout_rev,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxsize,
         option_minsize,
@@ -3177,6 +3265,7 @@ void args_init(int argc, char **argv)
         option_relabel_self,
         option_relabel_sha1,
         option_reverse,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3195,6 +3284,7 @@ void args_init(int argc, char **argv)
         option_gzip_decompress,
         option_join_padgap,
         option_join_padgapq,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_quiet,
@@ -3251,6 +3341,7 @@ void args_init(int argc, char **argv)
         option_relabel_self,
         option_relabel_sha1,
         option_reverse,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3299,6 +3390,7 @@ void args_init(int argc, char **argv)
         option_fastqout_discarded_rev,
         option_fastqout_rev,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxsize,
         option_minsize,
@@ -3311,6 +3403,7 @@ void args_init(int argc, char **argv)
         option_relabel_self,
         option_relabel_sha1,
         option_reverse,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3329,6 +3422,7 @@ void args_init(int argc, char **argv)
         option_gzip_decompress,
         option_label,
         option_label_substr_match,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notmatched,
@@ -3340,6 +3434,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3359,6 +3454,7 @@ void args_init(int argc, char **argv)
         option_label,
         option_label_field,
         option_label_substr_match,
+        option_label_suffix,
         option_label_word,
         option_label_words,
         option_labels,
@@ -3373,6 +3469,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3391,6 +3488,7 @@ void args_init(int argc, char **argv)
         option_gzip_decompress,
         option_label,
         option_label_substr_match,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notmatched,
@@ -3402,6 +3500,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_subseq_end,
@@ -3421,6 +3520,7 @@ void args_init(int argc, char **argv)
         option_fastqout,
         option_gzip_decompress,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_max_unmasked_pct,
         option_min_unmasked_pct,
@@ -3433,6 +3533,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3459,6 +3560,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3477,6 +3579,7 @@ void args_init(int argc, char **argv)
         option_fastqout,
         option_fastqout_discarded,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notrunclabels,
@@ -3487,6 +3590,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sample_pct,
         option_sample_size,
         option_sizein,
@@ -3496,6 +3600,45 @@ void args_init(int argc, char **argv)
         option_xsize,
         -1 },
 
+      { option_fastx_uniques,
+        option_bzip2_decompress,
+        option_fasta_width,
+        option_fastaout,
+        option_fastq_ascii,
+        option_fastq_asciiout,
+        option_fastq_qmax,
+        option_fastq_qmaxout,
+        option_fastq_qmin,
+        option_fastq_qminout,
+        option_fastq_qout_max,
+        option_fastqout,
+        option_gzip_decompress,
+        option_label_suffix,
+        option_log,
+        option_maxseqlength,
+        option_maxuniquesize,
+        option_minseqlength,
+        option_minuniquesize,
+        option_no_progress,
+        option_notrunclabels,
+        option_quiet,
+        option_relabel,
+        option_relabel_keep,
+        option_relabel_md5,
+        option_relabel_self,
+        option_relabel_sha1,
+        option_sample,
+        option_sizein,
+        option_sizeout,
+        option_strand,
+        option_tabbedout,
+        option_threads,
+        option_topn,
+        option_uc,
+        option_xee,
+        option_xsize,
+        -1 },
+
       { option_h,
         option_log,
         option_quiet,
@@ -3528,6 +3671,7 @@ void args_init(int argc, char **argv)
         option_fasta_width,
         option_gzip_decompress,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_max_unmasked_pct,
         option_maxseqlength,
@@ -3543,6 +3687,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3558,6 +3703,7 @@ void args_init(int argc, char **argv)
         option_fastaout,
         option_fastqout,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notmatched,
@@ -3569,6 +3715,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_tabbedout,
@@ -3582,6 +3729,7 @@ void args_init(int argc, char **argv)
         option_bzip2_decompress,
         option_fasta_width,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notrunclabels,
@@ -3592,6 +3740,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3612,6 +3761,9 @@ void args_init(int argc, char **argv)
         option_fastapairs,
         option_gzip_decompress,
         option_hardmask,
+        option_label_suffix,
+        option_lca_cutoff,
+        option_lcaout,
         option_log,
         option_match,
         option_matched,
@@ -3635,6 +3787,7 @@ void args_init(int argc, char **argv)
         option_otutabout,
         option_output_no_hits,
         option_qmask,
+        option_qsegout,
         option_quiet,
         option_relabel,
         option_relabel_keep,
@@ -3644,12 +3797,14 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_sizein,
         option_sizeout,
         option_strand,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_uc_allhits,
         option_userfields,
@@ -3663,6 +3818,7 @@ void args_init(int argc, char **argv)
         option_fastq_qmaxout,
         option_fastq_qminout,
         option_fastqout,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_quiet,
@@ -3671,6 +3827,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sff_clip,
         option_sizeout,
         option_threads,
@@ -3683,6 +3840,7 @@ void args_init(int argc, char **argv)
         option_fastq_qmax,
         option_fastq_qmin,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxseqlength,
         option_minseqlength,
@@ -3696,6 +3854,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3712,10 +3871,12 @@ void args_init(int argc, char **argv)
         option_fastq_qmax,
         option_fastq_qmin,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_notrunclabels,
         option_quiet,
+        option_randseed,
         option_sintax_cutoff,
         option_strand,
         option_tabbedout,
@@ -3730,6 +3891,7 @@ void args_init(int argc, char **argv)
         option_fastq_qmax,
         option_fastq_qmin,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxseqlength,
         option_minseqlength,
@@ -3742,6 +3904,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3757,6 +3920,7 @@ void args_init(int argc, char **argv)
         option_fastq_qmax,
         option_fastq_qmin,
         option_gzip_decompress,
+        option_label_suffix,
         option_log,
         option_maxseqlength,
         option_maxsize,
@@ -3771,6 +3935,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3790,6 +3955,7 @@ void args_init(int argc, char **argv)
         option_gapext,
         option_gapopen,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_match,
         option_mindiffs,
@@ -3806,6 +3972,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3828,6 +3995,7 @@ void args_init(int argc, char **argv)
         option_gapext,
         option_gapopen,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_match,
         option_mindiffs,
@@ -3844,6 +4012,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3866,6 +4035,7 @@ void args_init(int argc, char **argv)
         option_gapext,
         option_gapopen,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_match,
         option_mindiffs,
@@ -3882,6 +4052,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3906,6 +4077,7 @@ void args_init(int argc, char **argv)
         option_gapext,
         option_gapopen,
         option_hardmask,
+        option_label_suffix,
         option_log,
         option_match,
         option_mindiffs,
@@ -3922,6 +4094,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -3938,6 +4111,7 @@ void args_init(int argc, char **argv)
 
       { option_udb2fasta,
         option_fasta_width,
+        option_label_suffix,
         option_log,
         option_no_progress,
         option_output,
@@ -3947,6 +4121,7 @@ void args_init(int argc, char **argv)
         option_relabel_md5,
         option_relabel_self,
         option_relabel_sha1,
+        option_sample,
         option_sizein,
         option_sizeout,
         option_threads,
@@ -3989,6 +4164,9 @@ void args_init(int argc, char **argv)
         option_iddef,
         option_idprefix,
         option_idsuffix,
+        option_label_suffix,
+        option_lca_cutoff,
+        option_lcaout,
         option_leftjust,
         option_log,
         option_match,
@@ -4023,6 +4201,7 @@ void args_init(int argc, char **argv)
         option_output_no_hits,
         option_pattern,
         option_qmask,
+        option_qsegout,
         option_query_cov,
         option_quiet,
         option_relabel,
@@ -4034,6 +4213,7 @@ void args_init(int argc, char **argv)
         option_rowlen,
         option_samheader,
         option_samout,
+        option_sample,
         option_self,
         option_selfid,
         option_sizein,
@@ -4043,6 +4223,7 @@ void args_init(int argc, char **argv)
         option_target_cov,
         option_threads,
         option_top_hits_only,
+        option_tsegout,
         option_uc,
         option_uc_allhits,
         option_userfields,
@@ -4355,6 +4536,11 @@ void args_init(int argc, char **argv)
       fatal("The argument to sintax_cutoff must be in the range 0.0 to 1.0");
     }
 
+  if ((opt_lca_cutoff <= 0.5) || (opt_lca_cutoff > 1.0))
+    {
+      fatal("The argument to lca_cutoff must be larger than 0.5, but not larger than 1.0");
+    }
+
   if (opt_minuniquesize < 1)
     {
       fatal("The argument to minuniquesize must be at least 1");
@@ -4643,6 +4829,7 @@ void cmd_help()
               "  --derep_fulllength FILENAME dereplicate sequences in the given FASTA file\n"
               "  --derep_id FILENAME         dereplicate using both identifiers and sequences\n"
               "  --derep_prefix FILENAME     dereplicate sequences in file based on prefixes\n"
+              "  --fastx_uniques FILENAME    dereplicate sequences in the FASTA/FASTQ file\n"
               "  --rereplicate FILENAME      rereplicate sequences in the given FASTA file\n"
               " Parameters\n"
               "  --maxuniquesize INT         maximum abundance for output from dereplication\n"
@@ -4650,13 +4837,21 @@ void cmd_help()
               "  --sizein                    propagate abundance annotation from input\n"
               "  --strand plus|both          dereplicate plus or both strands (plus)\n"
               " Output\n"
-              "  --output FILENAME           output FASTA file\n"
+              "  --fastq_ascii INT           FASTQ input quality score ASCII base char (33)\n"
+              "  --fastq_qmax INT            maximum base quality value for FASTQ input (41)\n"
+              "  --fastq_qmaxout INT         maximum base quality value for FASTQ output (41)\n"
+              "  --fastq_qmin INT            minimum base quality value for FASTQ input (0)\n"
+              "  --fastq_qminout INT         minimum base quality value for FASTQ output (0)\n"
+              "  --fastaout FILENAME         output FASTA file (for fastx_uniques)\n"
+              "  --fastqout FILENAME         output FASTQ file (for fastx_uniques)\n"
+              "  --output FILENAME           output FASTA file (not for fastx_uniques)\n"
               "  --relabel STRING            relabel with this prefix string\n"
               "  --relabel_keep              keep the old label after the new when relabelling\n"
               "  --relabel_md5               relabel with md5 digest of normalized sequence\n"
               "  --relabel_self              relabel with the sequence itself as label\n"
               "  --relabel_sha1              relabel with sha1 digest of normalized sequence\n"
               "  --sizeout                   write abundance annotation to output\n"
+              "  --tabbedout FILENAME        write cluster info to tsv file for fastx_uniques\n"
               "  --topn INT                  output only n most abundant sequences after derep\n"
               "  --uc FILENAME               filename for UCLUST-like dereplication output\n"
               "  --xsize                     strip abundance information in derep output\n"
@@ -4820,6 +5015,7 @@ void cmd_help()
               "  --iddef INT                 id definition, 0-4=CD-HIT,all,int,MBL,BLAST (2)\n"
               "  --idprefix INT              reject if first n nucleotides do not match\n"
               "  --idsuffix INT              reject if last n nucleotides do not match\n"
+              "  --lca_cutoff REAL           fraction of matching hits required for LCA (1.0)\n"
               "  --leftjust                  reject if terminal gaps at alignment left end\n"
               "  --match INT                 score for match (2)\n"
               "  --maxaccepts INT            number of hits to accept and show per strand (1)\n"
@@ -4860,6 +5056,7 @@ void cmd_help()
               "  --dbmatched FILENAME        FASTA file for matching database sequences\n"
               "  --dbnotmatched FILENAME     FASTA file for non-matching database sequences\n"
               "  --fastapairs FILENAME       FASTA file with pairs of query and target\n"
+              "  --lcaout FILENAME           output LCA of matching sequences to file\n"
               "  --matched FILENAME          FASTA file for matching query sequences\n"
               "  --mothur_shared_out FN      filename for OTU table output in mothur format\n"
               "  --notmatched FILENAME       FASTA file for non-matching query sequences\n"
@@ -5011,7 +5208,7 @@ void cmd_usearch_global()
       (!opt_dbmatched) && (!opt_dbnotmatched) &&
       (!opt_samout) && (!opt_otutabout) &&
       (!opt_biomout) && (!opt_mothur_shared_out) &&
-      (!opt_fastapairs))
+      (!opt_fastapairs) && (!opt_lcaout))
     {
       fatal("No output files specified");
     }
@@ -5039,7 +5236,7 @@ void cmd_search_exact()
       (!opt_dbmatched) && (!opt_dbnotmatched) &&
       (!opt_samout) && (!opt_otutabout) &&
       (!opt_biomout) && (!opt_mothur_shared_out) &&
-      (!opt_fastapairs))
+      (!opt_fastapairs) && (!opt_lcaout))
     {
       fatal("No output files specified");
     }
@@ -5052,94 +5249,6 @@ void cmd_search_exact()
   search_exact(cmdline, progheader);
 }
 
-void cmd_sortbysize()
-{
-  if (!opt_output)
-    {
-      fatal("FASTA output file for sortbysize must be specified with --output");
-    }
-
-  sortbysize();
-}
-
-void cmd_sortbylength()
-{
-  if (!opt_output)
-    {
-      fatal("FASTA output file for sortbylength must be specified with --output");
-    }
-
-  sortbylength();
-}
-
-void cmd_rereplicate()
-{
-  if (!opt_output)
-    {
-      fatal("FASTA output file for rereplicate must be specified with --output");
-    }
-
-  rereplicate();
-}
-
-void cmd_derep()
-{
-  if ((!opt_output) && (!opt_uc))
-    {
-      fatal("Output file for dereplication must be specified with --output or --uc");
-    }
-
-  if (opt_derep_fulllength)
-    {
-      derep_fulllength();
-    }
-  else if (opt_derep_id)
-    {
-      derep_id();
-    }
-  else
-    {
-      if (opt_strand > 1)
-        {
-          fatal("Option '--strand both' not supported with --derep_prefix");
-        }
-      else
-        {
-          derep_prefix();
-        }
-    }
-}
-
-void cmd_shuffle()
-{
-  if (!opt_output)
-    {
-      fatal("Output file for shuffling must be specified with --output");
-    }
-
-  shuffle();
-}
-
-void cmd_fastq_eestats()
-{
-  if (!opt_output)
-    {
-      fatal("Output file for fastq_eestats must be specified with --output");
-    }
-
-  fastq_eestats();
-}
-
-void cmd_fastq_eestats2()
-{
-  if (!opt_output)
-    {
-      fatal("Output file for fastq_eestats2 must be specified with --output");
-    }
-
-  fastq_eestats2();
-}
-
 void cmd_subsample()
 {
   if ((!opt_fastaout) && (!opt_fastqout))
@@ -5155,44 +5264,6 @@ void cmd_subsample()
   subsample();
 }
 
-void cmd_maskfasta()
-{
-  if (!opt_output)
-    {
-      fatal("Output file for masking must be specified with --output");
-    }
-
-  maskfasta();
-}
-
-void cmd_makeudb_usearch()
-{
-  if (!opt_output)
-    {
-      fatal("UDB output file must be specified with --output");
-    }
-  udb_make();
-}
-
-void cmd_udb2fasta()
-{
-  if (!opt_output)
-    {
-      fatal("FASTA output file must be specified with --output");
-    }
-  udb_fasta();
-}
-
-void cmd_fastx_mask()
-{
-  if ((!opt_fastaout) && (!opt_fastqout))
-    {
-      fatal("Specify output files for masking with --fastaout and/or --fastqout");
-    }
-
-  fastx_mask();
-}
-
 void cmd_none()
 {
   if (! opt_quiet)
@@ -5206,7 +5277,6 @@ void cmd_none()
               "vsearch --allpairs_global FILENAME --id 0.5 --alnout FILENAME\n"
               "vsearch --cluster_size FILENAME --id 0.97 --centroids FILENAME\n"
               "vsearch --cut FILENAME --cut_pattern G^AATT_C --fastaout FILENAME\n"
-              "vsearch --derep_fulllength FILENAME --output FILENAME\n"
               "vsearch --fastq_chars FILENAME\n"
               "vsearch --fastq_convert FILENAME --fastqout FILENAME --fastq_ascii 64\n"
               "vsearch --fastq_eestats FILENAME --output FILENAME\n"
@@ -5218,6 +5288,7 @@ void cmd_none()
               "vsearch --fastx_mask FILENAME --fastaout FILENAME\n"
               "vsearch --fastx_revcomp FILENAME --fastqout FILENAME\n"
               "vsearch --fastx_subsample FILENAME --fastaout FILENAME --sample_pct 1\n"
+              "vsearch --fastx_uniques FILENAME --output FILENAME\n"
               "vsearch --makeudb_usearch FILENAME --output FILENAME\n"
               "vsearch --search_exact FILENAME --db FILENAME --alnout FILENAME\n"
               "vsearch --sff_convert FILENAME --output FILENAME --sff_clip\n"
@@ -5230,35 +5301,15 @@ void cmd_none()
               "vsearch --usearch_global FILENAME --db FILENAME --id 0.97 --alnout FILENAME\n"
               "\n"
               "Other commands: cluster_fast, cluster_smallmem, cluster_unoise, cut,\n"
-              "                derep_id, derep_prefix, fasta2fastq, fastq_filter,\n"
-              "                fastq_join, fastx_getseqs, fastx_getsubseqs, maskfasta,\n"
-              "                orient, rereplicate, uchime2_denovo, uchime3_denovo,\n"
-              "                udb2fasta, udbinfo, udbstats, version\n"
+              "                derep_id, derep_fulllength, derep_prefix, fasta2fastq,\n"
+              "                fastq_filter, fastq_join, fastx_getseqs, fastx_getsubseqs,\n"
+              "                maskfasta, orient, rereplicate, uchime2_denovo,\n"
+              "                uchime3_denovo, udb2fasta, udbinfo, udbstats, version\n"
               "\n",
               progname);
     }
 }
 
-void cmd_fastx_revcomp()
-{
-  if ((!opt_fastaout) && (!opt_fastqout))
-    {
-      fatal("No output files specified");
-    }
-
-  fastx_revcomp();
-}
-
-void cmd_fastq_convert()
-{
-  if (! opt_fastqout)
-    {
-      fatal("No output file specified with --fastqout");
-    }
-
-  fastq_convert();
-}
-
 void cmd_cluster()
 {
   if ((!opt_alnout) && (!opt_userout) &&
@@ -5464,19 +5515,27 @@ int main(int argc, char** argv)
     }
   else if (opt_sortbysize)
     {
-      cmd_sortbysize();
+      sortbysize();
     }
   else if (opt_sortbylength)
     {
-      cmd_sortbylength();
+      sortbylength();
+    }
+  else if (opt_derep_fulllength)
+    {
+      derep(opt_derep_fulllength, false);
     }
-  else if (opt_derep_fulllength || opt_derep_id || opt_derep_prefix)
+  else if (opt_derep_prefix)
     {
-      cmd_derep();
+      derep_prefix();
+    }
+  else if (opt_derep_id)
+    {
+      derep(opt_derep_id, true);
     }
   else if (opt_shuffle)
     {
-      cmd_shuffle();
+      shuffle();
     }
   else if (opt_fastx_subsample)
     {
@@ -5484,7 +5543,7 @@ int main(int argc, char** argv)
     }
   else if (opt_maskfasta)
     {
-      cmd_maskfasta();
+      maskfasta();
     }
   else if (opt_cluster_smallmem || opt_cluster_fast || opt_cluster_size || opt_cluster_unoise)
     {
@@ -5512,7 +5571,7 @@ int main(int argc, char** argv)
     }
   else if (opt_fastx_revcomp)
     {
-      cmd_fastx_revcomp();
+      fastx_revcomp();
     }
   else if (opt_search_exact)
     {
@@ -5520,11 +5579,11 @@ int main(int argc, char** argv)
     }
   else if (opt_fastx_mask)
     {
-      cmd_fastx_mask();
+      fastx_mask();
     }
   else if (opt_fastq_convert)
     {
-      cmd_fastq_convert();
+      fastq_convert();
     }
   else if (opt_fastq_mergepairs)
     {
@@ -5532,11 +5591,11 @@ int main(int argc, char** argv)
     }
   else if (opt_fastq_eestats)
     {
-      cmd_fastq_eestats();
+      fastq_eestats();
     }
   else if (opt_fastq_eestats2)
     {
-      cmd_fastq_eestats2();
+      fastq_eestats2();
     }
   else if (opt_fastq_join)
     {
@@ -5544,7 +5603,7 @@ int main(int argc, char** argv)
     }
   else if (opt_rereplicate)
     {
-      cmd_rereplicate();
+      rereplicate();
     }
   else if (opt_version)
     {
@@ -5552,11 +5611,11 @@ int main(int argc, char** argv)
     }
   else if (opt_makeudb_usearch)
     {
-      cmd_makeudb_usearch();
+      udb_make();
     }
   else if (opt_udb2fasta)
     {
-      cmd_udb2fasta();
+      udb_fasta();
     }
   else if (opt_udbinfo)
     {
@@ -5598,6 +5657,10 @@ int main(int argc, char** argv)
     {
       fasta2fastq();
     }
+  else if (opt_fastx_uniques)
+    {
+      derep(opt_fastx_uniques, false);
+    }
   else
     {
       cmd_none();


=====================================
src/vsearch.h
=====================================
@@ -250,6 +250,7 @@
 #include "otutable.h"
 #include "udb.h"
 #include "kmerhash.h"
+#include "tax.h"
 #include "sintax.h"
 #include "fastqjoin.h"
 #include "sffconvert.h"
@@ -271,6 +272,7 @@ extern bool opt_fastq_nostagger;
 extern bool opt_gzip_decompress;
 extern bool opt_label_substr_match;
 extern bool opt_no_progress;
+extern bool opt_fastq_qout_max;
 extern bool opt_quiet;
 extern bool opt_relabel_keep;
 extern bool opt_relabel_md5;
@@ -332,6 +334,7 @@ extern char * opt_fastx_getsubseq;
 extern char * opt_fastx_mask;
 extern char * opt_fastx_revcomp;
 extern char * opt_fastx_subsample;
+extern char * opt_fastx_uniques;
 extern char * opt_join_padgap;
 extern char * opt_join_padgapq;
 extern char * opt_label;
@@ -340,6 +343,7 @@ extern char * opt_labels;
 extern char * opt_label_word;
 extern char * opt_label_words;
 extern char * opt_label_field;
+extern char * opt_lcaout;
 extern char * opt_log;
 extern char * opt_makeudb_usearch;
 extern char * opt_maskfasta;
@@ -354,10 +358,12 @@ extern char * opt_otutabout;
 extern char * opt_output;
 extern char * opt_pattern;
 extern char * opt_profile;
+extern char * opt_qsegout;
 extern char * opt_relabel;
 extern char * opt_rereplicate;
 extern char * opt_reverse;
 extern char * opt_samout;
+extern char * opt_sample;
 extern char * opt_search_exact;
 extern char * opt_sff_convert;
 extern char * opt_shuffle;
@@ -365,6 +371,7 @@ extern char * opt_sintax;
 extern char * opt_sortbylength;
 extern char * opt_sortbysize;
 extern char * opt_tabbedout;
+extern char * opt_tsegout;
 extern char * opt_uc;
 extern char * opt_uchime2_denovo;
 extern char * opt_uchime3_denovo;
@@ -385,6 +392,7 @@ extern double opt_fastq_maxee;
 extern double opt_fastq_maxee_rate;
 extern double opt_fastq_truncee;
 extern double opt_id;
+extern double opt_lca_cutoff;
 extern double opt_max_unmasked_pct;
 extern double opt_maxid;
 extern double opt_maxqt;



View it on GitLab: https://salsa.debian.org/med-team/vsearch/-/commit/e1364a48a92c981c1b2be023badb5ca8c3528ca3


