[med-svn] [Git][med-team/vsearch][upstream] New upstream version 2.14.1
Steffen Möller
gitlab at salsa.debian.org
Fri Nov 29 13:35:01 GMT 2019
Steffen Möller pushed to branch upstream at Debian Med / vsearch
Commits:
a6e84605 by Steffen Moeller at 2019-11-29T13:31:44Z
New upstream version 2.14.1
- - - - -
15 changed files:
- .travis.yml
- README.md
- configure.ac
- man/Makefile.am
- man/vsearch.1
- src/cluster.cc
- src/fasta.cc
- src/fasta.h
- src/fastq.cc
- src/filter.cc
- src/mergepairs.cc
- src/msa.cc
- src/sffconvert.cc
- src/vsearch.cc
- src/vsearch.h
Changes:
=====================================
.travis.yml
=====================================
@@ -10,7 +10,7 @@ compiler:
 - clang
 install:
-- if [ $TRAVIS_OS_NAME = linux ]; then sudo apt-get install ghostscript; else brew install ghostscript; fi
+- if [ $TRAVIS_OS_NAME = linux ]; then sudo apt-get install ghostscript; sudo apt-get install groff; else brew install ghostscript; fi
 script:
 - ./autogen.sh
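The Linux install step now also pulls in groff. This is presumably because the rewritten `man/Makefile.am` rule below calls `groff` directly instead of `man -t`. To catch a missing tool before running `make` locally, a minimal pre-flight check could look like the sketch below. The function name `missing_tools` is hypothetical and not part of vsearch.

```sh
#!/bin/sh
# Hypothetical helper: print the name of each command not found on PATH.
# Returns 0 when every listed tool is present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# groff and ps2pdf (from Ghostscript) are used by the PDF manual rule.
missing=$(missing_tools groff ps2pdf) || echo "install first: $missing" >&2
```

On a Debian-like system, any names the check reports would then be installed the same way as in the Travis step above.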
=====================================
README.md
=====================================
@@ -34,7 +34,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo
## Getting Help
-If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
+If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
## Example
@@ -47,9 +47,9 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
**Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:
```
-wget https://github.com/torognes/vsearch/archive/v2.13.6.tar.gz
-tar xzf v2.13.6.tar.gz
-cd vsearch-2.13.6
+wget https://github.com/torognes/vsearch/archive/v2.14.1.tar.gz
+tar xzf v2.14.1.tar.gz
+cd vsearch-2.14.1
./autogen.sh
./configure
make
@@ -78,43 +78,43 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch-2.13.6-linux-x86_64.tar.gz
-tar xzf vsearch-2.13.6-linux-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch-2.14.1-linux-x86_64.tar.gz
+tar xzf vsearch-2.14.1-linux-x86_64.tar.gz
```
Or these commands if you are using a Linux ppc64le system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch-2.13.6-linux-ppc64le.tar.gz
-tar xzf vsearch-2.13.6-linux-ppc64le.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch-2.14.1-linux-ppc64le.tar.gz
+tar xzf vsearch-2.14.1-linux-ppc64le.tar.gz
```
Or these commands if you are using a Linux aarch64 system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch-2.13.6-linux-aarch64.tar.gz
-tar xzf vsearch-2.13.6-linux-aarch64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch-2.14.1-linux-aarch64.tar.gz
+tar xzf vsearch-2.14.1-linux-aarch64.tar.gz
```
Or these commands if you are using a Mac:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch-2.13.6-macos-x86_64.tar.gz
-tar xzf vsearch-2.13.6-macos-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch-2.14.1-macos-x86_64.tar.gz
+tar xzf vsearch-2.14.1-macos-x86_64.tar.gz
```
Or if you are using Windows, download and extract (unzip) the contents of this file:
```
-https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch-2.13.6-win-x86_64.zip
+https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch-2.14.1-win-x86_64.zip
```
-Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.13.6-linux-x86_64` or `vsearch-2.13.6-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
+Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.14.1-linux-x86_64` or `vsearch-2.14.1-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
-Windows: You will now have the binary distribution in a folder called `vsearch-2.13.6-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
+Windows: You will now have the binary distribution in a folder called `vsearch-2.14.1-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
-**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.13.6/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
+**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.14.1/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
## Packages, plugins, and wrappers
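Every download link in the README follows the same naming pattern, so the bump from 2.13.6 to 2.14.1 touches a dozen lines. The sketch below derives a binary distribution URL from a version and a platform string. It assumes the naming scheme shown above stays stable across releases, and the function `vsearch_url` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: build the release download URL for a binary
# distribution, following the naming pattern used in the README.
# $1 = version (e.g. 2.14.1), $2 = platform (e.g. linux-x86_64)
vsearch_url() {
    version=$1
    platform=$2
    base="https://github.com/torognes/vsearch/releases/download/v${version}"
    case "$platform" in
        win-*) ext=zip ;;      # Windows builds are zip archives
        *)     ext=tar.gz ;;   # Linux and macOS builds are gzipped tarballs
    esac
    printf '%s/vsearch-%s-%s.%s\n' "$base" "$version" "$platform" "$ext"
}
```

For example, `wget "$(vsearch_url 2.14.1 linux-x86_64)"` fetches the same file as the first Linux command in the README.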
=====================================
configure.ac
=====================================
@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.63])
-AC_INIT([vsearch], [2.13.6], [torognes at ifi.uio.no])
+AC_INIT([vsearch], [2.14.1], [torognes at ifi.uio.no])
AC_CANONICAL_TARGET
AM_INIT_AUTOMAKE([subdir-objects])
AC_LANG([C++])
@@ -73,6 +73,8 @@ AS_IF([test "x$enable_pdfman" != "xno"], [
fi
])
+have_man_html=no
+
if test "x${have_pthreads}" = "xyes"; then
AC_CHECK_HEADERS([pthread.h], [], [have_pthreads=no])
fi
@@ -96,6 +98,7 @@ AM_CONDITIONAL(HAVE_BZLIB, test "x${have_bzip2}" = "xyes")
AM_CONDITIONAL(HAVE_ZLIB, test "x${have_zlib}" = "xyes")
AM_CONDITIONAL(HAVE_PTHREADS, test "x${have_pthreads}" = "xyes")
AM_CONDITIONAL(HAVE_PS2PDF, test "x${have_ps2pdf}" = "xyes")
+AM_CONDITIONAL(HAVE_MAN_HTML, test "x${have_man_html}" = "xyes")
AM_CONDITIONAL(TARGET_PPC, test "x${target_ppc}" = "xyes")
AM_CONDITIONAL(TARGET_AARCH64, test "x${target_aarch64}" = "xyes")
AM_PROG_CC_C_O
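Note that this hunk sets `have_man_html` to `no` unconditionally, and no other hunk of the configure.ac diff sets it to `yes`. As far as this diff shows, `HAVE_MAN_HTML` is therefore always false, and the new HTML manual rule in `man/Makefile.am` is inactive in 2.14.1.

`AM_CONDITIONAL` evaluates the usual `test "x${var}" = "xyes"` idiom. The `x` prefix keeps the comparison well-formed on older `test` implementations when the value is empty or starts with a dash. The shell sketch below shows how that condition evaluates. The function name `am_cond_true` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper mirroring the shell test passed to
# AM_CONDITIONAL(HAVE_MAN_HTML, test "x${have_man_html}" = "xyes").
am_cond_true() {
    test "x$1" = "xyes"
}

have_man_html=no   # the value hard-coded by configure.ac in this release
if am_cond_true "$have_man_html"; then
    echo "HAVE_MAN_HTML: true (vsearch_manual.html would be built)"
else
    echo "HAVE_MAN_HTML: false (HTML manual rule skipped)"
fi
```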
=====================================
man/Makefile.am
=====================================
@@ -2,21 +2,40 @@
dist_man_MANS = vsearch.1
+doc_DATA =
+CLEANFILES =
+
+if HAVE_MAN_HTML
+
+doc_DATA += vsearch_manual.html
+
+vsearch_manual.html : vsearch.1
+ sed -e 's/\\-/-/g' $< | \
+ if [ $$(uname) == "Darwin" ] ; then \
+ iconv -f UTF-8 -t ISO-8859-1 ; \
+ else \
+ cat ; \
+ fi | \
+ groff -t -m mandoc -m www -Thtml > $@
+
+CLEANFILES += vsearch_manual.html
+
+endif
+
+
if HAVE_PS2PDF
-doc_DATA = vsearch_manual.pdf
+doc_DATA += vsearch_manual.pdf
vsearch_manual.pdf : vsearch.1
- TEMP=$$(mktemp temp.XXXXXXXX) ; \
+ sed -e 's/\\-/-/g' $< | \
if [ $$(uname) == "Darwin" ] ; then \
- sed -e 's/\\-/-/g' $< | \
- iconv -f UTF-8 -t ISO-8859-1 > $$TEMP ; \
+ iconv -f UTF-8 -t ISO-8859-1 ; \
else \
- sed -e 's/\\-/-/g' $< > $$TEMP ; \
- fi ; \
- man -t ./$$TEMP | ps2pdf -sPAPERSIZE=a4 - $@ ; \
- rm $$TEMP
+ cat ; \
+ fi | \
+ groff -t -m mandoc -T ps -P -pa4 | ps2pdf - $@
-CLEANFILES=vsearch_manual.pdf
+CLEANFILES += vsearch_manual.pdf
endif
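Both manual rules now feed `vsearch.1` through one pipeline:

- `sed` turns the roff `\-` escapes into plain hyphens.
- On macOS only, `iconv` converts UTF-8 to ISO-8859-1.
- `groff` renders the result: PostScript piped into `ps2pdf` for the PDF, or `-m www -Thtml` for the HTML.

To run these rules outside make, `$$` becomes `$`, and `$<` and `$@` become the input and output file names. The sketch below reproduces the rules by hand and assumes groff and Ghostscript are installed. The function names are hypothetical. The sketch uses the POSIX `=` operator instead of the `==` in the Makefile, since only some shells accept `==` inside `[ ]`.

```sh
#!/bin/sh
# Replace every roff "\-" escape with a plain "-", as the Makefile's sed step does.
strip_minus_escapes() {
    sed -e 's/\\-/-/g'
}

# On macOS the Makefile re-encodes UTF-8 to ISO-8859-1; elsewhere the
# text passes through unchanged (the "cat" branch).
recode_for_groff() {
    if [ "$(uname)" = "Darwin" ]; then
        iconv -f UTF-8 -t ISO-8859-1
    else
        cat
    fi
}

# Hypothetical wrappers for the two make rules.
# $1 = man page source (e.g. vsearch.1), $2 = output file
build_pdf() {
    strip_minus_escapes < "$1" | recode_for_groff |
        groff -t -m mandoc -T ps -P -pa4 | ps2pdf - "$2"
}

build_html() {
    strip_minus_escapes < "$1" | recode_for_groff |
        groff -t -m mandoc -m www -Thtml > "$2"
}
```

For example, `build_html vsearch.1 vsearch_manual.html` should produce the HTML manual even though configure.ac leaves `have_man_html` at `no`. The `.TAG` and `.URL` requests added to `vsearch.1` belong to groff's `www` macro package, which only the HTML rule loads.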
=====================================
man/vsearch.1
=====================================
@@ -1,5 +1,5 @@
.\" ============================================================================
-.TH vsearch 1 "July 2, 2019" "version 2.13.6" "USER COMMANDS"
+.TH vsearch 1 "September 18, 2019" "version 2.14.1" "USER COMMANDS"
.\" ============================================================================
.SH NAME
vsearch \(em chimera detection, clustering, dereplication and
@@ -288,37 +288,48 @@ ambiguous (e.g. \-\-derep_f).
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG help-and-version-commands
Help and version commands:
.PP
.RS
-.B \-\-help | \-\-h
+.TAG help
+.TAG h
+.TP 9
+.B \-\-help \-\-h
Display help text with brief information about all commands and
options.
+.TAG version
+.TAG v
.TP
-.B \-\-version | \-\-v
+.B \-\-version \-\-v
Output version information and a citation for the VSEARCH
publication. Show the status of the support for gzip- and
bzip2-compressed input files.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG general-options
General options:
.RS
+.TAG bzip2_decompress
.TP 9
.B \-\-bzip2_decompress
When reading from a pipe streaming bzip2-compressed data, decompress
the data. That option is not needed when reading from a standard
bzip2-compressed file.
+.TAG fasta_width
.TP
.BI \-\-fasta_width\~ "positive integer"
Fasta files produced by \fBvsearch\fR are wrapped (sequences are
written on lines of \fIinteger\fR nucleotides, 80 by default). Set
that value to zero to eliminate the wrapping.
+.TAG gzip_decompress
.TP
.B \-\-gzip_decompress
When reading from a pipe streaming gzip-compressed data, decompress
the data. That option is not needed when reading from a standard
gzip-compressed file.
+.TAG log
.TP
.BI \-\-log \0filename
Write messages to the specified log file. Information written includes
@@ -328,26 +339,32 @@ and fatal errors. The start and finish times are also recorded as well
as the elapsed time and the maximum amount of memory consumed. The
different \fBvsearch\fR commands can also write additional
informations to the log file.
+.TAG maxseqlength
.TP
.BI \-\-maxseqlength\~ "positive integer"
All \fBvsearch\fR operations discard sequences of length equal or
greater than \fIinteger\fR (50,000 nucleotides by default).
+.TAG minseqlength
.TP
.BI \-\-minseqlength\~ "positive integer"
All \fBvsearch\fR operations discard sequences of length smaller than
\fIinteger\fR: 1 nucleotide by default for sorting or shuffling, 32
nucleotides for clustering, dereplication or searching.
+.TAG no_progress
.TP
.B \-\-no_progress
Do not show the gradually increasing progress indicator.
+.TAG notrunclabels
.TP
.B \-\-notrunclabels
Do not truncate sequence labels at first space or tab, use the full
header in output files.
+.TAG quiet
.TP
.B \-\-quiet
Suppress all messages to stdout and stderr except for warnings and
fatal error messages.
+.TAG threads
.TP
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 1024). The number of threads
@@ -361,6 +378,7 @@ the other commands.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG chimera-detection-options
Chimera detection options:
.PP
.RS
@@ -381,6 +399,7 @@ sort sequences by decreasing abundance (default of
\-\-derep_fulllength command). If your sequence set needs to be
sorted, please see the \-\-sortbysize command in the sorting section.
.PP
+.TAG abskew
.TP 9
.BI \-\-abskew \0real
When using \-\-uchime_denovo, the abundance skew is used to
@@ -391,20 +410,24 @@ their parents. For \-\-uchime3_denovo the default value is 16.0. For
the other commands, the default value is 2.0, which means that the
parents should be at least 2 times more abundant than their
chimera. Any positive value equal or greater than 1.0 can be used.
+.TAG alignwidth
.TP
.BI \-\-alignwidth\~ "positive integer"
When using \-\-uchimealns, set the width of the three-way alignments
(80 nucleotides by default). Set to zero to eliminate wrapping.
+.TAG borderline
.TP
.BI \-\-borderline \0filename
Output borderline chimeric sequences to \fIfilename\fR, in fasta
format. Borderline chimeric sequences are sequences that have a high
enough score but which are not sufficiently different from their
closest parent.
+.TAG chimeras
.TP
.BI \-\-chimeras \0filename
Output chimeric sequences to \fIfilename\fR, in fasta format. Output
order may vary when using multiple threads.
+.TAG db
.TP
.BI \-\-db \0filename
When using \-\-uchime_ref, detect chimeras using the fasta-formatted
@@ -412,41 +435,50 @@ reference sequences contained in \fIfilename\fR. Reference sequences
are assumed to be chimera-free. Chimeras cannot be detected if their
parents, or sufficiently close relatives, are not present in the
database.
+.TAG dn
.TP
.BI \-\-dn \0real
No vote pseudo-count, corresponding to the parameter \fIn\fR in the
chimera scoring function (default value is 1.4).
+.TAG fasta_score
.TP
.B \-\-fasta_score
Add the chimera score to the headers in the fasta output files for
chimeras, non-chimeras and borderline sequences, using the format
';uchime_denovo=\fIfloat\fR;'.
+.TAG mindiffs
.TP
.BI \-\-mindiffs\~ "positive integer"
Minimum number of differences per segment (default value is 3). The
parameter is ignored with \-\-uchime2_denovo and \-\-uchime3_denovo.
+.TAG mindiv
.TP
.BI \-\-mindiv \0real
Minimum divergence from closest parent (default value is 0.8). The
parameter is ignored with \-\-uchime2_denovo and \-\-uchime3_denovo.
+.TAG minh
.TP
.BI \-\-minh \0real
Minimum score (\fIh\fR). Increasing this value tends to reduce the
number of false positives and to decrease sensitivity. Default value
is 0.28, and values ranging from 0.0 to 1.0 included are accepted. The
parameter is ignored with \-\-uchime2_denovo and \-\-uchime3_denovo.
+.TAG nonchimeras
.TP
.BI \-\-nonchimeras \0filename
Output non-chimeric sequences to \fIfilename\fR, in fasta
format. Output order may vary when using multiple threads.
+.TAG relabel
.TP
.BI \-\-relabel \0string
Relabel sequences using the prefix \fIstring\fR and a ticker (1, 2, 3,
etc.) to construct the new headers. Use \-\-sizeout to conserve the
abundance annotations.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
+.TAG relabel_md5
.TP
.B \-\-relabel_md5
Relabel sequences using the MD5 message digest algorithm applied to
@@ -461,6 +493,11 @@ generates a 128-bit (16-byte) digest that is represented by 16
hexadecimal numbers (using 32 symbols among 0123456789abcdef). Use
\-\-sizeout to conserve the abundance annotations.
.\" The probablity of collision for two sequences is 1/2^128
+.TAG relabel_self
+.TP
+.B \-\-relabel_self
+Relabel sequences using each sequence itself as a label.
+.TAG relabel_sha1
.TP
.B \-\-relabel_sha1
Relabel sequences using the SHA1 message digest algorithm applied to
@@ -471,27 +508,32 @@ the SHA1 algorithm instead of the MD5 algorithm. SHA1 generates a
sequences resulting in the same digest) is smaller for the SHA1
algorithm than it is for the MD5 algorithm.
.\" The probablity of collision for two sequences is 1/2^160
+.TAG self
.TP
.B \-\-self
When using \-\-uchime_ref, ignore a reference sequence when its label
matches the label of the query sequence (useful to estimate
false-positive rate in reference sequences).
.\" I am not sure the statement above is true.
+.TAG selfid
.TP
.B \-\-selfid
When using \-\-uchime_ref, ignore a reference sequence when its
nucleotide sequence is strictly identical to the nucleotidic sequence
of the query.
+.TAG sizeout
.TP
.B \-\-sizeout
When relabelling, add abundance annotations to fasta headers (using
the format ';size=\fIinteger\fR;').
+.TAG uchime_denovo
.TP
.BI \-\-uchime_denovo \0filename
Detect chimeras present in the fasta-formatted \fIfilename\fR, without
external references (i.e. \fIde novo\fR). Automatically sort the
sequences in \fIfilename\fR by decreasing abundance beforehand (see
the sorting section for details). Multithreading is not supported.
+.TAG uchime2_denovo
.TP
.BI \-\-uchime2_denovo \0filename
Detect chimeras present in the fasta-formatted \fIfilename\fR, using
@@ -499,17 +541,20 @@ the UCHIME2 algorithm. This algorithm is designed for denoised
amplicons (see \-\-cluster_unoise). Automatically sort the sequences
in \fIfilename\fR by decreasing abundance beforehand (see the sorting
section for details). Multithreading is not supported.
+.TAG uchime3_denovo
.TP
.BI \-\-uchime3_denovo \0filename
Detect chimeras present in the fasta-formatted \fIfilename\fR, using
the UCHIME2 algorithm. The only difference from \-\-uchime2_denovo is
that the default minimum abundance skew (\-\-abskew) is set to 16.0
rather than 2.0.
+.TAG uchime_ref
.TP
.BI \-\-uchime_ref \0filename
Detect chimeras present in the fasta-formatted \fIfilename\fR by
comparing them with reference sequences (option
\-\-db). Multithreading is supported.
+.TAG uchimealns
.TP
.BI \-\-uchimealns \0filename
Write the three-way global alignments (parentA, parentB, chimera) to
@@ -517,6 +562,7 @@ Write the three-way global alignments (parentA, parentB, chimera) to
modify alignment length. Output order may vary when using multiple
threads. All sequences are converted to upper case before
alignment. Lower case letters indicate disagreement in the alignment.
+.TAG uchimeout
.TP
.BI \-\-uchimeout \0filename
Write chimera detection results to \fIfilename\fR using a 18-field,
@@ -566,16 +612,19 @@ div: divergence, defined as (idQM - idQT).
YN: query is chimeric (Y), or not (N), or is a borderline case (?).
.RE
.RE
+.TAG uchimeout5
.TP
.B \-\-uchimeout5
When using \-\-uchimeout, write chimera detection results using a
17\-field, tab\-separated uchime\-like format (drop the 5th field of
\-\-uchimeout), compatible with usearch version 5 and earlier
versions.
+.TAG xn
.TP
.BI \-\-xn \0real
No vote weight, corresponding to the parameter \fIbeta\fR in the
scoring function (default value is 8.0).
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
@@ -583,6 +632,7 @@ file.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG clustering-options
Clustering options:
.RS
.PP
@@ -594,11 +644,13 @@ definition (\-\-iddef).
.PP
Input sequences are masked as specified with the \-\-qmask and
\-\-hardmask options.
+.TAG biomout
.TP 9
.BI \-\-biomout \0filename
Generate an OTU table in the biom version 1.0 JSON file format as
specified at
-http://biom-format.org/documentation/format_versions/biom-1.0.html.
+.URL http://biom-format.org/documentation/format_versions/biom-1.0.html "(link)"
+<http://biom-format.org/documentation/format_versions/biom-1.0.html>.
The format describes how to store a sparse matrix containing the
abundances of the OTUs in the different samples. This format is much
more efficient than the classic and mothur OTU table formats available
@@ -622,42 +674,51 @@ header. The OTU identifier may contain any printable character except
semicolons. If no such OTU label is found, the identifier in the
initial part of the header will be used, and all characters except
semicolons are allowed. Alternatively, OTU identifers can be generated
-using the relabelling options (\-\-relabel, \-\-relabel_sha1 or
-\-\-relabel_md5). Taxonomy information, if present, will also be
-extracted from the headers of the centroid sequences. If the header
-contains ';tax=Homo_sapiens;' or a similar string somewhere, then the
-given taxonomy information (here 'Homo_sapiens') will be used. The
-semicolon is not mandatory at the beginning or end of the header. The
-taxonomy information may contain any printable character except
-semicolons. If an OTU table in the biom version 2.1 HDF5 file format
-is required, the biom utility may be used as described at
-http://biom-format.org/documentation/biom_conversion.html.
+using the relabelling options (\-\-relabel, \-\-relabel_self,
+\-\-relabel_sha1, or \-\-relabel_md5). Taxonomy information, if
+present, will also be extracted from the headers of the centroid
+sequences. If the header contains ';tax=Homo_sapiens;' or a similar
+string somewhere, then the given taxonomy information
+(here 'Homo_sapiens') will be used. The semicolon is not mandatory at
+the beginning or end of the header. The taxonomy information may
+contain any printable character except semicolons. If an OTU table in
+the biom version 2.1 HDF5 file format is required, the biom utility
+may be used as described at
+.URL http://biom-format.org/documentation/biom_conversion.html "(link)"
+<http://biom-format.org/documentation/biom_conversion.html>.
+.TAG centroids
.TP
.BI \-\-centroids \0filename
Output cluster centroid sequences to \fIfilename\fR, in fasta
format. The centroid is the sequence that seeded the cluster (i.e. the
first sequence of the cluster).
+.TAG clusterout_id
.TP
.BI \-\-clusterout_id
Add cluster identifier information to the output files
when using the \-\-consout and \-\-profile options.
+.TAG clusterout_sort
.TP
.BI \-\-clusterout_sort
Sort output files by decreasing abundance
when using the \-\-consout, \-\-msaout and \-\-profile options.
+.TAG cluster_fast
.TP
.BI \-\-cluster_fast \0filename
Clusterize the fasta sequences in \fIfilename\fR, automatically sort
by decreasing sequence length beforehand.
+.TAG cluster_size
.TP
.BI \-\-cluster_size \0filename
Clusterize the fasta sequences in \fIfilename\fR, automatically sort
by decreasing sequence abundance beforehand.
+.TAG cluster_smallmem
.TP
.BI \-\-cluster_smallmem \0filename
Clusterize the fasta sequences in \fIfilename\fR without automatically
modifying their order beforehand. Sequence are expected to be sorted
by decreasing sequence length, unless \-\-usersort is used.
+.TAG cluster_unoise
.TP
.BI \-\-cluster_unoise \0filename
Perform denoising of the fasta sequences in \fIfilename\fR according
@@ -666,11 +727,13 @@ chimera removal step. The options \-\-minsize (default 8) and
\-\-unoise_alpha (default 2.0) may be specified. Chimera removal
(\fIde novo\fR) should be performed afterwards with
\-\-uchime3_denovo.
+.TAG clusters
.TP
.BI \-\-clusters \0string
Output each cluster to a separate fasta file using the prefix
\fIstring\fR and a ticker (0, 1, 2, etc.) to construct the path and
filenames.
+.TAG consout
.TP
.BI \-\-consout \0filename
Output cluster consensus sequences to \fIfilename\fR. For each
@@ -679,6 +742,7 @@ constructed by taking the majority symbol (nucleotide or gap) from
each column of the alignment. Columns containing a majority of gaps
are skipped, except for terminal gaps. If the \-\-sizein
option is specified, sequence abundances will be taken into account.
+.TAG cons_truncate
.TP
.B \-\-cons_truncate
This command is ignored. A warning is issued.
@@ -688,6 +752,7 @@ This command is ignored. A warning is issued.
.\" do not ignore terminal gaps. That option skips terminal columns
.\" if they contain a majority of gaps, yielding shorter consensus
.\" sequences than when using \-\-consout alone.
+.TAG id
.TP
.BI \-\-id \0real
Do not add the target to the cluster if the pairwise identity with the
@@ -695,6 +760,7 @@ centroid is lower than \fIreal\fR (value ranging from 0.0 to 1.0
included). The pairwise identity is defined as the number of (matching
columns) / (alignment length - terminal gaps). That definition can be
modified by \-\-iddef.
+.TAG iddef
.TP
.BI \-\-iddef\~ "0|1|2|3|4"
Change the pairwise identity definition used in \-\-id. Values
@@ -718,10 +784,12 @@ BLAST definition, equivalent to \-\-iddef 1 in a context of global
pairwise alignment.
.RE
.RE
+.TAG minsize
.TP
.BI \-\-minsize\~ "positive integer"
Specify the minimum abundance of sequences for denoising using
\-\-cluster_unoise. The default is 8.
+.TAG msaout
.TP
.BI \-\-msaout \0filename
Output a multiple sequence alignment and a consensus sequence for each
@@ -733,10 +801,13 @@ the majority symbol (nucleotide or gap) from each column of the
alignment. Columns containing a majority of gaps are skipped, except
for terminal gaps. If the \-\-sizein option is specified, sequence
abundances will be taken into account when computing the consensus.
+.TAG mothur_shared_out
.TP
.BI \-\-mothur_shared_out \0filename
Output an OTU table in the mothur 'shared' tab-separated plain text
-format as described at http://www.mothur.org/wiki/Shared_file. The
+format as described at
+.URL https://www.mothur.org/wiki/Shared_file (link)
+<https://www.mothur.org/wiki/Shared_file>. The
format describes how a matrix containing the abundances of the OTUs in
the different samples is stored. The first line will start with the
strings 'label', 'group' and 'numOtus' and is followed by a list of
@@ -747,6 +818,7 @@ in the order given on the first line. The OTU and sample identifiers
are extracted from the FASTA headers of the sequences. The OTUs are
represented by the cluster centroids. See the \-\-biomout option for
further details.
+.TAG otutabout
.TP
.BI \-\-otutabout \0filename
Output an OTU table in the classic tab-separated plain text format as
@@ -762,6 +834,7 @@ added to the right of the table if taxonomy information is available
for at least one of the OTUs. This column will be labelled 'taxonomy'
and each row will then contain the taxonomy information extracted for
that OTU. See the \-\-biomout option for further details.
+.TAG profile
.TP
.BI \-\-profile \0filename
Output a sequence profile to a text file with the frequency of each
@@ -773,35 +846,48 @@ number of Cs, number of Gs, number of Ts or Us, number of gap symbols,
and finally the total number of ambiguous nucleotide symbols (B, D, H,
K, M, N, R, S, Y, V or W). All numbers are integers. If the \-\-sizein
option is specified, sequence abundances will be taken into account.
+.TAG qmask
.TP
.BI \-\-qmask\~ "none|dust|soft"
Mask regions in sequences using the
\fIdust\fR or the \fIsoft\fR methods, or do not mask
(\fInone\fR). Warning, when using \fIsoft\fR masking, clustering
becomes case sensitive. The default is to mask using \fIdust\fR.
+.TAG relabel
.TP
.BI \-\-relabel \0string
Relabel sequence identifiers in the output files produced by
\-\-consout, \-\-profile and \-\-centroids options. Please see the
description of the same option under Chimera detection for details.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
+.TAG relabel_md5
.TP
-.BI \-\-relabel_md5
+.B \-\-relabel_md5
Relabel sequence identifiers in the output files produced by
\-\-consout, \-\-profile and \-\-centroids options. Please see the
description of the same option under Chimera detection for details.
+.TAG relabel_self
.TP
-.BI \-\-relabel_sha1
+.B \-\-relabel_self
Relabel sequence identifiers in the output files produced by
\-\-consout, \-\-profile and \-\-centroids options. Please see the
description of the same option under Chimera detection for details.
+.TAG relabel_sha1
+.TP
+.B \-\-relabel_sha1
+Relabel sequence identifiers in the output files produced by
+\-\-consout, \-\-profile and \-\-centroids options. Please see the
+description of the same option under Chimera detection for details.
+.TAG sizein
.TP
.B \-\-sizein
Take into account the abundance annotations present in the input fasta
file (search for the pattern '[>;]size=\fIinteger\fR[;]' in sequence
headers).
+.TAG sizeorder
.TP
.B \-\-sizeorder
When an amplicon is close to 2 or more centroids, both within the
@@ -812,6 +898,7 @@ specified with \-\-maxaccepts is higher than one. The \-\-sizeorder
option turns on what is sometimes referred to as abundance-based
greedy clustering (AGC), in contrast to the default distance-based
greedy clustering (DGC).
+.TAG sizeout
.TP
.B \-\-sizeout
Add abundance annotations to the output fasta files (add the pattern
@@ -822,10 +909,12 @@ the total abundance of the amplicons included in the cluster
(\-\-centroids option). If \-\-sizein is not specified, input
abundances are set to 1 for amplicons, and to the number of amplicons
per cluster for centroids.
+.TAG strand
.TP
.BI \-\-strand\~ "plus|both"
When comparing sequences with the cluster seed, check the \fIplus\fR
strand only (default) or check \fIboth\fR strands.
+.TAG uc
.TP
.BI \-\-uc \0filename
Output clustering results in \fIfilename\fR using a tab-separated
@@ -866,14 +955,17 @@ Label of the query sequence (H), or of the centroid sequence (S, C).
Label of the centroid sequence (H), or set to '*' (S, C).
.RE
.RE
+.TAG unoise_alpha
.TP
.BI \-\-unoise_alpha\~ real
Specify the alpha parameter to the \-\-cluster_unoise command. The
default i 2.0.
+.TAG usersort
.TP
.B \-\-usersort
When using \-\-cluster_smallmem, allow any sequence input order, not
just a decreasing length ordering.
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
@@ -888,8 +980,10 @@ definitions): \-\-alnout, \-\-blast6out, \-\-fastapairs, \-\-matched,
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG dereplication-and-rereplication-options
Dereplication and rereplication options:
.RS
+.TAG derep_fulllength
.TP 9
.BI \-\-derep_fulllength \0filename
Merge strictly identical sequences contained in
@@ -898,6 +992,7 @@ length and the same string of nucleotides (case insensitive, T and U
are considered the same). See the options \-\-sizein and \-\-sizeout
to take into account and compute abundance values. This command does
not support multithreading.
+.TAG derep_prefix
.TP
.BI \-\-derep_prefix \0filename
Merge sequences with identical prefixes contained in \fIfilename\fR.
@@ -909,14 +1004,17 @@ it is clustered with the most abundant. Remaining ties are solved
using sequence headers and sequence input order. Sequence comparisons
are case insensitive, and T and U are considered identical. This
command does not support multithreading.
+.TAG maxuniquesize
.TP
.BI \-\-maxuniquesize\~ "positive integer"
Discard sequences with a post-dereplication abundance value greater
than \fIinteger\fR.
+.TAG minuniquesize
.TP
.BI \-\-minuniquesize\~ "positive integer"
Discard sequences with a post-dereplication abundance value smaller
than \fIinteger\fR.
+.TAG output
.TP
.BI \-\-output \0filename
Write the dereplicated sequences to \fIfilename\fR, in fasta format
@@ -926,35 +1024,47 @@ the number of occurrences (i.e. abundance) of each sequence is
indicated at the end of their fasta header using the pattern
';size=\fIinteger\fR;'.
.TP
+.TAG relabel
.BI \-\-relabel \0string
Please see the description of the same option under Chimera detection
for details.
.TP
+.TAG relabel_keep
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
.TP
-.BI \-\-relabel_md5
+.TAG relabel_md5
+.B \-\-relabel_md5
Please see the description of the same option under Chimera detection
for details.
.TP
-.BI \-\-relabel_sha1
+.TAG relabel_self
+.B \-\-relabel_self
+Please see the description of the same option under Chimera detection
+for details.
+.TP
+.TAG relabel_sha1
+.B \-\-relabel_sha1
Please see the description of the same option under Chimera detection
for details.
.TP
+.TAG rereplicate
.BI \-\-rereplicate \0filename
Duplicate each sequence the number of times indicated by the abundance
of each sequence in the specified file (option \-\-sizein is always
implied). The sequence labels are identical for the same sequence,
-unless \-\-relabel, \-\-relabel_sha1 or \-\-relabel_md5 is used to
-create unique labels. Output is written to the file specified with the
-\-\-output option, in FASTA format. The output file does not contain
-abundance information unless \-\-sizeout is specified, in which case
-an abundance of 1 is used.
+unless \-\-relabel, \-\-relabel_self, \-\-relabel_sha1 or
+\-\-relabel_md5 is used to create unique labels. Output is written to
+the file specified with the \-\-output option, in FASTA format. The
+output file does not contain abundance information unless \-\-sizeout
+is specified, in which case an abundance of 1 is used.
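The rereplication rule above can be illustrated with a minimal Python sketch. It is not vsearch's implementation, and the function names are hypothetical. It assumes headers are passed without their leading '>', and that a header with no size annotation counts as abundance 1 (an assumption). Labels are kept identical for every copy. The abundance annotation is dropped unless sizeout is requested, in which case ';size=1;' is written.

```python
import re

# Abundance annotation following the '[>;]size=integer[;]' pattern;
# headers are handled here without their leading '>'.
SIZE_RE = re.compile(r"(^|;)size=(\d+);?")


def strip_size(header):
    """Remove a size annotation while keeping the rest of the label."""
    return SIZE_RE.sub(lambda m: m.group(1), header).rstrip(";")


def rereplicate(records, sizeout=False):
    """Expand (header, sequence) pairs into one pair per counted copy.

    Hypothetical helper: headers without a size annotation are assumed
    to have an abundance of 1.
    """
    out = []
    for header, seq in records:
        m = SIZE_RE.search(header)
        abundance = int(m.group(2)) if m else 1
        label = strip_size(header)  # no abundance info unless sizeout
        if sizeout:
            label += ";size=1;"  # each copy gets an abundance of 1
        out.extend([(label, seq)] * abundance)
    return out
```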
+.TAG sizein
.TP
.B \-\-sizein
Take into account the abundance annotations present in the input fasta
file (search for the pattern '[>;]size=\fIinteger\fR[;]' in sequence
headers). That option is active by default when rereplicating.
+.TAG sizeout
.TP
.B \-\-sizeout
Add abundance annotations to the output fasta file (add the pattern
@@ -964,13 +1074,16 @@ corresponding to its total abundance (sum of the abundances of its
occurrences). If \-\-sizein is not specified, input abundances are set
to 1, and each unique sequence receives a new abundance value
corresponding to its number of occurrences in the input file.
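The abundance bookkeeping of \-\-derep_fulllength combined with \-\-sizein and \-\-sizeout can be shown with a short, hypothetical Python sketch. It follows the rules stated above: comparison is case-insensitive, T and U are treated as identical, and every input abundance is 1 unless sizein is set. The sorting by decreasing abundance is an assumption of this sketch. Ties are broken by input order only, which simplifies the header-based tie-breaking that the manual describes for \-\-derep_prefix.

```python
import re

# Abundance annotation, as searched for by --sizein (headers without '>').
SIZE_RE = re.compile(r"(^|;)size=(\d+);?")


def normalize(seq):
    # Case-insensitive comparison; T and U are considered the same.
    return seq.upper().replace("U", "T")


def derep_fulllength(records, sizein=False):
    """Merge strictly identical sequences (hypothetical sketch).

    records: iterable of (header, sequence) pairs.
    Returns (first_header, first_sequence, abundance) tuples sorted by
    decreasing abundance; ties keep input order (sorted() is stable).
    """
    groups = {}  # dicts preserve insertion (first-seen) order
    for header, seq in records:
        m = SIZE_RE.search(header) if sizein else None
        abundance = int(m.group(2)) if m else 1  # default abundance is 1
        entry = groups.setdefault(normalize(seq), [header, seq, 0])
        entry[2] += abundance  # sum abundances of merged occurrences
    return sorted((tuple(e) for e in groups.values()), key=lambda e: -e[2])


def with_sizeout(header, abundance):
    """Replace any existing size annotation with ';size=N;'."""
    label = SIZE_RE.sub(lambda m: m.group(1), header).rstrip(";")
    return f"{label};size={abundance};"
```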
+.TAG strand
.TP
.BI \-\-strand\~ "plus|both"
When searching for strictly identical sequences, check the \fIplus\fR
strand only (default) or check \fIboth\fR strands.
+.TAG topn
.TP
.BI \-\-topn\~ "positive integer"
Output only the top \fIinteger\fR sequences (i.e. the most abundant).
+.TAG uc
.TP
.BI \-\-uc \0filename
Output full-length or prefix-dereplication results in \fIfilename\fR
@@ -1007,13 +1120,17 @@ Label of the query sequence (H), or of the centroid sequence (S, C).
Label of the centroid sequence (H), or set to '*' (S, C).
.RE
.RE
+.RE
+.PP
+.RS
+.TAG xsize
.TP
.B \-\-xsize
-Strip abundance information from the headers when writing the output
-file.
+Strip abundance information from the headers when writing the output file.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG extraction-options
Extraction options:
.RS
.PP
@@ -1056,20 +1173,24 @@ beginning or end of the header. Word matching is case-sensitive. The
\-\-label_field option will limit the matching of words to a certain
field in the header.
.PP
+.TAG fastaout
.TP 9
.BI \-\-fastaout \0filename
Write the extracted sequences in FASTA format to the file with the
given name.
+.TAG fastqout
.TP
.BI \-\-fastqout \0filename
Write the extracted sequences in FASTQ format to the file with the
given name. This option is illegal if the input is in FASTA format.
+.TAG fastx_getseq
.TP
.BI \-\-fastx_getseq \0filename
Extract sequences from the given FASTA or FASTQ file. Specify a label
to match using the \-\-label option. Output files are specified with
the \-\-fastaout, \-\-fastqout, \-\-notmatched and \-\-notmatchedfq
options.
+.TAG fastx_getseqs
.TP
.BI \-\-fastx_getseqs \0filename
Extract sequences from the given FASTA or FASTQ file. Specify the
@@ -1077,6 +1198,7 @@ label or labels to match using one of the following options: \-\-label,
\-\-labels, \-\-label_word, or \-\-label_words. Output
files are specified with the \-\-fastaout, \-\-fastqout,
\-\-notmatched and \-\-notmatchedfq options.
+.TAG fastx_getsubseq
.TP
.BI \-\-fastx_getsubseq \0filename
Extract a certain part of some of the sequences in the given FASTA or
@@ -1085,11 +1207,13 @@ option. Specify the subsequence range to be extracted with the
\-\-subseq_start and \-\-subseq_end options. Output files are
specified with the \-\-fastaout, \-\-fastqout, \-\-notmatched and
\-\-notmatchedfq options.
+.TAG label
.TP
.BI \-\-label \0string
Specify the label to match in the sequence header. Unless the
\-\-label_substr_match option is given, the label must match the
entire header. The comparison is not case-sensitive.
+.TAG label_field
.TP
.BI \-\-label_field \0string
Specify a field name to be used when matching using the \-\-label_word
@@ -1098,17 +1222,20 @@ must precede the word to be matched with an equals sign (=) in
between. The field must be delimited by semicolons or the beginning or
end of the header. The following header will match the label 123 in
the field abc: "seq1;abc=123".
+.TAG label_substr_match
.TP
.BI \-\-label_substr_match
The labels specified with the \-\-label or the \-\-labels option may
match anywhere in the header if this option is given. Otherwise a
label needs to match the entire header.
+.TAG label_word
.TP
.BI \-\-label_word \0string
Specify a word to match in the sequence header. Words are defined as
strings delimited by either the start or end of the header or by any
symbol that is not a letter (A-Z, a-z) or digit (0-9). The comparison is
case-sensitive.
+.TAG label_words
.TP
.BI \-\-label_words \0filename
Specify a file containing words to be matched against the sequence
@@ -1116,27 +1243,32 @@ headers. The plain text file must contain one word on each line.
Words are defined as strings delimited by either the start or end of
the header or by any symbol that is not a letter (A-Z, a-z) or digit
(0-9). The comparison is case-sensitive.
+.TAG labels
.TP
.BI \-\-labels \0filename
Specify a file containing labels to be matched against the sequence
headers. The plain text file must contain one label on each
line. Unless the \-\-label_substr_match option is given, a label must
match the entire header. The comparison is not case-sensitive.
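The three matching modes described above (whole-header labels, words, and field-restricted words) can be sketched in Python as follows. The function names are hypothetical. Treating a field match as an exact 'field=word' entry between semicolons is an interpretation of the \-\-label_field text, not a confirmed detail of vsearch.

```python
def label_matches(header, label, substr_match=False):
    """--label / --labels: whole-header match, not case-sensitive;
    with substr_match, the label may occur anywhere in the header."""
    h, lab = header.lower(), label.lower()
    return lab in h if substr_match else h == lab


def _is_word_char(ch):
    # Only ASCII letters (A-Z, a-z) and digits (0-9) are word characters.
    return ch.isascii() and ch.isalnum()


def word_matches(header, word):
    """--label_word: case-sensitive match bounded by the header start/end
    or by any symbol that is not a letter or digit."""
    start = header.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not _is_word_char(header[start - 1])
        after_ok = end == len(header) or not _is_word_char(header[end])
        if before_ok and after_ok:
            return True
        start = header.find(word, start + 1)
    return False


def field_matches(header, field, word):
    """--label_field (assumed semantics): 'field=word' must be one whole
    semicolon-delimited field of the header."""
    return f"{field}={word}" in header.split(";")
```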
+.TAG notmatched
.TP
.BI \-\-notmatched \0filename
Write the sequences that were not extracted to the file with the given
name, in FASTA format.
+.TAG notmatchedfq
.TP
.BI \-\-notmatchedfq \0filename
Write the sequences that were not extracted to the file with the given
name, in FASTQ format. This option is illegal if the input is in FASTA
format.
+.TAG subseq_end
.TP
.BI \-\-subseq_end\~ "positive integer"
Specify the end position in the sequences when extracting
subsequences using the \-\-fastx_getsubseq command. Positions are
1-based, so the sequences start at position 1. The default is to end
at the end of the sequence if this option is not specified.
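The 1-based positions used by \-\-subseq_start and \-\-subseq_end map onto Python slicing as shown below. This is a hypothetical sketch. It assumes the end position is inclusive and that an end position past the sequence is clamped to the sequence length. Neither point is stated explicitly above.

```python
def get_subseq(seq, start=1, end=None):
    """Extract positions start..end (1-based, end assumed inclusive).

    Defaults: start at position 1 and end at the end of the sequence.
    """
    if end is None or end > len(seq):
        end = len(seq)  # assumption: clamp to the sequence length
    return seq[start - 1:end]  # 1-based inclusive -> 0-based half-open
```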
+.TAG subseq_start
.TP
.BI \-\-subseq_start\~ "positive integer"
Specify the starting position in the sequences when extracting
@@ -1147,6 +1279,7 @@ specified.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG fasta-fastq-file-processing-options
FASTA/FASTQ file processing options:
.RS
.PP
@@ -1164,6 +1297,7 @@ merged using the \-\-fastq_mergepairs command. The \-\-fastx_revcomp
command reverse-complements sequences. Finally, the \-\-sff_convert
command can be used to convert SFF files to FASTQ.
.PP
+.TAG eeout
.TP 9
.B \-\-eeout
When using \-\-fastq_filter or \-\-fastq_mergepairs, include the
@@ -1171,6 +1305,7 @@ number of expected errors (ee) in the sequence header of FASTQ and
FASTA files. This option is a synonym of the \-\-fastq_eeout
option. Use the \-\-xee option to remove this information from
headers.
+.TAG eetabbedout
.TP
.BI \-\-eetabbedout \0filename
When specified with the \-\-fastq_mergepairs command, write statistics
@@ -1181,33 +1316,40 @@ reverse read, the number of observed errors in the forward read, and
the number of observed errors in the reverse read. The observed number
of errors are the number of differences in the overlap region of the
merged sequence relative to each of the reads in the pair.
+.TAG fastaout
.TP
.BI \-\-fastaout \0filename
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
write to the given FASTA-formatted file the sequences passing the
filter, or the merged sequences.
+.TAG fastaout_rev
.TP
.BI \-\-fastaout_rev \0filename
When using \-\-fastq_filter, or \-\-fastx_filter,
write to the given FASTA-formatted file the reverse reads passing the
filter.
+.TAG fastaout_notmerged_fwd
.TP
.BI \-\-fastaout_notmerged_fwd \0filename
When using \-\-fastq_mergepairs, write forward reads not merged to the
specified FASTA file.
+.TAG fastaout_notmerged_rev
.TP
.BI \-\-fastaout_notmerged_rev \0filename
When using \-\-fastq_mergepairs, write reverse reads not merged to the
specified FASTA file.
+.TAG fastaout_discarded
.TP
.BI \-\-fastaout_discarded \0filename
Write sequences that do not pass the filter of the \-\-fastq_filter or
\-\-fastx_filter command to the given FASTA-formatted file.
+.TAG fastaout_discarded_rev
.TP
.BI \-\-fastaout_discarded_rev \0filename
Write reverse reads that do not pass the filter of the
\-\-fastq_filter or \-\-fastx_filter command to the given
FASTA-formatted file.
+.TAG fastq_allowmergestagger
.TP
.B \-\-fastq_allowmergestagger
When using \-\-fastq_mergepairs, allow merging of staggered read
@@ -1217,6 +1359,7 @@ situation can occur when a very short fragment is sequenced. The 3'
overhang of the reverse read is not included in the merged
sequence. The opposite option is the \-\-fastq_nostagger option. The
default is to discard staggered pairs.
+.TAG fastq_ascii
.TP
.BI \-\-fastq_ascii\~ "positive integer"
Define the ASCII character number used as the basis for the FASTQ
@@ -1224,12 +1367,14 @@ quality score. The default is 33, which is used by the Sanger /
Illumina 1.8+ FASTQ format (phred+33). The value 64 is used by the
Solexa, Illumina 1.3+ and Illumina 1.5+ formats (phred+64). Only 33
and 64 are valid arguments.
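The quality encoding described above subtracts the ASCII offset from each character code. The hypothetical Python sketch below also applies the default \-\-fastq_qmin (0) and \-\-fastq_qmax (41) limits documented further down. It does not reproduce vsearch's actual error messages.

```python
def decode_quality(qual, ascii_offset=33, qmin=0, qmax=41):
    """Convert a FASTQ quality string to Phred scores.

    ascii_offset: 33 (phred+33) or 64 (phred+64); other values are rejected.
    """
    if ascii_offset not in (33, 64):
        raise ValueError("only 33 and 64 are valid offsets")
    scores = [ord(c) - ascii_offset for c in qual]
    for q in scores:
        if not qmin <= q <= qmax:
            raise ValueError(f"quality score {q} outside [{qmin}, {qmax}]")
    return scores
```

For example, the character I encodes a score of 40 in phred+33 (73 - 33), and the character h encodes the same score in phred+64 (104 - 64).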
+.TAG fastq_asciiout
.TP
.BI \-\-fastq_asciiout\~ "positive integer"
When using \-\-fastq_convert or \-\-sff_convert, define the ASCII
character number used as the basis for the FASTQ quality score when
writing FASTQ output files. The default is 33. Only 33 and 64 are
valid arguments.
+.TAG fastq_chars
.TP
.BI \-\-fastq_chars \0filename
Summarize the composition of sequence and quality strings contained in
@@ -1247,6 +1392,7 @@ analyzing the range of observed quality score values. In case of
success, \-\-fastq_chars suggests values for the \-\-fastq_ascii (33
or 64), \-\-fastq_qmin and \-\-fastq_qmax options to be used with the
other commands that require a FASTQ input file.
+.TAG fastq_convert
.TP
.BI \-\-fastq_convert \0filename
Convert between the different variants of the FASTQ file format. The
@@ -1256,6 +1402,7 @@ output quality encoding must be specified with the \-\-fastq_asciiout
option (default 33). The minimum and maximum output quality scores may
be limited using the \-\-fastq_qminout and \-\-fastq_qmaxout
options. The output file is specified with the \-\-fastqout option.
+.TAG fastq_eeout
.TP
.B \-\-fastq_eeout
When using \-\-fastq_filter, \-\-fastx_filter or \-\-fastq_mergepairs,
@@ -1263,6 +1410,7 @@ include the number of expected errors (ee) in the sequence header of
FASTQ and FASTA files. This option is a synonym of the \-\-eeout
option. Use the \-\-xee option to remove this information from
headers.
+.TAG fastq_eestats
.TP
.BI \-\-fastq_eestats \0filename
Analyze a FASTQ file and report statistics on the distributions of
@@ -1281,6 +1429,7 @@ EE distributions, the following statistics are included: minimum value
(Hi), and maximum value (Max). The quality encoding and the range of
quality values may be specified with \-\-fastq_ascii \-\-fastq_qmin
and \-\-fastq_qmax.
+.TAG fastq_eestats2
.TP
.BI \-\-fastq_eestats2 \0filename
Analyze the specified FASTQ file and report statistics on the number
@@ -1305,11 +1454,13 @@ expected error (EE) cutoffs may be specified with the \-\-ee_cutoffs
option which requires a comma-separated list of floating point numbers
as its argument. The default setting is "0.5,1.0,2.0" that indicates
that expected error levels of 0.5, 1.0 and 2.0 should be used.
+.TAG fastq_filter
.TP
.BI \-\-fastq_filter \0filename
Trim and/or filter sequences in the given FASTQ file. Similar to
the \-\-fastx_filter command, but works only on FASTQ files. See
\-\-fastx_filter for details.
+.TAG fastq_join
.TP
.BI \-\-fastq_join\0 filename
Join paired-end sequence reads into one sequence and add a gap between
@@ -1326,12 +1477,14 @@ IIIIIIII, corresponding to a base quality score of 40 (a very high
quality score with error probability 0.0001). The joined sequences are
output to the file(s) specified with the \-\-fastaout or \-\-fastqout
options.
+.TAG fastq_maxdiffs
.TP
.BI \-\-fastq_maxdiffs\~ "positive integer"
When using \-\-fastq_mergepairs, specify the maximum number of
non-matching nucleotides allowed in the overlap region. That option
has a strong influence on the merging success rate. The default
value is 10.
+.TAG fastq_maxdiffpct
.TP
.BI \-\-fastq_maxdiffpct\~ real
When using \-\-fastq_mergepairs, specify the maximum percentage of
@@ -1339,27 +1492,33 @@ non-matching nucleotides allowed in the overlap region. The default
value is 100.0%. There are other more sophisticated rules in the
merging algorithm that will discard read pairs with a high fraction of
mismatches.
+.TAG fastq_maxee
.TP
.BI \-\-fastq_maxee\~ real
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
discard sequences with more than the specified number of expected
errors.
+.TAG fastq_maxee_rate
.TP
.BI \-\-fastq_maxee_rate\~ real
When using \-\-fastq_filter or \-\-fastx_filter, discard sequences
with more than the specified number of expected errors per base.
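The options above do not spell out how expected errors are computed. The sketch uses the usual definition: the sum of the per-base error probabilities 10^(-Q/10). It shows how \-\-fastq_maxee and \-\-fastq_maxee_rate would then discard reads. The helper names are hypothetical.

```python
def expected_errors(scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores)


def passes_maxee(scores, maxee=None, maxee_rate=None):
    """Return False if the read exceeds either expected-error threshold."""
    ee = expected_errors(scores)
    if maxee is not None and ee > maxee:
        return False  # more expected errors than allowed in total
    if maxee_rate is not None and scores and ee / len(scores) > maxee_rate:
        return False  # more expected errors per base than allowed
    return True
```

For example, a three-base read with scores 10, 20 and 30 has 0.1 + 0.01 + 0.001 = 0.111 expected errors, or 0.037 per base.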
+.TAG fastq_maxlen
.TP
.BI \-\-fastq_maxlen\~ "positive integer"
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
discard sequences with more than the specified number of bases.
+.TAG fastq_maxmergelen
.TP
.BI \-\-fastq_maxmergelen\~ "positive integer"
When using \-\-fastq_mergepairs, specify the maximum length of the
merged sequence. By default there is no limit.
+.TAG fastq_maxns
.TP
.BI \-\-fastq_maxns\~ "positive integer"
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
discard sequences with more than the specified number of N's.
+.TAG fastq_mergepairs
.TP
.BI \-\-fastq_mergepairs\0 filename
Merge paired-end sequence reads into one sequence. The forward reads
@@ -1390,46 +1549,55 @@ may be specified with the \-\-fastq_minmergelen and
are: \-\-fastq_ascii, \-\-fastq_maxee, \-\-fastq_nostagger,
\-\-fastq_qmax, \-\-fastq_qmaxout, \-\-fastq_qmin, \-\-fastq_qminout,
and \-\-label_suffix.
+.TAG fastq_minlen
.TP
.BI \-\-fastq_minlen\~ "positive integer"
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
discard sequences with less than the specified number of bases
(default 1).
+.TAG fastq_minmergelen
.TP
.BI \-\-fastq_minmergelen\~ "positive integer"
When using \-\-fastq_mergepairs, specify the minimum length of the
merged sequence. The default is 1.
+.TAG fastq_minovlen
.TP
.BI \-\-fastq_minovlen\~ "positive integer"
When using \-\-fastq_mergepairs, specify the minimum overlap between
the merged reads. The default is 10.
+.TAG fastq_nostagger
.TP
.B \-\-fastq_nostagger
When using \-\-fastq_mergepairs, forbid the merging of staggered read
pairs. This is the default behaviour of \-\-fastq_mergepairs. To
change that behaviour, see the \-\-fastq_allowmergestagger option.
+.TAG fastq_qmax
.TP
.BI \-\-fastq_qmax\~ "positive integer"
Specify the maximum quality score accepted when reading FASTQ
files. The default is 41, which is usual for recent Sanger/Illumina
1.8+ files.
+.TAG fastq_qmaxout
.TP
.BI \-\-fastq_qmaxout\~ "positive integer"
When using \-\-fastq_convert or \-\-sff_convert, specify the maximum
quality score used when writing FASTQ files. The default is 41, which
is usual for recent Sanger/Illumina 1.8+ files. Older formats may use
a maximum quality score of 40.
+.TAG fastq_qmin
.TP
.BI \-\-fastq_qmin\~ "positive integer"
Specify the minimum quality score accepted for FASTQ files. The
default is 0, which is usual for recent Sanger/Illumina 1.8+
files. Older formats may use scores between -5 and 2.
+.TAG fastq_qminout
.TP
.BI \-\-fastq_qminout\~ "positive integer"
When using \-\-fastq_convert or \-\-sff_convert, specify the minimum
quality score used when writing FASTQ files. The default is 0, which
is usual for Sanger/Illumina 1.8+ files. Older versions of the format
may use scores between -5 and 2.
+.TAG fastq_stats
.TP
.BI \-\-fastq_stats \0filename
Analyze a FASTQ file and report the number of reads it contains. The
@@ -1513,63 +1681,77 @@ position with a quality \fIQ\fR below 5, 10, 15 or 20 (option
\-\-fastq_truncqual \fIQ\fR).
.RE
.RE
+.TAG fastq_stripleft
.TP
.BI \-\-fastq_stripleft\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, strip the specified
number of bases from the left end of the reads.
+.TAG fastq_stripright
.TP
.BI \-\-fastq_stripright\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, strip the specified
number of bases from the right end of the reads.
+.TAG fastq_tail
.TP
.BI \-\-fastq_tail\~ "positive integer"
When using \-\-fastq_chars, count the number of times a series of
characters of length \fIk\fR appears at the end of quality strings. By
default, \fIk\fR = 4.
+.TAG fastq_truncee
.TP
.BI \-\-fastq_truncee\~ real
When using \-\-fastq_filter or \-\-fastx_filter, truncate sequences so
that their total expected error is not higher than the specified
value.
+.TAG fastq_trunclen
.TP
.BI \-\-fastq_trunclen\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, truncate sequences to
the specified length. Shorter sequences are discarded.
+.TAG fastq_trunclen_keep
.TP
.BI \-\-fastq_trunclen_keep\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, truncate sequences to
the specified length. Shorter sequences are not discarded.
+.TAG fastq_truncqual
.TP
.BI \-\-fastq_truncqual\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, truncate sequences
starting from the first base with the specified base quality score
value or lower.
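The quality-based truncation rules for \-\-fastq_truncqual and \-\-fastq_truncee can be sketched as two independent Python functions. The names are hypothetical, and the sketch makes no claim about the order in which vsearch applies its trimming options.

```python
def truncate_truncqual(seq, scores, truncqual):
    """Cut at the first base whose quality is <= truncqual; that base and
    everything after it are removed."""
    for i, q in enumerate(scores):
        if q <= truncqual:
            return seq[:i], scores[:i]
    return seq, scores


def truncate_truncee(seq, scores, truncee):
    """Keep the longest prefix whose cumulative expected error
    (sum of 10 ** (-Q / 10)) is not higher than truncee."""
    ee = 0.0
    for i, q in enumerate(scores):
        ee += 10 ** (-q / 10)
        if ee > truncee:
            return seq[:i], scores[:i]
    return seq, scores
```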
+.TAG fastqout
.TP
.BI \-\-fastqout \0filename
When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
write to the given FASTQ-formatted file the sequences passing the
filter, or the merged sequences.
+.TAG fastqout_rev
.TP
.BI \-\-fastqout_rev \0filename
When using \-\-fastq_filter or \-\-fastx_filter,
write to the given FASTQ-formatted file the reverse reads passing the
filter.
+.TAG fastqout_discarded
.TP
.BI \-\-fastqout_discarded \0filename
When using \-\-fastq_filter or \-\-fastx_filter, write sequences that
do not pass the filter to the given FASTQ-formatted file.
+.TAG fastqout_discarded_rev
.TP
.BI \-\-fastqout_discarded_rev \0filename
When using \-\-fastq_filter or \-\-fastx_filter, write reverse reads that
do not pass the filter to the given FASTQ-formatted file.
+.TAG fastqout_notmerged_fwd
.TP
.BI \-\-fastqout_notmerged_fwd \0filename
When using \-\-fastq_mergepairs, write forward reads not merged to the
specified FASTQ file.
+.TAG fastqout_notmerged_rev
.TP
.BI \-\-fastqout_notmerged_rev \0filename
When using \-\-fastq_mergepairs, write reverse reads not merged to the
specified FASTQ file.
+.TAG fastx_filter
.TP
.BI \-\-fastx_filter \0filename
Trim and/or filter the sequences in the given FASTA or FASTQ file and
@@ -1600,61 +1782,79 @@ relabel the output sequences. The \-\-eeout option may be used to output the
expected number of errors in each sequence. After all sequences have
been processed, the number of kept and discarded sequences will be
shown, as well as how many of the kept sequences were trimmed.
+.TAG fastx_revcomp
.TP
.BI \-\-fastx_revcomp \0filename
Reverse-complement the sequences in the given FASTA or FASTQ file to a
file specified with the \-\-fastaout and/or \-\-fastqout options. If
the input file is in FASTA format, the output cannot be written back
to a FASTQ file due to missing base quality scores.
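Reverse complementing a FASTQ record means complementing and reversing the sequence and reversing the quality string. The Python sketch below is a hypothetical helper. Its complement table covers IUPAC codes in both cases and maps U to A. Case is preserved. Whether vsearch treats U and lowercase letters exactly this way is an assumption.

```python
# Complement table: ACGT/U, N and IUPAC ambiguity codes, both cases.
_COMP = str.maketrans("ACGTUNRYSWKMBDHVacgtunryswkmbdhv",
                      "TGCAANYRSWMKVHDBtgcaanyrswmkvhdb")


def revcomp(seq, qual=None):
    """Return the reverse complement of seq and, if given, the reversed
    quality string (qualities follow their bases)."""
    rc = seq.translate(_COMP)[::-1]
    return rc, (qual[::-1] if qual is not None else None)
```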
+.TAG join_padgap
.TP
.BI \-\-join_padgap\~ string
When running \-\-fastq_join, use the \fIstring\fR as a sequence
padding string. The default is NNNNNNNN (8 N's).
+.TAG join_padgapq
.TP
.BI \-\-join_padgapq\~ string
When running \-\-fastq_join, use the \fIstring\fR as a quality padding
string. The default is a string of I's equal in length to the sequence
padding string. The letter I corresponds to a base quality score of 40
indicating a very high quality base with error probability of 0.0001.
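The following Python sketch shows how \-\-fastq_join fills the gap, using the default \-\-join_padgap and \-\-join_padgapq values. It assumes, as is usual for paired-end joining, that the reverse read is reverse complemented (and its qualities reversed) before it is appended. That assumption and the function name are hypothetical.

```python
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fastq_join(fwd_seq, fwd_qual, rev_seq, rev_qual,
               padgap="NNNNNNNN", padgapq=None):
    """Join a read pair with a padding gap (hypothetical sketch)."""
    if padgapq is None:
        # Default: one 'I' (Q40 in phred+33) per padding symbol.
        padgapq = "I" * len(padgap)
    rc_seq = rev_seq.translate(_COMP)[::-1]  # assumed reverse complement
    rc_qual = rev_qual[::-1]
    return fwd_seq + padgap + rc_seq, fwd_qual + padgapq + rc_qual
```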
+.TAG label_suffix
.TP
.BI \-\-label_suffix\~ string
When using \-\-fastx_revcomp or \-\-fastq_mergepairs, add the suffix
\fIstring\fR to sequence headers.
+.TAG maxsize
.TP
.BI \-\-maxsize\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, discard sequences
with an abundance higher than the specified value.
+.TAG minsize
.TP
.BI \-\-minsize\~ "positive integer"
When using \-\-fastq_filter or \-\-fastx_filter, discard sequences
with an abundance lower than the specified value.
+.TAG output
.TP
.BI \-\-output \0filename
When using \-\-fastq_eestats or \-\-fastq_eestats2, write tabulated
results to \fIfilename\fR. See \-\-fastq_eestats's and
\-\-fastq_eestats2's documentation for a complete description of the
table.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When using \-\-relabel, keep the old identifier in the header after a
space.
+.TAG relabel
.TP
.BI \-\-relabel \0string
Please see the description of the same option under Chimera detection
for details.
+.TAG relabel_md5
.TP
.BI \-\-relabel_md5
Please see the description of the same option under Chimera detection
for details.
+.TAG relabel_self
+.TP
+.BI \-\-relabel_self
+Please see the description of the same option under Chimera detection
+for details.
+.TAG relabel_sha1
.TP
.BI \-\-relabel_sha1
Please see the description of the same option under Chimera detection
for details.
+.TAG reverse
.TP
.BI \-\-reverse \0filename
When using \-\-fastq_filter, \-\-fastx_filter, \-\-fastq_mergepairs or
\-\-fastq_join, specify the FASTQ file containing the
reverse reads.
+.TAG sff_convert
.TP
.BI \-\-sff_convert \0filename
Convert the given SFF file to FASTQ. The FASTQ output file is
@@ -1665,15 +1865,18 @@ converted to lower case, while the rest is in upper case. The output
quality encoding may be specified with the \-\-fastq_asciiout option
(default 33). The minimum and maximum output quality scores may be
limited using the \-\-fastq_qminout and \-\-fastq_qmaxout options.
+.TAG sff_clip
.TP
.BI \-\-sff_clip
Specifies that the sequences converted by the \-\-sff_convert command
should be clipped in both ends as indicated in the SFF file. By
default no clipping is performed.
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
file.
+.TAG xee
.TP
.B \-\-xee
Strip information about expected errors (ee) from the output file
@@ -1682,6 +1885,7 @@ headers. This information is added by the \-\-fastq_eeout and
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG masking-options
Masking options:
.RS
.PP
@@ -1689,70 +1893,87 @@ An input sequence can be composed of lower- or uppercase letters. When
soft masking is specified, lower case letters are treated as symbols
that should be masked. Otherwise the case of the input sequences is
ignored.
-.br
+.PP
Masking is performed by the commands for chimera detection
(uchime_denovo, uchime_ref), clustering (cluster_fast,
cluster_smallmem, cluster_size), masking (maskfasta, fastx_mask),
pairwise alignment (allpairs_global) and searching (search_exact,
usearch_global).
-.br
+.PP
Masking is usually specified with the \-\-qmask option, while the
\-\-dbmask option is used for the database sequences specified with
the \-\-db option with the \-\-usearch_global, \-\-search_exact and
\-\-uchime_ref commands.
-.br
+.PP
The argument to the \-\-qmask and \-\-dbmask option may be none, soft
or dust. If the argument is none, then no masking is performed. If the
argument is soft the lower case symbols are masked. Finally, if the
argument is dust, the sequence is masked using the DUST algorithm by
Tatusov and Lipman to mask low-complexity regions.
-.br
+.PP
If the \-\-hardmask option is specified, all masked regions are
converted to N's, otherwise masked regions are indicated by lower case
letters.
-.br
+.PP
If any sequence is masked, the masked version of the sequence (with
lower case letters or N's) is used in all output files. Otherwise the
sequence is unmodified. The exception is the sequences in the output
file specified with the \-\-uchimealns option, where the input
sequences are converted to upper case first and lower case letters
indicate disagreement between the aligned sequences.
-.br
+.PP
+The \-\-qmask option (or \-\-dbmask for database sequences) may be
+combined with the \-\-hardmask option. The results of using the none,
+dust or soft argument to \-\-qmask or \-\-dbmask are presented below,
+assuming each input sequence contains both lower and uppercase
+symbols.
+.PP
+Results if the \-\-hardmask option is off (default):
+.RS
+.TP 9
+.B none:
+no masking, all symbols used, no change
+.TP
+.B dust:
+masked symbols lowercased, rest uppercased
+.TP
+.B soft:
+lowercase symbols masked, no case changes
+.RE
+.PP
+Results if the \-\-hardmask option is on:
+.RS
+.TP 9
+.B none:
+no masking, all symbols used, no change
+.TP
+.B dust:
+masked symbols changed to Ns, rest unchanged
+.TP
+.B soft:
+lowercase symbols masked and changed to Ns
+.RE
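The six \-\-qmask/\-\-hardmask combinations listed above can be encoded in a short Python sketch. DUST itself is not implemented here. The hypothetical dust_mask argument stands for the per-position output of a DUST run, and the function name is illustrative only.

```python
def apply_mask(seq, qmask, hardmask=False, dust_mask=None):
    """Apply a masking mode to seq.

    qmask: "none", "soft" or "dust".
    dust_mask: per-position booleans from a DUST run (not computed here).
    """
    if qmask == "none":
        return seq  # no masking, all symbols used, no change
    if qmask == "soft":
        masked = [c.islower() for c in seq]  # lowercase symbols are masked
        if not hardmask:
            return seq  # no case changes
    elif qmask == "dust":
        if dust_mask is None:
            raise ValueError("a DUST mask must be supplied in this sketch")
        masked = dust_mask
        if not hardmask:
            # Masked symbols lowercased, rest uppercased.
            return "".join(c.lower() if m else c.upper()
                           for c, m in zip(seq, masked))
    else:
        raise ValueError(f"unknown qmask value: {qmask}")
    # Hardmask: masked symbols changed to N, rest unchanged.
    return "".join("N" if m else c for c, m in zip(seq, masked))
```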
+.PP
When a sequence region is masked, words in the region are not included
in the indices used in the heuristic search algorithm. In all other
aspects, the region is treated as other regions.
-.br
+.PP
Regions in sequences that are hardmasked (with N's) have a zero
alignment score and do not contribute to an alignment.
-.br
-Here are the results of combined masking options \-\-qmask (or
-\-\-dbmask for database sequences) and \-\-hardmask, assuming each
-input sequence contains both lower and uppercase nucleotides:
-.PP
-.ce 10 \# center the table (10 lines)
-.TS
-tab(:);
-c c c
-l l l.
-qmask:hardmask:action
-_
-none:off:no masking, all symbols used, no change
-none:on:no masking, all symbols used, no change
-dust:off:masked symbols lowercased, rest uppercased
-dust:on:masked symbols changed to Ns, rest unchanged
-soft:off:lowercase symbols masked, no case changes
-soft:on:lowercase symbols masked and changed to Ns
-.TE
-.ce 0 \# end of centering
+.RE
.PP
+.RS
+.TAG fastaout
.TP 9
.BI \-\-fastaout \0filename
Write the masked sequences to \fIfilename\fR, in fasta format. Applies
only to the \-\-fastx_mask command.
+.TAG fastqout
.TP
.BI \-\-fastqout \0filename
Write the masked sequences to \fIfilename\fR, in fastq format. Applies
only to the \-\-fastx_mask command.
+.TAG fastx_mask
.TP
.BI \-\-fastx_mask \0filename
Mask regions in sequences contained
@@ -1762,10 +1983,12 @@ are specified with the \-\-fastaout and \-\-fastqout options. The
minimum and maximum percentage of unmasked residues may be specified
with the \-\-min_unmasked_pct and \-\-max_unmasked_pct options,
respectively.
+.TAG hardmask
.TP
.B \-\-hardmask
Symbols in masked regions are replaced by N's. The default is to
replace the masked regions by lower case letters.
+.TAG maskfasta
.TP
.BI \-\-maskfasta \0filename
Mask regions in sequences contained
@@ -1773,18 +1996,22 @@ in the fasta file \fIfilename\fR. The default is to mask using
\fIdust\fR (use \-\-qmask to modify that behavior). The output file is
specified with the \-\-output option. This command is deprecated,
please use \-\-fastx_mask instead.
+.TAG max_unmasked_pct
.TP
.BI \-\-max_unmasked_pct \0real
Discard sequences with more than the specified maximum percentage of
unmasked residues. Works only with \-\-fastx_mask.
+.TAG min_unmasked_pct
.TP
.BI \-\-min_unmasked_pct \0real
Discard sequences with less than the specified minimum percentage of
unmasked residues. Works only with \-\-fastx_mask.
+.TAG output
.TP
.BI \-\-output \0filename
Write the masked sequences to \fIfilename\fR, in fasta format. Applies
only to the \-\-maskfasta command.
+.TAG qmask
.TP
.BI \-\-qmask\~ "none|dust|soft"
If the argument is dust, mask regions in sequences using the
@@ -1795,6 +2022,7 @@ mask.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG restriction-site-cutting-options
Restriction site cutting options:
.RS
.PP
@@ -1807,11 +2035,13 @@ the \-\-fastaout_rev option. Input sequences that do not match are
written to the file specified with the option \-\-fastaout_discarded,
and their reverse complements are also written to the file specified
with the \-\-fastaout_discarded_rev option. The relabel options
-(\-\-relabel, \-\-relabel_keep, \-\-relabel_md5, and \-\-relabel_sha1)
-may be used to relabel the output sequences).
+(\-\-relabel, \-\-relabel_self, \-\-relabel_keep, \-\-relabel_md5, and
+\-\-relabel_sha1) may be used to relabel the output sequences.
+.TAG cut
.TP 9
.BI \-\-cut \0filename
Specify the input file with sequences in FASTA format.
+.TAG cut_pattern
.TP
.BI \-\-cut_pattern \0string
Specify the restriction site cutting pattern and positions. The
@@ -1825,17 +2055,21 @@ pattern for the EcoRI restriction site. For such palindromic patterns
possible fragments on both strands. For non-palindromic sites, it may
be necessary to run the command also on the reverse complemented input
sequences. Exactly one cutting site on each strand must be indicated.
+.TAG fastaout
.TP
.BI \-\-fastaout \0filename
Specify the output file for the resulting fragments on the forward
strand.
+.TAG fastaout_rev
.TP
.BI \-\-fastaout_rev \0filename
Specify the output file for the resulting fragments on the reverse
strand.
+.TAG fastaout_discarded
.TP
.BI \-\-fastaout_discarded \0filename
Specify the output file for the non-matching sequences.
+.TAG fastaout_discarded_rev
.TP
.BI \-\-fastaout_discarded_rev \0filename
Specify the output file for the non-matching sequences, reverse
@@ -1843,6 +2077,7 @@ complemented.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG pairwise-alignment-options
Pairwise alignment options:
.RS
.PP
@@ -1855,24 +2090,29 @@ identity level with \-\-id to discard weak alignments. Most other
accept/reject options (see Searching options below) may also be
used. Sequences are aligned on their \fIplus\fR strand only. Masking
is performed as usual and specified with \-\-qmask and \-\-hardmask.
+.TAG acceptall
.TP 9
.B \-\-acceptall
Write the results of all alignments to output files. This option
overrides all other accept/reject options (including \-\-id).
+.TAG allpairs_global
.TP
.BI \-\-allpairs_global \0filename
Perform optimal global pairwise alignments of all vs. all fasta
sequences contained in \fIfilename\fR. This command is multi-threaded.
+.TAG id
.TP
.BI \-\-id \0real
Reject the sequence match if the pairwise identity is lower than
\fIreal\fR (value ranging from 0.0 to 1.0 included).
+.TAG threads
.TP
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 1024). The number of
threads should be less than or equal to the number of available CPU
cores. The default is to use all available resources and to launch one
thread per logical core.
+.TAG uc
.TP
.BI \-\-uc \0filename
Output pairwise alignment results in \fIfilename\fR using a
@@ -1912,13 +2152,16 @@ Label of the target sequence.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG searching-options
Searching options:
.RS
+.TAG alnout
.TP 9
.BI \-\-alnout \0filename
Write pairwise global alignments to \fIfilename\fR using a
human-readable format. Use \-\-rowlen to modify alignment
length. Output order may vary when using multiple threads.
+.TAG biomout
.TP
.BI \-\-biomout \0filename
Write search results to an OTU table in the biom version 1.0 file
@@ -1926,6 +2169,7 @@ format. The query file contains the samples, while the database file
contains the OTUs. Sample and OTU identifiers are extracted from the
header of these sequences. See the \-\-biomout option in the
Clustering section for further details.
+.TAG blast6out
.TP
.BI \-\-blast6out \0filename
Write search results to \fIfilename\fR using a blast-like
@@ -1986,6 +2230,7 @@ alignments). Always set to -1.
alignments). Always set to 0.
.RE
.RE
+.TAG db
.TP
.BI \-\-db \0filename
Compare query sequences (specified with \-\-usearch_global) to the
@@ -1993,32 +2238,38 @@ fasta-formatted target sequences contained in \fIfilename\fR, using
global pairwise alignment. Alternatively, the name of a preformatted
UDB database created using the makeudb_usearch command (see below) may
be specified.
+.TAG dbmask
.TP
.BI \-\-dbmask\~ "none|dust|soft"
Mask regions in the target database sequences using the dust method or
the soft method, or do not mask (none). Warning: when using soft
masking, search commands become case sensitive. The default is to mask
using dust.
+.TAG dbmatched
.TP
.BI \-\-dbmatched \0filename
Write database target sequences matching at least one query sequence
to \fIfilename\fR, in fasta format. If the option \-\-sizeout is used,
the number of queries that matched each target sequence is indicated
using the pattern ";size=\fIinteger\fR;".
+.TAG dbnotmatched
.TP
.BI \-\-dbnotmatched \0filename
Write database target sequences not matching query sequences to
\fIfilename\fR, in fasta format.
+.TAG fastapairs
.TP
.BI \-\-fastapairs \0filename
Write pairwise alignments of query and target sequences to
\fIfilename\fR, in fasta format.
+.TAG fulldp
.TP
.B \-\-fulldp
Dummy option for compatibility with usearch. To maximize search
sensitivity, \fBvsearch\fR uses an 8-way 16-bit SIMD vectorized full
dynamic programming algorithm (Needleman-Wunsch), whether or not
\-\-fulldp is specified.
+.TAG gapext
.TP
.BI \-\-gapext \0string
Set penalties for a gap extension. See \-\-gapopen for a complete
@@ -2026,6 +2277,7 @@ description of the penalty declaration system. The default is to
initialize the six gap extending penalties using a penalty of 2 for
extending internal gaps and a penalty of 1 for extending terminal
gaps, in both query and target sequences (i.e. 2I/1E).
+.TAG gapopen
.TP
.BI \-\-gapopen \0string
Set penalties for a gap opening. A gap opening can occur in six
@@ -2060,11 +2312,13 @@ integer gap penalties. Because the lowest gap penalties are 0.5 by
default in usearch, all default scores and gap penalties in
\fBvsearch\fR have been doubled to maintain equivalent penalties and
to produce identical alignments.
+.TAG hardmask
.TP
.B \-\-hardmask
Mask sequence regions by replacing them with Ns instead of setting
them to lower case as is the default. For more information, please see
the Masking section.
+.TAG id
.TP
.BI \-\-id \0real
Reject the sequence match if the pairwise identity is lower than
@@ -2080,6 +2334,7 @@ query needs to match the target. Consequently, using values lower than
pairwise identity is by default defined as the number of (matching
columns) / (alignment length - terminal gaps). That definition can be
modified by \-\-iddef.
+.TAG iddef
.TP
.BI \-\-iddef\~ "0|1|2|3|4"
Change the pairwise identity definition used in \-\-id. Values
@@ -2107,25 +2362,31 @@ The option \-\-userfields accepts the fields id0 to id4, in addition
to the field id, to report the pairwise identity values corresponding
to the different definitions.
.RE
+.TAG idprefix
.TP
.BI \-\-idprefix\~ "positive integer"
Reject the sequence match if the first \fIinteger\fR nucleotides of
the target do not match the query.
+.TAG idsuffix
.TP
.BI \-\-idsuffix\~ "positive integer"
Reject the sequence match if the last \fIinteger\fR nucleotides of the
target do not match the query.
+.TAG leftjust
.TP
.B \-\-leftjust
Reject the sequence match if the pairwise alignment begins with gaps.
+.TAG match
.TP
.BI \-\-match\~ "integer"
Score assigned to a match (i.e. identical nucleotides) in the pairwise
alignment. The default value is 2.
+.TAG matched
.TP
.BI \-\-matched \0filename
Write query sequences matching database target sequences to
\fIfilename\fR, in fasta format.
+.TAG maxaccepts
.TP
.BI \-\-maxaccepts\~ "positive integer"
Maximum number of hits to accept before stopping the search. The
@@ -2138,31 +2399,38 @@ criteria, it is accepted as best hit and the search process stops for
that query. If \-\-maxaccepts is set to a higher value, more hits are
accepted. If \-\-maxaccepts and \-\-maxrejects are both set to 0, the
complete database is searched.
+.TAG maxdiffs
.TP
.BI \-\-maxdiffs\~ "positive integer"
Reject the sequence match if the alignment contains at least
\fIinteger\fR substitutions, insertions or deletions.
+.TAG maxgaps
.TP
.BI \-\-maxgaps\~ "positive integer"
Reject the sequence match if the alignment contains at least
\fIinteger\fR insertions or deletions.
+.TAG maxhits
.TP
.BI \-\-maxhits\~ "positive integer"
Maximum number of hits to show once the search is terminated (hits are
sorted by decreasing identity). Unlimited by default. That option
applies to \-\-alnout, \-\-blast6out, \-\-fastapairs, \-\-samout,
\-\-uc, or \-\-userout output files.
+.TAG maxid
.TP
.BI \-\-maxid \0real
Reject the sequence match if the percentage of identity between the
two sequences is greater than \fIreal\fR.
+.TAG maxqsize
.TP
.BI \-\-maxqsize\~ "positive integer"
Reject query sequences with an abundance greater than \fIinteger\fR.
+.TAG maxqt
.TP
.BI \-\-maxqt \0real
Reject if the query/target sequence length ratio is greater than
\fIreal\fR.
+.TAG maxrejects
.TP
.BI \-\-maxrejects\~ "positive integer"
Maximum number of non-matching target sequences to consider before
@@ -2176,40 +2444,50 @@ process stops for that query (no hit). If \-\-maxrejects is set to a
higher value, more target sequences are considered. If \-\-maxaccepts
and \-\-maxrejects are both set to 0, the complete database is
searched.
+.TAG maxsizeratio
.TP
.BI \-\-maxsizeratio \0real
Reject if the query/target abundance ratio is greater than
\fIreal\fR.
+.TAG maxsl
.TP
.BI \-\-maxsl \0real
Reject if the shorter/longer sequence length ratio is
greater than \fIreal\fR.
+.TAG maxsubs
.TP
.BI \-\-maxsubs\~ "positive integer"
Reject the sequence match if the pairwise alignment contains more than
\fIinteger\fR substitutions.
+.TAG mid
.TP
.BI \-\-mid \0real
Reject the sequence match if the percentage of identity is lower than
\fIreal\fR (ignoring all gaps, internal and terminal).
+.TAG mincols
.TP
.BI \-\-mincols\~ "positive integer"
Reject the sequence match if the alignment length is shorter than
\fIinteger\fR.
+.TAG minqt
.TP
.BI \-\-minqt \0real
Reject if the query/target sequence length ratio is lower than
\fIreal\fR.
+.TAG minsizeratio
.TP
.BI \-\-minsizeratio \0real
Reject if the query/target abundance ratio is lower than \fIreal\fR.
+.TAG minsl
.TP
.BI \-\-minsl \0real
Reject if the shorter/longer sequence length ratio is lower than
\fIreal\fR.
+.TAG mintsize
.TP
.BI \-\-mintsize\~ "positive integer"
Reject target sequences with an abundance lower than \fIinteger\fR.
+.TAG minwordmatches
.TP
.BI \-\-minwordmatches\~ "non-negative integer"
Minimum number of word matches required for a sequence to be
@@ -2219,10 +2497,12 @@ considered further. Default value is 12 for the default word length
sequence has fewer unique words than the number specified, all words
in the query must match. If the argument is 0, no word matches are
required.
+.TAG mismatch
.TP
.BI \-\-mismatch\~ "integer"
Score assigned to a mismatch (i.e. different nucleotides) in the
pairwise alignment. The default value is -4.
+.TAG mothur_shared_out
.TP
.BI \-\-mothur_shared_out \0filename
Write search results to an OTU table in the mothur 'shared'
@@ -2230,10 +2510,12 @@ tab-separated plain text file format. The query file contains the
samples, while the database file contains the OTUs. Sample and OTU
identifiers are extracted from the header of these sequences. See the
\-\-otutabout option in the Clustering section for further details.
+.TAG notmatched
.TP
.BI \-\-notmatched \0filename
Write query sequences not matching database target sequences to
\fIfilename\fR, in fasta format.
+.TAG otutabout
.TP
.BI \-\-otutabout \0filename
Write search results to an OTU table in the classic tab-separated
@@ -2242,39 +2524,49 @@ database file contains the OTUs. Sample and OTU identifiers are
extracted from the header of these sequences. See the
\-\-mothur_shared_out option in the Clustering section for further
details.
+.TAG output_no_hits
.TP
.B \-\-output_no_hits
Write both matching and non-matching queries to \-\-alnout,
\-\-blast6out, \-\-samout or \-\-userout output files. Non-matching
queries are labelled 'No hits' in \-\-alnout files.
+.TAG pattern
.TP
.B \-\-pattern \fIstring\fR
This option is ignored. It is provided for compatibility with usearch.
+.TAG qmask
.TP
.BI \-\-qmask\~ "none|dust|soft"
Mask regions in the query sequences
using the dust or the soft algorithms, or do not mask
(none). Warning: when using soft masking, search commands
become case sensitive. The default is to mask using \fIdust\fR.
+.TAG query_cov
.TP
.BI \-\-query_cov \0real
Reject if the fraction of the query aligned to the target sequence is
lower than \fIreal\fR. The query coverage is computed as
(matches + mismatches) / query sequence length. Internal or terminal
gaps are not taken into account.
+.TAG rightjust
.TP
.B \-\-rightjust
Reject the sequence match if the pairwise alignment ends with gaps.
+.TAG rowlen
.TP
.BI \-\-rowlen\~ "positive integer"
Width of alignment lines in \-\-alnout output. The default value is
64. Set to 0 to eliminate wrapping.
+.TAG samheader
.TP
.B \-\-samheader
Include header lines in the SAM file when \-\-samout is specified. The
header includes lines starting with @HD, @SQ and @PG, but no @RG lines
-(see <https://github.com/samtools/hts-specs>). By default no header
-line is written.
+(see
+.URL https://github.com/samtools/hts-specs (link)
+<https://github.com/samtools/hts-specs>). By default no header line is
+written.
+.TAG samout
.TP
.BI \-\-samout \0filename
Write alignment results to \fIfilename\fR using the SAM format (a
@@ -2283,8 +2575,10 @@ file starts with header lines. Each non-header line is a SAM record,
which represents either a query-target alignment or the absence of
match for a query (output order may vary when using multiple
threads). Each record contains 11 mandatory fields and optional fields
-(see <https://github.com/samtools/hts-specs> for a complete
-description of the format):
+(see
+.URL https://github.com/samtools/hts-specs (link)
+<https://github.com/samtools/hts-specs> for a complete description of
+the format):
.RS
.RS
.nr step 1 1
@@ -2318,9 +2612,7 @@ as usearch does).
.IP \n+[step].
quality string (ignored, always set to '*').
.RE
-.TP
-Optional fields for query-target matches (number and order of fields
-may vary):
+Optional fields for query-target matches (number and order of fields may vary):
.RS
.nr step 12 1
.IP \n[step]. 4
@@ -2341,6 +2633,7 @@ MD:Z:? string for mismatching positions.
YT:Z:UU string representing the alignment type.
.RE
.RE
+.TAG search_exact
.TP
.BI \-\-search_exact \0filename
Search for exact full-length matches to the query sequences contained
@@ -2349,29 +2642,35 @@ in \fIfilename\fR in the database of target sequences (\-\-db). Only
\-\-usearch_global. The \-\-id, \-\-maxaccepts and \-\-maxrejects
options are ignored, but the rest of the searching options may be
specified.
+.TAG self
.TP
.B \-\-self
Reject the sequence match if the query and target labels are
identical.
+.TAG selfid
.TP
.B \-\-selfid
Reject the sequence match if the query and target sequences are
strictly identical.
+.TAG sizeout
.TP
.B \-\-sizeout
Add abundance annotations to the output of the option \-\-dbmatched
(using the pattern ';size=\fIinteger\fR;'), to report the number of
queries that matched each target.
+.TAG strand
.TP
.BI \-\-strand\~ "plus|both"
When searching for similar sequences, check the \fIplus\fR strand only
(default) or check \fIboth\fR strands.
+.TAG target_cov
.TP
.BI \-\-target_cov \0real
Reject the sequence match if the fraction of the target sequence
aligned to the query sequence is lower than \fIreal\fR. The target
coverage is computed as (matches + mismatches) / target sequence
length. Internal or terminal gaps are not taken into account.
+.TAG top_hits_only
.TP
.B \-\-top_hits_only
Only the top hits between the query and database sequence sets are
@@ -2383,6 +2682,7 @@ the highest percentage of identity (see the \-\-iddef option to change
the way identity is measured). For a given query, if several top hits
present exactly the same percentage of identity, the number of hits
reported is controlled by the \-\-maxaccepts value (1 by default).
+.TAG uc
.TP
.BI \-\-uc \0filename
Output searching results in \fIfilename\fR using a tab-separated
@@ -2424,26 +2724,31 @@ Label of the query sequence.
Label of the target centroid sequence. Set to '*' for N.
.RE
.RE
+.TAG uc_allhits
.TP
.B \-\-uc_allhits
When using the \-\-uc option, show all hits, not just the top hit for
each query.
+.TAG usearch_global
.TP
.BI \-\-usearch_global \0filename
Compare target sequences (\-\-db) to the fasta-formatted query
sequences contained in \fIfilename\fR, using global pairwise
alignment.
+.TAG userfields
.TP
.BI \-\-userfields \0string
When using \-\-userout, select and order the fields written to the
output file. Fields are separated by '+' (e.g. query+target+id). See
the 'Userfields' section for a complete list of fields.
+.TAG userout
.TP
.BI \-\-userout \0filename
Write user-defined tab-separated output to \fIfilename\fR. Select the
fields with the option \-\-userfields. Output order may vary when
using multiple threads. If \-\-userfields is empty or not present,
\fIfilename\fR is empty.
+.TAG weak_id
.TP
.BI \-\-weak_id \0real
Show hits with percentage of identity of at least \fIreal\fR, without
@@ -2453,6 +2758,7 @@ are found (as defined by \-\-maxaccepts, \-\-maxrejects, and
\-\-maxaccepts, high \-\-id values can be used, hence preserving both
speed and sensitivity. Logically, \fIreal\fR must be smaller than the
value indicated by \-\-id.
+.TAG wordlength
.TP
.BI \-\-wordlength\~ "positive integer"
Length of words (i.e. \fIk\fR-mers) for database indexing. The range
@@ -2469,26 +2775,32 @@ default value is 8.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG shuffling-options
Shuffling options:
.RS
Fasta entries in the input file are written in a pseudo-random
order.
+.TAG output
.TP 9
.BI \-\-output \0filename
Write the shuffled sequences to \fIfilename\fR, in fasta format.
+.TAG randseed
.TP
.BI \-\-randseed\~ "positive integer"
When shuffling sequence order, use \fIinteger\fR as seed. A given seed
always produces the same output order (useful for replicability). Set
to 0 to use a pseudo-random seed (default behavior).
+.TAG relabel
.TP
.BI \-\-relabel \0string
Relabel sequences using the prefix \fIstring\fR and a ticker (1, 2, 3,
etc.) to construct the new headers. Use \-\-sizeout to conserve the
abundance annotations.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
+.TAG relabel_md5
.TP
.B \-\-relabel_md5
Relabel sequences using the MD5 message digest algorithm applied to
@@ -2502,6 +2814,11 @@ inputs give the same result. The MD5 digest generates a 128-bit
(16-byte) digest that is represented by 16 hexadecimal numbers (using
32 symbols among 0123456789abcdef). Use \-\-sizeout to conserve the
abundance annotations.
+.TAG relabel_self
+.TP
+.B \-\-relabel_self
+Relabel sequences using the sequence itself as the label.
+.TAG relabel_sha1
.TP
.B \-\-relabel_sha1
Relabel sequences using the SHA1 message digest algorithm applied to
@@ -2512,19 +2829,23 @@ hexadecimal numbers (40 symbols). The probability of a collision (two
non-identical sequences having the same digest) is smaller for the
SHA1 algorithm than it is for the MD5 algorithm. Use \-\-sizeout to
conserve the abundance annotations.
+.TAG sizeout
.TP
.B \-\-sizeout
-When using \-\-relabel, \-\-relabel_md5 or \-\-relabel_sha1, preserve
-and report abundance annotations to the output fasta file (using the
-pattern ';size=\fIinteger\fR;').
+When using \-\-relabel, \-\-relabel_self, \-\-relabel_md5 or
+\-\-relabel_sha1, preserve and report abundance annotations to the
+output fasta file (using the pattern ';size=\fIinteger\fR;').
+.TAG shuffle
.TP
.BI \-\-shuffle \0filename
Pseudo-randomly shuffle the order of sequences contained in
\fIfilename\fR.
+.TAG topn
.TP
.BI \-\-topn\~ "positive integer"
Output only the first \fIinteger\fR sequences after pseudo-random
reordering.
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
@@ -2532,6 +2853,7 @@ file.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG sorting-options
Sorting options:
.RS
Fasta entries are sorted by decreasing abundance (\-\-sortbysize) or
@@ -2544,51 +2866,68 @@ sorting performed during chimera checking (\-\-uchime_denovo),
dereplication (\-\-derep_fulllength), and clustering (\-\-cluster_fast
and \-\-cluster_size).
.PP
+.TAG maxsize
.TP 9
.BI \-\-maxsize\~ "positive integer"
When using \-\-sortbysize, discard sequences with an abundance value
greater than \fIinteger\fR.
+.TAG minsize
.TP
.BI \-\-minsize\~ "positive integer"
When using \-\-sortbysize, discard sequences with an abundance value
smaller than \fIinteger\fR.
+.TAG output
.TP
.BI \-\-output \0filename
Write the sorted sequences to \fIfilename\fR, in fasta format.
+.TAG relabel
.TP
.BI \-\-relabel \0string
Please see the description of the same option under Chimera detection
for details.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
+.TAG relabel_md5
.TP
.BI \-\-relabel_md5
Please see the description of the same option under Chimera detection
for details.
+.TAG relabel_self
+.TP
+.BI \-\-relabel_self
+Please see the description of the same option under Chimera detection
+for details.
+.TAG relabel_sha1
.TP
.BI \-\-relabel_sha1
Please see the description of the same option under Chimera detection
for details.
+.TAG sizeout
.TP
.B \-\-sizeout
When using \-\-relabel, report abundance annotations to the output
fasta file (using the pattern ';size=\fIinteger\fR;').
+.TAG sortbylength
.TP
.BI \-\-sortbylength \0filename
Sort the sequences contained in \fIfilename\fR by decreasing
length. See the general options \-\-minseqlength and
\-\-maxseqlength to eliminate short and long sequences.
+.TAG sortbysize
.TP
.BI \-\-sortbysize \0filename
Sort the sequences contained in \fIfilename\fR by decreasing abundance
(missing abundance values are assumed to be ';size=1'). See the
options \-\-minsize and \-\-maxsize to eliminate rare and dominant
sequences.
+.TAG topn
.TP
.BI \-\-topn\~ "positive integer"
Output only the top \fIinteger\fR sequences (i.e. the longest or the
most abundant).
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
@@ -2596,6 +2935,7 @@ file.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG subsampling-options
Subsampling options:
.RS
Subsampling randomly extracts a certain number or a certain percentage
@@ -2614,12 +2954,15 @@ with the options \-\-fasta_discarded and \-\-fastq_discarded. The
\-\-fastq_ascii, \-\-fastq_qmin and \-\-fastq_qmax options are also
available.
.PP
+.TAG fastaout
.TP 9
.BI \-\-fastaout \0filename
Write the sampled sequences to \fIfilename\fR, in fasta format.
+.TAG fastaout_discarded
.TP
.BI \-\-fastaout_discarded \0filename
Write the sequences not sampled to \fIfilename\fR, in fasta format.
+.TAG fastq_ascii
.TP
.BI \-\-fastq_ascii\~ "positive integer"
Define the ASCII character number used as the basis for the FASTQ
@@ -2627,42 +2970,51 @@ quality score. The default is 33, which is used by the Sanger /
Illumina 1.8+ FASTQ format (phred+33). The value 64 is used by the
Solexa, Illumina 1.3+ and Illumina 1.5+ formats (phred+64). Only 33
and 64 are valid arguments.
+.TAG fastq_qmax
.TP
.BI \-\-fastq_qmax\~ "positive integer"
Specify the maximum quality score accepted when reading FASTQ
files. The default is 41, which is usual for recent Sanger/Illumina
1.8+ files.
+.TAG fastq_qmin
.TP
.BI \-\-fastq_qmin\~ "positive integer"
Specify the minimum quality score accepted for FASTQ files. The
default is 0, which is usual for recent Sanger/Illumina 1.8+
files. Older formats may use scores between -5 and 2.
+.TAG fastqout
.TP
.BI \-\-fastqout \0filename
Write the sampled sequences to \fIfilename\fR, in fastq
format. Requires input in fastq format.
+.TAG fastqout_discarded
.TP
.BI \-\-fastqout_discarded \0filename
Write the sequences not sampled to \fIfilename\fR, in fastq
format. Requires input in fastq format.
+.TAG fastx_subsample
.TP
.BI \-\-fastx_subsample \0filename
Perform subsampling from the sequences in the specified input file
that is in FASTA or FASTQ format.
+.TAG randseed
.TP
.BI \-\-randseed\~ "positive integer"
Use \fIinteger\fR as a seed for the pseudo-random generator. A given
seed always produces the same output, which is useful for
replicability. Set to 0 to use a pseudo-random seed (default
behavior).
+.TAG relabel
.TP
.BI \-\-relabel \0string
Relabel sequences using the prefix \fIstring\fR and a ticker (1, 2, 3,
etc.) to construct the new headers. Use \-\-sizeout to conserve the
abundance annotations.
+.TAG relabel_keep
.TP
.B \-\-relabel_keep
When relabelling, keep the old identifier in the header after a space.
+.TAG relabel_md5
.TP
.B \-\-relabel_md5
Relabel sequences using the MD5 message digest algorithm applied to
@@ -2676,6 +3028,11 @@ inputs give the same result. The MD5 digest generates a 128-bit
(16-byte) digest that is represented by 16 hexadecimal numbers (using
32 symbols among 0123456789abcdef). Use \-\-sizeout to conserve the
abundance annotations.
+.TAG relabel_self
+.TP
+.B \-\-relabel_self
+Relabel sequences using the sequence itself as the label.
+.TAG relabel_sha1
.TP
.B \-\-relabel_sha1
Relabel sequences using the SHA1 message digest algorithm applied to
@@ -2686,20 +3043,25 @@ hexadecimal numbers (40 symbols). The probability of a collision (two
non-identical sequences having the same digest) is smaller for the
SHA1 algorithm than it is for the MD5 algorithm. Use \-\-sizeout to
conserve the abundance annotations.
+.TAG sample_pct
.TP
.BI \-\-sample_pct\~ "real"
Subsample the given percentage of the input sequences. Accepted values
range from 0.0 to 100.0.
+.TAG sample_size
.TP
.BI \-\-sample_size\~ "positive integer"
Extract the given number of sequences.
+.TAG sizein
.TP
.B \-\-sizein
Take the abundance information of the input file into account,
otherwise the abundance of each sequence is considered to be 1.
+.TAG sizeout
.TP
.B \-\-sizeout
Write abundance information to the output file.
+.TAG xsize
.TP
.B \-\-xsize
Strip abundance information from the headers when writing the output
@@ -2707,12 +3069,14 @@ file.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG taxonomic-classification-options
Taxonomic classification options:
.RS
The vsearch command \-\-sintax will classify the input sequences
according to the Sintax algorithm as described by Robert Edgar (2016)
in SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS
-sequences, BioRxiv, 074161. Preprint. doi: https://doi.org/10.1101/074161.
+sequences, BioRxiv, 074161. Preprint. doi: 10.1101/074161
+.URL https://doi.org/10.1101/074161 (link)
.PP
The name of the fasta file containing the input sequences to be
classified is given as an argument to the \-\-sintax command. The reference
@@ -2733,20 +3097,24 @@ of the rank by one of the letters d (for domain) k (kingdom), p
(species). The letter is followed by a colon (:) and the name of that
rank. Commas and semicolons are not allowed in the name of the rank.
.PP
-Example: ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli".
+Example: ">X80725_S000004313;\:tax=d:Bacteria,\:p:Proteobacteria,\:c:Gammaproteobacteria,\:o:Enterobacteriales,\:f:Enterobacteriaceae,\:g:Escherichia/Shigella,\:s:Escherichia_coli".
.PP
+.TAG db
.TP 9
.BI \-\-db \0filename
Read the reference sequences from \fIfilename\fR, in FASTA, FASTQ or
UDB format. These sequences need to be annotated with taxonomy.
+.TAG sintax_cutoff
.TP
.BI \-\-sintax_cutoff\~ "real"
Specify a minimum level of bootstrap support for the taxonomic ranks
that will be included in column 4 of the output file, for instance
0.9 for 90%.
+.TAG sintax
.TP
.BI \-\-sintax \0filename
Read the input sequences from \fIfilename\fR, in FASTA or FASTQ format.
+.TAG tabbedout
.TP
.BI \-\-tabbedout \0filename
Write the results to \fIfilename\fR, in a tab-separated text
@@ -2760,6 +3128,7 @@ the threshold.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG udb-options
UDB options:
.RS
Databases to be used with the \-\-usearch_global command may be
@@ -2771,6 +3140,7 @@ below can be used to create and inspect UDB files. An UDB file may be
specified with the \-\-db option instead of a FASTA formatted file
with the \-\-usearch_global command.
.PP
+.TAG dbmask
.TP 9
.BI \-\-dbmask\~ "none|dust|soft"
Specify the sequence masking method used with the \-\-makeudb_usearch
@@ -2779,32 +3149,39 @@ is specified. When dust is specified, the DUST algorithm will be used
for masking low complexity regions (short repeats and skewed
composition). Lower case letters in the input file will be masked when
soft is specified (soft masking).
+.TAG hardmask
.TP
.B \-\-hardmask
Mask sequences by replacing letters with N for the \-\-makeudb_usearch
command. The default is to use lower case letters (soft masking).
+.TAG makeudb_usearch
.TP
.BI \-\-makeudb_usearch \0filename
Create a UDB database file from the FASTA-formatted sequences in the
file with the given \fIfilename\fR. The UDB database is written to the
file specified with the \-\-output option.
+.TAG output
.TP
.BI \-\-output \0filename
Specify the \fIfilename\fR of a UDB or FASTA output file for the
\-\-makeudb_usearch or the \-\-udb2fasta command, respectively.
+.TAG udb2fasta
.TP
.BI \-\-udb2fasta \0filename
Read the UDB database in the file with the given \fIfilename\fR and
output the sequences in FASTA format in the file specified by the
\-\-output option.
+.TAG udbinfo
.TP
.BI \-\-udbinfo \0filename
Show information about the UDB database in the file with the given
\fIfilename\fR.
+.TAG udbstats
.TP
.BI \-\-udbstats \0filename
Report statistics about the indexed words in the UDB database in the
file with the given \fIfilename\fR.
+.TAG wordlength
.TP
.BI \-\-wordlength\~ "positive integer"
Specify the length of the words to be used when creating the UDB
@@ -2813,6 +3190,7 @@ range from 3 to 15. The default is 8.
.RE
.PP
.\" ----------------------------------------------------------------------------
+.TAG userfields
Userfields (fields accepted by the \-\-userfields option):
.RS
.TP 9
@@ -3082,8 +3460,8 @@ iddef (clustering, pairwise alignment, searching)
.IP -
maxuniquesize (dereplication)
.IP -
-relabel_md5 and relabel_sha1 (chimera detection, dereplication, FASTQ
-processing, shuffling, sorting)
+relabel_md5, relabel_self and relabel_sha1 (chimera detection,
+dereplication, FASTQ processing, shuffling, sorting)
.IP -
shuffle (shuffling)
.IP -
@@ -3198,14 +3576,18 @@ Frédéric Mahé.
Rognes T, Flouri T, Nichols B, Quince C, Mahé F. (2016)
VSEARCH: a versatile open source tool for metagenomics.
\fIPeerJ\fR 4:e2584 doi: 10.7717/peerj.2584
-<https://doi.org/10.7717/peerj.2584>
+.URL https://doi.org/10.7717/peerj.2584 (link)
.PP
.\" ============================================================================
.SH REPORTING BUGS
Submit suggestions and bug-reports at
+.URL https://github.com/torognes/vsearch/issues (link)
<https://github.com/torognes/vsearch/issues>, send a pull request on
+.URL https://github.com/torognes/vsearch (link)
<https://github.com/torognes/vsearch>, or compose a friendly or
-curmudgeont e-mail to Torbjørn Rognes <torognes at ifi.uio.no>.
+curmudgeont e-mail to Torbjørn Rognes
+.MTO torognes at ifi.uio.no (link)
+<torognes at ifi.uio.no>.
.PP
.\" ============================================================================
.SH AVAILABILITY
@@ -3240,7 +3622,9 @@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.PP
You should have received a copy of the GNU General Public License
-along with this program. If not, see <http://www.gnu.org/licenses/>.
+along with this program. If not, see
+.URL http://www.gnu.org/licenses/ (link)
+<http://www.gnu.org/licenses/>.
.PP
.PP
\fBThe BSD 2-Clause License\fR
@@ -3297,17 +3681,18 @@ copyright Julian R. Seward.
.SH SEE ALSO
\fBswipe\fR, an extremely fast pairwise local (Smith-Waterman)
database search tool by Torbjørn Rognes, available at
+.URL https://github.com/torognes/swipe "(link)"
<https://github.com/torognes/swipe>.
.PP
\fBswarm\fR, a fast and accurate amplicon clustering method by
Frédéric Mahé and Torbjørn Rognes, available at
+.URL https://github.com/torognes/swarm "(link)"
<https://github.com/torognes/swarm>.
.PP
.\" ============================================================================
.SH VERSION HISTORY
New features and important modifications of \fBvsearch\fR (short lived
or minor bug releases may not be mentioned):
-.RS
.TP
.BR v1.0.0\~ "released November 28th, 2014"
First public release.
@@ -3844,7 +4229,17 @@ output from the fastq_stats command if quiet option was given. Updated manual.
.TP
.BR v2.13.6\~ "released July 2nd, 2019"
Added info about cut command to output of help command.
-.RE
+.TP
+.BR v2.13.7\~ "released September 2nd, 2019"
+Fixed bug in consensus sequence introduced in version 2.13.0.
+.TP
+.BR v2.14.0\~ "released September 11th, 2019"
+Added relabel_self option. Made fasta_width, sizein, sizeout and
+relabelling options valid for certain commands.
+.TP
+.BR v2.14.1\~ "released September 18th, 2019"
+Fixed bug with sequences written to file specified with fastaout_rev
+for commands fastx_filter and fastq_filter.
.LP
.\" ============================================================================
.\" TODO:
=====================================
src/cluster.cc
=====================================
@@ -338,6 +338,11 @@ char * relabel_otu(int clusterno, char * sequence, int seqlen)
label = (char*) xmalloc(strlen(opt_relabel) + 21);
sprintf(label, "%s%d", opt_relabel, clusterno+1);
}
+ else if (opt_relabel_self)
+ {
+ label = (char*) xmalloc(seqlen + 1);
+ sprintf(label, "%.*s", seqlen, sequence);
+ }
else if (opt_relabel_sha1)
{
label = (char*) xmalloc(LEN_HEX_DIG_SHA1);
@@ -363,7 +368,7 @@ void cluster_core_results_hit(struct hit * best,
if (opt_otutabout || opt_mothur_shared_out || opt_biomout)
{
- if (opt_relabel || opt_relabel_sha1 || opt_relabel_md5)
+ if (opt_relabel || opt_relabel_self || opt_relabel_sha1 || opt_relabel_md5)
{
char * label = relabel_otu(clusterno,
db_getsequence(best->target),
@@ -433,7 +438,7 @@ void cluster_core_results_nohit(int clusterno,
if (opt_otutabout || opt_mothur_shared_out || opt_biomout)
{
- if (opt_relabel || opt_relabel_sha1 || opt_relabel_md5)
+ if (opt_relabel || opt_relabel_self || opt_relabel_sha1 || opt_relabel_md5)
{
char * label = relabel_otu(clusterno, qsequence, qseqlen);
otutable_add(query_head, label, qsize);
=====================================
src/fasta.cc
=====================================
@@ -322,13 +322,19 @@ void fasta_print(FILE * fp, const char * hdr,
fasta_print_sequence(fp, seq, len, opt_fasta_width);
}
+inline void fprint_seq_label(FILE * fp, char * seq, int len)
+{
+ /* normalize first? */
+ fprintf(fp, "%.*s", len, seq);
+}
+
void fasta_print_general(FILE * fp,
const char * prefix,
char * seq,
int len,
char * header,
int header_len,
- int abundance,
+ unsigned int abundance,
int ordinal,
double ee,
int clustersize,
@@ -341,7 +347,9 @@ void fasta_print_general(FILE * fp,
if (prefix)
fprintf(fp, "%s", prefix);
- if (opt_relabel_sha1)
+ if (opt_relabel_self)
+ fprint_seq_label(fp, seq, len);
+ else if (opt_relabel_sha1)
fprint_seq_digest_sha1(fp, seq, len);
else if (opt_relabel_md5)
fprint_seq_digest_md5(fp, seq, len);
@@ -374,7 +382,7 @@ void fasta_print_general(FILE * fp,
fprintf(fp, ";%s=%.4lf", score_name, score);
if (opt_relabel_keep &&
- ((opt_relabel && (ordinal > 0)) || opt_relabel_sha1 || opt_relabel_md5))
+ ((opt_relabel && (ordinal > 0)) || opt_relabel_sha1 || opt_relabel_md5 || opt_relabel_self))
fprintf(fp, " %s", header);
fprintf(fp, "\n");
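In fasta_print_general(), relabel_self is now checked before the SHA1 and MD5 digests. --relabel_keep also appends the original header when relabel_self is active. The sketch below shows how these rules combine into a header line. It is hypothetical and simplified: abundance, ee, and score annotations and the digest branches are dropped. The format of the plain --relabel branch (prefix plus ordinal, only when ordinal > 0) is an assumption inferred from the relabel_keep condition in the hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified header assembly modelled on fasta_print_general().
std::string fasta_header(const std::string & seq,
                         const std::string & header,
                         const char * relabel_prefix,  // --relabel value or nullptr
                         bool relabel_self,
                         bool relabel_keep,
                         int ordinal)
{
  std::string out = ">";
  bool relabelled = false;
  if (relabel_self)
    {
      out += seq;                                       // label = the sequence
      relabelled = true;
    }
  else if (relabel_prefix && ordinal > 0)
    {
      out += relabel_prefix + std::to_string(ordinal);  // assumed format
      relabelled = true;
    }
  else
    out += header;                                      // original header kept

  // --relabel_keep: append the old header after a space
  if (relabel_keep && relabelled)
    out += " " + header;
  return out;
}
```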
=====================================
src/fasta.h
=====================================
@@ -91,7 +91,7 @@ void fasta_print_general(FILE * fp,
int len,
char * header,
int header_len,
- int abundance,
+ unsigned int abundance,
int ordinal,
double ee,
int clustersize,
=====================================
src/fastq.cc
=====================================
@@ -446,6 +446,12 @@ int64_t fastq_get_abundance_and_presence(fastx_handle h)
return header_get_size(h->header_buffer.data, h->header_buffer.length);
}
+inline void fprint_seq_label(FILE * fp, char * seq, int len)
+{
+ /* normalize first? */
+ fprintf(fp, "%.*s", len, seq);
+}
+
void fastq_print_general(FILE * fp,
char * seq,
int len,
@@ -458,7 +464,9 @@ void fastq_print_general(FILE * fp,
{
fprintf(fp, "@");
- if (opt_relabel_sha1)
+ if (opt_relabel_self)
+ fprint_seq_label(fp, seq, len);
+ else if (opt_relabel_sha1)
fprint_seq_digest_sha1(fp, seq, len);
else if (opt_relabel_md5)
fprint_seq_digest_md5(fp, seq, len);
@@ -482,7 +490,7 @@ void fastq_print_general(FILE * fp,
fprintf(fp, ";ee=%.4lf", ee);
if (opt_relabel_keep &&
- ((opt_relabel && (ordinal > 0)) || opt_relabel_sha1 || opt_relabel_md5))
+ ((opt_relabel && (ordinal > 0)) || opt_relabel_sha1 || opt_relabel_md5 || opt_relabel_self))
fprintf(fp, " %.*s", header_len, header);
fprintf(fp, "\n%.*s\n+\n%.*s\n", len, seq, len, quality);
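The new fprint_seq_label() helper and the record line above both rely on the `%.*s` precision. That precision prints exactly len characters, so the sequence and quality pointers do not need a terminating NUL at position len. The sketch below builds, in memory, the FASTQ record that --relabel_self would produce. It assumes no size, ee, or relabel_keep annotations, and the function name fastq_record_self is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds "@<seq>\n<seq>\n+\n<qual>\n" using only the first len characters
// of seq and qual, which may point into larger, unterminated buffers.
std::string fastq_record_self(const char * seq, const char * qual, int len)
{
  assert(len >= 0);
  // Record length is 3 * len + 6 characters; leave room for the NUL.
  std::string buf(static_cast<std::size_t>(3 * len + 8), '\0');
  int n = std::snprintf(&buf[0], buf.size(), "@%.*s\n%.*s\n+\n%.*s\n",
                        len, seq, len, seq, len, qual);
  assert(n >= 0 && static_cast<std::size_t>(n) < buf.size());
  buf.resize(static_cast<std::size_t>(n));
  return buf;
}
```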
=====================================
src/filter.cc
=====================================
@@ -433,7 +433,7 @@ void filter(bool fastq_only, char * filename)
if (opt_fastaout_rev)
fasta_print_general(fp_fastaout_rev,
0,
- fastx_get_sequence(h1) + res2.start,
+ fastx_get_sequence(h2) + res2.start,
res2.length,
fastx_get_header(h2),
fastx_get_header_length(h2),
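This one-character fix (h1 to h2) is the fastaout_rev bug listed under v2.14.1. Before the fix, the reverse-read record used h2's header and trim offsets but took its bases from the forward read. The sketch below contrasts the two behaviours. Read and Trim are hypothetical stand-ins for the fastx handles and the res2 trimming result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for a fastx handle and a trimming result.
struct Read { std::string header; std::string seq; };
struct Trim { std::size_t start; std::size_t length; };

// Before 2.14.1: header from h2, but bases from h1 at h2's trim offsets.
std::string record_before_fix(const Read & h1, const Read & h2, const Trim & res2)
{
  return ">" + h2.header + "\n" + h1.seq.substr(res2.start, res2.length) + "\n";
}

// After the fix: header and bases both come from the reverse read h2.
std::string record_after_fix(const Read & h2, const Trim & res2)
{
  assert(res2.start + res2.length <= h2.seq.size());
  return ">" + h2.header + "\n" + h2.seq.substr(res2.start, res2.length) + "\n";
}
```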
=====================================
src/mergepairs.cc
=====================================
@@ -1408,7 +1408,7 @@ void fastq_mergepairs()
if (failed_minscore)
fprintf(stderr,
- "%10" PRIu64 " alignment score too low, or score drop to high\n",
+ "%10" PRIu64 " alignment score too low, or score drop too high\n",
failed_minscore);
if (failed_minovlen)
=====================================
src/msa.cc
=====================================
@@ -296,7 +296,7 @@ void msa(FILE * fp_msaout, FILE * fp_consout, FILE * fp_profile,
if (count > best_count)
{
best_count = count;
- best_sym = c+1;
+ best_sym = 1 << c;
}
}
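This change is presumably the consensus fix listed under v2.13.7. The sketch assumes nucleotides are stored as 4-bit masks: A=1, C=2, G=4, T=8, with ambiguity codes as bitwise ORs (for example M = A|C = 3). It also assumes the loop variable c runs over A, C, G, T as 0..3. Under those assumptions, c+1 yields codes 1, 2, 3, 4, which decode to A, C, M, G, whereas 1 << c yields 1, 2, 4, 8, which decode to A, C, G, T. The table name sym_4bit is hypothetical; it is indexed by the mask in standard IUPAC order.

```cpp
#include <cassert>

// Assumed 4-bit code: index = bitmask of A(1), C(2), G(4), T(8).
const char * const sym_4bit = "-ACMGRSVTWYHKDBN";

// c is the winning column symbol in A, C, G, T order (0..3).
char consensus_before_fix(int c) { return sym_4bit[c + 1]; }   // codes 1,2,3,4
char consensus_after_fix(int c)  { return sym_4bit[1 << c]; }  // codes 1,2,4,8
```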
=====================================
src/sffconvert.cc
=====================================
@@ -335,7 +335,7 @@ void sff_convert()
read_name,
strlen(read_name),
qual + clip_start,
- 0, 0, -1.0);
+ 1, read_no + 1, -1.0);
xfree(read_name);
xfree(bases);
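sff_convert now passes 1 and read_no + 1 instead of 0 and 0. By analogy with the fasta_print_general() parameter list in this diff, these are assumed to be the abundance and ordinal arguments. The relabel_keep conditions shown above only treat --relabel as active when ordinal > 0, so a zero ordinal presumably left SFF reads unrenumbered. The hypothetical helper below illustrates that rule.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the "opt_relabel && (ordinal > 0)" rule.
std::string output_label(const char * relabel_prefix, int ordinal,
                         const std::string & read_name)
{
  if (relabel_prefix && ordinal > 0)
    return relabel_prefix + std::to_string(ordinal);
  return read_name;
}
```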
=====================================
src/vsearch.cc
=====================================
@@ -76,6 +76,7 @@ bool opt_no_progress;
bool opt_quiet;
bool opt_relabel_keep;
bool opt_relabel_md5;
+bool opt_relabel_self;
bool opt_relabel_sha1;
bool opt_samheader;
bool opt_sff_clip;
@@ -826,6 +827,7 @@ void args_init(int argc, char **argv)
opt_relabel = 0;
opt_relabel_keep = 0;
opt_relabel_md5 = 0;
+ opt_relabel_self = 0;
opt_relabel_sha1 = 0;
opt_rereplicate = 0;
opt_reverse = 0;
@@ -884,456 +886,458 @@ void args_init(int argc, char **argv)
enum
{
- option_help,
- option_version,
+ option_abskew,
+ option_acceptall,
+ option_alignwidth,
+ option_allpairs_global,
option_alnout,
- option_usearch_global,
- option_db,
- option_id,
- option_maxaccepts,
- option_maxrejects,
- option_wordlength,
- option_match,
- option_mismatch,
- option_fulldp,
- option_strand,
- option_threads,
- option_gapopen,
- option_gapext,
- option_rowlen,
- option_userfields,
- option_userout,
- option_self,
+ option_band,
+ option_biomout,
option_blast6out,
- option_uc,
- option_weak_id,
- option_uc_allhits,
- option_notrunclabels,
- option_sortbysize,
- option_output,
- option_minsize,
- option_maxsize,
- option_relabel,
- option_sizeout,
- option_derep_fulllength,
- option_minseqlength,
- option_minuniquesize,
- option_topn,
- option_maxseqlength,
- option_sizein,
- option_sortbylength,
- option_matched,
- option_notmatched,
- option_dbmatched,
- option_dbnotmatched,
- option_fastapairs,
- option_output_no_hits,
- option_maxhits,
- option_top_hits_only,
- option_fasta_width,
- option_query_cov,
- option_target_cov,
- option_idprefix,
- option_idsuffix,
- option_minqt,
- option_maxqt,
- option_minsl,
- option_maxsl,
- option_leftjust,
- option_rightjust,
- option_selfid,
- option_maxid,
- option_minsizeratio,
- option_maxsizeratio,
- option_maxdiffs,
- option_maxsubs,
- option_maxgaps,
- option_mincols,
- option_maxqsize,
- option_mintsize,
- option_mid,
- option_shuffle,
- option_randseed,
- option_maskfasta,
- option_hardmask,
- option_qmask,
- option_dbmask,
- option_cluster_smallmem,
- option_cluster_fast,
+ option_borderline,
+ option_bzip2_decompress,
option_centroids,
- option_clusters,
- option_consout,
- option_cons_truncate,
- option_msaout,
- option_usersort,
- option_xn,
- option_iddef,
- option_slots,
- option_pattern,
- option_maxuniquesize,
- option_abskew,
option_chimeras,
- option_dn,
- option_mindiffs,
- option_mindiv,
- option_minh,
- option_nonchimeras,
- option_uchime_denovo,
- option_uchime_ref,
- option_uchimealns,
- option_uchimeout,
- option_uchimeout5,
- option_alignwidth,
- option_allpairs_global,
- option_acceptall,
+ option_cluster_fast,
option_cluster_size,
- option_samout,
- option_log,
- option_quiet,
- option_fastx_subsample,
- option_sample_pct,
- option_fastq_chars,
- option_profile,
- option_sample_size,
- option_fastaout,
- option_xsize,
+ option_cluster_smallmem,
+ option_cluster_unoise,
option_clusterout_id,
option_clusterout_sort,
- option_borderline,
- option_relabel_sha1,
- option_relabel_md5,
+ option_clusters,
+ option_cons_truncate,
+ option_consout,
+ option_cut,
+ option_cut_pattern,
+ option_db,
+ option_dbmask,
+ option_dbmatched,
+ option_dbnotmatched,
+ option_derep_fulllength,
option_derep_prefix,
- option_fastq_filter,
- option_fastqout,
+ option_dn,
+ option_ee_cutoffs,
+ option_eeout,
+ option_eetabbedout,
+ option_fasta_score,
+ option_fasta_width,
+ option_fastaout,
option_fastaout_discarded,
- option_fastqout_discarded,
- option_fastq_truncqual,
+ option_fastaout_discarded_rev,
+ option_fastaout_notmerged_fwd,
+ option_fastaout_notmerged_rev,
+ option_fastaout_rev,
+ option_fastapairs,
+ option_fastq_allowmergestagger,
+ option_fastq_ascii,
+ option_fastq_asciiout,
+ option_fastq_chars,
+ option_fastq_convert,
+ option_fastq_eeout,
+ option_fastq_eestats,
+ option_fastq_eestats2,
+ option_fastq_filter,
+ option_fastq_join,
+ option_fastq_maxdiffpct,
+ option_fastq_maxdiffs,
option_fastq_maxee,
- option_fastq_trunclen,
- option_fastq_minlen,
- option_fastq_stripleft,
option_fastq_maxee_rate,
+ option_fastq_maxlen,
+ option_fastq_maxmergelen,
option_fastq_maxns,
- option_eeout,
- option_fastq_ascii,
- option_fastq_qmin,
+ option_fastq_mergepairs,
+ option_fastq_minlen,
+ option_fastq_minmergelen,
+ option_fastq_minovlen,
+ option_fastq_nostagger,
option_fastq_qmax,
option_fastq_qmaxout,
+ option_fastq_qmin,
+ option_fastq_qminout,
option_fastq_stats,
+ option_fastq_stripleft,
+ option_fastq_stripright,
option_fastq_tail,
- option_fastx_revcomp,
- option_label_suffix,
- option_h,
- option_samheader,
- option_sizeorder,
- option_minwordmatches,
- option_v,
- option_relabel_keep,
- option_search_exact,
- option_fastx_mask,
- option_min_unmasked_pct,
- option_max_unmasked_pct,
- option_fastq_convert,
- option_fastq_asciiout,
- option_fastq_qminout,
- option_fastq_mergepairs,
- option_fastq_eeout,
- option_fastqout_notmerged_fwd,
- option_fastqout_notmerged_rev,
- option_fastq_minovlen,
- option_fastq_minmergelen,
- option_fastq_maxmergelen,
- option_fastq_nostagger,
- option_fastq_allowmergestagger,
- option_fastq_maxdiffs,
- option_fastaout_notmerged_fwd,
- option_fastaout_notmerged_rev,
- option_reverse,
- option_eetabbedout,
- option_fasta_score,
- option_fastq_eestats,
- option_rereplicate,
- option_xdrop_nw,
- option_minhsp,
- option_band,
- option_hspw,
- option_gzip_decompress,
- option_bzip2_decompress,
- option_fastq_maxlen,
option_fastq_truncee,
- option_fastx_filter,
- option_otutabout,
- option_mothur_shared_out,
- option_biomout,
+ option_fastq_trunclen,
option_fastq_trunclen_keep,
- option_fastq_stripright,
- option_no_progress,
- option_fastq_eestats2,
- option_ee_cutoffs,
- option_length_cutoffs,
- option_makeudb_usearch,
- option_udb2fasta,
- option_udbinfo,
- option_udbstats,
- option_cluster_unoise,
- option_unoise_alpha,
- option_uchime2_denovo,
- option_uchime3_denovo,
- option_sintax,
- option_sintax_cutoff,
- option_tabbedout,
- option_fastq_maxdiffpct,
- option_fastq_join,
- option_join_padgap,
- option_join_padgapq,
- option_sff_convert,
- option_sff_clip,
- option_fastaout_rev,
- option_fastaout_discarded_rev,
- option_fastqout_rev,
+ option_fastq_truncqual,
+ option_fastqout,
+ option_fastqout_discarded,
option_fastqout_discarded_rev,
- option_xee,
+ option_fastqout_notmerged_fwd,
+ option_fastqout_notmerged_rev,
+ option_fastqout_rev,
+ option_fastx_filter,
option_fastx_getseq,
option_fastx_getseqs,
option_fastx_getsubseq,
- option_label_substr_match,
- option_label,
- option_subseq_start,
- option_subseq_end,
- option_notmatchedfq,
+ option_fastx_mask,
+ option_fastx_revcomp,
+ option_fastx_subsample,
+ option_fulldp,
+ option_gapext,
+ option_gapopen,
+ option_gzip_decompress,
+ option_h,
+ option_hardmask,
+ option_help,
+ option_hspw,
+ option_id,
+ option_iddef,
+ option_idprefix,
+ option_idsuffix,
+ option_join_padgap,
+ option_join_padgapq,
+ option_label,
option_label_field,
+ option_label_substr_match,
+ option_label_suffix,
option_label_word,
option_label_words,
option_labels,
- option_cut,
- option_cut_pattern
+ option_leftjust,
+ option_length_cutoffs,
+ option_log,
+ option_makeudb_usearch,
+ option_maskfasta,
+ option_match,
+ option_matched,
+ option_max_unmasked_pct,
+ option_maxaccepts,
+ option_maxdiffs,
+ option_maxgaps,
+ option_maxhits,
+ option_maxid,
+ option_maxqsize,
+ option_maxqt,
+ option_maxrejects,
+ option_maxseqlength,
+ option_maxsize,
+ option_maxsizeratio,
+ option_maxsl,
+ option_maxsubs,
+ option_maxuniquesize,
+ option_mid,
+ option_min_unmasked_pct,
+ option_mincols,
+ option_mindiffs,
+ option_mindiv,
+ option_minh,
+ option_minhsp,
+ option_minqt,
+ option_minseqlength,
+ option_minsize,
+ option_minsizeratio,
+ option_minsl,
+ option_mintsize,
+ option_minuniquesize,
+ option_minwordmatches,
+ option_mismatch,
+ option_mothur_shared_out,
+ option_msaout,
+ option_no_progress,
+ option_nonchimeras,
+ option_notmatched,
+ option_notmatchedfq,
+ option_notrunclabels,
+ option_otutabout,
+ option_output,
+ option_output_no_hits,
+ option_pattern,
+ option_profile,
+ option_qmask,
+ option_query_cov,
+ option_quiet,
+ option_randseed,
+ option_relabel,
+ option_relabel_keep,
+ option_relabel_md5,
+ option_relabel_self,
+ option_relabel_sha1,
+ option_rereplicate,
+ option_reverse,
+ option_rightjust,
+ option_rowlen,
+ option_samheader,
+ option_samout,
+ option_sample_pct,
+ option_sample_size,
+ option_search_exact,
+ option_self,
+ option_selfid,
+ option_sff_clip,
+ option_sff_convert,
+ option_shuffle,
+ option_sintax,
+ option_sintax_cutoff,
+ option_sizein,
+ option_sizeorder,
+ option_sizeout,
+ option_slots,
+ option_sortbylength,
+ option_sortbysize,
+ option_strand,
+ option_subseq_end,
+ option_subseq_start,
+ option_tabbedout,
+ option_target_cov,
+ option_threads,
+ option_top_hits_only,
+ option_topn,
+ option_uc,
+ option_uc_allhits,
+ option_uchime2_denovo,
+ option_uchime3_denovo,
+ option_uchime_denovo,
+ option_uchime_ref,
+ option_uchimealns,
+ option_uchimeout,
+ option_uchimeout5,
+ option_udb2fasta,
+ option_udbinfo,
+ option_udbstats,
+ option_unoise_alpha,
+ option_usearch_global,
+ option_userfields,
+ option_userout,
+ option_usersort,
+ option_v,
+ option_version,
+ option_weak_id,
+ option_wordlength,
+ option_xdrop_nw,
+ option_xee,
+ option_xn,
+ option_xsize
};
static struct option long_options[] =
{
- {"help", no_argument, 0, 0 },
- {"version", no_argument, 0, 0 },
+ {"abskew", required_argument, 0, 0 },
+ {"acceptall", no_argument, 0, 0 },
+ {"alignwidth", required_argument, 0, 0 },
+ {"allpairs_global", required_argument, 0, 0 },
{"alnout", required_argument, 0, 0 },
- {"usearch_global", required_argument, 0, 0 },
+ {"band", required_argument, 0, 0 },
+ {"biomout", required_argument, 0, 0 },
+ {"blast6out", required_argument, 0, 0 },
+ {"borderline", required_argument, 0, 0 },
+ {"bzip2_decompress", no_argument, 0, 0 },
+ {"centroids", required_argument, 0, 0 },
+ {"chimeras", required_argument, 0, 0 },
+ {"cluster_fast", required_argument, 0, 0 },
+ {"cluster_size", required_argument, 0, 0 },
+ {"cluster_smallmem", required_argument, 0, 0 },
+ {"cluster_unoise", required_argument, 0, 0 },
+ {"clusterout_id", no_argument, 0, 0 },
+ {"clusterout_sort", no_argument, 0, 0 },
+ {"clusters", required_argument, 0, 0 },
+ {"cons_truncate", no_argument, 0, 0 },
+ {"consout", required_argument, 0, 0 },
+ {"cut", required_argument, 0, 0 },
+ {"cut_pattern", required_argument, 0, 0 },
{"db", required_argument, 0, 0 },
+ {"dbmask", required_argument, 0, 0 },
+ {"dbmatched", required_argument, 0, 0 },
+ {"dbnotmatched", required_argument, 0, 0 },
+ {"derep_fulllength", required_argument, 0, 0 },
+ {"derep_prefix", required_argument, 0, 0 },
+ {"dn", required_argument, 0, 0 },
+ {"ee_cutoffs", required_argument, 0, 0 },
+ {"eeout", no_argument, 0, 0 },
+ {"eetabbedout", required_argument, 0, 0 },
+ {"fasta_score", no_argument, 0, 0 },
+ {"fasta_width", required_argument, 0, 0 },
+ {"fastaout", required_argument, 0, 0 },
+ {"fastaout_discarded", required_argument, 0, 0 },
+ {"fastaout_discarded_rev",required_argument, 0, 0 },
+ {"fastaout_notmerged_fwd",required_argument, 0, 0 },
+ {"fastaout_notmerged_rev",required_argument, 0, 0 },
+ {"fastaout_rev", required_argument, 0, 0 },
+ {"fastapairs", required_argument, 0, 0 },
+ {"fastq_allowmergestagger", no_argument, 0, 0 },
+ {"fastq_ascii", required_argument, 0, 0 },
+ {"fastq_asciiout", required_argument, 0, 0 },
+ {"fastq_chars", required_argument, 0, 0 },
+ {"fastq_convert", required_argument, 0, 0 },
+ {"fastq_eeout", no_argument, 0, 0 },
+ {"fastq_eestats", required_argument, 0, 0 },
+ {"fastq_eestats2", required_argument, 0, 0 },
+ {"fastq_filter", required_argument, 0, 0 },
+ {"fastq_join", required_argument, 0, 0 },
+ {"fastq_maxdiffpct", required_argument, 0, 0 },
+ {"fastq_maxdiffs", required_argument, 0, 0 },
+ {"fastq_maxee", required_argument, 0, 0 },
+ {"fastq_maxee_rate", required_argument, 0, 0 },
+ {"fastq_maxlen", required_argument, 0, 0 },
+ {"fastq_maxmergelen", required_argument, 0, 0 },
+ {"fastq_maxns", required_argument, 0, 0 },
+ {"fastq_mergepairs", required_argument, 0, 0 },
+ {"fastq_minlen", required_argument, 0, 0 },
+ {"fastq_minmergelen", required_argument, 0, 0 },
+ {"fastq_minovlen", required_argument, 0, 0 },
+ {"fastq_nostagger", no_argument, 0, 0 },
+ {"fastq_qmax", required_argument, 0, 0 },
+ {"fastq_qmaxout", required_argument, 0, 0 },
+ {"fastq_qmin", required_argument, 0, 0 },
+ {"fastq_qminout", required_argument, 0, 0 },
+ {"fastq_stats", required_argument, 0, 0 },
+ {"fastq_stripleft", required_argument, 0, 0 },
+ {"fastq_stripright", required_argument, 0, 0 },
+ {"fastq_tail", required_argument, 0, 0 },
+ {"fastq_truncee", required_argument, 0, 0 },
+ {"fastq_trunclen", required_argument, 0, 0 },
+ {"fastq_trunclen_keep", required_argument, 0, 0 },
+ {"fastq_truncqual", required_argument, 0, 0 },
+ {"fastqout", required_argument, 0, 0 },
+ {"fastqout_discarded", required_argument, 0, 0 },
+ {"fastqout_discarded_rev",required_argument, 0, 0 },
+ {"fastqout_notmerged_fwd",required_argument, 0, 0 },
+ {"fastqout_notmerged_rev",required_argument, 0, 0 },
+ {"fastqout_rev", required_argument, 0, 0 },
+ {"fastx_filter", required_argument, 0, 0 },
+ {"fastx_getseq", required_argument, 0, 0 },
+ {"fastx_getseqs", required_argument, 0, 0 },
+ {"fastx_getsubseq", required_argument, 0, 0 },
+ {"fastx_mask", required_argument, 0, 0 },
+ {"fastx_revcomp", required_argument, 0, 0 },
+ {"fastx_subsample", required_argument, 0, 0 },
+ {"fulldp", no_argument, 0, 0 },
+ {"gapext", required_argument, 0, 0 },
+ {"gapopen", required_argument, 0, 0 },
+ {"gzip_decompress", no_argument, 0, 0 },
+ {"h", no_argument, 0, 0 },
+ {"hardmask", no_argument, 0, 0 },
+ {"help", no_argument, 0, 0 },
+ {"hspw", required_argument, 0, 0 },
{"id", required_argument, 0, 0 },
+ {"iddef", required_argument, 0, 0 },
+ {"idprefix", required_argument, 0, 0 },
+ {"idsuffix", required_argument, 0, 0 },
+ {"join_padgap", required_argument, 0, 0 },
+ {"join_padgapq", required_argument, 0, 0 },
+ {"label", required_argument, 0, 0 },
+ {"label_field", required_argument, 0, 0 },
+ {"label_substr_match", no_argument, 0, 0 },
+ {"label_suffix", required_argument, 0, 0 },
+ {"label_word", required_argument, 0, 0 },
+ {"label_words", required_argument, 0, 0 },
+ {"labels", required_argument, 0, 0 },
+ {"leftjust", no_argument, 0, 0 },
+ {"length_cutoffs", required_argument, 0, 0 },
+ {"log", required_argument, 0, 0 },
+ {"makeudb_usearch", required_argument, 0, 0 },
+ {"maskfasta", required_argument, 0, 0 },
+ {"match", required_argument, 0, 0 },
+ {"matched", required_argument, 0, 0 },
+ {"max_unmasked_pct", required_argument, 0, 0 },
{"maxaccepts", required_argument, 0, 0 },
+ {"maxdiffs", required_argument, 0, 0 },
+ {"maxgaps", required_argument, 0, 0 },
+ {"maxhits", required_argument, 0, 0 },
+ {"maxid", required_argument, 0, 0 },
+ {"maxqsize", required_argument, 0, 0 },
+ {"maxqt", required_argument, 0, 0 },
{"maxrejects", required_argument, 0, 0 },
- {"wordlength", required_argument, 0, 0 },
- {"match", required_argument, 0, 0 },
- {"mismatch", required_argument, 0, 0 },
- {"fulldp", no_argument, 0, 0 },
- {"strand", required_argument, 0, 0 },
- {"threads", required_argument, 0, 0 },
- {"gapopen", required_argument, 0, 0 },
- {"gapext", required_argument, 0, 0 },
- {"rowlen", required_argument, 0, 0 },
- {"userfields", required_argument, 0, 0 },
- {"userout", required_argument, 0, 0 },
- {"self", no_argument, 0, 0 },
- {"blast6out", required_argument, 0, 0 },
- {"uc", required_argument, 0, 0 },
- {"weak_id", required_argument, 0, 0 },
- {"uc_allhits", no_argument, 0, 0 },
- {"notrunclabels", no_argument, 0, 0 },
- {"sortbysize", required_argument, 0, 0 },
- {"output", required_argument, 0, 0 },
- {"minsize", required_argument, 0, 0 },
+ {"maxseqlength", required_argument, 0, 0 },
{"maxsize", required_argument, 0, 0 },
- {"relabel", required_argument, 0, 0 },
- {"sizeout", no_argument, 0, 0 },
- {"derep_fulllength", required_argument, 0, 0 },
+ {"maxsizeratio", required_argument, 0, 0 },
+ {"maxsl", required_argument, 0, 0 },
+ {"maxsubs", required_argument, 0, 0 },
+ {"maxuniquesize", required_argument, 0, 0 },
+ {"mid", required_argument, 0, 0 },
+ {"min_unmasked_pct", required_argument, 0, 0 },
+ {"mincols", required_argument, 0, 0 },
+ {"mindiffs", required_argument, 0, 0 },
+ {"mindiv", required_argument, 0, 0 },
+ {"minh", required_argument, 0, 0 },
+ {"minhsp", required_argument, 0, 0 },
+ {"minqt", required_argument, 0, 0 },
{"minseqlength", required_argument, 0, 0 },
+ {"minsize", required_argument, 0, 0 },
+ {"minsizeratio", required_argument, 0, 0 },
+ {"minsl", required_argument, 0, 0 },
+ {"mintsize", required_argument, 0, 0 },
{"minuniquesize", required_argument, 0, 0 },
- {"topn", required_argument, 0, 0 },
- {"maxseqlength", required_argument, 0, 0 },
- {"sizein", no_argument, 0, 0 },
- {"sortbylength", required_argument, 0, 0 },
- {"matched", required_argument, 0, 0 },
- {"notmatched", required_argument, 0, 0 },
- {"dbmatched", required_argument, 0, 0 },
- {"dbnotmatched", required_argument, 0, 0 },
- {"fastapairs", required_argument, 0, 0 },
+ {"minwordmatches", required_argument, 0, 0 },
+ {"mismatch", required_argument, 0, 0 },
+ {"mothur_shared_out", required_argument, 0, 0 },
+ {"msaout", required_argument, 0, 0 },
+ {"no_progress", no_argument, 0, 0 },
+ {"nonchimeras", required_argument, 0, 0 },
+ {"notmatched", required_argument, 0, 0 },
+ {"notmatchedfq", required_argument, 0, 0 },
+ {"notrunclabels", no_argument, 0, 0 },
+ {"otutabout", required_argument, 0, 0 },
+ {"output", required_argument, 0, 0 },
{"output_no_hits", no_argument, 0, 0 },
- {"maxhits", required_argument, 0, 0 },
- {"top_hits_only", no_argument, 0, 0 },
- {"fasta_width", required_argument, 0, 0 },
+ {"pattern", required_argument, 0, 0 },
+ {"profile", required_argument, 0, 0 },
+ {"qmask", required_argument, 0, 0 },
{"query_cov", required_argument, 0, 0 },
- {"target_cov", required_argument, 0, 0 },
- {"idprefix", required_argument, 0, 0 },
- {"idsuffix", required_argument, 0, 0 },
- {"minqt", required_argument, 0, 0 },
- {"maxqt", required_argument, 0, 0 },
- {"minsl", required_argument, 0, 0 },
- {"maxsl", required_argument, 0, 0 },
- {"leftjust", no_argument, 0, 0 },
+ {"quiet", no_argument, 0, 0 },
+ {"randseed", required_argument, 0, 0 },
+ {"relabel", required_argument, 0, 0 },
+ {"relabel_keep", no_argument, 0, 0 },
+ {"relabel_md5", no_argument, 0, 0 },
+ {"relabel_self", no_argument, 0, 0 },
+ {"relabel_sha1", no_argument, 0, 0 },
+ {"rereplicate", required_argument, 0, 0 },
+ {"reverse", required_argument, 0, 0 },
{"rightjust", no_argument, 0, 0 },
+ {"rowlen", required_argument, 0, 0 },
+ {"samheader", no_argument, 0, 0 },
+ {"samout", required_argument, 0, 0 },
+ {"sample_pct", required_argument, 0, 0 },
+ {"sample_size", required_argument, 0, 0 },
+ {"search_exact", required_argument, 0, 0 },
+ {"self", no_argument, 0, 0 },
{"selfid", no_argument, 0, 0 },
- {"maxid", required_argument, 0, 0 },
- {"minsizeratio", required_argument, 0, 0 },
- {"maxsizeratio", required_argument, 0, 0 },
- {"maxdiffs", required_argument, 0, 0 },
- {"maxsubs", required_argument, 0, 0 },
- {"maxgaps", required_argument, 0, 0 },
- {"mincols", required_argument, 0, 0 },
- {"maxqsize", required_argument, 0, 0 },
- {"mintsize", required_argument, 0, 0 },
- {"mid", required_argument, 0, 0 },
+ {"sff_clip", no_argument, 0, 0 },
+ {"sff_convert", required_argument, 0, 0 },
{"shuffle", required_argument, 0, 0 },
- {"randseed", required_argument, 0, 0 },
- {"maskfasta", required_argument, 0, 0 },
- {"hardmask", no_argument, 0, 0 },
- {"qmask", required_argument, 0, 0 },
- {"dbmask", required_argument, 0, 0 },
- {"cluster_smallmem", required_argument, 0, 0 },
- {"cluster_fast", required_argument, 0, 0 },
- {"centroids", required_argument, 0, 0 },
- {"clusters", required_argument, 0, 0 },
- {"consout", required_argument, 0, 0 },
- {"cons_truncate", no_argument, 0, 0 },
- {"msaout", required_argument, 0, 0 },
- {"usersort", no_argument, 0, 0 },
- {"xn", required_argument, 0, 0 },
- {"iddef", required_argument, 0, 0 },
+ {"sintax", required_argument, 0, 0 },
+ {"sintax_cutoff", required_argument, 0, 0 },
+ {"sizein", no_argument, 0, 0 },
+ {"sizeorder", no_argument, 0, 0 },
+ {"sizeout", no_argument, 0, 0 },
{"slots", required_argument, 0, 0 },
- {"pattern", required_argument, 0, 0 },
- {"maxuniquesize", required_argument, 0, 0 },
- {"abskew", required_argument, 0, 0 },
- {"chimeras", required_argument, 0, 0 },
- {"dn", required_argument, 0, 0 },
- {"mindiffs", required_argument, 0, 0 },
- {"mindiv", required_argument, 0, 0 },
- {"minh", required_argument, 0, 0 },
- {"nonchimeras", required_argument, 0, 0 },
+ {"sortbylength", required_argument, 0, 0 },
+ {"sortbysize", required_argument, 0, 0 },
+ {"strand", required_argument, 0, 0 },
+ {"subseq_end", required_argument, 0, 0 },
+ {"subseq_start", required_argument, 0, 0 },
+ {"tabbedout", required_argument, 0, 0 },
+ {"target_cov", required_argument, 0, 0 },
+ {"threads", required_argument, 0, 0 },
+ {"top_hits_only", no_argument, 0, 0 },
+ {"topn", required_argument, 0, 0 },
+ {"uc", required_argument, 0, 0 },
+ {"uc_allhits", no_argument, 0, 0 },
+ {"uchime2_denovo", required_argument, 0, 0 },
+ {"uchime3_denovo", required_argument, 0, 0 },
{"uchime_denovo", required_argument, 0, 0 },
{"uchime_ref", required_argument, 0, 0 },
{"uchimealns", required_argument, 0, 0 },
{"uchimeout", required_argument, 0, 0 },
{"uchimeout5", no_argument, 0, 0 },
- {"alignwidth", required_argument, 0, 0 },
- {"allpairs_global", required_argument, 0, 0 },
- {"acceptall", no_argument, 0, 0 },
- {"cluster_size", required_argument, 0, 0 },
- {"samout", required_argument, 0, 0 },
- {"log", required_argument, 0, 0 },
- {"quiet", no_argument, 0, 0 },
- {"fastx_subsample", required_argument, 0, 0 },
- {"sample_pct", required_argument, 0, 0 },
- {"fastq_chars", required_argument, 0, 0 },
- {"profile", required_argument, 0, 0 },
- {"sample_size", required_argument, 0, 0 },
- {"fastaout", required_argument, 0, 0 },
- {"xsize", no_argument, 0, 0 },
- {"clusterout_id", no_argument, 0, 0 },
- {"clusterout_sort", no_argument, 0, 0 },
- {"borderline", required_argument, 0, 0 },
- {"relabel_sha1", no_argument, 0, 0 },
- {"relabel_md5", no_argument, 0, 0 },
- {"derep_prefix", required_argument, 0, 0 },
- {"fastq_filter", required_argument, 0, 0 },
- {"fastqout", required_argument, 0, 0 },
- {"fastaout_discarded", required_argument, 0, 0 },
- {"fastqout_discarded", required_argument, 0, 0 },
- {"fastq_truncqual", required_argument, 0, 0 },
- {"fastq_maxee", required_argument, 0, 0 },
- {"fastq_trunclen", required_argument, 0, 0 },
- {"fastq_minlen", required_argument, 0, 0 },
- {"fastq_stripleft", required_argument, 0, 0 },
- {"fastq_maxee_rate", required_argument, 0, 0 },
- {"fastq_maxns", required_argument, 0, 0 },
- {"eeout", no_argument, 0, 0 },
- {"fastq_ascii", required_argument, 0, 0 },
- {"fastq_qmin", required_argument, 0, 0 },
- {"fastq_qmax", required_argument, 0, 0 },
- {"fastq_qmaxout", required_argument, 0, 0 },
- {"fastq_stats", required_argument, 0, 0 },
- {"fastq_tail", required_argument, 0, 0 },
- {"fastx_revcomp", required_argument, 0, 0 },
- {"label_suffix", required_argument, 0, 0 },
- {"h", no_argument, 0, 0 },
- {"samheader", no_argument, 0, 0 },
- {"sizeorder", no_argument, 0, 0 },
- {"minwordmatches", required_argument, 0, 0 },
- {"v", no_argument, 0, 0 },
- {"relabel_keep", no_argument, 0, 0 },
- {"search_exact", required_argument, 0, 0 },
- {"fastx_mask", required_argument, 0, 0 },
- {"min_unmasked_pct", required_argument, 0, 0 },
- {"max_unmasked_pct", required_argument, 0, 0 },
- {"fastq_convert", required_argument, 0, 0 },
- {"fastq_asciiout", required_argument, 0, 0 },
- {"fastq_qminout", required_argument, 0, 0 },
- {"fastq_mergepairs", required_argument, 0, 0 },
- {"fastq_eeout", no_argument, 0, 0 },
- {"fastqout_notmerged_fwd",required_argument, 0, 0 },
- {"fastqout_notmerged_rev",required_argument, 0, 0 },
- {"fastq_minovlen", required_argument, 0, 0 },
- {"fastq_minmergelen", required_argument, 0, 0 },
- {"fastq_maxmergelen", required_argument, 0, 0 },
- {"fastq_nostagger", no_argument, 0, 0 },
- {"fastq_allowmergestagger", no_argument, 0, 0 },
- {"fastq_maxdiffs", required_argument, 0, 0 },
- {"fastaout_notmerged_fwd",required_argument, 0, 0 },
- {"fastaout_notmerged_rev",required_argument, 0, 0 },
- {"reverse", required_argument, 0, 0 },
- {"eetabbedout", required_argument, 0, 0 },
- {"fasta_score", no_argument, 0, 0 },
- {"fastq_eestats", required_argument, 0, 0 },
- {"rereplicate", required_argument, 0, 0 },
- {"xdrop_nw", required_argument, 0, 0 },
- {"minhsp", required_argument, 0, 0 },
- {"band", required_argument, 0, 0 },
- {"hspw", required_argument, 0, 0 },
- {"gzip_decompress", no_argument, 0, 0 },
- {"bzip2_decompress", no_argument, 0, 0 },
- {"fastq_maxlen", required_argument, 0, 0 },
- {"fastq_truncee", required_argument, 0, 0 },
- {"fastx_filter", required_argument, 0, 0 },
- {"otutabout", required_argument, 0, 0 },
- {"mothur_shared_out", required_argument, 0, 0 },
- {"biomout", required_argument, 0, 0 },
- {"fastq_trunclen_keep", required_argument, 0, 0 },
- {"fastq_stripright", required_argument, 0, 0 },
- {"no_progress", no_argument, 0, 0 },
- {"fastq_eestats2", required_argument, 0, 0 },
- {"ee_cutoffs", required_argument, 0, 0 },
- {"length_cutoffs", required_argument, 0, 0 },
- {"makeudb_usearch", required_argument, 0, 0 },
{"udb2fasta", required_argument, 0, 0 },
{"udbinfo", required_argument, 0, 0 },
{"udbstats", required_argument, 0, 0 },
- {"cluster_unoise", required_argument, 0, 0 },
{"unoise_alpha", required_argument, 0, 0 },
- {"uchime2_denovo", required_argument, 0, 0 },
- {"uchime3_denovo", required_argument, 0, 0 },
- {"sintax", required_argument, 0, 0 },
- {"sintax_cutoff", required_argument, 0, 0 },
- {"tabbedout", required_argument, 0, 0 },
- {"fastq_maxdiffpct", required_argument, 0, 0 },
- {"fastq_join", required_argument, 0, 0 },
- {"join_padgap", required_argument, 0, 0 },
- {"join_padgapq", required_argument, 0, 0 },
- {"sff_convert", required_argument, 0, 0 },
- {"sff_clip", no_argument, 0, 0 },
- {"fastaout_rev", required_argument, 0, 0 },
- {"fastaout_discarded_rev",required_argument, 0, 0 },
- {"fastqout_rev", required_argument, 0, 0 },
- {"fastqout_discarded_rev",required_argument, 0, 0 },
+ {"usearch_global", required_argument, 0, 0 },
+ {"userfields", required_argument, 0, 0 },
+ {"userout", required_argument, 0, 0 },
+ {"usersort", no_argument, 0, 0 },
+ {"v", no_argument, 0, 0 },
+ {"version", no_argument, 0, 0 },
+ {"weak_id", required_argument, 0, 0 },
+ {"wordlength", required_argument, 0, 0 },
+ {"xdrop_nw", required_argument, 0, 0 },
{"xee", no_argument, 0, 0 },
- {"fastx_getseq", required_argument, 0, 0 },
- {"fastx_getseqs", required_argument, 0, 0 },
- {"fastx_getsubseq", required_argument, 0, 0 },
- {"label_substr_match", no_argument, 0, 0 },
- {"label", required_argument, 0, 0 },
- {"subseq_start", required_argument, 0, 0 },
- {"subseq_end", required_argument, 0, 0 },
- {"notmatchedfq", required_argument, 0, 0 },
- {"label_field", required_argument, 0, 0 },
- {"label_word", required_argument, 0, 0 },
- {"label_words", required_argument, 0, 0 },
- {"labels", required_argument, 0, 0 },
- {"cut", required_argument, 0, 0 },
- {"cut_pattern", required_argument, 0, 0 },
+ {"xn", required_argument, 0, 0 },
+ {"xsize", no_argument, 0, 0 },
{ 0, 0, 0, 0 }
};
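Both the enum and long_options[] are re-sorted alphabetically in this release, and their orders must stay identical. Every entry has flag = 0 and val = 0, so getopt_long() returns 0 and reports the matched option only through its array index. The `case option_relabel_self:` hunk below suggests args_init() switches on that index. A minimal sketch of the pattern, with three option names taken from the diff and everything else hypothetical:

```cpp
#include <cassert>
#include <getopt.h>
#include <string>

// Enum order must match long_options[] order: getopt_long() stores the
// array index of the matched entry in option_index.
enum
{
  option_relabel,
  option_relabel_keep,
  option_relabel_self
};

static struct option long_options[] =
{
  {"relabel",      required_argument, 0, 0 },
  {"relabel_keep", no_argument,       0, 0 },
  {"relabel_self", no_argument,       0, 0 },
  { 0, 0, 0, 0 }
};

struct Opts
{
  std::string relabel;
  bool relabel_keep = false;
  bool relabel_self = false;
};

// Returns false on an unrecognized option.
bool parse_args(int argc, char ** argv, Opts & opts)
{
  opterr = 0;  // suppress getopt's own error messages
  int option_index = 0;
  int c;
  while ((c = getopt_long(argc, argv, "", long_options, &option_index)) != -1)
    {
      if (c != 0)
        return false;           // '?' for unknown or malformed options
      switch (option_index)
        {
        case option_relabel:      opts.relabel = optarg;     break;
        case option_relabel_keep: opts.relabel_keep = true;  break;
        case option_relabel_self: opts.relabel_self = true;  break;
        default:                  return false;
        }
    }
  return true;
}
```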
@@ -2279,6 +2283,10 @@ void args_init(int argc, char **argv)
opt_cut_pattern = optarg;
break;
+ case option_relabel_self:
+ opt_relabel_self = 1;
+ break;
+
default:
fatal("Internal error in option parsing");
}
@@ -2292,6 +2300,8 @@ void args_init(int argc, char **argv)
if (optind < argc)
fatal("Unrecognized string on command line (%s)", argv[optind]);
+ /* Below is a list of all command names, in alphabetical order. */
+
int command_options[] =
{
option_allpairs_global,
@@ -2342,7 +2352,12 @@ void args_init(int argc, char **argv)
const int commands_count = sizeof(command_options) / sizeof(int);
- const int valid_options[][91] =
+ /*
+ Below is a list of all the options that are valid for each command.
+ The first line is the command and the lines below are the valid options.
+ */
+
+ const int valid_options[][92] =
{
{
option_allpairs_global,
@@ -2400,6 +2415,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -2488,6 +2504,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -2578,6 +2595,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -2668,6 +2686,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -2760,6 +2779,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -2802,7 +2822,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_xee,
option_xsize,
-1 },
@@ -2823,6 +2846,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -2850,6 +2874,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -2887,6 +2912,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -2959,6 +2985,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_reverse,
option_sizein,
@@ -2985,8 +3012,11 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_reverse,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3028,8 +3058,11 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_reverse,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3085,6 +3118,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_reverse,
option_sizein,
@@ -3114,7 +3148,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3122,6 +3159,7 @@ void args_init(int argc, char **argv)
{ option_fastx_getseqs,
option_bzip2_decompress,
+ option_fasta_width,
option_fastaout,
option_fastq_ascii,
option_fastq_qmax,
@@ -3143,7 +3181,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3169,7 +3210,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_subseq_end,
option_subseq_start,
option_threads,
@@ -3197,7 +3241,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3220,7 +3267,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3245,6 +3295,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sample_pct,
option_sample_size,
@@ -3299,7 +3350,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3317,6 +3371,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -3365,6 +3420,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rowlen,
option_samheader,
@@ -3391,7 +3447,13 @@ void args_init(int argc, char **argv)
option_log,
option_no_progress,
option_quiet,
+ option_relabel,
+ option_relabel_keep,
+ option_relabel_md5,
+ option_relabel_self,
+ option_relabel_sha1,
option_sff_clip,
+ option_sizeout,
option_threads,
-1 },
@@ -3413,7 +3475,9 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
option_sizeout,
option_threads,
option_topn,
@@ -3457,7 +3521,9 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
option_sizeout,
option_threads,
option_topn,
@@ -3484,7 +3550,9 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
option_sizeout,
option_threads,
option_topn,
@@ -3517,6 +3585,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -3554,6 +3623,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -3591,6 +3661,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_sizein,
option_sizeout,
@@ -3630,6 +3701,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_self,
option_selfid,
@@ -3654,7 +3726,10 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
+ option_sizein,
+ option_sizeout,
option_threads,
option_xee,
option_xsize,
@@ -3734,6 +3809,7 @@ void args_init(int argc, char **argv)
option_relabel,
option_relabel_keep,
option_relabel_md5,
+ option_relabel_self,
option_relabel_sha1,
option_rightjust,
option_rowlen,
@@ -3924,14 +4000,9 @@ void args_init(int argc, char **argv)
if (opt_sample_size < 0)
fatal("The argument to --sample_size must not be negative");
- if (opt_relabel_sha1 && opt_relabel_md5)
- fatal("Specify either --relabel_sha1 or --relabel_md5, not both");
-
- if (opt_relabel && opt_relabel_md5)
- fatal("Specify either --relabel or --relabel_md5, not both");
-
- if (opt_relabel && opt_relabel_sha1)
- fatal("Specify either --relabel or --relabel_sha1, not both");
+ if (((opt_relabel ? 1 : 0) +
+ opt_relabel_md5 + opt_relabel_self + opt_relabel_sha1) > 1)
+ fatal("Specify only one of --relabel, --relabel_self, --relabel_sha1, or --relabel_md5");
if (opt_fastq_tail < 1)
fatal("The argument to --fastq_tail must be positive");
@@ -4139,6 +4210,7 @@ void cmd_help()
" --relabel STRING relabel nonchimeras with this prefix string\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel with md5 digest of normalized sequence\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel with sha1 digest of normalized sequence\n"
" --sizeout include abundance information when relabelling\n"
" --uchimealns FILENAME output chimera alignments to file\n"
@@ -4175,6 +4247,7 @@ void cmd_help()
" --relabel STRING relabel centroids with this prefix string\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel with md5 digest of normalized sequence\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel with sha1 digest of normalized sequence\n"
" --sizeorder sort accepted centroids by abundance, AGC\n"
" --sizeout write cluster abundances to centroid file\n"
@@ -4205,6 +4278,7 @@ void cmd_help()
" --relabel STRING relabel with this prefix string\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel with md5 digest of normalized sequence\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel with sha1 digest of normalized sequence\n"
" --sizeout write abundance annotation to output\n"
" --topn INT output only n most abundant sequences after derep\n"
@@ -4418,6 +4492,7 @@ void cmd_help()
" --relabel STRING relabel sequences with this prefix string\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel with md5 digest of normalized sequence\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel with sha1 digest of normalized sequence\n"
" --sizeout include abundance information when relabelling\n"
" --topn INT output just first n sequences\n"
@@ -4441,6 +4516,7 @@ void cmd_help()
" --relabel STRING relabel sequences with this prefix string\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel with md5 digest of normalized sequence\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel with sha1 digest of normalized sequence\n"
" --sizeout update abundance information in output\n"
" --xsize strip abundance information in output\n"
@@ -4487,6 +4563,7 @@ void cmd_help()
" --relabel STRING relabel filtered sequences with given prefix\n"
" --relabel_keep keep the old label after the new when relabelling\n"
" --relabel_md5 relabel filtered sequences with md5 digest\n"
+ " --relabel_self relabel with the sequence itself as label\n"
" --relabel_sha1 relabel filtered sequences with sha1 digest\n"
" --sizeout include abundance information when relabelling\n"
" --xee remove expected errors (ee) info from output\n"
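The `args_init` hunk above replaces three pairwise "not both" checks with a single count of how many relabelling modes were requested, so adding `--relabel_self` needs no new pairwise checks. Below is a minimal standalone sketch of that counting logic. It assumes that `opt_relabel` holds a string pointer that is null when `--relabel` is absent, as the `? 1 : 0` conversion suggests. The function name `relabel_options_ok` and its parameters are hypothetical and are not part of vsearch.

```cpp
// Hypothetical helper mirroring the check in args_init:
// returns true when at most one relabelling mode is selected.
//   relabel_prefix : value of --relabel (nullptr when the option is absent)
//   relabel_md5    : --relabel_md5 given
//   relabel_self   : --relabel_self given
//   relabel_sha1   : --relabel_sha1 given
bool relabel_options_ok(const char * relabel_prefix,
                        bool relabel_md5,
                        bool relabel_self,
                        bool relabel_sha1)
{
  // bool operands promote to int (false -> 0, true -> 1), and the
  // pointer is mapped to 0 or 1 explicitly, so the sum is the number
  // of relabelling modes requested on the command line.
  int selected = (relabel_prefix ? 1 : 0)
    + relabel_md5 + relabel_self + relabel_sha1;
  return selected <= 1;
}
```

For example, `relabel_options_ok("OTU_", false, true, false)` returns `false`, which corresponds to giving both `--relabel` and `--relabel_self`. That is the case where vsearch reports "Specify only one of ...".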
=====================================
src/vsearch.h
=====================================
@@ -265,6 +265,7 @@ extern bool opt_no_progress;
extern bool opt_quiet;
extern bool opt_relabel_keep;
extern bool opt_relabel_md5;
+extern bool opt_relabel_self;
extern bool opt_relabel_sha1;
extern bool opt_samheader;
extern bool opt_sff_clip;
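According to the help text, `--relabel_self` "relabel[s] with the sequence itself as label", and `--relabel_keep` keeps "the old label after the new when relabelling". The following is only a rough sketch of what such a header could look like, not vsearch's implementation. Several details are assumptions: the helper name `relabel_self_header` is hypothetical, the sequence is used as-is without the normalization applied for md5/sha1, and the old label is appended after a single space.

```cpp
#include <string>

// Hypothetical helper (not vsearch code): builds a FASTA header line
// whose label is the sequence itself, in the spirit of --relabel_self.
// keep_old_label imitates --relabel_keep; the single-space separator
// and the use of the unnormalized sequence are assumptions.
std::string relabel_self_header(const std::string & old_label,
                                const std::string & sequence,
                                bool keep_old_label)
{
  std::string header = ">" + sequence;   // new label = the sequence
  if (keep_old_label)
    header += " " + old_label;           // old label kept after the new one
  return header;
}
```

With these assumptions, `relabel_self_header("read1", "ACGT", true)` yields `>ACGT read1`.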
View it on GitLab: https://salsa.debian.org/med-team/vsearch/commit/a6e84605ff4ae3d2eb2fe51571bda5469e368a1e
--
More information about the debian-med-commit mailing list