[med-svn] [swarm-cluster] 01/08: New upstream version 2.1.10
Sascha Steinbiss
satta at debian.org
Sat Jan 14 07:37:41 UTC 2017
This is an automated email from the git hooks/post-receive script.
satta pushed a commit to branch master
in repository swarm-cluster.
commit 19f3a6cb5036a1c6dfb2eeda08a346409f6abd7c
Author: Sascha Steinbiss <satta at debian.org>
Date: Fri Jan 13 22:17:53 2017 +0000
New upstream version 2.1.10
---
README.md | 21 +++++++++++++++----
man/swarm.1 | 57 +++++++++++++++++++++++++++++++-------------------
man/swarm_manual.pdf | Bin 25500 -> 26035 bytes
scripts/graph_plot.py | 14 ++++++-------
src/algo.cc | 2 +-
src/algod1.cc | 2 +-
src/swarm.cc | 2 +-
src/swarm.h | 2 +-
8 files changed, 63 insertions(+), 37 deletions(-)
diff --git a/README.md b/README.md
index 16586c5..3e90da0 100644
--- a/README.md
+++ b/README.md
@@ -318,7 +318,9 @@ filename` option.
For each OTU, get the fasta sequences for all amplicons. Warning, this
loop can generate a very large number of files. To limit the number of
files, a test can be added to exclude swarms with less than *n*
-elements.
+elements. See
+[this wiki page](https://github.com/torognes/swarm/wiki/Get-fasta-sequences-for-all-amplicons-in-a-OTU)
+for more examples.
```sh
INPUT_SWARM="amplicons.swarms"
@@ -326,10 +328,11 @@ INPUT_FASTA="amplicons.fasta"
OUTPUT_FOLDER="swarms_fasta"
AMPLICONS=$(mktemp)
mkdir "${OUTPUT_FOLDER}"
-while read swarm ; do
- tr " " "\n" <<< "${swarm}" | sed -e 's/^/>/' > "${AMPLICONS}"
+while read OTU ; do
+ tr " " "\n" <<< "${OTU}" | sed -e 's/^/>/' > "${AMPLICONS}"
seed=$(head -n 1 "${AMPLICONS}")
- grep -A 1 -F -f "${AMPLICONS}" "${INPUT_FASTA}" | sed -e '/^--$/d' > "./${OUTPUT_FOLDER}/${seed/>/}.fasta"
+ grep -A 1 -F -f "${AMPLICONS}" "${INPUT_FASTA}" | \
+ sed -e '/^--$/d' > "./${OUTPUT_FOLDER}/${seed/>/}.fasta"
done < "${INPUT_SWARM}"
rm "${AMPLICONS}"
```
@@ -401,6 +404,16 @@ methods, here are some links:
<a name="history"/>
## Version history##
+<a name="version2110"/>
+### version 2.1.10 ###
+
+**swarm** 2.1.10 fixes two bugs related to gap penalties of
+alignments. The first bug may lead to wrong aligments and similarity
+percentages reported in UCLUST (.uc) files. The second bug makes Swarm
+use a slightly higher gap extension penalty than specified. The
+default gap extension penalty used have actually been 4.5 instead of
+4.
+
<a name="version219"/>
### version 2.1.9 ###
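The README hunk above contains a shell loop that writes one fasta file per OTU. Below is a minimal Python 3 sketch of the same extraction for readers who would rather not use the shell. It is not part of swarm. The function names (`parse_fasta`, `otu_fastas`, `write_otu_fastas`) and the `min_size` parameter are hypothetical. Like the README example, it assumes that each line of the swarm output lists the amplicon identifiers of one OTU separated by spaces, with the seed first, and that an identifier is the fasta header text up to the first space.

```python
import os
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each amplicon identifier (header text up to the first space) to its sequence."""
    records: Dict[str, str] = {}
    identifier = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                records[identifier] = "".join(chunks)
            identifier = line[1:].partition(" ")[0]
            chunks = []
        elif line:
            chunks.append(line)  # sequences may span several lines
    if identifier is not None:
        records[identifier] = "".join(chunks)
    return records


def otu_fastas(swarm_lines: Iterable[str], records: Dict[str, str],
               min_size: int = 1) -> Iterator[Tuple[str, str]]:
    """Yield (seed, fasta_text) for every OTU with at least min_size amplicons."""
    for line in swarm_lines:
        amplicons = line.split()  # one OTU per line, seed first
        if not amplicons or len(amplicons) < min_size:
            continue
        text = "".join(">{}\n{}\n".format(a, records[a]) for a in amplicons)
        yield amplicons[0], text


def write_otu_fastas(swarm_path: str, fasta_path: str, output_folder: str,
                     min_size: int = 1) -> None:
    """File-based wrapper: writes one <seed>.fasta file per OTU into output_folder."""
    with open(fasta_path) as handle:
        records = parse_fasta(handle)
    os.makedirs(output_folder, exist_ok=True)
    with open(swarm_path) as handle:
        for seed, text in otu_fastas(handle, records, min_size):
            with open(os.path.join(output_folder, seed + ".fasta"), "w") as out:
                out.write(text)
```

For example, `write_otu_fastas("amplicons.swarms", "amplicons.fasta", "swarms_fasta", min_size=2)` uses the README's file names and skips single-amplicon OTUs. Looking identifiers up in a dictionary matches them exactly. By contrast, `grep -F` matches substrings, so in the shell version a pattern such as `>seq1_1` can also match the header `>seq1_10`.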
diff --git a/man/swarm.1 b/man/swarm.1
index a504f7d..2372d4c 100644
--- a/man/swarm.1
+++ b/man/swarm.1
@@ -1,5 +1,5 @@
.\" ============================================================================
-.TH swarm 1 "July 6, 2016" "version 2.1.9" "USER COMMANDS"
+.TH swarm 1 "December 22, 2016" "version 2.1.10" "USER COMMANDS"
.\" ============================================================================
.SH NAME
swarm \(em find clusters of nearly-identical nucleotide amplicons
@@ -19,27 +19,28 @@ clustering methods are based on greedy, input-order dependent
algorithms, with arbitrary selection of global cluster size and
cluster centroids. To address that problem, we developed \fBswarm\fR,
a fast and robust method that recursively groups amplicons with
-\fId\fR or less differences. \fBswarm\fR produces natural and stable
-clusters centered on local peaks of abundance, mostly free from
-input-order dependency induced by centroid selection.
+\fId\fR or less differences (i.e. substitutions, insertions or
+deletions). \fBswarm\fR produces natural and stable clusters centered
+on local peaks of abundance, mostly free from input-order dependency
+induced by centroid selection.
.PP
Exact clustering is impractical on large data sets when using a naïve
all-vs-all approach (more precisely a 2-combination without
repetitions), as it implies unrealistic numbers of pairwise
comparisons. \fBswarm\fR is based on a maximum number of differences
\fId\fR between two amplicons, and focuses only on very close local
-relationships. For \fId\fR = 1 (default value), swarm uses an
+relationships. For \fId\fR = 1, the default value, \fBswarm\fR uses an
algorithm of linear complexity that generates all possible single
mutations and performs exact-string matching by comparing
-hash-values. For \fId\fR = 2 or greater, swarm uses an algorithm of
-quadratic complexity that performs pairwise string comparisons. An
+hash-values. For \fId\fR = 2 or greater, \fBswarm\fR uses an algorithm
+of quadratic complexity that performs pairwise string comparisons. An
efficient \fIk\fR-mer-based filtering and an astute use of comparisons
-results obtained during the clustering process allows to avoid most of
-the amplicon comparisons needed in a naïve approach. To speed up the
-remaining amplicon comparisons, \fBswarm\fR implements an extremely
-fast Needleman-Wunsch algorithm making use of the Streaming SIMD
-Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are not
-available, \fBswarm\fR exits with an error message.
+results obtained during the clustering process allows \fBswarm\fR to
+avoid most of the amplicon comparisons needed in a naïve approach. To
+speed up the remaining amplicon comparisons, \fBswarm\fR implements an
+extremely fast Needleman-Wunsch algorithm making use of the Streaming
+SIMD Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are
+not available, \fBswarm\fR exits with an error message.
.PP
\fBswarm\fR reads the named input \fIfilename\fR, a fasta file of
nucleotide amplicons. The amplicon identifier is defined as the string
@@ -61,7 +62,7 @@ other symbol is present.
.SS General options
.TP 9
.B \-h\fP,\fB\ \-\-help
-display this help and exit.
+display this help and exit successfully.
.TP
.BI \-t\fP,\fB\ \-\-threads\~ "positive integer"
number of computation threads to use. Values between 1 and 256 are
@@ -69,7 +70,12 @@ accepted, but we recommend to use a number of threads lesser or equal
to the number of available CPU cores. Default number of threads is 1.
.TP
.B \-v\fP,\fB\ \-\-version
-output version information and exit.
+output version information and exit successfully.
+.TP
+.B \-\-
+delimit the option list. Later arguments, if any, are treated as
+operands even if they begin with "\-". For example, "swarm \-\-
+\-file.fasta" reads from the file "\-file.fasta".
.LP
.\" ----------------------------------------------------------------------------
.SS Clustering options
@@ -89,7 +95,7 @@ higher. When using \fId\fR = 0, \fBswarm\fR will output results
corresponding to a strict dereplication of the dataset, i.e. merging
identical amplicons. Warning, whatever the \fId\fR value, \fBswarm\fR
requires fasta entries to present abundance values. Default number of
-differences is 1.
+differences \fId\fR is 1.
.TP
.B \-n\fP,\fB\ \-\-no\-otu\-breaking
deactivate the built-in OTU refinement (not recommended). Amplicon
@@ -207,16 +213,16 @@ with one OTU per row and seven columns of information:
.IP \n[step]. 4
number of unique amplicons in the OTU,
.IP \n+[step].
-total copy number of amplicons in the OTU,
+total abundance of amplicons in the OTU,
.IP \n+[step].
identifier of the initial seed,
.IP \n+[step].
-initial seed copy number,
+initial seed abundance,
.IP \n+[step].
-number of amplicons with a copy number of 1 in the OTU,
+number of amplicons with an abundance of 1 in the OTU,
.IP \n+[step].
maximum number of iterations before the OTU reached its natural
-limits),
+limit),
.IP \n+[step].
theoretical maximum radius of the OTU (i.e., number of cummulated
differences between the seed and the furthermost amplicon in the
@@ -230,7 +236,7 @@ file. That option does not modify \fBswarm\fR's default output format.
.TP
.BI \-w\fP,\fB\ \-\-seeds \0filename
output OTU representatives to \fIfilename\fR in fasta format. The
-abundance value of each representative is the sum of the abundances of
+abundance value of each OTU representative is the sum of the abundances of
all the amplicons in the OTU.
.TP
.B \-z\fP,\fB\ \-\-usearch\-abundance
@@ -302,7 +308,7 @@ curmudgeonly e-mail to Frédéric Mahé <mahe at rhrk.uni-kl.de> and
Torbjørn Rognes <torognes at ifi.uio.no>.
.\" ============================================================================
.SH AVAILABILITY
-The software is available from <https://github.com/torognes/swarm>
+Source code and binaries are available at <https://github.com/torognes/swarm>
.\" ============================================================================
.SH COPYRIGHT
Copyright (C) 2012-2016 Frédéric Mahé & Torbjørn Rognes
@@ -338,6 +344,13 @@ New features and important modifications of \fBswarm\fR (short lived
or minor bug releases are not mentioned):
.RS
.TP
+.BR v2.1.10\~ "released December 22, 2016"
+Version 2.1.10 fixes two bugs related to gap penalties of alignments.
+The first bug may lead to wrong aligments and similarity percentages
+reported in UCLUST (.uc) files. The second bug makes Swarm use a
+slightly higher gap extension penalty than specified. The default gap
+extension penalty used have actually been 4.5 instead of 4.
+.TP
.BR v2.1.9\~ "released July 6, 2016"
Version 2.1.9 fixes errors when compiling with GCC version 6.
.TP
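The DESCRIPTION hunk above says that for d = 1, swarm generates all possible single mutations of an amplicon and finds matches by comparing hash values. The Python 3 sketch below illustrates only that idea. It enumerates every substitution, deletion and insertion over an assumed ACGT alphabet, and it relies on Python's built-in set hashing rather than swarm's own hash table. `micro_variants`, `d1_neighbours` and `grow_otu` are hypothetical names. `grow_otu` ignores abundances and the OTU-breaking step, so it is not swarm's clustering algorithm.

```python
from typing import Iterator, List, Set

ALPHABET = "ACGT"  # assumption: unambiguous nucleotides only


def micro_variants(seq: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every sequence one substitution, deletion or insertion away from seq.

    Some variants are produced more than once (e.g. deleting either base of "AA").
    """
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]                    # deletion at position i
        for base in alphabet:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]     # substitution at position i
    for i in range(len(seq) + 1):
        for base in alphabet:
            yield seq[:i] + base + seq[i:]             # insertion before position i


def d1_neighbours(seq: str, pool: Set[str]) -> Set[str]:
    """Members of pool exactly one difference away from seq, found by hash lookups only."""
    return {variant for variant in micro_variants(seq) if variant in pool}


def grow_otu(seed: str, pool: Set[str]) -> Set[str]:
    """Breadth-first growth at d = 1, ignoring abundances and OTU breaking."""
    members = {seed}
    frontier: List[str] = [seed]
    while frontier:
        next_frontier: List[str] = []
        for seq in frontier:
            for neighbour in d1_neighbours(seq, pool):
                if neighbour not in members:
                    members.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return members
```

An amplicon of length L yields 8L + 4 variants: L deletions, 3L substitutions and 4(L + 1) insertions. The work per amplicon therefore depends on its length and not on how many amplicons are in the pool, which is consistent with the linear complexity the man page describes.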
diff --git a/man/swarm_manual.pdf b/man/swarm_manual.pdf
index 2e67208..8b22a9c 100644
Binary files a/man/swarm_manual.pdf and b/man/swarm_manual.pdf differ
diff --git a/scripts/graph_plot.py b/scripts/graph_plot.py
index bb76a9a..7b355fe 100644
--- a/scripts/graph_plot.py
+++ b/scripts/graph_plot.py
@@ -5,15 +5,14 @@
abundance). Requires the module igraph and python 2.7+.
Limitations: amplicons grafted with the fastidious option will be
- discarded and will not be visualized. The script does not deal
- with ";size=" abundance annotations.
+ discarded and will not be visualized.
"""
from __future__ import print_function
__author__ = "Frédéric Mahé <mahe at rhrk.uni-kl.fr>"
-__date__ = "2015/04/22"
-__version__ = "$Revision: 3.0"
+__date__ = "2016/11/09"
+__version__ = "$Revision: 3.1"
import sys
import os.path
@@ -33,9 +32,9 @@ def option_parse():
"""
desc = """Visualize the internal structure of a given OTU."""
- parser = OptionParser(usage="usage: %prog -s FILE -i FILE -o INT",
+ parser = OptionParser(usage="usage: %prog -s FILE -i FILE [-o INT -d INT]",
description=desc,
- version="%prog version 2.0")
+ version="%prog version 3.1")
parser.add_option("-s", "--swarms",
metavar="<FILENAME>",
@@ -63,7 +62,8 @@ def option_parse():
type="int",
dest="drop",
default=0,
- help="Drop amplicons seen <INTEGER> or less times (0)")
+ help="Drop amplicons seen <INTEGER> or less times \
+ (zero by default)")
(options, args) = parser.parse_args()
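Version 3.1 of graph_plot.py removes the stated limitation about ";size=" abundance annotations. This diff does not show the script's parsing code, so the Python 3 sketch below is only an assumption about how both annotation styles might be read: swarm's default `_<abundance>` suffix and the usearch-style `;size=<abundance>` annotation (swarm's `-z`/`--usearch-abundance` option). `abundance` and `keep_amplicons` are hypothetical helpers. `keep_amplicons` applies the rule from the `drop` option's help text, which discards amplicons seen `drop` or fewer times.

```python
import re
from typing import Iterable, List

# Hypothetical helpers, not code taken from graph_plot.py.
USEARCH_SIZE = re.compile(r";size=([0-9]+)(?:;|$)")  # e.g. "seq1;size=12;"
SWARM_SUFFIX = re.compile(r"_([0-9]+)$")              # e.g. "seq1_12"


def abundance(identifier: str) -> int:
    """Return the abundance annotated in an amplicon identifier."""
    for pattern in (USEARCH_SIZE, SWARM_SUFFIX):
        match = pattern.search(identifier)
        if match:
            return int(match.group(1))
    raise ValueError("no abundance annotation in {!r}".format(identifier))


def keep_amplicons(identifiers: Iterable[str], drop: int = 0) -> List[str]:
    """Discard amplicons seen `drop` or fewer times (drop = 0 keeps any abundance >= 1)."""
    return [i for i in identifiers if abundance(i) > drop]
```

The usearch pattern is tried first so that an identifier carrying both styles, such as `seq_1;size=7;`, takes its abundance from the explicit `;size=` field.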
diff --git a/src/algo.cc b/src/algo.cc
index 3174192..c9ff0fd 100644
--- a/src/algo.cc
+++ b/src/algo.cc
@@ -424,7 +424,7 @@ void algo_run()
unsigned long nwalignmentlength = 0;
nw(dseq, dend, qseq, qend,
- score_matrix_63, gapopen, gapextend,
+ score_matrix_63, penalty_gapopen, penalty_gapextend,
& nwscore, & nwdiff, & nwalignmentlength, & nwalignment,
dir, hearray, 0, 0);
diff --git a/src/algod1.cc b/src/algod1.cc
index 14cebe9..6004b8c 100644
--- a/src/algod1.cc
+++ b/src/algod1.cc
@@ -1431,7 +1431,7 @@ void algo_d1_run()
unsigned long nwalignmentlength = 0;
nw(dseq, dend, qseq, qend,
- score_matrix_63, gapopen, gapextend,
+ score_matrix_63, penalty_gapopen, penalty_gapextend,
& nwscore, & nwdiff, & nwalignmentlength, & nwalignment,
dir, hearray, 0, 0);
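The two hunks above change the gap arguments of the `nw()` calls from `gapopen, gapextend` to `penalty_gapopen, penalty_gapextend`. Presumably this makes the global alignments reported in UCLUST (.uc) files use gap penalties on the same scale as the mismatch penalty. The Python 3 sketch below is a plain Gotoh-style Needleman-Wunsch cost computation, not swarm's SIMD implementation. It assumes that a gap of length k costs `gap_open + k * gap_extend`, a mismatch costs a fixed penalty and a match costs nothing. Its purpose is to show how passing unscaled gap values can change which alignment is optimal.

```python
INF = float("inf")


def nw_cost(a: str, b: str, mismatch: int, gap_open: int, gap_extend: int) -> float:
    """Minimum global alignment cost of a and b with affine gap penalties."""
    n, m = len(a), len(b)
    # H: best cost of aligning a[:i] with b[:j]
    # E: best cost ending with b[j-1] opposite a gap; F: a[i-1] opposite a gap
    H = [[INF] * (m + 1) for _ in range(n + 1)]
    E = [[INF] * (m + 1) for _ in range(n + 1)]
    F = [[INF] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for j in range(1, m + 1):
        E[0][j] = H[0][j] = gap_open + j * gap_extend   # leading gap in a
    for i in range(1, n + 1):
        F[i][0] = H[i][0] = gap_open + i * gap_extend   # leading gap in b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend an existing gap, or open a new one after an aligned column
            E[i][j] = min(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open + gap_extend)
            F[i][j] = min(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open + gap_extend)
            diagonal = H[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = min(diagonal, E[i][j], F[i][j])
    return H[n][m]
```

Take a mismatch penalty of 18 and assume 24 and 13 are the scaled default gap penalties. Then `nw_cost("AC", "CA", 18, 24, 13)` returns 36, the cost of two mismatches. If the raw defaults 12 and 4 are passed instead, `nw_cost("AC", "CA", 18, 12, 4)` returns 32, because two single-base gaps now look cheaper than two mismatches.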
diff --git a/src/swarm.cc b/src/swarm.cc
index e6276d7..54af068 100644
--- a/src/swarm.cc
+++ b/src/swarm.cc
@@ -526,7 +526,7 @@ int main(int argc, char** argv)
penalty_mismatch = 2 * matchscore - 2 * mismatchscore;
penalty_gapopen = 2 * gapopen;
- penalty_gapextend = 2 * matchscore + gapextend;
+ penalty_gapextend = matchscore + 2 * gapextend;
penalty_factor = gcd(gcd(penalty_mismatch, penalty_gapopen), penalty_gapextend);
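You can check the corrected expression by rewriting the alignment score as a distance. Suppose a global alignment has M matches, X mismatches, G gap columns and O gap openings, and a gap of length k costs gapopen + k * gapextend. The two sequence lengths then satisfy L1 + L2 = 2M + 2X + G, so maximising the score is the same as minimising 2 * (match + mismatch) * X + 2 * gapopen * O + (2 * gapextend + match) * G. Each gap column also forgoes half a match reward, and doubling keeps all values integral. This is exactly the new `penalty_gapextend = matchscore + 2 * gapextend`. The old expression `2 * matchscore + gapextend` corresponds to a gap extension of (matchscore + gapextend) / 2, which is 4.5 for the default values, as the version history states.

The Python 3 sketch below mirrors the four assignments in the hunk. `penalties` and `effective_gapextend` are hypothetical names. The sketch assumes the documented defaults (match reward 5, mismatch penalty 4, gap opening 12, gap extension 4). It also assumes the mismatch penalty is stored as the negative score -4, so that `penalty_mismatch` is 2 * 5 + 2 * 4 = 18.

```python
from math import gcd
from typing import Tuple


def penalties(matchscore: int = 5, mismatchscore: int = -4,
              gapopen: int = 12, gapextend: int = 4,
              fixed: bool = True) -> Tuple[int, int, int, int]:
    """Mirror the penalty arithmetic of swarm.cc main() before and after 2.1.10.

    Assumption: mismatchscore is a negative score (-p 4 becomes -4).
    """
    penalty_mismatch = 2 * matchscore - 2 * mismatchscore
    penalty_gapopen = 2 * gapopen
    if fixed:
        penalty_gapextend = matchscore + 2 * gapextend   # 2.1.10
    else:
        penalty_gapextend = 2 * matchscore + gapextend   # 2.1.9 and earlier
    penalty_factor = gcd(gcd(penalty_mismatch, penalty_gapopen), penalty_gapextend)
    return penalty_mismatch, penalty_gapopen, penalty_gapextend, penalty_factor


def effective_gapextend(penalty_gapextend: int, matchscore: int = 5) -> float:
    """Invert the corrected formula: which gap extension value a penalty corresponds to."""
    return (penalty_gapextend - matchscore) / 2
```

With these defaults the corrected penalties are 18, 24 and 13, which share no common factor (gcd 1). The old penalties were 18, 24 and 14, which share a factor of 2.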
diff --git a/src/swarm.h b/src/swarm.h
index 2d2dd51..bcc84cd 100644
--- a/src/swarm.h
+++ b/src/swarm.h
@@ -52,7 +52,7 @@
#define LINE_MAX 2048
#endif
-#define SWARM_VERSION "2.1.9"
+#define SWARM_VERSION "2.1.10"
#define WIDTH 32
#define WIDTH_SHIFT 5
#define BLOCKWIDTH 32
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/swarm-cluster.git