[med-svn] [swarm-cluster] 01/08: New upstream version 2.1.10

Sat Jan 14 07:37:41 UTC 2017

This is an automated email from the git hooks/post-receive script.

satta pushed a commit to branch master
in repository swarm-cluster.

commit 19f3a6cb5036a1c6dfb2eeda08a346409f6abd7c
Author: Sascha Steinbiss <satta at debian.org>
Date:   Fri Jan 13 22:17:53 2017 +0000

    New upstream version 2.1.10
---
 README.md             |  21 +++++++++++++++----
 man/swarm.1           |  57 +++++++++++++++++++++++++++++++-------------------
 man/swarm_manual.pdf  | Bin 25500 -> 26035 bytes
 scripts/graph_plot.py |  14 ++++++-------
 src/algo.cc           |   2 +-
 src/algod1.cc         |   2 +-
 src/swarm.cc          |   2 +-
 src/swarm.h           |   2 +-
 8 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/README.md b/README.md
index 16586c5..3e90da0 100644
--- a/README.md
+++ b/README.md
@@ -318,7 +318,9 @@ filename` option.
 For each OTU, get the fasta sequences for all amplicons. Warning, this
 loop can generate a very large number of files. To limit the number of
 files, a test can be added to exclude swarms with less than *n*
-elements.
+elements. See
+[this wiki page](https://github.com/torognes/swarm/wiki/Get-fasta-sequences-for-all-amplicons-in-a-OTU)
+for more examples.
 
 ```sh
 INPUT_SWARM="amplicons.swarms"
@@ -326,10 +328,11 @@ INPUT_FASTA="amplicons.fasta"
 OUTPUT_FOLDER="swarms_fasta"
 AMPLICONS=$(mktemp)
 mkdir "${OUTPUT_FOLDER}"
-while read swarm ; do
-    tr " " "\n" <<< "${swarm}" | sed -e 's/^/>/' > "${AMPLICONS}"
+while read OTU ; do
+    tr " " "\n" <<< "${OTU}" | sed -e 's/^/>/' > "${AMPLICONS}"
     seed=$(head -n 1 "${AMPLICONS}")
-    grep -A 1 -F -f "${AMPLICONS}" "${INPUT_FASTA}" | sed -e '/^--$/d' > "./${OUTPUT_FOLDER}/${seed/>/}.fasta"
+    grep -A 1 -F -f "${AMPLICONS}" "${INPUT_FASTA}" | \
+        sed -e '/^--$/d' > "./${OUTPUT_FOLDER}/${seed/>/}.fasta"
 done < "${INPUT_SWARM}"
 rm "${AMPLICONS}"
 ```
@@ -401,6 +404,16 @@ methods, here are some links:
 <a name="history"/>
 ## Version history##
 
+<a name="version2110"/>
+### version 2.1.10 ###
+
+**swarm** 2.1.10 fixes two bugs related to gap penalties of
+alignments.  The first bug may lead to wrong aligments and similarity
+percentages reported in UCLUST (.uc) files. The second bug makes Swarm
+use a slightly higher gap extension penalty than specified. The
+default gap extension penalty used have actually been 4.5 instead of
+4.
+
 <a name="version219"/>
 ### version 2.1.9 ###
 
diff --git a/man/swarm.1 b/man/swarm.1
index a504f7d..2372d4c 100644
--- a/man/swarm.1
+++ b/man/swarm.1
@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH swarm 1 "July 6, 2016" "version 2.1.9" "USER COMMANDS"
+.TH swarm 1 "December 22, 2016" "version 2.1.10" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 swarm \(em find clusters of nearly-identical nucleotide amplicons
@@ -19,27 +19,28 @@ clustering methods are based on greedy, input-order dependent
 algorithms, with arbitrary selection of global cluster size and
 cluster centroids. To address that problem, we developed \fBswarm\fR,
 a fast and robust method that recursively groups amplicons with
-\fId\fR or less differences. \fBswarm\fR produces natural and stable
-clusters centered on local peaks of abundance, mostly free from
-input-order dependency induced by centroid selection.
+\fId\fR or less differences (i.e. substitutions, insertions or
+deletions). \fBswarm\fR produces natural and stable clusters centered
+on local peaks of abundance, mostly free from input-order dependency
+induced by centroid selection.
 .PP
 Exact clustering is impractical on large data sets when using a naïve
 all-vs-all approach (more precisely a 2-combination without
 repetitions), as it implies unrealistic numbers of pairwise
 comparisons. \fBswarm\fR is based on a maximum number of differences
 \fId\fR between two amplicons, and focuses only on very close local
-relationships. For \fId\fR = 1 (default value), swarm uses an
+relationships. For \fId\fR = 1, the default value, \fBswarm\fR uses an
 algorithm of linear complexity that generates all possible single
 mutations and performs exact-string matching by comparing
-hash-values. For \fId\fR = 2 or greater, swarm uses an algorithm of
-quadratic complexity that performs pairwise string comparisons. An
+hash-values. For \fId\fR = 2 or greater, \fBswarm\fR uses an algorithm
+of quadratic complexity that performs pairwise string comparisons. An
 efficient \fIk\fR-mer-based filtering and an astute use of comparisons
-results obtained during the clustering process allows to avoid most of
-the amplicon comparisons needed in a naïve approach. To speed up the
-remaining amplicon comparisons, \fBswarm\fR implements an extremely
-fast Needleman-Wunsch algorithm making use of the Streaming SIMD
-Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are not
-available, \fBswarm\fR exits with an error message.
+results obtained during the clustering process allows \fBswarm\fR to
+avoid most of the amplicon comparisons needed in a naïve approach. To
+speed up the remaining amplicon comparisons, \fBswarm\fR implements an
+extremely fast Needleman-Wunsch algorithm making use of the Streaming
+SIMD Extensions (SSE2) of modern x86-64 CPUs. If SSE2 instructions are
+not available, \fBswarm\fR exits with an error message.
 .PP
 \fBswarm\fR reads the named input \fIfilename\fR, a fasta file of
 nucleotide amplicons. The amplicon identifier is defined as the string
@@ -61,7 +62,7 @@ other symbol is present.
 .SS General options
 .TP 9
 .B \-h\fP,\fB\ \-\-help
-display this help and exit.
+display this help and exit successfully.
 .TP
 .BI \-t\fP,\fB\ \-\-threads\~ "positive integer"
 number of computation threads to use. Values between 1 and 256 are
@@ -69,7 +70,12 @@ accepted, but we recommend to use a number of threads lesser or equal
 to the number of available CPU cores. Default number of threads is 1.
 .TP
 .B \-v\fP,\fB\ \-\-version
-output version information and exit.
+output version information and exit successfully.
+.TP
+.B \-\-
+delimit the option list. Later arguments, if any, are treated as
+operands even if they begin with "\-". For example, "swarm \-\-
+\-file.fasta" reads from the file "\-file.fasta".
 .LP
 .\" ----------------------------------------------------------------------------
 .SS Clustering options
@@ -89,7 +95,7 @@ higher. When using \fId\fR = 0, \fBswarm\fR will output results
 corresponding to a strict dereplication of the dataset, i.e. merging
 identical amplicons. Warning, whatever the \fId\fR value, \fBswarm\fR
 requires fasta entries to present abundance values. Default number of
-differences is 1.
+differences \fId\fR is 1.
 .TP
 .B \-n\fP,\fB\ \-\-no\-otu\-breaking
 deactivate the built-in OTU refinement (not recommended). Amplicon
@@ -207,16 +213,16 @@ with one OTU per row and seven columns of information:
 .IP \n[step]. 4
 number of unique amplicons in the OTU,
 .IP \n+[step].
-total copy number of amplicons in the OTU,
+total abundance of amplicons in the OTU,
 .IP \n+[step].
 identifier of the initial seed,
 .IP \n+[step].
-initial seed copy number,
+initial seed abundance,
 .IP \n+[step].
-number of amplicons with a copy number of 1 in the OTU,
+number of amplicons with an abundance of 1 in the OTU,
 .IP \n+[step].
 maximum number of iterations before the OTU reached its natural
-limits),
+limit),
 .IP \n+[step].
 theoretical maximum radius of the OTU (i.e., number of cummulated
 differences between the seed and the furthermost amplicon in the
@@ -230,7 +236,7 @@ file. That option does not modify \fBswarm\fR's default output format.
 .TP
 .BI \-w\fP,\fB\ \-\-seeds \0filename
 output OTU representatives to \fIfilename\fR in fasta format. The
-abundance value of each representative is the sum of the abundances of
+abundance value of each OTU representative is the sum of the abundances of
 all the amplicons in the OTU.
 .TP
 .B \-z\fP,\fB\ \-\-usearch\-abundance
@@ -302,7 +308,7 @@ curmudgeonly e-mail to Frédéric Mahé <mahe at rhrk.uni-kl.de> and
 Torbjørn Rognes <torognes at ifi.uio.no>.
 .\" ============================================================================
 .SH AVAILABILITY
-The software is available from <https://github.com/torognes/swarm>
+Source code and binaries are available at <https://github.com/torognes/swarm>
 .\" ============================================================================
 .SH COPYRIGHT
 Copyright (C) 2012-2016 Frédéric Mahé & Torbjørn Rognes
@@ -338,6 +344,13 @@ New features and important modifications of \fBswarm\fR (short lived
 or minor bug releases are not mentioned):
 .RS
 .TP
+.BR v2.1.10\~ "released December 22, 2016"
+Version 2.1.10 fixes two bugs related to gap penalties of alignments.
+The first bug may lead to wrong aligments and similarity percentages
+reported in UCLUST (.uc) files. The second bug makes Swarm use a
+slightly higher gap extension penalty than specified. The default gap
+extension penalty used have actually been 4.5 instead of 4.
+.TP
 .BR v2.1.9\~ "released July 6, 2016"
 Version 2.1.9 fixes errors when compiling with GCC version 6.
 .TP
diff --git a/man/swarm_manual.pdf b/man/swarm_manual.pdf
index 2e67208..8b22a9c 100644
Binary files a/man/swarm_manual.pdf and b/man/swarm_manual.pdf differ
diff --git a/scripts/graph_plot.py b/scripts/graph_plot.py
index bb76a9a..7b355fe 100644
--- a/scripts/graph_plot.py
+++ b/scripts/graph_plot.py
@@ -5,15 +5,14 @@
     abundance). Requires the module igraph and python 2.7+.
 
     Limitations: amplicons grafted with the fastidious option will be
-    discarded and will not be visualized. The script does not deal
-    with ";size=" abundance annotations.
+    discarded and will not be visualized.
 """
 
 from __future__ import print_function
 
 __author__ = "Frédéric Mahé <mahe at rhrk.uni-kl.fr>"
-__date__ = "2015/04/22"
-__version__ = "$Revision: 3.0"
+__date__ = "2016/11/09"
+__version__ = "$Revision: 3.1"
 
 import sys
 import os.path
@@ -33,9 +32,9 @@ def option_parse():
     """
     desc = """Visualize the internal structure of a given OTU."""
 
-    parser = OptionParser(usage="usage: %prog -s FILE -i FILE -o INT",
+    parser = OptionParser(usage="usage: %prog -s FILE -i FILE [-o INT -d INT]",
                           description=desc,
-                          version="%prog version 2.0")
+                          version="%prog version 3.1")
 
     parser.add_option("-s", "--swarms",
                       metavar="<FILENAME>",
@@ -63,7 +62,8 @@ def option_parse():
                       type="int",
                       dest="drop",
                       default=0,
-                      help="Drop amplicons seen <INTEGER> or less times (0)")
+                      help="Drop amplicons seen <INTEGER> or less times \
+                            (zero by default)")
 
     (options, args) = parser.parse_args()
 
diff --git a/src/algo.cc b/src/algo.cc
index 3174192..c9ff0fd 100644
--- a/src/algo.cc
+++ b/src/algo.cc
@@ -424,7 +424,7 @@ void algo_run()
               unsigned long nwalignmentlength = 0;
 
               nw(dseq, dend, qseq, qend, 
-                 score_matrix_63, gapopen, gapextend, 
+                 score_matrix_63, penalty_gapopen, penalty_gapextend,
                  & nwscore, & nwdiff, & nwalignmentlength, & nwalignment,
                  dir, hearray, 0, 0);
               
diff --git a/src/algod1.cc b/src/algod1.cc
index 14cebe9..6004b8c 100644
--- a/src/algod1.cc
+++ b/src/algod1.cc
@@ -1431,7 +1431,7 @@ void algo_d1_run()
                   unsigned long nwalignmentlength = 0;
                   
                   nw(dseq, dend, qseq, qend,
-                     score_matrix_63, gapopen, gapextend,
+                     score_matrix_63, penalty_gapopen, penalty_gapextend,
                      & nwscore, & nwdiff, & nwalignmentlength, & nwalignment,
                      dir, hearray, 0, 0);
                   
diff --git a/src/swarm.cc b/src/swarm.cc
index e6276d7..54af068 100644
--- a/src/swarm.cc
+++ b/src/swarm.cc
@@ -526,7 +526,7 @@ int main(int argc, char** argv)
 
   penalty_mismatch = 2 * matchscore - 2 * mismatchscore;
   penalty_gapopen = 2 * gapopen;
-  penalty_gapextend = 2 * matchscore + gapextend;
+  penalty_gapextend = matchscore + 2 * gapextend;
 
   penalty_factor = gcd(gcd(penalty_mismatch, penalty_gapopen), penalty_gapextend);
   
diff --git a/src/swarm.h b/src/swarm.h
index 2d2dd51..bcc84cd 100644
--- a/src/swarm.h
+++ b/src/swarm.h
@@ -52,7 +52,7 @@
 #define LINE_MAX 2048
 #endif
 
-#define SWARM_VERSION "2.1.9"
+#define SWARM_VERSION "2.1.10"
 #define WIDTH 32
 #define WIDTH_SHIFT 5
 #define BLOCKWIDTH 32
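
Note on the gap extension fix in src/swarm.cc: the changelog's figure of 4.5 can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the commit. It assumes a match reward of 5 (a value not shown in this diff, chosen because it reproduces the 4.5 figure) and takes the default gap extension penalty of 4 from the release notes. The function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def old_gapextend_penalty(matchscore, gapextend):
    # Formula removed by this commit (the "-" line in src/swarm.cc).
    return 2 * matchscore + gapextend


def new_gapextend_penalty(matchscore, gapextend):
    # Formula introduced by this commit (the "+" line in src/swarm.cc).
    return matchscore + 2 * gapextend


def effective_gapextend(penalty, matchscore):
    # Invert the new formula: which gap extension value would
    # produce the given internal penalty?
    return Fraction(penalty - matchscore, 2)


MATCHSCORE = 5  # assumption: match reward, not shown in this diff
GAPEXTEND = 4   # default gap extension penalty, per the release notes

old = old_gapextend_penalty(MATCHSCORE, GAPEXTEND)
new = new_gapextend_penalty(MATCHSCORE, GAPEXTEND)

print("old internal penalty:", old)
print("new internal penalty:", new)
print("gap extension implied by the old formula:",
      float(effective_gapextend(old, MATCHSCORE)))
```

Under these assumptions the old formula yields an internal penalty of 14 and the new one 13. Reading 14 back through the corrected formula corresponds to a gap extension of 4.5, which agrees with the release notes.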

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/swarm-cluster.git