[med-svn] [Git][med-team/vsearch][upstream] New upstream version 2.25.0
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Wed Nov 15 14:58:38 GMT 2023
Étienne Mollier pushed to branch upstream at Debian Med / vsearch
Commits:
88399a1e by Étienne Mollier at 2023-11-15T15:36:38+01:00
New upstream version 2.25.0
- - - - -
8 changed files:
- README.md
- configure.ac
- man/vsearch.1
- src/chimera.cc
- src/fasta.cc
- src/searchexact.cc
- src/vsearch.cc
- src/vsearch.h
Changes:
=====================================
README.md
=====================================
@@ -37,7 +37,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo
## Getting Help
-If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
+If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
## Example
@@ -50,9 +50,9 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
**Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:
```
-wget https://github.com/torognes/vsearch/archive/v2.24.0.tar.gz
-tar xzf v2.24.0.tar.gz
-cd vsearch-2.24.0
+wget https://github.com/torognes/vsearch/archive/v2.25.0.tar.gz
+tar xzf v2.25.0.tar.gz
+cd vsearch-2.25.0
./autogen.sh
./configure CFLAGS="-O3" CXXFLAGS="-O3"
make
@@ -81,48 +81,48 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-linux-x86_64.tar.gz
-tar xzf vsearch-2.24.0-linux-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-linux-x86_64.tar.gz
+tar xzf vsearch-2.25.0-linux-x86_64.tar.gz
```
Or these commands if you are using a Linux ppc64le system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-linux-ppc64le.tar.gz
-tar xzf vsearch-2.24.0-linux-ppc64le.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-linux-ppc64le.tar.gz
+tar xzf vsearch-2.25.0-linux-ppc64le.tar.gz
```
Or these commands if you are using a Linux aarch64 (arm64) system:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-linux-aarch64.tar.gz
-tar xzf vsearch-2.24.0-linux-aarch64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-linux-aarch64.tar.gz
+tar xzf vsearch-2.25.0-linux-aarch64.tar.gz
```
Or these commands if you are using a Mac with an Apple Silicon CPU:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-macos-aarch64.tar.gz
-tar xzf vsearch-2.24.0-macos-aarch64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-macos-aarch64.tar.gz
+tar xzf vsearch-2.25.0-macos-aarch64.tar.gz
```
Or these commands if you are using a Mac with an Intel CPU:
```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-macos-x86_64.tar.gz
-tar xzf vsearch-2.24.0-macos-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-macos-x86_64.tar.gz
+tar xzf vsearch-2.25.0-macos-x86_64.tar.gz
```
Or if you are using Windows, download and extract (unzip) the contents of this file:
```
-https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch-2.24.0-win-x86_64.zip
+https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch-2.25.0-win-x86_64.zip
```
-Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.24.0-linux-x86_64` or `vsearch-2.24.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`. Versions with statically compiled libraries are available for Linux systems. These have "-static" in their name, and could be used on systems that do not have all the necessary libraries installed.
+Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.25.0-linux-x86_64` or `vsearch-2.25.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`. Versions with statically compiled libraries are available for Linux systems. These have "-static" in their name, and could be used on systems that do not have all the necessary libraries installed.
**Windows**: You will now have the binary distribution in a folder
-called `vsearch-2.24.0-win-x86_64`. The vsearch executable is called
+called `vsearch-2.25.0-win-x86_64`. The vsearch executable is called
`vsearch.exe`. The manual in PDF format is called
`vsearch_manual.pdf`. If you want to be able to call `vsearch.exe`
from any command prompt window, you can put the vsearch executable in
@@ -133,7 +133,7 @@ searching for it in the Start menu, `Edit` user variables, add
your changes.
-**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.24.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
+**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.25.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
## Packages, plugins, and wrappers
=====================================
configure.ac
=====================================
@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.63])
-AC_INIT([vsearch], [2.24.0], [torognes at ifi.uio.no], [vsearch], [https://github.com/torognes/vsearch])
+AC_INIT([vsearch], [2.25.0], [torognes at ifi.uio.no], [vsearch], [https://github.com/torognes/vsearch])
AC_CANONICAL_TARGET
AM_INIT_AUTOMAKE([subdir-objects])
AC_LANG([C++])
=====================================
man/vsearch.1
=====================================
@@ -1,5 +1,5 @@
.\" ============================================================================
-.TH vsearch 1 "October 26, 2023" "version 2.24.0" "USER COMMANDS"
+.TH vsearch 1 "November 10, 2023" "version 2.25.0" "USER COMMANDS"
.\" ============================================================================
.SH NAME
vsearch \(em a versatile open-source tool for microbiome analysis,
@@ -4830,6 +4830,10 @@ with clustering. Add more references.
Update documentation. Improve code. Allow up to 20 parents for the
undocumented and experimental chimeras_denovo command. Fix compilation
warnings for sha1.c. Compile for release (not debug) by default.
+.TP
+.BR v2.25.0\~ "released November 10th, 2023"
+Allow a given percentage of mismatches between chimeras and parents
+for the experimental chimeras_denovo command.
.LP
.\" ============================================================================
.\" TODO:
=====================================
src/chimera.cc
=====================================
@@ -139,6 +139,9 @@ struct chimera_info_s
int * smooth;
int * maxsmooth;
+ double * scan_p;
+ double * scan_q;
+
int parents_found;
int best_parents[maxparents];
int best_start[maxparents];
@@ -213,6 +216,11 @@ void realloc_arrays(struct chimera_info_s * ci)
ci->smooth = (int*) xrealloc(ci->smooth,
maxcandidates * maxqlen * sizeof(int));
+ ci->scan_p = (double *) xrealloc(ci->scan_p,
+ (maxqlen + 1) * sizeof(double));
+ ci->scan_q = (double *) xrealloc(ci->scan_q,
+ (maxqlen + 1) * sizeof(double));
+
int maxalnlen = maxqlen + 2 * db_getlongestsequence();
for (int f = 0; f < maxparents ; f++)
{
@@ -306,10 +314,75 @@ int compare_positions(const void * a, const void * b)
return 0;
}
+bool scan_matches(struct chimera_info_s * ci,
+ int * matches,
+ int len,
+ double percentage,
+ int * best_start,
+ int * best_len)
+{
+ /*
+ Scan matches array of zeros and ones, and find the longest subsequence
+ having a match fraction above or equal to the given percentage (e.g. 2%).
+ Based on an idea of finding the longest positive sum substring:
+ https://stackoverflow.com/questions/28356453/longest-positive-sum-substring
+ If the percentage is 2%, matches are given a score of 2 and mismatches -98.
+ */
+
+ double score_match = percentage;
+ double score_mismatch = percentage - 100.0;
+
+ double * p = ci->scan_p;
+ double * q = ci->scan_q;
+
+ p[0] = 0.0;
+ for (int i = 0; i < len; i++)
+ p[i + 1] = p[i] + (matches[i] ? score_match : score_mismatch);
+
+ q[len] = p[len];
+ for (int i = len - 1; i >= 0; i--)
+ q[i] = MAX(q[i + 1], p[i]);
+
+ int best_i = 0;
+ int best_d = -1;
+ double best_c = -1.0;
+ int i = 1;
+ int j = 1;
+ while (j <= len)
+ {
+ double c = q[j] - p[i - 1];
+ if (c >= 0.0)
+ {
+ int d = j - i + 1;
+ if (d > best_d)
+ {
+ best_i = i;
+ best_d = d;
+ best_c = c;
+ }
+ j += 1;
+ }
+ else
+ {
+ i += 1;
+ }
+ }
+
+ if (best_c >= 0.0)
+ {
+ * best_start = best_i - 1;
+ * best_len = best_d;
+ return true;
+ }
+ else
+ return false;
+}
+
int find_best_parents_long(struct chimera_info_s * ci)
{
- /* find parents with longest perfect match regions,
- excluding regions matched by previously identified parents */
+ /* Find parents with longest matching regions, without indels, allowing
+ a given percentage of mismatches (specified with --chimeras_diff_pct),
+ and excluding regions matched by previously identified parents. */
find_matches(ci);
@@ -342,29 +415,38 @@ int find_best_parents_long(struct chimera_info_s * ci)
{
int start = 0;
int len = 0;
-
- for (int j = 0; j < ci->query_len; j++)
+ int j = 0;
+ while (j < ci->query_len)
{
- if ((position_used[j] == false) &&
- (ci->match[i * ci->query_len + j] == 1) &&
- ((len == 0) || (ci->insert[i * ci->query_len + j] == 0)))
+ start = j;
+ len = 0;
+ while ((j < ci->query_len) &&
+ (! position_used[j]) &&
+ ((len == 0) || (ci->insert[i * ci->query_len + j] == 0)))
{
- if (len == 0)
- {
- start = j;
- }
len++;
- if (len > best_len)
- {
- best_cand = i;
- best_start = start;
- best_len = len;
- }
+ j++;
}
- else
+ if (len > best_len)
{
- len = 0;
+ int scan_best_start = 0;
+ int scan_best_len = 0;
+ if (scan_matches(ci,
+ ci->match + i*ci->query_len + start,
+ len,
+ opt_chimeras_diff_pct,
+ & scan_best_start,
+ & scan_best_len))
+ {
+ if (scan_best_len > best_len)
+ {
+ best_cand = i;
+ best_start = start + scan_best_start;
+ best_len = scan_best_len;
+ }
+ }
}
+ j++;
}
}
@@ -727,7 +809,7 @@ int eval_parents_long(struct chimera_info_s * ci)
unsigned int qsym = chrmap_4bit[(int)(ci->qaln[i])];
unsigned int psym[maxparents];
for (int f = 0; f < maxparents; f++)
- psym[f] = 0;
+ psym[f] = 0;
for (int f = 0; f < ci->parents_found; f++)
psym[f] = chrmap_4bit[(int)(ci->paln[f][i])];
@@ -748,13 +830,15 @@ int eval_parents_long(struct chimera_info_s * ci)
if (all_defined)
{
- bool parents_equal = true;
- for (int f = 1; f < ci->parents_found; f++)
- if (psym[f] != psym[0])
- parents_equal = false;
-
- if (! parents_equal)
- diff = ci->model[i];
+ int z = 0;
+ for (int f = 0; f < ci->parents_found; f++)
+ if (psym[f] == qsym)
+ {
+ diff = 'A' + f;
+ z++;
+ }
+ if (z > 1)
+ diff = ' ';
}
ci->diffs[i] = diff;
@@ -778,11 +862,11 @@ int eval_parents_long(struct chimera_info_s * ci)
char qsym = chrmap_4bit[(int)(ci->qaln[i])];
for(int f = 0; f < ci->parents_found; f++)
- {
- char psym = chrmap_4bit[(int)(ci->paln[f][i])];
- if (qsym == psym)
- match_QP[f]++;
- }
+ {
+ char psym = chrmap_4bit[(int)(ci->paln[f][i])];
+ if (qsym == psym)
+ match_QP[f]++;
+ }
}
@@ -1632,6 +1716,8 @@ void chimera_thread_init(struct chimera_info_s * ci)
ci->votes = nullptr;
ci->model = nullptr;
ci->ignore = nullptr;
+ ci->scan_p = nullptr;
+ ci->scan_q = nullptr;
for (int f = 0; f < maxparents; f++)
{
@@ -1716,6 +1802,14 @@ void chimera_thread_exit(struct chimera_info_s * ci)
{
xfree(ci->query_head);
}
+ if (ci->scan_p)
+ {
+ xfree(ci->scan_p);
+ }
+ if (ci->scan_q)
+ {
+ xfree(ci->scan_q);
+ }
for (int f = 0; f < maxparents; f++)
if (ci->paln[f])
=====================================
src/fasta.cc
=====================================
@@ -426,23 +426,23 @@ void fasta_print_general(FILE * fp,
if ((opt_eeout || opt_fastq_eeout) && (ee >= 0.0))
{
if (ee < 0.000000001)
- fprintf(fp, ";ee=%.13lf", ee);
+ fprintf(fp, ";ee=%.13lf", ee);
else if (ee < 0.00000001)
- fprintf(fp, ";ee=%.12lf", ee);
+ fprintf(fp, ";ee=%.12lf", ee);
else if (ee < 0.0000001)
- fprintf(fp, ";ee=%.11lf", ee);
+ fprintf(fp, ";ee=%.11lf", ee);
else if (ee < 0.000001)
- fprintf(fp, ";ee=%.10lf", ee);
+ fprintf(fp, ";ee=%.10lf", ee);
else if (ee < 0.00001)
- fprintf(fp, ";ee=%.9lf", ee);
+ fprintf(fp, ";ee=%.9lf", ee);
else if (ee < 0.0001)
- fprintf(fp, ";ee=%.8lf", ee);
+ fprintf(fp, ";ee=%.8lf", ee);
else if (ee < 0.001)
- fprintf(fp, ";ee=%.7lf", ee);
+ fprintf(fp, ";ee=%.7lf", ee);
else if (ee < 0.01)
- fprintf(fp, ";ee=%.6lf", ee);
+ fprintf(fp, ";ee=%.6lf", ee);
else if (ee < 0.1)
- fprintf(fp, ";ee=%.5lf", ee);
+ fprintf(fp, ";ee=%.5lf", ee);
else
fprintf(fp, ";ee=%.4lf", ee);
}
=====================================
src/searchexact.cc
=====================================
@@ -495,13 +495,13 @@ void search_exact_thread_run(int64_t t)
/* update stats */
queries++;
- queries_abundance += qsize;
+ queries_abundance += qsize;
if (match)
{
qmatches++;
- qmatches_abundance += qsize;
- }
+ qmatches_abundance += qsize;
+ }
/* show progress */
progress_update(progress);
@@ -876,9 +876,9 @@ void search_exact(char * cmdline, char * progheader)
}
fprintf(stderr, "\n");
if (opt_sizein)
- {
+ {
fprintf(stderr, "Matching total query sequences: %" PRIu64 " of %"
- PRIu64,
+ PRIu64,
qmatches_abundance, queries_abundance);
if (queries_abundance > 0)
{
=====================================
src/vsearch.cc
=====================================
@@ -194,6 +194,7 @@ char * opt_usearch_global;
char * opt_userout;
double * opt_ee_cutoffs_values;
double opt_abskew;
+double opt_chimeras_diff_pct;
double opt_dn;
double opt_fastq_maxdiffpct;
double opt_fastq_maxee;
@@ -755,6 +756,7 @@ void args_init(int argc, char **argv)
opt_centroids = nullptr;
opt_chimeras = nullptr;
opt_chimeras_denovo = nullptr;
+ opt_chimeras_diff_pct = 0.0;
opt_chimeras_length_min = 10;
opt_chimeras_parents_max = 3;
opt_chimeras_parts = 0;
@@ -1010,6 +1012,7 @@ void args_init(int argc, char **argv)
option_centroids,
option_chimeras,
option_chimeras_denovo,
+ option_chimeras_diff_pct,
option_chimeras_length_min,
option_chimeras_parents_max,
option_chimeras_parts,
@@ -1255,6 +1258,7 @@ void args_init(int argc, char **argv)
{"centroids", required_argument, nullptr, 0 },
{"chimeras", required_argument, nullptr, 0 },
{"chimeras_denovo", required_argument, nullptr, 0 },
+ {"chimeras_diff_pct", required_argument, nullptr, 0 },
{"chimeras_length_min", required_argument, nullptr, 0 },
{"chimeras_parents_max", required_argument, nullptr, 0 },
{"chimeras_parts", required_argument, nullptr, 0 },
@@ -2536,6 +2540,10 @@ void args_init(int argc, char **argv)
opt_chimeras_parents_max = args_getlong(optarg);
break;
+ case option_chimeras_diff_pct:
+ opt_chimeras_diff_pct = args_getdouble(optarg);
+ break;
+
default:
fatal("Internal error in option parsing");
}
@@ -2709,6 +2717,7 @@ void args_init(int argc, char **argv)
option_alignwidth,
option_alnout,
option_chimeras,
+ option_chimeras_diff_pct,
option_chimeras_length_min,
option_chimeras_parents_max,
option_chimeras_parts,
@@ -4782,6 +4791,11 @@ void args_init(int argc, char **argv)
fatal("The argument to chimeras_parents_max must be in the range 2 to %s.\n", maxparents_string);
}
+ if ((opt_chimeras_diff_pct < 0.0) || (opt_chimeras_diff_pct > 50.0))
+ {
+ fatal("The argument to chimeras_diff_pct must be in the range 0.0 to 50.0");
+ }
+
if (options_selected[option_chimeras_parts] &&
((opt_chimeras_parts < 2) || (opt_chimeras_parts > 100)))
{
@@ -4986,6 +5000,7 @@ void cmd_help()
" --chimeras_denovo FILENAME detect chimeras de novo in long exact sequences\n"
" Parameters\n"
" --abskew REAL minimum abundance ratio (1.0)\n"
+ " --chimeras_diff_pct mismatch %% allowed in each chimeric region (0.0)\n"
" --chimeras_length_min minimum length of each chimeric region (10)\n"
" --chimeras_parents_max maximum number of parent sequences (3)\n"
" --chimeras_parts number of parts to divide sequences (length/100)\n"
=====================================
src/vsearch.h
=====================================
@@ -393,6 +393,7 @@ extern char * opt_usearch_global;
extern char * opt_userout;
extern double * opt_ee_cutoffs_values;
extern double opt_abskew;
+extern double opt_chimeras_diff_pct;
extern double opt_dn;
extern double opt_fastq_maxdiffpct;
extern double opt_fastq_maxee;
View it on GitLab: https://salsa.debian.org/med-team/vsearch/-/commit/88399a1e38abf1a5ebff27623d60e6e59b8b54d5
--
View it on GitLab: https://salsa.debian.org/med-team/vsearch/-/commit/88399a1e38abf1a5ebff27623d60e6e59b8b54d5
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20231115/94b5f030/attachment-0001.htm>
More information about the debian-med-commit
mailing list