[med-svn] [Git][med-team/vsearch][upstream] New upstream version 2.15.1

Nilesh Patra gitlab at salsa.debian.org
Fri Oct 30 19:22:53 GMT 2020



Nilesh Patra pushed to branch upstream at Debian Med / vsearch


Commits:
b20dc72e by Nilesh Patra at 2020-10-31T00:46:13+05:30
New upstream version 2.15.1
- - - - -


9 changed files:

- README.md
- configure.ac
- man/vsearch.1
- src/arch.cc
- src/arch.h
- src/derep.cc
- src/dynlibs.cc
- src/filter.cc
- src/vsearch.cc


Changes:

=====================================
README.md
=====================================
@@ -34,7 +34,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo
 
 ## Getting Help
 
-If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
+If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
 
 ## Example
 
@@ -47,9 +47,9 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
 **Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:
 
 ```
-wget https://github.com/torognes/vsearch/archive/v2.15.0.tar.gz
-tar xzf v2.15.0.tar.gz
-cd vsearch-2.15.0
+wget https://github.com/torognes/vsearch/archive/v2.15.1.tar.gz
+tar xzf v2.15.1.tar.gz
+cd vsearch-2.15.1
 ./autogen.sh
 ./configure
 make
@@ -78,43 +78,43 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
 Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch-2.15.0-linux-x86_64.tar.gz
-tar xzf vsearch-2.15.0-linux-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch-2.15.1-linux-x86_64.tar.gz
+tar xzf vsearch-2.15.1-linux-x86_64.tar.gz
 ```
 
 Or these commands if you are using a Linux ppc64le system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch-2.15.0-linux-ppc64le.tar.gz
-tar xzf vsearch-2.15.0-linux-ppc64le.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch-2.15.1-linux-ppc64le.tar.gz
+tar xzf vsearch-2.15.1-linux-ppc64le.tar.gz
 ```
 
 Or these commands if you are using a Linux aarch64 system:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch-2.15.0-linux-aarch64.tar.gz
-tar xzf vsearch-2.15.0-linux-aarch64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch-2.15.1-linux-aarch64.tar.gz
+tar xzf vsearch-2.15.1-linux-aarch64.tar.gz
 ```
 
 Or these commands if you are using a Mac:
 
 ```sh
-wget https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch-2.15.0-macos-x86_64.tar.gz
-tar xzf vsearch-2.15.0-macos-x86_64.tar.gz
+wget https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch-2.15.1-macos-x86_64.tar.gz
+tar xzf vsearch-2.15.1-macos-x86_64.tar.gz
 ```
 
 Or if you are using Windows, download and extract (unzip) the contents of this file:
 
 ```
-https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch-2.15.0-win-x86_64.zip
+https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch-2.15.1-win-x86_64.zip
 ```
 
-Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.15.0-linux-x86_64` or `vsearch-2.15.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
+Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.15.1-linux-x86_64` or `vsearch-2.15.1-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
 
-Windows: You will now have the binary distribution in a folder called `vsearch-2.15.0-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
+Windows: You will now have the binary distribution in a folder called `vsearch-2.15.1-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
 
 
-**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.15.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
+**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.15.1/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
 
 
 ## Packages, plugins, and wrappers
@@ -156,9 +156,11 @@ When compiling VSEARCH the header files for the following two optional libraries
 * libz (zlib library) (zlib.h header file) (optional)
 * libbz2 (bzip2lib library) (bzlib.h header file) (optional)
 
+VSEARCH will automatically check whether these libraries are available and load them dynamically.
+
 On Windows these libraries are called zlib1.dll and bz2.dll.
 
-VSEARCH will automatically check whether these libraries are available and load them dynamically.
+Unfortunately, VSEARCH will not work properly with all the different variants of the `zlib1.dll` file on Windows. One that works well is provided by the MinGW-w64 project and is found in the `bin` folder within the [zlib-1.2.5-bin-x64.zip](https://sourceforge.net/projects/mingw-w64/files/External%20binary%20packages%20%28Win64%20hosted%29/Binaries%20%2864-bit%29/zlib-1.2.5-bin-x64.zip) archive available on SourceForge. The MD5 of the `zlib1.dll` file should be `0f67ee0b965d3d29388c238aebcf60bc`.
 
 To create the PDF file with the manual the ps2pdf tool is required. It is part of the ghostscript package.
 


=====================================
configure.ac
=====================================
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ([2.63])
-AC_INIT([vsearch], [2.15.0], [torognes at ifi.uio.no])
+AC_INIT([vsearch], [2.15.1], [torognes at ifi.uio.no])
 AC_CANONICAL_TARGET
 AM_INIT_AUTOMAKE([subdir-objects])
 AC_LANG([C++])


=====================================
man/vsearch.1
=====================================
@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH vsearch 1 "June 19, 2020" "version 2.15.0" "USER COMMANDS"
+.TH vsearch 1 "October 28, 2020" "version 2.15.1" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 vsearch \(em chimera detection, clustering, dereplication and
@@ -977,7 +977,7 @@ Label of the centroid sequence (H), or set to '*' (S, C).
 .TP
 .BI \-\-unoise_alpha\~ real
 Specify the alpha parameter to the \-\-cluster_unoise command. The
-default i 2.0.
+default is 2.0.
 .TAG usersort
 .TP
 .B \-\-usersort
@@ -1324,11 +1324,11 @@ command can be used to convert SFF files to FASTQ.
 .TAG eeout
 .TP 9
 .B \-\-eeout
-When using \-\-fastq_filter or \-\-fastq_mergepairs, include the
-number of expected errors (ee) in the sequence header of FASTQ and
-FASTA files. This option is a synonym of the \-\-fastq_eeout
-option. Use the \-\-xee option to remove this information from
-headers.
+When using \-\-fastq_filter, \-\-fastx_filter or \-\-fastq_mergepairs,
+include the number of expected errors (ee) in the sequence header of
+FASTQ and FASTA output files. This option is a synonym of the
+\-\-fastq_eeout option. Use the \-\-xee option to remove this
+information from headers.
 .TAG eetabbedout
 .TP
 .BI \-\-eetabbedout \0filename
@@ -1792,24 +1792,30 @@ detected. If the input consists of paired sequences, an input file
 with reverse reads may be specified with the \-\-reverse option, and
 corresponding output will be written to the files specified with the
 \-\-fastqout_rev, \-\-fastaout_rev, \-\-fastqout_discarded_rev, and
-\-\-fastaout_discarded_rev options. Output can not be written to FASTQ files
-if the input is in FASTA format. The sequences are first trimmed and
-then filtered based on the remaining bases. Sequences may be trimmed
-using the options \-\-fastq_stripleft, \-\-fastq_stripright,
+\-\-fastaout_discarded_rev options. Output can not be written to FASTQ
+files if the input is in FASTA format. The sequences are first trimmed
+and then filtered based on the remaining bases. Sequences may be
+trimmed using the options \-\-fastq_stripleft, \-\-fastq_stripright,
 \-\-fastq_truncee, \-\-fastq_trunclen, \-\-fastq_trunclen_keep and
-\-\-fastq_truncqual.  The sequences may be filtered using the options
+\-\-fastq_truncqual. The sequences may be filtered using the options
 \-\-fastq_maxee, \-\-fastq_maxee_rate, \-\-fastq_maxlen,
 \-\-fastq_maxns, \-\-fastq_minlen (default 1), \-\-fastq_trunclen,
 \-\-maxsize, and \-\-minsize. Sequences not satisfying the
 requirements are discarded. For pairs of sequences, both sequences in
-a pair must satisfy the requirements, otherwise both are
-discarded. If no shortening or filtering options are given, all
-sequences are written to the output files, possibly after conversion
-from FASTQ to FASTA format. The \-\-relabel option may be used to
-relabel the output sequences. The \-\-eeout option may be used to output the
-expected number of errors in each sequence. After all sequences have
-been processed, the number of kept and discarded sequences will be
-shown, as well as how many of the kept sequences were trimmed.
+a pair must satisfy the requirements, otherwise both are discarded. If
+no shortening or filtering options are given, all sequences are
+written to the output files, possibly after conversion from FASTQ to
+FASTA format. The \-\-relabel option may be used to relabel the output
+sequences. The \-\-eeout option may be used to output the expected
+number of errors in each sequence. After all sequences have been
+processed, the number of kept and discarded sequences will be shown,
+as well as how many of the kept sequences were trimmed. When the input
+is in FASTA format, the following options are not accepted because
+quality scores are not available: \-\-eeout, \-\-fastq_ascii,
+\-\-fastq_eeout, \-\-fastq_maxee, \-\-fastq_maxee_rate, \-\-fastq_out,
+\-\-fastq_qmax, \-\-fastq_qmin, \-\-fastq_truncee,
+\-\-fastq_truncqual, \-\-fastqout_discarded,
+\-\-fastqout_discarded_rev, \-\-fastqout_rev.
 .TAG fastx_revcomp
 .TP
 .BI \-\-fastx_revcomp \0filename
@@ -4285,6 +4291,13 @@ error messages when parsing FASTQ files. Add missing fastq_qminout
 option and fix label_suffix option for fastq_mergepairs. Add derep_id
 command that dereplicates based on both label and sequence. Remove
 compilation warnings.
+.TP
+.BR v2.15.1\~ "released October 28th, 2020"
+Fix for dereplication when including reverse complement sequences and
+headers. Make some extra checks when loading compression libraries and
+add more diagnostic output about them to the output of the version
+command. Report an error when fastx_filter is used with FASTA input
+and options that require FASTQ input. Update manual.
 .LP
 .\" ============================================================================
 .\" TODO:


=====================================
src/arch.cc
=====================================
@@ -307,3 +307,16 @@ const char * xstrcasestr(const char * haystack, const char * needle)
   return strcasestr(haystack, needle);
 #endif
 }
+
+#ifdef _WIN32
+FARPROC arch_dlsym(HMODULE handle, const char * symbol)
+#else
+void * arch_dlsym(void * handle, const char * symbol)
+#endif
+{
+#ifdef _WIN32
+  return GetProcAddress(handle, symbol);
+#else
+  return dlsym(handle, symbol);
+#endif
+}


=====================================
src/arch.h
=====================================
@@ -83,3 +83,9 @@ int xopen_read(const char * path);
 int xopen_write(const char * path);
 
 const char * xstrcasestr(const char * haystack, const char * needle);
+
+#ifdef _WIN32
+FARPROC arch_dlsym(HMODULE handle, const char * symbol);
+#else
+void * arch_dlsym(void * handle, const char * symbol);
+#endif


=====================================
src/derep.cc
=====================================
@@ -387,9 +387,13 @@ void derep(char * input_filename, bool use_header)
         collision when the number of sequences is about 5e9.
       */
 
-      uint64_t hash = HASH(seq_up, seqlen);
+      uint64_t hash_header;
       if (use_header)
-        hash ^= HASH(header, headerlen);
+        hash_header = HASH(header, headerlen);
+      else
+        hash_header = 0;
+
+      uint64_t hash = HASH(seq_up, seqlen) ^ hash_header;
       uint64_t j = hash & hash_mask;
       struct bucket * bp = hashtable + j;
 
@@ -408,7 +412,7 @@ void derep(char * input_filename, bool use_header)
           /* no match on plus strand */
           /* check minus strand as well */
 
-          uint64_t rc_hash = HASH(rc_seq_up, seqlen);
+          uint64_t rc_hash = HASH(rc_seq_up, seqlen) ^ hash_header;
           uint64_t k = rc_hash & hash_mask;
           struct bucket * rc_bp = hashtable + k;
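Reader's note on the derep.cc hunk: stored entries are keyed by `HASH(sequence) ^ HASH(header)` when `use_header` is set, so a reverse-complement probe can only land on the same key if the header hash is mixed in as well. The sketch below illustrates that property. It is a minimal illustration, not vsearch code: `toy_hash` (64-bit FNV-1a), `reverse_complement` and `derep_key` are hypothetical stand-ins for the `HASH` macro and the surrounding logic.

```cpp
#include <cstdint>
#include <string>

// Stand-in for the HASH macro in the diff (assumption: any 64-bit hash
// works for the illustration; this one is FNV-1a, not vsearch's hash).
inline std::uint64_t toy_hash(const std::string & s)
{
  std::uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s)
    {
      h ^= c;
      h *= 1099511628211ULL;
    }
  return h;
}

// Reverse complement of an upper-case DNA string (A, C, G, T only;
// other characters are left unchanged).
inline std::string reverse_complement(const std::string & seq)
{
  std::string rc(seq.rbegin(), seq.rend());
  for (char & c : rc)
    {
      switch (c)
        {
        case 'A': c = 'T'; break;
        case 'C': c = 'G'; break;
        case 'G': c = 'C'; break;
        case 'T': c = 'A'; break;
        default: break;
        }
    }
  return rc;
}

// Key as computed after the fix: the header hash is XORed in for
// whichever strand is hashed, and is zero when headers are ignored.
inline std::uint64_t derep_key(const std::string & seq,
                               const std::string & header,
                               bool use_header)
{
  const std::uint64_t hash_header = use_header ? toy_hash(header) : 0;
  return toy_hash(seq) ^ hash_header;
}
```

With these helpers, `derep_key(reverse_complement("CGTT"), "id1", true)` equals the key stored for sequence `AACG` with header `id1`, whereas hashing the reverse complement alone (the pre-2.15.1 behaviour shown on the removed line) would not match it.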
 


=====================================
src/dynlibs.cc
=====================================
@@ -72,13 +72,11 @@ const char gz_libname[] = "libz.so";
 #  endif
 void * gz_lib;
 # endif
-gzFile (*gzdopen_p)(int, const char *);
-int (*gzclose_p)(gzFile);
-int (*gzread_p)(gzFile, void *, unsigned);
-int (*gzgetc_p)(gzFile);
-int (*gzrewind_p)(gzFile);
-int (*gzungetc_p)(int, gzFile);
-const char * (*gzerror_p)(gzFile, int*);
+
+gzFile ZEXPORT (*gzdopen_p) OF((int, const char *));
+int ZEXPORT (*gzclose_p) OF((gzFile));
+int ZEXPORT (*gzread_p) OF((gzFile, void *, unsigned));
+
 #endif
 
 #ifdef HAVE_BZLIB_H
@@ -98,19 +96,6 @@ void (*BZ2_bzReadClose_p)(int*, BZFILE*);
 int (*BZ2_bzRead_p)(int*, BZFILE*, void*, int);
 #endif
 
-#ifdef _WIN32
-FARPROC arch_dlsym(HMODULE handle, const char * symbol)
-#else
-void * arch_dlsym(void * handle, const char * symbol)
-#endif
-{
-#ifdef _WIN32
-  return GetProcAddress(handle, symbol);
-#else
-  return dlsym(handle, symbol);
-#endif
-}
-
 void dynlibs_open()
 {
 #ifdef HAVE_ZLIB_H
@@ -124,10 +109,8 @@ void dynlibs_open()
       gzdopen_p = (gzFile (*)(int, const char*)) arch_dlsym(gz_lib, "gzdopen");
       gzclose_p = (int (*)(gzFile)) arch_dlsym(gz_lib, "gzclose");
       gzread_p = (int (*)(gzFile, void*, unsigned)) arch_dlsym(gz_lib, "gzread");
-      gzgetc_p = (int (*)(gzFile)) arch_dlsym(gz_lib, "gzgetc");
-      gzrewind_p = (int (*)(gzFile)) arch_dlsym(gz_lib, "gzrewind");
-      gzerror_p = (const char * (*)(gzFile, int*)) arch_dlsym(gz_lib, "gzerror");
-      gzungetc_p = (int (*)(int, gzFile)) arch_dlsym(gz_lib, "gzungetc");
+      if (!(gzdopen_p && gzclose_p && gzread_p))
+        fatal("Invalid compression library (zlib)");
     }
 #endif
 
@@ -145,6 +128,8 @@ void dynlibs_open()
         arch_dlsym(bz2_lib, "BZ2_bzReadClose");
       BZ2_bzRead_p = (int (*)(int*, BZFILE*, void*, int))
         arch_dlsym(bz2_lib, "BZ2_bzRead");
+      if (!(BZ2_bzReadOpen_p && BZ2_bzReadClose_p && BZ2_bzRead_p))
+        fatal("Invalid compression library (bz2)");
     }
 #endif
 }
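Reader's note on the dynlibs.cc hunk: the new checks abort as soon as any required symbol fails to resolve, instead of leaving a null function pointer to be called later. A variation on that idea, sketched under assumptions, reports which symbol is missing. `first_missing_symbol` and the resolver callback are hypothetical names, not part of vsearch; in real use the callback would wrap `arch_dlsym` (or `dlsym` / `GetProcAddress`) on an open library handle.

```cpp
#include <functional>
#include <string>
#include <vector>

// Returns the name of the first required symbol that the resolver
// cannot find, or an empty string when every symbol resolves.
// The resolver is any callable mapping a symbol name to an address.
inline std::string first_missing_symbol(
    const std::function<void * (const char *)> & resolve,
    const std::vector<std::string> & required)
{
  for (const std::string & name : required)
    {
      if (resolve(name.c_str()) == nullptr)
        return name;
    }
  return std::string();
}
```

A caller could pass `{"gzdopen", "gzclose", "gzread"}` as the required list and include the returned name in the fatal error message, which would make a mismatched `zlib1.dll` easier to diagnose.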


=====================================
src/filter.cc
=====================================
@@ -223,11 +223,29 @@ void filter(bool fastq_only, char * filename)
   if (!h1)
     fatal("Unrecognized file type (not proper FASTA or FASTQ format)");
 
-  if (fastq_only && ! h1->is_fastq)
-    fatal("FASTA input files not allowed with fastq_filter, consider using fastx_filter command instead");
-
-  if ((opt_fastqout || opt_fastqout_discarded) && ! h1->is_fastq)
-    fatal("Cannot write FASTQ output with FASTA input file (no quality scores)");
+  if (! h1->is_fastq)
+    {
+      if (fastq_only)
+        {
+          fatal("FASTA input files not allowed with fastq_filter, consider using fastx_filter command instead");
+        }
+      else if (opt_eeout ||
+               (opt_fastq_ascii != 33) ||
+               opt_fastq_eeout ||
+               (opt_fastq_maxee < DBL_MAX) ||
+               (opt_fastq_maxee_rate < DBL_MAX) ||
+               opt_fastqout ||
+               (opt_fastq_qmax < 41) ||
+               (opt_fastq_qmin > 0) ||
+               (opt_fastq_truncee < DBL_MAX) ||
+               (opt_fastq_truncqual < LONG_MIN) ||
+               opt_fastqout_discarded ||
+               opt_fastqout_discarded_rev ||
+               opt_fastqout_rev)
+        {
+          fatal("The following options are not accepted with the fastx_filter command when the input is a FASTA file, because quality scores are not available: eeout, fastq_ascii, fastq_eeout, fastq_maxee, fastq_maxee_rate, fastq_out, fastq_qmax, fastq_qmin, fastq_truncee, fastq_truncqual,  fastqout_discarded, fastqout_discarded_rev, fastqout_rev");
+        }
+    }
 
   uint64_t filesize = fastx_get_size(h1);
 
@@ -238,11 +256,32 @@ void filter(bool fastq_only, char * filename)
       if (!h2)
         fatal("Unrecognized file type (not proper FASTA or FASTQ format) for reverse reads");
 
-      if (fastq_only && ! h2->is_fastq)
-        fatal("FASTA input files not allowed with fastq_filter, consider using fastx_filter command instead");
+      if (h1->is_fastq != h2->is_fastq)
+        fatal("The forward and reverse input sequence must in the same format, either FASTA or FASTQ");
 
-      if ((opt_fastqout_rev || opt_fastqout_discarded_rev) && ! h2->is_fastq)
-        fatal("Cannot write FASTQ output with a FASTA input file, lacking quality scores");
+      if (! h2->is_fastq)
+        {
+          if (fastq_only)
+            {
+              fatal("FASTA input files not allowed with fastq_filter, consider using fastx_filter command instead");
+            }
+          else if (opt_eeout ||
+                   (opt_fastq_ascii != 33) ||
+                   opt_fastq_eeout ||
+                   (opt_fastq_maxee < DBL_MAX) ||
+                   (opt_fastq_maxee_rate < DBL_MAX) ||
+                   opt_fastqout ||
+                   (opt_fastq_qmax < 41) ||
+                   (opt_fastq_qmin > 0) ||
+                   (opt_fastq_truncee < DBL_MAX) ||
+                   (opt_fastq_truncqual < LONG_MIN) ||
+                   opt_fastqout_discarded ||
+                   opt_fastqout_discarded_rev ||
+                   opt_fastqout_rev)
+            {
+              fatal("The following options are not accepted with the fastx_filter command when the input is a FASTA file, because quality scores are not available: eeout, fastq_ascii, fastq_eeout, fastq_maxee, fastq_maxee_rate, fastq_out, fastq_qmax, fastq_qmin, fastq_truncee, fastq_truncqual,  fastqout_discarded, fastqout_discarded_rev, fastqout_rev");
+            }
+        }
     }
 
   FILE * fp_fastaout = 0;
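Reader's note on the filter.cc hunks: the same thirteen-term condition appears twice, once for the forward and once for the reverse file. It could be factored into a single predicate, sketched below. `FilterOptions` and `needs_quality_scores` are hypothetical; the field types and defaults are assumptions inferred from the comparisons in the diff, not taken from the vsearch sources. One deliberate difference: the diff tests `opt_fastq_truncqual < LONG_MIN`, which can never be true if that variable is a `long`, so the sketch assumes the intent was "differs from the default" and uses `!=`.

```cpp
#include <cfloat>
#include <climits>

// Hypothetical bundle of the option globals referenced in the diff,
// initialised to the defaults implied by its comparisons.
struct FilterOptions
{
  bool eeout = false;
  bool fastq_eeout = false;
  long fastq_ascii = 33;
  double fastq_maxee = DBL_MAX;
  double fastq_maxee_rate = DBL_MAX;
  long fastq_qmax = 41;
  long fastq_qmin = 0;
  double fastq_truncee = DBL_MAX;
  long fastq_truncqual = LONG_MIN;
  bool fastqout = false;
  bool fastqout_discarded = false;
  bool fastqout_discarded_rev = false;
  bool fastqout_rev = false;
};

// True when any option that depends on quality scores has been set,
// i.e. when FASTA input must be rejected by fastx_filter.
inline bool needs_quality_scores(const FilterOptions & o)
{
  return o.eeout
    || o.fastq_eeout
    || (o.fastq_ascii != 33)
    || (o.fastq_maxee < DBL_MAX)
    || (o.fastq_maxee_rate < DBL_MAX)
    || (o.fastq_qmax < 41)
    || (o.fastq_qmin > 0)
    || (o.fastq_truncee < DBL_MAX)
    || (o.fastq_truncqual != LONG_MIN)  // assumption: see note above
    || o.fastqout
    || o.fastqout_discarded
    || o.fastqout_discarded_rev
    || o.fastqout_rev;
}
```

Both the forward and the reverse branch could then call the predicate once, keeping the two checks and their error message from drifting apart.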


=====================================
src/vsearch.cc
=====================================
@@ -4192,7 +4192,23 @@ void cmd_version()
 #ifdef HAVE_ZLIB_H
       printf("Compiled with support for gzip-compressed files,");
       if (gz_lib)
-        printf(" and the library is loaded.\n");
+        {
+          printf(" and the library is loaded.\n");
+
+          char * (*zlibVersion_p)();
+          zlibVersion_p = (char * (*)()) arch_dlsym(gz_lib,
+                                                    "zlibVersion");
+          char * gz_version = (*zlibVersion_p)();
+          uLong (*zlibCompileFlags_p)(void);
+          zlibCompileFlags_p = (uLong (*)()) arch_dlsym(gz_lib,
+                                                        "zlibCompileFlags");
+          uLong flags = (*zlibCompileFlags_p)();
+
+          printf("zlib version %s, compile flags %lx", gz_version, flags);
+          if (flags & 0x0400)
+            printf(" (ZLIB_WINAPI)");
+          printf("\n");
+        }
       else
         printf(" but the library was not found.\n");
 #else
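Reader's note on the vsearch.cc hunk: the new diagnostic prints the zlib version string and the value of `zlibCompileFlags()`, whose bit 10 (`0x0400`) indicates a library built with `ZLIB_WINAPI`. As far as this hunk shows, the two pointers returned by `arch_dlsym` are called without a null check. The sketch below separates the formatting from the symbol lookup so that a caller can substitute a placeholder when a lookup fails. `zlib_uses_winapi` and `describe_zlib` are hypothetical helpers, not vsearch functions.

```cpp
#include <sstream>
#include <string>

// Bit 10 of zlibCompileFlags() is set when zlib was built with ZLIB_WINAPI.
inline bool zlib_uses_winapi(unsigned long flags)
{
  return (flags & 0x0400UL) != 0;
}

// Builds the diagnostic line in the same shape as the printf calls in
// the diff. A null version pointer is reported as "unknown".
inline std::string describe_zlib(const char * version, unsigned long flags)
{
  std::ostringstream out;
  out << "zlib version " << (version ? version : "unknown")
      << ", compile flags " << std::hex << flags;
  if (zlib_uses_winapi(flags))
    out << " (ZLIB_WINAPI)";
  return out.str();
}
```

For example, `describe_zlib("1.2.11", 0x4a9)` yields `zlib version 1.2.11, compile flags 4a9 (ZLIB_WINAPI)`.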



View it on GitLab: https://salsa.debian.org/med-team/vsearch/-/commit/b20dc72e7235e2d59640fc950802d4114ac95387


