[med-svn] [Git][med-team/nanofilt][upstream] New upstream version 2.8.0

Sat Dec 31 12:32:56 GMT 2022


Nilesh Patra pushed to branch upstream at Debian Med / nanofilt


Commits:
d3f0260f by Nilesh Patra at 2022-12-31T17:54:10+05:30
New upstream version 2.8.0
- - - - -


6 changed files:

- − .travis.yml
- README.rst
- nanofilt/NanoFilt.py
- nanofilt/utils.py
- nanofilt/version.py
- setup.py


Changes:

=====================================
.travis.yml deleted
=====================================
@@ -1,24 +0,0 @@
-language: python
-
-python:
-  - "3.5"
-  - "3.5-dev"
-  - "3.6"
-  - "3.6-dev"
-
-before_install:
-  - cp README.md README.rst
-  - pip install flake8
-
-install:
-  - pip install -e .
-
-script:
-  - bash scripts/test.sh
-  - flake8 nanofilt/NanoFilt.py
-
-notifications:
-  email: false
-  webhooks:
-    urls:
-        - https://webhooks.gitter.im/e/4b1c45cea6826ce475c2


=====================================
README.rst
=====================================
@@ -3,11 +3,12 @@ Nanofilt
 
 Filtering and trimming of long read sequencing data.
 
-|Twitter URL| |conda badge| |Build Status| |Code Health|
+|Twitter URL| |conda badge| |Build Status|
 
 | Filtering on quality and/or read length, and optional trimming after
   passing filters.
-| Reads from stdin, writes to stdout.
+| Reads from stdin, writes to stdout. Optionally reads directly from an
+  uncompressed file specified on the command line.
 
 | Intended to be used:
 | - directly after fastq extraction
@@ -21,8 +22,8 @@ Filtering and trimming of long read sequencing data.
   between calculated read quality and the quality as summarized by
   albacore this script takes since v1.1.0 optionally also a
   ``--summary`` argument. Using this argument with the
-  sequencing\_summary.txt file from albacore will do the filtering using
-  the quality scores from the summary. It's also faster.
+  sequencing_summary.txt file from albacore will do the filtering using
+  the quality scores from the summary. It’s also faster.
 
 INSTALLATION AND UPGRADING:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -41,28 +42,45 @@ USAGE:
 
 ::
 
-    NanoFilt [-h] [-q QUALITY] [-l LENGTH] [--headcrop HEADCROP] [--tailcrop TAILCROP]
-
-    optional arguments:  
-      -h, --help            show this help message and exit  
-      -s --summary SUMMARYFILE optional, the sequencing_summary file from albacore for extracting quality scores
-      -q, --quality QUALITY  Filter on a minimum average read quality score  
-      -l, --length LENGTH Filter on a minimum read length  
-      --headcrop HEADCROP   Trim n nucleotides from start of read  
-      --tailcrop TAILCROP   Trim n nucleotides from end of read
-      --minGC MINGC         Sequences must have GC content >= to this. Float
-                            between 0.0 and 1.0. Ignored if using summary file.
-      --maxGC MAXGC         Sequences must have GC content <= to this. Float
-                            between 0.0 and 1.0. Ignored if using summary file.
+   NanoFilt [-h] [-v] [--logfile LOGFILE] [-l LENGTH]
+                   [--maxlength MAXLENGTH] [-q QUALITY] [--minGC MINGC]
+                   [--maxGC MAXGC] [--headcrop HEADCROP] [--tailcrop TAILCROP]
+                   [-s SUMMARY] [--readtype {1D,2D,1D2}]
+                   [input]
+
+   Perform quality and/or length and/or GC filtering of (long read) fastq data. Reads on stdin.
+
+   General options:
+     -h, --help            show the help and exit
+     -v, --version         Print version and exit.
+     --logfile LOGFILE     Specify the path and filename for the log file.
+     input                 input, uncompressed fastq file (optional)
+
+   Options for filtering reads on.:
+     -l, --length LENGTH   Filter on a minimum read length
+     --maxlength MAXLENGTH Filter on a maximum read length
+     -q, --quality QUALITY Filter on a minimum average read quality score
+     --minGC MINGC         Sequences must have GC content >= to this. Float between 0.0 and 1.0. Ignored if
+                           using summary file.
+     --maxGC MAXGC         Sequences must have GC content <= to this. Float between 0.0 and 1.0. Ignored if
+                           using summary file.
+
+   Options for trimming reads.:
+     --headcrop HEADCROP   Trim n nucleotides from start of read
+     --tailcrop TAILCROP   Trim n nucleotides from end of read
+
+   Input options.:
+     -s, --summary SUMMARY Use albacore or guppy summary file for quality scores
+     --readtype            Which read type to extract information about from summary. Options are 1D, 2D or 1D2
 
 EXAMPLES
 ~~~~~~~~
 
 .. code:: bash
 
-    gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
-    gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
-    gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz
+   gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
+   gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
+   gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz
 
 I welcome all suggestions, bug reports, feature requests and
 contributions. Please leave an
@@ -70,11 +88,15 @@ contributions. Please leave an
 request. I will usually respond within a day, or rarely within a few
 days.
 
+CITATION
+--------
+
+If you use this tool, please consider citing our
+`publication <https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty149/4934939>`__.
+
 .. |Twitter URL| image:: https://img.shields.io/twitter/url/https/twitter.com/wouter_decoster.svg?style=social&label=Follow%20%40wouter_decoster
    :target: https://twitter.com/wouter_decoster
 .. |conda badge| image:: https://anaconda.org/bioconda/nanofilt/badges/installer/conda.svg
    :target: https://anaconda.org/bioconda/nanofilt
 .. |Build Status| image:: https://travis-ci.org/wdecoster/nanofilt.svg?branch=master
    :target: https://travis-ci.org/wdecoster/nanofilt
-.. |Code Health| image:: https://landscape.io/github/wdecoster/nanofilt/master/landscape.svg?style=flat
-   :target: https://landscape.io/github/wdecoster/nanofilt/master


=====================================
nanofilt/NanoFilt.py
=====================================
@@ -28,14 +28,18 @@ import pandas as pd
 import nanofilt.utils as utils
 import logging
 from math import log
+from nanofilt.version import __version__
 
 
 def main():
     args = utils.get_args()
     utils.start_logging(args.logfile)
+    logging.info('NanoFilt {} started with arguments {}'.format(__version__, args))
     try:
         if args.tailcrop:
             args.tailcrop = -args.tailcrop
+        if args.tailcrop == 0:
+            args.tailcrop = None
         if args.summary:
             filter_using_summary(args.input, args)
         else:
@@ -94,15 +98,17 @@ def filter_using_summary(fq, args):
     ).rename(mapper={"mean_qscore_template": "quals", "mean_qscore_2d": "quals"}, axis="columns") \
         .set_index("read_id") \
         .to_dict()["quals"]
-    try:
-        for rec in SeqIO.parse(fq, "fastq"):
+    for rec in SeqIO.parse(fq, "fastq"):
+        try:
             if data[rec.id] >= args.quality and args.length <= len(rec) <= args.maxlength:
                 print(rec[args.headcrop:args.tailcrop].format("fastq"), end="")
-    except KeyError:
-        logging.error("mismatch between summary and fastq: \
-                       {} was not found in the summary file.".format(rec.id))
-        sys.exit('\nERROR: mismatch between sequencing_summary and fastq file: \
-                 {} was not found in the summary file.\nQuitting.'.format(rec.id))
+        except KeyError:
+            logging.warning("mismatch between summary and fastq: \
+                   {} was not found in the summary file. \
+                   Falling back to calculating.".format(rec.id))
+            if ave_qual(rec.letter_annotations["phred_quality"]) >= args.quality \
+                    and args.length <= len(rec) <= args.maxlength:
+                print(rec[args.headcrop:args.tailcrop].format("fastq"), end="")
 
 
 def errs_tab(n):
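
Note on the tailcrop change in main() above: the patch negates a non-zero tailcrop and maps a tailcrop of 0 to None before the read is sliced with rec[args.headcrop:args.tailcrop]. The likely motivation is that a slice ending at 0 is empty, so a tailcrop of 0 would otherwise discard the whole read. A minimal sketch of that behaviour on a plain string follows; the helper name crop is hypothetical and not part of NanoFilt.

```python
def crop(seq, headcrop=None, tailcrop=None):
    """Hypothetical helper mirroring the slicing logic in the patch."""
    if tailcrop:
        # A positive tailcrop becomes a negative slice end.
        tailcrop = -tailcrop
    if tailcrop == 0:
        # seq[x:0] is empty, so "trim nothing" has to be expressed as None.
        tailcrop = None
    return seq[headcrop:tailcrop]


# Without the guard, a slice end of 0 returns nothing at all.
unguarded = "ACGTACGT"[0:0]
guarded = crop("ACGTACGT", headcrop=0, tailcrop=0)
trimmed = crop("ACGTACGT", headcrop=2, tailcrop=3)
```

Here unguarded is the empty string, guarded is the full sequence, and trimmed drops two bases from the start and three from the end.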

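Note on the filter_using_summary() change above: the try/except now sits inside the loop, so a read missing from the summary file no longer aborts the run. It logs a warning and the quality is calculated from the read itself. The sketch below illustrates that per-read fallback under stated assumptions: mean_quality and passes are hypothetical names, and the averaging of error probabilities is an assumed stand-in for NanoFilt's ave_qual, whose body is not shown in this diff.

```python
from math import log10


def mean_quality(phred_scores):
    """Assumed stand-in for ave_qual: average the error probabilities,
    then convert the mean back to a Phred score."""
    probs = [10 ** (q / -10) for q in phred_scores]
    return -10 * log10(sum(probs) / len(probs))


def passes(read_id, phred_scores, summary_quals, min_quality):
    """Hypothetical per-read check with the fallback from the patch."""
    try:
        quality = summary_quals[read_id]
    except KeyError:
        # Read absent from the summary: calculate instead of exiting.
        quality = mean_quality(phred_scores)
    return quality >= min_quality


summary = {"read1": 12.0}
in_summary = passes("read1", [5, 5], summary, min_quality=10)
fallback_low = passes("read2", [5, 5], summary, min_quality=10)
fallback_high = passes("read2", [20, 20], summary, min_quality=10)
```

With the try/except placed per read, only the lookup for the missing read takes the fallback path; every other read still uses the summary value.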

=====================================
nanofilt/utils.py
=====================================
@@ -84,7 +84,6 @@ def get_args():
         args.GC_filter = False
     else:
         args.GC_filter = True
-    logging.info('NanoFilt {} started with arguments {}'.format(__version__, args))
     return args
 
 


=====================================
nanofilt/version.py
=====================================
@@ -1 +1 @@
-__version__ = "2.6.0"
+__version__ = "2.8.0"


=====================================
setup.py
=====================================
@@ -17,7 +17,7 @@ setup(
     url='https://github.com/wdecoster/nanofilt',
     author='Wouter De Coster',
     author_email='decosterwouter at gmail.com',
-    license='MIT',
+    license='GPLv3',
     classifiers=[
         'Development Status :: 4 - Beta',
         'Intended Audience :: Science/Research',



View it on GitLab: https://salsa.debian.org/med-team/nanofilt/-/commit/d3f0260f62a4e91f00662e087214755c7498d4a9
