[med-svn] [Git][med-team/nanofilt][upstream] New upstream version 2.8.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Sat Dec 31 12:32:56 GMT 2022
Nilesh Patra pushed to branch upstream at Debian Med / nanofilt
Commits:
d3f0260f by Nilesh Patra at 2022-12-31T17:54:10+05:30
New upstream version 2.8.0
- - - - -
6 changed files:
- − .travis.yml
- README.rst
- nanofilt/NanoFilt.py
- nanofilt/utils.py
- nanofilt/version.py
- setup.py
Changes:
=====================================
.travis.yml deleted
=====================================
@@ -1,24 +0,0 @@
-language: python
-
-python:
- - "3.5"
- - "3.5-dev"
- - "3.6"
- - "3.6-dev"
-
-before_install:
- - cp README.md README.rst
- - pip install flake8
-
-install:
- - pip install -e .
-
-script:
- - bash scripts/test.sh
- - flake8 nanofilt/NanoFilt.py
-
-notifications:
- email: false
- webhooks:
- urls:
- - https://webhooks.gitter.im/e/4b1c45cea6826ce475c2
=====================================
README.rst
=====================================
@@ -3,11 +3,12 @@ Nanofilt
Filtering and trimming of long read sequencing data.
-|Twitter URL| |conda badge| |Build Status| |Code Health|
+|Twitter URL| |conda badge| |Build Status|
| Filtering on quality and/or read length, and optional trimming after
passing filters.
-| Reads from stdin, writes to stdout.
+| Reads from stdin, writes to stdout. Optionally reads directly from an
+ uncompressed file specified on the command line.
| Intended to be used:
| - directly after fastq extraction
@@ -21,8 +22,8 @@ Filtering and trimming of long read sequencing data.
between calculated read quality and the quality as summarized by
albacore this script takes since v1.1.0 optionally also a
``--summary`` argument. Using this argument with the
- sequencing\_summary.txt file from albacore will do the filtering using
- the quality scores from the summary. It's also faster.
+ sequencing_summary.txt file from albacore will do the filtering using
+ the quality scores from the summary. It’s also faster.
INSTALLATION AND UPGRADING:
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -41,28 +42,45 @@ USAGE:
::
- NanoFilt [-h] [-q QUALITY] [-l LENGTH] [--headcrop HEADCROP] [--tailcrop TAILCROP]
-
- optional arguments:
- -h, --help show this help message and exit
- -s --summary SUMMARYFILE optional, the sequencing_summary file from albacore for extracting quality scores
- -q, --quality QUALITY Filter on a minimum average read quality score
- -l, --length LENGTH Filter on a minimum read length
- --headcrop HEADCROP Trim n nucleotides from start of read
- --tailcrop TAILCROP Trim n nucleotides from end of read
- --minGC MINGC Sequences must have GC content >= to this. Float
- between 0.0 and 1.0. Ignored if using summary file.
- --maxGC MAXGC Sequences must have GC content <= to this. Float
- between 0.0 and 1.0. Ignored if using summary file.
+ NanoFilt [-h] [-v] [--logfile LOGFILE] [-l LENGTH]
+ [--maxlength MAXLENGTH] [-q QUALITY] [--minGC MINGC]
+ [--maxGC MAXGC] [--headcrop HEADCROP] [--tailcrop TAILCROP]
+ [-s SUMMARY] [--readtype {1D,2D,1D2}]
+ [input]
+
+ Perform quality and/or length and/or GC filtering of (long read) fastq data. Reads on stdin.
+
+ General options:
+ -h, --help show the help and exit
+ -v, --version Print version and exit.
+ --logfile LOGFILE Specify the path and filename for the log file.
+ input input, uncompressed fastq file (optional)
+
+ Options for filtering reads on.:
+ -l, --length LENGTH Filter on a minimum read length
+ --maxlength MAXLENGTH Filter on a maximum read length
+ -q, --quality QUALITY Filter on a minimum average read quality score
+ --minGC MINGC Sequences must have GC content >= to this. Float between 0.0 and 1.0. Ignored if
+ using summary file.
+ --maxGC MAXGC Sequences must have GC content <= to this. Float between 0.0 and 1.0. Ignored if
+ using summary file.
+
+ Options for trimming reads.:
+ --headcrop HEADCROP Trim n nucleotides from start of read
+ --tailcrop TAILCROP Trim n nucleotides from end of read
+
+ Input options.:
+ -s, --summary SUMMARY Use albacore or guppy summary file for quality scores
+ --readtype Which read type to extract information about from summary. Options are 1D, 2D or 1D2
EXAMPLES
~~~~~~~~
.. code:: bash
- gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
- gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
- gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz
+ gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
+ gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
+ gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz
I welcome all suggestions, bug reports, feature requests and
contributions. Please leave an
@@ -70,11 +88,15 @@ contributions. Please leave an
request. I will usually respond within a day, or rarely within a few
days.
+CITATION
+--------
+
+If you use this tool, please consider citing our
+`publication <https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty149/4934939>`__.
+
.. |Twitter URL| image:: https://img.shields.io/twitter/url/https/twitter.com/wouter_decoster.svg?style=social&label=Follow%20%40wouter_decoster
:target: https://twitter.com/wouter_decoster
.. |conda badge| image:: https://anaconda.org/bioconda/nanofilt/badges/installer/conda.svg
:target: https://anaconda.org/bioconda/nanofilt
.. |Build Status| image:: https://travis-ci.org/wdecoster/nanofilt.svg?branch=master
:target: https://travis-ci.org/wdecoster/nanofilt
-.. |Code Health| image:: https://landscape.io/github/wdecoster/nanofilt/master/landscape.svg?style=flat
- :target: https://landscape.io/github/wdecoster/nanofilt/master
=====================================
nanofilt/NanoFilt.py
=====================================
@@ -28,14 +28,18 @@ import pandas as pd
 import nanofilt.utils as utils
 import logging
 from math import log
+from nanofilt.version import __version__
 
 
 def main():
     args = utils.get_args()
     utils.start_logging(args.logfile)
+    logging.info('NanoFilt {} started with arguments {}'.format(__version__, args))
     try:
         if args.tailcrop:
             args.tailcrop = -args.tailcrop
+        if args.tailcrop == 0:
+            args.tailcrop = None
         if args.summary:
             filter_using_summary(args.input, args)
         else:
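Note on the `args.tailcrop == 0` guard in the hunk above: the negated crop value is later used as the end bound of the slice `rec[args.headcrop:args.tailcrop]`, and in Python a slice end of `0` yields an empty sequence, whereas `None` means "up to the end". A minimal sketch of that behaviour, using a plain string in place of a read record; the `crop` helper is hypothetical and only mirrors the logic shown in the diff, it is not part of NanoFilt:

```python
def crop(seq, headcrop=None, tailcrop=None):
    """Hypothetical helper mirroring the tailcrop handling in main()."""
    if tailcrop:
        tailcrop = -tailcrop   # trim from the end via a negative slice bound
    if tailcrop == 0:
        tailcrop = None        # a slice end of 0 would drop the whole sequence
    return seq[headcrop:tailcrop]


read = "ACGTACGTAC"                          # illustrative sequence only
print(repr(read[2:0]))                       # what an unguarded tailcrop of 0 produces
print(crop(read, headcrop=2, tailcrop=0))    # guarded: only the head is trimmed
print(crop(read, headcrop=2, tailcrop=3))    # head and tail both trimmed
```

Without the guard, an explicit tail crop of 0 would emit every read as an empty record.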
@@ -94,15 +98,17 @@ def filter_using_summary(fq, args):
         ).rename(mapper={"mean_qscore_template": "quals", "mean_qscore_2d": "quals"}, axis="columns") \
         .set_index("read_id") \
         .to_dict()["quals"]
-    try:
-        for rec in SeqIO.parse(fq, "fastq"):
+    for rec in SeqIO.parse(fq, "fastq"):
+        try:
             if data[rec.id] >= args.quality and args.length <= len(rec) <= args.maxlength:
                 print(rec[args.headcrop:args.tailcrop].format("fastq"), end="")
-    except KeyError:
-        logging.error("mismatch between summary and fastq: \
-        {} was not found in the summary file.".format(rec.id))
-        sys.exit('\nERROR: mismatch between sequencing_summary and fastq file: \
-        {} was not found in the summary file.\nQuitting.'.format(rec.id))
+        except KeyError:
+            logging.warning("mismatch between summary and fastq: \
+            {} was not found in the summary file. \
+            Falling back to calculating.".format(rec.id))
+            if ave_qual(rec.letter_annotations["phred_quality"]) >= args.quality \
+                    and args.length <= len(rec) <= args.maxlength:
+                print(rec[args.headcrop:args.tailcrop].format("fastq"), end="")
 
 
 def errs_tab(n):
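Note on the hunk above: moving the `try` inside the loop means a read missing from the summary no longer aborts the run. That read is instead filtered on a quality calculated from its own Phred scores. A minimal sketch of the pattern, with a plain dict and lists standing in for the summary table and the Biopython records. `mean_quality` is an assumption about what `ave_qual` computes (mean error probability converted back to a Phred score), not a copy of NanoFilt's code, and `passes` is a hypothetical helper:

```python
import logging
from math import log10


def mean_quality(quals):
    """Average Phred quality via the mean error probability (assumed behaviour)."""
    if not quals:
        return 0.0
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * log10(mean_err)


def passes(read_id, quals, summary, min_quality):
    """Use the summary score when present, otherwise fall back to calculating."""
    try:
        return summary[read_id] >= min_quality
    except KeyError:
        logging.warning("%s was not found in the summary file; calculating.", read_id)
        return mean_quality(quals) >= min_quality


summary = {"read1": 12.3}                        # illustrative data only
print(passes("read1", [20] * 5, summary, 10))    # score taken from the summary
print(passes("read2", [20] * 5, summary, 10))    # missing read, quality calculated
print(passes("read3", [5] * 5, summary, 10))     # missing read, fails the threshold
```

Catching `KeyError` per record keeps the summary lookup on the fast path and only pays for the per-base calculation when a read is absent.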
=====================================
nanofilt/utils.py
=====================================
@@ -84,7 +84,6 @@ def get_args():
         args.GC_filter = False
     else:
         args.GC_filter = True
-    logging.info('NanoFilt {} started with arguments {}'.format(__version__, args))
     return args
=====================================
nanofilt/version.py
=====================================
@@ -1 +1 @@
-__version__ = "2.6.0"
+__version__ = "2.8.0"
=====================================
setup.py
=====================================
@@ -17,7 +17,7 @@ setup(
url='https://github.com/wdecoster/nanofilt',
author='Wouter De Coster',
author_email='decosterwouter at gmail.com',
- license='MIT',
+ license='GPLv3',
classifiers=[
'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research',
View it on GitLab: https://salsa.debian.org/med-team/nanofilt/-/commit/d3f0260f62a4e91f00662e087214755c7498d4a9