[med-svn] [Git][med-team/python-cutadapt][upstream] New upstream version 2.9
Steffen Möller
gitlab at salsa.debian.org
Tue Mar 31 15:32:33 BST 2020
Steffen Möller pushed to branch upstream at Debian Med / python-cutadapt
Commits:
afc23835 by Steffen Moeller at 2020-03-31T16:14:54+02:00
New upstream version 2.9
- - - - -
28 changed files:
- .gitignore
- .travis.yml
- CHANGES.rst
- LICENSE
- doc/conf.py
- doc/guide.rst
- doc/installation.rst
- pyproject.toml
- src/cutadapt/__main__.py
- src/cutadapt/adapters.py
- src/cutadapt/filters.py
- src/cutadapt/modifiers.py
- src/cutadapt/pipeline.py
- src/cutadapt/qualtrim.pyx
- src/cutadapt/report.py
- + tests/cut/maxee.fastq
- − tests/data/SRR2040271_1.fastq
- + tests/data/maxee.fastq
- − tests/data/plus.fastq
- tests/test_adapters.py
- tests/test_commandline.py
- tests/test_main.py
- tests/test_paired.py
- tests/test_parser.py
- tests/test_qualtrim.py
- tests/test_trim.py
- tests/utils.py
- tox.ini
Changes:
=====================================
.gitignore
=====================================
@@ -15,3 +15,4 @@ __pycache__/
/doc/_build
/src/cutadapt.egg-info/
src/cutadapt/_version.py
+.mypy_cache
=====================================
.travis.yml
=====================================
@@ -1,20 +1,11 @@
language: python
-dist: xenial
-
cache:
directories:
- $HOME/.cache/pip
-python:
- - "3.5"
- - "3.6"
- - "3.7"
- - "3.8"
- - "nightly"
-
install:
- - pip install 'Cython>=0.28' tox-travis
+ - pip install tox
script:
- tox
@@ -31,6 +22,30 @@ env:
jobs:
include:
+ - python: "3.5"
+ env: TOXENV=py35
+
+ - python: "3.6"
+ env: TOXENV=py36
+
+ - python: "3.7"
+ env: TOXENV=py37
+
+ - python: "3.8"
+ env: TOXENV=py38
+
+ - name: flake8
+ python: "3.6"
+ env: TOXENV=flake8
+
+ - name: mypy
+ python: "3.6"
+ env: TOXENV=mypy
+
+ - name: docs
+ python: "3.6"
+ env: TOXENV=docs
+
- stage: deploy
services:
- docker
@@ -44,10 +59,5 @@ jobs:
ls -l dist/
python3 -m twine upload dist/*
- - name: flake8
- python: "3.6"
- install: python3 -m pip install flake8
- script: flake8 src/ tests/
-
allow_failures:
- python: "nightly"
=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,17 @@
Changes
=======
+v2.9 (2020-03-18)
+-----------------
+
+* :issue:`441`: Add a ``--max-ee`` (or ``--max-expected-errors``) option
+ for filtering reads whose number of expected errors exceeds the given
+ threshold. The idea comes from
+ `Edgar et al. (2015) <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>`_.
+* :issue:`438`: The info file now contains the `` rc`` suffix that is added to
+ the names of reverse-complemented reads (with ``--revcomp``).
+* :issue:`448`: ``.bz2`` and ``.xz`` output wasn’t possible in multi-core mode.
+
v2.8 (2020-01-13)
-----------------
=====================================
LICENSE
=====================================
@@ -1,4 +1,4 @@
-Copyright (c) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+Copyright (c) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
=====================================
doc/conf.py
=====================================
@@ -14,6 +14,7 @@
import sys
import os
+import time
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
@@ -46,8 +47,8 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
-project = u'Cutadapt'
-copyright = u'2010-2019, Marcel Martin'
+project = 'Cutadapt'
+copyright = '2010-{}, Marcel Martin'.format(time.gmtime().tm_year)
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
=====================================
doc/guide.rst
=====================================
@@ -807,10 +807,12 @@ Searching reverse complements
-----------------------------
.. note::
- Option ``--revcomp`` is added on a tentative basis. Its behaviour
+ Option ``--revcomp`` is added on a tentative basis. Its behaviour may change in the next
+ releases.
By default, Cutadapt expects adapters to be given in the same orientation (5' to 3') as the reads.
+That is, neither reads nor adapters are reverse-complemented.
To change this, use option ``--revcomp`` or its abbreviation ``--rc``. If given, Cutadapt searches
both the read and its reverse complement for adapters. If the reverse complemented read yields
@@ -1122,6 +1124,12 @@ reads. They always discard those reads for which the filtering criterion applies
Discard reads with more than COUNT ``N`` bases. If ``COUNT_or_FRACTION`` is
a number between 0 and 1, it is interpreted as a fraction of the read length
+``--max-expected-errors ERRORS`` or ``--max-ee ERRORS``
+ Discard reads with more than ERRORS expected errors. The number of expected
+ errors is computed as described in
+ `Edgar et al. (2015) <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>`_,
+ (Section 2.2).
+
``--discard-casava``
Discard reads that did not pass CASAVA filtering. Illumina’s CASAVA pipeline in
version 1.8 adds an *is_filtered* header field to each read. Specifying this
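The expected-error computation that the new ``--max-expected-errors`` option relies on can be sketched as follows. This is a minimal illustration of the Phred arithmetic from Edgar et al. (2015), Section 2.2, not Cutadapt's actual ``expected_errors`` implementation (which lives in the Cython module ``qualtrim.pyx``):

```python
def expected_errors(qualities: str, base: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string.

    Each character encodes Q = ord(c) - base; the error probability of that
    base is 10 ** (-Q / 10). The sum over all bases is the expected number
    of errors in the read.
    """
    return sum(10 ** (-(ord(c) - base) / 10) for c in qualities)


# A read whose bases all have Q=20 (character '5' at offset 33) has a 1%
# error probability per base, so ten such bases give 0.1 expected errors.
print(round(expected_errors("5" * 10), 6))
```

A read passes the filter when this sum is at most the given ERRORS threshold, so longer reads are penalized for accumulating many slightly uncertain bases.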
=====================================
doc/installation.rst
=====================================
@@ -11,7 +11,7 @@ Quick installation
The easiest way to install Cutadapt is to use ``pip3`` on the command line::
- pip3 install --user --upgrade cutadapt
+ python3 -m pip install --user --upgrade cutadapt
This will download the software from `PyPI (the Python packaging
index) <https://pypi.python.org/pypi/cutadapt/>`_, and
@@ -28,14 +28,30 @@ If you want to avoid typing the full path, add the directory
Installation with conda
-----------------------
-Alternatively, Cutadapt is available as a conda package from the
-`bioconda channel <https://bioconda.github.io/>`_. If you do not have conda,
-`install miniconda <http://conda.pydata.org/miniconda.html>`_ first.
-Then install Cutadapt like this::
+Alternatively, Cutadapt is available as a Conda package from the
+`bioconda channel <https://bioconda.github.io/>`_.
+`Install miniconda <http://conda.pydata.org/miniconda.html>`_ if
+you don’t have Conda. Then follow the `Bioconda installation
+instructions <https://bioconda.github.io/user/install.html>`_ (in particular,
+make sure you have both `bioconda` and `conda-forge` in your channels list).
- conda install -c bioconda cutadapt
+To then install Cutadapt into a new Conda environment, use this command::
-If neither ``pip`` nor ``conda`` installation works, keep reading.
+ conda create -n cutadaptenv cutadapt
+
+Here, ``cutadaptenv`` is the name of the Conda environment. (You can
+choose a different name.)
+
+An environment needs to be activated every time you want to use the
+programs in it::
+
+ conda activate cutadaptenv
+
+Finally, check whether it worked::
+
+ cutadapt --version
+
+This should show the Cutadapt version number.
Installation on a Debian-based Linux distribution
=====================================
pyproject.toml
=====================================
@@ -1,2 +1,5 @@
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm", "cython"]
+
+[tool.black]
+line-length = 100
=====================================
src/cutadapt/__main__.py
=====================================
@@ -1,6 +1,6 @@
#!/usr/bin/env python
#
-# Copyright (c) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+# Copyright (c) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -23,7 +23,7 @@
"""
cutadapt version {version}
-Copyright (C) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+Copyright (C) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
cutadapt removes adapter sequences from high-throughput sequencing reads.
@@ -57,7 +57,7 @@ import sys
import time
import logging
import platform
-from typing import Tuple, Optional, Sequence, List, Any, Iterator, Union
+from typing import Tuple, Optional, Sequence, List, Any, Iterator, Union, Type
from argparse import ArgumentParser, SUPPRESS, HelpFormatter
import dnaio
@@ -65,13 +65,13 @@ import dnaio
from cutadapt import __version__
from cutadapt.adapters import warn_duplicate_adapters, Adapter
from cutadapt.parser import AdapterParser
-from cutadapt.modifiers import (Modifier, LengthTagModifier, SuffixRemover, PrefixSuffixAdder,
+from cutadapt.modifiers import (SingleEndModifier, LengthTagModifier, SuffixRemover, PrefixSuffixAdder,
ZeroCapper, QualityTrimmer, UnconditionalCutter, NEndTrimmer, AdapterCutter,
PairedAdapterCutterError, PairedAdapterCutter, NextseqQualityTrimmer, Shortener,
ReverseComplementer)
from cutadapt.report import full_report, minimal_report
from cutadapt.pipeline import (Pipeline, SingleEndPipeline, PairedEndPipeline, InputFiles,
- OutputFiles, SerialPipelineRunner, ParallelPipelineRunner)
+ OutputFiles, PipelineRunner, SerialPipelineRunner, ParallelPipelineRunner)
from cutadapt.utils import available_cpu_count, Progress, DummyProgress, FileOpener
from cutadapt.log import setup_logging, REPORT
@@ -248,6 +248,10 @@ def get_argument_parser() -> ArgumentParser:
group.add_argument("--max-n", type=float, default=None, metavar="COUNT",
help="Discard reads with more than COUNT 'N' bases. If COUNT is a number "
"between 0 and 1, it is interpreted as a fraction of the read length.")
+ group.add_argument("--max-expected-errors", "--max-ee", type=float, default=None,
+ metavar="ERRORS",
+ help="Discard reads whose expected number of errors (computed "
+ "from quality values) exceeds ERRORS.")
group.add_argument("--discard-trimmed", "--discard", action='store_true', default=False,
help="Discard reads that contain an adapter. Use also -O to avoid "
"discarding too many randomly matching reads.")
@@ -672,6 +676,7 @@ def pipeline_from_parsed_args(args, paired, file_opener) -> Pipeline:
lengths = (lengths[0], lengths[0])
setattr(pipeline, attr, lengths)
pipeline.max_n = args.max_n
+ pipeline.max_expected_errors = args.max_expected_errors
pipeline.discard_casava = args.discard_casava
pipeline.discard_trimmed = args.discard_trimmed
pipeline.discard_untrimmed = args.discard_untrimmed
@@ -764,7 +769,7 @@ def add_adapter_cutter(
pipeline.add(modifier)
-def modifiers_applying_to_both_ends_if_paired(args) -> Iterator[Modifier]:
+def modifiers_applying_to_both_ends_if_paired(args) -> Iterator[SingleEndModifier]:
if args.length is not None:
yield Shortener(args.length)
if args.trim_n:
@@ -836,6 +841,11 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
cores = available_cpu_count() if args.cores == 0 else args.cores
file_opener = FileOpener(
compression_level=args.compression_level, threads=0 if cores == 1 else None)
+ if sys.stderr.isatty() and not args.quiet:
+ progress = Progress()
+ else:
+ progress = DummyProgress()
+
try:
is_interleaved_input, is_interleaved_output = determine_interleaved(args)
input_filename, input_paired_filename = setup_input_files(args.inputs,
@@ -843,40 +853,19 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
check_arguments(args, paired, is_interleaved_output)
pipeline = pipeline_from_parsed_args(args, paired, file_opener)
outfiles = open_output_files(args, default_outfile, is_interleaved_output, file_opener)
+ infiles = InputFiles(input_filename, file2=input_paired_filename,
+ interleaved=is_interleaved_input)
+ runner = setup_runner(pipeline, infiles, outfiles, progress, cores, args.buffer_size)
except CommandLineError as e:
parser.error(str(e))
return # avoid IDE warnings below
- if cores > 1:
- if ParallelPipelineRunner.can_output_to(outfiles):
- runner_class = ParallelPipelineRunner
- runner_kwargs = dict(n_workers=cores, buffer_size=args.buffer_size)
- else:
- parser.error("Running in parallel is currently not supported "
- "when using --format or when demultiplexing.\n"
- "Omit --cores/-j to continue.")
- return # avoid IDE warnings below
- else:
- runner_class = SerialPipelineRunner
- runner_kwargs = dict()
- infiles = InputFiles(input_filename, file2=input_paired_filename,
- interleaved=is_interleaved_input)
- if sys.stderr.isatty() and not args.quiet:
- progress = Progress()
- else:
- progress = DummyProgress()
- try:
- runner = runner_class(pipeline, infiles, outfiles, progress, **runner_kwargs)
- except (dnaio.UnknownFileFormat, IOError) as e:
- parser.error(e)
- return # avoid IDE warnings below
-
logger.info("Processing reads on %d core%s in %s mode ...",
cores, 's' if cores > 1 else '',
{False: 'single-end', True: 'paired-end'}[pipeline.paired])
try:
- stats = runner.run()
- runner.close()
+ with runner as r:
+ stats = r.run()
except KeyboardInterrupt:
print("Interrupted", file=sys.stderr)
sys.exit(130)
@@ -897,5 +886,26 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
pstats.Stats(profiler).sort_stats('time').print_stats(20)
+def setup_runner(pipeline: Pipeline, infiles, outfiles, progress, cores, buffer_size):
+ if cores > 1:
+ if ParallelPipelineRunner.can_output_to(outfiles):
+ runner_class = ParallelPipelineRunner # type: Type[PipelineRunner]
+ runner_kwargs = dict(n_workers=cores, buffer_size=buffer_size)
+ else:
+ raise CommandLineError("Running in parallel is currently not supported "
+ "when using --format or when demultiplexing.\n"
+ "Omit --cores/-j to continue.")
+ # return # avoid IDE warnings below
+ else:
+ runner_class = SerialPipelineRunner
+ runner_kwargs = dict()
+ try:
+ runner = runner_class(pipeline, infiles, outfiles, progress, **runner_kwargs)
+ except (dnaio.UnknownFileFormat, IOError) as e:
+ raise CommandLineError(e)
+
+ return runner
+
+
if __name__ == '__main__':
main()
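The ``with runner as r:`` change above implies that the runner classes now implement the context-manager protocol, so their files are closed even when ``run()`` raises (e.g. on ``KeyboardInterrupt``). A hypothetical minimal sketch of that pattern (class and return values are illustrative stand-ins, not the actual pipeline classes):

```python
class PipelineRunner:
    """Sketch of a runner that guarantees cleanup via the context-manager protocol."""

    def __init__(self):
        self.closed = False

    def run(self):
        # placeholder for processing reads and returning a Statistics object
        return "stats"

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # close() now runs even if run() raises inside the with-block
        self.close()
        return False  # do not suppress exceptions


with PipelineRunner() as runner:
    stats = runner.run()
```

Compared to the removed explicit ``runner.close()`` call, this makes the cleanup path exception-safe without extra try/finally bookkeeping at the call site.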
=====================================
src/cutadapt/adapters.py
=====================================
@@ -181,6 +181,7 @@ class AdapterStatistics:
class Match(ABC):
+
@abstractmethod
def remainder_interval(self) -> Tuple[int, int]:
pass
@@ -279,7 +280,7 @@ class SingleMatch(Match):
seq = self.read.sequence
qualities = self.read.qualities
info = [
- self.read.name,
+ "",
self.errors,
self.rstart,
self.rstop,
@@ -381,6 +382,7 @@ class SingleAdapter(Adapter):
indels: bool = True,
):
super().__init__()
+ assert not isinstance(remove, str)
self._debug = False # type: bool
self.name = _generate_adapter_name() if name is None else name # type: str
self.sequence = sequence.upper().replace("U", "T") # type: str
@@ -703,8 +705,16 @@ class MultiAdapter(Adapter):
self._accept(adapter)
if adapter.where is not self._where:
raise ValueError("All adapters must have identical 'where' attributes")
+ assert self._where in (Where.PREFIX, Where.SUFFIX)
self._adapters = adapters
- self._longest, self._index = self._make_index()
+ self._lengths, self._index = self._make_index()
+ if self._where is Where.PREFIX:
+ def make_affix(read, n):
+ return read.sequence[:n]
+ else:
+ def make_affix(read, n):
+ return read.sequence[-n:]
+ self._make_affix = make_affix
def __repr__(self):
return 'MultiAdapter(adapters={!r}, where={})'.format(self._adapters, self._where)
@@ -741,7 +751,7 @@ class MultiAdapter(Adapter):
def _make_index(self):
logger.info('Building index of %s adapters ...', len(self._adapters))
index = dict()
- longest = 0
+ lengths = set()
has_warned = False
for adapter in self._adapters:
sequence = adapter.sequence
@@ -762,35 +772,27 @@ class MultiAdapter(Adapter):
has_warned = True
else:
index[s] = (adapter, errors, matches)
- longest = max(longest, len(s))
+ lengths.add(len(s))
logger.info('Built an index containing %s strings.', len(index))
- return longest, index
+ return sorted(lengths, reverse=True), index
def match_to(self, read):
"""
Match the adapters against the read and return a Match that represents
the best match or None if no match was found
"""
- if self._where is Where.PREFIX:
- def make_affix(n):
- return read.sequence[:n]
- else:
- def make_affix(n):
- return read.sequence[-n:]
-
- # Check all the prefixes of the read that could match
+ # Check all the prefixes or suffixes (affixes) of the read that could match
best_adapter = None
best_length = 0
best_m = -1
best_e = 1000
- # TODO do not go through all the lengths, only those that actually exist in the index
- for length in range(self._longest, -1, -1):
+ for length in self._lengths:
if length < best_m:
# No chance of getting the same or a higher number of matches, so we can stop early
break
- affix = make_affix(length)
+ affix = self._make_affix(read, length)
try:
adapter, e, m = self._index[affix]
except KeyError:
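The ``MultiAdapter`` change above replaces a scan over every length up to the longest indexed string with a scan over only the lengths that actually occur in the index, sorted in descending order. A simplified sketch of the idea (illustrative only; the real index also stores per-entry error and match counts, and handles suffixes as well as prefixes):

```python
def make_index(adapters):
    """Map each adapter sequence to its name; record which lengths occur.

    adapters is a list of (name, sequence) pairs (a stand-in for the real
    Adapter objects).
    """
    index = {seq: name for name, seq in adapters}
    lengths = sorted({len(seq) for _, seq in adapters}, reverse=True)
    return lengths, index


def match_prefix(read, lengths, index):
    """Try only those read prefixes whose length is present in the index."""
    for n in lengths:
        hit = index.get(read[:n])
        if hit is not None:
            return hit
    return None


lengths, index = make_index([("a1", "ACGT"), ("a2", "TTAGGC")])
print(match_prefix("TTAGGCAAAA", lengths, index))
```

Since adapter sets typically contain only a handful of distinct lengths, skipping the lengths that cannot possibly be in the index avoids pointless dictionary lookups per read.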
=====================================
src/cutadapt/filters.py
=====================================
@@ -13,9 +13,13 @@ filters is created and each redirector is called in turn until one returns True.
The read is then assumed to have been "consumed", that is, either written
somewhere or filtered (should be discarded).
"""
+from collections import Counter
from abc import ABC, abstractmethod
-from typing import List
-from .adapters import Match
+from typing import List, Tuple, Optional, Dict, Any
+
+from .qualtrim import expected_errors
+from .utils import FileOpener
+from .modifiers import ModificationInfo
# Constants used when returning from a Filter’s __call__ method to improve
@@ -26,81 +30,113 @@ KEEP = False
class SingleEndFilter(ABC):
@abstractmethod
- def __call__(self, read, matches):
- pass
+ def __call__(self, read, info: ModificationInfo):
+ """
+ Called to process a single-end read
+
+ Any adapter matches are appended to the matches list.
+ """
class PairedEndFilter(ABC):
@abstractmethod
- def __call__(self, read1, matches1, read2, matches2):
- pass
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ """
+ Called to process the read pair (read1, read2)
+
+ Any adapter matches are appended to the info.matches list.
+ """
+
+
+class WithStatistics(ABC):
+ def __init__(self) -> None:
+ self._written = 0 # no. of written reads or read pairs
+ self._written_bp = [0, 0]
+ self._written_lengths = [Counter(), Counter()] # type: List[Counter]
+
+ def written_reads(self) -> int:
+ return self._written
+
+ def written_bp(self) -> Tuple[int, ...]:
+ return tuple(self._written_bp)
+
+ def written_lengths(self) -> Tuple[Counter, Counter]:
+ return (self._written_lengths[0].copy(), self._written_lengths[1].copy())
+
+
+class SingleEndFilterWithStatistics(SingleEndFilter, WithStatistics, ABC):
+ def __init__(self):
+ super().__init__()
+
+ def update_statistics(self, read) -> None:
+ self._written += 1
+ self._written_bp[0] += len(read)
+ self._written_lengths[0][len(read)] += 1
-class NoFilter(SingleEndFilter):
+class PairedEndFilterWithStatistics(PairedEndFilter, WithStatistics, ABC):
+ def __init__(self):
+ super().__init__()
+
+ def update_statistics(self, read1, read2):
+ self._written += 1
+ self._written_bp[0] += len(read1)
+ self._written_bp[1] += len(read2)
+ self._written_lengths[0][len(read1)] += 1
+ self._written_lengths[1][len(read2)] += 1
+
+
+class NoFilter(SingleEndFilterWithStatistics):
"""
No filtering, just send each read to the given writer.
"""
def __init__(self, writer):
+ super().__init__()
self.writer = writer
- self.written = 0 # no of written reads TODO move to writer
- self.written_bp = [0, 0]
-
- @property
- def filtered(self):
- return 0
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
self.writer.write(read)
- self.written += 1
- self.written_bp[0] += len(read)
+ self.update_statistics(read)
return DISCARD
-class PairedNoFilter(PairedEndFilter):
+class PairedNoFilter(PairedEndFilterWithStatistics):
"""
No filtering, just send each paired-end read to the given writer.
"""
def __init__(self, writer):
+ super().__init__()
self.writer = writer
- self.written = 0 # no of written reads or read pairs TODO move to writer
- self.written_bp = [0, 0]
- @property
- def filtered(self):
- return 0
-
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
self.writer.write(read1, read2)
- self.written += 1
- self.written_bp[0] += len(read1)
- self.written_bp[1] += len(read2)
+ self.update_statistics(read1, read2)
+
return DISCARD
-class Redirector(SingleEndFilter):
+class Redirector(SingleEndFilterWithStatistics):
"""
Redirect discarded reads to the given writer. This is for single-end reads.
"""
def __init__(self, writer, filter: SingleEndFilter, filter2=None):
+ super().__init__()
# TODO filter2 should really not be here
self.filtered = 0
self.writer = writer
self.filter = filter
- self.written = 0 # no of written reads TODO move to writer
- self.written_bp = [0, 0]
- def __call__(self, read, matches):
- if self.filter(read, matches):
+ def __call__(self, read, info: ModificationInfo):
+ if self.filter(read, info):
self.filtered += 1
if self.writer is not None:
self.writer.write(read)
- self.written += 1
- self.written_bp[0] += len(read)
+ self.update_statistics(read)
return DISCARD
return KEEP
-class PairedRedirector(PairedEndFilter):
+class PairedRedirector(PairedEndFilterWithStatistics):
"""
Redirect paired-end reads matching a filtering criterion to a writer.
Different filtering styles are supported, differing by which of the
@@ -113,14 +149,13 @@ class PairedRedirector(PairedEndFilter):
'both': The pair is discarded if both reads match.
'first': The pair is discarded if the first read matches.
"""
+ super().__init__()
if pair_filter_mode not in ('any', 'both', 'first'):
raise ValueError("pair_filter_mode must be 'any', 'both' or 'first'")
self.filtered = 0
self.writer = writer
self.filter = filter
self.filter2 = filter2
- self.written = 0 # no of written reads or read pairs TODO move to writer
- self.written_bp = [0, 0]
if filter2 is None:
self._is_filtered = self._is_filtered_first
elif filter is None:
@@ -132,26 +167,24 @@ class PairedRedirector(PairedEndFilter):
else:
self._is_filtered = self._is_filtered_first
- def _is_filtered_any(self, read1, read2, matches1, matches2):
- return self.filter(read1, matches1) or self.filter2(read2, matches2)
+ def _is_filtered_any(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ return self.filter(read1, info1) or self.filter2(read2, info2)
- def _is_filtered_both(self, read1, read2, matches1, matches2):
- return self.filter(read1, matches1) and self.filter2(read2, matches2)
+ def _is_filtered_both(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ return self.filter(read1, info1) and self.filter2(read2, info2)
- def _is_filtered_first(self, read1, read2, matches1, matches2):
- return self.filter(read1, matches1)
+ def _is_filtered_first(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ return self.filter(read1, info1)
- def _is_filtered_second(self, read1, read2, matches1, matches2):
- return self.filter2(read2, matches2)
+ def _is_filtered_second(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ return self.filter2(read2, info2)
- def __call__(self, read1, read2, matches1, matches2):
- if self._is_filtered(read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+ if self._is_filtered(read1, read2, info1, info2):
self.filtered += 1
if self.writer is not None:
self.writer.write(read1, read2)
- self.written += 1
- self.written_bp[0] += len(read1)
- self.written_bp[1] += len(read2)
+ self.update_statistics(read1, read2)
return DISCARD
return KEEP
@@ -160,7 +193,7 @@ class TooShortReadFilter(SingleEndFilter):
def __init__(self, minimum_length):
self.minimum_length = minimum_length
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
return len(read) < self.minimum_length
@@ -168,13 +201,29 @@ class TooLongReadFilter(SingleEndFilter):
def __init__(self, maximum_length):
self.maximum_length = maximum_length
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
return len(read) > self.maximum_length
+class MaximumExpectedErrorsFilter(SingleEndFilter):
+ """
+ Discard reads whose expected number of errors, according to the quality
+ values, exceeds a threshold.
+
+ The idea comes from usearch's -fastq_maxee parameter
+ (http://drive5.com/usearch/).
+ """
+ def __init__(self, max_errors):
+ self.max_errors = max_errors
+
+ def __call__(self, read, info: ModificationInfo):
+ """Return True when the read should be discarded"""
+ return expected_errors(read.qualities) > self.max_errors
+
+
class NContentFilter(SingleEndFilter):
"""
- Discards a reads that has a number of 'N's over a given threshold. It handles both raw counts
+ Discard a read if it has too many 'N' bases. It handles both raw counts
of Ns as well as proportions. Note, for raw counts, it is a 'greater than' comparison,
so a cutoff of '1' will keep reads with a single N in it.
"""
@@ -187,7 +236,7 @@ class NContentFilter(SingleEndFilter):
self.is_proportion = count < 1.0
self.cutoff = count
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
"""Return True when the read should be discarded"""
n_count = read.sequence.lower().count('n')
if self.is_proportion:
@@ -202,16 +251,16 @@ class DiscardUntrimmedFilter(SingleEndFilter):
"""
Return True if read is untrimmed.
"""
- def __call__(self, read, matches):
- return not matches
+ def __call__(self, read, info: ModificationInfo):
+ return not info.matches
class DiscardTrimmedFilter(SingleEndFilter):
"""
Return True if read is trimmed.
"""
- def __call__(self, read, matches):
- return bool(matches)
+ def __call__(self, read, info: ModificationInfo):
+ return bool(info.matches)
class CasavaFilter(SingleEndFilter):
@@ -222,12 +271,12 @@ class CasavaFilter(SingleEndFilter):
Reads with unrecognized headers are kept.
"""
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
_, _, right = read.name.partition(' ')
return right[1:4] == ':Y:' # discard if :Y: found
-class Demultiplexer(SingleEndFilter):
+class Demultiplexer(SingleEndFilterWithStatistics):
"""
Demultiplex trimmed reads. Reads are written to different output files
depending on which adapter matches. Files are created when the first read
@@ -240,35 +289,32 @@ class Demultiplexer(SingleEndFilter):
Reads without an adapter match are written to the file named by
untrimmed_path.
"""
+ super().__init__()
assert '{name}' in path_template
self.template = path_template
self.untrimmed_path = untrimmed_path
self.untrimmed_writer = None
self.writers = dict()
- self.written = 0
- self.written_bp = [0, 0]
self.qualities = qualities
self.file_opener = file_opener
- def __call__(self, read, matches):
+ def __call__(self, read, info):
"""
Write the read to the proper output file according to the most recent match
"""
- if matches:
- name = matches[-1].adapter.name
+ if info.matches:
+ name = info.matches[-1].adapter.name
if name not in self.writers:
self.writers[name] = self.file_opener.dnaio_open_raise_limit(
self.template.replace('{name}', name), self.qualities)
- self.written += 1
- self.written_bp[0] += len(read)
+ self.update_statistics(read)
self.writers[name].write(read)
else:
if self.untrimmed_writer is None and self.untrimmed_path is not None:
self.untrimmed_writer = self.file_opener.dnaio_open_raise_limit(
self.untrimmed_path, self.qualities)
if self.untrimmed_writer is not None:
- self.written += 1
- self.written_bp[0] += len(read)
+ self.update_statistics(read)
self.untrimmed_writer.write(read)
return DISCARD
@@ -279,7 +325,7 @@ class Demultiplexer(SingleEndFilter):
self.untrimmed_writer.close()
-class PairedDemultiplexer(PairedEndFilter):
+class PairedDemultiplexer(PairedEndFilterWithStatistics):
"""
Demultiplex trimmed paired-end reads. Reads are written to different output files
depending on which adapter (in read 1) matches.
@@ -292,34 +338,40 @@ class PairedDemultiplexer(PairedEndFilter):
Read pairs without an adapter match are written to the files named by
untrimmed_path.
"""
+ super().__init__()
self._demultiplexer1 = Demultiplexer(path_template, untrimmed_path, qualities, file_opener)
self._demultiplexer2 = Demultiplexer(path_paired_template, untrimmed_paired_path,
qualities, file_opener)
- @property
- def written(self):
- return self._demultiplexer1.written + self._demultiplexer2.written
+ def written(self) -> int:
+ return self._demultiplexer1._written + self._demultiplexer2._written
- @property
- def written_bp(self):
- return [self._demultiplexer1.written_bp[0], self._demultiplexer2.written_bp[0]]
+ def written_bp(self) -> Tuple[int, int]:
+ return (self._demultiplexer1._written_bp[0], self._demultiplexer2._written_bp[0])
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
assert read2 is not None
- self._demultiplexer1(read1, matches1)
- self._demultiplexer2(read2, matches1)
+ self._demultiplexer1(read1, info1)
+ self._demultiplexer2(read2, info1)
def close(self):
self._demultiplexer1.close()
self._demultiplexer2.close()
-class CombinatorialDemultiplexer(PairedEndFilter):
+class CombinatorialDemultiplexer(PairedEndFilterWithStatistics):
"""
Demultiplex reads depending on which adapter matches, taking into account both matches
on R1 and R2.
"""
- def __init__(self, path_template, path_paired_template, untrimmed_name, qualities, file_opener):
+ def __init__(
+ self,
+ path_template: str,
+ path_paired_template: str,
+ untrimmed_name: Optional[str],
+ qualities: bool,
+ file_opener: FileOpener,
+ ):
"""
path_template must contain the string '{name1}' and '{name2}', which will be replaced
with the name of the adapters found on R1 and R2, respectively to form the final output
@@ -327,15 +379,17 @@ class CombinatorialDemultiplexer(PairedEndFilter):
specified by untrimmed_name. Alternatively, untrimmed_name can be set to None; in that
case, read pairs for which at least one read does not have an adapter match are
discarded.
+
+ untrimmed_name -- what to replace the templates with when one or both of the reads
+ do not contain an adapter (use "unknown"). Set to None to discard these read pairs.
"""
+ super().__init__()
assert '{name1}' in path_template and '{name2}' in path_template
assert '{name1}' in path_paired_template and '{name2}' in path_paired_template
self.template = path_template
self.paired_template = path_paired_template
self.untrimmed_name = untrimmed_name
- self.writers = dict()
- self.written = 0
- self.written_bp = [0, 0]
+ self.writers = dict() # type: Dict[Tuple[str, str], Any]
self.qualities = qualities
self.file_opener = file_opener
@@ -343,16 +397,17 @@ class CombinatorialDemultiplexer(PairedEndFilter):
def _make_path(template, name1, name2):
return template.replace('{name1}', name1).replace('{name2}', name2)
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1, info2):
"""
Write the read to the proper output file according to the most recent matches both on
R1 and R2
"""
assert read2 is not None
- name1 = matches1[-1].adapter.name if matches1 else None
- name2 = matches2[-1].adapter.name if matches2 else None
+ name1 = info1.matches[-1].adapter.name if info1.matches else None
+ name2 = info2.matches[-1].adapter.name if info2.matches else None
key = (name1, name2)
if key not in self.writers:
+ # Open writer on first use
if name1 is None:
name1 = self.untrimmed_name
if name2 is None:
@@ -366,9 +421,7 @@ class CombinatorialDemultiplexer(PairedEndFilter):
self.file_opener.dnaio_open_raise_limit(path2, qualities=self.qualities),
)
writer1, writer2 = self.writers[key]
- self.written += 1
- self.written_bp[0] += len(read1)
- self.written_bp[1] += len(read2)
+ self.update_statistics(read1, read2)
writer1.write(read1)
writer2.write(read2)
return DISCARD
@@ -383,9 +436,10 @@ class RestFileWriter(SingleEndFilter):
def __init__(self, file):
self.file = file
- def __call__(self, read, matches):
- if matches:
- rest = matches[-1].rest()
+ def __call__(self, read, info):
+ # TODO this fails with linked adapters
+ if info.matches:
+ rest = info.matches[-1].rest()
if len(rest) > 0:
print(rest, read.name, file=self.file)
return KEEP
@@ -395,9 +449,10 @@ class WildcardFileWriter(SingleEndFilter):
def __init__(self, file):
self.file = file
- def __call__(self, read, matches):
- if matches:
- print(matches[-1].wildcards(), read.name, file=self.file)
+ def __call__(self, read, info):
+ # TODO this fails with linked adapters
+ if info.matches:
+ print(info.matches[-1].wildcards(), read.name, file=self.file)
return KEEP
@@ -405,11 +460,12 @@ class InfoFileWriter(SingleEndFilter):
def __init__(self, file):
self.file = file
- def __call__(self, read, matches: List[Match]):
- if matches:
- for match in matches:
+ def __call__(self, read, info: ModificationInfo):
+ if info.matches:
+ for match in info.matches:
for info_record in match.get_info_records():
- print(*info_record, sep='\t', file=self.file)
+ # info_record[0] is the read name suffix
+ print(read.name + info_record[0], *info_record[1:], sep='\t', file=self.file)
else:
seq = read.sequence
qualities = read.qualities if read.qualities is not None else ''
=====================================
src/cutadapt/modifiers.py
=====================================
@@ -13,15 +13,27 @@ from .adapters import Where, MultiAdapter, Match, remainder
from .utils import reverse_complemented_sequence
-class Modifier(ABC):
+class ModificationInfo:
+ """
+ An object of this class is created for each read that passes through the pipeline.
+ Any information (except the read itself) that needs to be passed from one modifier
+ to a later one in the pipeline, or from the modifiers to the filters, is recorded here.
+ """
+ __slots__ = ["matches"]
+
+ def __init__(self):
+ self.matches = [] # type: List[Match]
+
+
+class SingleEndModifier(ABC):
@abstractmethod
- def __call__(self, read, matches: List[Match]):
+ def __call__(self, read, info: ModificationInfo):
pass
class PairedModifier(ABC):
@abstractmethod
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
pass
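The patch replaces the bare `matches` list with a `ModificationInfo` object that every modifier receives in turn. A simplified, self-contained sketch of that flow, using toy classes rather than the real cutadapt API:

```python
class ModificationInfo:
    """Carries per-read state (here: adapter-match records) between pipeline stages."""
    __slots__ = ["matches"]

    def __init__(self):
        self.matches = []

class SuffixTrimmer:
    """Toy modifier: records a 'match' and trims a fixed suffix from a plain string."""
    def __init__(self, suffix):
        self.suffix = suffix

    def __call__(self, read, info):
        if read.endswith(self.suffix):
            info.matches.append(self.suffix)
            return read[:-len(self.suffix)]
        return read

read = "ACGTACGTAAA"
info = ModificationInfo()
for modifier in [SuffixTrimmer("AAA"), SuffixTrimmer("CGT")]:
    read = modifier(read, info)
print(read, info.matches)  # ACGTA ['AAA', 'CGT']
```

Filters later in the chain can then inspect `info.matches` without the read itself carrying any extra state.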
@@ -31,24 +43,26 @@ class PairedModifierWrapper(PairedModifier):
"""
paired = True
- def __init__(self, modifier1: Optional[Modifier], modifier2: Optional[Modifier]):
+ def __init__(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
"""Set one of the modifiers to None to work on R1 or R2 only"""
self._modifier1 = modifier1
self._modifier2 = modifier2
+ if self._modifier1 is None and self._modifier2 is None:
+ raise ValueError("Not both modifiers can be None")
def __repr__(self):
return 'PairedModifier(modifier1={!r}, modifier2={!r})'.format(
self._modifier1, self._modifier2)
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
if self._modifier1 is None:
- return read1, self._modifier2(read2, matches2)
+ return read1, self._modifier2(read2, info2) # type: ignore
if self._modifier2 is None:
- return self._modifier1(read1, matches1), read2
- return self._modifier1(read1, matches1), self._modifier2(read2, matches2)
+ return self._modifier1(read1, info1), read2
+ return self._modifier1(read1, info1), self._modifier2(read2, info2)
-class AdapterCutter(Modifier):
+class AdapterCutter(SingleEndModifier):
"""
Repeatedly find one of multiple adapters in reads.
The number of times the search is repeated is specified by the
@@ -148,13 +162,13 @@ class AdapterCutter(Modifier):
trimmed_read.qualities = read.qualities
return trimmed_read
- def __call__(self, read, inmatches: List[Match]):
+ def __call__(self, read, info: ModificationInfo):
trimmed_read, matches = self.match_and_trim(read)
if matches:
self.with_adapters += 1
for match in matches:
match.update_statistics(self.adapter_statistics[match.adapter])
- inmatches.extend(matches)
+ info.matches.extend(matches) # TODO extend or overwrite?
return trimmed_read
def match_and_trim(self, read):
@@ -198,7 +212,7 @@ class AdapterCutter(Modifier):
return trimmed_read, matches
-class ReverseComplementer(Modifier):
+class ReverseComplementer(SingleEndModifier):
"""Trim adapters from a read and its reverse complement"""
def __init__(self, adapter_cutter: AdapterCutter, rc_suffix: Optional[str] = " rc"):
@@ -209,7 +223,7 @@ class ReverseComplementer(Modifier):
self.reverse_complemented = 0
self._suffix = rc_suffix
- def __call__(self, read, inmatches: List[Match]):
+ def __call__(self, read, info: ModificationInfo):
reverse_read = reverse_complemented_sequence(read)
forward_trimmed_read, forward_matches = self.adapter_cutter.match_and_trim(read)
@@ -234,7 +248,7 @@ class ReverseComplementer(Modifier):
stats = self.adapter_cutter.adapter_statistics[match.adapter]
match.update_statistics(stats)
stats.reverse_complemented += bool(use_reverse_complement)
- inmatches.extend(matches)
+ info.matches.extend(matches) # TODO extend or overwrite?
return trimmed_read
@@ -277,7 +291,7 @@ class PairedAdapterCutter(PairedModifier):
return 'PairedAdapterCutter(adapters1={!r}, adapters2={!r})'.format(
self._adapters1, self._adapters2)
- def __call__(self, read1, read2, matches1, matches2):
+ def __call__(self, read1, read2, info1, info2):
"""
"""
match1 = AdapterCutter.best_match(self._adapters1, read1)
@@ -310,12 +324,12 @@ class PairedAdapterCutter(PairedModifier):
elif self.action is None: # --no-trim
trimmed_read = read[:]
result.append(trimmed_read)
- matches1.append(match1)
- matches2.append(match2)
+ info1.matches.append(match1)
+ info2.matches.append(match2)
return result
-class UnconditionalCutter(Modifier):
+class UnconditionalCutter(SingleEndModifier):
"""
A modifier that unconditionally removes the first n or the last n bases from a read.
@@ -325,14 +339,14 @@ class UnconditionalCutter(Modifier):
def __init__(self, length: int):
self.length = length
- def __call__(self, read, matches: List[Match]):
+ def __call__(self, read, info: ModificationInfo):
if self.length > 0:
return read[self.length:]
elif self.length < 0:
return read[:self.length]
-class LengthTagModifier(Modifier):
+class LengthTagModifier(SingleEndModifier):
"""
Replace "length=..." strings in read names.
"""
@@ -340,28 +354,28 @@ class LengthTagModifier(Modifier):
self.regex = re.compile(r"\b" + length_tag + r"[0-9]*\b")
self.length_tag = length_tag
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
read = read[:]
if read.name.find(self.length_tag) >= 0:
read.name = self.regex.sub(self.length_tag + str(len(read.sequence)), read.name)
return read
-class SuffixRemover(Modifier):
+class SuffixRemover(SingleEndModifier):
"""
Remove a given suffix from read names.
"""
def __init__(self, suffix):
self.suffix = suffix
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
read = read[:]
if read.name.endswith(self.suffix):
read.name = read.name[:-len(self.suffix)]
return read
-class PrefixSuffixAdder(Modifier):
+class PrefixSuffixAdder(SingleEndModifier):
"""
Add a suffix and a prefix to read names
"""
@@ -369,15 +383,15 @@ class PrefixSuffixAdder(Modifier):
self.prefix = prefix
self.suffix = suffix
- def __call__(self, read, matches):
+ def __call__(self, read, info):
read = read[:]
- adapter_name = matches[-1].adapter.name if matches else 'no_adapter'
+ adapter_name = info.matches[-1].adapter.name if info.matches else 'no_adapter'
read.name = self.prefix.replace('{name}', adapter_name) + read.name + \
self.suffix.replace('{name}', adapter_name)
return read
-class ZeroCapper(Modifier):
+class ZeroCapper(SingleEndModifier):
"""
Change negative quality values of a read to zero
"""
@@ -385,38 +399,38 @@ class ZeroCapper(Modifier):
qb = quality_base
self.zero_cap_trans = str.maketrans(''.join(map(chr, range(qb))), chr(qb) * qb)
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
read = read[:]
read.qualities = read.qualities.translate(self.zero_cap_trans)
return read
-class NextseqQualityTrimmer(Modifier):
+class NextseqQualityTrimmer(SingleEndModifier):
def __init__(self, cutoff, base):
self.cutoff = cutoff
self.base = base
self.trimmed_bases = 0
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
stop = nextseq_trim_index(read, self.cutoff, self.base)
self.trimmed_bases += len(read) - stop
return read[:stop]
-class QualityTrimmer(Modifier):
+class QualityTrimmer(SingleEndModifier):
def __init__(self, cutoff_front, cutoff_back, base):
self.cutoff_front = cutoff_front
self.cutoff_back = cutoff_back
self.base = base
self.trimmed_bases = 0
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
start, stop = quality_trim_index(read.qualities, self.cutoff_front, self.cutoff_back, self.base)
self.trimmed_bases += len(read) - (stop - start)
return read[start:stop]
-class Shortener(Modifier):
+class Shortener(SingleEndModifier):
"""Unconditionally shorten a read to the given length
If the length is positive, the bases are removed from the end of the read.
@@ -425,20 +439,20 @@ class Shortener(Modifier):
def __init__(self, length):
self.length = length
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
if self.length >= 0:
return read[:self.length]
else:
return read[self.length:]
-class NEndTrimmer(Modifier):
+class NEndTrimmer(SingleEndModifier):
"""Trims Ns from the 3' and 5' end of reads"""
def __init__(self):
self.start_trim = re.compile(r'^N+')
self.end_trim = re.compile(r'N+$')
- def __call__(self, read, matches):
+ def __call__(self, read, info: ModificationInfo):
sequence = read.sequence
start_cut = self.start_trim.match(sequence)
end_cut = self.end_trim.search(sequence)
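NEndTrimmer's two anchored regexes can be exercised in isolation; a small sketch of the same trimming logic on a plain string (the real class returns a sliced read object):

```python
import re

start_trim = re.compile(r"^N+")  # run of Ns at the 5' end
end_trim = re.compile(r"N+$")    # run of Ns at the 3' end

def trim_ns(sequence):
    match = start_trim.match(sequence)
    start = match.end() if match else 0
    match = end_trim.search(sequence)
    end = match.start() if match else len(sequence)
    return sequence[start:end]

print(trim_ns("NNACGTNN"))  # ACGT
```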
=====================================
src/cutadapt/pipeline.py
=====================================
@@ -4,7 +4,7 @@ import sys
import copy
import logging
import functools
-from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple, Union
+from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple
from abc import ABC, abstractmethod
from multiprocessing import Process, Pipe, Queue
from pathlib import Path
@@ -16,10 +16,11 @@ from xopen import xopen
import dnaio
from .utils import Progress, FileOpener
-from .modifiers import Modifier, PairedModifier, PairedModifierWrapper
+from .modifiers import SingleEndModifier, PairedModifier, PairedModifierWrapper, ModificationInfo
from .report import Statistics
from .filters import (Redirector, PairedRedirector, NoFilter, PairedNoFilter, InfoFileWriter,
RestFileWriter, WildcardFileWriter, TooShortReadFilter, TooLongReadFilter, NContentFilter,
+ MaximumExpectedErrorsFilter,
CasavaFilter, DiscardTrimmedFilter, DiscardUntrimmedFilter, Demultiplexer,
PairedDemultiplexer, CombinatorialDemultiplexer)
@@ -98,10 +99,8 @@ class Pipeline(ABC):
def __init__(self, file_opener: FileOpener):
self._close_files = [] # type: List[IO]
- self._reader = None
+ self._reader = None # type: Any
self._filters = [] # type: List[Any]
- # TODO type should be Union[List[Modifier], List[PairedModifier]]
- self._modifiers = [] # type: List[Union[Modifier, PairedModifier]]
self._outfiles = None # type: Optional[OutputFiles]
self._demultiplexer = None
self._textiowrappers = [] # type: List[TextIO]
@@ -110,6 +109,7 @@ class Pipeline(ABC):
self._minimum_length = None
self._maximum_length = None
self.max_n = None
+ self.max_expected_errors = None
self.discard_casava = False
self.discard_trimmed = False
self.discard_untrimmed = False
@@ -172,6 +172,13 @@ class Pipeline(ABC):
f1 = f2 = NContentFilter(self.max_n)
self._filters.append(filter_wrapper(None, f1, f2))
+ if self.max_expected_errors is not None:
+ if not self._reader.delivers_qualities:
+ logger.warning("Ignoring option --max-ee as input does not contain quality values")
+ else:
+ f1 = f2 = MaximumExpectedErrorsFilter(self.max_expected_errors)
+ self._filters.append(filter_wrapper(None, f1, f2))
+
if self.discard_casava:
f1 = f2 = CasavaFilter()
self._filters.append(filter_wrapper(None, f1, f2))
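The new `--max-ee` filter is wired into the chain above only when the input delivers qualities. A simplified stand-in for `MaximumExpectedErrorsFilter` (not the real class; it only illustrates the discard criterion):

```python
DISCARD, KEEP = True, False

class MaxExpectedErrorsFilter:
    """Discard a read whose summed per-base error probability exceeds the limit."""
    def __init__(self, max_errors, base=33):
        self.max_errors = max_errors
        self.base = base

    def __call__(self, qualities):
        # Error probability per base is 10**(-Q/10)
        errors = sum(10 ** (-(ord(c) - self.base) / 10) for c in qualities)
        return DISCARD if errors > self.max_errors else KEEP

# Eight bases at Q10 contribute ~0.8 expected errors
print(MaxExpectedErrorsFilter(0.9)("++++++++"))  # False (kept)
print(MaxExpectedErrorsFilter(0.5)("++++++++"))  # True (discarded)
```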
@@ -211,6 +218,7 @@ class Pipeline(ABC):
f.flush()
def close(self) -> None:
+ self._reader.close()
for f in self._textiowrappers:
f.close() # This also closes the underlying files; a second close occurs below
assert self._outfiles is not None
@@ -255,9 +263,9 @@ class SingleEndPipeline(Pipeline):
def __init__(self, file_opener: FileOpener):
super().__init__(file_opener)
- self._modifiers = []
+ self._modifiers = [] # type: List[SingleEndModifier]
- def add(self, modifier: Modifier):
+ def add(self, modifier: SingleEndModifier):
if modifier is None:
raise ValueError("Modifier must not be None")
self._modifiers.append(modifier)
@@ -271,12 +279,12 @@ class SingleEndPipeline(Pipeline):
n += 1
if n % 10000 == 0 and progress:
progress.update(n)
- total_bp += len(read.sequence)
- matches = []
+ total_bp += len(read)
+ info = ModificationInfo()
for modifier in self._modifiers:
- read = modifier(read, matches)
+ read = modifier(read, info)
for filter_ in self._filters:
- if filter_(read, matches):
+ if filter_(read, info):
break
return (n, total_bp, None)
@@ -322,12 +330,13 @@ class PairedEndPipeline(Pipeline):
def __init__(self, pair_filter_mode, file_opener: FileOpener):
super().__init__(file_opener)
+ self._modifiers = [] # type: List[PairedModifier]
self._pair_filter_mode = pair_filter_mode
self._reader = None
# Whether to ignore pair_filter mode for discard-untrimmed filter
self.override_untrimmed_pair_filter = False
- def add(self, modifier1: Optional[Modifier], modifier2: Optional[Modifier]):
+ def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
"""
Add a modifier for R1 and R2. One of them can be None, in which case the modifier
will only be added for the respective read.
@@ -336,7 +345,7 @@ class PairedEndPipeline(Pipeline):
raise ValueError("Not both modifiers can be None")
self._modifiers.append(PairedModifierWrapper(modifier1, modifier2))
- def add_both(self, modifier: Modifier):
+ def add_both(self, modifier: SingleEndModifier):
"""
Add one modifier for both R1 and R2
"""
@@ -356,15 +365,15 @@ class PairedEndPipeline(Pipeline):
n += 1
if n % 10000 == 0 and progress:
progress.update(n)
- total1_bp += len(read1.sequence)
- total2_bp += len(read2.sequence)
- matches1 = []
- matches2 = []
+ total1_bp += len(read1)
+ total2_bp += len(read2)
+ info1 = ModificationInfo()
+ info2 = ModificationInfo()
for modifier in self._modifiers:
- read1, read2 = modifier(read1, read2, matches1, matches2)
+ read1, read2 = modifier(read1, read2, info1, info2)
for filter_ in self._filters:
# Stop writing as soon as one of the filters was successful.
- if filter_(read1, read2, matches1, matches2):
+ if filter_(read1, read2, info1, info2):
break
return (n, total1_bp, total2_bp)
@@ -556,7 +565,6 @@ class WorkerProcess(Process):
orig_outfile = getattr(self._orig_outfiles, attr)
if orig_outfile is not None:
output = io.BytesIO()
- output.name = orig_outfile.name
setattr(output_files, attr, output)
return output_files
@@ -602,7 +610,7 @@ class PipelineRunner(ABC):
"""
A read processing pipeline
"""
- def __init__(self, pipeline: Pipeline, progress: Progress):
+ def __init__(self, pipeline: Pipeline, progress: Progress, *args, **kwargs):
self._pipeline = pipeline
self._progress = progress
@@ -614,6 +622,12 @@ class PipelineRunner(ABC):
def close(self):
pass
+ def __enter__(self):
+ return self
+
+ def __exit__(self, *args):
+ self.close()
+
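The new `__enter__`/`__exit__` pair lets callers use any `PipelineRunner` in a `with` block so that `close()` runs even when the body raises. The general pattern, shown with a minimal stand-in class:

```python
class Runner:
    """Minimal stand-in showing the context-manager protocol added to PipelineRunner."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Runs on normal exit and on exceptions alike
        self.close()

with Runner() as runner:
    pass
print(runner.closed)  # True
```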
class ParallelPipelineRunner(PipelineRunner):
"""
=====================================
src/cutadapt/qualtrim.pyx
=====================================
@@ -82,3 +82,25 @@ def nextseq_trim_index(sequence, int cutoff, int base=33):
max_qual = s
max_i = i
return max_i
+
+
+def expected_errors(str qualities, int base=33):
+ """
+ Return the number of expected errors (as double) from a read’s
+ qualities.
+
+ This uses the formula in Edgar et al. (2015),
+ see Section 2.2 in <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>.
+
+ qualities -- ASCII-encoded qualities (chr(qual + base))
+ """
+ cdef:
+ int i, q
+ bytes quals = qualities.encode()
+ char* cq = quals
+ double e = 0.0
+
+ for i in range(len(qualities)):
+ q = cq[i] - base
+ e += 10 ** (-q / 10)
+ return e
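The Cython function above has a direct pure-Python equivalent, useful for checking values by hand; the per-base error probability is 10**(-Q/10):

```python
def expected_errors(qualities, base=33):
    """Sum of per-base error probabilities for an ASCII-encoded quality string."""
    return sum(10 ** (-(ord(c) - base) / 10) for c in qualities)

# "+" encodes Q=10 with the standard offset of 33, i.e. an error probability of 0.1
print(expected_errors("++"))  # 0.2
```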
=====================================
src/cutadapt/report.py
=====================================
@@ -3,13 +3,12 @@ Routines for printing a report.
"""
from io import StringIO
import textwrap
+from collections import Counter
from typing import Any, Optional, List
from .adapters import Where, EndStatistics, AdapterStatistics, ADAPTER_TYPE_NAMES
-from .modifiers import (Modifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
+from .modifiers import (SingleEndModifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
AdapterCutter, PairedAdapterCutter, ReverseComplementer)
-from .filters import (NoFilter, PairedNoFilter, TooShortReadFilter, TooLongReadFilter,
- PairedDemultiplexer, CombinatorialDemultiplexer, Demultiplexer, NContentFilter, InfoFileWriter,
- WildcardFileWriter, RestFileWriter)
+from .filters import WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter
def safe_divide(numerator, denominator):
@@ -41,6 +40,7 @@ class Statistics:
self.written = 0
self.total_bp = [0, 0]
self.written_bp = [0, 0]
+ self.written_lengths = [Counter(), Counter()] # type: List[Counter]
self.with_adapters = [0, 0]
self.quality_trimmed_bp = [0, 0]
self.adapter_stats = [[], []] # type: List[List[AdapterStatistics]]
@@ -66,6 +66,7 @@ class Statistics:
for i in (0, 1):
self.total_bp[i] += other.total_bp[i]
self.written_bp[i] += other.written_bp[i]
+ self.written_lengths[i] += other.written_lengths[i]
self.with_adapters[i] += other.with_adapters[i]
self.quality_trimmed_bp[i] += other.quality_trimmed_bp[i]
if self.adapter_stats[i] and other.adapter_stats[i]:
@@ -102,13 +103,13 @@ class Statistics:
return self
def _collect_writer(self, w):
- if isinstance(w, (InfoFileWriter, RestFileWriter, WildcardFileWriter)):
- return
- elif isinstance(w, (NoFilter, PairedNoFilter, PairedDemultiplexer,
- CombinatorialDemultiplexer, Demultiplexer)):
- self.written += w.written
- self.written_bp[0] += w.written_bp[0]
- self.written_bp[1] += w.written_bp[1]
+ if isinstance(w, WithStatistics):
+ self.written += w.written_reads()
+ written_bp = w.written_bp()
+ written_lengths = w.written_lengths()
+ for i in 0, 1:
+ self.written_bp[i] += written_bp[i]
+ self.written_lengths[i] += written_lengths[i]
elif isinstance(w.filter, TooShortReadFilter):
self.too_short = w.filtered
elif isinstance(w.filter, TooLongReadFilter):
@@ -116,7 +117,7 @@ class Statistics:
elif isinstance(w.filter, NContentFilter):
self.too_many_n = w.filtered
- def _collect_modifier(self, m: Modifier):
+ def _collect_modifier(self, m: SingleEndModifier):
if isinstance(m, PairedAdapterCutter):
for i in 0, 1:
self.with_adapters[i] += m.with_adapters
@@ -368,7 +369,7 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str: # no
print_s("Sequence: {}; Type: {}; Length: {}; Trimmed: {} times".
format(adapter_statistics.front.sequence, ADAPTER_TYPE_NAMES[adapter_statistics.where],
len(adapter_statistics.front.sequence), total), end="")
- if reverse_complemented is not None:
+ if stats.reverse_complemented is not None:
print_s("; Reverse-complemented: {} times".format(reverse_complemented))
else:
print_s()
=====================================
tests/cut/maxee.fastq
=====================================
@@ -0,0 +1,8 @@
+@empty
+
++
+
+@ee_0.8
+ACGTTGCA
++
+++++++++
=====================================
tests/data/SRR2040271_1.fastq deleted
=====================================
@@ -1,8 +0,0 @@
-@SRR2040271.1 SN603_WBP007_8_1101_63.30_99.90 length=100
-NTCATTCCATGACATTGTCTGTTGGTTGCTTTTTGAGTATATTTTCTCATGGCTTCATCTATCTTGCTCATAAGACTAAATGGGGAGACAGACTTCCTGG
-+SRR2040271.1 SN603_WBP007_8_1101_63.30_99.90 length=100
-!1=DDFFFGHHHHIJJJJJIIJJJGGHJGJIJJGBGGHEGHJIIIIEHIJEH>@G;FHCG<<BCG:CG@F4@D@=@DG>EE>EED(,,(;((,;;@A>CC
-@SRR2040271.2 SN603_WBP007_8_1101_79.90_99.30 length=100
-NTAAAGACCCTTCAACTCAGGGCTTCTGAACCCTGGCTACCCATTAGAATCACTTAGGGAGCTTTGAAAAATATCAACACCCCAGCCCTAACCCAGCCCA
-+SRR2040271.2 SN603_WBP007_8_1101_79.90_99.30 length=100
-!4=BBDDDFHHHH@FHIIIIIIIIIGIIHGIIIIIIIIHIIIIIICEGGIIIIIIGHIIEGHIGIIIHHHFHEEEEEEECC=B=@BBBCCCBBBBA(88?
=====================================
tests/data/maxee.fastq
=====================================
@@ -0,0 +1,16 @@
+@empty
+
++
+
+@ee_1
+A
++
+!
+@ee_0.8
+ACGTTGCA
++
+++++++++
+@ee_1.01
+TGGACGTTGCA
++
++5+++++++++
=====================================
tests/data/plus.fastq deleted
=====================================
@@ -1,8 +0,0 @@
-@first_sequence some other text
-SEQUENCE1
-+first_sequence some other text
-:6;;8<=:<
-@second_sequence and more text
-SEQUENCE2
-+second_sequence and more text
-83<??:(61
=====================================
tests/test_adapters.py
=====================================
@@ -1,14 +1,14 @@
import pytest
from dnaio import Sequence
-from cutadapt.adapters import SingleAdapter, SingleMatch, Where, LinkedAdapter
+from cutadapt.adapters import SingleAdapter, SingleMatch, Where, LinkedAdapter, WhereToRemove
def test_issue_52():
adapter = SingleAdapter(
sequence='GAACTCCAGTCACNNNNN',
where=Where.BACK,
- remove='suffix',
+ remove=WhereToRemove.SUFFIX,
max_error_rate=0.12,
min_overlap=5,
read_wildcards=False,
@@ -45,7 +45,7 @@ def test_issue_80():
adapter = SingleAdapter(
sequence="TCGTATGCCGTCTTC",
where=Where.BACK,
- remove='suffix',
+ remove=WhereToRemove.SUFFIX,
max_error_rate=0.2,
min_overlap=3,
read_wildcards=False,
@@ -58,7 +58,7 @@ def test_issue_80():
def test_str():
- a = SingleAdapter('ACGT', where=Where.BACK, remove='suffix', max_error_rate=0.1)
+ a = SingleAdapter('ACGT', where=Where.BACK, remove=WhereToRemove.SUFFIX, max_error_rate=0.1)
str(a)
str(a.match_to(Sequence(name='seq', sequence='TTACGT')))
@@ -91,7 +91,7 @@ def test_info_record():
am = SingleMatch(astart=0, astop=17, rstart=5, rstop=21, matches=15, errors=2, remove_before=False,
adapter=adapter, read=read)
assert am.get_info_records() == [[
- "abc",
+ "",
2,
5,
21,
=====================================
tests/test_commandline.py
=====================================
@@ -9,7 +9,7 @@ import pytest
import subprocess
from cutadapt.__main__ import main
-from utils import assert_files_equal, datapath, cutpath, redirect_stderr
+from utils import assert_files_equal, datapath, cutpath
# pytest.mark.timeout will not fail even if pytest-timeout is not installed
try:
@@ -69,19 +69,6 @@ def test_discard_untrimmed(run):
run('-b CAAGAT --discard-untrimmed', 'discard-untrimmed.fastq', 'small.fastq')
-@pytest.mark.skip(reason='Regression since switching to dnaio')
-def test_second_header_retained(run, cores):
- """test if sequence name after the "+" is retained"""
- run("--cores {} -e 0.12 -b TTAGACATATCTCCGTCG".format(cores), "plus.fastq", "plus.fastq")
-
-
-@pytest.mark.skip(reason='Regression since switching to dnaio')
-def test_length_tag_second_header(run, cores):
- """Ensure --length-tag= also modifies the second header line"""
- run("--cores {} -a GGCTTC --length-tag=length=".format(cores),
- 'SRR2040271_1.fastq', 'SRR2040271_1.fastq')
-
-
def test_extensiontxtgz(run):
"""automatic recognition of "_sequence.txt.gz" extension"""
run("-b TTAGACATATCTCCGTCG", "s_1_sequence.txt", "s_1_sequence.txt.gz")
@@ -343,8 +330,16 @@ def test_adapter_with_u(run):
run("-a GCCGAACUUCUUAGACUGCCUUAAGGACGU", "illumina.fastq", "illumina.fastq.gz")
-def test_bzip2(run):
- run('-b TTAGACATATCTCCGTCG', 'small.fastq', 'small.fastq.bz2')
+def test_bzip2_input(run, cores):
+ run(["--cores", str(cores), "-a", "TTAGACATATCTCCGTCG"], "small.fastq", "small.fastq.bz2")
+
+
+@pytest.mark.parametrize("extension", ["bz2", "xz", "gz"])
+def test_compressed_output(tmp_path, cores, extension):
+ out_path = str(tmp_path / ("small.fastq." + extension))
+ params = [
+ "--cores", str(cores), "-a", "TTAGACATATCTCCGTCG", "-o", out_path, datapath("small.fastq")]
+ assert main(params) is None
if sys.version_info[:2] >= (3, 3):
@@ -358,14 +353,12 @@ def test_xz(run):
def test_no_args():
with pytest.raises(SystemExit):
- with redirect_stderr():
- main([])
+ main([])
def test_two_fastqs():
with pytest.raises(SystemExit):
- with redirect_stderr():
- main([datapath('paired.1.fastq'), datapath('paired.2.fastq')])
+ main([datapath('paired.1.fastq'), datapath('paired.2.fastq')])
def test_anchored_no_indels(run):
@@ -386,8 +379,7 @@ def test_anchored_no_indels_wildcard_adapt(run):
def test_non_iupac_characters(run):
with pytest.raises(SystemExit):
- with redirect_stderr():
- main(['-a', 'ZACGT', datapath('small.fastq')])
+ main(['-a', 'ZACGT', datapath('small.fastq')])
def test_unconditional_cut_front(run):
@@ -544,20 +536,17 @@ def test_linked_info_file(tmpdir):
def test_linked_anywhere():
with pytest.raises(SystemExit):
- with redirect_stderr():
- main(['-b', 'AAA...TTT', datapath('linked.fasta')])
+ main(['-b', 'AAA...TTT', datapath('linked.fasta')])
def test_anywhere_anchored_5p():
with pytest.raises(SystemExit):
- with redirect_stderr():
- main(['-b', '^AAA', datapath('small.fastq')])
+ main(['-b', '^AAA', datapath('small.fastq')])
def test_anywhere_anchored_3p():
with pytest.raises(SystemExit):
- with redirect_stderr():
- main(['-b', 'TTT$', datapath('small.fastq')])
+ main(['-b', 'TTT$', datapath('small.fastq')])
def test_fasta(run):
@@ -709,15 +698,41 @@ def test_adapter_order(run):
run("-a CCGGG -g ^AAACC", "adapterorder-ag.fasta", "adapterorder.fasta")
-@pytest.mark.skip(reason="Not implemented")
-def test_reverse_complement_not_normalized(run):
- run("--rc=yes -g ^TTATTTGTCT -g ^TCCGCACTGG",
- "revcomp-notnormalized-single.fastq", "revcomp.1.fastq")
-
-
def test_reverse_complement_normalized(run):
run(
"--revcomp -g ^TTATTTGTCT -g ^TCCGCACTGG",
"revcomp-single-normalize.fastq",
"revcomp.1.fastq",
)
+
+
+def test_reverse_complement_and_info_file(run, tmp_path, cores):
+ info_path = str(tmp_path / "info.txt")
+ run(
+ [
+ "--revcomp",
+ "-g",
+ "^TTATTTGTCT",
+ "-g",
+ "^TCCGCACTGG",
+ "--info-file",
+ info_path,
+ ],
+ "revcomp-single-normalize.fastq",
+ "revcomp.1.fastq",
+ )
+ with open(info_path) as f:
+ lines = f.readlines()
+ assert len(lines) == 6
+ assert lines[0].split("\t")[0] == "read1/1"
+ assert lines[1].split("\t")[0] == "read2/1 rc"
+
+
+def test_max_expected_errors(run, cores):
+ run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+
+
+def test_max_expected_errors_fasta(tmp_path):
+ path = tmp_path / "input.fasta"
+ path.write_text(">read\nACGTACGT\n")
+ main(["--max-ee=0.001", "-o", "/dev/null", str(path)])
=====================================
tests/test_main.py
=====================================
@@ -1,9 +1,46 @@
import pytest
-from cutadapt.__main__ import main
+from cutadapt.__main__ import main, parse_cutoffs, parse_lengths, CommandLineError, setup_logging
def test_help():
with pytest.raises(SystemExit) as e:
main(["--help"])
assert e.value.args[0] == 0
+
+
+def test_parse_cutoffs():
+ assert parse_cutoffs("5") == (0, 5)
+ assert parse_cutoffs("6,7") == (6, 7)
+ with pytest.raises(CommandLineError):
+ parse_cutoffs("a,7")
+ with pytest.raises(CommandLineError):
+ parse_cutoffs("a")
+ with pytest.raises(CommandLineError):
+ parse_cutoffs("a,7")
+ with pytest.raises(CommandLineError):
+ parse_cutoffs("1,2,3")
+
+
+def test_parse_lengths():
+ assert parse_lengths("25") == (25, )
+ assert parse_lengths("17:25") == (17, 25)
+ assert parse_lengths("25:") == (25, None)
+ assert parse_lengths(":25") == (None, 25)
+ with pytest.raises(CommandLineError):
+ parse_lengths("1:2:3")
+ with pytest.raises(CommandLineError):
+ parse_lengths("a:2")
+ with pytest.raises(CommandLineError):
+ parse_lengths("a")
+ with pytest.raises(CommandLineError):
+ parse_lengths("2:a")
+ with pytest.raises(CommandLineError):
+ parse_lengths(":")
+
+
+def test_setup_logging():
+ import logging
+ logger = logging.getLogger(__name__)
+ setup_logging(logger, stdout=True, quiet=False, minimal=False, debug=False)
+ logger.info("Log message")
=====================================
tests/test_paired.py
=====================================
@@ -5,7 +5,7 @@ from itertools import product
import pytest
from cutadapt.__main__ import main
-from utils import assert_files_equal, datapath, cutpath, redirect_stderr
+from utils import assert_files_equal, datapath, cutpath
@pytest.fixture
@@ -131,9 +131,8 @@ def test_no_trimming():
def test_missing_file(tmpdir):
- with redirect_stderr():
- with pytest.raises(SystemExit):
- main(["--paired-output", str(tmpdir.join("out.fastq")), datapath("paired.1.fastq")])
+ with pytest.raises(SystemExit):
+ main(["--paired-output", str(tmpdir.join("out.fastq")), datapath("paired.1.fastq")])
def test_first_too_short(tmpdir, cores):
@@ -144,14 +143,13 @@ def test_first_too_short(tmpdir, cores):
lines = lines[:-4]
trunc1.write("".join(lines))
- with redirect_stderr():
- with pytest.raises(SystemExit):
- main([
- "-o", "/dev/null",
- "--paired-output", str(tmpdir.join("out.fastq")),
- "--cores", str(cores),
- str(trunc1), datapath("paired.2.fastq")
- ])
+ with pytest.raises(SystemExit):
+ main([
+ "-o", "/dev/null",
+ "--paired-output", str(tmpdir.join("out.fastq")),
+ "--cores", str(cores),
+ str(trunc1), datapath("paired.2.fastq")
+ ])
def test_second_too_short(tmpdir, cores):
@@ -162,14 +160,13 @@ def test_second_too_short(tmpdir, cores):
lines = lines[:-4]
trunc2.write("".join(lines))
- with redirect_stderr():
- with pytest.raises(SystemExit):
- main([
- "-o", "/dev/null",
- "--paired-output", str(tmpdir.join("out.fastq")),
- "--cores", str(cores),
- datapath("paired.1.fastq"), str(trunc2)
- ])
+ with pytest.raises(SystemExit):
+ main([
+ "-o", "/dev/null",
+ "--paired-output", str(tmpdir.join("out.fastq")),
+ "--cores", str(cores),
+ datapath("paired.1.fastq"), str(trunc2)
+ ])
def test_unmatched_read_names(tmpdir, cores):
@@ -327,10 +324,9 @@ def test_interleaved_neither_nor(tmpdir):
p1 = str(tmpdir.join("temp-paired.1.fastq"))
p2 = str(tmpdir.join("temp-paired.2.fastq"))
params = "-a XX --interleaved".split()
- with redirect_stderr():
- params += ["-o", p1, "-p1", p2, "paired.1.fastq", "paired.2.fastq"]
- with pytest.raises(SystemExit):
- main(params)
+ params += ["-o", p1, "-p1", p2, "paired.1.fastq", "paired.2.fastq"]
+ with pytest.raises(SystemExit):
+ main(params)
def test_pair_filter_both(run_paired, cores):
@@ -504,25 +500,40 @@ def test_pair_adapters_demultiplexing(tmpdir):
assert_files_equal(cutpath(name), str(tmpdir.join(name)))
-def test_combinatorial_demultiplexing(tmpdir):
+@pytest.mark.parametrize("discarduntrimmed", (False, True))
+def test_combinatorial_demultiplexing(tmpdir, discarduntrimmed):
params = "-g A=^AAAAAAAAAA -g C=^CCCCCCCCCC -G G=^GGGGGGGGGG -G T=^TTTTTTTTTT".split()
params += ["-o", str(tmpdir.join("combinatorial.{name1}_{name2}.1.fasta"))]
params += ["-p", str(tmpdir.join("combinatorial.{name1}_{name2}.2.fasta"))]
params += [datapath("combinatorial.1.fasta"), datapath("combinatorial.2.fasta")]
+ combinations = [
+ # third column says whether the file must exist
+ ("A", "G", True),
+ ("A", "T", True),
+ ("C", "G", True),
+ ("C", "T", True),
+ ]
+ if discarduntrimmed:
+ combinations += [
+ ("unknown", "G", False),
+ ("A", "unknown", False),
+ ]
+ params += ["--discard-untrimmed"]
+ else:
+ combinations += [
+ ("unknown", "G", True),
+ ("A", "unknown", True),
+ ]
assert main(params) is None
- for (name1, name2) in [
- ("A", "G"),
- ("A", "T"),
- ("C", "G"),
- ("C", "T"),
- ("unknown", "G"),
- ("A", "unknown"),
- ]:
+ for (name1, name2, should_exist) in combinations:
for i in (1, 2):
name = "combinatorial.{name1}_{name2}.{i}.fasta".format(name1=name1, name2=name2, i=i)
path = cutpath(os.path.join("combinatorial", name))
- assert tmpdir.join(name).check(), "Output file missing"
- assert_files_equal(path, str(tmpdir.join(name)))
+ if should_exist:
+ assert tmpdir.join(name).check(), "Output file missing"
+ assert_files_equal(path, str(tmpdir.join(name)))
+ else:
+ assert not tmpdir.join(name).check(), "Output file should not exist"
def test_info_file(tmpdir):
=====================================
tests/test_parser.py
=====================================
@@ -4,6 +4,7 @@ import pytest
from dnaio import Sequence
from cutadapt.adapters import Where, WhereToRemove, LinkedAdapter, SingleAdapter
from cutadapt.parser import AdapterParser, AdapterSpecification
+from cutadapt.modifiers import ModificationInfo
def test_expand_braces():
@@ -178,5 +179,5 @@ def test_anywhere_parameter():
read = Sequence('foo1', 'TGAAGTACACGGTTAAAAAAAAAA')
from cutadapt.modifiers import AdapterCutter
cutter = AdapterCutter([adapter])
- trimmed_read = cutter(read, [])
+ trimmed_read = cutter(read, ModificationInfo())
assert trimmed_read.sequence == ''
=====================================
tests/test_qualtrim.py
=====================================
@@ -1,5 +1,7 @@
+import pytest
+
from dnaio import Sequence
-from cutadapt.qualtrim import nextseq_trim_index
+from cutadapt.qualtrim import nextseq_trim_index, expected_errors
def test_nextseq_trim():
@@ -11,3 +13,22 @@ def test_nextseq_trim():
'AA//EAEE//A6///E//A//EA/EEEEEEAEA//EEEEEEEEEEEEEEE###########EE#EA'
)
assert nextseq_trim_index(s, cutoff=22) == 33
+
+
+def test_expected_errors():
+ def encode_qualities(quals):
+ return "".join(chr(q + 33) for q in quals)
+
+ assert pytest.approx(0.0) == expected_errors("")
+
+ assert pytest.approx(0.1) == expected_errors(encode_qualities([10]))
+ assert pytest.approx(0.01) == expected_errors(encode_qualities([20]))
+ assert pytest.approx(0.001) == expected_errors(encode_qualities([30]))
+
+ assert pytest.approx(0.2) == expected_errors(encode_qualities([10, 10]))
+ assert pytest.approx(0.11) == expected_errors(encode_qualities([10, 20]))
+ assert pytest.approx(0.11) == expected_errors(encode_qualities([20, 10]))
+
+ assert pytest.approx(0.3) == expected_errors(encode_qualities([10, 10, 10]))
+ assert pytest.approx(0.111) == expected_errors(encode_qualities([10, 20, 30]))
+ assert pytest.approx(0.2111) == expected_errors(encode_qualities([10, 10, 20, 30, 40]))
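[Editorial note: the `expected_errors` function exercised by the new test above sums the per-base error probabilities implied by Phred+33 qualities (a quality Q encodes an error probability of 10^(-Q/10)). The real implementation is Cython code in src/cutadapt/qualtrim.pyx; a minimal pure-Python sketch of the same calculation, consistent with the test values:]

```python
def expected_errors(qualities: str) -> float:
    """Sum of per-base error probabilities for a Phred+33 quality string.

    Each ASCII character encodes quality Q = ord(c) - 33, and the
    corresponding error probability is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(c) - 33) / 10) for c in qualities)
```

[For example, a single base with quality 10 (character "+") gives 0.1, and qualities 10 and 20 together give 0.11, matching the assertions above.]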
=====================================
tests/test_trim.py
=====================================
@@ -1,13 +1,13 @@
from dnaio import Sequence
from cutadapt.adapters import SingleAdapter, Where
-from cutadapt.modifiers import AdapterCutter
+from cutadapt.modifiers import AdapterCutter, ModificationInfo
def test_statistics():
read = Sequence('name', 'AAAACCCCAAAA')
adapters = [SingleAdapter('CCCC', Where.BACK, max_error_rate=0.1)]
cutter = AdapterCutter(adapters, times=3)
- cutter(read, [])
+ cutter(read, ModificationInfo())
# TODO make this a lot simpler
trimmed_bp = 0
for adapter in adapters:
@@ -29,7 +29,7 @@ def test_end_trim_with_mismatch():
read = Sequence('foo1', 'AAAAAAAAAAATCGTCGATC')
cutter = AdapterCutter([adapter], times=1)
- trimmed_read = cutter(read, [])
+ trimmed_read = cutter(read, ModificationInfo())
assert trimmed_read.sequence == 'AAAAAAAAAAA'
assert cutter.adapter_statistics[adapter].back.lengths == {9: 1}
@@ -39,7 +39,7 @@ def test_end_trim_with_mismatch():
read = Sequence('foo2', 'AAAAAAAAAAATCGAACGA')
cutter = AdapterCutter([adapter], times=1)
- trimmed_read = cutter(read, [])
+ trimmed_read = cutter(read, ModificationInfo())
assert trimmed_read.sequence == read.sequence
assert cutter.adapter_statistics[adapter].back.lengths == {}
@@ -57,5 +57,5 @@ def test_anywhere_with_errors():
):
read = Sequence('foo', seq)
cutter = AdapterCutter([adapter], times=1)
- trimmed_read = cutter(read, [])
+ trimmed_read = cutter(read, ModificationInfo())
assert trimmed_read.sequence == expected_trimmed
=====================================
tests/utils.py
=====================================
@@ -1,16 +1,5 @@
import os.path
import subprocess
-import sys
-from contextlib import contextmanager
-
-
-@contextmanager
-def redirect_stderr():
- """Send stderr to stdout. Nose doesn't capture stderr, yet."""
- old_stderr = sys.stderr
- sys.stderr = sys.stdout
- yield
- sys.stderr = old_stderr
def datapath(path):
=====================================
tox.ini
=====================================
@@ -8,6 +8,7 @@ deps =
pytest
pytest-timeout
pytest-mock
+setenv = PYTHONDEVMODE = 1
commands =
coverage run --concurrency=multiprocessing -m pytest --doctest-modules --pyargs cutadapt tests
coverage combine
@@ -31,10 +32,6 @@ basepython = python3.6
deps = mypy
commands = mypy src/
-[travis]
-python =
- 3.6: py36, docs, mypy
-
[coverage:run]
parallel = True
include =
@@ -48,6 +45,6 @@ source =
[flake8]
max-line-length = 120
-max-complexity = 20
+max-complexity = 16
select = E,F,W,C90,W504
extend_ignore = E128,E131,W503,E203
View it on GitLab: https://salsa.debian.org/med-team/python-cutadapt/-/commit/afc23835c3b1b22776876ed5f9011295b0059b30