[med-svn] [Git][med-team/python-cutadapt][upstream] New upstream version 2.9

Steffen Möller gitlab at salsa.debian.org
Tue Mar 31 15:32:33 BST 2020



Steffen Möller pushed to branch upstream at Debian Med / python-cutadapt


Commits:
afc23835 by Steffen Moeller at 2020-03-31T16:14:54+02:00
New upstream version 2.9
- - - - -


28 changed files:

- .gitignore
- .travis.yml
- CHANGES.rst
- LICENSE
- doc/conf.py
- doc/guide.rst
- doc/installation.rst
- pyproject.toml
- src/cutadapt/__main__.py
- src/cutadapt/adapters.py
- src/cutadapt/filters.py
- src/cutadapt/modifiers.py
- src/cutadapt/pipeline.py
- src/cutadapt/qualtrim.pyx
- src/cutadapt/report.py
- + tests/cut/maxee.fastq
- − tests/data/SRR2040271_1.fastq
- + tests/data/maxee.fastq
- − tests/data/plus.fastq
- tests/test_adapters.py
- tests/test_commandline.py
- tests/test_main.py
- tests/test_paired.py
- tests/test_parser.py
- tests/test_qualtrim.py
- tests/test_trim.py
- tests/utils.py
- tox.ini


Changes:

=====================================
.gitignore
=====================================
@@ -15,3 +15,4 @@ __pycache__/
 /doc/_build
 /src/cutadapt.egg-info/
 src/cutadapt/_version.py
+.mypy_cache


=====================================
.travis.yml
=====================================
@@ -1,20 +1,11 @@
 language: python
 
-dist: xenial
-
 cache:
   directories:
     - $HOME/.cache/pip
 
-python:
-  - "3.5"
-  - "3.6"
-  - "3.7"
-  - "3.8"
-  - "nightly"
-
 install:
-  - pip install 'Cython>=0.28' tox-travis
+  - pip install tox
 
 script:
   - tox
@@ -31,6 +22,30 @@ env:
 
 jobs:
   include:
+    - python: "3.5"
+      env: TOXENV=py35
+
+    - python: "3.6"
+      env: TOXENV=py36
+
+    - python: "3.7"
+      env: TOXENV=py37
+
+    - python: "3.8"
+      env: TOXENV=py38
+
+    - name: flake8
+      python: "3.6"
+      env: TOXENV=flake8
+
+    - name: mypy
+      python: "3.6"
+      env: TOXENV=mypy
+
+    - name: docs
+      python: "3.6"
+      env: TOXENV=docs
+
     - stage: deploy
       services:
         - docker
@@ -44,10 +59,5 @@ jobs:
           ls -l dist/
           python3 -m twine upload dist/*
 
-    - name: flake8
-      python: "3.6"
-      install: python3 -m pip install flake8
-      script: flake8 src/ tests/
-
   allow_failures:
     - python: "nightly"

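The jobs above only set a ``TOXENV`` variable and run ``tox``; the commands actually executed come from ``tox.ini``. A minimal sketch of a matching ``tox.ini`` (assumed for illustration, not necessarily the project's real file) with an envlist covering those names:

```ini
# Hypothetical tox.ini sketch matching the TOXENV names in the Travis jobs.
[tox]
envlist = py35,py36,py37,py38,flake8,mypy,docs

[testenv]
deps = pytest
commands = pytest tests/

[testenv:flake8]
deps = flake8
commands = flake8 src/ tests/

[testenv:mypy]
deps = mypy
commands = mypy src/
```

With this layout, ``tox -e py37`` locally runs the same checks as the ``TOXENV=py37`` Travis job.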

=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,17 @@
 Changes
 =======
 
+v2.9 (2020-03-18)
+-----------------
+
+* :issue:`441`: Add a ``--max-ee`` (or ``--max-expected-errors``) option
+  for filtering reads whose number of expected errors exceeds the given
+  threshold. The idea comes from
+  `Edgar et al. (2015) <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>`_.
+* :issue:`438`: The info file now contains the `` rc`` suffix that is added to
+  the names of reverse-complemented reads (with ``--revcomp``).
+* :issue:`448`: ``.bz2`` and ``.xz`` output wasn’t possible in multi-core mode.
+
 v2.8 (2020-01-13)
 -----------------
 


=====================================
LICENSE
=====================================
@@ -1,4 +1,4 @@
-Copyright (c) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+Copyright (c) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal


=====================================
doc/conf.py
=====================================
@@ -14,6 +14,7 @@
 
 import sys
 import os
+import time
 
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
@@ -46,8 +47,8 @@ source_suffix = '.rst'
 master_doc = 'index'
 
 # General information about the project.
-project = u'Cutadapt'
-copyright = u'2010-2019, Marcel Martin'
+project = 'Cutadapt'
+copyright = '2010-{}, Marcel Martin'.format(time.gmtime().tm_year)
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the


=====================================
doc/guide.rst
=====================================
@@ -807,10 +807,12 @@ Searching reverse complements
 -----------------------------
 
 .. note::
-    Option ``--revcomp`` is added on a tentative basis. Its behaviour
+    Option ``--revcomp`` is added on a tentative basis. Its behaviour may change in the next
+    releases.
 
 
 By default, Cutadapt expects adapters to be given in the same orientation (5' to 3') as the reads.
+That is, neither reads nor adapters are reverse-complemented.
 
 To change this, use option ``--revcomp`` or its abbreviation ``--rc``. If given, Cutadapt searches
 both the read and its reverse complement for adapters. If the reverse complemented read yields
@@ -1122,6 +1124,12 @@ reads. They always discard those reads for which the filtering criterion applies
     Discard reads with more than COUNT ``N`` bases. If ``COUNT_or_FRACTION`` is
     a number between 0 and 1, it is interpreted as a fraction of the read length
 
+``--max-expected-errors ERRORS`` or ``--max-ee ERRORS``
+    Discard reads with more than ERRORS expected errors. The number of expected
+    errors is computed as described in
+    `Edgar et al. (2015) <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>`_
+    (Section 2.2).
+
 ``--discard-casava``
     Discard reads that did not pass CASAVA filtering. Illumina’s CASAVA pipeline in
     version 1.8 adds an *is_filtered* header field to each read. Specifying this

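The expected-errors computation referenced above can be sketched in plain Python. Cutadapt's actual implementation lives in ``src/cutadapt/qualtrim.pyx``; the helper name and the Phred+33 encoding here are illustrative assumptions:

```python
# Sketch of expected-error counting (Edgar & Flyvbjerg 2015, Section 2.2):
# each Phred quality Q encodes an error probability of 10**(-Q/10), and the
# expected number of errors in a read is the sum of these probabilities.

def expected_errors(qualities: str, base: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred+33 quality string."""
    return sum(10 ** (-(ord(ch) - base) / 10) for ch in qualities)

# Ten bases at Q20 (ASCII '5' in Phred+33) each have an error probability
# of 0.01, so the read has about 0.1 expected errors in total.
print(expected_errors("5" * 10))  # approximately 0.1
```

A read is then discarded by ``--max-expected-errors E`` when this sum exceeds ``E``.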

=====================================
doc/installation.rst
=====================================
@@ -11,7 +11,7 @@ Quick installation
 
 The easiest way to install Cutadapt is to use ``pip3`` on the command line::
 
-    pip3 install --user --upgrade cutadapt
+    python3 -m pip install --user --upgrade cutadapt
 
 This will download the software from `PyPI (the Python packaging
 index) <https://pypi.python.org/pypi/cutadapt/>`_, and
@@ -28,14 +28,30 @@ If you want to avoid typing the full path, add the directory
 Installation with conda
 -----------------------
 
-Alternatively, Cutadapt is available as a conda package from the
-`bioconda channel <https://bioconda.github.io/>`_. If you do not have conda,
-`install miniconda <http://conda.pydata.org/miniconda.html>`_ first.
-Then install Cutadapt like this::
+Alternatively, Cutadapt is available as a Conda package from the
+`bioconda channel <https://bioconda.github.io/>`_.
+`Install miniconda <http://conda.pydata.org/miniconda.html>`_ if
+you don’t have Conda. Then follow the `Bioconda installation
+instructions <https://bioconda.github.io/user/install.html>`_ (in particular,
+make sure you have both `bioconda` and `conda-forge` in your channels list).
 
-    conda install -c bioconda cutadapt
+To then install Cutadapt into a new Conda environment, use this command::
 
-If neither ``pip`` nor ``conda`` installation works, keep reading.
+    conda create -n cutadaptenv cutadapt
+
+Here, ``cutadaptenv`` is the name of the Conda environment. (You can
+choose a different name.)
+
+An environment needs to be activated every time you want to use the
+programs in it::
+
+    conda activate cutadaptenv
+
+Finally, check whether it worked::
+
+    cutadapt --version
+
+This should show the Cutadapt version number.
 
 
 Installation on a Debian-based Linux distribution


=====================================
pyproject.toml
=====================================
@@ -1,2 +1,5 @@
 [build-system]
 requires = ["setuptools", "wheel", "setuptools_scm", "cython"]
+
+[tool.black]
+line-length = 100


=====================================
src/cutadapt/__main__.py
=====================================
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+# Copyright (c) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -23,7 +23,7 @@
 """
 cutadapt version {version}
 
-Copyright (C) 2010-2019 Marcel Martin <marcel.martin at scilifelab.se>
+Copyright (C) 2010-2020 Marcel Martin <marcel.martin at scilifelab.se>
 
 cutadapt removes adapter sequences from high-throughput sequencing reads.
 
@@ -57,7 +57,7 @@ import sys
 import time
 import logging
 import platform
-from typing import Tuple, Optional, Sequence, List, Any, Iterator, Union
+from typing import Tuple, Optional, Sequence, List, Any, Iterator, Union, Type
 from argparse import ArgumentParser, SUPPRESS, HelpFormatter
 
 import dnaio
@@ -65,13 +65,13 @@ import dnaio
 from cutadapt import __version__
 from cutadapt.adapters import warn_duplicate_adapters, Adapter
 from cutadapt.parser import AdapterParser
-from cutadapt.modifiers import (Modifier, LengthTagModifier, SuffixRemover, PrefixSuffixAdder,
+from cutadapt.modifiers import (SingleEndModifier, LengthTagModifier, SuffixRemover, PrefixSuffixAdder,
     ZeroCapper, QualityTrimmer, UnconditionalCutter, NEndTrimmer, AdapterCutter,
     PairedAdapterCutterError, PairedAdapterCutter, NextseqQualityTrimmer, Shortener,
     ReverseComplementer)
 from cutadapt.report import full_report, minimal_report
 from cutadapt.pipeline import (Pipeline, SingleEndPipeline, PairedEndPipeline, InputFiles,
-    OutputFiles, SerialPipelineRunner, ParallelPipelineRunner)
+    OutputFiles, PipelineRunner, SerialPipelineRunner, ParallelPipelineRunner)
 from cutadapt.utils import available_cpu_count, Progress, DummyProgress, FileOpener
 from cutadapt.log import setup_logging, REPORT
 
@@ -248,6 +248,10 @@ def get_argument_parser() -> ArgumentParser:
     group.add_argument("--max-n", type=float, default=None, metavar="COUNT",
         help="Discard reads with more than COUNT 'N' bases. If COUNT is a number "
              "between 0 and 1, it is interpreted as a fraction of the read length.")
+    group.add_argument("--max-expected-errors", "--max-ee", type=float, default=None,
+        metavar="ERRORS",
+        help="Discard reads whose expected number of errors (computed "
+            "from quality values) exceeds ERRORS.")
     group.add_argument("--discard-trimmed", "--discard", action='store_true', default=False,
         help="Discard reads that contain an adapter. Use also -O to avoid "
             "discarding too many randomly matching reads.")
@@ -672,6 +676,7 @@ def pipeline_from_parsed_args(args, paired, file_opener) -> Pipeline:
                 lengths = (lengths[0], lengths[0])
             setattr(pipeline, attr, lengths)
     pipeline.max_n = args.max_n
+    pipeline.max_expected_errors = args.max_expected_errors
     pipeline.discard_casava = args.discard_casava
     pipeline.discard_trimmed = args.discard_trimmed
     pipeline.discard_untrimmed = args.discard_untrimmed
@@ -764,7 +769,7 @@ def add_adapter_cutter(
             pipeline.add(modifier)
 
 
-def modifiers_applying_to_both_ends_if_paired(args) -> Iterator[Modifier]:
+def modifiers_applying_to_both_ends_if_paired(args) -> Iterator[SingleEndModifier]:
     if args.length is not None:
         yield Shortener(args.length)
     if args.trim_n:
@@ -836,6 +841,11 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
     cores = available_cpu_count() if args.cores == 0 else args.cores
     file_opener = FileOpener(
         compression_level=args.compression_level, threads=0 if cores == 1 else None)
+    if sys.stderr.isatty() and not args.quiet:
+        progress = Progress()
+    else:
+        progress = DummyProgress()
+
     try:
         is_interleaved_input, is_interleaved_output = determine_interleaved(args)
         input_filename, input_paired_filename = setup_input_files(args.inputs,
@@ -843,40 +853,19 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
         check_arguments(args, paired, is_interleaved_output)
         pipeline = pipeline_from_parsed_args(args, paired, file_opener)
         outfiles = open_output_files(args, default_outfile, is_interleaved_output, file_opener)
+        infiles = InputFiles(input_filename, file2=input_paired_filename,
+            interleaved=is_interleaved_input)
+        runner = setup_runner(pipeline, infiles, outfiles, progress, cores, args.buffer_size)
     except CommandLineError as e:
         parser.error(str(e))
         return  # avoid IDE warnings below
 
-    if cores > 1:
-        if ParallelPipelineRunner.can_output_to(outfiles):
-            runner_class = ParallelPipelineRunner
-            runner_kwargs = dict(n_workers=cores, buffer_size=args.buffer_size)
-        else:
-            parser.error("Running in parallel is currently not supported "
-                "when using --format or when demultiplexing.\n"
-                "Omit --cores/-j to continue.")
-            return  # avoid IDE warnings below
-    else:
-        runner_class = SerialPipelineRunner
-        runner_kwargs = dict()
-    infiles = InputFiles(input_filename, file2=input_paired_filename,
-            interleaved=is_interleaved_input)
-    if sys.stderr.isatty() and not args.quiet:
-        progress = Progress()
-    else:
-        progress = DummyProgress()
-    try:
-        runner = runner_class(pipeline, infiles, outfiles, progress, **runner_kwargs)
-    except (dnaio.UnknownFileFormat, IOError) as e:
-        parser.error(e)
-        return  # avoid IDE warnings below
-
     logger.info("Processing reads on %d core%s in %s mode ...",
         cores, 's' if cores > 1 else '',
         {False: 'single-end', True: 'paired-end'}[pipeline.paired])
     try:
-        stats = runner.run()
-        runner.close()
+        with runner as r:
+            stats = r.run()
     except KeyboardInterrupt:
         print("Interrupted", file=sys.stderr)
         sys.exit(130)
@@ -897,5 +886,26 @@ def main(cmdlineargs=None, default_outfile=sys.stdout.buffer):
         pstats.Stats(profiler).sort_stats('time').print_stats(20)
 
 
+def setup_runner(pipeline: Pipeline, infiles, outfiles, progress, cores, buffer_size):
+    if cores > 1:
+        if ParallelPipelineRunner.can_output_to(outfiles):
+            runner_class = ParallelPipelineRunner  # type: Type[PipelineRunner]
+            runner_kwargs = dict(n_workers=cores, buffer_size=buffer_size)
+        else:
+            raise CommandLineError("Running in parallel is currently not supported "
+                 "when using --format or when demultiplexing.\n"
+                 "Omit --cores/-j to continue.")
+            # return  # avoid IDE warnings below
+    else:
+        runner_class = SerialPipelineRunner
+        runner_kwargs = dict()
+    try:
+        runner = runner_class(pipeline, infiles, outfiles, progress, **runner_kwargs)
+    except (dnaio.UnknownFileFormat, IOError) as e:
+        raise CommandLineError(e)
+
+    return runner
+
+
 if __name__ == '__main__':
     main()

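The refactored ``main()`` above replaces the explicit ``runner.run()``/``runner.close()`` pair with ``with runner as r: stats = r.run()``, so the runner is closed even when processing raises. A toy sketch of that context-manager pattern, with all names (``DemoRunner`` and its methods) hypothetical rather than Cutadapt's API:

```python
# Minimal context-manager runner: __exit__ guarantees close() is called,
# which is what lets main() drop the separate runner.close() call.

class DemoRunner:
    def __init__(self):
        self.closed = False

    def run(self) -> str:
        return "stats"  # stand-in for the real statistics object

    def close(self) -> None:
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs on normal exit and on exceptions alike

runner = DemoRunner()
with runner as r:
    stats = r.run()
# at this point close() has already been invoked
```

Raising ``CommandLineError`` from ``setup_runner`` instead of calling ``parser.error`` directly also keeps the error path inside the single ``try``/``except`` in ``main()``.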

=====================================
src/cutadapt/adapters.py
=====================================
@@ -181,6 +181,7 @@ class AdapterStatistics:
 
 
 class Match(ABC):
+
     @abstractmethod
     def remainder_interval(self) -> Tuple[int, int]:
         pass
@@ -279,7 +280,7 @@ class SingleMatch(Match):
         seq = self.read.sequence
         qualities = self.read.qualities
         info = [
-            self.read.name,
+            "",
             self.errors,
             self.rstart,
             self.rstop,
@@ -381,6 +382,7 @@ class SingleAdapter(Adapter):
         indels: bool = True,
     ):
         super().__init__()
+        assert not isinstance(remove, str)
         self._debug = False  # type: bool
         self.name = _generate_adapter_name() if name is None else name  # type: str
         self.sequence = sequence.upper().replace("U", "T")  # type: str
@@ -703,8 +705,16 @@ class MultiAdapter(Adapter):
             self._accept(adapter)
             if adapter.where is not self._where:
                 raise ValueError("All adapters must have identical 'where' attributes")
+        assert self._where in (Where.PREFIX, Where.SUFFIX)
         self._adapters = adapters
-        self._longest, self._index = self._make_index()
+        self._lengths, self._index = self._make_index()
+        if self._where is Where.PREFIX:
+            def make_affix(read, n):
+                return read.sequence[:n]
+        else:
+            def make_affix(read, n):
+                return read.sequence[-n:]
+        self._make_affix = make_affix
 
     def __repr__(self):
         return 'MultiAdapter(adapters={!r}, where={})'.format(self._adapters, self._where)
@@ -741,7 +751,7 @@ class MultiAdapter(Adapter):
     def _make_index(self):
         logger.info('Building index of %s adapters ...', len(self._adapters))
         index = dict()
-        longest = 0
+        lengths = set()
         has_warned = False
         for adapter in self._adapters:
             sequence = adapter.sequence
@@ -762,35 +772,27 @@ class MultiAdapter(Adapter):
                         has_warned = True
                 else:
                     index[s] = (adapter, errors, matches)
-                longest = max(longest, len(s))
+                lengths.add(len(s))
         logger.info('Built an index containing %s strings.', len(index))
 
-        return longest, index
+        return sorted(lengths, reverse=True), index
 
     def match_to(self, read):
         """
         Match the adapters against the read and return a Match that represents
         the best match or None if no match was found
         """
-        if self._where is Where.PREFIX:
-            def make_affix(n):
-                return read.sequence[:n]
-        else:
-            def make_affix(n):
-                return read.sequence[-n:]
-
-        # Check all the prefixes of the read that could match
+        # Check all the prefixes or suffixes (affixes) of the read that could match
         best_adapter = None
         best_length = 0
         best_m = -1
         best_e = 1000
-        # TODO do not go through all the lengths, only those that actually exist in the index
-        for length in range(self._longest, -1, -1):
+        for length in self._lengths:
             if length < best_m:
                 # No chance of getting the same or a higher number of matches, so we can stop early
                 break
 
-            affix = make_affix(length)
+            affix = self._make_affix(read, length)
             try:
                 adapter, e, m = self._index[affix]
             except KeyError:

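The new ``match_to`` code above no longer tries every affix length from the longest down to zero; it only probes the lengths that actually occur in the index, sorted longest-first. A toy sketch of that lookup strategy (the index contents and helper name are made up for illustration, and the error/match bookkeeping of the real code is omitted):

```python
# Exact-match index: every allowed read prefix maps to its adapter.
index = {
    "ACGT": "adapter_a",   # full-length prefix
    "ACG": "adapter_a",    # prefix with the last base missing
    "TTAG": "adapter_b",
}
# Only the lengths present in the index are probed, longest first.
lengths = sorted({len(s) for s in index}, reverse=True)

def match_prefix(sequence: str):
    for n in lengths:                  # longest candidate first
        hit = index.get(sequence[:n])  # one dict lookup per length
        if hit is not None:
            return hit, n
    return None

print(match_prefix("ACGTAAAA"))  # ('adapter_a', 4)
```

Building ``make_affix`` once in ``__init__`` (prefix vs. suffix slicing) instead of per ``match_to`` call likewise moves a branch out of the hot loop.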

=====================================
src/cutadapt/filters.py
=====================================
@@ -13,9 +13,13 @@ filters is created and each redirector is called in turn until one returns True.
 The read is then assumed to have been "consumed", that is, either written
 somewhere or filtered (should be discarded).
 """
+from collections import Counter
 from abc import ABC, abstractmethod
-from typing import List
-from .adapters import Match
+from typing import List, Tuple, Optional, Dict, Any
+
+from .qualtrim import expected_errors
+from .utils import FileOpener
+from .modifiers import ModificationInfo
 
 
 # Constants used when returning from a Filter’s __call__ method to improve
@@ -26,81 +30,113 @@ KEEP = False
 
 class SingleEndFilter(ABC):
     @abstractmethod
-    def __call__(self, read, matches):
-        pass
+    def __call__(self, read, info: ModificationInfo):
+        """
+        Called to process a single-end read
+
+        Any adapter matches are appended to the matches list.
+        """
 
 
 class PairedEndFilter(ABC):
     @abstractmethod
-    def __call__(self, read1, matches1, read2, matches2):
-        pass
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        """
+        Called to process the read pair (read1, read2)
+
+        Any adapter matches are appended to the info.matches list.
+        """
+
+
+class WithStatistics(ABC):
+    def __init__(self) -> None:
+        self._written = 0  # no. of written reads or read pairs
+        self._written_bp = [0, 0]
+        self._written_lengths = [Counter(), Counter()]  # type: List[Counter]
+
+    def written_reads(self) -> int:
+        return self._written
+
+    def written_bp(self) -> Tuple[int, ...]:
+        return tuple(self._written_bp)
+
+    def written_lengths(self) -> Tuple[Counter, Counter]:
+        return (self._written_lengths[0].copy(), self._written_lengths[1].copy())
+
+
+class SingleEndFilterWithStatistics(SingleEndFilter, WithStatistics, ABC):
+    def __init__(self):
+        super().__init__()
+
+    def update_statistics(self, read) -> None:
+        self._written += 1
+        self._written_bp[0] += len(read)
+        self._written_lengths[0][len(read)] += 1
 
 
-class NoFilter(SingleEndFilter):
+class PairedEndFilterWithStatistics(PairedEndFilter, WithStatistics, ABC):
+    def __init__(self):
+        super().__init__()
+
+    def update_statistics(self, read1, read2):
+        self._written += 1
+        self._written_bp[0] += len(read1)
+        self._written_bp[1] += len(read2)
+        self._written_lengths[0][len(read1)] += 1
+        self._written_lengths[1][len(read2)] += 1
+
+
+class NoFilter(SingleEndFilterWithStatistics):
     """
     No filtering, just send each read to the given writer.
     """
     def __init__(self, writer):
+        super().__init__()
         self.writer = writer
-        self.written = 0  # no of written reads  TODO move to writer
-        self.written_bp = [0, 0]
-
-    @property
-    def filtered(self):
-        return 0
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         self.writer.write(read)
-        self.written += 1
-        self.written_bp[0] += len(read)
+        self.update_statistics(read)
         return DISCARD
 
 
-class PairedNoFilter(PairedEndFilter):
+class PairedNoFilter(PairedEndFilterWithStatistics):
     """
     No filtering, just send each paired-end read to the given writer.
     """
     def __init__(self, writer):
+        super().__init__()
         self.writer = writer
-        self.written = 0  # no of written reads or read pairs  TODO move to writer
-        self.written_bp = [0, 0]
 
-    @property
-    def filtered(self):
-        return 0
-
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
         self.writer.write(read1, read2)
-        self.written += 1
-        self.written_bp[0] += len(read1)
-        self.written_bp[1] += len(read2)
+        self.update_statistics(read1, read2)
+
         return DISCARD
 
 
-class Redirector(SingleEndFilter):
+class Redirector(SingleEndFilterWithStatistics):
     """
     Redirect discarded reads to the given writer. This is for single-end reads.
     """
     def __init__(self, writer, filter: SingleEndFilter, filter2=None):
+        super().__init__()
         # TODO filter2 should really not be here
         self.filtered = 0
         self.writer = writer
         self.filter = filter
-        self.written = 0  # no of written reads  TODO move to writer
-        self.written_bp = [0, 0]
 
-    def __call__(self, read, matches):
-        if self.filter(read, matches):
+    def __call__(self, read, info: ModificationInfo):
+        if self.filter(read, info):
             self.filtered += 1
             if self.writer is not None:
                 self.writer.write(read)
-                self.written += 1
-                self.written_bp[0] += len(read)
+                self.update_statistics(read)
             return DISCARD
         return KEEP
 
 
-class PairedRedirector(PairedEndFilter):
+class PairedRedirector(PairedEndFilterWithStatistics):
     """
     Redirect paired-end reads matching a filtering criterion to a writer.
     Different filtering styles are supported, differing by which of the
@@ -113,14 +149,13 @@ class PairedRedirector(PairedEndFilter):
             'both': The pair is discarded if both reads match.
             'first': The pair is discarded if the first read matches.
         """
+        super().__init__()
         if pair_filter_mode not in ('any', 'both', 'first'):
             raise ValueError("pair_filter_mode must be 'any', 'both' or 'first'")
         self.filtered = 0
         self.writer = writer
         self.filter = filter
         self.filter2 = filter2
-        self.written = 0  # no of written reads or read pairs  TODO move to writer
-        self.written_bp = [0, 0]
         if filter2 is None:
             self._is_filtered = self._is_filtered_first
         elif filter is None:
@@ -132,26 +167,24 @@ class PairedRedirector(PairedEndFilter):
         else:
             self._is_filtered = self._is_filtered_first
 
-    def _is_filtered_any(self, read1, read2, matches1, matches2):
-        return self.filter(read1, matches1) or self.filter2(read2, matches2)
+    def _is_filtered_any(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        return self.filter(read1, info1) or self.filter2(read2, info2)
 
-    def _is_filtered_both(self, read1, read2, matches1, matches2):
-        return self.filter(read1, matches1) and self.filter2(read2, matches2)
+    def _is_filtered_both(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        return self.filter(read1, info1) and self.filter2(read2, info2)
 
-    def _is_filtered_first(self, read1, read2, matches1, matches2):
-        return self.filter(read1, matches1)
+    def _is_filtered_first(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        return self.filter(read1, info1)
 
-    def _is_filtered_second(self, read1, read2, matches1, matches2):
-        return self.filter2(read2, matches2)
+    def _is_filtered_second(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        return self.filter2(read2, info2)
 
-    def __call__(self, read1, read2, matches1, matches2):
-        if self._is_filtered(read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
+        if self._is_filtered(read1, read2, info1, info2):
             self.filtered += 1
             if self.writer is not None:
                 self.writer.write(read1, read2)
-                self.written += 1
-                self.written_bp[0] += len(read1)
-                self.written_bp[1] += len(read2)
+                self.update_statistics(read1, read2)
             return DISCARD
         return KEEP
 
@@ -160,7 +193,7 @@ class TooShortReadFilter(SingleEndFilter):
     def __init__(self, minimum_length):
         self.minimum_length = minimum_length
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         return len(read) < self.minimum_length
 
 
@@ -168,13 +201,29 @@ class TooLongReadFilter(SingleEndFilter):
     def __init__(self, maximum_length):
         self.maximum_length = maximum_length
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         return len(read) > self.maximum_length
 
 
+class MaximumExpectedErrorsFilter(SingleEndFilter):
+    """
+    Discard reads whose expected number of errors, according to the quality
+    values, exceeds a threshold.
+
+    The idea comes from usearch's -fastq_maxee parameter
+    (http://drive5.com/usearch/).
+    """
+    def __init__(self, max_errors):
+        self.max_errors = max_errors
+
+    def __call__(self, read, info: ModificationInfo):
+        """Return True when the read should be discarded"""
+        return expected_errors(read.qualities) > self.max_errors
+
+
 class NContentFilter(SingleEndFilter):
     """
-    Discards a reads that has a number of 'N's over a given threshold. It handles both raw counts
+    Discard a read if it has too many 'N' bases. It handles both raw counts
     of Ns as well as proportions. Note, for raw counts, it is a 'greater than' comparison,
     so a cutoff of '1' will keep reads with a single N in it.
     """
@@ -187,7 +236,7 @@ class NContentFilter(SingleEndFilter):
         self.is_proportion = count < 1.0
         self.cutoff = count
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         """Return True when the read should be discarded"""
         n_count = read.sequence.lower().count('n')
         if self.is_proportion:
@@ -202,16 +251,16 @@ class DiscardUntrimmedFilter(SingleEndFilter):
     """
     Return True if read is untrimmed.
     """
-    def __call__(self, read, matches):
-        return not matches
+    def __call__(self, read, info: ModificationInfo):
+        return not info.matches
 
 
 class DiscardTrimmedFilter(SingleEndFilter):
     """
     Return True if read is trimmed.
     """
-    def __call__(self, read, matches):
-        return bool(matches)
+    def __call__(self, read, info: ModificationInfo):
+        return bool(info.matches)
 
 
 class CasavaFilter(SingleEndFilter):
@@ -222,12 +271,12 @@ class CasavaFilter(SingleEndFilter):
 
     Reads with unrecognized headers are kept.
     """
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         _, _, right = read.name.partition(' ')
         return right[1:4] == ':Y:'  # discard if :Y: found
 
 
-class Demultiplexer(SingleEndFilter):
+class Demultiplexer(SingleEndFilterWithStatistics):
     """
     Demultiplex trimmed reads. Reads are written to different output files
     depending on which adapter matches. Files are created when the first read
@@ -240,35 +289,32 @@ class Demultiplexer(SingleEndFilter):
         Reads without an adapter match are written to the file named by
         untrimmed_path.
         """
+        super().__init__()
         assert '{name}' in path_template
         self.template = path_template
         self.untrimmed_path = untrimmed_path
         self.untrimmed_writer = None
         self.writers = dict()
-        self.written = 0
-        self.written_bp = [0, 0]
         self.qualities = qualities
         self.file_opener = file_opener
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info):
         """
         Write the read to the proper output file according to the most recent match
         """
-        if matches:
-            name = matches[-1].adapter.name
+        if info.matches:
+            name = info.matches[-1].adapter.name
             if name not in self.writers:
                 self.writers[name] = self.file_opener.dnaio_open_raise_limit(
                     self.template.replace('{name}', name), self.qualities)
-            self.written += 1
-            self.written_bp[0] += len(read)
+            self.update_statistics(read)
             self.writers[name].write(read)
         else:
             if self.untrimmed_writer is None and self.untrimmed_path is not None:
                 self.untrimmed_writer = self.file_opener.dnaio_open_raise_limit(
                     self.untrimmed_path, self.qualities)
             if self.untrimmed_writer is not None:
-                self.written += 1
-                self.written_bp[0] += len(read)
+                self.update_statistics(read)
                 self.untrimmed_writer.write(read)
         return DISCARD
 
@@ -279,7 +325,7 @@ class Demultiplexer(SingleEndFilter):
             self.untrimmed_writer.close()
 
 
-class PairedDemultiplexer(PairedEndFilter):
+class PairedDemultiplexer(PairedEndFilterWithStatistics):
     """
     Demultiplex trimmed paired-end reads. Reads are written to different output files
     depending on which adapter (in read 1) matches.
@@ -292,34 +338,40 @@ class PairedDemultiplexer(PairedEndFilter):
         Read pairs without an adapter match are written to the files named by
         untrimmed_path.
         """
+        super().__init__()
         self._demultiplexer1 = Demultiplexer(path_template, untrimmed_path, qualities, file_opener)
         self._demultiplexer2 = Demultiplexer(path_paired_template, untrimmed_paired_path,
             qualities, file_opener)
 
-    @property
-    def written(self):
-        return self._demultiplexer1.written + self._demultiplexer2.written
+    def written(self) -> int:
+        return self._demultiplexer1._written + self._demultiplexer2._written
 
-    @property
-    def written_bp(self):
-        return [self._demultiplexer1.written_bp[0], self._demultiplexer2.written_bp[0]]
+    def written_bp(self) -> Tuple[int, int]:
+        return (self._demultiplexer1._written_bp[0], self._demultiplexer2._written_bp[0])
 
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
         assert read2 is not None
-        self._demultiplexer1(read1, matches1)
-        self._demultiplexer2(read2, matches1)
+        self._demultiplexer1(read1, info1)
+        self._demultiplexer2(read2, info1)
 
     def close(self):
         self._demultiplexer1.close()
         self._demultiplexer2.close()
 
 
-class CombinatorialDemultiplexer(PairedEndFilter):
+class CombinatorialDemultiplexer(PairedEndFilterWithStatistics):
     """
     Demultiplex reads depending on which adapter matches, taking into account both matches
     on R1 and R2.
     """
-    def __init__(self, path_template, path_paired_template, untrimmed_name, qualities, file_opener):
+    def __init__(
+        self,
+        path_template: str,
+        path_paired_template: str,
+        untrimmed_name: Optional[str],
+        qualities: bool,
+        file_opener: FileOpener,
+    ):
         """
         path_template must contain the strings '{name1}' and '{name2}', which will be replaced
         with the names of the adapters found on R1 and R2, respectively, to form the final output
@@ -327,15 +379,17 @@ class CombinatorialDemultiplexer(PairedEndFilter):
         specified by untrimmed_name. Alternatively, untrimmed_name can be set to None; in that
         case, read pairs for which at least one read does not have an adapter match are
         discarded.
+
+        untrimmed_name -- what to replace the templates with when one or both of the reads
+            do not contain an adapter (use "unknown"). Set to None to discard these read pairs.
         """
+        super().__init__()
         assert '{name1}' in path_template and '{name2}' in path_template
         assert '{name1}' in path_paired_template and '{name2}' in path_paired_template
         self.template = path_template
         self.paired_template = path_paired_template
         self.untrimmed_name = untrimmed_name
-        self.writers = dict()
-        self.written = 0
-        self.written_bp = [0, 0]
+        self.writers = dict()  # type: Dict[Tuple[str, str], Any]
         self.qualities = qualities
         self.file_opener = file_opener
 
@@ -343,16 +397,17 @@ class CombinatorialDemultiplexer(PairedEndFilter):
     def _make_path(template, name1, name2):
         return template.replace('{name1}', name1).replace('{name2}', name2)
 
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1, info2):
         """
         Write the read to the proper output file according to the most recent matches both on
         R1 and R2
         """
         assert read2 is not None
-        name1 = matches1[-1].adapter.name if matches1 else None
-        name2 = matches2[-1].adapter.name if matches2 else None
+        name1 = info1.matches[-1].adapter.name if info1.matches else None
+        name2 = info2.matches[-1].adapter.name if info2.matches else None
         key = (name1, name2)
         if key not in self.writers:
+            # Open writer on first use
             if name1 is None:
                 name1 = self.untrimmed_name
             if name2 is None:
@@ -366,9 +421,7 @@ class CombinatorialDemultiplexer(PairedEndFilter):
                 self.file_opener.dnaio_open_raise_limit(path2, qualities=self.qualities),
             )
         writer1, writer2 = self.writers[key]
-        self.written += 1
-        self.written_bp[0] += len(read1)
-        self.written_bp[1] += len(read2)
+        self.update_statistics(read1, read2)
         writer1.write(read1)
         writer2.write(read2)
         return DISCARD
@@ -383,9 +436,10 @@ class RestFileWriter(SingleEndFilter):
     def __init__(self, file):
         self.file = file
 
-    def __call__(self, read, matches):
-        if matches:
-            rest = matches[-1].rest()
+    def __call__(self, read, info):
+        # TODO this fails with linked adapters
+        if info.matches:
+            rest = info.matches[-1].rest()
             if len(rest) > 0:
                 print(rest, read.name, file=self.file)
         return KEEP
@@ -395,9 +449,10 @@ class WildcardFileWriter(SingleEndFilter):
     def __init__(self, file):
         self.file = file
 
-    def __call__(self, read, matches):
-        if matches:
-            print(matches[-1].wildcards(), read.name, file=self.file)
+    def __call__(self, read, info):
+        # TODO this fails with linked adapters
+        if info.matches:
+            print(info.matches[-1].wildcards(), read.name, file=self.file)
         return KEEP
 
 
@@ -405,11 +460,12 @@ class InfoFileWriter(SingleEndFilter):
     def __init__(self, file):
         self.file = file
 
-    def __call__(self, read, matches: List[Match]):
-        if matches:
-            for match in matches:
+    def __call__(self, read, info: ModificationInfo):
+        if info.matches:
+            for match in info.matches:
                 for info_record in match.get_info_records():
-                    print(*info_record, sep='\t', file=self.file)
+                    # info_record[0] is the read name suffix
+                    print(read.name + info_record[0], *info_record[1:], sep='\t', file=self.file)
         else:
             seq = read.sequence
             qualities = read.qualities if read.qualities is not None else ''
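
For readers skimming the demultiplexer changes above: the `{name}`/`{name1}`/`{name2}` substitution performed by `_make_path` is just a pair of string replacements. A standalone sketch (template and adapter names here are made up for illustration):

```python
# Mirrors CombinatorialDemultiplexer._make_path from the hunk above:
# placeholders are filled with the adapter names matched on R1 and R2
# (or with untrimmed_name when a read had no match).
def make_path(template: str, name1: str, name2: str) -> str:
    return template.replace('{name1}', name1).replace('{name2}', name2)

path = make_path('demux-{name1}-{name2}.1.fastq', 'adapA', 'adapB')
print(path)  # demux-adapA-adapB.1.fastq
```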


=====================================
src/cutadapt/modifiers.py
=====================================
@@ -13,15 +13,27 @@ from .adapters import Where, MultiAdapter, Match, remainder
 from .utils import reverse_complemented_sequence
 
 
-class Modifier(ABC):
+class ModificationInfo:
+    """
+    An object of this class is created for each read that passes through the pipeline.
+    Any information (except the read itself) that needs to be passed from one modifier
+    to one later in the pipeline or from one modifier to the filters is recorded here.
+    """
+    __slots__ = ["matches"]
+
+    def __init__(self):
+        self.matches = []  # type: List[Match]
+
+
+class SingleEndModifier(ABC):
     @abstractmethod
-    def __call__(self, read, matches: List[Match]):
+    def __call__(self, read, info: ModificationInfo):
         pass
 
 
 class PairedModifier(ABC):
     @abstractmethod
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
         pass
 
 
@@ -31,24 +43,26 @@ class PairedModifierWrapper(PairedModifier):
     """
     paired = True
 
-    def __init__(self, modifier1: Optional[Modifier], modifier2: Optional[Modifier]):
+    def __init__(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
         """Set one of the modifiers to None to work on R1 or R2 only"""
         self._modifier1 = modifier1
         self._modifier2 = modifier2
+        if self._modifier1 is None and self._modifier2 is None:
+            raise ValueError("Not both modifiers may be None")
 
     def __repr__(self):
         return 'PairedModifier(modifier1={!r}, modifier2={!r})'.format(
             self._modifier1, self._modifier2)
 
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1: ModificationInfo, info2: ModificationInfo):
         if self._modifier1 is None:
-            return read1, self._modifier2(read2, matches2)
+            return read1, self._modifier2(read2, info2)  # type: ignore
         if self._modifier2 is None:
-            return self._modifier1(read1, matches1), read2
-        return self._modifier1(read1, matches1), self._modifier2(read2, matches2)
+            return self._modifier1(read1, info1), read2
+        return self._modifier1(read1, info1), self._modifier2(read2, info2)
 
 
-class AdapterCutter(Modifier):
+class AdapterCutter(SingleEndModifier):
     """
     Repeatedly find one of multiple adapters in reads.
     The number of times the search is repeated is specified by the
@@ -148,13 +162,13 @@ class AdapterCutter(Modifier):
         trimmed_read.qualities = read.qualities
         return trimmed_read
 
-    def __call__(self, read, inmatches: List[Match]):
+    def __call__(self, read, info: ModificationInfo):
         trimmed_read, matches = self.match_and_trim(read)
         if matches:
             self.with_adapters += 1
             for match in matches:
                 match.update_statistics(self.adapter_statistics[match.adapter])
-        inmatches.extend(matches)
+        info.matches.extend(matches)  # TODO extend or overwrite?
         return trimmed_read
 
     def match_and_trim(self, read):
@@ -198,7 +212,7 @@ class AdapterCutter(Modifier):
         return trimmed_read, matches
 
 
-class ReverseComplementer(Modifier):
+class ReverseComplementer(SingleEndModifier):
     """Trim adapters from a read and its reverse complement"""
 
     def __init__(self, adapter_cutter: AdapterCutter, rc_suffix: Optional[str] = " rc"):
@@ -209,7 +223,7 @@ class ReverseComplementer(Modifier):
         self.reverse_complemented = 0
         self._suffix = rc_suffix
 
-    def __call__(self, read, inmatches: List[Match]):
+    def __call__(self, read, info: ModificationInfo):
         reverse_read = reverse_complemented_sequence(read)
 
         forward_trimmed_read, forward_matches = self.adapter_cutter.match_and_trim(read)
@@ -234,7 +248,7 @@ class ReverseComplementer(Modifier):
                 stats = self.adapter_cutter.adapter_statistics[match.adapter]
                 match.update_statistics(stats)
                 stats.reverse_complemented += bool(use_reverse_complement)
-            inmatches.extend(matches)
+            info.matches.extend(matches)  # TODO extend or overwrite?
         return trimmed_read
 
 
@@ -277,7 +291,7 @@ class PairedAdapterCutter(PairedModifier):
         return 'PairedAdapterCutter(adapters1={!r}, adapters2={!r})'.format(
             self._adapters1, self._adapters2)
 
-    def __call__(self, read1, read2, matches1, matches2):
+    def __call__(self, read1, read2, info1, info2):
         """
         """
         match1 = AdapterCutter.best_match(self._adapters1, read1)
@@ -310,12 +324,12 @@ class PairedAdapterCutter(PairedModifier):
             elif self.action is None:  # --no-trim
                 trimmed_read = read[:]
             result.append(trimmed_read)
-        matches1.append(match1)
-        matches2.append(match2)
+        info1.matches.append(match1)
+        info2.matches.append(match2)
         return result
 
 
-class UnconditionalCutter(Modifier):
+class UnconditionalCutter(SingleEndModifier):
     """
     A modifier that unconditionally removes the first n or the last n bases from a read.
 
@@ -325,14 +339,14 @@ class UnconditionalCutter(Modifier):
     def __init__(self, length: int):
         self.length = length
 
-    def __call__(self, read, matches: List[Match]):
+    def __call__(self, read, info: ModificationInfo):
         if self.length > 0:
             return read[self.length:]
         elif self.length < 0:
             return read[:self.length]
 
 
-class LengthTagModifier(Modifier):
+class LengthTagModifier(SingleEndModifier):
     """
     Replace "length=..." strings in read names.
     """
@@ -340,28 +354,28 @@ class LengthTagModifier(Modifier):
         self.regex = re.compile(r"\b" + length_tag + r"[0-9]*\b")
         self.length_tag = length_tag
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         read = read[:]
         if read.name.find(self.length_tag) >= 0:
             read.name = self.regex.sub(self.length_tag + str(len(read.sequence)), read.name)
         return read
 
 
-class SuffixRemover(Modifier):
+class SuffixRemover(SingleEndModifier):
     """
     Remove a given suffix from read names.
     """
     def __init__(self, suffix):
         self.suffix = suffix
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         read = read[:]
         if read.name.endswith(self.suffix):
             read.name = read.name[:-len(self.suffix)]
         return read
 
 
-class PrefixSuffixAdder(Modifier):
+class PrefixSuffixAdder(SingleEndModifier):
     """
     Add a suffix and a prefix to read names
     """
@@ -369,15 +383,15 @@ class PrefixSuffixAdder(Modifier):
         self.prefix = prefix
         self.suffix = suffix
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info):
         read = read[:]
-        adapter_name = matches[-1].adapter.name if matches else 'no_adapter'
+        adapter_name = info.matches[-1].adapter.name if info.matches else 'no_adapter'
         read.name = self.prefix.replace('{name}', adapter_name) + read.name + \
             self.suffix.replace('{name}', adapter_name)
         return read
 
 
-class ZeroCapper(Modifier):
+class ZeroCapper(SingleEndModifier):
     """
     Change negative quality values of a read to zero
     """
@@ -385,38 +399,38 @@ class ZeroCapper(Modifier):
         qb = quality_base
         self.zero_cap_trans = str.maketrans(''.join(map(chr, range(qb))), chr(qb) * qb)
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         read = read[:]
         read.qualities = read.qualities.translate(self.zero_cap_trans)
         return read
 
 
-class NextseqQualityTrimmer(Modifier):
+class NextseqQualityTrimmer(SingleEndModifier):
     def __init__(self, cutoff, base):
         self.cutoff = cutoff
         self.base = base
         self.trimmed_bases = 0
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         stop = nextseq_trim_index(read, self.cutoff, self.base)
         self.trimmed_bases += len(read) - stop
         return read[:stop]
 
 
-class QualityTrimmer(Modifier):
+class QualityTrimmer(SingleEndModifier):
     def __init__(self, cutoff_front, cutoff_back, base):
         self.cutoff_front = cutoff_front
         self.cutoff_back = cutoff_back
         self.base = base
         self.trimmed_bases = 0
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         start, stop = quality_trim_index(read.qualities, self.cutoff_front, self.cutoff_back, self.base)
         self.trimmed_bases += len(read) - (stop - start)
         return read[start:stop]
 
 
-class Shortener(Modifier):
+class Shortener(SingleEndModifier):
     """Unconditionally shorten a read to the given length
 
     If the length is positive, the bases are removed from the end of the read.
@@ -425,20 +439,20 @@ class Shortener(Modifier):
     def __init__(self, length):
         self.length = length
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         if self.length >= 0:
             return read[:self.length]
         else:
             return read[self.length:]
 
 
-class NEndTrimmer(Modifier):
+class NEndTrimmer(SingleEndModifier):
     """Trims Ns from the 3' and 5' end of reads"""
     def __init__(self):
         self.start_trim = re.compile(r'^N+')
         self.end_trim = re.compile(r'N+$')
 
-    def __call__(self, read, matches):
+    def __call__(self, read, info: ModificationInfo):
         sequence = read.sequence
         start_cut = self.start_trim.match(sequence)
         end_cut = self.end_trim.search(sequence)
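
The central refactoring in modifiers.py replaces the bare `matches` list with a `ModificationInfo` object that is threaded through every modifier and on to the filters. A minimal sketch of that flow, assuming a toy `Upper` modifier and a plain string as the "read" (neither is a cutadapt class):

```python
from abc import ABC, abstractmethod
from typing import List

class ModificationInfo:
    """Per-read state passed between modifiers and filters, as above."""
    __slots__ = ["matches"]

    def __init__(self):
        self.matches = []  # type: List[object]

class SingleEndModifier(ABC):
    @abstractmethod
    def __call__(self, read, info: ModificationInfo):
        ...

class Upper(SingleEndModifier):  # hypothetical modifier for illustration
    def __call__(self, read, info: ModificationInfo):
        return read.upper()

# The single-end pipeline loop then looks like the pipeline.py hunk:
read, info = "acgt", ModificationInfo()
for modifier in [Upper()]:
    read = modifier(read, info)
print(read, info.matches)  # ACGT []
```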


=====================================
src/cutadapt/pipeline.py
=====================================
@@ -4,7 +4,7 @@ import sys
 import copy
 import logging
 import functools
-from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple, Union
+from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple
 from abc import ABC, abstractmethod
 from multiprocessing import Process, Pipe, Queue
 from pathlib import Path
@@ -16,10 +16,11 @@ from xopen import xopen
 import dnaio
 
 from .utils import Progress, FileOpener
-from .modifiers import Modifier, PairedModifier, PairedModifierWrapper
+from .modifiers import SingleEndModifier, PairedModifier, PairedModifierWrapper, ModificationInfo
 from .report import Statistics
 from .filters import (Redirector, PairedRedirector, NoFilter, PairedNoFilter, InfoFileWriter,
     RestFileWriter, WildcardFileWriter, TooShortReadFilter, TooLongReadFilter, NContentFilter,
+    MaximumExpectedErrorsFilter,
     CasavaFilter, DiscardTrimmedFilter, DiscardUntrimmedFilter, Demultiplexer,
     PairedDemultiplexer, CombinatorialDemultiplexer)
 
@@ -98,10 +99,8 @@ class Pipeline(ABC):
 
     def __init__(self, file_opener: FileOpener):
         self._close_files = []  # type: List[IO]
-        self._reader = None
+        self._reader = None  # type: Any
         self._filters = []  # type: List[Any]
-        # TODO type should be Union[List[Modifier], List[PairedModifier]]
-        self._modifiers = []  # type: List[Union[Modifier, PairedModifier]]
         self._outfiles = None  # type: Optional[OutputFiles]
         self._demultiplexer = None
         self._textiowrappers = []  # type: List[TextIO]
@@ -110,6 +109,7 @@ class Pipeline(ABC):
         self._minimum_length = None
         self._maximum_length = None
         self.max_n = None
+        self.max_expected_errors = None
         self.discard_casava = False
         self.discard_trimmed = False
         self.discard_untrimmed = False
@@ -172,6 +172,13 @@ class Pipeline(ABC):
             f1 = f2 = NContentFilter(self.max_n)
             self._filters.append(filter_wrapper(None, f1, f2))
 
+        if self.max_expected_errors is not None:
+            if not self._reader.delivers_qualities:
+                logger.warning("Ignoring option --max-ee as input does not contain quality values")
+            else:
+                f1 = f2 = MaximumExpectedErrorsFilter(self.max_expected_errors)
+                self._filters.append(filter_wrapper(None, f1, f2))
+
         if self.discard_casava:
             f1 = f2 = CasavaFilter()
             self._filters.append(filter_wrapper(None, f1, f2))
@@ -211,6 +218,7 @@ class Pipeline(ABC):
                 f.flush()
 
     def close(self) -> None:
+        self._reader.close()
         for f in self._textiowrappers:
             f.close()  # This also closes the underlying files; a second close occurs below
         assert self._outfiles is not None
@@ -255,9 +263,9 @@ class SingleEndPipeline(Pipeline):
 
     def __init__(self, file_opener: FileOpener):
         super().__init__(file_opener)
-        self._modifiers = []
+        self._modifiers = []  # type: List[SingleEndModifier]
 
-    def add(self, modifier: Modifier):
+    def add(self, modifier: SingleEndModifier):
         if modifier is None:
             raise ValueError("Modifier must not be None")
         self._modifiers.append(modifier)
@@ -271,12 +279,12 @@ class SingleEndPipeline(Pipeline):
             n += 1
             if n % 10000 == 0 and progress:
                 progress.update(n)
-            total_bp += len(read.sequence)
-            matches = []
+            total_bp += len(read)
+            info = ModificationInfo()
             for modifier in self._modifiers:
-                read = modifier(read, matches)
+                read = modifier(read, info)
             for filter_ in self._filters:
-                if filter_(read, matches):
+                if filter_(read, info):
                     break
         return (n, total_bp, None)
 
@@ -322,12 +330,13 @@ class PairedEndPipeline(Pipeline):
 
     def __init__(self, pair_filter_mode, file_opener: FileOpener):
         super().__init__(file_opener)
+        self._modifiers = []  # type: List[PairedModifier]
         self._pair_filter_mode = pair_filter_mode
         self._reader = None
         # Whether to ignore pair_filter mode for discard-untrimmed filter
         self.override_untrimmed_pair_filter = False
 
-    def add(self, modifier1: Optional[Modifier], modifier2: Optional[Modifier]):
+    def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
         """
         Add a modifier for R1 and R2. One of them can be None, in which case the modifier
         will only be added for the respective read.
@@ -336,7 +345,7 @@ class PairedEndPipeline(Pipeline):
             raise ValueError("Not both modifiers can be None")
         self._modifiers.append(PairedModifierWrapper(modifier1, modifier2))
 
-    def add_both(self, modifier: Modifier):
+    def add_both(self, modifier: SingleEndModifier):
         """
         Add one modifier for both R1 and R2
         """
@@ -356,15 +365,15 @@ class PairedEndPipeline(Pipeline):
             n += 1
             if n % 10000 == 0 and progress:
                 progress.update(n)
-            total1_bp += len(read1.sequence)
-            total2_bp += len(read2.sequence)
-            matches1 = []
-            matches2 = []
+            total1_bp += len(read1)
+            total2_bp += len(read2)
+            info1 = ModificationInfo()
+            info2 = ModificationInfo()
             for modifier in self._modifiers:
-                read1, read2 = modifier(read1, read2, matches1, matches2)
+                read1, read2 = modifier(read1, read2, info1, info2)
             for filter_ in self._filters:
                 # Stop writing as soon as one of the filters was successful.
-                if filter_(read1, read2, matches1, matches2):
+                if filter_(read1, read2, info1, info2):
                     break
         return (n, total1_bp, total2_bp)
 
@@ -556,7 +565,6 @@ class WorkerProcess(Process):
             orig_outfile = getattr(self._orig_outfiles, attr)
             if orig_outfile is not None:
                 output = io.BytesIO()
-                output.name = orig_outfile.name
                 setattr(output_files, attr, output)
 
         return output_files
@@ -602,7 +610,7 @@ class PipelineRunner(ABC):
     """
     A read processing pipeline
     """
-    def __init__(self, pipeline: Pipeline, progress: Progress):
+    def __init__(self, pipeline: Pipeline, progress: Progress, *args, **kwargs):
         self._pipeline = pipeline
         self._progress = progress
 
@@ -614,6 +622,12 @@ class PipelineRunner(ABC):
     def close(self):
         pass
 
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *args):
+        self.close()
+
 
 class ParallelPipelineRunner(PipelineRunner):
     """


=====================================
src/cutadapt/qualtrim.pyx
=====================================
@@ -82,3 +82,25 @@ def nextseq_trim_index(sequence, int cutoff, int base=33):
             max_qual = s
             max_i = i
     return max_i
+
+
+def expected_errors(str qualities, int base=33):
+    """
+    Return the number of expected errors (as double) from a read’s
+    qualities.
+
+    This uses the formula in Edgar et al. (2015),
+    see Section 2.2 in <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>.
+
+    qualities -- ASCII-encoded qualities (chr(qual + base))
+    """
+    cdef:
+        int i, q
+        bytes quals = qualities.encode()
+        char* cq = quals
+        double e = 0.0
+
+    for i in range(len(qualities)):
+        q = cq[i] - base
+        e += 10 ** (-q / 10)
+    return e
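
The new `expected_errors()` sums per-base error probabilities 10^(-Q/10), following Edgar & Flyvbjerg (2015). A pure-Python equivalent, handy for sanity-checking the new maxee.fastq test data:

```python
def expected_errors(qualities: str, base: int = 33) -> float:
    """Sum of per-base error probabilities 10**(-Q/10) for an
    ASCII-encoded (Phred+base) quality string."""
    return sum(10 ** (-(ord(c) - base) / 10) for c in qualities)

# '+' encodes Q10 (error probability 0.1), so the eight-base ee_0.8
# read in tests/data/maxee.fastq has 8 * 0.1 = 0.8 expected errors;
# '!' encodes Q0 (probability 1.0), matching the ee_1 read.
print(round(expected_errors("++++++++"), 10))  # 0.8
print(expected_errors("!"))  # 1.0
```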


=====================================
src/cutadapt/report.py
=====================================
@@ -3,13 +3,12 @@ Routines for printing a report.
 """
 from io import StringIO
 import textwrap
+from collections import Counter
 from typing import Any, Optional, List
 from .adapters import Where, EndStatistics, AdapterStatistics, ADAPTER_TYPE_NAMES
-from .modifiers import (Modifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
+from .modifiers import (SingleEndModifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
     AdapterCutter, PairedAdapterCutter, ReverseComplementer)
-from .filters import (NoFilter, PairedNoFilter, TooShortReadFilter, TooLongReadFilter,
-    PairedDemultiplexer, CombinatorialDemultiplexer, Demultiplexer, NContentFilter, InfoFileWriter,
-    WildcardFileWriter, RestFileWriter)
+from .filters import WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter
 
 
 def safe_divide(numerator, denominator):
@@ -41,6 +40,7 @@ class Statistics:
         self.written = 0
         self.total_bp = [0, 0]
         self.written_bp = [0, 0]
+        self.written_lengths = [Counter(), Counter()]  # type: List[Counter]
         self.with_adapters = [0, 0]
         self.quality_trimmed_bp = [0, 0]
         self.adapter_stats = [[], []]  # type: List[List[AdapterStatistics]]
@@ -66,6 +66,7 @@ class Statistics:
         for i in (0, 1):
             self.total_bp[i] += other.total_bp[i]
             self.written_bp[i] += other.written_bp[i]
+            self.written_lengths[i] += other.written_lengths[i]
             self.with_adapters[i] += other.with_adapters[i]
             self.quality_trimmed_bp[i] += other.quality_trimmed_bp[i]
             if self.adapter_stats[i] and other.adapter_stats[i]:
@@ -102,13 +103,13 @@ class Statistics:
         return self
 
     def _collect_writer(self, w):
-        if isinstance(w, (InfoFileWriter, RestFileWriter, WildcardFileWriter)):
-            return
-        elif isinstance(w, (NoFilter, PairedNoFilter, PairedDemultiplexer,
-                CombinatorialDemultiplexer, Demultiplexer)):
-            self.written += w.written
-            self.written_bp[0] += w.written_bp[0]
-            self.written_bp[1] += w.written_bp[1]
+        if isinstance(w, WithStatistics):
+            self.written += w.written_reads()
+            written_bp = w.written_bp()
+            written_lengths = w.written_lengths()
+            for i in 0, 1:
+                self.written_bp[i] += written_bp[i]
+                self.written_lengths[i] += written_lengths[i]
         elif isinstance(w.filter, TooShortReadFilter):
             self.too_short = w.filtered
         elif isinstance(w.filter, TooLongReadFilter):
@@ -116,7 +117,7 @@ class Statistics:
         elif isinstance(w.filter, NContentFilter):
             self.too_many_n = w.filtered
 
-    def _collect_modifier(self, m: Modifier):
+    def _collect_modifier(self, m: SingleEndModifier):
         if isinstance(m, PairedAdapterCutter):
             for i in 0, 1:
                 self.with_adapters[i] += m.with_adapters
@@ -368,7 +369,7 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str:  # no
                 print_s("Sequence: {}; Type: {}; Length: {}; Trimmed: {} times".
                     format(adapter_statistics.front.sequence, ADAPTER_TYPE_NAMES[adapter_statistics.where],
                         len(adapter_statistics.front.sequence), total), end="")
-            if reverse_complemented is not None:
+            if stats.reverse_complemented is not None:
                 print_s("; Reverse-complemented: {} times".format(reverse_complemented))
             else:
                 print_s()


=====================================
tests/cut/maxee.fastq
=====================================
@@ -0,0 +1,8 @@
+@empty
+
++
+
+@ee_0.8
+ACGTTGCA
++
+++++++++


=====================================
tests/data/SRR2040271_1.fastq deleted
=====================================
@@ -1,8 +0,0 @@
-@SRR2040271.1 SN603_WBP007_8_1101_63.30_99.90 length=100
-NTCATTCCATGACATTGTCTGTTGGTTGCTTTTTGAGTATATTTTCTCATGGCTTCATCTATCTTGCTCATAAGACTAAATGGGGAGACAGACTTCCTGG
-+SRR2040271.1 SN603_WBP007_8_1101_63.30_99.90 length=100
-!1=DDFFFGHHHHIJJJJJIIJJJGGHJGJIJJGBGGHEGHJIIIIEHIJEH>@G;FHCG<<BCG:CG@F4@D@=@DG>EE>EED(,,(;((,;;@A>CC
-@SRR2040271.2 SN603_WBP007_8_1101_79.90_99.30 length=100
-NTAAAGACCCTTCAACTCAGGGCTTCTGAACCCTGGCTACCCATTAGAATCACTTAGGGAGCTTTGAAAAATATCAACACCCCAGCCCTAACCCAGCCCA
-+SRR2040271.2 SN603_WBP007_8_1101_79.90_99.30 length=100
-!4=BBDDDFHHHH@FHIIIIIIIIIGIIHGIIIIIIIIHIIIIIICEGGIIIIIIGHIIEGHIGIIIHHHFHEEEEEEECC=B=@BBBCCCBBBBA(88?


=====================================
tests/data/maxee.fastq
=====================================
@@ -0,0 +1,16 @@
+@empty
+
++
+
+@ee_1
+A
++
+!
+@ee_0.8
+ACGTTGCA
++
+++++++++
+@ee_1.01
+TGGACGTTGCA
++
++5+++++++++


=====================================
tests/data/plus.fastq deleted
=====================================
@@ -1,8 +0,0 @@
-@first_sequence some other text
-SEQUENCE1
-+first_sequence some other text
-:6;;8<=:<
-@second_sequence and more text
-SEQUENCE2
-+second_sequence and more text
-83<??:(61


=====================================
tests/test_adapters.py
=====================================
@@ -1,14 +1,14 @@
 import pytest
 
 from dnaio import Sequence
-from cutadapt.adapters import SingleAdapter, SingleMatch, Where, LinkedAdapter
+from cutadapt.adapters import SingleAdapter, SingleMatch, Where, LinkedAdapter, WhereToRemove
 
 
 def test_issue_52():
     adapter = SingleAdapter(
         sequence='GAACTCCAGTCACNNNNN',
         where=Where.BACK,
-        remove='suffix',
+        remove=WhereToRemove.SUFFIX,
         max_error_rate=0.12,
         min_overlap=5,
         read_wildcards=False,
@@ -45,7 +45,7 @@ def test_issue_80():
     adapter = SingleAdapter(
         sequence="TCGTATGCCGTCTTC",
         where=Where.BACK,
-        remove='suffix',
+        remove=WhereToRemove.SUFFIX,
         max_error_rate=0.2,
         min_overlap=3,
         read_wildcards=False,
@@ -58,7 +58,7 @@ def test_issue_80():
 
 
 def test_str():
-    a = SingleAdapter('ACGT', where=Where.BACK, remove='suffix', max_error_rate=0.1)
+    a = SingleAdapter('ACGT', where=Where.BACK, remove=WhereToRemove.SUFFIX, max_error_rate=0.1)
     str(a)
     str(a.match_to(Sequence(name='seq', sequence='TTACGT')))
 
@@ -91,7 +91,7 @@ def test_info_record():
     am = SingleMatch(astart=0, astop=17, rstart=5, rstop=21, matches=15, errors=2, remove_before=False,
         adapter=adapter, read=read)
     assert am.get_info_records() == [[
-        "abc",
+        "",
         2,
         5,
         21,


=====================================
tests/test_commandline.py
=====================================
@@ -9,7 +9,7 @@ import pytest
 import subprocess
 
 from cutadapt.__main__ import main
-from utils import assert_files_equal, datapath, cutpath, redirect_stderr
+from utils import assert_files_equal, datapath, cutpath
 
 # pytest.mark.timeout will not fail even if pytest-timeout is not installed
 try:
@@ -69,19 +69,6 @@ def test_discard_untrimmed(run):
     run('-b CAAGAT --discard-untrimmed', 'discard-untrimmed.fastq', 'small.fastq')
 
 
-@pytest.mark.skip(reason='Regression since switching to dnaio')
-def test_second_header_retained(run, cores):
-    """test if sequence name after the "+" is retained"""
-    run("--cores {} -e 0.12 -b TTAGACATATCTCCGTCG".format(cores), "plus.fastq", "plus.fastq")
-
-
-@pytest.mark.skip(reason='Regression since switching to dnaio')
-def test_length_tag_second_header(run, cores):
-    """Ensure --length-tag= also modifies the second header line"""
-    run("--cores {} -a GGCTTC --length-tag=length=".format(cores),
-        'SRR2040271_1.fastq', 'SRR2040271_1.fastq')
-
-
 def test_extensiontxtgz(run):
     """automatic recognition of "_sequence.txt.gz" extension"""
     run("-b TTAGACATATCTCCGTCG", "s_1_sequence.txt", "s_1_sequence.txt.gz")
@@ -343,8 +330,16 @@ def test_adapter_with_u(run):
     run("-a GCCGAACUUCUUAGACUGCCUUAAGGACGU", "illumina.fastq", "illumina.fastq.gz")
 
 
-def test_bzip2(run):
-    run('-b TTAGACATATCTCCGTCG', 'small.fastq', 'small.fastq.bz2')
+def test_bzip2_input(run, cores):
+    run(["--cores", str(cores), "-a", "TTAGACATATCTCCGTCG"], "small.fastq", "small.fastq.bz2")
+
+
+@pytest.mark.parametrize("extension", ["bz2", "xz", "gz"])
+def test_compressed_output(tmp_path, cores, extension):
+    out_path = str(tmp_path / ("small.fastq." + extension))
+    params = [
+        "--cores", str(cores), "-a", "TTAGACATATCTCCGTCG", "-o", out_path, datapath("small.fastq")]
+    assert main(params) is None
 
 
 if sys.version_info[:2] >= (3, 3):
@@ -358,14 +353,12 @@ def test_xz(run):
 
 def test_no_args():
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main([])
+        main([])
 
 
 def test_two_fastqs():
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main([datapath('paired.1.fastq'), datapath('paired.2.fastq')])
+        main([datapath('paired.1.fastq'), datapath('paired.2.fastq')])
 
 
 def test_anchored_no_indels(run):
@@ -386,8 +379,7 @@ def test_anchored_no_indels_wildcard_adapt(run):
 
 def test_non_iupac_characters(run):
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main(['-a', 'ZACGT', datapath('small.fastq')])
+        main(['-a', 'ZACGT', datapath('small.fastq')])
 
 
 def test_unconditional_cut_front(run):
@@ -544,20 +536,17 @@ def test_linked_info_file(tmpdir):
 
 def test_linked_anywhere():
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main(['-b', 'AAA...TTT', datapath('linked.fasta')])
+        main(['-b', 'AAA...TTT', datapath('linked.fasta')])
 
 
 def test_anywhere_anchored_5p():
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main(['-b', '^AAA', datapath('small.fastq')])
+        main(['-b', '^AAA', datapath('small.fastq')])
 
 
 def test_anywhere_anchored_3p():
     with pytest.raises(SystemExit):
-        with redirect_stderr():
-            main(['-b', 'TTT$', datapath('small.fastq')])
+        main(['-b', 'TTT$', datapath('small.fastq')])
 
 
 def test_fasta(run):
@@ -709,15 +698,41 @@ def test_adapter_order(run):
     run("-a CCGGG -g ^AAACC", "adapterorder-ag.fasta", "adapterorder.fasta")
 
 
-@pytest.mark.skip(reason="Not implemented")
-def test_reverse_complement_not_normalized(run):
-    run("--rc=yes -g ^TTATTTGTCT -g ^TCCGCACTGG",
-        "revcomp-notnormalized-single.fastq", "revcomp.1.fastq")
-
-
 def test_reverse_complement_normalized(run):
     run(
         "--revcomp -g ^TTATTTGTCT -g ^TCCGCACTGG",
         "revcomp-single-normalize.fastq",
         "revcomp.1.fastq",
     )
+
+
+def test_reverse_complement_and_info_file(run, tmp_path, cores):
+    info_path = str(tmp_path / "info.txt")
+    run(
+        [
+            "--revcomp",
+            "-g",
+            "^TTATTTGTCT",
+            "-g",
+            "^TCCGCACTGG",
+            "--info-file",
+            info_path,
+        ],
+        "revcomp-single-normalize.fastq",
+        "revcomp.1.fastq",
+    )
+    with open(info_path) as f:
+        lines = f.readlines()
+    assert len(lines) == 6
+    assert lines[0].split("\t")[0] == "read1/1"
+    assert lines[1].split("\t")[0] == "read2/1 rc"
+
+
+def test_max_expected_errors(run, cores):
+    run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+
+
+def test_max_expected_errors_fasta(tmp_path):
+    path = tmp_path / "input.fasta"
+    path.write_text(">read\nACGTACGT\n")
+    main(["--max-ee=0.001", "-o", "/dev/null", str(path)])


=====================================
tests/test_main.py
=====================================
@@ -1,9 +1,46 @@
 import pytest
 
-from cutadapt.__main__ import main
+from cutadapt.__main__ import main, parse_cutoffs, parse_lengths, CommandLineError, setup_logging
 
 
 def test_help():
     with pytest.raises(SystemExit) as e:
         main(["--help"])
     assert e.value.args[0] == 0
+
+
+def test_parse_cutoffs():
+    assert parse_cutoffs("5") == (0, 5)
+    assert parse_cutoffs("6,7") == (6, 7)
+    with pytest.raises(CommandLineError):
+        parse_cutoffs("a,7")
+    with pytest.raises(CommandLineError):
+        parse_cutoffs("a")
+    with pytest.raises(CommandLineError):
+        parse_cutoffs("a,7")
+    with pytest.raises(CommandLineError):
+        parse_cutoffs("1,2,3")
+
+
+def test_parse_lengths():
+    assert parse_lengths("25") == (25, )
+    assert parse_lengths("17:25") == (17, 25)
+    assert parse_lengths("25:") == (25, None)
+    assert parse_lengths(":25") == (None, 25)
+    with pytest.raises(CommandLineError):
+        parse_lengths("1:2:3")
+    with pytest.raises(CommandLineError):
+        parse_lengths("a:2")
+    with pytest.raises(CommandLineError):
+        parse_lengths("a")
+    with pytest.raises(CommandLineError):
+        parse_lengths("2:a")
+    with pytest.raises(CommandLineError):
+        parse_lengths(":")
+
+
+def test_setup_logging():
+    import logging
+    logger = logging.getLogger(__name__)
+    setup_logging(logger, stdout=True, quiet=False, minimal=False, debug=False)
+    logger.info("Log message")
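
The new tests above pin down the expected behavior of `parse_cutoffs` and `parse_lengths`. As an illustration only, here is a pure-Python sketch consistent with those assertions; the real functions live in `cutadapt.__main__` and may differ in wording and details, and this `CommandLineError` merely stands in for cutadapt's own exception class:

```python
class CommandLineError(Exception):
    """Stand-in for cutadapt's command-line error exception."""


def parse_cutoffs(s):
    """'5' -> (0, 5); '6,7' -> (6, 7); anything else is an error."""
    try:
        cutoffs = [int(value) for value in s.split(",")]
    except ValueError:
        raise CommandLineError("Quality cutoff value not recognized: {!r}".format(s))
    if len(cutoffs) == 1:
        return (0, cutoffs[0])
    if len(cutoffs) == 2:
        return (cutoffs[0], cutoffs[1])
    raise CommandLineError("Expected one or two comma-separated cutoffs")


def parse_lengths(s):
    """'25' -> (25,); '17:25' -> (17, 25); '25:' -> (25, None); ':25' -> (None, 25)."""
    fields = s.split(":")
    if len(fields) not in (1, 2):
        raise CommandLineError("Expected at most one colon in length specification")
    try:
        values = tuple(int(f) if f else None for f in fields)
    except ValueError:
        raise CommandLineError("Length value not recognized: {!r}".format(s))
    if all(v is None for v in values):
        raise CommandLineError("Cannot parse {!r}".format(s))
    return values
```

For example, `parse_lengths(":")` splits into two empty fields, both of which map to `None`, so it is rejected, exactly as the test demands.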


=====================================
tests/test_paired.py
=====================================
@@ -5,7 +5,7 @@ from itertools import product
 import pytest
 
 from cutadapt.__main__ import main
-from utils import assert_files_equal, datapath, cutpath, redirect_stderr
+from utils import assert_files_equal, datapath, cutpath
 
 
 @pytest.fixture
@@ -131,9 +131,8 @@ def test_no_trimming():
 
 
 def test_missing_file(tmpdir):
-    with redirect_stderr():
-        with pytest.raises(SystemExit):
-            main(["--paired-output", str(tmpdir.join("out.fastq")), datapath("paired.1.fastq")])
+    with pytest.raises(SystemExit):
+        main(["--paired-output", str(tmpdir.join("out.fastq")), datapath("paired.1.fastq")])
 
 
 def test_first_too_short(tmpdir, cores):
@@ -144,14 +143,13 @@ def test_first_too_short(tmpdir, cores):
         lines = lines[:-4]
     trunc1.write("".join(lines))
 
-    with redirect_stderr():
-        with pytest.raises(SystemExit):
-            main([
-                "-o", "/dev/null",
-                "--paired-output", str(tmpdir.join("out.fastq")),
-                "--cores", str(cores),
-                str(trunc1), datapath("paired.2.fastq")
-            ])
+    with pytest.raises(SystemExit):
+        main([
+            "-o", "/dev/null",
+            "--paired-output", str(tmpdir.join("out.fastq")),
+            "--cores", str(cores),
+            str(trunc1), datapath("paired.2.fastq")
+        ])
 
 
 def test_second_too_short(tmpdir, cores):
@@ -162,14 +160,13 @@ def test_second_too_short(tmpdir, cores):
         lines = lines[:-4]
     trunc2.write("".join(lines))
 
-    with redirect_stderr():
-        with pytest.raises(SystemExit):
-            main([
-                "-o", "/dev/null",
-                "--paired-output", str(tmpdir.join("out.fastq")),
-                "--cores", str(cores),
-                datapath("paired.1.fastq"), str(trunc2)
-            ])
+    with pytest.raises(SystemExit):
+        main([
+            "-o", "/dev/null",
+            "--paired-output", str(tmpdir.join("out.fastq")),
+            "--cores", str(cores),
+            datapath("paired.1.fastq"), str(trunc2)
+        ])
 
 
 def test_unmatched_read_names(tmpdir, cores):
@@ -327,10 +324,9 @@ def test_interleaved_neither_nor(tmpdir):
     p1 = str(tmpdir.join("temp-paired.1.fastq"))
     p2 = str(tmpdir.join("temp-paired.2.fastq"))
     params = "-a XX --interleaved".split()
-    with redirect_stderr():
-        params += ["-o", p1, "-p1", p2, "paired.1.fastq", "paired.2.fastq"]
-        with pytest.raises(SystemExit):
-            main(params)
+    params += ["-o", p1, "-p1", p2, "paired.1.fastq", "paired.2.fastq"]
+    with pytest.raises(SystemExit):
+        main(params)
 
 
 def test_pair_filter_both(run_paired, cores):
@@ -504,25 +500,40 @@ def test_pair_adapters_demultiplexing(tmpdir):
         assert_files_equal(cutpath(name), str(tmpdir.join(name)))
 
 
-def test_combinatorial_demultiplexing(tmpdir):
+@pytest.mark.parametrize("discarduntrimmed", (False, True))
+def test_combinatorial_demultiplexing(tmpdir, discarduntrimmed):
     params = "-g A=^AAAAAAAAAA -g C=^CCCCCCCCCC -G G=^GGGGGGGGGG -G T=^TTTTTTTTTT".split()
     params += ["-o", str(tmpdir.join("combinatorial.{name1}_{name2}.1.fasta"))]
     params += ["-p", str(tmpdir.join("combinatorial.{name1}_{name2}.2.fasta"))]
     params += [datapath("combinatorial.1.fasta"), datapath("combinatorial.2.fasta")]
+    combinations = [
+        # third column says whether the file must exist
+        ("A", "G", True),
+        ("A", "T", True),
+        ("C", "G", True),
+        ("C", "T", True),
+    ]
+    if discarduntrimmed:
+        combinations += [
+            ("unknown", "G", False),
+            ("A", "unknown", False),
+        ]
+        params += ["--discard-untrimmed"]
+    else:
+        combinations += [
+            ("unknown", "G", True),
+            ("A", "unknown", True),
+        ]
     assert main(params) is None
-    for (name1, name2) in [
-        ("A", "G"),
-        ("A", "T"),
-        ("C", "G"),
-        ("C", "T"),
-        ("unknown", "G"),
-        ("A", "unknown"),
-    ]:
+    for (name1, name2, should_exist) in combinations:
         for i in (1, 2):
             name = "combinatorial.{name1}_{name2}.{i}.fasta".format(name1=name1, name2=name2, i=i)
             path = cutpath(os.path.join("combinatorial", name))
-            assert tmpdir.join(name).check(), "Output file missing"
-            assert_files_equal(path, str(tmpdir.join(name)))
+            if should_exist:
+                assert tmpdir.join(name).check(), "Output file missing"
+                assert_files_equal(path, str(tmpdir.join(name)))
+            else:
+                assert not tmpdir.join(name).check(), "Output file should not exist"
 
 
 def test_info_file(tmpdir):


=====================================
tests/test_parser.py
=====================================
@@ -4,6 +4,7 @@ import pytest
 from dnaio import Sequence
 from cutadapt.adapters import Where, WhereToRemove, LinkedAdapter, SingleAdapter
 from cutadapt.parser import AdapterParser, AdapterSpecification
+from cutadapt.modifiers import ModificationInfo
 
 
 def test_expand_braces():
@@ -178,5 +179,5 @@ def test_anywhere_parameter():
     read = Sequence('foo1', 'TGAAGTACACGGTTAAAAAAAAAA')
     from cutadapt.modifiers import AdapterCutter
     cutter = AdapterCutter([adapter])
-    trimmed_read = cutter(read, [])
+    trimmed_read = cutter(read, ModificationInfo())
     assert trimmed_read.sequence == ''


=====================================
tests/test_qualtrim.py
=====================================
@@ -1,5 +1,7 @@
+import pytest
+
 from dnaio import Sequence
-from cutadapt.qualtrim import nextseq_trim_index
+from cutadapt.qualtrim import nextseq_trim_index, expected_errors
 
 
 def test_nextseq_trim():
@@ -11,3 +13,22 @@ def test_nextseq_trim():
         'AA//EAEE//A6///E//A//EA/EEEEEEAEA//EEEEEEEEEEEEEEE###########EE#EA'
     )
     assert nextseq_trim_index(s, cutoff=22) == 33
+
+
+def test_expected_errors():
+    def encode_qualities(quals):
+        return "".join(chr(q + 33) for q in quals)
+
+    assert pytest.approx(0.0) == expected_errors("")
+
+    assert pytest.approx(0.1) == expected_errors(encode_qualities([10]))
+    assert pytest.approx(0.01) == expected_errors(encode_qualities([20]))
+    assert pytest.approx(0.001) == expected_errors(encode_qualities([30]))
+
+    assert pytest.approx(0.2) == expected_errors(encode_qualities([10, 10]))
+    assert pytest.approx(0.11) == expected_errors(encode_qualities([10, 20]))
+    assert pytest.approx(0.11) == expected_errors(encode_qualities([20, 10]))
+
+    assert pytest.approx(0.3) == expected_errors(encode_qualities([10, 10, 10]))
+    assert pytest.approx(0.111) == expected_errors(encode_qualities([10, 20, 30]))
+    assert pytest.approx(0.2111) == expected_errors(encode_qualities([10, 10, 20, 30, 40]))
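
The assertions above encode the standard Phred relationship: a base with quality Q has error probability 10^(-Q/10), and the expected number of errors in a read is the sum over its bases. A minimal pure-Python sketch matching these test values (the real `expected_errors` is implemented in Cython in `src/cutadapt/qualtrim.pyx`; this version is for illustration only):

```python
def expected_errors(qualities: str) -> float:
    """Expected number of errors for a Phred+33-encoded quality string.

    Each character encodes Q = ord(c) - 33; the per-base error
    probability is 10**(-Q/10), and expected errors are their sum.
    """
    return sum(10 ** (-(ord(c) - 33) / 10) for c in qualities)
```

So two bases of quality 10 and 20 give 0.1 + 0.01 = 0.11, matching the test.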


=====================================
tests/test_trim.py
=====================================
@@ -1,13 +1,13 @@
 from dnaio import Sequence
 from cutadapt.adapters import SingleAdapter, Where
-from cutadapt.modifiers import AdapterCutter
+from cutadapt.modifiers import AdapterCutter, ModificationInfo
 
 
 def test_statistics():
     read = Sequence('name', 'AAAACCCCAAAA')
     adapters = [SingleAdapter('CCCC', Where.BACK, max_error_rate=0.1)]
     cutter = AdapterCutter(adapters, times=3)
-    cutter(read, [])
+    cutter(read, ModificationInfo())
     # TODO make this a lot simpler
     trimmed_bp = 0
     for adapter in adapters:
@@ -29,7 +29,7 @@ def test_end_trim_with_mismatch():
 
     read = Sequence('foo1', 'AAAAAAAAAAATCGTCGATC')
     cutter = AdapterCutter([adapter], times=1)
-    trimmed_read = cutter(read, [])
+    trimmed_read = cutter(read, ModificationInfo())
 
     assert trimmed_read.sequence == 'AAAAAAAAAAA'
     assert cutter.adapter_statistics[adapter].back.lengths == {9: 1}
@@ -39,7 +39,7 @@ def test_end_trim_with_mismatch():
 
     read = Sequence('foo2', 'AAAAAAAAAAATCGAACGA')
     cutter = AdapterCutter([adapter], times=1)
-    trimmed_read = cutter(read, [])
+    trimmed_read = cutter(read, ModificationInfo())
 
     assert trimmed_read.sequence == read.sequence
     assert cutter.adapter_statistics[adapter].back.lengths == {}
@@ -57,5 +57,5 @@ def test_anywhere_with_errors():
     ):
         read = Sequence('foo', seq)
         cutter = AdapterCutter([adapter], times=1)
-        trimmed_read = cutter(read, [])
+        trimmed_read = cutter(read, ModificationInfo())
         assert trimmed_read.sequence == expected_trimmed


=====================================
tests/utils.py
=====================================
@@ -1,16 +1,5 @@
 import os.path
 import subprocess
-import sys
-from contextlib import contextmanager
-
-
-@contextmanager
-def redirect_stderr():
-    """Send stderr to stdout. Nose doesn't capture stderr, yet."""
-    old_stderr = sys.stderr
-    sys.stderr = sys.stdout
-    yield
-    sys.stderr = old_stderr
 
 
 def datapath(path):


=====================================
tox.ini
=====================================
@@ -8,6 +8,7 @@ deps =
     pytest
     pytest-timeout
     pytest-mock
+setenv = PYTHONDEVMODE = 1
 commands =
     coverage run --concurrency=multiprocessing -m pytest --doctest-modules --pyargs cutadapt tests
     coverage combine
@@ -31,10 +32,6 @@ basepython = python3.6
 deps = mypy
 commands = mypy src/
 
-[travis]
-python =
-  3.6: py36, docs, mypy
-
 [coverage:run]
 parallel = True
 include =
@@ -48,6 +45,6 @@ source =
 
 [flake8]
 max-line-length = 120
-max-complexity = 20
+max-complexity = 16
 select = E,F,W,C90,W504
 extend_ignore = E128,E131,W503,E203



View it on GitLab: https://salsa.debian.org/med-team/python-cutadapt/-/commit/afc23835c3b1b22776876ed5f9011295b0059b30
