[med-svn] [Git][med-team/python-cutadapt][upstream] New upstream version 3.1

Nilesh Patra gitlab@salsa.debian.org
Sat Dec 12 13:18:13 GMT 2020



Nilesh Patra pushed to branch upstream at Debian Med / python-cutadapt


Commits:
02bef8f8 by Nilesh Patra at 2020-12-12T18:36:06+05:30
New upstream version 3.1
- - - - -


21 changed files:

- .codecov.yml
- + .github/workflows/ci.yml
- − .travis.yml
- CHANGES.rst
- buildwheels.sh
- doc/guide.rst
- setup.py
- src/cutadapt/__main__.py
- src/cutadapt/adapters.py
- src/cutadapt/log.py
- src/cutadapt/modifiers.py
- src/cutadapt/parser.py
- src/cutadapt/pipeline.py
- src/cutadapt/report.py
- src/cutadapt/utils.py
- + tests/cut/action_retain.fasta
- + tests/data/action_retain.fasta
- + tests/test_command.py
- tests/test_commandline.py
- tests/test_modifiers.py
- tox.ini


Changes:

=====================================
.codecov.yml
=====================================
@@ -1,5 +1,3 @@
-comment: off
-
 codecov:
   require_ci_to_pass: no
 
@@ -9,7 +7,11 @@ coverage:
   range: "90...100"
 
   status:
-    project: yes
+    project:
+      default:
+        target: auto
+        threshold: 1%
+        base: auto
     patch: no
     changes: no
 


=====================================
.github/workflows/ci.yml
=====================================
@@ -0,0 +1,72 @@
+name: CI
+
+on: [push, pull_request]
+
+jobs:
+  lint:
+    timeout-minutes: 5
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.7]
+        toxenv: [flake8, mypy, docs]
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: python -m pip install tox
+    - name: Run tox ${{ matrix.toxenv }}
+      run: tox -e ${{ matrix.toxenv }}
+
+  test:
+    timeout-minutes: 5
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+        python-version: [3.6, 3.7, 3.8, 3.9]
+        include:
+        - python-version: 3.8
+          os: macos-latest
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: python -m pip install tox
+    - name: Test
+      run: tox -e py
+    - name: Upload coverage report
+      uses: codecov/codecov-action@v1
+
+  deploy:
+    timeout-minutes: 5
+    runs-on: ubuntu-latest
+    needs: [lint, test]
+    if: startsWith(github.ref, 'refs/tags')
+    steps:
+    - uses: actions/checkout@v2
+      with:
+        fetch-depth: 0  # required for setuptools_scm
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.7
+    - name: Make distributions
+      run: |
+        python -m pip install Cython
+        python setup.py sdist
+        ./buildwheels.sh
+        ls -l dist/
+    - name: Publish to PyPI
+      uses: pypa/gh-action-pypi-publish@v1.4.1
+      with:
+        user: __token__
+        password: ${{ secrets.pypi_password }}
+        #password: ${{ secrets.test_pypi_password }}
+        #repository_url: https://test.pypi.org/legacy/


=====================================
.travis.yml deleted
=====================================
@@ -1,63 +0,0 @@
-language: python
-
-cache:
-  directories:
-    - $HOME/.cache/pip
-
-install:
-  - pip install tox
-
-script:
-  - tox
-
-after_success:
-  - pip install codecov
-  - codecov
-
-env:
-  global:
-    #- TWINE_REPOSITORY_URL=https://test.pypi.org/legacy/
-    - TWINE_USERNAME=marcelm
-    # TWINE_PASSWORD is set in Travis settings
-
-jobs:
-  include:
-    - python: "3.6"
-      env: TOXENV=py36
-
-    - python: "3.7"
-      env: TOXENV=py37
-
-    - python: "3.8"
-      env: TOXENV=py38
-
-    - python: "3.9"
-      env: TOXENV=py39
-
-    - name: flake8
-      python: "3.6"
-      env: TOXENV=flake8
-
-    - name: mypy
-      python: "3.6"
-      env: TOXENV=mypy
-
-    - name: docs
-      python: "3.6"
-      env: TOXENV=docs
-
-    - stage: deploy
-      services:
-        - docker
-      python: "3.6"
-      install: python3 -m pip install Cython twine
-      if: tag IS present
-      script:
-        - |
-          python3 setup.py sdist
-          ./buildwheels.sh
-          ls -l dist/
-          python3 -m twine upload dist/*
-
-  allow_failures:
-    - python: "nightly"


=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,25 @@
 Changes
 =======
 
+v3.1 (2020-12-03)
+-----------------
+
+* :issue:`443`: With ``--action=retain``, it is now possible to trim reads while
+  leaving the adapter sequence itself in the read. That is, only the sequence
+  before (for 5’ adapters) or after (for 3’ adapters) is removed. With linked
+  adapters, both adapters are retained.
+* :issue:`495`: Running with multiple cores did not work using macOS and Python 3.8+.
+  To prevent problems like these in the future, automated testing has been extended
+  to also run on macOS.
+* :issue:`482`: Print statistics for ``--discard-casava`` and ``--max-ee`` in the
+  report.
+* :issue:`497`: The changelog for 3.0 previously forgot to mention that the following
+  options, which were deprecated in version 2.0, have now been removed, and
+  using them will lead to an error: ``--format``, ``--colorspace``, ``-c``, ``-d``,
+  ``--double-encode``, ``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``,
+  ``--bwa``, ``--no-zero-cap``. This frees up some single-character options,
+  allowing them to be re-purposed for future Cutadapt features.
+
 v3.0 (2020-11-10)
 -----------------
 
@@ -28,6 +47,12 @@ v3.0 (2020-11-10)
   or had too many ``N``. (This unintentionally disappeared in a previous version.)
 * :issue:`487`: When demultiplexing, the reported number of written pairs was
   always zero.
+* :issue:`497`: The following options, which were deprecated in version 2.0, have
+  been removed, and using them will lead to an error:
+  ``--format``, ``--colorspace``, ``-c``, ``-d``, ``--double-encode``,
+  ``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``, ``--bwa``, ``--no-zero-cap``.
+  This frees up some single-character options,
+  allowing them to be re-purposed for future Cutadapt features.
 * Ensure Cutadapt runs under Python 3.9.
 * Drop support for Python 3.5.
 


=====================================
buildwheels.sh
=====================================
@@ -16,7 +16,7 @@ manylinux=quay.io/pypa/manylinux2010_x86_64
 
 # For convenience, if this script is called from outside of a docker container,
 # it starts a container and runs itself inside of it.
-if ! grep -q docker /proc/1/cgroup; then
+if ! grep -q docker /proc/1/cgroup && ! test -d /opt/python; then
   # We are not inside a container
   docker pull ${manylinux}
   exec docker run --rm -v $(pwd):/io ${manylinux} /io/$0


=====================================
doc/guide.rst
=====================================
@@ -903,25 +903,47 @@ This section describes in which ways reads can be modified other than adapter
 removal.
 
 
-Not trimming adapters
----------------------
+.. _changing-what-is-done-when-an-adapter-is-found:
+.. _action:
 
-Instead of removing an adapter from a read, it is also possible to take other
-actions when an adapter is found by specifying the ``--action`` option.
+``--action`` changes what is done when an adapter is found
+----------------------------------------------------------
 
-The default is ``--action=trim``, which will remove the adapter and either
-the sequence before or after it from the read.
+The ``--action`` option can be used to change what is done when an adapter match
+is found in a read.
 
-Use ``--action=none`` to not remove the adapter from the read. This is useful
-when combined with other options, such as ``--untrimmed-output``, which
-will redirect the reads without adapter to a different file. Other read
-modification options (as listed below) may still change the read.
+The default is ``--action=trim``, which will remove the adapter and the
+sequence before or after it from the read. For 5' adapters, the adapter and
+the sequence preceding it is removed. For 3' adapters, the adapter and the
+sequence following it is removed. Since linked adapters are a combination of
+a 5' and 3' adapter, in effect only the sequence between the 5' and the 3'
+adapter matches is kept.
 
-Use ``--action=mask`` to write ``N`` characters to that parts of the read
-that would otherwise have been removed .
+With ``--action=retain``, the read is trimmed, but the adapter sequence itself
+is not removed. Up- and downstream sequences are removed in the same way as
+for the ``trim`` action. For linked adapters, both adapter sequences are kept.
 
-Use ``--action=lowercase`` to change to lowercase that part of the read that would otherwise
-have been removed. The rest is converted to uppercase.
+.. note::
+    Because it is somewhat unclear what should happen, ``--action=retain`` can
+    at the moment not be combined with ``--times`` (multiple rounds of adapter
+    removal).
+
+Use ``--action=none`` to not change the read even if there is a match.
+This is useful because the statistics will still be updated as before
+and because the read will still be considered "trimmed" for the read
+filtering options. Combining this with ``--untrimmed-output``, for
+example, can be used to copy reads without adapters to a different
+file. Other read modification options, if used, may still change
+the read.
+
+Use ``--action=mask`` to write ``N`` characters to those parts of the read
+that would otherwise have been removed.
+
+Use ``--action=lowercase`` to change to lowercase those parts of the read that
+would otherwise have been removed. The rest is converted to uppercase.
+
+.. versionadded:: 3.1
+    The ``retain`` action.
 
 
 .. _cut-bases:
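The ``retain`` semantics documented above can be sketched in plain Python. This is a hypothetical illustration (the helper names are not cutadapt's API, and only exact matching is shown — cutadapt itself allows errors and partial matches): for a 5' adapter the read is kept from the adapter start onward; for a 3' adapter it is kept up to and including the adapter end.

```python
def retain_5prime(read: str, adapter: str) -> str:
    """Keep the adapter and everything after it (5' adapter)."""
    pos = read.find(adapter)
    return read[pos:] if pos != -1 else read


def retain_3prime(read: str, adapter: str) -> str:
    """Keep everything up to and including the adapter (3' adapter)."""
    pos = read.find(adapter)
    return read[:pos + len(adapter)] if pos != -1 else read


print(retain_5prime("NNNADAPTseq", "ADAPT"))  # ADAPTseq
print(retain_3prime("seqADAPTNNN", "ADAPT"))  # seqADAPT
```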


=====================================
setup.py
=====================================
@@ -92,6 +92,7 @@ setup(
     url='https://cutadapt.readthedocs.io/',
     description='trim adapters from high-throughput sequencing reads',
     long_description=long_description,
+    long_description_content_type='text/x-rst',
     license='MIT',
     cmdclass={'build_ext': BuildExt, 'sdist': SDist},
     ext_modules=extensions,


=====================================
src/cutadapt/__main__.py
=====================================
@@ -125,9 +125,8 @@ def get_argument_parser() -> ArgumentParser:
     group.add_argument("-h", "--help", action="help", help="Show this help message and exit")
     group.add_argument("--version", action="version", help="Show version number and exit",
         version=__version__)
-    group.add_argument("--debug", nargs="?", const=True, default=False,
-        choices=("trace", ),
-        help="Print debug log. 'trace' prints also DP matrices")
+    group.add_argument("--debug", action="count", default=0,
+        help="Print debug log. Use twice to also print DP matrices")
     group.add_argument("--profile", action="store_true", default=False, help=SUPPRESS)
     group.add_argument('-j', '--cores', type=int, default=1,
         help='Number of CPU cores to use. Use 0 to auto-detect. Default: %(default)s')
@@ -192,12 +191,14 @@ def get_argument_parser() -> ArgumentParser:
     group.add_argument("-N", "--no-match-adapter-wildcards", action="store_false",
         default=True, dest='match_adapter_wildcards',
         help="Do not interpret IUPAC wildcards in adapters.")
-    group.add_argument("--action", choices=('trim', 'mask', 'lowercase', 'none'), default='trim',
-        help="What to do with found adapters. "
+    group.add_argument("--action", choices=("trim", "retain", "mask", "lowercase", "none"),
+        default="trim",
+        help="What to do if a match was found. "
+            "trim: trim adapter and up- or downstream sequence; "
+            "retain: trim, but retain adapter; "
             "mask: replace with 'N' characters; "
             "lowercase: convert to lowercase; "
-            "none: leave unchanged (useful with "
-            "--discard-untrimmed). Default: %(default)s")
+            "none: leave unchanged. Default: %(default)s")
     group.add_argument("--rc", "--revcomp", dest="reverse_complement", default=False,
         action="store_true",
         help="Check both the read and its reverse complement for adapter matches. If "
@@ -724,7 +725,7 @@ def adapters_from_args(args) -> Tuple[List[Adapter], List[Adapter]]:
         raise CommandLineError(e)
     warn_duplicate_adapters(adapters)
     warn_duplicate_adapters(adapters2)
-    if args.debug == "trace":
+    if args.debug > 1:
         for adapter in adapters + adapters2:
             adapter.enable_debug()
     return adapters, adapters2
@@ -776,10 +777,13 @@ def add_adapter_cutter(
         pipeline.add_paired_modifier(cutter)
     else:
         adapter_cutter, adapter_cutter2 = None, None
-        if adapters:
-            adapter_cutter = AdapterCutter(adapters, times, action, allow_index)
-        if adapters2:
-            adapter_cutter2 = AdapterCutter(adapters2, times, action, allow_index)
+        try:
+            if adapters:
+                adapter_cutter = AdapterCutter(adapters, times, action, allow_index)
+            if adapters2:
+                adapter_cutter2 = AdapterCutter(adapters2, times, action, allow_index)
+        except ValueError as e:
+            raise CommandLineError(e)
         if paired:
             if reverse_complement:
                 raise CommandLineError("--revcomp not implemented for paired-end reads")
@@ -837,11 +841,11 @@ def main(cmdlineargs, default_outfile=sys.stdout.buffer) -> Statistics:
     start_time = time.time()
     parser = get_argument_parser()
     args, leftover_args = parser.parse_known_args(args=cmdlineargs)
-    # log to stderr if results are to be sent to stdout
-    log_to_stdout = args.output is not None and args.output != "-" and args.paired_output != "-"
     # Setup logging only if there are not already any handlers (can happen when
     # this function is being called externally such as from unit tests)
     if not logging.root.handlers:
+        # If results are to be sent to stdout, logging needs to go to stderr
+        log_to_stdout = args.output is not None and args.output != "-" and args.paired_output != "-"
         setup_logging(logger, stdout=log_to_stdout,
             quiet=args.quiet, minimal=args.report == 'minimal', debug=args.debug)
     log_header(cmdlineargs)
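The hunk above switches ``--debug`` from a ``nargs="?"`` option to argparse's ``count`` action, so giving the flag twice (``--debug --debug``) replaces the old ``--debug trace`` spelling. A minimal standalone sketch of that pattern (not cutadapt's full parser):

```python
import argparse

parser = argparse.ArgumentParser()
# Each occurrence of --debug increments args.debug; absent flag gives 0
parser.add_argument("--debug", action="count", default=0,
    help="Print debug log. Use twice to also print DP matrices")

args = parser.parse_args(["--debug", "--debug"])
print(args.debug)  # 2, which satisfies the `args.debug > 1` trace check
```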


=====================================
src/cutadapt/adapters.py
=====================================
@@ -151,6 +151,10 @@ class Match(ABC):
     def remainder_interval(self) -> Tuple[int, int]:
         pass
 
+    @abstractmethod
+    def retained_adapter_interval(self) -> Tuple[int, int]:
+        pass
+
     @abstractmethod
     def get_info_records(self, read) -> List[List]:
         pass
@@ -271,6 +275,9 @@ class RemoveBeforeMatch(SingleMatch):
         """
         return self.rstop, len(self.sequence)
 
+    def retained_adapter_interval(self) -> Tuple[int, int]:
+        return self.rstart, len(self.sequence)
+
     def trim_slice(self):
         # Same as remainder_interval, but as a slice() object
         return slice(self.rstop, None)
@@ -306,6 +313,9 @@ class RemoveAfterMatch(SingleMatch):
         """
         return 0, self.rstart
 
+    def retained_adapter_interval(self) -> Tuple[int, int]:
+        return 0, self.rstop
+
     def trim_slice(self):
         # Same as remainder_interval, but as a slice() object
         return slice(None, self.rstart)
@@ -329,9 +339,7 @@ def _generate_adapter_name(_start=[1]) -> str:
     return name
 
 
-class Adapter(ABC):
-
-    description = "adapter with one component"  # this is overriden in subclasses
+class Matchable(ABC):
 
     def __init__(self, name: str, *args, **kwargs):
         self.name = name
@@ -345,6 +353,15 @@ class Adapter(ABC):
         pass
 
 
+class Adapter(Matchable, ABC):
+
+    description = "adapter with one component"  # this is overridden in subclasses
+
+    @abstractmethod
+    def create_statistics(self) -> AdapterStatistics:
+        pass
+
+
 class SingleAdapter(Adapter, ABC):
     """
     This class can find a single adapter characterized by sequence, error rate,
@@ -685,6 +702,18 @@ class LinkedMatch(Match):
         matches = [match for match in [self.front_match, self.back_match] if match is not None]
         return remainder(matches)
 
+    def retained_adapter_interval(self) -> Tuple[int, int]:
+        if self.front_match:
+            start = self.front_match.rstart
+            offset = self.front_match.rstop
+        else:
+            start = offset = 0
+        if self.back_match:
+            end = self.back_match.rstop + offset
+        else:
+            end = len(self.front_match.sequence)
+        return start, end
+
     def get_info_records(self, read) -> List[List]:
         records = []
         for match, namesuffix in [
@@ -707,11 +736,11 @@ class LinkedAdapter(Adapter):
 
     def __init__(
         self,
-        front_adapter,
-        back_adapter,
-        front_required,
-        back_required,
-        name,
+        front_adapter: SingleAdapter,
+        back_adapter: SingleAdapter,
+        front_required: bool,
+        back_required: bool,
+        name: str,
     ):
         super().__init__(name)
         self.front_required = front_required
@@ -754,11 +783,11 @@ class LinkedAdapter(Adapter):
         return None
 
 
-class MultipleAdapters(Adapter):
+class MultipleAdapters(Matchable):
     """
     Represent multiple adapters at once
     """
-    def __init__(self, adapters: Sequence[Adapter]):
+    def __init__(self, adapters: Sequence[Matchable]):
         super().__init__(name="multiple_adapters")
         self._adapters = adapters
 
@@ -792,7 +821,7 @@ class MultipleAdapters(Adapter):
         return best_match
 
 
-class IndexedAdapters(Adapter, ABC):
+class IndexedAdapters(Matchable, ABC):
     """
     Represent multiple adapters of the same type at once and use an index data structure
     to speed up matching. This acts like a "normal" Adapter as it provides a match_to
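The interval arithmetic in the new ``LinkedMatch.retained_adapter_interval`` above hinges on one detail: the back match's coordinates are relative to the read that remains after the front trim, so the front match's ``rstop`` must be added as an offset. A simplified mirror of that logic using plain tuples (hypothetical function, not cutadapt's API):

```python
def retained_interval(front, back, read_length):
    """Return the (start, end) interval kept by --action=retain.

    front/back are (rstart, rstop) pairs or None. Back-match positions
    are relative to the read after the front trim, hence the offset.
    """
    if front is not None:
        start, offset = front[0], front[1]
    else:
        start = offset = 0
    if back is not None:
        end = back[1] + offset
    else:
        end = read_length
    return start, end
```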


=====================================
src/cutadapt/log.py
=====================================
@@ -6,6 +6,17 @@ import logging
 REPORT = 25
 
 
+class CrashingHandler(logging.StreamHandler):
+
+    def emit(self, record):
+        """Unlike the method it overrides, this will not catch exceptions"""
+        msg = self.format(record)
+        stream = self.stream
+        stream.write(msg)
+        stream.write(self.terminator)
+        self.flush()
+
+
 class NiceFormatter(logging.Formatter):
     """
     Do not prefix "INFO:" to info-level log messages (but do it for all other
@@ -19,7 +30,7 @@ class NiceFormatter(logging.Formatter):
         return super().format(record)
 
 
-def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=False):
+def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=0):
     """
     Attach handler to the global logger object
     """
@@ -28,12 +39,10 @@ def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=False)
     # INFO level (and the ERROR level would give us an 'ERROR:' prefix).
     logging.addLevelName(REPORT, 'REPORT')
 
-    # Due to backwards compatibility, logging output is sent to standard output
-    # instead of standard error if the -o option is used.
-    stream_handler = logging.StreamHandler(sys.stdout if stdout else sys.stderr)
+    stream_handler = CrashingHandler(sys.stdout if stdout else sys.stderr)
     stream_handler.setFormatter(NiceFormatter())
     # debug overrides quiet overrides minimal
-    if debug:
+    if debug > 0:
         level = logging.DEBUG
     elif quiet:
         level = logging.ERROR
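The point of the new ``CrashingHandler`` is that the stock ``StreamHandler.emit`` wraps its writes in a try/except and routes failures to ``handleError``, silently continuing; reimplementing ``emit`` without that wrapper lets write failures propagate to the caller. A self-contained demonstration (assuming writes to a closed stream raise ``ValueError``, as ``io.StringIO`` does):

```python
import io
import logging


class CrashingHandler(logging.StreamHandler):

    def emit(self, record):
        """Unlike the method it overrides, this will not catch exceptions"""
        msg = self.format(record)
        self.stream.write(msg)
        self.stream.write(self.terminator)
        self.flush()


stream = io.StringIO()
stream.close()  # force every write to fail

logger = logging.getLogger("crash-demo")
logger.addHandler(CrashingHandler(stream))
try:
    logger.error("boom")
except ValueError:
    # A plain StreamHandler would swallow this and print to stderr instead
    print("exception propagated")
```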


=====================================
src/cutadapt/modifiers.py
=====================================
@@ -9,7 +9,8 @@ from abc import ABC, abstractmethod
 from collections import OrderedDict
 
 from .qualtrim import quality_trim_index, nextseq_trim_index
-from .adapters import MultipleAdapters, SingleAdapter, IndexedPrefixAdapters, IndexedSuffixAdapters, Match, remainder
+from .adapters import MultipleAdapters, SingleAdapter, IndexedPrefixAdapters, IndexedSuffixAdapters, \
+    Match, remainder, Adapter
 from .utils import reverse_complemented_sequence
 
 
@@ -76,18 +77,23 @@ class AdapterCutter(SingleEndModifier):
 
     def __init__(
         self,
-        adapters: List[SingleAdapter],
+        adapters: List[Adapter],
         times: int = 1,
         action: Optional[str] = "trim",
         index: bool = True,
     ):
         """
-        action -- What to do with a found adapter: None, 'trim', 'mask' or 'lowercase'
+        action -- What to do with a found adapter:
+          None: Do nothing, only update the ModificationInfo appropriately
+          "trim": Remove the adapter and down- or upstream sequence depending on adapter type
+          "mask": Replace the part of the sequence that would have been removed with "N" bases
+          "lowercase": Convert the part of the sequence that would have been removed to lowercase
+          "retain": Like "trim", but leave the adapter sequence itself in the read
 
         index -- if True, an adapter index (for multiple adapters) is created if possible
         """
         self.times = times
-        assert action in ('trim', 'mask', 'lowercase', None)
+        assert action in ("trim", "mask", "lowercase", "retain", None)
         self.action = action
         self.with_adapters = 0
         self.adapter_statistics = OrderedDict((a, a.create_statistics()) for a in adapters)
@@ -95,6 +101,8 @@ class AdapterCutter(SingleEndModifier):
             self.adapters = MultipleAdapters(self._regroup_into_indexed_adapters(adapters))
         else:
             self.adapters = MultipleAdapters(adapters)
+        if action == "retain" and times > 1:
+            raise ValueError("'retain' cannot be combined with times > 1")
 
     def __repr__(self):
         return 'AdapterCutter(adapters={!r}, times={}, action={!r})'.format(
@@ -141,6 +149,11 @@ class AdapterCutter(SingleEndModifier):
                 other.append(a)
         return prefix, suffix, other
 
+    @staticmethod
+    def trim_but_retain_adapter(read, matches: Sequence[Match]):
+        start, stop = matches[-1].retained_adapter_interval()
+        return read[start:stop]
+
     @staticmethod
     def masked_read(read, matches: Sequence[Match]):
         start, stop = remainder(matches)
@@ -174,7 +187,7 @@ class AdapterCutter(SingleEndModifier):
     def match_and_trim(self, read):
         """
         Search for the best-matching adapter in a read, perform the requested action
-        ('trim', 'mask', 'lowercase' or None as determined by self.action) and return the
+        ('trim', 'mask' etc. as determined by self.action) and return the
         (possibly) modified read.
 
         *self.times* adapter removal rounds are done. During each round,
@@ -184,7 +197,7 @@ class AdapterCutter(SingleEndModifier):
         Return a pair (trimmed_read, matches), where matches is a list of Match instances.
         """
         matches = []
-        if self.action == 'lowercase':
+        if self.action == 'lowercase':  # TODO this should not be needed
             read.sequence = read.sequence.upper()
         trimmed_read = read
         for _ in range(self.times):
@@ -201,12 +214,14 @@ class AdapterCutter(SingleEndModifier):
         if self.action == 'trim':
             # read is already trimmed, nothing to do
             pass
+        elif self.action == 'retain':
+            trimmed_read = self.trim_but_retain_adapter(read, matches)
         elif self.action == 'mask':
             trimmed_read = self.masked_read(read, matches)
         elif self.action == 'lowercase':
             trimmed_read = self.lowercased_read(read, matches)
             assert len(trimmed_read.sequence) == len(read)
-        elif self.action is None:  # --no-trim
+        elif self.action is None:
             trimmed_read = read[:]
 
         return trimmed_read, matches
@@ -321,6 +336,8 @@ class PairedAdapterCutter(PairedModifier):
             elif self.action == 'lowercase':
                 trimmed_read = AdapterCutter.lowercased_read(read, [match])
                 assert len(trimmed_read.sequence) == len(read)
+            elif self.action == 'retain':
+                trimmed_read = AdapterCutter.trim_but_retain_adapter(read, [match])
             elif self.action is None:  # --no-trim
                 trimmed_read = read[:]
             result.append(trimmed_read)
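The action dispatch in ``match_and_trim`` above can be mimicked with a standalone sketch (hypothetical ``apply_action`` on plain strings, not cutadapt's read objects). Here ``keep`` is the interval that ``trim`` would keep; ``retain`` is omitted because it simply widens that interval to include the adapter, as shown in ``retained_adapter_interval``:

```python
def apply_action(seq: str, keep: tuple, action) -> str:
    """Apply a trimming action given the (start, stop) interval to keep."""
    start, stop = keep
    if action == "trim":
        return seq[start:stop]
    if action == "mask":
        # Replace the removed parts with 'N' characters, keep length
        return "N" * start + seq[start:stop] + "N" * (len(seq) - stop)
    if action == "lowercase":
        # Lowercase the removed parts, uppercase the rest, keep length
        return seq[:start].lower() + seq[start:stop].upper() + seq[stop:].lower()
    if action is None:
        return seq
    raise ValueError(f"unknown action: {action!r}")
```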


=====================================
src/cutadapt/parser.py
=====================================
@@ -352,7 +352,7 @@ class AdapterParser:
             raise ValueError("'anywhere' (-b) adapters may not be linked")
         front_spec = AdapterSpecification.parse(spec1, 'front')
         back_spec = AdapterSpecification.parse(spec2, 'back')
-        if not name:
+        if name is None:
             name = front_spec.name
 
         front_anchored = front_spec.restriction is not None


=====================================
src/cutadapt/pipeline.py
=====================================
@@ -4,7 +4,7 @@ import sys
 import copy
 import logging
 import functools
-from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple, Dict
+from typing import List, Optional, BinaryIO, TextIO, Any, Tuple, Dict
 from abc import ABC, abstractmethod
 from multiprocessing import Process, Pipe, Queue
 from pathlib import Path
@@ -120,7 +120,7 @@ class OutputFiles:
                     assert f is not None
                     yield f
 
-    def as_bytesio(self):
+    def as_bytesio(self) -> "OutputFiles":
         """
         Create a new OutputFiles instance that has BytesIO instances for each non-None output file
         """
@@ -141,6 +141,13 @@ class OutputFiles:
                 result.demultiplex_out2[k] = io.BytesIO()
         return result
 
+    def close(self) -> None:
+        """Close all output files that are not stdout"""
+        for f in self:
+            if f is sys.stdout or f is sys.stdout.buffer:
+                continue
+            f.close()
+
 
 class Pipeline(ABC):
     """
@@ -150,7 +157,6 @@ class Pipeline(ABC):
     paired = False
 
     def __init__(self, file_opener: FileOpener):
-        self._close_files = []  # type: List[IO]
         self._reader = None  # type: Any
         self._filters = []  # type: List[Any]
         self._infiles = None  # type: Optional[InputFiles]
@@ -168,7 +174,7 @@ class Pipeline(ABC):
         self.discard_untrimmed = False
         self.file_opener = file_opener
 
-    def connect_io(self, infiles: InputFiles, outfiles: OutputFiles):
+    def connect_io(self, infiles: InputFiles, outfiles: OutputFiles) -> None:
         self._infiles = infiles
         self._reader = infiles.open()
         self._set_output(outfiles)
@@ -182,7 +188,7 @@ class Pipeline(ABC):
     ):
         pass
 
-    def _set_output(self, outfiles: OutputFiles):
+    def _set_output(self, outfiles: OutputFiles) -> None:
         self._filters = []
         self._outfiles = outfiles
         filter_wrapper = self._filter_wrapper()
@@ -262,20 +268,21 @@ class Pipeline(ABC):
             f.flush()
 
     def close(self) -> None:
+        self._close_input()
+        self._close_output()
+
+    def _close_input(self) -> None:
         self._reader.close()
         if self._infiles is not None:
             self._infiles.close()
+
+    def _close_output(self) -> None:
         for f in self._textiowrappers:
-            f.close()  # This also closes the underlying files; a second close occurs below
+            f.close()
+        # Closing a TextIOWrapper also closes the underlying file, so
+        # this closes some files a second time.
         assert self._outfiles is not None
-        for f in self._outfiles:
-            # TODO do not use hasattr
-            if f is not sys.stdin and f is not sys.stdout and f is not sys.stdout.buffer and hasattr(f, 'close'):
-                f.close()
-        for outs in [self._outfiles.demultiplex_out, self._outfiles.demultiplex_out2]:
-            if outs is not None:
-                for out in outs.values():
-                    out.close()
+        self._outfiles.close()
 
     @property
     def uses_qualities(self) -> bool:
@@ -283,7 +290,7 @@ class Pipeline(ABC):
         return self._reader.delivers_qualities
 
     @abstractmethod
-    def process_reads(self, progress: Progress = None):
+    def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
         pass
 
     @abstractmethod
@@ -316,7 +323,7 @@ class SingleEndPipeline(Pipeline):
             raise ValueError("Modifier must not be None")
         self._modifiers.append(modifier)
 
-    def process_reads(self, progress: Progress = None):
+    def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
         """Run the pipeline. Return statistics"""
         n = 0  # no. of processed reads
         total_bp = 0
@@ -402,7 +409,7 @@ class PairedEndPipeline(Pipeline):
         # Whether to ignore pair_filter mode for discard-untrimmed filter
         self.override_untrimmed_pair_filter = False
 
-    def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
+    def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]) -> None:
         """
         Add a modifier for R1 and R2. One of them can be None, in which case the modifier
         will only be added for the respective read.
@@ -411,18 +418,18 @@ class PairedEndPipeline(Pipeline):
             raise ValueError("Not both modifiers can be None")
         self._modifiers.append(PairedModifierWrapper(modifier1, modifier2))
 
-    def add_both(self, modifier: SingleEndModifier):
+    def add_both(self, modifier: SingleEndModifier) -> None:
         """
         Add one modifier for both R1 and R2
         """
         assert modifier is not None
         self._modifiers.append(PairedModifierWrapper(modifier, copy.copy(modifier)))
 
-    def add_paired_modifier(self, paired_modifier: PairedModifier):
+    def add_paired_modifier(self, paired_modifier: PairedModifier) -> None:
         """Add a Modifier (without wrapping it in a PairedModifierWrapper)"""
         self._modifiers.append(paired_modifier)
 
-    def process_reads(self, progress: Progress = None):
+    def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
         n = 0  # no. of processed reads
         total1_bp = 0
         total2_bp = 0
@@ -594,8 +601,17 @@ class WorkerProcess(Process):
     To notify the reader process that it wants data, it puts its own identifier into the
     need_work_queue before attempting to read data from the read_pipe.
     """
-    def __init__(self, id_, pipeline, two_input_files,
-            interleaved_input, orig_outfiles, read_pipe, write_pipe, need_work_queue):
+    def __init__(
+        self,
+        id_: int,
+        pipeline: Pipeline,
+        two_input_files: bool,
+        interleaved_input: bool,
+        orig_outfiles: OutputFiles,
+        read_pipe: Connection,
+        write_pipe: Connection,
+        need_work_queue: Queue,
+    ):
         super().__init__()
         self._id = id_
         self._pipeline = pipeline
@@ -604,7 +620,9 @@ class WorkerProcess(Process):
         self._read_pipe = read_pipe
         self._write_pipe = write_pipe
         self._need_work_queue = need_work_queue
-        self._original_outfiles = orig_outfiles
+        # Do not store orig_outfiles directly because it contains
+        # _io.BufferedWriter attributes, which cannot be pickled.
+        self._original_outfiles = orig_outfiles.as_bytesio()
 
     def run(self):
         try:
@@ -690,7 +708,7 @@ class PipelineRunner(ABC):
     """
     A read processing pipeline
     """
-    def __init__(self, pipeline: Pipeline, progress: Progress, *args, **kwargs):
+    def __init__(self, pipeline: Pipeline, progress: Progress):
         self._pipeline = pipeline
         self._progress = progress
 
@@ -746,14 +764,14 @@ class ParallelPipelineRunner(PipelineRunner):
         self._need_work_queue = Queue()  # type: Queue
         self._buffer_size = buffer_size
         self._assign_input(infiles.path1, infiles.path2, infiles.interleaved)
-        self._assign_output(outfiles)
+        self._outfiles = outfiles
 
     def _assign_input(
         self,
         path1: str,
         path2: Optional[str] = None,
         interleaved: bool = False,
-    ):
+    ) -> None:
         self._two_input_files = path2 is not None
         self._interleaved_input = interleaved
         # the workers read from these connections
@@ -770,9 +788,6 @@ class ParallelPipelineRunner(PipelineRunner):
         self._reader_process.daemon = True
         self._reader_process.start()
 
-    def _assign_output(self, outfiles: OutputFiles):
-        self._outfiles = outfiles
-
     def _start_workers(self) -> Tuple[List[WorkerProcess], List[Connection]]:
         workers = []
         connections = []
@@ -789,16 +804,17 @@ class ParallelPipelineRunner(PipelineRunner):
             workers.append(worker)
         return workers, connections
 
-    def run(self):
+    def run(self) -> Statistics:
         workers, connections = self._start_workers()
         writers = []
         for f in self._outfiles:
             writers.append(OrderedChunkWriter(f))
-        stats = None
+        stats = Statistics()
         n = 0  # A running total of the number of processed reads (for progress indicator)
         while connections:
             ready_connections = multiprocessing.connection.wait(connections)
             for connection in ready_connections:
+                assert isinstance(connection, Connection)
                 chunk_index = connection.recv()
                 if chunk_index == -1:
                     # the worker is done
@@ -810,10 +826,7 @@ class ParallelPipelineRunner(PipelineRunner):
                         e, tb_str = connection.recv()
                         logger.error('%s', tb_str)
                         raise e
-                    if stats is None:
-                        stats = cur_stats
-                    else:
-                        stats += cur_stats
+                    stats += cur_stats
                     connections.remove(connection)
                     continue
                 elif chunk_index == -2:
@@ -844,11 +857,8 @@ class ParallelPipelineRunner(PipelineRunner):
         self._progress.stop(n)
         return stats
 
-    def close(self):
-        for f in self._outfiles:
-            # TODO do not use hasattr
-            if f is not sys.stdin and f is not sys.stdout and f is not sys.stdout.buffer and hasattr(f, 'close'):
-                f.close()
+    def close(self) -> None:
+        self._outfiles.close()
 
 
 class SerialPipelineRunner(PipelineRunner):
@@ -866,12 +876,14 @@ class SerialPipelineRunner(PipelineRunner):
         super().__init__(pipeline, progress)
         self._pipeline.connect_io(infiles, outfiles)
 
-    def run(self):
+    def run(self) -> Statistics:
         (n, total1_bp, total2_bp) = self._pipeline.process_reads(progress=self._progress)
         if self._progress:
             self._progress.stop(n)
         # TODO
-        return Statistics().collect(n, total1_bp, total2_bp, self._pipeline._modifiers, self._pipeline._filters)
+        modifiers = getattr(self._pipeline, "_modifiers", None)
+        assert modifiers is not None
+        return Statistics().collect(n, total1_bp, total2_bp, modifiers, self._pipeline._filters)
 
-    def close(self):
+    def close(self) -> None:
         self._pipeline.close()
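A note on the `WorkerProcess.__init__` change above (storing `orig_outfiles.as_bytesio()` instead of the files themselves): buffered file objects wrap OS-level handles and cannot cross a process boundary via pickle, while in-memory `BytesIO` buffers can. A minimal standalone sketch (not cutadapt code) illustrating the difference:

```python
import io
import os
import pickle

# A BufferedWriter wraps an OS file descriptor; pickling it raises
# TypeError, which is why it cannot be handed to a worker process.
with open(os.devnull, "wb") as f:
    try:
        pickle.dumps(f)
        file_picklable = True
    except TypeError:
        file_picklable = False

# An in-memory buffer carries its own state and round-trips fine.
restored = pickle.loads(pickle.dumps(io.BytesIO(b"payload")))

print(file_picklable)        # False
print(restored.getvalue())   # b'payload'
```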


=====================================
src/cutadapt/report.py
=====================================
@@ -1,6 +1,7 @@
 """
 Routines for printing a report.
 """
+import sys
 from io import StringIO
 import textwrap
 from collections import Counter
@@ -11,7 +12,8 @@ from .adapters import (
 )
 from .modifiers import (SingleEndModifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
     AdapterCutter, PairedAdapterCutter, ReverseComplementer)
-from .filters import WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter
+from .filters import (WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter,
+    CasavaFilter, MaximumExpectedErrorsFilter)
 
 
 def safe_divide(numerator, denominator):
@@ -38,6 +40,8 @@ class Statistics:
         self.too_short = None
         self.too_long = None
         self.too_many_n = None
+        self.too_many_expected_errors = None
+        self.casava_filtered = None
         self.reverse_complemented = None  # type: Optional[int]
         self.n = 0
         self.written = 0
@@ -66,6 +70,9 @@ class Statistics:
         self.too_short = add_if_not_none(self.too_short, other.too_short)
         self.too_long = add_if_not_none(self.too_long, other.too_long)
         self.too_many_n = add_if_not_none(self.too_many_n, other.too_many_n)
+        self.too_many_expected_errors = add_if_not_none(
+            self.too_many_expected_errors, other.too_many_expected_errors)
+        self.casava_filtered = add_if_not_none(self.casava_filtered, other.casava_filtered)
         for i in (0, 1):
             self.total_bp[i] += other.total_bp[i]
             self.written_bp[i] += other.written_bp[i]
@@ -120,6 +127,10 @@ class Statistics:
                 self.too_long = w.filtered
             elif isinstance(w.filter, NContentFilter):
                 self.too_many_n = w.filtered
+            elif isinstance(w.filter, MaximumExpectedErrorsFilter):
+                self.too_many_expected_errors = w.filtered
+            elif isinstance(w.filter, CasavaFilter):
+                self.casava_filtered = w.filtered
 
     def _collect_modifier(self, m: SingleEndModifier):
         if isinstance(m, PairedAdapterCutter):
@@ -187,6 +198,14 @@ class Statistics:
     def too_many_n_fraction(self):
         return safe_divide(self.too_many_n, self.n)
 
+    @property
+    def too_many_expected_errors_fraction(self):
+        return safe_divide(self.too_many_expected_errors, self.n)
+
+    @property
+    def casava_filtered_fraction(self):
+        return safe_divide(self.casava_filtered, self.n)
+
 
 def error_ranges(adapter_statistics: EndStatistics):
     length = adapter_statistics.effective_length
@@ -291,8 +310,12 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str:  # no
         kwargs['file'] = sio
         print(*args, **kwargs)
 
-    print_s("Finished in {:.2F} s ({:.0F} µs/read; {:.2F} M reads/minute).".format(
-        time, 1E6 * time / stats.n, stats.n / time * 60 / 1E6))
+    if sys.version_info[:2] <= (3, 6):
+        micro = "u"
+    else:
+        micro = "µ"
+    print_s("Finished in {:.2F} s ({:.0F} {}s/read; {:.2F} M reads/minute).".format(
+        time, 1E6 * time / stats.n, micro, stats.n / time * 60 / 1E6))
 
     report = "\n=== Summary ===\n\n"
     if stats.paired:
@@ -309,12 +332,19 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str:  # no
     if stats.reverse_complemented is not None:
         report += "Reverse-complemented:            " \
                   "{o.reverse_complemented:13,d} ({o.reverse_complemented_fraction:.1%})\n"
+
     if stats.too_short is not None:
         report += "{pairs_or_reads} that were too short:       {o.too_short:13,d} ({o.too_short_fraction:.1%})\n"
     if stats.too_long is not None:
         report += "{pairs_or_reads} that were too long:        {o.too_long:13,d} ({o.too_long_fraction:.1%})\n"
     if stats.too_many_n is not None:
         report += "{pairs_or_reads} with too many N:           {o.too_many_n:13,d} ({o.too_many_n_fraction:.1%})\n"
+    if stats.too_many_expected_errors is not None:
+        report += "{pairs_or_reads} with too many exp. errors: " \
+                  "{o.too_many_expected_errors:13,d} ({o.too_many_expected_errors_fraction:.1%})\n"
+    if stats.casava_filtered is not None:
+        report += "{pairs_or_reads} failed CASAVA filter:      " \
+                  "{o.casava_filtered:13,d} ({o.casava_filtered_fraction:.1%})\n"
 
     report += textwrap.dedent("""\
     {pairs_or_reads} written (passing filters): {o.written:13,d} ({o.written_fraction:.1%})
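The new counters above are merged with `add_if_not_none`, a helper defined elsewhere in report.py. Its name and the merging code suggest semantics along these lines (a sketch consistent with the diff, not the actual source): `None` means "this filter was never active", so a counter only becomes a number once some worker reports one.

```python
from typing import Optional


def add_if_not_none(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Merge two optional counters; None means 'filter not active'."""
    if a is None:
        return b
    if b is None:
        return a
    return a + b


print(add_if_not_none(None, None))  # None
print(add_if_not_none(None, 3))     # 3
print(add_if_not_none(2, 3))        # 5
```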


=====================================
src/cutadapt/utils.py
=====================================
@@ -194,6 +194,7 @@ class FileOpener:
             f = self.dnaio_open(*args, **kwargs)
         except OSError as e:
             if e.errno == errno.EMFILE:  # Too many open files
+                logger.debug("Too many open files, attempting to raise soft limit")
                 raise_open_files_limit(8)
                 f = self.dnaio_open(*args, **kwargs)
             else:
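The EMFILE fallback above calls `raise_open_files_limit`, defined elsewhere in utils.py. On POSIX systems that kind of helper can be approximated with the `resource` module (a sketch, not the actual cutadapt implementation): bump the soft `RLIMIT_NOFILE` limit, never exceeding the hard limit.

```python
import resource


def raise_open_files_limit(n: int) -> None:
    """Raise the soft open-files limit by n, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return  # already unlimited, nothing to do
    soft += n
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Only the soft limit is touched; raising the hard limit would require elevated privileges.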


=====================================
tests/cut/action_retain.fasta
=====================================
@@ -0,0 +1,16 @@
+>r1
+CGTCCGAAcaag
+>r2
+caag
+>r3
+TGCCCTAGTcaag
+>r4
+TGCCCTAGTcaa
+>r5
+ggttaaCCGCCTTGA
+>r6
+ggttaaCATTGCCCTAGTTTATT
+>r7
+ttaaGTTCATGT
+>r8
+ACGTACGT


=====================================
tests/data/action_retain.fasta
=====================================
@@ -0,0 +1,16 @@
+>r1
+CGTCCGAAcaagCCTGCCACAT
+>r2
+caagACAAGACCT
+>r3
+TGCCCTAGTcaag
+>r4
+TGCCCTAGTcaa
+>r5
+TGTTGggttaaCCGCCTTGA
+>r6
+ggttaaCATTGCCCTAGTTTATT
+>r7
+ttaaGTTCATGT
+>r8
+ACGTACGT


=====================================
tests/test_command.py
=====================================
@@ -0,0 +1,79 @@
+"""Tests that run the program in a subprocess"""
+
+import subprocess
+import sys
+
+from utils import datapath, assert_files_equal, cutpath
+
+
+def test_run_cutadapt_process():
+    subprocess.check_call(["cutadapt", "--version"])
+
+
+def test_run_as_module():
+    """Check that "python3 -m cutadapt ..." works"""
+    from cutadapt import __version__
+    with subprocess.Popen([sys.executable, "-m", "cutadapt", "--version"], stdout=subprocess.PIPE) as py:
+        assert py.communicate()[0].decode().strip() == __version__
+
+
+def test_standard_input_pipe(tmpdir, cores):
+    """Read FASTQ from standard input"""
+    out_path = str(tmpdir.join("out.fastq"))
+    in_path = datapath("small.fastq")
+    # Use 'cat' to simulate that no file name is available for stdin
+    with subprocess.Popen(["cat", in_path], stdout=subprocess.PIPE) as cat:
+        with subprocess.Popen([
+            sys.executable, "-m", "cutadapt", "--cores", str(cores),
+            "-a", "TTAGACATATCTCCGTCG", "-o", out_path, "-"],
+            stdin=cat.stdout
+        ) as py:
+            _ = py.communicate()
+            cat.stdout.close()
+            _ = py.communicate()[0]
+    assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_standard_output(tmpdir, cores):
+    """Write FASTQ to standard output (not using --output/-o option)"""
+    out_path = str(tmpdir.join("out.fastq"))
+    with open(out_path, "w") as out_file:
+        py = subprocess.Popen([
+            sys.executable, "-m", "cutadapt", "--cores", str(cores),
+            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+            stdout=out_file)
+        _ = py.communicate()
+    assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_explicit_standard_output(tmpdir, cores):
+    """Write FASTQ to standard output (using "-o -")"""
+
+    out_path = str(tmpdir.join("out.fastq"))
+    with open(out_path, "w") as out_file:
+        py = subprocess.Popen([
+            sys.executable, "-m", "cutadapt", "-o", "-", "--cores", str(cores),
+            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+            stdout=out_file)
+        _ = py.communicate()
+    assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_force_fasta_output(tmpdir, cores):
+    """Write FASTA to standard output even on FASTQ input"""
+
+    out_path = str(tmpdir.join("out.fasta"))
+    with open(out_path, "w") as out_file:
+        py = subprocess.Popen([
+            sys.executable, "-m", "cutadapt", "--fasta", "-o", "-", "--cores", str(cores),
+            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+            stdout=out_file)
+        _ = py.communicate()
+    assert_files_equal(cutpath("small.fasta"), out_path)
+
+
+def test_non_utf8_locale():
+    subprocess.check_call(
+        [sys.executable, "-m", "cutadapt", "-o", "/dev/null", datapath("small.fastq")],
+        env={"LC_CTYPE": "C"},
+    )


=====================================
tests/test_commandline.py
=====================================
@@ -1,10 +1,7 @@
-import os
-import shutil
+import subprocess
 import sys
-import tempfile
 from io import StringIO, BytesIO
 import pytest
-import subprocess
 
 from cutadapt.__main__ import main
 from utils import assert_files_equal, datapath, cutpath
@@ -54,7 +51,7 @@ def test_debug():
 
 
 def test_debug_trace():
-    main(["--debug=trace", "-a", "ACGT", datapath("small.fastq")])
+    main(["--debug", "--debug", "-a", "ACGT", datapath("small.fastq")])
 
 
 def test_example(run):
@@ -230,6 +227,15 @@ def test_action_lowercase(run):
     run("-b CAAG -n 3 --action=lowercase", "action_lowercase.fasta", "action_lowercase.fasta")
 
 
+def test_action_retain(run):
+    run("-g GGTTAACC -a CAAG --action=retain", "action_retain.fasta", "action_retain.fasta")
+
+
+def test_action_retain_times():
+    with pytest.raises(SystemExit):
+        main(["-a", "ACGT", "--times=2", "--action=retain", datapath("small.fastq")])
+
+
 def test_gz_multiblock(run):
     """compressed gz file with multiple blocks (created by concatenating two .gz files)"""
     run("-b TTAGACATATCTCCGTCG", "small.fastq", "multiblock.fastq.gz")
@@ -470,9 +476,9 @@ def test_adapter_file_empty_name(run):
     run('-N -a file:' + datapath('adapter-empty-name.fasta'), 'illumina.fastq', 'illumina.fastq.gz')
 
 
-def test_demultiplex(cores):
-    tempdir = tempfile.mkdtemp(prefix='cutadapt-tests.')
-    multiout = os.path.join(tempdir, 'tmp-demulti.{name}.fasta')
+@pytest.mark.parametrize("ext", ["", ".gz"])
+def test_demultiplex(cores, tmp_path, ext):
+    multiout = str(tmp_path / 'tmp-demulti.{name}.fasta') + ext
     params = [
         '--cores', str(cores),
         '-a', 'first=AATTTCAGGAATT',
@@ -481,10 +487,13 @@ def test_demultiplex(cores):
         datapath('twoadapters.fasta'),
     ]
     main(params)
-    assert_files_equal(cutpath('twoadapters.first.fasta'), multiout.format(name='first'))
-    assert_files_equal(cutpath('twoadapters.second.fasta'), multiout.format(name='second'))
-    assert_files_equal(cutpath('twoadapters.unknown.fasta'), multiout.format(name='unknown'))
-    shutil.rmtree(tempdir)
+    for name in ("first", "second", "unknown"):
+        actual = multiout.format(name=name)
+        if ext == ".gz":
+            subprocess.run(["gzip", "-d", actual], check=True)
+            actual = actual[:-3]
+        expected = cutpath("twoadapters.{name}.fasta".format(name=name))
+        assert_files_equal(expected, actual)
 
 
 def test_multiple_fake_anchored_adapters(run):
@@ -610,10 +619,6 @@ def test_negative_length(run):
     run('--length -5', 'shortened-negative.fastq', 'small.fastq')
 
 
-def test_run_cutadapt_process():
-    subprocess.check_call(['cutadapt', '--version'])
-
-
 @pytest.mark.timeout(0.5)
 def test_issue_296(tmpdir):
     # Hang when using both --no-trim and --info-file together
@@ -636,7 +641,8 @@ def test_adapterx(run):
 
 
 def test_discard_casava(run):
-    run('--discard-casava', 'casava.fastq', 'casava.fastq')
+    stats = run('--discard-casava', 'casava.fastq', 'casava.fastq')
+    assert stats.casava_filtered == 1
 
 
 def test_underscore(run):
@@ -663,68 +669,6 @@ def test_paired_separate(run):
     run("-a CAGTGGAGTA", "paired-separate.2.fastq", "paired.2.fastq")
 
 
-def test_run_as_module():
-    """Check that "python3 -m cutadapt ..." works"""
-    from cutadapt import __version__
-    py = subprocess.Popen([sys.executable, "-m", "cutadapt", "--version"], stdout=subprocess.PIPE)
-    assert py.communicate()[0].decode().strip() == __version__
-
-
-def test_standard_input_pipe(tmpdir, cores):
-    """Read FASTQ from standard input"""
-    out_path = str(tmpdir.join("out.fastq"))
-    in_path = datapath("small.fastq")
-    # Use 'cat' to simulate that no file name is available for stdin
-    with subprocess.Popen(["cat", in_path], stdout=subprocess.PIPE) as cat:
-        with subprocess.Popen([
-            sys.executable, "-m", "cutadapt", "--cores", str(cores),
-            "-a", "TTAGACATATCTCCGTCG", "-o", out_path, "-"],
-            stdin=cat.stdout
-        ) as py:
-            _ = py.communicate()
-            cat.stdout.close()
-            _ = py.communicate()[0]
-    assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_standard_output(tmpdir, cores):
-    """Write FASTQ to standard output (not using --output/-o option)"""
-    out_path = str(tmpdir.join("out.fastq"))
-    with open(out_path, "w") as out_file:
-        py = subprocess.Popen([
-            sys.executable, "-m", "cutadapt", "--cores", str(cores),
-            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
-            stdout=out_file)
-        _ = py.communicate()
-    assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_explicit_standard_output(tmpdir, cores):
-    """Write FASTQ to standard output (using "-o -")"""
-
-    out_path = str(tmpdir.join("out.fastq"))
-    with open(out_path, "w") as out_file:
-        py = subprocess.Popen([
-            sys.executable, "-m", "cutadapt", "-o", "-", "--cores", str(cores),
-            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
-            stdout=out_file)
-        _ = py.communicate()
-    assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_force_fasta_output(tmpdir, cores):
-    """Write FASTA to standard output even on FASTQ input"""
-
-    out_path = str(tmpdir.join("out.fasta"))
-    with open(out_path, "w") as out_file:
-        py = subprocess.Popen([
-            sys.executable, "-m", "cutadapt", "--fasta", "-o", "-", "--cores", str(cores),
-            "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
-            stdout=out_file)
-        _ = py.communicate()
-    assert_files_equal(cutpath("small.fasta"), out_path)
-
-
 def test_empty_read_with_wildcard_in_adapter(run):
     run("-g CWC", "empty.fastq", "empty.fastq")
 
@@ -773,7 +717,8 @@ def test_reverse_complement_and_info_file(run, tmp_path, cores):
 
 
 def test_max_expected_errors(run, cores):
-    run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+    stats = run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+    assert stats.too_many_expected_errors == 2
 
 
 def test_max_expected_errors_fasta(tmp_path):


=====================================
tests/test_modifiers.py
=====================================
@@ -1,7 +1,10 @@
+from typing import List
+
 import pytest
 
 from dnaio import Sequence
-from cutadapt.adapters import BackAdapter, PrefixAdapter, IndexedPrefixAdapters
+from cutadapt.adapters import BackAdapter, PrefixAdapter, IndexedPrefixAdapters, LinkedAdapter, \
+    FrontAdapter, Adapter
 from cutadapt.modifiers import (UnconditionalCutter, NEndTrimmer, QualityTrimmer,
     Shortener, AdapterCutter, PairedAdapterCutter, ModificationInfo, ZeroCapper)
 
@@ -80,7 +83,8 @@ def test_adapter_cutter_indexing():
     (None, "CCCCGGTTAACCCC", "TTTTAACCGGTTTT"),
     ("trim", "CCCC", "TTTT"),
     ("lowercase", "CCCCggttaacccc", "TTTTaaccggtttt"),
-    ("mask", "CCCCNNNNNNNNNN", "TTTTNNNNNNNNNN")
+    ("mask", "CCCCNNNNNNNNNN", "TTTTNNNNNNNNNN"),
+    ("retain", "CCCCGGTTAA", "TTTTAACCGG"),
 ])
 def test_paired_adapter_cutter_actions(action, expected_trimmed1, expected_trimmed2):
     a1 = BackAdapter("GGTTAA")
@@ -93,3 +97,37 @@ def test_paired_adapter_cutter_actions(action, expected_trimmed1, expected_trimm
     trimmed1, trimmed2 = pac(s1, s2, info1, info2)
     assert expected_trimmed1 == trimmed1.sequence
     assert expected_trimmed2 == trimmed2.sequence
+
+
+def test_retain_times():
+    with pytest.raises(ValueError) as e:
+        AdapterCutter([BackAdapter("ACGT")], times=2, action="retain")
+    assert "cannot be combined with times" in e.value.args[0]
+
+
+def test_action_retain():
+    back = BackAdapter("AACCGG")
+    ac = AdapterCutter([back], action="retain")
+    seq = Sequence("r1", "ATTGCCAACCGGTATATAT")
+    info = ModificationInfo(seq)
+    trimmed = ac(seq, info)
+    assert "ATTGCCAACCGG" == trimmed.sequence
+
+
+ at pytest.mark.parametrize("s,expected", [
+    ("ATTATTggttaaccAAAAAaaccggTATT", "ggttaaccAAAAAaaccgg"),
+    ("AAAAAaaccggTATT", "AAAAAaaccgg"),
+    ("ATTATTggttaaccAAAAA", "ggttaaccAAAAA"),
+    ("ATTATT", "ATTATT"),
+])
+def test_linked_action_retain(s, expected):
+    front = FrontAdapter("GGTTAACC")
+    back = BackAdapter("AACCGG")
+    adapters: List[Adapter] = [
+        LinkedAdapter(front, back, front_required=False, back_required=False, name="linked")
+    ]
+    ac = AdapterCutter(adapters, action="retain")
+    seq = Sequence("r1", s)
+    info = ModificationInfo(seq)
+    trimmed = ac(seq, info)
+    assert expected == trimmed.sequence
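For readers of the new tests above: the `retain` action keeps the matched adapter sequence and trims only what lies beyond it. For a 3' (back) adapter this behaviour can be sketched standalone (a hypothetical exact-match helper; cutadapt itself does error-tolerant alignment):

```python
def retain_back(read: str, adapter: str) -> str:
    """Trim bases after a 3' adapter but keep the adapter itself."""
    pos = read.find(adapter)  # exact match only, unlike cutadapt
    if pos == -1:
        return read  # no adapter found: read left untouched
    return read[:pos + len(adapter)]


# Mirrors test_action_retain: AACCGG is kept, trailing bases removed.
print(retain_back("ATTGCCAACCGGTATATAT", "AACCGG"))  # ATTGCCAACCGG
print(retain_back("ACGTACGT", "AACCGG"))             # ACGTACGT
```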


=====================================
tox.ini
=====================================
@@ -13,6 +13,7 @@ commands =
     coverage run --concurrency=multiprocessing -m pytest --doctest-modules --pyargs cutadapt tests
     coverage combine
     coverage report
+    coverage xml
 
 [testenv:docs]
 basepython = python3.6



View it on GitLab: https://salsa.debian.org/med-team/python-cutadapt/-/commit/02bef8f81a3ff83013340ae5ab51880b970c2bbf
