[med-svn] [Git][med-team/python-cutadapt][master] 7 commits: New upstream version 3.1
Nilesh Patra
gitlab@salsa.debian.org
Sat Dec 12 13:17:58 GMT 2020
Nilesh Patra pushed to branch master at Debian Med / python-cutadapt
Commits:
02bef8f8 by Nilesh Patra at 2020-12-12T18:36:06+05:30
New upstream version 3.1
- - - - -
8d90f6ee by Nilesh Patra at 2020-12-12T18:36:06+05:30
routine-update: New upstream version
- - - - -
d5d88dd0 by Nilesh Patra at 2020-12-12T18:36:07+05:30
Update upstream source from tag 'upstream/3.1'
Update to upstream version '3.1'
with Debian dir 2d6a3b9e4bdc36fd3ad85de49c19bf505a7988e6
- - - - -
4d06e551 by Nilesh Patra at 2020-12-12T18:36:08+05:30
routine-update: Standards-Version: 4.5.1
- - - - -
acbe606d by Nilesh Patra at 2020-12-12T18:38:34+05:30
Refresh patch
- - - - -
c707e807 by Nilesh Patra at 2020-12-12T18:43:27+05:30
Do not mix python3-all and python3-all-dev
- - - - -
20c3da49 by Nilesh Patra at 2020-12-12T18:43:48+05:30
routine-update: Ready to upload to unstable
- - - - -
24 changed files:
- .codecov.yml
- + .github/workflows/ci.yml
- − .travis.yml
- CHANGES.rst
- buildwheels.sh
- debian/changelog
- debian/control
- debian/patches/xfail_cutadapt_executable_test.patch
- doc/guide.rst
- setup.py
- src/cutadapt/__main__.py
- src/cutadapt/adapters.py
- src/cutadapt/log.py
- src/cutadapt/modifiers.py
- src/cutadapt/parser.py
- src/cutadapt/pipeline.py
- src/cutadapt/report.py
- src/cutadapt/utils.py
- + tests/cut/action_retain.fasta
- + tests/data/action_retain.fasta
- + tests/test_command.py
- tests/test_commandline.py
- tests/test_modifiers.py
- tox.ini
Changes:
=====================================
.codecov.yml
=====================================
@@ -1,5 +1,3 @@
-comment: off
-
codecov:
require_ci_to_pass: no
@@ -9,7 +7,11 @@ coverage:
range: "90...100"
status:
- project: yes
+ project:
+ default:
+ target: auto
+ threshold: 1%
+ base: auto
patch: no
changes: no
=====================================
.github/workflows/ci.yml
=====================================
@@ -0,0 +1,72 @@
+name: CI
+
+on: [push, pull_request]
+
+jobs:
+ lint:
+ timeout-minutes: 5
+ runs-on: ubuntu-latest
+ strategy:
+ matrix:
+ python-version: [3.7]
+ toxenv: [flake8, mypy, docs]
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Install dependencies
+ run: python -m pip install tox
+ - name: Run tox ${{ matrix.toxenv }}
+ run: tox -e ${{ matrix.toxenv }}
+
+ test:
+ timeout-minutes: 5
+ runs-on: ${{ matrix.os }}
+ strategy:
+ matrix:
+ os: [ubuntu-latest]
+ python-version: [3.6, 3.7, 3.8, 3.9]
+ include:
+ - python-version: 3.8
+ os: macos-latest
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Install dependencies
+ run: python -m pip install tox
+ - name: Test
+ run: tox -e py
+ - name: Upload coverage report
+ uses: codecov/codecov-action@v1
+
+ deploy:
+ timeout-minutes: 5
+ runs-on: ubuntu-latest
+ needs: [lint, test]
+ if: startsWith(github.ref, 'refs/tags')
+ steps:
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 0 # required for setuptools_scm
+ - name: Set up Python
+ uses: actions/setup-python@v2
+ with:
+ python-version: 3.7
+ - name: Make distributions
+ run: |
+ python -m pip install Cython
+ python setup.py sdist
+ ./buildwheels.sh
+ ls -l dist/
+ - name: Publish to PyPI
+ uses: pypa/gh-action-pypi-publish@v1.4.1
+ with:
+ user: __token__
+ password: ${{ secrets.pypi_password }}
+ #password: ${{ secrets.test_pypi_password }}
+ #repository_url: https://test.pypi.org/legacy/
=====================================
.travis.yml deleted
=====================================
@@ -1,63 +0,0 @@
-language: python
-
-cache:
- directories:
- - $HOME/.cache/pip
-
-install:
- - pip install tox
-
-script:
- - tox
-
-after_success:
- - pip install codecov
- - codecov
-
-env:
- global:
- #- TWINE_REPOSITORY_URL=https://test.pypi.org/legacy/
- - TWINE_USERNAME=marcelm
- # TWINE_PASSWORD is set in Travis settings
-
-jobs:
- include:
- - python: "3.6"
- env: TOXENV=py36
-
- - python: "3.7"
- env: TOXENV=py37
-
- - python: "3.8"
- env: TOXENV=py38
-
- - python: "3.9"
- env: TOXENV=py39
-
- - name: flake8
- python: "3.6"
- env: TOXENV=flake8
-
- - name: mypy
- python: "3.6"
- env: TOXENV=mypy
-
- - name: docs
- python: "3.6"
- env: TOXENV=docs
-
- - stage: deploy
- services:
- - docker
- python: "3.6"
- install: python3 -m pip install Cython twine
- if: tag IS present
- script:
- - |
- python3 setup.py sdist
- ./buildwheels.sh
- ls -l dist/
- python3 -m twine upload dist/*
-
- allow_failures:
- - python: "nightly"
=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,25 @@
Changes
=======
+v3.1 (2020-12-03)
+-----------------
+
+* :issue:`443`: With ``--action=retain``, it is now possible to trim reads while
+ leaving the adapter sequence itself in the read. That is, only the sequence
+ before (for 5’ adapters) or after (for 3’ adapters) is removed. With linked
+ adapters, both adapters are retained.
+* :issue:`495`: Running with multiple cores did not work using macOS and Python 3.8+.
+ To prevent problems like these in the future, automated testing has been extended
+ to also run on macOS.
+* :issue:`482`: Print statistics for ``--discard-casava`` and ``--max-ee`` in the
+ report.
+* :issue:`497`: The changelog for 3.0 previously forgot to mention that the following
+ options, which were deprecated in version 2.0, have now been removed, and
+ using them will lead to an error: ``--format``, ``--colorspace``, ``-c``, ``-d``,
+ ``--double-encode``, ``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``,
+ ``--bwa``, ``--no-zero-cap``. This frees up some single-character options,
+ allowing them to be re-purposed for future Cutadapt features.
+
v3.0 (2020-11-10)
-----------------
@@ -28,6 +47,12 @@ v3.0 (2020-11-10)
or had too many ``N``. (This unintentionally disappeared in a previous version.)
* :issue:`487`: When demultiplexing, the reported number of written pairs was
always zero.
+* :issue:`497`: The following options, which were deprecated in version 2.0, have
+ been removed, and using them will lead to an error:
+ ``--format``, ``--colorspace``, ``-c``, ``-d``, ``--double-encode``,
+ ``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``, ``--bwa``, ``--no-zero-cap``.
+ This frees up some single-character options,
+ allowing them to be re-purposed for future Cutadapt features.
* Ensure Cutadapt runs under Python 3.9.
* Drop support for Python 3.5.
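The ``--action=retain`` entry above can be illustrated with a short standalone sketch (function names are illustrative, not Cutadapt API): the read is trimmed, but the matched adapter itself is kept — only the sequence before a 5' match or after a 3' match is removed.

```python
# Toy sketch of the --action=retain semantics described in the changelog.
# Coordinates are plain string indices; this is not Cutadapt's implementation.

def retain_5p(read: str, match_start: int, match_stop: int) -> str:
    """5' adapter: drop the sequence *before* the match, keep the adapter."""
    return read[match_start:]

def retain_3p(read: str, match_start: int, match_stop: int) -> str:
    """3' adapter: keep everything up to the *end* of the match."""
    return read[:match_stop]

# Read with a 3' adapter "ACGT" occupying positions 6..10
read = "TTTTTTACGTGGGG"
print(retain_3p(read, 6, 10))  # TTTTTTACGT  (adapter kept, tail removed)
print(retain_5p(read, 6, 10))  # ACGTGGGG    (adapter kept, head removed)
```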
=====================================
buildwheels.sh
=====================================
@@ -16,7 +16,7 @@ manylinux=quay.io/pypa/manylinux2010_x86_64
# For convenience, if this script is called from outside of a docker container,
# it starts a container and runs itself inside of it.
-if ! grep -q docker /proc/1/cgroup; then
+if ! grep -q docker /proc/1/cgroup && ! test -d /opt/python; then
# We are not inside a container
docker pull ${manylinux}
exec docker run --rm -v $(pwd):/io ${manylinux} /io/$0
=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+python-cutadapt (3.1-1) unstable; urgency=medium
+
+ * Team upload.
+ * New upstream version
+ * Standards-Version: 4.5.1 (routine-update)
+ * Do not mix python3-all and python3-all-dev
+ * Refresh patch
+
 -- Nilesh Patra <npatra974@gmail.com>  Sat, 12 Dec 2020 18:43:48 +0530
+
python-cutadapt (3.0-1) unstable; urgency=medium
* Team upload.
=====================================
debian/control
=====================================
@@ -9,7 +9,6 @@ Testsuite: autopkgtest-pkg-python
Priority: optional
Build-Depends: debhelper-compat (= 13),
dh-python,
- python3-all,
python3-all-dev,
python3-setuptools,
python3-setuptools-scm,
@@ -19,7 +18,7 @@ Build-Depends: debhelper-compat (= 13),
python3-dnaio,
python3-pytest,
cython3
-Standards-Version: 4.5.0
+Standards-Version: 4.5.1
Vcs-Browser: https://salsa.debian.org/med-team/python-cutadapt
Vcs-Git: https://salsa.debian.org/med-team/python-cutadapt.git
Homepage: https://pypi.python.org/pypi/cutadapt
=====================================
debian/patches/xfail_cutadapt_executable_test.patch
=====================================
@@ -4,13 +4,16 @@ Description: Since built-time tests are running before installation,
executables are not available yet. Marking tests as xfail - which will
simply xpass during autopkgtests
---- a/tests/test_commandline.py
-+++ b/tests/test_commandline.py
-@@ -610,6 +610,7 @@
- run('--length -5', 'shortened-negative.fastq', 'small.fastq')
+--- a/tests/test_command.py
++++ b/tests/test_command.py
+@@ -4,8 +4,10 @@
+ import sys
+
+ from utils import datapath, assert_files_equal, cutpath
++import pytest
+@pytest.mark.xfail(reason='We cannot test this during built-time tests')
def test_run_cutadapt_process():
- subprocess.check_call(['cutadapt', '--version'])
+ subprocess.check_call(["cutadapt", "--version"])
=====================================
doc/guide.rst
=====================================
@@ -903,25 +903,47 @@ This section describes in which ways reads can be modified other than adapter
removal.
-Not trimming adapters
----------------------
+.. _changing-what-is-done-when-an-adapter-is-found:
+.. _action:
-Instead of removing an adapter from a read, it is also possible to take other
-actions when an adapter is found by specifying the ``--action`` option.
+``--action`` changes what is done when an adapter is found
+----------------------------------------------------------
-The default is ``--action=trim``, which will remove the adapter and either
-the sequence before or after it from the read.
+The ``--action`` option can be used to change what is done when an adapter match
+is found in a read.
-Use ``--action=none`` to not remove the adapter from the read. This is useful
-when combined with other options, such as ``--untrimmed-output``, which
-will redirect the reads without adapter to a different file. Other read
-modification options (as listed below) may still change the read.
+The default is ``--action=trim``, which will remove the adapter and the
+sequence before or after it from the read. For 5' adapters, the adapter and
+the sequence preceding it is removed. For 3' adapters, the adapter and the
+sequence following it is removed. Since linked adapters are a combination of
+a 5' and 3' adapter, in effect only the sequence between the 5' and the 3'
+adapter matches is kept.
-Use ``--action=mask`` to write ``N`` characters to that parts of the read
-that would otherwise have been removed .
+With ``--action=retain``, the read is trimmed, but the adapter sequence itself
+is not removed. Up- and downstream sequences are removed in the same way as
+for the ``trim`` action. For linked adapters, both adapter sequences are kept.
-Use ``--action=lowercase`` to change to lowercase that part of the read that would otherwise
-have been removed. The rest is converted to uppercase.
+.. note::
+ Because it is somewhat unclear what should happen, ``--action=retain`` can
+ at the moment not be combined with ``--times`` (multiple rounds of adapter
+ removal).
+
+Use ``--action=none`` to not change the read even if there is a match.
+This is useful because the statistics will still be updated as before
+and because the read will still be considered "trimmed" for the read
+filtering options. Combining this with ``--untrimmed-output``, for
+example, can be used to copy reads without adapters to a different
+file. Other read modification options, if used, may still change
+the read.
+
+Use ``--action=mask`` to write ``N`` characters to those parts of the read
+that would otherwise have been removed.
+
+Use ``--action=lowercase`` to change to lowercase those parts of the read that
+would otherwise have been removed. The rest is converted to uppercase.
+
+.. versionadded:: 3.1
+ The ``retain`` action.
.. _cut-bases:
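The rewritten guide section enumerates what each ``--action`` choice does. A toy sketch of those semantics for a 3' adapter match starting at ``rstart`` (illustrative only, not Cutadapt's implementation):

```python
# Sketch of the documented --action choices, assuming a 3' adapter match
# means everything from `rstart` onward would be removed.

def apply_action(read: str, rstart: int, action: str) -> str:
    kept, removed = read[:rstart], read[rstart:]
    if action == "trim":
        return kept
    if action == "mask":
        return kept + "N" * len(removed)       # same length, match masked with N
    if action == "lowercase":
        return kept.upper() + removed.lower()  # rest uppercased, match lowercased
    if action == "none":
        return read                            # unchanged; match is still counted
    raise ValueError(action)

read = "ACGTACGTADAPTER"
print(apply_action(read, 8, "mask"))       # ACGTACGTNNNNNNN
print(apply_action(read, 8, "lowercase"))  # ACGTACGTadapter
```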
=====================================
setup.py
=====================================
@@ -92,6 +92,7 @@ setup(
url='https://cutadapt.readthedocs.io/',
description='trim adapters from high-throughput sequencing reads',
long_description=long_description,
+ long_description_content_type='text/x-rst',
license='MIT',
cmdclass={'build_ext': BuildExt, 'sdist': SDist},
ext_modules=extensions,
=====================================
src/cutadapt/__main__.py
=====================================
@@ -125,9 +125,8 @@ def get_argument_parser() -> ArgumentParser:
group.add_argument("-h", "--help", action="help", help="Show this help message and exit")
group.add_argument("--version", action="version", help="Show version number and exit",
version=__version__)
- group.add_argument("--debug", nargs="?", const=True, default=False,
- choices=("trace", ),
- help="Print debug log. 'trace' prints also DP matrices")
+ group.add_argument("--debug", action="count", default=0,
+ help="Print debug log. Use twice to also print DP matrices")
group.add_argument("--profile", action="store_true", default=False, help=SUPPRESS)
group.add_argument('-j', '--cores', type=int, default=1,
help='Number of CPU cores to use. Use 0 to auto-detect. Default: %(default)s')
@@ -192,12 +191,14 @@ def get_argument_parser() -> ArgumentParser:
group.add_argument("-N", "--no-match-adapter-wildcards", action="store_false",
default=True, dest='match_adapter_wildcards',
help="Do not interpret IUPAC wildcards in adapters.")
- group.add_argument("--action", choices=('trim', 'mask', 'lowercase', 'none'), default='trim',
- help="What to do with found adapters. "
+ group.add_argument("--action", choices=("trim", "retain", "mask", "lowercase", "none"),
+ default="trim",
+ help="What to do if a match was found. "
+ "trim: trim adapter and up- or downstream sequence; "
+ "retain: trim, but retain adapter; "
"mask: replace with 'N' characters; "
"lowercase: convert to lowercase; "
- "none: leave unchanged (useful with "
- "--discard-untrimmed). Default: %(default)s")
+ "none: leave unchanged. Default: %(default)s")
group.add_argument("--rc", "--revcomp", dest="reverse_complement", default=False,
action="store_true",
help="Check both the read and its reverse complement for adapter matches. If "
@@ -724,7 +725,7 @@ def adapters_from_args(args) -> Tuple[List[Adapter], List[Adapter]]:
raise CommandLineError(e)
warn_duplicate_adapters(adapters)
warn_duplicate_adapters(adapters2)
- if args.debug == "trace":
+ if args.debug > 1:
for adapter in adapters + adapters2:
adapter.enable_debug()
return adapters, adapters2
@@ -776,10 +777,13 @@ def add_adapter_cutter(
pipeline.add_paired_modifier(cutter)
else:
adapter_cutter, adapter_cutter2 = None, None
- if adapters:
- adapter_cutter = AdapterCutter(adapters, times, action, allow_index)
- if adapters2:
- adapter_cutter2 = AdapterCutter(adapters2, times, action, allow_index)
+ try:
+ if adapters:
+ adapter_cutter = AdapterCutter(adapters, times, action, allow_index)
+ if adapters2:
+ adapter_cutter2 = AdapterCutter(adapters2, times, action, allow_index)
+ except ValueError as e:
+ raise CommandLineError(e)
if paired:
if reverse_complement:
raise CommandLineError("--revcomp not implemented for paired-end reads")
@@ -837,11 +841,11 @@ def main(cmdlineargs, default_outfile=sys.stdout.buffer) -> Statistics:
start_time = time.time()
parser = get_argument_parser()
args, leftover_args = parser.parse_known_args(args=cmdlineargs)
- # log to stderr if results are to be sent to stdout
- log_to_stdout = args.output is not None and args.output != "-" and args.paired_output != "-"
# Setup logging only if there are not already any handlers (can happen when
# this function is being called externally such as from unit tests)
if not logging.root.handlers:
+ # If results are to be sent to stdout, logging needs to go to stderr
+ log_to_stdout = args.output is not None and args.output != "-" and args.paired_output != "-"
setup_logging(logger, stdout=log_to_stdout,
quiet=args.quiet, minimal=args.report == 'minimal', debug=args.debug)
log_header(cmdlineargs)
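The ``--debug`` hunk above replaces a choices-based option with a counting flag. The argparse pattern in isolation (parser setup is illustrative, not the full Cutadapt parser):

```python
# action="count" increments for every occurrence of the flag, so
# --debug gives 1 and --debug --debug gives 2 (used to enable DP matrices).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="count", default=0,
                    help="Print debug log. Use twice to also print DP matrices")

print(parser.parse_args([]).debug)                      # 0 -> no debug output
print(parser.parse_args(["--debug"]).debug)             # 1 -> debug log
print(parser.parse_args(["--debug", "--debug"]).debug)  # 2 -> also DP matrices
```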
=====================================
src/cutadapt/adapters.py
=====================================
@@ -151,6 +151,10 @@ class Match(ABC):
def remainder_interval(self) -> Tuple[int, int]:
pass
+ @abstractmethod
+ def retained_adapter_interval(self) -> Tuple[int, int]:
+ pass
+
@abstractmethod
def get_info_records(self, read) -> List[List]:
pass
@@ -271,6 +275,9 @@ class RemoveBeforeMatch(SingleMatch):
"""
return self.rstop, len(self.sequence)
+ def retained_adapter_interval(self) -> Tuple[int, int]:
+ return self.rstart, len(self.sequence)
+
def trim_slice(self):
# Same as remainder_interval, but as a slice() object
return slice(self.rstop, None)
@@ -306,6 +313,9 @@ class RemoveAfterMatch(SingleMatch):
"""
return 0, self.rstart
+ def retained_adapter_interval(self) -> Tuple[int, int]:
+ return 0, self.rstop
+
def trim_slice(self):
# Same as remainder_interval, but as a slice() object
return slice(None, self.rstart)
@@ -329,9 +339,7 @@ def _generate_adapter_name(_start=[1]) -> str:
return name
-class Adapter(ABC):
-
- description = "adapter with one component" # this is overriden in subclasses
+class Matchable(ABC):
def __init__(self, name: str, *args, **kwargs):
self.name = name
@@ -345,6 +353,15 @@ class Adapter(ABC):
pass
+class Adapter(Matchable, ABC):
+
+ description = "adapter with one component" # this is overriden in subclasses
+
+ @abstractmethod
+ def create_statistics(self) -> AdapterStatistics:
+ pass
+
+
class SingleAdapter(Adapter, ABC):
"""
This class can find a single adapter characterized by sequence, error rate,
@@ -685,6 +702,18 @@ class LinkedMatch(Match):
matches = [match for match in [self.front_match, self.back_match] if match is not None]
return remainder(matches)
+ def retained_adapter_interval(self) -> Tuple[int, int]:
+ if self.front_match:
+ start = self.front_match.rstart
+ offset = self.front_match.rstop
+ else:
+ start = offset = 0
+ if self.back_match:
+ end = self.back_match.rstop + offset
+ else:
+ end = len(self.front_match.sequence)
+ return start, end
+
def get_info_records(self, read) -> List[List]:
records = []
for match, namesuffix in [
@@ -707,11 +736,11 @@ class LinkedAdapter(Adapter):
def __init__(
self,
- front_adapter,
- back_adapter,
- front_required,
- back_required,
- name,
+ front_adapter: SingleAdapter,
+ back_adapter: SingleAdapter,
+ front_required: bool,
+ back_required: bool,
+ name: str,
):
super().__init__(name)
self.front_required = front_required
@@ -754,11 +783,11 @@ class LinkedAdapter(Adapter):
return None
-class MultipleAdapters(Adapter):
+class MultipleAdapters(Matchable):
"""
Represent multiple adapters at once
"""
- def __init__(self, adapters: Sequence[Adapter]):
+ def __init__(self, adapters: Sequence[Matchable]):
super().__init__(name="multiple_adapters")
self._adapters = adapters
@@ -792,7 +821,7 @@ class MultipleAdapters(Adapter):
return best_match
-class IndexedAdapters(Adapter, ABC):
+class IndexedAdapters(Matchable, ABC):
"""
Represent multiple adapters of the same type at once and use an index data structure
to speed up matching. This acts like a "normal" Adapter as it provides a match_to
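The new ``LinkedMatch.retained_adapter_interval`` keeps the span from the start of the 5' match through the end of the 3' match, where the back match's coordinates are relative to the front-trimmed remainder (hence the offset). A standalone toy version of that coordinate arithmetic, assuming ``front``/``back`` are ``(rstart, rstop)`` pairs or ``None``:

```python
# Toy version of the retained-interval arithmetic for a linked adapter;
# not the real LinkedMatch class.

def retained_interval(front, back, seq_len):
    """front/back: (rstart, rstop) or None; back is relative to the
    remainder left after trimming the front match."""
    if front:
        start, offset = front[0], front[1]
    else:
        start = offset = 0
    if back:
        end = back[1] + offset
    else:
        end = seq_len
    return start, end

# Read layout: NNN + 5'adapter (3..7) + insert + 3'adapter (12..16) + NNN,
# so the back match is (5, 9) relative to the remainder starting at 7.
print(retained_interval((3, 7), (5, 9), 19))  # (3, 16)
```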
=====================================
src/cutadapt/log.py
=====================================
@@ -6,6 +6,17 @@ import logging
REPORT = 25
+class CrashingHandler(logging.StreamHandler):
+
+ def emit(self, record):
+ """Unlike the method it overrides, this will not catch exceptions"""
+ msg = self.format(record)
+ stream = self.stream
+ stream.write(msg)
+ stream.write(self.terminator)
+ self.flush()
+
+
class NiceFormatter(logging.Formatter):
"""
Do not prefix "INFO:" to info-level log messages (but do it for all other
@@ -19,7 +30,7 @@ class NiceFormatter(logging.Formatter):
return super().format(record)
-def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=False):
+def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=0):
"""
Attach handler to the global logger object
"""
@@ -28,12 +39,10 @@ def setup_logging(logger, stdout=False, minimal=False, quiet=False, debug=False)
# INFO level (and the ERROR level would give us an 'ERROR:' prefix).
logging.addLevelName(REPORT, 'REPORT')
- # Due to backwards compatibility, logging output is sent to standard output
- # instead of standard error if the -o option is used.
- stream_handler = logging.StreamHandler(sys.stdout if stdout else sys.stderr)
+ stream_handler = CrashingHandler(sys.stdout if stdout else sys.stderr)
stream_handler.setFormatter(NiceFormatter())
# debug overrides quiet overrides minimal
- if debug:
+ if debug > 0:
level = logging.DEBUG
elif quiet:
level = logging.ERROR
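The new ``CrashingHandler`` deliberately omits the try/except that ``logging.StreamHandler.emit`` wraps around writing, so write errors propagate instead of being routed to ``handleError()``. A minimal demonstration (logger name is illustrative):

```python
# With the stock StreamHandler, writing to a broken stream is swallowed
# (sent to handleError). Without the try/except, the error surfaces.
import io
import logging

class CrashingHandler(logging.StreamHandler):
    def emit(self, record):
        """Unlike the method it overrides, this will not catch exceptions."""
        msg = self.format(record)
        self.stream.write(msg)
        self.stream.write(self.terminator)
        self.flush()

stream = io.StringIO()
stream.close()                      # simulate a broken output stream
logger = logging.getLogger("crash-demo")
logger.addHandler(CrashingHandler(stream))
try:
    logger.error("boom")            # write to the closed stream raises
    outcome = "swallowed"
except ValueError:
    outcome = "raised"
print(outcome)  # raised
```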
=====================================
src/cutadapt/modifiers.py
=====================================
@@ -9,7 +9,8 @@ from abc import ABC, abstractmethod
from collections import OrderedDict
from .qualtrim import quality_trim_index, nextseq_trim_index
-from .adapters import MultipleAdapters, SingleAdapter, IndexedPrefixAdapters, IndexedSuffixAdapters, Match, remainder
+from .adapters import MultipleAdapters, SingleAdapter, IndexedPrefixAdapters, IndexedSuffixAdapters, \
+ Match, remainder, Adapter
from .utils import reverse_complemented_sequence
@@ -76,18 +77,23 @@ class AdapterCutter(SingleEndModifier):
def __init__(
self,
- adapters: List[SingleAdapter],
+ adapters: List[Adapter],
times: int = 1,
action: Optional[str] = "trim",
index: bool = True,
):
"""
- action -- What to do with a found adapter: None, 'trim', 'mask' or 'lowercase'
+ action -- What to do with a found adapter:
+ None: Do nothing, only update the ModificationInfo appropriately
+ "trim": Remove the adapter and down- or upstream sequence depending on adapter type
+ "mask": Replace the part of the sequence that would have been removed with "N" bases
+ "lowercase": Convert the part of the sequence that would have been removed to lowercase
+ "retain": Like "trim", but leave the adapter sequence itself in the read
index -- if True, an adapter index (for multiple adapters) is created if possible
"""
self.times = times
- assert action in ('trim', 'mask', 'lowercase', None)
+ assert action in ("trim", "mask", "lowercase", "retain", None)
self.action = action
self.with_adapters = 0
self.adapter_statistics = OrderedDict((a, a.create_statistics()) for a in adapters)
@@ -95,6 +101,8 @@ class AdapterCutter(SingleEndModifier):
self.adapters = MultipleAdapters(self._regroup_into_indexed_adapters(adapters))
else:
self.adapters = MultipleAdapters(adapters)
+ if action == "retain" and times > 1:
+ raise ValueError("'retain' cannot be combined with times > 1")
def __repr__(self):
return 'AdapterCutter(adapters={!r}, times={}, action={!r})'.format(
@@ -141,6 +149,11 @@ class AdapterCutter(SingleEndModifier):
other.append(a)
return prefix, suffix, other
+ @staticmethod
+ def trim_but_retain_adapter(read, matches: Sequence[Match]):
+ start, stop = matches[-1].retained_adapter_interval()
+ return read[start:stop]
+
@staticmethod
def masked_read(read, matches: Sequence[Match]):
start, stop = remainder(matches)
@@ -174,7 +187,7 @@ class AdapterCutter(SingleEndModifier):
def match_and_trim(self, read):
"""
Search for the best-matching adapter in a read, perform the requested action
- ('trim', 'mask', 'lowercase' or None as determined by self.action) and return the
+ ('trim', 'mask' etc. as determined by self.action) and return the
(possibly) modified read.
*self.times* adapter removal rounds are done. During each round,
@@ -184,7 +197,7 @@ class AdapterCutter(SingleEndModifier):
Return a pair (trimmed_read, matches), where matches is a list of Match instances.
"""
matches = []
- if self.action == 'lowercase':
+ if self.action == 'lowercase': # TODO this should not be needed
read.sequence = read.sequence.upper()
trimmed_read = read
for _ in range(self.times):
@@ -201,12 +214,14 @@ class AdapterCutter(SingleEndModifier):
if self.action == 'trim':
# read is already trimmed, nothing to do
pass
+ elif self.action == 'retain':
+ trimmed_read = self.trim_but_retain_adapter(read, matches)
elif self.action == 'mask':
trimmed_read = self.masked_read(read, matches)
elif self.action == 'lowercase':
trimmed_read = self.lowercased_read(read, matches)
assert len(trimmed_read.sequence) == len(read)
- elif self.action is None: # --no-trim
+ elif self.action is None:
trimmed_read = read[:]
return trimmed_read, matches
@@ -321,6 +336,8 @@ class PairedAdapterCutter(PairedModifier):
elif self.action == 'lowercase':
trimmed_read = AdapterCutter.lowercased_read(read, [match])
assert len(trimmed_read.sequence) == len(read)
+ elif self.action == 'retain':
+ trimmed_read = AdapterCutter.trim_but_retain_adapter(read, [match])
elif self.action is None: # --no-trim
trimmed_read = read[:]
result.append(trimmed_read)
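The ``AdapterCutter`` hunk adds a guard rejecting ``action="retain"`` combined with multiple removal rounds, raising ``ValueError`` from the constructor (which ``__main__.py`` converts to a ``CommandLineError``). The same validation pattern in a toy class (not the real ``AdapterCutter``):

```python
# Constructor-time validation: reject inconsistent option combinations
# before any processing starts.

class Cutter:
    ACTIONS = {"trim", "retain", "mask", "lowercase", None}

    def __init__(self, times: int = 1, action: str = "trim"):
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if action == "retain" and times > 1:
            raise ValueError("'retain' cannot be combined with times > 1")
        self.times, self.action = times, action

Cutter(times=1, action="retain")    # accepted
try:
    Cutter(times=2, action="retain")
    outcome = "accepted"
except ValueError:
    outcome = "rejected"
print(outcome)  # rejected
```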
=====================================
src/cutadapt/parser.py
=====================================
@@ -352,7 +352,7 @@ class AdapterParser:
raise ValueError("'anywhere' (-b) adapters may not be linked")
front_spec = AdapterSpecification.parse(spec1, 'front')
back_spec = AdapterSpecification.parse(spec2, 'back')
- if not name:
+ if name is None:
name = front_spec.name
front_anchored = front_spec.restriction is not None
=====================================
src/cutadapt/pipeline.py
=====================================
@@ -4,7 +4,7 @@ import sys
import copy
import logging
import functools
-from typing import List, IO, Optional, BinaryIO, TextIO, Any, Tuple, Dict
+from typing import List, Optional, BinaryIO, TextIO, Any, Tuple, Dict
from abc import ABC, abstractmethod
from multiprocessing import Process, Pipe, Queue
from pathlib import Path
@@ -120,7 +120,7 @@ class OutputFiles:
assert f is not None
yield f
- def as_bytesio(self):
+ def as_bytesio(self) -> "OutputFiles":
"""
Create a new OutputFiles instance that has BytesIO instances for each non-None output file
"""
@@ -141,6 +141,13 @@ class OutputFiles:
result.demultiplex_out2[k] = io.BytesIO()
return result
+ def close(self) -> None:
+ """Close all output files that are not stdout"""
+ for f in self:
+ if f is sys.stdout or f is sys.stdout.buffer:
+ continue
+ f.close()
+
class Pipeline(ABC):
"""
@@ -150,7 +157,6 @@ class Pipeline(ABC):
paired = False
def __init__(self, file_opener: FileOpener):
- self._close_files = [] # type: List[IO]
self._reader = None # type: Any
self._filters = [] # type: List[Any]
self._infiles = None # type: Optional[InputFiles]
@@ -168,7 +174,7 @@ class Pipeline(ABC):
self.discard_untrimmed = False
self.file_opener = file_opener
- def connect_io(self, infiles: InputFiles, outfiles: OutputFiles):
+ def connect_io(self, infiles: InputFiles, outfiles: OutputFiles) -> None:
self._infiles = infiles
self._reader = infiles.open()
self._set_output(outfiles)
@@ -182,7 +188,7 @@ class Pipeline(ABC):
):
pass
- def _set_output(self, outfiles: OutputFiles):
+ def _set_output(self, outfiles: OutputFiles) -> None:
self._filters = []
self._outfiles = outfiles
filter_wrapper = self._filter_wrapper()
@@ -262,20 +268,21 @@ class Pipeline(ABC):
f.flush()
def close(self) -> None:
+ self._close_input()
+ self._close_output()
+
+ def _close_input(self) -> None:
self._reader.close()
if self._infiles is not None:
self._infiles.close()
+
+ def _close_output(self) -> None:
for f in self._textiowrappers:
- f.close() # This also closes the underlying files; a second close occurs below
+ f.close()
+ # Closing a TextIOWrapper also closes the underlying file, so
+ # this closes some files a second time.
assert self._outfiles is not None
- for f in self._outfiles:
- # TODO do not use hasattr
- if f is not sys.stdin and f is not sys.stdout and f is not sys.stdout.buffer and hasattr(f, 'close'):
- f.close()
- for outs in [self._outfiles.demultiplex_out, self._outfiles.demultiplex_out2]:
- if outs is not None:
- for out in outs.values():
- out.close()
+ self._outfiles.close()
@property
def uses_qualities(self) -> bool:
@@ -283,7 +290,7 @@ class Pipeline(ABC):
return self._reader.delivers_qualities
@abstractmethod
- def process_reads(self, progress: Progress = None):
+ def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
pass
@abstractmethod
@@ -316,7 +323,7 @@ class SingleEndPipeline(Pipeline):
raise ValueError("Modifier must not be None")
self._modifiers.append(modifier)
- def process_reads(self, progress: Progress = None):
+ def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
"""Run the pipeline. Return statistics"""
n = 0 # no. of processed reads
total_bp = 0
@@ -402,7 +409,7 @@ class PairedEndPipeline(Pipeline):
# Whether to ignore pair_filter mode for discard-untrimmed filter
self.override_untrimmed_pair_filter = False
- def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]):
+ def add(self, modifier1: Optional[SingleEndModifier], modifier2: Optional[SingleEndModifier]) -> None:
"""
Add a modifier for R1 and R2. One of them can be None, in which case the modifier
will only be added for the respective read.
@@ -411,18 +418,18 @@ class PairedEndPipeline(Pipeline):
raise ValueError("Not both modifiers can be None")
self._modifiers.append(PairedModifierWrapper(modifier1, modifier2))
- def add_both(self, modifier: SingleEndModifier):
+ def add_both(self, modifier: SingleEndModifier) -> None:
"""
Add one modifier for both R1 and R2
"""
assert modifier is not None
self._modifiers.append(PairedModifierWrapper(modifier, copy.copy(modifier)))
- def add_paired_modifier(self, paired_modifier: PairedModifier):
+ def add_paired_modifier(self, paired_modifier: PairedModifier) -> None:
"""Add a Modifier (without wrapping it in a PairedModifierWrapper)"""
self._modifiers.append(paired_modifier)
- def process_reads(self, progress: Progress = None):
+ def process_reads(self, progress: Progress = None) -> Tuple[int, int, Optional[int]]:
n = 0 # no. of processed reads
total1_bp = 0
total2_bp = 0
@@ -594,8 +601,17 @@ class WorkerProcess(Process):
To notify the reader process that it wants data, it puts its own identifier into the
need_work_queue before attempting to read data from the read_pipe.
"""
- def __init__(self, id_, pipeline, two_input_files,
- interleaved_input, orig_outfiles, read_pipe, write_pipe, need_work_queue):
+ def __init__(
+ self,
+ id_: int,
+ pipeline: Pipeline,
+ two_input_files: bool,
+ interleaved_input: bool,
+ orig_outfiles: OutputFiles,
+ read_pipe: Connection,
+ write_pipe: Connection,
+ need_work_queue: Queue,
+ ):
super().__init__()
self._id = id_
self._pipeline = pipeline
@@ -604,7 +620,9 @@ class WorkerProcess(Process):
self._read_pipe = read_pipe
self._write_pipe = write_pipe
self._need_work_queue = need_work_queue
- self._original_outfiles = orig_outfiles
+ # Do not store orig_outfiles directly because it contains
+ # _io.BufferedWriter attributes, which cannot be pickled.
+ self._original_outfiles = orig_outfiles.as_bytesio()
def run(self):
try:
@@ -690,7 +708,7 @@ class PipelineRunner(ABC):
"""
A read processing pipeline
"""
- def __init__(self, pipeline: Pipeline, progress: Progress, *args, **kwargs):
+ def __init__(self, pipeline: Pipeline, progress: Progress):
self._pipeline = pipeline
self._progress = progress
@@ -746,14 +764,14 @@ class ParallelPipelineRunner(PipelineRunner):
self._need_work_queue = Queue() # type: Queue
self._buffer_size = buffer_size
self._assign_input(infiles.path1, infiles.path2, infiles.interleaved)
- self._assign_output(outfiles)
+ self._outfiles = outfiles
def _assign_input(
self,
path1: str,
path2: Optional[str] = None,
interleaved: bool = False,
- ):
+ ) -> None:
self._two_input_files = path2 is not None
self._interleaved_input = interleaved
# the workers read from these connections
@@ -770,9 +788,6 @@ class ParallelPipelineRunner(PipelineRunner):
self._reader_process.daemon = True
self._reader_process.start()
- def _assign_output(self, outfiles: OutputFiles):
- self._outfiles = outfiles
-
def _start_workers(self) -> Tuple[List[WorkerProcess], List[Connection]]:
workers = []
connections = []
@@ -789,16 +804,17 @@ class ParallelPipelineRunner(PipelineRunner):
workers.append(worker)
return workers, connections
- def run(self):
+ def run(self) -> Statistics:
workers, connections = self._start_workers()
writers = []
for f in self._outfiles:
writers.append(OrderedChunkWriter(f))
- stats = None
+ stats = Statistics()
n = 0 # A running total of the number of processed reads (for progress indicator)
while connections:
ready_connections = multiprocessing.connection.wait(connections)
for connection in ready_connections:
+ assert isinstance(connection, Connection)
chunk_index = connection.recv()
if chunk_index == -1:
# the worker is done
@@ -810,10 +826,7 @@ class ParallelPipelineRunner(PipelineRunner):
e, tb_str = connection.recv()
logger.error('%s', tb_str)
raise e
- if stats is None:
- stats = cur_stats
- else:
- stats += cur_stats
+ stats += cur_stats
connections.remove(connection)
continue
elif chunk_index == -2:
@@ -844,11 +857,8 @@ class ParallelPipelineRunner(PipelineRunner):
self._progress.stop(n)
return stats
- def close(self):
- for f in self._outfiles:
- # TODO do not use hasattr
- if f is not sys.stdin and f is not sys.stdout and f is not sys.stdout.buffer and hasattr(f, 'close'):
- f.close()
+ def close(self) -> None:
+ self._outfiles.close()
class SerialPipelineRunner(PipelineRunner):
@@ -866,12 +876,14 @@ class SerialPipelineRunner(PipelineRunner):
super().__init__(pipeline, progress)
self._pipeline.connect_io(infiles, outfiles)
- def run(self):
+ def run(self) -> Statistics:
(n, total1_bp, total2_bp) = self._pipeline.process_reads(progress=self._progress)
if self._progress:
self._progress.stop(n)
# TODO
- return Statistics().collect(n, total1_bp, total2_bp, self._pipeline._modifiers, self._pipeline._filters)
+ modifiers = getattr(self._pipeline, "_modifiers", None)
+ assert modifiers is not None
+ return Statistics().collect(n, total1_bp, total2_bp, modifiers, self._pipeline._filters)
- def close(self):
+ def close(self) -> None:
self._pipeline.close()
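The pickling issue noted in the WorkerProcess hunk above ("_io.BufferedWriter attributes, which cannot be pickled") is easy to reproduce outside cutadapt. The sketch below is illustrative only; `as_bytesio()` is cutadapt's own helper and is not reimplemented here.

```python
import pickle
import tempfile

# Open binary files are backed by _io buffered objects that hold
# OS-level state (a file descriptor plus an internal buffer), so the
# pickle module refuses to serialize them. This is why an OutputFiles
# object holding real file handles cannot be shipped to a worker
# process as-is and must first be converted to in-memory buffers.
with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)
        picklable = True
    except TypeError:
        picklable = False

print(picklable)  # False
```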
=====================================
src/cutadapt/report.py
=====================================
@@ -1,6 +1,7 @@
"""
Routines for printing a report.
"""
+import sys
from io import StringIO
import textwrap
from collections import Counter
@@ -11,7 +12,8 @@ from .adapters import (
)
from .modifiers import (SingleEndModifier, PairedModifier, QualityTrimmer, NextseqQualityTrimmer,
AdapterCutter, PairedAdapterCutter, ReverseComplementer)
-from .filters import WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter
+from .filters import (WithStatistics, TooShortReadFilter, TooLongReadFilter, NContentFilter,
+ CasavaFilter, MaximumExpectedErrorsFilter)
def safe_divide(numerator, denominator):
@@ -38,6 +40,8 @@ class Statistics:
self.too_short = None
self.too_long = None
self.too_many_n = None
+ self.too_many_expected_errors = None
+ self.casava_filtered = None
self.reverse_complemented = None # type: Optional[int]
self.n = 0
self.written = 0
@@ -66,6 +70,9 @@ class Statistics:
self.too_short = add_if_not_none(self.too_short, other.too_short)
self.too_long = add_if_not_none(self.too_long, other.too_long)
self.too_many_n = add_if_not_none(self.too_many_n, other.too_many_n)
+ self.too_many_expected_errors = add_if_not_none(
+ self.too_many_expected_errors, other.too_many_expected_errors)
+ self.casava_filtered = add_if_not_none(self.casava_filtered, other.casava_filtered)
for i in (0, 1):
self.total_bp[i] += other.total_bp[i]
self.written_bp[i] += other.written_bp[i]
@@ -120,6 +127,10 @@ class Statistics:
self.too_long = w.filtered
elif isinstance(w.filter, NContentFilter):
self.too_many_n = w.filtered
+ elif isinstance(w.filter, MaximumExpectedErrorsFilter):
+ self.too_many_expected_errors = w.filtered
+ elif isinstance(w.filter, CasavaFilter):
+ self.casava_filtered = w.filtered
def _collect_modifier(self, m: SingleEndModifier):
if isinstance(m, PairedAdapterCutter):
@@ -187,6 +198,14 @@ class Statistics:
def too_many_n_fraction(self):
return safe_divide(self.too_many_n, self.n)
+ @property
+ def too_many_expected_errors_fraction(self):
+ return safe_divide(self.too_many_expected_errors, self.n)
+
+ @property
+ def casava_filtered_fraction(self):
+ return safe_divide(self.casava_filtered, self.n)
+
def error_ranges(adapter_statistics: EndStatistics):
length = adapter_statistics.effective_length
@@ -291,8 +310,12 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str: # no
kwargs['file'] = sio
print(*args, **kwargs)
- print_s("Finished in {:.2F} s ({:.0F} µs/read; {:.2F} M reads/minute).".format(
- time, 1E6 * time / stats.n, stats.n / time * 60 / 1E6))
+ if sys.version_info[:2] <= (3, 6):
+ micro = "u"
+ else:
+ micro = "µ"
+ print_s("Finished in {:.2F} s ({:.0F} {}s/read; {:.2F} M reads/minute).".format(
+ time, 1E6 * time / stats.n, micro, stats.n / time * 60 / 1E6))
report = "\n=== Summary ===\n\n"
if stats.paired:
@@ -309,12 +332,19 @@ def full_report(stats: Statistics, time: float, gc_content: float) -> str: # no
if stats.reverse_complemented is not None:
report += "Reverse-complemented: " \
"{o.reverse_complemented:13,d} ({o.reverse_complemented_fraction:.1%})\n"
+
if stats.too_short is not None:
report += "{pairs_or_reads} that were too short: {o.too_short:13,d} ({o.too_short_fraction:.1%})\n"
if stats.too_long is not None:
report += "{pairs_or_reads} that were too long: {o.too_long:13,d} ({o.too_long_fraction:.1%})\n"
if stats.too_many_n is not None:
report += "{pairs_or_reads} with too many N: {o.too_many_n:13,d} ({o.too_many_n_fraction:.1%})\n"
+ if stats.too_many_expected_errors is not None:
+ report += "{pairs_or_reads} with too many exp. errors: " \
+ "{o.too_many_expected_errors:13,d} ({o.too_many_expected_errors_fraction:.1%})\n"
+ if stats.casava_filtered is not None:
+ report += "{pairs_or_reads} failed CASAVA filter: " \
+ "{o.casava_filtered:13,d} ({o.casava_filtered_fraction:.1%})\n"
report += textwrap.dedent("""\
{pairs_or_reads} written (passing filters): {o.written:13,d} ({o.written_fraction:.1%})
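The counter merging in `Statistics.__iadd__` and the new `*_fraction` properties rely on two small helpers. Only the signature of `safe_divide` appears in this hunk and `add_if_not_none` is defined elsewhere in report.py, so the bodies below are a sketch consistent with how the diff uses them (`None` meaning "this filter was not active"), not the actual implementations.

```python
from typing import Optional


def safe_divide(numerator: Optional[int], denominator: int) -> float:
    # Avoid ZeroDivisionError when no reads were processed and treat
    # an inactive counter (None) as zero.
    if numerator is None or not denominator:
        return 0.0
    return numerator / denominator


def add_if_not_none(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # A counter stays None unless at least one side has a value, so
    # merging per-worker Statistics never turns an inactive filter
    # into an active one with count 0.
    if a is None:
        return b
    if b is None:
        return a
    return a + b


# Merging per-worker counters, as __iadd__ does for the new
# too_many_expected_errors and casava_filtered attributes:
assert add_if_not_none(None, None) is None
assert add_if_not_none(3, None) == 3
assert add_if_not_none(3, 4) == 7
assert safe_divide(None, 0) == 0.0
assert safe_divide(2, 8) == 0.25
```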
=====================================
src/cutadapt/utils.py
=====================================
@@ -194,6 +194,7 @@ class FileOpener:
f = self.dnaio_open(*args, **kwargs)
except OSError as e:
if e.errno == errno.EMFILE: # Too many open files
+ logger.debug("Too many open files, attempting to raise soft limit")
raise_open_files_limit(8)
f = self.dnaio_open(*args, **kwargs)
else:
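`raise_open_files_limit(8)` is cutadapt's own helper and its body is not shown in this hunk; on POSIX systems the underlying mechanism for this kind of recovery is `resource.setrlimit`. A hypothetical sketch of such a helper, under the assumption that it bumps the soft limit:

```python
import resource


def raise_open_files_limit(n: int) -> None:
    # Hypothetical sketch: raise the soft RLIMIT_NOFILE limit by n,
    # never exceeding the hard limit (which an unprivileged process
    # cannot raise). This mirrors the recovery path taken when
    # dnaio_open fails with EMFILE ("Too many open files").
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return  # already unlimited
    if hard != resource.RLIM_INFINITY:
        new_soft = min(soft + n, hard)
    else:
        new_soft = soft + n
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))


soft_before, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
raise_open_files_limit(8)
soft_after, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
```

Note that `resource` is POSIX-only, which is consistent with retrying the open only on `errno.EMFILE`.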
=====================================
tests/cut/action_retain.fasta
=====================================
@@ -0,0 +1,16 @@
+>r1
+CGTCCGAAcaag
+>r2
+caag
+>r3
+TGCCCTAGTcaag
+>r4
+TGCCCTAGTcaa
+>r5
+ggttaaCCGCCTTGA
+>r6
+ggttaaCATTGCCCTAGTTTATT
+>r7
+ttaaGTTCATGT
+>r8
+ACGTACGT
=====================================
tests/data/action_retain.fasta
=====================================
@@ -0,0 +1,16 @@
+>r1
+CGTCCGAAcaagCCTGCCACAT
+>r2
+caagACAAGACCT
+>r3
+TGCCCTAGTcaag
+>r4
+TGCCCTAGTcaa
+>r5
+TGTTGggttaaCCGCCTTGA
+>r6
+ggttaaCATTGCCCTAGTTTATT
+>r7
+ttaaGTTCATGT
+>r8
+ACGTACGT
=====================================
tests/test_command.py
=====================================
@@ -0,0 +1,79 @@
+"""Tests that run the program in a subprocess"""
+
+import subprocess
+import sys
+
+from utils import datapath, assert_files_equal, cutpath
+
+
+def test_run_cutadapt_process():
+ subprocess.check_call(["cutadapt", "--version"])
+
+
+def test_run_as_module():
+ """Check that "python3 -m cutadapt ..." works"""
+ from cutadapt import __version__
+ with subprocess.Popen([sys.executable, "-m", "cutadapt", "--version"], stdout=subprocess.PIPE) as py:
+ assert py.communicate()[0].decode().strip() == __version__
+
+
+def test_standard_input_pipe(tmpdir, cores):
+ """Read FASTQ from standard input"""
+ out_path = str(tmpdir.join("out.fastq"))
+ in_path = datapath("small.fastq")
+ # Use 'cat' to simulate that no file name is available for stdin
+ with subprocess.Popen(["cat", in_path], stdout=subprocess.PIPE) as cat:
+ with subprocess.Popen([
+ sys.executable, "-m", "cutadapt", "--cores", str(cores),
+ "-a", "TTAGACATATCTCCGTCG", "-o", out_path, "-"],
+ stdin=cat.stdout
+ ) as py:
+ _ = py.communicate()
+ cat.stdout.close()
+ _ = py.communicate()[0]
+ assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_standard_output(tmpdir, cores):
+ """Write FASTQ to standard output (not using --output/-o option)"""
+ out_path = str(tmpdir.join("out.fastq"))
+ with open(out_path, "w") as out_file:
+ py = subprocess.Popen([
+ sys.executable, "-m", "cutadapt", "--cores", str(cores),
+ "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+ stdout=out_file)
+ _ = py.communicate()
+ assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_explicit_standard_output(tmpdir, cores):
+ """Write FASTQ to standard output (using "-o -")"""
+
+ out_path = str(tmpdir.join("out.fastq"))
+ with open(out_path, "w") as out_file:
+ py = subprocess.Popen([
+ sys.executable, "-m", "cutadapt", "-o", "-", "--cores", str(cores),
+ "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+ stdout=out_file)
+ _ = py.communicate()
+ assert_files_equal(cutpath("small.fastq"), out_path)
+
+
+def test_force_fasta_output(tmpdir, cores):
+ """Write FASTA to standard output even on FASTQ input"""
+
+ out_path = str(tmpdir.join("out.fasta"))
+ with open(out_path, "w") as out_file:
+ py = subprocess.Popen([
+ sys.executable, "-m", "cutadapt", "--fasta", "-o", "-", "--cores", str(cores),
+ "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
+ stdout=out_file)
+ _ = py.communicate()
+ assert_files_equal(cutpath("small.fasta"), out_path)
+
+
+def test_non_utf8_locale():
+ subprocess.check_call(
+ [sys.executable, "-m", "cutadapt", "-o", "/dev/null", datapath("small.fastq")],
+ env={"LC_CTYPE": "C"},
+ )
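`test_non_utf8_locale` above, together with the report.py change in this push that prints "us/read" instead of "µs/read" on Python <= 3.6, guards against the same failure mode: under an ASCII locale, the micro sign cannot be encoded for output. A minimal illustration:

```python
# Under LC_CTYPE=C, sys.stdout typically falls back to an ASCII codec,
# and U+00B5 (the micro sign) has no ASCII representation.
try:
    "µs/read".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(encodable)  # False
```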
=====================================
tests/test_commandline.py
=====================================
@@ -1,10 +1,7 @@
-import os
-import shutil
+import subprocess
import sys
-import tempfile
from io import StringIO, BytesIO
import pytest
-import subprocess
from cutadapt.__main__ import main
from utils import assert_files_equal, datapath, cutpath
@@ -54,7 +51,7 @@ def test_debug():
def test_debug_trace():
- main(["--debug=trace", "-a", "ACGT", datapath("small.fastq")])
+ main(["--debug", "--debug", "-a", "ACGT", datapath("small.fastq")])
def test_example(run):
@@ -230,6 +227,15 @@ def test_action_lowercase(run):
run("-b CAAG -n 3 --action=lowercase", "action_lowercase.fasta", "action_lowercase.fasta")
+def test_action_retain(run):
+ run("-g GGTTAACC -a CAAG --action=retain", "action_retain.fasta", "action_retain.fasta")
+
+
+def test_action_retain_times():
+ with pytest.raises(SystemExit):
+ main(["-a", "ACGT", "--times=2", "--action=retain", datapath("small.fastq")])
+
+
def test_gz_multiblock(run):
"""compressed gz file with multiple blocks (created by concatenating two .gz files)"""
run("-b TTAGACATATCTCCGTCG", "small.fastq", "multiblock.fastq.gz")
@@ -470,9 +476,9 @@ def test_adapter_file_empty_name(run):
run('-N -a file:' + datapath('adapter-empty-name.fasta'), 'illumina.fastq', 'illumina.fastq.gz')
-def test_demultiplex(cores):
- tempdir = tempfile.mkdtemp(prefix='cutadapt-tests.')
- multiout = os.path.join(tempdir, 'tmp-demulti.{name}.fasta')
+@pytest.mark.parametrize("ext", ["", ".gz"])
+def test_demultiplex(cores, tmp_path, ext):
+ multiout = str(tmp_path / 'tmp-demulti.{name}.fasta') + ext
params = [
'--cores', str(cores),
'-a', 'first=AATTTCAGGAATT',
@@ -481,10 +487,13 @@ def test_demultiplex(cores):
datapath('twoadapters.fasta'),
]
main(params)
- assert_files_equal(cutpath('twoadapters.first.fasta'), multiout.format(name='first'))
- assert_files_equal(cutpath('twoadapters.second.fasta'), multiout.format(name='second'))
- assert_files_equal(cutpath('twoadapters.unknown.fasta'), multiout.format(name='unknown'))
- shutil.rmtree(tempdir)
+ for name in ("first", "second", "unknown"):
+ actual = multiout.format(name=name)
+ if ext == ".gz":
+ subprocess.run(["gzip", "-d", actual], check=True)
+ actual = actual[:-3]
+ expected = cutpath("twoadapters.{name}.fasta".format(name=name))
+ assert_files_equal(expected, actual)
def test_multiple_fake_anchored_adapters(run):
@@ -610,10 +619,6 @@ def test_negative_length(run):
run('--length -5', 'shortened-negative.fastq', 'small.fastq')
-def test_run_cutadapt_process():
- subprocess.check_call(['cutadapt', '--version'])
-
-
@pytest.mark.timeout(0.5)
def test_issue_296(tmpdir):
# Hang when using both --no-trim and --info-file together
@@ -636,7 +641,8 @@ def test_adapterx(run):
def test_discard_casava(run):
- run('--discard-casava', 'casava.fastq', 'casava.fastq')
+ stats = run('--discard-casava', 'casava.fastq', 'casava.fastq')
+ assert stats.casava_filtered == 1
def test_underscore(run):
@@ -663,68 +669,6 @@ def test_paired_separate(run):
run("-a CAGTGGAGTA", "paired-separate.2.fastq", "paired.2.fastq")
-def test_run_as_module():
- """Check that "python3 -m cutadapt ..." works"""
- from cutadapt import __version__
- py = subprocess.Popen([sys.executable, "-m", "cutadapt", "--version"], stdout=subprocess.PIPE)
- assert py.communicate()[0].decode().strip() == __version__
-
-
-def test_standard_input_pipe(tmpdir, cores):
- """Read FASTQ from standard input"""
- out_path = str(tmpdir.join("out.fastq"))
- in_path = datapath("small.fastq")
- # Use 'cat' to simulate that no file name is available for stdin
- with subprocess.Popen(["cat", in_path], stdout=subprocess.PIPE) as cat:
- with subprocess.Popen([
- sys.executable, "-m", "cutadapt", "--cores", str(cores),
- "-a", "TTAGACATATCTCCGTCG", "-o", out_path, "-"],
- stdin=cat.stdout
- ) as py:
- _ = py.communicate()
- cat.stdout.close()
- _ = py.communicate()[0]
- assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_standard_output(tmpdir, cores):
- """Write FASTQ to standard output (not using --output/-o option)"""
- out_path = str(tmpdir.join("out.fastq"))
- with open(out_path, "w") as out_file:
- py = subprocess.Popen([
- sys.executable, "-m", "cutadapt", "--cores", str(cores),
- "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
- stdout=out_file)
- _ = py.communicate()
- assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_explicit_standard_output(tmpdir, cores):
- """Write FASTQ to standard output (using "-o -")"""
-
- out_path = str(tmpdir.join("out.fastq"))
- with open(out_path, "w") as out_file:
- py = subprocess.Popen([
- sys.executable, "-m", "cutadapt", "-o", "-", "--cores", str(cores),
- "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
- stdout=out_file)
- _ = py.communicate()
- assert_files_equal(cutpath("small.fastq"), out_path)
-
-
-def test_force_fasta_output(tmpdir, cores):
- """Write FASTA to standard output even on FASTQ input"""
-
- out_path = str(tmpdir.join("out.fasta"))
- with open(out_path, "w") as out_file:
- py = subprocess.Popen([
- sys.executable, "-m", "cutadapt", "--fasta", "-o", "-", "--cores", str(cores),
- "-a", "TTAGACATATCTCCGTCG", datapath("small.fastq")],
- stdout=out_file)
- _ = py.communicate()
- assert_files_equal(cutpath("small.fasta"), out_path)
-
-
def test_empty_read_with_wildcard_in_adapter(run):
run("-g CWC", "empty.fastq", "empty.fastq")
@@ -773,7 +717,8 @@ def test_reverse_complement_and_info_file(run, tmp_path, cores):
def test_max_expected_errors(run, cores):
- run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+ stats = run("--max-ee=0.9", "maxee.fastq", "maxee.fastq")
+ assert stats.too_many_expected_errors == 2
def test_max_expected_errors_fasta(tmp_path):
=====================================
tests/test_modifiers.py
=====================================
@@ -1,7 +1,10 @@
+from typing import List
+
import pytest
from dnaio import Sequence
-from cutadapt.adapters import BackAdapter, PrefixAdapter, IndexedPrefixAdapters
+from cutadapt.adapters import BackAdapter, PrefixAdapter, IndexedPrefixAdapters, LinkedAdapter, \
+ FrontAdapter, Adapter
from cutadapt.modifiers import (UnconditionalCutter, NEndTrimmer, QualityTrimmer,
Shortener, AdapterCutter, PairedAdapterCutter, ModificationInfo, ZeroCapper)
@@ -80,7 +83,8 @@ def test_adapter_cutter_indexing():
(None, "CCCCGGTTAACCCC", "TTTTAACCGGTTTT"),
("trim", "CCCC", "TTTT"),
("lowercase", "CCCCggttaacccc", "TTTTaaccggtttt"),
- ("mask", "CCCCNNNNNNNNNN", "TTTTNNNNNNNNNN")
+ ("mask", "CCCCNNNNNNNNNN", "TTTTNNNNNNNNNN"),
+ ("retain", "CCCCGGTTAA", "TTTTAACCGG"),
])
def test_paired_adapter_cutter_actions(action, expected_trimmed1, expected_trimmed2):
a1 = BackAdapter("GGTTAA")
@@ -93,3 +97,37 @@ def test_paired_adapter_cutter_actions(action, expected_trimmed1, expected_trimm
trimmed1, trimmed2 = pac(s1, s2, info1, info2)
assert expected_trimmed1 == trimmed1.sequence
assert expected_trimmed2 == trimmed2.sequence
+
+
+def test_retain_times():
+ with pytest.raises(ValueError) as e:
+ AdapterCutter([BackAdapter("ACGT")], times=2, action="retain")
+ assert "cannot be combined with times" in e.value.args[0]
+
+
+def test_action_retain():
+ back = BackAdapter("AACCGG")
+ ac = AdapterCutter([back], action="retain")
+ seq = Sequence("r1", "ATTGCCAACCGGTATATAT")
+ info = ModificationInfo(seq)
+ trimmed = ac(seq, info)
+ assert "ATTGCCAACCGG" == trimmed.sequence
+
+
+@pytest.mark.parametrize("s,expected", [
+ ("ATTATTggttaaccAAAAAaaccggTATT", "ggttaaccAAAAAaaccgg"),
+ ("AAAAAaaccggTATT", "AAAAAaaccgg"),
+ ("ATTATTggttaaccAAAAA", "ggttaaccAAAAA"),
+ ("ATTATT", "ATTATT"),
+])
+def test_linked_action_retain(s, expected):
+ front = FrontAdapter("GGTTAACC")
+ back = BackAdapter("AACCGG")
+ adapters: List[Adapter] = [
+ LinkedAdapter(front, back, front_required=False, back_required=False, name="linked")
+ ]
+ ac = AdapterCutter(adapters, action="retain")
+ seq = Sequence("r1", s)
+ info = ModificationInfo(seq)
+ trimmed = ac(seq, info)
+ assert expected == trimmed.sequence
=====================================
tox.ini
=====================================
@@ -13,6 +13,7 @@ commands =
coverage run --concurrency=multiprocessing -m pytest --doctest-modules --pyargs cutadapt tests
coverage combine
coverage report
+ coverage xml
[testenv:docs]
basepython = python3.6
View it on GitLab: https://salsa.debian.org/med-team/python-cutadapt/-/compare/d9941300422f3af409fc54b1967f71cc9765233c...20c3da4983c7b52684fbfbec51ac8e6db3bc3b26