[med-svn] [Git][med-team/python-dnaio][master] 4 commits: New upstream version 0.10.0

Nilesh Patra (@nilesh) gitlab@salsa.debian.org
Sat Dec 31 11:40:18 GMT 2022



Nilesh Patra pushed to branch master at Debian Med / python-dnaio


Commits:
9a606fda by Nilesh Patra at 2022-12-31T17:03:50+05:30
New upstream version 0.10.0
- - - - -
ecc4f29e by Nilesh Patra at 2022-12-31T17:03:50+05:30
Update upstream source from tag 'upstream/0.10.0'

Update to upstream version '0.10.0'
with Debian dir 7c21dae8da332ca6b952e1452395dedef7c163ab
- - - - -
142d2736 by Nilesh Patra at 2022-12-31T17:04:03+05:30
Bump Standards-Version to 4.6.2 (no changes needed)

- - - - -
5855df2a by Nilesh Patra at 2022-12-31T17:05:16+05:30
Upload to unstable

- - - - -


22 changed files:

- − .github/workflows/ci.yml
- CHANGES.rst
- debian/changelog
- debian/control
- doc/api.rst
- doc/tutorial.rst
- pyproject.toml
- src/dnaio/__init__.py
- src/dnaio/_core.pyi
- src/dnaio/_core.pyx
- src/dnaio/_util.py
- src/dnaio/interfaces.py
- + src/dnaio/multipleend.py
- src/dnaio/pairedend.py
- src/dnaio/readers.py
- src/dnaio/singleend.py
- src/dnaio/writers.py
- tests/test_chunks.py
- tests/test_internal.py
- + tests/test_multiple.py
- tests/test_open.py
- tox.ini


Changes:

=====================================
.github/workflows/ci.yml deleted
=====================================
@@ -1,107 +0,0 @@
-name: CI
-
-on: [push, pull_request]
-
-jobs:
-  lint:
-    timeout-minutes: 10
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: [3.7]
-        toxenv: [flake8, black, mypy, docs]
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install tox
-      run: python -m pip install tox
-    - name: Run tox ${{ matrix.toxenv }}
-      run: tox -e ${{ matrix.toxenv }}
-
-  build:
-    runs-on: ubuntu-latest
-    steps:
-    - uses: actions/checkout@v2
-      with:
-        fetch-depth: 0  # required for setuptools_scm
-    - name: Build sdist and temporary wheel
-      run: pipx run build
-    - uses: actions/upload-artifact@v2
-      with:
-        name: sdist
-        path: dist/*.tar.gz
-
-  test:
-    timeout-minutes: 10
-    runs-on: ${{ matrix.os }}
-    strategy:
-      matrix:
-        os: [ubuntu-latest]
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
-        include:
-        - os: macos-latest
-          python-version: 3.8
-        - os: windows-latest
-          python-version: 3.8
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install tox
-      run: python -m pip install tox
-    - name: Test
-      run: tox -e py
-    - name: Upload coverage report
-      uses: codecov/codecov-action@v1
-
-  wheels:
-    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
-    needs: [lint, test]
-    timeout-minutes: 15
-    strategy:
-      matrix:
-        os: [ubuntu-20.04, windows-2019, macos-10.15]
-    runs-on: ${{ matrix.os }}
-    steps:
-    - uses: actions/checkout@v2
-      with:
-        fetch-depth: 0  # required for setuptools_scm
-    - name: Build wheels
-      uses: pypa/cibuildwheel@v2.3.1
-      env:
-        CIBW_BUILD: "cp*-manylinux_x86_64 cp3*-win_amd64 cp3*-macosx_x86_64"
-        CIBW_ENVIRONMENT: "CFLAGS=-g0"
-        CIBW_TEST_REQUIRES: "pytest"
-        CIBW_TEST_COMMAND_LINUX: "cd {project} && pytest tests"
-        CIBW_TEST_COMMAND_MACOS: "cd {project} && pytest tests"
-        CIBW_TEST_COMMAND_WINDOWS: "cd /d {project} && pytest tests"
-    - uses: actions/upload-artifact@v2
-      with:
-        name: wheels
-        path: wheelhouse/*.whl
-
-  publish:
-    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
-    needs: [build, wheels]
-    runs-on: ubuntu-latest
-    steps:
-    - uses: actions/download-artifact@v2
-      with:
-        name: sdist
-        path: dist/
-    - uses: actions/download-artifact@v2
-      with:
-        name: wheels
-        path: dist/
-    - name: Publish to PyPI
-      uses: pypa/gh-action-pypi-publish@v1.4.2
-      with:
-        user: __token__
-        password: ${{ secrets.pypi_password }}
-        #password: ${{ secrets.test_pypi_password }}
-        #repository_url: https://test.pypi.org/legacy/


=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,18 @@
 Changelog
 =========
 
+v0.10.0 (2022-12-05)
+--------------------
+
+* :pr:`99`: SequenceRecord initialization is now faster, which also provides
+  a speed boost to FASTQ iteration. ``SequenceRecord.__new__`` cannot be used
+  anymore to initialize `SequenceRecord` objects.
+* :pr:`96`: ``open_threads`` and ``compression_level`` are now added
+  to `~dnaio.open` as arguments. By default dnaio now uses compression level
+  1 and does not utilize external programs to speed up gzip (de)compression.
+* :pr:`87`: `~dnaio.open` can now open more than two files.
+  The ``file1`` and ``file2`` arguments are now deprecated.
+
 v0.9.1 (2022-08-01)
 -------------------
 
@@ -12,7 +24,7 @@ v0.9.1 (2022-08-01)
 v0.9.0 (2022-05-17)
 -------------------
 
-* :pr:`79`: Added a `records_are_mates` function to be used for checking whether
+* :pr:`79`: Added a `~dnaio.records_are_mates` function to be used for checking whether
   three or more records are mates of each other (by checking the ID).
 * :pr:`74`, :pr:`68`: Made FASTQ parsing faster by implementing the check for
   ASCII using SSE vector instructions.
@@ -23,12 +35,13 @@ v0.8.0 (2022-03-26)
 
 * Preliminary documentation is available at
   <https://dnaio.readthedocs.io/>.
-* :pr:`53`: Renamed ``Sequence`` to `SequenceRecord`.
+* :pr:`53`: Renamed ``Sequence`` to `~dnaio.SequenceRecord`.
   The previous name is still available as an alias
   so that existing code will continue to work.
 * When reading a FASTQ file, there is now a check that ensures that
   all characters are ASCII.
-* Function ``record_names_match`` is deprecated, use `SequenceRecord.is_mate` instead.
+* Function ``record_names_match`` is deprecated, use `~dnaio.SequenceRecord.is_mate` instead.
+* Added `~dnaio.SequenceRecord.reverse_complement`.
 * Dropped Python 3.6 support as it is end-of-life.
 
 v0.7.1 (2022-01-26)
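
A quick sketch of the v0.10.0 changes listed above (the file names are placeholders, not files that ship with dnaio):

    import dnaio

    # PR 87: more than two files can be opened at once; file1=/file2= are deprecated
    with dnaio.open("reads.R1.fastq.gz", "reads.R2.fastq.gz", "umis.fastq.gz") as reader:
        for r1, r2, umi in reader:
            pass  # each iteration yields one tuple of SequenceRecord objects

    # PR 96: compression level and threaded (de)compression are configurable;
    # the new defaults are level 1 and no external (de)compression programs
    with dnaio.open("out.fastq.gz", mode="w", compression_level=4, open_threads=1) as writer:
        writer.write(dnaio.SequenceRecord("read1", "ACGT", "IIII"))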


=====================================
debian/changelog
=====================================
@@ -1,8 +1,14 @@
-python-dnaio (0.9.1-2) UNRELEASED; urgency=medium
+python-dnaio (0.10.0-1) unstable; urgency=medium
 
+  * Team Upload.
+  [ Andreas Tille ]
   * d/watch: Proper upstream tarball name
 
- -- Andreas Tille <tille@debian.org>  Fri, 26 Aug 2022 08:13:03 +0200
+  [ Nilesh Patra ]
+  * New upstream version 0.10.0
+  * Bump Standards-Version to 4.6.2 (no changes needed)
+
+ -- Nilesh Patra <nilesh@debian.org>  Sat, 31 Dec 2022 17:05:08 +0530
 
 python-dnaio (0.9.1-1) unstable; urgency=medium
 


=====================================
debian/control
=====================================
@@ -11,7 +11,7 @@ Build-Depends: debhelper-compat (= 13),
                python3-pytest,
                python3-xopen,
                cython3
-Standards-Version: 4.6.1
+Standards-Version: 4.6.2
 Vcs-Browser: https://salsa.debian.org/med-team/python-dnaio
 Vcs-Git: https://salsa.debian.org/med-team/python-dnaio.git
 Homepage: https://github.com/marcelm/dnaio


=====================================
doc/api.rst
=====================================
@@ -36,6 +36,9 @@ Reader and writer interfaces
 .. autoclass:: PairedEndWriter
    :members: write
 
+.. autoclass:: MultipleFileWriter
+   :members: write, write_iterable
+
 
 Reader and writer classes
 -------------------------
@@ -68,6 +71,15 @@ They can also be used directly if needed.
 .. autoclass:: InterleavedPairedEndWriter
    :show-inheritance:
 
+.. autoclass:: MultipleFileReader
+   :members: __iter__
+
+.. autoclass:: MultipleFastaWriter
+   :show-inheritance:
+
+.. autoclass:: MultipleFastqWriter
+   :show-inheritance:
+
 
 Chunked reading of sequence records
 -----------------------------------


=====================================
doc/tutorial.rst
=====================================
@@ -61,9 +61,8 @@ pass the ``mode="w"`` argument to ``dnaio.open``::
 Here, a `~dnaio.FastqWriter` object is returned by ``dnaio.open``,
 which has a `~dnaio.FastqWriter.write()` method that accepts a ``SequenceRecord``.
 
-Instead of constructing a single record from scratch,
-in practice it is more realistic to take input reads,
-process them, and write them to a new output file.
+A possibly more common use case is to read an input file,
+modify the reads and write them to a new output file.
 The following example program shows how that can be done.
 It truncates all reads in the input file to a length of 30 nt
 and writes them to another file::
@@ -83,23 +82,22 @@ trimmed to the first 30 characters, leaving the name unchanged.
 Paired-end data
 ---------------
 
-Paired-end data is supported in two forms:
-Either as a single file that contains the read in an interleaved form (R1, R2, R1, R2, ...)
-or as two separate files. To read from separate files, provide the ``file2=`` argument
-with the name of the second file to ``dnaio.open``::
+Paired-end data is supported in two forms: Two separate files or interleaved.
+
+To read from separate files, provide two input file names to the ``dnaio.open``
+function::
 
     import dnaio
 
-    with dnaio.open("reads.1.fastq.gz", file2="reads.2.fastq.gz") as reader:
+    with dnaio.open("reads.1.fastq.gz", "reads.2.fastq.gz") as reader:
         bp = 0
         for r1, r2 in reader:
             bp += len(r1) + len(r2)
         print(f"The paired-end input contains {bp/1E6:.1f} Mbp")
 
-Note that ``file2`` is a keyword-only argument, so you need to write the ``file2=`` part.
-In this example, ``dnaio.open`` returns a `~dnaio.TwoFilePairedEndReader`.
-It also supports iteration, but instead of a single ``SequenceRecord``,
-it returns a pair of them.
+Here, ``dnaio.open`` returns a `~dnaio.TwoFilePairedEndReader`.
+It also supports iteration, but instead of a plain ``SequenceRecord``,
+it returns a tuple of two ``SequenceRecord`` instances.
 
 To read from interleaved paired-end data,
 pass ``interleaved=True`` to ``dnaio.open`` instead of a second file name::
@@ -122,7 +120,7 @@ to R2::
     import dnaio
 
     with dnaio.open("in.fastq.gz") as reader, \
-            dnaio.open("out.1.fastq.gz", file2="out.2.fastq.gz", mode="w") as writer:
+            dnaio.open("out.1.fastq.gz", "out.2.fastq.gz", mode="w") as writer:
         for record in reader:
             r1 = record[:30]
             r2 = record[-30:]


=====================================
pyproject.toml
=====================================
@@ -7,3 +7,12 @@ write_to = "src/dnaio/_version.py"
 
 [tool.pytest.ini_options]
 testpaths = ["tests"]
+
+[tool.cibuildwheel]
+environment = "CFLAGS=-g0"
+test-requires = "pytest"
+test-command = ["cd {project}", "pytest tests"]
+
+[[tool.cibuildwheel.overrides]]
+select = "*-win*"
+test-command = ["cd /d {project}", "pytest tests"]


=====================================
src/dnaio/__init__.py
=====================================
@@ -21,12 +21,16 @@ __all__ = [
     "InterleavedPairedEndWriter",
     "TwoFilePairedEndReader",
     "TwoFilePairedEndWriter",
+    "MultipleFileReader",
+    "MultipleFastaWriter",
+    "MultipleFastqWriter",
     "read_chunks",
     "read_paired_chunks",
     "records_are_mates",
     "__version__",
 ]
 
+import functools
 from os import PathLike
 from typing import Optional, Union, BinaryIO
 
@@ -47,6 +51,13 @@ from .pairedend import (
     InterleavedPairedEndReader,
     InterleavedPairedEndWriter,
 )
+from .multipleend import (
+    MultipleFastaWriter,
+    MultipleFastqWriter,
+    MultipleFileReader,
+    MultipleFileWriter,
+    _open_multiple,
+)
 from .exceptions import (
     UnknownFileFormat,
     FileFormatError,
@@ -63,40 +74,50 @@ from .chunks import read_chunks, read_paired_chunks
 from ._version import version as __version__
 
 
-# Backwards compatibility aliases
+# Backwards compatibility alias
 Sequence = SequenceRecord
 
 
 def open(
-    file1: Union[str, PathLike, BinaryIO],
-    *,
+    *files: Union[str, PathLike, BinaryIO],
+    file1: Optional[Union[str, PathLike, BinaryIO]] = None,
     file2: Optional[Union[str, PathLike, BinaryIO]] = None,
     fileformat: Optional[str] = None,
     interleaved: bool = False,
     mode: str = "r",
     qualities: Optional[bool] = None,
-    opener=xopen
-) -> Union[SingleEndReader, PairedEndReader, SingleEndWriter, PairedEndWriter]:
+    opener=xopen,
+    compression_level: int = 1,
+    open_threads: int = 0,
+) -> Union[
+    SingleEndReader,
+    PairedEndReader,
+    SingleEndWriter,
+    PairedEndWriter,
+    MultipleFileReader,
+    MultipleFileWriter,
+]:
     """
-    Open one or two files in FASTA or FASTQ format for reading or writing.
+    Open one or more FASTQ or FASTA files for reading or writing.
 
     Parameters:
+      files:
+        one or more Path or open file-like objects. One for single-end
+        reads, two for paired-end reads etc. More than two files are also
+        supported. At least one file is required.
 
       file1:
-        Path or an open file-like object. For reading single-end reads, this is
-        the only required argument.
+        Deprecated keyword argument for the first file.
 
       file2:
-        Path or an open file-like object. When reading paired-end reads from
-        two files, set this to the second file.
+        Deprecated keyword argument for the second file.
 
       mode:
-        Either ``'r'`` or ``'rb'`` for reading, ``'w'`` for writing
-        or ``'a'`` for appending.
+        Set to ``'r'`` for reading, ``'w'`` for writing or ``'a'`` for appending.
 
       interleaved:
-        If True, then file1 contains interleaved paired-end data.
-        file2 must be None in this case.
+        If True, then there must be only one file argument that contains
+        interleaved paired-end data.
 
       fileformat:
         If *None*, the file format is autodetected from the file name
@@ -114,29 +135,68 @@ def open(
         - When False (no qualities available), an exception is raised when the
           auto-detected output format is FASTQ.
 
-      opener: A function that is used to open file1 and file2 if they are not
+      opener: A function that is used to open the files if they are not
         already open file-like objects. By default, ``xopen`` is used, which can
         also open compressed file formats.
 
-    Return:
-       A subclass of `SingleEndReader`, `PairedEndReader`, `SingleEndWriter` or
-       `PairedEndWriter`.
+      open_threads: By default, dnaio opens files in the main thread.
+        When open_threads is greater than 0, external processes are opened for
+        compressing and decompressing files. This decreases wall clock time
+        at the cost of a little extra overhead. This parameter does not work
+        when a custom opener is set.
+
+      compression_level: By default dnaio uses compression level 1 for writing
+        gzipped files as this is the fastest. A higher level can be set using
+        this parameter. This parameter does not work when a custom opener is
+        set.
     """
-    if mode not in ("r", "rb", "w", "a"):
-        raise ValueError("Mode must be 'r', 'rb', 'w' or 'a'")
-    elif interleaved and file2 is not None:
-        raise ValueError("When interleaved is True, file2 must be None")
-    elif interleaved or file2 is not None:
+    if files and (file1 is not None):
+        raise ValueError(
+            "The file1 keyword argument cannot be used together with files specified"
+            "as positional arguments"
+        )
+    elif len(files) > 1 and file2 is not None:
+        raise ValueError(
+            "The file2 argument cannot be used together with more than one "
+            "file specified as positional argument"
+        )
+    elif file1 is not None and file2 is not None and files:
+        raise ValueError(
+            "file1 and file2 arguments cannot be used together with files specified"
+            "as positional arguments"
+        )
+    elif file1 is not None and file2 is not None:
+        files = (file1, file2)
+    elif file2 is not None and len(files) == 1:
+        files = (files[0], file2)
+
+    if len(files) > 1 and interleaved:
+        raise ValueError("When interleaved is True, only one file must be specified.")
+    elif mode not in ("r", "w", "a"):
+        raise ValueError("Mode must be 'r', 'w' or 'a'")
+
+    if opener == xopen:
+        opener = functools.partial(
+            xopen, threads=open_threads, compresslevel=compression_level
+        )
+    if interleaved or len(files) == 2:
         return _open_paired(
-            file1,
-            file2=file2,
+            *files,
             opener=opener,
             fileformat=fileformat,
-            interleaved=interleaved,
             mode=mode,
             qualities=qualities,
         )
+    elif len(files) > 2:
+        return _open_multiple(
+            *files, fileformat=fileformat, mode=mode, qualities=qualities, opener=opener
+        )
+
     else:
         return _open_single(
-            file1, opener=opener, fileformat=fileformat, mode=mode, qualities=qualities
+            files[0],
+            opener=opener,
+            fileformat=fileformat,
+            mode=mode,
+            qualities=qualities,
         )
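
A sketch of the argument dispatch implemented above, with placeholder paths:

    import dnaio

    # One positional file: single-end reader
    with dnaio.open("reads.fastq.gz") as reader:
        n_reads = sum(1 for _ in reader)

    # Two positional files: paired-end reader (the old file2= keyword still
    # works but is deprecated)
    with dnaio.open("reads.1.fastq.gz", "reads.2.fastq.gz") as reader:
        n_pairs = sum(1 for _ in reader)

    # Exactly one file plus interleaved=True: interleaved paired-end reader
    with dnaio.open("interleaved.fastq.gz", interleaved=True) as reader:
        n_pairs = sum(1 for _ in reader)

    # Three or more files: dispatched to _open_multiple / MultipleFileReader
    with dnaio.open("r1.fastq.gz", "r2.fastq.gz", "i1.fastq.gz") as reader:
        n_triples = sum(1 for _ in reader)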


=====================================
src/dnaio/_core.pyi
=====================================
@@ -1,4 +1,13 @@
-from typing import Optional, Tuple, BinaryIO, Iterator, Type, TypeVar, ByteString
+from typing import (
+    Generic,
+    Optional,
+    Tuple,
+    BinaryIO,
+    Iterator,
+    Type,
+    TypeVar,
+    ByteString,
+)
 
 class SequenceRecord:
     name: str
@@ -31,14 +40,14 @@ def records_are_mates(
 
 T = TypeVar("T")
 
-class FastqIter:
+class FastqIter(Generic[T]):
     def __init__(
         self, file: BinaryIO, sequence_class: Type[T], buffer_size: int = ...
     ): ...
     def __iter__(self) -> Iterator[T]: ...
     def __next__(self) -> T: ...
     @property
-    def n_records(self) -> int: ...
+    def number_of_records(self) -> int: ...
 
 # Deprecated
 def record_names_match(header1: str, header2: str) -> bool: ...


=====================================
src/dnaio/_core.pyx
=====================================
@@ -41,6 +41,24 @@ def bytes_ascii_check(bytes string, Py_ssize_t length = -1):
     return ascii
 
 
+def is_not_ascii_message(field, value):
+    """
+    Return an error message for a non-ASCII field encountered when initializing a SequenceRecord
+
+    Arguments:
+        field: Description of the field ("name", "sequence", "qualities" or similar)
+            in which non-ASCII characters were found
+        value: Unicode string that was intended to be assigned to the field
+    """
+    detail = ""
+    try:
+        value.encode("ascii")
+    except UnicodeEncodeError as e:
+        detail = (
+            f", but found '{value[e.start:e.end]}' at index {e.start}"
+        )
+    return f"'{field}' in sequence file must be ASCII encoded{detail}"
+
 
 cdef class SequenceRecord:
     """
@@ -57,38 +75,38 @@ cdef class SequenceRecord:
             If quality values are available, this is a string
             that contains the Phred-scaled qualities encoded as
             ASCII(qual+33) (as in FASTQ).
+
+    Raises:
+        ValueError: One of the provided attributes is not ASCII or
+            the lengths of sequence and qualities differ
     """
     cdef:
         object _name
         object _sequence
         object _qualities
 
-    def __cinit__(self, object name, object sequence, object qualities=None):
-        """Set qualities to None if there are no quality values"""
-        self._name = name
-        self._sequence = sequence
-        self._qualities = qualities
-
     def __init__(self, object name, object sequence, object qualities = None):
-        # __cinit__ is called first and sets all the variables.
         if not PyUnicode_CheckExact(name):
             raise TypeError(f"name should be of type str, got {type(name)}")
         if not PyUnicode_IS_COMPACT_ASCII(name):
-            raise ValueError("name must be a valid ASCII-string.")
+            raise ValueError(is_not_ascii_message("name", name))
         if not PyUnicode_CheckExact(sequence):
             raise TypeError(f"sequence should be of type str, got {type(sequence)}")
         if not PyUnicode_IS_COMPACT_ASCII(sequence):
-            raise ValueError("sequence must be a valid ASCII-string.")
+            raise ValueError(is_not_ascii_message("sequence", sequence))
         if qualities is not None:
             if not PyUnicode_CheckExact(qualities):
                 raise TypeError(f"qualities should be of type str, got {type(qualities)}")
             if not PyUnicode_IS_COMPACT_ASCII(qualities):
-                raise ValueError("qualities must be a valid ASCII-string.")
+                raise ValueError(is_not_ascii_message("qualities", qualities))
             if len(qualities) != len(sequence):
                 rname = shorten(name)
                 raise ValueError("In read named {!r}: length of quality sequence "
                                  "({}) and length of read ({}) do not match".format(
                     rname, len(qualities), len(sequence)))
+        self._name = name
+        self._sequence = sequence
+        self._qualities = qualities
 
     @property
     def name(self):
@@ -99,7 +117,7 @@ cdef class SequenceRecord:
         if not PyUnicode_CheckExact(name):
             raise TypeError(f"name must be of type str, got {type(name)}")
         if not PyUnicode_IS_COMPACT_ASCII(name):
-            raise ValueError("name must be a valid ASCII-string.")
+            raise ValueError(is_not_ascii_message("name", name))
         self._name = name
 
     @property
@@ -111,7 +129,7 @@ cdef class SequenceRecord:
         if not PyUnicode_CheckExact(sequence):
             raise TypeError(f"sequence must be of type str, got {type(sequence)}")
         if not PyUnicode_IS_COMPACT_ASCII(sequence):
-            raise ValueError("sequence must be a valid ASCII-string.")
+            raise ValueError(is_not_ascii_message("sequence", sequence))
         self._sequence = sequence
 
     @property
@@ -122,7 +140,7 @@ cdef class SequenceRecord:
     def qualities(self, qualities):
         if PyUnicode_CheckExact(qualities):
             if not PyUnicode_IS_COMPACT_ASCII(qualities):
-                raise ValueError("qualities must be a valid ASCII-string.")
+                raise ValueError(is_not_ascii_message("qualities", qualities))
         elif qualities is None:
             pass
         else:
@@ -267,6 +285,13 @@ cdef class SequenceRecord:
                                 header2_length, id1_ends_with_number)
 
     def reverse_complement(self):
+        """
+        Return a reverse-complemented version of this record.
+
+        - The name remains unchanged.
+        - The sequence is reverse complemented.
+        - If quality values exist, their order is reversed.
+        """
         cdef:
             Py_ssize_t sequence_length = PyUnicode_GET_LENGTH(self._sequence)
             object reversed_sequence_obj = PyUnicode_New(sequence_length, 127)
@@ -277,6 +302,7 @@ cdef class SequenceRecord:
             char *qualities
             Py_ssize_t cursor, reverse_cursor
             unsigned char nucleotide
+            SequenceRecord seq_record
         reverse_cursor = sequence_length
         for cursor in range(sequence_length):
             reverse_cursor -= 1
@@ -293,17 +319,21 @@ cdef class SequenceRecord:
                 reversed_qualities[reverse_cursor] = qualities[cursor]
         else:
             reversed_qualities_obj = None
-        return SequenceRecord.__new__(
-            SequenceRecord, self._name, reversed_sequence_obj, reversed_qualities_obj)
+        seq_record = SequenceRecord.__new__(SequenceRecord)
+        seq_record._name = self._name
+        seq_record._sequence = reversed_sequence_obj
+        seq_record._qualities = reversed_qualities_obj
+        return seq_record
 
 
 def paired_fastq_heads(buf1, buf2, Py_ssize_t end1, Py_ssize_t end2):
     """
     Skip forward in the two buffers by multiples of four lines.
 
-    Return a tuple (length1, length2) such that buf1[:length1] and
-    buf2[:length2] contain the same number of lines (where the
-    line number is divisible by four).
+    Returns:
+        A tuple (length1, length2) such that buf1[:length1] and
+        buf2[:length2] contain the same number of lines (where the
+        line number is divisible by four).
     """
     # Acquire buffers. Cython automatically checks for errors here.
     cdef Py_buffer data1_buffer
@@ -349,15 +379,21 @@ cdef class FastqIter:
     """
     Parse a FASTQ file and yield SequenceRecord objects
 
-    The *first value* that the generator yields is a boolean indicating whether
-    the first record in the FASTQ has a repeated header (in the third row
-    after the ``+``).
+    Arguments:
+        file: a file-like object, opened in binary mode (it must have a readinto
+            method)
 
-    file -- a file-like object, opened in binary mode (it must have a readinto
-    method)
+        sequence_class: A custom class to use for the returned instances
+            (instead of SequenceRecord)
 
-    buffer_size -- size of the initial buffer. This is automatically grown
-        if a FASTQ record is encountered that does not fit.
+        buffer_size: size of the initial buffer. This is automatically grown
+            if a FASTQ record is encountered that does not fit.
+
+    Yields:
+        The *first value* that the generator yields is a boolean indicating whether
+        the first record in the FASTQ has a repeated header (in the third row
+        after the ``+``). Subsequent values are SequenceRecord objects (or instances
+        of sequence_class, if a custom class was specified).
     """
     cdef:
         Py_ssize_t buffer_size
@@ -461,6 +497,7 @@ cdef class FastqIter:
     def __next__(self):
         cdef:
             object ret_val
+            SequenceRecord seq_record
             char *name_start
             char *name_end
             char *sequence_start
@@ -470,6 +507,7 @@ cdef class FastqIter:
             char *qualities_start
             char *qualities_end
             char *buffer_end
+            size_t remaining_bytes
             Py_ssize_t name_length, sequence_length, second_header_length, qualities_length
         # Repeatedly attempt to parse the buffer until we have found a full record.
         # If an attempt fails, we read more data before retrying.
@@ -495,10 +533,15 @@ cdef class FastqIter:
                 self._read_into_buffer()
                 continue
             second_header_start = sequence_end + 1
-            second_header_end = <char *>memchr(second_header_start, b'\n', <size_t>(buffer_end - second_header_start))
-            if second_header_end == NULL:
-                self._read_into_buffer()
-                continue
+            remaining_bytes = (buffer_end - second_header_start)
+            # Usually there is no second header, so we skip the memchr call.
+            if remaining_bytes > 2 and second_header_start[0] == b'+' and second_header_start[1] == b'\n':
+                second_header_end = second_header_start + 1
+            else:
+                second_header_end = <char *>memchr(second_header_start, b'\n', <size_t>(remaining_bytes))
+                if second_header_end == NULL:
+                    self._read_into_buffer()
+                    continue
             qualities_start = second_header_end + 1
             qualities_end = <char *>memchr(qualities_start, b'\n', <size_t>(buffer_end - qualities_start))
             if qualities_end == NULL:
@@ -564,8 +607,11 @@ cdef class FastqIter:
             if self.use_custom_class:
                 ret_val = self.sequence_class(name, sequence, qualities)
             else:
-                ret_val = SequenceRecord.__new__(SequenceRecord, name, sequence, qualities)
-
+                seq_record = SequenceRecord.__new__(SequenceRecord)
+                seq_record._name = name
+                seq_record._sequence = sequence
+                seq_record._qualities = qualities
+                ret_val = seq_record
             # Advance record to next position
             self.number_of_records += 1
             self.record_start = qualities_end + 1
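
A small sketch of the SequenceRecord behaviour documented above: reverse_complement() keeps the name and reverses the qualities, and ASCII violations now report the offending character.

    from dnaio import SequenceRecord

    record = SequenceRecord("read1", "ACCGGTT", "FF,:F,,")
    rc = record.reverse_complement()
    assert rc.name == "read1"         # name is unchanged
    assert rc.sequence == "AACCGGT"   # reverse complement of ACCGGTT
    assert rc.qualities == ",,F:,FF"  # quality string is reversed

    try:
        SequenceRecord("read1", "ACGT\u00e9")  # non-ASCII character in the sequence
    except ValueError as exc:
        print(exc)  # the message now includes the offending character and its index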


=====================================
src/dnaio/_util.py
=====================================
@@ -1,24 +1,3 @@
-import pathlib
-
-
-def _is_path(obj: object) -> bool:
-    """
-    Return whether the given object looks like a path (str, pathlib.Path or pathlib2.Path)
-    """
-    # TODO
-    # pytest uses pathlib2.Path objects on Python 3.5 for its tmp_path fixture.
-    # On Python 3.6+, this function can be replaced with isinstance(obj, os.PathLike)
-    import sys
-
-    if "pathlib2" in sys.modules:
-        import pathlib2  # type: ignore
-
-        path_classes = [str, pathlib.Path, pathlib2.Path]
-    else:
-        path_classes = [str, pathlib.Path]
-    return isinstance(obj, tuple(path_classes))
-
-
 def shorten(s: str, n: int = 100) -> str:
 
     """Shorten string s to at most n characters, appending "..." if necessary."""


=====================================
src/dnaio/interfaces.py
=====================================
@@ -1,24 +1,42 @@
 from abc import ABC, abstractmethod
-from typing import Iterator, Tuple
+from typing import Iterable, Iterator, Tuple
 
 from dnaio import SequenceRecord
 
 
 class SingleEndReader(ABC):
+    delivers_qualities: bool
+    number_of_records: int
+
     @abstractmethod
     def __iter__(self) -> Iterator[SequenceRecord]:
-        """Yield the records in the input as `SequenceRecord` objects."""
+        """
+        Iterate over an input containing sequence records
+
+        Yields:
+            `SequenceRecord` objects
+
+        Raises:
+            `FileFormatError`
+                if there was a parse error
+        """
 
 
 class PairedEndReader(ABC):
     @abstractmethod
     def __iter__(self) -> Iterator[Tuple[SequenceRecord, SequenceRecord]]:
         """
-        Yield the records in the paired-end input as pairs of `SequenceRecord` objects.
+        Iterate over an input containing paired-end records
 
-        Raises a `FileFormatError` if reads are improperly paired, that is,
-        if there are more reads in one file than the other or if the record IDs
-        do not match (according to `SequenceRecord.is_mate`).
+        Yields:
+            Pairs of `SequenceRecord` objects
+
+        Raises:
+            `FileFormatError`
+                if there was a parse error or if reads are improperly paired,
+                that is, if there are more reads in one file than the other or
+                if the record IDs do not match (according to
+                `SequenceRecord.is_mate`).
         """
 
 
@@ -38,5 +56,30 @@ class PairedEndWriter(ABC):
         because this was already done at parsing time. If it is possible
         that the record IDs no longer match, check that
         ``record1.is_mate(record2)`` returns True before calling
-        this function.
+        this method.
+        """
+
+
+class MultipleFileWriter(ABC):
+    _number_of_files: int
+
+    @abstractmethod
+    def write(self, *records: SequenceRecord) -> None:
+        """
+        Write N SequenceRecords to the output. N must be equal
+        to the number of files the MultipleFileWriter was initialized with.
+
+        This method does not check whether the records are properly paired.
+        """
+
+    @abstractmethod
+    def write_iterable(self, list_of_records: Iterable[Tuple[SequenceRecord, ...]]):
+        """
+        Iterate over the list (or other iterable container) and write all
+        N-tuples of SequenceRecord to disk. N must be equal
+        to the number of files the MultipleFileWriter was initialized with.
+
+        This method does not check whether the records are properly paired.
+        This method may provide a speed boost over calling write for each
+        tuple of SequenceRecords individually.
         """


=====================================
src/dnaio/multipleend.py
=====================================
@@ -0,0 +1,258 @@
+import contextlib
+import os
+from os import PathLike
+from typing import BinaryIO, IO, Iterable, Iterator, List, Optional, Tuple, Union
+
+from xopen import xopen
+
+from ._core import SequenceRecord, records_are_mates
+from .exceptions import FileFormatError
+from .interfaces import MultipleFileWriter
+from .readers import FastaReader, FastqReader
+from .singleend import _open_single, _detect_format_from_name
+from .writers import FastaWriter, FastqWriter
+
+
+def _open_multiple(
+    *files: Union[str, PathLike, BinaryIO],
+    fileformat: Optional[str] = None,
+    mode: str = "r",
+    qualities: Optional[bool] = None,
+    opener=xopen,
+):
+    if not files:
+        raise ValueError("At least one file is required")
+    if mode not in ("r", "w", "a"):
+        raise ValueError("Mode must be one of 'r', 'w', 'a'")
+    elif mode == "r":
+        return MultipleFileReader(*files, fileformat=fileformat, opener=opener)
+    elif mode == "w" and fileformat is None:
+        # Assume mixed files will not be offered.
+        for file in files:
+            if isinstance(file, (str, os.PathLike)):
+                fileformat = _detect_format_from_name(os.fspath(file))
+    append = mode == "a"
+    if fileformat == "fastq" or qualities or (fileformat is None and qualities is None):
+        return MultipleFastqWriter(*files, opener=opener, append=append)
+    return MultipleFastaWriter(*files, opener=opener, append=append)
+
+
+class MultipleFileReader:
+    """
+    Read multiple FASTA/FASTQ files simultaneously. Useful when additional
+    FASTQ files with extra information are supplied (UMIs, indices, etc.).
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
+    """
+
+    def __init__(
+        self,
+        *files: Union[str, PathLike, BinaryIO],
+        fileformat: Optional[str] = None,
+        opener=xopen,
+    ):
+        if len(files) < 1:
+            raise ValueError("At least one file is required")
+        self._files = files
+        self._stack = contextlib.ExitStack()
+        self._readers: List[Union[FastaReader, FastqReader]] = [
+            self._stack.enter_context(
+                _open_single(  # type: ignore
+                    file, opener=opener, fileformat=fileformat, mode="r"
+                )
+            )
+            for file in self._files
+        ]
+        self.delivers_qualities: bool = self._readers[0].delivers_qualities
+
+    def __repr__(self) -> str:
+        return (
+            f"{self.__class__.__name__}"
+            f"({', '.join(repr(reader) for reader in self._readers)})"
+        )
+
+    def __iter__(self) -> Iterator[Tuple[SequenceRecord, ...]]:
+        """
+        Iterate over multiple inputs containing records
+
+        Yields:
+            N-tuples of `SequenceRecord` objects where N is equal to the number
+            of files.
+
+        Raises:
+            `FileFormatError`
+                if there was a parse error or if reads are improperly paired,
+                that is, if there are more reads in one file than the others or
+                if the record IDs do not match (according to
+                `records_are_mates`).
+        """
+        if len(self._files) == 1:
+            yield from zip(self._readers[0])
+        else:
+            for records in zip(*self._readers):
+                if not records_are_mates(*records):
+                    raise FileFormatError(
+                        f"Records are out of sync, names "
+                        f"{', '.join(repr(r.name) for r in records)} do not match.",
+                        line=None,
+                    )
+                yield records
+        # Consume one iteration to check if all the files have an equal number
+        # of records.
+        for reader in self._readers:
+            try:
+                _ = next(iter(reader))
+            except StopIteration:
+                pass
+        record_numbers = [r.number_of_records for r in self._readers]
+        if len(set(record_numbers)) != 1:
+            raise FileFormatError(
+                f"Files: {', '.join(str(file) for file in self._files)} have "
+                f"an unequal amount of reads.",
+                line=None,
+            )
+
+    def close(self):
+        self._stack.close()
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        self.close()
+
+
+class MultipleFastaWriter(MultipleFileWriter):
+    """
+    Write multiple FASTA files simultaneously.
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
+    """
+
+    def __init__(
+        self,
+        *files: Union[str, PathLike, BinaryIO],
+        opener=xopen,
+        append: bool = False,
+    ):
+        if len(files) < 1:
+            raise ValueError("At least one file is required")
+        mode = "a" if append else "w"
+        self._files = files
+        self._number_of_files = len(files)
+        self._stack = contextlib.ExitStack()
+        self._writers: List[Union[FastaWriter, FastqWriter]] = [
+            self._stack.enter_context(
+                _open_single(  # type: ignore
+                    file,
+                    opener=opener,
+                    fileformat="fasta",
+                    mode=mode,
+                    qualities=False,
+                )
+            )
+            for file in self._files
+        ]
+
+    def __repr__(self) -> str:
+        return (
+            f"{self.__class__.__name__}"
+            f"({', '.join(repr(writer) for writer in self._writers)})"
+        )
+
+    def close(self):
+        self._stack.close()
+
+    def write(self, *records: SequenceRecord):
+        if len(records) != self._number_of_files:
+            raise ValueError(f"records must have length {self._number_of_files}")
+        for record, writer in zip(records, self._writers):
+            writer.write(record)
+
+    def write_iterable(self, records_iterable: Iterable[Tuple[SequenceRecord, ...]]):
+        for records in records_iterable:
+            self.write(*records)
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        self.close()
+
+
+class MultipleFastqWriter(MultipleFileWriter):
+    """
+    Write multiple FASTQ files simultaneously.
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
+    """
+
+    def __init__(
+        self,
+        *files: Union[str, PathLike, BinaryIO],
+        opener=xopen,
+        append: bool = False,
+    ):
+        if len(files) < 1:
+            raise ValueError("At least one file is required")
+        mode = "a" if append else "w"
+        self._files = files
+        self._number_of_files = len(files)
+        self._stack = contextlib.ExitStack()
+        self._writers: List[IO] = [
+            self._stack.enter_context(
+                opener(file, mode + "b") if not hasattr(file, "write") else file
+            )
+            for file in self._files
+        ]
+
+    def __repr__(self) -> str:
+        return (
+            f"{self.__class__.__name__}" f"({', '.join(str(f) for f in self._files)})"
+        )
+
+    def close(self):
+        self._stack.close()
+
+    def write(self, *records: SequenceRecord):
+        if len(records) != self._number_of_files:
+            raise ValueError(f"records must have length {self._number_of_files}")
+        for record, writer in zip(records, self._writers):
+            writer.write(record.fastq_bytes())
+
+    def write_iterable(self, records_iterable: Iterable[Tuple[SequenceRecord, ...]]):
+        # Use faster methods for more common cases before falling back to
+        # generic multiple files mode (which is much slower due to calling the
+        # zip function).
+        if self._number_of_files == 1:
+            output = self._writers[0]
+            for (record,) in records_iterable:
+                output.write(record.fastq_bytes())
+        elif self._number_of_files == 2:
+            output1 = self._writers[0]
+            output2 = self._writers[1]
+            for record1, record2 in records_iterable:
+                output1.write(record1.fastq_bytes())
+                output2.write(record2.fastq_bytes())
+        elif self._number_of_files == 3:
+            output1 = self._writers[0]
+            output2 = self._writers[1]
+            output3 = self._writers[2]
+            for record1, record2, record3 in records_iterable:
+                output1.write(record1.fastq_bytes())
+                output2.write(record2.fastq_bytes())
+                output3.write(record3.fastq_bytes())
+        else:  # More than 3 files is quite uncommon.
+            writers = self._writers
+            for records in records_iterable:
+                for record, output in zip(records, writers):
+                    output.write(record.fastq_bytes())
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        self.close()
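
A self-contained sketch of reading through the module above; in-memory FASTQ data stands in for real files, and all records share the same ID so the mate check passes:

    import io
    import dnaio

    r1 = io.BytesIO(b"@read1\nACGT\n+\nIIII\n")
    r2 = io.BytesIO(b"@read1\nTTGA\n+\nHHHH\n")
    umi = io.BytesIO(b"@read1\nGGCAAGTC\n+\nFFFFFFFF\n")

    # Three inputs make dnaio.open return a MultipleFileReader
    with dnaio.open(r1, r2, umi) as reader:
        for records in reader:
            print([record.sequence for record in records])
    # Mismatching IDs or unequal record counts raise FileFormatError instead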


=====================================
src/dnaio/pairedend.py
=====================================
@@ -13,11 +13,8 @@ from .singleend import _open_single
 
 
 def _open_paired(
-    file1: Union[str, PathLike, BinaryIO],
-    *,
-    file2: Optional[Union[str, PathLike, BinaryIO]] = None,
+    *files: Union[str, PathLike, BinaryIO],
     fileformat: Optional[str] = None,
-    interleaved: bool = False,
     mode: str = "r",
     qualities: Optional[bool] = None,
     opener=xopen,
@@ -25,43 +22,43 @@ def _open_paired(
     """
     Open paired-end reads
     """
-    if interleaved and file2 is not None:
-        raise ValueError("When interleaved is True, file2 must be None")
-    if file2 is not None:
-        if mode in "wa" and file1 == file2:
+    if len(files) == 2:
+        if mode in "wa" and files[0] == files[1]:
             raise ValueError("The paired-end output files are identical")
         if "r" in mode:
             return TwoFilePairedEndReader(
-                file1, file2, fileformat=fileformat, opener=opener, mode=mode
+                *files, fileformat=fileformat, opener=opener, mode=mode
             )
         append = mode == "a"
         return TwoFilePairedEndWriter(
-            file1,
-            file2,
+            *files,
             fileformat=fileformat,
             qualities=qualities,
             opener=opener,
             append=append,
         )
-    if interleaved:
+    elif len(files) == 1:
         if "r" in mode:
             return InterleavedPairedEndReader(
-                file1, fileformat=fileformat, opener=opener, mode=mode
+                files[0], fileformat=fileformat, opener=opener, mode=mode
             )
         append = mode == "a"
         return InterleavedPairedEndWriter(
-            file1,
+            files[0],
             fileformat=fileformat,
             qualities=qualities,
             opener=opener,
             append=append,
         )
-    assert False
+    raise ValueError("_open_paired must be called with one or two files.")
 
 
 class TwoFilePairedEndReader(PairedEndReader):
     """
-    Read paired-end reads from two files.
+    Read paired-end reads from two files (not interleaved)
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
     """
 
     paired = True
@@ -92,7 +89,7 @@ class TwoFilePairedEndReader(PairedEndReader):
     def __iter__(self) -> Iterator[Tuple[SequenceRecord, SequenceRecord]]:
         """
         Iterate over the paired reads.
-        Each yielded item is a pair of SequenceRecord objects.
+        Each yielded item is a pair of `SequenceRecord` objects.
 
         Raises a `FileFormatError` if reads are improperly paired.
         """
@@ -139,7 +136,10 @@ class TwoFilePairedEndReader(PairedEndReader):
 
 class InterleavedPairedEndReader(PairedEndReader):
     """
-    Read paired-end reads from an interleaved FASTQ file.
+    Read paired-end reads from an interleaved FASTQ file
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
     """
 
     paired = True
@@ -191,6 +191,13 @@ class InterleavedPairedEndReader(PairedEndReader):
 
 
 class TwoFilePairedEndWriter(PairedEndWriter):
+    """
+    Write paired-end reads to two files (not interleaved)
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
+    """
+
     def __init__(
         self,
         file1: Union[str, PathLike, BinaryIO],
@@ -246,6 +253,9 @@ class TwoFilePairedEndWriter(PairedEndWriter):
 class InterleavedPairedEndWriter(PairedEndWriter):
     """
     Write paired-end reads to an interleaved FASTA or FASTQ file
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
     """
 
     def __init__(


=====================================
src/dnaio/readers.py
=====================================
@@ -64,6 +64,9 @@ class BinaryFileReader:
 class FastaReader(BinaryFileReader, SingleEndReader):
     """
     Reader for FASTA files
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
     """
 
     def __init__(
@@ -104,7 +107,14 @@ class FastaReader(BinaryFileReader, SingleEndReader):
             if line and line[0] == ">":
                 if name is not None:
                     self.number_of_records += 1
-                    yield self.sequence_class(name, self._delimiter.join(seq), None)
+                    try:
+                        yield self.sequence_class(name, self._delimiter.join(seq), None)
+                    except ValueError as e:
+                        raise FastaFormatError(
+                            str(e)
+                            + " (line number refers to record after the problematic one)",
+                            line=i,
+                        )
                 name = line[1:]
                 seq = []
             elif line and line[0] == "#":
@@ -119,12 +129,18 @@ class FastaReader(BinaryFileReader, SingleEndReader):
 
         if name is not None:
             self.number_of_records += 1
-            yield self.sequence_class(name, self._delimiter.join(seq), None)
+            try:
+                yield self.sequence_class(name, self._delimiter.join(seq), None)
+            except ValueError as e:
+                raise FastaFormatError(str(e), line=None)
 
 
 class FastqReader(BinaryFileReader, SingleEndReader):
     """
     Reader for FASTQ files. Does not support multi-line FASTQ files.
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments.
     """
 
     def __init__(


=====================================
src/dnaio/singleend.py
=====================================
@@ -1,10 +1,9 @@
 import os
-from typing import Optional, Union, BinaryIO
+from typing import Optional, Union, BinaryIO, Tuple
 
 from .exceptions import UnknownFileFormat
 from .readers import FastaReader, FastqReader
 from .writers import FastaWriter, FastqWriter
-from ._util import _is_path
 
 
 def _open_single(
@@ -21,22 +20,8 @@ def _open_single(
     if mode not in ("r", "w", "a"):
         raise ValueError("Mode must be 'r', 'w' or 'a'")
 
-    path: Optional[str]
-    if _is_path(file_or_path):
-        path = os.fspath(file_or_path)  # type: ignore
-        file = opener(path, mode[0] + "b")
-        close_file = True
-    else:
-        if "r" in mode and not hasattr(file_or_path, "readinto"):
-            raise ValueError(
-                "When passing in an open file-like object, it must have been opened in binary mode"
-            )
-        file = file_or_path
-        if hasattr(file, "name") and isinstance(file.name, str):
-            path = file.name
-        else:
-            path = None
-        close_file = False
+    close_file, file, path = _open_file_or_path(file_or_path, mode, opener)
+    del file_or_path
 
     if path is not None and fileformat is None:
         fileformat = _detect_format_from_name(path)
@@ -75,11 +60,38 @@ def _open_single(
             return FastqReader(file, _close_file=close_file)
         return FastqWriter(file, _close_file=close_file)
 
+    if close_file:
+        file.close()
     raise UnknownFileFormat(
         f"File format '{fileformat}' is unknown (expected 'fasta' or 'fastq')."
     )
 
 
+def _open_file_or_path(
+    file_or_path: Union[str, os.PathLike, BinaryIO], mode: str, opener
+) -> Tuple[bool, BinaryIO, Optional[str]]:
+    path: Optional[str]
+    file: BinaryIO
+    try:
+        path = os.fspath(file_or_path)  # type: ignore
+    except TypeError:
+        if "r" in mode and not hasattr(file_or_path, "readinto"):
+            raise ValueError(
+                "When passing in an open file-like object, it must have been opened in binary mode"
+            )
+        file = file_or_path  # type: ignore
+        if hasattr(file, "name") and isinstance(file.name, str):
+            path = file.name
+        else:
+            path = None
+        close_file = False
+    else:
+        file = opener(path, mode[0] + "b")
+        close_file = True
+
+    return close_file, file, path
+
+
 def _detect_format_from_name(name: str) -> Optional[str]:
     """
     name -- file name
@@ -87,7 +99,7 @@ def _detect_format_from_name(name: str) -> Optional[str]:
     Return 'fasta', 'fastq' or None if the format could not be detected.
     """
     name = name.lower()
-    for ext in (".gz", ".xz", ".bz2"):
+    for ext in (".gz", ".xz", ".bz2", ".zst"):
         if name.endswith(ext):
             name = name[: -len(ext)]
             break


=====================================
src/dnaio/writers.py
=====================================
@@ -1,10 +1,10 @@
+import os
 from os import PathLike
 from typing import Union, BinaryIO, Optional
 
 from xopen import xopen
 
 from . import SequenceRecord
-from ._util import _is_path
 from .interfaces import SingleEndWriter
 
 
@@ -13,6 +13,8 @@ class FileWriter:
     A mix-in that manages opening and closing and provides a context manager
     """
 
+    _file: BinaryIO
+
     def __init__(
         self,
         file: Union[PathLike, str, BinaryIO],
@@ -20,12 +22,15 @@ class FileWriter:
         opener=xopen,
         _close_file: Optional[bool] = None,
     ):
-        if _is_path(file):
+        try:
+            os.fspath(file)  # type: ignore
+        except TypeError:
+            # Assume it’s an open file-like object
+            self._file = file  # type: ignore
+            self._close_on_exit = bool(_close_file)
+        else:
             self._file = opener(file, "wb")
             self._close_on_exit = True
-        else:
-            self._file = file
-            self._close_on_exit = bool(_close_file)
 
     def __repr__(self) -> str:
         return f"{self.__class__.__name__}('{getattr(self._file, 'name', self._file)}')"
@@ -45,7 +50,14 @@ class FileWriter:
 
 class FastaWriter(FileWriter, SingleEndWriter):
     """
-    Write FASTA-formatted sequences to a file.
+    Write FASTA-formatted sequences to a file
+
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments unless you need to set the
+    line_length argument.
+
+    Arguments:
+        line_length: Wrap sequence lines after this many characters (None disables wrapping)
     """
 
     def __init__(
@@ -56,13 +68,6 @@ class FastaWriter(FileWriter, SingleEndWriter):
         opener=xopen,
         _close_file: Optional[bool] = None,
     ):
-        """
-
-        Arguments:
-            file: A path or an open file-like object
-            line_length: Wrap sequence lines after this many characters (None disables wrapping)
-            opener: If *file* is a path, this function is called to open it.
-        """
         super().__init__(file, opener=opener, _close_file=_close_file)
         self.line_length = line_length if line_length != 0 else None
 
@@ -101,14 +106,15 @@ class FastaWriter(FileWriter, SingleEndWriter):
 
 class FastqWriter(FileWriter, SingleEndWriter):
     """
-    Write records in FASTQ format.
+    Write records in FASTQ format
 
-    FASTQ files are formatted like this::
+    While this class can be instantiated directly, the recommended way is to
+    use `dnaio.open` with appropriate arguments unless you need to set
+    two_headers to True.
 
-        @read name
-        AACCGGTT
-        +
-        FF,:F,,F
+    Arguments:
+        two_headers: If True, the header is repeated on the third line
+            of each record after the "+".
     """
 
     file_mode = "wb"
@@ -121,13 +127,6 @@ class FastqWriter(FileWriter, SingleEndWriter):
         opener=xopen,
         _close_file: Optional[bool] = None,
     ):
-        """
-        Arguments:
-            file: A path or an open file-like object
-            two_headers: If True, the header is repeated on the third line
-                of each record after the "+".
-            opener: If *file* is a path, this function is called to open it.
-        """
         super().__init__(file, opener=opener, _close_file=_close_file)
         self._two_headers = two_headers
         # setattr avoids a complaint from Mypy
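
The two writer classes above are the ones to instantiate directly when line_length or two_headers is needed; a sketch with placeholder output paths:

    from dnaio import FastaWriter, FastqWriter, SequenceRecord

    # Wrap FASTA sequence lines after 60 characters
    with FastaWriter("assembly.fasta", line_length=60) as fasta:
        fasta.write(SequenceRecord("contig1", "ACGT" * 40))

    # Repeat the record name on the "+" line of every FASTQ record
    with FastqWriter("reads.fastq", two_headers=True) as fastq:
        fastq.write(SequenceRecord("read1", "ACGT", "IIII"))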


=====================================
tests/test_chunks.py
=====================================
@@ -1,6 +1,7 @@
 from pytest import raises
 from io import BytesIO
 
+from dnaio import UnknownFileFormat
 from dnaio._core import paired_fastq_heads
 from dnaio.chunks import _fastq_head, _fasta_head, read_chunks, read_paired_chunks
 
@@ -84,3 +85,8 @@ def test_read_chunks():
 
 def test_read_chunks_empty():
     assert list(read_chunks(BytesIO(b""))) == []
+
+
+def test_invalid_file_format():
+    with raises(UnknownFileFormat):
+        list(read_chunks(BytesIO(b"invalid format")))


=====================================
tests/test_internal.py
=====================================
@@ -22,8 +22,10 @@ from dnaio import (
     FastqWriter,
     InterleavedPairedEndWriter,
     TwoFilePairedEndReader,
+    records_are_mates,
+    record_names_match,
+    SequenceRecord,
 )
-from dnaio import records_are_mates, record_names_match, SequenceRecord
 from dnaio.writers import FileWriter
 from dnaio.readers import BinaryFileReader
 
@@ -264,10 +266,10 @@ class TestFastqReader:
 
 
 class TestOpen:
-    def setup(self):
+    def setup_method(self):
         self._tmpdir = mkdtemp()
 
-    def teardown(self):
+    def teardown_method(self):
         shutil.rmtree(self._tmpdir)
 
     def test_sequence_reader(self):
@@ -339,7 +341,7 @@ class TestOpen:
         path = os.path.join(self._tmpdir, "tmp.fastq")
         with raises(ValueError):
             with dnaio.open(path, mode="w", qualities=False):
-                pass
+                pass  # pragma: no cover
 
 
 class TestInterleavedReader:
@@ -386,11 +388,11 @@ class TestInterleavedReader:
 
 
 class TestFastaWriter:
-    def setup(self):
+    def setup_method(self):
         self._tmpdir = mkdtemp()
         self.path = os.path.join(self._tmpdir, "tmp.fasta")
 
-    def teardown(self):
+    def teardown_method(self):
         shutil.rmtree(self._tmpdir)
 
     def test(self):
@@ -436,11 +438,11 @@ class TestFastaWriter:
 
 
 class TestFastqWriter:
-    def setup(self):
+    def setup_method(self):
         self._tmpdir = mkdtemp()
         self.path = os.path.join(self._tmpdir, "tmp.fastq")
 
-    def teardown(self):
+    def teardown_method(self):
         shutil.rmtree(self._tmpdir)
 
     def test(self):
@@ -608,7 +610,7 @@ def test_file_writer(tmp_path):
     assert path.exists()
     with raises(ValueError) as e:
         with fw:
-            pass
+            pass  # pragma: no coverage
     assert "operation on closed file" in e.value.args[0]
 
 
@@ -618,7 +620,7 @@ def test_binary_file_reader():
     bfr.close()
     with raises(ValueError) as e:
         with bfr:
-            pass
+            pass  # pragma: no coverage
     assert "operation on closed" in e.value.args[0]
 
 


=====================================
tests/test_multiple.py
=====================================
@@ -0,0 +1,121 @@
+import io
+import itertools
+import os
+from pathlib import Path
+
+import dnaio
+from dnaio import SequenceRecord, _open_multiple
+
+import pytest
+
+
+@pytest.mark.parametrize(
+    ["fileformat", "number_of_files"],
+    itertools.product(("fasta", "fastq"), (1, 2, 3, 4)),
+)
+def test_read_files(fileformat, number_of_files):
+    file = Path(__file__).parent / "data" / ("simple." + fileformat)
+    files = [file for _ in range(number_of_files)]
+    with _open_multiple(*files) as multiple_reader:
+        for records in multiple_reader:
+            pass
+        assert len(records) == number_of_files
+        assert isinstance(records, tuple)
+
+
+@pytest.mark.parametrize(
+    "kwargs",
+    [
+        dict(mode="w", fileformat="fasta"),
+        dict(mode="r"),
+        dict(mode="w", fileformat="fastq"),
+    ],
+)
+def test_open_no_file_error(kwargs):
+    with pytest.raises(ValueError):
+        _open_multiple(**kwargs)
+
+
+def test_open_multiple_unsupported_mode():
+    with pytest.raises(ValueError) as error:
+        _open_multiple(os.devnull, mode="X")
+    error.match("one of 'r', 'w', 'a'")
+
+
+@pytest.mark.parametrize(
+    ["number_of_files", "content"],
+    itertools.product(
+        (1, 2, 3, 4), (">my_fasta\nAGCTAGA\n", "@my_fastq\nAGC\n+\nHHH\n")
+    ),
+)
+def test_multiple_binary_read(number_of_files, content):
+    files = [io.BytesIO(content.encode("ascii")) for _ in range(number_of_files)]
+    with _open_multiple(*files) as reader:
+        for records_tup in reader:
+            pass
+
+
+@pytest.mark.parametrize(
+    ["number_of_files", "fileformat"],
+    itertools.product((1, 2, 3, 4), ("fastq", "fasta")),
+)
+def test_multiple_binary_write(number_of_files, fileformat):
+    files = [io.BytesIO() for _ in range(number_of_files)]
+    records = [SequenceRecord("A", "A", "A") for _ in range(number_of_files)]
+    with _open_multiple(*files, mode="w", fileformat=fileformat) as writer:
+        writer.write(*records)
+
+
+@pytest.mark.parametrize(
+    ["number_of_files", "fileformat"],
+    itertools.product((1, 2, 3, 4), ("fastq", "fasta")),
+)
+def test_multiple_write_too_many(number_of_files, fileformat):
+    files = [io.BytesIO() for _ in range(number_of_files)]
+    records = [SequenceRecord("A", "A", "A") for _ in range(number_of_files + 1)]
+    with _open_multiple(*files, mode="w", fileformat=fileformat) as writer:
+        with pytest.raises(ValueError) as error:
+            writer.write(*records)
+    error.match(str(number_of_files))
+
+
+@pytest.mark.parametrize(
+    ["number_of_files", "fileformat"],
+    itertools.product((1, 2, 3, 4), ("fastq", "fasta")),
+)
+def test_multiple_write_iterable(number_of_files, fileformat):
+    files = [io.BytesIO() for _ in range(number_of_files)]
+    records = [SequenceRecord("A", "A", "A") for _ in range(number_of_files)]
+    records_list = [records, records, records]
+    with _open_multiple(*files, mode="w", fileformat=fileformat) as writer:
+        writer.write_iterable(records_list)
+
+
+@pytest.mark.parametrize("number_of_files", (2, 3, 4))
+def test_multiple_read_unmatched_names(number_of_files):
+    record1_content = b"@my_fastq\nAGC\n+\nHHH\n"
+    record2_content = b"@my_fasterq\nAGC\n+\nHHH\n"
+    files = (
+        io.BytesIO(record1_content),
+        *(io.BytesIO(record2_content) for _ in range(number_of_files - 1)),
+    )
+    with _open_multiple(*files) as reader:
+        with pytest.raises(dnaio.FileFormatError) as error:
+            for records in reader:
+                pass
+    error.match("do not match")
+
+
+@pytest.mark.parametrize("number_of_files", (2, 3, 4))
+def test_multiple_read_out_of_sync(number_of_files):
+    record1_content = b"@my_fastq\nAGC\n+\nHHH\n"
+    record2_content = b"@my_fastq\nAGC\n+\nHHH\n@my_secondfastq\nAGC\n+\nHHH\n"
+    files = (
+        io.BytesIO(record1_content),
+        *(io.BytesIO(record2_content) for _ in range(number_of_files - 1)),
+    )
+    with _open_multiple(*files) as reader:
+        with pytest.raises(dnaio.FileFormatError) as error:
+            for records in reader:
+                pass
+    error.match("unequal amount")


=====================================
tests/test_open.py
=====================================
@@ -1,9 +1,11 @@
+import os
 from pathlib import Path
 
-import dnaio
+import pytest
 from xopen import xopen
 
-import pytest
+import dnaio
+from dnaio import FileFormatError, UnknownFileFormat
 
 
 @pytest.fixture(params=["", ".gz", ".bz2", ".xz"])
@@ -31,17 +33,12 @@ SIMPLE_RECORDS = {
 def formatted_sequence(record, fileformat):
     if fileformat == "fastq":
         return "@{}\n{}\n+\n{}\n".format(record.name, record.sequence, record.qualities)
-    elif fileformat == "fastq_bytes":
-        return b"@%b\n%b\n+\n%b\n" % (record.name, record.sequence, record.qualities)
     else:
         return ">{}\n{}\n".format(record.name, record.sequence)
 
 
 def formatted_sequences(records, fileformat):
-    record_iter = (formatted_sequence(record, fileformat) for record in records)
-    if fileformat == "fastq_bytes":
-        return b"".join(record_iter)
-    return "".join(record_iter)
+    return "".join(formatted_sequence(record, fileformat) for record in records)
 
 
 def test_formatted_sequence():
@@ -57,7 +54,7 @@ def test_version():
 def test_open_nonexistent(tmp_path):
     with pytest.raises(FileNotFoundError):
         with dnaio.open(tmp_path / "nonexistent"):
-            pass
+            pass  # pragma: no cover
 
 
 def test_open_empty_file_with_unrecognized_extension(tmp_path):
@@ -68,6 +65,43 @@ def test_open_empty_file_with_unrecognized_extension(tmp_path):
     assert records == []
 
 
+def test_fileformat_error(tmp_path):
+    with open(tmp_path / "file.fastq", mode="w") as f:
+        print("this is not a FASTQ file", file=f)
+    with pytest.raises(FileFormatError) as e:
+        with dnaio.open(tmp_path / "file.fastq") as f:
+            _ = list(f)  # pragma: no cover
+    assert "at line 2" in str(e.value)  # Premature end of file
+
+
+def test_write_unknown_file_format(tmp_path):
+    with pytest.raises(UnknownFileFormat):
+        with dnaio.open(tmp_path / "out.txt", mode="w") as f:
+            f.write(dnaio.SequenceRecord("name", "ACG", "###"))  # pragma: no cover
+
+
+def test_read_unknown_file_format(tmp_path):
+    with open(tmp_path / "file.txt", mode="w") as f:
+        print("text file", file=f)
+    with pytest.raises(UnknownFileFormat):
+        with dnaio.open(tmp_path / "file.txt") as f:
+            _ = list(f)  # pragma: no cover
+
+
+def test_invalid_format(tmp_path):
+    with pytest.raises(UnknownFileFormat):
+        with dnaio.open(tmp_path / "out.txt", mode="w", fileformat="foo"):
+            pass  # pragma: no cover
+
+
+def test_write_qualities_to_file_without_fastq_extension(tmp_path):
+    with dnaio.open(tmp_path / "out.txt", mode="w", qualities=True) as f:
+        f.write(dnaio.SequenceRecord("name", "ACG", "###"))
+
+    with dnaio.open(tmp_path / "out.txt", mode="w", qualities=False) as f:
+        f.write(dnaio.SequenceRecord("name", "ACG", None))
+
+
 def test_read(fileformat, extension):
     with dnaio.open("tests/data/simple." + fileformat + extension) as f:
         records = list(f)
@@ -188,7 +222,7 @@ def test_write_paired_same_path(tmp_path):
     path2 = tmp_path / "same.fastq"
     with pytest.raises(ValueError):
         with dnaio.open(file1=path1, file2=path2, mode="w"):
-            pass
+            pass  # pragma: no cover
 
 
 def test_write_paired(tmp_path, fileformat, extension):
@@ -301,3 +335,50 @@ def test_islice_gzip_does_not_fail(tmp_path):
     f = dnaio.open(path)
     next(iter(f))
     f.close()
+
+
+def test_unsupported_mode():
+    with pytest.raises(ValueError) as error:
+        _ = dnaio.open(os.devnull, mode="x")
+    error.match("Mode must be")
+
+
+def test_no_file2_with_multiple_args():
+    with pytest.raises(ValueError) as error:
+        _ = dnaio.open(os.devnull, os.devnull, file2=os.devnull)
+    error.match("as positional argument")
+    error.match("file2")
+
+
+def test_no_multiple_files_interleaved():
+    with pytest.raises(ValueError) as error:
+        _ = dnaio.open(os.devnull, os.devnull, interleaved=True)
+    error.match("interleaved")
+    error.match("one file")
+
+
+@pytest.mark.parametrize(
+    ["mode", "expected_class"],
+    [("r", dnaio.PairedEndReader), ("w", dnaio.PairedEndWriter)],
+)
+def test_paired_open_with_multiple_args(tmp_path, fileformat, mode, expected_class):
+    path = tmp_path / "file"
+    path2 = tmp_path / "file2"
+    path.touch()
+    path2.touch()
+    with dnaio.open(path, path2, fileformat=fileformat, mode=mode) as f:
+        assert isinstance(f, expected_class)
+
+
+@pytest.mark.parametrize(
+    ["kwargs", "expected_class"],
+    [
+        ({}, dnaio.multipleend.MultipleFileReader),
+        ({"mode": "w"}, dnaio.multipleend.MultipleFastqWriter),
+        ({"mode": "w", "fileformat": "fastq"}, dnaio.multipleend.MultipleFastqWriter),
+        ({"mode": "w", "fileformat": "fasta"}, dnaio.multipleend.MultipleFastaWriter),
+    ],
+)
+def test_multiple_open_fastq(kwargs, expected_class):
+    with dnaio.open(os.devnull, os.devnull, os.devnull, **kwargs) as f:
+        assert isinstance(f, expected_class)
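
The dnaio.open tests above pin down how positional file arguments are
dispatched in 0.10.0: two files give a paired-end reader or writer, three or
more give the new multiple-file classes, and combining positional files with
file2= or interleaved=True raises ValueError. A short sketch with hypothetical
paths:

    import dnaio

    # Two positional files: paired-end reading, yielding (r1, r2) tuples.
    with dnaio.open("reads.1.fastq", "reads.2.fastq") as reader:
        for r1, r2 in reader:
            print(r1.name, r2.name)

    # Interleaved mode accepts exactly one file.
    try:
        dnaio.open("reads.1.fastq", "reads.2.fastq", interleaved=True)
    except ValueError as exc:
        print(exc)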


=====================================
tox.ini
=====================================
@@ -1,5 +1,5 @@
 [tox]
-envlist = flake8,black,mypy,docs,py37,py38,py39,py310
+envlist = flake8,black,mypy,docs,py37,py38,py39,py310,py311
 isolated_build = True
 
 [testenv]
@@ -7,8 +7,9 @@ deps =
     pytest
     coverage
 commands =
-    coverage run --concurrency=multiprocessing -m pytest --doctest-modules --pyargs tests/
+    coverage run -m pytest
     coverage combine
+    coverage xml
     coverage report
 setenv = PYTHONDEVMODE = 1
 
@@ -46,6 +47,12 @@ source =
     src/
     */site-packages/
 
+[coverage:report]
+precision = 1
+exclude_lines =
+    pragma: no cover
+    def __repr__
+
 [flake8]
 max-line-length = 99
 max-complexity = 15



View it on GitLab: https://salsa.debian.org/med-team/python-dnaio/-/compare/94db0ad1243b81b4373b04aaa9481f56124b1a29...5855df2a71bdc488fc12bbbcb7e915967b4260b4




