[med-svn] [Git][med-team/python-dnaio][upstream] New upstream version 0.9.1

Thu Aug 25 17:15:03 BST 2022


Andreas Tille pushed to branch upstream at Debian Med / python-dnaio


Commits:
7a5dc96e by Andreas Tille at 2022-08-25T18:13:18+02:00
New upstream version 0.9.1
- - - - -


15 changed files:

- .github/workflows/ci.yml
- CHANGES.rst
- README.rst
- doc/api.rst
- doc/conf.py
- doc/tutorial.rst
- pyproject.toml
- setup.cfg
- src/dnaio/__init__.py
- src/dnaio/_core.pyx
- src/dnaio/interfaces.py
- src/dnaio/pairedend.py
- src/dnaio/readers.py
- src/dnaio/singleend.py
- src/dnaio/writers.py


Changes:

=====================================
.github/workflows/ci.yml
=====================================
@@ -65,7 +65,7 @@ jobs:
     timeout-minutes: 15
     strategy:
       matrix:
-        os: [ubuntu-20.04, windows-2019]
+        os: [ubuntu-20.04, windows-2019, macos-10.15]
     runs-on: ${{ matrix.os }}
     steps:
     - uses: actions/checkout@v2
@@ -74,10 +74,11 @@ jobs:
     - name: Build wheels
       uses: pypa/cibuildwheel@v2.3.1
       env:
-        CIBW_BUILD: "cp*-manylinux_x86_64 cp3*-win_amd64"
+        CIBW_BUILD: "cp*-manylinux_x86_64 cp3*-win_amd64 cp3*-macosx_x86_64"
         CIBW_ENVIRONMENT: "CFLAGS=-g0"
         CIBW_TEST_REQUIRES: "pytest"
         CIBW_TEST_COMMAND_LINUX: "cd {project} && pytest tests"
+        CIBW_TEST_COMMAND_MACOS: "cd {project} && pytest tests"
         CIBW_TEST_COMMAND_WINDOWS: "cd /d {project} && pytest tests"
     - uses: actions/upload-artifact@v2
       with:


=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,13 @@
 Changelog
 =========
 
+v0.9.1 (2022-08-01)
+-------------------
+
+* :pr:`85`: macOS wheels are now also built as part of the release procedure.
+* :pr:`81`: API documentation improvements and minor code refactors for
+  readability.
+
 v0.9.0 (2022-05-17)
 -------------------
 


=====================================
README.rst
=====================================
@@ -30,7 +30,8 @@ The main interface is the `dnaio.open <https://dnaio.readthedocs.io/en/latest/ap
             bp += len(record)
     print(f"The input file contains {bp/1E6:.1f} Mbp")
 
-See the `documentation <https://dnaio.readthedocs.io/>`_ for more.
+For more, see the `tutorial <https://dnaio.readthedocs.io/en/latest/tutorial.html>`_ and
+`API documentation <https://dnaio.readthedocs.io/en/latest/api.html>`_.
 
 Features and supported file types
 =================================
@@ -43,14 +44,12 @@ Features and supported file types
 - Files with DOS/Windows linebreaks can be read
 - FASTQ files with a second header line (after the ``+``) are supported
 
-
 Limitations
 ===========
 
 - Multi-line FASTQ files are not supported.
 - FASTQ parsing is the focus of this library. The FASTA parser is not as optimized.
 
-
 Links
 =====
 


=====================================
doc/api.rst
=====================================
@@ -4,6 +4,7 @@ The dnaio API
 
 .. module:: dnaio
 
+
 The open function
 -----------------
 
@@ -20,40 +21,52 @@ The ``SequenceRecord`` class
    .. automethod:: __init__(name: str, sequence: str, qualities: Optional[str] = None)
 
 
+Reader and writer interfaces
+----------------------------
+
+.. autoclass:: SingleEndReader
+   :members: __iter__
+
+.. autoclass:: PairedEndReader
+   :members: __iter__
+
+.. autoclass:: SingleEndWriter
+   :members: write
+
+.. autoclass:: PairedEndWriter
+   :members: write
+
+
 Reader and writer classes
 -------------------------
 
+The `dnaio.open` function returns an instance of one of the following classes.
+They can also be used directly if needed.
+
+
 .. autoclass:: FastaReader
+   :show-inheritance:
 
 .. autoclass:: FastaWriter
-   :members: write
+   :show-inheritance:
 
 .. autoclass:: FastqReader
+   :show-inheritance:
 
 .. autoclass:: FastqWriter
-   :members: writeseq
-
-   .. py:method:: write(record: SequenceRecord) -> None:
-
-      Write a SequenceRecord to the FASTQ file.
+   :show-inheritance:
 
 .. autoclass:: TwoFilePairedEndReader
+   :show-inheritance:
 
 .. autoclass:: TwoFilePairedEndWriter
+   :show-inheritance:
 
 .. autoclass:: InterleavedPairedEndReader
+   :show-inheritance:
 
 .. autoclass:: InterleavedPairedEndWriter
-   :members: write
-
-.. autoclass:: SingleEndReader
-
-.. autoclass:: PairedEndReader
-
-.. autoclass:: SingleEndWriter
-
-.. autoclass:: PairedEndWriter
-   :members: write
+   :show-inheritance:
 
 
 Chunked reading of sequence records
@@ -68,6 +81,12 @@ processed there.
 .. autofunction:: read_paired_chunks
 
 
+Functions
+---------
+
+.. autofunction:: records_are_mates
+
+
 Exceptions
 ----------
 


=====================================
doc/conf.py
=====================================
@@ -36,3 +36,6 @@ default_role = "obj"  # (or "any")
 
 issues_uri = "https://github.com/marcelm/dnaio/issues/{issue}"
 issues_pr_uri = "https://github.com/marcelm/dnaio/pull/{pr}"
+
+autodoc_typehints = "description"
+python_use_unqualified_type_names = True


=====================================
doc/tutorial.rst
=====================================
@@ -3,7 +3,7 @@ Tutorial
 
 This should get you started with using ``dnaio``.
 The only essential concepts to know about are
-the `dnaio.open` function and the `SequenceRecord` object.
+the `dnaio.open` function and the `~dnaio.SequenceRecord` object.
 
 
 Reading
@@ -35,7 +35,7 @@ A ``SequenceRecord`` has the attributes ``name``, ``sequence``
 and ``qualities``. All of these are ``str`` objects.
 The ``qualities`` attribute is ``None`` when reading FASTA files.
 
-This program uses the ``name`` attribute
+The following program uses the ``name`` attribute
 to check whether any sequence names are duplicated in a FASTA file::
 
     import dnaio
@@ -59,12 +59,13 @@ pass the ``mode="w"`` argument to ``dnaio.open``::
         writer.write(dnaio.SequenceRecord("name", "ACGT", "#B!#"))
 
 Here, a `~dnaio.FastqWriter` object is returned by ``dnaio.open``,
-which has a ``~dnaio.FastqWriter.write()`` method that accepts a ``SequenceRecord``.
+which has a `~dnaio.FastqWriter.write()` method that accepts a ``SequenceRecord``.
 
 Instead of constructing a single record from scratch,
-it may be more realistic to take input reads,
-process them somehow and write them to a new output file.
-The following program truncates all reads in the input file to a length of 30 nt
+in practice it is more realistic to take input reads,
+process them, and write them to a new output file.
+The following example program shows how that can be done.
+It truncates all reads in the input file to a length of 30 nt
 and writes them to another file::
 
     import dnaio
@@ -76,7 +77,7 @@ and writes them to another file::
 
 This also shows that `~dnaio.SequenceRecord` objects support slicing:
 ``record[:30]`` returns a new ``SequenceRecord`` object with the sequence and qualities
-trimmed to the first 30 characters (leaving the name unchanged).
+trimmed to the first 30 characters, leaving the name unchanged.
 
 
 Paired-end data
@@ -100,7 +101,7 @@ In this example, ``dnaio.open`` returns a `~dnaio.TwoFilePairedEndReader`.
 It also supports iteration, but instead of a single ``SequenceRecord``,
 it returns a pair of them.
 
-To read from interleaved paired-end data, the only change needed is to
+To read from interleaved paired-end data,
 pass ``interleaved=True`` to ``dnaio.open`` instead of a second file name::
 
     ...
@@ -110,11 +111,11 @@ pass ``interleaved=True`` to ``dnaio.open`` instead of a second file name::
 The ``PairedEndReader`` classes check whether the input files are properly paired,
 that is, whether they have the same number of reads in both inputs and whether the
 read names match.
-For this reason, always use a single call to ``dnaio.open`` to open paired-end files.
-(Avoid opening them as two single-end files.)
+For this reason, always use a single call to ``dnaio.open`` to open paired-end files
+(that is, avoid opening them as two single-end files.)
 
 To demonstrate how to write paired-end data,
-we show a program that reads from a single-end FASTQ file and converts them to
+we show a program that reads from a single-end FASTQ file and converts the records to
 simulated paired-end reads by writing the first 30 nt to R1 and the last 30 nt
 to R2::
 


=====================================
pyproject.toml
=====================================
@@ -1,6 +1,9 @@
 [build-system]
-requires = ["setuptools >= 52", "wheel", "setuptools_scm >= 6.2", "Cython >= 0.29.20"]
+requires = ["setuptools >= 52", "setuptools_scm >= 6.2", "Cython >= 0.29.20"]
 build-backend = "setuptools.build_meta"
 
 [tool.setuptools_scm]
 write_to = "src/dnaio/_version.py"
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]


=====================================
setup.cfg
=====================================
@@ -5,7 +5,7 @@ author_email = marcel.martin at scilifelab.se
 url = https://dnaio.readthedocs.io/
 description = Read and write FASTA and FASTQ files efficiently
 long_description = file: README.rst
-long_description_content_type = text/markdown
+long_description_content_type = text/x-rst
 license = MIT
 project_urls =
     Changelog = https://dnaio.readthedocs.io/en/latest/changes.html


=====================================
src/dnaio/__init__.py
=====================================
@@ -41,6 +41,7 @@ from .readers import FastaReader, FastqReader
 from .writers import FastaWriter, FastqWriter
 from .singleend import _open_single
 from .pairedend import (
+    _open_paired,
     TwoFilePairedEndReader,
     TwoFilePairedEndWriter,
     InterleavedPairedEndReader,
@@ -77,7 +78,7 @@ def open(
     opener=xopen
 ) -> Union[SingleEndReader, PairedEndReader, SingleEndWriter, PairedEndWriter]:
     """
-    Open sequence files in FASTA or FASTQ format for reading or writing.
+    Open one or two files in FASTA or FASTQ format for reading or writing.
 
     Parameters:
 
@@ -104,14 +105,15 @@ def open(
       qualities:
         When mode is ``'w'`` and fileformat is *None*, this can be set
         to *True* or *False* to specify whether the written sequences will have
-        quality values. This is is used in two ways:
+        quality values. This is used in two ways:
 
         - If the output format cannot be determined (unrecognized extension
-          etc), no exception is raised, but fasta or fastq format is chosen
+          etc.), no exception is raised, but FASTA or FASTQ format is chosen
           appropriately.
 
         - When False (no qualities available), an exception is raised when the
           auto-detected output format is FASTQ.
+
       opener: A function that is used to open file1 and file2 if they are not
         already open file-like objects. By default, ``xopen`` is used, which can
         also open compressed file formats.
@@ -122,41 +124,19 @@ def open(
     """
     if mode not in ("r", "rb", "w", "a"):
         raise ValueError("Mode must be 'r', 'rb', 'w' or 'a'")
-    if interleaved and file2 is not None:
-        raise ValueError("When interleaved is set, file2 must be None")
-
-    if file2 is not None:
-        if mode in "wa" and file1 == file2:
-            raise ValueError("The paired-end output files are identical")
-        if "r" in mode:
-            return TwoFilePairedEndReader(
-                file1, file2, fileformat=fileformat, opener=opener, mode=mode
-            )
-        append = mode == "a"
-        return TwoFilePairedEndWriter(
+    elif interleaved and file2 is not None:
+        raise ValueError("When interleaved is True, file2 must be None")
+    elif interleaved or file2 is not None:
+        return _open_paired(
             file1,
-            file2,
-            fileformat=fileformat,
-            qualities=qualities,
+            file2=file2,
             opener=opener,
-            append=append,
-        )
-    if interleaved:
-        if "r" in mode:
-            return InterleavedPairedEndReader(
-                file1, fileformat=fileformat, opener=opener, mode=mode
-            )
-        append = mode == "a"
-        return InterleavedPairedEndWriter(
-            file1,
             fileformat=fileformat,
+            interleaved=interleaved,
+            mode=mode,
             qualities=qualities,
-            opener=opener,
-            append=append,
         )
-
-    # The multi-file options have been dealt with, delegate rest to the
-    # single-file function.
-    return _open_single(
-        file1, opener=opener, fileformat=fileformat, mode=mode, qualities=qualities
-    )
+    else:
+        return _open_single(
+            file1, opener=opener, fileformat=fileformat, mode=mode, qualities=qualities
+        )

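The branch order of the refactored ``dnaio.open`` shown above can be summarized in a small stand-alone sketch. It does not import dnaio: ``dispatch_open`` is a hypothetical name, and the returned strings only stand in for the calls to ``_open_paired`` and ``_open_single`` in the diff.

```python
from typing import Optional


def dispatch_open(
    mode: str = "r",
    file2: Optional[str] = None,
    interleaved: bool = False,
) -> str:
    """Hypothetical stand-in that mirrors the if/elif/else chain of the new open()."""
    if mode not in ("r", "rb", "w", "a"):
        raise ValueError("Mode must be 'r', 'rb', 'w' or 'a'")
    elif interleaved and file2 is not None:
        raise ValueError("When interleaved is True, file2 must be None")
    elif interleaved or file2 is not None:
        # The real function delegates to _open_paired(...) here.
        return "paired"
    else:
        # The real function delegates to _open_single(...) here.
        return "single"
```

Under these assumptions, a second file name or ``interleaved=True`` selects the paired-end path, and everything else falls through to the single-end path.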

=====================================
src/dnaio/_core.pyx
=====================================
@@ -239,10 +239,7 @@ cdef class SequenceRecord:
 
 
     def fastq_bytes_two_headers(self):
-        """
-        Return this record in FASTQ format as a bytes object where the header (after the @) is
-        repeated on the third line.
-        """
+        # Deprecated, use ``.fastq_bytes(two_headers=True)`` instead.
         return self.fastq_bytes(two_headers=True)
 
     def is_mate(self, SequenceRecord other):
@@ -619,11 +616,11 @@ cdef inline bint record_ids_match(char *header1,
                                   size_t header2_length,
                                   bint id1_ends_with_number):
     """
-    Check whether the ASCII-encoded IDs match. 
-    
+    Check whether the ASCII-encoded IDs match.
+
     header1, header2 pointers to the ASCII-encoded headers
     id1_length, the length of header1 before the first whitespace
-    header2_length, the full length of header2. 
+    header2_length, the full length of header2.
     id1_ends_with_number, whether id1 ends with a 1,2 or 3.
     """
 
@@ -644,19 +641,26 @@ cdef inline bint record_ids_match(char *header1,
     return memcmp(<void *>header1, <void *>header2, id1_length) == 0
 
 
-def records_are_mates(*args):
+def records_are_mates(*args) -> bool:
     """
-    Check if provided SequenceRecord objects are all mates of each other by
+    Check if the provided `SequenceRecord` objects are all mates of each other by
     comparing their record IDs.
-    Accepts two or more SequenceRecord objects.
-    
-    Example usage: 
-    for records in zip(*all_my_fastq_readers):
-        if not records_are_mates(*records):
-            raise MateError(f"Ids do not match for {records}")
-     
+    Accepts two or more `SequenceRecord` objects.
+
+    This is the same as `SequenceRecord.is_mate` in the case of only two records,
+    but allows for cases where information is split into three records or more
+    (such as UMI, R1, R2 or index, R1, R2).
+
+    If there are only two records to check, prefer `SequenceRecord.is_mate`.
+
+    Example usage::
+
+        for records in zip(*all_my_fastq_readers):
+            if not records_are_mates(*records):
+                raise MateError(f"IDs do not match for {records}")
+
     Args:
-        *args: two or more SequenceRecord objects
+        *args: two or more `~dnaio.SequenceRecord` objects
 
     Returns: True or False
     """


=====================================
src/dnaio/interfaces.py
=====================================
@@ -7,22 +7,36 @@ from dnaio import SequenceRecord
 class SingleEndReader(ABC):
     @abstractmethod
     def __iter__(self) -> Iterator[SequenceRecord]:
-        pass
+        """Yield the records in the input as `SequenceRecord` objects."""
 
 
 class PairedEndReader(ABC):
     @abstractmethod
     def __iter__(self) -> Iterator[Tuple[SequenceRecord, SequenceRecord]]:
-        pass
+        """
+        Yield the records in the paired-end input as pairs of `SequenceRecord` objects.
+
+        Raises a `FileFormatError` if reads are improperly paired, that is,
+        if there are more reads in one file than the other or if the record IDs
+        do not match (according to `SequenceRecord.is_mate`).
+        """
 
 
 class SingleEndWriter(ABC):
     @abstractmethod
     def write(self, record: SequenceRecord) -> None:
-        pass
+        """Write a `SequenceRecord` to the output."""
 
 
 class PairedEndWriter(ABC):
     @abstractmethod
     def write(self, record1: SequenceRecord, record2: SequenceRecord) -> None:
-        pass
+        """
+        Write a pair of `SequenceRecord` objects to the paired-end output.
+
+        This method does not verify that both records have matching IDs
+        because this was already done at parsing time. If it is possible
+        that the record IDs no longer match, check that
+        ``record1.is_mate(record2)`` returns True before calling
+        this function.
+        """


=====================================
src/dnaio/pairedend.py
=====================================
@@ -12,12 +12,56 @@ from .writers import FastaWriter, FastqWriter
 from .singleend import _open_single
 
 
+def _open_paired(
+    file1: Union[str, PathLike, BinaryIO],
+    *,
+    file2: Optional[Union[str, PathLike, BinaryIO]] = None,
+    fileformat: Optional[str] = None,
+    interleaved: bool = False,
+    mode: str = "r",
+    qualities: Optional[bool] = None,
+    opener=xopen,
+) -> Union[PairedEndReader, PairedEndWriter]:
+    """
+    Open paired-end reads
+    """
+    if interleaved and file2 is not None:
+        raise ValueError("When interleaved is True, file2 must be None")
+    if file2 is not None:
+        if mode in "wa" and file1 == file2:
+            raise ValueError("The paired-end output files are identical")
+        if "r" in mode:
+            return TwoFilePairedEndReader(
+                file1, file2, fileformat=fileformat, opener=opener, mode=mode
+            )
+        append = mode == "a"
+        return TwoFilePairedEndWriter(
+            file1,
+            file2,
+            fileformat=fileformat,
+            qualities=qualities,
+            opener=opener,
+            append=append,
+        )
+    if interleaved:
+        if "r" in mode:
+            return InterleavedPairedEndReader(
+                file1, fileformat=fileformat, opener=opener, mode=mode
+            )
+        append = mode == "a"
+        return InterleavedPairedEndWriter(
+            file1,
+            fileformat=fileformat,
+            qualities=qualities,
+            opener=opener,
+            append=append,
+        )
+    assert False
+
+
 class TwoFilePairedEndReader(PairedEndReader):
     """
     Read paired-end reads from two files.
-
-    Wraps two BinaryFileReader instances, making sure that reads are properly
-    paired.
     """
 
     paired = True


=====================================
src/dnaio/readers.py
=====================================
@@ -4,6 +4,7 @@ Classes for reading FASTA and FASTQ files
 __all__ = ["FastaReader", "FastqReader"]
 
 import io
+from os import PathLike
 from typing import Union, BinaryIO, Optional, Iterator, List
 
 from xopen import xopen
@@ -25,7 +26,7 @@ class BinaryFileReader:
 
     def __init__(
         self,
-        file: Union[str, BinaryIO],
+        file: Union[PathLike, str, BinaryIO],
         *,
         opener=xopen,
         _close_file: Optional[bool] = None,
@@ -67,7 +68,7 @@ class FastaReader(BinaryFileReader, SingleEndReader):
 
     def __init__(
         self,
-        file: Union[str, BinaryIO],
+        file: Union[PathLike, str, BinaryIO],
         *,
         keep_linebreaks: bool = False,
         sequence_class=SequenceRecord,
@@ -89,7 +90,7 @@ class FastaReader(BinaryFileReader, SingleEndReader):
 
     def __iter__(self) -> Iterator[SequenceRecord]:
         """
-        Read next entry from the file (single entry at a time).
+        Iterate over the records in this FASTA file.
         """
         name = None
         seq: List[str] = []
@@ -128,7 +129,7 @@ class FastqReader(BinaryFileReader, SingleEndReader):
 
     def __init__(
         self,
-        file: Union[str, BinaryIO],
+        file: Union[PathLike, str, BinaryIO],
         *,
         sequence_class=SequenceRecord,
         buffer_size: int = 128 * 1024,  # Buffer size used by cat, pigz etc.
@@ -165,6 +166,7 @@ class FastqReader(BinaryFileReader, SingleEndReader):
             raise
 
     def __iter__(self) -> Iterator[SequenceRecord]:
+        """Iterate over the records in this FASTQ file."""
         return self._iter
 
     @property


=====================================
src/dnaio/singleend.py
=====================================
@@ -16,7 +16,7 @@ def _open_single(
     qualities: Optional[bool] = None,
 ) -> Union[FastaReader, FastaWriter, FastqReader, FastqWriter]:
     """
-    Open a single sequence file. See description of open() above.
+    Open a single sequence file.
     """
     if mode not in ("r", "w", "a"):
         raise ValueError("Mode must be 'r', 'w' or 'a'")


=====================================
src/dnaio/writers.py
=====================================
@@ -140,9 +140,10 @@ class FastqWriter(FileWriter, SingleEndWriter):
 
     def write(self, record: SequenceRecord) -> None:
         """
-        Dummy method to make it possible to instantiate this class.
-        The correct write method is assigned in the constructor.
+        Write a record to the FASTQ file.
         """
+        # The 'write' attribute is overwritten in the constructor with the correct
+        # write method (_write or _write_two_headers)
         assert False
 
     def _write(self, record: SequenceRecord) -> None:
@@ -159,4 +160,5 @@ class FastqWriter(FileWriter, SingleEndWriter):
         self._file.write(record.fastq_bytes(two_headers=True))
 
     def writeseq(self, name: str, sequence: str, qualities: str) -> None:
+        # Deprecated
         self._file.write(f"@{name:s}\n{sequence:s}\n+\n{qualities:s}\n".encode("ascii"))

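The new comment in ``FastqWriter.write`` says that the ``write`` attribute is overwritten in the constructor. A minimal sketch of that pattern follows. ``TwoModeWriter`` and its method signatures are hypothetical and simplified; this is not the dnaio implementation.

```python
import io


class TwoModeWriter:
    """Hypothetical illustration of a constructor-assigned write method."""

    def __init__(self, file, two_headers: bool = False):
        self._file = file
        # Bind the concrete implementation once per instance. The instance
        # attribute shadows the placeholder method defined on the class.
        self.write = self._write_two_headers if two_headers else self._write

    def write(self, name: str, sequence: str, qualities: str) -> None:
        """Placeholder so that the class has a documented write method."""
        raise AssertionError("replaced in the constructor")

    def _write(self, name: str, sequence: str, qualities: str) -> None:
        self._file.write(f"@{name}\n{sequence}\n+\n{qualities}\n".encode("ascii"))

    def _write_two_headers(self, name: str, sequence: str, qualities: str) -> None:
        # Repeat the header after the "+" on the third line.
        self._file.write(
            f"@{name}\n{sequence}\n+{name}\n{qualities}\n".encode("ascii")
        )
```

Calling ``TwoModeWriter(io.BytesIO()).write(...)`` never reaches the placeholder, because attribute lookup finds the bound method stored on the instance first.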


View it on GitLab: https://salsa.debian.org/med-team/python-dnaio/-/commit/7a5dc96e45f1471a6a42950ddad9554b53fa6c33
