[med-svn] [Git][med-team/python-dnaio][upstream] New upstream version 0.9.1
Andreas Tille (@tille)
gitlab at salsa.debian.org
Thu Aug 25 17:15:03 BST 2022
Andreas Tille pushed to branch upstream at Debian Med / python-dnaio
Commits:
7a5dc96e by Andreas Tille at 2022-08-25T18:13:18+02:00
New upstream version 0.9.1
- - - - -
15 changed files:
- .github/workflows/ci.yml
- CHANGES.rst
- README.rst
- doc/api.rst
- doc/conf.py
- doc/tutorial.rst
- pyproject.toml
- setup.cfg
- src/dnaio/__init__.py
- src/dnaio/_core.pyx
- src/dnaio/interfaces.py
- src/dnaio/pairedend.py
- src/dnaio/readers.py
- src/dnaio/singleend.py
- src/dnaio/writers.py
Changes:
=====================================
.github/workflows/ci.yml
=====================================
@@ -65,7 +65,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
- os: [ubuntu-20.04, windows-2019]
+ os: [ubuntu-20.04, windows-2019, macos-10.15]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -74,10 +74,11 @@ jobs:
- name: Build wheels
uses: pypa/cibuildwheel@v2.3.1
env:
- CIBW_BUILD: "cp*-manylinux_x86_64 cp3*-win_amd64"
+ CIBW_BUILD: "cp*-manylinux_x86_64 cp3*-win_amd64 cp3*-macosx_x86_64"
CIBW_ENVIRONMENT: "CFLAGS=-g0"
CIBW_TEST_REQUIRES: "pytest"
CIBW_TEST_COMMAND_LINUX: "cd {project} && pytest tests"
+ CIBW_TEST_COMMAND_MACOS: "cd {project} && pytest tests"
CIBW_TEST_COMMAND_WINDOWS: "cd /d {project} && pytest tests"
- uses: actions/upload-artifact@v2
with:
=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,13 @@
Changelog
=========
+v0.9.1 (2022-08-01)
+-------------------
+
+* :pr:`85`: macOS wheels are now also built as part of the release procedure.
+* :pr:`81`: API documentation improvements and minor code refactors for
+ readability.
+
v0.9.0 (2022-05-17)
-------------------
=====================================
README.rst
=====================================
@@ -30,7 +30,8 @@ The main interface is the `dnaio.open <https://dnaio.readthedocs.io/en/latest/ap
bp += len(record)
print(f"The input file contains {bp/1E6:.1f} Mbp")
-See the `documentation <https://dnaio.readthedocs.io/>`_ for more.
+For more, see the `tutorial <https://dnaio.readthedocs.io/en/latest/tutorial.html>`_ and
+`API documentation <https://dnaio.readthedocs.io/en/latest/api.html>`_.
Features and supported file types
=================================
@@ -43,14 +44,12 @@ Features and supported file types
- Files with DOS/Windows linebreaks can be read
- FASTQ files with a second header line (after the ``+``) are supported
-
Limitations
===========
- Multi-line FASTQ files are not supported.
- FASTQ parsing is the focus of this library. The FASTA parser is not as optimized.
-
Links
=====
=====================================
doc/api.rst
=====================================
@@ -4,6 +4,7 @@ The dnaio API
.. module:: dnaio
+
The open function
-----------------
@@ -20,40 +21,52 @@ The ``SequenceRecord`` class
.. automethod:: __init__(name: str, sequence: str, qualities: Optional[str] = None)
+Reader and writer interfaces
+----------------------------
+
+.. autoclass:: SingleEndReader
+ :members: __iter__
+
+.. autoclass:: PairedEndReader
+ :members: __iter__
+
+.. autoclass:: SingleEndWriter
+ :members: write
+
+.. autoclass:: PairedEndWriter
+ :members: write
+
+
Reader and writer classes
-------------------------
+The `dnaio.open` function returns an instance of one of the following classes.
+They can also be used directly if needed.
+
+
.. autoclass:: FastaReader
+ :show-inheritance:
.. autoclass:: FastaWriter
- :members: write
+ :show-inheritance:
.. autoclass:: FastqReader
+ :show-inheritance:
.. autoclass:: FastqWriter
- :members: writeseq
-
- .. py:method:: write(record: SequenceRecord) -> None:
-
- Write a SequenceRecord to the FASTQ file.
+ :show-inheritance:
.. autoclass:: TwoFilePairedEndReader
+ :show-inheritance:
.. autoclass:: TwoFilePairedEndWriter
+ :show-inheritance:
.. autoclass:: InterleavedPairedEndReader
+ :show-inheritance:
.. autoclass:: InterleavedPairedEndWriter
- :members: write
-
-.. autoclass:: SingleEndReader
-
-.. autoclass:: PairedEndReader
-
-.. autoclass:: SingleEndWriter
-
-.. autoclass:: PairedEndWriter
- :members: write
+ :show-inheritance:
Chunked reading of sequence records
@@ -68,6 +81,12 @@ processed there.
.. autofunction:: read_paired_chunks
+Functions
+---------
+
+.. autofunction:: records_are_mates
+
+
Exceptions
----------
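Note on `records_are_mates`, which this hunk adds to the API documentation: the idea of comparing record IDs across two or more records can be sketched in plain Python. This is a simplified illustration, not the Cython implementation. The `Record` class, `record_id` and `all_mates` are hypothetical names, and the ID rule used here (header up to the first whitespace, with one trailing 1/2/3 mate digit dropped) is an assumption that only approximates the real comparison.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical stand-in for dnaio.SequenceRecord."""
    name: str
    sequence: str
    qualities: Optional[str] = None


def record_id(name: str) -> str:
    """Header up to the first whitespace, minus one trailing mate digit (assumed rule)."""
    parts = name.split(maxsplit=1)
    rid = parts[0] if parts else ""
    if rid and rid[-1] in "123":
        rid = rid[:-1]
    return rid


def all_mates(*records: Record) -> bool:
    """Return True if every record has the same ID as the first one."""
    if len(records) < 2:
        raise TypeError("two or more records are required")
    first = record_id(records[0].name)
    return all(record_id(r.name) == first for r in records[1:])


# Three records for one fragment, e.g. UMI, R1 and R2.
umi = Record("inst:42:x 3:N:0", "ACGTAC")
r1 = Record("inst:42:x 1:N:0", "ACGT", "IIII")
r2 = Record("inst:42:x 2:N:0", "TTGA", "IIII")
print(all_mates(umi, r1, r2))
```

With exactly two records, the new docstring recommends `SequenceRecord.is_mate` instead.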
=====================================
doc/conf.py
=====================================
@@ -36,3 +36,6 @@ default_role = "obj" # (or "any")
issues_uri = "https://github.com/marcelm/dnaio/issues/{issue}"
issues_pr_uri = "https://github.com/marcelm/dnaio/pull/{pr}"
+
+autodoc_typehints = "description"
+python_use_unqualified_type_names = True
=====================================
doc/tutorial.rst
=====================================
@@ -3,7 +3,7 @@ Tutorial
This should get you started with using ``dnaio``.
The only essential concepts to know about are
-the `dnaio.open` function and the `SequenceRecord` object.
+the `dnaio.open` function and the `~dnaio.SequenceRecord` object.
Reading
@@ -35,7 +35,7 @@ A ``SequenceRecord`` has the attributes ``name``, ``sequence``
and ``qualities``. All of these are ``str`` objects.
The ``qualities`` attribute is ``None`` when reading FASTA files.
-This program uses the ``name`` attribute
+The following program uses the ``name`` attribute
to check whether any sequence names are duplicated in a FASTA file::
import dnaio
@@ -59,12 +59,13 @@ pass the ``mode="w"`` argument to ``dnaio.open``::
writer.write(dnaio.SequenceRecord("name", "ACGT", "#B!#"))
Here, a `~dnaio.FastqWriter` object is returned by ``dnaio.open``,
-which has a ``~dnaio.FastqWriter.write()`` method that accepts a ``SequenceRecord``.
+which has a `~dnaio.FastqWriter.write()` method that accepts a ``SequenceRecord``.
Instead of constructing a single record from scratch,
-it may be more realistic to take input reads,
-process them somehow and write them to a new output file.
-The following program truncates all reads in the input file to a length of 30 nt
+in practice it is more realistic to take input reads,
+process them, and write them to a new output file.
+The following example program shows how that can be done.
+It truncates all reads in the input file to a length of 30 nt
and writes them to another file::
import dnaio
@@ -76,7 +77,7 @@ and writes them to another file::
This also shows that `~dnaio.SequenceRecord` objects support slicing:
``record[:30]`` returns a new ``SequenceRecord`` object with the sequence and qualities
-trimmed to the first 30 characters (leaving the name unchanged).
+trimmed to the first 30 characters, leaving the name unchanged.
Paired-end data
@@ -100,7 +101,7 @@ In this example, ``dnaio.open`` returns a `~dnaio.TwoFilePairedEndReader`.
It also supports iteration, but instead of a single ``SequenceRecord``,
it returns a pair of them.
-To read from interleaved paired-end data, the only change needed is to
+To read from interleaved paired-end data,
pass ``interleaved=True`` to ``dnaio.open`` instead of a second file name::
...
@@ -110,11 +111,11 @@ pass ``interleaved=True`` to ``dnaio.open`` instead of a second file name::
The ``PairedEndReader`` classes check whether the input files are properly paired,
that is, whether they have the same number of reads in both inputs and whether the
read names match.
-For this reason, always use a single call to ``dnaio.open`` to open paired-end files.
-(Avoid opening them as two single-end files.)
+For this reason, always use a single call to ``dnaio.open`` to open paired-end files
+(that is, avoid opening them as two single-end files.)
To demonstrate how to write paired-end data,
-we show a program that reads from a single-end FASTQ file and converts them to
+we show a program that reads from a single-end FASTQ file and converts the records to
simulated paired-end reads by writing the first 30 nt to R1 and the last 30 nt
to R2::
=====================================
pyproject.toml
=====================================
@@ -1,6 +1,9 @@
[build-system]
-requires = ["setuptools >= 52", "wheel", "setuptools_scm >= 6.2", "Cython >= 0.29.20"]
+requires = ["setuptools >= 52", "setuptools_scm >= 6.2", "Cython >= 0.29.20"]
build-backend = "setuptools.build_meta"
[tool.setuptools_scm]
write_to = "src/dnaio/_version.py"
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
=====================================
setup.cfg
=====================================
@@ -5,7 +5,7 @@ author_email = marcel.martin at scilifelab.se
url = https://dnaio.readthedocs.io/
description = Read and write FASTA and FASTQ files efficiently
long_description = file: README.rst
-long_description_content_type = text/markdown
+long_description_content_type = text/x-rst
license = MIT
project_urls =
Changelog = https://dnaio.readthedocs.io/en/latest/changes.html
=====================================
src/dnaio/__init__.py
=====================================
@@ -41,6 +41,7 @@ from .readers import FastaReader, FastqReader
from .writers import FastaWriter, FastqWriter
from .singleend import _open_single
from .pairedend import (
+ _open_paired,
TwoFilePairedEndReader,
TwoFilePairedEndWriter,
InterleavedPairedEndReader,
@@ -77,7 +78,7 @@ def open(
opener=xopen
) -> Union[SingleEndReader, PairedEndReader, SingleEndWriter, PairedEndWriter]:
"""
- Open sequence files in FASTA or FASTQ format for reading or writing.
+ Open one or two files in FASTA or FASTQ format for reading or writing.
Parameters:
@@ -104,14 +105,15 @@ def open(
qualities:
When mode is ``'w'`` and fileformat is *None*, this can be set
to *True* or *False* to specify whether the written sequences will have
- quality values. This is is used in two ways:
+ quality values. This is used in two ways:
- If the output format cannot be determined (unrecognized extension
- etc), no exception is raised, but fasta or fastq format is chosen
+ etc.), no exception is raised, but FASTA or FASTQ format is chosen
appropriately.
- When False (no qualities available), an exception is raised when the
auto-detected output format is FASTQ.
+
opener: A function that is used to open file1 and file2 if they are not
already open file-like objects. By default, ``xopen`` is used, which can
also open compressed file formats.
@@ -122,41 +124,19 @@ def open(
"""
if mode not in ("r", "rb", "w", "a"):
raise ValueError("Mode must be 'r', 'rb', 'w' or 'a'")
- if interleaved and file2 is not None:
- raise ValueError("When interleaved is set, file2 must be None")
-
- if file2 is not None:
- if mode in "wa" and file1 == file2:
- raise ValueError("The paired-end output files are identical")
- if "r" in mode:
- return TwoFilePairedEndReader(
- file1, file2, fileformat=fileformat, opener=opener, mode=mode
- )
- append = mode == "a"
- return TwoFilePairedEndWriter(
+ elif interleaved and file2 is not None:
+ raise ValueError("When interleaved is True, file2 must be None")
+ elif interleaved or file2 is not None:
+ return _open_paired(
file1,
- file2,
- fileformat=fileformat,
- qualities=qualities,
+ file2=file2,
opener=opener,
- append=append,
- )
- if interleaved:
- if "r" in mode:
- return InterleavedPairedEndReader(
- file1, fileformat=fileformat, opener=opener, mode=mode
- )
- append = mode == "a"
- return InterleavedPairedEndWriter(
- file1,
fileformat=fileformat,
+ interleaved=interleaved,
+ mode=mode,
qualities=qualities,
- opener=opener,
- append=append,
)
-
- # The multi-file options have been dealt with, delegate rest to the
- # single-file function.
- return _open_single(
- file1, opener=opener, fileformat=fileformat, mode=mode, qualities=qualities
- )
+ else:
+ return _open_single(
+ file1, opener=opener, fileformat=fileformat, mode=mode, qualities=qualities
+ )
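Note on the hunk above: after the refactor, `open()` only validates its arguments and then delegates to one of two helpers. The following minimal sketch mirrors that branch structure. The function `dispatch` is hypothetical and returns only the name of the helper that would be called, so it runs without dnaio installed.

```python
from typing import Optional


def dispatch(mode: str = "r", file2: Optional[str] = None,
             interleaved: bool = False) -> str:
    """Return the name of the helper the refactored open() delegates to."""
    if mode not in ("r", "rb", "w", "a"):
        raise ValueError("Mode must be 'r', 'rb', 'w' or 'a'")
    elif interleaved and file2 is not None:
        raise ValueError("When interleaved is True, file2 must be None")
    elif interleaved or file2 is not None:
        # Both paired-end layouts (two files, interleaved) share one helper.
        return "_open_paired"
    else:
        return "_open_single"


print(dispatch())                      # single file
print(dispatch(file2="r2.fastq"))      # two-file paired-end
print(dispatch(interleaved=True))      # interleaved paired-end
```

The reader and writer class selection now lives in `_open_paired` (see src/dnaio/pairedend.py below), which repeats the interleaved/file2 check so that it is also safe when called directly.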
=====================================
src/dnaio/_core.pyx
=====================================
@@ -239,10 +239,7 @@ cdef class SequenceRecord:
def fastq_bytes_two_headers(self):
- """
- Return this record in FASTQ format as a bytes object where the header (after the @) is
- repeated on the third line.
- """
+ # Deprecated, use ``.fastq_bytes(two_headers=True)`` instead.
return self.fastq_bytes(two_headers=True)
def is_mate(self, SequenceRecord other):
@@ -619,11 +616,11 @@ cdef inline bint record_ids_match(char *header1,
size_t header2_length,
bint id1_ends_with_number):
"""
- Check whether the ASCII-encoded IDs match.
-
+ Check whether the ASCII-encoded IDs match.
+
header1, header2 pointers to the ASCII-encoded headers
id1_length, the length of header1 before the first whitespace
- header2_length, the full length of header2.
+ header2_length, the full length of header2.
id1_ends_with_number, whether id1 ends with a 1,2 or 3.
"""
@@ -644,19 +641,26 @@ cdef inline bint record_ids_match(char *header1,
return memcmp(<void *>header1, <void *>header2, id1_length) == 0
-def records_are_mates(*args):
+def records_are_mates(*args) -> bool:
"""
- Check if provided SequenceRecord objects are all mates of each other by
+ Check if the provided `SequenceRecord` objects are all mates of each other by
comparing their record IDs.
- Accepts two or more SequenceRecord objects.
-
- Example usage:
- for records in zip(*all_my_fastq_readers):
- if not records_are_mates(*records):
- raise MateError(f"Ids do not match for {records}")
-
+ Accepts two or more `SequenceRecord` objects.
+
+ This is the same as `SequenceRecord.is_mate` in the case of only two records,
+ but allows for cases where information is split into three records or more
+ (such as UMI, R1, R2 or index, R1, R2).
+
+ If there are only two records to check, prefer `SequenceRecord.is_mate`.
+
+ Example usage::
+
+ for records in zip(*all_my_fastq_readers):
+ if not records_are_mates(*records):
+ raise MateError(f"IDs do not match for {records}")
+
Args:
- *args: two or more SequenceRecord objects
+ *args: two or more `~dnaio.SequenceRecord` objects
Returns: True or False
"""
=====================================
src/dnaio/interfaces.py
=====================================
@@ -7,22 +7,36 @@ from dnaio import SequenceRecord
class SingleEndReader(ABC):
@abstractmethod
def __iter__(self) -> Iterator[SequenceRecord]:
- pass
+ """Yield the records in the input as `SequenceRecord` objects."""
class PairedEndReader(ABC):
@abstractmethod
def __iter__(self) -> Iterator[Tuple[SequenceRecord, SequenceRecord]]:
- pass
+ """
+ Yield the records in the paired-end input as pairs of `SequenceRecord` objects.
+
+ Raises a `FileFormatError` if reads are improperly paired, that is,
+ if there are more reads in one file than the other or if the record IDs
+ do not match (according to `SequenceRecord.is_mate`).
+ """
class SingleEndWriter(ABC):
@abstractmethod
def write(self, record: SequenceRecord) -> None:
- pass
+ """Write a `SequenceRecord` to the output."""
class PairedEndWriter(ABC):
@abstractmethod
def write(self, record1: SequenceRecord, record2: SequenceRecord) -> None:
- pass
+ """
+ Write a pair of `SequenceRecord` objects to the paired-end output.
+
+ This method does not verify that both records have matching IDs
+ because this was already done at parsing time. If it is possible
+ that the record IDs no longer match, check that
+ ``record1.is_mate(record2)`` returns True before calling
+ this function.
+ """
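Note on the `PairedEndReader.__iter__` docstring above: the two failure conditions it names (unequal read counts, mismatching record IDs) can be illustrated with a small generator over read names. This is a sketch under stated assumptions, not dnaio's reader code. `PairingError` is a hypothetical stand-in for `FileFormatError`, and the ID comparison is reduced to the first whitespace-delimited token of each name.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple


class PairingError(ValueError):
    """Hypothetical stand-in for dnaio's FileFormatError."""


def paired(names1: Iterable[str],
           names2: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield name pairs, failing if the inputs are not properly paired."""
    missing = object()  # sentinel marking the shorter input
    for n1, n2 in zip_longest(names1, names2, fillvalue=missing):
        if n1 is missing or n2 is missing:
            raise PairingError("one input has more reads than the other")
        if n1.split()[:1] != n2.split()[:1]:
            raise PairingError(f"read names do not match: {n1!r} vs {n2!r}")
        yield n1, n2


for pair in paired(["a 1:N:0", "b 1:N:0"], ["a 2:N:0", "b 2:N:0"]):
    print(pair)
```

Because the check happens while reading, a writer can skip it, which is the reasoning given in the `PairedEndWriter.write` docstring.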
=====================================
src/dnaio/pairedend.py
=====================================
@@ -12,12 +12,56 @@ from .writers import FastaWriter, FastqWriter
from .singleend import _open_single
+def _open_paired(
+ file1: Union[str, PathLike, BinaryIO],
+ *,
+ file2: Optional[Union[str, PathLike, BinaryIO]] = None,
+ fileformat: Optional[str] = None,
+ interleaved: bool = False,
+ mode: str = "r",
+ qualities: Optional[bool] = None,
+ opener=xopen,
+) -> Union[PairedEndReader, PairedEndWriter]:
+ """
+ Open paired-end reads
+ """
+ if interleaved and file2 is not None:
+ raise ValueError("When interleaved is True, file2 must be None")
+ if file2 is not None:
+ if mode in "wa" and file1 == file2:
+ raise ValueError("The paired-end output files are identical")
+ if "r" in mode:
+ return TwoFilePairedEndReader(
+ file1, file2, fileformat=fileformat, opener=opener, mode=mode
+ )
+ append = mode == "a"
+ return TwoFilePairedEndWriter(
+ file1,
+ file2,
+ fileformat=fileformat,
+ qualities=qualities,
+ opener=opener,
+ append=append,
+ )
+ if interleaved:
+ if "r" in mode:
+ return InterleavedPairedEndReader(
+ file1, fileformat=fileformat, opener=opener, mode=mode
+ )
+ append = mode == "a"
+ return InterleavedPairedEndWriter(
+ file1,
+ fileformat=fileformat,
+ qualities=qualities,
+ opener=opener,
+ append=append,
+ )
+ assert False
+
+
class TwoFilePairedEndReader(PairedEndReader):
"""
Read paired-end reads from two files.
-
- Wraps two BinaryFileReader instances, making sure that reads are properly
- paired.
"""
paired = True
=====================================
src/dnaio/readers.py
=====================================
@@ -4,6 +4,7 @@ Classes for reading FASTA and FASTQ files
__all__ = ["FastaReader", "FastqReader"]
import io
+from os import PathLike
from typing import Union, BinaryIO, Optional, Iterator, List
from xopen import xopen
@@ -25,7 +26,7 @@ class BinaryFileReader:
def __init__(
self,
- file: Union[str, BinaryIO],
+ file: Union[PathLike, str, BinaryIO],
*,
opener=xopen,
_close_file: Optional[bool] = None,
@@ -67,7 +68,7 @@ class FastaReader(BinaryFileReader, SingleEndReader):
def __init__(
self,
- file: Union[str, BinaryIO],
+ file: Union[PathLike, str, BinaryIO],
*,
keep_linebreaks: bool = False,
sequence_class=SequenceRecord,
@@ -89,7 +90,7 @@ class FastaReader(BinaryFileReader, SingleEndReader):
def __iter__(self) -> Iterator[SequenceRecord]:
"""
- Read next entry from the file (single entry at a time).
+ Iterate over the records in this FASTA file.
"""
name = None
seq: List[str] = []
@@ -128,7 +129,7 @@ class FastqReader(BinaryFileReader, SingleEndReader):
def __init__(
self,
- file: Union[str, BinaryIO],
+ file: Union[PathLike, str, BinaryIO],
*,
sequence_class=SequenceRecord,
buffer_size: int = 128 * 1024, # Buffer size used by cat, pigz etc.
@@ -165,6 +166,7 @@ class FastqReader(BinaryFileReader, SingleEndReader):
raise
def __iter__(self) -> Iterator[SequenceRecord]:
+ """Iterate over the records in this FASTQ file."""
return self._iter
@property
=====================================
src/dnaio/singleend.py
=====================================
@@ -16,7 +16,7 @@ def _open_single(
qualities: Optional[bool] = None,
) -> Union[FastaReader, FastaWriter, FastqReader, FastqWriter]:
"""
- Open a single sequence file. See description of open() above.
+ Open a single sequence file.
"""
if mode not in ("r", "w", "a"):
raise ValueError("Mode must be 'r', 'w' or 'a'")
=====================================
src/dnaio/writers.py
=====================================
@@ -140,9 +140,10 @@ class FastqWriter(FileWriter, SingleEndWriter):
def write(self, record: SequenceRecord) -> None:
"""
- Dummy method to make it possible to instantiate this class.
- The correct write method is assigned in the constructor.
+ Write a record to the FASTQ file.
"""
+ # The 'write' attribute is overwritten in the constructor with the correct
+ # write method (_write or _write_two_headers)
assert False
def _write(self, record: SequenceRecord) -> None:
@@ -159,4 +160,5 @@ class FastqWriter(FileWriter, SingleEndWriter):
self._file.write(record.fastq_bytes(two_headers=True))
def writeseq(self, name: str, sequence: str, qualities: str) -> None:
+ # Deprecated
self._file.write(f"@{name:s}\n{sequence:s}\n+\n{qualities:s}\n".encode("ascii"))
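Note on the `FastqWriter.write` hunk above: the new comment says the `write` attribute is overwritten in the constructor with the concrete method. A minimal sketch of that pattern follows. `TinyFastqWriter` is a hypothetical, simplified class that takes plain strings instead of a `SequenceRecord`; it is not the dnaio implementation.

```python
import io


class TinyFastqWriter:
    """Hypothetical, simplified writer illustrating the rebinding pattern."""

    def __init__(self, file, two_headers: bool = False):
        self._file = file
        # Choose the implementation once; the instance attribute shadows
        # the placeholder method defined on the class.
        self.write = self._write_two_headers if two_headers else self._write

    def write(self, name: str, sequence: str, qualities: str) -> None:
        """Write a record to the FASTQ file."""
        # Placeholder that carries the docstring; never reached on instances.
        raise AssertionError("replaced in __init__")

    def _write(self, name: str, sequence: str, qualities: str) -> None:
        self._file.write(f"@{name}\n{sequence}\n+\n{qualities}\n".encode("ascii"))

    def _write_two_headers(self, name: str, sequence: str, qualities: str) -> None:
        self._file.write(
            f"@{name}\n{sequence}\n+{name}\n{qualities}\n".encode("ascii")
        )


buf = io.BytesIO()
TinyFastqWriter(buf, two_headers=True).write("name", "ACGT", "#B!#")
print(buf.getvalue().decode("ascii"), end="")
```

Keeping a documented class-level `write` lets Sphinx autodoc and the abstract `SingleEndWriter` interface see the method, while each call avoids a per-record branch on `two_headers`.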
View it on GitLab: https://salsa.debian.org/med-team/python-dnaio/-/commit/7a5dc96e45f1471a6a42950ddad9554b53fa6c33