[med-svn] [Git][med-team/python-dnaio][upstream] New upstream version 1.2.0
Lance Lin (@linqigang)
gitlab at salsa.debian.org
Fri Feb 16 13:12:28 GMT 2024
Lance Lin pushed to branch upstream at Debian Med / python-dnaio
Commits:
23bbce11 by Lance Lin at 2024-02-16T19:37:06+07:00
New upstream version 1.2.0
- - - - -
25 changed files:
- .gitattributes
- .github/workflows/ci.yml
- CHANGES.rst
- LICENSE
- README.rst
- doc/api.rst
- pyproject.toml
- src/dnaio/__init__.py
- src/dnaio/_core.pyi
- src/dnaio/_core.pyx
- + src/dnaio/bam.h
- src/dnaio/chunks.py
- src/dnaio/readers.py
- src/dnaio/singleend.py
- + tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam
- + tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.header.sam
- + tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.sam
- + tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.unmapped.bam
- + tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.unmapped.sam
- + tests/data/simple.unaligned.bam
- + tests/data/simplebamnoextension
- tests/test_chunks.py
- tests/test_internal.py
- tests/test_open.py
- tox.ini
Changes:
=====================================
.gitattributes
=====================================
@@ -1,2 +1,3 @@
*.fastq -crlf
*.fasta -crlf
+*.sam -crlf
=====================================
.github/workflows/ci.yml
=====================================
@@ -51,11 +51,15 @@ jobs:
matrix:
os: [ubuntu-latest]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+ compile_flags: [""]
include:
- os: macos-latest
python-version: "3.10"
- os: windows-latest
python-version: "3.10"
+ - os: ubuntu-latest
+ python-version: "3.10"
+ compile_flags: "-mssse3"
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
@@ -66,6 +70,8 @@ jobs:
run: python -m pip install tox
- name: Test
run: tox -e py
+ env:
+ CFLAGS: ${{ matrix.compile_flags }}
- name: Upload coverage report
uses: codecov/codecov-action@v3
=====================================
CHANGES.rst
=====================================
@@ -2,6 +2,19 @@
Changelog
=========
+v1.2.0 (2023-12-11)
+-------------------
+
+* :pr:`124`: Added support for chunking FASTA reads to ``read_paired_chunks``.
+ Previously, only FASTQ was supported.
+
+v1.1.0 (2023-11-20)
+-------------------
+
+* :pr:`116`: Added experimental support for reading unaligned BAM files
+ (single-end only at the moment). This uses a custom, highly efficient
+ BAM parser, making dnaio faster than htslib in this particular case.
+
v1.0.1 (2023-10-06)
-------------------
=====================================
LICENSE
=====================================
@@ -1,4 +1,4 @@
-Copyright (c) 2010-2018 Marcel Martin <marcel.martin at scilifelab.se>
+Copyright (c) 2010 Marcel Martin <marcel.martin at scilifelab.se>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
=====================================
README.rst
=====================================
@@ -13,7 +13,7 @@
dnaio processes FASTQ and FASTA files
=====================================
-``dnaio`` is a Python 3.7+ library for very efficient parsing and writing of FASTQ and also FASTA files.
+``dnaio`` is a Python 3.8+ library for very efficient parsing and writing of FASTQ and also FASTA files.
The code was previously part of the
`Cutadapt <https://cutadapt.readthedocs.io/>`_ tool and has been improved significantly since it has been split out.
@@ -33,12 +33,22 @@ The main interface is the `dnaio.open <https://dnaio.readthedocs.io/en/latest/ap
For more, see the `tutorial <https://dnaio.readthedocs.io/en/latest/tutorial.html>`_ and
`API documentation <https://dnaio.readthedocs.io/en/latest/api.html>`_.
+Installation
+============
+
+Using pip::
+
+ pip install dnaio zstandard
+
+``zstandard`` can be omitted if support for Zstandard (``.zst``) files is not required.
+
Features and supported file types
=================================
- FASTQ input and output
- FASTA input and output
-- Compressed input and output (``.gz``, ``.bz2`` and ``.xz``, detected automatically)
+- BAM input
+- Compressed input and output (``.gz``, ``.bz2``, ``.xz`` and ``.zst`` are detected automatically)
- Paired-end data in two files
- Interleaved paired-end data in a single file
- Files with DOS/Windows linebreaks can be read
@@ -47,8 +57,8 @@ Features and supported file types
Limitations
===========
-- Multi-line FASTQ files are not supported.
-- FASTQ parsing is the focus of this library. The FASTA parser is not as optimized.
+- Multi-line FASTQ files are not supported
+- FASTQ and BAM parsing is the focus of this library. The FASTA parser is not as optimized
Links
=====
=====================================
doc/api.rst
=====================================
@@ -59,6 +59,9 @@ They can also be used directly if needed.
.. autoclass:: FastqWriter
:show-inheritance:
+.. autoclass:: BamReader
+ :show-inheritance:
+
.. autoclass:: TwoFilePairedEndReader
:show-inheritance:
=====================================
pyproject.toml
=====================================
@@ -45,8 +45,16 @@ write_to = "src/dnaio/_version.py"
[tool.pytest.ini_options]
testpaths = ["tests"]
+[tool.cibuildwheel.windows.environment]
+CFLAGS = "-g0 -DNDEBUG"
+
+[tool.cibuildwheel.macos.environment]
+CFLAGS = "-g0 -DNDEBUG"
+
+[tool.cibuildwheel.linux.environment]
+CFLAGS = "-g0 -DNDEBUG -mssse3"
+
[tool.cibuildwheel]
-environment = "CFLAGS=-g0"
test-requires = "pytest"
test-command = ["cd {project}", "pytest tests"]
=====================================
src/dnaio/__init__.py
=====================================
@@ -4,6 +4,7 @@ Sequence I/O: Read and write FASTA and FASTQ files efficiently
__all__ = [
"open",
+ "BamReader",
"SequenceRecord",
"SingleEndReader",
"PairedEndReader",
@@ -41,7 +42,7 @@ from ._core import (
)
from ._core import record_names_match # noqa: F401 # deprecated
from ._core import records_are_mates
-from .readers import FastaReader, FastqReader
+from .readers import BamReader, FastaReader, FastqReader
from .writers import FastaWriter, FastqWriter
from .singleend import _open_single
from .pairedend import (
@@ -226,7 +227,8 @@ def open(
MultipleFileWriter,
]:
"""
- Open one or more FASTQ or FASTA files for reading or writing.
+ Open one or more FASTQ or FASTA files for reading or writing,
+ or open one (unaligned) BAM file for reading.
Parameters:
files:
@@ -242,6 +244,7 @@ def open(
mode:
Set to ``'r'`` for reading, ``'w'`` for writing or ``'a'`` for appending.
+ For BAM files, only reading is supported.
interleaved:
If True, then there must be only one file argument that contains
@@ -249,7 +252,7 @@ def open(
fileformat:
If *None*, the file format is autodetected from the file name
- extension. Set to ``'fasta'`` or ``'fastq'`` to not auto-detect.
+ extension. Set to ``'fasta'``, ``'fastq'`` or ``'bam'`` to not auto-detect.
qualities:
When mode is ``'w'`` and fileformat is *None*, this can be set
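As a quick illustration of the BAM support that ``dnaio.open`` gains here, a
minimal sketch (assuming the unaligned single-end test file added in this
commit) might be:

    import dnaio

    # Format is autodetected from the ".bam" extension or the b"BAM\1" magic;
    # only reading of unaligned single-end BAM records is supported.
    with dnaio.open("tests/data/simple.unaligned.bam") as reader:
        for record in reader:
            print(record.name, record.sequence, record.qualities)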
=====================================
src/dnaio/_core.pyi
=====================================
@@ -53,6 +53,15 @@ class FastqIter(Generic[T]):
@property
def number_of_records(self) -> int: ...
+class BamIter:
+ def __init__(self, file: BinaryIO, buffer_size: int): ...
+ def __iter__(self) -> Iterator[SequenceRecord]: ...
+ def __next__(self) -> SequenceRecord: ...
+ @property
+ def header(self) -> bytes: ...
+ @property
+ def number_of_records(self) -> int: ...
+
# Deprecated
def record_names_match(header1: str, header2: str) -> bool: ...
=====================================
src/dnaio/_core.pyx
=====================================
@@ -8,6 +8,8 @@ from cpython.object cimport Py_TYPE, PyTypeObject
from cpython.ref cimport PyObject
from cpython.tuple cimport PyTuple_GET_ITEM
from libc.string cimport memcmp, memcpy, memchr, strcspn, strspn, memmove
+from libc.stdint cimport uint8_t, uint16_t, uint32_t, int32_t
+
cimport cython
cdef extern from "Python.h":
@@ -21,6 +23,10 @@ cdef extern from "ascii_check.h":
cdef extern from "_conversions.h":
const char NUCLEOTIDE_COMPLEMENTS[256]
+cdef extern from "bam.h":
+ void decode_bam_sequence(void *dest, void *encoded_sequence, size_t length)
+ void decode_bam_qualities(uint8_t *dest, uint8_t *encoded_qualities, size_t length)
+
from .exceptions import FastqFormatError
from ._util import shorten
@@ -585,7 +591,7 @@ cdef class FastqIter:
second_header_start = sequence_end + 1
remaining_bytes = (buffer_end - second_header_start)
# Usually there is no second header, so we skip the memchr call.
- if remaining_bytes > 2 and second_header_start[0] == b'+' and second_header_start[1] == b'\n':
+ if remaining_bytes > 2 and memcmp(second_header_start, b"+\n", 2) == 0:
second_header_end = second_header_start + 1
else:
second_header_end = <char *>memchr(second_header_start, b'\n', <size_t>(remaining_bytes))
@@ -667,6 +673,168 @@ cdef class FastqIter:
self.record_start = qualities_end + 1
return ret_val
+cdef struct BamRecordHeader:
+ uint32_t block_size
+ int32_t reference_id
+ int32_t pos
+ uint8_t l_read_name
+ uint8_t mapq
+ uint16_t bin
+ uint16_t n_cigar_op
+ uint16_t flag
+ uint32_t l_seq
+ int32_t next_ref_id
+ int32_t next_pos
+ int32_t tlen
+
+cdef class BamIter:
+ cdef:
+ uint8_t *record_start
+ uint8_t *buffer_end
+ size_t read_in_size
+ uint8_t *read_in_buffer
+ size_t read_in_buffer_size
+ object file
+ readonly object header
+ readonly Py_ssize_t number_of_records
+
+ def __dealloc__(self):
+ PyMem_Free(self.read_in_buffer)
+
+ def __cinit__(self, fileobj, read_in_size = 48 * 1024):
+ if read_in_size < 4:
+ raise ValueError(f"read_in_size must be at least 4, got "
+ f"{read_in_size}")
+
+ # Skip ahead and save the BAM header for later inspection
+ magic_and_header_size = fileobj.read(8)
+ if not isinstance(magic_and_header_size, bytes):
+ raise TypeError(f"fileobj {fileobj} is not a binary IO type, "
+ f"got {type(fileobj)}")
+ if len(magic_and_header_size) < 8:
+ raise EOFError("Truncated BAM file")
+ if magic_and_header_size[:4] != b"BAM\1":
+ raise ValueError(
+ f"fileobj: {fileobj}, is not a BAM file. No BAM magic, instead "
+ f"found {magic_and_header_size[:4]}")
+ l_text = int.from_bytes(magic_and_header_size[4:], "little", signed=False)
+ header = fileobj.read(l_text)
+ if len(header) < l_text:
+ raise EOFError("Truncated BAM file")
+ n_ref_obj = fileobj.read(4)
+ if len(n_ref_obj) < 4:
+ raise EOFError("Truncated BAM file")
+ n_ref = int.from_bytes(n_ref_obj, "little", signed=False)
+ for i in range(n_ref):
+ l_name_obj = fileobj.read(4)
+ if len(l_name_obj) < 4:
+ raise EOFError("Truncated BAM file")
+ l_name = int.from_bytes(l_name_obj, "little", signed=False)
+ reference_chunk_size = l_name + 4 # Include name and uint32_t of size
+ reference_chunk = fileobj.read(reference_chunk_size)
+ if len(reference_chunk) < reference_chunk_size:
+ raise EOFError("Truncated BAM file")
+ # Fileobj is now skipped ahead and at the start of the BAM records
+
+ self.header = header
+ self.read_in_size = read_in_size
+ self.file = fileobj
+ self.read_in_buffer = NULL
+ self.read_in_buffer_size = 0
+ self.record_start = self.read_in_buffer
+ self.buffer_end = self.record_start
+
+ def __iter__(self):
+ return self
+
+ cdef _read_into_buffer(self):
+ cdef size_t read_in_size
+ cdef size_t leftover_size = self.buffer_end - self.record_start
+ cdef uint32_t block_size
+ memmove(self.read_in_buffer, self.record_start, leftover_size)
+ self.record_start = self.read_in_buffer
+ self.buffer_end = self.record_start + leftover_size
+ if leftover_size >= 4:
+ # Immediately check how much data is at least required
+ block_size = (<uint32_t *>self.record_start)[0]
+ read_in_size = max(block_size, self.read_in_size)
+ else:
+ read_in_size = self.read_in_size - leftover_size
+ new_bytes = self.file.read(read_in_size)
+ cdef size_t new_bytes_size = PyBytes_GET_SIZE(new_bytes)
+ cdef uint8_t *new_bytes_buf = <uint8_t *>PyBytes_AS_STRING(new_bytes)
+ cdef size_t new_buffer_size = leftover_size + new_bytes_size
+ if new_buffer_size == 0:
+ # File completely read
+ raise StopIteration()
+ elif new_bytes_size == 0:
+ raise EOFError("Incomplete record at the end of file")
+ cdef uint8_t *tmp
+ if new_buffer_size > self.read_in_buffer_size:
+ tmp = <uint8_t *>PyMem_Realloc(self.read_in_buffer, new_buffer_size)
+ if tmp == NULL:
+ raise MemoryError()
+ self.read_in_buffer = tmp
+ self.read_in_buffer_size = new_buffer_size
+ memcpy(self.read_in_buffer + leftover_size, new_bytes_buf, new_bytes_size)
+ self.record_start = self.read_in_buffer
+ self.buffer_end = self.record_start + new_buffer_size
+
+ def __next__(self):
+ cdef:
+ SequenceRecord seq_record
+ uint8_t *record_start
+ uint8_t *buffer_end
+ uint32_t record_size
+ uint8_t *record_end
+ BamRecordHeader header
+ uint8_t *bam_name_start
+ uint32_t name_length
+ uint8_t *bam_seq_start
+ uint32_t seq_length
+ uint8_t *bam_qual_start
+ uint32_t encoded_seq_length
+
+ while True:
+ record_start = self.record_start
+ buffer_end = self.buffer_end
+ if record_start + 4 >= buffer_end:
+ self._read_into_buffer()
+ continue
+ record_size = (<uint32_t *>record_start)[0]
+ record_end = record_start + 4 + record_size
+ if record_end > buffer_end:
+ self._read_into_buffer()
+ continue
+ header = (<BamRecordHeader *>record_start)[0]
+ if header.flag != 4:
+ raise NotImplementedError(
+ "The BAM parser has been implemented with unmapped single "
+ "reads in mind to support ONT sequencing input. Mapped "
+ "BAM files or files with multiple reads are not supported. "
+ "Please use samtools fastq to make the appropriate "
+ "conversion to FASTQ format."
+ )
+ bam_name_start = record_start + sizeof(BamRecordHeader)
+ name_length = header.l_read_name
+ bam_seq_start = bam_name_start + name_length + header.n_cigar_op * sizeof(uint32_t)
+ name_length -= 1 # Do not include the null byte
+ seq_length = header.l_seq
+ encoded_seq_length = (seq_length + 1) // 2
+ bam_qual_start = bam_seq_start + encoded_seq_length
+ name = PyUnicode_New(name_length, 127)
+ sequence = PyUnicode_New(seq_length, 127)
+ qualities = PyUnicode_New(seq_length, 127)
+ memcpy(<uint8_t *>PyUnicode_DATA(name), bam_name_start, name_length)
+ decode_bam_sequence(<uint8_t *>PyUnicode_DATA(sequence), bam_seq_start, seq_length)
+ decode_bam_qualities(<uint8_t *>PyUnicode_DATA(qualities), bam_qual_start, seq_length)
+ seq_record = SequenceRecord.__new__(SequenceRecord)
+ seq_record._name = name
+ seq_record._sequence = sequence
+ seq_record._qualities = qualities
+ self.number_of_records += 1
+ self.record_start = record_end
+ return seq_record
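The ``BamRecordHeader`` struct above mirrors the fixed 36-byte prefix of every
BAM alignment record. Purely as an illustration (not part of this commit), the
same layout could be unpacked in plain Python with ``struct``;
``parse_record_header`` is a hypothetical helper:

    import struct

    # Little-endian fixed-size prefix of a BAM alignment record:
    # block_size, reference_id, pos, l_read_name, mapq, bin, n_cigar_op,
    # flag, l_seq, next_ref_id, next_pos, tlen -- 36 bytes in total.
    BAM_RECORD_HEADER = struct.Struct("<IiiBBHHHIiii")

    def parse_record_header(buf: bytes, offset: int = 0) -> dict:
        fields = BAM_RECORD_HEADER.unpack_from(buf, offset)
        names = ("block_size", "reference_id", "pos", "l_read_name", "mapq",
                 "bin", "n_cigar_op", "flag", "l_seq",
                 "next_ref_id", "next_pos", "tlen")
        return dict(zip(names, fields))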
def record_names_match(header1: str, header2: str):
"""
=====================================
src/dnaio/bam.h
=====================================
@@ -0,0 +1,127 @@
+#include <stdint.h>
+#include <stddef.h>
+#include <string.h>
+#include <assert.h>
+
+#ifdef __SSE2__
+#include "emmintrin.h"
+#endif
+
+#ifdef __SSSE3__
+#include "tmmintrin.h"
+#endif
+
+static void
+decode_bam_sequence(uint8_t *dest, const uint8_t *encoded_sequence, size_t length)
+{
+ /* Reuse a trick from sam_internal.h in htslib. Have a table to lookup
+ two characters simultaneously.*/
+ static const char code2base[512] =
+ "===A=C=M=G=R=S=V=T=W=Y=H=K=D=B=N"
+ "A=AAACAMAGARASAVATAWAYAHAKADABAN"
+ "C=CACCCMCGCRCSCVCTCWCYCHCKCDCBCN"
+ "M=MAMCMMMGMRMSMVMTMWMYMHMKMDMBMN"
+ "G=GAGCGMGGGRGSGVGTGWGYGHGKGDGBGN"
+ "R=RARCRMRGRRRSRVRTRWRYRHRKRDRBRN"
+ "S=SASCSMSGSRSSSVSTSWSYSHSKSDSBSN"
+ "V=VAVCVMVGVRVSVVVTVWVYVHVKVDVBVN"
+ "T=TATCTMTGTRTSTVTTTWTYTHTKTDTBTN"
+ "W=WAWCWMWGWRWSWVWTWWWYWHWKWDWBWN"
+ "Y=YAYCYMYGYRYSYVYTYWYYYHYKYDYBYN"
+ "H=HAHCHMHGHRHSHVHTHWHYHHHKHDHBHN"
+ "K=KAKCKMKGKRKSKVKTKWKYKHKKKDKBKN"
+ "D=DADCDMDGDRDSDVDTDWDYDHDKDDDBDN"
+ "B=BABCBMBGBRBSBVBTBWBYBHBKBDBBBN"
+ "N=NANCNMNGNRNSNVNTNWNYNHNKNDNBNN";
+ static const uint8_t *nuc_lookup = (uint8_t *)"=ACMGRSVTWYHKDBN";
+ const uint8_t *dest_end_ptr = dest + length;
+ uint8_t *dest_cursor = dest;
+ const uint8_t *encoded_cursor = encoded_sequence;
+ #ifdef __SSSE3__
+ const uint8_t *dest_vec_end_ptr = dest_end_ptr - (2 * sizeof(__m128i));
+ __m128i first_upper_shuffle = _mm_setr_epi8(
+ 0, 0xff, 1, 0xff, 2, 0xff, 3, 0xff, 4, 0xff, 5, 0xff, 6, 0xff, 7, 0xff);
+ __m128i first_lower_shuffle = _mm_setr_epi8(
+ 0xff, 0, 0xff, 1, 0xff, 2, 0xff, 3, 0xff, 4, 0xff, 5, 0xff, 6, 0xff, 7);
+ __m128i second_upper_shuffle = _mm_setr_epi8(
+ 8, 0xff, 9, 0xff, 10, 0xff, 11, 0xff, 12, 0xff, 13, 0xff, 14, 0xff, 15, 0xff);
+ __m128i second_lower_shuffle = _mm_setr_epi8(
+ 0xff, 8, 0xff, 9, 0xff, 10, 0xff, 11, 0xff, 12, 0xff, 13, 0xff, 14, 0xff, 15);
+ __m128i nuc_lookup_vec = _mm_lddqu_si128((__m128i *)nuc_lookup);
+ /* Work on 16 encoded characters at the time resulting in 32 decoded characters
+ Examples are given for 8 encoded characters A until H to keep it readable.
+ Encoded stored as |AB|CD|EF|GH|
+ Shuffle into |AB|00|CD|00|EF|00|GH|00| and
+ |00|AB|00|CD|00|EF|00|GH|
+ Shift upper to the right resulting into
+ |0A|B0|0C|D0|0E|F0|0G|H0| and
+ |00|AB|00|CD|00|EF|00|GH|
+ Merge with or resulting into (X stands for garbage)
+ |0A|XB|0C|XD|0E|XF|0G|XH|
+ Bitwise and with 0b1111 leads to:
+ |0A|0B|0C|0D|0E|0F|0G|0H|
+ We can use the resulting 4-bit integers as indexes for the shuffle of
+ the nucleotide lookup. */
+ while (dest_cursor < dest_vec_end_ptr) {
+ __m128i encoded = _mm_lddqu_si128((__m128i *)encoded_cursor);
+
+ __m128i first_upper = _mm_shuffle_epi8(encoded, first_upper_shuffle);
+ __m128i first_lower = _mm_shuffle_epi8(encoded, first_lower_shuffle);
+ __m128i shifted_first_upper = _mm_srli_epi64(first_upper, 4);
+ __m128i first_merged = _mm_or_si128(shifted_first_upper, first_lower);
+ __m128i first_indexes = _mm_and_si128(first_merged, _mm_set1_epi8(0b1111));
+ __m128i first_nucleotides = _mm_shuffle_epi8(nuc_lookup_vec, first_indexes);
+ _mm_storeu_si128((__m128i *)dest_cursor, first_nucleotides);
+
+ __m128i second_upper = _mm_shuffle_epi8(encoded, second_upper_shuffle);
+ __m128i second_lower = _mm_shuffle_epi8(encoded, second_lower_shuffle);
+ __m128i shifted_second_upper = _mm_srli_epi64(second_upper, 4);
+ __m128i second_merged = _mm_or_si128(shifted_second_upper, second_lower);
+ __m128i second_indexes = _mm_and_si128(second_merged, _mm_set1_epi8(0b1111));
+ __m128i second_nucleotides = _mm_shuffle_epi8(nuc_lookup_vec, second_indexes);
+ _mm_storeu_si128((__m128i *)(dest_cursor + 16), second_nucleotides);
+
+ encoded_cursor += sizeof(__m128i);
+ dest_cursor += 2 * sizeof(__m128i);
+ }
+ #endif
+ /* Do two at the time until it gets to the last even address. */
+ const uint8_t *dest_end_ptr_twoatatime = dest + (length & (~1ULL));
+ while (dest_cursor < dest_end_ptr_twoatatime) {
+ /* According to htslib, size_t cast helps the optimizer.
+ Code confirmed to indeed run faster. */
+ memcpy(dest_cursor, code2base + ((size_t)*encoded_cursor * 2), 2);
+ dest_cursor += 2;
+ encoded_cursor += 1;
+ }
+ assert((dest_end_ptr - dest_cursor) < 2);
+ if (dest_cursor != dest_end_ptr) {
+ /* There is a single encoded nuc left */
+ uint8_t encoded_nucs = *encoded_cursor;
+ uint8_t upper_nuc_index = encoded_nucs >> 4;
+ dest_cursor[0] = nuc_lookup[upper_nuc_index];
+ }
+}
+
+static void
+decode_bam_qualities(uint8_t *dest, const uint8_t *encoded_qualities, size_t length)
+{
+ const uint8_t *end_ptr = encoded_qualities + length;
+ const uint8_t *cursor = encoded_qualities;
+ uint8_t *dest_cursor = dest;
+ #ifdef __SSE2__
+ const uint8_t *vec_end_ptr = end_ptr - sizeof(__m128i);
+ while (cursor < vec_end_ptr) {
+ __m128i quals = _mm_loadu_si128((__m128i *)cursor);
+ __m128i phreds = _mm_add_epi8(quals, _mm_set1_epi8(33));
+ _mm_storeu_si128((__m128i *)dest_cursor, phreds);
+ cursor += sizeof(__m128i);
+ dest_cursor += sizeof(__m128i);
+ }
+ #endif
+ while (cursor < end_ptr) {
+ *dest_cursor = *cursor + 33;
+ cursor += 1;
+ dest_cursor += 1;
+ }
+}
\ No newline at end of file
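The SSSE3 branch above is a vectorized version of the scalar loops that follow
it. A plain-Python sketch of what the two decoders compute (hypothetical
helpers, not part of bam.h) could look like:

    # Each encoded byte packs two 4-bit codes indexing "=ACMGRSVTWYHKDBN",
    # high nibble first; qualities are raw Phred scores shifted by 33 to ASCII.
    NUC_LOOKUP = "=ACMGRSVTWYHKDBN"

    def decode_sequence(encoded: bytes, length: int) -> str:
        chars = []
        for byte in encoded:
            chars.append(NUC_LOOKUP[byte >> 4])
            chars.append(NUC_LOOKUP[byte & 0x0F])
        # Drop the padding nibble of odd-length reads
        return "".join(chars)[:length]

    def decode_qualities(encoded: bytes) -> str:
        return "".join(chr(q + 33) for q in encoded)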
=====================================
src/dnaio/chunks.py
=====================================
@@ -34,6 +34,34 @@ def _fasta_head(buf: bytes, end: Optional[int] = None) -> int:
)
+def _paired_fasta_heads(
+ buf1: bytes, buf2: bytes, end1: int, end2: int
+) -> Tuple[int, int]:
+ """
+ Return positions pos1, pos2 with pos1 <= end1 and pos2 <= end2
+ such that buf1[:pos1] and buf2[:pos2] contain the same number of complete FASTA
+ records.
+ """
+ if end1 == 0 or end2 == 0:
+ return (0, 0)
+
+ if (end1 > 0 and buf1[:1] != b">") or (end2 > 0 and buf2[:1] != b">"):
+ raise FastaFormatError("FASTA file expected to start with '>'", line=None)
+
+ # Count complete records
+ n_records1 = buf1.count(b"\n>", 0, end1)
+ n_records2 = buf2.count(b"\n>", 0, end2)
+ n_records = min(n_records1, n_records2)
+
+ pos1 = pos2 = 0
+ while n_records > 0:
+ pos1 = buf1.find(b"\n>", pos1, end1) + 1
+ pos2 = buf2.find(b"\n>", pos2, end2) + 1
+ n_records -= 1
+
+ return (pos1, pos2)
+
+
def _fastq_head(buf: bytes, end: Optional[int] = None) -> int:
"""
Search for the end of the last complete *two* FASTQ records in buf[:end].
@@ -129,7 +157,7 @@ def read_paired_chunks(
buffer_size: int = 4 * 1024**2,
) -> Iterator[Tuple[memoryview, memoryview]]:
"""
- Read chunks of paired-end FASTQ reads from two files.
+ Read chunks of paired-end FASTA or FASTQ records from two files.
A pair of chunks (memoryview objects) is yielded on each iteration,
and both chunks are guaranteed to have the same number of sequences.
That is, the paired-end reads will stay in sync.
@@ -138,7 +166,6 @@ def read_paired_chunks(
buffer is re-used in the next iteration.
This is similar to `read_chunks`, but for paired-end data.
- Unlike `read_chunks`, this only works for FASTQ input.
Args:
f: File with R1 reads; must have been opened in binary mode
@@ -149,7 +176,7 @@ def read_paired_chunks(
Pairs of memoryview objects.
Raises:
- ValueError: A FASTQ record was encountered that is larger than *buffer_size*.
+ ValueError: A FASTA or FASTQ record was encountered that is larger than *buffer_size*.
"""
if buffer_size < 6:
raise ValueError("Buffer size too small")
@@ -160,27 +187,51 @@ def read_paired_chunks(
# Read one byte to make sure we are processing FASTQ
start1 = f.readinto(memoryview(buf1)[0:1])
start2 = f2.readinto(memoryview(buf2)[0:1])
- if (start1 == 1 and buf1[0:1] != b"@") or (start2 == 1 and buf2[0:1] != b"@"):
+
+ if start1 == 0 and start2 == 0:
+ return memoryview(b""), memoryview(b"")
+
+ if (start1 == 0) != (start2 == 0):
+ i = 1 if start1 == 0 else 2
+ raise FileFormatError(
+ f"Paired-end reads not in sync: File with R{i} reads is empty and the other is not",
+ line=None,
+ )
+
+ if (buf1[:1] == b"@") != (buf2[:1] == b"@"):
raise FileFormatError(
"Paired-end data must be in FASTQ format when using multiple cores",
line=None,
)
+ if buf1[:1] == b"@":
+ file_format = "FASTQ"
+ paired_heads = _paired_fastq_heads
+ elif buf1[:1] == b">":
+ file_format = "FASTA"
+ paired_heads = _paired_fasta_heads
+ else:
+ raise FileFormatError(
+ "First character in input file must be '@' (FASTQ) or '>' (FASTA), "
+ f"but found {buf1[:1]}",
+ line=None,
+ )
+
while True:
if start1 == len(buf1) and start2 == len(buf2):
raise ValueError(
- f"FASTQ records do not fit into buffer of size {buffer_size}"
+ f"FASTA/FASTQ records do not fit into buffer of size {buffer_size}"
)
bufend1 = f.readinto(memoryview(buf1)[start1:]) + start1 # type: ignore
bufend2 = f2.readinto(memoryview(buf2)[start2:]) + start2 # type: ignore
if start1 == bufend1 and start2 == bufend2:
break
- end1, end2 = _paired_fastq_heads(buf1, buf2, bufend1, bufend2)
+ end1, end2 = paired_heads(buf1, buf2, bufend1, bufend2)
assert end1 <= bufend1
assert end2 <= bufend2
- if end1 > 0 or end2 > 0:
+ if end1 > 0 or end2 > 0 or file_format == "FASTA":
yield (memoryview(buf1)[0:end1], memoryview(buf2)[0:end2])
else:
assert end1 == 0 and end2 == 0
@@ -189,7 +240,7 @@ def read_paired_chunks(
i = 1 if bufend1 == 0 else 2
extra = f". File {i} ended, but more data found in the other file"
raise FileFormatError(
- f"Premature end of paired FASTQ input{extra}.", line=None
+ f"Premature end of paired-end input{extra}.", line=None
)
start1 = bufend1 - end1
assert start1 >= 0
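With the changes above, ``read_paired_chunks`` accepts paired FASTA as well as
FASTQ input. A usage sketch along the lines of the new test (``reads.1.fasta``,
``reads.2.fasta`` and ``process`` are placeholders) might be:

    from dnaio.chunks import read_paired_chunks

    # Each yielded pair of memoryviews holds the same number of records,
    # so the R1/R2 chunks stay in sync across worker processes.
    with open("reads.1.fasta", "rb") as r1, open("reads.2.fasta", "rb") as r2:
        for chunk1, chunk2 in read_paired_chunks(r1, r2, buffer_size=4 * 1024**2):
            process(chunk1.tobytes(), chunk2.tobytes())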
=====================================
src/dnaio/readers.py
=====================================
@@ -9,7 +9,7 @@ from typing import Union, BinaryIO, Optional, Iterator, List
from xopen import xopen
-from ._core import FastqIter, SequenceRecord
+from ._core import BamIter, FastqIter, SequenceRecord
from ._util import shorten as _shorten
from .exceptions import FastaFormatError
from .interfaces import SingleEndReader
@@ -33,7 +33,7 @@ class BinaryFileReader:
):
"""
The file is a path or a file-like object. In both cases, the file may
- be compressed (.gz, .bz2, .xz).
+ be compressed (.gz, .bz2, .xz, .zst).
"""
if isinstance(file, str):
self._file = opener(file, self.mode)
@@ -80,7 +80,7 @@ class FastaReader(BinaryFileReader, SingleEndReader):
):
"""
file is a path or a file-like object. In both cases, the file may
- be compressed (.gz, .bz2, .xz).
+ be compressed (.gz, .bz2, .xz, .zst).
keep_linebreaks -- whether to keep newline characters in the sequence
"""
@@ -191,3 +191,54 @@ class FastqReader(BinaryFileReader, SingleEndReader):
return self._iter.number_of_records
except AttributeError:
return 0
+
+
+class BamReader(BinaryFileReader, SingleEndReader):
+ """
+ Reader for BAM files.
+
+ All records in the input BAM must be unmapped single-end reads
+ (with a flag value of 4).
+
+ While this class can be instantiated directly, the recommended way is to
+ use `dnaio.open` with appropriate arguments.
+ """
+
+ def __init__(
+ self,
+ file: Union[PathLike, str, BinaryIO],
+ *,
+ sequence_class=SequenceRecord,
+ buffer_size: int = 128 * 1024, # Buffer size used by cat, pigz etc.
+ opener=xopen,
+ _close_file: Optional[bool] = None,
+ ):
+ super().__init__(file, opener=opener, _close_file=_close_file)
+ self.sequence_class = sequence_class
+ self.delivers_qualities = True
+ self.buffer_size = buffer_size
+ try:
+ self._iter: Iterator[SequenceRecord] = BamIter(self._file, self.buffer_size)
+ except Exception:
+ self.close()
+ raise
+
+ self.two_headers: bool = False
+
+ def __iter__(self) -> Iterator[SequenceRecord]:
+ """Iterate over the records in this BAM file."""
+ return self._iter
+
+ @property
+ def number_of_records(self):
+ try:
+ return self._iter.number_of_records
+ except AttributeError:
+ return 0
+
+ @property
+ def header(self):
+ try:
+ return self._iter.header
+ except AttributeError:
+ return b""
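For completeness, a sketch of using ``BamReader`` directly rather than through
``dnaio.open`` (again assuming the unaligned test BAM from this commit); the
``header`` property returns the raw SAM header bytes stored in the BAM:

    from dnaio import BamReader

    reader = BamReader("tests/data/simple.unaligned.bam")
    print(reader.header.decode())
    for record in reader:
        print(record.name, len(record.sequence))
    reader.close()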
=====================================
src/dnaio/singleend.py
=====================================
@@ -2,7 +2,7 @@ import os
from typing import Optional, Union, BinaryIO, Tuple
from .exceptions import UnknownFileFormat
-from .readers import FastaReader, FastqReader
+from .readers import BamReader, FastaReader, FastqReader
from .writers import FastaWriter, FastqWriter
@@ -13,7 +13,7 @@ def _open_single(
fileformat: Optional[str] = None,
mode: str = "r",
qualities: Optional[bool] = None,
-) -> Union[FastaReader, FastaWriter, FastqReader, FastqWriter]:
+) -> Union[BamReader, FastaReader, FastaWriter, FastqReader, FastqWriter]:
"""
Open a single sequence file.
"""
@@ -59,7 +59,11 @@ def _open_single(
if "r" in mode:
return FastqReader(file, _close_file=close_file)
return FastqWriter(file, _close_file=close_file)
-
+ elif fileformat == "bam":
+ if "r" in mode:
+ return BamReader(file, _close_file=close_file)
+ # This should not be reached
+ raise NotImplementedError("Only reading is supported for BAM files")
if close_file:
file.close()
raise UnknownFileFormat(
@@ -108,6 +112,8 @@ def _detect_format_from_name(name: str) -> Optional[str]:
return "fasta"
elif ext in [".fastq", ".fq"] or (ext == ".txt" and name.endswith("_sequence")):
return "fastq"
+ elif ext == ".bam":
+ return "bam"
return None
@@ -117,15 +123,17 @@ def _detect_format_from_content(file: BinaryIO) -> Optional[str]:
"""
if file.seekable():
original_position = file.tell()
- first_char = file.read(1)
+ magic = file.read(4)
file.seek(original_position)
else:
# We cannot always use peek() because BytesIO objects do not support it
- first_char = file.peek(1)[0:1] # type: ignore
- formats = {
- b"@": "fastq",
- b">": "fasta",
- b"#": "fasta", # Some FASTA variants allow comments
- b"": "fastq", # Pretend FASTQ for empty input
- }
- return formats.get(first_char, None)
+ magic = file.peek(4)[0:4] # type: ignore
+ if magic.startswith(b"@") or magic == b"":
+ # Pretend FASTQ for empty input
+ return "fastq"
+ elif magic.startswith(b">") or magic.startswith(b"#"):
+ # Some FASTA variants allow comments
+ return "fasta"
+ elif magic == b"BAM\1":
+ return "bam"
+ return None
=====================================
tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam
=====================================
Binary files /dev/null and b/tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam differ
=====================================
tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.header.sam
=====================================
@@ -0,0 +1,97 @@
+ at HD VN:1.4 SO:coordinate
+ at SQ SN:chrM LN:16571
+ at SQ SN:chr1 LN:249250621
+ at SQ SN:chr2 LN:243199373
+ at SQ SN:chr3 LN:198022430
+ at SQ SN:chr4 LN:191154276
+ at SQ SN:chr5 LN:180915260
+ at SQ SN:chr6 LN:171115067
+ at SQ SN:chr7 LN:159138663
+ at SQ SN:chr8 LN:146364022
+ at SQ SN:chr9 LN:141213431
+ at SQ SN:chr10 LN:135534747
+ at SQ SN:chr11 LN:135006516
+ at SQ SN:chr12 LN:133851895
+ at SQ SN:chr13 LN:115169878
+ at SQ SN:chr14 LN:107349540
+ at SQ SN:chr15 LN:102531392
+ at SQ SN:chr16 LN:90354753
+ at SQ SN:chr17 LN:81195210
+ at SQ SN:chr18 LN:78077248
+ at SQ SN:chr19 LN:59128983
+ at SQ SN:chr20 LN:63025520
+ at SQ SN:chr21 LN:48129895
+ at SQ SN:chr22 LN:51304566
+ at SQ SN:chrX LN:155270560
+ at SQ SN:chrY LN:59373566
+ at SQ SN:chr1_gl000191_random LN:106433
+ at SQ SN:chr1_gl000192_random LN:547496
+ at SQ SN:chr4_ctg9_hap1 LN:590426
+ at SQ SN:chr4_gl000193_random LN:189789
+ at SQ SN:chr4_gl000194_random LN:191469
+ at SQ SN:chr6_apd_hap1 LN:4622290
+ at SQ SN:chr6_cox_hap2 LN:4795371
+ at SQ SN:chr6_dbb_hap3 LN:4610396
+ at SQ SN:chr6_mann_hap4 LN:4683263
+ at SQ SN:chr6_mcf_hap5 LN:4833398
+ at SQ SN:chr6_qbl_hap6 LN:4611984
+ at SQ SN:chr6_ssto_hap7 LN:4928567
+ at SQ SN:chr7_gl000195_random LN:182896
+ at SQ SN:chr8_gl000196_random LN:38914
+ at SQ SN:chr8_gl000197_random LN:37175
+ at SQ SN:chr9_gl000198_random LN:90085
+ at SQ SN:chr9_gl000199_random LN:169874
+ at SQ SN:chr9_gl000200_random LN:187035
+ at SQ SN:chr9_gl000201_random LN:36148
+ at SQ SN:chr11_gl000202_random LN:40103
+ at SQ SN:chr17_ctg5_hap1 LN:1680828
+ at SQ SN:chr17_gl000203_random LN:37498
+ at SQ SN:chr17_gl000204_random LN:81310
+ at SQ SN:chr17_gl000205_random LN:174588
+ at SQ SN:chr17_gl000206_random LN:41001
+ at SQ SN:chr18_gl000207_random LN:4262
+ at SQ SN:chr19_gl000208_random LN:92689
+ at SQ SN:chr19_gl000209_random LN:159169
+ at SQ SN:chr21_gl000210_random LN:27682
+ at SQ SN:chrUn_gl000211 LN:166566
+ at SQ SN:chrUn_gl000212 LN:186858
+ at SQ SN:chrUn_gl000213 LN:164239
+ at SQ SN:chrUn_gl000214 LN:137718
+ at SQ SN:chrUn_gl000215 LN:172545
+ at SQ SN:chrUn_gl000216 LN:172294
+ at SQ SN:chrUn_gl000217 LN:172149
+ at SQ SN:chrUn_gl000218 LN:161147
+ at SQ SN:chrUn_gl000219 LN:179198
+ at SQ SN:chrUn_gl000220 LN:161802
+ at SQ SN:chrUn_gl000221 LN:155397
+ at SQ SN:chrUn_gl000222 LN:186861
+ at SQ SN:chrUn_gl000223 LN:180455
+ at SQ SN:chrUn_gl000224 LN:179693
+ at SQ SN:chrUn_gl000225 LN:211173
+ at SQ SN:chrUn_gl000226 LN:15008
+ at SQ SN:chrUn_gl000227 LN:128374
+ at SQ SN:chrUn_gl000228 LN:129120
+ at SQ SN:chrUn_gl000229 LN:19913
+ at SQ SN:chrUn_gl000230 LN:43691
+ at SQ SN:chrUn_gl000231 LN:27386
+ at SQ SN:chrUn_gl000232 LN:40652
+ at SQ SN:chrUn_gl000233 LN:45941
+ at SQ SN:chrUn_gl000234 LN:40531
+ at SQ SN:chrUn_gl000235 LN:34474
+ at SQ SN:chrUn_gl000236 LN:41934
+ at SQ SN:chrUn_gl000237 LN:45867
+ at SQ SN:chrUn_gl000238 LN:39939
+ at SQ SN:chrUn_gl000239 LN:33824
+ at SQ SN:chrUn_gl000240 LN:41933
+ at SQ SN:chrUn_gl000241 LN:42152
+ at SQ SN:chrUn_gl000242 LN:43523
+ at SQ SN:chrUn_gl000243 LN:43341
+ at SQ SN:chrUn_gl000244 LN:39929
+ at SQ SN:chrUn_gl000245 LN:36651
+ at SQ SN:chrUn_gl000246 LN:38154
+ at SQ SN:chrUn_gl000247 LN:36422
+ at SQ SN:chrUn_gl000248 LN:39786
+ at SQ SN:chrUn_gl000249 LN:38502
+ at RG ID:1 PL:Illumina PU:H7AP8ADXX_TAAGGCGA_1_NA12878 LB:library DS:SampleProject:NIST SM:NIST7035 CN:illuminafacility
+ at PG ID:bwa PN:bwa VN:0.7.5a-r405
+ at PG ID:MarkDuplicates PN:MarkDuplicates CL:net.sf.picard.sam.MarkDuplicates INPUT=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.bam] OUTPUT=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam METRICS_FILE=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.metrics TMP_DIR=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/.queue/tmp] VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
=====================================
tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.sam
=====================================
@@ -0,0 +1,100 @@
+ at HD VN:1.4 SO:coordinate
+ at SQ SN:chrM LN:16571
+ at SQ SN:chr1 LN:249250621
+ at SQ SN:chr2 LN:243199373
+ at SQ SN:chr3 LN:198022430
+ at SQ SN:chr4 LN:191154276
+ at SQ SN:chr5 LN:180915260
+ at SQ SN:chr6 LN:171115067
+ at SQ SN:chr7 LN:159138663
+ at SQ SN:chr8 LN:146364022
+ at SQ SN:chr9 LN:141213431
+ at SQ SN:chr10 LN:135534747
+ at SQ SN:chr11 LN:135006516
+ at SQ SN:chr12 LN:133851895
+ at SQ SN:chr13 LN:115169878
+ at SQ SN:chr14 LN:107349540
+ at SQ SN:chr15 LN:102531392
+ at SQ SN:chr16 LN:90354753
+ at SQ SN:chr17 LN:81195210
+ at SQ SN:chr18 LN:78077248
+ at SQ SN:chr19 LN:59128983
+ at SQ SN:chr20 LN:63025520
+ at SQ SN:chr21 LN:48129895
+ at SQ SN:chr22 LN:51304566
+ at SQ SN:chrX LN:155270560
+ at SQ SN:chrY LN:59373566
+ at SQ SN:chr1_gl000191_random LN:106433
+ at SQ SN:chr1_gl000192_random LN:547496
+ at SQ SN:chr4_ctg9_hap1 LN:590426
+ at SQ SN:chr4_gl000193_random LN:189789
+ at SQ SN:chr4_gl000194_random LN:191469
+ at SQ SN:chr6_apd_hap1 LN:4622290
+ at SQ SN:chr6_cox_hap2 LN:4795371
+ at SQ SN:chr6_dbb_hap3 LN:4610396
+ at SQ SN:chr6_mann_hap4 LN:4683263
+ at SQ SN:chr6_mcf_hap5 LN:4833398
+ at SQ SN:chr6_qbl_hap6 LN:4611984
+ at SQ SN:chr6_ssto_hap7 LN:4928567
+ at SQ SN:chr7_gl000195_random LN:182896
+ at SQ SN:chr8_gl000196_random LN:38914
+ at SQ SN:chr8_gl000197_random LN:37175
+ at SQ SN:chr9_gl000198_random LN:90085
+ at SQ SN:chr9_gl000199_random LN:169874
+ at SQ SN:chr9_gl000200_random LN:187035
+ at SQ SN:chr9_gl000201_random LN:36148
+ at SQ SN:chr11_gl000202_random LN:40103
+ at SQ SN:chr17_ctg5_hap1 LN:1680828
+ at SQ SN:chr17_gl000203_random LN:37498
+ at SQ SN:chr17_gl000204_random LN:81310
+ at SQ SN:chr17_gl000205_random LN:174588
+ at SQ SN:chr17_gl000206_random LN:41001
+ at SQ SN:chr18_gl000207_random LN:4262
+ at SQ SN:chr19_gl000208_random LN:92689
+ at SQ SN:chr19_gl000209_random LN:159169
+ at SQ SN:chr21_gl000210_random LN:27682
+ at SQ SN:chrUn_gl000211 LN:166566
+ at SQ SN:chrUn_gl000212 LN:186858
+ at SQ SN:chrUn_gl000213 LN:164239
+ at SQ SN:chrUn_gl000214 LN:137718
+ at SQ SN:chrUn_gl000215 LN:172545
+ at SQ SN:chrUn_gl000216 LN:172294
+ at SQ SN:chrUn_gl000217 LN:172149
+ at SQ SN:chrUn_gl000218 LN:161147
+ at SQ SN:chrUn_gl000219 LN:179198
+ at SQ SN:chrUn_gl000220 LN:161802
+ at SQ SN:chrUn_gl000221 LN:155397
+ at SQ SN:chrUn_gl000222 LN:186861
+ at SQ SN:chrUn_gl000223 LN:180455
+ at SQ SN:chrUn_gl000224 LN:179693
+ at SQ SN:chrUn_gl000225 LN:211173
+ at SQ SN:chrUn_gl000226 LN:15008
+ at SQ SN:chrUn_gl000227 LN:128374
+ at SQ SN:chrUn_gl000228 LN:129120
+ at SQ SN:chrUn_gl000229 LN:19913
+ at SQ SN:chrUn_gl000230 LN:43691
+ at SQ SN:chrUn_gl000231 LN:27386
+ at SQ SN:chrUn_gl000232 LN:40652
+ at SQ SN:chrUn_gl000233 LN:45941
+ at SQ SN:chrUn_gl000234 LN:40531
+ at SQ SN:chrUn_gl000235 LN:34474
+ at SQ SN:chrUn_gl000236 LN:41934
+ at SQ SN:chrUn_gl000237 LN:45867
+ at SQ SN:chrUn_gl000238 LN:39939
+ at SQ SN:chrUn_gl000239 LN:33824
+ at SQ SN:chrUn_gl000240 LN:41933
+ at SQ SN:chrUn_gl000241 LN:42152
+ at SQ SN:chrUn_gl000242 LN:43523
+ at SQ SN:chrUn_gl000243 LN:43341
+ at SQ SN:chrUn_gl000244 LN:39929
+ at SQ SN:chrUn_gl000245 LN:36651
+ at SQ SN:chrUn_gl000246 LN:38154
+ at SQ SN:chrUn_gl000247 LN:36422
+ at SQ SN:chrUn_gl000248 LN:39786
+ at SQ SN:chrUn_gl000249 LN:38502
+ at RG ID:1 PL:Illumina PU:H7AP8ADXX_TAAGGCGA_1_NA12878 LB:library DS:SampleProject:NIST SM:NIST7035 CN:illuminafacility
+ at PG ID:bwa PN:bwa VN:0.7.5a-r405
+ at PG ID:MarkDuplicates PN:MarkDuplicates CL:net.sf.picard.sam.MarkDuplicates INPUT=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.bam] OUTPUT=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam METRICS_FILE=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.metrics TMP_DIR=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/.queue/tmp] VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
+HWI-D00119:50:H7AP8ADXX:1:1104:8519:18990 99 chrM 1 60 101M = 24 124 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG CCCFFFFFHFFHHJIJJIJGGJJJJJJJJJJJJJIGHIIEHIJJJJJJIJJJJIBGGIIIHIIIIHHHHDD;9CCDEDDDDDDDDDDEDDDDDDDDDDDDD X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
+HWI-D00119:50:H7AP8ADXX:1:2104:18479:82511 99 chrM 1 60 101M = 10 109 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG CCCFFFFFHFFHHJJJJIJJJJIIJJJJJGIJJJJGIJJJJJJJJGJIJJIJJJGHIJJJJJJJIHHHHDD@>CDDEDDDDDDDDDDEDDCDDDDD?BBD9 X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
+HWI-D00119:50:H7AP8ADXX:1:2105:7076:23015 163 chrM 1 60 101M = 59 159 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG @@CFFFDFGFHHHJIIJIJIJJJJJJJIIJJJJIIJIJFIIJJJJIIIGIJJJJDHIJIIJIJJJHHGGCB>BDDDDDDDDDDDBDDEDDDDDDDDDDDDD X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
=====================================
tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.unmapped.bam
=====================================
Binary files /dev/null and b/tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.unmapped.bam differ
=====================================
tests/data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.unmapped.sam
=====================================
@@ -0,0 +1,100 @@
+ at HD VN:1.4 SO:coordinate
+ at SQ SN:chrM LN:16571
+ at SQ SN:chr1 LN:249250621
+ at SQ SN:chr2 LN:243199373
+ at SQ SN:chr3 LN:198022430
+ at SQ SN:chr4 LN:191154276
+ at SQ SN:chr5 LN:180915260
+ at SQ SN:chr6 LN:171115067
+ at SQ SN:chr7 LN:159138663
+ at SQ SN:chr8 LN:146364022
+ at SQ SN:chr9 LN:141213431
+ at SQ SN:chr10 LN:135534747
+ at SQ SN:chr11 LN:135006516
+ at SQ SN:chr12 LN:133851895
+ at SQ SN:chr13 LN:115169878
+ at SQ SN:chr14 LN:107349540
+ at SQ SN:chr15 LN:102531392
+ at SQ SN:chr16 LN:90354753
+ at SQ SN:chr17 LN:81195210
+ at SQ SN:chr18 LN:78077248
+ at SQ SN:chr19 LN:59128983
+ at SQ SN:chr20 LN:63025520
+ at SQ SN:chr21 LN:48129895
+ at SQ SN:chr22 LN:51304566
+ at SQ SN:chrX LN:155270560
+ at SQ SN:chrY LN:59373566
+ at SQ SN:chr1_gl000191_random LN:106433
+ at SQ SN:chr1_gl000192_random LN:547496
+ at SQ SN:chr4_ctg9_hap1 LN:590426
+ at SQ SN:chr4_gl000193_random LN:189789
+ at SQ SN:chr4_gl000194_random LN:191469
+ at SQ SN:chr6_apd_hap1 LN:4622290
+ at SQ SN:chr6_cox_hap2 LN:4795371
+ at SQ SN:chr6_dbb_hap3 LN:4610396
+ at SQ SN:chr6_mann_hap4 LN:4683263
+ at SQ SN:chr6_mcf_hap5 LN:4833398
+ at SQ SN:chr6_qbl_hap6 LN:4611984
+ at SQ SN:chr6_ssto_hap7 LN:4928567
+ at SQ SN:chr7_gl000195_random LN:182896
+ at SQ SN:chr8_gl000196_random LN:38914
+ at SQ SN:chr8_gl000197_random LN:37175
+ at SQ SN:chr9_gl000198_random LN:90085
+ at SQ SN:chr9_gl000199_random LN:169874
+ at SQ SN:chr9_gl000200_random LN:187035
+ at SQ SN:chr9_gl000201_random LN:36148
+ at SQ SN:chr11_gl000202_random LN:40103
+ at SQ SN:chr17_ctg5_hap1 LN:1680828
+ at SQ SN:chr17_gl000203_random LN:37498
+ at SQ SN:chr17_gl000204_random LN:81310
+ at SQ SN:chr17_gl000205_random LN:174588
+ at SQ SN:chr17_gl000206_random LN:41001
+ at SQ SN:chr18_gl000207_random LN:4262
+ at SQ SN:chr19_gl000208_random LN:92689
+ at SQ SN:chr19_gl000209_random LN:159169
+ at SQ SN:chr21_gl000210_random LN:27682
+ at SQ SN:chrUn_gl000211 LN:166566
+ at SQ SN:chrUn_gl000212 LN:186858
+ at SQ SN:chrUn_gl000213 LN:164239
+ at SQ SN:chrUn_gl000214 LN:137718
+ at SQ SN:chrUn_gl000215 LN:172545
+ at SQ SN:chrUn_gl000216 LN:172294
+ at SQ SN:chrUn_gl000217 LN:172149
+ at SQ SN:chrUn_gl000218 LN:161147
+ at SQ SN:chrUn_gl000219 LN:179198
+ at SQ SN:chrUn_gl000220 LN:161802
+ at SQ SN:chrUn_gl000221 LN:155397
+ at SQ SN:chrUn_gl000222 LN:186861
+ at SQ SN:chrUn_gl000223 LN:180455
+ at SQ SN:chrUn_gl000224 LN:179693
+ at SQ SN:chrUn_gl000225 LN:211173
+ at SQ SN:chrUn_gl000226 LN:15008
+ at SQ SN:chrUn_gl000227 LN:128374
+ at SQ SN:chrUn_gl000228 LN:129120
+ at SQ SN:chrUn_gl000229 LN:19913
+ at SQ SN:chrUn_gl000230 LN:43691
+ at SQ SN:chrUn_gl000231 LN:27386
+ at SQ SN:chrUn_gl000232 LN:40652
+ at SQ SN:chrUn_gl000233 LN:45941
+ at SQ SN:chrUn_gl000234 LN:40531
+ at SQ SN:chrUn_gl000235 LN:34474
+ at SQ SN:chrUn_gl000236 LN:41934
+ at SQ SN:chrUn_gl000237 LN:45867
+ at SQ SN:chrUn_gl000238 LN:39939
+ at SQ SN:chrUn_gl000239 LN:33824
+ at SQ SN:chrUn_gl000240 LN:41933
+ at SQ SN:chrUn_gl000241 LN:42152
+ at SQ SN:chrUn_gl000242 LN:43523
+ at SQ SN:chrUn_gl000243 LN:43341
+ at SQ SN:chrUn_gl000244 LN:39929
+ at SQ SN:chrUn_gl000245 LN:36651
+ at SQ SN:chrUn_gl000246 LN:38154
+ at SQ SN:chrUn_gl000247 LN:36422
+ at SQ SN:chrUn_gl000248 LN:39786
+ at SQ SN:chrUn_gl000249 LN:38502
+ at RG ID:1 PL:Illumina PU:H7AP8ADXX_TAAGGCGA_1_NA12878 LB:library DS:SampleProject:NIST SM:NIST7035 CN:illuminafacility
+ at PG ID:bwa PN:bwa VN:0.7.5a-r405
+ at PG ID:MarkDuplicates PN:MarkDuplicates CL:net.sf.picard.sam.MarkDuplicates INPUT=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.bam] OUTPUT=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam METRICS_FILE=/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.metrics TMP_DIR=[/misc/ClinicalGenomicsPipeline/dev/projects/CLINICAL_NIST/work_dirs/2013_1211_162011/run/phase1_phase1_list/.queue/tmp] VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
+HWI-D00119:50:H7AP8ADXX:1:1104:8519:18990 4 chrM 1 60 101M = 24 124 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG CCCFFFFFHFFHHJIJJIJGGJJJJJJJJJJJJJIGHIIEHIJJJJJJIJJJJIBGGIIIHIIIIHHHHDD;9CCDEDDDDDDDDDDEDDDDDDDDDDDDD X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
+HWI-D00119:50:H7AP8ADXX:1:2104:18479:82511 4 chrM 1 60 101M = 10 109 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG CCCFFFFFHFFHHJJJJIJJJJIIJJJJJGIJJJJGIJJJJJJJJGJIJJIJJJGHIJJJJJJJIHHHHDD@>CDDEDDDDDDDDDDEDDCDDDDD?BBD9 X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
+HWI-D00119:50:H7AP8ADXX:1:2105:7076:23015 4 chrM 1 60 101M = 59 159 GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG @@CFFFDFGFHHHJIIJIJIJJJJJJJIIJJJJIIJIJFIIJJJJIIIGIJJJJDHIJIIJIJJJHHGGCB>BDDDDDDDDDDDBDDEDDDDDDDDDDDDD X0:i:1 X1:i:0 MD:Z:72G28 PG:Z:MarkDuplicates RG:Z:1 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
=====================================
tests/data/simple.unaligned.bam
=====================================
Binary files /dev/null and b/tests/data/simple.unaligned.bam differ
=====================================
tests/data/simplebamnoextension
=====================================
Binary files /dev/null and b/tests/data/simplebamnoextension differ
=====================================
tests/test_chunks.py
=====================================
@@ -1,9 +1,18 @@
+import textwrap
+
from pytest import raises
from io import BytesIO
+import dnaio
from dnaio import UnknownFileFormat, FileFormatError
from dnaio._core import paired_fastq_heads
-from dnaio.chunks import _fastq_head, _fasta_head, read_chunks, read_paired_chunks
+from dnaio.chunks import (
+ _fastq_head,
+ _fasta_head,
+ read_chunks,
+ read_paired_chunks,
+ _paired_fasta_heads,
+)
def test_fasta_head():
@@ -32,6 +41,52 @@ def test_fasta_head_with_comment():
assert _fasta_head(b"#\n>3\n5\n>") == 7
+def test_paired_fasta_heads():
+ def pheads(buf1, buf2):
+ return _paired_fasta_heads(buf1, buf2, len(buf1), len(buf2))
+
+ assert pheads(b"", b"") == (0, 0)
+ assert pheads(b">r", b">r") == (0, 0)
+ assert pheads(b">r\nA\n>s", b">r") == (0, 0)
+ assert pheads(b">r\nA\n>s", b">r\nCT\n>s") == (5, 6)
+ assert pheads(b">r\nA\n>s\nG\n>t\n", b">r\nCT\n>s") == (5, 6)
+
+ buf1 = (
+ textwrap.dedent(
+ """
+ >1
+ a
+ b
+ >2
+ c
+ >3
+ uv
+ """
+ )
+ .strip()
+ .encode()
+ )
+ buf2 = (
+ textwrap.dedent(
+ """
+ >1
+ def
+ >2
+ gh
+ i
+ >3
+ """
+ )
+ .strip()
+ .encode()
+ )
+
+ assert pheads(buf1, buf2) == (
+ len(b">1\na\nb\n>2\nc\n"),
+ len(b">1\ndef\n>2\ngh\ni\n"),
+ )
+
+
def test_paired_fastq_heads():
buf1 = b"first\nsecond\nthird\nfourth\nfifth"
buf2 = b"a\nb\nc\nd\ne\nf\ng"
@@ -67,13 +122,27 @@ def test_fastq_head():
assert _fastq_head(b"A\nB\nC\nD\nE\nF\nG\nH\nI\n") == 16
-def test_read_paired_chunks():
+def test_read_paired_chunks_fastq():
with open("tests/data/paired.1.fastq", "rb") as f1:
with open("tests/data/paired.2.fastq", "rb") as f2:
for c1, c2 in read_paired_chunks(f1, f2, buffer_size=128):
print(c1, c2)
+def test_paired_chunks_fasta(tmp_path):
+ for i in (1, 2):
+ with dnaio.open(f"tests/data/paired.{i}.fastq") as infile:
+ with dnaio.open(tmp_path / f"{i}.fasta", mode="w") as outfile:
+ for record in infile:
+ record.qualities = None
+ outfile.write(record)
+
+ with open(tmp_path / "1.fasta", "rb") as r1:
+ with open(tmp_path / "2.fasta", "rb") as r2:
+ for c1, c2 in read_paired_chunks(r1, r2, buffer_size=128):
+ print(c1.tobytes(), c2.tobytes())
+
+
def test_paired_chunks_different_number_of_records():
record = b"@r\nAA\n+\n##\n"
buf1 = record
=====================================
tests/test_internal.py
=====================================
@@ -1,3 +1,5 @@
+import gzip
+import io
import os
import shutil
import subprocess
@@ -12,6 +14,7 @@ from pytest import raises, mark
import dnaio
from dnaio import (
+ BamReader,
FileFormatError,
FastaFormatError,
FastqFormatError,
@@ -706,3 +709,97 @@ class TestRecordsAreMates:
with pytest.raises(TypeError) as error:
records_are_mates(SequenceRecord("A", "A", "A")) # type: ignore
error.match("records_are_mates requires at least two arguments")
+
+
+class TestBamReader:
+ bam_file = (
+ TEST_DATA / "project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878"
+ ".bwa.markDuplicates.unmapped.bam"
+ )
+ raw_bam_bytes = gzip.decompress(bam_file.read_bytes())
+ complete_record_with_header = raw_bam_bytes[:6661]
+ complete_header = complete_record_with_header[:6359]
+
+ def test_parse_bam(self):
+ with dnaio.open(self.bam_file) as reader:
+ records = list(reader)
+ assert len(records) == 3
+ assert reader.number_of_records == 3
+ assert records[0].name == "HWI-D00119:50:H7AP8ADXX:1:1104:8519:18990"
+ assert records[0].sequence == (
+ "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCT"
+ "GGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG"
+ )
+ assert records[0].qualities == (
+ "CCCFFFFFHFFHHJIJJIJGGJJJJJJJJJJJJJIGHIIEHIJJJJJJIJJJJIBGGIIIHIIII"
+ "HHHHDD;9CCDEDDDDDDDDDDEDDDDDDDDDDDDD"
+ )
+ assert records[1].name == "HWI-D00119:50:H7AP8ADXX:1:2104:18479:82511"
+ assert records[1].sequence == (
+ "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCT"
+ "GGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG"
+ )
+ assert records[1].qualities == (
+ "CCCFFFFFHFFHHJJJJIJJJJIIJJJJJGIJJJJGIJJJJJJJJGJIJJIJJJGHIJJJJJJJI"
+ "HHHHDD@>CDDEDDDDDDDDDDEDDCDDDDD?BBD9"
+ )
+ assert records[2].name == "HWI-D00119:50:H7AP8ADXX:1:2105:7076:23015"
+ assert records[2].sequence == (
+ "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCT"
+ "GGGGGGTATGCACGCGATAGCATTGCGAGACGCTGG"
+ )
+ assert records[2].qualities == (
+ "@@CFFFDFGFHHHJIIJIJIJJJJJJJIIJJJJIIJIJFIIJJJJIIIGIJJJJDHIJIIJIJJJ"
+ "HHGGCB>BDDDDDDDDDDDBDDEDDDDDDDDDDDDD"
+ )
+
+ def test_parse_header(self):
+ header = (
+ Path(__file__).parent
+ / "data"
+ / "project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa"
+ ".markDuplicates.header.sam"
+ )
+ header_bytes = header.read_bytes()
+ with dnaio.open(self.bam_file) as bam:
+ assert bam.header == header_bytes
+
+ @pytest.mark.parametrize(
+ "end", range(len(complete_header) + 1, len(complete_record_with_header))
+ )
+ def test_truncated_record(self, end: int):
+ file = io.BytesIO(self.complete_record_with_header[:end])
+ with pytest.raises(EOFError) as e:
+ list(BamReader(file))
+ e.match("Incomplete record at the end of file")
+
+ @pytest.mark.parametrize("end", [3, 5, 2000, 6000])
+ def test_truncated_header(self, end):
+ file = io.BytesIO(self.complete_record_with_header[:end])
+ with pytest.raises(EOFError) as e:
+ list(BamReader(file))
+ e.match("Truncated BAM file")
+
+ def test_bam_parser_not_binary_error(self):
+ file = io.StringIO(
+ "Don't be too proud of this technological terror you have constructed."
+ )
+ with pytest.raises(TypeError) as error:
+ BamReader(file)
+ error.match("binary IO")
+
+ @pytest.mark.parametrize("buffersize", [4, 8, 10, 20, 40])
+ def test_small_buffersize(self, buffersize):
+ reader = BamReader(str(self.bam_file), buffer_size=buffersize)
+ assert len(list(reader)) == 3
+
+ def test_error_on_mapped_bam(self):
+ bam = TEST_DATA / (
+ "project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878"
+ ".bwa.markDuplicates.bam"
+ )
+ reader = BamReader(str(bam))
+ it = iter(reader)
+ with pytest.raises(NotImplementedError) as error:
+ next(it)
+ assert error.match("unmapped single reads")
=====================================
tests/test_open.py
=====================================
@@ -182,6 +182,18 @@ def test_detect_compressed_fastq_from_content() -> None:
assert record.name == "prefix:1_13_573/1"
+def test_detect_bam_from_content() -> None:
+ with dnaio.open("tests/data/simplebamnoextension") as f:
+ record = next(iter(f))
+ assert record.name == "Myheader"
+
+
+def test_detect_bam_from_filename() -> None:
+ with dnaio.open("tests/data/simple.unaligned.bam") as f:
+ record = next(iter(f))
+ assert record.name == "Myheader"
+
+
def test_write(tmp_path, extension) -> None:
out_fastq = tmp_path / ("out.fastq" + extension)
with dnaio.open(str(out_fastq), mode="w") as f:
=====================================
tox.ini
=====================================
@@ -31,6 +31,15 @@ deps =
pytest
commands = mypy src/ tests/
+[testenv:asan]
+setenv=
+ PYTHONDEVMODE=1
+ PYTHONMALLOC=malloc
+ CFLAGS=-lasan -fsanitize=address -fno-omit-frame-pointer
+allowlist_externals=bash
+commands=
+ bash -c 'export LD_PRELOAD=$(gcc -print-file-name=libasan.so) && printenv LD_PRELOAD && python -c "import dnaio" && pytest tests'
+
[testenv:docs]
basepython = python3.10
changedir = doc
View it on GitLab: https://salsa.debian.org/med-team/python-dnaio/-/commit/23bbce11b11a2fca7e6d96f59c33e6b18a709282