[med-svn] [Git][med-team/python-cogent][upstream] New upstream version 2026.6.2a0+dfsg
Karsten Schöke (@karso)
gitlab at salsa.debian.org
Fri Jun 5 05:22:23 BST 2026
Karsten Schöke pushed to branch upstream at Debian Med / python-cogent
Commits:
a39393dc by Karsten Schöke at 2026-06-04T13:08:20+02:00
New upstream version 2026.6.2a0+dfsg
- - - - -
12 changed files:
- .github/workflows/codeql.yml
- changelog.md
- doc/cookbook/annotation_db.rst
- doc/cookbook/features.rst
- src/cogent3/_version.py
- src/cogent3/core/alignment.py
- src/cogent3/core/annotation_db.py
- src/cogent3/core/sequence.py
- src/cogent3/evolve/coevolution.py
- tests/test_core/test_aln_annotation.py
- tests/test_core/test_annotation_db.py
- tests/test_evolve/test_coevolution.py
Changes:
=====================================
.github/workflows/codeql.yml
=====================================
@@ -43,7 +43,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
- uses: github/codeql-action/init@v4
+ uses: github/codeql-action/init@v4.35.5
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -57,7 +57,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
- uses: github/codeql-action/autobuild@v4
+ uses: github/codeql-action/autobuild@v4.35.5
# ℹ️ Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
@@ -70,6 +70,6 @@ jobs:
# ./location_of_script_within_repo/buildscript.sh
- name: Perform CodeQL Analysis
- uses: github/codeql-action/analyze@v4
+ uses: github/codeql-action/analyze@v4.35.5
with:
category: "/language:${{matrix.language}}"
=====================================
changelog.md
=====================================
@@ -1,4 +1,38 @@
+<a id='changelog-2026.6.2a0'></a>
+# Changes in release "2026.6.2a0"
+
+A bug fix and minor enhancement release.
+
+## Contributors
+
+- @GavinHuttley
+- @sanvila reported a bug and proposed a fix 🚀
+
+## Enhancements
+
+- Added `iter_fastq_records` to `cogent3.parse.fastq` for streaming fastq records
+ as `(label, sequence, quality)` tuples with optional bytes converters.
+- Added `make_qual_converter` and `PhredEncoding` to `cogent3.core.alphabet` for
+ mapping Phred+33 / Phred+64 quality ASCII into `numpy.uint8` score arrays.
+- Added a `limit` keyword argument to all methods that select annotations. It is
+ propagated through to the `AnnotationDb.get_(features|records)_matching()` methods
+ and applied in aggregate across all tables.
+
+## Bug fixes
+
+- `load_annotations()` now correctly uses `format_name`. Previously, when the file
+ name suffix did not match a known format, the `format_name` argument had no
+ effect.
+- The `coevolution_matrix()` function now correctly caps `max_workers`. Thanks to
+ @sanvila for reporting the bug and proposing the fix!
+
+## Documentation
+
+- Added cookbook sections covering direct fastq parsing with `iter_fastq_records`
+ and constructing quality-score converters with `make_qual_converter`.
+- Added docs describing the new `limit` argument.
+
<a id='changelog-2026.5.19a0'></a>
# Changes in release "2026.5.19a0"
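The changelog says `limit` is applied in aggregate across all tables. Conceptually, that matches taking the first `limit` items of the concatenated per-table results, in table order. A minimal sketch of that semantics, using plain lists as hypothetical stand-ins for the "gff" and "user" tables (this is not cogent3 code):

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import chain, islice


def first_n_across(tables: Iterable[Iterable[str]], limit: int | None) -> Iterator[str]:
    """Yield at most `limit` items in total from the tables, in table order."""
    combined = chain.from_iterable(tables)
    # islice(..., None) yields everything, mirroring limit=None
    return islice(combined, limit)


# hypothetical per-table results: 2 rows in "gff", 2 in "user"
gff_rows = ["gff_exon0", "gff_exon1"]
user_rows = ["user_exon0", "user_exon1"]
print(list(first_n_across([gff_rows, user_rows], 3)))
```

Unlike `islice`, which happily accepts `0`, the methods in this release reject `limit=0` (and negative values) with a `ValueError`.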
=====================================
doc/cookbook/annotation_db.rst
=====================================
@@ -185,6 +185,25 @@ For example, we can query for all CDS related to replication:
.. note:: Extended attribute querying only works for GFF databases!
+Limiting the number of returned records
+"""""""""""""""""""""""""""""""""""""""
+
+Both ``get_features_matching()`` and ``get_records_matching()`` accept an optional ``limit`` argument that caps the total number of records yielded. This is useful when probing a large database or building previews.
+
+.. jupyter-execute::
+
+ first_two_genes = list(gff_db.get_features_matching(biotype="gene", limit=2))
+ first_two_genes
+
+The same applies to ``get_records_matching()``:
+
+.. jupyter-execute::
+
+ first_record = list(gff_db.get_records_matching(biotype="CDS", limit=1))
+ first_record
+
+``limit`` counts across all internal tables (e.g. the "gff" and "user" tables of a ``GffAnnotationDb``). It must be a positive integer; ``limit=None`` (the default) returns all matching records.
+
How to interrogate an ``AnnotationDb``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
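The documentation states that `limit` must be a positive integer. Because `get_features_matching()` and `get_records_matching()` are generator functions, the check in their body only runs once iteration starts, which is why the new tests wrap each call in `list(...)`. A minimal sketch of that behaviour, using a hypothetical `matching()` generator in place of the real methods:

```python
from __future__ import annotations

from collections.abc import Iterator


def matching(records: list[str], limit: int | None = None) -> Iterator[str]:
    """Hypothetical stand-in for a generator method that validates `limit`."""
    # this check runs on the first next() call, not when matching() is called
    if limit is not None and limit <= 0:
        msg = f"limit must be positive, got {limit!r}"
        raise ValueError(msg)
    for count, record in enumerate(records, start=1):
        yield record
        if limit is not None and count >= limit:
            return


gen = matching(["exon0", "exon1"], limit=0)  # no error yet
try:
    next(gen)  # the ValueError surfaces here
except ValueError as err:
    print(err)
```

Callers that want an invalid `limit` to fail at call time would need to validate it themselves before creating the generator.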
=====================================
doc/cookbook/features.rst
=====================================
@@ -344,6 +344,18 @@ We can again provide a combination of conditions, for example, querying for all
mRNA = list(seq.get_features(start=10148, stop=29322, biotype="mRNA"))[0]
mRNA
+Limiting the number of returned Features
+++++++++++++++++++++++++++++++++++++++++
+
+``Sequence.get_features()`` and the equivalent methods on ``SequenceCollection`` and ``Alignment`` all accept an optional ``limit`` argument that caps the total number of features yielded. This is handy when a query would otherwise return a very large set.
+
+.. jupyter-execute::
+
+ first_three = list(seq.get_features(biotype="CDS", limit=3))
+ first_three
+
+``limit`` must be a positive integer; ``limit=None`` (the default) returns all matching features. For an ``Alignment`` the limit applies to the combined total of sequence-level and alignment-level features.
+
Querying a Sequence (via an Alignment) for Features
"""""""""""""""""""""""""""""""""""""""""""""""""""
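For an ``Alignment``, the ``alignment.py`` hunk of this commit first yields sequence-level features and then alignment-level ones, handing whatever budget is left to the second query. A minimal sketch of that two-phase scheme, with hypothetical lists standing in for the two kinds of features (the real method queries the annotation db instead):

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence
from itertools import islice


def combined_features(
    seq_level: Sequence[str],
    aln_level: Sequence[str],
    limit: int | None = None,
) -> Iterator[str]:
    """Yield sequence-level then alignment-level features, at most `limit` in total."""
    remaining = limit
    for feature in islice(seq_level, remaining):
        yield feature
        if remaining is not None:
            remaining -= 1
    if remaining == 0:
        # budget exhausted by sequence-level features
        return
    yield from islice(aln_level, remaining)


# mirrors the test fixture: 4 sequence-level + 3 alignment-level exons
seq_exons = ["s1_exon0", "s1_exon1", "s2_exon0", "s2_exon1"]
aln_exons = ["aln_exon0", "aln_exon1", "aln_exon2"]
print(list(combined_features(seq_exons, aln_exons, limit=5)))
```

With ``limit=5`` the sketch returns all four sequence-level exons plus the first alignment-level one, matching the ``min(limit, 7)`` expectation in the new tests.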
=====================================
src/cogent3/_version.py
=====================================
@@ -1 +1 @@
-__version__ = "2026.5.25a0"
+__version__ = "2026.6.2a0"
=====================================
src/cogent3/core/alignment.py
=====================================
@@ -575,6 +575,7 @@ class CollectionBase(AnnotatableMixin, ABC, Generic[TSequenceOrAligned]):
biotype: str | tuple[str, ...] | list[str] | set[str] | None = None,
name: str | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
**kwargs: Any,
) -> Iterator[Feature[Any]]: ...
@@ -2300,6 +2301,7 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
allow_partial: bool = False,
start: int | None = None,
stop: int | None = None,
+ limit: int | None = None,
**kwargs: Any,
) -> Iterator[Feature[c3_sequence.Sequence]]:
"""yields Feature instances
@@ -2318,6 +2320,9 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
stop position of the feature (inclusive)
allow_partial
allow features partially overlapping self
+ limit
+ maximum total number of features to yield across all sequences.
+ If None, all matching features are returned. Must be positive.
kwargs
additional keyword arguments to query the annotation db
@@ -2333,6 +2338,10 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
if not self.has_annotation_db():
return None
+ if limit is not None and limit <= 0:
+ msg = f"limit must be positive, got {limit!r}"
+ raise ValueError(msg)
+
if seqid and (seqid not in self.names):
msg = f"unknown {seqid=}"
raise ValueError(msg)
@@ -2341,7 +2350,8 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
# if no seqids provided, we do direct search to find the seqids
# that match the other parameters. This matters for collections
# with large numbers of sequences due to the overhead of creating
- # Sequence instances
+ # Sequence instances. We do NOT apply limit here because we need
+ # to discover all candidate seqids before yielding from any of them.
matched = {
record["seqid"]
for record in self.annotation_db.get_features_matching(
@@ -2359,16 +2369,23 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
"list[str]", [seqid] if isinstance(seqid, str) else self.names
)
+ yielded = 0
for seqid in seqids:
seq = self.seqs[seqid]
- yield from seq.get_features(
+ remaining = None if limit is None else limit - yielded
+ for feature in seq.get_features(
biotype=biotype,
name=name,
start=start,
stop=stop,
allow_partial=allow_partial,
+ limit=remaining,
**kwargs,
- )
+ ):
+ yield feature
+ yielded += 1
+ if limit is not None and yielded >= limit:
+ return
def is_ragged(self) -> bool:
"""returns True if sequences are of different lengths"""
@@ -3372,6 +3389,7 @@ class Alignment(CollectionBase[Aligned]):
biotype: str | None = None,
name: str | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
) -> Iterator[Feature[Alignment]]:
"""yields Feature instances
@@ -3385,6 +3403,9 @@ class Alignment(CollectionBase[Aligned]):
name of the feature
allow_partial
allow features partially overlapping self
+ limit
+ maximum total number of features to yield across all sequences.
+ If None, all matching features are returned.
Notes
-----
@@ -3407,6 +3428,7 @@ class Alignment(CollectionBase[Aligned]):
msg = f"unknown {seqid=}"
raise ValueError(msg)
+ yielded = 0
for seqid in seqids:
seqname = seqid_to_seqname[seqid]
seq = self.seqs[seqname]
@@ -3417,6 +3439,7 @@ class Alignment(CollectionBase[Aligned]):
# to the alignment coordinates
offset = self.storage.offset.get(seqid, 0)
+ remaining = None if limit is None else limit - yielded
for feature in self.annotation_db.get_features_matching(
seqid=parent_id,
biotype=biotype,
@@ -3425,11 +3448,15 @@ class Alignment(CollectionBase[Aligned]):
allow_partial=allow_partial,
start=start + offset,
stop=stop + offset,
+ limit=remaining,
):
if offset:
feature["spans"] = (numpy.array(feature["spans"]) - offset).tolist()
# passing self only used when self is an Alignment
yield seq.make_feature(feature, self)
+ yielded += 1
+ if limit is not None and yielded >= limit:
+ return
def get_features(
self,
@@ -3439,6 +3466,7 @@ class Alignment(CollectionBase[Aligned]):
name: str | None = None,
allow_partial: bool = False,
on_alignment: bool | None = None,
+ limit: int | None = None,
**kwargs: Any,
) -> Iterator[Feature[Alignment]]:
"""yields Feature instances
@@ -3456,6 +3484,10 @@ class Alignment(CollectionBase[Aligned]):
SequenceCollection instances.
allow_partial
allow features partially overlapping self
+ limit
+ maximum total number of features to yield across sequence-level
+ and alignment-level features. If None, all matching features
+ are returned. Must be positive.
Notes
-----
@@ -3468,42 +3500,48 @@ class Alignment(CollectionBase[Aligned]):
if not self.has_annotation_db() or not len(self._annotation_db):
return None
- # we only do on-alignment in here
+ if limit is not None and limit <= 0:
+ msg = f"limit must be positive, got {limit!r}"
+ raise ValueError(msg)
+
+ remaining = limit
if not on_alignment:
- local_vars = locals()
- kwargs = {k: v for k, v in local_vars.items() if k != "self"}
- kwargs.pop("on_alignment")
- yield from self._get_seq_features(**kwargs)
+ for feature in self._get_seq_features(
+ seqid=cast("str | None", seqid),
+ biotype=cast("str | None", biotype),
+ name=name,
+ allow_partial=allow_partial,
+ limit=remaining,
+ ):
+ yield feature
+ if remaining is not None:
+ remaining -= 1
if on_alignment == False: # noqa: E712
return
+ if remaining == 0:
+ return
+
seq_map = None
- for feature in self.annotation_db.get_features_matching(
+ strand: int | str | None
+ for record in self.annotation_db.get_features_matching(
biotype=biotype,
name=name,
- on_alignment=on_alignment,
+ on_alignment=True,
allow_partial=allow_partial,
+ limit=remaining,
):
- if feature["seqid"]:
- continue
- on_al = cast("bool", feature.pop("on_alignment", on_alignment)) # type: ignore[misc]
- if feature["seqid"]:
- msg = f"{on_alignment=} {feature=}"
- raise RuntimeError(msg)
-
- strand: int | str | None
+ on_al = cast("bool", record.pop("on_alignment", True)) # type: ignore[misc]
if seq_map is None:
seq_map = self.seqs[0].map.to_feature_map()
*_, strand = self.seqs[0].seq.parent_coordinates()
else:
- strand = feature.pop("strand", None) # type: ignore[misc]
-
- spans = seq_map.relative_position(numpy.array(feature["spans"]))
- feature["spans"] = spans.tolist()
- # and if i've been reversed...?
- feature["strand"] = cast("int", Strand.from_value(strand).value)
- yield self.make_feature(feature=feature, on_alignment=on_al)
+ strand = record.pop("strand", None) # type: ignore[misc]
+ spans = seq_map.relative_position(numpy.array(record["spans"]))
+ record["spans"] = spans.tolist()
+ record["strand"] = cast("int", Strand.from_value(strand).value)
+ yield self.make_feature(feature=record, on_alignment=on_al)
def is_ragged(self) -> bool:
"""by definition False for an Alignment"""
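In `SequenceCollection.get_features()` the remaining budget (`limit - yielded`) is passed down to each sequence's own `get_features()` call, so a per-sequence query never asks the database for more rows than could still be used, and later sequences are not queried at all once the budget is spent. A minimal sketch of that push-down, where the hypothetical `query_seq()` logs the limit it receives so the shrinking budget is visible:

```python
from __future__ import annotations

from collections.abc import Iterator

# hypothetical per-sequence annotations: 3 exons on each of two sequences
FEATURES = {
    "s1": ["s1_exon0", "s1_exon1", "s1_exon2"],
    "s2": ["s2_exon0", "s2_exon1", "s2_exon2"],
}
requested: list[tuple[str, int | None]] = []


def query_seq(seqid: str, limit: int | None) -> Iterator[str]:
    """Stand-in for Sequence.get_features(); records the limit it was given."""
    requested.append((seqid, limit))
    yield from FEATURES[seqid][:limit]


def collection_features(seqids: list[str], limit: int | None = None) -> Iterator[str]:
    """Yield features across sequences, pushing the remaining budget downstream."""
    yielded = 0
    for seqid in seqids:
        remaining = None if limit is None else limit - yielded
        for feature in query_seq(seqid, remaining):
            yield feature
            yielded += 1
            if limit is not None and yielded >= limit:
                return
```

For `limit=5`, `s1` is asked for up to 5 features and returns 3, then `s2` is asked for only 2. Simply truncating the combined output with `islice` would give the same features, but without shrinking the per-sequence queries.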
=====================================
src/cogent3/core/annotation_db.py
=====================================
@@ -470,6 +470,7 @@ class AnnotationDbABC(abc.ABC):
attributes: str | None = None,
on_alignment: bool | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
) -> Iterator[FeatureDataType]: ...
@abc.abstractmethod
@@ -485,6 +486,7 @@ class AnnotationDbABC(abc.ABC):
attributes: str | None = None,
on_alignment: bool | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
) -> Iterator[dict[str, Any]]: ...
@abc.abstractmethod
@@ -720,6 +722,7 @@ def _select_records_sql(
conditions: dict[str, Any],
columns: Iterable[str] | None = None,
allow_partial: bool = True,
+ limit: int | None = None,
) -> tuple[str, tuple[Any, ...] | None]:
"""create SQL select statement and values
@@ -734,6 +737,9 @@ def _select_records_sql(
allow_partial
if False, only records within start, stop are included. If True,
all records that overlap the segment defined by start, stop are included.
+ limit
+ maximum number of records to return. If None, all matching records
+ are returned.
Returns
-------
@@ -747,10 +753,11 @@ def _select_records_sql(
)
columns_str = f"{', '.join(columns)}" if columns else "*"
sql = f"SELECT {columns_str} FROM {table_name}"
+ limit_clause = f" LIMIT {limit}" if limit is not None else ""
if not where:
- return sql, None
+ return f"{sql}{limit_clause}", None
- sql = f"{sql} WHERE {where};"
+ sql = f"{sql} WHERE {where}{limit_clause};"
return sql, vals
@@ -1504,6 +1511,7 @@ class SqliteAnnotationDbMixin:
kwargs["attributes"] = f"%{kwargs['attributes']}%"
columns = kwargs.pop("columns", None)
allow_partial = kwargs.pop("allow_partial", False)
+ limit = kwargs.pop("limit", None)
# Translate query conditions and column names for normalized schema
if self._lookup_cache is not None:
@@ -1515,6 +1523,7 @@ class SqliteAnnotationDbMixin:
conditions=kwargs,
columns=columns,
allow_partial=allow_partial,
+ limit=limit,
)
with contextlib.suppress(sqlite3.ProgrammingError):
# garbage collection issue
@@ -1797,11 +1806,22 @@ class SqliteAnnotationDbMixin:
attributes: str | None = None,
on_alignment: bool | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
) -> Iterator[dict[str, Any]]:
- """return all fields for matching records"""
+ """return all fields for matching records
+
+ Parameters
+ ----------
+ limit
+ maximum total number of records to yield across all tables.
+ If None, all matching records are returned. Must be positive.
+ """
# a record is Everything, a Feature is a subset
# we define query as all defined variables from local name space,
# excluding "self" and kwargs at default values
+ if limit is not None and limit <= 0:
+ msg = f"limit must be positive, got {limit!r}"
+ raise ValueError(msg)
local_vars = locals()
kwargs = {k: v for k, v in local_vars.items() if k != "self" and v is not None}
if "strand" in kwargs:
@@ -1809,10 +1829,16 @@ class SqliteAnnotationDbMixin:
# alignment features are created by the user specific
table_names = ["user"] if on_alignment else self.table_names
+ yielded = 0
for table_name in table_names:
+ if limit is not None:
+ kwargs["limit"] = limit - yielded
for result in self._get_records_matching(table_name, **kwargs):
res = dict(zip(result.keys(), result, strict=False))
yield self._translate_record_from_ids(res)
+ yielded += 1
+ if limit is not None and yielded >= limit:
+ return
def get_features_matching(
self,
@@ -1826,10 +1852,22 @@ class SqliteAnnotationDbMixin:
attributes: str | None = None,
on_alignment: bool | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
) -> Iterator[FeatureDataType]:
+ """yield essential values to create a Feature for matching records
+
+ Parameters
+ ----------
+ limit
+ maximum total number of features to yield across all tables.
+ If None, all matching features are returned. Must be positive.
+ """
# returns essential values to create a Feature
# we define query as all defined variables from local name space,
# excluding "self" and kwargs with a default value of None
+ if limit is not None and limit <= 0:
+ msg = f"limit must be positive, got {limit!r}"
+ raise ValueError(msg)
local_vars = locals()
kwargs = {k: v for k, v in local_vars.items() if k != "self" and v is not None}
if "strand" in kwargs:
@@ -1837,6 +1875,7 @@ class SqliteAnnotationDbMixin:
# alignment features are created by the user specific
table_names = ["user"] if on_alignment else self.table_names
+ yielded = 0
for table_name in table_names:
columns: tuple[str, ...] = ("seqid", "biotype", "spans", "strand", "name")
query_args = {**kwargs}
@@ -1846,6 +1885,9 @@ class SqliteAnnotationDbMixin:
else:
query_args.pop("on_alignment", None)
+ if limit is not None:
+ query_args["limit"] = limit - yielded
+
for result in self._get_records_matching(
table_name=table_name,
columns=columns,
@@ -1857,6 +1899,9 @@ class SqliteAnnotationDbMixin:
res["on_alignment"] = res.get("on_alignment")
res["spans"] = [cast("tuple[int, int]", tuple(c)) for c in res["spans"]]
yield cast("FeatureDataType", res)
+ yielded += 1
+ if limit is not None and yielded >= limit:
+ return
def num_matches(
self,
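In `annotation_db.py` each table is asked for at most `limit - yielded` rows, and `_select_records_sql()` turns that into a SQL `LIMIT` clause. The sketch below reproduces the idea with the standard-library `sqlite3` module. The two tables are named after the "gff" and "user" tables, but their schema is an assumption, and `make_db()` and `records_matching()` are hypothetical helpers. Unlike the diff, which interpolates the integer into the SQL string, the sketch binds the limit as a parameter, which SQLite allows. The up-front check matters because SQLite treats a negative `LIMIT` as "no upper bound".

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator

TABLES = ("gff", "user")  # hypothetical table names mirroring GffAnnotationDb


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        # assumed minimal schema; the real tables have many more columns
        conn.execute(f"CREATE TABLE {table} (name TEXT, biotype TEXT)")
    conn.executemany("INSERT INTO gff VALUES (?, ?)", [("exon0", "exon"), ("exon1", "exon")])
    conn.executemany(
        "INSERT INTO user VALUES (?, ?)", [("user_exon0", "exon"), ("user_exon1", "exon")]
    )
    return conn


def records_matching(
    conn: sqlite3.Connection, biotype: str, limit: int | None = None
) -> Iterator[str]:
    if limit is not None and limit <= 0:
        # SQLite would treat a negative LIMIT as unbounded, so reject it here
        raise ValueError(f"limit must be positive, got {limit!r}")
    yielded = 0
    for table in TABLES:
        sql = f"SELECT name FROM {table} WHERE biotype = ?"
        params: tuple[object, ...] = (biotype,)
        if limit is not None:
            # only ask this table for what is still needed
            sql += " LIMIT ?"
            params += (limit - yielded,)
        for (name,) in conn.execute(sql, params):
            yield name
            yielded += 1
            if limit is not None and yielded >= limit:
                return
```

The four rows and the `(limit, expected)` pairs used in the assertions mirror the `gff_db_with_user_exons` fixture and `test_get_features_matching_limit_spans_tables` in this commit's tests.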
=====================================
src/cogent3/core/sequence.py
=====================================
@@ -1129,6 +1129,7 @@ class Sequence(AnnotatableMixin):
start: int | None = None,
stop: int | None = None,
allow_partial: bool = False,
+ limit: int | None = None,
**kwargs: Any,
) -> Iterator[Feature[Sequence]]:
"""yields Feature instances
@@ -1142,6 +1143,9 @@ class Sequence(AnnotatableMixin):
start, stop
start, stop positions to search between, relative to offset
of this sequence. If not provided, entire span of sequence is used.
+ limit
+ maximum number of features to yield. If None, all matching
+ features are returned. Must be positive.
kwargs
keyword arguments passed to annotation_db.get_features_matching()
@@ -1215,7 +1219,7 @@ class Sequence(AnnotatableMixin):
# To piggy-back on that method we need to convert our feature spans
# into the current orientation. HOWEVER, we also have the reversed
# flag which comes back from the db
- kwargs |= {"allow_partial": allow_partial}
+ kwargs |= {"allow_partial": allow_partial, "limit": limit}
for feature in self.annotation_db.get_features_matching(
seqid=parent_id,
name=name,
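`Sequence.get_features()` now forwards `limit` unconditionally through `kwargs |= {...}`, even when it is `None`. That works because, in the `annotation_db.py` hunk, `get_features_matching()` builds its query dict from `locals()` and discards every value that is `None`. A minimal sketch of that filtering pattern, using a hypothetical `build_query()` function with a reduced parameter list:

```python
from __future__ import annotations

from typing import Any


def build_query(
    seqid: str | None = None,
    biotype: str | None = None,
    allow_partial: bool = False,
    limit: int | None = None,
) -> dict[str, Any]:
    """Collect the arguments that were actually set, as get_features_matching() does."""
    local_vars = locals()
    # None means "not specified", so it never reaches the SQL layer
    return {k: v for k, v in local_vars.items() if v is not None}


kwargs: dict[str, Any] = {"biotype": "exon"}
kwargs |= {"allow_partial": False, "limit": None}  # same merge as in Sequence.get_features()
print(build_query(seqid="s1", **kwargs))
```

Note that `allow_partial=False` survives the filter, because only `None` is dropped.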
=====================================
src/cogent3/evolve/coevolution.py
=====================================
@@ -869,6 +869,8 @@ def coevolution_matrix(
prev_threads = None
if max_workers is not None and parallel:
prev_threads = numba.get_num_threads()
+ # don't exceed available threads
+ max_workers = min(max_workers, numba.get_num_threads())
numba.set_num_threads(max_workers)
try:
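The `coevolution.py` fix clamps the requested worker count to `numba.get_num_threads()` before calling `numba.set_num_threads()`, which only accepts values up to the number of threads numba was launched with. The surrounding code saves the previous value in `prev_threads`. Restoring it presumably happens after the `try:` block, but that part is outside the hunk, so it is an assumption here. A minimal sketch of the clamp-and-restore pattern, wrapped in a context manager rather than the inline `try` used in the diff. A hypothetical `ThreadPoolConfig` class stands in for numba so the sketch runs without it, and the `parallel` flag is omitted:

```python
from __future__ import annotations

from collections.abc import Iterator
from contextlib import contextmanager


class ThreadPoolConfig:
    """Hypothetical stand-in for numba's get/set_num_threads pair."""

    def __init__(self, launched: int) -> None:
        self.launched = launched
        self.current = launched

    def get_num_threads(self) -> int:
        return self.current

    def set_num_threads(self, n: int) -> None:
        # mimic a pool that cannot grow beyond its launch size
        if not 1 <= n <= self.launched:
            raise ValueError(f"number of threads must be between 1 and {self.launched}")
        self.current = n


@contextmanager
def capped_threads(pool: ThreadPoolConfig, max_workers: int | None) -> Iterator[int]:
    prev_threads = pool.get_num_threads()
    if max_workers is not None:
        # don't exceed available threads (the fix in this commit)
        pool.set_num_threads(min(max_workers, prev_threads))
    try:
        yield pool.get_num_threads()
    finally:
        pool.set_num_threads(prev_threads)
```

With `max_workers=20_000` on an 8-thread pool, the sketch runs with 8 threads instead of raising, which is the behaviour the updated `test_par_kw_max_workers` checks.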
=====================================
tests/test_core/test_aln_annotation.py
=====================================
@@ -1099,3 +1099,99 @@ def test_get_feature_seqs_offset_minus_strand(mk_cls):
got = got if mk_cls == c3_alignment.make_unaligned_seqs else got.get_seq("s1")
assert str(got) == expect
close_dbs(coll)
+
+
+@pytest.fixture
+def coll_with_many_features():
+ data = {
+ "s1": "AAAATTTTGGGGCCCC",
+ "s2": "AAAATTTTGGGGCCCC",
+ }
+ coll = c3_alignment.make_unaligned_seqs(data, moltype="dna")
+ for seqid in ("s1", "s2"):
+ for i in range(3):
+ coll.annotation_db.add_feature(
+ seqid=seqid,
+ biotype="exon",
+ name=f"{seqid}_exon{i}",
+ spans=[(i * 4, i * 4 + 3)],
+ strand="+",
+ )
+ yield coll
+ close_dbs(coll)
+
+
+@pytest.mark.parametrize("limit", [1, 3, 5, 6, 100])
+def test_collection_get_features_limit(coll_with_many_features, limit):
+ coll = coll_with_many_features
+ got = list(coll.get_features(biotype="exon", limit=limit))
+ assert len(got) == min(limit, 6)
+
+
+def test_collection_get_features_limit_none(coll_with_many_features):
+ coll = coll_with_many_features
+ full = list(coll.get_features(biotype="exon"))
+ assert len(full) == 6
+ same = list(coll.get_features(biotype="exon", limit=None))
+ assert len(same) == 6
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_collection_get_features_limit_invalid(coll_with_many_features, limit):
+ with pytest.raises(ValueError):
+ list(coll_with_many_features.get_features(biotype="exon", limit=limit))
+
+
+@pytest.fixture
+def aln_with_many_features():
+ data = {
+ "s1": "AAAATTTTGGGGCCCC",
+ "s2": "AAAATTTTGGGGCCCC",
+ }
+ aln = c3_alignment.make_aligned_seqs(data, moltype="dna")
+ # 4 sequence-level features
+ for seqid in ("s1", "s2"):
+ for i in range(2):
+ aln.annotation_db.add_feature(
+ seqid=seqid,
+ biotype="exon",
+ name=f"{seqid}_exon{i}",
+ spans=[(i * 4, i * 4 + 3)],
+ strand="+",
+ )
+ # 3 alignment-level features
+ for i in range(3):
+ aln.add_feature(
+ biotype="exon",
+ name=f"aln_exon{i}",
+ spans=[(i * 4, i * 4 + 2)],
+ on_alignment=True,
+ )
+ yield aln
+ close_dbs(aln)
+
+
+@pytest.mark.parametrize("limit", [1, 3, 5, 7, 100])
+def test_alignment_get_features_limit(aln_with_many_features, limit):
+ aln = aln_with_many_features
+ got = list(aln.get_features(biotype="exon", limit=limit))
+ # 4 seq-level + 3 aln-level = 7 total
+ assert len(got) == min(limit, 7)
+
+
+def test_alignment_get_features_limit_on_alignment(aln_with_many_features):
+ aln = aln_with_many_features
+ got = list(aln.get_features(biotype="exon", on_alignment=True, limit=2))
+ assert len(got) == 2
+
+
+def test_alignment_get_features_limit_seq_only(aln_with_many_features):
+ aln = aln_with_many_features
+ got = list(aln.get_features(biotype="exon", on_alignment=False, limit=3))
+ assert len(got) == 3
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_alignment_get_features_limit_invalid(aln_with_many_features, limit):
+ with pytest.raises(ValueError):
+ list(aln_with_many_features.get_features(biotype="exon", limit=limit))
=====================================
tests/test_core/test_annotation_db.py
=====================================
@@ -757,6 +757,145 @@ def test_get_features_matching_start_stop_seqview(DATA_DIR, seq):
close_dbs(db)
+@pytest.fixture
+def populated_basic_db() -> BasicAnnotationDb:
+ db = BasicAnnotationDb()
+ for i in range(5):
+ db.add_feature(
+ seqid="seq1",
+ biotype="exon",
+ name=f"exon{i}",
+ spans=[(i * 10, i * 10 + 5)],
+ strand="+",
+ )
+ yield db
+ db.close()
+
+
+@pytest.mark.parametrize("limit", [1, 2, 5, 10])
+def test_get_features_matching_limit_basic(populated_basic_db, limit):
+ got = list(populated_basic_db.get_features_matching(biotype="exon", limit=limit))
+ assert len(got) == min(limit, 5)
+
+
+@pytest.mark.parametrize("limit", [1, 2, 5, 10])
+def test_get_records_matching_limit_basic(populated_basic_db, limit):
+ got = list(populated_basic_db.get_records_matching(biotype="exon", limit=limit))
+ assert len(got) == min(limit, 5)
+
+
+def test_get_features_matching_limit_none_equivalent_to_unlimited(populated_basic_db):
+ default = list(populated_basic_db.get_features_matching(biotype="exon"))
+ with_none = list(
+ populated_basic_db.get_features_matching(biotype="exon", limit=None)
+ )
+ assert len(default) == len(with_none) == 5
+
+
+@pytest.mark.parametrize("limit", [0, -1, -5])
+def test_get_features_matching_limit_invalid(populated_basic_db, limit):
+ with pytest.raises(ValueError):
+ # generator must be consumed for the exception to fire
+ list(populated_basic_db.get_features_matching(limit=limit))
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_get_records_matching_limit_invalid(populated_basic_db, limit):
+ with pytest.raises(ValueError):
+ list(populated_basic_db.get_records_matching(limit=limit))
+
+
+def test_get_features_matching_limit_with_filters(populated_basic_db):
+ got = list(
+ populated_basic_db.get_features_matching(
+ biotype="exon",
+ start=0,
+ stop=25,
+ allow_partial=True,
+ limit=2,
+ )
+ )
+ assert len(got) == 2
+ # confirm filter still applies (the unlimited result would be 3)
+ full = list(
+ populated_basic_db.get_features_matching(
+ biotype="exon",
+ start=0,
+ stop=25,
+ allow_partial=True,
+ )
+ )
+ assert len(full) == 3
+
+
+@pytest.fixture
+def gff_db_with_user_exons(DATA_DIR):
+ # simple.gff loads 2 exon rows into the 'gff' table; add 2 more into
+ # the 'user' table to exercise limit-spanning behaviour across tables
+ db = load_annotations(path=DATA_DIR / "simple.gff")
+ for i in range(2):
+ db.add_feature(
+ seqid="test_seq",
+ biotype="exon",
+ name=f"user_exon{i}",
+ spans=[(100 + i * 10, 105 + i * 10)],
+ strand="+",
+ )
+ yield db
+ db.close()
+
+
+def test_get_features_matching_spans_tables_total(gff_db_with_user_exons):
+ got = list(gff_db_with_user_exons.get_features_matching(biotype="exon"))
+ assert len(got) == 4
+
+
+@pytest.mark.parametrize(("limit", "expected"), [(1, 1), (2, 2), (3, 3), (100, 4)])
+def test_get_features_matching_limit_spans_tables(
+ gff_db_with_user_exons,
+ limit,
+ expected,
+):
+ got = list(
+ gff_db_with_user_exons.get_features_matching(biotype="exon", limit=limit)
+ )
+ assert len(got) == expected
+
+
+def test_get_records_matching_spans_tables_total(gff_db_with_user_exons):
+ total = sum(1 for _ in gff_db_with_user_exons.get_records_matching(biotype="exon"))
+ assert total == 4
+
+
+def test_get_records_matching_limit_spans_tables(gff_db_with_user_exons):
+ got = list(gff_db_with_user_exons.get_records_matching(biotype="exon", limit=2))
+ assert len(got) == 2
+
+
+@pytest.fixture
+def seq_with_exons(seq, anno_db):
+ seq.annotation_db = anno_db
+ for i in range(4):
+ anno_db.add_feature(
+ seqid=seq.name,
+ biotype="exon",
+ name=f"exon{i}",
+ spans=[(i * 3, i * 3 + 2)],
+ strand="+",
+ )
+ return seq
+
+
+def test_sequence_get_features_unlimited(seq_with_exons):
+ full = list(seq_with_exons.get_features(biotype="exon"))
+ assert len(full) == 4
+
+
+def test_sequence_get_features_limit(seq_with_exons):
+ got = list(seq_with_exons.get_features(biotype="exon", limit=2))
+ assert len(got) == 2
+
+
def test_get_slice():
"""get_slice should return the same as slicing the sequence directly"""
seq = cogent3.make_seq("ATTGTACGCCCCTGA", name="test_seq", moltype="dna")
=====================================
tests/test_evolve/test_coevolution.py
=====================================
@@ -403,13 +403,16 @@ def test_parallel_protein(stat):
assert_allclose(got.array, serial.array)
-def test_par_kw_max_workers(alignment):
+@pytest.mark.parametrize("max_workers", [2, 20_000])
+def test_par_kw_max_workers(alignment, max_workers):
"""par_kw max_workers is accepted without error"""
+ # larger than num threads should be silently
+ # reduced to num threads
got = c3_coevo.coevolution_matrix(
alignment=alignment,
stat="nmi",
parallel=True,
- par_kw={"max_workers": 2},
+ par_kw={"max_workers": max_workers},
show_progress=False,
)
serial = c3_coevo.coevolution_matrix(
View it on GitLab: https://salsa.debian.org/med-team/python-cogent/-/commit/a39393dc50f9dfe42346ee8caf2f705ed66d2734
--
More information about the debian-med-commit
mailing list