[med-svn] [Git][med-team/python-cogent][upstream] New upstream version 2026.6.2a0+dfsg

Fri Jun 5 05:22:23 BST 2026


Karsten Schöke pushed to branch upstream at Debian Med / python-cogent


Commits:
a39393dc by Karsten Schöke at 2026-06-04T13:08:20+02:00
New upstream version 2026.6.2a0+dfsg
- - - - -


12 changed files:

- .github/workflows/codeql.yml
- changelog.md
- doc/cookbook/annotation_db.rst
- doc/cookbook/features.rst
- src/cogent3/_version.py
- src/cogent3/core/alignment.py
- src/cogent3/core/annotation_db.py
- src/cogent3/core/sequence.py
- src/cogent3/evolve/coevolution.py
- tests/test_core/test_aln_annotation.py
- tests/test_core/test_annotation_db.py
- tests/test_evolve/test_coevolution.py


Changes:

=====================================
.github/workflows/codeql.yml
=====================================
@@ -43,7 +43,7 @@ jobs:
 
     # Initializes the CodeQL tools for scanning.
     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v4
+      uses: github/codeql-action/init@v4.35.5
       with:
         languages: ${{ matrix.language }}
         # If you wish to specify custom queries, you can do so here or in a config file.
@@ -57,7 +57,7 @@ jobs:
     # Autobuild attempts to build any compiled languages  (C/C++, C#, Go, or Java).
     # If this step fails, then you should remove it and run the build manually (see below)
     - name: Autobuild
-      uses: github/codeql-action/autobuild@v4
+      uses: github/codeql-action/autobuild@v4.35.5
 
     # ℹ️ Command-line programs to run using the OS shell.
     # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
@@ -70,6 +70,6 @@ jobs:
     #   ./location_of_script_within_repo/buildscript.sh
 
     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v4
+      uses: github/codeql-action/analyze@v4.35.5
       with:
         category: "/language:${{matrix.language}}"


=====================================
changelog.md
=====================================
@@ -1,4 +1,38 @@
 
+<a id='changelog-2026.6.2a0'></a>
+# Changes in release "2026.6.2a0"
+
+A bug fix and minor enhancement release.
+
+## Contributors
+
+- @GavinHuttley
+- @sanvila reported a bug and proposed a fix 🚀
+
+## Enhancements
+
+- Added `iter_fastq_records` to `cogent3.parse.fastq` for streaming fastq records
+  as `(label, sequence, quality)` tuples with optional bytes converters.
+- Added `make_qual_converter` and `PhredEncoding` to `cogent3.core.alphabet` for
+  mapping Phred+33 / Phred+64 quality ASCII into `numpy.uint8` score arrays.
+- Added `limit` keyword argument to all methods that select annotations. Propagated
+  through to the `AnnotationDb.get_(features|records)_matching()` methods. Applied
+  in aggregate across all tables.
+
+## Bug fixes
+
+- `load_annotations()` now correctly uses `format_name`. For cases where the file
+  name suffix did not match a known format, `format_name` argument was having no
+  effect. This is now fixed.
+- the `coevolution_matrix()` function now correctly caps `max_workers`, thanks to
+  @sanvila for reporting and proposing the fix!
+
+## Documentation
+
+- Added cookbook sections covering direct fastq parsing with `iter_fastq_records`
+  and constructing quality-score converters with `make_qual_converter`.
+- Added docs describing the new `limit` argument.
+
 <a id='changelog-2026.5.19a0'></a>
 # Changes in release "2026.5.19a0"
 


=====================================
doc/cookbook/annotation_db.rst
=====================================
@@ -185,6 +185,25 @@ For example, we can query for all CDS related to replication:
 
 .. note:: Extended attribute querying only works for GFF databases!
 
+Limiting the number of returned records
+"""""""""""""""""""""""""""""""""""""""
+
+Both ``get_features_matching()`` and ``get_records_matching()`` accept an optional ``limit`` argument that caps the total number of records yielded. This is useful when probing a large database or building previews.
+
+.. jupyter-execute::
+
+    first_two_genes = list(gff_db.get_features_matching(biotype="gene", limit=2))
+    first_two_genes
+
+The same applies to ``get_records_matching()``:
+
+.. jupyter-execute::
+
+    first_record = list(gff_db.get_records_matching(biotype="CDS", limit=1))
+    first_record
+
+``limit`` counts across all internal tables (e.g. the "gff" and "user" tables of a ``GffAnnotationDb``). It must be a positive integer; ``limit=None`` (the default) returns all matching records.
+
 How to interrogate an ``AnnotationDb``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 


=====================================
doc/cookbook/features.rst
=====================================
@@ -344,6 +344,18 @@ We can again provide a combination of conditions, for example, querying for all
     mRNA = list(seq.get_features(start=10148, stop=29322, biotype="mRNA"))[0]
     mRNA
 
+Limiting the number of returned Features
+++++++++++++++++++++++++++++++++++++++++
+
+Both ``Sequence.get_features()`` and the equivalent methods on ``SequenceCollection`` and ``Alignment`` accept an optional ``limit`` argument that caps the total number of features yielded. This is handy when a query would otherwise return a very large set.
+
+.. jupyter-execute::
+
+    first_three = list(seq.get_features(biotype="CDS", limit=3))
+    first_three
+
+``limit`` must be a positive integer; ``limit=None`` (the default) returns all matching features. For an ``Alignment`` the limit applies to the combined total of sequence-level and alignment-level features.
+
 Querying a Sequence (via an Alignment) for Features
 """""""""""""""""""""""""""""""""""""""""""""""""""
 


=====================================
src/cogent3/_version.py
=====================================
@@ -1 +1 @@
-__version__ = "2026.5.25a0"
+__version__ = "2026.6.2a0"


=====================================
src/cogent3/core/alignment.py
=====================================
@@ -575,6 +575,7 @@ class CollectionBase(AnnotatableMixin, ABC, Generic[TSequenceOrAligned]):
         biotype: str | tuple[str, ...] | list[str] | set[str] | None = None,
         name: str | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
         **kwargs: Any,
     ) -> Iterator[Feature[Any]]: ...
 
@@ -2300,6 +2301,7 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
         allow_partial: bool = False,
         start: int | None = None,
         stop: int | None = None,
+        limit: int | None = None,
         **kwargs: Any,
     ) -> Iterator[Feature[c3_sequence.Sequence]]:
         """yields Feature instances
@@ -2318,6 +2320,9 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
             stop position of the feature (inclusive)
         allow_partial
             allow features partially overlaping self
+        limit
+            maximum total number of features to yield across all sequences.
+            If None, all matching features are returned. Must be positive.
         kwargs
             additional keyword arguments to query the annotation db
 
@@ -2333,6 +2338,10 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
         if not self.has_annotation_db():
             return None
 
+        if limit is not None and limit <= 0:
+            msg = f"limit must be positive, got {limit!r}"
+            raise ValueError(msg)
+
         if seqid and (seqid not in self.names):
             msg = f"unknown {seqid=}"
             raise ValueError(msg)
@@ -2341,7 +2350,8 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
             # if no seqids provided, we do direct search to find the seqids
             # that match the other parameters. This matters for collections
             # with large numbers of sequences due to the overhead of creating
-            # Sequence instances
+            # Sequence instances. We do NOT apply limit here because we need
+            # to discover all candidate seqids before yielding from any of them.
             matched = {
                 record["seqid"]
                 for record in self.annotation_db.get_features_matching(
@@ -2359,16 +2369,23 @@ class SequenceCollection(CollectionBase[c3_sequence.Sequence]):
                 "list[str]", [seqid] if isinstance(seqid, str) else self.names
             )
 
+        yielded = 0
         for seqid in seqids:
             seq = self.seqs[seqid]
-            yield from seq.get_features(
+            remaining = None if limit is None else limit - yielded
+            for feature in seq.get_features(
                 biotype=biotype,
                 name=name,
                 start=start,
                 stop=stop,
                 allow_partial=allow_partial,
+                limit=remaining,
                 **kwargs,
-            )
+            ):
+                yield feature
+                yielded += 1
+                if limit is not None and yielded >= limit:
+                    return
 
     def is_ragged(self) -> bool:
         """rerturns True if sequences are of different lengths"""
@@ -3372,6 +3389,7 @@ class Alignment(CollectionBase[Aligned]):
         biotype: str | None = None,
         name: str | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
     ) -> Iterator[Feature[Alignment]]:
         """yields Feature instances
 
@@ -3385,6 +3403,9 @@ class Alignment(CollectionBase[Aligned]):
             name of the feature
         allow_partial
             allow features partially overlaping self
+        limit
+            maximum total number of features to yield across all sequences.
+            If None, all matching features are returned.
 
         Notes
         -----
@@ -3407,6 +3428,7 @@ class Alignment(CollectionBase[Aligned]):
             msg = f"unknown {seqid=}"
             raise ValueError(msg)
 
+        yielded = 0
         for seqid in seqids:
             seqname = seqid_to_seqname[seqid]
             seq = self.seqs[seqname]
@@ -3417,6 +3439,7 @@ class Alignment(CollectionBase[Aligned]):
             # to the alignment coordinates
             offset = self.storage.offset.get(seqid, 0)
 
+            remaining = None if limit is None else limit - yielded
             for feature in self.annotation_db.get_features_matching(
                 seqid=parent_id,
                 biotype=biotype,
@@ -3425,11 +3448,15 @@ class Alignment(CollectionBase[Aligned]):
                 allow_partial=allow_partial,
                 start=start + offset,
                 stop=stop + offset,
+                limit=remaining,
             ):
                 if offset:
                     feature["spans"] = (numpy.array(feature["spans"]) - offset).tolist()
                 # passing self only used when self is an Alignment
                 yield seq.make_feature(feature, self)
+                yielded += 1
+                if limit is not None and yielded >= limit:
+                    return
 
     def get_features(
         self,
@@ -3439,6 +3466,7 @@ class Alignment(CollectionBase[Aligned]):
         name: str | None = None,
         allow_partial: bool = False,
         on_alignment: bool | None = None,
+        limit: int | None = None,
         **kwargs: Any,
     ) -> Iterator[Feature[Alignment]]:
         """yields Feature instances
@@ -3456,6 +3484,10 @@ class Alignment(CollectionBase[Aligned]):
             SequenceCollection instances.
         allow_partial
             allow features partially overlaping self
+        limit
+            maximum total number of features to yield across sequence-level
+            and alignment-level features. If None, all matching features
+            are returned. Must be positive.
 
         Notes
         -----
@@ -3468,42 +3500,48 @@ class Alignment(CollectionBase[Aligned]):
         if not self.has_annotation_db() or not len(self._annotation_db):
             return None
 
-        # we only do on-alignment in here
+        if limit is not None and limit <= 0:
+            msg = f"limit must be positive, got {limit!r}"
+            raise ValueError(msg)
+
+        remaining = limit
         if not on_alignment:
-            local_vars = locals()
-            kwargs = {k: v for k, v in local_vars.items() if k != "self"}
-            kwargs.pop("on_alignment")
-            yield from self._get_seq_features(**kwargs)
+            for feature in self._get_seq_features(
+                seqid=cast("str | None", seqid),
+                biotype=cast("str | None", biotype),
+                name=name,
+                allow_partial=allow_partial,
+                limit=remaining,
+            ):
+                yield feature
+                if remaining is not None:
+                    remaining -= 1
 
         if on_alignment == False:  # noqa: E712
             return
 
+        if remaining == 0:
+            return
+
         seq_map = None
-        for feature in self.annotation_db.get_features_matching(
+        strand: int | str | None
+        for record in self.annotation_db.get_features_matching(
             biotype=biotype,
             name=name,
-            on_alignment=on_alignment,
+            on_alignment=True,
             allow_partial=allow_partial,
+            limit=remaining,
         ):
-            if feature["seqid"]:
-                continue
-            on_al = cast("bool", feature.pop("on_alignment", on_alignment))  # type: ignore[misc]
-            if feature["seqid"]:
-                msg = f"{on_alignment=} {feature=}"
-                raise RuntimeError(msg)
-
-            strand: int | str | None
+            on_al = cast("bool", record.pop("on_alignment", True))  # type: ignore[misc]
             if seq_map is None:
                 seq_map = self.seqs[0].map.to_feature_map()
                 *_, strand = self.seqs[0].seq.parent_coordinates()
             else:
-                strand = feature.pop("strand", None)  # type: ignore[misc]
-
-            spans = seq_map.relative_position(numpy.array(feature["spans"]))
-            feature["spans"] = spans.tolist()
-            # and if i've been reversed...?
-            feature["strand"] = cast("int", Strand.from_value(strand).value)
-            yield self.make_feature(feature=feature, on_alignment=on_al)
+                strand = record.pop("strand", None)  # type: ignore[misc]
+            spans = seq_map.relative_position(numpy.array(record["spans"]))
+            record["spans"] = spans.tolist()
+            record["strand"] = cast("int", Strand.from_value(strand).value)
+            yield self.make_feature(feature=record, on_alignment=on_al)
 
     def is_ragged(self) -> bool:
         """by definition False for an Alignment"""

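The `yielded` / `remaining` bookkeeping added to `SequenceCollection.get_features()` and `Alignment._get_seq_features()` above can be illustrated without cogent3. The following is a minimal sketch, not the library's implementation: `take_with_budget` and the sample feature names are hypothetical, and plain lists stand in for per-sequence feature queries.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def take_with_budget(
    sources: Iterable[Iterable[str]],
    limit: int | None = None,
) -> Iterator[str]:
    """Yield items from each source in turn, stopping after `limit` in total.

    Hypothetical helper: mirrors the counting pattern in the diff, where one
    counter is shared across all inner iterators.
    """
    # As in the diff, validation lives inside a generator, so the error only
    # fires once the generator is consumed.
    if limit is not None and limit <= 0:
        msg = f"limit must be positive, got {limit!r}"
        raise ValueError(msg)

    yielded = 0
    for source in sources:
        for item in source:
            yield item
            yielded += 1
            # Stop as soon as the shared budget is spent, even mid-source.
            if limit is not None and yielded >= limit:
                return


# Stand-ins for the features of two sequences (illustrative names only).
per_seq = [["s1_exon0", "s1_exon1", "s1_exon2"], ["s2_exon0", "s2_exon1"]]
first_four = list(take_with_budget(per_seq, limit=4))
everything = list(take_with_budget(per_seq))
```

Here `first_four` draws three items from the first source and one from the second, while `everything` contains all five. The diff additionally passes the remaining budget down to each inner query so that the database does not fetch rows that would be discarded.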

=====================================
src/cogent3/core/annotation_db.py
=====================================
@@ -470,6 +470,7 @@ class AnnotationDbABC(abc.ABC):
         attributes: str | None = None,
         on_alignment: bool | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
     ) -> Iterator[FeatureDataType]: ...
 
     @abc.abstractmethod
@@ -485,6 +486,7 @@ class AnnotationDbABC(abc.ABC):
         attributes: str | None = None,
         on_alignment: bool | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
     ) -> Iterator[dict[str, Any]]: ...
 
     @abc.abstractmethod
@@ -720,6 +722,7 @@ def _select_records_sql(
     conditions: dict[str, Any],
     columns: Iterable[str] | None = None,
     allow_partial: bool = True,
+    limit: int | None = None,
 ) -> tuple[str, tuple[Any, ...] | None]:
     """create SQL select statement and values
 
@@ -734,6 +737,9 @@ def _select_records_sql(
     allow_partial
         if False, only records within start, stop are included. If True,
         all records that overlap the segment defined by start, stop are included.
+    limit
+        maximum number of records to return. If None, all matching records
+        are returned.
 
     Returns
     -------
@@ -747,10 +753,11 @@ def _select_records_sql(
     )
     columns_str = f"{', '.join(columns)}" if columns else "*"
     sql = f"SELECT {columns_str} FROM {table_name}"
+    limit_clause = f" LIMIT {limit}" if limit is not None else ""
     if not where:
-        return sql, None
+        return f"{sql}{limit_clause}", None
 
-    sql = f"{sql} WHERE {where};"
+    sql = f"{sql} WHERE {where}{limit_clause};"
     return sql, vals
 
 
@@ -1504,6 +1511,7 @@ class SqliteAnnotationDbMixin:
             kwargs["attributes"] = f"%{kwargs['attributes']}%"
         columns = kwargs.pop("columns", None)
         allow_partial = kwargs.pop("allow_partial", False)
+        limit = kwargs.pop("limit", None)
 
         # Translate query conditions and column names for normalized schema
         if self._lookup_cache is not None:
@@ -1515,6 +1523,7 @@ class SqliteAnnotationDbMixin:
             conditions=kwargs,
             columns=columns,
             allow_partial=allow_partial,
+            limit=limit,
         )
         with contextlib.suppress(sqlite3.ProgrammingError):
             # garbage collection issue
@@ -1797,11 +1806,22 @@ class SqliteAnnotationDbMixin:
         attributes: str | None = None,
         on_alignment: bool | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
     ) -> Iterator[dict[str, Any]]:
-        """return all fields for matching records"""
+        """return all fields for matching records
+
+        Parameters
+        ----------
+        limit
+            maximum total number of records to yield across all tables.
+            If None, all matching records are returned. Must be positive.
+        """
         # a record is Everything, a Feature is a subset
         # we define query as all defined variables from local name space,
         # excluding "self" and kwargs at default values
+        if limit is not None and limit <= 0:
+            msg = f"limit must be positive, got {limit!r}"
+            raise ValueError(msg)
         local_vars = locals()
         kwargs = {k: v for k, v in local_vars.items() if k != "self" and v is not None}
         if "strand" in kwargs:
@@ -1809,10 +1829,16 @@ class SqliteAnnotationDbMixin:
 
         # alignment features are created by the user specific
         table_names = ["user"] if on_alignment else self.table_names
+        yielded = 0
         for table_name in table_names:
+            if limit is not None:
+                kwargs["limit"] = limit - yielded
             for result in self._get_records_matching(table_name, **kwargs):
                 res = dict(zip(result.keys(), result, strict=False))
                 yield self._translate_record_from_ids(res)
+                yielded += 1
+                if limit is not None and yielded >= limit:
+                    return
 
     def get_features_matching(
         self,
@@ -1826,10 +1852,22 @@ class SqliteAnnotationDbMixin:
         attributes: str | None = None,
         on_alignment: bool | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
     ) -> Iterator[FeatureDataType]:
+        """yield essential values to create a Feature for matching records
+
+        Parameters
+        ----------
+        limit
+            maximum total number of features to yield across all tables.
+            If None, all matching features are returned. Must be positive.
+        """
         # returns essential values to create a Feature
         # we define query as all defined variables from local name space,
         # excluding "self" and kwargs with a default value of None
+        if limit is not None and limit <= 0:
+            msg = f"limit must be positive, got {limit!r}"
+            raise ValueError(msg)
         local_vars = locals()
         kwargs = {k: v for k, v in local_vars.items() if k != "self" and v is not None}
         if "strand" in kwargs:
@@ -1837,6 +1875,7 @@ class SqliteAnnotationDbMixin:
 
         # alignment features are created by the user specific
         table_names = ["user"] if on_alignment else self.table_names
+        yielded = 0
         for table_name in table_names:
             columns: tuple[str, ...] = ("seqid", "biotype", "spans", "strand", "name")
             query_args = {**kwargs}
@@ -1846,6 +1885,9 @@ class SqliteAnnotationDbMixin:
             else:
                 query_args.pop("on_alignment", None)
 
+            if limit is not None:
+                query_args["limit"] = limit - yielded
+
             for result in self._get_records_matching(
                 table_name=table_name,
                 columns=columns,
@@ -1857,6 +1899,9 @@ class SqliteAnnotationDbMixin:
                 res["on_alignment"] = res.get("on_alignment")
                 res["spans"] = [cast("tuple[int, int]", tuple(c)) for c in res["spans"]]
                 yield cast("FeatureDataType", res)
+                yielded += 1
+                if limit is not None and yielded >= limit:
+                    return
 
     def num_matches(
         self,

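The cross-table behaviour of `limit` in `get_records_matching()` and `get_features_matching()` above (query each table with `limit - yielded`, stop once the total is reached) can be sketched against a throwaway SQLite database. This is an assumption-laden illustration rather than cogent3 code: the two-column schema, `make_demo_db` and `select_names` are hypothetical, and the sketch binds the limit as a query parameter where the diff formats it into the SQL string.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterator, Sequence


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with two tables of 'exon' rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    for table, names in (("gff", ["g0", "g1"]), ("user", ["u0", "u1"])):
        conn.execute(f"CREATE TABLE {table} (name TEXT, biotype TEXT)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, 'exon')",
            [(n,) for n in names],
        )
    return conn


def select_names(
    conn: sqlite3.Connection,
    tables: Sequence[str],
    biotype: str,
    limit: int | None = None,
) -> Iterator[str]:
    """Yield matching names from each table, capped at `limit` across all tables."""
    if limit is not None and limit <= 0:
        msg = f"limit must be positive, got {limit!r}"
        raise ValueError(msg)

    yielded = 0
    for table in tables:
        sql = f"SELECT name FROM {table} WHERE biotype = ? ORDER BY rowid"
        params: list[object] = [biotype]
        if limit is not None:
            # Ask this table only for what is left of the overall budget.
            sql += " LIMIT ?"
            params.append(limit - yielded)
        for (name,) in conn.execute(sql, params):
            yield name
            yielded += 1
            if limit is not None and yielded >= limit:
                return


conn = make_demo_db()
three = list(select_names(conn, ["gff", "user"], "exon", limit=3))
capped = list(select_names(conn, ["gff", "user"], "exon", limit=100))
conn.close()
```

With two rows per table, `limit=3` takes both rows from the first table and one from the second, and `limit=100` simply returns all four, matching the expectations in `test_get_features_matching_limit_spans_tables`.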

=====================================
src/cogent3/core/sequence.py
=====================================
@@ -1129,6 +1129,7 @@ class Sequence(AnnotatableMixin):
         start: int | None = None,
         stop: int | None = None,
         allow_partial: bool = False,
+        limit: int | None = None,
         **kwargs: Any,
     ) -> Iterator[Feature[Sequence]]:
         """yields Feature instances
@@ -1142,6 +1143,9 @@ class Sequence(AnnotatableMixin):
         start, stop
             start, stop positions to search between, relative to offset
             of this sequence. If not provided, entire span of sequence is used.
+        limit
+            maximum number of features to yield. If None, all matching
+            features are returned. Must be positive.
         kwargs
             keyword arguments passed to annotation_db.get_features_matching()
 
@@ -1215,7 +1219,7 @@ class Sequence(AnnotatableMixin):
         # To piggy-back on that method we need to convert our feature spans
         # into the current orientation. HOWEVER, we also have the reversed
         # flag which comes back from the db
-        kwargs |= {"allow_partial": allow_partial}
+        kwargs |= {"allow_partial": allow_partial, "limit": limit}
         for feature in self.annotation_db.get_features_matching(
             seqid=parent_id,
             name=name,


=====================================
src/cogent3/evolve/coevolution.py
=====================================
@@ -869,6 +869,8 @@ def coevolution_matrix(
     prev_threads = None
     if max_workers is not None and parallel:
         prev_threads = numba.get_num_threads()
+        # don't exceed available threads
+        max_workers = min(max_workers, numba.get_num_threads())
         numba.set_num_threads(max_workers)
 
     try:

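The fix above clamps the caller's `max_workers` to `numba.get_num_threads()` before calling `numba.set_num_threads()`. The same clamping idea is sketched below using only the standard library. Note the substitution: `os.cpu_count()` stands in for numba's thread count, and `cap_workers` is a hypothetical helper, not part of cogent3 or numba.

```python
from __future__ import annotations

import os


def cap_workers(requested: int | None, available: int | None = None) -> int:
    """Return a worker count that never exceeds what is available.

    `available` defaults to os.cpu_count() (an assumption for this sketch;
    the commit uses numba.get_num_threads() instead).
    """
    if available is None:
        available = os.cpu_count() or 1
    if requested is None:
        return available
    if requested < 1:
        msg = f"requested workers must be >= 1, got {requested!r}"
        raise ValueError(msg)
    # Oversized requests are silently reduced, as in the updated test.
    return min(requested, available)


small = cap_workers(2, available=8)
huge = cap_workers(20_000, available=8)
```

The two calls correspond to the parametrized values `2` and `20_000` in `test_par_kw_max_workers`: the first is passed through unchanged, the second is reduced to the available count.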

=====================================
tests/test_core/test_aln_annotation.py
=====================================
@@ -1099,3 +1099,99 @@ def test_get_feature_seqs_offset_minus_strand(mk_cls):
     got = got if mk_cls == c3_alignment.make_unaligned_seqs else got.get_seq("s1")
     assert str(got) == expect
     close_dbs(coll)
+
+
+@pytest.fixture
+def coll_with_many_features():
+    data = {
+        "s1": "AAAATTTTGGGGCCCC",
+        "s2": "AAAATTTTGGGGCCCC",
+    }
+    coll = c3_alignment.make_unaligned_seqs(data, moltype="dna")
+    for seqid in ("s1", "s2"):
+        for i in range(3):
+            coll.annotation_db.add_feature(
+                seqid=seqid,
+                biotype="exon",
+                name=f"{seqid}_exon{i}",
+                spans=[(i * 4, i * 4 + 3)],
+                strand="+",
+            )
+    yield coll
+    close_dbs(coll)
+
+
+@pytest.mark.parametrize("limit", [1, 3, 5, 6, 100])
+def test_collection_get_features_limit(coll_with_many_features, limit):
+    coll = coll_with_many_features
+    got = list(coll.get_features(biotype="exon", limit=limit))
+    assert len(got) == min(limit, 6)
+
+
+def test_collection_get_features_limit_none(coll_with_many_features):
+    coll = coll_with_many_features
+    full = list(coll.get_features(biotype="exon"))
+    assert len(full) == 6
+    same = list(coll.get_features(biotype="exon", limit=None))
+    assert len(same) == 6
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_collection_get_features_limit_invalid(coll_with_many_features, limit):
+    with pytest.raises(ValueError):
+        list(coll_with_many_features.get_features(biotype="exon", limit=limit))
+
+
+@pytest.fixture
+def aln_with_many_features():
+    data = {
+        "s1": "AAAATTTTGGGGCCCC",
+        "s2": "AAAATTTTGGGGCCCC",
+    }
+    aln = c3_alignment.make_aligned_seqs(data, moltype="dna")
+    # 4 sequence-level features
+    for seqid in ("s1", "s2"):
+        for i in range(2):
+            aln.annotation_db.add_feature(
+                seqid=seqid,
+                biotype="exon",
+                name=f"{seqid}_exon{i}",
+                spans=[(i * 4, i * 4 + 3)],
+                strand="+",
+            )
+    # 3 alignment-level features
+    for i in range(3):
+        aln.add_feature(
+            biotype="exon",
+            name=f"aln_exon{i}",
+            spans=[(i * 4, i * 4 + 2)],
+            on_alignment=True,
+        )
+    yield aln
+    close_dbs(aln)
+
+
+@pytest.mark.parametrize("limit", [1, 3, 5, 7, 100])
+def test_alignment_get_features_limit(aln_with_many_features, limit):
+    aln = aln_with_many_features
+    got = list(aln.get_features(biotype="exon", limit=limit))
+    # 4 seq-level + 3 aln-level = 7 total
+    assert len(got) == min(limit, 7)
+
+
+def test_alignment_get_features_limit_on_alignment(aln_with_many_features):
+    aln = aln_with_many_features
+    got = list(aln.get_features(biotype="exon", on_alignment=True, limit=2))
+    assert len(got) == 2
+
+
+def test_alignment_get_features_limit_seq_only(aln_with_many_features):
+    aln = aln_with_many_features
+    got = list(aln.get_features(biotype="exon", on_alignment=False, limit=3))
+    assert len(got) == 3
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_alignment_get_features_limit_invalid(aln_with_many_features, limit):
+    with pytest.raises(ValueError):
+        list(aln_with_many_features.get_features(biotype="exon", limit=limit))


=====================================
tests/test_core/test_annotation_db.py
=====================================
@@ -757,6 +757,145 @@ def test_get_features_matching_start_stop_seqview(DATA_DIR, seq):
     close_dbs(db)
 
 
+@pytest.fixture
+def populated_basic_db() -> BasicAnnotationDb:
+    db = BasicAnnotationDb()
+    for i in range(5):
+        db.add_feature(
+            seqid="seq1",
+            biotype="exon",
+            name=f"exon{i}",
+            spans=[(i * 10, i * 10 + 5)],
+            strand="+",
+        )
+    yield db
+    db.close()
+
+
+@pytest.mark.parametrize("limit", [1, 2, 5, 10])
+def test_get_features_matching_limit_basic(populated_basic_db, limit):
+    got = list(populated_basic_db.get_features_matching(biotype="exon", limit=limit))
+    assert len(got) == min(limit, 5)
+
+
+@pytest.mark.parametrize("limit", [1, 2, 5, 10])
+def test_get_records_matching_limit_basic(populated_basic_db, limit):
+    got = list(populated_basic_db.get_records_matching(biotype="exon", limit=limit))
+    assert len(got) == min(limit, 5)
+
+
+def test_get_features_matching_limit_none_equivalent_to_unlimited(populated_basic_db):
+    default = list(populated_basic_db.get_features_matching(biotype="exon"))
+    with_none = list(
+        populated_basic_db.get_features_matching(biotype="exon", limit=None)
+    )
+    assert len(default) == len(with_none) == 5
+
+
+@pytest.mark.parametrize("limit", [0, -1, -5])
+def test_get_features_matching_limit_invalid(populated_basic_db, limit):
+    with pytest.raises(ValueError):
+        # generator must be consumed for the exception to fire
+        list(populated_basic_db.get_features_matching(limit=limit))
+
+
+@pytest.mark.parametrize("limit", [0, -1])
+def test_get_records_matching_limit_invalid(populated_basic_db, limit):
+    with pytest.raises(ValueError):
+        list(populated_basic_db.get_records_matching(limit=limit))
+
+
+def test_get_features_matching_limit_with_filters(populated_basic_db):
+    got = list(
+        populated_basic_db.get_features_matching(
+            biotype="exon",
+            start=0,
+            stop=25,
+            allow_partial=True,
+            limit=2,
+        )
+    )
+    assert len(got) == 2
+    # confirm filter still applies (the unlimited result would be 3)
+    full = list(
+        populated_basic_db.get_features_matching(
+            biotype="exon",
+            start=0,
+            stop=25,
+            allow_partial=True,
+        )
+    )
+    assert len(full) == 3
+
+
+@pytest.fixture
+def gff_db_with_user_exons(DATA_DIR):
+    # simple.gff loads 2 exon rows into the 'gff' table; add 2 more into
+    # the 'user' table to exercise limit-spanning behaviour across tables
+    db = load_annotations(path=DATA_DIR / "simple.gff")
+    for i in range(2):
+        db.add_feature(
+            seqid="test_seq",
+            biotype="exon",
+            name=f"user_exon{i}",
+            spans=[(100 + i * 10, 105 + i * 10)],
+            strand="+",
+        )
+    yield db
+    db.close()
+
+
+def test_get_features_matching_spans_tables_total(gff_db_with_user_exons):
+    got = list(gff_db_with_user_exons.get_features_matching(biotype="exon"))
+    assert len(got) == 4
+
+
+@pytest.mark.parametrize(("limit", "expected"), [(1, 1), (2, 2), (3, 3), (100, 4)])
+def test_get_features_matching_limit_spans_tables(
+    gff_db_with_user_exons,
+    limit,
+    expected,
+):
+    got = list(
+        gff_db_with_user_exons.get_features_matching(biotype="exon", limit=limit)
+    )
+    assert len(got) == expected
+
+
+def test_get_records_matching_spans_tables_total(gff_db_with_user_exons):
+    total = sum(1 for _ in gff_db_with_user_exons.get_records_matching(biotype="exon"))
+    assert total == 4
+
+
+def test_get_records_matching_limit_spans_tables(gff_db_with_user_exons):
+    got = list(gff_db_with_user_exons.get_records_matching(biotype="exon", limit=2))
+    assert len(got) == 2
+
+
+@pytest.fixture
+def seq_with_exons(seq, anno_db):
+    seq.annotation_db = anno_db
+    for i in range(4):
+        anno_db.add_feature(
+            seqid=seq.name,
+            biotype="exon",
+            name=f"exon{i}",
+            spans=[(i * 3, i * 3 + 2)],
+            strand="+",
+        )
+    return seq
+
+
+def test_sequence_get_features_unlimited(seq_with_exons):
+    full = list(seq_with_exons.get_features(biotype="exon"))
+    assert len(full) == 4
+
+
+def test_sequence_get_features_limit(seq_with_exons):
+    got = list(seq_with_exons.get_features(biotype="exon", limit=2))
+    assert len(got) == 2
+
+
 def test_get_slice():
     """get_slice should return the same as slicing the sequence directly"""
     seq = cogent3.make_seq("ATTGTACGCCCCTGA", name="test_seq", moltype="dna")


=====================================
tests/test_evolve/test_coevolution.py
=====================================
@@ -403,13 +403,16 @@ def test_parallel_protein(stat):
     assert_allclose(got.array, serial.array)
 
 
-def test_par_kw_max_workers(alignment):
+@pytest.mark.parametrize("max_workers", [2, 20_000])
+def test_par_kw_max_workers(alignment, max_workers):
     """par_kw max_workers is accepted without error"""
+    # larger than num threads should be silently
+    # reduced to num threads
     got = c3_coevo.coevolution_matrix(
         alignment=alignment,
         stat="nmi",
         parallel=True,
-        par_kw={"max_workers": 2},
+        par_kw={"max_workers": max_workers},
         show_progress=False,
     )
     serial = c3_coevo.coevolution_matrix(



View it on GitLab: https://salsa.debian.org/med-team/python-cogent/-/commit/a39393dc50f9dfe42346ee8caf2f705ed66d2734
