[med-svn] [Git][med-team/augur][master] 4 commits: routine-update: New upstream version
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Mon Sep 4 20:46:35 BST 2023
Étienne Mollier pushed to branch master at Debian Med / augur
Commits:
a3edebb0 by Étienne Mollier at 2023-09-01T21:33:32+02:00
routine-update: New upstream version
- - - - -
d0a658be by Étienne Mollier at 2023-09-01T21:33:33+02:00
New upstream version 22.4.0
- - - - -
8600e56c by Étienne Mollier at 2023-09-01T21:34:27+02:00
Update upstream source from tag 'upstream/22.4.0'
Update to upstream version '22.4.0'
with Debian dir da42b893d50f841c32f4d482839a7c04f2933a05
- - - - -
47687198 by Étienne Mollier at 2023-09-01T21:38:08+02:00
routine-update: Ready to upload to unstable
- - - - -
18 changed files:
- .github/workflows/ci.yaml
- CHANGES.md
- augur/__main__.py
- augur/__version__.py
- augur/ancestral.py
- augur/clades.py
- augur/data/schema-annotations.json
- augur/data/schema-export-v2.json
- augur/distance.py
- augur/filter/include_exclude_rules.py
- augur/refine.py
- augur/validate.py
- debian/changelog
- tests/functional/filter/cram/filter-query-numerical.t
- tests/functional/refine/cram/timetree.t
- + tests/functional/refine/cram/timetree_with_fixed_clock_rate.t
- tests/test_validate.py
- tests/util_support/test_node_data_file.py
Changes:
=====================================
.github/workflows/ci.yaml
=====================================
@@ -80,7 +80,10 @@ jobs:
name: coverage
path: "${{ env.COVERAGE_FILE }}"
- pathogen-ci:
+ # TODO: Use the central pathogen-repo-ci workflow¹. Currently, this is not
+ # possible because it only supports "stock" docker and conda runtimes.
+ # ¹ https://github.com/nextstrain/.github/blob/-/.github/workflows/pathogen-repo-ci.yaml
+ pathogen-repo-ci:
runs-on: ubuntu-latest
continue-on-error: true
env:
@@ -97,48 +100,62 @@ jobs:
pathogen: ncov,
build-args: all_regions -j 2 --profile nextstrain_profiles/nextstrain-ci,
}
+ - { pathogen: rsv }
- {
pathogen: seasonal-flu,
build-args: --configfile profiles/ci/builds.yaml -p,
}
- { pathogen: zika }
- name: test-pathogen-repo-ci (${{ matrix.pathogen }})
+ name: pathogen-repo-ci (${{ matrix.pathogen }})
defaults:
run:
shell: bash -l {0}
steps:
- uses: actions/checkout@v3
- - uses: mamba-org/provision-with-micromamba@v15
with:
- environment-file: false
+ path: ./augur
+
+ - uses: mamba-org/setup-micromamba@v1
+ with:
+ create-args: nextstrain-base
+ condarc: |
+ channels:
+ - nextstrain
+ - conda-forge
+ - bioconda
+ channel_priority: strict
+ cache-environment: true
environment-name: augur
- extra-specs: nextstrain-base
- channels: nextstrain,conda-forge,bioconda
- cache-env: true
- - run: pip install .
+
+ - run: pip install ./augur
- uses: actions/checkout@v3
with:
repository: nextstrain/${{ matrix.pathogen }}
+ path: ./pathogen-repo
+
- name: Copy example data
+ working-directory: ./pathogen-repo
run: |
if [[ -d example_data ]]; then
mkdir -p data/
- cp -v example_data/* data/
+ cp -r -v example_data/* data/
else
echo No example data to copy.
fi
- - run: snakemake -c all ${{ matrix.build-args }}
+
+ - run: nextstrain build --ambient ./pathogen-repo ${{ matrix.build-args }}
+
- if: always()
uses: actions/upload-artifact@v3
with:
name: output-${{ matrix.pathogen }}
path: |
- auspice/
- results/
- benchmarks/
- logs/
- .snakemake/log/
+ ./pathogen-repo/auspice/
+ ./pathogen-repo/results/
+ ./pathogen-repo/benchmarks/
+ ./pathogen-repo/logs/
+ ./pathogen-repo/.snakemake/log/
codecov:
if: github.repository == 'nextstrain/augur'
=====================================
CHANGES.md
=====================================
@@ -3,6 +3,23 @@
## __NEXT__
+## 22.4.0 (29 August 2023)
+
+### Features
+
+* refine: Export covariance matrix and standard deviation for clock rate regression in the node data JSON output when these values are calculated by TreeTime. These new values appear in the `clock` data structure of the JSON output as `cov` and `rate_std` keys, respectively. [#1284][] (@huddlej)
+
+### Bug fixes
+
+* clades: Fix outputs for genes named `NA` (previously the value was replaced by `nan`). [#1293][] (@rneher)
+* distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. [#1285][] (@huddlej)
+* Fix help output compatibility with non-Unicode streams. [#1290][] (@victorlin)
+
+[#1284]: https://github.com/nextstrain/augur/pull/1284
+[#1285]: https://github.com/nextstrain/augur/pull/1285
+[#1290]: https://github.com/nextstrain/augur/pull/1290
+[#1293]: https://github.com/nextstrain/augur/pull/1293
+
## 22.3.0 (14 August 2023)
### Features
=====================================
augur/__main__.py
=====================================
@@ -2,11 +2,25 @@
Stub function and module used as a setuptools entry point.
"""
+import sys
import augur
from sys import argv, exit
# Entry point for setuptools-installed script and bin/augur dev wrapper.
def main():
+ sys.stdout.reconfigure(
+ # Support non-Unicode encodings by replacing Unicode characters instead of erroring.
+ errors="backslashreplace",
+
+ # Explicitly enable universal newlines mode so we do the right thing.
+ newline=None,
+ )
+ # Apply the above to stderr as well.
+ sys.stderr.reconfigure(
+ errors="backslashreplace",
+ newline=None,
+ )
+
return augur.run( argv[1:] )
# Run when called as `python -m augur`, here for good measure.
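
Note on the `reconfigure()` calls above: a minimal sketch of what `errors="backslashreplace"` does when the output stream cannot encode a character. It uses an in-memory `io.TextIOWrapper` in place of the real `sys.stdout`; the `render` helper and the sample string are hypothetical and only serve the illustration.

```python
import io


def render(text, encoding):
    """Write text through a stream configured like the patched stdout/stderr."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    # Same keyword arguments as the reconfigure() calls in main().
    stream.reconfigure(errors="backslashreplace", newline=None)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Hypothetical help-text fragment containing a non-ASCII arrow (U+2192).
sample = "nodes \u2192 clade_membership"

ascii_bytes = render(sample, "ascii")   # arrow is escaped instead of raising
utf8_bytes = render(sample, "utf-8")    # arrow is encoded normally

print(ascii_bytes)
print(utf8_bytes)
```

With the default `errors="strict"`, the ASCII case would raise `UnicodeEncodeError`. With `backslashreplace`, the character is written as an escape sequence and the rest of the output is unaffected.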
=====================================
augur/__version__.py
=====================================
@@ -1,4 +1,4 @@
-__version__ = '22.3.0'
+__version__ = '22.4.0'
def is_augur_version_compatible(version):
=====================================
augur/ancestral.py
=====================================
@@ -8,6 +8,16 @@ Each node then gets assigned a list of nucleotide mutations for any position
that has a mismatch between its own sequence and its parent's sequence.
The node sequences and mutations are output to a node-data JSON file.
+If amino acid options are provided, the ancestral amino acid sequences for each
+requested gene are inferred with the same method as the nucleotide sequences described above.
+The inferred amino acid mutations will be included in the output node-data JSON
+file, with the format equivalent to the output of `augur translate`.
+
+The nucleotide and amino acid sequences are inferred separately in this command,
+which can potentially result in mismatches between the nucleotide and amino
+acid mutations. If you want amino acid mutations based on the inferred
+nucleotide sequences, please use `augur translate`.
+
.. note::
The mutation positions in the node-data JSON are one-based.
=====================================
augur/clades.py
=====================================
@@ -1,7 +1,7 @@
"""
Assign clades to nodes in a tree based on amino-acid or nucleotide signatures.
-Nodes which are members of a clade are stored via
+Nodes which are members of a clade are stored via
<OUTPUT_NODE_DATA> → nodes → <node_name> → clade_membership
and if this file is used in `augur export v2` these will automatically become a coloring.
@@ -62,7 +62,8 @@ def read_in_clade_definitions(clade_file):
df = pd.read_csv(
clade_file,
sep='\t' if clade_file.endswith('.tsv') else ',',
- comment='#'
+ comment='#',
+ na_filter=False,
)
clade_inheritance_rows = df[df['gene'] == 'clade']
@@ -83,9 +84,13 @@ def read_in_clade_definitions(clade_file):
# Use integer 0 as root so as not to conflict with any string clade names
# String '0' can still be used this way
root = 0
+
+ # Skip rows that are missing a clade name.
+ defined_clades = (clade for clade in df.clade.unique() if clade != '')
+
# For every clade, add edge from root as default
# This way all clades can be reached by traversal
- for clade in df.clade.unique():
+ for clade in defined_clades:
G.add_edge(root, clade)
# Build inheritance graph
@@ -181,7 +186,7 @@ def ensure_no_multiple_mutations(all_muts):
aa_positions = [int(mut[1:-1])-1 for mut in node['aa_muts'][gene]]
if len(set(aa_positions))!=len(aa_positions):
multiples.append(f"Node {name} ({gene})")
-
+
if multiples:
raise AugurError(f"Multiple mutations at the same position on a single branch were found: {', '.join(multiples)}")
@@ -310,7 +315,7 @@ def get_reference_sequence_from_root_node(all_muts, root_name):
except KeyError:
missing.append(gene)
- if missing:
+ if missing:
print(f"WARNING in augur.clades: sequences at the root node have not been specified for {{{', '.join(missing)}}}, \
even though mutations were observed. Clades which are annotated using bases/codons present at the root \
of the tree may not be correctly inferred.")
@@ -358,7 +363,6 @@ def run(args):
ref = get_reference_sequence_from_root_node(all_muts, tree.root.name)
clade_designations = read_in_clade_definitions(args.clades)
-
membership, labels = assign_clades(clade_designations, all_muts, tree, ref)
warn_if_clades_not_found(membership, clade_designations)
=====================================
augur/data/schema-annotations.json
=====================================
@@ -1,34 +1,90 @@
{
"type" : "object",
"$schema": "http://json-schema.org/draft-06/schema#",
- "title": "JSON object for the `annotations` key, typically produced by `augur translate`",
- "description": "Coordinates etc of genes / genome",
+ "$id": "https://nextstrain.org/schemas/augur/annotations",
+ "title": "Schema for the 'annotations' property (node-data JSON) or the 'genome_annotations' property (auspice JSON)",
+ "properties": {
+ "nuc": {
+ "type": "object",
+ "allOf": [{ "$ref": "#/$defs/startend" }],
+ "properties": {
+ "start": {
+ "enum": [1],
+ "$comment": "nuc must begin at 1"
+ },
+ "strand": {
+ "type": "string",
+ "enum":["+"],
+ "description": "Strand is optional for nuc, as it should be +ve for all genomes (-ve strand genomes are reverse complemented)",
+ "$comment": "Auspice will not proceed if the JSON has strand='-'"
+ }
+ },
+ "additionalProperties": true,
+ "$comment": "All other properties are unused by Auspice."
+ }
+ },
+ "required": ["nuc"],
"patternProperties": {
- "^[a-zA-Z0-9*_-]+$": {
+ "^(?!nuc)[a-zA-Z0-9*_-]+$": {
+ "$comment": "Each object here defines a single CDS",
"type": "object",
+ "oneOf": [{ "$ref": "#/$defs/startend" }, { "$ref": "#/$defs/segments" }],
+ "additionalProperties": true,
+ "required": ["strand"],
"properties": {
- "seqid":{
- "description": "Sequence on which the coordinates below are valid. Could be viral segment, bacterial contig, etc",
- "$comment": "Unused by Auspice 2.0",
- "type": "string"
+ "gene": {
+ "type": "string",
+ "description": "The name of the gene the CDS is from. Optional.",
+ "$comment": "Shown in on-hover infobox & influences default CDS colors"
},
- "type": {
- "description": "Type of the feature. could be mRNA, CDS, or similar",
- "$comment": "Unused by Auspice 2.0",
- "type": "string"
+ "strand": {
+ "description": "Strand of the CDS",
+ "type": "string",
+ "enum": ["-", "+"]
},
- "start": {
- "description": "Gene start position (one-based, following GFF format)",
- "type": "number"
+ "color": {
+ "type": "string",
+ "description": "A CSS color or a color hex code. Optional."
},
- "end": {
- "description": "Gene end position (one-based closed, last position of feature, following GFF format)",
- "type": "number"
+ "display_name": {
+ "type": "string",
+ "$comment": "Shown in the on-hover info box"
},
- "strand": {
- "description": "Positive or negative strand",
+ "description": {
"type": "string",
- "enum": ["-","+"]
+ "$comment": "Shown in the on-hover info box"
+ }
+ }
+ }
+ },
+ "$defs": {
+ "startend": {
+ "type": "object",
+ "required": ["start", "end"],
+ "properties": {
+ "start": {
+ "type": "integer",
+ "minimum": 1,
+ "description": "Start position (one-based, following GFF format)"
+ },
+ "end": {
+ "type": "integer",
+ "minimum": 2,
+ "description": "End position (one-based, following GFF format). This value _must_ be greater than the start."
+ }
+ }
+ },
+ "segments": {
+ "type": "object",
+ "required": ["segments"],
+ "properties": {
+ "segments": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "object",
+ "allOf": [{ "$ref": "#/$defs/startend" }]
+ }
}
}
}
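
Note on the new `patternProperties` key: the negative lookahead `(?!nuc)` excludes every key that starts with `nuc`, not only the exact key `nuc`. The sketch below checks both patterns from this diff with Python's `re` module. The sample keys are hypothetical, and JSON Schema regexes nominally follow ECMA 262, although this particular pattern behaves the same way in Python.

```python
import re

# Patterns copied from the old and new patternProperties keys in this diff.
OLD_PATTERN = re.compile(r"^[a-zA-Z0-9*_-]+$")
NEW_PATTERN = re.compile(r"^(?!nuc)[a-zA-Z0-9*_-]+$")

# Hypothetical annotation keys, chosen only to exercise the patterns.
keys = ["nuc", "HA", "ORF1a", "nucleoprotein"]

old_matches = [bool(OLD_PATTERN.search(key)) for key in keys]
new_matches = [bool(NEW_PATTERN.search(key)) for key in keys]

for key, old, new in zip(keys, old_matches, new_matches):
    print(f"{key}: old={old} new={new}")
```

Judging only from the part of the schema shown in this hunk, a key such as `nucleoprotein` would not be rejected outright. It would simply not be checked against the CDS rules.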
=====================================
augur/data/schema-export-v2.json
=====================================
@@ -51,44 +51,7 @@
}
},
"genome_annotations": {
- "description": "Genome annotations (e.g. genes), relative to the reference genome",
- "$comment": "Required for the entropy panel",
- "type": "object",
- "required": ["nuc"],
- "additionalProperties": false,
- "properties": {
- "nuc": {
- "type": "object",
- "properties": {
- "seqid":{
- "description": "Sequence on which the coordinates below are valid. Could be viral segment, bacterial contig, etc",
- "$comment": "currently unused by Auspice",
- "type": "string"
- },
- "type": {
- "description": "Type of the feature. could be mRNA, CDS, or similar",
- "$comment": "currently unused by Auspice",
- "type": "string"
- },
- "start": {
- "description": "Gene start position (one-based, following GFF format)",
- "type": "number"
- },
- "end": {
- "description": "Gene end position (one-based closed, last position of feature, following GFF format)",
- "type": "number"
- },
- "strand": {
- "description": "Positive or negative strand",
- "type": "string",
- "enum": ["-","+"]
- }
- }
- }
- },
- "patternProperties": {
- "^[a-zA-Z0-9*_-]+$": {"$ref": "#/properties/meta/properties/genome_annotations/properties/nuc"}
- }
+ "$ref": "https://nextstrain.org/schemas/augur/annotations"
},
"filters": {
"description": "These appear as filters in the footer of Auspice (which populates the displayed values based upon the tree)",
=====================================
augur/distance.py
=====================================
@@ -31,6 +31,12 @@ tips sampled from previous seasons prior to the given date. These two date
parameters allow users to specify a fixed time interval for pairwise
calculations, limiting the computationally complexity of the comparisons.
+For all distance calculations, a consecutive series of gap characters (`-`)
+counts as a single difference between any pair of sequences. This behavior
+reflects the assumption that there was an underlying biological process that
+produced the insertion or deletion as a single event as opposed to multiple
+independent insertion/deletion events.
+
**Distance maps**
Distance maps are defined in JSON format with two required top-level keys.
@@ -47,6 +53,19 @@ The simplest possible distance map calculates Hamming distance between sequences
"map": {}
}
+To ignore specific characters such as gaps or ambiguous nucleotides from the
+distance calculation, define a top-level `ignored_characters` key with a list of
+characters to ignore.
+
+.. code-block:: json
+
+ {
+ "name": "Hamming distance",
+ "default": 1,
+ "ignored_characters": ["-", "N"],
+ "map": {}
+ }
+
By default, distances are floating point values whose precision can be controlled with the `precision` key that defines the number of decimal places to retain for each distance.
The following example shows how to specify a precision of two decimal places in the final output:
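
Note on the two docstring additions in this hunk (gap runs and `ignored_characters`): the sketch below is one possible reading of the documented behaviour, written in plain Python. It is not augur's implementation. The function name `sketch_distance`, the treatment of adjacent gaps in either sequence as a single run, and the sample sequences are all assumptions made for illustration.

```python
def sketch_distance(seq_a, seq_b, ignored_characters=(), default=1):
    """Hamming-style distance where a run of gap mismatches counts once.

    Illustrative only: this is not augur's implementation.
    """
    ignored = set(ignored_characters)
    distance = 0
    in_gap_run = False  # True while inside a gap run that was already counted
    for a, b in zip(seq_a, seq_b):
        if a in ignored or b in ignored or a == b:
            # Ignored or identical positions contribute nothing and end a run.
            in_gap_run = False
        elif a == "-" or b == "-":
            # Count the first position of a gap run and skip the rest of it.
            if not in_gap_run:
                distance += default
            in_gap_run = True
        else:
            # Ordinary substitution.
            distance += default
            in_gap_run = False
    return distance


reference = "ACGTACGT"
one_deletion = sketch_distance(reference, "AC---CGT")
deletion_and_substitution = sketch_distance(reference, "AC---CGA")
all_ignored = sketch_distance(reference, "AC---CGN", ignored_characters=["-", "N"])

print(one_deletion, deletion_and_substitution, all_ignored)
```

Under these assumptions, the three-base deletion counts as one event, and listing `-` and `N` under `ignored_characters` removes those positions from the comparison entirely.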
=====================================
augur/filter/include_exclude_rules.py
=====================================
@@ -187,7 +187,20 @@ def filter_by_query(metadata, query) -> FilterFunctionReturn:
# Create a copy to prevent modification of the original DataFrame.
metadata_copy = metadata.copy()
- # Try converting all columns to numeric.
+ # Support numeric comparisons in query strings.
+ #
+ # The built-in data type inference when loading the DataFrame does not
+ # support nullable numeric columns, so numeric comparisons won't work on
+ # those columns. pd.to_numeric does proper conversion on those columns, and
+ # will not make any changes to columns with other values.
+ #
+ # TODO: Parse the query string and apply conversion only to columns used for
+ # numeric comparison. Pandas does not expose the API used to parse the query
+ # string internally, so this is non-trivial and requires a bit of
+ # reverse-engineering. Commit 2ead5b3e3306dc1100b49eb774287496018122d9 got
+ # halfway there but had issues so it was reverted.
+ #
+ # TODO: Try boolean conversion?
for column in metadata_copy.columns:
metadata_copy[column] = pd.to_numeric(metadata_copy[column], errors='ignore')
=====================================
augur/refine.py
=====================================
@@ -259,6 +259,12 @@ def run(args):
node_data['clock'] = {'rate': tt.date2dist.clock_rate,
'intercept': tt.date2dist.intercept,
'rtt_Tmrca': -tt.date2dist.intercept/tt.date2dist.clock_rate}
+ # Include the standard deviation of the clock rate, if the covariance
+ # matrix is available.
+ if hasattr(tt.date2dist, "cov") and tt.date2dist.cov is not None:
+ node_data["clock"]["cov"] = tt.date2dist.cov
+ node_data["clock"]["rate_std"] = np.sqrt(tt.date2dist.cov[0, 0])
+
if args.coalescent=='skyline':
try:
skyline, conf = tt.merger_model.skyline_inferred(gen=args.gen_per_year, confidence=2)
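
Note on the new `cov` and `rate_std` keys: a minimal sketch of how the `clock` dictionary is assembled, using plain lists and `math.sqrt` instead of TreeTime objects and NumPy. The helper name `clock_summary` and all numeric values are made up. The only thing taken from the diff is that `rate_std` is the square root of the first diagonal entry of the covariance matrix, and that both keys are omitted when no covariance is available.

```python
import math


def clock_summary(rate, intercept, cov=None):
    """Build a dict shaped like node_data['clock'] in this diff (sketch only)."""
    clock = {
        "rate": rate,
        "intercept": intercept,
        "rtt_Tmrca": -intercept / rate,
    }
    # Mirror the diff: add cov and rate_std only when a covariance exists.
    if cov is not None:
        clock["cov"] = cov
        clock["rate_std"] = math.sqrt(cov[0][0])
    return clock


# Hypothetical covariance matrix; the first diagonal entry is the rate variance.
example_cov = [[4.0e-8, -4.0e-5],
               [-4.0e-5, 1.6e-1]]

estimated = clock_summary(0.0012, -2.4, cov=example_cov)
fixed = clock_summary(0.0012, -2.4)  # e.g. clock rate supplied by the user

print(sorted(estimated))
print(sorted(fixed))
```

This matches the two cram tests in this push: the regular time tree test expects `cov` and `rate_std` in the JSON, while the fixed clock rate test expects only `intercept`, `rate`, and `rtt_Tmrca`.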
=====================================
augur/validate.py
=====================================
@@ -25,7 +25,7 @@ class ValidateError(Exception):
pass
-def load_json_schema(path):
+def load_json_schema(path, refs=None):
'''
Load a JSON schema from the augur included set of schemas
(located in augur/data)
@@ -40,7 +40,28 @@ def load_json_schema(path):
Validator.check_schema(schema)
except jsonschema.exceptions.SchemaError as err:
raise ValidateError(f"Schema {path} is not a valid JSON Schema ({Validator.META_SCHEMA['$schema']}). Error: {err}")
- return Validator(schema)
+
+ if refs:
+ # Make the validator aware of additional schemas
+ schema_store = {k: json.loads(resource_string(__package__, os.path.join("data", v))) for k,v in refs.items()}
+ resolver = jsonschema.RefResolver.from_schema(schema,store=schema_store)
+ schema_validator = Validator(schema, resolver=resolver)
+ else:
+ schema_validator = Validator(schema)
+
+ # By default $ref URLs which we don't define in a schema_store are fetched
+ # by jsonschema. This often indicates a typo (the $ref doesn't match the key
+ # of the schema_store) or we forgot to add a local mapping for a new $ref.
+ # Either way, Augur should not be accessing the network.
+ def resolve_remote(url):
+ # The exception type is not important as jsonschema will catch & re-raise as a RefResolutionError
+ raise Exception(f"The schema used for validation attempted to fetch the remote URL {url!r}. " +
+ "Augur should resolve schema references to local files, please check the schema used " +
+ "and update the appropriate schema_store as needed." )
+ schema_validator.resolver.resolve_remote = resolve_remote
+
+ return schema_validator
+
def load_json(path):
with open(path, 'rb') as fh:
@@ -163,7 +184,15 @@ def auspice_config_v2(config_json, **kwargs):
validate(config, schema, config_json)
def export_v2(main_json, **kwargs):
- main_schema = load_json_schema("schema-export-v2.json")
+ # The main_schema uses references to other schemas, and the suggested use is
+ # to define these refs as valid URLs. Augur itself should not access schemas
+ # over the wire so we provide a mapping between URLs and filepaths here. The
+ # filepath is specified relative to ./augur/data (where all the schemas
+ # live).
+ refs = {
+ 'https://nextstrain.org/schemas/augur/annotations': "schema-annotations.json"
+ }
+ main_schema = load_json_schema("schema-export-v2.json", refs)
if main_json.endswith("frequencies.json") or main_json.endswith("entropy.json") or main_json.endswith("sequences.json"):
raise ValidateError("This validation subfunction is for the main `augur export v2` JSON only.")
=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+augur (22.4.0-1) unstable; urgency=medium
+
+ * New upstream version
+
+ -- Étienne Mollier <emollier at debian.org> Fri, 01 Sep 2023 21:34:48 +0200
+
augur (22.3.0-1) unstable; urgency=medium
* New upstream version
=====================================
tests/functional/filter/cram/filter-query-numerical.t
=====================================
@@ -5,11 +5,11 @@ Setup
Create metadata file for testing.
$ cat >metadata.tsv <<~~
- > strain coverage
- > SEQ_1 0.94
- > SEQ_2 0.95
- > SEQ_3 0.96
- > SEQ_4
+ > strain coverage category
+ > SEQ_1 0.94 A
+ > SEQ_2 0.95 B
+ > SEQ_3 0.96 C
+ > SEQ_4
> ~~
The 'coverage' column should be query-able by numerical comparisons.
@@ -22,3 +22,14 @@ The 'coverage' column should be query-able by numerical comparisons.
$ sort filtered_strains.txt
SEQ_2
SEQ_3
+
+The 'category' column will fail when used with a numerical comparison.
+
+ $ ${AUGUR} filter \
+ > --metadata metadata.tsv \
+ > --query "category >= 0.95" \
+ > --output-strains filtered_strains.txt
+ ERROR: Internal Pandas error when applying query:
+ '>=' not supported between instances of 'str' and 'float'
+ Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.
+ [2]
=====================================
tests/functional/refine/cram/timetree.t
=====================================
@@ -21,3 +21,23 @@ Confirm that TreeTime trees match expected topology and branch lengths.
$ python3 "$TESTDIR/../../../../scripts/diff_trees.py" "$TESTDIR/../data/tree.nwk" tree.nwk --significant-digits 2
{}
+
+Confirm that JSON output includes details about the clock rate.
+
+ $ grep -A 15 '\"clock\"' branch_lengths.json
+ "clock": {
+ "cov": [
+ [
+ .*, (re)
+ .* (re)
+ ],
+ [
+ .*, (re)
+ .* (re)
+ ]
+ ],
+ "intercept": .*, (re)
+ "rate": .*, (re)
+ "rate_std": .*, (re)
+ "rtt_Tmrca": .* (re)
+ },
=====================================
tests/functional/refine/cram/timetree_with_fixed_clock_rate.t
=====================================
@@ -0,0 +1,29 @@
+Setup
+
+ $ source "$TESTDIR"/_setup.sh
+
+Try building a time tree with a fixed clock rate and clock std dev.
+
+ $ ${AUGUR} refine \
+ > --tree "$TESTDIR/../data/tree_raw.nwk" \
+ > --alignment "$TESTDIR/../data/aligned.fasta" \
+ > --metadata "$TESTDIR/../data/metadata.tsv" \
+ > --output-tree tree.nwk \
+ > --output-node-data branch_lengths.json \
+ > --timetree \
+ > --coalescent opt \
+ > --date-confidence \
+ > --date-inference marginal \
+ > --clock-rate 0.0012 \
+ > --clock-std-dev 0.0002 \
+ > --clock-filter-iqd 4 \
+ > --seed 314159 &> /dev/null
+
+Confirm that JSON output does not include information about the clock rate std dev, since it was provided by the user.
+
+ $ grep -A 4 '\"clock\"' branch_lengths.json
+ "clock": {
+ "intercept": .*, (re)
+ "rate": .*, (re)
+ "rtt_Tmrca": .* (re)
+ },
=====================================
tests/test_validate.py
=====================================
@@ -4,7 +4,10 @@ import random
from augur.validate import (
validate_collection_config_fields,
validate_collection_display_defaults,
- validate_measurements_config
+ validate_measurements_config,
+ load_json_schema,
+ validate_json,
+ ValidateError
)
@@ -88,3 +91,71 @@ class TestValidateMeasurements():
}
assert not validate_measurements_config(measurements)
assert capsys.readouterr().err == "ERROR: The default collection key 'invalid_collection' does not match any of the collections' keys.\n"
+
+
+@pytest.fixture
+def genome_annotation_schema():
+ return load_json_schema("schema-annotations.json")
+
+class TestValidateGenomeAnnotations():
+ def test_negative_strand_nuc(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 200, "strand": "-"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_nuc_not_starting_at_one(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 100, "end": 200, "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_missing_nuc(self, capsys, genome_annotation_schema):
+ d = {"cds": {"start": 100, "end": 200, "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_missing_properties(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100}, "cds": {"start": 20, "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_not_stranded_cds(self, capsys, genome_annotation_schema):
+ # Strand . is for features that are not stranded (as per GFF spec), and thus they're not CDSs
+ d = {"nuc": {"start": 1, "end": 100}, "cds": {"start": 18, "end": 20, "strand": "."}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_negative_coordinates(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100}, "cds": {"start": -2, "end": 10, "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_valid_genome(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100}, "cds": {"start": 20, "end": 28, "strand": "+"}}
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_valid_segmented_genome(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100},
+ "cds": {"segments": [{"start": 20, "end": 28}], "strand": "+"}}
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_invalid_segmented_genome(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100},
+ "cds": {"segments": [{"start": 20, "end": 28}, {"start": 27}], "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
+
+ def test_string_coordinates(self, capsys, genome_annotation_schema):
+ d = {"nuc": {"start": 1, "end": 100},
+ "cds": {"segments": [{"start": 20, "end": 28}, {"start": "27", "end": "29"}], "strand": "+"}}
+ with pytest.raises(ValidateError):
+ validate_json(d, genome_annotation_schema, "<test-json>")
+ capsys.readouterr() # suppress validation error printing
\ No newline at end of file
=====================================
tests/util_support/test_node_data_file.py
=====================================
@@ -38,7 +38,7 @@ class TestNodeDataFile:
build_node_data_file(
f"""
{{
- "annotations": {{ "a": {{ "start": 5 }} }},
+ "annotations": {{ "nuc": {{ "start": 1, "end": 100 }} }},
"generated_by": {{ "program": "augur", "version": "{__version__}" }},
"nodes": {{ "a": 5 }}
}}
View it on GitLab: https://salsa.debian.org/med-team/augur/-/compare/b3ac3e3e0e1c03d328b0018585af3e6e8acfe278...4768719855138a6e6cefd7ebded3696314650f40