[med-svn] [Git][med-team/augur][master] 3 commits: routine-update: New upstream version
Andreas Tille (@tille)
gitlab at salsa.debian.org
Thu Feb 8 06:17:17 GMT 2024
Andreas Tille pushed to branch master at Debian Med / augur
Commits:
5e6f94ab by Andreas Tille at 2024-02-08T06:13:50+01:00
routine-update: New upstream version
- - - - -
ff14f170 by Andreas Tille at 2024-02-08T06:13:51+01:00
New upstream version 24.1.0
- - - - -
571e2557 by Andreas Tille at 2024-02-08T06:14:11+01:00
Update upstream source from tag 'upstream/24.1.0'
Update to upstream version '24.1.0'
with Debian dir f86674db5eca5b549f50f06a1bbdabb7621e2f8b
- - - - -
28 changed files:
- .github/dependabot.yml
- .github/workflows/ci.yaml
- .github/workflows/release.yaml
- CHANGES.md
- augur/__version__.py
- augur/filter/_run.py
- augur/frequencies.py
- augur/index.py
- augur/io/metadata.py
- augur/io/vcf.py
- augur/refine.py
- augur/traits.py
- augur/translate.py
- augur/utils.py
- debian/changelog
- docs/conf.py
- docs/installation/installation.rst
- + docs/installation/non-python-dependencies.rst
- setup.py
- tests/functional/ancestral/cram/case-sensitive.t
- tests/functional/ancestral/cram/general.t
- tests/functional/ancestral/cram/vcf-multi-allele.t
- tests/functional/ancestral/cram/vcf.t
- tests/functional/ancestral/data/simple-genome/nt_muts.ref-seq.json
- tests/functional/filter/cram/filter-query-numerical.t
- + tests/functional/frequencies/cram/weights.t
- tests/functional/translate/cram/root-mutations.t
- tests/io/test_vcf.py
Changes:
=====================================
.github/dependabot.yml
=====================================
@@ -1,13 +1,16 @@
# Dependabot configuration file
-# Reference: https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
+# <https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file>
#
-# Each ecosystem is checked on a scheduled interval defined below.
-# To trigger a check manually, go to https://github.com/nextstrain/augur/network/updates
-# and click "Last checked …" then "Check for updates".
-
+# Each ecosystem is checked on a scheduled interval defined below. To trigger
+# a check manually, go to
+#
+# https://github.com/nextstrain/augur/network/updates
+#
+# and look for a "Check for updates" button. You may need to click around a
+# bit first.
+---
version: 2
updates:
-
- package-ecosystem: "github-actions"
directory: "/"
schedule:
=====================================
.github/workflows/ci.yaml
=====================================
@@ -27,17 +27,8 @@ jobs:
biopython-version:
# list of Biopython versions with support for a new Python version
# from https://github.com/biopython/biopython/blob/master/NEWS.rst
- - '1.76' # first to support Python 3.8
- - '1.79' # first to support Python 3.9
- '1.80' # first to support Python 3.10 and 3.11
- '' # latest
- exclude:
- # some older Biopython versions are incompatible with later Python versions
- - { biopython-version: '1.76', python-version: '3.9' }
- - { biopython-version: '1.76', python-version: '3.10' }
- - { biopython-version: '1.76', python-version: '3.11' }
- - { biopython-version: '1.79', python-version: '3.10' }
- - { biopython-version: '1.79', python-version: '3.11' }
defaults:
run:
shell: bash -l {0}
@@ -46,19 +37,28 @@ jobs:
COVERAGE_RCFILE: ${{ github.workspace }}/.coveragerc
steps:
- uses: actions/checkout@v4
- - uses: conda-incubator/setup-miniconda@v3
+
+ - name: Set cache key
+ run: echo "DATE=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
+
+ # While it may be tempting to install Augur using Conda to avoid hardcoding
+ # the list of dependencies here, installing just the dependencies not
+ # available on PyPI allows us to test against dependencies installed by pip,
+ # which may have slightly different versions compared to Conda counterparts.
+ - name: Install dependencies from Conda
+ uses: mamba-org/setup-micromamba@v1
with:
- python-version: ${{ matrix.python-version }}
- miniforge-variant: Mambaforge
- channels: conda-forge,bioconda
- - run: |
- mamba install \
- mafft \
- raxml \
- fasttree \
- iqtree \
- vcftools \
- biopython=${{ matrix.biopython-version }}
+ create-args: mafft raxml fasttree iqtree vcftools biopython=${{ matrix.biopython-version }} python=${{ matrix.python-version }}
+ condarc: |
+ channels:
+ - conda-forge
+ - bioconda
+ channel_priority: strict
+ cache-environment: true
+ cache-environment-key: ${{ env.DATE }}
+ environment-name: augur
+
+ # Replace the Conda Augur installation with the local version.
- run: pip install .[dev]
- run: conda info
- run: conda list
@@ -114,10 +114,11 @@ jobs:
path: ./augur
- name: Set cache key
- id: cache-key
run: echo "DATE=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
- - uses: mamba-org/setup-micromamba@v1
+ # Set up a Conda environment that replicates Nextstrain's Conda runtime.
+ - name: Install nextstrain-base from Conda
+ uses: mamba-org/setup-micromamba@v1
with:
create-args: nextstrain-base
condarc: |
@@ -130,6 +131,7 @@ jobs:
cache-environment-key: ${{ env.DATE }}
environment-name: augur
+ # Replace the Conda Augur installation with the local version.
- run: pip install ./augur
- uses: actions/checkout@v4
=====================================
.github/workflows/release.yaml
=====================================
@@ -1,4 +1,5 @@
-name: Release
+run-name: Release ${{ inputs.version }}
+
on:
workflow_dispatch:
inputs:
=====================================
CHANGES.md
=====================================
@@ -3,6 +3,23 @@
## __NEXT__
+## 24.1.0 (30 January 2024)
+
+### Features
+
+* `augur.io.read_metadata`: A new optional `dtype` argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. [#1252][] (@victorlin)
+* `augur.io.read_vcf` has been removed and usage replaced with TreeTime's function of the same name which has improved validation of the VCF file. [#1366][] (@jameshadfield)
+
+### Bug Fixes
+
+* filter, frequencies, refine: Speed up reading of the metadata file. [#1252][] (@victorlin)
+* traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. [#1252][] (@victorlin)
+* Support Biopython `≥1.82` by requiring bcbio-gff `≥0.7.1`. [#1400][] (@victorlin)
+
+[#1252]: https://github.com/nextstrain/augur/pull/1252
+[#1366]: https://github.com/nextstrain/augur/pull/1366
+[#1400]: https://github.com/nextstrain/augur/pull/1400
+
## 24.0.0 (22 January 2024)
### Major Changes
=====================================
augur/__version__.py
=====================================
@@ -1,4 +1,4 @@
-__version__ = '24.0.0'
+__version__ = '24.1.0'
def is_augur_version_compatible(version):
=====================================
augur/filter/_run.py
=====================================
@@ -169,6 +169,7 @@ def run(args):
delimiters=args.metadata_delimiters,
id_columns=args.metadata_id_columns,
chunk_size=args.metadata_chunk_size,
+ dtype="string",
)
except InvalidDelimiter:
raise AugurError(
@@ -320,6 +321,7 @@ def run(args):
delimiters=args.metadata_delimiters,
id_columns=args.metadata_id_columns,
chunk_size=args.metadata_chunk_size,
+ dtype="string",
)
for metadata in metadata_reader:
# Recalculate groups for subsampling as we loop through the
=====================================
augur/frequencies.py
=====================================
@@ -85,7 +85,14 @@ def format_frequencies(freq):
def run(args):
try:
- metadata = read_metadata(args.metadata, delimiters=args.metadata_delimiters, id_columns=args.metadata_id_columns)
+ # TODO: load only the ID, date, and --weights-attribute columns when
+ # read_metadata supports loading a subset of all columns.
+ metadata = read_metadata(
+ args.metadata,
+ delimiters=args.metadata_delimiters,
+ id_columns=args.metadata_id_columns,
+ dtype="string",
+ )
except InvalidDelimiter:
raise AugurError(
f"Could not determine the delimiter of {args.metadata!r}. "
@@ -114,10 +121,9 @@ def run(args):
tip.attr = {"num_date": np.mean(dates[tip.name])}
tps.append(tip.attr["num_date"])
- # Annotate tips with metadata to enable filtering and weighting of
- # frequencies by metadata attributes.
- for key, value in metadata.loc[tip.name].items():
- tip.attr[key] = value
+ if weights_attribute:
+ # Annotate tip with weight attribute.
+ tip.attr[weights_attribute] = metadata.loc[tip.name, weights_attribute]
if args.method == "diffusion":
# estimate tree frequencies
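In the frequencies hunk above, tips are no longer annotated with every metadata column; only the `--weights-attribute` column is copied, and only when one is set. A minimal sketch of that step, using plain dicts in place of augur's pandas DataFrame and Bio.Phylo tip objects (`annotate_tip` is a hypothetical helper, and the rows are taken from the `weights.t` test in this push):

```python
# Hypothetical sketch: plain dicts stand in for augur's metadata DataFrame and
# for the `tip.attr` dicts on Bio.Phylo tips.

def annotate_tip(tip_attr, tip_name, metadata, weights_attribute=None):
    """Copy only the weighting column (if any) onto a tip's attribute dict."""
    if weights_attribute:
        # Mirrors: tip.attr[weights_attribute] = metadata.loc[tip.name, weights_attribute]
        tip_attr[weights_attribute] = metadata[tip_name][weights_attribute]
    return tip_attr


# Rows borrowed from tests/functional/frequencies/cram/weights.t.
metadata = {
    "SEQ2": {"date": "2021-01-02", "region": "B"},
    "SEQ3": {"date": "2021-01-01", "region": "C"},
}

weighted = annotate_tip({"num_date": 2021.0}, "SEQ2", metadata, weights_attribute="region")
unweighted = annotate_tip({"num_date": 2021.0}, "SEQ3", metadata)
```

Unrelated columns such as `date` are no longer copied onto the tip, which keeps the per-tip attributes limited to what the weighting step reads.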
=====================================
augur/index.py
=====================================
@@ -7,7 +7,9 @@ import csv
from .io.file import open_file
from .io.sequences import read_sequences
-from .io.vcf import is_vcf, read_vcf
+from .io.vcf import is_vcf
+from treetime.vcf_utils import read_vcf
+
DELIMITER = '\t'
@@ -40,7 +42,8 @@ def index_vcf(vcf_path, index_path):
number of strains indexed
"""
- strains, _ = read_vcf(vcf_path)
+ strains = list(read_vcf(vcf_path)['sequences'].keys())
+
num_of_seqs = 0
with open_file(index_path, 'wt') as out_file:
=====================================
augur/io/metadata.py
=====================================
@@ -24,7 +24,7 @@ class InvalidDelimiter(Exception):
pass
-def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAULT_ID_COLUMNS, chunk_size=None):
+def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAULT_ID_COLUMNS, chunk_size=None, dtype=None):
r"""Read metadata from a given filename and into a pandas `DataFrame` or
`TextFileReader` object.
@@ -40,7 +40,9 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAU
Only one id column will be inferred.
chunk_size : int
Size of chunks to stream from disk with an iterator instead of loading the entire input file into memory.
-
+ dtype : dict or str
+ Data types to apply to columns in metadata. If unspecified, pandas data type inference will be used.
+ See documentation for an argument of the same name to `pandas.read_csv()`.
Returns
-------
pandas.DataFrame or `pandas.io.parsers.TextFileReader`
@@ -107,15 +109,31 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAU
else:
index_col = id_columns_present[0]
- # If we found a valid column to index the DataFrame, specify that column and
- # also tell pandas that the column should be treated like a string instead
- # of having its type inferred. This latter argument allows users to provide
- # numerical ids that don't get converted to numbers by pandas.
+ # If we found a valid column to index the DataFrame, specify that column.
kwargs["index_col"] = index_col
- kwargs["dtype"] = {
- index_col: "string",
- METADATA_DATE_COLUMN: "string"
- }
+
+ if dtype is None:
+ dtype = {}
+
+ if isinstance(dtype, dict):
+ # Avoid reading numerical IDs as integers.
+ dtype["index_col"] = "string"
+
+ # Avoid reading year-only dates as integers.
+ dtype[METADATA_DATE_COLUMN] = "string"
+
+ elif isinstance(dtype, str):
+ if dtype != "string":
+ raise AugurError(f"""
+ dtype='{dtype}' converts values in all columns to be of type
+ '{dtype}'. However, values in columns '{index_col}' and
+ '{METADATA_DATE_COLUMN}' must be treated as strings in Augur.
+ Specify dtype as a dict per column instead.
+ """)
+ else:
+ raise AugurError(f"Unsupported value for dtype: '{dtype}'")
+
+ kwargs["dtype"] = dtype
return pd.read_csv(
metadata_file,
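The new `dtype` handling in `read_metadata` can be sketched outside pandas as a function that computes the value later passed as `kwargs["dtype"]`. This is a hypothetical illustration, not augur's code: `resolve_dtype` is an invented name, `AugurError` is a local stand-in, and `"date"` is assumed as the value of `METADATA_DATE_COLUMN`. One difference is deliberate: the hunk above writes `dtype["index_col"]`, which keys on the literal string "index_col" rather than on the detected ID column, whereas the sketch keys on the column name held in `index_col`, which appears to be what the accompanying comment intends.

```python
class AugurError(Exception):
    """Local stand-in for augur's AugurError (assumption)."""


def resolve_dtype(dtype, index_col, date_column="date"):
    """Return the dtype value that would be handed on to pandas.read_csv()."""
    if dtype is None:
        dtype = {}

    if isinstance(dtype, dict):
        # Copy so the caller's dict is not mutated.
        dtype = dict(dtype)
        # Avoid reading numerical IDs as integers: key on the column's name.
        dtype[index_col] = "string"
        # Avoid reading year-only dates as integers.
        dtype[date_column] = "string"
    elif isinstance(dtype, str):
        if dtype != "string":
            raise AugurError(
                f"dtype={dtype!r} would convert the {index_col!r} and "
                f"{date_column!r} columns; specify dtype as a dict per column instead."
            )
    else:
        raise AugurError(f"Unsupported value for dtype: {dtype!r}")

    return dtype
```

With a dict, the caller's entries are kept and the ID and date columns are forced to strings; a single string is accepted only when it is `"string"`, matching the error path in the hunk.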
=====================================
augur/io/vcf.py
=====================================
@@ -24,22 +24,6 @@ def is_vcf(filename):
return bool(filename) and any(filename.lower().endswith(x) for x in ('.vcf', '.vcf.gz'))
-def read_vcf(filename):
- if filename.lower().endswith(".gz"):
- import gzip
- file = gzip.open(filename, mode="rt", encoding='utf-8')
- else:
- file = open(filename, encoding='utf-8')
-
- chrom_line = next(line for line in file if line.startswith("#C"))
- file.close()
- headers = chrom_line.strip().split("\t")
- sequences = headers[headers.index("FORMAT") + 1:]
-
- # because we need 'seqs to remove' for VCF
- return sequences, sequences.copy()
-
-
def write_vcf(input_filename, output_filename, dropped_samps):
if _filename_gz(input_filename):
input_arg = "--gzvcf"
=====================================
augur/refine.py
=====================================
@@ -214,10 +214,14 @@ def run(args):
print("ERROR: meta data with dates is required for time tree reconstruction", file=sys.stderr)
return 1
try:
+ # TODO: load only the ID and date columns when read_metadata
+ # supports loading a subset of all columns.
metadata = read_metadata(
args.metadata,
delimiters=args.metadata_delimiters,
- id_columns=args.metadata_id_columns)
+ id_columns=args.metadata_id_columns,
+ dtype="string",
+ )
except InvalidDelimiter:
raise AugurError(
f"Could not determine the delimiter of {args.metadata!r}. "
=====================================
augur/traits.py
=====================================
@@ -131,7 +131,11 @@ def run(args):
traits = read_metadata(
args.metadata,
delimiters=args.metadata_delimiters,
- id_columns=args.metadata_id_columns)
+ id_columns=args.metadata_id_columns,
+
+ # Read all columns as string for discrete trait analysis
+ dtype="string",
+ )
except InvalidDelimiter:
raise AugurError(
f"Could not determine the delimiter of {args.metadata!r}. "
=====================================
augur/translate.py
=====================================
@@ -185,7 +185,7 @@ def translate_vcf_feature(sequences, ref, feature, feature_name):
#Translate just the codon this nuc diff is in, and find out which AA loc
#But need numbering to be w/in protin, not whole genome
- if feature.strand == -1:
+ if feature.location.strand == -1:
aaRepLocs = {(end-start-i-1)//3:safe_translate( str_reverse_comp( "".join([sequences[seqk][key+start]
if key+start in sequences[seqk].keys() else ref[key+start]
for key in range(i-i%3,i+3-i%3)]) ))
=====================================
augur/utils.py
=====================================
@@ -296,12 +296,12 @@ def _read_gff(reference, feature_names):
# Note that `GFF.parse` doesn't always yield GFF records in the order
# one may expect, but since we raise AugurError if there are multiple
# this doesn't matter.
- gff_entries = list(GFF.parse(in_handle, limit_info={'gff_type': valid_types}))
-
- # TODO: Remove warning filter reversal after it's addressed upstream:
- # <https://github.com/chapmanb/bcbb/issues/140>
+ # TODO: Remove warning suppression after it's addressed upstream:
+ # <https://github.com/chapmanb/bcbb/issues/143>
import warnings
from Bio import BiopythonDeprecationWarning
+ warnings.simplefilter("ignore", BiopythonDeprecationWarning)
+ gff_entries = list(GFF.parse(in_handle, limit_info={'gff_type': valid_types}))
warnings.simplefilter("default", BiopythonDeprecationWarning)
if len(gff_entries) == 0:
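The `_read_gff` change turns `BiopythonDeprecationWarning` off around `GFF.parse` and then calls `warnings.simplefilter("default", ...)`, which adds a new "default" filter rather than restoring whatever filters were active before. A scoped alternative is `warnings.catch_warnings()`, which saves the filter list on entry and restores it on exit. The sketch below uses hypothetical stand-ins for the Biopython warning class and the parser so that it runs with the standard library only:

```python
import warnings


class DemoDeprecationWarning(Warning):
    """Hypothetical stand-in for Bio.BiopythonDeprecationWarning."""


def noisy_parse():
    """Hypothetical stand-in for GFF.parse() emitting a deprecation warning."""
    warnings.warn("old Biopython API", DemoDeprecationWarning)
    return ["record"]


def parse_quietly():
    # The previous filter list is restored when the block exits, even if
    # parsing raises, so the "ignore" filter cannot leak to other code.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DemoDeprecationWarning)
        return noisy_parse()
```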
=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+augur (24.1.0-1) UNRELEASED; urgency=medium
+
+ * New upstream version
+
+ -- Andreas Tille <tille at debian.org> Thu, 08 Feb 2024 06:13:50 +0100
+
augur (24.0.0-1) unstable; urgency=medium
* New upstream version (Closes: #1044079)
=====================================
docs/conf.py
=====================================
@@ -64,6 +64,7 @@ extensions = [
'sphinx_autodoc_typehints', # must come after napoleon https://github.com/tox-dev/sphinx-autodoc-typehints/blob/1.21.4/README.md#compatibility-with-sphinxextnapoleon
'sphinx_markdown_tables',
'sphinx.ext.intersphinx',
+ 'sphinx_tabs.tabs',
'nextstrain.sphinx.theme',
]
@@ -137,6 +138,7 @@ nitpick_ignore = [
intersphinx_mapping = {
'Bio': ('https://biopython.org/docs/latest/api/', None),
'docs.nextstrain.org': ('https://docs.nextstrain.org/en/latest/', None),
+ 'cli': ('https://docs.nextstrain.org/projects/cli/en/stable', None),
'python': ('https://docs.python.org/3', None),
'numpy': ('https://numpy.org/doc/stable', None),
'pandas': ('https://pandas.pydata.org/docs', None),
=====================================
docs/installation/installation.rst
=====================================
@@ -8,87 +8,74 @@ Installation
.. contents::
:local:
-Installing dependencies
-=======================
+Install Augur
+=============
-Augur uses some external bioinformatics programs:
+There are several ways to install Augur, ordered from least to most complex.
-- ``augur align`` requires `mafft <https://mafft.cbrc.jp/alignment/software/>`__
+.. tabs::
-- ``augur tree`` requires at least one of:
+ .. group-tab:: Nextstrain
- - `IQ-TREE <http://www.iqtree.org/>`__ (used by default)
- - `RAxML <https://sco.h-its.org/exelixis/web/software/raxml/>`__ (optional alternative)
- - `FastTree <http://www.microbesonline.org/fasttree/>`__ (optional alternative)
+ Augur is part of the Nextstrain project and is available in all :term:`Nextstrain runtimes <docs.nextstrain.org:runtime>`.
-- Bacterial data (or any VCF usage) requires `vcftools <https://vcftools.github.io/>`__
+ Continue by following the :doc:`Nextstrain installation guide <docs.nextstrain.org:install>`.
-If you use Conda or Mamba, you can install them in an active environment:
+ Once installed, you can use :doc:`cli:commands/shell` to run ``augur`` directly.
-.. code:: bash
+ .. group-tab:: Conda
- conda install -c conda-forge -c bioconda mafft raxml fasttree iqtree vcftools --yes
+ Augur can be installed using Conda or another variant. This assumes you are familiar with how to `manage Conda environments <https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`__.
-On macOS using `Homebrew <https://brew.sh/>`__:
+ .. code:: bash
-.. code:: bash
+ conda install -c conda-forge -c bioconda augur
- brew tap brewsci/bio
- brew install mafft iqtree raxml fasttree vcftools
+ This installs Augur along with all dependencies.
-On Debian/Ubuntu:
+ .. group-tab:: PyPI
-.. code:: bash
+ .. warning::
+ Installing other Python packages after Augur may cause dependency incompatibilities. If this happens, re-install Augur using the command in step 1.
- sudo apt install mafft iqtree raxml fasttree vcftools
+ Augur is written in Python 3 and requires at least Python 3.8. It's published on `PyPI <https://pypi.org>`__ as `nextstrain-augur <https://pypi.org/project/nextstrain-augur>`__.
-Other Linux distributions will likely have the same packages available, although the names may differ slightly.
+ 1. Install Augur along with Python dependencies.
-Install Augur as a user
-=======================
+ .. code:: bash
-Using Mamba
------------
+ python3 -m pip install nextstrain-augur
-This assumes you have Conda installed and an environment active. If not, refer to instructions for ambient runtime setup on `the Nextstrain installation guide <https://docs.nextstrain.org/en/latest/install.html>`__.
+ 2. Install other dependencies.
-.. code:: bash
+ .. include:: non-python-dependencies.rst
- conda install -c conda-forge -c bioconda augur
+ .. group-tab:: Source
-If you encounter environment solving errors or want a faster installation process, use `mamba <https://github.com/TheSnakePit/mamba>`__ as a drop-in replacement for conda:
+ .. warning::
+ Installing other Python packages after Augur may cause dependency incompatibilities. If this happens, re-install Augur using the command in step 1.
-.. code:: bash
+ Augur can be installed from source. This is useful if you want to use unreleased changes or develop Augur locally.
- mamba install -c conda-forge -c bioconda augur
+ 1. Install Augur along with Python dependencies.
-Using pip from PyPi
--------------------
+ .. code:: bash
-Augur is written in Python 3 and requires at least Python 3.8. It's published on `PyPi <https://pypi.org>`__ as `nextstrain-augur <https://pypi.org/project/nextstrain-augur>`__, so you can install it with ``pip`` like so:
+ git clone https://github.com/nextstrain/augur.git
+ cd augur
+ python3 -m pip install .
-.. code:: bash
+ .. note::
- python3 -m pip install nextstrain-augur
+ For local development, install from source in editable mode with ``dev`` dependencies.
-From source
------------
+ .. code:: bash
-.. code:: bash
+ python3 -m pip install -e .'[dev]'
- git clone https://github.com/nextstrain/augur.git
- python3 -m pip install .
+ 2. Install other dependencies.
-This installs Augur along with external Python dependencies.
-
-Install Augur as a developer
-============================
-
-.. code:: bash
-
- python3 -m pip install -e '.[dev]'
-
-This installs dependencies necessary for local development.
+ .. include:: non-python-dependencies.rst
Testing if it worked
====================
=====================================
docs/installation/non-python-dependencies.rst
=====================================
@@ -0,0 +1,32 @@
+Augur uses some external bioinformatics programs that are not available on PyPI:
+
+- ``augur align`` requires `mafft <https://mafft.cbrc.jp/alignment/software/>`__
+
+- ``augur tree`` requires at least one of:
+
+ - `IQ-TREE <http://www.iqtree.org/>`__ (used by default)
+ - `RAxML <https://sco.h-its.org/exelixis/web/software/raxml/>`__ (optional alternative)
+ - `FastTree <http://www.microbesonline.org/fasttree/>`__ (optional alternative)
+
+- Bacterial data (or any VCF usage) requires `vcftools <https://vcftools.github.io/>`__
+
+If you use Conda, you can install them in an active environment:
+
+.. code:: bash
+
+ conda install -c conda-forge -c bioconda mafft raxml fasttree iqtree vcftools --yes
+
+On macOS using `Homebrew <https://brew.sh/>`__:
+
+.. code:: bash
+
+ brew tap brewsci/bio
+ brew install mafft iqtree raxml fasttree vcftools
+
+On Debian/Ubuntu:
+
+.. code:: bash
+
+ sudo apt install mafft iqtree raxml fasttree vcftools
+
+Other Linux distributions will likely have the same packages available, although the names may differ slightly.
=====================================
setup.py
=====================================
@@ -51,9 +51,9 @@ setuptools.setup(
package_data = {'augur': ['data/*']},
python_requires = '>={}'.format('.'.join(str(n) for n in py_min_version)),
install_requires = [
- "bcbio-gff >=0.7.0, ==0.7.*",
- # Skip Biopython >=1.82: https://github.com/nextstrain/augur/issues/1373
- "biopython >=1.67, !=1.77, !=1.78, <1.82",
+ "bcbio-gff >=0.7.1, ==0.7.*",
+ # TODO: Remove biopython >= 1.80 pin if it is added to bcbio-gff: https://github.com/chapmanb/bcbb/issues/142
+ "biopython >=1.80, ==1.*",
"cvxopt >=1.1.9, ==1.*",
"importlib_resources >=5.3.0; python_version < '3.11'",
"isodate ==0.6.*",
@@ -87,6 +87,7 @@ setuptools.setup(
"sphinx-markdown-tables >= 0.0.9",
"sphinx-rtd-theme >=0.4.3",
"sphinx-autodoc-typehints >=1.21.4",
+ "sphinx-tabs",
"types-jsonschema >=3.0.0, ==3.*",
"types-setuptools",
"wheel >=0.32.3",
=====================================
tests/functional/ancestral/cram/case-sensitive.t
=====================================
@@ -16,7 +16,7 @@ Change the _reference_ to lowercase
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
> "nt_muts.ref-seq.json" \
- > --exclude-paths "root['meta']['updated']"
+ > --exclude-paths "root['generated_by']"
{}
@@ -38,5 +38,5 @@ be lowecase which will be compared against the uppercase reference
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
> "nt_muts.ref-seq.json" \
- > --exclude-paths "root['meta']['updated']"
+ > --exclude-paths "root['generated_by']"
{}
\ No newline at end of file
=====================================
tests/functional/ancestral/cram/general.t
=====================================
@@ -20,7 +20,7 @@ node-data JSON we diff against.
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
> "nt_muts.ref-seq.json" \
- > --exclude-paths "root['meta']['updated']"
+ > --exclude-paths "root['generated_by']"
{}
Same as above but without providing a `--root-sequence`. The effect of this on behaviour is:
@@ -40,5 +40,5 @@ mutations (as there's nothing to compare the root node to)
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$TESTDIR/../data/simple-genome/nt_muts.no-ref-seq.json" \
> "nt_muts.no-ref-seq.json" \
- > --exclude-paths "root['meta']['updated']"
+ > --exclude-paths "root['generated_by']"
{}
=====================================
tests/functional/ancestral/cram/vcf-multi-allele.t
=====================================
@@ -24,7 +24,7 @@ See <https://github.com/nextstrain/augur/issues/1380> for the bug this is testin
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$DATA/nt_muts.ref-seq.json" \
> nt_muts.json \
- > --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]"
+ > --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root\['generated_by'\]"
{'iterable_item_added': {"root['nodes']['sample_B']['muts'][0]": 'A30G'}}
$ cat > expected.vcf <<EOF
=====================================
tests/functional/ancestral/cram/vcf.t
=====================================
@@ -21,7 +21,7 @@ but it will have the reference sequence attached.
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> "$DATA/nt_muts.ref-seq.json" \
> "nt_muts.vcf-input.ref-seq.json" \
- > --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root['meta']['updated']"
+ > --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root\['generated_by'\]"
{}
Here's the same mutations as in $DATA/nt_muts.ref-seq.json,
=====================================
tests/functional/ancestral/data/simple-genome/nt_muts.ref-seq.json
=====================================
@@ -7,10 +7,6 @@
"type": "source"
}
},
- "generated_by": {
- "program": "augur",
- "version": "23.1.1"
- },
"mask": "00000000000000000000000000000000000000000000000000",
"nodes": {
"node_AB": {
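The ancestral cram tests now pass `--exclude-paths "root['generated_by']"` (or the regex equivalent) to `scripts/diff_jsons.py`, and the `generated_by` block, which records the Augur version, was dropped from the expected `nt_muts.ref-seq.json`. The idea of ignoring a volatile top-level key can be sketched with the standard library; `json_equal_ignoring` is hypothetical and much simpler than the diff script, which prints the differences it finds rather than a boolean:

```python
import json


def json_equal_ignoring(a_text, b_text, ignored_keys=("generated_by",)):
    """Compare two JSON objects after dropping the given top-level keys."""
    a, b = json.loads(a_text), json.loads(b_text)
    for key in ignored_keys:
        # pop() with a default tolerates the key being absent on either side.
        a.pop(key, None)
        b.pop(key, None)
    return a == b
```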
=====================================
tests/functional/filter/cram/filter-query-numerical.t
=====================================
@@ -33,3 +33,28 @@ The 'category' column will fail when used with a numerical comparison.
'>=' not supported between instances of 'str' and 'float'
Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.
[2]
+
+Create another metadata file for testing.
+
+ $ cat >metadata.tsv <<~~
+ > strain metric1 metric2
+ > SEQ1 4 5
+ > SEQ2 5 9
+ > SEQ3 6 10
+ > ~~
+
+Use a Pandas query to filter by a numerical value.
+This relies on having proper data types associated with the columns. If < is
+comparing strings, it's likely that SEQ3 will be dropped or errors arise.
+
+ $ ${AUGUR} filter \
+ > --metadata metadata.tsv \
+ > --query "metric1 > 4 & metric1 < metric2" \
+ > --output-strains filtered_strains.txt
+ 1 strains were dropped during filtering
+ \t1 of these were filtered out by the query: "metric1 > 4 & metric1 < metric2" (esc)
+ 2 strains passed all filters
+
+ $ sort filtered_strains.txt
+ SEQ2
+ SEQ3
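The comment in this test warns that if `<` compared strings, SEQ3 would likely be dropped. A small standard-library illustration of why (not augur's query implementation, which goes through pandas): lexicographically `"6" < "10"` is false, so SEQ3 fails the string version of the query while passing the numeric one.

```python
import csv
import io

# Same rows as the metadata.tsv created in the test above.
METADATA_TSV = "strain\tmetric1\tmetric2\nSEQ1\t4\t5\nSEQ2\t5\t9\nSEQ3\t6\t10\n"


def passing_strains(numeric):
    """Apply 'metric1 > 4 & metric1 < metric2' with numeric or string values."""
    rows = csv.DictReader(io.StringIO(METADATA_TSV), delimiter="\t")
    kept = []
    for row in rows:
        m1, m2 = row["metric1"], row["metric2"]
        if numeric:
            m1, m2, threshold = int(m1), int(m2), 4
        else:
            # Comparing a str with an int would raise TypeError, so compare strings.
            threshold = "4"
        if m1 > threshold and m1 < m2:
            kept.append(row["strain"])
    return kept
```

`passing_strains(True)` keeps SEQ2 and SEQ3, as in the expected `filtered_strains.txt`; `passing_strains(False)` keeps only SEQ2.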
=====================================
tests/functional/frequencies/cram/weights.t
=====================================
@@ -0,0 +1,68 @@
+Setup
+
+ $ source "$TESTDIR"/_setup.sh
+
+Create input files.
+
+ $ cat >metadata.tsv <<~~
+ > strain date region
+ > SEQ1 2021-01-01 A
+ > SEQ2 2021-01-02 B
+ > SEQ3 2021-01-01 C
+ > SEQ4 2021-01-02 C
+ > SEQ5 2021-02-02 D
+ > ~~
+
+ $ cat >weights.json <<~~
+ > { "A": 2, "B": 1, "C": 1, "D": 0 }
+ > ~~
+
+ $ cat >tree.nwk <<~~
+ > (SEQ2,SEQ3,SEQ4,SEQ5)SEQ1;
+ > ~~
+
+Weight by region.
+
+ $ ${AUGUR} frequencies \
+ > --method kde \
+ > --tree tree.nwk \
+ > --metadata metadata.tsv \
+ > --weights weights.json \
+ > --weights-attribute region \
+ > --output tip-frequencies.json >/dev/null
+
+ $ cat tip-frequencies.json
+ {
+ "SEQ2": {
+ "frequencies": [
+ 0.5,
+ 0.5
+ ]
+ },
+ "SEQ3": {
+ "frequencies": [
+ 0.0,
+ 0.0
+ ]
+ },
+ "SEQ4": {
+ "frequencies": [
+ 0.5,
+ 0.5
+ ]
+ },
+ "SEQ5": {
+ "frequencies": [
+ 0.0,
+ 0.0
+ ]
+ },
+ "generated_by": {
+ "program": "augur",
+ "version": ".*" (re)
+ },
+ "pivots": [
+ 2021.0041,
+ 2021.2507
+ ]
+ } (no-eol)
=====================================
tests/functional/translate/cram/root-mutations.t
=====================================
@@ -6,13 +6,14 @@ Setup
$ export DATA="$TESTDIR/../data/simple-genome"
This is the same as the "general.t" test, but we are modifying the input data
-such that the reference sequence contains "G" at pos 20 (1-based), and
-include a compensating mutation G20A on the root node.
-This results in the reference translation of gene1 to be MPCE* not MPCG*.
-(Note that the compensating nuc mutation doesn't actually need to be present
-in the JSON, `augur translate` just looks at the sequence attached to each node.)
+such that the reference sequence contains "G" at pos 20 (1-based), and include a
+compensating mutation G20A on the root node. (This manipulation would be much
+nicer using `jq` but it's not (yet) available in all nextstrain environments.)
+This results in the reference translation of gene1 to be MPCE* not MPCG*. (Note
+that the compensating nuc mutation doesn't actually need to be present in the
+JSON, `augur translate` just looks at the sequence attached to each node.)
- $ sed '29s/^/ "G20A",\n/' "$ANC_DATA/nt_muts.ref-seq.json" |
+ $ sed '24s/^/ "G20A",\n/' "$ANC_DATA/nt_muts.ref-seq.json" |
> sed 's/"nuc": "AAAAAAAAAATGCCCTGCGGG/"nuc": "AAAAAAAAAATGCCCTGCGAG/' > nt_muts.json
$ ${AUGUR} translate \
=====================================
tests/io/test_vcf.py
=====================================
@@ -1,5 +1,6 @@
import pytest
import augur.io.vcf
+from treetime.vcf_utils import read_vcf
@pytest.fixture
@@ -8,21 +9,20 @@ def mock_run_shell_command(mocker):
class TestVCF:
+ # The `read_vcf` functionality used to be in an augur module when these
+ # tests were originally written but we now use TreeTime's function of the
+ # same name. The tests remain here to protect against any unforeseen changes.
def test_read_vcf_compressed(self):
- seq_keep, all_seq = augur.io.vcf.read_vcf(
- "tests/data/tb_lee_2015.vcf.gz"
- )
+ seq_keep = list(read_vcf("tests/data/tb_lee_2015.vcf.gz")['sequences'].keys())
assert len(seq_keep) == 150
assert seq_keep[149] == "G22733"
- assert seq_keep == all_seq
def test_read_vcf_uncompressed(self):
- seq_keep, all_seq = augur.io.vcf.read_vcf("tests/data/tb_lee_2015.vcf")
+ seq_keep = list(read_vcf("tests/data/tb_lee_2015.vcf")['sequences'].keys())
assert len(seq_keep) == 150
assert seq_keep[149] == "G22733"
- assert seq_keep == all_seq
def test_write_vcf_compressed_input(self, mock_run_shell_command):
augur.io.vcf.write_vcf(
View it on GitLab: https://salsa.debian.org/med-team/augur/-/compare/972dd44fb6cac8e90342b29eaa882fdecc4f9cd4...571e2557b03c6d13cab25800f49cd211aff21570
More information about the debian-med-commit mailing list