[med-svn] [Git][med-team/augur][master] 3 commits: routine-update: New upstream version

Thu Feb 8 06:17:17 GMT 2024


Andreas Tille pushed to branch master at Debian Med / augur


Commits:
5e6f94ab by Andreas Tille at 2024-02-08T06:13:50+01:00
routine-update: New upstream version

- - - - -
ff14f170 by Andreas Tille at 2024-02-08T06:13:51+01:00
New upstream version 24.1.0
- - - - -
571e2557 by Andreas Tille at 2024-02-08T06:14:11+01:00
Update upstream source from tag 'upstream/24.1.0'

Update to upstream version '24.1.0'
with Debian dir f86674db5eca5b549f50f06a1bbdabb7621e2f8b
- - - - -


28 changed files:

- .github/dependabot.yml
- .github/workflows/ci.yaml
- .github/workflows/release.yaml
- CHANGES.md
- augur/__version__.py
- augur/filter/_run.py
- augur/frequencies.py
- augur/index.py
- augur/io/metadata.py
- augur/io/vcf.py
- augur/refine.py
- augur/traits.py
- augur/translate.py
- augur/utils.py
- debian/changelog
- docs/conf.py
- docs/installation/installation.rst
- + docs/installation/non-python-dependencies.rst
- setup.py
- tests/functional/ancestral/cram/case-sensitive.t
- tests/functional/ancestral/cram/general.t
- tests/functional/ancestral/cram/vcf-multi-allele.t
- tests/functional/ancestral/cram/vcf.t
- tests/functional/ancestral/data/simple-genome/nt_muts.ref-seq.json
- tests/functional/filter/cram/filter-query-numerical.t
- + tests/functional/frequencies/cram/weights.t
- tests/functional/translate/cram/root-mutations.t
- tests/io/test_vcf.py


Changes:

=====================================
.github/dependabot.yml
=====================================
@@ -1,13 +1,16 @@
 # Dependabot configuration file
-# Reference: https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
+# <https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file>
 #
-# Each ecosystem is checked on a scheduled interval defined below.
-# To trigger a check manually, go to https://github.com/nextstrain/augur/network/updates
-# and click "Last checked …" then "Check for updates".
-
+# Each ecosystem is checked on a scheduled interval defined below.  To trigger
+# a check manually, go to
+#
+#   https://github.com/nextstrain/augur/network/updates
+#
+# and look for a "Check for updates" button.  You may need to click around a
+# bit first.
+---
 version: 2
 updates:
-
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:


=====================================
.github/workflows/ci.yaml
=====================================
@@ -27,17 +27,8 @@ jobs:
         biopython-version:
           # list of Biopython versions with support for a new Python version
           # from https://github.com/biopython/biopython/blob/master/NEWS.rst
-          - '1.76' # first to support Python 3.8
-          - '1.79' # first to support Python 3.9
           - '1.80' # first to support Python 3.10 and 3.11
           - ''     # latest
-        exclude:
-          # some older Biopython versions are incompatible with later Python versions
-          - { biopython-version: '1.76', python-version: '3.9' }
-          - { biopython-version: '1.76', python-version: '3.10' }
-          - { biopython-version: '1.76', python-version: '3.11' }
-          - { biopython-version: '1.79', python-version: '3.10' }
-          - { biopython-version: '1.79', python-version: '3.11' }
     defaults:
       run:
         shell: bash -l {0}
@@ -46,19 +37,28 @@ jobs:
       COVERAGE_RCFILE: ${{ github.workspace }}/.coveragerc
     steps:
     - uses: actions/checkout@v4
-    - uses: conda-incubator/setup-miniconda@v3
+
+    - name: Set cache key
+      run: echo "DATE=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
+
+    # While it may be tempting to install Augur using Conda to avoid hardcoding
+    # the list of dependencies here, installing just the dependencies not
+    # available on PyPI allows us to test against dependencies installed by pip,
+    # which may have slightly different versions compared to Conda counterparts.
+    - name: Install dependencies from Conda
+      uses: mamba-org/setup-micromamba@v1
       with:
-        python-version: ${{ matrix.python-version }}
-        miniforge-variant: Mambaforge
-        channels: conda-forge,bioconda
-    - run: |
-        mamba install \
-          mafft \
-          raxml \
-          fasttree \
-          iqtree \
-          vcftools \
-          biopython=${{ matrix.biopython-version }}
+        create-args: mafft raxml fasttree iqtree vcftools biopython=${{ matrix.biopython-version }} python=${{ matrix.python-version }}
+        condarc: |
+          channels:
+            - conda-forge
+            - bioconda
+          channel_priority: strict
+        cache-environment: true
+        cache-environment-key: ${{ env.DATE }}
+        environment-name: augur
+
+    # Replace the Conda Augur installation with the local version.
     - run: pip install .[dev]
     - run: conda info
     - run: conda list
@@ -114,10 +114,11 @@ jobs:
           path: ./augur
 
       - name: Set cache key
-        id: cache-key
         run: echo "DATE=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
 
-      - uses: mamba-org/setup-micromamba@v1
+      # Set up a Conda environment that replicates Nextstrain's Conda runtime.
+      - name: Install nextstrain-base from Conda
+        uses: mamba-org/setup-micromamba@v1
         with:
           create-args: nextstrain-base
           condarc: |
@@ -130,6 +131,7 @@ jobs:
           cache-environment-key: ${{ env.DATE }}
           environment-name: augur
 
+      # Replace the Conda Augur installation with the local version.
       - run: pip install ./augur
 
       - uses: actions/checkout@v4


=====================================
.github/workflows/release.yaml
=====================================
@@ -1,4 +1,5 @@
-name: Release
+run-name: Release ${{ inputs.version }}
+
 on:
   workflow_dispatch:
     inputs:


=====================================
CHANGES.md
=====================================
@@ -3,6 +3,23 @@
 ## __NEXT__
 
 
+## 24.1.0 (30 January 2024)
+
+### Features
+
+* `augur.io.read_metadata`: A new optional `dtype` argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. [#1252][] (@victorlin)
+* `augur.io.read_vcf` has been removed and usage replaced with TreeTime's function of the same name which has improved validation of the VCF file. [#1366][] (@jameshadfield)
+
+### Bug Fixes
+
+* filter, frequencies, refine: Speed up reading of the metadata file. [#1252][] (@victorlin)
+* traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. [#1252][] (@victorlin)
+* Support Biopython `≥1.82` by requiring bcbio-gff `≥0.7.1`. [#1400][] (@victorlin)
+
+[#1252]: https://github.com/nextstrain/augur/pull/1252
+[#1366]: https://github.com/nextstrain/augur/pull/1366
+[#1400]: https://github.com/nextstrain/augur/pull/1400
+
 ## 24.0.0 (22 January 2024)
 
 ### Major Changes


=====================================
augur/__version__.py
=====================================
@@ -1,4 +1,4 @@
-__version__ = '24.0.0'
+__version__ = '24.1.0'
 
 
 def is_augur_version_compatible(version):


=====================================
augur/filter/_run.py
=====================================
@@ -169,6 +169,7 @@ def run(args):
             delimiters=args.metadata_delimiters,
             id_columns=args.metadata_id_columns,
             chunk_size=args.metadata_chunk_size,
+            dtype="string",
         )
     except InvalidDelimiter:
         raise AugurError(
@@ -320,6 +321,7 @@ def run(args):
             delimiters=args.metadata_delimiters,
             id_columns=args.metadata_id_columns,
             chunk_size=args.metadata_chunk_size,
+            dtype="string",
         )
         for metadata in metadata_reader:
             # Recalculate groups for subsampling as we loop through the


=====================================
augur/frequencies.py
=====================================
@@ -85,7 +85,14 @@ def format_frequencies(freq):
 
 def run(args):
     try:
-        metadata = read_metadata(args.metadata, delimiters=args.metadata_delimiters, id_columns=args.metadata_id_columns)
+        # TODO: load only the ID, date, and --weights-attribute columns when
+        # read_metadata supports loading a subset of all columns.
+        metadata = read_metadata(
+            args.metadata,
+            delimiters=args.metadata_delimiters,
+            id_columns=args.metadata_id_columns,
+            dtype="string",
+        )
     except InvalidDelimiter:
         raise AugurError(
             f"Could not determine the delimiter of {args.metadata!r}. "
@@ -114,10 +121,9 @@ def run(args):
             tip.attr = {"num_date": np.mean(dates[tip.name])}
             tps.append(tip.attr["num_date"])
 
-            # Annotate tips with metadata to enable filtering and weighting of
-            # frequencies by metadata attributes.
-            for key, value in metadata.loc[tip.name].items():
-                tip.attr[key] = value
+            if weights_attribute:
+                # Annotate tip with weight attribute.
+                tip.attr[weights_attribute] = metadata.loc[tip.name, weights_attribute]
 
         if args.method == "diffusion":
             # estimate tree frequencies


=====================================
augur/index.py
=====================================
@@ -7,7 +7,9 @@ import csv
 
 from .io.file import open_file
 from .io.sequences import read_sequences
-from .io.vcf import is_vcf, read_vcf
+from .io.vcf import is_vcf
+from treetime.vcf_utils import read_vcf
+
 
 
 DELIMITER = '\t'
@@ -40,7 +42,8 @@ def index_vcf(vcf_path, index_path):
         number of strains indexed
 
     """
-    strains, _ = read_vcf(vcf_path)
+    strains = list(read_vcf(vcf_path)['sequences'].keys())
+
     num_of_seqs = 0
 
     with open_file(index_path, 'wt') as out_file:


=====================================
augur/io/metadata.py
=====================================
@@ -24,7 +24,7 @@ class InvalidDelimiter(Exception):
     pass
 
 
-def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAULT_ID_COLUMNS, chunk_size=None):
+def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAULT_ID_COLUMNS, chunk_size=None, dtype=None):
     r"""Read metadata from a given filename and into a pandas `DataFrame` or
     `TextFileReader` object.
 
@@ -40,7 +40,9 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAU
         Only one id column will be inferred.
     chunk_size : int
         Size of chunks to stream from disk with an iterator instead of loading the entire input file into memory.
-
+    dtype : dict or str
+        Data types to apply to columns in metadata. If unspecified, pandas data type inference will be used.
+        See documentation for an argument of the same name to `pandas.read_csv()`.
     Returns
     -------
     pandas.DataFrame or `pandas.io.parsers.TextFileReader`
@@ -107,15 +109,31 @@ def read_metadata(metadata_file, delimiters=DEFAULT_DELIMITERS, id_columns=DEFAU
     else:
         index_col = id_columns_present[0]
 
-    # If we found a valid column to index the DataFrame, specify that column and
-    # also tell pandas that the column should be treated like a string instead
-    # of having its type inferred. This latter argument allows users to provide
-    # numerical ids that don't get converted to numbers by pandas.
+    # If we found a valid column to index the DataFrame, specify that column.
     kwargs["index_col"] = index_col
-    kwargs["dtype"] = {
-        index_col: "string",
-        METADATA_DATE_COLUMN: "string"
-    }
+
+    if dtype is None:
+        dtype = {}
+
+    if isinstance(dtype, dict):
+        # Avoid reading numerical IDs as integers.
+        dtype[index_col] = "string"
+
+        # Avoid reading year-only dates as integers.
+        dtype[METADATA_DATE_COLUMN] = "string"
+
+    elif isinstance(dtype, str):
+        if dtype != "string":
+            raise AugurError(f"""
+                dtype='{dtype}' converts values in all columns to be of type
+                '{dtype}'. However, values in columns '{index_col}' and
+                '{METADATA_DATE_COLUMN}' must be treated as strings in Augur.
+                Specify dtype as a dict per column instead.
+            """)
+    else:
+        raise AugurError(f"Unsupported value for dtype: '{dtype}'")
+
+    kwargs["dtype"] = dtype
 
     return pd.read_csv(
         metadata_file,
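
The dtype handling in the hunk above can be sketched in isolation as follows. This is a minimal illustration, not Augur's code: `resolve_dtype` is a hypothetical helper name, `DATE_COLUMN = "date"` is an assumed stand-in for `METADATA_DATE_COLUMN`, `ValueError` stands in for `AugurError`, and the sketch assumes the intent is to key the override by the actual index column name.

```python
# Assumed stand-in for Augur's METADATA_DATE_COLUMN constant.
DATE_COLUMN = "date"


def resolve_dtype(dtype, index_col, date_column=DATE_COLUMN):
    """Hypothetical helper mirroring the dtype rules shown in the diff.

    Returns the value that would be handed to pandas.read_csv(dtype=...).
    """
    if dtype is None:
        dtype = {}

    if isinstance(dtype, dict):
        # Copy so the caller's dict is not mutated (a choice made for
        # this sketch only).
        resolved = dict(dtype)
        # Avoid reading numerical IDs as integers.
        resolved[index_col] = "string"
        # Avoid reading year-only dates as integers.
        resolved[date_column] = "string"
        return resolved

    if isinstance(dtype, str):
        # A single dtype applies to every column, so only "string" keeps
        # the ID and date columns as strings.
        if dtype != "string":
            raise ValueError(
                f"dtype={dtype!r} would convert columns {index_col!r} and "
                f"{date_column!r}; specify dtype as a dict per column instead."
            )
        return dtype

    raise ValueError(f"Unsupported value for dtype: {dtype!r}")


default_dtype = resolve_dtype(None, "strain")
custom_dtype = resolve_dtype({"metric1": "float"}, "strain")
all_string = resolve_dtype("string", "strain")
```

With no `dtype` given, only the ID and date columns are forced to strings and the other columns are left to pandas type inference, which is why the changelog describes the new argument as non-breaking.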


=====================================
augur/io/vcf.py
=====================================
@@ -24,22 +24,6 @@ def is_vcf(filename):
     return bool(filename) and any(filename.lower().endswith(x) for x in ('.vcf', '.vcf.gz'))
 
 
-def read_vcf(filename):
-    if filename.lower().endswith(".gz"):
-        import gzip
-        file = gzip.open(filename, mode="rt", encoding='utf-8')
-    else:
-        file = open(filename, encoding='utf-8')
-
-    chrom_line = next(line for line in file if line.startswith("#C"))
-    file.close()
-    headers = chrom_line.strip().split("\t")
-    sequences = headers[headers.index("FORMAT") + 1:]
-
-    # because we need 'seqs to remove' for VCF
-    return sequences, sequences.copy()
-
-
 def write_vcf(input_filename, output_filename, dropped_samps):
     if _filename_gz(input_filename):
         input_arg = "--gzvcf"
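
For reference, the removed helper only extracted the sample names from the `#CHROM` header line. A minimal standard-library sketch of that behaviour is shown below; `sample_names` and the inline VCF text are hypothetical, and this is not TreeTime's `read_vcf`, which the commit switches to.

```python
import io


def sample_names(lines):
    """Return the sample names listed after FORMAT on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            headers = line.rstrip("\n").split("\t")
            # Sample columns follow the fixed FORMAT column.
            return headers[headers.index("FORMAT") + 1:]
    raise ValueError("no #CHROM header line found")


# Hypothetical two-sample VCF used only for illustration.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A\tsample_B\n"
    "1\t30\t.\tA\tG\t.\t.\t.\tGT\t0\t1\n"
)
names = sample_names(io.StringIO(vcf_text))
```

Unlike this sketch, a full parser can also validate the records themselves, which is the improvement the changelog attributes to TreeTime's function.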


=====================================
augur/refine.py
=====================================
@@ -214,10 +214,14 @@ def run(args):
             print("ERROR: meta data with dates is required for time tree reconstruction", file=sys.stderr)
             return 1
         try:
+            # TODO: load only the ID and date columns when read_metadata
+            # supports loading a subset of all columns.
             metadata = read_metadata(
                 args.metadata,
                 delimiters=args.metadata_delimiters,
-                id_columns=args.metadata_id_columns)
+                id_columns=args.metadata_id_columns,
+                dtype="string",
+            )
         except InvalidDelimiter:
             raise AugurError(
                 f"Could not determine the delimiter of {args.metadata!r}. "


=====================================
augur/traits.py
=====================================
@@ -131,7 +131,11 @@ def run(args):
         traits = read_metadata(
             args.metadata,
             delimiters=args.metadata_delimiters,
-            id_columns=args.metadata_id_columns)
+            id_columns=args.metadata_id_columns,
+
+            # Read all columns as string for discrete trait analysis
+            dtype="string",
+        )
     except InvalidDelimiter:
         raise AugurError(
                 f"Could not determine the delimiter of {args.metadata!r}. "


=====================================
augur/translate.py
=====================================
@@ -185,7 +185,7 @@ def translate_vcf_feature(sequences, ref, feature, feature_name):
 
         #Translate just the codon this nuc diff is in, and find out which AA loc
         #But need numbering to be w/in protin, not whole genome
-        if feature.strand == -1:
+        if feature.location.strand == -1:
             aaRepLocs = {(end-start-i-1)//3:safe_translate( str_reverse_comp( "".join([sequences[seqk][key+start]
                                     if key+start in sequences[seqk].keys() else ref[key+start]
                                 for key in range(i-i%3,i+3-i%3)]) ))


=====================================
augur/utils.py
=====================================
@@ -296,12 +296,12 @@ def _read_gff(reference, feature_names):
         # Note that `GFF.parse` doesn't always yield GFF records in the order
         # one may expect, but since we raise AugurError if there are multiple
         # this doesn't matter.
-        gff_entries = list(GFF.parse(in_handle, limit_info={'gff_type': valid_types}))
-
-        # TODO: Remove warning filter reversal after it's addressed upstream:
-        # <https://github.com/chapmanb/bcbb/issues/140>
+        # TODO: Remove warning suppression after it's addressed upstream:
+        # <https://github.com/chapmanb/bcbb/issues/143>
         import warnings
         from Bio import BiopythonDeprecationWarning
+        warnings.simplefilter("ignore", BiopythonDeprecationWarning)
+        gff_entries = list(GFF.parse(in_handle, limit_info={'gff_type': valid_types}))
         warnings.simplefilter("default", BiopythonDeprecationWarning)
 
         if len(gff_entries) == 0:
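
The hunk above suppresses the warning with `simplefilter("ignore", ...)` and then resets the filter to `"default"` by hand. As a hedged alternative sketch (not what the commit does), the standard library's `warnings.catch_warnings()` context manager restores the previous filters automatically. `noisy_parse` is a hypothetical stand-in for `GFF.parse`, and the built-in `DeprecationWarning` stands in for `BiopythonDeprecationWarning`.

```python
import warnings


def noisy_parse():
    """Hypothetical stand-in for a library call that emits a deprecation warning."""
    warnings.warn("deprecated API used", DeprecationWarning)
    return ["record"]


def parse_quietly():
    # Filter changes made inside the block are undone on exit,
    # so no manual reset is needed afterwards.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return noisy_parse()


entries = parse_quietly()
```

One difference worth noting: the manual reset in the diff installs the `"default"` action for that category, whereas the context manager restores whatever filters were active before the block.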


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+augur (24.1.0-1) UNRELEASED; urgency=medium
+
+  * New upstream version
+
+ -- Andreas Tille <tille@debian.org>  Thu, 08 Feb 2024 06:13:50 +0100
+
 augur (24.0.0-1) unstable; urgency=medium
 
   * New upstream version  (Closes: #1044079)


=====================================
docs/conf.py
=====================================
@@ -64,6 +64,7 @@ extensions = [
     'sphinx_autodoc_typehints', # must come after napoleon https://github.com/tox-dev/sphinx-autodoc-typehints/blob/1.21.4/README.md#compatibility-with-sphinxextnapoleon
     'sphinx_markdown_tables',
     'sphinx.ext.intersphinx',
+    'sphinx_tabs.tabs',
     'nextstrain.sphinx.theme',
 ]
 
@@ -137,6 +138,7 @@ nitpick_ignore = [
 intersphinx_mapping = {
     'Bio': ('https://biopython.org/docs/latest/api/', None),
     'docs.nextstrain.org': ('https://docs.nextstrain.org/en/latest/', None),
+    'cli': ('https://docs.nextstrain.org/projects/cli/en/stable', None),
     'python': ('https://docs.python.org/3', None),
     'numpy': ('https://numpy.org/doc/stable', None),
     'pandas': ('https://pandas.pydata.org/docs', None),


=====================================
docs/installation/installation.rst
=====================================
@@ -8,87 +8,74 @@ Installation
 .. contents::
    :local:
 
-Installing dependencies
-=======================
+Install Augur
+=============
 
-Augur uses some external bioinformatics programs:
+There are several ways to install Augur, ordered from least to most complex.
 
-- ``augur align`` requires `mafft <https://mafft.cbrc.jp/alignment/software/>`__
+.. tabs::
 
-- ``augur tree`` requires at least one of:
+   .. group-tab:: Nextstrain
 
-   - `IQ-TREE <http://www.iqtree.org/>`__ (used by default)
-   - `RAxML <https://sco.h-its.org/exelixis/web/software/raxml/>`__ (optional alternative)
-   - `FastTree <http://www.microbesonline.org/fasttree/>`__ (optional alternative)
+      Augur is part of the Nextstrain project and is available in all :term:`Nextstrain runtimes <docs.nextstrain.org:runtime>`.
 
-- Bacterial data (or any VCF usage) requires `vcftools <https://vcftools.github.io/>`__
+      Continue by following the :doc:`Nextstrain installation guide <docs.nextstrain.org:install>`.
 
-If you use Conda or Mamba, you can install them in an active environment:
+      Once installed, you can use :doc:`cli:commands/shell` to run ``augur`` directly.
 
-.. code:: bash
+   .. group-tab:: Conda
 
-   conda install -c conda-forge -c bioconda mafft raxml fasttree iqtree vcftools --yes
+      Augur can be installed using Conda or another variant. This assumes you are familiar with how to `manage Conda environments <https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`__.
 
-On macOS using `Homebrew <https://brew.sh/>`__:
+      .. code:: bash
 
-.. code:: bash
+         conda install -c conda-forge -c bioconda augur
 
-   brew tap brewsci/bio
-   brew install mafft iqtree raxml fasttree vcftools
+      This installs Augur along with all dependencies.
 
-On Debian/Ubuntu:
+   .. group-tab:: PyPI
 
-.. code:: bash
+      .. warning::
+         Installing other Python packages after Augur may cause dependency incompatibilities. If this happens, re-install Augur using the command in step 1.
 
-   sudo apt install mafft iqtree raxml fasttree vcftools
+      Augur is written in Python 3 and requires at least Python 3.8. It's published on `PyPI <https://pypi.org>`__ as `nextstrain-augur <https://pypi.org/project/nextstrain-augur>`__.
 
-Other Linux distributions will likely have the same packages available, although the names may differ slightly.
+      1. Install Augur along with Python dependencies.
 
-Install Augur as a user
-=======================
+         .. code:: bash
 
-Using Mamba
------------
+            python3 -m pip install nextstrain-augur
 
-This assumes you have Conda installed and an environment active. If not, refer to instructions for ambient runtime setup on `the Nextstrain installation guide <https://docs.nextstrain.org/en/latest/install.html>`__.
+      2. Install other dependencies.
 
-.. code:: bash
+         .. include:: non-python-dependencies.rst
 
-   conda install -c conda-forge -c bioconda augur
+   .. group-tab:: Source
 
-If you encounter environment solving errors or want a faster installation process, use `mamba <https://github.com/TheSnakePit/mamba>`__ as a drop-in replacement for conda:
+      .. warning::
+         Installing other Python packages after Augur may cause dependency incompatibilities. If this happens, re-install Augur using the command in step 1.
 
-.. code:: bash
+      Augur can be installed from source. This is useful if you want to use unreleased changes or develop Augur locally.
 
-   mamba install -c conda-forge -c bioconda augur
+      1. Install Augur along with Python dependencies.
 
-Using pip from PyPi
--------------------
+         .. code:: bash
 
-Augur is written in Python 3 and requires at least Python 3.8. It's published on `PyPi <https://pypi.org>`__ as `nextstrain-augur <https://pypi.org/project/nextstrain-augur>`__, so you can install it with ``pip`` like so:
+            git clone https://github.com/nextstrain/augur.git
+            cd augur
+            python3 -m pip install .
 
-.. code:: bash
+         .. note::
 
-   python3 -m pip install nextstrain-augur
+            For local development, install from source in editable mode with ``dev`` dependencies.
 
-From source
------------
+            .. code:: bash
 
-.. code:: bash
+               python3 -m pip install -e .'[dev]'
 
-   git clone https://github.com/nextstrain/augur.git
-   python3 -m pip install .
+      2. Install other dependencies.
 
-This installs Augur along with external Python dependencies.
-
-Install Augur as a developer
-============================
-
-.. code:: bash
-
-   python3 -m pip install -e '.[dev]'
-
-This installs dependencies necessary for local development.
+         .. include:: non-python-dependencies.rst
 
 Testing if it worked
 ====================


=====================================
docs/installation/non-python-dependencies.rst
=====================================
@@ -0,0 +1,32 @@
+Augur uses some external bioinformatics programs that are not available on PyPI:
+
+- ``augur align`` requires `mafft <https://mafft.cbrc.jp/alignment/software/>`__
+
+- ``augur tree`` requires at least one of:
+
+   - `IQ-TREE <http://www.iqtree.org/>`__ (used by default)
+   - `RAxML <https://sco.h-its.org/exelixis/web/software/raxml/>`__ (optional alternative)
+   - `FastTree <http://www.microbesonline.org/fasttree/>`__ (optional alternative)
+
+- Bacterial data (or any VCF usage) requires `vcftools <https://vcftools.github.io/>`__
+
+If you use Conda, you can install them in an active environment:
+
+.. code:: bash
+
+   conda install -c conda-forge -c bioconda mafft raxml fasttree iqtree vcftools --yes
+
+On macOS using `Homebrew <https://brew.sh/>`__:
+
+.. code:: bash
+
+   brew tap brewsci/bio
+   brew install mafft iqtree raxml fasttree vcftools
+
+On Debian/Ubuntu:
+
+.. code:: bash
+
+   sudo apt install mafft iqtree raxml fasttree vcftools
+
+Other Linux distributions will likely have the same packages available, although the names may differ slightly.


=====================================
setup.py
=====================================
@@ -51,9 +51,9 @@ setuptools.setup(
     package_data = {'augur': ['data/*']},
     python_requires = '>={}'.format('.'.join(str(n) for n in py_min_version)),
     install_requires = [
-        "bcbio-gff >=0.7.0, ==0.7.*",
-        # Skip Biopython >=1.82: https://github.com/nextstrain/augur/issues/1373
-        "biopython >=1.67, !=1.77, !=1.78, <1.82",
+        "bcbio-gff >=0.7.1, ==0.7.*",
+        # TODO: Remove biopython >= 1.80 pin if it is added to bcbio-gff: https://github.com/chapmanb/bcbb/issues/142
+        "biopython >=1.80, ==1.*",
         "cvxopt >=1.1.9, ==1.*",
         "importlib_resources >=5.3.0; python_version < '3.11'",
         "isodate ==0.6.*",
@@ -87,6 +87,7 @@ setuptools.setup(
             "sphinx-markdown-tables >= 0.0.9",
             "sphinx-rtd-theme >=0.4.3",
             "sphinx-autodoc-typehints >=1.21.4",
+            "sphinx-tabs",
             "types-jsonschema >=3.0.0, ==3.*",
             "types-setuptools",
             "wheel >=0.32.3",


=====================================
tests/functional/ancestral/cram/case-sensitive.t
=====================================
@@ -16,7 +16,7 @@ Change the _reference_ to lowercase
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
   >   "nt_muts.ref-seq.json" \
-  >   --exclude-paths "root['meta']['updated']" 
+  >   --exclude-paths "root['generated_by']"
   {}
 
 
@@ -38,5 +38,5 @@ be lowecase which will be compared against the uppercase reference
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
   >   "nt_muts.ref-seq.json" \
-  >   --exclude-paths "root['meta']['updated']" 
+  >   --exclude-paths "root['generated_by']"
   {}
\ No newline at end of file


=====================================
tests/functional/ancestral/cram/general.t
=====================================
@@ -20,7 +20,7 @@ node-data JSON we diff against.
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$TESTDIR/../data/simple-genome/nt_muts.ref-seq.json" \
   >   "nt_muts.ref-seq.json" \
-  >   --exclude-paths "root['meta']['updated']" 
+  >   --exclude-paths "root['generated_by']"
   {}
 
 Same as above but without providing a `--root-sequence`. The effect of this on behaviour is:
@@ -40,5 +40,5 @@ mutations (as there's nothing to compare the root node to)
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$TESTDIR/../data/simple-genome/nt_muts.no-ref-seq.json" \
   >   "nt_muts.no-ref-seq.json" \
-  >   --exclude-paths "root['meta']['updated']" 
+  >   --exclude-paths "root['generated_by']"
   {}


=====================================
tests/functional/ancestral/cram/vcf-multi-allele.t
=====================================
@@ -24,7 +24,7 @@ See <https://github.com/nextstrain/augur/issues/1380> for the bug this is testin
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$DATA/nt_muts.ref-seq.json" \
   >   nt_muts.json \
-  >   --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]"
+  >   --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root\['generated_by'\]"
   {'iterable_item_added': {"root['nodes']['sample_B']['muts'][0]": 'A30G'}}
 
   $ cat > expected.vcf <<EOF


=====================================
tests/functional/ancestral/cram/vcf.t
=====================================
@@ -21,7 +21,7 @@ but it will have the reference sequence attached.
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
   >   "$DATA/nt_muts.ref-seq.json" \
   >   "nt_muts.vcf-input.ref-seq.json" \
-  >   --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root['meta']['updated']"
+  >   --exclude-regex-paths "root\['nodes'\]\['.+'\]\['sequence'\]" "root\['generated_by'\]"
   {}
 
 Here's the same mutations as in $DATA/nt_muts.ref-seq.json,


=====================================
tests/functional/ancestral/data/simple-genome/nt_muts.ref-seq.json
=====================================
@@ -7,10 +7,6 @@
       "type": "source"
     }
   },
-  "generated_by": {
-    "program": "augur",
-    "version": "23.1.1"
-  },
   "mask": "00000000000000000000000000000000000000000000000000",
   "nodes": {
     "node_AB": {


=====================================
tests/functional/filter/cram/filter-query-numerical.t
=====================================
@@ -33,3 +33,28 @@ The 'category' column will fail when used with a numerical comparison.
   	'>=' not supported between instances of 'str' and 'float'
   Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.
   [2]
+
+Create another metadata file for testing.
+
+  $ cat >metadata.tsv <<~~
+  > strain	metric1	metric2
+  > SEQ1	4	5
+  > SEQ2	5	9
+  > SEQ3	6	10
+  > ~~
+
+Use a Pandas query to filter by a numerical value.
+This relies on having proper data types associated with the columns. If < is
+comparing strings, it's likely that SEQ3 will be dropped or errors arise.
+
+  $ ${AUGUR} filter \
+  >  --metadata metadata.tsv \
+  >  --query "metric1 > 4 & metric1 < metric2" \
+  >  --output-strains filtered_strains.txt
+  1 strains were dropped during filtering
+  \t1 of these were filtered out by the query: "metric1 > 4 & metric1 < metric2" (esc)
+  2 strains passed all filters
+
+  $ sort filtered_strains.txt
+  SEQ2
+  SEQ3
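
The test above depends on `metric1` and `metric2` being compared as numbers. The effect can be illustrated with plain Python comparisons standing in for the pandas query. This is a sketch only, not Augur's filter implementation, and `passing` is a hypothetical helper.

```python
# Rows from the test's metadata.tsv: (strain, metric1, metric2), as read from text.
rows = [("SEQ1", "4", "5"), ("SEQ2", "5", "9"), ("SEQ3", "6", "10")]


def passing(rows, convert):
    """Apply 'metric1 > 4 & metric1 < metric2' after converting each value."""
    kept = []
    threshold = convert("4")
    for strain, metric1, metric2 in rows:
        a, b = convert(metric1), convert(metric2)
        if a > threshold and a < b:
            kept.append(strain)
    return kept


numeric_result = passing(rows, float)  # values compared as numbers
string_result = passing(rows, str)     # values compared lexicographically
```

Compared as numbers, SEQ2 and SEQ3 pass, matching the expected output. Compared as strings, SEQ3 is dropped because `"6" < "10"` is false under lexicographic ordering.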


=====================================
tests/functional/frequencies/cram/weights.t
=====================================
@@ -0,0 +1,68 @@
+Setup
+
+  $ source "$TESTDIR"/_setup.sh
+
+Create input files.
+
+  $ cat >metadata.tsv <<~~
+  > strain	date	region
+  > SEQ1	2021-01-01	A
+  > SEQ2	2021-01-02	B
+  > SEQ3	2021-01-01	C
+  > SEQ4	2021-01-02	C
+  > SEQ5	2021-02-02	D
+  > ~~
+
+  $ cat >weights.json <<~~
+  > { "A": 2, "B": 1, "C": 1, "D": 0 }
+  > ~~
+
+  $ cat >tree.nwk <<~~
+  > (SEQ2,SEQ3,SEQ4,SEQ5)SEQ1;
+  > ~~
+
+Weight by region.
+
+  $ ${AUGUR} frequencies \
+  >  --method kde \
+  >  --tree tree.nwk \
+  >  --metadata metadata.tsv \
+  >  --weights weights.json \
+  >  --weights-attribute region \
+  >  --output tip-frequencies.json >/dev/null
+
+  $ cat tip-frequencies.json
+  {
+    "SEQ2": {
+      "frequencies": [
+        0.5,
+        0.5
+      ]
+    },
+    "SEQ3": {
+      "frequencies": [
+        0.0,
+        0.0
+      ]
+    },
+    "SEQ4": {
+      "frequencies": [
+        0.5,
+        0.5
+      ]
+    },
+    "SEQ5": {
+      "frequencies": [
+        0.0,
+        0.0
+      ]
+    },
+    "generated_by": {
+      "program": "augur",
+      "version": ".*" (re)
+    },
+    "pivots": [
+      2021.0041,
+      2021.2507
+    ]
+  } (no-eol)


=====================================
tests/functional/translate/cram/root-mutations.t
=====================================
@@ -6,13 +6,14 @@ Setup
   $ export DATA="$TESTDIR/../data/simple-genome"
 
 This is the same as the "general.t" test, but we are modifying the input data
-such that the reference sequence contains "G" at pos 20 (1-based), and
-include a compensating mutation G20A on the root node.
-This results in the reference translation of gene1 to be MPCE* not MPCG*.
-(Note that the compensating nuc mutation doesn't actually need to be present
-in the JSON, `augur translate` just looks at the sequence attached to each node.)
+such that the reference sequence contains "G" at pos 20 (1-based), and include a
+compensating mutation G20A on the root node. (This manipulation would be much
+nicer using `jq` but it's not (yet) available in all nextstrain environments.)
+This results in the reference translation of gene1 to be MPCE* not MPCG*. (Note
+that the compensating nuc mutation doesn't actually need to be present in the
+JSON, `augur translate` just looks at the sequence attached to each node.)
 
-  $ sed '29s/^/        "G20A",\n/' "$ANC_DATA/nt_muts.ref-seq.json" | 
+  $ sed '24s/^/        "G20A",\n/' "$ANC_DATA/nt_muts.ref-seq.json" | 
   > sed 's/"nuc": "AAAAAAAAAATGCCCTGCGGG/"nuc": "AAAAAAAAAATGCCCTGCGAG/' > nt_muts.json
 
   $ ${AUGUR} translate \


=====================================
tests/io/test_vcf.py
=====================================
@@ -1,5 +1,6 @@
 import pytest
 import augur.io.vcf
+from treetime.vcf_utils import read_vcf
 
 
 @pytest.fixture
@@ -8,21 +9,20 @@ def mock_run_shell_command(mocker):
 
 
 class TestVCF:
+    # The `read_vcf` functionality used to be in an augur module when these
+    # tests were originally written but we now use TreeTime's function of the
+    # same name. The tests remain here to protect against any unforeseen changes.
     def test_read_vcf_compressed(self):
-        seq_keep, all_seq = augur.io.vcf.read_vcf(
-            "tests/data/tb_lee_2015.vcf.gz"
-        )
+        seq_keep = list(read_vcf("tests/data/tb_lee_2015.vcf.gz")['sequences'].keys())
 
         assert len(seq_keep) == 150
         assert seq_keep[149] == "G22733"
-        assert seq_keep == all_seq
 
     def test_read_vcf_uncompressed(self):
-        seq_keep, all_seq = augur.io.vcf.read_vcf("tests/data/tb_lee_2015.vcf")
+        seq_keep = list(read_vcf("tests/data/tb_lee_2015.vcf")['sequences'].keys())
 
         assert len(seq_keep) == 150
         assert seq_keep[149] == "G22733"
-        assert seq_keep == all_seq
 
     def test_write_vcf_compressed_input(self, mock_run_shell_command):
         augur.io.vcf.write_vcf(



View it on GitLab: https://salsa.debian.org/med-team/augur/-/compare/972dd44fb6cac8e90342b29eaa882fdecc4f9cd4...571e2557b03c6d13cab25800f49cd211aff21570
