[Git][debian-gis-team/xarray-datatree][upstream] New upstream version 0.0.13

Antonio Valentino (@antonio.valentino) gitlab@salsa.debian.org
Sat Oct 28 17:14:25 BST 2023



Antonio Valentino pushed to branch upstream at Debian GIS Project / xarray-datatree


Commits:
bed37d4b by Antonio Valentino at 2023-10-28T15:55:36+00:00
New upstream version 0.0.13
- - - - -


17 changed files:

- .github/workflows/main.yaml
- .github/workflows/pypipublish.yaml
- .pre-commit-config.yaml
- README.md
- datatree/datatree.py
- datatree/mapping.py
- datatree/testing.py
- datatree/tests/__init__.py
- datatree/tests/test_datatree.py
- datatree/tests/test_formatting.py
- datatree/tests/test_mapping.py
- datatree/tests/test_treenode.py
- datatree/treenode.py
- docs/source/api.rst
- docs/source/hierarchical-data.rst
- docs/source/whats-new.rst
- pyproject.toml


Changes:

=====================================
.github/workflows/main.yaml
=====================================
@@ -20,9 +20,9 @@ jobs:
         shell: bash -l {0}
     strategy:
       matrix:
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Create conda environment
        uses: mamba-org/provision-with-micromamba@main
@@ -48,7 +48,7 @@ jobs:
           python -m pytest --cov=./ --cov-report=xml --verbose
 
       - name: Upload code coverage to Codecov
-        uses: codecov/codecov-action@v3.1.1
+        uses: codecov/codecov-action@v3.1.4
         with:
           file: ./coverage.xml
           flags: unittests
@@ -65,9 +65,9 @@ jobs:
         shell: bash -l {0}
     strategy:
       matrix:
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Create conda environment
        uses: mamba-org/provision-with-micromamba@main


=====================================
.github/workflows/pypipublish.yaml
=====================================
@@ -19,7 +19,7 @@ jobs:
     runs-on: ubuntu-latest
     if: github.repository == 'xarray-contrib/datatree'
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
           fetch-depth: 0
      - uses: actions/setup-python@v4
@@ -77,7 +77,7 @@ jobs:
           name: releases
           path: dist
       - name: Publish package to PyPI
-        uses: pypa/gh-action-pypi-publish@v1.6.4
+        uses: pypa/gh-action-pypi-publish@v1.8.10
         with:
           user: ${{ secrets.PYPI_USERNAME }}
           password: ${{ secrets.PYPI_PASSWORD }}


=====================================
.pre-commit-config.yaml
=====================================
@@ -15,7 +15,7 @@ repos:
       - id: isort
   # https://github.com/python/black#version-control-integration
   - repo: https://github.com/psf/black
-    rev: 23.1.0
+    rev: 23.9.1
     hooks:
       - id: black
   - repo: https://github.com/keewis/blackdoc
@@ -23,7 +23,7 @@ repos:
     hooks:
       - id: blackdoc
   - repo: https://github.com/PyCQA/flake8
-    rev: 6.0.0
+    rev: 6.1.0
     hooks:
       - id: flake8
   # - repo: https://github.com/Carreau/velin
@@ -32,7 +32,7 @@ repos:
   #     - id: velin
   #       args: ["--write", "--compact"]
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.0.1
+    rev: v1.5.1
     hooks:
       - id: mypy
         # Copied from setup.cfg
@@ -45,7 +45,7 @@ repos:
             types-pytz,
             # Dependencies that are typed
             numpy,
-            typing-extensions==3.10.0.0,
+            typing-extensions>=4.1.0,
           ]
   # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194
   # - repo: https://github.com/asottile/pyupgrade


=====================================
README.md
=====================================
@@ -14,6 +14,17 @@ that was more flexible than a single `xarray.Dataset` object.
 The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object,
 but `datatree.DataTree` objects have many other uses.
 
+### Installation
+You can install datatree via pip:
+```shell
+pip install xarray-datatree
+```
+
+or via conda-forge:
+```shell
+conda install -c conda-forge xarray-datatree
+```
+
 ### Why Datatree?
 
 You might want to use datatree for:
@@ -41,7 +52,7 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis
 You can create a `DataTree` object in 3 ways:
 1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`.
 2) Using the init method of `DataTree`, which creates an individual node.
-  You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes,
+  You can then specify the nodes' relationships to one another, either by setting `.parent` and `.children` attributes,
   or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`.
 3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`.
 


=====================================
datatree/datatree.py
=====================================
@@ -36,6 +36,7 @@ from xarray.core.utils import (
     HybridMappingProxy,
     _default,
     either_dict_or_kwargs,
+    maybe_wrap_array,
 )
 from xarray.core.variable import Variable
 
@@ -107,13 +108,10 @@ class DatasetView(Dataset):
     An immutable Dataset-like view onto the data in a single DataTree node.
 
     In-place operations modifying this object should raise an AttributeError.
+    This requires overriding all inherited constructors.
 
     Operations returning a new result will return a new xarray.Dataset object.
     This includes all API on Dataset, which will be inherited.
-
-    This requires overriding all inherited private constructors.
-
-    We leave the public init constructor because it is used by type() in some xarray code (see datatree GH issue #188)
     """
 
     # TODO what happens if user alters (in-place) a DataArray they extracted from this object?
@@ -129,6 +127,14 @@ class DatasetView(Dataset):
         "_variables",
     )
 
+    def __init__(
+        self,
+        data_vars: Optional[Mapping[Any, Any]] = None,
+        coords: Optional[Mapping[Any, Any]] = None,
+        attrs: Optional[Mapping[Any, Any]] = None,
+    ):
+        raise AttributeError("DatasetView objects are not to be initialized directly")
+
     @classmethod
     def _from_node(
         cls,
@@ -149,14 +155,16 @@ class DatasetView(Dataset):
 
     def __setitem__(self, key, val) -> None:
         raise AttributeError(
-            "Mutation of the DatasetView is not allowed, please use __setitem__ on the wrapping DataTree node, "
-            "or use `DataTree.to_dataset()` if you want a mutable dataset"
+            "Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, "
+            "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,"
+            "use `.copy()` first to get a mutable version of the input dataset."
         )
 
     def update(self, other) -> None:
         raise AttributeError(
-            "Mutation of the DatasetView is not allowed, please use .update on the wrapping DataTree node, "
-            "or use `DataTree.to_dataset()` if you want a mutable dataset"
+            "Mutation of the DatasetView is not allowed, please use `.update` on the wrapping DataTree node, "
+            "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,"
+            "use `.copy()` first to get a mutable version of the input dataset."
         )
 
     # FIXME https://github.com/python/mypy/issues/7328
@@ -235,6 +243,65 @@ class DatasetView(Dataset):
             inplace=inplace,
         )
 
+    def map(
+        self,
+        func: Callable,
+        keep_attrs: bool | None = None,
+        args: Iterable[Any] = (),
+        **kwargs: Any,
+    ) -> Dataset:
+        """Apply a function to each data variable in this dataset
+
+        Parameters
+        ----------
+        func : callable
+            Function which can be called in the form `func(x, *args, **kwargs)`
+            to transform each DataArray `x` in this dataset into another
+            DataArray.
+        keep_attrs : bool or None, optional
+            If True, both the dataset's and variables' attributes (`attrs`) will be
+            copied from the original objects to the new ones. If False, the new dataset
+            and variables will be returned without copying the attributes.
+        args : iterable, optional
+            Positional arguments passed on to `func`.
+        **kwargs : Any
+            Keyword arguments passed on to `func`.
+
+        Returns
+        -------
+        applied : Dataset
+            Resulting dataset from applying ``func`` to each data variable.
+
+        Examples
+        --------
+        >>> da = xr.DataArray(np.random.randn(2, 3))
+        >>> ds = xr.Dataset({"foo": da, "bar": ("x", [-1, 2])})
+        >>> ds
+        <xarray.Dataset>
+        Dimensions:  (dim_0: 2, dim_1: 3, x: 2)
+        Dimensions without coordinates: dim_0, dim_1, x
+        Data variables:
+            foo      (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 -0.9773
+            bar      (x) int64 -1 2
+        >>> ds.map(np.fabs)
+        <xarray.Dataset>
+        Dimensions:  (dim_0: 2, dim_1: 3, x: 2)
+        Dimensions without coordinates: dim_0, dim_1, x
+        Data variables:
+            foo      (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 0.9773
+            bar      (x) float64 1.0 2.0
+        """
+
+        # Copied from xarray.Dataset so as not to call type(self), which causes problems (see datatree GH188).
+        # TODO Refactor xarray upstream to avoid needing to overwrite this.
+        # TODO This copied version will drop all attrs - the keep_attrs stuff should be re-instated
+        variables = {
+            k: maybe_wrap_array(v, func(v, *args, **kwargs))
+            for k, v in self.data_vars.items()
+        }
+        # return type(self)(variables, attrs=attrs)
+        return Dataset(variables)
+
 
 class DataTree(
     NamedNode,
@@ -438,6 +505,11 @@ class DataTree(
         """False if node contains any data or attrs. Does not look at children."""
         return not (self.has_data or self.has_attrs)
 
+    @property
+    def is_hollow(self) -> bool:
+        """True if only leaf nodes contain data."""
+        return not any(node.has_data for node in self.subtree if not node.is_leaf)
+
     @property
     def variables(self) -> Mapping[Hashable, Variable]:
         """Low level interface to node contents as dict of Variable objects.
@@ -1175,8 +1247,13 @@ class DataTree(
         filterfunc: function
             A function which accepts only one DataTree - the node on which filterfunc will be called.
 
+        Returns
+        -------
+        DataTree
+
         See Also
         --------
+        match
         pipe
         map_over_subtree
         """
@@ -1185,6 +1262,51 @@ class DataTree(
         }
         return DataTree.from_dict(filtered_nodes, name=self.root.name)
 
+    def match(self, pattern: str) -> DataTree:
+        """
+        Return nodes with paths matching pattern.
+
+        Uses unix glob-like syntax for pattern-matching.
+
+        Parameters
+        ----------
+        pattern: str
+            A pattern to match each node path against.
+
+        Returns
+        -------
+        DataTree
+
+        See Also
+        --------
+        filter
+        pipe
+        map_over_subtree
+
+        Examples
+        --------
+        >>> dt = DataTree.from_dict(
+        ...     {
+        ...         "/a/A": None,
+        ...         "/a/B": None,
+        ...         "/b/A": None,
+        ...         "/b/B": None,
+        ...     }
+        ... )
+        >>> dt.match("*/B")
+        DataTree('None', parent=None)
+        ├── DataTree('a')
+        │   └── DataTree('B')
+        └── DataTree('b')
+            └── DataTree('B')
+        """
+        matching_nodes = {
+            node.path: node.ds
+            for node in self.subtree
+            if NodePath(node.path).match(pattern)
+        }
+        return DataTree.from_dict(matching_nodes, name=self.root.name)
+
     def map_over_subtree(
         self,
         func: Callable,

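The new `DataTree.match` method in the hunk above delegates the pattern test to `NodePath.match`, which `NodePath` inherits from `pathlib.PurePosixPath`. A minimal stdlib-only sketch of that underlying glob matching, reusing the paths from the docstring example:

```python
from pathlib import PurePosixPath

# The same unix-glob matching that DataTree.match() applies to each
# node's path via NodePath(node.path).match(pattern).
paths = ["/a/A", "/a/B", "/b/A", "/b/B"]
matching = [p for p in paths if PurePosixPath(p).match("*/B")]
# Only the two nodes named "B" survive the "*/B" pattern.
```

Note that `PurePosixPath.match` with a relative pattern matches from the right, which is why `"*/B"` selects `/a/B` and `/b/B` regardless of the parent name.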

=====================================
datatree/mapping.py
=====================================
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import functools
+import sys
 from itertools import repeat
 from textwrap import dedent
 from typing import TYPE_CHECKING, Callable, Tuple
@@ -109,8 +110,8 @@ def map_over_subtree(func: Callable) -> Callable:
 
     Applies a function to every dataset in one or more subtrees, returning new trees which store the results.
 
-    The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees
-    will have the same structure as the supplied trees.
+    The function will be applied to any data-containing dataset stored in any of the nodes in the trees. The returned
+    trees will have the same structure as the supplied trees.
 
     `func` needs to return one Datasets, DataArrays, or None in order to be able to rebuild the subtrees after
     mapping, as each result will be assigned to its respective node of a new tree via `DataTree.__setitem__`. Any
@@ -189,24 +190,28 @@ def map_over_subtree(func: Callable) -> Callable:
             *args_as_tree_length_iterables,
             *list(kwargs_as_tree_length_iterables.values()),
         ):
-            node_args_as_datasets = [
-                a.to_dataset() if isinstance(a, DataTree) else a
-                for a in all_node_args[:n_args]
+            node_args_as_datasetviews = [
+                a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args]
             ]
-            node_kwargs_as_datasets = dict(
+            node_kwargs_as_datasetviews = dict(
                 zip(
                     [k for k in kwargs_as_tree_length_iterables.keys()],
                     [
-                        v.to_dataset() if isinstance(v, DataTree) else v
+                        v.ds if isinstance(v, DataTree) else v
                         for v in all_node_args[n_args:]
                     ],
                 )
             )
+            func_with_error_context = _handle_errors_with_path_context(
+                node_of_first_tree.path
+            )(func)
 
             # Now we can call func on the data in this particular set of corresponding nodes
             results = (
-                func(*node_args_as_datasets, **node_kwargs_as_datasets)
-                if not node_of_first_tree.is_empty
+                func_with_error_context(
+                    *node_args_as_datasetviews, **node_kwargs_as_datasetviews
+                )
+                if node_of_first_tree.has_data
                 else None
             )
 
@@ -251,6 +256,34 @@ def map_over_subtree(func: Callable) -> Callable:
     return _map_over_subtree
 
 
+def _handle_errors_with_path_context(path):
+    """Wraps given function so that if it fails it also raises path to node on which it failed."""
+
+    def decorator(func):
+        def wrapper(*args, **kwargs):
+            try:
+                return func(*args, **kwargs)
+            except Exception as e:
+                if sys.version_info >= (3, 11):
+                    # Add the context information to the error message
+                    e.add_note(
+                        f"Raised whilst mapping function over node with path {path}"
+                    )
+                raise
+
+        return wrapper
+
+    return decorator
+
+
+def add_note(err: BaseException, msg: str) -> None:
+    # TODO: remove once python 3.10 can be dropped
+    if sys.version_info < (3, 11):
+        err.__notes__ = getattr(err, "__notes__", []) + [msg]
+    else:
+        err.add_note(msg)
+
+
 def _check_single_set_return_values(path_to_node, obj):
     """Check types returned from single evaluation of func, and return number of return values received from func."""
     if isinstance(obj, (Dataset, DataArray)):

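The `add_note` helper added in the hunk above backports PEP 678 exception notes to Python older than 3.11. A small self-contained sketch of how it behaves (the note text mirrors the one used by the error-context wrapper in the same file):

```python
import sys


def add_note(err: BaseException, msg: str) -> None:
    # Same backport as in datatree/mapping.py: use the real PEP 678
    # API on 3.11+, otherwise stash notes in a plain __notes__ list.
    if sys.version_info < (3, 11):
        err.__notes__ = getattr(err, "__notes__", []) + [msg]
    else:
        err.add_note(msg)


try:
    raise ValueError("some failure inside func")
except ValueError as err:
    add_note(err, "Raised whilst mapping function over node with path /set1")
    notes = list(getattr(err, "__notes__", []))
```

On 3.11+ these notes are also rendered by `traceback` after the exception message, which is what makes the failing node's path visible to the user.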

=====================================
datatree/testing.py
=====================================
@@ -34,7 +34,7 @@ def assert_isomorphic(a: DataTree, b: DataTree, from_root: bool = False):
     assert_identical
     """
     __tracebackhide__ = True
-    assert type(a) == type(b)
+    assert isinstance(a, type(b))
 
     if isinstance(a, DataTree):
         if from_root:
@@ -71,7 +71,7 @@ def assert_equal(a: DataTree, b: DataTree, from_root: bool = True):
     assert_identical
     """
     __tracebackhide__ = True
-    assert type(a) == type(b)
+    assert isinstance(a, type(b))
 
     if isinstance(a, DataTree):
         if from_root:
@@ -109,7 +109,7 @@ def assert_identical(a: DataTree, b: DataTree, from_root: bool = True):
     """
 
     __tracebackhide__ = True
-    assert type(a) == type(b)
+    assert isinstance(a, type(b))
     if isinstance(a, DataTree):
         if from_root:
             a = a.root

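The relaxed check above (`isinstance(a, type(b))` instead of `type(a) == type(b)`) lets an instance of a `DataTree` subclass be compared against a plain `DataTree`. The difference, sketched with hypothetical stand-in classes:

```python
class Base:
    pass


class Sub(Base):
    pass


a, b = Sub(), Base()
strict = type(a) == type(b)       # exact-type comparison: rejects subclasses
relaxed = isinstance(a, type(b))  # isinstance: accepts subclasses of type(b)
```

Note the check is asymmetric: `isinstance(b, type(a))` would still be False here, so argument order matters with this style of assertion.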

=====================================
datatree/tests/__init__.py
=====================================
@@ -1,7 +1,7 @@
 import importlib
-from distutils import version
 
 import pytest
+from packaging import version
 
 
 def _importorskip(modname, minversion=None):
@@ -21,7 +21,7 @@ def LooseVersion(vstring):
     # Our development version is something like '0.10.9+aac7bfc'
     # This function just ignores the git commit id.
     vstring = vstring.split("+")[0]
-    return version.LooseVersion(vstring)
+    return version.parse(vstring)
 
 
 has_zarr, requires_zarr = _importorskip("zarr")

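The change above replaces the removed `distutils.version.LooseVersion` with `packaging.version.parse`. A sketch of the shim's behavior, assuming the third-party `packaging` package is installed (the function is renamed `loose_version` here for illustration):

```python
from packaging import version


def loose_version(vstring: str):
    # Mirrors the LooseVersion shim in datatree/tests/__init__.py:
    # drop any local "+<commit>" suffix, then parse the remainder.
    return version.parse(vstring.split("+")[0])


# A development version like "0.10.9+aac7bfc" compares as plain 0.10.9.
dev = loose_version("0.10.9+aac7bfc")
```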

=====================================
datatree/tests/test_datatree.py
=====================================
@@ -136,6 +136,16 @@ class TestStoreDatasets:
         john = DataTree(name="john", data=None)
         assert not john.has_data
 
+    def test_is_hollow(self):
+        john = DataTree(data=xr.Dataset({"a": 0}))
+        assert john.is_hollow
+
+        eve = DataTree(children={"john": john})
+        assert eve.is_hollow
+
+        eve.ds = xr.Dataset({"a": 1})
+        assert not eve.is_hollow
+
 
 class TestVariablesChildrenNameCollisions:
     def test_parent_already_has_variable_with_childs_name(self):
@@ -603,6 +613,13 @@ class TestAccess:
         var_keys = list(dt.variables.keys())
         assert all(var_key in key_completions for var_key in var_keys)
 
+    def test_operation_with_attrs_but_no_data(self):
+        # tests bug from xarray-datatree GH262
+        xs = xr.Dataset({"testvar": xr.DataArray(np.ones((2, 3)))})
+        dt = DataTree.from_dict({"node1": xs, "node2": xs})
+        dt.attrs["test_key"] = 1  # sel works fine without this line
+        dt.sel(dim_0=0)
+
 
 class TestRestructuring:
     def test_drop_nodes(self):
@@ -671,6 +688,25 @@ class TestPipe:
 
 
 class TestSubset:
+    def test_match(self):
+        # TODO is this example going to cause problems with case sensitivity?
+        dt = DataTree.from_dict(
+            {
+                "/a/A": None,
+                "/a/B": None,
+                "/b/A": None,
+                "/b/B": None,
+            }
+        )
+        result = dt.match("*/B")
+        expected = DataTree.from_dict(
+            {
+                "/a/B": None,
+                "/b/B": None,
+            }
+        )
+        dtt.assert_identical(result, expected)
+
     def test_filter(self):
         simpsons = DataTree.from_dict(
             d={


=====================================
datatree/tests/test_formatting.py
=====================================
@@ -90,11 +90,14 @@ class TestDiffFormatting:
         assert actual == expected
 
     def test_diff_node_data(self):
-        ds1 = Dataset({"u": 0, "v": 1})
-        ds3 = Dataset({"w": 5})
+        import numpy as np
+
+        # casting to int64 explicitly ensures that int64s are created on all architectures
+        ds1 = Dataset({"u": np.int64(0), "v": np.int64(1)})
+        ds3 = Dataset({"w": np.int64(5)})
         dt_1 = DataTree.from_dict({"a": ds1, "a/b": ds3})
-        ds2 = Dataset({"u": 0})
-        ds4 = Dataset({"w": 6})
+        ds2 = Dataset({"u": np.int64(0)})
+        ds4 = Dataset({"w": np.int64(6)})
         dt_2 = DataTree.from_dict({"a": ds2, "a/b": ds4})
 
         expected = dedent(


=====================================
datatree/tests/test_mapping.py
=====================================
@@ -252,6 +252,35 @@ class TestMapOverSubTree:
         result_tree = times_ten(subtree)
         assert_equal(result_tree, expected, from_root=False)
 
+    def test_skip_empty_nodes_with_attrs(self, create_test_datatree):
+        # inspired by xarray-datatree GH262
+        dt = create_test_datatree()
+        dt["set1/set2"].attrs["foo"] = "bar"
+
+        def check_for_data(ds):
+            # fails if run on a node that has no data
+            assert len(ds.variables) != 0
+            return ds
+
+        dt.map_over_subtree(check_for_data)
+
+    @pytest.mark.xfail(
+        reason="probably some bug in pytests handling of exception notes"
+    )
+    def test_error_contains_path_of_offending_node(self, create_test_datatree):
+        dt = create_test_datatree()
+        dt["set1"]["bad_var"] = 0
+        print(dt)
+
+        def fail_on_specific_node(ds):
+            if "bad_var" in ds:
+                raise ValueError("Failed because 'bar_var' present in dataset")
+
+        with pytest.raises(
+            ValueError, match="Raised whilst mapping function over node /set1"
+        ):
+            dt.map_over_subtree(fail_on_specific_node)
+
 
 class TestMutableOperations:
     def test_construct_using_type(self):
@@ -275,7 +304,7 @@ class TestMutableOperations:
 
         dt.map_over_subtree(weighted_mean)
 
-    def test_alter_inplace(self):
+    def test_alter_inplace_forbidden(self):
         simpsons = DataTree.from_dict(
             d={
                 "/": xr.Dataset({"age": 83}),
@@ -293,7 +322,8 @@ class TestMutableOperations:
             ds["age"] = ds["age"] + years
             return ds
 
-        simpsons.map_over_subtree(fast_forward, years=10)
+        with pytest.raises(AttributeError):
+            simpsons.map_over_subtree(fast_forward, years=10)
 
 
 @pytest.mark.xfail


=====================================
datatree/tests/test_treenode.py
=====================================
@@ -1,7 +1,7 @@
 import pytest
 
 from datatree.iterators import LevelOrderIter, PreOrderIter
-from datatree.treenode import InvalidTreeError, NamedNode, TreeNode
+from datatree.treenode import InvalidTreeError, NamedNode, NodePath, TreeNode
 
 
 class TestFamilyTree:
@@ -369,3 +369,9 @@ class TestRenderTree:
         ]
         for expected_node, printed_node in zip(expected_nodes, printout.splitlines()):
             assert expected_node in printed_node
+
+
+def test_nodepath():
+    path = NodePath("/Mary")
+    assert path.root == "/"
+    assert path.stem == "Mary"


=====================================
datatree/treenode.py
=====================================
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import sys
 from collections import OrderedDict
 from pathlib import PurePosixPath
 from typing import (
@@ -30,21 +31,20 @@ class NotFoundInTreeError(ValueError):
 class NodePath(PurePosixPath):
     """Represents a path from one node to another within a tree."""
 
-    def __new__(cls, *args: str | "NodePath") -> "NodePath":
-        obj = super().__new__(cls, *args)
-
-        if obj.drive:
+    def __init__(self, *pathsegments):
+        if sys.version_info >= (3, 12):
+            super().__init__(*pathsegments)
+        else:
+            super().__new__(PurePosixPath, *pathsegments)
+        if self.drive:
             raise ValueError("NodePaths cannot have drives")
 
-        if obj.root not in ["/", ""]:
+        if self.root not in ["/", ""]:
             raise ValueError(
                 'Root of NodePath can only be either "/" or "", with "" meaning the path is relative.'
             )
-
         # TODO should we also forbid suffixes to avoid node names with dots in them?
 
-        return obj
-
 
 Tree = TypeVar("Tree", bound="TreeNode")
 

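The rewritten `NodePath` constructor above accommodates Python 3.12, where `pathlib` paths began initializing in `__init__` (previously all parsing happened in `__new__`). A stripped-down sketch of the same version-gated pattern, with a hypothetical `ValidatedPath` standing in for `NodePath`:

```python
import sys
from pathlib import PurePosixPath


class ValidatedPath(PurePosixPath):
    """Toy stand-in for NodePath: validate the parsed path in __init__."""

    def __init__(self, *pathsegments):
        if sys.version_info >= (3, 12):
            # On 3.12+ pathlib parses the segments in __init__.
            super().__init__(*pathsegments)
        # On older versions PurePosixPath.__new__ has already parsed the
        # segments by the time __init__ runs, so there is nothing to forward.
        if self.root not in ["/", ""]:
            raise ValueError("root must be '/' (absolute) or '' (relative)")
```

The `"//"` root rejected here is the POSIX double-slash special case that `pathlib` preserves, which is also what the upstream root check guards against.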

=====================================
docs/source/api.rst
=====================================
@@ -64,6 +64,7 @@ This interface echoes that of ``xarray.Dataset``.
    DataTree.has_data
    DataTree.has_attrs
    DataTree.is_empty
+   DataTree.is_hollow
 
 ..
 
@@ -102,6 +103,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure.
    DataTree.find_common_ancestor
    map_over_subtree
    DataTree.pipe
+   DataTree.match
    DataTree.filter
 
 DataTree Contents


=====================================
docs/source/hierarchical-data.rst
=====================================
@@ -175,7 +175,7 @@ Let's use a different example of a tree to discuss more complex relationships be
     ]
 
 We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree,
-and :ref:`filesystem-like syntax <filesystem paths>`_ (to be explained shortly) to select two nodes of interest.
+and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest.
 
 .. ipython:: python
 
@@ -339,3 +339,297 @@ we can construct a complex tree quickly using the alternative constructor :py:me
     Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path
     (i.e. the node labelled `"c"` in this case.)
     This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`.
+
+.. _iterating over trees:
+
+Iterating over trees
+~~~~~~~~~~~~~~~~~~~~
+
+You can iterate over every node in a tree using the :py:class:`~DataTree.subtree` property.
+This returns an iterable of nodes, which yields them in depth-first order.
+
+.. ipython:: python
+
+    for node in vertebrates.subtree:
+        print(node.path)
+
+A very useful pattern is to use :py:class:`~DataTree.subtree` in conjunction with the :py:class:`~DataTree.path` property to manipulate the nodes however you wish,
+then rebuild a new tree using :py:meth:`DataTree.from_dict()`.
+
+For example, we could keep only the nodes containing data by looping over all nodes,
+checking if they contain any data using :py:class:`~DataTree.has_data`,
+then rebuilding a new tree using only the paths of those nodes:
+
+.. ipython:: python
+
+    non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data}
+    DataTree.from_dict(non_empty_nodes)
+
+You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.
+
+(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:meth:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.)
+
+.. _Tree Contents:
+
+Tree Contents
+-------------
+
+Hollow Trees
+~~~~~~~~~~~~
+
+A concept that can sometimes be useful is that of a "Hollow Tree", meaning a tree with data stored only at the leaf nodes.
+This is useful because certain tree manipulation operations only make sense for hollow trees.
+
+You can check if a tree is a hollow tree by using the :py:meth:`~DataTree.is_hollow` property.
+We can see that the Simpsons family tree is not hollow because the data variable ``"age"`` is present at some nodes which
+have children (i.e. Abe and Homer).
+
+.. ipython:: python
+
+    simpsons.is_hollow
+
+.. _manipulating trees:
+
+Manipulating Trees
+------------------
+
+Subsetting Tree Nodes
+~~~~~~~~~~~~~~~~~~~~~
+
+We can subset our tree to select only nodes of interest in various ways.
+
+As on a real filesystem, matching nodes by common patterns in their paths is often useful.
+We can use :py:meth:`DataTree.match` for this:
+
+.. ipython:: python
+
+    dt = DataTree.from_dict(
+        {
+            "/a/A": None,
+            "/a/B": None,
+            "/b/A": None,
+            "/b/B": None,
+        }
+    )
+    result = dt.match("*/B")
+
+We can also subset trees by the contents of the nodes.
+:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition.
+For example, we could recreate the Simpsons family tree with the ages of each individual, then filter for only the adults.
+First let's recreate the tree but with an `age` data variable in every node:
+
+.. ipython:: python
+
+    simpsons = DataTree.from_dict(
+        d={
+            "/": xr.Dataset({"age": 83}),
+            "/Herbert": xr.Dataset({"age": 40}),
+            "/Homer": xr.Dataset({"age": 39}),
+            "/Homer/Bart": xr.Dataset({"age": 10}),
+            "/Homer/Lisa": xr.Dataset({"age": 8}),
+            "/Homer/Maggie": xr.Dataset({"age": 1}),
+        },
+        name="Abe",
+    )
+    simpsons
+
+Now let's filter out the minors:
+
+.. ipython:: python
+
+    simpsons.filter(lambda node: node["age"] > 18)
+
+The result is a new tree, containing only the nodes matching the condition.
+
+(Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees`!)
+
+.. _tree computation:
+
+Computation
+-----------
+
+`DataTree` objects are also useful for performing computations, not just for organizing data.
+
+Operations and Methods on Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To show how applying operations across a whole tree at once can be useful,
+let's first create an example scientific dataset.
+
+.. ipython:: python
+
+    def time_stamps(n_samples, T):
+        """Create an array of evenly-spaced time stamps"""
+        return xr.DataArray(
+            data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
+        )
+
+
+    def signal_generator(t, f, A, phase):
+        """Generate an example electrical-like waveform"""
+        return A * np.sin(f * t.data + phase)
+
+
+    time_stamps1 = time_stamps(n_samples=15, T=1.5)
+    time_stamps2 = time_stamps(n_samples=10, T=1.0)
+
+    voltages = DataTree.from_dict(
+        {
+            "/oscilloscope1": xr.Dataset(
+                {
+                    "potential": (
+                        "time",
+                        signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
+                    ),
+                    "current": (
+                        "time",
+                        signal_generator(time_stamps1, f=2, A=1.2, phase=1),
+                    ),
+                },
+                coords={"time": time_stamps1},
+            ),
+            "/oscilloscope2": xr.Dataset(
+                {
+                    "potential": (
+                        "time",
+                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
+                    ),
+                    "current": (
+                        "time",
+                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
+                    ),
+                },
+                coords={"time": time_stamps2},
+            ),
+        }
+    )
+    voltages
+
+Most xarray computation methods also exist as methods on datatree objects,
+so you can for example take the mean value of these two timeseries at once:
+
+.. ipython:: python
+
+    voltages.mean(dim="time")
+
+This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the
+tree one-by-one.
+
+The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another.
+
+.. ipython:: python
+    :okexcept:
+
+    voltages.isel(time=12)
+
+Notice that the error raised helpfully indicates which node of the tree the operation failed on.
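How such per-node error context can be attached is sketched below with a hypothetical helper over a toy dict tree (the real :py:func:`map_over_subtree` uses exception notes on Python 3.11+; this simplified version chains the original exception instead):

```python
# Toy sketch: apply func to every node's value, reporting which node failed.
def map_over_nodes(tree, func):
    out = {}
    for path, value in tree.items():
        try:
            out[path] = func(value)
        except Exception as exc:
            # Re-raise with the failing node's path attached for easier debugging.
            raise RuntimeError(
                f"Raised whilst mapping function over node {path!r}"
            ) from exc
    return out
```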
+
+Arithmetic Methods on Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Arithmetic methods are also implemented, so you can, for example, add a scalar to every dataset in the tree at once.
+We can advance the timeline of the Simpsons by a decade simply by adding ``10``:
+
+.. ipython:: python
+
+    simpsons + 10
+
+See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node.
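In the toy dict model used above, this scalar broadcast amounts to adding the value to the data at every node (node names here are assumed for illustration):

```python
# Toy sketch of `simpsons + 10`: broadcast a scalar over every node's data.
ages = {"/Homer": [39], "/Homer/Bart": [10], "/Homer/Lisa": [8]}
fast_forwarded = {path: [age + 10 for age in data] for path, data in ages.items()}
```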
+
+Mapping Custom Functions Over Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can map custom computation over each node in a tree using :py:func:`map_over_subtree`.
+You can map any function, so long as it takes :py:class:`xarray.Dataset` objects as one (or more) of its input arguments,
+and returns one (or more) xarray datasets.
+
+.. note::
+
+    Functions passed to :py:func:`map_over_subtree` cannot alter nodes in-place.
+    Instead they must return new :py:class:`xarray.Dataset` objects.
+
+For example, we can define a function to calculate the Root Mean Square of a timeseries
+
+.. ipython:: python
+
+    def rms(signal):
+        return np.sqrt(np.mean(signal**2))
+
+Then apply it to every node in the tree, using the :py:meth:`DataTree.map_over_subtree` method:
+
+.. ipython:: python
+
+    voltages.map_over_subtree(rms)
+
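As a sanity check on the formula itself (plain NumPy, independent of datatree): the RMS of a sine wave of amplitude :math:`A` over whole periods approaches :math:`A/\sqrt{2}`.

```python
import numpy as np

def rms(signal):
    return np.sqrt(np.mean(signal ** 2))

# Two full periods of a sine wave with amplitude 1.2, finely sampled;
# its RMS should be close to 1.2 / sqrt(2).
t = np.linspace(0, 2 * np.pi, 100_000)
value = rms(1.2 * np.sin(2 * t))
```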
+.. _multiple trees:
+
+Operating on Multiple Trees
+---------------------------
+
+The examples so far have involved mapping functions or methods over the nodes of a single tree,
+but we can generalize this to mapping functions over multiple trees at once.
+
+Comparing Trees for Isomorphism
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For it to make sense to map a single non-unary function over the nodes of multiple trees at once,
+each tree needs to have the same structure. Specifically, two trees can only be considered similar, or "isomorphic",
+if they have the same number of nodes, and each corresponding node has the same number of children.
+We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomorphic` method.
+
+.. ipython:: python
+    :okexcept:
+
+    dt1 = DataTree.from_dict({"a": None, "a/b": None})
+    dt2 = DataTree.from_dict({"a": None})
+    dt1.isomorphic(dt2)
+
+    dt3 = DataTree.from_dict({"a": None, "b": None})
+    dt1.isomorphic(dt3)
+
+    dt4 = DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})})
+    dt1.isomorphic(dt4)
+
+If the trees are not isomorphic, a :py:class:`~TreeIsomorphismError` will be raised.
+Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic.
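The structural check behind isomorphism can be sketched recursively on a toy model where each node is a dict of its children (names and data deliberately ignored, matching the behaviour described above; this is an illustration, not the datatree implementation):

```python
# Two trees are isomorphic when corresponding nodes have the same number
# of children, all the way down; node names and contents are irrelevant.
def isomorphic(a, b):
    if len(a) != len(b):
        return False
    return all(isomorphic(ca, cb) for ca, cb in zip(a.values(), b.values()))
```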
+
+Arithmetic Between Multiple Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees,
+we can do arithmetic between them.
+
+.. ipython:: python
+
+    currents = DataTree.from_dict(
+        {
+            "/oscilloscope1": xr.Dataset(
+                {
+                    "current": (
+                        "time",
+                        signal_generator(time_stamps1, f=2, A=1.2, phase=1),
+                    ),
+                },
+                coords={"time": time_stamps1},
+            ),
+            "/oscilloscope2": xr.Dataset(
+                {
+                    "current": (
+                        "time",
+                        signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
+                    ),
+                },
+                coords={"time": time_stamps2},
+            ),
+        }
+    )
+    currents
+
+    currents.isomorphic(voltages)
+
+We could use this feature to quickly calculate the electrical power in our signals, :math:`P = IV`.
+
+.. ipython:: python
+
+    power = currents * voltages
+    power
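In the toy dict model, a binary operation between two isomorphic trees combines the data stored at corresponding node paths (hypothetical helper, not the datatree API):

```python
# Toy sketch of `currents * voltages`: multiply the data at matching paths
# of two trees with identical structure.
def tree_multiply(a, b):
    assert a.keys() == b.keys(), "trees must have the same structure"
    return {path: [x * y for x, y in zip(a[path], b[path])] for path in a}
```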


=====================================
docs/source/whats-new.rst
=====================================
@@ -15,9 +15,55 @@ What's New
 
     np.random.seed(123456)
 
+.. _whats-new.v0.0.13:
+
+v0.0.13 (unreleased)
+--------------------
+
+New Features
+~~~~~~~~~~~~
+
+- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- New :py:meth:`DataTree.is_hollow` property for checking if data is only contained at the leaf nodes. (:pull:`272`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree`
+  (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later.
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- Nodes containing only attributes but no data are now ignored by :py:func:`map_over_subtree` (:issue:`262`, :pull:`263`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Disallow altering of given dataset inside function called by :py:func:`map_over_subtree` (:pull:`269`, reverts part of :pull:`194`).
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Deprecations
+~~~~~~~~~~~~
+
+Bug fixes
+~~~~~~~~~
+
+- Fix unittests on i386. (:pull:`249`)
+  By `Antonio Valentino <https://github.com/avalentino>`_.
+- Ensure nodepath class is compatible with python 3.12 (:pull:`260`)
+  By `Max Grover <https://github.com/mgrover1>`_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added new sections to page on ``Working with Hierarchical Data`` (:pull:`180`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- No longer use the deprecated ``distutils`` package.
+
 .. _whats-new.v0.0.12:
 
-v0.0.12 (unreleased)
+v0.0.12 (03/07/2023)
 --------------------
 
 New Features


=====================================
pyproject.toml
=====================================
@@ -20,6 +20,7 @@ classifiers = [
 requires-python = ">=3.9"
 dependencies = [
     "xarray >=2022.6.0",
+    "packaging",
 ]
 dynamic = ["version"]
 



View it on GitLab: https://salsa.debian.org/debian-gis-team/xarray-datatree/-/commit/bed37d4b47efda0bd57cbe9e635efb14b28c3fc3
