[Git][debian-gis-team/xarray-datatree][upstream] New upstream version 0.0.13
Antonio Valentino (@antonio.valentino)
gitlab@salsa.debian.org
Sat Oct 28 17:14:25 BST 2023
Antonio Valentino pushed to branch upstream at Debian GIS Project / xarray-datatree
Commits:
bed37d4b by Antonio Valentino at 2023-10-28T15:55:36+00:00
New upstream version 0.0.13
- - - - -
17 changed files:
- .github/workflows/main.yaml
- .github/workflows/pypipublish.yaml
- .pre-commit-config.yaml
- README.md
- datatree/datatree.py
- datatree/mapping.py
- datatree/testing.py
- datatree/tests/__init__.py
- datatree/tests/test_datatree.py
- datatree/tests/test_formatting.py
- datatree/tests/test_mapping.py
- datatree/tests/test_treenode.py
- datatree/treenode.py
- docs/source/api.rst
- docs/source/hierarchical-data.rst
- docs/source/whats-new.rst
- pyproject.toml
Changes:
=====================================
.github/workflows/main.yaml
=====================================
@@ -20,9 +20,9 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
- python-version: ["3.9", "3.10", "3.11"]
+ python-version: ["3.9", "3.10", "3.11", "3.12"]
steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
- name: Create conda environment
uses: mamba-org/provision-with-micromamba@main
@@ -48,7 +48,7 @@ jobs:
python -m pytest --cov=./ --cov-report=xml --verbose
- name: Upload code coverage to Codecov
- uses: codecov/codecov-action@v3.1.1
+ uses: codecov/codecov-action@v3.1.4
with:
file: ./coverage.xml
flags: unittests
@@ -65,9 +65,9 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
- python-version: ["3.9", "3.10", "3.11"]
+ python-version: ["3.9", "3.10", "3.11", "3.12"]
steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
- name: Create conda environment
uses: mamba-org/provision-with-micromamba@main
=====================================
.github/workflows/pypipublish.yaml
=====================================
@@ -19,7 +19,7 @@ jobs:
runs-on: ubuntu-latest
if: github.repository == 'xarray-contrib/datatree'
steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v4
@@ -77,7 +77,7 @@ jobs:
name: releases
path: dist
- name: Publish package to PyPI
- uses: pypa/gh-action-pypi-publish@v1.6.4
+ uses: pypa/gh-action-pypi-publish@v1.8.10
with:
user: ${{ secrets.PYPI_USERNAME }}
password: ${{ secrets.PYPI_PASSWORD }}
=====================================
.pre-commit-config.yaml
=====================================
@@ -15,7 +15,7 @@ repos:
- id: isort
# https://github.com/python/black#version-control-integration
- repo: https://github.com/psf/black
- rev: 23.1.0
+ rev: 23.9.1
hooks:
- id: black
- repo: https://github.com/keewis/blackdoc
@@ -23,7 +23,7 @@ repos:
hooks:
- id: blackdoc
- repo: https://github.com/PyCQA/flake8
- rev: 6.0.0
+ rev: 6.1.0
hooks:
- id: flake8
# - repo: https://github.com/Carreau/velin
@@ -32,7 +32,7 @@ repos:
# - id: velin
# args: ["--write", "--compact"]
- repo: https://github.com/pre-commit/mirrors-mypy
- rev: v1.0.1
+ rev: v1.5.1
hooks:
- id: mypy
# Copied from setup.cfg
@@ -45,7 +45,7 @@ repos:
types-pytz,
# Dependencies that are typed
numpy,
- typing-extensions==3.10.0.0,
+ typing-extensions>=4.1.0,
]
# run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194
# - repo: https://github.com/asottile/pyupgrade
=====================================
README.md
=====================================
@@ -14,6 +14,17 @@ that was more flexible than a single `xarray.Dataset` object.
The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object,
but `datatree.DataTree` objects have many other uses.
+### Installation
+You can install datatree via pip:
+```shell
+pip install xarray-datatree
+```
+
+or via conda-forge
+```shell
+conda install -c conda-forge xarray-datatree
+```
+
### Why Datatree?
You might want to use datatree for:
@@ -41,7 +52,7 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis
You can create a `DataTree` object in 3 ways:
1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`.
2) Using the init method of `DataTree`, which creates an individual node.
- You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes,
+ You can then specify the nodes' relationships to one another, either by setting `.parent` and `.children` attributes,
or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`.
3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`.
=====================================
datatree/datatree.py
=====================================
@@ -36,6 +36,7 @@ from xarray.core.utils import (
HybridMappingProxy,
_default,
either_dict_or_kwargs,
+ maybe_wrap_array,
)
from xarray.core.variable import Variable
@@ -107,13 +108,10 @@ class DatasetView(Dataset):
An immutable Dataset-like view onto the data in a single DataTree node.
In-place operations modifying this object should raise an AttributeError.
+ This requires overriding all inherited constructors.
Operations returning a new result will return a new xarray.Dataset object.
This includes all API on Dataset, which will be inherited.
-
- This requires overriding all inherited private constructors.
-
- We leave the public init constructor because it is used by type() in some xarray code (see datatree GH issue #188)
"""
# TODO what happens if user alters (in-place) a DataArray they extracted from this object?
@@ -129,6 +127,14 @@ class DatasetView(Dataset):
"_variables",
)
+ def __init__(
+ self,
+ data_vars: Optional[Mapping[Any, Any]] = None,
+ coords: Optional[Mapping[Any, Any]] = None,
+ attrs: Optional[Mapping[Any, Any]] = None,
+ ):
+ raise AttributeError("DatasetView objects are not to be initialized directly")
+
@classmethod
def _from_node(
cls,
@@ -149,14 +155,16 @@ class DatasetView(Dataset):
def __setitem__(self, key, val) -> None:
raise AttributeError(
- "Mutation of the DatasetView is not allowed, please use __setitem__ on the wrapping DataTree node, "
- "or use `DataTree.to_dataset()` if you want a mutable dataset"
+ "Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, "
+ "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,"
+ "use `.copy()` first to get a mutable version of the input dataset."
)
def update(self, other) -> None:
raise AttributeError(
- "Mutation of the DatasetView is not allowed, please use .update on the wrapping DataTree node, "
- "or use `DataTree.to_dataset()` if you want a mutable dataset"
+ "Mutation of the DatasetView is not allowed, please use `.update` on the wrapping DataTree node, "
+ "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,"
+ "use `.copy()` first to get a mutable version of the input dataset."
)
# FIXME https://github.com/python/mypy/issues/7328
@@ -235,6 +243,65 @@ class DatasetView(Dataset):
inplace=inplace,
)
+ def map(
+ self,
+ func: Callable,
+ keep_attrs: bool | None = None,
+ args: Iterable[Any] = (),
+ **kwargs: Any,
+ ) -> Dataset:
+ """Apply a function to each data variable in this dataset
+
+ Parameters
+ ----------
+ func : callable
+ Function which can be called in the form `func(x, *args, **kwargs)`
+ to transform each DataArray `x` in this dataset into another
+ DataArray.
+ keep_attrs : bool or None, optional
+ If True, both the dataset's and variables' attributes (`attrs`) will be
+ copied from the original objects to the new ones. If False, the new dataset
+ and variables will be returned without copying the attributes.
+ args : iterable, optional
+ Positional arguments passed on to `func`.
+ **kwargs : Any
+ Keyword arguments passed on to `func`.
+
+ Returns
+ -------
+ applied : Dataset
+ Resulting dataset from applying ``func`` to each data variable.
+
+ Examples
+ --------
+ >>> da = xr.DataArray(np.random.randn(2, 3))
+ >>> ds = xr.Dataset({"foo": da, "bar": ("x", [-1, 2])})
+ >>> ds
+ <xarray.Dataset>
+ Dimensions: (dim_0: 2, dim_1: 3, x: 2)
+ Dimensions without coordinates: dim_0, dim_1, x
+ Data variables:
+ foo (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 -0.9773
+ bar (x) int64 -1 2
+ >>> ds.map(np.fabs)
+ <xarray.Dataset>
+ Dimensions: (dim_0: 2, dim_1: 3, x: 2)
+ Dimensions without coordinates: dim_0, dim_1, x
+ Data variables:
+ foo (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 0.9773
+ bar (x) float64 1.0 2.0
+ """
+
+ # Copied from xarray.Dataset so as not to call type(self), which causes problems (see datatree GH188).
+ # TODO Refactor xarray upstream to avoid needing to overwrite this.
+ # TODO This copied version will drop all attrs - the keep_attrs stuff should be re-instated
+ variables = {
+ k: maybe_wrap_array(v, func(v, *args, **kwargs))
+ for k, v in self.data_vars.items()
+ }
+ # return type(self)(variables, attrs=attrs)
+ return Dataset(variables)
+
class DataTree(
NamedNode,
@@ -438,6 +505,11 @@ class DataTree(
"""False if node contains any data or attrs. Does not look at children."""
return not (self.has_data or self.has_attrs)
+ @property
+ def is_hollow(self) -> bool:
+ """True if only leaf nodes contain data."""
+ return not any(node.has_data for node in self.subtree if not node.is_leaf)
+
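A runnable sketch of the `is_hollow` logic this hunk adds: a tree is "hollow" when only its leaf nodes carry data. `Node` here is a hypothetical toy class standing in for `DataTree`, not part of the library.

```python
class Node:
    def __init__(self, has_data=False, children=()):
        self.has_data = has_data
        self.children = list(children)

    @property
    def is_leaf(self):
        return not self.children

    @property
    def subtree(self):
        # depth-first iteration over this node and all descendants
        yield self
        for child in self.children:
            yield from child.subtree

    @property
    def is_hollow(self):
        # hollow: no interior (non-leaf) node may carry data
        return not any(n.has_data for n in self.subtree if not n.is_leaf)


hollow = Node(children=[Node(has_data=True)])          # data only at the leaf
not_hollow = Node(has_data=True, children=[Node(has_data=True)])  # data at the root too
```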
@property
def variables(self) -> Mapping[Hashable, Variable]:
"""Low level interface to node contents as dict of Variable objects.
@@ -1175,8 +1247,13 @@ class DataTree(
filterfunc: function
A function which accepts only one DataTree - the node on which filterfunc will be called.
+ Returns
+ -------
+ DataTree
+
See Also
--------
+ match
pipe
map_over_subtree
"""
@@ -1185,6 +1262,51 @@ class DataTree(
}
return DataTree.from_dict(filtered_nodes, name=self.root.name)
+ def match(self, pattern: str) -> DataTree:
+ """
+ Return nodes with paths matching pattern.
+
+ Uses unix glob-like syntax for pattern-matching.
+
+ Parameters
+ ----------
+ pattern: str
+ A pattern to match each node path against.
+
+ Returns
+ -------
+ DataTree
+
+ See Also
+ --------
+ filter
+ pipe
+ map_over_subtree
+
+ Examples
+ --------
+ >>> dt = DataTree.from_dict(
+ ... {
+ ... "/a/A": None,
+ ... "/a/B": None,
+ ... "/b/A": None,
+ ... "/b/B": None,
+ ... }
+ ... )
+ >>> dt.match("*/B")
+ DataTree('None', parent=None)
+ ├── DataTree('a')
+ │ └── DataTree('B')
+ └── DataTree('b')
+ └── DataTree('B')
+ """
+ matching_nodes = {
+ node.path: node.ds
+ for node in self.subtree
+ if NodePath(node.path).match(pattern)
+ }
+ return DataTree.from_dict(matching_nodes, name=self.root.name)
+
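The `match` method above delegates the glob test to `NodePath.match`, which `NodePath` inherits from the standard library's `PurePosixPath`: a relative pattern is matched against the trailing segments of each node's path. A minimal stdlib-only sketch of that matching step:

```python
from pathlib import PurePosixPath

# The pattern "*/B" matches any path whose last two segments are
# <anything>/B, which is how dt.match("*/B") selects /a/B and /b/B.
paths = ["/a/A", "/a/B", "/b/A", "/b/B"]
matching = [p for p in paths if PurePosixPath(p).match("*/B")]
```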
def map_over_subtree(
self,
func: Callable,
=====================================
datatree/mapping.py
=====================================
@@ -1,6 +1,7 @@
from __future__ import annotations
import functools
+import sys
from itertools import repeat
from textwrap import dedent
from typing import TYPE_CHECKING, Callable, Tuple
@@ -109,8 +110,8 @@ def map_over_subtree(func: Callable) -> Callable:
Applies a function to every dataset in one or more subtrees, returning new trees which store the results.
- The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees
- will have the same structure as the supplied trees.
+ The function will be applied to any data-containing dataset stored in any of the nodes in the trees. The returned
+ trees will have the same structure as the supplied trees.
`func` needs to return one Dataset, DataArray, or None in order to be able to rebuild the subtrees after
mapping, as each result will be assigned to its respective node of a new tree via `DataTree.__setitem__`. Any
@@ -189,24 +190,28 @@ def map_over_subtree(func: Callable) -> Callable:
*args_as_tree_length_iterables,
*list(kwargs_as_tree_length_iterables.values()),
):
- node_args_as_datasets = [
- a.to_dataset() if isinstance(a, DataTree) else a
- for a in all_node_args[:n_args]
+ node_args_as_datasetviews = [
+ a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args]
]
- node_kwargs_as_datasets = dict(
+ node_kwargs_as_datasetviews = dict(
zip(
[k for k in kwargs_as_tree_length_iterables.keys()],
[
- v.to_dataset() if isinstance(v, DataTree) else v
+ v.ds if isinstance(v, DataTree) else v
for v in all_node_args[n_args:]
],
)
)
+ func_with_error_context = _handle_errors_with_path_context(
+ node_of_first_tree.path
+ )(func)
# Now we can call func on the data in this particular set of corresponding nodes
results = (
- func(*node_args_as_datasets, **node_kwargs_as_datasets)
- if not node_of_first_tree.is_empty
+ func_with_error_context(
+ *node_args_as_datasetviews, **node_kwargs_as_datasetviews
+ )
+ if node_of_first_tree.has_data
else None
)
@@ -251,6 +256,34 @@ def map_over_subtree(func: Callable) -> Callable:
return _map_over_subtree
+def _handle_errors_with_path_context(path):
+ """Wraps given function so that if it fails it also raises path to node on which it failed."""
+
+ def decorator(func):
+ def wrapper(*args, **kwargs):
+ try:
+ return func(*args, **kwargs)
+ except Exception as e:
+ if sys.version_info >= (3, 11):
+ # Add the context information to the error message
+ e.add_note(
+ f"Raised whilst mapping function over node with path {path}"
+ )
+ raise
+
+ return wrapper
+
+ return decorator
+
+
+def add_note(err: BaseException, msg: str) -> None:
+ # TODO: remove once python 3.10 can be dropped
+ if sys.version_info < (3, 11):
+ err.__notes__ = getattr(err, "__notes__", []) + [msg]
+ else:
+ err.add_note(msg)
+
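The `add_note` helper above is self-contained; as a runnable sketch, it shows the note landing on `__notes__` on every supported interpreter, since Python 3.11's `BaseException.add_note` stores notes on that same attribute.

```python
import sys


def add_note(err: BaseException, msg: str) -> None:
    # Backport of BaseException.add_note (added in Python 3.11): on older
    # interpreters, accumulate notes on the __notes__ attribute directly,
    # which is where 3.11+ stores them anyway.
    if sys.version_info < (3, 11):
        err.__notes__ = getattr(err, "__notes__", []) + [msg]
    else:
        err.add_note(msg)


err = ValueError("could not apply function")
add_note(err, "Raised whilst mapping function over node with path /set1")
```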
+
def _check_single_set_return_values(path_to_node, obj):
"""Check types returned from single evaluation of func, and return number of return values received from func."""
if isinstance(obj, (Dataset, DataArray)):
=====================================
datatree/testing.py
=====================================
@@ -34,7 +34,7 @@ def assert_isomorphic(a: DataTree, b: DataTree, from_root: bool = False):
assert_identical
"""
__tracebackhide__ = True
- assert type(a) == type(b)
+ assert isinstance(a, type(b))
if isinstance(a, DataTree):
if from_root:
@@ -71,7 +71,7 @@ def assert_equal(a: DataTree, b: DataTree, from_root: bool = True):
assert_identical
"""
__tracebackhide__ = True
- assert type(a) == type(b)
+ assert isinstance(a, type(b))
if isinstance(a, DataTree):
if from_root:
@@ -109,7 +109,7 @@ def assert_identical(a: DataTree, b: DataTree, from_root: bool = True):
"""
__tracebackhide__ = True
- assert type(a) == type(b)
+ assert isinstance(a, type(b))
if isinstance(a, DataTree):
if from_root:
a = a.root
=====================================
datatree/tests/__init__.py
=====================================
@@ -1,7 +1,7 @@
import importlib
-from distutils import version
import pytest
+from packaging import version
def _importorskip(modname, minversion=None):
@@ -21,7 +21,7 @@ def LooseVersion(vstring):
# Our development version is something like '0.10.9+aac7bfc'
# This function just ignores the git commit id.
vstring = vstring.split("+")[0]
- return version.LooseVersion(vstring)
+ return version.parse(vstring)
has_zarr, requires_zarr = _importorskip("zarr")
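The switch from the deprecated `distutils` `LooseVersion` to `packaging.version` is mechanical; a quick sketch of the replacement helper (assuming `packaging` is installed, which the `pyproject.toml` change below now requires):

```python
from packaging import version


def loose_version(vstring):
    # Development versions look like '0.10.9+aac7bfc'; drop the local part
    # (the git commit id) before parsing, as the test helper above does.
    return version.parse(vstring.split("+")[0])


dev = loose_version("0.10.9+aac7bfc")
```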
=====================================
datatree/tests/test_datatree.py
=====================================
@@ -136,6 +136,16 @@ class TestStoreDatasets:
john = DataTree(name="john", data=None)
assert not john.has_data
+ def test_is_hollow(self):
+ john = DataTree(data=xr.Dataset({"a": 0}))
+ assert john.is_hollow
+
+ eve = DataTree(children={"john": john})
+ assert eve.is_hollow
+
+ eve.ds = xr.Dataset({"a": 1})
+ assert not eve.is_hollow
+
class TestVariablesChildrenNameCollisions:
def test_parent_already_has_variable_with_childs_name(self):
@@ -603,6 +613,13 @@ class TestAccess:
var_keys = list(dt.variables.keys())
assert all(var_key in key_completions for var_key in var_keys)
+ def test_operation_with_attrs_but_no_data(self):
+ # tests bug from xarray-datatree GH262
+ xs = xr.Dataset({"testvar": xr.DataArray(np.ones((2, 3)))})
+ dt = DataTree.from_dict({"node1": xs, "node2": xs})
+ dt.attrs["test_key"] = 1 # sel works fine without this line
+ dt.sel(dim_0=0)
+
class TestRestructuring:
def test_drop_nodes(self):
@@ -671,6 +688,25 @@ class TestPipe:
class TestSubset:
+ def test_match(self):
+ # TODO is this example going to cause problems with case sensitivity?
+ dt = DataTree.from_dict(
+ {
+ "/a/A": None,
+ "/a/B": None,
+ "/b/A": None,
+ "/b/B": None,
+ }
+ )
+ result = dt.match("*/B")
+ expected = DataTree.from_dict(
+ {
+ "/a/B": None,
+ "/b/B": None,
+ }
+ )
+ dtt.assert_identical(result, expected)
+
def test_filter(self):
simpsons = DataTree.from_dict(
d={
=====================================
datatree/tests/test_formatting.py
=====================================
@@ -90,11 +90,14 @@ class TestDiffFormatting:
assert actual == expected
def test_diff_node_data(self):
- ds1 = Dataset({"u": 0, "v": 1})
- ds3 = Dataset({"w": 5})
+ import numpy as np
+
+ # casting to int64 explicitly ensures that int64s are created on all architectures
+ ds1 = Dataset({"u": np.int64(0), "v": np.int64(1)})
+ ds3 = Dataset({"w": np.int64(5)})
dt_1 = DataTree.from_dict({"a": ds1, "a/b": ds3})
- ds2 = Dataset({"u": 0})
- ds4 = Dataset({"w": 6})
+ ds2 = Dataset({"u": np.int64(0)})
+ ds4 = Dataset({"w": np.int64(6)})
dt_2 = DataTree.from_dict({"a": ds2, "a/b": ds4})
expected = dedent(
=====================================
datatree/tests/test_mapping.py
=====================================
@@ -252,6 +252,35 @@ class TestMapOverSubTree:
result_tree = times_ten(subtree)
assert_equal(result_tree, expected, from_root=False)
+ def test_skip_empty_nodes_with_attrs(self, create_test_datatree):
+ # inspired by xarray-datatree GH262
+ dt = create_test_datatree()
+ dt["set1/set2"].attrs["foo"] = "bar"
+
+ def check_for_data(ds):
+ # fails if run on a node that has no data
+ assert len(ds.variables) != 0
+ return ds
+
+ dt.map_over_subtree(check_for_data)
+
+ @pytest.mark.xfail(
+ reason="probably some bug in pytests handling of exception notes"
+ )
+ def test_error_contains_path_of_offending_node(self, create_test_datatree):
+ dt = create_test_datatree()
+ dt["set1"]["bad_var"] = 0
+ print(dt)
+
+ def fail_on_specific_node(ds):
+ if "bad_var" in ds:
+ raise ValueError("Failed because 'bad_var' present in dataset")
+
+ with pytest.raises(
+ ValueError, match="Raised whilst mapping function over node /set1"
+ ):
+ dt.map_over_subtree(fail_on_specific_node)
+
class TestMutableOperations:
def test_construct_using_type(self):
@@ -275,7 +304,7 @@ class TestMutableOperations:
dt.map_over_subtree(weighted_mean)
- def test_alter_inplace(self):
+ def test_alter_inplace_forbidden(self):
simpsons = DataTree.from_dict(
d={
"/": xr.Dataset({"age": 83}),
@@ -293,7 +322,8 @@ class TestMutableOperations:
ds["age"] = ds["age"] + years
return ds
- simpsons.map_over_subtree(fast_forward, years=10)
+ with pytest.raises(AttributeError):
+ simpsons.map_over_subtree(fast_forward, years=10)
@pytest.mark.xfail
=====================================
datatree/tests/test_treenode.py
=====================================
@@ -1,7 +1,7 @@
import pytest
from datatree.iterators import LevelOrderIter, PreOrderIter
-from datatree.treenode import InvalidTreeError, NamedNode, TreeNode
+from datatree.treenode import InvalidTreeError, NamedNode, NodePath, TreeNode
class TestFamilyTree:
@@ -369,3 +369,9 @@ class TestRenderTree:
]
for expected_node, printed_node in zip(expected_nodes, printout.splitlines()):
assert expected_node in printed_node
+
+
+def test_nodepath():
+ path = NodePath("/Mary")
+ assert path.root == "/"
+ assert path.stem == "Mary"
=====================================
datatree/treenode.py
=====================================
@@ -1,5 +1,6 @@
from __future__ import annotations
+import sys
from collections import OrderedDict
from pathlib import PurePosixPath
from typing import (
@@ -30,21 +31,20 @@ class NotFoundInTreeError(ValueError):
class NodePath(PurePosixPath):
"""Represents a path from one node to another within a tree."""
- def __new__(cls, *args: str | "NodePath") -> "NodePath":
- obj = super().__new__(cls, *args)
-
- if obj.drive:
+ def __init__(self, *pathsegments):
+ if sys.version_info >= (3, 12):
+ super().__init__(*pathsegments)
+ else:
+ super().__new__(PurePosixPath, *pathsegments)
+ if self.drive:
raise ValueError("NodePaths cannot have drives")
- if obj.root not in ["/", ""]:
+ if self.root not in ["/", ""]:
raise ValueError(
'Root of NodePath can only be either "/" or "", with "" meaning the path is relative.'
)
-
# TODO should we also forbid suffixes to avoid node names with dots in them?
- return obj
-
Tree = TypeVar("Tree", bound="TreeNode")
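The `NodePath` change above exists because pathlib moved path parsing from `__new__` into `__init__` in Python 3.12. A simplified, stdlib-only sketch of the version-spanning subclass (the upstream class also keeps a no-op `__new__` call for older interpreters, omitted here):

```python
import sys
from pathlib import PurePosixPath


class NodePath(PurePosixPath):
    """Path subclass that validates in __init__ so it works on 3.9-3.12+."""

    def __init__(self, *pathsegments):
        if sys.version_info >= (3, 12):
            # 3.12+: segments are parsed here, so call the parent initializer
            super().__init__(*pathsegments)
        # before 3.12, PurePosixPath.__new__ has already parsed the segments
        if self.drive:
            raise ValueError("NodePaths cannot have drives")
        if self.root not in ("/", ""):
            raise ValueError('Root of NodePath can only be "/" or ""')


path = NodePath("/Mary")
```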
=====================================
docs/source/api.rst
=====================================
@@ -64,6 +64,7 @@ This interface echoes that of ``xarray.Dataset``.
DataTree.has_data
DataTree.has_attrs
DataTree.is_empty
+ DataTree.is_hollow
..
@@ -102,6 +103,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure.
DataTree.find_common_ancestor
map_over_subtree
DataTree.pipe
+ DataTree.match
DataTree.filter
DataTree Contents
=====================================
docs/source/hierarchical-data.rst
=====================================
@@ -175,7 +175,7 @@ Let's use a different example of a tree to discuss more complex relationships be
]
We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree,
-and :ref:`filesystem-like syntax <filesystem paths>`_ (to be explained shortly) to select two nodes of interest.
+and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest.
.. ipython:: python
@@ -339,3 +339,297 @@ we can construct a complex tree quickly using the alternative constructor :py:me
Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path
(i.e. the node labelled `"c"` in this case.)
This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`.
+
+.. _iterating over trees:
+
+Iterating over trees
+~~~~~~~~~~~~~~~~~~~~
+
+You can iterate over every node in a tree using the :py:class:`~DataTree.subtree` property.
+This returns an iterable of nodes, which yields them in depth-first order.
+
+.. ipython:: python
+
+ for node in vertebrates.subtree:
+ print(node.path)
+
+A very useful pattern is to use :py:class:`~DataTree.subtree` in conjunction with the :py:class:`~DataTree.path` property to manipulate the nodes however you wish,
+then rebuild a new tree using :py:meth:`DataTree.from_dict()`.
+
+For example, we could keep only the nodes containing data by looping over all nodes,
+checking if they contain any data using :py:class:`~DataTree.has_data`,
+then rebuilding a new tree using only the paths of those nodes:
+
+.. ipython:: python
+
+ non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data}
+ DataTree.from_dict(non_empty_nodes)
+
+You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.
+
+(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.)
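The filter-then-rebuild pattern above can be sketched without xarray at all. Here a hypothetical flat mapping of node paths to payloads (`None` marking an empty node) stands in for the `{node.path: node.ds for node in dt.subtree}` comprehension:

```python
# Toy stand-in for a tree: path -> payload, None means an empty node.
tree = {"/": None, "/a": {"x": 1}, "/a/c": None, "/a/c/d": None, "/a/b": {"y": 2}}

# Keep only the data-bearing nodes; from_dict would then rebuild the tree,
# recreating any intermediate empty nodes the remaining paths require.
non_empty_nodes = {path: data for path, data in tree.items() if data is not None}
```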
+
+.. _Tree Contents:
+
+Tree Contents
+-------------
+
+Hollow Trees
+~~~~~~~~~~~~
+
+A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes.
+This is useful because certain tree manipulation operations only make sense for hollow trees.
+
+You can check if a tree is a hollow tree by using the :py:meth:`~DataTree.is_hollow` property.
+We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which
+have children (i.e. Abe and Homer).
+
+.. ipython:: python
+
+ simpsons.is_hollow
+
+.. _manipulating trees:
+
+Manipulating Trees
+------------------
+
+Subsetting Tree Nodes
+~~~~~~~~~~~~~~~~~~~~~
+
+We can subset our tree to select only nodes of interest in various ways.
+
+As on a real filesystem, matching nodes by common patterns in their paths is often useful.
+We can use :py:meth:`DataTree.match` for this:
+
+.. ipython:: python
+
+ dt = DataTree.from_dict(
+ {
+ "/a/A": None,
+ "/a/B": None,
+ "/b/A": None,
+ "/b/B": None,
+ }
+ )
+ result = dt.match("*/B")
+
+We can also subset trees by the contents of the nodes.
+:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition.
+For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults.
+First let's recreate the tree, but with an `age` data variable in every node:
+
+.. ipython:: python
+
+ simpsons = DataTree.from_dict(
+ d={
+ "/": xr.Dataset({"age": 83}),
+ "/Herbert": xr.Dataset({"age": 40}),
+ "/Homer": xr.Dataset({"age": 39}),
+ "/Homer/Bart": xr.Dataset({"age": 10}),
+ "/Homer/Lisa": xr.Dataset({"age": 8}),
+ "/Homer/Maggie": xr.Dataset({"age": 1}),
+ },
+ name="Abe",
+ )
+ simpsons
+
+Now let's filter out the minors:
+
+.. ipython:: python
+
+ simpsons.filter(lambda node: node["age"] > 18)
+
+The result is a new tree, containing only the nodes matching the condition.
+
+(Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !)
+
+.. _tree computation:
+
+Computation
+-----------
+
+`DataTree` objects are also useful for performing computations, not just for organizing data.
+
+Operations and Methods on Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To show how applying operations across a whole tree at once can be useful,
+let's first create an example scientific dataset.
+
+.. ipython:: python
+
+ def time_stamps(n_samples, T):
+ """Create an array of evenly-spaced time stamps"""
+ return xr.DataArray(
+ data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
+ )
+
+
+ def signal_generator(t, f, A, phase):
+ """Generate an example electrical-like waveform"""
+ return A * np.sin(f * t.data + phase)
+
+
+ time_stamps1 = time_stamps(n_samples=15, T=1.5)
+ time_stamps2 = time_stamps(n_samples=10, T=1.0)
+
+ voltages = DataTree.from_dict(
+ {
+ "/oscilloscope1": xr.Dataset(
+ {
+ "potential": (
+ "time",
+ signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
+ ),
+ "current": (
+ "time",
+ signal_generator(time_stamps1, f=2, A=1.2, phase=1),
+ ),
+ },
+ coords={"time": time_stamps1},
+ ),
+ "/oscilloscope2": xr.Dataset(
+ {
+ "potential": (
+ "time",
+ signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
+ ),
+ "current": (
+ "time",
+ signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
+ ),
+ },
+ coords={"time": time_stamps2},
+ ),
+ }
+ )
+ voltages
+
+Most xarray computation methods also exist as methods on datatree objects,
+so you can for example take the mean value of these two timeseries at once:
+
+.. ipython:: python
+
+ voltages.mean(dim="time")
+
+This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the
+tree one-by-one.
+
+The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another:
+
+.. ipython:: python
+ :okexcept:
+
+ voltages.isel(time=12)
+
+Notice that the error raised helpfully indicates which node of the tree the operation failed on.
+
+Arithmetic Methods on Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Arithmetic methods are also implemented, so you can e.g. add a scalar to every dataset in the tree at once.
+For example, we can advance the timeline of the Simpsons by a decade just by
+
+.. ipython:: python
+
+ simpsons + 10
+
+See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node.
+
+Mapping Custom Functions Over Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can map custom computation over each node in a tree using :py:func:`map_over_subtree`.
+You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments,
+and returns one (or more) xarray datasets.
+
+.. note::
+
+ Functions passed to :py:func:`map_over_subtree` cannot alter nodes in-place.
+ Instead they must return new `xarray.Dataset` objects.
+
+For example, we can define a function to calculate the Root Mean Square of a timeseries
+
+.. ipython:: python
+
+ def rms(signal):
+ return np.sqrt(np.mean(signal**2))
+
+Then calculate the RMS value of these signals:
+
+.. ipython:: python
+
+ rms(voltages)
+
+.. _multiple trees:
+
+Operating on Multiple Trees
+---------------------------
+
+The examples so far have involved mapping functions or methods over the nodes of a single tree,
+but we can generalize this to mapping functions over multiple trees at once.
+
+Comparing Trees for Isomorphism
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For it to make sense to map a single non-unary function over the nodes of multiple trees at once,
+each tree needs to have the same structure. Specifically two trees can only be considered similar, or "isomorphic",
+if they have the same number of nodes, and each corresponding node has the same number of children.
+We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomorphic` method.
+
+.. ipython:: python
+ :okexcept:
+
+ dt1 = DataTree.from_dict({"a": None, "a/b": None})
+ dt2 = DataTree.from_dict({"a": None})
+ dt1.isomorphic(dt2)
+
+ dt3 = DataTree.from_dict({"a": None, "b": None})
+ dt1.isomorphic(dt3)
+
+ dt4 = DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})})
+ dt1.isomorphic(dt4)
+
+If the trees are not isomorphic a :py:class:`~TreeIsomorphismError` will be raised.
+Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic.
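The structural comparison described above can be sketched with plain nested dicts (child name mapped to subtree), a hypothetical stand-in for real `DataTree` objects. Only the shape is compared; names and data are ignored, and corresponding children are paired in insertion order:

```python
def isomorphic(a, b):
    # Two trees are isomorphic when each pair of corresponding nodes
    # has the same number of children, recursively.
    if len(a) != len(b):
        return False
    return all(isomorphic(ca, cb) for ca, cb in zip(a.values(), b.values()))


dt1 = {"a": {"b": {}}}
dt2 = {"a": {}}            # fewer nodes: not isomorphic to dt1
dt3 = {"a": {}, "b": {}}   # different shape: not isomorphic to dt1
dt4 = {"A": {"B": {}}}     # same shape, different names: isomorphic to dt1
```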
+
+Arithmetic Between Multiple Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees,
+we can do arithmetic between them.
+
+.. ipython:: python
+
+ currents = DataTree.from_dict(
+ {
+ "/oscilloscope1": xr.Dataset(
+ {
+ "current": (
+ "time",
+ signal_generator(time_stamps1, f=2, A=1.2, phase=1),
+ ),
+ },
+ coords={"time": time_stamps1},
+ ),
+ "/oscilloscope2": xr.Dataset(
+ {
+ "current": (
+ "time",
+ signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
+ ),
+ },
+ coords={"time": time_stamps2},
+ ),
+ }
+ )
+ currents
+
+ currents.isomorphic(voltages)
+
+We could use this feature to quickly calculate the electrical power in our signal, P=IV.
+
+.. ipython:: python
+
+ power = currents * voltages
+ power
=====================================
docs/source/whats-new.rst
=====================================
@@ -15,9 +15,55 @@ What's New
np.random.seed(123456)
+.. _whats-new.v0.0.13:
+
+v0.0.13 (unreleased)
+--------------------
+
+New Features
+~~~~~~~~~~~~
+
+- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`)
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- New :py:meth:`DataTree.is_hollow` property for checking if data is only contained at the leaf nodes. (:pull:`272`)
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree`
+ (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later.
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- Nodes containing only attributes but no data are now ignored by :py:func:`map_over_subtree` (:issue:`262`, :pull:`263`)
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Disallow altering of given dataset inside function called by :py:func:`map_over_subtree` (:pull:`269`, reverts part of :pull:`194`).
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Deprecations
+~~~~~~~~~~~~
+
+Bug fixes
+~~~~~~~~~
+
+- Fix unittests on i386. (:pull:`249`)
+ By `Antonio Valentino <https://github.com/avalentino>`_.
+- Ensure nodepath class is compatible with python 3.12 (:pull:`260`)
+ By `Max Grover <https://github.com/mgrover1>`_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added new sections to page on ``Working with Hierarchical Data`` (:pull:`180`)
+ By `Tom Nicholas <https://github.com/TomNicholas>`_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- No longer use the deprecated `distutils` package.
+
.. _whats-new.v0.0.12:
-v0.0.12 (unreleased)
+v0.0.12 (03/07/2023)
--------------------
New Features
=====================================
pyproject.toml
=====================================
@@ -20,6 +20,7 @@ classifiers = [
requires-python = ">=3.9"
dependencies = [
"xarray >=2022.6.0",
+ "packaging",
]
dynamic = ["version"]
View it on GitLab: https://salsa.debian.org/debian-gis-team/xarray-datatree/-/commit/bed37d4b47efda0bd57cbe9e635efb14b28c3fc3