From 136e2ac0b26753196b7a1cfd5ba2efbdf7cad81f Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 5 Aug 2021 14:50:10 -0400 Subject: [PATCH 001/260] Initial commit --- xarray/datatree_/.gitignore | 129 +++++++++++++++++++++++ xarray/datatree_/LICENSE | 201 ++++++++++++++++++++++++++++++++++++ xarray/datatree_/README.md | 2 + 3 files changed, 332 insertions(+) create mode 100644 xarray/datatree_/.gitignore create mode 100644 xarray/datatree_/LICENSE create mode 100644 xarray/datatree_/README.md diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore new file mode 100644 index 00000000000..b6e47617de1 --- /dev/null +++ b/xarray/datatree_/.gitignore @@ -0,0 +1,129 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +pip-wheel-metadata/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +.python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ diff --git a/xarray/datatree_/LICENSE b/xarray/datatree_/LICENSE new file mode 100644 index 00000000000..261eeb9e9f8 --- /dev/null +++ b/xarray/datatree_/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md new file mode 100644 index 00000000000..6806597a656 --- /dev/null +++ b/xarray/datatree_/README.md @@ -0,0 +1,2 @@ +# xtree +WIP implementation of a tree-like hierarchical data structure for xarray. From bb676aaa6aaad6d15b8e0a006a55895f1846e217 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Aug 2021 18:08:47 -0400 Subject: [PATCH 002/260] pseudocode skeleton --- xarray/datatree_/xtree/__init__.py | 2 + xarray/datatree_/xtree/datatree.py | 419 +++++++++++++++++++++++++++++ xarray/datatree_/xtree/io.py | 65 +++++ 3 files changed, 486 insertions(+) create mode 100644 xarray/datatree_/xtree/__init__.py create mode 100644 xarray/datatree_/xtree/datatree.py create mode 100644 xarray/datatree_/xtree/io.py diff --git a/xarray/datatree_/xtree/__init__.py b/xarray/datatree_/xtree/__init__.py new file mode 100644 index 00000000000..5b61ab46634 --- /dev/null +++ b/xarray/datatree_/xtree/__init__.py @@ -0,0 +1,2 @@ +from .datatree import DataTree +from .io import open_datatree, open_mfdatatree diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/xtree/datatree.py new file mode 100644 index 00000000000..724bf85ee14 --- /dev/null +++ b/xarray/datatree_/xtree/datatree.py @@ -0,0 +1,419 @@ +from __future__ import annotations + +from collections import MutableMapping +from pathlib import Path + +from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable + +from xarray.core.dataset import Dataset +from xarray.core.dataarray import DataArray +from xarray.core.combine import merge +from xarray.core import dtypes + + +PathType = Union[Hashable, Sequence[Hashable]] + + +def _path_to_tuple(path: PathType) -> Tuple[Hashable]: + if isinstance(path, str): + return path + else: + return tuple(Path(path).parts) + + +class DataTree(MutableMapping): + """ + A tree-like hierarchical collection of xarray objects. + + Parameters + ---------- + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. + + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree use "/" or "" as the path. 
+ """ + + # TODO Add attrs dict by inheriting from xarray.core.common.AttrsAccessMixin + + # TODO Some way of sorting children by depth + + # TODO Consistency in copying vs updating objects + + # TODO ipython autocomplete for child nodes + + def __init__( + self, + data_objects: Mapping[PathType, Union[Dataset, DataArray]] = None, + ): + self._name = None + self._parent = None + self._dataset = None + self._children = [] + + # Populate tree with children determined from data_objects mapping + for path, obj in data_objects.items(): + self._set_item(path, obj, allow_overwrites=False, new_nodes_along_path=True) + + @classmethod + def _construct( + cls, + name: Hashable = None, + parent: DataTree = None, + children: List[DataTree] = None, + data: Union[Dataset, DataArray] = None, + ) -> DataTree: + """Alternative to __init__ allowing direct creation of a non-root node.""" + + if children is None: + children = [] + + node = cls.__new__(cls) + + node._name = name + node._children = children + node.parent = parent + node.dataset = data + + return node + + @property + def name(self) -> Hashable: + """Name tag for this node.""" + return self._name + + @property + def dataset(self) -> Dataset: + return self._dataset + + @dataset.setter + def dataset(self, data: Union[Dataset, DataArray] = None): + if not isinstance(data, (Dataset, DataArray)) or data is not None: + raise TypeError(f"{type(data)} object is not an xarray Dataset or DataArray") + if isinstance(data, DataArray): + data = data.to_dataset() + self._dataset = data + + @property + def parent(self) -> Union[DataTree, None]: + return self._parent + + @parent.setter + def parent(self, parent: DataTree): + if parent is not None: + if not isinstance(parent, DataTree): + raise TypeError(f"{type(parent.__name__)} object is not a node of a DataTree") + + if self._name in [c.name for c in parent._children]: + raise KeyError(f"Cannot set parent: parent node {parent._name} " + f"already has a child node named {self._name}") + else: + # Parent needs to know it now has a child + parent.children = parent.children + [self] + self._parent = parent + + @property + def children(self) -> List[DataTree]: + return self._children + + @children.setter + def children(self, children: List[DataTree]): + if not all(isinstance(c, DataTree) for c in children): + raise TypeError(f"children must all be of type DataTree") + self._children = children + + def _walk_parents(self) -> DataTree: + """Walk through this node and its parents.""" + yield self + node = self._parent + while node is not None: + yield node + node = node._parent + + def root_node(self) -> DataTree: + """Return the root node in the tree.""" + for node in self._walk_parents(): + pass + return node + + def _walk_children(self) -> DataTree: + """Recursively walk through this node and all its child nodes.""" + yield self + for child in self._children: + for node in child._walk_children(): + yield node + + def add_node(self, name: Hashable, data: Union[DataTree, Dataset, DataArray] = None) -> DataTree: + """Add a child node immediately below this node, and return the new child node.""" + if isinstance(data, DataTree): + data.parent = self + self._children.append(data) + return data + else: + return self._construct(name=name, parent=self, data=data) + + @staticmethod + def _get_node_depth1(node: DataTree, key: Hashable) -> DataTree: + if node is None: + return None + if key == '..': + return node._parent + if key == '.': + return node + for child in node._children: + if key == child._name: + return child + 
return None + + def get(self, path: str, default: DataTree = None) -> DataTree: + """Return a node given any relative or absolute UNIX-like path.""" + # TODO rewrite using pathlib? + if path == '/': + return self.root_node() + elif path.startswith('/'): + node = self.root_node() + slash, path = path + else: + node = self + + for key in path.split('/'): + node = self._get_node_depth1(node, key) + if node is None: + node = default + + return node + + def __getitem__(self, path: PathType) -> DataTree: + """ + Access node of the tree lying at the given path. + + Raises a KeyError if not found. + + Parameters + ---------- + path : + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). + + Returns + ------- + node : DataTree + """ + node = self.get(path) + if node is None: + raise KeyError(f"Node {path} not found") + return node + + def _set_item(self, path: PathType, value: Union[DataTree, Dataset, DataArray], + new_nodes_along_path: bool, + allow_overwrites: bool) -> None: + # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? + + # This check is redundant with checks called in `add_node`, but if we don't do it here + # then a failed __setitem__ might create a trail of new nodes all the way down + if not isinstance(value, (DataTree, Dataset)): + raise TypeError("Can only set new nodes to DataTree or Dataset instances, not " + f"{type(value.__name__)}") + + # Walk to location of new node, creating DataTree objects as we go if necessary + *tags, last_tag = _path_to_tuple(path) + parent = self + for tag in tags: + if tag not in parent.children: + if new_nodes_along_path: + parent = self.add_node(tag) + else: + # TODO Should this also be before we walk? + raise KeyError(f"Cannot reach new node at path {path}: " + f"parent {parent} has no child {tag}") + parent = self._get_node_depth1(parent, tag) + + if last_tag in parent.children: + if not allow_overwrites: + # TODO should this be before we walk to the new node? + raise KeyError(f"Cannot set item at {path} whilst that path already points to a " + f"{type(parent.get(last_tag))} object") + else: + # TODO Delete any newly-orphaned children + ... + + parent.add_node(last_tag, data=value) + + def __setitem__(self, path: PathType, value: Union[DataTree, Dataset, DataArray]) -> None: + """ + Add a leaf to the DataTree, overwriting anything already present at that path. + + The new value can be an array or a DataTree, in which case it forms a new node of the tree. + + Parameters + ---------- + path : Union[Hashable, Sequence[Hashable]] + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). 
+ value : Union[DataTree, Dataset, DataArray] + """ + self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrites=True) + + def update_node(self, path: PathType, value: Union[DataTree, Dataset, DataArray]) -> None: + """Overwrite the data at a specific node.""" + self._set_item(path=path, value=value, new_nodes_along_path=False, allow_overwrites=True) + + def __delitem__(self, path: PathType): + for child in self._walk_children(): + del child + + def __iter__(self): + return iter(c.name for c in self._children) + + def __len__(self): + return len(self._children) + + @property + def tags(self) -> Tuple[Hashable]: + """All tags, returned in order starting from the root node""" + return tuple(reversed([node.name for node in self._walk_parents()])) + + @property + def path(self) -> str: + """Full path to this node, given as a UNIX-like path.""" + if self._parent is None: + return '/' + else: + return '/'.join(self.tags[-1::-1]) + + def __repr__(self) -> str: + type_str = "" + tree_str = self._node_repr(indent_depth=0) + # TODO add attrs dict to the repr + return type_str + tree_str + + def _node_repr(self, indent_depth: int) -> str: + indent_str = "|" + indent_depth * " |" + "-- " + node_repr = "\n" + indent_str + str(self.name) + + if self.dataset is not None: + # TODO indent every line properly? + node_repr += "\n" + indent_str + f"{repr(self.dataset)[17:]}" + + for child in self.children: + node_repr += child._node_repr(indent_depth+1) + + return node_repr + + def get_all(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains all of the given tags, + where the tags can be present in any order. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if all(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + def get_any(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains any of the given tags. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if any(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + def map( + self, + func: Callable, + *args: Iterable[Any], + **kwargs: Any, + ) -> Iterable[Any]: + """ + Apply a function to the dataset at each node in the tree, returning a generator + of all the results. + + Parameters + ---------- + func : callable + Function to apply to datasets with signature: + `func(node.name, node.dataset, *args, **kwargs) -> None or return value`. + + Function will still be applied to any nodes without datasets, + in which cases the `dataset` argument to `func` will be `None`. + *args : tuple, optional + Positional arguments passed on to `func`. + **kwargs : Any + Keyword arguments passed on to `func`. + + Returns + ------- + applied : Iterable[Any] + Generator of results from applying ``func`` to the dataset at each node. + """ + for node in self._walk_children(): + yield func(node.name, node.dataset, *args, **kwargs) + + def map_inplace( + self, + func: Callable, + *args: Iterable[Any], + **kwargs: Any, + ) -> None: + """ + Apply a function to the dataset at each node in the tree, updating each node in place. + + Parameters + ---------- + func : callable + Function to apply to datasets with signature: + `func(node.name, node.dataset, *args, **kwargs) -> Dataset`. + + Function will still be applied to any nodes without datasets, + in which cases the `dataset` argument to `func` will be `None`. 
+ *args : tuple, optional + Positional arguments passed on to `func`. + **kwargs : Any + Keyword arguments passed on to `func`. + """ + for node in self._walk_children(): + new_ds = func(node.name, node.dataset, *args, **kwargs) + node.update_node(node.path, value=new_ds) + + # TODO map applied ufuncs over all leaves + # TODO map applied dataset/dataarray methods over all leaves + + @property + def chunks(self): + raise NotImplementedError + + def chunk(self): + raise NotImplementedError + + def merge(self, datatree: DataTree) -> DataTree: + """Merge all the leaves of a second DataTree into this one.""" + raise NotImplementedError + + def merge_child_nodes(self, *paths, new_path: PathType) -> DataTree: + """Merge a set of child nodes into a single new node.""" + raise NotImplementedError + + def merge_child_datasets( + self, + *paths: PathType, + compat: str = "no_conflicts", + join: str = "outer", + fill_value: Any = dtypes.NA, + combine_attrs: str = "override", + ) -> Dataset: + """Merge the datasets at a set of child nodes and return as a single Dataset.""" + datasets = [self.get(path).dataset for path in paths] + return merge(datasets, compat=compat, join=join, fill_value=fill_value, combine_attrs=combine_attrs) + + def as_dataarray(self) -> DataArray: + return self.dataset.as_dataarray() + + def to_netcdf(self, filename: str): + from .io import _datatree_to_netcdf + + _datatree_to_netcdf(self, filename) + + def plot(self): + raise NotImplementedError diff --git a/xarray/datatree_/xtree/io.py b/xarray/datatree_/xtree/io.py new file mode 100644 index 00000000000..9bd0e3b02fc --- /dev/null +++ b/xarray/datatree_/xtree/io.py @@ -0,0 +1,65 @@ +from typing import Sequence + +from netCDF4 import Dataset as nc_dataset + +from xarray import open_dataset + +from .datatree import DataTree, PathType + + +def _get_group_names(file): + rootgrp = nc_dataset("test.nc", "r", format="NETCDF4") + + def walktree(top): + yield top.groups.values() + for value in top.groups.values(): + yield from walktree(value) + + groups = [] + for children in walktree(rootgrp): + for child in children: + # TODO include parents in saved path + groups.append(child.name) + + rootgrp.close() + return groups + + +def open_datatree(filename_or_obj, engine=None, chunks=None, **kwargs) -> DataTree: + """ + Open and decode a dataset from a file or file-like object, creating one DataTree node + for each group in the file. + """ + + # TODO find all the netCDF groups in the file + file_groups = _get_group_names(filename_or_obj) + + # Populate the DataTree with the groups + groups_and_datasets = {group_path: open_dataset(engine=engine, chunks=chunks, **kwargs) + for group_path in file_groups} + return DataTree(data_objects=groups_and_datasets) + + +def open_mfdatatree(filepaths, rootnames: Sequence[PathType] = None, engine=None, chunks=None, **kwargs) -> DataTree: + """ + Open multiple files as a single DataTree. + + Groups found in each file will be merged at the root level, unless rootnames are specified, + which will then be used to organise the Tree instead. 
+ """ + if rootnames is None: + rootnames = ["/" for _ in filepaths] + elif len(rootnames) != len(filepaths): + raise ValueError + + full_tree = DataTree() + + for file, root in zip(filepaths, rootnames): + dt = open_datatree(file, engine=engine, chunks=chunks, **kwargs) + full_tree._set_item(path=root, value=dt, new_nodes_along_path=True, allow_overwrites=False) + + return full_tree + + +def _datatree_to_netcdf(dt: DataTree, path_or_file: str): + raise NotImplementedError From c7176383a15d05cc3ef25d51e7cdc3d82097e3e7 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 6 Aug 2021 11:55:14 -0400 Subject: [PATCH 003/260] refactored to use TreeNode class --- xarray/datatree_/xtree/datatree.py | 345 ++++++++++++++++------------- 1 file changed, 195 insertions(+), 150 deletions(-) diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/xtree/datatree.py index 724bf85ee14..a7d8db0a07d 100644 --- a/xarray/datatree_/xtree/datatree.py +++ b/xarray/datatree_/xtree/datatree.py @@ -1,7 +1,8 @@ from __future__ import annotations -from collections import MutableMapping +from collections.abc import MutableMapping from pathlib import Path +import functools from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable @@ -21,64 +22,23 @@ def _path_to_tuple(path: PathType) -> Tuple[Hashable]: return tuple(Path(path).parts) -class DataTree(MutableMapping): - """ - A tree-like hierarchical collection of xarray objects. - - Parameters - ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree use "/" or "" as the path. 
- """ - - # TODO Add attrs dict by inheriting from xarray.core.common.AttrsAccessMixin - - # TODO Some way of sorting children by depth - - # TODO Consistency in copying vs updating objects - - # TODO ipython autocomplete for child nodes +class TreeNode(MutableMapping): + """Base class representing a node of a tree, with methods for traversing the tree.""" def __init__( self, - data_objects: Mapping[PathType, Union[Dataset, DataArray]] = None, + name: Hashable, + parent: TreeNode = None, + children: List[TreeNode] = None, ): - self._name = None - self._parent = None - self._dataset = None - self._children = [] - - # Populate tree with children determined from data_objects mapping - for path, obj in data_objects.items(): - self._set_item(path, obj, allow_overwrites=False, new_nodes_along_path=True) - - @classmethod - def _construct( - cls, - name: Hashable = None, - parent: DataTree = None, - children: List[DataTree] = None, - data: Union[Dataset, DataArray] = None, - ) -> DataTree: - """Alternative to __init__ allowing direct creation of a non-root node.""" if children is None: children = [] - node = cls.__new__(cls) - - node._name = name - node._children = children - node.parent = parent - node.dataset = data - - return node + self._name = name + self.children = children + self._parent = None + self.parent = parent @property def name(self) -> Hashable: @@ -86,43 +46,36 @@ def name(self) -> Hashable: return self._name @property - def dataset(self) -> Dataset: - return self._dataset - - @dataset.setter - def dataset(self, data: Union[Dataset, DataArray] = None): - if not isinstance(data, (Dataset, DataArray)) or data is not None: - raise TypeError(f"{type(data)} object is not an xarray Dataset or DataArray") - if isinstance(data, DataArray): - data = data.to_dataset() - self._dataset = data - - @property - def parent(self) -> Union[DataTree, None]: + def parent(self) -> Union[TreeNode, None]: return self._parent @parent.setter - def parent(self, parent: DataTree): + def parent(self, parent: TreeNode): if parent is not None: - if not isinstance(parent, DataTree): - raise TypeError(f"{type(parent.__name__)} object is not a node of a DataTree") + if not isinstance(parent, TreeNode): + raise TypeError(f"{type(parent)} object is not a valid parent") if self._name in [c.name for c in parent._children]: raise KeyError(f"Cannot set parent: parent node {parent._name} " - f"already has a child node named {self._name}") + f"already has a child node named {self._name}") else: - # Parent needs to know it now has a child + # If there was an original parent they can no longer have custody + if self.parent is not None: + self.parent.children.remove(self) + + # New parent needs to know it now has a child parent.children = parent.children + [self] + self._parent = parent @property - def children(self) -> List[DataTree]: + def children(self) -> List[TreeNode]: return self._children @children.setter - def children(self, children: List[DataTree]): - if not all(isinstance(c, DataTree) for c in children): - raise TypeError(f"children must all be of type DataTree") + def children(self, children: List[TreeNode]): + if not all(isinstance(c, TreeNode) for c in children): + raise TypeError(f"children must all be valid tree nodes") self._children = children def _walk_parents(self) -> DataTree: @@ -146,6 +99,9 @@ def _walk_children(self) -> DataTree: for node in child._walk_children(): yield node + def __repr__(self): + return f"TreeNode(name={self._name}, parent={self._parent}, children={self.children})" + def 
add_node(self, name: Hashable, data: Union[DataTree, Dataset, DataArray] = None) -> DataTree: """Add a child node immediately below this node, and return the new child node.""" if isinstance(data, DataTree): @@ -168,7 +124,33 @@ def _get_node_depth1(node: DataTree, key: Hashable) -> DataTree: return child return None - def get(self, path: str, default: DataTree = None) -> DataTree: + def __delitem__(self, path: PathType): + for child in self._walk_children(): + del child + + def __iter__(self): + return iter(c.name for c in self._children) + + def __len__(self): + return len(self._children) + + def get(self, path: str, default: DataTree = None) -> TreeNode: + """ + Access node of the tree lying at the given path. + + Raises a KeyError if not found. + + Parameters + ---------- + path : + Path names can be given as unix-like paths, or as tuples of strings + (where each string is known as a single "tag"). + + Returns + ------- + node : DataTree + """ + """Return a node given any relative or absolute UNIX-like path.""" # TODO rewrite using pathlib? if path == '/': @@ -187,29 +169,28 @@ def get(self, path: str, default: DataTree = None) -> DataTree: return node def __getitem__(self, path: PathType) -> DataTree: + node = self.get(path) + if node is None: + raise KeyError(f"Node {path} not found") + return node + + def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray]) -> None: """ - Access node of the tree lying at the given path. + Add a leaf to the tree, overwriting anything already present at that path. - Raises a KeyError if not found. + The new value can be an array or a DataTree, in which case it forms a new node of the tree. Parameters ---------- - path : + path : Union[Hashable, Sequence[Hashable]] Path names can be given as unix-like paths, or as tuples of strings (where each string is known as a single "tag"). - - Returns - ------- - node : DataTree + value : Union[DataTree, Dataset, DataArray] """ - node = self.get(path) - if node is None: - raise KeyError(f"Node {path} not found") - return node + self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrites=True) def _set_item(self, path: PathType, value: Union[DataTree, Dataset, DataArray], - new_nodes_along_path: bool, - allow_overwrites: bool) -> None: + new_nodes_along_path: bool, allow_overwrites: bool) -> None: # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? 
# This check is redundant with checks called in `add_node`, but if we don't do it here @@ -257,20 +238,6 @@ def __setitem__(self, path: PathType, value: Union[DataTree, Dataset, DataArray] """ self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrites=True) - def update_node(self, path: PathType, value: Union[DataTree, Dataset, DataArray]) -> None: - """Overwrite the data at a specific node.""" - self._set_item(path=path, value=value, new_nodes_along_path=False, allow_overwrites=True) - - def __delitem__(self, path: PathType): - for child in self._walk_children(): - del child - - def __iter__(self): - return iter(c.name for c in self._children) - - def __len__(self): - return len(self._children) - @property def tags(self) -> Tuple[Hashable]: """All tags, returned in order starting from the root node""" @@ -284,57 +251,58 @@ def path(self) -> str: else: return '/'.join(self.tags[-1::-1]) - def __repr__(self) -> str: - type_str = "" - tree_str = self._node_repr(indent_depth=0) - # TODO add attrs dict to the repr - return type_str + tree_str - def _node_repr(self, indent_depth: int) -> str: - indent_str = "|" + indent_depth * " |" + "-- " - node_repr = "\n" + indent_str + str(self.name) +class DatasetNode(TreeNode): + """ + A tree node, but optionally containing data in the form of an xarray.Dataset. - if self.dataset is not None: - # TODO indent every line properly? - node_repr += "\n" + indent_str + f"{repr(self.dataset)[17:]}" + Also implements xarray.Dataset methods, but wrapped to update all child nodes too. + """ - for child in self.children: - node_repr += child._node_repr(indent_depth+1) + # TODO add all the other methods to dispatch + _DS_METHODS_TO_DISPATCH = ['isel', 'sel', 'min', 'max', '__array_ufunc__'] - return node_repr + def __init__( + self, + data: Dataset = None, + name: Hashable = None, + parent: TreeNode = None, + children: List[TreeNode] = None, + ): + super().__init__(name=name, parent=parent, children=children) + self.ds = data - def get_all(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains all of the given tags, - where the tags can be present in any order. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if all(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) + # Enable dataset API methods + for method_name in self._DS_METHODS_TO_DISPATCH: + ds_method = getattr(Dataset, method_name) + self._dispatch_to_children(ds_method) - def get_any(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains any of the given tags. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if any(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) + @property + def ds(self) -> Dataset: + return self._ds - def map( + @ds.setter + def ds(self, data: Union[Dataset, DataArray] = None): + if not isinstance(data, (Dataset, DataArray)) or data is not None: + raise TypeError(f"{type(data)} object is not an xarray Dataset or DataArray") + if isinstance(data, DataArray): + data = data.to_dataset() + self._ds = data + + def map_inplace( self, func: Callable, *args: Iterable[Any], **kwargs: Any, - ) -> Iterable[Any]: + ) -> None: """ - Apply a function to the dataset at each node in the tree, returning a generator - of all the results. + Apply a function to the dataset at each child node in the tree, updating data in place. 
Parameters ---------- func : callable Function to apply to datasets with signature: - `func(node.name, node.dataset, *args, **kwargs) -> None or return value`. + `func(node.name, node.dataset, *args, **kwargs) -> Dataset`. Function will still be applied to any nodes without datasets, in which cases the `dataset` argument to `func` will be `None`. @@ -342,29 +310,26 @@ def map( Positional arguments passed on to `func`. **kwargs : Any Keyword arguments passed on to `func`. - - Returns - ------- - applied : Iterable[Any] - Generator of results from applying ``func`` to the dataset at each node. """ for node in self._walk_children(): - yield func(node.name, node.dataset, *args, **kwargs) + new_ds = func(node.name, node.ds, *args, **kwargs) + node.dataset = new_ds - def map_inplace( + def map( self, func: Callable, *args: Iterable[Any], **kwargs: Any, - ) -> None: + ) -> Iterable[Any]: """ - Apply a function to the dataset at each node in the tree, updating each node in place. + Apply a function to the dataset at each node in the tree, returning a generator + of all the results. Parameters ---------- func : callable Function to apply to datasets with signature: - `func(node.name, node.dataset, *args, **kwargs) -> Dataset`. + `func(node.name, node.dataset, *args, **kwargs) -> None or return value`. Function will still be applied to any nodes without datasets, in which cases the `dataset` argument to `func` will be `None`. @@ -372,13 +337,93 @@ def map_inplace( Positional arguments passed on to `func`. **kwargs : Any Keyword arguments passed on to `func`. + + Returns + ------- + applied : Iterable[Any] + Generator of results from applying ``func`` to the dataset at each node. """ for node in self._walk_children(): - new_ds = func(node.name, node.dataset, *args, **kwargs) - node.update_node(node.path, value=new_ds) + yield func(node.name, node.ds, *args, **kwargs) # TODO map applied ufuncs over all leaves - # TODO map applied dataset/dataarray methods over all leaves + + def _dispatch_to_children(self, method: Callable) -> None: + """Wrap such that when method is called on this instance it is also called on children.""" + _dispatching_method = functools.partial(self.map_inplace, func=method) + # TODO update method docstrings accordingly + setattr(self, method.__name__, _dispatching_method) + + def _node_repr(self, indent_depth: int) -> str: + indent_str = "|" + indent_depth * " |" + "-- " + node_repr = "\n" + indent_str + str(self.name) + + if self.ds is not None: + # TODO indent every line properly? + node_repr += "\n" + indent_str + f"{repr(self.ds)[17:]}" + + for child in self.children: + node_repr += child._node_repr(indent_depth+1) + + return node_repr + + +class DataTree(DatasetNode): + """ + A tree-like hierarchical collection of xarray objects. + + Parameters + ---------- + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. + + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree use "/" or "" as the path. 
+ """ + + # TODO Add attrs dict by inheriting from xarray.core.common.AttrsAccessMixin + + # TODO Some way of sorting children by depth + + # TODO Consistency in copying vs updating objects + + # TODO ipython autocomplete for child nodes + + def __init__( + self, + data_objects: Mapping[PathType, Union[Dataset, DataArray, DatasetNode]] = None, + ): + super().__init__(ds=None, name=None, parent=None, children=[]) + + # Populate tree with children determined from data_objects mapping + for path, obj in data_objects.items(): + self._set_item(path, obj, allow_overwrites=False, new_nodes_along_path=True) + + def __repr__(self) -> str: + type_str = "" + tree_str = self._node_repr(indent_depth=0) + # TODO add attrs dict to the repr + return type_str + tree_str + + def get_all(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains all of the given tags, + where the tags can be present in any order. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if all(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + def get_any(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains any of the given tags. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if any(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) @property def chunks(self): @@ -404,11 +449,11 @@ def merge_child_datasets( combine_attrs: str = "override", ) -> Dataset: """Merge the datasets at a set of child nodes and return as a single Dataset.""" - datasets = [self.get(path).dataset for path in paths] + datasets = [self.get(path).ds for path in paths] return merge(datasets, compat=compat, join=join, fill_value=fill_value, combine_attrs=combine_attrs) def as_dataarray(self) -> DataArray: - return self.dataset.as_dataarray() + return self.ds.as_dataarray() def to_netcdf(self, filename: str): from .io import _datatree_to_netcdf From 6f70b6221dbf3269a46548b829e8226c0c026672 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 6 Aug 2021 11:55:37 -0400 Subject: [PATCH 004/260] Initial TreeNode family tests --- xarray/datatree_/xtree/tests/test_datatree.py | 140 ++++++++++++++++++ 1 file changed, 140 insertions(+) create mode 100644 xarray/datatree_/xtree/tests/test_datatree.py diff --git a/xarray/datatree_/xtree/tests/test_datatree.py b/xarray/datatree_/xtree/tests/test_datatree.py new file mode 100644 index 00000000000..78e4ba1493e --- /dev/null +++ b/xarray/datatree_/xtree/tests/test_datatree.py @@ -0,0 +1,140 @@ +import pytest + +import xarray as xr + +from xtree.datatree import TreeNode, DatasetNode, DataTree + + +def create_test_datatree(): + """ + Create a test datatree with this structure: + + + |-- set1 + | |-- + | | Dimensions: () + | | Data variables: + | | a int64 0 + | | b int64 1 + | |-- set1 + | |-- set2 + |-- set2 + | |-- + | | Dimensions: (x: 2) + | | Data variables: + | | a (x) int64 2, 3 + | | b (x) int64 'foo', 'bar' + | |-- set1 + |-- set3 + |-- + | Dimensions: (x: 2, y: 3) + | Data variables: + | a (y) int64 6, 7, 8 + | set1 (x) int64 9, 10 + + The structure has deliberately repeated names of tags, variables, and + dimensions in order to better check for bugs caused by name conflicts. 
+ """ + set1_data = xr.Dataset({'a': 0, 'b': 1}) + set2_data = xr.Dataset({'a': ('x', [2, 3]), 'b': ('x', ['foo', 'bar'])}) + root_data = xr.Dataset({'a': ('y', [6, 7, 8]), 'set1': ('x', [9, 10])}) + + # Avoid using __init__ so we can independently test it + root = DataTree(data_objects={'/': root_data}) + set1 = DatasetNode(name="set1", parent=root, data=set1_data) + set1_set1 = DatasetNode(name="set1", parent=set1) + set1_set2 = DatasetNode(name="set1", parent=set1) + set2 = DatasetNode(name="set1", parent=root, data=set2_data) + set2_set1 = DatasetNode(name="set1", parent=set2) + set3 = DatasetNode(name="set3", parent=root) + + return root + + +class TestTreeNodes: + def test_lonely(self): + root = TreeNode("/") + assert root.name == "/" + assert root.parent is None + assert root.children == [] + + def test_parenting(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + + assert mary.parent == john + assert mary in john.children + + with pytest.raises(KeyError, match="already has a child node named"): + TreeNode("mary", parent=john) + + with pytest.raises(TypeError, match="object is not a valid parent"): + mary.parent = "apple" + + def test_parent_swap(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + + steve = TreeNode("steve") + mary.parent = steve + assert mary in steve.children + assert mary not in john.children + + def test_multi_child_family(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + john = TreeNode("john", children=[mary, kate]) + + + def test_walking_parents(self): + ... + + def test_walking_children(self): + ... + + def test_adoption(self): + ... + + +class TestTreePlanting: + def test_empty(self): + dt = DataTree() + root = DataTree() + + def test_one_layer(self): + dt = DataTree({"run1": xr.Dataset(), "run2": xr.DataArray()}) + + def test_two_layers(self): + dt = DataTree({"highres/run1": xr.Dataset(), "highres/run2": xr.Dataset()}) + + dt = DataTree({"highres/run1": xr.Dataset(), "lowres/run1": xr.Dataset()}) + assert dt.children == ... + + def test_full(self): + dt = create_test_datatree() + print(dt) + assert False + + +class TestBrowsing: + ... + + +class TestRestructuring: + ... + + +class TestRepr: + ... + + +class TestIO: + ... + + +class TestMethodInheritance: + ... + + +class TestUFuncs: + ... 
From 4533a3a3b138502a5ed14433fa356fba7508d415 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 11 Aug 2021 20:44:10 -0400 Subject: [PATCH 005/260] more family tree tests --- xarray/datatree_/xtree/datatree.py | 23 +++++++++++++++- xarray/datatree_/xtree/tests/test_datatree.py | 27 ++++++++++++++++--- 2 files changed, 46 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/xtree/datatree.py index a7d8db0a07d..6399166c208 100644 --- a/xarray/datatree_/xtree/datatree.py +++ b/xarray/datatree_/xtree/datatree.py @@ -76,6 +76,16 @@ def children(self) -> List[TreeNode]: def children(self, children: List[TreeNode]): if not all(isinstance(c, TreeNode) for c in children): raise TypeError(f"children must all be valid tree nodes") + + # Don't allow duplicate names + num_children = len([c.name for c in children]) + num_unique_children = len(set(c.name for c in children)) + if num_unique_children < num_children: + raise ValueError("All children must have unique names") + + # Tell children that they have a new parent + for c in children: + c._parent = self self._children = children def _walk_parents(self) -> DataTree: @@ -99,8 +109,19 @@ def _walk_children(self) -> DataTree: for node in child._walk_children(): yield node + @property + def siblings(self) -> Iterable[TreeNode]: + return [k for k in self.parent.children if k is not self] + + @siblings.setter + def siblings(self, value: Any) -> Iterable[TreeNode]: + raise AttributeError(f"Cannot set siblings directly - instead set children or parents") + + def __str__(self): + return f"TreeNode('{self._name}')" + def __repr__(self): - return f"TreeNode(name={self._name}, parent={self._parent}, children={self.children})" + return f"TreeNode(name='{self._name}', parent={str(self._parent)}, children={[str(c) for c in self._children]})" def add_node(self, name: Hashable, data: Union[DataTree, Dataset, DataArray] = None) -> DataTree: """Add a child node immediately below this node, and return the new child node.""" diff --git a/xarray/datatree_/xtree/tests/test_datatree.py b/xarray/datatree_/xtree/tests/test_datatree.py index 78e4ba1493e..84620aa830a 100644 --- a/xarray/datatree_/xtree/tests/test_datatree.py +++ b/xarray/datatree_/xtree/tests/test_datatree.py @@ -84,16 +84,37 @@ def test_multi_child_family(self): mary = TreeNode("mary") kate = TreeNode("kate") john = TreeNode("john", children=[mary, kate]) + assert mary in john.children + assert kate in john.children + assert mary.parent is john + assert kate.parent is john + def test_no_identical_twins(self): + ... + def test_sibling_relationships(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + ashley = TreeNode("ashley") + john = TreeNode("john", children=[mary, kate, ashley]) + assert mary in kate.siblings + assert ashley in kate.siblings + print(kate.siblings) + assert kate not in kate.siblings + with pytest.raises(AttributeError, match="Cannot set siblings directly"): + kate.siblings = john + + @pytest.mark.xfail def test_walking_parents(self): - ... + raise NotImplementedError + @pytest.mark.xfail def test_walking_children(self): - ... + raise NotImplementedError + @pytest.mark.xfail def test_adoption(self): - ... 
+ raise NotImplementedError class TestTreePlanting: From d573116142be69d252469170c45134f2f94e9d51 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 12 Aug 2021 12:34:24 -0400 Subject: [PATCH 006/260] reimplemented TreeNode class using anytree library --- xarray/datatree_/xtree/datatree.py | 293 ++++++------------ xarray/datatree_/xtree/tests/test_datatree.py | 11 +- 2 files changed, 107 insertions(+), 197 deletions(-) diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/xtree/datatree.py index 6399166c208..2af37bbdddf 100644 --- a/xarray/datatree_/xtree/datatree.py +++ b/xarray/datatree_/xtree/datatree.py @@ -1,11 +1,11 @@ from __future__ import annotations -from collections.abc import MutableMapping -from pathlib import Path import functools from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable +import anytree + from xarray.core.dataset import Dataset from xarray.core.dataarray import DataArray from xarray.core.combine import merge @@ -15,147 +15,69 @@ PathType = Union[Hashable, Sequence[Hashable]] -def _path_to_tuple(path: PathType) -> Tuple[Hashable]: - if isinstance(path, str): - return path - else: - return tuple(Path(path).parts) +class TreeNode(anytree.NodeMixin): + """ + Base class representing a node of a tree, with methods for traversing and altering the tree. + Depends on the anytree library for all tree traversal methods, but the parent class is fairly small + so could be easily reimplemented to avoid a hard dependency. + """ -class TreeNode(MutableMapping): - """Base class representing a node of a tree, with methods for traversing the tree.""" + _resolver = anytree.Resolver('name') def __init__( self, name: Hashable, parent: TreeNode = None, - children: List[TreeNode] = None, + children: Iterable[TreeNode] = None, ): - if children is None: - children = [] - - self._name = name - self.children = children - self._parent = None + self.name = name self.parent = parent - - @property - def name(self) -> Hashable: - """Name tag for this node.""" - return self._name - - @property - def parent(self) -> Union[TreeNode, None]: - return self._parent - - @parent.setter - def parent(self, parent: TreeNode): - if parent is not None: - if not isinstance(parent, TreeNode): - raise TypeError(f"{type(parent)} object is not a valid parent") - - if self._name in [c.name for c in parent._children]: - raise KeyError(f"Cannot set parent: parent node {parent._name} " - f"already has a child node named {self._name}") - else: - # If there was an original parent they can no longer have custody - if self.parent is not None: - self.parent.children.remove(self) - - # New parent needs to know it now has a child - parent.children = parent.children + [self] - - self._parent = parent - - @property - def children(self) -> List[TreeNode]: - return self._children - - @children.setter - def children(self, children: List[TreeNode]): - if not all(isinstance(c, TreeNode) for c in children): - raise TypeError(f"children must all be valid tree nodes") - - # Don't allow duplicate names - num_children = len([c.name for c in children]) - num_unique_children = len(set(c.name for c in children)) - if num_unique_children < num_children: - raise ValueError("All children must have unique names") - - # Tell children that they have a new parent - for c in children: - c._parent = self - self._children = children - - def _walk_parents(self) -> DataTree: - """Walk through this node and its parents.""" - yield self - node = self._parent - while node is not None: - yield node 
- node = node._parent - - def root_node(self) -> DataTree: - """Return the root node in the tree.""" - for node in self._walk_parents(): - pass - return node - - def _walk_children(self) -> DataTree: - """Recursively walk through this node and all its child nodes.""" - yield self - for child in self._children: - for node in child._walk_children(): - yield node - - @property - def siblings(self) -> Iterable[TreeNode]: - return [k for k in self.parent.children if k is not self] - - @siblings.setter - def siblings(self, value: Any) -> Iterable[TreeNode]: - raise AttributeError(f"Cannot set siblings directly - instead set children or parents") + if children: + self.children = children def __str__(self): - return f"TreeNode('{self._name}')" + return f"TreeNode('{self.name}')" def __repr__(self): - return f"TreeNode(name='{self._name}', parent={str(self._parent)}, children={[str(c) for c in self._children]})" - - def add_node(self, name: Hashable, data: Union[DataTree, Dataset, DataArray] = None) -> DataTree: - """Add a child node immediately below this node, and return the new child node.""" - if isinstance(data, DataTree): - data.parent = self - self._children.append(data) - return data + return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" + + def _pre_attach(self, parent: TreeNode) -> None: + """ + Method which super NodeMixin class calls before setting parent, + here used to prevent children with duplicate names. + """ + if self.name in list(c.name for c in parent.children): + raise KeyError(f"parent {str(parent)} already has a child named {self.name}") + + def _pre_attach_children(self, children: Iterable[TreeNode]) -> None: + """ + Method which super NodeMixin class calls before setting children, + here used to prevent children with duplicate names. + """ + # TODO test this + childrens_names = (c.name for c in children) + if len(set(childrens_names)) < len(list(childrens_names)): + raise KeyError(f"Cannot add multiple children with the same name to parent {str(self)}") + + def add_child(self, child: TreeNode) -> None: + """Add a single child node below this node, without replacement.""" + if child.name not in list(c.name for c in self.children): + child.parent = self + else: + raise KeyError(f"Node already has a child named {child.name}") + + @classmethod + def _tuple_or_path_to_path(cls, address: PathType) -> str: + if isinstance(address, str): + return address + elif isinstance(address, tuple): + return cls.separator.join(tag for tag in address) else: - return self._construct(name=name, parent=self, data=data) - - @staticmethod - def _get_node_depth1(node: DataTree, key: Hashable) -> DataTree: - if node is None: - return None - if key == '..': - return node._parent - if key == '.': - return node - for child in node._children: - if key == child._name: - return child - return None - - def __delitem__(self, path: PathType): - for child in self._walk_children(): - del child - - def __iter__(self): - return iter(c.name for c in self._children) - - def __len__(self): - return len(self._children) - - def get(self, path: str, default: DataTree = None) -> TreeNode: + raise ValueError(f"{address} is not a valid form of path") + + def get(self, path: PathType) -> TreeNode: """ Access node of the tree lying at the given path. 
@@ -169,108 +91,90 @@ def get(self, path: str, default: DataTree = None) -> TreeNode: Returns ------- - node : DataTree + node """ - """Return a node given any relative or absolute UNIX-like path.""" - # TODO rewrite using pathlib? - if path == '/': - return self.root_node() - elif path.startswith('/'): - node = self.root_node() - slash, path = path - else: - node = self - - for key in path.split('/'): - node = self._get_node_depth1(node, key) - if node is None: - node = default + p = self._tuple_or_path_to_path(path) - return node - - def __getitem__(self, path: PathType) -> DataTree: - node = self.get(path) - if node is None: - raise KeyError(f"Node {path} not found") - return node + return anytree.Resolver('name').get(self, p) def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray]) -> None: """ - Add a leaf to the tree, overwriting anything already present at that path. + Set a node on the tree, overwriting anything already present at that path. The new value can be an array or a DataTree, in which case it forms a new node of the tree. + Paths are specified relative to the node on which this method was called. + Parameters ---------- path : Union[Hashable, Sequence[Hashable]] Path names can be given as unix-like paths, or as tuples of strings (where each string is known as a single "tag"). - value : Union[DataTree, Dataset, DataArray] + value : Union[TreeNOde, Dataset, DataArray, None] """ - self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrites=True) + self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) + + def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, None], + new_nodes_along_path: bool, allow_overwrite: bool) -> None: + + p = self._tuple_or_path_to_path(path) - def _set_item(self, path: PathType, value: Union[DataTree, Dataset, DataArray], - new_nodes_along_path: bool, allow_overwrites: bool) -> None: # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? - # This check is redundant with checks called in `add_node`, but if we don't do it here - # then a failed __setitem__ might create a trail of new nodes all the way down - if not isinstance(value, (DataTree, Dataset)): - raise TypeError("Can only set new nodes to DataTree or Dataset instances, not " - f"{type(value.__name__)}") + if not isinstance(value, (TreeNode, Dataset, DataArray)): + raise TypeError("Can only set new nodes to TreeNode, Dataset, or DataArray instances, not " + f"{type(value.__name__)}") - # Walk to location of new node, creating DataTree objects as we go if necessary - *tags, last_tag = _path_to_tuple(path) + # Walk to location of new node, creating node objects as we go if necessary + path = self._tuple_or_path_to_path(path) + *tags, last_tag = path.split(self.separator) parent = self for tag in tags: + # TODO will this mutation within a for loop actually work? if tag not in parent.children: if new_nodes_along_path: - parent = self.add_node(tag) + self.add_child(TreeNode(name=tag, parent=parent)) else: # TODO Should this also be before we walk? 
raise KeyError(f"Cannot reach new node at path {path}: " - f"parent {parent} has no child {tag}") - parent = self._get_node_depth1(parent, tag) + f"parent {parent} has no child {tag}") + parent = list(self.children)[tag] + # Deal with anything existing at this location if last_tag in parent.children: - if not allow_overwrites: + if allow_overwrite: + child = list(parent.children)[last_tag] + child.parent = None + del child + else: # TODO should this be before we walk to the new node? raise KeyError(f"Cannot set item at {path} whilst that path already points to a " f"{type(parent.get(last_tag))} object") - else: - # TODO Delete any newly-orphaned children - ... - - parent.add_node(last_tag, data=value) - - def __setitem__(self, path: PathType, value: Union[DataTree, Dataset, DataArray]) -> None: - """ - Add a leaf to the DataTree, overwriting anything already present at that path. - The new value can be an array or a DataTree, in which case it forms a new node of the tree. + # Create new child node and set at this location + if value is None: + new_child = TreeNode(name=last_tag, parent=parent) + elif isinstance(value, (Dataset, DataArray)): + new_child = TreeNode(name=last_tag, parent=parent) + new_child.ds = value + elif isinstance(value, TreeNode): + new_child = value + new_child.parent = parent + else: + raise TypeError - Parameters - ---------- - path : Union[Hashable, Sequence[Hashable]] - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). - value : Union[DataTree, Dataset, DataArray] - """ - self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrites=True) + def glob(self, path: str): + return self._resolver.glob(self, path) @property def tags(self) -> Tuple[Hashable]: """All tags, returned in order starting from the root node""" - return tuple(reversed([node.name for node in self._walk_parents()])) + return tuple(self.path.split(self.separator)) - @property - def path(self) -> str: - """Full path to this node, given as a UNIX-like path.""" - if self._parent is None: - return '/' - else: - return '/'.join(self.tags[-1::-1]) + @tags.setter + def tags(self, value): + raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") class DatasetNode(TreeNode): @@ -310,6 +214,9 @@ def ds(self, data: Union[Dataset, DataArray] = None): data = data.to_dataset() self._ds = data + def has_data(self): + return self.ds is None + def map_inplace( self, func: Callable, @@ -419,6 +326,8 @@ def __init__( ): super().__init__(ds=None, name=None, parent=None, children=[]) + # TODO implement using anytree.DictImporter + # Populate tree with children determined from data_objects mapping for path, obj in data_objects.items(): self._set_item(path, obj, allow_overwrites=False, new_nodes_along_path=True) diff --git a/xarray/datatree_/xtree/tests/test_datatree.py b/xarray/datatree_/xtree/tests/test_datatree.py index 84620aa830a..c0c496bb321 100644 --- a/xarray/datatree_/xtree/tests/test_datatree.py +++ b/xarray/datatree_/xtree/tests/test_datatree.py @@ -1,5 +1,7 @@ import pytest +from anytree.node.exceptions import TreeError + import xarray as xr from xtree.datatree import TreeNode, DatasetNode, DataTree @@ -56,7 +58,7 @@ def test_lonely(self): root = TreeNode("/") assert root.name == "/" assert root.parent is None - assert root.children == [] + assert root.children == () def test_parenting(self): john = TreeNode("john") @@ -65,10 +67,10 @@ def test_parenting(self): assert 
mary.parent == john assert mary in john.children - with pytest.raises(KeyError, match="already has a child node named"): + with pytest.raises(KeyError, match="already has a child named"): TreeNode("mary", parent=john) - with pytest.raises(TypeError, match="object is not a valid parent"): + with pytest.raises(TreeError, match="not of type 'NodeMixin'"): mary.parent = "apple" def test_parent_swap(self): @@ -99,9 +101,8 @@ def test_sibling_relationships(self): john = TreeNode("john", children=[mary, kate, ashley]) assert mary in kate.siblings assert ashley in kate.siblings - print(kate.siblings) assert kate not in kate.siblings - with pytest.raises(AttributeError, match="Cannot set siblings directly"): + with pytest.raises(AttributeError): kate.siblings = john @pytest.mark.xfail From 93e4040222e4128ed3bf2721549d6ca7c461ca3e Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 16 Aug 2021 12:11:52 -0400 Subject: [PATCH 007/260] passes tests for basic node structure --- xarray/datatree_/xtree/datatree.py | 136 +++++++++--------- xarray/datatree_/xtree/tests/test_datatree.py | 65 +++++++-- 2 files changed, 129 insertions(+), 72 deletions(-) diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/xtree/datatree.py index 2af37bbdddf..b0730459271 100644 --- a/xarray/datatree_/xtree/datatree.py +++ b/xarray/datatree_/xtree/datatree.py @@ -19,10 +19,16 @@ class TreeNode(anytree.NodeMixin): """ Base class representing a node of a tree, with methods for traversing and altering the tree. - Depends on the anytree library for all tree traversal methods, but the parent class is fairly small + Depends on the anytree library for basic tree structure, but the parent class is fairly small so could be easily reimplemented to avoid a hard dependency. + + Adds restrictions preventing children with the same name, a method to set new nodes at arbitrary depth, + and access via unix-like paths or tuples of tags. Does not yet store anything in the nodes of the tree. """ + # TODO remove anytree dependency + # TODO allow for loops via symbolic links? + _resolver = anytree.Resolver('name') def __init__( @@ -43,30 +49,25 @@ def __str__(self): def __repr__(self): return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" + def render(self): + """Print tree structure, with only node names displayed.""" + for pre, _, node in anytree.RenderTree(self): + print(f"{pre}{node}") + def _pre_attach(self, parent: TreeNode) -> None: """ - Method which super NodeMixin class calls before setting parent, - here used to prevent children with duplicate names. + Method which superclass calls before setting parent, here used to prevent having two + children with duplicate names. """ if self.name in list(c.name for c in parent.children): raise KeyError(f"parent {str(parent)} already has a child named {self.name}") - def _pre_attach_children(self, children: Iterable[TreeNode]) -> None: - """ - Method which super NodeMixin class calls before setting children, - here used to prevent children with duplicate names. 
- """ - # TODO test this - childrens_names = (c.name for c in children) - if len(set(childrens_names)) < len(list(childrens_names)): - raise KeyError(f"Cannot add multiple children with the same name to parent {str(self)}") - def add_child(self, child: TreeNode) -> None: """Add a single child node below this node, without replacement.""" - if child.name not in list(c.name for c in self.children): - child.parent = self - else: + if child.name in list(c.name for c in self.children): raise KeyError(f"Node already has a child named {child.name}") + else: + child.parent = self @classmethod def _tuple_or_path_to_path(cls, address: PathType) -> str: @@ -134,7 +135,10 @@ def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, N # TODO will this mutation within a for loop actually work? if tag not in parent.children: if new_nodes_along_path: - self.add_child(TreeNode(name=tag, parent=parent)) + print(repr(parent)) + print(tag) + print(parent.children) + parent.add_child(TreeNode(name=tag, parent=parent)) else: # TODO Should this also be before we walk? raise KeyError(f"Cannot reach new node at path {path}: " @@ -176,6 +180,25 @@ def tags(self) -> Tuple[Hashable]: def tags(self, value): raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") + # TODO re-implement using anytree findall function + def get_all(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains all of the given tags, + where the tags can be present in any order. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if all(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + # TODO re-implement using anytree find function + def get_any(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains any of the given tags. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if any(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + class DatasetNode(TreeNode): """ @@ -189,8 +212,8 @@ class DatasetNode(TreeNode): def __init__( self, - data: Dataset = None, name: Hashable = None, + data: Dataset = None, parent: TreeNode = None, children: List[TreeNode] = None, ): @@ -208,12 +231,13 @@ def ds(self) -> Dataset: @ds.setter def ds(self, data: Union[Dataset, DataArray] = None): - if not isinstance(data, (Dataset, DataArray)) or data is not None: - raise TypeError(f"{type(data)} object is not an xarray Dataset or DataArray") + if not isinstance(data, (Dataset, DataArray)) and data is not None: + raise TypeError(f"{type(data)} object is not an xarray Dataset, DataArray, or None") if isinstance(data, DataArray): data = data.to_dataset() self._ds = data + @property def has_data(self): return self.ds is None @@ -239,6 +263,9 @@ def map_inplace( **kwargs : Any Keyword arguments passed on to `func`. """ + + # TODO if func fails on some node then the previous nodes will still have been updated... 
+ for node in self._walk_children(): new_ds = func(node.name, node.ds, *args, **kwargs) node.dataset = new_ds @@ -276,24 +303,25 @@ def map( # TODO map applied ufuncs over all leaves - def _dispatch_to_children(self, method: Callable) -> None: + @classmethod + def _dispatch_to_children(cls, method: Callable) -> None: """Wrap such that when method is called on this instance it is also called on children.""" - _dispatching_method = functools.partial(self.map_inplace, func=method) + _dispatching_method = functools.partial(cls.map_inplace, func=method) # TODO update method docstrings accordingly - setattr(self, method.__name__, _dispatching_method) - - def _node_repr(self, indent_depth: int) -> str: - indent_str = "|" + indent_depth * " |" + "-- " - node_repr = "\n" + indent_str + str(self.name) + setattr(cls, method.__name__, _dispatching_method) - if self.ds is not None: - # TODO indent every line properly? - node_repr += "\n" + indent_str + f"{repr(self.ds)[17:]}" + def __str__(self): + return f"DatasetNode('{self.name}', data={self.ds})" - for child in self.children: - node_repr += child._node_repr(indent_depth+1) + def __repr__(self): + return f"TreeNode(name='{self.name}', data={str(self.ds)}, parent={str(self.parent)}, children={[str(c) for c in self.children]})" - return node_repr + def render(self): + """Print tree structure, including any data stored at each node.""" + for pre, fill, node in anytree.RenderTree(self): + print(f"{pre}DatasetNode('{self.name}')") + for ds_line in repr(node.ds)[1:]: + print(f"{fill}{ds_line}") class DataTree(DatasetNode): @@ -309,7 +337,9 @@ class DataTree(DatasetNode): is known as a single "tag"). If path names containing more than one tag are given, new tree nodes will be constructed as necessary. - To assign data to the root node of the tree use "/" or "" as the path. + To assign data to the root node of the tree use "{name}" as the path. + name : Hashable, optional + Name for the root node of the tree. Default is "root" """ # TODO Add attrs dict by inheriting from xarray.core.common.AttrsAccessMixin @@ -322,38 +352,16 @@ class DataTree(DatasetNode): def __init__( self, - data_objects: Mapping[PathType, Union[Dataset, DataArray, DatasetNode]] = None, + data_objects: Mapping[PathType, Union[Dataset, DataArray, DatasetNode, None]] = None, + name: Hashable = "root", ): - super().__init__(ds=None, name=None, parent=None, children=[]) - - # TODO implement using anytree.DictImporter - - # Populate tree with children determined from data_objects mapping - for path, obj in data_objects.items(): - self._set_item(path, obj, allow_overwrites=False, new_nodes_along_path=True) - - def __repr__(self) -> str: - type_str = "" - tree_str = self._node_repr(indent_depth=0) - # TODO add attrs dict to the repr - return type_str + tree_str + super().__init__(name=name, data=None, parent=None, children=None) - def get_all(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains all of the given tags, - where the tags can be present in any order. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if all(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) - - def get_any(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains any of the given tags. 
- """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if any(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) + # TODO re-implement using anytree.DictImporter? + if data_objects: + # Populate tree with children determined from data_objects mapping + for path in sorted(data_objects): + self._set_item(path, data_objects[path], allow_overwrite=False, new_nodes_along_path=True) @property def chunks(self): diff --git a/xarray/datatree_/xtree/tests/test_datatree.py b/xarray/datatree_/xtree/tests/test_datatree.py index c0c496bb321..d04f33339a5 100644 --- a/xarray/datatree_/xtree/tests/test_datatree.py +++ b/xarray/datatree_/xtree/tests/test_datatree.py @@ -91,8 +91,35 @@ def test_multi_child_family(self): assert mary.parent is john assert kate.parent is john - def test_no_identical_twins(self): - ... + def test_disown_child(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + mary.parent = None + assert mary not in john.children + + def test_add_child(self): + john = TreeNode("john") + kate = TreeNode("kate") + john.add_child(kate) + assert kate in john.children + assert kate.parent is john + with pytest.raises(KeyError, match="already has a child named"): + john.add_child(TreeNode("kate")) + + def test_assign_children(self): + john = TreeNode("john") + jack = TreeNode("jack") + jill = TreeNode("jill") + + john.children = (jack, jill) + assert jack in john.children + assert jack.parent is john + assert jill in john.children + assert jill.parent is john + + evil_twin_jill = TreeNode("jill") + with pytest.raises(KeyError, match="already has a child named"): + john.children = (jack, jill, evil_twin_jill) def test_sibling_relationships(self): mary = TreeNode("mary") @@ -118,10 +145,21 @@ def test_adoption(self): raise NotImplementedError -class TestTreePlanting: +class TestTreeCreation: def test_empty(self): dt = DataTree() - root = DataTree() + assert dt.name == "root" + assert dt.parent is None + assert dt.children is () + assert dt.ds is None + + def test_data_in_root(self): + dt = DataTree({"root": xr.Dataset()}) + print(dt.name) + assert dt.name == "root" + assert dt.parent is None + assert dt.children is () + assert dt.ds is xr.Dataset() def test_one_layer(self): dt = DataTree({"run1": xr.Dataset(), "run2": xr.DataArray()}) @@ -147,11 +185,18 @@ class TestRestructuring: class TestRepr: - ... - + def test_render_nodetree(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + john = TreeNode("john", children=[mary, kate]) + sam = TreeNode("Sam", parent=mary) + ben = TreeNode("Ben", parent=mary) + john.render() + assert False -class TestIO: - ... + def test_render_datatree(self): + dt = create_test_datatree() + dt.render() class TestMethodInheritance: @@ -160,3 +205,7 @@ class TestMethodInheritance: class TestUFuncs: ... + + +class TestIO: + ... From 218f55b9f99dc5026eec7e492d7c16606e47055e Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Mon, 16 Aug 2021 12:20:15 -0400 Subject: [PATCH 008/260] Update README with new repo name --- xarray/datatree_/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 6806597a656..1b72f3b560f 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -1,2 +1,2 @@ -# xtree +# datatree WIP implementation of a tree-like hierarchical data structure for xarray. 
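The patches up to this point define the three user-facing classes exercised by the tests above: TreeNode for bare tree structure, DatasetNode for a node wrapping an xarray.Dataset, and DataTree for building a whole tree from a mapping of path-like keys to data objects. What follows is a minimal usage sketch only, assuming the work-in-progress `xtree` package exactly as it stands after PATCH 008 (PATCH 009 below renames the package to `datatree`); the sample names and values are illustrative and the interface is still changing.

import xarray as xr

from xtree.datatree import TreeNode, DatasetNode, DataTree

# Structural nodes: parent/child links come from anytree.NodeMixin, and the
# _pre_attach hook rejects two children with the same name under one parent.
john = TreeNode("john")
mary = TreeNode("mary", parent=john)
john.add_child(TreeNode("kate"))  # raises KeyError if a child "kate" already exists

# A DatasetNode additionally wraps an xarray.Dataset (a DataArray is converted
# to a Dataset by the ds setter).
weather = DatasetNode("weather", data=xr.Dataset({"t": ("x", [1.0, 2.0])}))

# A DataTree is built from a mapping of path-like keys to data objects,
# mirroring test_one_layer in the test suite above.
dt = DataTree({"run1": xr.Dataset({"a": 0}), "run2": xr.Dataset({"b": 1})})
dt.render()  # print the node structure
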
From 5dcf83341d30db42a1db1191e984fc19a238ddac Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 17 Aug 2021 15:44:13 -0400 Subject: [PATCH 009/260] rename folders xtree->datatree --- .../datatree_/{xtree => datatree}/__init__.py | 0 xarray/datatree_/datatree/_version.py | 1 + .../datatree_/{xtree => datatree}/datatree.py | 0 xarray/datatree_/{xtree => datatree}/io.py | 0 .../tests/test_datatree.py | 3 +- xarray/datatree_/setup.py | 43 +++++++++++++++++++ 6 files changed, 46 insertions(+), 1 deletion(-) rename xarray/datatree_/{xtree => datatree}/__init__.py (100%) create mode 100644 xarray/datatree_/datatree/_version.py rename xarray/datatree_/{xtree => datatree}/datatree.py (100%) rename xarray/datatree_/{xtree => datatree}/io.py (100%) rename xarray/datatree_/{xtree => datatree}/tests/test_datatree.py (98%) create mode 100644 xarray/datatree_/setup.py diff --git a/xarray/datatree_/xtree/__init__.py b/xarray/datatree_/datatree/__init__.py similarity index 100% rename from xarray/datatree_/xtree/__init__.py rename to xarray/datatree_/datatree/__init__.py diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py new file mode 100644 index 00000000000..ef4e01b5a5e --- /dev/null +++ b/xarray/datatree_/datatree/_version.py @@ -0,0 +1 @@ +__version__ = "0.1.dev9+g805d97f.d20210817" \ No newline at end of file diff --git a/xarray/datatree_/xtree/datatree.py b/xarray/datatree_/datatree/datatree.py similarity index 100% rename from xarray/datatree_/xtree/datatree.py rename to xarray/datatree_/datatree/datatree.py diff --git a/xarray/datatree_/xtree/io.py b/xarray/datatree_/datatree/io.py similarity index 100% rename from xarray/datatree_/xtree/io.py rename to xarray/datatree_/datatree/io.py diff --git a/xarray/datatree_/xtree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py similarity index 98% rename from xarray/datatree_/xtree/tests/test_datatree.py rename to xarray/datatree_/datatree/tests/test_datatree.py index d04f33339a5..70170675ad1 100644 --- a/xarray/datatree_/xtree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -4,7 +4,8 @@ import xarray as xr -from xtree.datatree import TreeNode, DatasetNode, DataTree +from datatree import DataTree +from datatree.datatree import TreeNode, DatasetNode def create_test_datatree(): diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py new file mode 100644 index 00000000000..03a44eed978 --- /dev/null +++ b/xarray/datatree_/setup.py @@ -0,0 +1,43 @@ +from setuptools import find_packages, setup + +install_requires = [ + "xarray>=0.19.0", + "anytree", + "future", +] + +extras_require = {'tests': + [ + "pytest", + "flake8", + "black", + "codecov", + ] +} + +setup( + name="datatree", + description="Hierarchical tree-like data structures for xarray", + url="https://github.com/TomNicholas/datatree", + author="Thomas Nicholas", + author_email="thomas.nicholas@columbia.edu", + license="Apache", + classifiers=[ + "Development Status :: 5 - Production/Stable", + "Intended Audience :: Science/Research", + "Topic :: Scientific/Engineering", + "License :: OSI Approved :: Apache License", + "Operating System :: OS Independent", + "Programming Language :: Python :: 3.7", + ], + packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), + install_requires=install_requires, + extras_require=extras_require, + python_requires=">=3.7", + setup_requires="setuptools_scm", + use_scm_version={ + "write_to": "datatree/_version.py", + "write_to_template": 
'__version__ = "{version}"', + "tag_regex": r"^(?Pv)?(?P[^\+]+)(?P.*)?$", + }, +) From 0a82f814aa8d116e4de09d14b775e57925bcceaf Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 18 Aug 2021 21:40:59 -0400 Subject: [PATCH 010/260] tests for setting elements --- .../datatree_/datatree/tests/test_datatree.py | 115 +++++++++++++++++- 1 file changed, 112 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 70170675ad1..9d33d50682c 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -56,8 +56,8 @@ def create_test_datatree(): class TestTreeNodes: def test_lonely(self): - root = TreeNode("/") - assert root.name == "/" + root = TreeNode("root") + assert root.name == "root" assert root.parent is None assert root.children == () @@ -146,6 +146,113 @@ def test_adoption(self): raise NotImplementedError +class TestStoreDatasets: + def test_create_datanode(self): + dat = xr.Dataset({'a': 0}) + john = DatasetNode("john", data=dat) + assert john.ds is dat + with pytest.raises(TypeError): + DatasetNode("mary", parent=john, data="junk") + + def test_set_data(self): + john = DatasetNode("john") + dat = xr.Dataset({'a': 0}) + john.ds = dat + assert john.ds is dat + with pytest.raises(TypeError): + john.ds = "junk" + + +class TestSetItem: + @pytest.mark.xfail + def test_not_enough_path_info(self): + john = TreeNode("john") + with pytest.raises(ValueError, match="Not enough path information"): + john.set(path='', value=xr.Dataset()) + + @pytest.mark.xfail + def test_set_child_by_name(self): + john = TreeNode("john") + john.set(path="mary", value=None) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert mary.children is () + + def test_set_child_as_data(self): + john = TreeNode("john") + dat = xr.Dataset({'a': 0}) + john.set("mary", dat) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, DatasetNode) + assert mary.ds is dat + assert mary.children is () + + def test_set_child_as_node(self): + john = TreeNode("john") + mary = TreeNode("mary") + john.set("mary", mary) + # john["mary"] = mary + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert mary.children is () + + def test_set_grandchild(self): + john = TreeNode("john") + mary = TreeNode("mary") + rose = TreeNode("rose") + john.set("mary", mary) + mary.set("rose", rose) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert rose in mary.children + + rose = mary.children[0] + assert rose.name == "rose" + assert isinstance(rose, TreeNode) + assert rose.children is () + + def test_set_grandchild_and_create_intermediate_child(self): + john = TreeNode("john") + rose = TreeNode("rose") + john.set("mary/rose", rose) + # john["mary/rose"] = rose + # john[("mary", "rose")] = rose + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert mary.children is (rose,) + + rose = mary.children[0] + assert rose.name == "rose" + assert isinstance(rose, TreeNode) + assert rose.children is () + + def test_no_intermediate_children_allowed(self): + john = TreeNode("john") + rose = TreeNode("rose") + with pytest.raises(KeyError, match="Cannot reach"): + john._set_item(path="mary/rose", value=rose, new_nodes_along_path=False, allow_overwrite=True) + + def test_overwrite_child(self): + ... 
+ + def test_dont_overwrite_child(self): + ... + + +class TestGetItem: + ... + + class TestTreeCreation: def test_empty(self): dt = DataTree() @@ -159,7 +266,9 @@ def test_data_in_root(self): print(dt.name) assert dt.name == "root" assert dt.parent is None - assert dt.children is () + + child = dt.children[0] + assert dt.children is (TreeNode('root')) assert dt.ds is xr.Dataset() def test_one_layer(self): From 813d2c7eef98638ab3c0160fcb7231fc714659aa Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 18 Aug 2021 21:42:52 -0400 Subject: [PATCH 011/260] work on set method --- xarray/datatree_/datatree/datatree.py | 111 +++++++++++++++++++------- 1 file changed, 80 insertions(+), 31 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b0730459271..b19203d13af 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -2,7 +2,7 @@ import functools -from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable +from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict import anytree @@ -29,6 +29,12 @@ class TreeNode(anytree.NodeMixin): # TODO remove anytree dependency # TODO allow for loops via symbolic links? + # TODO store children with their names in an OrderedDict instead of a tuple like anytree does? + # TODO do nodes even need names? Or can they just be referred to by the tags their parents store them under? + # TODO nodes should have names but they should be optional. Getting and setting should be down without reference to + # the names of stored objects, only their tags (i.e. position in the family tree) + # Ultimately you either need a list of named children, or a dictionary of unnamed children + _resolver = anytree.Resolver('name') def __init__( @@ -37,8 +43,10 @@ def __init__( parent: TreeNode = None, children: Iterable[TreeNode] = None, ): - + if not isinstance(name, str) or '/' in name: + raise ValueError(f"invalid name {name}") self.name = name + self.parent = parent if children: self.children = children @@ -51,8 +59,12 @@ def __repr__(self): def render(self): """Print tree structure, with only node names displayed.""" - for pre, _, node in anytree.RenderTree(self): - print(f"{pre}{node}") + # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node + # TODO add option to suppress dataset information beyond just variable names + #for pre, _, node in anytree.RenderTree(self): + # print(f"{pre}{node}") + args = ["%r" % self.separator.join([""] + [str(node.name) for node in self.path])] + print(anytree.node.util._repr(self, args=args, nameblacklist=["name"])) def _pre_attach(self, parent: TreeNode) -> None: """ @@ -94,12 +106,29 @@ def get(self, path: PathType) -> TreeNode: ------- node """ - p = self._tuple_or_path_to_path(path) + return anytree.Resolver('name').get(self, p) + + def __getitem__(self, path: PathType) -> TreeNode: + """ + Access node of the tree lying at the given path. + + Raises a KeyError if not found. + Parameters + ---------- + path : + Path names can be given as unix-like paths, or as tuples of strings + (where each string is known as a single "tag"). 
+ + Returns + ------- + node + """ + p = self._tuple_or_path_to_path(path) return anytree.Resolver('name').get(self, p) - def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray]) -> None: + def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: """ Set a node on the tree, overwriting anything already present at that path. @@ -112,41 +141,60 @@ def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray]) -> Non path : Union[Hashable, Sequence[Hashable]] Path names can be given as unix-like paths, or as tuples of strings (where each string is known as a single "tag"). - value : Union[TreeNOde, Dataset, DataArray, None] + value : Union[TreeNode, Dataset, DataArray, None] + """ + self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) + + def __setitem__(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: + """ + Set a node on the tree, overwriting anything already present at that path. + + The new value can be an array or a DataTree, in which case it forms a new node of the tree. + + Paths are specified relative to the node on which this method was called. + + Parameters + ---------- + path : Union[Hashable, Sequence[Hashable]] + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). + value : Union[TreeNode, Dataset, DataArray, None] """ self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, None], new_nodes_along_path: bool, allow_overwrite: bool) -> None: - p = self._tuple_or_path_to_path(path) + if not isinstance(value, (TreeNode, Dataset, DataArray)) and value is not None: + raise TypeError("Can only set new nodes to TreeNode, Dataset, or DataArray instances, not " + f"{type(value)}") - # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? + # Determine full path of new object + path = self._tuple_or_path_to_path(path) + tags = path.split(self.separator) + if len(tags) < 1: + raise ValueError("Not enough path information provided to create a new node. Please either provide a " + "path containing at least one tag, or a named object for the value.") + *tags, last_tag = tags - if not isinstance(value, (TreeNode, Dataset, DataArray)): - raise TypeError("Can only set new nodes to TreeNode, Dataset, or DataArray instances, not " - f"{type(value.__name__)}") + # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? # Walk to location of new node, creating node objects as we go if necessary - path = self._tuple_or_path_to_path(path) - *tags, last_tag = path.split(self.separator) parent = self for tag in tags: # TODO will this mutation within a for loop actually work? - if tag not in parent.children: + if tag not in [child.name for child in parent.children]: if new_nodes_along_path: - print(repr(parent)) - print(tag) - print(parent.children) + parent.add_child(TreeNode(name=tag, parent=parent)) else: # TODO Should this also be before we walk? 
raise KeyError(f"Cannot reach new node at path {path}: " f"parent {parent} has no child {tag}") - parent = list(self.children)[tag] + parent = next(c for c in parent.children if c.name == tag) # Deal with anything existing at this location - if last_tag in parent.children: + if last_tag in [child.name for child in parent.children]: if allow_overwrite: child = list(parent.children)[last_tag] child.parent = None @@ -157,16 +205,13 @@ def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, N f"{type(parent.get(last_tag))} object") # Create new child node and set at this location - if value is None: - new_child = TreeNode(name=last_tag, parent=parent) - elif isinstance(value, (Dataset, DataArray)): - new_child = TreeNode(name=last_tag, parent=parent) - new_child.ds = value - elif isinstance(value, TreeNode): + if isinstance(value, TreeNode): new_child = value - new_child.parent = parent + elif isinstance(value, (Dataset, DataArray)) or value is None: + new_child = DatasetNode(name=last_tag, data=value) else: raise TypeError + new_child.parent = parent def glob(self, path: str): return self._resolver.glob(self, path) @@ -270,7 +315,7 @@ def map_inplace( new_ds = func(node.name, node.ds, *args, **kwargs) node.dataset = new_ds - def map( + def map_over_descendants( self, func: Callable, *args: Iterable[Any], @@ -303,6 +348,7 @@ def map( # TODO map applied ufuncs over all leaves + # TODO make this public API so that it could be used in a future @register_datatree_accessor example? @classmethod def _dispatch_to_children(cls, method: Callable) -> None: """Wrap such that when method is called on this instance it is also called on children.""" @@ -337,7 +383,7 @@ class DataTree(DatasetNode): is known as a single "tag"). If path names containing more than one tag are given, new tree nodes will be constructed as necessary. - To assign data to the root node of the tree use "{name}" as the path. + To assign data to the root node of the tree use an empty string as the path. name : Hashable, optional Name for the root node of the tree. Default is "root" """ @@ -352,10 +398,11 @@ class DataTree(DatasetNode): def __init__( self, - data_objects: Mapping[PathType, Union[Dataset, DataArray, DatasetNode, None]] = None, + data_objects: Dict[PathType, Union[Dataset, DataArray, DatasetNode, None]] = None, name: Hashable = "root", ): - super().__init__(name=name, data=None, parent=None, children=None) + root_data = data_objects.pop("", None) + super().__init__(name=name, data=root_data, parent=None, children=None) # TODO re-implement using anytree.DictImporter? if data_objects: @@ -363,6 +410,8 @@ def __init__( for path in sorted(data_objects): self._set_item(path, data_objects[path], allow_overwrite=False, new_nodes_along_path=True) + # TODO do we need a watch out for if methods intended only for root nodes are calle on non-root nodes? 
+ @property def chunks(self): raise NotImplementedError From ac5670f1bb023014c27f1f9862551b857c123286 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 19 Aug 2021 14:47:51 -0400 Subject: [PATCH 012/260] pseudocode implementation of getting/setting both vars and children --- xarray/datatree_/datatree/datatree.py | 144 +++++++++++++++++++------- 1 file changed, 105 insertions(+), 39 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b19203d13af..c7fc745517d 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -8,12 +8,34 @@ from xarray.core.dataset import Dataset from xarray.core.dataarray import DataArray +from xarray.core.variable import Variable from xarray.core.combine import merge -from xarray.core import dtypes +from xarray.core import dtypes, utils PathType = Union[Hashable, Sequence[Hashable]] +""" +The structure of a populated Datatree looks like this in terms of classes: + +DataTree("root name") +|-- DatasetNode("weather") +| |-- DatasetNode("temperature") +| | |-- DataArrayNode("sea_surface_temperature") +| | |-- DataArrayNode("dew_point_temperature") +| |-- DataArrayNode("wind_speed") +| |-- DataArrayNode("pressure") +|-- DatasetNode("satellite image") +| |-- DatasetNode("infrared") +| | |-- DataArrayNode("near_infrared") +| | |-- DataArrayNode("far_infrared") +| |-- DataArrayNode("true_colour") +|-- DataTreeNode("topography") +| |-- DatasetNode("elevation") +| | |-- DataArrayNode("height_above_sea_level") +|-- DataArrayNode("population") +""" + class TreeNode(anytree.NodeMixin): """ @@ -109,25 +131,6 @@ def get(self, path: PathType) -> TreeNode: p = self._tuple_or_path_to_path(path) return anytree.Resolver('name').get(self, p) - def __getitem__(self, path: PathType) -> TreeNode: - """ - Access node of the tree lying at the given path. - - Raises a KeyError if not found. - - Parameters - ---------- - path : - Path names can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). - - Returns - ------- - node - """ - p = self._tuple_or_path_to_path(path) - return anytree.Resolver('name').get(self, p) - def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: """ Set a node on the tree, overwriting anything already present at that path. @@ -145,23 +148,6 @@ def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) """ self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) - def __setitem__(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: - """ - Set a node on the tree, overwriting anything already present at that path. - - The new value can be an array or a DataTree, in which case it forms a new node of the tree. - - Paths are specified relative to the node on which this method was called. - - Parameters - ---------- - path : Union[Hashable, Sequence[Hashable]] - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). - value : Union[TreeNode, Dataset, DataArray, None] - """ - self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) - def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, None], new_nodes_along_path: bool, allow_overwrite: bool) -> None: @@ -249,15 +235,22 @@ class DatasetNode(TreeNode): """ A tree node, but optionally containing data in the form of an xarray.Dataset. 
- Also implements xarray.Dataset methods, but wrapped to update all child nodes too. + Attempts to present all of the API of xarray.Dataset, but methods are wrapped to also update all child nodes. """ + # TODO should this instead be a subclass of Dataset? + + # TODO add any other properties (maybe dask ones?) + _DS_PROPERTIES = ['variables', 'attrs', 'encoding', 'dims', 'sizes'] + # TODO add all the other methods to dispatch _DS_METHODS_TO_DISPATCH = ['isel', 'sel', 'min', 'max', '__array_ufunc__'] + # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? + def __init__( self, - name: Hashable = None, + name: Hashable, data: Dataset = None, parent: TreeNode = None, children: List[TreeNode] = None, @@ -265,6 +258,11 @@ def __init__( super().__init__(name=name, parent=parent, children=children) self.ds = data + # Expose properties of wrapped Dataset + for property_name in self._DS_PROPERTIES: + ds_property = getattr(self.ds, property_name) + setattr(self, property_name, ds_property) + # Enable dataset API methods for method_name in self._DS_METHODS_TO_DISPATCH: ds_method = getattr(Dataset, method_name) @@ -286,6 +284,74 @@ def ds(self, data: Union[Dataset, DataArray] = None): def has_data(self): return self.ds is None + def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[TreeNode, Dataset, DataArray]: + """ + Access either child nodes, or variables or coordinates stored in this node. + + Variable or coordinates of the contained dataset will be returned as a :py:class:`~xarray.DataArray`. + Indexing with a list of names will return a new ``Dataset`` object. + + Parameters + ---------- + key : + If a path to child node then names can be given as unix-like paths, or as tuples of strings + (where each string is known as a single "tag"). + + """ + # Either: + if utils.is_dict_like(key): + # dict-like to variables + return self.ds[key] + elif utils.hashable(key): + if key in self.ds: + # hashable variable + return self.ds[key] + else: + # hashable child name (or path-like) + return self.get(key) + else: + # iterable of hashables + first_key, *_ = key + if first_key in self.children: + # iterable of child tags + return self.get(key) + else: + # iterable of variable names + return self.ds[key] + + def __setitem__( + self, + key: Union[Hashable, List[Hashable], Mapping, PathType], + value: Union[TreeNode, Dataset, DataArray, Variable] + ) -> None: + """ + Add either a child node or an array to this node. + + Parameters + ---------- + key + Either a path-like address for a new node, or the name of a new variable. + value + If a node class or a Dataset, it will be added as a new child node. + If an single array (i.e. DataArray, Variable), it will be added to the underlying Dataset. + """ + if utils.is_dict_like(key): + # TODO xarray.Dataset accepts other possibilities, how do we exactly replicate the behaviour? 
+ raise NotImplementedError + else: + if isinstance(value, (DataArray, Variable)): + self.ds[key] = value + elif isinstance(value, TreeNode): + self.set(path=key, value=value) + elif isinstance(value, Dataset): + # TODO fix this splitting up of path + *path_to_new_node, node_name = key + new_node = DatasetNode(name=node_name, data=value, parent=self) + self.set(path=key, value=new_node) + else: + raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}") + def map_inplace( self, func: Callable, From f51fd92e7da76deb18f8bddbe646f926c0a2783c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 19 Aug 2021 20:19:36 -0400 Subject: [PATCH 013/260] split out tests for tree nodes --- .../datatree_/datatree/tests/test_datatree.py | 188 +----------------- 1 file changed, 8 insertions(+), 180 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 9d33d50682c..835b7dd32a1 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,11 +1,9 @@ import pytest -from anytree.node.exceptions import TreeError - import xarray as xr from datatree import DataTree -from datatree.datatree import TreeNode, DatasetNode +from datatree.datatree import DatasetNode def create_test_datatree(): @@ -54,98 +52,6 @@ def create_test_datatree(): return root -class TestTreeNodes: - def test_lonely(self): - root = TreeNode("root") - assert root.name == "root" - assert root.parent is None - assert root.children == () - - def test_parenting(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - - assert mary.parent == john - assert mary in john.children - - with pytest.raises(KeyError, match="already has a child named"): - TreeNode("mary", parent=john) - - with pytest.raises(TreeError, match="not of type 'NodeMixin'"): - mary.parent = "apple" - - def test_parent_swap(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - - steve = TreeNode("steve") - mary.parent = steve - assert mary in steve.children - assert mary not in john.children - - def test_multi_child_family(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - john = TreeNode("john", children=[mary, kate]) - assert mary in john.children - assert kate in john.children - assert mary.parent is john - assert kate.parent is john - - def test_disown_child(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - mary.parent = None - assert mary not in john.children - - def test_add_child(self): - john = TreeNode("john") - kate = TreeNode("kate") - john.add_child(kate) - assert kate in john.children - assert kate.parent is john - with pytest.raises(KeyError, match="already has a child named"): - john.add_child(TreeNode("kate")) - - def test_assign_children(self): - john = TreeNode("john") - jack = TreeNode("jack") - jill = TreeNode("jill") - - john.children = (jack, jill) - assert jack in john.children - assert jack.parent is john - assert jill in john.children - assert jill.parent is john - - evil_twin_jill = TreeNode("jill") - with pytest.raises(KeyError, match="already has a child named"): - john.children = (jack, jill, evil_twin_jill) - - def test_sibling_relationships(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - ashley = TreeNode("ashley") - john = TreeNode("john", children=[mary, kate, ashley]) - assert mary in kate.siblings - assert ashley in kate.siblings - assert kate not in kate.siblings - 
with pytest.raises(AttributeError): - kate.siblings = john - - @pytest.mark.xfail - def test_walking_parents(self): - raise NotImplementedError - - @pytest.mark.xfail - def test_walking_children(self): - raise NotImplementedError - - @pytest.mark.xfail - def test_adoption(self): - raise NotImplementedError - - class TestStoreDatasets: def test_create_datanode(self): dat = xr.Dataset({'a': 0}) @@ -163,93 +69,11 @@ def test_set_data(self): john.ds = "junk" -class TestSetItem: - @pytest.mark.xfail - def test_not_enough_path_info(self): - john = TreeNode("john") - with pytest.raises(ValueError, match="Not enough path information"): - john.set(path='', value=xr.Dataset()) - - @pytest.mark.xfail - def test_set_child_by_name(self): - john = TreeNode("john") - john.set(path="mary", value=None) - - mary = john.children[0] - assert mary.name == "mary" - assert isinstance(mary, TreeNode) - assert mary.children is () - - def test_set_child_as_data(self): - john = TreeNode("john") - dat = xr.Dataset({'a': 0}) - john.set("mary", dat) - - mary = john.children[0] - assert mary.name == "mary" - assert isinstance(mary, DatasetNode) - assert mary.ds is dat - assert mary.children is () - - def test_set_child_as_node(self): - john = TreeNode("john") - mary = TreeNode("mary") - john.set("mary", mary) - # john["mary"] = mary +class TestGetItems: + ... - mary = john.children[0] - assert mary.name == "mary" - assert isinstance(mary, TreeNode) - assert mary.children is () - def test_set_grandchild(self): - john = TreeNode("john") - mary = TreeNode("mary") - rose = TreeNode("rose") - john.set("mary", mary) - mary.set("rose", rose) - - mary = john.children[0] - assert mary.name == "mary" - assert isinstance(mary, TreeNode) - assert rose in mary.children - - rose = mary.children[0] - assert rose.name == "rose" - assert isinstance(rose, TreeNode) - assert rose.children is () - - def test_set_grandchild_and_create_intermediate_child(self): - john = TreeNode("john") - rose = TreeNode("rose") - john.set("mary/rose", rose) - # john["mary/rose"] = rose - # john[("mary", "rose")] = rose - - mary = john.children[0] - assert mary.name == "mary" - assert isinstance(mary, TreeNode) - assert mary.children is (rose,) - - rose = mary.children[0] - assert rose.name == "rose" - assert isinstance(rose, TreeNode) - assert rose.children is () - - def test_no_intermediate_children_allowed(self): - john = TreeNode("john") - rose = TreeNode("rose") - with pytest.raises(KeyError, match="Cannot reach"): - john._set_item(path="mary/rose", value=rose, new_nodes_along_path=False, allow_overwrite=True) - - def test_overwrite_child(self): - ... - - def test_dont_overwrite_child(self): - ... - - -class TestGetItem: +class TestSetItems: ... @@ -309,6 +133,10 @@ def test_render_datatree(self): dt.render() +class TestPropertyInheritance: + ... + + class TestMethodInheritance: ... 
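PATCH 011 and PATCH 012 describe, in their docstrings, how a populated tree is meant to be addressed: paths are unix-like strings or tuples of tags, an empty-string path assigns data to the root node, and DatasetNode.__getitem__ dispatches between child nodes and variables of the wrapped Dataset. The commit messages label this implementation pseudocode, so the sketch below (using the post-rename `datatree` imports from the test suite) shows the intended calling pattern at this stage of the series rather than verified behaviour; the node names echo the module docstring added in PATCH 012 and the data values are placeholders.

import xarray as xr

from datatree import DataTree

# An empty-string path assigns data to the root node; deeper paths are meant
# to create any intermediate nodes automatically (new_nodes_along_path=True).
dt = DataTree({
    "": xr.Dataset({"population": 100}),
    "weather/temperature": xr.Dataset({"sea_surface_temperature": 21.0}),
})

# Nodes can be addressed with unix-like paths or with tuples of tags.
temperature = dt.get("weather/temperature")
also_temperature = dt.get(("weather", "temperature"))

# DatasetNode.__getitem__ dispatches on the key: a variable name returns a
# DataArray from the wrapped Dataset, anything else is resolved as a path.
sst = temperature["sea_surface_temperature"]

# set() walks the given path, creating nodes as needed, and a Dataset value
# becomes the data of a new DatasetNode at that location.
dt.set("weather/pressure", xr.Dataset({"p": 1013.0}))
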
From f7be8653d8e170fbfe8d3380c5bd1e0e2d6459bf Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 19 Aug 2021 20:19:57 -0400 Subject: [PATCH 014/260] simplified getting and setting nodes --- xarray/datatree_/datatree/datatree.py | 71 +++++++++++++-------------- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index c7fc745517d..adcc0b7aa73 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -112,7 +112,7 @@ def _tuple_or_path_to_path(cls, address: PathType) -> str: else: raise ValueError(f"{address} is not a valid form of path") - def get(self, path: PathType) -> TreeNode: + def get_node(self, path: PathType) -> TreeNode: """ Access node of the tree lying at the given path. @@ -121,8 +121,8 @@ def get(self, path: PathType) -> TreeNode: Parameters ---------- path : - Path names can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). + Paths can be given as unix-like paths, or as tuples of strings + (where each string is known as a single "tag"). Path includes the name of the target node. Returns ------- @@ -131,73 +131,72 @@ def get(self, path: PathType) -> TreeNode: p = self._tuple_or_path_to_path(path) return anytree.Resolver('name').get(self, p) - def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: + def set_node( + self, + path: PathType = '', + node: TreeNode = None, + new_nodes_along_path: bool = True, + allow_overwrite: bool = True, + ) -> None: """ Set a node on the tree, overwriting anything already present at that path. - The new value can be an array or a DataTree, in which case it forms a new node of the tree. + The given value either forms a new node of the tree or overwrites an existing node at that location. - Paths are specified relative to the node on which this method was called. + Paths are specified relative to the node on which this method was called, and the name of the node forms the + last part of the path. (i.e. `.set_node(path='', TreeNode('a'))` is equivalent to `.add_child(TreeNode('a'))`. Parameters ---------- path : Union[Hashable, Sequence[Hashable]] Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). - value : Union[TreeNode, Dataset, DataArray, None] + is known as a single "tag"). Default is ''. + node : TreeNode + new_nodes_along_path : bool + If true, then if necessary new nodes will be created along the given path, until the tree can reach the + specified location. If false then an error is thrown instead of creating intermediate nodes alang the path. + allow_overwrite : bool + Whether or not to overwrite any existing node at the location given by path. Default is True. 
+ + Raises + ------ + KeyError + If a node already exists at the given path """ - self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) - - def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, None], - new_nodes_along_path: bool, allow_overwrite: bool) -> None: - - if not isinstance(value, (TreeNode, Dataset, DataArray)) and value is not None: - raise TypeError("Can only set new nodes to TreeNode, Dataset, or DataArray instances, not " - f"{type(value)}") # Determine full path of new object path = self._tuple_or_path_to_path(path) - tags = path.split(self.separator) - if len(tags) < 1: - raise ValueError("Not enough path information provided to create a new node. Please either provide a " - "path containing at least one tag, or a named object for the value.") - *tags, last_tag = tags - # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? + if not isinstance(node, TreeNode): + raise ValueError + node_name = node.name # Walk to location of new node, creating node objects as we go if necessary parent = self - for tag in tags: + for tag in path.split(self.separator): # TODO will this mutation within a for loop actually work? if tag not in [child.name for child in parent.children]: if new_nodes_along_path: - + # TODO prevent this from leaving a trail of nodes if the assignment fails somehow parent.add_child(TreeNode(name=tag, parent=parent)) else: - # TODO Should this also be before we walk? raise KeyError(f"Cannot reach new node at path {path}: " f"parent {parent} has no child {tag}") parent = next(c for c in parent.children if c.name == tag) # Deal with anything existing at this location - if last_tag in [child.name for child in parent.children]: + if node_name in [child.name for child in parent.children]: if allow_overwrite: - child = list(parent.children)[last_tag] + child = parent.get(node_name) child.parent = None del child else: # TODO should this be before we walk to the new node? raise KeyError(f"Cannot set item at {path} whilst that path already points to a " - f"{type(parent.get(last_tag))} object") + f"{type(parent.get(node_name))} object") - # Create new child node and set at this location - if isinstance(value, TreeNode): - new_child = value - elif isinstance(value, (Dataset, DataArray)) or value is None: - new_child = DatasetNode(name=last_tag, data=value) - else: - raise TypeError - new_child.parent = parent + # Place new child node at this location + node.parent = parent def glob(self, path: str): return self._resolver.glob(self, path) From ba71b03604fb5d3f34d56d506f80f7d1bd0ea9cc Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 19 Aug 2021 23:44:58 -0400 Subject: [PATCH 015/260] pseudocode for mapping functions over subtrees --- xarray/datatree_/datatree/datatree.py | 110 ++++++++++++++++---------- 1 file changed, 68 insertions(+), 42 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index adcc0b7aa73..bd070c4ba71 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -112,6 +112,9 @@ def _tuple_or_path_to_path(cls, address: PathType) -> str: else: raise ValueError(f"{address} is not a valid form of path") + def relative_to(self, other: PathType): + raise NotImplementedError + def get_node(self, path: PathType) -> TreeNode: """ Access node of the tree lying at the given path. 
@@ -230,11 +233,36 @@ def get_any(self, *tags: Hashable) -> DataTree: return DataTree(data_objects=matching_children) +def _map_over_subtree(tree, func, *args, **kwargs): + """Internal function which maps func over every node in tree, returning a tree of the results.""" + + subtree_nodes = anytree.iterators.PreOrderIter(tree) + + out_tree = DataTree(name=tree.name, data_objects={}) + + for node in subtree_nodes: + relative_path = tree.path.replace(node.path, '') + + if node.has_data: + result = func(node.ds, *args, **kwargs) + else: + result = None + + out_tree[relative_path] = DatasetNode(name=node.name, data=result) + + return out_tree + + +def map_over_subtree(func): + """Decorator to turn a function which acts on (and returns) single Datasets into one which acts on DataTrees.""" + return functools.wraps(func)(_map_over_subtree) + + class DatasetNode(TreeNode): """ A tree node, but optionally containing data in the form of an xarray.Dataset. - Attempts to present all of the API of xarray.Dataset, but methods are wrapped to also update all child nodes. + Attempts to present the API of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. """ # TODO should this instead be a subclass of Dataset? @@ -243,7 +271,7 @@ class DatasetNode(TreeNode): _DS_PROPERTIES = ['variables', 'attrs', 'encoding', 'dims', 'sizes'] # TODO add all the other methods to dispatch - _DS_METHODS_TO_DISPATCH = ['isel', 'sel', 'min', 'max', '__array_ufunc__'] + _DS_METHODS_TO_MAP_OVER_SUBTREES = ['isel', 'sel', 'min', 'max', '__array_ufunc__'] # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? @@ -258,14 +286,15 @@ def __init__( self.ds = data # Expose properties of wrapped Dataset + # TODO if self.ds = None what will happen? for property_name in self._DS_PROPERTIES: - ds_property = getattr(self.ds, property_name) + ds_property = getattr(Dataset, property_name) setattr(self, property_name, ds_property) # Enable dataset API methods - for method_name in self._DS_METHODS_TO_DISPATCH: + for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: ds_method = getattr(Dataset, method_name) - self._dispatch_to_children(ds_method) + setattr(self, method_name, map_over_subtree(ds_method)) @property def ds(self) -> Dataset: @@ -351,75 +380,72 @@ def __setitem__( raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " f"not {type(value)}") - def map_inplace( - self, - func: Callable, - *args: Iterable[Any], - **kwargs: Any, - ) -> None: + def map_over_subtree( + self, + func: Callable, + *args: Iterable[Any], + **kwargs: Any, + ) -> DataTree: """ - Apply a function to the dataset at each child node in the tree, updating data in place. + Apply a function to every dataset in this subtree, returning a new tree which stores the results. + + The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the + descendant nodes. The returned tree will have the same structure as the original subtree. + + func needs to return a Dataset in order to rebuild the subtree. Parameters ---------- func : callable Function to apply to datasets with signature: - `func(node.name, node.dataset, *args, **kwargs) -> Dataset`. + `func(node.ds, *args, **kwargs) -> Dataset`. - Function will still be applied to any nodes without datasets, - in which cases the `dataset` argument to `func` will be `None`. + Function will not be applied to any nodes without datasets. 
*args : tuple, optional Positional arguments passed on to `func`. **kwargs : Any Keyword arguments passed on to `func`. - """ - # TODO if func fails on some node then the previous nodes will still have been updated... + Returns + ------- + subtree : DataTree + Subtree containing results from applying ``func`` to the dataset at each node. + """ + # TODO this signature means that func has no way to know which node it is being called upon - change? - for node in self._walk_children(): - new_ds = func(node.name, node.ds, *args, **kwargs) - node.dataset = new_ds + return _map_over_subtree(self, func, *args, **kwargs) - def map_over_descendants( + def map_inplace_over_subtree( self, func: Callable, *args: Iterable[Any], **kwargs: Any, - ) -> Iterable[Any]: + ) -> None: """ - Apply a function to the dataset at each node in the tree, returning a generator - of all the results. + Apply a function to every dataset in this subtree, updating data in place. Parameters ---------- func : callable Function to apply to datasets with signature: - `func(node.name, node.dataset, *args, **kwargs) -> None or return value`. + `func(node.ds, *args, **kwargs) -> Dataset`. - Function will still be applied to any nodes without datasets, - in which cases the `dataset` argument to `func` will be `None`. + Function will not be applied to any nodes without datasets, *args : tuple, optional Positional arguments passed on to `func`. **kwargs : Any Keyword arguments passed on to `func`. - - Returns - ------- - applied : Iterable[Any] - Generator of results from applying ``func`` to the dataset at each node. """ - for node in self._walk_children(): - yield func(node.name, node.ds, *args, **kwargs) - # TODO map applied ufuncs over all leaves + # TODO if func fails on some node then the previous nodes will still have been updated... - # TODO make this public API so that it could be used in a future @register_datatree_accessor example? - @classmethod - def _dispatch_to_children(cls, method: Callable) -> None: - """Wrap such that when method is called on this instance it is also called on children.""" - _dispatching_method = functools.partial(cls.map_inplace, func=method) - # TODO update method docstrings accordingly - setattr(cls, method.__name__, _dispatching_method) + subtree_nodes = anytree.iterators.PreOrderIter(self) + + for node in subtree_nodes: + if node.has_data: + node.ds = func(node.ds, *args, **kwargs) + + # TODO map applied ufuncs over all leaves def __str__(self): return f"DatasetNode('{self.name}', data={self.ds})" From 3ea8162224ae667e42ed6b34c15b0dbaebea7108 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 19 Aug 2021 20:19:57 -0400 Subject: [PATCH 016/260] simplified getting and setting nodes --- xarray/datatree_/datatree/datatree.py | 85 +++--- .../datatree_/datatree/tests/test_treenode.py | 241 ++++++++++++++++++ 2 files changed, 286 insertions(+), 40 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/test_treenode.py diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index c7fc745517d..2ecac8e5394 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -112,17 +112,17 @@ def _tuple_or_path_to_path(cls, address: PathType) -> str: else: raise ValueError(f"{address} is not a valid form of path") - def get(self, path: PathType) -> TreeNode: + def get_node(self, path: PathType) -> TreeNode: """ Access node of the tree lying at the given path. - Raises a KeyError if not found. 
+ Raises a TreeError if not found. Parameters ---------- path : - Path names can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). + Paths can be given as unix-like paths, or as tuples of strings + (where each string is known as a single "tag"). Path includes the name of the target node. Returns ------- @@ -131,73 +131,73 @@ def get(self, path: PathType) -> TreeNode: p = self._tuple_or_path_to_path(path) return anytree.Resolver('name').get(self, p) - def set(self, path: PathType, value: Union[TreeNode, Dataset, DataArray] = None) -> None: + def set_node( + self, + path: PathType = '/', + node: TreeNode = None, + new_nodes_along_path: bool = True, + allow_overwrite: bool = True, + ) -> None: """ Set a node on the tree, overwriting anything already present at that path. - The new value can be an array or a DataTree, in which case it forms a new node of the tree. + The given value either forms a new node of the tree or overwrites an existing node at that location. - Paths are specified relative to the node on which this method was called. + Paths are specified relative to the node on which this method was called, and the name of the node forms the + last part of the path. (i.e. `.set_node(path='', TreeNode('a'))` is equivalent to `.add_child(TreeNode('a'))`. Parameters ---------- path : Union[Hashable, Sequence[Hashable]] Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). - value : Union[TreeNode, Dataset, DataArray, None] + is known as a single "tag"). Default is '/'. + node : TreeNode + new_nodes_along_path : bool + If true, then if necessary new nodes will be created along the given path, until the tree can reach the + specified location. If false then an error is thrown instead of creating intermediate nodes alang the path. + allow_overwrite : bool + Whether or not to overwrite any existing node at the location given by path. Default is True. + + Raises + ------ + KeyError + If a node already exists at the given path """ - self._set_item(path=path, value=value, new_nodes_along_path=True, allow_overwrite=True) - - def _set_item(self, path: PathType, value: Union[TreeNode, Dataset, DataArray, None], - new_nodes_along_path: bool, allow_overwrite: bool) -> None: - - if not isinstance(value, (TreeNode, Dataset, DataArray)) and value is not None: - raise TypeError("Can only set new nodes to TreeNode, Dataset, or DataArray instances, not " - f"{type(value)}") # Determine full path of new object path = self._tuple_or_path_to_path(path) - tags = path.split(self.separator) - if len(tags) < 1: - raise ValueError("Not enough path information provided to create a new node. Please either provide a " - "path containing at least one tag, or a named object for the value.") - *tags, last_tag = tags - # TODO: Check that dimensions/coordinates are compatible with adjacent nodes? + if not isinstance(node, TreeNode): + raise ValueError(f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}") + node_name = node.name - # Walk to location of new node, creating node objects as we go if necessary + # Walk to location of new node, creating intermediate node objects as we go if necessary parent = self + tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] for tag in tags: # TODO will this mutation within a for loop actually work? 
if tag not in [child.name for child in parent.children]: if new_nodes_along_path: - - parent.add_child(TreeNode(name=tag, parent=parent)) + # TODO prevent this from leaving a trail of nodes if the assignment fails somehow + parent.add_child(TreeNode(name=tag)) else: - # TODO Should this also be before we walk? raise KeyError(f"Cannot reach new node at path {path}: " f"parent {parent} has no child {tag}") - parent = next(c for c in parent.children if c.name == tag) + parent = parent.get_node(tag) - # Deal with anything existing at this location - if last_tag in [child.name for child in parent.children]: + # Deal with anything already existing at this location + if node_name in [child.name for child in parent.children]: if allow_overwrite: - child = list(parent.children)[last_tag] + child = parent.get_node(node_name) child.parent = None del child else: # TODO should this be before we walk to the new node? raise KeyError(f"Cannot set item at {path} whilst that path already points to a " - f"{type(parent.get(last_tag))} object") + f"{type(parent.get_node(node_name))} object") - # Create new child node and set at this location - if isinstance(value, TreeNode): - new_child = value - elif isinstance(value, (Dataset, DataArray)) or value is None: - new_child = DatasetNode(name=last_tag, data=value) - else: - raise TypeError - new_child.parent = parent + # Place new child node at this location + parent.add_child(node) def glob(self, path: str): return self._resolver.glob(self, path) @@ -508,6 +508,11 @@ def merge_child_datasets( def as_dataarray(self) -> DataArray: return self.ds.as_dataarray() + @property + def groups(self): + """Return all netCDF4 groups in the tree, given as a tuple of path-like strings.""" + return tuple(node.path for node in anytree.iterators.PreOrderIter(self)) + def to_netcdf(self, filename: str): from .io import _datatree_to_netcdf diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py new file mode 100644 index 00000000000..9025669561c --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -0,0 +1,241 @@ +import pytest + +from anytree.node.exceptions import TreeError +from anytree.resolver import ChildResolverError + +from datatree.datatree import TreeNode + + +class TestFamilyTree: + def test_lonely(self): + root = TreeNode("root") + assert root.name == "root" + assert root.parent is None + assert root.children == () + + def test_parenting(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + + assert mary.parent == john + assert mary in john.children + + with pytest.raises(KeyError, match="already has a child named"): + TreeNode("mary", parent=john) + + with pytest.raises(TreeError, match="not of type 'NodeMixin'"): + mary.parent = "apple" + + def test_parent_swap(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + + steve = TreeNode("steve") + mary.parent = steve + assert mary in steve.children + assert mary not in john.children + + def test_multi_child_family(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + john = TreeNode("john", children=[mary, kate]) + assert mary in john.children + assert kate in john.children + assert mary.parent is john + assert kate.parent is john + + def test_disown_child(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + mary.parent = None + assert mary not in john.children + + def test_add_child(self): + john = TreeNode("john") + kate = TreeNode("kate") + john.add_child(kate) + assert kate in 
john.children + assert kate.parent is john + with pytest.raises(KeyError, match="already has a child named"): + john.add_child(TreeNode("kate")) + + def test_assign_children(self): + john = TreeNode("john") + jack = TreeNode("jack") + jill = TreeNode("jill") + + john.children = (jack, jill) + assert jack in john.children + assert jack.parent is john + assert jill in john.children + assert jill.parent is john + + evil_twin_jill = TreeNode("jill") + with pytest.raises(KeyError, match="already has a child named"): + john.children = (jack, jill, evil_twin_jill) + + def test_sibling_relationships(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + ashley = TreeNode("ashley") + john = TreeNode("john", children=[mary, kate, ashley]) + assert mary in kate.siblings + assert ashley in kate.siblings + assert kate not in kate.siblings + with pytest.raises(AttributeError): + kate.siblings = john + + @pytest.mark.xfail + def test_adoption(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_root(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_ancestors(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_descendants(self): + raise NotImplementedError + + +class TestGetNodes: + def test_get_child(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + assert john.get_node("mary") is mary + assert john.get_node(("mary",)) is mary + + def test_get_nonexistent_child(self): + john = TreeNode("john") + TreeNode("jill", parent=john) + with pytest.raises(ChildResolverError): + john.get_node("mary") + + def test_get_grandchild(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + sue = TreeNode("sue", parent=mary) + assert john.get_node("mary/sue") is sue + assert john.get_node(("mary", "sue")) is sue + + def test_get_great_grandchild(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + sue = TreeNode("sue", parent=mary) + steven = TreeNode("steven", parent=sue) + assert john.get_node("mary/sue/steven") is steven + assert john.get_node(("mary", "sue", "steven")) is steven + + def test_get_from_middle_of_tree(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + sue = TreeNode("sue", parent=mary) + steven = TreeNode("steven", parent=sue) + assert mary.get_node("sue/steven") is steven + assert mary.get_node(("sue", "steven")) is steven + + +class TestSetNodes: + def test_set_child_node(self): + john = TreeNode("john") + mary = TreeNode("mary") + john.set_node('/', mary) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert mary.children is () + + def test_child_already_exists(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + marys_replacement = TreeNode("mary") + + with pytest.raises(KeyError): + john.set_node('/', marys_replacement, allow_overwrite=False) + + def test_set_grandchild(self): + john = TreeNode("john") + mary = TreeNode("mary") + rose = TreeNode("rose") + john.set_node('/', mary) + john.set_node('/mary/', rose) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + assert rose in mary.children + + rose = mary.children[0] + assert rose.name == "rose" + assert isinstance(rose, TreeNode) + assert rose.children is () + + def test_set_grandchild_and_create_intermediate_child(self): + john = TreeNode("john") + rose = TreeNode("rose") + john.set_node("/mary/", rose) + + mary = john.children[0] + assert mary.name == "mary" + assert isinstance(mary, TreeNode) + 
assert mary.children[0] is rose + + rose = mary.children[0] + assert rose.name == "rose" + assert isinstance(rose, TreeNode) + assert rose.children is () + + def test_no_intermediate_children_allowed(self): + john = TreeNode("john") + rose = TreeNode("rose") + with pytest.raises(KeyError, match="Cannot reach"): + john.set_node(path="mary", node=rose, new_nodes_along_path=False, allow_overwrite=True) + + def test_set_great_grandchild(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + rose = TreeNode("rose", parent=mary) + sue = TreeNode("sue") + john.set_node("mary/rose", sue) + assert sue.parent is rose + + def test_overwrite_child(self): + john = TreeNode("john") + mary = TreeNode("mary") + john.set_node('/', mary) + assert mary in john.children + + marys_evil_twin = TreeNode("mary") + john.set_node('/', marys_evil_twin) + assert marys_evil_twin in john.children + assert mary not in john.children + + def test_dont_overwrite_child(self): + john = TreeNode("john") + mary = TreeNode("mary") + john.set_node('/', mary) + assert mary in john.children + + marys_evil_twin = TreeNode("mary") + with pytest.raises(KeyError, match="path already points"): + john.set_node('', marys_evil_twin, new_nodes_along_path=True, allow_overwrite=False) + assert mary in john.children + assert marys_evil_twin not in john.children + + +class TestPaths: + def test_relative_path(self): + ... + + +class TestTags: + ... + + +class TestRenderTree: + ... From 334ca8ea87aaa1901876b0e2ffae602155d56335 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 20 Aug 2021 02:03:28 -0400 Subject: [PATCH 017/260] moved TreeNode class into its own file --- xarray/datatree_/datatree/datatree.py | 200 +---------------- .../datatree_/datatree/tests/test_treenode.py | 2 +- xarray/datatree_/datatree/treenode.py | 202 ++++++++++++++++++ 3 files changed, 205 insertions(+), 199 deletions(-) create mode 100644 xarray/datatree_/datatree/treenode.py diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 2ecac8e5394..e3acf439e90 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,8 +1,7 @@ from __future__ import annotations - import functools -from typing import Sequence, Tuple, Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict +from typing import Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict import anytree @@ -12,8 +11,7 @@ from xarray.core.combine import merge from xarray.core import dtypes, utils - -PathType = Union[Hashable, Sequence[Hashable]] +from .treenode import TreeNode, PathType """ The structure of a populated Datatree looks like this in terms of classes: @@ -37,200 +35,6 @@ """ -class TreeNode(anytree.NodeMixin): - """ - Base class representing a node of a tree, with methods for traversing and altering the tree. - - Depends on the anytree library for basic tree structure, but the parent class is fairly small - so could be easily reimplemented to avoid a hard dependency. - - Adds restrictions preventing children with the same name, a method to set new nodes at arbitrary depth, - and access via unix-like paths or tuples of tags. Does not yet store anything in the nodes of the tree. - """ - - # TODO remove anytree dependency - # TODO allow for loops via symbolic links? - - # TODO store children with their names in an OrderedDict instead of a tuple like anytree does? - # TODO do nodes even need names? Or can they just be referred to by the tags their parents store them under? 
- # TODO nodes should have names but they should be optional. Getting and setting should be down without reference to - # the names of stored objects, only their tags (i.e. position in the family tree) - # Ultimately you either need a list of named children, or a dictionary of unnamed children - - _resolver = anytree.Resolver('name') - - def __init__( - self, - name: Hashable, - parent: TreeNode = None, - children: Iterable[TreeNode] = None, - ): - if not isinstance(name, str) or '/' in name: - raise ValueError(f"invalid name {name}") - self.name = name - - self.parent = parent - if children: - self.children = children - - def __str__(self): - return f"TreeNode('{self.name}')" - - def __repr__(self): - return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" - - def render(self): - """Print tree structure, with only node names displayed.""" - # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node - # TODO add option to suppress dataset information beyond just variable names - #for pre, _, node in anytree.RenderTree(self): - # print(f"{pre}{node}") - args = ["%r" % self.separator.join([""] + [str(node.name) for node in self.path])] - print(anytree.node.util._repr(self, args=args, nameblacklist=["name"])) - - def _pre_attach(self, parent: TreeNode) -> None: - """ - Method which superclass calls before setting parent, here used to prevent having two - children with duplicate names. - """ - if self.name in list(c.name for c in parent.children): - raise KeyError(f"parent {str(parent)} already has a child named {self.name}") - - def add_child(self, child: TreeNode) -> None: - """Add a single child node below this node, without replacement.""" - if child.name in list(c.name for c in self.children): - raise KeyError(f"Node already has a child named {child.name}") - else: - child.parent = self - - @classmethod - def _tuple_or_path_to_path(cls, address: PathType) -> str: - if isinstance(address, str): - return address - elif isinstance(address, tuple): - return cls.separator.join(tag for tag in address) - else: - raise ValueError(f"{address} is not a valid form of path") - - def get_node(self, path: PathType) -> TreeNode: - """ - Access node of the tree lying at the given path. - - Raises a TreeError if not found. - - Parameters - ---------- - path : - Paths can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). Path includes the name of the target node. - - Returns - ------- - node - """ - p = self._tuple_or_path_to_path(path) - return anytree.Resolver('name').get(self, p) - - def set_node( - self, - path: PathType = '/', - node: TreeNode = None, - new_nodes_along_path: bool = True, - allow_overwrite: bool = True, - ) -> None: - """ - Set a node on the tree, overwriting anything already present at that path. - - The given value either forms a new node of the tree or overwrites an existing node at that location. - - Paths are specified relative to the node on which this method was called, and the name of the node forms the - last part of the path. (i.e. `.set_node(path='', TreeNode('a'))` is equivalent to `.add_child(TreeNode('a'))`. - - Parameters - ---------- - path : Union[Hashable, Sequence[Hashable]] - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). Default is '/'. 
- node : TreeNode - new_nodes_along_path : bool - If true, then if necessary new nodes will be created along the given path, until the tree can reach the - specified location. If false then an error is thrown instead of creating intermediate nodes alang the path. - allow_overwrite : bool - Whether or not to overwrite any existing node at the location given by path. Default is True. - - Raises - ------ - KeyError - If a node already exists at the given path - """ - - # Determine full path of new object - path = self._tuple_or_path_to_path(path) - - if not isinstance(node, TreeNode): - raise ValueError(f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}") - node_name = node.name - - # Walk to location of new node, creating intermediate node objects as we go if necessary - parent = self - tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] - for tag in tags: - # TODO will this mutation within a for loop actually work? - if tag not in [child.name for child in parent.children]: - if new_nodes_along_path: - # TODO prevent this from leaving a trail of nodes if the assignment fails somehow - parent.add_child(TreeNode(name=tag)) - else: - raise KeyError(f"Cannot reach new node at path {path}: " - f"parent {parent} has no child {tag}") - parent = parent.get_node(tag) - - # Deal with anything already existing at this location - if node_name in [child.name for child in parent.children]: - if allow_overwrite: - child = parent.get_node(node_name) - child.parent = None - del child - else: - # TODO should this be before we walk to the new node? - raise KeyError(f"Cannot set item at {path} whilst that path already points to a " - f"{type(parent.get_node(node_name))} object") - - # Place new child node at this location - parent.add_child(node) - - def glob(self, path: str): - return self._resolver.glob(self, path) - - @property - def tags(self) -> Tuple[Hashable]: - """All tags, returned in order starting from the root node""" - return tuple(self.path.split(self.separator)) - - @tags.setter - def tags(self, value): - raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") - - # TODO re-implement using anytree findall function - def get_all(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains all of the given tags, - where the tags can be present in any order. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if all(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) - - # TODO re-implement using anytree find function - def get_any(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains any of the given tags. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if any(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) - - class DatasetNode(TreeNode): """ A tree node, but optionally containing data in the form of an xarray.Dataset. 
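
The `get_node`/`set_node` API that patch 016 settled on, and that this patch moves into its own `treenode.py` module, is exercised by the new `test_treenode.py` below. As a quick illustration of how the two methods compose, here is a minimal sketch; it assumes the standalone `datatree` package from these patches is importable, and the node names are invented for the example:

    from datatree.treenode import TreeNode

    root = TreeNode("root")

    # set_node walks the given path from `root`, creating the intermediate
    # "folder" node on the way (new_nodes_along_path defaults to True), then
    # attaches the supplied node beneath it under its own name
    root.set_node("folder", TreeNode("run1"))

    # the same node can then be reached via a unix-like path or a tuple of tags
    assert root.get_node("folder/run1") is root.get_node(("folder", "run1"))
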
diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 9025669561c..a2bb998edff 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -3,7 +3,7 @@ from anytree.node.exceptions import TreeError from anytree.resolver import ChildResolverError -from datatree.datatree import TreeNode +from datatree.treenode import TreeNode class TestFamilyTree: diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py new file mode 100644 index 00000000000..c17750c94a2 --- /dev/null +++ b/xarray/datatree_/datatree/treenode.py @@ -0,0 +1,202 @@ +from __future__ import annotations + +from typing import Sequence, Tuple, Hashable, Union, Iterable + +import anytree + + +PathType = Union[Hashable, Sequence[Hashable]] + + +class TreeNode(anytree.NodeMixin): + """ + Base class representing a node of a tree, with methods for traversing and altering the tree. + + Depends on the anytree library for basic tree structure, but the parent class is fairly small + so could be easily reimplemented to avoid a hard dependency. + + Adds restrictions preventing children with the same name, a method to set new nodes at arbitrary depth, + and access via unix-like paths or tuples of tags. Does not yet store anything in the nodes of the tree. + """ + + # TODO remove anytree dependency + # TODO allow for loops via symbolic links? + + # TODO store children with their names in an OrderedDict instead of a tuple like anytree does? + # TODO do nodes even need names? Or can they just be referred to by the tags their parents store them under? + # TODO nodes should have names but they should be optional. Getting and setting should be down without reference to + # the names of stored objects, only their tags (i.e. position in the family tree) + # Ultimately you either need a list of named children, or a dictionary of unnamed children + + _resolver = anytree.Resolver('name') + + def __init__( + self, + name: Hashable, + parent: TreeNode = None, + children: Iterable[TreeNode] = None, + ): + if not isinstance(name, str) or '/' in name: + raise ValueError(f"invalid name {name}") + self.name = name + + self.parent = parent + if children: + self.children = children + + def __str__(self): + return f"TreeNode('{self.name}')" + + def __repr__(self): + return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" + + def render(self): + """Print tree structure, with only node names displayed.""" + # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node + # TODO add option to suppress dataset information beyond just variable names + #for pre, _, node in anytree.RenderTree(self): + # print(f"{pre}{node}") + args = ["%r" % self.separator.join([""] + [str(node.name) for node in self.path])] + print(anytree.node.util._repr(self, args=args, nameblacklist=["name"])) + + def _pre_attach(self, parent: TreeNode) -> None: + """ + Method which superclass calls before setting parent, here used to prevent having two + children with duplicate names. 
+        """
+        if self.name in list(c.name for c in parent.children):
+            raise KeyError(f"parent {str(parent)} already has a child named {self.name}")
+
+    def add_child(self, child: TreeNode) -> None:
+        """Add a single child node below this node, without replacement."""
+        if child.name in list(c.name for c in self.children):
+            raise KeyError(f"Node already has a child named {child.name}")
+        else:
+            child.parent = self
+
+    @classmethod
+    def _tuple_or_path_to_path(cls, address: PathType) -> str:
+        if isinstance(address, str):
+            return address
+        elif isinstance(address, tuple):
+            return cls.separator.join(tag for tag in address)
+        else:
+            raise ValueError(f"{address} is not a valid form of path")
+
+    def get_node(self, path: PathType) -> TreeNode:
+        """
+        Access node of the tree lying at the given path.
+
+        Raises a TreeError if not found.
+
+        Parameters
+        ----------
+        path :
+            Paths can be given as unix-like paths, or as tuples of strings
+            (where each string is known as a single "tag"). Path includes the name of the target node.
+
+        Returns
+        -------
+        node
+        """
+        p = self._tuple_or_path_to_path(path)
+        return anytree.Resolver('name').get(self, p)
+
+    def set_node(
+        self,
+        path: PathType = '/',
+        node: TreeNode = None,
+        new_nodes_along_path: bool = True,
+        allow_overwrite: bool = True,
+    ) -> None:
+        """
+        Set a node on the tree, overwriting anything already present at that path.
+
+        The given value either forms a new node of the tree or overwrites an existing node at that location.
+
+        Paths are specified relative to the node on which this method was called, and the name of the node forms the
+        last part of the path (i.e. `.set_node(path='', TreeNode('a'))` is equivalent to `.add_child(TreeNode('a'))`).
+
+        Parameters
+        ----------
+        path : Union[Hashable, Sequence[Hashable]]
+            Path names can be given as unix-like paths, or as tuples of strings (where each string
+            is known as a single "tag"). Default is '/'.
+        node : TreeNode
+        new_nodes_along_path : bool
+            If true, then if necessary new nodes will be created along the given path, until the tree can reach the
+            specified location. If false then an error is thrown instead of creating intermediate nodes along the path.
+        allow_overwrite : bool
+            Whether or not to overwrite any existing node at the location given by path. Default is True.
+
+        Raises
+        ------
+        KeyError
+            If a node already exists at the given path
+        """
+
+        # Determine full path of new object
+        path = self._tuple_or_path_to_path(path)
+
+        if not isinstance(node, TreeNode):
+            raise ValueError(f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}")
+        node_name = node.name
+
+        # Walk to location of new node, creating intermediate node objects as we go if necessary
+        parent = self
+        tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']]
+        for tag in tags:
+            # TODO will this mutation within a for loop actually work?
+ if tag not in [child.name for child in parent.children]: + if new_nodes_along_path: + # TODO prevent this from leaving a trail of nodes if the assignment fails somehow + parent.add_child(TreeNode(name=tag)) + else: + raise KeyError(f"Cannot reach new node at path {path}: " + f"parent {parent} has no child {tag}") + parent = parent.get_node(tag) + + # Deal with anything already existing at this location + if node_name in [child.name for child in parent.children]: + if allow_overwrite: + child = parent.get_node(node_name) + child.parent = None + del child + else: + # TODO should this be before we walk to the new node? + raise KeyError(f"Cannot set item at {path} whilst that path already points to a " + f"{type(parent.get_node(node_name))} object") + + # Place new child node at this location + parent.add_child(node) + + def glob(self, path: str): + return self._resolver.glob(self, path) + + @property + def tags(self) -> Tuple[Hashable]: + """All tags, returned in order starting from the root node""" + return tuple(self.path.split(self.separator)) + + @tags.setter + def tags(self, value): + raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") + + # TODO re-implement using anytree findall function + def get_all(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains all of the given tags, + where the tags can be present in any order. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if all(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + # TODO re-implement using anytree find function + def get_any(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains any of the given tags. + """ + matching_children = {c.tags: c.get(tags) for c in self._walk_children() + if any(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) From a5a3a69b5c1dded90c9c062377d3cc4f33c6ea92 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 20 Aug 2021 02:15:08 -0400 Subject: [PATCH 018/260] prevent circular import of DataTree --- xarray/datatree_/datatree/datatree.py | 21 ++++++++++++++++++++- xarray/datatree_/datatree/treenode.py | 22 ++++------------------ 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index e3acf439e90..e37800e3a27 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -239,6 +239,25 @@ def render(self): for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") + # TODO re-implement using anytree findall function? + def get_all(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains all of the given tags, + where the tags can be present in any order. + """ + matching_children = {c.tags: c.get_node(tags) for c in self.descendants + if all(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + + # TODO re-implement using anytree find function? + def get_any(self, *tags: Hashable) -> DataTree: + """ + Return a DataTree containing the stored objects whose path contains any of the given tags. 
+ """ + matching_children = {c.tags: c.get_node(tags) for c in self.descendants + if any(tag in c.tags for tag in tags)} + return DataTree(data_objects=matching_children) + class DataTree(DatasetNode): """ @@ -315,7 +334,7 @@ def as_dataarray(self) -> DataArray: @property def groups(self): """Return all netCDF4 groups in the tree, given as a tuple of path-like strings.""" - return tuple(node.path for node in anytree.iterators.PreOrderIter(self)) + return tuple(node.path for node in self.subtree_nodes) def to_netcdf(self, filename: str): from .io import _datatree_to_netcdf diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index c17750c94a2..1a7199c9e0f 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -182,21 +182,7 @@ def tags(self) -> Tuple[Hashable]: def tags(self, value): raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") - # TODO re-implement using anytree findall function - def get_all(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains all of the given tags, - where the tags can be present in any order. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if all(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) - - # TODO re-implement using anytree find function - def get_any(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains any of the given tags. - """ - matching_children = {c.tags: c.get(tags) for c in self._walk_children() - if any(tag in c.tags for tag in tags)} - return DataTree(data_objects=matching_children) + @property + def subtree_nodes(self): + """An iterator over all nodes in this tree, including both self and descendants.""" + return anytree.iterators.PreOrderIter(self) From b9b738074795f76c52931dc273ce2b1850c4a7a0 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 20 Aug 2021 17:01:13 -0400 Subject: [PATCH 019/260] added .pathstr --- .../datatree_/datatree/tests/test_treenode.py | 17 ++++++++++++++++- xarray/datatree_/datatree/treenode.py | 9 ++++++++- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index a2bb998edff..56768f10d34 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -229,6 +229,13 @@ def test_dont_overwrite_child(self): class TestPaths: + def test_pathstr(self): + john = TreeNode("john") + mary = TreeNode("mary", parent=john) + rose = TreeNode("rose", parent=mary) + sue = TreeNode("sue", parent=rose) + assert sue.pathstr == "john/mary/rose/sue" + def test_relative_path(self): ... @@ -237,5 +244,13 @@ class TestTags: ... +@pytest.mark.xfail class TestRenderTree: - ... + def test_render_nodetree(self): + mary = TreeNode("mary") + kate = TreeNode("kate") + john = TreeNode("john", children=[mary, kate]) + sam = TreeNode("Sam", parent=mary) + ben = TreeNode("Ben", parent=mary) + john.render() + raise NotImplementedError diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 1a7199c9e0f..df3d9e3644f 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -28,6 +28,8 @@ class TreeNode(anytree.NodeMixin): # the names of stored objects, only their tags (i.e. 
position in the family tree) # Ultimately you either need a list of named children, or a dictionary of unnamed children + # TODO change .path in the parent class to behave like .path_str does here. (old .path -> .walk_path()) + _resolver = anytree.Resolver('name') def __init__( @@ -50,6 +52,11 @@ def __str__(self): def __repr__(self): return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" + @property + def pathstr(self) -> str: + """Path from root to this node, as a filepath-like string.""" + return '/'.join(self.tags) + def render(self): """Print tree structure, with only node names displayed.""" # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node @@ -176,7 +183,7 @@ def glob(self, path: str): @property def tags(self) -> Tuple[Hashable]: """All tags, returned in order starting from the root node""" - return tuple(self.path.split(self.separator)) + return tuple(node.name for node in self.path) @tags.setter def tags(self, value): From 0c0b072c83f37bed636f26ddc9a3803a5b8a0115 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 20 Aug 2021 17:01:40 -0400 Subject: [PATCH 020/260] build DataTrees from DataTree.__init__ --- xarray/datatree_/datatree/datatree.py | 104 ++++++++++++++---- .../datatree_/datatree/tests/test_datatree.py | 68 +++++++----- 2 files changed, 127 insertions(+), 45 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index d8fdf593c73..620405c67bd 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,5 +1,6 @@ from __future__ import annotations import functools +import textwrap from typing import Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict @@ -54,7 +55,38 @@ def _map_over_subtree(tree, func, *args, **kwargs): def map_over_subtree(func): - """Decorator to turn a function which acts on (and returns) single Datasets into one which acts on DataTrees.""" + """ + Decorator which turns a function which acts on (and returns) single Datasets into one which acts on DataTrees. + + Applies a function to every dataset in this subtree, returning a new tree which stores the results. + + The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the + descendant nodes. The returned tree will have the same structure as the original subtree. + + func needs to return a Dataset in order to rebuild the subtree. + + Parameters + ---------- + func : callable + Function to apply to datasets with signature: + `func(node.ds, *args, **kwargs) -> Dataset`. + + Function will not be applied to any nodes without datasets. + *args : tuple, optional + Positional arguments passed on to `func`. + **kwargs : Any + Keyword arguments passed on to `func`. + + Returns + ------- + mapped : callable + Wrapped function which returns tree created from results of applying ``func`` to the dataset at each node. 
+ + See also + -------- + DataTree.map_over_subtree + DataTree.map_over_subtree_inplace + """ return functools.wraps(func)(_map_over_subtree) @@ -71,7 +103,11 @@ class DatasetNode(TreeNode): _DS_PROPERTIES = ['variables', 'attrs', 'encoding', 'dims', 'sizes'] # TODO add all the other methods to dispatch - _DS_METHODS_TO_MAP_OVER_SUBTREES = ['isel', 'sel', 'min', 'max', '__array_ufunc__'] + _DS_METHODS_TO_MAP_OVER_SUBTREES = ['isel', 'sel', 'min', 'max', 'mean', '__array_ufunc__'] + _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " + "call the method on the Datasets stored in every node of the subtree. " + "See the datatree.map_over_subtree decorator for more details.", + width=117) # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? @@ -93,9 +129,16 @@ def __init__( # Enable dataset API methods for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: + # Expose Dataset method, but wrapped to map over whole subtree ds_method = getattr(Dataset, method_name) setattr(self, method_name, map_over_subtree(ds_method)) + # Add a line to the method's docstring explaining how it's been mapped + ds_method_docstring = getattr(Dataset, f'{method_name}').__doc__ + if ds_method_docstring is not None: + updated_method_docstring = ds_method_docstring.replace('\n', self._MAPPED_DOCSTRING_ADDENDUM, 1) + setattr(self, f'{method_name}.__doc__', updated_method_docstring) + @property def ds(self) -> Dataset: return self._ds @@ -110,7 +153,7 @@ def ds(self, data: Union[Dataset, DataArray] = None): @property def has_data(self): - return self.ds is None + return self.ds is not None def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[TreeNode, Dataset, DataArray]: """ @@ -131,18 +174,19 @@ def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[Tre # dict-like to variables return self.ds[key] elif utils.hashable(key): - if key in self.ds: + print(self.has_data) + if self.has_data and key in self.ds.data_vars: # hashable variable return self.ds[key] else: # hashable child name (or path-like) - return self.get(key) + return self.get_node(key) else: # iterable of hashables first_key, *_ = key if first_key in self.children: # iterable of child tags - return self.get(key) + return self.get_node(key) else: # iterable of variable names return self.ds[key] @@ -170,12 +214,12 @@ def __setitem__( if isinstance(value, (DataArray, Variable)): self.ds[key] = value elif isinstance(value, TreeNode): - self.set(path=key, value=value) + self.set_node(path=key, node=value) elif isinstance(value, Dataset): # TODO fix this splitting up of path *path_to_new_node, node_name = key new_node = DatasetNode(name=node_name, data=value, parent=self) - self.set(path=key, value=new_node) + self.set_node(path=key, node=new_node) else: raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " f"not {type(value)}") @@ -215,7 +259,7 @@ def map_over_subtree( return _map_over_subtree(self, func, *args, **kwargs) - def map_inplace_over_subtree( + def map_over_subtree_inplace( self, func: Callable, *args: Iterable[Any], @@ -246,10 +290,16 @@ def map_inplace_over_subtree( # TODO map applied ufuncs over all leaves def __str__(self): - return f"DatasetNode('{self.name}', data={self.ds})" + return f"DatasetNode('{self.name}', data={type(self.ds)})" def __repr__(self): - return f"TreeNode(name='{self.name}', data={str(self.ds)}, parent={str(self.parent)}, 
children={[str(c) for c in self.children]})"
+        # TODO update this to indent nicely
+        return f"TreeNode(\n" \
+               f"    name='{self.name}',\n" \
+               f"    data={str(self.ds)},\n" \
+               f"    parent={str(self.parent)},\n" \
+               f"    children={tuple(str(c) for c in self.children)}\n" \
+               f")"
 
     def render(self):
         """Print tree structure, including any data stored at each node."""
@@ -291,32 +341,48 @@ class DataTree(DatasetNode):
         is known as a single "tag"). If path names containing more than one tag are given, new tree nodes
         will be constructed as necessary.
 
-        To assign data to the root node of the tree use an empty string as the path.
+        To assign data to the root node of the tree, use the tree's name (default "root") as the path.
     name : Hashable, optional
        Name for the root node of the tree. Default is "root"
     """
 
-    # TODO Add attrs dict by inheriting from xarray.core.common.AttrsAccessMixin
+    # TODO Add attrs dict
+
+    # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?)
+
+    # TODO ipython autocomplete for child nodes
 
     # TODO Some way of sorting children by depth
 
     # TODO Consistency in copying vs updating objects
 
-    # TODO ipython autocomplete for child nodes
-
     def __init__(
         self,
-        data_objects: Dict[PathType, Union[Dataset, DataArray, DatasetNode, None]] = None,
+        data_objects: Dict[PathType, Union[Dataset, DataArray]] = None,
         name: Hashable = "root",
     ):
-        root_data = data_objects.pop("", None)
+        if data_objects is not None:
+            root_data = data_objects.pop(name, None)
+        else:
+            root_data = None
         super().__init__(name=name, data=root_data, parent=None, children=None)
 
         # TODO re-implement using anytree.DictImporter?
         if data_objects:
             # Populate tree with children determined from data_objects mapping
-            for path in sorted(data_objects):
-                self._set_item(path, data_objects[path], allow_overwrite=False, new_nodes_along_path=True)
+            for path, data in data_objects.items():
+                # Determine name of new node
+                path = self._tuple_or_path_to_path(path)
+                if self.separator in path:
+                    node_path, node_name = path.rsplit(self.separator, maxsplit=1)
+                else:
+                    node_path, node_name = '/', path
+
+                # Create and set new node
+                new_node = DatasetNode(name=node_name, data=data)
+                self.set_node(node_path, new_node, allow_overwrite=False, new_nodes_along_path=True)
+                new_node = self.get_node(path)
+                new_node[path] = data
 
     # TODO do we need to watch out for methods intended only for root nodes being called on non-root nodes?
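
With the constructor above, a whole tree can now be built in one go from a mapping of path-like strings to data objects, creating intermediate nodes as needed; this is what the new tests below exercise. A minimal sketch of that usage, mirroring `test_two_layers`, assuming xarray and the `datatree` package from these patches are importable (the dataset contents are placeholders):

    import xarray as xr
    from datatree import DataTree

    dat1 = xr.Dataset({'a': 1})
    dat2 = xr.Dataset({'a': [1, 2]})

    # 'highres' and 'lowres' are created automatically as intermediate nodes,
    # each holding a 'run' child that stores the corresponding dataset
    dt = DataTree({"highres/run": dat1, "lowres/run": dat2})

    assert 'highres' in [c.name for c in dt.children]
    assert dt.get_node('highres/run').ds is dat1
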
diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 835b7dd32a1..8c59801b5fd 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -41,11 +41,11 @@ def create_test_datatree(): root_data = xr.Dataset({'a': ('y', [6, 7, 8]), 'set1': ('x', [9, 10])}) # Avoid using __init__ so we can independently test it - root = DataTree(data_objects={'/': root_data}) + root = DataTree(data_objects={'root': root_data}) set1 = DatasetNode(name="set1", parent=root, data=set1_data) set1_set1 = DatasetNode(name="set1", parent=set1) - set1_set2 = DatasetNode(name="set1", parent=set1) - set2 = DatasetNode(name="set1", parent=root, data=set2_data) + set1_set2 = DatasetNode(name="set2", parent=set1) + set2 = DatasetNode(name="set2", parent=root, data=set2_data) set2_set1 = DatasetNode(name="set1", parent=set2) set3 = DatasetNode(name="set3", parent=root) @@ -68,13 +68,30 @@ def test_set_data(self): with pytest.raises(TypeError): john.ds = "junk" + def test_has_data(self): + john = DatasetNode("john", data=xr.Dataset({'a': 0})) + assert john.has_data + + john = DatasetNode("john", data=None) + assert not john.has_data + class TestGetItems: ... class TestSetItems: - ... + def test_set_dataset(self): + ... + + def test_set_named_dataarray(self): + ... + + def test_set_unnamed_dataarray(self): + ... + + def test_set_node(self): + ... class TestTreeCreation: @@ -86,28 +103,35 @@ def test_empty(self): assert dt.ds is None def test_data_in_root(self): - dt = DataTree({"root": xr.Dataset()}) - print(dt.name) + dat = xr.Dataset() + dt = DataTree({"root": dat}) assert dt.name == "root" assert dt.parent is None - - child = dt.children[0] - assert dt.children is (TreeNode('root')) - assert dt.ds is xr.Dataset() + assert dt.children is () + assert dt.ds is dat def test_one_layer(self): - dt = DataTree({"run1": xr.Dataset(), "run2": xr.DataArray()}) + dat1, dat2 = xr.Dataset({'a': 1}), xr.Dataset({'b': 2}) + dt = DataTree({"run1": dat1, "run2": dat2}) + assert dt.ds is None + assert dt['run1'].ds is dat1 + assert dt['run2'].ds is dat2 def test_two_layers(self): - dt = DataTree({"highres/run1": xr.Dataset(), "highres/run2": xr.Dataset()}) - - dt = DataTree({"highres/run1": xr.Dataset(), "lowres/run1": xr.Dataset()}) - assert dt.children == ... + dat1, dat2 = xr.Dataset({'a': 1}), xr.Dataset({'a': [1, 2]}) + dt = DataTree({"highres/run": dat1, "lowres/run": dat2}) + assert 'highres' in [c.name for c in dt.children] + assert 'lowres' in [c.name for c in dt.children] + highres_run = dt.get_node('highres/run') + assert highres_run.ds is dat1 def test_full(self): dt = create_test_datatree() - print(dt) - assert False + paths = list(node.pathstr for node in dt.subtree_nodes) + assert paths == ['root', 'root/set1', 'root/set1/set1', + 'root/set1/set2', + 'root/set2', 'root/set2/set1', + 'root/set3'] class TestBrowsing: @@ -118,16 +142,8 @@ class TestRestructuring: ... 
+@pytest.mark.xfail class TestRepr: - def test_render_nodetree(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - john = TreeNode("john", children=[mary, kate]) - sam = TreeNode("Sam", parent=mary) - ben = TreeNode("Ben", parent=mary) - john.render() - assert False - def test_render_datatree(self): dt = create_test_datatree() dt.render() From 5ca5e636da60229983e17ba4012bda1511242d57 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sun, 22 Aug 2021 02:15:17 -0400 Subject: [PATCH 021/260] __getitem__ on datatree now works --- xarray/datatree_/datatree/datatree.py | 56 ++++++++++++------- .../datatree_/datatree/tests/test_datatree.py | 46 ++++++++++++++- xarray/datatree_/datatree/treenode.py | 2 + 3 files changed, 83 insertions(+), 21 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 620405c67bd..ecd83fe00b5 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -157,39 +157,55 @@ def has_data(self): def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[TreeNode, Dataset, DataArray]: """ - Access either child nodes, or variables or coordinates stored in this node. + Access either child nodes, variables, or coordinates stored in this tree. - Variable or coordinates of the contained dataset will be returned as a :py:class:`~xarray.DataArray`. + Variables or coordinates of the contained dataset will be returned as a :py:class:`~xarray.DataArray`. Indexing with a list of names will return a new ``Dataset`` object. + Like Dataset.__getitem__ this method also accepts dict-like indexing, and selection of multiple data variables + (from the same Dataset node) via list. + Parameters ---------- key : - If a path to child node then names can be given as unix-like paths, or as tuples of strings + Paths to nodes or to data variables in nodes can be given as unix-like paths, or as tuples of strings (where each string is known as a single "tag"). - """ # Either: if utils.is_dict_like(key): - # dict-like to variables + # dict-like selection on dataset variables return self.ds[key] elif utils.hashable(key): - print(self.has_data) - if self.has_data and key in self.ds.data_vars: - # hashable variable - return self.ds[key] - else: - # hashable child name (or path-like) - return self.get_node(key) + # path-like: a path to a node possibly with a variable name at the end + return self._get_item_from_path(key) + elif utils.is_list_like(key) and all(k in self.ds for k in key): + # iterable of variable names + return self.ds[key] + elif utils.is_list_like(key) and all('/' not in tag for tag in key): + # iterable of child tags + return self._get_item_from_path(key) else: - # iterable of hashables - first_key, *_ = key - if first_key in self.children: - # iterable of child tags - return self.get_node(key) - else: - # iterable of variable names - return self.ds[key] + raise ValueError("Invalid format for key") + + def _get_item_from_path(self, path: PathType) -> Union[TreeNode, Dataset, DataArray]: + """Get item given a path. 
Two valid cases: either all parts of path are nodes or last part is a variable.""" + + # TODO this currently raises a ChildResolverError if it can't find a data variable in the ds - that's inconsistent with xarray.Dataset.__getitem__ + + path = self._tuple_or_path_to_path(path) + + tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] + *leading_tags, last_tag = tags + + if leading_tags is not None: + penultimate = self.get_node(tuple(leading_tags)) + else: + penultimate = self + + if penultimate.has_data and last_tag in penultimate.ds: + return penultimate.ds[last_tag] + else: + return penultimate.get_node(last_tag) def __setitem__( self, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 8c59801b5fd..61d2bb98158 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,6 +1,9 @@ import pytest import xarray as xr +from xarray.testing import assert_identical + +from anytree.resolver import ChildResolverError from datatree import DataTree from datatree.datatree import DatasetNode @@ -77,7 +80,48 @@ def test_has_data(self): class TestGetItems: - ... + def test_get_node(self): + folder1 = DatasetNode("folder1") + results = DatasetNode("results", parent=folder1) + highres = DatasetNode("highres", parent=results) + assert folder1["results"] is results + assert folder1["results/highres"] is highres + assert folder1[("results", "highres")] is highres + + def test_get_single_data_variable(self): + data = xr.Dataset({"temp": [0, 50]}) + results = DatasetNode("results", data=data) + assert_identical(results["temp"], data["temp"]) + + def test_get_single_data_variable_from_node(self): + data = xr.Dataset({"temp": [0, 50]}) + folder1 = DatasetNode("folder1") + results = DatasetNode("results", parent=folder1) + highres = DatasetNode("highres", parent=results, data=data) + assert_identical(folder1["results/highres/temp"], data["temp"]) + assert_identical(folder1[("results", "highres", "temp")], data["temp"]) + + def test_get_nonexistent_node(self): + folder1 = DatasetNode("folder1") + results = DatasetNode("results", parent=folder1) + with pytest.raises(ChildResolverError): + folder1["results/highres"] + + def test_get_nonexistent_variable(self): + data = xr.Dataset({"temp": [0, 50]}) + results = DatasetNode("results", data=data) + with pytest.raises(ChildResolverError): + results["pressure"] + + def test_get_multiple_data_variables(self): + data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) + results = DatasetNode("results", data=data) + assert_identical(results[['temp', 'p']], data[['temp', 'p']]) + + def test_dict_like_selection_access_to_dataset(self): + data = xr.Dataset({"temp": [0, 50]}) + results = DatasetNode("results", data=data) + assert_identical(results[{'temp': 1}], data[{'temp': 1}]) class TestSetItems: diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index df3d9e3644f..0b08da0b0d9 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -106,6 +106,8 @@ def get_node(self, path: PathType) -> TreeNode: ------- node """ + # TODO change so this raises a standard KeyError instead of a ChildResolverError when it can't find an item + p = self._tuple_or_path_to_path(path) return anytree.Resolver('name').get(self, p) From fb6c404c86b30416a8f98a56f3a0777d3667fe2f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sun, 22 Aug 2021 04:07:43 -0400 
Subject: [PATCH 022/260] __setitem__ for data --- xarray/datatree_/datatree/datatree.py | 91 ++++++++++++++----- .../datatree_/datatree/tests/test_datatree.py | 65 +++++++++++-- .../datatree_/datatree/tests/test_treenode.py | 4 + xarray/datatree_/datatree/treenode.py | 9 +- 4 files changed, 138 insertions(+), 31 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ecd83fe00b5..3892e357ab2 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -42,7 +42,7 @@ def _map_over_subtree(tree, func, *args, **kwargs): out_tree = DataTree(name=tree.name, data_objects={}) for node in tree.subtree_nodes: - relative_path = tree.path.replace(node.path, '') + relative_path = tree.pathstr.replace(node.pathstr, '') if node.has_data: result = func(node.ds, *args, **kwargs) @@ -193,7 +193,6 @@ def _get_item_from_path(self, path: PathType) -> Union[TreeNode, Dataset, DataAr # TODO this currently raises a ChildResolverError if it can't find a data variable in the ds - that's inconsistent with xarray.Dataset.__getitem__ path = self._tuple_or_path_to_path(path) - tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] *leading_tags, last_tag = tags @@ -210,41 +209,91 @@ def _get_item_from_path(self, path: PathType) -> Union[TreeNode, Dataset, DataAr def __setitem__( self, key: Union[Hashable, List[Hashable], Mapping, PathType], - value: Union[TreeNode, Dataset, DataArray, Variable] + value: Union[TreeNode, Dataset, DataArray, Variable], ) -> None: """ - Add either a child node or an array to this node. + Add either a child node or an array to the tree, at any position. + + Data can be added anywhere, and new nodes will be created to cross the path to the new location if necessary. + + If there is already a node at the given location, then if value is a Node class or Dataset it will overwrite the + data already present at that node, and if value is a single array, it will be merged with it. Parameters ---------- key - Either a path-like address for a new node, or the name of a new variable. + A path-like address for either a new node, or the address and name of a new variable, or the name of a new + variable. value - If a node class or a Dataset, it will be added as a new child node. - If an single array (i.e. DataArray, Variable), it will be added to the underlying Dataset. + Can be a node class or a data object (i.e. Dataset, DataArray, Variable). """ + + # TODO xarray.Dataset accepts other possibilities, how do we exactly replicate all the behaviour? if utils.is_dict_like(key): - # TODO xarray.Dataset accepts other possibilities, how do we exactly replicate the behaviour? raise NotImplementedError - else: - if isinstance(value, (DataArray, Variable)): - self.ds[key] = value + + path = self._tuple_or_path_to_path(key) + tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] + + # TODO a .path_as_tags method? 
+ if not tags: + # only dealing with this node, no need for paths + if isinstance(value, (Dataset, DataArray, Variable)): + # single arrays will replace whole Datasets, as no name for new variable was supplied + self.ds = value elif isinstance(value, TreeNode): - self.set_node(path=key, node=value) - elif isinstance(value, Dataset): - # TODO fix this splitting up of path - *path_to_new_node, node_name = key - new_node = DatasetNode(name=node_name, data=value, parent=self) - self.set_node(path=key, node=new_node) + self.add_child(value) else: raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " f"not {type(value)}") + else: + *path_tags, last_tag = tags + if not path_tags: + path_tags = '/' + + # get anything that already exists at that location + try: + if isinstance(value, TreeNode): + # last tag is the name of the supplied node + existing_node = self.get_node(path) + else: + existing_node = self.get_node(tuple(path_tags)) + except anytree.resolver.ResolverError: + existing_node = None + + if existing_node: + if isinstance(value, Dataset): + # replace whole dataset + existing_node.ds = Dataset + elif isinstance(value, (DataArray, Variable)): + if not existing_node.has_data: + # promotes da to ds + existing_node.ds = value + else: + # update with new da + existing_node.ds[last_tag] = value + elif isinstance(value, TreeNode): + # overwrite with new node at same path + self.set_node(path=path, node=value) + else: + raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}") + else: + # if nothing there then make new node based on type of object + if isinstance(value, (Dataset, DataArray, Variable)): + new_node = DatasetNode(name=last_tag, data=value) + self.set_node(path=path_tags, node=new_node) + elif isinstance(value, TreeNode): + self.set_node(path=path, node=value) + else: + raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}") def map_over_subtree( - self, - func: Callable, - *args: Iterable[Any], - **kwargs: Any, + self, + func: Callable, + *args: Iterable[Any], + **kwargs: Any, ) -> DataTree: """ Apply a function to every dataset in this subtree, returning a new tree which stores the results. diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 61d2bb98158..7675e16d6ef 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -125,17 +125,66 @@ def test_dict_like_selection_access_to_dataset(self): class TestSetItems: - def test_set_dataset(self): - ... + # TODO test tuple-style access too + def test_set_new_child_node(self): + john = DatasetNode("john") + mary = DatasetNode("mary") + john['/'] = mary + assert john['mary'] is mary - def test_set_named_dataarray(self): - ... + def test_set_new_grandchild_node(self): + john = DatasetNode("john") + mary = DatasetNode("mary", parent=john) + rose = DatasetNode("rose") + john['/mary/'] = rose + assert john['mary/rose'] is rose - def test_set_unnamed_dataarray(self): - ... + def test_set_dataset_on_this_node(self): + data = xr.Dataset({"temp": [0, 50]}) + results = DatasetNode("results") + results['/'] = data + assert results.ds is data + + def test_set_dataset_as_new_node(self): + data = xr.Dataset({"temp": [0, 50]}) + folder1 = DatasetNode("folder1") + folder1['results'] = data + assert folder1['results'].ds is data - def test_set_node(self): - ... 
+ def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): + data = xr.Dataset({"temp": [0, 50]}) + folder1 = DatasetNode("folder1") + folder1['results/highres'] = data + assert folder1['results/highres'].ds is data + + def test_set_named_dataarray_as_new_node(self): + data = xr.DataArray(name='temp', data=[0, 50]) + folder1 = DatasetNode("folder1") + folder1['results'] = data + assert_identical(folder1['results'].ds, data.to_dataset()) + + def test_set_unnamed_dataarray(self): + data = xr.DataArray([0, 50]) + folder1 = DatasetNode("folder1") + with pytest.raises(ValueError, match="unable to convert"): + folder1['results'] = data + + def test_add_new_variable_to_empty_node(self): + results = DatasetNode("results") + results['/'] = xr.DataArray(name='pressure', data=[2, 3]) + assert 'pressure' in results.ds + + # What if there is a path to traverse first? + results = DatasetNode("results") + results['/highres/'] = xr.DataArray(name='pressure', data=[2, 3]) + assert 'pressure' in results['highres'].ds + + def test_dataarray_replace_existing_node(self): + t = xr.Dataset({"temp": [0, 50]}) + results = DatasetNode("results", data=t) + p = xr.DataArray(name='pressure', data=[2, 3]) + results['/'] = p + assert_identical(results.ds, p.to_dataset()) class TestTreeCreation: diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 56768f10d34..dd029c76e4a 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -228,6 +228,10 @@ def test_dont_overwrite_child(self): assert marys_evil_twin not in john.children +class TestPruning: + ... + + class TestPaths: def test_pathstr(self): john = TreeNode("john") diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 0b08da0b0d9..1a6285c9147 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -57,6 +57,10 @@ def pathstr(self) -> str: """Path from root to this node, as a filepath-like string.""" return '/'.join(self.tags) + @property + def has_data(self): + return False + def render(self): """Print tree structure, with only node names displayed.""" # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node @@ -85,10 +89,11 @@ def add_child(self, child: TreeNode) -> None: def _tuple_or_path_to_path(cls, address: PathType) -> str: if isinstance(address, str): return address - elif isinstance(address, tuple): + # TODO check for iterable in general instead + elif isinstance(address, (tuple, list)): return cls.separator.join(tag for tag in address) else: - raise ValueError(f"{address} is not a valid form of path") + raise TypeError(f"{address} is not a valid form of path") def get_node(self, path: PathType) -> TreeNode: """ From 109666a701147b7d326d02ea767249621c426a35 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sun, 22 Aug 2021 21:32:45 -0400 Subject: [PATCH 023/260] printable representations of trees --- xarray/datatree_/datatree/datatree.py | 58 +++++++++++++++---- .../datatree_/datatree/tests/test_datatree.py | 36 +++++++++++- .../datatree_/datatree/tests/test_treenode.py | 12 +++- xarray/datatree_/datatree/treenode.py | 22 +++---- 4 files changed, 100 insertions(+), 28 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 3892e357ab2..6c5cfe0e0ce 100644 --- a/xarray/datatree_/datatree/datatree.py +++ 
b/xarray/datatree_/datatree/datatree.py @@ -155,6 +155,52 @@ def ds(self, data: Union[Dataset, DataArray] = None): def has_data(self): return self.ds is not None + def __str__(self): + """A printable representation of the structure of this entire subtree.""" + renderer = anytree.RenderTree(self) + + lines = [] + for pre, fill, node in renderer: + node_repr = node._single_node_repr() + + node_line = f"{pre}{node_repr.splitlines()[0]}" + lines.append(node_line) + + if node.has_data: + ds_repr = node_repr.splitlines()[2:] + for line in ds_repr: + if len(node.children) > 0: + lines.append(f"{fill}{renderer.style.vertical}{line}") + else: + lines.append(f"{fill}{line}") + + return "\n".join(lines) + + def _single_node_repr(self): + """Information about this node, not including its relationships to other nodes.""" + node_info = f"DatasetNode('{self.name}')" + + if self.has_data: + ds_info = '\n' + repr(self.ds) + else: + ds_info = '' + return node_info + ds_info + + def __repr__(self): + """Information about this node, including its relationships to other nodes.""" + # TODO redo this to look like the Dataset repr, but just with child and parent info + parent = self.parent.name if self.parent else "None" + node_str = f"DatasetNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," + + if self.has_data: + ds_repr_lines = self.ds.__repr__().splitlines() + ds_repr = ds_repr_lines[0] + '\n' + textwrap.indent('\n'.join(ds_repr_lines[1:]), " ") + data_str = f"\ndata={ds_repr}\n)" + else: + data_str = "data=None)" + + return node_str + data_str + def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[TreeNode, Dataset, DataArray]: """ Access either child nodes, variables, or coordinates stored in this tree. @@ -354,18 +400,6 @@ def map_over_subtree_inplace( # TODO map applied ufuncs over all leaves - def __str__(self): - return f"DatasetNode('{self.name}', data={type(self.ds)})" - - def __repr__(self): - # TODO update this to indent nicely - return f"TreeNode(\n" \ - f" name='{self.name}',\n" \ - f" data={str(self.ds)},\n" \ - f" parent={str(self.parent)},\n" \ - f" children={tuple(str(c) for c in self.children)}\n" \ - f")" - def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 7675e16d6ef..e1d437c0ba4 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -235,11 +235,41 @@ class TestRestructuring: ... 
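To illustrate the rendering added above, a minimal sketch using the DatasetNode class as it exists at this point in the series:

    import xarray as xr
    from datatree.datatree import DatasetNode

    root = DatasetNode("root", data=xr.Dataset({"a": ("x", [0, 2])}))
    DatasetNode("results", parent=root)

    # print() calls __str__, which walks the subtree with anytree.RenderTree:
    # each node contributes a name line, followed by lines from its Dataset repr.
    print(root)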
-@pytest.mark.xfail class TestRepr: - def test_render_datatree(self): + def test_print_empty_node(self): + dt = DatasetNode('root') + printout = dt.__str__() + assert printout == "DatasetNode('root')" + + def test_print_node_with_data(self): + dat = xr.Dataset({'a': [0, 2]}) + dt = DatasetNode('root', data=dat) + printout = dt.__str__() + expected = ["DatasetNode('root')", + "Dimensions", + "Coordinates", + "a", + "Data variables", + "*empty*"] + for expected_line, printed_line in zip(expected, printout.splitlines()): + assert expected_line in printed_line + + def test_nested_node(self): + dat = xr.Dataset({'a': [0, 2]}) + root = DatasetNode('root') + DatasetNode('results', data=dat, parent=root) + printout = root.__str__() + assert printout.splitlines()[2].startswith(" ") + + def test_print_datatree(self): dt = create_test_datatree() - dt.render() + print(dt) + # TODO work out how to test something complex like this + + def test_repr_of_node_with_data(self): + dat = xr.Dataset({'a': [0, 2]}) + dt = DatasetNode('root', data=dat) + assert "Coordinates" in repr(dt) class TestPropertyInheritance: diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index dd029c76e4a..fa8d23e1afb 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -248,7 +248,6 @@ class TestTags: ... -@pytest.mark.xfail class TestRenderTree: def test_render_nodetree(self): mary = TreeNode("mary") @@ -256,5 +255,12 @@ def test_render_nodetree(self): john = TreeNode("john", children=[mary, kate]) sam = TreeNode("Sam", parent=mary) ben = TreeNode("Ben", parent=mary) - john.render() - raise NotImplementedError + + printout = john.__str__() + expected_nodes = ["TreeNode('john')", + "TreeNode('mary')", + "TreeNode('Sam')", + "TreeNode('Ben')", + "TreeNode('kate')"] + for expected_node, printed_node in zip(expected_nodes, printout.splitlines()): + assert expected_node in printed_node diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 1a6285c9147..cb49492a2c4 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -47,10 +47,21 @@ def __init__( self.children = children def __str__(self): + """A printable representation of the structure of this entire subtree.""" + lines = [] + for pre, _, node in anytree.RenderTree(self): + node_lines = f"{pre}{node._single_node_repr()}" + lines.append(node_lines) + return "\n".join(lines) + + def _single_node_repr(self): + """Information about this node, not including its relationships to other nodes.""" return f"TreeNode('{self.name}')" def __repr__(self): - return f"TreeNode(name='{self.name}', parent={str(self.parent)}, children={[str(c) for c in self.children]})" + """Information about this node, including its relationships to other nodes.""" + parent = self.parent.name if self.parent else "None" + return f"TreeNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]})" @property def pathstr(self) -> str: @@ -61,15 +72,6 @@ def pathstr(self) -> str: def has_data(self): return False - def render(self): - """Print tree structure, with only node names displayed.""" - # TODO should be rewritten to reflect names of children rather than names of nodes, probably like anytree.node - # TODO add option to suppress dataset information beyond just variable names - #for pre, _, node in anytree.RenderTree(self): - # print(f"{pre}{node}") - args = ["%r" % 
self.separator.join([""] + [str(node.name) for node in self.path])] - print(anytree.node.util._repr(self, args=args, nameblacklist=["name"])) - def _pre_attach(self, parent: TreeNode) -> None: """ Method which superclass calls before setting parent, here used to prevent having two From 2f5387576d8e7b34ac3fe8cc6e8c563520afa26c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 12:46:52 -0400 Subject: [PATCH 024/260] Can now __setitem__ = None to delete a .ds --- xarray/datatree_/datatree/datatree.py | 72 ++++++++++++------- .../datatree_/datatree/tests/test_datatree.py | 45 +++++++----- xarray/datatree_/datatree/treenode.py | 6 +- 3 files changed, 79 insertions(+), 44 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 6c5cfe0e0ce..2326b9eed40 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -36,24 +36,6 @@ """ -def _map_over_subtree(tree, func, *args, **kwargs): - """Internal function which maps func over every node in tree, returning a tree of the results.""" - - out_tree = DataTree(name=tree.name, data_objects={}) - - for node in tree.subtree_nodes: - relative_path = tree.pathstr.replace(node.pathstr, '') - - if node.has_data: - result = func(node.ds, *args, **kwargs) - else: - result = None - - out_tree[relative_path] = DatasetNode(name=node.name, data=result) - - return out_tree - - def map_over_subtree(func): """ Decorator which turns a function which acts on (and returns) single Datasets into one which acts on DataTrees. @@ -87,7 +69,42 @@ def map_over_subtree(func): DataTree.map_over_subtree DataTree.map_over_subtree_inplace """ - return functools.wraps(func)(_map_over_subtree) + + @functools.wraps(func) + def _map_over_subtree(tree, *args, **kwargs): + """Internal function which maps func over every node in tree, returning a tree of the results.""" + + # Create and act on root node + out_tree = DatasetNode(name=tree.name, data=tree.ds) + + if out_tree.has_data: + out_tree.ds = func(out_tree.ds, *args, **kwargs) + + #print(out_tree) + + for node in tree.descendants: + relative_path = node.pathstr.replace(tree.pathstr, '') + + #print(repr(node)) + #print(relative_path) + + if node.has_data: + result = func(node.ds, *args, **kwargs) + out_tree[relative_path] = result + else: + result = None + out_tree[relative_path] = DatasetNode(name=node.name) + #out_tree.set_node(relative_path, None) + + + print(relative_path) + + + #out_tree.set_node(relative_path, DatasetNode(name=node.name, data=result)) + + print(out_tree) + return out_tree + return _map_over_subtree class DatasetNode(TreeNode): @@ -255,7 +272,7 @@ def _get_item_from_path(self, path: PathType) -> Union[TreeNode, Dataset, DataAr def __setitem__( self, key: Union[Hashable, List[Hashable], Mapping, PathType], - value: Union[TreeNode, Dataset, DataArray, Variable], + value: Union[TreeNode, Dataset, DataArray, Variable, None], ) -> None: """ Add either a child node or an array to the tree, at any position. @@ -265,6 +282,9 @@ def __setitem__( If there is already a node at the given location, then if value is a Node class or Dataset it will overwrite the data already present at that node, and if value is a single array, it will be merged with it. + If value is None a new node will be created but containing no data. If a node already exists at that path it + will have its .ds attribute set to None. (To remove node from the tree completely instead use `del tree[path]`.) 
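A minimal usage sketch of these semantics, mirroring the accompanying tests:

    import xarray as xr
    from datatree.datatree import DatasetNode

    results = DatasetNode("results")
    results["/"] = xr.Dataset({"temp": [0, 50]})    # assign data to this node itself
    results["highres"] = xr.Dataset({"p": [2, 3]})  # create a child node holding a Dataset
    results["highres"] = None                       # keep the child node but clear its .ds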
+ Parameters ---------- key @@ -289,6 +309,8 @@ def __setitem__( self.ds = value elif isinstance(value, TreeNode): self.add_child(value) + elif value is None: + self.ds = None else: raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " f"not {type(value)}") @@ -299,11 +321,7 @@ def __setitem__( # get anything that already exists at that location try: - if isinstance(value, TreeNode): - # last tag is the name of the supplied node - existing_node = self.get_node(path) - else: - existing_node = self.get_node(tuple(path_tags)) + existing_node = self.get_node(path) except anytree.resolver.ResolverError: existing_node = None @@ -321,12 +339,14 @@ def __setitem__( elif isinstance(value, TreeNode): # overwrite with new node at same path self.set_node(path=path, node=value) + elif value is None: + existing_node.ds = None else: raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " f"not {type(value)}") else: # if nothing there then make new node based on type of object - if isinstance(value, (Dataset, DataArray, Variable)): + if isinstance(value, (Dataset, DataArray, Variable)) or value is None: new_node = DatasetNode(name=last_tag, data=value) self.set_node(path=path_tags, node=new_node) elif isinstance(value, TreeNode): diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index e1d437c0ba4..f31ced24533 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -27,7 +27,7 @@ def create_test_datatree(): | | Dimensions: (x: 2) | | Data variables: | | a (x) int64 2, 3 - | | b (x) int64 'foo', 'bar' + | | b (x) int64 0.1, 0.2 | |-- set1 |-- set3 |-- @@ -40,11 +40,12 @@ def create_test_datatree(): dimensions in order to better check for bugs caused by name conflicts. 
""" set1_data = xr.Dataset({'a': 0, 'b': 1}) - set2_data = xr.Dataset({'a': ('x', [2, 3]), 'b': ('x', ['foo', 'bar'])}) + set2_data = xr.Dataset({'a': ('x', [2, 3]), 'b': ('x', [0.1, 0.2])}) root_data = xr.Dataset({'a': ('y', [6, 7, 8]), 'set1': ('x', [9, 10])}) # Avoid using __init__ so we can independently test it - root = DataTree(data_objects={'root': root_data}) + # TODO change so it has a DataTree at the bottom + root = DatasetNode(name='root', data=root_data) set1 = DatasetNode(name="set1", parent=root, data=set1_data) set1_set1 = DatasetNode(name="set1", parent=set1) set1_set2 = DatasetNode(name="set2", parent=set1) @@ -52,6 +53,8 @@ def create_test_datatree(): set2_set1 = DatasetNode(name="set1", parent=set2) set3 = DatasetNode(name="set3", parent=root) + #print(repr(root)) + return root @@ -136,9 +139,26 @@ def test_set_new_grandchild_node(self): john = DatasetNode("john") mary = DatasetNode("mary", parent=john) rose = DatasetNode("rose") - john['/mary/'] = rose + john['mary/'] = rose assert john['mary/rose'] is rose + def test_set_new_empty_node(self): + john = DatasetNode("john") + john['mary'] = None + mary = john['mary'] + assert isinstance(mary, DatasetNode) + assert mary.ds is None + + def test_overwrite_data_in_node_with_none(self): + john = DatasetNode("john") + mary = DatasetNode("mary", parent=john, data=xr.Dataset()) + john['mary'] = None + assert mary.ds is None + + john.ds = xr.Dataset() + john['/'] = None + assert john.ds is None + def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) results = DatasetNode("results") @@ -176,7 +196,7 @@ def test_add_new_variable_to_empty_node(self): # What if there is a path to traverse first? results = DatasetNode("results") - results['/highres/'] = xr.DataArray(name='pressure', data=[2, 3]) + results['highres/'] = xr.DataArray(name='pressure', data=[2, 3]) assert 'pressure' in results['highres'].ds def test_dataarray_replace_existing_node(self): @@ -264,7 +284,10 @@ def test_nested_node(self): def test_print_datatree(self): dt = create_test_datatree() print(dt) + print(dt.descendants) + # TODO work out how to test something complex like this + assert False def test_repr_of_node_with_data(self): dat = xr.Dataset({'a': [0, 2]}) @@ -272,17 +295,5 @@ def test_repr_of_node_with_data(self): assert "Coordinates" in repr(dt) -class TestPropertyInheritance: - ... - - -class TestMethodInheritance: - ... - - -class TestUFuncs: - ... - - class TestIO: ... diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index cb49492a2c4..386c4e4c8f4 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -166,7 +166,11 @@ def set_node( if tag not in [child.name for child in parent.children]: if new_nodes_along_path: # TODO prevent this from leaving a trail of nodes if the assignment fails somehow - parent.add_child(TreeNode(name=tag)) + + # Want child classes to populate tree with their own types + # TODO this seems like a code smell though... 
+ new_node = type(self)(name=tag) + parent.add_child(new_node) else: raise KeyError(f"Cannot reach new node at path {path}: " f"parent {parent} has no child {tag}") From 2141fee70cbb674be978fd12482defc94c245f24 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 13:13:23 -0400 Subject: [PATCH 025/260] map_over_subtree decorator and method now works --- xarray/datatree_/datatree/__init__.py | 3 +- xarray/datatree_/datatree/datatree.py | 30 ++----- .../datatree/tests/test_dataset_api.py | 82 +++++++++++++++++++ 3 files changed, 90 insertions(+), 25 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/test_dataset_api.py diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 5b61ab46634..980e1fb9da4 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,2 +1 @@ -from .datatree import DataTree -from .io import open_datatree, open_mfdatatree +from .datatree import DataTree, map_over_subtree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 2326b9eed40..33462f54492 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -74,35 +74,19 @@ def map_over_subtree(func): def _map_over_subtree(tree, *args, **kwargs): """Internal function which maps func over every node in tree, returning a tree of the results.""" - # Create and act on root node + # Recreate and act on root node + # TODO make this of class DataTree out_tree = DatasetNode(name=tree.name, data=tree.ds) - if out_tree.has_data: out_tree.ds = func(out_tree.ds, *args, **kwargs) - #print(out_tree) - + # Act on every other node in the tree, and rebuild from results for node in tree.descendants: + # TODO make a proper relative_path method relative_path = node.pathstr.replace(tree.pathstr, '') + result = func(node.ds, *args, **kwargs) if node.has_data else None + out_tree[relative_path] = result - #print(repr(node)) - #print(relative_path) - - if node.has_data: - result = func(node.ds, *args, **kwargs) - out_tree[relative_path] = result - else: - result = None - out_tree[relative_path] = DatasetNode(name=node.name) - #out_tree.set_node(relative_path, None) - - - print(relative_path) - - - #out_tree.set_node(relative_path, DatasetNode(name=node.name, data=result)) - - print(out_tree) return out_tree return _map_over_subtree @@ -388,7 +372,7 @@ def map_over_subtree( """ # TODO this signature means that func has no way to know which node it is being called upon - change? 
- return _map_over_subtree(self, func, *args, **kwargs) + return map_over_subtree(func)(self, *args, **kwargs) def map_over_subtree_inplace( self, diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py new file mode 100644 index 00000000000..89cd2f42b2f --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -0,0 +1,82 @@ +import pytest + +import xarray as xr +from xarray.testing import assert_equal + +from datatree import DataTree, map_over_subtree +from datatree.datatree import DatasetNode + +from test_datatree import create_test_datatree + + +class TestMapOverSubTree: + def test_map_over_subtree(self): + dt = create_test_datatree() + + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + result_tree = times_ten(dt) + + # TODO write an assert_tree_equal function + for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + assert isinstance(result_node, DatasetNode) + + if original_node.has_data: + assert_equal(result_node.ds, original_node.ds * 10.0) + else: + assert not result_node.has_data + + def test_map_over_subtree_with_args_and_kwargs(self): + dt = create_test_datatree() + + @map_over_subtree + def multiply_then_add(ds, times, add=0.0): + return times * ds + add + + result_tree = multiply_then_add(dt, 10.0, add=2.0) + + for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + assert isinstance(result_node, DatasetNode) + + if original_node.has_data: + assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) + else: + assert not result_node.has_data + + def test_map_over_subtree_method(self): + dt = create_test_datatree() + + def multiply_then_add(ds, times, add=0.0): + return times * ds + add + + result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) + + for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + assert isinstance(result_node, DatasetNode) + + if original_node.has_data: + assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) + else: + assert not result_node.has_data + + @pytest.mark.xfail + def test_map_over_subtree_inplace(self): + raise NotImplementedError + + +class TestDSPropertyInheritance: + ... + + +class TestDSMethodInheritance: + ... + + +class TestBinaryOps: + ... + + +class TestUFuncs: + ... From 9f3de47abbbac6a68c6bd68af0142f297874e6ba Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 16:04:47 -0400 Subject: [PATCH 026/260] expose properties of wrapped Dataset --- xarray/datatree_/datatree/datatree.py | 66 ++++++++++++++++--- .../datatree/tests/test_dataset_api.py | 27 +++++++- 2 files changed, 83 insertions(+), 10 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 33462f54492..b8c4c6042bc 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -91,7 +91,55 @@ def _map_over_subtree(tree, *args, **kwargs): return _map_over_subtree -class DatasetNode(TreeNode): +class DatasetPropertiesMixin: + """Expose properties of wrapped Dataset""" + + # TODO a neater / more succinct way of doing this? + # we wouldn't need it at all if we inherited directly from Dataset... 
+ + @property + def dims(self): + if self.has_data: + return self.ds.dims + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def variables(self): + if self.has_data: + return self.ds.variables + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def encoding(self): + if self.has_data: + return self.ds.encoding + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def sizes(self): + if self.has_data: + return self.ds.sizes + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def attrs(self): + if self.has_data: + return self.ds.attrs + else: + raise AttributeError("property is not defined for a node with no data") + + dims.__doc__ = Dataset.dims.__doc__ + variables.__doc__ = Dataset.variables.__doc__ + encoding.__doc__ = Dataset.encoding.__doc__ + sizes.__doc__ = Dataset.sizes.__doc__ + attrs.__doc__ = Dataset.attrs.__doc__ + + +class DatasetNode(TreeNode, DatasetPropertiesMixin): """ A tree node, but optionally containing data in the form of an xarray.Dataset. @@ -122,13 +170,13 @@ def __init__( super().__init__(name=name, parent=parent, children=children) self.ds = data - # Expose properties of wrapped Dataset + # TODO if self.ds = None what will happen? - for property_name in self._DS_PROPERTIES: - ds_property = getattr(Dataset, property_name) - setattr(self, property_name, ds_property) + #for property_name in self._DS_PROPERTIES: + # ds_property = getattr(Dataset, property_name) + # setattr(self, property_name, ds_property) - # Enable dataset API methods + # Add methods defined in Dataset's class definition to this classes API, but wrapped to map over descendants too for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: # Expose Dataset method, but wrapped to map over whole subtree ds_method = getattr(Dataset, method_name) @@ -140,6 +188,10 @@ def __init__( updated_method_docstring = ds_method_docstring.replace('\n', self._MAPPED_DOCSTRING_ADDENDUM, 1) setattr(self, f'{method_name}.__doc__', updated_method_docstring) + # TODO wrap methods for ops too, such as those in DatasetOpsMixin + + # TODO map applied ufuncs over all leaves + @property def ds(self) -> Dataset: return self._ds @@ -402,8 +454,6 @@ def map_over_subtree_inplace( if node.has_data: node.ds = func(node.ds, *args, **kwargs) - # TODO map applied ufuncs over all leaves - def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 89cd2f42b2f..e6e9336b115 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -66,8 +66,31 @@ def test_map_over_subtree_inplace(self): raise NotImplementedError -class TestDSPropertyInheritance: - ... 
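The property forwarding above is written out explicitly for each property; a purely hypothetical sketch of how the same pattern could be condensed with a small factory (not what the patch does):

    from xarray import Dataset

    def _wrap_ds_property(name):
        # hypothetical helper: forward a read-only property to the wrapped Dataset
        def getter(self):
            if self.has_data:
                return getattr(self.ds, name)
            raise AttributeError("property is not defined for a node with no data")
        getter.__doc__ = getattr(Dataset, name).__doc__
        return property(getter)

    class CondensedDatasetPropertiesMixin:
        """Same forwarding behaviour, generated from a list of property names."""
        dims = _wrap_ds_property("dims")
        sizes = _wrap_ds_property("sizes")
        attrs = _wrap_ds_property("attrs")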
+class TestDSProperties: + def test_properties(self): + da_a = xr.DataArray(name='a', data=[0, 2], dims=['x']) + da_b = xr.DataArray(name='b', data=[5, 6, 7], dims=['y']) + ds = xr.Dataset({'a': da_a, 'b': da_b}) + dt = DatasetNode('root', data=ds) + + assert dt.attrs == dt.ds.attrs + assert dt.encoding == dt.ds.encoding + assert dt.dims == dt.ds.dims + assert dt.sizes == dt.ds.sizes + assert dt.variables == dt.ds.variables + + def test_no_data_no_properties(self): + dt = DatasetNode('root', data=None) + with pytest.raises(AttributeError): + dt.attrs + with pytest.raises(AttributeError): + dt.encoding + with pytest.raises(AttributeError): + dt.dims + with pytest.raises(AttributeError): + dt.sizes + with pytest.raises(AttributeError): + dt.variables class TestDSMethodInheritance: From 6408ddcf36c501b24877a016cd3cb33871655b75 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 16:52:47 -0400 Subject: [PATCH 027/260] attempt to implement the same ops defined on Dataset --- xarray/datatree_/datatree/datatree.py | 57 ++++++++++++++++--- .../datatree/tests/test_dataset_api.py | 13 ++++- 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b8c4c6042bc..baa5917055b 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,6 +1,7 @@ from __future__ import annotations import functools import textwrap +import inspect from typing import Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict @@ -11,6 +12,7 @@ from xarray.core.variable import Variable from xarray.core.combine import merge from xarray.core import dtypes, utils +from xarray.core._typed_ops import DatasetOpsMixin from .treenode import TreeNode, PathType @@ -139,7 +141,50 @@ def attrs(self): attrs.__doc__ = Dataset.attrs.__doc__ -class DatasetNode(TreeNode, DatasetPropertiesMixin): +_MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " + "call the method on the Datasets stored in every node of the subtree. " + "See the datatree.map_over_subtree decorator for more details.", + width=117) + + +def _expose_methods_wrapped_to_map_over_subtree(obj, method_name, method): + """ + Expose given method on node object, but wrapped to map over whole subtree, not just that node object. + + Result is like having written this in obj's class definition + + @map_over_subtree + def method_name(self, *args, **kwargs): + return self.method(*args, **kwargs) + """ + + # Expose Dataset method, but wrapped to map over whole subtree when called + setattr(obj, method_name, map_over_subtree(method)) + + # TODO do we really need this for ops like __add__? + # Add a line to the method's docstring explaining how it's been mapped + method_docstring = method.__doc__ + if method_docstring is not None: + updated_op_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) + setattr(obj, f'{method_name}.__doc__', method_docstring) + + +_DATASET_OPS_TO_EXCLUDE = ['__str__', '__repr__'] + + +class DataTreeOpsMixin: + """Mixin to add ops like __add__, but wrapped to map over subtrees.""" + + dataset_methods = inspect.getmembers(DatasetOpsMixin, inspect.isfunction) + ops_to_expose = [(name, method) for name, method in dataset_methods if name not in _DATASET_OPS_TO_EXCLUDE] + + # TODO is there a way to put this code in the class definition so we don't have to specifically call this method? 
+ def _add_ops(self): + for method_name, method in self.ops_to_expose: + _expose_methods_wrapped_to_map_over_subtree(self, method_name, method) + + +class DatasetNode(TreeNode, DatasetPropertiesMixin, DataTreeOpsMixin): """ A tree node, but optionally containing data in the form of an xarray.Dataset. @@ -149,7 +194,6 @@ class DatasetNode(TreeNode, DatasetPropertiesMixin): # TODO should this instead be a subclass of Dataset? # TODO add any other properties (maybe dask ones?) - _DS_PROPERTIES = ['variables', 'attrs', 'encoding', 'dims', 'sizes'] # TODO add all the other methods to dispatch _DS_METHODS_TO_MAP_OVER_SUBTREES = ['isel', 'sel', 'min', 'max', 'mean', '__array_ufunc__'] @@ -170,11 +214,8 @@ def __init__( super().__init__(name=name, parent=parent, children=children) self.ds = data - - # TODO if self.ds = None what will happen? - #for property_name in self._DS_PROPERTIES: - # ds_property = getattr(Dataset, property_name) - # setattr(self, property_name, ds_property) + # Add ops like __add__, but wrapped to map over subtrees + self._add_ops() # Add methods defined in Dataset's class definition to this classes API, but wrapped to map over descendants too for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: @@ -188,8 +229,6 @@ def __init__( updated_method_docstring = ds_method_docstring.replace('\n', self._MAPPED_DOCSTRING_ADDENDUM, 1) setattr(self, f'{method_name}.__doc__', updated_method_docstring) - # TODO wrap methods for ops too, such as those in DatasetOpsMixin - # TODO map applied ufuncs over all leaves @property diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index e6e9336b115..ecb676495bd 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -97,8 +97,17 @@ class TestDSMethodInheritance: ... -class TestBinaryOps: - ... 
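For comparison, the behaviour these wrappers aim to reproduce is what the public decorator already provides for plain functions; a small sketch, assuming the create_test_datatree helper defined in test_datatree.py:

    from datatree import map_over_subtree
    from test_datatree import create_test_datatree

    @map_over_subtree
    def double(ds):
        # applied to the Dataset at every node; nodes without data stay empty
        return 2.0 * ds

    doubled_tree = double(create_test_datatree())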
+class TestOps: + def test_multiplication(self): + ds1 = xr.Dataset({'a': [5], 'b': [3]}) + ds2 = xr.Dataset({'x': [0.1, 0.2], 'y': [10, 20]}) + dt = DatasetNode('root', data=ds1) + DatasetNode('subnode', data=ds2, parent=dt) + + print(dir(dt)) + + result = dt * dt + print(result) class TestUFuncs: From 9b4f40dc5ec08429822afa6c322c2f467bae6c2b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 17:24:07 -0400 Subject: [PATCH 028/260] copied BenBovy's code to open all groups in a netCDF file --- xarray/datatree_/datatree/__init__.py | 1 + xarray/datatree_/datatree/io.py | 59 ++++++++++++++------------- 2 files changed, 31 insertions(+), 29 deletions(-) diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 980e1fb9da4..46174967da7 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1 +1,2 @@ from .datatree import DataTree, map_over_subtree +from .io import open_datatree, open_mfdatatree diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 9bd0e3b02fc..6f2f130961d 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -1,43 +1,44 @@ -from typing import Sequence +from typing import Sequence, Dict +import os -from netCDF4 import Dataset as nc_dataset +import netCDF4 from xarray import open_dataset -from .datatree import DataTree, PathType +from .datatree import DataTree, DatasetNode, PathType -def _get_group_names(file): - rootgrp = nc_dataset("test.nc", "r", format="NETCDF4") +def _open_group_children_recursively(filename, node, ncgroup, chunks, **kwargs): + for g in ncgroup.groups.values(): - def walktree(top): - yield top.groups.values() - for value in top.groups.values(): - yield from walktree(value) + # Open and add this node's dataset to the tree + name = os.path.basename(g.path) + ds = open_dataset(filename, group=g.path, chunks=chunks, **kwargs) + child_node = DatasetNode(name, ds) + node.add_child(child_node) - groups = [] - for children in walktree(rootgrp): - for child in children: - # TODO include parents in saved path - groups.append(child.name) + _open_group_children_recursively(filename, node[name], g, chunks, **kwargs) - rootgrp.close() - return groups - -def open_datatree(filename_or_obj, engine=None, chunks=None, **kwargs) -> DataTree: - """ - Open and decode a dataset from a file or file-like object, creating one DataTree node - for each group in the file. +def open_datatree(filename: str, chunks: Dict = None, **kwargs) -> DataTree: """ + Open and decode a dataset from a file or file-like object, creating one Tree node for each group in the file. 
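A usage sketch of open_datatree (the file name and group name are hypothetical):

    from datatree import open_datatree

    dt = open_datatree("simulation_output.nc")  # one tree node per netCDF group
    highres = dt["highres"]                     # groups are accessed like any other child node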
- # TODO find all the netCDF groups in the file - file_groups = _get_group_names(filename_or_obj) + Parameters + ---------- + filename + chunks + + Returns + ------- + DataTree + """ - # Populate the DataTree with the groups - groups_and_datasets = {group_path: open_dataset(engine=engine, chunks=chunks, **kwargs) - for group_path in file_groups} - return DataTree(data_objects=groups_and_datasets) + with netCDF4.Dataset(filename, mode='r') as ncfile: + ds = open_dataset(filename, chunks=chunks, **kwargs) + tree_root = DataTree(data_objects={'root': ds}) + _open_group_children_recursively(filename, tree_root, ncfile, chunks, **kwargs) + return tree_root def open_mfdatatree(filepaths, rootnames: Sequence[PathType] = None, engine=None, chunks=None, **kwargs) -> DataTree: @@ -55,8 +56,8 @@ def open_mfdatatree(filepaths, rootnames: Sequence[PathType] = None, engine=None full_tree = DataTree() for file, root in zip(filepaths, rootnames): - dt = open_datatree(file, engine=engine, chunks=chunks, **kwargs) - full_tree._set_item(path=root, value=dt, new_nodes_along_path=True, allow_overwrites=False) + dt = open_datatree(file, chunks=chunks, **kwargs) + full_tree.set_node(path=root, node=dt, new_nodes_along_path=True, allow_overwrite=False) return full_tree From 23316019e0bd534451c2529e9c0df915f706bc4c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 19:20:47 -0400 Subject: [PATCH 029/260] condensed DatasetNode and DataTree into a single DataTree class --- xarray/datatree_/datatree/__init__.py | 2 +- xarray/datatree_/datatree/datatree.py | 209 ++++++++++-------- .../datatree/tests/test_dataset_api.py | 13 +- .../datatree_/datatree/tests/test_datatree.py | 102 ++++----- xarray/datatree_/datatree/treenode.py | 20 +- 5 files changed, 185 insertions(+), 161 deletions(-) diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 980e1fb9da4..15ec0665e91 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1 +1 @@ -from .datatree import DataTree, map_over_subtree +from .datatree import DataTree, map_over_subtree, DataNode diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b8c4c6042bc..69c2af971b3 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -12,27 +12,27 @@ from xarray.core.combine import merge from xarray.core import dtypes, utils -from .treenode import TreeNode, PathType +from .treenode import TreeNode, PathType, _init_single_treenode """ -The structure of a populated Datatree looks like this in terms of classes: +The structure of a populated Datatree looks roughly like this: DataTree("root name") -|-- DatasetNode("weather") -| |-- DatasetNode("temperature") -| | |-- DataArrayNode("sea_surface_temperature") -| | |-- DataArrayNode("dew_point_temperature") -| |-- DataArrayNode("wind_speed") -| |-- DataArrayNode("pressure") -|-- DatasetNode("satellite image") -| |-- DatasetNode("infrared") -| | |-- DataArrayNode("near_infrared") -| | |-- DataArrayNode("far_infrared") -| |-- DataArrayNode("true_colour") -|-- DataTreeNode("topography") -| |-- DatasetNode("elevation") -| | |-- DataArrayNode("height_above_sea_level") -|-- DataArrayNode("population") +|-- DataNode("weather") +| | Variable("wind_speed") +| | Variable("pressure") +| |-- DataNode("temperature") +| | Variable("sea_surface_temperature") +| | Variable("dew_point_temperature") +|-- DataNode("satellite image") +| | Variable("true_colour") +| |-- 
DataNode("infrared") +| | Variable("near_infrared") +| | Variable("far_infrared") +|-- DataNode("topography") +| |-- DataNode("elevation") +| | |-- Variable("height_above_sea_level") +|-- DataNode("population") """ @@ -76,7 +76,7 @@ def _map_over_subtree(tree, *args, **kwargs): # Recreate and act on root node # TODO make this of class DataTree - out_tree = DatasetNode(name=tree.name, data=tree.ds) + out_tree = DataNode(name=tree.name, data=tree.ds) if out_tree.has_data: out_tree.ds = func(out_tree.ds, *args, **kwargs) @@ -139,15 +139,44 @@ def attrs(self): attrs.__doc__ = Dataset.attrs.__doc__ -class DatasetNode(TreeNode, DatasetPropertiesMixin): +class DataTree(TreeNode, DatasetPropertiesMixin): """ - A tree node, but optionally containing data in the form of an xarray.Dataset. + A tree-like hierarchical collection of xarray objects. Attempts to present the API of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. + + Parameters + ---------- + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. + + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree {name} as the path. + name : Hashable, optional + Name for the root node of the tree. Default is "root" + + See also + -------- + DataNode : Shortcut to create a DataTree with only a single node. """ # TODO should this instead be a subclass of Dataset? + # TODO Add attrs dict + + # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?) + + # TODO ipython autocomplete for child nodes + + # TODO Some way of sorting children by depth + + # TODO Consistency in copying vs updating objects + + # TODO do we need a watch out for if methods intended only for root nodes are called on non-root nodes? + # TODO add any other properties (maybe dask ones?) _DS_PROPERTIES = ['variables', 'attrs', 'encoding', 'dims', 'sizes'] @@ -162,20 +191,36 @@ class DatasetNode(TreeNode, DatasetPropertiesMixin): def __init__( self, - name: Hashable, - data: Dataset = None, - parent: TreeNode = None, - children: List[TreeNode] = None, + data_objects: Dict[PathType, Union[Dataset, DataArray]] = None, + name: Hashable = "root", ): - super().__init__(name=name, parent=parent, children=children) - self.ds = data + # First create the root node + super().__init__(name=name, parent=None, children=None) + if data_objects: + root_data = data_objects.pop(name, None) + else: + root_data = None + self.ds = root_data + + if data_objects: + # Populate tree with children determined from data_objects mapping + for path, data in data_objects.items(): + # Determine name of new node + path = self._tuple_or_path_to_path(path) + if self.separator in path: + node_path, node_name = path.rsplit(self.separator, maxsplit=1) + else: + node_path, node_name = '/', path + # Create and set new node + new_node = DataNode(name=node_name, data=data) + self.set_node(node_path, new_node, allow_overwrite=False, new_nodes_along_path=True) + new_node = self.get_node(path) + new_node[path] = data - # TODO if self.ds = None what will happen? 
- #for property_name in self._DS_PROPERTIES: - # ds_property = getattr(Dataset, property_name) - # setattr(self, property_name, ds_property) + self._add_method_api() + def _add_method_api(self): # Add methods defined in Dataset's class definition to this classes API, but wrapped to map over descendants too for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: # Expose Dataset method, but wrapped to map over whole subtree @@ -208,6 +253,40 @@ def ds(self, data: Union[Dataset, DataArray] = None): def has_data(self): return self.ds is not None + @classmethod + def _init_single_datatree_node( + cls, + name: Hashable, + data: Dataset = None, + parent: TreeNode = None, + children: List[TreeNode] = None, + ): + """ + Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. + + Parameters + ---------- + name : Hashable + Name for the root node of the tree. Default is "root" + data : Dataset, DataArray, Variable or None, optional + Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. + Default is None. + parent : TreeNode, optional + Parent node to this node. Default is None. + children : Sequence[TreeNode], optional + Any child nodes of this node. Default is None. + + Returns + ------- + node : DataTree + """ + + # This approach was inspired by xarray.Dataset._construct_direct() + obj = object.__new__(cls) + obj = _init_single_treenode(obj, name=name, parent=parent, children=children) + obj.ds = data + return obj + def __str__(self): """A printable representation of the structure of this entire subtree.""" renderer = anytree.RenderTree(self) @@ -231,7 +310,7 @@ def __str__(self): def _single_node_repr(self): """Information about this node, not including its relationships to other nodes.""" - node_info = f"DatasetNode('{self.name}')" + node_info = f"DataNode('{self.name}')" if self.has_data: ds_info = '\n' + repr(self.ds) @@ -243,7 +322,7 @@ def __repr__(self): """Information about this node, including its relationships to other nodes.""" # TODO redo this to look like the Dataset repr, but just with child and parent info parent = self.parent.name if self.parent else "None" - node_str = f"DatasetNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," + node_str = f"DataNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," if self.has_data: ds_repr_lines = self.ds.__repr__().splitlines() @@ -383,7 +462,7 @@ def __setitem__( else: # if nothing there then make new node based on type of object if isinstance(value, (Dataset, DataArray, Variable)) or value is None: - new_node = DatasetNode(name=last_tag, data=value) + new_node = DataNode(name=last_tag, data=value) self.set_node(path=path_tags, node=new_node) elif isinstance(value, TreeNode): self.set_node(path=path, node=value) @@ -457,7 +536,7 @@ def map_over_subtree_inplace( def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): - print(f"{pre}DatasetNode('{self.name}')") + print(f"{pre}DataNode('{self.name}')") for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") @@ -480,65 +559,6 @@ def get_any(self, *tags: Hashable) -> DataTree: if any(tag in c.tags for tag in tags)} return DataTree(data_objects=matching_children) - -class DataTree(DatasetNode): - """ - A tree-like hierarchical collection of xarray objects. 
- - Parameters - ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree {name} as the path. - name : Hashable, optional - Name for the root node of the tree. Default is "root" - """ - - # TODO Add attrs dict - - # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?) - - # TODO ipython autocomplete for child nodes - - # TODO Some way of sorting children by depth - - # TODO Consistency in copying vs updating objects - - def __init__( - self, - data_objects: Dict[PathType, Union[Dataset, DataArray]] = None, - name: Hashable = "root", - ): - if data_objects is not None: - root_data = data_objects.pop(name, None) - else: - root_data = None - super().__init__(name=name, data=root_data, parent=None, children=None) - - # TODO re-implement using anytree.DictImporter? - if data_objects: - # Populate tree with children determined from data_objects mapping - for path, data in data_objects.items(): - # Determine name of new node - path = self._tuple_or_path_to_path(path) - if self.separator in path: - node_path, node_name = path.rsplit(self.separator, maxsplit=1) - else: - node_path, node_name = '/', path - - # Create and set new node - new_node = DatasetNode(name=node_name, data=data) - self.set_node(node_path, new_node, allow_overwrite=False, new_nodes_along_path=True) - new_node = self.get_node(path) - new_node[path] = data - - # TODO do we need a watch out for if methods intended only for root nodes are calle on non-root nodes? 
- @property def chunks(self): raise NotImplementedError @@ -581,3 +601,6 @@ def to_netcdf(self, filename: str): def plot(self): raise NotImplementedError + + +DataNode = DataTree._init_single_datatree_node diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index e6e9336b115..ea3cd920fd4 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -3,8 +3,7 @@ import xarray as xr from xarray.testing import assert_equal -from datatree import DataTree, map_over_subtree -from datatree.datatree import DatasetNode +from datatree import DataTree, DataNode, map_over_subtree from test_datatree import create_test_datatree @@ -21,7 +20,7 @@ def times_ten(ds): # TODO write an assert_tree_equal function for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): - assert isinstance(result_node, DatasetNode) + assert isinstance(result_node, DataTree) if original_node.has_data: assert_equal(result_node.ds, original_node.ds * 10.0) @@ -38,7 +37,7 @@ def multiply_then_add(ds, times, add=0.0): result_tree = multiply_then_add(dt, 10.0, add=2.0) for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): - assert isinstance(result_node, DatasetNode) + assert isinstance(result_node, DataTree) if original_node.has_data: assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) @@ -54,7 +53,7 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): - assert isinstance(result_node, DatasetNode) + assert isinstance(result_node, DataTree) if original_node.has_data: assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) @@ -71,7 +70,7 @@ def test_properties(self): da_a = xr.DataArray(name='a', data=[0, 2], dims=['x']) da_b = xr.DataArray(name='b', data=[5, 6, 7], dims=['y']) ds = xr.Dataset({'a': da_a, 'b': da_b}) - dt = DatasetNode('root', data=ds) + dt = DataNode('root', data=ds) assert dt.attrs == dt.ds.attrs assert dt.encoding == dt.ds.encoding @@ -80,7 +79,7 @@ def test_properties(self): assert dt.variables == dt.ds.variables def test_no_data_no_properties(self): - dt = DatasetNode('root', data=None) + dt = DataNode('root', data=None) with pytest.raises(AttributeError): dt.attrs with pytest.raises(AttributeError): diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index f31ced24533..f3b0ba1305f 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -5,8 +5,7 @@ from anytree.resolver import ChildResolverError -from datatree import DataTree -from datatree.datatree import DatasetNode +from datatree import DataTree, DataNode def create_test_datatree(): @@ -45,15 +44,13 @@ def create_test_datatree(): # Avoid using __init__ so we can independently test it # TODO change so it has a DataTree at the bottom - root = DatasetNode(name='root', data=root_data) - set1 = DatasetNode(name="set1", parent=root, data=set1_data) - set1_set1 = DatasetNode(name="set1", parent=set1) - set1_set2 = DatasetNode(name="set2", parent=set1) - set2 = DatasetNode(name="set2", parent=root, data=set2_data) - set2_set1 = DatasetNode(name="set1", parent=set2) - set3 = DatasetNode(name="set3", parent=root) - - #print(repr(root)) + root = DataNode(name='root', data=root_data) + set1 = 
DataNode(name="set1", parent=root, data=set1_data) + set1_set1 = DataNode(name="set1", parent=set1) + set1_set2 = DataNode(name="set2", parent=set1) + set2 = DataNode(name="set2", parent=root, data=set2_data) + set2_set1 = DataNode(name="set1", parent=set2) + set3 = DataNode(name="set3", parent=root) return root @@ -61,13 +58,13 @@ def create_test_datatree(): class TestStoreDatasets: def test_create_datanode(self): dat = xr.Dataset({'a': 0}) - john = DatasetNode("john", data=dat) + john = DataNode("john", data=dat) assert john.ds is dat with pytest.raises(TypeError): - DatasetNode("mary", parent=john, data="junk") + DataNode("mary", parent=john, data="junk") def test_set_data(self): - john = DatasetNode("john") + john = DataNode("john") dat = xr.Dataset({'a': 0}) john.ds = dat assert john.ds is dat @@ -75,83 +72,83 @@ def test_set_data(self): john.ds = "junk" def test_has_data(self): - john = DatasetNode("john", data=xr.Dataset({'a': 0})) + john = DataNode("john", data=xr.Dataset({'a': 0})) assert john.has_data - john = DatasetNode("john", data=None) + john = DataNode("john", data=None) assert not john.has_data class TestGetItems: def test_get_node(self): - folder1 = DatasetNode("folder1") - results = DatasetNode("results", parent=folder1) - highres = DatasetNode("highres", parent=results) + folder1 = DataNode("folder1") + results = DataNode("results", parent=folder1) + highres = DataNode("highres", parent=results) assert folder1["results"] is results assert folder1["results/highres"] is highres assert folder1[("results", "highres")] is highres def test_get_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DatasetNode("results", data=data) + results = DataNode("results", data=data) assert_identical(results["temp"], data["temp"]) def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DatasetNode("folder1") - results = DatasetNode("results", parent=folder1) - highres = DatasetNode("highres", parent=results, data=data) + folder1 = DataNode("folder1") + results = DataNode("results", parent=folder1) + highres = DataNode("highres", parent=results, data=data) assert_identical(folder1["results/highres/temp"], data["temp"]) assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): - folder1 = DatasetNode("folder1") - results = DatasetNode("results", parent=folder1) + folder1 = DataNode("folder1") + results = DataNode("results", parent=folder1) with pytest.raises(ChildResolverError): folder1["results/highres"] def test_get_nonexistent_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DatasetNode("results", data=data) + results = DataNode("results", data=data) with pytest.raises(ChildResolverError): results["pressure"] def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) - results = DatasetNode("results", data=data) + results = DataNode("results", data=data) assert_identical(results[['temp', 'p']], data[['temp', 'p']]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) - results = DatasetNode("results", data=data) + results = DataNode("results", data=data) assert_identical(results[{'temp': 1}], data[{'temp': 1}]) class TestSetItems: # TODO test tuple-style access too def test_set_new_child_node(self): - john = DatasetNode("john") - mary = DatasetNode("mary") + john = DataNode("john") + mary = DataNode("mary") john['/'] = mary assert john['mary'] is mary def 
test_set_new_grandchild_node(self): - john = DatasetNode("john") - mary = DatasetNode("mary", parent=john) - rose = DatasetNode("rose") + john = DataNode("john") + mary = DataNode("mary", parent=john) + rose = DataNode("rose") john['mary/'] = rose assert john['mary/rose'] is rose def test_set_new_empty_node(self): - john = DatasetNode("john") + john = DataNode("john") john['mary'] = None mary = john['mary'] - assert isinstance(mary, DatasetNode) + assert isinstance(mary, DataTree) assert mary.ds is None def test_overwrite_data_in_node_with_none(self): - john = DatasetNode("john") - mary = DatasetNode("mary", parent=john, data=xr.Dataset()) + john = DataNode("john") + mary = DataNode("mary", parent=john, data=xr.Dataset()) john['mary'] = None assert mary.ds is None @@ -161,47 +158,47 @@ def test_overwrite_data_in_node_with_none(self): def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) - results = DatasetNode("results") + results = DataNode("results") results['/'] = data assert results.ds is data def test_set_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DatasetNode("folder1") + folder1 = DataNode("folder1") folder1['results'] = data assert folder1['results'].ds is data def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DatasetNode("folder1") + folder1 = DataNode("folder1") folder1['results/highres'] = data assert folder1['results/highres'].ds is data def test_set_named_dataarray_as_new_node(self): data = xr.DataArray(name='temp', data=[0, 50]) - folder1 = DatasetNode("folder1") + folder1 = DataNode("folder1") folder1['results'] = data assert_identical(folder1['results'].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) - folder1 = DatasetNode("folder1") + folder1 = DataNode("folder1") with pytest.raises(ValueError, match="unable to convert"): folder1['results'] = data def test_add_new_variable_to_empty_node(self): - results = DatasetNode("results") + results = DataNode("results") results['/'] = xr.DataArray(name='pressure', data=[2, 3]) assert 'pressure' in results.ds # What if there is a path to traverse first? 
- results = DatasetNode("results") + results = DataNode("results") results['highres/'] = xr.DataArray(name='pressure', data=[2, 3]) assert 'pressure' in results['highres'].ds def test_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) - results = DatasetNode("results", data=t) + results = DataNode("results", data=t) p = xr.DataArray(name='pressure', data=[2, 3]) results['/'] = p assert_identical(results.ds, p.to_dataset()) @@ -257,15 +254,15 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DatasetNode('root') + dt = DataNode('root') printout = dt.__str__() - assert printout == "DatasetNode('root')" + assert printout == "DataNode('root')" def test_print_node_with_data(self): dat = xr.Dataset({'a': [0, 2]}) - dt = DatasetNode('root', data=dat) + dt = DataNode('root', data=dat) printout = dt.__str__() - expected = ["DatasetNode('root')", + expected = ["DataNode('root')", "Dimensions", "Coordinates", "a", @@ -276,8 +273,8 @@ def test_print_node_with_data(self): def test_nested_node(self): dat = xr.Dataset({'a': [0, 2]}) - root = DatasetNode('root') - DatasetNode('results', data=dat, parent=root) + root = DataNode('root') + DataNode('results', data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") @@ -287,11 +284,10 @@ def test_print_datatree(self): print(dt.descendants) # TODO work out how to test something complex like this - assert False def test_repr_of_node_with_data(self): dat = xr.Dataset({'a': [0, 2]}) - dt = DatasetNode('root', data=dat) + dt = DataNode('root', data=dat) assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 386c4e4c8f4..d0f43514e83 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -8,6 +8,18 @@ PathType = Union[Hashable, Sequence[Hashable]] +def _init_single_treenode(obj, name, parent, children): + if not isinstance(name, str) or '/' in name: + raise ValueError(f"invalid name {name}") + obj.name = name + + obj.parent = parent + if children: + obj.children = children + + return obj + + class TreeNode(anytree.NodeMixin): """ Base class representing a node of a tree, with methods for traversing and altering the tree. 
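The `_init_single_treenode` helper introduced in the hunk above exists so that constructors which build an instance via `object.__new__` (and therefore never run `__init__`) can reuse the same name validation and parent/child wiring; `DataTree._init_single_datatree_node` later in the series does exactly that. A minimal, self-contained sketch of the pattern, using a toy `Node` class rather than the real anytree-backed `TreeNode` (all names here are illustrative):

```python
# Sketch only: `Node` stands in for the anytree-backed TreeNode.

def _init_single_node(obj, name, parent=None, children=None):
    """Validate the name and wire up tree relationships on an already-created instance."""
    if not isinstance(name, str) or "/" in name:
        raise ValueError(f"invalid name {name}")
    obj.name = name
    obj.parent = parent
    obj.children = list(children) if children else []
    return obj


class Node:
    def __init__(self, name, parent=None, children=None):
        # The normal construction path goes through the shared helper...
        _init_single_node(self, name, parent=parent, children=children)

    @classmethod
    def construct(cls, name, parent=None, children=None):
        # ...and so can alternative constructors that bypass __init__ entirely.
        obj = object.__new__(cls)
        return _init_single_node(obj, name, parent=parent, children=children)


root = Node("root")
child = Node.construct("child", parent=root)
assert child.parent is root
```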
@@ -38,13 +50,7 @@ def __init__( parent: TreeNode = None, children: Iterable[TreeNode] = None, ): - if not isinstance(name, str) or '/' in name: - raise ValueError(f"invalid name {name}") - self.name = name - - self.parent = parent - if children: - self.children = children + _init_single_treenode(self, name=name, parent=parent, children=children) def __str__(self): """A printable representation of the structure of this entire subtree.""" From 333e8a52639bf91a03aa5007a42eba58983700fd Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 22:04:56 -0400 Subject: [PATCH 030/260] successfully maps ds.isel() over tree --- xarray/datatree_/datatree/datatree.py | 93 +++++++++++-------- .../datatree/tests/test_dataset_api.py | 32 ++++++- 2 files changed, 86 insertions(+), 39 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index c5903aad981..397307bc926 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -133,6 +133,8 @@ def attrs(self): else: raise AttributeError("property is not defined for a node with no data") + # TODO .loc + dims.__doc__ = Dataset.dims.__doc__ variables.__doc__ = Dataset.variables.__doc__ encoding.__doc__ = Dataset.encoding.__doc__ @@ -142,33 +144,70 @@ def attrs(self): _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " "call the method on the Datasets stored in every node of the subtree. " - "See the datatree.map_over_subtree decorator for more details.", - width=117) + "See the `map_over_subtree` decorator for more details.", width=117) def _expose_methods_wrapped_to_map_over_subtree(obj, method_name, method): """ Expose given method on node object, but wrapped to map over whole subtree, not just that node object. - Result is like having written this in obj's class definition + Result is like having written this in obj's class definition: + ``` @map_over_subtree def method_name(self, *args, **kwargs): return self.method(*args, **kwargs) + ``` """ # Expose Dataset method, but wrapped to map over whole subtree when called - setattr(obj, method_name, map_over_subtree(method)) + # TODO should we be using functools.partialmethod here instead? + mapped_over_tree = functools.partial(map_over_subtree(method), obj) + setattr(obj, method_name, mapped_over_tree) # TODO do we really need this for ops like __add__? # Add a line to the method's docstring explaining how it's been mapped method_docstring = method.__doc__ if method_docstring is not None: - updated_op_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) - setattr(obj, f'{method_name}.__doc__', method_docstring) + updated_method_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) + setattr(obj, f'{method_name}.__doc__', updated_method_docstring) + + +# TODO equals, broadcast_equals etc. +# TODO do dask-related private methods need to be exposed? 
+_DATASET_DASK_METHODS_TO_EXPOSE = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] +_DATASET_METHODS_TO_EXPOSE = ['copy', 'as_numpy', '__copy__', '__deepcopy__', '__contains__', '__len__', + '__bool__', '__iter__', '__array__', 'set_coords', 'reset_coords', 'info', + 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', + 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', + 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', + 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', + 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', + 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', + 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', + 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] +_DATASET_OPS_TO_EXPOSE = ['_unary_op', '_binary_op', '_inplace_binary_op'] +_ALL_DATASET_METHODS_TO_EXPOSE = _DATASET_DASK_METHODS_TO_EXPOSE + _DATASET_METHODS_TO_EXPOSE + _DATASET_OPS_TO_EXPOSE + +# TODO methods which should not or cannot act over the whole tree, such as .to_array + + +class DatasetMethodsMixin: + """Mixin to add Dataset methods like .mean(), but wrapped to map over all nodes in the subtree.""" + + # TODO is there a way to put this code in the class definition so we don't have to specifically call this method? + def _add_dataset_methods(self): + methods_to_expose = [(method_name, getattr(Dataset, method_name)) + for method_name in _ALL_DATASET_METHODS_TO_EXPOSE] + + for method_name, method in methods_to_expose: + _expose_methods_wrapped_to_map_over_subtree(self, method_name, method) + + +# TODO implement ArrayReduce type methods -class DataTree(TreeNode, DatasetPropertiesMixin): +class DataTree(TreeNode, DatasetPropertiesMixin, DatasetMethodsMixin): """ A tree-like hierarchical collection of xarray objects. @@ -208,13 +247,6 @@ class DataTree(TreeNode, DatasetPropertiesMixin): # TODO add any other properties (maybe dask ones?) - # TODO add all the other methods to dispatch - _DS_METHODS_TO_MAP_OVER_SUBTREES = ['isel', 'sel', 'min', 'max', 'mean', '__array_ufunc__'] - _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " - "call the method on the Datasets stored in every node of the subtree. " - "See the datatree.map_over_subtree decorator for more details.", - width=117) - # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? 
def __init__( @@ -246,23 +278,14 @@ def __init__( new_node = self.get_node(path) new_node[path] = data - # Add method like .mean(), but wrapped to map over subtrees - self._add_method_api() - - def _add_method_api(self): - # Add methods defined in Dataset's class definition to this classes API, but wrapped to map over descendants too - for method_name in self._DS_METHODS_TO_MAP_OVER_SUBTREES: - # Expose Dataset method, but wrapped to map over whole subtree - ds_method = getattr(Dataset, method_name) - setattr(self, method_name, map_over_subtree(ds_method)) + # TODO this has to be + self._add_all_dataset_api() - # Add a line to the method's docstring explaining how it's been mapped - ds_method_docstring = getattr(Dataset, f'{method_name}').__doc__ - if ds_method_docstring is not None: - updated_method_docstring = ds_method_docstring.replace('\n', self._MAPPED_DOCSTRING_ADDENDUM, 1) - setattr(self, f'{method_name}.__doc__', updated_method_docstring) + def _add_all_dataset_api(self): + # Add methods like .mean(), but wrapped to map over subtrees + self._add_dataset_methods() - # TODO map applied ufuncs over all leaves + # TODO add dataset ops here @property def ds(self) -> Dataset: @@ -284,7 +307,7 @@ def has_data(self): def _init_single_datatree_node( cls, name: Hashable, - data: Dataset = None, + data: Union[Dataset, DataArray] = None, parent: TreeNode = None, children: List[TreeNode] = None, ): @@ -312,6 +335,9 @@ def _init_single_datatree_node( obj = object.__new__(cls) obj = _init_single_treenode(obj, name=name, parent=parent, children=children) obj.ds = data + + obj._add_all_dataset_api() + return obj def __str__(self): @@ -586,13 +612,6 @@ def get_any(self, *tags: Hashable) -> DataTree: if any(tag in c.tags for tag in tags)} return DataTree(data_objects=matching_children) - @property - def chunks(self): - raise NotImplementedError - - def chunk(self): - raise NotImplementedError - def merge(self, datatree: DataTree) -> DataTree: """Merge all the leaves of a second DataTree into this one.""" raise NotImplementedError diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 9e3eb78e5df..e1db1c352e1 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -1,5 +1,7 @@ import pytest +import numpy as np + import xarray as xr from xarray.testing import assert_equal @@ -93,7 +95,20 @@ def test_no_data_no_properties(self): class TestDSMethodInheritance: - ... + def test_root(self): + da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') + dt = DataNode('root', data=da) + expected_ds = da.to_dataset().isel(x=1) + result_ds = dt.isel(x=1).ds + assert_equal(result_ds, expected_ds) + + def test_descendants(self): + da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') + dt = DataNode('root') + DataNode('results', parent=dt, data=da) + expected_ds = da.to_dataset().isel(x=1) + result_ds = dt.isel(x=1)['results'].ds + assert_equal(result_ds, expected_ds) class TestOps: @@ -101,4 +116,17 @@ class TestOps: class TestUFuncs: - ... 
+ def test_root(self): + da = xr.DataArray(name='a', data=[1, 2, 3]) + dt = DataNode('root', data=da) + expected_ds = np.sin(da.to_dataset()) + result_ds = np.sin(dt).ds + assert_equal(result_ds, expected_ds) + + def test_descendants(self): + da = xr.DataArray(name='a', data=[1, 2, 3]) + dt = DataNode('root') + DataNode('results', parent=dt, data=da) + expected_ds = np.sin(da.to_dataset()) + result_ds = np.sin(dt)['results'].ds + assert_equal(result_ds, expected_ds) From 15ef6d7c03ffc673e26f35c976d7fdd7580fd050 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 23 Aug 2021 23:42:57 -0400 Subject: [PATCH 031/260] define all dataset properties --- xarray/datatree_/datatree/datatree.py | 92 ++++++++++++++++++- .../datatree/tests/test_dataset_api.py | 2 + 2 files changed, 90 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 397307bc926..d624df65a0e 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -92,12 +92,32 @@ def _map_over_subtree(tree, *args, **kwargs): return _map_over_subtree +_DATASET_PROPERTIES_TO_EXPOSE = ['dims', 'variables', 'encoding', 'sizes', 'attrs', 'nbytes', 'indexes', 'xindexes', + 'xindexes', 'coords', 'data_vars', 'chunks', 'real', 'imag'] + + class DatasetPropertiesMixin: """Expose properties of wrapped Dataset""" - # TODO a neater / more succinct way of doing this? - # we wouldn't need it at all if we inherited directly from Dataset... + # We wouldn't need this at all if we inherited directly from Dataset... + + def _add_dataset_properties(self): + for prop_name in _DATASET_PROPERTIES_TO_EXPOSE: + prop = getattr(Dataset, prop_name) + # Expose Dataset property + # TODO needs to be wrapped with a decorator that checks if self.has_data + # TODO should we be using functools.partialmethod here instead? + # TODO is the property() wrapper needed? + setattr(self, prop_name, property(prop)) + + # Copy the docstring across unchanged + prop_docstring = prop.__doc__ + if prop_docstring: + dt_prop = getattr(self, prop_name) + setattr(dt_prop, '__doc__', prop_docstring) + + """ @property def dims(self): if self.has_data: @@ -133,6 +153,61 @@ def attrs(self): else: raise AttributeError("property is not defined for a node with no data") + + @property + def nbytes(self) -> int: + return sum(node.ds.nbytes for node in self.subtree_nodes) + + @property + def indexes(self): + if self.has_data: + return self.ds.indexes + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def xindexes(self): + if self.has_data: + return self.ds.xindexes + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def coords(self): + if self.has_data: + return self.ds.coords + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def data_vars(self): + if self.has_data: + return self.ds.data_vars + else: + raise AttributeError("property is not defined for a node with no data") + + # TODO should this instead somehow give info about the chunking of every node? 
+ @property + def chunks(self): + if self.has_data: + return self.ds.chunks + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def real(self): + if self.has_data: + return self.ds.real + else: + raise AttributeError("property is not defined for a node with no data") + + @property + def imag(self): + if self.has_data: + return self.ds.imag + else: + raise AttributeError("property is not defined for a node with no data") + # TODO .loc dims.__doc__ = Dataset.dims.__doc__ @@ -140,7 +215,13 @@ def attrs(self): encoding.__doc__ = Dataset.encoding.__doc__ sizes.__doc__ = Dataset.sizes.__doc__ attrs.__doc__ = Dataset.attrs.__doc__ + indexes.__doc__ = Dataset.indexes.__doc__ + xindexes.__doc__ = Dataset.xindexes.__doc__ + coords.__doc__ = Dataset.coords.__doc__ + data_vars.__doc__ = Dataset.data_vars.__doc__ + chunks.__doc__ = Dataset.chunks.__doc__ + """ _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " "call the method on the Datasets stored in every node of the subtree. " @@ -282,9 +363,12 @@ def __init__( self._add_all_dataset_api() def _add_all_dataset_api(self): - # Add methods like .mean(), but wrapped to map over subtrees + # Add methods like .isel(), but wrapped to map over subtrees self._add_dataset_methods() + # Add properties like .data_vars + self._add_dataset_properties() + # TODO add dataset ops here @property @@ -632,7 +716,7 @@ def merge_child_datasets( datasets = [self.get(path).ds for path in paths] return merge(datasets, compat=compat, join=join, fill_value=fill_value, combine_attrs=combine_attrs) - def as_dataarray(self) -> DataArray: + def as_array(self) -> DataArray: return self.ds.as_dataarray() @property diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index e1db1c352e1..c6d0f150da1 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -80,6 +80,7 @@ def test_properties(self): assert dt.sizes == dt.ds.sizes assert dt.variables == dt.ds.variables + def test_no_data_no_properties(self): dt = DataNode('root', data=None) with pytest.raises(AttributeError): @@ -115,6 +116,7 @@ class TestOps: ... +@pytest.mark.xfail class TestUFuncs: def test_root(self): da = xr.DataArray(name='a', data=[1, 2, 3]) From a105297819139820e1f76ce6f13c7f3ad5906f31 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 00:21:08 -0400 Subject: [PATCH 032/260] define method docstrings properly --- xarray/datatree_/datatree/datatree.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 397307bc926..19b7ba3f1b1 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -170,7 +170,8 @@ def method_name(self, *args, **kwargs): method_docstring = method.__doc__ if method_docstring is not None: updated_method_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) - setattr(obj, f'{method_name}.__doc__', updated_method_docstring) + obj_method = getattr(obj, method_name) + setattr(obj_method, '__doc__', updated_method_docstring) # TODO equals, broadcast_equals etc. 
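The docstring fix in the patch above works because `setattr(obj, f'{method_name}.__doc__', ...)` does not reach into the method at all; it just creates an attribute whose name happens to contain a dot, leaving the real `__doc__` untouched. Assigning to `__doc__` on the retrieved function object is what actually updates it. A small self-contained illustration with toy names (not the datatree code itself):

```python
def greet():
    """Original docstring."""
    return "hello"


class Holder:
    pass


holder = Holder()
holder.greet = greet

# Creates an attribute literally named "greet.__doc__"; the function's docstring is unchanged.
setattr(holder, "greet.__doc__", "Amended docstring.")
assert greet.__doc__ == "Original docstring."

# Fetching the function first and assigning to its __doc__ is what help() will actually see.
method = getattr(holder, "greet")
method.__doc__ = "Amended docstring."
assert greet.__doc__ == "Amended docstring."
```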
From 9c084cfd34c036073ffd3e093e8ff01918828af6 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 00:22:50 -0400 Subject: [PATCH 033/260] mark ufunc tests as failing because they aren't in the remit of this Mixin --- xarray/datatree_/datatree/tests/test_dataset_api.py | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index e1db1c352e1..99d03254d95 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -115,6 +115,7 @@ class TestOps: ... +@pytest.mark.xfail class TestUFuncs: def test_root(self): da = xr.DataArray(name='a', data=[1, 2, 3]) From c04c80cfeb3c15f032e45095c6c373684db531ff Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 00:48:47 -0400 Subject: [PATCH 034/260] use methods from DatasetArithmetic instead of DatasetOpsMixin --- xarray/datatree_/datatree/datatree.py | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index cb6bde5f4af..90310714bbd 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -12,6 +12,7 @@ from xarray.core.variable import Variable from xarray.core.combine import merge from xarray.core import dtypes, utils +from xarray.core.arithmetic import DatasetArithmetic from .treenode import TreeNode, PathType, _init_single_treenode @@ -204,10 +205,21 @@ def _add_dataset_methods(self): _expose_methods_wrapped_to_map_over_subtree(self, method_name, method) -# TODO implement ArrayReduce type methods +_ARITHMETIC_METHODS_TO_IGNORE = ['__class__', '__doc__', '__format__', '__repr__', '__slots__', '_binary_op', + '_unary_op', '_inplace_binary_op'] +_ALL_DATASET_ARITHMETIC_TO_EXPOSE = [(method_name, method) for method_name, method + in inspect.getmembers(DatasetArithmetic, inspect.isfunction) + if method_name not in _ARITHMETIC_METHODS_TO_IGNORE] -class DataTree(TreeNode, DatasetPropertiesMixin, DatasetMethodsMixin, DataTreeOpsMixin): +class DataTreeArithmetic: + # TODO is there a way to put this code in the class definition so we don't have to specifically call this method? + def _add_dataset_arithmetic(self): + for method_name, method in _ALL_DATASET_ARITHMETIC_TO_EXPOSE: + _expose_methods_wrapped_to_map_over_subtree(self, method_name, method) + + +class DataTree(TreeNode, DatasetPropertiesMixin, DatasetMethodsMixin, DataTreeArithmetic): """ A tree-like hierarchical collection of xarray objects. 
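For readers unfamiliar with the `inspect.getmembers` pattern in the hunk above: called with `inspect.isfunction` as the predicate, it returns `(name, function)` pairs for everything defined with `def` on a class, which can then be filtered against an ignore list before wrapping. A standalone sketch, with a toy `Arithmetic` class standing in for xarray's `DatasetArithmetic`:

```python
import inspect


class Arithmetic:
    # Toy stand-in for xarray's DatasetArithmetic.
    def __add__(self, other): ...
    def __neg__(self): ...
    def _binary_op(self, other, f): ...


IGNORE = {"__class__", "__doc__", "__format__", "__repr__", "__slots__",
          "_binary_op", "_unary_op", "_inplace_binary_op"}

# (name, function) pairs defined on the class, minus the ignore list.
to_expose = [(name, func)
             for name, func in inspect.getmembers(Arithmetic, inspect.isfunction)
             if name not in IGNORE]

print([name for name, _ in to_expose])  # ['__add__', '__neg__']
```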
@@ -285,7 +297,7 @@ def _add_all_dataset_api(self): self._add_dataset_methods() # Add operations like __add__, but wrapped to map over subtrees - self._add_dataset_ops() + self._add_dataset_arithmetic() @property def ds(self) -> Dataset: From 5a71cbdf0a57ceb15447ae0a6792da0fb9f83fcb Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 10:48:36 -0400 Subject: [PATCH 035/260] add netcdf4 to dependencies --- xarray/datatree_/setup.py | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 03a44eed978..1ab01d6dc41 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -2,6 +2,7 @@ install_requires = [ "xarray>=0.19.0", + "netcdf4" "anytree", "future", ] From 478f193254e68065ae578115793556f91976ae9e Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 12:18:55 -0400 Subject: [PATCH 036/260] just do it the manual way for now --- xarray/datatree_/datatree/datatree.py | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index d624df65a0e..02b1b4246d6 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -99,25 +99,9 @@ def _map_over_subtree(tree, *args, **kwargs): class DatasetPropertiesMixin: """Expose properties of wrapped Dataset""" + # TODO a neater way of setting all of these? # We wouldn't need this at all if we inherited directly from Dataset... - def _add_dataset_properties(self): - for prop_name in _DATASET_PROPERTIES_TO_EXPOSE: - prop = getattr(Dataset, prop_name) - - # Expose Dataset property - # TODO needs to be wrapped with a decorator that checks if self.has_data - # TODO should we be using functools.partialmethod here instead? - # TODO is the property() wrapper needed? - setattr(self, prop_name, property(prop)) - - # Copy the docstring across unchanged - prop_docstring = prop.__doc__ - if prop_docstring: - dt_prop = getattr(self, prop_name) - setattr(dt_prop, '__doc__', prop_docstring) - - """ @property def dims(self): if self.has_data: @@ -221,7 +205,6 @@ def imag(self): data_vars.__doc__ = Dataset.data_vars.__doc__ chunks.__doc__ = Dataset.chunks.__doc__ - """ _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " "call the method on the Datasets stored in every node of the subtree. 
" @@ -366,9 +349,6 @@ def _add_all_dataset_api(self): # Add methods like .isel(), but wrapped to map over subtrees self._add_dataset_methods() - # Add properties like .data_vars - self._add_dataset_properties() - # TODO add dataset ops here @property From 6df675e714e9f7df570d61cf2fd1150691834e5d Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 12:24:27 -0400 Subject: [PATCH 037/260] remove list of dataset properties to add --- xarray/datatree_/datatree/datatree.py | 4 ---- 1 file changed, 4 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index c23ade40564..c5efab1188b 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -92,10 +92,6 @@ def _map_over_subtree(tree, *args, **kwargs): return _map_over_subtree -_DATASET_PROPERTIES_TO_EXPOSE = ['dims', 'variables', 'encoding', 'sizes', 'attrs', 'nbytes', 'indexes', 'xindexes', - 'xindexes', 'coords', 'data_vars', 'chunks', 'real', 'imag'] - - class DatasetPropertiesMixin: """Expose properties of wrapped Dataset""" From 0cdcdab2468dc06ff8ed0b2eeebb25dc6d8f9eac Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Tue, 24 Aug 2021 12:57:07 -0400 Subject: [PATCH 038/260] Update status of project in readme --- xarray/datatree_/README.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 1b72f3b560f..30c564e3111 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -1,2 +1,20 @@ # datatree WIP implementation of a tree-like hierarchical data structure for xarray. + +This aims to create the data structure discussed in [xarray issue #4118](https://github.com/pydata/xarray/issues/4118), and therefore extend xarray's data model to be able to [handle arbitrarily nested netCDF4 groups](https://github.com/pydata/xarray/issues/1092#issuecomment-868324949). + + +The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. The differences are that this effort: +- [Uses a NodeMixin from anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree structure, +- Implements path-like and tag-like getting and setting, +- Has functions for mapping user-supplied functions over every node in the tree, +- Automatically dispatches *some* of `xarray.Dataset`'s API over every node in the tree (such as `.isel`), +- Has a bunch of tests, +- Has a printable representation that currently looks like this: +drawing + +You can create a `DataTree` object in 3 ways: +1) Load from a netCDF file that has groups via `open_datatree()`, +2) Using the init method of `DataTree`, which accepts a nested dictionary of Datasets, +3) Manually create individual nodes with `DataNode()` and specify their relationships to each other, either by setting `.parent` and `.chlldren` attributes, or through `__get/setitem__` access, e.g. 
+`dt['path/to/node'] = xr.Dataset()` From 448ead46d969a2d390b906cdd2e2eefa337bafc0 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 16:49:17 -0400 Subject: [PATCH 039/260] add_api_in_class_definition --- xarray/datatree_/datatree/datatree.py | 114 ++++++++++++-------------- 1 file changed, 52 insertions(+), 62 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index c5efab1188b..b2cc8602919 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -12,7 +12,6 @@ from xarray.core.variable import Variable from xarray.core.combine import merge from xarray.core import dtypes, utils -from xarray.core._typed_ops import DatasetOpsMixin from .treenode import TreeNode, PathType, _init_single_treenode @@ -188,7 +187,7 @@ def imag(self): else: raise AttributeError("property is not defined for a node with no data") - # TODO .loc + # TODO .loc, __contains__, __iter__, __array__, '__len__', dims.__doc__ = Dataset.dims.__doc__ variables.__doc__ = Dataset.variables.__doc__ @@ -207,68 +206,71 @@ def imag(self): "See the `map_over_subtree` decorator for more details.", width=117) -def _expose_methods_wrapped_to_map_over_subtree(obj, method_name, method): +def _wrap_then_attach_to_cls(cls_dict, methods_to_expose, wrap_func=None): """ - Expose given method on node object, but wrapped to map over whole subtree, not just that node object. - - Result is like having written this in obj's class definition: + Attach given methods on a class, and optionally wrap each method first. (i.e. with map_over_subtree) + Result is like having written this in the classes' definition: ``` - @map_over_subtree + @wrap_func def method_name(self, *args, **kwargs): return self.method(*args, **kwargs) ``` - """ - - # Expose Dataset method, but wrapped to map over whole subtree when called - # TODO should we be using functools.partialmethod here instead? - mapped_over_tree = functools.partial(map_over_subtree(method), obj) - setattr(obj, method_name, mapped_over_tree) - - # TODO do we really need this for ops like __add__? - # Add a line to the method's docstring explaining how it's been mapped - method_docstring = method.__doc__ - if method_docstring is not None: - updated_method_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) - obj_method = getattr(obj, method_name) - setattr(obj_method, '__doc__', updated_method_docstring) + Parameters + ---------- + cls_dict + The __dict__ attribute of a class, which can also be accessed by calling vars() from within that classes' + definition. + methods_to_expose : Iterable[Tuple[str, callable]] + The method names and definitions supplied as a list of (method_name_string, method) pairs.\ + This format matches the output of inspect.getmembers(). + """ + for method_name, method in methods_to_expose: + wrapped_method = wrap_func(method) if wrap_func is not None else method + cls_dict[method_name] = wrapped_method -# TODO equals, broadcast_equals etc. -# TODO do dask-related private methods need to be exposed? 
-_DATASET_DASK_METHODS_TO_EXPOSE = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] -_DATASET_METHODS_TO_EXPOSE = ['copy', 'as_numpy', '__copy__', '__deepcopy__', '__contains__', '__len__', - '__bool__', '__iter__', '__array__', 'set_coords', 'reset_coords', 'info', - 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', - 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', - 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', - 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', - 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', - 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', - 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', - 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] -_DATASET_OPS_TO_EXPOSE = ['_unary_op', '_binary_op', '_inplace_binary_op'] -_ALL_DATASET_METHODS_TO_EXPOSE = _DATASET_DASK_METHODS_TO_EXPOSE + _DATASET_METHODS_TO_EXPOSE + _DATASET_OPS_TO_EXPOSE - -# TODO methods which should not or cannot act over the whole tree, such as .to_array - + # TODO do we really need this for ops like __add__? + # Add a line to the method's docstring explaining how it's been mapped + method_docstring = method.__doc__ + if method_docstring is not None: + updated_method_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) + setattr(cls_dict[method_name], '__doc__', updated_method_docstring) -class DatasetMethodsMixin: - """Mixin to add Dataset methods like .mean(), but wrapped to map over all nodes in the subtree.""" - # TODO is there a way to put this code in the class definition so we don't have to specifically call this method? - def _add_dataset_methods(self): - methods_to_expose = [(method_name, getattr(Dataset, method_name)) - for method_name in _ALL_DATASET_METHODS_TO_EXPOSE] +class MappedDatasetMethodsMixin: + """ + Mixin to add Dataset methods like .mean(), but wrapped to map over all nodes in the subtree. - for method_name, method in methods_to_expose: - _expose_methods_wrapped_to_map_over_subtree(self, method_name, method) + Every method wrapped here needs to have a return value of Dataset or DataArray in order to construct a new tree. + """ + # TODO equals, broadcast_equals etc. + # TODO do dask-related private methods need to be exposed? + _DATASET_DASK_METHODS_TO_EXPOSE = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] + _DATASET_METHODS_TO_EXPOSE = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', + 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', + 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', + 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', + 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', + 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', + 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', + 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', + 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] + # TODO unsure if these are called by external functions or not? 
+ _DATASET_OPS_TO_EXPOSE = ['_unary_op', '_binary_op', '_inplace_binary_op'] + _ALL_DATASET_METHODS_TO_EXPOSE = _DATASET_DASK_METHODS_TO_EXPOSE + _DATASET_METHODS_TO_EXPOSE + _DATASET_OPS_TO_EXPOSE + + # TODO methods which should not or cannot act over the whole tree, such as .to_array + + methods_to_expose = [(method_name, getattr(Dataset, method_name)) + for method_name in _ALL_DATASET_METHODS_TO_EXPOSE] + _wrap_then_attach_to_cls(vars(), methods_to_expose, wrap_func=map_over_subtree) # TODO implement ArrayReduce type methods -class DataTree(TreeNode, DatasetPropertiesMixin, DatasetMethodsMixin): +class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin): """ A tree-like hierarchical collection of xarray objects. @@ -339,15 +341,6 @@ def __init__( new_node = self.get_node(path) new_node[path] = data - # TODO this has to be - self._add_all_dataset_api() - - def _add_all_dataset_api(self): - # Add methods like .isel(), but wrapped to map over subtrees - self._add_dataset_methods() - - # TODO add dataset ops here - @property def ds(self) -> Dataset: return self._ds @@ -396,9 +389,6 @@ def _init_single_datatree_node( obj = object.__new__(cls) obj = _init_single_treenode(obj, name=name, parent=parent, children=children) obj.ds = data - - obj._add_all_dataset_api() - return obj def __str__(self): @@ -435,7 +425,7 @@ def _single_node_repr(self): def __repr__(self): """Information about this node, including its relationships to other nodes.""" # TODO redo this to look like the Dataset repr, but just with child and parent info - parent = self.parent.name if self.parent else "None" + parent = self.parent.name if self.parent is not None else "None" node_str = f"DataNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," if self.has_data: @@ -554,7 +544,7 @@ def __setitem__( except anytree.resolver.ResolverError: existing_node = None - if existing_node: + if existing_node is not None: if isinstance(value, Dataset): # replace whole dataset existing_node.ds = Dataset From 51e89d347c7c2fbf4e7087f9e99f888ee6e25176 Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Tue, 24 Aug 2021 17:26:31 -0700 Subject: [PATCH 040/260] add basic ci setup --- xarray/datatree_/.github/dependabot.yml | 11 +++ xarray/datatree_/.github/workflows/main.yaml | 81 +++++++++++++++++++ .../.github/workflows/pypipublish.yaml | 26 ++++++ xarray/datatree_/.pre-commit-config.yaml | 56 +++++++++++++ xarray/datatree_/dev-requirements.txt | 5 ++ xarray/datatree_/requirements.txt | 4 + xarray/datatree_/setup.py | 28 +++---- 7 files changed, 196 insertions(+), 15 deletions(-) create mode 100644 xarray/datatree_/.github/dependabot.yml create mode 100644 xarray/datatree_/.github/workflows/main.yaml create mode 100644 xarray/datatree_/.github/workflows/pypipublish.yaml create mode 100644 xarray/datatree_/.pre-commit-config.yaml create mode 100644 xarray/datatree_/dev-requirements.txt create mode 100644 xarray/datatree_/requirements.txt diff --git a/xarray/datatree_/.github/dependabot.yml b/xarray/datatree_/.github/dependabot.yml new file mode 100644 index 00000000000..d1d1190be70 --- /dev/null +++ b/xarray/datatree_/.github/dependabot.yml @@ -0,0 +1,11 @@ +version: 2 +updates: + - package-ecosystem: pip + directory: "/" + schedule: + interval: daily + - package-ecosystem: "github-actions" + directory: "/" + schedule: + # Check for updates to GitHub Actions every weekday + interval: "daily" diff --git a/xarray/datatree_/.github/workflows/main.yaml 
b/xarray/datatree_/.github/workflows/main.yaml new file mode 100644 index 00000000000..e8d8f3d4cd8 --- /dev/null +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -0,0 +1,81 @@ +name: CI + +on: + push: + branches: "*" + pull_request: + branches: main + schedule: + - cron: "0 0 * * *" + +jobs: + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2.3.4 + - uses: actions/setup-python@v2.2.2 + - uses: pre-commit/action@v2.0.3 + + test: + name: ${{ matrix.python-version }}-build + runs-on: ubuntu-latest + strategy: + matrix: + python-version: [3.7, 3.8, 3.9] + steps: + - uses: actions/checkout@v2.3.4 + - name: Setup Python + uses: actions/setup-python@v2.2.2 + with: + python-version: ${{ matrix.python-version }} + architecture: x64 + - uses: actions/cache@v2.1.6 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/dev-requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip- + - run: | + python -m pip install -r dev-requirements.txt + python -m pip install --no-deps -e . + python -m pip list + - name: Running Tests + run: | + python -m pytest --cov=./ --cov-report=xml --verbose + - name: Upload coverage to Codecov + uses: codecov/codecov-action@v2.0.2 + if: ${{ matrix.python-version }} == 3.8 + with: + file: ./coverage.xml + fail_ci_if_error: false + + test-upstream: + name: ${{ matrix.python-version }}-dev-build + runs-on: ubuntu-latest + strategy: + matrix: + python-version: [3.8, 3.9] + steps: + - uses: actions/checkout@v2.3.4 + - name: Setup Python + uses: actions/setup-python@v2.2.2 + with: + python-version: ${{ matrix.python-version }} + architecture: x64 + - uses: actions/cache@v2.1.6 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/*requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip- + - run: | + python -m pip install -r dev-requirements.txt + python -m pip install --no-deps --upgrade \ + git+https://github.com/pydata/xarray \ + git+https://github.com/Unidata/netcdf4-python \ + git+https://github.com/c0fec0de/anytree + python -m pip install --no-deps -e . 
+ python -m pip list + - name: Running Tests + run: | + python -m pytest --verbose diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml new file mode 100644 index 00000000000..986315cae97 --- /dev/null +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -0,0 +1,26 @@ +name: Upload Python Package + +on: + release: + types: [created] + +jobs: + deploy: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2.3.4 + - name: Set up Python + uses: actions/setup-python@v2.2.1 + with: + python-version: "3.x" + - name: Install dependencies + run: | + python -m pip install --upgrade pip + python -m pip install setuptools setuptools-scm wheel twine + - name: Build and publish + env: + TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} + TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} + run: | + python setup.py sdist bdist_wheel + twine upload dist/* diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml new file mode 100644 index 00000000000..53525d0def9 --- /dev/null +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -0,0 +1,56 @@ +# https://pre-commit.com/ +repos: + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.0.1 + hooks: + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + # isort should run before black as black sometimes tweaks the isort output + - repo: https://github.com/PyCQA/isort + rev: 5.9.3 + hooks: + - id: isort + # https://github.com/python/black#version-control-integration + - repo: https://github.com/psf/black + rev: 21.7b0 + hooks: + - id: black + - repo: https://github.com/keewis/blackdoc + rev: v0.3.4 + hooks: + - id: blackdoc + - repo: https://gitlab.com/pycqa/flake8 + rev: 3.9.2 + hooks: + - id: flake8 + # - repo: https://github.com/Carreau/velin + # rev: 0.0.8 + # hooks: + # - id: velin + # args: ["--write", "--compact"] + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v0.910 + hooks: + - id: mypy + # Copied from setup.cfg + exclude: "properties|asv_bench" + additional_dependencies: [ + # Type stubs + types-python-dateutil, + types-pkg_resources, + types-PyYAML, + types-pytz, + # Dependencies that are typed + numpy, + typing-extensions==3.10.0.0, + ] + # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 + # - repo: https://github.com/asottile/pyupgrade + # rev: v1.22.1 + # hooks: + # - id: pyupgrade + # args: + # - "--py3-only" + # # remove on f-strings in Py3.7 + # - "--keep-percent-format" diff --git a/xarray/datatree_/dev-requirements.txt b/xarray/datatree_/dev-requirements.txt new file mode 100644 index 00000000000..349f188deb9 --- /dev/null +++ b/xarray/datatree_/dev-requirements.txt @@ -0,0 +1,5 @@ +pytest +flake8 +black +codecov +-r requirements.txt diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt new file mode 100644 index 00000000000..67e19d194b6 --- /dev/null +++ b/xarray/datatree_/requirements.txt @@ -0,0 +1,4 @@ +xarray>=0.19.0 +netcdf4 +anytree +future diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 1ab01d6dc41..bbb053554d7 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -1,20 +1,16 @@ +from os.path import exists from setuptools import find_packages, setup -install_requires = [ - "xarray>=0.19.0", - "netcdf4" - "anytree", - "future", -] -extras_require = {'tests': - [ - "pytest", - "flake8", - "black", - "codecov", - ] -} +with open('requirements.txt') as f: + install_requires = 
f.read().strip().split('\n') + +if exists('README.rst'): + with open('README.rst') as f: + long_description = f.read() +else: + long_description = '' + setup( name="datatree", @@ -29,11 +25,13 @@ "Topic :: Scientific/Engineering", "License :: OSI Approved :: Apache License", "Operating System :: OS Independent", + "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", ], packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, - extras_require=extras_require, python_requires=">=3.7", setup_requires="setuptools_scm", use_scm_version={ From f160cc2ad2a69de6920e5a27a63c2add0cfb8c03 Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Tue, 24 Aug 2021 17:30:18 -0700 Subject: [PATCH 041/260] add long description from readme --- xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/setup.py | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index ef4e01b5a5e..c7d99fbfbfd 100644 --- a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev9+g805d97f.d20210817" \ No newline at end of file +__version__ = "0.1.dev46+g415cbb7.d20210825" \ No newline at end of file diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index bbb053554d7..1110c1a3ea5 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -15,6 +15,7 @@ setup( name="datatree", description="Hierarchical tree-like data structures for xarray", + long_description=long_description, url="https://github.com/TomNicholas/datatree", author="Thomas Nicholas", author_email="thomas.nicholas@columbia.edu", From 4cd5e2d8157b219ac1078a6ea6050d4711e43c66 Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Tue, 24 Aug 2021 17:43:36 -0700 Subject: [PATCH 042/260] switch to conda --- xarray/datatree_/.github/workflows/main.yaml | 66 +++++++++++--------- xarray/datatree_/ci/environment.yml | 12 ++++ 2 files changed, 50 insertions(+), 28 deletions(-) create mode 100644 xarray/datatree_/ci/environment.yml diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index e8d8f3d4cd8..6d3e7dbeb05 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -24,30 +24,35 @@ jobs: python-version: [3.7, 3.8, 3.9] steps: - uses: actions/checkout@v2.3.4 - - name: Setup Python - uses: actions/setup-python@v2.2.2 + - uses: conda-incubator/setup-miniconda@v2 with: + mamba-version: "*" + auto-update-conda: true python-version: ${{ matrix.python-version }} - architecture: x64 - - uses: actions/cache@v2.1.6 - with: - path: ~/.cache/pip - key: ${{ runner.os }}-pip-${{ hashFiles('**/dev-requirements.txt') }} - restore-keys: | - ${{ runner.os }}-pip- - - run: | - python -m pip install -r dev-requirements.txt + auto-activate-base: false + activate-environment: datatree + environment-file: ci/environment.yml + - name: Conda info + shell: bash -l {0} + run: conda info + - name: Conda list + shell: bash -l {0} + run: conda list + - name: Install datatree + shell: bash -l {0} + run: | python -m pip install --no-deps -e . 
python -m pip list - name: Running Tests + shell: bash -l {0} run: | python -m pytest --cov=./ --cov-report=xml --verbose - - name: Upload coverage to Codecov - uses: codecov/codecov-action@v2.0.2 - if: ${{ matrix.python-version }} == 3.8 - with: - file: ./coverage.xml - fail_ci_if_error: false + # - name: Upload coverage to Codecov + # uses: codecov/codecov-action@v2.0.2 + # if: ${{ matrix.python-version }} == 3.8 + # with: + # file: ./coverage.xml + # fail_ci_if_error: false test-upstream: name: ${{ matrix.python-version }}-dev-build @@ -57,19 +62,23 @@ jobs: python-version: [3.8, 3.9] steps: - uses: actions/checkout@v2.3.4 - - name: Setup Python - uses: actions/setup-python@v2.2.2 + - uses: conda-incubator/setup-miniconda@v2 with: + mamba-version: "*" + auto-update-conda: true python-version: ${{ matrix.python-version }} - architecture: x64 - - uses: actions/cache@v2.1.6 - with: - path: ~/.cache/pip - key: ${{ runner.os }}-pip-${{ hashFiles('**/*requirements.txt') }} - restore-keys: | - ${{ runner.os }}-pip- - - run: | - python -m pip install -r dev-requirements.txt + auto-activate-base: false + activate-environment: datatree + environment-file: ci/environment.yml + - name: Conda info + shell: bash -l {0} + run: conda info + - name: Conda list + shell: bash -l {0} + run: conda list + - name: Install dev reqs + shell: bash -l {0} + run: | python -m pip install --no-deps --upgrade \ git+https://github.com/pydata/xarray \ git+https://github.com/Unidata/netcdf4-python \ @@ -77,5 +86,6 @@ jobs: python -m pip install --no-deps -e . python -m pip list - name: Running Tests + shell: bash -l {0} run: | python -m pytest --verbose diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml new file mode 100644 index 00000000000..7b903d7ded3 --- /dev/null +++ b/xarray/datatree_/ci/environment.yml @@ -0,0 +1,12 @@ +name: datatree +channels: + - conda-forge + - nodefaults +dependencies: + - xarray >=0.19.0 + - netcdf4 + - anytree + - pytest + - flake8 + - black + - codecov From 5bcb5c03159142ecc5886a7e5b069529c7de72ff Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Tue, 24 Aug 2021 17:46:15 -0700 Subject: [PATCH 043/260] add pytest-cov --- xarray/datatree_/ci/environment.yml | 1 + xarray/datatree_/dev-requirements.txt | 1 + 2 files changed, 2 insertions(+) diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index 7b903d7ded3..8486fc927d6 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -10,3 +10,4 @@ dependencies: - flake8 - black - codecov + - pytest-cov diff --git a/xarray/datatree_/dev-requirements.txt b/xarray/datatree_/dev-requirements.txt index 349f188deb9..57209c776a5 100644 --- a/xarray/datatree_/dev-requirements.txt +++ b/xarray/datatree_/dev-requirements.txt @@ -2,4 +2,5 @@ pytest flake8 black codecov +pytest-cov -r requirements.txt From d5035d848369cbf26e258e920cf67822ebf41078 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 21:09:00 -0400 Subject: [PATCH 044/260] now also inherits from a mapped version of DataWithCoords --- xarray/datatree_/datatree/datatree.py | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 74d728eef0f..bfc4ea1ed62 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -134,7 +134,6 @@ def attrs(self): else: raise AttributeError("property is not defined for a node with no 
data") - @property def nbytes(self) -> int: return sum(node.ds.nbytes for node in self.subtree_nodes) @@ -252,8 +251,8 @@ class MappedDatasetMethodsMixin: # TODO equals, broadcast_equals etc. # TODO do dask-related private methods need to be exposed? - _DATASET_DASK_METHODS_TO_EXPOSE = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] - _DATASET_METHODS_TO_EXPOSE = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', + _DATASET_DASK_METHODS_TO_MAP = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] + _DATASET_METHODS_TO_MAP = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', @@ -263,12 +262,21 @@ class MappedDatasetMethodsMixin: 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] # TODO unsure if these are called by external functions or not? - _DATASET_OPS_TO_EXPOSE = ['_unary_op', '_binary_op', '_inplace_binary_op'] - _ALL_DATASET_METHODS_TO_EXPOSE = _DATASET_DASK_METHODS_TO_EXPOSE + _DATASET_METHODS_TO_EXPOSE + _DATASET_OPS_TO_EXPOSE + _DATASET_OPS_TO_MAP = ['_unary_op', '_binary_op', '_inplace_binary_op'] + _ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP # TODO methods which should not or cannot act over the whole tree, such as .to_array - methods_to_wrap = [(method_name, getattr(Dataset, method_name)) for method_name in _ALL_DATASET_METHODS_TO_EXPOSE] + methods_to_wrap = [(method_name, getattr(Dataset, method_name)) for method_name in _ALL_DATASET_METHODS_TO_MAP] + _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) + + +class MappedDataWithCoords(DataWithCoords): + # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample, + _DATA_WITH_COORDS_METHODS_TO_MAP = ['squeeze', 'clip', 'assign_coords', 'where', 'close', 'isnull', 'notnull', + 'isin', 'astype'] + methods_to_wrap = [(method_name, getattr(DataWithCoords, method_name)) + for method_name in _DATA_WITH_COORDS_METHODS_TO_MAP] _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) @@ -292,10 +300,7 @@ class DataTreeArithmetic(DatasetArithmetic): _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) -# TODO also inherit from DataWithCoords? (will require it's own mapped version to mixin) -# TODO inherit from AttrsAccessMixin? (which is a superclass of DataWithCoords - -class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, DataTreeArithmetic): +class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmetic): """ A tree-like hierarchical collection of xarray objects. 
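The mixins in the patch above (and the refactor in the next one) lean on a small class-body trick: `vars()` evaluated inside a class body returns the namespace dict that the class is being built from, and that dict is writable, so a helper can inject wrapped methods while the `class` statement is still executing. A self-contained sketch with toy names, where `shout` stands in for `map_over_subtree` and the helper signature is simplified relative to `_wrap_then_attach_to_cls`:

```python
import functools


def shout(func):
    # Stand-in for map_over_subtree: wrap a method while preserving its metadata.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return str(func(*args, **kwargs)).upper()
    return wrapper


def _wrap_then_attach(target_cls_dict, source_cls, names, wrap_func=None):
    # target_cls_dict is the class-body namespace, so new entries become methods of the class.
    for name in names:
        method = getattr(source_cls, name)
        target_cls_dict[name] = wrap_func(method) if wrap_func is not None else method


class Source:
    def hello(self):
        return "hello"


class Wrapped:
    # vars() here is the namespace of the class currently being defined.
    _wrap_then_attach(vars(), Source, ["hello"], wrap_func=shout)


assert Wrapped().hello() == "HELLO"
```

The real helper additionally patches each wrapped method's `__doc__` with the mapped-over-subtree addendum, which this sketch omits.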
From e784f913d3b193b75e4a928f1df1f4eb3239efe7 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 21:09:00 -0400 Subject: [PATCH 045/260] now also inherits from a mapped version of DataWithCoords --- xarray/datatree_/datatree/datatree.py | 86 +++++++++++++-------------- 1 file changed, 43 insertions(+), 43 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index bfc4ea1ed62..b1a0a0336ea 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,7 +1,6 @@ from __future__ import annotations import functools import textwrap -import inspect from typing import Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict @@ -14,6 +13,7 @@ from xarray.core import dtypes, utils from xarray.core.common import DataWithCoords from xarray.core.arithmetic import DatasetArithmetic +from xarray.core.ops import NUM_BINARY_OPS, NUMPY_SAME_METHODS, REDUCE_METHODS, NAN_REDUCE_METHODS, NAN_CUM_METHODS from .treenode import TreeNode, PathType, _init_single_treenode @@ -48,7 +48,8 @@ def map_over_subtree(func): The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the descendant nodes. The returned tree will have the same structure as the original subtree. - func needs to return a Dataset in order to rebuild the subtree. + func needs to return a Dataset, DataArray, or None in order to be able to rebuild the subtree after mapping, as each + result will be assigned to its respective node of new tree via `DataTree.__setitem__`. Parameters ---------- @@ -204,10 +205,10 @@ def imag(self): _MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " "call the method on the Datasets stored in every node of the subtree. " - "See the `map_over_subtree` decorator for more details.", width=117) + "See the `map_over_subtree` function for more details.", width=117) -def _wrap_then_attach_to_cls(cls_dict, methods_to_expose, wrap_func=None): +def _wrap_then_attach_to_cls(target_cls_dict, source_cls, methods_to_set, wrap_func=None): """ Attach given methods on a class, and optionally wrap each method first. (i.e. with map_over_subtree) @@ -220,25 +221,32 @@ def method_name(self, *args, **kwargs): Parameters ---------- - cls_dict - The __dict__ attribute of a class, which can also be accessed by calling vars() from within that classes' - definition. - methods_to_expose : Iterable[Tuple[str, callable]] - The method names and definitions supplied as a list of (method_name_string, method) pairs.\ + target_cls_dict : MappingProxy + The __dict__ attribute of the class which we want the methods to be added to. (The __dict__ attribute can also + be accessed by calling vars() from within that classes' definition.) This will be updated by this function. + source_cls : class + Class object from which we want to copy methods (and optionally wrap them). Should be the actual class object + (or instance), not just the __dict__. + methods_to_set : Iterable[Tuple[str, callable]] + The method names and definitions supplied as a list of (method_name_string, method) pairs. This format matches the output of inspect.getmembers(). wrap_func : callable, optional Function to decorate each method with. Must have the same return type as the method. 
""" - for method_name, method in methods_to_expose: - wrapped_method = wrap_func(method) if wrap_func is not None else method - cls_dict[method_name] = wrapped_method - - # TODO do we really need this for ops like __add__? - # Add a line to the method's docstring explaining how it's been mapped - method_docstring = method.__doc__ - if method_docstring is not None: - updated_method_docstring = method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) - setattr(cls_dict[method_name], '__doc__', updated_method_docstring) + for method_name in methods_to_set: + orig_method = getattr(source_cls, method_name) + wrapped_method = wrap_func(orig_method) if wrap_func is not None else orig_method + target_cls_dict[method_name] = wrapped_method + + if wrap_func is map_over_subtree: + # Add a paragraph to the method's docstring explaining how it's been mapped + orig_method_docstring = orig_method.__doc__ + if orig_method_docstring is not None: + if '\n' in orig_method_docstring: + new_method_docstring = orig_method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) + else: + new_method_docstring = orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" + setattr(target_cls_dict[method_name], '__doc__', new_method_docstring) class MappedDatasetMethodsMixin: @@ -253,51 +261,43 @@ class MappedDatasetMethodsMixin: # TODO do dask-related private methods need to be exposed? _DATASET_DASK_METHODS_TO_MAP = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] _DATASET_METHODS_TO_MAP = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', - 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', - 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', - 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', - 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', - 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', - 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', - 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', - 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] + 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', + 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', + 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', + 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', + 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', + 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', + 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', + 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] # TODO unsure if these are called by external functions or not? 
_DATASET_OPS_TO_MAP = ['_unary_op', '_binary_op', '_inplace_binary_op'] _ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP # TODO methods which should not or cannot act over the whole tree, such as .to_array - methods_to_wrap = [(method_name, getattr(Dataset, method_name)) for method_name in _ALL_DATASET_METHODS_TO_MAP] - _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) + _wrap_then_attach_to_cls(vars(), Dataset, _ALL_DATASET_METHODS_TO_MAP, wrap_func=map_over_subtree) class MappedDataWithCoords(DataWithCoords): - # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample, + # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample + # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes _DATA_WITH_COORDS_METHODS_TO_MAP = ['squeeze', 'clip', 'assign_coords', 'where', 'close', 'isnull', 'notnull', 'isin', 'astype'] - methods_to_wrap = [(method_name, getattr(DataWithCoords, method_name)) - for method_name in _DATA_WITH_COORDS_METHODS_TO_MAP] - _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) - - -# TODO no idea why if I put this line in the definition of DataTreeArithmetic it says it's not defined -_ARITHMETIC_METHODS_TO_IGNORE = ['__class__', '__doc__', '__format__', '__repr__', '__slots__', '_binary_op', - '_unary_op', '_inplace_binary_op', '__bool__', 'float'] + _wrap_then_attach_to_cls(vars(), DataWithCoords, _DATA_WITH_COORDS_METHODS_TO_MAP, wrap_func=map_over_subtree) class DataTreeArithmetic(DatasetArithmetic): """ Mixin to add Dataset methods like __add__ and .mean() - Some of these method must be wrapped to map over all nodes in the subtree. Others are fine unaltered (normally + Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine unaltered (normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new tree) and some will get overridden by the class definition of DataTree. """ - methods_to_wrap = [(method_name, method) - for method_name, method in inspect.getmembers(DatasetArithmetic, inspect.isfunction) - if method_name not in _ARITHMETIC_METHODS_TO_IGNORE] - _wrap_then_attach_to_cls(vars(), methods_to_wrap, wrap_func=map_over_subtree) + # TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... 
+ _ARITHMETIC_METHODS_TO_WRAP = ['__array_ufunc__'] + REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + _wrap_then_attach_to_cls(vars(), DatasetArithmetic, _ARITHMETIC_METHODS_TO_WRAP, wrap_func=map_over_subtree) class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmetic): From 0c3ceabc9d7c6b7be54c2b6193623f7d2b8a7590 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 22:32:35 -0400 Subject: [PATCH 046/260] dont try and import ops that we cant define on a dataset --- xarray/datatree_/datatree/datatree.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b1a0a0336ea..d2b3699fa18 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -13,7 +13,7 @@ from xarray.core import dtypes, utils from xarray.core.common import DataWithCoords from xarray.core.arithmetic import DatasetArithmetic -from xarray.core.ops import NUM_BINARY_OPS, NUMPY_SAME_METHODS, REDUCE_METHODS, NAN_REDUCE_METHODS, NAN_CUM_METHODS +from xarray.core.ops import REDUCE_METHODS, NAN_REDUCE_METHODS, NAN_CUM_METHODS from .treenode import TreeNode, PathType, _init_single_treenode From 4eed833e240ad2888edd9eda45c367329f14da7b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 23:23:02 -0400 Subject: [PATCH 047/260] lists of methods to define shouldn't be stored as attributes --- xarray/datatree_/datatree/datatree.py | 49 ++++++++++++++------------- 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index d2b3699fa18..081e7117e52 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -207,6 +207,29 @@ def imag(self): "call the method on the Datasets stored in every node of the subtree. " "See the `map_over_subtree` function for more details.", width=117) +# TODO equals, broadcast_equals etc. +# TODO do dask-related private methods need to be exposed? +_DATASET_DASK_METHODS_TO_MAP = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] +_DATASET_METHODS_TO_MAP = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', + 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', + 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', + 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', + 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', + 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', + 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', + 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', + 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] +# TODO unsure if these are called by external functions or not? +_DATASET_OPS_TO_MAP = ['_unary_op', '_binary_op', '_inplace_binary_op'] +_ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP + +_DATA_WITH_COORDS_METHODS_TO_MAP = ['squeeze', 'clip', 'assign_coords', 'where', 'close', 'isnull', 'notnull', + 'isin', 'astype'] + +# TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... 
+#['__array_ufunc__'] \ +_ARITHMETIC_METHODS_TO_WRAP = REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + def _wrap_then_attach_to_cls(target_cls_dict, source_cls, methods_to_set, wrap_func=None): """ @@ -256,33 +279,12 @@ class MappedDatasetMethodsMixin: Every method wrapped here needs to have a return value of Dataset or DataArray in order to construct a new tree. """ __slots__ = () - - # TODO equals, broadcast_equals etc. - # TODO do dask-related private methods need to be exposed? - _DATASET_DASK_METHODS_TO_MAP = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] - _DATASET_METHODS_TO_MAP = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', - 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', - 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', - 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', - 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', - 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', - 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', - 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', - 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] - # TODO unsure if these are called by external functions or not? - _DATASET_OPS_TO_MAP = ['_unary_op', '_binary_op', '_inplace_binary_op'] - _ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP - - # TODO methods which should not or cannot act over the whole tree, such as .to_array - _wrap_then_attach_to_cls(vars(), Dataset, _ALL_DATASET_METHODS_TO_MAP, wrap_func=map_over_subtree) class MappedDataWithCoords(DataWithCoords): # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes - _DATA_WITH_COORDS_METHODS_TO_MAP = ['squeeze', 'clip', 'assign_coords', 'where', 'close', 'isnull', 'notnull', - 'isin', 'astype'] _wrap_then_attach_to_cls(vars(), DataWithCoords, _DATA_WITH_COORDS_METHODS_TO_MAP, wrap_func=map_over_subtree) @@ -294,9 +296,6 @@ class DataTreeArithmetic(DatasetArithmetic): because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new tree) and some will get overridden by the class definition of DataTree. """ - - # TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... - _ARITHMETIC_METHODS_TO_WRAP = ['__array_ufunc__'] + REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS _wrap_then_attach_to_cls(vars(), DatasetArithmetic, _ARITHMETIC_METHODS_TO_WRAP, wrap_func=map_over_subtree) @@ -342,6 +341,8 @@ class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, Mapp # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? 
+ # TODO dataset methods which should not or cannot act over the whole tree, such as .to_array + def __init__( self, data_objects: Dict[PathType, Union[Dataset, DataArray]] = None, From d765c99e8963d33f91b0190bac529fbed9ac8ad0 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 24 Aug 2021 23:23:46 -0400 Subject: [PATCH 048/260] test reduce ops --- .../datatree/tests/test_dataset_api.py | 60 +++++++++++++++---- 1 file changed, 49 insertions(+), 11 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 376414f971a..20cac07931f 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -80,7 +80,6 @@ def test_properties(self): assert dt.sizes == dt.ds.sizes assert dt.variables == dt.ds.variables - def test_no_data_no_properties(self): dt = DataNode('root', data=None) with pytest.raises(AttributeError): @@ -96,34 +95,73 @@ def test_no_data_no_properties(self): class TestDSMethodInheritance: - def test_root(self): + def test_dataset_method(self): + # test root da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') dt = DataNode('root', data=da) expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1).ds assert_equal(result_ds, expected_ds) - def test_descendants(self): - da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') - dt = DataNode('root') + # test descendant DataNode('results', parent=dt, data=da) - expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1)['results'].ds assert_equal(result_ds, expected_ds) + def test_reduce_method(self): + # test root + da = xr.DataArray(name='a', data=[False, True, False], dims='x') + dt = DataNode('root', data=da) + expected_ds = da.to_dataset().any() + result_ds = dt.any().ds + assert_equal(result_ds, expected_ds) -class TestOps: + # test descendant + DataNode('results', parent=dt, data=da) + result_ds = dt.any()['results'].ds + assert_equal(result_ds, expected_ds) + + def test_nan_reduce_method(self): + # test root + da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') + dt = DataNode('root', data=da) + expected_ds = da.to_dataset().mean() + result_ds = dt.mean().ds + assert_equal(result_ds, expected_ds) - def test_multiplication(self): + # test descendant + DataNode('results', parent=dt, data=da) + result_ds = dt.mean()['results'].ds + assert_equal(result_ds, expected_ds) + + def test_cum_method(self): + # test root + da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') + dt = DataNode('root', data=da) + expected_ds = da.to_dataset().cumsum() + result_ds = dt.cumsum().ds + assert_equal(result_ds, expected_ds) + + # test descendant + DataNode('results', parent=dt, data=da) + result_ds = dt.cumsum()['results'].ds + assert_equal(result_ds, expected_ds) + + +class TestOps: + @pytest.mark.xfail + def test_binary_op(self): ds1 = xr.Dataset({'a': [5], 'b': [3]}) ds2 = xr.Dataset({'x': [0.1, 0.2], 'y': [10, 20]}) dt = DataNode('root', data=ds1) DataNode('subnode', data=ds2, parent=dt) - print(dir(dt)) - + expected_root = DataNode('root', data=ds1*ds1) + expected_descendant = DataNode('subnode', data=ds2*ds2, parent=expected_root) result = dt * dt - print(result) + + assert_equal(result.ds, expected_root.ds) + assert_equal(result['subnode'].ds, expected_descendant.ds) @pytest.mark.xfail From d3bb49e33cd79c695b6282b5ed32b6313dae3bdc Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 25 Aug 2021 00:16:03 -0400 Subject: [PATCH 049/260] add developers note on class structure of 
DataTree --- xarray/datatree_/datatree/datatree.py | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 081e7117e52..1fcc8e08ae3 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -36,6 +36,15 @@ | |-- DataNode("elevation") | | Variable("height_above_sea_level") |-- DataNode("population") + + +DEVELOPERS' NOTE +---------------- +The idea of this module is to create a `DataTree` class which inherits the tree structure from TreeNode, and also copies +the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every +node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin +classes which each define part of `xarray.Dataset`'s extensive API. + """ @@ -227,8 +236,7 @@ def imag(self): 'isin', 'astype'] # TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... -#['__array_ufunc__'] \ -_ARITHMETIC_METHODS_TO_WRAP = REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS +_ARITHMETIC_METHODS_TO_MAP = REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + ['__array_ufunc__'] def _wrap_then_attach_to_cls(target_cls_dict, source_cls, methods_to_set, wrap_func=None): @@ -296,7 +304,7 @@ class DataTreeArithmetic(DatasetArithmetic): because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new tree) and some will get overridden by the class definition of DataTree. """ - _wrap_then_attach_to_cls(vars(), DatasetArithmetic, _ARITHMETIC_METHODS_TO_WRAP, wrap_func=map_over_subtree) + _wrap_then_attach_to_cls(vars(), DatasetArithmetic, _ARITHMETIC_METHODS_TO_MAP, wrap_func=map_over_subtree) class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmetic): @@ -343,6 +351,8 @@ class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, Mapp # TODO dataset methods which should not or cannot act over the whole tree, such as .to_array + # TODO del and delitem methods + def __init__( self, data_objects: Dict[PathType, Union[Dataset, DataArray]] = None, From bbdd7fce6cb1b46385a415ce1e1bdffda0eeef5a Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 25 Aug 2021 12:22:31 -0400 Subject: [PATCH 050/260] Linting https://github.com/xarray-contrib/datatree/pull/21 * black reformatting * add setup.cfg to configure flake8/black/isort/mypy * add setup.cfg to configure flake8/black/isort/mypy https://github.com/xarray-contrib/datatree/pull/22 * passes flake8 * disabled mypy for now Co-authored-by: Joseph Hamman --- xarray/datatree_/.pre-commit-config.yaml | 32 +- xarray/datatree_/README.md | 2 +- xarray/datatree_/datatree/__init__.py | 4 +- xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/datatree/datatree.py | 276 +++++++++++++----- xarray/datatree_/datatree/io.py | 17 +- .../datatree/tests/test_dataset_api.py | 92 +++--- .../datatree_/datatree/tests/test_datatree.py | 145 ++++----- .../datatree_/datatree/tests/test_treenode.py | 47 +-- xarray/datatree_/datatree/treenode.py | 41 ++- xarray/datatree_/setup.cfg | 21 ++ xarray/datatree_/setup.py | 12 +- 12 files changed, 443 insertions(+), 248 deletions(-) create mode 100644 xarray/datatree_/setup.cfg diff --git a/xarray/datatree_/.pre-commit-config.yaml 
b/xarray/datatree_/.pre-commit-config.yaml index 53525d0def9..0e1e7192694 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -29,22 +29,22 @@ repos: # hooks: # - id: velin # args: ["--write", "--compact"] - - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.910 - hooks: - - id: mypy - # Copied from setup.cfg - exclude: "properties|asv_bench" - additional_dependencies: [ - # Type stubs - types-python-dateutil, - types-pkg_resources, - types-PyYAML, - types-pytz, - # Dependencies that are typed - numpy, - typing-extensions==3.10.0.0, - ] +# - repo: https://github.com/pre-commit/mirrors-mypy +# rev: v0.910 +# hooks: +# - id: mypy +# # Copied from setup.cfg +# exclude: "properties|asv_bench" +# additional_dependencies: [ +# # Type stubs +# types-python-dateutil, +# types-pkg_resources, +# types-PyYAML, +# types-pytz, +# # Dependencies that are typed +# numpy, +# typing-extensions==3.10.0.0, +# ] # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 # - repo: https://github.com/asottile/pyupgrade # rev: v1.22.1 diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 30c564e3111..e3368eeaa9c 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -5,7 +5,7 @@ This aims to create the data structure discussed in [xarray issue #4118](https:/ The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. The differences are that this effort: -- [Uses a NodeMixin from anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree structure, +- [Uses a NodeMixin from anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree structure, - Implements path-like and tag-like getting and setting, - Has functions for mapping user-supplied functions over every node in the tree, - Automatically dispatches *some* of `xarray.Dataset`'s API over every node in the tree (such as `.isel`), diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index e166c2276e1..f83edbb0970 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,2 +1,4 @@ -from .datatree import DataTree, map_over_subtree, DataNode +# flake8: noqa +# Ignoring F401: imported but unused +from .datatree import DataNode, DataTree, map_over_subtree from .io import open_datatree diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index c7d99fbfbfd..772ffe3d741 100644 --- a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev46+g415cbb7.d20210825" \ No newline at end of file +__version__ = "0.1.dev46+g415cbb7.d20210825" diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 1fcc8e08ae3..a3df42d1e3b 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,24 +1,23 @@ from __future__ import annotations + import functools import textwrap - -from typing import Mapping, Hashable, Union, List, Any, Callable, Iterable, Dict +from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Union import anytree - -from xarray.core.dataset import Dataset -from xarray.core.dataarray import DataArray -from xarray.core.variable import Variable -from xarray.core.combine import merge from 
xarray.core import dtypes, utils -from xarray.core.common import DataWithCoords from xarray.core.arithmetic import DatasetArithmetic -from xarray.core.ops import REDUCE_METHODS, NAN_REDUCE_METHODS, NAN_CUM_METHODS +from xarray.core.combine import merge +from xarray.core.common import DataWithCoords +from xarray.core.dataarray import DataArray +from xarray.core.dataset import Dataset +from xarray.core.ops import NAN_CUM_METHODS, NAN_REDUCE_METHODS, REDUCE_METHODS +from xarray.core.variable import Variable -from .treenode import TreeNode, PathType, _init_single_treenode +from .treenode import PathType, TreeNode, _init_single_treenode """ -The structure of a populated Datatree looks roughly like this: +The structure of a populated Datatree looks roughly like this: DataTree("root name") |-- DataNode("weather") @@ -41,8 +40,8 @@ DEVELOPERS' NOTE ---------------- The idea of this module is to create a `DataTree` class which inherits the tree structure from TreeNode, and also copies -the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every -node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin +the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every +node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin classes which each define part of `xarray.Dataset`'s extensive API. """ @@ -95,11 +94,12 @@ def _map_over_subtree(tree, *args, **kwargs): # Act on every other node in the tree, and rebuild from results for node in tree.descendants: # TODO make a proper relative_path method - relative_path = node.pathstr.replace(tree.pathstr, '') + relative_path = node.pathstr.replace(tree.pathstr, "") result = func(node.ds, *args, **kwargs) if node.has_data else None out_tree[relative_path] = result return out_tree + return _map_over_subtree @@ -212,34 +212,113 @@ def imag(self): chunks.__doc__ = Dataset.chunks.__doc__ -_MAPPED_DOCSTRING_ADDENDUM = textwrap.fill("This method was copied from xarray.Dataset, but has been altered to " - "call the method on the Datasets stored in every node of the subtree. " - "See the `map_over_subtree` function for more details.", width=117) +_MAPPED_DOCSTRING_ADDENDUM = textwrap.fill( + "This method was copied from xarray.Dataset, but has been altered to " + "call the method on the Datasets stored in every node of the subtree. " + "See the `map_over_subtree` function for more details.", + width=117, +) # TODO equals, broadcast_equals etc. # TODO do dask-related private methods need to be exposed? 
-_DATASET_DASK_METHODS_TO_MAP = ['load', 'compute', 'persist', 'unify_chunks', 'chunk', 'map_blocks'] -_DATASET_METHODS_TO_MAP = ['copy', 'as_numpy', '__copy__', '__deepcopy__', 'set_coords', 'reset_coords', 'info', - 'isel', 'sel', 'head', 'tail', 'thin', 'broadcast_like', 'reindex_like', - 'reindex', 'interp', 'interp_like', 'rename', 'rename_dims', 'rename_vars', - 'swap_dims', 'expand_dims', 'set_index', 'reset_index', 'reorder_levels', 'stack', - 'unstack', 'update', 'merge', 'drop_vars', 'drop_sel', 'drop_isel', 'drop_dims', - 'transpose', 'dropna', 'fillna', 'interpolate_na', 'ffill', 'bfill', 'combine_first', - 'reduce', 'map', 'assign', 'diff', 'shift', 'roll', 'sortby', 'quantile', 'rank', - 'differentiate', 'integrate', 'cumulative_integrate', 'filter_by_attrs', 'polyfit', - 'pad', 'idxmin', 'idxmax', 'argmin', 'argmax', 'query', 'curvefit'] +_DATASET_DASK_METHODS_TO_MAP = [ + "load", + "compute", + "persist", + "unify_chunks", + "chunk", + "map_blocks", +] +_DATASET_METHODS_TO_MAP = [ + "copy", + "as_numpy", + "__copy__", + "__deepcopy__", + "set_coords", + "reset_coords", + "info", + "isel", + "sel", + "head", + "tail", + "thin", + "broadcast_like", + "reindex_like", + "reindex", + "interp", + "interp_like", + "rename", + "rename_dims", + "rename_vars", + "swap_dims", + "expand_dims", + "set_index", + "reset_index", + "reorder_levels", + "stack", + "unstack", + "update", + "merge", + "drop_vars", + "drop_sel", + "drop_isel", + "drop_dims", + "transpose", + "dropna", + "fillna", + "interpolate_na", + "ffill", + "bfill", + "combine_first", + "reduce", + "map", + "assign", + "diff", + "shift", + "roll", + "sortby", + "quantile", + "rank", + "differentiate", + "integrate", + "cumulative_integrate", + "filter_by_attrs", + "polyfit", + "pad", + "idxmin", + "idxmax", + "argmin", + "argmax", + "query", + "curvefit", +] # TODO unsure if these are called by external functions or not? -_DATASET_OPS_TO_MAP = ['_unary_op', '_binary_op', '_inplace_binary_op'] -_ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP - -_DATA_WITH_COORDS_METHODS_TO_MAP = ['squeeze', 'clip', 'assign_coords', 'where', 'close', 'isnull', 'notnull', - 'isin', 'astype'] +_DATASET_OPS_TO_MAP = ["_unary_op", "_binary_op", "_inplace_binary_op"] +_ALL_DATASET_METHODS_TO_MAP = ( + _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP +) + +_DATA_WITH_COORDS_METHODS_TO_MAP = [ + "squeeze", + "clip", + "assign_coords", + "where", + "close", + "isnull", + "notnull", + "isin", + "astype", +] # TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... -_ARITHMETIC_METHODS_TO_MAP = REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + ['__array_ufunc__'] +_ARITHMETIC_METHODS_TO_MAP = ( + REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + ["__array_ufunc__"] +) -def _wrap_then_attach_to_cls(target_cls_dict, source_cls, methods_to_set, wrap_func=None): +def _wrap_then_attach_to_cls( + target_cls_dict, source_cls, methods_to_set, wrap_func=None +): """ Attach given methods on a class, and optionally wrap each method first. (i.e. 
with map_over_subtree) @@ -266,18 +345,24 @@ def method_name(self, *args, **kwargs): """ for method_name in methods_to_set: orig_method = getattr(source_cls, method_name) - wrapped_method = wrap_func(orig_method) if wrap_func is not None else orig_method + wrapped_method = ( + wrap_func(orig_method) if wrap_func is not None else orig_method + ) target_cls_dict[method_name] = wrapped_method if wrap_func is map_over_subtree: # Add a paragraph to the method's docstring explaining how it's been mapped orig_method_docstring = orig_method.__doc__ if orig_method_docstring is not None: - if '\n' in orig_method_docstring: - new_method_docstring = orig_method_docstring.replace('\n', _MAPPED_DOCSTRING_ADDENDUM, 1) + if "\n" in orig_method_docstring: + new_method_docstring = orig_method_docstring.replace( + "\n", _MAPPED_DOCSTRING_ADDENDUM, 1 + ) else: - new_method_docstring = orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" - setattr(target_cls_dict[method_name], '__doc__', new_method_docstring) + new_method_docstring = ( + orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" + ) + setattr(target_cls_dict[method_name], "__doc__", new_method_docstring) class MappedDatasetMethodsMixin: @@ -286,14 +371,22 @@ class MappedDatasetMethodsMixin: Every method wrapped here needs to have a return value of Dataset or DataArray in order to construct a new tree. """ + __slots__ = () - _wrap_then_attach_to_cls(vars(), Dataset, _ALL_DATASET_METHODS_TO_MAP, wrap_func=map_over_subtree) + _wrap_then_attach_to_cls( + vars(), Dataset, _ALL_DATASET_METHODS_TO_MAP, wrap_func=map_over_subtree + ) class MappedDataWithCoords(DataWithCoords): # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes - _wrap_then_attach_to_cls(vars(), DataWithCoords, _DATA_WITH_COORDS_METHODS_TO_MAP, wrap_func=map_over_subtree) + _wrap_then_attach_to_cls( + vars(), + DataWithCoords, + _DATA_WITH_COORDS_METHODS_TO_MAP, + wrap_func=map_over_subtree, + ) class DataTreeArithmetic(DatasetArithmetic): @@ -304,10 +397,22 @@ class DataTreeArithmetic(DatasetArithmetic): because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new tree) and some will get overridden by the class definition of DataTree. """ - _wrap_then_attach_to_cls(vars(), DatasetArithmetic, _ARITHMETIC_METHODS_TO_MAP, wrap_func=map_over_subtree) - -class DataTree(TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmetic): + _wrap_then_attach_to_cls( + vars(), + DatasetArithmetic, + _ARITHMETIC_METHODS_TO_MAP, + wrap_func=map_over_subtree, + ) + + +class DataTree( + TreeNode, + DatasetPropertiesMixin, + MappedDatasetMethodsMixin, + MappedDataWithCoords, + DataTreeArithmetic, +): """ A tree-like hierarchical collection of xarray objects. 
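The `vars()` pattern used by the mixin classes above can be hard to follow when read through a diff, so here is a minimal, self-contained sketch of the same technique. The names `Source`, `Target`, `logged` and `wrap_then_attach` are invented for illustration and are not part of datatree; the point is only that calling a helper with `vars()` inside a class body lets it inject wrapped copies of another class's methods into the class being defined.

```python
import functools


def logged(method):
    # Toy stand-in for map_over_subtree: wrap a method while keeping its metadata.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        print(f"calling {method.__name__}")
        return method(self, *args, **kwargs)

    return wrapper


def wrap_then_attach(target_cls_dict, source_cls, method_names, wrap_func=None):
    # Same shape as _wrap_then_attach_to_cls: look each method up on the source
    # class, optionally wrap it, then write it into the namespace of the class
    # currently being defined.
    for name in method_names:
        orig = getattr(source_cls, name)
        target_cls_dict[name] = wrap_func(orig) if wrap_func is not None else orig


class Source:
    def double(self, x):
        return 2 * x


class Target:
    # vars() inside a class body returns the namespace being built, so the
    # wrapped copies become ordinary methods of Target.
    wrap_then_attach(vars(), Source, ["double"], wrap_func=logged)


print(Target().double(3))  # prints "calling double", then 6
```

The real helper additionally patches each copied docstring with `_MAPPED_DOCSTRING_ADDENDUM`, which is why the wrapping and the attaching are done in the same place.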
@@ -374,11 +479,16 @@ def __init__( if self.separator in path: node_path, node_name = path.rsplit(self.separator, maxsplit=1) else: - node_path, node_name = '/', path + node_path, node_name = "/", path # Create and set new node new_node = DataNode(name=node_name, data=data) - self.set_node(node_path, new_node, allow_overwrite=False, new_nodes_along_path=True) + self.set_node( + node_path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) new_node = self.get_node(path) new_node[path] = data @@ -389,7 +499,9 @@ def ds(self) -> Dataset: @ds.setter def ds(self, data: Union[Dataset, DataArray] = None): if not isinstance(data, (Dataset, DataArray)) and data is not None: - raise TypeError(f"{type(data)} object is not an xarray Dataset, DataArray, or None") + raise TypeError( + f"{type(data)} object is not an xarray Dataset, DataArray, or None" + ) if isinstance(data, DataArray): data = data.to_dataset() self._ds = data @@ -458,9 +570,9 @@ def _single_node_repr(self): node_info = f"DataNode('{self.name}')" if self.has_data: - ds_info = '\n' + repr(self.ds) + ds_info = "\n" + repr(self.ds) else: - ds_info = '' + ds_info = "" return node_info + ds_info def __repr__(self): @@ -471,14 +583,20 @@ def __repr__(self): if self.has_data: ds_repr_lines = self.ds.__repr__().splitlines() - ds_repr = ds_repr_lines[0] + '\n' + textwrap.indent('\n'.join(ds_repr_lines[1:]), " ") + ds_repr = ( + ds_repr_lines[0] + + "\n" + + textwrap.indent("\n".join(ds_repr_lines[1:]), " ") + ) data_str = f"\ndata={ds_repr}\n)" else: data_str = "data=None)" return node_str + data_str - def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[TreeNode, Dataset, DataArray]: + def __getitem__( + self, key: Union[PathType, Hashable, Mapping, Any] + ) -> Union[TreeNode, Dataset, DataArray]: """ Access either child nodes, variables, or coordinates stored in this tree. @@ -504,19 +622,23 @@ def __getitem__(self, key: Union[PathType, Hashable, Mapping, Any]) -> Union[Tre elif utils.is_list_like(key) and all(k in self.ds for k in key): # iterable of variable names return self.ds[key] - elif utils.is_list_like(key) and all('/' not in tag for tag in key): + elif utils.is_list_like(key) and all("/" not in tag for tag in key): # iterable of child tags return self._get_item_from_path(key) else: raise ValueError("Invalid format for key") - def _get_item_from_path(self, path: PathType) -> Union[TreeNode, Dataset, DataArray]: + def _get_item_from_path( + self, path: PathType + ) -> Union[TreeNode, Dataset, DataArray]: """Get item given a path. Two valid cases: either all parts of path are nodes or last part is a variable.""" # TODO this currently raises a ChildResolverError if it can't find a data variable in the ds - that's inconsistent with xarray.Dataset.__getitem__ path = self._tuple_or_path_to_path(path) - tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] + tags = [ + tag for tag in path.split(self.separator) if tag not in [self.separator, ""] + ] *leading_tags, last_tag = tags if leading_tags is not None: @@ -559,7 +681,9 @@ def __setitem__( raise NotImplementedError path = self._tuple_or_path_to_path(key) - tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] + tags = [ + tag for tag in path.split(self.separator) if tag not in [self.separator, ""] + ] # TODO a .path_as_tags method? 
if not tags: @@ -572,12 +696,14 @@ def __setitem__( elif value is None: self.ds = None else: - raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}") + raise TypeError( + "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}" + ) else: *path_tags, last_tag = tags if not path_tags: - path_tags = '/' + path_tags = "/" # get anything that already exists at that location try: @@ -602,8 +728,10 @@ def __setitem__( elif value is None: existing_node.ds = None else: - raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}") + raise TypeError( + "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}" + ) else: # if nothing there then make new node based on type of object if isinstance(value, (Dataset, DataArray, Variable)) or value is None: @@ -612,8 +740,10 @@ def __setitem__( elif isinstance(value, TreeNode): self.set_node(path=path, node=value) else: - raise TypeError("Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}") + raise TypeError( + "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " + f"not {type(value)}" + ) def map_over_subtree( self, @@ -691,8 +821,11 @@ def get_all(self, *tags: Hashable) -> DataTree: Return a DataTree containing the stored objects whose path contains all of the given tags, where the tags can be present in any order. """ - matching_children = {c.tags: c.get_node(tags) for c in self.descendants - if all(tag in c.tags for tag in tags)} + matching_children = { + c.tags: c.get_node(tags) + for c in self.descendants + if all(tag in c.tags for tag in tags) + } return DataTree(data_objects=matching_children) # TODO re-implement using anytree find function? @@ -700,8 +833,11 @@ def get_any(self, *tags: Hashable) -> DataTree: """ Return a DataTree containing the stored objects whose path contains any of the given tags. 
""" - matching_children = {c.tags: c.get_node(tags) for c in self.descendants - if any(tag in c.tags for tag in tags)} + matching_children = { + c.tags: c.get_node(tags) + for c in self.descendants + if any(tag in c.tags for tag in tags) + } return DataTree(data_objects=matching_children) def merge(self, datatree: DataTree) -> DataTree: @@ -722,7 +858,13 @@ def merge_child_datasets( ) -> Dataset: """Merge the datasets at a set of child nodes and return as a single Dataset.""" datasets = [self.get(path).ds for path in paths] - return merge(datasets, compat=compat, join=join, fill_value=fill_value, combine_attrs=combine_attrs) + return merge( + datasets, + compat=compat, + join=join, + fill_value=fill_value, + combine_attrs=combine_attrs, + ) def as_array(self) -> DataArray: return self.ds.as_dataarray() diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 08c51a21274..727b44859e8 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -1,11 +1,10 @@ -from typing import Sequence, Dict import os +from typing import Dict, Sequence import netCDF4 - from xarray import open_dataset -from .datatree import DataTree, DataNode, PathType +from .datatree import DataNode, DataTree, PathType def _open_group_children_recursively(filename, node, ncgroup, chunks, **kwargs): @@ -34,14 +33,16 @@ def open_datatree(filename: str, chunks: Dict = None, **kwargs) -> DataTree: DataTree """ - with netCDF4.Dataset(filename, mode='r') as ncfile: + with netCDF4.Dataset(filename, mode="r") as ncfile: ds = open_dataset(filename, chunks=chunks, **kwargs) - tree_root = DataTree(data_objects={'root': ds}) + tree_root = DataTree(data_objects={"root": ds}) _open_group_children_recursively(filename, tree_root, ncfile, chunks, **kwargs) return tree_root -def open_mfdatatree(filepaths, rootnames: Sequence[PathType] = None, chunks=None, **kwargs) -> DataTree: +def open_mfdatatree( + filepaths, rootnames: Sequence[PathType] = None, chunks=None, **kwargs +) -> DataTree: """ Open multiple files as a single DataTree. 
@@ -57,7 +58,9 @@ def open_mfdatatree(filepaths, rootnames: Sequence[PathType] = None, chunks=None for file, root in zip(filepaths, rootnames): dt = open_datatree(file, chunks=chunks, **kwargs) - full_tree.set_node(path=root, node=dt, new_nodes_along_path=True, allow_overwrite=False) + full_tree.set_node( + path=root, node=dt, new_nodes_along_path=True, allow_overwrite=False + ) return full_tree diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 20cac07931f..afda3588ac0 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -1,13 +1,10 @@ -import pytest - import numpy as np - +import pytest import xarray as xr +from test_datatree import create_test_datatree from xarray.testing import assert_equal -from datatree import DataTree, DataNode, map_over_subtree - -from test_datatree import create_test_datatree +from datatree import DataNode, DataTree, map_over_subtree class TestMapOverSubTree: @@ -21,7 +18,10 @@ def times_ten(ds): result_tree = times_ten(dt) # TODO write an assert_tree_equal function - for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + for ( + result_node, + original_node, + ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): assert isinstance(result_node, DataTree) if original_node.has_data: @@ -38,7 +38,10 @@ def multiply_then_add(ds, times, add=0.0): result_tree = multiply_then_add(dt, 10.0, add=2.0) - for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + for ( + result_node, + original_node, + ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): assert isinstance(result_node, DataTree) if original_node.has_data: @@ -54,7 +57,10 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) - for result_node, original_node, in zip(result_tree.subtree_nodes, dt.subtree_nodes): + for ( + result_node, + original_node, + ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): assert isinstance(result_node, DataTree) if original_node.has_data: @@ -69,10 +75,10 @@ def test_map_over_subtree_inplace(self): class TestDSProperties: def test_properties(self): - da_a = xr.DataArray(name='a', data=[0, 2], dims=['x']) - da_b = xr.DataArray(name='b', data=[5, 6, 7], dims=['y']) - ds = xr.Dataset({'a': da_a, 'b': da_b}) - dt = DataNode('root', data=ds) + da_a = xr.DataArray(name="a", data=[0, 2], dims=["x"]) + da_b = xr.DataArray(name="b", data=[5, 6, 7], dims=["y"]) + ds = xr.Dataset({"a": da_a, "b": da_b}) + dt = DataNode("root", data=ds) assert dt.attrs == dt.ds.attrs assert dt.encoding == dt.ds.encoding @@ -81,7 +87,7 @@ def test_properties(self): assert dt.variables == dt.ds.variables def test_no_data_no_properties(self): - dt = DataNode('root', data=None) + dt = DataNode("root", data=None) with pytest.raises(AttributeError): dt.attrs with pytest.raises(AttributeError): @@ -97,86 +103,86 @@ def test_no_data_no_properties(self): class TestDSMethodInheritance: def test_dataset_method(self): # test root - da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') - dt = DataNode('root', data=da) + da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") + dt = DataNode("root", data=da) expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1).ds assert_equal(result_ds, expected_ds) # test descendant - DataNode('results', parent=dt, data=da) - result_ds = dt.isel(x=1)['results'].ds + DataNode("results", parent=dt, data=da) + 
result_ds = dt.isel(x=1)["results"].ds assert_equal(result_ds, expected_ds) def test_reduce_method(self): # test root - da = xr.DataArray(name='a', data=[False, True, False], dims='x') - dt = DataNode('root', data=da) + da = xr.DataArray(name="a", data=[False, True, False], dims="x") + dt = DataNode("root", data=da) expected_ds = da.to_dataset().any() result_ds = dt.any().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode('results', parent=dt, data=da) - result_ds = dt.any()['results'].ds + DataNode("results", parent=dt, data=da) + result_ds = dt.any()["results"].ds assert_equal(result_ds, expected_ds) def test_nan_reduce_method(self): # test root - da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') - dt = DataNode('root', data=da) + da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") + dt = DataNode("root", data=da) expected_ds = da.to_dataset().mean() result_ds = dt.mean().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode('results', parent=dt, data=da) - result_ds = dt.mean()['results'].ds + DataNode("results", parent=dt, data=da) + result_ds = dt.mean()["results"].ds assert_equal(result_ds, expected_ds) def test_cum_method(self): # test root - da = xr.DataArray(name='a', data=[1, 2, 3], dims='x') - dt = DataNode('root', data=da) + da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") + dt = DataNode("root", data=da) expected_ds = da.to_dataset().cumsum() result_ds = dt.cumsum().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode('results', parent=dt, data=da) - result_ds = dt.cumsum()['results'].ds + DataNode("results", parent=dt, data=da) + result_ds = dt.cumsum()["results"].ds assert_equal(result_ds, expected_ds) class TestOps: @pytest.mark.xfail def test_binary_op(self): - ds1 = xr.Dataset({'a': [5], 'b': [3]}) - ds2 = xr.Dataset({'x': [0.1, 0.2], 'y': [10, 20]}) - dt = DataNode('root', data=ds1) - DataNode('subnode', data=ds2, parent=dt) + ds1 = xr.Dataset({"a": [5], "b": [3]}) + ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) - expected_root = DataNode('root', data=ds1*ds1) - expected_descendant = DataNode('subnode', data=ds2*ds2, parent=expected_root) + expected_root = DataNode("root", data=ds1 * ds1) + expected_descendant = DataNode("subnode", data=ds2 * ds2, parent=expected_root) result = dt * dt assert_equal(result.ds, expected_root.ds) - assert_equal(result['subnode'].ds, expected_descendant.ds) + assert_equal(result["subnode"].ds, expected_descendant.ds) @pytest.mark.xfail class TestUFuncs: def test_root(self): - da = xr.DataArray(name='a', data=[1, 2, 3]) - dt = DataNode('root', data=da) + da = xr.DataArray(name="a", data=[1, 2, 3]) + dt = DataNode("root", data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt).ds assert_equal(result_ds, expected_ds) def test_descendants(self): - da = xr.DataArray(name='a', data=[1, 2, 3]) - dt = DataNode('root') - DataNode('results', parent=dt, data=da) + da = xr.DataArray(name="a", data=[1, 2, 3]) + dt = DataNode("root") + DataNode("results", parent=dt, data=da) expected_ds = np.sin(da.to_dataset()) - result_ds = np.sin(dt)['results'].ds + result_ds = np.sin(dt)["results"].ds assert_equal(result_ds, expected_ds) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index f3b0ba1305f..a1266af02f3 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,11 +1,9 @@ 
import pytest - import xarray as xr -from xarray.testing import assert_identical - from anytree.resolver import ChildResolverError +from xarray.testing import assert_identical -from datatree import DataTree, DataNode +from datatree import DataNode, DataTree def create_test_datatree(): @@ -38,26 +36,26 @@ def create_test_datatree(): The structure has deliberately repeated names of tags, variables, and dimensions in order to better check for bugs caused by name conflicts. """ - set1_data = xr.Dataset({'a': 0, 'b': 1}) - set2_data = xr.Dataset({'a': ('x', [2, 3]), 'b': ('x', [0.1, 0.2])}) - root_data = xr.Dataset({'a': ('y', [6, 7, 8]), 'set1': ('x', [9, 10])}) + set1_data = xr.Dataset({"a": 0, "b": 1}) + set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])}) + root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set1": ("x", [9, 10])}) # Avoid using __init__ so we can independently test it # TODO change so it has a DataTree at the bottom - root = DataNode(name='root', data=root_data) + root = DataNode(name="root", data=root_data) set1 = DataNode(name="set1", parent=root, data=set1_data) - set1_set1 = DataNode(name="set1", parent=set1) - set1_set2 = DataNode(name="set2", parent=set1) + DataNode(name="set1", parent=set1) + DataNode(name="set2", parent=set1) set2 = DataNode(name="set2", parent=root, data=set2_data) - set2_set1 = DataNode(name="set1", parent=set2) - set3 = DataNode(name="set3", parent=root) + DataNode(name="set1", parent=set2) + DataNode(name="set3", parent=root) return root class TestStoreDatasets: def test_create_datanode(self): - dat = xr.Dataset({'a': 0}) + dat = xr.Dataset({"a": 0}) john = DataNode("john", data=dat) assert john.ds is dat with pytest.raises(TypeError): @@ -65,14 +63,14 @@ def test_create_datanode(self): def test_set_data(self): john = DataNode("john") - dat = xr.Dataset({'a': 0}) + dat = xr.Dataset({"a": 0}) john.ds = dat assert john.ds is dat with pytest.raises(TypeError): john.ds = "junk" def test_has_data(self): - john = DataNode("john", data=xr.Dataset({'a': 0})) + john = DataNode("john", data=xr.Dataset({"a": 0})) assert john.has_data john = DataNode("john", data=None) @@ -97,13 +95,13 @@ def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataNode("folder1") results = DataNode("results", parent=folder1) - highres = DataNode("highres", parent=results, data=data) + DataNode("highres", parent=results, data=data) assert_identical(folder1["results/highres/temp"], data["temp"]) assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): folder1 = DataNode("folder1") - results = DataNode("results", parent=folder1) + DataNode("results", parent=folder1) with pytest.raises(ChildResolverError): folder1["results/highres"] @@ -116,12 +114,12 @@ def test_get_nonexistent_variable(self): def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) results = DataNode("results", data=data) - assert_identical(results[['temp', 'p']], data[['temp', 'p']]) + assert_identical(results[["temp", "p"]], data[["temp", "p"]]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) results = DataNode("results", data=data) - assert_identical(results[{'temp': 1}], data[{'temp': 1}]) + assert_identical(results[{"temp": 1}], data[{"temp": 1}]) class TestSetItems: @@ -129,78 +127,78 @@ class TestSetItems: def test_set_new_child_node(self): john = DataNode("john") mary = DataNode("mary") - john['/'] = mary - assert 
john['mary'] is mary + john["/"] = mary + assert john["mary"] is mary def test_set_new_grandchild_node(self): john = DataNode("john") - mary = DataNode("mary", parent=john) + DataNode("mary", parent=john) rose = DataNode("rose") - john['mary/'] = rose - assert john['mary/rose'] is rose + john["mary/"] = rose + assert john["mary/rose"] is rose def test_set_new_empty_node(self): john = DataNode("john") - john['mary'] = None - mary = john['mary'] + john["mary"] = None + mary = john["mary"] assert isinstance(mary, DataTree) assert mary.ds is None def test_overwrite_data_in_node_with_none(self): john = DataNode("john") mary = DataNode("mary", parent=john, data=xr.Dataset()) - john['mary'] = None + john["mary"] = None assert mary.ds is None john.ds = xr.Dataset() - john['/'] = None + john["/"] = None assert john.ds is None def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) results = DataNode("results") - results['/'] = data + results["/"] = data assert results.ds is data def test_set_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataNode("folder1") - folder1['results'] = data - assert folder1['results'].ds is data + folder1["results"] = data + assert folder1["results"].ds is data def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataNode("folder1") - folder1['results/highres'] = data - assert folder1['results/highres'].ds is data + folder1["results/highres"] = data + assert folder1["results/highres"].ds is data def test_set_named_dataarray_as_new_node(self): - data = xr.DataArray(name='temp', data=[0, 50]) + data = xr.DataArray(name="temp", data=[0, 50]) folder1 = DataNode("folder1") - folder1['results'] = data - assert_identical(folder1['results'].ds, data.to_dataset()) + folder1["results"] = data + assert_identical(folder1["results"].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) folder1 = DataNode("folder1") with pytest.raises(ValueError, match="unable to convert"): - folder1['results'] = data + folder1["results"] = data def test_add_new_variable_to_empty_node(self): results = DataNode("results") - results['/'] = xr.DataArray(name='pressure', data=[2, 3]) - assert 'pressure' in results.ds + results["/"] = xr.DataArray(name="pressure", data=[2, 3]) + assert "pressure" in results.ds # What if there is a path to traverse first? 
results = DataNode("results") - results['highres/'] = xr.DataArray(name='pressure', data=[2, 3]) - assert 'pressure' in results['highres'].ds + results["highres/"] = xr.DataArray(name="pressure", data=[2, 3]) + assert "pressure" in results["highres"].ds def test_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) results = DataNode("results", data=t) - p = xr.DataArray(name='pressure', data=[2, 3]) - results['/'] = p + p = xr.DataArray(name="pressure", data=[2, 3]) + results["/"] = p assert_identical(results.ds, p.to_dataset()) @@ -209,7 +207,7 @@ def test_empty(self): dt = DataTree() assert dt.name == "root" assert dt.parent is None - assert dt.children is () + assert dt.children == () assert dt.ds is None def test_data_in_root(self): @@ -217,31 +215,36 @@ def test_data_in_root(self): dt = DataTree({"root": dat}) assert dt.name == "root" assert dt.parent is None - assert dt.children is () + assert dt.children == () assert dt.ds is dat def test_one_layer(self): - dat1, dat2 = xr.Dataset({'a': 1}), xr.Dataset({'b': 2}) + dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree({"run1": dat1, "run2": dat2}) assert dt.ds is None - assert dt['run1'].ds is dat1 - assert dt['run2'].ds is dat2 + assert dt["run1"].ds is dat1 + assert dt["run2"].ds is dat2 def test_two_layers(self): - dat1, dat2 = xr.Dataset({'a': 1}), xr.Dataset({'a': [1, 2]}) + dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) dt = DataTree({"highres/run": dat1, "lowres/run": dat2}) - assert 'highres' in [c.name for c in dt.children] - assert 'lowres' in [c.name for c in dt.children] - highres_run = dt.get_node('highres/run') + assert "highres" in [c.name for c in dt.children] + assert "lowres" in [c.name for c in dt.children] + highres_run = dt.get_node("highres/run") assert highres_run.ds is dat1 def test_full(self): dt = create_test_datatree() paths = list(node.pathstr for node in dt.subtree_nodes) - assert paths == ['root', 'root/set1', 'root/set1/set1', - 'root/set1/set2', - 'root/set2', 'root/set2/set1', - 'root/set3'] + assert paths == [ + "root", + "root/set1", + "root/set1/set1", + "root/set1/set2", + "root/set2", + "root/set2/set1", + "root/set3", + ] class TestBrowsing: @@ -254,27 +257,29 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DataNode('root') + dt = DataNode("root") printout = dt.__str__() assert printout == "DataNode('root')" def test_print_node_with_data(self): - dat = xr.Dataset({'a': [0, 2]}) - dt = DataNode('root', data=dat) + dat = xr.Dataset({"a": [0, 2]}) + dt = DataNode("root", data=dat) printout = dt.__str__() - expected = ["DataNode('root')", - "Dimensions", - "Coordinates", - "a", - "Data variables", - "*empty*"] + expected = [ + "DataNode('root')", + "Dimensions", + "Coordinates", + "a", + "Data variables", + "*empty*", + ] for expected_line, printed_line in zip(expected, printout.splitlines()): assert expected_line in printed_line def test_nested_node(self): - dat = xr.Dataset({'a': [0, 2]}) - root = DataNode('root') - DataNode('results', data=dat, parent=root) + dat = xr.Dataset({"a": [0, 2]}) + root = DataNode("root") + DataNode("results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") @@ -286,8 +291,8 @@ def test_print_datatree(self): # TODO work out how to test something complex like this def test_repr_of_node_with_data(self): - dat = xr.Dataset({'a': [0, 2]}) - dt = DataNode('root', data=dat) + dat = xr.Dataset({"a": [0, 2]}) + dt = DataNode("root", data=dat) 
assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index fa8d23e1afb..0c86af16dfa 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -1,5 +1,4 @@ import pytest - from anytree.node.exceptions import TreeError from anytree.resolver import ChildResolverError @@ -143,27 +142,27 @@ class TestSetNodes: def test_set_child_node(self): john = TreeNode("john") mary = TreeNode("mary") - john.set_node('/', mary) + john.set_node("/", mary) mary = john.children[0] assert mary.name == "mary" assert isinstance(mary, TreeNode) - assert mary.children is () + assert mary.children == () def test_child_already_exists(self): john = TreeNode("john") - mary = TreeNode("mary", parent=john) + TreeNode("mary", parent=john) marys_replacement = TreeNode("mary") with pytest.raises(KeyError): - john.set_node('/', marys_replacement, allow_overwrite=False) + john.set_node("/", marys_replacement, allow_overwrite=False) def test_set_grandchild(self): john = TreeNode("john") mary = TreeNode("mary") rose = TreeNode("rose") - john.set_node('/', mary) - john.set_node('/mary/', rose) + john.set_node("/", mary) + john.set_node("/mary/", rose) mary = john.children[0] assert mary.name == "mary" @@ -173,7 +172,7 @@ def test_set_grandchild(self): rose = mary.children[0] assert rose.name == "rose" assert isinstance(rose, TreeNode) - assert rose.children is () + assert rose.children == () def test_set_grandchild_and_create_intermediate_child(self): john = TreeNode("john") @@ -188,13 +187,15 @@ def test_set_grandchild_and_create_intermediate_child(self): rose = mary.children[0] assert rose.name == "rose" assert isinstance(rose, TreeNode) - assert rose.children is () + assert rose.children == () def test_no_intermediate_children_allowed(self): john = TreeNode("john") rose = TreeNode("rose") with pytest.raises(KeyError, match="Cannot reach"): - john.set_node(path="mary", node=rose, new_nodes_along_path=False, allow_overwrite=True) + john.set_node( + path="mary", node=rose, new_nodes_along_path=False, allow_overwrite=True + ) def test_set_great_grandchild(self): john = TreeNode("john") @@ -207,23 +208,25 @@ def test_set_great_grandchild(self): def test_overwrite_child(self): john = TreeNode("john") mary = TreeNode("mary") - john.set_node('/', mary) + john.set_node("/", mary) assert mary in john.children marys_evil_twin = TreeNode("mary") - john.set_node('/', marys_evil_twin) + john.set_node("/", marys_evil_twin) assert marys_evil_twin in john.children assert mary not in john.children def test_dont_overwrite_child(self): john = TreeNode("john") mary = TreeNode("mary") - john.set_node('/', mary) + john.set_node("/", mary) assert mary in john.children marys_evil_twin = TreeNode("mary") with pytest.raises(KeyError, match="path already points"): - john.set_node('', marys_evil_twin, new_nodes_along_path=True, allow_overwrite=False) + john.set_node( + "", marys_evil_twin, new_nodes_along_path=True, allow_overwrite=False + ) assert mary in john.children assert marys_evil_twin not in john.children @@ -253,14 +256,16 @@ def test_render_nodetree(self): mary = TreeNode("mary") kate = TreeNode("kate") john = TreeNode("john", children=[mary, kate]) - sam = TreeNode("Sam", parent=mary) - ben = TreeNode("Ben", parent=mary) + TreeNode("Sam", parent=mary) + TreeNode("Ben", parent=mary) printout = john.__str__() - expected_nodes = ["TreeNode('john')", - "TreeNode('mary')", - "TreeNode('Sam')", - 
"TreeNode('Ben')", - "TreeNode('kate')"] + expected_nodes = [ + "TreeNode('john')", + "TreeNode('mary')", + "TreeNode('Sam')", + "TreeNode('Ben')", + "TreeNode('kate')", + ] for expected_node, printed_node in zip(expected_nodes, printout.splitlines()): assert expected_node in printed_node diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index d0f43514e83..e11f96f7cd9 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -1,15 +1,14 @@ from __future__ import annotations -from typing import Sequence, Tuple, Hashable, Union, Iterable +from typing import Hashable, Iterable, Sequence, Tuple, Union import anytree - PathType = Union[Hashable, Sequence[Hashable]] def _init_single_treenode(obj, name, parent, children): - if not isinstance(name, str) or '/' in name: + if not isinstance(name, str) or "/" in name: raise ValueError(f"invalid name {name}") obj.name = name @@ -42,7 +41,7 @@ class TreeNode(anytree.NodeMixin): # TODO change .path in the parent class to behave like .path_str does here. (old .path -> .walk_path()) - _resolver = anytree.Resolver('name') + _resolver = anytree.Resolver("name") def __init__( self, @@ -72,7 +71,7 @@ def __repr__(self): @property def pathstr(self) -> str: """Path from root to this node, as a filepath-like string.""" - return '/'.join(self.tags) + return "/".join(self.tags) @property def has_data(self): @@ -84,7 +83,9 @@ def _pre_attach(self, parent: TreeNode) -> None: children with duplicate names. """ if self.name in list(c.name for c in parent.children): - raise KeyError(f"parent {str(parent)} already has a child named {self.name}") + raise KeyError( + f"parent {str(parent)} already has a child named {self.name}" + ) def add_child(self, child: TreeNode) -> None: """Add a single child node below this node, without replacement.""" @@ -122,11 +123,11 @@ def get_node(self, path: PathType) -> TreeNode: # TODO change so this raises a standard KeyError instead of a ChildResolverError when it can't find an item p = self._tuple_or_path_to_path(path) - return anytree.Resolver('name').get(self, p) + return anytree.Resolver("name").get(self, p) def set_node( self, - path: PathType = '/', + path: PathType = "/", node: TreeNode = None, new_nodes_along_path: bool = True, allow_overwrite: bool = True, @@ -161,12 +162,16 @@ def set_node( path = self._tuple_or_path_to_path(path) if not isinstance(node, TreeNode): - raise ValueError(f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}") + raise ValueError( + f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}" + ) node_name = node.name # Walk to location of new node, creating intermediate node objects as we go if necessary parent = self - tags = [tag for tag in path.split(self.separator) if tag not in [self.separator, '']] + tags = [ + tag for tag in path.split(self.separator) if tag not in [self.separator, ""] + ] for tag in tags: # TODO will this mutation within a for loop actually work? 
if tag not in [child.name for child in parent.children]: @@ -178,8 +183,10 @@ def set_node( new_node = type(self)(name=tag) parent.add_child(new_node) else: - raise KeyError(f"Cannot reach new node at path {path}: " - f"parent {parent} has no child {tag}") + raise KeyError( + f"Cannot reach new node at path {path}: " + f"parent {parent} has no child {tag}" + ) parent = parent.get_node(tag) # Deal with anything already existing at this location @@ -190,8 +197,10 @@ def set_node( del child else: # TODO should this be before we walk to the new node? - raise KeyError(f"Cannot set item at {path} whilst that path already points to a " - f"{type(parent.get_node(node_name))} object") + raise KeyError( + f"Cannot set item at {path} whilst that path already points to a " + f"{type(parent.get_node(node_name))} object" + ) # Place new child node at this location parent.add_child(node) @@ -206,7 +215,9 @@ def tags(self) -> Tuple[Hashable]: @tags.setter def tags(self, value): - raise AttributeError(f"tags cannot be set, except via changing the children and/or parent of a node.") + raise AttributeError( + "tags cannot be set, except via changing the children and/or parent of a node." + ) @property def subtree_nodes(self): diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg new file mode 100644 index 00000000000..3a6f8120ce5 --- /dev/null +++ b/xarray/datatree_/setup.cfg @@ -0,0 +1,21 @@ +[flake8] +ignore = + E203 # whitespace before ':' - doesn't work well with black + E402 # module level import not at top of file + E501 # line too long - let black worry about that + E731 # do not assign a lambda expression, use a def + W503 # line break before binary operator +exclude= + .eggs + doc + +[isort] +profile = black +skip_gitignore = true +float_to_top = true +default_section = THIRDPARTY +known_first_party = datatree + +[mypy] +files = datatree/**/*.py +show_error_codes = True diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 1110c1a3ea5..cae7a90389c 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -1,15 +1,15 @@ from os.path import exists -from setuptools import find_packages, setup +from setuptools import find_packages, setup -with open('requirements.txt') as f: - install_requires = f.read().strip().split('\n') +with open("requirements.txt") as f: + install_requires = f.read().strip().split("\n") -if exists('README.rst'): - with open('README.rst') as f: +if exists("README.rst"): + with open("README.rst") as f: long_description = f.read() else: - long_description = '' + long_description = "" setup( From c1785e08afa61e64d264aa5c733d91478368e54f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 25 Aug 2021 14:02:24 -0400 Subject: [PATCH 051/260] updated developer's note --- xarray/datatree_/datatree/datatree.py | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index a3df42d1e3b..e3fcf7360d1 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -44,6 +44,9 @@ node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin classes which each define part of `xarray.Dataset`'s extensive API. +Some of these methods must be wrapped to map over all nodes in the subtree. 
Others are fine to inherit unaltered
+(normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new
+tree) and some will get overridden by the class definition of DataTree.
 """


@@ -329,6 +332,8 @@ def method_name(self, *args, **kwargs):
         return self.method(*args, **kwargs)
     ```

+    Every method attached here needs to have a return value of Dataset or DataArray in order to construct a new tree.
+
     Parameters
     ----------
     target_cls_dict : MappingProxy
@@ -368,8 +373,6 @@ def method_name(self, *args, **kwargs):
 class MappedDatasetMethodsMixin:
     """
     Mixin to add Dataset methods like .mean(), but wrapped to map over all nodes in the subtree.
-
-    Every method wrapped here needs to have a return value of Dataset or DataArray in order to construct a new tree.
     """

     __slots__ = ()
@@ -391,11 +394,7 @@ class MappedDataWithCoords(DataWithCoords):
 class DataTreeArithmetic(DatasetArithmetic):
     """
-    Mixin to add Dataset methods like __add__ and .mean()
-
-    Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine unaltered (normally
-    because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new
-    tree) and some will get overridden by the class definition of DataTree.
+    Mixin to add Dataset methods like __add__ and .mean().
     """

     _wrap_then_attach_to_cls(
From 81af7d13c5842013204b77288f721d1a69698181 Mon Sep 17 00:00:00 2001
From: Joe Hamman
Date: Wed, 25 Aug 2021 15:17:56 -0700
Subject: [PATCH 052/260] [WIP] add DataTree.to_netcdf

https://github.com/xarray-contrib/datatree/pull/26

* first attempt at to_netcdf

* lint

* add test for roundtrip and support empty nodes

* Apply suggestions from code review

Co-authored-by: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com>

* update roundtrip test, improves empty node handling in IO

Co-authored-by: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com>
---
 xarray/datatree_/datatree/datatree.py         | 38 ++++++++-
 xarray/datatree_/datatree/io.py               | 80 ++++++++++++++++++-
 .../datatree_/datatree/tests/test_datatree.py | 24 +++++-
 3 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py
index e3fcf7360d1..5125b270d4a 100644
--- a/xarray/datatree_/datatree/datatree.py
+++ b/xarray/datatree_/datatree/datatree.py
@@ -873,10 +873,44 @@ def groups(self):
         """Return all netCDF4 groups in the tree, given as a tuple of path-like strings."""
         return tuple(node.pathstr for node in self.subtree_nodes)

-    def to_netcdf(self, filename: str):
+    def to_netcdf(
+        self, filepath, mode: str = "w", encoding=None, unlimited_dims=None, **kwargs
+    ):
+        """
+        Write datatree contents to a netCDF file.
+
+        Parameters
+        ----------
+        filepath : str or Path
+            Path to which to save this datatree.
+        mode : {"w", "a"}, default: "w"
+            Write ('w') or append ('a') mode. If mode='w', any existing file at
+            this location will be overwritten. If mode='a', existing variables
+            will be overwritten. Only applies to the root group.
+        encoding : dict, optional
+            Nested dictionary with variable names as keys and dictionaries of
+            variable specific encodings as values, e.g.,
+            ``{"root/set1": {"my_variable": {"dtype": "int16", "scale_factor": 0.1,
+            "zlib": True}, ...}, ...}``. See ``xarray.Dataset.to_netcdf`` for available
+            options.
+        unlimited_dims : dict, optional
+            Mapping of unlimited dimensions per group that should be serialized as unlimited dimensions.
+ By default, no dimensions are treated as unlimited dimensions. + Note that unlimited_dims may also be set via + ``dataset.encoding["unlimited_dims"]``. + kwargs : + Addional keyword arguments to be passed to ``xarray.Dataset.to_netcdf`` + """ from .io import _datatree_to_netcdf - _datatree_to_netcdf(self, filename) + _datatree_to_netcdf( + self, + filepath, + mode=mode, + encoding=encoding, + unlimited_dims=unlimited_dims, + **kwargs, + ) def plot(self): raise NotImplementedError diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 727b44859e8..c717203a0f4 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -7,12 +7,20 @@ from .datatree import DataNode, DataTree, PathType +def _ds_or_none(ds): + """return none if ds is empty""" + if any(ds.coords) or any(ds.variables) or any(ds.attrs): + return ds + return None + + def _open_group_children_recursively(filename, node, ncgroup, chunks, **kwargs): for g in ncgroup.groups.values(): # Open and add this node's dataset to the tree name = os.path.basename(g.path) ds = open_dataset(filename, group=g.path, chunks=chunks, **kwargs) + ds = _ds_or_none(ds) child_node = DataNode(name, ds) node.add_child(child_node) @@ -65,5 +73,73 @@ def open_mfdatatree( return full_tree -def _datatree_to_netcdf(dt: DataTree, filepath: str): - raise NotImplementedError +def _maybe_extract_group_kwargs(enc, group): + try: + return enc[group] + except KeyError: + return None + + +def _create_empty_group(filename, group, mode): + with netCDF4.Dataset(filename, mode=mode) as rootgrp: + rootgrp.createGroup(group) + + +def _datatree_to_netcdf( + dt: DataTree, + filepath, + mode: str = "w", + encoding=None, + unlimited_dims=None, + **kwargs +): + + if kwargs.get("format", None) not in [None, "NETCDF4"]: + raise ValueError("to_netcdf only supports the NETCDF4 format") + + if kwargs.get("engine", None) not in [None, "netcdf4", "h5netcdf"]: + raise ValueError("to_netcdf only supports the netcdf4 and h5netcdf engines") + + if kwargs.get("group", None) is not None: + raise NotImplementedError( + "specifying a root group for the tree has not been implemented" + ) + + if not kwargs.get("compute", True): + raise NotImplementedError("compute=False has not been implemented yet") + + if encoding is None: + encoding = {} + + if unlimited_dims is None: + unlimited_dims = {} + + ds = dt.ds + group_path = dt.pathstr.replace(dt.root.pathstr, "") + if ds is None: + _create_empty_group(filepath, group_path, mode) + else: + ds.to_netcdf( + filepath, + group=group_path, + mode=mode, + encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), + unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), + **kwargs + ) + mode = "a" + + for node in dt.descendants: + ds = node.ds + group_path = node.pathstr.replace(dt.root.pathstr, "") + if ds is None: + _create_empty_group(filepath, group_path, mode) + else: + ds.to_netcdf( + filepath, + group=group_path, + mode=mode, + encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), + unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), + **kwargs + ) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index a1266af02f3..91412e6b69f 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -4,6 +4,7 @@ from xarray.testing import assert_identical from datatree import DataNode, DataTree +from datatree.io import open_datatree def 
create_test_datatree(): @@ -31,14 +32,14 @@ def create_test_datatree(): | Dimensions: (x: 2, y: 3) | Data variables: | a (y) int64 6, 7, 8 - | set1 (x) int64 9, 10 + | set0 (x) int64 9, 10 The structure has deliberately repeated names of tags, variables, and dimensions in order to better check for bugs caused by name conflicts. """ set1_data = xr.Dataset({"a": 0, "b": 1}) set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])}) - root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set1": ("x", [9, 10])}) + root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])}) # Avoid using __init__ so we can independently test it # TODO change so it has a DataTree at the bottom @@ -297,4 +298,21 @@ def test_repr_of_node_with_data(self): class TestIO: - ... + def test_to_netcdf(self, tmpdir): + filepath = str( + tmpdir / "test.nc" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_netcdf(filepath, engine="netcdf4") + + roundtrip_dt = open_datatree(filepath) + + original_dt.name == roundtrip_dt.name + assert original_dt.ds.identical(roundtrip_dt.ds) + for a, b in zip(original_dt.descendants, roundtrip_dt.descendants): + assert a.name == b.name + assert a.pathstr == b.pathstr + if a.has_data: + assert a.ds.identical(b.ds) + else: + assert a.ds is b.ds From 75511c735b497cd0fbb861a4e8d1e023a197d547 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 25 Aug 2021 19:45:01 -0400 Subject: [PATCH 053/260] subtree_nodes -> subtree --- xarray/datatree_/datatree/datatree.py | 6 +++--- xarray/datatree_/datatree/tests/test_dataset_api.py | 6 +++--- xarray/datatree_/datatree/tests/test_datatree.py | 2 +- xarray/datatree_/datatree/treenode.py | 4 ++-- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 5125b270d4a..416e1894158 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -149,7 +149,7 @@ def attrs(self): @property def nbytes(self) -> int: - return sum(node.ds.nbytes for node in self.subtree_nodes) + return sum(node.ds.nbytes for node in self.subtree) @property def indexes(self): @@ -803,7 +803,7 @@ def map_over_subtree_inplace( # TODO if func fails on some node then the previous nodes will still have been updated... 
- for node in self.subtree_nodes: + for node in self.subtree: if node.has_data: node.ds = func(node.ds, *args, **kwargs) @@ -871,7 +871,7 @@ def as_array(self) -> DataArray: @property def groups(self): """Return all netCDF4 groups in the tree, given as a tuple of path-like strings.""" - return tuple(node.pathstr for node in self.subtree_nodes) + return tuple(node.pathstr for node in self.subtree) def to_netcdf( self, filepath, mode: str = "w", encoding=None, unlimited_dims=None, **kwargs diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index afda3588ac0..82d8871e7b1 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -21,7 +21,7 @@ def times_ten(ds): for ( result_node, original_node, - ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): + ) in zip(result_tree.subtree, dt.subtree): assert isinstance(result_node, DataTree) if original_node.has_data: @@ -41,7 +41,7 @@ def multiply_then_add(ds, times, add=0.0): for ( result_node, original_node, - ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): + ) in zip(result_tree.subtree, dt.subtree): assert isinstance(result_node, DataTree) if original_node.has_data: @@ -60,7 +60,7 @@ def multiply_then_add(ds, times, add=0.0): for ( result_node, original_node, - ) in zip(result_tree.subtree_nodes, dt.subtree_nodes): + ) in zip(result_tree.subtree, dt.subtree): assert isinstance(result_node, DataTree) if original_node.has_data: diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 91412e6b69f..b82ae9846c1 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -236,7 +236,7 @@ def test_two_layers(self): def test_full(self): dt = create_test_datatree() - paths = list(node.pathstr for node in dt.subtree_nodes) + paths = list(node.pathstr for node in dt.subtree) assert paths == [ "root", "root/set1", diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index e11f96f7cd9..898ee12dae6 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -220,6 +220,6 @@ def tags(self, value): ) @property - def subtree_nodes(self): - """An iterator over all nodes in this tree, including both self and descendants.""" + def subtree(self): + """An iterator over all nodes in this tree, including both self and all descendants.""" return anytree.iterators.PreOrderIter(self) From afc29f5efbab3a8595e441a554de759665faa704 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 26 Aug 2021 22:41:04 -0400 Subject: [PATCH 054/260] hotfix + test for bug in DataTree.__init__ --- xarray/datatree_/datatree/datatree.py | 4 +--- xarray/datatree_/datatree/tests/test_datatree.py | 2 ++ 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 416e1894158..1bd495d9e66 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -459,7 +459,7 @@ class DataTree( def __init__( self, - data_objects: Dict[PathType, Union[Dataset, DataArray]] = None, + data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, name: Hashable = "root", ): # First create the root node @@ -488,8 +488,6 @@ def __init__( allow_overwrite=False, new_nodes_along_path=True, ) - new_node = self.get_node(path) - new_node[path] = data 
@property def ds(self) -> Dataset: diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index b82ae9846c1..3d587e3fe06 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -224,7 +224,9 @@ def test_one_layer(self): dt = DataTree({"run1": dat1, "run2": dat2}) assert dt.ds is None assert dt["run1"].ds is dat1 + assert dt["run1"].children == () assert dt["run2"].ds is dat2 + assert dt["run2"].children == () def test_two_layers(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) From 2aea280cf5397ab5997037d2a9850e637b96dc43 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 27 Aug 2021 17:43:19 -0400 Subject: [PATCH 055/260] don't need to special case root when saving to netcdf --- xarray/datatree_/datatree/io.py | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index c717203a0f4..84be2485d99 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -114,22 +114,7 @@ def _datatree_to_netcdf( if unlimited_dims is None: unlimited_dims = {} - ds = dt.ds - group_path = dt.pathstr.replace(dt.root.pathstr, "") - if ds is None: - _create_empty_group(filepath, group_path, mode) - else: - ds.to_netcdf( - filepath, - group=group_path, - mode=mode, - encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), - unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), - **kwargs - ) - mode = "a" - - for node in dt.descendants: + for node in dt.subtree: ds = node.ds group_path = node.pathstr.replace(dt.root.pathstr, "") if ds is None: @@ -143,3 +128,4 @@ def _datatree_to_netcdf( unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), **kwargs ) + mode = "a" From fe66ea7ac4e96bd6f82a0773ba13aa962d2326e6 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 27 Aug 2021 18:49:36 -0400 Subject: [PATCH 056/260] Check isomorphism https://github.com/xarray-contrib/datatree/pull/31 * pseudocode ideas for generalizing map_over_subtree * pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file * pseudocode for mapping but now multiple return values * pseudocode for mapping but with multiple return values * check_isomorphism works and has tests * cleaned up the mapping tests a bit * remove WIP from oter branch * ensure tests pass * map_over_subtree in the public API properly * linting --- xarray/datatree_/datatree/__init__.py | 3 +- xarray/datatree_/datatree/datatree.py | 58 +----- xarray/datatree_/datatree/mapping.py | 139 +++++++++++++ .../datatree/tests/test_dataset_api.py | 69 +------ .../datatree_/datatree/tests/test_datatree.py | 23 ++- .../datatree_/datatree/tests/test_mapping.py | 184 ++++++++++++++++++ xarray/datatree_/datatree/treenode.py | 2 +- 7 files changed, 346 insertions(+), 132 deletions(-) create mode 100644 xarray/datatree_/datatree/mapping.py create mode 100644 xarray/datatree_/datatree/tests/test_mapping.py diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index f83edbb0970..fbe1cba7860 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,4 +1,5 @@ # flake8: noqa # Ignoring F401: imported but unused -from .datatree import DataNode, DataTree, map_over_subtree +from .datatree import DataNode, DataTree from .io import 
open_datatree +from .mapping import map_over_subtree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 1bd495d9e66..1828f7c4f6d 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,6 +1,5 @@ from __future__ import annotations -import functools import textwrap from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Union @@ -14,6 +13,7 @@ from xarray.core.ops import NAN_CUM_METHODS, NAN_REDUCE_METHODS, REDUCE_METHODS from xarray.core.variable import Variable +from .mapping import map_over_subtree from .treenode import PathType, TreeNode, _init_single_treenode """ @@ -50,62 +50,6 @@ """ -def map_over_subtree(func): - """ - Decorator which turns a function which acts on (and returns) single Datasets into one which acts on DataTrees. - - Applies a function to every dataset in this subtree, returning a new tree which stores the results. - - The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the - descendant nodes. The returned tree will have the same structure as the original subtree. - - func needs to return a Dataset, DataArray, or None in order to be able to rebuild the subtree after mapping, as each - result will be assigned to its respective node of new tree via `DataTree.__setitem__`. - - Parameters - ---------- - func : callable - Function to apply to datasets with signature: - `func(node.ds, *args, **kwargs) -> Dataset`. - - Function will not be applied to any nodes without datasets. - *args : tuple, optional - Positional arguments passed on to `func`. - **kwargs : Any - Keyword arguments passed on to `func`. - - Returns - ------- - mapped : callable - Wrapped function which returns tree created from results of applying ``func`` to the dataset at each node. - - See also - -------- - DataTree.map_over_subtree - DataTree.map_over_subtree_inplace - """ - - @functools.wraps(func) - def _map_over_subtree(tree, *args, **kwargs): - """Internal function which maps func over every node in tree, returning a tree of the results.""" - - # Recreate and act on root node - out_tree = DataNode(name=tree.name, data=tree.ds) - if out_tree.has_data: - out_tree.ds = func(out_tree.ds, *args, **kwargs) - - # Act on every other node in the tree, and rebuild from results - for node in tree.descendants: - # TODO make a proper relative_path method - relative_path = node.pathstr.replace(tree.pathstr, "") - result = func(node.ds, *args, **kwargs) if node.has_data else None - out_tree[relative_path] = result - - return out_tree - - return _map_over_subtree - - class DatasetPropertiesMixin: """Expose properties of wrapped Dataset""" diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py new file mode 100644 index 00000000000..b0ff2b2283a --- /dev/null +++ b/xarray/datatree_/datatree/mapping.py @@ -0,0 +1,139 @@ +import functools + +from anytree.iterators import LevelOrderIter + +from .treenode import TreeNode + + +class TreeIsomorphismError(ValueError): + """Error raised if two tree objects are not isomorphic to one another when they need to be.""" + + pass + + +def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): + """ + Check that two trees have the same structure, raising an error if not. + + Does not check the actual data in the nodes, but it does check that if one node does/doesn't have data then its + counterpart in the other tree also does/doesn't have data. 
+ + Also does not check that the root nodes of each tree have the same parent - so this function checks that subtrees + are isomorphic, not the entire tree above (if it exists). + + Can optionally check if respective nodes should have the same name. + + Parameters + ---------- + subtree_a : DataTree + subtree_b : DataTree + require_names_equal : Bool, optional + Whether or not to also check that each node has the same name as its counterpart. Default is False. + + Raises + ------ + TypeError + If either subtree_a or subtree_b are not tree objects. + TreeIsomorphismError + If subtree_a and subtree_b are tree objects, but are not isomorphic to one another, or one contains data at a + location the other does not. Also optionally raised if their structure is isomorphic, but the names of any two + respective nodes are not equal. + """ + # TODO turn this into a public function called assert_isomorphic + + if not isinstance(subtree_a, TreeNode): + raise TypeError( + f"Argument `subtree_a is not a tree, it is of type {type(subtree_a)}" + ) + if not isinstance(subtree_b, TreeNode): + raise TypeError( + f"Argument `subtree_b is not a tree, it is of type {type(subtree_b)}" + ) + + # Walking nodes in "level-order" fashion means walking down from the root breadth-first. + # Checking by walking in this way implicitly assumes that the tree is an ordered tree (which it is so long as + # children are stored in a tuple or list rather than in a set). + for node_a, node_b in zip(LevelOrderIter(subtree_a), LevelOrderIter(subtree_b)): + path_a, path_b = node_a.pathstr, node_b.pathstr + + if require_names_equal: + if node_a.name != node_b.name: + raise TreeIsomorphismError( + f"Trees are not isomorphic because node '{path_a}' in the first tree has " + f"name '{node_a.name}', whereas its counterpart node '{path_b}' in the " + f"second tree has name '{node_b.name}'." + ) + + if node_a.has_data != node_b.has_data: + dat_a = "no " if not node_a.has_data else "" + dat_b = "no " if not node_b.has_data else "" + raise TreeIsomorphismError( + f"Trees are not isomorphic because node '{path_a}' in the first tree has " + f"{dat_a}data, whereas its counterpart node '{path_b}' in the second tree " + f"has {dat_b}data." + ) + + if len(node_a.children) != len(node_b.children): + raise TreeIsomorphismError( + f"Trees are not isomorphic because node '{path_a}' in the first tree has " + f"{len(node_a.children)} children, whereas its counterpart node '{path_b}' in " + f"the second tree has {len(node_b.children)} children." + ) + + +def map_over_subtree(func): + """ + Decorator which turns a function which acts on (and returns) single Datasets into one which acts on DataTrees. + + Applies a function to every dataset in this subtree, returning a new tree which stores the results. + + The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the + descendant nodes. The returned tree will have the same structure as the original subtree. + + func needs to return a Dataset, DataArray, or None in order to be able to rebuild the subtree after mapping, as each + result will be assigned to its respective node of new tree via `DataTree.__setitem__`. + + Parameters + ---------- + func : callable + Function to apply to datasets with signature: + `func(node.ds, *args, **kwargs) -> Dataset`. + + Function will not be applied to any nodes without datasets. + *args : tuple, optional + Positional arguments passed on to `func`. + **kwargs : Any + Keyword arguments passed on to `func`. 
+ + Returns + ------- + mapped : callable + Wrapped function which returns tree created from results of applying ``func`` to the dataset at each node. + + See also + -------- + DataTree.map_over_subtree + DataTree.map_over_subtree_inplace + """ + + @functools.wraps(func) + def _map_over_subtree(tree, *args, **kwargs): + """Internal function which maps func over every node in tree, returning a tree of the results.""" + + # Recreate and act on root node + from .datatree import DataNode + + out_tree = DataNode(name=tree.name, data=tree.ds) + if out_tree.has_data: + out_tree.ds = func(out_tree.ds, *args, **kwargs) + + # Act on every other node in the tree, and rebuild from results + for node in tree.descendants: + # TODO make a proper relative_path method + relative_path = node.pathstr.replace(tree.pathstr, "") + result = func(node.ds, *args, **kwargs) if node.has_data else None + out_tree[relative_path] = result + + return out_tree + + return _map_over_subtree diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 82d8871e7b1..e930f49fcf3 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -1,76 +1,9 @@ import numpy as np import pytest import xarray as xr -from test_datatree import create_test_datatree from xarray.testing import assert_equal -from datatree import DataNode, DataTree, map_over_subtree - - -class TestMapOverSubTree: - def test_map_over_subtree(self): - dt = create_test_datatree() - - @map_over_subtree - def times_ten(ds): - return 10.0 * ds - - result_tree = times_ten(dt) - - # TODO write an assert_tree_equal function - for ( - result_node, - original_node, - ) in zip(result_tree.subtree, dt.subtree): - assert isinstance(result_node, DataTree) - - if original_node.has_data: - assert_equal(result_node.ds, original_node.ds * 10.0) - else: - assert not result_node.has_data - - def test_map_over_subtree_with_args_and_kwargs(self): - dt = create_test_datatree() - - @map_over_subtree - def multiply_then_add(ds, times, add=0.0): - return times * ds + add - - result_tree = multiply_then_add(dt, 10.0, add=2.0) - - for ( - result_node, - original_node, - ) in zip(result_tree.subtree, dt.subtree): - assert isinstance(result_node, DataTree) - - if original_node.has_data: - assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) - else: - assert not result_node.has_data - - def test_map_over_subtree_method(self): - dt = create_test_datatree() - - def multiply_then_add(ds, times, add=0.0): - return times * ds + add - - result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) - - for ( - result_node, - original_node, - ) in zip(result_tree.subtree, dt.subtree): - assert isinstance(result_node, DataTree) - - if original_node.has_data: - assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) - else: - assert not result_node.has_data - - @pytest.mark.xfail - def test_map_over_subtree_inplace(self): - raise NotImplementedError +from datatree import DataNode class TestDSProperties: diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 3d587e3fe06..f13a7f3c639 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -7,7 +7,21 @@ from datatree.io import open_datatree -def create_test_datatree(): +def assert_tree_equal(dt_a, dt_b): + assert dt_a.name == dt_b.name + assert dt_a.parent is dt_b.parent + + assert 
dt_a.ds.equals(dt_b.ds) + for a, b in zip(dt_a.descendants, dt_b.descendants): + assert a.name == b.name + assert a.pathstr == b.pathstr + if a.has_data: + assert a.ds.equals(b.ds) + else: + assert a.ds is b.ds + + +def create_test_datatree(modify=lambda ds: ds): """ Create a test datatree with this structure: @@ -37,12 +51,11 @@ def create_test_datatree(): The structure has deliberately repeated names of tags, variables, and dimensions in order to better check for bugs caused by name conflicts. """ - set1_data = xr.Dataset({"a": 0, "b": 1}) - set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])}) - root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])}) + set1_data = modify(xr.Dataset({"a": 0, "b": 1})) + set2_data = modify(xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])})) + root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) # Avoid using __init__ so we can independently test it - # TODO change so it has a DataTree at the bottom root = DataNode(name="root", data=root_data) set1 = DataNode(name="set1", parent=root, data=set1_data) DataNode(name="set1", parent=set1) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py new file mode 100644 index 00000000000..da2ad8be196 --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -0,0 +1,184 @@ +import pytest +import xarray as xr +from test_datatree import assert_tree_equal, create_test_datatree +from xarray.testing import assert_equal + +from datatree.datatree import DataNode, DataTree +from datatree.mapping import TreeIsomorphismError, _check_isomorphic, map_over_subtree +from datatree.treenode import TreeNode + +empty = xr.Dataset() + + +class TestCheckTreesIsomorphic: + def test_not_a_tree(self): + with pytest.raises(TypeError, match="not a tree"): + _check_isomorphic("s", 1) + + def test_different_widths(self): + dt1 = DataTree(data_objects={"a": empty}) + dt2 = DataTree(data_objects={"a": empty, "b": empty}) + expected_err_str = ( + "'root' in the first tree has 1 children, whereas its counterpart node 'root' in the " + "second tree has 2 children" + ) + with pytest.raises(TreeIsomorphismError, match=expected_err_str): + _check_isomorphic(dt1, dt2) + + def test_different_heights(self): + dt1 = DataTree(data_objects={"a": empty}) + dt2 = DataTree(data_objects={"a": empty, "a/b": empty}) + expected_err_str = ( + "'root/a' in the first tree has 0 children, whereas its counterpart node 'root/a' in the " + "second tree has 1 children" + ) + with pytest.raises(TreeIsomorphismError, match=expected_err_str): + _check_isomorphic(dt1, dt2) + + def test_only_one_has_data(self): + dt1 = DataTree(data_objects={"a": xr.Dataset({"a": 0})}) + dt2 = DataTree(data_objects={"a": None}) + expected_err_str = ( + "'root/a' in the first tree has data, whereas its counterpart node 'root/a' in the " + "second tree has no data" + ) + with pytest.raises(TreeIsomorphismError, match=expected_err_str): + _check_isomorphic(dt1, dt2) + + def test_names_different(self): + dt1 = DataTree(data_objects={"a": xr.Dataset()}) + dt2 = DataTree(data_objects={"b": empty}) + expected_err_str = ( + "'root/a' in the first tree has name 'a', whereas its counterpart node 'root/b' in the " + "second tree has name 'b'" + ) + with pytest.raises(TreeIsomorphismError, match=expected_err_str): + _check_isomorphic(dt1, dt2, require_names_equal=True) + + def test_isomorphic_names_equal(self): + dt1 = DataTree( + data_objects={"a": empty, "b": empty, "b/c": 
empty, "b/d": empty} + ) + dt2 = DataTree( + data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) + _check_isomorphic(dt1, dt2, require_names_equal=True) + + def test_isomorphic_ordering(self): + dt1 = DataTree( + data_objects={"a": empty, "b": empty, "b/d": empty, "b/c": empty} + ) + dt2 = DataTree( + data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) + _check_isomorphic(dt1, dt2, require_names_equal=False) + + def test_isomorphic_names_not_equal(self): + dt1 = DataTree( + data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} + ) + dt2 = DataTree( + data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} + ) + _check_isomorphic(dt1, dt2) + + def test_not_isomorphic_complex_tree(self): + dt1 = create_test_datatree() + dt2 = create_test_datatree() + dt2.set_node("set1/set2", TreeNode("set3")) + with pytest.raises(TreeIsomorphismError, match="root/set1/set2"): + _check_isomorphic(dt1, dt2) + + +class TestMapOverSubTree: + @pytest.mark.xfail + def test_no_trees_passed(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_not_isomorphic(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_no_trees_returned(self): + raise NotImplementedError + + def test_single_dt_arg(self): + dt = create_test_datatree() + + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + result_tree = times_ten(dt) + expected = create_test_datatree(modify=lambda ds: 10.0 * ds) + assert_tree_equal(result_tree, expected) + + def test_single_dt_arg_plus_args_and_kwargs(self): + dt = create_test_datatree() + + @map_over_subtree + def multiply_then_add(ds, times, add=0.0): + return times * ds + add + + result_tree = multiply_then_add(dt, 10.0, add=2.0) + expected = create_test_datatree(modify=lambda ds: (10.0 * ds) + 2.0) + assert_tree_equal(result_tree, expected) + + @pytest.mark.xfail + def test_multiple_dt_args(self): + ds = xr.Dataset({"a": ("x", [1, 2, 3])}) + dt = DataNode("root", data=ds) + DataNode("results", data=ds + 0.2, parent=dt) + + @map_over_subtree + def add(ds1, ds2): + return ds1 + ds2 + + expected = DataNode("root", data=ds * 2) + DataNode("results", data=(ds + 0.2) * 2, parent=expected) + + result = add(dt, dt) + + # dt1 = create_test_datatree() + # dt2 = create_test_datatree() + # expected = create_test_datatree(modify=lambda ds: 2 * ds) + + assert_tree_equal(result, expected) + + @pytest.mark.xfail + def test_dt_as_kwarg(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_return_multiple_dts(self): + raise NotImplementedError + + @pytest.mark.xfail + def test_return_no_dts(self): + raise NotImplementedError + + def test_dt_method(self): + dt = create_test_datatree() + + def multiply_then_add(ds, times, add=0.0): + return times * ds + add + + result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) + + for ( + result_node, + original_node, + ) in zip(result_tree.subtree, dt.subtree): + assert isinstance(result_node, DataTree) + + if original_node.has_data: + assert_equal(result_node.ds, (original_node.ds * 10.0) + 2.0) + else: + assert not result_node.has_data + + +@pytest.mark.xfail +class TestMapOverSubTreeInplace: + def test_map_over_subtree_inplace(self): + raise NotImplementedError diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 898ee12dae6..276577e7fc3 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -84,7 +84,7 @@ def _pre_attach(self, parent: TreeNode) -> None: """ if self.name 
in list(c.name for c in parent.children):
             raise KeyError(
-                f"parent {str(parent)} already has a child named {self.name}"
+                f"parent {parent.name} already has a child named {self.name}"
             )

     def add_child(self, child: TreeNode) -> None:
         """Add a single child node below this node, without replacement."""

From 60d38b7042fb1895ac7bb4d6ec90bfbaf21c843d Mon Sep 17 00:00:00 2001
From: Joe Hamman
Date: Mon, 30 Aug 2021 09:25:57 -0700
Subject: [PATCH 057/260] Add zarr read/write

https://github.com/xarray-contrib/datatree/pull/30

* add test for roundtrip and support empty nodes

* update roundtrip test, improves empty node handling in IO

* add zarr read/write support

* support netcdf4 or h5netcdf

* netcdf is optional, zarr too!

* Apply suggestions from code review

Co-authored-by: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com>

Co-authored-by: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com>
---
 xarray/datatree_/ci/environment.yml           |   1 +
 xarray/datatree_/datatree/datatree.py         |  31 ++++
 xarray/datatree_/datatree/io.py               | 140 ++++++++++++++----
 .../datatree_/datatree/tests/test_datatree.py |  21 +--
 xarray/datatree_/requirements.txt             |   1 -
 5 files changed, 159 insertions(+), 35 deletions(-)

diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml
index 8486fc927d6..e379a9fab44 100644
--- a/xarray/datatree_/ci/environment.yml
+++ b/xarray/datatree_/ci/environment.yml
@@ -11,3 +11,4 @@ dependencies:
   - black
   - codecov
   - pytest-cov
+  - zarr
diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py
index 1828f7c4f6d..79257343085 100644
--- a/xarray/datatree_/datatree/datatree.py
+++ b/xarray/datatree_/datatree/datatree.py
@@ -854,6 +854,37 @@ def to_netcdf(
             **kwargs,
         )

+    def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs):
+        """
+        Write datatree contents to a Zarr store.
+
+        Parameters
+        ----------
+        store : MutableMapping, str or Path, optional
+            Store or path to directory in file system
+        mode : {"w", "w-", "a", "r+", None}, default: "w"
+            Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists);
+            “a” means override existing variables (create if does not exist); “r+” means modify existing
+            array values only (raise an error if any metadata or shapes would change). The default mode
+            is “a” if append_dim is set. Otherwise, it is “r+” if region is set and “w-” otherwise.
+        encoding : dict, optional
+            Nested dictionary with variable names as keys and dictionaries of
+            variable specific encodings as values, e.g.,
+            ``{"root/set1": {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}, ...}``.
+            See ``xarray.Dataset.to_zarr`` for available options.
+        kwargs :
+            Additional keyword arguments to be passed to ``xarray.Dataset.to_zarr``
+        """
+        from .io import _datatree_to_zarr
+
+        _datatree_to_zarr(
+            self,
+            store,
+            mode=mode,
+            encoding=encoding,
+            **kwargs,
+        )
+
     def plot(self):
         raise NotImplementedError

diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py
index 84be2485d99..f7bdf570a04 100644
--- a/xarray/datatree_/datatree/io.py
+++ b/xarray/datatree_/datatree/io.py
@@ -1,10 +1,9 @@
-import os
-from typing import Dict, Sequence
+import pathlib
+from typing import Sequence

-import netCDF4
 from xarray import open_dataset

-from .datatree import DataNode, DataTree, PathType
+from .datatree import DataTree, PathType


 def _ds_or_none(ds):
@@ -14,37 +13,87 @@ def _ds_or_none(ds):
     return None


-def _open_group_children_recursively(filename, node, ncgroup, chunks, **kwargs):
-    for g in ncgroup.groups.values():
+def _iter_zarr_groups(root, parrent=""):
+    parrent = pathlib.Path(parrent)
+    for path, group in root.groups():
+        gpath = parrent / path
+        yield str(gpath)
+        yield from _iter_zarr_groups(group, parrent=gpath)

-        # Open and add this node's dataset to the tree
-        name = os.path.basename(g.path)
-        ds = open_dataset(filename, group=g.path, chunks=chunks, **kwargs)

+def _iter_nc_groups(root, parrent=""):
+    parrent = pathlib.Path(parrent)
+    for path, group in root.groups.items():
+        gpath = parrent / path
+        yield str(gpath)
+        yield from _iter_nc_groups(group, parrent=gpath)

-        ds = _ds_or_none(ds)
-        child_node = DataNode(name, ds)
-        node.add_child(child_node)

-        _open_group_children_recursively(filename, node[name], g, chunks, **kwargs)
+def _get_nc_dataset_class(engine):
+    if engine == "netcdf4":
+        from netCDF4 import Dataset
+    elif engine == "h5netcdf":
+        from h5netcdf import Dataset
+    elif engine is None:
+        try:
+            from netCDF4 import Dataset
+        except ImportError:
+            from h5netcdf import Dataset
+    else:
+        raise ValueError(f"unsupported engine: {engine}")
+    return Dataset


-def open_datatree(filename: str, chunks: Dict = None, **kwargs) -> DataTree:
+def open_datatree(filename_or_obj, engine=None, **kwargs) -> DataTree:
     """
     Open and decode a dataset from a file or file-like object, creating one Tree node for each group in the file.

     Parameters
     ----------
-    filename
-    chunks
+    filename_or_obj : str, Path, file-like, or DataStore
+        Strings and Path objects are interpreted as a path to a netCDF file or Zarr store.
+    engine : str, optional
+        Xarray backend engine to use. Valid options include `{"netcdf4", "h5netcdf", "zarr"}`.
+    kwargs :
+        Additional keyword arguments passed to ``xarray.open_dataset`` for each group.
Returns ------- DataTree """ - with netCDF4.Dataset(filename, mode="r") as ncfile: - ds = open_dataset(filename, chunks=chunks, **kwargs) + if engine == "zarr": + return _open_datatree_zarr(filename_or_obj, **kwargs) + elif engine in [None, "netcdf4", "h5netcdf"]: + return _open_datatree_netcdf(filename_or_obj, engine=engine, **kwargs) + + +def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: + ncDataset = _get_nc_dataset_class(kwargs.get("engine", None)) + + with ncDataset(filename, mode="r") as ncds: + ds = open_dataset(filename, **kwargs).pipe(_ds_or_none) + tree_root = DataTree(data_objects={"root": ds}) + for key in _iter_nc_groups(ncds): + tree_root[key] = open_dataset(filename, group=key, **kwargs).pipe( + _ds_or_none + ) + return tree_root + + +def _open_datatree_zarr(store, **kwargs) -> DataTree: + import zarr + + with zarr.open_group(store, mode="r") as zds: + ds = open_dataset(store, engine="zarr", **kwargs).pipe(_ds_or_none) tree_root = DataTree(data_objects={"root": ds}) - _open_group_children_recursively(filename, tree_root, ncfile, chunks, **kwargs) + for key in _iter_zarr_groups(zds): + try: + tree_root[key] = open_dataset( + store, engine="zarr", group=key, **kwargs + ).pipe(_ds_or_none) + except zarr.errors.PathNotFoundError: + tree_root[key] = None return tree_root @@ -80,8 +129,10 @@ def _maybe_extract_group_kwargs(enc, group): return None -def _create_empty_group(filename, group, mode): - with netCDF4.Dataset(filename, mode=mode) as rootgrp: +def _create_empty_netcdf_group(filename, group, mode, engine): + ncDataset = _get_nc_dataset_class(engine) + + with ncDataset(filename, mode=mode) as rootgrp: rootgrp.createGroup(group) @@ -91,13 +142,14 @@ def _datatree_to_netcdf( mode: str = "w", encoding=None, unlimited_dims=None, - **kwargs + **kwargs, ): if kwargs.get("format", None) not in [None, "NETCDF4"]: raise ValueError("to_netcdf only supports the NETCDF4 format") - if kwargs.get("engine", None) not in [None, "netcdf4", "h5netcdf"]: + engine = kwargs.get("engine", None) + if engine not in [None, "netcdf4", "h5netcdf"]: raise ValueError("to_netcdf only supports the netcdf4 and h5netcdf engines") if kwargs.get("group", None) is not None: @@ -118,14 +170,52 @@ def _datatree_to_netcdf( ds = node.ds group_path = node.pathstr.replace(dt.root.pathstr, "") if ds is None: - _create_empty_group(filepath, group_path, mode) + _create_empty_netcdf_group(filepath, group_path, mode, engine) else: + ds.to_netcdf( filepath, group=group_path, mode=mode, encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), - **kwargs + **kwargs, ) mode = "a" + + +def _create_empty_zarr_group(store, group, mode): + import zarr + + root = zarr.open_group(store, mode=mode) + root.create_group(group, overwrite=True) + + +def _datatree_to_zarr(dt: DataTree, store, mode: str = "w", encoding=None, **kwargs): + + if kwargs.get("group", None) is not None: + raise NotImplementedError( + "specifying a root group for the tree has not been implemented" + ) + + if not kwargs.get("compute", True): + raise NotImplementedError("compute=False has not been implemented yet") + + if encoding is None: + encoding = {} + + for node in dt.subtree: + ds = node.ds + group_path = node.pathstr.replace(dt.root.pathstr, "") + if ds is None: + _create_empty_zarr_group(store, group_path, mode) + else: + ds.to_zarr( + store, + group=group_path, + mode=mode, + encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), + **kwargs, + ) + if "w" in 
mode: + mode = "a" diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index f13a7f3c639..4592643bdf0 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -322,12 +322,15 @@ def test_to_netcdf(self, tmpdir): roundtrip_dt = open_datatree(filepath) - original_dt.name == roundtrip_dt.name - assert original_dt.ds.identical(roundtrip_dt.ds) - for a, b in zip(original_dt.descendants, roundtrip_dt.descendants): - assert a.name == b.name - assert a.pathstr == b.pathstr - if a.has_data: - assert a.ds.identical(b.ds) - else: - assert a.ds is b.ds + assert_tree_equal(original_dt, roundtrip_dt) + + def test_to_zarr(self, tmpdir): + filepath = str( + tmpdir / "test.zarr" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_zarr(filepath) + + roundtrip_dt = open_datatree(filepath, engine="zarr") + + assert_tree_equal(original_dt, roundtrip_dt) diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index 67e19d194b6..a95f277b2f7 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1,4 +1,3 @@ xarray>=0.19.0 -netcdf4 anytree future From 494d16e58300610fca982a5cea89ae1a12172a75 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 1 Sep 2021 20:58:27 -0400 Subject: [PATCH 058/260] Map over multiple subtrees https://github.com/xarray-contrib/datatree/pull/32 * pseudocode ideas for generalizing map_over_subtree * pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file * pseudocode for mapping but now multiple return values * pseudocode for mapping but with multiple return values * check_isomorphism works and has tests * cleaned up the mapping tests a bit * tests for mapping over multiple trees * incorrect pseudocode attempt to map over multiple subtrees * small improvements * fixed test * zipping of multiple arguments * passes for mapping over a single tree * successfully maps over multiple trees * successfully returns multiple trees * filled out all tests * checking types now works for trees with only one node * improved docstring --- xarray/datatree_/datatree/datatree.py | 4 +- xarray/datatree_/datatree/mapping.py | 192 +++++++++++++++--- .../datatree_/datatree/tests/test_datatree.py | 6 +- .../datatree_/datatree/tests/test_mapping.py | 151 ++++++++++---- 4 files changed, 283 insertions(+), 70 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 79257343085..e39b0c05490 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -424,10 +424,12 @@ def __init__( else: node_path, node_name = "/", path + relative_path = node_path.replace(self.name, "") + # Create and set new node new_node = DataNode(name=node_name, data=data) self.set_node( - node_path, + relative_path, new_node, allow_overwrite=False, new_nodes_along_path=True, diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index b0ff2b2283a..94b17ac04bd 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -1,6 +1,8 @@ import functools +from itertools import repeat from anytree.iterators import LevelOrderIter +from xarray import DataArray, Dataset from .treenode import TreeNode @@ -43,11 +45,11 @@ def _check_isomorphic(subtree_a, subtree_b, 
require_names_equal=False): if not isinstance(subtree_a, TreeNode): raise TypeError( - f"Argument `subtree_a is not a tree, it is of type {type(subtree_a)}" + f"Argument `subtree_a` is not a tree, it is of type {type(subtree_a)}" ) if not isinstance(subtree_b, TreeNode): raise TypeError( - f"Argument `subtree_b is not a tree, it is of type {type(subtree_b)}" + f"Argument `subtree_b` is not a tree, it is of type {type(subtree_b)}" ) # Walking nodes in "level-order" fashion means walking down from the root breadth-first. @@ -83,57 +85,195 @@ def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): def map_over_subtree(func): """ - Decorator which turns a function which acts on (and returns) single Datasets into one which acts on DataTrees. + Decorator which turns a function which acts on (and returns) Datasets into one which acts on and returns DataTrees. - Applies a function to every dataset in this subtree, returning a new tree which stores the results. + Applies a function to every dataset in one or more subtrees, returning new trees which store the results. - The function will be applied to any dataset stored in this node, as well as any dataset stored in any of the - descendant nodes. The returned tree will have the same structure as the original subtree. + The function will be applied to any dataset stored in any of the nodes in the trees. The returned trees will have + the same structure as the supplied trees. - func needs to return a Dataset, DataArray, or None in order to be able to rebuild the subtree after mapping, as each - result will be assigned to its respective node of new tree via `DataTree.__setitem__`. + `func` needs to return one Datasets, DataArrays, or None in order to be able to rebuild the subtrees after + mapping, as each result will be assigned to its respective node of a new tree via `DataTree.__setitem__`. Any + returned value that is one of these types will be stacked into a separate tree before returning all of them. + + The trees passed to the resulting function must all be isomorphic to one another. Their nodes need not be named + similarly, but all the output trees will have nodes named in the same way as the first tree passed. Parameters ---------- func : callable Function to apply to datasets with signature: - `func(node.ds, *args, **kwargs) -> Dataset`. + `func(*args, **kwargs) -> Union[Dataset, Iterable[Dataset]]`. + + (i.e. func must accept at least one Dataset and return at least one Dataset.) Function will not be applied to any nodes without datasets. *args : tuple, optional - Positional arguments passed on to `func`. + Positional arguments passed on to `func`. If DataTrees any data-containing nodes will be converted to Datasets \ + via .ds . **kwargs : Any - Keyword arguments passed on to `func`. + Keyword arguments passed on to `func`. If DataTrees any data-containing nodes will be converted to Datasets + via .ds . Returns ------- mapped : callable - Wrapped function which returns tree created from results of applying ``func`` to the dataset at each node. + Wrapped function which returns one or more tree(s) created from results of applying ``func`` to the dataset at + each node. See also -------- DataTree.map_over_subtree DataTree.map_over_subtree_inplace + DataTree.subtree """ + # TODO examples in the docstring + + # TODO inspect function to work out immediately if the wrong number of arguments were passed for it? 
+ @functools.wraps(func) - def _map_over_subtree(tree, *args, **kwargs): + def _map_over_subtree(*args, **kwargs): """Internal function which maps func over every node in tree, returning a tree of the results.""" + from .datatree import DataTree + + all_tree_inputs = [a for a in args if isinstance(a, DataTree)] + [ + a for a in kwargs.values() if isinstance(a, DataTree) + ] + + if len(all_tree_inputs) > 0: + first_tree, *other_trees = all_tree_inputs + else: + raise TypeError("Must pass at least one tree object") + + for other_tree in other_trees: + # isomorphism is transitive so this is enough to guarantee all trees are mutually isomorphic + _check_isomorphic(first_tree, other_tree, require_names_equal=False) + + # Walk all trees simultaneously, applying func to all nodes that lie in same position in different trees + # We don't know which arguments are DataTrees so we zip all arguments together as iterables + # Store tuples of results in a dict because we don't yet know how many trees we need to rebuild to return + out_data_objects = {} + args_as_tree_length_iterables = [ + a.subtree if isinstance(a, DataTree) else repeat(a) for a in args + ] + n_args = len(args_as_tree_length_iterables) + kwargs_as_tree_length_iterables = { + k: v.subtree if isinstance(v, DataTree) else repeat(v) + for k, v in kwargs.items() + } + for node_of_first_tree, *all_node_args in zip( + first_tree.subtree, + *args_as_tree_length_iterables, + *list(kwargs_as_tree_length_iterables.values()), + ): + node_args_as_datasets = [ + a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args] + ] + node_kwargs_as_datasets = dict( + zip( + [k for k in kwargs_as_tree_length_iterables.keys()], + [ + v.ds if isinstance(v, DataTree) else v + for v in all_node_args[n_args:] + ], + ) + ) - # Recreate and act on root node - from .datatree import DataNode + # Now we can call func on the data in this particular set of corresponding nodes + results = ( + func(*node_args_as_datasets, **node_kwargs_as_datasets) + if node_of_first_tree.has_data + else None + ) - out_tree = DataNode(name=tree.name, data=tree.ds) - if out_tree.has_data: - out_tree.ds = func(out_tree.ds, *args, **kwargs) + # TODO implement mapping over multiple trees in-place using if conditions from here on? 
+ out_data_objects[node_of_first_tree.pathstr] = results + + # Find out how many return values we received + num_return_values = _check_all_return_values(out_data_objects) + + # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees + result_trees = [] + for i in range(num_return_values): + out_tree_contents = {} + for n in first_tree.subtree: + p = n.pathstr + if p in out_data_objects.keys(): + if isinstance(out_data_objects[p], tuple): + output_node_data = out_data_objects[p][i] + else: + output_node_data = out_data_objects[p] + else: + output_node_data = None + out_tree_contents[p] = output_node_data + + new_tree = DataTree(name=first_tree.name, data_objects=out_tree_contents) + result_trees.append(new_tree) + + # If only one result then don't wrap it in a tuple + if len(result_trees) == 1: + return result_trees[0] + else: + return tuple(result_trees) - # Act on every other node in the tree, and rebuild from results - for node in tree.descendants: - # TODO make a proper relative_path method - relative_path = node.pathstr.replace(tree.pathstr, "") - result = func(node.ds, *args, **kwargs) if node.has_data else None - out_tree[relative_path] = result + return _map_over_subtree - return out_tree - return _map_over_subtree +def _check_single_set_return_values(path_to_node, obj): + """Check types returned from single evaluation of func, and return number of return values received from func.""" + if isinstance(obj, (Dataset, DataArray)): + return 1 + elif isinstance(obj, tuple): + for r in obj: + if not isinstance(r, (Dataset, DataArray)): + raise TypeError( + f"One of the results of calling func on datasets on the nodes at position {path_to_node} is " + f"of type {type(r)}, not Dataset or DataArray." + ) + return len(obj) + else: + raise TypeError( + f"The result of calling func on the node at position {path_to_node} is of type {type(obj)}, not " + f"Dataset or DataArray, nor a tuple of such types." + ) + + +def _check_all_return_values(returned_objects): + """Walk through all values returned by mapping func over subtrees, raising on any invalid or inconsistent types.""" + + if all(r is None for r in returned_objects.values()): + raise TypeError( + "Called supplied function on all nodes but found a return value of None for" + "all of them." + ) + + result_data_objects = [ + (path_to_node, r) + for path_to_node, r in returned_objects.items() + if r is not None + ] + + if len(result_data_objects) == 1: + # Only one node in the tree: no need to check consistency of results between nodes + path_to_node, result = result_data_objects[0] + num_return_values = _check_single_set_return_values(path_to_node, result) + else: + prev_path, _ = result_data_objects[0] + prev_num_return_values, num_return_values = None, None + for path_to_node, obj in result_data_objects[1:]: + num_return_values = _check_single_set_return_values(path_to_node, obj) + + if ( + num_return_values != prev_num_return_values + and prev_num_return_values is not None + ): + raise TypeError( + f"Calling func on the nodes at position {path_to_node} returns {num_return_values} separate return " + f"values, whereas calling func on the nodes at position {prev_path} instead returns " + f"{prev_num_return_values} separate return values." 
+ ) + + prev_path, prev_num_return_values = path_to_node, num_return_values + + return num_return_values diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4592643bdf0..6ce51851f49 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -8,11 +8,9 @@ def assert_tree_equal(dt_a, dt_b): - assert dt_a.name == dt_b.name assert dt_a.parent is dt_b.parent - assert dt_a.ds.equals(dt_b.ds) - for a, b in zip(dt_a.descendants, dt_b.descendants): + for a, b in zip(dt_a.subtree, dt_b.subtree): assert a.name == b.name assert a.pathstr == b.pathstr if a.has_data: @@ -321,7 +319,6 @@ def test_to_netcdf(self, tmpdir): original_dt.to_netcdf(filepath, engine="netcdf4") roundtrip_dt = open_datatree(filepath) - assert_tree_equal(original_dt, roundtrip_dt) def test_to_zarr(self, tmpdir): @@ -332,5 +329,4 @@ def test_to_zarr(self, tmpdir): original_dt.to_zarr(filepath) roundtrip_dt = open_datatree(filepath, engine="zarr") - assert_tree_equal(original_dt, roundtrip_dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index da2ad8be196..b94840dc38c 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -1,9 +1,8 @@ import pytest import xarray as xr from test_datatree import assert_tree_equal, create_test_datatree -from xarray.testing import assert_equal -from datatree.datatree import DataNode, DataTree +from datatree.datatree import DataTree from datatree.mapping import TreeIsomorphismError, _check_isomorphic, map_over_subtree from datatree.treenode import TreeNode @@ -91,17 +90,36 @@ def test_not_isomorphic_complex_tree(self): class TestMapOverSubTree: - @pytest.mark.xfail def test_no_trees_passed(self): - raise NotImplementedError + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + with pytest.raises(TypeError, match="Must pass at least one tree"): + times_ten("dt") - @pytest.mark.xfail def test_not_isomorphic(self): - raise NotImplementedError + dt1 = create_test_datatree() + dt2 = create_test_datatree() + dt2["set4"] = None + + @map_over_subtree + def times_ten(ds1, ds2): + return ds1 * ds2 + + with pytest.raises(TreeIsomorphismError): + times_ten(dt1, dt2) - @pytest.mark.xfail def test_no_trees_returned(self): - raise NotImplementedError + dt1 = create_test_datatree() + dt2 = create_test_datatree() + + @map_over_subtree + def bad_func(ds1, ds2): + return None + + with pytest.raises(TypeError, match="return value of None"): + bad_func(dt1, dt2) def test_single_dt_arg(self): dt = create_test_datatree() @@ -110,8 +128,8 @@ def test_single_dt_arg(self): def times_ten(ds): return 10.0 * ds - result_tree = times_ten(dt) expected = create_test_datatree(modify=lambda ds: 10.0 * ds) + result_tree = times_ten(dt) assert_tree_equal(result_tree, expected) def test_single_dt_arg_plus_args_and_kwargs(self): @@ -119,43 +137,109 @@ def test_single_dt_arg_plus_args_and_kwargs(self): @map_over_subtree def multiply_then_add(ds, times, add=0.0): - return times * ds + add + return (times * ds) + add - result_tree = multiply_then_add(dt, 10.0, add=2.0) expected = create_test_datatree(modify=lambda ds: (10.0 * ds) + 2.0) + result_tree = multiply_then_add(dt, 10.0, add=2.0) assert_tree_equal(result_tree, expected) - @pytest.mark.xfail def test_multiple_dt_args(self): - ds = xr.Dataset({"a": ("x", [1, 2, 3])}) - dt = DataNode("root", data=ds) - DataNode("results", 
data=ds + 0.2, parent=dt) + dt1 = create_test_datatree() + dt2 = create_test_datatree() @map_over_subtree def add(ds1, ds2): return ds1 + ds2 - expected = DataNode("root", data=ds * 2) - DataNode("results", data=(ds + 0.2) * 2, parent=expected) + expected = create_test_datatree(modify=lambda ds: 2.0 * ds) + result = add(dt1, dt2) + assert_tree_equal(result, expected) - result = add(dt, dt) + def test_dt_as_kwarg(self): + dt1 = create_test_datatree() + dt2 = create_test_datatree() - # dt1 = create_test_datatree() - # dt2 = create_test_datatree() - # expected = create_test_datatree(modify=lambda ds: 2 * ds) + @map_over_subtree + def add(ds1, value=0.0): + return ds1 + value + expected = create_test_datatree(modify=lambda ds: 2.0 * ds) + result = add(dt1, value=dt2) assert_tree_equal(result, expected) - @pytest.mark.xfail - def test_dt_as_kwarg(self): - raise NotImplementedError + def test_return_multiple_dts(self): + dt = create_test_datatree() + + @map_over_subtree + def minmax(ds): + return ds.min(), ds.max() + + dt_min, dt_max = minmax(dt) + expected_min = create_test_datatree(modify=lambda ds: ds.min()) + assert_tree_equal(dt_min, expected_min) + expected_max = create_test_datatree(modify=lambda ds: ds.max()) + assert_tree_equal(dt_max, expected_max) + + def test_return_wrong_type(self): + dt1 = create_test_datatree() + + @map_over_subtree + def bad_func(ds1): + return "string" + + with pytest.raises(TypeError, match="not Dataset or DataArray"): + bad_func(dt1) + + def test_return_tuple_of_wrong_types(self): + dt1 = create_test_datatree() + + @map_over_subtree + def bad_func(ds1): + return xr.Dataset(), "string" + + with pytest.raises(TypeError, match="not Dataset or DataArray"): + bad_func(dt1) @pytest.mark.xfail - def test_return_multiple_dts(self): - raise NotImplementedError + def test_return_inconsistent_number_of_results(self): + dt1 = create_test_datatree() + + @map_over_subtree + def bad_func(ds): + # Datasets in create_test_datatree() have different numbers of dims + # TODO need to instead return different numbers of Dataset objects for this test to catch the intended error + return tuple(ds.dims) + + with pytest.raises(TypeError, match="instead returns"): + bad_func(dt1) + + def test_wrong_number_of_arguments_for_func(self): + dt = create_test_datatree() + + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + with pytest.raises( + TypeError, match="takes 1 positional argument but 2 were given" + ): + times_ten(dt, dt) + + def test_map_single_dataset_against_whole_tree(self): + dt = create_test_datatree() + + @map_over_subtree + def nodewise_merge(node_ds, fixed_ds): + return xr.merge([node_ds, fixed_ds]) + + other_ds = xr.Dataset({"z": ("z", [0])}) + expected = create_test_datatree(modify=lambda ds: xr.merge([ds, other_ds])) + result_tree = nodewise_merge(dt, other_ds) + assert_tree_equal(result_tree, expected) @pytest.mark.xfail - def test_return_no_dts(self): + def test_trees_with_different_node_names(self): + # TODO test this after I've got good tests for renaming nodes raise NotImplementedError def test_dt_method(self): @@ -164,18 +248,9 @@ def test_dt_method(self): def multiply_then_add(ds, times, add=0.0): return times * ds + add + expected = create_test_datatree(modify=lambda ds: (10.0 * ds) + 2.0) result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) - - for ( - result_node, - original_node, - ) in zip(result_tree.subtree, dt.subtree): - assert isinstance(result_node, DataTree) - - if original_node.has_data: - assert_equal(result_node.ds, 
(original_node.ds * 10.0) + 2.0) - else: - assert not result_node.has_data + assert_tree_equal(result_tree, expected) @pytest.mark.xfail From 5cc4bb14e5b59275742641aecc2325021e3bfc6c Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 1 Sep 2021 21:17:19 -0400 Subject: [PATCH 059/260] skips tests if it doesn't have the correct dependency https://github.com/xarray-contrib/datatree/pull/34 --- xarray/datatree_/datatree/tests/__init__.py | 28 +++++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 3 ++ .../datatree_/datatree/tests/test_mapping.py | 3 +- 3 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 xarray/datatree_/datatree/tests/__init__.py diff --git a/xarray/datatree_/datatree/tests/__init__.py b/xarray/datatree_/datatree/tests/__init__.py new file mode 100644 index 00000000000..e5afc834c08 --- /dev/null +++ b/xarray/datatree_/datatree/tests/__init__.py @@ -0,0 +1,28 @@ +import importlib +from distutils import version + +import pytest + + +def _importorskip(modname, minversion=None): + try: + mod = importlib.import_module(modname) + has = True + if minversion is not None: + if LooseVersion(mod.__version__) < LooseVersion(minversion): + raise ImportError("Minimum version not satisfied") + except ImportError: + has = False + func = pytest.mark.skipif(not has, reason=f"requires {modname}") + return has, func + + +def LooseVersion(vstring): + # Our development version is something like '0.10.9+aac7bfc' + # This function just ignores the git commit id. + vstring = vstring.split("+")[0] + return version.LooseVersion(vstring) + + +has_zarr, requires_zarr = _importorskip("zarr") +has_netCDF4, requires_netCDF4 = _importorskip("netCDF4") diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 6ce51851f49..df73109abee 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -5,6 +5,7 @@ from datatree import DataNode, DataTree from datatree.io import open_datatree +from datatree.tests import requires_netCDF4, requires_zarr def assert_tree_equal(dt_a, dt_b): @@ -311,6 +312,7 @@ def test_repr_of_node_with_data(self): class TestIO: + @requires_netCDF4 def test_to_netcdf(self, tmpdir): filepath = str( tmpdir / "test.nc" @@ -321,6 +323,7 @@ def test_to_netcdf(self, tmpdir): roundtrip_dt = open_datatree(filepath) assert_tree_equal(original_dt, roundtrip_dt) + @requires_zarr def test_to_zarr(self, tmpdir): filepath = str( tmpdir / "test.zarr" diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index b94840dc38c..00b30f57b7c 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -1,11 +1,12 @@ import pytest import xarray as xr -from test_datatree import assert_tree_equal, create_test_datatree from datatree.datatree import DataTree from datatree.mapping import TreeIsomorphismError, _check_isomorphic, map_over_subtree from datatree.treenode import TreeNode +from .test_datatree import assert_tree_equal, create_test_datatree + empty = xr.Dataset() From 0be5907329dfa147e2e55928d668710880ee283b Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 2 Sep 2021 12:15:11 -0400 Subject: [PATCH 060/260] [WIP] Add typed ops https://github.com/xarray-contrib/datatree/pull/24 * added methods from xarray.core._typed_ops.py to list to map over * test 
ops with non-datatrees acting on datatrees * removed the xfails * refactored ops out into new file * linting * minimise imports of xarray internals --- xarray/datatree_/datatree/datatree.py | 232 +-------------- xarray/datatree_/datatree/ops.py | 268 ++++++++++++++++++ .../datatree/tests/test_dataset_api.py | 41 ++- 3 files changed, 318 insertions(+), 223 deletions(-) create mode 100644 xarray/datatree_/datatree/ops.py diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index e39b0c05490..76fc1bf96be 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -4,45 +4,25 @@ from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Union import anytree +from xarray import DataArray, Dataset, merge from xarray.core import dtypes, utils -from xarray.core.arithmetic import DatasetArithmetic -from xarray.core.combine import merge -from xarray.core.common import DataWithCoords -from xarray.core.dataarray import DataArray -from xarray.core.dataset import Dataset -from xarray.core.ops import NAN_CUM_METHODS, NAN_REDUCE_METHODS, REDUCE_METHODS from xarray.core.variable import Variable from .mapping import map_over_subtree +from .ops import ( + DataTreeArithmeticMixin, + MappedDatasetMethodsMixin, + MappedDataWithCoords, +) from .treenode import PathType, TreeNode, _init_single_treenode """ -The structure of a populated Datatree looks roughly like this: - -DataTree("root name") -|-- DataNode("weather") -| | Variable("wind_speed") -| | Variable("pressure") -| |-- DataNode("temperature") -| | Variable("sea_surface_temperature") -| | Variable("dew_point_temperature") -|-- DataNode("satellite image") -| | Variable("true_colour") -| |-- DataNode("infrared") -| | Variable("near_infrared") -| | Variable("far_infrared") -|-- DataNode("topography") -| |-- DataNode("elevation") -| | Variable("height_above_sea_level") -|-- DataNode("population") - - DEVELOPERS' NOTE ---------------- The idea of this module is to create a `DataTree` class which inherits the tree structure from TreeNode, and also copies the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin -classes which each define part of `xarray.Dataset`'s extensive API. +classes (in ops.py) which each define part of `xarray.Dataset`'s extensive API. Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine to inherit unaltered (normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new @@ -56,6 +36,8 @@ class DatasetPropertiesMixin: # TODO a neater way of setting all of these? # We wouldn't need this at all if we inherited directly from Dataset... + # TODO we could also just not define these at all, and require users to call e.g. dt.ds.dims ... + @property def dims(self): if self.has_data: @@ -159,202 +141,12 @@ def imag(self): chunks.__doc__ = Dataset.chunks.__doc__ -_MAPPED_DOCSTRING_ADDENDUM = textwrap.fill( - "This method was copied from xarray.Dataset, but has been altered to " - "call the method on the Datasets stored in every node of the subtree. " - "See the `map_over_subtree` function for more details.", - width=117, -) - -# TODO equals, broadcast_equals etc. -# TODO do dask-related private methods need to be exposed? 
-_DATASET_DASK_METHODS_TO_MAP = [ - "load", - "compute", - "persist", - "unify_chunks", - "chunk", - "map_blocks", -] -_DATASET_METHODS_TO_MAP = [ - "copy", - "as_numpy", - "__copy__", - "__deepcopy__", - "set_coords", - "reset_coords", - "info", - "isel", - "sel", - "head", - "tail", - "thin", - "broadcast_like", - "reindex_like", - "reindex", - "interp", - "interp_like", - "rename", - "rename_dims", - "rename_vars", - "swap_dims", - "expand_dims", - "set_index", - "reset_index", - "reorder_levels", - "stack", - "unstack", - "update", - "merge", - "drop_vars", - "drop_sel", - "drop_isel", - "drop_dims", - "transpose", - "dropna", - "fillna", - "interpolate_na", - "ffill", - "bfill", - "combine_first", - "reduce", - "map", - "assign", - "diff", - "shift", - "roll", - "sortby", - "quantile", - "rank", - "differentiate", - "integrate", - "cumulative_integrate", - "filter_by_attrs", - "polyfit", - "pad", - "idxmin", - "idxmax", - "argmin", - "argmax", - "query", - "curvefit", -] -# TODO unsure if these are called by external functions or not? -_DATASET_OPS_TO_MAP = ["_unary_op", "_binary_op", "_inplace_binary_op"] -_ALL_DATASET_METHODS_TO_MAP = ( - _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + _DATASET_OPS_TO_MAP -) - -_DATA_WITH_COORDS_METHODS_TO_MAP = [ - "squeeze", - "clip", - "assign_coords", - "where", - "close", - "isnull", - "notnull", - "isin", - "astype", -] - -# TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... -_ARITHMETIC_METHODS_TO_MAP = ( - REDUCE_METHODS + NAN_REDUCE_METHODS + NAN_CUM_METHODS + ["__array_ufunc__"] -) - - -def _wrap_then_attach_to_cls( - target_cls_dict, source_cls, methods_to_set, wrap_func=None -): - """ - Attach given methods on a class, and optionally wrap each method first. (i.e. with map_over_subtree) - - Result is like having written this in the classes' definition: - ``` - @wrap_func - def method_name(self, *args, **kwargs): - return self.method(*args, **kwargs) - ``` - - Every method attached here needs to have a return value of Dataset or DataArray in order to construct a new tree. - - Parameters - ---------- - target_cls_dict : MappingProxy - The __dict__ attribute of the class which we want the methods to be added to. (The __dict__ attribute can also - be accessed by calling vars() from within that classes' definition.) This will be updated by this function. - source_cls : class - Class object from which we want to copy methods (and optionally wrap them). Should be the actual class object - (or instance), not just the __dict__. - methods_to_set : Iterable[Tuple[str, callable]] - The method names and definitions supplied as a list of (method_name_string, method) pairs. - This format matches the output of inspect.getmembers(). - wrap_func : callable, optional - Function to decorate each method with. Must have the same return type as the method. 
- """ - for method_name in methods_to_set: - orig_method = getattr(source_cls, method_name) - wrapped_method = ( - wrap_func(orig_method) if wrap_func is not None else orig_method - ) - target_cls_dict[method_name] = wrapped_method - - if wrap_func is map_over_subtree: - # Add a paragraph to the method's docstring explaining how it's been mapped - orig_method_docstring = orig_method.__doc__ - if orig_method_docstring is not None: - if "\n" in orig_method_docstring: - new_method_docstring = orig_method_docstring.replace( - "\n", _MAPPED_DOCSTRING_ADDENDUM, 1 - ) - else: - new_method_docstring = ( - orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" - ) - setattr(target_cls_dict[method_name], "__doc__", new_method_docstring) - - -class MappedDatasetMethodsMixin: - """ - Mixin to add Dataset methods like .mean(), but wrapped to map over all nodes in the subtree. - """ - - __slots__ = () - _wrap_then_attach_to_cls( - vars(), Dataset, _ALL_DATASET_METHODS_TO_MAP, wrap_func=map_over_subtree - ) - - -class MappedDataWithCoords(DataWithCoords): - # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample - # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes - _wrap_then_attach_to_cls( - vars(), - DataWithCoords, - _DATA_WITH_COORDS_METHODS_TO_MAP, - wrap_func=map_over_subtree, - ) - - -class DataTreeArithmetic(DatasetArithmetic): - """ - Mixin to add Dataset methods like __add__ and .mean(). - """ - - _wrap_then_attach_to_cls( - vars(), - DatasetArithmetic, - _ARITHMETIC_METHODS_TO_MAP, - wrap_func=map_over_subtree, - ) - - class DataTree( TreeNode, DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, - DataTreeArithmetic, + DataTreeArithmeticMixin, ): """ A tree-like hierarchical collection of xarray objects. @@ -858,7 +650,7 @@ def to_netcdf( def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): """ - Write datatree contents to a netCDF file. + Write datatree contents to a Zarr store. Parameters --------- @@ -875,7 +667,7 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): ``{"root/set1": {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}, ...}``. See ``xarray.Dataset.to_zarr`` for available options. kwargs : - Addional keyword arguments to be passed to ``xarray.Dataset.to_zarr`` + Additional keyword arguments to be passed to ``xarray.Dataset.to_zarr`` """ from .io import _datatree_to_zarr diff --git a/xarray/datatree_/datatree/ops.py b/xarray/datatree_/datatree/ops.py new file mode 100644 index 00000000000..e411c973c99 --- /dev/null +++ b/xarray/datatree_/datatree/ops.py @@ -0,0 +1,268 @@ +import textwrap + +from xarray import Dataset + +from .mapping import map_over_subtree + +""" +Module which specifies the subset of xarray.Dataset's API which we wish to copy onto DataTree. + +Structured to mirror the way xarray defines Dataset's various operations internally, but does not actually import from +xarray's internals directly, only the public-facing xarray.Dataset class. +""" + + +_MAPPED_DOCSTRING_ADDENDUM = textwrap.fill( + "This method was copied from xarray.Dataset, but has been altered to " + "call the method on the Datasets stored in every node of the subtree. " + "See the `map_over_subtree` function for more details.", + width=117, +) + +# TODO equals, broadcast_equals etc. +# TODO do dask-related private methods need to be exposed? 
+_DATASET_DASK_METHODS_TO_MAP = [ + "load", + "compute", + "persist", + "unify_chunks", + "chunk", + "map_blocks", +] +_DATASET_METHODS_TO_MAP = [ + "copy", + "as_numpy", + "__copy__", + "__deepcopy__", + "set_coords", + "reset_coords", + "info", + "isel", + "sel", + "head", + "tail", + "thin", + "broadcast_like", + "reindex_like", + "reindex", + "interp", + "interp_like", + "rename", + "rename_dims", + "rename_vars", + "swap_dims", + "expand_dims", + "set_index", + "reset_index", + "reorder_levels", + "stack", + "unstack", + "update", + "merge", + "drop_vars", + "drop_sel", + "drop_isel", + "drop_dims", + "transpose", + "dropna", + "fillna", + "interpolate_na", + "ffill", + "bfill", + "combine_first", + "reduce", + "map", + "assign", + "diff", + "shift", + "roll", + "sortby", + "quantile", + "rank", + "differentiate", + "integrate", + "cumulative_integrate", + "filter_by_attrs", + "polyfit", + "pad", + "idxmin", + "idxmax", + "argmin", + "argmax", + "query", + "curvefit", +] +_ALL_DATASET_METHODS_TO_MAP = _DATASET_DASK_METHODS_TO_MAP + _DATASET_METHODS_TO_MAP + +_DATA_WITH_COORDS_METHODS_TO_MAP = [ + "squeeze", + "clip", + "assign_coords", + "where", + "close", + "isnull", + "notnull", + "isin", + "astype", +] + +REDUCE_METHODS = ["all", "any"] +NAN_REDUCE_METHODS = [ + "max", + "min", + "mean", + "prod", + "sum", + "std", + "var", + "median", +] +NAN_CUM_METHODS = ["cumsum", "cumprod"] +_TYPED_DATASET_OPS_TO_MAP = [ + "__add__", + "__sub__", + "__mul__", + "__pow__", + "__truediv__", + "__floordiv__", + "__mod__", + "__and__", + "__xor__", + "__or__", + "__lt__", + "__le__", + "__gt__", + "__ge__", + "__eq__", + "__ne__", + "__radd__", + "__rsub__", + "__rmul__", + "__rpow__", + "__rtruediv__", + "__rfloordiv__", + "__rmod__", + "__rand__", + "__rxor__", + "__ror__", + "__iadd__", + "__isub__", + "__imul__", + "__ipow__", + "__itruediv__", + "__ifloordiv__", + "__imod__", + "__iand__", + "__ixor__", + "__ior__", + "__neg__", + "__pos__", + "__abs__", + "__invert__", + "round", + "argsort", + "conj", + "conjugate", +] +# TODO NUM_BINARY_OPS apparently aren't defined on DatasetArithmetic, and don't appear to be injected anywhere... +_ARITHMETIC_METHODS_TO_MAP = ( + REDUCE_METHODS + + NAN_REDUCE_METHODS + + NAN_CUM_METHODS + + _TYPED_DATASET_OPS_TO_MAP + + ["__array_ufunc__"] +) + + +def _wrap_then_attach_to_cls( + target_cls_dict, source_cls, methods_to_set, wrap_func=None +): + """ + Attach given methods on a class, and optionally wrap each method first. (i.e. with map_over_subtree) + + Result is like having written this in the classes' definition: + ``` + @wrap_func + def method_name(self, *args, **kwargs): + return self.method(*args, **kwargs) + ``` + + Every method attached here needs to have a return value of Dataset or DataArray in order to construct a new tree. + + Parameters + ---------- + target_cls_dict : MappingProxy + The __dict__ attribute of the class which we want the methods to be added to. (The __dict__ attribute can also + be accessed by calling vars() from within that classes' definition.) This will be updated by this function. + source_cls : class + Class object from which we want to copy methods (and optionally wrap them). Should be the actual class object + (or instance), not just the __dict__. + methods_to_set : Iterable[Tuple[str, callable]] + The method names and definitions supplied as a list of (method_name_string, method) pairs. + This format matches the output of inspect.getmembers(). + wrap_func : callable, optional + Function to decorate each method with. 
Must have the same return type as the method. + """ + for method_name in methods_to_set: + orig_method = getattr(source_cls, method_name) + wrapped_method = ( + wrap_func(orig_method) if wrap_func is not None else orig_method + ) + target_cls_dict[method_name] = wrapped_method + + if wrap_func is map_over_subtree: + # Add a paragraph to the method's docstring explaining how it's been mapped + orig_method_docstring = orig_method.__doc__ + if orig_method_docstring is not None: + if "\n" in orig_method_docstring: + new_method_docstring = orig_method_docstring.replace( + "\n", _MAPPED_DOCSTRING_ADDENDUM, 1 + ) + else: + new_method_docstring = ( + orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" + ) + setattr(target_cls_dict[method_name], "__doc__", new_method_docstring) + + +class MappedDatasetMethodsMixin: + """ + Mixin to add methods defined specifically on the Dataset class such as .query(), but wrapped to map over all nodes + in the subtree. + """ + + _wrap_then_attach_to_cls( + target_cls_dict=vars(), + source_cls=Dataset, + methods_to_set=_ALL_DATASET_METHODS_TO_MAP, + wrap_func=map_over_subtree, + ) + + +class MappedDataWithCoords: + """ + Mixin to add coordinate-aware Dataset methods such as .where(), but wrapped to map over all nodes in the subtree. + """ + + # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample + # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes + _wrap_then_attach_to_cls( + target_cls_dict=vars(), + source_cls=Dataset, + methods_to_set=_DATA_WITH_COORDS_METHODS_TO_MAP, + wrap_func=map_over_subtree, + ) + + +class DataTreeArithmeticMixin: + """ + Mixin to add Dataset arithmetic operations such as __add__, reduction methods such as .mean(), and enable numpy + ufuncs such as np.sin(), but wrapped to map over all nodes in the subtree. 
+ """ + + _wrap_then_attach_to_cls( + target_cls_dict=vars(), + source_cls=Dataset, + methods_to_set=_ARITHMETIC_METHODS_TO_MAP, + wrap_func=map_over_subtree, + ) diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index e930f49fcf3..2c6a4d528fc 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -5,6 +5,8 @@ from datatree import DataNode +from .test_datatree import assert_tree_equal, create_test_datatree + class TestDSProperties: def test_properties(self): @@ -88,8 +90,36 @@ def test_cum_method(self): class TestOps: - @pytest.mark.xfail - def test_binary_op(self): + def test_binary_op_on_int(self): + ds1 = xr.Dataset({"a": [5], "b": [3]}) + ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) + + expected_root = DataNode("root", data=ds1 * 5) + expected_descendant = DataNode("subnode", data=ds2 * 5, parent=expected_root) + result = dt * 5 + + assert_equal(result.ds, expected_root.ds) + assert_equal(result["subnode"].ds, expected_descendant.ds) + + def test_binary_op_on_dataset(self): + ds1 = xr.Dataset({"a": [5], "b": [3]}) + ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) + other_ds = xr.Dataset({"z": ("z", [0.1, 0.2])}) + + expected_root = DataNode("root", data=ds1 * other_ds) + expected_descendant = DataNode( + "subnode", data=ds2 * other_ds, parent=expected_root + ) + result = dt * other_ds + + assert_equal(result.ds, expected_root.ds) + assert_equal(result["subnode"].ds, expected_descendant.ds) + + def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) dt = DataNode("root", data=ds1) @@ -103,7 +133,6 @@ def test_binary_op(self): assert_equal(result["subnode"].ds, expected_descendant.ds) -@pytest.mark.xfail class TestUFuncs: def test_root(self): da = xr.DataArray(name="a", data=[1, 2, 3]) @@ -119,3 +148,9 @@ def test_descendants(self): expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt)["results"].ds assert_equal(result_ds, expected_ds) + + def test_tree(self): + dt = create_test_datatree() + expected = create_test_datatree(modify=lambda ds: np.sin(ds)) + result_tree = np.sin(dt) + assert_tree_equal(result_tree, expected) From f71c69b6b0810216bfa67485737891d655e9b9a2 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 2 Sep 2021 16:23:41 -0400 Subject: [PATCH 061/260] Unexpose dataset properties https://github.com/xarray-contrib/datatree/pull/37 * remove DatasetPropertiesMixin * remove tests --- xarray/datatree_/datatree/datatree.py | 122 +----------------- .../datatree/tests/test_dataset_api.py | 28 ---- 2 files changed, 6 insertions(+), 144 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 76fc1bf96be..35830fa9c0d 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -30,120 +30,8 @@ """ -class DatasetPropertiesMixin: - """Expose properties of wrapped Dataset""" - - # TODO a neater way of setting all of these? - # We wouldn't need this at all if we inherited directly from Dataset... - - # TODO we could also just not define these at all, and require users to call e.g. dt.ds.dims ... 
- - @property - def dims(self): - if self.has_data: - return self.ds.dims - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def variables(self): - if self.has_data: - return self.ds.variables - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def encoding(self): - if self.has_data: - return self.ds.encoding - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def sizes(self): - if self.has_data: - return self.ds.sizes - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def attrs(self): - if self.has_data: - return self.ds.attrs - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def nbytes(self) -> int: - return sum(node.ds.nbytes for node in self.subtree) - - @property - def indexes(self): - if self.has_data: - return self.ds.indexes - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def xindexes(self): - if self.has_data: - return self.ds.xindexes - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def coords(self): - if self.has_data: - return self.ds.coords - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def data_vars(self): - if self.has_data: - return self.ds.data_vars - else: - raise AttributeError("property is not defined for a node with no data") - - # TODO should this instead somehow give info about the chunking of every node? - @property - def chunks(self): - if self.has_data: - return self.ds.chunks - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def real(self): - if self.has_data: - return self.ds.real - else: - raise AttributeError("property is not defined for a node with no data") - - @property - def imag(self): - if self.has_data: - return self.ds.imag - else: - raise AttributeError("property is not defined for a node with no data") - - # TODO .loc, __contains__, __iter__, __array__, '__len__', - - dims.__doc__ = Dataset.dims.__doc__ - variables.__doc__ = Dataset.variables.__doc__ - encoding.__doc__ = Dataset.encoding.__doc__ - sizes.__doc__ = Dataset.sizes.__doc__ - attrs.__doc__ = Dataset.attrs.__doc__ - indexes.__doc__ = Dataset.indexes.__doc__ - xindexes.__doc__ = Dataset.xindexes.__doc__ - coords.__doc__ = Dataset.coords.__doc__ - data_vars.__doc__ = Dataset.data_vars.__doc__ - chunks.__doc__ = Dataset.chunks.__doc__ - - class DataTree( TreeNode, - DatasetPropertiesMixin, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmeticMixin, @@ -173,8 +61,6 @@ class DataTree( # TODO should this instead be a subclass of Dataset? - # TODO Add attrs dict - # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?) # TODO ipython autocomplete for child nodes @@ -185,14 +71,14 @@ class DataTree( # TODO do we need a watch out for if methods intended only for root nodes are called on non-root nodes? - # TODO add any other properties (maybe dask ones?) - # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? 
# TODO dataset methods which should not or cannot act over the whole tree, such as .to_array # TODO del and delitem methods + # TODO .loc, __contains__, __iter__, __array__, __len__ + def __init__( self, data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, @@ -480,6 +366,10 @@ def __setitem__( f"not {type(value)}" ) + @property + def nbytes(self) -> int: + return sum(node.ds.nbytes if node.has_data else 0 for node in self.subtree) + def map_over_subtree( self, func: Callable, diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 2c6a4d528fc..f3276aa886b 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -1,5 +1,4 @@ import numpy as np -import pytest import xarray as xr from xarray.testing import assert_equal @@ -8,33 +7,6 @@ from .test_datatree import assert_tree_equal, create_test_datatree -class TestDSProperties: - def test_properties(self): - da_a = xr.DataArray(name="a", data=[0, 2], dims=["x"]) - da_b = xr.DataArray(name="b", data=[5, 6, 7], dims=["y"]) - ds = xr.Dataset({"a": da_a, "b": da_b}) - dt = DataNode("root", data=ds) - - assert dt.attrs == dt.ds.attrs - assert dt.encoding == dt.ds.encoding - assert dt.dims == dt.ds.dims - assert dt.sizes == dt.ds.sizes - assert dt.variables == dt.ds.variables - - def test_no_data_no_properties(self): - dt = DataNode("root", data=None) - with pytest.raises(AttributeError): - dt.attrs - with pytest.raises(AttributeError): - dt.encoding - with pytest.raises(AttributeError): - dt.dims - with pytest.raises(AttributeError): - dt.sizes - with pytest.raises(AttributeError): - dt.variables - - class TestDSMethodInheritance: def test_dataset_method(self): # test root From bef183af3e39925e2fb95ae11d380750492b3608 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Tue, 7 Sep 2021 15:24:28 -0400 Subject: [PATCH 062/260] Name collisions https://github.com/xarray-contrib/datatree/pull/40 * tests for name collisions between variables and children * pass 2/3 tests --- xarray/datatree_/datatree/datatree.py | 30 +++++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 28 +++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 35830fa9c0d..8ccd6c8bd37 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -125,6 +125,12 @@ def ds(self, data: Union[Dataset, DataArray] = None): ) if isinstance(data, DataArray): data = data.to_dataset() + if data is not None: + for var in list(data.variables): + if var in list(c.name for c in self.children): + raise KeyError( + f"Cannot add variable named {var}: node already has a child named {var}" + ) self._ds = data @property @@ -165,6 +171,30 @@ def _init_single_datatree_node( obj.ds = data return obj + def _pre_attach(self, parent: TreeNode) -> None: + """ + Method which superclass calls before setting parent, here used to prevent having two + children with duplicate names (or a data variable with the same name as a child). + """ + super()._pre_attach(parent) + if parent.has_data and self.name in list(parent.ds.variables): + raise KeyError( + f"parent {parent.name} already contains a data variable named {self.name}" + ) + + def add_child(self, child: TreeNode) -> None: + """ + Add a single child node below this node, without replacement. 
+ + Will raise a KeyError if either a child or data variable already exists with this name. + """ + if child.name in list(c.name for c in self.children): + raise KeyError(f"Node already has a child named {child.name}") + elif self.has_data and child.name in list(self.ds.variables): + raise KeyError(f"Node already contains a data variable named {child.name}") + else: + child.parent = self + def __str__(self): """A printable representation of the structure of this entire subtree.""" renderer = anytree.RenderTree(self) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index df73109abee..4a5d64ff774 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -90,6 +90,34 @@ def test_has_data(self): assert not john.has_data +class TestVariablesChildrenNameCollisions: + def test_parent_already_has_variable_with_childs_name(self): + dt = DataNode("root", data=xr.Dataset({"a": [0], "b": 1})) + with pytest.raises(KeyError, match="already contains a data variable named a"): + DataNode("a", data=None, parent=dt) + + with pytest.raises(KeyError, match="already contains a data variable named a"): + dt.add_child(DataNode("a", data=None)) + + def test_assign_when_already_child_with_variables_name(self): + dt = DataNode("root", data=None) + DataNode("a", data=None, parent=dt) + with pytest.raises(KeyError, match="already has a child named a"): + dt.ds = xr.Dataset({"a": 0}) + + dt.ds = xr.Dataset() + with pytest.raises(KeyError, match="already has a child named a"): + dt.ds = dt.ds.assign(a=xr.DataArray(0)) + + @pytest.mark.xfail + def test_update_when_already_child_with_variables_name(self): + # See issue https://github.com/xarray-contrib/datatree/issues/38 + dt = DataNode("root", data=None) + DataNode("a", data=None, parent=dt) + with pytest.raises(KeyError, match="already has a child named a"): + dt.ds["a"] = xr.DataArray(0) + + class TestGetItems: def test_get_node(self): folder1 = DataNode("folder1") From 2de7650cc1149bd03969607cede0e53e5d808951 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 15 Sep 2021 12:03:51 -0400 Subject: [PATCH 063/260] fix minor bug when creating a DataNode with children from scratch --- xarray/datatree_/datatree/datatree.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 8ccd6c8bd37..ba914b95a87 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -90,7 +90,7 @@ def __init__( root_data = data_objects.pop(name, None) else: root_data = None - self.ds = root_data + self._ds = root_data if data_objects: # Populate tree with children determined from data_objects mapping @@ -167,8 +167,9 @@ def _init_single_datatree_node( # This approach was inspired by xarray.Dataset._construct_direct() obj = object.__new__(cls) + obj._ds = None obj = _init_single_treenode(obj, name=name, parent=parent, children=children) - obj.ds = data + obj._ds = data return obj def _pre_attach(self, parent: TreeNode) -> None: From 79fef600f362742d209b8ea3cb4dc86eb40e5640 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 15 Sep 2021 12:14:04 -0400 Subject: [PATCH 064/260] fix bug the last commit introduced... 
--- xarray/datatree_/datatree/datatree.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ba914b95a87..fbcec02bb2c 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -169,7 +169,7 @@ def _init_single_datatree_node( obj = object.__new__(cls) obj._ds = None obj = _init_single_treenode(obj, name=name, parent=parent, children=children) - obj._ds = data + obj.ds = data return obj def _pre_attach(self, parent: TreeNode) -> None: From a9ed61dc72b86474df2c187070e819daf3b3e72c Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 8 Nov 2021 12:33:45 -0500 Subject: [PATCH 065/260] Bump actions/checkout from 2.3.4 to 2.4.0 https://github.com/xarray-contrib/datatree/pull/43 Bumps [actions/checkout](https://github.com/actions/checkout) from 2.3.4 to 2.4.0. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v2.3.4...v2.4.0) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 6 +++--- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 6d3e7dbeb05..6fd951e040f 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -12,7 +12,7 @@ jobs: lint: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2.3.4 + - uses: actions/checkout@v2.4.0 - uses: actions/setup-python@v2.2.2 - uses: pre-commit/action@v2.0.3 @@ -23,7 +23,7 @@ jobs: matrix: python-version: [3.7, 3.8, 3.9] steps: - - uses: actions/checkout@v2.3.4 + - uses: actions/checkout@v2.4.0 - uses: conda-incubator/setup-miniconda@v2 with: mamba-version: "*" @@ -61,7 +61,7 @@ jobs: matrix: python-version: [3.8, 3.9] steps: - - uses: actions/checkout@v2.3.4 + - uses: actions/checkout@v2.4.0 - uses: conda-incubator/setup-miniconda@v2 with: mamba-version: "*" diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 986315cae97..abdd87c2338 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -8,7 +8,7 @@ jobs: deploy: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2.3.4 + - uses: actions/checkout@v2.4.0 - name: Set up Python uses: actions/setup-python@v2.2.1 with: From 4ad548548fe83aa0ed3576506c653076d862cb2f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 8 Nov 2021 14:57:00 -0500 Subject: [PATCH 066/260] swapped original init for a .from_dict method --- xarray/datatree_/datatree/__init__.py | 2 +- xarray/datatree_/datatree/datatree.py | 145 +++++++++++++------------- xarray/datatree_/datatree/io.py | 4 +- xarray/datatree_/datatree/mapping.py | 4 +- xarray/datatree_/datatree/treenode.py | 20 ++-- 5 files changed, 86 insertions(+), 89 deletions(-) diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index fbe1cba7860..7cd8ce5cd32 100644 --- 
a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,5 +1,5 @@ # flake8: noqa # Ignoring F401: imported but unused -from .datatree import DataNode, DataTree +from .datatree import DataTree from .io import open_datatree from .mapping import map_over_subtree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index fbcec02bb2c..25c9437a572 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -14,7 +14,7 @@ MappedDatasetMethodsMixin, MappedDataWithCoords, ) -from .treenode import PathType, TreeNode, _init_single_treenode +from .treenode import PathType, TreeNode """ DEVELOPERS' NOTE @@ -39,24 +39,7 @@ class DataTree( """ A tree-like hierarchical collection of xarray objects. - Attempts to present the API of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. - - Parameters - ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree {name} as the path. - name : Hashable, optional - Name for the root node of the tree. Default is "root" - - See also - -------- - DataNode : Shortcut to create a DataTree with only a single node. + Attempts to present an API like that of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. """ # TODO should this instead be a subclass of Dataset? @@ -81,37 +64,37 @@ class DataTree( def __init__( self, - data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, name: Hashable = "root", + data: Union[Dataset, DataArray] = None, + parent: TreeNode = None, + children: List[TreeNode] = None, ): - # First create the root node - super().__init__(name=name, parent=None, children=None) - if data_objects: - root_data = data_objects.pop(name, None) - else: - root_data = None - self._ds = root_data + """ + Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. - if data_objects: - # Populate tree with children determined from data_objects mapping - for path, data in data_objects.items(): - # Determine name of new node - path = self._tuple_or_path_to_path(path) - if self.separator in path: - node_path, node_name = path.rsplit(self.separator, maxsplit=1) - else: - node_path, node_name = "/", path + Parameters + ---------- + name : Hashable + Name for the root node of the tree. Default is "root" + data : Dataset, DataArray, Variable or None, optional + Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. + Default is None. + parent : TreeNode, optional + Parent node to this node. Default is None. + children : Sequence[TreeNode], optional + Any child nodes of this node. Default is None. 
- relative_path = node_path.replace(self.name, "") + Returns + ------- + node : DataTree - # Create and set new node - new_node = DataNode(name=node_name, data=data) - self.set_node( - relative_path, - new_node, - allow_overwrite=False, - new_nodes_along_path=True, - ) + See Also + -------- + DataTree.from_dict + """ + + super().__init__(name, parent=parent, children=children) + self.ds = data @property def ds(self) -> Dataset: @@ -138,38 +121,59 @@ def has_data(self): return self.ds is not None @classmethod - def _init_single_datatree_node( + def from_dict( cls, - name: Hashable, - data: Union[Dataset, DataArray] = None, - parent: TreeNode = None, - children: List[TreeNode] = None, + data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, + name: Hashable = "root", ): """ - Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. + Create a datatree from a dictionary of data objects, labelled by paths into the tree. Parameters ---------- - name : Hashable + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or DataTree objects. + + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree use {name} as the path. + name : Hashable, optional Name for the root node of the tree. Default is "root" - data : Dataset, DataArray, Variable or None, optional - Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. - Default is None. - parent : TreeNode, optional - Parent node to this node. Default is None. - children : Sequence[TreeNode], optional - Any child nodes of this node. Default is None. 
Returns ------- - node : DataTree + DataTree """ - # This approach was inspired by xarray.Dataset._construct_direct() - obj = object.__new__(cls) - obj._ds = None - obj = _init_single_treenode(obj, name=name, parent=parent, children=children) - obj.ds = data + # First create the root node + if data_objects: + root_data = data_objects.pop(name, None) + else: + root_data = None + obj = cls(name=name, data=root_data, parent=None, children=None) + + if data_objects: + # Populate tree with children determined from data_objects mapping + for path, data in data_objects.items(): + # Determine name of new node + path = obj._tuple_or_path_to_path(path) + if obj.separator in path: + node_path, node_name = path.rsplit(obj.separator, maxsplit=1) + else: + node_path, node_name = "/", path + + relative_path = node_path.replace(obj.name, "") + + # Create and set new node + new_node = cls(name=node_name, data=data) + obj.set_node( + relative_path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) return obj def _pre_attach(self, parent: TreeNode) -> None: @@ -219,7 +223,7 @@ def __str__(self): def _single_node_repr(self): """Information about this node, not including its relationships to other nodes.""" - node_info = f"DataNode('{self.name}')" + node_info = f"DataTree('{self.name}')" if self.has_data: ds_info = "\n" + repr(self.ds) @@ -231,7 +235,7 @@ def __repr__(self): """Information about this node, including its relationships to other nodes.""" # TODO redo this to look like the Dataset repr, but just with child and parent info parent = self.parent.name if self.parent is not None else "None" - node_str = f"DataNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," + node_str = f"DataTree(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," if self.has_data: ds_repr_lines = self.ds.__repr__().splitlines() @@ -387,7 +391,7 @@ def __setitem__( else: # if nothing there then make new node based on type of object if isinstance(value, (Dataset, DataArray, Variable)) or value is None: - new_node = DataNode(name=last_tag, data=value) + new_node = DataTree(name=last_tag, data=value) self.set_node(path=path_tags, node=new_node) elif isinstance(value, TreeNode): self.set_node(path=path, node=value) @@ -467,7 +471,7 @@ def map_over_subtree_inplace( def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): - print(f"{pre}DataNode('{self.name}')") + print(f"{pre}DataTree('{self.name}')") for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") @@ -602,6 +606,3 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): def plot(self): raise NotImplementedError - - -DataNode = DataTree._init_single_datatree_node diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index f7bdf570a04..533ded2b163 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -73,7 +73,7 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: with ncDataset(filename, mode="r") as ncds: ds = open_dataset(filename, **kwargs).pipe(_ds_or_none) - tree_root = DataTree(data_objects={"root": ds}) + tree_root = DataTree.from_dict(data_objects={"root": ds}) for key in _iter_nc_groups(ncds): tree_root[key] = open_dataset(filename, group=key, **kwargs).pipe( _ds_or_none @@ -86,7 +86,7 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: with zarr.open_group(store, mode="r") as zds: ds = open_dataset(store, 
engine="zarr", **kwargs).pipe(_ds_or_none) - tree_root = DataTree(data_objects={"root": ds}) + tree_root = DataTree.from_dict(data_objects={"root": ds}) for key in _iter_zarr_groups(zds): try: tree_root[key] = open_dataset( diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 94b17ac04bd..dc0fb913f15 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -208,7 +208,9 @@ def _map_over_subtree(*args, **kwargs): output_node_data = None out_tree_contents[p] = output_node_data - new_tree = DataTree(name=first_tree.name, data_objects=out_tree_contents) + new_tree = DataTree.from_dict( + name=first_tree.name, data_objects=out_tree_contents + ) result_trees.append(new_tree) # If only one result then don't wrap it in a tuple diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 276577e7fc3..463c68847a7 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -7,18 +7,6 @@ PathType = Union[Hashable, Sequence[Hashable]] -def _init_single_treenode(obj, name, parent, children): - if not isinstance(name, str) or "/" in name: - raise ValueError(f"invalid name {name}") - obj.name = name - - obj.parent = parent - if children: - obj.children = children - - return obj - - class TreeNode(anytree.NodeMixin): """ Base class representing a node of a tree, with methods for traversing and altering the tree. @@ -49,7 +37,13 @@ def __init__( parent: TreeNode = None, children: Iterable[TreeNode] = None, ): - _init_single_treenode(self, name=name, parent=parent, children=children) + if not isinstance(name, str) or "/" in name: + raise ValueError(f"invalid name {name}") + self.name = name + + self.parent = parent + if children: + self.children = children def __str__(self): """A printable representation of the structure of this entire subtree.""" From 997a21843e4270f2ccc704c5d02ecabfeafb25da Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 8 Nov 2021 14:57:41 -0500 Subject: [PATCH 067/260] updated tests --- .../datatree/tests/test_dataset_api.py | 48 +++---- .../datatree_/datatree/tests/test_datatree.py | 118 +++++++++--------- .../datatree_/datatree/tests/test_mapping.py | 28 ++--- 3 files changed, 97 insertions(+), 97 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index f3276aa886b..a7284ec25eb 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -2,7 +2,7 @@ import xarray as xr from xarray.testing import assert_equal -from datatree import DataNode +from datatree import DataTree from .test_datatree import assert_tree_equal, create_test_datatree @@ -11,52 +11,52 @@ class TestDSMethodInheritance: def test_dataset_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1).ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.isel(x=1)["results"].ds assert_equal(result_ds, expected_ds) def test_reduce_method(self): # test root da = xr.DataArray(name="a", data=[False, True, False], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().any() result_ds = dt.any().ds assert_equal(result_ds, 
expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.any()["results"].ds assert_equal(result_ds, expected_ds) def test_nan_reduce_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().mean() result_ds = dt.mean().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.mean()["results"].ds assert_equal(result_ds, expected_ds) def test_cum_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().cumsum() result_ds = dt.cumsum().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.cumsum()["results"].ds assert_equal(result_ds, expected_ds) @@ -65,11 +65,11 @@ class TestOps: def test_binary_op_on_int(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) - expected_root = DataNode("root", data=ds1 * 5) - expected_descendant = DataNode("subnode", data=ds2 * 5, parent=expected_root) + expected_root = DataTree("root", data=ds1 * 5) + expected_descendant = DataTree("subnode", data=ds2 * 5, parent=expected_root) result = dt * 5 assert_equal(result.ds, expected_root.ds) @@ -78,12 +78,12 @@ def test_binary_op_on_int(self): def test_binary_op_on_dataset(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) other_ds = xr.Dataset({"z": ("z", [0.1, 0.2])}) - expected_root = DataNode("root", data=ds1 * other_ds) - expected_descendant = DataNode( + expected_root = DataTree("root", data=ds1 * other_ds) + expected_descendant = DataTree( "subnode", data=ds2 * other_ds, parent=expected_root ) result = dt * other_ds @@ -94,11 +94,11 @@ def test_binary_op_on_dataset(self): def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) - expected_root = DataNode("root", data=ds1 * ds1) - expected_descendant = DataNode("subnode", data=ds2 * ds2, parent=expected_root) + expected_root = DataTree("root", data=ds1 * ds1) + expected_descendant = DataTree("subnode", data=ds2 * ds2, parent=expected_root) result = dt * dt assert_equal(result.ds, expected_root.ds) @@ -108,15 +108,15 @@ def test_binary_op_on_datatree(self): class TestUFuncs: def test_root(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt).ds assert_equal(result_ds, expected_ds) def test_descendants(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataNode("root") - DataNode("results", parent=dt, data=da) + dt = DataTree("root") + DataTree("results", parent=dt, data=da) expected_ds = np.sin(da.to_dataset()) 
result_ds = np.sin(dt)["results"].ds assert_equal(result_ds, expected_ds) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4a5d64ff774..7d1569876bd 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -3,7 +3,7 @@ from anytree.resolver import ChildResolverError from xarray.testing import assert_identical -from datatree import DataNode, DataTree +from datatree import DataTree from datatree.io import open_datatree from datatree.tests import requires_netCDF4, requires_zarr @@ -55,27 +55,27 @@ def create_test_datatree(modify=lambda ds: ds): root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) # Avoid using __init__ so we can independently test it - root = DataNode(name="root", data=root_data) - set1 = DataNode(name="set1", parent=root, data=set1_data) - DataNode(name="set1", parent=set1) - DataNode(name="set2", parent=set1) - set2 = DataNode(name="set2", parent=root, data=set2_data) - DataNode(name="set1", parent=set2) - DataNode(name="set3", parent=root) + root = DataTree(name="root", data=root_data) + set1 = DataTree(name="set1", parent=root, data=set1_data) + DataTree(name="set1", parent=set1) + DataTree(name="set2", parent=set1) + set2 = DataTree(name="set2", parent=root, data=set2_data) + DataTree(name="set1", parent=set2) + DataTree(name="set3", parent=root) return root class TestStoreDatasets: - def test_create_datanode(self): + def test_create_DataTree(self): dat = xr.Dataset({"a": 0}) - john = DataNode("john", data=dat) + john = DataTree("john", data=dat) assert john.ds is dat with pytest.raises(TypeError): - DataNode("mary", parent=john, data="junk") + DataTree("mary", parent=john, data="junk") def test_set_data(self): - john = DataNode("john") + john = DataTree("john") dat = xr.Dataset({"a": 0}) john.ds = dat assert john.ds is dat @@ -83,25 +83,25 @@ def test_set_data(self): john.ds = "junk" def test_has_data(self): - john = DataNode("john", data=xr.Dataset({"a": 0})) + john = DataTree("john", data=xr.Dataset({"a": 0})) assert john.has_data - john = DataNode("john", data=None) + john = DataTree("john", data=None) assert not john.has_data class TestVariablesChildrenNameCollisions: def test_parent_already_has_variable_with_childs_name(self): - dt = DataNode("root", data=xr.Dataset({"a": [0], "b": 1})) + dt = DataTree("root", data=xr.Dataset({"a": [0], "b": 1})) with pytest.raises(KeyError, match="already contains a data variable named a"): - DataNode("a", data=None, parent=dt) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already contains a data variable named a"): - dt.add_child(DataNode("a", data=None)) + dt.add_child(DataTree("a", data=None)) def test_assign_when_already_child_with_variables_name(self): - dt = DataNode("root", data=None) - DataNode("a", data=None, parent=dt) + dt = DataTree("root", data=None) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds = xr.Dataset({"a": 0}) @@ -112,82 +112,82 @@ def test_assign_when_already_child_with_variables_name(self): @pytest.mark.xfail def test_update_when_already_child_with_variables_name(self): # See issue https://github.com/xarray-contrib/datatree/issues/38 - dt = DataNode("root", data=None) - DataNode("a", data=None, parent=dt) + dt = DataTree("root", data=None) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds["a"] = 
xr.DataArray(0) class TestGetItems: def test_get_node(self): - folder1 = DataNode("folder1") - results = DataNode("results", parent=folder1) - highres = DataNode("highres", parent=results) + folder1 = DataTree("folder1") + results = DataTree("results", parent=folder1) + highres = DataTree("highres", parent=results) assert folder1["results"] is results assert folder1["results/highres"] is highres assert folder1[("results", "highres")] is highres def test_get_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results["temp"], data["temp"]) def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataNode("folder1") - results = DataNode("results", parent=folder1) - DataNode("highres", parent=results, data=data) + folder1 = DataTree("folder1") + results = DataTree("results", parent=folder1) + DataTree("highres", parent=results, data=data) assert_identical(folder1["results/highres/temp"], data["temp"]) assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): - folder1 = DataNode("folder1") - DataNode("results", parent=folder1) + folder1 = DataTree("folder1") + DataTree("results", parent=folder1) with pytest.raises(ChildResolverError): folder1["results/highres"] def test_get_nonexistent_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) with pytest.raises(ChildResolverError): results["pressure"] def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results[["temp", "p"]], data[["temp", "p"]]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results[{"temp": 1}], data[{"temp": 1}]) class TestSetItems: # TODO test tuple-style access too def test_set_new_child_node(self): - john = DataNode("john") - mary = DataNode("mary") + john = DataTree("john") + mary = DataTree("mary") john["/"] = mary assert john["mary"] is mary def test_set_new_grandchild_node(self): - john = DataNode("john") - DataNode("mary", parent=john) - rose = DataNode("rose") + john = DataTree("john") + DataTree("mary", parent=john) + rose = DataTree("rose") john["mary/"] = rose assert john["mary/rose"] is rose def test_set_new_empty_node(self): - john = DataNode("john") + john = DataTree("john") john["mary"] = None mary = john["mary"] assert isinstance(mary, DataTree) assert mary.ds is None def test_overwrite_data_in_node_with_none(self): - john = DataNode("john") - mary = DataNode("mary", parent=john, data=xr.Dataset()) + john = DataTree("john") + mary = DataTree("mary", parent=john, data=xr.Dataset()) john["mary"] = None assert mary.ds is None @@ -197,47 +197,47 @@ def test_overwrite_data_in_node_with_none(self): def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results") + results = DataTree("results") results["/"] = data assert results.ds is data def test_set_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results"] = data assert folder1["results"].ds is data def 
test_set_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results/highres"] = data assert folder1["results/highres"].ds is data def test_set_named_dataarray_as_new_node(self): data = xr.DataArray(name="temp", data=[0, 50]) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results"] = data assert_identical(folder1["results"].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") with pytest.raises(ValueError, match="unable to convert"): folder1["results"] = data def test_add_new_variable_to_empty_node(self): - results = DataNode("results") + results = DataTree("results") results["/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results.ds # What if there is a path to traverse first? - results = DataNode("results") + results = DataTree("results") results["highres/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results["highres"].ds def test_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=t) + results = DataTree("results", data=t) p = xr.DataArray(name="pressure", data=[2, 3]) results["/"] = p assert_identical(results.ds, p.to_dataset()) @@ -253,7 +253,7 @@ def test_empty(self): def test_data_in_root(self): dat = xr.Dataset() - dt = DataTree({"root": dat}) + dt = DataTree.from_dict({"root": dat}) assert dt.name == "root" assert dt.parent is None assert dt.children == () @@ -261,7 +261,7 @@ def test_data_in_root(self): def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) - dt = DataTree({"run1": dat1, "run2": dat2}) + dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) assert dt.ds is None assert dt["run1"].ds is dat1 assert dt["run1"].children == () @@ -270,7 +270,7 @@ def test_one_layer(self): def test_two_layers(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) - dt = DataTree({"highres/run": dat1, "lowres/run": dat2}) + dt = DataTree.from_dict({"highres/run": dat1, "lowres/run": dat2}) assert "highres" in [c.name for c in dt.children] assert "lowres" in [c.name for c in dt.children] highres_run = dt.get_node("highres/run") @@ -300,16 +300,16 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DataNode("root") + dt = DataTree("root") printout = dt.__str__() - assert printout == "DataNode('root')" + assert printout == "DataTree('root')" def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataNode("root", data=dat) + dt = DataTree("root", data=dat) printout = dt.__str__() expected = [ - "DataNode('root')", + "DataTree('root')", "Dimensions", "Coordinates", "a", @@ -321,8 +321,8 @@ def test_print_node_with_data(self): def test_nested_node(self): dat = xr.Dataset({"a": [0, 2]}) - root = DataNode("root") - DataNode("results", data=dat, parent=root) + root = DataTree("root") + DataTree("results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") @@ -335,7 +335,7 @@ def test_print_datatree(self): def test_repr_of_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataNode("root", data=dat) + dt = DataTree("root", data=dat) assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 00b30f57b7c..050bbbf6c9f 100644 
--- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -16,8 +16,8 @@ def test_not_a_tree(self): _check_isomorphic("s", 1) def test_different_widths(self): - dt1 = DataTree(data_objects={"a": empty}) - dt2 = DataTree(data_objects={"a": empty, "b": empty}) + dt1 = DataTree.from_dict(data_objects={"a": empty}) + dt2 = DataTree.from_dict(data_objects={"a": empty, "b": empty}) expected_err_str = ( "'root' in the first tree has 1 children, whereas its counterpart node 'root' in the " "second tree has 2 children" @@ -26,8 +26,8 @@ def test_different_widths(self): _check_isomorphic(dt1, dt2) def test_different_heights(self): - dt1 = DataTree(data_objects={"a": empty}) - dt2 = DataTree(data_objects={"a": empty, "a/b": empty}) + dt1 = DataTree.from_dict(data_objects={"a": empty}) + dt2 = DataTree.from_dict(data_objects={"a": empty, "a/b": empty}) expected_err_str = ( "'root/a' in the first tree has 0 children, whereas its counterpart node 'root/a' in the " "second tree has 1 children" @@ -36,8 +36,8 @@ def test_different_heights(self): _check_isomorphic(dt1, dt2) def test_only_one_has_data(self): - dt1 = DataTree(data_objects={"a": xr.Dataset({"a": 0})}) - dt2 = DataTree(data_objects={"a": None}) + dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset({"a": 0})}) + dt2 = DataTree.from_dict(data_objects={"a": None}) expected_err_str = ( "'root/a' in the first tree has data, whereas its counterpart node 'root/a' in the " "second tree has no data" @@ -46,8 +46,8 @@ def test_only_one_has_data(self): _check_isomorphic(dt1, dt2) def test_names_different(self): - dt1 = DataTree(data_objects={"a": xr.Dataset()}) - dt2 = DataTree(data_objects={"b": empty}) + dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) + dt2 = DataTree.from_dict(data_objects={"b": empty}) expected_err_str = ( "'root/a' in the first tree has name 'a', whereas its counterpart node 'root/b' in the " "second tree has name 'b'" @@ -56,28 +56,28 @@ def test_names_different(self): _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/d": empty, "b/c": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} ) _check_isomorphic(dt1, dt2) From fc4d0f977829293ea570b9b0f92264c72761bce3 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 8 Nov 2021 14:58:07 -0500 Subject: [PATCH 068/260] corrected options for creating datatree objects --- xarray/datatree_/README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index e3368eeaa9c..63d9bd8e0e1 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -14,7 +14,8 @@ The approach used here is based on benbovy's 
[`DatasetNode` example](https://gis drawing You can create a `DataTree` object in 3 ways: -1) Load from a netCDF file that has groups via `open_datatree()`, -2) Using the init method of `DataTree`, which accepts a nested dictionary of Datasets, -3) Manually create individual nodes with `DataNode()` and specify their relationships to each other, either by setting `.parent` and `.chlldren` attributes, or through `__get/setitem__` access, e.g. -`dt['path/to/node'] = xr.Dataset()` +1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. +2) Using the init method of `DataTree`, which creates an individual node. + You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, + or through `__get/setitem__` access, e.g. `dt['path/to/node'] = xr.Dataset()`. +3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. From 02d2d295f5c9a9eb739332a1b2f250744cb89021 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 8 Nov 2021 15:01:35 -0500 Subject: [PATCH 069/260] revert accidental push to main --- xarray/datatree_/README.md | 9 +- xarray/datatree_/datatree/__init__.py | 2 +- xarray/datatree_/datatree/datatree.py | 145 +++++++++--------- xarray/datatree_/datatree/io.py | 4 +- xarray/datatree_/datatree/mapping.py | 4 +- .../datatree/tests/test_dataset_api.py | 48 +++--- .../datatree_/datatree/tests/test_datatree.py | 118 +++++++------- .../datatree_/datatree/tests/test_mapping.py | 28 ++-- xarray/datatree_/datatree/treenode.py | 20 ++- 9 files changed, 190 insertions(+), 188 deletions(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 63d9bd8e0e1..e3368eeaa9c 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -14,8 +14,7 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis drawing You can create a `DataTree` object in 3 ways: -1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. -2) Using the init method of `DataTree`, which creates an individual node. - You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, - or through `__get/setitem__` access, e.g. `dt['path/to/node'] = xr.Dataset()`. -3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. +1) Load from a netCDF file that has groups via `open_datatree()`, +2) Using the init method of `DataTree`, which accepts a nested dictionary of Datasets, +3) Manually create individual nodes with `DataNode()` and specify their relationships to each other, either by setting `.parent` and `.chlldren` attributes, or through `__get/setitem__` access, e.g. 
+`dt['path/to/node'] = xr.Dataset()` diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 7cd8ce5cd32..fbe1cba7860 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,5 +1,5 @@ # flake8: noqa # Ignoring F401: imported but unused -from .datatree import DataTree +from .datatree import DataNode, DataTree from .io import open_datatree from .mapping import map_over_subtree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 25c9437a572..fbcec02bb2c 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -14,7 +14,7 @@ MappedDatasetMethodsMixin, MappedDataWithCoords, ) -from .treenode import PathType, TreeNode +from .treenode import PathType, TreeNode, _init_single_treenode """ DEVELOPERS' NOTE @@ -39,7 +39,24 @@ class DataTree( """ A tree-like hierarchical collection of xarray objects. - Attempts to present an API like that of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. + Attempts to present the API of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. + + Parameters + ---------- + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. + + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree {name} as the path. + name : Hashable, optional + Name for the root node of the tree. Default is "root" + + See also + -------- + DataNode : Shortcut to create a DataTree with only a single node. """ # TODO should this instead be a subclass of Dataset? @@ -64,37 +81,37 @@ class DataTree( def __init__( self, + data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, name: Hashable = "root", - data: Union[Dataset, DataArray] = None, - parent: TreeNode = None, - children: List[TreeNode] = None, ): - """ - Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. - - Parameters - ---------- - name : Hashable - Name for the root node of the tree. Default is "root" - data : Dataset, DataArray, Variable or None, optional - Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. - Default is None. - parent : TreeNode, optional - Parent node to this node. Default is None. - children : Sequence[TreeNode], optional - Any child nodes of this node. Default is None. 
+ # First create the root node + super().__init__(name=name, parent=None, children=None) + if data_objects: + root_data = data_objects.pop(name, None) + else: + root_data = None + self._ds = root_data - Returns - ------- - node : DataTree + if data_objects: + # Populate tree with children determined from data_objects mapping + for path, data in data_objects.items(): + # Determine name of new node + path = self._tuple_or_path_to_path(path) + if self.separator in path: + node_path, node_name = path.rsplit(self.separator, maxsplit=1) + else: + node_path, node_name = "/", path - See Also - -------- - DataTree.from_dict - """ + relative_path = node_path.replace(self.name, "") - super().__init__(name, parent=parent, children=children) - self.ds = data + # Create and set new node + new_node = DataNode(name=node_name, data=data) + self.set_node( + relative_path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) @property def ds(self) -> Dataset: @@ -121,59 +138,38 @@ def has_data(self): return self.ds is not None @classmethod - def from_dict( + def _init_single_datatree_node( cls, - data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, - name: Hashable = "root", + name: Hashable, + data: Union[Dataset, DataArray] = None, + parent: TreeNode = None, + children: List[TreeNode] = None, ): """ - Create a datatree from a dictionary of data objects, labelled by paths into the tree. + Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. Parameters ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree use {name} as the path. - name : Hashable, optional + name : Hashable Name for the root node of the tree. Default is "root" + data : Dataset, DataArray, Variable or None, optional + Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. + Default is None. + parent : TreeNode, optional + Parent node to this node. Default is None. + children : Sequence[TreeNode], optional + Any child nodes of this node. Default is None. 
Returns ------- - DataTree + node : DataTree """ - # First create the root node - if data_objects: - root_data = data_objects.pop(name, None) - else: - root_data = None - obj = cls(name=name, data=root_data, parent=None, children=None) - - if data_objects: - # Populate tree with children determined from data_objects mapping - for path, data in data_objects.items(): - # Determine name of new node - path = obj._tuple_or_path_to_path(path) - if obj.separator in path: - node_path, node_name = path.rsplit(obj.separator, maxsplit=1) - else: - node_path, node_name = "/", path - - relative_path = node_path.replace(obj.name, "") - - # Create and set new node - new_node = cls(name=node_name, data=data) - obj.set_node( - relative_path, - new_node, - allow_overwrite=False, - new_nodes_along_path=True, - ) + # This approach was inspired by xarray.Dataset._construct_direct() + obj = object.__new__(cls) + obj._ds = None + obj = _init_single_treenode(obj, name=name, parent=parent, children=children) + obj.ds = data return obj def _pre_attach(self, parent: TreeNode) -> None: @@ -223,7 +219,7 @@ def __str__(self): def _single_node_repr(self): """Information about this node, not including its relationships to other nodes.""" - node_info = f"DataTree('{self.name}')" + node_info = f"DataNode('{self.name}')" if self.has_data: ds_info = "\n" + repr(self.ds) @@ -235,7 +231,7 @@ def __repr__(self): """Information about this node, including its relationships to other nodes.""" # TODO redo this to look like the Dataset repr, but just with child and parent info parent = self.parent.name if self.parent is not None else "None" - node_str = f"DataTree(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," + node_str = f"DataNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," if self.has_data: ds_repr_lines = self.ds.__repr__().splitlines() @@ -391,7 +387,7 @@ def __setitem__( else: # if nothing there then make new node based on type of object if isinstance(value, (Dataset, DataArray, Variable)) or value is None: - new_node = DataTree(name=last_tag, data=value) + new_node = DataNode(name=last_tag, data=value) self.set_node(path=path_tags, node=new_node) elif isinstance(value, TreeNode): self.set_node(path=path, node=value) @@ -471,7 +467,7 @@ def map_over_subtree_inplace( def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): - print(f"{pre}DataTree('{self.name}')") + print(f"{pre}DataNode('{self.name}')") for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") @@ -606,3 +602,6 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): def plot(self): raise NotImplementedError + + +DataNode = DataTree._init_single_datatree_node diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 533ded2b163..f7bdf570a04 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -73,7 +73,7 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: with ncDataset(filename, mode="r") as ncds: ds = open_dataset(filename, **kwargs).pipe(_ds_or_none) - tree_root = DataTree.from_dict(data_objects={"root": ds}) + tree_root = DataTree(data_objects={"root": ds}) for key in _iter_nc_groups(ncds): tree_root[key] = open_dataset(filename, group=key, **kwargs).pipe( _ds_or_none @@ -86,7 +86,7 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: with zarr.open_group(store, mode="r") as zds: ds = open_dataset(store, 
engine="zarr", **kwargs).pipe(_ds_or_none) - tree_root = DataTree.from_dict(data_objects={"root": ds}) + tree_root = DataTree(data_objects={"root": ds}) for key in _iter_zarr_groups(zds): try: tree_root[key] = open_dataset( diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index dc0fb913f15..94b17ac04bd 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -208,9 +208,7 @@ def _map_over_subtree(*args, **kwargs): output_node_data = None out_tree_contents[p] = output_node_data - new_tree = DataTree.from_dict( - name=first_tree.name, data_objects=out_tree_contents - ) + new_tree = DataTree(name=first_tree.name, data_objects=out_tree_contents) result_trees.append(new_tree) # If only one result then don't wrap it in a tuple diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index a7284ec25eb..f3276aa886b 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -2,7 +2,7 @@ import xarray as xr from xarray.testing import assert_equal -from datatree import DataTree +from datatree import DataNode from .test_datatree import assert_tree_equal, create_test_datatree @@ -11,52 +11,52 @@ class TestDSMethodInheritance: def test_dataset_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) + dt = DataNode("root", data=da) expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1).ds assert_equal(result_ds, expected_ds) # test descendant - DataTree("results", parent=dt, data=da) + DataNode("results", parent=dt, data=da) result_ds = dt.isel(x=1)["results"].ds assert_equal(result_ds, expected_ds) def test_reduce_method(self): # test root da = xr.DataArray(name="a", data=[False, True, False], dims="x") - dt = DataTree("root", data=da) + dt = DataNode("root", data=da) expected_ds = da.to_dataset().any() result_ds = dt.any().ds assert_equal(result_ds, expected_ds) # test descendant - DataTree("results", parent=dt, data=da) + DataNode("results", parent=dt, data=da) result_ds = dt.any()["results"].ds assert_equal(result_ds, expected_ds) def test_nan_reduce_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) + dt = DataNode("root", data=da) expected_ds = da.to_dataset().mean() result_ds = dt.mean().ds assert_equal(result_ds, expected_ds) # test descendant - DataTree("results", parent=dt, data=da) + DataNode("results", parent=dt, data=da) result_ds = dt.mean()["results"].ds assert_equal(result_ds, expected_ds) def test_cum_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) + dt = DataNode("root", data=da) expected_ds = da.to_dataset().cumsum() result_ds = dt.cumsum().ds assert_equal(result_ds, expected_ds) # test descendant - DataTree("results", parent=dt, data=da) + DataNode("results", parent=dt, data=da) result_ds = dt.cumsum()["results"].ds assert_equal(result_ds, expected_ds) @@ -65,11 +65,11 @@ class TestOps: def test_binary_op_on_int(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) - expected_root = DataTree("root", data=ds1 * 5) - expected_descendant = DataTree("subnode", data=ds2 * 5, parent=expected_root) + 
expected_root = DataNode("root", data=ds1 * 5) + expected_descendant = DataNode("subnode", data=ds2 * 5, parent=expected_root) result = dt * 5 assert_equal(result.ds, expected_root.ds) @@ -78,12 +78,12 @@ def test_binary_op_on_int(self): def test_binary_op_on_dataset(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) other_ds = xr.Dataset({"z": ("z", [0.1, 0.2])}) - expected_root = DataTree("root", data=ds1 * other_ds) - expected_descendant = DataTree( + expected_root = DataNode("root", data=ds1 * other_ds) + expected_descendant = DataNode( "subnode", data=ds2 * other_ds, parent=expected_root ) result = dt * other_ds @@ -94,11 +94,11 @@ def test_binary_op_on_dataset(self): def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataNode("root", data=ds1) + DataNode("subnode", data=ds2, parent=dt) - expected_root = DataTree("root", data=ds1 * ds1) - expected_descendant = DataTree("subnode", data=ds2 * ds2, parent=expected_root) + expected_root = DataNode("root", data=ds1 * ds1) + expected_descendant = DataNode("subnode", data=ds2 * ds2, parent=expected_root) result = dt * dt assert_equal(result.ds, expected_root.ds) @@ -108,15 +108,15 @@ def test_binary_op_on_datatree(self): class TestUFuncs: def test_root(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataTree("root", data=da) + dt = DataNode("root", data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt).ds assert_equal(result_ds, expected_ds) def test_descendants(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataTree("root") - DataTree("results", parent=dt, data=da) + dt = DataNode("root") + DataNode("results", parent=dt, data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt)["results"].ds assert_equal(result_ds, expected_ds) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 7d1569876bd..4a5d64ff774 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -3,7 +3,7 @@ from anytree.resolver import ChildResolverError from xarray.testing import assert_identical -from datatree import DataTree +from datatree import DataNode, DataTree from datatree.io import open_datatree from datatree.tests import requires_netCDF4, requires_zarr @@ -55,27 +55,27 @@ def create_test_datatree(modify=lambda ds: ds): root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) # Avoid using __init__ so we can independently test it - root = DataTree(name="root", data=root_data) - set1 = DataTree(name="set1", parent=root, data=set1_data) - DataTree(name="set1", parent=set1) - DataTree(name="set2", parent=set1) - set2 = DataTree(name="set2", parent=root, data=set2_data) - DataTree(name="set1", parent=set2) - DataTree(name="set3", parent=root) + root = DataNode(name="root", data=root_data) + set1 = DataNode(name="set1", parent=root, data=set1_data) + DataNode(name="set1", parent=set1) + DataNode(name="set2", parent=set1) + set2 = DataNode(name="set2", parent=root, data=set2_data) + DataNode(name="set1", parent=set2) + DataNode(name="set3", parent=root) return root class TestStoreDatasets: - def 
test_create_DataTree(self): + def test_create_datanode(self): dat = xr.Dataset({"a": 0}) - john = DataTree("john", data=dat) + john = DataNode("john", data=dat) assert john.ds is dat with pytest.raises(TypeError): - DataTree("mary", parent=john, data="junk") + DataNode("mary", parent=john, data="junk") def test_set_data(self): - john = DataTree("john") + john = DataNode("john") dat = xr.Dataset({"a": 0}) john.ds = dat assert john.ds is dat @@ -83,25 +83,25 @@ def test_set_data(self): john.ds = "junk" def test_has_data(self): - john = DataTree("john", data=xr.Dataset({"a": 0})) + john = DataNode("john", data=xr.Dataset({"a": 0})) assert john.has_data - john = DataTree("john", data=None) + john = DataNode("john", data=None) assert not john.has_data class TestVariablesChildrenNameCollisions: def test_parent_already_has_variable_with_childs_name(self): - dt = DataTree("root", data=xr.Dataset({"a": [0], "b": 1})) + dt = DataNode("root", data=xr.Dataset({"a": [0], "b": 1})) with pytest.raises(KeyError, match="already contains a data variable named a"): - DataTree("a", data=None, parent=dt) + DataNode("a", data=None, parent=dt) with pytest.raises(KeyError, match="already contains a data variable named a"): - dt.add_child(DataTree("a", data=None)) + dt.add_child(DataNode("a", data=None)) def test_assign_when_already_child_with_variables_name(self): - dt = DataTree("root", data=None) - DataTree("a", data=None, parent=dt) + dt = DataNode("root", data=None) + DataNode("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds = xr.Dataset({"a": 0}) @@ -112,82 +112,82 @@ def test_assign_when_already_child_with_variables_name(self): @pytest.mark.xfail def test_update_when_already_child_with_variables_name(self): # See issue https://github.com/xarray-contrib/datatree/issues/38 - dt = DataTree("root", data=None) - DataTree("a", data=None, parent=dt) + dt = DataNode("root", data=None) + DataNode("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds["a"] = xr.DataArray(0) class TestGetItems: def test_get_node(self): - folder1 = DataTree("folder1") - results = DataTree("results", parent=folder1) - highres = DataTree("highres", parent=results) + folder1 = DataNode("folder1") + results = DataNode("results", parent=folder1) + highres = DataNode("highres", parent=results) assert folder1["results"] is results assert folder1["results/highres"] is highres assert folder1[("results", "highres")] is highres def test_get_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=data) + results = DataNode("results", data=data) assert_identical(results["temp"], data["temp"]) def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") - results = DataTree("results", parent=folder1) - DataTree("highres", parent=results, data=data) + folder1 = DataNode("folder1") + results = DataNode("results", parent=folder1) + DataNode("highres", parent=results, data=data) assert_identical(folder1["results/highres/temp"], data["temp"]) assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): - folder1 = DataTree("folder1") - DataTree("results", parent=folder1) + folder1 = DataNode("folder1") + DataNode("results", parent=folder1) with pytest.raises(ChildResolverError): folder1["results/highres"] def test_get_nonexistent_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = 
DataTree("results", data=data) + results = DataNode("results", data=data) with pytest.raises(ChildResolverError): results["pressure"] def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) - results = DataTree("results", data=data) + results = DataNode("results", data=data) assert_identical(results[["temp", "p"]], data[["temp", "p"]]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=data) + results = DataNode("results", data=data) assert_identical(results[{"temp": 1}], data[{"temp": 1}]) class TestSetItems: # TODO test tuple-style access too def test_set_new_child_node(self): - john = DataTree("john") - mary = DataTree("mary") + john = DataNode("john") + mary = DataNode("mary") john["/"] = mary assert john["mary"] is mary def test_set_new_grandchild_node(self): - john = DataTree("john") - DataTree("mary", parent=john) - rose = DataTree("rose") + john = DataNode("john") + DataNode("mary", parent=john) + rose = DataNode("rose") john["mary/"] = rose assert john["mary/rose"] is rose def test_set_new_empty_node(self): - john = DataTree("john") + john = DataNode("john") john["mary"] = None mary = john["mary"] assert isinstance(mary, DataTree) assert mary.ds is None def test_overwrite_data_in_node_with_none(self): - john = DataTree("john") - mary = DataTree("mary", parent=john, data=xr.Dataset()) + john = DataNode("john") + mary = DataNode("mary", parent=john, data=xr.Dataset()) john["mary"] = None assert mary.ds is None @@ -197,47 +197,47 @@ def test_overwrite_data_in_node_with_none(self): def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results") + results = DataNode("results") results["/"] = data assert results.ds is data def test_set_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") + folder1 = DataNode("folder1") folder1["results"] = data assert folder1["results"].ds is data def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") + folder1 = DataNode("folder1") folder1["results/highres"] = data assert folder1["results/highres"].ds is data def test_set_named_dataarray_as_new_node(self): data = xr.DataArray(name="temp", data=[0, 50]) - folder1 = DataTree("folder1") + folder1 = DataNode("folder1") folder1["results"] = data assert_identical(folder1["results"].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) - folder1 = DataTree("folder1") + folder1 = DataNode("folder1") with pytest.raises(ValueError, match="unable to convert"): folder1["results"] = data def test_add_new_variable_to_empty_node(self): - results = DataTree("results") + results = DataNode("results") results["/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results.ds # What if there is a path to traverse first? 
- results = DataTree("results") + results = DataNode("results") results["highres/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results["highres"].ds def test_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=t) + results = DataNode("results", data=t) p = xr.DataArray(name="pressure", data=[2, 3]) results["/"] = p assert_identical(results.ds, p.to_dataset()) @@ -253,7 +253,7 @@ def test_empty(self): def test_data_in_root(self): dat = xr.Dataset() - dt = DataTree.from_dict({"root": dat}) + dt = DataTree({"root": dat}) assert dt.name == "root" assert dt.parent is None assert dt.children == () @@ -261,7 +261,7 @@ def test_data_in_root(self): def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) - dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) + dt = DataTree({"run1": dat1, "run2": dat2}) assert dt.ds is None assert dt["run1"].ds is dat1 assert dt["run1"].children == () @@ -270,7 +270,7 @@ def test_one_layer(self): def test_two_layers(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) - dt = DataTree.from_dict({"highres/run": dat1, "lowres/run": dat2}) + dt = DataTree({"highres/run": dat1, "lowres/run": dat2}) assert "highres" in [c.name for c in dt.children] assert "lowres" in [c.name for c in dt.children] highres_run = dt.get_node("highres/run") @@ -300,16 +300,16 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DataTree("root") + dt = DataNode("root") printout = dt.__str__() - assert printout == "DataTree('root')" + assert printout == "DataNode('root')" def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree("root", data=dat) + dt = DataNode("root", data=dat) printout = dt.__str__() expected = [ - "DataTree('root')", + "DataNode('root')", "Dimensions", "Coordinates", "a", @@ -321,8 +321,8 @@ def test_print_node_with_data(self): def test_nested_node(self): dat = xr.Dataset({"a": [0, 2]}) - root = DataTree("root") - DataTree("results", data=dat, parent=root) + root = DataNode("root") + DataNode("results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") @@ -335,7 +335,7 @@ def test_print_datatree(self): def test_repr_of_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree("root", data=dat) + dt = DataNode("root", data=dat) assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 050bbbf6c9f..00b30f57b7c 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -16,8 +16,8 @@ def test_not_a_tree(self): _check_isomorphic("s", 1) def test_different_widths(self): - dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"a": empty, "b": empty}) + dt1 = DataTree(data_objects={"a": empty}) + dt2 = DataTree(data_objects={"a": empty, "b": empty}) expected_err_str = ( "'root' in the first tree has 1 children, whereas its counterpart node 'root' in the " "second tree has 2 children" @@ -26,8 +26,8 @@ def test_different_widths(self): _check_isomorphic(dt1, dt2) def test_different_heights(self): - dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"a": empty, "a/b": empty}) + dt1 = DataTree(data_objects={"a": empty}) + dt2 = DataTree(data_objects={"a": empty, "a/b": empty}) expected_err_str = ( "'root/a' in the first 
tree has 0 children, whereas its counterpart node 'root/a' in the " "second tree has 1 children" @@ -36,8 +36,8 @@ def test_different_heights(self): _check_isomorphic(dt1, dt2) def test_only_one_has_data(self): - dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset({"a": 0})}) - dt2 = DataTree.from_dict(data_objects={"a": None}) + dt1 = DataTree(data_objects={"a": xr.Dataset({"a": 0})}) + dt2 = DataTree(data_objects={"a": None}) expected_err_str = ( "'root/a' in the first tree has data, whereas its counterpart node 'root/a' in the " "second tree has no data" @@ -46,8 +46,8 @@ def test_only_one_has_data(self): _check_isomorphic(dt1, dt2) def test_names_different(self): - dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) - dt2 = DataTree.from_dict(data_objects={"b": empty}) + dt1 = DataTree(data_objects={"a": xr.Dataset()}) + dt2 = DataTree(data_objects={"b": empty}) expected_err_str = ( "'root/a' in the first tree has name 'a', whereas its counterpart node 'root/b' in the " "second tree has name 'b'" @@ -56,28 +56,28 @@ def test_names_different(self): _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): - dt1 = DataTree.from_dict( + dt1 = DataTree( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree.from_dict( + dt2 = DataTree( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): - dt1 = DataTree.from_dict( + dt1 = DataTree( data_objects={"a": empty, "b": empty, "b/d": empty, "b/c": empty} ) - dt2 = DataTree.from_dict( + dt2 = DataTree( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): - dt1 = DataTree.from_dict( + dt1 = DataTree( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree.from_dict( + dt2 = DataTree( data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} ) _check_isomorphic(dt1, dt2) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 463c68847a7..276577e7fc3 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -7,6 +7,18 @@ PathType = Union[Hashable, Sequence[Hashable]] +def _init_single_treenode(obj, name, parent, children): + if not isinstance(name, str) or "/" in name: + raise ValueError(f"invalid name {name}") + obj.name = name + + obj.parent = parent + if children: + obj.children = children + + return obj + + class TreeNode(anytree.NodeMixin): """ Base class representing a node of a tree, with methods for traversing and altering the tree. 
@@ -37,13 +49,7 @@ def __init__( parent: TreeNode = None, children: Iterable[TreeNode] = None, ): - if not isinstance(name, str) or "/" in name: - raise ValueError(f"invalid name {name}") - self.name = name - - self.parent = parent - if children: - self.children = children + _init_single_treenode(self, name=name, parent=parent, children=children) def __str__(self): """A printable representation of the structure of this entire subtree.""" From 360118ef6ae0ff621c1687be06a0e085921d09e6 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Tue, 9 Nov 2021 09:59:35 -0500 Subject: [PATCH 070/260] Replace DataTree init method with from_dict https://github.com/xarray-contrib/datatree/pull/44 * swapped original init for a .from_dict method * updated tests * corrected options for creating datatree objects --- xarray/datatree_/README.md | 9 +- xarray/datatree_/datatree/__init__.py | 2 +- xarray/datatree_/datatree/datatree.py | 145 +++++++++--------- xarray/datatree_/datatree/io.py | 4 +- xarray/datatree_/datatree/mapping.py | 4 +- .../datatree/tests/test_dataset_api.py | 48 +++--- .../datatree_/datatree/tests/test_datatree.py | 118 +++++++------- .../datatree_/datatree/tests/test_mapping.py | 28 ++-- xarray/datatree_/datatree/treenode.py | 20 +-- 9 files changed, 188 insertions(+), 190 deletions(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index e3368eeaa9c..63d9bd8e0e1 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -14,7 +14,8 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis drawing You can create a `DataTree` object in 3 ways: -1) Load from a netCDF file that has groups via `open_datatree()`, -2) Using the init method of `DataTree`, which accepts a nested dictionary of Datasets, -3) Manually create individual nodes with `DataNode()` and specify their relationships to each other, either by setting `.parent` and `.chlldren` attributes, or through `__get/setitem__` access, e.g. -`dt['path/to/node'] = xr.Dataset()` +1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. +2) Using the init method of `DataTree`, which creates an individual node. + You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, + or through `__get/setitem__` access, e.g. `dt['path/to/node'] = xr.Dataset()`. +3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index fbe1cba7860..7cd8ce5cd32 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,5 +1,5 @@ # flake8: noqa # Ignoring F401: imported but unused -from .datatree import DataNode, DataTree +from .datatree import DataTree from .io import open_datatree from .mapping import map_over_subtree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index fbcec02bb2c..25c9437a572 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -14,7 +14,7 @@ MappedDatasetMethodsMixin, MappedDataWithCoords, ) -from .treenode import PathType, TreeNode, _init_single_treenode +from .treenode import PathType, TreeNode """ DEVELOPERS' NOTE @@ -39,24 +39,7 @@ class DataTree( """ A tree-like hierarchical collection of xarray objects. 
- Attempts to present the API of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. - - Parameters - ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or xtree.DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree {name} as the path. - name : Hashable, optional - Name for the root node of the tree. Default is "root" - - See also - -------- - DataNode : Shortcut to create a DataTree with only a single node. + Attempts to present an API like that of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. """ # TODO should this instead be a subclass of Dataset? @@ -81,37 +64,37 @@ class DataTree( def __init__( self, - data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, name: Hashable = "root", + data: Union[Dataset, DataArray] = None, + parent: TreeNode = None, + children: List[TreeNode] = None, ): - # First create the root node - super().__init__(name=name, parent=None, children=None) - if data_objects: - root_data = data_objects.pop(name, None) - else: - root_data = None - self._ds = root_data + """ + Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. - if data_objects: - # Populate tree with children determined from data_objects mapping - for path, data in data_objects.items(): - # Determine name of new node - path = self._tuple_or_path_to_path(path) - if self.separator in path: - node_path, node_name = path.rsplit(self.separator, maxsplit=1) - else: - node_path, node_name = "/", path + Parameters + ---------- + name : Hashable + Name for the root node of the tree. Default is "root" + data : Dataset, DataArray, Variable or None, optional + Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. + Default is None. + parent : TreeNode, optional + Parent node to this node. Default is None. + children : Sequence[TreeNode], optional + Any child nodes of this node. Default is None. - relative_path = node_path.replace(self.name, "") + Returns + ------- + node : DataTree - # Create and set new node - new_node = DataNode(name=node_name, data=data) - self.set_node( - relative_path, - new_node, - allow_overwrite=False, - new_nodes_along_path=True, - ) + See Also + -------- + DataTree.from_dict + """ + + super().__init__(name, parent=parent, children=children) + self.ds = data @property def ds(self) -> Dataset: @@ -138,38 +121,59 @@ def has_data(self): return self.ds is not None @classmethod - def _init_single_datatree_node( + def from_dict( cls, - name: Hashable, - data: Union[Dataset, DataArray] = None, - parent: TreeNode = None, - children: List[TreeNode] = None, + data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, + name: Hashable = "root", ): """ - Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. + Create a datatree from a dictionary of data objects, labelled by paths into the tree. Parameters ---------- - name : Hashable + data_objects : dict-like, optional + A mapping from path names to xarray.Dataset, xarray.DataArray, or DataTree objects. 
+ + Path names can be given as unix-like paths, or as tuples of strings (where each string + is known as a single "tag"). If path names containing more than one tag are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree use {name} as the path. + name : Hashable, optional Name for the root node of the tree. Default is "root" - data : Dataset, DataArray, Variable or None, optional - Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. - Default is None. - parent : TreeNode, optional - Parent node to this node. Default is None. - children : Sequence[TreeNode], optional - Any child nodes of this node. Default is None. Returns ------- - node : DataTree + DataTree """ - # This approach was inspired by xarray.Dataset._construct_direct() - obj = object.__new__(cls) - obj._ds = None - obj = _init_single_treenode(obj, name=name, parent=parent, children=children) - obj.ds = data + # First create the root node + if data_objects: + root_data = data_objects.pop(name, None) + else: + root_data = None + obj = cls(name=name, data=root_data, parent=None, children=None) + + if data_objects: + # Populate tree with children determined from data_objects mapping + for path, data in data_objects.items(): + # Determine name of new node + path = obj._tuple_or_path_to_path(path) + if obj.separator in path: + node_path, node_name = path.rsplit(obj.separator, maxsplit=1) + else: + node_path, node_name = "/", path + + relative_path = node_path.replace(obj.name, "") + + # Create and set new node + new_node = cls(name=node_name, data=data) + obj.set_node( + relative_path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) return obj def _pre_attach(self, parent: TreeNode) -> None: @@ -219,7 +223,7 @@ def __str__(self): def _single_node_repr(self): """Information about this node, not including its relationships to other nodes.""" - node_info = f"DataNode('{self.name}')" + node_info = f"DataTree('{self.name}')" if self.has_data: ds_info = "\n" + repr(self.ds) @@ -231,7 +235,7 @@ def __repr__(self): """Information about this node, including its relationships to other nodes.""" # TODO redo this to look like the Dataset repr, but just with child and parent info parent = self.parent.name if self.parent is not None else "None" - node_str = f"DataNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," + node_str = f"DataTree(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," if self.has_data: ds_repr_lines = self.ds.__repr__().splitlines() @@ -387,7 +391,7 @@ def __setitem__( else: # if nothing there then make new node based on type of object if isinstance(value, (Dataset, DataArray, Variable)) or value is None: - new_node = DataNode(name=last_tag, data=value) + new_node = DataTree(name=last_tag, data=value) self.set_node(path=path_tags, node=new_node) elif isinstance(value, TreeNode): self.set_node(path=path, node=value) @@ -467,7 +471,7 @@ def map_over_subtree_inplace( def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in anytree.RenderTree(self): - print(f"{pre}DataNode('{self.name}')") + print(f"{pre}DataTree('{self.name}')") for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") @@ -602,6 +606,3 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): def plot(self): raise NotImplementedError - - -DataNode = DataTree._init_single_datatree_node diff --git 
a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index f7bdf570a04..533ded2b163 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -73,7 +73,7 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: with ncDataset(filename, mode="r") as ncds: ds = open_dataset(filename, **kwargs).pipe(_ds_or_none) - tree_root = DataTree(data_objects={"root": ds}) + tree_root = DataTree.from_dict(data_objects={"root": ds}) for key in _iter_nc_groups(ncds): tree_root[key] = open_dataset(filename, group=key, **kwargs).pipe( _ds_or_none @@ -86,7 +86,7 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: with zarr.open_group(store, mode="r") as zds: ds = open_dataset(store, engine="zarr", **kwargs).pipe(_ds_or_none) - tree_root = DataTree(data_objects={"root": ds}) + tree_root = DataTree.from_dict(data_objects={"root": ds}) for key in _iter_zarr_groups(zds): try: tree_root[key] = open_dataset( diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 94b17ac04bd..dc0fb913f15 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -208,7 +208,9 @@ def _map_over_subtree(*args, **kwargs): output_node_data = None out_tree_contents[p] = output_node_data - new_tree = DataTree(name=first_tree.name, data_objects=out_tree_contents) + new_tree = DataTree.from_dict( + name=first_tree.name, data_objects=out_tree_contents + ) result_trees.append(new_tree) # If only one result then don't wrap it in a tuple diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index f3276aa886b..a7284ec25eb 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -2,7 +2,7 @@ import xarray as xr from xarray.testing import assert_equal -from datatree import DataNode +from datatree import DataTree from .test_datatree import assert_tree_equal, create_test_datatree @@ -11,52 +11,52 @@ class TestDSMethodInheritance: def test_dataset_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().isel(x=1) result_ds = dt.isel(x=1).ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.isel(x=1)["results"].ds assert_equal(result_ds, expected_ds) def test_reduce_method(self): # test root da = xr.DataArray(name="a", data=[False, True, False], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().any() result_ds = dt.any().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.any()["results"].ds assert_equal(result_ds, expected_ds) def test_nan_reduce_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = da.to_dataset().mean() result_ds = dt.mean().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.mean()["results"].ds assert_equal(result_ds, expected_ds) def test_cum_method(self): # test root da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataNode("root", data=da) + dt = DataTree("root", 
data=da) expected_ds = da.to_dataset().cumsum() result_ds = dt.cumsum().ds assert_equal(result_ds, expected_ds) # test descendant - DataNode("results", parent=dt, data=da) + DataTree("results", parent=dt, data=da) result_ds = dt.cumsum()["results"].ds assert_equal(result_ds, expected_ds) @@ -65,11 +65,11 @@ class TestOps: def test_binary_op_on_int(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) - expected_root = DataNode("root", data=ds1 * 5) - expected_descendant = DataNode("subnode", data=ds2 * 5, parent=expected_root) + expected_root = DataTree("root", data=ds1 * 5) + expected_descendant = DataTree("subnode", data=ds2 * 5, parent=expected_root) result = dt * 5 assert_equal(result.ds, expected_root.ds) @@ -78,12 +78,12 @@ def test_binary_op_on_int(self): def test_binary_op_on_dataset(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) other_ds = xr.Dataset({"z": ("z", [0.1, 0.2])}) - expected_root = DataNode("root", data=ds1 * other_ds) - expected_descendant = DataNode( + expected_root = DataTree("root", data=ds1 * other_ds) + expected_descendant = DataTree( "subnode", data=ds2 * other_ds, parent=expected_root ) result = dt * other_ds @@ -94,11 +94,11 @@ def test_binary_op_on_dataset(self): def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataNode("root", data=ds1) - DataNode("subnode", data=ds2, parent=dt) + dt = DataTree("root", data=ds1) + DataTree("subnode", data=ds2, parent=dt) - expected_root = DataNode("root", data=ds1 * ds1) - expected_descendant = DataNode("subnode", data=ds2 * ds2, parent=expected_root) + expected_root = DataTree("root", data=ds1 * ds1) + expected_descendant = DataTree("subnode", data=ds2 * ds2, parent=expected_root) result = dt * dt assert_equal(result.ds, expected_root.ds) @@ -108,15 +108,15 @@ def test_binary_op_on_datatree(self): class TestUFuncs: def test_root(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataNode("root", data=da) + dt = DataTree("root", data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt).ds assert_equal(result_ds, expected_ds) def test_descendants(self): da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataNode("root") - DataNode("results", parent=dt, data=da) + dt = DataTree("root") + DataTree("results", parent=dt, data=da) expected_ds = np.sin(da.to_dataset()) result_ds = np.sin(dt)["results"].ds assert_equal(result_ds, expected_ds) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4a5d64ff774..7d1569876bd 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -3,7 +3,7 @@ from anytree.resolver import ChildResolverError from xarray.testing import assert_identical -from datatree import DataNode, DataTree +from datatree import DataTree from datatree.io import open_datatree from datatree.tests import requires_netCDF4, requires_zarr @@ -55,27 +55,27 @@ def create_test_datatree(modify=lambda ds: ds): root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) # Avoid using 
__init__ so we can independently test it - root = DataNode(name="root", data=root_data) - set1 = DataNode(name="set1", parent=root, data=set1_data) - DataNode(name="set1", parent=set1) - DataNode(name="set2", parent=set1) - set2 = DataNode(name="set2", parent=root, data=set2_data) - DataNode(name="set1", parent=set2) - DataNode(name="set3", parent=root) + root = DataTree(name="root", data=root_data) + set1 = DataTree(name="set1", parent=root, data=set1_data) + DataTree(name="set1", parent=set1) + DataTree(name="set2", parent=set1) + set2 = DataTree(name="set2", parent=root, data=set2_data) + DataTree(name="set1", parent=set2) + DataTree(name="set3", parent=root) return root class TestStoreDatasets: - def test_create_datanode(self): + def test_create_DataTree(self): dat = xr.Dataset({"a": 0}) - john = DataNode("john", data=dat) + john = DataTree("john", data=dat) assert john.ds is dat with pytest.raises(TypeError): - DataNode("mary", parent=john, data="junk") + DataTree("mary", parent=john, data="junk") def test_set_data(self): - john = DataNode("john") + john = DataTree("john") dat = xr.Dataset({"a": 0}) john.ds = dat assert john.ds is dat @@ -83,25 +83,25 @@ def test_set_data(self): john.ds = "junk" def test_has_data(self): - john = DataNode("john", data=xr.Dataset({"a": 0})) + john = DataTree("john", data=xr.Dataset({"a": 0})) assert john.has_data - john = DataNode("john", data=None) + john = DataTree("john", data=None) assert not john.has_data class TestVariablesChildrenNameCollisions: def test_parent_already_has_variable_with_childs_name(self): - dt = DataNode("root", data=xr.Dataset({"a": [0], "b": 1})) + dt = DataTree("root", data=xr.Dataset({"a": [0], "b": 1})) with pytest.raises(KeyError, match="already contains a data variable named a"): - DataNode("a", data=None, parent=dt) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already contains a data variable named a"): - dt.add_child(DataNode("a", data=None)) + dt.add_child(DataTree("a", data=None)) def test_assign_when_already_child_with_variables_name(self): - dt = DataNode("root", data=None) - DataNode("a", data=None, parent=dt) + dt = DataTree("root", data=None) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds = xr.Dataset({"a": 0}) @@ -112,82 +112,82 @@ def test_assign_when_already_child_with_variables_name(self): @pytest.mark.xfail def test_update_when_already_child_with_variables_name(self): # See issue https://github.com/xarray-contrib/datatree/issues/38 - dt = DataNode("root", data=None) - DataNode("a", data=None, parent=dt) + dt = DataTree("root", data=None) + DataTree("a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds["a"] = xr.DataArray(0) class TestGetItems: def test_get_node(self): - folder1 = DataNode("folder1") - results = DataNode("results", parent=folder1) - highres = DataNode("highres", parent=results) + folder1 = DataTree("folder1") + results = DataTree("results", parent=folder1) + highres = DataTree("highres", parent=results) assert folder1["results"] is results assert folder1["results/highres"] is highres assert folder1[("results", "highres")] is highres def test_get_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results["temp"], data["temp"]) def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = 
DataNode("folder1") - results = DataNode("results", parent=folder1) - DataNode("highres", parent=results, data=data) + folder1 = DataTree("folder1") + results = DataTree("results", parent=folder1) + DataTree("highres", parent=results, data=data) assert_identical(folder1["results/highres/temp"], data["temp"]) assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): - folder1 = DataNode("folder1") - DataNode("results", parent=folder1) + folder1 = DataTree("folder1") + DataTree("results", parent=folder1) with pytest.raises(ChildResolverError): folder1["results/highres"] def test_get_nonexistent_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) with pytest.raises(ChildResolverError): results["pressure"] def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results[["temp", "p"]], data[["temp", "p"]]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=data) + results = DataTree("results", data=data) assert_identical(results[{"temp": 1}], data[{"temp": 1}]) class TestSetItems: # TODO test tuple-style access too def test_set_new_child_node(self): - john = DataNode("john") - mary = DataNode("mary") + john = DataTree("john") + mary = DataTree("mary") john["/"] = mary assert john["mary"] is mary def test_set_new_grandchild_node(self): - john = DataNode("john") - DataNode("mary", parent=john) - rose = DataNode("rose") + john = DataTree("john") + DataTree("mary", parent=john) + rose = DataTree("rose") john["mary/"] = rose assert john["mary/rose"] is rose def test_set_new_empty_node(self): - john = DataNode("john") + john = DataTree("john") john["mary"] = None mary = john["mary"] assert isinstance(mary, DataTree) assert mary.ds is None def test_overwrite_data_in_node_with_none(self): - john = DataNode("john") - mary = DataNode("mary", parent=john, data=xr.Dataset()) + john = DataTree("john") + mary = DataTree("mary", parent=john, data=xr.Dataset()) john["mary"] = None assert mary.ds is None @@ -197,47 +197,47 @@ def test_overwrite_data_in_node_with_none(self): def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results") + results = DataTree("results") results["/"] = data assert results.ds is data def test_set_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results"] = data assert folder1["results"].ds is data def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results/highres"] = data assert folder1["results/highres"].ds is data def test_set_named_dataarray_as_new_node(self): data = xr.DataArray(name="temp", data=[0, 50]) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") folder1["results"] = data assert_identical(folder1["results"].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) - folder1 = DataNode("folder1") + folder1 = DataTree("folder1") with pytest.raises(ValueError, match="unable to convert"): folder1["results"] = data def test_add_new_variable_to_empty_node(self): - results = DataNode("results") + results = 
DataTree("results") results["/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results.ds # What if there is a path to traverse first? - results = DataNode("results") + results = DataTree("results") results["highres/"] = xr.DataArray(name="pressure", data=[2, 3]) assert "pressure" in results["highres"].ds def test_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) - results = DataNode("results", data=t) + results = DataTree("results", data=t) p = xr.DataArray(name="pressure", data=[2, 3]) results["/"] = p assert_identical(results.ds, p.to_dataset()) @@ -253,7 +253,7 @@ def test_empty(self): def test_data_in_root(self): dat = xr.Dataset() - dt = DataTree({"root": dat}) + dt = DataTree.from_dict({"root": dat}) assert dt.name == "root" assert dt.parent is None assert dt.children == () @@ -261,7 +261,7 @@ def test_data_in_root(self): def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) - dt = DataTree({"run1": dat1, "run2": dat2}) + dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) assert dt.ds is None assert dt["run1"].ds is dat1 assert dt["run1"].children == () @@ -270,7 +270,7 @@ def test_one_layer(self): def test_two_layers(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) - dt = DataTree({"highres/run": dat1, "lowres/run": dat2}) + dt = DataTree.from_dict({"highres/run": dat1, "lowres/run": dat2}) assert "highres" in [c.name for c in dt.children] assert "lowres" in [c.name for c in dt.children] highres_run = dt.get_node("highres/run") @@ -300,16 +300,16 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DataNode("root") + dt = DataTree("root") printout = dt.__str__() - assert printout == "DataNode('root')" + assert printout == "DataTree('root')" def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataNode("root", data=dat) + dt = DataTree("root", data=dat) printout = dt.__str__() expected = [ - "DataNode('root')", + "DataTree('root')", "Dimensions", "Coordinates", "a", @@ -321,8 +321,8 @@ def test_print_node_with_data(self): def test_nested_node(self): dat = xr.Dataset({"a": [0, 2]}) - root = DataNode("root") - DataNode("results", data=dat, parent=root) + root = DataTree("root") + DataTree("results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") @@ -335,7 +335,7 @@ def test_print_datatree(self): def test_repr_of_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataNode("root", data=dat) + dt = DataTree("root", data=dat) assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 00b30f57b7c..050bbbf6c9f 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -16,8 +16,8 @@ def test_not_a_tree(self): _check_isomorphic("s", 1) def test_different_widths(self): - dt1 = DataTree(data_objects={"a": empty}) - dt2 = DataTree(data_objects={"a": empty, "b": empty}) + dt1 = DataTree.from_dict(data_objects={"a": empty}) + dt2 = DataTree.from_dict(data_objects={"a": empty, "b": empty}) expected_err_str = ( "'root' in the first tree has 1 children, whereas its counterpart node 'root' in the " "second tree has 2 children" @@ -26,8 +26,8 @@ def test_different_widths(self): _check_isomorphic(dt1, dt2) def test_different_heights(self): - dt1 = DataTree(data_objects={"a": empty}) - dt2 = DataTree(data_objects={"a": empty, "a/b": empty}) + dt1 = 
DataTree.from_dict(data_objects={"a": empty}) + dt2 = DataTree.from_dict(data_objects={"a": empty, "a/b": empty}) expected_err_str = ( "'root/a' in the first tree has 0 children, whereas its counterpart node 'root/a' in the " "second tree has 1 children" @@ -36,8 +36,8 @@ def test_different_heights(self): _check_isomorphic(dt1, dt2) def test_only_one_has_data(self): - dt1 = DataTree(data_objects={"a": xr.Dataset({"a": 0})}) - dt2 = DataTree(data_objects={"a": None}) + dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset({"a": 0})}) + dt2 = DataTree.from_dict(data_objects={"a": None}) expected_err_str = ( "'root/a' in the first tree has data, whereas its counterpart node 'root/a' in the " "second tree has no data" @@ -46,8 +46,8 @@ def test_only_one_has_data(self): _check_isomorphic(dt1, dt2) def test_names_different(self): - dt1 = DataTree(data_objects={"a": xr.Dataset()}) - dt2 = DataTree(data_objects={"b": empty}) + dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) + dt2 = DataTree.from_dict(data_objects={"b": empty}) expected_err_str = ( "'root/a' in the first tree has name 'a', whereas its counterpart node 'root/b' in the " "second tree has name 'b'" @@ -56,28 +56,28 @@ def test_names_different(self): _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/d": empty, "b/c": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) _check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): - dt1 = DataTree( + dt1 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - dt2 = DataTree( + dt2 = DataTree.from_dict( data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} ) _check_isomorphic(dt1, dt2) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 276577e7fc3..463c68847a7 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -7,18 +7,6 @@ PathType = Union[Hashable, Sequence[Hashable]] -def _init_single_treenode(obj, name, parent, children): - if not isinstance(name, str) or "/" in name: - raise ValueError(f"invalid name {name}") - obj.name = name - - obj.parent = parent - if children: - obj.children = children - - return obj - - class TreeNode(anytree.NodeMixin): """ Base class representing a node of a tree, with methods for traversing and altering the tree. 
@@ -49,7 +37,13 @@ def __init__( parent: TreeNode = None, children: Iterable[TreeNode] = None, ): - _init_single_treenode(self, name=name, parent=parent, children=children) + if not isinstance(name, str) or "/" in name: + raise ValueError(f"invalid name {name}") + self.name = name + + self.parent = parent + if children: + self.children = children def __str__(self): """A printable representation of the structure of this entire subtree.""" From c99dea41f16e2b505e951669f49e459c3cb20f53 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 10 Dec 2021 10:55:09 -0500 Subject: [PATCH 071/260] Bump actions/setup-python from 2.2.2 to 2.3.1 https://github.com/xarray-contrib/datatree/pull/46 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2.2.2 to 2.3.1. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v2.2.2...v2.3.1) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 2 +- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 6fd951e040f..392a7565a9f 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2.4.0 - - uses: actions/setup-python@v2.2.2 + - uses: actions/setup-python@v2.3.1 - uses: pre-commit/action@v2.0.3 test: diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index abdd87c2338..63bb976d2f0 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -10,7 +10,7 @@ jobs: steps: - uses: actions/checkout@v2.4.0 - name: Set up Python - uses: actions/setup-python@v2.2.1 + uses: actions/setup-python@v2.3.1 with: python-version: "3.x" - name: Install dependencies From 21a3b3f0eaf017724375c4789df23c27c71090c0 Mon Sep 17 00:00:00 2001 From: Joe Hamman Date: Mon, 13 Dec 2021 08:40:26 -0800 Subject: [PATCH 072/260] add initial draft of docs https://github.com/xarray-contrib/datatree/pull/39 * add initial draft of docs * add pages * made build work, but had to rollback docstring modification --- xarray/datatree_/.gitignore | 1 + xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/datatree/datatree.py | 28 +- xarray/datatree_/datatree/ops.py | 20 +- xarray/datatree_/docs/Makefile | 177 +++++++++++ xarray/datatree_/docs/make.bat | 242 +++++++++++++++ xarray/datatree_/docs/requirements.txt | 3 + xarray/datatree_/docs/source/api.rst | 155 ++++++++++ xarray/datatree_/docs/source/conf.py | 283 ++++++++++++++++++ xarray/datatree_/docs/source/contributing.rst | 136 +++++++++ xarray/datatree_/docs/source/index.rst | 20 ++ xarray/datatree_/docs/source/installation.rst | 5 + xarray/datatree_/docs/source/tutorial.rst | 5 + 13 files changed, 1052 insertions(+), 25 deletions(-) create mode 100644 xarray/datatree_/docs/Makefile create mode 100644 xarray/datatree_/docs/make.bat create mode 100644 xarray/datatree_/docs/requirements.txt create mode 100644 
xarray/datatree_/docs/source/api.rst create mode 100644 xarray/datatree_/docs/source/conf.py create mode 100644 xarray/datatree_/docs/source/contributing.rst create mode 100644 xarray/datatree_/docs/source/index.rst create mode 100644 xarray/datatree_/docs/source/installation.rst create mode 100644 xarray/datatree_/docs/source/tutorial.rst diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore index b6e47617de1..ee3bee05376 100644 --- a/xarray/datatree_/.gitignore +++ b/xarray/datatree_/.gitignore @@ -70,6 +70,7 @@ instance/ # Sphinx documentation docs/_build/ +docs/source/generated # PyBuilder target/ diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index 772ffe3d741..e1068e8b8df 100644 --- a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev46+g415cbb7.d20210825" +__version__ = "0.1.dev75+g977ffe2.d20210902" diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 25c9437a572..46e3d6c92d2 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -16,18 +16,18 @@ ) from .treenode import PathType, TreeNode -""" -DEVELOPERS' NOTE ----------------- -The idea of this module is to create a `DataTree` class which inherits the tree structure from TreeNode, and also copies -the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every -node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin -classes (in ops.py) which each define part of `xarray.Dataset`'s extensive API. +# """ +# DEVELOPERS' NOTE +# ---------------- +# The idea of this module is to create a `DataTree` class which inherits the tree structure from TreeNode, and also copies +# the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every +# node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin +# classes (in ops.py) which each define part of `xarray.Dataset`'s extensive API. -Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine to inherit unaltered -(normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new -tree) and some will get overridden by the class definition of DataTree. -""" +# Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine to inherit unaltered +# (normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new +# tree) and some will get overridden by the class definition of DataTree. +# """ class DataTree( @@ -540,8 +540,8 @@ def to_netcdf( """ Write datatree contents to a netCDF file. - Paramters - --------- + Parameters + ---------- filepath : str or Path Path to which to save this datatree. mode : {"w", "a"}, default: "w" @@ -578,7 +578,7 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): Write datatree contents to a Zarr store. 
Parameters - --------- + ---------- store : MutableMapping, str or Path, optional Store or path to directory in file system mode : {{"w", "w-", "a", "r+", None}, default: "w" diff --git a/xarray/datatree_/datatree/ops.py b/xarray/datatree_/datatree/ops.py index e411c973c99..ee55ccfe4c2 100644 --- a/xarray/datatree_/datatree/ops.py +++ b/xarray/datatree_/datatree/ops.py @@ -213,16 +213,16 @@ def method_name(self, *args, **kwargs): if wrap_func is map_over_subtree: # Add a paragraph to the method's docstring explaining how it's been mapped orig_method_docstring = orig_method.__doc__ - if orig_method_docstring is not None: - if "\n" in orig_method_docstring: - new_method_docstring = orig_method_docstring.replace( - "\n", _MAPPED_DOCSTRING_ADDENDUM, 1 - ) - else: - new_method_docstring = ( - orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" - ) - setattr(target_cls_dict[method_name], "__doc__", new_method_docstring) + # if orig_method_docstring is not None: + # if "\n" in orig_method_docstring: + # new_method_docstring = orig_method_docstring.replace( + # "\n", _MAPPED_DOCSTRING_ADDENDUM, 1 + # ) + # else: + # new_method_docstring = ( + # orig_method_docstring + f"\n\n{_MAPPED_DOCSTRING_ADDENDUM}" + # ) + setattr(target_cls_dict[method_name], "__doc__", orig_method_docstring) class MappedDatasetMethodsMixin: diff --git a/xarray/datatree_/docs/Makefile b/xarray/datatree_/docs/Makefile new file mode 100644 index 00000000000..9b5b6042838 --- /dev/null +++ b/xarray/datatree_/docs/Makefile @@ -0,0 +1,177 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = +BUILDDIR = _build + +# User-friendly check for sphinx-build +ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) +$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) +endif + +# Internal variables. 
+PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source +# the i18n builder cannot share the environment and doctrees with the others +I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source + +.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext + +help: + @echo "Please use \`make ' where is one of" + @echo " html to make standalone HTML files" + @echo " dirhtml to make HTML files named index.html in directories" + @echo " singlehtml to make a single large HTML file" + @echo " pickle to make pickle files" + @echo " json to make JSON files" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " qthelp to make HTML files and a qthelp project" + @echo " devhelp to make HTML files and a Devhelp project" + @echo " epub to make an epub" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " latexpdf to make LaTeX files and run them through pdflatex" + @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" + @echo " text to make text files" + @echo " man to make manual pages" + @echo " texinfo to make Texinfo files" + @echo " info to make Texinfo files and run them through makeinfo" + @echo " gettext to make PO message catalogs" + @echo " changes to make an overview of all changed/added/deprecated items" + @echo " xml to make Docutils-native XML files" + @echo " pseudoxml to make pseudoxml-XML files for display purposes" + @echo " linkcheck to check all external links for integrity" + @echo " doctest to run all doctests embedded in the documentation (if enabled)" + +clean: + rm -rf $(BUILDDIR)/* + +html: + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." + +dirhtml: + $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." + +singlehtml: + $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml + @echo + @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." + +pickle: + $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle + @echo + @echo "Build finished; now you can process the pickle files." + +json: + $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json + @echo + @echo "Build finished; now you can process the JSON files." + +htmlhelp: + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in $(BUILDDIR)/htmlhelp." + +qthelp: + $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp + @echo + @echo "Build finished; now you can run "qcollectiongenerator" with the" \ + ".qhcp project file in $(BUILDDIR)/qthelp, like this:" + @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/complexity.qhcp" + @echo "To view the help file:" + @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/complexity.qhc" + +devhelp: + $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp + @echo + @echo "Build finished." + @echo "To view the help file:" + @echo "# mkdir -p $$HOME/.local/share/devhelp/complexity" + @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/complexity" + @echo "# devhelp" + +epub: + $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub + @echo + @echo "Build finished. 
The epub file is in $(BUILDDIR)/epub." + +latex: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo + @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." + @echo "Run \`make' in that directory to run these through (pdf)latex" \ + "(use \`make latexpdf' here to do that automatically)." + +latexpdf: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through pdflatex..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +latexpdfja: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through platex and dvipdfmx..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +text: + $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text + @echo + @echo "Build finished. The text files are in $(BUILDDIR)/text." + +man: + $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man + @echo + @echo "Build finished. The manual pages are in $(BUILDDIR)/man." + +texinfo: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo + @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." + @echo "Run \`make' in that directory to run these through makeinfo" \ + "(use \`make info' here to do that automatically)." + +info: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo "Running Texinfo files through makeinfo..." + make -C $(BUILDDIR)/texinfo info + @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." + +gettext: + $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale + @echo + @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." + +changes: + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes + @echo + @echo "The overview file is in $(BUILDDIR)/changes." + +linkcheck: + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in $(BUILDDIR)/linkcheck/output.txt." + +doctest: + $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest + @echo "Testing of doctests in the sources finished, look at the " \ + "results in $(BUILDDIR)/doctest/output.txt." + +xml: + $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml + @echo + @echo "Build finished. The XML files are in $(BUILDDIR)/xml." + +pseudoxml: + $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml + @echo + @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." diff --git a/xarray/datatree_/docs/make.bat b/xarray/datatree_/docs/make.bat new file mode 100644 index 00000000000..2df9a8cbbb6 --- /dev/null +++ b/xarray/datatree_/docs/make.bat @@ -0,0 +1,242 @@ +@ECHO OFF + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set BUILDDIR=_build +set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . +set I18NSPHINXOPTS=%SPHINXOPTS% . +if NOT "%PAPER%" == "" ( + set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% + set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% +) + +if "%1" == "" goto help + +if "%1" == "help" ( + :help + echo.Please use `make ^` where ^ is one of + echo. html to make standalone HTML files + echo. dirhtml to make HTML files named index.html in directories + echo. singlehtml to make a single large HTML file + echo. pickle to make pickle files + echo. json to make JSON files + echo. 
htmlhelp to make HTML files and a HTML help project + echo. qthelp to make HTML files and a qthelp project + echo. devhelp to make HTML files and a Devhelp project + echo. epub to make an epub + echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter + echo. text to make text files + echo. man to make manual pages + echo. texinfo to make Texinfo files + echo. gettext to make PO message catalogs + echo. changes to make an overview over all changed/added/deprecated items + echo. xml to make Docutils-native XML files + echo. pseudoxml to make pseudoxml-XML files for display purposes + echo. linkcheck to check all external links for integrity + echo. doctest to run all doctests embedded in the documentation if enabled + goto end +) + +if "%1" == "clean" ( + for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i + del /q /s %BUILDDIR%\* + goto end +) + + +%SPHINXBUILD% 2> nul +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "html" ( + %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/html. + goto end +) + +if "%1" == "dirhtml" ( + %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. + goto end +) + +if "%1" == "singlehtml" ( + %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. + goto end +) + +if "%1" == "pickle" ( + %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the pickle files. + goto end +) + +if "%1" == "json" ( + %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the JSON files. + goto end +) + +if "%1" == "htmlhelp" ( + %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run HTML Help Workshop with the ^ +.hhp project file in %BUILDDIR%/htmlhelp. + goto end +) + +if "%1" == "qthelp" ( + %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run "qcollectiongenerator" with the ^ +.qhcp project file in %BUILDDIR%/qthelp, like this: + echo.^> qcollectiongenerator %BUILDDIR%\qthelp\complexity.qhcp + echo.To view the help file: + echo.^> assistant -collectionFile %BUILDDIR%\qthelp\complexity.ghc + goto end +) + +if "%1" == "devhelp" ( + %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. + goto end +) + +if "%1" == "epub" ( + %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The epub file is in %BUILDDIR%/epub. + goto end +) + +if "%1" == "latex" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 
+ goto end +) + +if "%1" == "latexpdf" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + cd %BUILDDIR%/latex + make all-pdf + cd %BUILDDIR%/.. + echo. + echo.Build finished; the PDF files are in %BUILDDIR%/latex. + goto end +) + +if "%1" == "latexpdfja" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + cd %BUILDDIR%/latex + make all-pdf-ja + cd %BUILDDIR%/.. + echo. + echo.Build finished; the PDF files are in %BUILDDIR%/latex. + goto end +) + +if "%1" == "text" ( + %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The text files are in %BUILDDIR%/text. + goto end +) + +if "%1" == "man" ( + %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The manual pages are in %BUILDDIR%/man. + goto end +) + +if "%1" == "texinfo" ( + %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. + goto end +) + +if "%1" == "gettext" ( + %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The message catalogs are in %BUILDDIR%/locale. + goto end +) + +if "%1" == "changes" ( + %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes + if errorlevel 1 exit /b 1 + echo. + echo.The overview file is in %BUILDDIR%/changes. + goto end +) + +if "%1" == "linkcheck" ( + %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck + if errorlevel 1 exit /b 1 + echo. + echo.Link check complete; look for any errors in the above output ^ +or in %BUILDDIR%/linkcheck/output.txt. + goto end +) + +if "%1" == "doctest" ( + %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest + if errorlevel 1 exit /b 1 + echo. + echo.Testing of doctests in the sources finished, look at the ^ +results in %BUILDDIR%/doctest/output.txt. + goto end +) + +if "%1" == "xml" ( + %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The XML files are in %BUILDDIR%/xml. + goto end +) + +if "%1" == "pseudoxml" ( + %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. + goto end +) + +:end diff --git a/xarray/datatree_/docs/requirements.txt b/xarray/datatree_/docs/requirements.txt new file mode 100644 index 00000000000..6a10e1ab22f --- /dev/null +++ b/xarray/datatree_/docs/requirements.txt @@ -0,0 +1,3 @@ +sphinx>=3.1 +sphinx_copybutton +sphinx-autosummary-accessors diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst new file mode 100644 index 00000000000..ee58db54af2 --- /dev/null +++ b/xarray/datatree_/docs/source/api.rst @@ -0,0 +1,155 @@ +.. currentmodule:: datatree + +############# +API reference +############# + +DataTree +======== + +.. autosummary:: + :toctree: generated/ + + DataTree + DataNode + +Attributes +---------- + +.. autosummary:: + :toctree: generated/ + + DataTree.dims + DataTree.variables + DataTree.encoding + DataTree.sizes + DataTree.attrs + DataTree.nbytes + DataTree.indexes + DataTree.xindexes + DataTree.coords + DataTree.chunks + DataTree.real + DataTree.imag + DataTree.ds + DataTree.has_data + DataTree.groups + +Dictionary interface +-------------------- + +.. autosummary:: + :toctree: generated/ + + DataTree.__getitem__ + DataTree.__setitem__ + DataTree.update + +Methods +------- + +.. 
autosummary:: + :toctree: generated/ + + DataTree.load + DataTree.compute + DataTree.persist + DataTree.unify_chunks + DataTree.chunk + DataTree.map_blocks + DataTree.copy + DataTree.as_numpy + DataTree.__copy__ + DataTree.__deepcopy__ + DataTree.set_coords + DataTree.reset_coords + DataTree.info + DataTree.isel + DataTree.sel + DataTree.head + DataTree.tail + DataTree.thin + DataTree.broadcast_like + DataTree.reindex_like + DataTree.reindex + DataTree.interp + DataTree.interp_like + DataTree.rename + DataTree.rename_dims + DataTree.rename_vars + DataTree.swap_dims + DataTree.expand_dims + DataTree.set_index + DataTree.reset_index + DataTree.reorder_levels + DataTree.stack + DataTree.unstack + DataTree.update + DataTree.merge + DataTree.drop_vars + DataTree.drop_sel + DataTree.drop_isel + DataTree.drop_dims + DataTree.transpose + DataTree.dropna + DataTree.fillna + DataTree.interpolate_na + DataTree.ffill + DataTree.bfill + DataTree.combine_first + DataTree.reduce + DataTree.map + DataTree.assign + DataTree.diff + DataTree.shift + DataTree.roll + DataTree.sortby + DataTree.quantile + DataTree.rank + DataTree.differentiate + DataTree.integrate + DataTree.cumulative_integrate + DataTree.filter_by_attrs + DataTree.polyfit + DataTree.pad + DataTree.idxmin + DataTree.idxmax + DataTree.argmin + DataTree.argmax + DataTree.query + DataTree.curvefit + DataTree.squeeze + DataTree.clip + DataTree.assign_coords + DataTree.where + DataTree.close + DataTree.isnull + DataTree.notnull + DataTree.isin + DataTree.astype + +Utilities +========= + +.. autosummary:: + :toctree: generated/ + + map_over_subtree + +I/O +=== + +.. autosummary:: + :toctree: generated/ + + open_datatree + DataTree.to_netcdf + DataTree.to_zarr + +.. + Missing + DataTree.__delitem__ + DataTree.get + DataTree.items + DataTree.keys + DataTree.values diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py new file mode 100644 index 00000000000..e89e2656b3a --- /dev/null +++ b/xarray/datatree_/docs/source/conf.py @@ -0,0 +1,283 @@ +# -*- coding: utf-8 -*- +# flake8: noqa +# Ignoring F401: imported but unused + +# complexity documentation build configuration file, created by +# sphinx-quickstart on Tue Jul 9 22:26:36 2013. +# +# This file is execfile()d with the current directory set to its containing dir. +# +# Note that not all possible configuration values are present in this +# autogenerated file. +# +# All configuration values have a default; values that are commented out +# serve to show the default. + +import os +import sys + +import sphinx_autosummary_accessors + +import datatree + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# sys.path.insert(0, os.path.abspath('.')) + +cwd = os.getcwd() +parent = os.path.dirname(cwd) +sys.path.insert(0, parent) + + +# -- General configuration ----------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be extensions +# coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 
+extensions = [ + "numpydoc", + "sphinx.ext.autodoc", + "sphinx.ext.viewcode", + "sphinx.ext.autosummary", + "sphinx.ext.intersphinx", + "sphinx.ext.extlinks", + "sphinx.ext.napoleon", +] + +extlinks = { + "issue": ("https://github.com/TomNicholas/datatree/issues/%s", "GH#"), + "pr": ("https://github.com/TomNicholas/datatree/pull/%s", "GH#"), +} +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates", sphinx_autosummary_accessors.templates_path] + +# Generate the API documentation when building +autosummary_generate = True + +# The suffix of source filenames. +source_suffix = ".rst" + +# The encoding of source files. +# source_encoding = 'utf-8-sig' + +# The master toctree document. +master_doc = "index" + +# General information about the project. +project = "Datatree" +copyright = "2021 onwards, Tom Nicholas and its Contributors" +author = "Tom Nicholas" + +# The version info for the project you're documenting, acts as replacement for +# |version| and |release|, also used in various other places throughout the +# built documents. +# +# The short X.Y version. +version = "0.0.0" # datatree.__version__ +# The full version, including alpha/beta/rc tags. +release = "0.0.0" # datatree.__version__ + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# language = None + +# There are two options for replacing |today|: either, you set today to some +# non-false value, then it is used: +# today = '' +# Else, today_fmt is used as the format for a strftime call. +# today_fmt = '%B %d, %Y' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +exclude_patterns = ["_build"] + +# The reST default role (used for this markup: `text`) to use for all documents. +# default_role = None + +# If true, '()' will be appended to :func: etc. cross-reference text. +# add_function_parentheses = True + +# If true, the current module name will be prepended to all description +# unit titles (such as .. function::). +# add_module_names = True + +# If true, sectionauthor and moduleauthor directives will be shown in the +# output. They are ignored by default. +# show_authors = False + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = "sphinx" + +# A list of ignored prefixes for module index sorting. +# modindex_common_prefix = [] + +# If true, keep warnings as "system message" paragraphs in the built documents. +# keep_warnings = False + + +# -- Intersphinx links --------------------------------------------------------- + +intersphinx_mapping = { + "python": ("https://docs.python.org/3.8/", None), + "xarray": ("https://xarray.pydata.org/en/stable/", None), +} + +# -- Options for HTML output --------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +html_theme = "sphinx_rtd_theme" + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# html_theme_options = {} + +# Add any paths that contain custom themes here, relative to this directory. +# html_theme_path = [] + +# The name for this set of Sphinx documents. If None, it defaults to +# " v documentation". +# html_title = None + +# A shorter title for the navigation bar. Default is the same as html_title. 
+# html_short_title = None + +# The name of an image file (relative to this directory) to place at the top +# of the sidebar. +# html_logo = None + +# The name of an image file (within the static path) to use as favicon of the +# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 +# pixels large. +# html_favicon = None + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +# html_static_path = ['_static'] + +# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, +# using the given strftime format. +# html_last_updated_fmt = '%b %d, %Y' + +# If true, SmartyPants will be used to convert quotes and dashes to +# typographically correct entities. +# html_use_smartypants = True + +# Custom sidebar templates, maps document names to template names. +# html_sidebars = {} + +# Additional templates that should be rendered to pages, maps page names to +# template names. +# html_additional_pages = {} + +# If false, no module index is generated. +# html_domain_indices = True + +# If false, no index is generated. +# html_use_index = True + +# If true, the index is split into individual pages for each letter. +# html_split_index = False + +# If true, links to the reST sources are added to the pages. +# html_show_sourcelink = True + +# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. +# html_show_sphinx = True + +# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. +# html_show_copyright = True + +# If true, an OpenSearch description file will be output, and all pages will +# contain a tag referring to it. The value of this option must be the +# base URL from which the finished HTML is served. +# html_use_opensearch = '' + +# This is the file name suffix for HTML files (e.g. ".xhtml"). +# html_file_suffix = None + +# Output file base name for HTML help builder. +htmlhelp_basename = "datatree_doc" + + +# -- Options for LaTeX output -------------------------------------------------- + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # 'papersize': 'letterpaper', + # The font size ('10pt', '11pt' or '12pt'). + # 'pointsize': '10pt', + # Additional stuff for the LaTeX preamble. + # 'preamble': '', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, author, documentclass [howto/manual]). +latex_documents = [ + ("index", "datatree.tex", "Datatree Documentation", author, "manual") +] + +# The name of an image file (relative to this directory) to place at the top of +# the title page. +# latex_logo = None + +# For "manual" documents, if this is true, then toplevel headings are parts, +# not chapters. +# latex_use_parts = False + +# If true, show page references after internal links. +# latex_show_pagerefs = False + +# If true, show URL addresses after external links. +# latex_show_urls = False + +# Documents to append as an appendix to all manuals. +# latex_appendices = [] + +# If false, no module index is generated. +# latex_domain_indices = True + + +# -- Options for manual page output -------------------------------------------- + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). 
+man_pages = [("index", "datatree", "Datatree Documentation", [author], 1)] + +# If true, show URL addresses after external links. +# man_show_urls = False + + +# -- Options for Texinfo output ------------------------------------------------ + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + ( + "index", + "datatree", + "Datatree Documentation", + author, + "datatree", + "Tree-like hierarchical data structure for xarray.", + "Miscellaneous", + ) +] + +# Documents to append as an appendix to all manuals. +# texinfo_appendices = [] + +# If false, no module index is generated. +# texinfo_domain_indices = True + +# How to display URL addresses: 'footnote', 'no', or 'inline'. +# texinfo_show_urls = 'footnote' + +# If true, do not generate a @detailmenu in the "Top" node's menu. +# texinfo_no_detailmenu = False diff --git a/xarray/datatree_/docs/source/contributing.rst b/xarray/datatree_/docs/source/contributing.rst new file mode 100644 index 00000000000..b070c07c867 --- /dev/null +++ b/xarray/datatree_/docs/source/contributing.rst @@ -0,0 +1,136 @@ +======================== +Contributing to Datatree +======================== + +Contributions are highly welcomed and appreciated. Every little help counts, +so do not hesitate! + +.. contents:: Contribution links + :depth: 2 + +.. _submitfeedback: + +Feature requests and feedback +----------------------------- + +Do you like Datatree? Share some love on Twitter or in your blog posts! + +We'd also like to hear about your propositions and suggestions. Feel free to +`submit them as issues `_ and: + +* Explain in detail how they should work. +* Keep the scope as narrow as possible. This will make it easier to implement. + +.. _reportbugs: + +Report bugs +----------- + +Report bugs for Datatree in the `issue tracker `_. + +If you are reporting a bug, please include: + +* Your operating system name and version. +* Any details about your local setup that might be helpful in troubleshooting, + specifically the Python interpreter version, installed libraries, and Datatree + version. +* Detailed steps to reproduce the bug. + +If you can write a demonstration test that currently fails but should pass +(xfail), that is a very useful commit to make as well, even if you cannot +fix the bug itself. + +.. _fixbugs: + +Fix bugs +-------- + +Look through the `GitHub issues for bugs `_. + +Talk to developers to find out how you can fix specific bugs. + +Write documentation +------------------- + +Datatree could always use more documentation. What exactly is needed? + +* More complementary documentation. Have you perhaps found something unclear? +* Docstrings. There can never be too many of them. +* Blog posts, articles and such -- they're all very appreciated. + +You can also edit documentation files directly in the GitHub web interface, +without using a local copy. This can be convenient for small fixes. + +To build the documentation locally, you first need to install the following +tools: + +- `Sphinx `__ +- `sphinx_rtd_theme `__ +- `sphinx-autosummary-accessors `__ + +You can then build the documentation with the following commands:: + + $ cd docs + $ make html + +The built documentation should be available in the ``docs/_build/`` folder. + +.. _`pull requests`: +.. _pull-requests: + +Preparing Pull Requests +----------------------- + +#. Fork the + `Datatree GitHub repository `__. 
It's + fine to use ``Datatree`` as your fork repository name because it will live + under your user. + +#. Clone your fork locally using `git `_ and create a branch:: + + $ git clone git@github.com:{YOUR_GITHUB_USERNAME}/Datatree.git + $ cd Datatree + + # now, to fix a bug or add feature create your own branch off "master": + + $ git checkout -b your-bugfix-feature-branch-name master + +#. Install `pre-commit `_ and its hook on the Datatree repo:: + + $ pip install --user pre-commit + $ pre-commit install + + Afterwards ``pre-commit`` will run whenever you commit. + + https://pre-commit.com/ is a framework for managing and maintaining multi-language pre-commit hooks + to ensure code-style and code formatting is consistent. + +#. Install dependencies into a new conda environment:: + + $ conda env update -f ci/environment.yml + +#. Run all the tests + + Now running tests is as simple as issuing this command:: + + $ conda activate datatree-dev + $ pytest --junitxml=test-reports/junit.xml --cov=./ --verbose + + This command will run tests via the "pytest" tool. + +#. You can now edit your local working copy and run the tests again as necessary. Please follow PEP-8 for naming. + + When committing, ``pre-commit`` will re-format the files if necessary. + +#. Commit and push once your tests pass and you are happy with your change(s):: + + $ git commit -a -m "" + $ git push -u + +#. Finally, submit a pull request through the GitHub website using this data:: + + head-fork: YOUR_GITHUB_USERNAME/Datatree + compare: your-branch-name + + base-fork: TomNicholas/datatree + base: master diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst new file mode 100644 index 00000000000..fa1604101cd --- /dev/null +++ b/xarray/datatree_/docs/source/index.rst @@ -0,0 +1,20 @@ +Datatree +======== + +**Datatree is a WIP implementation of a tree-like hierarchical data structure for xarray.** + + +.. toctree:: + :maxdepth: 2 + :caption: Documentation Contents + + installation + tutorial + api + contributing + +Feedback +-------- + +If you encounter any errors or problems with **Datatree**, please open an issue +on `GitHub `_. diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst new file mode 100644 index 00000000000..c4e4c7fc468 --- /dev/null +++ b/xarray/datatree_/docs/source/installation.rst @@ -0,0 +1,5 @@ +============ +Installation +============ + +Coming soon! diff --git a/xarray/datatree_/docs/source/tutorial.rst b/xarray/datatree_/docs/source/tutorial.rst new file mode 100644 index 00000000000..e70044c2aa9 --- /dev/null +++ b/xarray/datatree_/docs/source/tutorial.rst @@ -0,0 +1,5 @@ +======== +Tutorial +======== + +Coming soon! 
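A minimal usage sketch of the constructor API that the patches above converge on (the DataNode helper is folded into DataTree itself, and dictionary-style construction moves to DataTree.from_dict). This is illustrative only, based on the DataTree(name, data=..., parent=...) signature and the DataTree.from_dict(data_objects=...) behaviour exercised in the test diffs; the dataset and node names are placeholders:

    import xarray as xr
    from datatree import DataTree

    ds_hi = xr.Dataset({"a": ("x", [1, 2, 3])})
    ds_lo = xr.Dataset({"a": ("x", [1, 2])})

    # Build a tree explicitly, node by node
    root = DataTree(name="root")
    DataTree(name="highres", parent=root, data=ds_hi)

    # ... or build a whole tree at once from a mapping of path names to
    # Datasets, creating any intermediate nodes that are needed
    dt = DataTree.from_dict(data_objects={"highres/run": ds_hi, "lowres/run": ds_lo})
    dt.render()                    # print the tree structure and per-node data
    print(dt["highres/run"].ds)    # the Dataset stored at the "run" node
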
From 251fbc694cf82fed2b8154d528d3d1fc6e847f1c Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 15 Dec 2021 12:31:21 -0500 Subject: [PATCH 073/260] Forbid .ds=None https://github.com/xarray-contrib/datatree/pull/49 * ensure .ds returns an instance of Dataset, not None * change tests to pass * relax _check_isomorphic to not care about presence or absence of data --- xarray/datatree_/datatree/datatree.py | 20 +++++++++++-------- xarray/datatree_/datatree/mapping.py | 20 +++++-------------- .../datatree_/datatree/tests/test_datatree.py | 15 ++++++-------- .../datatree_/datatree/tests/test_mapping.py | 10 ---------- 4 files changed, 23 insertions(+), 42 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 46e3d6c92d2..ccf67951d1a 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -106,19 +106,23 @@ def ds(self, data: Union[Dataset, DataArray] = None): raise TypeError( f"{type(data)} object is not an xarray Dataset, DataArray, or None" ) + if isinstance(data, DataArray): data = data.to_dataset() - if data is not None: - for var in list(data.variables): - if var in list(c.name for c in self.children): - raise KeyError( - f"Cannot add variable named {var}: node already has a child named {var}" - ) + elif data is None: + data = Dataset() + + for var in list(data.variables): + if var in list(c.name for c in self.children): + raise KeyError( + f"Cannot add variable named {var}: node already has a child named {var}" + ) + self._ds = data @property - def has_data(self): - return self.ds is not None + def has_data(self) -> bool: + return len(self.ds.variables) > 0 @classmethod def from_dict( diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index dc0fb913f15..00e5b7f559e 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -17,13 +17,12 @@ def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): """ Check that two trees have the same structure, raising an error if not. - Does not check the actual data in the nodes, but it does check that if one node does/doesn't have data then its - counterpart in the other tree also does/doesn't have data. + Does not compare the actual data in the nodes. Also does not check that the root nodes of each tree have the same parent - so this function checks that subtrees are isomorphic, not the entire tree above (if it exists). - Can optionally check if respective nodes should have the same name. + Can optionally check if corresponding nodes should have the same name. Parameters ---------- @@ -37,8 +36,8 @@ def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): TypeError If either subtree_a or subtree_b are not tree objects. TreeIsomorphismError - If subtree_a and subtree_b are tree objects, but are not isomorphic to one another, or one contains data at a - location the other does not. Also optionally raised if their structure is isomorphic, but the names of any two + If subtree_a and subtree_b are tree objects, but are not isomorphic to one another. + Also optionally raised if their structure is isomorphic, but the names of any two respective nodes are not equal. """ # TODO turn this into a public function called assert_isomorphic @@ -66,15 +65,6 @@ def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): f"second tree has name '{node_b.name}'." 
) - if node_a.has_data != node_b.has_data: - dat_a = "no " if not node_a.has_data else "" - dat_b = "no " if not node_b.has_data else "" - raise TreeIsomorphismError( - f"Trees are not isomorphic because node '{path_a}' in the first tree has " - f"{dat_a}data, whereas its counterpart node '{path_b}' in the second tree " - f"has {dat_b}data." - ) - if len(node_a.children) != len(node_b.children): raise TreeIsomorphismError( f"Trees are not isomorphic because node '{path_a}' in the first tree has " @@ -89,7 +79,7 @@ def map_over_subtree(func): Applies a function to every dataset in one or more subtrees, returning new trees which store the results. - The function will be applied to any dataset stored in any of the nodes in the trees. The returned trees will have + The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees will have the same structure as the supplied trees. `func` needs to return one Datasets, DataArrays, or None in order to be able to rebuild the subtrees after diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 7d1569876bd..3437ffe6390 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -14,10 +14,7 @@ def assert_tree_equal(dt_a, dt_b): for a, b in zip(dt_a.subtree, dt_b.subtree): assert a.name == b.name assert a.pathstr == b.pathstr - if a.has_data: - assert a.ds.equals(b.ds) - else: - assert a.ds is b.ds + assert a.ds.equals(b.ds) def create_test_datatree(modify=lambda ds: ds): @@ -183,17 +180,17 @@ def test_set_new_empty_node(self): john["mary"] = None mary = john["mary"] assert isinstance(mary, DataTree) - assert mary.ds is None + assert_identical(mary.ds, xr.Dataset()) def test_overwrite_data_in_node_with_none(self): john = DataTree("john") mary = DataTree("mary", parent=john, data=xr.Dataset()) john["mary"] = None - assert mary.ds is None + assert_identical(mary.ds, xr.Dataset()) john.ds = xr.Dataset() john["/"] = None - assert john.ds is None + assert_identical(john.ds, xr.Dataset()) def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) @@ -249,7 +246,7 @@ def test_empty(self): assert dt.name == "root" assert dt.parent is None assert dt.children == () - assert dt.ds is None + assert_identical(dt.ds, xr.Dataset()) def test_data_in_root(self): dat = xr.Dataset() @@ -262,7 +259,7 @@ def test_data_in_root(self): def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) - assert dt.ds is None + assert_identical(dt.ds, xr.Dataset()) assert dt["run1"].ds is dat1 assert dt["run1"].children == () assert dt["run2"].ds is dat2 diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 050bbbf6c9f..f6082b5f285 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -35,16 +35,6 @@ def test_different_heights(self): with pytest.raises(TreeIsomorphismError, match=expected_err_str): _check_isomorphic(dt1, dt2) - def test_only_one_has_data(self): - dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset({"a": 0})}) - dt2 = DataTree.from_dict(data_objects={"a": None}) - expected_err_str = ( - "'root/a' in the first tree has data, whereas its counterpart node 'root/a' in the " - "second tree has no data" - ) - with pytest.raises(TreeIsomorphismError, match=expected_err_str): - 
_check_isomorphic(dt1, dt2) - def test_names_different(self): dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) dt2 = DataTree.from_dict(data_objects={"b": empty}) From ca6830fdde7a627e0d8efee4c5d270f9e3150ca6 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 16 Dec 2021 22:22:36 -0500 Subject: [PATCH 074/260] Add assert_equal functions https://github.com/xarray-contrib/datatree/pull/50 * added assert functions * pseudocode for printing differences between trees - need tests * equals->identical fix * refactored to use same diff function for asserts and to check isomorphism internally * added tests of tree diff formatting * added option to check trees from the root * fix bugs with assert functions --- xarray/datatree_/conftest.py | 3 + xarray/datatree_/datatree/datatree.py | 111 +++++++++++++++- xarray/datatree_/datatree/formatting.py | 46 +++++++ xarray/datatree_/datatree/mapping.py | 100 ++++++++++----- xarray/datatree_/datatree/testing.py | 120 ++++++++++++++++++ .../datatree/tests/test_formatting.py | 63 +++++++++ .../datatree_/datatree/tests/test_mapping.py | 41 +++--- 7 files changed, 428 insertions(+), 56 deletions(-) create mode 100644 xarray/datatree_/conftest.py create mode 100644 xarray/datatree_/datatree/formatting.py create mode 100644 xarray/datatree_/datatree/testing.py create mode 100644 xarray/datatree_/datatree/tests/test_formatting.py diff --git a/xarray/datatree_/conftest.py b/xarray/datatree_/conftest.py new file mode 100644 index 00000000000..7ef19174298 --- /dev/null +++ b/xarray/datatree_/conftest.py @@ -0,0 +1,3 @@ +import pytest + +pytest.register_assert_rewrite("datatree.testing") diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ccf67951d1a..3874c14099b 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,14 +1,14 @@ from __future__ import annotations import textwrap -from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Union +from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Tuple, Union import anytree from xarray import DataArray, Dataset, merge from xarray.core import dtypes, utils from xarray.core.variable import Variable -from .mapping import map_over_subtree +from .mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from .ops import ( DataTreeArithmeticMixin, MappedDatasetMethodsMixin, @@ -409,12 +409,113 @@ def __setitem__( def nbytes(self) -> int: return sum(node.ds.nbytes if node.has_data else 0 for node in self.subtree) + def isomorphic( + self, + other: DataTree, + from_root=False, + strict_names=False, + ) -> bool: + """ + Two DataTrees are considered isomorphic if every node has the same number of children. + + Nothing about the data in each node is checked. + + Isomorphism is a necessary condition for two trees to be used in a nodewise binary operation, + such as tree1 + tree2. + + By default this method does not check any part of the tree above the given node. + Therefore this method can be used as default to check that two subtrees are isomorphic. + + Parameters + ---------- + other : DataTree + The tree object to compare to. + from_root : bool, optional, default is False + Whether or not to first traverse to the root of the trees before checking for isomorphism. + If a & b have no parents then this has no effect. 
+ + See Also + -------- + DataTree.equals + DataTree.identical + """ + try: + check_isomorphic( + self, + other, + require_names_equal=strict_names, + check_from_root=from_root, + ) + return True + except (TypeError, TreeIsomorphismError): + return False + + def equals(self, other: DataTree, from_root=True) -> bool: + """ + Two DataTrees are equal if they have isomorphic node structures, with matching node names, + and if they have matching variables and coordinates, all of which are equal. + + By default this method will check the whole tree above the given node. + + Parameters + ---------- + other : DataTree + The tree object to compare to. + from_root : bool, optional, default is True + Whether or not to first traverse to the root of the trees before checking. + If a & b have no parents then this has no effect. + + See Also + -------- + Dataset.equals + DataTree.isomorphic + DataTree.identical + """ + if not self.isomorphic(other, from_root=from_root, strict_names=True): + return False + + return all( + [ + node.ds.equals(other_node.ds) + for node, other_node in zip(self.subtree, other.subtree) + ] + ) + + def identical(self, other: DataTree, from_root=True) -> bool: + """ + Like equals, but will also check all dataset attributes and the attributes on + all variables and coordinates. + + By default this method will check the whole tree above the given node. + + Parameters + ---------- + other : DataTree + The tree object to compare to. + from_root : bool, optional, default is True + Whether or not to first traverse to the root of the trees before checking. + If a & b have no parents then this has no effect. + + See Also + -------- + Dataset.identical + DataTree.isomorphic + DataTree.equals + """ + if not self.isomorphic(other, from_root=from_root, strict_names=True): + return False + + return all( + node.ds.identical(other_node.ds) + for node, other_node in zip(self.subtree, other.subtree) + ) + def map_over_subtree( self, func: Callable, *args: Iterable[Any], **kwargs: Any, - ) -> DataTree: + ) -> DataTree | Tuple[DataTree]: """ Apply a function to every dataset in this subtree, returning a new tree which stores the results. @@ -437,8 +538,8 @@ def map_over_subtree( Returns ------- - subtree : DataTree - Subtree containing results from applying ``func`` to the dataset at each node. + subtrees : DataTree, Tuple of DataTrees + One or more subtrees containing results from applying ``func`` to the data at each node. """ # TODO this signature means that func has no way to know which node it is being called upon - change? 
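A rough sketch of how these three comparison methods are expected to relate, based on the tests added in this patch (the example datasets are illustrative)::

    import xarray as xr
    from datatree import DataTree

    dt1 = DataTree.from_dict({"a": xr.Dataset({"x": 1}), "a/b": None})
    dt2 = DataTree.from_dict({"a": xr.Dataset({"x": 1, "y": 2}), "a/b": None})

    dt1.isomorphic(dt2)  # True  - same number of children at every node, data ignored
    dt1.equals(dt2)      # False - node "a" has a differing set of variables
    dt1.identical(dt2)   # False - stricter again: also compares attributes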
diff --git a/xarray/datatree_/datatree/formatting.py b/xarray/datatree_/datatree/formatting.py new file mode 100644 index 00000000000..9a03be3a0ca --- /dev/null +++ b/xarray/datatree_/datatree/formatting.py @@ -0,0 +1,46 @@ +from xarray.core.formatting import _compat_to_str, diff_dataset_repr + +from .mapping import diff_treestructure + + +def diff_nodewise_summary(a, b, compat): + """Iterates over all corresponding nodes, recording differences between data at each location.""" + + compat_str = _compat_to_str(compat) + + summary = [] + for node_a, node_b in zip(a.subtree, b.subtree): + a_ds, b_ds = node_a.ds, node_b.ds + + if not a_ds._all_compat(b_ds, compat): + path = node_a.pathstr + dataset_diff = diff_dataset_repr(a_ds, b_ds, compat_str) + data_diff = "\n".join(dataset_diff.split("\n", 1)[1:]) + + nodediff = ( + f"\nData in nodes at position '{path}' do not match:" f"{data_diff}" + ) + summary.append(nodediff) + + return "\n".join(summary) + + +def diff_tree_repr(a, b, compat): + summary = [ + f"Left and right {type(a).__name__} objects are not {_compat_to_str(compat)}" + ] + + # TODO check root parents? + + strict_names = True if compat in ["equals", "identical"] else False + treestructure_diff = diff_treestructure(a, b, strict_names) + + # If the trees structures are different there is no point comparing each node + # TODO we could show any differences in nodes up to the first place that structure differs? + if treestructure_diff or compat == "isomorphic": + summary.append("\n" + treestructure_diff) + else: + nodewise_diff = diff_nodewise_summary(a, b, compat) + summary.append("\n" + nodewise_diff) + + return "\n".join(summary) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 00e5b7f559e..29608779614 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -1,11 +1,18 @@ +from __future__ import annotations + import functools from itertools import repeat +from textwrap import dedent +from typing import TYPE_CHECKING, Callable, Tuple from anytree.iterators import LevelOrderIter from xarray import DataArray, Dataset from .treenode import TreeNode +if TYPE_CHECKING: + from .datatree import DataTree + class TreeIsomorphismError(ValueError): """Error raised if two tree objects are not isomorphic to one another when they need to be.""" @@ -13,74 +20,97 @@ class TreeIsomorphismError(ValueError): pass -def _check_isomorphic(subtree_a, subtree_b, require_names_equal=False): +def check_isomorphic( + a: DataTree, + b: DataTree, + require_names_equal=False, + check_from_root=True, +): """ Check that two trees have the same structure, raising an error if not. Does not compare the actual data in the nodes. - Also does not check that the root nodes of each tree have the same parent - so this function checks that subtrees - are isomorphic, not the entire tree above (if it exists). + By default this function only checks that subtrees are isomorphic, not the entire tree above (if it exists). + Can instead optionally check the entire trees starting from the root, which will ensure all Can optionally check if corresponding nodes should have the same name. Parameters ---------- - subtree_a : DataTree - subtree_b : DataTree - require_names_equal : Bool, optional - Whether or not to also check that each node has the same name as its counterpart. Default is False. + a : DataTree + b : DataTree + require_names_equal : Bool + Whether or not to also check that each node has the same name as its counterpart. 
+ check_from_root : Bool + Whether or not to first traverse to the root of the trees before checking for isomorphism. + If a & b have no parents then this has no effect. Raises ------ TypeError - If either subtree_a or subtree_b are not tree objects. + If either a or b are not tree objects. TreeIsomorphismError - If subtree_a and subtree_b are tree objects, but are not isomorphic to one another. + If a and b are tree objects, but are not isomorphic to one another. Also optionally raised if their structure is isomorphic, but the names of any two respective nodes are not equal. """ - # TODO turn this into a public function called assert_isomorphic - if not isinstance(subtree_a, TreeNode): - raise TypeError( - f"Argument `subtree_a` is not a tree, it is of type {type(subtree_a)}" - ) - if not isinstance(subtree_b, TreeNode): - raise TypeError( - f"Argument `subtree_b` is not a tree, it is of type {type(subtree_b)}" - ) + if not isinstance(a, TreeNode): + raise TypeError(f"Argument `a` is not a tree, it is of type {type(a)}") + if not isinstance(b, TreeNode): + raise TypeError(f"Argument `b` is not a tree, it is of type {type(b)}") + + if check_from_root: + a = a.root + b = b.root + + diff = diff_treestructure(a, b, require_names_equal=require_names_equal) + + if diff: + raise TreeIsomorphismError("DataTree objects are not isomorphic:\n" + diff) + + +def diff_treestructure(a: DataTree, b: DataTree, require_names_equal: bool) -> str: + """ + Return a summary of why two trees are not isomorphic. + If they are isomorphic return an empty string. + """ # Walking nodes in "level-order" fashion means walking down from the root breadth-first. - # Checking by walking in this way implicitly assumes that the tree is an ordered tree (which it is so long as - # children are stored in a tuple or list rather than in a set). - for node_a, node_b in zip(LevelOrderIter(subtree_a), LevelOrderIter(subtree_b)): + # Checking for isomorphism by walking in this way implicitly assumes that the tree is an ordered tree + # (which it is so long as children are stored in a tuple or list rather than in a set). + for node_a, node_b in zip(LevelOrderIter(a), LevelOrderIter(b)): path_a, path_b = node_a.pathstr, node_b.pathstr if require_names_equal: if node_a.name != node_b.name: - raise TreeIsomorphismError( - f"Trees are not isomorphic because node '{path_a}' in the first tree has " - f"name '{node_a.name}', whereas its counterpart node '{path_b}' in the " - f"second tree has name '{node_b.name}'." + diff = dedent( + f"""\ + Node '{path_a}' in the left object has name '{node_a.name}' + Node '{path_b}' in the right object has name '{node_b.name}'""" ) + return diff if len(node_a.children) != len(node_b.children): - raise TreeIsomorphismError( - f"Trees are not isomorphic because node '{path_a}' in the first tree has " - f"{len(node_a.children)} children, whereas its counterpart node '{path_b}' in " - f"the second tree has {len(node_b.children)} children." + diff = dedent( + f"""\ + Number of children on node '{path_a}' of the left object: {len(node_a.children)} + Number of children on node '{path_b}' of the right object: {len(node_b.children)}""" ) + return diff + + return "" -def map_over_subtree(func): +def map_over_subtree(func: Callable) -> DataTree | Tuple[DataTree, ...]: """ Decorator which turns a function which acts on (and returns) Datasets into one which acts on and returns DataTrees. Applies a function to every dataset in one or more subtrees, returning new trees which store the results. 
- The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees will have - the same structure as the supplied trees. + The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees + will have the same structure as the supplied trees. `func` needs to return one Datasets, DataArrays, or None in order to be able to rebuild the subtrees after mapping, as each result will be assigned to its respective node of a new tree via `DataTree.__setitem__`. Any @@ -99,7 +129,7 @@ def map_over_subtree(func): (i.e. func must accept at least one Dataset and return at least one Dataset.) Function will not be applied to any nodes without datasets. *args : tuple, optional - Positional arguments passed on to `func`. If DataTrees any data-containing nodes will be converted to Datasets \ + Positional arguments passed on to `func`. If DataTrees any data-containing nodes will be converted to Datasets via .ds . **kwargs : Any Keyword arguments passed on to `func`. If DataTrees any data-containing nodes will be converted to Datasets @@ -138,7 +168,9 @@ def _map_over_subtree(*args, **kwargs): for other_tree in other_trees: # isomorphism is transitive so this is enough to guarantee all trees are mutually isomorphic - _check_isomorphic(first_tree, other_tree, require_names_equal=False) + check_isomorphic( + first_tree, other_tree, require_names_equal=False, check_from_root=False + ) # Walk all trees simultaneously, applying func to all nodes that lie in same position in different trees # We don't know which arguments are DataTrees so we zip all arguments together as iterables diff --git a/xarray/datatree_/datatree/testing.py b/xarray/datatree_/datatree/testing.py new file mode 100644 index 00000000000..2b33a5b3ea0 --- /dev/null +++ b/xarray/datatree_/datatree/testing.py @@ -0,0 +1,120 @@ +from xarray.testing import ensure_warnings + +from .datatree import DataTree +from .formatting import diff_tree_repr + + +@ensure_warnings +def assert_isomorphic(a: DataTree, b: DataTree, from_root: bool = False): + """ + Two DataTrees are considered isomorphic if every node has the same number of children. + + Nothing about the data in each node is checked. + + Isomorphism is a necessary condition for two trees to be used in a nodewise binary operation, + such as tree1 + tree2. + + By default this function does not check any part of the tree above the given node. + Therefore this function can be used as default to check that two subtrees are isomorphic. + + Parameters + ---------- + a : DataTree + The first object to compare. + b : DataTree + The second object to compare. + from_root : bool, optional, default is False + Whether or not to first traverse to the root of the trees before checking for isomorphism. + If a & b have no parents then this has no effect. + + See Also + -------- + DataTree.isomorphic + assert_equals + assert_identical + """ + __tracebackhide__ = True + assert type(a) == type(b) + + if isinstance(a, DataTree): + if from_root: + a = a.root + b = b.root + + assert a.isomorphic(b, from_root=False), diff_tree_repr(a, b, "isomorphic") + else: + raise TypeError(f"{type(a)} not of type DataTree") + + +@ensure_warnings +def assert_equal(a: DataTree, b: DataTree, from_root: bool = True): + """ + Two DataTrees are equal if they have isomorphic node structures, with matching node names, + and if they have matching variables and coordinates, all of which are equal. 
+ + By default this method will check the whole tree above the given node. + + Parameters + ---------- + a : DataTree + The first object to compare. + b : DataTree + The second object to compare. + from_root : bool, optional, default is True + Whether or not to first traverse to the root of the trees before checking for isomorphism. + If a & b have no parents then this has no effect. + + See Also + -------- + DataTree.equals + assert_isomorphic + assert_identical + """ + __tracebackhide__ = True + assert type(a) == type(b) + + if isinstance(a, DataTree): + if from_root: + a = a.root + b = b.root + + assert a.equals(b), diff_tree_repr(a, b, "equals") + else: + raise TypeError(f"{type(a)} not of type DataTree") + + +@ensure_warnings +def assert_identical(a: DataTree, b: DataTree, from_root: bool = True): + """ + Like assert_equals, but will also check all dataset attributes and the attributes on + all variables and coordinates. + + By default this method will check the whole tree above the given node. + + Parameters + ---------- + a : xarray.DataTree + The first object to compare. + b : xarray.DataTree + The second object to compare. + from_root : bool, optional, default is True + Whether or not to first traverse to the root of the trees before checking for isomorphism. + If a & b have no parents then this has no effect. + + See Also + -------- + DataTree.identical + assert_isomorphic + assert_equal + """ + + __tracebackhide__ = True + assert type(a) == type(b) + if isinstance(a, DataTree): + if from_root: + a = a.root + b = b.root + + assert a.identical(b), diff_tree_repr(a, b, "identical") + else: + raise TypeError(f"{type(a)} not of type DataTree") diff --git a/xarray/datatree_/datatree/tests/test_formatting.py b/xarray/datatree_/datatree/tests/test_formatting.py new file mode 100644 index 00000000000..ba582a07bd4 --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_formatting.py @@ -0,0 +1,63 @@ +from textwrap import dedent + +from xarray import Dataset + +from datatree import DataTree +from datatree.formatting import diff_tree_repr + + +class TestDiffFormatting: + def test_diff_structure(self): + dt_1 = DataTree.from_dict({"a": None, "a/b": None, "a/c": None}) + dt_2 = DataTree.from_dict({"d": None, "d/e": None}) + + expected = dedent( + """\ + Left and right DataTree objects are not isomorphic + + Number of children on node 'root/a' of the left object: 2 + Number of children on node 'root/d' of the right object: 1""" + ) + actual = diff_tree_repr(dt_1, dt_2, "isomorphic") + assert actual == expected + + def test_diff_node_names(self): + dt_1 = DataTree.from_dict({"a": None}) + dt_2 = DataTree.from_dict({"b": None}) + + expected = dedent( + """\ + Left and right DataTree objects are not identical + + Node 'root/a' in the left object has name 'a' + Node 'root/b' in the right object has name 'b'""" + ) + actual = diff_tree_repr(dt_1, dt_2, "identical") + assert actual == expected + + def test_diff_node_data(self): + ds1 = Dataset({"u": 0, "v": 1}) + ds3 = Dataset({"w": 5}) + dt_1 = DataTree.from_dict({"a": ds1, "a/b": ds3}) + ds2 = Dataset({"u": 0}) + ds4 = Dataset({"w": 6}) + dt_2 = DataTree.from_dict({"a": ds2, "a/b": ds4}) + + expected = dedent( + """\ + Left and right DataTree objects are not equal + + + Data in nodes at position 'root/a' do not match: + + Data variables only on the left object: + v int64 1 + + Data in nodes at position 'root/a/b' do not match: + + Differing data variables: + L w int64 5 + R w int64 6""" + ) + actual = diff_tree_repr(dt_1, dt_2, "equals") + assert 
actual == expected diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index f6082b5f285..98cc6027dde 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -2,7 +2,7 @@ import xarray as xr from datatree.datatree import DataTree -from datatree.mapping import TreeIsomorphismError, _check_isomorphic, map_over_subtree +from datatree.mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from datatree.treenode import TreeNode from .test_datatree import assert_tree_equal, create_test_datatree @@ -13,37 +13,37 @@ class TestCheckTreesIsomorphic: def test_not_a_tree(self): with pytest.raises(TypeError, match="not a tree"): - _check_isomorphic("s", 1) + check_isomorphic("s", 1) def test_different_widths(self): dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"a": empty, "b": empty}) + dt2 = DataTree.from_dict(data_objects={"b": empty, "c": empty}) expected_err_str = ( - "'root' in the first tree has 1 children, whereas its counterpart node 'root' in the " - "second tree has 2 children" + "Number of children on node 'root' of the left object: 1\n" + "Number of children on node 'root' of the right object: 2" ) with pytest.raises(TreeIsomorphismError, match=expected_err_str): - _check_isomorphic(dt1, dt2) + check_isomorphic(dt1, dt2) def test_different_heights(self): dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"a": empty, "a/b": empty}) + dt2 = DataTree.from_dict(data_objects={"b": empty, "b/c": empty}) expected_err_str = ( - "'root/a' in the first tree has 0 children, whereas its counterpart node 'root/a' in the " - "second tree has 1 children" + "Number of children on node 'root/a' of the left object: 0\n" + "Number of children on node 'root/b' of the right object: 1" ) with pytest.raises(TreeIsomorphismError, match=expected_err_str): - _check_isomorphic(dt1, dt2) + check_isomorphic(dt1, dt2) def test_names_different(self): dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) dt2 = DataTree.from_dict(data_objects={"b": empty}) expected_err_str = ( - "'root/a' in the first tree has name 'a', whereas its counterpart node 'root/b' in the " - "second tree has name 'b'" + "Node 'root/a' in the left object has name 'a'\n" + "Node 'root/b' in the right object has name 'b'" ) with pytest.raises(TreeIsomorphismError, match=expected_err_str): - _check_isomorphic(dt1, dt2, require_names_equal=True) + check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): dt1 = DataTree.from_dict( @@ -52,7 +52,7 @@ def test_isomorphic_names_equal(self): dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - _check_isomorphic(dt1, dt2, require_names_equal=True) + check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): dt1 = DataTree.from_dict( @@ -61,7 +61,7 @@ def test_isomorphic_ordering(self): dt2 = DataTree.from_dict( data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} ) - _check_isomorphic(dt1, dt2, require_names_equal=False) + check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): dt1 = DataTree.from_dict( @@ -70,14 +70,21 @@ def test_isomorphic_names_not_equal(self): dt2 = DataTree.from_dict( data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} ) - _check_isomorphic(dt1, dt2) + check_isomorphic(dt1, dt2) def 
test_not_isomorphic_complex_tree(self): dt1 = create_test_datatree() dt2 = create_test_datatree() dt2.set_node("set1/set2", TreeNode("set3")) with pytest.raises(TreeIsomorphismError, match="root/set1/set2"): - _check_isomorphic(dt1, dt2) + check_isomorphic(dt1, dt2) + + def test_checking_from_root(self): + dt1 = create_test_datatree() + dt2 = create_test_datatree() + dt1.parent = DataTree(name="real_root") + with pytest.raises(TreeIsomorphismError): + check_isomorphic(dt1, dt2, check_from_root=True) class TestMapOverSubTree: From 9727d3e79ea9e5963b0209fbff7983d472810aa1 Mon Sep 17 00:00:00 2001 From: Joe Hamman Date: Thu, 16 Dec 2021 19:23:04 -0800 Subject: [PATCH 075/260] fix bug where consolidated metadata is created after writing each group https://github.com/xarray-contrib/datatree/pull/52 --- xarray/datatree_/datatree/datatree.py | 8 +++++++- xarray/datatree_/datatree/io.py | 15 ++++++++++++++- xarray/datatree_/datatree/tests/test_datatree.py | 15 +++++++++++++++ 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 3874c14099b..83d0738bbfa 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -678,7 +678,9 @@ def to_netcdf( **kwargs, ) - def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): + def to_zarr( + self, store, mode: str = "w", encoding=None, consolidated: bool = True, **kwargs + ): """ Write datatree contents to a Zarr store. @@ -696,6 +698,9 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): variable specific encodings as values, e.g., ``{"root/set1": {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}, ...}``. See ``xarray.Dataset.to_zarr`` for available options. + consolidated : bool + If True, apply zarr's `consolidate_metadata` function to the store + after writing metadata for all groups. 
kwargs : Additional keyword arguments to be passed to ``xarray.Dataset.to_zarr`` """ @@ -706,6 +711,7 @@ def to_zarr(self, store, mode: str = "w", encoding=None, **kwargs): store, mode=mode, encoding=encoding, + consolidated=consolidated, **kwargs, ) diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 533ded2b163..0c0c4eab161 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -191,7 +191,16 @@ def _create_empty_zarr_group(store, group, mode): root.create_group(group, overwrite=True) -def _datatree_to_zarr(dt: DataTree, store, mode: str = "w", encoding=None, **kwargs): +def _datatree_to_zarr( + dt: DataTree, + store, + mode: str = "w", + encoding=None, + consolidated: bool = True, + **kwargs, +): + + from zarr.convenience import consolidate_metadata if kwargs.get("group", None) is not None: raise NotImplementedError( @@ -215,7 +224,11 @@ def _datatree_to_zarr(dt: DataTree, store, mode: str = "w", encoding=None, **kwa group=group_path, mode=mode, encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), + consolidated=False, **kwargs, ) if "w" in mode: mode = "a" + + if consolidated: + consolidate_metadata(store) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 3437ffe6390..bb30a3c913f 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -358,3 +358,18 @@ def test_to_zarr(self, tmpdir): roundtrip_dt = open_datatree(filepath, engine="zarr") assert_tree_equal(original_dt, roundtrip_dt) + + @requires_zarr + def test_to_zarr_not_consolidated(self, tmpdir): + filepath = tmpdir / "test.zarr" + zmetadata = filepath / ".zmetadata" + s1zmetadata = filepath / "set1" / ".zmetadata" + filepath = str(filepath) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_zarr(filepath, consolidated=False) + assert not zmetadata.exists() + assert not s1zmetadata.exists() + + with pytest.warns(RuntimeWarning, match="consolidated"): + roundtrip_dt = open_datatree(filepath, engine="zarr") + assert_tree_equal(original_dt, roundtrip_dt) From c5b23b7828c08392fbbb1ccd60c157dcfacc6a52 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 16 Dec 2021 22:41:14 -0500 Subject: [PATCH 076/260] Use new assert_equal functions in tests https://github.com/xarray-contrib/datatree/pull/53 * added assert functions * pseudocode for printing differences between trees - need tests * equals->identical fix * refactored to use same diff function for asserts and to check isomorphism internally * added tests of tree diff formatting * added option to check trees from the root * fix bugs with assert functions * convert tests to use new assert_equal function for tree comparisons * linting --- .../datatree/tests/test_dataset_api.py | 128 +++++++----------- .../datatree_/datatree/tests/test_datatree.py | 42 +++--- .../datatree_/datatree/tests/test_mapping.py | 19 +-- 3 files changed, 77 insertions(+), 112 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index a7284ec25eb..9bc57d47da0 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -1,64 +1,56 @@ import numpy as np import xarray as xr -from xarray.testing import assert_equal from datatree import DataTree +from datatree.testing import 
assert_equal -from .test_datatree import assert_tree_equal, create_test_datatree +from .test_datatree import create_test_datatree class TestDSMethodInheritance: def test_dataset_method(self): - # test root - da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) - expected_ds = da.to_dataset().isel(x=1) - result_ds = dt.isel(x=1).ds - assert_equal(result_ds, expected_ds) - - # test descendant - DataTree("results", parent=dt, data=da) - result_ds = dt.isel(x=1)["results"].ds - assert_equal(result_ds, expected_ds) + ds = xr.Dataset({"a": ("x", [1, 2, 3])}) + dt = DataTree("root", data=ds) + DataTree("results", parent=dt, data=ds) + + expected = DataTree("root", data=ds.isel(x=1)) + DataTree("results", parent=expected, data=ds.isel(x=1)) + + result = dt.isel(x=1) + assert_equal(result, expected) def test_reduce_method(self): - # test root - da = xr.DataArray(name="a", data=[False, True, False], dims="x") - dt = DataTree("root", data=da) - expected_ds = da.to_dataset().any() - result_ds = dt.any().ds - assert_equal(result_ds, expected_ds) - - # test descendant - DataTree("results", parent=dt, data=da) - result_ds = dt.any()["results"].ds - assert_equal(result_ds, expected_ds) + ds = xr.Dataset({"a": ("x", [False, True, False])}) + dt = DataTree("root", data=ds) + DataTree("results", parent=dt, data=ds) + + expected = DataTree("root", data=ds.any()) + DataTree("results", parent=expected, data=ds.any()) + + result = dt.any() + assert_equal(result, expected) def test_nan_reduce_method(self): - # test root - da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) - expected_ds = da.to_dataset().mean() - result_ds = dt.mean().ds - assert_equal(result_ds, expected_ds) - - # test descendant - DataTree("results", parent=dt, data=da) - result_ds = dt.mean()["results"].ds - assert_equal(result_ds, expected_ds) + ds = xr.Dataset({"a": ("x", [1, 2, 3])}) + dt = DataTree("root", data=ds) + DataTree("results", parent=dt, data=ds) + + expected = DataTree("root", data=ds.mean()) + DataTree("results", parent=expected, data=ds.mean()) + + result = dt.mean() + assert_equal(result, expected) def test_cum_method(self): - # test root - da = xr.DataArray(name="a", data=[1, 2, 3], dims="x") - dt = DataTree("root", data=da) - expected_ds = da.to_dataset().cumsum() - result_ds = dt.cumsum().ds - assert_equal(result_ds, expected_ds) + ds = xr.Dataset({"a": ("x", [1, 2, 3])}) + dt = DataTree("root", data=ds) + DataTree("results", parent=dt, data=ds) + + expected = DataTree("root", data=ds.cumsum()) + DataTree("results", parent=expected, data=ds.cumsum()) - # test descendant - DataTree("results", parent=dt, data=da) - result_ds = dt.cumsum()["results"].ds - assert_equal(result_ds, expected_ds) + result = dt.cumsum() + assert_equal(result, expected) class TestOps: @@ -68,12 +60,11 @@ def test_binary_op_on_int(self): dt = DataTree("root", data=ds1) DataTree("subnode", data=ds2, parent=dt) - expected_root = DataTree("root", data=ds1 * 5) - expected_descendant = DataTree("subnode", data=ds2 * 5, parent=expected_root) - result = dt * 5 + expected = DataTree("root", data=ds1 * 5) + DataTree("subnode", data=ds2 * 5, parent=expected) - assert_equal(result.ds, expected_root.ds) - assert_equal(result["subnode"].ds, expected_descendant.ds) + result = dt * 5 + assert_equal(result, expected) def test_binary_op_on_dataset(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) @@ -82,14 +73,11 @@ def test_binary_op_on_dataset(self): DataTree("subnode", data=ds2, parent=dt) other_ds = 
xr.Dataset({"z": ("z", [0.1, 0.2])}) - expected_root = DataTree("root", data=ds1 * other_ds) - expected_descendant = DataTree( - "subnode", data=ds2 * other_ds, parent=expected_root - ) - result = dt * other_ds + expected = DataTree("root", data=ds1 * other_ds) + DataTree("subnode", data=ds2 * other_ds, parent=expected) - assert_equal(result.ds, expected_root.ds) - assert_equal(result["subnode"].ds, expected_descendant.ds) + result = dt * other_ds + assert_equal(result, expected) def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) @@ -97,32 +85,16 @@ def test_binary_op_on_datatree(self): dt = DataTree("root", data=ds1) DataTree("subnode", data=ds2, parent=dt) - expected_root = DataTree("root", data=ds1 * ds1) - expected_descendant = DataTree("subnode", data=ds2 * ds2, parent=expected_root) - result = dt * dt + expected = DataTree("root", data=ds1 * ds1) + DataTree("subnode", data=ds2 * ds2, parent=expected) - assert_equal(result.ds, expected_root.ds) - assert_equal(result["subnode"].ds, expected_descendant.ds) + result = dt * dt + assert_equal(result, expected) class TestUFuncs: - def test_root(self): - da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataTree("root", data=da) - expected_ds = np.sin(da.to_dataset()) - result_ds = np.sin(dt).ds - assert_equal(result_ds, expected_ds) - - def test_descendants(self): - da = xr.DataArray(name="a", data=[1, 2, 3]) - dt = DataTree("root") - DataTree("results", parent=dt, data=da) - expected_ds = np.sin(da.to_dataset()) - result_ds = np.sin(dt)["results"].ds - assert_equal(result_ds, expected_ds) - def test_tree(self): dt = create_test_datatree() expected = create_test_datatree(modify=lambda ds: np.sin(ds)) result_tree = np.sin(dt) - assert_tree_equal(result_tree, expected) + assert_equal(result_tree, expected) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index bb30a3c913f..ccc5efc0806 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,22 +1,14 @@ import pytest import xarray as xr +import xarray.testing as xrt from anytree.resolver import ChildResolverError -from xarray.testing import assert_identical from datatree import DataTree from datatree.io import open_datatree +from datatree.testing import assert_equal from datatree.tests import requires_netCDF4, requires_zarr -def assert_tree_equal(dt_a, dt_b): - assert dt_a.parent is dt_b.parent - - for a, b in zip(dt_a.subtree, dt_b.subtree): - assert a.name == b.name - assert a.pathstr == b.pathstr - assert a.ds.equals(b.ds) - - def create_test_datatree(modify=lambda ds: ds): """ Create a test datatree with this structure: @@ -127,15 +119,15 @@ def test_get_node(self): def test_get_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) results = DataTree("results", data=data) - assert_identical(results["temp"], data["temp"]) + xrt.assert_identical(results["temp"], data["temp"]) def test_get_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataTree("folder1") results = DataTree("results", parent=folder1) DataTree("highres", parent=results, data=data) - assert_identical(folder1["results/highres/temp"], data["temp"]) - assert_identical(folder1[("results", "highres", "temp")], data["temp"]) + xrt.assert_identical(folder1["results/highres/temp"], data["temp"]) + xrt.assert_identical(folder1[("results", "highres", "temp")], data["temp"]) def test_get_nonexistent_node(self): folder1 = 
DataTree("folder1") @@ -152,12 +144,12 @@ def test_get_nonexistent_variable(self): def test_get_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) results = DataTree("results", data=data) - assert_identical(results[["temp", "p"]], data[["temp", "p"]]) + xrt.assert_identical(results[["temp", "p"]], data[["temp", "p"]]) def test_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) results = DataTree("results", data=data) - assert_identical(results[{"temp": 1}], data[{"temp": 1}]) + xrt.assert_identical(results[{"temp": 1}], data[{"temp": 1}]) class TestSetItems: @@ -180,17 +172,17 @@ def test_set_new_empty_node(self): john["mary"] = None mary = john["mary"] assert isinstance(mary, DataTree) - assert_identical(mary.ds, xr.Dataset()) + xrt.assert_identical(mary.ds, xr.Dataset()) def test_overwrite_data_in_node_with_none(self): john = DataTree("john") mary = DataTree("mary", parent=john, data=xr.Dataset()) john["mary"] = None - assert_identical(mary.ds, xr.Dataset()) + xrt.assert_identical(mary.ds, xr.Dataset()) john.ds = xr.Dataset() john["/"] = None - assert_identical(john.ds, xr.Dataset()) + xrt.assert_identical(john.ds, xr.Dataset()) def test_set_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) @@ -214,7 +206,7 @@ def test_set_named_dataarray_as_new_node(self): data = xr.DataArray(name="temp", data=[0, 50]) folder1 = DataTree("folder1") folder1["results"] = data - assert_identical(folder1["results"].ds, data.to_dataset()) + xrt.assert_identical(folder1["results"].ds, data.to_dataset()) def test_set_unnamed_dataarray(self): data = xr.DataArray([0, 50]) @@ -237,7 +229,7 @@ def test_dataarray_replace_existing_node(self): results = DataTree("results", data=t) p = xr.DataArray(name="pressure", data=[2, 3]) results["/"] = p - assert_identical(results.ds, p.to_dataset()) + xrt.assert_identical(results.ds, p.to_dataset()) class TestTreeCreation: @@ -246,7 +238,7 @@ def test_empty(self): assert dt.name == "root" assert dt.parent is None assert dt.children == () - assert_identical(dt.ds, xr.Dataset()) + xrt.assert_identical(dt.ds, xr.Dataset()) def test_data_in_root(self): dat = xr.Dataset() @@ -259,7 +251,7 @@ def test_data_in_root(self): def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) - assert_identical(dt.ds, xr.Dataset()) + xrt.assert_identical(dt.ds, xr.Dataset()) assert dt["run1"].ds is dat1 assert dt["run1"].children == () assert dt["run2"].ds is dat2 @@ -346,7 +338,7 @@ def test_to_netcdf(self, tmpdir): original_dt.to_netcdf(filepath, engine="netcdf4") roundtrip_dt = open_datatree(filepath) - assert_tree_equal(original_dt, roundtrip_dt) + assert_equal(original_dt, roundtrip_dt) @requires_zarr def test_to_zarr(self, tmpdir): @@ -357,7 +349,7 @@ def test_to_zarr(self, tmpdir): original_dt.to_zarr(filepath) roundtrip_dt = open_datatree(filepath, engine="zarr") - assert_tree_equal(original_dt, roundtrip_dt) + assert_equal(original_dt, roundtrip_dt) @requires_zarr def test_to_zarr_not_consolidated(self, tmpdir): @@ -372,4 +364,4 @@ def test_to_zarr_not_consolidated(self, tmpdir): with pytest.warns(RuntimeWarning, match="consolidated"): roundtrip_dt = open_datatree(filepath, engine="zarr") - assert_tree_equal(original_dt, roundtrip_dt) + assert_equal(original_dt, roundtrip_dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 98cc6027dde..742a0f07079 100644 --- 
a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -3,9 +3,10 @@ from datatree.datatree import DataTree from datatree.mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree +from datatree.testing import assert_equal from datatree.treenode import TreeNode -from .test_datatree import assert_tree_equal, create_test_datatree +from .test_datatree import create_test_datatree empty = xr.Dataset() @@ -128,7 +129,7 @@ def times_ten(ds): expected = create_test_datatree(modify=lambda ds: 10.0 * ds) result_tree = times_ten(dt) - assert_tree_equal(result_tree, expected) + assert_equal(result_tree, expected) def test_single_dt_arg_plus_args_and_kwargs(self): dt = create_test_datatree() @@ -139,7 +140,7 @@ def multiply_then_add(ds, times, add=0.0): expected = create_test_datatree(modify=lambda ds: (10.0 * ds) + 2.0) result_tree = multiply_then_add(dt, 10.0, add=2.0) - assert_tree_equal(result_tree, expected) + assert_equal(result_tree, expected) def test_multiple_dt_args(self): dt1 = create_test_datatree() @@ -151,7 +152,7 @@ def add(ds1, ds2): expected = create_test_datatree(modify=lambda ds: 2.0 * ds) result = add(dt1, dt2) - assert_tree_equal(result, expected) + assert_equal(result, expected) def test_dt_as_kwarg(self): dt1 = create_test_datatree() @@ -163,7 +164,7 @@ def add(ds1, value=0.0): expected = create_test_datatree(modify=lambda ds: 2.0 * ds) result = add(dt1, value=dt2) - assert_tree_equal(result, expected) + assert_equal(result, expected) def test_return_multiple_dts(self): dt = create_test_datatree() @@ -174,9 +175,9 @@ def minmax(ds): dt_min, dt_max = minmax(dt) expected_min = create_test_datatree(modify=lambda ds: ds.min()) - assert_tree_equal(dt_min, expected_min) + assert_equal(dt_min, expected_min) expected_max = create_test_datatree(modify=lambda ds: ds.max()) - assert_tree_equal(dt_max, expected_max) + assert_equal(dt_max, expected_max) def test_return_wrong_type(self): dt1 = create_test_datatree() @@ -233,7 +234,7 @@ def nodewise_merge(node_ds, fixed_ds): other_ds = xr.Dataset({"z": ("z", [0])}) expected = create_test_datatree(modify=lambda ds: xr.merge([ds, other_ds])) result_tree = nodewise_merge(dt, other_ds) - assert_tree_equal(result_tree, expected) + assert_equal(result_tree, expected) @pytest.mark.xfail def test_trees_with_different_node_names(self): @@ -248,7 +249,7 @@ def multiply_then_add(ds, times, add=0.0): expected = create_test_datatree(modify=lambda ds: (10.0 * ds) + 2.0) result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) - assert_tree_equal(result_tree, expected) + assert_equal(result_tree, expected) @pytest.mark.xfail From 18309f01e6ffddf01443102e576557b30ed9ea7e Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 10:45:54 -0500 Subject: [PATCH 077/260] add testing functions to API --- xarray/datatree_/docs/source/api.rst | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index ee58db54af2..0abcd3d1243 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -90,6 +90,9 @@ Methods DataTree.drop_sel DataTree.drop_isel DataTree.drop_dims + DataTree.isomorphic + DataTree.equals + DataTree.identical DataTree.transpose DataTree.dropna DataTree.fillna @@ -153,3 +156,21 @@ I/O DataTree.items DataTree.keys DataTree.values + +Testing +=== + +.. 
autosummary:: + :toctree: generated/ + + testing.assert_isomorphic + testing.assert_equal + testing.assert_identical + +Exceptions +=== + +.. autosummary:: + :toctree: generated/ + + TreeIsomorphismError \ No newline at end of file From 49cb5dc230e7cc120403cc32a47eb870906be0fe Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 10:50:32 -0500 Subject: [PATCH 078/260] newline at end of file --- xarray/datatree_/docs/source/api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 0abcd3d1243..f0a56cc027d 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -173,4 +173,4 @@ Exceptions .. autosummary:: :toctree: generated/ - TreeIsomorphismError \ No newline at end of file + TreeIsomorphismError From f591be2991e888faac0ede89a1335513daea96f5 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 17 Dec 2021 10:51:36 -0500 Subject: [PATCH 079/260] Create pull_request_template.md --- xarray/datatree_/.github/pull_request_template.md | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 xarray/datatree_/.github/pull_request_template.md diff --git a/xarray/datatree_/.github/pull_request_template.md b/xarray/datatree_/.github/pull_request_template.md new file mode 100644 index 00000000000..e144c6adaa3 --- /dev/null +++ b/xarray/datatree_/.github/pull_request_template.md @@ -0,0 +1,6 @@ + + +- [ ] Closes #xxxx +- [ ] Tests added +- [ ] Passes `pre-commit run --all-files` +- [ ] New functions/methods are listed in `api.rst` From 549d1996446f125f175c4bdfeef10d527734735b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:45:28 -0500 Subject: [PATCH 080/260] think I've fixed the bug --- xarray/datatree_/datatree/mapping.py | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 29608779614..5c5aa1b9681 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -215,6 +215,8 @@ def _map_over_subtree(*args, **kwargs): # Find out how many return values we received num_return_values = _check_all_return_values(out_data_objects) + ancestors_of_new_root = first_tree.pathstr.removesuffix(first_tree.name) + # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees result_trees = [] for i in range(num_return_values): @@ -228,7 +230,11 @@ def _map_over_subtree(*args, **kwargs): output_node_data = out_data_objects[p] else: output_node_data = None - out_tree_contents[p] = output_node_data + + # Discard parentage so that new trees don't include parents of input nodes + # TODO use a proper relative_path method on DataTree(/TreeNode) to do this + relative_path = p.removeprefix(ancestors_of_new_root) + out_tree_contents[relative_path] = output_node_data new_tree = DataTree.from_dict( name=first_tree.name, data_objects=out_tree_contents From 9afcd21f2050da9d5bcfe2a899ec7b908e35a27e Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:45:58 -0500 Subject: [PATCH 081/260] used feature from python 3.9 --- xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/setup.py | 5 +---- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index e1068e8b8df..4c803ed9cb8 100644 --- 
a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev75+g977ffe2.d20210902" +__version__ = "0.1.dev94+g6c6f23c.d20211217" diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index cae7a90389c..7670aae7d68 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -26,14 +26,11 @@ "Topic :: Scientific/Engineering", "License :: OSI Approved :: Apache License", "Operating System :: OS Independent", - "Programming Language :: Python :: 3", - "Programming Language :: Python :: 3.7", - "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", ], packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, - python_requires=">=3.7", + python_requires=">=3.9", setup_requires="setuptools_scm", use_scm_version={ "write_to": "datatree/_version.py", From 9f02a88ce92ea7de64f71c5de5b803881b12d37b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:46:18 -0500 Subject: [PATCH 082/260] test but doesn't yet work properly --- xarray/datatree_/datatree/tests/test_mapping.py | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 742a0f07079..9a3b03aa7c3 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -251,6 +251,21 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) assert_equal(result_tree, expected) + def test_discard_ancestry(self): + # Check for datatree GH issue https://github.com/xarray-contrib/datatree/issues/48 + dt = create_test_datatree() + subtree = dt["set1"] + + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + expected = create_test_datatree(modify=lambda ds: 10.0 * ds)["set1"] + result_tree = times_ten(subtree) + print(result_tree) + print(expected) + assert_equal(result_tree, expected) + @pytest.mark.xfail class TestMapOverSubTreeInplace: From 1d6d96b9c35ed09f0b2013fdec6af541870fd716 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:49:12 -0500 Subject: [PATCH 083/260] Revert accidental push to main This reverts commit 9f02a88ce92ea7de64f71c5de5b803881b12d37b. 
--- xarray/datatree_/datatree/tests/test_mapping.py | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 9a3b03aa7c3..742a0f07079 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -251,21 +251,6 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) assert_equal(result_tree, expected) - def test_discard_ancestry(self): - # Check for datatree GH issue https://github.com/xarray-contrib/datatree/issues/48 - dt = create_test_datatree() - subtree = dt["set1"] - - @map_over_subtree - def times_ten(ds): - return 10.0 * ds - - expected = create_test_datatree(modify=lambda ds: 10.0 * ds)["set1"] - result_tree = times_ten(subtree) - print(result_tree) - print(expected) - assert_equal(result_tree, expected) - @pytest.mark.xfail class TestMapOverSubTreeInplace: From 0352ad478ffba94f26c0fd657bf80103e41e79b1 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:49:24 -0500 Subject: [PATCH 084/260] Revert "used feature from python 3.9" This reverts commit 9afcd21f2050da9d5bcfe2a899ec7b908e35a27e. --- xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/setup.py | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index 4c803ed9cb8..e1068e8b8df 100644 --- a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev94+g6c6f23c.d20211217" +__version__ = "0.1.dev75+g977ffe2.d20210902" diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 7670aae7d68..cae7a90389c 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -26,11 +26,14 @@ "Topic :: Scientific/Engineering", "License :: OSI Approved :: Apache License", "Operating System :: OS Independent", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", ], packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, - python_requires=">=3.9", + python_requires=">=3.7", setup_requires="setuptools_scm", use_scm_version={ "write_to": "datatree/_version.py", From bc9afde18abe83782b8ff25da640ec0a89af82ac Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Dec 2021 12:49:42 -0500 Subject: [PATCH 085/260] Revert "think I've fixed the bug" This reverts commit 549d1996446f125f175c4bdfeef10d527734735b. 
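The reverted mapping change (and the accompanying bump of ``python_requires``) depended on ``str.removeprefix`` and ``str.removesuffix``, which were only added to Python in 3.9, for example::

    >>> "root/set1/set2".removeprefix("root/")
    'set1/set2'
    >>> "root/set1".removesuffix("set1")
    'root/'

The same fix is re-applied, together with the move of CI and ``python_requires`` to Python 3.9+, in [PATCH 087/260] below.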
--- xarray/datatree_/datatree/mapping.py | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 5c5aa1b9681..29608779614 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -215,8 +215,6 @@ def _map_over_subtree(*args, **kwargs): # Find out how many return values we received num_return_values = _check_all_return_values(out_data_objects) - ancestors_of_new_root = first_tree.pathstr.removesuffix(first_tree.name) - # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees result_trees = [] for i in range(num_return_values): @@ -230,11 +228,7 @@ def _map_over_subtree(*args, **kwargs): output_node_data = out_data_objects[p] else: output_node_data = None - - # Discard parentage so that new trees don't include parents of input nodes - # TODO use a proper relative_path method on DataTree(/TreeNode) to do this - relative_path = p.removeprefix(ancestors_of_new_root) - out_tree_contents[relative_path] = output_node_data + out_tree_contents[p] = output_node_data new_tree = DataTree.from_dict( name=first_tree.name, data_objects=out_tree_contents From e7ee7b770c555f2589ae56fa1ff4b21dad2933f5 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 13 Jan 2022 14:23:18 -0500 Subject: [PATCH 086/260] Fix bug with opening files using h5netcdf https://github.com/xarray-contrib/datatree/pull/57 * fix bug by correcting import * add roundtripping test for h5netcdf * add h5netcdf to CI environment --- xarray/datatree_/ci/environment.yml | 1 + xarray/datatree_/datatree/io.py | 4 ++-- xarray/datatree_/datatree/tests/__init__.py | 1 + xarray/datatree_/datatree/tests/test_datatree.py | 13 ++++++++++++- 4 files changed, 16 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index e379a9fab44..deab412a822 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -11,4 +11,5 @@ dependencies: - black - codecov - pytest-cov + - h5netcdf - zarr diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 0c0c4eab161..36fc93defed 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -33,12 +33,12 @@ def _get_nc_dataset_class(engine): if engine == "netcdf4": from netCDF4 import Dataset elif engine == "h5netcdf": - from h5netcdf import Dataset + from h5netcdf.legacyapi import Dataset elif engine is None: try: from netCDF4 import Dataset except ImportError: - from h5netcdf import Dataset + from h5netcdf.legacyapi import Dataset else: raise ValueError(f"unsupported engine: {engine}") return Dataset diff --git a/xarray/datatree_/datatree/tests/__init__.py b/xarray/datatree_/datatree/tests/__init__.py index e5afc834c08..964cb635dc5 100644 --- a/xarray/datatree_/datatree/tests/__init__.py +++ b/xarray/datatree_/datatree/tests/__init__.py @@ -25,4 +25,5 @@ def LooseVersion(vstring): has_zarr, requires_zarr = _importorskip("zarr") +has_h5netcdf, requires_h5netcdf = _importorskip("h5netcdf") has_netCDF4, requires_netCDF4 = _importorskip("netCDF4") diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index ccc5efc0806..82952737518 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -6,7 +6,7 @@ from datatree import DataTree from 
datatree.io import open_datatree from datatree.testing import assert_equal -from datatree.tests import requires_netCDF4, requires_zarr +from datatree.tests import requires_h5netcdf, requires_netCDF4, requires_zarr def create_test_datatree(modify=lambda ds: ds): @@ -340,6 +340,17 @@ def test_to_netcdf(self, tmpdir): roundtrip_dt = open_datatree(filepath) assert_equal(original_dt, roundtrip_dt) + @requires_h5netcdf + def test_to_h5netcdf(self, tmpdir): + filepath = str( + tmpdir / "test.nc" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_netcdf(filepath, engine="h5netcdf") + + roundtrip_dt = open_datatree(filepath) + assert_equal(original_dt, roundtrip_dt) + @requires_zarr def test_to_zarr(self, tmpdir): filepath = str( From ac39f0de60ef3db9b0f6a469d9992c49597e5c2e Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 28 Jan 2022 12:02:49 -0500 Subject: [PATCH 087/260] Fix mapping parentage bug https://github.com/xarray-contrib/datatree/pull/54 * think I've fixed the bug * used feature from python 3.9 * test but doesn't yet work properly * only check subtree, not down to root * make sure choice whether to check from root is propagated * bump python version in CI * 3.10 instead of 3.1 --- xarray/datatree_/.github/workflows/main.yaml | 4 ++-- xarray/datatree_/datatree/_version.py | 2 +- xarray/datatree_/datatree/mapping.py | 8 +++++++- xarray/datatree_/datatree/testing.py | 6 +++--- xarray/datatree_/datatree/tests/test_mapping.py | 13 +++++++++++++ xarray/datatree_/setup.py | 6 ++---- 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 392a7565a9f..2e7c8bddfba 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -21,7 +21,7 @@ jobs: runs-on: ubuntu-latest strategy: matrix: - python-version: [3.7, 3.8, 3.9] + python-version: ["3.9", "3.10"] steps: - uses: actions/checkout@v2.4.0 - uses: conda-incubator/setup-miniconda@v2 @@ -59,7 +59,7 @@ jobs: runs-on: ubuntu-latest strategy: matrix: - python-version: [3.8, 3.9] + python-version: ["3.9", "3.10"] steps: - uses: actions/checkout@v2.4.0 - uses: conda-incubator/setup-miniconda@v2 diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py index e1068e8b8df..4c803ed9cb8 100644 --- a/xarray/datatree_/datatree/_version.py +++ b/xarray/datatree_/datatree/_version.py @@ -1 +1 @@ -__version__ = "0.1.dev75+g977ffe2.d20210902" +__version__ = "0.1.dev94+g6c6f23c.d20211217" diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 29608779614..5c5aa1b9681 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -215,6 +215,8 @@ def _map_over_subtree(*args, **kwargs): # Find out how many return values we received num_return_values = _check_all_return_values(out_data_objects) + ancestors_of_new_root = first_tree.pathstr.removesuffix(first_tree.name) + # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees result_trees = [] for i in range(num_return_values): @@ -228,7 +230,11 @@ def _map_over_subtree(*args, **kwargs): output_node_data = out_data_objects[p] else: output_node_data = None - out_tree_contents[p] = output_node_data + + # Discard parentage so that new trees don't include parents of input nodes + # TODO use a proper relative_path 
method on DataTree(/TreeNode) to do this + relative_path = p.removeprefix(ancestors_of_new_root) + out_tree_contents[relative_path] = output_node_data new_tree = DataTree.from_dict( name=first_tree.name, data_objects=out_tree_contents diff --git a/xarray/datatree_/datatree/testing.py b/xarray/datatree_/datatree/testing.py index 2b33a5b3ea0..a89cfb0f103 100644 --- a/xarray/datatree_/datatree/testing.py +++ b/xarray/datatree_/datatree/testing.py @@ -41,7 +41,7 @@ def assert_isomorphic(a: DataTree, b: DataTree, from_root: bool = False): a = a.root b = b.root - assert a.isomorphic(b, from_root=False), diff_tree_repr(a, b, "isomorphic") + assert a.isomorphic(b, from_root=from_root), diff_tree_repr(a, b, "isomorphic") else: raise TypeError(f"{type(a)} not of type DataTree") @@ -78,7 +78,7 @@ def assert_equal(a: DataTree, b: DataTree, from_root: bool = True): a = a.root b = b.root - assert a.equals(b), diff_tree_repr(a, b, "equals") + assert a.equals(b, from_root=from_root), diff_tree_repr(a, b, "equals") else: raise TypeError(f"{type(a)} not of type DataTree") @@ -115,6 +115,6 @@ def assert_identical(a: DataTree, b: DataTree, from_root: bool = True): a = a.root b = b.root - assert a.identical(b), diff_tree_repr(a, b, "identical") + assert a.identical(b, from_root=from_root), diff_tree_repr(a, b, "identical") else: raise TypeError(f"{type(a)} not of type DataTree") diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 742a0f07079..8ea4682b137 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -251,6 +251,19 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) assert_equal(result_tree, expected) + def test_discard_ancestry(self): + # Check for datatree GH issue https://github.com/xarray-contrib/datatree/issues/48 + dt = create_test_datatree() + subtree = dt["set1"] + + @map_over_subtree + def times_ten(ds): + return 10.0 * ds + + expected = create_test_datatree(modify=lambda ds: 10.0 * ds)["set1"] + result_tree = times_ten(subtree) + assert_equal(result_tree, expected, from_root=False) + @pytest.mark.xfail class TestMapOverSubTreeInplace: diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index cae7a90389c..6eabd3879a0 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -26,14 +26,12 @@ "Topic :: Scientific/Engineering", "License :: OSI Approved :: Apache License", "Operating System :: OS Independent", - "Programming Language :: Python :: 3", - "Programming Language :: Python :: 3.7", - "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", ], packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, - python_requires=">=3.7", + python_requires=">=3.9", setup_requires="setuptools_scm", use_scm_version={ "write_to": "datatree/_version.py", From 122735ee803eeacadc48601139cdb87c92265f0f Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 28 Jan 2022 13:30:00 -0500 Subject: [PATCH 088/260] Add parent info to repr https://github.com/xarray-contrib/datatree/pull/59 * think I've fixed the bug * used feature from python 3.9 * test but doesn't yet work properly * only check subtree, not down to root * add parent info to repr * trigger CI --- xarray/datatree_/datatree/datatree.py | 8 ++++++++ 
xarray/datatree_/datatree/tests/test_datatree.py | 4 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 83d0738bbfa..8216e7e96e6 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -223,6 +223,12 @@ def __str__(self): else: lines.append(f"{fill}{line}") + # Tack on info about whether or not root node has a parent at the start + first_line = lines[0] + parent = f'"{self.parent.name}"' if self.parent is not None else "None" + first_line_with_parent = first_line[:-1] + f", parent={parent})" + lines[0] = first_line_with_parent + return "\n".join(lines) def _single_node_repr(self): @@ -433,6 +439,8 @@ def isomorphic( from_root : bool, optional, default is False Whether or not to first traverse to the root of the trees before checking for isomorphism. If a & b have no parents then this has no effect. + strict_names : bool, optional, default is False + Whether or not to also check that each node has the same name as its counterpart. See Also -------- diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 82952737518..a5b655cf743 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -291,14 +291,14 @@ class TestRepr: def test_print_empty_node(self): dt = DataTree("root") printout = dt.__str__() - assert printout == "DataTree('root')" + assert printout == "DataTree('root', parent=None)" def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) dt = DataTree("root", data=dat) printout = dt.__str__() expected = [ - "DataTree('root')", + "DataTree('root', parent=None)", "Dimensions", "Coordinates", "a", From ae5b8933978a9c5d7dff4ec5a230dd9a6bcfecf2 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 14 Feb 2022 17:07:58 -0500 Subject: [PATCH 089/260] Bump actions/setup-python from 2.3.1 to 2.3.2 https://github.com/xarray-contrib/datatree/pull/60 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2.3.1 to 2.3.2. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v2.3.1...v2.3.2) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 2 +- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 2e7c8bddfba..4397ea405d9 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2.4.0 - - uses: actions/setup-python@v2.3.1 + - uses: actions/setup-python@v2.3.2 - uses: pre-commit/action@v2.0.3 test: diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 63bb976d2f0..cea58b9d4c3 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -10,7 +10,7 @@ jobs: steps: - uses: actions/checkout@v2.4.0 - name: Set up Python - uses: actions/setup-python@v2.3.1 + uses: actions/setup-python@v2.3.2 with: python-version: "3.x" - name: Install dependencies From 1888e91e31bcd9da649b6bbb213518cf71c69b41 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 1 Mar 2022 13:26:24 -0500 Subject: [PATCH 090/260] Bump actions/setup-python from 2.3.2 to 3 https://github.com/xarray-contrib/datatree/pull/63 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2.3.2 to 3. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v2.3.2...v3) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 2 +- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 4397ea405d9..6f0aa0db79c 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2.4.0 - - uses: actions/setup-python@v2.3.2 + - uses: actions/setup-python@v3 - uses: pre-commit/action@v2.0.3 test: diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index cea58b9d4c3..f7b1641e2e6 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -10,7 +10,7 @@ jobs: steps: - uses: actions/checkout@v2.4.0 - name: Set up Python - uses: actions/setup-python@v2.3.2 + uses: actions/setup-python@v3 with: python-version: "3.x" - name: Install dependencies From e6dc4d8e3e8881d266dce024402f9b1ca9c50657 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 3 Mar 2022 10:56:29 -0500 Subject: [PATCH 091/260] Bump actions/checkout from 2.4.0 to 3 https://github.com/xarray-contrib/datatree/pull/64 Bumps [actions/checkout](https://github.com/actions/checkout) from 2.4.0 to 3. 
- [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v2.4.0...v3) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 6 +++--- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 6f0aa0db79c..d51cb2aab69 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -12,7 +12,7 @@ jobs: lint: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2.4.0 + - uses: actions/checkout@v3 - uses: actions/setup-python@v3 - uses: pre-commit/action@v2.0.3 @@ -23,7 +23,7 @@ jobs: matrix: python-version: ["3.9", "3.10"] steps: - - uses: actions/checkout@v2.4.0 + - uses: actions/checkout@v3 - uses: conda-incubator/setup-miniconda@v2 with: mamba-version: "*" @@ -61,7 +61,7 @@ jobs: matrix: python-version: ["3.9", "3.10"] steps: - - uses: actions/checkout@v2.4.0 + - uses: actions/checkout@v3 - uses: conda-incubator/setup-miniconda@v2 with: mamba-version: "*" diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index f7b1641e2e6..f974295bb01 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -8,7 +8,7 @@ jobs: deploy: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2.4.0 + - uses: actions/checkout@v3 - name: Set up Python uses: actions/setup-python@v3 with: From 86baba6cf62a61d82f715a4b58ab8453ed5899b0 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 3 Mar 2022 10:57:52 -0500 Subject: [PATCH 092/260] Quick Overview docs page https://github.com/xarray-contrib/datatree/pull/62 * wrote quick overview page * extremely basic intallation instructions * version 0.0.1 * updated with .from_dict constructor change * linting --- xarray/datatree_/docs/source/api.rst | 2 +- xarray/datatree_/docs/source/conf.py | 6 +- xarray/datatree_/docs/source/index.rst | 14 ++-- xarray/datatree_/docs/source/installation.rst | 19 ++++- .../datatree_/docs/source/quick-overview.rst | 83 +++++++++++++++++++ 5 files changed, 115 insertions(+), 9 deletions(-) create mode 100644 xarray/datatree_/docs/source/quick-overview.rst diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index f0a56cc027d..5398aff888d 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -11,7 +11,6 @@ DataTree :toctree: generated/ DataTree - DataNode Attributes ---------- @@ -51,6 +50,7 @@ Methods .. 
autosummary:: :toctree: generated/ + DataTree.from_dict DataTree.load DataTree.compute DataTree.persist diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index e89e2656b3a..5a9c0403843 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -45,6 +45,8 @@ "sphinx.ext.intersphinx", "sphinx.ext.extlinks", "sphinx.ext.napoleon", + "IPython.sphinxext.ipython_console_highlighting", + "IPython.sphinxext.ipython_directive", ] extlinks = { @@ -76,9 +78,9 @@ # built documents. # # The short X.Y version. -version = "0.0.0" # datatree.__version__ +version = "0.0.1" # datatree.__version__ # The full version, including alpha/beta/rc tags. -release = "0.0.0" # datatree.__version__ +release = "0.0.1" # datatree.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index fa1604101cd..4ba2890405d 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -1,17 +1,21 @@ Datatree ======== -**Datatree is a WIP implementation of a tree-like hierarchical data structure for xarray.** +**Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.** .. toctree:: :maxdepth: 2 :caption: Documentation Contents - installation - tutorial - api - contributing + Installation + Quick Overview + Tutorial + API Reference + How do I ... + Contributing Guide + Development Roadmap + GitHub repository Feedback -------- diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst index c4e4c7fc468..e2cfeae1067 100644 --- a/xarray/datatree_/docs/source/installation.rst +++ b/xarray/datatree_/docs/source/installation.rst @@ -2,4 +2,21 @@ Installation ============ -Coming soon! +Datatree is not yet available on pypi or via conda, so for now you will have to install it from source. + +``git clone https://github.com/TomNicholas/datatree.git``` + +``pip install -e ./datatree/`` + +The main branch will be kept up-to-date, so if you clone main and run the test suite with ``pytest datatree`` and get no failures, +then you have the most up-to-date version. + +You will need xarray and `anytree `_ +as dependencies, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O. + +.. note:: + + Datatree is very much still in the early stages of development. There may be functions that are present but whose + internals are not yet implemented, or significant changes to the API in future. + That said, if you try it out and find some behaviour that looks like a bug to you, please report it on the + `issue tracker `_! diff --git a/xarray/datatree_/docs/source/quick-overview.rst b/xarray/datatree_/docs/source/quick-overview.rst new file mode 100644 index 00000000000..b5ea1d1ffd2 --- /dev/null +++ b/xarray/datatree_/docs/source/quick-overview.rst @@ -0,0 +1,83 @@ +############## +Quick overview +############## + +DataTrees +--------- + +:py:class:`DataTree` is a tree-like container of ``DataArray`` objects, organised into multiple mutually alignable groups. +You can think of it like a (recursive) ``dict`` of ``Dataset`` objects. + +Let's first make some example xarray datasets (following on from xarray's +`quick overview `_ page): + +.. 
ipython:: python + + import numpy as np + import xarray as xr + + data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]}) + ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi)) + ds + + ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]}) + ds2 + + ds3 = xr.Dataset( + dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])), + coords={"species": "human"}, + ) + ds3 + +Now we'll put this data into a multi-group tree: + +.. ipython:: python + + from datatree import DataTree + + dt = DataTree.from_dict( + {"root/simulation/coarse": ds, "root/simulation/fine": ds2, "root": ds3} + ) + print(dt) + +This creates a datatree with various groups. We have one root group (named ``root``), containing information about individual people. +The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, +named ``fine`` and ``coarse``. + +The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. +They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. +In (``root``) we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. + +The constraints on each group are therefore the same as the constraint on dataarrays within a single dataset. + +We created the sub-groups using a filesystem-like syntax, and accessing groups works the same way. +We can access individual dataarrays in a similar fashion + +.. ipython:: python + + dt["simulation/coarse/foo"] + +and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``: + +.. ipython:: python + + dt["simulation/coarse"].ds + +Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by + +.. ipython:: python + + avg = dt["simulation"].mean(dim="x") + print(avg) + +Here the ``"x"`` dimension used is always the one local to that sub-group. + +You can do almost everything you can do with ``Dataset`` objects with ``DataTree`` objects +(including indexing and arithmetic), as operations will be mapped over every sub-group in the tree. +This allows you to work with multiple groups of non-alignable variables at once. + +.. note:: + + If all of your variables are mutually alignable + (i.e. they live on the same grid, such that every common dimension name maps to the same length), + then you probably don't need :py:class:`DataTree`, and should consider just sticking with ``xarray.Dataset``. 
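The quick-overview page added in this patch documents path-like access, pulling out a node's Dataset via ``.ds``, and mapping Dataset methods over a subtree. As a hedged, condensed sketch of that same pattern (assuming the datatree package at this point in the series is installed; the group names "sim", "coarse" and "fine" are made up for illustration, not part of the library):

    # Condensed version of the pattern documented in quick-overview.rst above.
    # Group names are illustrative; behaviour is assumed to match that docs page.
    import numpy as np
    import xarray as xr
    from datatree import DataTree

    coarse = xr.Dataset({"temperature": ("time", np.arange(4.0))})
    fine = xr.Dataset({"temperature": ("time", np.arange(7.0))})  # different length -> not alignable

    dt = DataTree.from_dict(
        {"root/sim/coarse": coarse, "root/sim/fine": fine, "root": xr.Dataset()}
    )

    dt["sim/coarse/temperature"]      # path-like access to a single variable
    dt["sim/fine"].ds                 # the Dataset stored at one node
    avg = dt["sim"].mean(dim="time")  # Dataset methods map over every node in the subtree
    print(avg)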
From 5f16828bfbd01e56243bd5a4eba4cfbdc6be6058 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 3 Mar 2022 12:23:41 -0500 Subject: [PATCH 093/260] Enable ReadTheDocs https://github.com/xarray-contrib/datatree/pull/65 * update requirements * add rtd yaml config * linting --- xarray/datatree_/docs/requirements.txt | 5 ++++- xarray/datatree_/readthedocs.yml | 10 ++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 xarray/datatree_/readthedocs.yml diff --git a/xarray/datatree_/docs/requirements.txt b/xarray/datatree_/docs/requirements.txt index 6a10e1ab22f..ed3f1440212 100644 --- a/xarray/datatree_/docs/requirements.txt +++ b/xarray/datatree_/docs/requirements.txt @@ -1,3 +1,6 @@ -sphinx>=3.1 +xarray>=0.21.1 +ipython +sphinx>=3.2 +sphinx_rtd_theme sphinx_copybutton sphinx-autosummary-accessors diff --git a/xarray/datatree_/readthedocs.yml b/xarray/datatree_/readthedocs.yml new file mode 100644 index 00000000000..b3a0483ef03 --- /dev/null +++ b/xarray/datatree_/readthedocs.yml @@ -0,0 +1,10 @@ +version: 2 + +build: + image: latest + +python: + install: + - requirements: docs/requirements.txt + - method: pip + path: . From 1f81fd698eba1e33f1afdcda0f53229c2c07a6ed Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 3 Mar 2022 12:28:51 -0500 Subject: [PATCH 094/260] make xarray version requirements consistent --- xarray/datatree_/docs/requirements.txt | 2 +- xarray/datatree_/requirements.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/docs/requirements.txt b/xarray/datatree_/docs/requirements.txt index ed3f1440212..fe2653f217a 100644 --- a/xarray/datatree_/docs/requirements.txt +++ b/xarray/datatree_/docs/requirements.txt @@ -1,4 +1,4 @@ -xarray>=0.21.1 +xarray>=0.20.2 ipython sphinx>=3.2 sphinx_rtd_theme diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index a95f277b2f7..2a32f4e3969 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1,3 +1,3 @@ -xarray>=0.19.0 +xarray>=0.20.2 anytree future From 9af72aa58411ebe13c9f087f2e9af3fa28ce0958 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 3 Mar 2022 12:59:53 -0500 Subject: [PATCH 095/260] Conda env for docs https://github.com/xarray-contrib/datatree/pull/66 * added conda env for building docs * updated testing ci env * point rtd to new conda env * remove unneeded future impor * removed docs requirements file in favour of conda env * try to prevent loading two versions of python simultaneously --- xarray/datatree_/ci/doc.yml | 18 ++++++++++++++++++ xarray/datatree_/ci/environment.yml | 5 +++-- xarray/datatree_/docs/requirements.txt | 6 ------ xarray/datatree_/readthedocs.yml | 5 ++++- xarray/datatree_/requirements.txt | 1 - 5 files changed, 25 insertions(+), 10 deletions(-) create mode 100644 xarray/datatree_/ci/doc.yml delete mode 100644 xarray/datatree_/docs/requirements.txt diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml new file mode 100644 index 00000000000..02d7d5543dc --- /dev/null +++ b/xarray/datatree_/ci/doc.yml @@ -0,0 +1,18 @@ +name: datatree-doc +channels: + - conda-forge +dependencies: + - pip + - python>=3.9 + - xarray>=0.20.2 + - netcdf4 + - anytree + - sphinx + - sphinx-copybutton + - numpydoc + - sphinx-autosummary-accessors + - ipython + - h5netcdf + - zarr + - pip: + - git+https://github.com/xarray-contrib/datatree diff --git 
a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index deab412a822..dce5ee85a64 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -1,9 +1,10 @@ -name: datatree +name: datatree-test channels: - conda-forge - nodefaults dependencies: - - xarray >=0.19.0 + - python>=3.9 + - xarray>=0.20.2 - netcdf4 - anytree - pytest diff --git a/xarray/datatree_/docs/requirements.txt b/xarray/datatree_/docs/requirements.txt deleted file mode 100644 index fe2653f217a..00000000000 --- a/xarray/datatree_/docs/requirements.txt +++ /dev/null @@ -1,6 +0,0 @@ -xarray>=0.20.2 -ipython -sphinx>=3.2 -sphinx_rtd_theme -sphinx_copybutton -sphinx-autosummary-accessors diff --git a/xarray/datatree_/readthedocs.yml b/xarray/datatree_/readthedocs.yml index b3a0483ef03..d634f48e9ec 100644 --- a/xarray/datatree_/readthedocs.yml +++ b/xarray/datatree_/readthedocs.yml @@ -3,8 +3,11 @@ version: 2 build: image: latest +# Optionally set the version of Python and requirements required to build your docs +conda: + environment: ci/doc.yml + python: install: - - requirements: docs/requirements.txt - method: pip path: . diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index 2a32f4e3969..8e2d30ac15c 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1,3 +1,2 @@ xarray>=0.20.2 anytree -future From 55d425ccbf0a6f7adc9053060ead3d30fcff9e0b Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 3 Mar 2022 13:10:15 -0500 Subject: [PATCH 096/260] add scipy to docs env so that works --- xarray/datatree_/ci/doc.yml | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 xarray/datatree_/ci/doc.yml diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml new file mode 100644 index 00000000000..012cf69e75f --- /dev/null +++ b/xarray/datatree_/ci/doc.yml @@ -0,0 +1,19 @@ +name: datatree-doc +channels: + - conda-forge +dependencies: + - pip + - python>=3.9 + - xarray>=0.20.2 + - netcdf4 + - anytree + - scipy + - sphinx + - sphinx-copybutton + - numpydoc + - sphinx-autosummary-accessors + - ipython + - h5netcdf + - zarr + - pip: + - git+https://github.com/xarray-contrib/datatree From 13fe531ba89b05da48b988fa5de0298425b4ed37 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Mon, 21 Mar 2022 15:00:41 -0400 Subject: [PATCH 097/260] Update setup.py --- xarray/datatree_/setup.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 6eabd3879a0..ba2c1c160b6 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -16,16 +16,17 @@ name="datatree", description="Hierarchical tree-like data structures for xarray", long_description=long_description, - url="https://github.com/TomNicholas/datatree", + url="https://github.com/xarray-contrib/datatree", author="Thomas Nicholas", author_email="thomas.nicholas@columbia.edu", license="Apache", classifiers=[ - "Development Status :: 5 - Production/Stable", + "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "Topic :: Scientific/Engineering", "License :: OSI Approved :: Apache License", "Operating System :: OS Independent", + "Programming Language :: Python", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", ], From bd1d6dd95ed567989adab5036c1a9a738e23a9cc Mon Sep 17 00:00:00 2001 From: Tom Nicholas 
<35968931+TomNicholas@users.noreply.github.com> Date: Mon, 21 Mar 2022 15:03:30 -0400 Subject: [PATCH 098/260] Add first version number --- xarray/datatree_/setup.py | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index ba2c1c160b6..13acc8647ce 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -14,6 +14,7 @@ setup( name="datatree", + version="0.0.1", description="Hierarchical tree-like data structures for xarray", long_description=long_description, url="https://github.com/xarray-contrib/datatree", From 928fd956a31b305fc44f093f25ce0ca2fd20102b Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Mon, 21 Mar 2022 15:41:30 -0400 Subject: [PATCH 099/260] Add version number for anytree to requirements --- xarray/datatree_/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index 8e2d30ac15c..bad07301c3e 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1,2 +1,2 @@ xarray>=0.20.2 -anytree +anytree>=2.8.0 From dd75acc3923e6e89331b0a21c9af6bf79521ca91 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:36:05 -0400 Subject: [PATCH 100/260] remove _version.py --- xarray/datatree_/.gitignore | 3 +++ xarray/datatree_/datatree/_version.py | 1 - 2 files changed, 3 insertions(+), 1 deletion(-) delete mode 100644 xarray/datatree_/datatree/_version.py diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore index ee3bee05376..64f6a86852e 100644 --- a/xarray/datatree_/.gitignore +++ b/xarray/datatree_/.gitignore @@ -128,3 +128,6 @@ dmypy.json # Pyre type checker .pyre/ + +# version +_version.py diff --git a/xarray/datatree_/datatree/_version.py b/xarray/datatree_/datatree/_version.py deleted file mode 100644 index 4c803ed9cb8..00000000000 --- a/xarray/datatree_/datatree/_version.py +++ /dev/null @@ -1 +0,0 @@ -__version__ = "0.1.dev94+g6c6f23c.d20211217" From a82542cb7673905b0bdd3ce8a64606cdf2732dcf Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:40:17 -0400 Subject: [PATCH 101/260] try to fix package version --- xarray/datatree_/datatree/__init__.py | 10 ++++++++++ xarray/datatree_/setup.py | 9 ++------- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 7cd8ce5cd32..d799dc027ee 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,5 +1,15 @@ # flake8: noqa # Ignoring F401: imported but unused + +from pkg_resources import DistributionNotFound, get_distribution + +# import public API from .datatree import DataTree from .io import open_datatree from .mapping import map_over_subtree + +try: + __version__ = get_distribution(__name__).version +except DistributionNotFound: # noqa: F401; pragma: no cover + # package is not installed + pass diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index 13acc8647ce..bfd8aaff19a 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -14,7 +14,6 @@ setup( name="datatree", - version="0.0.1", description="Hierarchical tree-like data structures for xarray", long_description=long_description, url="https://github.com/xarray-contrib/datatree", @@ -34,10 +33,6 @@ packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, 
python_requires=">=3.9", - setup_requires="setuptools_scm", - use_scm_version={ - "write_to": "datatree/_version.py", - "write_to_template": '__version__ = "{version}"', - "tag_regex": r"^(?Pv)?(?P[^\+]+)(?P.*)?$", - }, + use_scm_version={"version_scheme": "post-release", "local_scheme": "dirty-tag"}, + setup_requires=["setuptools_scm>=3.4", "setuptools>=42"], ) From 0a3c8d0555276c6eebc760d4952a84bf20bdaaec Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:42:20 -0400 Subject: [PATCH 102/260] fix license calssifier --- xarray/datatree_/setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index bfd8aaff19a..fb3b12430a5 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -24,7 +24,7 @@ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "Topic :: Scientific/Engineering", - "License :: OSI Approved :: Apache License", + "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3.9", From 96b6b65662b0fec0fa8401b9eb2a3a74f29b2f0f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:48:08 -0400 Subject: [PATCH 103/260] bump version to 0.0.2 From 0892f5d94a4c7889c21cd43f4db7c3959fad9c82 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:49:57 -0400 Subject: [PATCH 104/260] change pypi project name to xarray-datatree --- xarray/datatree_/setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index fb3b12430a5..af5bac3ba89 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -13,7 +13,7 @@ setup( - name="datatree", + name="xarray-datatree", description="Hierarchical tree-like data structures for xarray", long_description=long_description, url="https://github.com/xarray-contrib/datatree", From 1d5e12e01b9e5cd0b44ef4463e9e50b7125053cf Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 30 Mar 2022 16:52:17 -0400 Subject: [PATCH 105/260] bump version to 0.0.3 From 06244081d3b43210e39987c038ad910477156cc3 Mon Sep 17 00:00:00 2001 From: Don Setiawan Date: Thu, 31 Mar 2022 11:40:22 -0700 Subject: [PATCH 106/260] Allow for older python and empty dataset with attributes https://github.com/xarray-contrib/datatree/pull/70 * Update repr to always show ds and allow for py<3.9 * Fix version check bug * Update repr to allow attrs only * Update dependencies and ci to allow <3.9 * Fix flake8 issues * Perform black linting * Match up with xarray's minimum python 3.8 req * Fix missing comma in classifiers --- xarray/datatree_/.github/workflows/main.yaml | 4 +- xarray/datatree_/ci/doc.yml | 2 +- xarray/datatree_/ci/environment.yml | 2 +- xarray/datatree_/datatree/datatree.py | 12 +++-- xarray/datatree_/datatree/mapping.py | 5 +- .../datatree_/datatree/tests/test_datatree.py | 16 ++++++ xarray/datatree_/datatree/tests/test_utils.py | 50 +++++++++++++++++++ xarray/datatree_/datatree/utils.py | 19 +++++++ xarray/datatree_/setup.py | 3 +- 9 files changed, 102 insertions(+), 11 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/test_utils.py create mode 100644 xarray/datatree_/datatree/utils.py diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index d51cb2aab69..b0198a9d952 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ 
b/xarray/datatree_/.github/workflows/main.yaml @@ -21,7 +21,7 @@ jobs: runs-on: ubuntu-latest strategy: matrix: - python-version: ["3.9", "3.10"] + python-version: ["3.8", "3.9", "3.10"] steps: - uses: actions/checkout@v3 - uses: conda-incubator/setup-miniconda@v2 @@ -59,7 +59,7 @@ jobs: runs-on: ubuntu-latest strategy: matrix: - python-version: ["3.9", "3.10"] + python-version: ["3.8", "3.9", "3.10"] steps: - uses: actions/checkout@v3 - uses: conda-incubator/setup-miniconda@v2 diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 012cf69e75f..91240069cf1 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -3,7 +3,7 @@ channels: - conda-forge dependencies: - pip - - python>=3.9 + - python>=3.8 - xarray>=0.20.2 - netcdf4 - anytree diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index dce5ee85a64..239ece5e21e 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -3,7 +3,7 @@ channels: - conda-forge - nodefaults dependencies: - - python>=3.9 + - python>=3.8 - xarray>=0.20.2 - netcdf4 - anytree diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 8216e7e96e6..7df58070a76 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -124,6 +124,10 @@ def ds(self, data: Union[Dataset, DataArray] = None): def has_data(self) -> bool: return len(self.ds.variables) > 0 + @property + def has_attrs(self) -> bool: + return len(self.ds.attrs.keys()) > 0 + @classmethod def from_dict( cls, @@ -215,7 +219,7 @@ def __str__(self): node_line = f"{pre}{node_repr.splitlines()[0]}" lines.append(node_line) - if node.has_data: + if node.has_data or node.has_attrs: ds_repr = node_repr.splitlines()[2:] for line in ds_repr: if len(node.children) > 0: @@ -235,7 +239,7 @@ def _single_node_repr(self): """Information about this node, not including its relationships to other nodes.""" node_info = f"DataTree('{self.name}')" - if self.has_data: + if self.has_data or self.has_attrs: ds_info = "\n" + repr(self.ds) else: ds_info = "" @@ -247,8 +251,8 @@ def __repr__(self): parent = self.parent.name if self.parent is not None else "None" node_str = f"DataTree(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," - if self.has_data: - ds_repr_lines = self.ds.__repr__().splitlines() + if self.has_data or self.has_attrs: + ds_repr_lines = repr(self.ds).splitlines() ds_repr = ( ds_repr_lines[0] + "\n" diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 5c5aa1b9681..200edbec0b6 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -9,6 +9,7 @@ from xarray import DataArray, Dataset from .treenode import TreeNode +from .utils import removeprefix, removesuffix if TYPE_CHECKING: from .datatree import DataTree @@ -215,7 +216,7 @@ def _map_over_subtree(*args, **kwargs): # Find out how many return values we received num_return_values = _check_all_return_values(out_data_objects) - ancestors_of_new_root = first_tree.pathstr.removesuffix(first_tree.name) + ancestors_of_new_root = removesuffix(first_tree.pathstr, first_tree.name) # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees result_trees = [] @@ -233,7 +234,7 @@ def _map_over_subtree(*args, **kwargs): # Discard parentage so that new trees don't include parents of input nodes # TODO use a proper relative_path method on 
DataTree(/TreeNode) to do this - relative_path = p.removeprefix(ancestors_of_new_root) + relative_path = removeprefix(p, ancestors_of_new_root) out_tree_contents[relative_path] = output_node_data new_tree = DataTree.from_dict( diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index a5b655cf743..0f5d8576967 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,3 +1,5 @@ +import textwrap + import pytest import xarray as xr import xarray.testing as xrt @@ -293,6 +295,20 @@ def test_print_empty_node(self): printout = dt.__str__() assert printout == "DataTree('root', parent=None)" + def test_print_empty_node_with_attrs(self): + dat = xr.Dataset(attrs={"note": "has attrs"}) + dt = DataTree("root", data=dat) + printout = dt.__str__() + assert printout == textwrap.dedent( + """\ + DataTree('root', parent=None) + Dimensions: () + Data variables: + *empty* + Attributes: + note: has attrs""" + ) + def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) dt = DataTree("root", data=dat) diff --git a/xarray/datatree_/datatree/tests/test_utils.py b/xarray/datatree_/datatree/tests/test_utils.py new file mode 100644 index 00000000000..25632d38770 --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_utils.py @@ -0,0 +1,50 @@ +from datatree.utils import removeprefix, removesuffix + + +def checkequal(expected_result, obj, method, *args, **kwargs): + result = method(obj, *args, **kwargs) + assert result == expected_result + + +def checkraises(exc, obj, method, *args): + try: + method(obj, *args) + except Exception as e: + assert isinstance(e, exc) is True + + +def test_removeprefix(): + checkequal("am", "spam", removeprefix, "sp") + checkequal("spamspam", "spamspamspam", removeprefix, "spam") + checkequal("spam", "spam", removeprefix, "python") + checkequal("spam", "spam", removeprefix, "spider") + checkequal("spam", "spam", removeprefix, "spam and eggs") + checkequal("", "", removeprefix, "") + checkequal("", "", removeprefix, "abcde") + checkequal("abcde", "abcde", removeprefix, "") + checkequal("", "abcde", removeprefix, "abcde") + + checkraises(TypeError, "hello", removeprefix) + checkraises(TypeError, "hello", removeprefix, 42) + checkraises(TypeError, "hello", removeprefix, 42, "h") + checkraises(TypeError, "hello", removeprefix, "h", 42) + checkraises(TypeError, "hello", removeprefix, ("he", "l")) + + +def test_removesuffix(): + checkequal("sp", "spam", removesuffix, "am") + checkequal("spamspam", "spamspamspam", removesuffix, "spam") + checkequal("spam", "spam", removesuffix, "python") + checkequal("spam", "spam", removesuffix, "blam") + checkequal("spam", "spam", removesuffix, "eggs and spam") + + checkequal("", "", removesuffix, "") + checkequal("", "", removesuffix, "abcde") + checkequal("abcde", "abcde", removesuffix, "") + checkequal("", "abcde", removesuffix, "abcde") + + checkraises(TypeError, "hello", removesuffix) + checkraises(TypeError, "hello", removesuffix, 42) + checkraises(TypeError, "hello", removesuffix, 42, "h") + checkraises(TypeError, "hello", removesuffix, "h", 42) + checkraises(TypeError, "hello", removesuffix, ("lo", "l")) diff --git a/xarray/datatree_/datatree/utils.py b/xarray/datatree_/datatree/utils.py new file mode 100644 index 00000000000..95d7ec0b23c --- /dev/null +++ b/xarray/datatree_/datatree/utils.py @@ -0,0 +1,19 @@ +import sys + + +def removesuffix(base: str, suffix: str) -> str: + if sys.version_info >= (3, 9): + return 
base.removesuffix(suffix) + else: + if base.endswith(suffix): + return base[: len(base) - len(suffix)] + return base + + +def removeprefix(base: str, prefix: str) -> str: + if sys.version_info >= (3, 9): + return base.removeprefix(prefix) + else: + if base.startswith(prefix): + return base[len(prefix) :] + return base diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py index af5bac3ba89..12ac3a011b0 100644 --- a/xarray/datatree_/setup.py +++ b/xarray/datatree_/setup.py @@ -27,12 +27,13 @@ "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", + "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", ], packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), install_requires=install_requires, - python_requires=">=3.9", + python_requires=">=3.8", use_scm_version={"version_scheme": "post-release", "local_scheme": "dirty-tag"}, setup_requires=["setuptools_scm>=3.4", "setuptools>=42"], ) From 370cd24a1ef70ae65035d287de9c94a34df83f6d Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 31 Mar 2022 14:42:55 -0400 Subject: [PATCH 107/260] bump version to 0.0.4 From 5b087d0b4b0964e86c6d9cd6c6d3a8934b573272 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Fri, 1 Apr 2022 13:46:46 -0600 Subject: [PATCH 108/260] Update installation instructions https://github.com/xarray-contrib/datatree/pull/71 * Update installation instructions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/docs/source/installation.rst | 27 ++++++++++++++----- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst index e2cfeae1067..48799089d4b 100644 --- a/xarray/datatree_/docs/source/installation.rst +++ b/xarray/datatree_/docs/source/installation.rst @@ -2,14 +2,29 @@ Installation ============ -Datatree is not yet available on pypi or via conda, so for now you will have to install it from source. +Datatree can be installed in three ways: -``git clone https://github.com/TomNicholas/datatree.git``` +Using the `conda `__ package manager that comes with the +Anaconda/Miniconda distribution: -``pip install -e ./datatree/`` +.. code:: bash + + $ conda install xarray-datatree --channel conda-forge + +Using the `pip `__ package manager: + +.. code:: bash + + $ python -m pip install xarray-datatree + +To install a development version from source: + +.. code:: bash + + $ git clone https://github.com/xarray-contrib/datatree + $ cd datatree + $ python -m pip install -e . -The main branch will be kept up-to-date, so if you clone main and run the test suite with ``pytest datatree`` and get no failures, -then you have the most up-to-date version. You will need xarray and `anytree `_ as dependencies, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O. @@ -19,4 +34,4 @@ as dependencies, with netcdf4, zarr, and h5netcdf as optional dependencies to al Datatree is very much still in the early stages of development. There may be functions that are present but whose internals are not yet implemented, or significant changes to the API in future. That said, if you try it out and find some behaviour that looks like a bug to you, please report it on the - `issue tracker `_! 
+ `issue tracker `_! From 7cb892b5a46e9d6b7b72760cbc347f2a56a580e0 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Fri, 1 Apr 2022 13:48:11 -0600 Subject: [PATCH 109/260] Remove lint workflow in favor of pre-commit.ci https://github.com/xarray-contrib/datatree/pull/72 --- xarray/datatree_/.github/workflows/main.yaml | 6 ------ 1 file changed, 6 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index b0198a9d952..a1843b3477f 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -9,12 +9,6 @@ on: - cron: "0 0 * * *" jobs: - lint: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v3 - - uses: actions/setup-python@v3 - - uses: pre-commit/action@v2.0.3 test: name: ${{ matrix.python-version }}-build From 4658642174e30cb83621b4db0afc900f579824fa Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 1 Apr 2022 16:06:37 -0400 Subject: [PATCH 110/260] Tree-like repr https://github.com/xarray-contrib/datatree/pull/73 * make __str__ and __repr__ consistent * update docs to match * fix lint error --- xarray/datatree_/datatree/datatree.py | 58 ++----------------- xarray/datatree_/datatree/formatting.py | 40 +++++++++++++ .../datatree_/docs/source/quick-overview.rst | 4 +- 3 files changed, 46 insertions(+), 56 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 7df58070a76..4f93ec2f75d 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,6 +1,5 @@ from __future__ import annotations -import textwrap from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Tuple, Union import anytree @@ -8,6 +7,7 @@ from xarray.core import dtypes, utils from xarray.core.variable import Variable +from .formatting import tree_repr from .mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from .ops import ( DataTreeArithmeticMixin, @@ -208,61 +208,11 @@ def add_child(self, child: TreeNode) -> None: else: child.parent = self - def __str__(self): - """A printable representation of the structure of this entire subtree.""" - renderer = anytree.RenderTree(self) - - lines = [] - for pre, fill, node in renderer: - node_repr = node._single_node_repr() - - node_line = f"{pre}{node_repr.splitlines()[0]}" - lines.append(node_line) - - if node.has_data or node.has_attrs: - ds_repr = node_repr.splitlines()[2:] - for line in ds_repr: - if len(node.children) > 0: - lines.append(f"{fill}{renderer.style.vertical}{line}") - else: - lines.append(f"{fill}{line}") - - # Tack on info about whether or not root node has a parent at the start - first_line = lines[0] - parent = f'"{self.parent.name}"' if self.parent is not None else "None" - first_line_with_parent = first_line[:-1] + f", parent={parent})" - lines[0] = first_line_with_parent - - return "\n".join(lines) - - def _single_node_repr(self): - """Information about this node, not including its relationships to other nodes.""" - node_info = f"DataTree('{self.name}')" - - if self.has_data or self.has_attrs: - ds_info = "\n" + repr(self.ds) - else: - ds_info = "" - return node_info + ds_info - def __repr__(self): - """Information about this node, including its relationships to other nodes.""" - # TODO redo this to look like the Dataset repr, but just with child and parent info - parent = self.parent.name if self.parent is not None else "None" - node_str = 
f"DataTree(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]}," - - if self.has_data or self.has_attrs: - ds_repr_lines = repr(self.ds).splitlines() - ds_repr = ( - ds_repr_lines[0] - + "\n" - + textwrap.indent("\n".join(ds_repr_lines[1:]), " ") - ) - data_str = f"\ndata={ds_repr}\n)" - else: - data_str = "data=None)" + return tree_repr(self) - return node_str + data_str + def __str__(self): + return tree_repr(self) def __getitem__( self, key: Union[PathType, Hashable, Mapping, Any] diff --git a/xarray/datatree_/datatree/formatting.py b/xarray/datatree_/datatree/formatting.py index 9a03be3a0ca..a5c852d6041 100644 --- a/xarray/datatree_/datatree/formatting.py +++ b/xarray/datatree_/datatree/formatting.py @@ -1,3 +1,4 @@ +import anytree from xarray.core.formatting import _compat_to_str, diff_dataset_repr from .mapping import diff_treestructure @@ -44,3 +45,42 @@ def diff_tree_repr(a, b, compat): summary.append("\n" + nodewise_diff) return "\n".join(summary) + + +def tree_repr(dt): + """A printable representation of the structure of this entire tree.""" + renderer = anytree.RenderTree(dt) + + lines = [] + for pre, fill, node in renderer: + node_repr = _single_node_repr(node) + + node_line = f"{pre}{node_repr.splitlines()[0]}" + lines.append(node_line) + + if node.has_data or node.has_attrs: + ds_repr = node_repr.splitlines()[2:] + for line in ds_repr: + if len(node.children) > 0: + lines.append(f"{fill}{renderer.style.vertical}{line}") + else: + lines.append(f"{fill}{line}") + + # Tack on info about whether or not root node has a parent at the start + first_line = lines[0] + parent = f'"{dt.parent.name}"' if dt.parent is not None else "None" + first_line_with_parent = first_line[:-1] + f", parent={parent})" + lines[0] = first_line_with_parent + + return "\n".join(lines) + + +def _single_node_repr(node): + """Information about this node, not including its relationships to other nodes.""" + node_info = f"DataTree('{node.name}')" + + if node.has_data or node.has_attrs: + ds_info = "\n" + repr(node.ds) + else: + ds_info = "" + return node_info + ds_info diff --git a/xarray/datatree_/docs/source/quick-overview.rst b/xarray/datatree_/docs/source/quick-overview.rst index b5ea1d1ffd2..bba3fc695e9 100644 --- a/xarray/datatree_/docs/source/quick-overview.rst +++ b/xarray/datatree_/docs/source/quick-overview.rst @@ -38,7 +38,7 @@ Now we'll put this data into a multi-group tree: dt = DataTree.from_dict( {"root/simulation/coarse": ds, "root/simulation/fine": ds2, "root": ds3} ) - print(dt) + dt This creates a datatree with various groups. We have one root group (named ``root``), containing information about individual people. The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, @@ -68,7 +68,7 @@ Operations map over subtrees, so we can take a mean over the ``x`` dimension of .. ipython:: python avg = dt["simulation"].mean(dim="x") - print(avg) + avg Here the ``"x"`` dimension used is always the one local to that sub-group. 
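The new ``tree_repr`` in ``formatting.py`` appends parent information by rewriting the first rendered line in place. A standalone sketch of just that string manipulation (pure Python, mirroring the ``first_line[:-1] + f", parent={parent})"`` logic in the diff above, with hypothetical node names):

    # Sketch of the first-line rewrite performed by tree_repr: the trailing ")"
    # of "DataTree('name')" is replaced with ", parent=...)".
    from typing import Optional


    def add_parent_info(first_line: str, parent_name: Optional[str]) -> str:
        parent = f'"{parent_name}"' if parent_name is not None else "None"
        return first_line[:-1] + f", parent={parent})"


    # Matches the repr expected by the tests added earlier in this series.
    assert add_parent_info("DataTree('root')", None) == "DataTree('root', parent=None)"
    assert add_parent_info("DataTree('fine')", "sim") == "DataTree('fine', parent=\"sim\")"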
From e985dab86361a46b06a29f3883128fb14cb90877 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 4 Apr 2022 17:42:17 -0600 Subject: [PATCH 111/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/74 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Anderson Banihirwe --- xarray/datatree_/.pre-commit-config.yaml | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 0e1e7192694..60e7db3436c 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -1,27 +1,29 @@ # https://pre-commit.com/ +ci: + autoupdate_schedule: monthly repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.0.1 + rev: v4.1.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort - rev: 5.9.3 + rev: 5.10.1 hooks: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 21.7b0 + rev: 22.3.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc rev: v0.3.4 hooks: - id: blackdoc - - repo: https://gitlab.com/pycqa/flake8 - rev: 3.9.2 + - repo: https://github.com/PyCQA/flake8 + rev: 4.0.1 hooks: - id: flake8 # - repo: https://github.com/Carreau/velin From 76d86664b3fa60b2ae5307fd163ad535aea8d743 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 27 Apr 2022 17:30:19 -0400 Subject: [PATCH 112/260] Child dict refactor (also removes anytree dependency) https://github.com/xarray-contrib/datatree/pull/76 * draft implementation of a TreeNode class which stores children in a dict * separate path-like access out into mixin * pseudocode for node getter * basic idea for a path-like object which inherits from pathlib * pass type checking * implement attach * consolidate tree classes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * passes some basic family tree tests * frozen children * passes all basic family tree tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copied iterators code over from anytree * get nodes with path-like syntax * relative path method * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set and get node methods * copy anytree iterators * add anytree license * change iterator import * copy anytree's string renderer * renderer * refactored treenode to use .get * black * updated datatree tests to match new path API * moved io tests to their own file * reimplemented getitem in terms of .get * reimplemented setitem in terms of .update * remove anytree dependency * from_dict constructor * string representation of tree * fixed tree diff * fixed io * removed cheeky print statements * fixed isomorphism checking * fixed map_over_subtree * removed now-uneeded utils.py compatibility functions * fixed tests for mapped dataset api methods * updated API docs * reimplement __setitem__ in terms of _set * fixed bug by ensuring name of child node is changed to match key it is stored under * updated docs * added whats-new, and put all changes from this PR in it * added summary of 
previous versions * remove outdated ._add_child method Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/README.md | 6 +- xarray/datatree_/ci/doc.yml | 1 - xarray/datatree_/ci/environment.yml | 1 - xarray/datatree_/datatree/datatree.py | 386 ++++++------ xarray/datatree_/datatree/formatting.py | 15 +- xarray/datatree_/datatree/io.py | 80 +-- xarray/datatree_/datatree/iterators.py | 116 ++++ xarray/datatree_/datatree/mapping.py | 29 +- xarray/datatree_/datatree/render.py | 271 ++++++++ .../datatree/tests/test_dataset_api.py | 56 +- .../datatree_/datatree/tests/test_datatree.py | 345 +++++----- .../datatree/tests/test_formatting.py | 12 +- xarray/datatree_/datatree/tests/test_io.py | 56 ++ .../datatree_/datatree/tests/test_mapping.py | 58 +- .../datatree_/datatree/tests/test_treenode.py | 445 +++++++------ xarray/datatree_/datatree/tests/test_utils.py | 50 -- xarray/datatree_/datatree/treenode.py | 591 +++++++++++++----- xarray/datatree_/datatree/utils.py | 19 - xarray/datatree_/docs/source/api.rst | 110 +++- xarray/datatree_/docs/source/index.rst | 7 +- .../datatree_/docs/source/quick-overview.rst | 9 +- xarray/datatree_/docs/source/whats-new.rst | 95 +++ xarray/datatree_/licenses/ANYTREE_LICENSE | 201 ++++++ xarray/datatree_/requirements.txt | 1 - 24 files changed, 1987 insertions(+), 973 deletions(-) create mode 100644 xarray/datatree_/datatree/iterators.py create mode 100644 xarray/datatree_/datatree/render.py create mode 100644 xarray/datatree_/datatree/tests/test_io.py delete mode 100644 xarray/datatree_/datatree/tests/test_utils.py delete mode 100644 xarray/datatree_/datatree/utils.py create mode 100644 xarray/datatree_/docs/source/whats-new.rst create mode 100644 xarray/datatree_/licenses/ANYTREE_LICENSE diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 63d9bd8e0e1..78830f65816 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -5,8 +5,8 @@ This aims to create the data structure discussed in [xarray issue #4118](https:/ The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. The differences are that this effort: -- [Uses a NodeMixin from anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree structure, -- Implements path-like and tag-like getting and setting, +- Uses a node structure inspired by [anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree, +- Implements path-like getting and setting, - Has functions for mapping user-supplied functions over every node in the tree, - Automatically dispatches *some* of `xarray.Dataset`'s API over every node in the tree (such as `.isel`), - Has a bunch of tests, @@ -17,5 +17,5 @@ You can create a `DataTree` object in 3 ways: 1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. 2) Using the init method of `DataTree`, which creates an individual node. You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, - or through `__get/setitem__` access, e.g. `dt['path/to/node'] = xr.Dataset()`. + or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`. 3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. 
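A short usage sketch of creation routes (2) and (3) described above, written against the post-refactor API introduced in this patch (name=/parent= keyword arguments, and "/" as the root path in from_dict) as exercised by the tests later in this diff; treat it as illustrative rather than canonical:

    import xarray as xr
    from datatree import DataTree

    # (2) create nodes individually and link them through parent=
    root = DataTree(name="root", data=xr.Dataset({"a": ("x", [1, 2, 3])}))
    DataTree(name="fine", parent=root, data=xr.Dataset({"b": 4}))

    # (3) or build an equivalent tree from a dict of path -> Dataset,
    #     using "/" to assign data to the root node
    dt = DataTree.from_dict(
        {"/": xr.Dataset({"a": ("x", [1, 2, 3])}), "fine": xr.Dataset({"b": 4})}
    )
    print(dt)
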
diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 91240069cf1..0a20f516948 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -6,7 +6,6 @@ dependencies: - python>=3.8 - xarray>=0.20.2 - netcdf4 - - anytree - scipy - sphinx - sphinx-copybutton diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index 239ece5e21e..c5d58977e08 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -6,7 +6,6 @@ dependencies: - python>=3.8 - xarray>=0.20.2 - netcdf4 - - anytree - pytest - flake8 - black diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 4f93ec2f75d..50f943b070f 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,8 +1,17 @@ from __future__ import annotations -from typing import Any, Callable, Dict, Hashable, Iterable, List, Mapping, Tuple, Union +from typing import ( + TYPE_CHECKING, + Any, + Callable, + Hashable, + Iterable, + Mapping, + MutableMapping, + Tuple, + Union, +) -import anytree from xarray import DataArray, Dataset, merge from xarray.core import dtypes, utils from xarray.core.variable import Variable @@ -14,7 +23,11 @@ MappedDatasetMethodsMixin, MappedDataWithCoords, ) -from .treenode import PathType, TreeNode +from .render import RenderTree +from .treenode import NodePath, TreeNode + +if TYPE_CHECKING: + from xarray.core.merge import CoercibleValue # """ # DEVELOPERS' NOTE @@ -30,6 +43,9 @@ # """ +T_Path = Union[str, NodePath] + + class DataTree( TreeNode, MappedDatasetMethodsMixin, @@ -42,8 +58,6 @@ class DataTree( Attempts to present an API like that of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. """ - # TODO should this instead be a subclass of Dataset? - # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?) # TODO ipython autocomplete for child nodes @@ -54,35 +68,36 @@ class DataTree( # TODO do we need a watch out for if methods intended only for root nodes are called on non-root nodes? - # TODO currently allows self.ds = None, should we instead always store at least an empty Dataset? - # TODO dataset methods which should not or cannot act over the whole tree, such as .to_array # TODO del and delitem methods # TODO .loc, __contains__, __iter__, __array__, __len__ + _name: str | None + _ds: Dataset | None + def __init__( self, - name: Hashable = "root", - data: Union[Dataset, DataArray] = None, - parent: TreeNode = None, - children: List[TreeNode] = None, + data: Dataset | DataArray = None, + parent: DataTree = None, + children: Mapping[str, DataTree] = None, + name: str = None, ): """ Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. Parameters ---------- - name : Hashable - Name for the root node of the tree. Default is "root" data : Dataset, DataArray, Variable or None, optional Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. Default is None. - parent : TreeNode, optional + parent : DataTree, optional Parent node to this node. Default is None. - children : Sequence[TreeNode], optional + children : Mapping[str, DataTree], optional Any child nodes of this node. Default is None. + name : str, optional + Name for the root node of the tree. 
Returns ------- @@ -93,11 +108,29 @@ def __init__( DataTree.from_dict """ - super().__init__(name, parent=parent, children=children) + super().__init__(children=children) + self._name = name + self.parent = parent self.ds = data + @property + def name(self) -> str | None: + """The name of this node.""" + return self._name + + @name.setter + def name(self, name: str | None) -> None: + self._name = name + + @TreeNode.parent.setter + def parent(self, new_parent: DataTree) -> None: + if new_parent and self.name is None: + raise ValueError("Cannot set an unnamed node as a child of another node") + self._set_parent(new_parent, self.name) + @property def ds(self) -> Dataset: + """The data in this node, returned as a Dataset.""" return self._ds @ds.setter @@ -113,7 +146,7 @@ def ds(self, data: Union[Dataset, DataArray] = None): data = Dataset() for var in list(data.variables): - if var in list(c.name for c in self.children): + if var in self.children: raise KeyError( f"Cannot add variable named {var}: node already has a child named {var}" ) @@ -122,67 +155,18 @@ def ds(self, data: Union[Dataset, DataArray] = None): @property def has_data(self) -> bool: + """Whether or not there are any data variables in this node.""" return len(self.ds.variables) > 0 @property def has_attrs(self) -> bool: + """Whether or not there are any metadata attributes in this node.""" return len(self.ds.attrs.keys()) > 0 - @classmethod - def from_dict( - cls, - data_objects: Dict[PathType, Union[Dataset, DataArray, None]] = None, - name: Hashable = "root", - ): - """ - Create a datatree from a dictionary of data objects, labelled by paths into the tree. - - Parameters - ---------- - data_objects : dict-like, optional - A mapping from path names to xarray.Dataset, xarray.DataArray, or DataTree objects. - - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). If path names containing more than one tag are given, new - tree nodes will be constructed as necessary. - - To assign data to the root node of the tree use {name} as the path. - name : Hashable, optional - Name for the root node of the tree. Default is "root" - - Returns - ------- - DataTree - """ - - # First create the root node - if data_objects: - root_data = data_objects.pop(name, None) - else: - root_data = None - obj = cls(name=name, data=root_data, parent=None, children=None) - - if data_objects: - # Populate tree with children determined from data_objects mapping - for path, data in data_objects.items(): - # Determine name of new node - path = obj._tuple_or_path_to_path(path) - if obj.separator in path: - node_path, node_name = path.rsplit(obj.separator, maxsplit=1) - else: - node_path, node_name = "/", path - - relative_path = node_path.replace(obj.name, "") - - # Create and set new node - new_node = cls(name=node_name, data=data) - obj.set_node( - relative_path, - new_node, - allow_overwrite=False, - new_nodes_along_path=True, - ) - return obj + @property + def is_empty(self) -> bool: + """False if node contains any data or attrs. Does not look at children.""" + return not (self.has_data or self.has_attrs) def _pre_attach(self, parent: TreeNode) -> None: """ @@ -195,86 +179,79 @@ def _pre_attach(self, parent: TreeNode) -> None: f"parent {parent.name} already contains a data variable named {self.name}" ) - def add_child(self, child: TreeNode) -> None: - """ - Add a single child node below this node, without replacement. 
- - Will raise a KeyError if either a child or data variable already exists with this name. - """ - if child.name in list(c.name for c in self.children): - raise KeyError(f"Node already has a child named {child.name}") - elif self.has_data and child.name in list(self.ds.variables): - raise KeyError(f"Node already contains a data variable named {child.name}") - else: - child.parent = self - def __repr__(self): return tree_repr(self) def __str__(self): return tree_repr(self) - def __getitem__( - self, key: Union[PathType, Hashable, Mapping, Any] - ) -> Union[TreeNode, Dataset, DataArray]: + def get( + self, key: str, default: DataTree | DataArray = None + ) -> DataTree | DataArray | None: """ - Access either child nodes, variables, or coordinates stored in this tree. + Access child nodes stored in this node as a DataTree or variables or coordinates stored in this node as a + DataArray. - Variables or coordinates of the contained dataset will be returned as a :py:class:`~xarray.DataArray`. - Indexing with a list of names will return a new ``Dataset`` object. + Parameters + ---------- + key : str + Name of variable / node item, which must lie in this immediate node (not elsewhere in the tree). + default : DataTree | DataArray, optional + A value to return if the specified key does not exist. + Default value is None. + """ + if key in self.children: + return self.children[key] + elif key in self.ds: + return self.ds[key] + else: + return default - Like Dataset.__getitem__ this method also accepts dict-like indexing, and selection of multiple data variables - (from the same Dataset node) via list. + def __getitem__(self, key: str) -> DataTree | DataArray: + """ + Access child nodes stored in this tree as a DataTree or variables or coordinates stored in this tree as a + DataArray. Parameters ---------- - key : - Paths to nodes or to data variables in nodes can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). + key : str + Name of variable / node, or unix-like path to variable / node. """ # Either: if utils.is_dict_like(key): - # dict-like selection on dataset variables - return self.ds[key] - elif utils.hashable(key): - # path-like: a path to a node possibly with a variable name at the end - return self._get_item_from_path(key) - elif utils.is_list_like(key) and all(k in self.ds for k in key): + # dict-like indexing + raise NotImplementedError("Should this index over whole tree?") + elif isinstance(key, str): + # TODO should possibly deal with hashables in general? + # path-like: a name of a node/variable, or path to a node/variable + path = NodePath(key) + return self._get_item(path) + elif utils.is_list_like(key): # iterable of variable names - return self.ds[key] - elif utils.is_list_like(key) and all("/" not in tag for tag in key): - # iterable of child tags - return self._get_item_from_path(key) + raise NotImplementedError( + "Selecting via tags is deprecated, and selecting multiple items should be " + "implemented via .subset" + ) else: raise ValueError("Invalid format for key") - def _get_item_from_path( - self, path: PathType - ) -> Union[TreeNode, Dataset, DataArray]: - """Get item given a path. 
Two valid cases: either all parts of path are nodes or last part is a variable.""" - - # TODO this currently raises a ChildResolverError if it can't find a data variable in the ds - that's inconsistent with xarray.Dataset.__getitem__ - - path = self._tuple_or_path_to_path(path) - tags = [ - tag for tag in path.split(self.separator) if tag not in [self.separator, ""] - ] - *leading_tags, last_tag = tags - - if leading_tags is not None: - penultimate = self.get_node(tuple(leading_tags)) - else: - penultimate = self + def _set(self, key: str, val: DataTree | CoercibleValue) -> None: + """ + Set the child node or variable with the specified key to value. - if penultimate.has_data and last_tag in penultimate.ds: - return penultimate.ds[last_tag] + Counterpart to the public .get method, and also only works on the immediate node, not other nodes in the tree. + """ + if isinstance(val, DataTree): + val.name = key + val.parent = self + elif isinstance(val, (DataArray, Variable)): + # TODO this should also accomodate other types that can be coerced into Variables + self.ds[key] = val else: - return penultimate.get_node(last_tag) + raise TypeError(f"Type {type(val)} cannot be assigned to a DataTree") def __setitem__( - self, - key: Union[Hashable, List[Hashable], Mapping, PathType], - value: Union[TreeNode, Dataset, DataArray, Variable, None], + self, key: str, value: DataTree | Dataset | DataArray | Variable ) -> None: """ Add either a child node or an array to the tree, at any position. @@ -283,87 +260,84 @@ def __setitem__( If there is already a node at the given location, then if value is a Node class or Dataset it will overwrite the data already present at that node, and if value is a single array, it will be merged with it. + """ + # TODO xarray.Dataset accepts other possibilities, how do we exactly replicate all the behaviour? + if utils.is_dict_like(key): + raise NotImplementedError + elif isinstance(key, str): + # TODO should possibly deal with hashables in general? + # path-like: a name of a node/variable, or path to a node/variable + path = NodePath(key) + return self._set_item(path, value, new_nodes_along_path=True) + else: + raise ValueError("Invalid format for key") + + def update(self, other: Dataset | Mapping[str, DataTree | CoercibleValue]) -> None: + """ + Update this node's children and / or variables. - If value is None a new node will be created but containing no data. If a node already exists at that path it - will have its .ds attribute set to None. (To remove node from the tree completely instead use `del tree[path]`.) + Just like `dict.update` this is an in-place operation. + """ + # TODO separate by type + new_children = {} + new_variables = {} + for k, v in other.items(): + if isinstance(v, DataTree): + new_children[k] = v + elif isinstance(v, (DataArray, Variable)): + # TODO this should also accomodate other types that can be coerced into Variables + new_variables[k] = v + elif isinstance(v, Dataset): + new_variables = v.variables + else: + raise TypeError(f"Type {type(v)} cannot be assigned to a DataTree") + + super().update(new_children) + self.ds.update(new_variables) + + @classmethod + def from_dict( + cls, + d: MutableMapping[str, Any], + name: str = None, + ) -> DataTree: + """ + Create a datatree from a dictionary of data objects, labelled by paths into the tree. Parameters ---------- - key - A path-like address for either a new node, or the address and name of a new variable, or the name of a new - variable. - value - Can be a node class or a data object (i.e. 
Dataset, DataArray, Variable). + d : dict-like + A mapping from path names to xarray.Dataset, xarray.DataArray, or DataTree objects. + + Path names are to be given as unix-like path. If path names containing more than one part are given, new + tree nodes will be constructed as necessary. + + To assign data to the root node of the tree use "/" as the path. + name : Hashable, optional + Name for the root node of the tree. Default is None. + + Returns + ------- + DataTree """ - # TODO xarray.Dataset accepts other possibilities, how do we exactly replicate all the behaviour? - if utils.is_dict_like(key): - raise NotImplementedError + # First create the root node + root_data = d.pop("/", None) + obj = cls(name=name, data=root_data, parent=None, children=None) - path = self._tuple_or_path_to_path(key) - tags = [ - tag for tag in path.split(self.separator) if tag not in [self.separator, ""] - ] - - # TODO a .path_as_tags method? - if not tags: - # only dealing with this node, no need for paths - if isinstance(value, (Dataset, DataArray, Variable)): - # single arrays will replace whole Datasets, as no name for new variable was supplied - self.ds = value - elif isinstance(value, TreeNode): - self.add_child(value) - elif value is None: - self.ds = None - else: - raise TypeError( - "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}" + if d: + # Populate tree with children determined from data_objects mapping + for path, data in d.items(): + # Create and set new node + node_name = NodePath(path).name + new_node = cls(name=node_name, data=data) + obj._set_item( + path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, ) - else: - *path_tags, last_tag = tags - if not path_tags: - path_tags = "/" - - # get anything that already exists at that location - try: - existing_node = self.get_node(path) - except anytree.resolver.ResolverError: - existing_node = None - - if existing_node is not None: - if isinstance(value, Dataset): - # replace whole dataset - existing_node.ds = Dataset - elif isinstance(value, (DataArray, Variable)): - if not existing_node.has_data: - # promotes da to ds - existing_node.ds = value - else: - # update with new da - existing_node.ds[last_tag] = value - elif isinstance(value, TreeNode): - # overwrite with new node at same path - self.set_node(path=path, node=value) - elif value is None: - existing_node.ds = None - else: - raise TypeError( - "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}" - ) - else: - # if nothing there then make new node based on type of object - if isinstance(value, (Dataset, DataArray, Variable)) or value is None: - new_node = DataTree(name=last_tag, data=value) - self.set_node(path=path_tags, node=new_node) - elif isinstance(value, TreeNode): - self.set_node(path=path, node=value) - else: - raise TypeError( - "Can only assign values of type TreeNode, Dataset, DataArray, or Variable, " - f"not {type(value)}" - ) + return obj @property def nbytes(self) -> int: @@ -537,7 +511,7 @@ def map_over_subtree_inplace( def render(self): """Print tree structure, including any data stored at each node.""" - for pre, fill, node in anytree.RenderTree(self): + for pre, fill, node in RenderTree(self): print(f"{pre}DataTree('{self.name}')") for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") @@ -571,13 +545,13 @@ def merge(self, datatree: DataTree) -> DataTree: """Merge all the leaves of a second DataTree into this one.""" raise NotImplementedError - def 
merge_child_nodes(self, *paths, new_path: PathType) -> DataTree: + def merge_child_nodes(self, *paths, new_path: T_Path) -> DataTree: """Merge a set of child nodes into a single new node.""" raise NotImplementedError def merge_child_datasets( self, - *paths: PathType, + *paths: T_Path, compat: str = "no_conflicts", join: str = "outer", fill_value: Any = dtypes.NA, @@ -599,7 +573,7 @@ def as_array(self) -> DataArray: @property def groups(self): """Return all netCDF4 groups in the tree, given as a tuple of path-like strings.""" - return tuple(node.pathstr for node in self.subtree) + return tuple(node.path for node in self.subtree) def to_netcdf( self, filepath, mode: str = "w", encoding=None, unlimited_dims=None, **kwargs diff --git a/xarray/datatree_/datatree/formatting.py b/xarray/datatree_/datatree/formatting.py index a5c852d6041..7b66c4e13c0 100644 --- a/xarray/datatree_/datatree/formatting.py +++ b/xarray/datatree_/datatree/formatting.py @@ -1,7 +1,12 @@ -import anytree +from typing import TYPE_CHECKING + from xarray.core.formatting import _compat_to_str, diff_dataset_repr from .mapping import diff_treestructure +from .render import RenderTree + +if TYPE_CHECKING: + from .datatree import DataTree def diff_nodewise_summary(a, b, compat): @@ -14,12 +19,12 @@ def diff_nodewise_summary(a, b, compat): a_ds, b_ds = node_a.ds, node_b.ds if not a_ds._all_compat(b_ds, compat): - path = node_a.pathstr dataset_diff = diff_dataset_repr(a_ds, b_ds, compat_str) data_diff = "\n".join(dataset_diff.split("\n", 1)[1:]) nodediff = ( - f"\nData in nodes at position '{path}' do not match:" f"{data_diff}" + f"\nData in nodes at position '{node_a.path}' do not match:" + f"{data_diff}" ) summary.append(nodediff) @@ -49,7 +54,7 @@ def diff_tree_repr(a, b, compat): def tree_repr(dt): """A printable representation of the structure of this entire tree.""" - renderer = anytree.RenderTree(dt) + renderer = RenderTree(dt) lines = [] for pre, fill, node in renderer: @@ -75,7 +80,7 @@ def tree_repr(dt): return "\n".join(lines) -def _single_node_repr(node): +def _single_node_repr(node: "DataTree") -> str: """Information about this node, not including its relationships to other nodes.""" node_info = f"DataTree('{node.name}')" diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 36fc93defed..06e9b88436c 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -1,32 +1,24 @@ -import pathlib from typing import Sequence -from xarray import open_dataset +from xarray import Dataset, open_dataset -from .datatree import DataTree, PathType +from .datatree import DataTree, NodePath, T_Path -def _ds_or_none(ds): - """return none if ds is empty""" - if any(ds.coords) or any(ds.variables) or any(ds.attrs): - return ds - return None - - -def _iter_zarr_groups(root, parrent=""): - parrent = pathlib.Path(parrent) +def _iter_zarr_groups(root, parent="/"): + parent = NodePath(parent) for path, group in root.groups(): - gpath = parrent / path + gpath = parent / path yield str(gpath) - yield from _iter_zarr_groups(group, parrent=gpath) + yield from _iter_zarr_groups(group, parent=gpath) -def _iter_nc_groups(root, parrent=""): - parrent = pathlib.Path(parrent) +def _iter_nc_groups(root, parent="/"): + parent = NodePath(parent) for path, group in root.groups.items(): - gpath = parrent / path + gpath = parent / path yield str(gpath) - yield from _iter_nc_groups(group, parrent=gpath) + yield from _iter_nc_groups(group, parent=gpath) def _get_nc_dataset_class(engine): @@ -72,11 +64,19 
@@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: ncDataset = _get_nc_dataset_class(kwargs.get("engine", None)) with ncDataset(filename, mode="r") as ncds: - ds = open_dataset(filename, **kwargs).pipe(_ds_or_none) - tree_root = DataTree.from_dict(data_objects={"root": ds}) - for key in _iter_nc_groups(ncds): - tree_root[key] = open_dataset(filename, group=key, **kwargs).pipe( - _ds_or_none + ds = open_dataset(filename, **kwargs) + tree_root = DataTree.from_dict({"/": ds}) + for path in _iter_nc_groups(ncds): + subgroup_ds = open_dataset(filename, group=path, **kwargs) + + # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again + node_name = NodePath(path).name + new_node = DataTree(name=node_name, data=subgroup_ds) + tree_root._set_item( + path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, ) return tree_root @@ -85,20 +85,28 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: import zarr with zarr.open_group(store, mode="r") as zds: - ds = open_dataset(store, engine="zarr", **kwargs).pipe(_ds_or_none) - tree_root = DataTree.from_dict(data_objects={"root": ds}) - for key in _iter_zarr_groups(zds): + ds = open_dataset(store, engine="zarr", **kwargs) + tree_root = DataTree.from_dict({"/": ds}) + for path in _iter_zarr_groups(zds): try: - tree_root[key] = open_dataset( - store, engine="zarr", group=key, **kwargs - ).pipe(_ds_or_none) + subgroup_ds = open_dataset(store, engine="zarr", group=path, **kwargs) except zarr.errors.PathNotFoundError: - tree_root[key] = None + subgroup_ds = Dataset() + + # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again + node_name = NodePath(path).name + new_node = DataTree(name=node_name, data=subgroup_ds) + tree_root._set_item( + path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) return tree_root def open_mfdatatree( - filepaths, rootnames: Sequence[PathType] = None, chunks=None, **kwargs + filepaths, rootnames: Sequence[T_Path] = None, chunks=None, **kwargs ) -> DataTree: """ Open multiple files as a single DataTree. 
@@ -168,7 +176,7 @@ def _datatree_to_netcdf( for node in dt.subtree: ds = node.ds - group_path = node.pathstr.replace(dt.root.pathstr, "") + group_path = node.path if ds is None: _create_empty_netcdf_group(filepath, group_path, mode, engine) else: @@ -177,8 +185,8 @@ def _datatree_to_netcdf( filepath, group=group_path, mode=mode, - encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), - unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.pathstr), + encoding=_maybe_extract_group_kwargs(encoding, dt.path), + unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.path), **kwargs, ) mode = "a" @@ -215,7 +223,7 @@ def _datatree_to_zarr( for node in dt.subtree: ds = node.ds - group_path = node.pathstr.replace(dt.root.pathstr, "") + group_path = node.path if ds is None: _create_empty_zarr_group(store, group_path, mode) else: @@ -223,7 +231,7 @@ def _datatree_to_zarr( store, group=group_path, mode=mode, - encoding=_maybe_extract_group_kwargs(encoding, dt.pathstr), + encoding=_maybe_extract_group_kwargs(encoding, dt.path), consolidated=False, **kwargs, ) diff --git a/xarray/datatree_/datatree/iterators.py b/xarray/datatree_/datatree/iterators.py new file mode 100644 index 00000000000..8e34fa0c141 --- /dev/null +++ b/xarray/datatree_/datatree/iterators.py @@ -0,0 +1,116 @@ +from abc import abstractmethod +from collections import abc +from typing import Callable, Iterator, List + +from .treenode import TreeNode + +"""These iterators are copied from anytree.iterators, with minor modifications.""" + + +class AbstractIter(abc.Iterator): + def __init__( + self, + node: TreeNode, + filter_: Callable = None, + stop: Callable = None, + maxlevel: int = None, + ): + """ + Iterate over tree starting at `node`. + Base class for all iterators. + Keyword Args: + filter_: function called with every `node` as argument, `node` is returned if `True`. + stop: stop iteration at `node` if `stop` function returns `True` for `node`. + maxlevel (int): maximum descending in the node hierarchy. + """ + self.node = node + self.filter_ = filter_ + self.stop = stop + self.maxlevel = maxlevel + self.__iter = None + + def __init(self): + node = self.node + maxlevel = self.maxlevel + filter_ = self.filter_ or AbstractIter.__default_filter + stop = self.stop or AbstractIter.__default_stop + children = ( + [] + if AbstractIter._abort_at_level(1, maxlevel) + else AbstractIter._get_children([node], stop) + ) + return self._iter(children, filter_, stop, maxlevel) + + @staticmethod + def __default_filter(node): + return True + + @staticmethod + def __default_stop(node): + return False + + def __iter__(self) -> Iterator[TreeNode]: + return self + + def __next__(self) -> TreeNode: + if self.__iter is None: + self.__iter = self.__init() + item = next(self.__iter) + return item + + @staticmethod + @abstractmethod + def _iter(children: List[TreeNode], filter_, stop, maxlevel) -> Iterator[TreeNode]: + ... + + @staticmethod + def _abort_at_level(level, maxlevel): + return maxlevel is not None and level > maxlevel + + @staticmethod + def _get_children(children: List[TreeNode], stop) -> List[TreeNode]: + return [child for child in children if not stop(child)] + + +class PreOrderIter(AbstractIter): + """ + Iterate over tree applying pre-order strategy starting at `node`. + Start at root and go-down until reaching a leaf node. + Step upwards then, and search for the next leafs. 
+ """ + + @staticmethod + def _iter(children, filter_, stop, maxlevel): + for child_ in children: + if stop(child_): + continue + if filter_(child_): + yield child_ + if not AbstractIter._abort_at_level(2, maxlevel): + descendantmaxlevel = maxlevel - 1 if maxlevel else None + for descendant_ in PreOrderIter._iter( + list(child_.children.values()), filter_, stop, descendantmaxlevel + ): + yield descendant_ + + +class LevelOrderIter(AbstractIter): + """ + Iterate over tree applying level-order strategy starting at `node`. + """ + + @staticmethod + def _iter(children, filter_, stop, maxlevel): + level = 1 + while children: + next_children = [] + for child in children: + if filter_(child): + yield child + next_children += AbstractIter._get_children( + list(child.children.values()), stop + ) + children = next_children + level += 1 + if AbstractIter._abort_at_level(level, maxlevel): + break diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 200edbec0b6..f669fda6166 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -5,18 +5,17 @@ from textwrap import dedent from typing import TYPE_CHECKING, Callable, Tuple -from anytree.iterators import LevelOrderIter from xarray import DataArray, Dataset -from .treenode import TreeNode -from .utils import removeprefix, removesuffix +from .iterators import LevelOrderIter +from .treenode import NodePath, TreeNode if TYPE_CHECKING: from .datatree import DataTree class TreeIsomorphismError(ValueError): - """Error raised if two tree objects are not isomorphic to one another when they need to be.""" + """Error raised if two tree objects do not share the same node structure.""" pass @@ -24,8 +23,8 @@ class TreeIsomorphismError(ValueError): def check_isomorphic( a: DataTree, b: DataTree, - require_names_equal=False, - check_from_root=True, + require_names_equal: bool = False, + check_from_root: bool = True, ): """ Check that two trees have the same structure, raising an error if not. @@ -82,7 +81,7 @@ def diff_treestructure(a: DataTree, b: DataTree, require_names_equal: bool) -> s # Checking for isomorphism by walking in this way implicitly assumes that the tree is an ordered tree # (which it is so long as children are stored in a tuple or list rather than in a set). for node_a, node_b in zip(LevelOrderIter(a), LevelOrderIter(b)): - path_a, path_b = node_a.pathstr, node_b.pathstr + path_a, path_b = node_a.path, node_b.path if require_names_equal: if node_a.name != node_b.name: @@ -206,24 +205,23 @@ def _map_over_subtree(*args, **kwargs): # Now we can call func on the data in this particular set of corresponding nodes results = ( func(*node_args_as_datasets, **node_kwargs_as_datasets) - if node_of_first_tree.has_data + if not node_of_first_tree.is_empty else None ) # TODO implement mapping over multiple trees in-place using if conditions from here on? 
- out_data_objects[node_of_first_tree.pathstr] = results + out_data_objects[node_of_first_tree.path] = results # Find out how many return values we received num_return_values = _check_all_return_values(out_data_objects) - ancestors_of_new_root = removesuffix(first_tree.pathstr, first_tree.name) - # Reconstruct 1+ subtrees from the dict of results, by filling in all nodes of all result trees + original_root_path = first_tree.path result_trees = [] for i in range(num_return_values): out_tree_contents = {} for n in first_tree.subtree: - p = n.pathstr + p = n.path if p in out_data_objects.keys(): if isinstance(out_data_objects[p], tuple): output_node_data = out_data_objects[p][i] @@ -233,12 +231,13 @@ def _map_over_subtree(*args, **kwargs): output_node_data = None # Discard parentage so that new trees don't include parents of input nodes - # TODO use a proper relative_path method on DataTree(/TreeNode) to do this - relative_path = removeprefix(p, ancestors_of_new_root) + relative_path = str(NodePath(p).relative_to(original_root_path)) + relative_path = "/" if relative_path == "." else relative_path out_tree_contents[relative_path] = output_node_data new_tree = DataTree.from_dict( - name=first_tree.name, data_objects=out_tree_contents + out_tree_contents, + name=first_tree.name, ) result_trees.append(new_tree) diff --git a/xarray/datatree_/datatree/render.py b/xarray/datatree_/datatree/render.py new file mode 100644 index 00000000000..aef327c5c47 --- /dev/null +++ b/xarray/datatree_/datatree/render.py @@ -0,0 +1,271 @@ +""" +String Tree Rendering. Copied from anytree. +""" + +import collections +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from .datatree import DataTree + +Row = collections.namedtuple("Row", ("pre", "fill", "node")) + + +class AbstractStyle(object): + def __init__(self, vertical, cont, end): + """ + Tree Render Style. + Args: + vertical: Sign for vertical line. + cont: Chars for a continued branch. + end: Chars for the last branch. + """ + super(AbstractStyle, self).__init__() + self.vertical = vertical + self.cont = cont + self.end = end + assert ( + len(cont) == len(vertical) == len(end) + ), f"'{vertical}', '{cont}' and '{end}' need to have equal length" + + @property + def empty(self): + """Empty string as placeholder.""" + return " " * len(self.end) + + def __repr__(self): + return f"{self.__class__.__name__}()" + + +class ContStyle(AbstractStyle): + def __init__(self): + """ + Continued style, without gaps. + + >>> from anytree import Node, RenderTree + >>> root = Node("root") + >>> s0 = Node("sub0", parent=root) + >>> s0b = Node("sub0B", parent=s0) + >>> s0a = Node("sub0A", parent=s0) + >>> s1 = Node("sub1", parent=root) + >>> print(RenderTree(root, style=ContStyle())) + + Node('/root') + ├── Node('/root/sub0') + │ ├── Node('/root/sub0/sub0B') + │ └── Node('/root/sub0/sub0A') + └── Node('/root/sub1') + """ + super(ContStyle, self).__init__( + "\u2502 ", "\u251c\u2500\u2500 ", "\u2514\u2500\u2500 " + ) + + +class RenderTree(object): + def __init__( + self, node: "DataTree", style=ContStyle(), childiter=list, maxlevel=None + ): + """ + Render tree starting at `node`. + Keyword Args: + style (AbstractStyle): Render Style. + childiter: Child iterator. + maxlevel: Limit rendering to this depth. + :any:`RenderTree` is an iterator, returning a tuple with 3 items: + `pre` + tree prefix. + `fill` + filling for multiline entries. + `node` + :any:`NodeMixin` object. + It is up to the user to assemble these parts to a whole. 
+ >>> from anytree import Node, RenderTree + >>> root = Node("root", lines=["c0fe", "c0de"]) + >>> s0 = Node("sub0", parent=root, lines=["ha", "ba"]) + >>> s0b = Node("sub0B", parent=s0, lines=["1", "2", "3"]) + >>> s0a = Node("sub0A", parent=s0, lines=["a", "b"]) + >>> s1 = Node("sub1", parent=root, lines=["Z"]) + Simple one line: + >>> for pre, _, node in RenderTree(root): + ... print("%s%s" % (pre, node.name)) + ... + root + ├── sub0 + │ ├── sub0B + │ └── sub0A + └── sub1 + Multiline: + >>> for pre, fill, node in RenderTree(root): + ... print("%s%s" % (pre, node.lines[0])) + ... for line in node.lines[1:]: + ... print("%s%s" % (fill, line)) + ... + c0fe + c0de + ├── ha + │ ba + │ ├── 1 + │ │ 2 + │ │ 3 + │ └── a + │ b + └── Z + `maxlevel` limits the depth of the tree: + >>> print(RenderTree(root, maxlevel=2)) + Node('/root', lines=['c0fe', 'c0de']) + ├── Node('/root/sub0', lines=['ha', 'ba']) + └── Node('/root/sub1', lines=['Z']) + The `childiter` is responsible for iterating over child nodes at the + same level. An reversed order can be achived by using `reversed`. + >>> for row in RenderTree(root, childiter=reversed): + ... print("%s%s" % (row.pre, row.node.name)) + ... + root + ├── sub1 + └── sub0 + ├── sub0A + └── sub0B + Or writing your own sort function: + >>> def mysort(items): + ... return sorted(items, key=lambda item: item.name) + ... + >>> for row in RenderTree(root, childiter=mysort): + ... print("%s%s" % (row.pre, row.node.name)) + ... + root + ├── sub0 + │ ├── sub0A + │ └── sub0B + └── sub1 + :any:`by_attr` simplifies attribute rendering and supports multiline: + >>> print(RenderTree(root).by_attr()) + root + ├── sub0 + │ ├── sub0B + │ └── sub0A + └── sub1 + >>> print(RenderTree(root).by_attr("lines")) + c0fe + c0de + ├── ha + │ ba + │ ├── 1 + │ │ 2 + │ │ 3 + │ └── a + │ b + └── Z + And can be a function: + >>> print(RenderTree(root).by_attr(lambda n: " ".join(n.lines))) + c0fe c0de + ├── ha ba + │ ├── 1 2 3 + │ └── a b + └── Z + """ + if not isinstance(style, AbstractStyle): + style = style() + self.node = node + self.style = style + self.childiter = childiter + self.maxlevel = maxlevel + + def __iter__(self): + return self.__next(self.node, tuple()) + + def __next(self, node, continues, level=0): + yield RenderTree.__item(node, continues, self.style) + children = node.children.values() + level += 1 + if children and (self.maxlevel is None or level < self.maxlevel): + children = self.childiter(children) + for child, is_last in _is_last(children): + for grandchild in self.__next( + child, continues + (not is_last,), level=level + ): + yield grandchild + + @staticmethod + def __item(node, continues, style): + if not continues: + return Row("", "", node) + else: + items = [style.vertical if cont else style.empty for cont in continues] + indent = "".join(items[:-1]) + branch = style.cont if continues[-1] else style.end + pre = indent + branch + fill = "".join(items) + return Row(pre, fill, node) + + def __str__(self): + lines = ["%s%r" % (pre, node) for pre, _, node in self] + return "\n".join(lines) + + def __repr__(self): + classname = self.__class__.__name__ + args = [ + repr(self.node), + "style=%s" % repr(self.style), + "childiter=%s" % repr(self.childiter), + ] + return "%s(%s)" % (classname, ", ".join(args)) + + def by_attr(self, attrname="name"): + """ + Return rendered tree with node attribute `attrname`. 
+ >>> from anytree import AnyNode, RenderTree + >>> root = AnyNode(id="root") + >>> s0 = AnyNode(id="sub0", parent=root) + >>> s0b = AnyNode(id="sub0B", parent=s0, foo=4, bar=109) + >>> s0a = AnyNode(id="sub0A", parent=s0) + >>> s1 = AnyNode(id="sub1", parent=root) + >>> s1a = AnyNode(id="sub1A", parent=s1) + >>> s1b = AnyNode(id="sub1B", parent=s1, bar=8) + >>> s1c = AnyNode(id="sub1C", parent=s1) + >>> s1ca = AnyNode(id="sub1Ca", parent=s1c) + >>> print(RenderTree(root).by_attr("id")) + root + ├── sub0 + │ ├── sub0B + │ └── sub0A + └── sub1 + ├── sub1A + ├── sub1B + └── sub1C + └── sub1Ca + """ + + def get(): + for pre, fill, node in self: + attr = ( + attrname(node) + if callable(attrname) + else getattr(node, attrname, "") + ) + if isinstance(attr, (list, tuple)): + lines = attr + else: + lines = str(attr).split("\n") + yield "%s%s" % (pre, lines[0]) + for line in lines[1:]: + yield "%s%s" % (fill, line) + + return "\n".join(get()) + + +def _is_last(iterable): + iter_ = iter(iterable) + try: + nextitem = next(iter_) + except StopIteration: + pass + else: + item = nextitem + while True: + try: + nextitem = next(iter_) + yield item, False + except StopIteration: + yield nextitem, True + break + item = nextitem diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index 9bc57d47da0..f8bae063383 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -10,44 +10,44 @@ class TestDSMethodInheritance: def test_dataset_method(self): ds = xr.Dataset({"a": ("x", [1, 2, 3])}) - dt = DataTree("root", data=ds) - DataTree("results", parent=dt, data=ds) + dt = DataTree(data=ds) + DataTree(name="results", parent=dt, data=ds) - expected = DataTree("root", data=ds.isel(x=1)) - DataTree("results", parent=expected, data=ds.isel(x=1)) + expected = DataTree(data=ds.isel(x=1)) + DataTree(name="results", parent=expected, data=ds.isel(x=1)) result = dt.isel(x=1) assert_equal(result, expected) def test_reduce_method(self): ds = xr.Dataset({"a": ("x", [False, True, False])}) - dt = DataTree("root", data=ds) - DataTree("results", parent=dt, data=ds) + dt = DataTree(data=ds) + DataTree(name="results", parent=dt, data=ds) - expected = DataTree("root", data=ds.any()) - DataTree("results", parent=expected, data=ds.any()) + expected = DataTree(data=ds.any()) + DataTree(name="results", parent=expected, data=ds.any()) result = dt.any() assert_equal(result, expected) def test_nan_reduce_method(self): ds = xr.Dataset({"a": ("x", [1, 2, 3])}) - dt = DataTree("root", data=ds) - DataTree("results", parent=dt, data=ds) + dt = DataTree(data=ds) + DataTree(name="results", parent=dt, data=ds) - expected = DataTree("root", data=ds.mean()) - DataTree("results", parent=expected, data=ds.mean()) + expected = DataTree(data=ds.mean()) + DataTree(name="results", parent=expected, data=ds.mean()) result = dt.mean() assert_equal(result, expected) def test_cum_method(self): ds = xr.Dataset({"a": ("x", [1, 2, 3])}) - dt = DataTree("root", data=ds) - DataTree("results", parent=dt, data=ds) + dt = DataTree(data=ds) + DataTree(name="results", parent=dt, data=ds) - expected = DataTree("root", data=ds.cumsum()) - DataTree("results", parent=expected, data=ds.cumsum()) + expected = DataTree(data=ds.cumsum()) + DataTree(name="results", parent=expected, data=ds.cumsum()) result = dt.cumsum() assert_equal(result, expected) @@ -57,11 +57,11 @@ class TestOps: def test_binary_op_on_int(self): ds1 = xr.Dataset({"a": [5], 
"b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataTree(data=ds1) + DataTree(name="subnode", data=ds2, parent=dt) - expected = DataTree("root", data=ds1 * 5) - DataTree("subnode", data=ds2 * 5, parent=expected) + expected = DataTree(data=ds1 * 5) + DataTree(name="subnode", data=ds2 * 5, parent=expected) result = dt * 5 assert_equal(result, expected) @@ -69,12 +69,12 @@ def test_binary_op_on_int(self): def test_binary_op_on_dataset(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataTree(data=ds1) + DataTree(name="subnode", data=ds2, parent=dt) other_ds = xr.Dataset({"z": ("z", [0.1, 0.2])}) - expected = DataTree("root", data=ds1 * other_ds) - DataTree("subnode", data=ds2 * other_ds, parent=expected) + expected = DataTree(data=ds1 * other_ds) + DataTree(name="subnode", data=ds2 * other_ds, parent=expected) result = dt * other_ds assert_equal(result, expected) @@ -82,11 +82,11 @@ def test_binary_op_on_dataset(self): def test_binary_op_on_datatree(self): ds1 = xr.Dataset({"a": [5], "b": [3]}) ds2 = xr.Dataset({"x": [0.1, 0.2], "y": [10, 20]}) - dt = DataTree("root", data=ds1) - DataTree("subnode", data=ds2, parent=dt) + dt = DataTree(data=ds1) + DataTree(name="subnode", data=ds2, parent=dt) - expected = DataTree("root", data=ds1 * ds1) - DataTree("subnode", data=ds2 * ds2, parent=expected) + expected = DataTree(data=ds1 * ds1) + DataTree(name="subnode", data=ds2 * ds2, parent=expected) result = dt * dt assert_equal(result, expected) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 0f5d8576967..3bf28a3aac6 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -3,19 +3,15 @@ import pytest import xarray as xr import xarray.testing as xrt -from anytree.resolver import ChildResolverError from datatree import DataTree -from datatree.io import open_datatree -from datatree.testing import assert_equal -from datatree.tests import requires_h5netcdf, requires_netCDF4, requires_zarr def create_test_datatree(modify=lambda ds: ds): """ Create a test datatree with this structure: - + |-- set1 | |-- | | Dimensions: () @@ -46,7 +42,7 @@ def create_test_datatree(modify=lambda ds: ds): root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) # Avoid using __init__ so we can independently test it - root = DataTree(name="root", data=root_data) + root = DataTree(data=root_data) set1 = DataTree(name="set1", parent=root, data=set1_data) DataTree(name="set1", parent=set1) DataTree(name="set2", parent=set1) @@ -57,16 +53,37 @@ def create_test_datatree(modify=lambda ds: ds): return root +class TestTreeCreation: + def test_empty(self): + dt = DataTree(name="root") + assert dt.name == "root" + assert dt.parent is None + assert dt.children == {} + xrt.assert_identical(dt.ds, xr.Dataset()) + + def test_unnamed(self): + dt = DataTree() + assert dt.name is None + + +class TestFamilyTree: + def test_setparent_unnamed_child_node_fails(self): + john = DataTree(name="john") + with pytest.raises(ValueError, match="unnamed"): + DataTree(parent=john) + + class TestStoreDatasets: - def test_create_DataTree(self): + def test_create_with_data(self): dat = xr.Dataset({"a": 0}) - john = DataTree("john", data=dat) + john = DataTree(name="john", data=dat) 
assert john.ds is dat + with pytest.raises(TypeError): - DataTree("mary", parent=john, data="junk") + DataTree(name="mary", parent=john, data="junk") # noqa def test_set_data(self): - john = DataTree("john") + john = DataTree(name="john") dat = xr.Dataset({"a": 0}) john.ds = dat assert john.ds is dat @@ -74,25 +91,22 @@ def test_set_data(self): john.ds = "junk" def test_has_data(self): - john = DataTree("john", data=xr.Dataset({"a": 0})) + john = DataTree(name="john", data=xr.Dataset({"a": 0})) assert john.has_data - john = DataTree("john", data=None) + john = DataTree(name="john", data=None) assert not john.has_data class TestVariablesChildrenNameCollisions: def test_parent_already_has_variable_with_childs_name(self): - dt = DataTree("root", data=xr.Dataset({"a": [0], "b": 1})) - with pytest.raises(KeyError, match="already contains a data variable named a"): - DataTree("a", data=None, parent=dt) - + dt = DataTree(data=xr.Dataset({"a": [0], "b": 1})) with pytest.raises(KeyError, match="already contains a data variable named a"): - dt.add_child(DataTree("a", data=None)) + DataTree(name="a", data=None, parent=dt) def test_assign_when_already_child_with_variables_name(self): - dt = DataTree("root", data=None) - DataTree("a", data=None, parent=dt) + dt = DataTree(data=None) + DataTree(name="a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds = xr.Dataset({"a": 0}) @@ -103,181 +117,217 @@ def test_assign_when_already_child_with_variables_name(self): @pytest.mark.xfail def test_update_when_already_child_with_variables_name(self): # See issue https://github.com/xarray-contrib/datatree/issues/38 - dt = DataTree("root", data=None) - DataTree("a", data=None, parent=dt) + dt = DataTree(name="root", data=None) + DataTree(name="a", data=None, parent=dt) with pytest.raises(KeyError, match="already has a child named a"): dt.ds["a"] = xr.DataArray(0) -class TestGetItems: - def test_get_node(self): - folder1 = DataTree("folder1") - results = DataTree("results", parent=folder1) - highres = DataTree("highres", parent=results) +class TestGet: + ... 
+ + +class TestGetItem: + def test_getitem_node(self): + folder1 = DataTree(name="folder1") + results = DataTree(name="results", parent=folder1) + highres = DataTree(name="highres", parent=results) assert folder1["results"] is results assert folder1["results/highres"] is highres - assert folder1[("results", "highres")] is highres - def test_get_single_data_variable(self): + def test_getitem_self(self): + dt = DataTree() + assert dt["."] is dt + + def test_getitem_single_data_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=data) + results = DataTree(name="results", data=data) xrt.assert_identical(results["temp"], data["temp"]) - def test_get_single_data_variable_from_node(self): + def test_getitem_single_data_variable_from_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") - results = DataTree("results", parent=folder1) - DataTree("highres", parent=results, data=data) + folder1 = DataTree(name="folder1") + results = DataTree(name="results", parent=folder1) + DataTree(name="highres", parent=results, data=data) xrt.assert_identical(folder1["results/highres/temp"], data["temp"]) - xrt.assert_identical(folder1[("results", "highres", "temp")], data["temp"]) - def test_get_nonexistent_node(self): - folder1 = DataTree("folder1") - DataTree("results", parent=folder1) - with pytest.raises(ChildResolverError): + def test_getitem_nonexistent_node(self): + folder1 = DataTree(name="folder1") + DataTree(name="results", parent=folder1) + with pytest.raises(KeyError): folder1["results/highres"] - def test_get_nonexistent_variable(self): + def test_getitem_nonexistent_variable(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=data) - with pytest.raises(ChildResolverError): + results = DataTree(name="results", data=data) + with pytest.raises(KeyError): results["pressure"] - def test_get_multiple_data_variables(self): + @pytest.mark.xfail(reason="Should be deprecated in favour of .subset") + def test_getitem_multiple_data_variables(self): data = xr.Dataset({"temp": [0, 50], "p": [5, 8, 7]}) - results = DataTree("results", data=data) + results = DataTree(name="results", data=data) xrt.assert_identical(results[["temp", "p"]], data[["temp", "p"]]) - def test_dict_like_selection_access_to_dataset(self): + @pytest.mark.xfail(reason="Indexing needs to return whole tree (GH https://github.com/xarray-contrib/datatree/issues/77)") + def test_getitem_dict_like_selection_access_to_dataset(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=data) + results = DataTree(name="results", data=data) xrt.assert_identical(results[{"temp": 1}], data[{"temp": 1}]) -class TestSetItems: - # TODO test tuple-style access too - def test_set_new_child_node(self): - john = DataTree("john") - mary = DataTree("mary") - john["/"] = mary - assert john["mary"] is mary - - def test_set_new_grandchild_node(self): - john = DataTree("john") - DataTree("mary", parent=john) - rose = DataTree("rose") - john["mary/"] = rose - assert john["mary/rose"] is rose - - def test_set_new_empty_node(self): - john = DataTree("john") - john["mary"] = None +class TestUpdate: + ... 
+ + +class TestSetItem: + def test_setitem_new_child_node(self): + john = DataTree(name="john") + mary = DataTree(name="mary") + john["Mary"] = mary + assert john["Mary"] is mary + + def test_setitem_unnamed_child_node_becomes_named(self): + john2 = DataTree(name="john2") + john2["sonny"] = DataTree() + assert john2["sonny"].name == "sonny" + + @pytest.mark.xfail(reason="bug with name overwriting") + def test_setitem_child_node_keeps_name(self): + john = DataTree(name="john") + r2d2 = DataTree(name="R2D2") + john["Mary"] = r2d2 + assert r2d2.name == "R2D2" + + def test_setitem_new_grandchild_node(self): + john = DataTree(name="john") + DataTree(name="mary", parent=john) + rose = DataTree(name="rose") + john["Mary/Rose"] = rose + assert john["Mary/Rose"] is rose + + def test_setitem_new_empty_node(self): + john = DataTree(name="john") + john["mary"] = DataTree() mary = john["mary"] assert isinstance(mary, DataTree) xrt.assert_identical(mary.ds, xr.Dataset()) - def test_overwrite_data_in_node_with_none(self): - john = DataTree("john") - mary = DataTree("mary", parent=john, data=xr.Dataset()) - john["mary"] = None + def test_setitem_overwrite_data_in_node_with_none(self): + john = DataTree(name="john") + mary = DataTree(name="mary", parent=john, data=xr.Dataset()) + john["mary"] = DataTree() xrt.assert_identical(mary.ds, xr.Dataset()) john.ds = xr.Dataset() - john["/"] = None - xrt.assert_identical(john.ds, xr.Dataset()) + with pytest.raises(ValueError, match="has no name"): + john["."] = DataTree() - def test_set_dataset_on_this_node(self): + @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") + def test_setitem_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results") - results["/"] = data + results = DataTree(name="results") + results["."] = data assert results.ds is data - def test_set_dataset_as_new_node(self): + @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") + def test_setitem_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") + folder1 = DataTree(name="folder1") folder1["results"] = data assert folder1["results"].ds is data - def test_set_dataset_as_new_node_requiring_intermediate_nodes(self): + @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") + def test_setitem_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) - folder1 = DataTree("folder1") + folder1 = DataTree(name="folder1") folder1["results/highres"] = data assert folder1["results/highres"].ds is data - def test_set_named_dataarray_as_new_node(self): + def test_setitem_named_dataarray(self): data = xr.DataArray(name="temp", data=[0, 50]) - folder1 = DataTree("folder1") + folder1 = DataTree(name="folder1") folder1["results"] = data - xrt.assert_identical(folder1["results"].ds, data.to_dataset()) + expected = data.rename("results") + xrt.assert_equal(folder1["results"], expected) - def test_set_unnamed_dataarray(self): + def test_setitem_unnamed_dataarray(self): data = xr.DataArray([0, 50]) - folder1 = DataTree("folder1") - with pytest.raises(ValueError, match="unable to convert"): - folder1["results"] = data + folder1 = DataTree(name="folder1") + folder1["results"] = data + xrt.assert_equal(folder1["results"], data) - def test_add_new_variable_to_empty_node(self): - results = DataTree("results") - results["/"] = xr.DataArray(name="pressure", data=[2, 3]) + def test_setitem_add_new_variable_to_empty_node(self): + results = 
DataTree(name="results") + results["pressure"] = xr.DataArray(data=[2, 3]) assert "pressure" in results.ds + results["temp"] = xr.Variable(data=[10, 11], dims=["x"]) + assert "temp" in results.ds # What if there is a path to traverse first? - results = DataTree("results") - results["highres/"] = xr.DataArray(name="pressure", data=[2, 3]) + results = DataTree(name="results") + results["highres/pressure"] = xr.DataArray(data=[2, 3]) assert "pressure" in results["highres"].ds + results["highres/temp"] = xr.Variable(data=[10, 11], dims=["x"]) + assert "temp" in results["highres"].ds - def test_dataarray_replace_existing_node(self): + def test_setitem_dataarray_replace_existing_node(self): t = xr.Dataset({"temp": [0, 50]}) - results = DataTree("results", data=t) - p = xr.DataArray(name="pressure", data=[2, 3]) - results["/"] = p - xrt.assert_identical(results.ds, p.to_dataset()) + results = DataTree(name="results", data=t) + p = xr.DataArray(data=[2, 3]) + results["pressure"] = p + expected = t.assign(pressure=p) + xrt.assert_identical(results.ds, expected) -class TestTreeCreation: - def test_empty(self): - dt = DataTree() - assert dt.name == "root" - assert dt.parent is None - assert dt.children == () - xrt.assert_identical(dt.ds, xr.Dataset()) +class TestDictionaryInterface: + ... + +class TestTreeFromDict: def test_data_in_root(self): dat = xr.Dataset() - dt = DataTree.from_dict({"root": dat}) - assert dt.name == "root" + dt = DataTree.from_dict({"/": dat}) + assert dt.name is None assert dt.parent is None - assert dt.children == () + assert dt.children == {} assert dt.ds is dat def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) xrt.assert_identical(dt.ds, xr.Dataset()) + assert dt.name is None assert dt["run1"].ds is dat1 - assert dt["run1"].children == () + assert dt["run1"].children == {} assert dt["run2"].ds is dat2 - assert dt["run2"].children == () + assert dt["run2"].children == {} def test_two_layers(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"a": [1, 2]}) dt = DataTree.from_dict({"highres/run": dat1, "lowres/run": dat2}) - assert "highres" in [c.name for c in dt.children] - assert "lowres" in [c.name for c in dt.children] - highres_run = dt.get_node("highres/run") + assert "highres" in dt.children + assert "lowres" in dt.children + highres_run = dt["highres/run"] assert highres_run.ds is dat1 + def test_nones(self): + dt = DataTree.from_dict({"d": None, "d/e": None}) + assert [node.name for node in dt.subtree] == [None, "d", "e"] + assert [node.path for node in dt.subtree] == ["/", "/d", "/d/e"] + xrt.assert_equal(dt["d/e"].ds, xr.Dataset()) + def test_full(self): dt = create_test_datatree() - paths = list(node.pathstr for node in dt.subtree) + paths = list(node.path for node in dt.subtree) assert paths == [ - "root", - "root/set1", - "root/set1/set1", - "root/set1/set2", - "root/set2", - "root/set2/set1", - "root/set3", + "/", + "/set1", + "/set1/set1", + "/set1/set2", + "/set2", + "/set2/set1", + "/set3", ] @@ -291,13 +341,13 @@ class TestRestructuring: class TestRepr: def test_print_empty_node(self): - dt = DataTree("root") + dt = DataTree(name="root") printout = dt.__str__() assert printout == "DataTree('root', parent=None)" def test_print_empty_node_with_attrs(self): dat = xr.Dataset(attrs={"note": "has attrs"}) - dt = DataTree("root", data=dat) + dt = DataTree(name="root", data=dat) printout = dt.__str__() assert printout == textwrap.dedent( """\ @@ -311,7 +361,7 @@ def 
test_print_empty_node_with_attrs(self): def test_print_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree("root", data=dat) + dt = DataTree(name="root", data=dat) printout = dt.__str__() expected = [ "DataTree('root', parent=None)", @@ -326,69 +376,18 @@ def test_print_node_with_data(self): def test_nested_node(self): dat = xr.Dataset({"a": [0, 2]}) - root = DataTree("root") - DataTree("results", data=dat, parent=root) + root = DataTree(name="root") + DataTree(name="results", data=dat, parent=root) printout = root.__str__() assert printout.splitlines()[2].startswith(" ") def test_print_datatree(self): dt = create_test_datatree() print(dt) - print(dt.descendants) # TODO work out how to test something complex like this def test_repr_of_node_with_data(self): dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree("root", data=dat) + dt = DataTree(name="root", data=dat) assert "Coordinates" in repr(dt) - - -class TestIO: - @requires_netCDF4 - def test_to_netcdf(self, tmpdir): - filepath = str( - tmpdir / "test.nc" - ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() - original_dt.to_netcdf(filepath, engine="netcdf4") - - roundtrip_dt = open_datatree(filepath) - assert_equal(original_dt, roundtrip_dt) - - @requires_h5netcdf - def test_to_h5netcdf(self, tmpdir): - filepath = str( - tmpdir / "test.nc" - ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() - original_dt.to_netcdf(filepath, engine="h5netcdf") - - roundtrip_dt = open_datatree(filepath) - assert_equal(original_dt, roundtrip_dt) - - @requires_zarr - def test_to_zarr(self, tmpdir): - filepath = str( - tmpdir / "test.zarr" - ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() - original_dt.to_zarr(filepath) - - roundtrip_dt = open_datatree(filepath, engine="zarr") - assert_equal(original_dt, roundtrip_dt) - - @requires_zarr - def test_to_zarr_not_consolidated(self, tmpdir): - filepath = tmpdir / "test.zarr" - zmetadata = filepath / ".zmetadata" - s1zmetadata = filepath / "set1" / ".zmetadata" - filepath = str(filepath) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() - original_dt.to_zarr(filepath, consolidated=False) - assert not zmetadata.exists() - assert not s1zmetadata.exists() - - with pytest.warns(RuntimeWarning, match="consolidated"): - roundtrip_dt = open_datatree(filepath, engine="zarr") - assert_equal(original_dt, roundtrip_dt) diff --git a/xarray/datatree_/datatree/tests/test_formatting.py b/xarray/datatree_/datatree/tests/test_formatting.py index ba582a07bd4..995a7c85fb4 100644 --- a/xarray/datatree_/datatree/tests/test_formatting.py +++ b/xarray/datatree_/datatree/tests/test_formatting.py @@ -15,8 +15,8 @@ def test_diff_structure(self): """\ Left and right DataTree objects are not isomorphic - Number of children on node 'root/a' of the left object: 2 - Number of children on node 'root/d' of the right object: 1""" + Number of children on node '/a' of the left object: 2 + Number of children on node '/d' of the right object: 1""" ) actual = diff_tree_repr(dt_1, dt_2, "isomorphic") assert actual == expected @@ -29,8 +29,8 @@ def test_diff_node_names(self): """\ Left and right DataTree objects are not identical - Node 'root/a' in the left object has name 'a' - Node 'root/b' in the right object has name 'b'""" + Node '/a' in the left object has name 'a' + Node '/b' in the right object has name 'b'""" ) actual = diff_tree_repr(dt_1, dt_2, "identical") assert actual == 
expected @@ -48,12 +48,12 @@ def test_diff_node_data(self): Left and right DataTree objects are not equal - Data in nodes at position 'root/a' do not match: + Data in nodes at position '/a' do not match: Data variables only on the left object: v int64 1 - Data in nodes at position 'root/a/b' do not match: + Data in nodes at position '/a/b' do not match: Differing data variables: L w int64 5 diff --git a/xarray/datatree_/datatree/tests/test_io.py b/xarray/datatree_/datatree/tests/test_io.py new file mode 100644 index 00000000000..659f0c31463 --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_io.py @@ -0,0 +1,56 @@ +import pytest + +from datatree.io import open_datatree +from datatree.testing import assert_equal +from datatree.tests import requires_h5netcdf, requires_netCDF4, requires_zarr +from datatree.tests.test_datatree import create_test_datatree + + +class TestIO: + @requires_netCDF4 + def test_to_netcdf(self, tmpdir): + filepath = str( + tmpdir / "test.nc" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_netcdf(filepath, engine="netcdf4") + + roundtrip_dt = open_datatree(filepath) + assert_equal(original_dt, roundtrip_dt) + + @requires_h5netcdf + def test_to_h5netcdf(self, tmpdir): + filepath = str( + tmpdir / "test.nc" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_netcdf(filepath, engine="h5netcdf") + + roundtrip_dt = open_datatree(filepath) + assert_equal(original_dt, roundtrip_dt) + + @requires_zarr + def test_to_zarr(self, tmpdir): + filepath = str( + tmpdir / "test.zarr" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_zarr(filepath) + + roundtrip_dt = open_datatree(filepath, engine="zarr") + assert_equal(original_dt, roundtrip_dt) + + @requires_zarr + def test_to_zarr_not_consolidated(self, tmpdir): + filepath = tmpdir / "test.zarr" + zmetadata = filepath / ".zmetadata" + s1zmetadata = filepath / "set1" / ".zmetadata" + filepath = str(filepath) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + original_dt.to_zarr(filepath, consolidated=False) + assert not zmetadata.exists() + assert not s1zmetadata.exists() + + with pytest.warns(RuntimeWarning, match="consolidated"): + roundtrip_dt = open_datatree(filepath, engine="zarr") + assert_equal(original_dt, roundtrip_dt) diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 8ea4682b137..0bdd3be6f44 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -4,7 +4,6 @@ from datatree.datatree import DataTree from datatree.mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from datatree.testing import assert_equal -from datatree.treenode import TreeNode from .test_datatree import create_test_datatree @@ -17,73 +16,62 @@ def test_not_a_tree(self): check_isomorphic("s", 1) def test_different_widths(self): - dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"b": empty, "c": empty}) + dt1 = DataTree.from_dict(d={"a": empty}) + dt2 = DataTree.from_dict(d={"b": empty, "c": empty}) expected_err_str = ( - "Number of children on node 'root' of the left object: 1\n" - "Number of children on node 'root' of the right object: 2" + "Number of children on node '/' of the left object: 1\n" + "Number of children on node '/' of the right object: 2" ) 
with pytest.raises(TreeIsomorphismError, match=expected_err_str): check_isomorphic(dt1, dt2) def test_different_heights(self): - dt1 = DataTree.from_dict(data_objects={"a": empty}) - dt2 = DataTree.from_dict(data_objects={"b": empty, "b/c": empty}) + dt1 = DataTree.from_dict({"a": empty}) + dt2 = DataTree.from_dict({"b": empty, "b/c": empty}) expected_err_str = ( - "Number of children on node 'root/a' of the left object: 0\n" - "Number of children on node 'root/b' of the right object: 1" + "Number of children on node '/a' of the left object: 0\n" + "Number of children on node '/b' of the right object: 1" ) with pytest.raises(TreeIsomorphismError, match=expected_err_str): check_isomorphic(dt1, dt2) def test_names_different(self): - dt1 = DataTree.from_dict(data_objects={"a": xr.Dataset()}) - dt2 = DataTree.from_dict(data_objects={"b": empty}) + dt1 = DataTree.from_dict({"a": xr.Dataset()}) + dt2 = DataTree.from_dict({"b": empty}) expected_err_str = ( - "Node 'root/a' in the left object has name 'a'\n" - "Node 'root/b' in the right object has name 'b'" + "Node '/a' in the left object has name 'a'\n" + "Node '/b' in the right object has name 'b'" ) with pytest.raises(TreeIsomorphismError, match=expected_err_str): check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_names_equal(self): - dt1 = DataTree.from_dict( - data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} - ) - dt2 = DataTree.from_dict( - data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} - ) + dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) + dt2 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) check_isomorphic(dt1, dt2, require_names_equal=True) def test_isomorphic_ordering(self): - dt1 = DataTree.from_dict( - data_objects={"a": empty, "b": empty, "b/d": empty, "b/c": empty} - ) - dt2 = DataTree.from_dict( - data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} - ) + dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/d": empty, "b/c": empty}) + dt2 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) check_isomorphic(dt1, dt2, require_names_equal=False) def test_isomorphic_names_not_equal(self): - dt1 = DataTree.from_dict( - data_objects={"a": empty, "b": empty, "b/c": empty, "b/d": empty} - ) - dt2 = DataTree.from_dict( - data_objects={"A": empty, "B": empty, "B/C": empty, "B/D": empty} - ) + dt1 = DataTree.from_dict({"a": empty, "b": empty, "b/c": empty, "b/d": empty}) + dt2 = DataTree.from_dict({"A": empty, "B": empty, "B/C": empty, "B/D": empty}) check_isomorphic(dt1, dt2) def test_not_isomorphic_complex_tree(self): dt1 = create_test_datatree() dt2 = create_test_datatree() - dt2.set_node("set1/set2", TreeNode("set3")) - with pytest.raises(TreeIsomorphismError, match="root/set1/set2"): + dt2["set1/set2/extra"] = DataTree(name="extra") + with pytest.raises(TreeIsomorphismError, match="/set1/set2"): check_isomorphic(dt1, dt2) def test_checking_from_root(self): dt1 = create_test_datatree() dt2 = create_test_datatree() - dt1.parent = DataTree(name="real_root") + real_root = DataTree() + real_root["fake_root"] = dt2 with pytest.raises(TreeIsomorphismError): check_isomorphic(dt1, dt2, check_from_root=True) @@ -100,7 +88,7 @@ def times_ten(ds): def test_not_isomorphic(self): dt1 = create_test_datatree() dt2 = create_test_datatree() - dt2["set4"] = None + dt2["set1/set2/extra"] = DataTree(name="extra") @map_over_subtree def times_ten(ds1, ds2): diff --git 
a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 0c86af16dfa..1b4ebc94daf 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -1,271 +1,306 @@ import pytest -from anytree.node.exceptions import TreeError -from anytree.resolver import ChildResolverError -from datatree.treenode import TreeNode +from datatree.iterators import LevelOrderIter, PreOrderIter +from datatree.treenode import TreeError, TreeNode class TestFamilyTree: def test_lonely(self): - root = TreeNode("root") - assert root.name == "root" + root = TreeNode() assert root.parent is None - assert root.children == () + assert root.children == {} def test_parenting(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) + john = TreeNode() + mary = TreeNode() + mary._set_parent(john, "Mary") assert mary.parent == john - assert mary in john.children - - with pytest.raises(KeyError, match="already has a child named"): - TreeNode("mary", parent=john) - - with pytest.raises(TreeError, match="not of type 'NodeMixin'"): - mary.parent = "apple" + assert john.children["Mary"] is mary def test_parent_swap(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) + john = TreeNode() + mary = TreeNode() + mary._set_parent(john, "Mary") + + steve = TreeNode() + mary._set_parent(steve, "Mary") - steve = TreeNode("steve") - mary.parent = steve - assert mary in steve.children - assert mary not in john.children + assert mary.parent == steve + assert steve.children["Mary"] is mary + assert "Mary" not in john.children def test_multi_child_family(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - john = TreeNode("john", children=[mary, kate]) - assert mary in john.children - assert kate in john.children + mary = TreeNode() + kate = TreeNode() + john = TreeNode(children={"Mary": mary, "Kate": kate}) + assert john.children["Mary"] is mary + assert john.children["Kate"] is kate assert mary.parent is john assert kate.parent is john def test_disown_child(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - mary.parent = None - assert mary not in john.children - - def test_add_child(self): - john = TreeNode("john") - kate = TreeNode("kate") - john.add_child(kate) - assert kate in john.children - assert kate.parent is john - with pytest.raises(KeyError, match="already has a child named"): - john.add_child(TreeNode("kate")) + mary = TreeNode() + john = TreeNode(children={"Mary": mary}) + mary.orphan() + assert mary.parent is None + assert "Mary" not in john.children + + def test_doppelganger_child(self): + kate = TreeNode() + john = TreeNode() - def test_assign_children(self): - john = TreeNode("john") - jack = TreeNode("jack") - jill = TreeNode("jill") + with pytest.raises(TypeError): + john.children = {"Kate": 666} - john.children = (jack, jill) - assert jack in john.children - assert jack.parent is john - assert jill in john.children - assert jill.parent is john + with pytest.raises(TreeError, match="Cannot add same node"): + john.children = {"Kate": kate, "Evil_Kate": kate} - evil_twin_jill = TreeNode("jill") - with pytest.raises(KeyError, match="already has a child named"): - john.children = (jack, jill, evil_twin_jill) + john = TreeNode(children={"Kate": kate}) + evil_kate = TreeNode() + evil_kate._set_parent(john, "Kate") + assert john.children["Kate"] is evil_kate def test_sibling_relationships(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - ashley = 
TreeNode("ashley") - john = TreeNode("john", children=[mary, kate, ashley]) - assert mary in kate.siblings - assert ashley in kate.siblings - assert kate not in kate.siblings - with pytest.raises(AttributeError): - kate.siblings = john - - @pytest.mark.xfail - def test_adoption(self): - raise NotImplementedError - - @pytest.mark.xfail - def test_root(self): - raise NotImplementedError - - @pytest.mark.xfail - def test_ancestors(self): - raise NotImplementedError + mary = TreeNode() + kate = TreeNode() + ashley = TreeNode() + TreeNode(children={"Mary": mary, "Kate": kate, "Ashley": ashley}) + assert kate.siblings["Mary"] is mary + assert kate.siblings["Ashley"] is ashley + assert "Kate" not in kate.siblings - @pytest.mark.xfail - def test_descendants(self): - raise NotImplementedError + def test_ancestors(self): + tony = TreeNode() + michael = TreeNode(children={"Tony": tony}) + vito = TreeNode(children={"Michael": michael}) + assert tony.root is vito + assert tony.lineage == (tony, michael, vito) + assert tony.ancestors == (vito, michael, tony) class TestGetNodes: def test_get_child(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - assert john.get_node("mary") is mary - assert john.get_node(("mary",)) is mary - - def test_get_nonexistent_child(self): - john = TreeNode("john") - TreeNode("jill", parent=john) - with pytest.raises(ChildResolverError): - john.get_node("mary") - - def test_get_grandchild(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - sue = TreeNode("sue", parent=mary) - assert john.get_node("mary/sue") is sue - assert john.get_node(("mary", "sue")) is sue - - def test_get_great_grandchild(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - sue = TreeNode("sue", parent=mary) - steven = TreeNode("steven", parent=sue) - assert john.get_node("mary/sue/steven") is steven - assert john.get_node(("mary", "sue", "steven")) is steven - - def test_get_from_middle_of_tree(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - sue = TreeNode("sue", parent=mary) - steven = TreeNode("steven", parent=sue) - assert mary.get_node("sue/steven") is steven - assert mary.get_node(("sue", "steven")) is steven + steven = TreeNode() + sue = TreeNode(children={"Steven": steven}) + mary = TreeNode(children={"Sue": sue}) + john = TreeNode(children={"Mary": mary}) + + # get child + assert john._get_item("Mary") is mary + assert mary._get_item("Sue") is sue + + # no child exists + with pytest.raises(KeyError): + john._get_item("Kate") + + # get grandchild + assert john._get_item("Mary/Sue") is sue + + # get great-grandchild + assert john._get_item("Mary/Sue/Steven") is steven + + # get from middle of tree + assert mary._get_item("Sue/Steven") is steven + + def test_get_upwards(self): + sue = TreeNode() + kate = TreeNode() + mary = TreeNode(children={"Sue": sue, "Kate": kate}) + john = TreeNode(children={"Mary": mary}) + + assert sue._get_item("../") is mary + assert sue._get_item("../../") is john + + # relative path + assert sue._get_item("../Kate") is kate + + def test_get_from_root(self): + sue = TreeNode() + mary = TreeNode(children={"Sue": sue}) + john = TreeNode(children={"Mary": mary}) # noqa + + assert sue._get_item("/Mary") is mary + + +class TestPaths: + def test_path_property(self): + sue = TreeNode() + mary = TreeNode(children={"Sue": sue}) + john = TreeNode(children={"Mary": mary}) # noqa + assert sue.path == "/Mary/Sue" + assert john.path == "/" + + def test_path_roundtrip(self): + sue = TreeNode() + 
mary = TreeNode(children={"Sue": sue}) + john = TreeNode(children={"Mary": mary}) # noqa + assert john._get_item(sue.path) == sue + + def test_same_tree(self): + mary = TreeNode() + kate = TreeNode() + john = TreeNode(children={"Mary": mary, "Kate": kate}) # noqa + assert mary.same_tree(kate) + + def test_relative_paths(self): + sue = TreeNode() + mary = TreeNode(children={"Sue": sue}) + annie = TreeNode() + john = TreeNode(children={"Mary": mary, "Annie": annie}) + + assert sue.relative_to(john) == "Mary/Sue" + assert john.relative_to(sue) == "../.." + assert annie.relative_to(sue) == "../../Annie" + assert sue.relative_to(annie) == "../Mary/Sue" + assert sue.relative_to(sue) == "." + + evil_kate = TreeNode() + with pytest.raises(ValueError, match="nodes do not lie within the same tree"): + sue.relative_to(evil_kate) class TestSetNodes: def test_set_child_node(self): - john = TreeNode("john") - mary = TreeNode("mary") - john.set_node("/", mary) + john = TreeNode() + mary = TreeNode() + john._set_item("Mary", mary) - mary = john.children[0] - assert mary.name == "mary" + assert john.children["Mary"] is mary assert isinstance(mary, TreeNode) - assert mary.children == () + assert mary.children == {} + assert mary.parent is john def test_child_already_exists(self): - john = TreeNode("john") - TreeNode("mary", parent=john) - marys_replacement = TreeNode("mary") - + mary = TreeNode() + john = TreeNode(children={"Mary": mary}) + mary_2 = TreeNode() with pytest.raises(KeyError): - john.set_node("/", marys_replacement, allow_overwrite=False) + john._set_item("Mary", mary_2, allow_overwrite=False) def test_set_grandchild(self): - john = TreeNode("john") - mary = TreeNode("mary") - rose = TreeNode("rose") - john.set_node("/", mary) - john.set_node("/mary/", rose) - - mary = john.children[0] - assert mary.name == "mary" + rose = TreeNode() + mary = TreeNode() + john = TreeNode() + + john._set_item("Mary", mary) + john._set_item("Mary/Rose", rose) + + assert john.children["Mary"] is mary assert isinstance(mary, TreeNode) - assert rose in mary.children + assert "Rose" in mary.children + assert rose.parent is mary - rose = mary.children[0] - assert rose.name == "rose" - assert isinstance(rose, TreeNode) - assert rose.children == () + def test_create_intermediate_child(self): + john = TreeNode() + rose = TreeNode() - def test_set_grandchild_and_create_intermediate_child(self): - john = TreeNode("john") - rose = TreeNode("rose") - john.set_node("/mary/", rose) + # test intermediate children not allowed + with pytest.raises(KeyError, match="Could not reach"): + john._set_item(path="Mary/Rose", item=rose, new_nodes_along_path=False) - mary = john.children[0] - assert mary.name == "mary" + # test intermediate children allowed + john._set_item("Mary/Rose", rose, new_nodes_along_path=True) + assert "Mary" in john.children + mary = john.children["Mary"] assert isinstance(mary, TreeNode) - assert mary.children[0] is rose - - rose = mary.children[0] - assert rose.name == "rose" - assert isinstance(rose, TreeNode) - assert rose.children == () - - def test_no_intermediate_children_allowed(self): - john = TreeNode("john") - rose = TreeNode("rose") - with pytest.raises(KeyError, match="Cannot reach"): - john.set_node( - path="mary", node=rose, new_nodes_along_path=False, allow_overwrite=True - ) - - def test_set_great_grandchild(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - rose = TreeNode("rose", parent=mary) - sue = TreeNode("sue") - john.set_node("mary/rose", sue) - assert sue.parent is 
rose + assert mary.children == {"Rose": rose} + assert rose.parent == mary + assert rose.parent == mary def test_overwrite_child(self): - john = TreeNode("john") - mary = TreeNode("mary") - john.set_node("/", mary) - assert mary in john.children - - marys_evil_twin = TreeNode("mary") - john.set_node("/", marys_evil_twin) - assert marys_evil_twin in john.children - assert mary not in john.children - - def test_dont_overwrite_child(self): - john = TreeNode("john") - mary = TreeNode("mary") - john.set_node("/", mary) - assert mary in john.children - - marys_evil_twin = TreeNode("mary") - with pytest.raises(KeyError, match="path already points"): - john.set_node( - "", marys_evil_twin, new_nodes_along_path=True, allow_overwrite=False - ) - assert mary in john.children - assert marys_evil_twin not in john.children + john = TreeNode() + mary = TreeNode() + john._set_item("Mary", mary) + # test overwriting not allowed + marys_evil_twin = TreeNode() + with pytest.raises(KeyError, match="Already a node object"): + john._set_item("Mary", marys_evil_twin, allow_overwrite=False) + assert john.children["Mary"] is mary + assert marys_evil_twin.parent is None -class TestPruning: - ... - + # test overwriting allowed + marys_evil_twin = TreeNode() + john._set_item("Mary", marys_evil_twin, allow_overwrite=True) + assert john.children["Mary"] is marys_evil_twin + assert marys_evil_twin.parent is john -class TestPaths: - def test_pathstr(self): - john = TreeNode("john") - mary = TreeNode("mary", parent=john) - rose = TreeNode("rose", parent=mary) - sue = TreeNode("sue", parent=rose) - assert sue.pathstr == "john/mary/rose/sue" - def test_relative_path(self): - ... +# TODO write and test all the del methods +class TestPruning: + ... -class TestTags: - ... +def create_test_tree(): + f = TreeNode() + b = TreeNode() + a = TreeNode() + d = TreeNode() + c = TreeNode() + e = TreeNode() + g = TreeNode() + i = TreeNode() + h = TreeNode() + + f.children = {"b": b, "g": g} + b.children = {"a": a, "d": d} + d.children = {"c": c, "e": e} + g.children = {"i": i} + i.children = {"h": h} + + return f + + +class TestIterators: + def test_preorderiter(self): + tree = create_test_tree() + result = [node.name for node in PreOrderIter(tree)] + expected = [ + None, # root TreeNode is unnamed + "b", + "a", + "d", + "c", + "e", + "g", + "i", + "h", + ] + assert result == expected + + def test_levelorderiter(self): + tree = create_test_tree() + result = [node.name for node in LevelOrderIter(tree)] + expected = [ + None, # root TreeNode is unnamed + "b", + "g", + "a", + "d", + "i", + "c", + "e", + "h", + ] + assert result == expected class TestRenderTree: def test_render_nodetree(self): - mary = TreeNode("mary") - kate = TreeNode("kate") - john = TreeNode("john", children=[mary, kate]) - TreeNode("Sam", parent=mary) - TreeNode("Ben", parent=mary) + sam = TreeNode() + ben = TreeNode() + mary = TreeNode(children={"Sam": sam, "Ben": ben}) + kate = TreeNode() + john = TreeNode(children={"Mary": mary, "Kate": kate}) printout = john.__str__() expected_nodes = [ - "TreeNode('john')", - "TreeNode('mary')", + "TreeNode()", + "TreeNode('Mary')", "TreeNode('Sam')", "TreeNode('Ben')", - "TreeNode('kate')", + "TreeNode('Kate')", ] for expected_node, printed_node in zip(expected_nodes, printout.splitlines()): assert expected_node in printed_node diff --git a/xarray/datatree_/datatree/tests/test_utils.py b/xarray/datatree_/datatree/tests/test_utils.py deleted file mode 100644 index 25632d38770..00000000000 --- 
a/xarray/datatree_/datatree/tests/test_utils.py +++ /dev/null @@ -1,50 +0,0 @@ -from datatree.utils import removeprefix, removesuffix - - -def checkequal(expected_result, obj, method, *args, **kwargs): - result = method(obj, *args, **kwargs) - assert result == expected_result - - -def checkraises(exc, obj, method, *args): - try: - method(obj, *args) - except Exception as e: - assert isinstance(e, exc) is True - - -def test_removeprefix(): - checkequal("am", "spam", removeprefix, "sp") - checkequal("spamspam", "spamspamspam", removeprefix, "spam") - checkequal("spam", "spam", removeprefix, "python") - checkequal("spam", "spam", removeprefix, "spider") - checkequal("spam", "spam", removeprefix, "spam and eggs") - checkequal("", "", removeprefix, "") - checkequal("", "", removeprefix, "abcde") - checkequal("abcde", "abcde", removeprefix, "") - checkequal("", "abcde", removeprefix, "abcde") - - checkraises(TypeError, "hello", removeprefix) - checkraises(TypeError, "hello", removeprefix, 42) - checkraises(TypeError, "hello", removeprefix, 42, "h") - checkraises(TypeError, "hello", removeprefix, "h", 42) - checkraises(TypeError, "hello", removeprefix, ("he", "l")) - - -def test_removesuffix(): - checkequal("sp", "spam", removesuffix, "am") - checkequal("spamspam", "spamspamspam", removesuffix, "spam") - checkequal("spam", "spam", removesuffix, "python") - checkequal("spam", "spam", removesuffix, "blam") - checkequal("spam", "spam", removesuffix, "eggs and spam") - - checkequal("", "", removesuffix, "") - checkequal("", "", removesuffix, "abcde") - checkequal("abcde", "abcde", removesuffix, "") - checkequal("", "abcde", removesuffix, "abcde") - - checkraises(TypeError, "hello", removesuffix) - checkraises(TypeError, "hello", removesuffix, 42) - checkraises(TypeError, "hello", removesuffix, 42, "h") - checkraises(TypeError, "hello", removesuffix, "h", 42) - checkraises(TypeError, "hello", removesuffix, ("lo", "l")) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 463c68847a7..729207229e3 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -1,219 +1,500 @@ from __future__ import annotations -from typing import Hashable, Iterable, Sequence, Tuple, Union +from collections import OrderedDict +from pathlib import PurePosixPath +from typing import Any, Iterator, Mapping, Tuple -import anytree +from xarray.core.utils import Frozen, is_dict_like -PathType = Union[Hashable, Sequence[Hashable]] +class TreeError(Exception): + """Exception type raised when user attempts to create an invalid tree in some way.""" -class TreeNode(anytree.NodeMixin): + ... + + +class NodePath(PurePosixPath): + """Represents a path from one node to another within a tree.""" + + def __new__(cls, *args: str | "NodePath") -> "NodePath": + obj = super().__new__(cls, *args) + + if obj.drive: + raise ValueError("NodePaths cannot have drives") + + if obj.root not in ["/", ""]: + raise ValueError( + 'Root of NodePath can only be either "/" or "", with "" meaning the path is relative.' + ) + + # TODO should we also forbid suffixes to avoid node names with dots in them? + + return obj + + +class TreeNode: """ Base class representing a node of a tree, with methods for traversing and altering the tree. - Depends on the anytree library for basic tree structure, but the parent class is fairly small - so could be easily reimplemented to avoid a hard dependency. 
+ This class stores no data, it has only parents and children attributes, and various methods. - Adds restrictions preventing children with the same name, a method to set new nodes at arbitrary depth, - and access via unix-like paths or tuples of tags. Does not yet store anything in the nodes of the tree. - """ + Stores child nodes in an Ordered Dictionary, which is necessary to ensure that equality checks between two trees + also check that the order of child nodes is the same. - # TODO remove anytree dependency - # TODO allow for loops via symbolic links? + Nodes themselves are intrinsically unnamed (do not possess a ._name attribute), but if the node has a parent you can + find the key it is stored under via the .name property. - # TODO store children with their names in an OrderedDict instead of a tuple like anytree does? - # TODO do nodes even need names? Or can they just be referred to by the tags their parents store them under? - # TODO nodes should have names but they should be optional. Getting and setting should be down without reference to - # the names of stored objects, only their tags (i.e. position in the family tree) - # Ultimately you either need a list of named children, or a dictionary of unnamed children + The .parent attribute is read-only: to replace the parent using public API you must set this node as the child of a + new parent using `new_parent.children[name] = child_node`, or to instead detach from the current parent use + `child_node.orphan()`. - # TODO change .path in the parent class to behave like .path_str does here. (old .path -> .walk_path()) + This class is intended to be subclassed by DataTree, which will overwrite some of the inherited behaviour, + in particular to make names an inherent attribute, and allow setting parents directly. The intention is to mirror + the class structure of xarray.Variable & xarray.DataArray, where Variable is unnamed but DataArray is (optionally) + named. - _resolver = anytree.Resolver("name") + Also allows access to any other node in the tree via unix-like paths, including upwards referencing via '../'. - def __init__( - self, - name: Hashable, - parent: TreeNode = None, - children: Iterable[TreeNode] = None, - ): - if not isinstance(name, str) or "/" in name: - raise ValueError(f"invalid name {name}") - self.name = name + (This class is heavily inspired by the anytree library's NodeMixin class.) + """ + + # TODO replace all type annotations that use "TreeNode" with "Self", so it's still correct when subclassed (requires python 3.11) + _parent: TreeNode | None + _children: OrderedDict[str, TreeNode] - self.parent = parent - if children: + def __init__(self, children: Mapping[str, TreeNode] = None): + """Create a parentless node.""" + self._parent = None + self._children = OrderedDict() + if children is not None: self.children = children - def __str__(self): - """A printable representation of the structure of this entire subtree.""" - lines = [] - for pre, _, node in anytree.RenderTree(self): - node_lines = f"{pre}{node._single_node_repr()}" - lines.append(node_lines) - return "\n".join(lines) + @property + def parent(self) -> TreeNode | None: + """Parent of this node.""" + return self._parent + + def _set_parent(self, new_parent: TreeNode | None, child_name: str = None): + # TODO is it possible to refactor in a way that removes this private method? 
+ + if new_parent is not None and not isinstance(new_parent, TreeNode): + raise TypeError( + "Parent nodes must be of type DataTree or None, " + f"not type {type(new_parent)}" + ) - def _single_node_repr(self): - """Information about this node, not including its relationships to other nodes.""" - return f"TreeNode('{self.name}')" + old_parent = self._parent + if new_parent is not old_parent: + self._check_loop(new_parent) + self._detach(old_parent) + self._attach(new_parent, child_name) + + def _check_loop(self, new_parent: TreeNode | None): + """Checks that assignment of this new parent will not create a cycle.""" + if new_parent is not None: + if new_parent is self: + raise TreeError( + f"Cannot set parent, as node {self} cannot be a parent of itself." + ) + + _self, *lineage = list(self.lineage) + if any(child is self for child in lineage): + raise TreeError( + f"Cannot set parent, as node {self} is already a descendant of node {new_parent}." + ) + + def _detach(self, parent: TreeNode | None): + if parent is not None: + self._pre_detach(parent) + parents_children = parent.children + parent._children = OrderedDict( + { + name: child + for name, child in parents_children.items() + if child is not self + } + ) + self._parent = None + self._post_detach(parent) + + def _attach(self, parent: TreeNode | None, child_name: str = None): + if parent is not None: + self._pre_attach(parent) + parentchildren = parent._children + assert not any( + child is self for child in parentchildren + ), "Tree is corrupt." + parentchildren[child_name] = self + self._parent = parent + self._post_attach(parent) + else: + self._parent = None + + def orphan(self): + """Detach this node from its parent.""" + self._set_parent(new_parent=None) + + @property + def children(self) -> Mapping[str, TreeNode]: + """Child nodes of this node, stored under a mapping via their names.""" + return Frozen(self._children) + + @children.setter + def children(self, children: Mapping[str, TreeNode]): + self._check_children(children) + children = OrderedDict(children) + + old_children = self.children + del self.children + try: + self._pre_attach_children(children) + for name, child in children.items(): + child._set_parent(new_parent=self, child_name=name) + self._post_attach_children(children) + assert len(self.children) == len(children) + except Exception: + # if something goes wrong then revert to previous children + self.children = old_children + raise + + @children.deleter + def children(self): + # TODO this just detaches all the children, it doesn't actually delete them... + children = self.children + self._pre_detach_children(children) + for child in self.children.values(): + child.orphan() + assert len(self.children) == 0 + self._post_detach_children(children) + + @staticmethod + def _check_children(children: Mapping[str, TreeNode]): + """Check children for correct types and for any duplicates.""" + if not is_dict_like(children): + raise TypeError( + "children must be a dict-like mapping from names to node objects" + ) + + seen = set() + for name, child in children.items(): + if not isinstance(child, TreeNode): + raise TypeError( + f"Cannot add object {name}. It is of type {type(child)}, " + "but can only add children of type DataTree" + ) + + childid = id(child) + if childid not in seen: + seen.add(childid) + else: + raise TreeError( + f"Cannot add same node {name} multiple times as different children." 
+ ) def __repr__(self): - """Information about this node, including its relationships to other nodes.""" - parent = self.parent.name if self.parent else "None" - return f"TreeNode(name='{self.name}', parent='{parent}', children={[c.name for c in self.children]})" + return f"TreeNode(children={dict(self._children)})" + + def _pre_detach_children(self, children: Mapping[str, TreeNode]): + """Method call before detaching `children`.""" + pass + + def _post_detach_children(self, children: Mapping[str, TreeNode]): + """Method call after detaching `children`.""" + pass + + def _pre_attach_children(self, children: Mapping[str, TreeNode]): + """Method call before attaching `children`.""" + pass + + def _post_attach_children(self, children: Mapping[str, TreeNode]): + """Method call after attaching `children`.""" + pass + + def iter_lineage(self) -> Iterator[TreeNode]: + """Iterate up the tree, starting from the current node.""" + # TODO should this instead return an OrderedDict, so as to include node names? + node: TreeNode | None = self + while node is not None: + yield node + node = node.parent + + @property + def lineage(self) -> Tuple[TreeNode]: + """All parent nodes and their parent nodes, starting with the closest.""" + return tuple(self.iter_lineage()) @property - def pathstr(self) -> str: - """Path from root to this node, as a filepath-like string.""" - return "/".join(self.tags) + def ancestors(self) -> Tuple[TreeNode, ...]: + """All parent nodes and their parent nodes, starting with the most distant.""" + if self.parent is None: + return (self,) + else: + ancestors = tuple(reversed(list(self.lineage))) + return ancestors @property - def has_data(self): - return False + def root(self) -> TreeNode: + """Root node of the tree""" + node = self + while node.parent is not None: + node = node.parent + return node - def _pre_attach(self, parent: TreeNode) -> None: + @property + def is_root(self) -> bool: + """Whether or not this node is the tree root.""" + return self.parent is None + + @property + def is_leaf(self) -> bool: + """Whether or not this node is a leaf node.""" + return self.children == {} + + @property + def siblings(self) -> OrderedDict[str, TreeNode]: """ - Method which superclass calls before setting parent, here used to prevent having two - children with duplicate names. + Nodes with the same parent as this node. """ - if self.name in list(c.name for c in parent.children): - raise KeyError( - f"parent {parent.name} already has a child named {self.name}" - ) + return OrderedDict( + { + name: child + for name, child in self.parent.children.items() + if child is not self + } + ) - def add_child(self, child: TreeNode) -> None: - """Add a single child node below this node, without replacement.""" - if child.name in list(c.name for c in self.children): - raise KeyError(f"Node already has a child named {child.name}") - else: - child.parent = self - - @classmethod - def _tuple_or_path_to_path(cls, address: PathType) -> str: - if isinstance(address, str): - return address - # TODO check for iterable in general instead - elif isinstance(address, (tuple, list)): - return cls.separator.join(tag for tag in address) + @property + def subtree(self) -> Iterator[TreeNode]: + """ + An iterator over all nodes in this tree, including both self and all descendants. + + Iterates depth-first. + """ + from . 
import iterators + + return iterators.PreOrderIter(self) + + def _pre_detach(self, parent: TreeNode): + """Method call before detaching from `parent`.""" + pass + + def _post_detach(self, parent: TreeNode): + """Method call after detaching from `parent`.""" + pass + + def _pre_attach(self, parent: TreeNode): + """Method call before attaching to `parent`.""" + pass + + def _post_attach(self, parent: TreeNode): + """Method call after attaching to `parent`.""" + pass + + def get(self, key: str, default: TreeNode = None) -> TreeNode | None: + """ + Return the child node with the specified key. + + Only looks for the node within the immediate children of this node, + not in other nodes of the tree. + """ + if key in self.children: + return self.children[key] else: - raise TypeError(f"{address} is not a valid form of path") + return default - def get_node(self, path: PathType) -> TreeNode: + def _get_item(self, path: str | NodePath) -> Any: """ - Access node of the tree lying at the given path. + Returns the object lying at the given path. - Raises a TreeError if not found. + Raises a KeyError if there is no object at the given path. + """ + if isinstance(path, str): + path = NodePath(path) - Parameters - ---------- - path : - Paths can be given as unix-like paths, or as tuples of strings - (where each string is known as a single "tag"). Path includes the name of the target node. + if path.root: + current_node = self.root + root, *parts = path.parts + else: + current_node = self + parts = path.parts + + for part in parts: + if part == "..": + parent = current_node.parent + if parent is None: + raise KeyError(f"Could not find node at {path}") + current_node = parent + elif part in ("", "."): + pass + else: + current_node = current_node.get(part) + if current_node is None: + raise KeyError(f"Could not find node at {path}") + return current_node - Returns - ------- - node + def _set(self, key: str, val: TreeNode) -> None: """ - # TODO change so this raises a standard KeyError instead of a ChildResolverError when it can't find an item + Set the child node with the specified key to value. - p = self._tuple_or_path_to_path(path) - return anytree.Resolver("name").get(self, p) + Counterpart to the public .get method, and also only works on the immediate node, not other nodes in the tree. + """ + new_children = {**self.children, key: val} + self.children = new_children - def set_node( + def _set_item( self, - path: PathType = "/", - node: TreeNode = None, - new_nodes_along_path: bool = True, + path: str | NodePath, + item: Any, + new_nodes_along_path: bool = False, allow_overwrite: bool = True, - ) -> None: + ): """ - Set a node on the tree, overwriting anything already present at that path. - - The given value either forms a new node of the tree or overwrites an existing node at that location. + Set a new item in the tree, overwriting anything already present at that path. - Paths are specified relative to the node on which this method was called, and the name of the node forms the - last part of the path. (i.e. `.set_node(path='', TreeNode('a'))` is equivalent to `.add_child(TreeNode('a'))`. + The given value either forms a new node of the tree or overwrites an existing item at that location. Parameters ---------- - path : Union[Hashable, Sequence[Hashable]] - Path names can be given as unix-like paths, or as tuples of strings (where each string - is known as a single "tag"). Default is '/'. 
- node : TreeNode + path + item new_nodes_along_path : bool If true, then if necessary new nodes will be created along the given path, until the tree can reach the - specified location. If false then an error is thrown instead of creating intermediate nodes alang the path. + specified location. allow_overwrite : bool - Whether or not to overwrite any existing node at the location given by path. Default is True. + Whether or not to overwrite any existing node at the location given by path. Raises ------ KeyError - If a node already exists at the given path + If node cannot be reached, and new_nodes_along_path=False. + Or if a node already exists at the specified path, and allow_overwrite=False. """ + if isinstance(path, str): + path = NodePath(path) - # Determine full path of new object - path = self._tuple_or_path_to_path(path) + if not path.name: + raise ValueError("Can't set an item under a path which has no name") - if not isinstance(node, TreeNode): - raise ValueError( - f"Can only set nodes to be subclasses of TreeNode, but node is of type {type(node)}" - ) - node_name = node.name - - # Walk to location of new node, creating intermediate node objects as we go if necessary - parent = self - tags = [ - tag for tag in path.split(self.separator) if tag not in [self.separator, ""] - ] - for tag in tags: - # TODO will this mutation within a for loop actually work? - if tag not in [child.name for child in parent.children]: - if new_nodes_along_path: - # TODO prevent this from leaving a trail of nodes if the assignment fails somehow - - # Want child classes to populate tree with their own types - # TODO this seems like a code smell though... - new_node = type(self)(name=tag) - parent.add_child(new_node) + if path.root: + # absolute path + current_node = self.root + root, *parts, name = path.parts + else: + # relative path + current_node = self + *parts, name = path.parts + + if parts: + # Walk to location of new node, creating intermediate node objects as we go if necessary + for part in parts: + if part == "..": + parent = current_node.parent + if parent is None: + # We can't create a parent if `new_nodes_along_path=True` as we wouldn't know what to name it + raise KeyError(f"Could not reach node at path {path}") + current_node = parent + elif part in ("", "."): + pass else: - raise KeyError( - f"Cannot reach new node at path {path}: " - f"parent {parent} has no child {tag}" - ) - parent = parent.get_node(tag) - - # Deal with anything already existing at this location - if node_name in [child.name for child in parent.children]: + if part in current_node.children: + current_node = current_node.children[part] + elif new_nodes_along_path: + # Want child classes (i.e. DataTree) to populate tree with their own types + new_node = type(self)() + current_node._set(part, new_node) + current_node = current_node.children[part] + else: + raise KeyError(f"Could not reach node at path {path}") + + if name in current_node.children: + # Deal with anything already existing at this location if allow_overwrite: - child = parent.get_node(node_name) - child.parent = None - del child + current_node._set(name, item) else: - # TODO should this be before we walk to the new node? 
- raise KeyError( - f"Cannot set item at {path} whilst that path already points to a " - f"{type(parent.get_node(node_name))} object" - ) + raise KeyError(f"Already a node object at path {path}") + else: + current_node._set(name, item) - # Place new child node at this location - parent.add_child(node) + def del_node(self, path: str): + raise NotImplementedError - def glob(self, path: str): - return self._resolver.glob(self, path) + def update(self, other: Mapping[str, TreeNode]) -> None: + """ + Update this node's children. + + Just like `dict.update` this is an in-place operation. + """ + new_children = {**self.children, **other} + self.children = new_children @property - def tags(self) -> Tuple[Hashable]: - """All tags, returned in order starting from the root node""" - return tuple(node.name for node in self.path) - - @tags.setter - def tags(self, value): - raise AttributeError( - "tags cannot be set, except via changing the children and/or parent of a node." - ) + def name(self) -> str | None: + """If node has a parent, this is the key under which it is stored in `parent.children`.""" + if self.parent: + return next( + name for name, child in self.parent.children.items() if child is self + ) + else: + return None + + def __str__(self): + return f"TreeNode({self.name})" if self.name else "TreeNode()" @property - def subtree(self): - """An iterator over all nodes in this tree, including both self and all descendants.""" - return anytree.iterators.PreOrderIter(self) + def path(self) -> str: + """Return the file-like path from the root to this node.""" + if self.is_root: + return "/" + else: + root, *ancestors = self.ancestors + # don't include name of root because (a) root might not have a name & (b) we want path relative to root. + return "/" + "/".join(node.name for node in ancestors) + + def relative_to(self, other: TreeNode) -> str: + """ + Compute the relative path from this node to node `other`. + + If other is not in this tree, or it's otherwise impossible, raise a ValueError. + """ + if not self.same_tree(other): + raise ValueError( + "Cannot find relative path because nodes do not lie within the same tree" + ) + + this_path = NodePath(self.path) + if other in self.lineage: + return str(this_path.relative_to(other.path)) + else: + common_ancestor = self.find_common_ancestor(other) + path_to_common_ancestor = other._path_to_ancestor(common_ancestor) + return str( + path_to_common_ancestor / this_path.relative_to(common_ancestor.path) + ) + + def same_tree(self, other: TreeNode) -> bool: + """True if other node is in the same tree as this node.""" + return self.root is other.root + + def find_common_ancestor(self, other: TreeNode) -> TreeNode: + """ + Find the first common ancestor of two nodes in the same tree. + + Raise ValueError if they are not in the same tree. 
+ """ + common_ancestor = None + for node in other.iter_lineage(): + if node in self.ancestors: + common_ancestor = node + break + + if not common_ancestor: + raise ValueError( + "Cannot find relative path because nodes do not lie within the same tree" + ) + + return common_ancestor + + def _path_to_ancestor(self, ancestor: TreeNode) -> NodePath: + generation_gap = list(self.lineage).index(ancestor) + path_upwards = "../" * generation_gap if generation_gap > 0 else "/" + return NodePath(path_upwards) diff --git a/xarray/datatree_/datatree/utils.py b/xarray/datatree_/datatree/utils.py deleted file mode 100644 index 95d7ec0b23c..00000000000 --- a/xarray/datatree_/datatree/utils.py +++ /dev/null @@ -1,19 +0,0 @@ -import sys - - -def removesuffix(base: str, suffix: str) -> str: - if sys.version_info >= (3, 9): - return base.removesuffix(suffix) - else: - if base.endswith(suffix): - return base[: len(base) - len(suffix)] - return base - - -def removeprefix(base: str, prefix: str) -> str: - if sys.version_info >= (3, 9): - return base.removeprefix(prefix) - else: - if base.startswith(prefix): - return base[len(prefix) :] - return base diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 5398aff888d..37368b998b0 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -7,13 +7,35 @@ API reference DataTree ======== +Creating a DataTree +------------------- + .. autosummary:: :toctree: generated/ DataTree -Attributes ----------- +Tree Attributes +--------------- + +.. autosummary:: + :toctree: generated/ + + DataTree.parent + DataTree.children + DataTree.name + DataTree.path + DataTree.root + DataTree.is_root + DataTree.is_leaf + DataTree.subtree + DataTree.siblings + DataTree.lineage + DataTree.ancestors + DataTree.groups + +Data Attributes +--------------- .. autosummary:: :toctree: generated/ @@ -23,16 +45,19 @@ Attributes DataTree.encoding DataTree.sizes DataTree.attrs - DataTree.nbytes DataTree.indexes DataTree.xindexes DataTree.coords DataTree.chunks - DataTree.real - DataTree.imag DataTree.ds DataTree.has_data - DataTree.groups + DataTree.has_attrs + DataTree.is_empty + +.. + + Missing + DataTree.chunksizes Dictionary interface -------------------- @@ -43,14 +68,45 @@ Dictionary interface DataTree.__getitem__ DataTree.__setitem__ DataTree.update + DataTree.get + +.. + + Missing + DataTree.__delitem__ + DataTree.items + DataTree.keys + DataTree.values + +Tree Manipulation Methods +------------------------- + +.. autosummary:: + :toctree: generated/ + + DataTree.orphan + DataTree.same_tree + DataTree.relative_to + DataTree.iter_lineage + DataTree.find_common_ancestor + +Tree Manipulation Utilities +--------------------------- + +.. autosummary:: + :toctree: generated/ + + map_over_subtree Methods ------- +.. + + TODO divide these up into "Dataset contents", "Indexing", "Computation" etc. .. autosummary:: :toctree: generated/ - DataTree.from_dict DataTree.load DataTree.compute DataTree.persist @@ -131,13 +187,25 @@ Methods DataTree.isin DataTree.astype -Utilities -========= +Comparisons +=========== .. autosummary:: :toctree: generated/ - map_over_subtree + testing.assert_isomorphic + testing.assert_equal + testing.assert_identical + +ndarray methods +--------------- + +.. autosummary:: + :toctree: generated/ + + DataTree.nbytes + DataTree.real + DataTree.imag I/O === @@ -146,31 +214,21 @@ I/O :toctree: generated/ open_datatree + DataTree.from_dict DataTree.to_netcdf DataTree.to_zarr .. 
- Missing - DataTree.__delitem__ - DataTree.get - DataTree.items - DataTree.keys - DataTree.values - -Testing -=== -.. autosummary:: - :toctree: generated/ - - testing.assert_isomorphic - testing.assert_equal - testing.assert_identical + Missing + DataTree.to_dict + open_mfdatatree Exceptions -=== +========== .. autosummary:: :toctree: generated/ + TreeError TreeIsomorphismError diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index 4ba2890405d..76ed72beafe 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -15,10 +15,11 @@ Datatree How do I ... Contributing Guide Development Roadmap - GitHub repository + What's New + GitHub repository Feedback -------- -If you encounter any errors or problems with **Datatree**, please open an issue -on `GitHub `_. +If you encounter any errors, problems with **Datatree**, or have any suggestions, please open an issue +on `GitHub `_. diff --git a/xarray/datatree_/docs/source/quick-overview.rst b/xarray/datatree_/docs/source/quick-overview.rst index bba3fc695e9..5ec2194a190 100644 --- a/xarray/datatree_/docs/source/quick-overview.rst +++ b/xarray/datatree_/docs/source/quick-overview.rst @@ -35,18 +35,17 @@ Now we'll put this data into a multi-group tree: from datatree import DataTree - dt = DataTree.from_dict( - {"root/simulation/coarse": ds, "root/simulation/fine": ds2, "root": ds3} - ) + dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3}) dt -This creates a datatree with various groups. We have one root group (named ``root``), containing information about individual people. +This creates a datatree with various groups. We have one root group, containing information about individual people. +(This root group can be named, but here is unnamed, so is referred to with ``"/"``, same as the root of a unix-like filesystem.) The root group then has one subgroup ``simulation``, which contains no data itself but does contain another two subgroups, named ``fine`` and ``coarse``. The (sub-)sub-groups ``fine`` and ``coarse`` contain two very similar datasets. They both have an ``"x"`` dimension, but the dimension is of different lengths in each group, which makes the data in each group unalignable. -In (``root``) we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. +In the root group we placed some completely unrelated information, showing how we can use a tree to store heterogenous data. The constraints on each group are therefore the same as the constraint on dataarrays within a single dataset. diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst new file mode 100644 index 00000000000..4bc263d471f --- /dev/null +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -0,0 +1,95 @@ +.. currentmodule:: datatree + +What's New +========== + +.. ipython:: python + :suppress: + + import numpy as np + import pandas as pd + import xarray as xray + import xarray + import xarray as xr + import datatree + + np.random.seed(123456) + +.. _whats-new.v0.1.0: + +v0.1.0 (unreleased) +------------------- + +- Major refactor of internals, moving from the ``DataTree.children`` attribute being a ``Tuple[DataTree]`` to being a + ``FrozenDict[str, DataTree]``. This was necessary in order to integrate better with xarray's dictionary-like API, + solve several issues, simplify the code internally, remove dependencies, and enable new features. 
(:pull:`76`) + By `Tom Nicholas `_. + +New Features +~~~~~~~~~~~~ + +- Syntax for accessing nodes now supports file-like paths, including parent nodes via ``"../"``, relative paths, the + root node via ``"/"``, and the current node via ``"."``. (Internally it actually uses ``pathlib`` now.) + By `Tom Nicholas `_. +- New path-like API methods, such as ``.relative_to``, ``.find_common_ancestor``, and ``.same_tree``. +- Some new diction-like methods, such as ``DataTree.get`` and ``DataTree.update``. (:pull:`76`) + By `Tom Nicholas `_. + +Breaking changes +~~~~~~~~~~~~~~~~ + +- Node names are now optional, which means that the root of the tree can be unnamed. This has knock-on effects for + a lot of the API. +- The ``__init__`` signature for ``DataTree`` has changed, so that ``name`` is now an optional kwarg. +- Files will now be loaded as a slightly different tree, because the root group no longer needs to be given a default + name. +- Removed tag-like access to nodes. +- Removes the option to delete all data in a node by assigning None to the node (in favour of deleting data using the + xarray API), or to create a new empty node in the same way (in favour of assigning an empty DataTree object instead). +- Removes the ability to create a new node by assigning a ``Dataset`` object to ``DataTree.__setitem__`. +- Several other minor API changes such as ``.pathstr`` -> ``.path``, and ``from_dict``'s dictionary argument now being + required. (:pull:`76`) + By `Tom Nicholas `_. + +Deprecations +~~~~~~~~~~~~ + +- No longer depends on the anytree library (:pull:`76`) + By `Tom Nicholas `_. + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +- Quick-overview page updated to match change in path syntax (:pull:`76`) + By `Tom Nicholas `_. + +Internal Changes +~~~~~~~~~~~~~~~~ + +- Basically every file was changed in some way to accommodate (:pull:`76`). +- No longer need the utility functions for string manipulation that were defined in ``utils.py``. +- A considerable amount of code copied over from the internals of anytree (e.g. in ``render.py`` and ``iterators.py``). + The Apache license for anytree has now been bundled with datatree. (:pull:`76`). + By `Tom Nicholas `_. + +.. _whats-new.v0.0.4: + +v0.0.4 (31/03/2022) +------------------- + +- Ensure you get the pretty tree-like string representation by default in ipython (:pull:`73`). + By `Tom Nicholas `_. +- Now available on conda-forge (as xarray-datatree)! (:pull:`71`) + By `Anderson Banihirwe `_. +- Allow for python 3.8 (:pull:`70`). + By `Don Setiawan `_. + +.. _whats-new.v0.0.3: + +v0.0.3 (30/03/2022) +------------------- + +- First released version available on both pypi (as xarray-datatree)! diff --git a/xarray/datatree_/licenses/ANYTREE_LICENSE b/xarray/datatree_/licenses/ANYTREE_LICENSE new file mode 100644 index 00000000000..8dada3edaf5 --- /dev/null +++ b/xarray/datatree_/licenses/ANYTREE_LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
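As a quick illustration of the path-like node access and ``from_dict`` constructor described in the v0.1.0 release notes above, here is a minimal sketch; the datasets and group names below are invented for the example and are not part of the patch:

```python
import xarray as xr
from datatree import DataTree

# Hypothetical example data; any two unalignable datasets would do.
ds_coarse = xr.Dataset({"temperature": ("x", [1.5, 2.5])})
ds_fine = xr.Dataset({"temperature": ("x", [1.0, 2.0, 3.0])})

dt = DataTree.from_dict({"simulation/coarse": ds_coarse, "simulation/fine": ds_fine})

coarse = dt["simulation/coarse"]  # relative path from the (unnamed) root
fine = coarse["../fine"]          # ".." walks up to the parent node
assert coarse["/"] is dt          # "/" refers back to the root node
print(coarse.path)                # -> "/simulation/coarse"
```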
diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index bad07301c3e..cf84c87ec50 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1,2 +1 @@ xarray>=0.20.2 -anytree>=2.8.0 From ca40321bdfed1c0173d1217af888e530903c8985 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 28 Apr 2022 22:30:42 -0400 Subject: [PATCH 113/260] Enable mypy https://github.com/xarray-contrib/datatree/pull/23 * re-enable mypy * ignored untyped imports * draft implementation of a TreeNode class which stores children in a dict * separate path-like access out into mixin * pseudocode for node getter * basic idea for a path-like object which inherits from pathlib * pass type checking * implement attach * consolidate tree classes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * passes some basic family tree tests * frozen children * passes all basic family tree tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copied iterators code over from anytree * get nodes with path-like syntax * relative path method * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set and get node methods * copy anytree iterators * add anytree license * change iterator import * copy anytree's string renderer * renderer * refactored treenode to use .get * black * updated datatree tests to match new path API * moved io tests to their own file * reimplemented getitem in terms of .get * reimplemented setitem in terms of .update * remove anytree dependency * from_dict constructor * string representation of tree * fixed tree diff * fixed io * removed cheeky print statements * fixed isomorphism checking * fixed map_over_subtree * removed now-uneeded utils.py compatibility functions * fixed tests for mapped dataset api methods * updated API docs * reimplement __setitem__ in terms of _set * fixed bug by ensuring name of child node is changed to match key it is stored under * updated docs * added whats-new, and put all changes from this PR in it * added summary of previous versions * remove outdated ._add_child method * fix some of the easier typing errors * generic typevar for tree in TreeNode * datatree.py almost passes type checking * ignore remaining typing errors for now * fix / ignore last few typing errors * remove spurious type check Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 32 ++--- xarray/datatree_/datatree/datatree.py | 107 ++++++---------- xarray/datatree_/datatree/io.py | 47 ++----- xarray/datatree_/datatree/iterators.py | 14 +-- xarray/datatree_/datatree/mapping.py | 4 +- xarray/datatree_/datatree/treenode.py | 148 +++++++++++++---------- 6 files changed, 160 insertions(+), 192 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 60e7db3436c..f1bd6160652 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -31,22 +31,22 @@ repos: # hooks: # - id: velin # args: ["--write", "--compact"] -# - repo: https://github.com/pre-commit/mirrors-mypy -# rev: v0.910 -# hooks: -# - id: mypy -# # Copied from setup.cfg -# exclude: "properties|asv_bench" -# additional_dependencies: [ -# # Type stubs -# types-python-dateutil, -# types-pkg_resources, -# types-PyYAML, 
-# types-pytz, -# # Dependencies that are typed -# numpy, -# typing-extensions==3.10.0.0, -# ] + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v0.910 + hooks: + - id: mypy + # Copied from setup.cfg + exclude: "properties|asv_bench|docs" + additional_dependencies: [ + # Type stubs + types-python-dateutil, + types-pkg_resources, + types-PyYAML, + types-pytz, + # Dependencies that are typed + numpy, + typing-extensions==3.10.0.0, + ] # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 # - repo: https://github.com/asottile/pyupgrade # rev: v1.22.1 diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 50f943b070f..a88ebc35109 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,19 +1,21 @@ from __future__ import annotations +from collections import OrderedDict from typing import ( TYPE_CHECKING, Any, Callable, - Hashable, + Generic, Iterable, Mapping, MutableMapping, + Optional, Tuple, Union, ) -from xarray import DataArray, Dataset, merge -from xarray.core import dtypes, utils +from xarray import DataArray, Dataset +from xarray.core import utils from xarray.core.variable import Variable from .formatting import tree_repr @@ -24,7 +26,7 @@ MappedDataWithCoords, ) from .render import RenderTree -from .treenode import NodePath, TreeNode +from .treenode import NodePath, Tree, TreeNode if TYPE_CHECKING: from xarray.core.merge import CoercibleValue @@ -51,6 +53,7 @@ class DataTree( MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmeticMixin, + Generic[Tree], ): """ A tree-like hierarchical collection of xarray objects. @@ -74,12 +77,14 @@ class DataTree( # TODO .loc, __contains__, __iter__, __array__, __len__ - _name: str | None - _ds: Dataset | None + _name: Optional[str] + _parent: Optional[Tree] + _children: OrderedDict[str, Tree] + _ds: Dataset def __init__( self, - data: Dataset | DataArray = None, + data: Optional[Dataset | DataArray] = None, parent: DataTree = None, children: Mapping[str, DataTree] = None, name: str = None, @@ -109,9 +114,9 @@ def __init__( """ super().__init__(children=children) - self._name = name + self.name = name self.parent = parent - self.ds = data + self.ds = data # type: ignore[assignment] @property def name(self) -> str | None: @@ -122,8 +127,13 @@ def name(self) -> str | None: def name(self, name: str | None) -> None: self._name = name - @TreeNode.parent.setter - def parent(self, new_parent: DataTree) -> None: + @property + def parent(self: DataTree) -> DataTree | None: + """Parent of this node.""" + return self._parent + + @parent.setter + def parent(self: DataTree, new_parent: DataTree) -> None: if new_parent and self.name is None: raise ValueError("Cannot set an unnamed node as a child of another node") self._set_parent(new_parent, self.name) @@ -134,7 +144,7 @@ def ds(self) -> Dataset: return self._ds @ds.setter - def ds(self, data: Union[Dataset, DataArray] = None): + def ds(self, data: Union[Dataset, DataArray] = None) -> None: if not isinstance(data, (Dataset, DataArray)) and data is not None: raise TypeError( f"{type(data)} object is not an xarray Dataset, DataArray, or None" @@ -168,7 +178,7 @@ def is_empty(self) -> bool: """False if node contains any data or attrs. 
Does not look at children.""" return not (self.has_data or self.has_attrs) - def _pre_attach(self, parent: TreeNode) -> None: + def _pre_attach(self: DataTree, parent: DataTree) -> None: """ Method which superclass calls before setting parent, here used to prevent having two children with duplicate names (or a data variable with the same name as a child). @@ -186,8 +196,8 @@ def __str__(self): return tree_repr(self) def get( - self, key: str, default: DataTree | DataArray = None - ) -> DataTree | DataArray | None: + self: DataTree, key: str, default: Optional[DataTree | DataArray] = None + ) -> Optional[DataTree | DataArray]: """ Access child nodes stored in this node as a DataTree or variables or coordinates stored in this node as a DataArray. @@ -207,7 +217,7 @@ def get( else: return default - def __getitem__(self, key: str) -> DataTree | DataArray: + def __getitem__(self: DataTree, key: str) -> DataTree | DataArray: """ Access child nodes stored in this tree as a DataTree or variables or coordinates stored in this tree as a DataArray. @@ -272,7 +282,7 @@ def __setitem__( else: raise ValueError("Invalid format for key") - def update(self, other: Dataset | Mapping[str, DataTree | CoercibleValue]) -> None: + def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: """ Update this node's children and / or variables. @@ -285,10 +295,8 @@ def update(self, other: Dataset | Mapping[str, DataTree | CoercibleValue]) -> No if isinstance(v, DataTree): new_children[k] = v elif isinstance(v, (DataArray, Variable)): - # TODO this should also accomodate other types that can be coerced into Variables + # TODO this should also accommodate other types that can be coerced into Variables new_variables[k] = v - elif isinstance(v, Dataset): - new_variables = v.variables else: raise TypeError(f"Type {type(v)} cannot be assigned to a DataTree") @@ -298,7 +306,7 @@ def update(self, other: Dataset | Mapping[str, DataTree | CoercibleValue]) -> No @classmethod def from_dict( cls, - d: MutableMapping[str, Any], + d: MutableMapping[str, DataTree | Dataset | DataArray], name: str = None, ) -> DataTree: """ @@ -322,15 +330,16 @@ def from_dict( """ # First create the root node + # TODO there is a real bug here where what if root_data is of type DataTree? root_data = d.pop("/", None) - obj = cls(name=name, data=root_data, parent=None, children=None) + obj = cls(name=name, data=root_data, parent=None, children=None) # type: ignore[arg-type] if d: # Populate tree with children determined from data_objects mapping for path, data in d.items(): # Create and set new node node_name = NodePath(path).name - new_node = cls(name=node_name, data=data) + new_node = cls(name=node_name, data=data) # type: ignore[arg-type] obj._set_item( path, new_node, @@ -346,8 +355,8 @@ def nbytes(self) -> int: def isomorphic( self, other: DataTree, - from_root=False, - strict_names=False, + from_root: bool = False, + strict_names: bool = False, ) -> bool: """ Two DataTrees are considered isomorphic if every node has the same number of children. @@ -386,7 +395,7 @@ def isomorphic( except (TypeError, TreeIsomorphismError): return False - def equals(self, other: DataTree, from_root=True) -> bool: + def equals(self, other: DataTree, from_root: bool = True) -> bool: """ Two DataTrees are equal if they have isomorphic node structures, with matching node names, and if they have matching variables and coordinates, all of which are equal. 
@@ -479,7 +488,8 @@ def map_over_subtree( """ # TODO this signature means that func has no way to know which node it is being called upon - change? - return map_over_subtree(func)(self, *args, **kwargs) + # TODO fix this typing error + return map_over_subtree(func)(self, *args, **kwargs) # type: ignore[operator] def map_over_subtree_inplace( self, @@ -516,31 +526,6 @@ def render(self): for ds_line in repr(node.ds)[1:]: print(f"{fill}{ds_line}") - # TODO re-implement using anytree findall function? - def get_all(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains all of the given tags, - where the tags can be present in any order. - """ - matching_children = { - c.tags: c.get_node(tags) - for c in self.descendants - if all(tag in c.tags for tag in tags) - } - return DataTree(data_objects=matching_children) - - # TODO re-implement using anytree find function? - def get_any(self, *tags: Hashable) -> DataTree: - """ - Return a DataTree containing the stored objects whose path contains any of the given tags. - """ - matching_children = { - c.tags: c.get_node(tags) - for c in self.descendants - if any(tag in c.tags for tag in tags) - } - return DataTree(data_objects=matching_children) - def merge(self, datatree: DataTree) -> DataTree: """Merge all the leaves of a second DataTree into this one.""" raise NotImplementedError @@ -549,23 +534,7 @@ def merge_child_nodes(self, *paths, new_path: T_Path) -> DataTree: """Merge a set of child nodes into a single new node.""" raise NotImplementedError - def merge_child_datasets( - self, - *paths: T_Path, - compat: str = "no_conflicts", - join: str = "outer", - fill_value: Any = dtypes.NA, - combine_attrs: str = "override", - ) -> Dataset: - """Merge the datasets at a set of child nodes and return as a single Dataset.""" - datasets = [self.get(path).ds for path in paths] - return merge( - datasets, - compat=compat, - join=join, - fill_value=fill_value, - combine_attrs=combine_attrs, - ) + # TODO some kind of .collapse() or .flatten() method to merge a subtree def as_array(self) -> DataArray: return self.ds.as_dataarray() diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 06e9b88436c..6cf562752fa 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -1,8 +1,6 @@ -from typing import Sequence - from xarray import Dataset, open_dataset -from .datatree import DataTree, NodePath, T_Path +from .datatree import DataTree, NodePath def _iter_zarr_groups(root, parent="/"): @@ -23,14 +21,14 @@ def _iter_nc_groups(root, parent="/"): def _get_nc_dataset_class(engine): if engine == "netcdf4": - from netCDF4 import Dataset + from netCDF4 import Dataset # type: ignore elif engine == "h5netcdf": - from h5netcdf.legacyapi import Dataset + from h5netcdf.legacyapi import Dataset # type: ignore elif engine is None: try: from netCDF4 import Dataset except ImportError: - from h5netcdf.legacyapi import Dataset + from h5netcdf.legacyapi import Dataset # type: ignore else: raise ValueError(f"unsupported engine: {engine}") return Dataset @@ -58,6 +56,8 @@ def open_datatree(filename_or_obj, engine=None, **kwargs) -> DataTree: return _open_datatree_zarr(filename_or_obj, **kwargs) elif engine in [None, "netcdf4", "h5netcdf"]: return _open_datatree_netcdf(filename_or_obj, engine=engine, **kwargs) + else: + raise ValueError("Unsupported engine") def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: @@ -71,7 +71,7 @@ def _open_datatree_netcdf(filename: str, 
**kwargs) -> DataTree: # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again node_name = NodePath(path).name - new_node = DataTree(name=node_name, data=subgroup_ds) + new_node: DataTree = DataTree(name=node_name, data=subgroup_ds) tree_root._set_item( path, new_node, @@ -82,7 +82,7 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: def _open_datatree_zarr(store, **kwargs) -> DataTree: - import zarr + import zarr # type: ignore with zarr.open_group(store, mode="r") as zds: ds = open_dataset(store, engine="zarr", **kwargs) @@ -95,7 +95,7 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again node_name = NodePath(path).name - new_node = DataTree(name=node_name, data=subgroup_ds) + new_node: DataTree = DataTree(name=node_name, data=subgroup_ds) tree_root._set_item( path, new_node, @@ -105,31 +105,6 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: return tree_root -def open_mfdatatree( - filepaths, rootnames: Sequence[T_Path] = None, chunks=None, **kwargs -) -> DataTree: - """ - Open multiple files as a single DataTree. - - Groups found in each file will be merged at the root level, unless rootnames are specified, - which will then be used to organise the Tree instead. - """ - if rootnames is None: - rootnames = ["/" for _ in filepaths] - elif len(rootnames) != len(filepaths): - raise ValueError - - full_tree = DataTree() - - for file, root in zip(filepaths, rootnames): - dt = open_datatree(file, chunks=chunks, **kwargs) - full_tree.set_node( - path=root, node=dt, new_nodes_along_path=True, allow_overwrite=False - ) - - return full_tree - - def _maybe_extract_group_kwargs(enc, group): try: return enc[group] @@ -193,7 +168,7 @@ def _datatree_to_netcdf( def _create_empty_zarr_group(store, group, mode): - import zarr + import zarr # type: ignore root = zarr.open_group(store, mode=mode) root.create_group(group, overwrite=True) @@ -208,7 +183,7 @@ def _datatree_to_zarr( **kwargs, ): - from zarr.convenience import consolidate_metadata + from zarr.convenience import consolidate_metadata # type: ignore if kwargs.get("group", None) is not None: raise NotImplementedError( diff --git a/xarray/datatree_/datatree/iterators.py b/xarray/datatree_/datatree/iterators.py index 8e34fa0c141..e2c6b4d3fde 100644 --- a/xarray/datatree_/datatree/iterators.py +++ b/xarray/datatree_/datatree/iterators.py @@ -2,7 +2,7 @@ from collections import abc from typing import Callable, Iterator, List -from .treenode import TreeNode +from .treenode import Tree """These iterators are copied from anytree.iterators, with minor modifications.""" @@ -10,7 +10,7 @@ class AbstractIter(abc.Iterator): def __init__( self, - node: TreeNode, + node: Tree, filter_: Callable = None, stop: Callable = None, maxlevel: int = None, @@ -49,18 +49,18 @@ def __default_filter(node): def __default_stop(node): return False - def __iter__(self) -> Iterator[TreeNode]: + def __iter__(self) -> Iterator[Tree]: return self - def __next__(self) -> TreeNode: + def __next__(self) -> Iterator[Tree]: if self.__iter is None: self.__iter = self.__init() - item = next(self.__iter) + item = next(self.__iter) # type: ignore[call-overload] return item @staticmethod @abstractmethod - def _iter(children: List[TreeNode], filter_, stop, maxlevel) -> Iterator[TreeNode]: + def _iter(children: List[Tree], filter_, stop, maxlevel) -> Iterator[Tree]: ... 
@staticmethod @@ -68,7 +68,7 @@ def _abort_at_level(level, maxlevel): return maxlevel is not None and level > maxlevel @staticmethod - def _get_children(children: List[TreeNode], stop) -> List[TreeNode]: + def _get_children(children: List[Tree], stop) -> List[Tree]: return [child for child in children if not stop(child)] diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index f669fda6166..94d2c7418fa 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -103,7 +103,7 @@ def diff_treestructure(a: DataTree, b: DataTree, require_names_equal: bool) -> s return "" -def map_over_subtree(func: Callable) -> DataTree | Tuple[DataTree, ...]: +def map_over_subtree(func: Callable) -> Callable: """ Decorator which turns a function which acts on (and returns) Datasets into one which acts on and returns DataTrees. @@ -153,7 +153,7 @@ def map_over_subtree(func: Callable) -> DataTree | Tuple[DataTree, ...]: # TODO inspect function to work out immediately if the wrong number of arguments were passed for it? @functools.wraps(func) - def _map_over_subtree(*args, **kwargs): + def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: """Internal function which maps func over every node in tree, returning a tree of the results.""" from .datatree import DataTree diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 729207229e3..f4c0c77fd22 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -2,10 +2,22 @@ from collections import OrderedDict from pathlib import PurePosixPath -from typing import Any, Iterator, Mapping, Tuple +from typing import ( + TYPE_CHECKING, + Generic, + Iterator, + Mapping, + Optional, + Tuple, + TypeVar, + Union, +) from xarray.core.utils import Frozen, is_dict_like +if TYPE_CHECKING: + from xarray.core.types import T_DataArray + class TreeError(Exception): """Exception type raised when user attempts to create an invalid tree in some way.""" @@ -32,7 +44,10 @@ def __new__(cls, *args: str | "NodePath") -> "NodePath": return obj -class TreeNode: +Tree = TypeVar("Tree", bound="TreeNode") + + +class TreeNode(Generic[Tree]): """ Base class representing a node of a tree, with methods for traversing and altering the tree. @@ -58,11 +73,10 @@ class TreeNode: (This class is heavily inspired by the anytree library's NodeMixin class.) """ - # TODO replace all type annotations that use "TreeNode" with "Self", so it's still correct when subclassed (requires python 3.11) - _parent: TreeNode | None - _children: OrderedDict[str, TreeNode] + _parent: Optional[Tree] + _children: OrderedDict[str, Tree] - def __init__(self, children: Mapping[str, TreeNode] = None): + def __init__(self, children: Mapping[str, Tree] = None): """Create a parentless node.""" self._parent = None self._children = OrderedDict() @@ -70,11 +84,11 @@ def __init__(self, children: Mapping[str, TreeNode] = None): self.children = children @property - def parent(self) -> TreeNode | None: + def parent(self) -> Tree | None: """Parent of this node.""" return self._parent - def _set_parent(self, new_parent: TreeNode | None, child_name: str = None): + def _set_parent(self, new_parent: Tree | None, child_name: str = None) -> None: # TODO is it possible to refactor in a way that removes this private method? 
if new_parent is not None and not isinstance(new_parent, TreeNode): @@ -89,7 +103,7 @@ def _set_parent(self, new_parent: TreeNode | None, child_name: str = None): self._detach(old_parent) self._attach(new_parent, child_name) - def _check_loop(self, new_parent: TreeNode | None): + def _check_loop(self, new_parent: Tree | None) -> None: """Checks that assignment of this new parent will not create a cycle.""" if new_parent is not None: if new_parent is self: @@ -103,7 +117,7 @@ def _check_loop(self, new_parent: TreeNode | None): f"Cannot set parent, as node {self} is already a descendant of node {new_parent}." ) - def _detach(self, parent: TreeNode | None): + def _detach(self, parent: Tree | None) -> None: if parent is not None: self._pre_detach(parent) parents_children = parent.children @@ -117,8 +131,11 @@ def _detach(self, parent: TreeNode | None): self._parent = None self._post_detach(parent) - def _attach(self, parent: TreeNode | None, child_name: str = None): + def _attach(self, parent: Tree | None, child_name: str = None) -> None: if parent is not None: + if child_name is None: + raise ValueError() + self._pre_attach(parent) parentchildren = parent._children assert not any( @@ -130,17 +147,17 @@ def _attach(self, parent: TreeNode | None, child_name: str = None): else: self._parent = None - def orphan(self): + def orphan(self) -> None: """Detach this node from its parent.""" self._set_parent(new_parent=None) @property - def children(self) -> Mapping[str, TreeNode]: + def children(self: Tree) -> Mapping[str, Tree]: """Child nodes of this node, stored under a mapping via their names.""" return Frozen(self._children) @children.setter - def children(self, children: Mapping[str, TreeNode]): + def children(self: Tree, children: Mapping[str, Tree]) -> None: self._check_children(children) children = OrderedDict(children) @@ -158,7 +175,7 @@ def children(self, children: Mapping[str, TreeNode]): raise @children.deleter - def children(self): + def children(self) -> None: # TODO this just detaches all the children, it doesn't actually delete them... children = self.children self._pre_detach_children(children) @@ -168,7 +185,7 @@ def children(self): self._post_detach_children(children) @staticmethod - def _check_children(children: Mapping[str, TreeNode]): + def _check_children(children: Mapping[str, Tree]) -> None: """Check children for correct types and for any duplicates.""" if not is_dict_like(children): raise TypeError( @@ -191,40 +208,40 @@ def _check_children(children: Mapping[str, TreeNode]): f"Cannot add same node {name} multiple times as different children." 
) - def __repr__(self): + def __repr__(self) -> str: return f"TreeNode(children={dict(self._children)})" - def _pre_detach_children(self, children: Mapping[str, TreeNode]): + def _pre_detach_children(self: Tree, children: Mapping[str, Tree]) -> None: """Method call before detaching `children`.""" pass - def _post_detach_children(self, children: Mapping[str, TreeNode]): + def _post_detach_children(self: Tree, children: Mapping[str, Tree]) -> None: """Method call after detaching `children`.""" pass - def _pre_attach_children(self, children: Mapping[str, TreeNode]): + def _pre_attach_children(self: Tree, children: Mapping[str, Tree]) -> None: """Method call before attaching `children`.""" pass - def _post_attach_children(self, children: Mapping[str, TreeNode]): + def _post_attach_children(self: Tree, children: Mapping[str, Tree]) -> None: """Method call after attaching `children`.""" pass - def iter_lineage(self) -> Iterator[TreeNode]: + def iter_lineage(self: Tree) -> Iterator[Tree]: """Iterate up the tree, starting from the current node.""" # TODO should this instead return an OrderedDict, so as to include node names? - node: TreeNode | None = self + node: Tree | None = self while node is not None: yield node node = node.parent @property - def lineage(self) -> Tuple[TreeNode]: + def lineage(self: Tree) -> Tuple[Tree, ...]: """All parent nodes and their parent nodes, starting with the closest.""" return tuple(self.iter_lineage()) @property - def ancestors(self) -> Tuple[TreeNode, ...]: + def ancestors(self: Tree) -> Tuple[Tree, ...]: """All parent nodes and their parent nodes, starting with the most distant.""" if self.parent is None: return (self,) @@ -233,7 +250,7 @@ def ancestors(self) -> Tuple[TreeNode, ...]: return ancestors @property - def root(self) -> TreeNode: + def root(self: Tree) -> Tree: """Root node of the tree""" node = self while node.parent is not None: @@ -251,20 +268,23 @@ def is_leaf(self) -> bool: return self.children == {} @property - def siblings(self) -> OrderedDict[str, TreeNode]: + def siblings(self: Tree) -> OrderedDict[str, Tree]: """ Nodes with the same parent as this node. """ - return OrderedDict( - { - name: child - for name, child in self.parent.children.items() - if child is not self - } - ) + if self.parent: + return OrderedDict( + { + name: child + for name, child in self.parent.children.items() + if child is not self + } + ) + else: + return OrderedDict() @property - def subtree(self) -> Iterator[TreeNode]: + def subtree(self: Tree) -> Iterator[Tree]: """ An iterator over all nodes in this tree, including both self and all descendants. @@ -274,23 +294,23 @@ def subtree(self) -> Iterator[TreeNode]: return iterators.PreOrderIter(self) - def _pre_detach(self, parent: TreeNode): + def _pre_detach(self: Tree, parent: Tree) -> None: """Method call before detaching from `parent`.""" pass - def _post_detach(self, parent: TreeNode): + def _post_detach(self: Tree, parent: Tree) -> None: """Method call after detaching from `parent`.""" pass - def _pre_attach(self, parent: TreeNode): + def _pre_attach(self: Tree, parent: Tree) -> None: """Method call before attaching to `parent`.""" pass - def _post_attach(self, parent: TreeNode): + def _post_attach(self: Tree, parent: Tree) -> None: """Method call after attaching to `parent`.""" pass - def get(self, key: str, default: TreeNode = None) -> TreeNode | None: + def get(self: Tree, key: str, default: Tree = None) -> Optional[Tree]: """ Return the child node with the specified key. 
@@ -302,7 +322,9 @@ def get(self, key: str, default: TreeNode = None) -> TreeNode | None: else: return default - def _get_item(self, path: str | NodePath) -> Any: + # TODO `._walk` method to be called by both `_get_item` and `_set_item` + + def _get_item(self: Tree, path: str | NodePath) -> Union[Tree, T_DataArray]: """ Returns the object lying at the given path. @@ -313,26 +335,27 @@ def _get_item(self, path: str | NodePath) -> Any: if path.root: current_node = self.root - root, *parts = path.parts + root, *parts = list(path.parts) else: current_node = self - parts = path.parts + parts = list(path.parts) for part in parts: if part == "..": - parent = current_node.parent - if parent is None: + if current_node.parent is None: raise KeyError(f"Could not find node at {path}") - current_node = parent + else: + current_node = current_node.parent elif part in ("", "."): pass else: - current_node = current_node.get(part) - if current_node is None: + if current_node.get(part) is None: raise KeyError(f"Could not find node at {path}") + else: + current_node = current_node.get(part) return current_node - def _set(self, key: str, val: TreeNode) -> None: + def _set(self: Tree, key: str, val: Tree) -> None: """ Set the child node with the specified key to value. @@ -342,12 +365,12 @@ def _set(self, key: str, val: TreeNode) -> None: self.children = new_children def _set_item( - self, + self: Tree, path: str | NodePath, - item: Any, + item: Union[Tree, T_DataArray], new_nodes_along_path: bool = False, allow_overwrite: bool = True, - ): + ) -> None: """ Set a new item in the tree, overwriting anything already present at that path. @@ -388,11 +411,11 @@ def _set_item( # Walk to location of new node, creating intermediate node objects as we go if necessary for part in parts: if part == "..": - parent = current_node.parent - if parent is None: + if current_node.parent is None: # We can't create a parent if `new_nodes_along_path=True` as we wouldn't know what to name it raise KeyError(f"Could not reach node at path {path}") - current_node = parent + else: + current_node = current_node.parent elif part in ("", "."): pass else: @@ -418,7 +441,7 @@ def _set_item( def del_node(self, path: str): raise NotImplementedError - def update(self, other: Mapping[str, TreeNode]) -> None: + def update(self: Tree, other: Mapping[str, Tree]) -> None: """ Update this node's children. @@ -437,7 +460,7 @@ def name(self) -> str | None: else: return None - def __str__(self): + def __str__(self) -> str: return f"TreeNode({self.name})" if self.name else "TreeNode()" @property @@ -448,9 +471,10 @@ def path(self) -> str: else: root, *ancestors = self.ancestors # don't include name of root because (a) root might not have a name & (b) we want path relative to root. - return "/" + "/".join(node.name for node in ancestors) + names = [node.name for node in ancestors] + return "/" + "/".join(names) # type: ignore - def relative_to(self, other: TreeNode) -> str: + def relative_to(self, other: Tree) -> str: """ Compute the relative path from this node to node `other`. 
@@ -471,11 +495,11 @@ def relative_to(self, other: TreeNode) -> str: path_to_common_ancestor / this_path.relative_to(common_ancestor.path) ) - def same_tree(self, other: TreeNode) -> bool: + def same_tree(self, other: Tree) -> bool: """True if other node is in the same tree as this node.""" return self.root is other.root - def find_common_ancestor(self, other: TreeNode) -> TreeNode: + def find_common_ancestor(self, other: Tree) -> Tree: """ Find the first common ancestor of two nodes in the same tree. @@ -494,7 +518,7 @@ def find_common_ancestor(self, other: TreeNode) -> TreeNode: return common_ancestor - def _path_to_ancestor(self, ancestor: TreeNode) -> NodePath: + def _path_to_ancestor(self, ancestor: Tree) -> NodePath: generation_gap = list(self.lineage).index(ancestor) path_upwards = "../" * generation_gap if generation_gap > 0 else "/" return NodePath(path_upwards) From c7bad488392025d32da09ab3334feeafff8233ad Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 2 May 2022 15:43:18 -0400 Subject: [PATCH 114/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/83 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/pre-commit-hooks: v4.1.0 → v4.2.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.1.0...v4.2.0) - [github.com/pre-commit/mirrors-mypy: v0.910 → v0.950](https://github.com/pre-commit/mirrors-mypy/compare/v0.910...v0.950) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index f1bd6160652..c30f66aeeec 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -3,7 +3,7 @@ ci: autoupdate_schedule: monthly repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.1.0 + rev: v4.2.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.910 + rev: v0.950 hooks: - id: mypy # Copied from setup.cfg From 72181643da325d96682dba83cd2a55cd446ddc33 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 4 May 2022 17:45:42 -0400 Subject: [PATCH 115/260] Str repr indentation https://github.com/xarray-contrib/datatree/pull/86 * indent dataset repr * move repr tests to test_formatting.py * whatsnew --- xarray/datatree_/datatree/formatting.py | 2 +- .../datatree_/datatree/tests/test_datatree.py | 56 ------------------- .../datatree/tests/test_formatting.py | 56 +++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 3 + 4 files changed, 60 insertions(+), 57 deletions(-) diff --git a/xarray/datatree_/datatree/formatting.py b/xarray/datatree_/datatree/formatting.py index 7b66c4e13c0..5ed56572018 100644 --- a/xarray/datatree_/datatree/formatting.py +++ b/xarray/datatree_/datatree/formatting.py @@ -69,7 +69,7 @@ def tree_repr(dt): if len(node.children) > 0: lines.append(f"{fill}{renderer.style.vertical}{line}") else: - lines.append(f"{fill}{line}") + lines.append(f"{fill}{' ' * len(renderer.style.vertical)}{line}") # Tack on info about whether or not root node has a parent at the start first_line = lines[0] diff --git 
a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 3bf28a3aac6..45cc7b4786f 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,5 +1,3 @@ -import textwrap - import pytest import xarray as xr import xarray.testing as xrt @@ -337,57 +335,3 @@ class TestBrowsing: class TestRestructuring: ... - - -class TestRepr: - def test_print_empty_node(self): - dt = DataTree(name="root") - printout = dt.__str__() - assert printout == "DataTree('root', parent=None)" - - def test_print_empty_node_with_attrs(self): - dat = xr.Dataset(attrs={"note": "has attrs"}) - dt = DataTree(name="root", data=dat) - printout = dt.__str__() - assert printout == textwrap.dedent( - """\ - DataTree('root', parent=None) - Dimensions: () - Data variables: - *empty* - Attributes: - note: has attrs""" - ) - - def test_print_node_with_data(self): - dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree(name="root", data=dat) - printout = dt.__str__() - expected = [ - "DataTree('root', parent=None)", - "Dimensions", - "Coordinates", - "a", - "Data variables", - "*empty*", - ] - for expected_line, printed_line in zip(expected, printout.splitlines()): - assert expected_line in printed_line - - def test_nested_node(self): - dat = xr.Dataset({"a": [0, 2]}) - root = DataTree(name="root") - DataTree(name="results", data=dat, parent=root) - printout = root.__str__() - assert printout.splitlines()[2].startswith(" ") - - def test_print_datatree(self): - dt = create_test_datatree() - print(dt) - - # TODO work out how to test something complex like this - - def test_repr_of_node_with_data(self): - dat = xr.Dataset({"a": [0, 2]}) - dt = DataTree(name="root", data=dat) - assert "Coordinates" in repr(dt) diff --git a/xarray/datatree_/datatree/tests/test_formatting.py b/xarray/datatree_/datatree/tests/test_formatting.py index 995a7c85fb4..b3a9fed04ba 100644 --- a/xarray/datatree_/datatree/tests/test_formatting.py +++ b/xarray/datatree_/datatree/tests/test_formatting.py @@ -5,6 +5,62 @@ from datatree import DataTree from datatree.formatting import diff_tree_repr +from .test_datatree import create_test_datatree + + +class TestRepr: + def test_print_empty_node(self): + dt = DataTree(name="root") + printout = dt.__str__() + assert printout == "DataTree('root', parent=None)" + + def test_print_empty_node_with_attrs(self): + dat = Dataset(attrs={"note": "has attrs"}) + dt = DataTree(name="root", data=dat) + printout = dt.__str__() + assert printout == dedent( + """\ + DataTree('root', parent=None) + Dimensions: () + Data variables: + *empty* + Attributes: + note: has attrs""" + ) + + def test_print_node_with_data(self): + dat = Dataset({"a": [0, 2]}) + dt = DataTree(name="root", data=dat) + printout = dt.__str__() + expected = [ + "DataTree('root', parent=None)", + "Dimensions", + "Coordinates", + "a", + "Data variables", + "*empty*", + ] + for expected_line, printed_line in zip(expected, printout.splitlines()): + assert expected_line in printed_line + + def test_nested_node(self): + dat = Dataset({"a": [0, 2]}) + root = DataTree(name="root") + DataTree(name="results", data=dat, parent=root) + printout = root.__str__() + assert printout.splitlines()[2].startswith(" ") + + def test_print_datatree(self): + dt = create_test_datatree() + print(dt) + + # TODO work out how to test something complex like this + + def test_repr_of_node_with_data(self): + dat = Dataset({"a": [0, 2]}) + dt = DataTree(name="root", data=dat) + assert 
"Coordinates" in repr(dt) + class TestDiffFormatting: def test_diff_structure(self): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 4bc263d471f..82c11674700 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -60,6 +60,9 @@ Deprecations Bug fixes ~~~~~~~~~ +- Fixed indentation issue with the string repr (:pull:`86`) + By `Tom Nicholas `_. + Documentation ~~~~~~~~~~~~~ From dbd8f60a1793ada88b4ae74acf768a284ebc59d7 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 4 May 2022 18:43:47 -0400 Subject: [PATCH 116/260] html repr https://github.com/xarray-contrib/datatree/pull/78 * html repr displays data in root group * displays under name 'xarray.DataTree' * creates a html repr for each sub-group, but it looks messed up * correctly indents sub-groups * show names of groups * refactoring * dodge type hinting bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix failing test in merg * fix bug caused by merge * whatsnew Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 29 +++++++--- xarray/datatree_/datatree/formatting.py | 2 +- xarray/datatree_/datatree/formatting_html.py | 57 ++++++++++++++++++++ xarray/datatree_/datatree/treenode.py | 2 +- xarray/datatree_/docs/source/whats-new.rst | 4 +- 5 files changed, 85 insertions(+), 9 deletions(-) create mode 100644 xarray/datatree_/datatree/formatting_html.py diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index a88ebc35109..15cb26438e5 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,6 +1,7 @@ from __future__ import annotations from collections import OrderedDict +from html import escape from typing import ( TYPE_CHECKING, Any, @@ -16,9 +17,10 @@ from xarray import DataArray, Dataset from xarray.core import utils +from xarray.core.options import OPTIONS as XR_OPTS from xarray.core.variable import Variable -from .formatting import tree_repr +from . import formatting, formatting_html from .mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from .ops import ( DataTreeArithmeticMixin, @@ -189,11 +191,17 @@ def _pre_attach(self: DataTree, parent: DataTree) -> None: f"parent {parent.name} already contains a data variable named {self.name}" ) - def __repr__(self): - return tree_repr(self) + def __repr__(self) -> str: + return formatting.datatree_repr(self) - def __str__(self): - return tree_repr(self) + def __str__(self) -> str: + return formatting.datatree_repr(self) + + def _repr_html_(self): + """Make html representation of datatree object""" + if XR_OPTS["display_style"] == "text": + return f"
<pre>{escape(repr(self))}</pre>
" + return formatting_html.datatree_repr(self) def get( self: DataTree, key: str, default: Optional[DataTree | DataArray] = None @@ -227,8 +235,10 @@ def __getitem__(self: DataTree, key: str) -> DataTree | DataArray: key : str Name of variable / node, or unix-like path to variable / node. """ + # Either: if utils.is_dict_like(key): + # dict-like indexing raise NotImplementedError("Should this index over whole tree?") elif isinstance(key, str): @@ -243,7 +253,7 @@ def __getitem__(self: DataTree, key: str) -> DataTree | DataArray: "implemented via .subset" ) else: - raise ValueError("Invalid format for key") + raise ValueError(f"Invalid format for key: {key}") def _set(self, key: str, val: DataTree | CoercibleValue) -> None: """ @@ -352,6 +362,13 @@ def from_dict( def nbytes(self) -> int: return sum(node.ds.nbytes if node.has_data else 0 for node in self.subtree) + def __len__(self) -> int: + if self.children: + n_children = len(self.children) + else: + n_children = 0 + return n_children + len(self.ds) + def isomorphic( self, other: DataTree, diff --git a/xarray/datatree_/datatree/formatting.py b/xarray/datatree_/datatree/formatting.py index 5ed56572018..deba57eb09d 100644 --- a/xarray/datatree_/datatree/formatting.py +++ b/xarray/datatree_/datatree/formatting.py @@ -52,7 +52,7 @@ def diff_tree_repr(a, b, compat): return "\n".join(summary) -def tree_repr(dt): +def datatree_repr(dt): """A printable representation of the structure of this entire tree.""" renderer = RenderTree(dt) diff --git a/xarray/datatree_/datatree/formatting_html.py b/xarray/datatree_/datatree/formatting_html.py new file mode 100644 index 00000000000..91c1d1449d5 --- /dev/null +++ b/xarray/datatree_/datatree/formatting_html.py @@ -0,0 +1,57 @@ +from functools import partial +from html import escape +from typing import Any, Mapping + +from xarray.core.formatting_html import ( + _mapping_section, + _obj_repr, + attr_section, + coord_section, + datavar_section, + dim_section, +) +from xarray.core.options import OPTIONS + +OPTIONS["display_expand_groups"] = "default" + + +def summarize_children(children: Mapping[str, Any]) -> str: + children_li = "".join( + f"
    {node_repr(n, c)}
" for n, c in children.items() + ) + + return ( + "
    " + f"
    {children_li}
    " + "
" + ) + + +children_section = partial( + _mapping_section, + name="Groups", + details_func=summarize_children, + max_items_collapse=1, + expand_option_name="display_expand_groups", +) + + +def node_repr(group_title: str, dt: Any) -> str: + header_components = [f"
<div class='xr-obj-type'>{escape(group_title)}</div>
"] + + ds = dt.ds + + sections = [ + children_section(dt.children), + dim_section(ds), + coord_section(ds.coords), + datavar_section(ds.data_vars), + attr_section(ds.attrs), + ] + + return _obj_repr(ds, header_components, sections) + + +def datatree_repr(dt: Any) -> str: + obj_type = f"datatree.{type(dt).__name__}" + return node_repr(obj_type, dt) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index f4c0c77fd22..899945a1e35 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -134,7 +134,7 @@ def _detach(self, parent: Tree | None) -> None: def _attach(self, parent: Tree | None, child_name: str = None) -> None: if parent is not None: if child_name is None: - raise ValueError() + raise ValueError("Cannot directly assign a parent to an unnamed node") self._pre_attach(parent) parentchildren = parent._children diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 82c11674700..82c7ce7caea 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -32,7 +32,9 @@ New Features root node via ``"/"``, and the current node via ``"."``. (Internally it actually uses ``pathlib`` now.) By `Tom Nicholas `_. - New path-like API methods, such as ``.relative_to``, ``.find_common_ancestor``, and ``.same_tree``. -- Some new diction-like methods, such as ``DataTree.get`` and ``DataTree.update``. (:pull:`76`) +- Some new dictionary-like methods, such as ``DataTree.get`` and ``DataTree.update``. (:pull:`76`) + By `Tom Nicholas `_. +- New HTML repr, which will automatically display in a jupyter notebook. (:pull:`78`) By `Tom Nicholas `_. Breaking changes From fb831ac4cde4bcb379d4e030540b05dde8be6045 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Wed, 4 May 2022 19:18:25 -0400 Subject: [PATCH 117/260] delitem method https://github.com/xarray-contrib/datatree/pull/88 * test * method * whatsnew --- xarray/datatree_/datatree/tests/test_treenode.py | 13 +++++++++++-- xarray/datatree_/datatree/treenode.py | 10 ++++++++-- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 3 files changed, 21 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 1b4ebc94daf..3ea859d8492 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -227,9 +227,18 @@ def test_overwrite_child(self): assert marys_evil_twin.parent is john -# TODO write and test all the del methods class TestPruning: - ... 
+ def test_del_child(self): + john = TreeNode() + mary = TreeNode() + john._set_item("Mary", mary) + + del john["Mary"] + assert "Mary" not in john.children + assert mary.parent is None + + with pytest.raises(KeyError): + del john["Mary"] def create_test_tree(): diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 899945a1e35..c3effd91bd7 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -438,8 +438,14 @@ def _set_item( else: current_node._set(name, item) - def del_node(self, path: str): - raise NotImplementedError + def __delitem__(self: Tree, key: str): + """Remove a child node from this tree object.""" + if key in self.children: + child = self._children[key] + del self._children[key] + child.orphan() + else: + raise KeyError("Cannot delete") def update(self: Tree, other: Mapping[str, Tree]) -> None: """ diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 82c7ce7caea..699a82b99b8 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -36,6 +36,8 @@ New Features By `Tom Nicholas `_. - New HTML repr, which will automatically display in a jupyter notebook. (:pull:`78`) By `Tom Nicholas `_. +- New delitem method so you can delete nodes. (:pull:`88`) + By `Tom Nicholas `_. Breaking changes ~~~~~~~~~~~~~~~~ From 2479bc7af15b895200016d9e5bee66ccef8b71b3 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 5 May 2022 11:47:16 -0400 Subject: [PATCH 118/260] to/from_dict https://github.com/xarray-contrib/datatree/pull/82 * add to_dict method * added roundtrip test * add xfailed test for rountripping with named root * whats-new --- xarray/datatree_/datatree/datatree.py | 22 +++++++++++++++---- .../datatree_/datatree/tests/test_datatree.py | 14 ++++++++++++ xarray/datatree_/docs/source/api.rst | 2 +- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 4 files changed, 35 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 15cb26438e5..af666927701 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -6,6 +6,7 @@ TYPE_CHECKING, Any, Callable, + Dict, Generic, Iterable, Mapping, @@ -316,7 +317,7 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: @classmethod def from_dict( cls, - d: MutableMapping[str, DataTree | Dataset | DataArray], + d: MutableMapping[str, Dataset | DataArray | None], name: str = None, ) -> DataTree: """ @@ -337,19 +338,22 @@ def from_dict( Returns ------- DataTree + + Notes + ----- + If your dictionary is nested you will need to flatten it before using this method. """ # First create the root node - # TODO there is a real bug here where what if root_data is of type DataTree? 
root_data = d.pop("/", None) - obj = cls(name=name, data=root_data, parent=None, children=None) # type: ignore[arg-type] + obj = cls(name=name, data=root_data, parent=None, children=None) if d: # Populate tree with children determined from data_objects mapping for path, data in d.items(): # Create and set new node node_name = NodePath(path).name - new_node = cls(name=node_name, data=data) # type: ignore[arg-type] + new_node = cls(name=node_name, data=data) obj._set_item( path, new_node, @@ -358,6 +362,16 @@ def from_dict( ) return obj + def to_dict(self) -> Dict[str, Any]: + """ + Create a dictionary mapping of absolute node paths to the data contained in those nodes. + + Returns + ------- + Dict + """ + return {node.path: node.ds for node in self.subtree} + @property def nbytes(self) -> int: return sum(node.ds.nbytes if node.has_data else 0 for node in self.subtree) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 45cc7b4786f..45442b9c753 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -328,6 +328,20 @@ def test_full(self): "/set3", ] + def test_roundtrip(self): + dt = create_test_datatree() + roundtrip = DataTree.from_dict(dt.to_dict()) + assert roundtrip.equals(dt) + + @pytest.mark.xfail + def test_roundtrip_unnamed_root(self): + # See GH81 + + dt = create_test_datatree() + dt.name = "root" + roundtrip = DataTree.from_dict(dt.to_dict()) + assert roundtrip.equals(dt) + class TestBrowsing: ... diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 37368b998b0..d10e89c7c07 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -215,13 +215,13 @@ I/O open_datatree DataTree.from_dict + DataTree.to_dict DataTree.to_netcdf DataTree.to_zarr .. Missing - DataTree.to_dict open_mfdatatree Exceptions diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 699a82b99b8..b808407f439 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -38,6 +38,8 @@ New Features By `Tom Nicholas `_. - New delitem method so you can delete nodes. (:pull:`88`) By `Tom Nicholas `_. +- New ``to_dict`` method. (:pull:`82`) + By `Tom Nicholas `_. Breaking changes ~~~~~~~~~~~~~~~~ From 636957dd20e2c3ea5cceeb698b04ce0c498d1aa3 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 May 2022 11:48:55 -0400 Subject: [PATCH 119/260] update whatsnew for 0.0.5 release --- xarray/datatree_/docs/source/whats-new.rst | 34 ++++++++++++++++++---- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index b808407f439..ef950de6f71 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,13 +15,36 @@ What's New np.random.seed(123456) -.. _whats-new.v0.1.0: +.. _whats-new.v0.0.6: -v0.1.0 (unreleased) +v0.0.6 (unreleased) +------------------- + +New Features +~~~~~~~~~~~~ + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + +.. _whats-new.v0.0.5: + +v0.0.5 (05/05/2022) ------------------- - Major refactor of internals, moving from the ``DataTree.children`` attribute being a ``Tuple[DataTree]`` to being a - ``FrozenDict[str, DataTree]``. 
This was necessary in order to integrate better with xarray's dictionary-like API, + ``OrderedDict[str, DataTree]``. This was necessary in order to integrate better with xarray's dictionary-like API, solve several issues, simplify the code internally, remove dependencies, and enable new features. (:pull:`76`) By `Tom Nicholas `_. @@ -50,8 +73,9 @@ Breaking changes - Files will now be loaded as a slightly different tree, because the root group no longer needs to be given a default name. - Removed tag-like access to nodes. -- Removes the option to delete all data in a node by assigning None to the node (in favour of deleting data using the - xarray API), or to create a new empty node in the same way (in favour of assigning an empty DataTree object instead). +- Removes the option to delete all data in a node by assigning None to the node (in favour of deleting data by replacing + the node's ``.ds`` attribute with an empty Dataset), or to create a new empty node in the same way (in favour of + assigning an empty DataTree object instead). - Removes the ability to create a new node by assigning a ``Dataset`` object to ``DataTree.__setitem__`. - Several other minor API changes such as ``.pathstr`` -> ``.path``, and ``from_dict``'s dictionary argument now being required. (:pull:`76`) From c1402a38e67401517aa8191edc53872da9bd31ba Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 May 2022 11:51:32 -0400 Subject: [PATCH 120/260] add __delitem__ to API docs --- xarray/datatree_/docs/source/api.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index d10e89c7c07..5cd16466328 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -67,13 +67,13 @@ Dictionary interface DataTree.__getitem__ DataTree.__setitem__ + DataTree.__delitem__ DataTree.update DataTree.get .. Missing - DataTree.__delitem__ DataTree.items DataTree.keys DataTree.values @@ -230,5 +230,5 @@ Exceptions .. autosummary:: :toctree: generated/ - TreeError - TreeIsomorphismError + TreeError + TreeIsomorphismError From 0a886216e9d5cf80fc5e859a9d0aac724ebbb82b Mon Sep 17 00:00:00 2001 From: Matt McCormick Date: Wed, 18 May 2022 08:11:28 -0700 Subject: [PATCH 121/260] Do not call __exit__ on Zarr store when opening https://github.com/xarray-contrib/datatree/pull/90 * Do not call __exit__ on Zarr store when opening The `with` context when opening the zarr group with result in calling __exit__ on the store when the function completes. 
This calls `.close()` on ZipStore's, which results in errors: ``` ValueError: Attempt to use ZIP archive that was already closed ``` * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/io.py | 36 +++++++++++----------- xarray/datatree_/datatree/tests/test_io.py | 14 +++++++++ 2 files changed, 32 insertions(+), 18 deletions(-) diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 6cf562752fa..6236763dbb4 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -84,24 +84,24 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: def _open_datatree_zarr(store, **kwargs) -> DataTree: import zarr # type: ignore - with zarr.open_group(store, mode="r") as zds: - ds = open_dataset(store, engine="zarr", **kwargs) - tree_root = DataTree.from_dict({"/": ds}) - for path in _iter_zarr_groups(zds): - try: - subgroup_ds = open_dataset(store, engine="zarr", group=path, **kwargs) - except zarr.errors.PathNotFoundError: - subgroup_ds = Dataset() - - # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again - node_name = NodePath(path).name - new_node: DataTree = DataTree(name=node_name, data=subgroup_ds) - tree_root._set_item( - path, - new_node, - allow_overwrite=False, - new_nodes_along_path=True, - ) + zds = zarr.open_group(store, mode="r") + ds = open_dataset(store, engine="zarr", **kwargs) + tree_root = DataTree.from_dict({"/": ds}) + for path in _iter_zarr_groups(zds): + try: + subgroup_ds = open_dataset(store, engine="zarr", group=path, **kwargs) + except zarr.errors.PathNotFoundError: + subgroup_ds = Dataset() + + # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again + node_name = NodePath(path).name + new_node: DataTree = DataTree(name=node_name, data=subgroup_ds) + tree_root._set_item( + path, + new_node, + allow_overwrite=False, + new_nodes_along_path=True, + ) return tree_root diff --git a/xarray/datatree_/datatree/tests/test_io.py b/xarray/datatree_/datatree/tests/test_io.py index 659f0c31463..b7005471f17 100644 --- a/xarray/datatree_/datatree/tests/test_io.py +++ b/xarray/datatree_/datatree/tests/test_io.py @@ -40,6 +40,20 @@ def test_to_zarr(self, tmpdir): roundtrip_dt = open_datatree(filepath, engine="zarr") assert_equal(original_dt, roundtrip_dt) + @requires_zarr + def test_to_zarr_zip_store(self, tmpdir): + from zarr.storage import ZipStore + + filepath = str( + tmpdir / "test.zarr.zip" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + store = ZipStore(filepath) + original_dt.to_zarr(store) + + roundtrip_dt = open_datatree(store, engine="zarr") + assert_equal(original_dt, roundtrip_dt) + @requires_zarr def test_to_zarr_not_consolidated(self, tmpdir): filepath = tmpdir / "test.zarr" From 80118299740e60bfd43252dfb34e3012701d3475 Mon Sep 17 00:00:00 2001 From: Joe Hamman Date: Thu, 26 May 2022 11:37:07 -0400 Subject: [PATCH 122/260] Fix netcdf encoding https://github.com/xarray-contrib/datatree/pull/95 * the fix that didn't fix * further work on the netcdf encoding issue * use set2 * check for invalid groups in to_zarr * add tests and comments for the future Co-authored-by: Justin Magers --- xarray/datatree_/datatree/io.py | 32 ++++++++++------- xarray/datatree_/datatree/tests/test_io.py | 42 ++++++++++++++++++++++ 2 files changed, 62 
insertions(+), 12 deletions(-) diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 6236763dbb4..8460a8979a4 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -105,13 +105,6 @@ def _open_datatree_zarr(store, **kwargs) -> DataTree: return tree_root -def _maybe_extract_group_kwargs(enc, group): - try: - return enc[group] - except KeyError: - return None - - def _create_empty_netcdf_group(filename, group, mode, engine): ncDataset = _get_nc_dataset_class(engine) @@ -146,6 +139,14 @@ def _datatree_to_netcdf( if encoding is None: encoding = {} + # In the future, we may want to expand this check to insure all the provided encoding + # options are valid. For now, this simply checks that all provided encoding keys are + # groups in the datatree. + if set(encoding) - set(dt.groups): + raise ValueError( + f"unexpected encoding group name(s) provided: {set(encoding) - set(dt.groups)}" + ) + if unlimited_dims is None: unlimited_dims = {} @@ -155,16 +156,15 @@ def _datatree_to_netcdf( if ds is None: _create_empty_netcdf_group(filepath, group_path, mode, engine) else: - ds.to_netcdf( filepath, group=group_path, mode=mode, - encoding=_maybe_extract_group_kwargs(encoding, dt.path), - unlimited_dims=_maybe_extract_group_kwargs(unlimited_dims, dt.path), + encoding=encoding.get(node.path), + unlimited_dims=unlimited_dims.get(node.path), **kwargs, ) - mode = "a" + mode = "r+" def _create_empty_zarr_group(store, group, mode): @@ -196,6 +196,14 @@ def _datatree_to_zarr( if encoding is None: encoding = {} + # In the future, we may want to expand this check to insure all the provided encoding + # options are valid. For now, this simply checks that all provided encoding keys are + # groups in the datatree. + if set(encoding) - set(dt.groups): + raise ValueError( + f"unexpected encoding group name(s) provided: {set(encoding) - set(dt.groups)}" + ) + for node in dt.subtree: ds = node.ds group_path = node.path @@ -206,7 +214,7 @@ def _datatree_to_zarr( store, group=group_path, mode=mode, - encoding=_maybe_extract_group_kwargs(encoding, dt.path), + encoding=encoding.get(node.path), consolidated=False, **kwargs, ) diff --git a/xarray/datatree_/datatree/tests/test_io.py b/xarray/datatree_/datatree/tests/test_io.py index b7005471f17..dd354cce847 100644 --- a/xarray/datatree_/datatree/tests/test_io.py +++ b/xarray/datatree_/datatree/tests/test_io.py @@ -18,6 +18,27 @@ def test_to_netcdf(self, tmpdir): roundtrip_dt = open_datatree(filepath) assert_equal(original_dt, roundtrip_dt) + @requires_netCDF4 + def test_netcdf_encoding(self, tmpdir): + filepath = str( + tmpdir / "test.nc" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + + # add compression + comp = dict(zlib=True, complevel=9) + enc = {"/set2": {var: comp for var in original_dt["/set2"].ds.data_vars}} + + original_dt.to_netcdf(filepath, encoding=enc, engine="netcdf4") + roundtrip_dt = open_datatree(filepath) + + assert roundtrip_dt["/set2/a"].encoding["zlib"] == comp["zlib"] + assert roundtrip_dt["/set2/a"].encoding["complevel"] == comp["complevel"] + + enc["/not/a/group"] = {"foo": "bar"} + with pytest.raises(ValueError, match="unexpected encoding group.*"): + original_dt.to_netcdf(filepath, encoding=enc, engine="netcdf4") + @requires_h5netcdf def test_to_h5netcdf(self, tmpdir): filepath = str( @@ -40,6 +61,27 @@ def test_to_zarr(self, tmpdir): roundtrip_dt = open_datatree(filepath, engine="zarr") assert_equal(original_dt, roundtrip_dt) + @requires_zarr + 
def test_zarr_encoding(self, tmpdir): + import zarr + + filepath = str( + tmpdir / "test.zarr" + ) # casting to str avoids a pathlib bug in xarray + original_dt = create_test_datatree() + + comp = {"compressor": zarr.Blosc(cname="zstd", clevel=3, shuffle=2)} + enc = {"/set2": {var: comp for var in original_dt["/set2"].ds.data_vars}} + original_dt.to_zarr(filepath, encoding=enc) + roundtrip_dt = open_datatree(filepath, engine="zarr") + + print(roundtrip_dt["/set2/a"].encoding) + assert roundtrip_dt["/set2/a"].encoding["compressor"] == comp["compressor"] + + enc["/not/a/group"] = {"foo": "bar"} + with pytest.raises(ValueError, match="unexpected encoding group.*"): + original_dt.to_zarr(filepath, encoding=enc, engine="zarr") + @requires_zarr def test_to_zarr_zip_store(self, tmpdir): from zarr.storage import ZipStore From 7231e13d6a811d531f5ad6c6516d3419ba3ad451 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 26 May 2022 11:41:22 -0400 Subject: [PATCH 123/260] Add reminder to update whatsnew to PR template --- xarray/datatree_/.github/pull_request_template.md | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/.github/pull_request_template.md b/xarray/datatree_/.github/pull_request_template.md index e144c6adaa3..8270498108a 100644 --- a/xarray/datatree_/.github/pull_request_template.md +++ b/xarray/datatree_/.github/pull_request_template.md @@ -4,3 +4,4 @@ - [ ] Tests added - [ ] Passes `pre-commit run --all-files` - [ ] New functions/methods are listed in `api.rst` +- [ ] Changes are summarized in `docs/source/whats-new.rst` From a1764a1cfd6a8048dd8c2b133d61e6d878342b40 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 26 May 2022 12:23:57 -0400 Subject: [PATCH 124/260] Remove anytree dependency from CI runs https://github.com/xarray-contrib/datatree/pull/101 --- xarray/datatree_/.github/workflows/main.yaml | 3 +-- xarray/datatree_/docs/source/installation.rst | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index a1843b3477f..37f0ae222b2 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -75,8 +75,7 @@ jobs: run: | python -m pip install --no-deps --upgrade \ git+https://github.com/pydata/xarray \ - git+https://github.com/Unidata/netcdf4-python \ - git+https://github.com/c0fec0de/anytree + git+https://github.com/Unidata/netcdf4-python python -m pip install --no-deps -e . python -m pip list - name: Running Tests diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst index 48799089d4b..6cab417e950 100644 --- a/xarray/datatree_/docs/source/installation.rst +++ b/xarray/datatree_/docs/source/installation.rst @@ -26,8 +26,7 @@ To install a development version from source: $ python -m pip install -e . -You will need xarray and `anytree `_ -as dependencies, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O. +You will just need xarray as a required dependency, with netcdf4, zarr, and h5netcdf as optional dependencies to allow file I/O. .. 
note:: From 08988ca4cdb746e5b0e7d17ca8797f7fd911448f Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 2 Jun 2022 14:59:43 -0400 Subject: [PATCH 125/260] Fix loop bug https://github.com/xarray-contrib/datatree/pull/105 * test to check for loops * fix bug when assigning new parent * refactor to use _is_descendant_of method * also check children setter for loops * whatsnew --- .../datatree_/datatree/tests/test_treenode.py | 20 +++++++++++++++++++ xarray/datatree_/datatree/treenode.py | 9 ++++++--- xarray/datatree_/docs/source/whats-new.rst | 3 +++ 3 files changed, 29 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 3ea859d8492..0494911f2ca 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -18,6 +18,26 @@ def test_parenting(self): assert mary.parent == john assert john.children["Mary"] is mary + def test_no_time_traveller_loops(self): + john = TreeNode() + + with pytest.raises(TreeError, match="cannot be a parent of itself"): + john._set_parent(john, "John") + + with pytest.raises(TreeError, match="cannot be a parent of itself"): + john.children = {"John": john} + + mary = TreeNode() + rose = TreeNode() + mary._set_parent(john, "Mary") + rose._set_parent(mary, "Rose") + + with pytest.raises(TreeError, match="is already a descendant"): + john._set_parent(rose, "John") + + with pytest.raises(TreeError, match="is already a descendant"): + rose.children = {"John": john} + def test_parent_swap(self): john = TreeNode() mary = TreeNode() diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index c3effd91bd7..e29bfd66344 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -111,12 +111,15 @@ def _check_loop(self, new_parent: Tree | None) -> None: f"Cannot set parent, as node {self} cannot be a parent of itself." ) - _self, *lineage = list(self.lineage) - if any(child is self for child in lineage): + if self._is_descendant_of(new_parent): raise TreeError( - f"Cannot set parent, as node {self} is already a descendant of node {new_parent}." + f"Cannot set parent, as node {new_parent.name} is already a descendant of this node." ) + def _is_descendant_of(self, node: Tree) -> bool: + _self, *lineage = list(node.lineage) + return any(n is self for n in lineage) + def _detach(self, parent: Tree | None) -> None: if parent is not None: self._pre_detach(parent) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index ef950de6f71..5ce16337d54 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -32,6 +32,9 @@ Deprecations Bug fixes ~~~~~~~~~ +- Fixed bug with checking that assigning parent or new children did not create a loop in the tree (:pull:`105`) + By `Tom Nicholas `_. 
+ Documentation ~~~~~~~~~~~~~ From 205320dbbf7fed9e1664d08292bb084cb0ce0c4a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 2 Jun 2022 16:30:07 -0400 Subject: [PATCH 126/260] add other bugfixes to whatsnew --- xarray/datatree_/docs/source/whats-new.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 5ce16337d54..37b3d3e20fb 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -34,6 +34,10 @@ Bug fixes - Fixed bug with checking that assigning parent or new children did not create a loop in the tree (:pull:`105`) By `Tom Nicholas `_. +- Do not call ``__exit__`` on Zarr store when opening (:pull:`90`) + By `Matt McCormick `_. +- Fix netCDF encoding for compression (:pull:`95`) + By `Joe Hamman `_. Documentation ~~~~~~~~~~~~~ From 325982145b2a4035c843b71b1b3627c13caaff52 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 3 Jun 2022 11:50:24 -0400 Subject: [PATCH 127/260] Name checking https://github.com/xarray-contrib/datatree/pull/106 * test for invalid node names * check for invalid node names in property setter * remove typing confusion * whatsnew --- xarray/datatree_/datatree/datatree.py | 9 +++++++-- xarray/datatree_/datatree/tests/test_datatree.py | 7 +++++++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index af666927701..708a4599611 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -87,7 +87,7 @@ class DataTree( def __init__( self, - data: Optional[Dataset | DataArray] = None, + data: Dataset | DataArray = None, parent: DataTree = None, children: Mapping[str, DataTree] = None, name: str = None, @@ -119,7 +119,7 @@ def __init__( super().__init__(children=children) self.name = name self.parent = parent - self.ds = data # type: ignore[assignment] + self.ds = data @property def name(self) -> str | None: @@ -128,6 +128,11 @@ def name(self) -> str | None: @name.setter def name(self, name: str | None) -> None: + if name is not None: + if not isinstance(name, str): + raise TypeError("node name must be a string or None") + if "/" in name: + raise ValueError("node names cannot contain forward slashes") self._name = name @property diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 45442b9c753..fb28e7b396a 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -63,6 +63,13 @@ def test_unnamed(self): dt = DataTree() assert dt.name is None + def test_bad_names(self): + with pytest.raises(TypeError): + DataTree(name=5) + + with pytest.raises(ValueError): + DataTree(name="folder/data") + class TestFamilyTree: def test_setparent_unnamed_child_node_fails(self): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 37b3d3e20fb..8f965833281 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -38,6 +38,8 @@ Bug fixes By `Matt McCormick `_. - Fix netCDF encoding for compression (:pull:`95`) By `Joe Hamman `_. +- Added validity checking for node names (:pull:`106`) + By `Tom Nicholas `_. 
Documentation ~~~~~~~~~~~~~ From 6b4de29475b20c8ac76af3da0662f626f285a73f Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 3 Jun 2022 12:02:34 -0400 Subject: [PATCH 128/260] update whatsnew for 0.0.6 release --- xarray/datatree_/docs/source/whats-new.rst | 27 +++++++++++++++------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 8f965833281..91fa85a7b88 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,9 +15,9 @@ What's New np.random.seed(123456) -.. _whats-new.v0.0.6: +.. _whats-new.v0.0.7: -v0.0.6 (unreleased) +v0.0.7 (unreleased) ------------------- New Features @@ -32,6 +32,23 @@ Deprecations Bug fixes ~~~~~~~~~ +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + + +.. _whats-new.v0.0.6: + +v0.0.6 (06/03/2022) +------------------- + +Various small bug fixes, in preparation for more significant changes in the next version. + +Bug fixes +~~~~~~~~~ + - Fixed bug with checking that assigning parent or new children did not create a loop in the tree (:pull:`105`) By `Tom Nicholas `_. - Do not call ``__exit__`` on Zarr store when opening (:pull:`90`) @@ -41,12 +58,6 @@ Bug fixes - Added validity checking for node names (:pull:`106`) By `Tom Nicholas `_. -Documentation -~~~~~~~~~~~~~ - -Internal Changes -~~~~~~~~~~~~~~~~ - .. _whats-new.v0.0.5: v0.0.5 (05/05/2022) From 7c99ed7a1060cae772b7988eff772cef9e701eb7 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 8 Jun 2022 16:16:24 -0400 Subject: [PATCH 129/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/108 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/mirrors-mypy: v0.950 → v0.960](https://github.com/pre-commit/mirrors-mypy/compare/v0.950...v0.960) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index c30f66aeeec..1e1649cf9af 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.950 + rev: v0.960 hooks: - id: mypy # Copied from setup.cfg From 3b4e181c88ebd271df88d383400dc3db1f0809ff Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 9 Jun 2022 11:31:55 -0400 Subject: [PATCH 130/260] Bump actions/setup-python from 3 to 4 https://github.com/xarray-contrib/datatree/pull/110 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 3 to 4. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index f974295bb01..48270f5f9b5 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -10,7 +10,7 @@ jobs: steps: - uses: actions/checkout@v3 - name: Set up Python - uses: actions/setup-python@v3 + uses: actions/setup-python@v4 with: python-version: "3.x" - name: Install dependencies From 9637a2e8a7e71dc94c12f1c38cfb2670c7f751b8 Mon Sep 17 00:00:00 2001 From: Benjamin Woods Date: Wed, 15 Jun 2022 20:03:44 +0100 Subject: [PATCH 131/260] Enable tree-style HTML representation https://github.com/xarray-contrib/datatree/pull/109 * Modify HTML repr to add "anytree" style * Instead of indenting children, wrap each node_repr into a three column CSS grid * Column 1 has right border, column 2 has 2 rows, one with bottom border, column 3 contains the node_repr * Set the height of column 1 depending on whether it is the end of a list of nodes in the tree or not * Change _wrapped_node_repr -> _wrap_repr * This quick rewrite allows for much easier testing; the code is a bit more single purpose this way. * Unit tests for _wrap_repr and summarize_children. * black; quick flake8 patch (hashlib.sha1) was not used. * whatsnew Co-authored-by: Thomas Nicholas --- xarray/datatree_/datatree/formatting_html.py | 93 ++++++++- .../datatree/tests/test_formatting_html.py | 197 ++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 3 + 3 files changed, 287 insertions(+), 6 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/test_formatting_html.py diff --git a/xarray/datatree_/datatree/formatting_html.py b/xarray/datatree_/datatree/formatting_html.py index 91c1d1449d5..4531f5aec18 100644 --- a/xarray/datatree_/datatree/formatting_html.py +++ b/xarray/datatree_/datatree/formatting_html.py @@ -16,14 +16,24 @@ def summarize_children(children: Mapping[str, Any]) -> str: - children_li = "".join( - f"
    {node_repr(n, c)}
" for n, c in children.items() + N_CHILDREN = len(children) - 1 + + # Get result from node_repr and wrap it + lines_callback = lambda n, c, end: _wrap_repr(node_repr(n, c), end=end) + + children_html = "".join( + lines_callback(n, c, end=False) # Long lines + if i < N_CHILDREN + else lines_callback(n, c, end=True) # Short lines + for i, (n, c) in enumerate(children.items()) ) - return ( - "
    " - f"
    {children_li}
    " - "
" + return "".join( + [ + "
", + children_html, + "
", + ] ) @@ -52,6 +62,77 @@ def node_repr(group_title: str, dt: Any) -> str: return _obj_repr(ds, header_components, sections) +def _wrap_repr(r: str, end: bool = False) -> str: + """ + Wrap HTML representation with a tee to the left of it. + + Enclosing HTML tag is a
with :code:`display: inline-grid` style. + + Turns: + [ title ] + | details | + |_____________| + + into (A): + |─ [ title ] + | | details | + | |_____________| + + or (B): + └─ [ title ] + | details | + |_____________| + + Parameters + ---------- + r: str + HTML representation to wrap. + end: bool + Specify if the line on the left should continue or end. + + Default is False. + + Returns + ------- + str + Wrapped HTML representation. + + Tee color is set to the variable :code:`--xr-border-color`. + """ + # height of line + end = bool(end) + height = "100%" if end is False else "1.2em" + return "".join( + [ + "
", + "
", + "
", + "
", + "
", + "
", + "
    ", + r, + "
" "
", + "
", + ] + ) + + def datatree_repr(dt: Any) -> str: obj_type = f"datatree.{type(dt).__name__}" return node_repr(obj_type, dt) diff --git a/xarray/datatree_/datatree/tests/test_formatting_html.py b/xarray/datatree_/datatree/tests/test_formatting_html.py new file mode 100644 index 00000000000..7c6a47ea86e --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_formatting_html.py @@ -0,0 +1,197 @@ +import pytest +import xarray as xr + +from datatree import DataTree, formatting_html + + +@pytest.fixture(scope="module", params=["some html", "some other html"]) +def repr(request): + return request.param + + +class Test_summarize_children: + """ + Unit tests for summarize_children. + """ + + func = staticmethod(formatting_html.summarize_children) + + @pytest.fixture(scope="class") + def childfree_tree_factory(self): + """ + Fixture for a child-free DataTree factory. + """ + from random import randint + + def _childfree_tree_factory(): + return DataTree( + data=xr.Dataset({"z": ("y", [randint(1, 100) for _ in range(3)])}) + ) + + return _childfree_tree_factory + + @pytest.fixture(scope="class") + def childfree_tree(self, childfree_tree_factory): + """ + Fixture for a child-free DataTree. + """ + return childfree_tree_factory() + + @pytest.fixture(scope="function") + def mock_node_repr(self, monkeypatch): + """ + Apply mocking for node_repr. + """ + + def mock(group_title, dt): + """ + Mock with a simple result + """ + return group_title + " " + str(id(dt)) + + monkeypatch.setattr(formatting_html, "node_repr", mock) + + @pytest.fixture(scope="function") + def mock_wrap_repr(self, monkeypatch): + """ + Apply mocking for _wrap_repr. + """ + + def mock(r, *, end, **kwargs): + """ + Mock by appending "end" or "not end". + """ + return r + " " + ("end" if end else "not end") + "//" + + monkeypatch.setattr(formatting_html, "_wrap_repr", mock) + + def test_empty_mapping(self): + """ + Test with an empty mapping of children. + """ + children = {} + assert self.func(children) == ( + "
" "
" + ) + + def test_one_child(self, childfree_tree, mock_wrap_repr, mock_node_repr): + """ + Test with one child. + + Uses a mock of _wrap_repr and node_repr to essentially mock + the inline lambda function "lines_callback". + """ + # Create mapping of children + children = {"a": childfree_tree} + + # Expect first line to be produced from the first child, and + # wrapped as the last child + first_line = f"a {id(children['a'])} end//" + + assert self.func(children) == ( + "
" + f"{first_line}" + "
" + ) + + def test_two_children(self, childfree_tree_factory, mock_wrap_repr, mock_node_repr): + """ + Test with two level deep children. + + Uses a mock of _wrap_repr and node_repr to essentially mock + the inline lambda function "lines_callback". + """ + + # Create mapping of children + children = {"a": childfree_tree_factory(), "b": childfree_tree_factory()} + + # Expect first line to be produced from the first child, and + # wrapped as _not_ the last child + first_line = f"a {id(children['a'])} not end//" + + # Expect second line to be produced from the second child, and + # wrapped as the last child + second_line = f"b {id(children['b'])} end//" + + assert self.func(children) == ( + "
" + f"{first_line}" + f"{second_line}" + "
" + ) + + +class Test__wrap_repr: + """ + Unit tests for _wrap_repr. + """ + + func = staticmethod(formatting_html._wrap_repr) + + def test_end(self, repr): + """ + Test with end=True. + """ + r = self.func(repr, end=True) + assert r == ( + "
" + "
" + "
" + "
" + "
" + "
" + "
    " + f"{repr}" + "
" + "
" + "
" + ) + + def test_not_end(self, repr): + """ + Test with end=False. + """ + r = self.func(repr, end=False) + assert r == ( + "
" + "
" + "
" + "
" + "
" + "
" + "
    " + f"{repr}" + "
" + "
" + "
" + ) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 91fa85a7b88..d1b5c56b96e 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,9 @@ v0.0.7 (unreleased) New Features ~~~~~~~~~~~~ +- Improve the HTML repr by adding tree-style lines connecting groups and sub-groups (:pull:`109`). + By `Benjamin Woods `_. + Breaking changes ~~~~~~~~~~~~~~~~ From 90c53d997b6074ab6bb3a28a48911a365ec2b5ab Mon Sep 17 00:00:00 2001 From: Benjamin Woods Date: Wed, 15 Jun 2022 20:19:46 +0100 Subject: [PATCH 132/260] Make create_test_datatree a pytest.fixture https://github.com/xarray-contrib/datatree/pull/107 * Migrate create_test_datatree to pytest.fixture * Move create_test_datatree fixture to conftest.py * black * whatsnew Co-authored-by: Thomas Nicholas --- xarray/datatree_/datatree/tests/conftest.py | 65 +++++++++++++++++++ .../datatree/tests/test_dataset_api.py | 4 +- .../datatree_/datatree/tests/test_datatree.py | 58 ++--------------- .../datatree/tests/test_formatting.py | 6 +- xarray/datatree_/datatree/tests/test_io.py | 29 ++++----- .../datatree_/datatree/tests/test_mapping.py | 44 ++++++------- xarray/datatree_/docs/source/whats-new.rst | 3 + 7 files changed, 112 insertions(+), 97 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/conftest.py diff --git a/xarray/datatree_/datatree/tests/conftest.py b/xarray/datatree_/datatree/tests/conftest.py new file mode 100644 index 00000000000..3ed1325ccd5 --- /dev/null +++ b/xarray/datatree_/datatree/tests/conftest.py @@ -0,0 +1,65 @@ +import pytest +import xarray as xr + +from datatree import DataTree + + +@pytest.fixture(scope="module") +def create_test_datatree(): + """ + Create a test datatree with this structure: + + + |-- set1 + | |-- + | | Dimensions: () + | | Data variables: + | | a int64 0 + | | b int64 1 + | |-- set1 + | |-- set2 + |-- set2 + | |-- + | | Dimensions: (x: 2) + | | Data variables: + | | a (x) int64 2, 3 + | | b (x) int64 0.1, 0.2 + | |-- set1 + |-- set3 + |-- + | Dimensions: (x: 2, y: 3) + | Data variables: + | a (y) int64 6, 7, 8 + | set0 (x) int64 9, 10 + + The structure has deliberately repeated names of tags, variables, and + dimensions in order to better check for bugs caused by name conflicts. + """ + + def _create_test_datatree(modify=lambda ds: ds): + set1_data = modify(xr.Dataset({"a": 0, "b": 1})) + set2_data = modify(xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])})) + root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) + + # Avoid using __init__ so we can independently test it + root = DataTree(data=root_data) + set1 = DataTree(name="set1", parent=root, data=set1_data) + DataTree(name="set1", parent=set1) + DataTree(name="set2", parent=set1) + set2 = DataTree(name="set2", parent=root, data=set2_data) + DataTree(name="set1", parent=set2) + DataTree(name="set3", parent=root) + + return root + + return _create_test_datatree + + +@pytest.fixture(scope="module") +def simple_datatree(create_test_datatree): + """ + Invoke create_test_datatree fixture (callback). + + Returns a DataTree. 
+ """ + return create_test_datatree() diff --git a/xarray/datatree_/datatree/tests/test_dataset_api.py b/xarray/datatree_/datatree/tests/test_dataset_api.py index f8bae063383..6879b869299 100644 --- a/xarray/datatree_/datatree/tests/test_dataset_api.py +++ b/xarray/datatree_/datatree/tests/test_dataset_api.py @@ -4,8 +4,6 @@ from datatree import DataTree from datatree.testing import assert_equal -from .test_datatree import create_test_datatree - class TestDSMethodInheritance: def test_dataset_method(self): @@ -93,7 +91,7 @@ def test_binary_op_on_datatree(self): class TestUFuncs: - def test_tree(self): + def test_tree(self, create_test_datatree): dt = create_test_datatree() expected = create_test_datatree(modify=lambda ds: np.sin(ds)) result_tree = np.sin(dt) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index fb28e7b396a..2ef2b8c8f64 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -5,52 +5,6 @@ from datatree import DataTree -def create_test_datatree(modify=lambda ds: ds): - """ - Create a test datatree with this structure: - - - |-- set1 - | |-- - | | Dimensions: () - | | Data variables: - | | a int64 0 - | | b int64 1 - | |-- set1 - | |-- set2 - |-- set2 - | |-- - | | Dimensions: (x: 2) - | | Data variables: - | | a (x) int64 2, 3 - | | b (x) int64 0.1, 0.2 - | |-- set1 - |-- set3 - |-- - | Dimensions: (x: 2, y: 3) - | Data variables: - | a (y) int64 6, 7, 8 - | set0 (x) int64 9, 10 - - The structure has deliberately repeated names of tags, variables, and - dimensions in order to better check for bugs caused by name conflicts. - """ - set1_data = modify(xr.Dataset({"a": 0, "b": 1})) - set2_data = modify(xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])})) - root_data = modify(xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})) - - # Avoid using __init__ so we can independently test it - root = DataTree(data=root_data) - set1 = DataTree(name="set1", parent=root, data=set1_data) - DataTree(name="set1", parent=set1) - DataTree(name="set2", parent=set1) - set2 = DataTree(name="set2", parent=root, data=set2_data) - DataTree(name="set1", parent=set2) - DataTree(name="set3", parent=root) - - return root - - class TestTreeCreation: def test_empty(self): dt = DataTree(name="root") @@ -322,8 +276,8 @@ def test_nones(self): assert [node.path for node in dt.subtree] == ["/", "/d", "/d/e"] xrt.assert_equal(dt["d/e"].ds, xr.Dataset()) - def test_full(self): - dt = create_test_datatree() + def test_full(self, simple_datatree): + dt = simple_datatree paths = list(node.path for node in dt.subtree) assert paths == [ "/", @@ -335,16 +289,16 @@ def test_full(self): "/set3", ] - def test_roundtrip(self): - dt = create_test_datatree() + def test_roundtrip(self, simple_datatree): + dt = simple_datatree roundtrip = DataTree.from_dict(dt.to_dict()) assert roundtrip.equals(dt) @pytest.mark.xfail - def test_roundtrip_unnamed_root(self): + def test_roundtrip_unnamed_root(self, simple_datatree): # See GH81 - dt = create_test_datatree() + dt = simple_datatree dt.name = "root" roundtrip = DataTree.from_dict(dt.to_dict()) assert roundtrip.equals(dt) diff --git a/xarray/datatree_/datatree/tests/test_formatting.py b/xarray/datatree_/datatree/tests/test_formatting.py index b3a9fed04ba..d0e3e9fd36d 100644 --- a/xarray/datatree_/datatree/tests/test_formatting.py +++ b/xarray/datatree_/datatree/tests/test_formatting.py @@ -5,8 +5,6 @@ from datatree import DataTree from 
datatree.formatting import diff_tree_repr -from .test_datatree import create_test_datatree - class TestRepr: def test_print_empty_node(self): @@ -50,8 +48,8 @@ def test_nested_node(self): printout = root.__str__() assert printout.splitlines()[2].startswith(" ") - def test_print_datatree(self): - dt = create_test_datatree() + def test_print_datatree(self, simple_datatree): + dt = simple_datatree print(dt) # TODO work out how to test something complex like this diff --git a/xarray/datatree_/datatree/tests/test_io.py b/xarray/datatree_/datatree/tests/test_io.py index dd354cce847..59199371de4 100644 --- a/xarray/datatree_/datatree/tests/test_io.py +++ b/xarray/datatree_/datatree/tests/test_io.py @@ -3,27 +3,26 @@ from datatree.io import open_datatree from datatree.testing import assert_equal from datatree.tests import requires_h5netcdf, requires_netCDF4, requires_zarr -from datatree.tests.test_datatree import create_test_datatree class TestIO: @requires_netCDF4 - def test_to_netcdf(self, tmpdir): + def test_to_netcdf(self, tmpdir, simple_datatree): filepath = str( tmpdir / "test.nc" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree original_dt.to_netcdf(filepath, engine="netcdf4") roundtrip_dt = open_datatree(filepath) assert_equal(original_dt, roundtrip_dt) @requires_netCDF4 - def test_netcdf_encoding(self, tmpdir): + def test_netcdf_encoding(self, tmpdir, simple_datatree): filepath = str( tmpdir / "test.nc" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree # add compression comp = dict(zlib=True, complevel=9) @@ -40,35 +39,35 @@ def test_netcdf_encoding(self, tmpdir): original_dt.to_netcdf(filepath, encoding=enc, engine="netcdf4") @requires_h5netcdf - def test_to_h5netcdf(self, tmpdir): + def test_to_h5netcdf(self, tmpdir, simple_datatree): filepath = str( tmpdir / "test.nc" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree original_dt.to_netcdf(filepath, engine="h5netcdf") roundtrip_dt = open_datatree(filepath) assert_equal(original_dt, roundtrip_dt) @requires_zarr - def test_to_zarr(self, tmpdir): + def test_to_zarr(self, tmpdir, simple_datatree): filepath = str( tmpdir / "test.zarr" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree original_dt.to_zarr(filepath) roundtrip_dt = open_datatree(filepath, engine="zarr") assert_equal(original_dt, roundtrip_dt) @requires_zarr - def test_zarr_encoding(self, tmpdir): + def test_zarr_encoding(self, tmpdir, simple_datatree): import zarr filepath = str( tmpdir / "test.zarr" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree comp = {"compressor": zarr.Blosc(cname="zstd", clevel=3, shuffle=2)} enc = {"/set2": {var: comp for var in original_dt["/set2"].ds.data_vars}} @@ -83,13 +82,13 @@ def test_zarr_encoding(self, tmpdir): original_dt.to_zarr(filepath, encoding=enc, engine="zarr") @requires_zarr - def test_to_zarr_zip_store(self, tmpdir): + def test_to_zarr_zip_store(self, tmpdir, simple_datatree): from zarr.storage import ZipStore filepath = str( tmpdir / "test.zarr.zip" ) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree store = ZipStore(filepath) original_dt.to_zarr(store) @@ -97,12 +96,12 @@ def test_to_zarr_zip_store(self, tmpdir): 
assert_equal(original_dt, roundtrip_dt) @requires_zarr - def test_to_zarr_not_consolidated(self, tmpdir): + def test_to_zarr_not_consolidated(self, tmpdir, simple_datatree): filepath = tmpdir / "test.zarr" zmetadata = filepath / ".zmetadata" s1zmetadata = filepath / "set1" / ".zmetadata" filepath = str(filepath) # casting to str avoids a pathlib bug in xarray - original_dt = create_test_datatree() + original_dt = simple_datatree original_dt.to_zarr(filepath, consolidated=False) assert not zmetadata.exists() assert not s1zmetadata.exists() diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 0bdd3be6f44..b1bb59f890f 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -5,8 +5,6 @@ from datatree.mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from datatree.testing import assert_equal -from .test_datatree import create_test_datatree - empty = xr.Dataset() @@ -60,14 +58,14 @@ def test_isomorphic_names_not_equal(self): dt2 = DataTree.from_dict({"A": empty, "B": empty, "B/C": empty, "B/D": empty}) check_isomorphic(dt1, dt2) - def test_not_isomorphic_complex_tree(self): + def test_not_isomorphic_complex_tree(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() dt2["set1/set2/extra"] = DataTree(name="extra") with pytest.raises(TreeIsomorphismError, match="/set1/set2"): check_isomorphic(dt1, dt2) - def test_checking_from_root(self): + def test_checking_from_root(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() real_root = DataTree() @@ -85,7 +83,7 @@ def times_ten(ds): with pytest.raises(TypeError, match="Must pass at least one tree"): times_ten("dt") - def test_not_isomorphic(self): + def test_not_isomorphic(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() dt2["set1/set2/extra"] = DataTree(name="extra") @@ -97,7 +95,7 @@ def times_ten(ds1, ds2): with pytest.raises(TreeIsomorphismError): times_ten(dt1, dt2) - def test_no_trees_returned(self): + def test_no_trees_returned(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() @@ -108,7 +106,7 @@ def bad_func(ds1, ds2): with pytest.raises(TypeError, match="return value of None"): bad_func(dt1, dt2) - def test_single_dt_arg(self): + def test_single_dt_arg(self, create_test_datatree): dt = create_test_datatree() @map_over_subtree @@ -119,7 +117,7 @@ def times_ten(ds): result_tree = times_ten(dt) assert_equal(result_tree, expected) - def test_single_dt_arg_plus_args_and_kwargs(self): + def test_single_dt_arg_plus_args_and_kwargs(self, create_test_datatree): dt = create_test_datatree() @map_over_subtree @@ -130,7 +128,7 @@ def multiply_then_add(ds, times, add=0.0): result_tree = multiply_then_add(dt, 10.0, add=2.0) assert_equal(result_tree, expected) - def test_multiple_dt_args(self): + def test_multiple_dt_args(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() @@ -142,7 +140,7 @@ def add(ds1, ds2): result = add(dt1, dt2) assert_equal(result, expected) - def test_dt_as_kwarg(self): + def test_dt_as_kwarg(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() @@ -154,7 +152,7 @@ def add(ds1, value=0.0): result = add(dt1, value=dt2) assert_equal(result, expected) - def test_return_multiple_dts(self): + def test_return_multiple_dts(self, create_test_datatree): dt = create_test_datatree() 
@map_over_subtree @@ -167,8 +165,8 @@ def minmax(ds): expected_max = create_test_datatree(modify=lambda ds: ds.max()) assert_equal(dt_max, expected_max) - def test_return_wrong_type(self): - dt1 = create_test_datatree() + def test_return_wrong_type(self, simple_datatree): + dt1 = simple_datatree @map_over_subtree def bad_func(ds1): @@ -177,8 +175,8 @@ def bad_func(ds1): with pytest.raises(TypeError, match="not Dataset or DataArray"): bad_func(dt1) - def test_return_tuple_of_wrong_types(self): - dt1 = create_test_datatree() + def test_return_tuple_of_wrong_types(self, simple_datatree): + dt1 = simple_datatree @map_over_subtree def bad_func(ds1): @@ -188,20 +186,20 @@ def bad_func(ds1): bad_func(dt1) @pytest.mark.xfail - def test_return_inconsistent_number_of_results(self): - dt1 = create_test_datatree() + def test_return_inconsistent_number_of_results(self, simple_datatree): + dt1 = simple_datatree @map_over_subtree def bad_func(ds): - # Datasets in create_test_datatree() have different numbers of dims + # Datasets in simple_datatree have different numbers of dims # TODO need to instead return different numbers of Dataset objects for this test to catch the intended error return tuple(ds.dims) with pytest.raises(TypeError, match="instead returns"): bad_func(dt1) - def test_wrong_number_of_arguments_for_func(self): - dt = create_test_datatree() + def test_wrong_number_of_arguments_for_func(self, simple_datatree): + dt = simple_datatree @map_over_subtree def times_ten(ds): @@ -212,7 +210,7 @@ def times_ten(ds): ): times_ten(dt, dt) - def test_map_single_dataset_against_whole_tree(self): + def test_map_single_dataset_against_whole_tree(self, create_test_datatree): dt = create_test_datatree() @map_over_subtree @@ -229,7 +227,7 @@ def test_trees_with_different_node_names(self): # TODO test this after I've got good tests for renaming nodes raise NotImplementedError - def test_dt_method(self): + def test_dt_method(self, create_test_datatree): dt = create_test_datatree() def multiply_then_add(ds, times, add=0.0): @@ -239,7 +237,7 @@ def multiply_then_add(ds, times, add=0.0): result_tree = dt.map_over_subtree(multiply_then_add, 10.0, add=2.0) assert_equal(result_tree, expected) - def test_discard_ancestry(self): + def test_discard_ancestry(self, create_test_datatree): # Check for datatree GH issue https://github.com/xarray-contrib/datatree/issues/48 dt = create_test_datatree() subtree = dt["set1"] diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index d1b5c56b96e..d46d5b87054 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -41,6 +41,9 @@ Documentation Internal Changes ~~~~~~~~~~~~~~~~ +- Made ``testing.test_datatree.create_test_datatree`` into a pytest fixture (:pull:`107`). + By `Benjamin Woods `_. + .. 
_whats-new.v0.0.6: From a1851c8a5d0b3fa85bb243fccab2f57c17ec99c8 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 16 Jun 2022 10:45:44 -0400 Subject: [PATCH 133/260] Integrate variables into DataTree https://github.com/xarray-contrib/datatree/pull/41 * sketching out changes needed to integrate variables into DataTree * fixed some other basic conflicts * fix mypy errors * can create basic datatree node objects again * child-variable name collisions dectected correctly * in-progres * add _replace method * updated tests to assert identical instead of check .ds is expected_ds * refactor .ds setter to use _replace * refactor init to use _replace * refactor test tree to avoid init * attempt at copy methods * rewrote implementation of .copy method * xfailing test for deepcopying * pseudocode implementation of DatasetView * Revert "pseudocode implementation of DatasetView" This reverts commit 52ef23baaa4b6892cad2d69c61b43db831044630. * removed duplicated implementation of copy * reorganise API docs * expose data_vars, coords etc. properties * try except with calculate_dimensions private import * add keys/values/items methods * don't use has_data when .variables would do * explanation of basic properties * add data structures page to index * revert adding documentation in favour of that going in a different PR * correct deepcopy tests * use .data_vars in copy tests * make imports depend on most recent version of xarray Co-authored-by: Mattia Almansi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove try except for internal import * depend on latest pre-release of xarray * correct name of version * xarray pre-release under pip in ci envs * correct methods * whatsnews * improve docstrings Co-authored-by: Mattia Almansi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/ci/doc.yml | 2 +- xarray/datatree_/ci/environment.yml | 3 +- xarray/datatree_/datatree/__init__.py | 2 +- xarray/datatree_/datatree/datatree.py | 445 +++++++++++++++--- xarray/datatree_/datatree/ops.py | 4 +- .../datatree_/datatree/tests/test_datatree.py | 139 +++++- xarray/datatree_/datatree/treenode.py | 4 +- xarray/datatree_/docs/source/api.rst | 270 +++++++---- xarray/datatree_/docs/source/whats-new.rst | 7 + xarray/datatree_/requirements.txt | 2 +- 10 files changed, 694 insertions(+), 184 deletions(-) diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 0a20f516948..ff303a98115 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -4,7 +4,6 @@ channels: dependencies: - pip - python>=3.8 - - xarray>=0.20.2 - netcdf4 - scipy - sphinx @@ -16,3 +15,4 @@ dependencies: - zarr - pip: - git+https://github.com/xarray-contrib/datatree + - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index c5d58977e08..1aa9af93363 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -4,7 +4,6 @@ channels: - nodefaults dependencies: - python>=3.8 - - xarray>=0.20.2 - netcdf4 - pytest - flake8 @@ -13,3 +12,5 @@ dependencies: - pytest-cov - h5netcdf - zarr + - pip: + - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index d799dc027ee..58b65aec598 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -6,7 +6,7 @@ # import public 
API from .datatree import DataTree from .io import open_datatree -from .mapping import map_over_subtree +from .mapping import TreeIsomorphismError, map_over_subtree try: __version__ = get_distribution(__name__).version diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 708a4599611..e5a4bd4a21f 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1,5 +1,7 @@ from __future__ import annotations +import copy +import itertools from collections import OrderedDict from html import escape from typing import ( @@ -8,18 +10,27 @@ Callable, Dict, Generic, + Hashable, Iterable, + Iterator, Mapping, MutableMapping, Optional, + Set, Tuple, Union, ) -from xarray import DataArray, Dataset +import pandas as pd from xarray.core import utils +from xarray.core.coordinates import DatasetCoordinates +from xarray.core.dataarray import DataArray +from xarray.core.dataset import Dataset, DataVariables +from xarray.core.indexes import Index, Indexes +from xarray.core.merge import dataset_update_method from xarray.core.options import OPTIONS as XR_OPTS -from xarray.core.variable import Variable +from xarray.core.utils import Default, Frozen, _default +from xarray.core.variable import Variable, calculate_dimensions from . import formatting, formatting_html from .mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree @@ -41,7 +52,7 @@ # the entire API of `xarray.Dataset`, but with certain methods decorated to instead map the dataset function over every # node in the tree. As this API is copied without directly subclassing `xarray.Dataset` we instead create various Mixin # classes (in ops.py) which each define part of `xarray.Dataset`'s extensive API. - +# # Some of these methods must be wrapped to map over all nodes in the subtree. Others are fine to inherit unaltered # (normally because they (a) only call dataset properties and (b) don't return a dataset that should be nested into a new # tree) and some will get overridden by the class definition of DataTree. @@ -51,12 +62,37 @@ T_Path = Union[str, NodePath] +def _coerce_to_dataset(data: Dataset | DataArray | None) -> Dataset: + if isinstance(data, DataArray): + ds = data.to_dataset() + elif isinstance(data, Dataset): + ds = data + elif data is None: + ds = Dataset() + else: + raise TypeError( + f"data object is not an xarray Dataset, DataArray, or None, it is of type {type(data)}" + ) + return ds + + +def _check_for_name_collisions( + children: Iterable[str], variables: Iterable[Hashable] +) -> None: + colliding_names = set(children).intersection(set(variables)) + if colliding_names: + raise KeyError( + f"Some names would collide between variables and children: {list(colliding_names)}" + ) + + class DataTree( TreeNode, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmeticMixin, Generic[Tree], + Mapping, ): """ A tree-like hierarchical collection of xarray objects. 
@@ -80,10 +116,23 @@ class DataTree( # TODO .loc, __contains__, __iter__, __array__, __len__ + # TODO a lot of properties like .variables could be defined in a DataMapping class which both Dataset and DataTree inherit from + + # TODO __slots__ + + # TODO all groupby classes + _name: Optional[str] - _parent: Optional[Tree] - _children: OrderedDict[str, Tree] - _ds: Dataset + _parent: Optional[DataTree] + _children: OrderedDict[str, DataTree] + _attrs: Optional[Dict[Hashable, Any]] + _cache: Dict[str, Any] + _coord_names: Set[Hashable] + _dims: Dict[Hashable, int] + _encoding: Optional[Dict[Hashable, Any]] + _close: Optional[Callable[[], None]] + _indexes: Dict[Hashable, Index] + _variables: Dict[Hashable, Variable] def __init__( self, @@ -93,33 +142,54 @@ def __init__( name: str = None, ): """ - Create a single node of a DataTree, which optionally contains data in the form of an xarray.Dataset. + Create a single node of a DataTree. + + The node may optionally contain data in the form of data and coordinate variables, stored in the same way as + data is stored in an xarray.Dataset. Parameters ---------- - data : Dataset, DataArray, Variable or None, optional - Data to store under the .ds attribute of this node. DataArrays and Variables will be promoted to Datasets. + data : Dataset, DataArray, or None, optional + Data to store under the .ds attribute of this node. DataArrays will be promoted to Datasets. Default is None. parent : DataTree, optional Parent node to this node. Default is None. children : Mapping[str, DataTree], optional Any child nodes of this node. Default is None. name : str, optional - Name for the root node of the tree. + Name for this node of the tree. Default is None. Returns ------- - node : DataTree + DataTree See Also -------- DataTree.from_dict """ + # validate input + if children is None: + children = {} + ds = _coerce_to_dataset(data) + _check_for_name_collisions(children, ds.variables) + + # set tree attributes super().__init__(children=children) self.name = name self.parent = parent - self.ds = data + + # set data attributes + self._replace( + inplace=True, + variables=ds._variables, + coord_names=ds._coord_names, + dims=ds._dims, + indexes=ds._indexes, + attrs=ds._attrs, + encoding=ds._encoding, + ) + self._close = ds._close @property def name(self) -> str | None: @@ -149,53 +219,136 @@ def parent(self: DataTree, new_parent: DataTree) -> None: @property def ds(self) -> Dataset: """The data in this node, returned as a Dataset.""" - return self._ds + # TODO change this to return only an immutable view onto this node's data (see GH https://github.com/xarray-contrib/datatree/issues/80) + return self.to_dataset() @ds.setter def ds(self, data: Union[Dataset, DataArray] = None) -> None: - if not isinstance(data, (Dataset, DataArray)) and data is not None: - raise TypeError( - f"{type(data)} object is not an xarray Dataset, DataArray, or None" - ) - if isinstance(data, DataArray): - data = data.to_dataset() - elif data is None: - data = Dataset() + ds = _coerce_to_dataset(data) - for var in list(data.variables): - if var in self.children: - raise KeyError( - f"Cannot add variable named {var}: node already has a child named {var}" - ) + _check_for_name_collisions(self.children, ds.variables) + + self._replace( + inplace=True, + variables=ds._variables, + coord_names=ds._coord_names, + dims=ds._dims, + indexes=ds._indexes, + attrs=ds._attrs, + encoding=ds._encoding, + ) + self._close = ds._close + + def _pre_attach(self: DataTree, parent: DataTree) -> None: + """ + Method 
which superclass calls before setting parent, here used to prevent having two + children with duplicate names (or a data variable with the same name as a child). + """ + super()._pre_attach(parent) + if self.name in list(parent.ds.variables): + raise KeyError( + f"parent {parent.name} already contains a data variable named {self.name}" + ) - self._ds = data + def to_dataset(self) -> Dataset: + """Return the data in this node as a new xarray.Dataset object.""" + return Dataset._construct_direct( + self._variables, + self._coord_names, + self._dims, + self._attrs, + self._indexes, + self._encoding, + self._close, + ) @property - def has_data(self) -> bool: + def has_data(self): """Whether or not there are any data variables in this node.""" - return len(self.ds.variables) > 0 + return len(self._variables) > 0 @property def has_attrs(self) -> bool: """Whether or not there are any metadata attributes in this node.""" - return len(self.ds.attrs.keys()) > 0 + return len(self.attrs.keys()) > 0 @property def is_empty(self) -> bool: """False if node contains any data or attrs. Does not look at children.""" return not (self.has_data or self.has_attrs) - def _pre_attach(self: DataTree, parent: DataTree) -> None: + @property + def variables(self) -> Mapping[Hashable, Variable]: + """Low level interface to node contents as dict of Variable objects. + + This ordered dictionary is frozen to prevent mutation that could + violate Dataset invariants. It contains all variable objects + constituting this DataTree node, including both data variables and + coordinates. """ - Method which superclass calls before setting parent, here used to prevent having two - children with duplicate names (or a data variable with the same name as a child). + return Frozen(self._variables) + + @property + def attrs(self) -> Dict[Hashable, Any]: + """Dictionary of global attributes on this node""" + if self._attrs is None: + self._attrs = {} + return self._attrs + + @attrs.setter + def attrs(self, value: Mapping[Any, Any]) -> None: + self._attrs = dict(value) + + @property + def encoding(self) -> Dict: + """Dictionary of global encoding attributes on this node""" + if self._encoding is None: + self._encoding = {} + return self._encoding + + @encoding.setter + def encoding(self, value: Mapping) -> None: + self._encoding = dict(value) + + @property + def dims(self) -> Mapping[Hashable, int]: + """Mapping from dimension names to lengths. + + Cannot be modified directly, but is updated when adding new variables. + + Note that type of this object differs from `DataArray.dims`. + See `DataTree.sizes`, `Dataset.sizes`, and `DataArray.sizes` for consistently named + properties. """ - super()._pre_attach(parent) - if parent.has_data and self.name in list(parent.ds.variables): - raise KeyError( - f"parent {parent.name} already contains a data variable named {self.name}" - ) + return Frozen(self._dims) + + @property + def sizes(self) -> Mapping[Hashable, int]: + """Mapping from dimension names to lengths. + + Cannot be modified directly, but is updated when adding new variables. + + This is an alias for `DataTree.dims` provided for the benefit of + consistency with `DataArray.sizes`. + + See Also + -------- + DataArray.sizes + """ + return self.dims + + def __contains__(self, key: object) -> bool: + """The 'in' operator will return true or false depending on whether + 'key' is either an array stored in the datatree or a child node, or neither. 
+ """ + return key in self.variables or key in self.children + + def __bool__(self) -> bool: + return bool(self.ds.data_vars) or bool(self.children) + + def __iter__(self) -> Iterator[Hashable]: + return itertools.chain(self.ds.data_vars, self.children) def __repr__(self) -> str: return formatting.datatree_repr(self) @@ -209,20 +362,135 @@ def _repr_html_(self): return f"
<pre>{escape(repr(self))}</pre>
" return formatting_html.datatree_repr(self) + @classmethod + def _construct_direct( + cls, + variables: dict[Any, Variable], + coord_names: set[Hashable], + dims: dict[Any, int] = None, + attrs: dict = None, + indexes: dict[Any, Index] = None, + encoding: dict = None, + name: str | None = None, + parent: DataTree | None = None, + children: OrderedDict[str, DataTree] = None, + close: Callable[[], None] = None, + ) -> DataTree: + """Shortcut around __init__ for internal use when we want to skip costly validation.""" + + # data attributes + if dims is None: + dims = calculate_dimensions(variables) + if indexes is None: + indexes = {} + if children is None: + children = OrderedDict() + + obj: DataTree = object.__new__(cls) + obj._variables = variables + obj._coord_names = coord_names + obj._dims = dims + obj._indexes = indexes + obj._attrs = attrs + obj._close = close + obj._encoding = encoding + + # tree attributes + obj._name = name + obj._children = children + obj._parent = parent + + return obj + + def _replace( + self: DataTree, + variables: dict[Hashable, Variable] = None, + coord_names: set[Hashable] = None, + dims: dict[Any, int] = None, + attrs: dict[Hashable, Any] | None | Default = _default, + indexes: dict[Hashable, Index] = None, + encoding: dict | None | Default = _default, + name: str | None | Default = _default, + parent: DataTree | None = _default, + children: OrderedDict[str, DataTree] = None, + inplace: bool = False, + ) -> DataTree: + """ + Fastpath constructor for internal use. + + Returns an object with optionally replaced attributes. + + Explicitly passed arguments are *not* copied when placed on the new + datatree. It is up to the caller to ensure that they have the right type + and are not used elsewhere. + """ + if inplace: + if variables is not None: + self._variables = variables + if coord_names is not None: + self._coord_names = coord_names + if dims is not None: + self._dims = dims + if attrs is not _default: + self._attrs = attrs + if indexes is not None: + self._indexes = indexes + if encoding is not _default: + self._encoding = encoding + if name is not _default: + self._name = name + if parent is not _default: + self._parent = parent + if children is not None: + self._children = children + obj = self + else: + if variables is None: + variables = self._variables.copy() + if coord_names is None: + coord_names = self._coord_names.copy() + if dims is None: + dims = self._dims.copy() + if attrs is _default: + attrs = copy.copy(self._attrs) + if indexes is None: + indexes = self._indexes.copy() + if encoding is _default: + encoding = copy.copy(self._encoding) + if name is _default: + name = self._name # no need to copy str objects or None + if parent is _default: + parent = copy.copy(self._parent) + if children is _default: + children = copy.copy(self._children) + obj = self._construct_direct( + variables, + coord_names, + dims, + attrs, + indexes, + encoding, + name, + parent, + children, + ) + return obj + def get( self: DataTree, key: str, default: Optional[DataTree | DataArray] = None ) -> Optional[DataTree | DataArray]: """ - Access child nodes stored in this node as a DataTree or variables or coordinates stored in this node as a - DataArray. + Access child nodes, variables, or coordinates stored in this node. + + Returned object will be either a DataTree or DataArray object depending on whether the key given points to a + child or variable. 
Parameters ---------- key : str - Name of variable / node item, which must lie in this immediate node (not elsewhere in the tree). + Name of variable / child within this node. Must lie in this immediate node (not elsewhere in the tree). default : DataTree | DataArray, optional - A value to return if the specified key does not exist. - Default value is None. + A value to return if the specified key does not exist. Default return value is None. """ if key in self.children: return self.children[key] @@ -233,13 +501,19 @@ def get( def __getitem__(self: DataTree, key: str) -> DataTree | DataArray: """ - Access child nodes stored in this tree as a DataTree or variables or coordinates stored in this tree as a - DataArray. + Access child nodes, variables, or coordinates stored anywhere in this tree. + + Returned object will be either a DataTree or DataArray object depending on whether the key given points to a + child or variable. Parameters ---------- key : str - Name of variable / node, or unix-like path to variable / node. + Name of variable / child within this node, or unix-like path to variable / child within another node. + + Returns + ------- + Union[DataTree, DataArray] """ # Either: @@ -272,7 +546,7 @@ def _set(self, key: str, val: DataTree | CoercibleValue) -> None: val.parent = self elif isinstance(val, (DataArray, Variable)): # TODO this should also accomodate other types that can be coerced into Variables - self.ds[key] = val + self.update({key: val}) else: raise TypeError(f"Type {type(val)} cannot be assigned to a DataTree") @@ -316,8 +590,12 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: else: raise TypeError(f"Type {type(v)} cannot be assigned to a DataTree") - super().update(new_children) - self.ds.update(new_variables) + vars_merge_result = dataset_update_method(self.to_dataset(), new_variables) + # TODO are there any subtleties with preserving order of children like this? + merged_children = OrderedDict(**self.children, **new_children) + self._replace( + inplace=True, children=merged_children, **vars_merge_result._asdict() + ) @classmethod def from_dict( @@ -326,7 +604,7 @@ def from_dict( name: str = None, ) -> DataTree: """ - Create a datatree from a dictionary of data objects, labelled by paths into the tree. + Create a datatree from a dictionary of data objects, organised by paths into the tree. Parameters ---------- @@ -365,28 +643,54 @@ def from_dict( allow_overwrite=False, new_nodes_along_path=True, ) + return obj - def to_dict(self) -> Dict[str, Any]: + def to_dict(self) -> Dict[str, Dataset]: """ Create a dictionary mapping of absolute node paths to the data contained in those nodes. Returns ------- - Dict + Dict[str, Dataset] """ - return {node.path: node.ds for node in self.subtree} + return {node.path: node.to_dataset() for node in self.subtree} @property def nbytes(self) -> int: - return sum(node.ds.nbytes if node.has_data else 0 for node in self.subtree) + return sum(node.to_dataset().nbytes for node in self.subtree) def __len__(self) -> int: - if self.children: - n_children = len(self.children) - else: - n_children = 0 - return n_children + len(self.ds) + return len(self.children) + len(self.data_vars) + + @property + def indexes(self) -> Indexes[pd.Index]: + """Mapping of pandas.Index objects used for label based indexing. + Raises an error if this DataTree node has indexes that cannot be coerced + to pandas.Index objects. 
+ + See Also + -------- + DataTree.xindexes + """ + return self.xindexes.to_pandas_indexes() + + @property + def xindexes(self) -> Indexes[Index]: + """Mapping of xarray Index objects used for label based indexing.""" + return Indexes(self._indexes, {k: self._variables[k] for k in self._indexes}) + + @property + def coords(self) -> DatasetCoordinates: + """Dictionary of xarray.DataArray objects corresponding to coordinate + variables + """ + return DatasetCoordinates(self.to_dataset()) + + @property + def data_vars(self) -> DataVariables: + """Dictionary of DataArray objects corresponding to data variables""" + return DataVariables(self.to_dataset()) def isomorphic( self, @@ -400,7 +704,7 @@ def isomorphic( Nothing about the data in each node is checked. Isomorphism is a necessary condition for two trees to be used in a nodewise binary operation, - such as tree1 + tree2. + such as ``tree1 + tree2``. By default this method does not check any part of the tree above the given node. Therefore this method can be used as default to check that two subtrees are isomorphic. @@ -408,12 +712,13 @@ def isomorphic( Parameters ---------- other : DataTree - The tree object to compare to. + The other tree object to compare to. from_root : bool, optional, default is False - Whether or not to first traverse to the root of the trees before checking for isomorphism. - If a & b have no parents then this has no effect. + Whether or not to first traverse to the root of the two trees before checking for isomorphism. + If neither tree has a parent then this has no effect. strict_names : bool, optional, default is False - Whether or not to also check that each node has the same name as its counterpart. + Whether or not to also check that every node in the tree has the same name as its counterpart in the other + tree. See Also -------- @@ -441,10 +746,10 @@ def equals(self, other: DataTree, from_root: bool = True) -> bool: Parameters ---------- other : DataTree - The tree object to compare to. + The other tree object to compare to. from_root : bool, optional, default is True - Whether or not to first traverse to the root of the trees before checking. - If a & b have no parents then this has no effect. + Whether or not to first traverse to the root of the two trees before checking for isomorphism. + If neither tree has a parent then this has no effect. See Also -------- @@ -472,10 +777,10 @@ def identical(self, other: DataTree, from_root=True) -> bool: Parameters ---------- other : DataTree - The tree object to compare to. + The other tree object to compare to. from_root : bool, optional, default is True - Whether or not to first traverse to the root of the trees before checking. - If a & b have no parents then this has no effect. + Whether or not to first traverse to the root of the two trees before checking for isomorphism. + If neither tree has a parent then this has no effect. 
See Also -------- diff --git a/xarray/datatree_/datatree/ops.py b/xarray/datatree_/datatree/ops.py index ee55ccfe4c2..bdc931c910e 100644 --- a/xarray/datatree_/datatree/ops.py +++ b/xarray/datatree_/datatree/ops.py @@ -30,8 +30,8 @@ "map_blocks", ] _DATASET_METHODS_TO_MAP = [ - "copy", "as_numpy", + "copy", "__copy__", "__deepcopy__", "set_coords", @@ -57,7 +57,6 @@ "reorder_levels", "stack", "unstack", - "update", "merge", "drop_vars", "drop_sel", @@ -245,7 +244,6 @@ class MappedDataWithCoords: """ # TODO add mapped versions of groupby, weighted, rolling, rolling_exp, coarsen, resample - # TODO re-implement AttrsAccessMixin stuff so that it includes access to child nodes _wrap_then_attach_to_cls( target_cls_dict=vars(), source_cls=Dataset, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 2ef2b8c8f64..b488ed0a57a 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -1,7 +1,12 @@ +from copy import copy, deepcopy + +import numpy as np import pytest import xarray as xr import xarray.testing as xrt +from xarray.tests import source_ndarray +import datatree.testing as dtt from datatree import DataTree @@ -31,12 +36,37 @@ def test_setparent_unnamed_child_node_fails(self): with pytest.raises(ValueError, match="unnamed"): DataTree(parent=john) + def test_create_two_children(self): + root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])}) + set1_data = xr.Dataset({"a": 0, "b": 1}) + + root = DataTree(data=root_data) + set1 = DataTree(name="set1", parent=root, data=set1_data) + DataTree(name="set1", parent=root) + DataTree(name="set2", parent=set1) + + def test_create_full_tree(self, simple_datatree): + root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])}) + set1_data = xr.Dataset({"a": 0, "b": 1}) + set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])}) + + root = DataTree(data=root_data) + set1 = DataTree(name="set1", parent=root, data=set1_data) + DataTree(name="set1", parent=set1) + DataTree(name="set2", parent=set1) + set2 = DataTree(name="set2", parent=root, data=set2_data) + DataTree(name="set1", parent=set2) + DataTree(name="set3", parent=root) + + expected = simple_datatree + assert root.identical(expected) + class TestStoreDatasets: def test_create_with_data(self): dat = xr.Dataset({"a": 0}) john = DataTree(name="john", data=dat) - assert john.ds is dat + xrt.assert_identical(john.ds, dat) with pytest.raises(TypeError): DataTree(name="mary", parent=john, data="junk") # noqa @@ -45,7 +75,7 @@ def test_set_data(self): john = DataTree(name="john") dat = xr.Dataset({"a": 0}) john.ds = dat - assert john.ds is dat + xrt.assert_identical(john.ds, dat) with pytest.raises(TypeError): john.ds = "junk" @@ -66,11 +96,11 @@ def test_parent_already_has_variable_with_childs_name(self): def test_assign_when_already_child_with_variables_name(self): dt = DataTree(data=None) DataTree(name="a", data=None, parent=dt) - with pytest.raises(KeyError, match="already has a child named a"): + with pytest.raises(KeyError, match="names would collide"): dt.ds = xr.Dataset({"a": 0}) dt.ds = xr.Dataset() - with pytest.raises(KeyError, match="already has a child named a"): + with pytest.raises(KeyError, match="names would collide"): dt.ds = dt.ds.assign(a=xr.DataArray(0)) @pytest.mark.xfail @@ -78,7 +108,7 @@ def test_update_when_already_child_with_variables_name(self): # See issue https://github.com/xarray-contrib/datatree/issues/38 dt = 
DataTree(name="root", data=None) DataTree(name="a", data=None, parent=dt) - with pytest.raises(KeyError, match="already has a child named a"): + with pytest.raises(KeyError, match="names would collide"): dt.ds["a"] = xr.DataArray(0) @@ -136,7 +166,82 @@ def test_getitem_dict_like_selection_access_to_dataset(self): class TestUpdate: - ... + def test_update_new_named_dataarray(self): + da = xr.DataArray(name="temp", data=[0, 50]) + folder1 = DataTree(name="folder1") + folder1.update({"results": da}) + expected = da.rename("results") + xrt.assert_equal(folder1["results"], expected) + + +class TestCopy: + def test_copy(self, create_test_datatree): + dt = create_test_datatree() + + for node in dt.root.subtree: + node.attrs["Test"] = [1, 2, 3] + + for copied in [dt.copy(deep=False), copy(dt)]: + dtt.assert_identical(dt, copied) + + for node, copied_node in zip(dt.root.subtree, copied.root.subtree): + + assert node.encoding == copied_node.encoding + # Note: IndexVariable objects with string dtype are always + # copied because of xarray.core.util.safe_cast_to_index. + # Limiting the test to data variables. + for k in node.data_vars: + v0 = node.variables[k] + v1 = copied_node.variables[k] + assert source_ndarray(v0.data) is source_ndarray(v1.data) + copied_node["foo"] = xr.DataArray(data=np.arange(5), dims="z") + assert "foo" not in node + + copied_node.attrs["foo"] = "bar" + assert "foo" not in node.attrs + assert node.attrs["Test"] is copied_node.attrs["Test"] + + def test_deepcopy(self, create_test_datatree): + dt = create_test_datatree() + + for node in dt.root.subtree: + node.attrs["Test"] = [1, 2, 3] + + for copied in [dt.copy(deep=True), deepcopy(dt)]: + dtt.assert_identical(dt, copied) + + for node, copied_node in zip(dt.root.subtree, copied.root.subtree): + assert node.encoding == copied_node.encoding + # Note: IndexVariable objects with string dtype are always + # copied because of xarray.core.util.safe_cast_to_index. + # Limiting the test to data variables. + for k in node.data_vars: + v0 = node.variables[k] + v1 = copied_node.variables[k] + assert source_ndarray(v0.data) is not source_ndarray(v1.data) + copied_node["foo"] = xr.DataArray(data=np.arange(5), dims="z") + assert "foo" not in node + + copied_node.attrs["foo"] = "bar" + assert "foo" not in node.attrs + assert node.attrs["Test"] is not copied_node.attrs["Test"] + + @pytest.mark.xfail(reason="data argument not yet implemented") + def test_copy_with_data(self, create_test_datatree): + orig = create_test_datatree() + # TODO use .data_vars once that property is available + data_vars = { + k: v for k, v in orig.variables.items() if k not in orig._coord_names + } + new_data = {k: np.random.randn(*v.shape) for k, v in data_vars.items()} + actual = orig.copy(data=new_data) + + expected = orig.copy() + for k, v in new_data.items(): + expected[k].data = v + dtt.assert_identical(expected, actual) + + # TODO test parents and children? 
class TestSetItem: @@ -187,27 +292,27 @@ def test_setitem_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) results = DataTree(name="results") results["."] = data - assert results.ds is data + xrt.assert_identical(results.ds, data) @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") def test_setitem_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataTree(name="folder1") folder1["results"] = data - assert folder1["results"].ds is data + xrt.assert_identical(folder1["results"].ds, data) @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") def test_setitem_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataTree(name="folder1") folder1["results/highres"] = data - assert folder1["results/highres"].ds is data + xrt.assert_identical(folder1["results/highres"].ds, data) def test_setitem_named_dataarray(self): - data = xr.DataArray(name="temp", data=[0, 50]) + da = xr.DataArray(name="temp", data=[0, 50]) folder1 = DataTree(name="folder1") - folder1["results"] = data - expected = data.rename("results") + folder1["results"] = da + expected = da.rename("results") xrt.assert_equal(folder1["results"], expected) def test_setitem_unnamed_dataarray(self): @@ -250,16 +355,16 @@ def test_data_in_root(self): assert dt.name is None assert dt.parent is None assert dt.children == {} - assert dt.ds is dat + xrt.assert_identical(dt.ds, dat) def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) xrt.assert_identical(dt.ds, xr.Dataset()) assert dt.name is None - assert dt["run1"].ds is dat1 + xrt.assert_identical(dt["run1"].ds, dat1) assert dt["run1"].children == {} - assert dt["run2"].ds is dat2 + xrt.assert_identical(dt["run2"].ds, dat2) assert dt["run2"].children == {} def test_two_layers(self): @@ -268,13 +373,13 @@ def test_two_layers(self): assert "highres" in dt.children assert "lowres" in dt.children highres_run = dt["highres/run"] - assert highres_run.ds is dat1 + xrt.assert_identical(highres_run.ds, dat1) def test_nones(self): dt = DataTree.from_dict({"d": None, "d/e": None}) assert [node.name for node in dt.subtree] == [None, "d", "e"] assert [node.path for node in dt.subtree] == ["/", "/d", "/d/e"] - xrt.assert_equal(dt["d/e"].ds, xr.Dataset()) + xrt.assert_identical(dt["d/e"].ds, xr.Dataset()) def test_full(self, simple_datatree): dt = simple_datatree diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index e29bfd66344..a2e87675b57 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -137,7 +137,9 @@ def _detach(self, parent: Tree | None) -> None: def _attach(self, parent: Tree | None, child_name: str = None) -> None: if parent is not None: if child_name is None: - raise ValueError("Cannot directly assign a parent to an unnamed node") + raise ValueError( + "To directly set parent, child needs a name, but child is unnamed" + ) self._pre_attach(parent) parentchildren = parent._children diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 5cd16466328..9ad741901c4 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -18,6 +18,8 @@ Creating a DataTree Tree Attributes --------------- +Attributes relating to the recursive tree-like structure of a ``DataTree``. + .. 
autosummary:: :toctree: generated/ @@ -34,34 +36,40 @@ Tree Attributes DataTree.ancestors DataTree.groups -Data Attributes ---------------- +Data Contents +------------- + +Interface to the data objects (optionally) stored inside a single ``DataTree`` node. +This interface echoes that of ``xarray.Dataset``. .. autosummary:: :toctree: generated/ DataTree.dims - DataTree.variables - DataTree.encoding DataTree.sizes + DataTree.data_vars + DataTree.coords DataTree.attrs + DataTree.encoding DataTree.indexes - DataTree.xindexes - DataTree.coords DataTree.chunks + DataTree.nbytes DataTree.ds + DataTree.to_dataset DataTree.has_data DataTree.has_attrs DataTree.is_empty .. - Missing - DataTree.chunksizes + Missing: + ``DataTree.chunksizes`` Dictionary interface -------------------- +``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray``s or to child ``DataTree`` nodes. + .. autosummary:: :toctree: generated/ @@ -70,16 +78,14 @@ Dictionary interface DataTree.__delitem__ DataTree.update DataTree.get - -.. - - Missing DataTree.items DataTree.keys DataTree.values -Tree Manipulation Methods -------------------------- +Tree Manipulation +----------------- + +For manipulating, traversing, navigating, or mapping over the tree structure. .. autosummary:: :toctree: generated/ @@ -89,127 +95,181 @@ Tree Manipulation Methods DataTree.relative_to DataTree.iter_lineage DataTree.find_common_ancestor + map_over_subtree + +DataTree Contents +----------------- -Tree Manipulation Utilities ---------------------------- +Manipulate the contents of all nodes in a tree simultaneously. .. autosummary:: :toctree: generated/ - map_over_subtree + DataTree.copy + DataTree.assign + DataTree.assign_coords + DataTree.merge + DataTree.rename + DataTree.rename_vars + DataTree.rename_dims + DataTree.swap_dims + DataTree.expand_dims + DataTree.drop_vars + DataTree.drop_dims + DataTree.set_coords + DataTree.reset_coords -Methods -------- -.. +DataTree Node Contents +---------------------- - TODO divide these up into "Dataset contents", "Indexing", "Computation" etc. +Manipulate the contents of a single DataTree node. + +Comparisons +=========== + +Compare one ``DataTree`` object to another. + +.. autosummary:: + :toctree: generated/ + + DataTree.isomorphic + DataTree.equals + DataTree.identical + +Indexing +======== + +Index into all nodes in the subtree simultaneously. .. autosummary:: :toctree: generated/ - DataTree.load - DataTree.compute - DataTree.persist - DataTree.unify_chunks - DataTree.chunk - DataTree.map_blocks - DataTree.copy - DataTree.as_numpy - DataTree.__copy__ - DataTree.__deepcopy__ - DataTree.set_coords - DataTree.reset_coords - DataTree.info DataTree.isel DataTree.sel + DataTree.drop_sel + DataTree.drop_isel DataTree.head DataTree.tail DataTree.thin - DataTree.broadcast_like - DataTree.reindex_like - DataTree.reindex + DataTree.squeeze DataTree.interp DataTree.interp_like - DataTree.rename - DataTree.rename_dims - DataTree.rename_vars - DataTree.swap_dims - DataTree.expand_dims + DataTree.reindex + DataTree.reindex_like DataTree.set_index DataTree.reset_index DataTree.reorder_levels - DataTree.stack - DataTree.unstack - DataTree.update - DataTree.merge - DataTree.drop_vars - DataTree.drop_sel - DataTree.drop_isel - DataTree.drop_dims - DataTree.isomorphic - DataTree.equals - DataTree.identical - DataTree.transpose + DataTree.query + +.. + + Missing: + ``DataTree.loc`` + + +Missing Value Handling +====================== + +.. 
autosummary:: + :toctree: generated/ + + DataTree.isnull + DataTree.notnull + DataTree.combine_first DataTree.dropna DataTree.fillna - DataTree.interpolate_na DataTree.ffill DataTree.bfill - DataTree.combine_first - DataTree.reduce + DataTree.interpolate_na + DataTree.where + DataTree.isin + +Computation +=========== + +Apply a computation to the data in all nodes in the subtree simultaneously. + +.. autosummary:: + :toctree: generated/ + DataTree.map - DataTree.assign + DataTree.reduce DataTree.diff - DataTree.shift - DataTree.roll - DataTree.sortby DataTree.quantile - DataTree.rank DataTree.differentiate DataTree.integrate - DataTree.cumulative_integrate - DataTree.filter_by_attrs + DataTree.map_blocks DataTree.polyfit - DataTree.pad - DataTree.idxmin - DataTree.idxmax - DataTree.argmin - DataTree.argmax - DataTree.query DataTree.curvefit - DataTree.squeeze - DataTree.clip - DataTree.assign_coords - DataTree.where - DataTree.close - DataTree.isnull - DataTree.notnull - DataTree.isin - DataTree.astype -Comparisons +Aggregation =========== +Aggregate data in all nodes in the subtree simultaneously. + .. autosummary:: :toctree: generated/ - testing.assert_isomorphic - testing.assert_equal - testing.assert_identical + DataTree.all + DataTree.any + DataTree.argmax + DataTree.argmin + DataTree.idxmax + DataTree.idxmin + DataTree.max + DataTree.min + DataTree.mean + DataTree.median + DataTree.prod + DataTree.sum + DataTree.std + DataTree.var + DataTree.cumsum + DataTree.cumprod ndarray methods ---------------- +=============== + +Methods copied from `np.ndarray` objects, here applying to the data in all nodes in the subtree. .. autosummary:: :toctree: generated/ - DataTree.nbytes - DataTree.real + DataTree.argsort + DataTree.astype + DataTree.clip + DataTree.conj + DataTree.conjugate DataTree.imag + DataTree.round + DataTree.real + DataTree.rank + +Reshaping and reorganising +========================== + +Reshape or reorganise the data in all nodes in the subtree. + +.. autosummary:: + :toctree: generated/ + + DataTree.transpose + DataTree.stack + DataTree.unstack + DataTree.shift + DataTree.roll + DataTree.pad + DataTree.sortby + DataTree.broadcast_like + +Plotting +======== I/O === +Create or + .. autosummary:: :toctree: generated/ @@ -221,14 +281,46 @@ I/O .. - Missing - open_mfdatatree + Missing: + ``open_mfdatatree`` + +Tutorial +======== + +Testing +======= + +Test that two DataTree objects are similar. + +.. autosummary:: + :toctree: generated/ + + testing.assert_isomorphic + testing.assert_equal + testing.assert_identical Exceptions ========== +Exceptions raised when manipulating trees. + .. autosummary:: :toctree: generated/ - TreeError TreeIsomorphismError + +Advanced API +============ + +Relatively advanced API for users or developers looking to understand the internals, or extend functionality. + +.. autosummary:: + :toctree: generated/ + + DataTree.variables + +.. + + Missing: + ``DataTree.set_close`` + ``register_datatree_accessor`` diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index d46d5b87054..e64ff549149 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -38,9 +38,16 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- API page updated with all the methods that are copied from ``xarray.Dataset``. (:pull:`41`) + By `Tom Nicholas `_. 
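As a rough illustration of what "all the methods that are copied from ``xarray.Dataset``" means in practice, a hypothetical usage sketch (assuming the mapped methods listed on the API page behave like their ``Dataset`` counterparts, applied to every node of the subtree; the node names here are made up):

    import xarray as xr
    from datatree import DataTree

    dt = DataTree.from_dict({
        "run1": xr.Dataset({"a": ("x", [1.0, 2.0, 3.0])}),
        "run2": xr.Dataset({"a": ("x", [4.0, 5.0, 6.0])}),
    })

    # Dataset.mean is applied node-by-node, returning a new DataTree
    means = dt.mean()
    print(means["run1"].ds)  # "a" reduced to its mean (2.0)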
+ Internal Changes ~~~~~~~~~~~~~~~~ +- Refactored ``DataTree`` class to store a set of ``xarray.Variable`` objects instead of a single ``xarray.Dataset``. + This approach means that the ``DataTree`` class now effectively copies and extends the internal structure of + ``xarray.Dataset``. (:pull:`41`) + By `Tom Nicholas `_. - Made ``testing.test_datatree.create_test_datatree`` into a pytest fixture (:pull:`107`). By `Benjamin Woods `_. diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt index cf84c87ec50..4eb031ceee3 100644 --- a/xarray/datatree_/requirements.txt +++ b/xarray/datatree_/requirements.txt @@ -1 +1 @@ -xarray>=0.20.2 +xarray>=2022.05.0.dev0 From aefc0e0701a27eb25d4379c487826f45318fc229 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Thu, 16 Jun 2022 11:27:01 -0400 Subject: [PATCH 134/260] Replace .ds with immutable DatasetView https://github.com/xarray-contrib/datatree/pull/99 * sketching out changes needed to integrate variables into DataTree * fixed some other basic conflicts * fix mypy errors * can create basic datatree node objects again * child-variable name collisions dectected correctly * in-progres * add _replace method * updated tests to assert identical instead of check .ds is expected_ds * refactor .ds setter to use _replace * refactor init to use _replace * refactor test tree to avoid init * attempt at copy methods * rewrote implementation of .copy method * xfailing test for deepcopying * pseudocode implementation of DatasetView * Revert "pseudocode implementation of DatasetView" This reverts commit 52ef23baaa4b6892cad2d69c61b43db831044630. * pseudocode implementation of DatasetView * removed duplicated implementation of copy * reorganise API docs * expose data_vars, coords etc. 
properties * try except with calculate_dimensions private import * add keys/values/items methods * don't use has_data when .variables would do * change asserts to not fail just because of differing types * full sketch of how DatasetView could work * added tests for DatasetView * remove commented pseudocode * explanation of basic properties * add data structures page to index * revert adding documentation in favour of that going in a different PR * correct deepcopy tests * use .data_vars in copy tests * add test for arithmetic with .ds * remove reference to wrapping node in DatasetView * clarify type through renaming variables * remove test for out-of-node access * make imports depend on most recent version of xarray Co-authored-by: Mattia Almansi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove try except for internal import * depend on latest pre-release of xarray * correct name of version * xarray pre-release under pip in ci envs * correct methods * whatsnews * fix fixture in test * whatsnew * improve docstrings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Mattia Almansi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 141 +++++++++++++++++- xarray/datatree_/datatree/mapping.py | 6 +- .../datatree_/datatree/tests/test_datatree.py | 83 +++++++---- xarray/datatree_/docs/source/whats-new.rst | 8 + 4 files changed, 204 insertions(+), 34 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index e5a4bd4a21f..116e7230ad5 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -19,6 +19,7 @@ Set, Tuple, Union, + overload, ) import pandas as pd @@ -86,6 +87,135 @@ def _check_for_name_collisions( ) +class DatasetView(Dataset): + """ + An immutable Dataset-like view onto the data in a single DataTree node. + + In-place operations modifying this object should raise an AttributeError. + + Operations returning a new result will return a new xarray.Dataset object. + This includes all API on Dataset, which will be inherited. + + This requires overriding all inherited private constructors. + """ + + # TODO what happens if user alters (in-place) a DataArray they extracted from this object? 
+ + def __init__( + self, + data_vars: Mapping[Any, Any] = None, + coords: Mapping[Any, Any] = None, + attrs: Mapping[Any, Any] = None, + ): + raise AttributeError("DatasetView objects are not to be initialized directly") + + @classmethod + def _from_node( + cls, + wrapping_node: DataTree, + ) -> DatasetView: + """Constructor, using dataset attributes from wrapping node""" + + obj: DatasetView = object.__new__(cls) + obj._variables = wrapping_node._variables + obj._coord_names = wrapping_node._coord_names + obj._dims = wrapping_node._dims + obj._indexes = wrapping_node._indexes + obj._attrs = wrapping_node._attrs + obj._close = wrapping_node._close + obj._encoding = wrapping_node._encoding + + return obj + + def __setitem__(self, key, val) -> None: + raise AttributeError( + "Mutation of the DatasetView is not allowed, please use __setitem__ on the wrapping DataTree node, " + "or use `DataTree.to_dataset()` if you want a mutable dataset" + ) + + def update(self, other) -> None: + raise AttributeError( + "Mutation of the DatasetView is not allowed, please use .update on the wrapping DataTree node, " + "or use `DataTree.to_dataset()` if you want a mutable dataset" + ) + + # FIXME https://github.com/python/mypy/issues/7328 + @overload + def __getitem__(self, key: Mapping) -> Dataset: # type: ignore[misc] + ... + + @overload + def __getitem__(self, key: Hashable) -> DataArray: # type: ignore[misc] + ... + + @overload + def __getitem__(self, key: Any) -> Dataset: + ... + + def __getitem__(self, key) -> DataArray: + # TODO call the `_get_item` method of DataTree to allow path-like access to contents of other nodes + # For now just call Dataset.__getitem__ + return Dataset.__getitem__(self, key) + + @classmethod + def _construct_direct( + cls, + variables: dict[Any, Variable], + coord_names: set[Hashable], + dims: dict[Any, int] = None, + attrs: dict = None, + indexes: dict[Any, Index] = None, + encoding: dict = None, + close: Callable[[], None] = None, + ) -> Dataset: + """ + Overriding this method (along with ._replace) and modifying it to return a Dataset object + should hopefully ensure that the return type of any method on this object is a Dataset. + """ + if dims is None: + dims = calculate_dimensions(variables) + if indexes is None: + indexes = {} + obj = object.__new__(Dataset) + obj._variables = variables + obj._coord_names = coord_names + obj._dims = dims + obj._indexes = indexes + obj._attrs = attrs + obj._close = close + obj._encoding = encoding + return obj + + def _replace( + self, + variables: dict[Hashable, Variable] = None, + coord_names: set[Hashable] = None, + dims: dict[Any, int] = None, + attrs: dict[Hashable, Any] | None | Default = _default, + indexes: dict[Hashable, Index] = None, + encoding: dict | None | Default = _default, + inplace: bool = False, + ) -> Dataset: + """ + Overriding this method (along with ._construct_direct) and modifying it to return a Dataset object + should hopefully ensure that the return type of any method on this object is a Dataset. 
+ """ + + if inplace: + raise AttributeError("In-place mutation of the DatasetView is not allowed") + + return Dataset._replace( + self, + variables=variables, + coord_names=coord_names, + dims=dims, + attrs=attrs, + indexes=indexes, + encoding=encoding, + inplace=inplace, + ) + + class DataTree( TreeNode, MappedDatasetMethodsMixin, @@ -217,10 +347,13 @@ def parent(self: DataTree, new_parent: DataTree) -> None: self._set_parent(new_parent, self.name) @property - def ds(self) -> Dataset: - """The data in this node, returned as a Dataset.""" - # TODO change this to return only an immutable view onto this node's data (see GH https://github.com/xarray-contrib/datatree/issues/80) - return self.to_dataset() + def ds(self) -> DatasetView: + """ + An immutable Dataset-like view onto the data in this node. + + For a mutable Dataset containing the same data as in this node, use `.to_dataset()` instead. + """ + return DatasetView._from_node(self) @ds.setter def ds(self, data: Union[Dataset, DataArray] = None) -> None: diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 94d2c7418fa..344842b7b49 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -189,10 +189,10 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: *args_as_tree_length_iterables, *list(kwargs_as_tree_length_iterables.values()), ): - node_args_as_datasets = [ + node_args_as_datasetviews = [ a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args] ] - node_kwargs_as_datasets = dict( + node_kwargs_as_datasetviews = dict( zip( [k for k in kwargs_as_tree_length_iterables.keys()], [ @@ -204,7 +204,7 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: # Now we can call func on the data in this particular set of corresponding nodes results = ( - func(*node_args_as_datasets, **node_kwargs_as_datasets) + func(*node_args_as_datasetviews, **node_kwargs_as_datasetviews) if not node_of_first_tree.is_empty else None ) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index b488ed0a57a..86a7858a7d9 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -4,7 +4,7 @@ import pytest import xarray as xr import xarray.testing as xrt -from xarray.tests import source_ndarray +from xarray.tests import create_test_data, source_ndarray import datatree.testing as dtt from datatree import DataTree @@ -16,7 +16,7 @@ def test_empty(self): assert dt.name == "root" assert dt.parent is None assert dt.children == {} - xrt.assert_identical(dt.ds, xr.Dataset()) + xrt.assert_identical(dt.to_dataset(), xr.Dataset()) def test_unnamed(self): dt = DataTree() @@ -66,7 +66,7 @@ class TestStoreDatasets: def test_create_with_data(self): dat = xr.Dataset({"a": 0}) john = DataTree(name="john", data=dat) - xrt.assert_identical(john.ds, dat) + xrt.assert_identical(john.to_dataset(), dat) with pytest.raises(TypeError): DataTree(name="mary", parent=john, data="junk") # noqa @@ -75,7 +75,7 @@ def test_set_data(self): john = DataTree(name="john") dat = xr.Dataset({"a": 0}) john.ds = dat - xrt.assert_identical(john.ds, dat) + xrt.assert_identical(john.to_dataset(), dat) with pytest.raises(TypeError): john.ds = "junk" @@ -100,16 +100,9 @@ def test_assign_when_already_child_with_variables_name(self): dt.ds = xr.Dataset({"a": 0}) dt.ds = xr.Dataset() + new_ds = dt.to_dataset().assign(a=xr.DataArray(0)) with 
pytest.raises(KeyError, match="names would collide"): - dt.ds = dt.ds.assign(a=xr.DataArray(0)) - - @pytest.mark.xfail - def test_update_when_already_child_with_variables_name(self): - # See issue https://github.com/xarray-contrib/datatree/issues/38 - dt = DataTree(name="root", data=None) - DataTree(name="a", data=None, parent=dt) - with pytest.raises(KeyError, match="names would collide"): - dt.ds["a"] = xr.DataArray(0) + dt.ds = new_ds class TestGet: @@ -275,13 +268,13 @@ def test_setitem_new_empty_node(self): john["mary"] = DataTree() mary = john["mary"] assert isinstance(mary, DataTree) - xrt.assert_identical(mary.ds, xr.Dataset()) + xrt.assert_identical(mary.to_dataset(), xr.Dataset()) def test_setitem_overwrite_data_in_node_with_none(self): john = DataTree(name="john") mary = DataTree(name="mary", parent=john, data=xr.Dataset()) john["mary"] = DataTree() - xrt.assert_identical(mary.ds, xr.Dataset()) + xrt.assert_identical(mary.to_dataset(), xr.Dataset()) john.ds = xr.Dataset() with pytest.raises(ValueError, match="has no name"): @@ -292,21 +285,21 @@ def test_setitem_dataset_on_this_node(self): data = xr.Dataset({"temp": [0, 50]}) results = DataTree(name="results") results["."] = data - xrt.assert_identical(results.ds, data) + xrt.assert_identical(results.to_dataset(), data) @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") def test_setitem_dataset_as_new_node(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataTree(name="folder1") folder1["results"] = data - xrt.assert_identical(folder1["results"].ds, data) + xrt.assert_identical(folder1["results"].to_dataset(), data) @pytest.mark.xfail(reason="assigning Datasets doesn't yet create new nodes") def test_setitem_dataset_as_new_node_requiring_intermediate_nodes(self): data = xr.Dataset({"temp": [0, 50]}) folder1 = DataTree(name="folder1") folder1["results/highres"] = data - xrt.assert_identical(folder1["results/highres"].ds, data) + xrt.assert_identical(folder1["results/highres"].to_dataset(), data) def test_setitem_named_dataarray(self): da = xr.DataArray(name="temp", data=[0, 50]) @@ -341,7 +334,7 @@ def test_setitem_dataarray_replace_existing_node(self): p = xr.DataArray(data=[2, 3]) results["pressure"] = p expected = t.assign(pressure=p) - xrt.assert_identical(results.ds, expected) + xrt.assert_identical(results.to_dataset(), expected) class TestDictionaryInterface: @@ -355,16 +348,16 @@ def test_data_in_root(self): assert dt.name is None assert dt.parent is None assert dt.children == {} - xrt.assert_identical(dt.ds, dat) + xrt.assert_identical(dt.to_dataset(), dat) def test_one_layer(self): dat1, dat2 = xr.Dataset({"a": 1}), xr.Dataset({"b": 2}) dt = DataTree.from_dict({"run1": dat1, "run2": dat2}) - xrt.assert_identical(dt.ds, xr.Dataset()) + xrt.assert_identical(dt.to_dataset(), xr.Dataset()) assert dt.name is None - xrt.assert_identical(dt["run1"].ds, dat1) + xrt.assert_identical(dt["run1"].to_dataset(), dat1) assert dt["run1"].children == {} - xrt.assert_identical(dt["run2"].ds, dat2) + xrt.assert_identical(dt["run2"].to_dataset(), dat2) assert dt["run2"].children == {} def test_two_layers(self): @@ -373,13 +366,13 @@ def test_two_layers(self): assert "highres" in dt.children assert "lowres" in dt.children highres_run = dt["highres/run"] - xrt.assert_identical(highres_run.ds, dat1) + xrt.assert_identical(highres_run.to_dataset(), dat1) def test_nones(self): dt = DataTree.from_dict({"d": None, "d/e": None}) assert [node.name for node in dt.subtree] == [None, "d", "e"] assert [node.path for node 
in dt.subtree] == ["/", "/d", "/d/e"] - xrt.assert_identical(dt["d/e"].ds, xr.Dataset()) + xrt.assert_identical(dt["d/e"].to_dataset(), xr.Dataset()) def test_full(self, simple_datatree): dt = simple_datatree @@ -409,8 +402,44 @@ def test_roundtrip_unnamed_root(self, simple_datatree): assert roundtrip.equals(dt) -class TestBrowsing: - ... +class TestDatasetView: + def test_view_contents(self): + ds = create_test_data() + dt = DataTree(data=ds) + assert ds.identical( + dt.ds + ) # this only works because Dataset.identical doesn't check types + assert isinstance(dt.ds, xr.Dataset) + + def test_immutability(self): + # See issue https://github.com/xarray-contrib/datatree/issues/38 + dt = DataTree(name="root", data=None) + DataTree(name="a", data=None, parent=dt) + + with pytest.raises( + AttributeError, match="Mutation of the DatasetView is not allowed" + ): + dt.ds["a"] = xr.DataArray(0) + + with pytest.raises( + AttributeError, match="Mutation of the DatasetView is not allowed" + ): + dt.ds.update({"a": 0}) + + # TODO are there any other ways you can normally modify state (in-place)? + # (not attribute-like assignment because that doesn't work on Dataset anyway) + + def test_methods(self): + ds = create_test_data() + dt = DataTree(data=ds) + assert ds.mean().identical(dt.ds.mean()) + assert type(dt.ds.mean()) == xr.Dataset + + def test_arithmetic(self, create_test_datatree): + dt = create_test_datatree() + expected = create_test_datatree(modify=lambda ds: 10.0 * ds)["set1"] + result = 10.0 * dt["set1"].ds + assert result.identical(expected) class TestRestructuring: diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index e64ff549149..8a31940ff27 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -29,12 +29,20 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- The ``DataTree.ds`` attribute now returns a view onto an immutable Dataset-like object, instead of an actual instance + of ``xarray.Dataset``. This make break existing ``isinstance`` checks or ``assert`` comparisons. (:pull:`99`) + By `Tom Nicholas `_. + Deprecations ~~~~~~~~~~~~ Bug fixes ~~~~~~~~~ +- Modifying the contents of a ``DataTree`` object via the ``DataTree.ds`` attribute is now forbidden, which prevents + any possibility of the contents of a ``DataTree`` object and its ``.ds`` attribute diverging. (:issue:`38`, :pull:`99`) + By `Tom Nicholas `_. + Documentation ~~~~~~~~~~~~~ From f8f4efcbd0dd1ac2d5994e576dfa2218f08dfcdd Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 16 Jun 2022 18:54:42 -0400 Subject: [PATCH 135/260] define __slots__ --- xarray/datatree_/datatree/datatree.py | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 116e7230ad5..0673e9db2dc 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -101,6 +101,18 @@ class DatasetView(Dataset): # TODO what happens if user alters (in-place) a DataArray they extracted from this object? 
+ __slots__ = ( + "_attrs", + "_cache", + "_coord_names", + "_dims", + "_encoding", + "_close", + "_indexes", + "_variables", + "__weakref__", + ) + def __init__( self, data_vars: Mapping[Any, Any] = None, @@ -264,6 +276,17 @@ class DataTree( _indexes: Dict[Hashable, Index] _variables: Dict[Hashable, Variable] + __slots__ = ( + "_attrs", + "_cache", + "_coord_names", + "_dims", + "_encoding", + "_close", + "_indexes", + "_variables", + ) + def __init__( self, data: Dataset | DataArray = None, From bc47b2d15dd7dbc70ba4f3098fca3479a137cf6a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 16 Jun 2022 19:31:11 -0400 Subject: [PATCH 136/260] remove slot for weakref --- xarray/datatree_/datatree/datatree.py | 1 - 1 file changed, 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 0673e9db2dc..ebeab5a1705 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -110,7 +110,6 @@ class DatasetView(Dataset): "_close", "_indexes", "_variables", - "__weakref__", ) def __init__( From 380db2691c4d439ea61db243f5f9eeb34d53b729 Mon Sep 17 00:00:00 2001 From: Mattia Almansi Date: Fri, 17 Jun 2022 18:56:56 +0200 Subject: [PATCH 137/260] Fix version https://github.com/xarray-contrib/datatree/pull/113 * fix version * Update conf.py --- xarray/datatree_/datatree/__init__.py | 23 ++++++----- .../datatree_/datatree/tests/test_version.py | 5 +++ xarray/datatree_/dev-requirements.txt | 6 --- xarray/datatree_/docs/source/conf.py | 4 +- xarray/datatree_/pyproject.toml | 14 +++++++ xarray/datatree_/requirements.txt | 1 - xarray/datatree_/setup.cfg | 33 ++++++++++++++++ xarray/datatree_/setup.py | 39 ------------------- 8 files changed, 68 insertions(+), 57 deletions(-) create mode 100644 xarray/datatree_/datatree/tests/test_version.py delete mode 100644 xarray/datatree_/dev-requirements.txt create mode 100644 xarray/datatree_/pyproject.toml delete mode 100644 xarray/datatree_/requirements.txt delete mode 100644 xarray/datatree_/setup.py diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 58b65aec598..8de251a423f 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,15 +1,20 @@ -# flake8: noqa -# Ignoring F401: imported but unused - -from pkg_resources import DistributionNotFound, get_distribution - # import public API from .datatree import DataTree from .io import open_datatree from .mapping import TreeIsomorphismError, map_over_subtree try: - __version__ = get_distribution(__name__).version -except DistributionNotFound: # noqa: F401; pragma: no cover - # package is not installed - pass + # NOTE: the `_version.py` file must not be present in the git repository + # as it is generated by setuptools at install time + from ._version import __version__ +except ImportError: # pragma: no cover + # Local copy or not installed with setuptools + __version__ = "999" + +__all__ = ( + "DataTree", + "open_datatree", + "TreeIsomorphismError", + "map_over_subtree", + "__version__", +) diff --git a/xarray/datatree_/datatree/tests/test_version.py b/xarray/datatree_/datatree/tests/test_version.py new file mode 100644 index 00000000000..207d5d86d53 --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_version.py @@ -0,0 +1,5 @@ +import datatree + + +def test_version(): + assert datatree.__version__ != "999" diff --git a/xarray/datatree_/dev-requirements.txt b/xarray/datatree_/dev-requirements.txt deleted file mode 100644 index 
57209c776a5..00000000000 --- a/xarray/datatree_/dev-requirements.txt +++ /dev/null @@ -1,6 +0,0 @@ -pytest -flake8 -black -codecov -pytest-cov --r requirements.txt diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index 5a9c0403843..c6da277d346 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -78,9 +78,9 @@ # built documents. # # The short X.Y version. -version = "0.0.1" # datatree.__version__ +version = datatree.__version__ # The full version, including alpha/beta/rc tags. -release = "0.0.1" # datatree.__version__ +release = datatree.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/xarray/datatree_/pyproject.toml b/xarray/datatree_/pyproject.toml new file mode 100644 index 00000000000..ec2731bbefb --- /dev/null +++ b/xarray/datatree_/pyproject.toml @@ -0,0 +1,14 @@ +[build-system] +requires = [ + "setuptools>=42", + "wheel", + "setuptools_scm[toml]>=3.4", + "setuptools_scm_git_archive", +] + +[tool.setuptools_scm] +write_to = "datatree/_version.py" +write_to_template = ''' +# Do not change! Do not track in version control! +__version__ = "{version}" +''' diff --git a/xarray/datatree_/requirements.txt b/xarray/datatree_/requirements.txt deleted file mode 100644 index 4eb031ceee3..00000000000 --- a/xarray/datatree_/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg index 3a6f8120ce5..c59993b13cc 100644 --- a/xarray/datatree_/setup.cfg +++ b/xarray/datatree_/setup.cfg @@ -1,3 +1,36 @@ +[metadata] +name = xarray-datatree +description = Hierarchical tree-like data structures for xarray +long_description_content_type=text/markdown +long_description = file: README.md +url = https://github.com/xarray-contrib/datatree +author = Thomas Nicholas +author_email = thomas.nicholas@columbia.edu +license = Apache +classifiers = + Development Status :: 3 - Alpha + Intended Audience :: Science/Research + Topic :: Scientific/Engineering + License :: OSI Approved :: Apache Software License + Operating System :: OS Independent + Programming Language :: Python + Programming Language :: Python :: 3.8 + Programming Language :: Python :: 3.9 + Programming Language :: Python :: 3.10 + +[options] +packages = find: +python_requires = >=3.8 +install_requires = + xarray >=2022.05.0.dev0 + +[options.packages.find] +exclude = + docs + tests + tests.* + docs.* + [flake8] ignore = E203 # whitespace before ':' - doesn't work well with black diff --git a/xarray/datatree_/setup.py b/xarray/datatree_/setup.py deleted file mode 100644 index 12ac3a011b0..00000000000 --- a/xarray/datatree_/setup.py +++ /dev/null @@ -1,39 +0,0 @@ -from os.path import exists - -from setuptools import find_packages, setup - -with open("requirements.txt") as f: - install_requires = f.read().strip().split("\n") - -if exists("README.rst"): - with open("README.rst") as f: - long_description = f.read() -else: - long_description = "" - - -setup( - name="xarray-datatree", - description="Hierarchical tree-like data structures for xarray", - long_description=long_description, - url="https://github.com/xarray-contrib/datatree", - author="Thomas Nicholas", - author_email="thomas.nicholas@columbia.edu", - license="Apache", - classifiers=[ - "Development Status :: 3 - Alpha", - "Intended Audience :: Science/Research", - "Topic :: Scientific/Engineering", - "License :: OSI Approved :: Apache Software 
License", - "Operating System :: OS Independent", - "Programming Language :: Python", - "Programming Language :: Python :: 3.8", - "Programming Language :: Python :: 3.9", - "Programming Language :: Python :: 3.10", - ], - packages=find_packages(exclude=["docs", "tests", "tests.*", "docs.*"]), - install_requires=install_requires, - python_requires=">=3.8", - use_scm_version={"version_scheme": "post-release", "local_scheme": "dirty-tag"}, - setup_requires=["setuptools_scm>=3.4", "setuptools>=42"], -) From e4df20a789bb3c1bb02118fb33d6896956023b8a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Jun 2022 13:32:37 -0400 Subject: [PATCH 138/260] test assigning int --- xarray/datatree_/datatree/tests/test_datatree.py | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 86a7858a7d9..54e4be95627 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -314,6 +314,17 @@ def test_setitem_unnamed_dataarray(self): folder1["results"] = data xrt.assert_equal(folder1["results"], data) + def test_setitem_variable(self): + var = xr.Variable(data=[0, 50], dims="x") + folder1 = DataTree(name="folder1") + folder1["results"] = var + xrt.assert_equal(folder1["results"], xr.DataArray(var)) + + def test_setitem_coerce_to_dataarray(self): + folder1 = DataTree(name="folder1") + folder1["results"] = 0 + xrt.assert_equal(folder1["results"], xr.DataArray(0)) + def test_setitem_add_new_variable_to_empty_node(self): results = DataTree(name="results") results["pressure"] = xr.DataArray(data=[2, 3]) From d498a0c3cf87196c58364087d3a89a05c569f36c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 17 Jun 2022 13:32:57 -0400 Subject: [PATCH 139/260] allow assigning coercible values --- xarray/datatree_/datatree/datatree.py | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ebeab5a1705..8e62f1c8784 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -699,14 +699,17 @@ def _set(self, key: str, val: DataTree | CoercibleValue) -> None: if isinstance(val, DataTree): val.name = key val.parent = self - elif isinstance(val, (DataArray, Variable)): - # TODO this should also accomodate other types that can be coerced into Variables - self.update({key: val}) else: - raise TypeError(f"Type {type(val)} cannot be assigned to a DataTree") + if not isinstance(val, (DataArray, Variable)): + # accommodate other types that can be coerced into Variables + val = DataArray(val) + + self.update({key: val}) def __setitem__( - self, key: str, value: DataTree | Dataset | DataArray | Variable + self, + key: str, + value: Any, ) -> None: """ Add either a child node or an array to the tree, at any position. 
From 18be0847a1548163b5594e5b2973dc0ddbd6c381 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Fri, 17 Jun 2022 16:43:05 -0400 Subject: [PATCH 140/260] Allow assigning Coercible values + NamedNode internal class https://github.com/xarray-contrib/datatree/pull/115 * test assigning int * allow assigning coercible values * refactor name-related methods to intermediate class * refactor tests to match * fix now-exposed bug with naming * moved test showing lack of name permanence * whatsnew --- xarray/datatree_/datatree/datatree.py | 31 ++--- .../datatree_/datatree/tests/test_datatree.py | 18 ++- .../datatree_/datatree/tests/test_treenode.py | 131 ++++++++++-------- xarray/datatree_/datatree/treenode.py | 104 ++++++++------ xarray/datatree_/docs/source/whats-new.rst | 6 + 5 files changed, 164 insertions(+), 126 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ebeab5a1705..6705e07b7d8 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -41,7 +41,7 @@ MappedDataWithCoords, ) from .render import RenderTree -from .treenode import NodePath, Tree, TreeNode +from .treenode import NamedNode, NodePath, Tree if TYPE_CHECKING: from xarray.core.merge import CoercibleValue @@ -228,7 +228,7 @@ def _replace( class DataTree( - TreeNode, + NamedNode, MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmeticMixin, @@ -343,20 +343,6 @@ def __init__( ) self._close = ds._close - @property - def name(self) -> str | None: - """The name of this node.""" - return self._name - - @name.setter - def name(self, name: str | None) -> None: - if name is not None: - if not isinstance(name, str): - raise TypeError("node name must be a string or None") - if "/" in name: - raise ValueError("node names cannot contain forward slashes") - self._name = name - @property def parent(self: DataTree) -> DataTree | None: """Parent of this node.""" @@ -699,14 +685,17 @@ def _set(self, key: str, val: DataTree | CoercibleValue) -> None: if isinstance(val, DataTree): val.name = key val.parent = self - elif isinstance(val, (DataArray, Variable)): - # TODO this should also accomodate other types that can be coerced into Variables - self.update({key: val}) else: - raise TypeError(f"Type {type(val)} cannot be assigned to a DataTree") + if not isinstance(val, (DataArray, Variable)): + # accommodate other types that can be coerced into Variables + val = DataArray(val) + + self.update({key: val}) def __setitem__( - self, key: str, value: DataTree | Dataset | DataArray | Variable + self, + key: str, + value: Any, ) -> None: """ Add either a child node or an array to the tree, at any position. 
diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 86a7858a7d9..82831977f42 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -249,13 +249,6 @@ def test_setitem_unnamed_child_node_becomes_named(self): john2["sonny"] = DataTree() assert john2["sonny"].name == "sonny" - @pytest.mark.xfail(reason="bug with name overwriting") - def test_setitem_child_node_keeps_name(self): - john = DataTree(name="john") - r2d2 = DataTree(name="R2D2") - john["Mary"] = r2d2 - assert r2d2.name == "R2D2" - def test_setitem_new_grandchild_node(self): john = DataTree(name="john") DataTree(name="mary", parent=john) @@ -314,6 +307,17 @@ def test_setitem_unnamed_dataarray(self): folder1["results"] = data xrt.assert_equal(folder1["results"], data) + def test_setitem_variable(self): + var = xr.Variable(data=[0, 50], dims="x") + folder1 = DataTree(name="folder1") + folder1["results"] = var + xrt.assert_equal(folder1["results"], xr.DataArray(var)) + + def test_setitem_coerce_to_dataarray(self): + folder1 = DataTree(name="folder1") + folder1["results"] = 0 + xrt.assert_equal(folder1["results"], xr.DataArray(0)) + def test_setitem_add_new_variable_to_empty_node(self): results = DataTree(name="results") results["pressure"] = xr.DataArray(data=[2, 3]) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 0494911f2ca..2c2a50961ae 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -1,7 +1,7 @@ import pytest from datatree.iterators import LevelOrderIter, PreOrderIter -from datatree.treenode import TreeError, TreeNode +from datatree.treenode import NamedNode, TreeError, TreeNode class TestFamilyTree: @@ -143,43 +143,6 @@ def test_get_from_root(self): assert sue._get_item("/Mary") is mary -class TestPaths: - def test_path_property(self): - sue = TreeNode() - mary = TreeNode(children={"Sue": sue}) - john = TreeNode(children={"Mary": mary}) # noqa - assert sue.path == "/Mary/Sue" - assert john.path == "/" - - def test_path_roundtrip(self): - sue = TreeNode() - mary = TreeNode(children={"Sue": sue}) - john = TreeNode(children={"Mary": mary}) # noqa - assert john._get_item(sue.path) == sue - - def test_same_tree(self): - mary = TreeNode() - kate = TreeNode() - john = TreeNode(children={"Mary": mary, "Kate": kate}) # noqa - assert mary.same_tree(kate) - - def test_relative_paths(self): - sue = TreeNode() - mary = TreeNode(children={"Sue": sue}) - annie = TreeNode() - john = TreeNode(children={"Mary": mary, "Annie": annie}) - - assert sue.relative_to(john) == "Mary/Sue" - assert john.relative_to(sue) == "../.." - assert annie.relative_to(sue) == "../../Annie" - assert sue.relative_to(annie) == "../Mary/Sue" - assert sue.relative_to(sue) == "." 
- - evil_kate = TreeNode() - with pytest.raises(ValueError, match="nodes do not lie within the same tree"): - sue.relative_to(evil_kate) - - class TestSetNodes: def test_set_child_node(self): john = TreeNode() @@ -261,16 +224,66 @@ def test_del_child(self): del john["Mary"] +class TestNames: + def test_child_gets_named_on_attach(self): + sue = NamedNode() + mary = NamedNode(children={"Sue": sue}) # noqa + assert sue.name == "Sue" + + @pytest.mark.xfail(reason="requires refactoring to retain name") + def test_grafted_subtree_retains_name(self): + subtree = NamedNode("original") + root = NamedNode(children={"new_name": subtree}) # noqa + assert subtree.name == "original" + + +class TestPaths: + def test_path_property(self): + sue = NamedNode() + mary = NamedNode(children={"Sue": sue}) + john = NamedNode(children={"Mary": mary}) # noqa + assert sue.path == "/Mary/Sue" + assert john.path == "/" + + def test_path_roundtrip(self): + sue = NamedNode() + mary = NamedNode(children={"Sue": sue}) + john = NamedNode(children={"Mary": mary}) # noqa + assert john._get_item(sue.path) == sue + + def test_same_tree(self): + mary = NamedNode() + kate = NamedNode() + john = NamedNode(children={"Mary": mary, "Kate": kate}) # noqa + assert mary.same_tree(kate) + + def test_relative_paths(self): + sue = NamedNode() + mary = NamedNode(children={"Sue": sue}) + annie = NamedNode() + john = NamedNode(children={"Mary": mary, "Annie": annie}) + + assert sue.relative_to(john) == "Mary/Sue" + assert john.relative_to(sue) == "../.." + assert annie.relative_to(sue) == "../../Annie" + assert sue.relative_to(annie) == "../Mary/Sue" + assert sue.relative_to(sue) == "." + + evil_kate = NamedNode() + with pytest.raises(ValueError, match="nodes do not lie within the same tree"): + sue.relative_to(evil_kate) + + def create_test_tree(): - f = TreeNode() - b = TreeNode() - a = TreeNode() - d = TreeNode() - c = TreeNode() - e = TreeNode() - g = TreeNode() - i = TreeNode() - h = TreeNode() + f = NamedNode() + b = NamedNode() + a = NamedNode() + d = NamedNode() + c = NamedNode() + e = NamedNode() + g = NamedNode() + i = NamedNode() + h = NamedNode() f.children = {"b": b, "g": g} b.children = {"a": a, "d": d} @@ -286,7 +299,7 @@ def test_preorderiter(self): tree = create_test_tree() result = [node.name for node in PreOrderIter(tree)] expected = [ - None, # root TreeNode is unnamed + None, # root Node is unnamed "b", "a", "d", @@ -302,7 +315,7 @@ def test_levelorderiter(self): tree = create_test_tree() result = [node.name for node in LevelOrderIter(tree)] expected = [ - None, # root TreeNode is unnamed + None, # root Node is unnamed "b", "g", "a", @@ -317,19 +330,19 @@ def test_levelorderiter(self): class TestRenderTree: def test_render_nodetree(self): - sam = TreeNode() - ben = TreeNode() - mary = TreeNode(children={"Sam": sam, "Ben": ben}) - kate = TreeNode() - john = TreeNode(children={"Mary": mary, "Kate": kate}) + sam = NamedNode() + ben = NamedNode() + mary = NamedNode(children={"Sam": sam, "Ben": ben}) + kate = NamedNode() + john = NamedNode(children={"Mary": mary, "Kate": kate}) printout = john.__str__() expected_nodes = [ - "TreeNode()", - "TreeNode('Mary')", - "TreeNode('Sam')", - "TreeNode('Ben')", - "TreeNode('Kate')", + "NamedNode()", + "NamedNode('Mary')", + "NamedNode('Sam')", + "NamedNode('Ben')", + "NamedNode('Kate')", ] for expected_node, printed_node in zip(expected_nodes, printout.splitlines()): assert expected_node in printed_node diff --git a/xarray/datatree_/datatree/treenode.py 
b/xarray/datatree_/datatree/treenode.py index a2e87675b57..16ffecc261b 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -113,7 +113,7 @@ def _check_loop(self, new_parent: Tree | None) -> None: if self._is_descendant_of(new_parent): raise TreeError( - f"Cannot set parent, as node {new_parent.name} is already a descendant of this node." + "Cannot set parent, as intended parent is already a descendant of this node." ) def _is_descendant_of(self, node: Tree) -> bool: @@ -461,18 +461,72 @@ def update(self: Tree, other: Mapping[str, Tree]) -> None: new_children = {**self.children, **other} self.children = new_children + def same_tree(self, other: Tree) -> bool: + """True if other node is in the same tree as this node.""" + return self.root is other.root + + def find_common_ancestor(self, other: Tree) -> Tree: + """ + Find the first common ancestor of two nodes in the same tree. + + Raise ValueError if they are not in the same tree. + """ + common_ancestor = None + for node in other.iter_lineage(): + if node in self.ancestors: + common_ancestor = node + break + + if not common_ancestor: + raise ValueError( + "Cannot find relative path because nodes do not lie within the same tree" + ) + + return common_ancestor + + def _path_to_ancestor(self, ancestor: Tree) -> NodePath: + generation_gap = list(self.lineage).index(ancestor) + path_upwards = "../" * generation_gap if generation_gap > 0 else "/" + return NodePath(path_upwards) + + +class NamedNode(TreeNode, Generic[Tree]): + """ + A TreeNode which knows its own name. + + Implements path-like relationships to other nodes in its tree. + """ + + _name: Optional[str] + _parent: Optional[Tree] + _children: OrderedDict[str, Tree] + + def __init__(self, name=None, children=None): + super().__init__(children=children) + self._name = None + self.name = name + @property def name(self) -> str | None: - """If node has a parent, this is the key under which it is stored in `parent.children`.""" - if self.parent: - return next( - name for name, child in self.parent.children.items() if child is self - ) - else: - return None + """The name of this node.""" + return self._name + + @name.setter + def name(self, name: str | None) -> None: + if name is not None: + if not isinstance(name, str): + raise TypeError("node name must be a string or None") + if "/" in name: + raise ValueError("node names cannot contain forward slashes") + self._name = name def __str__(self) -> str: - return f"TreeNode({self.name})" if self.name else "TreeNode()" + return f"NamedNode({self.name})" if self.name else "NamedNode()" + + def _post_attach(self: NamedNode, parent: NamedNode) -> None: + """Ensures child has name attribute corresponding to key under which it has been stored.""" + key = next(k for k, v in parent.children.items() if v is self) + self.name = key @property def path(self) -> str: @@ -483,9 +537,9 @@ def path(self) -> str: root, *ancestors = self.ancestors # don't include name of root because (a) root might not have a name & (b) we want path relative to root. names = [node.name for node in ancestors] - return "/" + "/".join(names) # type: ignore + return "/" + "/".join(names) - def relative_to(self, other: Tree) -> str: + def relative_to(self: NamedNode, other: NamedNode) -> str: """ Compute the relative path from this node to node `other`. 
@@ -505,31 +559,3 @@ def relative_to(self, other: Tree) -> str: return str( path_to_common_ancestor / this_path.relative_to(common_ancestor.path) ) - - def same_tree(self, other: Tree) -> bool: - """True if other node is in the same tree as this node.""" - return self.root is other.root - - def find_common_ancestor(self, other: Tree) -> Tree: - """ - Find the first common ancestor of two nodes in the same tree. - - Raise ValueError if they are not in the same tree. - """ - common_ancestor = None - for node in other.iter_lineage(): - if node in self.ancestors: - common_ancestor = node - break - - if not common_ancestor: - raise ValueError( - "Cannot find relative path because nodes do not lie within the same tree" - ) - - return common_ancestor - - def _path_to_ancestor(self, ancestor: Tree) -> NodePath: - generation_gap = list(self.lineage).index(ancestor) - path_upwards = "../" * generation_gap if generation_gap > 0 else "/" - return NodePath(path_upwards) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 8a31940ff27..7e0a57dfb1f 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -42,6 +42,8 @@ Bug fixes - Modifying the contents of a ``DataTree`` object via the ``DataTree.ds`` attribute is now forbidden, which prevents any possibility of the contents of a ``DataTree`` object and its ``.ds`` attribute diverging. (:issue:`38`, :pull:`99`) By `Tom Nicholas `_. +- Fixed a bug so that names of children now always match keys under which parents store them (:pull:`99`). + By `Tom Nicholas `_. Documentation ~~~~~~~~~~~~~ @@ -56,10 +58,14 @@ Internal Changes This approach means that the ``DataTree`` class now effectively copies and extends the internal structure of ``xarray.Dataset``. (:pull:`41`) By `Tom Nicholas `_. +- Refactored to use intermediate ``NamedNode`` class, separating implementation of methods requiring a ``name`` + attribute from those not requiring it. + By `Tom Nicholas `_. - Made ``testing.test_datatree.create_test_datatree`` into a pytest fixture (:pull:`107`). By `Benjamin Woods `_. + .. _whats-new.v0.0.6: v0.0.6 (06/03/2022) From 9ba6008bbc2076815bec09b5064fbfce5e3d1ed9 Mon Sep 17 00:00:00 2001 From: Tom Nicholas <35968931+TomNicholas@users.noreply.github.com> Date: Sun, 26 Jun 2022 08:36:10 -0400 Subject: [PATCH 141/260] Data structures docs https://github.com/xarray-contrib/datatree/pull/103 * sketching out changes needed to integrate variables into DataTree * fixed some other basic conflicts * fix mypy errors * can create basic datatree node objects again * child-variable name collisions dectected correctly * in-progres * add _replace method * updated tests to assert identical instead of check .ds is expected_ds * refactor .ds setter to use _replace * refactor init to use _replace * refactor test tree to avoid init * attempt at copy methods * rewrote implementation of .copy method * xfailing test for deepcopying * pseudocode implementation of DatasetView * Revert "pseudocode implementation of DatasetView" This reverts commit 52ef23baaa4b6892cad2d69c61b43db831044630. * removed duplicated implementation of copy * reorganise API docs * expose data_vars, coords etc. 
properties * try except with calculate_dimensions private import * add keys/values/items methods * don't use has_data when .variables would do * explanation of basic properties * add data structures page to index * revert adding documentation in favour of that going in a different PR * explanation of basic properties * add data structures page to index * create tree node-by-node * create tree from dict * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * dict-like interface * correct deepcopy tests * use .data_vars in copy tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * black * whatsnew * data contents * dictionary-like access * TODOs * test assigning int * allow assigning coercible values * simplify example using #115 * add note about fully qualified names Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 16 +- .../datatree_/docs/source/data-structures.rst | 212 ++++++++++++++++++ xarray/datatree_/docs/source/index.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 3 + 4 files changed, 229 insertions(+), 3 deletions(-) create mode 100644 xarray/datatree_/docs/source/data-structures.rst diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 6705e07b7d8..6f78d8c8c67 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -360,6 +360,10 @@ def ds(self) -> DatasetView: An immutable Dataset-like view onto the data in this node. For a mutable Dataset containing the same data as in this node, use `.to_dataset()` instead. + + See Also + -------- + DataTree.to_dataset """ return DatasetView._from_node(self) @@ -393,7 +397,13 @@ def _pre_attach(self: DataTree, parent: DataTree) -> None: ) def to_dataset(self) -> Dataset: - """Return the data in this node as a new xarray.Dataset object.""" + """ + Return the data in this node as a new xarray.Dataset object. + + See Also + -------- + DataTree.ds + """ return Dataset._construct_direct( self._variables, self._coord_names, @@ -432,7 +442,7 @@ def variables(self) -> Mapping[Hashable, Variable]: @property def attrs(self) -> Dict[Hashable, Any]: - """Dictionary of global attributes on this node""" + """Dictionary of global attributes on this node object.""" if self._attrs is None: self._attrs = {} return self._attrs @@ -443,7 +453,7 @@ def attrs(self, value: Mapping[Any, Any]) -> None: @property def encoding(self) -> Dict: - """Dictionary of global encoding attributes on this node""" + """Dictionary of global encoding attributes on this node object.""" if self._encoding is None: self._encoding = {} return self._encoding diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst new file mode 100644 index 00000000000..93d5b9abe31 --- /dev/null +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -0,0 +1,212 @@ +.. _data structures: + +Data Structures +=============== + +.. ipython:: python + :suppress: + + import numpy as np + import pandas as pd + import xarray as xr + import datatree + + np.random.seed(123456) + np.set_printoptions(threshold=10) + +.. note:: + + This page builds on the information given in xarray's main page on + `data structures `_, so it is suggested that you + are familiar with those first. 
+ +DataTree +-------- + +:py:class:``DataTree`` is xarray's highest-level data structure, able to organise heterogeneous data which +could not be stored inside a single ``Dataset`` object. This includes representing the recursive structure of multiple +`groups`_ within a netCDF file or `Zarr Store`_. + +.. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html +.. _Zarr Store: https://zarr.readthedocs.io/en/stable/tutorial.html#groups + +Each ``DataTree`` object (or "node") contains the same data that a single ``xarray.Dataset`` would (i.e. ``DataArray`` objects +stored under hashable keys), and so has the same key properties: + +- ``dims``: a dictionary mapping of dimension names to lengths, for the variables in this node, +- ``data_vars``: a dict-like container of DataArrays corresponding to variables in this node, +- ``coords``: another dict-like container of DataArrays, corresponding to coordinate variables in this node, +- ``attrs``: dict to hold arbitary metadata relevant to data in this node. + +A single ``DataTree`` object acts much like a single ``Dataset`` object, and has a similar set of dict-like methods +defined upon it. However, ``DataTree``'s can also contain other ``DataTree`` objects, so they can be thought of as nested dict-like +containers of both ``xarray.DataArray``'s and ``DataTree``'s. + +A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key +properties: + +- ``children``: An ordered dictionary mapping from names to other ``DataTree`` objects, known as its' "child nodes". +- ``parent``: The single ``DataTree`` object whose children this datatree is a member of, known as its' "parent node". + +Each child automatically knows about its parent node, and a node without a parent is known as a "root" node +(represented by the ``parent`` attribute pointing to ``None``). +Nodes can have multiple children, but as each child node has at most one parent, there can only ever be one root node in a given tree. + +The overall structure is technically a `connected acyclic undirected rooted graph`, otherwise known as a +`"Tree" `_. + +.. note:: + + Technically a ``DataTree`` with more than one child node forms an `"Ordered Tree" `_, + because the children are stored in an Ordered Dictionary. However, this distinction only really matters for a few + edge cases involving operations on multiple trees simultaneously, and can safely be ignored by most users. + + +``DataTree`` objects can also optionally have a ``name`` as well as ``attrs``, just like a ``DataArray``. +Again these are not normally used unless explicitly accessed by the user. + + +Creating a DataTree +~~~~~~~~~~~~~~~~~~~ + +There are two ways to create a ``DataTree`` from scratch. The first is to create each node individually, +specifying the nodes' relationship to one another as you create each one. + +The ``DataTree`` constructor takes: + +- ``data``: The data that will be stored in this node, represented by a single ``xarray.Dataset``, or a named ``xarray.DataArray``. +- ``parent``: The parent node (if there is one), given as a ``DataTree`` object. +- ``children``: The various child nodes (if there are any), given as a mapping from string keys to ``DataTree`` objects. +- ``name``: A string to use as the name of this node. + +Let's make a datatree node without anything in it: + +.. 
ipython:: python + + from datatree import DataTree + + # create root node + node1 = DataTree(name="Oak") + + node1 + +At this point our node is also the root node, as every tree has a root node. + +We can add a second node to this tree either by referring to the first node in the constructor of the second: + +.. ipython:: python + + # add a child by referring to the parent node + node2 = DataTree(name="Bonsai", parent=node1) + +or by dynamically updating the attributes of one node to refer to another: + +.. ipython:: python + + # add a grandparent by updating the .parent property of an existing node + node0 = DataTree(name="General Sherman") + node1.parent = node0 + +Our tree now has three nodes within it, and one of the two new nodes has become the new root: + +.. ipython:: python + + node0 + +Is is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: + +.. ipython:: python + :okexcept: + + node0.parent = node2 + +The second way is to build the tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects. + +This relies on a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence, +separated by forward slashes. The root node is referred to by ``"/"``, so the path from our current root node to its grand-child would be ``"/Oak/Bonsai"``. +A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a +`"fully qualified name" `_. + +If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, +we can construct a complex tree quickly using the alternative constructor ``:py:func::DataTree.from_dict``: + +.. ipython:: python + + d = { + "/": xr.Dataset({"foo": "orange"}), + "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}), + "/a/b": xr.Dataset({"zed": np.NaN}), + "a/c/d": None, + } + dt = DataTree.from_dict(d) + dt + +Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path +(i.e. the node labelled `"c"` in this case.) + +Finally if you have a file containing data on disk (such as a netCDF file or a Zarr Store), you can also create a datatree by opening the +file using ``:py:func::~datatree.open_datatree``. + + +DataTree Contents +~~~~~~~~~~~~~~~~~ + +Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface, but with values given by either ``xarray.DataArray`` objects or other ``DataTree`` objects. + +.. ipython:: python + + dt["a"] + dt["foo"] + +Iterating over keys will iterate over both the names of variables and child nodes. + +We can also access all the data in a single node through a dataset-like view + +.. ipython:: python + + dt["a"].ds + +This demonstrates the fact that the data in any one node is equivalent to the contents of a single ``xarray.Dataset`` object. +The ``DataTree.ds`` property returns an immutable view, but we can instead extract the node's data contents as a new (and mutable) +``xarray.Dataset`` object via ``.to_dataset()``: + +.. ipython:: python + + dt["a"].to_dataset() + +Like with ``Dataset``, you can access the data and coordinate variables of a node separately via the ``data_vars`` and ``coords`` attributes: + +.. 
ipython:: python + + dt["a"].data_vars + dt["a"].coords + + +Dictionary-like methods +~~~~~~~~~~~~~~~~~~~~~~~ + +We can update the contents of the tree in-place using a dictionary-like syntax. + +We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects. +For example, to create this example datatree from scratch, we could have written: + +# TODO update this example using ``.coords`` and ``.data_vars`` as setters, + +.. ipython:: python + + dt = DataTree() + dt["foo"] = "orange" + dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) + dt["a/b/zed"] = np.NaN + dt["a/c/d"] = DataTree() + dt + +To change the variables in a node of a ``DataTree``, you can use all the standard dictionary +methods, including ``values``, ``items``, ``__delitem__``, ``get`` and +:py:meth:`~xarray.DataTree.update`. +Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will +:ref:`automatically align` the array(s) to the original node's indexes. + +If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` it will copy the entire tree, +including all parents and children. +Like for ``Dataset``, this copy is shallow by default, but you can copy all the data by calling ``dt.copy(deep=True)``. diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index 76ed72beafe..f3e12e091cd 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -11,6 +11,7 @@ Datatree Installation Quick Overview Tutorial + Data Model API Reference How do I ... Contributing Guide diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 7e0a57dfb1f..a99518d76dc 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -48,6 +48,9 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Added ``Data Structures`` page describing the internal structure of a ``DataTree`` object, and its relation to + ``xarray.Dataset`` objects. (:pull:`103`) + By `Tom Nicholas `_. - API page updated with all the methods that are copied from ``xarray.Dataset``. (:pull:`41`) By `Tom Nicholas `_. 
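The documentation added in this patch can be condensed into a single short workflow. The sketch below only restates the examples from the new page (it assumes the standalone ``datatree`` package is installed; the dataset contents and paths are illustrative):

    import numpy as np
    import xarray as xr
    from datatree import DataTree

    # Build a whole tree from a dict of filesystem-like paths, as described in
    # the "Creating a DataTree" section; the intermediate empty node "c" is
    # created automatically.
    d = {
        "/": xr.Dataset({"foo": "orange"}),
        "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}),
        "/a/b": xr.Dataset({"zed": np.nan}),
        "a/c/d": None,
    }
    dt = DataTree.from_dict(d)

    # Dict-like access returns either child nodes or variables.
    node = dt["a"]          # a DataTree node
    view = node.ds          # immutable Dataset-like view of the node's data
    ds = node.to_dataset()  # mutable xarray.Dataset copy of the same data

    # Copying copies the whole tree; deep=True copies the underlying arrays too.
    dt_deep = dt.copy(deep=True)
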
From 2f305ec764d082923cf188eba486f4bf32bce2e7 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 11 Jul 2022 08:36:29 -0400 Subject: [PATCH 142/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/117 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/pre-commit-hooks: v4.2.0 → v4.3.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.2.0...v4.3.0) - [github.com/psf/black: 22.3.0 → 22.6.0](https://github.com/psf/black/compare/22.3.0...22.6.0) - [github.com/pre-commit/mirrors-mypy: v0.960 → v0.961](https://github.com/pre-commit/mirrors-mypy/compare/v0.960...v0.961) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 1e1649cf9af..cee6e80d529 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -3,7 +3,7 @@ ci: autoupdate_schedule: monthly repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.2.0 + rev: v4.3.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -15,7 +15,7 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 22.3.0 + rev: 22.6.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.960 + rev: v0.961 hooks: - id: mypy # Copied from setup.cfg From d2dd27012122eaec7c72e907c9cf6f06351272a2 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 12 Jul 2022 11:34:43 -0500 Subject: [PATCH 143/260] update whatsnew with template for 0.0.8 --- xarray/datatree_/docs/source/whats-new.rst | 27 +++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index a99518d76dc..91cefc18d53 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,9 +15,34 @@ What's New np.random.seed(123456) + +.. _whats-new.v0.0.8: + +v0.0.8 (unreleased) +------------------- + +New Features +~~~~~~~~~~~~ + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + + .. 
_whats-new.v0.0.7: -v0.0.7 (unreleased) +v0.0.7 (07/11/2022) ------------------- New Features From e94f4fc1458f5045bbc303f4c4b7f3d3a8efe311 Mon Sep 17 00:00:00 2001 From: Mattia Almansi Date: Wed, 13 Jul 2022 20:00:57 +0200 Subject: [PATCH 144/260] Update pypipublish.yaml https://github.com/xarray-contrib/datatree/pull/119 --- .../.github/workflows/pypipublish.yaml | 27 ++++++++----------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 48270f5f9b5..a53e26528f5 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -8,19 +8,14 @@ jobs: deploy: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v3 - - name: Set up Python - uses: actions/setup-python@v4 - with: - python-version: "3.x" - - name: Install dependencies - run: | - python -m pip install --upgrade pip - python -m pip install setuptools setuptools-scm wheel twine - - name: Build and publish - env: - TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} - TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} - run: | - python setup.py sdist bdist_wheel - twine upload dist/* + - uses: actions/checkout@v3 + - name: Build distributions + run: | + $CONDA/bin/python -m pip install build + $CONDA/bin/python -m build + - name: Publish a Python distribution to PyPI + if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') + uses: pypa/gh-action-pypi-publish@release/v1 + with: + user: ${{ secrets.PYPI_USERNAME }} + password: ${{ secrets.PYPI_PASSWORD }} From 7d59c363b2f52dede8147ba26130411dccd5c87b Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Thu, 14 Jul 2022 13:46:56 -0600 Subject: [PATCH 145/260] update pypi publish workflow https://github.com/xarray-contrib/datatree/pull/120 --- .../.github/workflows/pypipublish.yaml | 94 ++++++++++++++++--- xarray/datatree_/pyproject.toml | 1 + 2 files changed, 80 insertions(+), 15 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index a53e26528f5..ef02c4dc561 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -1,21 +1,85 @@ -name: Upload Python Package - +name: Build distribution on: release: - types: [created] + types: + - published + push: + branches: + - main + pull_request: + branches: + - main + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true jobs: - deploy: + build-artifacts: + runs-on: ubuntu-latest + if: github.repository == 'xarray-contrib/datatree' + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - uses: actions/setup-python@v2 + name: Install Python + with: + python-version: 3.8 + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + python -m pip install build + + - name: Build tarball and wheels + run: | + git clean -xdf + git restore -SW . + python -m build --sdist --wheel . 
+ + + - uses: actions/upload-artifact@v2 + with: + name: releases + path: dist + + test-built-dist: + needs: build-artifacts + runs-on: ubuntu-latest + steps: + - uses: actions/setup-python@v2 + name: Install Python + with: + python-version: '3.10' + - uses: actions/download-artifact@v2 + with: + name: releases + path: dist + - name: List contents of built dist + run: | + ls -ltrh + ls -ltrh dist + + - name: Verify the built dist/wheel is valid + if: github.event_name == 'push' + run: | + python -m pip install --upgrade pip + python -m pip install dist/xarray-datatree*.whl + python -c "import datatree; print(datatree.__version__)" + + upload-to-pypi: + needs: test-built-dist + if: github.event_name == 'release' runs-on: ubuntu-latest steps: - - uses: actions/checkout@v3 - - name: Build distributions - run: | - $CONDA/bin/python -m pip install build - $CONDA/bin/python -m build - - name: Publish a Python distribution to PyPI - if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') - uses: pypa/gh-action-pypi-publish@release/v1 - with: - user: ${{ secrets.PYPI_USERNAME }} - password: ${{ secrets.PYPI_PASSWORD }} + - uses: actions/download-artifact@v2 + with: + name: releases + path: dist + - name: Publish package to PyPI + uses: pypa/gh-action-pypi-publish@v1.5.0 + with: + user: __token__ + password: ${{ secrets.PYPI_PASSWORD }} + verbose: true diff --git a/xarray/datatree_/pyproject.toml b/xarray/datatree_/pyproject.toml index ec2731bbefb..209ec8fee6a 100644 --- a/xarray/datatree_/pyproject.toml +++ b/xarray/datatree_/pyproject.toml @@ -4,6 +4,7 @@ requires = [ "wheel", "setuptools_scm[toml]>=3.4", "setuptools_scm_git_archive", + "check-manifest" ] [tool.setuptools_scm] From 40a6ca629a7dd3c8b86551cdf78dd702da903495 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Thu, 14 Jul 2022 13:52:43 -0600 Subject: [PATCH 146/260] Fix bug in pypi publish GH workflow https://github.com/xarray-contrib/datatree/pull/121 --- xarray/datatree_/.github/workflows/pypipublish.yaml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index ef02c4dc561..cc8241e4391 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -62,10 +62,9 @@ jobs: ls -ltrh dist - name: Verify the built dist/wheel is valid - if: github.event_name == 'push' run: | python -m pip install --upgrade pip - python -m pip install dist/xarray-datatree*.whl + python -m pip install dist/xarray_datatree*.whl python -c "import datatree; print(datatree.__version__)" upload-to-pypi: From 007ee5c14c877435a40341cccad1d866eb6cdb08 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Thu, 14 Jul 2022 14:26:33 -0600 Subject: [PATCH 147/260] use `PYPI_USERNAME` GH action secret https://github.com/xarray-contrib/datatree/pull/122 --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index cc8241e4391..dceb1182496 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -79,6 +79,6 @@ jobs: - name: Publish package to PyPI uses: pypa/gh-action-pypi-publish@v1.5.0 with: - user: __token__ + user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} verbose: true From 
a26790772f35bc6814c210799fc40877d85fb90f Mon Sep 17 00:00:00 2001 From: Julius Busecke Date: Thu, 14 Jul 2022 18:53:28 -0500 Subject: [PATCH 148/260] Update docs theme https://github.com/xarray-contrib/datatree/pull/123 * Update docs theme * Update doc.yml * Update doc.yml * Update doc.yml * Update whats-new.rst * Update whats-new.rst --- xarray/datatree_/ci/doc.yml | 1 + xarray/datatree_/docs/source/conf.py | 10 +++++++- xarray/datatree_/docs/source/whats-new.rst | 28 ++++++++++++++++++++-- 3 files changed, 36 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index ff303a98115..5a3afbdf49f 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -15,4 +15,5 @@ dependencies: - zarr - pip: - git+https://github.com/xarray-contrib/datatree + - pangeo-sphinx-book-theme - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index c6da277d346..d330c920982 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -131,7 +131,15 @@ # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. -html_theme = "sphinx_rtd_theme" +html_theme = "pangeo_sphinx_book_theme" +html_theme_options = { + "repository_url": "https://github.com/xarray-contrib/datatree", + "repository_branch": "main", + "path_to_docs": "doc", + "use_repository_button": True, + "use_issues_button": True, + "use_edit_page_button": True, +} # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 91cefc18d53..514dda9e236 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,10 +15,32 @@ What's New np.random.seed(123456) +.. _whats-new.v0.0.10: -.. _whats-new.v0.0.8: +v0.0.10 (unreleased) +------------------- + +New Features +~~~~~~~~~~~~ + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + +.. _whats-new.v0.0.9: -v0.0.8 (unreleased) +v0.0.9 (07/14/2022) ------------------- New Features @@ -35,6 +57,8 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Switch docs theme (:pull:`123`). + By `JuliusBusecke `_. Internal Changes ~~~~~~~~~~~~~~~~ From 834c22b4a0ab3cc0fd872933fcc11d099b41ce20 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 14 Jul 2022 22:46:57 -0600 Subject: [PATCH 149/260] Bump actions/upload-artifact from 2 to 3 https://github.com/xarray-contrib/datatree/pull/125 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v2...v3) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index dceb1182496..0dc94121855 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -39,7 +39,7 @@ jobs: python -m build --sdist --wheel . - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 with: name: releases path: dist From 064b88af97ffc6bfa7153e7f5116e57596f0615a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 15 Jul 2022 04:47:36 +0000 Subject: [PATCH 150/260] Bump actions/setup-python from 2 to 4 https://github.com/xarray-contrib/datatree/pull/126 --- xarray/datatree_/.github/workflows/pypipublish.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 0dc94121855..faad929e406 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -22,7 +22,7 @@ jobs: - uses: actions/checkout@v2 with: fetch-depth: 0 - - uses: actions/setup-python@v2 + - uses: actions/setup-python@v4 name: Install Python with: python-version: 3.8 @@ -48,7 +48,7 @@ jobs: needs: build-artifacts runs-on: ubuntu-latest steps: - - uses: actions/setup-python@v2 + - uses: actions/setup-python@v4 name: Install Python with: python-version: '3.10' From ace5455259c15c7e81201ff0cb269f7233fa2488 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 14 Jul 2022 22:52:02 -0600 Subject: [PATCH 151/260] Bump actions/download-artifact from 2 to 3 https://github.com/xarray-contrib/datatree/pull/127 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index faad929e406..6ee7871c647 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -52,7 +52,7 @@ jobs: name: Install Python with: python-version: '3.10' - - uses: actions/download-artifact@v2 + - uses: actions/download-artifact@v3 with: name: releases path: dist @@ -72,7 +72,7 @@ jobs: if: github.event_name == 'release' runs-on: ubuntu-latest steps: - - uses: actions/download-artifact@v2 + - uses: actions/download-artifact@v3 with: name: releases path: dist From 6900b59325df690a331854f98caf2dc33463474f Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Fri, 15 Jul 2022 10:09:20 -0600 Subject: [PATCH 152/260] Add badges to README.md https://github.com/xarray-contrib/datatree/pull/128 --- xarray/datatree_/README.md | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 78830f65816..bd6cc1b4bdd 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -1,4 +1,12 @@ # datatree + +| CI | [![GitHub Workflow Status][github-ci-badge]][github-ci-link] [![Code Coverage Status][codecov-badge]][codecov-link] 
[![pre-commit.ci status][pre-commit.ci-badge]][pre-commit.ci-link] | +| :---------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| **Docs** | [![Documentation Status][rtd-badge]][rtd-link] | +| **Package** | [![Conda][conda-badge]][conda-link] [![PyPI][pypi-badge]][pypi-link] | +| **License** | [![License][license-badge]][repo-link] | + + WIP implementation of a tree-like hierarchical data structure for xarray. This aims to create the data structure discussed in [xarray issue #4118](https://github.com/pydata/xarray/issues/4118), and therefore extend xarray's data model to be able to [handle arbitrarily nested netCDF4 groups](https://github.com/pydata/xarray/issues/1092#issuecomment-868324949). @@ -19,3 +27,20 @@ You can create a `DataTree` object in 3 ways: You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`. 3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. + + + +[github-ci-badge]: https://img.shields.io/github/workflow/status/xarray-contrib/datatree/CI?label=CI&logo=github +[github-ci-link]: https://github.com/xarray-contrib/datatree/actions?query=workflow%3ACI +[codecov-badge]: https://img.shields.io/codecov/c/github/xarray-contrib/datatree.svg?logo=codecov +[codecov-link]: https://codecov.io/gh/xarray-contrib/datatree +[rtd-badge]: https://img.shields.io/readthedocs/xarray-datatree/latest.svg +[rtd-link]: https://xarray-datatree.readthedocs.io/en/latest/?badge=latest +[pypi-badge]: https://img.shields.io/pypi/v/xarray-datatree?logo=pypi +[pypi-link]: https://pypi.org/project/xarray-datatree +[conda-badge]: https://img.shields.io/conda/vn/conda-forge/xarray-datatree?logo=anaconda +[conda-link]: https://anaconda.org/conda-forge/xarray-datatree +[license-badge]: https://img.shields.io/github/license/xarray-contrib/datatree +[repo-link]: https://github.com/xarray-contrib/datatree +[pre-commit.ci-badge]: https://results.pre-commit.ci/badge/github/xarray-contrib/datatree/main.svg +[pre-commit.ci-link]: https://results.pre-commit.ci/latest/github/xarray-contrib/datatree/main From 6123fb0aaee114c2cfda7ad466b597ef6e67a9b9 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 26 Jul 2022 09:43:40 -0600 Subject: [PATCH 153/260] Bump pypa/gh-action-pypi-publish from 1.5.0 to 1.5.1 https://github.com/xarray-contrib/datatree/pull/132 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 6ee7871c647..b3bea263824 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.5.0 + uses: pypa/gh-action-pypi-publish@v1.5.1 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From c5a603741493795f4f46093b9adcd3786989d909 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 1 Aug 2022 21:01:59 +0000 Subject: 
[PATCH 154/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/135 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index cee6e80d529..224333b3837 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -19,11 +19,11 @@ repos: hooks: - id: black - repo: https://github.com/keewis/blackdoc - rev: v0.3.4 + rev: v0.3.5 hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 - rev: 4.0.1 + rev: 5.0.2 hooks: - id: flake8 # - repo: https://github.com/Carreau/velin @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.961 + rev: v0.971 hooks: - id: mypy # Copied from setup.cfg From 70f3c37eef89bfe721034a04e1873bd4cf27af54 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Tue, 2 Aug 2022 21:03:38 -0600 Subject: [PATCH 155/260] add codecov Github workflow https://github.com/xarray-contrib/datatree/pull/129 --- xarray/datatree_/.github/workflows/main.yaml | 22 +++++++++++++------- xarray/datatree_/codecov.yml | 20 ++++++++++++++++++ 2 files changed, 34 insertions(+), 8 deletions(-) create mode 100644 xarray/datatree_/codecov.yml diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 37f0ae222b2..f64e7d1b4f8 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -2,9 +2,11 @@ name: CI on: push: - branches: "*" + branches: + - main pull_request: - branches: main + branches: + - main schedule: - cron: "0 0 * * *" @@ -41,12 +43,16 @@ jobs: shell: bash -l {0} run: | python -m pytest --cov=./ --cov-report=xml --verbose - # - name: Upload coverage to Codecov - # uses: codecov/codecov-action@v2.0.2 - # if: ${{ matrix.python-version }} == 3.8 - # with: - # file: ./coverage.xml - # fail_ci_if_error: false + + - name: Upload code coverage to Codecov + uses: codecov/codecov-action@v2.1.0 + with: + file: ./coverage.xml + flags: unittests + env_vars: OS,PYTHON + name: codecov-umbrella + fail_ci_if_error: false + test-upstream: name: ${{ matrix.python-version }}-dev-build diff --git a/xarray/datatree_/codecov.yml b/xarray/datatree_/codecov.yml new file mode 100644 index 00000000000..8b30905cf4c --- /dev/null +++ b/xarray/datatree_/codecov.yml @@ -0,0 +1,20 @@ +codecov: + require_ci_to_pass: false + max_report_age: off + +comment: false + +ignore: + - 'datatree/tests/*' + - 'setup.py' + +coverage: + precision: 2 + round: down + status: + project: + default: + target: 95 + informational: true + patch: off + changes: false From 202e4014acbc0a574f97bb96b45827037d6a04a7 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Tue, 2 Aug 2022 21:10:14 -0600 Subject: [PATCH 156/260] Update LICENSE https://github.com/xarray-contrib/datatree/pull/136 --- xarray/datatree_/LICENSE | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/LICENSE b/xarray/datatree_/LICENSE index 261eeb9e9f8..d68e7230919 100644 --- a/xarray/datatree_/LICENSE +++ b/xarray/datatree_/LICENSE @@ -1,4 +1,4 @@ - Apache License + Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ @@ -186,7 +186,7 @@ same "printed page" as the copyright notice for easier identification within 
third-party archives. - Copyright [yyyy] [name of copyright owner] + Copyright (c) 2022 onwards, datatree developers Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. From 44c28d3f9532a592135a73d83185ca51ae91acf6 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Tue, 2 Aug 2022 21:13:19 -0600 Subject: [PATCH 157/260] remove anytree license file https://github.com/xarray-contrib/datatree/pull/137 --- xarray/datatree_/licenses/ANYTREE_LICENSE | 201 ---------------------- 1 file changed, 201 deletions(-) delete mode 100644 xarray/datatree_/licenses/ANYTREE_LICENSE diff --git a/xarray/datatree_/licenses/ANYTREE_LICENSE b/xarray/datatree_/licenses/ANYTREE_LICENSE deleted file mode 100644 index 8dada3edaf5..00000000000 --- a/xarray/datatree_/licenses/ANYTREE_LICENSE +++ /dev/null @@ -1,201 +0,0 @@ - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. 
For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. 
The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. 
- - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "{}" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. - - Copyright {yyyy} {name of copyright owner} - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. From 7fdbf5bb7e0b6e9e6bdccac033e6532875e6e629 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Tue, 2 Aug 2022 21:31:53 -0600 Subject: [PATCH 158/260] Update minimum required version for xarray https://github.com/xarray-contrib/datatree/pull/138 --- xarray/datatree_/setup.cfg | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg index c59993b13cc..9a5664de397 100644 --- a/xarray/datatree_/setup.cfg +++ b/xarray/datatree_/setup.cfg @@ -22,7 +22,7 @@ classifiers = packages = find: python_requires = >=3.8 install_requires = - xarray >=2022.05.0.dev0 + xarray >=2022.6.0 [options.packages.find] exclude = From 42dbefd20c232fd759358417dc74f8417aaab490 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 2 Aug 2022 23:29:22 -0600 Subject: [PATCH 159/260] Bump codecov/codecov-action from 2.1.0 to 3.1.0 https://github.com/xarray-contrib/datatree/pull/139 --- xarray/datatree_/.github/workflows/main.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index f64e7d1b4f8..ac0b31e780c 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -45,7 +45,7 @@ jobs: python -m pytest --cov=./ --cov-report=xml --verbose - name: Upload code coverage to Codecov - uses: codecov/codecov-action@v2.1.0 + uses: codecov/codecov-action@v3.1.0 with: file: ./coverage.xml flags: unittests From be79a041254a5864a281484e8c6f2e2e2f5be8b4 Mon Sep 17 00:00:00 2001 From: Anderson Banihirwe Date: Wed, 3 Aug 2022 11:08:45 -0600 Subject: [PATCH 160/260] Use mamba & micromamba to speed up CI workflows https://github.com/xarray-contrib/datatree/pull/140 --- xarray/datatree_/.github/workflows/main.yaml | 63 +++++++++++--------- xarray/datatree_/codecov.yml | 1 + xarray/datatree_/readthedocs.yml | 16 ++--- 3 files changed, 41 insertions(+), 39 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index ac0b31e780c..b2b1ab56f31 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -15,32 +15,35 @@ jobs: test: name: ${{ matrix.python-version }}-build runs-on: ubuntu-latest + 
defaults: + run: + shell: bash -l {0} strategy: matrix: python-version: ["3.8", "3.9", "3.10"] steps: - uses: actions/checkout@v3 - - uses: conda-incubator/setup-miniconda@v2 + + - name: Create conda environment + uses: mamba-org/provision-with-micromamba@main with: - mamba-version: "*" - auto-update-conda: true - python-version: ${{ matrix.python-version }} - auto-activate-base: false - activate-environment: datatree + cache-downloads: true + micromamba-version: 'latest' environment-file: ci/environment.yml + extra-specs: | + python=${{ matrix.python-version }} + - name: Conda info - shell: bash -l {0} run: conda info - - name: Conda list - shell: bash -l {0} - run: conda list + - name: Install datatree - shell: bash -l {0} run: | - python -m pip install --no-deps -e . - python -m pip list + python -m pip install -e . --no-deps --force-reinstall + + - name: Conda list + run: conda list + - name: Running Tests - shell: bash -l {0} run: | python -m pytest --cov=./ --cov-report=xml --verbose @@ -57,34 +60,38 @@ jobs: test-upstream: name: ${{ matrix.python-version }}-dev-build runs-on: ubuntu-latest + defaults: + run: + shell: bash -l {0} strategy: matrix: python-version: ["3.8", "3.9", "3.10"] steps: - uses: actions/checkout@v3 - - uses: conda-incubator/setup-miniconda@v2 + + - name: Create conda environment + uses: mamba-org/provision-with-micromamba@main with: - mamba-version: "*" - auto-update-conda: true - python-version: ${{ matrix.python-version }} - auto-activate-base: false - activate-environment: datatree + cache-downloads: true + micromamba-version: 'latest' environment-file: ci/environment.yml + extra-specs: | + python=${{ matrix.python-version }} + - name: Conda info - shell: bash -l {0} run: conda info - - name: Conda list - shell: bash -l {0} - run: conda list + - name: Install dev reqs - shell: bash -l {0} run: | python -m pip install --no-deps --upgrade \ git+https://github.com/pydata/xarray \ git+https://github.com/Unidata/netcdf4-python - python -m pip install --no-deps -e . - python -m pip list + + python -m pip install -e . --no-deps --force-reinstall + + - name: Conda list + run: conda list + - name: Running Tests - shell: bash -l {0} run: | python -m pytest --verbose diff --git a/xarray/datatree_/codecov.yml b/xarray/datatree_/codecov.yml index 8b30905cf4c..44fd739d417 100644 --- a/xarray/datatree_/codecov.yml +++ b/xarray/datatree_/codecov.yml @@ -7,6 +7,7 @@ comment: false ignore: - 'datatree/tests/*' - 'setup.py' + - 'conftest.py' coverage: precision: 2 diff --git a/xarray/datatree_/readthedocs.yml b/xarray/datatree_/readthedocs.yml index d634f48e9ec..9b04939c898 100644 --- a/xarray/datatree_/readthedocs.yml +++ b/xarray/datatree_/readthedocs.yml @@ -1,13 +1,7 @@ version: 2 - -build: - image: latest - -# Optionally set the version of Python and requirements required to build your docs conda: - environment: ci/doc.yml - -python: - install: - - method: pip - path: . 
+ environment: ci/doc.yml +build: + os: 'ubuntu-20.04' + tools: + python: 'mambaforge-4.10' From fd55a11abd34aee1fc8ef3a45edef9fe3e23b7b0 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 23 Aug 2022 11:13:02 -0700 Subject: [PATCH 161/260] Add accessors https://github.com/xarray-contrib/datatree/pull/144 * test accessor * implement accessor * expose accessor * whatsnew * shut mypy up * fix test --- xarray/datatree_/datatree/__init__.py | 2 + xarray/datatree_/datatree/extensions.py | 20 ++++++++++ .../datatree/tests/test_extensions.py | 40 +++++++++++++++++++ xarray/datatree_/docs/source/api.rst | 3 +- xarray/datatree_/docs/source/whats-new.rst | 3 ++ 5 files changed, 66 insertions(+), 2 deletions(-) create mode 100644 xarray/datatree_/datatree/extensions.py create mode 100644 xarray/datatree_/datatree/tests/test_extensions.py diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index 8de251a423f..a8e29faa354 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -1,5 +1,6 @@ # import public API from .datatree import DataTree +from .extensions import register_datatree_accessor from .io import open_datatree from .mapping import TreeIsomorphismError, map_over_subtree @@ -16,5 +17,6 @@ "open_datatree", "TreeIsomorphismError", "map_over_subtree", + "register_datatree_accessor", "__version__", ) diff --git a/xarray/datatree_/datatree/extensions.py b/xarray/datatree_/datatree/extensions.py new file mode 100644 index 00000000000..f6f4e985a79 --- /dev/null +++ b/xarray/datatree_/datatree/extensions.py @@ -0,0 +1,20 @@ +from xarray.core.extensions import _register_accessor + +from .datatree import DataTree + + +def register_datatree_accessor(name): + """Register a custom accessor on DataTree objects. + + Parameters + ---------- + name : str + Name under which the accessor should be registered. A warning is issued + if this name conflicts with a preexisting attribute. + + See Also + -------- + xarray.register_dataarray_accessor + xarray.register_dataset_accessor + """ + return _register_accessor(name, DataTree) diff --git a/xarray/datatree_/datatree/tests/test_extensions.py b/xarray/datatree_/datatree/tests/test_extensions.py new file mode 100644 index 00000000000..b288998e2ce --- /dev/null +++ b/xarray/datatree_/datatree/tests/test_extensions.py @@ -0,0 +1,40 @@ +import pytest + +from datatree import DataTree, register_datatree_accessor + + +class TestAccessor: + def test_register(self) -> None: + @register_datatree_accessor("demo") + class DemoAccessor: + """Demo accessor.""" + + def __init__(self, xarray_obj): + self._obj = xarray_obj + + @property + def foo(self): + return "bar" + + dt: DataTree = DataTree() + assert dt.demo.foo == "bar" # type: ignore + + # accessor is cached + assert dt.demo is dt.demo # type: ignore + + # check descriptor + assert dt.demo.__doc__ == "Demo accessor." # type: ignore + # TODO: typing doesn't seem to work with accessors + assert DataTree.demo.__doc__ == "Demo accessor." 
# type: ignore + assert isinstance(dt.demo, DemoAccessor) # type: ignore + assert DataTree.demo is DemoAccessor # type: ignore + + with pytest.warns(Warning, match="overriding a preexisting attribute"): + + @register_datatree_accessor("demo") + class Foo: + pass + + # ensure we can remove it + del DataTree.demo # type: ignore + assert not hasattr(DataTree, "demo") diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 9ad741901c4..209d4ab9417 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -318,9 +318,8 @@ Relatively advanced API for users or developers looking to understand the intern :toctree: generated/ DataTree.variables + register_datatree_accessor .. - Missing: ``DataTree.set_close`` - ``register_datatree_accessor`` diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 514dda9e236..25bb86145a6 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,9 @@ v0.0.10 (unreleased) New Features ~~~~~~~~~~~~ +- Add the ability to register accessors on ``DataTree`` objects, by using ``register_datatree_accessor``. (:pull:`144`) + By `Tom Nicholas `_. + Breaking changes ~~~~~~~~~~~~~~~~ From 45203323139e9b5dbf78d9c4963995e31aa4f928 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 6 Sep 2022 11:28:22 -0400 Subject: [PATCH 162/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/148 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/psf/black: 22.6.0 → 22.8.0](https://github.com/psf/black/compare/22.6.0...22.8.0) - [github.com/keewis/blackdoc: v0.3.5 → v0.3.6](https://github.com/keewis/blackdoc/compare/v0.3.5...v0.3.6) - [github.com/PyCQA/flake8: 5.0.2 → 5.0.4](https://github.com/PyCQA/flake8/compare/5.0.2...5.0.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 224333b3837..4f145dba747 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -15,15 +15,15 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 22.6.0 + rev: 22.8.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc - rev: v0.3.5 + rev: v0.3.6 hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 - rev: 5.0.2 + rev: 5.0.4 hooks: - id: flake8 # - repo: https://github.com/Carreau/velin From 26c14b4e4cabc22f1ecc78f22101c9289771ac8a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 8 Sep 2022 09:54:30 -0600 Subject: [PATCH 163/260] Bump actions/checkout from 2 to 3 https://github.com/xarray-contrib/datatree/pull/149 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index b3bea263824..115fdb00095 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ 
b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -19,7 +19,7 @@ jobs: runs-on: ubuntu-latest if: github.repository == 'xarray-contrib/datatree' steps: - - uses: actions/checkout@v2 + - uses: actions/checkout@v3 with: fetch-depth: 0 - uses: actions/setup-python@v4 From 7367474debedc987c7201ffde229e79c2a0183fc Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 20 Sep 2022 09:45:40 -0600 Subject: [PATCH 164/260] Bump codecov/codecov-action from 3.1.0 to 3.1.1 https://github.com/xarray-contrib/datatree/pull/150 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index b2b1ab56f31..b18159aed50 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -48,7 +48,7 @@ jobs: python -m pytest --cov=./ --cov-report=xml --verbose - name: Upload code coverage to Codecov - uses: codecov/codecov-action@v3.1.0 + uses: codecov/codecov-action@v3.1.1 with: file: ./coverage.xml flags: unittests From dbe47a7178b68a4970cf591cd30d66a88f580191 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 3 Oct 2022 17:12:18 -0400 Subject: [PATCH 165/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/153 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/keewis/blackdoc: v0.3.6 → v0.3.7](https://github.com/keewis/blackdoc/compare/v0.3.6...v0.3.7) - [github.com/pre-commit/mirrors-mypy: v0.971 → v0.981](https://github.com/pre-commit/mirrors-mypy/compare/v0.971...v0.981) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 4f145dba747..0702d44690a 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -19,7 +19,7 @@ repos: hooks: - id: black - repo: https://github.com/keewis/blackdoc - rev: v0.3.6 + rev: v0.3.7 hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.971 + rev: v0.981 hooks: - id: mypy # Copied from setup.cfg From aa9eefda4884473b4e24890604995ec5cea131b8 Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Mon, 7 Nov 2022 21:04:24 +0100 Subject: [PATCH 166/260] add `DataTree.pipe` to allow chaining `DataTree` consuming functions https://github.com/xarray-contrib/datatree/pull/156 * add a minimal pipe implementation * copy the code of Dataset.pipe * create a documentation page * add tests for `pipe` * whats-new.rst --- xarray/datatree_/datatree/datatree.py | 60 +++++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 32 ++++++++++ xarray/datatree_/docs/source/api.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 2 + 4 files changed, 95 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 6f78d8c8c67..ff7a417b5ad 100644 --- a/xarray/datatree_/datatree/datatree.py +++ 
b/xarray/datatree_/datatree/datatree.py @@ -1014,6 +1014,66 @@ def map_over_subtree_inplace( if node.has_data: node.ds = func(node.ds, *args, **kwargs) + def pipe( + self, func: Callable | tuple[Callable, str], *args: Any, **kwargs: Any + ) -> Any: + """Apply ``func(self, *args, **kwargs)`` + + This method replicates the pandas method of the same name. + + Parameters + ---------- + func : callable + function to apply to this xarray object (Dataset/DataArray). + ``args``, and ``kwargs`` are passed into ``func``. + Alternatively a ``(callable, data_keyword)`` tuple where + ``data_keyword`` is a string indicating the keyword of + ``callable`` that expects the xarray object. + *args + positional arguments passed into ``func``. + **kwargs + a dictionary of keyword arguments passed into ``func``. + + Returns + ------- + object : Any + the return type of ``func``. + + Notes + ----- + Use ``.pipe`` when chaining together functions that expect + xarray or pandas objects, e.g., instead of writing + + .. code:: python + + f(g(h(dt), arg1=a), arg2=b, arg3=c) + + You can write + + .. code:: python + + (dt.pipe(h).pipe(g, arg1=a).pipe(f, arg2=b, arg3=c)) + + If you have a function that takes the data as (say) the second + argument, pass a tuple indicating which keyword expects the + data. For example, suppose ``f`` takes its data as ``arg2``: + + .. code:: python + + (dt.pipe(h).pipe(g, arg1=a).pipe((f, "arg2"), arg1=a, arg3=c)) + + """ + if isinstance(func, tuple): + func, target = func + if target in kwargs: + raise ValueError( + f"{target} is both the pipe target and a keyword argument" + ) + kwargs[target] = self + else: + args = (self,) + args + return func(*args, **kwargs) + def render(self): """Print tree structure, including any data stored at each node.""" for pre, fill, node in RenderTree(self): diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 82831977f42..aa78c2671d6 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -448,3 +448,35 @@ def test_arithmetic(self, create_test_datatree): class TestRestructuring: ... + + +class TestPipe: + def test_noop(self, create_test_datatree): + dt = create_test_datatree() + + actual = dt.pipe(lambda tree: tree) + assert actual.identical(dt) + + def test_params(self, create_test_datatree): + dt = create_test_datatree() + + def f(tree, **attrs): + return tree.assign(arr_with_attrs=xr.Variable("dim0", [], attrs=attrs)) + + attrs = {"x": 1, "y": 2, "z": 3} + + actual = dt.pipe(f, **attrs) + assert actual["arr_with_attrs"].attrs == attrs + + def test_named_self(self, create_test_datatree): + dt = create_test_datatree() + + def f(x, tree, y): + tree.attrs.update({"x": x, "y": y}) + return tree + + attrs = {"x": 1, "y": 2} + + actual = dt.pipe((f, "tree"), **attrs) + + assert actual is dt and actual.attrs == attrs diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 209d4ab9417..49caaea86c5 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -96,6 +96,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure. 
DataTree.iter_lineage DataTree.find_common_ancestor map_over_subtree + DataTree.pipe DataTree Contents ----------------- diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 25bb86145a6..daee3fd8ff7 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -25,6 +25,8 @@ New Features - Add the ability to register accessors on ``DataTree`` objects, by using ``register_datatree_accessor``. (:pull:`144`) By `Tom Nicholas `_. +- Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). + By `Justus Magin `_. Breaking changes ~~~~~~~~~~~~~~~~ From c57214c8d8eebd45dda8feec77d0cbd9c20e2c07 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 7 Nov 2022 16:45:19 -0500 Subject: [PATCH 167/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/157 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/psf/black: 22.8.0 → 22.10.0](https://github.com/psf/black/compare/22.8.0...22.10.0) - [github.com/keewis/blackdoc: v0.3.7 → v0.3.8](https://github.com/keewis/blackdoc/compare/v0.3.7...v0.3.8) - [github.com/pre-commit/mirrors-mypy: v0.981 → v0.982](https://github.com/pre-commit/mirrors-mypy/compare/v0.981...v0.982) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 0702d44690a..88f16dddddd 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -15,11 +15,11 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 22.8.0 + rev: 22.10.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc - rev: v0.3.7 + rev: v0.3.8 hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.981 + rev: v0.982 hooks: - id: mypy # Copied from setup.cfg From 340e8a8c79a79cc316907c842ef593552858abfd Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 8 Nov 2022 17:56:59 -0500 Subject: [PATCH 168/260] Added docs page on io https://github.com/xarray-contrib/datatree/pull/158 * added page on io * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew * fix typo which inverted meaning of key sentence * fix headings and link * add missing links * fix final link Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/docs/source/index.rst | 1 + xarray/datatree_/docs/source/io.rst | 52 ++++++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 3 ++ 3 files changed, 56 insertions(+) create mode 100644 xarray/datatree_/docs/source/io.rst diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index f3e12e091cd..0f28aa60f68 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -12,6 +12,7 @@ Datatree Quick Overview Tutorial Data Model + Reading and Writing Files API Reference How do I ... 
Contributing Guide diff --git a/xarray/datatree_/docs/source/io.rst b/xarray/datatree_/docs/source/io.rst new file mode 100644 index 00000000000..be43f851396 --- /dev/null +++ b/xarray/datatree_/docs/source/io.rst @@ -0,0 +1,52 @@ +.. _data structures: + +Reading and Writing Files +========================= + +.. note:: + + This page builds on the information given in xarray's main page on + `reading and writing files `_, + so it is suggested that you are familiar with those first. + + +netCDF +------ + +Groups +~~~~~~ + +Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded +as a single ``:py:class::DataTree`` object. +To open a whole netCDF file as a tree of groups use the ``:py:func::open_datatree()`` function. +To save a DataTree object as a netCDF file containing many groups, use the ``:py:meth::DataTree.to_netcdf()`` method. + + +.. _netcdf.group.warning: + +.. warning:: + ``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping + is not always possible. + + In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. + This is in contrast to `xarray's data model `_ + (and hence :ref:`datatree's data model`) in which the dimensions of a (Dataset/Tree) + object are simply the set of dimensions present across all variables in that dataset. + + This means that if a netCDF file contains dimensions but no variables which possess those dimensions, + these dimensions will not be present when that file is opened as a DataTree object. + Saving this DataTree object to file will therefore not preserve these "unused" dimensions. + +Zarr +---- + +Groups +~~~~~~ + +Nested groups in zarr stores can be represented by loading the store as a ``:py:class::DataTree`` object, similarly to netCDF. +To open a whole zarr store as a tree of groups use the ``:py:func::open_datatree()`` function. +To save a DataTree object as a zarr store containing many groups, use the ``:py:meth::DataTree.to_zarr()`` method. + +.. note:: + Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files`), + as zarr does not support "unused" dimensions. diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index daee3fd8ff7..f79d983465a 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -40,6 +40,9 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Added ``Reading and Writing Files`` page. (:pull:`158`) + By `Tom Nicholas `_. + Internal Changes ~~~~~~~~~~~~~~~~ From ac1188362a80489c8b3efa639d3ea60d3455ef3e Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 1 Dec 2022 10:30:55 -0500 Subject: [PATCH 169/260] Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.5.2 https://github.com/xarray-contrib/datatree/pull/162 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.1...v1.5.2) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 115fdb00095..f5440e2e5f1 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.5.1 + uses: pypa/gh-action-pypi-publish@v1.5.2 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From 1476a9a63668cc61bcfec7c8694caf610db0b2a4 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 5 Dec 2022 11:07:53 -0500 Subject: [PATCH 170/260] Bump pypa/gh-action-pypi-publish from 1.5.2 to 1.6.1 https://github.com/xarray-contrib/datatree/pull/163 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.5.2 to 1.6.1. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.2...v1.6.1) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index f5440e2e5f1..36981729c8d 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.5.2 + uses: pypa/gh-action-pypi-publish@v1.6.1 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From 672c4f229c926782916c0b603280e61aca9d9b93 Mon Sep 17 00:00:00 2001 From: William Roberts <38170479+wroberts4@users.noreply.github.com> Date: Tue, 6 Dec 2022 14:37:23 -0600 Subject: [PATCH 171/260] Fix reading from fsspec s3 https://github.com/xarray-contrib/datatree/pull/130 * Fix pointer not at the start of the file error * whatsnew Co-authored-by: William Roberts Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/io.py | 4 ++-- xarray/datatree_/docs/source/whats-new.rst | 4 ++++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 8460a8979a4..fe18456efe3 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -63,9 +63,9 @@ def open_datatree(filename_or_obj, engine=None, **kwargs) -> DataTree: def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: ncDataset = _get_nc_dataset_class(kwargs.get("engine", None)) + ds = open_dataset(filename, **kwargs) + tree_root = DataTree.from_dict({"/": ds}) with ncDataset(filename, mode="r") as ncds: - ds = open_dataset(filename, **kwargs) - tree_root = DataTree.from_dict({"/": ds}) for path in _iter_nc_groups(ncds): subgroup_ds = open_dataset(filename, group=path, **kwargs) diff --git 
a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index f79d983465a..8d9d6573232 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -46,6 +46,10 @@ Documentation Internal Changes ~~~~~~~~~~~~~~~~ +- Avoid reading from same file twice with fsspec3 (:pull:`130`) + By `William Roberts `_. + + .. _whats-new.v0.0.9: v0.0.9 (07/14/2022) From 885ccaf679f3f27ce505c7ce24ab847c6b6dbf76 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 7 Dec 2022 12:52:44 -0500 Subject: [PATCH 172/260] Bump pypa/gh-action-pypi-publish from 1.6.1 to 1.6.4 https://github.com/xarray-contrib/datatree/pull/166 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.6.1 to 1.6.4. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.1...v1.6.4) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 36981729c8d..98860accf70 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.6.1 + uses: pypa/gh-action-pypi-publish@v1.6.4 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From 34e6de8d850e1f291305f421e18241c52979b663 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 7 Dec 2022 13:23:55 -0500 Subject: [PATCH 173/260] remove implicit optionals for PEP 484 --- xarray/datatree_/datatree/datatree.py | 58 +++++++++++++------------- xarray/datatree_/datatree/iterators.py | 8 ++-- xarray/datatree_/datatree/treenode.py | 10 +++-- 3 files changed, 39 insertions(+), 37 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index ff7a417b5ad..da53b7e363e 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -114,9 +114,9 @@ class DatasetView(Dataset): def __init__( self, - data_vars: Mapping[Any, Any] = None, - coords: Mapping[Any, Any] = None, - attrs: Mapping[Any, Any] = None, + data_vars: Optional[Mapping[Any, Any]] = None, + coords: Optional[Mapping[Any, Any]] = None, + attrs: Optional[Mapping[Any, Any]] = None, ): raise AttributeError("DatasetView objects are not to be initialized directly") @@ -173,11 +173,11 @@ def _construct_direct( cls, variables: dict[Any, Variable], coord_names: set[Hashable], - dims: dict[Any, int] = None, - attrs: dict = None, - indexes: dict[Any, Index] = None, - encoding: dict = None, - close: Callable[[], None] = None, + dims: Optional[dict[Any, int]] = None, + attrs: Optional[dict] = None, + indexes: Optional[dict[Any, Index]] = None, + encoding: Optional[dict] = None, + close: Optional[Callable[[], None]] = None, ) -> Dataset: """ Overriding this method (along with ._replace) and modifying it to return a Dataset object @@ -199,11 
+199,11 @@ def _construct_direct( def _replace( self, - variables: dict[Hashable, Variable] = None, - coord_names: set[Hashable] = None, - dims: dict[Any, int] = None, + variables: Optional[dict[Hashable, Variable]] = None, + coord_names: Optional[set[Hashable]] = None, + dims: Optional[dict[Any, int]] = None, attrs: dict[Hashable, Any] | None | Default = _default, - indexes: dict[Hashable, Index] = None, + indexes: Optional[dict[Hashable, Index]] = None, encoding: dict | None | Default = _default, inplace: bool = False, ) -> Dataset: @@ -288,10 +288,10 @@ class DataTree( def __init__( self, - data: Dataset | DataArray = None, - parent: DataTree = None, - children: Mapping[str, DataTree] = None, - name: str = None, + data: Optional[Dataset | DataArray] = None, + parent: Optional[DataTree] = None, + children: Optional[Mapping[str, DataTree]] = None, + name: Optional[str] = None, ): """ Create a single node of a DataTree. @@ -368,7 +368,7 @@ def ds(self) -> DatasetView: return DatasetView._from_node(self) @ds.setter - def ds(self, data: Union[Dataset, DataArray] = None) -> None: + def ds(self, data: Optional[Union[Dataset, DataArray]] = None) -> None: ds = _coerce_to_dataset(data) @@ -518,14 +518,14 @@ def _construct_direct( cls, variables: dict[Any, Variable], coord_names: set[Hashable], - dims: dict[Any, int] = None, - attrs: dict = None, - indexes: dict[Any, Index] = None, - encoding: dict = None, + dims: Optional[dict[Any, int]] = None, + attrs: Optional[dict] = None, + indexes: Optional[dict[Any, Index]] = None, + encoding: Optional[dict] = None, name: str | None = None, parent: DataTree | None = None, - children: OrderedDict[str, DataTree] = None, - close: Callable[[], None] = None, + children: Optional[OrderedDict[str, DataTree]] = None, + close: Optional[Callable[[], None]] = None, ) -> DataTree: """Shortcut around __init__ for internal use when we want to skip costly validation.""" @@ -555,15 +555,15 @@ def _construct_direct( def _replace( self: DataTree, - variables: dict[Hashable, Variable] = None, - coord_names: set[Hashable] = None, - dims: dict[Any, int] = None, + variables: Optional[dict[Hashable, Variable]] = None, + coord_names: Optional[set[Hashable]] = None, + dims: Optional[dict[Any, int]] = None, attrs: dict[Hashable, Any] | None | Default = _default, - indexes: dict[Hashable, Index] = None, + indexes: Optional[dict[Hashable, Index]] = None, encoding: dict | None | Default = _default, name: str | None | Default = _default, parent: DataTree | None = _default, - children: OrderedDict[str, DataTree] = None, + children: Optional[OrderedDict[str, DataTree]] = None, inplace: bool = False, ) -> DataTree: """ @@ -755,7 +755,7 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: def from_dict( cls, d: MutableMapping[str, Dataset | DataArray | None], - name: str = None, + name: Optional[str] = None, ) -> DataTree: """ Create a datatree from a dictionary of data objects, organised by paths into the tree. 
diff --git a/xarray/datatree_/datatree/iterators.py b/xarray/datatree_/datatree/iterators.py index e2c6b4d3fde..52ed8d22422 100644 --- a/xarray/datatree_/datatree/iterators.py +++ b/xarray/datatree_/datatree/iterators.py @@ -1,6 +1,6 @@ from abc import abstractmethod from collections import abc -from typing import Callable, Iterator, List +from typing import Callable, Iterator, List, Optional from .treenode import Tree @@ -11,9 +11,9 @@ class AbstractIter(abc.Iterator): def __init__( self, node: Tree, - filter_: Callable = None, - stop: Callable = None, - maxlevel: int = None, + filter_: Optional[Callable] = None, + stop: Optional[Callable] = None, + maxlevel: Optional[int] = None, ): """ Iterate over tree starting at `node`. diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 16ffecc261b..9109205d51e 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -76,7 +76,7 @@ class TreeNode(Generic[Tree]): _parent: Optional[Tree] _children: OrderedDict[str, Tree] - def __init__(self, children: Mapping[str, Tree] = None): + def __init__(self, children: Optional[Mapping[str, Tree]] = None): """Create a parentless node.""" self._parent = None self._children = OrderedDict() @@ -88,7 +88,9 @@ def parent(self) -> Tree | None: """Parent of this node.""" return self._parent - def _set_parent(self, new_parent: Tree | None, child_name: str = None) -> None: + def _set_parent( + self, new_parent: Tree | None, child_name: Optional[str] = None + ) -> None: # TODO is it possible to refactor in a way that removes this private method? if new_parent is not None and not isinstance(new_parent, TreeNode): @@ -134,7 +136,7 @@ def _detach(self, parent: Tree | None) -> None: self._parent = None self._post_detach(parent) - def _attach(self, parent: Tree | None, child_name: str = None) -> None: + def _attach(self, parent: Tree | None, child_name: Optional[str] = None) -> None: if parent is not None: if child_name is None: raise ValueError( @@ -315,7 +317,7 @@ def _post_attach(self: Tree, parent: Tree) -> None: """Method call after attaching to `parent`.""" pass - def get(self: Tree, key: str, default: Tree = None) -> Optional[Tree]: + def get(self: Tree, key: str, default: Optional[Tree] = None) -> Optional[Tree]: """ Return the child node with the specified key. 
From 20eb9d968e7e7c192b17e512fc246c1ee038cf72 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 7 Dec 2022 13:56:41 -0500 Subject: [PATCH 174/260] remove comments from pre-commit ci config --- xarray/datatree_/.pre-commit-config.yaml | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 88f16dddddd..879cba1f3a0 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -1,4 +1,3 @@ -# https://pre-commit.com/ ci: autoupdate_schedule: monthly repos: @@ -8,12 +7,10 @@ repos: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml - # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort rev: 5.10.1 hooks: - id: isort - # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black rev: 22.10.0 hooks: @@ -26,33 +23,16 @@ repos: rev: 5.0.4 hooks: - id: flake8 - # - repo: https://github.com/Carreau/velin - # rev: 0.0.8 - # hooks: - # - id: velin - # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy rev: v0.982 hooks: - id: mypy - # Copied from setup.cfg exclude: "properties|asv_bench|docs" additional_dependencies: [ - # Type stubs types-python-dateutil, types-pkg_resources, types-PyYAML, types-pytz, - # Dependencies that are typed numpy, typing-extensions==3.10.0.0, ] - # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 - # - repo: https://github.com/asottile/pyupgrade - # rev: v1.22.1 - # hooks: - # - id: pyupgrade - # args: - # - "--py3-only" - # # remove on f-strings in Py3.7 - # - "--keep-percent-format" From 91bc86d69ecb9c6920e746589b7198635f3db6bb Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 7 Dec 2022 14:04:38 -0500 Subject: [PATCH 175/260] Revert "remove comments from pre-commit ci config" This reverts commit 20eb9d968e7e7c192b17e512fc246c1ee038cf72. 
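The change above ("remove implicit optionals for PEP 484") spells out ``Optional[...]`` wherever a parameter defaults to ``None``. PEP 484 originally let type checkers infer that a ``None`` default implied an optional type, but recent mypy releases (roughly 0.990 onwards, matching the mypy bump later in this series) reject that shorthand by default, so the annotation must name ``None`` explicitly. A minimal, self-contained sketch of the pattern — ``greet`` is a hypothetical illustration, not part of the datatree codebase:

    from typing import Optional


    # Implicit optional (rejected by mypy's default no-implicit-optional
    # behaviour): the declared type ``str`` does not admit the ``None`` default.
    # def greet(name: str = None) -> str: ...

    # Explicit optional: the annotation states that ``None`` is a valid value.
    def greet(name: Optional[str] = None) -> str:
        if name is None:
            return "Hello!"
        return f"Hello, {name}!"


    print(greet())       # -> Hello!
    print(greet("Tom"))  # -> Hello, Tom!

On Python 3.10+ (or under ``from __future__ import annotations``) the same intent can also be written ``name: str | None = None``.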
--- xarray/datatree_/.pre-commit-config.yaml | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 879cba1f3a0..88f16dddddd 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -1,3 +1,4 @@ +# https://pre-commit.com/ ci: autoupdate_schedule: monthly repos: @@ -7,10 +8,12 @@ repos: - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml + # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort rev: 5.10.1 hooks: - id: isort + # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black rev: 22.10.0 hooks: @@ -23,16 +26,33 @@ repos: rev: 5.0.4 hooks: - id: flake8 + # - repo: https://github.com/Carreau/velin + # rev: 0.0.8 + # hooks: + # - id: velin + # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy rev: v0.982 hooks: - id: mypy + # Copied from setup.cfg exclude: "properties|asv_bench|docs" additional_dependencies: [ + # Type stubs types-python-dateutil, types-pkg_resources, types-PyYAML, types-pytz, + # Dependencies that are typed numpy, typing-extensions==3.10.0.0, ] + # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 + # - repo: https://github.com/asottile/pyupgrade + # rev: v1.22.1 + # hooks: + # - id: pyupgrade + # args: + # - "--py3-only" + # # remove on f-strings in Py3.7 + # - "--keep-percent-format" From 4de53069394c6d71a75564477d478ef763a0987c Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 7 Dec 2022 14:05:43 -0500 Subject: [PATCH 176/260] un-inlined comments in flake8 config --- xarray/datatree_/setup.cfg | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg index 9a5664de397..2c7a052b197 100644 --- a/xarray/datatree_/setup.cfg +++ b/xarray/datatree_/setup.cfg @@ -33,11 +33,16 @@ exclude = [flake8] ignore = - E203 # whitespace before ':' - doesn't work well with black - E402 # module level import not at top of file - E501 # line too long - let black worry about that - E731 # do not assign a lambda expression, use a def - W503 # line break before binary operator + # whitespace before ':' - doesn't work well with black + E203 + # module level import not at top of file + E402 + # line too long - let black worry about that + E501 + # do not assign a lambda expression, use a def + E731 + # line break before binary operator + W503 exclude= .eggs doc From 6a7dae3efd2a31bee2aebfa90e495890066d27a3 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 7 Dec 2022 14:09:59 -0500 Subject: [PATCH 177/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/165 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.4.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.4.0) - [github.com/PyCQA/flake8: 5.0.4 → 6.0.0](https://github.com/PyCQA/flake8/compare/5.0.4...6.0.0) - [github.com/pre-commit/mirrors-mypy: v0.982 → v0.991](https://github.com/pre-commit/mirrors-mypy/compare/v0.982...v0.991) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tom Nicholas --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 
file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 88f16dddddd..bc7eafc1082 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -3,7 +3,7 @@ ci: autoupdate_schedule: monthly repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.3.0 + rev: v4.4.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -23,7 +23,7 @@ repos: hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 - rev: 5.0.4 + rev: 6.0.0 hooks: - id: flake8 # - repo: https://github.com/Carreau/velin @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.982 + rev: v0.991 hooks: - id: mypy # Copied from setup.cfg From 245bb8c4646a6eb11ffbc8746868134968764093 Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Wed, 7 Dec 2022 20:27:36 +0100 Subject: [PATCH 178/260] actually allow `DataTree` objects as values in `from_dict` https://github.com/xarray-contrib/datatree/pull/159 * allow passing `DataTree` objects as `dict` values * add a test verifying that DataTree objects are actually allowed * ignore mypy error with copied copy method * whatsnew Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/datatree.py | 9 +++++++-- xarray/datatree_/datatree/tests/test_datatree.py | 9 +++++++++ xarray/datatree_/docs/source/whats-new.rst | 3 +++ 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index da53b7e363e..5d588da1193 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -754,7 +754,7 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: @classmethod def from_dict( cls, - d: MutableMapping[str, Dataset | DataArray | None], + d: MutableMapping[str, Dataset | DataArray | DataTree | None], name: Optional[str] = None, ) -> DataTree: """ @@ -790,7 +790,12 @@ def from_dict( for path, data in d.items(): # Create and set new node node_name = NodePath(path).name - new_node = cls(name=node_name, data=data) + if isinstance(data, cls): + # TODO ignoring type error only needed whilst .copy() method is copied from Dataset.copy(). + new_node = data.copy() # type: ignore[attr-defined] + new_node.orphan() + else: + new_node = cls(name=node_name, data=data) obj._set_item( path, new_node, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index aa78c2671d6..f7860037d78 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -391,6 +391,15 @@ def test_full(self, simple_datatree): "/set3", ] + def test_datatree_values(self): + dat1 = DataTree(data=xr.Dataset({"a": 1})) + expected = DataTree() + expected["a"] = dat1 + + actual = DataTree.from_dict({"a": dat1}) + + dtt.assert_identical(actual, expected) + def test_roundtrip(self, simple_datatree): dt = simple_datatree roundtrip = DataTree.from_dict(dt.to_dict()) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 8d9d6573232..cfe73a509ce 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -37,6 +37,9 @@ Deprecations Bug fixes ~~~~~~~~~ +- Allow ``Datatree`` objects as values in :py:meth:`DataTree.from_dict` (:pull:`159`). + By `Justus Magin `_. 
+ Documentation ~~~~~~~~~~~~~ From 6dabe6d3380a5d7a0ec3390212b01a7f2417aeb4 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 7 Dec 2022 14:40:21 -0500 Subject: [PATCH 179/260] whatsnew after v0.0.10 relrease --- xarray/datatree_/docs/source/whats-new.rst | 44 +++++++++++++++++++++- 1 file changed, 42 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index cfe73a509ce..4c6ade30b68 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,10 +15,50 @@ What's New np.random.seed(123456) +.. _whats-new.v0.0.11: + +v0.0.11 (unreleased) +-------------------- + +New Features +~~~~~~~~~~~~ + +- Add the ability to register accessors on ``DataTree`` objects, by using ``register_datatree_accessor``. (:pull:`144`) + By `Tom Nicholas `_. +- Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). + By `Justus Magin `_. + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +- Allow ``Datatree`` objects as values in :py:meth:`DataTree.from_dict` (:pull:`159`). + By `Justus Magin `_. + +Documentation +~~~~~~~~~~~~~ + +- Added ``Reading and Writing Files`` page. (:pull:`158`) + By `Tom Nicholas `_. + +Internal Changes +~~~~~~~~~~~~~~~~ + +- Avoid reading from same file twice with fsspec3 (:pull:`130`) + By `William Roberts `_. + + .. _whats-new.v0.0.10: -v0.0.10 (unreleased) -------------------- +v0.0.10 (12/07/2022) +-------------------- + +Adds accessors and a `.pipe()` method. New Features ~~~~~~~~~~~~ From 77b7c5c5cedfdeff3e26b60ba0f5cd90c7fa411f Mon Sep 17 00:00:00 2001 From: Abel Aoun Date: Mon, 12 Dec 2022 19:12:21 +0100 Subject: [PATCH 180/260] Update README.md https://github.com/xarray-contrib/datatree/pull/167 Fix link to anytree issue. Co-authored-by: Tom Nicholas --- xarray/datatree_/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index bd6cc1b4bdd..a770fc27b3e 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -13,7 +13,7 @@ This aims to create the data structure discussed in [xarray issue #4118](https:/ The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. 
The differences are that this effort: -- Uses a node structure inspired by [anytree](https://github.com/TomNicholas/datatree/issues/7) for the tree, +- Uses a node structure inspired by [anytree](https://github.com/xarray-contrib/datatree/issues/7) for the tree, - Implements path-like getting and setting, - Has functions for mapping user-supplied functions over every node in the tree, - Automatically dispatches *some* of `xarray.Dataset`'s API over every node in the tree (such as `.isel`), From 6441658abb3ca53cdfc26efbf93593b690996e04 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 27 Dec 2022 20:34:26 -0500 Subject: [PATCH 181/260] relative_to method bugfix https://github.com/xarray-contrib/datatree/pull/160 * moved tests to test datatree instead of treenode * fix bug with assigning name to undefined children * clean up test * fix original relative_to bug by changing checks in ancestors and lineage * whatsnew --- xarray/datatree_/datatree/datatree.py | 9 +-- .../datatree_/datatree/tests/test_datatree.py | 51 +++++++++++++++ .../datatree_/datatree/tests/test_treenode.py | 50 --------------- xarray/datatree_/datatree/treenode.py | 62 +++++++++++-------- xarray/datatree_/docs/source/whats-new.rst | 2 + 5 files changed, 95 insertions(+), 79 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 5d588da1193..606c7935e8e 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -326,10 +326,7 @@ def __init__( ds = _coerce_to_dataset(data) _check_for_name_collisions(children, ds.variables) - # set tree attributes - super().__init__(children=children) - self.name = name - self.parent = parent + super().__init__(name=name) # set data attributes self._replace( @@ -343,6 +340,10 @@ def __init__( ) self._close = ds._close + # set tree attributes (must happen after variables set to avoid initialization errors) + self.children = children + self.parent = parent + @property def parent(self: DataTree) -> DataTree | None: """Parent of this node.""" diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index f7860037d78..71d25a54045 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -62,6 +62,57 @@ def test_create_full_tree(self, simple_datatree): assert root.identical(expected) +class TestNames: + def test_child_gets_named_on_attach(self): + sue = DataTree() + mary = DataTree(children={"Sue": sue}) # noqa + assert sue.name == "Sue" + + @pytest.mark.xfail(reason="requires refactoring to retain name") + def test_grafted_subtree_retains_name(self): + subtree = DataTree("original") + root = DataTree(children={"new_name": subtree}) # noqa + assert subtree.name == "original" + + +class TestPaths: + def test_path_property(self): + sue = DataTree() + mary = DataTree(children={"Sue": sue}) + john = DataTree(children={"Mary": mary}) # noqa + assert sue.path == "/Mary/Sue" + assert john.path == "/" + + def test_path_roundtrip(self): + sue = DataTree() + mary = DataTree(children={"Sue": sue}) + john = DataTree(children={"Mary": mary}) # noqa + assert john[sue.path] is sue + + def test_same_tree(self): + mary = DataTree() + kate = DataTree() + john = DataTree(children={"Mary": mary, "Kate": kate}) # noqa + assert mary.same_tree(kate) + + def test_relative_paths(self): + sue = DataTree() + mary = DataTree(children={"Sue": sue}) + annie = DataTree() + john = DataTree(children={"Mary": mary, 
"Annie": annie}) + + result = sue.relative_to(john) + assert result == "Mary/Sue" + assert john.relative_to(sue) == "../.." + assert annie.relative_to(sue) == "../../Annie" + assert sue.relative_to(annie) == "../Mary/Sue" + assert sue.relative_to(sue) == "." + + evil_kate = DataTree() + with pytest.raises(ValueError, match="nodes do not lie within the same tree"): + sue.relative_to(evil_kate) + + class TestStoreDatasets: def test_create_with_data(self): dat = xr.Dataset({"a": 0}) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 2c2a50961ae..a38f4b2c070 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -224,56 +224,6 @@ def test_del_child(self): del john["Mary"] -class TestNames: - def test_child_gets_named_on_attach(self): - sue = NamedNode() - mary = NamedNode(children={"Sue": sue}) # noqa - assert sue.name == "Sue" - - @pytest.mark.xfail(reason="requires refactoring to retain name") - def test_grafted_subtree_retains_name(self): - subtree = NamedNode("original") - root = NamedNode(children={"new_name": subtree}) # noqa - assert subtree.name == "original" - - -class TestPaths: - def test_path_property(self): - sue = NamedNode() - mary = NamedNode(children={"Sue": sue}) - john = NamedNode(children={"Mary": mary}) # noqa - assert sue.path == "/Mary/Sue" - assert john.path == "/" - - def test_path_roundtrip(self): - sue = NamedNode() - mary = NamedNode(children={"Sue": sue}) - john = NamedNode(children={"Mary": mary}) # noqa - assert john._get_item(sue.path) == sue - - def test_same_tree(self): - mary = NamedNode() - kate = NamedNode() - john = NamedNode(children={"Mary": mary, "Kate": kate}) # noqa - assert mary.same_tree(kate) - - def test_relative_paths(self): - sue = NamedNode() - mary = NamedNode(children={"Sue": sue}) - annie = NamedNode() - john = NamedNode(children={"Mary": mary, "Annie": annie}) - - assert sue.relative_to(john) == "Mary/Sue" - assert john.relative_to(sue) == "../.." - assert annie.relative_to(sue) == "../../Annie" - assert sue.relative_to(annie) == "../Mary/Sue" - assert sue.relative_to(sue) == "." - - evil_kate = NamedNode() - with pytest.raises(ValueError, match="nodes do not lie within the same tree"): - sue.relative_to(evil_kate) - - def create_test_tree(): f = NamedNode() b = NamedNode() diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 9109205d51e..d24c5c67a90 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -467,30 +467,6 @@ def same_tree(self, other: Tree) -> bool: """True if other node is in the same tree as this node.""" return self.root is other.root - def find_common_ancestor(self, other: Tree) -> Tree: - """ - Find the first common ancestor of two nodes in the same tree. - - Raise ValueError if they are not in the same tree. 
- """ - common_ancestor = None - for node in other.iter_lineage(): - if node in self.ancestors: - common_ancestor = node - break - - if not common_ancestor: - raise ValueError( - "Cannot find relative path because nodes do not lie within the same tree" - ) - - return common_ancestor - - def _path_to_ancestor(self, ancestor: Tree) -> NodePath: - generation_gap = list(self.lineage).index(ancestor) - path_upwards = "../" * generation_gap if generation_gap > 0 else "/" - return NodePath(path_upwards) - class NamedNode(TreeNode, Generic[Tree]): """ @@ -553,7 +529,7 @@ def relative_to(self: NamedNode, other: NamedNode) -> str: ) this_path = NodePath(self.path) - if other in self.lineage: + if other.path in list(ancestor.path for ancestor in self.lineage): return str(this_path.relative_to(other.path)) else: common_ancestor = self.find_common_ancestor(other) @@ -561,3 +537,39 @@ def relative_to(self: NamedNode, other: NamedNode) -> str: return str( path_to_common_ancestor / this_path.relative_to(common_ancestor.path) ) + + def find_common_ancestor(self, other: NamedNode) -> NamedNode: + """ + Find the first common ancestor of two nodes in the same tree. + + Raise ValueError if they are not in the same tree. + """ + common_ancestor = None + for node in other.iter_lineage(): + if node.path in [ancestor.path for ancestor in self.ancestors]: + common_ancestor = node + break + + if not common_ancestor: + raise ValueError( + "Cannot find relative path because nodes do not lie within the same tree" + ) + + return common_ancestor + + def _path_to_ancestor(self, ancestor: NamedNode) -> NodePath: + """Return the relative path from this node to the given ancestor node""" + + if not self.same_tree(ancestor): + raise ValueError( + "Cannot find relative path to ancestor because nodes do not lie within the same tree" + ) + if ancestor.path not in list(a.path for a in self.ancestors): + raise ValueError( + "Cannot find relative path to ancestor because given node is not an ancestor of this node" + ) + + lineage_paths = list(ancestor.path for ancestor in self.lineage) + generation_gap = list(lineage_paths).index(ancestor.path) + path_upwards = "../" * generation_gap if generation_gap > 0 else "/" + return NodePath(path_upwards) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 4c6ade30b68..dda5a4c5489 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -39,6 +39,8 @@ Bug fixes - Allow ``Datatree`` objects as values in :py:meth:`DataTree.from_dict` (:pull:`159`). By `Justus Magin `_. +- Fix bug with :py:meth:`DataTree.relative_to` method (:issue:`133`, :pull:`160`). + By `Tom Nicholas `_. 
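For reference, a minimal sketch of the repaired ``relative_to`` behaviour, adapted from the tests added in this patch (the node names are purely illustrative)::

    from datatree import DataTree

    sue = DataTree()
    mary = DataTree(children={"Sue": sue})
    annie = DataTree()
    john = DataTree(children={"Mary": mary, "Annie": annie})

    sue.relative_to(john)   # -> "Mary/Sue"
    john.relative_to(sue)   # -> "../.."
    annie.relative_to(sue)  # -> "../../Annie"
    sue.relative_to(sue)    # -> "."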
Documentation ~~~~~~~~~~~~~ From 5247692c00b62385f8cda32947499471bd9bd179 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 27 Dec 2022 21:04:27 -0500 Subject: [PATCH 182/260] Improved exception types https://github.com/xarray-contrib/datatree/pull/169 * created more specific tree-related exception types * listed new exception types in public API --- xarray/datatree_/datatree/__init__.py | 3 +++ .../datatree_/datatree/tests/test_datatree.py | 6 +++-- .../datatree_/datatree/tests/test_treenode.py | 12 +++++----- xarray/datatree_/datatree/treenode.py | 22 ++++++++++--------- xarray/datatree_/docs/source/api.rst | 2 ++ xarray/datatree_/docs/source/whats-new.rst | 1 + 6 files changed, 28 insertions(+), 18 deletions(-) diff --git a/xarray/datatree_/datatree/__init__.py b/xarray/datatree_/datatree/__init__.py index a8e29faa354..3b97ea9d4db 100644 --- a/xarray/datatree_/datatree/__init__.py +++ b/xarray/datatree_/datatree/__init__.py @@ -3,6 +3,7 @@ from .extensions import register_datatree_accessor from .io import open_datatree from .mapping import TreeIsomorphismError, map_over_subtree +from .treenode import InvalidTreeError, NotFoundInTreeError try: # NOTE: the `_version.py` file must not be present in the git repository @@ -16,6 +17,8 @@ "DataTree", "open_datatree", "TreeIsomorphismError", + "InvalidTreeError", + "NotFoundInTreeError", "map_over_subtree", "register_datatree_accessor", "__version__", diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 71d25a54045..08203f3ed32 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -7,7 +7,7 @@ from xarray.tests import create_test_data, source_ndarray import datatree.testing as dtt -from datatree import DataTree +from datatree import DataTree, NotFoundInTreeError class TestTreeCreation: @@ -109,7 +109,9 @@ def test_relative_paths(self): assert sue.relative_to(sue) == "." 
evil_kate = DataTree() - with pytest.raises(ValueError, match="nodes do not lie within the same tree"): + with pytest.raises( + NotFoundInTreeError, match="nodes do not lie within the same tree" + ): sue.relative_to(evil_kate) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index a38f4b2c070..1805f038efa 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -1,7 +1,7 @@ import pytest from datatree.iterators import LevelOrderIter, PreOrderIter -from datatree.treenode import NamedNode, TreeError, TreeNode +from datatree.treenode import InvalidTreeError, NamedNode, TreeNode class TestFamilyTree: @@ -21,10 +21,10 @@ def test_parenting(self): def test_no_time_traveller_loops(self): john = TreeNode() - with pytest.raises(TreeError, match="cannot be a parent of itself"): + with pytest.raises(InvalidTreeError, match="cannot be a parent of itself"): john._set_parent(john, "John") - with pytest.raises(TreeError, match="cannot be a parent of itself"): + with pytest.raises(InvalidTreeError, match="cannot be a parent of itself"): john.children = {"John": john} mary = TreeNode() @@ -32,10 +32,10 @@ def test_no_time_traveller_loops(self): mary._set_parent(john, "Mary") rose._set_parent(mary, "Rose") - with pytest.raises(TreeError, match="is already a descendant"): + with pytest.raises(InvalidTreeError, match="is already a descendant"): john._set_parent(rose, "John") - with pytest.raises(TreeError, match="is already a descendant"): + with pytest.raises(InvalidTreeError, match="is already a descendant"): rose.children = {"John": john} def test_parent_swap(self): @@ -73,7 +73,7 @@ def test_doppelganger_child(self): with pytest.raises(TypeError): john.children = {"Kate": 666} - with pytest.raises(TreeError, match="Cannot add same node"): + with pytest.raises(InvalidTreeError, match="Cannot add same node"): john.children = {"Kate": kate, "Evil_Kate": kate} john = TreeNode(children={"Kate": kate}) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index d24c5c67a90..3ff2eb98a2b 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -19,10 +19,12 @@ from xarray.core.types import T_DataArray -class TreeError(Exception): - """Exception type raised when user attempts to create an invalid tree in some way.""" +class InvalidTreeError(Exception): + """Raised when user attempts to create an invalid tree in some way.""" - ... + +class NotFoundInTreeError(ValueError): + """Raised when operation can't be completed because one node is part of the expected tree.""" class NodePath(PurePosixPath): @@ -109,12 +111,12 @@ def _check_loop(self, new_parent: Tree | None) -> None: """Checks that assignment of this new parent will not create a cycle.""" if new_parent is not None: if new_parent is self: - raise TreeError( + raise InvalidTreeError( f"Cannot set parent, as node {self} cannot be a parent of itself." ) if self._is_descendant_of(new_parent): - raise TreeError( + raise InvalidTreeError( "Cannot set parent, as intended parent is already a descendant of this node." ) @@ -211,7 +213,7 @@ def _check_children(children: Mapping[str, Tree]) -> None: if childid not in seen: seen.add(childid) else: - raise TreeError( + raise InvalidTreeError( f"Cannot add same node {name} multiple times as different children." 
) @@ -524,7 +526,7 @@ def relative_to(self: NamedNode, other: NamedNode) -> str: If other is not in this tree, or it's otherwise impossible, raise a ValueError. """ if not self.same_tree(other): - raise ValueError( + raise NotFoundInTreeError( "Cannot find relative path because nodes do not lie within the same tree" ) @@ -551,7 +553,7 @@ def find_common_ancestor(self, other: NamedNode) -> NamedNode: break if not common_ancestor: - raise ValueError( + raise NotFoundInTreeError( "Cannot find relative path because nodes do not lie within the same tree" ) @@ -561,11 +563,11 @@ def _path_to_ancestor(self, ancestor: NamedNode) -> NodePath: """Return the relative path from this node to the given ancestor node""" if not self.same_tree(ancestor): - raise ValueError( + raise NotFoundInTreeError( "Cannot find relative path to ancestor because nodes do not lie within the same tree" ) if ancestor.path not in list(a.path for a in self.ancestors): - raise ValueError( + raise NotFoundInTreeError( "Cannot find relative path to ancestor because given node is not an ancestor of this node" ) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 49caaea86c5..75c70584dab 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -309,6 +309,8 @@ Exceptions raised when manipulating trees. :toctree: generated/ TreeIsomorphismError + InvalidTreeError + NotFoundInTreeError Advanced API ============ diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index dda5a4c5489..71febcd92bd 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -27,6 +27,7 @@ New Features By `Tom Nicholas `_. - Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). By `Justus Magin `_. +- New, more specific exception types for tree-related errors (:pull:`169`). 
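Both new exception classes are exported from the package root, so downstream code can catch them explicitly rather than relying on the old generic ``TreeError``/``ValueError``. A small sketch, following the updated tests above (node names are arbitrary)::

    from datatree import DataTree, NotFoundInTreeError

    sue = DataTree(name="sue")
    stranger = DataTree(name="stranger")  # the root of a completely separate tree

    try:
        sue.relative_to(stranger)
    except NotFoundInTreeError as err:
        print(err)  # nodes do not lie within the same tree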
Breaking changes ~~~~~~~~~~~~~~~~ From 54d940f217421b6638224b9b8db4bb394b6c3250 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 30 Dec 2022 19:11:37 -0500 Subject: [PATCH 183/260] Use xarray docs theme instead of pangeo https://github.com/xarray-contrib/datatree/pull/173 * use xarray docs theme instead of pangeo * satisfy pre-commit * whatsnew --- xarray/datatree_/ci/doc.yml | 10 +++- xarray/datatree_/docs/Makefile | 8 ++- xarray/datatree_/docs/source/conf.py | 66 +++++++++++++++++++--- xarray/datatree_/docs/source/whats-new.rst | 2 + 4 files changed, 75 insertions(+), 11 deletions(-) diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 5a3afbdf49f..fc9baeb06ac 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -6,14 +6,18 @@ dependencies: - python>=3.8 - netcdf4 - scipy - - sphinx + - sphinx>=4.2.0 - sphinx-copybutton - - numpydoc + - sphinx-panels - sphinx-autosummary-accessors + - sphinx-book-theme >= 0.0.38 + - pydata-sphinx-theme>=0.4.3 + - numpydoc - ipython - h5netcdf - zarr - pip: - git+https://github.com/xarray-contrib/datatree - - pangeo-sphinx-book-theme + - sphinxext-rediraffe + - sphinxext-opengraph - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/docs/Makefile b/xarray/datatree_/docs/Makefile index 9b5b6042838..6e9b4058414 100644 --- a/xarray/datatree_/docs/Makefile +++ b/xarray/datatree_/docs/Makefile @@ -19,11 +19,12 @@ ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) sou # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source -.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext +.PHONY: help clean html rtdhtml dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" + @echo " rtdhtml Build html using same settings used on ReadtheDocs" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @@ -54,6 +55,11 @@ html: @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." +rtdhtml: + $(SPHINXBUILD) -T -j auto -E -W --keep-going -b html -d $(BUILDDIR)/doctrees -D language=en . $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." + dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index d330c920982..e95bc2bc7e2 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -45,6 +45,9 @@ "sphinx.ext.intersphinx", "sphinx.ext.extlinks", "sphinx.ext.napoleon", + "sphinx_copybutton", + "sphinxext.opengraph", + "sphinx_autosummary_accessors", "IPython.sphinxext.ipython_console_highlighting", "IPython.sphinxext.ipython_directive", ] @@ -131,7 +134,11 @@ # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. -html_theme = "pangeo_sphinx_book_theme" +html_theme = "sphinx_book_theme" + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. 
html_theme_options = { "repository_url": "https://github.com/xarray-contrib/datatree", "repository_branch": "main", @@ -141,11 +148,6 @@ "use_edit_page_button": True, } -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. -# html_theme_options = {} - # Add any paths that contain custom themes here, relative to this directory. # html_theme_path = [] @@ -168,7 +170,7 @@ # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". -# html_static_path = ['_static'] +html_static_path = ["_static"] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. @@ -291,3 +293,53 @@ # If true, do not generate a @detailmenu in the "Top" node's menu. # texinfo_no_detailmenu = False + + +# based on numpy doc/source/conf.py +def linkcode_resolve(domain, info): + """ + Determine the URL corresponding to Python object + """ + if domain != "py": + return None + + modname = info["module"] + fullname = info["fullname"] + + submod = sys.modules.get(modname) + if submod is None: + return None + + obj = submod + for part in fullname.split("."): + try: + obj = getattr(obj, part) + except AttributeError: + return None + + try: + fn = inspect.getsourcefile(inspect.unwrap(obj)) + except TypeError: + fn = None + if not fn: + return None + + try: + source, lineno = inspect.getsourcelines(obj) + except OSError: + lineno = None + + if lineno: + linespec = f"#L{lineno}-L{lineno + len(source) - 1}" + else: + linespec = "" + + fn = os.path.relpath(fn, start=os.path.dirname(xarray.__file__)) + + if "+" in xarray.__version__: + return f"https://github.com/xarray-contrib/datatree/blob/main/datatree/{fn}{linespec}" + else: + return ( + f"https://github.com/xarray-contrib/datatree/blob/" + f"v{datatree.__version__}/xarray/{fn}{linespec}" + ) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 71febcd92bd..18ac3024745 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -48,6 +48,8 @@ Documentation - Added ``Reading and Writing Files`` page. (:pull:`158`) By `Tom Nicholas `_. +- Changed docs theme to match xarray's main documentation. (:pull:`173`) + By `Tom Nicholas `_. 
Internal Changes ~~~~~~~~~~~~~~~~ From 8442193ae05702a7db2e29598f7fa7582ca76c6d Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 30 Dec 2022 19:16:26 -0500 Subject: [PATCH 184/260] Descendants property https://github.com/xarray-contrib/datatree/pull/170 * added descendants property * tests for descendants, lineage, ancestors, subtree * added descendants to API docs * whatsnew --- .../datatree_/datatree/tests/test_treenode.py | 91 ++++++++++++++----- xarray/datatree_/datatree/treenode.py | 20 +++- xarray/datatree_/docs/source/api.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 3 + 4 files changed, 93 insertions(+), 22 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 1805f038efa..aaa0362a99f 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -225,57 +225,106 @@ def test_del_child(self): def create_test_tree(): - f = NamedNode() + a = NamedNode(name="a") b = NamedNode() - a = NamedNode() - d = NamedNode() c = NamedNode() + d = NamedNode() e = NamedNode() + f = NamedNode() g = NamedNode() - i = NamedNode() h = NamedNode() + i = NamedNode() - f.children = {"b": b, "g": g} - b.children = {"a": a, "d": d} - d.children = {"c": c, "e": e} - g.children = {"i": i} - i.children = {"h": h} + a.children = {"b": b, "c": c} + b.children = {"d": d, "e": e} + e.children = {"f": f, "g": g} + c.children = {"h": h} + h.children = {"i": i} - return f + return a, f class TestIterators: def test_preorderiter(self): - tree = create_test_tree() - result = [node.name for node in PreOrderIter(tree)] + root, _ = create_test_tree() + result = [node.name for node in PreOrderIter(root)] expected = [ - None, # root Node is unnamed - "b", "a", + "b", "d", - "c", "e", + "f", "g", - "i", + "c", "h", + "i", ] assert result == expected def test_levelorderiter(self): - tree = create_test_tree() - result = [node.name for node in LevelOrderIter(tree)] + root, _ = create_test_tree() + result = [node.name for node in LevelOrderIter(root)] expected = [ - None, # root Node is unnamed + "a", # root Node is unnamed "b", + "c", + "d", + "e", + "h", + "f", "g", + "i", + ] + assert result == expected + + +class TestAncestry: + def test_lineage(self): + _, leaf = create_test_tree() + lineage = leaf.lineage + expected = ["f", "e", "b", "a"] + for node, expected_name in zip(lineage, expected): + assert node.name == expected_name + + def test_ancestors(self): + _, leaf = create_test_tree() + ancestors = leaf.ancestors + expected = ["a", "b", "e", "f"] + for node, expected_name in zip(ancestors, expected): + assert node.name == expected_name + + def test_subtree(self): + root, _ = create_test_tree() + subtree = root.subtree + expected = [ "a", + "b", "d", - "i", + "e", + "f", + "g", "c", + "h", + "i", + ] + for node, expected_name in zip(subtree, expected): + assert node.name == expected_name + + def test_descendants(self): + root, _ = create_test_tree() + descendants = root.descendants + expected = [ + "b", + "d", "e", + "f", + "g", + "c", "h", + "i", ] - assert result == expected + for node, expected_name in zip(descendants, expected): + assert node.name == expected_name class TestRenderTree: diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 3ff2eb98a2b..7b49971015f 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -238,7 +238,6 @@ def _post_attach_children(self: Tree, children: 
Mapping[str, Tree]) -> None: def iter_lineage(self: Tree) -> Iterator[Tree]: """Iterate up the tree, starting from the current node.""" - # TODO should this instead return an OrderedDict, so as to include node names? node: Tree | None = self while node is not None: yield node @@ -298,11 +297,30 @@ def subtree(self: Tree) -> Iterator[Tree]: An iterator over all nodes in this tree, including both self and all descendants. Iterates depth-first. + + See Also + -------- + DataTree.descendants """ from . import iterators return iterators.PreOrderIter(self) + @property + def descendants(self: Tree) -> Tuple[Tree]: + """ + Child nodes and all their child nodes. + + Returned in depth-first order. + + See Also + -------- + DataTree.subtree + """ + all_nodes = tuple(self.subtree) + this_node, *descendants = all_nodes + return tuple(descendants) # type: ignore[return-value] + def _pre_detach(self: Tree, parent: Tree) -> None: """Method call before detaching from `parent`.""" pass diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 75c70584dab..751e5643c7b 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -31,6 +31,7 @@ Attributes relating to the recursive tree-like structure of a ``DataTree``. DataTree.is_root DataTree.is_leaf DataTree.subtree + DataTree.descendants DataTree.siblings DataTree.lineage DataTree.ancestors diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 18ac3024745..5e86ba1c738 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -28,6 +28,9 @@ New Features - Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). By `Justus Magin `_. - New, more specific exception types for tree-related errors (:pull:`169`). + By `Tom Nicholas `_. +- Added a new :py:meth:`DataTree.descendants` property (:pull:`170`). + By `Tom Nicholas `_. Breaking changes ~~~~~~~~~~~~~~~~ From e43d7abea16911ee61c4bc66d08057bab91534eb Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 30 Dec 2022 19:25:43 -0500 Subject: [PATCH 185/260] Explicit copy https://github.com/xarray-contrib/datatree/pull/171 * added descendants property * tests for descendants, lineage, ancestors, subtree * added descendants to API docs * whatsnew * rerun tests * rewrote copy method * remove outdated mypy ignore error --- xarray/datatree_/datatree/datatree.py | 70 ++++++++++++++++++++-- xarray/datatree_/datatree/ops.py | 3 - xarray/datatree_/docs/source/whats-new.rst | 3 + 3 files changed, 69 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 606c7935e8e..0696c90b4c9 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -628,6 +628,66 @@ def _replace( ) return obj + def copy( + self: DataTree, + deep: bool = False, + ) -> DataTree: + """ + Returns a copy of this subtree. + + Copies this node and all child nodes. + + If `deep=True`, a deep copy is made of each of the component variables. + Otherwise, a shallow copy of each of the component variable is made, so + that the underlying memory region of the new datatree is the same as in + the original datatree. + + Parameters + ---------- + deep : bool, default: False + Whether each component variable is loaded into memory and copied onto + the new object. Default is False. 
+ + Returns + ------- + object : DataTree + New object with dimensions, attributes, coordinates, name, encoding, + and data of this node and all child nodes copied from original. + + See Also + -------- + xarray.Dataset.copy + pandas.DataFrame.copy + """ + return self._copy_subtree(deep=deep) + + def _copy_subtree( + self: DataTree, + deep: bool = False, + memo: dict[int, Any] | None = None, + ) -> DataTree: + """Copy entire subtree""" + new_tree = self._copy_node(deep=deep) + for node in self.descendants: + new_tree[node.path] = node._copy_node(deep=deep) + return new_tree + + def _copy_node( + self: DataTree, + deep: bool = False, + ) -> DataTree: + """Copy just one node of a tree""" + new_node: DataTree = DataTree() + new_node.name = self.name + new_node.ds = self.to_dataset().copy(deep=deep) + return new_node + + def __copy__(self: DataTree) -> DataTree: + return self._copy_subtree(deep=False) + + def __deepcopy__(self: DataTree, memo: dict[int, Any] | None = None) -> DataTree: + return self._copy_subtree(deep=True, memo=memo) + def get( self: DataTree, key: str, default: Optional[DataTree | DataArray] = None ) -> Optional[DataTree | DataArray]: @@ -694,8 +754,11 @@ def _set(self, key: str, val: DataTree | CoercibleValue) -> None: Counterpart to the public .get method, and also only works on the immediate node, not other nodes in the tree. """ if isinstance(val, DataTree): - val.name = key - val.parent = self + # TODO shallow copy here so as not to alter name of node in original tree? + # new_node = copy.copy(val, deep=False) + new_node = val + new_node.name = key + new_node.parent = self else: if not isinstance(val, (DataArray, Variable)): # accommodate other types that can be coerced into Variables @@ -792,8 +855,7 @@ def from_dict( # Create and set new node node_name = NodePath(path).name if isinstance(data, cls): - # TODO ignoring type error only needed whilst .copy() method is copied from Dataset.copy(). - new_node = data.copy() # type: ignore[attr-defined] + new_node = data.copy() new_node.orphan() else: new_node = cls(name=node_name, data=data) diff --git a/xarray/datatree_/datatree/ops.py b/xarray/datatree_/datatree/ops.py index bdc931c910e..eabc1fafc1c 100644 --- a/xarray/datatree_/datatree/ops.py +++ b/xarray/datatree_/datatree/ops.py @@ -31,9 +31,6 @@ ] _DATASET_METHODS_TO_MAP = [ "as_numpy", - "copy", - "__copy__", - "__deepcopy__", "set_coords", "reset_coords", "info", diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 5e86ba1c738..e13a71432d9 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -35,6 +35,9 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- :py:meth:`DataTree.copy` copy method now only copies the subtree, not the parent nodes (:pull:`171`). + By `Tom Nicholas `_. 
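A brief sketch of the copy semantics introduced here — a copy contains the node it is called on plus everything beneath it, but no parents (the datasets and names below are made up for illustration)::

    import xarray as xr
    from datatree import DataTree

    dt = DataTree.from_dict({"/": xr.Dataset({"x": 0}), "/child": xr.Dataset({"y": 1})})

    shallow = dt.copy()        # new tree; variables still share memory with `dt`
    deep = dt.copy(deep=True)  # component variables are copied as well

    child_copy = dt["child"].copy()
    child_copy.parent is None  # True – parent nodes are no longer copied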
+ Deprecations ~~~~~~~~~~~~ From 191de90b4e38972eca431c70cba36cfadc33a781 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 30 Dec 2022 19:31:06 -0500 Subject: [PATCH 186/260] Name permanence via shallow copy https://github.com/xarray-contrib/datatree/pull/172 * added descendants property * tests for descendants, lineage, ancestors, subtree * added descendants to API docs * whatsnew * rerun tests * rewrote copy method * remove outdated mypy ignore error * changed tests to reflect new expected behaviour * shallow copy on insertion * update test for checking isomorphism from root * whatsnew --- xarray/datatree_/datatree/datatree.py | 5 ++-- .../datatree_/datatree/tests/test_datatree.py | 28 +++++++++++-------- .../datatree_/datatree/tests/test_mapping.py | 5 ++-- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 4 files changed, 24 insertions(+), 16 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 0696c90b4c9..661da370842 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -754,9 +754,8 @@ def _set(self, key: str, val: DataTree | CoercibleValue) -> None: Counterpart to the public .get method, and also only works on the immediate node, not other nodes in the tree. """ if isinstance(val, DataTree): - # TODO shallow copy here so as not to alter name of node in original tree? - # new_node = copy.copy(val, deep=False) - new_node = val + # create and assign a shallow copy here so as not to alter original name of node in grafted tree + new_node = val.copy(deep=False) new_node.name = key new_node.parent = self else: diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 08203f3ed32..b5a0c44a967 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -68,12 +68,6 @@ def test_child_gets_named_on_attach(self): mary = DataTree(children={"Sue": sue}) # noqa assert sue.name == "Sue" - @pytest.mark.xfail(reason="requires refactoring to retain name") - def test_grafted_subtree_retains_name(self): - subtree = DataTree("original") - root = DataTree(children={"new_name": subtree}) # noqa - assert subtree.name == "original" - class TestPaths: def test_path_property(self): @@ -294,8 +288,11 @@ class TestSetItem: def test_setitem_new_child_node(self): john = DataTree(name="john") mary = DataTree(name="mary") - john["Mary"] = mary - assert john["Mary"] is mary + john["mary"] = mary + + grafted_mary = john["mary"] + assert grafted_mary.parent is john + assert grafted_mary.name == "mary" def test_setitem_unnamed_child_node_becomes_named(self): john2 = DataTree(name="john2") @@ -304,10 +301,19 @@ def test_setitem_unnamed_child_node_becomes_named(self): def test_setitem_new_grandchild_node(self): john = DataTree(name="john") - DataTree(name="mary", parent=john) + mary = DataTree(name="mary", parent=john) rose = DataTree(name="rose") - john["Mary/Rose"] = rose - assert john["Mary/Rose"] is rose + john["mary/rose"] = rose + + grafted_rose = john["mary/rose"] + assert grafted_rose.parent is mary + assert grafted_rose.name == "rose" + + def test_grafted_subtree_retains_name(self): + subtree = DataTree(name="original_subtree_name") + root = DataTree(name="root") + root["new_subtree_name"] = subtree # noqa + assert subtree.name == "original_subtree_name" def test_setitem_new_empty_node(self): john = DataTree(name="john") diff --git a/xarray/datatree_/datatree/tests/test_mapping.py 
b/xarray/datatree_/datatree/tests/test_mapping.py index b1bb59f890f..9714233a9d9 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -68,8 +68,9 @@ def test_not_isomorphic_complex_tree(self, create_test_datatree): def test_checking_from_root(self, create_test_datatree): dt1 = create_test_datatree() dt2 = create_test_datatree() - real_root = DataTree() - real_root["fake_root"] = dt2 + real_root = DataTree(name="real root") + dt2.name = "not_real_root" + dt2.parent = real_root with pytest.raises(TreeIsomorphismError): check_isomorphic(dt1, dt2, check_from_root=True) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index e13a71432d9..2fbc4819847 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -37,6 +37,8 @@ Breaking changes - :py:meth:`DataTree.copy` copy method now only copies the subtree, not the parent nodes (:pull:`171`). By `Tom Nicholas `_. +- Grafting a subtree onto another tree now leaves name of original subtree object unchanged (:issue:`116`, :pull:`172`). + By `Tom Nicholas `_. Deprecations ~~~~~~~~~~~~ From a1f49239c9e1ca478c1b2bfdb863da9274d0747d Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Sat, 31 Dec 2022 13:04:36 -0500 Subject: [PATCH 187/260] Terminology https://github.com/xarray-contrib/datatree/pull/174 * terminology page * added to index * whatsnew * clarifications --- xarray/datatree_/docs/source/index.rst | 1 + xarray/datatree_/docs/source/terminology.rst | 33 ++++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 3 files changed, 36 insertions(+) create mode 100644 xarray/datatree_/docs/source/terminology.rst diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index 0f28aa60f68..9448e2325ed 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -14,6 +14,7 @@ Datatree Data Model Reading and Writing Files API Reference + Terminology How do I ... Contributing Guide Development Roadmap diff --git a/xarray/datatree_/docs/source/terminology.rst b/xarray/datatree_/docs/source/terminology.rst new file mode 100644 index 00000000000..a6b1cc8f2de --- /dev/null +++ b/xarray/datatree_/docs/source/terminology.rst @@ -0,0 +1,33 @@ +.. currentmodule:: datatree +.. _terminology: + +This page extends `xarray's page on terminology `_. + +Terminology +=========== + +.. glossary:: + + DataTree + A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, + each of which can store the same information as a single ``Dataset`` (accessed via `.ds`). + This data is stored in the same way as in a ``Dataset``, i.e. in the form of data variables + (see **Variable** in the `corresponding xarray terminology page `_), + dimensions, coordinates, and attributes. + + The nodes in a tree are linked to one another, and each node is it's own instance of ``DataTree`` object. + Each node can have zero or more *children* (stored in a dictionary-like manner under their corresponding *names*), + and those child nodes can themselves have children. + If a node is a child of another node that other node is said to be its *parent*. Nodes can have a maximum of one parent, + and if a node has no parent it is said to be the *root* node of that *tree*. + + Subtree + A section of a *tree*, consisting of a *node* along with all the child nodes below it + (and the child nodes below them, i.e. 
all so-called *descendant* nodes). + Excludes the parent node and all nodes above. + + Group + Another word for a subtree, reflecting how the hierarchical structure of a ``DataTree`` allows for grouping related data together. + Analogous to a single + `netCDF group `_ or + `Zarr group `_. diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 2fbc4819847..55999d01a9f 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -58,6 +58,8 @@ Documentation By `Tom Nicholas `_. - Changed docs theme to match xarray's main documentation. (:pull:`173`) By `Tom Nicholas `_. +- Added ``Terminology`` page. (:pull:`174`) + By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ From 2baea0a2c3d04839d4d09d563295326ac3fd4cf9 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sat, 31 Dec 2022 13:07:21 -0500 Subject: [PATCH 188/260] removed duplicated whatsnew entries from v0.10 in v0.11 --- xarray/datatree_/docs/source/whats-new.rst | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 55999d01a9f..b840a14d921 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,10 +23,6 @@ v0.0.11 (unreleased) New Features ~~~~~~~~~~~~ -- Add the ability to register accessors on ``DataTree`` objects, by using ``register_datatree_accessor``. (:pull:`144`) - By `Tom Nicholas `_. -- Allow method chaining with a new :py:meth:`DataTree.pipe` method (:issue:`151`, :pull:`156`). - By `Justus Magin `_. - New, more specific exception types for tree-related errors (:pull:`169`). By `Tom Nicholas `_. - Added a new :py:meth:`DataTree.descendants` property (:pull:`170`). @@ -46,16 +42,12 @@ Deprecations Bug fixes ~~~~~~~~~ -- Allow ``Datatree`` objects as values in :py:meth:`DataTree.from_dict` (:pull:`159`). - By `Justus Magin `_. - Fix bug with :py:meth:`DataTree.relative_to` method (:issue:`133`, :pull:`160`). By `Tom Nicholas `_. Documentation ~~~~~~~~~~~~~ -- Added ``Reading and Writing Files`` page. (:pull:`158`) - By `Tom Nicholas `_. - Changed docs theme to match xarray's main documentation. (:pull:`173`) By `Tom Nicholas `_. - Added ``Terminology`` page. (:pull:`174`) @@ -64,9 +56,6 @@ Documentation Internal Changes ~~~~~~~~~~~~~~~~ -- Avoid reading from same file twice with fsspec3 (:pull:`130`) - By `William Roberts `_. - .. 
_whats-new.v0.0.10: From 77d8c296f25dd69f26b72257dde461c7d18a1cf7 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Sat, 31 Dec 2022 16:29:28 -0500 Subject: [PATCH 189/260] Added drop_nodes method https://github.com/xarray-contrib/datatree/pull/175 * tests * drop_nodes implementation * whatsnew * added drop_nodes to API docs page --- xarray/datatree_/datatree/datatree.py | 37 +++++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 20 +++++++++- xarray/datatree_/docs/source/api.rst | 5 +++ xarray/datatree_/docs/source/whats-new.rst | 2 + 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 661da370842..048fd1df370 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -45,6 +45,7 @@ if TYPE_CHECKING: from xarray.core.merge import CoercibleValue + from xarray.core.types import ErrorOptions # """ # DEVELOPERS' NOTE @@ -814,6 +815,42 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: inplace=True, children=merged_children, **vars_merge_result._asdict() ) + def drop_nodes( + self: DataTree, names: str | Iterable[str], *, errors: ErrorOptions = "raise" + ) -> DataTree: + """ + Drop child nodes from this node. + + Parameters + ---------- + names : str or iterable of str + Name(s) of nodes to drop. + errors : {"raise", "ignore"}, default: "raise" + If 'raise', raises a KeyError if any of the node names + passed are not present as children of this node. If 'ignore', + any given names that are present are dropped and no error is raised. + + Returns + ------- + dropped : DataTree + A copy of the node with the specified children dropped. + """ + # the Iterable check is required for mypy + if isinstance(names, str) or not isinstance(names, Iterable): + names = {names} + else: + names = set(names) + + if errors == "raise": + extra = names - set(self.children) + if extra: + raise KeyError(f"Cannot drop all nodes - nodes {extra} not present") + + children_to_keep = OrderedDict( + {name: child for name, child in self.children.items() if name not in names} + ) + return self._replace(children=children_to_keep) + @classmethod def from_dict( cls, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index b5a0c44a967..b1e9ee48cc2 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -515,7 +515,25 @@ def test_arithmetic(self, create_test_datatree): class TestRestructuring: - ... 
+ def test_drop_nodes(self): + sue = DataTree.from_dict({"Mary": None, "Kate": None, "Ashley": None}) + + # test drop just one node + dropped_one = sue.drop_nodes(names="Mary") + assert "Mary" not in dropped_one.children + + # test drop multiple nodes + dropped = sue.drop_nodes(names=["Mary", "Kate"]) + assert not set(["Mary", "Kate"]).intersection(set(dropped.children)) + assert "Ashley" in dropped.children + + # test raise + with pytest.raises(KeyError, match="nodes {'Mary'} not present"): + dropped.drop_nodes(names=["Mary", "Ashley"]) + + # test ignore + childless = dropped.drop_nodes(names=["Mary", "Ashley"], errors="ignore") + assert childless.children == {} class TestPipe: diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 751e5643c7b..18f9747c58c 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -126,6 +126,11 @@ DataTree Node Contents Manipulate the contents of a single DataTree node. +.. autosummary:: + :toctree: generated/ + + DataTree.drop_nodes + Comparisons =========== diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index b840a14d921..7c7e875c0ee 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,8 @@ v0.0.11 (unreleased) New Features ~~~~~~~~~~~~ +- Added a :py:meth:`DataTree.drop_nodes` method (:issue:`161`, :pull:`175`). + By `Tom Nicholas `_. - New, more specific exception types for tree-related errors (:pull:`169`). By `Tom Nicholas `_. - Added a new :py:meth:`DataTree.descendants` property (:pull:`170`). From 852164037d7c7a24b53ccb54a086e01ea988eb65 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sat, 31 Dec 2022 16:31:25 -0500 Subject: [PATCH 190/260] removed unusued update method on TreeNode --- xarray/datatree_/datatree/treenode.py | 9 --------- 1 file changed, 9 deletions(-) diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 7b49971015f..2de56088795 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -474,15 +474,6 @@ def __delitem__(self: Tree, key: str): else: raise KeyError("Cannot delete") - def update(self: Tree, other: Mapping[str, Tree]) -> None: - """ - Update this node's children. - - Just like `dict.update` this is an in-place operation. - """ - new_children = {**self.children, **other} - self.children = new_children - def same_tree(self, other: Tree) -> bool: """True if other node is in the same tree as this node.""" return self.root is other.root From 34c0a01794ff4adb1104b5b349ce5a2ff4fa54de Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Sun, 1 Jan 2023 15:18:02 -0500 Subject: [PATCH 191/260] add and correct internal links between docs pages --- xarray/datatree_/docs/source/data-structures.rst | 8 +++++--- xarray/datatree_/docs/source/io.rst | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 93d5b9abe31..67e0e608cd3 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -66,10 +66,12 @@ The overall structure is technically a `connected acyclic undirected rooted grap Again these are not normally used unless explicitly accessed by the user. +.. _creating a datatree: + Creating a DataTree ~~~~~~~~~~~~~~~~~~~ -There are two ways to create a ``DataTree`` from scratch. 
The first is to create each node individually, +There are three ways to create a ``DataTree`` from scratch. The first is to create each node individually, specifying the nodes' relationship to one another as you create each one. The ``DataTree`` constructor takes: @@ -144,8 +146,8 @@ we can construct a complex tree quickly using the alternative constructor ``:py: Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) -Finally if you have a file containing data on disk (such as a netCDF file or a Zarr Store), you can also create a datatree by opening the -file using ``:py:func::~datatree.open_datatree``. +Finally the third way is from a file. if you have a file containing data on disk (such as a netCDF file or a Zarr Store), you can also create a datatree by opening the +file using ``:py:func::~datatree.open_datatree``. See the page on :ref:`reading and writing files ` for more details. DataTree Contents diff --git a/xarray/datatree_/docs/source/io.rst b/xarray/datatree_/docs/source/io.rst index be43f851396..49f3faa76d2 100644 --- a/xarray/datatree_/docs/source/io.rst +++ b/xarray/datatree_/docs/source/io.rst @@ -1,4 +1,4 @@ -.. _data structures: +.. _io: Reading and Writing Files ========================= From 6e419c4d425890a50fd3cafd532723c51026f27b Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 2 Jan 2023 20:25:15 -0500 Subject: [PATCH 192/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/176 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/PyCQA/isort: 5.10.1 → 5.11.4](https://github.com/PyCQA/isort/compare/5.10.1...5.11.4) - [github.com/psf/black: 22.10.0 → 22.12.0](https://github.com/psf/black/compare/22.10.0...22.12.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index bc7eafc1082..7773f727497 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -10,12 +10,12 @@ repos: - id: check-yaml # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort - rev: 5.10.1 + rev: 5.11.4 hooks: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 22.10.0 + rev: 22.12.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc From ea91203509934fe4c1f23b8ff312e7157296774c Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 2 Jan 2023 20:35:24 -0500 Subject: [PATCH 193/260] Add leaves property https://github.com/xarray-contrib/datatree/pull/177 * test * implementation of leaves * add leaves to public API * whatsnew --- .../datatree_/datatree/tests/test_treenode.py | 12 ++++++++++ xarray/datatree_/datatree/treenode.py | 23 +++++++++++++++---- xarray/datatree_/docs/source/api.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 2 ++ 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index aaa0362a99f..a996468b367 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ 
b/xarray/datatree_/datatree/tests/test_treenode.py @@ -326,6 +326,18 @@ def test_descendants(self): for node, expected_name in zip(descendants, expected): assert node.name == expected_name + def test_leaves(self): + tree, _ = create_test_tree() + leaves = tree.leaves + expected = [ + "d", + "f", + "g", + "i", + ] + for node, expected_name in zip(leaves, expected): + assert node.name == expected_name + class TestRenderTree: def test_render_nodetree(self): diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 2de56088795..2d618951ec4 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -267,14 +267,27 @@ def root(self: Tree) -> Tree: @property def is_root(self) -> bool: - """Whether or not this node is the tree root.""" + """Whether this node is the tree root.""" return self.parent is None @property def is_leaf(self) -> bool: - """Whether or not this node is a leaf node.""" + """ + Whether this node is a leaf node. + + Leaf nodes are defined as nodes which have no children. + """ return self.children == {} + @property + def leaves(self: Tree) -> Tuple[Tree, ...]: + """ + All leaf nodes. + + Leaf nodes are defined as nodes which have no children. + """ + return tuple([node for node in self.subtree if node.is_leaf]) + @property def siblings(self: Tree) -> OrderedDict[str, Tree]: """ @@ -307,7 +320,7 @@ def subtree(self: Tree) -> Iterator[Tree]: return iterators.PreOrderIter(self) @property - def descendants(self: Tree) -> Tuple[Tree]: + def descendants(self: Tree) -> Tuple[Tree, ...]: """ Child nodes and all their child nodes. @@ -319,7 +332,7 @@ def descendants(self: Tree) -> Tuple[Tree]: """ all_nodes = tuple(self.subtree) this_node, *descendants = all_nodes - return tuple(descendants) # type: ignore[return-value] + return tuple(descendants) def _pre_detach(self: Tree, parent: Tree) -> None: """Method call before detaching from `parent`.""" @@ -563,7 +576,7 @@ def find_common_ancestor(self, other: NamedNode) -> NamedNode: if not common_ancestor: raise NotFoundInTreeError( - "Cannot find relative path because nodes do not lie within the same tree" + "Cannot find common ancestor because nodes do not lie within the same tree" ) return common_ancestor diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 18f9747c58c..3aec6d6c85b 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -30,6 +30,7 @@ Attributes relating to the recursive tree-like structure of a ``DataTree``. DataTree.root DataTree.is_root DataTree.is_leaf + DataTree.leaves DataTree.subtree DataTree.descendants DataTree.siblings diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 7c7e875c0ee..3fcf690590d 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -29,6 +29,8 @@ New Features By `Tom Nicholas `_. - Added a new :py:meth:`DataTree.descendants` property (:pull:`170`). By `Tom Nicholas `_. +- Added a :py:meth:`DataTree.leaves` property (:pull:`177`). + By `Tom Nicholas `_. 
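An illustrative sketch of the new ``leaves`` property alongside the ``descendants`` property added earlier, using a throwaway tree built with ``from_dict`` (the paths are arbitrary)::

    from datatree import DataTree

    root = DataTree.from_dict({"/a/b": None, "/a/c": None, "/d": None})

    [node.name for node in root.leaves]       # -> ["b", "c", "d"]
    [node.name for node in root.descendants]  # -> ["a", "b", "c", "d"]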
Breaking changes ~~~~~~~~~~~~~~~~ From 09025d1d955ed8a1a584a3b99e0071f7fc3f4e62 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 4 Jan 2023 11:14:14 -0500 Subject: [PATCH 194/260] Fix name permanence behaviour in update method https://github.com/xarray-contrib/datatree/pull/178 * test for name permanence in update * ensure node is copied on update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * black * whatsnew Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 8 +++++++- xarray/datatree_/datatree/tests/test_datatree.py | 7 +++++++ xarray/datatree_/docs/source/whats-new.rst | 2 +- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 048fd1df370..fb26dff00d8 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -577,6 +577,9 @@ def _replace( datatree. It is up to the caller to ensure that they have the right type and are not used elsewhere. """ + # TODO Adding new children inplace using this method will cause bugs. + # You will end up with an inconsistency between the name of the child node and the key the child is stored under. + # Use ._set() instead for now if inplace: if variables is not None: self._variables = variables @@ -801,7 +804,10 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: new_variables = {} for k, v in other.items(): if isinstance(v, DataTree): - new_children[k] = v + # avoid named node being stored under inconsistent key + new_child = v.copy() + new_child.name = k + new_children[k] = new_child elif isinstance(v, (DataArray, Variable)): # TODO this should also accommodate other types that can be coerced into Variables new_variables[k] = v diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index b1e9ee48cc2..74e178450f0 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -213,6 +213,13 @@ def test_update_new_named_dataarray(self): expected = da.rename("results") xrt.assert_equal(folder1["results"], expected) + def test_update_doesnt_alter_child_name(self): + dt = DataTree() + dt.update({"foo": xr.DataArray(0), "a": DataTree(name="b")}) + assert "a" in dt.children + child = dt["a"] + assert child.name == "a" + class TestCopy: def test_copy(self, create_test_datatree): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 3fcf690590d..6dafa1e24b5 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -37,7 +37,7 @@ Breaking changes - :py:meth:`DataTree.copy` copy method now only copies the subtree, not the parent nodes (:pull:`171`). By `Tom Nicholas `_. -- Grafting a subtree onto another tree now leaves name of original subtree object unchanged (:issue:`116`, :pull:`172`). +- Grafting a subtree onto another tree now leaves name of original subtree object unchanged (:issue:`116`, :pull:`172`, :pull:`178`). By `Tom Nicholas `_. 
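To illustrate the name-permanence behaviour referred to above, a minimal sketch adapted from the new ``update`` test (variable and node names are arbitrary)::

    import xarray as xr
    from datatree import DataTree

    dt = DataTree()
    original = DataTree(name="b")
    dt.update({"foo": xr.DataArray(0), "a": original})

    "a" in dt.children  # True – the child is stored under the key it was given
    original.name       # "b"  – the object passed in keeps its original name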
Deprecations From 968a84e2cff6a2214fc3ac1422a113edb4afa3b2 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 4 Jan 2023 17:16:48 -0500 Subject: [PATCH 195/260] Add assign method https://github.com/xarray-contrib/datatree/pull/181 * WIP assign method * remove rogue prints * remove assign from mapped methods * moved assign in docs to reflect it only acting on single nodes now * whatsnew --- xarray/datatree_/datatree/datatree.py | 44 ++++++++++++++++++- xarray/datatree_/datatree/ops.py | 1 - .../datatree_/datatree/tests/test_datatree.py | 23 ++++++++++ xarray/datatree_/docs/source/api.rst | 3 +- xarray/datatree_/docs/source/whats-new.rst | 2 + 5 files changed, 70 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index fb26dff00d8..85cec7d9605 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -30,7 +30,7 @@ from xarray.core.indexes import Index, Indexes from xarray.core.merge import dataset_update_method from xarray.core.options import OPTIONS as XR_OPTS -from xarray.core.utils import Default, Frozen, _default +from xarray.core.utils import Default, Frozen, _default, either_dict_or_kwargs from xarray.core.variable import Variable, calculate_dimensions from . import formatting, formatting_html @@ -821,6 +821,48 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: inplace=True, children=merged_children, **vars_merge_result._asdict() ) + def assign( + self, items: Mapping[Any, Any] | None = None, **items_kwargs: Any + ) -> DataTree: + """ + Assign new data variables or child nodes to a DataTree, returning a new object + with all the original items in addition to the new ones. + + Parameters + ---------- + items : mapping of hashable to Any + Mapping from variable or child node names to the new values. If the new values + are callable, they are computed on the Dataset and assigned to new + data variables. If the values are not callable, (e.g. a DataTree, DataArray, + scalar, or array), they are simply assigned. + **items_kwargs + The keyword arguments form of ``variables``. + One of variables or variables_kwargs must be provided. + + Returns + ------- + dt : DataTree + A new DataTree with the new variables or children in addition to all the + existing items. + + Notes + ----- + Since ``kwargs`` is a dictionary, the order of your arguments may not + be preserved, and so the order of the new variables is not well-defined. + Assigning multiple items within the same ``assign`` is + possible, but you cannot reference other variables created within the + same ``assign`` call. 
+ + See Also + -------- + xarray.Dataset.assign + pandas.DataFrame.assign + """ + items = either_dict_or_kwargs(items, items_kwargs, "assign") + dt = self.copy() + dt.update(items) + return dt + def drop_nodes( self: DataTree, names: str | Iterable[str], *, errors: ErrorOptions = "raise" ) -> DataTree: diff --git a/xarray/datatree_/datatree/ops.py b/xarray/datatree_/datatree/ops.py index eabc1fafc1c..d6ac4f83e7c 100644 --- a/xarray/datatree_/datatree/ops.py +++ b/xarray/datatree_/datatree/ops.py @@ -68,7 +68,6 @@ "combine_first", "reduce", "map", - "assign", "diff", "shift", "roll", diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 74e178450f0..4a6fb8bff59 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -206,6 +206,17 @@ def test_getitem_dict_like_selection_access_to_dataset(self): class TestUpdate: + def test_update(self): + dt = DataTree() + dt.update({"foo": xr.DataArray(0), "a": DataTree()}) + expected = DataTree.from_dict({"/": xr.Dataset({"foo": 0}), "a": None}) + print(dt) + print(dt.children) + print(dt._children) + print(dt["a"]) + print(expected) + dtt.assert_equal(dt, expected) + def test_update_new_named_dataarray(self): da = xr.DataArray(name="temp", data=[0, 50]) folder1 = DataTree(name="folder1") @@ -542,6 +553,18 @@ def test_drop_nodes(self): childless = dropped.drop_nodes(names=["Mary", "Ashley"], errors="ignore") assert childless.children == {} + def test_assign(self): + dt = DataTree() + expected = DataTree.from_dict({"/": xr.Dataset({"foo": 0}), "/a": None}) + + # kwargs form + result = dt.assign(foo=xr.DataArray(0), a=DataTree()) + dtt.assert_equal(result, expected) + + # dict form + result = dt.assign({"foo": xr.DataArray(0), "a": DataTree()}) + dtt.assert_equal(result, expected) + class TestPipe: def test_noop(self, create_test_datatree): diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 3aec6d6c85b..23f903263e4 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -109,7 +109,7 @@ Manipulate the contents of all nodes in a tree simultaneously. :toctree: generated/ DataTree.copy - DataTree.assign + DataTree.assign_coords DataTree.merge DataTree.rename @@ -130,6 +130,7 @@ Manipulate the contents of a single DataTree node. .. autosummary:: :toctree: generated/ + DataTree.assign DataTree.drop_nodes Comparisons diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 6dafa1e24b5..e57e31e4761 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -39,6 +39,8 @@ Breaking changes By `Tom Nicholas `_. - Grafting a subtree onto another tree now leaves name of original subtree object unchanged (:issue:`116`, :pull:`172`, :pull:`178`). By `Tom Nicholas `_. +- Changed the :py:meth:`DataTree.assign` method to just work on the local node (:pull:`181`). + By `Tom Nicholas `_. 
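For orientation, a minimal usage sketch of the new node-local ``assign``, condensed from the test added in this patch; both the keyword and dictionary forms return a new tree and leave the original object unchanged:

    import xarray as xr
    from datatree import DataTree

    dt = DataTree()

    # keyword form
    result = dt.assign(foo=xr.DataArray(0), a=DataTree())

    # dictionary form, equivalent to the above
    result = dt.assign({"foo": xr.DataArray(0), "a": DataTree()})

    assert "foo" in result.ds      # new data variable on the returned tree
    assert "a" in result.children  # new child node on the returned tree
    assert "foo" not in dt.ds      # the original node is untouched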
Deprecations ~~~~~~~~~~~~ From c97f0169c9f35d5740f80b387caceb63849af3f1 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 4 Jan 2023 17:54:16 -0500 Subject: [PATCH 196/260] Hierarchical data docs page https://github.com/xarray-contrib/datatree/pull/179 * why hierarchical data * add hierarchical data page to index * Simpsons family tree * evolutionary tree * WIP rearrangement of creating trees * fixed examples in data structures page * dict-like navigation * filesystem-like paths explained * split PR into parts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update docs/source/data-structures.rst Co-authored-by: Justus Magin * black * whatsnew * get assign example working * fix some links to methods * relative_to example Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin --- .../datatree_/docs/source/data-structures.rst | 69 ++-- .../docs/source/hierarchical-data.rst | 332 ++++++++++++++++++ xarray/datatree_/docs/source/index.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 2 + 4 files changed, 359 insertions(+), 45 deletions(-) create mode 100644 xarray/datatree_/docs/source/hierarchical-data.rst diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 67e0e608cd3..4417e099132 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -71,7 +71,7 @@ Again these are not normally used unless explicitly accessed by the user. Creating a DataTree ~~~~~~~~~~~~~~~~~~~ -There are three ways to create a ``DataTree`` from scratch. The first is to create each node individually, +One way to create a create a ``DataTree`` from scratch is to create each node individually, specifying the nodes' relationship to one another as you create each one. The ``DataTree`` constructor takes: @@ -81,16 +81,16 @@ The ``DataTree`` constructor takes: - ``children``: The various child nodes (if there are any), given as a mapping from string keys to ``DataTree`` objects. - ``name``: A string to use as the name of this node. -Let's make a datatree node without anything in it: +Let's make a single datatree node with some example data in it: .. ipython:: python from datatree import DataTree - # create root node - node1 = DataTree(name="Oak") + ds1 = xr.Dataset({"foo": "orange"}) + dt = DataTree(name="root", data=ds1) # create root node - node1 + dt At this point our node is also the root node, as every tree has a root node. @@ -98,56 +98,38 @@ We can add a second node to this tree either by referring to the first node in t .. ipython:: python + ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}) # add a child by referring to the parent node - node2 = DataTree(name="Bonsai", parent=node1) + node2 = DataTree(name="a", parent=dt, data=ds2) or by dynamically updating the attributes of one node to refer to another: .. ipython:: python - # add a grandparent by updating the .parent property of an existing node - node0 = DataTree(name="General Sherman") - node1.parent = node0 + # add a second child by first creating a new node ... + ds3 = xr.Dataset({"zed": np.NaN}) + node3 = DataTree(name="b", data=ds3) + # ... then updating its .parent property + node3.parent = dt -Our tree now has three nodes within it, and one of the two new nodes has become the new root: +Our tree now has three nodes within it: .. 
ipython:: python - node0 + dt -Is is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: +It is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: .. ipython:: python :okexcept: - node0.parent = node2 - -The second way is to build the tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects. - -This relies on a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence, -separated by forward slashes. The root node is referred to by ``"/"``, so the path from our current root node to its grand-child would be ``"/Oak/Bonsai"``. -A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a -`"fully qualified name" `_. - -If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, -we can construct a complex tree quickly using the alternative constructor ``:py:func::DataTree.from_dict``: + dt.parent = node3 -.. ipython:: python - - d = { - "/": xr.Dataset({"foo": "orange"}), - "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}), - "/a/b": xr.Dataset({"zed": np.NaN}), - "a/c/d": None, - } - dt = DataTree.from_dict(d) - dt +Alternatively you can also create a ``DataTree`` object from -Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path -(i.e. the node labelled `"c"` in this case.) - -Finally the third way is from a file. if you have a file containing data on disk (such as a netCDF file or a Zarr Store), you can also create a datatree by opening the -file using ``:py:func::~datatree.open_datatree``. See the page on :ref:`reading and writing files ` for more details. +- An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented), +- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using ``DataTree.from_dict()``, +- A netCDF or Zarr file on disk with ``open_datatree()``. See :ref:`reading and writing files `. DataTree Contents @@ -187,8 +169,6 @@ Like with ``Dataset``, you can access the data and coordinate variables of a nod Dictionary-like methods ~~~~~~~~~~~~~~~~~~~~~~~ -We can update the contents of the tree in-place using a dictionary-like syntax. - We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects. For example, to create this example datatree from scratch, we could have written: @@ -196,11 +176,10 @@ For example, to create this example datatree from scratch, we could have written .. ipython:: python - dt = DataTree() + dt = DataTree(name="root") dt["foo"] = "orange" dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})) dt["a/b/zed"] = np.NaN - dt["a/c/d"] = DataTree() dt To change the variables in a node of a ``DataTree``, you can use all the standard dictionary @@ -209,6 +188,6 @@ methods, including ``values``, ``items``, ``__delitem__``, ``get`` and Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will :ref:`automatically align` the array(s) to the original node's indexes. 
-If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` it will copy the entire tree, -including all parents and children. -Like for ``Dataset``, this copy is shallow by default, but you can copy all the data by calling ``dt.copy(deep=True)``. +If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` it will copy the subtree, +meaning that node and children below it, but no parents above it. +Like for ``Dataset``, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``. diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst new file mode 100644 index 00000000000..85d392d0af9 --- /dev/null +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -0,0 +1,332 @@ +.. _hierarchical-data: + +Working With Hierarchical Data +============================== + +.. ipython:: python + :suppress: + + import numpy as np + import pandas as pd + import xarray as xr + from datatree import DataTree + + np.random.seed(123456) + np.set_printoptions(threshold=10) + +Why Hierarchical Data? +---------------------- + +Many real-world datasets are composed of multiple differing components, +and it can often be be useful to think of these in terms of a hierarchy of related groups of data. +Examples of data which one might want organise in a grouped or hierarchical manner include: + +- Simulation data at multiple resolutions, +- Observational data about the same system but from multiple different types of sensors, +- Mixed experimental and theoretical data, +- A systematic study recording the same experiment but with different parameters, +- Heterogenous data, such as demographic and metereological data, + +or even any combination of the above. + +Often datasets like this cannot easily fit into a single ``xarray.Dataset`` object, +or are more usefully thought of as groups of related ``xarray.Dataset`` objects. +For this purpose we provide the :py:class:`DataTree` class. + +This page explains in detail how to understand and use the different features of the :py:class:`DataTree` class for your own heirarchical data needs. + +.. _node relationships: + +Node Relationships +------------------ + +.. _creating a family tree: + +Creating a Family Tree +~~~~~~~~~~~~~~~~~~~~~~ + +The three main ways of creating a ``DataTree`` object are described briefly in :ref:`creating a datatree`. +Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example. + +Let's start by defining nodes representing the two siblings, Bart and Lisa Simpson: + +.. ipython:: python + + bart = DataTree(name="Bart") + lisa = DataTree(name="Lisa") + +Each of these node objects knows their own :py:class:`~DataTree.name`, but they currently have no relationship to one another. +We can connect them by creating another node representing a common parent, Homer Simpson: + +.. ipython:: python + + homer = DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa}) + +Here we set the children of Homer in the node's constructor. +We now have a small family tree + +.. ipython:: python + + homer + +where we can see how these individual Simpson family members are related to one another. +The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~DataTree.siblings` property: + +.. 
ipython:: python + + list(bart.siblings) + +But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~DataTree.children` property to include her: + +.. ipython:: python + + maggie = DataTree(name="Maggie") + homer.children = {"Bart": bart, "Lisa": lisa, "Maggie": maggie} + homer + +Let's check that Maggie knows who her Dad is: + +.. ipython:: python + + maggie.parent.name + +That's good - updating the properties of our nodes does not break the internal consistency of our tree, as changes of parentage are automatically reflected on both nodes. + + These children obviously have another parent, Marge Simpson, but ``DataTree`` nodes can only have a maximum of one parent. + Genealogical `family trees are not even technically trees `_ in the mathematical sense - + the fact that distant relatives can mate makes it a directed acyclic graph. + Trees of ``DataTree`` objects cannot represent this. + +Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~DataTree.parent` property: + +.. ipython:: python + + abe = DataTree(name="Abe") + homer.parent = abe + +Abe is now the "root" of this tree, which we can see by examining the :py:class:`~DataTree.root` property of any node in the tree + +.. ipython:: python + + maggie.root.name + +We can see the whole tree by printing Abe's node or just part of the tree by printing Homer's node: + +.. ipython:: python + + abe + homer + +We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree. + +In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. +We can add Herbert to the family tree without displacing Homer by :py:meth:`~DataTree.assign`-ing another child to Abe: + +.. ipython:: python + + herbert = DataTree(name="Herb") + abe.assign({"Herbert": herbert}) + +.. note:: + This example shows a minor subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, + but the original node was named "Herbert". Not only are names overriden when stored as keys like this, + but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herbert.name == "Herb"`` still). + In other words, nodes are copied into trees, not inserted into them. + This is intentional, and mirrors the behaviour when storing named ``xarray.DataArray`` objects inside datasets. + +Certain manipulations of our tree are forbidden, if they would create an inconsistent result. +In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own Grandfather. +If we try similar time-travelling hijinks with Homer, we get a :py:class:`InvalidTreeError` raised: + +.. ipython:: python + :okexcept: + + abe.parent = homer + +.. _evolutionary tree: + +Ancestry in an Evolutionary Tree +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Let's use a different example of a tree to discuss more complex relationships between nodes - the phylogenetic tree, or tree of life. + +.. 
ipython:: python + + vertebrates = DataTree.from_dict( + name="Vertebrae", + d={ + "/Sharks": None, + "/Bony Skeleton/Ray-finned Fish": None, + "/Bony Skeleton/Four Limbs/Amphibians": None, + "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None, + "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": None, + "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None, + "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None, + }, + ) + + primates = vertebrates["/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"] + dinosaurs = vertebrates[ + "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs" + ] + +We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, +and :ref:`filesystem-like syntax `_ (to be explained shortly) to select two nodes of interest. + +.. ipython:: python + + vertebrates + +This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_, +rather than an evolutionary tree). + +Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node. +We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes". +We can check if a node is a leaf with :py:meth:`~DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~DataTree.leaves` property: + +.. ipython:: python + + primates.is_leaf + [node.name for node in vertebrates.leaves] + +Pretending that this is a true evolutionary tree for a moment, we can find the features of the evolutionary ancestors (so-called "ancestor" nodes), +the distinguishing feature of the common ancestor of all vertebrate life (the root node), +and even the distinguishing feature of the common ancestor of any two species (the common ancestor of two nodes): + +.. ipython:: python + + [node.name for node in primates.ancestors] + primates.root.name + primates.find_common_ancestor(dinosaurs).name + +We can only find a common ancestor between two nodes that lie in the same tree. +If we try to find the common evolutionary ancestor between primates and an Alien species that has no relationship to Earth's evolutionary tree, +an error will be raised. + +.. ipython:: python + :okexcept: + + alien = DataTree(name="Xenomorph") + primates.find_common_ancestor(alien) + + +.. _navigating trees: + +Navigating Trees +---------------- + +There are various ways to access the different nodes in a tree. + +Properties +~~~~~~~~~~ + +We can navigate trees using the :py:class:`~DataTree.parent` and :py:class:`~DataTree.children` properties of each node, for example: + +.. ipython:: python + + lisa.parent.children["Bart"].name + +but there are also more convenient ways to access nodes. + +Dictionary-like interface +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Children are stored on each node as a key-value mapping from name to child node. +They can be accessed and altered via the :py:class:`~DataTree.__getitem__` and :py:class:`~DataTree.__setitem__` syntax. +In general :py:class:`~DataTree.DataTree` objects support almost the entire set of dict-like methods, +including :py:meth:`~DataTree.keys`, :py:class:`~DataTree.values`, :py:class:`~DataTree.items`, +:py:meth:`~DataTree.__delitem__` and :py:meth:`~DataTree.update`. + +.. 
ipython:: python + + vertebrates["Bony Skeleton"]["Ray-finned Fish"] + +Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, +so if we have a node that contains both children and data, calling :py:meth:`~DataTree.keys` will list both names of child nodes and +names of data variables: + +.. ipython:: python + + dt = DataTree( + data=xr.Dataset({"foo": 0, "bar": 1}), + children={"a": DataTree(), "b": DataTree()}, + ) + print(dt) + list(dt.keys()) + +This also means that the names of variables and of child nodes must be different to one another. + +Attribute-like access +~~~~~~~~~~~~~~~~~~~~~ + +# TODO attribute-like access is not yet implemented, see issue https://github.com/xarray-contrib/datatree/issues/98 + +.. _filesystem paths: + +Filesystem-like Paths +~~~~~~~~~~~~~~~~~~~~~ + +Hierarchical trees can be thought of as analogous to file systems. +Each node is like a directory, and each directory can contain both more sub-directories and data. + +.. note:: + + You can even make the filesystem analogy concrete by using :py:func:`~DataTree.open_mfdatatree` or :py:func:`~DataTree.save_mfdatatree` # TODO not yet implemented - see GH issue 51 + +Datatree objects support a syntax inspired by unix-like filesystems, +where the "path" to a node is specified by the keys of each intermediate node in sequence, +separated by forward slashes. +This is an extension of the conventional dictionary ``__getitem__`` syntax to allow navigation across multiple levels of the tree. + +Like with filepaths, paths within the tree can either be relative to the current node, e.g. + +.. ipython:: python + + abe["Homer/Bart"].name + abe["./Homer/Bart"].name # alternative syntax + +or relative to the root node. +A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a +`"fully qualified name" `_, +or as an "absolute path". +The root node is referred to by ``"/"``, so the path from the root node to its grand-child would be ``"/child/grandchild"``, e.g. + +.. ipython:: python + + # absolute path will start from root node + lisa["/Homer/Bart"].name + +Relative paths between nodes also support the ``"../"`` syntax to mean the parent of the current node. +We can use this with ``__setitem__`` to add a missing entry to our evolutionary tree, but add it relative to a more familiar node of interest: + +.. ipython:: python + + primates["../../Two Fenestrae/Crocodiles"] = DataTree() + print(vertebrates) + +Given two nodes in a tree, we can also find their relative path: + +.. ipython:: python + + bart.relative_to(lisa) + +You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. +If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, +we can construct a complex tree quickly using the alternative constructor :py:meth:`DataTree.from_dict()`: + +.. ipython:: python + + d = { + "/": xr.Dataset({"foo": "orange"}), + "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}), + "/a/b": xr.Dataset({"zed": np.NaN}), + "a/c/d": None, + } + dt = DataTree.from_dict(d) + dt + +.. note:: + + Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path + (i.e. the node labelled `"c"` in this case.) 
+ This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`. diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index 9448e2325ed..e0e39de7d18 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -12,6 +12,7 @@ Datatree Quick Overview Tutorial Data Model + Hierarchical Data Reading and Writing Files API Reference Terminology diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index e57e31e4761..0d59e0e71df 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -58,6 +58,8 @@ Documentation By `Tom Nicholas `_. - Added ``Terminology`` page. (:pull:`174`) By `Tom Nicholas `_. +- Added page on ``Working with Hierarchical Data`` (:pull:`179`) + By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ From 8d6255a58a4f26b516117966a49f3bc69faaa951 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 18:07:14 -0500 Subject: [PATCH 197/260] add py.typed file --- xarray/datatree_/datatree/py.typed | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 xarray/datatree_/datatree/py.typed diff --git a/xarray/datatree_/datatree/py.typed b/xarray/datatree_/datatree/py.typed new file mode 100644 index 00000000000..e69de29bb2d From 5d4242d7f5d4e9e671188317210c06fdf3b0aa05 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Thu, 5 Jan 2023 11:28:16 -0500 Subject: [PATCH 198/260] Add content to Index page https://github.com/xarray-contrib/datatree/pull/182 * Joe's suggestions for index page * whatsnew --- xarray/datatree_/docs/source/index.rst | 33 +++++++++++++++++++++- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index e0e39de7d18..9fd21c95de5 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -3,6 +3,38 @@ Datatree **Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.** +Why Datatree? +~~~~~~~~~~~~~ + +Datatree was born after the xarray team recognised a `need for a new hierarchical data structure `_, +that was more flexible than a single :py:class:`xarray.Dataset` object. +The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, +but :py:class:`~datatree.DataTree` objects have many other uses. + +You might want to use datatree for: + +- Organising many related datasets, e.g. results of the same experiment with different parameters, or simulations of the same system using different models, +- Analysing similar data at multiple resolutions simultaneously, such as when doing a convergence study, +- Comparing heterogenous but related data, such as experimental and theoretical data, +- I/O with nested data formats such as netCDF / Zarr groups. + +Development Roadmap +~~~~~~~~~~~~~~~~~~~ + +Datatree currently lives in a separate repository to the main xarray package. +This allows the datatree developers to make changes to it, experiment, and improve it faster. + +Eventually we plan to fully integrate datatree upstream into xarray's main codebase, at which point the `github.com/xarray-contrib/datatree `_ repository will be archived. 
+This should not cause much disruption to code that depends on datatree - you will likely only have to change the import line (i.e. from ``from datatree import DataTree`` to ``from xarray import DataTree``). + +However, until this full integration occurs, datatree's API should not be considered to have the same `level of stability as xarray's `_. + +User Feedback +~~~~~~~~~~~~~ + +We really really really want to hear your opinions on datatree! +At this point in development, user feedback is critical to help us create something that will suit everyone's needs. +Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the `github issue tracker `_. .. toctree:: :maxdepth: 2 @@ -18,7 +50,6 @@ Datatree Terminology How do I ... Contributing Guide - Development Roadmap What's New GitHub repository diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 0d59e0e71df..93fad0e1940 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -60,6 +60,8 @@ Documentation By `Tom Nicholas `_. - Added page on ``Working with Hierarchical Data`` (:pull:`179`) By `Tom Nicholas `_. +- Added context content to ``Index`` page (:pull:`182`) + By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ From 5eec9cc77277749264f24180158d067878052d66 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Jan 2023 11:47:41 -0500 Subject: [PATCH 199/260] fix syntax of lots of API references --- xarray/datatree_/docs/source/data-structures.rst | 16 ++++++++-------- .../datatree_/docs/source/hierarchical-data.rst | 3 ++- xarray/datatree_/docs/source/io.rst | 16 ++++++++-------- 3 files changed, 18 insertions(+), 17 deletions(-) diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 4417e099132..42da3b0630e 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -23,8 +23,8 @@ Data Structures DataTree -------- -:py:class:``DataTree`` is xarray's highest-level data structure, able to organise heterogeneous data which -could not be stored inside a single ``Dataset`` object. This includes representing the recursive structure of multiple +:py:class:`DataTree` is xarray's highest-level data structure, able to organise heterogeneous data which +could not be stored inside a single :py:class:`Dataset` object. This includes representing the recursive structure of multiple `groups`_ within a netCDF file or `Zarr Store`_. .. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html @@ -128,8 +128,8 @@ It is at tree construction time that consistency checks are enforced. For instan Alternatively you can also create a ``DataTree`` object from - An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented), -- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using ``DataTree.from_dict()``, -- A netCDF or Zarr file on disk with ``open_datatree()``. See :ref:`reading and writing files `. +- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using :py:meth:`DataTree.from_dict()`, +- A netCDF or Zarr file on disk with :py:func:`open_datatree()`. See :ref:`reading and writing files `. 
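Of the alternative constructors in the list above, ``Dataset.to_node()`` is explicitly flagged as not yet implemented, so in practice the shortcuts are :py:meth:`DataTree.from_dict` and :py:func:`open_datatree`. A minimal ``from_dict`` sketch, reusing the example data from these docs:

    import numpy as np
    import xarray as xr
    from datatree import DataTree

    dt = DataTree.from_dict(
        {
            "/": xr.Dataset({"foo": "orange"}),
            "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}),
            "/a/b": xr.Dataset({"zed": np.NaN}),
            "a/c/d": None,  # the intermediate empty node "c" is created automatically
        }
    )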
DataTree Contents @@ -152,7 +152,7 @@ We can also access all the data in a single node through a dataset-like view This demonstrates the fact that the data in any one node is equivalent to the contents of a single ``xarray.Dataset`` object. The ``DataTree.ds`` property returns an immutable view, but we can instead extract the node's data contents as a new (and mutable) -``xarray.Dataset`` object via ``.to_dataset()``: +``xarray.Dataset`` object via :py:meth:`DataTree.to_dataset()`: .. ipython:: python @@ -184,10 +184,10 @@ For example, to create this example datatree from scratch, we could have written To change the variables in a node of a ``DataTree``, you can use all the standard dictionary methods, including ``values``, ``items``, ``__delitem__``, ``get`` and -:py:meth:`~xarray.DataTree.update`. +:py:meth:`DataTree.update`. Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will -:ref:`automatically align` the array(s) to the original node's indexes. +:ref:`automatically align ` the array(s) to the original node's indexes. -If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` it will copy the subtree, +If you copy a ``DataTree`` using the :py:func:`copy` function or the :py:meth:`DataTree.copy` method it will copy the subtree, meaning that node and children below it, but no parents above it. Like for ``Dataset``, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``. diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 85d392d0af9..2aad1dbf655 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -14,6 +14,7 @@ Working With Hierarchical Data np.random.seed(123456) np.set_printoptions(threshold=10) + Why Hierarchical Data? ---------------------- @@ -33,7 +34,7 @@ Often datasets like this cannot easily fit into a single ``xarray.Dataset`` obje or are more usefully thought of as groups of related ``xarray.Dataset`` objects. For this purpose we provide the :py:class:`DataTree` class. -This page explains in detail how to understand and use the different features of the :py:class:`DataTree` class for your own heirarchical data needs. +This page explains in detail how to understand and use the different features of the :py:class:`DataTree` class for your own hierarchical data needs. .. _node relationships: diff --git a/xarray/datatree_/docs/source/io.rst b/xarray/datatree_/docs/source/io.rst index 49f3faa76d2..dee4ba802f4 100644 --- a/xarray/datatree_/docs/source/io.rst +++ b/xarray/datatree_/docs/source/io.rst @@ -17,9 +17,9 @@ Groups ~~~~~~ Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded -as a single ``:py:class::DataTree`` object. -To open a whole netCDF file as a tree of groups use the ``:py:func::open_datatree()`` function. -To save a DataTree object as a netCDF file containing many groups, use the ``:py:meth::DataTree.to_netcdf()`` method. +as a single :py:class:`DataTree` object. +To open a whole netCDF file as a tree of groups use the :py:func:`open_datatree` function. +To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf` method. .. 
_netcdf.group.warning: @@ -30,7 +30,7 @@ To save a DataTree object as a netCDF file containing many groups, use the ``:py In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. This is in contrast to `xarray's data model `_ - (and hence :ref:`datatree's data model`) in which the dimensions of a (Dataset/Tree) + (and hence :ref:`datatree's data model `) in which the dimensions of a (Dataset/Tree) object are simply the set of dimensions present across all variables in that dataset. This means that if a netCDF file contains dimensions but no variables which possess those dimensions, @@ -43,10 +43,10 @@ Zarr Groups ~~~~~~ -Nested groups in zarr stores can be represented by loading the store as a ``:py:class::DataTree`` object, similarly to netCDF. -To open a whole zarr store as a tree of groups use the ``:py:func::open_datatree()`` function. -To save a DataTree object as a zarr store containing many groups, use the ``:py:meth::DataTree.to_zarr()`` method. +Nested groups in zarr stores can be represented by loading the store as a :py:class:`DataTree` object, similarly to netCDF. +To open a whole zarr store as a tree of groups use the :py:func:`open_datatree` function. +To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method. .. note:: - Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files`), + Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files `), as zarr does not support "unused" dimensions. From c3d1a277e69055c363c8e8d75eb1cb31660220f2 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Jan 2023 11:54:11 -0500 Subject: [PATCH 200/260] make it clear that set_close is missing right now --- xarray/datatree_/docs/source/api.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 23f903263e4..0ef996c8821 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -333,4 +333,5 @@ Relatively advanced API for users or developers looking to understand the intern .. + Missing: ``DataTree.set_close`` From 2b537881bf71a8f53ee6e9467c4a9d02063d774a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Jan 2023 11:55:29 -0500 Subject: [PATCH 201/260] link to numpy.ndarray class --- xarray/datatree_/docs/source/api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 0ef996c8821..feccdcfd121 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -239,7 +239,7 @@ Aggregate data in all nodes in the subtree simultaneously. ndarray methods =============== -Methods copied from `np.ndarray` objects, here applying to the data in all nodes in the subtree. +Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree. .. 
autosummary:: :toctree: generated/ From 18ccffe712ec55bfe476d24f4e48d565f48f8a62 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Jan 2023 11:58:59 -0500 Subject: [PATCH 202/260] fix :pull: in whatsnew --- xarray/datatree_/docs/source/conf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index e95bc2bc7e2..f310e35d40b 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -54,7 +54,7 @@ extlinks = { "issue": ("https://github.com/TomNicholas/datatree/issues/%s", "GH#"), - "pr": ("https://github.com/TomNicholas/datatree/pull/%s", "GH#"), + "pull": ("https://github.com/TomNicholas/datatree/pull/%s", "GH#"), } # Add any paths that contain templates here, relative to this directory. templates_path = ["_templates", sphinx_autosummary_accessors.templates_path] From 81a24ef2d42613f19f33fb7a2815640c335ef618 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Thu, 5 Jan 2023 12:02:37 -0500 Subject: [PATCH 203/260] shorten error messages --- xarray/datatree_/docs/source/data-structures.rst | 2 ++ xarray/datatree_/docs/source/hierarchical-data.rst | 1 + 2 files changed, 3 insertions(+) diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 42da3b0630e..37bfb12cf30 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -14,6 +14,8 @@ Data Structures np.random.seed(123456) np.set_printoptions(threshold=10) + %xmode minimal + .. note:: This page builds on the information given in xarray's main page on diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 2aad1dbf655..899423b4b49 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -14,6 +14,7 @@ Working With Hierarchical Data np.random.seed(123456) np.set_printoptions(threshold=10) + %xmode minimal Why Hierarchical Data? ---------------------- From 31f74abd5389dcc02443080c82ec506a7382eb71 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 6 Jan 2023 11:23:05 -0500 Subject: [PATCH 204/260] implement filter --- xarray/datatree_/datatree/datatree.py | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 85cec7d9605..a1bcf02fdd0 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1102,6 +1102,28 @@ def identical(self, other: DataTree, from_root=True) -> bool: for node, other_node in zip(self.subtree, other.subtree) ) + def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree: + """ + Filter nodes according to a specified condition. + + Returns a new tree containing only the nodes in the original tree for which `fitlerfunc(node)` is True. + Will also contain empty nodes at intermediate positions if required to support leaves. + + Parameters + ---------- + filterfunc: function + A function which accepts only one DataTree - the node on which filterfunc will be called. 
+ + See Also + -------- + pipe + map_over_subtree + """ + filtered_nodes = { + node.path: node.ds for node in self.subtree if filterfunc(node) + } + return DataTree.from_dict(filtered_nodes, name=self.root.name) + def map_over_subtree( self, func: Callable, From f7546abf6262829d04a42d3622ebf78ae12f3a73 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 6 Jan 2023 11:26:53 -0500 Subject: [PATCH 205/260] test filter --- .../datatree_/datatree/tests/test_datatree.py | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4a6fb8bff59..9527b88a5c4 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -596,3 +596,28 @@ def f(x, tree, y): actual = dt.pipe((f, "tree"), **attrs) assert actual is dt and actual.attrs == attrs + + +class TestSubset: + def test_filter(self): + simpsons = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + "/Homer/Bart": xr.Dataset({"age": 10}), + "/Homer/Lisa": xr.Dataset({"age": 8}), + "/Homer/Maggie": xr.Dataset({"age": 1}), + }, + name="Abe", + ) + expected = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + }, + name="Abe", + ) + elders = simpsons.filter(lambda node: node["age"] > 18) + dtt.assert_identical(elders, expected) From 99a212e4e483fda545445697888d39bdf7138c48 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 6 Jan 2023 11:34:09 -0500 Subject: [PATCH 206/260] Add filter method https://github.com/xarray-contrib/datatree/pull/185 * implement filter * test filter * whatsnew * add filter to API docs --- xarray/datatree_/datatree/datatree.py | 22 ++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 25 +++++++++++++++++++ xarray/datatree_/docs/source/api.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 2 ++ 4 files changed, 50 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 85cec7d9605..a1bcf02fdd0 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1102,6 +1102,28 @@ def identical(self, other: DataTree, from_root=True) -> bool: for node, other_node in zip(self.subtree, other.subtree) ) + def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree: + """ + Filter nodes according to a specified condition. + + Returns a new tree containing only the nodes in the original tree for which `fitlerfunc(node)` is True. + Will also contain empty nodes at intermediate positions if required to support leaves. + + Parameters + ---------- + filterfunc: function + A function which accepts only one DataTree - the node on which filterfunc will be called. 
+ + See Also + -------- + pipe + map_over_subtree + """ + filtered_nodes = { + node.path: node.ds for node in self.subtree if filterfunc(node) + } + return DataTree.from_dict(filtered_nodes, name=self.root.name) + def map_over_subtree( self, func: Callable, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4a6fb8bff59..9527b88a5c4 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -596,3 +596,28 @@ def f(x, tree, y): actual = dt.pipe((f, "tree"), **attrs) assert actual is dt and actual.attrs == attrs + + +class TestSubset: + def test_filter(self): + simpsons = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + "/Homer/Bart": xr.Dataset({"age": 10}), + "/Homer/Lisa": xr.Dataset({"age": 8}), + "/Homer/Maggie": xr.Dataset({"age": 1}), + }, + name="Abe", + ) + expected = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + }, + name="Abe", + ) + elders = simpsons.filter(lambda node: node["age"] > 18) + dtt.assert_identical(elders, expected) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index feccdcfd121..835b18d4832 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -99,6 +99,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure. DataTree.find_common_ancestor map_over_subtree DataTree.pipe + DataTree.filter DataTree Contents ----------------- diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 93fad0e1940..27ba0eb5546 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -31,6 +31,8 @@ New Features By `Tom Nicholas `_. - Added a :py:meth:`DataTree.leaves` property (:pull:`177`). By `Tom Nicholas `_. +- Added a :py:meth:`DataTree.filter` method (:pull:`184`). + By `Tom Nicholas `_. 
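A condensed version of the new test shows the intended use: the ``filterfunc`` predicate is evaluated on every node, and the returned tree keeps only the matching nodes (plus any empty intermediate nodes needed to reach them):

    import xarray as xr
    from datatree import DataTree

    simpsons = DataTree.from_dict(
        d={
            "/": xr.Dataset({"age": 83}),
            "/Herbert": xr.Dataset({"age": 40}),
            "/Homer": xr.Dataset({"age": 39}),
            "/Homer/Bart": xr.Dataset({"age": 10}),
            "/Homer/Lisa": xr.Dataset({"age": 8}),
        },
        name="Abe",
    )

    # keep only the nodes whose "age" variable exceeds 18
    elders = simpsons.filter(lambda node: node["age"] > 18)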
Breaking changes ~~~~~~~~~~~~~~~~ From 5522d90cf8b52ba8fcdf55baaadc91871a7bbab4 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 6 Jan 2023 12:02:20 -0500 Subject: [PATCH 207/260] Try to fix code links in docs https://github.com/xarray-contrib/datatree/pull/183 * try to fix code links in docs * add currentmodule datatree command to get links to api docs working * add some intersphinx links to xarray API * whatsnew --- xarray/datatree_/ci/doc.yml | 2 ++ xarray/datatree_/docs/source/conf.py | 18 ++++++++++++++---- .../datatree_/docs/source/data-structures.rst | 2 ++ .../docs/source/hierarchical-data.rst | 4 +++- xarray/datatree_/docs/source/index.rst | 2 ++ xarray/datatree_/docs/source/installation.rst | 2 ++ xarray/datatree_/docs/source/io.rst | 2 ++ .../datatree_/docs/source/quick-overview.rst | 6 ++++-- xarray/datatree_/docs/source/terminology.rst | 1 + xarray/datatree_/docs/source/tutorial.rst | 2 ++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 11 files changed, 36 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index fc9baeb06ac..ce502c0f5f5 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -11,6 +11,8 @@ dependencies: - sphinx-panels - sphinx-autosummary-accessors - sphinx-book-theme >= 0.0.38 + - nbsphinx + - sphinxcontrib-srclinks - pydata-sphinx-theme>=0.4.3 - numpydoc - ipython diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index f310e35d40b..06eb6d9d62b 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -13,6 +13,7 @@ # All configuration values have a default; values that are commented out # serve to show the default. +import inspect import os import sys @@ -41,6 +42,7 @@ "numpydoc", "sphinx.ext.autodoc", "sphinx.ext.viewcode", + "sphinx.ext.linkcode", "sphinx.ext.autosummary", "sphinx.ext.intersphinx", "sphinx.ext.extlinks", @@ -50,6 +52,8 @@ "sphinx_autosummary_accessors", "IPython.sphinxext.ipython_console_highlighting", "IPython.sphinxext.ipython_directive", + "nbsphinx", + "sphinxcontrib.srclinks", ] extlinks = { @@ -76,6 +80,11 @@ copyright = "2021 onwards, Tom Nicholas and its Contributors" author = "Tom Nicholas" +html_show_sourcelink = True +srclink_project = "https://github.com/xarray-contrib/datatree" +srclink_branch = "main" +srclink_src_path = "docs/source" + # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. 
@@ -127,6 +136,7 @@ intersphinx_mapping = { "python": ("https://docs.python.org/3.8/", None), + "numpy": ("https://numpy.org/doc/stable", None), "xarray": ("https://xarray.pydata.org/en/stable/", None), } @@ -142,7 +152,7 @@ html_theme_options = { "repository_url": "https://github.com/xarray-contrib/datatree", "repository_branch": "main", - "path_to_docs": "doc", + "path_to_docs": "docs/source", "use_repository_button": True, "use_issues_button": True, "use_edit_page_button": True, @@ -334,12 +344,12 @@ def linkcode_resolve(domain, info): else: linespec = "" - fn = os.path.relpath(fn, start=os.path.dirname(xarray.__file__)) + fn = os.path.relpath(fn, start=os.path.dirname(datatree.__file__)) - if "+" in xarray.__version__: + if "+" in datatree.__version__: return f"https://github.com/xarray-contrib/datatree/blob/main/datatree/{fn}{linespec}" else: return ( f"https://github.com/xarray-contrib/datatree/blob/" - f"v{datatree.__version__}/xarray/{fn}{linespec}" + f"v{datatree.__version__}/datatree/{fn}{linespec}" ) diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 37bfb12cf30..23dd8edf315 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + .. _data structures: Data Structures diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 899423b4b49..ac20b53c6f5 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + .. _hierarchical-data: Working With Hierarchical Data @@ -31,7 +33,7 @@ Examples of data which one might want organise in a grouped or hierarchical mann or even any combination of the above. -Often datasets like this cannot easily fit into a single ``xarray.Dataset`` object, +Often datasets like this cannot easily fit into a single :py:class:`xarray.Dataset` object, or are more usefully thought of as groups of related ``xarray.Dataset`` objects. For this purpose we provide the :py:class:`DataTree` class. diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index 9fd21c95de5..d13a0edf798 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + Datatree ======== diff --git a/xarray/datatree_/docs/source/installation.rst b/xarray/datatree_/docs/source/installation.rst index 6cab417e950..b2682743ade 100644 --- a/xarray/datatree_/docs/source/installation.rst +++ b/xarray/datatree_/docs/source/installation.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + ============ Installation ============ diff --git a/xarray/datatree_/docs/source/io.rst b/xarray/datatree_/docs/source/io.rst index dee4ba802f4..2f2dabf9948 100644 --- a/xarray/datatree_/docs/source/io.rst +++ b/xarray/datatree_/docs/source/io.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + .. _io: Reading and Writing Files diff --git a/xarray/datatree_/docs/source/quick-overview.rst b/xarray/datatree_/docs/source/quick-overview.rst index 5ec2194a190..4743b0899fa 100644 --- a/xarray/datatree_/docs/source/quick-overview.rst +++ b/xarray/datatree_/docs/source/quick-overview.rst @@ -1,3 +1,5 @@ +.. 
currentmodule:: datatree + ############## Quick overview ############## @@ -5,8 +7,8 @@ Quick overview DataTrees --------- -:py:class:`DataTree` is a tree-like container of ``DataArray`` objects, organised into multiple mutually alignable groups. -You can think of it like a (recursive) ``dict`` of ``Dataset`` objects. +:py:class:`DataTree` is a tree-like container of :py:class:`xarray.DataArray` objects, organised into multiple mutually alignable groups. +You can think of it like a (recursive) ``dict`` of :py:class:`xarray.Dataset` objects. Let's first make some example xarray datasets (following on from xarray's `quick overview `_ page): diff --git a/xarray/datatree_/docs/source/terminology.rst b/xarray/datatree_/docs/source/terminology.rst index a6b1cc8f2de..e481a01a6b2 100644 --- a/xarray/datatree_/docs/source/terminology.rst +++ b/xarray/datatree_/docs/source/terminology.rst @@ -1,4 +1,5 @@ .. currentmodule:: datatree + .. _terminology: This page extends `xarray's page on terminology `_. diff --git a/xarray/datatree_/docs/source/tutorial.rst b/xarray/datatree_/docs/source/tutorial.rst index e70044c2aa9..6e33bd36f91 100644 --- a/xarray/datatree_/docs/source/tutorial.rst +++ b/xarray/datatree_/docs/source/tutorial.rst @@ -1,3 +1,5 @@ +.. currentmodule:: datatree + ======== Tutorial ======== diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 27ba0eb5546..ed099fd9bed 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -52,6 +52,8 @@ Bug fixes - Fix bug with :py:meth:`DataTree.relative_to` method (:issue:`133`, :pull:`160`). By `Tom Nicholas `_. +- Fix links to API docs in all documentation (:pull:`183`). + By `Tom Nicholas `_. Documentation ~~~~~~~~~~~~~ From 2b5c7d99e6b1f912be278ac41681ffaf6102fa82 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 6 Jan 2023 15:21:52 -0500 Subject: [PATCH 208/260] Update readme https://github.com/xarray-contrib/datatree/pull/187 * implement filter * test filter * update readme with content from docs index page * features heading * whatsnew --- xarray/datatree_/README.md | 34 ++++++++++++++++++++-- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 2 files changed, 34 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index a770fc27b3e..4ab9e95f098 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -7,10 +7,23 @@ | **License** | [![License][license-badge]][repo-link] | -WIP implementation of a tree-like hierarchical data structure for xarray. +**Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.** -This aims to create the data structure discussed in [xarray issue #4118](https://github.com/pydata/xarray/issues/4118), and therefore extend xarray's data model to be able to [handle arbitrarily nested netCDF4 groups](https://github.com/pydata/xarray/issues/1092#issuecomment-868324949). +Datatree was born after the xarray team recognised a [need for a new hierarchical data structure](https://github.com/pydata/xarray/issues/4118), +that was more flexible than a single `xarray.Dataset` object. +The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, +but `datatree.DataTree` objects have many other uses. +### Why Datatree? + +You might want to use datatree for: + +- Organising many related datasets, e.g. 
results of the same experiment with different parameters, or simulations of the same system using different models, +- Analysing similar data at multiple resolutions simultaneously, such as when doing a convergence study, +- Comparing heterogenous but related data, such as experimental and theoretical data, +- I/O with nested data formats such as netCDF / Zarr groups. + +### Features The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. The differences are that this effort: - Uses a node structure inspired by [anytree](https://github.com/xarray-contrib/datatree/issues/7) for the tree, @@ -21,6 +34,8 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis - Has a printable representation that currently looks like this: drawing +### Get Started + You can create a `DataTree` object in 3 ways: 1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. 2) Using the init method of `DataTree`, which creates an individual node. @@ -28,6 +43,21 @@ You can create a `DataTree` object in 3 ways: or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`. 3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. +### Development Roadmap + +Datatree currently lives in a separate repository to the main xarray package. +This allows the datatree developers to make changes to it, experiment, and improve it faster. + +Eventually we plan to fully integrate datatree upstream into xarray's main codebase, at which point the [github.com/xarray-contrib/datatree](https://github.com/xarray-contrib/datatree>) repository will be archived. +This should not cause much disruption to code that depends on datatree - you will likely only have to change the import line (i.e. from ``from datatree import DataTree`` to ``from xarray import DataTree``). + +However, until this full integration occurs, datatree's API should not be considered to have the same [level of stability as xarray's](https://docs.xarray.dev/en/stable/contributing.html#backwards-compatibility). + +### User Feedback + +We really really really want to hear your opinions on datatree! +At this point in development, user feedback is critical to help us create something that will suit everyone's needs. +Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the [github issue tracker](https://github.com/xarray-contrib/datatree/issues). [github-ci-badge]: https://img.shields.io/github/workflow/status/xarray-contrib/datatree/CI?label=CI&logo=github diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index ed099fd9bed..1bab66237b0 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -66,6 +66,8 @@ Documentation By `Tom Nicholas `_. - Added context content to ``Index`` page (:pull:`182`) By `Tom Nicholas `_. +- Updated the README (:pull:`187`) + By `Tom Nicholas `_. 
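In code, the three creation routes listed in the README's "Get Started" section look roughly like this (the netCDF filename below is only a placeholder):

    import xarray as xr
    from datatree import DataTree

    # 1. from a netCDF file or Zarr store that has groups (placeholder filename):
    # from datatree import open_datatree
    # dt = open_datatree("existing_file.nc")

    # 2. node by node, attaching children via filesystem-like paths
    dt = DataTree(name="root", data=xr.Dataset({"foo": 0}))
    dt["path/to/node"] = DataTree()

    # 3. from a dictionary mapping paths to datasets (None gives an empty node)
    dt2 = DataTree.from_dict({"/": xr.Dataset({"foo": 0}), "/a": None})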
Internal Changes ~~~~~~~~~~~~~~~~ From 9986c5ac81ed9fc2827a3d0da43b4c8b9e4e3798 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Mon, 9 Jan 2023 10:47:59 -0500 Subject: [PATCH 209/260] blank whatsnew for v0.0.12 --- xarray/datatree_/docs/source/whats-new.rst | 33 +++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 1bab66237b0..238487c1043 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,11 +15,42 @@ What's New np.random.seed(123456) +.. _whats-new.v0.0.12: + +v0.0.12 (unreleased) +-------------------- + +New Features +~~~~~~~~~~~~ + + +Breaking changes +~~~~~~~~~~~~~~~~ + + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + + +Documentation +~~~~~~~~~~~~~ + + +Internal Changes +~~~~~~~~~~~~~~~~ + + .. _whats-new.v0.0.11: -v0.0.11 (unreleased) +v0.0.11 (01/09/2023) -------------------- +Big update with entirely new pages in the docs, +new methods (``.drop_nodes``, ``.filter``, ``.leaves``, ``.descendants``), and bug fixes! + New Features ~~~~~~~~~~~~ From d954320deb98edd1d17f5f92b76e9dc93bee1c59 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 10 Jan 2023 11:45:14 -0500 Subject: [PATCH 210/260] Add link to AMS 2023 slides to readme --- xarray/datatree_/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 4ab9e95f098..3d9aca29b8b 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -23,6 +23,8 @@ You might want to use datatree for: - Comparing heterogenous but related data, such as experimental and theoretical data, - I/O with nested data formats such as netCDF / Zarr groups. +[**Talk slides on Datatree from AMS-python 2023**](https://speakerdeck.com/tomnicholas/xarray-datatree-hierarchical-data-structures-for-multi-model-science) + ### Features The approach used here is based on benbovy's [`DatasetNode` example](https://gist.github.com/benbovy/92e7c76220af1aaa4b3a0b65374e233a) - the basic idea is that each tree node wraps a up to a single `xarray.Dataset`. The differences are that this effort: From c3d5e7b18fda1ccba15090fea459524a522a1009 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 10 Jan 2023 11:15:41 -0700 Subject: [PATCH 211/260] .to_dataset in map over subtree https://github.com/xarray-contrib/datatree/pull/194 * tests * use to_dataset instead of DatasetView in map_over_subtree * no longer forbid initialising a DatasetView with init * whatsnew --- xarray/datatree_/datatree/datatree.py | 10 +---- xarray/datatree_/datatree/mapping.py | 11 ++--- .../datatree_/datatree/tests/test_datatree.py | 16 +++++++ .../datatree_/datatree/tests/test_mapping.py | 44 +++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 2 + 5 files changed, 70 insertions(+), 13 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index a1bcf02fdd0..9a416d8f562 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -98,6 +98,8 @@ class DatasetView(Dataset): This includes all API on Dataset, which will be inherited. This requires overriding all inherited private constructors. + + We leave the public init constructor because it is used by type() in some xarray code (see datatree GH issue #188) """ # TODO what happens if user alters (in-place) a DataArray they extracted from this object? 
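A side note on the change above: because `map_over_subtree` now hands each node's contents to the mapped function as a plain `xarray.Dataset` (via `.to_dataset()`) rather than a `DatasetView`, functions that rebuild objects with `type()`, such as `.weighted`, can be mapped over a tree. A minimal sketch with invented data, mirroring the weighted-mean test added later in this patch:

```python
import numpy as np
import xarray as xr
from datatree import DataTree

# Two made-up groups holding the same kind of data
ds = xr.Dataset(
    {"data": (("x", "y"), np.random.rand(3, 4))},
    coords={"area": (("x", "y"), np.random.rand(3, 4))},
)
dt = DataTree.from_dict({"/a": ds, "/b": ds})


def weighted_mean(ds: xr.Dataset) -> xr.Dataset:
    # .weighted constructs its result via type(ds)(...), which is why each
    # node is passed in as a real Dataset here
    return ds.weighted(ds.area).mean(["x", "y"])


result = dt.map_over_subtree(weighted_mean)
print(result)
```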
@@ -113,14 +115,6 @@ class DatasetView(Dataset): "_variables", ) - def __init__( - self, - data_vars: Optional[Mapping[Any, Any]] = None, - coords: Optional[Mapping[Any, Any]] = None, - attrs: Optional[Mapping[Any, Any]] = None, - ): - raise AttributeError("DatasetView objects are not to be initialized directly") - @classmethod def _from_node( cls, diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 344842b7b49..5f43af961ca 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -189,14 +189,15 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: *args_as_tree_length_iterables, *list(kwargs_as_tree_length_iterables.values()), ): - node_args_as_datasetviews = [ - a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args] + node_args_as_datasets = [ + a.to_dataset() if isinstance(a, DataTree) else a + for a in all_node_args[:n_args] ] - node_kwargs_as_datasetviews = dict( + node_kwargs_as_datasets = dict( zip( [k for k in kwargs_as_tree_length_iterables.keys()], [ - v.ds if isinstance(v, DataTree) else v + v.to_dataset() if isinstance(v, DataTree) else v for v in all_node_args[n_args:] ], ) @@ -204,7 +205,7 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: # Now we can call func on the data in this particular set of corresponding nodes results = ( - func(*node_args_as_datasetviews, **node_kwargs_as_datasetviews) + func(*node_args_as_datasets, **node_kwargs_as_datasets) if not node_of_first_tree.is_empty else None ) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 9527b88a5c4..4dd8ac3f109 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -531,6 +531,22 @@ def test_arithmetic(self, create_test_datatree): result = 10.0 * dt["set1"].ds assert result.identical(expected) + def test_init_via_type(self): + # from datatree GH issue https://github.com/xarray-contrib/datatree/issues/188 + # xarray's .weighted is unusual because it uses type() to create a Dataset/DataArray + + a = xr.DataArray( + np.random.rand(3, 4, 10), + dims=["x", "y", "time"], + coords={"area": (["x", "y"], np.random.rand(3, 4))}, + ).to_dataset(name="data") + dt = DataTree(data=a) + + def weighted_mean(ds): + return ds.weighted(ds.area).mean(["x", "y"]) + + weighted_mean(dt.ds) + class TestRestructuring: def test_drop_nodes(self): diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 9714233a9d9..47978edad5b 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -1,3 +1,4 @@ +import numpy as np import pytest import xarray as xr @@ -252,6 +253,49 @@ def times_ten(ds): assert_equal(result_tree, expected, from_root=False) +class TestMutableOperations: + def test_construct_using_type(self): + # from datatree GH issue https://github.com/xarray-contrib/datatree/issues/188 + # xarray's .weighted is unusual because it uses type() to create a Dataset/DataArray + + a = xr.DataArray( + np.random.rand(3, 4, 10), + dims=["x", "y", "time"], + coords={"area": (["x", "y"], np.random.rand(3, 4))}, + ).to_dataset(name="data") + b = xr.DataArray( + np.random.rand(2, 6, 14), + dims=["x", "y", "time"], + coords={"area": (["x", "y"], np.random.rand(2, 6))}, + ).to_dataset(name="data") + dt = DataTree.from_dict({"a": a, "b": b}) + + def 
weighted_mean(ds): + return ds.weighted(ds.area).mean(["x", "y"]) + + dt.map_over_subtree(weighted_mean) + + def test_alter_inplace(self): + simpsons = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + "/Homer/Bart": xr.Dataset({"age": 10}), + "/Homer/Lisa": xr.Dataset({"age": 8}), + "/Homer/Maggie": xr.Dataset({"age": 1}), + }, + name="Abe", + ) + + def fast_forward(ds: xr.Dataset, years: float) -> xr.Dataset: + """Add some years to the age, but by altering the given dataset""" + ds["age"] = ds["age"] + years + return ds + + simpsons.map_over_subtree(fast_forward, years=10) + + @pytest.mark.xfail class TestMapOverSubTreeInplace: def test_map_over_subtree_inplace(self): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 238487c1043..f4295bb4292 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -34,6 +34,8 @@ Deprecations Bug fixes ~~~~~~~~~ +- Allow for altering of given dataset inside function called by :py:func:`map_over_subtree` (:issue:`188`, :pull:`194`). + By `Tom Nicholas `_. Documentation ~~~~~~~~~~~~~ From 1899ad57eb154cb610c300d5b451065acb77c16f Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Thu, 19 Jan 2023 10:14:53 +0100 Subject: [PATCH 212/260] update the badge route https://github.com/xarray-contrib/datatree/pull/202 * update the badge url * actually use new url --- xarray/datatree_/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 3d9aca29b8b..9fb01a4439e 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -62,7 +62,7 @@ At this point in development, user feedback is critical to help us create someth Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the [github issue tracker](https://github.com/xarray-contrib/datatree/issues). 
-[github-ci-badge]: https://img.shields.io/github/workflow/status/xarray-contrib/datatree/CI?label=CI&logo=github +[github-ci-badge]: https://img.shields.io/github/actions/workflow/status/xarray-contrib/datatree/main.yaml?branch=main&label=CI&logo=github [github-ci-link]: https://github.com/xarray-contrib/datatree/actions?query=workflow%3ACI [codecov-badge]: https://img.shields.io/codecov/c/github/xarray-contrib/datatree.svg?logo=codecov [codecov-link]: https://codecov.io/gh/xarray-contrib/datatree From d257806c4a11c940a9651c917c07ec828dca3b98 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 20 Jan 2023 08:37:18 -0700 Subject: [PATCH 213/260] Add level, depth & width properties https://github.com/xarray-contrib/datatree/pull/208 * tests * implementation * add to API * actually add to API * whatsnew --- .../datatree_/datatree/tests/test_treenode.py | 12 ++++ xarray/datatree_/datatree/treenode.py | 55 +++++++++++++++++++ xarray/datatree_/docs/source/api.rst | 3 + xarray/datatree_/docs/source/whats-new.rst | 3 + 4 files changed, 73 insertions(+) diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index a996468b367..935afe3948b 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -338,6 +338,18 @@ def test_leaves(self): for node, expected_name in zip(leaves, expected): assert node.name == expected_name + def test_levels(self): + a, f = create_test_tree() + + assert a.level == 0 + assert f.level == 3 + + assert a.depth == 3 + assert f.depth == 3 + + assert a.width == 1 + assert f.width == 3 + class TestRenderTree: def test_render_nodetree(self): diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 2d618951ec4..60a4556dd96 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -334,6 +334,61 @@ def descendants(self: Tree) -> Tuple[Tree, ...]: this_node, *descendants = all_nodes return tuple(descendants) + @property + def level(self: Tree) -> int: + """ + Level of this node. + + Level means number of parent nodes above this node before reaching the root. + The root node is at level 0. + + Returns + ------- + level : int + + See Also + -------- + depth + width + """ + return len(self.ancestors) - 1 + + @property + def depth(self: Tree) -> int: + """ + Maximum level of this tree. + + Measured from the root, which has a depth of 0. + + Returns + ------- + depth : int + + See Also + -------- + level + width + """ + return max(node.level for node in self.root.subtree) + + @property + def width(self: Tree) -> int: + """ + Number of nodes at this level in the tree. + + Includes number of immediate siblings, but also "cousins" in other branches and so-on. + + Returns + ------- + depth : int + + See Also + -------- + level + depth + """ + return len([node for node in self.root.subtree if node.level == self.level]) + def _pre_detach(self: Tree, parent: Tree) -> None: """Method call before detaching from `parent`.""" pass diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 835b18d4832..bbe1d7aaf9f 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -31,6 +31,9 @@ Attributes relating to the recursive tree-like structure of a ``DataTree``. 
DataTree.is_root DataTree.is_leaf DataTree.leaves + DataTree.level + DataTree.depth + DataTree.width DataTree.subtree DataTree.descendants DataTree.siblings diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index f4295bb4292..d904947ebe9 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,9 @@ v0.0.12 (unreleased) New Features ~~~~~~~~~~~~ +- Added a :py:func:`DataTree.level`, :py:func:`DataTree.depth`, and :py:func:`DataTree.width` property (:pull:`208`). + By `Tom Nicholas `_. + Breaking changes ~~~~~~~~~~~~~~~~ From f0720ad6e88b59680f897de705ac50a067408d21 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 25 Jan 2023 13:04:41 -0700 Subject: [PATCH 214/260] Attribute-like access and ipython autocomplete https://github.com/xarray-contrib/datatree/pull/98 * sketching out changes needed to integrate variables into DataTree * fixed some other basic conflicts * fix mypy errors * can create basic datatree node objects again * child-variable name collisions dectected correctly * in-progres * add _replace method * updated tests to assert identical instead of check .ds is expected_ds * refactor .ds setter to use _replace * refactor init to use _replace * refactor test tree to avoid init * attempt at copy methods * rewrote implementation of .copy method * xfailing test for deepcopying * attribute-like access * test for accessing attrs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * pseudocode implementation of DatasetView * Revert "pseudocode implementation of DatasetView" This reverts commit 52ef23baaa4b6892cad2d69c61b43db831044630. * removed duplicated implementation of copy * reorganise API docs * expose data_vars, coords etc. 
properties * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try except with calculate_dimensions private import * add keys/values/items methods * don't use has_data when .variables would do * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * attribute-like access now working * include attribute-like access in docs * silence warning about slots * should have been in last commit * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew * make typing compatible with 3.8 * got string file path completion working * test ipython key completions * note about supporting auto-completion of relative paths in future Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/common.py | 105 ++++++++++++++++++ xarray/datatree_/datatree/datatree.py | 90 +++++++++++++-- .../datatree_/datatree/tests/test_datatree.py | 38 +++++++ xarray/datatree_/docs/source/api.rst | 1 - .../docs/source/hierarchical-data.rst | 7 +- xarray/datatree_/docs/source/whats-new.rst | 6 +- 6 files changed, 230 insertions(+), 17 deletions(-) create mode 100644 xarray/datatree_/datatree/common.py diff --git a/xarray/datatree_/datatree/common.py b/xarray/datatree_/datatree/common.py new file mode 100644 index 00000000000..e4d52925ede --- /dev/null +++ b/xarray/datatree_/datatree/common.py @@ -0,0 +1,105 @@ +""" +This file and class only exists because it was easier to copy the code for AttrAccessMixin from xarray.core.common +with some slight modifications than it was to change the behaviour of an inherited xarray internal here. + +The modifications are marked with # TODO comments. +""" + +import warnings +from contextlib import suppress +from typing import Any, Hashable, Iterable, List, Mapping + + +class TreeAttrAccessMixin: + """Mixin class that allows getting keys with attribute access""" + + __slots__ = () + + def __init_subclass__(cls, **kwargs): + """Verify that all subclasses explicitly define ``__slots__``. If they don't, + raise error in the core xarray module and a FutureWarning in third-party + extensions. 
+ """ + if not hasattr(object.__new__(cls), "__dict__"): + pass + # TODO reinstate this once integrated upstream + # elif cls.__module__.startswith("datatree."): + # raise AttributeError(f"{cls.__name__} must explicitly define __slots__") + # else: + # cls.__setattr__ = cls._setattr_dict + # warnings.warn( + # f"xarray subclass {cls.__name__} should explicitly define __slots__", + # FutureWarning, + # stacklevel=2, + # ) + super().__init_subclass__(**kwargs) + + @property + def _attr_sources(self) -> Iterable[Mapping[Hashable, Any]]: + """Places to look-up items for attribute-style access""" + yield from () + + @property + def _item_sources(self) -> Iterable[Mapping[Hashable, Any]]: + """Places to look-up items for key-autocompletion""" + yield from () + + def __getattr__(self, name: str) -> Any: + if name not in {"__dict__", "__setstate__"}: + # this avoids an infinite loop when pickle looks for the + # __setstate__ attribute before the xarray object is initialized + for source in self._attr_sources: + with suppress(KeyError): + return source[name] + raise AttributeError( + f"{type(self).__name__!r} object has no attribute {name!r}" + ) + + # This complicated two-method design boosts overall performance of simple operations + # - particularly DataArray methods that perform a _to_temp_dataset() round-trip - by + # a whopping 8% compared to a single method that checks hasattr(self, "__dict__") at + # runtime before every single assignment. All of this is just temporary until the + # FutureWarning can be changed into a hard crash. + def _setattr_dict(self, name: str, value: Any) -> None: + """Deprecated third party subclass (see ``__init_subclass__`` above)""" + object.__setattr__(self, name, value) + if name in self.__dict__: + # Custom, non-slotted attr, or improperly assigned variable? + warnings.warn( + f"Setting attribute {name!r} on a {type(self).__name__!r} object. Explicitly define __slots__ " + "to suppress this warning for legitimate custom attributes and " + "raise an error when attempting variables assignments.", + FutureWarning, + stacklevel=2, + ) + + def __setattr__(self, name: str, value: Any) -> None: + """Objects with ``__slots__`` raise AttributeError if you try setting an + undeclared attribute. This is desirable, but the error message could use some + improvement. + """ + try: + object.__setattr__(self, name, value) + except AttributeError as e: + # Don't accidentally shadow custom AttributeErrors, e.g. + # DataArray.dims.setter + if str(e) != "{!r} object has no attribute {!r}".format( + type(self).__name__, name + ): + raise + raise AttributeError( + f"cannot set attribute {name!r} on a {type(self).__name__!r} object. Use __setitem__ style" + "assignment (e.g., `ds['name'] = ...`) instead of assigning variables." + ) from e + + def __dir__(self) -> List[str]: + """Provide method name lookup and completion. Only provide 'public' + methods. 
+ """ + extra_attrs = { + item + for source in self._attr_sources + for item in source + if isinstance(item, str) + } + return sorted(set(dir(type(self))) | extra_attrs) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 9a416d8f562..e51fa92902c 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -13,6 +13,7 @@ Hashable, Iterable, Iterator, + List, Mapping, MutableMapping, Optional, @@ -22,7 +23,6 @@ overload, ) -import pandas as pd from xarray.core import utils from xarray.core.coordinates import DatasetCoordinates from xarray.core.dataarray import DataArray @@ -30,10 +30,17 @@ from xarray.core.indexes import Index, Indexes from xarray.core.merge import dataset_update_method from xarray.core.options import OPTIONS as XR_OPTS -from xarray.core.utils import Default, Frozen, _default, either_dict_or_kwargs -from xarray.core.variable import Variable, calculate_dimensions +from xarray.core.utils import ( + Default, + Frozen, + HybridMappingProxy, + _default, + either_dict_or_kwargs, +) +from xarray.core.variable import Variable from . import formatting, formatting_html +from .common import TreeAttrAccessMixin from .mapping import TreeIsomorphismError, check_isomorphic, map_over_subtree from .ops import ( DataTreeArithmeticMixin, @@ -43,7 +50,14 @@ from .render import RenderTree from .treenode import NamedNode, NodePath, Tree +try: + from xarray.core.variable import calculate_dimensions +except ImportError: + # for xarray versions 2022.03.0 and earlier + from xarray.core.dataset import calculate_dimensions + if TYPE_CHECKING: + import pandas as pd from xarray.core.merge import CoercibleValue from xarray.core.types import ErrorOptions @@ -227,6 +241,7 @@ class DataTree( MappedDatasetMethodsMixin, MappedDataWithCoords, DataTreeArithmeticMixin, + TreeAttrAccessMixin, Generic[Tree], Mapping, ): @@ -236,21 +251,17 @@ class DataTree( Attempts to present an API like that of xarray.Dataset, but methods are wrapped to also update all the tree's child nodes. """ - # TODO attribute-like access for both vars and child nodes (by inheriting from xarray.core.common.AttrsAccessMixin?) - - # TODO ipython autocomplete for child nodes - # TODO Some way of sorting children by depth - # TODO Consistency in copying vs updating objects - # TODO do we need a watch out for if methods intended only for root nodes are called on non-root nodes? 
# TODO dataset methods which should not or cannot act over the whole tree, such as .to_array - # TODO del and delitem methods + # TODO .loc method + + # TODO a lot of properties like .variables could be defined in a DataMapping class which both Dataset and DataTree inherit from - # TODO .loc, __contains__, __iter__, __array__, __len__ + # TODO all groupby classes # TODO a lot of properties like .variables could be defined in a DataMapping class which both Dataset and DataTree inherit from @@ -271,6 +282,9 @@ class DataTree( _variables: Dict[Hashable, Variable] __slots__ = ( + "_name", + "_parent", + "_children", "_attrs", "_cache", "_coord_names", @@ -485,6 +499,51 @@ def sizes(self) -> Mapping[Hashable, int]: """ return self.dims + @property + def _attr_sources(self) -> Iterable[Mapping[Hashable, Any]]: + """Places to look-up items for attribute-style access""" + yield from self._item_sources + yield self.attrs + + @property + def _item_sources(self) -> Iterable[Mapping[Any, Any]]: + """Places to look-up items for key-completion""" + yield self.data_vars + yield HybridMappingProxy(keys=self._coord_names, mapping=self.coords) + + # virtual coordinates + yield HybridMappingProxy(keys=self.dims, mapping=self) + + # immediate child nodes + yield self.children + + def _ipython_key_completions_(self) -> List[str]: + """Provide method for the key-autocompletions in IPython. + See http://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion + For the details. + """ + + # TODO allow auto-completing relative string paths, e.g. `dt['path/to/../ node'` + # Would require changes to ipython's autocompleter, see https://github.com/ipython/ipython/issues/12420 + # Instead for now we only list direct paths to all node in subtree explicitly + + items_on_this_node = self._item_sources + full_file_like_paths_to_all_nodes_in_subtree = { + node.path[1:]: node for node in self.subtree + } + + all_item_sources = itertools.chain( + items_on_this_node, [full_file_like_paths_to_all_nodes_in_subtree] + ) + + items = { + item + for source in all_item_sources + for item in source + if isinstance(item, str) + } + return list(items) + def __contains__(self, key: object) -> bool: """The 'in' operator will return true or false depending on whether 'key' is either an array stored in the datatree or a child node, or neither. @@ -497,6 +556,14 @@ def __bool__(self) -> bool: def __iter__(self) -> Iterator[Hashable]: return itertools.chain(self.ds.data_vars, self.children) + def __array__(self, dtype=None): + raise TypeError( + "cannot directly convert a DataTree into a " + "numpy array. Instead, create an xarray.DataArray " + "first, either with indexing on the DataTree or by " + "invoking the `to_array()` method." + ) + def __repr__(self) -> str: return formatting.datatree_repr(self) @@ -966,6 +1033,7 @@ def __len__(self) -> int: @property def indexes(self) -> Indexes[pd.Index]: """Mapping of pandas.Index objects used for label based indexing. + Raises an error if this DataTree node has indexes that cannot be coerced to pandas.Index objects. 
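The `DataTree` changes above are what wire up attribute-style access and IPython tab-completion (`_attr_sources`, `_item_sources` and `_ipython_key_completions_`). A small sketch of the resulting user-facing behaviour, with invented variable and node names:

```python
import xarray as xr
from datatree import DataTree

dt = DataTree.from_dict(
    {
        "/": xr.Dataset({"foo": 0}),
        "/a": xr.Dataset({"bar": ("y", [1, 2, 3])}),
    }
)

# Attribute access falls back to data variables, coordinates, dimensions,
# child nodes and attrs
print(dt.foo)    # the root data variable "foo"
print(dt.a)      # the child node "a"
print(dt.a.bar)  # a variable stored on the child node

# The same sources (plus file-like paths to every node in the subtree)
# feed IPython's tab-completion
print(dt._ipython_key_completions_())
```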
diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4dd8ac3f109..4c23a09d504 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -113,6 +113,7 @@ class TestStoreDatasets: def test_create_with_data(self): dat = xr.Dataset({"a": 0}) john = DataTree(name="john", data=dat) + xrt.assert_identical(john.to_dataset(), dat) with pytest.raises(TypeError): @@ -122,7 +123,9 @@ def test_set_data(self): john = DataTree(name="john") dat = xr.Dataset({"a": 0}) john.ds = dat + xrt.assert_identical(john.to_dataset(), dat) + with pytest.raises(TypeError): john.ds = "junk" @@ -147,6 +150,7 @@ def test_assign_when_already_child_with_variables_name(self): dt.ds = xr.Dataset({"a": 0}) dt.ds = xr.Dataset() + new_ds = dt.to_dataset().assign(a=xr.DataArray(0)) with pytest.raises(KeyError, match="names would collide"): dt.ds = new_ds @@ -548,6 +552,40 @@ def weighted_mean(ds): weighted_mean(dt.ds) +class TestAccess: + def test_attribute_access(self, create_test_datatree): + dt = create_test_datatree() + + # vars / coords + for key in ["a", "set0"]: + xrt.assert_equal(dt[key], getattr(dt, key)) + assert key in dir(dt) + + # dims + xrt.assert_equal(dt["a"]["y"], getattr(dt.a, "y")) + assert "y" in dir(dt["a"]) + + # children + for key in ["set1", "set2", "set3"]: + dtt.assert_equal(dt[key], getattr(dt, key)) + assert key in dir(dt) + + # attrs + dt.attrs["meta"] = "NASA" + assert dt.attrs["meta"] == "NASA" + assert "meta" in dir(dt) + + def test_ipython_key_completions(self, create_test_datatree): + dt = create_test_datatree() + key_completions = dt._ipython_key_completions_() + + node_keys = [node.path[1:] for node in dt.subtree] + assert all(node_key in key_completions for node_key in node_keys) + + var_keys = list(dt.variables.keys()) + assert all(var_key in key_completions for var_key in var_keys) + + class TestRestructuring: def test_drop_nodes(self): sue = DataTree.from_dict({"Mary": None, "Kate": None, "Ashley": None}) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index bbe1d7aaf9f..9a34bdd0089 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -113,7 +113,6 @@ Manipulate the contents of all nodes in a tree simultaneously. :toctree: generated/ DataTree.copy - DataTree.assign_coords DataTree.merge DataTree.rename diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index ac20b53c6f5..66c7d51b453 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -263,7 +263,12 @@ This also means that the names of variables and of child nodes must be different Attribute-like access ~~~~~~~~~~~~~~~~~~~~~ -# TODO attribute-like access is not yet implemented, see issue https://github.com/xarray-contrib/datatree/issues/98 +You can also select both variables and child nodes through dot indexing + +.. ipython:: python + + dt.foo + dt.a .. _filesystem paths: diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index d904947ebe9..ed33c2b6550 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -25,12 +25,12 @@ New Features - Added a :py:func:`DataTree.level`, :py:func:`DataTree.depth`, and :py:func:`DataTree.width` property (:pull:`208`). By `Tom Nicholas `_. 
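The whats-new entry just above refers to the tree-structure properties added earlier in this series ("Add level, depth & width properties"). A hedged sketch of what they report, using a small invented layout:

```python
import xarray as xr
from datatree import DataTree

dt = DataTree.from_dict(
    {
        "/a": xr.Dataset(),
        "/a/b": xr.Dataset(),
        "/a/c": xr.Dataset(),
    }
)

node = dt["/a/b"]
print(node.level)  # 2: two ancestors sit above this node (the root is level 0)
print(node.depth)  # 2: the deepest level anywhere in this tree
print(node.width)  # 2: number of nodes at this node's level ("b" and "c")
```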
- +- Allow dot-style (or "attribute-like") access to child nodes and variables, with ipython autocomplete. (:issue:`189`, :pull:`98`) + By `Tom Nicholas `_. Breaking changes ~~~~~~~~~~~~~~~~ - Deprecations ~~~~~~~~~~~~ @@ -43,11 +43,9 @@ Bug fixes Documentation ~~~~~~~~~~~~~ - Internal Changes ~~~~~~~~~~~~~~~~ - .. _whats-new.v0.0.11: v0.0.11 (01/09/2023) From 0b9bde5dd93fbe71e6d00bac24632770d012e455 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 27 Jan 2023 10:06:23 -0700 Subject: [PATCH 215/260] Deprecate python 3.8 https://github.com/xarray-contrib/datatree/pull/214 * update requirements and envs * whatsnew * added classifier for 3.11 --- xarray/datatree_/.github/workflows/main.yaml | 4 ++-- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- xarray/datatree_/ci/doc.yml | 2 +- xarray/datatree_/ci/environment.yml | 2 +- xarray/datatree_/docs/source/whats-new.rst | 3 +++ xarray/datatree_/setup.cfg | 4 ++-- 6 files changed, 10 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index b18159aed50..cfced572d3a 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -20,7 +20,7 @@ jobs: shell: bash -l {0} strategy: matrix: - python-version: ["3.8", "3.9", "3.10"] + python-version: ["3.9", "3.10", "3.11"] steps: - uses: actions/checkout@v3 @@ -65,7 +65,7 @@ jobs: shell: bash -l {0} strategy: matrix: - python-version: ["3.8", "3.9", "3.10"] + python-version: ["3.9", "3.10", "3.11"] steps: - uses: actions/checkout@v3 diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 98860accf70..cf94c00b12c 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -25,7 +25,7 @@ jobs: - uses: actions/setup-python@v4 name: Install Python with: - python-version: 3.8 + python-version: 3.9 - name: Install dependencies run: | diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index ce502c0f5f5..69c9b1042db 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -3,7 +3,7 @@ channels: - conda-forge dependencies: - pip - - python>=3.8 + - python>=3.9 - netcdf4 - scipy - sphinx>=4.2.0 diff --git a/xarray/datatree_/ci/environment.yml b/xarray/datatree_/ci/environment.yml index 1aa9af93363..fc0c6d97e9f 100644 --- a/xarray/datatree_/ci/environment.yml +++ b/xarray/datatree_/ci/environment.yml @@ -3,7 +3,7 @@ channels: - conda-forge - nodefaults dependencies: - - python>=3.8 + - python>=3.9 - netcdf4 - pytest - flake8 diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index ed33c2b6550..0b163a0f0f0 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -34,6 +34,9 @@ Breaking changes Deprecations ~~~~~~~~~~~~ +- Dropped support for python 3.8 (:issue:`212`, :pull:`214`) + By `Tom Nicholas `_. 
+ Bug fixes ~~~~~~~~~ diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg index 2c7a052b197..48517c3be78 100644 --- a/xarray/datatree_/setup.cfg +++ b/xarray/datatree_/setup.cfg @@ -14,13 +14,13 @@ classifiers = License :: OSI Approved :: Apache Software License Operating System :: OS Independent Programming Language :: Python - Programming Language :: Python :: 3.8 Programming Language :: Python :: 3.9 Programming Language :: Python :: 3.10 + Programming Language :: Python :: 3.11 [options] packages = find: -python_requires = >=3.8 +python_requires = >=3.9 install_requires = xarray >=2022.6.0 From f36203e1f450c38598b62d43fed8bcc63b776a3a Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 31 Jan 2023 23:17:24 -0500 Subject: [PATCH 216/260] add py.typed to package_data --- xarray/datatree_/setup.cfg | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg index 48517c3be78..4066b2c3b86 100644 --- a/xarray/datatree_/setup.cfg +++ b/xarray/datatree_/setup.cfg @@ -20,6 +20,7 @@ classifiers = [options] packages = find: +package_data={'datatree': 'py.typed'} python_requires = >=3.9 install_requires = xarray >=2022.6.0 From d78309960f77cf77012d8d923ba78e823b1cfcaf Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Fri, 10 Feb 2023 15:01:23 +0100 Subject: [PATCH 217/260] move to a `pyproject.toml`-based build configuration https://github.com/xarray-contrib/datatree/pull/219 * move to a `pyproject.toml`-based configuration * move the `flake8` configuration to a separate file * move `isort` and `mypy` configuration to `pyproject.toml` * update the `isort` version to avoid the install error * install the currently checked-out version of datatree [skip-ci] * install xarray from conda-forge * fix the install path --- xarray/datatree_/.flake8 | 15 ++++++ xarray/datatree_/.git_archival.txt | 4 ++ xarray/datatree_/.pre-commit-config.yaml | 2 +- xarray/datatree_/ci/doc.yml | 4 +- xarray/datatree_/pyproject.toml | 51 ++++++++++++++++++-- xarray/datatree_/setup.cfg | 60 ------------------------ 6 files changed, 70 insertions(+), 66 deletions(-) create mode 100644 xarray/datatree_/.flake8 create mode 100644 xarray/datatree_/.git_archival.txt delete mode 100644 xarray/datatree_/setup.cfg diff --git a/xarray/datatree_/.flake8 b/xarray/datatree_/.flake8 new file mode 100644 index 00000000000..f1e3f9271e1 --- /dev/null +++ b/xarray/datatree_/.flake8 @@ -0,0 +1,15 @@ +[flake8] +ignore = + # whitespace before ':' - doesn't work well with black + E203 + # module level import not at top of file + E402 + # line too long - let black worry about that + E501 + # do not assign a lambda expression, use a def + E731 + # line break before binary operator + W503 +exclude= + .eggs + doc diff --git a/xarray/datatree_/.git_archival.txt b/xarray/datatree_/.git_archival.txt new file mode 100644 index 00000000000..3994ec0a83e --- /dev/null +++ b/xarray/datatree_/.git_archival.txt @@ -0,0 +1,4 @@ +node: $Format:%H$ +node-date: $Format:%cI$ +describe-name: $Format:%(describe:tags=true)$ +ref-names: $Format:%D$ diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 7773f727497..b1439989238 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -10,7 +10,7 @@ repos: - id: check-yaml # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort - rev: 5.11.4 + rev: 5.12.0 hooks: - id: isort # 
https://github.com/python/black#version-control-integration diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 69c9b1042db..6e1fda6ee9f 100644 --- a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -18,8 +18,8 @@ dependencies: - ipython - h5netcdf - zarr + - xarray - pip: - - git+https://github.com/xarray-contrib/datatree + - -e .. - sphinxext-rediraffe - sphinxext-opengraph - - xarray>=2022.05.0.dev0 diff --git a/xarray/datatree_/pyproject.toml b/xarray/datatree_/pyproject.toml index 209ec8fee6a..a219b9767ff 100644 --- a/xarray/datatree_/pyproject.toml +++ b/xarray/datatree_/pyproject.toml @@ -1,9 +1,37 @@ +[project] +name = "xarray-datatree" +description = "Hierarchical tree-like data structures for xarray" +readme = "README.md" +authors = [ + {name = "Thomas Nicholas", email = "thomas.nicholas@columbia.edu"} +] +license = {text = "Apache-2"} +classifiers = [ + "Development Status :: 3 - Alpha", + "Intended Audience :: Science/Research", + "Topic :: Scientific/Engineering", + "License :: OSI Approved :: Apache Software License", + "Operating System :: OS Independent", + "Programming Language :: Python", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", +] +requires-python = ">=3.9" +dependencies = [ + "xarray >=2022.6.0", +] +dynamic = ["version"] + +[project.urls] +Home = "https://github.com/xarray-contrib/datatree" +Documentation = "https://xarray-datatree.readthedocs.io/en/stable/" + [build-system] requires = [ - "setuptools>=42", + "setuptools>=61.0.0", "wheel", - "setuptools_scm[toml]>=3.4", - "setuptools_scm_git_archive", + "setuptools_scm[toml]>=7.0", "check-manifest" ] @@ -13,3 +41,20 @@ write_to_template = ''' # Do not change! Do not track in version control! 
__version__ = "{version}" ''' + +[tool.setuptools.packages.find] +exclude = ["docs", "tests", "tests.*", "docs.*"] + +[tool.setuptools.package-data] +datatree = ["py.typed"] + +[tool.isort] +profile = "black" +skip_gitignore = true +float_to_top = true +default_section = "THIRDPARTY" +known_first_party = "datatree" + +[mypy] +files = "datatree/**/*.py" +show_error_codes = true diff --git a/xarray/datatree_/setup.cfg b/xarray/datatree_/setup.cfg deleted file mode 100644 index 4066b2c3b86..00000000000 --- a/xarray/datatree_/setup.cfg +++ /dev/null @@ -1,60 +0,0 @@ -[metadata] -name = xarray-datatree -description = Hierarchical tree-like data structures for xarray -long_description_content_type=text/markdown -long_description = file: README.md -url = https://github.com/xarray-contrib/datatree -author = Thomas Nicholas -author_email = thomas.nicholas@columbia.edu -license = Apache -classifiers = - Development Status :: 3 - Alpha - Intended Audience :: Science/Research - Topic :: Scientific/Engineering - License :: OSI Approved :: Apache Software License - Operating System :: OS Independent - Programming Language :: Python - Programming Language :: Python :: 3.9 - Programming Language :: Python :: 3.10 - Programming Language :: Python :: 3.11 - -[options] -packages = find: -package_data={'datatree': 'py.typed'} -python_requires = >=3.9 -install_requires = - xarray >=2022.6.0 - -[options.packages.find] -exclude = - docs - tests - tests.* - docs.* - -[flake8] -ignore = - # whitespace before ':' - doesn't work well with black - E203 - # module level import not at top of file - E402 - # line too long - let black worry about that - E501 - # do not assign a lambda expression, use a def - E731 - # line break before binary operator - W503 -exclude= - .eggs - doc - -[isort] -profile = black -skip_gitignore = true -float_to_top = true -default_section = THIRDPARTY -known_first_party = datatree - -[mypy] -files = datatree/**/*.py -show_error_codes = True From ac65afc6a1dcd334df1a9f52489dd4a3e264ff0f Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Fri, 10 Feb 2023 15:22:00 +0100 Subject: [PATCH 218/260] copy subtrees without creating nodes for ancestors https://github.com/xarray-contrib/datatree/pull/201 * use relative paths for the copied descendants * check that copying subtrees works * changelog --- xarray/datatree_/datatree/datatree.py | 3 ++- xarray/datatree_/datatree/tests/test_datatree.py | 8 ++++++++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index e51fa92902c..0e3b348c114 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -734,7 +734,8 @@ def _copy_subtree( """Copy entire subtree""" new_tree = self._copy_node(deep=deep) for node in self.descendants: - new_tree[node.path] = node._copy_node(deep=deep) + path = node.relative_to(self) + new_tree[path] = node._copy_node(deep=deep) return new_tree def _copy_node( diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 4c23a09d504..de7e6ca1c01 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -263,6 +263,14 @@ def test_copy(self, create_test_datatree): assert "foo" not in node.attrs assert node.attrs["Test"] is copied_node.attrs["Test"] + def test_copy_subtree(self): + dt = DataTree.from_dict({"/level1/level2/level3": 
xr.Dataset()}) + + actual = dt["/level1/level2"].copy() + expected = DataTree.from_dict({"/level3": xr.Dataset()}, name="level2") + + dtt.assert_identical(actual, expected) + def test_deepcopy(self, create_test_datatree): dt = create_test_datatree() diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 0b163a0f0f0..ed5489dc7a8 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -42,6 +42,8 @@ Bug fixes - Allow for altering of given dataset inside function called by :py:func:`map_over_subtree` (:issue:`188`, :pull:`194`). By `Tom Nicholas `_. +- copy subtrees without creating ancestor nodes (:pull:`201`) + By `Justus Magin `_. Documentation ~~~~~~~~~~~~~ From a7831376c1caf09fd19954d96a9d5b973972db2c Mon Sep 17 00:00:00 2001 From: Justus Magin Date: Fri, 10 Feb 2023 16:04:24 +0100 Subject: [PATCH 219/260] update using dictionary unpacking https://github.com/xarray-contrib/datatree/pull/213 * merge two dictionaries using dictionary unpacking * check that the fix actually worked * use `|` to combine two dictionaries * Revert "use `|` to combine two dictionaries" This reverts commit ecfbbd55dc687ac5b2bb582cd3d29a4afc3608e4. --------- Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/datatree.py | 2 +- xarray/datatree_/datatree/tests/test_datatree.py | 11 +++++++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 0e3b348c114..8805fb419f4 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -878,7 +878,7 @@ def update(self, other: Dataset | Mapping[str, DataTree | DataArray]) -> None: vars_merge_result = dataset_update_method(self.to_dataset(), new_variables) # TODO are there any subtleties with preserving order of children like this? 
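For context on the small `update` fix shown immediately below: passing two mappings as keyword arguments raises a `TypeError` as soon as they share a key, whereas unpacking them into a dict literal first lets the right-hand mapping win. That difference is what allows `DataTree.update` to overwrite an existing child (the behaviour `test_update_overwrite` checks). A plain-Python illustration, independent of the datatree API:

```python
from collections import OrderedDict

existing_children = {"a": "old node"}
new_children = {"a": "new node", "b": "another node"}

# The old spelling fails as soon as a child is being replaced
try:
    OrderedDict(**existing_children, **new_children)
except TypeError as err:
    print(err)  # e.g. "got multiple values for keyword argument 'a'"

# The new spelling merges the mappings, with the update taking precedence
merged = OrderedDict({**existing_children, **new_children})
print(merged)  # OrderedDict([('a', 'new node'), ('b', 'another node')])
```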
- merged_children = OrderedDict(**self.children, **new_children) + merged_children = OrderedDict({**self.children, **new_children}) self._replace( inplace=True, children=merged_children, **vars_merge_result._asdict() ) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index de7e6ca1c01..20090d736f4 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -235,6 +235,17 @@ def test_update_doesnt_alter_child_name(self): child = dt["a"] assert child.name == "a" + def test_update_overwrite(self): + actual = DataTree.from_dict({"a": DataTree(xr.Dataset({"x": 1}))}) + actual.update({"a": DataTree(xr.Dataset({"x": 2}))}) + + expected = DataTree.from_dict({"a": DataTree(xr.Dataset({"x": 2}))}) + + print(actual) + print(expected) + + dtt.assert_equal(actual, expected) + class TestCopy: def test_copy(self, create_test_datatree): From adfc35459d8664d23e31b636f5447bfc7d750dc9 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 13 Feb 2023 15:20:57 +0100 Subject: [PATCH 220/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/218 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/PyCQA/isort: 5.11.4 → 5.12.0](https://github.com/PyCQA/isort/compare/5.11.4...5.12.0) - [github.com/psf/black: 22.12.0 → 23.1.0](https://github.com/psf/black/compare/22.12.0...23.1.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin --- xarray/datatree_/.pre-commit-config.yaml | 2 +- xarray/datatree_/datatree/datatree.py | 2 -- xarray/datatree_/datatree/io.py | 2 -- xarray/datatree_/datatree/tests/test_datatree.py | 1 - 4 files changed, 1 insertion(+), 6 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index b1439989238..f4dc53db253 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -15,7 +15,7 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 22.12.0 + rev: 23.1.0 hooks: - id: black - repo: https://github.com/keewis/blackdoc diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 8805fb419f4..b6bf8ac02d0 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -379,7 +379,6 @@ def ds(self) -> DatasetView: @ds.setter def ds(self, data: Optional[Union[Dataset, DataArray]] = None) -> None: - ds = _coerce_to_dataset(data) _check_for_name_collisions(self.children, ds.variables) @@ -796,7 +795,6 @@ def __getitem__(self: DataTree, key: str) -> DataTree | DataArray: # Either: if utils.is_dict_like(key): - # dict-like indexing raise NotImplementedError("Should this index over whole tree?") elif isinstance(key, str): diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index fe18456efe3..73992f135da 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -120,7 +120,6 @@ def _datatree_to_netcdf( unlimited_dims=None, **kwargs, ): - if kwargs.get("format", None) not in [None, "NETCDF4"]: raise 
ValueError("to_netcdf only supports the NETCDF4 format") @@ -182,7 +181,6 @@ def _datatree_to_zarr( consolidated: bool = True, **kwargs, ): - from zarr.convenience import consolidate_metadata # type: ignore if kwargs.get("group", None) is not None: diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 20090d736f4..ec4d5cf0043 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -258,7 +258,6 @@ def test_copy(self, create_test_datatree): dtt.assert_identical(dt, copied) for node, copied_node in zip(dt.root.subtree, copied.root.subtree): - assert node.encoding == copied_node.encoding # Note: IndexVariable objects with string dtype are always # copied because of xarray.core.util.safe_cast_to_index. From b0f39cb1595bf7412cc30fee087e3cd0b91e04a0 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 7 Mar 2023 15:24:22 -0500 Subject: [PATCH 221/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/222 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/mirrors-mypy: v0.991 → v1.0.1](https://github.com/pre-commit/mirrors-mypy/compare/v0.991...v1.0.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index f4dc53db253..1c83ba6278f 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v0.991 + rev: v1.0.1 hooks: - id: mypy # Copied from setup.cfg From a655d4243f209d8e457fb60ed308dfbfa0dd272d Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Tue, 7 Mar 2023 15:55:57 -0500 Subject: [PATCH 222/260] blank whatsnew for next release --- xarray/datatree_/docs/source/whats-new.rst | 25 +++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index ed5489dc7a8..e44dff08fdc 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,9 +15,32 @@ What's New np.random.seed(123456) +.. _whats-new.v0.0.13: + +v0.0.13 (unreleased) +-------------------- + +New Features +~~~~~~~~~~~~ + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + .. _whats-new.v0.0.12: -v0.0.12 (unreleased) +v0.0.12 (03/07/2023) -------------------- New Features From f6b47b7fa6dd90d7c6b9c916d8aa6ee010ab47cb Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 16 Mar 2023 13:37:49 +0100 Subject: [PATCH 223/260] Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.8.0 https://github.com/xarray-contrib/datatree/pull/224 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.6.4 to 1.8.0. 
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.4...v1.8.0) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index cf94c00b12c..61ddebb0bec 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.6.4 + uses: pypa/gh-action-pypi-publish@v1.8.0 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From aed947fc84ba788f2622c8b069dfadb832f8d469 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 4 Apr 2023 12:05:14 +0200 Subject: [PATCH 224/260] Bump pypa/gh-action-pypi-publish from 1.8.0 to 1.8.5 https://github.com/xarray-contrib/datatree/pull/237 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.0 to 1.8.5. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.8.0...v1.8.5) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 61ddebb0bec..0841ae31ade 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.8.0 + uses: pypa/gh-action-pypi-publish@v1.8.5 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From 9cd3ac2eaac365804a7ae642a876d363c470f14d Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 31 Jul 2023 14:40:03 -0400 Subject: [PATCH 225/260] Bump pypa/gh-action-pypi-publish from 1.8.5 to 1.8.7 https://github.com/xarray-contrib/datatree/pull/248 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.5 to 1.8.7. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.8.5...v1.8.7) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 0841ae31ade..cdda71bdd89 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.8.5 + uses: pypa/gh-action-pypi-publish@v1.8.7 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From f6325236757b8dcf9bcc71d08f1cb900cb0877e9 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 4 Oct 2023 11:41:12 -0400 Subject: [PATCH 226/260] Bump pypa/gh-action-pypi-publish from 1.8.7 to 1.8.10 https://github.com/xarray-contrib/datatree/pull/255 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.7 to 1.8.10. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.8.7...v1.8.10) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index cdda71bdd89..43cc0729838 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.8.7 + uses: pypa/gh-action-pypi-publish@v1.8.10 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From 66356915c66a7835a8d23a47b6929dc820127096 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 4 Oct 2023 11:45:29 -0400 Subject: [PATCH 227/260] Bump codecov/codecov-action from 3.1.1 to 3.1.4 https://github.com/xarray-contrib/datatree/pull/245 Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3.1.1 to 3.1.4. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/codecov/codecov-action/compare/v3.1.1...v3.1.4) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tom Nicholas --- xarray/datatree_/.github/workflows/main.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index cfced572d3a..7313ab281f8 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -48,7 +48,7 @@ jobs: python -m pytest --cov=./ --cov-report=xml --verbose - name: Upload code coverage to Codecov - uses: codecov/codecov-action@v3.1.1 + uses: codecov/codecov-action@v3.1.4 with: file: ./coverage.xml flags: unittests From 2f2f0b98813579aa48975747116fadb95d769ea0 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 16 Oct 2023 14:25:04 -0400 Subject: [PATCH 228/260] Bump actions/checkout from 3 to 4 https://github.com/xarray-contrib/datatree/pull/256 Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tom Nicholas --- xarray/datatree_/.github/workflows/main.yaml | 4 ++-- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index 7313ab281f8..bfc9f43652f 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -22,7 +22,7 @@ jobs: matrix: python-version: ["3.9", "3.10", "3.11"] steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Create conda environment uses: mamba-org/provision-with-micromamba@main @@ -67,7 +67,7 @@ jobs: matrix: python-version: ["3.9", "3.10", "3.11"] steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Create conda environment uses: mamba-org/provision-with-micromamba@main diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 43cc0729838..9ad36fc5dce 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -19,7 +19,7 @@ jobs: runs-on: ubuntu-latest if: github.repository == 'xarray-contrib/datatree' steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 with: fetch-depth: 0 - uses: actions/setup-python@v4 From d416fa166581f6e9a16fc1ca24c4af0dd4749920 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 16 Oct 2023 14:38:03 -0400 Subject: [PATCH 229/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/236 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tom Nicholas Co-authored-by: Anderson Banihirwe <13301940+andersy005@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 8 ++++---- xarray/datatree_/datatree/testing.py | 6 +++--- 2 files 
changed, 7 insertions(+), 7 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index 1c83ba6278f..a2dac76a44f 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -15,7 +15,7 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 23.1.0 + rev: 23.9.1 hooks: - id: black - repo: https://github.com/keewis/blackdoc @@ -23,7 +23,7 @@ repos: hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 - rev: 6.0.0 + rev: 6.1.0 hooks: - id: flake8 # - repo: https://github.com/Carreau/velin @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.0.1 + rev: v1.5.1 hooks: - id: mypy # Copied from setup.cfg @@ -45,7 +45,7 @@ repos: types-pytz, # Dependencies that are typed numpy, - typing-extensions==3.10.0.0, + typing-extensions>=4.1.0, ] # run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194 # - repo: https://github.com/asottile/pyupgrade diff --git a/xarray/datatree_/datatree/testing.py b/xarray/datatree_/datatree/testing.py index a89cfb0f103..ebe32cbefcd 100644 --- a/xarray/datatree_/datatree/testing.py +++ b/xarray/datatree_/datatree/testing.py @@ -34,7 +34,7 @@ def assert_isomorphic(a: DataTree, b: DataTree, from_root: bool = False): assert_identical """ __tracebackhide__ = True - assert type(a) == type(b) + assert isinstance(a, type(b)) if isinstance(a, DataTree): if from_root: @@ -71,7 +71,7 @@ def assert_equal(a: DataTree, b: DataTree, from_root: bool = True): assert_identical """ __tracebackhide__ = True - assert type(a) == type(b) + assert isinstance(a, type(b)) if isinstance(a, DataTree): if from_root: @@ -109,7 +109,7 @@ def assert_identical(a: DataTree, b: DataTree, from_root: bool = True): """ __tracebackhide__ = True - assert type(a) == type(b) + assert isinstance(a, type(b)) if isinstance(a, DataTree): if from_root: a = a.root From 1ab3ba01f94ad89478dbd6e25f7a4b7816a152f5 Mon Sep 17 00:00:00 2001 From: Julius Busecke Date: Mon, 16 Oct 2023 20:58:02 +0200 Subject: [PATCH 230/260] Add installation instructions https://github.com/xarray-contrib/datatree/pull/231 Co-authored-by: Tom Nicholas --- xarray/datatree_/README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index 9fb01a4439e..f8beaaf8f95 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -14,6 +14,17 @@ that was more flexible than a single `xarray.Dataset` object. The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, but `datatree.DataTree` objects have many other uses. +### Installation +You can install datatree via pip: +```shell +pip install xarray-datatree +``` + +or via conda-forge +```shell +conda install -c conda-forge xarray-datatree +``` + ### Why Datatree? 
You might want to use datatree for: From dc4a28325b64c5ed41f6c0ed427deba5c9720114 Mon Sep 17 00:00:00 2001 From: Max Grover Date: Tue, 17 Oct 2023 12:37:28 -0500 Subject: [PATCH 231/260] FIX: Fix bug with nodepath python 3.12 issues https://github.com/xarray-contrib/datatree/pull/260 * ADD: Add test support for python 3.12 * ADD: Add to whats new doc * Apply suggestions from review Co-authored-by: Tom Nicholas * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * FIX: Add 3.12 to dev build --------- Co-authored-by: Tom Nicholas Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/main.yaml | 4 ++-- xarray/datatree_/datatree/tests/test_treenode.py | 8 +++++++- xarray/datatree_/datatree/treenode.py | 16 ++++++++-------- xarray/datatree_/docs/source/whats-new.rst | 3 +++ 4 files changed, 20 insertions(+), 11 deletions(-) diff --git a/xarray/datatree_/.github/workflows/main.yaml b/xarray/datatree_/.github/workflows/main.yaml index bfc9f43652f..37034fc5900 100644 --- a/xarray/datatree_/.github/workflows/main.yaml +++ b/xarray/datatree_/.github/workflows/main.yaml @@ -20,7 +20,7 @@ jobs: shell: bash -l {0} strategy: matrix: - python-version: ["3.9", "3.10", "3.11"] + python-version: ["3.9", "3.10", "3.11", "3.12"] steps: - uses: actions/checkout@v4 @@ -65,7 +65,7 @@ jobs: shell: bash -l {0} strategy: matrix: - python-version: ["3.9", "3.10", "3.11"] + python-version: ["3.9", "3.10", "3.11", "3.12"] steps: - uses: actions/checkout@v4 diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 935afe3948b..5a05a6b5bef 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -1,7 +1,7 @@ import pytest from datatree.iterators import LevelOrderIter, PreOrderIter -from datatree.treenode import InvalidTreeError, NamedNode, TreeNode +from datatree.treenode import InvalidTreeError, NamedNode, NodePath, TreeNode class TestFamilyTree: @@ -369,3 +369,9 @@ def test_render_nodetree(self): ] for expected_node, printed_node in zip(expected_nodes, printout.splitlines()): assert expected_node in printed_node + + +def test_nodepath(): + path = NodePath("/Mary") + assert path.root == "/" + assert path.stem == "Mary" diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 60a4556dd96..4950bc9ce12 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -1,5 +1,6 @@ from __future__ import annotations +import sys from collections import OrderedDict from pathlib import PurePosixPath from typing import ( @@ -30,21 +31,20 @@ class NotFoundInTreeError(ValueError): class NodePath(PurePosixPath): """Represents a path from one node to another within a tree.""" - def __new__(cls, *args: str | "NodePath") -> "NodePath": - obj = super().__new__(cls, *args) - - if obj.drive: + def __init__(self, *pathsegments): + if sys.version_info >= (3, 12): + super().__init__(*pathsegments) + else: + super().__new__(PurePosixPath, *pathsegments) + if self.drive: raise ValueError("NodePaths cannot have drives") - if obj.root not in ["/", ""]: + if self.root not in ["/", ""]: raise ValueError( 'Root of NodePath can only be either "/" or "", with "" meaning the path is relative.' ) - # TODO should we also forbid suffixes to avoid node names with dots in them? 
- return obj - Tree = TypeVar("Tree", bound="TreeNode") diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index e44dff08fdc..8e62adceb04 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -32,6 +32,9 @@ Deprecations Bug fixes ~~~~~~~~~ +- Ensure nodepath class is compatible with python 3.12 (:pull:`260`) + By `Max Grover `_. + Documentation ~~~~~~~~~~~~~ From a7881faf8323e105a8f3176b1c111714ba74ea08 Mon Sep 17 00:00:00 2001 From: Brewster Malevich Date: Mon, 23 Oct 2023 14:17:13 -0700 Subject: [PATCH 232/260] Fix minor typo https://github.com/xarray-contrib/datatree/pull/246 Co-authored-by: Tom Nicholas --- xarray/datatree_/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index f8beaaf8f95..df4e3b6cc49 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -52,7 +52,7 @@ The approach used here is based on benbovy's [`DatasetNode` example](https://gis You can create a `DataTree` object in 3 ways: 1) Load from a netCDF file (or Zarr store) that has groups via `open_datatree()`. 2) Using the init method of `DataTree`, which creates an individual node. - You can then specify the nodes' relationships to one other, either by setting `.parent` and `.chlldren` attributes, + You can then specify the nodes' relationships to one other, either by setting `.parent` and `.children` attributes, or through `__get/setitem__` access, e.g. `dt['path/to/node'] = DataTree()`. 3) Create a tree from a dictionary of paths to datasets using `DataTree.from_dict()`. From 60555c9e6bcf1f58785461eab56c62a10eccce95 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 23 Oct 2023 20:47:47 -0400 Subject: [PATCH 233/260] Map over only data nodes, ignoring attrs https://github.com/xarray-contrib/datatree/pull/263 * add test from issue * test as a property of map_over_subtree directly * change behaviour * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * correct test * whatsnew --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/mapping.py | 6 +++--- xarray/datatree_/datatree/tests/test_datatree.py | 7 +++++++ xarray/datatree_/datatree/tests/test_mapping.py | 12 ++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 3 +++ 4 files changed, 25 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 5f43af961ca..6f8c65aebae 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -109,8 +109,8 @@ def map_over_subtree(func: Callable) -> Callable: Applies a function to every dataset in one or more subtrees, returning new trees which store the results. - The function will be applied to any non-empty dataset stored in any of the nodes in the trees. The returned trees - will have the same structure as the supplied trees. + The function will be applied to any data-containing dataset stored in any of the nodes in the trees. The returned + trees will have the same structure as the supplied trees. `func` needs to return one Datasets, DataArrays, or None in order to be able to rebuild the subtrees after mapping, as each result will be assigned to its respective node of a new tree via `DataTree.__setitem__`. 
Any @@ -206,7 +206,7 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: # Now we can call func on the data in this particular set of corresponding nodes results = ( func(*node_args_as_datasets, **node_kwargs_as_datasets) - if not node_of_first_tree.is_empty + if node_of_first_tree.has_data else None ) diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index ec4d5cf0043..726925fa78a 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -603,6 +603,13 @@ def test_ipython_key_completions(self, create_test_datatree): var_keys = list(dt.variables.keys()) assert all(var_key in key_completions for var_key in var_keys) + def test_operation_with_attrs_but_no_data(self): + # tests bug from xarray-datatree GH262 + xs = xr.Dataset({"testvar": xr.DataArray(np.ones((2, 3)))}) + dt = DataTree.from_dict({"node1": xs, "node2": xs}) + dt.attrs["test_key"] = 1 # sel works fine without this line + dt.sel(dim_0=0) + class TestRestructuring: def test_drop_nodes(self): diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 47978edad5b..e5a50155677 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -252,6 +252,18 @@ def times_ten(ds): result_tree = times_ten(subtree) assert_equal(result_tree, expected, from_root=False) + def test_skip_empty_nodes_with_attrs(self, create_test_datatree): + # inspired by xarray-datatree GH262 + dt = create_test_datatree() + dt["set1/set2"].attrs["foo"] = "bar" + + def check_for_data(ds): + # fails if run on a node that has no data + assert len(ds.variables) != 0 + return ds + + dt.map_over_subtree(check_for_data) + class TestMutableOperations: def test_construct_using_type(self): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 8e62adceb04..607af509a56 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -26,6 +26,9 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- Nodes containing only attributes but no data are now ignored by :py:func:`map_over_subtree` (:issue:`262`, :pull:`263`) + By `Tom Nicholas `_. + Deprecations ~~~~~~~~~~~~ From 15672a2d4ce7de08aa54a5f9b7953f6e7796444c Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 23 Oct 2023 21:12:39 -0400 Subject: [PATCH 234/260] Add path to error message in map_over_subtree https://github.com/xarray-contrib/datatree/pull/264 * test * implementation * formatting * add version check, if not using 3.11 then you just won't get the extra info in the error message * whatsnew * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use better helper function * xfail test, because this does actually work... 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/mapping.py | 36 ++++++++++++++++++- .../datatree_/datatree/tests/test_mapping.py | 17 +++++++++ xarray/datatree_/docs/source/whats-new.rst | 4 +++ 3 files changed, 56 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index 6f8c65aebae..bd41cdbda62 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -1,6 +1,7 @@ from __future__ import annotations import functools +import sys from itertools import repeat from textwrap import dedent from typing import TYPE_CHECKING, Callable, Tuple @@ -202,10 +203,15 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: ], ) ) + func_with_error_context = _handle_errors_with_path_context( + node_of_first_tree.path + )(func) # Now we can call func on the data in this particular set of corresponding nodes results = ( - func(*node_args_as_datasets, **node_kwargs_as_datasets) + func_with_error_context( + *node_args_as_datasets, **node_kwargs_as_datasets + ) if node_of_first_tree.has_data else None ) @@ -251,6 +257,34 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: return _map_over_subtree +def _handle_errors_with_path_context(path): + """Wraps given function so that if it fails it also raises path to node on which it failed.""" + + def decorator(func): + def wrapper(*args, **kwargs): + try: + return func(*args, **kwargs) + except Exception as e: + if sys.version_info >= (3, 11): + # Add the context information to the error message + e.add_note( + f"Raised whilst mapping function over node with path {path}" + ) + raise + + return wrapper + + return decorator + + +def add_note(err: BaseException, msg: str) -> None: + # TODO: remove once python 3.10 can be dropped + if sys.version_info < (3, 11): + err.__notes__ = getattr(err, "__notes__", []) + [msg] + else: + err.add_note(msg) + + def _check_single_set_return_values(path_to_node, obj): """Check types returned from single evaluation of func, and return number of return values received from func.""" if isinstance(obj, (Dataset, DataArray)): diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index e5a50155677..71a4fed6bf6 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -264,6 +264,23 @@ def check_for_data(ds): dt.map_over_subtree(check_for_data) + @pytest.mark.xfail( + reason="probably some bug in pytests handling of exception notes" + ) + def test_error_contains_path_of_offending_node(self, create_test_datatree): + dt = create_test_datatree() + dt["set1"]["bad_var"] = 0 + print(dt) + + def fail_on_specific_node(ds): + if "bad_var" in ds: + raise ValueError("Failed because 'bar_var' present in dataset") + + with pytest.raises( + ValueError, match="Raised whilst mapping function over node /set1" + ): + dt.map_over_subtree(fail_on_specific_node) + class TestMutableOperations: def test_construct_using_type(self): diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 607af509a56..d82353b8c33 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,10 @@ v0.0.13 (unreleased) New Features ~~~~~~~~~~~~ +- Indicate which node caused the problem if error encountered while applying user function 
using :py:func:`map_over_subtree` + (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later. + By `Tom Nicholas `_. + Breaking changes ~~~~~~~~~~~~~~~~ From 8a64309a8f520136a0859eba6e2abea4e7d63ad7 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 23 Oct 2023 21:55:29 -0400 Subject: [PATCH 235/260] Docs on manipulating trees https://github.com/xarray-contrib/datatree/pull/180 * why hierarchical data * add hierarchical data page to index * Simpsons family tree * evolutionary tree * WIP rearrangement of creating trees * fixed examples in data structures page * dict-like navigation * filesystem-like paths explained * split PR into parts * plan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ipython bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * filter simpsons family tree by age * use new filter method * test about filter * simple example of mapping over a subtree * ideas for docs on iterating over trees * add section on iterating over subtree * text to accompany Simpsons family aging example * add voltage dataset * RMS as example of mapping custom computation * isomorphism * P=IV example of binary multiplication * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unfinished sections * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- .../docs/source/hierarchical-data.rst | 261 +++++++++++++++++- xarray/datatree_/docs/source/whats-new.rst | 3 + 2 files changed, 263 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 66c7d51b453..7795c9e2876 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -175,7 +175,7 @@ Let's use a different example of a tree to discuss more complex relationships be ] We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, -and :ref:`filesystem-like syntax `_ (to be explained shortly) to select two nodes of interest. +and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest. .. ipython:: python @@ -339,3 +339,262 @@ we can construct a complex tree quickly using the alternative constructor :py:me Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`. + +.. _iterating over trees: + +Iterating over trees +~~~~~~~~~~~~~~~~~~~~ + +You can iterate over every node in a tree using the subtree :py:class:`~DataTree.subtree` property. +This returns an iterable of nodes, which yields them in depth-first order. + +.. ipython:: python + + for node in vertebrates.subtree: + print(node.path) + +A very useful pattern is to use :py:class:`~DataTree.subtree` conjunction with the :py:class:`~DataTree.path` property to manipulate the nodes however you wish, +then rebuild a new tree using :py:meth:`DataTree.from_dict()`. 
+ +For example, we could keep only the nodes containing data by looping over all nodes, +checking if they contain any data using :py:class:`~DataTree.has_data`, +then rebuilding a new tree using only the paths of those nodes: + +.. ipython:: python + + non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data} + DataTree.from_dict(non_empty_nodes) + +You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. + +(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) + +.. _manipulating trees: + +Manipulating Trees +------------------ + +Subsetting Tree Nodes +~~~~~~~~~~~~~~~~~~~~~ + +We can subset our tree to select only nodes of interest in various ways. + +The :py:meth:`DataTree.filter` method can be used to retain only the nodes of a tree that meet a certain condition. +For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults: +First lets recreate the tree but with an `age` data variable in every node: + +.. ipython:: python + + simpsons = DataTree.from_dict( + d={ + "/": xr.Dataset({"age": 83}), + "/Herbert": xr.Dataset({"age": 40}), + "/Homer": xr.Dataset({"age": 39}), + "/Homer/Bart": xr.Dataset({"age": 10}), + "/Homer/Lisa": xr.Dataset({"age": 8}), + "/Homer/Maggie": xr.Dataset({"age": 1}), + }, + name="Abe", + ) + simpsons + +Now let's filter out the minors: + +.. ipython:: python + + simpsons.filter(lambda node: node["age"] > 18) + +The result is a new tree, containing only the nodes matching the condition. + +(Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) + +.. _tree computation: + +Computation +----------- + +`DataTree` objects are also useful for performing computations, not just for organizing data. + +Operations and Methods on Trees +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To show how applying operations across a whole tree at once can be useful, +let's first create a example scientific dataset. + +.. ipython:: python + + def time_stamps(n_samples, T): + """Create an array of evenly-spaced time stamps""" + return xr.DataArray( + data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"] + ) + + + def signal_generator(t, f, A, phase): + """Generate an example electrical-like waveform""" + return A * np.sin(f * t.data + phase) + + + time_stamps1 = time_stamps(n_samples=15, T=1.5) + time_stamps2 = time_stamps(n_samples=10, T=1.0) + + voltages = DataTree.from_dict( + { + "/oscilloscope1": xr.Dataset( + { + "potential": ( + "time", + signal_generator(time_stamps1, f=2, A=1.2, phase=0.5), + ), + "current": ( + "time", + signal_generator(time_stamps1, f=2, A=1.2, phase=1), + ), + }, + coords={"time": time_stamps1}, + ), + "/oscilloscope2": xr.Dataset( + { + "potential": ( + "time", + signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2), + ), + "current": ( + "time", + signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7), + ), + }, + coords={"time": time_stamps2}, + ), + } + ) + voltages + +Most xarray computation methods also exist as methods on datatree objects, +so you can for example take the mean value of these two timeseries at once: + +.. 
ipython:: python + + voltages.mean(dim="time") + +This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the +tree one-by-one. + +The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another + +.. ipython:: python + :okexcept: + + voltages.isel(time=12) + +Notice that the error raised helpfully indicates which node of the tree the operation failed on. + +Arithmetic Methods on Trees +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Arithmetic methods are also implemented, so you can e.g. add a scalar to every dataset in the tree at once. +For example, we can advance the timeline of the Simpsons by a decade just by + +.. ipython:: python + + simpsons + 10 + +See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node. + +Mapping Custom Functions Over Trees +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can map custom computation over each node in a tree using :py:func:`map_over_subtree`. +You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments, +and returns one (or more) xarray datasets. + +.. note:: + + Functions passed to :py:func:`map_over_subtree` cannot alter nodes in-place. + Instead they must return new `xarray.Dataset` objects. + +For example, we can define a function to calculate the Root Mean Square of a timeseries + +.. ipython:: python + + def rms(signal): + return np.sqrt(np.mean(signal**2)) + +Then calculate the RMS value of these signals: + +.. ipython:: python + + rms(readings) + +.. _multiple trees: + +Operating on Multiple Trees +--------------------------- + +The examples so far have involved mapping functions or methods over the nodes of a single tree, +but we can generalize this to mapping functions over multiple trees at once. + +Comparing Trees for Isomorphism +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For it to make sense to map a single non-unary function over the nodes of multiple trees at once, +each tree needs to have the same structure. Specifically two trees can only be considered similar, or "isomorphic", +if they have the same number of nodes, and each corresponding node has the same number of children. +We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomorphic` method. + +.. ipython:: python + :okexcept: + + dt1 = DataTree.from_dict({"a": None, "a/b": None}) + dt2 = DataTree.from_dict({"a": None}) + dt1.isomorphic(dt2) + + dt3 = DataTree.from_dict({"a": None, "b": None}) + dt1.isomorphic(dt3) + + dt4 = DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})}) + dt1.isomorphic(dt4) + +If the trees are not isomorphic a :py:class:`~TreeIsomorphismError` will be raised. +Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic. + +Arithmetic Between Multiple Trees +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Arithmetic operations like multiplication are binary operations, so as long as we have wo isomorphic trees, +we can do arithmetic between them. + +.. 
ipython:: python + + currents = DataTree.from_dict( + { + "/oscilloscope1": xr.Dataset( + { + "current": ( + "time", + signal_generator(time_stamps1, f=2, A=1.2, phase=1), + ), + }, + coords={"time": time_stamps1}, + ), + "/oscilloscope2": xr.Dataset( + { + "current": ( + "time", + signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7), + ), + }, + coords={"time": time_stamps2}, + ), + } + ) + currents + + currents.isomorphic(voltages) + +We could use this feature to quickly calculate the electrical power in our signal, P=IV. + +.. ipython:: python + + power = currents * voltages + power diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index d82353b8c33..8bcf262b6b4 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -45,6 +45,9 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Added new sections to page on ``Working with Hierarchical Data`` (:pull:`180`) + By `Tom Nicholas `_. + Internal Changes ~~~~~~~~~~~~~~~~ From 04778564c76e02a5a97f3f54e0173988b94f6f52 Mon Sep 17 00:00:00 2001 From: Antonio Valentino Date: Tue, 24 Oct 2023 07:20:15 +0200 Subject: [PATCH 236/260] Bugfix/fix tests on i386 https://github.com/xarray-contrib/datatree/pull/249 * Fix tests on i386 * Update changelog * Add comment explaining explicit cast --------- Co-authored-by: Tom Nicholas Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/tests/test_formatting.py | 11 +++++++---- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/datatree/tests/test_formatting.py b/xarray/datatree_/datatree/tests/test_formatting.py index d0e3e9fd36d..0f64644c05a 100644 --- a/xarray/datatree_/datatree/tests/test_formatting.py +++ b/xarray/datatree_/datatree/tests/test_formatting.py @@ -90,11 +90,14 @@ def test_diff_node_names(self): assert actual == expected def test_diff_node_data(self): - ds1 = Dataset({"u": 0, "v": 1}) - ds3 = Dataset({"w": 5}) + import numpy as np + + # casting to int64 explicitly ensures that int64s are created on all architectures + ds1 = Dataset({"u": np.int64(0), "v": np.int64(1)}) + ds3 = Dataset({"w": np.int64(5)}) dt_1 = DataTree.from_dict({"a": ds1, "a/b": ds3}) - ds2 = Dataset({"u": 0}) - ds4 = Dataset({"w": 6}) + ds2 = Dataset({"u": np.int64(0)}) + ds4 = Dataset({"w": np.int64(6)}) dt_2 = DataTree.from_dict({"a": ds2, "a/b": ds4}) expected = dedent( diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 8bcf262b6b4..5d70f914add 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -39,6 +39,8 @@ Deprecations Bug fixes ~~~~~~~~~ +- Fix unittests on i386. (:pull:`249`) + By `Antonio Valentino `_. - Ensure nodepath class is compatible with python 3.12 (:pull:`260`) By `Max Grover `_. 
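A minimal sketch (not taken from any patch above) of the dtype portability issue that the explicit ``np.int64`` casts in the i386 fix are guarding against; the variable names are illustrative::

    import numpy as np
    import xarray as xr

    # A bare Python int lets numpy choose its default integer type, which is
    # platform-dependent (int32 on many 32-bit and Windows builds, int64 on
    # 64-bit Linux/macOS), so the text repr compared in tests such as
    # test_diff_node_data can differ between architectures.
    implicit = xr.Dataset({"u": 0})

    # Casting explicitly pins the dtype, giving the same repr everywhere.
    explicit = xr.Dataset({"u": np.int64(0)})

    print(implicit["u"].dtype)  # numpy's platform default integer
    print(explicit["u"].dtype)  # int64 on every architecture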
From 09023b030ee354669fb6f8692f9d0feeef474fe2 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 24 Oct 2023 12:18:46 -0400 Subject: [PATCH 237/260] Alternative fix for #188 https://github.com/xarray-contrib/datatree/pull/268 * alternative fix for 188 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 60 +++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index b6bf8ac02d0..e44942610f6 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -36,6 +36,7 @@ HybridMappingProxy, _default, either_dict_or_kwargs, + maybe_wrap_array, ) from xarray.core.variable import Variable @@ -235,6 +236,65 @@ def _replace( inplace=inplace, ) + def map( + self, + func: Callable, + keep_attrs: bool | None = None, + args: Iterable[Any] = (), + **kwargs: Any, + ) -> Dataset: + """Apply a function to each data variable in this dataset + + Parameters + ---------- + func : callable + Function which can be called in the form `func(x, *args, **kwargs)` + to transform each DataArray `x` in this dataset into another + DataArray. + keep_attrs : bool or None, optional + If True, both the dataset's and variables' attributes (`attrs`) will be + copied from the original objects to the new ones. If False, the new dataset + and variables will be returned without copying the attributes. + args : iterable, optional + Positional arguments passed on to `func`. + **kwargs : Any + Keyword arguments passed on to `func`. + + Returns + ------- + applied : Dataset + Resulting dataset from applying ``func`` to each data variable. + + Examples + -------- + >>> da = xr.DataArray(np.random.randn(2, 3)) + >>> ds = xr.Dataset({"foo": da, "bar": ("x", [-1, 2])}) + >>> ds + + Dimensions: (dim_0: 2, dim_1: 3, x: 2) + Dimensions without coordinates: dim_0, dim_1, x + Data variables: + foo (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 -0.9773 + bar (x) int64 -1 2 + >>> ds.map(np.fabs) + + Dimensions: (dim_0: 2, dim_1: 3, x: 2) + Dimensions without coordinates: dim_0, dim_1, x + Data variables: + foo (dim_0, dim_1) float64 1.764 0.4002 0.9787 2.241 1.868 0.9773 + bar (x) float64 1.0 2.0 + """ + + # Copied from xarray.Dataset so as not to call type(self), which causes problems (see datatree GH188). + # TODO Refactor xarray upstream to avoid needing to overwrite this. 
+ # TODO This copied version will drop all attrs - the keep_attrs stuff should be re-instated + variables = { + k: maybe_wrap_array(v, func(v, *args, **kwargs)) + for k, v in self.data_vars.items() + } + # return type(self)(variables, attrs=attrs) + return Dataset(variables) + class DataTree( NamedNode, From 002e7cb11945d548902f1d223df3dda3995b9acf Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 24 Oct 2023 13:10:44 -0400 Subject: [PATCH 238/260] DatasetView in map_over_subtree https://github.com/xarray-contrib/datatree/pull/269 * re-forbid initializing a DatasetView directly * test that map_over_subtree definitely doesn't modify data in-place * chnge behaviour to return an immmutable DatasetView within map_over_subtree * improve error messages * whatsew * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused return variable --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 23 ++++++++++++------- xarray/datatree_/datatree/mapping.py | 11 ++++----- .../datatree_/datatree/tests/test_mapping.py | 5 ++-- xarray/datatree_/docs/source/whats-new.rst | 2 ++ 4 files changed, 25 insertions(+), 16 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index e44942610f6..f858ceac00c 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -108,13 +108,10 @@ class DatasetView(Dataset): An immutable Dataset-like view onto the data in a single DataTree node. In-place operations modifying this object should raise an AttributeError. + This requires overriding all inherited constructors. Operations returning a new result will return a new xarray.Dataset object. This includes all API on Dataset, which will be inherited. - - This requires overriding all inherited private constructors. - - We leave the public init constructor because it is used by type() in some xarray code (see datatree GH issue #188) """ # TODO what happens if user alters (in-place) a DataArray they extracted from this object? @@ -130,6 +127,14 @@ class DatasetView(Dataset): "_variables", ) + def __init__( + self, + data_vars: Optional[Mapping[Any, Any]] = None, + coords: Optional[Mapping[Any, Any]] = None, + attrs: Optional[Mapping[Any, Any]] = None, + ): + raise AttributeError("DatasetView objects are not to be initialized directly") + @classmethod def _from_node( cls, @@ -150,14 +155,16 @@ def _from_node( def __setitem__(self, key, val) -> None: raise AttributeError( - "Mutation of the DatasetView is not allowed, please use __setitem__ on the wrapping DataTree node, " - "or use `DataTree.to_dataset()` if you want a mutable dataset" + "Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, " + "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`," + "use `.copy()` first to get a mutable version of the input dataset." ) def update(self, other) -> None: raise AttributeError( - "Mutation of the DatasetView is not allowed, please use .update on the wrapping DataTree node, " - "or use `DataTree.to_dataset()` if you want a mutable dataset" + "Mutation of the DatasetView is not allowed, please use `.update` on the wrapping DataTree node, " + "or use `dt.to_dataset()` if you want a mutable dataset. 
If calling this from within `map_over_subtree`," + "use `.copy()` first to get a mutable version of the input dataset." ) # FIXME https://github.com/python/mypy/issues/7328 diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index bd41cdbda62..d6df73f164a 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -190,15 +190,14 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: *args_as_tree_length_iterables, *list(kwargs_as_tree_length_iterables.values()), ): - node_args_as_datasets = [ - a.to_dataset() if isinstance(a, DataTree) else a - for a in all_node_args[:n_args] + node_args_as_datasetviews = [ + a.ds if isinstance(a, DataTree) else a for a in all_node_args[:n_args] ] - node_kwargs_as_datasets = dict( + node_kwargs_as_datasetviews = dict( zip( [k for k in kwargs_as_tree_length_iterables.keys()], [ - v.to_dataset() if isinstance(v, DataTree) else v + v.ds if isinstance(v, DataTree) else v for v in all_node_args[n_args:] ], ) @@ -210,7 +209,7 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: # Now we can call func on the data in this particular set of corresponding nodes results = ( func_with_error_context( - *node_args_as_datasets, **node_kwargs_as_datasets + *node_args_as_datasetviews, **node_kwargs_as_datasetviews ) if node_of_first_tree.has_data else None diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index 71a4fed6bf6..f9a44b723ba 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -304,7 +304,7 @@ def weighted_mean(ds): dt.map_over_subtree(weighted_mean) - def test_alter_inplace(self): + def test_alter_inplace_forbidden(self): simpsons = DataTree.from_dict( d={ "/": xr.Dataset({"age": 83}), @@ -322,7 +322,8 @@ def fast_forward(ds: xr.Dataset, years: float) -> xr.Dataset: ds["age"] = ds["age"] + years return ds - simpsons.map_over_subtree(fast_forward, years=10) + with pytest.raises(AttributeError): + simpsons.map_over_subtree(fast_forward, years=10) @pytest.mark.xfail diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 5d70f914add..4af1691d601 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -32,6 +32,8 @@ Breaking changes - Nodes containing only attributes but no data are now ignored by :py:func:`map_over_subtree` (:issue:`262`, :pull:`263`) By `Tom Nicholas `_. +- Disallow altering of given dataset inside function called by :py:func:`map_over_subtree` (:pull:`269`, reverts part of :pull:`194`). + By `Tom Nicholas `_. 
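A minimal sketch of the calling pattern implied by the change above: functions passed to ``map_over_subtree`` now receive an immutable ``DatasetView``, so a function that wants to modify node data should call ``.copy()`` first, as the new error message suggests; the tree below is illustrative::

    import xarray as xr
    from datatree import DataTree

    simpsons = DataTree.from_dict(
        {
            "/Homer": xr.Dataset({"age": 39}),
            "/Homer/Bart": xr.Dataset({"age": 10}),
        }
    )

    def fast_forward(ds: xr.Dataset) -> xr.Dataset:
        # Assigning into the passed-in view raises AttributeError, so take a
        # mutable copy before modifying, then return the new Dataset.
        ds = ds.copy()
        ds["age"] = ds["age"] + 10
        return ds

    aged = simpsons.map_over_subtree(fast_forward)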
Deprecations ~~~~~~~~~~~~ From 944cd0fd76cda11d6f34973b1ffc210c9cadb739 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Tue, 24 Oct 2023 13:45:00 -0400 Subject: [PATCH 239/260] Method to match node paths via glob https://github.com/xarray-contrib/datatree/pull/267 * test * implementation * documentation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew * API * correct faulty test * remove newline * search-> match * format continuation lines correctly --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/datatree/datatree.py | 50 +++++++++++++++++++ .../datatree_/datatree/tests/test_datatree.py | 19 +++++++ xarray/datatree_/docs/source/api.rst | 1 + .../docs/source/hierarchical-data.rst | 18 ++++++- xarray/datatree_/docs/source/whats-new.rst | 2 + 5 files changed, 89 insertions(+), 1 deletion(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index f858ceac00c..f2618d2a0ab 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1242,8 +1242,13 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree: filterfunc: function A function which accepts only one DataTree - the node on which filterfunc will be called. + Returns + ------- + DataTree + See Also -------- + match pipe map_over_subtree """ @@ -1252,6 +1257,51 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree: } return DataTree.from_dict(filtered_nodes, name=self.root.name) + def match(self, pattern: str) -> DataTree: + """ + Return nodes with paths matching pattern. + + Uses unix glob-like syntax for pattern-matching. + + Parameters + ---------- + pattern: str + A pattern to match each node path against. + + Returns + ------- + DataTree + + See Also + -------- + filter + pipe + map_over_subtree + + Examples + -------- + >>> dt = DataTree.from_dict( + ... { + ... "/a/A": None, + ... "/a/B": None, + ... "/b/A": None, + ... "/b/B": None, + ... } + ... ) + >>> dt.match("*/B") + DataTree('None', parent=None) + ├── DataTree('a') + │ └── DataTree('B') + └── DataTree('b') + └── DataTree('B') + """ + matching_nodes = { + node.path: node.ds + for node in self.subtree + if NodePath(node.path).match(pattern) + } + return DataTree.from_dict(matching_nodes, name=self.root.name) + def map_over_subtree( self, func: Callable, diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 726925fa78a..26fd0e54040 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -678,6 +678,25 @@ def f(x, tree, y): class TestSubset: + def test_match(self): + # TODO is this example going to cause problems with case sensitivity? + dt = DataTree.from_dict( + { + "/a/A": None, + "/a/B": None, + "/b/A": None, + "/b/B": None, + } + ) + result = dt.match("*/B") + expected = DataTree.from_dict( + { + "/a/B": None, + "/b/B": None, + } + ) + dtt.assert_identical(result, expected) + def test_filter(self): simpsons = DataTree.from_dict( d={ diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 9a34bdd0089..54a98639d03 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -102,6 +102,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure. 
DataTree.find_common_ancestor map_over_subtree DataTree.pipe + DataTree.match DataTree.filter DataTree Contents diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 7795c9e2876..f74a635dfaa 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -379,7 +379,23 @@ Subsetting Tree Nodes We can subset our tree to select only nodes of interest in various ways. -The :py:meth:`DataTree.filter` method can be used to retain only the nodes of a tree that meet a certain condition. +Similarly to on a real filesystem, matching nodes by common patterns in their paths is often useful. +We can use :py:meth:`DataTree.match` for this: + +.. ipython:: python + + dt = DataTree.from_dict( + { + "/a/A": None, + "/a/B": None, + "/b/A": None, + "/b/B": None, + } + ) + result = dt.match("*/B") + +We can also subset trees by the contents of the nodes. +:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition. For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults: First lets recreate the tree but with an `age` data variable in every node: diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 4af1691d601..eb2c034016f 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -23,6 +23,8 @@ v0.0.13 (unreleased) New Features ~~~~~~~~~~~~ +- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`) + By `Tom Nicholas `_. - Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree` (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later. By `Tom Nicholas `_. From 02ec20b41d9e5a6e8ec4d42e34ed99104cf65749 Mon Sep 17 00:00:00 2001 From: Antonio Valentino Date: Tue, 24 Oct 2023 21:18:42 +0200 Subject: [PATCH 240/260] Do not use the deprecated distutils https://github.com/xarray-contrib/datatree/pull/247 * Do not use the deprecated distutils * Update docs/source/whats-new.rst * Fix import sorting --------- Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/tests/__init__.py | 4 ++-- xarray/datatree_/docs/source/whats-new.rst | 2 ++ xarray/datatree_/pyproject.toml | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/tests/__init__.py b/xarray/datatree_/datatree/tests/__init__.py index 964cb635dc5..64961158b13 100644 --- a/xarray/datatree_/datatree/tests/__init__.py +++ b/xarray/datatree_/datatree/tests/__init__.py @@ -1,7 +1,7 @@ import importlib -from distutils import version import pytest +from packaging import version def _importorskip(modname, minversion=None): @@ -21,7 +21,7 @@ def LooseVersion(vstring): # Our development version is something like '0.10.9+aac7bfc' # This function just ignores the git commit id. vstring = vstring.split("+")[0] - return version.LooseVersion(vstring) + return version.parse(vstring) has_zarr, requires_zarr = _importorskip("zarr") diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index eb2c034016f..c3606f19327 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -57,6 +57,8 @@ Documentation Internal Changes ~~~~~~~~~~~~~~~~ +* No longer use the deprecated `distutils` package. + .. 
_whats-new.v0.0.12: v0.0.12 (03/07/2023) diff --git a/xarray/datatree_/pyproject.toml b/xarray/datatree_/pyproject.toml index a219b9767ff..86ad3639073 100644 --- a/xarray/datatree_/pyproject.toml +++ b/xarray/datatree_/pyproject.toml @@ -20,6 +20,7 @@ classifiers = [ requires-python = ">=3.9" dependencies = [ "xarray >=2022.6.0", + "packaging", ] dynamic = ["version"] From 333bfa81e15f1f250bdbb464c6c54f2d0970d1ca Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 25 Oct 2023 14:06:53 -0400 Subject: [PATCH 241/260] is_hollow method https://github.com/xarray-contrib/datatree/pull/272 * tests * implementation * API docs * narrative docs * whatsnew --- xarray/datatree_/datatree/datatree.py | 5 +++++ .../datatree_/datatree/tests/test_datatree.py | 10 ++++++++++ xarray/datatree_/docs/source/api.rst | 1 + .../docs/source/hierarchical-data.rst | 19 +++++++++++++++++++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 5 files changed, 37 insertions(+) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index f2618d2a0ab..52049f47d4b 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -505,6 +505,11 @@ def is_empty(self) -> bool: """False if node contains any data or attrs. Does not look at children.""" return not (self.has_data or self.has_attrs) + @property + def is_hollow(self) -> bool: + """True if only leaf nodes contain data.""" + return not any(node.has_data for node in self.subtree if not node.is_leaf) + @property def variables(self) -> Mapping[Hashable, Variable]: """Low level interface to node contents as dict of Variable objects. diff --git a/xarray/datatree_/datatree/tests/test_datatree.py b/xarray/datatree_/datatree/tests/test_datatree.py index 26fd0e54040..fde83b2e226 100644 --- a/xarray/datatree_/datatree/tests/test_datatree.py +++ b/xarray/datatree_/datatree/tests/test_datatree.py @@ -136,6 +136,16 @@ def test_has_data(self): john = DataTree(name="john", data=None) assert not john.has_data + def test_is_hollow(self): + john = DataTree(data=xr.Dataset({"a": 0})) + assert john.is_hollow + + eve = DataTree(children={"john": john}) + assert eve.is_hollow + + eve.ds = xr.Dataset({"a": 1}) + assert not eve.is_hollow + class TestVariablesChildrenNameCollisions: def test_parent_already_has_variable_with_childs_name(self): diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 54a98639d03..215105efa31 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -64,6 +64,7 @@ This interface echoes that of ``xarray.Dataset``. DataTree.has_data DataTree.has_attrs DataTree.is_empty + DataTree.is_hollow .. diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index f74a635dfaa..51bcea56b6b 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -369,6 +369,25 @@ You can see this tree is similar to the ``dt`` object above, except that it is m (If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) +.. _Tree Contents: + +Tree Contents +------------- + +Hollow Trees +~~~~~~~~~~~~ + +A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. 
+This is useful because certain useful tree manipulation operations only make sense for hollow trees. + +You can check if a tree is a hollow tree by using the :py:meth:`~DataTree.is_hollow` property. +We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which +have children (i.e. Abe and Homer). + +.. ipython:: python + + simpsons.is_hollow + .. _manipulating trees: Manipulating Trees diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index c3606f19327..5163fdd654a 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -25,6 +25,8 @@ New Features - New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`) By `Tom Nicholas `_. +- New :py:meth:`DataTree.is_hollow` property for checking if data is only contained at the leaf nodes. (:pull:`272`) + By `Tom Nicholas `_. - Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree` (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later. By `Tom Nicholas `_. From 52590cbb7424b9c767c2729ba166e7ca4fa620c1 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 27 Oct 2023 18:19:11 -0400 Subject: [PATCH 242/260] blank whatsnew for next release --- xarray/datatree_/docs/source/whats-new.rst | 28 ++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 5163fdd654a..a72bfdca4fc 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -15,9 +15,32 @@ What's New np.random.seed(123456) +.. _whats-new.v0.0.14: + +v0.0.14 (unreleased) +-------------------- + +New Features +~~~~~~~~~~~~ + +Breaking changes +~~~~~~~~~~~~~~~~ + +Deprecations +~~~~~~~~~~~~ + +Bug fixes +~~~~~~~~~ + +Documentation +~~~~~~~~~~~~~ + +Internal Changes +~~~~~~~~~~~~~~~~ + .. _whats-new.v0.0.13: -v0.0.13 (unreleased) +v0.0.13 (27/10/2023) -------------------- New Features @@ -39,9 +62,6 @@ Breaking changes - Disallow altering of given dataset inside function called by :py:func:`map_over_subtree` (:pull:`269`, reverts part of :pull:`194`). By `Tom Nicholas `_. -Deprecations -~~~~~~~~~~~~ - Bug fixes ~~~~~~~~~ From 256ce55acf11d99375418b6ed602f37c38962ebe Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 27 Oct 2023 18:33:14 -0400 Subject: [PATCH 243/260] fix broken example of map_over_subtree --- xarray/datatree_/docs/source/hierarchical-data.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 51bcea56b6b..3cae4e3bd13 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -539,7 +539,7 @@ See that the same change (fast-forwarding by adding 10 years to the age of each Mapping Custom Functions Over Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can map custom computation over each node in a tree using :py:func:`map_over_subtree`. +You can map custom computation over each node in a tree using :py:meth:`DataTree.map_over_subtree`. You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments, and returns one (or more) xarray datasets. 
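A short, self-contained sketch of the two forms described above (the method and the ``map_over_subtree`` decorator), assuming the standalone ``datatree`` package documented in these patches; the tree contents are made up for illustration::

    import numpy as np
    import xarray as xr
    from datatree import DataTree, map_over_subtree

    dt = DataTree.from_dict(
        {
            "/a": xr.Dataset({"signal": ("time", np.arange(4.0))}),
            "/b": xr.Dataset({"signal": ("time", 2.0 * np.arange(4.0))}),
        }
    )

    # Method form: apply a Dataset -> Dataset function to every data-containing node.
    rms = dt.map_over_subtree(lambda ds: np.sqrt((ds**2).mean()))

    # Decorator form: promote a Dataset-accepting function into a DataTree-accepting one.
    @map_over_subtree
    def demean(ds: xr.Dataset) -> xr.Dataset:
        return ds - ds.mean()

    demeaned = demean(dt)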
@@ -559,10 +559,13 @@ Then calculate the RMS value of these signals: .. ipython:: python - rms(readings) + voltages.map_over_subtree(rms) .. _multiple trees: +We can also use the :py:func:`map_over_subtree` decorator to promote a function which accepts datasets into one which +accepts datatrees. + Operating on Multiple Trees --------------------------- From cc4de87cec017b854ca183646a244abfe13a28ca Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Fri, 27 Oct 2023 18:44:23 -0400 Subject: [PATCH 244/260] add DataTree.map_over_subtree method to API docs --- xarray/datatree_/docs/source/api.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 215105efa31..c4ad6e58c78 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -101,6 +101,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure. DataTree.relative_to DataTree.iter_lineage DataTree.find_common_ancestor + DataTree.map_over_subtree map_over_subtree DataTree.pipe DataTree.match From 20aa09a51aba66496b744915a4ca2a95d97c538f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 6 Nov 2023 13:56:03 -0700 Subject: [PATCH 245/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/273 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.4.0...v4.5.0) - [github.com/psf/black: 23.9.1 → 23.10.1](https://github.com/psf/black/compare/23.9.1...23.10.1) - [github.com/keewis/blackdoc: v0.3.8 → v0.3.9](https://github.com/keewis/blackdoc/compare/v0.3.8...v0.3.9) - [github.com/pre-commit/mirrors-mypy: v1.5.1 → v1.6.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.5.1...v1.6.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- xarray/datatree_/.pre-commit-config.yaml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml b/xarray/datatree_/.pre-commit-config.yaml index a2dac76a44f..d3e649b18dc 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -3,7 +3,7 @@ ci: autoupdate_schedule: monthly repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.4.0 + rev: v4.5.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -15,11 +15,11 @@ repos: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 23.9.1 + rev: 23.10.1 hooks: - id: black - repo: https://github.com/keewis/blackdoc - rev: v0.3.8 + rev: v0.3.9 hooks: - id: blackdoc - repo: https://github.com/PyCQA/flake8 @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.5.1 + rev: v1.6.1 hooks: - id: mypy # Copied from setup.cfg From 12ca491370e1597774f731b91ecbfedf9ab89e69 Mon Sep 17 00:00:00 2001 From: Sam Levang <39069044+slevang@users.noreply.github.com> Date: Thu, 9 Nov 2023 23:40:29 -0500 Subject: [PATCH 246/260] Change default write mode of `to_zarr()` to `'w-'` https://github.com/xarray-contrib/datatree/pull/275 * change default to_zarr mode to w- * regression test * add whats new entry --- xarray/datatree_/datatree/datatree.py | 9 +++++++-- 
xarray/datatree_/datatree/io.py | 2 +- xarray/datatree_/datatree/tests/test_io.py | 9 +++++++++ xarray/datatree_/docs/source/whats-new.rst | 4 ++++ 4 files changed, 21 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/datatree/datatree.py b/xarray/datatree_/datatree/datatree.py index 52049f47d4b..c86c2e2e3e8 100644 --- a/xarray/datatree_/datatree/datatree.py +++ b/xarray/datatree_/datatree/datatree.py @@ -1496,7 +1496,12 @@ def to_netcdf( ) def to_zarr( - self, store, mode: str = "w", encoding=None, consolidated: bool = True, **kwargs + self, + store, + mode: str = "w-", + encoding=None, + consolidated: bool = True, + **kwargs, ): """ Write datatree contents to a Zarr store. @@ -1505,7 +1510,7 @@ def to_zarr( ---------- store : MutableMapping, str or Path, optional Store or path to directory in file system - mode : {{"w", "w-", "a", "r+", None}, default: "w" + mode : {{"w", "w-", "a", "r+", None}, default: "w-" Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist); “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode diff --git a/xarray/datatree_/datatree/io.py b/xarray/datatree_/datatree/io.py index 73992f135da..8bb7682f085 100644 --- a/xarray/datatree_/datatree/io.py +++ b/xarray/datatree_/datatree/io.py @@ -176,7 +176,7 @@ def _create_empty_zarr_group(store, group, mode): def _datatree_to_zarr( dt: DataTree, store, - mode: str = "w", + mode: str = "w-", encoding=None, consolidated: bool = True, **kwargs, diff --git a/xarray/datatree_/datatree/tests/test_io.py b/xarray/datatree_/datatree/tests/test_io.py index 59199371de4..6fa20479f9a 100644 --- a/xarray/datatree_/datatree/tests/test_io.py +++ b/xarray/datatree_/datatree/tests/test_io.py @@ -1,4 +1,5 @@ import pytest +import zarr.errors from datatree.io import open_datatree from datatree.testing import assert_equal @@ -109,3 +110,11 @@ def test_to_zarr_not_consolidated(self, tmpdir, simple_datatree): with pytest.warns(RuntimeWarning, match="consolidated"): roundtrip_dt = open_datatree(filepath, engine="zarr") assert_equal(original_dt, roundtrip_dt) + + @requires_zarr + def test_to_zarr_default_write_mode(self, tmpdir, simple_datatree): + simple_datatree.to_zarr(tmpdir) + + # with default settings, to_zarr should not overwrite an existing dir + with pytest.raises(zarr.errors.ContainsGroupError): + simple_datatree.to_zarr(tmpdir) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index a72bfdca4fc..fe1ac57fe1b 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -26,6 +26,10 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- Change default write mode of :py:meth:`DataTree.to_zarr` to ``'w-'`` to match ``xarray`` + default and prevent accidental directory overwrites. (:issue:`274`, :pull:`275`) + By `Sam Levang `_. 
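A minimal sketch of the behaviour change described in the entry above, assuming the ``zarr`` backend is installed and using an illustrative local store path::

    import xarray as xr
    import zarr.errors
    from datatree import DataTree

    dt = DataTree.from_dict({"/group": xr.Dataset({"a": ("x", [1, 2, 3])})})

    dt.to_zarr("example_store.zarr")  # first write succeeds; mode now defaults to "w-"

    try:
        dt.to_zarr("example_store.zarr")  # refuses to overwrite the existing store
    except zarr.errors.ContainsGroupError:
        print("store already exists; pass mode='w' to overwrite deliberately")

    dt.to_zarr("example_store.zarr", mode="w")  # explicit overwrite is still available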
+ Deprecations ~~~~~~~~~~~~ From 494ace1fb679428a636b64bc02879e24cbc56d05 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 26 Nov 2023 20:33:48 -0500 Subject: [PATCH 247/260] remove non-existent chunks method from API docs page --- xarray/datatree_/docs/source/api.rst | 6 ------ 1 file changed, 6 deletions(-) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index c4ad6e58c78..259c29d10ea 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -57,7 +57,6 @@ This interface echoes that of ``xarray.Dataset``. DataTree.attrs DataTree.encoding DataTree.indexes - DataTree.chunks DataTree.nbytes DataTree.ds DataTree.to_dataset @@ -66,11 +65,6 @@ This interface echoes that of ``xarray.Dataset``. DataTree.is_empty DataTree.is_hollow -.. - - Missing: - ``DataTree.chunksizes`` - Dictionary interface -------------------- From 6b778c490613919f2fe7960acdd3a578b6fea9d3 Mon Sep 17 00:00:00 2001 From: Sam Levang <39069044+slevang@users.noreply.github.com> Date: Mon, 27 Nov 2023 11:47:46 -0500 Subject: [PATCH 248/260] Keep attrs in `map_over_subtree` https://github.com/xarray-contrib/datatree/pull/279 * keep attrs in map_over_subtree * more intelligible logic --------- Co-authored-by: Tom Nicholas --- xarray/datatree_/datatree/mapping.py | 15 +++++++++------ xarray/datatree_/datatree/tests/test_mapping.py | 11 +++++++++++ xarray/datatree_/docs/source/whats-new.rst | 2 ++ 3 files changed, 22 insertions(+), 6 deletions(-) diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index d6df73f164a..c9631e1edbf 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -206,14 +206,17 @@ def _map_over_subtree(*args, **kwargs) -> DataTree | Tuple[DataTree, ...]: node_of_first_tree.path )(func) - # Now we can call func on the data in this particular set of corresponding nodes - results = ( - func_with_error_context( + if node_of_first_tree.has_data: + # call func on the data in this particular set of corresponding nodes + results = func_with_error_context( *node_args_as_datasetviews, **node_kwargs_as_datasetviews ) - if node_of_first_tree.has_data - else None - ) + elif node_of_first_tree.has_attrs: + # propagate attrs + results = node_of_first_tree.ds + else: + # nothing to propagate so use fastpath to create empty node in new tree + results = None # TODO implement mapping over multiple trees in-place using if conditions from here on? 
out_data_objects[node_of_first_tree.path] = results diff --git a/xarray/datatree_/datatree/tests/test_mapping.py b/xarray/datatree_/datatree/tests/test_mapping.py index f9a44b723ba..929ce7644dd 100644 --- a/xarray/datatree_/datatree/tests/test_mapping.py +++ b/xarray/datatree_/datatree/tests/test_mapping.py @@ -264,6 +264,17 @@ def check_for_data(ds): dt.map_over_subtree(check_for_data) + def test_keep_attrs_on_empty_nodes(self, create_test_datatree): + # GH278 + dt = create_test_datatree() + dt["set1/set2"].attrs["foo"] = "bar" + + def empty_func(ds): + return ds + + result = dt.map_over_subtree(empty_func) + assert result["set1/set2"].attrs == dt["set1/set2"].attrs + @pytest.mark.xfail( reason="probably some bug in pytests handling of exception notes" ) diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index fe1ac57fe1b..9d46c95bbe3 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -35,6 +35,8 @@ Deprecations Bug fixes ~~~~~~~~~ +- Keep attributes on nodes containing no data in :py:func:`map_over_subtree`. (:issue:`278`, :pull:`279`) + By `Sam Levang `_. Documentation ~~~~~~~~~~~~~ From ced83678f2e50b69509da67a13f902f88a355914 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 6 Dec 2023 22:33:46 -0700 Subject: [PATCH 249/260] Bump actions/setup-python from 4 to 5 https://github.com/xarray-contrib/datatree/pull/291 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 9ad36fc5dce..ba641183cdf 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -22,7 +22,7 @@ jobs: - uses: actions/checkout@v4 with: fetch-depth: 0 - - uses: actions/setup-python@v4 + - uses: actions/setup-python@v5 name: Install Python with: python-version: 3.9 @@ -48,7 +48,7 @@ jobs: needs: build-artifacts runs-on: ubuntu-latest steps: - - uses: actions/setup-python@v4 + - uses: actions/setup-python@v5 name: Install Python with: python-version: '3.10' From 1ca8452487a1d7c30e00da4badaf2c6a808f514e Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Sun, 10 Dec 2023 11:04:43 -0500 Subject: [PATCH 250/260] Fix for xarray v2023.12.0 https://github.com/xarray-contrib/datatree/pull/294 * fix import of xarray.testing internals that was changed by https://github.com/pydata/xarray/pull/8404 * bump minimum required version of xarray * linting --- xarray/datatree_/datatree/testing.py | 2 +- xarray/datatree_/docs/source/whats-new.rst | 4 ++++ xarray/datatree_/pyproject.toml | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/datatree/testing.py b/xarray/datatree_/datatree/testing.py index ebe32cbefcd..1cbcdf2d4e3 100644 --- a/xarray/datatree_/datatree/testing.py +++ b/xarray/datatree_/datatree/testing.py @@ -1,4 +1,4 @@ -from xarray.testing import ensure_warnings +from xarray.testing.assertions import ensure_warnings from .datatree import DataTree from .formatting import diff_tree_repr diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 9d46c95bbe3..95bdcaf79bf 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -26,6 +26,10 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- Minimum required version of xarray is now 2023.12.0, i.e. the latest version. + This is required to prevent recent changes to xarray's internals from breaking datatree. + (:issue:`293`, :pull:`294`) + By `Tom Nicholas `_. - Change default write mode of :py:meth:`DataTree.to_zarr` to ``'w-'`` to match ``xarray`` default and prevent accidental directory overwrites. (:issue:`274`, :pull:`275`) By `Sam Levang `_. diff --git a/xarray/datatree_/pyproject.toml b/xarray/datatree_/pyproject.toml index 86ad3639073..40f7d5a59b3 100644 --- a/xarray/datatree_/pyproject.toml +++ b/xarray/datatree_/pyproject.toml @@ -19,7 +19,7 @@ classifiers = [ ] requires-python = ">=3.9" dependencies = [ - "xarray >=2022.6.0", + "xarray >=2023.12.0", "packaging", ] dynamic = ["version"] From 7cfb6b2ea3b935a8a393fb47c15e810131445d01 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 3 Jan 2024 11:25:01 +0100 Subject: [PATCH 251/260] Bump actions/upload-artifact from 3 to 4 https://github.com/xarray-contrib/datatree/pull/296 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3 to 4. 
- [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index ba641183cdf..916f69b676e 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -39,7 +39,7 @@ jobs: python -m build --sdist --wheel . - - uses: actions/upload-artifact@v3 + - uses: actions/upload-artifact@v4 with: name: releases path: dist From b40cad3369123d0d034fcb52686cf522ac67fd6d Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 3 Jan 2024 11:46:07 +0100 Subject: [PATCH 252/260] Bump actions/download-artifact from 3 to 4 https://github.com/xarray-contrib/datatree/pull/295 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- xarray/datatree_/.github/workflows/pypipublish.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index 916f69b676e..bae6e0b726f 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -52,7 +52,7 @@ jobs: name: Install Python with: python-version: '3.10' - - uses: actions/download-artifact@v3 + - uses: actions/download-artifact@v4 with: name: releases path: dist @@ -72,7 +72,7 @@ jobs: if: github.event_name == 'release' runs-on: ubuntu-latest steps: - - uses: actions/download-artifact@v3 + - uses: actions/download-artifact@v4 with: name: releases path: dist From 11995b0550f7d30640a81d653aed674da92cf6ed Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 3 Jan 2024 11:50:57 +0100 Subject: [PATCH 253/260] [pre-commit.ci] pre-commit autoupdate https://github.com/xarray-contrib/datatree/pull/289 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit updates: - [github.com/PyCQA/isort: 5.12.0 → 5.13.2](https://github.com/PyCQA/isort/compare/5.12.0...5.13.2) - [github.com/psf/black: 23.10.1 → 23.12.1](https://github.com/psf/black/compare/23.10.1...23.12.1) - [github.com/pre-commit/mirrors-mypy: v1.6.1 → v1.8.0](https://github.com/pre-commit/mirrors-mypy/compare/v1.6.1...v1.8.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin --- xarray/datatree_/.pre-commit-config.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/.pre-commit-config.yaml 
b/xarray/datatree_/.pre-commit-config.yaml index d3e649b18dc..ea73c38d73e 100644 --- a/xarray/datatree_/.pre-commit-config.yaml +++ b/xarray/datatree_/.pre-commit-config.yaml @@ -10,12 +10,12 @@ repos: - id: check-yaml # isort should run before black as black sometimes tweaks the isort output - repo: https://github.com/PyCQA/isort - rev: 5.12.0 + rev: 5.13.2 hooks: - id: isort # https://github.com/python/black#version-control-integration - repo: https://github.com/psf/black - rev: 23.10.1 + rev: 23.12.1 hooks: - id: black - repo: https://github.com/keewis/blackdoc @@ -32,7 +32,7 @@ repos: # - id: velin # args: ["--write", "--compact"] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.6.1 + rev: v1.8.0 hooks: - id: mypy # Copied from setup.cfg From 804b3f7164822d9d51e530845143d5da694d817b Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 3 Jan 2024 11:54:10 +0100 Subject: [PATCH 254/260] Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 https://github.com/xarray-contrib/datatree/pull/285 Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.10 to 1.8.11. - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.8.10...v1.8.11) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Justus Magin --- xarray/datatree_/.github/workflows/pypipublish.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/.github/workflows/pypipublish.yaml b/xarray/datatree_/.github/workflows/pypipublish.yaml index bae6e0b726f..7dc36d87691 100644 --- a/xarray/datatree_/.github/workflows/pypipublish.yaml +++ b/xarray/datatree_/.github/workflows/pypipublish.yaml @@ -77,7 +77,7 @@ jobs: name: releases path: dist - name: Publish package to PyPI - uses: pypa/gh-action-pypi-publish@v1.8.10 + uses: pypa/gh-action-pypi-publish@v1.8.11 with: user: ${{ secrets.PYPI_USERNAME }} password: ${{ secrets.PYPI_PASSWORD }} From a01cc36432f76c9494a567cf3c2caa09b455e2a9 Mon Sep 17 00:00:00 2001 From: etienneschalk <45271239+etienneschalk@users.noreply.github.com> Date: Tue, 9 Jan 2024 21:20:20 +0100 Subject: [PATCH 255/260] Use napoleon instead of numpydoc (xarray doc alignment), and fixes https://github.com/xarray-contrib/datatree/pull/298 * Use napoleon instead of numpydoc, and fixes * docs * A mypy ignore for pre-commit run --all-files * Updated whats-new.rst --- xarray/datatree_/.gitignore | 2 + xarray/datatree_/ci/doc.yml | 2 +- xarray/datatree_/datatree/mapping.py | 2 +- xarray/datatree_/docs/README.md | 14 +++++ xarray/datatree_/docs/source/api.rst | 2 - xarray/datatree_/docs/source/conf.py | 73 +++++++++++++++++++--- xarray/datatree_/docs/source/index.rst | 1 - xarray/datatree_/docs/source/whats-new.rst | 5 +- 8 files changed, 87 insertions(+), 14 deletions(-) create mode 100644 xarray/datatree_/docs/README.md diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore index 64f6a86852e..d286b19edb1 100644 --- a/xarray/datatree_/.gitignore +++ b/xarray/datatree_/.gitignore @@ -131,3 +131,5 @@ dmypy.json # version _version.py + +.vscode diff --git a/xarray/datatree_/ci/doc.yml b/xarray/datatree_/ci/doc.yml index 6e1fda6ee9f..f3b95f71bd4 100644 --- 
a/xarray/datatree_/ci/doc.yml +++ b/xarray/datatree_/ci/doc.yml @@ -13,8 +13,8 @@ dependencies: - sphinx-book-theme >= 0.0.38 - nbsphinx - sphinxcontrib-srclinks + - pickleshare - pydata-sphinx-theme>=0.4.3 - - numpydoc - ipython - h5netcdf - zarr diff --git a/xarray/datatree_/datatree/mapping.py b/xarray/datatree_/datatree/mapping.py index c9631e1edbf..34e227d349d 100644 --- a/xarray/datatree_/datatree/mapping.py +++ b/xarray/datatree_/datatree/mapping.py @@ -282,7 +282,7 @@ def wrapper(*args, **kwargs): def add_note(err: BaseException, msg: str) -> None: # TODO: remove once python 3.10 can be dropped if sys.version_info < (3, 11): - err.__notes__ = getattr(err, "__notes__", []) + [msg] + err.__notes__ = getattr(err, "__notes__", []) + [msg] # type: ignore[attr-defined] else: err.add_note(msg) diff --git a/xarray/datatree_/docs/README.md b/xarray/datatree_/docs/README.md new file mode 100644 index 00000000000..ca2bf72952e --- /dev/null +++ b/xarray/datatree_/docs/README.md @@ -0,0 +1,14 @@ +# README - docs + +## Build the documentation locally + +```bash +cd docs # From project's root +make clean +rm -rf source/generated # remove autodoc artefacts, that are not removed by `make clean` +make html +``` + +## Access the documentation locally + +Open `docs/_build/html/index.html` in a web browser diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 259c29d10ea..ec6228439ac 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -249,9 +249,7 @@ Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data DataTree.clip DataTree.conj DataTree.conjugate - DataTree.imag DataTree.round - DataTree.real DataTree.rank Reshaping and reorganising diff --git a/xarray/datatree_/docs/source/conf.py b/xarray/datatree_/docs/source/conf.py index 06eb6d9d62b..8a9224def5b 100644 --- a/xarray/datatree_/docs/source/conf.py +++ b/xarray/datatree_/docs/source/conf.py @@ -39,7 +39,6 @@ # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = [ - "numpydoc", "sphinx.ext.autodoc", "sphinx.ext.viewcode", "sphinx.ext.linkcode", @@ -57,8 +56,8 @@ ] extlinks = { - "issue": ("https://github.com/TomNicholas/datatree/issues/%s", "GH#"), - "pull": ("https://github.com/TomNicholas/datatree/pull/%s", "GH#"), + "issue": ("https://github.com/xarray-contrib/datatree/issues/%s", "GH#%s"), + "pull": ("https://github.com/xarray-contrib/datatree/pull/%s", "GH#%s"), } # Add any paths that contain templates here, relative to this directory. 
templates_path = ["_templates", sphinx_autosummary_accessors.templates_path] @@ -66,6 +65,69 @@ # Generate the API documentation when building autosummary_generate = True + +# Napoleon configurations + +napoleon_google_docstring = False +napoleon_numpy_docstring = True +napoleon_use_param = False +napoleon_use_rtype = False +napoleon_preprocess_types = True +napoleon_type_aliases = { + # general terms + "sequence": ":term:`sequence`", + "iterable": ":term:`iterable`", + "callable": ":py:func:`callable`", + "dict_like": ":term:`dict-like `", + "dict-like": ":term:`dict-like `", + "path-like": ":term:`path-like `", + "mapping": ":term:`mapping`", + "file-like": ":term:`file-like `", + # special terms + # "same type as caller": "*same type as caller*", # does not work, yet + # "same type as values": "*same type as values*", # does not work, yet + # stdlib type aliases + "MutableMapping": "~collections.abc.MutableMapping", + "sys.stdout": ":obj:`sys.stdout`", + "timedelta": "~datetime.timedelta", + "string": ":class:`string `", + # numpy terms + "array_like": ":term:`array_like`", + "array-like": ":term:`array-like `", + "scalar": ":term:`scalar`", + "array": ":term:`array`", + "hashable": ":term:`hashable `", + # matplotlib terms + "color-like": ":py:func:`color-like `", + "matplotlib colormap name": ":doc:`matplotlib colormap name `", + "matplotlib axes object": ":py:class:`matplotlib axes object `", + "colormap": ":py:class:`colormap `", + # objects without namespace: xarray + "DataArray": "~xarray.DataArray", + "Dataset": "~xarray.Dataset", + "Variable": "~xarray.Variable", + "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy", + "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy", + # objects without namespace: numpy + "ndarray": "~numpy.ndarray", + "MaskedArray": "~numpy.ma.MaskedArray", + "dtype": "~numpy.dtype", + "ComplexWarning": "~numpy.ComplexWarning", + # objects without namespace: pandas + "Index": "~pandas.Index", + "MultiIndex": "~pandas.MultiIndex", + "CategoricalIndex": "~pandas.CategoricalIndex", + "TimedeltaIndex": "~pandas.TimedeltaIndex", + "DatetimeIndex": "~pandas.DatetimeIndex", + "Series": "~pandas.Series", + "DataFrame": "~pandas.DataFrame", + "Categorical": "~pandas.Categorical", + "Path": "~~pathlib.Path", + # objects with abbreviated namespace (from pandas) + "pd.Index": "~pandas.Index", + "pd.NaT": "~pandas.NaT", +} + # The suffix of source filenames. source_suffix = ".rst" @@ -177,11 +239,6 @@ # pixels large. # html_favicon = None -# Add any paths that contain custom static files (such as style sheets) here, -# relative to this directory. They are copied after the builtin static files, -# so a file named "default.css" will overwrite the builtin "default.css". -html_static_path = ["_static"] - # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. # html_last_updated_fmt = '%b %d, %Y' diff --git a/xarray/datatree_/docs/source/index.rst b/xarray/datatree_/docs/source/index.rst index d13a0edf798..a88a5747ada 100644 --- a/xarray/datatree_/docs/source/index.rst +++ b/xarray/datatree_/docs/source/index.rst @@ -50,7 +50,6 @@ Please raise any thoughts, issues, suggestions or bugs, no matter how small or l Reading and Writing Files API Reference Terminology - How do I ... 
Contributing Guide What's New GitHub repository diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 95bdcaf79bf..675b0fb2d08 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -44,6 +44,9 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Use ``napoleon`` instead of ``numpydoc`` to align with xarray documentation + (:issue:`284`, :pull:`298`). + By `Etienne Schalk `_. Internal Changes ~~~~~~~~~~~~~~~~ @@ -365,7 +368,7 @@ Breaking changes - Removes the option to delete all data in a node by assigning None to the node (in favour of deleting data by replacing the node's ``.ds`` attribute with an empty Dataset), or to create a new empty node in the same way (in favour of assigning an empty DataTree object instead). -- Removes the ability to create a new node by assigning a ``Dataset`` object to ``DataTree.__setitem__`. +- Removes the ability to create a new node by assigning a ``Dataset`` object to ``DataTree.__setitem__``. - Several other minor API changes such as ``.pathstr`` -> ``.path``, and ``from_dict``'s dictionary argument now being required. (:pull:`76`) By `Tom Nicholas `_. From a156da981d8c130016c9bca5b1e90ed3e5d40d6d Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Wed, 17 Jan 2024 14:54:40 -0700 Subject: [PATCH 256/260] Update data-structures.rst https://github.com/xarray-contrib/datatree/pull/299 typo found while reading the docs --- xarray/datatree_/docs/source/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xarray/datatree_/docs/source/data-structures.rst b/xarray/datatree_/docs/source/data-structures.rst index 23dd8edf315..02e4a31f688 100644 --- a/xarray/datatree_/docs/source/data-structures.rst +++ b/xarray/datatree_/docs/source/data-structures.rst @@ -75,7 +75,7 @@ Again these are not normally used unless explicitly accessed by the user. Creating a DataTree ~~~~~~~~~~~~~~~~~~~ -One way to create a create a ``DataTree`` from scratch is to create each node individually, +One way to create a ``DataTree`` from scratch is to create each node individually, specifying the nodes' relationship to one another as you create each one. 
The ``DataTree`` constructor takes: From d030502bfc8518a21ad99f978ccf2abbebb2145e Mon Sep 17 00:00:00 2001 From: etienneschalk <45271239+etienneschalk@users.noreply.github.com> Date: Fri, 19 Jan 2024 23:35:14 +0100 Subject: [PATCH 257/260] DataTree.lineage should be renamed to .parents https://github.com/xarray-contrib/datatree/pull/286 * Replace 'lineage' occurences in code by 'parents' * Replace 'lineage' occurences in api.rst by 'parents' * MyPy ignore * whats-new * Re-introduce lineage and deprecate it * Added credit * Added back lineage in api.rst * Update datatree/tests/test_treenode.py Co-authored-by: Tom Nicholas * Updated lineage and parents, broke tests * Replaced slash by point, tests pass * New PR * Fix tests * Remove iter_parents from api.rst * Avoid entering into the more complex else branch --------- Co-authored-by: Tom Nicholas --- xarray/datatree_/.gitignore | 3 +- .../datatree_/datatree/tests/test_treenode.py | 11 ++- xarray/datatree_/datatree/treenode.py | 81 ++++++++++++------- xarray/datatree_/docs/source/api.rst | 1 + xarray/datatree_/docs/source/whats-new.rst | 6 ++ 5 files changed, 69 insertions(+), 33 deletions(-) diff --git a/xarray/datatree_/.gitignore b/xarray/datatree_/.gitignore index d286b19edb1..88af9943a90 100644 --- a/xarray/datatree_/.gitignore +++ b/xarray/datatree_/.gitignore @@ -132,4 +132,5 @@ dmypy.json # version _version.py -.vscode +# Ignore vscode specific settings +.vscode/ diff --git a/xarray/datatree_/datatree/tests/test_treenode.py b/xarray/datatree_/datatree/tests/test_treenode.py index 5a05a6b5bef..f2d314c50e3 100644 --- a/xarray/datatree_/datatree/tests/test_treenode.py +++ b/xarray/datatree_/datatree/tests/test_treenode.py @@ -95,7 +95,7 @@ def test_ancestors(self): michael = TreeNode(children={"Tony": tony}) vito = TreeNode(children={"Michael": michael}) assert tony.root is vito - assert tony.lineage == (tony, michael, vito) + assert tony.parents == (michael, vito) assert tony.ancestors == (vito, michael, tony) @@ -279,12 +279,15 @@ def test_levelorderiter(self): class TestAncestry: + def test_parents(self): + _, leaf = create_test_tree() + expected = ["e", "b", "a"] + assert [node.name for node in leaf.parents] == expected + def test_lineage(self): _, leaf = create_test_tree() - lineage = leaf.lineage expected = ["f", "e", "b", "a"] - for node, expected_name in zip(lineage, expected): - assert node.name == expected_name + assert [node.name for node in leaf.lineage] == expected def test_ancestors(self): _, leaf = create_test_tree() diff --git a/xarray/datatree_/datatree/treenode.py b/xarray/datatree_/datatree/treenode.py index 4950bc9ce12..1689d261c34 100644 --- a/xarray/datatree_/datatree/treenode.py +++ b/xarray/datatree_/datatree/treenode.py @@ -121,8 +121,7 @@ def _check_loop(self, new_parent: Tree | None) -> None: ) def _is_descendant_of(self, node: Tree) -> bool: - _self, *lineage = list(node.lineage) - return any(n is self for n in lineage) + return any(n is self for n in node.parents) def _detach(self, parent: Tree | None) -> None: if parent is not None: @@ -236,26 +235,53 @@ def _post_attach_children(self: Tree, children: Mapping[str, Tree]) -> None: """Method call after attaching `children`.""" pass - def iter_lineage(self: Tree) -> Iterator[Tree]: + def _iter_parents(self: Tree) -> Iterator[Tree]: """Iterate up the tree, starting from the current node.""" - node: Tree | None = self + node: Tree | None = self.parent while node is not None: yield node node = node.parent + def iter_lineage(self: Tree) -> Tuple[Tree, ...]: + 
"""Iterate up the tree, starting from the current node.""" + from warnings import warn + + warn( + "`iter_lineage` has been deprecated, and in the future will raise an error." + "Please use `parents` from now on.", + DeprecationWarning, + ) + return tuple((self, *self.parents)) + @property def lineage(self: Tree) -> Tuple[Tree, ...]: """All parent nodes and their parent nodes, starting with the closest.""" - return tuple(self.iter_lineage()) + from warnings import warn + + warn( + "`lineage` has been deprecated, and in the future will raise an error." + "Please use `parents` from now on.", + DeprecationWarning, + ) + return self.iter_lineage() + + @property + def parents(self: Tree) -> Tuple[Tree, ...]: + """All parent nodes and their parent nodes, starting with the closest.""" + return tuple(self._iter_parents()) @property def ancestors(self: Tree) -> Tuple[Tree, ...]: """All parent nodes and their parent nodes, starting with the most distant.""" - if self.parent is None: - return (self,) - else: - ancestors = tuple(reversed(list(self.lineage))) - return ancestors + + from warnings import warn + + warn( + "`ancestors` has been deprecated, and in the future will raise an error." + "Please use `parents`. Example: `tuple(reversed(node.parents))`", + DeprecationWarning, + ) + return tuple((*reversed(self.parents), self)) @property def root(self: Tree) -> Tree: @@ -351,7 +377,7 @@ def level(self: Tree) -> int: depth width """ - return len(self.ancestors) - 1 + return len(self.parents) @property def depth(self: Tree) -> int: @@ -591,9 +617,9 @@ def path(self) -> str: if self.is_root: return "/" else: - root, *ancestors = self.ancestors + root, *ancestors = tuple(reversed(self.parents)) # don't include name of root because (a) root might not have a name & (b) we want path relative to root. - names = [node.name for node in ancestors] + names = [*(node.name for node in ancestors), self.name] return "/" + "/".join(names) def relative_to(self: NamedNode, other: NamedNode) -> str: @@ -608,7 +634,7 @@ def relative_to(self: NamedNode, other: NamedNode) -> str: ) this_path = NodePath(self.path) - if other.path in list(ancestor.path for ancestor in self.lineage): + if other.path in list(parent.path for parent in (self, *self.parents)): return str(this_path.relative_to(other.path)) else: common_ancestor = self.find_common_ancestor(other) @@ -623,18 +649,17 @@ def find_common_ancestor(self, other: NamedNode) -> NamedNode: Raise ValueError if they are not in the same tree. 
""" - common_ancestor = None - for node in other.iter_lineage(): - if node.path in [ancestor.path for ancestor in self.ancestors]: - common_ancestor = node - break + if self is other: + return self - if not common_ancestor: - raise NotFoundInTreeError( - "Cannot find common ancestor because nodes do not lie within the same tree" - ) + other_paths = [op.path for op in other.parents] + for parent in (self, *self.parents): + if parent.path in other_paths: + return parent - return common_ancestor + raise NotFoundInTreeError( + "Cannot find common ancestor because nodes do not lie within the same tree" + ) def _path_to_ancestor(self, ancestor: NamedNode) -> NodePath: """Return the relative path from this node to the given ancestor node""" @@ -643,12 +668,12 @@ def _path_to_ancestor(self, ancestor: NamedNode) -> NodePath: raise NotFoundInTreeError( "Cannot find relative path to ancestor because nodes do not lie within the same tree" ) - if ancestor.path not in list(a.path for a in self.ancestors): + if ancestor.path not in list(a.path for a in (self, *self.parents)): raise NotFoundInTreeError( "Cannot find relative path to ancestor because given node is not an ancestor of this node" ) - lineage_paths = list(ancestor.path for ancestor in self.lineage) - generation_gap = list(lineage_paths).index(ancestor.path) - path_upwards = "../" * generation_gap if generation_gap > 0 else "/" + parents_paths = list(parent.path for parent in (self, *self.parents)) + generation_gap = list(parents_paths).index(ancestor.path) + path_upwards = "../" * generation_gap if generation_gap > 0 else "." return NodePath(path_upwards) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index ec6228439ac..417b849575d 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -38,6 +38,7 @@ Attributes relating to the recursive tree-like structure of a ``DataTree``. DataTree.descendants DataTree.siblings DataTree.lineage + DataTree.parents DataTree.ancestors DataTree.groups diff --git a/xarray/datatree_/docs/source/whats-new.rst b/xarray/datatree_/docs/source/whats-new.rst index 675b0fb2d08..2f6e4f88fe5 100644 --- a/xarray/datatree_/docs/source/whats-new.rst +++ b/xarray/datatree_/docs/source/whats-new.rst @@ -26,6 +26,8 @@ New Features Breaking changes ~~~~~~~~~~~~~~~~ +- Renamed `DataTree.lineage` to `DataTree.parents` to match `pathlib` vocabulary + (:issue:`283`, :pull:`286`) - Minimum required version of xarray is now 2023.12.0, i.e. the latest version. This is required to prevent recent changes to xarray's internals from breaking datatree. (:issue:`293`, :pull:`294`) @@ -37,6 +39,10 @@ Breaking changes Deprecations ~~~~~~~~~~~~ +- Renamed `DataTree.lineage` to `DataTree.parents` to match `pathlib` vocabulary + (:issue:`283`, :pull:`286`). `lineage` is now deprecated and use of `parents` is encouraged. + By `Etienne Schalk `_. + Bug fixes ~~~~~~~~~ - Keep attributes on nodes containing no data in :py:func:`map_over_subtree`. 
(:issue:`278`, :pull:`279`) From 65a7d244f5815c2593d7461242aa3cd7c5dce09f Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Fri, 19 Jan 2024 17:42:02 -0500 Subject: [PATCH 258/260] Add Pathlike methods to api docs https://github.com/xarray-contrib/datatree/pull/287 * move from_dict to creation methods * add section for pathlib-like interface * add suggestions for missing pathlib-like api --- xarray/datatree_/docs/source/api.rst | 32 +++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/xarray/datatree_/docs/source/api.rst b/xarray/datatree_/docs/source/api.rst index 417b849575d..d325d24f4a4 100644 --- a/xarray/datatree_/docs/source/api.rst +++ b/xarray/datatree_/docs/source/api.rst @@ -10,10 +10,13 @@ DataTree Creating a DataTree ------------------- +Methods of creating a datatree. + .. autosummary:: :toctree: generated/ DataTree + DataTree.from_dict Tree Attributes --------------- @@ -66,7 +69,7 @@ This interface echoes that of ``xarray.Dataset``. DataTree.is_empty DataTree.is_hollow -Dictionary interface +Dictionary Interface -------------------- ``DataTree`` objects also have a dict-like interface mapping keys to either ``xarray.DataArray``s or to child ``DataTree`` nodes. @@ -102,6 +105,30 @@ For manipulating, traversing, navigating, or mapping over the tree structure. DataTree.match DataTree.filter +Pathlib-like Interface +---------------------- + +``DataTree`` objects deliberately echo some of the API of `pathlib.PurePath`. + +.. autosummary:: + :toctree: generated/ + + DataTree.name + DataTree.parent + DataTree.parents + DataTree.relative_to + +Missing: + +.. + + ``DataTree.glob`` + ``DataTree.joinpath`` + ``DataTree.with_name`` + ``DataTree.walk`` + ``DataTree.rename`` + ``DataTree.replace`` + DataTree Contents ----------------- @@ -276,13 +303,12 @@ Plotting I/O === -Create or +Open a datatree from an on-disk store or serialize the tree. .. autosummary:: :toctree: generated/ open_datatree - DataTree.from_dict DataTree.to_dict DataTree.to_netcdf DataTree.to_zarr From fca7fe49a4e44f653ed870b0fbb088a148c56a91 Mon Sep 17 00:00:00 2001 From: Matt Savoie Date: Fri, 19 Jan 2024 15:52:51 -0700 Subject: [PATCH 259/260] Moves Tree Contents so that simpsons data tree in example is defined. https://github.com/xarray-contrib/datatree/pull/301 * Moves Tree Contents so that simpsons data tree in example is defined. Also changes :py:meth => :py:class since it's a property not a method. * Show result of datatree.match operation for clarity. * fix typo wo => two --------- Co-authored-by: Tom Nicholas --- .../docs/source/hierarchical-data.rst | 41 ++++++++++--------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/xarray/datatree_/docs/source/hierarchical-data.rst b/xarray/datatree_/docs/source/hierarchical-data.rst index 3cae4e3bd13..d4f58847718 100644 --- a/xarray/datatree_/docs/source/hierarchical-data.rst +++ b/xarray/datatree_/docs/source/hierarchical-data.rst @@ -369,25 +369,6 @@ You can see this tree is similar to the ``dt`` object above, except that it is m (If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.) -.. _Tree Contents: - -Tree Contents -------------- - -Hollow Trees -~~~~~~~~~~~~ - -A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. 
-This is useful because certain useful tree manipulation operations only make sense for hollow trees. - -You can check if a tree is a hollow tree by using the :py:meth:`~DataTree.is_hollow` property. -We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which -have children (i.e. Abe and Homer). - -.. ipython:: python - - simpsons.is_hollow - .. _manipulating trees: Manipulating Trees @@ -412,6 +393,7 @@ We can use :py:meth:`DataTree.match` for this: } ) result = dt.match("*/B") + result We can also subset trees by the contents of the nodes. :py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition. @@ -443,6 +425,25 @@ The result is a new tree, containing only the nodes matching the condition. (Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) +.. _Tree Contents: + +Tree Contents +------------- + +Hollow Trees +~~~~~~~~~~~~ + +A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. +This is useful because certain useful tree manipulation operations only make sense for hollow trees. + +You can check if a tree is a hollow tree by using the :py:class:`~DataTree.is_hollow` property. +We can see that the Simpson's family is not hollow because the data variable ``"age"`` is present at some nodes which +have children (i.e. Abe and Homer). + +.. ipython:: python + + simpsons.is_hollow + .. _tree computation: Computation @@ -599,7 +600,7 @@ Notice that corresponding tree nodes do not need to have the same name or contai Arithmetic Between Multiple Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Arithmetic operations like multiplication are binary operations, so as long as we have wo isomorphic trees, +Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees, we can do arithmetic between them. .. ipython:: python From 25020f00104a25183550b2fb7946340f74942566 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Tue, 23 Jan 2024 17:40:51 -0500 Subject: [PATCH 260/260] add migration notice to readme --- xarray/datatree_/README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/xarray/datatree_/README.md b/xarray/datatree_/README.md index df4e3b6cc49..e41a13b4cb6 100644 --- a/xarray/datatree_/README.md +++ b/xarray/datatree_/README.md @@ -14,6 +14,12 @@ that was more flexible than a single `xarray.Dataset` object. The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, but `datatree.DataTree` objects have many other uses. +### DEPRECATION NOTICE + +Datatree is in the process of being merged upstream into xarray (as of [v0.0.14](https://github.com/xarray-contrib/datatree/releases/tag/v0.0.14), see xarray issue [#8572](https://github.com/pydata/xarray/issues/8572)). We are aiming to preserve the record of contributions to this repository during the migration process. However whilst we will hapily accept new PRs to this repository, this repo will be deprecated and any PRs since [v0.0.14](https://github.com/xarray-contrib/datatree/releases/tag/v0.0.14) might be later copied across to xarray without full git attribution. + +Hopefully for users the disruption will be minimal - and just mean that in some future version of xarray you only need to do `from xarray import DataTree` rather than `from datatree import DataTree`. 
Once the migration is complete this repository will be archived. + ### Installation You can install datatree via pip: ```shell