Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataTree.update can cause multiple root groups. #9285

Closed
5 tasks done
flamingbear opened this issue Jul 26, 2024 · 7 comments
Closed
5 tasks done

DataTree.update can cause multiple root groups. #9285

flamingbear opened this issue Jul 26, 2024 · 7 comments
Labels
bug topic-DataTree Related to the implementation of a DataTree class

Comments

@flamingbear
Copy link
Contributor

What happened?

Reviewing documentation for hierarchical-data.rst I saw the abe/herbert example didn't look right, updated the abe.assign() -> abe = abe.assign() and it still looked wrong

>>> abe = xr.DataTree(name="abe")
>>> herbert = xr.DataTree(name="Herb")

>>> abe.update({"herbert": herbert})

>>> print(abe)
<xarray.DataTree 'abe'>
Group: /
└── Group: /
>>> print(abe.groups)
('/', '/')

What did you expect to happen?

I expected not to see two root groups.

Minimal Complete Verifiable Example

abe = xr.DataTree(name="abe")
herbert = xr.DataTree(name="Herb")

abe.update({"herbert": herbert})
print(abe)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

it is failing the update or the assign (but because assign uses update)

Environment

/Users/savoie/.pyenv/versions/miniconda3-4.7.12/envs/xarray-tests/lib/python3.11/site-packages/_distutils_hack/init.py:26: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: 9e4b737
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.1.dev88+g068bab28b.d20240724
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.4.2
distributed: 2024.4.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: 0.15.1
flox: 0.9.6
numpy_groupies: 0.10.2
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: 8.1.1
mypy: 1.8.0
IPython: 8.25.0
sphinx: None

@flamingbear flamingbear added bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class labels Jul 26, 2024
@flamingbear
Copy link
Contributor Author

flamingbear commented Jul 26, 2024

This might need to be a separate issue, but assign used to update in place rather than create a full new DataTree and this might be why? This is what I'm seeing now.

bart = xr.DataTree(name="Bart")
lisa = xr.DataTree(name="Lisa")
homer = xr.DataTree(name="Homer", children={"Lisa": lisa})

lisa.parent = homer
print(homer)

try:
    homer.parent = lisa
except Exception:
    print("that excepts as expected")

homer = homer.assign({"bart": bart})
print(homer)

# This shouldn't work
homer.parent = lisa
# This looks bad
print(homer)

And the output

<xarray.DataTree 'Homer'>
Group: /
└── Group: /Lisa
that excepts as expected
<xarray.DataTree 'Homer'>
Group: /
├── Group: /Lisa
└── Group: /
<xarray.DataTree 'Homer'>
Group: /Lisa/Homer
├── Group: /Lisa/Homer/Lisa
└── Group: /

@TomNicholas
Copy link
Member

Thanks again @flamingbear ! I think I've tracked these down, and they are both the same issue. Basically there is a bug in the new implementation of ._replace_node (introduced in #9063).

What happens is that inside ._replace_node we assign to self._children instead of self.children, which therefore skips a bunch of _pre_attach and other stuff in treenode.py that ensures that all the paths are consistent. Without this then if you print the new merged_children inside .update you see this

In [3]: homer.assign(Bart=bart)
{'Lisa': <xarray.DataTree 'Lisa'>
Group: /Lisa, 'Bart': <xarray.DataTree 'Bart'>
Group: /}

Notice that Bart has a path of /, not /Bart as it should to be consistent with /Lisa. This means the .path attributes of the new_children have not yet been altered correctly to fit into the tree, and it's not safe to assign to self._children yet.

I think we should just assign to self.children instead.

@TomNicholas
Copy link
Member

Another way to say this is: Assigning directly to self._children is a bug because it's not enough to just say "these are the new children", you also have to tell those children who their new parent is, and that step is being skipped.

FYI @shoyer

@TomNicholas TomNicholas removed the needs triage Issue that has not been reviewed by xarray team member label Jul 30, 2024
@flamingbear
Copy link
Contributor Author

simply changing the _replace_node datatree.py L773 with this diff, does fix the "root names"

-        self._children = children
+        self.children = children

I'm still seeing the assign act strange (allowing cycles)

lisa = xr.DataTree(name="Lisa")
homer = xr.DataTree(name="Homer", children={"Lisa": lisa})
homer = homer.assign({})
homer.parent = lisa
print(homer)
<xarray.DataTree 'Homer'>
Group: /Lisa/Homer
└── Group: /Lisa/Homer/Lisa

@flamingbear
Copy link
Contributor Author

I'm pretty sure you told me exactly how to fix this yesterday, but it didn't sink in until last night, I'll see if I can do this or at least throw something up to discuss.

@flamingbear
Copy link
Contributor Author

flamingbear commented Aug 2, 2024

So This was my confusion from the old code to the new w/r/t assign. It's not creating cycles. I just missed that the (current) assign behavior is creating a new homer so it's fine to make lisa his parent.

Anyway this will change again with #9297, but the original multiple root groups can be fixed with that one liner and I'll put that up now.

And actually you can close this with #9297 I'll just suggest the extra assert

@flamingbear
Copy link
Contributor Author

closed in #9243

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants