Enable multi-coord grouping from xarray #9332

Closed
max-sixty opened this issue Aug 12, 2024 · 3 comments · Fixed by #9372

Comments

@max-sixty
Collaborator

I'm working through some non-trivial groupby cases from colleagues. In particular, grouping by multiple coordinates seems much harder in xarray than in pandas. Flox actually handles this really nicely, as per the comment from @dcherian:

As an aside, the API isn't great but this works in flox (I think)

import flox.xarray

flox.xarray.xarray_reduce(da, "labels1", "labels2", func="mean")

Originally posted by @dcherian in #9278 (comment)

Would be great if da.groupby(["labels1", "labels2"]).mean() worked too — is that just a simple translation to flox from the xarray code? (I can probably do it if so). Or is there something more complex going on?
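
For readers less familiar with the semantics being asked for, here is a minimal pure-Python sketch of grouping by two label sequences at once and taking the mean of each label combination. It is only an illustration of the expected result, not how xarray or flox implement it; `grouped_mean` and the sample data are made up for this example.

```python
from collections import defaultdict
from statistics import fmean


def grouped_mean(values, *label_arrays):
    """Return the mean of `values` for each unique combination of labels.

    Hypothetical helper for illustration; not part of xarray or flox.
    """
    buckets = defaultdict(list)
    # Each element is keyed by the tuple of its labels, e.g. ("a", "x").
    for value, *key in zip(values, *label_arrays):
        buckets[tuple(key)].append(value)
    return {key: fmean(group) for key, group in sorted(buckets.items())}


# Made-up sample data: six values with two label sequences.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels1 = ["a", "a", "b", "b", "a", "b"]
labels2 = ["x", "y", "x", "y", "x", "y"]

result = grouped_mean(values, labels1, labels2)
for key, mean in result.items():
    print(key, mean)
```

Each distinct (labels1, labels2) pair becomes one group, which is the behaviour pandas gives for a list of keys and what the flox call above is understood to compute.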

@keewis
Collaborator

keewis commented Aug 12, 2024

from #6610 (comment) the idea is to eventually support

data.groupby({"labels1": xr.UniqueGrouper(), "labels2": xr.UniqueGrouper()}).mean()

As far as I remember, multi-dimensional groupby is planned but not supported yet. What we'd need to do is remove the condition here:

xarray/xarray/core/dataset.py

Lines 10388 to 10393 in ce5130f

if len(groupers) > 1:
    raise ValueError("Grouping by multiple variables is not supported yet.")
elif not groupers:
    raise ValueError("Either `group` or `**groupers` must be provided.")
for group, grouper in groupers.items():
    rgrouper = ResolvedGrouper(grouper, group, self)
and make sure the loop still works (@dcherian will have more details).

@max-sixty
Collaborator Author

from #6610 (comment) the idea is to eventually support

data.groupby({"labels1": xr.UniqueGrouper(), "labels2": xr.UniqueGrouper()}).mean()

Great!

I would have also supported data.groupby(["labels1", "labels2"]) as syntactic sugar (which pandas supports) but I would need to read more to have a confident view...
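
To make the "syntactic sugar" idea concrete, here is a rough sketch of how a list of names could be normalized into the mapping form quoted above. Everything in it is an assumption for illustration: `normalize_groupers` is a hypothetical helper, and the `UniqueGrouper` class below is an empty stand-in rather than xarray's own grouper object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniqueGrouper:
    """Empty stand-in for the grouper named in the thread; not xarray's class."""


def normalize_groupers(group):
    """Hypothetical helper: map a name, list of names, or dict to name -> grouper."""
    if isinstance(group, str):
        return {group: UniqueGrouper()}
    if isinstance(group, dict):
        # Already in the explicit form; copy so the caller's dict is untouched.
        return dict(group)
    names = list(group)
    if not names:
        raise ValueError("At least one group name must be provided.")
    if len(set(names)) != len(names):
        raise ValueError("Group names must be unique.")
    return {name: UniqueGrouper() for name in names}


groupers = normalize_groupers(["labels1", "labels2"])
print(groupers)
```

Under this sketch, the list form and the explicit dict form would end up on the same code path, so the sugar would add no new grouping behaviour of its own.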

@dcherian
Contributor

Yes I have struggled to fully generalize it.

That said, we could just begin with simply supporting _flox_reduce. This should be a straightforward extrapolation of what's already there.

dcherian added a commit to dcherian/xarray that referenced this issue Aug 16, 2024