Support DataTree for organizing Datasets by type of level #327

Open
jthielen opened this issue Jan 10, 2023 · 1 comment


jthielen commented Jan 10, 2023

As discussed in xarray-contrib/datatree#195, it would be wonderful (and relatively straightforward) to add support for DataTree in cfgrib. This would allow the different datasets previously returned by cfgrib.open_datasets() to be organized in a single data collection.

As for implementation, I would propose refactoring the existing open_datasets() into something like:

def open_datatree(path, backend_kwargs={}, **kwargs):
    # type: (str, T.Dict[str, T.Any], T.Any) -> datatree.DataTree
    """
    Open a GRIB file grouping incompatible hypercubes into different datasets via simple heuristics.
    """
    squeeze = backend_kwargs.get("squeeze", True)
    backend_kwargs = backend_kwargs.copy()
    backend_kwargs["squeeze"] = False
    datasets = open_variable_datasets(path, backend_kwargs=backend_kwargs, **kwargs)

    type_of_level_datasets = {}  # type: T.Dict[str, T.List[xr.Dataset]]
    for ds in datasets:
        for _, da in ds.data_vars.items():
            type_of_level = da.attrs.get("GRIB_typeOfLevel", "undef")
            type_of_level_datasets.setdefault(type_of_level, []).append(ds)

    return datatree.DataTree.from_dict(type_of_level_datasets)
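
One thing to watch in the snippet above: as far as I know, DataTree.from_dict expects each node path to map to a single dataset rather than a list of datasets, so the per-level lists would need to be merged or given unique paths first. Below is a minimal pure-Python sketch of that flattening step. It uses plain dicts as hypothetical stand-ins for single-variable datasets, and the helper names and the `level/index` path scheme are assumptions for illustration, not part of cfgrib or datatree.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical stand-in for an xr.Dataset: variable name -> attribute dict.
FakeDataset = Dict[str, Dict[str, Any]]


def group_by_type_of_level(datasets: List[FakeDataset]) -> Dict[str, List[FakeDataset]]:
    """Group datasets by the GRIB_typeOfLevel attribute of their variables."""
    groups = defaultdict(list)  # type: Dict[str, List[FakeDataset]]
    for ds in datasets:
        # Use a set so a dataset is added at most once per level type,
        # even if several of its variables share that level type.
        levels = {attrs.get("GRIB_typeOfLevel", "undef") for attrs in ds.values()}
        for level in sorted(levels):
            groups[level].append(ds)
    return dict(groups)


def to_node_mapping(groups: Dict[str, List[FakeDataset]]) -> Dict[str, FakeDataset]:
    """Flatten the groups so every node path maps to exactly one dataset."""
    nodes = {}  # type: Dict[str, FakeDataset]
    for level in sorted(groups):
        members = groups[level]
        for i, ds in enumerate(members):
            # Assumed naming scheme: a lone dataset sits at "level",
            # several datasets go to "level/0", "level/1", ...
            path = level if len(members) == 1 else f"{level}/{i}"
            nodes[path] = ds
    return nodes
```

In a real implementation the per-level lists would more likely be passed through merge_datasets() first, so that only genuinely incompatible hypercubes end up as separate child nodes.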

Then open_datasets() could be reimplemented as something like:

def open_datasets(path, backend_kwargs={}, **kwargs):
    # type: (str, T.Dict[str, T.Any], T.Any) -> T.List[xr.Dataset]
    # Read the caller's preference here, since open_datatree() forces squeeze=False.
    squeeze = backend_kwargs.get("squeeze", True)
    type_of_level_datasets = open_datatree(path, backend_kwargs=backend_kwargs, **kwargs)
    merged = []  # type: T.List[xr.Dataset]
    for type_of_level in sorted(type_of_level_datasets):
        for ds in merge_datasets(type_of_level_datasets[type_of_level], join="exact"):
            merged.append(ds.squeeze() if squeeze else ds)
    return merged

(These snippets were edited quickly in between conference sessions; no guarantee that I didn't miss something or that they work properly as-is.)

All that said, we would likely need to discuss whether this should be supported before or after DataTree is integrated into xarray proper (xref pydata/xarray#7418).

cc @TomNicholas, @blaylockbk

@blaylockbk
Contributor

#187 and #321 are additional cases where DataTree could help cfgrib: different stepRange values for precipitation (and other?) variables.
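
To illustrate how a second grouping key could sit underneath the type-of-level nodes, here is a minimal pure-Python sketch that builds nested node paths from two attributes. The choice of `GRIB_stepType` as the second key, the plain-dict stand-ins for variable attributes, and the helper name `node_path` are all assumptions for illustration; the cases in the referenced issues may call for a different key.

```python
from typing import Any, Dict, List, Sequence


def node_path(
    attrs: Dict[str, Any],
    keys: Sequence[str] = ("GRIB_typeOfLevel", "GRIB_stepType"),
) -> str:
    """Build a nested node path from the given attribute keys, in order."""
    # Missing attributes fall back to "undef", mirroring the snippet in the issue.
    return "/".join(str(attrs.get(key, "undef")) for key in keys)


# Hypothetical variables: name -> attributes (stand-ins for DataArray.attrs).
variables = {
    "t2m": {"GRIB_typeOfLevel": "surface", "GRIB_stepType": "instant"},
    "tp": {"GRIB_typeOfLevel": "surface", "GRIB_stepType": "accum"},
    "t": {"GRIB_typeOfLevel": "isobaricInhPa", "GRIB_stepType": "instant"},
}

# Collect variable names under the node path they would be assigned to.
tree_layout = {}  # type: Dict[str, List[str]]
for name, attrs in variables.items():
    tree_layout.setdefault(node_path(attrs), []).append(name)
```

With this layout, instantaneous and accumulated surface fields land in sibling nodes instead of forcing separate top-level datasets.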
