How to open/format a sample forecast datasets? #17

Open
observingClouds opened this issue Jul 26, 2024 · 4 comments
Labels
question Further information is requested

Comments

@observingClouds

Hi @abkfenris,

I found your package following the pangeo discussion about forecast formats and wanted to give this a try. The structure and provided functions seem very appealing.

I know this package is not fully developed yet, but could you give me a quick hint on how to format datasets to match the xarray_fmrc format?

Let's assume the following:

import numpy as np
import datetime
import xarray as xr
import xarray_fmrc

ds0 = xr.Dataset(
    {
        'pres': (['forecast_reference_time', 'time', 'lat', 'lon'], np.random.randint(980, 1000, (1, 5, 10, 10)))
    },
    coords={
        'lat': np.arange(10, 20),
        'lon': np.arange(-60, -50),
        'forecast_reference_time': [datetime.datetime(2020, 1, 1, 0, 0)],
        'forecast_offset': xr.DataArray([datetime.timedelta(hours=h) for h in range(5)], dims='time'),
        'time': [
            datetime.datetime(2020, 1, 1, 0, 0),
            datetime.datetime(2020, 1, 1, 1, 0),
            datetime.datetime(2020, 1, 1, 2, 0),
            datetime.datetime(2020, 1, 1, 3, 0),
            datetime.datetime(2020, 1, 1, 4, 0)
        ]
    }
)

ds1 = xr.Dataset(
    {
        'pres': (['forecast_reference_time', 'time', 'lat', 'lon'], np.random.randint(980, 1000, (1, 5, 10, 10)))
    },
    coords={
        'lat': np.arange(10, 20),
        'lon': np.arange(-60, -50),
        'forecast_reference_time': [datetime.datetime(2020, 1, 1, 12, 0)],
        'forecast_offset': xr.DataArray([datetime.timedelta(hours=h) for h in range(5)], dims='time'),
        'time': [
            datetime.datetime(2020, 1, 1, 12, 0),
            datetime.datetime(2020, 1, 1, 13, 0),
            datetime.datetime(2020, 1, 1, 14, 0),
            datetime.datetime(2020, 1, 1, 15, 0),
            datetime.datetime(2020, 1, 1, 16, 0)
        ]
    }
)

dt = xarray_fmrc.from_dict({datetime.datetime(2020, 1, 1, 0, 0):ds0, datetime.datetime(2020, 1, 1, 12, 0):ds1})

Applying any of the functions provided by xarray_fmrc, e.g. dt.fmrc.constant_offset('1h'), results in the same error:

ValueError: those coordinates do not have an index: {'forecast_offset'}

What am I doing wrong?

Thank you very much for your help!

@observingClouds observingClouds added the question Further information is requested label Jul 26, 2024

Hello @observingClouds, thank you for your interest in our work!

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

@abkfenris
Owner

You're right, I haven't gotten to do much with this recently so it is still rough, but I still want to work on it, so I'm happy to have you kick the tires.

From a quick look, I think the format is right, but in some cases xarray doesn't automatically create an index for a coordinate. I'm not exactly sure why it doesn't always do it, but how about giving it a nudge to create the index for 'forecast_offset' on both ds0 and ds1?
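A likely explanation, going by xarray's documented behavior: default indexes are only created for dimension coordinates, i.e. coordinates whose name matches their dimension. In the datasets above, forecast_offset is defined on the time dimension, so it gets no index. The sketch below uses a reduced stand-in dataset (not the ds0/ds1 from the post) to show how to check which coordinates are indexed and how set_xindex adds the missing one.

```python
import numpy as np
import xarray as xr

# Reduced stand-in for one model run: only the "time" dimension is kept.
offsets = (np.arange(5) * np.timedelta64(1, "h")).astype("timedelta64[ns]")
times = np.datetime64("2020-01-01T00:00", "ns") + offsets

ds = xr.Dataset(
    {"pres": (("time",), np.arange(5.0))},
    coords={
        "time": times,
        # Non-dimension coordinate: named differently from its dimension ("time").
        "forecast_offset": ("time", offsets),
    },
)

# Only the dimension coordinate has a default index.
print(list(ds.xindexes))

# Explicitly build an index for the non-dimension coordinate.
indexed = ds.set_xindex("forecast_offset")
print(list(indexed.xindexes))

# With the index in place, label-based selection on forecast_offset is possible.
two_hours = indexed.sel(forecast_offset=np.timedelta64(2, "h"))
print(two_hours["pres"].item())
```
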

@observingClouds
Author

Thanks @abkfenris for your quick response.

Registering forecast_offset explicitly as an index leads to a working dt.fmrc.constant_offset() and dt.fmrc.best(). dt.fmrc.constant_forecast continues to fail with ValueError: those coordinates do not have an index: {'forecast_offset'}.

The issue seems to be that to_dict is not registering any coordinates on the node level, like from_model_runs did. Installing the previous version (969afbf) and using dt = xarray_fmrc.from_model_runs([ds0,ds1]) works for all accessor functions.

Was there a reason to remove from_model_runs?

I might play around with this a bit more.

Just for completeness, here are the extra lines of code to set the index of the datasets (which are not needed when using from_model_runs):

ds0 = ds0.set_xindex(coord_names=['forecast_offset'])
ds1 = ds1.set_xindex(coord_names=['forecast_offset'])
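With more than two model runs, the set_xindex lines above could be wrapped in a small helper. The sketch below is only an illustration: ensure_offset_index, runs_by_reference_time and make_run are hypothetical names, not part of xarray_fmrc, and the data is a reduced stand-in for ds0/ds1. The helper skips datasets that already have the index, and the second function builds a dictionary keyed by reference time, in the same shape as the one passed to from_dict in the original post.

```python
import datetime

import numpy as np
import pandas as pd
import xarray as xr


def make_run(start: str) -> xr.Dataset:
    """Build a tiny stand-in model run (illustrative data only)."""
    ref = np.datetime64(start, "ns")
    offsets = (np.arange(5) * np.timedelta64(1, "h")).astype("timedelta64[ns]")
    return xr.Dataset(
        {"pres": (("forecast_reference_time", "time"), np.zeros((1, 5)))},
        coords={
            "forecast_reference_time": [ref],
            "time": ref + offsets,
            "forecast_offset": ("time", offsets),
        },
    )


def ensure_offset_index(ds: xr.Dataset, name: str = "forecast_offset") -> xr.Dataset:
    """Hypothetical helper: add an index on `name` only if it is missing."""
    if name in ds.xindexes:
        return ds
    return ds.set_xindex(name)


def runs_by_reference_time(runs):
    """Hypothetical helper: key each run by its single reference time."""
    keyed = {}
    for ds in runs:
        ref = pd.Timestamp(ds["forecast_reference_time"].values[0]).to_pydatetime()
        keyed[ref] = ensure_offset_index(ds)
    return keyed


runs = runs_by_reference_time(
    [make_run("2020-01-01T00:00"), make_run("2020-01-01T12:00")]
)
print(list(runs))
```
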

@abkfenris
Owner

I'm glad to hear that .from_model_runs worked.

I made that switch to try to generalize things more, hopefully so that the library could evolve and support various datatree structures with different ways of looking up and matching to how folks were already structuring their data. Clearly it isn't quite there yet...

I probably should have an option to do some sort of validation using .to_dict(), or have a separate validation function.
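As a rough idea of what a separate validation function might look like, here is a minimal sketch. It is not part of xarray_fmrc: find_problems and REQUIRED_COORDS are hypothetical, the coordinate names are simply the ones used in this thread, and requiring an index on all three of them is an assumption.

```python
import numpy as np
import xarray as xr

# Assumption: coordinate names taken from the datasets in this thread.
REQUIRED_COORDS = ("forecast_reference_time", "forecast_offset", "time")


def find_problems(ds: xr.Dataset) -> list:
    """Hypothetical check: list missing or unindexed required coordinates."""
    problems = []
    for name in REQUIRED_COORDS:
        if name not in ds.coords:
            problems.append(f"missing coordinate {name!r}")
        elif name not in ds.xindexes:
            problems.append(f"coordinate {name!r} has no index")
    return problems


# Reduced stand-in for one model run (illustrative data only).
ref = np.datetime64("2020-01-01T00:00", "ns")
offsets = (np.arange(5) * np.timedelta64(1, "h")).astype("timedelta64[ns]")
run = xr.Dataset(
    {"pres": (("forecast_reference_time", "time"), np.zeros((1, 5)))},
    coords={
        "forecast_reference_time": [ref],
        "time": ref + offsets,
        "forecast_offset": ("time", offsets),
    },
)

before = find_problems(run)
after = find_problems(run.set_xindex("forecast_offset"))
print(before)
print(after)
```

Returning a list of messages rather than raising on the first problem lets a caller report everything that is wrong with a dataset in one pass.
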

FYI, I'm about to be largely away from internet access for the next two weeks, so my responses might be a little bit more delayed.
