Add support for custom seasons spanning calendar years #423
base: main
Example result of dropping incomplete "DJF" seasons:

```python
# Before dropping
# -----------------
# 2000-01, 2000-02, and 2001-12 are in incomplete "DJF" seasons, so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds

# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds
```
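The dropping rule illustrated above can be sketched in plain Python. This is a minimal sketch, not xcdat's implementation: it assumes monthly data represented as `(year, month)` pairs, the predefined seasons with `dec_mode="DJF"` (December is grouped with the following January and February), and that a season counts as complete only when every one of its months is present. The helpers `season_key()` and `drop_incomplete_seasons()` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Predefined seasons with dec_mode="DJF" (assumed layout, for illustration only).
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}
MONTH_TO_SEASON = {m: name for name, months in SEASONS.items() for m in months}


def season_key(year, month):
    """Hypothetical helper: return (season_year, season_name) for one time step.

    December is moved forward one year, so Dec 2000, Jan 2001 and Feb 2001
    all share the key (2001, "DJF").
    """
    season_year = year + 1 if month == 12 else year
    return season_year, MONTH_TO_SEASON[month]


def drop_incomplete_seasons(times):
    """Hypothetical helper: keep only time steps whose season has all its months."""
    found = defaultdict(set)
    for year, month in times:
        found[season_key(year, month)].add(month)

    complete = {key for key, months in found.items() if months == set(SEASONS[key[1]])}
    return [t for t in times if season_key(*t) in complete]


# The 15 monthly time steps from the example above, as (year, month) pairs.
times = [(2000, m) for m in range(1, 13)] + [(2001, 1), (2001, 2), (2001, 12)]
kept = drop_incomplete_seasons(times)
print(kept[0], kept[-1], len(kept))  # (2000, 3) (2001, 2) 12
```

Grouping by a `(season_year, season_name)` key is what makes the December shift explicit: the lone December of 2001 ends up in its own 2002 "DJF" group and is dropped because January and February 2002 are missing.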
Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests. Would you like to check out this branch to test it out on real data? Also, a code review would be appreciated.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main     #423    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           15        15
  Lines         1546      1579    +33
=========================================
+ Hits          1546      1579    +33
```
@tomvothecoder Sure, I will test it out and review. Thank you for the update!
@tomvothecoder Can this be considered for v0.7.0?
My PR self-review
```python
warnings.warn(
    "The `season_config` argument 'drop_incomplete_djf' is being "
    "deprecated. Please use 'drop_incomplete_seasons' instead.",
    DeprecationWarning,
)
```
TODO: Need to specify the version in which `drop_incomplete_djf` will be fully deprecated. Probably v0.8.0 or v0.9.0.
```python
if len(input_months) != len(predefined_months):
    raise ValueError(
        "Exactly 12 months were not passed in the list of custom seasons."
    )
```
Removed the requirement for all 12 months to be included in the custom seasons.
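Without the 12-month check, custom seasons still need some validation. A minimal sketch, assuming three-letter month abbreviations and assuming (not confirmed by the PR) that a month may belong to at most one custom season. `validate_custom_seasons()` is a hypothetical helper; it returns the months left out of every season rather than rejecting them.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def validate_custom_seasons(custom_seasons):
    """Hypothetical helper: validate custom seasons without requiring all 12 months.

    Returns the months that no custom season uses.
    """
    seen = set()
    for season in custom_seasons:
        for month in season:
            if month not in MONTHS:
                raise ValueError(f"{month!r} is not a valid month abbreviation.")
            if month in seen:
                # Assumption: a month may belong to at most one custom season.
                raise ValueError(f"{month!r} appears in more than one custom season.")
            seen.add(month)
    return [m for m in MONTHS if m not in seen]


print(validate_custom_seasons([["Nov", "Dec", "Jan", "Feb", "Mar"]]))
# ['Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct']
```

Returning the unused months keeps the decision about what to do with them (for example, excluding those time steps before grouping) outside the validator.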
This PR still needs a thorough review before I'm confident merging it. I'll probably tag Steve at some point. The release after v0.7.0 is a more realistic target. We can always initiate a new release for this feature whenever it is merged.
@tomvothecoder No problem. Thank you for the consideration.
@tomvothecoder it looks like there is an error when a custom season goes beyond the calendar year (e.g., Nov, Dec, Jan):

```python
import os

import xcdat as xc

input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc",
)
ds = xc.open_mfdataset(input_data)

# Example of a custom season spanning calendar years:
custom_seasons = [
    ["Dec", "Jan"],
]
season_config = {"custom_seasons": custom_seasons, "dec_mode": "DJF", "drop_incomplete_djf": True}
ds.temporal.group_average("psl", "season", season_config=season_config)
```
@lee1043 Thanks for trying this out and providing an example script! I'll debug the stack trace.
Hello, all these features are very nice and much-needed improvements! Will this be available soon? I've made some tests: it still doesn't work in xcdat 0.7.1. Olivier
Hi @oliviermarti, thanks for your interest in this feature. I've had minimal time to work on it and plan to ramp up development soon. I don't have a set timeline on when it will be done, but I'm hoping within the next few months.
The issue is that this dataset only has a single year of data and does not have a complete
After thorough testing, I actually think this PR is close to complete.
Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!
@tomvothecoder: this version seems perfect to me :-) (I've got the branch `feature/416-custom-season-span`, correct?) Thanks!! I'm attaching my very simple test case, which runs well:

```python
import os
import sys

import cftime
import numpy as np
import xarray as xr
import xcdat as xc

# Version check
print(f"{sys.version = }")
print(f"{np.__version__ = }")
print(f"{xr.__version__ = }")
print(f"{xc.__version__ = }")
print(f"{xc.__file__ = }")

# Create the time axis (36 monthly steps in a 360_day calendar)
nt = 36
time = cftime.num2date(np.arange(nt, dtype=float) * 30.0 + 15, units="day since 2000-01-01",
                       calendar="360_day", only_use_cftime_datetimes=True,
                       only_use_python_datetimes=False, has_year_zero=None)
time = xr.DataArray(time, dims=("time",), coords=(time,))

# Create two test variables: a running month count and the month of the year
nmonth = np.arange(nt, dtype=float) + 1
nmod = np.mod(np.arange(nt, dtype=float), 12) + 1
nmonth = xr.DataArray(nmonth, dims=("time",), coords=(time,))
nmod = xr.DataArray(nmod, dims=("time",), coords=(time,))

# Create the dataset
dd = xr.Dataset({"nmonth": nmonth, "nmod": nmod})

# Write the dataset and open it with xcdat
if os.path.exists("test.nc"):
    os.remove("test.nc")
dd.to_netcdf("test.nc", mode="w")
dx = xc.open_dataset("test.nc", use_cftime=True, decode_times=True).bounds.add_missing_bounds()

print("Check variables")
print(dx.nmonth.values)
print(dx.nmod.values)

print("Year means are correct")
print(dx.temporal.group_average("nmonth", freq="year", weighted=True)["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="year", weighted=True)["nmod"].values)

print("Standard seasonal means are correct")
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "dec_mode": "DJF"})["nmod"].values)
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "dec_mode": "DJF"})["nmod"].values)

print("3 months custom seasons are correct")
custom_seasons = [["Dec", "Jan", "Feb"]]
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmod"].values)
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmod"].values)

print("4 months custom seasons are correct")
custom_seasons = [["Dec", "Jan", "Feb", "Mar"]]
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": False, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmod"].values)
print(dx.temporal.group_average("nmonth", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmonth"].values)
print(dx.temporal.group_average("nmod", freq="season", weighted=True,
      season_config={"drop_incomplete_djf": True, "custom_seasons": custom_seasons, "dec_mode": "DJF"})["nmod"].values)
```

Results:
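For checking the `nmonth` output of a test like this by hand, reference values can be computed without xcdat. A minimal sketch under stated assumptions: in the 360_day calendar every month has 30 days, so weighted and unweighted means coincide; December is grouped with the following January and February, as with `dec_mode="DJF"`; and an incomplete "DJF" season, when kept, is averaged over whichever months are present. `annual_means()` and `djf_means()` are hypothetical helpers, and this sketch does not show how xcdat labels or orders its output.

```python
from statistics import mean

NT = 36
# Step i (0-based) falls in year 2000 + i // 12, month i % 12 + 1; nmonth = i + 1.
records = [(2000 + i // 12, i % 12 + 1, float(i + 1)) for i in range(NT)]


def annual_means(records):
    """Hypothetical helper: mean of nmonth for each calendar year."""
    years = sorted({year for year, _, _ in records})
    return [mean(v for y, _, v in records if y == year) for year in years]


def djf_means(records, drop_incomplete):
    """Hypothetical helper: DJF means with December shifted to the next year."""
    groups = {}
    for year, month, value in records:
        if month in (12, 1, 2):
            season_year = year + 1 if month == 12 else year
            groups.setdefault(season_year, []).append(value)
    return [mean(vals) for _, vals in sorted(groups.items())
            if not drop_incomplete or len(vals) == 3]


print(annual_means(records))                     # [6.5, 18.5, 30.5]
print(djf_means(records, drop_incomplete=False))  # [1.5, 13.0, 25.0, 36.0]
print(djf_means(records, drop_incomplete=True))   # [13.0, 25.0]
```

Because every month has equal weight here, any mismatch against xcdat's output points to grouping (which months land in which season) rather than to weighting.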
Yes, that is the correct branch. I'm happy to see your test cases pass! If you're interested, it'd be helpful to test on some real data too.
Tom, on real data, version 0.7.1 returns a result, but a wrong one. It's hard to check results easily on real data, hence this academic test case. I've made a few tests on real data, both from ESGF and raw output from my IPSL model. It seems OK. I have problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables. IPSL outputs that are processed and available on ESGF are OK.
Thank you. @lee1043 is also going to test on CMIP data.
I pushed
Thank you, @tomvothecoder! I will take a look.
Here's an upstream version: pydata/xarray#9524. I could use some help testing it out.
- Remove logic for requiring all 12 months to be used
- Add conditional that determines whether subsetting time coordinates is necessary with custom seasons
- Update docstrings for `season_config`
- Add tests
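The second commit item (deciding whether time coordinates need subsetting) can be illustrated with a small sketch. It assumes monthly `(year, month)` pairs and three-letter month names, and assumes that subsetting means dropping time steps whose month belongs to no custom season; `months_to_keep()` and `subset_if_needed()` are hypothetical helpers, not the functions added in the PR.

```python
MONTH_NUMS = {name: i + 1 for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def months_to_keep(custom_seasons):
    """Hypothetical helper: month numbers used by any custom season."""
    return {MONTH_NUMS[m] for season in custom_seasons for m in season}


def subset_if_needed(times, custom_seasons):
    """Hypothetical helper: drop time steps whose month is in no custom season.

    The conditional: when the custom seasons already cover all 12 months,
    no subsetting is needed and the input is returned unchanged.
    """
    keep = months_to_keep(custom_seasons)
    if len(keep) == 12:
        return times
    return [(year, month) for year, month in times if month in keep]


times = [(2000, m) for m in range(1, 13)]
print(subset_if_needed(times, [["Dec", "Jan", "Feb"]]))
# [(2000, 1), (2000, 2), (2000, 12)]
```

Skipping the subset when all months are covered avoids an unnecessary copy in the common case where custom seasons partition the whole year.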
```diff
@@ -285,24 +292,29 @@ def group_average(
             predefined seasons are passed, configs for custom seasons are
             ignored and vice versa.

             Configs for predefined seasons:
             * "drop_incomplete_seasons" (bool, by default False)
```
@tomvothecoder I am a little bit concerned that this could be a breaking change: if someone is using `drop_incomplete_djf` in their code, it will stop working when they update their xcdat version. I wonder whether this change could be incremental. For example, if `drop_incomplete_djf` is used, raise a will-be-deprecated warning that encourages using `drop_incomplete_seasons` instead. Internally, `drop_incomplete_djf` can be remapped to `drop_incomplete_seasons` until it is fully deprecated. Would this be reasonable to consider?
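The incremental approach suggested here can be sketched as a small normalization step applied to `season_config` before it is used. A minimal sketch, assuming a plain `dict` config; `normalize_season_config()` is a hypothetical helper, and the rule that an explicitly passed `drop_incomplete_seasons` wins over the old key is a design assumption, not something decided in this thread. The warning text reuses the message from the self-review snippet.

```python
import warnings


def normalize_season_config(season_config):
    """Hypothetical helper: remap the deprecated key and warn.

    ``drop_incomplete_djf`` is translated to ``drop_incomplete_seasons``.
    Assumption: if both keys are given, the new key takes precedence.
    """
    config = dict(season_config)  # do not mutate the caller's dict
    if "drop_incomplete_djf" in config:
        warnings.warn(
            "The `season_config` argument 'drop_incomplete_djf' is being "
            "deprecated. Please use 'drop_incomplete_seasons' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        old_value = config.pop("drop_incomplete_djf")
        config.setdefault("drop_incomplete_seasons", old_value)
    return config


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = normalize_season_config({"drop_incomplete_djf": True, "dec_mode": "DJF"})

print(cfg)  # {'dec_mode': 'DJF', 'drop_incomplete_seasons': True}
```

Doing the remap in one place means the rest of the code only ever sees the new key, so removing the old key later is a one-line change.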
Description

TODO:
- `_shift_spanning_months()`
  - For example, with `custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]`:
    - `["Nov", "Dec"]` are from the previous year since they are listed before `"Jan"`
    - `["Jan", "Feb", "Mar"]` are from the current year
  - Update `_shift_decembers()` to shift other months too. This method shifts the previous year's December to the current year so xarray can properly group "DJF" seasons spanning calendar years.
- `_drop_incomplete_seasons()`
  - `_drop_incomplete_djf()`
  - Replace `drop_incomplete_djf` with `drop_incomplete_seasons`
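The year-shifting rule for spanning months can be sketched in plain Python. This assumes monthly `(year, month)` pairs and, following the rule stated above, treats months listed before `"Jan"` in a season as belonging to the previous year; seasons without `"Jan"` are assumed not to span calendar years. `spanning_months()` and `shift_spanning_months()` are hypothetical helpers, not the PR's `_shift_spanning_months()`.

```python
MONTH_NUMS = {name: i + 1 for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def spanning_months(season):
    """Hypothetical helper: month numbers listed before "Jan" in the season."""
    if "Jan" not in season:
        return set()  # assumption: without "Jan" the season does not span years
    return {MONTH_NUMS[m] for m in season[: season.index("Jan")]}


def shift_spanning_months(times, season):
    """Hypothetical helper: move spanning months forward one year.

    After shifting, every month of one season occurrence shares the same
    year, so grouping by (year, season) keeps them together.
    """
    shifted = spanning_months(season)
    return [(year + 1 if month in shifted else year, month) for year, month in times]


custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]
times = [(2000, 11), (2000, 12), (2001, 1), (2001, 2), (2001, 3)]
print(shift_spanning_months(times, custom_season))
# [(2001, 11), (2001, 12), (2001, 1), (2001, 2), (2001, 3)]
```

This generalizes the December-only shift: for the predefined "DJF" season, the only month listed before `"Jan"` is December.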
Checklist
If applicable: