
Add Example for several Scenarios #521

Open · wants to merge 15 commits into main
Conversation

veni-vidi-vici-dormivi
Collaborator

if this works could also add an integration test for this

  • Closes add later
  • Tests added
  • Fully documented, including CHANGELOG.rst

@veni-vidi-vici-dormivi
Collaborator Author

[image: residuals around the transition from the historical to the projected period]

Okay, so when I plot the residuals after removing the global trend (including volcanic forcing), treating the historical members as their own scenario, I get this "mismatch" at the transition from the historical to the projected period. It is not pretty, but for the fitting it should be fine, as long as we keep treating the historical data as its own scenario for the AR processes. For the linear regressions and the variances/covariances it does not matter, since these do not consider time dependency. But I would be happy if a second brain went over this as well, @mathause 🙂

We should point out, however, that in the emulation process one should use a continuous time series to ensure continuity of the realization.
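To illustrate the point above: per-scenario AR fitting never crosses the historical/projection boundary, so the jump at the transition does not enter the estimates. A minimal sketch with synthetic residuals; `ar1_coef` is a hypothetical helper, not MESMER's actual API:

```python
import numpy as np

def ar1_coef(residuals):
    """Estimate the lag-1 autoregression coefficient of a 1D series."""
    x = residuals - residuals.mean()
    return (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])

rng = np.random.default_rng(0)
# one synthetic residual series per "scenario"
# (the historical period is treated as its own scenario)
segments = {
    "hist": rng.standard_normal(165),
    "ssp126": rng.standard_normal(86),
    "ssp585": rng.standard_normal(86),
}
# each fit only ever sees one segment, so the mismatch at the
# hist/future boundary never enters the AR estimates
coefs = {scen: ar1_coef(res) for scen, res in segments.items()}
```

For the actual emulation, by contrast, one continuous realization per scenario (historical plus future) keeps the emulated series free of an artificial jump.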


codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.77%. Comparing base (0e15d4e) to head (ef315d7).
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #521      +/-   ##
==========================================
+ Coverage   49.76%   49.77%   +0.01%     
==========================================
  Files          50       50              
  Lines        3563     3572       +9     
==========================================
+ Hits         1773     1778       +5     
- Misses       1790     1794       +4     
Flag        Coverage Δ
unittests   49.77% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@veni-vidi-vici-dormivi
Collaborator Author

Okay, I'm actually surprisingly happy with this approach and impressed by what xarray and datatree can do. The data tree approach I went for here (one dataset per scenario, with the members along a dimension) feels nice.

Nevertheless, I want to rewrite the autoregression functions to work on data trees instead of the arg list. For the linear regression and covariance we could think about implementing functions that take care of the stacking and weighting automatically. Actually, I think this would be quite fun.
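As a rough sketch of what such a stacking-and-weighting helper could do (all names, the data layout, and the 1/n_members weighting are assumptions for illustration, not the planned implementation):

```python
import numpy as np

# hypothetical layout: one (predictor, target) sample pair per scenario;
# each sample is weighted by 1 / n_members so every scenario contributes
# equally regardless of ensemble size
samples = {
    "ssp126": (np.linspace(0, 1, 50), np.linspace(0, 2, 50)),
    "ssp585": (np.linspace(0, 3, 50), np.linspace(0, 6, 50)),
}
n_members = {"ssp126": 3, "ssp585": 5}

# stack all scenarios into flat sample arrays
x = np.concatenate([v[0] for v in samples.values()])
y = np.concatenate([v[1] for v in samples.values()])
w = np.concatenate(
    [np.full(v[0].size, 1.0 / n_members[k]) for k, v in samples.items()]
)

# weighted least squares: scale rows by sqrt(weight) and solve
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X * w[:, None] ** 0.5, y * w**0.5, rcond=None)[0]
# beta[0] is the intercept, beta[1] the slope
```

The same stacking logic would apply to the covariance estimation, where each stacked sample carries its scenario weight.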

But I want to focus on MESMER-X for the rest of the week.

@veni-vidi-vici-dormivi
Collaborator Author

@yquilcaille You can use this now. All functionality should stay the same as it is here; some of the manual data prepping I do will just be moved into functions, which needs more time to implement cleanly.

One thing: if you calibrate on all the ESMs, could you tell me whether you ever run into singular correlation matrices when fitting for the best localization radius? At the moment this aborts the fitting, and we are still debating whether it is worth implementing a version where singular matrices are allowed. Thank you!
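For reference, a minimal sketch of how such a singularity can be detected (an illustration via a Cholesky attempt, not MESMER's actual abort logic):

```python
import numpy as np

def is_singular(corr):
    """Detect a numerically singular (not positive definite)
    correlation matrix via a Cholesky factorization attempt."""
    try:
        np.linalg.cholesky(corr)
        return False
    except np.linalg.LinAlgError:
        return True

# identity: valid correlation matrix of independent samples
well_conditioned = is_singular(np.eye(3))   # False
# all-ones: perfectly correlated samples -> rank 1, singular
degenerate = is_singular(np.ones((3, 3)))   # True
```

A correlation matrix estimated from few members can become singular when the localization radius is large relative to the available samples, which is the situation the fitting currently aborts on.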

@yquilcaille
Collaborator

Thanks @veni-vidi-vici-dormivi! The "surfer" looks good, no problem to add it. I agree that the preparation of the data should be moved into functions with the future cleaning. Also, some users may benefit from easy wrappers, like one for training, one for emulation.

I will now use this surfer to prepare the training of all ESMs and emulations for FASTMIP. I promise, if any issue appears with the singular matrices, I will let you know :)

@mathause
Member

Thanks! Cool that this works, and sorry for the late reply. I would like to see some changes before merging, though:

  • Should we use the example datasets so it's self-contained? Maybe could mention how to use with cmip6-ng.
  • Did you ever double check this is consistent with Lea's results?
  • rename notebook to e.g. example_mesmer_multi_ens_multi_scen.ipynb?
  • There is quite a bit of clean-up possible
    • unused functionality and imports (but see the suggestions above)
    • in Cell 12 you manipulate the data in a function call which is not nice (in a tutorial)
    • there is more but commenting a notebook is annoying - I might go over the notebook later, but would appreciate if you gave it a first pass
  • the emulations have the same variability. Haven't we discussed this?
    • can you gather the emulation part into a function (in the notebook) - maybe needs a function for the variability and one for the trend?
    • should the historical part for two different scenarios have the same variability? I think both should be possible

@mathause
Member

TODO: check the status of datatree in xarray. It would be good if we could enable using it from xarray (DataTree and map_over_subtree are now available from the main xarray namespace), but that needs a relatively new version.

@mathause mathause mentioned this pull request Oct 1, 2024
@veni-vidi-vici-dormivi
Collaborator Author

Should we use the example datasets so it's self-contained? Maybe could mention how to use with cmip6-ng.

Yes absolutely, have done this locally already, will push it soon. I am currently working on implementing the data tree approach in the repo and moving this into the integration tests.

Did you ever double check this is consistent with Lea's results?

It is not, because I treat the historical period as a completely independent scenario, i.e. I smooth the historical and scenario parts separately, which leads to different values around the transition from the historical to the future period. This gives different values than Lea's for the smoothed global mean, and thus for the residuals and everything thereafter, i.e. all the parameters. What do you think about this? I find it more elegant because there is no duplication of the historical period. As I see it, Lea solved this by taking the median over the scenarios' historical parts beforehand:

gt_s, time_s = separate_hist_future(gt, time, cfg)
# compute median LOWESS estimate of historical part across all scenarios
gt_hist_all = gt_s.pop("hist")
gt_hist_median = np.median(gt_hist_all, axis=0)

rename notebook to e.g. example_mesmer_multi_ens_multi_scen.ipynb?

Agree. Will do.

There is quite a bit of clean-up possible

Will get back to this later

the emulations have the same variability. Haven't we discussed this?
can you gather the emulation part into a function (in the notebook) - maybe needs a function for the variability and one for the trend?
should the historical part have for two different scenarios have the same variability? I think both should be possible

Ah right. If we want different historical variability we just need a different seed for each scenario, so both are possible depending on the seed?
Yes, good idea to gather it into a function in the notebook but not in the repo; I like that. I need to think about how to write it so that we can potentially reuse the trend.
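The seed idea in a minimal sketch (stand-in noise instead of the actual AR process; the function name is hypothetical):

```python
import numpy as np

def emulate_variability(seed, n_time=165):
    """Stand-in for drawing one realization of internal variability."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_time)

# same seed for the historical segment of two scenarios
# -> identical historical realizations (shared history)
hist_ssp126 = emulate_variability(seed=42)
hist_ssp585 = emulate_variability(seed=42)

# a different seed -> an independent historical realization
hist_other = emulate_variability(seed=43)
```

So whether two scenarios share their historical variability is controlled entirely by whether their historical segments reuse the same seed.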
