Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two bugs in timeSeriesStats output in MPAS-Ocean #5218

Closed

Conversation

xylar
Copy link
Contributor

@xylar xylar commented Oct 5, 2022

This merge fixes two issues in timeSeriesStatsDaily and timeSeriesStatsMonthly in MPAS-Ocean.

The first is that the value xtimeMonthly_start was inaccurate at the start of a simulation and after model restart. This occurred because of a combination of problems.

  1. When E3SM reset the MPAS clock, MPAS alarms are recalibrated so they ring immediately (rather than at the end of the desired time-averaging interval), a behavior different from MPAS standalone runs.
  2. As a result, it turns out to be desirable to immediately check (and reset) alarms when MPAS) starts for the first time or restarts, allowing the time interval to begin afresh and time averaging to start over if appropriate.
  3. Because MPAS-O is a step behind when starting a new E3SM simulation, it is desirable to compute time-series stats on startup (but not on restart) so the first day or month is complete (rather than short one time step).
  4. xtimeMonthly_start was being computed by subtracting a time step from the current time, rather than a compute interval of timeSeriesStats after a restart.

This issue was present in E3SM output.
closes #5217

The second issue does not affect E3SM output because E3SM doesn't use restart files for timeSeriesStats. However, standalone runs that do use restart files for this analysis were experiencing a problem in which time averaging was not getting reset after a restart, leading to time averages over the whole simulation rather than the expected (daily or monthly) interval. This issue has been fixed by referencing the reset alarm to the simulationStartTime rather than the beginning of the current (possibly restart) run.

closes MPAS-Dev/MPAS-Model#121

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

This PR is based off of #5202 and will need to be rebased after that goes in.

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

Testing

I have used the 4 new tests introduced in MPAS-Dev/compass#435 to show that these fixes work as expected in standalone MPAS-Ocean and to demonstrate the bugs using the branch in #5202.

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

@jonbob, would you be willing to run a 1-year E3SM simulation with this branch that you can compare with something you already have handy? That should be plenty to verify that xtimeDaily_start and xtimeMonthly_start is fixed and that all other variables are unchanged. This can wait until #5202 goes in or a test run with #5202 could be the baseline, whichever is more efficient for you.

@xylar xylar requested a review from jonbob October 5, 2022 17:09
@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

@mark-petersen, I'd like this fix to go in before the code freeze. Any chance you could review it soon?

@jonbob
Copy link
Contributor

jonbob commented Oct 5, 2022

@xylar -- I'll be happy to test, doing whatever runs you think are necessary. I'll prioritize getting #5202 in as well.

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

@matthewhoffman, your suggested fix of using simulationStartTime worked like a charm (on the very first try!).

@mark-petersen
Copy link
Contributor

Thanks, I'm reviewing now. MPAS-Seaice has the identical timeSeriesStats. Does it need the identical fix?

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

Yes, eventually, MPAS-Seaice will need this and also the equivalent of #5202. It seems hard to get both of those in and tested before the code freeze, though.

@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

I tried an E3SM test (ERS_D.ne11_oQU240.WCYCL1850NS.anvil_intel) and it fails. Let me figure that out before you do anymore review or testing.

Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with gnu debug, gnu optimized, and intel optimized using all daily and monthly timeSeriesStats tests and MPAS-Dev/compass#435. This PR passes all. Based on @xylar's comments, it looks like this fixes the problem described.

@xylar xylar force-pushed the ocn/fix-time-series-stats-restart branch from b86244a to d063071 Compare October 5, 2022 22:29
@xylar
Copy link
Contributor Author

xylar commented Oct 5, 2022

Update: ERS_D.ne11_oQU240.WCYCL1850NS.anvil_intel now runs with the fix I pushed. I'm testing ERS.ne11_oQU240.WCYCL1850NS.anvil_intel against a baseline from #5202 and will report back as soon as that finishes.

@xylar
Copy link
Contributor Author

xylar commented Oct 6, 2022

ERS.ne11_oQU240.WCYCL1850NS.anvil_intel passes and is BFB with #5202. @jonbob, this is ready for you to review but maybe fine to wait until #5202 goes in.

@xylar xylar force-pushed the ocn/fix-time-series-stats-restart branch from d063071 to 774b398 Compare October 13, 2022 20:09
@jonbob
Copy link
Contributor

jonbob commented Oct 19, 2022

note: running this PR in E3SM did not initially achieve the expected results

@xylar
Copy link
Contributor Author

xylar commented Oct 19, 2022

Yes, I'm working on debugging this. It is proving quite challenging.

@xylar xylar force-pushed the ocn/fix-time-series-stats-restart branch 4 times, most recently from 1857ba2 to 83c98e3 Compare October 21, 2022 08:37
@xylar
Copy link
Contributor Author

xylar commented Oct 21, 2022

@mark-petersen and @jonbob, I have updated the PR description and requested a re-review.

Comment on lines +1453 to +1457
if ($CONTINUE_RUN eq 'TRUE') {
add_default($nl, 'config_AM_timeSeriesStatsMonthly_compute_on_startup', 'val'=>".false.");
} else {
add_default($nl, 'config_AM_timeSeriesStatsMonthly_compute_on_startup', 'val'=>".true.");
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and similar changes above and below for other time-series stats seems to be necessary so that the first day or month in an E3SM simulation is complete. Let me know if you have concerns.

description="Logical flag determining if an analysis member computation occurs on start-up. You likely want this off for this (time series) analysis member because it will accumulate any state prior to time stepping (double counting the last time step)."
description="Logical flag determining if an analysis member computation occurs on start-up. This should be set to .false. for better efficiency in most cases because the results of this first call will simply be thrown away. The exception is at the beginning of new E3SM simulations (CONTINUE_RUN = FALSE), where the first time step will be missing if this is not set to .true."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and related descriptions for other time-series stats have been updated for better accuracy and clarity. @jonbob, I didn't try to update the associate files in bld because I don't remember the exact process. I can look that up, though, if you would like me to take care of it.

@xylar
Copy link
Contributor Author

xylar commented Oct 21, 2022

Re-testing

I again ran the 4 new tests introduced in MPAS-Dev/compass#435 to show that these fixes work as expected in standalone MPAS-Ocean.

I also ran 3 months of the E3SM test case CMPASO-JRA1p4_TL319_EC30to60E2r2 on Chrysalis, with 2 restarts in between. I see (as expected) the following Time and Time_bnds values:

$ ncdump -v Time,Time_bnds 20221021.CMPASO-JRA1p4_TL319_EC30to60E2r2.fix-time-series-stats.chrysalis.mpaso.hist.am.timeSeriesStatsMonthly.0001-01-01.nc | tail

 Time = 15.5 ;

 Time_bnds =
  0, 31 ;

$ ncdump -v Time,Time_bnds 20221021.CMPASO-JRA1p4_TL319_EC30to60E2r2.fix-time-series-stats.chrysalis.mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc | tail

 Time = 45 ;

 Time_bnds =
  31, 59 ;
}

$ ncdump -v Time,Time_bnds 20221021.CMPASO-JRA1p4_TL319_EC30to60E2r2.fix-time-series-stats.chrysalis.mpaso.hist.am.timeSeriesStatsMonthly.0001-03-01.nc | tail

 Time = 74.5 ;

 Time_bnds =
  59, 90 ;
}

See results in:

/lcrc/group/e3sm/ac.xylar/scratch/chrys/20221021.CMPASO-JRA1p4_TL319_EC30to60E2r2.fix-time-series-stats.chrysalis/run

@xylar
Copy link
Contributor Author

xylar commented Oct 21, 2022

Getting there but testing in MPAS-Seaice is still showing problems that I want to understand before merging this.

@rljacob
Copy link
Member

rljacob commented Nov 3, 2022

telecon notes: waiting for @xylar to fix a few things.

@xylar
Copy link
Contributor Author

xylar commented Nov 3, 2022

Still debugging...

@xylar
Copy link
Contributor Author

xylar commented Nov 4, 2022

I looked into this further today. Similar fixes in MPAS-Seaice revealed a problem with this approach. I have an alternative in mind that I will try to implement next week. If this PR being open is a problem, we can close it for now and re-open later.

Before this merge, it was incorrectly assumed to be one time
step before the current time.  The correct interval depends on
the compute interval, which has now been made available in the
form of the `computeInterval`.
These variables will get read back on restart into the main
system versions of these variables, which is not what we would
want.  Instead, on restart, Time_bnds(1) needs to be
recomputed from xtime_start
Previously, timeSeriesStats was using the start time of the current
run as the beginning of stats.  Since this time starts over at
restarts, it was not triggering a reset of the stats as it should
have at the beginning of a restart interval that is commensurate
with the time-averaging interval.

With this merge, we now use the `simulationStartTime` to determine
when time-averaging began, so that alarms are correctly triggered
on restart.
With this merge, we copute timeSeriesStats on startup if we are
doing a fresh E3SM run but not if we are restarting.  This ensures
that we have a complete first day, month, etc.
@rljacob
Copy link
Member

rljacob commented Jan 5, 2023

notes: had to be redone. Still in progress.

@xylar
Copy link
Contributor Author

xylar commented Jan 5, 2023

I can close this for now as well.

@xylar xylar closed this Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants