Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alarms based on nsteps #447

Merged
merged 20 commits into from
Aug 22, 2024
Merged

alarms based on nsteps #447

merged 20 commits into from
Aug 22, 2024

Conversation

jedwards4b
Copy link
Collaborator

@jedwards4b jedwards4b commented Mar 29, 2024

Description of changes

Fixes the case of alarms based on nsteps when components have different individual timesteps.

Specific notes

Contributors other than yourself, if any: @mvertens - Mariana was essential to this work, I was trying to work in the Fortran when she pointed out that we could avoid the problem by converting nsteps to another unit (seconds) in the python. This made it much easier.

CMEPS Issues Fixed #443

Are changes expected to change answers? bfb

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed

Please describe the tests along with the target model and machine(s)
If possible, please also added hashes that were used in the testing

The test was to run ./create_newcase --case /glade/derecho/scratch/jedwards/f.cam6_3_144.ne30.datest --res ne30pg3_ne30pg3_mg17 --compset FMTHIST --run-unsupported
then set

./xmlchange REST_OPTION=nsteps  
./xmlchange REST_N=12 
./xmlchange STOP_OPTION=nsteps
./xmlchange STOP_N=24 

Prior to this change the cpl and mosart restart files are not written at the proper time.

This change should not affect any other alarm settings, only those using nsteps.

@DeniseWorthen I have used nuopc_shr_methods.F90 in the mediator for this change, unfortunately this file is not shared with UFS. We need to discuss how to rearrange the code so that this works for UFS models.

@jedwards4b jedwards4b marked this pull request as draft March 29, 2024 20:19
@jedwards4b jedwards4b self-assigned this Mar 29, 2024
@jedwards4b
Copy link
Collaborator Author

The proposed fix for this is to move nuopc_shr_methods.F90 into cdeps/share and
cesm_share/src

@DeniseWorthen
Copy link
Collaborator

@jedwards4b If I understand, we already utilize some of the modules w/in CDEPS/share when we build CMEPS. So I think adding this nuopc_shr_method in that location would work.

@jedwards4b
Copy link
Collaborator Author

@DeniseWorthen One thing that I wasn't sure of is if you use the same *_cpl_dt variables as cesm does. If not we will need to figure out how to do this for ufs.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b I see. No, we don't use the cpl_dt variables in the same way.

For restarts, ATM controls it's own restart writing frequency and we just ensure that the coupled components (using restart_n, restart_option) are in sync w/ that. We don't currently have any use case where we use nsteps as the restart frequency.

But, unrelated, this comes along at the same time that I need to make sure we can always write restarts at the end-of-run (this is for when we use IAU). For some reason, I have in mind that the write_restart_at_endofrun option was not working.

if (ChkErr(rc,__LINE__,u_FILE_u)) return
AlarmInterval = AlarmInterval * opt_n
if (mod(AlarmInterval, TimestepInterval) /= (timestepinterval*0)) then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write a comment here as to why you are multiplying by 0.

cesm/driver/esm_time_mod.F90 Show resolved Hide resolved
@jedwards4b
Copy link
Collaborator Author

@DeniseWorthen I've added more error checking so that if someone in UFS does try to use the nsteps option it will throw an error. I also tested REST_OPTION='end' and fixed an issue, but it seems that issue would only affect cesm - the fix was in the driver code. Maybe the ufs driver is similar and needs the same change, but I don't have access to that code.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b I'm having trouble w/ the gust variables again. I've made this change

index ad2eda90..1678238d 100644
--- a/mediator/med_phases_aofluxes_mod.F90
+++ b/mediator/med_phases_aofluxes_mod.F90
@@ -1717,10 +1717,6 @@ end subroutine med_aofluxes_map_ogrid2xgrid_input
     call fldbun_getfldptr(fldbun, 'So_duu10n', aoflux_out%duu10n, xgrid=xgrid, rc=rc)
     if (chkerr(rc,__LINE__,u_FILE_u)) return

-    call fldbun_getfldptr(fldbun, 'So_ugustOut', aoflux_out%ugust_out, xgrid=xgrid, rc=rc)
-    if (chkerr(rc,__LINE__,u_FILE_u)) return
-    call fldbun_getfldptr(fldbun, 'So_u10withGust', aoflux_out%u10_withGust, xgrid=xgrid, rc=rc)
-    if (chkerr(rc,__LINE__,u_FILE_u)) return
     call fldbun_getfldptr(fldbun, 'So_u10res', aoflux_out%u10res, xgrid=xgrid, rc=rc)
     if (chkerr(rc,__LINE__,u_FILE_u)) return
     call fldbun_getfldptr(fldbun, 'Faox_taux', aoflux_out%taux, xgrid=xgrid, rc=rc)
@@ -1752,8 +1748,12 @@ end subroutine med_aofluxes_map_ogrid2xgrid_input
     if (add_gusts) then
        call fldbun_getfldptr(fldbun, 'So_ugustOut', aoflux_out%ugust_out, xgrid=xgrid, rc=rc)
        if (chkerr(rc,__LINE__,u_FILE_u)) return
+       call fldbun_getfldptr(fldbun, 'So_u10withGust', aoflux_out%u10_withGust, xgrid=xgrid, rc=rc)
+       if (chkerr(rc,__LINE__,u_FILE_u)) return
+
     else
        allocate(aoflux_out%ugust_out(lsize)); aoflux_out%ugust_out(:) = 0._R8
+       allocate(aoflux_out%u10_withGust(lsize)); aoflux_out%u10_withGust(:) = 0._R8
     end if

   end subroutine set_aoflux_out_pointers

Which I think I can fix. It seems the last gust PR created a duplicate line for getting the So_ugustOut fldptr. But I think putting them both inside the use_gusts works.

I'm not sure what to do with the So_u10res pointer. Is that always used regardless of whether gusts are included or not?

@jedwards4b
Copy link
Collaborator Author

@DeniseWorthen Your latest comment does not seem to be related to this PR. Can you remove it and put it in the right place? Did you mean to put it here? #446

@jedwards4b
Copy link
Collaborator Author

Adding code to fix REST_OPTION=end turns out to be a can of worms - for this to work stop_alarm must be set first and I'm finding that that is not the case in at least some of the nuopc component caps. ugh

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Apr 1, 2024

@jedwards4b I don't think the issue is related to #446. The current main contains

call fldbun_getfldptr(fldbun, 'So_ugustOut', aoflux_out%ugust_out, xgrid=xgrid, rc=rc)

and then a few lines down

if (add_gusts) then
call fldbun_getfldptr(fldbun, 'So_ugustOut', aoflux_out%ugust_out, xgrid=xgrid, rc=rc)
if (chkerr(rc,__LINE__,u_FILE_u)) return
else
allocate(aoflux_out%ugust_out(lsize)); aoflux_out%ugust_out(:) = 0._R8
end if

This was done by @uturuncoglu in d56c50c. We now have another variable So_u10withGust which I thought could use the same fix (place inside of add_gusts). But now there seems to be a third variable (So_u10res) and I'm not sure how to fix that one for UFS. The error I see is

20240401 155101.781 ERROR            PET00 ESMCI_Container.h:178 ESMCI::Container::get() Invalid argument  - key does not exist: So_u10res
20240401 155101.781 ERROR            PET00 ESMCI_Container_F.C:451 c_esmc_containergetfield() Invalid argument  - Internal subroutine call returned Error
20240401 155101.781 ERROR            PET00 ESMF_Container.F90:636 ESMF_ContainerGetField() Invalid argument  - Internal subroutine call returned Error
20240401 155101.781 ERROR            PET00 ESMF_FieldBundle.F90:11581 ESMF_FieldBundleGetItem() Invalid argument  - Internal subroutine call returned Error

@DeniseWorthen
Copy link
Collaborator

@jedwards4b I can't test your branch because it contains the changes related to gusts that are raising an issue on my side

https://github.com/jedwards4b/CMEPS/blob/44cb9019a00ff02191802be5bcccc7d9d4087d38/mediator/med_phases_aofluxes_mod.F90#L1720

@jedwards4b
Copy link
Collaborator Author

but that change was not introduced here - find the pr that introduced the problem or open a new issue.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b Yes, thanks. I was trying to be responsive and test your nsteps PR. I am not able to test because the previous PR (which I will find) was merged w/o testing by UFS. So this is the first I've seen this failure. I'm sorry if I've been unclear.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b I've had to put this aside for a couple of days while I work on finishing something. But I will get back to it soon.

@jedwards4b
Copy link
Collaborator Author

okay - I plan to work on it more anyway when derecho comes back. There are three distinct copies of subroutine alarmInit in cmeps - I plan to merge these into a single routine, I also figured out a better solution to the REST_OPTION='end' issue.

@jedwards4b
Copy link
Collaborator Author

@DeniseWorthen can you try this PR again and see if it works for you now.

@DeniseWorthen
Copy link
Collaborator

Yes, will do. I'm not able to get onto Derecho right now though. It's still down for me.

@jedwards4b
Copy link
Collaborator Author

yes, it's still down for everyone - i didn't realize that you were testing there too.

@DeniseWorthen DeniseWorthen mentioned this pull request Apr 11, 2024
@jedwards4b jedwards4b requested review from mvertens and removed request for uturuncoglu August 21, 2024 18:32
@jedwards4b
Copy link
Collaborator Author

@DeniseWorthen I've updated to the latest cmeps main and tested this with cesm - do you want to try it again?

@jedwards4b jedwards4b marked this pull request as ready for review August 21, 2024 18:33
@DeniseWorthen
Copy link
Collaborator

@jedwards4b Sure. I'll start the tests later today.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b I can't compile because it can't find nuopc_shr_methods.

@jedwards4b
Copy link
Collaborator Author

Sorry - can you try with this CDEPS branch:
To https://github.com/jedwards4b/CDEPS

  • [new branch] add/nuopc_shr_methods.F90 -> add/nuopc_shr_methods.F90

@DeniseWorthen
Copy link
Collaborator

Thanks. I think remember now that I did this before in testing. I can compile it now, so I'll start the tests.

@DeniseWorthen
Copy link
Collaborator

@jedwards4b This doesn't cause any issues. I do get a new field in the CDEPS restart files (strlen) but that is the only impact.

Copy link
Collaborator

@mvertens mvertens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fewer changes to have med_time_mod.F90 simply use nuopc_shr_methods directly and not get rid of the file. That way it becomes explicit as to what is happening and there are much fewer changes to this code.

cime_config/namelist_definition_drv.xml Show resolved Hide resolved
cime_config/runseq/driver_config.py Show resolved Hide resolved
mediator/med_phases_aofluxes_mod.F90 Show resolved Hide resolved
@jedwards4b jedwards4b merged commit 5b7d769 into ESCOMP:main Aug 22, 2024
2 checks passed
@jedwards4b jedwards4b deleted the fix/nsteps_alarms branch August 22, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REST_OPTION=nsteps not working correctly
3 participants