Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run MED -> GLC in CISM NOEVOLVE mode #442

Merged
merged 2 commits into from
Apr 9, 2024

Conversation

billsacks
Copy link
Member

@billsacks billsacks commented Mar 21, 2024

Description of changes

Previously we hadn't been doing the mapping from MED -> GLC when CISM was running in NOEVOLVE mode. This was a problem because this downscaled SMB is one of the main benefits of running CISM in NOEVOLVE mode.

Resolves #426. See that issue for details.

Specific notes

Contributors other than yourself, if any: Consultation with @mvertens

CMEPS Issues Fixed (include github issue #): #426

Are changes expected to change answers? Changes answers for some cpl hist fields (diagnostic only) in configurations with CISM in NOEVOLVE mode. Otherwise bit-for-bit.

Any User Interface Changes (namelist or namelist defaults changes)? No

Diffs in runseq

I have verified that the run sequence is the same as before for cases with CISM in EVOLVE mode, as well as for cases with sglc or xglc. Differences just appear in cases with CISM in NOEVOLVE mode.

T compset NOEVOLVE

These are the diffs for ERS_D_Ly3.f09_g17_gris4.T1850Gg.green_gnu.cism-noevolve:

--- ERS_D_Ly3.f09_g17_gris4.T1850Gg.green_gnu.cism-noevolve.20240320_165142_7a4g4j/CaseDocs/nuopc.runseq	2024-03-20 16:54:53
+++ ERS_D_Ly3.f09_g17_gris4.T1850Gg.green_gnu.cism-noevolve.20240320_165220_owwtn2/CaseDocs/nuopc.runseq	2024-03-20 16:53:06
@@ -3,6 +3,8 @@
   LND
   LND -> MED :remapMethod=redist
   MED med_phases_post_lnd
+  MED med_phases_prep_glc
+  MED -> GLC :remapMethod=redist
   GLC -> MED :remapMethod=redist
   MED med_phases_history_write
   MED med_phases_history_write

I compset NOEVOLVE

These are the diffs for SMS_Lm13.f10_f10_mg37.I1850Clm50SpG.green_gnu.cism-noevolve:

--- SMS_Lm13.f10_f10_mg37.I1850Clm50SpG.green_gnu.cism-noevolve.20240320_165200_idgr5p/CaseDocs/nuopc.runseq	2024-03-20 16:55:01
+++ SMS_Lm13.f10_f10_mg37.I1850Clm50SpG.green_gnu.cism-noevolve.20240320_165228_o0jlbq/CaseDocs/nuopc.runseq	2024-03-20 16:53:12
@@ -16,10 +16,12 @@
   ROF
   ROF -> MED :remapMethod=redist
   MED med_phases_post_rof
+@
+  MED med_phases_prep_glc
+  MED -> GLC :remapMethod=redist
+  GLC -> MED :remapMethod=redist
   MED med_phases_history_write
   MED med_phases_restart_write
   MED med_phases_profile
 @
-  GLC -> MED :remapMethod=redist
-@
 ::

Note that, in addition to adding MED med_phases_prep_glc and MED -> GLC :remapMethod=redist at daily frequency, this also changes MED med_phases_history_write, MED med_phases_restart_write and MED med_phases_profile to be done in the outer (86400) loop rather than the 10800 loop. This is because I have removed the setting of the active flag in the relevant call to runseq.enter_time_loop. This is an intentional change: my intuition is that it's right to only call these things daily in this case, since the med2glc fields will likely be wrong if history / restart is at a frequency other than daily. But I think this will restrict us to restarting a CISM NOEVOLVE case no more frequently than daily (as is also the case for a CISM EVOLVE case), and I'm not sure if this change might have some other negative implications that I'm not thinking of / aware of. We could change this back if people feel it would be better to leave it as it was before. (This could be changed back by restoring the active flag but now setting active=run_glc.)

Testing performed

Tested in the context of CISM-wrapper on ESCOMP/CISM-wrapper@feaf293 (on the branch in this PR: ESCOMP/CISM-wrapper#86). I ran the full aux_glc test suite on derecho (intel & gnu) with comparisons against baseline. Failures were as expected:

    FAIL ERS_D_Ly3.f09_g17_gris4.T1850Gg.derecho_intel.cism-noevolve NLCOMP
    FAIL ERS_D_Ly3.f09_g17_gris4.T1850Gg.derecho_intel.cism-noevolve BASELINE /glade/derecho/scratch/sacks/aux_glc_240301164745/baselines: DIFF
    FAIL NCK_Ly3.f09_g17_gris20.T1850Gg.derecho_gnu COMPARE_base_multiinst
    FAIL SMS_Lm13.f10_f10_mg37.I1850Clm50SpG.derecho_intel.cism-noevolve NLCOMP
    FAIL SMS_Lm13.f10_f10_mg37.I1850Clm50SpG.derecho_intel.cism-noevolve BASELINE /glade/derecho/scratch/sacks/aux_glc_240301164745/baselines: DIFF

(The NCK test fails on master, too.) I confirmed that the diffs are as expected.

Note that, for this testing, I cherry-picked the changes on this CMEPS branch onto cmeps0.14.43, which is the version currently used in CISM-wrapper master.

Previously we hadn't been doing the mapping from MED -> GLC when CISM
was running in NOEVOLVE mode. This was a problem because this downscaled
SMB is one of the main benefits of running CISM in NOEVOLVE mode.

Resolves ESCOMP#426. See that issue for details.

One other change here is that I skip GLC -> MED in NOEVOLVE mode. I
*think* that's a safe thing to do, but I'm not 100% sure of it.
When this was based on run_glc, NOEVOLVE cases were failing like this:

20240319 135112.648 ERROR            PET0000 ESMF_ArrayGet.F90:1949 ESMF_ArrayGetFPtr Object being used before creation  - Bad Object
20240319 135112.649 ERROR            PET0000 ESMF_FieldGet.F90:2634 ESMF_FieldGetDataPtr Object being used before creation  - Internal subroutine call returned Error
20240319 135112.649 ERROR            PET0000 nuopc_shr_methods.F90:304 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 glc_import_export.F90:475 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 glc_comp_nuopc.F90:481 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:2901 Object being used before creation  - Phase 'IPDv01p3' Initialize for modelComp 3: GLC did not return ESMF_SUCCESS
20240319 135112.649 ERROR            PET0000 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:1985 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:2901 Object being used before creation  - Phase 'IPDv02p3' Initialize for modelComp 1: ESM0001 did not return ESMF_SUCCESS
20240319 135112.649 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:1990 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Object being used before creation  - Passing error in return code
20240319 135112.649 ERROR            PET0000 esmApp.F90:134 Object being used before creation  - Passing error in return code
20240319 135112.649 INFO             PET0000 Finalizing ESMF

This fails in the InitializeRealize call to export_fields, and
specifically in this:

       ! Set scalars in export state
       call State_SetScalar(dble(get_nx_tot(instance_index=ns)), flds_scalar_index_nx, &
            NStateExp(ns), flds_scalar_name, flds_scalar_num, rc)

So it seems like we need to have GLC -> MED in the run sequence for
initialization to work correctly.
@billsacks billsacks requested a review from mvertens March 21, 2024 15:18
@billsacks billsacks self-assigned this Mar 21, 2024
Comment on lines -38 to -49
comp_glc = case.get_value("COMP_GLC")
run_glc = False
post_glc = False
if (comp_glc == 'cism'):
run_glc = True
if case.get_value("CISM_EVOLVE"):
post_glc = True
else:
post_glc = False
elif (comp_glc == 'xglc'):
run_glc = True
post_glc = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the other changes in this PR, this whole block became redundant.

Comment on lines +58 to +74
# Note that we do some aspects of the GLC outer loop even if run_glc is False
# (as long as med_to_glc is True).
#
# Note that, in contrast to the other outer_loop variables, this doesn't check glc_cpl_time.
# This is for consistency with the logic that was in place before adding this variable;
# this seems to implicitly assume that glc_cpl_time > atm_cpl_time.
glc_outer_loop = med_to_glc

inner_loop = ((atm_cpl_time < ocn_cpl_time) or
(atm_cpl_time < rof_cpl_time) or
(run_glc and atm_cpl_time < glc_cpl_time) or
(glc_outer_loop and atm_cpl_time < glc_cpl_time) or
atm_cpl_time == ocn_cpl_time)

with RunSeq(os.path.join(caseroot, "CaseDocs", "nuopc.runseq")) as runseq:

#------------------
runseq.enter_time_loop(glc_cpl_time, newtime=run_glc, active=med_to_glc)
runseq.enter_time_loop(glc_cpl_time, newtime=glc_outer_loop)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece (the control over the inner_loop and enter_time_loop's newtime) is the piece I'm least certain of and would be most happy to get a second set of eyes on. These now depend on med_to_glc rather than run_glc – though note that I have effectively swapped the logic for run_glc and med_to_glc in this PR: in most cases they have the same value, but in a NOEVOLVE case now med_to_glc is True and run_glc is False, whereas previously the reverse was the case. It seems right to have these depend on the one that is more often True (med_to_glc now).

Copy link
Collaborator

@jedwards4b jedwards4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@billsacks
Copy link
Member Author

Per @jedwards4b email, we'll wait to merge this until after #446 is merged, because we want #446 in alpha17d and want this PR in alpha17e (coordinated with ESCOMP/CISM-wrapper#86).

@jedwards4b jedwards4b merged commit f9169ce into ESCOMP:main Apr 9, 2024
2 checks passed
@billsacks
Copy link
Member Author

For future reference: it looks like this got merged to main as part of #445

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When CISM is running in NOEVOLVE mode, we should still do MED -> GLC
3 participants