GEOSgcm NH Replay-NoIncrement is not identical to AMIP #275

Open
mathomp4 opened this issue Apr 1, 2021 · 27 comments
Labels
bug Something isn't working

@mathomp4
Member

mathomp4 commented Apr 1, 2021

This was discovered by @sdrabenh as he was looking at the nonhydrostatic (NH) GEOS model. It turns out that if you build GEOS for non-hydrostatic dynamics, the model does not pass start-stop regression. This usually points to an internal state issue. The question is: where?

I suppose there are two possible culprits: Dynamics and Moist. Perhaps DZ and W are not being carried around correctly in Dyn? Or perhaps Moist runs differently in non-hydro?

I'm adding the usual suspects (@atrayano, @bena-nasa, @wmputman) to this issue for opinions and thoughts.

@mathomp4 mathomp4 added the bug Something isn't working label Apr 1, 2021
@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

For ease of testing, I have created a branch here on the fixture that grabs all of my PRs for this:

feature/mathomp4/nonhydro-gcm

To build it, you either use ./parallel_build.csh -nonhydrostatic or you add:

-DHYDROSTATIC=NO

to the cmake command.

Then make sure you are correctly selecting nonhydrostatic in gcm_setup. One of my PRs in this branch defaults it to whatever -DHYDROSTATIC was passed in as.

ETA: The differences between this and main are:

diff --git a/components.yaml b/components.yaml
index 8ea3fd0..5cab83b 100644
--- a/components.yaml
+++ b/components.yaml
@@ -5,7 +5,7 @@ GEOSgcm:
 env:
   local: ./@env
   remote: ../ESMA_env.git
-  tag: v3.1.3
+  branch: feature/mathomp4/update-pbuild-for-hydrostatic
   develop: main

 cmake:
@@ -55,13 +55,13 @@ GEOSgcm_GridComp:
 FVdycoreCubed_GridComp:
   local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp
   remote: ../FVdycoreCubed_GridComp.git
-  tag: v1.2.10
+  branch: feature/mathomp4/update-setup-for-hydrostatic-fv3
   develop: develop

 fvdycore:
   local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp/@fvdycore
   remote: ../GFDL_atmos_cubed_sphere.git
-  tag: geos/v1.1.4
+  tag: geos/v1.1.5
   develop: geos/develop

 GEOSchem_GridComp:
@@ -93,7 +93,7 @@ mom6:
 GEOSgcm_App:
   local: ./src/Applications/@GEOSgcm_App
   remote: ../GEOSgcm_App.git
-  tag: v1.3.15
+  branch: feature/mathomp4/update-setup-for-hydrostatic
   develop: develop

 UMD_Etc:

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set REPLAY_T, REPLAY_Q, et al., to NO). @sdrabenh, can you tell us what test you did that saw this issue? It might help point us to the problem.
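For concreteness, a zero-increment replay configuration in AGCM.rc would look something like the fragment below. This is a sketch only: the exact set of REPLAY_* switches depends on the tag in use, and the "et al." above covers switches not shown here.

```
REPLAY_MODE: Regular
REPLAY_T: NO
REPLAY_Q: NO
#  ...plus the remaining REPLAY_* switches, all set to NO
```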

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

Note: I saw start-stop failure with a "default" experiment. I was only doing NH at C24 and nothing else exciting. (That is, no GFDL microphysics, etc.)

@sdrabenh
Collaborator

sdrabenh commented Apr 1, 2021

As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set REPLAY_T, REPLAY_Q, et al., to NO). @sdrabenh, can you tell us what test you did that saw this issue? It might help point us to the problem.

Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed gcm_regress.j as well as the 0-increment test @mathomp4 mentioned, which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or an old problem.

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

It looks like we never get the DZ or W internal state pointers in DynCore_GridCompMod.F90:

    call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)

Maybe we should grab those pointers if nonhydro?

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed gcm_regress.j as well as the 0-increment test @mathomp4 mentioned, which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or an old problem.

Ah yes. AMIP and Replay-NoInc should be identical. Thanks. I knew there was a test for that!

@sdrabenh
Collaborator

sdrabenh commented Apr 1, 2021

It looks like we never get the DZ or W internal state pointers in DynCore_GridCompMod.F90:

    call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)

Maybe we should grab those pointers if nonhydro?

Good question; perhaps @wmputman can comment.

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

I forgot about FV_StateMod.F90. We do get the DZ and W internal state there. Maybe that file matters more...

@bena-nasa
Contributor

I'll investigate the restart

@sdrabenh
Collaborator

sdrabenh commented Apr 1, 2021

Another clue: in a separate test built with the feature/wmputman/DevDYAMONDv2_Merge04nhGF2020evap3 branches, the GFDL NH model does regress. Could this be an issue with the 1MOM microphysics? Unfortunately, that build failed the 0-increment replay test.

@mathomp4
Member Author

mathomp4 commented Apr 1, 2021

I just tried GFDL with the "stock" GCM and it doesn't regress. So I guess GFDL is different in that branch!

@bena-nasa
Contributor

I think DZ and W are being written OK in the restart; the pointers are filled in in FV_StateMod.F90, so I'm not sure what is going on. If you do a zero-length run you get identical restarts. I can confirm the start/stop failure.
I'm wondering if the standalone FV3 would show this.
Could we be missing a variable? This is bizarre.

@wmputman

wmputman commented Apr 2, 2021

I think the start/stop NH failure is due to the use of make_nh in the fv3 nml. You cannot have make_nh = .T. for both segments, just the first segment.

@wmputman

wmputman commented Apr 2, 2021

Alternatively, if you regrid restarts or make all the DZ values = 0.0, FV3 will run make_nh automatically without the nml flag.

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

I think the start/stop NH failure is due to the use of make_nh in the fv3 nml. You cannot have make_nh = .T. for both segments, just the first segment.

@wmputman Oooh. Okay. I'll work on testing this as I think I can easily make a sed rule for the second segment.

But it does look like my (newly-regridded) restarts have all-0 W, which this code bit from FV_StateMod seems to care about:

    if (fv_first_run) then
       ! Make_NH
       if ( .not. FV_Atm(1)%flagstruct%hydrostatic ) then
          if (all(FV_Atm(1)%w(isc:iec,jsc:jec,:) == 0.0)) FV_Atm(1)%flagstruct%Make_NH = .true.
          if ( FV_Atm(1)%flagstruct%Make_NH ) then
             if (mpp_pe()==0) print*, 'fv_first_run: FV3 is making Non-Hydrostatic W and DZ'
             call p_var(FV_Atm(1)%npz, isc, iec, jsc, jec, FV_Atm(1)%ptop, ptop_min, &
                        FV_Atm(1)%delp, FV_Atm(1)%delz, FV_Atm(1)%pt, FV_Atm(1)%ps, FV_Atm(1)%pe, FV_Atm(1)%peln, &
                        FV_Atm(1)%pk, FV_Atm(1)%pkz, kappa, FV_Atm(1)%q, FV_Atm(1)%ng, &
                        FV_Atm(1)%ncnst, FV_Atm(1)%gridstruct%area_64, FV_Atm(1)%flagstruct%dry_mass, &
                        FV_Atm(1)%flagstruct%adjust_dry_mass, FV_Atm(1)%flagstruct%mountain, &
                        FV_Atm(1)%flagstruct%moist_phys, FV_Atm(1)%flagstruct%hydrostatic, &
                        FV_Atm(1)%flagstruct%nwat, FV_Atm(1)%domain, FV_Atm(1)%flagstruct%make_nh)
             FV_Atm(1)%flagstruct%Make_NH = .false.
          endif
       endif
       ! Mark FV setup complete
       fv_first_run = .false.
    endif

I guess the question is: should we have some sort of "test" in gcm_run.j that looks to see if W in fvcore_internal_rst is all zero and sets make_nh to false if so?

Or, perhaps should we do this:

  1. Compiled with HYDROSTATIC=YES, user selects NH in gcm_setup: make_nh: .T.
  2. Compiled with HYDROSTATIC=NO, user selects NH in gcm_setup: make_nh: .F.

A user could always set make_nh to whatever they want, but since regrid.pl seems to make all-0 W restarts, I don't think it's unreasonable to assume that if you build for nonhydrostatic and run nonhydrostatic, you'll have recently regridded restarts.
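To sketch what such a gcm_run.j-style test would check: the trigger in the FV_StateMod code above is simply "W is identically zero". A minimal Python illustration follows; the function name, array shapes, and values are all invented for illustration, and the real fvcore_internal_rst would have to be read with tooling that matches its on-disk format.

```python
import numpy as np

def make_nh_needed(w):
    """Mirror the FV_StateMod check: a restart whose vertical velocity W is
    identically zero (e.g. freshly regridded) needs FV3 to remake the
    non-hydrostatic W and DZ fields on the first run."""
    return bool(np.all(np.asarray(w) == 0.0))

w_regridded = np.zeros((24, 24, 72))        # regrid.pl-style all-zero W
w_warm_start = np.full((24, 24, 72), 1e-2)  # W carried over from a prior NH segment

print(make_nh_needed(w_regridded))   # True  -> FV3 will set Make_NH itself
print(make_nh_needed(w_warm_start))  # False -> make_nh must stay .F.
```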

@wmputman

wmputman commented Apr 2, 2021

@mathomp4: yes, having that variable in the nml as .F. is fine for me. And then a user can change it if they want to (at the risk of causing troubles for long runs when they forget to remove it).

@bena-nasa
Contributor

bena-nasa commented Apr 2, 2021

I take it back; this does not fix it. If I set make_nh = .F. for the 2nd segment it still fails regression.
Now the very weird thing: on a whim I tried this. I uncommented this line in the AGCM.rc:
#USE_AEROSOL_NN: 0
It does pass regression, except for the gocart and irrad restarts, as long as make_nh is false for the 2nd segment.

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

Okay. I'll work on getting the logic right in gcm_setup, et al. Give me a bit...

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

Okay, I pushed updates to gcm_setup for the make_nh bits.

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

Also: weirdly, regression did work for me with my make_nh fixes in gcm_setup. I'm going to make a new clone with the branches I indicated above and be sure I don't have some non-committed fix.

@wmputman

wmputman commented Apr 2, 2021

USE_AEROSOL_NN needs to change in concert with two GFDL-MP nml values: prog_ccn and use_ccn

USE_AEROSOL_NN: 0
prog_ccn: .F.
use_ccn: .T.

USE_AEROSOL_NN: 1
prog_ccn: .T.
use_ccn: .F.

This is only relevant for GFDL-MP; for 1MOM, only USE_AEROSOL_NN matters.
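Encoded as a tiny lookup, the constraint is the pairing above: the two GFDL-MP nml values must flip together with USE_AEROSOL_NN. The helper name below is illustrative only, not part of any GEOS tooling.

```python
def ccn_settings(use_aerosol_nn):
    """Return the GFDL-MP namelist values that must accompany a given
    USE_AEROSOL_NN choice, per the pairing above."""
    if use_aerosol_nn == 1:
        return {"prog_ccn": ".T.", "use_ccn": ".F."}
    if use_aerosol_nn == 0:
        return {"prog_ccn": ".F.", "use_ccn": ".T."}
    raise ValueError("USE_AEROSOL_NN must be 0 or 1")

print(ccn_settings(0))  # {'prog_ccn': '.F.', 'use_ccn': '.T.'}
```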

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

Hmm. I was running 1MOM, and I'll try that first.

Also: I guess we should put a comment in the AGCM.rc telling folks to change those in concert?

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

@wmputman Hmm, a question. Currently in Moist, USE_AEROSOL_NN defaults to 1, which is why you have to uncomment the line in AGCM.rc to get it to 0.

But the default values in fvcore_layout.rc in GEOSgcm_App are:

prog_ccn = .false.
use_ccn = .true.

which matches your USE_AEROSOL_NN: 0 settings.

Should we change the defaults in fvcore_layout.rc to match the default in Moist? Or should GFDL be running with USE_AEROSOL_NN set to 0 by default if someone chooses it?

@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

Some testing. After I pushed the make_nh fix for gcm_setup:

Run             Regress
AMIP NH         PASS
Replay NH       PASS
ReplayNoINC NH  PASS

So that is good!

Now for the bad. As @sdrabenh reported, with NH dynamics, AMIP is not identical to Replay-NoIncrement! They are identical for hydrostatic dynamics, so I see no reason they shouldn't be for NH.

@mathomp4 mathomp4 changed the title GEOSgcm NH model does not regress: Start-Stop GEOSgcm NH Replay-NoIncrement is not identical to AMIP Apr 2, 2021
@mathomp4
Member Author

mathomp4 commented Apr 2, 2021

I wonder, do we need a REPLAY_W: now for NH? Or maybe all those DUDT bits in AGCM/mkiau need to have a DWDT analogue? (This comment brought to you by "Matt doesn't really know how Replay works" aka "Matt asks @lltakacs or @atrayano when he has a replay question." 😄 )

@wmputman

wmputman commented Apr 2, 2021

I'll have to look at add_incs for NH...

@wmputman

wmputman commented Apr 2, 2021

There are never any W increments applied to FV3 (for now...), and DZ/PKZ just get adjusted based on T increments. If the increments are 0.0, there should be no change.

   if (.not. HYDROSTATIC ) then
      ! remove old T from DZ
      STATE%VARS%DZ = STATE%VARS%DZ / STATE%VARS%PT

      ! Update T
      STATE%VARS%PT =  STATE%VARS%PT                         *DPOLD
      STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW

      ! update DZ with new T
      STATE%VARS%DZ = STATE%VARS%DZ * STATE%VARS%PT
   else
      ! Update T
      STATE%VARS%PT =  STATE%VARS%PT                         *DPOLD
      STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW
   endif
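As a sanity check on the claim that zero increments leave the state untouched, here is the non-hydrostatic branch above transliterated to Python with TEND = 0 and DPOLD = DPNEW. All numbers are invented for illustration; MAPL_CP and CVM stand in for the real model constants.

```python
import numpy as np

DT, MAPL_CP, CVM = 450.0, 1004.64, 717.6    # illustrative values only
PT = np.array([250.0, 260.0, 270.0])        # (scaled) temperature
DZ = np.array([-400.0, -500.0, -600.0])     # layer thickness
DPOLD = np.array([1000.0, 1100.0, 1200.0])
DPNEW = DPOLD.copy()                        # zero increment: mass unchanged
TEND = np.zeros_like(PT)                    # zero increment: no T tendency

PT0, DZ0 = PT.copy(), DZ.copy()

DZ = DZ / PT                                     # remove old T from DZ
PT = PT * DPOLD
PT = (PT + DT * TEND * (MAPL_CP / CVM)) / DPNEW  # update T
DZ = DZ * PT                                     # update DZ with new T

print(np.allclose(PT, PT0), np.allclose(DZ, DZ0))  # True True
```

(allclose rather than exact equality, because the divide-then-multiply round trip on DZ is only exact up to floating-point roundoff in general.)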

And then PKZ is just recalculated if getPKZ:

    if ( .not. hydrostatic ) then
    !$omp parallel do default(shared)
       do k=1,npz
          do j=jsc,jec
             do i=isc,iec
                ! perfect gas law: p = density * rdgas * virtual_temperature
                ! pkz(i,j,k) = ( rdg*delp(i,j,k)*pt(i,j,k)/delz(i,j,k) )**kappa
                pkz(i,j,k) = exp( kappa*log(rdg*delp(i,j,k)*temp(i,j,k)* &
                                  (1.d0+zvir*qv(i,j,k))/delz(i,j,k)) )
             enddo
          enddo
       enddo
    else
    !$omp parallel do default(shared)
       do k=1,npz
          do j=jsc,jec
             do i=isc,iec
                pkz(i,j,k) = (pk(i,j,k+1)-pk(i,j,k)) / &
                             (kappa*(peln(i,j,k+1)-peln(i,j,k)))
             enddo
          enddo
       enddo
    endif
