part of cam6_4_011: fix the path to fms for fv3 build, remove mct reference #1067

Merged (1 commit) on Jul 19, 2024


@jedwards4b commented Jun 27, 2024

The path to the FMS library needs to be updated for fv3 builds.
Fixes issue #1068

@cacraigucar added the labels misc tag (issue or PR candidate for upcoming misc tag), next tag (This issue is ready to be fixed in the next CAM tag), and CoupledEval3 on Jun 27, 2024

@jtruesdal (Collaborator) left a comment:

Looks good. Thanks for cleaning this up.

@brian-eaton (Collaborator)

@jedwards4b, I'm confused. We have several FV3 tests in the aux_cam regression tests, and they are all passing. I just tried the test referred to in #1068 (ERS_Ln9.C96_C96_mg17.FHS94.derecho_intel.cam-outfrq9s) on izumi/gnu (since derecho is down), and it builds and runs. I tested using the latest CAM tag, cam6_4_005. The FMS.bldlog file does contain a warning about a missing include file, but it's not an error. Does this PR just clean that up, or is it more than that?

@jedwards4b (Author)

It's more than that: the cime PR ESMCI/cime#4647 changes the path to the FMS library, removing "nuopc" at line 55 and "comp_interface" at line 111. Since the current CAM externals don't include cime#4647, your tests still pass without this change.

@brian-eaton (Collaborator)

@jedwards4b, I tried pulling in this PR along with the new cime6.1.0 tag and removing the mct submodule. My first build attempt fails with this error (in csm_share.bldlog):

/work/test-src/cam6_4_006_rm-mct/share/RandNum/src/dsfmt_f03/dSFMT.c:17:10: fatal error: dSFMT-params.h: No such file or directory

The file is in share/RandNum/include/dSFMT-params.h, but that include directory is not showing up in the compilation command. Are there other submodules besides cime that need to be updated?

@jedwards4b (Author)

You need the new share tag as well: share1.1.2

@brian-eaton (Collaborator)

@jedwards4b, do I need to update the cmeps tag as well? I'm currently using cmeps0.14.67, and the build fails with:

/work/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../cesm/driver/esm.F90:799:9:

  799 |     use m_MCTWorld   , only : mct_world_init => init
      |         1
Fatal Error: Cannot open module file ‘m_mctworld.mod’ for reading at (1): No such file or directory

@jedwards4b (Author)

Yes, cmeps1.0.0.

@brian-eaton (Collaborator)

The build now completes, but a debug test of F2000climo at f10 gets a segfault:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
...
#3  0x5f834c117e28 in med_aofluxes_update
	at /work/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:1082
#4  0x5f834c11b1bb in __med_phases_aofluxes_mod_MOD_med_phases_aofluxes_run
	at /work/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:320
...
#43  0x58dae17039aa in esmapp
	at /work/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../cesm/driver/esmApp.F90:1
#44  0x58dae17040d3 in main
	at /work/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../cesm/driver/esmApp.F90:7

Any ideas?

@jedwards4b (Author)

I do not get this error with SMS_D_Ln9.f19_f19_mg17.F2000climo.derecho_intel.cam-outfrq9s or with SMS_D_Ln9.f10_f10_mg37.F2000climo.izumi_intel.cam-outfrq9s. The izumi test directory is:

/scratch/cluster/jedwards/SMS_D_Ln9.f10_f10_mg37.F2000climo.izumi_intel.cam-outfrq9s.20240708_104012_g3o8ng

@brian-eaton (Collaborator) commented Jul 8, 2024

I ran SMS_D_Ln9.f10_f10_mg37.F2000climo.XPS-8950_gnu.cam-outfrq9s (on my desktop) and got the same error reported above. Either we are not testing the same source, or gnu debug is catching something that intel isn't. The source I tested was created as follows:

Start from cam6_4_006, merge jedwards4b:fix/fms_path, update the submodules to cime6.1.0, cmeps1.0.0, and share1.1.2, and remove the mct submodule. This source is available from my CAM fork on the rm-mct branch. Is that what you're testing?

@jedwards4b (Author)

I started over, followed your instructions, and built the following test; it still passes:
/scratch/cluster/jedwards/SMS_D_Ln9.f10_f10_mg37.F2000climo.izumi_gnu.cam-outfrq9s.20240708_131722_neo4x8
The source is in /home/jedwards/cam6_4_006/

@brian-eaton (Collaborator)

Thanks for looking at this. I ran the test on izumi and got the same result as you. But izumi has a really old gnu compiler (v9.3.0), and I'm using v11.4.0 on my desktop. So I reran on derecho, which has newer gnu compilers, v12.2.0 and v13.2.0 (I'm not sure which one the cime tests are configured to use). On derecho I get what appears to be the same failure I got on my desktop:

dec2445.hsn.de.hpc.ucar.edu 126: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
dec2445.hsn.de.hpc.ucar.edu 126: 
dec2445.hsn.de.hpc.ucar.edu 126: Backtrace for this error:
...
dec2445.hsn.de.hpc.ucar.edu 58: #1  0x511b0c in med_aofluxes_update
dec2445.hsn.de.hpc.ucar.edu 58: 	at /glade/derecho/scratch/eaton/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:1082
dec2445.hsn.de.hpc.ucar.edu 58: #2  0x514d4b in __med_phases_aofluxes_mod_MOD_med_phases_aofluxes_run
dec2445.hsn.de.hpc.ucar.edu 58: 	at /glade/derecho/scratch/eaton/test-src/cam6_4_006_rm-mct/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:320
dec2445.hsn.de.hpc.ucar.edu 122: #3  0x1494c501399f in _ZNK5ESMCI13MethodElement7executeEPvPi
dec2445.hsn.de.hpc.ucar.edu 122: 	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-esmf-8.6.0-bsogfa4e7dreitxbwm4gbppisw5q4x2t/spack-src/src/Superstructure/Component/src/ESMCI_MethodTable.C:377

Since the cmeps code has been updated, I suspect the problem was introduced there.

@jedwards4b (Author)

Fixed in ESCOMP/CMEPS#479

@brian-eaton (Collaborator)

Thanks @jedwards4b. FYI, it turns out that NAG debug also caught this problem.

@jedwards4b (Author)

To be clear, this is really a non-bug bug report: the compilers are reporting an unallocated or unassociated variable in a subroutine interface, for a variable that is never actually used.

@cacraigucar merged commit 90997f2 into ESCOMP:cam_development on Jul 19, 2024
@cacraigucar removed the next tag label (This issue is ready to be fixed in the next CAM tag) on Jul 19, 2024
@cacraigucar changed the title from "fix the path to fms for fv3 build, remove mct reference" to "part of cam6_4_011: fix the path to fms for fv3 build, remove mct reference" on Jul 19, 2024
Labels: CoupledEval3, misc tag (issue or PR candidate for upcoming misc tag)
Projects: Status: Tag

4 participants