Add flexible controls of GPU configuration #4396

Merged: 26 commits into ESMCI:master on Aug 16, 2023

@sjsprecious sjsprecious commented Apr 17, 2023

This PR updates the GPU offload configuration, building on the companion PRs in CMEPS (ESCOMP/CMEPS#363) and ccs_config_cesm (ESMCI/ccs_config_cesm#97).

For example, to build a GPU case on Derecho with the new GPU options using create_newcase:
./create_newcase --case /path_to_case_dir --mach derecho --compiler nvhpc --mpilib mpich --compset F2000dev --res f19_f19_mg17 --queue main --ngpus-per-node 4 --gpu-type a100 --gpu-offload openacc
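
As a rough illustration of how the three new flags fit together, here is a minimal Python sketch. It is not the code in CIME/scripts/create_newcase.py: only the flag names `--ngpus-per-node`, `--gpu-type`, and `--gpu-offload` come from the command above, while the function names, the defaults, and the consistency rule are assumptions made for the example.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the GPU-related options of create_newcase.
    # Only the three flag names are taken from the command in this PR.
    parser = argparse.ArgumentParser(prog="create_newcase_sketch")
    parser.add_argument("--ngpus-per-node", type=int, default=0)
    parser.add_argument("--gpu-type", default=None)
    parser.add_argument("--gpu-offload", default=None)
    return parser


def parse_gpu_options(argv):
    """Parse the GPU flags from argv and return them as a plain dict."""
    args = build_parser().parse_args(argv)
    if args.ngpus_per_node < 0:
        raise ValueError("--ngpus-per-node must not be negative")
    # Assumed rule (not confirmed by the PR): a GPU build needs all three options.
    if args.ngpus_per_node > 0 and not (args.gpu_type and args.gpu_offload):
        raise ValueError("a GPU build needs --gpu-type and --gpu-offload")
    return {
        "ngpus_per_node": args.ngpus_per_node,
        "gpu_type": args.gpu_type,
        "gpu_offload": args.gpu_offload,
    }
```

With the flags from the command above, `parse_gpu_options(["--ngpus-per-node", "4", "--gpu-type", "a100", "--gpu-offload", "openacc"])` returns a dict with 4, "a100", and "openacc"; with no GPU flags it describes a CPU-only case.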

The equivalent with create_test, where the GPU options are encoded in the test name:
./create_test ERP_Ln9_G4-a100-openacc.f19_f19_mg17.F2000dev.gust_nvhpc.cam-outfrq9s --test-root /path_to_case_dir --output-root /path_to_output -q main
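
Comparing the two commands suggests that the `G4-a100-openacc` test modifier carries the same three settings as the create_newcase flags. The sketch below decodes it under that assumption. The `G<count>-<type>-<offload>` layout is inferred only from this one example, and `parse_gpu_modifier` is a hypothetical helper, not the parsing done in CIME/test_scheduler.py.

```python
import re

# Assumed layout, inferred only from the example test name in this PR:
# G<gpus per node>-<gpu type>-<offload method>
_GPU_MODIFIER = re.compile(r"^G(\d+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)$")


def parse_gpu_modifier(test_name):
    """Return the GPU settings encoded in a test name, or None if absent."""
    # The first dot-separated field holds the test type followed by
    # its "_"-separated modifiers, e.g. ERP_Ln9_G4-a100-openacc.
    head = test_name.split(".", 1)[0]
    for token in head.split("_")[1:]:
        match = _GPU_MODIFIER.match(token)
        if match:
            return {
                "ngpus_per_node": int(match.group(1)),
                "gpu_type": match.group(2),
                "gpu_offload": match.group(3),
            }
    return None


if __name__ == "__main__":
    name = "ERP_Ln9_G4-a100-openacc.f19_f19_mg17.F2000dev.gust_nvhpc.cam-outfrq9s"
    print(parse_gpu_modifier(name))
```

A test name without a matching modifier yields `None`, which a caller could treat as a CPU-only test.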

…/jedwards4b/cime/compare/28b7431..3f4b1ab

	modified:   CIME/Tools/Makefile
	modified:   CIME/XML/env_batch.py
	modified:   CIME/XML/env_mach_specific.py
	modified:   CIME/build.py
	modified:   CIME/case/case.py
	modified:   CIME/data/config/xml_schemas/config_machines.xsd
	modified:   CIME/data/config/xml_schemas/env_mach_specific.xsd
	modified:   CIME/scripts/create_newcase.py
	modified:   CIME/test_scheduler.py
	modified:   CIME/tests/test_unit_case.py
	modified:   CIME/XML/env_mach_pes.py
	modified:   CIME/config.py
	modified:   doc/source/users_guide/cime-customize.rst
@jedwards4b jedwards4b self-requested a review April 17, 2023 16:11
@sjsprecious
Collaborator Author

@jedwards4b will issue a separate PR to introduce these changes.

@rljacob
Member

rljacob commented Aug 8, 2023

I believe this works independently of the E3SM approach to define gpu-specific compilers? And I think we would handle a machine like derecho with separate machine entries for each node type.

@jedwards4b
Contributor

that's correct - we are departing from that approach.

@jedwards4b
Contributor

@sjsprecious It looks like you may need to add something to make sure this only happens when driver=nuopc

@sjsprecious
Collaborator Author

Thanks @jedwards4b. Do you mean that we should only use these new GPU options for the NUOPC driver? Why can't we use them for MCT?

@jedwards4b
Contributor

If you want to use them for MCT then you will need to make changes in
components/cpl7/driver/cime_config/config_component.xml
similar to what you did in cmeps.

@sjsprecious
Collaborator Author

Thanks @jedwards4b for your explanation.

What do you think is the best way to handle the MCT case? Currently, generating a case with MCT fails, as expected, because the new XML variables are missing. Would you suggest bypassing those errors and continuing the build anyway (probably ignoring the given GPU options and always building a CPU case)?
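
The bypass option raised in this comment (ignore the GPU options when the driver does not define the variables) could look roughly like the sketch below. It only illustrates the option under discussion and is not what was merged. The variable names in `GPU_VARS` and the function `resolve_gpu_settings` are hypothetical, and a plain set stands in for the variables declared in a driver's config_component.xml.

```python
# Hypothetical variable names, used only for illustration.
GPU_VARS = ("NGPUS_PER_NODE", "GPU_TYPE", "GPU_OFFLOAD")


def resolve_gpu_settings(defined_vars, requested):
    """Decide which GPU settings to apply for a given driver.

    defined_vars: set of XML variable names the driver's config defines.
    requested: dict mapping variable name to the value the user asked for.
    Returns (settings_to_apply, missing_variable_names).
    """
    missing = [name for name in GPU_VARS if name not in defined_vars]
    if missing:
        # The driver does not know these variables: drop the GPU request
        # and fall back to a CPU-only case instead of failing.
        return {}, missing
    settings = {
        name: requested[name]
        for name in GPU_VARS
        if requested.get(name) is not None
    }
    return settings, []
```

A caller could log the returned `missing` list as a warning so that the silent fallback to a CPU build stays visible to the user.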

@jedwards4b
Contributor

Probably the easiest approach is to just add them to the MCT config_component.xml file.

@sjsprecious
Collaborator Author

Thanks @jedwards4b . I just opened a PR to introduce these new XML variables in CPL7 (ESCOMP/CESM_CPL7andDataComps#25). Could you please review and merge it if the PR looks good to you?

@jedwards4b
Contributor

@sjsprecious The workflow test for e3sm, mct above needs to pass. Remember that these variables are not defined for e3sm.

@sjsprecious
Collaborator Author

Thanks @jedwards4b . Since I am not familiar with the E3SM build process, could you please let me know where I can add those new XML variables for the E3SM build workflow?

@jedwards4b jedwards4b merged commit 41b67a8 into ESMCI:master Aug 16, 2023
10 of 11 checks passed
@jedwards4b jedwards4b deleted the add_gpu_gust branch August 16, 2023 17:31