Test multi-grid support for Component level CPU-GPU switch #69

FlorianDeconinck opened this issue May 22, 2024 · 0 comments

The SI team delivered a way to have a hybrid decomposition per Component. For a CPU/GPU hybrid this means we could run the CPU-only components on a small domain, to maximize L3/L2 cache reuse on the CPU, while the GPU components would use a bigger domain and push everything into VRAM to maximize bandwidth.
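
As a rough mental model (the names and numbers below are illustrative, not the SI team's interface), a per-Component decomposition lets each gridcomp pick its own layout and device:

```python
# Conceptual sketch only: invented names and numbers, not GEOS or NDSL code.
# Per-Component decomposition means each gridcomp can choose its own layout:
# CPU-only components use many small subdomains (per-rank working set fits in
# L2/L3 cache), while GPU components use fewer, larger subdomains that stay
# resident in VRAM.
hybrid_decomposition = {
    "moist":  {"device": "cpu", "layout": (24, 144)},  # small per-rank domain
    "dycore": {"device": "gpu", "layout": (4, 24)},    # large per-rank domain
}

for name, cfg in hybrid_decomposition.items():
    nx, ny = cfg["layout"]
    print(f"{name}: {cfg['device']}, {nx * ny} MPI ranks")
```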

This task is to benchmark (and lightly validate) the decomposition for the dycore and/or moist; a sketch of that loop follows the checklist item below.


  • Benchmark CPU/GPU hybrid decomposition vs Fortran and vs single-domain NDSL
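
A minimal sketch of what "benchmark and lightly validate" could look like on the Python side; `run_component`, `state`, and the baseline arrays are placeholders, not NDSL or GEOS interfaces:

```python
# Hypothetical harness: placeholder callables/arrays, not NDSL or GEOS APIs.
import time

import numpy as np


def benchmark(run_component, n_iters=10):
    """Time one component callable; return mean seconds per call."""
    run_component()  # warm-up (JIT compilation / first-touch allocation)
    start = time.perf_counter()
    for _ in range(n_iters):
        run_component()
    return (time.perf_counter() - start) / n_iters


def light_validate(field, baseline, rtol=1e-10, atol=1e-13):
    """Loose check of the hybrid run against a Fortran or single-domain NDSL baseline."""
    if not np.allclose(field, baseline, rtol=rtol, atol=atol):
        raise AssertionError(
            f"max abs error {np.max(np.abs(field - baseline)):.3e} exceeds tolerance"
        )


# Usage (placeholder names):
#   t = benchmark(lambda: dycore.step(state))
#   light_validate(state["u"], fortran_baseline["u"])
#   print(f"hybrid dycore: {t * 1e3:.1f} ms/step")
```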

Details per Hamid's email (the branch might be merged by the time we get to this task):

Hi Florian,

After we resolved the layout reproducibility issue, the mixed hybrid code is now ready.

It works correctly (zero diff with the baseline) but needs some tuning – a known issue.

We can share screens when you have time to go over it.

To get the code, you can do:

- mepo clone git@github.com:GEOS-ESM/GEOSgcm
- cd GEOSgcm
- mepo checkout-if-exists feature/aoloso/hybrid_112923
- mepo develop fvdycore
 

You build as usual.

To configure a run, please take a look at the AGCM.rc in /discover/nobackup/aoloso/geos_hybrid5/c48_hybrid for a run that uses 3 OpenMP threads per MPI process in the dyncore gridcomp.

The run uses a total of 36 PEs. On the dyncore side you have 12 MPI ranks x 3 OpenMP threads. Everywhere else you have 36 MPI ranks.

A more interesting run is in c720_splitField in the same directory. That run is configured to use 4 threads for the dyncore. It uses 2400 PEs: 600 MPI ranks x 4 OpenMP threads for the dyncore, and 2400 MPI ranks everywhere else.
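
The PE counts above are just MPI ranks times OpenMP threads on the dyncore side, with one rank per PE everywhere else; a quick sanity check of that arithmetic:

```python
# Sanity check of the resource arithmetic quoted in the email.
def total_pes(mpi_ranks: int, omp_threads: int) -> int:
    return mpi_ranks * omp_threads

assert total_pes(12, 3) == 36      # c48_hybrid: dyncore side of the 36-PE run
assert total_pes(600, 4) == 2400   # c720_splitField: dyncore side of the 2400-PE run
```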

 

There are restrictions on how to chop the cubed sphere into subdomains. Checks are in the code to catch violations, with explanations.
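
The authoritative restrictions and checks are in the Fortran code mentioned above; purely as an illustration, assuming the usual requirement that a layout evenly divide each cubed-sphere face, such a check might look like:

```python
# Illustration only: one plausible restriction is that the per-face layout must
# evenly divide the face resolution (c48 => a 48x48 face). The real checks and
# their error messages live in the GEOS/FV3 Fortran code, not here.
def check_face_layout(face_res: int, nx: int, ny: int) -> None:
    if face_res % nx != 0 or face_res % ny != 0:
        raise ValueError(
            f"layout {nx}x{ny} does not evenly divide a c{face_res} face: "
            f"{face_res} % {nx} = {face_res % nx}, {face_res} % {ny} = {face_res % ny}"
        )


check_face_layout(48, 2, 3)   # OK: 6 subdomains per face, 36 over the whole cube
check_face_layout(48, 5, 3)   # raises ValueError: 48 is not divisible by 5
```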