Test multi-grid support for Component level CPU-GPU switch #69

FlorianDeconinck · 2024-05-22T17:49:16Z

The SI team delivered a way to have hybrid decomposition per Component. For a CPU/GPU hybrid, this means we could run the CPU-only component on a small domain to maximize L2/L3 cache use, while the GPU component would use a bigger domain, with everything pushed into VRAM, to maximize bandwidth.

This task needs to benchmark (and lightly validate) the decomposition for dycore and/or moist.

Benchmark CPU/GPU hybrid decomposition vs Fortran and vs single-domain NDSL

Details per Hamid's email (might be merged by the time we get onto the task):

Hi Florian,

After we resolved the layout reproducibility issue, the mixed hybrid code is now ready.

It works correctly (zero diff with the baseline) but needs some tuning – a known issue.

We can share screen when you have time to go over.

To get the code, you can do:

- mepo clone git@github.com:GEOS-ESM/GEOSgcm
- cd GEOSgcm
- mepo checkout-if-exists feature/aoloso/hybrid_112923
- mepo develop fvdycore
 

You build as usual.

To configure a run, please take a look at AGCM.rc in /discover/nobackup/aoloso/geos_hybrid5/c48_hybrid for a run that uses 3 OpenMP threads per MPI process in dyncore gridcomp.

The run uses a total of 36 PEs. On the dyncore side you have 12 MPI ranks x 3 OpenMP threads. Everywhere else you have 36 MPI ranks.

A more interesting run is in c720_splitField in the same directory. That run is configured to use 4 threads for dyncore. It uses 2400 PEs – 600 MPI ranks x 4 OpenMP threads for dyncore, 2400 MPI ranks everywhere else.
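To make the PE accounting in the two runs above concrete, here is a minimal Python sketch. It assumes that "c48" and "c720" follow the usual cubed-sphere convention (N cells along each edge of each of the 6 faces), and the `HybridLayout` class and its field names are hypothetical helpers for illustration, not anything from GEOS or NDSL. It checks that dyncore ranks x threads matches the rank count used everywhere else, and estimates how many horizontal columns each rank owns under each decomposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridLayout:
    """Hypothetical description of one hybrid run (not a GEOS/NDSL type)."""

    cells_per_edge: int    # N in "cN", assumed cubed-sphere naming
    dycore_ranks: int      # MPI ranks inside the dyncore gridcomp
    threads_per_rank: int  # OpenMP threads per dyncore MPI rank
    other_ranks: int       # MPI ranks used everywhere else

    @property
    def dycore_pes(self) -> int:
        # PEs used by dyncore = MPI ranks x OpenMP threads
        return self.dycore_ranks * self.threads_per_rank

    def is_consistent(self) -> bool:
        # Both runs in the email use the same total PE count on both sides
        return self.dycore_pes == self.other_ranks

    def columns_per_rank(self, ranks: int) -> float:
        # 6 faces of N x N cells, split evenly across the given rank count
        return 6 * self.cells_per_edge ** 2 / ranks


c48 = HybridLayout(cells_per_edge=48, dycore_ranks=12,
                   threads_per_rank=3, other_ranks=36)
c720 = HybridLayout(cells_per_edge=720, dycore_ranks=600,
                    threads_per_rank=4, other_ranks=2400)

for name, run in (("c48", c48), ("c720", c720)):
    print(name,
          "PEs:", run.dycore_pes,
          "consistent:", run.is_consistent(),
          "dyncore columns/rank:", run.columns_per_rank(run.dycore_ranks),
          "other columns/rank:", run.columns_per_rank(run.other_ranks))
```

Under these assumptions, each dyncore rank owns 3x (c48) or 4x (c720) as many columns as a rank elsewhere, which is the per-component domain-size difference the benchmark is meant to exercise.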

 

There are restrictions on how to chop the cubed sphere into subdomains. Checks are in the code to catch violations, with explanations.
