Parallel simulations using MPI.jl #141

Draft · wants to merge 45 commits into base: master

Conversation

@marinlauber (Member) commented Jul 12, 2024

This pull request is a work in progress.

I open it now to see how we can add parallel capabilities to WaterLily.jl efficiently, keeping in mind that ultimately we want to be able to do multi-CPU/GPU simulations.

I have re-run most of the examples/TwoD_* files with the double ghost cells, which work in serial.

This pull request changes/adds many files, and I will briefly describe the changes made.

Changed files

  • src/WaterLily.jl: enables passing the type of the Poisson solver to the simulation (mainly to simplify my testing); preliminary MPI extension (not used, to be discussed).
  • src/Flow.jl: implement double ghost cells and remove the special QUICK/CD scheme on the boundaries, as these are no longer needed.
  • src/MultiLevelPoisson.jl: implement downsampling for double-ghost arrays and change all utility functions accordingly. Explicitly define the dot-product functions so they can be overloaded with the MPI versions later on (a rough sketch of such an overload is shown after this list). Also change the solver! function, as the PoissonMPI.jl test did not converge properly with the L∞ criterion.
  • src/Poisson.jl: add a perBC! call in Jacobi (not needed, I think) and adjust the solver.
  • src/util.jl: adjust all the inside functions and loc to account for the double ghost cells. Adjust the BC!, perBC! and exitBC! functions for the double ghost cells. This also introduces a custom array type, MPIArray, that allocates send and receive buffers to avoid allocating them at every mpi_swap call. This new array type allows type dispatch within the extension.
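To make the dot-product point concrete, here is a minimal sketch of what such an MPI overload could look like; the names _dot and mpi_dot are placeholders rather than this PR's actual functions, and the reduction assumes MPI.jl's scalar Allreduce:

using MPI

# Placeholder sketch: keep the serial dot product as its own function so that the
# MPI extension can overload it for MPIArray with a global reduction.
_dot(a, b) = sum(a .* b)                        # local, per-rank contribution

function mpi_dot(a, b, comm=MPI.COMM_WORLD)
    local_part = _dot(a, b)
    return MPI.Allreduce(local_part, +, comm)   # sum the contributions of all ranks
end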

New files

  • examples/TwoD_CircleMPI.jl: Script to simulate the flow around a 2D circle using MPI.

  • examples/TwoD_CircleMPIArray.jl: the classical flow around a 2D circle, but using the custom MPIArray type to demonstrate that the flow solver works fine in serial (i.e. when MPI is not loaded).

  • ext/WaterLilyMPIExt.jl: MPI extension that uses type dispatch to define new methods for the WaterLily functions that are now parallel.

  • test/test_mpi.jl: initial MPI test; the test suite should be changed to use this instead.

  • WaterLilyMPI.jl: contains all the function overloads needed to perform parallel WaterLily simulations. Defines an MPIGrid type that stores information about the decomposition (global for now) and the mpi_swap function that performs the message passing, together with some MPI utils (a sketch of the kind of halo exchange involved is shown after this list).

  • MPIArray.jl: a custom Array type that also allocates send and receive buffers to avoid allocating them at every mpi_swap call. This is an idea for the final implementation and has not been tested yet.

  • FlowSolverMPI.jl: tests for some critical parts of the flow solver, from sdf measures to sim_step. Use with vis_mpiwaterlily.jl to plot the results on the different ranks.

  • PoissonMPI.jl: parallel Poisson solver test on an analytical solution. Use with vis_mpiwaterlily.jl to plot the results on the different ranks.

  • diffusion_2D_mpi.jl: initial test of the MPI functions; deprecated.

  • vis_diffusion.jl: used to visualize the results of diffusion_2D_mpi.jl; deprecated.

  • test/poisson.jl: a simple Poisson test; will be removed.
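For reference, a minimal sketch of the kind of halo exchange mpi_swap performs, assuming the keyword point-to-point API of recent MPI.jl; the two-layer slab indices, the pre-allocated send/recv buffers and the halo_swap! name are illustrative assumptions, not this PR's implementation:

using MPI

# Illustrative exchange in one direction for a field with two ghost layers on each side.
# `send`/`recv` are pre-allocated buffers (as stored inside MPIArray); `left`/`right`
# are the neighbouring ranks, or MPI.PROC_NULL at a physical boundary.
function halo_swap!(a::AbstractMatrix, send, recv, left, right, comm=MPI.COMM_WORLD)
    send .= @view a[3:4, :]                     # first two interior slabs go to the left neighbour
    MPI.Sendrecv!(send, recv, comm; dest=left, source=right)
    right != MPI.PROC_NULL && (a[end-1:end, :] .= recv)   # fill the right ghost layers

    send .= @view a[end-3:end-2, :]             # last two interior slabs go to the right neighbour
    MPI.Sendrecv!(send, recv, comm; dest=right, source=left)
    left != MPI.PROC_NULL && (a[1:2, :] .= recv)          # fill the left ghost layers
    return a
end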

Things that remain to be done

  • decide how to incorporate this into the main solver; it introduces many changes (double ghosts, etc.), so we might want to consider whether this becomes another version, another branch, or a merge into the main solver.
  • try to minimize global operations in the code (for example, the AllReduce in Poisson.residuals!)
  • proper debugging (it sometimes fails with an MPI error; I haven't managed to track it down yet, but I suspect it comes from using @views() for the send/receive buffers. This could be avoided if we allocate the send and receive buffers with the arrays, using something similar to what is in MPIArray.jl).
  • Update the VTK extension to enable the writing of parallel files.
  • Think about a GPU version of this...
  • See how we can hide the MPI communication behind the core array computation to achieve perfect weak scaling (à la ImplicitGlobalGrid.jl)
  • benchmark it and write proper tests
  • convective exit with a global (across all ranks) mass-flux check (see the sketch after this list).
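For the last item, one possible shape for the check, again only a sketch with made-up names; each rank integrates the flux over its share of the exit plane, the totals come from an Allreduce, and the same uniform correction is applied on every rank:

using MPI

# Hypothetical global (across all ranks) exit-flux correction.
function global_exit_flux!(u_exit, target_flux, comm=MPI.COMM_WORLD)
    flux = MPI.Allreduce(sum(u_exit), +, comm)      # total flux through the exit plane
    n    = MPI.Allreduce(length(u_exit), +, comm)   # total number of exit faces
    u_exit .+= (target_flux - flux) / n             # identical correction on every rank
    return u_exit
end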

Some of the results from FlowSolverMPI.jl:

  • Basic rank and sdf check (figure: rank_and_SDF)
  • Zeroth kernel moment vector with and without halos (figure: mu0)
  • Full sim_step check (figure: sim_step_ml)

@marinlauber added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Jul 12, 2024
@marinlauber (Member, Author) commented Jul 12, 2024

OUTDATED COMMENT @weymouth @b-fg @TzuYaoHuang My initial idea is to incorporate this into the main solver as an extension, and to use the custom MPIArray type and type dispatch to select the parallel version of the functions that need changing (essentially all the functions in WaterLilyMPI.jl). For example, ext/WaterLilyMPIExt.jl would define

function BC!(a::MPIArray,A,saveexit=false,perdir=())
    ...
end

The flow will be constructed using Simulation(...,mem=MPIArray), which will trigger calls to the correct functions. This will work for most of the functions, except loc, which might be annoying. Using the custom type also means that we won't allocate anything during MPI exchanges; the send and receive buffers are allocated as part of these arrays, so they can be accessed in any function that needs a transfer. We allocate only two send and two receive buffers, sized to store any of the halos in 3D.
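As a usage sketch (the circle geometry and parameters are placeholders, the grid set-up done by the extension in examples/TwoD_CircleMPI.jl is elided, and MPIArray is assumed to be in scope; the only point is the mem keyword):

using WaterLily, MPI
MPI.Init()
# NOTE: the domain-decomposition set-up from examples/TwoD_CircleMPI.jl is omitted here.

L = 64
circle(x, t) = √sum(abs2, x .- 2L) - L/2    # placeholder SDF: centre (2L,2L), radius L/2
sim = Simulation((8L, 4L), (1, 0), L; body=AutoBody(circle), mem=MPIArray)

sim_step!(sim, 1.0)                         # each rank advances its own block of the domain
MPI.Finalize()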

We could also bind the MPIGrid to these array subtypes to avoid a global definition of the grid, but then I don't know how to deal with functions like loc.

@marinlauber (Member, Author) commented Jul 31, 2024

SOLVED
The last thing that is not "performance-related" is to find a way to locally override the grid_loc() function that sets the offset of the WaterLily.loc function within the MPI grid.
The solution is to have a default function in src/util.jl like this

grid_loc(args) = 0

And the function WaterLily.loc(i,I,T) becomes

@inline loc(i,I::CartesianIndex{N},T=Float32) where N = SVector{N,T}(global_loc() .+ I.I .- 2.5 .- 0.5 .* δ(i,I).I)

where global_loc() = grid_loc(Val(:WaterLily_MPIExt)) by default calls the initial grid_loc() in src/util.jl.
Then, in the extension, I simply need to use type dispatch and define another grid_loc() method

grid_loc(::Val{:WaterLily_MPIExt}) = mpi_grid().global_loc
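Stripped of the WaterLily specifics, the pattern is simply (grid_offset, global_offset and :MyExt are made-up names):

# Fallback defined in the package; accepts any argument and returns the serial default.
grid_offset(arg) = 0
# What loc-like functions call; the Val key is the hook the extension specialises on.
global_offset() = grid_offset(Val(:MyExt))

# In the package extension, a more specific method takes over without touching any call site:
grid_offset(::Val{:MyExt}) = (10, 20)       # e.g. this rank's offset in the global grid

global_offset()                             # (10, 20) once the extension is loaded, 0 otherwise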

Review thread on the vtkWriter change:

 end
-function vtkWriter(fname="WaterLily";attrib=default_attrib(),dir="vtk_data",T=Float32)
+function vtkWriter(fname="WaterLily";attrib=default_attrib(),dir="vtk_data",T=Float32,extents=[(1:1,1:1)])
 !isdir(dir) && mkdir(dir)
This can actually be an issue if all ranks try to create the dir.
