Add CUDA/HIP implementations of reduction operators #12569
base: main
Commits on Sep 15, 2024
-
Add CUDA/HIP implementations of reduction operators
The operators are generated from macros. Function pointers to kernel launch functions are stored inside the ompi_op_t as a pointer to a struct that is filled if accelerator support is available. The ompi_op* API is extended to include versions taking streams and device IDs to allow enqueuing operators on streams. The old functions map to the stream versions with a NULL stream.
Signed-off-by: Joseph Schuchart <[email protected]>
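The dispatch scheme described in this commit message can be sketched roughly as follows. This is a host-only illustration, not the code from the PR: every name (`reduce_op`, `device_op_table`, `op_reduce_stream`, `op_reduce`, `stream_t`) is hypothetical, the "launch" function is an ordinary loop standing in for a kernel launch, and using a negative device ID to mean "host buffer" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; these are not Open MPI's definitions.
using stream_t = void*;  // opaque stream handle, nullptr = no/default stream

// A launch function reduces `count` elements of `in` into `inout`.
using launch_fn_t = void (*)(const int* in, int* inout, std::size_t count,
                             int device, stream_t stream);

// Table of device launch functions (only one entry in this sketch).
struct device_op_table {
    launch_fn_t sum_int;
};

// The operator holds a host function plus an optional device table.
struct reduce_op {
    void (*host_fn)(const int* in, int* inout, std::size_t count);
    const device_op_table* device;  // nullptr when no accelerator support
};

static void host_sum_int(const int* in, int* inout, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) inout[i] += in[i];
}

// Stand-in for a kernel launch: real code would enqueue work on `stream`.
static void fake_launch_sum_int(const int* in, int* inout, std::size_t count,
                                int /*device*/, stream_t /*stream*/) {
    host_sum_int(in, inout, count);
}

static const device_op_table fake_device_table = {fake_launch_sum_int};

// Stream-aware entry point: use the device table if it is filled in and
// the caller names a device (assumption: device < 0 means host memory).
void op_reduce_stream(const reduce_op& op, const int* in, int* inout,
                      std::size_t count, int device, stream_t stream) {
    if (op.device != nullptr && op.device->sum_int != nullptr && device >= 0) {
        op.device->sum_int(in, inout, count, device, stream);
    } else {
        op.host_fn(in, inout, count);
    }
}

// Legacy entry point: forwards to the stream version with a null stream.
void op_reduce(const reduce_op& op, const int* in, int* inout,
               std::size_t count) {
    op_reduce_stream(op, in, inout, count, /*device=*/-1, /*stream=*/nullptr);
}
```

Keeping the device table behind a single pointer means an operator built without accelerator support carries only one extra null pointer, and existing callers of the non-stream function need no changes.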
Commit 4b8da14
-
Build op/cuda and op/rocm as dso by default
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 13aeecf
-
Remove DECLSPEC from internal functions
Signed-off-by: Joseph Schuchart <[email protected]>
Commit bc5c3a1
-
op/cuda: Lazily initialize the CUDA information
Signed-off-by: Joseph Schuchart <[email protected]>
Commit c2c5aec
-
op/cuda: Add flexible vector type
CUDA provides only limited vector widths and only for variable width integer types. We use our own vector type and some C++ templates to get more flexible vectors. We aim to get 128-bit loads by adjusting the width based on the type size.
Signed-off-by: Joseph Schuchart <[email protected]>
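One way to picture the idea of sizing a vector so that each load covers 128 bits is the host-side sketch below. It is not the type added by this commit: `flex_vec` and `vec_sum` are hypothetical names, the code is plain C++ rather than device code, and `std::memcpy` stands in for the aligned vector loads and stores a kernel would issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical vector type: the element count is derived from the element
// size so that one vector always occupies `Bytes` bytes (16 bytes = 128 bits).
template <typename T, std::size_t Bytes = 16>
struct alignas(Bytes) flex_vec {
    static_assert(Bytes % sizeof(T) == 0,
                  "element size must divide the vector size");
    static constexpr std::size_t width = Bytes / sizeof(T);
    T v[width];
};

static_assert(sizeof(flex_vec<float>) == 16, "one vector is 128 bits");
static_assert(flex_vec<float>::width == 4, "4 x 32-bit");
static_assert(flex_vec<double>::width == 2, "2 x 64-bit");
static_assert(flex_vec<std::int8_t>::width == 16, "16 x 8-bit");

// Element-wise sum that processes one full vector per iteration and
// finishes the remaining elements one at a time.
template <typename T>
void vec_sum(const T* in, T* inout, std::size_t count) {
    using vec_t = flex_vec<T>;
    constexpr std::size_t w = vec_t::width;
    std::size_t i = 0;
    for (; i + w <= count; i += w) {
        vec_t a, b;
        std::memcpy(&a, in + i, sizeof(a));     // stands in for a 128-bit load
        std::memcpy(&b, inout + i, sizeof(b));
        for (std::size_t k = 0; k < w; ++k) b.v[k] += a.v[k];
        std::memcpy(inout + i, &b, sizeof(b));  // stands in for a 128-bit store
    }
    for (; i < count; ++i) inout[i] += in[i];   // scalar tail
}
```

Deriving the width from `sizeof(T)` keeps the per-access size constant across element types, which is the property the commit message is after; how the actual kernels handle alignment and tails is not shown in the source.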
Commit 606f778
-
op/cuda: cleanup and remove short float remnants
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 37c5dad
-
Add LDFLAGS to op/rocm linker command
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 4d4d629
-
First attempt to check for NVCC
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 9fe6351
-
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 60cc5aa
-
Mark NVCC, NVCCFLAGS, HIPCC, and HIPCCFLAGS as precious
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 46fbda1
-
Point CI workflows to nvcc/hipcc
Signed-off-by: Joseph Schuchart <[email protected]>
Commit 730102b
Commits on Sep 16, 2024
-
Signed-off-by: Joseph Schuchart <[email protected]>
Commit c200c02