Add CUDA/HIP implementations of reduction operators #12569

Open, wants to merge 12 commits into base: main

Commits on Sep 15, 2024

  1. Add CUDA/HIP implementations of reduction operators

    The operators are generated from macros. Function pointers to
    kernel launch functions are stored inside the ompi_op_t as a
    pointer to a struct that is filled if accelerator support is available.
    
    The ompi_op* API is extended to include versions taking streams and device
    IDs to allow enqueuing operators on streams. The old functions map
    to the stream versions with a NULL stream.
    
    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    4b8da14
  2. Build op/cuda and op/rocm as dso by default

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    13aeecf
  3. Remove DECLSPEC from internal functions

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    bc5c3a1
  4. op/cuda: Lazily initialize the CUDA information

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    c2c5aec
  5. op/cuda: Add flexible vector type

    CUDA provides only limited vector widths and only for variable-width
    integer types. We use our own vector type and some C++
    templates to get more flexible vectors. We aim to get 128-bit loads
    by adjusting the vector width based on the type size.
    
    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    606f778
  6. op/cuda: cleanup and remove short float remnants

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    37c5dad
  7. Add LDFLAGS to op/rocm linker command

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    4d4d629
  8. First attempt to check for NVCC

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    9fe6351
  9. Add check for hipcc

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    60cc5aa
  10. Mark NVCC, NVCCFLAGS, HIPCC, and HIPCCFLAGS as precious

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    46fbda1
  11. Point CI workflows to nvcc/hipcc

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 15, 2024
    730102b
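
The first commit above ("Add CUDA/HIP implementations of reduction operators") describes storing function pointers to kernel launch functions in a struct that hangs off the op, and mapping the old API onto stream-taking versions with a NULL stream. A minimal host-only sketch of that dispatch pattern follows. Every name in it (`sketch_op_t`, `sketch_device_fns_t`, `sketch_op_reduce_stream`, and so on) is hypothetical and is not taken from the PR or from the Open MPI API; the "device" function is a host stand-in so the sketch runs without a GPU, and passing -1 as the device ID for the non-stream call is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an accelerator stream handle
// (a CUDA or HIP stream in a real build).
struct sketch_stream_t { int id; };

// Hypothetical launcher signature: inout[i] = in[i] + inout[i],
// enqueued on the given device and stream.
using sketch_launch_fn = void (*)(const int *in, int *inout, std::size_t count,
                                  int device, sketch_stream_t *stream);

// Table of device implementations (one entry here for brevity).
struct sketch_device_fns_t {
    sketch_launch_fn sum_int;
};

// The op carries a pointer to the table; it stays nullptr when no
// accelerator support is available.
struct sketch_op_t {
    sketch_device_fns_t *device_fns;
};

// Plain host implementation used as the fallback.
static void sketch_host_sum_int(const int *in, int *inout, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}

// Stand-in for a real kernel launch: runs on the host and counts calls.
static int sketch_device_calls = 0;
static void sketch_fake_device_sum_int(const int *in, int *inout, std::size_t count,
                                       int device, sketch_stream_t *stream) {
    (void)device;
    (void)stream;
    ++sketch_device_calls;
    sketch_host_sum_int(in, inout, count);
}

// Stream-taking variant: uses the device table when it is filled in,
// otherwise falls back to the host implementation.
void sketch_op_reduce_stream(const sketch_op_t *op, const int *in, int *inout,
                             std::size_t count, int device, sketch_stream_t *stream) {
    assert(op != nullptr);
    if (op->device_fns != nullptr && op->device_fns->sum_int != nullptr) {
        op->device_fns->sum_int(in, inout, count, device, stream);
    } else {
        sketch_host_sum_int(in, inout, count);
    }
}

// Old-style entry point: forwards to the stream variant with a NULL stream.
// The device ID -1 ("unspecified") is an assumption of this sketch.
void sketch_op_reduce(const sketch_op_t *op, const int *in, int *inout,
                      std::size_t count) {
    sketch_op_reduce_stream(op, in, inout, count, -1, nullptr);
}
```

With this layout an op whose `device_fns` pointer is left as `nullptr` behaves exactly like a host-only op, which is one plausible reason to keep the table behind a pointer rather than embedding it.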

Commits on Sep 16, 2024

  1. More robust find for cudart

    Signed-off-by: Joseph Schuchart <[email protected]>
    devreal committed Sep 16, 2024
    c200c02
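
The "op/cuda: Add flexible vector type" commit mentions a custom vector type whose width is adjusted to the element size so that loads are 128 bits wide. The idea can be sketched in plain host C++ as below. This is not the PR's code: `sketch_vec`, `sketch_vec_width`, `sketch_vec128`, and `sketch_sum` are hypothetical names, and `std::memcpy` is used here in place of the aligned wide loads a device kernel would issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-width vector of N elements of type T, aligned to its
// full size so that a copy of the whole vector can be one wide access.
template <typename T, std::size_t N>
struct alignas(sizeof(T) * N) sketch_vec {
    T v[N];
};

// Number of elements that fit in 128 bits (16 bytes); at least one element.
template <typename T>
constexpr std::size_t sketch_vec_width() {
    return sizeof(T) >= 16 ? 1 : 16 / sizeof(T);
}

// Convenience alias: a vector of T sized for a 128-bit access.
template <typename T>
using sketch_vec128 = sketch_vec<T, sketch_vec_width<T>()>;

// Element-wise sum, inout[i] += in[i], processed in 128-bit chunks with a
// scalar loop for the remaining tail elements.
template <typename T>
void sketch_sum(const T *in, T *inout, std::size_t count) {
    constexpr std::size_t W = sketch_vec_width<T>();
    std::size_t i = 0;
    for (; i + W <= count; i += W) {
        sketch_vec128<T> a, b;
        std::memcpy(&a, in + i, sizeof(a));     // one chunk of the input
        std::memcpy(&b, inout + i, sizeof(b));  // one chunk of the accumulator
        for (std::size_t j = 0; j < W; ++j) {
            b.v[j] += a.v[j];
        }
        std::memcpy(inout + i, &b, sizeof(b));
    }
    for (; i < count; ++i) {
        inout[i] += in[i];  // tail that does not fill a whole vector
    }
}
```

Under these assumptions a `float` vector holds four elements and a `double` vector holds two, so every element type smaller than 16 bytes ends up with a 16-byte vector.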