6.0.x Feature List

Wenduo Wang edited this page Aug 13, 2024 · 29 revisions

Time Line

Target date - end CY24.

When should we plan to cut the 6.0.x branch? As late as possible, unless we are blocking 7.0 changes (ABI).

Strikethrough means the feature is complete and committed to the Open MPI main branch.

What to get done by end of Q2

  • Extended Accelerator API:
    • CUDA support for IPC
    • ZE support for IPC
  • Switch over to forked PRRTe Phase 1
    • Documentation Change
    • Remove prte binaries
    • Remove --with-prte configure option from ompi
    • Same MCAs
  • BTL Self aware of accelerators
  • Reduction op (and others) offload support (Joseph)
  • Collectives:
    • Merge XHC if they can commit to supporting it.
    • Merge acoll once it passes CI
    • smdirect won't be merged, salvage for parts.
    • propose JSON format for tuning file
    • Remove coll/sm (tuned is OK fallback, XHC/acoll coming soon)
    • Performance testing of Luke's han alltoall pr with UCX.
  • Remove:
    • GNI BTL
    • UDREG rcache
    • pvfs2 components
  • Big Count:
    • API-level function generation
    • Collective embiggening Phase 1 (everything except *v *w collectives)

What to get done by end of Q3

  • Phase 2 PRRTE
    • MCA parameters move into ompi namespace.
    • prte_info goes away; move its functionality into ompi_info, perhaps via a prte-mca option?
  • Memory Kind support:
    • Add memory-kind option
    • Return supported memory kinds
  • ROMIO Refresh
  • Collective embiggening Phase 2 (*v *w collectives)
  • Remove:
    • TKR version of the Fortran "use mpi" module (old NAG)

List of Features planned for the 6.0.x release stream

ABI:

  • If Jacob's ABI work is ready, it might help solidify the standard to have our implementation done.
    • Merge ABI work into main, enable it only when requested, and stress in documentation it is experimental.

MPI 4.0 (critical):

  • Big count support
    • API level functions (in progress 1-2 months)
    • Collective embiggening (discussed at F2F, stage in the non-v/w functions first)
    • Changes to datatype engine/combiner support (could be a challenge)
    • ROMIO refresh
  • PRRTE switch Phase 1
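
For context on the big count item above: MPI 4.0 adds `_c` variants of the API (for example `MPI_Send_c`) whose counts are `MPI_Count` rather than `int`. The sketch below is not Open MPI code; it only illustrates the limit that motivates the work. `demo_count_t`, `demo_fits_int_count`, and `demo_chunks_needed` are hypothetical names, and `demo_count_t` is assumed to be a 64-bit stand-in for `MPI_Count`.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit stand-in for MPI_Count (assumption, not the real typedef). */
typedef int64_t demo_count_t;

/* True if the element count can be passed through a classic int-count API. */
bool demo_fits_int_count(demo_count_t count)
{
    return count >= 0 && count <= INT_MAX;
}

/* Number of int-count calls needed if a transfer had to be split into
 * chunks of at most INT_MAX elements (the workaround big count avoids). */
demo_count_t demo_chunks_needed(demo_count_t count)
{
    if (count <= 0) {
        return 0;
    }
    return count / INT_MAX + (count % INT_MAX != 0 ? 1 : 0);
}
```

With a 64-bit count the caller can describe the whole transfer in one call instead of splitting it by hand.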

MPI 4.0 (tentative):

  • MPI_T events (probably won't do for 6.0.x).

Accelerator support:

  • extended accelerator API functionality (IPC) and conversion of the last components to use accelerator API (DONE for ROCM, not CUDA or ZE).
  • level zero (ze) accelerator component (DONE basic support, IPC not implemented, Howard)
  • support for MPI 4.1 memory kinds info object (assume we have PRRTE move, 1 month for basic support)
  • reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
  • SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
    • Stream-aware datatype engine.
  • BTL self issue (doesn't support accelerators currently). (Khawthar working on this)
  • Datatype engine accelerator awareness (e.g. memcpy2d) (George).

What about smart pointers? Probably could not get this into 6.0.x.

MPI 4.1:

  • implement memory allocation kind info. (see above for accelerator features)
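
As an illustration of how an application might consume the returned memory kinds: MPI 4.1 exposes them as a comma-separated string under the `mpi_memory_alloc_kinds` info key, with entries such as `system`, `mpi`, or `cuda:device`. The helper below is a minimal sketch that checks such a string for an exact entry. `demo_kind_supported` is a hypothetical name, not an MPI or Open MPI function, and the exact-match policy (no restrictor matching) is an assumption made for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `wanted` appears as one complete
 * comma-separated entry in `kinds`. Spaces around entries are ignored. */
bool demo_kind_supported(const char *kinds, const char *wanted)
{
    if (kinds == NULL || wanted == NULL) {
        return false;
    }
    size_t wanted_len = strlen(wanted);
    if (wanted_len == 0) {
        return false;
    }

    const char *p = kinds;
    while (*p != '\0') {
        while (*p == ' ') {
            p++;                        /* skip leading spaces */
        }
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        while (len > 0 && p[len - 1] == ' ') {
            len--;                      /* trim trailing spaces */
        }
        if (len == wanted_len && strncmp(p, wanted, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;                      /* last entry processed */
        }
        p = end + 1;
    }
    return false;
}
```

In a real program the `kinds` string would come from an info object query; here it is just a plain C string so the sketch stays independent of any MPI installation.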

Things to remove:

  • GNI BTL - no longer have access to systems to support this (Howard)
  • UDREG Rcache - no longer have access to systems that can use this (Howard)
  • FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar)
  • coll/sm
  • Remove TKR version of use mpi module. (Howard)
    • This was deferred from 4.0.x in April/May 2018 (and again from v5.0.x in October 2018) because it was discovered that:
      1. The RHEL 7.x default gcc (4.8.5) still uses the TKR mpi module.
      2. The NAG compiler still uses the TKR mpi module.

Collectives:

  • mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
  • mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv. (various orgs working on this)
  • mca/coll: new algorithms (various orgs working on this)

There are quite a few open PRs related to collectives. Can some of these get merged? See the notes from the 2024 F2F Meeting.

Random:

  • Sessions - add support for UCX PML (Howard, 2-3 weeks)
  • Sessions - various small fixes (Howard, 1 month)
  • Atomics - can we just rely on C11 and remove some of this code? We are currently using gcc atomics for performance reasons. Joseph would like to have a wrapper for atomic types and direct load/store access.
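
To make the atomics item concrete, here is a minimal sketch of what a C11-based wrapper with direct load/store access could look like. It is not taken from the Open MPI source tree: the `demo_atomic_*` names are hypothetical, and the choice of relaxed ordering for the plain load/store accessors is an assumption about what "direct access" would mean.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper type around a C11 atomic 32-bit integer. */
typedef struct {
    _Atomic int32_t value;
} demo_atomic_int32_t;

static inline void demo_atomic_init(demo_atomic_int32_t *a, int32_t v)
{
    assert(a != NULL);
    atomic_init(&a->value, v);
}

/* Relaxed load: no ordering guarantees beyond atomicity of the access. */
static inline int32_t demo_atomic_load_relaxed(demo_atomic_int32_t *a)
{
    assert(a != NULL);
    return atomic_load_explicit(&a->value, memory_order_relaxed);
}

/* Relaxed store: no ordering guarantees beyond atomicity of the access. */
static inline void demo_atomic_store_relaxed(demo_atomic_int32_t *a, int32_t v)
{
    assert(a != NULL);
    atomic_store_explicit(&a->value, v, memory_order_relaxed);
}

/* Sequentially consistent fetch-and-add; returns the previous value. */
static inline int32_t demo_atomic_fetch_add(demo_atomic_int32_t *a, int32_t d)
{
    assert(a != NULL);
    return atomic_fetch_add_explicit(&a->value, d, memory_order_seq_cst);
}
```

Wrapping the atomic in a struct keeps accidental non-atomic access out of the code base while still letting hot paths pick the cheapest memory order explicitly. Whether this matches the performance of the current gcc atomics would need to be measured.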