
5.0.x FeatureList


[Unclear schedule at this time]

Features deferred to v5.0.x from 4.0.x

  • Remove some or all of the deleted MPI-1 and MPI-2 functionality (e.g., MPI_ATTR_DELETE, MPI_UB/MPI_LB, etc.).

    • Buckle up: this is a complicated topic. 🙁
      • tl;dr: In v5.0.x, remove the C++ bindings, but leave everything else as not-prototyped-in-mpi.h-etc-by-default. Re-evaluate deleting MPI_ATTR_DELETE (etc.) in the v6.0.x timeframe.
      • 2020-02: AGREED
    • Prior to Oct 2018:
      • All these functions/etc. were marked as "deprecated" (possibly in 2.0.x series? definitely by the 3.0.x series).
      • In the v4.0.x series, the C++ bindings are not built by default, and mpi.h/mpif.h/mpi+mpi_f08 modules do not have prototypes/declarations for all the MPI-1 deleted functions and globals (although all the symbols are still present in libmpi for ABI compatibility reasons, at the request of our packagers).
      • v4.0.x does allow using --enable-mpi1-compatibility to restore the declarations in mpi.h (and friends).
    • At the October 2018 face-to-face meeting:
      • We talked specifically about this issue, especially in the context of v5.0.x.
      • Even before v4.0.0 was released, we started getting complaints about legacy MPI applications failing to build by default with v4.0.0 prereleases (because the apps used the deleted MPI-1/MPI-2 functionality, and the user didn't build Open MPI with --enable-mpi1-compatibility).
      • Due to this, it feels like we need to spend time educating the MPI community about moving away from the deleted MPI-1/MPI-2 functionality (see the migration sketch after this list). This may take a while.
        • Remember that distros and auto-packagers (e.g., Spack, EasyBuild, etc.) will almost certainly build with --enable-mpi1-compatibility, so some/many users may not even feel the pain of the functionality not being enabled by default yet.
      • At the SC19 BOF, Paul reminded us that ScaLAPACK hadn't yet been updated to remove its MPI-1 API usage (GB: ScaLAPACK 2.1, released at SC'19, addressed this issue).
      • As such, we probably shouldn't actually ditch the deleted MPI-1/MPI-2 functionality in v5.0.x.
      • Instead, let's focus on educating the MPI community for now, and re-evaluate whether we can actually ditch the deleted MPI-1/MPI-2 functionality in v6.0.x.
      • That being said, we all seem to agree that removing the C++ bindings in v5.0.x is not a problem.
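    • A minimal migration sketch (not an official guide, and illustrative only): the most commonly used deleted MPI-1 calls map onto long-standing MPI-2 replacements, so the "education" is mostly mechanical renames. Until a legacy code is ported, it can still be built against a v4.0.x Open MPI configured with --enable-mpi1-compatibility.

```c
#include <mpi.h>
#include <stddef.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* MPI_Keyval_create / MPI_Attr_put / MPI_Attr_delete ->
       MPI_Comm_create_keyval / MPI_Comm_set_attr / MPI_Comm_delete_attr */
    int key, val = 42;
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &key, NULL);
    MPI_Comm_set_attr(MPI_COMM_WORLD, key, &val);
    MPI_Comm_delete_attr(MPI_COMM_WORLD, key);
    MPI_Comm_free_keyval(&key);

    /* MPI_Address -> MPI_Get_address, MPI_Type_struct -> MPI_Type_create_struct */
    struct { int a; double b; } s;
    MPI_Aint base, disp[2];
    MPI_Get_address(&s, &base);
    MPI_Get_address(&s.a, &disp[0]);
    MPI_Get_address(&s.b, &disp[1]);
    disp[0] -= base;
    disp[1] -= base;
    int blocklens[2] = { 1, 1 };
    MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE }, strtype, resized;
    MPI_Type_create_struct(2, blocklens, disp, types, &strtype);

    /* MPI_UB / MPI_LB markers -> MPI_Type_create_resized */
    MPI_Type_create_resized(strtype, 0, (MPI_Aint)sizeof(s), &resized);
    MPI_Type_commit(&resized);

    MPI_Type_free(&strtype);
    MPI_Type_free(&resized);
    MPI_Finalize();
    return 0;
}
```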
  • Delete the openib BTL

    • 2020-02: AGREED (and this is already done)
    • It's effectively unmaintained
    • All networks supported by the openib BTL (IB, RoCE, iWARP) are now supported by Libfabric and/or UCX.
    • There was talk of doing this for v4.0.0
    • Hence, it seems that the future of RoCE and iWARP is either or both of Libfabric and UCX.
      • ...but neither of those will be 100% ready for Open MPI v4.0.0.
      • It didn't seem to make sense to make iWARP users move from openib to iwarp in v4.0.0 (and potentially something similar for non-Mellanox RoCE users), and then move them again to something else in v5.0.0 ("How to annoy your users, 101").
      • The lowest cost solution for v4.0.0 was to disable IB support by default in openib (i.e., only iWARP and RoCE will use it by default), and punt the ultimate decision about potentially deleting the openib BTL to v5.0.0. Note that v4.0.0 also has a "back-door" MCA parameter to enable IB devices, for "just in case" scenarios (for users who, for whatever reason, don't want to upgrade to UCX).
    • With all that, need to investigate and see what the Right course of action is for v5.0.0 (i.e., re-evaluate where Libfabric and/or UCX are w.r.t. RoCE support for non-Mellanox devices and iWARP support), and how to plumb that support into Open MPI / expose it to the user.
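    • In the meantime, users on these networks who want to be explicit can typically force the UCX path with something like mpirun --mca pml ucx ./app (a hedged example only; which components are actually available depends on how Open MPI was built).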
  • (UTK) Better multithreading. - George

    • 2020-02: AGREED (and this is already done)
    • In the OB1 PML; targets normal OpenMP parallel sections. Improved injection and extraction rates.
    • Implications for other PMLs: the work is very OB1-specific; support for other PMLs may be a little bit in progress.
  • (UTK) ULFM support via new MPIX functions. Most of the work is in the MPIX layer, but some is in the PML. (A usage sketch follows this item.)

    • 2020-02: UTK is working on this. There are a few PRRTE things outstanding. Pretty close to ready, though.
    • Depends on PMIx v3.x
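    • As a rough illustration of the MPIX surface (function names per the ULFM proposal; the exact set and header location shipped in v5.0.x are assumptions here):

```c
#include <mpi.h>
#include <mpi-ext.h>   /* Open MPI's MPIX_* extension prototypes (assumed to cover ULFM) */
#include <stdio.h>

/* Hedged sketch: detect a peer failure in a collective, revoke the broken
 * communicator, and keep going on a shrunken communicator of survivors. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

    int rc = MPI_Barrier(comm);
    if (rc == MPIX_ERR_PROC_FAILED || rc == MPIX_ERR_REVOKED) {
        MPIX_Comm_revoke(comm);             /* make every rank abandon comm */

        MPI_Comm survivors;
        MPIX_Comm_shrink(comm, &survivors); /* new comm without the dead ranks */

        int ok = 0;                         /* this rank saw the failure */
        MPIX_Comm_agree(survivors, &ok);    /* logical AND across survivors */

        comm = survivors;
        printf("continuing on the shrunken communicator (ok=%d)\n", ok);
    }

    if (comm != MPI_COMM_WORLD) MPI_Comm_free(&comm);
    MPI_Finalize();
    return 0;
}
```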
  • Need PMIx v4.0.x

    • 2020-02: AGREED
  • Need PRRTE v2.0.x

    • 2020-02: AGREED
  • Want Nathan's fix for Vader and other BTLs to allow us to have SOMETHING in OSC-RDMA for one-sided + multithreaded runs.

    • 2020-02: @hjelmn will look at this. Either need to put the emulation code in OSC-RDMA (for networks that don't natively support atomics) or put it in each BTL. 🤷‍♂
    • The goal is to ditch OSC-PT2PT.
    • Something similar is coming into BTL-TCP.
    • If osc/rdma supports all possible scenarios (e.g., all BTLs support the RDMA methods osc/rdma needs), this should allow us to remove osc/pt2pt (i.e., be 100% migrated to osc/rdma). It would be good to have an osc/pt2pt alias in case anyone is scripting their mpirun command lines to select --mca osc pt2pt. (See the one-sided sketch below.)
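    • For context, the pattern that forces the atomics question is passive-target fetch-and-op; a minimal sketch, independent of any particular BTL:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>

/* Sketch of the one-sided + multithreaded pattern osc/rdma has to cover,
 * whether the BTL provides native atomics or osc/rdma emulates them. */
int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int64_t counter = 0;
    MPI_Win win;
    MPI_Win_create(&counter, sizeof(counter), sizeof(counter),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Every rank atomically takes a "ticket" from the counter on rank 0. */
    int64_t one = 1, ticket = 0;
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Fetch_and_op(&one, &ticket, MPI_INT64_T, 0, 0, MPI_SUM, win);
    MPI_Win_unlock(0, win);

    printf("rank %d got ticket %lld\n", rank, (long long)ticket);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```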
  • Change defaults for embedding libevent / hwloc (see this issue) - HELP NEEDED see PR 5395

    • 2020-02: ALREADY DONE
  • Get rid of BTL-SM-CUDA

    • 2020-02: AGREED
    • It's unmaintained.
    • NVIDIA supports CUDA through UCX.
    • Need to also clean up public documentation (FAQ, web site, etc.) to remove references (or specifically version-ize those references) to SM-CUDA
    • Also look at configury help string for --with-cuda. Remember: CUDA is still used in the datatype engine.
    • NVIDIA will look at possibly moving shared memory CUDA support to vader (for cases where UCX is not sufficient, and Libfabric does not yet natively support CUDA for shared memory).
  • Get rid of BTL-SM

    • 2020-02: AGREED
    • It's just a stub right now. Might as well get rid of it.
  • Rename mpirun's --output-filename option to --output-directory and properly document it

  • Delete ancient gfortran?

    • 2020-02: AGREED; keep old gfortran for now
    • May be too early to do this -- e.g. RHEL 7 is still common

Other new features

  • Switched to PRRTE.

  • New command line options for affinity

  • Simplified network selection (--net) CLI option

  • Display which networks were / will actually be used

  • MPI_T documentation

    • 2020-02: we need this.
    • Apparently, we have a CVAR per MCA framework that will tell you which component(s) are currently selected / available.
    • There is no CVAR that lists all the frameworks.
    • You can read/write MCA parameters (before MPI_INIT) -- e.g., set btl (see the MPI_T sketch after this list).
      • This isn't quite right in 2020-02 master HEAD; @hjelmn thinks it is a bug and will fix it.
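    • A hedged sketch of doing this through the standard MPI_T interface: the MPI_T calls themselves are standard MPI-3, but the cvar name "btl" and the value written are assumptions about Open MPI's naming.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Sketch: look up a control variable by name before MPI_Init and write it.
 * The cvar name "btl" and the value "tcp,self" are illustrative assumptions. */
int main(int argc, char **argv)
{
    int provided, ncvar;
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_cvar_get_num(&ncvar);

    for (int i = 0; i < ncvar; i++) {
        char name[256];
        int name_len = sizeof(name), verbosity, bind, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;
        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, NULL, NULL, &bind, &scope);
        if (strcmp(name, "btl") == 0 && dtype == MPI_CHAR) {
            MPI_T_cvar_handle handle;
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_write(handle, "tcp,self");
            MPI_T_cvar_handle_free(&handle);
            break;
        }
    }

    MPI_Init(&argc, &argv);
    /* ... application ... */
    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```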
  • OMPIO

    • External32 support ALREADY DONE
    • Support for file atomicity ALREADY DONE
    • Add support for IBM Spectrum Scale/GPFS file systems ALREADY DONE
    • Add support for IME ALREADY DONE
    • Add support for DAOS
  • Deleted MPIR interface (MAYBE)

    • This assumes the PMIx debugger interfaces are done / stable.
    • We issue a deprecation warning in v4.0.0 for any tool that uses the MPIR interface.
    • This topic is under review. We've gotten lots of pushback from some important Open MPI users.
    • On the one hand: we announced the removal from OMPI 3 years ago.
    • On the other hand: vendors haven't switched to PMIx yet because MPIR still exists / ain't broke.
    • Someone needs to blink or nothing will change.
      • There's now a "shim" MPIR library!!!
        • Have tested the shim with PMIx v4.
        • It might work with PMIx v3...? (has not been tested)
        • It will not work with PMIx v2.
      • A vendor has integrated to use the new PMIx interface (can't name the vendor here on the public wiki).
      • Another vendor is finally talking about it.
      • This means that we are clear to remove MPIR for v5!
        • It's actually already gone from OMPI master -- because it was part of ORTE (which has now been replaced with PRRTE).
    • NEWS-worthy: PMIx has the capability for debuggers to attach right after fork but right at the beginning of exec. This is a new OMPI/PMIx capability.
  • Use PMIx directly - replace current PMIx component with something more like the hwloc component.

    • 2020-02: DONE
  • Per https://github.com/open-mpi/ompi/pull/6315, we should finish the "reachable" MCA framework

    • Then we don't need to worry about default if_include/if_exclude values for OOB/TCP (which may be replaced by PRRTE anyway) and BTL/TCP -- i.e., should we exclude "docker" interfaces by default, etc. (See the note after this list.)
    • 2020-02:
      • It's in use by the TCP BTL.
      • OOB-TCP is gone (ORTE is gone)
      • usNIC should probably be updated to use this.
      • Need to integrate with PRRTE
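    • For reference, the manual knobs this is meant to make unnecessary look like mpirun --mca btl_tcp_if_exclude docker0,lo ... (btl_tcp_if_include / btl_tcp_if_exclude are existing TCP BTL parameters; the interface names here are just examples).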
  • Be more deliberate about our ABI breaks. This came up on webex (WeeklyTelcon_20190910), and we talked about a few items:

    • Better definition of what our ABI promises are.
    • Possible automation to help detect breaks earlier - (see: Issue 6949)
    • Possibly generate a symbol white-list for passing to the linker.
  • HACKING says we allow flex >= 2.5.4 because of CentOS 5. I think it's ok to disregard CentOS / RHEL 5 now -- let's bump the minimum version of flex up to 2.5.35 (see HACKING for details).

    • 2020-02: AGREED
    • RHEL 5 is officially dead.
    • @hjelmn is excited about removing a bunch of old flex-compat code.
    • Yes, let's move the min flex up to 2.5.35.
  • Add multi-NIC support to the OFI MTL, using hwloc to match processes and NICs.

  • Add event queue (https://ofiwg.github.io/libfabric/master/man/fi_eq.3.html) support for the OFI MTL

    • 2020-02: AWS is working on it.
  • Re-jigger our collective algorithm selection mechanism

    • 2020-02: AWS is working on it.
    • William will send email to the org about how to run a collectives perf sweep
    • More discussion is required after that
  • Other MPI-4 Features (at least as MPIX)

    • Sessions or at least its guts
    • 2020-02: Current status:
      • Synced up with current 2020-02 Forum MPI_Sessions proposal
      • Only works with OB1 PML for extended CID matching
        • 128-bit CIDs (for session-created communicators): 64 bits come from PMIx, 64 bits come from the local OMPI
        • See the Cluster 2019 paper about this
        • Other PMLs will need work.
      • Work for using MPI functions before initialization is also done. Generally working. (A usage sketch follows this list.)
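    • For reference, a minimal sketch of the Sessions usage model being targeted (function names per the Forum proposal / MPI-4 draft; whether they land as MPI_ or MPIX_ in v5.0.x is an open question):

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal Sessions sketch per the MPI-4 proposal; in v5.0.x the prefix may
 * end up being MPIX_ rather than MPI_. */
int main(void)
{
    MPI_Session session;
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

    /* Derive a communicator from the built-in "world" process set,
       without ever calling MPI_Init. */
    MPI_Group group;
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);

    MPI_Comm comm;
    MPI_Comm_create_from_group(group, "org.example.sessions-demo",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);

    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("rank %d of %d (no MPI_Init involved)\n", rank, size);

    MPI_Group_free(&group);
    MPI_Comm_free(&comm);
    MPI_Session_finalize(&session);
    return 0;
}
```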
  • Multi-threaded optimizations and refactoring of OSC/UCX

  • Multi-threaded optimizations of OSHMEM, SPML/UCX.

  • OSHMEM v1.5 support

  • UCX collectives support

  • Rename "vader" as "sm" and have "vader" be an alias.

    • Nvidia is investigating the possibility of putting CUDA extensions in what-is-vader-today-but-what-will-become-the-sm BTL. May or may not be in time for v5.0.0.
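    • For example, after the rename, --mca btl sm,self,tcp and a legacy --mca btl vader,self,tcp would presumably select the same component; the exact alias mechanics are still to be worked out.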
  • Delete mpirun --amca

    • And other changes to command line parsing
    • We will need to document/educate users on these changes
  • Add threading MCA framework to support alternate threading packages
