
WeeklyTelcon_20230131

Geoffrey Paulsen edited this page Feb 7, 2023 · 2 revisions

Open MPI Weekly Telecon ---

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Brendan Cunningham (Cornelis Networks)
  • David Bernholdt
  • Edgar Gabriel (AMD)
  • Howard Pritchard (LANL)
  • Joseph Schuchart (UTK)
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Luke Robison (Amazon)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

New Items

  • Reminder: When there are issues with any company's CI, please post in the #general Slack channel.

  • New - Issue #11347 Versioning is wrong in v5.0.x

    • We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
      • Compile an MPI application with v4.0.x, then rm -rf the Open MPI install, then install v5.0.0 into the same location, and the application should just work.
      • Did we figure out the Fortran ABI break?
        • From memory: yes, we did break the Fortran ABI.
        • We broke ABI in a very narrow case: when Fortran is compiled with 8-byte integers and C with 4-byte ints.
        • Two other things may or may not break ABI.
        • We made some changes involving intents and asyncs, and went from named interfaces to unnamed interfaces.
          • Unsure if this affects ABI.
      • For ABI, we mostly just care about C and mpif.h.
      • Fortran library has different .so versioning.
    • Blocker for next v5.0.0rc - get library versioning correct.
    • When we talk about ABI - Fortran will be nuanced.
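
As a rough illustration of what "getting library versioning correct" means for the drop-in scenario above, here is a minimal sketch of the libtool `current:revision:age` rules as they apply on Linux. The version triples are made-up values for illustration, not the actual Open MPI shared library versions, and `LibtoolVersion` and `is_drop_in_replacement` are hypothetical helper names.

```python
from typing import NamedTuple


class LibtoolVersion(NamedTuple):
    """A libtool -version-info triple (current:revision:age)."""
    current: int
    revision: int
    age: int

    @classmethod
    def parse(cls, text: str) -> "LibtoolVersion":
        current, revision, age = (int(part) for part in text.split(":"))
        if age > current:
            raise ValueError("age cannot exceed current")
        return cls(current, revision, age)

    def soname_major(self) -> int:
        # On Linux, libtool uses (current - age) as the SONAME major number.
        return self.current - self.age

    def supports(self, interface: int) -> bool:
        # The library implements interfaces (current - age) .. current.
        return self.current - self.age <= interface <= self.current


def is_drop_in_replacement(old: LibtoolVersion, new: LibtoolVersion) -> bool:
    """True if a binary linked against `old` can load `new` unchanged."""
    same_soname = old.soname_major() == new.soname_major()
    return same_soname and new.supports(old.current)


# Hypothetical values, chosen only to exercise the rules.
old = LibtoolVersion.parse("60:2:20")
compatible = LibtoolVersion.parse("70:0:30")   # age grew with current
broken = LibtoolVersion.parse("80:0:0")        # age reset: SONAME changes

print(is_drop_in_replacement(old, compatible))
print(is_drop_in_replacement(old, broken))
```

Because the Fortran library carries its own version triple, the same check would have to be applied per library rather than once for the whole install.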

v4.1.x

  • Made a minor change for another rc. Trying to get rc built.

v5.0.x

  • The RC planned for last week got pushed to this week.
    • Still waiting on https://github.com/open-mpi/ompi/issues/11354
    • Maybe related to the enable dso option?
      • The accelerator framework initially picks CUDA and then disqualifies it, but at teardown it still tries to tear down CUDA.
        • It does this because CUDA now uses delayed startup, so it is still marked as enabled.
        • Needs another variable that tracks whether CUDA was actually initialized.
      • The fix should also be on main (there is a comment saying otherwise).
    • Howard said after the call that this isn't a blocker for rc10
  • Waiting on PMIx and PRRTE submodule update.
    • Ralph pestered us to please merge it. - just merged on main.
    • Merged, will make rc10
  • Need documentation for v5.0.0
  • Manpages need an audit before release.
    • Double check --prefix behavior
    • Not the same behavior as v4.1.x
  • What is status of HAN?
    • Joseph pushed a bunch of data, but not on the call. Go read this.
    • Joseph had some more experiments. With the HAN collective component and the shared memory PR, we were pretty good compared to tuned and another
      • Comparing HAN with shared Mem component.
      • How many ppr? Between 2ppr and 64ppr
    • Better numbers, would be good to document this.
      • In the OSU benchmarks there is always a barrier before the operation. If the barrier and the operation match up well, you get lower latency.
      • We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
        • Like to include instructions on how to reproduce as well for users.
        • document in ECP -
      • Our current resolution is to enable it as is, and fix current regressions in future releases.
      • What else is needed to enable it by default?
        • Just need to flip a switch.
        • The module that Joseph has for shared memory for HAN at the moment would need some work to add additional collectives.
        • And it relies on xpmem to be available.
        • So for now just enable HAN for collectives we have, and later enable for other collectives.
        • George would like to reuse what tuned does without reimplementing everything. A shared memory component is the better choice, but it is more work.
        • If we don't enable HAN by default now, it will be v5.1 (best case) before it's enabled.
          • The trade offs lean toward turning it on and fixing whatever problems might be there.
        • There is a PR for tuned that increases the default segment size and changes the algorithms tuned uses for shared memory.
        • Need to start moving forward, rather than doing more analysis.
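
Returning to the accelerator teardown problem noted earlier in this section (issue #11354): the notes suggest tracking whether CUDA was actually initialized, separately from whether it is enabled. The following is a minimal sketch of that guard pattern only. `LazyComponent` and its fields are hypothetical names for illustration and do not reflect the actual Open MPI accelerator framework code.

```python
class LazyComponent:
    """Toy component with delayed startup (hypothetical, not Open MPI code)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # With delayed startup the component stays "enabled" even if it
        # never ends up being used, so this flag cannot guard teardown.
        self.enabled = True
        # Separate flag: set only once real initialization has happened.
        self.initialized = False
        self.log: list[str] = []

    def ensure_initialized(self) -> None:
        # Delayed startup: initialize on first real use only.
        if not self.initialized:
            self.log.append("init")
            self.initialized = True

    def teardown(self) -> None:
        # Guard on `initialized`, not `enabled`.
        if not self.initialized:
            self.log.append("skip teardown")
            return
        self.log.append("teardown")
        self.initialized = False


# Component that was picked and later disqualified without ever starting.
unused = LazyComponent("cuda")
unused.teardown()

# Component that was actually used before shutdown.
used = LazyComponent("cuda")
used.ensure_initialized()
used.teardown()

print(unused.log)
print(used.log)
```

The design choice here is that teardown keys off what was really done rather than what was configured, which keeps the shutdown path safe regardless of how selection played out.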

Main branch

ITT
