
WeeklyTelcon_20220419

Geoffrey Paulsen edited this page Apr 19, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Austen Lauria (IBM)
  • Edgar Gabriel (UoH)
  • Howard Pritchard (LANL)
  • Matthew Dosanjh (Sandia)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Harumi Kuno (HPE)
  • Tommy Janjusic (nVidia)
  • David Bernholdt (ORNL)
  • Josh Fisher (Cornelis Networks)
  • Akshay Venkatesh (NVIDIA)
  • Hessam Mirsadeghi (UCX/nVidia)

not there today (I keep this for easy cut-n-paste for future notes)

  • Christoph Niethammer (HLRS)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Thomas Naughton (ORNL)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia)

v4.1.x

  • 4.1.4 - Looking to do a quick turnaround to get coll_ucc component released.
  • One bug: William added a check that compares fabric names.
    • He thinks the check is correct, but the fabric names might not be.
    • Hopefully he'll have it fixed today or tomorrow.
  • Want to make a 4.1.4rc1 soon.
  • Mellanox is testing this internally, but not in MTT.
    • If anyone else wants to test coll_ucc, they need to raise its priority (the default is 0).
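
Open MPI MCA parameters can be set on the mpirun command line with --mca or through OMPI_MCA_<name> environment variables, so raising the coll_ucc priority for a test run could look roughly like the sketch below. The parameter name coll_ucc_priority (assumed from the usual framework_component_parameter naming), the value 100, and the program ./my_mpi_app are illustrative assumptions, not taken from these notes.

```python
import os
import shlex


def mca_env(params, base_env=None):
    """Return a copy of the environment with OMPI_MCA_<name> set per parameter."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env[f"OMPI_MCA_{name}"] = str(value)
    return env


def mpirun_command(params, np, program):
    """Build an mpirun argument list that passes each parameter via --mca."""
    cmd = ["mpirun"]
    for name, value in params.items():
        cmd += ["--mca", name, str(value)]
    cmd += ["-np", str(np), program]
    return cmd


# Assumed parameter name and value; check ompi_info on your build first.
params = {"coll_ucc_priority": 100}
command = mpirun_command(params, np=4, program="./my_mpi_app")
print(shlex.join(command))
```

Either form should be equivalent for a single run; the environment-variable form can be convenient inside MTT-style scripts where the command line is generated elsewhere.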

v5.0.x

  • v5.0.0rc6 went out last week.
  • An srun issue was opened, and Gilles already opened a PR with a fix.
  • Nothing new yet for an rc this week.
  • Remaining major issues
    • orte -> prrte docs
    • Ralph has a laundry list of items for prrte v2.1 that sound like they need to be fixed for OMPI v5.0.0
      • And they sound like a lot of work
      • Two that stand out:
        • mpirun (no DVM) - issue in how resources are being allocated for anything more complex than -np 2
          • Markalle opened a prrte issue on Power9
          • Need to understand this disconnect
        • Bouncing data back between PMIx and PRRTE repeatedly.
      • opal_show_help output is not being aggregated correctly. Possibly related to the above?
      • Java binding issue isn't a blocker, but represents a bigger disconnect between us and PRRTE.
  • Brian and Jeff are working on an mpirun requirements doc.
    • Will need to beef up what mpirun does before it execs prterun.
    • No hook in prrte for setting up path in back end daemons
    • Brian opened Issue #10252
    • Brian wrote up a first cut at it, and Jeff hasn't yet read it.
    • He will paste it into the Issue later today or tomorrow.
      • v5.0.0 blocker items.
  • Jeff gave the packagers a heads up that we're now including the html docs and they might want to package that.

v4.0.x

  • No plan for update
  • alltoallv patches - Patch went into v4.1.x that

Main branch

  • Reminder: Delete master branch in your fork. It will save you much pain in the future.
    • Do it NOW!
  • There are no longer nightly tarballs for master, so if your MTT setup hasn't been updated, you're not running recent code.
  • Howard wants to talk about Fortran.
    • He's been trying to resuscitate what Gilles started to do a while ago.
      • OMPI doesn't do f08 correctly at ALL.
      • Something has been wrong in the Fortran->C bindings for years.
      • Howard rebased a fix that Gilles did
      • XL Fortran compiler is very inadequate in F08.
      • The configury that Gilles added is based on an assumption (a correct one).
        • If you have cscoping and #include the <ISO_Fortran_binding_support.h> (or similarly named) header,
        • the compiler will pull it in because it's part of the compiler.
        • GNU <8 doesn't use this ISO header.
    • Jeff, Howard and Mark will meet next Thursday to talk about this.
      • Howard thinks the bindings generation is somewhat separate.
      • Jeff swears that Rolf told him that the descriptor way is compatible with the older way of just taking a pointer: since the first element of the descriptor struct is a pointer, the two are compatible.
        • Howard agrees with this argument. The argument can also be a scalar, but a scalar will be passed as a pointer as well.
        • Don't think it'll misbehave, but if the compiler supports descriptors we're supposed to support them.
    • Reviving Gilles' work is good and necessary.
    • This is a post-v5.0.0 topic.
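
The "first element of the struct is a pointer" point from the descriptor discussion above can be pictured with a small ctypes sketch. FakeDescriptor is a hypothetical, simplified layout invented for illustration; it is not the compiler's real descriptor type, and the sketch does not settle the compatibility question being debated.

```python
import ctypes


class FakeDescriptor(ctypes.Structure):
    """Hypothetical, simplified stand-in for a Fortran array descriptor."""
    _fields_ = [
        ("base_addr", ctypes.c_void_p),  # pointer to the data (first member)
        ("elem_len", ctypes.c_size_t),   # size of one element in bytes
        ("rank", ctypes.c_int),          # number of dimensions
    ]


# A plain buffer, as the older "just take a pointer" convention would see it.
buf = (ctypes.c_int * 4)(10, 20, 30, 40)
desc = FakeDescriptor(ctypes.addressof(buf), ctypes.sizeof(ctypes.c_int), 1)

# The data pointer sits at offset 0 of the descriptor.
first_member_offset = FakeDescriptor.base_addr.offset

# Reading one pointer-sized word at the descriptor's address yields base_addr.
word = ctypes.c_void_p.from_address(ctypes.addressof(desc)).value

# Following that word reaches the same data the plain pointer refers to.
data = (ctypes.c_int * 4).from_address(word)
print(first_member_offset, word == ctypes.addressof(buf), list(data))
```

In this toy layout the descriptor's own address differs from the buffer's address, so code that receives a descriptor still has to read the first member to reach the data.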

MTT

  • IBM asked nVidia if they would take over PGI compiler build/testing.
    • nVidia is still looking into it.

Face-to-face
