
WeeklyTelcon_20220809

Geoffrey Paulsen edited this page Aug 23, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Webex)

  • Austen Lauria (IBM)
  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Brendan Cunningham (Cornelis Networks)
  • Edgar Gabriel (UoH)
  • Howard Pritchard (LANL)
  • Josh Fisher (Cornelis Networks)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)
  • Jan (Sandia, ULT support in Open MPI)
  • Matthew Dosanjh (Sandia)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Joseph Schuchart

not there today (I keep this for easy cut-n-paste for future notes)

  • David Bernhold (ORNL)
  • Josh Hursey (IBM)
  • Tommy Janjusic (nVidia)
  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Brian Barrett (AWS)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Thomas Naughton (ORNL)
  • Xin Zhao (nVidia)

v4.1.x

  • v4.1.5
    • Schedule: targeting ~6 months (Nov 1)
    • No driver on schedule yet.
  • Potential CVE issue in libevent, but we might not need to do anything.
    • Worst case, we'd just update our libevent version.
    • Tracking down details.

v5.0.x

  • Schedule:
    • PMIx and PRRTE changes coming at end of August.
      • Try to have bugfix PRs posted by the end of August, to leave time to iterate and merge.
    • Still using the Critical v5.0.x Issues project (https://github.com/open-mpi/ompi/projects/3) as of yesterday.
  • Issue 10641: Ralph changed the PRRTE branches (switching us to the v3.2 branch)
    • Lots of changes from PRRTE v2.1 -> v3.2
    • Still working to get CI working
      • MTT is still failing with SLURM.
      • The failure has moved from a segv in mpirun to a problem in resource detection.
    • Ralph doesn't have access to SLURM, so he can't help debug this.
    • Looking for someone with SLURM to help.
    • Austen will open an Issue for this.
  • Does ANYONE use Open MPI's Java Bindings?
  • Docs
    • mpirun --help is OUT OF DATE.
      • Have to fix this relatively quickly, before PRRTE releases.
      • Austen, Geoff and Tomi will be
      • REASON for this: the mpirun command line is implemented in PRRTE.
  • mpirun manpage needs to be re-written.
    • Docs are online and can be updated asynchronously.
    • Jeff posted a PR to document runpath vs. rpath.
      • Our configure checks some linker flags, but a default in the linker or in the system may be what really governs the behavior.
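A minimal sketch for the runpath vs. rpath point above, in case it helps when checking what a given build actually produced. With GNU ld, `--enable-new-dtags` emits DT_RUNPATH and `--disable-new-dtags` emits DT_RPATH; DT_RPATH is searched before LD_LIBRARY_PATH, while DT_RUNPATH is searched after it. The helper below only parses the text printed by `readelf -d`; the function name, the sample output, and the install paths are illustrative assumptions, not taken from Jeff's PR or from Open MPI's configure.

```python
import re

# Matches the RPATH / RUNPATH lines in the dynamic section printed by `readelf -d`.
_TAG_LINE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*?)\]")


def find_search_path_tags(readelf_output: str) -> dict:
    """Return {"RPATH": [...]} and/or {"RUNPATH": [...]} found in readelf -d text."""
    found = {}
    for line in readelf_output.splitlines():
        match = _TAG_LINE.search(line)
        if match:
            # The bracketed value is a colon-separated list of directories.
            found[match.group(1)] = match.group(2).split(":")
    return found


# Hypothetical readelf -d excerpt; the library name and paths are made up.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libmpi.so.40]
 0x000000000000001d (RUNPATH)            Library runpath: [/opt/ompi/lib:/opt/pmix/lib]
"""

if __name__ == "__main__":
    print(find_search_path_tags(SAMPLE))
```

Feeding it the real output of `readelf -d` for an installed library shows which tag the toolchain chose, regardless of what configure detected.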

PRRTE

  • Ralph is looking to release PRRTE v3.x by end of the month.
    • If Open MPI wants Java Bindings, we'd need to do some Java work in PRRTE before end of the month.

Main branch

  • HAN / Adapt runs.
    • Post a summary to the devel list, with a link to the results.
    • Want to make these the default.
    • Geoff will send out.
    • Include a pointer on how to run these tests in YOUR environment.
  • Incompatibilities in User Level threading that Jan
    • What's the schedule for fixes to get into v5.0.x?
    • Will try to get PRs in by end of August and then iterate.

Accelerator framework

  • William said yesterday that they wanted one more day of testing.
  • sm_cuda component was moved into framework.
    • nVidia has some issues building, and will try again to test.
  • Accelerator framework: good first step, but will need fixes (super high level)
    • Does this framework allow us to get rid of sm_cuda altogether?

Atomics PRs

  • 10492 and link to 10487
    • Geoff STILL needs to test on ppc64le; will get to this THIS week.
  • Joseph will post some additional info in the ticket.

MTT

Administrative tasks

Face-to-face
