
WeeklyTelcon_20211109


Open MPI Weekly Telecon

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • David Bernholdt (ORNL)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (HLRS)
  • Josh Hursey (IBM)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (NVIDIA)
  • William Zhang (AWS)

Not there today (kept here for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • Erik Zeiske (HPE)
  • Geoffroy Vallee (ARM)
  • Hessam Mirsadeghi (NVIDIA)
  • Joshua Ladd (NVIDIA)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja
  • Ralph Castain (Intel)
  • Sam Gutierrez (LANL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Sriraj Paul (Intel)
  • Xin Zhao (NVIDIA)

New Topics For Today

  • Do the Fortran fixes affect the API? (i.e., are they needed for v5.0.0?)
  • Howard has been implementing MPI_Isendrecv and MPI_Isendrecv_replace (see the sketch below).
    • ULFM may not have looked closely enough at how these are defined in the standard.
    • What if the send completed, but the recv failed?
      • Not hard to code, just not well defined. Let the Forum discuss.
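
A minimal sketch for context on the completion question above (not from the meeting; assumes an MPI 4.0 implementation that provides MPI_Isendrecv): both halves of the combined operation complete through a single request, which is why a failed receive partner after a completed send is the underspecified case.

```c
/* Sketch: MPI_Isendrecv between ranks 0 and 1 (MPI 4.0); error handling omitted. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2 && rank < 2) {
        int peer = 1 - rank;               /* ranks 0 and 1 exchange values */
        int sendval = rank, recvval = -1;
        MPI_Request req;

        /* One request covers both the send half and the receive half. */
        MPI_Isendrecv(&sendval, 1, MPI_INT, peer, 0,
                      &recvval, 1, MPI_INT, peer, 0,
                      MPI_COMM_WORLD, &req);

        /* Under ULFM, if the peer fails after our send half completes,
         * what this wait should report is the open question. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank %d received %d\n", rank, recvval);
    }

    MPI_Finalize();
    return 0;
}
```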

v4.0.x

  • Please test!
  • Schedule: November for v4.0.7.
  • Howard and Geoff will meet Friday, Nov. 12th, in hopes of building the final v4.0.7, assuming no new issues are reported.

v4.1.x

  • Schedule:
    • Another RC probably tomorrow
  • Just want to verify no regressions in v4.1.x release series.
  • George fixed Scatter and a few other collectives over the last few weeks.
    • George started to look at Issue #9619
      • George will try to reproduce.

v5.0.x

  • Schedule: rc2 went out yesterday.
  • Two big items still coming: OSHMEM headers and OSC UCX.
  • Sent an email to the devel list.
  • https://github.com/open-mpi/ompi/issues/9540 might be ready on v5.0.x
  • 8 PRs open.
    • PR #9594 - fixes some BTL issues (against master); will take a few days to review.
  • Issue #9554: Jeff asked whether partitioned communication support is going into v5.0 or not.
    • Matthew is interested.
  • PR #9495: TCP one-sided for master.
  • Tommy is still pushing on UCX one-sided.
  • PR #9576 - Ralph filed a ticket about building packages externally.
    • Working with Fedora packagers. Will be a v5.0.x item.
    • Might need some back and forth with PMIx; the way he updated PMIx might require major changes in OMPI.
      • The ball is somewhat in Jeff's court.
      • Across OMPI/PMIx/PRRTE - Just need to
  • MPI Info work that Joseph and Howard are working on.
    • Marking a few MPI_ calls as deprecated (see the sketch after this list).
    • Decided not to mark them as deprecated yet, since we're not MPI 4.0 compliant.
    • No additional discussion.
  • Documentation
    • Got a needed change into the Sphinx tools. Not sure if there's a release of it yet.
      • This fixes output issues in the manpages.
    • The process to update the FAQ is to talk to Jeff or Harumi.
    • For any changes in the README or FAQ, let them know so the changes also land in the new docs.
      • For now, make changes in ompi-www and README as usual and let them know.
  • Issue #9501 is a regression; it needs to be fixed or reverted.
  • No test for building from a tarball; ensure we don't need pandoc.
  • Github Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
    • Issue #8983: if we partially disable OSC over the TCP BTL, we're not breaking MPI compliance, just badly hurting one-sided performance.
    • Described the rc1 approach from Sept 23: disable any functionality that is a blocker in order to allow the rc.
      • Worried that blockers might not be fixed in time, so we will put in code that issues an error at runtime to prevent getting into those paths, and document it heavily.
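
Regarding the MPI Info deprecation item above: the notes don't name the specific calls, but one plausible candidate set (an assumption, not stated in the meeting) is MPI_Info_get / MPI_Info_get_valuelen, which MPI 4.0 deprecates in favor of MPI_Info_get_string. A minimal sketch of the replacement call, assuming an MPI 4.0 implementation:

```c
/* Sketch: MPI 4.0's MPI_Info_get_string, the replacement for the
 * deprecated MPI_Info_get / MPI_Info_get_valuelen pair. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_no_any_tag", "true");

    char value[64];
    int buflen = sizeof(value);   /* in: buffer size; out: required size */
    int flag;

    MPI_Info_get_string(info, "mpi_assert_no_any_tag", &buflen, value, &flag);
    if (flag) {
        printf("value = %s\n", value);
    }

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```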

Super Computing SC BoF

  • Time and date of the BoF: Nov 16 @ 12:15pm US Eastern Time.
  • The Open MPI BoF was accepted.
    • Our hybrid BoF will be mostly virtual.
      • George may be there in person for the tutorial (though other tutorials will be fully virtual).
    • The Birds of a Feather session itself will be virtual.
    • George sent out an email to Amazon, Cisco, IBM, and NVIDIA.
  • Where do we drop slides? Jeff will send the location again.
    • Google Slides - due Tuesday, Nov 9th.
    • Focusing on v5.0.

Master

Documentation

  • No update
  • Don't use the old system; use the new system for v5.0.0.

MPI 4.0 API

  • [Open MPI 4.0 API Compliance Github Project|https://github.com/open-mpi/ompi/projects/2]
  • Joseph says we're not dropping Info keys as we should per MPI 4.0 (see the sketch after this list).
    • Issue #9555
    • Do we want this in OMPI v5.0.0?
      • It'd be nice, because it's going to change behavior.
      • But it might also be bad because it's a change in behavior (if users depend on MPI 3.1 behavior).
        • But since it wasn't specified in MPI 3.1, maybe whatever we do is okay.
    • Joseph posted PR #9567 to address Issue #9555.
      • Would like to get MPI-IO addressed in this PR before it's merged as well.
      • Will continue to work on that.
  • Need to decide what to do with #8057.
  • Joseph could use help with OMPIO / ROMIO.
    • Needs to talk to Edgar about OMPIO.
    • Right now, we track whether keys are read.
    • Keys are being copied from ompi info keys to opal info keys.
    • MPI 4 might be "too restrictive" for our architecture.
    • Trying to see how it could fit.
  • Sessions branch: don't want to merge into master until, possibly, v5.0.1 gets out.
    • It will complicate things in the finalize/initialize code.
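
To illustrate the behavior change behind Issue #9555 (a sketch of the MPI 4.0 semantics, not of Open MPI's current behavior on either branch): MPI_Comm_get_info is only required to return the keys the implementation actually understood and used, so an unrecognized key set by the user may not come back. The key name below is hypothetical.

```c
/* Sketch: under MPI 4.0 info semantics, a key the implementation never
 * used may be dropped from the info returned by MPI_Comm_get_info. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "my_unrecognized_key", "42");   /* hypothetical key */
    MPI_Comm_set_info(MPI_COMM_WORLD, info);
    MPI_Info_free(&info);

    MPI_Info used;
    char value[64];
    int buflen = sizeof(value), flag;

    MPI_Comm_get_info(MPI_COMM_WORLD, &used);
    MPI_Info_get_string(used, "my_unrecognized_key", &buflen, value, &flag);
    printf("key retained: %s\n", flag ? "yes" : "no");

    MPI_Info_free(&used);
    MPI_Finalize();
    return 0;
}
```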

MTT

  • Looking okay.
  • Revive monthly MTT development meetings in January 2022.
  • Some Cherry-Pi MTT runs hit a socket connection timeout with the DB.
  • Looks like something was wrong with MTT.
    • That machine just got upgraded.
    • The install failure is kind of weird.

Longer Term discussions

  • No discussion.