
WeeklyTelcon_20220111

Geoffrey Paulsen edited this page Jan 11, 2022

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Matthew Dosanjh (Sandia)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

Not there today (I keep this list for easy cut-and-paste into future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tomislav Janjusic (nVidia)
  • Xin Zhao (nVidia)

v4.0.x

  • Schedule: No schedule for v4.0.8 yet.
    • Bugfixes on a case-by-case basis.
  • Winding down v4.0.x; it will stop once v5.0.x is released.
  • Really only want small fixes for issues reported by users.
  • Otherwise, point users to the v4.1.x release.

v4.1.x

  • Schedule: Targeting v4.1.3 for the end of March (Q1).
  • No other update.

v5.0.x

  • Need ROMIO update.
    • Open an issue to track this.
  • What's the status of MPI Sessions?
    • It's ready for review, but it's a very big PR.
    • Nathan pulled attributes out.
    • v5.0 release managers should look at it closely. Two main things in it:
      1. Reorganization of MPI_Init/Finalize so they can be called multiple times.
      • Attributes are used for this.
      2. Extended CID scheme. Read this section and this link for the Cluster 19 paper.
      • Allows us to use PMIx process set functionality more efficiently.
        • Requests a unique 64-bit PMIx CID as part of a PMIx group join.
          • Getting this from PMIx is expensive for large jobs.
      • UCX can't support Sessions yet, because it doesn't use the extended CID scheme now.
        • For these PMLs, we just need to look at the Init/Finalize work.
      • If you get a communicator outside of MPI_Comm_create_from_group, it goes back to
    • Nice to have in v5.0, but since we won't have all other
    • Howard is fixing one conflict on the PR.
    • Review deadline: Feb 1st. Send email to the devel list.
  • Thinking about an RC before and after the Sessions merge.
    • As far as tracking goes, we have nightly tarballs, and it'll be clear in git.
  • Docs rework
    • We made a lot of progress on revamping the docs with reStructuredText.
    • Might actually be able to get this done for v5.0.x.
    • Don't review yet, but lots of good progress.
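The Sessions work discussed in this section reorganizes MPI_Init/Finalize so they can be called more than once. As a rough illustration of the general idea only, the C sketch below shows one generic way a library can support repeated init/finalize cycles: reference counting, where the first init brings up global state and the last finalize tears it down so a later init starts fresh. Every name here (lib_instance_init, lib_instance_finalize, instance_count, runtime_up) is hypothetical; this is not how the Sessions PR is implemented (the notes say attributes are used for that).

```c
#include <stdbool.h>

/* Hypothetical library state for illustration; NOT Open MPI's implementation. */
static int instance_count = 0;   /* number of currently active instances */
static bool runtime_up = false;  /* are the global resources initialized? */

/* Start one instance; the first active instance brings up global state. */
static int lib_instance_init(void)
{
    if (instance_count == 0) {
        runtime_up = true;       /* stand-in for real global setup */
    }
    instance_count++;
    return 0;
}

/* End one instance; the last active instance tears global state down,
 * which allows a later lib_instance_init() to start from scratch. */
static int lib_instance_finalize(void)
{
    if (instance_count == 0) {
        return -1;               /* finalize without a matching init */
    }
    instance_count--;
    if (instance_count == 0) {
        runtime_up = false;      /* stand-in for real global teardown */
    }
    return 0;
}
```

A real runtime also has to handle thread safety and per-instance resources; the counter here only shows why init/finalize can be repeated safely.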

Master

  • Fortran
    • Howard put a comment in PR #9837: we'll need a different type that uses assumed rank instead of assumed ____
      • Based on an email comment on the standard.
      • The standard lists the allowed kinds of dummy arguments when the actual arguments are asynchronous, and assumed size is not one of them.
        • Assumed size is Type(*), so it can't be asynchronous.
      • Anything that takes a buffer declares it Type(*), Dimension(..), where Dimension(..) is the (assumed) rank.
      • If it's Type(*), Dimension(*), then it can't be asynchronous.
      • For the moment this only affects master. How can we fix master so the fix can be brought to v5.0.x?
    • Jeff will fix this on a new PR, but doesn't want it on #9837.
    • Tackle the omnibus PR that's still in flight. Unclear whether it would make it into v5.0.x.
  • Some other MPI-4.0 standard items are small. If we want to get these into v5.0, we need reviews.
    • Howard pinged people for reviews.

MTT

  • Cisco test build failures still pending fixes.
    • Cisco re-enabled MTT but had a typo.
    • Did we change the minimum hwloc version allowed by master?
      • Yes, but can't remember to what version.
      • Found a bug, will file an issue.
  • IBM has an Open MPI build failure with the XL compiler on ppc64le.
    • IBM is looking to work around it with an Open MPI code change.
    • Looked at it this morning.
      • Looks like XL didn't like the combination of -O2 + -g.
    • -g belongs in CFLAGS, and we don't like changing the defaults when --debug is used.
    • Shouldn't undo autoconf-isms.
  • Test-related: late last year, we found a number of collectives that had issues with large payloads that push against MPI-3's large-buffer limits.
    • Will be pushing these tests to the public repo.
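For context on the large-payload item: MPI-3 count arguments are C ints, so a single call can describe at most INT_MAX elements, and larger payloads must be split (or described with derived datatypes; MPI-4.0 adds large-count "_c" variants). The C sketch below shows the chunking approach on a byte buffer. chunk_op_fn, for_each_chunk, and sum_bytes are hypothetical names standing in for a real int-count MPI call; this is not taken from the Open MPI tests mentioned above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for an int-count operation such as an MPI-3
 * send or broadcast; not a real MPI or Open MPI function. */
typedef int (*chunk_op_fn)(const char *buf, int count, void *ctx);

/* Split `total` bytes into pieces of at most `max_chunk` bytes
 * (max_chunk must be in [1, INT_MAX]) and call `op` on each piece.
 * Returns the number of chunks processed, or -1 on error. */
static long for_each_chunk(const char *buf, size_t total, size_t max_chunk,
                           chunk_op_fn op, void *ctx)
{
    if (max_chunk == 0 || max_chunk > (size_t)INT_MAX) {
        return -1;
    }
    long nchunks = 0;
    size_t offset = 0;
    while (offset < total) {
        size_t remaining = total - offset;
        size_t this_chunk = remaining < max_chunk ? remaining : max_chunk;
        /* The cast is safe: this_chunk <= max_chunk <= INT_MAX. */
        if (op(buf + offset, (int)this_chunk, ctx) != 0) {
            return -1;
        }
        offset += this_chunk;
        nchunks++;
    }
    return nchunks;
}

/* Example callback: accumulates the byte counts it is handed. */
static int sum_bytes(const char *buf, int count, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += (size_t)count;
    return 0;
}
```

In real code, max_chunk would usually be INT_MAX (or a tuned smaller value); a contiguous derived datatype is the usual alternative when the payload must go out in a single call.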