
WeeklyTelcon_20200210

Geoffrey Paulsen edited this page Feb 12, 2020 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Brian Barrett (AWS)
  • Artem Polyakov (Mellanox)
  • Todd Kordenbrock (Sandia)
  • Brendan Cunningham (Intel)
  • Austen Lauria (IBM)
  • Edgar Gabriel (UH)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Ralph Castain (Intel)
  • Nathan Hjelm (Google)
  • Michael Heinz (Intel)
  • William Zhang (AWS)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)

not there today (I keep this for easy cut-n-paste for future notes)

  • Noah Evans (Sandia)
  • Joshua Ladd (Mellanox)
  • Thomas Naughton (ORNL)
  • Charles Shereda (LLNL)
  • David Bernholdt (ORNL)
  • George Bosilca (UTK)
  • Matthew Dosanjh (Sandia)
  • Brandon Yates (Intel)
  • Erik Zeiske
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Xin Zhao (Mellanox)
  • mohan (AWS)
  • Akshay Venkatesh (NVIDIA)

New Business

  • Jeff is going to register everyone for Face to Face after the call.

    • If you're coming to the Face to Face, please add yourself to the wiki now.
  • MTT -

    • If you change your MTT setup to start up PRRTE at the beginning of the session and then just use prun, you can see run times cut in half or more.
    • This is good, but we also need to test the mpirun wrapper.
    • Cisco is converting half of its MPI installs to use prrte/prun.
  • AWS, which can scale out horizontally, will continue to do both.

  • PRRTE Transition:

    • ORTE is gone, PRRTE is in its place. Expect some hiccups.
    • A bunch of MTT failures, because people need to update their command lines for the - vs -- option changes.
    • A number of Fortran failures that don't make much sense.
  • IBM MTT is hitting an IOF issue, where a file descriptor shuts down and libevent spins hard.

  • PRRTE - Josh turned on CI.

    • The auto labeller is not there yet. Still experimenting.
    • Would like to get the OMPI side running the prte option.
    • Whenever we move the PMIx or PRRTE submodule pointer, it'll label the PR.
  • Anyone can click the override-merge button.

    • Hasn't been an issue, but remember this won't trigger PR based hooks.
    • Still 1+ month of effort before Open MPI v5.0 could be ready with this.
    • see: https://github.com/openpmix/prrte/issues/298 for additional mpirun launch items
  • OMPI master submodule pointers are set up to track PMIx and PRRTE master.

    • Hopefully long term, master can track release branches.
    • But still ensure there's some regression tracking of master/master/master.
    • But once things settle down, might not want everyone's masters tracking each other.
    • But if we DON'T have master/master/master, then new features that span across repos will be challenging.
    • OMPI v5.0 might want to trigger a major revision of other dependencies (PMIx and PRRTE)?
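
A rough sketch of the MTT note above about starting PRRTE once per session and then just using prun. The Python below only builds the command lines such a session might run; it does not execute anything. The function name build_session_commands, the test binary names, and the process count are hypothetical, and the prte --daemonize, prun -n, and pterm invocations are assumptions that should be checked against the PRRTE build installed on the test machine.

```python
import shlex
from typing import List


def build_session_commands(test_binaries: List[str], nprocs: int = 2) -> List[List[str]]:
    """Return the command lines for one test session that reuses a single PRRTE DVM.

    Assumption: `prte --daemonize` starts the persistent DVM, `prun -n N` launches
    into it, and `pterm` shuts it down. Verify these against your PRRTE version.
    """
    commands = [["prte", "--daemonize"]]  # start the DVM once per session
    for binary in test_binaries:
        # each test is launched into the already running DVM
        commands.append(["prun", "-n", str(nprocs), binary])
    commands.append(["pterm"])  # tear the DVM down at the end of the session
    return commands


if __name__ == "__main__":
    # hypothetical test binaries, for illustration only
    for cmd in build_session_commands(["./ring_c", "./hello_c"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The point of the layout is that the startup cost is paid once, rather than once per test as with a per-test mpirun. The mpirun wrapper path still needs its own coverage, as noted above.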

Old Business

  • Singleton comm-spawn... how do we make this work? - PMIx understands it.
    • Do we need to support singleton comm-spawn starting the PRRTEs?
    • Now that we will support a persistent infrastructure, maybe we just require users to start it first.
    • Address comm-spawn issues that have been raised.

Release Branches

Review v3.0.x Milestones v3.0.6

Review v3.1.x Milestones v3.1.6

  • Jeff found a compilation issue and filed 7361.

Review v4.0.x Milestones v4.0.3

  • v4.0.3 in the works.
    • Will put out an rc at the end of this week, once PMIx 3.1.5 releases.
    • Schedule: Feb 21.

v5.0.0

  • Schedule: No real schedule yet.
    • No release managers selected yet.
    • IBM (Geoff with Austen's help) is interested.

Face to face

  • Portland Oregon, Feb 17, 2020.
  • Please register on Wiki page, since Jeff has to register you.
  • Date looks good. Feb 17th, right before the MPI Forum.
    • 2pm Monday, and maybe most of Tuesday.
    • Cisco has a Portland facility and is happy to host.
    • About a 20-30 min drive from the MPI Forum; will probably need a car.

Infrastructure

Review Master Pull Requests

CI status


Dependencies

PMIx Update

  • PMIx v3.1.5 rc2 posted this week. Release should be Friday.
  • CI testing only checks that it builds and that it ran, but doesn't test HOW it ran.
    • Environment setup can be a bit different.
    • For example, no permissions in /tmp. A test might pass on one machine and fail on another without /tmp permissions.
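
To illustrate the /tmp point above, here is a minimal sketch of an environment probe that a CI job could run before the real tests, so that a failure caused by the machine setup is reported as such. The helper name check_tmp_usable is hypothetical and is not part of PMIx or any existing CI scripts. It only checks that a scratch file can be created in the given directory.

```python
import os
import tempfile


def check_tmp_usable(directory=None):
    """Return (ok, reason) saying whether a scratch file can be created in directory.

    Defaults to the platform temp directory (typically /tmp on Linux).
    """
    directory = directory or tempfile.gettempdir()
    if not os.path.isdir(directory):
        return False, f"{directory} is not a directory"
    try:
        # actually create and write a file; checking mode bits alone can mislead
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"probe")
    except OSError as exc:
        return False, f"cannot create files in {directory}: {exc}"
    return True, f"{directory} is writable"


if __name__ == "__main__":
    ok, reason = check_tmp_usable()
    print(("OK: " if ok else "ENVIRONMENT PROBLEM: ") + reason)
```

Logging the result at the start of a run makes it easier to tell an environment difference between machines apart from a genuine regression.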

ORTE/PRRTE

MTT


Back to 2019 WeeklyTelcon-2019
