
WeeklyTelcon_20180403

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Akvenkatesh
  • Artem Polyakov
  • David Bernholdt
  • Edgar Gabriel
  • Howard
  • Josh Hursey
  • Nathan Hjelm
  • Todd Kordenbrock
  • Xin Zhao

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.3

  • v2.1.4 - Targeting Oct 15th.
    • Merged in a bunch of stuff.
    • One-sided multithreaded bugs that came up.
      • Doesn't feel like it's worth fixing in v2.1.x, so instead pulled configury changes from v2.0 to v2.1.x.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • v3.0.1 went out the door.
    • Oops, did not get the PMIx compatibility pieces into the embedded PMIx.
  • v3.0.2 open for bugfixes. Probably a quick turnaround on this.
    • Will pre-emptively fix PMIx compatibility pieces to pick up PMIx v1.2.5 clients.
    • This will bring in PMIx compatibility with OMPI client (mpirun/orted/libmpi) from OMPI v2.1.3
  • memkind disable needs to get into v3.0.2
  • Josh Ladd should review 4398

Review v3.1.x Milestones v3.1.0

  • Brian not here today.
  • PR4977 caused corruption. Nathan PRed a fix last week.
    • fixed in v3.0.1 and v3.1.0
  • Cisco is seeing a number of Spawn issues on v3.1.x in MTT.
    • Most of these were oversubscription issues, needed ini changes
    • Still seeing another 100 or so failures; he expects the same cause. Possibly new regressions.
    • Jeff still needs to look at these.

Review Master Master Pull Requests

  • All 32bit builds failed in CI for all PRs over the weekend. Brian fixed it yesterday.

Other topics

  • Ending Open MPI Mirrors program
    • The website used to be "mirror friendly" so others could host it around the world.
    • Now, Internet connectivity and bandwidth are much better.
    • Moving behind Amazon's CloudFront performs this mirroring for us.
    • Sending out the "So Long, and Thanks for All the Fish" message to mirrors.
    • Nightly and release downloads will be moved to download.open-mpi.org (served from Amazon's S3). Fast!
    • No longer version controlled.
  • Review Open MPI / PMIx embedding / SLURM - client & server version compatibilities.
    • Reviewed a Google Docs spreadsheet that Jeff shared. It was sent out in email on the discussion list.
    • Artem commented on some SLURM compatibilities.
      • SLURM 16.05 - supports PMIx v1.x.
      • SLURM 17.11 - can be configured with either PMIx v1.x or v2.x.
      • SLURM does not embed PMIx; it must be configured against an external PMIx.
    • Implications for OpenMPI
      • PMIx client v1.2.3 with server v1.2.3 works. (All testing of a version against itself works.)
      • This graph is coming from a PMIx client / server standpoint, and describes
      • Weren't there some blanket cross-version support statements?
        • v1.2.5, v2.0.3, v2.1.1, v3.0.0
      • How is PMIx dstore represented in this graph? An ORTE MCA parameter is needed for client/server mismatch.
    • There is a 3rd chart to describe what testing should be done.
    • This chart does not describe configuring with external PMIx, and compatibility.
      • Containers and externals are different, to be discussed later.
    • Need to figure out how to discuss this with Users.
      • Perhaps discussing compatibilities between users' tools (ORTE / SLURM / mpirun / debuggers / etc.)
    • One of the good things about PMI v1 and v2 is that their interfaces stayed the same for years.
      • Also, with PMIx supporting multiple "levels", the message is no longer "use PMI v1/v2 everywhere"; there are various levels of support / compatibility everywhere.
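
As an illustration only, the SLURM / PMIx notes above could be captured in a small lookup table. This is a minimal sketch, not the shared spreadsheet: the names `SLURM_PMIX_SUPPORT` and `slurm_supports_pmix` are hypothetical, and the table holds only the two SLURM releases mentioned in these minutes. A `False` result for any other release means "not recorded here", not "known to be incompatible".

```python
# Hypothetical table built only from the notes in these minutes:
#   SLURM 16.05 supports PMIx v1.x
#   SLURM 17.11 can be configured with PMIx v1.x or v2.x
# SLURM does not embed PMIx, so "supports" means "can be configured against".
SLURM_PMIX_SUPPORT = {
    "16.05": {1},
    "17.11": {1, 2},
}


def slurm_supports_pmix(slurm_release: str, pmix_version: str) -> bool:
    """Return True if the notes record this SLURM release as usable with
    the given PMIx version (matched on the PMIx major version only)."""
    # Accept both "v1.2.5" and "1.2.5" style version strings.
    major = int(pmix_version.lstrip("v").split(".")[0])
    # Releases missing from the table yield an empty set, hence False.
    return major in SLURM_PMIX_SUPPORT.get(slurm_release, set())


if __name__ == "__main__":
    for release, version in [("16.05", "v1.2.5"), ("16.05", "v2.0.3"), ("17.11", "v2.1.1")]:
        print(release, version, slurm_supports_pmix(release, version))
```

Matching on the major version alone is an assumption made for brevity; the client / server cross-version questions raised above would need a finer-grained table.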

MTT / Jenkins Testing Dev

  • IBM CI is back up
  • Cisco and IBM MTT didn't trigger last night.

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

