WeeklyTelcon_20180515

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Todd Kordenbrock
  • Joshua Ladd
  • Brian
  • David Bernholdt
  • Howard Pritchard
  • Xin Zhao
  • Matthew Dosanjh

not there today (I keep this for easy cut-n-paste for future notes)

  • Geoffroy Vallee
  • Dan Topa (LANL)
  • Jeff Squyres
  • Akvenkatesh
  • Nathan Hjelm
  • Ralph
  • Josh Hursey

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.4

  • v2.1.4 - Targeting Oct 15th.
  • Lower priority than v3.0 and v3.1.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • Schedule:
    • Quick turnaround on this; shooting for May 1st.
  • Posted v3.0.2 RC1 yesterday
    • Hope to be
    • Need to sort out shared library version numbers.
    • Jeff will finish looking at the PR today or tomorrow.

Review v3.1.x Milestones v3.1.0

  • Shipped v3.1.0 last week
  • Merged in most of the outstanding changes on v3.1.x
    • 3 still awaiting review.
    • Can someone look at opal_free_list fix?
    • PR4397 - UCX
  • Schedule
    • Like to hold to a one-month turnaround.
    • Next week, will cut a release candidate.
  • Long outstanding list of PRs for v3.1.x branch.
    • 4 or 5 need review. Geoff is tagged for review on one. (done)
    • Will hold off about a week in case we need to do a quick-turnaround "oops" release.
  • Mellanox v3.1.x MTT has many many failures. Josh will look at.
  • For v3.1.1 - Want to get UCX OSC at a higher priority.
    • Issue 5048
    • Looking pretty good. Howard brought up some issues on a single node with xpmem.
      • UCX bug.
      • Xin will rerun and see where we stand.
    • xpmem can be disabled via env var.
    • Issue with ConnectX-3 atomic support. UCX limitation.
      • For v3.1.1, some want fallbacks or errors, but it must not segv.
    • For v4.0
      • Mellanox planning to do emulation on CPU if the IB card can't do HCA atomics.
      • Still need a check in OMPI, in case they're running with an old UCX.

v4.0.0

  • Schedule: mid-July branch, mid-Sept release.

  • Start meeting weekly.

    • iWarp have a person to contact.
    • Unclear if UCT BTL will work on ConnectX-3 or Broadcom RoCE.
  • UCX Community has committed to doing Emulation in UCX.

    • UCX + ConnectX-3 will work for pt2pt and collectives, but not RMA.
    • Will emulate for v4.0.0
  • It would be nice to have a doc that lists the supported hardware and which drivers to use.

  • As a heads up, ULFM support may require PMIx v3.0.

  • Removing items that were removed from the MPI standard, in Open MPI v4.0

    • Nathan sent out email about PR 5127 - to remove all MPI2.x standard items.
    • A little weird to be able to pull back MPI1 removed items.
      • Let's remove these too at the same time.
    • C++ bindings are a separate pull request, PR 5128. Goal is to have these removed as well.
      • Sent poll out - 12 of 41 responses. 1 of 12 has said they're using C++ bindings, but not sure if it's accurate.
      • If C++ bindings are sufficiently isolated we could move them to a separate repo.
      • But if no one is really using them, let's just remove it all.
  • Let's turn off more building by default.

    • Forum didn't REMOVE everything that was deprecated in MPI v3.0 standard.

Review Master Pull Requests

  • Last week: OSHMEM v1.4 - not sure if we have to drop the deprecated APIs, curious OMPI is dropping deprecated APIs...
    • Only remove things removed from the OSHMEM standard, not things deprecated, as "deprecated" means it will be removed from a future version of the standard. If some APIs were removed from the standard, then ask the oshmem email list for their thoughts.
    • Xin should be able to push first version of OSHMEM v1.4 changes to master next week or so.
    • Xin should be pushing today or tomorrow... It's been passing some simple tests.

PMIx

Other topics

  • Rolled back on putting all tarballs in S3 and using CloudFront (for distributing tarballs)
    • Some issue with mirrors.
    • Rolling this back out today.
    • test tarball

MTT / Jenkins Testing Dev

  • Got compiler licenses for NAG compiler, and Absoft
    • Both Fortran
    • No progress.
  • Get copy of perl JSON, and put it on MTT.
    • DONE

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
