WeeklyTelcon_20181106

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (in Person)

  • Geoff Paulsen
  • Jeff Squyres
  • Brian
  • David Bernholdt
  • Geoffroy Vallee
  • Howard Pritchard
  • Josh Hursey
  • Matias Cabral
  • Matthew Dosanjh
  • Nathan Hjelm
  • Ralph Castain
  • Thomas Naughton
  • Todd Kordenbrock
  • Xin Zhao

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
    • In the web-ex but no audio.
  • Edgar Gabriel
  • Aravind Gopalakrishnan (Intel)
  • Arm (UTK)
  • Dan Topa (LANL)
  • George
  • Peter Gottesman (Cisco)
  • Joshua Ladd?
  • mohan

Agenda/New Business

  • Vader Issue 6014 - Acquire thread fence anywhere we're using the sync builtin primitives.

    • Symptom is a hang.
    • Reproducer is a tight loop on a barrier.
    • Need a FAQ entry on this.
    • Workaround is to configure with --disable-builtin-atomics
      • No real downside from doing this.
    • 3.x, 4.x, maybe 2.1.x?
    • master wasn't affected because it uses C11 by default.
    • Wherever this goes back to, we need to update PMIx there too, since it's the same code.
    • End of the day, this will drive a new release on v2.x, v3.0 and 3.1
  • Face to Face was next week

  • Summary of PMIx re-architecting for v5.0

  • Lots of TCP wire-up discussion

  • GitHub suggestion on email filtering
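
To illustrate the acquire-fence point in the Vader Issue 6014 item above, here is a minimal, generic sketch. It is not the vader BTL code and not the actual fix. It assumes GCC or Clang builtins and POSIX threads, and every name in it (`run_handoff`, `payload`, `flag`, `ack`) is hypothetical. A producer publishes data and then a flag in a tight loop; the consumer spins on the flag and issues an acquire fence before reading the data the flag guards.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical shared state; none of these names come from Open MPI. */
static uint64_t payload;     /* plain data, protected only by the flag protocol */
static uint64_t flag;        /* round number published by the producer */
static uint64_t ack;         /* round number acknowledged by the consumer */
static uint64_t rounds;      /* number of hand-offs to perform */
static uint64_t mismatches;  /* stale payload reads seen by the consumer */

static void *producer(void *unused)
{
    (void)unused;
    for (uint64_t i = 1; i <= rounds; i++) {
        payload = 2 * i;                               /* write the data */
        __sync_synchronize();                          /* full barrier (legacy __sync builtin) */
        __atomic_store_n(&flag, i, __ATOMIC_RELAXED);  /* publish the round */

        /* Wait for the consumer before reusing the payload slot. */
        while (__atomic_load_n(&ack, __ATOMIC_RELAXED) != i)
            sched_yield();
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }
    return NULL;
}

static void *consumer(void *unused)
{
    (void)unused;
    for (uint64_t i = 1; i <= rounds; i++) {
        while (__atomic_load_n(&flag, __ATOMIC_RELAXED) != i)
            sched_yield();
        /* Acquire fence: the payload read below may not be
         * reordered before the flag load above. */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        if (payload != 2 * i)
            mismatches++;
        __sync_synchronize();                          /* order the payload read before the ack */
        __atomic_store_n(&ack, i, __ATOMIC_RELAXED);
    }
    return NULL;
}

/* Runs n hand-offs; returns the number of stale reads,
 * or UINT64_MAX if the producer thread could not be started. */
uint64_t run_handoff(uint64_t n)
{
    pthread_t p;
    payload = flag = ack = mismatches = 0;
    rounds = n;
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return UINT64_MAX;
    consumer(NULL);          /* the calling thread acts as the consumer */
    pthread_join(p, NULL);
    return mismatches;
}
```

Built with something like `cc -pthread` and called from a small `main`, `run_handoff(100000)` is expected to return 0. Dropping the acquire fence in `consumer` permits the compiler or a weakly ordered CPU to read `payload` early, which may show up as a nonzero count (or, in a real barrier implementation, as a hang).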

Minutes

Review v2.1.6 (not going to do this in the immediate future)

  • Schedule:
  • Dec 1st, after Supercomputing

Review v3.0.x Milestones v3.0.3

  • Schedule:
  • 3.0.3 shipped yesterday.
  • 3.0.4 is scheduled for May 2019.

Review v3.1.x Milestones v3.1.0

  • Schedule:
  • 3.1.3 shipped yesterday.
  • 3.1.4 is scheduled for April 2019.

v4.0.0

  • Schedule: Shoot
  • on closed PR -
    • blocking items:
    • Vader: atomics fix for v4.0.0 Issue 6014
    • LSF configure change to match support change README
    • Probably new PMIx version with this fix as well.
  • Two rankfile mapper issues reported on mailing list. Howard will file issue.
  • Want to issue MPR deprecation warning in (master) PR 5947
  • Need to add to FAQ, README, and NEWS (Important giant changes)
    • Decided to make a special website; we've never removed MPI symbols before.
    • Make it googlable.
    • MPI-1 removal
      • How to transition from MPI_UB/MPI_LB
      • Symbols that are missing.
      • How to transition is in the MPI 2.2 standard doc, but not in 3.1... haha
    • openib removal for InfiniBand cards.
      • go get ucx (or use this mca param)
    • configury defaults to external for some components (pmix, hwloc, libevent)
  • Mellanox will write up some FAQ entries for UCX in Open MPI
    • Jeff made some changes to ompi-www PR.
  • NVIDIA will write up some GPU FAQ entries
    • Added a PR; just needs to sign the commit.

PMIx

  • Nothing new at this time.
  • PMIx will need this vader fix that Nathan will be PRing soon.

Master

  • IBM gets a build failure, looks like it's cluster related.
    • openib is failing out at IBM MTT with device failure.
  • Absoft started failing.

Supercomputing Open MPI BOF and PMIx BOF

  • Invite your friends to BOF.
    • Wed is a social thing. (Jeff Squyres is organizing - email if interested)

MTT

  • Cisco has a one-sided info check that failed a hundred times.
    • Cisco install fail looks like a legit compile fail (ipv6 master)

New topics

  • We have a new ibm-ompi Slack channel for Open MPI developers.
    • Not for users, just developers...
    • Email Jeff if you're interested in being added.

Review Master Pull Requests

  • Didn't discuss today.

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
