WeeklyTelcon_20180522

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Edgar Gabriel
  • Joshua Ladd
  • Todd Kordenbrock
  • Xin Zhao
  • Jeff Squyres
  • Geoffroy Vallee
  • Thomas Naughton
  • Brian

not there today (I keep this for easy cut-n-paste for future notes)

  • David Bernholdt
  • Howard Pritchard
  • Matthew Dosanjh
  • Dan Topa (LANL)
  • Akvenkatesh
  • Nathan Hjelm
  • Ralph
  • Josh Hursey

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.4

  • v2.1.4 - Targeting Oct 15th.
  • Lower priority than v3.0 and v3.1.
  • No new news on v2.1.x.

Review v3.0.x Milestones v3.0.2

  • Schedule:
    • Hope to post v3.0.2 later this week
  • Ready to ship v3.0.2
    • Need to sort out shared library version numbers.
      • No surprises here, just some Fortran bits. After some testing (forgot whether this was for 3.0.2 or 3.1.1), decided these were code changes, not interface additions.
      • Done and ready to go.
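
The "code changes, not interface adds" decision matters because shared library versions follow the libtool `current:revision:age` scheme, where each kind of change bumps a different field. A minimal sketch of those update rules as they are described in the libtool manual; the `LibtoolVersion` type, the `bump` helper, and the starting values are hypothetical illustrations, not Open MPI's actual tooling or version numbers:

```python
from typing import NamedTuple


class LibtoolVersion(NamedTuple):
    """Hypothetical container for a libtool current:revision:age triple."""
    current: int
    revision: int
    age: int


def bump(v: LibtoolVersion, *, code_changed: bool = False,
         interfaces_added: bool = False,
         interfaces_removed_or_changed: bool = False) -> LibtoolVersion:
    """Apply the libtool update rules to one library for one release."""
    current, revision, age = v
    interface_change = interfaces_added or interfaces_removed_or_changed

    # Any source change at all: increment revision.
    if code_changed or interface_change:
        revision += 1
    # Any interface added, removed, or changed: increment current, reset revision.
    if interface_change:
        current += 1
        revision = 0
    # Interfaces added: older binaries still work, so increment age.
    if interfaces_added:
        age += 1
    # Interfaces removed or changed: backward compatibility is broken, reset age.
    if interfaces_removed_or_changed:
        age = 0
    return LibtoolVersion(current, revision, age)


# Hypothetical starting point; not a real Open MPI library version.
before = LibtoolVersion(40, 1, 0)
code_only = bump(before, code_changed=True)
added = bump(before, code_changed=True, interfaces_added=True)
```

Under these rules a code-only change, as decided above, touches only the revision field, while an interface addition would have moved both current and age.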

Review v3.1.x Milestones v3.1.0

  • Merged in most of the outstanding changes on v3.1.x
    • PR4397 - UCX
  • Schedule
    • Shooting for early June.
    • Next week, will cut a release candidate.
  • Long outstanding list of PRs for v3.1.x branch.
    • 4 or 5 need review. Geoff is tagged for review on one. (done)
    • Will hold off about a week in case we need to do a quick turn-around oops release.
  • Mellanox v3.1.x MTT has many, many failures. Josh will look at them.
    • Boris was looking at this, something with new PMIx. Should be fixed.
  • IBM MTT hasn't run since April, and is now running again.
  • RESOLVED UCX xpmem issue
    • looking pretty good. Howard brought up some issues on single node with xpmem.
    • xpmem can be disabled via env var.
  • Create an issue for v3.1.1 - gpaulsen
    • Issue with ConnectX-3 atomic support. UCX limitation.
      • For v3.1.1, some want fallbacks or errors, but it must not segfault.
    • For v4.0
      • Mellanox is planning to do emulation on the CPU if the IB card can't do HCA atomics.
      • Still need a check in OMPI, in case they're running with an old UCX.

v4.0.0

  • Schedule: mid-July branch, mid-Sept release.
  • Start meeting weekly.
    • iWARP: have a person to contact.
    • Unclear if UCT BTL will work on ConnectX-3 or Broadcom RoCE.
  • At the developer meeting we discussed removing the old use-mpi Fortran module.
    • Can't remove it since RHEL 7.x is using gfortran 4.8.5
  • UCX community has committed to doing emulation in UCX.
    • UCX + ConnectX-3 will work for pt2pt and collectives, but not RMA
    • Will emulate for v4.0.0
  • It would be nice to have a doc that is the set of supported hardware / and which drivers to use.
  • Removal in Open MPI v4.0 of items that were removed from the MPI standard
    • Nathan sent out email about PR 5127 - to remove all MPI2.x standard items.
    • A little weird to be able to pull back MPI1 removed items.
      • Let's remove these too at the same time.
    • Delete all of this in OMPI 5.0
  • C++ bindings are a separate pull request, PR 5128. Goal is to have these removed as well.
    • Nathan has a PR to put deprecated warnings on ALL MPI1 stuff.
    • Delete all of this in OMPI 5.0
  • Let's turn off more building by default.
    • Forum didn't REMOVE everything that was deprecated in MPI v3.0 standard.
  • go over v4.0
    • I'll ping George on the first two.
      • Does the UCT BTL include pt2pt? Add in support for send/recv methods.
  • hwloc v2.0.x
    • Jeff or Geoff
  • SPC - Jeff had some comments before pulling.
  • Need to check status of MPI 3.2 standard.
  • Fujitsu - PMIx persistent collectives... look at mailing list.
  • libevent and hwloc - Jeff will look at configury for making external preferred
  • OSHMEM
    • Mellanox is making good progress.
    • Do not build OSHMEM if a viable SPML cannot be built - Brian posted PR.
  • Update ROMIO - Gilles might be a good candidate for that (experienced).
  • Not going to remove Fortran MPI TKR module - will
  • Improved performance for single-threaded and MT - Nathan
  • MPMD Support for SLURM 17.11 - Howard.
    • Feature might be somewhat buggy.
    • Nasty way to launch daemons if using srun.
    • If you use PMI2 instead of PMIx
    • Won't be a PMIx v3.0 for v4.0 timeframe.
      • Howard is driving to make it at least build, but not use new features.
  • iWarp - has a rewrite to rename openib
    • Will test on new cards this/next week.
    • ConnectX-3 UCX uses pt2pt; from this perspective it is good to drop the openib BTL.
  • Add to v4.0 list:
    • Edgar: Vulcan component - waiting for one more commit from student.
    • Add support for CUDA buffers in OMPI-IO
    • A couple of updates for the Lustre component, but not sure if it will make it default. time we could switch

Review Master Pull Requests

  • PR5180 - Remove MXM MTL action item from developer's meeting.
    • Mellanox approved.
  • Last week: OSHMEM v1.4 - not sure if we have to drop the deprecated APIs; curious whether OMPI is dropping deprecated APIs...
    • Only remove things that were removed from the OSHMEM standard, not things that are deprecated, since "deprecated" means it will be removed from a future version of the standard. If some APIs were removed from the standard, then ask the oshmem email list for their thoughts.
    • Xin should be able to push first version of OSHMEM v1.4 changes to master next week or so.
    • Xin should be pushing today or tomorrow... It's been passing some simple tests.
  • Edgar has a new component with a weird name, we need to

PMIx

  • As a heads up ULFM support may require PMIx v3.0

Other topics

  • All Tarballs in S3
  • Set an end date for web-mirrors... end of June.

MTT / Jenkins Testing Dev

  • Got compiler licenses for NAG compiler, and Absoft
    • Both Fortran
    • No progress.
  • Get copy of perl JSON, and put it on MTT.
    • DONE

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
