
WeeklyTelcon_20180130

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Brian
  • Edgar Gabriel
  • Geoffroy Vallee
  • Howard
  • Matthew Dosanjh
  • Josh Ladd
  • Mohan

A number of usuals not here today:

  • Jeff Squyres
  • akvenkatesh
  • Artem
  • Josh Hursey
  • Todd Kordenbrock
  • Nathan

Agenda/New Business

  • News: Ralph will not be able to work on Open MPI anymore. He will continue to work on PMIx, but not even the Open MPI PMIx merge.
  • We need a v3.1 release engineer to help. Brian will send email to devel-core.
  • The MPI Forum meets in Portland in a little over a month.
    • Nothing for OMPI Community at this time.
  • Face2Face -
    • Ralph offered to have a brain dump day. Email Brian if interested.
    • Current thought is to co-locate PMIx and MPI meeting in Dallas.
    • Consensus for Week of March 20-22
    • Face to Face - if we set it for the week of March 20-22 we can co-locate PMIx and OMPI
      • Assuming about half day of orte.

Minutes

Review v2.x Milestones v2.1.3

  • No chance to look at.
  • Due to lack of schedule and interest, this is pushed back to March 1st.

Review v3.0.x Milestones v3.0.1

  • Schedule: RC2 posted.
    • Looking for feedback, thumbs up or down.
    • Will have an RC3 after a few more PRs are pulled in.
  • Target v3.0.x in PR4715
    • Review required.
  • Will Pull in PR4716
    • Issue 4563
      • Not seeing this on the little ARM boxes here; Jenkins uses --disable-builtin-atomics.
  • Comm Spawn - Documentation PR ready or pulled
  • Issue 4509
    • We believe this is closed. Asked Nathan to close.
  • Issue - hwloc can't handle cuda from a different location
    • On Master specifically disabling hwloc cuda.
    • External component does NOT disable build, since
  • 4677 - hwloc2 WIP, may need help with.

Review v3.1.x Milestones v3.1.0

  • SCHEDULE:
    • RC2 posted last week.
    • PMIx v2.1.0 - merge in rc2 or wait for final?
      • Ralph says PMIx v2.1.0 final is this week.
    • Please download and test OMPI v3.1.x RC2.
  • Blocker on v3.1.x
    • Waiting on PMIx v2.1.0 final to pull in:
    • PR4516
    • OSC monitoring fix (doesn't build with Portals 4)
    • PMIx 2.1 PR4605
      • PR4746
      • Ralph - there is cleanup issue with PMIx 2.1, but we have cleanup issues today
      • Mellanox will help work on this.
    • UCX one sided violating PR4688
    • Issue 4303
      • Probably just need to build a patch.

Review Master Master Pull Requests

  • Issue 4686
    • Something going on in there. Possibly atomic related.
    • Might need Nathan's attention.
    • Mellanox will try to reproduce after reverting atomic change. Timing issue.
  • Dynamic operations: a TON of segfaults. All in opal_progress, during ompi_sync_wait multi-credit.
    • Something is wrong with atomics. Intercomm_create or Spawn.
    • Cisco is tickling the most, and will look at.
    • Delayed.
  • PR4697 got resolved and merged to master.
    • Opal Progress change looks good for most interconnects.
    • TCP performance regression was resolved and merged to master.
    • Going to PR this into v3.1.x.
    • George has some thoughts on this.
    • Don't have any non-OS wrappers for TLS.
    • Master now checks for C11. Can we make it the default?
    • Mac Sierra may or may not, even with _Thread_local.
    • Would be nice if we could require C11 for v4.0.
  • Reg-ex expression creation.
    • PR4710
    • Someone created a test and put it in make-check rather than MTT.
    • Then made the component static so that they don't have to do make install.
    • Don't think we should be adding tests to make-check.
    • Question - Is there a Regex library we could use? Reg-ex is hard.
    • This is working pretty well, but did add Framework to allow for future components.

Process

  • Change behavior of opal_check_package
    • Brian will send email to devel
    • Make it more explicit when it finds issues
  • Issue 4423
  • When your PR has been accepted into a release branch, please go to the issue, and remove the target of the release branch that it was just merged into. Attempting to automate this in the future.

MTT / Jenkins Testing Dev

  • New Topic - We currently can't write unit tests against components.
    • Some way to say "this unit test is against this component".
    • Intel went through and did this internally for orte. Already hosted in public domain.
      • Ralph will send link to Brian to take a look.
  • Python Client can't report back to database.

Abandoning OpenIB BTL

  • Discuss abandoning openib btl.
    • LANL is no longer paying anyone to maintain the openib BTL.
      • Nathan has a UCX BTL
    • ETA on GPU in UCX - basic minus CUDA IPC in test now.
    • Any warning message if on iWARP?
    • What's the roadmap for this? 3.x or 4.x?

Oldest PR

Oldest Issue

Next face-to-face meeting

  • Pushed date to late Feb or March.

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017
