
WeeklyTelcon_20171212

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brian
  • Howard Pritchard
  • Nathan Hjelm
  • David Bernholdt
  • Ralph
  • Thomas Naughton
  • Todd Kordenbrock
  • Geoffroy Vallee
  • Joshua Ladd
  • Artem
  • Josh Hursey

Agenda

Review v2.0.x Milestones v2.0.4

  • put some PRs on branch, but not forcing a new release.

Review v2.x Milestones v2.1.2

  • A few PRs coming through.
  • Bugfix only mode.
  • LaunchMON / Allinea attach mode - Issue 3660. This mechanism is part of MPIR.
    • Regressed from 2.0 to 2.1.
    • MPIR attach FIFO - never tested this well.
    • It'd be good to have a version of STAT on test.
    • LaunchMON has its own integration tests. Problem is it either hangs, or works.
    • Ralph will send recipe to devel mailing list.
    • We will scope the scale of the fix (should be easier to see what regressed from v2.0 rather than backport something from v3.0).

Review v3.0.x Milestones v3.0

  • 4509 madvise hook - Jeff and Howard will discuss.
    • Now that we hook madvise, we need to be more careful.
    • Nathan hopes his PR 4576 on master would reduce the occurrences to 0, but need user to verify.
      • May have to invalidate a LARGE region, even though it's mostly valid, just because glibc invalidated a small part of it.
    • Will test 4576 in master tonight, and then merge into v2.x, v3.0.x and v3.1.x
    • Any other blockers for v3.0.1? Need to tag them very soon. Want to release before end of holidays.
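
The trade-off noted above, where a small range released by glibc can force a much larger cached region to be dropped, can be illustrated with a toy model. This is a minimal sketch and not Open MPI's actual registration cache code: the `invalidate_overlapping` helper, the `(base, size)` tuple layout, and the addresses are all hypothetical, and it assumes the cache tracks whole regions that cannot be split.

```python
def invalidate_overlapping(cache, start, length):
    """Drop every cached region that overlaps [start, start + length).

    `cache` is a list of (base, size) tuples standing in for cached
    memory regions (a hypothetical layout). Returns (kept, dropped).
    """
    end = start + length
    kept, dropped = [], []
    for base, size in cache:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = base < end and start < base + size
        (dropped if overlaps else kept).append((base, size))
    return kept, dropped


# A 1 MiB cached region and an unrelated 4 KiB region (made-up addresses).
cache = [(0x100000, 1 << 20), (0x400000, 4096)]

# Only a 4 KiB range inside the large region is released...
kept, dropped = invalidate_overlapping(cache, 0x180000, 4096)

# ...yet the whole 1 MiB entry is dropped, because the entry cannot be split.
print("kept:", kept)
print("dropped:", dropped)
```

In this model the cost of invalidation is set by the size of the cached entry, not by the size of the range that actually changed, which is the concern raised in the discussion.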

Review v3.1.x Milestones v3.1

  • v3.1.0 still has Blocker Issue 4509
    • Hope it was fixed in PR 4576 in master tonight, to merge in later.
  • Dist Graph Create / Tree Create is still segfaulting - but others can't reproduce.
  • Blockers:
    • madvise
    • IMB fails with vader
    • update to PMIx v2.1.0
      • bugfix - dependency in PMIx, Mellanox is working on this.
      • PR4606 PGI failure in OMPI pull request, but it passed in PMIx.
  • Issue 4465 - rsh launching failures; Mellanox is looking at it. Not sure if it's v3.0.0, v3.1.x or master only.

Review Master Master Pull Requests

  • Issue 4303 What do we do with Treematch?
    • segfaults frequently, so maybe turn it off by default. Component in topo, creates graphs when you create a communicator.
    • If you get a reproducer, then update ticket and hand to George.

MTT / Jenkins Testing Dev

  • Seems to be a memory leak in the OMPI Jenkins
    • Working on a solution
    • Workaround by turning off pipeline builds.


This week Discussion Points.

  • Brian sent an email earlier this week about News file
    • Either we make merging painful for developers, or we create a rather large amount of work for release managers.
    • Can automate via Pull Request that ends up in the merge.
    • Pull Request would carry a block such as: NEWS: whatever you want NEWS to be.
    • With metadata, using Pull Requests, then can change that NEWS block after the fact.
    • Would happen at make dist time. Public API calls.
  • WebEx Schedule: WebEx next Tuesday, Dec 19 (unless 0% chance of getting v3.0.1 out)
    • Cancel Dec 26,
    • Cancel Jan 2nd
  • Jeff will create new WebEx URL for 2018.
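
The NEWS proposal discussed above (a NEWS block carried in each Pull Request and collected at make dist time) could look roughly like the following. This is a sketch only: the minutes do not define the block syntax, so the `NEWS:` line marker, the blank-line terminator, the `extract_news` helper, and the sample Pull Request data are all assumptions, and no real GitHub API calls are made.

```python
def extract_news(pr_body):
    """Return the text of the first NEWS: block in a pull request body.

    Assumed convention (hypothetical): the block starts on a line that
    begins with "NEWS:" and runs until the next blank line or the end
    of the text. Returns None when no block is present.
    """
    lines = pr_body.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("NEWS:"):
            block = [line[len("NEWS:"):].strip()]
            for follow in lines[i + 1:]:
                if not follow.strip():
                    break
                block.append(follow.strip())
            return " ".join(part for part in block if part)
    return None


# Made-up pull request descriptions keyed by a made-up PR number.
pull_requests = {
    101: "Fix a hang.\n\nNEWS: Fix hang in debugger\nattach mode.\n\nSigned-off-by: dev",
    102: "Internal cleanup only.",
}

# Collect one NEWS line per pull request that carries a block.
news_entries = []
for number, body in sorted(pull_requests.items()):
    text = extract_news(body)
    if text is not None:
        news_entries.append(f"- {text} (PR {number})")

print("\n".join(news_entries))
```

Because the entry would live in the Pull Request text rather than in a tracked file, editing the description later would change what gets generated, which matches the "change that NEWS block after the fact" point in the notes.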

Oldest PR

Oldest Issue

Next face-to-face meeting

  • See on list email
    • Tentatively: Jan 23-25
    • Geoff will send email to devel-core asking if the community wants to push the date back to Feb or March.

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017
