
WeeklyTelcon_20200526

Geoffrey Paulsen edited this page May 26, 2020 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Jeff Squyres (Cisco)
  • Austen Lauria (IBM)
  • Aurelien Bouteiller (UTK)
  • Barrett, Brian (AWS)
  • Brendan Cunningham (Intel)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Joshua Ladd (nVidia/Mellanox)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Intel)
  • Naughton III, Thomas (ORNL)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • Geoffrey Paulsen (IBM)

not there today (I keep this for easy cut-n-paste for future notes)

  • David Bernhold (ORNL)
  • Josh Hursey (IBM)
  • William Zhang (AWS)
  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia/Mellanox)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia/Mellanox)
  • mohan (AWS)

New

  • Building with older libfabric fails, which causes all Coverity builds to also fail.
    • libfabric is preinstalled, so this should be an easy fix on aws.open-mpi.org.
    • New selection code put in by AWS broke builds against older libfabric versions.
      • Should continue to support back to v1.0.5.
      • AWS will fix.
      • Jeff filed 7753 (master-only).

Release Branches

Review v4.0.x Milestones v4.0.4

  • v4.0.4rc2 - tags, and push up to GitHub.
    • Couldn't run the tarball builder. Something's broken with it in Jenkins.
    • In the Jenkins configuration there's a bad URL to get the
    • It's now stuck. Best case, Brian can look at the tarball builder Friday.
  • The Jenkins problem is in the script that Jenkins kicks off from the ompi-scripts repo.
    • Jeff, Howard, and Brian have access.
    • Brian played with it a bit on Saturday.

Review v5.0.0 Milestones v5.0.0

  • PR7762 (filed by Giles) - we're missing some Fortran and C functions for MPI_Status

    • Jeff is reviewing.
    • Possibly wanted on v4.0.x
  • Schedule:

    • PRRTE
      • Intel is moving PRRTE to low priority.
      • Intel needs it by Q4, but end of summer at the latest.
      • It works with a tiny option set.
      • Many feature requests. Many CI options.
      • What are all the features? Many CI tests.
      • If we branch early, life just gets more painful.
      • One silver lining is that most of these features are in PRRTE alone.
        • So submodule updates are pretty easy. Doesn't require cherry-picks.
        • If there's a feature that's blocking functionality
    • Some things would be considered as major regressions against OMPI v4
      • May need an audit - what features aren't there
        • What's not there with PRRTE launch?
        • Judge priority of missing features, open issues, etc.
      • SOMEONE could do this, but it wouldn't necessarily give a good indication of v4-vs-v5 features.
    • If we drop OMPI v5.0 without a number of v4 features:
      • HUGE outcry from community if we don't support a lot of features from v4.
      • We are dying under the weight of our feature set.
    • Branching is not releasing. But putting in large features is painful after branch.
    • Everyone is approaching from "my company's" point of view, but we're managing the software for the community.
      • Both points of view are correct.
    • Set of things that none of our companies need, but that the community needs.
    • Bigger picture - how do we maintain this code? Haven't been able to maintain.
      • Right now we're saying "we need to do a release, let's see what we've got, and add more later".
      • Bigger picture - we could put together a README.md explaining why we removed some features.
        • Help set community's expectations.
    • Scope vs Schedule vs Resources
      • Probably can't
      • Can't release v5.0 because we need to have these discussions.
        • Can't release v5.0 without new runtime.
      • Only other possible solution might be a v4.1 branched off of v4.0.x somewhere
        • Would need VERY specific cherry-picking into it, and would need to avoid any runtime issues.
    • One of the v5.0 RMs will have a list by Monday of needed features.
    • Moved the branch date to June 2nd, but will discuss next week with more data.
      • Original release was planned for end of June. Will discuss more next week.

master

  • MTT on master is looking pretty good.
    • A recent commit broke some CI using older libevent.
    • AWS is working to fix this.

Face to face


Infrastructure

  • Scale-testing: PRs have to opt into it.

Review Master Master Pull Requests

CI status


Dependencies

PMIx Update

ORTE/PRRTE

MTT


Back to 2020 WeeklyTelcon-2020
