WeeklyTelcon_20171010

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen (IBM)
  • Jeff Squyres
  • Edgar Gabriel
  • Howard
  • Josh Hursey
  • Joshua Ladd
  • George
  • Mohan
  • Nathan Hjelm
  • Ralph
  • Thomas Naughton
  • Todd Kordenbrock

Agenda

Review v2.0.x Milestones v2.0.4

  • Iterating a bit on disabling CUDA inside of hwloc; PR 4249 is on this branch.
    • Issue 4248 - disabling CUDA in hwloc.
    • On all existing release branches, do -cuda=no for hwloc configury.
  • Issue 2525 - may close since users can't access.
  • Schedule: if we get PRs in today, we should aim to get v2.0.x release DONE this week.

Review v2.x Milestones v2.1.2

  • v2.1.3 (unscheduled, but probably Jan 19, 2018)

    • PR4172 - a mix between feature / bugfix.
  • Are we going to do anything for v2.x for hwloc 2?

    • At least put in a configure error if it detects hwloc v2.x.
  • hwloc is about to release v2.0.

    • If topology info comes in from outside, what hwloc was that resource manager using?
    • Is the XML annotated with which version of hwloc generated it?
    • would be nice to gracefully fail, since fairly opaque.
    • Seems like we'll need a rosetta stone for
    • HWLOC is a static framework.
    • Brice is going to get hwloc 2.0 out by Supercomputing, but it might be tight.
    • Are we comfortable releasing with an alpha/beta version of hwloc embedded?
    • OMPI 2.x will not work with hwloc 2.0, because the APIs changed.
      • May want some configure errors (not in there yet)
    • v3.0 only works with older, pre-2.0 hwloc. In v3.0.x, if it's hwloc 2.0, we error at configure.
    • In the v3.1 branch, an external hwloc can be either hwloc 2.0 or an older hwloc, but this must be decided at build time.
    • Still have to run 3.1 everywhere.
    • Do we want to backport the hwloc 2.0 support to v3.0?
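The per-branch hwloc policy discussed above can be sketched as a small version gate. This is only an illustration, not Open MPI's configury: the branch table and the `check_hwloc` helper are hypothetical, and the sketch assumes hwloc's documented API version encoding of `(X << 16) + (Y << 8) + Z` for release X.Y.Z.

```python
# Assumption: hwloc encodes its API version as (X << 16) + (Y << 8) + Z.
HWLOC_2_0_API = 0x00020000

# Hypothetical policy table distilled from the notes above.
BRANCH_ACCEPTS_HWLOC_2 = {
    "v2.x": False,    # APIs changed, so hwloc 2.0 cannot work
    "v3.0.x": False,  # errors at configure when hwloc 2.0 is found
    "v3.1.x": True,   # either flavor, chosen at build time
}


def decode_api_version(api_version: int) -> tuple:
    """Split an encoded API version into (major, minor, release)."""
    return (
        (api_version >> 16) & 0xFF,
        (api_version >> 8) & 0xFF,
        api_version & 0xFF,
    )


def check_hwloc(branch: str, api_version: int) -> str:
    """Return which hwloc flavor to build against, or fail loudly."""
    major, minor, _ = decode_api_version(api_version)
    is_hwloc_2 = api_version >= HWLOC_2_0_API
    if is_hwloc_2 and not BRANCH_ACCEPTS_HWLOC_2[branch]:
        # Failing here is the "graceful" alternative to an opaque
        # build or runtime error later on.
        raise RuntimeError(
            f"{branch} does not support hwloc {major}.{minor}; "
            "use a pre-2.0 hwloc"
        )
    return "hwloc-2" if is_hwloc_2 else "hwloc-1"
```

For example, `check_hwloc("v3.1.x", 0x00020000)` selects the hwloc 2 code path, while the same version on `"v3.0.x"` raises an error up front.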

Review v3.0.x Milestones v3.0

  • v3.0.1 - Opened the branch for bugfixes Sep 18th.
    • Still targeting End of October for release of v3.0.1
    • Everything ready to push has been.
    • a few PRs need review.

Review v3.1.x Milestones v3.1 (https://github.com/open-mpi/ompi/milestone/27)

  • Branched v3.1 last night, but forgot to build nightly tarballs.

    • Building now.
    • Cisco plans to drop v2.0.x, to pick up v3.1.
  • v3.1.x Snapshots are not getting posted. Has to do with cron failures.

    • Causing nightly MTT runs to not be run.
  • PMIx 2.1 should get in on time for v3.1

    • In master, but no PR to OMPI v3.1.x yet, since they haven't released it yet.
    • Schedule is at Risk:
      • What hwloc are we shipping.
      • PMIx on track, but needs PR.
      • Known issues on master, that need to be associated with v3.1
  • Administration

    • Looking at a way to recognize some supporting organizations to help acknowledge their support.

Review Master Master Pull Requests

  • MTT Amazon ARM v8 is failing all CI.
  • Default behavior of show load errors is true (has been true since 2006)
    • Been true for at least 8 years.
    • Don't remember this being a conscious change, maybe by accident.
    • If you're a packager, you build with all packages, so you can support everyone.
    • But then their users get a bunch of errors because they don't have everything installed.
    • Should add a configure option that sets what the default should be.
    • Jeff Squyres - signed up to do this configury work. - Thanks.
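The idea of a packager-chosen default that users can still override might look like the following. This is a minimal sketch under assumptions, not the planned configury: the function name, the accepted words, and the way the user value arrives are all hypothetical.

```python
from typing import Optional

# Hypothetical spellings accepted for a boolean setting.
TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def effective_show_load_errors(
    build_default: bool, user_value: Optional[str] = None
) -> bool:
    """Resolve the setting: an explicit user value wins, otherwise the
    default chosen at build time (e.g. by a packager) applies."""
    if user_value is None:
        return build_default
    word = user_value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognized boolean value: {user_value!r}")
```

A packager who builds every component could ship with `build_default=False`, so users who lack some dependencies see no load errors unless they ask for them.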

MTT / Jenkins Testing Dev

  • Python client doesn't have nightly snapshot integration.
    • Need this since this is most of the release testing.

This week Discussion Points.

  • Website - openmpi.org
    • Brian is trying to make things more automated, so one can check out the repo, etc., but the repo is TOO large.
    • Majority of the problem is the tarballs, which are already being stored in S3.

Oldest PR

Oldest Issue

Next face-to-face meeting

  • Jan / Feb
  • Possible locations: San Jose, Portland, Albuquerque, Dallas

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA
