
WeeklyTelcon_20201027


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • NOT-YET-UPDATED

4.0.x

  • No driver for 4.0.6 right now
  • If OpenPMIx 3.2.1 fixes spawn, that may drive a release.
  • Raghu's OFI MTL patches may also drive a release.
  • OpenPMIx v3.2.1 should post an RC this week.
  • Issue with singleton comm-spawn: it used to work; not sure when the regression was introduced.
    • Would be nice to get this working; a reproducer sketch follows below.
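A minimal reproducer sketch for the singleton comm-spawn case, with hypothetical file names; the binary is launched directly (no mpirun), so MPI_Init creates a singleton:

```c
/* spawner.c -- hypothetical reproducer: comm-spawn from a singleton.
 * Run directly as ./spawner (no mpirun); the reported regression is
 * that the spawn below fails in that singleton case. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```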

v4.1

  • v4.1 HAN PR is ready (up to date with master PR)
    • Coverity found a few issues, but can address in a future PR.
  • Brian saw in A/B tests that something changed in Allgather.
    • Iallgather issues exist in master and v4.1 (before HAN as well).
    • George thinks it's the TCP BTL (not HAN).
    • Iallgather requires active participation from a thread in the MPI library (ob1 + BTLs).
    • We may not be doing the right thing in MPI_Test (see the polling sketch after this list).
      • We don't see the same behavior in UCX.
      • Probably an issue in ob1.
      • Instead of draining the network as much as possible in ob1, we just progress one thing and return.
    • AWS did run some benchmarks on HAN.
      • Tuned was doing better at <4MB.
      • Bcast: HAN was better at all sizes.
  • Pull in OpenPMIx 3.2.1.
  • Looking for an RC.
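A sketch of the polling pattern under discussion: driving a nonblocking Iallgather with MPI_Test, where (with ob1 + the TCP BTL) each call may advance progress only one step instead of draining the network. Buffer and count handling are illustrative:

```c
#include <mpi.h>

/* Poll an MPI_Iallgather to completion with MPI_Test.  The concern is
 * that each MPI_Test makes only a single pass of the ob1/BTL progress
 * engine, so completion can lag well behind an MPI_Wait (or UCX). */
void poll_iallgather(const int *sendbuf, int *recvbuf, int count)
{
    MPI_Request req;
    int done = 0;

    MPI_Iallgather(sendbuf, count, MPI_INT,
                   recvbuf, count, MPI_INT, MPI_COMM_WORLD, &req);
    while (!done) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        /* ... overlapped application work would go here ... */
    }
}
```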

Open-MPI v5.0

  • Met last week, looked through some open tickets. Waiting on PMIx / PRRTE branching.

  • PMIx development in progress.

  • PRRTE: regressions compared to the existing runtime.

    • Write MTT tests that try different command line options.
    • Josh Hursey has been doing some work on the Tools interface.
    • Started going through the various command-line options, focusing on map/bind and ordering, and triaging those.
  • Configury changes looking good.

  • Branching plan for PRRTE and PMIx.

    • Looking at the spreadsheet of blocker tasks for PRRTE and PMIx, it's getting reasonable.
    • The PRRTE list is pretty long / miserable.
    • We need a plan for tackling those.
  • PRRTE list URL: https://docs.google.com/spreadsheets/d/1OXxoxT9P_YLtepHg6vsW3-vp4pdzGQgyknNbkzenYvw/edit#gid=883290229

  • Another place we could use some eyes...

    • PMIx v4.0 does some device-distance computation (in current PMIx).
      • The removal of NUMA (from the hwloc topology tree) makes this more challenging.
      • Not sure that this is right (generating the correct action based on the result).
      • Would be worth spending some time to see if this is what we want.
    • The networking folks need to look at this, since placement is out of date.
  • Branching PRRTE in November is still the plan.

  • PRRTE: the usual areas of pain are the map/rank/bind options.

    • Wireup is all solid.
    • Documentation / man pages for various tools in prrte.
    • Some discrepancies between prun's help output and its manpage; not sure which is correct.
    • Need to do manpages / help pages for prun (out of date, and not synchronized).
    • If you do mpirun from Open-MPI, the command-line options are different.
    • If you do mpirun from an MPICH or OSHRUN personality, you get different command-line options...
      • How do we write that manpage?
    • There is a PR for this manpage... about half done, help wanted.
  • Open-MPI might have its own mpirun manpage that only focuses on the Open-MPI personality.

    • The manpage is in Markdown now.
  • Need to scope some of those issues.

  • A new PR 8056 for MPI_T events (for MPI v4.0, Open-MPI v5.0).

  • Been in a holding pattern

    • Josh Ladd is ready and willing to do RM work, but has been busy with the NVIDIA/Mellanox transition.
  • Schedule: PMIx v4.0 Standard is in good shape.

    • libpmix in September
  • Ralph will be retiring early December

  • Brian will be taking a 3-month leave of absence from Amazon starting next week.

    • Raghu, or possibly someone else from AWS, will cover.
  • Mailing list issues.

  • ULFM review

    • What are our Internal ABI guarantees?
      • Example: the ULFM pull request changes sizeof(ompi_proc_t) (illustrated below).
    • The size changes depending on whether ULFM is configured in.
    • ompi_proc_t is used by OSHMEM, which is using the extension space, so WE can't use that.
    • ompi_proc_t is something that leaks into MPI API space... :(
      • Brian will look at this.
    • Still open: Aurelien was not sure what the ABI requirements are.
      • Aurelien will update PR8007, and Brian will look to see what ABI changes are being discussed.
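An illustration of the concern, with a hypothetical struct and guard name (not the actual ompi_proc_t definition): a member that is compiled in only under ULFM changes sizeof() and shifts the offsets that other layers rely on.

```c
/* Hypothetical sketch of the ABI hazard -- names are illustrative. */
struct example_proc_t {
    int flags;
#if ENABLE_FT_MPI         /* hypothetical configure-time guard */
    int ft_state;         /* present only in ULFM-enabled builds */
#endif
    char extension[64];   /* extension space claimed by other layers */
};

/* Code compiled against a non-ULFM build finds `extension` at a
 * different offset than code compiled against a ULFM build, so the
 * two cannot safely share the same struct across a library boundary. */
```
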
  • Questions about users doing their own PMIx implementation.

    • Is OMPI v5.0 going to #if 0 all of the PMIx APIs not needed by MPI?
      • Consensus
    • If they implement their own PMIx, they want to implement the bare minimum.
    • OMPI v5 will require PMIx v3.
    • We should point out that we already have an existing way to interface with an older PMIx, and they should use that.
    • The underlying issue is wanting to support OMPI v5 under Flux.
  • PMIx v4.0

    • Now provides device distances for fabric and GPU devices (usage sketch below).
    • Can compare to the OFI provider's implementation. Written in collaboration with Brice.
    • Ralph could come up with a few slides and we could advertise it. Probably a lot of interest.
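A usage sketch, assuming the PMIx v4 draft's PMIX_DEVICE_DISTANCES key and pmix_device_distance_t type; exact names and availability may differ in the released standard and in OpenPMIx:

```c
#include <pmix.h>

/* Ask PMIx for the min/max distances from this process's location to
 * fabric NICs and GPUs, per the PMIx v4 device-distance support
 * discussed above (key name is an assumption from the v4 draft). */
void show_device_distances(void)
{
    pmix_proc_t myproc;
    pmix_value_t *val = NULL;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return;
    }
    if (PMIX_SUCCESS == PMIx_Get(&myproc, PMIX_DEVICE_DISTANCES,
                                 NULL, 0, &val)) {
        /* val->data.darray is expected to hold an array of
         * pmix_device_distance_t, one entry per device. */
        PMIX_VALUE_RELEASE(val);
    }
    PMIx_Finalize(NULL, 0);
}
```
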
  • PRRTE needs to come up to speed with PMIx v4.0

    • The PMIx change went into OpenPMIx v4.0.
  • PRRTE: several blockers. Looking for resources to work on them.

    • Listed in PRRTE issues / spreadsheet.
    • 2 or 3 known bugs that need to be fixed.
  • Brian discovered a number of issues.

    • When we pull in hwloc as a submodule, it exposes us to a number of its build dependencies, because make dist has to build it like a full external package.
    • If we import by tarball, we avoid that, and we already configure it like an external install.
    • Brian is proposing we stay hybrid for "now":
      • libevent and hwloc: move to redistributing tarballs.
      • Leave PMIx and PRRTE as submodules.
    • By default in v5.0 we STILL prefer the external version (installed on the system).
    • We may be getting to the place where we could ask users to just go get libevent.
    • For hwloc we need 2.0 or later, and a number of distros haven't upgraded yet.

master / new topics

  • May want to disable certain AVX512 capabilities.
    • In the Intel docs, a specific instruction needs either the AVX512 flag or the KNCI flag.
    • If George can test for that flag, he can test for just that functionality.
    • But not sure how to test for this flag (see the runtime-check sketch after this list).
    • If anyone knows, please contact George.
  • Cisco had to revamp its MTT cluster; there might be some Cisco issues.
    • If there are MTT issues, let Jeff know.
  • Branch date for v5.0.x?
    • RMs could go off, look at the features on the wiki, and create a proposal?
    • Not having a plan to track things is frustrating people.
  • MTT: a bunch of build errors on Cisco and IBM.
    • A build failure with the PGI compiler.
    • We moved our MTT over to RHEL8, and are moving our CI over to RHEL8.
    • Found a build error in hwloc + PGI: hwloc never picked up a configury fix.
    • The fix is in the hwloc release branch.
    • May just need to wait until hwloc rolls another release.
      • Or configure our MTT to use external hwloc.
  • PR 8007
    • Please go look at PR 8007: did we do the right things for LICENSE / copyright?
    • If we could get a couple more folks looking at this.
    • Then if all of us are comfortable with it, we might want to ask Pavan at MPICH.
    • If we attempt to do the right thing in the future:
      • AWS needs to run it through their lawyers (Raghu).
    • Changes the LICENSE, which may change how users redistribute.
      • This would be the first thing we can't turn off.
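One possible way to test at runtime for the AVX-512 foundation flag, using the GCC/clang builtin; this is only a sketch of the runtime side, and configure-time detection of compiler/assembler support is a separate question:

```c
#include <stdio.h>

int main(void)
{
    /* __builtin_cpu_supports() checks CPUID at runtime (GCC >= 5 and
     * recent clang); "avx512f" is the AVX-512 Foundation feature. */
    if (__builtin_cpu_supports("avx512f")) {
        printf("AVX512F supported on this CPU\n");
    } else {
        printf("AVX512F not supported on this CPU\n");
    }
    return 0;
}
```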

PRRTE

  • Been doing some work from Tools side.
  • A lot of new work needed to stabilize it.
  • Not too many bug reports lately, but maybe some more as use picks up.
  • Some ULFM and scale testing.
  • Updating the Open MPI master submodule pointer is a manual process.

PMIx v4.0

  • Release candidate of the document for the PMIx v4.0 standard.
  • Want to replace shared memory component.
  • Need to do some Tools fixes.

Video Presentation

  • George and Jeff are leading
  • Looking at the week of Nov 30th, two weeks after Supercomputing.
  • Thursday and Friday are slightly preferable.
  • Pick a US/Europe-friendly time.
  • 90 minutes total time.
  • First 60 minutes: pre-recorded presentations.
  • Last 30 minutes: questions.
  • Online events get a lot more questions.
  • A moderator to help monitor questions.
  • Presenters may submit up to 2 slides each.
  • Round-robin the presenters.

New

HWLOC initialization issue (Issue #7937)

  • Trivial to fix in master.
  • Once Brian gets his configure stuff in.
  • May need someone else to finish.
  • Should be able to call PMIx init and ____ init; we don't need opal init at the beginning of MPI_Init.
    • This won't work going back into the release branches.
    • Buried in the MCA system.
  • What to do about fixing the release branches?
  • Can't give the local topology without ___.
  • Don't run it at scale.
  • The portable way to get the topology is hwloc (see the sketch below).
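For reference, a minimal sketch of loading the local topology with the hwloc API; the point at issue above is when (and in how many processes) this discovery runs, since hwloc_topology_load() is the expensive step at scale.

```c
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);   /* the expensive discovery step */

    printf("cores: %d\n",
           hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));

    hwloc_topology_destroy(topo);
    return 0;
}
```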

DELETE:

  • Summary: we committed some code.
    • A race condition that we always win (because it happens at finalize, we haven't cared), but it now matters in ULFM (and possibly Sessions).
    • We switched the configury logic so we always prefer an external libevent (above a certain version).
      • Most OSes are above that level, so we almost always prefer the external libevent.
      • If we get the fix into our internal libevent,
        • The concern is that unless we or users explicitly request the internal libevent, we'll almost never get this fix.
      • One solution would be
    • Can't think of another solution.
    • Packagers don't like to use our internal components.
    • The only thing we can think of is: if you want ULFM, you can't use an external libevent.
  • Progress of getting PR accepted upstream?
    • Yes, prepared an upstream libevent PR.
      • They want a non-Open-MPI reproducer.
      • Have ideas on how to create this reproducer, but not sure it's very easy.
      • The original code writer added some protection, but has since retired. This PR removes that protection.
        • Actually "we" added this race-condition protection to libevent. It delays removal of a file descriptor until too late.
          • The fix validates the FD before handling it. Sounds right to all.
    • Not started yet. Creating
    • There may be a way to code around this in ULFM, but not really sure, because things get into a bad state, and the only way out might ruin our performance.
  • If we protect this with configure (when building ULFM, you have to use the internal libevent):
    • If we move to submodules for libevent, we'd have to "mirror" libevent ourselves.
  • Only master / v5.0 are affected.
    • With TCP it could happen, but we disable errors in Finalize, so we don't hit this issue.
  • The libevent patch is against this OLD internal libevent (2.0.22).
    • It's possible that the problem goes away in a newer libevent, but updating libevent was a major hassle.
    • George: check if the code is gone or has been modified in libevent.
      • The code is still there in the latest libevent (so we still need the fix).
    • Updating libevent would be a much better solution.
  • Perhaps upgrading to a new libevent is the answer.