WeeklyTelcon_20200707

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Jeff Squyres (Cisco)
  • Artem Polyakov (nVidia/Mellanox)
  • Aurelien Bouteiller (UTK)
  • Austen Lauria (IBM)
  • Barrett, Brian (AWS)
  • Brendan Cunningham (Intel)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia/Mellanox)
  • Matthew Dosanjh (Sandia)
  • Noah Evans (Sandia)
  • Ralph Castain (Intel)
  • Naughton III, Thomas (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic
  • William Zhang (AWS)

Not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Harumi Kuno (HPE)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Intel)
  • Nathan Hjelm (Google)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • William Zhang (AWS)
  • Xin Zhao (nVidia/Mellanox)
  • mohan (AWS)

New

MPI Forum was last week.

  • Sessions is now in.
  • Partitioned communication was voted in.

Thread local storage issue

  • OpalTSDCreate takes a thread-local storage key that would be tracked locally in OPAL.
    • But when we go to delete a key, it's not being deleted.
    • But we want the flexibility to destroy keys on our own, or explicitly.
    • George notes that in the mode we have today, all keys are tracked to be released by the main thread.
    • George thinks Artem's approach is the correct approach.
  • We would have to change the way that keys are USED, and different components use them in different ways.
  • Something similar should be done in other places.
  • If it's done just for UCX, then others can see how it was done and check their own code.
  • So we think the current PR is good, but it leaves both the old API and the new API in place.
    • But it might be better to remove the OLD way and make broken components do SOMETHING to update their code.
    • It should be easy for components to add explicit cleanup calls (see the sketch below).
  • Master branch only.
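
A minimal sketch of the explicit-cleanup pattern being discussed, written against plain pthread TSD rather than the OPAL wrappers (whose exact names and key-tracking behavior aren't captured in these notes); the component and function names are made up:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical component-local key; the OPAL wrappers would add their own
 * tracking on top of calls like these. */
static pthread_key_t my_component_key;

/* Destructor runs for each thread that exits with a non-NULL value set. */
static void my_key_destructor(void *value)
{
    free(value);
}

int my_component_init(void)
{
    /* Create the thread-local key once, at component init. */
    return pthread_key_create(&my_component_key, my_key_destructor);
}

int my_component_finalize(void)
{
    /* Explicit cleanup: the component that created the key also deletes it,
     * instead of relying on a global teardown pass in the main thread. */
    return pthread_key_delete(my_component_key);
}
```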

AVX - converging on a solution.

  • RHEL6 has a weird issue.
    • If we only want this new feature for new systems, that would be okay.
    • The hardware has AVX2, but the (older) assembler doesn't understand the generated code (see the configure-style probe sketched below).
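
A sketch of the kind of compile probe that separates "the compiler accepts AVX2 intrinsics" from "the system assembler can actually emit them"; this is an illustration of the idea, not the actual configure test in the tree:

```c
/* avx2_probe.c -- try: cc -mavx2 -c avx2_probe.c
 * On an old toolchain (e.g. a stock RHEL6 assembler) this can fail at the
 * assembly step even though the CPU itself supports AVX2. */
#include <immintrin.h>

__m256i avx2_probe(__m256i a, __m256i b)
{
    /* AVX2 256-bit integer add; needs assembler support for the
     * VEX-encoded instruction the compiler generates. */
    return _mm256_add_epi32(a, b);
}
```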

C11 PR is a mess

  • George needs some input on the PR.
  • We're assuming that we don't need _Atomic in most cases; volatile is enough (see the illustration below).
  • A patch is linked to the issue (PR7914).
  • TBD if master-only (probably more than that).
  • We're not breaking things; we just get a lot of valid complaints from the Intel compiler.
    • The stdout of make is ~16 MB due to all of the Intel compiler warnings without this fix.
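
A small, self-contained illustration (not taken from the PR; variable names are made up) of the distinction in play: a C11 _Atomic object versus a plain volatile one updated through a compiler builtin:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic int32_t atomic_counter;   /* C11 atomic object            */
static volatile int32_t plain_counter;   /* "just volatile" shared state */

void bump_counters(void)
{
    /* Well-defined C11: the operand is _Atomic-qualified. */
    atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);

    /* GCC/Clang-style builtin on a non-_Atomic, volatile object: it works,
     * but mixing the two models is the sort of thing that draws the large
     * volume of Intel compiler diagnostics mentioned above. */
    __atomic_fetch_add(&plain_counter, 1, __ATOMIC_RELAXED);
}
```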

Discuss Open-MPI binding when direct-launched

  • Schizo SLURM binding detection - Might not need a solution on v4.0.x
    • Summary of the issue: when running under a new SLURM version, we do the wrong thing.
      • Even worse, the customer sets the binding they want in SLURM, but Open MPI then binds incorrectly.
    • We may need a better way to tell the OMPI schizo component to "do nothing".
    • This won't happen for the v4.0.5 release.
    • Issue 7511 - Opened against v4.0.x - Some history here on why we set binding on direct-launch jobs.
      • People were complaining that if they ran with mpirun they got one level of performance, but with direct launch through SLURM (or others) they got different behavior.
      • If they run direct-launch, they might not get bound at all.
        • Our solution was: if we don't see that the direct launcher has attempted to bind, then we bind for them.
      • A better solution would be for the resource manager to inform us that it is not binding.
    • But once again, srun has changed the way it does things.
    • It seems unexpected to be binding when the user didn't request it.
    • This has been in there for over a decade.
  • Should we be doing this at all? (Binding when using a direct launch)
    • We won't be doing this at all in v5.0.
    • Consensus - They need to fix this in direct-launcher.
  • Do we make changes in v4.0.x and v4.1.x, or leave it?
    • Proposal to add an item to the README: the auto-binding behavior under SLURM only works with SLURM versions before X, and will be removed.
    • What if we just remove all of the OMPI auto-binding code for SLURM?
      • If we can find the SLURM version string in the environment, we could skip binding above a certain SLURM version (see the sketch after this list).
  • Is this only SLURM, or all direct launchers?
    • Well, there are only SLURM, Cray, and JSM. We think Cray and JSM don't do anything, so it's probably just a SLURM schizo issue.
  • SchedMD only supports releases going 2 years back; 201802 is as old as they support.
    • If this older version has the issue, then it's a bug, and we don't do it.
    • If it regressed in a 2019 version, then base whether we do it on the SLURM version.
  • We SHOULD do the SAME thing in v4.0.x and v4.1.x
    • May take 2 weeks to do this; we need to install several SLURM versions.
    • Could see if
  • Do we want to block v4.0.5 and v4.1.0 for this?
    • Yes, probably a blocker.
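
A minimal sketch in C of the version-gating idea from the list above. Both the SLURM_VERSION environment variable and the 19.05 cutoff are placeholders (the notes don't settle how the version would be detected or where the line should be drawn), so this shows the logic only, not real OMPI or SLURM interfaces:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Decide whether the legacy "bind for the user" behavior should be applied
 * under a direct launch.  Environment variable name and cutoff version are
 * hypothetical. */
static bool legacy_autobind_ok(void)
{
    const char *ver = getenv("SLURM_VERSION");   /* placeholder source */
    int major = 0, minor = 0;

    if (NULL == ver || sscanf(ver, "%d.%d", &major, &minor) < 2) {
        /* Unknown SLURM version: safest is not to auto-bind. */
        return false;
    }

    /* Hypothetical cutoff: only auto-bind on releases older than 19.05. */
    return (major < 19) || (19 == major && minor < 5);
}
```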

Discuss Open-MPI tuned file for v4.1.0

  • We have a band-aid for v4.1.0 collective tuning.
  • Consensus - going to merge the incremental improvement.
    • Need to push for people to test HAN.
  • Consensus - Should we make HAN the default on master?
    • It would get better testing, and we want it for v5.0.
    • Wait until we have a better understanding.

Two other sporadic issues on master

  • AWS CI - seeing a timeout every couple of jobs, across ALL CI.
    • ring or hello-world fails roughly 10% of the time.
  • Yesterday the IBM test suite got a random small percentage of failures with 2 ranks over 2 hosts.
    • Incomplete corefiles indicate it is inside INIT - still investigating.

Release Branches

Review v4.1.x Milestones v4.1.0

  • Schedule: Want to release end-of-July

  • Posted a v4.1.0 rc1 to go through the release mechanics and ensure we can release.

  • Release Engineers: Brian Barrett (AWS), Jeff Squyres (Cisco)

  • Still want: George's collective tunings for the tuned coll component, AVX, and the UCX PRs awaiting review.

  • Past: We've come to consensus for a v4.1.0 release

    • Need include/exclude selection; worried about consistent selection.
    • A lot of PRs are outstanding, but we can't merge them until the branch is fixed:
      • A patch for OFI stuff messed up the v4.1.x branch.
      • Howard has a fix PR; Jeff is looking at it.
    • Howard changed the new OFI BTL parameters to be consistent with the MTL.
    • Not breaking ABI or backwards compatibility.
    • v4.1.x branch, branched from v4.0.4 tag.
    • NOT touching runtime!!!
    • Not going to be pulling in a new PMIx version.
  • All MTT is online for the v4.1.x branch.

  • Not compiling under the SLURM EFA test (OFI BTL issue).

Review v4.0.x Milestones v4.0.4

  • Discussed Open-MPI binding when direct-launched (see above)

  • v4.0.5 schedule: End of July

    • PR7898 - We need resolution on this on master.
      • 7893 - master release.
    • Two potential drivers for a quick v4.0.5 turn-around.
    • OSC RDMA bug - may drive a v4.0.5 release.
    • Program aborts on detach.
  • OSC pt2pt: we still have it on v4.0.x.

  • Fragmented puts: the counting is not correct for a particular user request.

    • Non-contiguous Rputs.
    • Also needed in v4.0.5.
  • How urgent is the ROMIO fix?

    • Good to have in v4.0.5, but it's hard to make a testcase that hits it.
  • usNIC is failing almost all multi-node tests on v4.0.x.

    • Jeff was going to start looking at this last week, but didn't get to it.
    • v4.0.x WAS working; also seeing master failing.
    • ACTION - check back next week.
  • iWARP support - Issue 7861.

    • How are we supposed to run iWARP in Open MPI v4.0.x?
    • How much do we care about iWARP?
    • At a minimum need to update FAQ.

Review v5.0.0 Milestones v5.0.0

  • No update this week other than master discussion.

  • Do we need to put OSC pt2pt back?

    • OSC RDMA requires a single BTL that can contact every single process.
      • This didn't use to be the case (per a comment in the code).
  • We can't use the OSC pt2pt.

    • It is not thread safe and doesn't conform to the MPI 4.0 standard. Not safe.
    • This is just a testing fallacy. We could add tests to show this, but we'd still be in the same boat.
    • Either product A or B is broken, and we need to fix it.
  • RDMA one-sided should fall back to "my atomics" (software atomics), because TCP will never have RDMA atomics.

    • The idea was to put the atomics into the BTL base, which could do all of the one-sided atomics under the covers (see the sketch below).
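
A rough sketch of that fallback idea; the structure and function names here are invented for illustration and do not reflect the real BTL interface:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BTL module. */
typedef struct example_btl {
    /* Non-NULL only if the transport has true RDMA atomics. */
    int (*atomic_fadd64)(struct example_btl *btl, void *remote_addr,
                         int64_t operand, int64_t *old_value);
} example_btl_t;

/* Software emulation path (what a TCP-like transport would need): a real
 * version would get the remote value, apply the op, and put the result
 * back, with exclusive access to the target window. */
static int base_emulated_fadd64(example_btl_t *btl, void *remote_addr,
                                int64_t operand, int64_t *old_value)
{
    (void)btl; (void)remote_addr; (void)operand; (void)old_value;
    return -1;  /* placeholder only */
}

/* "Atomics in the BTL base": callers always go through the base, which
 * uses hardware RDMA atomics when available and emulates otherwise. */
int base_atomic_fadd64(example_btl_t *btl, void *remote_addr,
                       int64_t operand, int64_t *old_value)
{
    if (NULL != btl->atomic_fadd64) {
        return btl->atomic_fadd64(btl, remote_addr, operand, old_value);
    }
    return base_emulated_fadd64(btl, remote_addr, operand, old_value);
}
```
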
  • Jeff will close the PR.

  • Nathan will look at fetch, get, and compare-and-swap.

  • Two new PRs for MPI 4.0 error handling, from Aurelien Bouteiller.

  • Does UCX support iWARP?

    • Does libFabric support iWARP via the verbs provider?
    • https://github.com/openucx/ucx/issues/2507 suggests UCX doesn't.
    • Brian thinks that libFabric/OFI can support iWARP; you just need to specify the provider in the include list.
    • The person who's asking is a partner, not a customer.
  • PMIx

    • Working on PMIx v4.0.0 which is what Open MPI v5.0 will use.
    • Sessions needs something from PMIx v4.
    • ULFM - not sure if it needs PMIx changes; we think it needs PRRTE changes.
    • PPN scaling issue - a simple algorithmic issue in one function.
      • PMIx talked about it. Artem might know someone who might be interested in working on it.
      • The algorithm behind one of the interfaces doesn't scale well.
      • Not a regression. Above ~4K nodes, it becomes quadratic.
  • PRRTE

    • Nothing's happening there.

master

  • Mostly discussed above.

Face to face

  • Many companies are not allowing face-to-face travel until 2021 due to COVID-19.
    • Instead, let's do a series of virtual face-to-face meetings?
  • Yes - this summer, to discuss v5.0.
    • Maybe we can do it by topic?
    • Maybe not 4- or 8-hour sessions.
  • Different topics on different days.
  • Do a Doodle poll for the least-bad days in late July/August.
  • Start a list of topics.

Super Computing Birds-of-a-feather

  • George and Jeff will help plan and come to community.
  • There may not be a Super Computing conference at ALL this year.
  • Many other projects are doing a virtual "state of the union" type meeting to cover what they'd usually do in a Birds-of-a-Feather session.
  • Apparently this works pretty well, and they do it a couple of times a year.
  • Not constrained to Super Computing

Infrastructure

  • Scale testing: PRs have to opt into it.

Review Master Master Pull Requests

CI status


Dependencies

PMIx Update

ORTE/PRRTE

MTT


Back to 2020 WeeklyTelcon-2020
