WeeklyTelcon_20230110

Geoffrey Paulsen edited this page Jan 10, 2023 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Luke Robison
  • Thomas Naughton (ORNL)
  • Tommy Janjusic (nVidia)
  • William Zhang (AWS)

New Items

  • This UCX-inspired PML issue: https://github.com/open-mpi/ompi/pull/11228
    • UCX itself is not the problem, but the issue came up out of this discussion.
    • Talked to Ralph and Tommy and others.
    • Don't have enough info. Need to audit startup and shutdown.
      • We might need to put a group together to discuss the ordering of init/finalize, especially with regards to sessions, and determine what is supposed to happen. Then go back and audit the existing code and compare to what actually happens. We might well find some inefficiencies that could be startup/shutdown performance regressions at scale.
      • Howard has been working on this for the last few weeks.
      • We have two extra PMIx Fences in MPI_Init.
      • Somehow with Sessions and other PRs, we ended up with redundant Fences.
      • Howard has a PR for this that will remove the extra ones.
      • There's a notion that everyone needs to be able to talk to everyone else, after a Session is INITed, but that's not true.
      • SMBTL: there is a notion that everyone on the node wants to figure out how they're going to talk at init time, but this is not consistent with Session init.
      • But for now we'll just do what's needed for MPI_Init(), and those who are pushing MPI_Sessions might hit these issues.
        • Long term we need to fix:
          • Group Communication operations to create a communicator from a Session.
          • Some other items I didn't capture.
        • This is too much
        • In the code, Sessions are called "Instances", because the feature was only later named "Sessions".
      • Other aspect of this, and why UCX became problematic.
        • UCX folks found out that xpmem mappings accessed from another process don't have the same semantics as SysV shared memory.
          • This can result in a page that's backed by nothing from another process.
        • A process may still attempt an access (read) to memory of a process that has exited, which results in a SIGBUS or SEGFAULT.
        • To work around this, the UCX PML put in a PMIx fence, but that isn't consistent.
        • George chimed in saying that add_procs/del_procs are supposed to be local operations.
      • Tommy agrees that add_procs/del_procs shouldn't be fenced.
      • The change in the PR # works around the issue by moving the PMIx finalize later.
        • This PR fixes this symptom, but this implies that there may be other problems elsewhere (or that it may mask future issues).
          • We need a group discussion.
      • Didn't follow this too much, because UCX PML doesn't currently work with Sessions.
    • Testing - Howard started putting some tests in the ibm/sessions section. If you run the ibm private test repo:
      • The tests in the public session tests won't fail (these are more like sessions1, very similar to MPI usage).
      • Tests in ibm/sessions (more like sessions2, much more asymmetric) should hit these.
      • There are a lot of places where we use PMIx_Get() with "Intermediate", which will almost certainly fail without the PMIx fences.
    • It looks like the --enable-timing code has been broken. The configure + env works, but the results are gibberish.
      • The timing infrastructure is always getting broken by these kinds of changes, but no one uses it until there's an issue.
      • Howard would like to get it working. Is this something that might be easy to fix?
    • Set up a discussion for a longer-term solution.
      • Could also discuss on PMIx meeting every 2 weeks.
  • hwloc and PV (https://github.com/open-mpi/ompi/issues/11246)
    • Document in v5.0 README: if you have PV, you need hwloc 2.7.2+ (there is a v2.8 out there as well).
    • Is there any reason to update our internal hwloc to v2.7.2?
      • Users of PV are large HPC installations.
      • This is work someone has to do; we've got to stop messing with hwloc.
    • We should minimize changes into v5.0.x, so DON'T update internal hwloc on v5.0.x.
      • What about on main?
      • We'd rather just remove hwloc and libevent internals for next MAJOR (from master) release.

v4.1.x

  • Need to do a PR to update the release date in News, and then can build release tarball.

v5.0.x

  • A lot of v5.0.x bugfixes went in (lots of Coverity fixes).
  • Need documentation
  • Need to come to consensus on HAN/Adapt
    • There are a bunch of PRs out, but no one has reviewed.
    • Brian thinks we should figure out the messaging, and turn HAN. Say that we promised
  • Does main/v5.0.x care about 32-bit builds? See https://github.com/open-mpi/ompi/issues/11248 and https://github.com/open-mpi/ompi/pull/11282. We came to the conclusion yesterday at the RM call: unless someone steps up to maintain 32-bit, we'll turn it off in configure for now. After v5.0.0 has been released and out in the wild for a while, and we're absolutely sure no one is going to surface saying "I care about 32 bit!", we can talk about removing the 32-bit infrastructure from OMPI.
    • No bugs reported on v4.x with 32-bit.
    • Even Raspberry Pis are 64-bit.
    • v5.0.0 is the right time to get this in.
  • Update Autotools version for dist tarballs before v5.0.0. https://github.com/open-mpi/ompi/pull/11264
    • This is the update to stay current.
    • Does not affect developer builds, only building release tarballs.
  • William is asking about v5.0.0 schedule.
    • The reason it's not ready is that there isn't the manpower.
    • What it's going to take is for someone to say they have a hard delivery requirement and to bring the manpower.
    • HAN - Going after performance, we could be here for years.
      • libfabric that we just discussed.
      • PMIx fence issue.
      • LD_LIBRARY_PATH and --prefix, we're doing something wrong. Issue #11269.
        • By default on Linux, what was done should have worked, but didn't.
        • There is code missing from our mpirun to set the LD_LIBRARY_PATH for the application.
        • Yes we need to do rpath/runpath... but the big problem is we need the above.
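
The LD_LIBRARY_PATH point above (Issue #11269) can be illustrated with a minimal sketch in C. This is not Open MPI's mpirun code: the helper names `build_ld_library_path` and `prepend_prefix_libdir` are hypothetical, and the `<prefix>/lib` layout is an assumption. It only shows the general idea of prepending the install prefix's library directory to LD_LIBRARY_PATH before an application is launched.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): write "<prefix>/lib[:<old_value>]"
 * into out. Returns 0 on success, -1 if the result does not fit in out. */
int build_ld_library_path(const char *prefix, const char *old_value,
                          char *out, size_t out_len)
{
    int n;

    if (old_value != NULL && strlen(old_value) > 0) {
        /* Keep the existing search path, but put the prefix libdir first. */
        n = snprintf(out, out_len, "%s/lib:%s", prefix, old_value);
    } else {
        n = snprintf(out, out_len, "%s/lib", prefix);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Hypothetical helper: update LD_LIBRARY_PATH in the current environment so
 * that a child process started afterwards inherits the new value. */
int prepend_prefix_libdir(const char *prefix)
{
    char buf[4096];

    if (build_ld_library_path(prefix, getenv("LD_LIBRARY_PATH"),
                              buf, sizeof(buf)) != 0) {
        return -1;
    }
    return setenv("LD_LIBRARY_PATH", buf, 1);
}
```

A launcher would call something like `prepend_prefix_libdir("/opt/ompi")` (the path is only an example) before exec'ing the application. As noted above, rpath/runpath is still needed; this covers only the environment side.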

Main branch

  • Would like to remove hwloc and libevent internals for next MAJOR (from master) release.
  • PR #11116 that Howard closed. Core issue is still there.
    • RHEL7 (which we still support) provides libfabric 1.7
    • We need to decide what supporting an OS means.
      • We can change code to make it work with libfabric 1.7 (right now it is coded to work with v1.9).
      • Given that we support RHEL7, it seems reasonable that we then support all of the versions that the OS ships.
    • The reason Howard closed #11116 is that it makes our code simpler to just require 1.9; if we want to support 1.7 we need a bunch of ifdefs.
      • It's more than just ifdefs around the function calls.
      • Some of the structures have changed, so it's a bit more gross than that.
      • That's why he closed this: there is a very low chance that these paths would be tested.
      • RHEL7 extended support goes through 2026.
    • For at least libfabric, Howard can go and do ifdef and configury stuff.
  • Let's figure out our guiding principle for supporting an OS.
    • I would expect that whatever the OS ships would work.
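
The version-guard ("ifdef") approach discussed for libfabric 1.7 vs 1.9 can be sketched as below. libfabric's `<rdma/fabric.h>` provides `FI_VERSION` and `FI_VERSION_GE` macros for this kind of check; the `SKETCH_*` macros here are hypothetical stand-ins so that the snippet compiles without libfabric installed, and the pretend header version of 1.7 is an assumption. As the notes say, the real work is more than this, because some structures also changed between versions.

```c
/* Hypothetical stand-ins modeled on libfabric's FI_VERSION / FI_VERSION_GE. */
#define SKETCH_VERSION(major, minor) (((major) << 16) | (minor))
#define SKETCH_VERSION_GE(v1, v2)    ((v1) >= (v2))

/* Pretend version of the headers we are building against (assumption: 1.7,
 * as shipped by RHEL7). Override with -DSKETCH_BUILD_MINOR=9, for example. */
#ifndef SKETCH_BUILD_MAJOR
#define SKETCH_BUILD_MAJOR 1
#endif
#ifndef SKETCH_BUILD_MINOR
#define SKETCH_BUILD_MINOR 7
#endif
#define SKETCH_BUILD_VERSION \
    SKETCH_VERSION(SKETCH_BUILD_MAJOR, SKETCH_BUILD_MINOR)

/* Compile-time selection: returns 1 when the older fallback path was
 * compiled in, 0 when the 1.9+ path was compiled in. */
int uses_fallback_path(void)
{
#if SKETCH_VERSION_GE(SKETCH_BUILD_VERSION, SKETCH_VERSION(1, 9))
    return 0;   /* code written against the 1.9 API would go here */
#else
    return 1;   /* 1.7-compatible code would go here */
#endif
}

/* Run-time variant of the same comparison, e.g. for a version reported by
 * the library at startup. */
int version_at_least(int major, int minor, int req_major, int req_minor)
{
    return SKETCH_VERSION_GE(SKETCH_VERSION(major, minor),
                             SKETCH_VERSION(req_major, req_minor));
}
```

Every such guard adds a code path that is unlikely to be exercised in testing, which is the concern raised above.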

MTT
