WeeklyTelcon_20220621

Geoffrey Paulsen edited this page Jun 22, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Josh Fisher (Cornelis Networks)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Brian Barrett (AWS)
  • Charles Shereda (LLNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Jeff Squyres (Cisco)
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia)

v4.1.x

v5.0.x

  • Schedule:
    • PRRTE is targeting late summer.
    • Newish issue regarding the partitioned communication features.
      • Since it's a new feature, try to get these fixes in as well.
  • Only a small number of changes on v5.0.x branch.
    • Some docs
  • main has also been quiet this week.
    • New issues opened: 10480 and 10481. Both need to be done prior to release.
    • A few other issues.
    • mpirun -v on v5.0.x reports the PRRTE version.
  • Does anyone still care about the min-dist mapper? Considering dropping this in PRRTE.
    • Mellanox developed it and will reply.
  • Open an issue to track it?

Main branch

  • Discussed Accelerator framework (see below)
  • Discussed atomics PRs (see below)

Accelerator framework

  • Tommy reported, after some discussion, that they do have customers who use the sm_cuda component.

  • William will try to update sm_cuda component and convert it into the framework.

    • Akshay had some comments.
    • Mellanox commits to testing these changes.
  • Want to see what priority to set for HAN and Adapt by default.

    • Depends on scale and message sizes.
    • Not just the message size; the rank ordering also affects performance.
    • With Tuned, communication goes between ranks based on the tree, ignoring how ranks are laid out on nodes.
      • HAN rearranges the ranks to allow an optimal approach at each level.
    • Han should be faster and more stable because
  • Adapt deals with asynchronous order of arrival at the collective.

    • Tommy saw some segv with Adapt, so he just
    • Logic is very similar to Tuned with a tree, but much more asynchronous.
      • It really adapts based on which ranks arrive.
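
The point about rank placement can be illustrated with a toy model. This is a sketch only, not code from Tuned or HAN: the round-robin placement, the binomial-tree construction, and all function names are assumptions made for illustration. It counts how many broadcast messages cross node boundaries when the tree is built from rank numbers alone, compared with a two-level scheme where one leader per node receives the data over the network.

```c
#include <assert.h>

/* Hypothetical round-robin placement: rank r runs on node r % num_nodes. */
static int node_of(int rank, int num_nodes)
{
    return rank % num_nodes;
}

/* Inter-node messages in a binomial-tree broadcast from rank 0 when the
 * tree is built from rank numbers only. In this construction the parent
 * of rank r > 0 is r with its highest set bit cleared. */
int flat_tree_internode(int nranks, int num_nodes)
{
    assert(nranks > 0 && num_nodes > 0);
    int count = 0;
    for (int r = 1; r < nranks; r++) {
        int high = 1;
        while (high * 2 <= r) {
            high *= 2; /* ends as the highest power of two <= r */
        }
        int parent = r - high;
        if (node_of(parent, num_nodes) != node_of(r, num_nodes)) {
            count++;
        }
    }
    return count;
}

/* Inter-node messages when one leader per node receives the data over
 * the network and then forwards it inside its own node. */
int two_level_internode(int nranks, int num_nodes)
{
    assert(nranks > 0 && num_nodes > 0);
    int used_nodes = nranks < num_nodes ? nranks : num_nodes;
    return used_nodes - 1;
}
```

With 8 ranks on 3 nodes under this assumed placement, the flat tree sends 7 messages across nodes and the two-level scheme sends 2. With other placements the gap can shrink or vanish, which is consistent with the note that the result depends on ranking as well as message size.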

Joseph posted two atomics PRs

  • 10492 and link to 10487
  • C11 atomics make every atomic access sequentially consistent.
  • But we have many code paths where we don't want this.
    • If you don't use threads, or if you do use threads but are only doing initialization, we don't want this.
  • The first thought was to relax loads and stores.
    • But going through code and figuring out where to
    • So the second PR just removes _Atomic for C11.
  • The measured difference was 20-25% for local messages.
  • 10492 moves us back to where we were before C11 atomics.
  • Because the variables are atomic, gcc uses an exchange instruction (x86),
    • and exchange is very expensive even if there's already a lock around it.
    • Saw this in GCC 9, not in 10, but then again in 11.
  • The compiler doesn't know about the OPAL_THREAD macros, so there is no way to tell it to avoid the atomic.
  • The variable is marked as atomic, and the code does a plain + on it in a thread.
  • Can see this by running objdump on an ob1 function.
  • With _Atomic we have no control over memory ordering other than through explicit atomic load/store operations.
    • This is what the first PR does.
  • Is there a risk with the second PR that we might need to add some locks?
    • Code we have today has been tested with old flavor, so it should be pretty safe.
    • When we write new code, we'll need to
  • Given the way OPAL_ATOMIC is structured, we hope no one expected an increment was not atomic.
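
The trade-off discussed above can be sketched in a few lines of C. This is a minimal illustration, assuming a GCC- or Clang-style toolchain; the counters and function names are hypothetical and are not Open MPI symbols, and the real code goes through the OPAL atomic and OPAL_THREAD macros, which are not reproduced here. It shows that plain operators on an `_Atomic` object stay sequentially consistent even on a "no threads" path, that relaxing the ordering means rewriting each access explicitly, and that without `_Atomic` the call site decides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* All names are hypothetical; none of them are Open MPI symbols. */
static _Atomic int32_t count_atomic; /* C11 atomic object */
static int32_t count_plain;          /* ordinary object; callers must serialize */

/* Plain operators on an _Atomic object are atomic read-modify-write
 * operations with memory_order_seq_cst, so the "no threads" path below
 * is still a sequentially consistent atomic: the type decides, not the
 * call site. */
int32_t add_atomic_type(bool using_threads, int32_t delta)
{
    if (using_threads) {
        return atomic_fetch_add(&count_atomic, delta) + delta;
    }
    return count_atomic += delta; /* still atomic, still seq_cst */
}

/* Relaxing the ordering requires spelling it out at each access. */
int32_t add_relaxed(int32_t delta)
{
    return atomic_fetch_add_explicit(&count_atomic, delta,
                                     memory_order_relaxed) + delta;
}

/* Without _Atomic the caller chooses per call site. __atomic_add_fetch is
 * a GCC/Clang builtin; using it, and using relaxed ordering, are
 * assumptions made for this sketch. */
int32_t add_plain_type(bool using_threads, int32_t delta)
{
    if (using_threads) {
        return __atomic_add_fetch(&count_plain, delta, __ATOMIC_RELAXED);
    }
    count_plain += delta; /* ordinary add, no atomic instruction required */
    return count_plain;
}

/* Single-threaded self-check: resets both counters, then verifies that
 * every variant counts the same way. */
void self_check(void)
{
    atomic_store(&count_atomic, 0);
    count_plain = 0;
    int32_t a = add_atomic_type(false, 2);
    int32_t b = add_relaxed(3);
    int32_t c = add_plain_type(false, 5);
    assert(a == 2 && b == 5 && c == 5);
}
```

Compiling this and inspecting the non-threaded paths of `add_atomic_type` and `add_plain_type` with objdump is one way to compare what the compiler emits; the exact instructions depend on the compiler and its version.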

MTT

Face-to-face

  • Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
    • Should think about schedule, location, and topics.
    • Some new topics added this week. Please consider adding more topics.
    • Might be better to do a half-day/day-long virtual working session.
      • Due to companies' travel policies, and for convenience.