WeeklyTelcon_20211207

Geoffrey Paulsen edited this page Dec 7, 2021 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UH)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia/Mellanox)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia/Mellanox)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Naughton III, Thomas (ORNL)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic
  • Xin Zhao (nVidia/Mellanox)

v4.0.x

  • Schedule: No schedule for v4.0.8 yet - sometime in 2022
    • bugfixes case-by-case basis

v4.1.x

  • Schedule: No schedule for v4.1.3 yet either - sometime in 2022

v5.0.0

  • Austen PRed a bunch of commits on master not yet in v5.0
    • 11 still need reviews.
  • Submodule pointers on v5.0 need updating
    • Still pointing at something on PMIx v4.1.x.
    • Brian PRing some fixes so we can update to PMIx v4.2
  • Seeing a ton of OSHMEM compile warnings
  • libNBC uninitialized variable. Jeff filed 9749 this morning (prob on both master and v5.0.x)

Master

  • Community is warm/open to bringing in Sessions, but wants to see Howard's PR later this week
  • Clock Monotonic - Jeff updated Timers.md in ompi-www
    • May only be Linux and OSX - maybe just an opal_inline; doesn't warrant a whole framework
    • The WTIME discussion a long time ago concluded not to use a framework.
      • Everyone just needs to agree to use one function
      • Just need ompi_wtime (very MPI specific); wouldn't put it into opal
        • Just going to call clock_gettime with CLOCK_MONOTONIC_RAW (doesn't allow for migrating to another core)
    • Maybe we should unify the times.
    • No requirement that MPI_Times be comparable to Wtick and Wtime.
      • Quirks on different platforms.
    • Opal_Timers were really built for opal progress, where we needed a 10ms timer with low perturbation.
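
A rough sketch of the single-function timer idea discussed above, for illustration only. The names sketch_wtime and sketch_wtick are hypothetical and are not the Open MPI implementation. The sketch assumes a POSIX system with clock_gettime, and falls back to CLOCK_MONOTONIC where CLOCK_MONOTONIC_RAW is not defined.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime on strict compilers */
#endif
#include <time.h>

/* Pick the raw monotonic clock where the platform defines it
 * (assumption: Linux and recent OSX do); otherwise use the
 * portable POSIX monotonic clock. */
#if defined(CLOCK_MONOTONIC_RAW)
#define SKETCH_CLOCK_ID CLOCK_MONOTONIC_RAW
#else
#define SKETCH_CLOCK_ID CLOCK_MONOTONIC
#endif

/* Hypothetical stand-in for an ompi_wtime-style helper:
 * returns seconds since an arbitrary fixed point, or -1.0 on error. */
static double sketch_wtime(void)
{
    struct timespec ts;
    if (clock_gettime(SKETCH_CLOCK_ID, &ts) != 0) {
        return -1.0;
    }
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1.0e-9;
}

/* Hypothetical stand-in for a Wtick-style helper:
 * returns the reported clock resolution in seconds, or -1.0 on error. */
static double sketch_wtick(void)
{
    struct timespec res;
    if (clock_getres(SKETCH_CLOCK_ID, &res) != 0) {
        return -1.0;
    }
    return (double) res.tv_sec + (double) res.tv_nsec * 1.0e-9;
}
```

A caller would take two sketch_wtime() readings and subtract them. Because both helpers read the same clock, the tick and the time values stay consistent with each other, which is the point of agreeing on one function.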

MTT

  • Cisco has some test build failures.
  • IBM has an OMPI build failure with XL compiler on ppc64le.
ompi_proc_sentinel_to_name(uintptr_t)$AF56_10.  Compilation ended.  Contact your Service
Representative and provide the following information: Internal abort. For more information visit:
http://www.ibm.com/support/docview.wss?uid=swg21110810
make[2]: *** [Makefile:2559: dpm/dpm.lo] Error 1
make[1]: *** [Makefile:2665: all-recursive] Error 1
make: *** [Makefile:1478: all-recursive] Error 1
  • IBM is looking to work around it with an Open MPI code change.

Open-MPI v5.0

Longer Term discussions

Doc update

  • OMPI docs and manpages, but there is a persistent problem that mpirun is really prterun

  • PR 8329 - convert README, HACKING, and possibly manpages to reStructuredText.

    • Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
    • Intent is that this is for v5.0
      • mpirun / prterun - we had quite a bit of detail in orte, but are updating as much as possible.
    • Ralph has asked about this for PMIx/PRRTE since this is turning out to work
  • No update - 3/16

    • Could be independent of PMIx and PRRTE.
    • PMIx and PRRTE want to follow suit, and not require both pandoc and Sphinx.