WeeklyTelcon_20160705

Jeff Squyres edited this page Nov 18, 2016 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Howard Pritchard
  • Josh Hursey
  • Arm Patinyasakdikul
  • Joshua Ladd
  • Nathan Hjelm
  • Nysal
  • Ralph
  • Ryan Grant
  • Sylvain Jeaugey
  • Todd Kordenbrock

Agenda

Review 1.10

Review 2.0.x

Review Master MTT testing (https://mtt.open-mpi.org/)

  • Has improved. 233 failures, currently on Cisco.
    • Cisco - Many Cisco failures are local cluster issues. Art is working on cleaning up.
  • Jeff put a patch into MTT to allow thread hangs to be marked as hangs.
  • NVIDIA failures are all PMIx failures.
    • Gilles found a race condition in PMIx 2.0.
  • v2.x failures on Comm_spawn_loop.
  • Overall not too bad.

MTT Dev status:

New Items:

  • Face to face coming up
  • Need to discuss ways to take payments.
  • Website transitions
    • Website itself
    • Nightly tarballs
    • Archives of mailing list entries.
    • Have mbox archives of all of the lists also. But as soon as we move stuff, where do NEW posts get archived?
  • Travis was hung over the weekend. Not sure why.
  • IBM Jenkins was off over the weekend, should be fixed now.

Status Updates:

  • Cisco
    • Have Arm; got usNIC BTL thread-multiple support into master.
    • Lots of minor bug fixing and 2.x items.
    • Been more focused on libfabric stuff.
  • NVIDIA
    • Watching MTT
    • When a node has 2 CPUs and an IB card, might want to use the IB card to do transfers between GPUs.

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM
  3. Cisco, ORNL, UTK, NVIDIA

Back to 2016 WeeklyTelcon-2016