
WeeklyTelcon_20180220

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brian
  • David Bernholdt
  • Geoffroy Vallee
  • George
  • Josh Ladd
  • Artem
  • Matthew Dosanjh (Sandia)
  • Nathan Hjelm
  • Todd Kordenbrock
  • Thomas Naughton

--- A number of regulars were not here today:

  • Howard
  • Edgar Gabriel
  • akvenkatesh
  • Josh Hursey
  • Mohan

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.3

  • The last item was merged last night, and Howard will build v2.1.3 rc2 tomorrow.
  • There have been some patches, so Howard will start an RC build of v2.1.3 rc1.
  • Issue 4349
    • Support for Power BE (big-endian) was removed in v2.x (it should only be removed in a major version).
      • We can't find any notes on why it was removed in v2.x.
      • Nathan is testing this now.
    • One person said they'd stay at v2.1.1 and go no further.
    • We are not getting an overwhelming "yes, we want it back."
    • No one has yet volunteered to support Power BE for v3.x and later.

Review v3.0.x Milestones v3.0.1

  • Issue 4338 - SLURM integration broken on v3.0.x and v3.1.x
    • Not a regression. Marked as a blocker.
  • Schedule
    • Will build RC4 tonight.

Review v3.1.x Milestones v3.1.0

  • Issue 4338 - SLURM integration broken on v3.0.x and v3.1.x
    • Not a regression. Marked as a blocker.
  • Issue: missing library versioning for a new common component.
    • Will go into a new RC.
  • Schedule
    • Will build an RC tomorrow.
  • Issue 4829
    • George and Gilles: a single message with an iovec > 2GB.
    • process_vm_readv() and process_vm_writev(): glibc just issues the syscall.
      • The iovec handling does not behave as the man page indicates.
    • Simple fix: when using CMA, set vader's put and get limit to 2GB.
    • George posted PR 4832.
    • CMA only (the CMA call fails, and then vader copies in and copies out).
    • It also warns the user: slow, but not silent data corruption.
    • Not a regression, so we shouldn't rush a fix.
    • Affects vader.
  • Paul Hargrove ran testing, and didn't find any issues.
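
A minimal sketch of the 2GB-cap idea from Issue 4829, assuming Linux and glibc's process_vm_readv(): one large read is split into calls of at most max_chunk bytes, and the loop resumes after a short transfer instead of assuming the whole request completed. cma_read_chunked and CMA_MAX_CHUNK are hypothetical names chosen for illustration; this is not vader's code or the change in PR 4832.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical per-call cap, kept below the 2GB limit discussed in Issue 4829. */
#define CMA_MAX_CHUNK ((size_t)1 << 30) /* 1 GiB */

/* Copy len bytes from remote_addr in process pid into local_buf using
 * process_vm_readv(), never asking for more than max_chunk bytes per call.
 * A short transfer is not treated as success: the loop resumes at the first
 * byte not yet copied. Returns 0 on success, or -1 with errno set. */
static int cma_read_chunked(pid_t pid, void *local_buf, const void *remote_addr,
                            size_t len, size_t max_chunk)
{
    assert(max_chunk > 0);
    char *dst = local_buf;
    uintptr_t src = (uintptr_t)remote_addr;
    size_t done = 0;

    while (done < len) {
        size_t want = len - done;
        if (want > max_chunk)
            want = max_chunk;

        struct iovec liov = { .iov_base = dst + done, .iov_len = want };
        struct iovec riov = { .iov_base = (void *)(src + done), .iov_len = want };

        ssize_t n = process_vm_readv(pid, &liov, 1, &riov, 1, 0);
        if (n < 0)
            return -1;          /* errno set by process_vm_readv() */
        if (n == 0) {
            errno = EFAULT;     /* no progress: report instead of spinning */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

Usage: cma_read_chunked(peer_pid, buf, peer_addr, len, CMA_MAX_CHUNK). Returning -1 on failure, rather than pretending the copy succeeded, lets a caller fall back to a copy-in/copy-out path, in line with the slow-but-not-silent behavior noted for this issue.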

Review Master Pull Requests

  • Out of order issue (BLOCKER)
    • Issue 4795
    • Went into master, but no PR to v3.1.x
    • Old bug, not a regression.
    • TCP and usNIC - multilink issue
    • numlinks parameter is clearly broken / untested.
  • Issue 4799
    • Beginning in v2.1, we no longer bind to core.
    • Because we bind to socket (by default), the PMIx thread migrates to a different core (on the same socket).
    • Ralph suggested revising the default binding based on this.
    • We don't remember much dissent when we moved the default from core to socket.
      • The idea was that we're embracing threads, and binding to socket is much more thread-friendly,
      • but it comes with a cost.
    • Performance went from 30ms down to 6ms for a single PMIx Get.
    • Open MPI right now has its own service thread. Could that ALSO be impacted?
    • The PROBLEM is that this is the SAME issue for Open MPI progression threads.
    • v4.0 at the earliest.
    • Should put something into the README about this effect of binding to socket by default.
    • Jeff will create an issue to add a blurb to README.
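
A minimal sketch of the core-versus-socket trade-off in Issue 4799, assuming Linux and glibc's pthread_getaffinity_np()/pthread_setaffinity_np(): a helper thread that inherits a socket-wide CPU mask is free to migrate among that socket's cores, and narrowing its mask to a single allowed CPU stops that. pin_self_to_first_allowed_cpu and service_thread are hypothetical names; this is not how Open MPI or PMIx place their threads, and the minutes only propose revisiting the default binding.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Narrow the calling thread's affinity from whatever mask it inherited
 * (e.g. every core on one socket) to the lowest-numbered CPU in that mask.
 * Returns the chosen CPU, or -1 on failure. */
static int pin_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;
    if (pthread_getaffinity_np(pthread_self(), sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (pthread_setaffinity_np(pthread_self(), sizeof one, &one) != 0)
            return -1;
        return cpu;
    }
    return -1; /* empty mask: should not happen */
}

/* Example service-thread body: pin first, then record the resulting mask size.
 * out[0] = chosen CPU (or -1), out[1] = number of CPUs now allowed (or -1). */
static void *service_thread(void *arg)
{
    int *out = arg;
    cpu_set_t now;
    out[0] = pin_self_to_first_allowed_cpu();
    out[1] = (pthread_getaffinity_np(pthread_self(), sizeof now, &now) == 0)
                 ? CPU_COUNT(&now) : -1;
    return NULL;
}
```

Picking the lowest-numbered allowed CPU is an arbitrary choice for the sketch; a real runtime would choose the core deliberately, for example one not already occupied by application compute threads.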

Process

  • Issue 4423
  • When your PR has been accepted into a release branch, please go to the issue and remove the target for the release branch it was just merged into. We are attempting to automate this in the future.
  • In July we missed the review of who has commit access. We forgot, and will do it this summer.

MTT / Jenkins Testing Dev

  • Is it possible to call a web API to see whether tests have run on a given git hash?
  • New python database status
    • It's ready to use, but no one is working on the reporter piece.
    • The Python client will run and work locally, but can't report back to the server.
    • There is a REST API that Josh H. implemented.
      • The current reporter has a lot of hard-coded structure in PHP.
  • MTT Issue 614
    • How does the CherryPy server get started on AWS?
    • Howard is going to update the CherryPy server at AWS next week.
    • Josh H. had a better solution, but doesn't have cycles right now.
  • Script to autogenerate the AUTHORS list for v3.0.x
  • Brian has scripted the create tarball process: https://jenkins.open-mpi.org/jenkins/job/open-mpi.dist.create-tarball/
  • Tagging is another area to script.
    • Only tagging versions that are tested with MTT.
    • Want to add a check to the tag script that queries MTT to see whether MTT tests have been run on that commit.
  • Copied tarballs from git into S3. They're in both locations now.
    • Brian has to update some scripts before he removes them from git.

Abandoning OpenIB BTL

  • OLD: Discuss abandoning the openib BTL.
    • Nathan has a UCX BTL.
    • ETA on GPU support in UCX: basic support (minus CUDA IPC) is in testing now.
    • Should there be a warning message when running on iWARP?
    • What's the roadmap for this? 3.x or 4.x?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017
