Meeting Minutes 2017 07

Face to Face - July 11, 2017

Attendees:

  • Ralph Castain (Intel)
  • Jeff Squyres (Cisco)
  • Brice Goglin (Inria)
  • Brian Barrett (AWS) [only Tuesday and Wednesday]
  • Mohan Gandhi (AWS)
  • Shinji Sumimoto (Fujitsu)
  • Takahiro Kawashima (Fujitsu)
  • Nathan Hjelm (LANL)
  • Howard Pritchard (LANL)
  • George Bosilca (UTK) [at least partially]
  • Edgar Gabriel (UH)
  • Artem Polyakov (Mellanox)
  • Matthew Dosanjh (SNL)
  • Geoff Paulsen (IBM)
  • Geoffroy Vallee (ORNL)

First, starting off with the normal weekly agenda

  • v1.10 - let's move this to unsupported.
    • stop making nightly tarballs.
  • v2.0.x and v2.1.x
    • Nothing much. PRs keep trickling in.
  • PR 3717 - declare it a bug.
  • v3.0.0
    • Some outstanding PRs that are just waiting for Reviews.
    • Thursday morning - going out with --enable-builtin-atomics by default (a different default than in v2.1.x).
      • --disable-builtin-atomics needs testing in v3.0.0
      • goal is to improve performance in v3.0.1 by changing default back to --disable-builtin-atomics.
  • BLOCKER - Serious issue on v3.0.0
    • Vader - https://github.com/open-mpi/ompi/issues/3821
    • RDMA interface inside of the OB1 rget protocol.
      • vader, ugni, and openib all support rget.
    • The fix is to remove the fragmentation, but this ruins the BTL pipelining.
    • It LOOKS like it's under the eager limit, but the message was actually much bigger; when you look at it, the size is wrong.
  • Schedule for v3.0.0 - originally scheduled for June.
    • Took a little longer due to PMIx and the testing infrastructure.
    • Will talk later about testing tarballs for release, rather than git clone.
    • Howard will ping NVIDIA about testing.
    • v3.0.0 PMIx 2.0 is good. Will merge last bugfix.
    • Outstanding Blockers
    • Do another RC tomorrow and get feedback.

---

Discussion

---

☎️ Should we forward all OMPI_ env vars from mpirun environments to started process environments?

  • If so, should we also forward ORTE_ and OPAL_ env vars?
  • Or should we only forward OMPI_MCA_ env vars?
    • NOTE: current master forwards all OMPI_ env vars
  • Should we make a non-OMPI_MCA_ prefix that we also forward, but something less than all of OMPI_? (E.g., OMPI_FORWARD_, or something better)
  • What about non-OMPI MCA params (e.g., PMIX_MCA)?
    • Just env vars, or do we add a registration function for cmd line support (e.g., -pmca foo x)?
  • PMIx-based resource managers are using a mechanism that allows us to do whatever we want before they launch the job.
    • Intel is using this to set the PSM2 security token, and the resource manager propagates that.
    • Could use this mechanism for PMIx to read a conf file to understand which env vars to ask the PMIx resource manager to propagate. This would be the portable way to do this.
  • With mpirun, there are lots of ways to do this.
  • SLURM has a conf file to tell it what env to propagate; out of the box, SLURM doesn't do this.
  • There are environment variables that people set that are not OMPI_ env vars; the SLURM plugin, for example, doesn't know to pull THOSE env vars.
  • If you have multiple MCA components, do we allow for multiple types of command line options, for example:
    • -mca - Open MPI MCA components.
    • -pmca - PMIx MCA components.
  • Two pieces:
    1. Have an env pattern: PMIX_MCA_*
    2. mpirun -mca rmaps_base_verbose 5 - now want to set a PMIx MCA parameter too?
      • Do we want to provide this command line propagation?
      • Nathan's proposal: --pmix-mca - just uppercase the string before the "-mca" and set that env var.
  • Some discussion about RegEx package for describing what to propagate.
  • SLURM always forwards everything, except a few tricky env vars.
  • Some way during configuration for components to register prefixes for mpirun to forward (see the sketch below).
    • Will handle multiple prefixes.
  • Ralph will write something up, and we can discuss in PR.
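
Ralph's write-up will be the authoritative proposal; purely as an illustrative sketch of the registration idea (the function names and behavior below are invented, not an existing Open MPI API), the forwarding filter could look something like:

```python
import os

# Hypothetical: prefixes registered by components that want their
# environment variables forwarded from mpirun to launched processes.
FORWARD_PREFIXES = ["OMPI_MCA_"]

def register_forward_prefix(prefix):
    """Let a component (e.g., PMIx) register a prefix so that mpirun
    forwards matching env vars to the started processes."""
    if prefix not in FORWARD_PREFIXES:
        FORWARD_PREFIXES.append(prefix)

def env_to_forward(environ=None):
    """Select the subset of mpirun's environment to propagate."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items()
            if any(k.startswith(p) for p in FORWARD_PREFIXES)}

def project_mca_to_env(project, param, value):
    """Nathan's --pmix-mca idea: uppercase the string before '-mca'
    and set that env var, e.g. '--pmix-mca foo x' -> PMIX_MCA_foo=x."""
    return "%s_MCA_%s" % (project.upper(), param), value
```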

Brian discusses the release process.

  • Demonstrates how to build release tarball through Jenkins.
  • Proposal is to build a release tarball from a nightly tarball.
    • Previously it was hard to build a release from anything other than the tip of the release branch.
  • Today, nightly tarballs are still built somewhat differently.
  • Want to fix the www git repo (it's 5 or 6 GB; it should be 5 or 6 MB).
  • About 2-3 weeks away from building release tarballs exactly the same as nightly.
  • Instead of cron, Brian would propose we use Jenkins to build.
  • Need to make building the tarballs the "same" between nightlies and release.
  • Discussion
    • Need a better way to do NEWS and AUTHORS (should be able to totally automate AUTHORS)
    • Could cause merge conflicts if done in PRs, and then one would have to rebase (BAD).
    • Can we get a NEWS decoration to commit messages on branches so that we know what to put in NEWS?
      • People generally like the idea that developers should add NEWS worthy things to the NEWS file themselves in their PRs.
    • We have full NEWS of all older releases in NEWS.
    • AUTHORS - Jenkins can run a script and auto-commit to the release branch or create a PR.
      • Alternate: what if we just commit the script, and not the AUTHORS file, so that AUTHORS only ends up in the tarball? (See the sketch after this list.)
        • The script needs the mailmap that lives in ompi.
        • git username / email.
        • Organization list.
      • Some drawbacks to an organization list.
      • General preference to just auto-generate the list.
      • When you clone, you won't get AUTHORS; there might be some legal implications.
  • Brian has some fixes for how nightly tarballs are built, and he will PR these to hwloc and PMIx.
    • The goal is that everyone who has Open MPI community access should be able to log in to Jenkins.
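
A minimal sketch of the auto-generated AUTHORS idea discussed above, assuming the "mailmap that lives in ompi" is a standard git .mailmap (the script itself is hypothetical):

```python
import subprocess

def generate_authors(repo_dir="."):
    """Collect unique author names/emails from git history; the %aN/%aE
    placeholders honor the repo's .mailmap, folding duplicate identities
    (git username/email variations) into one entry each."""
    log = subprocess.run(
        ["git", "log", "--use-mailmap", "--format=%aN <%aE>"],
        cwd=repo_dir, capture_output=True, text=True, check=True).stdout
    return sorted(set(log.splitlines()))

if __name__ == "__main__":
    # Generate AUTHORS at tarball-build time instead of committing it.
    with open("AUTHORS", "w") as out:
        out.write("\n".join(generate_authors()) + "\n")
```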

CI - Brian shows off some Jenkins fun.

  • Open MPI does some CI on each commit.
  • Looking at Jenkins: the multijob project open-mpi.pull_request.

Lunch - Mexican. Thanks Amazon!

The git way

  • Now that we've been using git for a while, let's re-discuss how we want to handle release branches.
  • Two models to discuss:
    1. Everyone PRs to master, and then we MERGE to release branches. Possibly additional commits on release branch to update versions, etc.
    2. A model where bugfixes go onto release branches and are merged back to master, and new features go onto master.
  • Brian's v3.0 experiences.
    • Goal should be to keep release branch shorter (perhaps 1 month?)
    • Need to get developers to be 'okay' with reverting commits that cause failures.
    • Cherry-picking during v3.0.0 has been painful because we broke ABI compatibility.
  • One of IBM's pain points is the "bugfixes must go to master first"
    • reasons for that requirement:
      • Keeps discussion in one place (on the master PR, do that one first).
      • Have to go to master anyway.
  • Discussion about how to query MTT database to get info about "is the product better today than yesterday"
  • For the next MTT meeting:
    • Having MTT submitters use the new RESTful API approach (Cisco, Amazon, and IBM volunteered to test this first).
    • Discuss whether we want to go with a known-failure file or a do-not-run list.
    • Figure out if and how to drive failure results to 0, so that we can trigger on failures.
  • ☎️ Revisit this old discussion: should we continue cherry-picking from master to release branches?
    • The Git Way is usually to merge from master to release branches
      • (Artem) A few comments: my impression is that the Git way is vice-versa (https://www.atlassian.com/git/tutorials/comparing-workflows#gitflow-workflow). It assumes the following types of branches:
        • develop (persistent, where all new features go),
        • master (persistent, where all the releases are, each marked with a tag)
        • feature (temporary, branched from develop, merged back: for the temp work on a new feature)
        • hotfix (temporary, branched from master, merged back: to fix post-release bugs). (!) Once the hotfix is merged to master, master is merged back to develop (not vice-versa) to keep develop consistent with master.
        • release (temporary, branched from develop, merged to master: to harden before the next release)
      • Currently: a) our master = develop; b) we don't have a master equivalent; c) we keep release branches, which force us to do cherry-picking, and we sometimes have problems with lost commits.
      • This is not to say that we should follow this; one disadvantage I already see is that it's not easy to support old releases, since release branches are eliminated once things stabilize. Just something to keep in mind.
    • This puts more emphasis on master to be more stable. But maybe with all of our new CI, master is more stable these days...?
    • There are pros, cons, and differences: e.g., things wouldn't go on master unless we intend to merge them to release branches.

We now have options for merging PRs.

  • Continue the way we do now (merge at current head)
  • Rebase and merge (i.e., much more of linear history)
  • Rebase and squash - we think this is awful; developers should squash as appropriate themselves.
  • For master, do we first rebase folks' branches before merging to the top of master?
  • USE CASE: rebasing a really old branch with multiple commits is painful because you have to fix EACH commit on the branch... but if you MERGE, git figures that out.
    • A tool to help with rebasing a really old branch without MERGING: "git-rerere".
  • Problem is that the git timestamps end up out of order...
    • Use --topo-order to show git log in topological order, rather than chronological.
  • When you rebase and merge, each commit gets a new timestamp and a new commit hash at rebase time.

UCX packaging in OMPI sources (Mellanox)

  • Want this in OMPI v3.1
  • Configuration prerequisites
    • When do we turn it on? (Check available fabrics; TCP support should be available soon, then UCX can be always on.)
  • How new versions get updated.
  • Placement inside the sources: needs to be available for both MPI and SHMEM layers.
  • Any precedent? libfabric was shipped embedded in Open MPI from Dec 2nd, 2014 until it was removed on June 9th, 2015.
    • Temporary measure.
  • If configure can find libucx on the box, configure can just enable it (without --with-ucx), and then use the normal selection logic to use UCX.
  • Did not re-discuss embedding UCX.

Discussed the selection logic problem

  • A new framework, run before opening the PML, to gather what transports are available.
  • During init, have a transport framework: PSM and the other transports that have a common library in ompi would share their initialization through it.
  • Transport_init(...),
    • returns: latency, bandwidth, whether it can talk to remote peers (special case: SHMEM == 0). (See the strawman sketch below.)
    • components: ucx, libfabric, tcp, sm, self?, gni, usnic
  • Brian will write up a description of the proposed 'Transport Framework' that is meant to solve this and related common issues.
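
Brian's write-up will define the real interface; as a strawman only (all names and the selection rule below are invented for illustration), the query side might look like:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TransportInfo:
    name: str
    latency_us: float      # estimated point-to-point latency
    bandwidth_gbps: float  # estimated peak bandwidth
    reaches_remote: bool   # can talk to procs on other nodes
                           # (special case: shared memory -> False)

# Hypothetical registry, filled during init by each component:
# ucx, libfabric, tcp, sm, self, gni, usnic, ...
AVAILABLE: List[TransportInfo] = []

def transport_init(info: TransportInfo) -> None:
    """Each component reports its capabilities once, instead of every
    PML probing the available fabrics independently."""
    AVAILABLE.append(info)

def pick_transport(need_remote: bool) -> Optional[TransportInfo]:
    """Toy selection rule: among usable transports, lowest latency wins;
    real logic would also weigh bandwidth and capabilities."""
    usable = [t for t in AVAILABLE if t.reaches_remote or not need_remote]
    return min(usable, key=lambda t: t.latency_us, default=None)
```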

v3.1 versus v4.0

  • The thing that we thought drove a backwards incompatibility was George's datatype work, but due to delays in v3.0, that was merged there.
  • At this time we believe there is no ABI break in master, so the next release will be 3.1, NOT 4.0.
  • IBM will run some IMB testing between 3.0 and master to see if there are any other ABI breaks.
  • IBM will also read through the logs to see if anything looks suspect.

deprecating internal components

  • We will deprecate internal components (hwloc, libevent, ___) in v3.1
  • In v3.1 we will look for an external component first, and fall back to the internal one if none is found.
  • In v3.1, if we select the internal component, we will emit a message that it is deprecated (see the sketch below).
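
The lookup order in pseudocode (illustrative only; the real logic would live in configure/m4, and the helper names here are invented):

```python
def choose_component(name, find_external, find_internal, warn):
    """v3.1 lookup order: prefer an external installation; fall back to
    the bundled (internal) copy and warn that it is deprecated."""
    ext = find_external(name)      # e.g., probe the system for hwloc
    if ext is not None:
        return ext
    warn("internal copy of %s is deprecated; please install an "
         "external one" % name)
    return find_internal(name)     # bundled copy as last resort
```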

deprecate MPIR Interface in v3.1 in favor of PMIx for communication between tools and MPI

  • MPIR is still current, and older debuggers haven't ported away from it.
  • Can't really make both the PMIx and MPIR methods work, because the MPIR API is mostly memory locations, not API calls.

Signal propagation

  • Came up on the user list again, this time wanting a way to signal only child procs that call MPI_Init (and not any intermediate procs such as shell scripts)
  • Ralph added an MCA param to either hit only direct children, or all descendants of those children - but that's not exactly what the user requested.
  • What we do now is probably fine??? [Some discussion was missed here.]

Fall 2017 release features?

  • release date Oct 31st, 2017
  • Cut-off would be Sept 1st, 2017 (trying to cut the release branch's time-to-live down to 2 months).
  • Will cut the branch off of master, and name it either 3.1 or 4.0 based on whether there is ABI breakage in master on Sept 1st.
  • IBM strongly prefers no new ABI breaks in Fall 2017 release, and can help in the effort to test ABI breakage.
  • Discuss CI testing to test for ABI breakage.
    • Checking the MPI API can be done, but that is also a testing effort.
    • Fortran can be difficult, especially F90 and "use mpi".
  • IBM will write some small CI tests to detect SOME ABI breaks (communicator size/structure changes, perhaps others); see the sketch after this list.
  • IBM will add some CI to build against 3.0 and run with master.
  • What PMIx version? v2.1 - will we need to BLOCK for this?
    • No - PMIx v2.1 is API/ABI compatible; it adds more compatibility.
  • What about sm BTL?
    • Either just remove sm for 3.0.0,
    • or in Fall 2017 add aliasing so sm will use vader (keeps backwards compatibility).
  • TCP performance enhancements.
  • From MPI forum - Comm_Asserts - allow_overtake, and no anysrc.
  • A configure flag to remove the symbols that MPI-3.0 removed.
    • Kind of iffy: configured one way it's ABI compatible, the other way it's not.
  • May or may not get selection logic in.
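
One way the small CI tests mentioned above could catch a subset of ABI breaks (a sketch: comparing exported symbols only, so it misses struct-layout changes such as the communicator-size change; the library paths are hypothetical):

```python
import subprocess

def exported_symbols(libpath):
    """List the dynamic, defined symbols of a shared library via nm."""
    out = subprocess.run(["nm", "-D", "--defined-only", libpath],
                         capture_output=True, text=True, check=True).stdout
    return {line.split()[-1] for line in out.splitlines() if line.strip()}

def removed_symbols(old_lib, new_lib):
    """Symbols exported by the old library but missing from the new one;
    any removal is an ABI break (additions are fine)."""
    return exported_symbols(old_lib) - exported_symbols(new_lib)

# Example usage (paths are illustrative):
# missing = removed_symbols("v3.0/lib/libmpi.so", "master/lib/libmpi.so")
# assert not missing, "ABI break: removed symbols %s" % sorted(missing)
```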

Do we care about Strict C99 stuff (e.g., pointer to constant)

  • no

How to better track PRs across multiple release branches?

  • E.g., ensure it has already been merged to master
    • Don't know how to automate that.
  • E.g., ensure that we merge at vX only when it has been merged at all desired versions < vX
  • One possibility: should we always make an issue, and put a tag on it for each version that a given PR is merged against?
  • Can this be automated via bot somehow?
  • Discussion
  • More Messages for Developers, and thank-yous for folks finding bugs.
  • Create a bot to watch issues, and another (or whatever) to watch PRs.
  • Use Cases:
    1. PR on master, no issue.
      • PR bot sees no "issue" reference, and sets the status to success.
        • Decided that requiring an issue reference on all PRs is too heavy.
      • Issue bot does nothing.
    2. PR on master with "fixes #123"
      • PR bot checks for a "target:*" label on the issue: fail if none, else success.
      • Once merged, the "commit bot" sees the issue, removes "target:master", and adds "target:mergedto_master".
      • The "issue bot" sees the close; if more "target:*" labels remain, it reopens the issue with a friendly note.
    3. New PR on branch with "fixes #123"
      • PR bot doesn't check issue.
      • Commit bot sees the issue, removes "target:THIS", adds "target:mergedto_THIS"; if the number of remaining targets == 0, it comments "Please close me". (A toy encoding of these transitions follows this list.)
  • Brian's going to put the picture in.
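
Until the picture lands, here is a toy encoding of the label transitions in use cases 2 and 3 (the label names come from the notes above; everything else is invented):

```python
def on_pr_merged(branch, labels):
    """Commit-bot step: after a PR with 'fixes #N' merges to <branch>,
    flip that branch's target label to its 'mergedto' form."""
    if "target:" + branch in labels:
        labels.remove("target:" + branch)
        labels.add("target:mergedto_" + branch)
    # Targets still awaiting a merge on some branch:
    pending = [l for l in labels
               if l.startswith("target:") and "mergedto" not in l]
    return "Please close me" if not pending else None

# Example: an issue targeted at both master and v3.0.x.
labels = {"target:master", "target:v3.0.x"}
print(on_pr_merged("master", labels))   # None: v3.0.x still pending
print(on_pr_merged("v3.0.x", labels))   # "Please close me"
```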

We have a lot of issues open.

  • Many are issues that we keep going back and forth on.
  • Some are kept open while waiting for the customer to reply.
  • Could have a 7-day auto-resolve state: if no one ever comes back, the issue just auto-closes after 7 days.
  • Customer issue -> add a comment "Think I fixed..." -> tag "autoclose".
  • Cron: for all issues, if tagged "autoclose" and last update > 7 days ago: comment and close (sketched below).
  • Issue bot: on a new update, if the "autoclose" tag is present, remove it.
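
A sketch of that cron job against the GitHub REST API (the repo name, token handling, and message text are illustrative):

```python
import datetime
import requests

API = "https://api.github.com/repos/open-mpi/ompi"   # illustrative
HEADERS = {"Authorization": "token <redacted>"}      # hypothetical token

def autoclose(days=7):
    """Close 'autoclose'-tagged issues with no activity for `days` days."""
    issues = requests.get(API + "/issues", headers=HEADERS,
                          params={"labels": "autoclose",
                                  "state": "open"}).json()
    cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=days)
    for issue in issues:
        updated = datetime.datetime.strptime(issue["updated_at"],
                                             "%Y-%m-%dT%H:%M:%SZ")
        if updated < cutoff:  # nobody came back within the window
            n = issue["number"]
            requests.post(API + "/issues/%d/comments" % n, headers=HEADERS,
                          json={"body": "No response in %d days; closing."
                                        % days})
            requests.patch(API + "/issues/%d" % n, headers=HEADERS,
                           json={"state": "closed"})
```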

Move the entire Open MPI web site behind a CDN?

  • If so, we can remove the mirrors program
  • Discussion for Jeff and Brian: is it contingent on S3? No. Just need to move some SSL certs.
  • Michigan creates the SSL certs; we need to give them a heads up.

Do we want to get rid of HostGator?

  • It would make some things easier, since we can't use some plugins on S3.
  • Some folks still use it for IMAP and outgoing mail.
  • Leave the mailbot running here.
  • We have it for 2 more years.

Investigate shared location for OMPI organization secrets/keys/passwords (e.g., LastPass? 1Password? ...?)

  • There are a small number of secrets: AWS root credentials and some other passwords.
  • Does SPI have a safe we can put an envelope in?
  • Jeff S will take a to-do to set this up, and give Brian, Ralph, and Jeff S the keys.

hwloc integration

  • Easy way to disable hwloc internals such as NVML from OMPI's configure?
  • How to deal with hwloc 2.0 ABI break (2 components?)

Proposal for OMPI signed-off-by policy:

  1. Do not grandfather old commits
  2. If you cherry-pick someone else's commit, you need to sign off.
    • Not necessary for legal reasons.
  • There is no legal implication for cherry-picking something already in the repo.
  • The sign-off requirement is about putting things INTO the repo.
  • Everything already in the repo has already shipped.
  • Not an issue.

Threading model

Rankfile mapper: Ralph can no longer maintain it. Who will become the maintainer? (IBM volunteered)

  • Ralph and IBM have a call next week to transition.

Shall we link components against their native main library - e.g., ORTE components to libopen-rte?

  • See https://github.com/open-mpi/ompi/issues/3705
  • Required reading before the discussion: https://github.com/open-mpi/ompi/wiki/Linkers
    • Default is case 2.
    • --enable-static (builds libmpi.a, still builds libmpi.so, and disables dlopen)
      • case 7 AND 16.
    • LOCAL is case 14, and doesn't work for Python bindings.
  • Remember: there is a workaround, --disable-dlopen (i.e., cases 4 and 16 in the tables on that wiki). But that doesn't help if the OS/distro installs a "case 2" Open MPI by default.
  • PROBLEM - Python dlopen()s libmpi LOCAL. Are the linkers smart enough to recognize that the library is already loaded?
  • Brian, Jeff, and Ralph all remember this from a LONG time ago, and explicitly changed these to NOT do this because of customer issues.
    • CAN'T remember the details. It made things consistent, and was mentioned because it had caused problems somewhere, but there's no concrete example.
  • Distros don't build with --disable-dlopen, so Python bindings have a problem when using distro versions.
  • The workaround the Python bindings currently use is described in the wiki (and sketched below).
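
For reference, the usual Python-side workaround is to pre-load libmpi with global symbol visibility before the MPI extension module is imported, roughly:

```python
import ctypes

# Load libmpi with RTLD_GLOBAL so that the components Open MPI later
# dlopen()s can resolve libmpi symbols; Python otherwise loads extension
# modules with RTLD_LOCAL (the "case 2" problem above).  The exact
# library name/path varies by platform.
ctypes.CDLL("libmpi.so", mode=ctypes.RTLD_GLOBAL)

# Only after this, import/initialize the MPI bindings.
```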

Thursday Morning

Fujitsu Status

  • Presented status.
  • Running on ARMv8 (now named AArch64).
    • Discussion about support for ARMv8.
    • Fujitsu doesn't care about ARMv8 on v2.0.x or v2.1.x.
    • Fujitsu cares about v3.0.0 ARMv8 support.
  • Fujitsu backported some assembly from v3.0 to v2.0.x and v2.1.x.
    • On v3.0.0, Open MPI enables built-in atomics by default.
      • But the performance is worse (we don't know why).
  • ARM (AArch64) support in Open MPI (the code currently lists v4-v8).
    • Delete v4 and v5.
  • PR #3701 Non-PML persistent requests
    • Support non PML persistent requests.
    • Wants to 'schedule' the requests in a start_all situation to improve bandwidth.
  • PR #3700 Hang-up detection feature
    • George suggested using an external tool with debug interface to check message queues on a timer. This keeps the timer out of the critical progress loop.
    • It may not detect actual deadlocks.
    • Even when the timeout is disabled, it adds overhead to the critical synchronization/progress path.
    • George proposed an alternative solution that doesn't happen in the syncs, and would only cost a single global-variable modification. He will comment in the PR.

Discussion about requiring C99 or C11(?) compilers to build Open MPI

  • mpi.h must stay at C89.
  • Going to create an issue listing the compiler versions whose support will be removed in v3.1.

Removing btl_sm for v3.0.0

  • Discussion that there's no time to do aliasing.
    • But it's not needed, since in v3.1 we'll have some transport aliasing.
    • Decided Jeff will remove btl_sm and replace it with a stub btl_sm that aborts and prints a message to use vader when btl_sm is explicitly requested.

Remove CR from master before we branch for v3.1.x

  • It was removed for v2.0, but not removed from master.
  • So it is STILL in v3.0.0 (which was branched from master).
  • George will go through and keep the parts he wants, and remove the rest.
  • So just do the simple removal of the CR configure flag for v3.0.0; better removal can come later.
    • Ralph cautioned about one tricky configure variable we need to be careful with.

Just realized that we need to do our administrative checking of who has commit access.

  • Jeff will send out an email or a doc for people to comment on.

Multithreaded one-sided - it's buggy; just fix bugs or refactor?

Desire to break ABI for Datatype change to help libraries (like ROMIO /

  • Will wait until after branching for v3.1 at the end of September before committing this to master.

Splitting Opal

  • Other projects are using OPAL because it fits their needs; they get MCA and other functionality.

  • Some projects are renaming symbols, and others are not.

  • PMIx has its own opal, SCX has its own opal (which PMIx uses for communication), and OMPI of course has its own OPAL.

  • PROBLEMS:

    • These projects have to keep syncing with OPAL
    • These projects have multiple redundant copies of opal.
    • PMIx will take a while to adopt any new solution if OPAL becomes a separate GitHub project.
  • Potential solution

    • OPAL could be modified to take and return a context to support multiple clients (see the toy sketch after this list).
    • Would need to think about multiple threads calling into OPAL simultaneously.
  • Projects need: timers, atomics,

  • Would need to do something similar for OMPI for MPI Sessions.

  • What about datatypes in OPAL? There are just 8 basic datatypes in OPAL, and all MPI datatypes map onto them. It's probably okay to keep this in OPAL.

  • Can start with a default context, and begin by moving global variables into the context. Then decide later whether we want static or dynamic contexts.

  • Where will this code live?

    • If it's a separate repo inside the open-mpi organization on GitHub, then devs would have to check out BOTH, and that would be PAINFUL.
      • Would also require separate versioning, release managers, etc...
    • Can do much of this work in place before separating it out.
    • If one of the goals is to have a repo you can just get OPAL from, that could be automated (rather than using submodules).
  • Still have versioning issues.

    • OPAL (really open-pal, since some other libopal RPM exists out there) has always been viewed as internal, and versioned with Open MPI.
    • Could provide a script to help with symbol renaming, to allow multiple versions of OPAL in a single process for NOW.
      • Long term, in a perfect world, we'd want just one OPAL in the process... but it's lots of work to solve this long term.
    • Discussion about a script to push upstream to make symbol renaming easier for the short term.
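
A toy sketch of the "take and return a context" idea, with a default context preserving today's single-client behavior (the API is invented for illustration; the real work would be in C):

```python
class OpalContext:
    """Per-client state that today lives in OPAL's globals."""
    def __init__(self, symbol_prefix="opal"):
        self.symbol_prefix = symbol_prefix  # for per-client renaming
        self.mca_params = {}
        self.initialized = False

_default_ctx = OpalContext()  # existing single-client callers keep working

def opal_init(ctx=None):
    """Initialize OPAL for one client; returns the context that the
    client passes back into every subsequent OPAL call."""
    ctx = ctx if ctx is not None else _default_ctx
    ctx.initialized = True
    return ctx

# Multiple clients (OMPI, PMIx, ...) in one process, each isolated:
ompi_ctx = opal_init(OpalContext(symbol_prefix="ompi"))
pmix_ctx = opal_init(OpalContext(symbol_prefix="pmix"))
```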

☎️ CI

  • What can we do about the fragility of the Jenkins infrastructure?
    • It seems like one or more of the CIs is broken every week due to lost connections or changed protocols, thereby blocking all commits.
  • Other random CI updates.
  • Release process updates.
    • Where should Open MPI downloads live?
      • OMPI web site (probably not)
      • S3
      • GitHub

PMIx working group meetings

  • Network
  • Tiered Storage
  • OpenMP/MPI coordination
  • Language bindings as apps begin using PMIx? (Ralph volunteers to do Fortran!)

Old issue about BTL progress functions: https://github.com/open-mpi/ompi/issues/1695

  • This was already done.

MPI_File backing file location

Automate reduction of symbol name pollution?

SPI: Any updates / action items?

  • No updates. Pretty sure they got the money.

  • Other pending PRs that require any discussion...?

    • ...