Request refactoring test

Jeff Squyres edited this page Jul 26, 2016 · 32 revisions

Per the discussion on the 2016-07-26 webex, we decided to test several aspects of request refactoring.

Below is a proposal for running various tests / collecting data to evaluate the performance of OMPI with and without threading, and to evaluate the performance after the request code refactoring. The idea is that several organizations would run these tests and collect the data specified.

1. Single threaded performance

Benchmark: OSU benchmarks v5.3, message rate test (osu_mbw_mr)

Run the osu_mbw_mr benchmark to measure how single-threaded performance has changed relative to the code from before all the threading improvements / request refactor (*).

(*) NOTE: Per https://github.com/open-mpi/ompi/issues/1902, we expect there to be some performance degradation. Once this issue is fixed, there should be no performance degradation. If there is, we should investigate/fix.

Open MPI versions to be tested

  • 1.10.3
  • 2.0.0
  • Master, commit before request refactoring (need to find a suitable git hash here, so that we all test the same thing)
    • Make sure to disable debugging! (this is likely from before we switched master to always build optimized)
  • Master head (need to agree on a specific git hash to test, so that we all test the same thing)
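Once each version has been run, the change in message rate can be expressed relative to a common baseline so that results from different organizations are comparable. The following is a minimal sketch of that arithmetic. It assumes 1.10.3 is used as the baseline, and the function names and example numbers are hypothetical placeholders, not measured results.

```python
def relative_change(baseline_rate, candidate_rate):
    """Fractional change of candidate vs. baseline (negative means slower)."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (candidate_rate - baseline_rate) / baseline_rate


def summarize(rates, baseline="1.10.3"):
    """Map each version label to its percent change against the baseline."""
    base = rates[baseline]
    return {
        version: round(100.0 * relative_change(base, rate), 2)
        for version, rate in rates.items()
    }


if __name__ == "__main__":
    # Placeholder message rates for illustration only; NOT measured data.
    example = {"1.10.3": 1000.0, "2.0.0": 950.0, "master-head": 900.0}
    for version, pct in summarize(example).items():
        print(f"{version}: {pct:+.2f}%")
```

Comparing at a fixed message size (or reporting the change per message size) keeps the numbers meaningful, since osu_mbw_mr reports a rate for each size.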

2. Multithreaded performance

Benchmark: OSU benchmarks v5.3, message rate test (osu_mbw_mr)

This test should spawn an even number of processes on a single server (using the vader BTL). Each process (or each thread, in the third configuration below) should do a ping-pong with a peer in the same NUMA domain.

  1. 16 processes/1 process per core, each process uses MPI_THREAD_SINGLE
    • Use the stock osu_mbw_mr benchmark
    • This is the baseline performance measurement.
  2. 16 processes/1 process per core, each process uses MPI_THREAD_MULTIPLE
    • Use the stock osu_mbw_mr benchmark, but set the OMPI_MPI_THREAD_LEVEL environment variable to 3, which selects MPI_THREAD_MULTIPLE
    • The intent of this test is to measure the performance delta between this test and the baseline. We expect the performance delta to be nonzero (because we are now using locking/atomics -- especially once https://github.com/open-mpi/ompi/issues/1902 is fixed).
  3. 1 process/16 threads/1 thread per core (obviously using MPI_THREAD_MULTIPLE).
    • Use Arm's test for this (which essentially runs osu_mbw_mr in each thread).
    • The intent of this test is to measure the performance delta between this test and the baseline. We expect the performance delta to be nonzero (because we are now using locking/atomics -- especially once https://github.com/open-mpi/ompi/issues/1902 is fixed).

If either the 2nd or the 3rd test differs substantially from the baseline, we will need to investigate why.
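To help everyone launch the first two configurations the same way, here is a small sketch that builds the mpirun command lines. It only constructs the argument lists and does not run anything. The helper name and the `./osu_mbw_mr` path are hypothetical, and the choice of `--bind-to core` and `--mca btl vader,self` is an assumption about how to get one process per core over the vader BTL. The third configuration is left out because this page does not specify how Arm's test is invoked.

```python
def build_run(benchmark, nprocs=16, thread_multiple=False):
    """Return an mpirun argument list for one single-node configuration."""
    if nprocs % 2 != 0:
        raise ValueError("the test calls for an even number of processes")
    cmd = [
        "mpirun",
        "-np", str(nprocs),
        "--bind-to", "core",           # assumption: gives 1 process per core
        "--mca", "btl", "vader,self",  # shared-memory BTL plus self
    ]
    if thread_multiple:
        # -x exports an environment variable to the launched processes;
        # 3 selects MPI_THREAD_MULTIPLE, as described in configuration 2.
        cmd += ["-x", "OMPI_MPI_THREAD_LEVEL=3"]
    cmd.append(benchmark)
    return cmd


if __name__ == "__main__":
    # Hypothetical benchmark path; adjust to the local OSU build.
    print(" ".join(build_run("./osu_mbw_mr")))
    print(" ".join(build_run("./osu_mbw_mr", thread_multiple=True)))
```

Keeping the two command lines identical apart from the thread-level setting means any measured difference can be attributed to the threading level rather than to binding or transport selection.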

Open MPI versions to be tested

  • 2.0.0
  • Master, commit before request refactoring (need to find a suitable git hash here, so that we all test the same thing)
    • Make sure to disable debugging! (this is likely from before we switched master to always build optimized)
  • Master head (need to agree on a specific git hash to test, so that we all test the same thing)

3. Multithreaded performance, showing request refactoring benefits

The goals of the request refactoring were to:

  1. Decrease lock contention when multiple threads are blocking in MPI_WAIT*
  2. Better allow non-MPI threads to progress when multiple threads are blocking in MPI_WAIT*

Benchmark: 2 threads per process, 16 processes in total.

  • Thread A performs matrix multiplication and measures its FLOPS.
  • Thread B calls MPI_Wait on n requests.

GOAL: The FLOPS measured in thread A should increase significantly with the refactored request code.
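The structure of this benchmark can be sketched as follows. This is only an illustration of the measurement harness, not the real test: `threading.Event` objects stand in for MPI requests, the blocking loop in thread B stands in for the MPI wait, and all names are hypothetical. Because of Python's global interpreter lock, the absolute numbers are meaningless; the actual benchmark would presumably be written in C against MPI.

```python
import threading
import time


def matmul(a, b):
    """Naive dense multiply of two square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def compute_thread(stop, result, n=16):
    """Thread A: multiply n x n matrices until told to stop, counting flops."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    flops = 0
    start = time.perf_counter()
    while True:
        matmul(a, b)
        flops += 2 * n ** 3  # roughly n^3 multiplies and n^3 adds
        if stop.is_set():
            break
    result["flops_per_sec"] = flops / (time.perf_counter() - start)


def wait_thread(requests, stop):
    """Thread B: block until every request completes (stand-in for MPI wait)."""
    for req in requests:
        req.wait()
    stop.set()


def run(n_requests=4, delay=0.2):
    """Run both threads and return the FLOPS rate seen by thread A."""
    stop = threading.Event()
    result = {}
    requests = [threading.Event() for _ in range(n_requests)]
    a = threading.Thread(target=compute_thread, args=(stop, result))
    b = threading.Thread(target=wait_thread, args=(requests, stop))
    a.start()
    b.start()
    time.sleep(delay)   # pretend the messages are in flight
    for req in requests:
        req.set()       # "complete" every outstanding request
    a.join()
    b.join()
    return result["flops_per_sec"]


if __name__ == "__main__":
    print(f"thread A rate: {run():.0f} flops/s")
```

The quantity of interest is the rate reported by thread A while thread B is blocked: the less the waiting thread contends for locks or spins, the more of the core's time thread A gets.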

Open MPI versions to be tested

  • Master, commit before request refactoring.
  • Master head
  • 2.0.0