Jeff Squyres edited this page Dec 5, 2021 · 7 revisions

Issue #3003 and related issues/PRs contain extensive discussion of the timers used in Open MPI. This page collects that information and explains the implementation as of March 2019 for Open MPI developers.

On this page, a "timer" is a function that returns the current time, expressed as the amount of time elapsed since some point in the past. In the MPI world, a timer can be used to implement the MPI_WTIME routine.

Characteristics of timers

Which timers are available depends on the system. Each timer has its own characteristics.

Accurate time

Monotonic time

Time should increase monotonically. In other words, time should not go back into the past.

If a timer is implemented using a CPU cycle counter and a system has multiple cores, time may not increase monotonically when a process is migrated to another core, especially a core on another socket.
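One way to picture the monotonicity requirement is a small guard that never hands out a value smaller than one it has already returned. The following is only a sketch of that idea, not something Open MPI is known to do; the names mono_guard_t and mono_guard_filter are hypothetical, and the guard is not thread-safe.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Open MPI): remembers the largest
 * reading handed out so far. */
typedef struct {
    uint64_t last; /* largest value returned so far */
} mono_guard_t;

/* Return the raw reading if it moved forward. Otherwise repeat the
 * previous value, so the caller never sees time go backwards. */
static uint64_t mono_guard_filter(mono_guard_t *g, uint64_t raw)
{
    if (raw > g->last) {
        g->last = raw;
    }
    return g->last;
}
```

A reading that steps backwards (for example after migration to a core with a lagging counter) is reported as "no time passed" rather than as negative elapsed time. The guard hides the symptom only; it does not make the underlying counter accurate.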

Constant rate tick

Time should increase at a constant rate compared to real time.

If a timer is implemented using a CPU cycle counter, time may not increase at a constant rate when the frequency of the CPU changes.
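The effect of a frequency change can be seen in the arithmetic used to turn a cycle count into seconds. The sketch below assumes a hypothetical helper, cycles_to_seconds, that divides a cycle delta by the frequency the caller believes the counter runs at.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle-count interval to seconds,
 * using the frequency the caller assumes the counter runs at. */
static double cycles_to_seconds(uint64_t start, uint64_t end,
                                uint64_t assumed_hz)
{
    return (double)(end - start) / (double)assumed_hz;
}
```

For example, 3,000,000,000 cycles convert to 1.0 second when 3 GHz is assumed. If the CPU was actually running at 1.5 GHz while those cycles were counted, the real elapsed time was 2.0 seconds, so the timer under-reports by a factor of two.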

High resolution

How small the tick is. If a timer is used for the MPI_WTIME routine, its resolution is the value returned by the MPI_WTICK routine.

Low overhead

How much time is needed to get the current time.
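A rough way to estimate overhead is to call the timer many times in a row and divide the total elapsed time by the number of calls. The sketch below does this for clock_gettime with CLOCK_MONOTONIC; the function name avg_clock_gettime_ns is hypothetical, and the result also includes a small amount of loop overhead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Estimate the average cost, in nanoseconds, of one
 * clock_gettime(CLOCK_MONOTONIC) call over n back-to-back calls.
 * Returns -1.0 on error or if n is not positive. */
static double avg_clock_gettime_ns(int n)
{
    struct timespec begin, end, scratch;

    if (n <= 0 || clock_gettime(CLOCK_MONOTONIC, &begin) != 0) {
        return -1.0;
    }
    for (int i = 0; i < n; i++) {
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1.0;
    }

    double total_ns = (double)(end.tv_sec - begin.tv_sec) * 1e9
                    + (double)(end.tv_nsec - begin.tv_nsec);
    return total_ns / n;
}
```

The number obtained depends heavily on whether the OS serves the call from user space or through a system call, so it should be measured on the target system rather than assumed.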

Response to time correction

Whether the timer is affected by a system time correction, such as one made by an NTP daemon. If it is affected, time may go back into the past or jump discontinuously.

Global time

Whether the timer is synchronized among compute nodes. This is reflected in the MPI_WTIME_IS_GLOBAL attribute key.

Available timers

Hardware-native timers

Many hardware architectures provide high resolution and low overhead timers.

If a hardware-native timer is based on a CPU cycle counter, we should pay attention to core migration of a process and CPU frequency change.

x86-64

The x86-64 architecture provides the RDTSCP and RDTSC instructions, which read the TSC (time stamp counter). Their behavior is complex because the TSC is implemented differently across CPU models.

TSC type          Constant rate tick?   Monotonic time?
(original) TSC    no                    per core (?)
constant TSC      yes                   per core (?)
invariant TSC     yes                   per socket (?)

A problem with the invariant TSC is that the instruction to determine the frequency is privileged.
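As an illustration, the sketch below reads the TSC and asks CPUID whether the processor reports an invariant TSC (CPUID leaf 0x80000007, EDX bit 8). It assumes GCC or Clang, whose cpuid.h and x86intrin.h headers provide __get_cpuid and __rdtsc. The sketch_ names are hypothetical, and on non-x86 builds both functions simply return 0.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>     /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h> /* __rdtsc (GCC/Clang) */
#define SKETCH_HAVE_TSC 1
#else
#define SKETCH_HAVE_TSC 0
#endif

/* Return 1 if CPUID reports an invariant TSC, 0 if it does not
 * or if this is not an x86 build. */
static int sketch_has_invariant_tsc(void)
{
#if SKETCH_HAVE_TSC
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return (int)((edx >> 8) & 1u);
#else
    return 0;
#endif
}

/* Return the raw TSC value, or 0 on non-x86 builds. */
static uint64_t sketch_read_tsc(void)
{
#if SKETCH_HAVE_TSC
    return (uint64_t)__rdtsc();
#else
    return 0;
#endif
}
```

The raw value is a tick count, not a time. Converting it to seconds still requires knowing the TSC frequency, which is the difficulty noted above.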

Arm (Armv8-A)

The Armv8-A architecture provides the Generic Timer, which includes a system counter.

The system counter in the Generic Timer:

  • Is system-level, so all CPU cores in a compute node see the same counter.
  • Measures the passing of time in real-time.
  • Increases at a fixed frequency, typically in the range 1-50MHz, except in lower-power operating modes. The CNTFRQ_EL0 register holds a copy of the current clock frequency.
  • Starts operating from zero.
  • Can be obtained by reading the CNTVCT_EL0 register.

See ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile.
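The two registers named above can be read from C with inline assembly. The following is a minimal sketch that assumes GCC or Clang on AArch64 and an OS that allows user-space reads of these registers; the function name sketch_read_generic_timer is hypothetical. On other architectures it reports that the counter is unavailable.

```c
#include <stdint.h>

/* Read the Generic Timer system counter and its frequency.
 * Returns 1 on AArch64 builds, 0 elsewhere (outputs set to 0). */
static int sketch_read_generic_timer(uint64_t *count, uint64_t *freq_hz)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t c, f;

    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(c)); /* counter value */
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f)); /* ticks per second */
    *count = c;
    *freq_hz = f;
    return 1;
#else
    *count = 0;
    *freq_hz = 0;
    return 0;
#endif
}
```

Dividing a difference of two counter values by the frequency gives elapsed seconds. Because the counter is system-level, the result does not depend on which core performed each read.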

PowerPC and POWER

...

Keyword: time base facility

SPARC-V9

The SPARC-V9 architecture provides the TICK register.

The counter field of the TICK register:

  • Is a 63-bit counter that counts CPU clock cycles.
  • Can be read by the RDTICK instruction.

See The SPARC Architecture Manual, Version 9

Software-managed timers

Usually an OS provides library functions to get the current time.

Software-managed timers may be affected by a system time correction, such as one made by an NTP daemon.

clock_gettime and clock_getres

The clock_gettime function returns the current time and the clock_getres function returns the resolution (precision). The first argument, clock_id, selects the type of clock. For example, CLOCK_REALTIME represents the clock measuring real time for the system since the Epoch. This clock is affected by discontinuous jumps in the system time. CLOCK_MONOTONIC represents the monotonic clock for the system since an unspecified point in the past. This clock is not affected by discontinuous jumps in the system time.

The clock_gettime and clock_getres functions are defined in POSIX.1-2001 and later. Does OS X have a problem with clock_gettime?

Amended in Dec 2021: macOS CLOCK_MONOTONIC can actually go backwards; see https://github.com/mobile-shell/mosh/pull/1124, for example (and the corresponding darwin implementation of clock_gettime(), which confirms the discussion that CLOCK_MONOTONIC can go backwards on macOS, but CLOCK_MONOTONIC_RAW will not).

Some OSes implement these functions as system calls, so they have high overhead. GNU/Linux implements these functions using vDSO on some architectures to avoid the overhead.
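As a small illustration of interval measurement with clock_gettime, the sketch below times a short nanosleep using CLOCK_MONOTONIC. The helper names timespec_diff_seconds and timed_nap_seconds are hypothetical, and nap_ns is assumed to be less than one second.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Return b - a in seconds. */
static double timespec_diff_seconds(const struct timespec *a,
                                    const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Sleep for nap_ns nanoseconds (assumed < 1e9) and return the elapsed
 * time measured with CLOCK_MONOTONIC, or -1.0 on error. */
static double timed_nap_seconds(long nap_ns)
{
    struct timespec start, stop;
    struct timespec nap = { 0, nap_ns };

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        return -1.0;
    }
    nanosleep(&nap, NULL); /* may return early if interrupted by a signal */
    if (clock_gettime(CLOCK_MONOTONIC, &stop) != 0) {
        return -1.0;
    }
    return timespec_diff_seconds(&start, &stop);
}
```

Using CLOCK_REALTIME here instead would make the measured interval sensitive to any system time correction that happens during the sleep.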

gettimeofday

The gettimeofday function returns the current time, expressed as seconds and microseconds since the Epoch. The clock may be affected by discontinuous jumps in the system time.

The gettimeofday function is defined in POSIX.1-2001 but is marked as obsolete in POSIX.1-2008.

Some OSes implement this function as a system call, so it has high overhead. GNU/Linux implements this function using vDSO on some architectures to avoid the overhead.
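A minimal sketch of reading the time with gettimeofday and combining the two fields into one floating-point value follows. The function name gettimeofday_seconds is hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Return seconds since the Epoch as a double, or -1.0 on error. */
static double gettimeofday_seconds(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0) {
        return -1.0;
    }
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}
```

The resolution is at best one microsecond, and two successive readings can decrease if the system time is stepped backwards between them.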

times

The times function returns the CPU time spent executing instructions of the calling process and the CPU time spent in the system while executing tasks on behalf of the calling process.

The times function is defined in POSIX.1-2001 and later.
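The values returned by times are in clock ticks, so they must be divided by sysconf(_SC_CLK_TCK) to obtain seconds. A minimal sketch, with the hypothetical function name process_cpu_seconds:

```c
#include <sys/times.h>
#include <unistd.h>

/* Return user + system CPU time consumed by the calling process,
 * in seconds, or -1.0 on error. */
static double process_cpu_seconds(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1) {
        return -1.0;
    }
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```

This measures CPU time, not wall-clock time, so it does not advance while the process is blocked or descheduled.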

Usage

Timers are used in several places in the Open MPI code.

MPI_WTIME and MPI_WTICK

The MPI_WTIME routine returns the elapsed wall-clock time since some time in the past. The MPI_WTICK routine returns the resolution of the MPI_WTIME routine.

  • Accuracy is important.
  • High resolution and low overhead are better.
  • The values should not be affected by a system time correction.
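The requirements above can be sketched with a pair of stand-in functions built on clock_gettime and clock_getres. The names sketch_wtime and sketch_wtick are hypothetical, and the choice of CLOCK_MONOTONIC is an assumption made here to avoid system time corrections; this is not the Open MPI source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Stand-in for MPI_WTIME: seconds since an unspecified point in the
 * past, or -1.0 on error. */
static double sketch_wtime(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Stand-in for MPI_WTICK: resolution of the same clock in seconds,
 * or -1.0 on error. */
static double sketch_wtick(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0) {
        return -1.0;
    }
    return (double)res.tv_sec + (double)res.tv_nsec / 1e9;
}
```

Both functions query the same clock, so the tick reported matches the timer actually used for the time values.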

opal_progress

We need to trip the event library at some interval in the opal_progress function.

  • Accuracy and high resolution are not important.
  • Low overhead is important.
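The idea of tripping the event library only at some interval can be sketched as a gate that compares the current time with the time of the last trip. The type poll_gate_t and function poll_gate_due are hypothetical and do not reflect how opal_progress is actually written.

```c
#include <stdint.h>

/* Hypothetical state for deciding when to call the event library. */
typedef struct {
    uint64_t interval_usec;  /* minimum spacing between polls */
    uint64_t last_poll_usec; /* timestamp of the previous poll */
} poll_gate_t;

/* Return 1 and record the time if at least interval_usec has passed
 * since the last poll, otherwise return 0. */
static int poll_gate_due(poll_gate_t *g, uint64_t now_usec)
{
    if (now_usec - g->last_poll_usec >= g->interval_usec) {
        g->last_poll_usec = now_usec;
        return 1;
    }
    return 0;
}
```

If the timer steps backwards, the unsigned subtraction wraps to a large value and the gate simply opens early. This is one reason accuracy matters little for this use, while the cost of obtaining now_usec on every call matters a great deal.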

Software-based performance counters

...

Timings

...

Current code

As of Apr 2019 (HEAD 9bb8fd509b970d31232a430db73aa204b8a9b40d).

Macros defined by configure

  • OPAL_HAVE_CLOCK_GETTIME
    If the clock_gettime function is provided by the OS, the value is 1. Otherwise, the value is 0. This macro is defined in $build_dir/opal/include/opal_config.h.
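A macro of this kind is typically used for compile-time selection between implementations. The sketch below shows the general pattern with a fallback to gettimeofday; the function sketch_now_usec is hypothetical and the macro is defined locally only so that the example builds outside the Open MPI tree.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* In Open MPI this value comes from opal_config.h. It is defined here
 * only as an assumption so the sketch compiles standalone. */
#ifndef OPAL_HAVE_CLOCK_GETTIME
#define OPAL_HAVE_CLOCK_GETTIME 1
#endif

/* Return the current time in microseconds, preferring clock_gettime
 * when it was detected and falling back to gettimeofday otherwise. */
static uint64_t sketch_now_usec(void)
{
#if OPAL_HAVE_CLOCK_GETTIME
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
        return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
    }
#endif
    /* Fallback path: wall-clock time, subject to time corrections. */
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}
```

The two paths use different reference points, so values from one path should not be compared with values from the other.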

Macros defined in opal/include/opal/sys/

  • OPAL_TIMER_MONOTONIC
    If the opal_sys_timer_get_cycles function always returns monotonically increasing values within a node, the value is 1. Otherwise, the value is 0. This macro is first defined as 1 in opal/include/opal/sys/timer.h and is redefined as 0 in opal/include/opal/sys/*/timer.h for some architectures.
  • OPAL_HAVE_SYS_TIMER_GET_CYCLES
    If the opal_sys_timer_get_cycles function is implemented for the architecture, the value is 1. Otherwise, the value is 0. This macro is defined in opal/include/opal/sys/*/timer.h.
  • OPAL_HAVE_SYS_TIMER_IS_MONOTONIC
    If the opal_sys_timer_is_monotonic function is implemented for the architecture, the value is 1. For some architectures, this macro is defined as 1 (with the architecture-dependent opal_sys_timer_is_monotonic function) in opal/include/opal/sys/*/timer.h. For other architectures, this macro is defined as 1 (with the opal_sys_timer_is_monotonic function which returns the value of OPAL_TIMER_MONOTONIC) in opal/include/opal/sys/timer.h.

These macros are currently used in opal/mca/timer/linux/.

Functions defined in opal/include/opal/sys/

  • opal_sys_timer_get_cycles
    This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself, if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms). This function is defined only if the value of the OPAL_HAVE_SYS_TIMER_GET_CYCLES macro is 1.
  • opal_sys_timer_freq
    This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU. This function is currently defined only for arm64.
  • opal_sys_timer_is_monotonic
    This function returns whether the opal_sys_timer_get_cycles function returns monotonic time. This function is always defined because the default function is defined in opal/include/opal/sys/timer.h.

These functions are defined in opal/include/opal/sys/*/timer.h if available.

These functions are currently used in opal/mca/timer/linux/.

Macros defined in opal/mca/timer/

  • OPAL_TIMER_CYCLE_NATIVE
    If the opal_timer_base_get_cycles function is implemented directly using an architecture-dependent cycle counter or computed from some other data (such as a high-resolution timer), the value is 1. Otherwise, the value is 0.
  • OPAL_TIMER_CYCLE_SUPPORTED
    If the opal_timer_base_get_cycles function is implemented for the OS, the value is 1. Otherwise, the value is 0.
  • OPAL_TIMER_USEC_NATIVE
    ...
  • OPAL_TIMER_USEC_SUPPORTED
    If the opal_timer_base_get_usec function is implemented for the OS, the value is 1. Otherwise, the value is 0.

These macros are defined in opal/mca/timer/*/timer_*.h.

Functions defined in opal/mca/timer/

  • opal_timer_base_get_cycles
    This function returns a cycle count. The cycle count may not be the cycle count of the CPU itself, if there is another sufficiently close counter with better behavior characteristics (like the Time Base counter on many Power/PowerPC platforms).
  • opal_timer_base_get_usec
    This function returns the current time in microseconds.
  • opal_timer_base_get_freq
    This function returns the frequency of the cycle counter in use, NOT the frequency of the main CPU.

These functions are defined in opal/mca/timer/*/timer_*.h (as inline) or opal/mca/timer/*/timer_*_component.c (as non-inline) if available.

Functions defined in opal/mca/timer/linux/

  • opal_timer_linux_get_cycles_clock_gettime
    ...
  • opal_timer_linux_get_usec_clock_gettime
    ...
  • opal_timer_linux_get_cycles_sys_timer
    ...
  • opal_timer_linux_get_usec_sys_timer
    ...

Variables

  • mca_timer_base_monotonic
    ...

History of implementation of MPI_WTIME

  1. Originally MPI_WTIME was implemented using gettimeofday.
  2. In commit ee75c45ec5, it was changed to use opal_timer_base_get_usec if OPAL_TIMER_USEC_NATIVE is 1. At that time, OPAL_TIMER_USEC_NATIVE for Linux was 0.
  3. In PR #285, OPAL_TIMER_USEC_NATIVE for Linux was changed to OPAL_HAVE_SYS_TIMER_GET_CYCLES and MPI_WTIME was changed to use opal_timer_base_get_cycles if OPAL_TIMER_CYCLE_NATIVE is 1. This change broke MPI_WTIME in the case that the CPU frequency changes during MPI program execution.
  4. The problem was reported in issue #3003.
  5. In PR #3184, MPI_WTIME was changed to use gettimeofday as a workaround.
  6. In PR #3201, MPI_WTIME was changed to use clock_gettime on Linux.