
Commit

docs: Add discovery component section in docs
Signed-off-by: Mahendra Paipuri <[email protected]>
mahendrapaipuri committed Oct 21, 2024
1 parent 4cf546d commit 5f1a5ce
Showing 8 changed files with 3,145 additions and 17 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -36,6 +36,8 @@ in a resource manager agnostic way.
- Monitor energy, performance, IO and network metrics for different types of resource
managers (SLURM, Openstack, k8s)
- Support NVIDIA (MIG and vGPU) and AMD GPUs
- Provides targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest) using an [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
3 changes: 2 additions & 1 deletion website/cspell.json
@@ -55,7 +55,8 @@
"Mellanox",
"blkio",
"tsbd",
"gpuuuid"
"gpuuuid",
"Pyroscope"
],
// flagWords - list of words to be always considered incorrect
// This is useful for offensive words and common spelling errors.
2 changes: 2 additions & 0 deletions website/docs/00-introduction.md
@@ -28,6 +28,8 @@ of backward compatibility.
- Monitor energy, performance, IO and network metrics for different types of resource
managers (SLURM, Openstack, k8s)
- Support NVIDIA (MIG and vGPU) and AMD GPUs
- Provides targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest) using an [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
57 changes: 49 additions & 8 deletions website/docs/01-philisophy.md
@@ -1,6 +1,8 @@
# Philosophy

## CPU, memory, IO and network metrics
## Supported metrics

### CPU, memory, IO and network

The idea we are leveraging here is that every resource manager has to resort to cgroups
on Linux to manage CPU, memory and IO resources. Each resource manager does it
@@ -27,15 +29,15 @@ agnostic to resource manager and underlying file system. Similarly network metrics of
TCP and UDP protocols for both IPv4 and IPv6 can be gathered by using carefully crafted
bpf programs and attaching to relevant kernel functions.

This is a distributed approach where a daemon exporter will run on each compute node. Whenever
Prometheus make a scrape request, the exporter will walk through cgroup file system and
bpf program maps and
This is a distributed approach where a daemon Prometheus exporter will run on
each compute node. Whenever Prometheus makes a scrape request, the exporter
walks through the cgroup file system and bpf program maps and
exposes the data to Prometheus. As reading the cgroups file system is relatively cheap,
there is very little overhead in running this daemon service. Similarly, BPF programs are
extremely fast and efficient as they run in kernel space. On average the exporter
takes less than 20 MB of memory.
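
To make this concrete, the kind of per-cgroup accounting files the exporter walks
through can be inspected by hand. The job cgroup path below is purely illustrative;
the real layout depends on the resource manager, cgroup version and kernel configuration:

```bash
# Hypothetical SLURM job cgroup under cgroup v2; adjust the path for your system.
CGROUP=/sys/fs/cgroup/system.slice/slurmstepd.scope/job_12345

cat "$CGROUP/cpu.stat"        # usage_usec, user_usec, system_usec counters
cat "$CGROUP/memory.current"  # current memory usage in bytes
cat "$CGROUP/memory.max"      # memory limit applied to the job
cat "$CGROUP/io.stat"         # per-device read/write byte and operation counters
```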

## Energy consumption
### Energy metrics

In an age where green computing is becoming more and more important, it is essential to
expose the energy consumed by the compute units to the users to make them more aware.
@@ -55,7 +57,7 @@ This node level power consumption can be split into consumption of individual compute units
by using relative CPU times used by the compute unit. Although this is not an exact
estimation of power consumed by the compute unit, it remains a very good approximation.
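
As a rough sketch of the proportional split described above (the notation here is
ours, not taken from the CEEMS source):

```math
P_{\text{unit}} \approx P_{\text{node}} \times \frac{t_{\text{CPU,unit}}}{\sum_{i} t_{\text{CPU},i}}
```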

## Emissions
### Emission metrics

The exporter is capable of exporting emission factors from different data sources
which can be used to estimate equivalent CO2 emissions. Currently, for
@@ -70,7 +72,7 @@ constant global average emission factor can also be used.
The emissions collector is capable of exporting emission factors from different sources,
and users can choose the factor that suits their needs.

## GPU metrics
### GPU metrics

Currently, only nVIDIA and AMD GPUs are supported. This exporter leverages
[DCGM exporter](https://github.com/NVIDIA/dcgm-exporter/tree/main) for nVIDIA GPUs and
@@ -86,7 +88,7 @@ vGPUs scheduled on that physical GPU. Similarly, in the case of Multi GPU Instance (MIG) mode,
the energy consumption of each MIG instance is estimated based on the relative number
of Streaming Multiprocessors (SM) and total energy consumption of the physical GPU.
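
As an illustrative formula for this SM-based split (again, the notation is ours):

```math
E_{\text{MIG}} \approx E_{\text{GPU}} \times \frac{N_{\text{SM,MIG}}}{N_{\text{SM,total}}}
```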

## Performance metrics
### Performance metrics

Presenting energy and emission metrics is only one side of the story. This will
help end users to quickly and cheaply identify their workloads that are consuming
@@ -102,3 +104,42 @@ performance metrics for nVIDIA GPUs as well, as long as operators install and enable
nVIDIA DCGM libraries. More details can be found in
[DCGM](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#profiling-metrics)
docs.

### Continuous Profiling

[Continuous Profiling](https://www.cncf.io/blog/2022/05/31/what-is-continuous-profiling/) enables
users to profile their code on production systems, which can help them fix abnormal CPU
usage, memory leaks, _etc_. A good primer on continuous profiling can be found in the
[Elastic Docs](https://www.elastic.co/what-is/continuous-profiling). The CEEMS stack lets users
and developers identify which applications or processes to continuously profile; CEEMS
then works in tandem with continuous profiling software to profile these applications and processes.

## Technologies involved

### Databases

One of the principal objectives of the CEEMS stack is to avoid creating new software and to
use open source components as much as possible. The stack needs a Time Series
Database (TSDB) to store time series metrics of compute units, and [Prometheus](https://prometheus.io/)
has proved to be the _de facto_ standard in the cloud-native community for its performance. Thus,
CEEMS uses Prometheus (or any PromQL-compliant TSDB) as its TSDB. CEEMS also uses a relational
DB for storing a list of compute units along with their aggregate metrics from different
resource managers. CEEMS uses [SQLite](https://www.sqlite.org/) for its simplicity and
performance. Moreover, the CEEMS relational DB does not need concurrent writes, as a single
thread (goroutine) fetches compute units from the underlying resource manager
and writes them to the DB. Thus, SQLite is a very good option and avoids having to
maintain complex DB servers.

For continuous profiling, [Grafana Pyroscope](https://grafana.com/oss/pyroscope/)
provides an OSS continuous profiling database that can be regarded as the equivalent
of Prometheus for profiling data. [Grafana Alloy](https://grafana.com/docs/alloy/latest/)
is the agent that, like the Prometheus exporter, runs on all compute nodes and in turn
sends profiling data to the Pyroscope server. The CEEMS stack provides Grafana Alloy
with a list of targets (processes) that need to be continuously profiled.

### Visualization

Once the metrics are gathered, we need an application to visualize them for end users
in a user-friendly way. CEEMS uses [Grafana](https://grafana.com/grafana/), which is also
the _de facto_ standard in the cloud-native community. Grafana has very good integration
with Prometheus and with Grafana Pyroscope.
17 changes: 15 additions & 2 deletions website/docs/components/ceems-exporter.md
@@ -7,8 +7,10 @@ sidebar_position: 1
## Background

`ceems_exporter` is the Prometheus exporter that exposes individual compute unit
metrics, RAPL energy, IPMI power consumption, emission factor and GPU to compute unit
mapping.
metrics, RAPL energy, IPMI power consumption, emission factor, GPU to compute unit
mapping, performance metrics, and IO and network metrics. In addition, the exporter supports
an [HTTP discovery component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
that can provide a list of targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest/).

`ceems_exporter` collectors can be categorized as follows:

@@ -422,6 +424,17 @@ These metrics are mainly used to estimate the proportion of CPU and memory usage of
individual compute units and to estimate the energy consumption of a compute unit
based on these proportions.

## Grafana Alloy target discovery

Grafana Alloy provides an [eBPF based continuous profiling](https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.ebpf/)
component. It needs a list of targets (processes, in the current case), labelled
appropriately with the unique identifier of the compute unit. For instance, for a
given compute unit (like a batch job in SLURM), there can be multiple processes in the
job, and we need to provide Grafana Alloy with the PIDs of all these processes labelled
by the ID of that compute unit. The CEEMS exporter provides a list of these processes
correctly labelled by the compute unit identifier, and eventually these profiles are
aggregated by compute unit identifier on the Pyroscope server.
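
For illustration, the discovery endpoint returns targets in the standard HTTP service
discovery JSON format consumed by `discovery.http`. The endpoint path and port below
are the defaults used in the configuration examples, while the PIDs and label names
are placeholders only, not the exporter's exact output:

```bash
# Query the exporter's target discovery endpoint (illustrative output only).
curl -s http://localhost:9010/alloy-targets
# [
#   {
#     "targets": ["localhost"],
#     "labels": {
#       "__process_pid__": "12345",   # a PID belonging to the compute unit
#       "service_name": "1234567"     # compute unit (e.g. SLURM job) identifier
#     }
#   }
# ]
```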

## Metrics

Please look at [Metrics](./metrics.md) that lists all the metrics exposed by CEEMS
89 changes: 84 additions & 5 deletions website/docs/configuration/ceems-exporter.md
@@ -21,7 +21,12 @@ a consistent styling. They will be removed in `v1.0.0`.

:::

## Slurm collector
## Collectors

The following collectors are supported by the Prometheus exporter, and they can be configured
via CLI arguments as described below:

### Slurm collector

Although fetching metrics from cgroups does not need any additional privileges, getting
the GPU ordinal to job ID mapping needs extra privileges. This is due to the fact that this
@@ -228,7 +233,7 @@ enable and disable them at runtime is more involved.
Both perf and eBPF sub-collectors need extra privileges to work, and the necessary privileges
are discussed in the [Security](./security.md) section.

## Libvirt collector
### Libvirt collector

The Libvirt collector is meant to be used on Openstack clusters where VMs are managed by
libvirt. Most of the options applicable to Slurm are applicable to libvirt as well.
@@ -258,7 +263,7 @@ processes inside the guest.
Both perf and eBPF sub-collectors need extra privileges to work, and the necessary privileges
are discussed in the [Security](./security.md) section.

## IPMI collector
### IPMI collector

Currently, the collector supports FreeIPMI, OpenIPMI, IPMIUtils and Cray's [`capmc`](https://cray-hpe.github.io/docs-csm/en-10/operations/power_management/cray_advanced_platform_monitoring_and_control_capmc/)
framework. If one of these binaries exists on `PATH`, the exporter will automatically
@@ -309,7 +314,7 @@ might not include the power consumption of GPUs.

:::

## RAPL collector
### RAPL collector

For the kernels that are `<5.3`, there is no special configuration to be done. If the
kernel version is `>=5.3`, RAPL metrics are only available for `root`. Three approaches
@@ -323,7 +328,7 @@ directory to give read permissions to the user that is running `ceems_exporter`.

We recommend the capabilities approach as it requires minimum configuration.
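
As a sketch of the capabilities approach, granting the exporter binary the ability to
read the root-only RAPL counters could look like the following; the exact capability
and install path are assumptions here, so verify them against the security documentation:

```bash
# CAP_DAC_READ_SEARCH and the binary path are assumptions; adjust for your setup.
sudo setcap cap_dac_read_search+ep /usr/local/bin/ceems_exporter
```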

## Emissions collector
### Emissions collector

The only configuration needed for the emissions collector is an API token for
[Electricity Maps](https://app.electricitymaps.com/map). For non-commercial uses,
@@ -338,3 +343,77 @@ This collector can be run separately on a node that has internet access by disabling the
rest of the collectors.

:::

## Grafana Alloy targets discoverer

The CEEMS exporter exposes a special endpoint that can be used as an
[HTTP discovery component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to provide a list of targets to the Pyroscope eBPF component for continuous profiling.

Currently, the discovery component supports **only the SLURM resource manager**. There is
no added value in continuously profiling a VM instance managed by Libvirt from the hypervisor,
as we cannot easily resolve symbols of the guest instance from the hypervisor. By
default the discovery component is disabled; it can be enabled using the following
CLI argument:

```bash
ceems_exporter --discoverer.alloy-targets.resource-manager=slurm
```

which will collect targets from SLURM jobs on the current node.

:::tip[TIP]

The discovery component runs at a dedicated endpoint that can be configured
using `--web.targets-path`. Thus, it is possible to run both the discovery
component and the Prometheus collectors at the same time as follows:

```bash
ceems_exporter --collector.slurm --discoverer.alloy-targets.resource-manager=slurm
```

:::

Similar to the `perf` sub-collector, it is possible to configure the discovery component
to discover targets only when a certain environment variable is set in the process. For
example, if we pass the following CLI arguments to the exporter

```bash
ceems_exporter --discoverer.alloy-targets.resource-manager=slurm --discoverer.alloy-targets.env-var=ENABLE_CONTINUOUS_PROFILING
```

only SLURM jobs that have the environment variable `ENABLE_CONTINUOUS_PROFILING` set
in their environment will be continuously profiled. Multiple environment variable names can
be passed by repeating the CLI argument `--discoverer.alloy-targets.env-var`. The
presence of the environment variable triggers continuous profiling irrespective of
the value set to it.
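
For instance, a hypothetical SLURM submission that sets the marker variable could look like:

```bash
# Export the marker variable into the job environment at submission time;
# its value does not matter, only its presence.
sbatch --export=ALL,ENABLE_CONTINUOUS_PROFILING=1 my_job.sh
```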

Once the discovery component is enabled, Grafana Alloy can be configured to get
the targets from this component using the following config:

```river
discovery.http "processes" {
  url              = "http://localhost:9010/alloy-targets"
  refresh_interval = "10s"
}

pyroscope.write "staging" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}

pyroscope.ebpf "default" {
  collect_interval = "10s"
  forward_to       = [pyroscope.write.staging.receiver]
  targets          = discovery.http.processes.output
}
```

The above configuration makes Grafana Alloy scrape the discovery component
of the exporter every 10 seconds. The output of the discovery component is passed
to the Pyroscope eBPF component, which continuously profiles the processes and
collects those profiles every 10 seconds. Finally, the Pyroscope eBPF component
sends these profiles to Pyroscope. More details on how to configure authentication
and TLS for the various components can be found in the [Grafana Alloy](https://grafana.com/docs/alloy) and
[Grafana Pyroscope](https://grafana.com/docs/pyroscope/latest/introduction/) docs.
12 changes: 11 additions & 1 deletion website/src/components/HomepageFeatures/index.tsx
@@ -31,6 +31,16 @@ const FeatureList: FeatureItem[] = [
</>
),
},
{
title: "Supports Grafana Alloy/Pyroscope for Continuous Profiling",
Svg: require("@site/static/img/pyroscope.svg").default,
description: (
<>
CEEMS is capable of providing targets to the Grafana Alloy eBPF
component for continuous profiling of compute workloads and apps.
</>
),
},
{
title: "ML/AI workloads",
Svg: require("@site/static/img/ml_ai.svg").default,
@@ -56,7 +66,7 @@

function Feature({title, Svg, description}: FeatureItem) {
return (
<div className={clsx('col col--3')}>
<div className={clsx('col col--4')}>
<div className="text--center">
<Svg className={styles.featureSvg} role="img" />
</div>

