
Commit

docs: Add discovery component section in docs
Signed-off-by: Mahendra Paipuri <[email protected]>
mahendrapaipuri committed Oct 21, 2024
1 parent 4cf546d commit 5f1a5ce
Showing 8 changed files with 3,145 additions and 17 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -36,6 +36,8 @@ in a resource manager agnostic way.
- Monitor energy, performance, IO and network metrics for different types of resource
managers (SLURM, Openstack, k8s)
- Support NVIDIA (MIG and vGPU) and AMD GPUs
- Provides targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest) using an [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
3 changes: 2 additions & 1 deletion website/cspell.json
@@ -55,7 +55,8 @@
"Mellanox",
"blkio",
"tsbd",
"gpuuuid"
"gpuuuid",
"Pyroscope"
],
// flagWords - list of words to be always considered incorrect
// This is useful for offensive words and common spelling errors.
2 changes: 2 additions & 0 deletions website/docs/00-introduction.md
@@ -28,6 +28,8 @@ of backward compatibility.
- Monitor energy, performance, IO and network metrics for different types of resource
managers (SLURM, Openstack, k8s)
- Support NVIDIA (MIG and vGPU) and AMD GPUs
- Provides targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest) using an [HTTP Discovery Component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to continuously profile compute units
- Realtime access to metrics *via* Grafana dashboards
- Access control to Prometheus datasource in Grafana
- Stores aggregated metrics in a separate DB that can be retained for a long time
57 changes: 49 additions & 8 deletions website/docs/01-philisophy.md
@@ -1,6 +1,8 @@
# Philosophy

## CPU, memory, IO and network metrics
## Supported metrics

### CPU, memory, IO and network

The idea we are leveraging here is that every resource manager has to resort to cgroups
on Linux to manage CPU, memory and IO resources. Each resource manager does it
@@ -27,15 +29,15 @@ agnostic to resource manager and underlying file system. Similarly network metrics of
TCP and UDP protocols for both IPv4 and IPv6 can be gathered by using carefully crafted
bpf programs and attaching to relevant kernel functions.

This is a distributed approach where a daemon exporter will run on each compute node. Whenever
Prometheus make a scrape request, the exporter will walk through cgroup file system and
bpf program maps and
This is a distributed approach where a daemon Prometheus exporter will run on
each compute node. Whenever Prometheus makes a scrape request, the exporter
walks through the cgroup file system and bpf program maps and
exposes the data to Prometheus. As reading the cgroups file system is relatively cheap,
there is very little overhead in running this daemon service. Similarly, BPF programs are
extremely fast and efficient as they run in kernel space. On average the exporter
takes less than 20 MB of memory.
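
To make this concrete, the kind of per-cgroup accounting files the exporter walks
through can be inspected by hand. The job cgroup path below is purely illustrative;
the real layout depends on the resource manager, cgroup version and kernel configuration:

```bash
# Hypothetical SLURM job cgroup under cgroup v2; adjust the path for your system.
CGROUP=/sys/fs/cgroup/system.slice/slurmstepd.scope/job_12345

cat "$CGROUP/cpu.stat"        # usage_usec, user_usec, system_usec counters
cat "$CGROUP/memory.current"  # current memory usage in bytes
cat "$CGROUP/memory.max"      # memory limit applied to the job
cat "$CGROUP/io.stat"         # per-device read/write byte and operation counters
```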

## Energy consumption
### Energy metrics

In an age where green computing is becoming more and more important, it is essential to
expose the energy consumed by the compute units to the users to make them more aware.
@@ -55,7 +57,7 @@ This node level power consumption can be split into consumption of individual compute units
by using relative CPU times used by the compute unit. Although this is not an exact
estimation of power consumed by the compute unit, it remains a very good approximation.
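
As a rough sketch of the proportional split described above (the notation here is
ours, not taken from the CEEMS source):

```math
P_{\text{unit}} \approx P_{\text{node}} \times \frac{t_{\text{CPU,unit}}}{\sum_{i} t_{\text{CPU},i}}
```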

## Emissions
### Emission metrics

The exporter is capable of exporting emission factors from different data sources
which can be used to estimate equivalent CO2 emissions. Currently, for
@@ -70,7 +72,7 @@ constant global average emission factor can also be used.
The emissions collector is capable of exporting emission factors from different sources,
and users can choose the factor that suits their needs.

## GPU metrics
### GPU metrics

Currently, only nVIDIA and AMD GPUs are supported. This exporter leverages
[DCGM exporter](https://github.com/NVIDIA/dcgm-exporter/tree/main) for nVIDIA GPUs and
@@ -86,7 +88,7 @@ vGPUs scheduled on that physical GPU. Similarly, in the case of Multi GPU Instance (MIG) mode,
the energy consumption of each MIG instance is estimated based on the relative number
of Streaming Multiprocessors (SM) and total energy consumption of the physical GPU.
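
As an illustrative formula for this SM-based split (again, the notation is ours):

```math
E_{\text{MIG}} \approx E_{\text{GPU}} \times \frac{N_{\text{SM,MIG}}}{N_{\text{SM,total}}}
```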

## Performance metrics
### Performance metrics

Presenting energy and emission metrics is only one side of the story. This will
help end users to quickly and cheaply identify their workloads that are consuming
@@ -102,3 +104,42 @@ performance metrics for nVIDIA GPUs as well, as long as operators install and enable
nVIDIA DCGM libraries. More details can be found in
[DCGM](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#profiling-metrics)
docs.

### Continuous Profiling

[Continuous Profiling](https://www.cncf.io/blog/2022/05/31/what-is-continuous-profiling/) enables
users to profile their code on production systems, which can help them fix abnormal CPU
usage, memory leaks, _etc_. A good primer on continuous profiling can be found in the
[Elastic Docs](https://www.elastic.co/what-is/continuous-profiling). The CEEMS stack lets users
and developers identify which applications or processes to continuously profile; CEEMS
then works in tandem with continuous profiling software to profile these applications and processes.

## Technologies involved

### Databases

One of the principal objectives of the CEEMS stack is to avoid creating new software and to
use open source components as much as possible. The stack needs a Time Series
Database (TSDB) to store time series metrics of compute units, and [Prometheus](https://prometheus.io/)
has proved to be the _de facto_ standard in the cloud-native community for its performance. Thus,
CEEMS uses Prometheus (or any PromQL-compliant TSDB) as its TSDB. CEEMS also uses a relational
DB for storing a list of compute units along with their aggregate metrics from different
resource managers. CEEMS uses [SQLite](https://www.sqlite.org/) for its simplicity and
performance. Moreover, the CEEMS relational DB does not need concurrent writes, as a single
thread (goroutine) fetches compute units from the underlying resource manager
and writes them to the DB. Thus, SQLite is a very good option and avoids having to
maintain complex DB servers.

For continuous profiling, [Grafana Pyroscope](https://grafana.com/oss/pyroscope/)
provides an OSS continuous profiling database that can be regarded as the equivalent
of Prometheus for profiling data. [Grafana Alloy](https://grafana.com/docs/alloy/latest/)
is the agent that, like the Prometheus exporter, runs on all compute nodes and in turn
sends profiling data to the Pyroscope server. The CEEMS stack provides Grafana Alloy
with a list of targets (processes) that need to be continuously profiled.

### Visualization

Once the metrics are gathered, we need an application to visualize them for end users
in a user-friendly way. CEEMS uses [Grafana](https://grafana.com/grafana/), which is also
the _de facto_ standard in the cloud-native community. Grafana has very good integration
with Prometheus and with Grafana Pyroscope.
17 changes: 15 additions & 2 deletions website/docs/components/ceems-exporter.md
@@ -7,8 +7,10 @@ sidebar_position: 1
## Background

`ceems_exporter` is the Prometheus exporter that exposes individual compute unit
metrics, RAPL energy, IPMI power consumption, emission factor and GPU to compute unit
mapping.
metrics, RAPL energy, IPMI power consumption, emission factor, GPU to compute unit
mapping, performance metrics, and IO and network metrics. In addition, the exporter supports
an [HTTP discovery component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
that can provide a list of targets to [Grafana Alloy](https://grafana.com/docs/alloy/latest/).

`ceems_exporter` collectors can be categorized as follows:

@@ -422,6 +424,17 @@ These metrics are mainly used to estimate the proportion of CPU and memory usage of
individual compute units and to estimate the energy consumption of a compute unit
based on these proportions.

## Grafana Alloy target discovery

Grafana Alloy provides an [eBPF based continuous profiling](https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.ebpf/)
component. It needs a list of targets (processes, in the current case), labelled
appropriately with the unique identifier of the compute unit. For instance, for a
given compute unit (like a batch job in SLURM), there can be multiple processes in the
job, and we need to provide Grafana Alloy with the PIDs of all these processes labelled
by the ID of that compute unit. The CEEMS exporter provides a list of these processes
correctly labelled by the compute unit identifier, and eventually these profiles are
aggregated by compute unit identifier on the Pyroscope server.
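
For illustration, the discovery endpoint returns targets in the standard HTTP service
discovery JSON format consumed by `discovery.http`. The endpoint path and port below
are the defaults used in the configuration examples, while the PIDs and label names
are placeholders only, not the exporter's exact output:

```bash
# Query the exporter's target discovery endpoint (illustrative output only).
curl -s http://localhost:9010/alloy-targets
# [
#   {
#     "targets": ["localhost"],
#     "labels": {
#       "__process_pid__": "12345",   # a PID belonging to the compute unit
#       "service_name": "1234567"     # compute unit (e.g. SLURM job) identifier
#     }
#   }
# ]
```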

## Metrics

Please look at [Metrics](./metrics.md) that lists all the metrics exposed by CEEMS
89 changes: 84 additions & 5 deletions website/docs/configuration/ceems-exporter.md
@@ -21,7 +21,12 @@ a consistent styling. They will be removed in `v1.0.0`.

:::

## Slurm collector
## Collectors

The following collectors are supported by the Prometheus exporter, and they can be configured
via CLI arguments as described below:

### Slurm collector

Although fetching metrics from cgroups does not need any additional privileges, getting
the GPU ordinal to job ID mapping needs extra privileges. This is due to the fact that this
@@ -228,7 +233,7 @@ enable and disable them at runtime is more involved.
Both perf and eBPF sub-collectors need extra privileges to work, and the necessary privileges
are discussed in the [Security](./security.md) section.

## Libvirt collector
### Libvirt collector

The Libvirt collector is meant to be used on Openstack clusters where VMs are managed by
libvirt. Most of the options applicable to Slurm are applicable to libvirt as well.
@@ -258,7 +263,7 @@ processes inside the guest.
Both perf and eBPF sub-collectors need extra privileges to work, and the necessary privileges
are discussed in the [Security](./security.md) section.

## IPMI collector
### IPMI collector

Currently, the collector supports FreeIPMI, OpenIPMI, IPMIUtils and Cray's [`capmc`](https://cray-hpe.github.io/docs-csm/en-10/operations/power_management/cray_advanced_platform_monitoring_and_control_capmc/)
framework. If one of these binaries exists on `PATH`, the exporter will automatically
@@ -309,7 +314,7 @@ might not include the power consumption of GPUs.

:::

## RAPL collector
### RAPL collector

For the kernels that are `<5.3`, there is no special configuration to be done. If the
kernel version is `>=5.3`, RAPL metrics are only available for `root`. Three approaches
@@ -323,7 +328,7 @@ directory to give read permissions to the user that is running `ceems_exporter`.

We recommend the capabilities approach as it requires minimum configuration.
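
As a sketch of the capabilities approach, granting the exporter binary the ability to
read the root-only RAPL counters could look like the following; the exact capability
and install path are assumptions here, so verify them against the security documentation:

```bash
# CAP_DAC_READ_SEARCH and the binary path are assumptions; adjust for your setup.
sudo setcap cap_dac_read_search+ep /usr/local/bin/ceems_exporter
```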

## Emissions collector
### Emissions collector

The only configuration needed for the emissions collector is an API token for
[Electricity Maps](https://app.electricitymaps.com/map). For non-commercial uses,
@@ -338,3 +343,77 @@ This collector can be run separately on a node that has internet access by disabling the
rest of the collectors.

:::

## Grafana Alloy targets discoverer

The CEEMS exporter exposes a special endpoint that can be used as an
[HTTP discovery component](https://grafana.com/docs/alloy/latest/reference/components/discovery/discovery.http/)
to provide a list of targets to the Pyroscope eBPF component for continuous profiling.

Currently, the discovery component supports **only the SLURM resource manager**. There is
no added value in continuously profiling a VM instance managed by Libvirt from the hypervisor,
as we cannot easily resolve symbols of the guest instance from the hypervisor. By
default the discovery component is disabled; it can be enabled using the following
CLI argument:

```bash
ceems_exporter --discoverer.alloy-targets.resource-manager=slurm
```

which will collect targets from SLURM jobs on the current node.

:::tip[TIP]

The discovery component runs at a dedicated endpoint that can be configured
using `--web.targets-path`. Thus, it is possible to run both the discovery
component and the Prometheus collectors at the same time as follows:

```bash
ceems_exporter --collector.slurm --discoverer.alloy-targets.resource-manager=slurm
```

:::

Similar to the `perf` sub-collector, it is possible to configure the discovery component
to discover targets only when a certain environment variable is set in the process. For
example, if we pass the following CLI arguments to the exporter

```bash
ceems_exporter --discoverer.alloy-targets.resource-manager=slurm --discoverer.alloy-targets.env-var=ENABLE_CONTINUOUS_PROFILING
```

only SLURM jobs that have the environment variable `ENABLE_CONTINUOUS_PROFILING` set
in their environment will be continuously profiled. Multiple environment variable names can
be passed by repeating the CLI argument `--discoverer.alloy-targets.env-var`. The
presence of the environment variable triggers continuous profiling irrespective of
the value set to it.
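
For instance, a hypothetical SLURM submission that sets the marker variable could look like:

```bash
# Export the marker variable into the job environment at submission time;
# its value does not matter, only its presence.
sbatch --export=ALL,ENABLE_CONTINUOUS_PROFILING=1 my_job.sh
```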

Once the discovery component is enabled, Grafana Alloy can be configured to get
the targets from this component using the following config:

```river
discovery.http "processes" {
  url              = "http://localhost:9010/alloy-targets"
  refresh_interval = "10s"
}

pyroscope.write "staging" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}

pyroscope.ebpf "default" {
  collect_interval = "10s"
  forward_to       = [pyroscope.write.staging.receiver]
  targets          = discovery.http.processes.output
}
```

The above configuration makes Grafana Alloy scrape the discovery component
of the exporter every 10 seconds. The output of the discovery component is passed
to the Pyroscope eBPF component, which continuously profiles the processes and
collects those profiles every 10 seconds. Finally, the Pyroscope eBPF component
sends these profiles to Pyroscope. More details on how to configure authentication
and TLS for the various components can be found in the [Grafana Alloy](https://grafana.com/docs/alloy) and
[Grafana Pyroscope](https://grafana.com/docs/pyroscope/latest/introduction/) docs.
12 changes: 11 additions & 1 deletion website/src/components/HomepageFeatures/index.tsx
@@ -31,6 +31,16 @@ const FeatureList: FeatureItem[] = [
</>
),
},
{
title: "Supports Grafana Alloy/Pyroscope for Continuous Profiling",
Svg: require("@site/static/img/pyroscope.svg").default,
description: (
<>
CEEMS is capable of providing targets to the Grafana Alloy eBPF
component for continuous profiling of compute workloads and apps.
</>
),
},
{
title: "ML/AI workloads",
Svg: require("@site/static/img/ml_ai.svg").default,
@@ -56,7 +66,7 @@

function Feature({title, Svg, description}: FeatureItem) {
return (
<div className={clsx('col col--3')}>
<div className={clsx('col col--4')}>
<div className="text--center">
<Svg className={styles.featureSvg} role="img" />
</div>

