Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve alert aggregations for multiple clusters #633

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hamishforbes
Copy link

Many of the alerts are aggregating on metrics and throwing away the cluster label.

While this isn't the end of the world, pod names are probably going to be unique, but if you have many clusters trying to figure out which one kube-proxy-4jlm2 is running on is a pain.

Because these labels are currently just hardcoded into strings its also difficult to override them in your own jsonnet.

This PR adds extra labels to aggregations.
Only if multi cluster is enabled and extensible.

For example in my environment I also inject a dc label with the AWS region for all metrics as well as the cluster label.
So in my config I can do

_config+:: {
  showMultiCluster: true,
  clusterGroupLabels+: ['dc'],
}

and include that dc label in aggregations

@hamishforbes
Copy link
Author

Hi, is there anything else I can do to move this along?
Any changes or tests required?
ATM i'm maintaining a fork for this which is a bit of a pita.

I did just find #524, which I missed when making this PR but largely does the same thing, that's been hanging around since last year.
Is there just no desire to support multiple clusters in alerting?

@@ -32,11 +32,11 @@
// label exists for 2 values. This avoids "many-to-many matching
// not allowed" errors when joining with kube_pod_status_phase.
expr: |||
sum by (namespace, pod) (
max by(namespace, pod) (
sum by (namespace, pod, %(clusterGroupLabelsStr)s) (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be using clusterLabel ? I don't see why we would need another variable for this.

Same question for all cases with clusterGroupLabelsStr

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for clusterGroupLabels is clusterLabel, so in the simple case they are the same thing.

But clusterLabel is used in other places (e.g. in dashboards as a selector like %(clusterLabel)s="$cluster") and so can only be a single label.

Using a different variable for grouping labels gives more flexibility and allows use cases like I described in the PR, where I ideally want to group by $clusterLabel, dc and not just the clusterLabel.

@hamishforbes hamishforbes force-pushed the multi-cluster-alerts branch 2 times, most recently from fa29ff2 to 5d51afc Compare January 16, 2022 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants