Move modified HPA alert to helm chart
* Rename the KubeHpaMaxedOut alert to KubeHpaMaxedOutMultiPod
* Include deployment of this alert in kube-prometheus-stack helm chart

Signed-off-by: Tobias Wackenhut <[email protected]>
tynsh committed Dec 13, 2023
1 parent ba670a1 commit 38611a0
Showing 4 changed files with 28 additions and 16 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
 # SAS Viya Monitoring for Kubernetes

+## Unreleased
+
+* **Metrics**
+  * [CHANGE] The KubeHpaMaxedOut alert has (effectively) been renamed KubeHpaMaxedOutMultiPod
+
 ## Version 1.2.20 (12DEC2023)

 * **Metrics**
9 changes: 0 additions & 9 deletions monitoring/bin/deploy_monitoring_cluster.sh
@@ -320,15 +320,6 @@ for f in monitoring/rules/viya/rules-*.yaml; do
   kubectl apply -n $MON_NS -f $f
 done

-kubectl get prometheusrule -n $MON_NS v4m-kubernetes-apps 2>/dev/null
-if [ $? == 0 ]; then
-  log_verbose "Patching KubeHpaMaxedOut rule"
-  # Fixes the issue of false positives when max replicas == 1
-  kubectl patch prometheusrule --type='json' -n $MON_NS v4m-kubernetes-apps --patch "$(cat monitoring/kube-hpa-alert-patch.json)"
-else
-  log_debug "PrometheusRule $MON_NS/v4m-kubernetes-apps does not exist"
-fi
-
 # Elasticsearch Datasource for Grafana
 LOGGING_DATASOURCE="${LOGGING_DATASOURCE:-false}"
 if [ "$LOGGING_DATASOURCE" == "true" ]; then
7 changes: 0 additions & 7 deletions monitoring/kube-hpa-alert-patch.json

This file was deleted.

23 changes: 23 additions & 0 deletions monitoring/values-prom-operator.yaml
@@ -12,6 +12,29 @@
 commonLabels:
   sas.com/monitoring-base: kube-viya-monitoring

+defaultRules:
+  disabled:
+    KubeHpaMaxedOut: true
+
+additionalPrometheusRulesMap:
+  sas-modified-default-rules:
+    groups:
+      - name: kubernetes-apps
+        rules:
+          - alert: KubeHpaMaxedOutMultiPod
+            annotations:
+              description: HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }}
+                has been running at max replicas for longer than 15 minutes.
+              runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubehpamaxedout
+              summary: HPA is running at max replicas
+            expr: (kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics",namespace=~".*"}
+              == kube_horizontalpodautoscaler_spec_max_replicas{job="kube-state-metrics",namespace=~".*"})
+              and kube_horizontalpodautoscaler_status_current_replicas{job="kube-state-metrics",namespace=~".*"}
+              > 1
+            for: 15m
+            labels:
+              severity: warning
+
 # ===================
 # Prometheus Operator
 # ===================
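
For readers less familiar with PromQL, the condition encoded by the new `expr` can be sketched in shell. This is only an illustration: the function name `hpa_maxed_out_multipod` and its two arguments are hypothetical and do not exist in the repository, and the sketch ignores the `for: 15m` hold period that Prometheus applies before the alert fires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): mirrors the alert condition
# "current replicas == max replicas AND current replicas > 1".
hpa_maxed_out_multipod() {
  local current="$1" max="$2"
  [ "$current" -eq "$max" ] && [ "$current" -gt 1 ]
}

# An HPA capped at a single replica no longer matches the condition.
if hpa_maxed_out_multipod 1 1; then echo "fires"; else echo "silent"; fi

# An HPA running at its maximum of three replicas still matches.
if hpa_maxed_out_multipod 3 3; then echo "fires"; else echo "silent"; fi
```

The extra `> 1` clause is what removes the false positives that the deleted `kubectl patch` step used to address for HPAs whose maximum is a single replica.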