[ops] Introduce GitpodWsManagerMk2BackupFailureError and GitpodWsManagerMk2BackupFailureCritical (#20259)

* [ops] Introduce GitpodWsManagerMk2BackupFailureError and GitpodWsManagerMk2BackupFailureCritical

* Fix
kylos101 authored Oct 2, 2024
1 parent e63652e commit 2c2a86e
Showing 1 changed file with 20 additions and 0 deletions.
@@ -45,3 +45,23 @@ spec:
        sum by(cluster) (avg_over_time(gitpod_workspace_regular_not_active_percentage_mk2[1m]) > 0)
        AND
        sum by(cluster) (rate(gitpod_ws_manager_mk2_workspace_startup_seconds_sum{type="Regular"}[1m])) == 0
    - alert: GitpodWsManagerMk2BackupFailureError
      labels:
        severity: error
        team: engine
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WorkspaceBackupFailures.md
        summary: Workspace backups failed recently in cluster {{ $labels.cluster }}
        description: This can happen when a single node has failed in the cloud provider
      expr: |
        sum by (cluster) (increase(gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster!~"ephemeral.*"}[1h])) <= 16
    - alert: GitpodWsManagerMk2BackupFailureCritical
      labels:
        severity: critical
        team: engine
      annotations:
        runbook_url: https://github.com/gitpod-io/runbooks/blob/main/runbooks/WorkspaceBackupFailures.md
        summary: Workspace backups failed recently in cluster {{ $labels.cluster }}
        description: This can be an indicator of two or more nodes failing in a cloud provider
      expr: |
        sum by (cluster) (increase(gitpod_ws_manager_mk2_workspace_backups_failure_total{cluster!~"ephemeral.*"}[1h])) > 16
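
The two expressions split the per-cluster 1h increase in backup failures at a threshold of 16. A minimal Python sketch of that split follows, as a reading aid only: `firing_alerts` is a hypothetical helper, not part of the repository, and it assumes the summed series exists for the cluster and that the rules carry no `for:` clause (none appears in this hunk). Read literally, the `<= 16` comparison also matches a value of 0, which the sketch makes visible.

```python
from typing import List

# Threshold taken from the two alert expressions in the diff.
THRESHOLD = 16


def firing_alerts(failures_last_hour: float) -> List[str]:
    """Return the alert names whose comparison matches the given value.

    `failures_last_hour` stands for the result of
    sum by (cluster) (increase(..._backups_failure_total[1h]))
    for one non-ephemeral cluster. Hypothetical helper for illustration.
    """
    alerts = []
    if failures_last_hour <= THRESHOLD:
        alerts.append("GitpodWsManagerMk2BackupFailureError")
    if failures_last_hour > THRESHOLD:
        alerts.append("GitpodWsManagerMk2BackupFailureCritical")
    return alerts


if __name__ == "__main__":
    for value in (0, 1, 16, 16.5, 40):
        print(value, firing_alerts(value))
```

Because the comparisons are complementary, exactly one of the two alerts matches for any value the expression returns.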
