kube-prometheus-stack - Retention problems #4869

Open
brancomrt opened this issue Sep 20, 2024 · 5 comments
Labels: bug (Something isn't working)

@brancomrt

Describe the bug (a clear and concise description of what the bug is):

I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.

I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.
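
For reference, the relevant values.yaml section has this shape (a minimal sketch; the field names come from the kube-prometheus-stack chart, and 10m is the test value described above):

prometheus:
  prometheusSpec:
    retention: 10m       # time-based retention under test
    retentionSize: ""    # size-based retention; not set in this report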

What's your helm version?

version.BuildInfo{Version:"v3.14.4", GitCommit:"81c902a123462fd4052bc5e9aa9c513c4c8fc142", GitTreeState:"clean", GoVersion:"go1.21.9"}

What's your kubectl version?

Client Version: v1.27.10
Kustomize Version: v5.0.1
Server Version: v1.28.12+rke2r1

Which chart?

kube-prometheus-stack

What's the chart version?

61.7.1

What happened?

As described above: with retention set to 10m or 1h, the PVC keeps growing and old data is never cleaned up.

What you expected to happen?

Automatic cleanup of Prometheus storage data on the PVC

How to reproduce it?

Set a short retention (e.g. 10m) in values.yaml, wait past the retention period, and check the storage size of the PVC prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0; it never decreases.
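
One way to watch the on-disk usage directly (a sketch; it assumes the monitoring namespace and the default pod/container names of this release):

# size of the Prometheus data directory inside the pod
kubectl -n monitoring exec prometheus-kube-prometheus-stack-prometheus-0 \
  -c prometheus -- du -sh /prometheus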

Enter the changed values of values.yaml?

prometheus.prometheusSpec.retention

Enter the command that you execute that is failing/misfunctioning.

helm upgrade kube-prometheus-stack -n monitoring ./

Run from the local chart directory with the modified values.yaml.
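
To confirm the new retention actually reached the Prometheus custom resource after the upgrade (a sketch; assumes the namespace from the command above):

# the rendered retention on the Prometheus CR managed by the chart
kubectl -n monitoring get prometheus -o jsonpath='{.items[0].spec.retention}'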

Anything else we need to know?

No response

@brancomrt added the bug label on Sep 20, 2024
@brancomrt (Author)

I am using a storage class that stores data on NFS.

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: "nfs-client"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi

kubectl get storageclasses.storage.k8s.io

NAME         PROVISIONER                                      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client   cluster.local/nfs-subdir-external-provisioner    Delete          Immediate           true                   131d

@chanakya-svt

@brancomrt I am also facing the same issue with retention. I set my retention to 15m; the metrics are cleared, but the WAL size keeps increasing, consuming my disk to the point that I am missing metrics because of "no space left on device" errors.

Were you able to resolve this?

TIA

Below are the args passed to Prometheus v2.54.1 in the StatefulSet:

--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics
--log.level=debug
--storage.tsdb.retention.time=15m
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml
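
To confirm it is the WAL rather than old blocks that is growing, the two can be measured separately (a sketch; assumes the default pod/container names and the busybox shell shipped in the Prometheus image):

# WAL and head chunks vs. the rest of the data directory
kubectl -n monitoring exec prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
  sh -c 'du -sh /prometheus/wal /prometheus/chunks_head'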

@chanakya-svt

It was mentioned here in a comment that this was resolved in v2.21, but I am using v2.54 and the issue still persists.

@DrFaust92 (Contributor)

I can't find an exact reference for this, but because the default TSDB block is compacted every 2 hours, you cannot set retention below that value without changing several other parameters as well.

Regardless, this ticket is relevant for the upstream prom/operator repo, not the chart repo.
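
For context: the 2-hour window is Prometheus's default TSDB block duration, and retention is only enforced when blocks are compacted/truncated, so a 10m or 15m retention has nothing to act on yet. The block duration can in principle be shrunk via the operator's additionalArgs passthrough; a hedged sketch only, not a production recommendation (additionalArgs requires a recent prometheus-operator, and these are hidden Prometheus flags):

prometheus:
  prometheusSpec:
    additionalArgs:
      # shrinking block duration increases compaction churn and query cost
      - name: storage.tsdb.min-block-duration
        value: 30m
      - name: storage.tsdb.max-block-duration
        value: 30m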

@brancomrt (Author)

Thank you @DrFaust92
