
[kube-prometheus-stack] update kps to v0.65.1 #3267

Closed
wants to merge 15 commits

Conversation

QuentinBisson
Member

@QuentinBisson QuentinBisson commented Apr 20, 2023

What this PR does / why we need it

Upgrades prometheus-operator to v0.65.1

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

Special notes for your reviewer

Checklist

  • DCO signed
  • Chart Version bumped
  • Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])

@monotek and @GMartinez-Sisti as it's my first major release, I'd like it if you could take a look, especially as this one adds a new CRD.

Signed-off-by: QuentinBisson <[email protected]>
Signed-off-by: QuentinBisson <[email protected]>
@QuentinBisson
Member Author

Well this will not be fun to fix:

Error: INSTALLATION FAILED: create: failed to create: Secret "sh.helm.release.v1.kube-prometheus-stack-l5sr2p6xtu.v1" is invalid: data: Too long: must have at most 1048576 bytes

@QuentinBisson
Member Author

So the issue is coming from the newer PrometheusAgent CRD, which pushes the Helm release secret over the size limit.

The only solution I can think of is to have the CRDs outside the chart, but maybe you can think of a better solution?

If that is fine with you, I was thinking of doing something similar to https://github.com/cowboysysop/charts/tree/master/charts/vertical-pod-autoscaler/templates/crds: a job that pulls the CRDs and would then also update them.
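
For illustration, a minimal sketch of what such a hook job could look like; the image, the ServiceAccount name and the single hard-coded CRD URL are assumptions for the example, not what this PR ships:

# Sketch of a pre-install/pre-upgrade hook Job that server-side applies the
# upstream CRD manifests. Assumes a "crd-installer" ServiceAccount with RBAC
# to create/patch CustomResourceDefinitions; the image tag is illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-prometheus-stack-crd-upgrade
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: crd-installer
      restartPolicy: OnFailure
      containers:
        - name: crd-upgrade
          image: bitnami/kubectl:1.27  # assumed image
          args:
            - apply
            - --server-side
            - -f
            - https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.65.1/example/prometheus-operator-crd/monitoring.coreos.com_prometheusagents.yaml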

@zeritti
Member

zeritti commented Apr 20, 2023

The only solution I can think of is to have the CRDs outside the chart, but maybe you can think of a better solution?

An immediate/short-term solution might be using the stripped-down CRDs as released by the Prometheus operator team. These do not include description fields, so the manifests are considerably smaller.

@QuentinBisson
Member Author

QuentinBisson commented Apr 20, 2023

That could work, but it would reduce functionality, as users would not be able to use kubectl explain on the CRDs, and, as you mention, it is only short-term. With the upcoming ScrapeConfig and maybe Silence CRDs in prometheus-operator, I know this issue will happen again "soon", so I would rather we have a proper fix :)

I also tend to think the job will help, as upgrades of the CRDs will then be the responsibility of the Helm chart :)

@zeritti
Member

zeritti commented Apr 20, 2023

Yes, a job would do. Keeping the CRDs out of the chart seems to be the only fix in this respect.

@GMartinez-Sisti
Member

Related #1876


@monotek monotek left a comment


Can you please check the failing CI?

@QuentinBisson
Member Author

@monotek I'm trying to fix it, but the issue is that the new PrometheusAgent CRD makes us go over the 1MB secret size limit of Kubernetes.

@QuentinBisson
Member Author

@GMartinez-Sisti and @monotek I tested this on existing clusters, but this is quite a big change, so I won't merge without newer approvals.

Signed-off-by: Quentin Bisson <[email protected]>
- --server-side
- --force-conflicts={{ $.Values.crds.forceConflicts }}
- -f
- https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/{{ $appVersion }}/example/prometheus-operator-crd/monitoring.coreos.com_{{ . }}.yaml
Member


I would also allow this URL to be customisable, as some environments might not have external internet access and the manifests can easily be checked into an internal artefact repository.
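
For example, a rough sketch of how that could look, using a hypothetical crds.url value (the key name and default are illustrative, not part of this PR):

# values.yaml (hypothetical key, defaulting to the upstream raw GitHub URL)
crds:
  url: https://raw.githubusercontent.com/prometheus-operator/prometheus-operator

# the job args would then reference the value instead of the hard-coded host:
#   - -f
#   - {{ $.Values.crds.url }}/{{ $appVersion }}/example/prometheus-operator-crd/monitoring.coreos.com_{{ . }}.yaml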

@GMartinez-Sisti
Member

@GMartinez-Sisti and @monotek I tested this on existing clusters, but this is quite a big change, so I won't merge without newer approvals.

Ideally we would have CI automation to deploy the version on main and then the PR version over it. I can test this, but it would take a couple of days. The logic looks good though, I like this approach 💪.

Left some comments.

Signed-off-by: QuentinBisson <[email protected]>
## This defaults to non root user with uid 2000 and gid 2000. *v1.PodSecurityContext false
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
##
securityContext:
Contributor


If this could be made compatible with the restricted profile by default that would be great.
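
For reference, a sketch of values that would typically satisfy the restricted Pod Security Standard; the containerSecurityContext key and the exact values are illustrative assumptions, not the chart's current settings:

securityContext:
  runAsNonRoot: true
  runAsUser: 2000
  runAsGroup: 2000
  fsGroup: 2000
  seccompProfile:
    type: RuntimeDefault

# container-level settings (hypothetical key in this sketch)
containerSecurityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL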

Signed-off-by: QuentinBisson <[email protected]>
Signed-off-by: Quentin Bisson <[email protected]>
@QuentinBisson
Member Author

We really need to reach a consensus here, because we need to bump the major version to avoid issues with prometheus-operator, which was upgraded in another PR.

Either we:

  • Keep the CRD folder but do not add the newer CRDs, as missing them does not cause prometheus-operator to fail
  • Delete the CRD folder (and fix the tests) and explain to users that they need to install the CRDs themselves
  • Add the CRD job present in this branch
  • Move something else into another chart (dashboards, rules, ...)?
  • Find another way forward.

@monotek I know you do not like that approach, but do you have another one we could go with? I would really like some help to move this forward :(

@rouke-broersma
Contributor

We really need to reach a consensus here, because we need to bump the major version to avoid issues with prometheus-operator, which was upgraded in another PR.

Personally I'm fine with any solution, because I already don't use this chart for CRDs. I would say that moving the dashboards out of this chart is probably not an alternative but rather a necessary next step, because I imagine that sooner or later this limit will be hit again if the dashboards remain in this chart.

@zeritti
Member

zeritti commented May 16, 2023

I'd suggest adding thanosrulers/status to resources in the operator's ClusterRole template; the subresource is new in 0.65.0.
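
For context, the relevant ClusterRole rule would then look roughly like this (abridged sketch, showing only the ThanosRuler resources):

- apiGroups:
    - monitoring.coreos.com
  resources:
    - thanosrulers
    - thanosrulers/finalizers
    - thanosrulers/status
  verbs:
    - '*'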

@onedr0p
Contributor

onedr0p commented May 16, 2023

One thing to note is that Argo doesn't implement Helm properly, meaning the CRDs option currently present in this PR is something Argo folks cannot use, because the CRDs are applied via a Helm hook. I'm not using Argo, so I'm not sure how important that is or whether Argo users have a different means of managing CRDs that can keep up to date with this Helm chart.

@rouke-broersma
Contributor

One thing to note is that Argo doesn't implement Helm properly, meaning the CRDs option currently present in this PR is something Argo folks cannot use, because the CRDs are applied via a Helm hook. I'm not using Argo, so I'm not sure how important that is or whether Argo users have a different means of managing CRDs that can keep up to date with this Helm chart.

Argo converts Helm hooks to Argo hooks: pre-install and pre-upgrade become PreSync, which should be fine in this case, I believe.
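
Roughly, a sketch of that mapping (the Argo CD annotation is what the Helm hook gets translated to; how the chart ultimately declares the hook is still open):

# Helm hook as declared on the CRD job:
metadata:
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade

# Approximate Argo CD equivalent after translation:
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync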

@xiu
Contributor

xiu commented May 19, 2023

We really need to reach a consensus here, because we need to bump the major version to avoid issues with prometheus-operator, which was upgraded in another PR.

Keep the CRD folder but do not add the newer CRDs, as missing them does not cause prometheus-operator to fail

This could be an option, as prometheus-operator doesn't need either ScrapeConfig or PrometheusAgent to start. It just pushes the problem further into the future, though.

Delete the CRD folder (and fix the tests) and explain to users that they need to install the CRDs themselves

This would be my preferred option to keep things simple. People can use prometheus-operator-crds to install the CRDs.

@jkroepke
Member

I'm for deleting the CRDs from kube-prometheus-stack and shifting the maintenance to the prometheus-operator-crd Helm chart. This also resolves the issue that the CRDs need to be applied manually with kubectl on major updates.

However, please keep in mind that removing the CRDs from the crds/ folder does not remove the CRDs from the cluster.

This can cause a conflict where prometheus-operator-crd can't be installed on existing clusters. We would have to add documentation on how to "migrate" the CRDs from kube-prometheus-stack to prometheus-operator-crd, for example that the CRDs need to be annotated with meta.helm.sh/release-name: <RELEASE_NAME> and meta.helm.sh/release-namespace: <RELEASE_NAMESPACE>.
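
Concretely, a sketch of the ownership metadata each CRD would need before the new release can adopt it (release name and namespace are placeholders; Helm's adoption check also looks at the managed-by label):

metadata:
  annotations:
    meta.helm.sh/release-name: <RELEASE_NAME>
    meta.helm.sh/release-namespace: <RELEASE_NAMESPACE>
  labels:
    app.kubernetes.io/managed-by: Helm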

@onedr0p
Contributor

onedr0p commented May 22, 2023

@jkroepke I am not happy with that solution because of what I mentioned here[1] and my 3 comments after it. Flux is able to manage the CRDs just fine from the special crds folder.

[1] #3267 (comment)

@jkroepke
Member

jkroepke commented May 22, 2023

And what about non-Flux users? Personally, I would not care about ArgoCD or Flux here. It's nice that the GitOps operators have some special features to work around Helm, but we should look out for typical Helm users. As far as I know, ArgoCD solves the CRD issue with helm template | kubectl apply.

@onedr0p you can create an umbrella chart to bundle both Helm charts to get the old behavior back.

@onedr0p
Contributor

onedr0p commented May 22, 2023

And what about non-Flux users?

The documentation stays the same: kubectl apply the CRDs on upgrade. I think we've found the major issue here, namely that the Grafana dashboards take up the most space[1].

[1] #3267 (comment)

Even if we take out the CRDs, leaving the dashboards in this chart will only lead to this issue again down the road as more things are added.

@jkroepke
Member

I think we've found the major issue here, namely that the Grafana dashboards take up the most space[1].

Yeah. I created #3416 as a POC which minifies all the dashboard JSON files.

From

1.6M    ../templates/grafana/dashboards-1.14

down to

688K    ../templates/grafana/dashboards-1.14

which is a potential saving of roughly 1MB of data (or 1MB of pure whitespace)...

@rouke-broersma
Contributor

I think we've found the major issue here, namely that the Grafana dashboards take up the most space[1].

Yeah. I created #3416 as a POC which minifies all the dashboard JSON files.

From

1.6M    ../templates/grafana/dashboards-1.14

down to

688K    ../templates/grafana/dashboards-1.14

which is a potential saving of roughly 1MB of data (or 1MB of pure whitespace)...

#3267 (comment)

Helm already compresses all manifests inside the secret, so I'm not sure whether minifying the JSON helps at all. It needs to be properly tested.

@jkroepke
Member

Helm already compresses all manifests inside the secret, so I'm not sure whether minifying the JSON helps at all. It needs to be properly tested.

Everyone is invited to test #3416; at least the CI is green.

@rouke-broersma
Contributor

Helm already compresses all manifests inside the secret, so I'm not sure whether minifying the JSON helps at all. It needs to be properly tested.

Everyone is invited to test #3416; at least the CI is green.

Ah, sorry, I missed that you posted the secret sizes, looks good! Though 979K is still dangerously close to the limit; a couple more CRDs (which some have commented are already planned) and we're back there.

@jkroepke
Member

I agree that this is not a long-term solution, and we may need to discuss how to proceed. However, there is a short-term fix for now, so we can proceed as usual.

I do not really understand why CRDs are stored in a Helm release even though Helm does not manage them. The Helm guys are crazy.

@jkroepke
Member

By the way, on another test branch:

Minifying the CRDs (converting them from YAML to JSON) would also reduce the size to 811K in total.

% kubectl get secrets sh.helm.release.v1.kube-prometheus-stack.v1 -o jsonpath='{.data.release}' | base64 -d > release
% ls -lah release                                                                                                    
-rw-r--r--@ 1 jok  staff   811K May 23 09:11 release

While this is fine from the Helm/Kubernetes point of view, I'm not aware of the downsides of delivering JSON manifests.

@QuentinBisson
Member Author

This will be closed in favor of #3416, but I will keep it open for now so we can turn this discussion into an issue to define the way forward.


@monotek monotek left a comment


set to "request changes" to block accidental merging it as changes are already implemented in #3416

@jkroepke
Member

jkroepke commented Jun 2, 2023

Hi there, would it make sense to create an issue instead of discussing this in a PR? It feels like this topic is in sleep mode for now. I'm interested in finding a solution before we run into this issue again with new CRDs.

@QuentinBisson
Member Author

@jkroepke yes it would, but I have not had the bandwidth to write the issue down yet; you can do it if you want :)

@jkroepke
Member

jkroepke commented Jul 3, 2023

@jkroepke yes it would, but I have not had the bandwidth to write the issue down yet; you can do it if you want :)

#3548 created. @QuentinBisson

@jkroepke jkroepke closed this Jul 3, 2023
@QuentinBisson
Member Author

Thank you @jkroepke :)

Development

Successfully merging this pull request may close these issues.

[kube-prometheus-stack] update to 0.64