Catalogd is not reloading server certificate #378

Open
joelanford opened this issue Sep 6, 2024 · 3 comments

Comments

@joelanford
Member

If the server certificate changes, the catalogd webserver needs to reload it.

0.24.0 introduces a new hostname in catalogd's server certificate, but it appears that during an upgrade from 0.23.0 to 0.24.0, the new catalogd pods start prior to the cert-manager noticing the Certificate change and updating the secret.

Catalogd should watch the mounted secret and reload it when it changes.
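
One way to reload a mounted certificate using only the Go standard library is a `tls.Config.GetCertificate` callback that re-reads the key pair when the file on disk changes. The sketch below is illustrative and not catalogd's actual code: `certReloader` and `newTLSConfig` are hypothetical names, and detecting a change by modification time is an assumption.

```go
package main

import (
	"crypto/tls"
	"os"
	"sync"
	"time"
)

// certReloader (hypothetical) serves the key pair at certPath/keyPath and
// re-reads it whenever the certificate file's modification time changes.
type certReloader struct {
	certPath, keyPath string

	mu      sync.Mutex
	cert    *tls.Certificate
	modTime time.Time
}

// GetCertificate has the signature expected by tls.Config.GetCertificate.
func (r *certReloader) GetCertificate(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	// os.Stat follows symlinks, so it reports on the file the mount currently points at.
	info, err := os.Stat(r.certPath)
	if err != nil {
		return nil, err
	}
	if r.cert == nil || !info.ModTime().Equal(r.modTime) {
		pair, err := tls.LoadX509KeyPair(r.certPath, r.keyPath)
		if err != nil {
			if r.cert != nil {
				return r.cert, nil // keep serving the last good pair and retry on the next handshake
			}
			return nil, err
		}
		r.cert, r.modTime = &pair, info.ModTime()
	}
	return r.cert, nil
}

// newTLSConfig wires the reloader into a server-side TLS configuration.
func newTLSConfig(r *certReloader) *tls.Config {
	return &tls.Config{MinVersion: tls.VersionTLS12, GetCertificate: r.GetCertificate}
}
```

A server could set `http.Server.TLSConfig` to `newTLSConfig(r)` and call `ListenAndServeTLS("", "")`. A reloader like this can only pick up a new certificate once the files in the volume have actually changed.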

@joelanford
Member Author

I'm looking more into this now. It looks like we are using a cert watcher, and the cert watcher eventually notices that the files have changed. However, it can take quite some time, which leaves the mutating webhook unavailable in the meantime (because the mutating webhook configuration is attempting to use the name that has not yet propagated to the serving cert).

Trying to figure out where this delay is coming from:

  1. Is cert-manager taking some time to regenerate the secret?
  2. Is kubelet taking some time to propagate the secret change into the pod's volume mount?
  3. Is the cert-manager not noticing the change immediately?

I have a feeling it is (2), so looking into that possibility a bit more.

@joelanford
Member Author

joelanford commented Sep 6, 2024

Trying to figure out where this delay is coming from:

  1. Is cert-manager taking some time to regenerate the secret?
  2. Is kubelet taking some time to propagate the secret change into the pod's volume mount?
  3. Is the cert-manager not noticing the change immediately?

As I suspected, it looks like (2) is the issue. Ultimately, I think this problem is a confluence of factors:

  1. We introduced a new webhook in catalogd.
  2. We created a new Service to route traffic to the new webhook.
  3. We updated a Certificate to include a new DNS name for (2).
  4. We kept using the same secret name in the Certificate in (3). This means the new catalogd pod is able to successfully mount the old secret prior to the edit from (3) propagating to the secret.
  5. We created a new MutatingWebhookConfiguration that uses the new DNS name from (2).
  6. Kubelet takes some time to propagate the secret update into the catalogd pod volume mount.
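
The effect of factors 3 to 5 can be reproduced in isolation: a certificate issued before the new DNS name was added fails hostname verification for that name. The sketch below is an illustration under stated assumptions. `coversHost` and `selfSigned` are hypothetical helpers, and any host names passed to them are placeholders rather than the real service names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// coversHost returns nil if the first certificate in certPEM is valid for host.
func coversHost(certPEM []byte, host string) error {
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("no PEM certificate found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	return cert.VerifyHostname(host) // compares host against the certificate's SANs
}

// selfSigned builds a throwaway certificate with the given DNS names,
// only so that coversHost can be exercised without a cluster.
func selfSigned(dnsNames ...string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}
```

Given a certificate from `selfSigned("old-service.example.svc")`, `coversHost` returns nil for that name and an error for a name such as `new-webhook.example.svc`, which is the situation the webhook client is in until the updated secret reaches the pod.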

We can avoid issues like this in the future in either of the following ways:

  1. Use an existing service, which would avoid the need to change the certificate.
  2. Use a different secret name when the content of the certificate changes. That would ensure that the new catalogd pod only ever successfully mounts the correct certificate.
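
A sketch of the second option, assuming the secret name is derived from the certificate's DNS names so that any change to them yields a new name. `secretNameFor` is a hypothetical helper; neither catalogd nor cert-manager provides it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// secretNameFor (hypothetical) returns base plus a short suffix derived from
// the sorted set of DNS names, so adding or removing a name changes the result.
func secretNameFor(base string, dnsNames []string) string {
	names := append([]string(nil), dnsNames...) // copy so the caller's slice is not reordered
	sort.Strings(names)
	sum := sha256.Sum256([]byte(strings.Join(names, "\n")))
	return base + "-" + hex.EncodeToString(sum[:4]) // 8 lowercase hex characters
}
```

The result would be used both as the Certificate's `secretName` and in the pod's volume definition. A pod that references the new name cannot start its containers until that secret exists, so it never mounts the stale certificate.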

@tmshort
Contributor

tmshort commented Sep 6, 2024

I believe when I was testing the CA watcher in the operator-controller, it could take up to two minutes for kubelet to update the certificates. But typically it was a minute or less.
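
To measure this delay from inside a pod, one could poll the mounted file until its content differs from a previously read copy. This is a minimal sketch; `waitForChange` is a hypothetical helper, and the polling interval and timeout are left to the caller.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"time"
)

// waitForChange (hypothetical) polls path until its content differs from old
// and returns how long that took, or an error once timeout has elapsed.
func waitForChange(path string, old []byte, interval, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	for {
		cur, err := os.ReadFile(path)
		if err == nil && !bytes.Equal(cur, old) {
			return time.Since(start), nil
		}
		if time.Since(start) >= timeout {
			return 0, fmt.Errorf("%s unchanged after %s", path, timeout)
		}
		time.Sleep(interval)
	}
}
```

A timeout of around three minutes would leave some margin over the two-minute worst case described above.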
