Gradual rollout / multicluster support #312

Open
StupidScience opened this issue Feb 21, 2024 · 6 comments

Comments

@StupidScience

Hello.

We're currently considering migrating from the deprecated k8s-registrar to spire-controller-manager, and we're facing a few challenges:

  • No way to do a gradual migration or test changes in a non-disruptive way
  • No support for our multicluster setup

Context

We have a multi-cluster setup for our trust domain. In each cluster we have a SPIRE installation with servers, agents, and k8s-registrar in reconciler mode. The SPIRE servers share a database, though. In reconciler mode, each cluster's k8s-registrar is responsible for its own cluster and does not touch entries that belong to another cluster.

Once we install spire-controller-manager, even in one cluster, it immediately removes all entries registered by k8s-registrar (or in any other way). So k8s-registrar loses its permission to register anything, all registered entries for all k8s clusters along with all static entries are gone, and all federation entries defined in the SPIRE server config are also removed, without visible attempts to recover.
If we installed the controller in several clusters, I imagine they would constantly remove each other's entries.

I briefly looked into the code and this seems to be expected behaviour: the controller becomes the only source of truth for all entries.
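
To illustrate why this happens, here is a deliberately simplified model of a "single source of truth" reconciler. It is not the actual spire-controller-manager code; the function name and the entry IDs are made up for the example. Anything that exists on the shared server but is not in the controller's own desired set ends up scheduled for deletion.

```python
from typing import Iterable, List


def plan_deletions(existing_ids: Iterable[str], desired_ids: Iterable[str]) -> List[str]:
    """Return the entry IDs a sole-source-of-truth reconciler would delete.

    Simplified model: every entry that is not desired by this controller
    is treated as stale, no matter who registered it.
    """
    return sorted(set(existing_ids) - set(desired_ids))


# Hypothetical contents of the shared SPIRE database.
existing = ["cluster-1/web", "cluster-2/api", "static/db"]
# The controller in cluster-1 only knows about its own workloads.
desired_by_cluster_1 = ["cluster-1/web"]

# Entries from cluster-2 and the static entry are both planned for removal.
to_delete = plan_deletions(existing, desired_by_cluster_1)
```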

What we would like to add:

Dry-run mode

The controller would only print out what it is going to do instead of actually updating, deleting, etc.
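
A minimal sketch of what such a mode could look like, assuming the reconciler produces a list of planned actions before applying them. `Action`, `Reconciler`, and the `apply` callback are hypothetical names for this example and do not exist in spire-controller-manager.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    kind: str       # "create", "update" or "delete"
    entry_id: str


@dataclass
class Reconciler:
    # Hypothetical callback that performs the real change on the SPIRE server.
    apply: Callable[[Action], None]
    dry_run: bool = False
    log: List[str] = field(default_factory=list)

    def run(self, actions: Iterable[Action]) -> None:
        for action in actions:
            if self.dry_run:
                # Only report the planned change; never call apply().
                self.log.append(f"dry-run: would {action.kind} entry {action.entry_id}")
            else:
                self.apply(action)
                self.log.append(f"{action.kind}d entry {action.entry_id}")
```

With `dry_run=True` the log shows what would change while the server stays untouched, which makes it possible to inspect the planned deletions before enabling the controller for real.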

Ownership mechanism

The controller would only look at entries that it owns. This would help with both the multi-cluster setup and gradual migration.

Possible solution for ownership

The controller manager could add some metadata to the entries' Hints, e.g. Hint: owner=cluster-1. In this case the controller manager for cluster-1 would touch only the records it owns and skip all others:

  • with empty metadata, to allow gradually switching from one registration mechanism to another
  • with Hint: owner=cluster-(2,3,4,...), to support installations across multiple clusters

By default this ownership could be disabled so that no breaking change is introduced, and other flags could be added to take ownership of objects if required.
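
The filtering described above could be sketched roughly as follows. The `owner=` hint format, the `Entry` class, and `managed_entries` are assumptions made for illustration; this is the proposal from this issue, not an existing controller feature.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

OWNER_PREFIX = "owner="  # hypothetical hint format proposed in this issue


@dataclass(frozen=True)
class Entry:
    spiffe_id: str
    hint: str = ""  # empty for entries created by k8s-registrar or by hand


def owner_of(entry: Entry) -> Optional[str]:
    """Return the owner encoded in the hint, or None if there is none."""
    if entry.hint.startswith(OWNER_PREFIX):
        return entry.hint[len(OWNER_PREFIX):]
    return None


def managed_entries(entries: Iterable[Entry], owner: Optional[str]) -> List[Entry]:
    """Select the entries a controller is allowed to reconcile.

    owner=None models "ownership disabled": the controller keeps today's
    behaviour and considers every entry its own.
    """
    if owner is None:
        return list(entries)
    return [e for e in entries if owner_of(e) == owner]
```

With ownership enabled, entries with an empty hint and entries owned by other clusters are simply invisible to the reconciler, so it can neither update nor delete them.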

I believe external-dns uses a somewhat similar mechanism, with ownership tracked via TXT records.

It is not clear to me what the best approach is for out-of-k8s "static" entries and federations, so I would appreciate your input.

Let me know if you want me to split this issue into multiple ones.

@riuvshyn

Static entries should actually be fine: in both clusters we can have identical ClusterStaticEntry resources and both controllers can reconcile them. It feels like the only problem is with k8s workloads, which are k8s-cluster-specific.

@StupidScience
Author

@kfox1111 you mentioned some hint-based filtering in spiffe/spire#4898. Is it WIP/PoC somewhere, or did I misinterpret it or just not find it in this repo?

@kfox1111
Contributor

Something I tried just on my own box.

@kfox1111
Contributor

@StupidScience Have a look at #325

@StupidScience
Author

@kfox1111 thanks, I checked and conceptually (I didn't look thoroughly through the code) it should work for our use case at least. How will that behave with ClusterStaticEntry resources, though? Or is it only for ClusterSPIFFEID?

Will try to elaborate a bit:

In my understanding, ClusterStaticEntry resources are not actually cluster-specific but rather trust-domain-specific. So in this case, will each controller try to create its own entry? Will that actually work?

@kfox1111
Contributor

It should work for static entries as well.

The change only filters what the controller manager looks at when reconciling entries. Agents use the full set.

Multiple controller managers might still fight if the unique spiffeid/selectors/parentid are the same across multiple clusters. But otherwise it should work, I think.
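
One way to check for this in advance is to look for entries whose identifying tuple appears in more than one cluster. The sketch below assumes, as described above, that the combination of SPIFFE ID, parent ID, and selectors is what makes an entry unique; `find_contested` and the input layout are hypothetical and only meant to show the idea.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# (spiffe_id, parent_id, selectors) as plain values, for illustration only.
RawEntry = Tuple[str, str, Iterable[str]]
EntryKey = Tuple[str, str, FrozenSet[str]]


def find_contested(entries_by_cluster: Dict[str, List[RawEntry]]) -> Dict[EntryKey, List[str]]:
    """Map each identifying tuple to the clusters that declare it, keeping
    only tuples declared by more than one cluster."""
    seen: Dict[EntryKey, set] = defaultdict(set)
    for cluster, entries in entries_by_cluster.items():
        for spiffe_id, parent_id, selectors in entries:
            # Selector order does not matter, so compare them as a set.
            key = (spiffe_id, parent_id, frozenset(selectors))
            seen[key].add(cluster)
    return {key: sorted(clusters) for key, clusters in seen.items() if len(clusters) > 1}
```

An empty result means no two clusters declare the same tuple, so under this assumption the controllers would have nothing to fight over.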


3 participants