Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

init container to decide allowing reconciliation of manager #196

Merged
merged 2 commits into from
Jul 31, 2024

Conversation

leelavg
Copy link
Contributor

@leelavg leelavg commented Jul 29, 2024

we want to run manager only when required, i.e, we are going to reconcile and manage CSI among other operands. we are achieving that by adding an init container which only get successful if client-operator is deployed standalone or running in provider configured odf cluster.

Depends on: red-hat-storage/ocs-operator#2717
ref: red-hat-storage/odf-operator#450

@leelavg
Copy link
Contributor Author

leelavg commented Jul 29, 2024

/cherry-pick release-4.16

@openshift-cherrypick-robot

@leelavg: once the present PR merges, I will cherry-pick it on top of release-4.16 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@leelavg leelavg requested review from nb-ohad and Madhu-1 and removed request for nb-ohad July 29, 2024 06:58
@leelavg
Copy link
Contributor Author

leelavg commented Jul 29, 2024

/hold

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 29, 2024
@leelavg leelavg changed the title controllers: reconcile webhook only when we are managing CSI init container to decide allowing of reconciliation of manager Jul 29, 2024
@leelavg
Copy link
Contributor Author

leelavg commented Jul 29, 2024

Testing:

- No StorageCluster CRD
I0729 14:00:24.636988 2124186 main.go:51] StorageCluster CRD not found and proceeding with reconciliation of operands

- CRD exists but not storagecluster
I0729 13:51:26.625930 2120304 main.go:77] Waiting for StorageCluster to be deployed
I0729 13:51:31.638621 2120304 main.go:77] Waiting for StorageCluster to be deployed
^Csignal: interrupt

- storagecluster exists but not in Provider mode
> k -nopenshift-storage get storagecluster/ocs-storagecluster -ojsonpath='{.metadata.annotations}' | jq 'del(."kubectl.kubernetes.io/last-applied-configuration")'
{
  "uninstall.ocs.openshift.io/cleanup-policy": "delete",
  "uninstall.ocs.openshift.io/mode": "graceful"
}
I0729 13:52:24.907032 2120630 main.go:79] Waiting for StorageCluster deployed mode to be 'Provider'
I0729 13:52:30.179057 2120630 main.go:79] Waiting for StorageCluster deployed mode to be 'Provider'
^Csignal: interrupt

> k -nopenshift-storage get storagecluster/ocs-storagecluster -ojsonpath='{.metadata.annotations}' | jq 'del(."kubectl.kubernetes.io/last-applied-configuration")'
{
  "mode": "provider",
  "uninstall.ocs.openshift.io/cleanup-policy": "delete",
  "uninstall.ocs.openshift.io/mode": "graceful"
}
I0729 13:56:41.060310 2123137 main.go:72] StorageCluster is configured to run in 'Provider' mode and proceeding with reconciliation of operands

update to the client-console deployment is yet to be tested.

Copy link
Contributor

@nb-ohad nb-ohad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new service account an a new RBAC?
Why can't we just use the operators SA and RBAC?

service/reconcile-waiter/main.go Outdated Show resolved Hide resolved
service/reconcile-waiter/main.go Outdated Show resolved Hide resolved
service/reconcile-waiter/main.go Outdated Show resolved Hide resolved
@leelavg leelavg changed the title init container to decide allowing of reconciliation of manager init container to decide allowing reconciliation of manager Jul 30, 2024
@leelavg
Copy link
Contributor Author

leelavg commented Jul 30, 2024

Why do we need a new service account an a new RBAC? Why can't we just use the operators SA and RBAC?

  1. We have two deployments in CSV, client-op (custom sa) and client-console (default sa)
  2. We need to inject sidecar into both of these deployments, however custom sa in client-op is enough for it's init containers but using the same which has more privilege for init container in client-console didn't seem good
  3. Reusing client-op sa for client-console reduces good chunk of rbac in this PR which I could do based on your comment, wdyt?

I realized something when replying to the comments, CSV will only be successful if Pod is in Running state, with an init container, pod will only be in Running state after init which could potentially fail d/s CI. Overall we might need to rethink on this.

@nb-ohad
Copy link
Contributor

nb-ohad commented Jul 30, 2024

Why do we need a new service account an a new RBAC? Why can't we just use the operators SA and RBAC?

  1. We have two deployments in CSV, client-op (custom sa) and client-console (default sa)
  2. We need to inject sidecar into both of these deployments, however custom sa in client-op is enough for it's init containers but using the same which has more privilege for init container in client-console didn't seem good
  3. Reusing client-op sa for client-console reduces good chunk of rbac in this PR which I could do based on your comment, wdyt?

I realized something when replying to the comments, CSV will only be successful if Pod is in Running state, with an init container, pod will only be in Running state after init which could potentially fail d/s CI. Overall we might need to rethink on this.

CI should take into account that this pod is waiting on installation, not the other way around

@leelavg leelavg force-pushed the webhook branch 3 times, most recently from 888f0a3 to 2b4d44b Compare July 31, 2024 03:38
@leelavg
Copy link
Contributor Author

leelavg commented Jul 31, 2024

Tested:

  1. client-op is running standalone : proceed
  2. client-op is installed alongside odf with
    a. storagecluster doesn't exist : wait
    b. storagecluster not in provider mode: wait
    c. storagecluster in provider mode: proceed

@leelavg
Copy link
Contributor Author

leelavg commented Jul 31, 2024

I have raised tracker issues for other u/s projects which tries to consume client-op as olm bundles, need to fix them if those projects require updated client-op bundles:
red-hat-storage/ocs-operator#2722
red-hat-storage/odf-operator#461

service/deployment-guard/main.go Outdated Show resolved Hide resolved
use an init container to wait till either we are running standalone or
alongside with odf configured in provider mode

Signed-off-by: Leela Venkaiah G <[email protected]>
@nb-ohad
Copy link
Contributor

nb-ohad commented Jul 31, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 31, 2024
Copy link

openshift-ci bot commented Jul 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leelavg, nb-ohad

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@leelavg
Copy link
Contributor Author

leelavg commented Jul 31, 2024

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit c2ae75e into red-hat-storage:main Jul 31, 2024
12 checks passed
@openshift-cherrypick-robot

@leelavg: #196 failed to apply on top of branch "release-4.16":

Applying: bundle: add a new init container to wait for reconcile condition
Using index info to reconstruct a base tree...
M	Dockerfile
M	Makefile
M	bundle.Dockerfile
M	bundle/manifests/ocs-client-operator.clusterserviceversion.yaml
M	bundle/metadata/annotations.yaml
M	hack/go-build.sh
Falling back to patching base and 3-way merge...
Auto-merging hack/go-build.sh
Auto-merging bundle/metadata/annotations.yaml
Auto-merging bundle/manifests/ocs-client-operator.clusterserviceversion.yaml
CONFLICT (content): Merge conflict in bundle/manifests/ocs-client-operator.clusterserviceversion.yaml
Auto-merging bundle.Dockerfile
Auto-merging Makefile
Auto-merging Dockerfile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 bundle: add a new init container to wait for reconcile condition
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

leelavg added a commit to leelavg/ocs-client-operator that referenced this pull request Jul 31, 2024
leelavg added a commit to leelavg/ocs-client-operator that referenced this pull request Jul 31, 2024
leelavg added a commit to leelavg/ocs-client-operator that referenced this pull request Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants