Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common automatic update #266

Merged
merged 10 commits into from
Jul 8, 2023
Merged

common automatic update #266

merged 10 commits into from
Jul 8, 2023

Conversation

mbaldessari
Copy link
Contributor

  • Revert "Make Test"
  • Revert "Adding rbac to support the vault sa checking on the vault-0 pod status."
  • Revert "Fix image and comment"
  • Revert "Add tests"
  • Revert "WIP add presync for eso that waits for vault to be up"
  • Increase the default retry limit when syncing
  • Add Changes.md entry
  • Fix up tests after common rebase

This reverts commit 64e9dc7.
This reverts commit ab5532a.
ArgoCD will retry 5 times by default to sync an application in case of
errors and then will give up. So if an application contains a reference
to a CRD that has not been installed yet (say because it will be
installed by another application), it will error out and retry later.
This happens by default for a maximum of 5 times [1]. After those 5 times
the application will give up and will stay in Degraded moded and
eventually move to Failed. In this case a manual sync will usually fix
the application just fine (i.e. as long as the missing CRD has been
installed in the meantime).

Now to solve this issue we can add complex preSync Jobs that wait for
the needed resources, but this fundamentally breaks the simplicity of
things and introduces unneeded dependencies. In this change we just
increase the default retry limit to something larger (20) that should
cover most cases. The retry limit functionality is rather undocumented
currently in the docs but is defined at [2] and also shown at [3].

In our patterns' case the concrete issue happened as follows:
1. ESO ClusterSecrets were often not synced/degraded
2. We introduced a Job in a preSync hook for the ESO chart that would
   wait on vault to be ready before applying the rest of ESO
3. MCG started failing because the config-demo app had already tried to
   sync 5 times and failed everytime because the ESO CRDs were not
   installed yet (due to ESO waiting on vault)

So instead of adding yet another job, let's just try a lot more often.
We picked 20 as a sane default because that should have argo try for
about 60 minutes (3min is the default maximum backoff limit)

Tested with two MCG installations (with the ESO Job hook included) and
both worked out of the box. Whereas before I managed to get three
failures out of three installs.

[1] https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L1680
[2] https://github.com/argoproj/argo-cd/blob/master/manifests/crds/application-crd.yaml#L1476
[3] https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/application.yaml#L202C18-L202C100
Revert ESO/Vault Job and add a default higher number of retries
@mbaldessari mbaldessari merged commit cef4dc8 into validatedpatterns:main Jul 8, 2023
4 checks passed
mhjacks pushed a commit to mhjacks/multicloud-gitops that referenced this pull request Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant