🐛 wait for lb dns name to propagate before resolving #5033
Instead of trying to resolve the primary LB DNS name right after its creation, wait for it to propagate so the resolution is more likely to succeed. This fixes an issue where a first "no such host" DNS response, cached with a high TTL, would make CAPA spin for minutes (as long as 15!) waiting for the DNS name to resolve, even though the name had already propagated a few minutes after the first attempt.
This should help speed things up a bit while we are waiting for the primary LB DNS name to propagate.
* Fix for DNS name resolution too early: kubernetes-sigs/cluster-api-provider-aws#5033
OpenShift e2e tests show no regressions for cases with a low TTL. For example, in this run the DNS name is resolved right after the wait is done:
and we can see that the secondary LB is reconciling while we wait for the DNS name:
/ok-to-test
/test pull-cluster-api-provider-aws-e2e-blocking
/lgtm Can you please reword the changelog entry – "a possible issue" is too vague.
Done.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Instead of trying to resolve the primary LB DNS name right after its creation, wait for it to propagate so the resolution is more likely to succeed.
This fixes an issue where a first "no such host" DNS response, cached with a high TTL, would make CAPA spin for minutes (as long as 15!) waiting for the DNS name to resolve, even though the name had already propagated a few minutes after the first attempt.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5032
Special notes for your reviewer:
I couldn't find a more elegant way to solve this other than a sleep after the LB is created. I wanted to add a retryAfterDuration here, right after the DNS name is set and before the name resolution is attempted, but it would involve somehow saving the timestamp as state between reconcile loops.
Checklist:
Release note: