
Edge case around timeout #35

Open
macobo opened this issue Jul 8, 2021 · 3 comments
Comments

@macobo
Contributor

macobo commented Jul 8, 2021

Thanks again for creating this action! 🚀 We've been using it at PostHog for our own Helm chart to great success.

However, we've been having difficulty with flaky tests due to timeouts. Since the stack is quite involved, it can sometimes take more than 10 minutes to boot on k3s on GitHub Actions. We found that whenever our Helm chart took longer than that, this action timed out at the 10-minute mark with the following error:

Error: Action failure: rollout of deploy/posthog-web
The process '/usr/local/bin/kubectl' failed with exit code 1

error: deployment "posthog-web" exceeded its progress deadline

The underlying cause is this:

[Deployments] .spec.progressDeadlineSeconds is an optional field that specifies the number of seconds you want to wait for your Deployment to progress before the system reports back that the Deployment has failed progressing - surfaced as a condition with Type=Progressing, Status=False, and Reason=ProgressDeadlineExceeded in the status of the resource. The Deployment controller will keep retrying the Deployment. This defaults to 600.

From https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

In practice this means that since this action uses kubectl rollout status under the hood, it fails as soon as the Deployment exceeds its progress deadline, which defaults to 600 seconds.
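
For illustration, the two separate limits at play look roughly like this (the flags and values here are illustrative, not necessarily the exact invocation this action uses):

# Client-side wait: how long kubectl itself keeps watching the rollout
# (--timeout=0 means wait indefinitely).
kubectl rollout status deploy/posthog-web --timeout=15m
# Server-side deadline: even with a generous --timeout, the command exits 1
# as soon as the Deployment reports ProgressDeadlineExceeded, i.e. after
# spec.progressDeadlineSeconds (600s by default) without observed progress.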

I've patched this in our chart by explicitly setting spec.progressDeadlineSeconds (PostHog/charts-clickhouse#40) but thought it's worth raising here as well. Is this something this action should handle implicitly?
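
For reference, a minimal one-off equivalent of that chart change looks something like this (the resource name and value are illustrative; the real change lives in the linked PR):

# Raise the Deployment's own progress deadline so a slow k3s boot on CI
# doesn't trip ProgressDeadlineExceeded before the action's timeout.
kubectl patch deployment posthog-web -p '{"spec": {"progressDeadlineSeconds": 1200}}'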

cc @consideRatio

@welcome

welcome bot commented Jul 8, 2021

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@macobo macobo changed the title Edge case around --timeout Edge case around timeout Jul 8, 2021
@consideRatio
Member

@macobo thanks for your kind feedback and sharing your insights about this - today I learned about this for the first time!

Hmm hmmm I'm thinking:

  • it would be reasonable for this action to fail if a Deployment is configured with spec.progressDeadlineSeconds and ends up timing out
  • it would be unreasonable for this action to make any change to the k8s resources, and it could make sense for whoever manages the resources to ensure progressDeadlineSeconds is set high enough that this won't occur.
  • it would add significant complexity for this action to always wait a given time regardless of whether progressDeadlineSeconds becomes the limiting factor, and it doesn't feel obvious that this would be desired
  • it could make sense to add a note someplace in the documentation about this, or to just have this issue findable as it now is via search engines and GitHub search

What do you think @macobo? Do you have a suggestion on a change to pursue in this repo?

@macobo
Contributor Author

macobo commented Jul 8, 2021

Agreed with everything you're saying. I think this issue itself might be good enough as an MVP, though a note in the README.md under gotchas wouldn't hurt. :)
