Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix]: check provisioning -> running every 15 secs instead of 5 #472

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

rahulait
Copy link
Contributor

What this PR does / why we need it:
It looks like we are too aggressive in checking whether the instance has moved from provisioning -> running state. When an instance is created, it comes up, gets its disks created and resized, boots up and finally becomes running. This operation itself takes 1 to 2 mins based on the size of instance and other factors. Currently during reconcile, we are checking if instance has moved from provisioning to running state every 5 secs. This means for each instance, we are making close to 12 to 20 GET calls. Here is a sample output for one instance:

// provisioning new instance
2024-08-20T13:24:08.426931642Z GET  /v4beta/linode/instances?page=1  HTTP/1.1
2024-08-20T13:24:08.516412267Z GET  /v4beta/regions/us-ord  HTTP/1.1
2024-08-20T13:24:08.596356587Z GET  /v4beta/images/linode%2Fubuntu22.04  HTTP/1.1
2024-08-20T13:24:08.701246429Z GET  /v4beta/vpcs/88353  HTTP/1.1
2024-08-20T13:24:09.245072948Z POST  /v4beta/linode/instances  HTTP/1.1
2024-08-20T13:24:09.370996586Z POST  /v4beta/linode/instances/62849600/boot  HTTP/1.1
2024-08-20T13:24:09.499083864Z GET  /v4beta/linode/instances/62849600/ips  HTTP/1.1
2024-08-20T13:24:09.629499882Z GET  /v4beta/linode/instances/62849600/configs?page=1  HTTP/1.1
2024-08-20T13:24:09.779294293Z GET  /v4beta/linode/instances/62849600/ips  HTTP/1.1
2024-08-20T13:24:09.920384585Z POST  /v4beta/nodebalancers/854217/configs/1397979/nodes  HTTP/1.1
2024-08-20T13:24:10.202103378Z GET  /v4beta/linode/instances?page=1  HTTP/1.1

//checking if instance is running/provisioned
2024-08-20T13:24:10.431260943Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:10.601453685Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:13.599866494Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:13.767120059Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:15.619029327Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:20.828628645Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:25.999815763Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:31.163618605Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:36.330660557Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:41.555155689Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:46.738508917Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:51.905828192Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:57.102032464Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:02.306054006Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:07.463763577Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:12.675023274Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:12.861765692Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:17.979051026Z GET  /v4beta/linode/instances/62849600  HTTP/1.1

// Instance in running state
2024-08-20T13:25:23.146056450Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:23.375264105Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:23.685719943Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:27:58.322861280Z GET  /v4beta/linode/instances/62849600  HTTP/1.1

Based on the above logs, we can see that we are too aggressive in checking whether the instance has provisioned and moved to running or not. Instead of trying every 5 seconds, we can try every 15 seconds and that will cut down the number of GET api calls we make during instance creation to 1/3 of the existing number. I am suggesting 15 instead of 10 as I feel 15 is reasonable time to wait and then try again than being too aggressive.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.33%. Comparing base (56faeae) to head (0549dab).
Report is 14 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #472   +/-   ##
=======================================
  Coverage   65.33%   65.33%           
=======================================
  Files          80       80           
  Lines        4192     4192           
=======================================
  Hits         2739     2739           
  Misses       1243     1243           
  Partials      210      210           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AshleyDumaine
Copy link
Member

See also #473 but I'm fine with merging this and pulling in the change 👍

@rahulait rahulait merged commit 86d3a25 into main Aug 21, 2024
13 checks passed
@eljohnson92 eljohnson92 changed the title check provisioning -> running every 15 secs instead of 5 [fix]: check provisioning -> running every 15 secs instead of 5 Aug 23, 2024
@eljohnson92 eljohnson92 deleted the reduce-api-calls branch September 20, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants