[fix]: check provisioning -> running every 15 secs instead of 5 #472

rahulait · 2024-08-21T12:36:40Z

What this PR does / why we need it:
It looks like we are too aggressive in checking whether the instance has moved from provisioning -> running state. When an instance is created, it comes up, gets its disks created and resized, boots up and finally becomes running. This operation itself takes 1 to 2 mins based on the size of instance and other factors. Currently during reconcile, we are checking if instance has moved from provisioning to running state every 5 secs. This means for each instance, we are making close to 12 to 20 GET calls. Here is a sample output for one instance:

// provisioning new instance
2024-08-20T13:24:08.426931642Z GET  /v4beta/linode/instances?page=1  HTTP/1.1
2024-08-20T13:24:08.516412267Z GET  /v4beta/regions/us-ord  HTTP/1.1
2024-08-20T13:24:08.596356587Z GET  /v4beta/images/linode%2Fubuntu22.04  HTTP/1.1
2024-08-20T13:24:08.701246429Z GET  /v4beta/vpcs/88353  HTTP/1.1
2024-08-20T13:24:09.245072948Z POST  /v4beta/linode/instances  HTTP/1.1
2024-08-20T13:24:09.370996586Z POST  /v4beta/linode/instances/62849600/boot  HTTP/1.1
2024-08-20T13:24:09.499083864Z GET  /v4beta/linode/instances/62849600/ips  HTTP/1.1
2024-08-20T13:24:09.629499882Z GET  /v4beta/linode/instances/62849600/configs?page=1  HTTP/1.1
2024-08-20T13:24:09.779294293Z GET  /v4beta/linode/instances/62849600/ips  HTTP/1.1
2024-08-20T13:24:09.920384585Z POST  /v4beta/nodebalancers/854217/configs/1397979/nodes  HTTP/1.1
2024-08-20T13:24:10.202103378Z GET  /v4beta/linode/instances?page=1  HTTP/1.1

//checking if instance is running/provisioned
2024-08-20T13:24:10.431260943Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:10.601453685Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:13.599866494Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:13.767120059Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:15.619029327Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:20.828628645Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:25.999815763Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:31.163618605Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:36.330660557Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:41.555155689Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:46.738508917Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:51.905828192Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:24:57.102032464Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:02.306054006Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:07.463763577Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:12.675023274Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:12.861765692Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:17.979051026Z GET  /v4beta/linode/instances/62849600  HTTP/1.1

// Instance in running state
2024-08-20T13:25:23.146056450Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:23.375264105Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:25:23.685719943Z GET  /v4beta/linode/instances/62849600  HTTP/1.1
2024-08-20T13:27:58.322861280Z GET  /v4beta/linode/instances/62849600  HTTP/1.1

Based on the above logs, we can see that we are too aggressive in checking whether the instance has provisioned and moved to running or not. Instead of trying every 5 seconds, we can try every 15 seconds and that will cut down the number of GET api calls we make during instance creation to 1/3 of the existing number. I am suggesting 15 instead of 10 as I feel 15 is reasonable time to wait and then try again than being too aggressive.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

squashed commits
includes documentation
adds unit tests
adds or updates e2e tests

codecov · 2024-08-21T12:41:28Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.33%. Comparing base (56faeae) to head (0549dab).
Report is 14 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #472   +/-   ##
=======================================
  Coverage   65.33%   65.33%           
=======================================
  Files          80       80           
  Lines        4192     4192           
=======================================
  Hits         2739     2739           
  Misses       1243     1243           
  Partials      210      210

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

AshleyDumaine · 2024-08-21T15:01:04Z

See also #473 but I'm fine with merging this and pulling in the change 👍

check provisioning -> running every 15 secs instead of 5

17a9805

Merge branch 'main' into reduce-api-calls

0549dab

AshleyDumaine approved these changes Aug 21, 2024

View reviewed changes

rahulait merged commit 86d3a25 into main Aug 21, 2024
13 checks passed

eljohnson92 added the bugfix label Aug 23, 2024

eljohnson92 changed the title ~~check provisioning -> running every 15 secs instead of 5~~ [fix]: check provisioning -> running every 15 secs instead of 5 Aug 23, 2024

eljohnson92 deleted the reduce-api-calls branch September 20, 2024 20:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix]: check provisioning -> running every 15 secs instead of 5 #472

[fix]: check provisioning -> running every 15 secs instead of 5 #472

rahulait commented Aug 21, 2024

codecov bot commented Aug 21, 2024 •

edited

Loading

AshleyDumaine commented Aug 21, 2024

[fix]: check provisioning -> running every 15 secs instead of 5 #472

[fix]: check provisioning -> running every 15 secs instead of 5 #472

Conversation

rahulait commented Aug 21, 2024

codecov bot commented Aug 21, 2024 • edited Loading

Codecov Report

AshleyDumaine commented Aug 21, 2024

codecov bot commented Aug 21, 2024 •

edited

Loading