
Speeding up scaling operations #58

Open
skuda opened this issue Oct 19, 2017 · 4 comments

skuda commented Oct 19, 2017

Hi,

This is not a bug, sorry I didn't find a better way to communicate this!

I have been using the autoscaler and it's working great, but it's somewhat slow: for us, adding new nodes takes approximately 10 minutes.

Maybe our use case is a bit special, but we usually have very light load that sometimes spikes very fast; the specific service I am speaking about acts as a precomputed cache.
If the cache is full, hits are very cheap; if the cache is purged, something that happens 2 or 3 times per week, the load skyrockets for about 2 to 3 hours.

I understand that creating the nodes, installing everything, and adding them to the cluster takes time, but I have been thinking: why not keep a specific number of pre-configured nodes around, only deallocated? It would be much faster to just bring existing servers back online than to destroy them and recreate them from scratch every time, no?

Best,
Miguel.
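To make the idea concrete, here is a minimal, hypothetical sketch (not part of the autoscaler) of what such a warm pool could look like. The `start_vm` and `deallocate_vm` callbacks are placeholders for whatever provider calls would actually be used (e.g. wrappers around `az vm start` / `az vm deallocate`); the names are illustrative only.

```python
class WarmPool:
    """Hypothetical warm pool: scale by starting/stopping
    pre-configured VMs instead of creating/deleting them."""

    def __init__(self, node_names, start_vm, deallocate_vm):
        self.offline = set(node_names)  # deallocated: no compute billing
        self.online = set()
        self._start = start_vm          # e.g. wraps `az vm start`
        self._stop = deallocate_vm      # e.g. wraps `az vm deallocate`

    def scale_up(self, count):
        """Bring up to `count` deallocated nodes online; return their names."""
        started = []
        for _ in range(min(count, len(self.offline))):
            name = self.offline.pop()
            self._start(name)  # starting is much faster than provisioning
            self.online.add(name)
            started.append(name)
        return started

    def scale_down(self, count):
        """Deallocate up to `count` online nodes; return their names."""
        stopped = []
        for _ in range(min(count, len(self.online))):
            name = self.online.pop()
            self._stop(name)
            self.offline.add(name)
            stopped.append(name)
        return stopped
```

The complexity wbuchwalter mentions below is real, though: the scaler would also have to keep the deallocated nodes' kubelet configuration and certificates valid while they are offline.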

wbuchwalter (Owner) commented Oct 20, 2017

Hi @skuda,

This is the best place to discuss this :)
Keeping deallocated nodes is not a bad idea, but I think it would be quite complex to implement correctly.
What would be an acceptable scaling time in your case?


oryagel commented Oct 20, 2017

We would like to see something like that as well, to reduce the starvation time. I was thinking of a different approach: just keep extra nodes alive. For example, if I configure extra-nodes=2, the autoscaler will always keep extra cores and memory matching the resources of two nodes.
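That headroom approach could be sketched as a simple target calculation (again hypothetical, not something the autoscaler does today): compute the nodes needed for current demand, then always add `extra_nodes` on top, clamped to the configured pool maximum. The function and parameter names below are made up for illustration.

```python
import math

def target_node_count(pending_cores, cores_per_node, extra_nodes, max_nodes):
    """Nodes needed for current demand, plus a fixed buffer of spare nodes.

    pending_cores:  total CPU cores requested by running + pending pods
    cores_per_node: usable cores on one agent node
    extra_nodes:    spare capacity to keep alive (e.g. extra-nodes=2)
    max_nodes:      hard cap on the agent pool size
    """
    needed = math.ceil(pending_cores / cores_per_node)
    return min(needed + extra_nodes, max_nodes)
```

For example, with 9 requested cores on 4-core nodes and extra-nodes=2, the target would be ceil(9/4) + 2 = 5 nodes, so a sudden spike can land on the two idle nodes while replacements are provisioned.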

skuda (Author) commented Oct 20, 2017

@wbuchwalter For me, 2 or even 3 minutes would be fast enough.

alexquintero commented
I'm not sure this is the fault of the autoscaler itself, but rather a function of how long it takes Azure to spin up a VM for your cluster. In my general tests it takes anywhere from 7 to 13 minutes to get a new VM in an Availability Set. This is in westus, by the way. I wonder if VM spin-up time differs by region?

I agree with @skuda that a shorter spin-up time of a few minutes would be ideal. I personally don't think this is possible without using VM Scale Sets. Maybe when those are supported by acs-engine we will get the shorter spin-up time.

Or... once we are able to use a stable ACI connector, or some other way of having a virtual kubelet with infinite capacity (serverless containers), there shouldn't be any VM spin-up time at all.
