better RTT and lower cloud costs by preferencing lower latency members #15918

Open
alam0rt opened this issue May 18, 2023 · 5 comments

Comments

@alam0rt

alam0rt commented May 18, 2023

What would you like to be added?

Problem statement

etcd clients (such as kube-apiserver) will use round robin to select a member to connect to:

// Build returns itself for Resolver, because it's both a builder and a resolver.
func (r *EtcdManualResolver) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
r.serviceConfig = cc.ParseServiceConfig(`{"loadBalancingPolicy": "round_robin"}`)

This load balancing configuration has a downside in an HA cloud environment where cross-zone traffic is metered: the apiserver may connect to a member in another zone, which in turn replicates to members in yet other zones.

The other built-in load balancing configurations are not suitable either. For example, pick-first connects to each member serially and, as the name implies, picks the first one that connects. Using it here would require ordering the endpoints by proximity, which could be done if the order of the --etcd-servers flag passed to kube-apiserver were retained (will need to test); a rough sketch of such ordering follows.
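To make the ordering idea concrete (this is my sketch, not something etcd provides): one could sort the endpoint list by a quick latency probe before handing it to the client, so that a pick-first policy prefers the closest member. The sortByLatency helper and the example endpoints below are hypothetical, and a plain TCP dial time is only a rough stand-in for real request latency.

package main

import (
	"fmt"
	"net"
	"net/url"
	"sort"
	"time"
)

// sortByLatency probes each endpoint with a TCP dial, records the dial time as
// a rough latency estimate, and returns the endpoints ordered fastest-first so
// that a pick-first policy would prefer the closest member.
func sortByLatency(endpoints []string) []string {
	type probe struct {
		ep  string
		rtt time.Duration
	}
	probes := make([]probe, 0, len(endpoints))
	for _, ep := range endpoints {
		rtt := time.Hour // unreachable or unparsable members sort to the back
		if u, err := url.Parse(ep); err == nil {
			start := time.Now()
			if conn, err := net.DialTimeout("tcp", u.Host, time.Second); err == nil {
				rtt = time.Since(start)
				conn.Close()
			}
		}
		probes = append(probes, probe{ep, rtt})
	}
	sort.Slice(probes, func(i, j int) bool { return probes[i].rtt < probes[j].rtt })
	out := make([]string, 0, len(probes))
	for _, p := range probes {
		out = append(out, p.ep)
	}
	return out
}

func main() {
	// Hypothetical endpoints; the sorted slice would then be fed to the client
	// (or to --etcd-servers) in latency order.
	fmt.Println(sortByLatency([]string{
		"https://etcd-zone-a.example:2379",
		"https://etcd-zone-b.example:2379",
		"https://etcd-zone-c.example:2379",
	}))
}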

Why is this needed?

I believe a new load balancing configuration that prioritises members with the lowest latency would be a sensible default option for etcd.

  • This should see a decrease in RTT (manually configuring our test environment this way reduced client -> etcd cluster RTT from 1.5ms in the worst case to about 550us)

[image: RTT between each zone]
This screenshot shows a graph of RTT between each zone as I trialed the pick-first configuration. You can see the 9 lines (one for each pair of zones) drop to 3 (as there are only 3 zones and traffic is no longer leaving the zone), plus the large reduction in RTT from the apiserver to etcd.

  • This should reduce cloud costs for etcd users
    • consider 1 TB of traffic per day sent from the apiservers in a 3-member setup (1 member and 1 client in each zone)
    • in this environment, each GB of cross-zone traffic costs $0.10
    • round robin would send roughly 66% of that traffic cross-zone and thus incur a cost
    • a latency-aware lb configuration (presuming same zone == less latency 100% of the time) would save those users about $66 per day in operating costs

Note this is just some rough napkin maths and I could be way off; regardless, I believe this feature would be beneficial.
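Spelling the napkin maths out under those assumptions: with round robin, roughly 2/3 of the 1 TB (about 660 GB) crosses zones each day, and 660 GB × $0.10/GB ≈ $66 per day, essentially all of which a zone-local policy would avoid.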

@alam0rt
Author

alam0rt commented May 18, 2023

Alright, so I forked etcd and switched over the default client lb config to use pick-first (alam0rt@a9fd304)

This works, and the ordering specified in --etcd-servers is respected, but now the issue is that with a 5-member etcd cluster (the zones have 2, 2, and 1 members), pick-first obviously prioritises the first listed member, so within a two-member zone one member gets a bit too much love from the corresponding apiserver while the other is left underused.

So yeah, I feel a new lb configuration is required; we can't rely on pick-first plus changes to the ordering of members.

@serathius
Member

I would prefer to delegate the load balancing algorithm implementation to gRPC, as we are already planning to migrate to it: #15145

@alam0rt
Author

alam0rt commented May 18, 2023

Ah nice, are you referring to this? https://github.com/grpc/grpc/blob/master/doc/grpc_xds_features.md

@serathius
Member

serathius commented May 19, 2023

Ah nice, are you referring to this? https://github.com/grpc/grpc/blob/master/doc/grpc_xds_features.md

No, I was just saying that in the best case the etcd project should not implement load balancing itself, just provide a sane default and allow users to configure gRPC themselves (pass flags via options).
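As an illustration of what "configure gRPC themselves" could look like today (my sketch, not an etcd-provided knob): clientv3.Config already accepts raw gRPC dial options, so a user can in principle tell gRPC to ignore the resolver-provided round_robin service config and supply their own default, e.g. pick_first. The endpoints below are hypothetical, and whether this interacts cleanly with the etcd client's resolver would need testing.

package main

import (
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		// Hypothetical endpoints, listed closest-first.
		Endpoints:   []string{"https://etcd-local.example:2379", "https://etcd-remote.example:2379"},
		DialTimeout: 5 * time.Second,
		DialOptions: []grpc.DialOption{
			// Ignore the round_robin service config pushed by the etcd resolver...
			grpc.WithDisableServiceConfig(),
			// ...and fall back to a default config that uses pick_first instead.
			grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"pick_first":{}}]}`),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	// cli now talks to the first reachable endpoint rather than round-robining.
}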

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 17, 2023