Make svclb as simple as possible
Signed-off-by: manuelbuil <[email protected]>
manuelbuil committed Sep 27, 2024
1 parent a0c6f32 commit eb3bc9d
Showing 2 changed files with 14 additions and 81 deletions.
@@ -1,4 +1,4 @@
# Remove svclb daemonset
# Remove svclb klipper-lb

Date: 2024-09-26

@@ -77,10 +77,14 @@ KUBE-SVC-CVG3OEGEH7H5P3HQ all -- 0.0.0.0/0 0.0.0.0/0

As a consequence, the traffic never reaches the svclb daemonset pod. This can be demonstrated by running tcpdump inside the svclb daemonset pod, where no traffic will appear, or by tracing the iptables flow, which shows the traffic following the path described above.

Therefore, if we replace the logic that the serviceLB controller uses to find node IPs with something that does not require the svclb daemonset, we could get rid of that daemonset, since traffic never reaches it. That replacement should be easy: a daemonset ultimately means all nodes, so we could simply query kube-api for the IPs of all nodes.
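
For illustration only (not part of this commit), here is a minimal client-go sketch of what "query kube-api for the IPs of all nodes" could look like; the function name and the address-type filtering are assumptions, and error handling is trimmed:

```go
package main

import (
	"context"
	"fmt"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// nodeIPs lists all nodes through kube-api and collects their addresses,
// instead of deriving them from the svclb daemonset pods. (Hypothetical helper.)
func nodeIPs(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, meta.ListOptions{})
	if err != nil {
		return nil, err
	}
	var ips []string
	for _, node := range nodes.Items {
		for _, addr := range node.Status.Addresses {
			// Keep internal and external node addresses; other types are ignored here.
			if addr.Type == core.NodeInternalIP || addr.Type == core.NodeExternalIP {
				ips = append(ips, addr.Address)
			}
		}
	}
	return ips, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	ips, err := nodeIPs(context.Background(), kubernetes.NewForConfigOrDie(cfg))
	if err != nil {
		panic(err)
	}
	fmt.Println(ips)
}
```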

*/ ON-GOING IMPLEMENTATION */
## Solutions

One obvious solution is to get rid of the daemonset completely. However, that solution would not detect whether a process is already using the port on a node (either another k8s service or a server running on the node), because kube-proxy does not check for this.

One simpler solution is to replace klipper-lb with a tiny image that includes only the `sleep` binary and to remove all the extra permissions required to run klipper-lb. In this case, the daemonset is used to reserve the port: if a node is already using that port, its IP is not included as an external IP. Moreover, the daemonset pod on that node will clearly warn the user that it cannot run because the port is already reserved, making it easy to understand why that node's IP was filtered out.
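
As a non-authoritative sketch of that idea (the helper names below are invented, not taken from the codebase): a node whose svclb pod cannot bind the hostPort stays Pending, so its IP is simply dropped from the external IPs.

```go
package main

import (
	"fmt"

	core "k8s.io/api/core/v1"
)

// nodesWithRunningSvclbPod returns the nodes whose svclb pod is actually Running.
// A pod that cannot bind its hostPort stays Pending, so its node is excluded.
func nodesWithRunningSvclbPod(pods []core.Pod) map[string]bool {
	running := map[string]bool{}
	for _, pod := range pods {
		if pod.Status.Phase == core.PodRunning {
			running[pod.Spec.NodeName] = true
		}
	}
	return running
}

// filterExternalIPs keeps only the IPs of nodes that could reserve the port.
func filterExternalIPs(nodeIPs map[string]string, running map[string]bool) []string {
	var ips []string
	for name, ip := range nodeIPs {
		if running[name] {
			ips = append(ips, ip)
		}
	}
	return ips
}

func main() {
	pods := []core.Pod{
		{Spec: core.PodSpec{NodeName: "node-a"}, Status: core.PodStatus{Phase: core.PodRunning}},
		// node-b already has something listening on the port, so its pod never starts.
		{Spec: core.PodSpec{NodeName: "node-b"}, Status: core.PodStatus{Phase: core.PodPending}},
	}
	ips := filterExternalIPs(
		map[string]string{"node-a": "10.0.0.1", "node-b": "10.0.0.2"},
		nodesWithRunningSvclbPod(pods),
	)
	fmt.Println(ips) // [10.0.0.1]
}
```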

This commit implements the simpler solution, using bci-busybox as the new image for the daemonset.

## Decision

@@ -89,10 +93,8 @@ Therefore, if we replace the logic to find the node IPs of the serviceLB control
## Consequences

### Positives
* Less resource consumption as we won't need one daemonset per LoadBalancer type of service
* One fewer repo to maintain (klipper-lb)
* Easier to understand flow of traffic

### Negatives
* Possible confusion for users who have been using this feature for a long time ("Where is the daemonset?") or who rely on that daemonset for their automation
* In today's solution, if two LoadBalancer type services use the same port, it is easy to notice that things don't work: the second daemonset will not deploy because the port is already in use by the first one. Kube-proxy does not check whether two services use the same port and will create both sets of rules without any error; the service whose rules end up higher in the chain is the one reached when querying $nodeIP:$port. Perhaps we could add logic to the controller that warns users about a duplicated ip&port pair (see the sketch after this list)
* None that I can think of
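
A hypothetical sketch of such a warning (not part of this commit; the function name is invented): flag LoadBalancer services that expose the same protocol and port, since kube-proxy will program rules for both without complaint.

```go
package main

import (
	"fmt"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// warnDuplicatePorts prints a warning when two LoadBalancer services expose
// the same protocol/port pair on the node IPs.
func warnDuplicatePorts(svcs []core.Service) {
	seen := map[string]string{} // e.g. "TCP/80" -> namespace/name of the first service using it
	for _, svc := range svcs {
		if svc.Spec.Type != core.ServiceTypeLoadBalancer {
			continue
		}
		name := svc.Namespace + "/" + svc.Name
		for _, p := range svc.Spec.Ports {
			key := fmt.Sprintf("%s/%d", p.Protocol, p.Port)
			if first, ok := seen[key]; ok {
				fmt.Printf("warning: %s and %s both expose %s on the node IPs\n", first, name, key)
				continue
			}
			seen[key] = name
		}
	}
}

func main() {
	svcs := []core.Service{
		{
			ObjectMeta: meta.ObjectMeta{Namespace: "default", Name: "web"},
			Spec:       core.ServiceSpec{Type: core.ServiceTypeLoadBalancer, Ports: []core.ServicePort{{Protocol: core.ProtocolTCP, Port: 80}}},
		},
		{
			ObjectMeta: meta.ObjectMeta{Namespace: "default", Name: "web-copy"},
			Spec:       core.ServiceSpec{Type: core.ServiceTypeLoadBalancer, Ports: []core.ServicePort{{Protocol: core.ProtocolTCP, Port: 80}}},
		},
	}
	warnDuplicatePorts(svcs)
}
```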
81 changes: 6 additions & 75 deletions pkg/cloudprovider/servicelb.go
@@ -2,12 +2,12 @@ package cloudprovider

import (
"context"
"encoding/json"
"fmt"
"sort"
"strconv"
"strings"
"time"
"encoding/json"

"sigs.k8s.io/yaml"

"github.com/k3s-io/k3s/pkg/util"
@@ -55,7 +55,7 @@ const (
)

var (
DefaultLBImage = "rancher/klipper-lb:v0.4.9"
DefaultLBImage = "registry.suse.com/bci/bci-busybox:latest"
)

func (k *k3s) Register(ctx context.Context,
@@ -434,28 +434,8 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
name := generateName(svc)
oneInt := intstr.FromInt(1)
priorityClassName := k.getPriorityClassName(svc)
localTraffic := servicehelper.RequestsOnlyLocalTraffic(svc)
sourceRangesSet, err := servicehelper.GetLoadBalancerSourceRanges(svc)
if err != nil {
return nil, err
}
sourceRanges := strings.Join(sourceRangesSet.StringSlice(), ",")
securityContext := &core.PodSecurityContext{}

for _, ipFamily := range svc.Spec.IPFamilies {
switch ipFamily {
case core.IPv4Protocol:
securityContext.Sysctls = append(securityContext.Sysctls, core.Sysctl{Name: "net.ipv4.ip_forward", Value: "1"})
case core.IPv6Protocol:
securityContext.Sysctls = append(securityContext.Sysctls, core.Sysctl{Name: "net.ipv6.conf.all.forwarding", Value: "1"})
if sourceRanges == "0.0.0.0/0" {
// The upstream default load-balancer source range only includes IPv4, even if the service is IPv6-only or dual-stack.
// If using the default range, and IPv6 is enabled, also allow IPv6.
sourceRanges += ",::/0"
}
}
}

ds := &apps.DaemonSet{
ObjectMeta: meta.ObjectMeta{
Name: name,
@@ -522,6 +502,7 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
Name: portName,
Image: k.LBImage,
ImagePullPolicy: core.PullIfNotPresent,
Command: []string{"sleep", "inf"},
Ports: []core.ContainerPort{
{
Name: portName,
@@ -530,57 +511,7 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
Protocol: port.Protocol,
},
},
Env: []core.EnvVar{
{
Name: "SRC_PORT",
Value: strconv.Itoa(int(port.Port)),
},
{
Name: "SRC_RANGES",
Value: sourceRanges,
},
{
Name: "DEST_PROTO",
Value: string(port.Protocol),
},
},
SecurityContext: &core.SecurityContext{
Capabilities: &core.Capabilities{
Add: []core.Capability{
"NET_ADMIN",
},
},
},
}

if localTraffic {
container.Env = append(container.Env,
core.EnvVar{
Name: "DEST_PORT",
Value: strconv.Itoa(int(port.NodePort)),
},
core.EnvVar{
Name: "DEST_IPS",
ValueFrom: &core.EnvVarSource{
FieldRef: &core.ObjectFieldSelector{
FieldPath: getHostIPsFieldPath(),
},
},
},
)
} else {
container.Env = append(container.Env,
core.EnvVar{
Name: "DEST_PORT",
Value: strconv.Itoa(int(port.Port)),
},
core.EnvVar{
Name: "DEST_IPS",
Value: strings.Join(svc.Spec.ClusterIPs, ","),
},
)
}

ds.Spec.Template.Spec.Containers = append(ds.Spec.Template.Spec.Containers, container)
}

@@ -710,8 +641,8 @@ func (k *k3s) getPriorityClassName(svc *core.Service) string {
return k.LBDefaultPriorityClassName
}

// getTolerations retrieves the tolerations from a service's annotations.
// It parses the tolerations from a JSON or YAML string stored in the annotations.
// getTolerations retrieves the tolerations from a service's annotations.
// It parses the tolerations from a JSON or YAML string stored in the annotations.
func (k *k3s) getTolerations(svc *core.Service) ([]core.Toleration, error) {
tolerationsStr, ok := svc.Annotations[tolerationsAnnotation]
if !ok {
