Automatically taint nodes and evict pods based on cpu pressure

rtreffer/kubernetes-pressurecooker

 
 

Kubernetes Pressure Cooker

Automatically taint nodes and evict pods under high CPU pressure. Derived from kubernetes-loadwatcher.

The load average describes the average length of the run queue whenever a scheduling decision is made, but it does not tell us how often processes were waiting for CPU time. The kernel pressure metrics (PSI, pressure stall information, contributed by Facebook) describe how often there was not enough CPU available.

Synopsis

A Kubernetes node can be overcommitted on CPU: the processes running on it may want more CPU than was requested for them. This can easily happen due to variable resource usage per pod, variance in hardware, or variance in pod distribution. By default, Kubernetes will not evict Pods from a node based on CPU usage, since CPU is considered a compressible resource. However, if a node does not have enough CPU resources to serve all pods, it will impose additional latencies that can be undesirable depending on the workload (e.g. web/interactive traffic).

This project contains a small Kubernetes controller that watches each node's CPU pressure; when a certain threshold is exceeded, the node will be tainted (so that no additional workloads are scheduled on an already-overloaded node) and finally the controller will start to evict Pods from the node.
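The two-stage reaction described here can be sketched as a pure function. This is a deliberate simplification: it compares a single pressure value against both thresholds and ignores which averaging window feeds which check. `Decision` and `decide` are hypothetical names, not part of this project's code.

```go
package main

import "fmt"

// Decision says what the controller should do for one pressure reading.
// (Hypothetical type, for illustration only.)
type Decision struct {
	Taint bool // keep new workloads away from the node
	Evict bool // additionally start evicting pods
}

// decide compares one pressure reading against the two thresholds.
// It assumes evictThreshold >= taintThreshold, so that eviction never
// happens on a node that would not also be tainted.
func decide(pressure, taintThreshold, evictThreshold float64) Decision {
	return Decision{
		Taint: pressure > taintThreshold,
		Evict: pressure > evictThreshold,
	}
}

func main() {
	for _, p := range []float64{5, 30, 60} {
		fmt.Printf("pressure=%v -> %+v\n", p, decide(p, 25, 50))
	}
}
```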

Pressure is more sensitive than the load average to small overloads. For example, with pressure information it is easy to express a rule such as "there is up to a 20% chance of not getting CPU immediately when needed".

How it works

This controller can be started with two threshold flags: -taint-threshold and -evict-threshold. There are also safeguard flags -min-pod-age and -eviction-backoff. The controller will continuously monitor a node's CPU pressure.

  • If the CPU pressure (5min average) exceeds the taint threshold, the node will be tainted with a pressurecooker/load-exceeded taint with the PreferNoSchedule effect. This will instruct Kubernetes to not schedule any additional workloads on this node if at all possible.

  • If the CPU load (both 5min and 15min average) falls back below the taint threshold, the taint will be removed again.

  • If the CPU load (15 min average) exceeds the eviction threshold, the controller will pick a suitable Pod running on the node and evict it. However, the following types of Pods will not be evicted:

    • Pods with the Guaranteed QoS class
    • Pods belonging to Stateful Sets
    • Pods belonging to Daemon Sets
    • Standalone pods not managed by any kind of controller
    • Pods running in the kube-system namespace or with a critical priorityClassName
    • Pods newer than min-pod-age

After a Pod has been evicted, the next Pod will be evicted only after a configurable eviction backoff (controllable using the evict-backoff argument), and only if the load15 is still above the eviction threshold.

Older pods will be evicted first. The rationale for removing old pods first is that it is usually better to move well-behaved pods away from bad neighbors than to move bad neighbors through the cluster. And since a node is assumed to start out in a healthy state, older pods are less likely to be the cause of an overload.
