Add memory evict doc #140

Merged 1 commit on Aug 2, 2023
130 changes: 130 additions & 0 deletions docs/user-manuals/memory-evict.md
@@ -0,0 +1,130 @@
# Eviction Strategy Based on Memory Usage

## Introduction

Koordinator supports dynamic overcommitment of idle node resources to low-priority
Pods of Batch priority. In co-location scenarios, the actual memory usage of
nodes changes constantly. For incompressible resources such as memory, high resource
usage of a node may cause OOM, which results in high-priority Pods getting killed. Koordinator
provides an eviction strategy based on node memory usage. `Koordlet` continuously
detects the memory usage of the node (Total - Available) at second-level granularity.
When the memory usage of the node is high, it evicts low-priority BE Pods to
ensure the QoS of high-priority Pods until the memory usage of the node drops below the
threshold (evictThreshold). During the eviction process, Pods with lower priority (Pod.Spec.Priority)
are selected first; if the priority is the same, Pods that consume more memory are
evicted first.


![image](/img/memory-evict.svg)
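The selection order described above can be sketched as follows. This is an illustrative model only, not Koordlet's actual code; the `priority` and `memory_usage` fields are hypothetical stand-ins for `Pod.Spec.Priority` and the pod's measured memory usage:

```python
# Illustrative sketch of the eviction ordering described above (not Koordlet's code).
# Pods with lower Pod.Spec.Priority are evicted first; ties are broken by
# evicting the pod with higher memory usage first.
from typing import NamedTuple

class PodInfo(NamedTuple):
    name: str
    priority: int       # stand-in for Pod.Spec.Priority
    memory_usage: int   # measured memory usage in bytes (hypothetical field)

def eviction_order(pods: list[PodInfo]) -> list[PodInfo]:
    # Sort ascending by priority, then descending by memory usage.
    return sorted(pods, key=lambda p: (p.priority, -p.memory_usage))

pods = [
    PodInfo("be-a", priority=0, memory_usage=2 << 30),
    PodInfo("be-b", priority=0, memory_usage=4 << 30),
    PodInfo("be-c", priority=100, memory_usage=8 << 30),
]
print([p.name for p in eviction_order(pods)])  # → ['be-b', 'be-a', 'be-c']
```

Here `be-b` goes first because it has the lowest priority and, among equal-priority pods, the larger memory footprint.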

### Prerequisite
Please make sure Koordinator components are correctly installed in your cluster. If not, please refer to
[Installation](/docs/installation).

| Component | Version Requirement |
| --- | ------- |
| Kubernetes | ≥v1.18 |
| koordinator | ≥v0.3.0 |

The eviction strategy is provided by `Koordlet` and is disabled by default by a feature gate.
Please make sure `BEMemoryEvict=true` has been added to the `-feature-gates` argument of `Koordlet`,
as in the [example](https://github.com/koordinator-sh/charts/blob/main/versions/v1.2.0/templates/koordlet.yaml#L36).

## Use Eviction Strategy Based on Memory Usage

1. Create a configmap.yaml file based on the following ConfigMap content:
```yaml
# ConfigMap slo-controller-config example.
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-controller-config # name should match the configuration of koord-manager, e.g. ack-slo-config
  namespace: koordinator-system # namespace should match the installation, e.g. kube-system
data:
  # enable the eviction strategy based on memory usage
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": true,
        "memoryEvictThresholdPercent": 70
      }
    }
```

| Configuration item | Parameter | Valid values | Description |
| :-------------- | :------ | :-------- | :----------------------------------------------------------- |
| `enable` | Boolean | true; false | `true`: enable the eviction; `false` (default): disable the eviction. |
| `memoryEvictThresholdPercent` | Int | 0~100 | eviction threshold percent of node memory usage, default is 70. |
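As a rough sketch of how the threshold is interpreted (an assumption based on the description above, not Koordlet's implementation), node usage is computed as (Total - Available) / Total and compared against `memoryEvictThresholdPercent`:

```python
# Sketch (assumed semantics): decide whether node memory usage crosses
# memoryEvictThresholdPercent. Usage is (Total - Available), per the docs above.
def should_evict(total_bytes: int, available_bytes: int, threshold_percent: int = 70) -> bool:
    usage_percent = (total_bytes - available_bytes) / total_bytes * 100
    return usage_percent >= threshold_percent

total = 64 << 30  # a 64 GiB node
print(should_evict(total, 16 << 30))  # 75% usage >= 70% → True
print(should_evict(total, 32 << 30))  # 50% usage < 70%  → False
```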

2. Check whether a ConfigMap named `slo-controller-config` exists in the `koordinator-system` namespace.

- If a ConfigMap named `slo-controller-config` exists, we recommend that you run the kubectl patch command to update the ConfigMap. This avoids changing other settings in the ConfigMap.

```bash
kubectl patch cm -n koordinator-system slo-controller-config --patch "$(cat configmap.yaml)"
```

- If no ConfigMap named `slo-controller-config` exists, run the kubectl apply command to create it:

```bash
kubectl apply -f configmap.yaml
```

3. Create a file named be-pod-demo.yaml based on the following YAML content:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: be-pod-demo
  labels:
    koordinator.sh/qosClass: 'BE' # set Pod QoS as BE
spec:
  containers:
    - args:
        - '-c'
        - '1'
        - '--vm'
        - '1'
      command:
        - stress
      image: polinux/stress
      imagePullPolicy: Always
      name: stress
  restartPolicy: Always
  schedulerName: default-scheduler
```

4. Run the following command to deploy the be-pod-demo pod in the cluster:

```bash
kubectl apply -f be-pod-demo.yaml
```

5. Run the following command to check whether the be-pod-demo pod is in the Running state:

```bash
$ kubectl get pod be-pod-demo
NAME READY STATUS RESTARTS AGE
be-pod-demo 1/1 Running 0 7s
```
6. Run the following command on the node with the [stress tool](https://linux.die.net/man/1/stress) to
make sure the memory usage of the node rises above the configured threshold. The argument `--vm-bytes`
means the process will consume 10 GB of memory; adjust this according to the node capacity.

```bash
$ stress --cpu 1 --vm 1 --vm-bytes 10G --vm-keep
```
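If you need to size `--vm-bytes` for a different node, the extra memory required to cross the threshold can be estimated with a back-of-the-envelope calculation (the 10G above assumes a particular node; this sketch is ours, not part of Koordinator):

```python
# Sketch: estimate how much extra memory the stress process must allocate
# so that node usage exceeds memoryEvictThresholdPercent.
import math

def extra_bytes_needed(total_bytes: int, used_bytes: int, threshold_percent: int = 70) -> int:
    target = math.ceil(total_bytes * threshold_percent / 100)
    return max(0, target - used_bytes)

# Example: a 32 GiB node currently using 12 GiB, with the default 70% threshold.
extra = extra_bytes_needed(32 << 30, 12 << 30)
print(extra / (1 << 30))  # GiB stress should allocate, ~10.4
```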

7. Check the running state of be-pod-demo; you will find that the be-pod-demo pod no longer exists,
and the eviction information can be found in the events.

```bash
$ kubectl get pod be-pod-demo
Error from server (NotFound): pods "be-pod-demo" not found

$ kubectl get event
LAST SEEN TYPE REASON OBJECT MESSAGE
46s Normal Killing pod/be-pod-demo Stopping container stress
48s Warning evictPodSuccess $you-pod-object evict Pod:be-pod-demo, reason: EvictPodByNodeMemoryUsage, message: killAndEvictBEPods for node(${your-node-id}), need to release memory: 8077889699
```
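By this description, the `need to release memory` figure in the event is the gap between current node usage and the threshold. A hedged sketch of that reading (our interpretation of the event message, not the exact formula in Koordlet):

```python
# Sketch (assumed semantics): memory that must be released to fall back below
# the threshold, mirroring the "need to release memory" figure in the event.
def bytes_to_release(total_bytes: int, used_bytes: int, threshold_percent: int = 70) -> int:
    threshold_bytes = total_bytes * threshold_percent // 100
    return max(0, used_bytes - threshold_bytes)

total = 32 << 30  # hypothetical 32 GiB node
used = 30 << 30   # hypothetical usage well above the 70% threshold
print(bytes_to_release(total, used))  # → 8160437863
```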
@@ -0,0 +1,120 @@
# Eviction Strategy Based on Memory Usage

## Introduction

Koordinator supports dynamically overselling idle node resources to low-priority Pods. In co-location scenarios, the actual memory usage of a node changes constantly. For incompressible resources such as memory,
high node usage may trigger a node-wide OOM and cause the processes of high-priority Pods to be killed. To prevent this, Koordinator provides an eviction strategy based on per-node memory usage.
The node agent Koordlet continuously detects node memory usage (Total - Available) at second-level granularity. When node memory usage is high, it evicts low-priority BE Pods to
guarantee the QoS of high-priority Pods. During eviction, Pods with a lower priority (Pod.Spec.Priority) are selected first; among Pods of equal priority,
those using more memory are evicted first, until node memory usage drops below the configured safety threshold (evictThreshold).

![image](/img/memory-evict.svg)

## Prerequisite
Please make sure Koordinator is correctly installed in your cluster. If not, please refer to [Installation](https://koordinator.sh/docs/installation). The version requirements are as follows:

| Component | Version Requirement |
| --- | ------- |
| Kubernetes | ≥v1.18 |
| koordinator | ≥v0.3.0 |

This feature is provided by the node agent Koordlet. The corresponding feature gate is disabled by default; before use, make sure `BEMemoryEvict=true` has been added to the `-feature-gates` startup argument of koordlet.
See the [example](https://github.com/koordinator-sh/charts/blob/main/versions/v1.2.0/templates/koordlet.yaml#L36) for details.

## Procedure

1. Create a configmap.yaml file based on the following ConfigMap content:
```yaml
# Example of the slo-controller-config ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-controller-config # use the name actually configured for koord-manager, e.g. ack-slo-config
  namespace: koordinator-system # use the namespace of the actual installation, e.g. kube-system
data:
  # enable the eviction strategy based on memory usage
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": true,
        "memoryEvictThresholdPercent": 70
      }
    }
```

| Parameter | Type | Valid values | Description |
| :-------------- | :------ | :-------- | :----------------------------------------------------------- |
| `enable` | Boolean | true; false | `true`: enable the node memory eviction strategy cluster-wide; `false` (default): disable it cluster-wide. |
| `memoryEvictThresholdPercent` | Int | 0~100 | Node memory usage percent that triggers eviction; default is 70. |

2. Check whether the ConfigMap exists in the installation namespace, taking the namespace `koordinator-system` and the ConfigMap name `slo-controller-config` as an example; the actual installation configuration prevails.

- If a ConfigMap named `slo-controller-config` exists, update it with a PATCH to avoid interfering with other configuration items in the ConfigMap.

```bash
kubectl patch cm -n koordinator-system slo-controller-config --patch "$(cat configmap.yaml)"
```

- If no ConfigMap named `slo-controller-config` exists, run the following command to create the ConfigMap.

```bash
kubectl apply -f configmap.yaml
```

3. Create a file named be-pod-demo.yaml based on the following YAML content.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: be-pod-demo
  labels:
    koordinator.sh/qosClass: 'BE' # set the Pod QoS class to BE
spec:
  containers:
    - args:
        - '-c'
        - '1'
        - '--vm'
        - '1'
      command:
        - stress
      image: polinux/stress
      imagePullPolicy: Always
      name: stress
  restartPolicy: Always
  schedulerName: default-scheduler
```

4. Run the following command to deploy be-pod-demo to the cluster.

```bash
$ kubectl apply -f be-pod-demo.yaml
```

5. Run the following command to check the status of be-pod-demo and wait until the Pod is up and running.

```bash
$ kubectl get pod be-pod-demo
NAME READY STATUS RESTARTS AGE
be-pod-demo 1/1 Running 0 7s
```

6. Run the following command on the node to start a process with the [stress tool](https://linux.die.net/man/1/stress),
making sure node memory usage rises above the eviction threshold. The `--vm-bytes` argument means the stress process allocates 10 GB of memory; adjust it according to the actual machine type when testing.

```bash
$ stress --cpu 1 --vm 1 --vm-bytes 10G --vm-keep
```

7. Observe the state of be-pod-demo; you will find that it no longer exists, and the eviction information can be found in the events.

```bash
$ kubectl get pod be-pod-demo
Error from server (NotFound): pods "be-pod-demo" not found

$ kubectl get event
LAST SEEN TYPE REASON OBJECT MESSAGE
46s Normal Killing pod/be-pod-demo Stopping container stress
48s Warning evictPodSuccess $you-pod-object evict Pod:be-pod-demo, reason: EvictPodByNodeMemoryUsage, message: killAndEvictBEPods for node(${your-node-id}), need to release memory: 8077889699
```
1 change: 1 addition & 0 deletions sidebars.js
@@ -53,6 +53,7 @@ const sidebars = {
'user-manuals/memory-qos',
'user-manuals/performance-collector',
'user-manuals/cpu-qos',
'user-manuals/memory-evict',
],
},
{