operator: improve readme structure
Fixes: #1132

Co-authored-by: Eero Tamminen <[email protected]>
Signed-off-by: Tuomas Katila <[email protected]>
tkatila and eero-t committed Apr 24, 2023
1 parent 2a36526 commit 4a4a0e5
Showing 1 changed file with 66 additions and 50 deletions.
116 changes: 66 additions & 50 deletions cmd/operator/README.md
@@ -5,6 +5,7 @@ Table of Contents
* [Introduction](#introduction)
* [Installation](#installation)
* [Upgrade](#upgrade)
* [Limiting Supported Devices](#limiting-supported-devices)
* [Known issues](#known-issues)

## Introduction
@@ -16,6 +17,12 @@ administrators.

## Installation

The default operator deployment depends on NFD and cert-manager. Both components must be installed in the cluster before the operator can be deployed.

> **Note**: The operator can also be installed via Helm charts. See [INSTALL.md](../../INSTALL.md) for details.
### NFD

Install NFD (if it's not already installed) and node labelling rules (requires NFD v0.10+):

```
@@ -38,7 +45,7 @@ nfd-worker-qqq4h 1/1 Running 0 25h
Note that labelling is not performed immediately. Give NFD 1 minute to pick up the rules and label nodes.

As a result, all detected devices should have corresponding labels, e.g. for Intel DLB devices the label is
intel.feature.node.kubernetes.io/dlb:
`intel.feature.node.kubernetes.io/dlb`:
```
$ kubectl get no -o json | jq '.items[].metadata.labels' | grep intel.feature.node.kubernetes.io/dlb
"intel.feature.node.kubernetes.io/dlb": "true",
@@ -55,6 +62,8 @@ deployments/operator/samples/deviceplugin_v1_fpgadeviceplugin.yaml: intel.fea
deployments/operator/samples/deviceplugin_v1_dsadeviceplugin.yaml: intel.feature.node.kubernetes.io/dsa: 'true'
```

### Cert-Manager

The default operator deployment depends on [cert-manager](https://cert-manager.io/) running in the cluster.
See installation instructions [here](https://cert-manager.io/docs/installation/kubectl/).

@@ -68,45 +77,7 @@ cert-manager-cainjector-87c85c6ff-59sb5 1/1 Running 0 21d
cert-manager-webhook-64dc9fff44-29cfc 1/1 Running 0 21d
```

Also if your cluster operates behind a corporate proxy make sure that the API
server is configured not to send requests to cluster services through the
proxy. You can check that with the following command:

```bash
$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc"
```

In case there's no output and your cluster was deployed with `kubeadm` open
`/etc/kubernetes/manifests/kube-apiserver.yaml` at the control plane nodes and
append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.237.71.99
    ...
    env:
    - name: http_proxy
      value: http://proxy.host:8080
    - name: https_proxy
      value: http://proxy.host:8433
    - name: no_proxy
      value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local
    ...
```

**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning,
set the cluster service names to `$no_proxy` before `kubeadm init`:

```
$ export no_proxy=$no_proxy,.svc,.svc.cluster.local
```
### Device Plugin Operator

Finally deploy the operator itself:

@@ -117,7 +88,7 @@ $ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes
Now you can deploy the device plugins by creating corresponding custom resources.
The samples for them are available [here](/deployments/operator/samples/).

## Usage
### Device Plugin Custom Resource

Deploy your device plugin by applying its custom resource, e.g.
`GpuDevicePlugin` with
@@ -134,8 +105,22 @@ NAME DESIRED READY NODE SELECTOR AGE
gpudeviceplugin-sample 1 1 5s
```

## Upgrade

The upgrade of the deployed plugins can be done by simply installing a new release of the operator.

The operator auto-upgrades operator-managed plugins (CR images and thus corresponding deployed daemonsets) to the current release of the operator.

During upgrade the tag in the image path is updated (e.g. docker.io/intel/intel-sgx-plugin:tag), but the rest of the path is left intact.

No upgrade is done for:
- Non-operator managed deployments
- Operator deployments without numeric tags
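
The tag-only update described above can be illustrated with plain shell parameter expansion. This is a minimal sketch of the effect on an image path, not the operator's actual implementation; the tags `0.26.0` and `0.27.0` are hypothetical values chosen for illustration.

```shell
# Hypothetical current plugin image and hypothetical new operator release tag.
image="docker.io/intel/intel-sgx-plugin:0.26.0"
new_tag="0.27.0"

# Strip the shortest trailing ":<tag>" and append the new tag.
# The registry, namespace and image name are left untouched.
upgraded="${image%:*}:${new_tag}"

echo "$upgraded"
```

The sketch assumes the image path always carries a tag; a path without one would need separate handling.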

## Limiting Supported Devices

In order to limit the deployment to a specific device type,
use one of kustomizations under deployments/operator/device.
use one of the kustomizations under `deployments/operator/device`.

For example, to limit the deployment to FPGA, use:

@@ -148,20 +133,51 @@ In this case, create a new kustomization with the necessary resources
that passes the desired device types to the operator using `--device`
command line argument multiple times.

## Upgrade
## Known issues

The upgrade of the deployed plugins can be done by simply installing a new release of the operator.
### Cluster behind a proxy

The operator auto-upgrades operator-managed plugins (CR images and thus corresponding deployed daemonsets) to the current release of the operator.
If your cluster operates behind a corporate proxy, make sure that the API
server is configured not to send requests to cluster services through the
proxy. You can check that with the following command:

The [registry-url]/[namespace]/[image] are kept intact on the upgrade.
```bash
$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc"
```

No upgrade is done for:
If there is no output and your cluster was deployed with `kubeadm`, open
`/etc/kubernetes/manifests/kube-apiserver.yaml` on the control plane nodes and
append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable:

- Non-operator managed deployments
- Operator deployments without numeric tags
```yaml
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.237.71.99
    ...
    env:
    - name: http_proxy
      value: http://proxy.host:8080
    - name: https_proxy
      value: http://proxy.host:8433
    - name: no_proxy
      value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local
    ...
```

## Known issues
**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning,
append the cluster service domains to `$no_proxy` before running `kubeadm init`:

```
$ export no_proxy=$no_proxy,.svc,.svc.cluster.local
```
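
The `no_proxy` check can also be scripted once the value has been extracted into a string. The following is a minimal sketch under that assumption; the helper name `no_proxy_has` is hypothetical, and the sample value is the one from the manifest above.

```shell
# Return success if the comma-separated list in $1 contains the exact entry $2.
no_proxy_has() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample value copied from the kube-apiserver manifest example.
value="127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local"

# Report whether each required cluster service suffix is present.
for entry in .svc .svc.cluster.local; do
  if no_proxy_has "$value" "$entry"; then
    echo "ok: $entry"
  else
    echo "missing: $entry"
  fi
done
```

Wrapping the value in commas before matching ensures that `.svc` is only reported when it appears as its own entry, not merely as a prefix of `.svc.cluster.local`.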

### Leader election enabled

When the operator is run with leader election enabled, that is with the option
`--leader-elect`, make sure the cluster is not overloaded with excessive