# Upgrade EKS and worker nodes

AWS EKS doesn't update the cluster automatically.

Subscribe to the Amazon Linux AMI Security Bulletin to be notified when security patches are released for the worker AMI.

## Security Patch for Worker AMI

1. Check whether a new EKS AMI is available after an ALAS2 alert.
2. If needed, increase the worker count via builder (unless autoscaling is in place).
3. Manually drain and terminate each node that uses the old AMI.
4. Check in the EC2 console that the workers are using the new AMI (or query it from the CLI, as sketched below).
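For step 4, a minimal sketch of checking AMIs from the CLI, assuming the workers carry the standard `kubernetes.io/cluster/<name>` ownership tag (`my-cluster` is a placeholder):

```bash
# List worker instances with their AMI ID, to spot nodes still on the old image.
# "my-cluster" is a placeholder; adjust the tag filter to match your workers.
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/my-cluster,Values=owned" \
  --query 'Reservations[].Instances[].[InstanceId,ImageId,State.Name]' \
  --output table
```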

To manually drain and terminate the nodes:

```bash
kubectl get nodes                               # list the nodes and spot those on the old AMI
kubectl cordon my-node                          # no new Pods will be scheduled here
kubectl drain --ignore-daemonsets my-node       # existing Pods will be evicted and rescheduled elsewhere
aws ec2 terminate-instances --instance-ids=...  # terminate the node; a replacement will be created
```

`kubectl drain` will complain if pods are using local data storage or if evicting a pod would violate a PodDisruptionBudget. You can force the eviction using `--delete-local-data` (renamed `--delete-emptydir-data` in recent kubectl versions) and `--disable-eviction` respectively. Check which pods are complaining before doing this and make sure that forcing the eviction wouldn't break production services.
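Before reaching for those flags, it's worth seeing what would block the drain. A minimal sketch (`my-node` is a placeholder):

```bash
# See which PodDisruptionBudgets exist and how much disruption they allow.
kubectl get pdb --all-namespaces

# List the pods running on the node you're about to drain.
kubectl get pods --all-namespaces --field-selector spec.nodeName=my-node
```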

Copied from builder docs.

## k8s version upgrade

1. Check the AWS docs for availability and release notes.
2. Use silver-surfer/kubedd to check for API deprecations: `kubedd --target-kubernetes-version=1.22` (example for a 1.22 upgrade). `DEPRECATED` is okay, but if an API is `DELETED` in the new k8s version you will have to fix the affected charts.
3. Bump the k8s version (one minor version at a time) in `elife.yaml`.
4. Apply using builder: `bldr update_infrastructure:kubernetes-aws--flux-prod`. This should update the EKS cluster (i.e. the k8s control plane) and the AutoScalingGroup AMI image.
5. If flux fails to access the API after the EKS upgrade, try restarting it with `kubectl -n flux rollout restart deployment flux`.
6. Upgrade kube-proxy (see the AWS docs and the sketch after this list).
7. Drain and terminate node by node, as described above, to upgrade the workers.
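For step 6, a minimal sketch of bumping a self-managed kube-proxy; the account ID, region and image tag below are illustrative placeholders, so look up the values recommended for your target version in the AWS docs:

```bash
# Check the currently deployed kube-proxy image.
kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Point the daemonset at the image recommended for the target k8s version.
# Account ID, region and tag here are illustrative placeholders.
kubectl -n kube-system set image daemonset/kube-proxy \
  kube-proxy=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.22.6-eksbuild.1
```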

Changing API versions in a chart can lead to helm complaining about a conflict with an existing resource. This appears to be an issue with helm3 that helm-operator is aware of but can't fix until helm3 fixes it upstream. To work around it, delete the affected resource (e.g. the Deployment, DaemonSet or StatefulSet) with `kubectl`; it should automatically be recreated with the new API version. This will cause brief downtime.
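For example (the resource name and namespace are placeholders):

```bash
# Delete the conflicting resource; helm-operator should recreate it with the
# new apiVersion on its next sync. Expect brief downtime for this workload.
kubectl -n my-namespace delete deployment my-app
```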

## Further documentation

- https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
- https://github.com/elifesciences/builder/blob/master/docs/eks.md#ami-update