From 93e876dc02a16ae22d0cb52d9c7c0ea459dc5a10 Mon Sep 17 00:00:00 2001
From: agrant3
Date: Wed, 12 Jul 2023 09:38:56 +0100
Subject: [PATCH] AG: updated to include page for service policies (TTL, kubeconfig, namespaces)

---
 docs/services/gpuservice/index.md    |  4 ++++
 docs/services/gpuservice/policies.md | 21 +++++++++++++++++++++
 mkdocs.yml                           |  1 +
 3 files changed, 26 insertions(+)
 create mode 100644 docs/services/gpuservice/policies.md

diff --git a/docs/services/gpuservice/index.md b/docs/services/gpuservice/index.md
index f3020d43c..b44e7b7b4 100644
--- a/docs/services/gpuservice/index.md
+++ b/docs/services/gpuservice/index.md
@@ -38,6 +38,10 @@ A standard project namespace has the following initial quota (subject to ongoing
 
 Note these quotas are maximum use by a single project, and that during periods of high usage Kubernetes Jobs maybe queued waiting for resource to become available on the cluster.
 
+## Additional Service Policy Information
+
+Additional information on service policies can be found [here](policies.md).
+
 ## EIDF GPU Service Tutorial
 
 This tutorial teaches users how to submit tasks to the EIDFGPUS, but it is not a comprehensive overview of Kubernetes.
diff --git a/docs/services/gpuservice/policies.md b/docs/services/gpuservice/policies.md
new file mode 100644
index 000000000..650c2c911
--- /dev/null
+++ b/docs/services/gpuservice/policies.md
@@ -0,0 +1,21 @@
+# GPU Service Policies
+
+## Namespaces
+
+Each project will be given a namespace which will have an applied quota.
+
+Default Quota:
+
+- CPU: 100 Cores
+- Memory: 1TiB
+- GPU: 12
+
+## Kubeconfig
+
+Each project will be assigned a kubeconfig file for access to the service which will allow operation in the assigned namespace and access to exposed service operators, for example the GPU and CephRBD operators.
+
+## Kubernetes Job Time to Live
+
+All Kubernetes Jobs submitted to the service will have a Time to Live (TTL) applied via "spec.ttlSecondsAfterFinished" automatically. The default TTL for jobs using the service will be 1 week (604800 seconds). A completed job (in success or error state) will be deleted from the service once one week has elapsed after execution has completed. This will reduce excessive object accumulation on the service.
+
+Note: This policy is automated and does not require users to change their job specifications.
diff --git a/mkdocs.yml b/mkdocs.yml
index 6149520bd..ae3b69c00 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -56,6 +56,7 @@ nav:
       - "Running codes": services/ultra2/run.md
   - "GPU Service":
       - "Overview": services/gpuservice/index.md
+      - "Policies": services/gpuservice/policies.md
       - "Tutorial":
           - "Getting Started": services/gpuservice/training/L1_getting_started.md
           - "Persistent Volumes": services/gpuservice/training/L2_requesting_persistent_volumes.md