
Commit

Merge pull request #94 from EPCCed/ttl_update
AG: updated to include page for service policies
akrause2014 authored Jul 21, 2023
2 parents a0711f4 + 93e876d commit c633359
Showing 3 changed files with 26 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/services/gpuservice/index.md
@@ -38,6 +38,10 @@ A standard project namespace has the following initial quota (subject to ongoing

Note that these quotas are the maximum use by a single project, and that during periods of high usage Kubernetes Jobs may be queued waiting for resources to become available on the cluster.

## Additional Service Policy Information

Additional information on service policies can be found [here](policies.md).

## EIDF GPU Service Tutorial

This tutorial teaches users how to submit tasks to the EIDFGPUS, but it is not a comprehensive overview of Kubernetes.
21 changes: 21 additions & 0 deletions docs/services/gpuservice/policies.md
@@ -0,0 +1,21 @@
# GPU Service Policies

## Namespaces

Each project will be given a namespace, to which a resource quota is applied.

Default Quota:

- CPU: 100 Cores
- Memory: 1TiB
- GPU: 12
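
As a rough illustration of how the default quota limits what a project can run at once, the sketch below checks whether a new request fits alongside what is already in use. It is only arithmetic on the figures listed above; the dictionary keys and the `fits_quota` helper are hypothetical names chosen for this example, and the cluster itself enforces the real quota.

```python
# Default per-namespace quota from the list above (1 TiB = 1024 GiB).
# The key names are illustrative and not part of the service.
DEFAULT_QUOTA = {"cpu": 100, "memory_gib": 1024, "gpu": 12}


def fits_quota(requested: dict, in_use: dict, quota: dict = DEFAULT_QUOTA) -> bool:
    """Return True if current usage plus the new request stays within every limit."""
    return all(
        in_use.get(key, 0) + requested.get(key, 0) <= limit
        for key, limit in quota.items()
    )


# Hypothetical example: 11 GPUs are already in use and a new job asks for 2 more.
print(fits_quota({"cpu": 8, "memory_gib": 64, "gpu": 2}, {"gpu": 11}))
```

A request that does not fit is not necessarily an error; the affected work would simply have to wait until enough of the project's quota is free.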

## Kubeconfig

Each project will be assigned a kubeconfig file for access to the service. The kubeconfig allows operation in the assigned namespace and access to exposed service operators, for example the GPU and CephRBD operators.

## Kubernetes Job Time to Live

All Kubernetes Jobs submitted to the service will automatically have a Time to Live (TTL) applied via `spec.ttlSecondsAfterFinished`. The default TTL for jobs using the service will be 1 week (604800 seconds). A finished job (whether it succeeded or failed) will be deleted from the service once one week has elapsed after execution has completed. This will reduce excessive object accumulation on the service.

Note: This policy is automated and does not require users to change their job specifications.
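
To make the TTL arithmetic concrete, the following minimal sketch works out when a finished job becomes eligible for deletion under the default TTL. The `deletion_time` helper and the completion timestamp are hypothetical and used only for illustration; the service applies the TTL itself.

```python
from datetime import datetime, timedelta, timezone

# Default TTL stated by the policy: 1 week, expressed in seconds.
DEFAULT_TTL_SECONDS = 604800


def deletion_time(finished_at: datetime, ttl_seconds: int = DEFAULT_TTL_SECONDS) -> datetime:
    """Return the earliest time a finished Job becomes eligible for deletion."""
    return finished_at + timedelta(seconds=ttl_seconds)


# Hypothetical completion time, for illustration only.
finished = datetime(2023, 7, 21, 12, 0, tzinfo=timezone.utc)
print(deletion_time(finished).isoformat())  # 2023-07-28T12:00:00+00:00
```

Any logs or results that need to outlive the job should therefore be copied elsewhere within a week of the job finishing.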
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -56,6 +56,7 @@ nav:
      - "Running codes": services/ultra2/run.md
    - "GPU Service":
      - "Overview": services/gpuservice/index.md
      - "Policies": services/gpuservice/policies.md
      - "Tutorial":
        - "Getting Started": services/gpuservice/training/L1_getting_started.md
        - "Persistent Volumes": services/gpuservice/training/L2_requesting_persistent_volumes.md