This page shows you how to run a Job in a Kubernetes cluster with Kueue enabled.
The intended audience for this page are batch users.
Make sure the following conditions are met:
- A Kubernetes cluster is running.
- The kubectl command-line tool has communication with your cluster.
- Kueue is installed.
- The cluster has quotas configured.
Run the following command to list the Queues available in your namespace.
kubectl -n default get localqueues
# Or use the 'queues' alias.
kubectl -n default get queues
The output is similar to the following:
NAME CLUSTERQUEUE PENDING WORKLOADS
main cluster-total 3
The ClusterQueue defines the quotas for the Queue.
Running a Job in Kueue is similar to running a Job in a Kubernetes cluster without Kueue. However, you must consider the following differences:
- You should create the Job in a suspended state, as Kueue will decide when it's the best time to start the Job.
- You have to set the Queue you want to submit the Job to. Use the
kueue.x-k8s.io/queue-name
annotation. - You should include the resource requests for each Job Pod.
Here is a sample Job with three Pods that just sleep for a few seconds. This sample is also available in config/samples/sample-job.yaml.
# sample-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
generateName: sample-job-
annotations:
kueue.x-k8s.io/queue-name: main
spec:
parallelism: 3
completions: 3
suspend: true
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:latest
args: ["30s"]
resources:
requests:
cpu: 1
memory: "200Mi"
restartPolicy: Never
You can run the Job with the following command:
kubectl create -f sample-job.yaml
Internally, Kueue will create a corresponding Workload for this Job with a matching name.
kubectl -n default get workloads
The output will be similar to the following:
NAME QUEUE ADMITTED BY AGE
sample-job-sl4bm main 1s
You can see the workload status with the following command:
kubectl -n default describe workload sample-job-sl4bm
If the ClusterQueue doesn't have enough quota to run the workload, the output will be similar to the following:
Name: sample-job-jtz5f
Namespace: default
Labels: <none>
Annotations: <none>
API Version: kueue.x-k8s.io/v1alpha1
Kind: Workload
Metadata:
...
Spec:
...
Status:
Conditions:
Last Probe Time: 2022-03-28T19:43:03Z
Last Transition Time: 2022-03-28T19:43:03Z
Message: workload didn't fit
Reason: Pending
Status: False
Type: Admitted
Events: <none>
When the ClusterQueue has enough quota to run the workload, it will admit the workload. To see if the workload was admitted, run the following command:
kubectl -n default get workloads
The output is similar to the following:
NAME QUEUE ADMITTED BY AGE
sample-job-sl4bm main cluster-total 45s
To view the event for the workload admission, run the following command:
kubectl -n default describe workload sample-job-sl4bm
The output is similar to the following:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Admitted 50s kueue-manager Admitted by ClusterQueue cluster-total
To continue monitoring the workload progress, you can run the following command:
kubectl -n default describe workload sample-job-sl4bm
Once the workload has finished running, the output is similar to the following:
...
Status:
Conditions:
...
Last Probe Time: 2022-03-28T19:43:37Z
Last Transition Time: 2022-03-28T19:43:37Z
Message: Job finished successfully
Reason: JobFinished
Status: True
Type: Finished
...
To review more details about the Job status, run the following command:
kubectl -n default describe job sample-job-sl4bm
The output is similar to the following:
Name: sample-job-sl4bm
Namespace: default
...
Start Time: Mon, 28 Mar 2022 15:45:17 -0400
Completed At: Mon, 28 Mar 2022 15:45:49 -0400
Duration: 32s
Pods Statuses: 0 Active / 3 Succeeded / 0 Failed
Pod Template:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Suspended 22m job-controller Job suspended
Normal CreatedWorkload 22m kueue-job-controller Created Workload: default/sample-job-rxb6q
Normal SuccessfulCreate 19m job-controller Created pod: sample-job-rxb6q-7bqld
Normal Started 19m kueue-job-controller Admitted by clusterQueue cluster-total
Normal SuccessfulCreate 19m job-controller Created pod: sample-job-rxb6q-7jw4z
Normal SuccessfulCreate 19m job-controller Created pod: sample-job-rxb6q-m7wgm
Normal Resumed 19m job-controller Job resumed
Normal Completed 18m job-controller Job completed
Since events have a timestamp with a resolution of seconds, the events might be listed in a slightly different order from which they actually occurred.