From 0c7fd8193e263f6db6ece7e383b4efc85f817d5f Mon Sep 17 00:00:00 2001
From: Tuomas Katila
Date: Tue, 18 Apr 2023 14:07:25 +0300
Subject: [PATCH] gpu: add notes about gpu-plugin modes

Fixes: #1381

Signed-off-by: Tuomas Katila
---
 cmd/gpu_plugin/README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/cmd/gpu_plugin/README.md b/cmd/gpu_plugin/README.md
index 46c495302..fe11bd507 100644
--- a/cmd/gpu_plugin/README.md
+++ b/cmd/gpu_plugin/README.md
@@ -20,6 +20,7 @@ Table of Contents
 * [Fractional resources details](#fractional-resources-details)
 * [Verify Plugin Registration](#verify-plugin-registration)
 * [Testing and Demos](#testing-and-demos)
+* [Use Cases for Different Modes](#use-cases-for-different-modes)
 * [Issues with media workloads on multi-GPU setups](#issues-with-media-workloads-on-multi-gpu-setups)
     * [Workaround for QSV and VA-API](#workaround-for-qsv-and-va-api)

@@ -48,7 +49,7 @@ backend libraries can offload compute operations to GPU.
 | -enable-monitoring | - | disabled | Enable 'i915_monitoring' resource that provides access to all Intel GPU devices on the node |
 | -resource-manager | - | disabled | Enable fractional resource management, [see also dependencies](#fractional-resources) |
 | -shared-dev-num | int | 1 | Number of containers that can share the same GPU device |
-| -allocation-policy | string | none | 3 possible values: balanced, packed, none. It is meaningful when shared-dev-num > 1, balanced mode is suitable for workload balance among GPU devices, packed mode is suitable for making full use of each GPU device, none mode is the default. Allocation policy does not have effect when resource manager is enabled. |
+| -allocation-policy | string | none | 3 possible values: balanced, packed, none. It is meaningful only when shared-dev-num > 1: balanced mode spreads workloads evenly among GPU devices, packed mode fills each GPU device before using the next, and none selects the first available device from kubelet. None is the default. The allocation policy has no effect when the resource manager is enabled. |

 The plugin also accepts a number of other arguments (common to all plugins) related to logging.
 Please use the -h option to see the complete list of logging related options.
@@ -315,6 +316,15 @@ The GPU plugin functionality can be verified by deploying an [OpenCL image](../.
   Warning  FailedScheduling  default-scheduler  0/1 nodes are available: 1 Insufficient gpu.intel.com/i915.
   ```

+## Use Cases for Different Modes
+
+The Intel GPU plugin supports a few different operation modes. Depending on the workloads the cluster runs, some modes suit them better than others. The table below explains the differences between the modes and suggests workload types for each. Choose the mode with care, as the selection is cluster-wide.
+
+| Mode | Sharing | Workload examples | Time critical |
+|:---- |:-------- |:------- |:------- |
+| shared-dev-num == 1 | No, 1 container per GPU | AI/ML training, or an application that can fully utilize a GPU, e.g. by packing multiple workloads into it. | Yes |
+| shared-dev-num > 1 | Yes, >1 containers per GPU | Inference, media transcode, media analytics. Workloads that only require part of the GPU. See also [allocation policies](#modes-and-configuration-options). | No |
+| shared-dev-num > 1 && resource-management | Yes and no, >=1 containers per GPU | All workloads, but they must request GPU resources. GPUs can be dedicated or shared based on GPU resources such as memory and millicores. Requires [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling). See also [fractional use](#fractional-resources-details). | Depends on the Pod spec |

 ## Issues with media workloads on multi-GPU setups
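As a usage sketch for the modes described in this patch, a workload requests a GPU slot through the plugin's extended resource name `gpu.intel.com/i915`; with shared-dev-num > 1, containers making such requests may be placed on the same physical GPU according to the allocation policy. The Pod name and image below are illustrative assumptions, not part of the plugin's documentation:

```yaml
# Illustrative Pod spec (names and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  restartPolicy: Never
  containers:
  - name: transcode
    image: example.com/media-transcode:latest  # placeholder image
    resources:
      limits:
        # One GPU slot; with shared-dev-num > 1 other containers
        # may share the same physical GPU.
        gpu.intel.com/i915: 1
```

In resource-management mode the Pod would additionally request GAS-managed resources (e.g. millicores or memory) so the scheduler can decide whether the GPU is dedicated or shared.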