docs: add docs for jcloud horizontal pod autoscale (#5957)

Co-authored-by: winstonww <[email protected]>
jina-ai · Jul 20, 2023 · 2912320
1 parent 6f8c10b
Showing 1 changed file with 19 additions and 7 deletions.
diff --git a/docs/concepts/jcloud/configuration.md b/docs/concepts/jcloud/configuration.md
@@ -226,17 +226,29 @@ executors:
 
 Below are the defaults and requirements for the configurations:
 
-| Name   | Default     | Allowed                  | Description                                       |
-| ------ | ----------- | ------------------------ | ------------------------------------------------- |
-| min    | 1           | int                      | Minimum number of replicas (`0` means serverless) |
-| max    | 2           | int, up to 5             | Maximum number of replicas                        |
-| metric | concurrency | `concurrency`  /   `rps` | Metric for scaling                                |
-| target | 100         | int                      | Target number the replicas try to maintain        |
+| Name             | Default     | Allowed                                     | Description                                                               |
+| ------           | ----------- | ------------------------                    | -------------------------------------------------                         |
+| min              | 1           | int                                         | Minimum number of replicas (`0` means serverless)                         |
+| max              | 2           | int, up to 5                                | Maximum number of replicas                                                |
+| metric           | concurrency | `concurrency` / `rps` / `cpu` / `memory`    | Metric for scaling                                                        |
+| scale_down_delay | 30s         | str, `0s` <= value <= `1h`                  | Time window that must pass at reduced concurrency before scaling down     |
+| target           | 100         | int                                         | Target number the replicas try to maintain                                |
+
+The unit of `target` depends on the metric specified. Refer to the table below:
+
+| Metric        | Target                                                                                                                                                  |
+| ----          | -----                                                                                                                                                   |
+| `concurrency` | Number of concurrent requests processed at any given time.                                                                                              |
+| `rps`         | Number of requests processed per second per replica.                                                                                                    |
+| `cpu`         | Average % CPU utilization of each pod<br>(e.g. `60` means replicas will be scaled up when pods on average reach 60% CPU utilization).                   |
+| `memory`      | Average mebibytes of memory used by each pod<br>(e.g. `200` means replicas will be scaled up when the pods' average memory consumption exceeds 200MiB). |
 
 After you make a JCloud deployment using the autoscaling configuration, the Flow serving part is just the same: the only difference you may notice is it takes a few extra seconds to handle the initial requests since it needs to scale the deployments behind the scenes. Let JCloud handle the scaling from now on, and you can deal with the code!
 
+Note that if `metric` is `cpu` or `memory`, `min` is reset to 1 if the user sets it to 0.
+
 ### Pricing
-At present, pricing for autoscaled Executor/Gateway follows the same {ref}`JCloud pricing rules <jcloud-pricing>` for the most part.
+At present, pricing for autoscaled Executor/Gateway largely follows the same {ref}`JCloud pricing rules <jcloud-pricing>` as other Jina AI services.
 We track the minimum number of replicas in autoscale configurations and use it as a multiplier for the replicas used when calculating the
 `Credits Per Hour`.
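
The defaults and limits that this change documents for the autoscale settings can be restated as a small validation sketch. It is only an illustration of the rules in the tables above, not JCloud's actual implementation: `normalize_autoscale` and `delay_to_seconds` are hypothetical helper names, and both the `min <= max` check and the `s`/`m`/`h` duration suffixes are assumptions that the documentation does not spell out.

```python
import re

# Values copied from the documented table; everything else is illustrative.
ALLOWED_METRICS = {"concurrency", "rps", "cpu", "memory"}
DEFAULTS = {
    "min": 1,
    "max": 2,
    "metric": "concurrency",
    "scale_down_delay": "30s",
    "target": 100,
}
MAX_REPLICAS = 5          # "int, up to 5"
MAX_DELAY_SECONDS = 3600  # upper bound `1h`

# Assumed duration suffixes; the docs only show `0s`, `30s` and `1h`.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def delay_to_seconds(value: str) -> int:
    """Parse a simple duration such as '30s' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def normalize_autoscale(config: dict) -> dict:
    """Apply the documented defaults and limits to an autoscale mapping."""
    merged = {**DEFAULTS, **config}

    if merged["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {merged['metric']!r}")
    if merged["max"] > MAX_REPLICAS:
        raise ValueError(f"max must be at most {MAX_REPLICAS}")
    # Assumption: min must be non-negative and not exceed max.
    if not 0 <= merged["min"] <= merged["max"]:
        raise ValueError("min must satisfy 0 <= min <= max")
    if not 0 <= delay_to_seconds(merged["scale_down_delay"]) <= MAX_DELAY_SECONDS:
        raise ValueError("scale_down_delay must be between 0s and 1h")

    # Documented rule: cpu/memory scaling cannot be serverless (min = 0).
    if merged["metric"] in {"cpu", "memory"} and merged["min"] == 0:
        merged["min"] = 1
    return merged
```

For example, `normalize_autoscale({"metric": "cpu", "min": 0})` yields a configuration with `min` set to 1, whereas the same call with the default `concurrency` metric keeps `min` at 0.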