docs: add docs for jcloud horizontal pod autoscale (#5957)
Co-authored-by: winstonww <[email protected]>
winstonww and winstonww authored Jul 20, 2023
1 parent 6f8c10b commit 2912320
Showing 1 changed file with 19 additions and 7 deletions.
26 changes: 19 additions & 7 deletions docs/concepts/jcloud/configuration.md
@@ -226,17 +226,29 @@ executors:

Below are the defaults and requirements for the configurations:

| Name | Default | Allowed | Description |
| ------ | ----------- | ------------------------ | ------------------------------------------------- |
| min | 1 | int | Minimum number of replicas (`0` means serverless) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | `concurrency` / `rps` | Metric for scaling |
| target | 100 | int | Target number the replicas try to maintain |
| Name | Default | Allowed | Description |
| ------ | ----------- | ------------------------ | ------------------------------------------------- |
| min | 1 | int | Minimum number of replicas (`0` means serverless) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | `concurrency` / `rps` / `cpu` / `memory` | Metric for scaling |
| scale_down_delay | 30s | str, `0s` <= value <= `1h` | Time window that must pass at reduced concurrency before scaling down |
| target | 100 | int | Target number the replicas try to maintain. |

The unit of `target` depends on the metric specified. Refer to the table below:

| Metric | Target |
| ---- | ----- |
| `concurrency` | Number of concurrent requests processed at any given time. |
| `rps` | Number of requests processed per second per replica. |
| `cpu` | Average % CPU utilization of each pod<br>(e.g. `60` means replicas will be scaled up when pods on average reach 60% CPU utilization) |
| `memory` | Average mebibytes of memory used by each pod<br>(e.g. `200` means replicas will be scaled up when the average pods' memory consumption exceeds 200MiB). |

After you make a JCloud deployment using the autoscaling configuration, serving the Flow works just the same. The only difference you may notice is that the initial requests take a few extra seconds, since JCloud needs to scale the deployments behind the scenes. Let JCloud handle the scaling from now on, and you can deal with the code!

Note that if `metric` is `cpu` or `memory`, `min` is reset to `1` when the user sets it to `0`.
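The defaults and constraints above can be summarized in a short sketch. This is only an illustration of the documented rules, not JCloud's actual validation code: the names `normalize_autoscale` and `delay_to_seconds` are hypothetical, and the sketch assumes that `scale_down_delay` is written as an integer followed by `s`, `m`, or `h`, and that `min` must not exceed `max`.

```python
import re

# Defaults and limits taken from the configuration table above.
DEFAULTS = {
    "min": 1,
    "max": 2,
    "metric": "concurrency",
    "scale_down_delay": "30s",
    "target": 100,
}
ALLOWED_METRICS = {"concurrency", "rps", "cpu", "memory"}
MAX_REPLICAS = 5
MAX_DELAY_SECONDS = 3600  # upper bound of `1h`

# Assumption: durations use a single unit suffix (seconds, minutes, hours).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def delay_to_seconds(value: str) -> int:
    """Convert a duration string such as '30s' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def normalize_autoscale(user_config: dict) -> dict:
    """Fill in defaults and apply the documented constraints (hypothetical helper)."""
    config = {**DEFAULTS, **user_config}

    if config["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {config['metric']!r}")
    if config["max"] > MAX_REPLICAS:
        raise ValueError(f"max must be at most {MAX_REPLICAS}")
    if delay_to_seconds(config["scale_down_delay"]) > MAX_DELAY_SECONDS:
        raise ValueError("scale_down_delay must be between 0s and 1h")

    # Serverless (min = 0) is not available for cpu/memory metrics:
    # min is reset to 1 in that case.
    if config["metric"] in {"cpu", "memory"} and config["min"] == 0:
        config["min"] = 1

    # Assumption: min must be non-negative and no larger than max.
    if not 0 <= config["min"] <= config["max"]:
        raise ValueError("min must satisfy 0 <= min <= max")

    return config
```

For example, `normalize_autoscale({"metric": "cpu", "min": 0, "target": 60})` yields a configuration with `min` set to `1`, while the same call with `metric: rps` keeps `min` at `0`.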

### Pricing
At present, pricing for autoscaled Executor/Gateway follows the same {ref}`JCloud pricing rules <jcloud-pricing>` for the most part.
At present, pricing for autoscaled Executor/Gateway largely follows the same {ref}`JCloud pricing rules <jcloud-pricing>` as other Jina AI services.
We track the minimum number of replicas in autoscale configurations and use it as a multiplier for the replicas used when calculating the
`Credits Per Hour`.
