[CLI] bake run.yaml file inside docker container #122

Draft · wants to merge 5 commits into main
Conversation

@yanxi0830 (Contributor) commented Sep 26, 2024

Changes

  • Motivation: We should not need to install the llama CLI and run llama stack configure / llama stack run outside of docker containers. Downloading the docker image should be sufficient to start the Llama Stack server.

  • [RFC] Only use docker commands when running with a docker container. Proposed new developer flow for interacting with the docker image.

Developer Flow

  1. Download the docker image from Docker Hub.
docker image pull llamastack/llamastack-local-gpu
  2. Run with the built-in default config:
podman run --network host -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --port 5000
  3. (Advanced Option 1) Run with a custom config kept outside docker:
podman run --network host -it \
  -p 5000:5000 \
  -v path/to/run.yaml:/app/run.yaml \
  -v ~/.llama:/root/.llama \
  --gpus=all \
  llamastack-d1 \
  --port 5000 \
  --config /app/run.yaml

where path/to/run.yaml is the absolute path to the run config on the host, outside the container (see the note after this list on resolving the absolute path).

  4. (Advanced Option 2) Configure with the wizard inside the docker image, then run:
podman run --network host -it --entrypoint "/bin/bash" llamastack-d0

$ (inside container) llama stack configure llamastack-build.yaml
...
Run configuration saved to d0-run.yaml
podman run --network host -it -p 5001:5001 -v ~/.llama:/root/.llama --gpus=all llamastack-d0 --port 5001 --config /app/d0-run.yaml
  5. Add templated local-gpu run.yaml and local-cpu run.yaml files for easier configuration in (3) (a rough sketch follows this list).
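
A note on step (3): the -v mount expects an absolute host path, so a relative path can be expanded with the shell (a generic shell idiom, nothing Llama Stack specific), e.g.

-v "$(pwd)/run.yaml:/app/run.yaml"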
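
To illustrate step (5), here is a rough sketch of the kind of settings a templated local-gpu run.yaml could carry, mirroring the wizard prompts in the build transcript below (the key names here are illustrative, not necessarily the exact template schema):

# illustrative sketch only - the shipped template's keys may differ
apis_to_serve:
- inference
- safety
- agents
- memory
- telemetry
providers:
  inference:
    provider_type: meta-reference
    config:
      model: Llama3.1-8B-Instruct
      quantization: null
      torch_seed: null
      max_seq_len: 4096
      max_batch_size: 1
  agents:
    provider_type: meta-reference
    config:
      persistence_store:
        type: sqlite
        db_path: /root/.llama/runtime/kvstore.db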

Distribution Owner: Building the Docker Image

$ llama stack build

> Enter a name for your Llama Stack (e.g. my-local-stack): d7
> Enter the image type you want your Llama Stack to be built as (docker or conda): docker

 Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter provider for the inference API: (default=meta-reference): meta-reference
> Enter provider for the safety API: (default=meta-reference): meta-reference
> Enter provider for the agents API: (default=meta-reference): meta-reference
> Enter provider for the memory API: (default=meta-reference): meta-reference
> Enter provider for the telemetry API: (default=meta-reference): meta-reference
 
 > (Optional) Enter a short description for your Llama Stack:
Build spec configuration saved at /data/users/xiyan/llama-stack/tmp/configs/d7-build.yaml
Configuring API `inference`...
=== Configuring provider `meta-reference` for API inference...
Enter value for model (default: Llama3.1-8B-Instruct) (required): 
Do you want to configure quantization? (y/n): n
Enter value for torch_seed (optional): 
Enter value for max_seq_len (default: 4096) (required): 
Enter value for max_batch_size (default: 1) (required): 

Configuring API `safety`...
=== Configuring provider `meta-reference` for API safety...
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n

Configuring API `agents`...
=== Configuring provider `meta-reference` for API agents...
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): 

Configuring SqliteKVStoreConfig:
Enter value for namespace (optional): 
Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required): 

Configuring API `memory`...
=== Configuring provider `meta-reference` for API memory...
> Please enter the supported memory bank type your provider has for memory: vector

Configuring API `telemetry`...
=== Configuring provider `meta-reference` for API telemetry...

> YAML configuration has been written to `/data/users/xiyan/llama-stack/tmp/configs/d7-run.yaml`.
Dockerfile created successfully in /tmp/tmp.4Mfy6zpfb2/Dockerfile

FROM python:3.10-slim
WORKDIR /app
...

...
Success! You can run it with: podman run -p 8000:8000 llamastack-d7
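
For reference, the build spec written to d7-build.yaml above simply records these prompt answers; a rough sketch of what it could look like (key names are illustrative and may not match the actual schema):

# illustrative sketch only - actual key names may differ
name: d7
image_type: docker
distribution_spec:
  description: <the optional short description>
  providers:
    inference: meta-reference
    safety: meta-reference
    agents: meta-reference
    memory: meta-reference
    telemetry: meta-reference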

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 26, 2024