
% ramalama 1

NAME

ramalama - Simple management tool for working with AI Models

SYNOPSIS

ramalama [options] command

DESCRIPTION

RamaLama: The goal of RamaLama is to make AI boring.

The RamaLama tool facilitates local management and serving of AI Models.

On first run RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present.

RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your system's setup.

Running in containers eliminates the need for users to configure the host system for AI. After the initialization, RamaLama runs the AI Models within a container based on the OCI image.

RamaLama then pulls AI Models from model registries, starting a chatbot or a REST API service with a single simple command. Models are treated similarly to how Podman and Docker treat container images.
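
For example, assuming the granite shortname described later in this page is configured, a single command starts each service:

$ ramalama run granite      # interactive chatbot using the granite model
$ ramalama serve granite    # REST API service for the same model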

When both Podman and Docker are installed, RamaLama defaults to Podman. The RAMALAMA_CONTAINER_ENGINE=docker environment variable can override this behaviour. When neither is installed, RamaLama attempts to run the model with software on the local system.
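
As a minimal sketch, the engine choice can be overridden for a single shell session; the granite shortname is reused from the examples below:

$ export RAMALAMA_CONTAINER_ENGINE=docker   # prefer Docker even when Podman is installed
$ ramalama run granite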

RamaLama supports multiple AI model registry types, called transports. The supported transports are listed in the TRANSPORTS section below.

Note:

On Macs with Arm support and Podman, the Podman machine must be configured to use the krunkit VM Type. This allows the Mac's GPU to be used within the VM.
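
One possible way to do this, assuming the Podman machine provider can be selected with the CONTAINERS_MACHINE_PROVIDER environment variable (an assumption, not documented on this page):

$ export CONTAINERS_MACHINE_PROVIDER=libkrun   # assumption: selects the krunkit VM type
$ podman machine init
$ podman machine start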

TRANSPORTS

| Transports                | Web Site                                                              |
| ------------------------- | --------------------------------------------------------------------- |
| HuggingFace               | huggingface.co                                                        |
| Ollama                    | ollama.com                                                            |
| OCI Container Registries  | opencontainers.org (examples: quay.io, Docker Hub, and Artifactory)   |

RamaLama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORT environment variable to modify the default. For example, export RAMALAMA_TRANSPORT=huggingface changes RamaLama to use the huggingface transport.

Individual model transports can be modified when specifying a model via the huggingface://, oci://, or ollama:// prefix.

ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
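
Similarly, sketches of the other transport prefixes; the oci:// reference below is hypothetical:

$ ramalama pull ollama://tinyllama
$ ramalama pull oci://quay.io/example/mymodel:latest   # hypothetical OCI model reference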

To make it easier for users, RamaLama uses shortname files, which contain alias names for fully specified AI Models, allowing users to specify the shorter names when referring to models. RamaLama reads shortnames.conf files if they exist. These files contain a list of name/value pairs that specify the model. The following table specifies the order in which RamaLama reads the files. Any duplicate names that exist override previously defined shortnames.

| Shortnames type | Path                                     |
| --------------- | ---------------------------------------- |
| Distribution    | /usr/share/ramalama/shortnames.conf      |
| Administrators  | /etc/ramalama/shortnames.conf            |
| Users           | $HOME/.config/ramalama/shortnames.conf   |

$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
  "tiny" = "ollama://tinyllama"
  "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
  "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
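
As a sketch, a user could add an alias of their own in the Users path from the table above and then refer to the model by that name; the mymodel alias is hypothetical and points at the model pulled earlier:

$ cat $HOME/.config/ramalama/shortnames.conf
[shortnames]
  "mymodel" = "huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf"
$ ramalama run mymodel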

ramalama [GLOBAL OPTIONS]

GLOBAL OPTIONS

--container

run RamaLama in the default container (default: True). Use the environment variable RAMALAMA_IN_CONTAINER=false to change the default.

--dryrun

show container runtime command without executing it (default: False)

--engine

run RamaLama using the specified container engine. Use the environment variable RAMALAMA_CONTAINER_ENGINE to modify the default behaviour.

--help, -h

show this help message and exit

--image=IMAGE

OCI container image to run with the specified AI model. By default, RamaLama attempts to use the best AI OCI image based on the GPUs of the local system. The --image option allows users to override the default.

The RAMALAMA_IMAGE environment variable can be used to modify the default image. export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:latest tells RamaLama to use the quay.io/ramalama/aiimage:latest image.

--nocontainer

do not run RamaLama in the default container (default: False)

--runtime

specify the runtime to use; valid options are 'llama.cpp' and 'vllm' (default: llama.cpp)

--store=STORE

store AI Models in the specified directory (default rootless: $HOME/.local/share/ramalama, default rootful: /var/lib/ramalama)
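
A short sketch combining several of these global options; the tiny shortname comes from the shortnames example above, and the $HOME/models store path is hypothetical:

$ ramalama --dryrun run tiny                                  # print the container runtime command only
$ ramalama --nocontainer run tiny                             # run with local software instead of a container
$ ramalama --image=quay.io/ramalama/aiimage:latest run tiny   # override the default OCI image
$ ramalama --store=$HOME/models list                          # use an alternate model store (hypothetical path)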

COMMANDS

| Command                  | Description                                              |
| ------------------------ | -------------------------------------------------------- |
| ramalama-containers(1)   | list all RamaLama containers                              |
| ramalama-info(1)         | display RamaLama configuration information                |
| ramalama-list(1)         | list all downloaded AI Models                             |
| ramalama-login(1)        | login to remote registry                                  |
| ramalama-logout(1)       | logout from remote registry                               |
| ramalama-pull(1)         | pull AI Models from Model registries to local storage     |
| ramalama-push(1)         | push AI Models from local storage to remote registries    |
| ramalama-rm(1)           | remove AI Models from local storage                       |
| ramalama-run(1)          | run specified AI Model as a chatbot                       |
| ramalama-serve(1)        | serve REST API on specified AI Model                      |
| ramalama-stop(1)         | stop named container that is running AI Model             |
| ramalama-version(1)      | display version of RamaLama                               |
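
As an end-to-end sketch, a typical session using these commands might look like the following; the container name passed to ramalama stop is hypothetical, and ramalama containers can be used to find the real one:

$ ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
$ ramalama list                 # confirm the model was downloaded
$ ramalama serve tiny           # serve a REST API on the model
$ ramalama containers           # list running RamaLama containers
$ ramalama stop mycontainer     # hypothetical container name
$ ramalama rm tiny              # remove the model from local storage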

CONFIGURATION FILES

SEE ALSO

podman(1), docker(1)

HISTORY

Aug 2024, Originally compiled by Dan Walsh [email protected]