[docs] Minor update in model profiling documentation #14

Merged
merged 1 commit into from
May 15, 2024
1 change: 0 additions & 1 deletion vidur/profiling/README.md
@@ -40,7 +40,6 @@ We need actual GPUs to get profiling data for a new model. Once the profiling is
- Ensure that correct parameters are set in the YAML file so that the reference transformer model [GPTModel](vidur/profiling/mlp/mlp_impl.py) closely resembles the new model.
- We use this reference model to profile only the MLP operations of all the models so the attention operations are no-op'ed here.
1. Clone the [`sarathi-serve`](https://github.com/microsoft/sarathi-serve) GitHub repo and follow its README to install it. Let us assume that the Python virtual environment was created in `sarathi-serve/env`.
1. Ensure that the new model is added to the `sarathi-serve` repo. `sarathi-serve` is a fork of vLLM so one can use the [Adding a New Model](https://docs.vllm.ai/en/stable/models/adding_model.html) documentation to add the new model.
1. Clone this (`vidur`) repo but keep the `sarathi-serve/env` virtual environment activated.
1. From the `vidur/` directory, run `python -m pip install -e .` to install the simulator in the virtual environment.
1. For compute profiling, 1 GPU is enough even for tensor parallel degrees greater than 1, so setting `num_gpus` to 1 is sufficient, albeit slower, for MLP and attention profiling. For network profiling, 4 GPUs are needed for TP4, 8 GPUs for TP8, and so on.
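
The GPU requirement in the last step can be sketched as a small helper. This is an illustration only: `gpus_needed` and its `profiling_kind` strings are hypothetical and not part of `vidur`, and the network case assumes the "8 GPUs for TP8" rule generalizes to one GPU per tensor-parallel rank.

```python
def gpus_needed(profiling_kind: str, tp_degree: int) -> int:
    """Return how many GPUs to reserve for one profiling run.

    Hypothetical helper for illustration; not part of vidur.
    """
    if tp_degree < 1:
        raise ValueError("tp_degree must be at least 1")

    if profiling_kind in ("mlp", "attention"):
        # Compute profiling: a single GPU is enough for any TP degree
        # (using more GPUs only speeds the profiling up).
        return 1

    if profiling_kind == "network":
        # Assumption: network profiling needs one GPU per TP rank,
        # generalizing the README's "8 GPUs for TP8" example.
        return tp_degree

    raise ValueError(f"unknown profiling kind: {profiling_kind!r}")


if __name__ == "__main__":
    for kind, tp in [("mlp", 8), ("attention", 4), ("network", 8)]:
        print(f"{kind} profiling at TP{tp}: {gpus_needed(kind, tp)} GPU(s)")
```

A helper like this could be used to size a GPU reservation before launching a profiling job; the actual `num_gpus` setting still has to be passed to the profiling scripts as the README describes.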