[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* fix ci check

* compatibility

* remove concurrency

* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py

* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md

* [Bug] fix lint

---------

Co-authored-by: Songyang Zhang <[email protected]>
Co-authored-by: tonysy <[email protected]>
3 people authored Oct 9, 2024
1 parent d2ab51a commit b52ba65
Showing 16 changed files with 249 additions and 955 deletions.
69 changes: 0 additions & 69 deletions configs/eval_internlm_chat_lmdeploy_pytorch.py

This file was deleted.

41 changes: 0 additions & 41 deletions configs/eval_internlm_chat_lmdeploy_tis.py

This file was deleted.

40 changes: 0 additions & 40 deletions configs/eval_internlm_chat_turbomind_tis.py

This file was deleted.

28 changes: 0 additions & 28 deletions configs/eval_internlm_turbomind_tis.py

This file was deleted.

17 changes: 13 additions & 4 deletions configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
```diff
@@ -1,15 +1,24 @@
 from opencompass.models import TurboMindModelwithChatTemplate


 models = [
     dict(
         type=TurboMindModelwithChatTemplate,
-        abbr='internlm2-chat-7b-turbomind',
+        abbr='internlm2-chat-7b-lmdeploy',
         path='internlm/internlm2-chat-7b',
-        engine_config=dict(session_len=8192, max_batch_size=16, tp=1),
-        gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9, max_new_tokens=4096),
+        # inference backend of LMDeploy. It can be either 'turbomind' or 'pytorch'.
+        # If the model is not supported by 'turbomind', it will fall back to 'pytorch'.
+        backend='turbomind',
+        # For the detailed engine config and generation config, please refer to
+        # https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
+        engine_config=dict(tp=1),
+        gen_config=dict(do_sample=False),
         max_seq_len=8192,
         max_out_len=4096,
-        batch_size=16,
+        # the max number of prompts that LMDeploy receives in the `generate` function
+        batch_size=5000,
         run_cfg=dict(num_gpus=1),
     )
 ]
```
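
As the inline comments note, the same model definition can also target LMDeploy's pytorch backend. A minimal sketch of that variant follows; only `backend` changes, and the `abbr` below is an illustrative name, not one from this commit:

```python
from opencompass.models import TurboMindModelwithChatTemplate

# hypothetical variant of the config above targeting LMDeploy's pytorch backend
models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-7b-lmdeploy-pytorch',  # illustrative name
        path='internlm/internlm2-chat-7b',
        backend='pytorch',  # use the pytorch engine instead of turbomind
        engine_config=dict(tp=1),
        gen_config=dict(do_sample=False),
        max_seq_len=8192,
        max_out_len=4096,
        batch_size=5000,
        run_cfg=dict(num_gpus=1),
    )
]
```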
88 changes: 88 additions & 0 deletions docs/en/advanced_guides/evaluation_lmdeploy.md
@@ -0,0 +1,88 @@
# Evaluation with LMDeploy

We now support evaluating models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy), a toolkit for compressing, deploying, and serving LLMs with remarkable inference performance. This guide illustrates how to evaluate a model with LMDeploy support in OpenCompass.
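
The integration is built on LMDeploy's `pipeline` API. For orientation, here is a minimal standalone sketch of that API; the prompt is a placeholder, and the decoding options mirror the greedy setup used in the configs below:

```python
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# build an inference pipeline with the turbomind backend; tp=1 assumes a single GPU
pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(tp=1))

# greedy decoding, mirroring gen_config=dict(do_sample=False) in the OpenCompass config
responses = pipe(['Please introduce yourself.'],
                 gen_config=GenerationConfig(do_sample=False, max_new_tokens=256))
print(responses[0].text)
```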

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) to install OpenCompass and prepare the evaluation datasets.

### Install LMDeploy

Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```

The default prebuilt package is compiled with CUDA 12. If you need to run on a CUDA 11+ platform instead, install lmdeploy as follows:

```shell
export LMDEPLOY_VERSION=0.6.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
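
To verify the installation, a quick import check is sufficient:

```python
# minimal sanity check: lmdeploy should import cleanly and report its version
import lmdeploy
print(lmdeploy.__version__)
```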

## Evaluation

To evaluate a model, prepare an evaluation configuration that specifies the evaluation datasets, the model, and the inference parameters.

Taking [internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as an example, the evaluation config is as follows:

```python
# configure the dataset
from mmengine.config import read_base

with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from opencompass.configs.datasets.gsm8k.gsm8k_0shot_v2_gen_a58960 import \
        gsm8k_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])

# configure lmdeploy
from opencompass.models import TurboMindModelwithChatTemplate

# configure the model
models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-7b-lmdeploy',
        # model path, which can be the address of a model repository on the
        # Hugging Face Hub or a local path
        path='internlm/internlm2-chat-7b',
        # inference backend of LMDeploy. It can be either 'turbomind' or 'pytorch'.
        # If the model is not supported by 'turbomind', it will fall back to 'pytorch'.
        backend='turbomind',
        # For the detailed engine config and generation config, please refer to
        # https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
        engine_config=dict(tp=1),
        gen_config=dict(do_sample=False),
        # the max size of the context window
        max_seq_len=7168,
        # the max number of new tokens
        max_out_len=1024,
        # the max number of prompts that LMDeploy receives
        # in the `generate` function
        batch_size=5000,
        run_cfg=dict(num_gpus=1),
    )
]
```
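
The config above pins greedy decoding via `gen_config=dict(do_sample=False)`. If sampling is wanted instead, `gen_config` accepts LMDeploy's usual sampling options (see the `messages.py` link above); the values below are illustrative, not recommendations:

```python
# illustrative sampling setup; consult lmdeploy/messages.py for the full option list
gen_config = dict(
    do_sample=True,   # enable random sampling instead of greedy decoding
    temperature=0.7,  # softmax temperature
    top_p=0.9,        # nucleus-sampling threshold
)
```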

Place the aforementioned configuration in a file, such as "configs/eval_internlm2_lmdeploy.py". Then, from the root directory of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm2_lmdeploy.py -w outputs
```

The evaluation results will be presented once inference and evaluation are complete.
78 changes: 0 additions & 78 deletions docs/en/advanced_guides/evaluation_turbomind.md

This file was deleted.

