Skip to content

Commit

Permalink
Add support for baichuan2 models
Browse files Browse the repository at this point in the history
  • Loading branch information
xusenlin committed Sep 7, 2023
1 parent 6b2ec78 commit 6b82db7
Show file tree
Hide file tree
Showing 6 changed files with 89 additions and 46 deletions.
1 change: 1 addition & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ DEVICE=cuda
DEVICE_MAP=
GPUS=
NUM_GPUs=1
DTYPE=half

# patch related
PATCH_TYPE=
Expand Down
42 changes: 23 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,9 @@

## 📢 新闻

+ 【2023.08.28】 添加 [baichuan2](https://github.com/baichuan-inc/Baichuan2) 模型支持,[启动方式链接](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#baichuan2)


+ 【2023.08.28】 添加 `transformers.TextIteratorStreamer` 流式输出支持,只需将环境变量修改为 `USE_STREAMER_V2=true`


Expand Down Expand Up @@ -83,25 +86,26 @@

**语言模型**

| 模型 | 基座模型 | 参数量 | 语言 | 模型权重链接 |
|:----------------------------------------------------------------------:|:------------:|:--------:|:------:|:-----------------------------------------------------------------------------------------------------------:|
| [codellama](https://github.com/facebookresearch/codellama) | LLaMA2 | 7/13/34B | multi | [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
| [xverse-13b-chat](https://github.com/xverse-ai/XVERSE-13B) | Xverse | 13B | multi | [xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat) |
| [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B) | Qwen | 7B | en, zh | [Qwen/Qwen-7B-Chat](https://huggingface.co/baichuan-inc/Qwen/Qwen-7B-Chat) |
| [baichuan-13b-chat](https://github.com/baichuan-inc/Baichuan-13B) | Baichuan | 13B | en, zh | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) |
| [InternLM](https://github.com/InternLM/InternLM) | InternLM | 7B | en, zh | [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | GLM | 6/130B | en, zh | [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) |
| [baichaun-7b](https://github.com/baichuan-inc/baichuan-7B) | Baichuan | 7B | en, zh | [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B) |
| [Guanaco](https://github.com/artidoro/qlora/tree/main) | LLaMA | 7/33/65B | en | [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged) |
| [YuLan-Chat](https://github.com/RUC-GSAI/YuLan-Chat) | LLaMA | 13/65B | en, zh | [RUCAIBox/YuLan-Chat-13b-delta](https://huggingface.co/RUCAIBox/YuLan-Chat-13b-delta) |
| [TigerBot](https://github.com/TigerResearch/TigerBot) | BLOOMZ | 7/180B | en, zh | [TigerResearch/tigerbot-7b-sft](https://huggingface.co/TigerResearch/tigerbot-7b-sft) |
| [OpenBuddy](https://github.com/OpenBuddy/OpenBuddy) | LLaMA、Falcon | 7B | multi | [OpenBuddy](https://huggingface.co/OpenBuddy) |
| [MOSS](https://github.com/OpenLMLab/MOSS) | CodeGen | 16B | en, zh | [fnlp/moss-moon-003-sft-int4](https://huggingface.co/fnlp/moss-moon-003-sft-int4) |
| [Phoenix](https://github.com/FreedomIntelligence/LLMZoo) | BLOOMZ | 7B | multi | [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) |
| [BAIZE](https://github.com/project-baize/baize-chatbot) | LLaMA | 7/13/30B | en | [project-baize/baize-lora-7B](https://huggingface.co/project-baize/baize-lora-7B) |
| [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | LLaMA | 7/13B | en, zh | [ziqingyang/chinese-alpaca-plus-lora-7b](https://huggingface.co/ziqingyang/chinese-alpaca-plus-lora-7b) |
| [BELLE](https://github.com/LianjiaTech/BELLE) | BLOOMZ | 7B | zh | [BelleGroup/BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) |
| [ChatGLM](https://github.com/THUDM/ChatGLM-6B) | GLM | 6B | en, zh | [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) |
| 模型 | 基座模型 | 参数量 | 语言 | 模型权重链接 |
|:---------------------------------------------------------------------:|:------------:|:--------:|:------:|:-----------------------------------------------------------------------------------------------------------:|
| [baichuan2](https://github.com/baichuan-inc/Baichuan2) | Baichuan | 7/13 | en, zh | [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) |
| [codellama](https://github.com/facebookresearch/codellama) | LLaMA2 | 7/13/34B | multi | [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
| [xverse-13b-chat](https://github.com/xverse-ai/XVERSE-13B) | Xverse | 13B | multi | [xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat) |
| [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B) | Qwen | 7B | en, zh | [Qwen/Qwen-7B-Chat](https://huggingface.co/baichuan-inc/Qwen/Qwen-7B-Chat) |
| [baichuan-13b-chat](https://github.com/baichuan-inc/Baichuan-13B) | Baichuan | 13B | en, zh | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) |
| [InternLM](https://github.com/InternLM/InternLM) | InternLM | 7B | en, zh | [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b) |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | GLM | 6/130B | en, zh | [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) |
| [baichaun-7b](https://github.com/baichuan-inc/baichuan-7B) | Baichuan | 7B | en, zh | [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B) |
| [Guanaco](https://github.com/artidoro/qlora/tree/main) | LLaMA | 7/33/65B | en | [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged) |
| [YuLan-Chat](https://github.com/RUC-GSAI/YuLan-Chat) | LLaMA | 13/65B | en, zh | [RUCAIBox/YuLan-Chat-13b-delta](https://huggingface.co/RUCAIBox/YuLan-Chat-13b-delta) |
| [TigerBot](https://github.com/TigerResearch/TigerBot) | BLOOMZ | 7/180B | en, zh | [TigerResearch/tigerbot-7b-sft](https://huggingface.co/TigerResearch/tigerbot-7b-sft) |
| [OpenBuddy](https://github.com/OpenBuddy/OpenBuddy) | LLaMA、Falcon | 7B | multi | [OpenBuddy](https://huggingface.co/OpenBuddy) |
| [MOSS](https://github.com/OpenLMLab/MOSS) | CodeGen | 16B | en, zh | [fnlp/moss-moon-003-sft-int4](https://huggingface.co/fnlp/moss-moon-003-sft-int4) |
| [Phoenix](https://github.com/FreedomIntelligence/LLMZoo) | BLOOMZ | 7B | multi | [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) |
| [BAIZE](https://github.com/project-baize/baize-chatbot) | LLaMA | 7/13/30B | en | [project-baize/baize-lora-7B](https://huggingface.co/project-baize/baize-lora-7B) |
| [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | LLaMA | 7/13B | en, zh | [ziqingyang/chinese-alpaca-plus-lora-7b](https://huggingface.co/ziqingyang/chinese-alpaca-plus-lora-7b) |
| [BELLE](https://github.com/LianjiaTech/BELLE) | BLOOMZ | 7B | zh | [BelleGroup/BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) |
| [ChatGLM](https://github.com/THUDM/ChatGLM-6B) | GLM | 6B | en, zh | [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) |


**嵌入模型**
Expand Down
25 changes: 16 additions & 9 deletions api/apapter/model.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
from loguru import logger
from peft import PeftModel
from tqdm import tqdm

from transformers import (
AutoModel,
AutoConfig,
Expand Down Expand Up @@ -51,7 +50,14 @@ def load_model(self, model_name_or_path: Optional[str] = None, adapter_model: Op
num_gpus = kwargs.get("num_gpus", 1)
if device == "cuda":
if "torch_dtype" not in config_kwargs:
config_kwargs["torch_dtype"] = torch.float16
dtype = kwargs.get("dtype", "half")
if dtype == "half":
config_kwargs["torch_dtype"] = torch.float16
elif dtype == "bfloat16":
config_kwargs["torch_dtype"] = torch.bfloat16
else:
config_kwargs["torch_dtype"] = torch.float32

if num_gpus != 1:
config_kwargs["device_map"] = "auto"
# model_kwargs["device_map"] = "sequential" # This is important for not the same VRAM sizes
Expand Down Expand Up @@ -449,13 +455,14 @@ class CodeLlamaModelAdapter(LlamaModelAdapter):

@property
def tokenizer_class(self):
try:
from transformers import CodeLlamaTokenizer
return CodeLlamaTokenizer
except ImportError:
logger.error(
"transformers is not installed correctly. Please use the following command to install transformers\npip install git+https://github.com/huggingface/transformers.git."
)
require_version("transformers>=4.33.1", "To fix: pip install transformers>=4.33.1")
from transformers import CodeLlamaTokenizer

return CodeLlamaTokenizer

@property
def default_model_name_or_path(self):
return "codellama/CodeLlama-7b-Instruct-hf"


register_model_adapter(ChatglmModelAdapter)
Expand Down
16 changes: 7 additions & 9 deletions api/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from loguru import logger
from sentence_transformers import SentenceTransformer

from api.apapter import get_prompt_adapter
Expand Down Expand Up @@ -52,6 +51,7 @@ def create_generate_model():
load_in_8bit=config.LOAD_IN_8BIT,
load_in_4bit=config.LOAD_IN_4BIT,
use_ptuning_v2=config.USING_PTUNING_V2,
dtype=config.DTYPE,
)

return ModelServer(
Expand Down Expand Up @@ -98,14 +98,12 @@ def create_vllm_engine():

# A separate tokenizer to map token IDs to strings.
if "code-llama" in config.MODEL_NAME.lower():
try:
from transformers import CodeLlamaTokenizer

engine.engine.tokenizer = CodeLlamaTokenizer.from_pretrained(engine_args.tokenizer)
except ImportError:
logger.error(
"transformers is not installed correctly. Please use the following command to install transformers\npip install git+https://github.com/huggingface/transformers.git."
)
from transformers.utils.versions import require_version

require_version("transformers>=4.33.1", "To fix: pip install transformers>=4.33.1")
from transformers import CodeLlamaTokenizer

engine.engine.tokenizer = CodeLlamaTokenizer.from_pretrained(engine_args.tokenizer)
else:
engine.engine.tokenizer = get_tokenizer(
engine_args.tokenizer,
Expand Down
49 changes: 41 additions & 8 deletions docs/SCRIPT.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,8 @@ python server.py

**环境变量修改内容参考下面**

+ [baichuan2](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#baichuan2)

+ [code-llama](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#code-llama)

+ [sqlcoder](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#sqlcoder)
Expand Down Expand Up @@ -301,10 +303,6 @@ DEVICE_MAP=auto

### CODE-LLAMA

```shell
pip install git+https://github.com/huggingface/transformers.git
```

codellama/CodeLlama-7b-Instruct-hf

```shell
Expand All @@ -314,10 +312,6 @@ MODEL_PATH=codellama/CodeLlama-7b-Instruct-hf

### Wizard-Coder

```shell
pip install git+https://github.com/huggingface/transformers.git
```

WizardLM/WizardCoder-Python-34B-V1.0

```shell
Expand All @@ -326,3 +320,42 @@ MODEL_PATH=WizardLM/WizardCoder-Python-34B-V1.0
PROMPT_NAME=alpaca
DEVICE_MAP=auto
```


### Baichuan2

`Baichuan2` 系列模型中,为了加快推理速度使用了 `pytorch2.0` 加入的新功能 `F.scaled_dot_product_attention`,因此需要在 `pytorch2.0` 环境下运行

可以使用下面的命令升级 `llm-api:pytorch` 环境,或者直接使用 `llm-api:vllm` 环境

```shell
pip install torch -U
```

baichuan-inc/Baichuan2-13B-Chat

```shell
MODEL_NAME=baichuan2-13b-chat
MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
DEVICE_MAP=auto
DTYPE=bfloat16
```

`BitsAndBytes` 量化

```shell
MODEL_NAME=baichuan2-13b-chat
MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
DEVICE_MAP=auto
LOAD_IN_8BIT=true
```

在线量化

```shell
MODEL_NAME=baichuan2-13b-chat
MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
DEVICE_MAP=
DTYPE=half
QUANTIZE=8
```
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ tenacity
sentencepiece
tiktoken
loguru
transformers>=4.31.0
transformers>=4.33.1
peft>=0.4.0
accelerate>=0.20.3
triton
Expand Down

0 comments on commit 6b82db7

Please sign in to comment.