Add support for baichuan2 models

xusenlinzy · Sep 7, 2023 · 6b82db7
1 parent 6b2ec78

Showing 6 changed files with 89 additions and 46 deletions.
diff --git a/.env.example b/.env.example
@@ -18,6 +18,7 @@ DEVICE=cuda
 DEVICE_MAP=
 GPUS=
 NUM_GPUs=1
+DTYPE=half
 
 # patch related
 PATCH_TYPE=

diff --git a/README.md b/README.md
@@ -20,6 +20,9 @@
 
 ## 📢 News
 
++ [2023.08.28] Added support for [baichuan2](https://github.com/baichuan-inc/Baichuan2) models; see the [launch instructions](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#baichuan2)
+
+
 + [2023.08.28] Added streaming output support via `transformers.TextIteratorStreamer`; to enable it, set the environment variable `USE_STREAMER_V2=true`
 
 
@@ -83,25 +86,26 @@
 
 **Language Models**
 
-|                                 Model                                  |  Base model  |  Params  |  Lang  |                                             Model weights link                                              |
-|:----------------------------------------------------------------------:|:------------:|:--------:|:------:|:-----------------------------------------------------------------------------------------------------------:|
-|       [codellama](https://github.com/facebookresearch/codellama)       |    LLaMA2    | 7/13/34B | multi  |       [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)       |
-|       [xverse-13b-chat](https://github.com/xverse-ai/XVERSE-13B)       |    Xverse    |   13B    | multi  |                   [xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat)                   |
-|           [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B)            |     Qwen     |    7B    | en, zh |                 [Qwen/Qwen-7B-Chat](https://huggingface.co/baichuan-inc/Qwen/Qwen-7B-Chat)                  |
-|   [baichuan-13b-chat](https://github.com/baichuan-inc/Baichuan-13B)    |   Baichuan   |   13B    | en, zh |           [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)           |
-|            [InternLM](https://github.com/InternLM/InternLM)            |   InternLM   |    7B    | en, zh |                [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)                |
-|            [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)            |     GLM      |  6/130B  | en, zh |                        [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)                        |
-|       [baichaun-7b](https://github.com/baichuan-inc/baichuan-7B)       |   Baichuan   |    7B    | en, zh |                 [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)                 |
-|         [Guanaco](https://github.com/artidoro/qlora/tree/main)         |    LLaMA     | 7/33/65B |   en   |           [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)           |
-|          [YuLan-Chat](https://github.com/RUC-GSAI/YuLan-Chat)          |    LLaMA     |  13/65B  | en, zh |            [RUCAIBox/YuLan-Chat-13b-delta](https://huggingface.co/RUCAIBox/YuLan-Chat-13b-delta)            |
-|         [TigerBot](https://github.com/TigerResearch/TigerBot)          |    BLOOMZ    |  7/180B  | en, zh |            [TigerResearch/tigerbot-7b-sft](https://huggingface.co/TigerResearch/tigerbot-7b-sft)            |
-|          [OpenBuddy](https://github.com/OpenBuddy/OpenBuddy)           | LLaMA、Falcon |    7B    | multi  |                                [OpenBuddy](https://huggingface.co/OpenBuddy)                                |
-|               [MOSS](https://github.com/OpenLMLab/MOSS)                |   CodeGen    |   16B    | en, zh |              [fnlp/moss-moon-003-sft-int4](https://huggingface.co/fnlp/moss-moon-003-sft-int4)              |
-|        [Phoenix](https://github.com/FreedomIntelligence/LLMZoo)        |    BLOOMZ    |    7B    | multi  | [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) |
-|        [BAIZE](https://github.com/project-baize/baize-chatbot)         |    LLaMA     | 7/13/30B |   en   |              [project-baize/baize-lora-7B](https://huggingface.co/project-baize/baize-lora-7B)              |
-| [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)  |    LLaMA     |  7/13B   | en, zh |   [ziqingyang/chinese-alpaca-plus-lora-7b](https://huggingface.co/ziqingyang/chinese-alpaca-plus-lora-7b)   |
-|             [BELLE](https://github.com/LianjiaTech/BELLE)              |    BLOOMZ    |    7B    |   zh   |                   [BelleGroup/BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M)                   |
-|             [ChatGLM](https://github.com/THUDM/ChatGLM-6B)             |     GLM      |    6B    | en, zh |                         [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)                         |
+|                                 Model                                 |  Base model  |  Params  |  Lang  |                                             Model weights link                                              |
+|:---------------------------------------------------------------------:|:------------:|:--------:|:------:|:-----------------------------------------------------------------------------------------------------------:|
+|        [baichuan2](https://github.com/baichuan-inc/Baichuan2)         |   Baichuan   |  7/13B   | en, zh |          [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)          |
+|      [codellama](https://github.com/facebookresearch/codellama)       |    LLaMA2    | 7/13/34B | multi  |       [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)       |
+|      [xverse-13b-chat](https://github.com/xverse-ai/XVERSE-13B)       |    Xverse    |   13B    | multi  |                   [xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat)                   |
+|           [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B)           |     Qwen     |    7B    | en, zh |                       [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)                        |
+|   [baichuan-13b-chat](https://github.com/baichuan-inc/Baichuan-13B)   |   Baichuan   |   13B    | en, zh |           [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)           |
+|           [InternLM](https://github.com/InternLM/InternLM)            |   InternLM   |    7B    | en, zh |                [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)                |
+|           [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)            |     GLM      |  6/130B  | en, zh |                        [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)                        |
+|      [baichuan-7b](https://github.com/baichuan-inc/baichuan-7B)       |   Baichuan   |    7B    | en, zh |                 [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)                 |
+|        [Guanaco](https://github.com/artidoro/qlora/tree/main)         |    LLaMA     | 7/33/65B |   en   |           [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)           |
+|         [YuLan-Chat](https://github.com/RUC-GSAI/YuLan-Chat)          |    LLaMA     |  13/65B  | en, zh |            [RUCAIBox/YuLan-Chat-13b-delta](https://huggingface.co/RUCAIBox/YuLan-Chat-13b-delta)            |
+|         [TigerBot](https://github.com/TigerResearch/TigerBot)         |    BLOOMZ    |  7/180B  | en, zh |            [TigerResearch/tigerbot-7b-sft](https://huggingface.co/TigerResearch/tigerbot-7b-sft)            |
+|          [OpenBuddy](https://github.com/OpenBuddy/OpenBuddy)          | LLaMA、Falcon |    7B    | multi  |                                [OpenBuddy](https://huggingface.co/OpenBuddy)                                |
+|               [MOSS](https://github.com/OpenLMLab/MOSS)               |   CodeGen    |   16B    | en, zh |              [fnlp/moss-moon-003-sft-int4](https://huggingface.co/fnlp/moss-moon-003-sft-int4)              |
+|       [Phoenix](https://github.com/FreedomIntelligence/LLMZoo)        |    BLOOMZ    |    7B    | multi  | [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) |
+|        [BAIZE](https://github.com/project-baize/baize-chatbot)        |    LLaMA     | 7/13/30B |   en   |              [project-baize/baize-lora-7B](https://huggingface.co/project-baize/baize-lora-7B)              |
+| [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |    LLaMA     |  7/13B   | en, zh |   [ziqingyang/chinese-alpaca-plus-lora-7b](https://huggingface.co/ziqingyang/chinese-alpaca-plus-lora-7b)   |
+|             [BELLE](https://github.com/LianjiaTech/BELLE)             |    BLOOMZ    |    7B    |   zh   |                   [BelleGroup/BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M)                   |
+|            [ChatGLM](https://github.com/THUDM/ChatGLM-6B)             |     GLM      |    6B    | en, zh |                         [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)                         |
 
 
 **Embedding Models**

diff --git a/api/apapter/model.py b/api/apapter/model.py
@@ -8,7 +8,6 @@
 from loguru import logger
 from peft import PeftModel
 from tqdm import tqdm
-
 from transformers import (
     AutoModel,
     AutoConfig,
@@ -51,7 +50,14 @@ def load_model(self, model_name_or_path: Optional[str] = None, adapter_model: Op
         num_gpus = kwargs.get("num_gpus", 1)
         if device == "cuda":
             if "torch_dtype" not in config_kwargs:
-                config_kwargs["torch_dtype"] = torch.float16
+                dtype = kwargs.get("dtype", "half")
+                if dtype == "half":
+                    config_kwargs["torch_dtype"] = torch.float16
+                elif dtype == "bfloat16":
+                    config_kwargs["torch_dtype"] = torch.bfloat16
+                else:
+                    config_kwargs["torch_dtype"] = torch.float32
+
             if num_gpus != 1:
                 config_kwargs["device_map"] = "auto"
                 # model_kwargs["device_map"] = "sequential"  # This is important for not the same VRAM sizes
@@ -449,13 +455,14 @@ class CodeLlamaModelAdapter(LlamaModelAdapter):
 
     @property
     def tokenizer_class(self):
-        try:
-            from transformers import CodeLlamaTokenizer
-            return CodeLlamaTokenizer
-        except ImportError:
-            logger.error(
-                "transformers is not installed correctly. Please use the following command to install transformers\npip install git+https://github.com/huggingface/transformers.git."
-            )
+        require_version("transformers>=4.33.1", "To fix: pip install 'transformers>=4.33.1'")
+        from transformers import CodeLlamaTokenizer
+
+        return CodeLlamaTokenizer
+
+    @property
+    def default_model_name_or_path(self):
+        return "codellama/CodeLlama-7b-Instruct-hf"
 
 
 register_model_adapter(ChatglmModelAdapter)
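
The `load_model` hunk above maps the `dtype` keyword (fed from `DTYPE`) onto a torch dtype, and any unrecognized value falls through to `float32`. A minimal sketch of the same mapping as a lookup table follows. `resolve_dtype_name` is a hypothetical helper, not part of this commit. It returns the attribute name rather than importing torch, on the assumption that the caller would then use `getattr(torch, name)` to obtain the actual dtype.

```python
# Hypothetical helper (not part of this commit): mirrors the if/elif chain in load_model.
_DTYPE_NAMES = {
    "half": "float16",
    "bfloat16": "bfloat16",
}


def resolve_dtype_name(dtype: str = "half") -> str:
    """Return the torch attribute name for a DTYPE setting.

    Unknown values fall back to float32, matching the else-branch in the diff.
    """
    return _DTYPE_NAMES.get(dtype, "float32")


if __name__ == "__main__":
    for value in ("half", "bfloat16", "float32", "anything-else"):
        print(value, "->", resolve_dtype_name(value))
```

A table like this makes the silent fallback easy to spot: a misspelled value such as `DTYPE=bf16` would load the model in `float32`, which may use considerably more memory than intended.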

diff --git a/api/models.py b/api/models.py
@@ -2,7 +2,6 @@
 
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
-from loguru import logger
 from sentence_transformers import SentenceTransformer
 
 from api.apapter import get_prompt_adapter
@@ -52,6 +51,7 @@ def create_generate_model():
         load_in_8bit=config.LOAD_IN_8BIT,
         load_in_4bit=config.LOAD_IN_4BIT,
         use_ptuning_v2=config.USING_PTUNING_V2,
+        dtype=config.DTYPE,
     )
 
     return ModelServer(
@@ -98,14 +98,12 @@ def create_vllm_engine():
 
     # A separate tokenizer to map token IDs to strings.
     if "code-llama" in config.MODEL_NAME.lower():
-        try:
-            from transformers import CodeLlamaTokenizer
-
-            engine.engine.tokenizer = CodeLlamaTokenizer.from_pretrained(engine_args.tokenizer)
-        except ImportError:
-            logger.error(
-                "transformers is not installed correctly. Please use the following command to install transformers\npip install git+https://github.com/huggingface/transformers.git."
-            )
+        from transformers.utils.versions import require_version
+
+        require_version("transformers>=4.33.1", "To fix: pip install 'transformers>=4.33.1'")
+        from transformers import CodeLlamaTokenizer
+
+        engine.engine.tokenizer = CodeLlamaTokenizer.from_pretrained(engine_args.tokenizer)
     else:
         engine.engine.tokenizer = get_tokenizer(
             engine_args.tokenizer,
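
Both CodeLlama hunks replace a `try`/`except ImportError` that only logged an error with a `require_version` call, which raises as soon as the installed `transformers` is too old. A simplified, stdlib-only sketch of that fail-fast idea is shown here. `check_min_version` is a hypothetical stand-in that handles only plain dotted numeric versions; it does not reproduce how `require_version` is implemented.

```python
def _parse(version: str) -> tuple:
    """Turn '4.33.1' into (4, 33, 1); only plain dotted integers are supported."""
    return tuple(int(part) for part in version.split("."))


def check_min_version(installed: str, minimum: str, hint: str = "") -> None:
    """Raise ImportError if `installed` is older than `minimum`."""
    if _parse(installed) < _parse(minimum):
        message = f"found version {installed}, need >= {minimum}. {hint}"
        raise ImportError(message.strip())


if __name__ == "__main__":
    check_min_version("4.33.1", "4.33.1")  # equal versions pass silently
    try:
        check_min_version("4.31.0", "4.33.1", "To fix: pip install -U transformers")
    except ImportError as err:
        print(err)
```

Raising here means a misconfigured environment stops at startup, whereas the removed code logged the problem and left the property returning `None`.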

diff --git a/docs/SCRIPT.md b/docs/SCRIPT.md
@@ -118,6 +118,8 @@ python server.py
 
 **See below for the environment variable changes**
 
++ [baichuan2](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#baichuan2)
+
 + [code-llama](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#code-llama)
 
 + [sqlcoder](https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/SCRIPT.md#sqlcoder)  
@@ -301,10 +303,6 @@ DEVICE_MAP=auto
 
 ### CODE-LLAMA
 
-```shell
-pip install git+https://github.com/huggingface/transformers.git
-```
-
 codellama/CodeLlama-7b-Instruct-hf
 
 ```shell
@@ -314,10 +312,6 @@ MODEL_PATH=codellama/CodeLlama-7b-Instruct-hf
 
 ### Wizard-Coder
 
-```shell
-pip install git+https://github.com/huggingface/transformers.git
-```
-
 WizardLM/WizardCoder-Python-34B-V1.0
 
 ```shell
@@ -326,3 +320,42 @@ MODEL_PATH=WizardLM/WizardCoder-Python-34B-V1.0
 PROMPT_NAME=alpaca
 DEVICE_MAP=auto
 ```
+
+
+### Baichuan2
+
+The `Baichuan2` models use `F.scaled_dot_product_attention`, a feature added in PyTorch 2.0, to speed up inference, so they must run in a PyTorch 2.0 environment
+
+Upgrade the `llm-api:pytorch` environment with the command below, or use the `llm-api:vllm` environment directly
+
+```shell
+pip install torch -U
+```
+
+baichuan-inc/Baichuan2-13B-Chat
+
+```shell
+MODEL_NAME=baichuan2-13b-chat
+MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
+DEVICE_MAP=auto
+DTYPE=bfloat16
+```
+
+`BitsAndBytes` quantization
+
+```shell
+MODEL_NAME=baichuan2-13b-chat
+MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
+DEVICE_MAP=auto
+LOAD_IN_8BIT=true
+```
+
+Online quantization
+
+```shell
+MODEL_NAME=baichuan2-13b-chat
+MODEL_PATH=baichuan-inc/Baichuan2-13B-Chat
+DEVICE_MAP=
+DTYPE=half
+QUANTIZE=8
+```
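
The Baichuan2 section added above depends on `F.scaled_dot_product_attention` being present in the installed PyTorch. A minimal sketch of a pre-flight check that could be run before starting the server follows. `has_sdpa` is a hypothetical helper, not part of this commit, and it treats a missing torch install the same as an outdated one.

```python
def has_sdpa() -> bool:
    """Return True if torch is importable and exposes F.scaled_dot_product_attention."""
    try:
        import torch.nn.functional as F
    except ImportError:
        return False  # torch is not installed at all
    return hasattr(F, "scaled_dot_product_attention")


if __name__ == "__main__":
    print("scaled_dot_product_attention available:", has_sdpa())
```

If this reports `False`, upgrading torch as described in the section above (or switching to the `llm-api:vllm` environment) would be the next step.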
diff --git a/requirements.txt b/requirements.txt
@@ -9,7 +9,7 @@ tenacity
 sentencepiece
 tiktoken
 loguru
-transformers>=4.31.0
+transformers>=4.33.1
 peft>=0.4.0
 accelerate>=0.20.3
 triton