From 407cb8420a46f62b04334d93195c2a991a46736b Mon Sep 17 00:00:00 2001
From: wacdev
Date: Sun, 30 Apr 2023 13:10:29 +0800
Subject: [PATCH 1/2] now can run on mac m2

---
 .gitignore                            |  3 +
 README.md                             | 66 +++++++++------------
 demo.py                               | 15 +++--
 environment.yml                       |  6 +-
 minigpt4/configs/models/minigpt4.yaml |  2 +-
 minigpt4/models/mini_gpt4.py          | 11 ++--
 requirements.txt                      | 85 +++++++++++++++++++++++++++
 7 files changed, 135 insertions(+), 53 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 requirements.txt

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 00000000..4c9d999b
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+.DS_Store
+__pycache__/
+model/
diff --git a/README.md b/README.md
index 16de6901..f5cbaa87 100644
--- a/README.md
+++ b/README.md
@@ -5,17 +5,14 @@
 [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing)
 [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
 
-
 ## News
 We now provide a pretrained MiniGPT-4 aligned with Vicuna-7B!
 The demo GPU memory consumption now can be as low as 12GB.
 
-
 ## Online Demo
 Click the image to chat with MiniGPT-4 around your images
 [![demo](figs/online_demo.png)](https://minigpt-4.github.io)
 
-
 ## Examples
   |   |   |
 :-------------------------:|:-------------------------:
@@ -24,19 +21,15 @@ Click the image to chat with MiniGPT-4 around your images
 
 More examples can be found in the [project page](https://minigpt-4.github.io).
 
-
-
 ## Introduction
-- MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer. 
+- MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
 - We train MiniGPT-4 with two stages. The first traditional pretraining stage is trained using roughly 5 million aligned image-text pairs in 10 hours using 4 A100s. After the first stage, Vicuna is able to understand the image. But the generation ability of Vicuna is heavilly impacted.
 - To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs by the model itself and ChatGPT together. Based on this, we then create a small (3500 pairs in total) yet high-quality dataset.
 - The second finetuning stage is trained on this dataset in a conversation template to significantly improve its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes with a single A100.
-- MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4. 
-
+- MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4.
 
 ![overview](figs/overview.png)
 
-
 ## Getting Started
 ### Installation
 
@@ -51,11 +44,10 @@ conda env create -f environment.yml
 conda activate minigpt4
 ```
 
-
 **2. Prepare the pretrained Vicuna weights**
 
 The current version of MiniGPT-4 is built on the v0 versoin of Vicuna-13B.
-Please refer to our instruction [here](PrepareVicuna.md) 
+Please refer to our instruction [here](PrepareVicuna.md)
 to prepare the Vicuna weights.
 The final weights would be in a single folder in a structure similar to the following:
 
@@ -65,10 +57,10 @@ vicuna_weights
 ├── config.json
 ├── generation_config.json
 ├── pytorch_model.bin.index.json
 ├── pytorch_model-00001-of-00003.bin
-... 
+...
 ```
 
-Then, set the path to the vicuna weight in the model config file 
+Then, set the path to the vicuna weight in the model config file
 [here](minigpt4/configs/models/minigpt4.yaml#L16) at Line 16.
 
 **3. Prepare the pretrained MiniGPT-4 checkpoint**
@@ -77,13 +69,10 @@ Download the pretrained checkpoints according to the Vicuna model you prepare.
 
 | Checkpoint Aligned with Vicuna 13B | Checkpoint Aligned with Vicuna 7B |
 :------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:
- [Downlad](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) 
-
-
-Then, set the path to the pretrained checkpoint in the evaluation config file 
-in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 11. 
-
+ [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing)
+Then, set the path to the pretrained checkpoint in the evaluation config file
+in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 11.
 
 ### Launching Demo Locally
 
@@ -93,58 +82,59 @@ Try out our demo [demo.py](demo.py) on your local machine by running
 python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
 ```
 
-To save GPU memory, Vicuna loads as 8 bit by default, with a beam search width of 1. 
-This configuration requires about 23G GPU memory for Vicuna 13B and 11.5G GPU memory for Vicuna 7B. 
+To save GPU memory, Vicuna loads as 8 bit by default, with a beam search width of 1.
+This configuration requires about 23G GPU memory for Vicuna 13B and 11.5G GPU memory for Vicuna 7B.
 For more powerful GPUs, you can run the model
-in 16 bit by setting low_resource to False in the config file 
+in 16 bit by setting low_resource to False in the config file
 [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml) and use a larger beam search width.
 
 Thanks [@WangRongsheng](https://github.com/WangRongsheng), you can also run our code on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing)
 
-
 ### Training
 The training of MiniGPT-4 contains two alignment stages.
 
 **1. First pretraining stage**
 
 In the first pretrained stage, the model is trained using image-text pairs from Laion and CC datasets
-to align the vision and language model. To download and prepare the datasets, please check 
-our [first stage dataset preparation instruction](dataset/README_1_STAGE.md). 
+to align the vision and language model. To download and prepare the datasets, please check
+our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
 After the first stage, the visual features are mapped and can be understood by the language model.
 
-To launch the first stage training, run the following command. In our experiments, we use 4 A100. 
-You can change the save path in the config file 
+To launch the first stage training, run the following command. In our experiments, we use 4 A100.
+You can change the save path in the config file
 [train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml)
 
 ```bash
 torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
 ```
 
-A MiniGPT-4 checkpoint with only stage one training can be downloaded 
+A MiniGPT-4 checkpoint with only stage one training can be downloaded
 [here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
 Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.
 
-
 **2. Second finetuning stage**
 
 In the second stage, we use a small high quality image-text pair dataset created by ourselves
 and convert it to a conversation format to further align MiniGPT-4.
-To download and prepare our second stage dataset, please check our 
+To download and prepare our second stage dataset, please check our
 [second stage dataset preparation instruction](dataset/README_2_STAGE.md).
-To launch the second stage alignment, 
-first specify the path to the checkpoint file trained in stage 1 in 
+To launch the second stage alignment,
+first specify the path to the checkpoint file trained in stage 1 in
 [train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage2_finetune.yaml).
-You can also specify the output path there. 
+You can also specify the output path there.
 Then, run the following command. In our experiments, we use 1 A100.
 
 ```bash
 torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
 ```
 
-After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and user-friendly. 
-
+After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and user-friendly.
+## Run on Mac
+```
+pip install -r requirements.txt
+```
 
 ## Acknowledgement
 
@@ -152,19 +142,17 @@ After the second stage alignment, MiniGPT-4 is able to talk about the image cohe
 + [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
 + [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
 
-
 If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:
 ```bibtex
 @misc{zhu2022minigpt4,
-      title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models}, 
+      title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models},
       author={Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
       journal={arXiv preprint arXiv:2304.10592},
       year={2023},
 }
 ```
 
-
 ## License
 This repository is under [BSD 3-Clause License](LICENSE.md).
-Many codes are based on [Lavis](https://github.com/salesforce/LAVIS) with 
+Many codes are based on [Lavis](https://github.com/salesforce/LAVIS) with
 BSD 3-Clause License [here](LICENSE_Lavis.md).
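The demo.py and mini_gpt4.py hunks below all follow one pattern: probe for CUDA once, then derive the device string, the weight dtype, and the 8-bit loading flag from that probe, so the same code paths run on an M2 Mac without an NVIDIA GPU. The following is a minimal standalone sketch of that pattern, not the repository's code; the CPU fallback behaviour (device left as None, full float32 weights) is an assumption based on PyTorch defaults.

```python
import torch

# Probe for an NVIDIA GPU once; on an M1/M2 Mac this is False.
CUDA = torch.cuda.is_available()

# Device string handed to .to(); None leaves the model where it was loaded (CPU).
GPU = 'cuda:0' if CUDA else None

# Half precision and 8-bit quantization only pay off on a CUDA GPU;
# without one, keep full float32 weights.
torch_dtype = torch.float16 if CUDA else torch.float32
load_in_8bit = CUDA

print(GPU, torch_dtype, load_in_8bit)
```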
diff --git a/demo.py b/demo.py
index b3659f1c..509b9f5e 100644
--- a/demo.py
+++ b/demo.py
@@ -19,6 +19,7 @@
 from minigpt4.runners import *
 from minigpt4.tasks import *
 
+CUDA = torch.cuda.is_available()
 
 def parse_args():
     parser = argparse.ArgumentParser(description="Demo")
@@ -57,11 +58,13 @@ def setup_seeds(config):
 model_config = cfg.model_cfg
 model_config.device_8bit = args.gpu_id
 model_cls = registry.get_model_class(model_config.arch)
-model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
+GPU = 'cuda:{}'.format(args.gpu_id) if CUDA else None
+model = model_cls.from_config(model_config).to(GPU)
+model = torch.compile(model)
 
 vis_processor_cfg = cfg.datasets_cfg.cc_sbu_align.vis_processor.train
 vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
-chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
+chat = Chat(model, vis_processor, device=GPU)
 print('Initialization Finished')
 
 # ========================================
@@ -118,7 +121,7 @@ def gradio_answer(chatbot, chat_state, img_list, num_beams, temperature):
             image = gr.Image(type="pil")
             upload_button = gr.Button(value="Upload & Start Chat", interactive=True, variant="primary")
             clear = gr.Button("Restart")
-            
+
             num_beams = gr.Slider(
                 minimum=1,
                 maximum=10,
@@ -127,7 +130,7 @@ def gradio_answer(chatbot, chat_state, img_list, num_beams, temperature):
                 interactive=True,
                 label="beam search numbers)",
             )
-            
+
             temperature = gr.Slider(
                 minimum=0.1,
                 maximum=2.0,
@@ -142,9 +145,9 @@ def gradio_answer(chatbot, chat_state, img_list, num_beams, temperature):
             img_list = gr.State()
             chatbot = gr.Chatbot(label='MiniGPT-4')
             text_input = gr.Textbox(label='User', placeholder='Please upload your image first', interactive=False)
-    
+
     upload_button.click(upload_img, [image, text_input, chat_state], [image, text_input, upload_button, chat_state, img_list])
-    
+
     text_input.submit(gradio_ask, [text_input, chatbot, chat_state], [text_input, chatbot, chat_state]).then(
         gradio_answer, [chatbot, chat_state, img_list, num_beams, temperature], [chatbot, chat_state, img_list]
     )
diff --git a/environment.yml b/environment.yml
index d5cfcf87..4ac35867 100644
--- a/environment.yml
+++ b/environment.yml
@@ -4,13 +4,13 @@ channels:
   - defaults
   - anaconda
 dependencies:
-  - python=3.9
+  - python=3.10
   - cudatoolkit
   - pip
-  - pytorch=1.12.1
+  - pytorch=2.0.0
   - pytorch-mutex=1.0=cuda
   - torchaudio=0.12.1
-  - torchvision=0.13.1
+  - torchvision=0.15.1
   - pip:
     - accelerate==0.16.0
     - aiohttp==3.8.4
diff --git a/minigpt4/configs/models/minigpt4.yaml b/minigpt4/configs/models/minigpt4.yaml
index 87af4486..3f6d9810 100644
--- a/minigpt4/configs/models/minigpt4.yaml
+++ b/minigpt4/configs/models/minigpt4.yaml
@@ -13,7 +13,7 @@ model:
   num_query_token: 32
 
   # Vicuna
-  llama_model: "/path/to/vicuna/weights/"
+  llama_model: './model/vicuna-13b' # /path/to/vicuna/weights/
 
   # generation configs
   prompt: ""
diff --git a/minigpt4/models/mini_gpt4.py b/minigpt4/models/mini_gpt4.py
index 667edd56..ca84511a 100644
--- a/minigpt4/models/mini_gpt4.py
+++ b/minigpt4/models/mini_gpt4.py
@@ -10,6 +10,7 @@
 from minigpt4.models.modeling_llama import LlamaForCausalLM
 from transformers import LlamaTokenizer
 
+CUDA = torch.cuda.is_available()
 
 @registry.register_model("mini_gpt4")
 class MiniGPT4(Blip2Base):
@@ -86,17 +87,19 @@ def __init__(
         self.llama_tokenizer = LlamaTokenizer.from_pretrained(llama_model, use_fast=False)
         self.llama_tokenizer.pad_token = self.llama_tokenizer.eos_token
 
+        torch_dtype = torch.float16 if CUDA else torch.float32
         if self.low_resource:
             self.llama_model = LlamaForCausalLM.from_pretrained(
                 llama_model,
-                torch_dtype=torch.float16,
-                load_in_8bit=True,
-                device_map={'': device_8bit}
+                torch_dtype=torch_dtype,
+                load_in_8bit=CUDA,
+                offload_folder="model/offload",
+                device_map={'': device_8bit} if CUDA else 'auto'
             )
         else:
             self.llama_model = LlamaForCausalLM.from_pretrained(
                 llama_model,
-                torch_dtype=torch.float16,
+                torch_dtype=torch_dtype,
             )
 
         for name, param in self.llama_model.named_parameters():
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 00000000..dc648610
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,85 @@
+accelerate==0.18.0
+aiofiles==23.1.0
+aiohttp==3.8.4
+aiosignal==1.3.1
+altair==4.2.2
+antlr4-python3-runtime==4.9.3
+anyio==3.6.2
+async-timeout==4.0.2
+attrs==23.1.0
+bitsandbytes==0.38.1
+braceexpand==0.1.7
+certifi==2022.12.7
+charset-normalizer==3.1.0
+click==8.1.3
+contourpy==1.0.7
+cycler==0.11.0
+entrypoints==0.4
+eva-decord==0.6.1
+fastapi==0.95.1
+ffmpy==0.3.0
+filelock==3.12.0
+fonttools==4.39.3
+frozenlist==1.3.3
+fsspec==2023.4.0
+gradio==3.28.1
+gradio_client==0.1.4
+h11==0.14.0
+httpcore==0.17.0
+httpx==0.24.0
+huggingface-hub==0.14.1
+idna==3.4
+iopath==0.1.10
+Jinja2==3.1.2
+jsonschema==4.17.3
+kiwisolver==1.4.4
+linkify-it-py==2.0.0
+markdown-it-py==2.2.0
+MarkupSafe==2.1.2
+matplotlib==3.7.1
+mdit-py-plugins==0.3.3
+mdurl==0.1.2
+mpmath==1.3.0
+multidict==6.0.4
+networkx==3.1
+numpy==1.24.3
+omegaconf==2.3.0
+opencv-python==4.7.0.72
+orjson==3.8.11
+packaging==23.1
+pandas==2.0.1
+Pillow==9.5.0
+portalocker==2.7.0
+psutil==5.9.5
+pydantic==1.10.7
+pydub==0.25.1
+pyparsing==3.0.9
+pyrsistent==0.19.3
+python-dateutil==2.8.2
+python-multipart==0.0.6
+pytz==2023.3
+PyYAML==6.0
+regex==2023.3.23
+requests==2.29.0
+semantic-version==2.10.0
+sentencepiece==0.1.98
+six==1.16.0
+sniffio==1.3.0
+socksio==1.0.0
+starlette==0.26.1
+sympy==1.11.1
+timm==0.6.13
+tokenizers==0.13.3
+toolz==0.12.0
+torch==2.0.0
+torchvision==0.15.1
+tqdm==4.65.0
+transformers==4.28.1
+typing_extensions==4.5.0
+tzdata==2023.3
+uc-micro-py==1.0.1
+urllib3==1.26.15
+uvicorn==0.22.0
+webdataset==0.2.48
+websockets==11.0.2
+yarl==1.9.2

From e2c0d3f82adcb561efbb1c55a20c420eaa718429 Mon Sep 17 00:00:00 2001
From: wacdev
Date: Sun, 30 Apr 2023 13:54:09 +0800
Subject: [PATCH 2/2] add demo.sh

---
 demo.py                         | 0
 demo.sh                         | 7 +++++++
 eval_configs/minigpt4_eval.yaml | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)
 mode change 100644 => 100755 demo.py
 create mode 100755 demo.sh

diff --git a/demo.py b/demo.py
old mode 100644
new mode 100755
diff --git a/demo.sh b/demo.sh
new file mode 100755
index 00000000..78c95b15
--- /dev/null
+++ b/demo.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+DIR=$(realpath "$0") && DIR=${DIR%/*}
+cd "$DIR"
+set -ex
+
+python demo.py --cfg-path "$DIR/eval_configs/minigpt4_eval.yaml"
diff --git a/eval_configs/minigpt4_eval.yaml b/eval_configs/minigpt4_eval.yaml
index f9e55a30..3c7df359 100644
--- a/eval_configs/minigpt4_eval.yaml
+++ b/eval_configs/minigpt4_eval.yaml
@@ -8,7 +8,7 @@ model:
   low_resource: True
   prompt_path: "prompts/alignment.txt"
   prompt_template: '###Human: {} ###Assistant: '
-  ckpt: '/path/to/pretrained/ckpt/'
+  ckpt: ./model/pretrained_minigpt4.pth # /path/to/pretrained/ckpt/
 
 
 datasets:
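With both patches applied, a run on a Mac would look roughly like the sketch below. The paths are assumptions: they presume the Vicuna weights and the pretrained MiniGPT-4 checkpoint have already been placed under ./model/, as the edited minigpt4.yaml and minigpt4_eval.yaml expect.

```bash
# Install the pinned dependencies, then launch the Gradio demo on the CPU.
pip install -r requirements.txt
./demo.sh   # equivalent to: python demo.py --cfg-path eval_configs/minigpt4_eval.yaml
```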