Evaluation for Osprey 🔎

This document provides instructions for evaluating Osprey on four representative tasks: open-vocabulary segmentation, referring object classification, detailed region description, and region-level captioning. It also covers evaluation on Ferret-Bench and POPE.

We have developed two types of models: the first is Osprey, the second is Osprey-Chat (denoted Osprey* in our paper). Osprey-Chat exhibits better conversation and image-level understanding and reasoning capabilities, as it is trained with additional LLaVA data (llava_v1_5_mix665k.json).

1. Open-Vocabulary Segmentation

Download the SentenceBERT model, which is used to calculate semantic similarity.
The evaluation is based on detectron2; please install the following dependencies.

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git

Prepare the datasets; please refer to Data preparation.

Cityscapes

cd osprey/eval
python eval_open_vocab_seg_detectron2.py --dataset cityscapes --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2

ADE20K

cd osprey/eval
python eval_open_vocab_seg_detectron2.py --dataset ade --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2
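The semantic-similarity step mentioned above can be illustrated with a minimal sketch: a free-form predicted label is mapped to the dataset category whose embedding has the highest cosine similarity. This is not the code in eval_open_vocab_seg_detectron2.py. The function names and the toy 3-dimensional vectors are hypothetical; in practice the embeddings would presumably come from the SentenceBERT model (all-MiniLM-L6-v2).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def closest_category(pred_embedding, category_embeddings):
    """Return (name, score) of the category most similar to the prediction."""
    best_name, best_score = None, float("-inf")
    for name, embedding in category_embeddings.items():
        score = cosine_similarity(pred_embedding, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy embeddings standing in for SentenceBERT outputs.
category_embeddings = {
    "road": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "person": [0.0, 0.0, 1.0],
}
pred_embedding = [0.1, 0.9, 0.2]  # embedding of the model's free-form answer

name, score = closest_category(pred_embedding, category_embeddings)
print(name, round(score, 3))
```

With real sentence embeddings the same argmax applies; only the source of the vectors changes.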

2. Referring Object Classification

LVIS

Download our generated lvis_val_1k_category.json (we randomly sampled 1K images with 4,004 objects from the LVIS dataset).

cd osprey/eval
python lvis_paco_eval.py --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2 --img path/to/coco-all-imgs --json lvis_val_1k_category.json

PACO

Download our generated paco_val_1k_category.json (we randomly sampled 1K images with 4,263 objects from the PACO dataset).

cd osprey/eval
python lvis_paco_eval.py --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2 --img path/to/coco-all-imgs --json paco_val_1k_category.json
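Since the LVIS and PACO commands differ only in the --json argument, both runs can be driven from one small script. The sketch below only builds and prints the two command lines shown above; the helper name build_eval_command is hypothetical, and the path/to/... values are the same placeholders used in this README.

```python
import shlex


def build_eval_command(json_file, model, bert, img):
    """Build the lvis_paco_eval.py argument list for one annotation file."""
    return [
        "python", "lvis_paco_eval.py",
        "--model", model,
        "--bert", bert,
        "--img", img,
        "--json", json_file,
    ]


# Placeholder paths copied from the commands above; replace with real ones.
commands = [
    build_eval_command(
        json_file,
        model="path/to/osprey-7b",
        bert="path/to/all-MiniLM-L6-v2",
        img="path/to/coco-all-imgs",
    )
    for json_file in ("lvis_val_1k_category.json", "paco_val_1k_category.json")
]

for cmd in commands:
    # To execute instead of print, one option is:
    # subprocess.run(cmd, cwd="osprey/eval", check=True)
    print(shlex.join(cmd))
```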

3. Detailed Region Description

Fill in the GPT interface in eval_gpt.py.
Change the path in gpt_eval.sh.

cd osprey/eval
sh gpt_eval.sh

4. Ferret-Bench

Note that we have converted the boxes in box_refer_caption.json and box_refer_reason.json to polygon format denoted by segmentation.
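For reference, a box-to-polygon conversion of this kind can be sketched as follows. This is an illustration, not the script used to produce the provided files: it assumes boxes are given as [x1, y1, x2, y2] corner coordinates and that the polygon is stored COCO-style as a list of flat [x, y, x, y, ...] lists. The function name box_to_polygon is hypothetical.

```python
def box_to_polygon(box):
    """Convert an [x1, y1, x2, y2] box into a COCO-style polygon list.

    Assumption: (x1, y1) is the top-left corner and (x2, y2) the
    bottom-right corner. The result holds a single polygon whose vertices
    run top-left, top-right, bottom-right, bottom-left.
    """
    x1, y1, x2, y2 = box
    return [[x1, y1, x2, y1, x2, y2, x1, y2]]


# Hypothetical example entry, not taken from the Ferret-Bench files.
segmentation = box_to_polygon([10, 20, 30, 40])
print(segmentation)
```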

Referring Description

cd osprey/eval
python ferret_bench_eval.py --model_name path/to/osprey-chat-7b --root_path path/to/coco_imgs --json_path ./ferret_bench/box_refer_caption.json

Referring Reasoning

cd osprey/eval
python ferret_bench_eval.py --model_name path/to/osprey-chat-7b --root_path path/to/coco_imgs --json_path ./ferret_bench/box_refer_reason.json

Then use GPT-4 to evaluate the results, as in Ferret.

5. POPE

Download coco from POPE and put it under osprey/eval/pope.
Change the path in pope_eval.sh.

cd osprey/eval
sh pope_eval.sh
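POPE poses yes/no questions about object presence and commonly reports accuracy, precision, recall, F1, and the ratio of "yes" answers. The sketch below shows how these numbers follow from a list of answers; it is not the implementation behind pope_eval.sh, and the function name and the plain "yes"/"no" input format are assumptions.

```python
def pope_metrics(predictions, labels):
    """Compute yes/no metrics, treating "yes" as the positive class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        if pred == "yes" and label == "yes":
            tp += 1
        elif pred == "yes" and label == "no":
            fp += 1
        elif pred == "no" and label == "no":
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(labels),
    }


# Hypothetical answers for five questions.
metrics = pope_metrics(
    predictions=["yes", "yes", "no", "no", "yes"],
    labels=["yes", "no", "no", "yes", "yes"],
)
print(metrics)
```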

6. Region Level Captioning

We fine-tune Osprey-7B on the training set of RefCOCOg. The fine-tuned model can be found in Osprey-7B-refcocog-fintune.
Download finetune_refcocog_val_with_mask.json.
Generate output json files:

cd osprey/eval
python refcocog_eval.py --model path/to/Osprey-7B-refcocog-fintune --img path/to/coco-all-imgs --json finetune_refcocog_val_with_mask.json

Finally, evaluate the output json file using CaptionMetrics.
