-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add: docs for disentangled vqa metric
- Loading branch information
1 parent
76ca310
commit a9902db
Showing
4 changed files
with
68 additions
and
0 deletions.
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# Disentangled VQA Metrics | ||
|
||
This module aims to implement the Spatial relationship metric described in Section 4.1 from the paper [T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation](https://arxiv.org/pdf/2307.06350). | ||
|
||
| ![](../assets/disentangled_blip_vqa.png) | | ||
|:--:| | ||
| Using the disentangled BLIP-VQA model for attribute-binding evaluation as proposed in [T2I-CompBench](https://arxiv.org/pdf/2307.06350.pdf) | | ||
|
||
| ![](../assets/disentangled_blip_vqa_dashboard.png) | | ||
|:--:| | ||
| Weave gives us a holistic view of the evaluations to drill into individual ouputs and scores. | | ||
|
||
!!! example | ||
## Step 1: Generate evaluation dataset | ||
|
||
Generate the dataset consisting of prompts in the format `“a {adj_1} {noun_1} and a {adj_2} {noun_2}”` and the corresponding metadata using an LLM capable of generating json objects like GPT4-O. The dataset is then published both as a [W&B dataset artifact](https://docs.wandb.ai/guides/artifacts) and as a | ||
[weave dataset](https://wandb.github.io/weave/guides/core-types/datasets). | ||
|
||
```python | ||
from hemm.metrics.attribute_binding import AttributeBindingDatasetGenerator | ||
|
||
dataset_generator = AttributeBindingDatasetGenerator( | ||
openai_model="gpt-4o", | ||
openai_seed=42, | ||
num_prompts_in_single_call=20, | ||
num_api_calls=50, | ||
project_name="disentangled_vqa", | ||
) | ||
|
||
dataset_generator(dump_dir="./dump") | ||
``` | ||
|
||
## Step 2: Evaluate | ||
|
||
```python | ||
import wandb | ||
import weave | ||
|
||
wandb.init(project=project, entity=entity, job_type="evaluation") | ||
weave.init(project_name=project) | ||
|
||
diffusion_model = BaseDiffusionModel( | ||
diffusion_model_name_or_path=diffusion_model_address, | ||
enable_cpu_offfload=diffusion_model_enable_cpu_offfload, | ||
image_height=image_size[0], | ||
image_width=image_size[1], | ||
) | ||
evaluation_pipeline = EvaluationPipeline(model=diffusion_model) | ||
|
||
judge = BlipVQAJudge() | ||
metric = DisentangledVQAMetric(judge=judge, name="disentangled_blip_metric") | ||
evaluation_pipeline.add_metric(metric) | ||
|
||
evaluation_pipeline(dataset=dataset) | ||
``` | ||
|
||
## Metrics | ||
|
||
:::hemm.metrics.attribute_binding.disentangled_vqa | ||
|
||
## Judges | ||
|
||
:::hemm.metrics.attribute_binding.judges | ||
|
||
## Dataset Generation | ||
|
||
:::hemm.metrics.attribute_binding.dataset_generator |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters