add: docs for disentangled vqa metric

wandb · Jun 29, 2024 · a9902db
1 parent 76ca310

Showing 4 changed files with 68 additions and 0 deletions.
diff --git a/docs/assets/disentangled_blip_vqa.png b/docs/assets/disentangled_blip_vqa.png
diff --git a/docs/assets/disentangled_blip_vqa_dashboard.png b/docs/assets/disentangled_blip_vqa_dashboard.png
diff --git a/docs/metrics/disentangled_vqa.md b/docs/metrics/disentangled_vqa.md
@@ -0,0 +1,67 @@
+# Disentangled VQA Metrics
+
+This module implements the disentangled BLIP-VQA metric for attribute binding described in Section 4.1 of the paper [T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation](https://arxiv.org/pdf/2307.06350).
+
+| ![](../assets/disentangled_blip_vqa.png) | 
+|:--:| 
+| Using the disentangled BLIP-VQA model for attribute-binding evaluation as proposed in [T2I-CompBench](https://arxiv.org/pdf/2307.06350.pdf) |
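+
+The scoring rule from the paper can be sketched as follows. The prompt is split into its noun phrases, and the VQA model is asked one question per phrase (for example "a red apple?"). The probability of the answer "yes" is that question's score, and the final score is the product of these probabilities. This is a minimal sketch under stated assumptions, not Hemm's implementation: `disentangled_vqa_score` and `yes_probability` are hypothetical names, and a dictionary of made-up probabilities stands in for BLIP-VQA.
+
+```python
+from typing import Callable, Sequence
+
+
+def disentangled_vqa_score(
+    noun_phrases: Sequence[str],
+    yes_probability: Callable[[str], float],
+) -> float:
+    """Return the product of P("yes") over one VQA question per noun phrase.
+
+    `yes_probability` is a hypothetical stand-in for a VQA model (such as
+    BLIP-VQA) that maps a question about a fixed image to P("yes").
+    """
+    score = 1.0
+    for phrase in noun_phrases:
+        question = f"{phrase}?"  # e.g. "a red apple?"
+        p_yes = yes_probability(question)
+        if not 0.0 <= p_yes <= 1.0:
+            raise ValueError(f"P('yes') out of range for {question!r}: {p_yes}")
+        score *= p_yes
+    return score
+
+
+# Made-up probabilities standing in for BLIP-VQA answers on one image.
+fake_answers = {"a red apple?": 0.9, "a blue bench?": 0.5}
+example_score = disentangled_vqa_score(
+    ["a red apple", "a blue bench"], fake_answers.__getitem__
+)
+print(example_score)
+```
+
+Because the score is a product, a single low-probability phrase (for instance, an object rendered with the wrong attribute) pulls the whole score down, which is what makes the metric sensitive to attribute binding.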
+
+| ![](../assets/disentangled_blip_vqa_dashboard.png) | 
+|:--:| 
+| Weave gives a holistic view of the evaluation, letting you drill into individual outputs and scores. |
+
+!!! example
+    ## Step 1: Generate evaluation dataset
+
+    Generate a dataset of prompts in the format `a {adj_1} {noun_1} and a {adj_2} {noun_2}`, along with the corresponding metadata, using an LLM that can produce JSON objects, such as GPT-4o. The dataset is then published both as a [W&B dataset artifact](https://docs.wandb.ai/guides/artifacts) and as a
+    [Weave dataset](https://wandb.github.io/weave/guides/core-types/datasets).
+
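+    ```python
+    from hemm.metrics.attribute_binding import AttributeBindingDatasetGenerator
+
+    dataset_generator = AttributeBindingDatasetGenerator(
+        openai_model="gpt-4o",  # LLM that generates the prompts and metadata as JSON
+        openai_seed=42,  # seed for more reproducible generations
+        num_prompts_in_single_call=20,  # prompts requested per API call
+        num_api_calls=50,  # number of API calls (up to 20 * 50 = 1000 prompts)
+        project_name="disentangled_vqa",  # project the dataset is published to
+    )
+
+    # Generate the dataset, dumping the generated files under ./dump
+    dataset_generator(dump_dir="./dump")
+    ```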
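+
+    Each generated prompt fills the template above with two adjective-noun pairs. The record below is a hypothetical layout (the class and field names are illustrative, not the schema Hemm actually asks GPT-4o to produce); it shows how one entry yields both the full prompt and the disentangled noun phrases that are turned into VQA questions.
+
+    ```python
+    from dataclasses import dataclass
+
+
+    @dataclass(frozen=True)
+    class AttributeBindingPrompt:
+        """Hypothetical record; field names are illustrative, not Hemm's schema."""
+
+        adj_1: str
+        noun_1: str
+        adj_2: str
+        noun_2: str
+
+        @property
+        def prompt(self) -> str:
+            # Fill the template "a {adj_1} {noun_1} and a {adj_2} {noun_2}".
+            return f"a {self.adj_1} {self.noun_1} and a {self.adj_2} {self.noun_2}"
+
+        @property
+        def noun_phrases(self) -> list[str]:
+            # One phrase per object; each becomes a separate VQA question.
+            return [f"a {self.adj_1} {self.noun_1}", f"a {self.adj_2} {self.noun_2}"]
+
+
+    record = AttributeBindingPrompt("red", "apple", "blue", "bench")
+    print(record.prompt)        # a red apple and a blue bench
+    print(record.noun_phrases)  # ['a red apple', 'a blue bench']
+    ```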
+
+    ## Step 2: Evaluate
+
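+    ```python
+    import wandb
+    import weave
+
+    from hemm.eval_pipelines import BaseDiffusionModel, EvaluationPipeline
+    from hemm.metrics.attribute_binding.disentangled_vqa import DisentangledVQAMetric
+    from hemm.metrics.attribute_binding.judges import BlipVQAJudge
+
+    # Example configuration; replace these placeholder values with your own.
+    project = "disentangled_vqa"  # project the dataset was published to in Step 1
+    entity = None  # your W&B entity (None uses your default entity)
+    diffusion_model_address = "stabilityai/stable-diffusion-2-1"  # example model ID
+    diffusion_model_enable_cpu_offfload = False
+    image_size = (768, 768)  # (height, width)
+    dataset = ...  # placeholder: the Weave dataset published in Step 1
+
+    wandb.init(project=project, entity=entity, job_type="evaluation")
+    weave.init(project_name=project)
+
+    # Text-to-image model under evaluation
+    diffusion_model = BaseDiffusionModel(
+        diffusion_model_name_or_path=diffusion_model_address,
+        enable_cpu_offfload=diffusion_model_enable_cpu_offfload,
+        image_height=image_size[0],
+        image_width=image_size[1],
+    )
+    evaluation_pipeline = EvaluationPipeline(model=diffusion_model)
+
+    # BLIP-VQA judge answers one question per noun phrase; the metric scores the answers
+    judge = BlipVQAJudge()
+    metric = DisentangledVQAMetric(judge=judge, name="disentangled_blip_metric")
+    evaluation_pipeline.add_metric(metric)
+
+    evaluation_pipeline(dataset=dataset)
+    ```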
+
+## Metrics
+
+:::hemm.metrics.attribute_binding.disentangled_vqa
+
+## Judges
+
+:::hemm.metrics.attribute_binding.judges
+
+## Dataset Generation
+
+:::hemm.metrics.attribute_binding.dataset_generator
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -46,6 +46,7 @@ nav:
     - Image-Quality-Metrics: 'metrics/image_quality.md'
     - Prompt-Image-Alignment: 'metrics/prompt_image_alignment.md'
     - Spatial-Relationship: 'metrics/spatial_relationship.md'
+    - Disentangled-VQA: 'metrics/disentangled_vqa.md'
   - Utils: 'utils.md'
 
 repo_url: https://github.com/wandb/Hemm