Added support for the HyDE method in query analysis for the RAG module #1413

lanlanguai · 2024-07-26T06:18:26Z

Features
Added the HyDE method for query analysis in the RAG module, including an example for better understanding.
Fixed TestRAGEmbeddingFactory, whose static methods could not be called. The previous code passed static methods as parameters to a parameterized test, but the static methods were not callable objects in that context, which raised a TypeError. The fix converts them to regular functions defined outside the class.

Feature Docs
No additional documentation provided.

Influence
As an optional process in RAG, query-analysis will rewrite queries to enhance search results.
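
As a rough illustration of what this rewrite step does, the sketch below follows the HyDE idea: ask an LLM for a hypothetical answer passage, then use that text (optionally together with the original query) as the strings to embed for retrieval. The names `hyde_rewrite` and `fake_llm` are hypothetical and the LLM is a stub, so this is an assumption-laden sketch rather than the code in this PR.

```python
from typing import Callable, List


def hyde_rewrite(
    query: str,
    generate: Callable[[str], str],
    include_original: bool = True,
) -> List[str]:
    """Return the strings to embed for retrieval under the HyDE approach."""
    prompt = (
        "Write a short passage that answers the question.\n"
        f"Question: {query}\n"
        "Passage:"
    )
    # The hypothetical document replaces the raw query as the search text.
    hypothetical_doc = generate(prompt)
    embedding_strs = [hypothetical_doc]
    if include_original:
        # Optionally keep the original query alongside the generated passage.
        embedding_strs.append(query)
    return embedding_strs


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (assumption for this sketch)."""
    return "Bob likes to travel to Paris in spring."
```

Calling `hyde_rewrite("Where does Bob like to go?", fake_llm)` returns the generated passage followed by the original question; with `include_original=False` only the passage is returned.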

Result
All unit tests for the new features have passed.
The query-analysis process in the RAG module runs smoothly, effectively rewriting and optimizing queries for better search results.

Other
Added a detailed description of the changes and fixes made in the submission.

Mock functions (mock_openai_embedding, mock_azure_embedding, mock_gemini_embedding, and mock_ollama_embedding) have been added. Reason: the previous code passed static methods as parameters to a parameterized test, but a static method was not a callable object in that context, which resulted in a TypeError.
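
A minimal sketch of the difference described above, under stated assumptions: the class `EmbeddingMocks`, the `CASES` list, and the string return values are hypothetical stand-ins, and a plain list takes the place of the parameterized-test decorator. Inside a class body, a name decorated with `@staticmethod` refers to the wrapper object rather than the function, and before Python 3.10 that wrapper cannot be called directly.

```python
class EmbeddingMocks:
    @staticmethod
    def mock_openai_embedding() -> str:
        return "openai"

    # Inside the class body this name is bound to the staticmethod wrapper,
    # not to the underlying function. Before Python 3.10 calling the wrapper
    # directly raises TypeError.
    captured_in_class_body = mock_openai_embedding


# Module-level functions are ordinary callables wherever they are referenced.
def mock_openai_embedding() -> str:
    return "openai"


def mock_azure_embedding() -> str:
    return "azure"


# Stand-in for the parameter list handed to a parameterized test.
CASES = [
    ("openai", mock_openai_embedding),
    ("azure", mock_azure_embedding),
]


def run_cases(cases):
    """Call each factory and record whether it returns the expected label."""
    return {expected: factory() == expected for expected, factory in cases}
```

Moving the mocks to module level avoids the wrapper entirely, so the same parameter list behaves identically on every Python version.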

codecov-commenter · 2024-07-26T07:06:35Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 15.62500% with 27 lines in your changes missing coverage. Please review.

Project coverage is 55.66%. Comparing base (c0abe17) to head (2819b2e).
Report is 12 commits behind head on main.

Files	Patch %	Lines
metagpt/rag/query_analysis/HyDE.py	0.00%	14 Missing ⚠️
metagpt/rag/factories/HyDEQueryTransformFactory.py	0.00%	13 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1413       +/-   ##
===========================================
+ Coverage   30.64%   55.66%   +25.01%     
===========================================
  Files         320      323        +3     
  Lines       19426    19458       +32     
===========================================
+ Hits         5954    10831     +4877     
+ Misses      13472     8627     -4845

better629 · 2024-08-12T05:52:04Z

config/config2.example.yaml

@@ -20,6 +20,10 @@ embedding:
  embed_batch_size: 100
  dimensions: # output dimension of embedding model

+# RAG Analysis
+hyde:


Use a nested structure like the following to support more configuration inside rag:
rag:
  query:
    hyde:
      include_original: True

better629 · 2024-08-12T05:58:38Z

config/config2.yaml

no need to commit this file if there are no related changes.

better629 · 2024-08-12T05:59:28Z

examples/rag_pipeline.py

 from pydantic import BaseModel

 from metagpt.const import DATA_PATH, EXAMPLE_DATA_PATH
 from metagpt.logs import logger
 from metagpt.rag.engines import SimpleEngine
+from metagpt.rag.factories.HyDEQueryTransformFactory import HyDEQueryTransformFactory


File names are usually lowercase with underscores.

better629 · 2024-08-12T05:59:42Z

examples/rag_pipeline.py

@@ -212,6 +214,22 @@ async def init_and_query_es(self):
        answer = await engine.aquery(TRAVEL_QUESTION)
        self._print_query_result(answer)

+    async def use_HyDe(self):


Rename this to use_hyde, and keep the spelling uniform elsewhere: HyDE, not HyDe.

better629 · 2024-08-12T06:01:09Z

metagpt/config2.py

@@ -51,6 +52,9 @@ class Config(CLIParams, YamlModel):
    # RAG Embedding
    embedding: EmbeddingConfig = EmbeddingConfig()

+    # RAG Analysis
+    hyde: HydeConfig = HydeConfig()


better629 · 2024-08-12T06:02:51Z

metagpt/configs/query_analysis_config.py

@@ -0,0 +1,5 @@
+from metagpt.utils.yaml_model import YamlModel
+


use rag_config.py to support independent rag configuration
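
One way such an independent RAG configuration could be laid out is sketched below. The key names (`query_analysis`, `hyde`, `include_original`) follow the later revision of config2.example.yaml shown in this thread; the class names and the `from_dict` helper are hypothetical, and plain dataclasses are used only to keep the sketch self-contained, whereas the project's config classes build on YamlModel.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class HyDEConfig:
    # Whether the original query is kept next to the hypothetical document.
    include_original: bool = True


@dataclass
class QueryAnalysisConfig:
    hyde: HyDEConfig = field(default_factory=HyDEConfig)


@dataclass
class RAGConfig:
    query_analysis: QueryAnalysisConfig = field(default_factory=QueryAnalysisConfig)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RAGConfig":
        """Build the config from a parsed YAML mapping, with defaults for missing keys."""
        query_data = data.get("query_analysis") or {}
        hyde_data = query_data.get("hyde") or {}
        return cls(query_analysis=QueryAnalysisConfig(hyde=HyDEConfig(**hyde_data)))
```

Nesting the sections this way leaves room for further RAG options later without adding more top-level fields to the main config.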

better629 · 2024-08-12T06:04:46Z

metagpt/rag/query_analysis/HyDE.py

+
+        if self._include_original:
+            embedding_strs.extend(query_bundle.embedding_strs)
+        logger.info(f" Hypothetical doc:{embedding_strs} ")


Avoid logging the embedding strings; they are too long to make a useful log message.
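
A possible alternative, sketched with the standard `logging` module: log a count and a truncated preview instead of the full strings. The helper names and the 60-character limit are assumptions for illustration, not part of the PR.

```python
import logging
from typing import List

logger = logging.getLogger("hyde_example")


def summarize_embedding_strs(embedding_strs: List[str], max_chars: int = 60) -> str:
    """Describe the strings by count plus a truncated preview of the first one."""
    if not embedding_strs:
        return "0 embedding strings"
    first = embedding_strs[0]
    preview = first[:max_chars]
    if len(first) > max_chars:
        preview += "..."
    return f"{len(embedding_strs)} embedding strings, first: {preview!r}"


def log_embedding_strs(embedding_strs: List[str]) -> None:
    """Emit the short summary at debug level rather than the full text."""
    logger.debug("HyDE produced %s", summarize_embedding_strs(embedding_strs))
```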

better629 · 2024-08-12T06:08:17Z

examples/rag_pipeline.py

+        engine = SimpleEngine.from_docs(input_files=[TRAVEL_DOC_PATH])
+        # create HyDE query engine
+        hyde_query_transformr = HyDEQueryTransformFactory().create_hyde_query_transform()
+        hyde_query_engine = TransformQueryEngine(engine, hyde_query_transformr)


How can this be integrated with SimpleEngine, rather than using TransformQueryEngine directly?
What I mean is a single engine entry point that supports query rewriting, reranking, and so on.
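
The shape of such a single entry point might look like the toy sketch below. Everything here is hypothetical (`ToyEngine`, the callable-based stages, the keyword-match retrieval) and says nothing about how SimpleEngine is actually structured; it only illustrates optional rewrite and rerank stages behind one `query` method.

```python
from typing import Callable, List, Optional

QueryTransform = Callable[[str], str]
Reranker = Callable[[str, List[str]], List[str]]


class ToyEngine:
    """Toy engine with one query() entry point and optional pipeline stages."""

    def __init__(
        self,
        docs: List[str],
        query_transform: Optional[QueryTransform] = None,
        reranker: Optional[Reranker] = None,
    ) -> None:
        self.docs = docs
        self.query_transform = query_transform
        self.reranker = reranker

    def _retrieve(self, query: str) -> List[str]:
        # Naive keyword overlap standing in for vector retrieval.
        words = set(query.lower().split())
        return [doc for doc in self.docs if words & set(doc.lower().split())]

    def query(self, query: str) -> List[str]:
        # Stage 1: optional query rewrite (for example, a HyDE-style transform).
        search_query = self.query_transform(query) if self.query_transform else query
        # Stage 2: retrieval.
        results = self._retrieve(search_query)
        # Stage 3: optional rerank against the original query.
        if self.reranker:
            results = self.reranker(query, results)
        return results
```

With this layout callers always go through `query`, and each stage is switched on by passing it to the constructor.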

better629 · 2024-08-12T06:09:24Z

examples/rag_pipeline.py

+        # 1.  save docs
+        engine = SimpleEngine.from_docs(input_files=[TRAVEL_DOC_PATH])
+        # create HyDE query engine
+        hyde_query_transformr = HyDEQueryTransformFactory().create_hyde_query_transform()


Add comparison results on a dataset with and without the HyDE method.

better629 · 2024-08-19T12:51:00Z

config/config2.example.yaml

@@ -23,13 +23,9 @@ rag:
  # RAG Query Analysis
  query_analysis:
    hyde:
-      include_original: true  # In the query rewrite, determines whether to include the original
+      include_original: True  # In the query rewrite, determines whether to include the original


true not True

better629 · 2024-08-19T12:53:37Z

metagpt/rag/query_analysis/hyde.py

@@ -0,0 +1,63 @@
+from typing import Any, Dict, Optional
+from llama_index.core.llms import LLM


Why is this imported? It is not used.

better629 · 2024-08-19T12:54:32Z

config/config2.example.yaml

-  api_version: ""
-  embed_batch_size: 100
-  dimensions: # output dimension of embedding model
+  embedding:


don't change this embedding one.

lanlanguai · 2024-08-20T08:38:08Z

The configurations and results of running metagpt/rag/benchmark/hotpotqa.py with and without the HyDE method are as follows:

Model	Sample_Size	HyDE_Used	Exact_Match	F1_Score
deepseek	20	yes	0.1	0.289846
deepseek	20	no	0.1	0.265604
gpt4-o	20	yes	0.55	0.726190
gpt4-o	20	no	0.45	0.626190
gpt4-o	100	yes	0.6	0.752560
gpt4-o	100	no	0.57	0.741560
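
For readers unfamiliar with the two metrics, the sketch below shows the common SQuAD-style definitions of exact match and token-level F1. It assumes those definitions apply here; how metagpt/rag/benchmark/hotpotqa.py computes its scores is not shown in this thread, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

Per-sample scores of this kind are averaged over the sample to give table-level figures, which is why F1 can rise while exact match stays flat: partially correct answers earn F1 credit but no exact-match credit.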

liaojianxing added 4 commits July 25, 2024 11:15

Added support for HyDE

0e81347

fix config2.yaml

2cc6d60

Add the HyDE example to the rag_pipeline

b4fa468

lanlanguai had a problem deploying to unittest July 26, 2024 06:18 — with GitHub Actions Failure

lanlanguai closed this Jul 26, 2024

lanlanguai deleted the rag_HyDE branch July 26, 2024 06:29

lanlanguai restored the rag_HyDE branch July 26, 2024 06:30

lanlanguai reopened this Jul 26, 2024

lanlanguai had a problem deploying to unittest July 26, 2024 06:51 — with GitHub Actions Failure

better629 reviewed Aug 12, 2024

liaojianxing and others added 4 commits August 19, 2024 15:32

Update hyde function

ec40043

add rag config

b2458d8

Formatting Files

008fe37

Merge branch 'rag_HyDE' into main

775130b

lanlanguai had a problem deploying to unittest August 19, 2024 09:03 — with GitHub Actions Failure

lanlanguai had a problem deploying to unittest August 19, 2024 09:07 — with GitHub Actions Failure

lanlanguai had a problem deploying to unittest August 19, 2024 09:08 — with GitHub Actions Failure

lanlanguai had a problem deploying to unittest August 19, 2024 09:09 — with GitHub Actions Failure

lanlanguai had a problem deploying to unittest August 19, 2024 09:10 — with GitHub Actions Failure

lanlanguai had a problem deploying to unittest August 19, 2024 09:11 — with GitHub Actions Failure

better629 reviewed Aug 19, 2024

liaojianxing added 4 commits August 20, 2024 14:54

add hotpotqa.py

3e53b33

Merge remote-tracking branch 'upstream/main'

2b85048

# Conflicts: # metagpt/config2.py

fix config

d5c3c20

Merge remote-tracking branch 'upstream/main'

2619eff

# Conflicts: # metagpt/config2.py

lanlanguai force-pushed the rag_HyDE branch from 03aed28 to 2619eff Compare August 20, 2024 08:24

lanlanguai had a problem deploying to unittest August 20, 2024 08:24 — with GitHub Actions Failure

lanlanguai added 2 commits August 20, 2024 16:25

Merge branch 'rag_HyDE' into main

c053214

Merge pull request #2 from lanlanguai/main

5fd3670

add hotpotqa

lanlanguai had a problem deploying to unittest August 20, 2024 08:26 — with GitHub Actions Failure

Update config2.example.yaml

621cb22

lanlanguai had a problem deploying to unittest August 20, 2024 08:26 — with GitHub Actions Failure

Update config2.example.yaml

83041d2

lanlanguai had a problem deploying to unittest August 20, 2024 08:27 — with GitHub Actions Failure

Delete metagpt/rag/factories/HyDEQueryTransformFactory.py

26b0285

lanlanguai had a problem deploying to unittest August 20, 2024 08:29 — with GitHub Actions Failure
