Enhance qa evaluation #823

Merged 35 commits into release/1.7.0 from enhance-qa-evaluation on Oct 17, 2023
Conversation

@Prikshit7766 (Contributor) commented Oct 16, 2023

Description

This pull request introduces two categories of distance metrics: Embedding Distance Metrics and String Distance Metrics, enhancing the capabilities of the library for comparing embeddings and strings.

Notebook

Distance Metrics for Comparing Embeddings

Supported embedding hubs:

  • Huggingface
  • OpenAI

| Metric Name | Description |
| ----------- | ----------- |
| Cosine similarity | Measures the cosine of the angle between two vectors. |
| Euclidean distance | Calculates the straight-line distance between two points in space. |
| Manhattan distance | Computes the sum of the absolute differences between corresponding elements of two vectors. |
| Chebyshev distance | Determines the maximum absolute difference between corresponding elements of two vectors. |
| Hamming distance | Measures the number of positions at which the corresponding symbols of two equal-length sequences differ. |
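Each of these reduces to a short NumPy expression. The following is a minimal illustrative sketch, assuming two already-computed embedding vectors; it is not the library's internal implementation:

```python
import numpy as np

# Two example embedding vectors (in practice these come from an
# embedding model such as text-embedding-ada-002).
a = np.array([0.1, 0.3, 0.6])
b = np.array([0.2, 0.1, 0.7])

cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine of the angle
euclidean = np.linalg.norm(a - b)                             # straight-line distance
manhattan = np.abs(a - b).sum()                               # sum of absolute differences
chebyshev = np.abs(a - b).max()                               # maximum absolute difference
hamming = (a != b).sum()                                      # positions that differ
```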

String Distance Metrics

| Metric Name | Description |
| ----------- | ----------- |
| jaro | Measures the similarity between two strings based on the number of matching characters and transpositions. |
| jaro_winkler | An extension of the Jaro metric that gives additional weight to common prefixes. |
| hamming | Measures the number of positions at which the corresponding symbols of two equal-length strings differ. |
| levenshtein | Calculates the minimum number of single-character edits (insertions, deletions, substitutions) required to transform one string into another. |
| damerau_levenshtein | Similar to Levenshtein distance, but also allows transpositions as a valid edit operation. |
| indel | Focuses on the number of insertions and deletions required to match two strings. |
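These metric names mirror those exposed by the RapidFuzz package. Whether this PR uses RapidFuzz under the hood is an assumption, but the sketch below (assuming `rapidfuzz` is installed) shows each metric in action:

```python
# Illustrative only: exercising the same set of string metrics via RapidFuzz.
# It is an assumption, not a fact from this PR, that RapidFuzz is the backend.
from rapidfuzz.distance import (
    DamerauLevenshtein, Hamming, Indel, Jaro, JaroWinkler, Levenshtein,
)

s1, s2 = "evaluation", "evaluatoin"  # one adjacent transposition

print(Jaro.similarity(s1, s2))              # matching characters + transpositions
print(JaroWinkler.similarity(s1, s2))       # extra weight on the common prefix
print(Hamming.distance(s1, s2))             # differing positions (equal lengths)
print(Levenshtein.distance(s1, s2))         # insertions, deletions, substitutions
print(DamerauLevenshtein.distance(s1, s2))  # counts a transposition as one edit
print(Indel.distance(s1, s2))               # insertions and deletions only
```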

Impact

  • Applicability to QA evaluations: these metrics expand the library's utility for question-answering (QA) evaluation. With the ability to measure both embedding and string distances, users can run more comprehensive assessments.
  • Experimentation with evaluation strategies: users can experiment with evaluation strategies tailored to their specific use cases.
  • Comprehensive evaluation toolkit: together, the two categories of metrics give users a diverse set of evaluation approaches that covers a wide range of QA evaluations.

Results

| original_question | perturbed_question | expected_result | actual_result | eval_score | pass |
| --- | --- | --- | --- | --- | --- |
| Where are you likely to find a hamburger? | WHERE ARE YOU LIKELY TO FIND A HAMBURGER? A. FAST FOOD RESTAURANT B. PIZZA C. GROUND UP DEAD COWS D. MOUTH E. COW CARCASS | A. fast food restaurant | A. FAST FOOD RESTAURANT | 0.999998 | True |
| James was looking for a good place to buy farmland. Where might he look? | James was looking for a good place to buy farmland. Where might he look? A. midwest B. countryside C. estate D. farming areas E. illinois | D. farming areas | D. farming areas | 1.000000 | True |

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Usage

Configuration Structure

To configure your embedding models and evaluation metrics, you can use a YAML configuration file. The configuration structure includes:

  • `model_parameters` specifying model-related settings.
  • `evaluation` setting the evaluation metric, distance, and threshold.
  • `embeddings` allowing you to choose the embedding model and hub.
  • `tests` defining different test scenarios and their `min_pass_rate`.

Here's an example of the configuration structure:

```yaml
model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8

embeddings:
  model: text-embedding-ada-002
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0

  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
```
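To make the `evaluation` block concrete, here is a minimal, self-contained sketch (illustrative only, not the library's implementation) of how the metric, distance, and threshold fields can combine into a pass/fail decision. The `embed` helper is a hypothetical stand-in for the configured hub; in practice it would call, e.g., OpenAI's text-embedding-ada-002:

```python
import numpy as np
import yaml

CONFIG = yaml.safe_load("""
evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8
""")

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real embedding-hub call; here a toy
    # character-frequency vector so the sketch runs offline.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

expected = embed("A. fast food restaurant")
actual = embed("A. FAST FOOD RESTAURANT")

eval_score = cosine_similarity(expected, actual)
passed = eval_score >= CONFIG["evaluation"]["threshold"]
print(eval_score, passed)  # the lowercase perturbation scores 1.0 here
```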

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

@chakravarthik27 (Collaborator) left a comment:

LGTM 😄

@ArshaanNazir merged commit 2846664 into release/1.7.0 on Oct 17, 2023 (3 checks passed)
@ArshaanNazir linked issue "Enhance QA Evaluation with Langchain" on Oct 17, 2023, which may be closed by this pull request
@Prikshit7766 deleted the enhance-qa-evaluation branch on October 22, 2023