Enhance qa evaluation #823

Merged 35 commits into release/1.7.0 from enhance-qa-evaluation on Oct 17, 2023
Conversation

@Prikshit7766 (Contributor) commented Oct 16, 2023

Description

This pull request introduces two categories of distance metrics: Embedding Distance Metrics and String Distance Metrics, enhancing the capabilities of the library for comparing embeddings and strings.

Notebook

Distance Metrics for Comparing Embeddings

Supported embedding hubs:

  • Huggingface
  • OpenAI

| Metric Name | Description |
| ----------- | ----------- |
| Cosine similarity | Measures the cosine of the angle between two vectors. |
| Euclidean distance | Calculates the straight-line distance between two points in space. |
| Manhattan distance | Computes the sum of the absolute differences between corresponding elements of two vectors. |
| Chebyshev distance | Determines the maximum absolute difference between corresponding elements of two vectors. |
| Hamming distance | Measures the number of positions at which the corresponding symbols of two equal-length sequences differ. |
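Each of these reduces to a short NumPy expression. The following is a minimal illustrative sketch, assuming two already-computed embedding vectors; it is not the library's internal implementation:

```python
import numpy as np

# Two example embedding vectors (in practice these come from an
# embedding model such as text-embedding-ada-002).
a = np.array([0.1, 0.3, 0.6])
b = np.array([0.2, 0.1, 0.7])

cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine of the angle
euclidean = np.linalg.norm(a - b)                             # straight-line distance
manhattan = np.abs(a - b).sum()                               # sum of absolute differences
chebyshev = np.abs(a - b).max()                               # maximum absolute difference
hamming = (a != b).sum()                                      # positions that differ
```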

String Distance Metrics

| Metric Name | Description |
| ----------- | ----------- |
| jaro | Measures the similarity between two strings based on the number of matching characters and transpositions. |
| jaro_winkler | An extension of the Jaro metric that gives additional weight to common prefixes. |
| hamming | Measures the number of positions at which the corresponding symbols of two equal-length strings differ. |
| levenshtein | Calculates the minimum number of single-character edits (insertions, deletions, substitutions) required to transform one string into another. |
| damerau_levenshtein | Similar to Levenshtein distance, but also allows transpositions as a valid edit operation. |
| indel | Focuses on the number of insertions and deletions required to match two strings. |
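These metric names mirror those exposed by the RapidFuzz package. Whether this PR uses RapidFuzz under the hood is an assumption, but the sketch below (assuming `rapidfuzz` is installed) shows each metric in action:

```python
# Illustrative only: exercising the same set of string metrics via RapidFuzz.
# It is an assumption, not a fact from this PR, that RapidFuzz is the backend.
from rapidfuzz.distance import (
    DamerauLevenshtein, Hamming, Indel, Jaro, JaroWinkler, Levenshtein,
)

s1, s2 = "evaluation", "evaluatoin"  # one adjacent transposition

print(Jaro.similarity(s1, s2))              # matching characters + transpositions
print(JaroWinkler.similarity(s1, s2))       # extra weight on the common prefix
print(Hamming.distance(s1, s2))             # differing positions (equal lengths)
print(Levenshtein.distance(s1, s2))         # insertions, deletions, substitutions
print(DamerauLevenshtein.distance(s1, s2))  # counts a transposition as one edit
print(Indel.distance(s1, s2))               # insertions and deletions only
```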

Impact

  • Applicability to QA evaluations: these metrics expand the library's utility for question-answering (QA) evaluation. With the ability to measure both embedding and string distances, users can run more comprehensive assessments.
  • Experimentation with evaluation strategies: users can experiment with evaluation strategies tailored to their specific use cases.
  • Comprehensive evaluation toolkit: together, the two categories of metrics give users a diverse set of evaluation approaches that covers a wide range of QA evaluations.

Results

| original_question | perturbed_question | expected_result | actual_result | eval_score | pass |
| --- | --- | --- | --- | --- | --- |
| Where are you likely to find a hamburger? | WHERE ARE YOU LIKELY TO FIND A HAMBURGER? A. FAST FOOD RESTAURANT B. PIZZA C. GROUND UP DEAD COWS D. MOUTH E. COW CARCASS | A. fast food restaurant | A. FAST FOOD RESTAURANT | 0.999998 | True |
| James was looking for a good place to buy farmland. Where might he look? | James was looking for a good place to buy farmland. Where might he look? A. midwest B. countryside C. estate D. farming areas E. illinois | D. farming areas | D. farming areas | 1.000000 | True |

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Usage

Configuration Structure

To configure your embedding models and evaluation metrics, you can use a YAML configuration file. The configuration structure includes:

  • `model_parameters` specifying model-related settings.
  • `evaluation` setting the evaluation metric, distance, and threshold.
  • `embeddings` allowing you to choose the embedding model and hub.
  • `tests` defining different test scenarios and their `min_pass_rate`.

Here's an example of the configuration structure:

```yaml
model_parameters:
  temperature: 0.2
  max_tokens: 64

evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8

embeddings:
  model: text-embedding-ada-002
  hub: openai

tests:
  defaults:
    min_pass_rate: 1.0

  robustness:
    add_typo:
      min_pass_rate: 0.70
    lowercase:
      min_pass_rate: 0.70
```
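To make the `evaluation` block concrete, here is a minimal, self-contained sketch (illustrative only, not the library's implementation) of how the metric, distance, and threshold fields can combine into a pass/fail decision. The `embed` helper is a hypothetical stand-in for the configured hub; in practice it would call, e.g., OpenAI's text-embedding-ada-002:

```python
import numpy as np
import yaml

CONFIG = yaml.safe_load("""
evaluation:
  metric: embedding_distance
  distance: cosine
  threshold: 0.8
""")

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real embedding-hub call; here a toy
    # character-frequency vector so the sketch runs offline.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

expected = embed("A. fast food restaurant")
actual = embed("A. FAST FOOD RESTAURANT")

eval_score = cosine_similarity(expected, actual)
passed = eval_score >= CONFIG["evaluation"]["threshold"]
print(eval_score, passed)  # the lowercase perturbation scores 1.0 here
```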

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

@chakravarthik27 (Collaborator) left a comment:

LGTM 😄

@ArshaanNazir merged commit 2846664 into release/1.7.0 on Oct 17, 2023 (3 checks passed)
@ArshaanNazir linked issue "Enhance QA Evaluation with Langchain" on Oct 17, 2023, which may be closed by this pull request
@Prikshit7766 deleted the enhance-qa-evaluation branch on October 22, 2023