Feat v1.1 Updates to chat client, retrieval, and evaluations #54

Merged
merged 74 commits into from
Feb 12, 2024

Conversation

parambharat
Contributor

Overview

This PR introduces enhancements and fixes to the chat client, focusing on query handling, You.com retrieval, and various formatting and compatibility improvements.

Key Enhancements

  • Query Enhancements: Added a query enhancer to classify query intents, and identify keywords and sub-queries from a user query.
  • Chat History Retrieval: Implemented a feature for more efficient chat history retrieval.
  • Compatibility and Formatting Fixes: Addressed issues related to Pydantic v1 compatibility, JSON formatting errors, and Slack message formatting.
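
As a rough illustration of the query enhancer described above, the sketch below shows one possible shape for its output. It is not the PR's implementation: the PR classifies intents with a model, whereas this uses toy string rules, and `EnhancedQuery`, `enhance_query`, and the intent names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnhancedQuery:
    """Hypothetical container for the enhancer's output."""

    query: str
    intents: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    sub_queries: List[str] = field(default_factory=list)


def enhance_query(query: str) -> EnhancedQuery:
    """Toy, rule-based stand-in for a model-backed enhancer (illustration only)."""
    # Split a compound question into sub-queries at each "?".
    sub_queries = [part.strip() + "?" for part in query.split("?") if part.strip()]
    # Placeholder keyword extraction: keep words longer than four characters.
    words = [w.strip(".,?!").lower() for w in query.split()]
    keywords = sorted({w for w in words if len(w) > 4})
    # Placeholder intent rule; the intent names are made up for this sketch.
    intents = ["product_support"] if "wandb" in query.lower() else ["unrelated"]
    return EnhancedQuery(query, intents, keywords, sub_queries)
```

The point is only that downstream retrieval can consume intents, keywords, and sub-queries as separate fields rather than re-parsing the raw query.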

Additional Features

  • Improved chat prompting method.
  • Separated the query handler for better modularity.
  • Enhanced ingestion pipeline and evaluation processes.

src/wandbot/api/schemas.py (outdated)
query: str
language: str = "en"
initial_k: int = 50
top_k: int = 5
Member

Maybe add tags here to do filtering on the app side, after retrieval but before the response

Contributor Author

Added `include_tags` and `exclude_tags` keys for both inclusions and exclusions
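
A minimal sketch of how such keys could be applied after retrieval and before the response. Only the key names `include_tags` and `exclude_tags` come from this thread; the `filter_by_tags` helper, the dict-shaped results, and the "match any included tag" semantics are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def filter_by_tags(
    results: Iterable[Dict],
    include_tags: Optional[List[str]] = None,
    exclude_tags: Optional[List[str]] = None,
) -> List[Dict]:
    """Keep results that carry any included tag and none of the excluded tags."""
    include = set(include_tags or [])
    exclude = set(exclude_tags or [])
    kept = []
    for item in results:
        tags = set(item.get("tags", []))
        # An empty include list means "no inclusion constraint".
        if include and not (tags & include):
            continue
        # Any overlap with the exclusion list drops the result.
        if tags & exclude:
            continue
        kept.append(item)
    return kept
```

For example, `filter_by_tags(docs, include_tags=["docs"], exclude_tags=["deprecated"])` keeps only documents tagged `docs` that are not also tagged `deprecated`.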

src/wandbot/apps/slack/__main__.py
import regex as re


class MrkdwnFormatter:
Member

typo?

Contributor Author

Not a typo, Mrkdwn is actually a formatting language used by Slack
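
For context, mrkdwn differs from standard Markdown in a few markers: bold is `*text*`, strikethrough is `~text~`, and links are written `<url|label>`. The sketch below converts just those three constructs. It is not the PR's `MrkdwnFormatter`; the `to_mrkdwn` function is hypothetical, it uses the standard-library `re` module rather than `regex`, and it ignores code blocks and character escaping.

```python
import re


def to_mrkdwn(text: str) -> str:
    """Convert a small subset of Markdown to Slack mrkdwn (illustrative only)."""
    # [label](url) -> <url|label>
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r"<\2|\1>", text)
    # **bold** -> *bold*
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)
    # ~~strike~~ -> ~strike~
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)
    return text
```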

src/wandbot/apps/slack/formatter.py (outdated)
parambharat and others added 26 commits January 22, 2024 22:33

class MultiLabel(BaseModel):
    label: Labels = Field(..., description="The label for the query")
    reasoning: str = Field(
Member

I think reasoning should come before label to allow the model to think first
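
A sketch of the suggested reordering. It uses a standard-library dataclass as a stand-in for the Pydantic model in the diff, and the enum values are placeholders. The idea is that fields are emitted in declaration order, so a model following the schema writes its reasoning before committing to a label.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List


class Labels(str, Enum):
    # Placeholder values; the real enum in the PR defines more labels.
    BEST_PRACTICES = "best_practices"
    COURSE_RELATED = "course_related"


@dataclass
class MultiLabel:
    # Declared first so the reasoning is produced before the label.
    reasoning: str
    label: Labels


def schema_field_order(model) -> List[str]:
    """Return field names in declaration order, as a schema would list them."""
    return [f.name for f in fields(model)]
```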

"support",
Labels.BEST_PRACTICES.value: "The query is related to best practices for using Weights & Biases. Answer the query "
"and provide guidance where necessary",
Labels.COURSE_RELATED.value: "The query is related to a Weight & Biases course and/or skill enhancement. Answer "
Member

maybe specify that the W&B courses are "Machine Learning" and "AI" courses

{
"role": "system",
"content": (
"You are a Weights & Biases support manager. Your goal is to enhance the user query by adding "
Member

"Your goal is to enhance...", maybe something a little more specific related to what the end goal is

"Your goal is to refine, clarify and augment the user query before it is passed to another AI support assistant. Please add the following information to the query: ...."

return enhanced_query


class QueryHandlerConfig(BaseSettings):
Member

should this live in a separate config file?

return all_nodes


class RetrieverConfig(BaseSettings):
Member

should this be in a separate config?

Your job is to judge the factful consistency of the generated answer with respect to the document.
- An answer is considered factually consistent if it contents can be inferred solely from the provided documentation.
- if an answer contains true information, if the information is not found in the document, then the answer is factually inconsistent.
- The generated answer must provide only correct information according to the documentation.
Member

"correct information" feels ambiguous, maybe clarify to something more like "provide only information found in the documentation"

safe_parse_eval_response,
)

SYSTEM_TEMPLATE = """You are a Weight & Biases support expert tasked with evaluating the factful consistency of answers to questions asked by users to a technical support chatbot.
Member

in general, should the system templates be broken out into a separate file(s)?

{{
"reason": <<Provide a brief explanation for your decision here>>,
"score": <<Provide a score as per the above guidelines>>,
"decision": <<Provide your final decision here, either 'relevant', or 'irrelevant'>>
Member

same decision comment as above

except Exception as e:
    print(e)
    print(eval_response)
    score = 0
Member

maybe set to -1 just so it's extremely obvious this isn't a valid score returned by the LLM?
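
One way the sentinel idea could look, as a sketch only. `parse_eval_score` and `INVALID_SCORE` are hypothetical names, and the sketch assumes the evaluator replies with a JSON object containing a `score` key, as in the response template quoted above.

```python
import json

# Sentinel that can never be confused with a score returned by the LLM.
INVALID_SCORE = -1


def parse_eval_score(eval_response: str) -> int:
    """Return the integer score, or INVALID_SCORE if the response is unusable."""
    try:
        payload = json.loads(eval_response)
        return int(payload["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Malformed JSON, a missing key, or a non-numeric score all land here.
        return INVALID_SCORE
```

Aggregation code can then drop negative scores before averaging, so a parse failure does not silently count as the lowest valid score.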

spec = row_dict["spec"]
content = json.loads(spec)
markdown_content = self.parse_content(content)
output["content"] = (
Member

i think it'd be useful to put the title ("display_name") and description in as their own standalone kv pairs, it could be a useful bit of metadata to have
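
A sketch of that suggestion, under the assumption that `display_name` and `description` are available on the row dict. The thread does not show where they actually live, and `build_report_record` is a hypothetical helper.

```python
from typing import Dict


def build_report_record(row_dict: Dict, markdown_content: str) -> Dict:
    """Store title and description as standalone keys next to the content."""
    return {
        # Assumed source keys; adjust to wherever the values are really stored.
        "title": row_dict.get("display_name", ""),
        "description": row_dict.get("description", ""),
        "content": markdown_content,
    }
```

Keeping them as separate keys lets later stages filter or display on title and description without parsing the rendered content.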

@morganmcg1
Member

lgtm!

@parambharat parambharat merged commit 7a2baf3 into main Feb 12, 2024
4 checks passed

2 participants