Allow for custom prompts and also fix flags for custom ingestion
ash0ts committed Nov 22, 2023
1 parent 045966a commit 35bae27
Showing 4 changed files with 44 additions and 13 deletions.
37 changes: 27 additions & 10 deletions README.md
@@ -42,12 +42,12 @@ poetry run python -m src.wandbot.ingestion
You will notice that the data is ingested into the `data/cache` directory and stored in three different directories `raw_data`, `vectorstore` with individual files for each step of the ingestion process.
These datasets are also stored as wandb artifacts in the project defined in the environment variable `WANDB_PROJECT` and can be accessed from the [wandb dashboard](https://wandb.ai/wandb/wandbot-dev).

#### Custom Dataset
### Custom Dataset

To run the Data Ingestion with a custom dataset you can use the following command:
To run the Data Ingestion with a custom dataset, use the following command, replacing the path below with your own <path_to_custom_dataset_config_yaml>:

```bash
poetry run python -m src.wandbot.ingestion --custom --custom_dataset_config_yaml <path_to_yaml>
poetry run python -m src.wandbot.ingestion --custom --custom_dataset_config_yaml="./src/wandbot/ingestion/custom_dataset.yaml"
```

where
@@ -67,8 +67,32 @@ The YAML is structured as follows:
is_git_repo: true
language: "en"
docstore_dir: "custom_store_en"
- CustomConfig2:
...
```
To load an index based on the custom dataset as defined above, you can set the following environment variable to an artifact path:
```bash
WANDB_INDEX_ARTIFACT="{ENTITY}/{PROJECT}/custom_index:latest"
```

#### Custom Prompt

To load a custom prompt in the format of the [chat_prompt.json](data/prompts/chat_prompt.json) file, set the following environment variable to the path of your prompt file:

```bash
CHAT_PROMPT_PATH="./data/prompts/example_custom_prompt.json"
```

### Running Chat Locally

To run the chat locally, you can use the following command:

```bash
poetry run python -m src.wandbot.chat.chat
```

### Running the Q&A Bot

Before running the Q&A bot, ensure the following environment variables are set:
@@ -106,13 +130,6 @@ For more detailed instructions on installing and running the bot, please refer t

Executing these commands will launch the API, Slackbot, and Discord bot applications, enabling you to interact with the bot and ask questions related to the Weights & Biases documentation.

#### Custom Dataset

To load an index based on the custom dataset as defined above, you can set the following environment variable to an artifact path:

```bash
WANDB_INDEX_ARTIFACT="{ENTITY}/{PROJECT}/custom_index:latest"
```

### Evaluation

6 changes: 5 additions & 1 deletion src/wandbot/chat/config.py
@@ -24,7 +24,11 @@ class ChatConfig(BaseSettings):
fallback_model_name: str = "gpt-3.5-turbo-16k-0613"
max_fallback_retries: int = 6
chat_temperature: float = 0.1
chat_prompt: pathlib.Path = pathlib.Path("data/prompts/chat_prompt.json")
chat_prompt: pathlib.Path = Field(
    "data/prompts/chat_prompt.json",
    env="CHAT_PROMPT_PATH",
    validation_alias="chat_prompt_path"
)
index_artifact: str = Field(
"wandbot/wandbot-dev/wandbot_index:latest",
env="WANDB_INDEX_ARTIFACT",
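
As a rough illustration of what the new `CHAT_PROMPT_PATH` setting is meant to do, the sketch below resolves the prompt path with the standard library only: it uses the environment variable when it is set and otherwise falls back to the default shown in the diff. It does not reproduce how `BaseSettings` resolves fields internally, and `resolve_chat_prompt` is a hypothetical helper, not part of the repository.

```python
import os
import pathlib

# Default taken from the diff above.
DEFAULT_CHAT_PROMPT = pathlib.Path("data/prompts/chat_prompt.json")


def resolve_chat_prompt(environ=None):
    """Return the prompt path from CHAT_PROMPT_PATH, else the default.

    Hypothetical helper: a stand-in for the settings lookup, not the
    mechanism the settings class actually uses.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CHAT_PROMPT_PATH")
    return pathlib.Path(raw) if raw else DEFAULT_CHAT_PROMPT


if __name__ == "__main__":
    # Prints the default unless CHAT_PROMPT_PATH is set in the shell.
    print(resolve_chat_prompt())
```

Passing a plain dict as `environ` makes the lookup easy to exercise without touching the real process environment.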
4 changes: 2 additions & 2 deletions src/wandbot/ingestion/__main__.py
@@ -26,8 +26,8 @@ def main(custom: bool, custom_dataset_config_yaml: pathlib.Path):
    print(vectorstore_artifact)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('--custom', type=bool, default=True,
    parser = argparse.ArgumentParser(description='Ingest data into wandb')
    parser.add_argument('--custom', action='store_true',
                        help='Flag for ingesting a custom dataset')
    parser.add_argument('--custom_dataset_config_yaml', type=pathlib.Path,
                        default=pathlib.Path(__file__).parent / "custom_dataset.yaml",
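
The switch from `type=bool` to `action='store_true'` matters because argparse applies `bool()` to the raw argument string, and any non-empty string is truthy. The minimal sketch below contrasts the two definitions; the helper names `parse_old` and `parse_new` are illustrative and not part of the repository.

```python
import argparse


def parse_old(argv):
    """Old definition: `type=bool` calls bool() on the raw string value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--custom", type=bool, default=True)
    return parser.parse_args(argv).custom


def parse_new(argv):
    """New definition: the flag's presence alone switches it on."""
    parser = argparse.ArgumentParser(description="Ingest data into wandb")
    parser.add_argument("--custom", action="store_true")
    return parser.parse_args(argv).custom


print(parse_old(["--custom", "False"]))  # True, because bool("False") is True
print(parse_old([]))                     # True, the default
print(parse_new([]))                     # False, flag omitted
print(parse_new(["--custom"]))           # True, flag present
```

With the old definition there was no way to turn the flag off from the command line, and a bare `--custom` followed by another option (as in the README command) would be rejected because the option expected a value.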
10 changes: 10 additions & 0 deletions src/wandbot/ingestion/custom_dataset.yaml
@@ -8,3 +8,13 @@
is_git_repo: true
language: "en"
docstore_dir: "custom_store_en"
- CustomConfig2:
name: "custom_store2"
data_source:
remote_path: "https://docs.wandb.ai/"
repo_path: "https://github.com/wandb/docodile"
base_path: "docs"
file_pattern: "*.md"
is_git_repo: true
language: "en"
docstore_dir: "custom_store_en2"
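
The new `CustomConfig2` entry reuses the same data source but writes to a different `docstore_dir`. Assuming each entry is meant to have its own docstore directory (an assumption, not something the commit states), a small check like the one below could flag accidental reuse. It operates on already-parsed YAML represented as a list of single-key dicts; the function name and the `FirstConfig` key are hypothetical, and only fields visible in the diff are used.

```python
from collections import Counter


def duplicate_docstore_dirs(configs):
    """Return docstore_dir values that appear in more than one config entry."""
    dirs = [
        settings["docstore_dir"]
        for entry in configs                # each entry is {config_name: settings}
        for settings in entry.values()
    ]
    return sorted(d for d, count in Counter(dirs).items() if count > 1)


# Parsed form of the two entries, reduced to fields visible in the diff.
configs = [
    {"FirstConfig": {"docstore_dir": "custom_store_en"}},  # hypothetical key name
    {"CustomConfig2": {"name": "custom_store2",
                       "language": "en",
                       "docstore_dir": "custom_store_en2"}},
]

print(duplicate_docstore_dirs(configs))  # [] since the two directories differ
```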
