Merge pull request #25 from argilla-io/feat/update-argilla-2.0
[FEAT] Update argilla 2.0
davidberenstein1957 authored Aug 21, 2024
2 parents b155c51 + ce2b171 commit d86870a
Showing 15 changed files with 785 additions and 879 deletions.
48 changes: 36 additions & 12 deletions .github/workflows/integration-tests.yml
@@ -9,54 +9,78 @@ jobs:
integration_tests:
name: Running integration tests, which require an Argilla instance running
runs-on: ubuntu-latest
services:
argilla:
image: argilla/argilla-quickstart:latest
ports:
- 6900:6900

strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

defaults:
run:
shell: bash -l {0}

steps:
- name: 🪑 Wait for argilla-quickstart
run: |
while ! curl -XGET http://localhost:6900/api/_status; do sleep 30; done
- name: 🛎 Checkout Code
uses: actions/checkout@v3

- name: 🐍 Setup Conda Env
uses: conda-incubator/setup-miniconda@v2
with:
miniforge-variant: Mambaforge
miniforge-version: latest
use-mamba: true
activate-environment: argilla

- name: 🐍 Get date for Conda cache
id: get-date
run: echo "::set-output name=today::$(/bin/date -u '+%Y%m%d')"
shell: bash

- name: 🐍 Cache Conda env
uses: actions/cache@v3
id: cache
with:
path: ${{ env.CONDA }}/envs
key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}

- name: 👜 Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}

- name: 🛜 Netstat
run: |
apt update && apt install sudo
sudo apt install net-tools
netstat -lt
- name: 🗃️ Install pytest
- name: 📦 Download Docker Compose file
run: |
pip install pytest
pip install -e .
- name: 📈 Run end2end examples
curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml
working-directory: .

- name: 📦 Set up Docker Compose
run: |
docker compose -f docker-compose.yaml up -d
working-directory: .

- name: 🪑 Wait for services to be ready
run: |
while ! curl -XGET http://localhost:6900/api/_status; do sleep 30; done
- name: 🗃️ Install dependencies
run: |
pip install -e ".[dev,tests]"
- name: 📈 Run tests
env:
ARGILLA_ENABLE_TELEMETRY: 0
run: |
pytest tests
- name: 📦 Tear down Docker Compose
if: always()
run: |
docker compose -f docker-compose.yaml down
working-directory: .
1 change: 1 addition & 0 deletions .gitignore
@@ -129,6 +129,7 @@ venv/
ENV/
env.bak/
venv.bak/
.python-version

# Spyder project settings
.spyderproject
77 changes: 35 additions & 42 deletions README.md
@@ -4,81 +4,74 @@
</div>

> [!TIP]
> To discuss, get support, or give feedback [join Argilla's Slack Community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) and you will be able to engage with our amazing community and also with the core developers of `argilla` and `distilabel`.
> To discuss, get support, or give feedback [join Discord](http://hf.co/join/discord) in #argilla-distilabel-general and #argilla-distilabel-help. You will be able to engage with our amazing community and the core developers of `argilla` and `distilabel`.

This integration allows the user to include the feedback loop that Argilla offers into the LlamaIndex ecosystem. It's based on a callback handler to be run within the LlamaIndex workflow.

Don't hesitate to check out both [LlamaIndex](https://github.com/run-llama/llama_index) and [Argilla](https://github.com/argilla-io/argilla)

## Getting Started

You first need to install argilla and argilla-llama-index as follows:
You first need to install argilla-llama-index as follows:

```bash
pip install argilla-llama-index
```

You will need to an Argilla Server running to monitor the LLM. You can either install the server locally or have it on HuggingFace Spaces. For a complete guide on how to install and initialize the server, you can refer to the [Quickstart Guide](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html).
If you already have deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following [this guide](https://docs.argilla.io/latest/getting_started/quickstart/).
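The deployment steps this commit adds to CI can also be reproduced locally. The following is a sketch that mirrors the workflow in this diff; the Compose file URL and the 6900 port are taken from the workflow steps above, so adjust them if the upstream layout changes:

```shell
# Sketch: deploy Argilla locally with Docker Compose, mirroring the CI steps
# in this commit. The compose file URL and the 6900 port come from the
# workflow in this diff.
curl https://raw.githubusercontent.com/argilla-io/argilla/main/examples/deployments/docker/docker-compose.yaml -o docker-compose.yaml
docker compose -f docker-compose.yaml up -d

# Wait until the Argilla API responds before logging any data.
until curl -fsS http://localhost:6900/api/_status; do sleep 5; done
```

This is a deployment fragment, not a definitive install procedure; the guide linked above remains the authoritative reference.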

## Usage
## Basic Usage

It requires just a simple step to log your data into Argilla within your LlamaIndex workflow. We just need to call the handler before starting production with your LLM.
To easily log your data into Argilla within your LlamaIndex workflow, you only need a simple step. Just call the Argilla global handler for LlamaIndex before starting production with your LLM.

We will use GPT3.5 from OpenAI as our LLM. For this, you will need a valid API key from OpenAI. You can have more info and get one via [this link](https://openai.com/blog/openai-api).
- `dataset_name`: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.
- `api_url`: The URL to connect to the Argilla instance.
- `api_key`: The API key to authenticate with the Argilla instance.
- `number_of_retrievals`: The number of retrieved documents to be logged. Defaults to 0.
- `workspace_name`: The name of the workspace to log the data. By default, the first available workspace.

After you get your API key, the easiest way to import it is through an environment variable, or via *getpass()*.
> For more information about the credentials, check the documentation for [users](https://docs.argilla.io/latest/how_to_guides/user/) and [workspaces](https://docs.argilla.io/latest/how_to_guides/workspace/).
```python
import os
from getpass import getpass

openai_api_key = os.getenv("OPENAI_API_KEY", None) or getpass("Enter OpenAI API key:")
from llama_index.core import set_global_handler

set_global_handler(
"argilla",
dataset_name="query_model",
api_url="http://localhost:6900",
api_key="argilla.apikey",
number_of_retrievals=2,
)
```

Let's now write all the necessary imports
Let's log some data into Argilla. With the code below, you can create a basic LlamaIndex workflow. We will use GPT-3.5 from OpenAI as our LLM ([OpenAI API key](https://openai.com/blog/openai-api)). Moreover, we will use an example `.txt` file obtained from the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html).

```python
from llama_index.core import VectorStoreIndex, ServiceContext, SimpleDirectoryReader, set_global_handler
from llama_index.llms.openai import OpenAI
```

What we need to do is to set Argilla as the global handler as below. Within the handler, we need to provide the dataset name that we will use. If the dataset does not exist, it will be created with the given name. You can also set the API KEY, API URL, and the Workspace name. You can learn more about the variables that controls Argilla initialization [here](https://docs.argilla.io/en/latest/getting_started/installation/configurations/workspace_management.html)

> [!TIP]
> Remember that the default Argilla workspace name is `admin`. If you want to use a custom Workspace, you'll need to create it and grant access to the desired users. The link above also explains how to do that.


```python
set_global_handler("argilla", dataset_name="query_model")
```
import os

Let's now create the llm instance, using GPT-3.5 from OpenAI.
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

```python
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.8, openai_api_key=openai_api_key)
```
# LLM settings
Settings.llm = OpenAI(
model="gpt-3.5-turbo", temperature=0.8, openai_api_key=os.getenv("OPENAI_API_KEY")
)

With the code snippet below, you can create a basic workflow with LlamaIndex. You will also need a txt file as the data source within a folder named "data". For a sample data file and more info regarding the use of Llama Index, you can refer to the [Llama Index documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html).
# Load the data and create the index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

```python
service_context = ServiceContext.from_defaults(llm=llm)
docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
# Create the query engine
query_engine = index.as_query_engine()
```

Now, let's run the `query_engine` to have a response from the model.
Now, let's run the `query_engine` to get a response from the model. The generated response will be logged into Argilla.

```python
response = query_engine.query("What did the author do growing up?")
response
```

```bash
The author worked on two main things outside of school before college: writing and programming. They wrote short stories and tried writing programs on an IBM 1401. They later got a microcomputer, built it themselves, and started programming on it.
```

The prompt given and the response obtained will be logged into the Argilla server. You can check the data on Argilla's UI:

![Argilla Dataset](docs/assets/argilla-ui-dataset.png)
![Argilla UI](/docs/assets/UI-screenshot.png)
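Besides the UI, the logged records can also be inspected programmatically. The sketch below assumes the Argilla 2.x SDK client API (`rg.Argilla`, `client.datasets`, `dataset.records` are assumptions based on that SDK, not shown in this diff) and reuses the same credentials and dataset name configured for the global handler above:

```python
# Hedged sketch: inspect the logged dataset with the Argilla 2.x SDK.
# Assumes a running Argilla instance reachable with the same credentials
# passed to the global handler; "query_model" is the dataset name used above.
import argilla as rg

client = rg.Argilla(api_url="http://localhost:6900", api_key="argilla.apikey")
dataset = client.datasets("query_model")

# Iterate over the logged records (prompt/response pairs and any retrievals).
for record in dataset.records:
    print(record.fields)
```

This requires a live Argilla server, so treat it as an illustration of where the logged data ends up rather than a standalone script.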
Binary file added docs/assets/UI-screenshot-github.png
Binary file added docs/assets/UI-screenshot.png
Binary file removed docs/assets/argilla-ui-dataset.png
Binary file removed docs/assets/rag_example_1.png
