feat: update with instrumentation module #27

Merged
merged 27 commits
Oct 7, 2024
e969d2b
fix: update LICENSE_HEADER
sdiazlor Sep 17, 2024
1b2aa4a
feat: update with instrumentation
sdiazlor Sep 17, 2024
6a6bcba
feat: update helpers
sdiazlor Sep 17, 2024
c431cd7
feat: update init
sdiazlor Sep 17, 2024
0166acd
feat: update versions
sdiazlor Sep 17, 2024
26977fd
feat: update tests for handler
sdiazlor Sep 17, 2024
8d6dc1a
feat: add separated test for helpers and update it
sdiazlor Sep 17, 2024
d11cf63
docs: update documentation
sdiazlor Sep 17, 2024
4708a38
Update src/argilla_llama_index/llama_index_handler.py
sdiazlor Sep 18, 2024
dc84a44
Update src/argilla_llama_index/llama_index_handler.py
sdiazlor Sep 18, 2024
459d0a0
Update src/argilla_llama_index/llama_index_handler.py
sdiazlor Sep 18, 2024
ad8b954
feat: bump argilla version (for chatfield)
sdiazlor Sep 18, 2024
4f9daa3
fix: update to ArgillaHandler
sdiazlor Sep 18, 2024
ff85b22
fix: add number_of_retrievals message and update default
sdiazlor Sep 18, 2024
105faf7
feat: add independent field for scores
sdiazlor Sep 18, 2024
67cea7c
feat: add ChatField
sdiazlor Sep 20, 2024
31e4385
docs: modify images
sdiazlor Sep 20, 2024
bcdb4c3
feat: remove chat_to_html from tests
sdiazlor Sep 20, 2024
396a59f
feat: add logic to handle events
sdiazlor Oct 2, 2024
c57c330
feat: add events to tree and show in a different color
sdiazlor Oct 2, 2024
76de037
typo
sdiazlor Oct 2, 2024
1295ecf
feat: update helper tests
sdiazlor Oct 2, 2024
f7fc322
feat: update main tests
sdiazlor Oct 2, 2024
e20ef1d
docs: update previous documentation
sdiazlor Oct 2, 2024
d761540
docs: add new tutorial
sdiazlor Oct 2, 2024
aa5c466
update license header
sdiazlor Oct 2, 2024
7e98a74
docs: add initial context
sdiazlor Oct 7, 2024
20 changes: 10 additions & 10 deletions LICENSE_HEADER
@@ -1,13 +1,13 @@
Copyright 2023-present, Argilla, Inc.
Copyright 2021-present, the Recognai S.L. team.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
20 changes: 10 additions & 10 deletions README.md
@@ -22,7 +22,7 @@ If you already have deployed Argilla, you can skip this step. Otherwise, you can

## Basic Usage

To easily log your data into Argilla within your LlamaIndex workflow, you only need to initialize the span handler and attach it to the LlamaIndex dispatcher. This ensures that the predictions obtained using LlamaIndex are automatically logged to the Argilla instance.

- `dataset_name`: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.
- `api_url`: The URL to connect to the Argilla instance.
@@ -33,23 +33,23 @@ To easily log your data into Argilla within your LlamaIndex workflow, you only n
> For more information about the credentials, check the documentation for [users](https://docs.argilla.io/latest/how_to_guides/user/) and [workspaces](https://docs.argilla.io/latest/how_to_guides/workspace/).

```python
import llama_index.core.instrumentation as instrument
from argilla_llama_index import ArgillaHandler

span_handler = ArgillaHandler(
    dataset_name="query_llama_index",
    api_url="http://localhost:6900",
    api_key="argilla.apikey",
    number_of_retrievals=2,
)

# Attach the span handler to the LlamaIndex dispatcher
dispatcher = instrument.get_dispatcher()
dispatcher.add_span_handler(span_handler)
```
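For intuition, the handler-plus-dispatcher wiring above follows a simple observer pattern: the dispatcher keeps a list of span handlers and notifies each of them as work starts and finishes. The sketch below illustrates the idea in plain Python, with no LlamaIndex required; `MiniDispatcher` and `RecordingSpanHandler` are hypothetical stand-ins for illustration only, not real LlamaIndex or Argilla APIs:

```python
class RecordingSpanHandler:
    """Illustrative stand-in for a span handler: records span enter/exit events."""

    def __init__(self):
        self.events = []

    def span_enter(self, name):
        self.events.append(("enter", name))

    def span_exit(self, name):
        self.events.append(("exit", name))


class MiniDispatcher:
    """Illustrative stand-in for the dispatcher: fans events out to all handlers."""

    def __init__(self):
        self.span_handlers = []

    def add_span_handler(self, handler):
        self.span_handlers.append(handler)

    def run(self, name, fn):
        # Notify every handler before and after the wrapped call
        for h in self.span_handlers:
            h.span_enter(name)
        result = fn()
        for h in self.span_handlers:
            h.span_exit(name)
        return result


dispatcher = MiniDispatcher()
handler = RecordingSpanHandler()
dispatcher.add_span_handler(handler)
dispatcher.run("query", lambda: "answer")
print(handler.events)  # [('enter', 'query'), ('exit', 'query')]
```

Because the dispatcher fans events out to every registered handler, logging to Argilla stays decoupled from the query logic itself.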

Let's log some data into Argilla. With the code below, you can create a basic LlamaIndex workflow. We will use GPT3.5 from OpenAI as our LLM ([OpenAI API key](https://openai.com/blog/openai-api)). Moreover, we will use an example `.txt` file obtained from the [Llama Index documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html).



```python
import os

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
@@ -63,8 +63,8 @@ Settings.llm = OpenAI(
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create the query engine with the same similarity top k as the number of retrievals
query_engine = index.as_query_engine(similarity_top_k=2)
```

Now, let's run the `query_engine` to get a response from the model. The generated response will be logged into Argilla.
Binary file modified docs/assets/UI-screenshot-github.png
Binary file modified docs/assets/UI-screenshot.png
49 changes: 27 additions & 22 deletions docs/tutorials/getting_started.ipynb
@@ -6,9 +6,9 @@
"source": [
"# ✨🦙 Getting started with Argilla's LlamaIndex Integration\n",
"\n",
"In this tutorial, we will show the basic usage of this integration, which lets you bring the feedback loop that Argilla offers into the LlamaIndex ecosystem. It is based on a span handler that runs within the LlamaIndex workflow.\n",
"\n",
"Don't hesitate to check out both [LlamaIndex](https://github.com/run-llama/llama_index) and [Argilla](https://github.com/argilla-io/argilla).\n"
]
},
{
@@ -19,7 +19,7 @@
"\n",
"### Deploy the Argilla server¶\n",
"\n",
"If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following [this guide](https://docs.argilla.io/latest/getting_started/quickstart/).\n"
]
},
{
@@ -28,7 +28,7 @@
"source": [
"### Set up the environment¶\n",
"\n",
"To complete this tutorial, you need to install this integration.\n"
]
},
{
@@ -37,14 +37,14 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install \"argilla-llama-index>=2.1.0\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make the required imports:\n"
]
},
{
@@ -53,20 +53,22 @@
"metadata": {},
"outputs": [],
"source": [
"import llama_index.core.instrumentation as instrument\n",
"from llama_index.core import (\n",
"    Settings,\n",
"    VectorStoreIndex,\n",
"    SimpleDirectoryReader,\n",
")\n",
"from llama_index.llms.openai import OpenAI\n",
"\n",
"from argilla_llama_index import ArgillaHandler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use GPT3.5 from OpenAI as our LLM. For this, you will need a valid API key from OpenAI. You can find more information and get one via [this link](https://openai.com/blog/openai-api).\n"
]
},
{
@@ -87,15 +89,15 @@
"source": [
"## Set the Argilla's LlamaIndex handler\n",
"\n",
"To easily log your data into Argilla within your LlamaIndex workflow, you only need to initialize the span handler and attach it to the LlamaIndex dispatcher. This ensures that the predictions obtained using LlamaIndex are automatically logged to the Argilla instance.\n",
"\n",
"- `dataset_name`: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.\n",
"- `api_url`: The URL to connect to the Argilla instance.\n",
"- `api_key`: The API key to authenticate with the Argilla instance.\n",
"- `number_of_retrievals`: The number of retrieved documents to be logged. Defaults to 0.\n",
"- `workspace_name`: The name of the workspace to log the data. By default, the first available workspace.\n",
"\n",
"> For more information about the credentials, check the documentation for [users](https://docs.argilla.io/latest/how_to_guides/user/) and [workspaces](https://docs.argilla.io/latest/how_to_guides/workspace/).\n"
]
},
{
@@ -104,27 +106,28 @@
"metadata": {},
"outputs": [],
"source": [
"span_handler = ArgillaHandler(\n",
"    dataset_name=\"query_llama_index\",\n",
"    api_url=\"http://localhost:6900\",\n",
"    api_key=\"argilla.apikey\",\n",
"    number_of_retrievals=2,\n",
")\n",
"\n",
"# Attach the span handler to the LlamaIndex dispatcher\n",
"dispatcher = instrument.get_dispatcher()\n",
"dispatcher.add_span_handler(span_handler)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Log the data to Argilla\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the code below, you can create a basic LlamaIndex workflow. We will use an example `.txt` file obtained from the [Llama Index documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html).\n"
]
},
{
@@ -145,21 +148,23 @@
"outputs": [],
"source": [
"# LLM settings\n",
"Settings.llm = OpenAI(\n",
"    model=\"gpt-3.5-turbo\", temperature=0.8, openai_api_key=openai_api_key\n",
")\n",
"\n",
"# Load the data and create the index\n",
"documents = SimpleDirectoryReader(\"../../data\").load_data()\n",
"index = VectorStoreIndex.from_documents(documents)\n",
"\n",
"# Create the query engine with the same similarity top k as the number of retrievals\n",
"query_engine = index.as_query_engine(similarity_top_k=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's run the `query_engine` to get a response from the model.\n"
]
},
{
@@ -178,7 +183,7 @@
"source": [
"The prompt given and the response obtained will be logged in as a chat, as well as the indicated number of retrieved documents.\n",
"\n",
"![Argilla UI](../assets/UI-screenshot.png)\n"
]
}
],