From 8fe6a5f040fba44b39bba8d6a5655a1fd0b40a80 Mon Sep 17 00:00:00 2001 From: Nir Gazit Date: Fri, 12 Jan 2024 11:53:35 +0100 Subject: [PATCH 1/8] chore: continuing work by cartermp --- docs/ai/README.md | 24 +++++++++ docs/ai/llm-spans.md | 99 +++++++++++++++++++++++++++++++++++++ docs/ai/openai.md | 114 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 237 insertions(+) create mode 100644 docs/ai/README.md create mode 100644 docs/ai/llm-spans.md create mode 100644 docs/ai/openai.md diff --git a/docs/ai/README.md b/docs/ai/README.md new file mode 100644 index 0000000000..f04a867a22 --- /dev/null +++ b/docs/ai/README.md @@ -0,0 +1,24 @@ + + +# Semantic Conventions for AI systems + +**Status**: [Experimental][DocumentStatus] + +This document defines semantic conventions for the following kinds of AI systems: + +* LLMs + +Semantic conventions for LLM operations are defined for the following signals: + +* [LLM Spans](llm-spans.md): Semantic Conventions for LLM requests - *spans*. + +Technology-specific semantic conventions are defined for the following LLM providers: + +* [OpenAI](openai.md): Semantic Conventions for *OpenAI*. + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md new file mode 100644 index 0000000000..19c4162321 --- /dev/null +++ b/docs/ai/llm-spans.md @@ -0,0 +1,99 @@ + + +# Semantic Conventions for LLM requests + +**Status**: [Experimental][DocumentStatus] + + + + + +- [LLM Request attributes](#llm-request-attributes) +- [Configuration](#configuration) +- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies) + + + +A request to an LLM is modeled as a span in a trace. + +The **span name** SHOULD be set to a low-cardinality value representing the request made to an LLM. +It MAY be the name of the API endpoint for the LLM being called. 
+ +## Configuration + +Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons: + +1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend. +2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of. +3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application. + +By default, these configurations SHOULD NOT capture prompts and completions. + +## LLM Request attributes + +These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs. + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | +| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | +| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | +| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| `llm.stream` | bool | Whether the LLM responds with a stream. 
| `false` | Recommended | +| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | + +`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `gpt-4` | GPT-4 | +| `gpt-4-32k` | GPT-4 with 32k context window | +| `gpt-3.5-turbo` | GPT-3.5-turbo | +| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window| +| `claude-instant-1` | Claude Instant (latest version) | +| `claude-2` | Claude 2 (latest version) | +| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. | + + +## LLM Response attributes + +These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | +| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | +| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. 
| `280` | Recommended | + +`llm.response.finish_reason` MUST be one of the following: + +| Value | Description | +|---|---| +| `stop` | If the model hit a natural stop point or a provided stop sequence. | +| `max_tokens` | If the maximum number of tokens specified in the request was reached. | +| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). | + + +## Events + +In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. + + +| Attribute | Type | Description | Examples | Requirement Level | +| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended | + + + +| Attribute | Type | Description | Examples | Requirement Level | +| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.| `Why did the developer stop using OpenTelemetry? 
Because they couldn't trace their steps!` | Recommended | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/openai.md b/docs/ai/openai.md new file mode 100644 index 0000000000..4c7acf404a --- /dev/null +++ b/docs/ai/openai.md @@ -0,0 +1,114 @@ + + +# Semantic Conventions for OpenAI Spans + +**Status**: [Experimental][DocumentStatus] + +This document outlines the Semantic Conventions specific to +[OpenAI](https://platform.openai.com/) spans, extending the general semantics +found in the [LLM Semantic Conventions](llm-spans.md). These conventions are +designed to standardize telemetry data for OpenAI interactions, particularly +focusing on the `/chat/completions` endpoint. By following these guidelines, +developers can ensure consistent, meaningful, and easily interpretable telemetry +data across different applications and platforms. + +## Chat Completions + +The span name for OpenAI chat completions SHOULD be `openai.chat` +to maintain consistency and clarity in telemetry data. + +## Request Attributes + +These are the attributes when instrumenting OpenAI LLM requests with the +`/chat/completions` endpoint. + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | +| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | +| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. 
| `100` | Recommended | +| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | +| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | +| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | +| `llm.openai.n` | integer | The number of completions to generate. | `1` | Recommended | +| `llm.openai.presence_penalty` | float | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| `llm.openai.frequency_penalty` | float | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request. | `{2435:-100, 640:-100}` | Recommended | +| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | Opt-in | +| `llm.openai.response_format` | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | Recommended | +| `llm.openai.seed` | integer | Seed used in request to improve determinism. | `1234` | Recommended | + + +## Response attributes + +Attributes for chat completion responses SHOULD follow these conventions: + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. 
| `gpt-4-0613` | Required | +| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | +| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | +| `llm.openai.created` | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | Recommended | +| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | asdf987123 | Recommended | + + +## Request Events + +In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. +Because OpenAI uses a more complex prompt structure, these events will be used instead of the generic ones detailed in the [LLM Semantic Conventions](llm-spans.md). + +### Prompt Events + +Prompt event name SHOULD be `llm.openai.prompt`. + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `system` | Required | +| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| `tool_call_id` | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: If `role` is `tool`. | + + +### Tools Events + +Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use. 
+ + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `type` | string | They type of the tool. Currently, only `function` is supported. | `function` | Required | +| `function.name` | string | The name of the function to be called. | `get_weather` | Required ! +| `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required | +| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | + + +### Choice Events + +Recording details about Choices in each response MAY be included as +Span Events. + +Choice event name SHOULD be `llm.openai.choice`. + +If there is more than one `tool_call`, separate events SHOULD be used. + + +| `type` | string | Either `delta` or `message`. | `message` | Required | +|---|---|---|---|---| +| `finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended | +| `role` | string | The assigned role for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `system` | Required | +| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| `tool_call.id` | string | If exists, the ID of the tool call. | `call_BP08xxEhU60txNjnz3z9R4h9` | Required | +| `tool_call.type` | string | Currently only `function` is supported. | `function` | Required | +| `tool_call.function.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. 
| `get_weather_report` | Required | +| `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file From a521fc1cc960997f259256852c2ffd799b41ddd7 Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Mon, 22 Jan 2024 19:43:56 +0000 Subject: [PATCH 2/8] Update to use Yaml model files --- docs/ai/llm-spans.md | 78 ++++++------- docs/ai/openai.md | 111 ++++++++++-------- docs/attributes-registry/llm.md | 125 ++++++++++++++++++++ model/registry/llm.yaml | 194 ++++++++++++++++++++++++++++++++ model/trace/llm.yaml | 164 +++++++++++++++++++++++++++ 5 files changed, 581 insertions(+), 91 deletions(-) create mode 100644 docs/attributes-registry/llm.md create mode 100644 model/registry/llm.yaml create mode 100644 model/trace/llm.yaml diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md index 19c4162321..9ed8134795 100644 --- a/docs/ai/llm-spans.md +++ b/docs/ai/llm-spans.md @@ -2,7 +2,7 @@ linkTitle: LLM Calls ---> -# Semantic Conventions for LLM requests +# Semantic Conventions for LLM Spans **Status**: [Experimental][DocumentStatus] @@ -10,9 +10,10 @@ linkTitle: LLM Calls -- [LLM Request attributes](#llm-request-attributes) - [Configuration](#configuration) -- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies) +- [LLM Request attributes](#llm-request-attributes) +- [LLM Response attributes](#llm-response-attributes) +- [LLM Span Events](#llm-span-events) @@ -35,65 +36,52 @@ By default, these configurations SHOULD NOT capture prompts and completions. These attributes track input data and metadata for a request to an LLM. 
Each attribute represents a concept that is common to most LLMs. - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | -| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | -| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | -| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | -| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | - -`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. - -| Value | Description | -|---|---| -| `gpt-4` | GPT-4 | -| `gpt-4-32k` | GPT-4 with 32k context window | -| `gpt-3.5-turbo` | GPT-3.5-turbo | -| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window| -| `claude-instant-1` | Claude Instant (latest version) | -| `claude-2` | Claude 2 (latest version) | -| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. | +| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. 
| `100` | Recommended | +| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required | +| [`llm.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | +| [`llm.stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended | +| [`llm.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | +| [`llm.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| [`llm.vendor`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [2] | `openai` | Recommended | + +**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + +**[2]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. ## LLM Response attributes These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. 
| `gpt-4-0613` | Required | -| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | -| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | - -`llm.response.finish_reason` MUST be one of the following: - -| Value | Description | -|---|---| -| `stop` | If the model hit a natural stop point or a provided stop sequence. | -| `max_tokens` | If the maximum number of tokens specified in the request was reached. | -| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| [`llm.response.id`](../attributes-registry/llm.md) | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [1] | `gpt-4-0613` | Required | +| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | + +**[1]:** The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. 
If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. -## Events +## LLM Span Events In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. - + | Attribute | Type | Description | Examples | Requirement Level | -| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended | - +|---|---|---|---|---| +| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended | +| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [2] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended | - -| Attribute | Type | Description | Examples | Requirement Level | -| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.| `Why did the developer stop using OpenTelemetry? 
Because they couldn't trace their steps!` | Recommended | +**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. + +**[2]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. [DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/openai.md b/docs/ai/openai.md index 4c7acf404a..7dcbee5a6d 100644 --- a/docs/ai/openai.md +++ b/docs/ai/openai.md @@ -14,53 +14,63 @@ focusing on the `/chat/completions` endpoint. By following these guidelines, developers can ensure consistent, meaningful, and easily interpretable telemetry data across different applications and platforms. + + +- [Chat Completions](#chat-completions) + * [Request Attributes](#request-attributes) + * [Response attributes](#response-attributes) +- [OpenAI Span Events](#openai-span-events) + * [Prompt Events](#prompt-events) + * [Tools Events](#tools-events) + * [Choice Events](#choice-events) + + + ## Chat Completions The span name for OpenAI chat completions SHOULD be `openai.chat` to maintain consistency and clarity in telemetry data. -## Request Attributes +### Request Attributes These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. 
- + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | -| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | -| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | -| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | -| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | -| `llm.openai.n` | integer | The number of completions to generate. | `1` | Recommended | -| `llm.openai.presence_penalty` | float | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | -| `llm.openai.frequency_penalty` | float | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | -| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request. | `{2435:-100, 640:-100}` | Recommended | -| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | Opt-in | -| `llm.openai.response_format` | string | An object specifying the format that the model must output. 
Either `text` or `json_object` | `text` | Recommended | -| `llm.openai.seed` | integer | Seed used in request to improve determinism. | `1234` | Recommended | +| [`llm.openai.logit_bias`](../attributes-registry/llm.md) | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | Recommended | +| [`llm.openai.presence_penalty`](../attributes-registry/llm.md) | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| [`llm.openai.response_format`](../attributes-registry/llm.md) | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | Recommended | +| [`llm.openai.user`](../attributes-registry/llm.md) | string | If present, the `user` used in an OpenAI request. | `bob` | Recommended | +| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required | +| [`llm.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | +| [`llm.stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended | +| [`llm.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | +| [`llm.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| [`llm.vendor`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. | `openai`; `microsoft` | Recommended | + +**[1]:** The name of the LLM a request is being made to. 
If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. -## Response attributes +### Response attributes Attributes for chat completion responses SHOULD follow these conventions: - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | -| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | -| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | -| `llm.openai.created` | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | Recommended | -| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | asdf987123 | Recommended | +| [`llm.openai.created`](../attributes-registry/llm.md) | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | Recommended | +| [`llm.openai.seed`](../attributes-registry/llm.md) | int | Seed used in request to improve determinism. 
| `1234` | Recommended | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| [`llm.response.id`](../attributes-registry/llm.md) | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | -## Request Events +## OpenAI Span Events In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. Because OpenAI uses a more complex prompt structure, these events will be used instead of the generic ones detailed in the [LLM Semantic Conventions](llm-spans.md). @@ -69,25 +79,25 @@ Because OpenAI uses a more complex prompt structure, these events will be used i Prompt event name SHOULD be `llm.openai.prompt`. - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `system` | Required | -| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| `tool_call_id` | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: If `role` is `tool`. 
| +| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | +| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: Required if the prompt role is `tool`. | ### Tools Events Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use. - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `type` | string | They type of the tool. Currently, only `function` is supported. | `function` | Required | -| `function.name` | string | The name of the function to be called. | `get_weather` | Required ! -| `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required | -| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | +| [`llm.openai.function.description`](../attributes-registry/llm.md) | string | A description of what the function does, used by the model to choose when and how to call the function. | `Gets the current weather for a location` | Required | +| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called. | `get_weather` | Required | +| [`llm.openai.function.parameters`](../attributes-registry/llm.md) | string | JSON-encoded string of the parameter object for the function. 
| `{"type": "object", "properties": {}}` | Required | +| [`llm.openai.tool_call.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. | `function` | Required | ### Choice Events @@ -97,18 +107,27 @@ Span Events. Choice event name SHOULD be `llm.openai.choice`. -If there is more than one `tool_call`, separate events SHOULD be used. +If there is more than one `choice`, separate events SHOULD be used. - -| `type` | string | Either `delta` or `message`. | `message` | Required | + +| Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended | -| `role` | string | The assigned role for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `system` | Required | -| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| `tool_call.id` | string | If exists, the ID of the tool call. | `call_BP08xxEhU60txNjnz3z9R4h9` | Required | -| `tool_call.type` | string | Currently only `function` is supported. | `function` | Required | -| `tool_call.function.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `get_weather_report` | Required | -| `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required | +| [`llm.openai.choice.type`](../attributes-registry/llm.md) | string | The type of the choice, either `delta` or `message`. 
| `message` | Required | +| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| [`llm.openai.function.arguments`](../attributes-registry/llm.md) | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Conditionally Required: [1] | +| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called. | `get_weather` | Conditionally Required: [2] | +| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | +| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: [3] | +| [`llm.openai.tool_call.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. | `function` | Conditionally Required: [4] | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | + +**[1]:** Required if the choice is the result of a tool call of type `function`. + +**[2]:** Required if the choice is the result of a tool call of type `function`. + +**[3]:** Required if the choice is the result of a tool call. + +**[4]:** Required if the choice is the result of a tool call. 
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/attributes-registry/llm.md b/docs/attributes-registry/llm.md new file mode 100644 index 0000000000..8c203ba211 --- /dev/null +++ b/docs/attributes-registry/llm.md @@ -0,0 +1,125 @@ + + +# Large Language Model (LLM) + + + +- [Generic LLM Attributes](#generic-llm-attributes) + * [Request Attributes](#request-attributes) + * [Response Attributes](#response-attributes) + * [Event Attributes](#event-attributes) +- [OpenAI Attributes](#openai-attributes) + * [Request Attributes](#request-attributes-1) + * [Response Attributes](#response-attributes-1) + * [Event Attributes](#event-attributes-1) + + + +## Generic LLM Attributes + +### Request Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | +| `llm.request.model` | string | The name of the LLM a request is being made to. | `gpt-4` | +| `llm.stop_sequences` | string | Array of strings the LLM uses as a stop sequence. | `stop1` | +| `llm.stream` | boolean | Whether the LLM responds with a stream. | `False` | +| `llm.temperature` | double | The temperature setting for the LLM request. | `0.0` | +| `llm.top_p` | double | The top_p sampling setting for the LLM request. | `1.0` | +| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. | `openai` | + + +### Response Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | +| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | +| `llm.response.model` | string | The name of the LLM a response is being made to. 
| `gpt-4-0613` | +| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | +| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | +| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | + + +### Event Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.completion` | string | The full response string from an LLM in a response. | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | +| `llm.prompt` | string | The full prompt string sent to an LLM in a request. | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | + + +## OpenAI Attributes + +### Request Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.openai.frequency_penalty` | double | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | +| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | +| `llm.openai.presence_penalty` | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | +| `llm.openai.response_format` | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | +| `llm.openai.seed` | int | Seed used in request to improve determinism. | `1234` | +| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. 
| `bob` | + +`llm.openai.response_format` MUST be one of the following: + +| Value | Description | +|---|---| +| `text` | text | +| `json_object` | json_object | + + +### Response Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.openai.created` | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | +| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | `asdf987123` | + + +### Event Attributes + + +| Attribute | Type | Description | Examples | +|---|---|---|---| +| `llm.openai.choice.type` | string | The type of the choice, either `delta` or `message`. | `message` | +| `llm.openai.content` | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | +| `llm.openai.finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | +| `llm.openai.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | +| `llm.openai.function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `Gets the current weather for a location` | +| `llm.openai.function.name` | string | The name of the function to be called. | `get_weather` | +| `llm.openai.function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | +| `llm.openai.role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | +| `llm.openai.tool_call.id` | string | If role is `tool` or `function`, then this tool call that this message is responding to.
| `get_current_weather` | +| `llm.openai.tool_call.type` | string | The type of the tool. Currently, only `function` is supported. | `function` | + +`llm.openai.choice.type` MUST be one of the following: + +| Value | Description | +|---|---| +| `delta` | delta | +| `message` | message | + +`llm.openai.role` MUST be one of the following: + +| Value | Description | +|---|---| +| `system` | system | +| `user` | user | +| `assistant` | assistant | +| `tool` | tool | + +`llm.openai.tool_call.type` MUST be one of the following: + +| Value | Description | +|---|---| +| `function` | function | + \ No newline at end of file diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml new file mode 100644 index 0000000000..60912165db --- /dev/null +++ b/model/registry/llm.yaml @@ -0,0 +1,194 @@ +groups: + - id: registry.llm + prefix: llm + type: attribute_group + brief: > + This document defines the attributes used to describe telemetry in the context of LLM (Large Language Models) requests and responses. + attributes: + - id: vendor + type: string + brief: The name of the LLM foundation model vendor, if applicable. + examples: 'openai' + tag: llm-generic-request + - id: request.model + type: string + brief: The name of the LLM a request is being made to. + examples: 'gpt-4' + tag: llm-generic-request + - id: request.max_tokens + type: int + brief: The maximum number of tokens the LLM generates for a request. + examples: [100] + tag: llm-generic-request + - id: temperature + type: double + brief: The temperature setting for the LLM request. + examples: [0.0] + tag: llm-generic-request + - id: top_p + type: double + brief: The top_p sampling setting for the LLM request. + examples: [1.0] + tag: llm-generic-request + - id: stream + type: boolean + brief: Whether the LLM responds with a stream. + examples: [false] + tag: llm-generic-request + - id: stop_sequences + type: string + brief: Array of strings the LLM uses as a stop sequence. 
+ examples: ["stop1"] + tag: llm-generic-request + - id: response.id + type: string + brief: The unique identifier for the completion. + examples: ['chatcmpl-123'] + tag: llm-generic-response + - id: response.model + type: string + brief: The name of the LLM a response is being made to. + examples: ['gpt-4-0613'] + tag: llm-generic-response + - id: response.finish_reason + type: string + brief: The reason the model stopped generating tokens. + examples: ['stop'] + tag: llm-generic-response + - id: usage.prompt_tokens + type: int + brief: The number of tokens used in the LLM prompt. + examples: [100] + tag: llm-generic-response + - id: usage.completion_tokens + type: int + brief: The number of tokens used in the LLM response (completion). + examples: [180] + tag: llm-generic-response + - id: usage.total_tokens + type: int + brief: The total number of tokens used in the LLM prompt and response. + examples: [280] + tag: llm-generic-response + - id: prompt + type: string + brief: The full prompt string sent to an LLM in a request. + examples: ['\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:'] + tag: llm-generic-events + - id: completion + type: string + brief: The full response string from an LLM in a response. + examples: ['Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!'] + tag: llm-generic-events + - id: openai.presence_penalty + type: double + brief: If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. + examples: -0.5 + tag: tech-specific-openai-request + - id: openai.frequency_penalty + type: double + brief: If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. 
+ examples: -0.5 + tag: tech-specific-openai-request + - id: openai.logit_bias + type: string + brief: If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request + examples: ['{2435:-100, 640:-100}'] + tag: tech-specific-openai-request + - id: openai.user + type: string + brief: If present, the `user` used in an OpenAI request. + examples: ['bob'] + tag: tech-specific-openai-request + - id: openai.response_format + type: + members: + - id: text + value: 'text' + - id: json_object + value: 'json_object' + brief: An object specifying the format that the model must output. Either `text` or `json_object` + examples: 'text' + tag: tech-specific-openai-request + - id: openai.seed + type: int + brief: Seed used in request to improve determinism. + examples: 1234 + tag: tech-specific-openai-request + - id: openai.created + type: int + brief: The UNIX timestamp (in seconds) of when the completion was created. + examples: 1677652288 + tag: tech-specific-openai-response + - id: openai.system_fingerprint + type: string + brief: This fingerprint represents the backend configuration that the model runs with. + examples: 'asdf987123' + tag: tech-specific-openai-response + - id: openai.role + type: + members: + - id: system + value: 'system' + - id: user + value: 'user' + - id: assistant + value: 'assistant' + - id: tool + value: 'tool' + brief: The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` + examples: 'user' + tag: tech-specific-openai-events + - id: openai.content + type: string + brief: The content for a given OpenAI response. + examples: 'Why did the developer stop using OpenTelemetry? Because they couldn''t trace their steps!' + tag: tech-specific-openai-events + - id: openai.function.name + type: string + brief: The name of the function to be called.
+ examples: 'get_weather' + tag: tech-specific-openai-events + - id: openai.function.description + type: string + brief: A description of what the function does, used by the model to choose when and how to call the function. + examples: 'Gets the current weather for a location' + tag: tech-specific-openai-events + - id: openai.function.parameters + type: string + brief: JSON-encoded string of the parameter object for the function. + examples: '{"type": "object", "properties": {}}' + tag: tech-specific-openai-events + - id: openai.function.arguments + type: string + brief: If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. + examples: '{"type": "object", "properties": {"some":"data"}}' + tag: tech-specific-openai-events + - id: openai.finish_reason + type: string + brief: The reason the OpenAI model stopped generating tokens for this chunk. + examples: 'stop' + tag: tech-specific-openai-events + - id: openai.tool_call.id + type: string + brief: If role is `tool` or `function`, then this tool call that this message is responding to. + examples: 'get_current_weather' + tag: tech-specific-openai-events + - id: openai.tool_call.type + type: + members: + - id: function + value: 'function' + brief: The type of the tool. Currently, only `function` is supported. + examples: 'function' + tag: tech-specific-openai-events + - id: openai.choice.type + type: + members: + - id: delta + value: 'delta' + - id: message + value: 'message' + brief: The type of the choice, either `delta` or `message`. + examples: 'message' + tag: tech-specific-openai-events \ No newline at end of file diff --git a/model/trace/llm.yaml b/model/trace/llm.yaml new file mode 100644 index 0000000000..a4ee102374 --- /dev/null +++ b/model/trace/llm.yaml @@ -0,0 +1,164 @@ +groups: + - id: llm.request + type: span + brief: > + A request to an LLM is modeled as a span in a trace. 
The span name should be a low cardinality value representing the request made to an LLM, like the name of the API endpoint being called. + attributes: + - ref: llm.vendor + requirement_level: recommended + note: > + The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. + - ref: llm.request.model + requirement_level: required + note: > + The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + - ref: llm.request.max_tokens + requirement_level: recommended + - ref: llm.temperature + requirement_level: recommended + - ref: llm.top_p + requirement_level: recommended + - ref: llm.stream + requirement_level: recommended + - ref: llm.stop_sequences + requirement_level: recommended + + - id: llm.response + type: span + brief: > + These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. + attributes: + - ref: llm.response.id + requirement_level: recommended + - ref: llm.response.model + requirement_level: required + note: > + The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. 
+ - ref: llm.response.finish_reason + requirement_level: recommended + - ref: llm.usage.prompt_tokens + requirement_level: recommended + - ref: llm.usage.completion_tokens + requirement_level: recommended + - ref: llm.usage.total_tokens + requirement_level: recommended + + - id: llm.events + type: span + brief: > + In the lifetime of an LLM span, events for prompts sent and completions received may be created, depending on the configuration of the instrumentation. + attributes: + - ref: llm.prompt + requirement_level: recommended + note: > + The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. + - ref: llm.completion + requirement_level: recommended + note: > + The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. + + - id: llm.openai + type: span + brief: > + These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. + attributes: + - ref: llm.vendor + requirement_level: recommended + examples: ['openai', 'microsoft'] + tag: tech-specific-openai-request + - ref: llm.request.model + requirement_level: required + note: > + The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. 
+ tag: tech-specific-openai-request + - ref: llm.request.max_tokens + tag: tech-specific-openai-request + - ref: llm.temperature + tag: tech-specific-openai-request + - ref: llm.top_p + tag: tech-specific-openai-request + - ref: llm.stream + tag: tech-specific-openai-request + - ref: llm.stop_sequences + tag: tech-specific-openai-request + - ref: llm.openai.presence_penalty + tag: tech-specific-openai-request + - ref: llm.openai.logit_bias + tag: tech-specific-openai-request + - ref: llm.openai.user + tag: tech-specific-openai-request + - ref: llm.openai.response_format + tag: tech-specific-openai-request + - ref: llm.openai.seed + tag: tech-specific-openai-response + - ref: llm.response.id + tag: tech-specific-openai-response + - ref: llm.response.finish_reason + tag: tech-specific-openai-response + - ref: llm.usage.prompt_tokens + tag: tech-specific-openai-response + - ref: llm.usage.completion_tokens + tag: tech-specific-openai-response + - ref: llm.usage.total_tokens + tag: tech-specific-openai-response + - ref: llm.openai.created + tag: tech-specific-openai-response + - ref: llm.openai.system_fingerprint + tag: tech-specific-openai-response + + - id: llm.openai.prompt + type: span + brief: > + These are the attributes when instrumenting OpenAI LLM requests and recording prompts in the request. + attributes: + - ref: llm.openai.role + requirement_level: required + - ref: llm.openai.content + requirement_level: required + - ref: llm.openai.tool_call.id + requirement_level: + conditionally_required: > + Required if the prompt role is `tool`. + + - id: llm.openai.tool + type: span + brief: > + These are the attributes when instrumenting OpenAI LLM requests that specify tools (or functions) the LLM can use.
+ attributes: + - ref: llm.openai.tool_call.type + requirement_level: required + - ref: llm.openai.function.name + requirement_level: required + - ref: llm.openai.function.description + requirement_level: required + - ref: llm.openai.function.parameters + requirement_level: required + + - id: llm.openai.choice + type: span + brief: > + These are the attributes when instrumenting OpenAI LLM requests and recording choices in the result. + attributes: + - ref: llm.openai.choice.type + requirement_level: required + - ref: llm.response.finish_reason + - ref: llm.openai.role + requirement_level: required + - ref: llm.openai.content + requirement_level: required + - ref: llm.openai.tool_call.id + requirement_level: + conditionally_required: > + Required if the choice is the result of a tool call. + - ref: llm.openai.tool_call.type + requirement_level: + conditionally_required: > + Required if the choice is the result of a tool call. + - ref: llm.openai.function.name + requirement_level: + conditionally_required: > + Required if the choice is the result of a tool call of type `function`. + - ref: llm.openai.function.arguments + requirement_level: + conditionally_required: > + Required if the choice is the result of a tool call of type `function`. 
\ No newline at end of file From 5843c65f3c66f6c76c898f6d1ad84f8b8eb76d0f Mon Sep 17 00:00:00 2001 From: Nir Gazit Date: Tue, 23 Jan 2024 19:30:32 +0100 Subject: [PATCH 3/8] chore: fixes in yaml according to reviews --- docs/ai/llm-spans.md | 87 --------------------- docs/ai/openai.md | 133 -------------------------------- docs/attributes-registry/llm.md | 33 ++++---- model/registry/llm.yaml | 59 +++++++------- model/trace/llm.yaml | 78 +++++++++++-------- 5 files changed, 88 insertions(+), 302 deletions(-) delete mode 100644 docs/ai/llm-spans.md delete mode 100644 docs/ai/openai.md diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md deleted file mode 100644 index 9ed8134795..0000000000 --- a/docs/ai/llm-spans.md +++ /dev/null @@ -1,87 +0,0 @@ - - -# Semantic Conventions for LLM Spans - -**Status**: [Experimental][DocumentStatus] - - - - - -- [Configuration](#configuration) -- [LLM Request attributes](#llm-request-attributes) -- [LLM Response attributes](#llm-response-attributes) -- [LLM Span Events](#llm-span-events) - - - -A request to an LLM is modeled as a span in a trace. - -The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM. -It MAY be a name of the API endpoint for the LLM being called. - -## Configuration - -Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons: - -1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend. -2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemety systems. Some LLMs allow for extremely large context windows that end users may take full advantage of. -3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application. 
- -By default, these configurations SHOULD NOT capture prompts and completions. - -## LLM Request attributes - -These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required | -| [`llm.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | -| [`llm.stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended | -| [`llm.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | -| [`llm.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| [`llm.vendor`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [2] | `openai` | Recommended | - -**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. - -**[2]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. - - -## LLM Response attributes - -These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. 
- - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | -| [`llm.response.id`](../attributes-registry/llm.md) | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [1] | `gpt-4-0613` | Required | -| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | - -**[1]:** The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. - - -## LLM Span Events - -In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | Recommended | -| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [2] | `\\n\\nHuman:You are an AI assistant that tells jokes. 
Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended | - -**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. - -**[2]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. - - -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/openai.md b/docs/ai/openai.md deleted file mode 100644 index 7dcbee5a6d..0000000000 --- a/docs/ai/openai.md +++ /dev/null @@ -1,133 +0,0 @@ - - -# Semantic Conventions for OpenAI Spans - -**Status**: [Experimental][DocumentStatus] - -This document outlines the Semantic Conventions specific to -[OpenAI](https://platform.openai.com/) spans, extending the general semantics -found in the [LLM Semantic Conventions](llm-spans.md). These conventions are -designed to standardize telemetry data for OpenAI interactions, particularly -focusing on the `/chat/completions` endpoint. By following to these guidelines, -developers can ensure consistent, meaningful, and easily interpretable telemetry -data across different applications and platforms. 
- - - -- [Chat Completions](#chat-completions) - * [Request Attributes](#request-attributes) - * [Response attributes](#response-attributes) -- [OpenAI Span Events](#openai-span-events) - * [Prompt Events](#prompt-events) - * [Tools Events](#tools-events) - * [Choice Events](#choice-events) - - - -## Chat Completions - -The span name for OpenAI chat completions SHOULD be `openai.chat` -to maintain consistency and clarity in telemetry data. - -### Request Attributes - -These are the attributes when instrumenting OpenAI LLM requests with the -`/chat/completions` endpoint. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.openai.logit_bias`](../attributes-registry/llm.md) | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | Recommended | -| [`llm.openai.presence_penalty`](../attributes-registry/llm.md) | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | -| [`llm.openai.response_format`](../attributes-registry/llm.md) | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | Recommended | -| [`llm.openai.user`](../attributes-registry/llm.md) | string | If present, the `user` used in an OpenAI request. | `bob` | Recommended | -| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required | -| [`llm.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | -| [`llm.stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. 
| `False` | Recommended | -| [`llm.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | -| [`llm.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| [`llm.vendor`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. | `openai`; `microsoft` | Recommended | - -**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. - - -### Response attributes - -Attributes for chat completion responses SHOULD follow these conventions: - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.openai.created`](../attributes-registry/llm.md) | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | Recommended | -| [`llm.openai.seed`](../attributes-registry/llm.md) | int | Seed used in request to improve determinism. | `1234` | Recommended | -| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | -| [`llm.response.id`](../attributes-registry/llm.md) | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. 
| `280` | Recommended | - - -## OpenAI Span Events - -In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. -Because OpenAI uses a more complex prompt structure, these events will be used instead of the generic ones detailed in the [LLM Semantic Conventions](llm-spans.md). - -### Prompt Events - -Prompt event name SHOULD be `llm.openai.prompt`. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | -| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: Required if the prompt role is `tool`. | - - -### Tools Events - -Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.openai.function.description`](../attributes-registry/llm.md) | string | A description of what the function does, used by the model to choose when and how to call the function. | `Gets the current weather for a location` | Required | -| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called. | `get_weather` | Required | -| [`llm.openai.function.parameters`](../attributes-registry/llm.md) | string | JSON-encoded string of the parameter object for the function. 
| `{"type": "object", "properties": {}}` | Required | -| [`llm.openai.tool_call.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. | `function` | Required | - - -### Choice Events - -Recording details about Choices in each response MAY be included as -Span Events. - -Choice event name SHOULD be `llm.openai.choice`. - -If there is more than one `choice`, separate events SHOULD be used. - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| [`llm.openai.choice.type`](../attributes-registry/llm.md) | string | The type of the choice, either `delta` or `message`. | `message` | Required | -| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| [`llm.openai.function.arguments`](../attributes-registry/llm.md) | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Conditionally Required: [1] | -| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called. | `get_weather` | Conditionally Required: [2] | -| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | -| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: [3] | -| [`llm.openai.tool_call.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. 
| `function` | Conditionally Required: [4] | -| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | - -**[1]:** Required if the choice is the result of a tool call of type `function`. - -**[2]:** Required if the choice is the result of a tool call of type `function`. - -**[3]:** Required if the choice is the result of a tool call. - -**[4]:** Required if the choice is the result of a tool call. - - -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/attributes-registry/llm.md b/docs/attributes-registry/llm.md index 8c203ba211..5dfb91d272 100644 --- a/docs/attributes-registry/llm.md +++ b/docs/attributes-registry/llm.md @@ -25,11 +25,11 @@ |---|---|---|---| | `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | | `llm.request.model` | string | The name of the LLM a request is being made to. | `gpt-4` | -| `llm.stop_sequences` | string | Array of strings the LLM uses as a stop sequence. | `stop1` | -| `llm.stream` | boolean | Whether the LLM responds with a stream. | `False` | -| `llm.temperature` | double | The temperature setting for the LLM request. | `0.0` | -| `llm.top_p` | double | The top_p sampling setting for the LLM request. | `1.0` | -| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. | `openai` | +| `llm.request.stop_sequences` | string | Array of strings the LLM uses as a stop sequence. | `stop1` | +| `llm.request.stream` | boolean | Whether the LLM responds with a stream. | `False` | +| `llm.request.temperature` | double | The temperature setting for the LLM request. | `0.0` | +| `llm.request.top_p` | double | The top_p sampling setting for the LLM request. 
| `1.0` | +| `llm.request.vendor` | string | The name of the LLM foundation model vendor, if applicable. | `openai` | ### Response Attributes @@ -61,14 +61,14 @@ | Attribute | Type | Description | Examples | |---|---|---|---| -| `llm.openai.frequency_penalty` | double | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | -| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | -| `llm.openai.presence_penalty` | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | -| `llm.openai.response_format` | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | -| `llm.openai.seed` | int | Seed used in request to improve determinism. | `1234` | -| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | +| `llm.request.openai.frequency_penalty` | double | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | +| `llm.request.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | +| `llm.request.openai.presence_penalty` | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | +| `llm.request.openai.response_format` | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | +| `llm.request.openai.seed` | int | Seed used in request to improve determinism. | `1234` | +| `llm.request.openai.user` | string | If present, the `user` used in an OpenAI request. 
| `bob` | -`llm.openai.response_format` MUST be one of the following: +`llm.request.openai.response_format` MUST be one of the following: | Value | Description | |---|---| @@ -81,8 +81,8 @@ | Attribute | Type | Description | Examples | |---|---|---|---| -| `llm.openai.created` | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | -| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | `asdf987123` | +| `llm.response.openai.created` | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | +| `llm.response.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | `asdf987123` | ### Event Attributes @@ -92,14 +92,13 @@ |---|---|---|---| | `llm.openai.choice.type` | string | The type of the choice, either `delta` or `message`. | `message` | | `llm.openai.content` | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | -| `llm.openai.finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | | `llm.openai.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | | `llm.openai.function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `Gets the current weather for a location` | | `llm.openai.function.name` | string | The name of the function to be called. | `get_weather` | | `llm.openai.function.parameters` | string | JSON-encoded string of the parameter object for the function. 
| `{"type": "object", "properties": {}}` | | `llm.openai.role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | +| `llm.openai.tool.type` | string | The type of the tool. Currently, only `function` is supported. | `function` | | `llm.openai.tool_call.id` | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | -| `llm.openai.tool_call.type` | string | The type of the tool. Currently, only `function` is supported. | `function` | `llm.openai.choice.type` MUST be one of the following: @@ -117,7 +116,7 @@ | `assistant` | assistant | | `tool` | tool | -`llm.openai.tool_call.type` MUST be one of the following: +`llm.openai.tool.type` MUST be one of the following: | Value | Description | |---|---| diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml index 60912165db..9bb6ee669b 100644 --- a/model/registry/llm.yaml +++ b/model/registry/llm.yaml @@ -5,7 +5,7 @@ groups: brief: > This document defines the attributes used to describe telemetry in the context of LLM (Large Language Models) requests and responses. attributes: - - id: vendor + - id: request.vendor type: string brief: The name of the LLM foundation model vendor, if applicable. examples: 'openai' @@ -20,22 +20,22 @@ groups: brief: The maximum number of tokens the LLM generates for a request. examples: [100] tag: llm-generic-request - - id: temperature + - id: request.temperature type: double brief: The temperature setting for the LLM request. examples: [0.0] tag: llm-generic-request - - id: top_p + - id: request.top_p type: double brief: The top_p sampling setting for the LLM request. examples: [1.0] tag: llm-generic-request - - id: stream + - id: request.stream type: boolean brief: Whether the LLM responds with a stream. 
examples: [false] tag: llm-generic-request - - id: stop_sequences + - id: request.stop_sequences type: string brief: Array of strings the LLM uses as a stop sequence. examples: ["stop1"] @@ -80,27 +80,27 @@ groups: brief: The full response string from an LLM in a response. examples: ['Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!'] tag: llm-generic-events - - id: openai.presence_penalty + - id: request.openai.presence_penalty type: double brief: If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. examples: -0.5 tag: tech-specific-openai-request - - id: openai.frequency_penalty + - id: request.openai.frequency_penalty type: double brief: If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. examples: -0.5 tag: tech-specific-openai-request - - id: openai.logit_bias + - id: request.openai.logit_bias type: string brief: If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request examples: ['{2435:-100, 640:-100}'] tag: tech-specific-openai-request - - id: openai.user + - id: request.openai.user type: string brief: If present, the `user` used in an OpenAI request. examples: ['bob'] tag: tech-specific-openai-request - - id: openai.response_format + - id: request.openai.response_format type: members: - id: text @@ -110,17 +110,17 @@ groups: brief: An object specifying the format that the model must output. Either `text` or `json_object` examples: 'text' tag: tech-specific-openai-request - - id: openai.seed + - id: request.openai.seed type: int brief: Seed used in request to improve determinism. examples: 1234 tag: tech-specific-openai-request - - id: openai.created + - id: response.openai.created type: int brief: The UNIX timestamp (in seconds) if when the completion was created. 
examples: 1677652288 tag: tech-specific-openai-response - - id: openai.system_fingerprint + - id: response.openai.system_fingerprint type: string brief: This fingerprint represents the backend configuration that the model runs with. examples: 'asdf987123' @@ -139,10 +139,13 @@ groups: brief: The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` examples: 'user' tag: tech-specific-openai-events - - id: openai.content - type: string - brief: The content for a given OpenAI response. - examples: 'Why did the developer stop using OpenTelemetry? Because they couldn''t trace their steps!' + - id: openai.tool.type + type: + members: + - id: function + value: 'function' + brief: The type of the tool. Currently, only `function` is supported. + examples: 'function' tag: tech-specific-openai-events - id: openai.function.name type: string @@ -159,28 +162,20 @@ groups: brief: JSON-encoded string of the parameter object for the function. examples: '{"type": "object", "properties": {}}' tag: tech-specific-openai-events - - id: openai.function.arguments - type: string - brief: If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. - examples: '{"type": "object", "properties": {"some":"data"}}' - tag: tech-specific-openai-events - - id: openai.finish_reason + - id: openai.content type: string - brief: The reason the OpenAI model stopped generating tokens for this chunk. - examples: 'stop' + brief: The content for a given OpenAI response. + examples: 'Why did the developer stop using OpenTelemetry? Because they couldn''t trace their steps!' tag: tech-specific-openai-events - id: openai.tool_call.id type: string brief: If role is `tool` or `function`, then this tool call that this message is responding to. 
examples: 'get_current_weather' tag: tech-specific-openai-events - - id: openai.tool_call.type - type: - members: - - id: function - value: 'function' - brief: The type of the tool. Currently, only `function` is supported. - examples: 'function' + - id: openai.function.arguments + type: string + brief: If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. + examples: '{"type": "object", "properties": {"some":"data"}}' tag: tech-specific-openai-events - id: openai.choice.type type: diff --git a/model/trace/llm.yaml b/model/trace/llm.yaml index a4ee102374..4df11e1b5c 100644 --- a/model/trace/llm.yaml +++ b/model/trace/llm.yaml @@ -4,7 +4,7 @@ groups: brief: > A request to an LLM is modeled as a span in a trace. The span name should be a low cardinality value representing the request made to an LLM, like the name of the API endpoint being called. attributes: - - ref: llm.vendor + - ref: llm.request.vendor requirement_level: recommended note: > The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. @@ -14,20 +14,14 @@ groups: The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. 
- ref: llm.request.max_tokens requirement_level: recommended - - ref: llm.temperature + - ref: llm.request.temperature requirement_level: recommended - - ref: llm.top_p + - ref: llm.request.top_p requirement_level: recommended - - ref: llm.stream + - ref: llm.request.stream requirement_level: recommended - - ref: llm.stop_sequences + - ref: llm.request.stop_sequences requirement_level: recommended - - - id: llm.response - type: span - brief: > - These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. - attributes: - ref: llm.response.id requirement_level: recommended - ref: llm.response.model @@ -42,9 +36,13 @@ groups: requirement_level: recommended - ref: llm.usage.total_tokens requirement_level: recommended + events: + - llm.content.prompt + - llm.content.completion - - id: llm.events - type: span + - id: llm.content.prompt + name: llm.content.prompt + type: event brief: > In the lifetime of an LLM span, events for prompts sent and completions received may be created, depending on the configuration of the instrumentation. attributes: @@ -52,6 +50,13 @@ groups: requirement_level: recommended note: > The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. + + - id: llm.content.completion + name: llm.content.completion + type: event + brief: > + In the lifetime of an LLM span, events for prompts sent and completions received may be created, depending on the configuration of the instrumentation. + attributes: - ref: llm.completion requirement_level: recommended note: > @@ -62,7 +67,7 @@ groups: brief: > These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. 
attributes: - - ref: llm.vendor + - ref: llm.request.vendor requirement_level: recommended examples: ['openai', 'microsoft'] tag: tech-specific-openai-request @@ -73,23 +78,23 @@ groups: tag: tech-specific-openai-request - ref: llm.request.max_tokens tag: tech-specific-openai-request - - ref: llm.temperature + - ref: llm.request.temperature tag: tech-specific-openai-request - - ref: llm.top_p + - ref: llm.request.top_p tag: tech-specific-openai-request - - ref: llm.stream + - ref: llm.request.stream tag: tech-specific-openai-request - - ref: llm.stop_sequences + - ref: llm.request.stop_sequences tag: tech-specific-openai-request - - ref: llm.openai.presence_penalty + - ref: llm.request.openai.presence_penalty tag: tech-specific-openai-request - - ref: llm.openai.logit_bias + - ref: llm.request.openai.logit_bias tag: tech-specific-openai-request - - ref: llm.openai.user + - ref: llm.request.openai.user tag: tech-specific-openai-request - - ref: llm.openai.response_format + - ref: llm.request.openai.response_format tag: tech-specific-openai-request - - ref: llm.openai.seed + - ref: llm.request.openai.seed tag: tech-specific-openai-response - ref: llm.response.id tag: tech-specific-openai-response @@ -101,13 +106,18 @@ groups: tag: tech-specific-openai-response - ref: llm.usage.total_tokens tag: tech-specific-openai-response - - ref: llm.openai.created + - ref: llm.response.openai.created tag: tech-specific-openai-response - - ref: llm.openai.system_fingerprint + - ref: llm.response.openai.system_fingerprint tag: tech-sepecifc-openai-response + events: + - llm.content.openai.prompt + - llm.content.openai.tool + - llm.content.openai.completion.choice - - id: llm.openai.prompt - type: span + - id: llm.content.openai.prompt + name: llm.content.openai.prompt + type: event brief: > These are the attributes when instrumenting OpenAI LLM requests and recording prompts in the request. 
attributes: @@ -120,12 +130,13 @@ groups: conditionally_required: > Required if the prompt role is `tool`. - - id: llm.openai.tool - type: span + - id: llm.content.openai.tool + name: llm.content.openai.tool + type: event brief: > These are the attributes when instrumenting OpenAI LLM requests that specify tools (or functions) the LLM can use. attributes: - - ref: llm.openai.tool_call.type + - ref: llm.openai.tool.type requirement_level: required - ref: llm.openai.function.name requirement_level: required @@ -134,8 +145,9 @@ groups: - ref: llm.openai.function.parameters requirement_level: required - - id: llm.openai.choice - type: span + - id: llm.content.openai.completion.choice + name: llm.content.openai.completion.choice + type: event brief: > These are the attributes when instrumenting OpenAI LLM requests and recording choices in the result. attributes: @@ -150,7 +162,7 @@ groups: requirement_level: conditionally_required: > Required if the choice is the result of a tool call. - - ref: llm.openai.tool_call.type + - ref: llm.openai.tool.type requirement_level: conditionally_required: > Required if the choice is the result of a tool call. From 0891f913fb51a61d8baf100582b41b2b2b000346 Mon Sep 17 00:00:00 2001 From: Nir Gazit Date: Fri, 12 Jan 2024 11:53:35 +0100 Subject: [PATCH 4/8] chore: @lmolkova reviews --- docs/ai/README.md | 2 +- docs/ai/llm-spans.md | 99 ++++++++++++++++++++++++++++++++++ docs/ai/openai.md | 114 ++++++++++++++++++++++++++++++++++++++++ model/registry/llm.yaml | 6 +-- model/trace/llm.yaml | 16 +++--- 5 files changed, 225 insertions(+), 12 deletions(-) create mode 100644 docs/ai/llm-spans.md create mode 100644 docs/ai/openai.md diff --git a/docs/ai/README.md b/docs/ai/README.md index f04a867a22..855503f97c 100644 --- a/docs/ai/README.md +++ b/docs/ai/README.md @@ -21,4 +21,4 @@ Technology specific semantic conventions are defined for the following LLM provi * [OpenAI](openai.md): Semantic Conventions for *OpenAI*. 
-[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md new file mode 100644 index 0000000000..19c4162321 --- /dev/null +++ b/docs/ai/llm-spans.md @@ -0,0 +1,99 @@ + + +# Semantic Conventions for LLM requests + +**Status**: [Experimental][DocumentStatus] + + + + + +- [LLM Request attributes](#llm-request-attributes) +- [Configuration](#configuration) +- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies) + + + +A request to an LLM is modeled as a span in a trace. + +The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM. +It MAY be a name of the API endpoint for the LLM being called. + +## Configuration + +Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons: + +1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend. +2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of. +3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application. + +By default, these configurations SHOULD NOT capture prompts and completions. + +## LLM Request attributes + +These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.
+ + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | +| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | +| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | +| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | +| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | + +`llm.request.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `gpt-4` | GPT-4 | +| `gpt-4-32k` | GPT-4 with 32k context window | +| `gpt-3.5-turbo` | GPT-3.5-turbo | +| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window| +| `claude-instant-1` | Claude Instant (latest version) | +| `claude-2` | Claude 2 (latest version) | +| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. | + + +## LLM Response attributes + +These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs.
+ + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| `llm.response.model` | string | The name of the LLM that generated the response. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | +| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | + +`llm.response.finish_reason` MUST be one of the following: + +| Value | Description | +|---|---| +| `stop` | If the model hit a natural stop point or a provided stop sequence. | +| `max_tokens` | If the maximum number of tokens specified in the request was reached. | +| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). | + + +## Events + +In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. + + +| Attribute | Type | Description | Examples | Requirement Level | +| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the prompt is instead captured in an event determined by the specific LLM technology semantic convention.
| `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended | + + + +| Attribute | Type | Description | Examples | Requirement Level | +| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/openai.md b/docs/ai/openai.md new file mode 100644 index 0000000000..4c7acf404a --- /dev/null +++ b/docs/ai/openai.md @@ -0,0 +1,114 @@ + + +# Semantic Conventions for OpenAI Spans + +**Status**: [Experimental][DocumentStatus] + +This document outlines the Semantic Conventions specific to +[OpenAI](https://platform.openai.com/) spans, extending the general semantics +found in the [LLM Semantic Conventions](llm-spans.md). These conventions are +designed to standardize telemetry data for OpenAI interactions, particularly +focusing on the `/chat/completions` endpoint. By following these guidelines, +developers can ensure consistent, meaningful, and easily interpretable telemetry +data across different applications and platforms. + +## Chat Completions + +The span name for OpenAI chat completions SHOULD be `openai.chat` +to maintain consistency and clarity in telemetry data. + +## Request Attributes + +These are the attributes when instrumenting OpenAI LLM requests with the +`/chat/completions` endpoint.
+ + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | +| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | +| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | +| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | +| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | +| `llm.openai.n` | integer | The number of completions to generate. | `1` | Recommended | +| `llm.openai.presence_penalty` | float | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| `llm.openai.frequency_penalty` | float | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request. | `{2435:-100, 640:-100}` | Recommended | +| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | Opt-in | +| `llm.openai.response_format` | string | An object specifying the format that the model must output. 
Either `text` or `json_object` | `text` | Recommended | +| `llm.openai.seed` | integer | Seed used in request to improve determinism. | `1234` | Recommended | + + +## Response attributes + +Attributes for chat completion responses SHOULD follow these conventions: + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | +| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | +| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | +| `llm.openai.created` | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | Recommended | +| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | `asdf987123` | Recommended | + + +## Request Events + +In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. +Because OpenAI uses a more complex prompt structure, these events will be used instead of the generic ones detailed in the [LLM Semantic Conventions](llm-spans.md). + +### Prompt Events + +Prompt event name SHOULD be `llm.openai.prompt`.
+ + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `role` | string | The role of the prompt author; one of `system`, `user`, `assistant`, or `tool`. | `system` | Required | +| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| `tool_call_id` | string | If role is `tool` or `function`, the tool call that this message is responding to. | `get_current_weather` | Conditionally Required: If `role` is `tool`. | + + +### Tools Events + +Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use. + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `type` | string | The type of the tool. Currently, only `function` is supported. | `function` | Required | +| `function.name` | string | The name of the function to be called. | `get_weather` | Required | +| `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required | +| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | + + +### Choice Events + +Details about the choices in each response MAY be recorded as +Span Events. + +Choice event name SHOULD be `llm.openai.choice`. + +If there is more than one `tool_call`, separate events SHOULD be used. + + +| `type` | string | Either `delta` or `message`. | `message` | Required | +|---|---|---|---|---| +| `finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended | +| `role` | string | The assigned role for a given OpenAI response, denoted by ``.
The value for `` starts with 0, where 0 is the first message. | `system` | Required | +| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| `tool_call.id` | string | If exists, the ID of the tool call. | `call_BP08xxEhU60txNjnz3z9R4h9` | Required | +| `tool_call.type` | string | Currently only `function` is supported. | `function` | Required | +| `tool_call.function.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `get_weather_report` | Required | +| `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml index 9bb6ee669b..d45bad3368 100644 --- a/model/registry/llm.yaml +++ b/model/registry/llm.yaml @@ -5,7 +5,7 @@ groups: brief: > This document defines the attributes used to describe telemetry in the context of LLM (Large Language Models) requests and responses. attributes: - - id: request.vendor + - id: system type: string brief: The name of the LLM foundation model vendor, if applicable. examples: 'openai' @@ -30,7 +30,7 @@ groups: brief: The top_p sampling setting for the LLM request. examples: [1.0] tag: llm-generic-request - - id: request.stream + - id: request.is_stream type: boolean brief: Whether the LLM responds with a stream. 
examples: [false] @@ -41,7 +41,7 @@ groups: examples: ["stop1"] tag: llm-generic-request - id: response.id - type: string + type: string[] brief: The unique identifier for the completion. examples: ['chatcmpl-123'] tag: llm-generic-response diff --git a/model/trace/llm.yaml b/model/trace/llm.yaml index 4df11e1b5c..17fe1e709f 100644 --- a/model/trace/llm.yaml +++ b/model/trace/llm.yaml @@ -4,7 +4,7 @@ groups: brief: > A request to an LLM is modeled as a span in a trace. The span name should be a low cardinality value representing the request made to an LLM, like the name of the API endpoint being called. attributes: - - ref: llm.request.vendor + - ref: llm.system requirement_level: recommended note: > The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. @@ -18,7 +18,7 @@ groups: requirement_level: recommended - ref: llm.request.top_p requirement_level: recommended - - ref: llm.request.stream + - ref: llm.request.is_stream requirement_level: recommended - ref: llm.request.stop_sequences requirement_level: recommended @@ -65,9 +65,9 @@ groups: - id: llm.openai type: span brief: > - These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. + A span representing a request to OpenAI's API, providing additional information on top of the generic llm.request. 
attributes: - - ref: llm.request.vendor + - ref: llm.system requirement_level: recommended examples: ['openai', 'microsoft'] tag: tech-specific-openai-request @@ -82,7 +82,7 @@ groups: tag: tech-specific-openai-request - ref: llm.request.top_p tag: tech-specific-openai-request - - ref: llm.request.stream + - ref: llm.request.is_stream tag: tech-specific-openai-request - ref: llm.request.stop_sequences tag: tech-specific-openai-request @@ -119,7 +119,7 @@ groups: name: llm.content.openai.prompt type: event brief: > - These are the attributes when instrumenting OpenAI LLM requests and recording prompts in the request. + This event is fired when a completion request is sent to OpenAI, specifying the prompt that was sent. attributes: - ref: llm.openai.role requirement_level: required @@ -134,7 +134,7 @@ groups: name: llm.content.openai.tool type: event brief: > - These are the attributes when instrumenting OpenAI LLM requests that specify tools (or functions) the LLM can use. + This event is fired when a completion request is sent to OpenAI, specifying tools that the LLM can use. attributes: - ref: llm.openai.tool.type requirement_level: required @@ -149,7 +149,7 @@ groups: name: llm.content.openai.completion.choice type: event brief: > - These are the attributes when instrumenting OpenAI LLM requests and recording choices in the result. + This event is fired when a completion response is returned from OpenAI, specifying one possible completion returned by the LLM.
attributes: - ref: llm.openai.choice.type requirement_level: required From d5a9753ddf682ffb9eefca92aeec8f94202cca2f Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Sun, 28 Jan 2024 12:16:00 -0800 Subject: [PATCH 5/8] Add OpenAI metrics --- docs/ai/README.md | 3 +- docs/ai/openai-metrics.md | 375 +++++++++++++++++++++++++++++++++ model/metrics/llm-metrics.yaml | 109 ++++++++++ model/registry/llm.yaml | 9 + 4 files changed, 495 insertions(+), 1 deletion(-) create mode 100644 docs/ai/openai-metrics.md create mode 100644 model/metrics/llm-metrics.yaml diff --git a/docs/ai/README.md b/docs/ai/README.md index 855503f97c..bf83b94856 100644 --- a/docs/ai/README.md +++ b/docs/ai/README.md @@ -19,6 +19,7 @@ Semantic conventions for LLM operations are defined for the following signals: Technology specific semantic conventions are defined for the following LLM providers: -* [OpenAI](openai.md): Semantic Conventions for *OpenAI*. +* [OpenAI](openai.md): Semantic Conventions for *OpenAI* spans. +* [OpenAI Metrics](openai-metrics.md): Semantic Conventions for *OpenAI* metrics. [DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md \ No newline at end of file diff --git a/docs/ai/openai-metrics.md b/docs/ai/openai-metrics.md new file mode 100644 index 0000000000..5b231da602 --- /dev/null +++ b/docs/ai/openai-metrics.md @@ -0,0 +1,375 @@ + + +# Semantic Conventions for OpenAI Metrics + +**Status**: [Experimental][DocumentStatus] + +This document defines semantic conventions for OpenAI client metrics.
+ + + + + +- [Chat completions](#chat-completions) + * [Metric: `openai.chat_completions.tokens`](#metric-openaichat_completionstokens) + * [Metric: `openai.chat_completions.choices`](#metric-openaichat_completionschoices) + * [Metric: `openai.chat_completions.duration`](#metric-openaichat_completionsduration) +- [Embeddings](#embeddings) + * [Metric: `openai.embeddings.tokens`](#metric-openaiembeddingstokens) + * [Metric: `openai.embeddings.vector_size`](#metric-openaiembeddingsvector_size) + * [Metric: `openai.embeddings.duration`](#metric-openaiembeddingsduration) +- [Image generation](#image-generation) + * [Metric: `openai.image_generations.duration`](#metric-openaiimage_generationsduration) + + + +## Chat completions + +### Metric: `openai.chat_completions.tokens` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.chat_completions.tokens` | Counter | `token` | Number of tokens used in prompt and completions. | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`llm.usage.token_type`](../attributes-registry/llm.md) | string | The type of token. | `prompt` | Recommended | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. 
[2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. + +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + +`llm.usage.token_type` MUST be one of the following: + +| Value | Description | +|---|---| +| `prompt` | prompt | +| `completion` | completion | + + +### Metric: `openai.chat_completions.choices` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. 
+ + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.chat_completions.choices` | Counter | `choice` | Number of choices returned by chat completions call | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. + +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. 
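The attribute rules above for `llm.openai.chat_completions.choices` can be sketched as a small helper. This is a minimal illustration rather than part of the convention: the function name `choice_metric_attributes` and its parameters are hypothetical, and only the attribute keys and the `_OTHER` fallback value come from this document.

```python
from typing import Dict, Optional


def choice_metric_attributes(
    model: str,
    server_address: str,
    finish_reason: Optional[str] = None,
    failed: bool = False,
    error_type: Optional[str] = None,
) -> Dict[str, str]:
    """Build the attribute set for one choices data point (hypothetical helper)."""
    # Required attributes.
    attributes = {
        "llm.response.model": model,
        "server.address": server_address,
    }
    # Recommended: included only when a finish reason is known.
    if finish_reason is not None:
        attributes["llm.response.finish_reason"] = finish_reason
    # Conditionally required: set only when the operation ended in error.
    # `_OTHER` is the fallback when no more specific value is available.
    if failed:
        attributes["error.type"] = error_type or "_OTHER"
    return attributes
```

Leaving `error.type` out entirely on success, instead of setting an empty value, matches the guidance that instrumentations SHOULD NOT set it for successful operations.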
+ +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + + + +### Metric: `openai.chat_completions.duration` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. + +This metric SHOULD be specified with +[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/metrics/api.md#instrument-advice) +of `[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]`. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.chat_completions.duration` | Histogram | `s` | Duration of chat completion operation | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. 
[2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. + +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + + +## Embeddings + +### Metric: `openai.embeddings.tokens` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.embeddings.tokens` | Counter | `token` | Number of tokens used in prompt and completions. | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. 
[1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`llm.usage.token_type`](../attributes-registry/llm.md) | string | The type of token. | `prompt` | Recommended | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. + +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. 
| + +`llm.usage.token_type` MUST be one of the following: + +| Value | Description | +|---|---| +| `prompt` | prompt | +| `completion` | completion | + + +### Metric: `openai.embeddings.vector_size` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.embeddings.vector_size` | Counter | `element` | The size of the returned vector. | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.
+ +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + + +### Metric: `openai.embeddings.duration` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. + +This metric SHOULD be specified with +[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/metrics/api.md#instrument-advice) +of `[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]`. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.embeddings.duration` | Histogram | `s` | Duration of embeddings operation | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Conditionally Required: if the operation ended in error | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. 
| `gpt-4-0613` | Required | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. + +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md + +## Image generation + +### Metric: `openai.image_generations.duration` + +**Status**: [Experimental][DocumentStatus] + +This metric is required. 
+ +This metric SHOULD be specified with +[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/metrics/api.md#instrument-advice) +of `[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]`. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `llm.openai.image_generations.duration` | Histogram | `s` | Duration of image generations operation | + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Recommended | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | + +**[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. +Instrumentations SHOULD document the list of errors they report. + +The cardinality of `error.type` within one instrumentation library SHOULD be low. +Telemetry consumers that aggregate data from multiple instrumentation libraries and applications +should be prepared for `error.type` to have high cardinality at query time when no +additional filters are applied. + +If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`. 
+ +If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes), +it's RECOMMENDED to: + +* Use a domain-specific attribute +* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file diff --git a/model/metrics/llm-metrics.yaml b/model/metrics/llm-metrics.yaml new file mode 100644 index 0000000000..2ca1ff3b41 --- /dev/null +++ b/model/metrics/llm-metrics.yaml @@ -0,0 +1,109 @@ +groups: + - id: metric.openai.chat_completions.tokens + type: metric + metric_name: llm.openai.chat_completions.tokens + brief: "Number of tokens used in prompt and completions." 
+ instrument: counter + unit: "token" + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: llm.usage.token_type + - ref: server.address + requirement_level: required + - id: metric.openai.chat_completions.choices + type: metric + metric_name: llm.openai.chat_completions.choices + brief: "Number of choices returned by chat completions call" + instrument: counter + unit: "choice" + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: llm.response.finish_reason + - ref: server.address + requirement_level: required + - id: metric.openai.chat_completions.duration + type: metric + metric_name: llm.openai.chat_completions.duration + brief: "Duration of chat completion operation" + instrument: histogram + unit: 's' + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: llm.response.finish_reason + - ref: server.address + requirement_level: required + - id: metric.openai.embeddings.tokens + type: metric + metric_name: llm.openai.embeddings.tokens + brief: "Number of tokens used in prompt and completions." + instrument: counter + unit: "token" + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: llm.usage.token_type + - ref: server.address + requirement_level: required + - id: metric.openai.embeddings.vector_size + type: metric + metric_name: llm.openai.embeddings.vector_size + brief: "The size of the returned vector."
+ instrument: counter + unit: "element" + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: server.address + requirement_level: required + - id: metric.openai.embeddings.duration + type: metric + metric_name: llm.openai.embeddings.duration + brief: "Duration of embeddings operation" + instrument: histogram + unit: 's' + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in error" + - ref: server.address + requirement_level: required + - id: metric.openai.image_generations.duration + type: metric + metric_name: llm.openai.image_generations.duration + brief: "Duration of image generations operation" + instrument: histogram + unit: 's' + stability: experimental + attributes: + - ref: llm.response.model + requirement_level: required + - ref: error.type + conditionally_required: "if the operation ended in error" + - ref: server.address + requirement_level: required \ No newline at end of file diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml index d45bad3368..1f59626ef4 100644 --- a/model/registry/llm.yaml +++ b/model/registry/llm.yaml @@ -55,6 +55,15 @@ groups: brief: The reason the model stopped generating tokens. examples: ['stop'] tag: llm-generic-response + - id: usage.token_type + type: + members: + - id: prompt + value: 'prompt' + - id: completion + value: 'completion' + brief: The type of token. + examples: ['prompt'] - id: usage.prompt_tokens type: int brief: The number of tokens used in the LLM prompt. 
From 0ef1c1b190a811520b0ce332ef5e5c4e1847e0f0 Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Mon, 29 Jan 2024 02:16:02 +0000 Subject: [PATCH 6/8] Fix linting errors --- docs/ai/README.md | 2 +- docs/ai/llm-spans.md | 7 ++++--- docs/ai/openai-metrics.md | 5 +---- docs/ai/openai.md | 26 +++++++++++++------------- 4 files changed, 19 insertions(+), 21 deletions(-) diff --git a/docs/ai/README.md b/docs/ai/README.md index bf83b94856..d5d51dcd75 100644 --- a/docs/ai/README.md +++ b/docs/ai/README.md @@ -22,4 +22,4 @@ Technology specific semantic conventions are defined for the following LLM provi * [OpenAI](openai.md): Semantic Conventions for *OpenAI* spans. * [OpenAI Metrics](openai-metrics.md): Semantic Conventions for *OpenAI* metrics. -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md \ No newline at end of file +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md index 19c4162321..12884056f9 100644 --- a/docs/ai/llm-spans.md +++ b/docs/ai/llm-spans.md @@ -10,9 +10,10 @@ linkTitle: LLM Calls -- [LLM Request attributes](#llm-request-attributes) - [Configuration](#configuration) -- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies) +- [LLM Request attributes](#llm-request-attributes) +- [LLM Response attributes](#llm-response-attributes) +- [Events](#events) @@ -96,4 +97,4 @@ In the lifetime of an LLM span, an event for prompts sent and completions receiv | `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. 
If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.| `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended | -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md diff --git a/docs/ai/openai-metrics.md b/docs/ai/openai-metrics.md index 5b231da602..656148318a 100644 --- a/docs/ai/openai-metrics.md +++ b/docs/ai/openai-metrics.md @@ -124,7 +124,6 @@ it's RECOMMENDED to: | `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | - ### Metric: `openai.chat_completions.duration` **Status**: [Experimental][DocumentStatus] @@ -320,8 +319,6 @@ it's RECOMMENDED to: | `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md - ## Image generation ### Metric: `openai.image_generations.duration` @@ -372,4 +369,4 @@ it's RECOMMENDED to: | `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. 
| -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md diff --git a/docs/ai/openai.md b/docs/ai/openai.md index 4c7acf404a..8105be0f29 100644 --- a/docs/ai/openai.md +++ b/docs/ai/openai.md @@ -6,22 +6,22 @@ linkTitle: OpenAI **Status**: [Experimental][DocumentStatus] -This document outlines the Semantic Conventions specific to -[OpenAI](https://platform.openai.com/) spans, extending the general semantics -found in the [LLM Semantic Conventions](llm-spans.md). These conventions are -designed to standardize telemetry data for OpenAI interactions, particularly -focusing on the `/chat/completions` endpoint. By following to these guidelines, +This document outlines the Semantic Conventions specific to +[OpenAI](https://platform.openai.com/) spans, extending the general semantics +found in the [LLM Semantic Conventions](llm-spans.md). These conventions are +designed to standardize telemetry data for OpenAI interactions, particularly +focusing on the `/chat/completions` endpoint. By following these guidelines, developers can ensure consistent, meaningful, and easily interpretable telemetry data across different applications and platforms. ## Chat Completions -The span name for OpenAI chat completions SHOULD be `openai.chat` +The span name for OpenAI chat completions SHOULD be `openai.chat` to maintain consistency and clarity in telemetry data. ## Request Attributes -These are the attributes when instrumenting OpenAI LLM requests with the +These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. @@ -67,7 +67,7 @@ Because OpenAI uses a more complex prompt structure, these events will be used i ### Prompt Events -Prompt event name SHOULD be `llm.openai.prompt`.
+Prompt event name SHOULD be `llm.openai.prompt`. | Attribute | Type | Description | Examples | Requirement Level | @@ -87,15 +87,15 @@ Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or func | `type` | string | They type of the tool. Currently, only `function` is supported. | `function` | Required | | `function.name` | string | The name of the function to be called. | `get_weather` | Required ! | `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required | -| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | +| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | ### Choice Events -Recording details about Choices in each response MAY be included as -Span Events. +Recording details about Choices in each response MAY be included as +Span Events. -Choice event name SHOULD be `llm.openai.choice`. +Choice event name SHOULD be `llm.openai.choice`. If there is more than one `tool_call`, separate events SHOULD be used. @@ -111,4 +111,4 @@ If there is more than one `tool_call`, separate events SHOULD be used. | `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. 
| `{"type": "object", "properties": {"some":"data"}}` | Required | -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md \ No newline at end of file +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md From fd57c6cf6fb1fd05c02591e26d38654d84532a69 Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Mon, 29 Jan 2024 02:27:49 +0000 Subject: [PATCH 7/8] Fix yamllint errors --- model/metrics/llm-metrics.yaml | 6 ++--- model/registry/llm.yaml | 6 ++--- model/trace/llm.yaml | 40 +++++++++++++++++++++++----------- 3 files changed, 33 insertions(+), 19 deletions(-) diff --git a/model/metrics/llm-metrics.yaml b/model/metrics/llm-metrics.yaml index 2ca1ff3b41..75db1e31ff 100644 --- a/model/metrics/llm-metrics.yaml +++ b/model/metrics/llm-metrics.yaml @@ -102,8 +102,8 @@ groups: stability: experimental attributes: - ref: llm.response.model - requirement_level: required - - ref: error.type + requirement_level: conditionally_required: "if the operation ended in error" + - ref: error.type - ref: server.address - requirement_level: required \ No newline at end of file + requirement_level: required diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml index 1f59626ef4..31bf953b94 100644 --- a/model/registry/llm.yaml +++ b/model/registry/llm.yaml @@ -56,7 +56,7 @@ groups: examples: ['stop'] tag: llm-generic-response - id: usage.token_type - type: + type: members: - id: prompt value: 'prompt' @@ -183,7 +183,7 @@ groups: tag: tech-specific-openai-events - id: openai.function.arguments type: string - brief: If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. + brief: If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. 
examples: '{"type": "object", "properties": {"some":"data"}}' tag: tech-specific-openai-events - id: openai.choice.type @@ -195,4 +195,4 @@ groups: value: 'message' brief: The type of the choice, either `delta` or `message`. examples: 'message' - tag: tech-specific-openai-events \ No newline at end of file + tag: tech-specific-openai-events diff --git a/model/trace/llm.yaml b/model/trace/llm.yaml index 17fe1e709f..1c844732b0 100644 --- a/model/trace/llm.yaml +++ b/model/trace/llm.yaml @@ -11,7 +11,9 @@ groups: - ref: llm.request.model requirement_level: required note: > - The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + The name of the LLM a request is being made to. If the LLM is supplied by a vendor, + then the value must be the exact name of the model requested. If the LLM is a fine-tuned + custom model, the value should have a more specific name than the base model that's been fine-tuned. - ref: llm.request.max_tokens requirement_level: recommended - ref: llm.request.temperature @@ -27,7 +29,9 @@ groups: - ref: llm.response.model requirement_level: required note: > - The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + The name of the LLM a response is being made to. If the LLM is supplied by a vendor, + then the value must be the exact name of the model actually used. If the LLM is a + fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. 
- ref: llm.response.finish_reason requirement_level: recommended - ref: llm.usage.prompt_tokens @@ -44,13 +48,16 @@ groups: name: llm.content.prompt type: event brief: > - In the lifetime of an LLM span, events for prompts sent and completions received may be created, depending on the configuration of the instrumentation. + In the lifetime of an LLM span, events for prompts sent and completions received + may be created, depending on the configuration of the instrumentation. attributes: - ref: llm.prompt requirement_level: recommended note: > - The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. - + The full prompt string sent to an LLM in a request. If the LLM accepts a more + complex input like a JSON object, this field is blank, and the response is + instead captured in an event determined by the specific LLM technology semantic convention. + - id: llm.content.completion name: llm.content.completion type: event @@ -60,7 +67,11 @@ groups: - ref: llm.completion requirement_level: recommended note: > - The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. + The full response string from an LLM. If the LLM responds with a more + complex output like a JSON object made up of several pieces (such as OpenAI's message choices), + this field is the content of the response. 
If the LLM produces multiple responses, then this + field is left blank, and each response is instead captured in an event determined by the specific + LLM technology semantic convention. - id: llm.openai type: span @@ -74,7 +85,10 @@ groups: - ref: llm.request.model requirement_level: required note: > - The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + The name of the LLM a request is being made to. If the LLM is supplied by a + vendor, then the value must be the exact name of the model requested. If the + LLM is a fine-tuned custom model, the value should have a more specific name + than the base model that's been fine-tuned. tag: tech-specific-openai-request - ref: llm.request.max_tokens tag: tech-specific-openai-request @@ -126,7 +140,7 @@ groups: - ref: llm.openai.content requirement_level: required - ref: llm.openai.tool_call.id - requirement_level: + requirement_level: conditionally_required: > Required if the prompt role is `tool`. @@ -159,18 +173,18 @@ groups: - ref: llm.openai.content requirement_level: required - ref: llm.openai.tool_call.id - requirement_level: + requirement_level: conditionally_required: > Required if the choice is the result of a tool call. - ref: llm.openai.tool.type - requirement_level: + requirement_level: conditionally_required: > Required if the choice is the result of a tool call. - ref: llm.openai.function.name - requirement_level: + requirement_level: conditionally_required: > Required if the choice is the result of a tool call of type `function`. - ref: llm.openai.function.arguments - requirement_level: + requirement_level: conditionally_required: > - Required if the choice is the result of a tool call of type `function`. 
\ No newline at end of file + Required if the choice is the result of a tool call of type `function`. From c80b80c329faaf2b17cf74ec04bb4b5b1c5b5b07 Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Mon, 29 Jan 2024 05:04:35 +0000 Subject: [PATCH 8/8] Regenerate markdown based on yaml model --- docs/ai/llm-spans.md | 80 ++++++++++------------ docs/ai/openai-metrics.md | 2 +- docs/ai/openai.md | 113 +++++++++++++++++--------------- docs/attributes-registry/llm.md | 6 +- 4 files changed, 96 insertions(+), 105 deletions(-) diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md index 12884056f9..894d464786 100644 --- a/docs/ai/llm-spans.md +++ b/docs/ai/llm-spans.md @@ -12,7 +12,6 @@ linkTitle: LLM Calls - [Configuration](#configuration) - [LLM Request attributes](#llm-request-attributes) -- [LLM Response attributes](#llm-response-attributes) - [Events](#events) @@ -36,65 +35,52 @@ By default, these configurations SHOULD NOT capture prompts and completions. These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs. - + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | -| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | -| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| `llm.temperature` | float | The temperature setting for the LLM request. 
| `0.0` | Recommended | -| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | -| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | - -`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. - -| Value | Description | -|---|---| -| `gpt-4` | GPT-4 | -| `gpt-4-32k` | GPT-4 with 32k context window | -| `gpt-3.5-turbo` | GPT-3.5-turbo | -| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window| -| `claude-instant-1` | Claude Instant (latest version) | -| `claude-2` | Claude 2 (latest version) | -| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. | +| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended | +| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required | +| [`llm.request.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | +| [`llm.request.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | +| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. 
| `stop` | Recommended | +| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [2] | `gpt-4-0613` | Required | +| [`llm.system`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [3] | `openai` | Recommended | +| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | + +**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + +**[2]:** The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + +**[3]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. -## LLM Response attributes +## Events + +In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. -These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs. 
+ +The event name MUST be `llm.content.prompt`. - | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | -| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | -| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | - -`llm.response.finish_reason` MUST be one of the following: - -| Value | Description | -|---|---| -| `stop` | If the model hit a natural stop point or a provided stop sequence. | -| `max_tokens` | If the maximum number of tokens specified in the request was reached. | -| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). | - +| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended | -## Events +**[1]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. 
+ -In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. + +The event name MUST be `llm.content.completion`. - | Attribute | Type | Description | Examples | Requirement Level | -| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended | - +|---|---|---|---|---| +| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | Recommended | - -| Attribute | Type | Description | Examples | Requirement Level | -| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.| `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended | +**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. 
If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. [DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md diff --git a/docs/ai/openai-metrics.md b/docs/ai/openai-metrics.md index 656148318a..bf39882d9e 100644 --- a/docs/ai/openai-metrics.md +++ b/docs/ai/openai-metrics.md @@ -341,7 +341,7 @@ of `[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| | [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | Recommended | -| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Required | +| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. | `gpt-4-0613` | Conditionally Required: if the operation ended in error | | [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [2] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Required | **[1]:** The `error.type` SHOULD be predictable and SHOULD have low cardinality. diff --git a/docs/ai/openai.md b/docs/ai/openai.md index 8105be0f29..001d751ce5 100644 --- a/docs/ai/openai.md +++ b/docs/ai/openai.md @@ -24,40 +24,30 @@ to maintain consistency and clarity in telemetry data. These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint. 
- + | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended | -| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required | -| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | -| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended | -| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended | -| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended | -| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended | -| `llm.openai.n` | integer | The number of completions to generate. | `1` | Recommended | -| `llm.openai.presence_penalty` | float | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | -| `llm.openai.frequency_penalty` | float | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | -| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request. | `{2435:-100, 640:-100}` | Recommended | -| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | Opt-in | -| `llm.openai.response_format` | string | An object specifying the format that the model must output. 
Either `text` or `json_object` | `text` | Recommended | -| `llm.openai.seed` | integer | Seed used in request to improve determinism. | `1234` | Recommended | - - -## Response attributes - -Attributes for chat completion responses SHOULD follow these conventions: - - -| Attribute | Type | Description | Examples | Requirement Level | -|---|---|---|---|---| -| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended | -| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required | -| `llm.response.finish_reason` | string | The reason the model stopped generating tokens | `stop` | Recommended | -| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended | -| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | -| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | -| `llm.openai.created` | int | The UNIX timestamp (in seconds) if when the completion was created. | `1677652288` | Recommended | -| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | asdf987123 | Recommended | +| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended | +| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended | +| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. 
[1] | `gpt-4` | Required | +| [`llm.request.openai.logit_bias`](../attributes-registry/llm.md) | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request | `{2435:-100, 640:-100}` | Recommended | +| [`llm.request.openai.presence_penalty`](../attributes-registry/llm.md) | double | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended | +| [`llm.request.openai.response_format`](../attributes-registry/llm.md) | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | Recommended | +| [`llm.request.openai.seed`](../attributes-registry/llm.md) | int | Seed used in request to improve determinism. | `1234` | Recommended | +| [`llm.request.openai.user`](../attributes-registry/llm.md) | string | If present, the `user` used in an OpenAI request. | `bob` | Recommended | +| [`llm.request.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended | +| [`llm.request.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended | +| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended | +| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended | +| [`llm.response.openai.created`](../attributes-registry/llm.md) | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | Recommended | +| [`llm.response.openai.system_fingerprint`](../attributes-registry/llm.md) | string | This fingerprint represents the backend configuration that the model runs with.
| `asdf987123` | Recommended | +| [`llm.system`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. | `openai`; `microsoft` | Recommended | +| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended | +| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended | +| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended | + +**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. ## Request Events @@ -67,27 +57,31 @@ Because OpenAI uses a more complex prompt structure, these events will be used i ### Prompt Events -Prompt event name SHOULD be `llm.openai.prompt`. +Prompt event name SHOULD be `llm.content.openai.prompt`. + + +The event name MUST be `llm.content.openai.prompt`. - | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `system` | Required | -| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| `tool_call_id` | string | If role is `tool` or `function`, then this tool call that this message is responding to. | `get_current_weather` | Conditionally Required: If `role` is `tool`. 
| +| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | +| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, the tool call that this message is responding to. | `get_current_weather` | Conditionally Required: Required if the prompt role is `tool`. | ### Tools Events -Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use. +Tools event name SHOULD be `llm.content.openai.tool`, specifying potential tools or functions the LLM can use. + + +The event name MUST be `llm.content.openai.tool`. - | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `type` | string | They type of the tool. Currently, only `function` is supported. | `function` | Required | -| `function.name` | string | The name of the function to be called. | `get_weather` | Required ! -| `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required | -| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | +| [`llm.openai.function.description`](../attributes-registry/llm.md) | string | A description of what the function does, used by the model to choose when and how to call the function. | `Gets the current weather for a location` | Required | +| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called.
| `get_weather` | Required | +| [`llm.openai.function.parameters`](../attributes-registry/llm.md) | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required | +| [`llm.openai.tool.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. | `function` | Required | ### Choice Events @@ -95,20 +89,31 @@ Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or func Recording details about Choices in each response MAY be included as Span Events. -Choice event name SHOULD be `llm.openai.choice`. +Choice event name SHOULD be `llm.content.openai.choice`. + +If there is more than one `choice`, separate events SHOULD be used. -If there is more than one `tool_call`, separate events SHOULD be used. + +The event name MUST be `llm.content.openai.completion.choice`. - -| `type` | string | Either `delta` or `message`. | `message` | Required | +| Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended | -| `role` | string | The assigned role for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `system` | Required | -| `content` | string | The content for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | -| `tool_call.id` | string | If exists, the ID of the tool call. | `call_BP08xxEhU60txNjnz3z9R4h9` | Required | -| `tool_call.type` | string | Currently only `function` is supported. | `function` | Required | -| `tool_call.function.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by ``. 
The value for `` starts with 0, where 0 is the first message. | `get_weather_report` | Required | -| `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required | +| [`llm.openai.choice.type`](../attributes-registry/llm.md) | string | The type of the choice, either `delta` or `message`. | `message` | Required | +| [`llm.openai.content`](../attributes-registry/llm.md) | string | The content for a given OpenAI response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required | +| [`llm.openai.function.arguments`](../attributes-registry/llm.md) | string | If exists, the arguments to call a function with for a given OpenAI response, denoted by ``. The value for `` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Conditionally Required: [1] | +| [`llm.openai.function.name`](../attributes-registry/llm.md) | string | The name of the function to be called. | `get_weather` | Conditionally Required: [2] | +| [`llm.openai.role`](../attributes-registry/llm.md) | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `user` | Required | +| [`llm.openai.tool.type`](../attributes-registry/llm.md) | string | The type of the tool. Currently, only `function` is supported. | `function` | Conditionally Required: [3] | +| [`llm.openai.tool_call.id`](../attributes-registry/llm.md) | string | If role is `tool` or `function`, the tool call that this message is responding to. | `get_current_weather` | Conditionally Required: [4] | +| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens.
| `stop` | Recommended | + +**[1]:** Required if the choice is the result of a tool call of type `function`. + +**[2]:** Required if the choice is the result of a tool call of type `function`. + +**[3]:** Required if the choice is the result of a tool call. + +**[4]:** Required if the choice is the result of a tool call. [DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md diff --git a/docs/attributes-registry/llm.md b/docs/attributes-registry/llm.md index 5dfb91d272..e9a66e9021 100644 --- a/docs/attributes-registry/llm.md +++ b/docs/attributes-registry/llm.md @@ -23,13 +23,13 @@ | Attribute | Type | Description | Examples | |---|---|---|---| +| `llm.request.is_stream` | boolean | Whether the LLM responds with a stream. | `False` | | `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | | `llm.request.model` | string | The name of the LLM a request is being made to. | `gpt-4` | | `llm.request.stop_sequences` | string | Array of strings the LLM uses as a stop sequence. | `stop1` | -| `llm.request.stream` | boolean | Whether the LLM responds with a stream. | `False` | | `llm.request.temperature` | double | The temperature setting for the LLM request. | `0.0` | | `llm.request.top_p` | double | The top_p sampling setting for the LLM request. | `1.0` | -| `llm.request.vendor` | string | The name of the LLM foundation model vendor, if applicable. | `openai` | +| `llm.system` | string | The name of the LLM foundation model vendor, if applicable. | `openai` | ### Response Attributes @@ -38,7 +38,7 @@ | Attribute | Type | Description | Examples | |---|---|---|---| | `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | -| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | +| `llm.response.id` | string[] | The unique identifier for the completion. 
| `[chatcmpl-123]` | | `llm.response.model` | string | The name of the LLM a response is being made to. | `gpt-4-0613` | | `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | | `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` |