From 87ec4e6e3b56effd895775f4f8f63e412bde7f75 Mon Sep 17 00:00:00 2001 From: Drew Robbins Date: Thu, 3 Oct 2024 00:14:55 +0900 Subject: [PATCH 1/5] Add system specific conventions for OpenAI (#1385) Co-authored-by: Liudmila Molkova --- .chloggen/add_openai_specific_attributes.yaml | 4 + docs/attributes-registry/gen-ai.md | 27 +++ docs/gen-ai/openai.md | 190 ++++++++++++++++++ model/gen-ai/metrics.yaml | 6 + model/gen-ai/registry.yaml | 48 +++++ model/gen-ai/spans.yaml | 34 +++- 6 files changed, 306 insertions(+), 3 deletions(-) create mode 100644 .chloggen/add_openai_specific_attributes.yaml create mode 100644 docs/gen-ai/openai.md diff --git a/.chloggen/add_openai_specific_attributes.yaml b/.chloggen/add_openai_specific_attributes.yaml new file mode 100644 index 0000000000..52d41f7b95 --- /dev/null +++ b/.chloggen/add_openai_specific_attributes.yaml @@ -0,0 +1,4 @@ +change_type: enhancement +component: gen_ai +note: Add system specific conventions for OpenAI. +issues: [1370] diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md index fd8e8ee117..0dc935e462 100644 --- a/docs/attributes-registry/gen-ai.md +++ b/docs/attributes-registry/gen-ai.md @@ -7,6 +7,7 @@ # Gen AI - [GenAI Attributes](#genai-attributes) +- [OpenAI Attributes](#openai-attributes) - [Deprecated GenAI Attributes](#deprecated-genai-attributes) ## GenAI Attributes @@ -73,6 +74,32 @@ If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. | `input` | Input tokens (prompt, input, etc.) | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `output` | Output tokens (completion, response, etc.) | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +## OpenAI Attributes + +This group defines attributes for OpenAI.
+ +| Attribute | Type | Description | Examples | Stability | +| --------------------------------------- | ------ | --------------------------------------------------------------------- | ------------------ | ---------------------------------------------------------------- | +| `gen_ai.openai.request.response_format` | string | The response format that is requested. | `json` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.openai.request.seed` | int | Requests with the same seed value are more likely to return the same result. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.openai.request.service_tier` | string | The service tier requested. May be a specific tier, default, or auto. | `auto`; `default` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.openai.response.service_tier` | string | The service tier used for the response. | `scale`; `default` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +`gen_ai.openai.request.response_format` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +| ------------- | --------------------------- | ---------------------------------------------------------------- | +| `json_object` | JSON object response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `json_schema` | JSON schema response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `text` | Text response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +`gen_ai.openai.request.service_tier` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+ +| Value | Description | Stability | +| --------- | -------------------------------------------------------------------- | ---------------------------------------------------------------- | +| `auto` | The system will utilize scale tier credits until they are exhausted. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `default` | The system will utilize the default scale tier. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + ## Deprecated GenAI Attributes Describes deprecated `gen_ai` attributes. diff --git a/docs/gen-ai/openai.md b/docs/gen-ai/openai.md new file mode 100644 index 0000000000..0d797ed20b --- /dev/null +++ b/docs/gen-ai/openai.md @@ -0,0 +1,190 @@ + + +# Semantic Conventions for OpenAI operations + +**Status**: [Experimental][DocumentStatus] + + + + + +- [OpenAI Span attributes](#openai-span-attributes) +- [OpenAI Metric attributes](#openai-metric-attributes) + - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) + - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) + + + +The Semantic Conventions for [OpenAI](https://openai.com/) extend and override the semantic conventions +for [Gen AI Spans](gen-ai-spans.md) and [Gen AI Metrics](gen-ai-metrics.md). + +`gen_ai.system` MUST be set to `"openai"`. + +## OpenAI Span attributes + +These attributes track input data and metadata for a request to an OpenAI model. The attributes include general Generative AI +attributes and ones specific to OpenAI. + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed.
[1] | `chat`; `text_completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. [2] | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [3] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.openai.request.response_format`](/docs/attributes-registry/gen-ai.md) | string | The response format that is requested. | `json` | `Conditionally Required` if the request includes a response_format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.openai.request.seed`](/docs/attributes-registry/gen-ai.md) | int | Requests with the same seed value are more likely to return the same result. | `100` | `Conditionally Required` if the request includes a seed | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.openai.request.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier requested. May be a specific tier, default, or auto. | `auto`; `default` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.openai.response.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier used for the response.
| `scale`; `default` | `Conditionally Required` [6] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | GenAI server port. [7] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.request.frequency_penalty`](/docs/attributes-registry/gen-ai.md) | double | The frequency penalty setting for the GenAI request. | `0.1` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.max_tokens`](/docs/attributes-registry/gen-ai.md) | int | The maximum number of tokens the model generates for a request. | `100` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.presence_penalty`](/docs/attributes-registry/gen-ai.md) | double | The presence penalty setting for the GenAI request. | `0.1` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.stop_sequences`](/docs/attributes-registry/gen-ai.md) | string[] | List of sequences that the model will use to stop generating further tokens. | `["forest", "lived"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.temperature`](/docs/attributes-registry/gen-ai.md) | double | The temperature setting for the GenAI request. | `0.0` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.top_p`](/docs/attributes-registry/gen-ai.md) | double | The top_p sampling setting for the GenAI request. | `1.0` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.finish_reasons`](/docs/attributes-registry/gen-ai.md) | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received.
| `["stop"]`; `["stop", "length"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. [8] | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.usage.input_tokens`](/docs/attributes-registry/gen-ai.md) | int | The number of tokens used in the prompt sent to OpenAI. | `100` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.usage.output_tokens`](/docs/attributes-registry/gen-ai.md) | int | The number of tokens used in the completions from OpenAI. | `180` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | GenAI server address. [9] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +**[1]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. + +**[2]:** The name of the GenAI model a request is being made to. If the model is supplied by a vendor, then the value must be the exact name of the model requested. If the model is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. 
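The conditional requirement levels in the OpenAI span attribute table above can be illustrated with a short sketch. It assumes, hypothetically, that the request and response are plain dicts shaped like OpenAI Chat Completions JSON (`model`, `seed`, `response_format`, `service_tier`, `id`, `usage.prompt_tokens`, `usage.completion_tokens`); the helper name `openai_span_attributes` is illustrative only and is not part of any OpenTelemetry or OpenAI library.

```python
from typing import Any, Dict, Optional


def openai_span_attributes(
    operation: str,
    request: Dict[str, Any],
    response: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map an OpenAI-style request/response to span attributes."""
    attrs: Dict[str, Any] = {
        # Required attributes.
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": request["model"],
        "gen_ai.system": "openai",
    }

    # Conditionally required: only if the request includes a seed.
    if request.get("seed") is not None:
        attrs["gen_ai.openai.request.seed"] = request["seed"]

    # Conditionally required: only if the request includes a response_format.
    # Assumption: the API sends it as an object such as {"type": "json_object"}.
    fmt = request.get("response_format")
    if fmt is not None:
        attrs["gen_ai.openai.request.response_format"] = (
            fmt["type"] if isinstance(fmt, dict) else fmt
        )

    # Conditionally required: only if a service_tier is sent and it is not 'auto'.
    tier = request.get("service_tier")
    if tier is not None and tier != "auto":
        attrs["gen_ai.openai.request.service_tier"] = tier

    if response is not None:
        # Conditionally required: only if the response reports a service_tier.
        if response.get("service_tier") is not None:
            attrs["gen_ai.openai.response.service_tier"] = response["service_tier"]
        # Recommended attributes, copied only when present.
        if "id" in response:
            attrs["gen_ai.response.id"] = response["id"]
        if "model" in response:
            attrs["gen_ai.response.model"] = response["model"]
        usage = response.get("usage") or {}
        if "prompt_tokens" in usage:
            attrs["gen_ai.usage.input_tokens"] = usage["prompt_tokens"]
        if "completion_tokens" in usage:
            attrs["gen_ai.usage.output_tokens"] = usage["completion_tokens"]
    return attrs
```

With a request that sets `service_tier` to `auto`, the sketch omits `gen_ai.openai.request.service_tier` (per footnote [5]) but still records the tier the response reports, such as `default`.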
+ +**[3]:** The `gen_ai.system` describes a family of GenAI models with specific model identified +by `gen_ai.request.model` and `gen_ai.response.model` attributes. + +The actual GenAI product may differ from the one identified by the client. +For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` +is set to `openai` based on the instrumentation's best knowledge. + +For custom model, a custom friendly name SHOULD be used. +If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. + +**[4]:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, +the canonical name of exception that occurred, or another low-cardinality error identifier. +Instrumentations SHOULD document the list of errors they report. + +**[5]:** if the request includes a service_tier and the value is not 'auto' + +**[6]:** if the response was received and includes a service_tier + +**[7]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[8]:** If available. The name of the GenAI model that provided the response. If the model is supplied by a vendor, then the value must be the exact name of the model actually used. If the model is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned. + +**[9]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + + + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + + +`gen_ai.openai.request.response_format` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `json_object` | JSON object response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `json_schema` | JSON schema response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `text` | Text response format | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + +`gen_ai.openai.request.service_tier` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `auto` | The system will utilize scale tier credits until they are exhausted. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `default` | The system will utilize the default scale tier. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + +`gen_ai.operation.name` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `chat` | Chat completion operation such as [OpenAI Chat API](https://platform.openai.com/docs/api-reference/chat) | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `text_completion` | Text completions operation such as [OpenAI Completions API (Legacy)](https://platform.openai.com/docs/api-reference/completions) | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + +## OpenAI Metric attributes + +OpenAI metrics follow [Generative AI metrics](gen-ai-metrics.md) with the noted additional attributes. +Individual systems may include additional system-specific attributes. It is recommended to check system-specific documentation, if available. + +### Metric: `gen_ai.client.token.usage` + +Reports the usage of tokens following the common [gen_ai.client.token.usage](./gen-ai-metrics.md#metric-gen_aiclienttokenusage) definition. + +Additional attributes: + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.openai.response.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier used for the response. 
| `scale`; `default` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + +### Metric: `gen_ai.client.operation.duration` + +Measures the time to complete an operation following the common [gen_ai.client.operation.duration](./gen-ai-metrics.md#metric-gen_aiclientoperationduration) definition. + +Additional attributes: + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.openai.response.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier used for the response. | `scale`; `default` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md diff --git a/model/gen-ai/metrics.yaml b/model/gen-ai/metrics.yaml index 2c7035876c..cbd22ebc70 100644 --- a/model/gen-ai/metrics.yaml +++ b/model/gen-ai/metrics.yaml @@ -30,6 +30,12 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI service, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.
+ - id: metric_attributes.gen_ai.openai + type: attribute_group + brief: 'This group describes GenAI server metrics attributes' + attributes: + - ref: gen_ai.openai.response.service_tier + requirement_level: recommended - id: metric.gen_ai.client.token.usage type: metric metric_name: gen_ai.client.token.usage diff --git a/model/gen-ai/registry.yaml b/model/gen-ai/registry.yaml index 1470ca1cb3..5b3d1cff79 100644 --- a/model/gen-ai/registry.yaml +++ b/model/gen-ai/registry.yaml @@ -148,3 +148,51 @@ groups: If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. + - id: registry.gen_ai.openai + type: attribute_group + display_name: OpenAI Attributes + brief: > + This group defines attributes for OpenAI. + attributes: + - id: gen_ai.openai.request.seed + stability: experimental + type: int + brief: Requests with the same seed value are more likely to return the same result. + examples: [100] + - id: gen_ai.openai.request.response_format + stability: experimental + type: + members: + - id: text + value: "text" + brief: 'Text response format' + stability: experimental + - id: json_object + value: "json_object" + brief: 'JSON object response format' + stability: experimental + - id: json_schema + value: "json_schema" + brief: 'JSON schema response format' + stability: experimental + brief: The response format that is requested. + examples: ['json'] + - id: gen_ai.openai.request.service_tier + stability: experimental + type: + members: + - id: auto + value: "auto" + brief: The system will utilize scale tier credits until they are exhausted. + stability: experimental + - id: default + value: "default" + brief: The system will utilize the default scale tier. + stability: experimental + brief: The service tier requested.
May be a specific tier, default, or auto. + examples: ['auto', 'default'] + - id: gen_ai.openai.response.service_tier + stability: experimental + type: string + brief: The service tier used for the response. + examples: ['scale', 'default'] diff --git a/model/gen-ai/spans.yaml b/model/gen-ai/spans.yaml index 9345f4f249..d634d94473 100644 --- a/model/gen-ai/spans.yaml +++ b/model/gen-ai/spans.yaml @@ -1,5 +1,5 @@ groups: - - id: trace.gen_ai.client + - id: trace.gen_ai.client.common type: span brief: > Describes GenAI operation span. @@ -20,8 +20,6 @@ groups: requirement_level: recommended - ref: gen_ai.request.top_p requirement_level: recommended - - ref: gen_ai.request.top_k - requirement_level: recommended - ref: gen_ai.request.stop_sequences requirement_level: recommended - ref: gen_ai.request.frequency_penalty @@ -85,3 +83,33 @@ groups: conditionally_required: if and only if corresponding event is enabled note: > It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) + + - id: trace.gen_ai.client + extends: trace.gen_ai.client.common + brief: > + Describes a GenAI operation span. + attributes: + - ref: gen_ai.request.top_k + requirement_level: recommended + + - id: trace.gen_ai.openai.client + extends: trace.gen_ai.client.common + brief: > + Describes an OpenAI operation span.
+ attributes: + - ref: gen_ai.openai.request.seed + requirement_level: + conditionally_required: if the request includes a seed + - ref: gen_ai.openai.request.response_format + requirement_level: + conditionally_required: if the request includes a response_format + - ref: gen_ai.openai.request.service_tier + requirement_level: + conditionally_required: if the request includes a service_tier and the value is not 'auto' + - ref: gen_ai.openai.response.service_tier + requirement_level: + conditionally_required: if the response was received and includes a service_tier + - ref: gen_ai.usage.input_tokens + brief: The number of tokens used in the prompt sent to OpenAI. + - ref: gen_ai.usage.output_tokens + brief: The number of tokens used in the completions from OpenAI. From 25d438c3bf7234d27ba7a55e4520943e601f3cb9 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Wed, 2 Oct 2024 21:09:46 -0700 Subject: [PATCH 2/5] Recommend to capture `db.namespace` from initial connection over not capturing any, also specify `db.namespace` value for PostgreSQL, MySQL and MariaDB (#1437) --- .chloggen/1437.yaml | 22 ++++++ docs/database/cassandra.md | 6 +- docs/database/hbase.md | 4 +- docs/database/mariadb.md | 135 +++++++++++++++++++++++++++++++++ docs/database/mssql.md | 17 ++++- docs/database/mysql.md | 135 +++++++++++++++++++++++++++++++++ docs/database/postgresql.md | 142 ++++++++++++++++++++++++++++++++++ docs/database/redis.md | 12 ++- docs/database/sql.md | 18 ++--- model/database/spans.yaml | 147 ++++++++++++++++++++++++++++-------- 10 files changed, 588 insertions(+), 50 deletions(-) create mode 100644 .chloggen/1437.yaml create mode 100644 docs/database/mariadb.md create mode 100644 docs/database/mysql.md create mode 100644 docs/database/postgresql.md diff --git a/.chloggen/1437.yaml b/.chloggen/1437.yaml new file mode 100644 index 0000000000..b45907f933 --- /dev/null +++ b/.chloggen/1437.yaml @@ -0,0 +1,22 @@ +# Use this changelog template to create an entry for release 
notes. +# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: enhancement + +# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) +component: db + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: Recommend to capture `db.namespace` from initial connection over not capturing any, also specify `db.namespace` value for PostgreSQL, MySQL and MariaDB + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +# The values here must be integers. +issues: [ 1437 ] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: diff --git a/docs/database/cassandra.md b/docs/database/cassandra.md index 6d6142af9b..acabd5d171 100644 --- a/docs/database/cassandra.md +++ b/docs/database/cassandra.md @@ -22,7 +22,7 @@ The Semantic Conventions for [Cassandra](https://cassandra.apache.org/) extend a | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the Cassandra table that the operation is acting upon. [1] | `public.users`; `customers` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`db.namespace`](/docs/attributes-registry/db.md) | string | The Cassandra keyspace name. [3] | `mykeyspace` | `Conditionally Required` If available. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The keyspace associated with the session. [3] | `mykeyspace` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [4] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [Cassandra protocol error code](https://github.com/apache/cassandra/blob/cassandra-5.0/doc/native_protocol_v5.spec) represented as a string. [6] | `102`; `40020` | `Conditionally Required` [7] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [8] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | @@ -45,7 +45,9 @@ For batch operations, if the individual operations are known to have the same co **[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. -**[3]:** For commands that switch the keyspace, this SHOULD be set to the target keyspace (even if the command fails). +**[3]:** If a database system has multiple namespace components, they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid. 
+Semantic conventions for individual database systems SHOULD document what `db.namespace` means in the context of that system. +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. **[4]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. If the operation name is parsed from the query text, it SHOULD be the first operation name found in the query. diff --git a/docs/database/hbase.md b/docs/database/hbase.md index fdb4fd65e1..b4319c4a60 100644 --- a/docs/database/hbase.md +++ b/docs/database/hbase.md @@ -31,7 +31,9 @@ The Semantic Conventions for [HBase](https://hbase.apache.org/) extend and overr **[1]:** If table name includes the namespace, the `db.collection.name` SHOULD be set to the full table name. -**[2]:** When performing table-related operations, the instrumentations SHOULD extract the namespace from the table name according to the [HBase table naming conventions](https://hbase.apache.org/book.html#namespace_creation). If namespace is not provided, instrumentation SHOULD set `db.namespace` value to `default`. +**[2]:** If a database system has multiple namespace components, they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid. +Semantic conventions for individual database systems SHOULD document what `db.namespace` means in the context of that system. +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. **[3]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. 
If the operation name is parsed from the query text, it SHOULD be the first operation name found in the query. diff --git a/docs/database/mariadb.md b/docs/database/mariadb.md new file mode 100644 index 0000000000..ee68fe9a30 --- /dev/null +++ b/docs/database/mariadb.md @@ -0,0 +1,135 @@ + + +# Semantic Conventions for MariaDB + +**Status**: [Experimental][DocumentStatus] + +The Semantic Conventions for *MariaDB* extend and override the [Database Semantic Conventions](database-spans.md). + +`db.system` MUST be set to `"mariadb"` and SHOULD be provided **at span creation time**. + +## Attributes + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the SQL table that the operation is acting upon. [1] | `users`; `dbo.products` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection. [3] | `products`; `customers` | `Conditionally Required` If available without an additional network call. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [4] | `SELECT`; `INSERT`; `UPDATE`; `DELETE`; `CREATE`; `mystoredproc` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [Maria DB error code](https://mariadb.com/kb/en/mariadb-error-code-reference/) represented as a string. [6] | `1008`; `3058` | `Conditionally Required` If response has ended with warning or an error. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [8] | `80`; `8080`; `443` | `Conditionally Required` [9] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [10] | `SELECT * FROM wuser_table where username = ?`; `SET mykey "WuValue"` | `Recommended` [11] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [12] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.parameter.`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [13] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. +If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix. +For batch operations, if the individual operations are known to have the same collection name then that collection name SHOULD be used, otherwise `db.collection.name` SHOULD NOT be captured. + +**[2]:** If readily available. 
The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. + +**[3]:** A connection's currently associated database may change during its lifetime, e.g. from executing `USE `. + +If instrumentation is unable to capture the connection's currently associated database on each query +without triggering an additional query to be executed (e.g. `SELECT DATABASE()`), +then it is RECOMMENDED to fall back to the database provided when the connection was established. + +Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. + +**[4]:** This SHOULD be the SQL command such as `SELECT`, `INSERT`, `UPDATE`, `CREATE`, `DROP`. +In the case of `EXEC`, this SHOULD be the stored procedure name that is being executed. + +**[5]:** If readily available. The operation name MAY be parsed from the query text, in which case it SHOULD be the first operation name found in the query. + +**[6]:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database +return code which is adopted by some database systems like PostgreSQL. +See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) +for the details. + +Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific +error codes. Database SQL drivers usually provide access to both properties. +For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) +class reports them with `getSQLState()` and `getErrorCode()` methods. + +Instrumentations SHOULD populate the `db.response.status_code` with +the most specific code available to them.
+ +Here's a non-exhaustive list of databases that report vendor-specific +codes with granularity higher than SQLSTATE (or don't report SQLSTATE +at all): + +- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql) +- [MariaDB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) +- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) +- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) +- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) +- [SQLite result codes](https://www.sqlite.org/rescode.html) + +These systems SHOULD set the `db.response.status_code` to a +known vendor-specific error code. If only SQLSTATE is available, +it SHOULD be used. + +When multiple error codes are available and specificity is unclear, +instrumentation SHOULD set the `db.response.status_code` to the +concatenated string of all codes with '/' used as a separator. + +For example, generic DB instrumentation that detected an error and has +SQLSTATE `"42000"` and vendor-specific `1071` should set +`db.response.status_code` to `"42000/1071"`. + +**[7]:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of the exception that occurred. +When using the canonical exception type name, instrumentation SHOULD make a best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. +Instrumentations SHOULD document how `error.type` is populated. + +**[8]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.
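The `db.response.status_code` selection rule in footnote [6] can be sketched as a small helper. This is a hypothetical illustration, not part of the conventions or any OpenTelemetry SDK: the function name and the `vendor_is_more_specific` flag are assumptions, because deciding which code is more specific is left to each instrumentation. In Java, the two inputs would typically come from `SQLException.getSQLState()` and `SQLException.getErrorCode()`.

```python
from typing import Optional


def db_response_status_code(
    sqlstate: Optional[str],
    vendor_code: Optional[object],
    vendor_is_more_specific: bool,
) -> Optional[str]:
    """Hypothetical helper choosing a db.response.status_code value."""
    # No code reported at all: leave the attribute unset.
    if sqlstate is None and vendor_code is None:
        return None
    # Only one code is available: use it as-is.
    if vendor_code is None:
        return sqlstate
    if sqlstate is None:
        return str(vendor_code)
    # Both are available and the vendor code is known to be finer-grained.
    if vendor_is_more_specific:
        return str(vendor_code)
    # Specificity is unclear: concatenate all codes with '/'.
    return f"{sqlstate}/{vendor_code}"
```

With SQLSTATE `"42000"`, vendor code `1071`, and unclear specificity, this returns `"42000/1071"`, matching the example in the footnote.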
+ +**[9]:** If using a port other than the default port for this DBMS and if `server.address` is set. + +**[10]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). +For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable. +Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. + +**[11]:** SHOULD be collected by default only if there is sanitization that excludes sensitive information. See [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). + +**[12]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +**[13]:** Query parameters should only be captured when `db.query.text` is parameterized with placeholders. +If a parameter has no name and instead is referenced only by index, then `` SHOULD be the 0-based index. 
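Footnotes [10] and [13] describe how batch query text is combined and how individual query parameters are named. The sketch below illustrates both rules. It assumes that the parameter name (or its 0-based index, for positional parameters) is appended directly after the `db.query.parameter.` prefix. The helper names are hypothetical, and callers are expected to capture parameters only when `db.query.text` is parameterized with placeholders.

```python
from typing import Mapping, Sequence, Union

PARAM_PREFIX = "db.query.parameter."


def batch_query_text(queries: Sequence[str], separator: str = "; ") -> str:
    """Hypothetical helper building db.query.text for a batch."""
    # If every statement in the batch has the same text, use it once.
    if len(set(queries)) == 1:
        return queries[0]
    # Otherwise join the individual texts with the separator.
    return separator.join(queries)


def query_parameter_attributes(
    params: Union[Sequence[object], Mapping[str, object]],
) -> dict:
    """Hypothetical helper building db.query.parameter.* attributes."""
    # Named parameters use their name as the suffix;
    # positional parameters use their 0-based index.
    items = params.items() if isinstance(params, Mapping) else enumerate(params)
    # Attribute values are string representations of the parameter values.
    return {f"{PARAM_PREFIX}{key}": str(value) for key, value in items}
```

For example, the positional parameters `["someval", 55]` map to `db.query.parameter.0 = "someval"` and `db.query.parameter.1 = "55"`.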
+ + + +The following attributes can be important for making sampling decisions +and SHOULD be provided **at span creation time** (if provided at all): + +* [`db.collection.name`](/docs/attributes-registry/db.md) +* [`db.namespace`](/docs/attributes-registry/db.md) +* [`db.operation.name`](/docs/attributes-registry/db.md) +* [`db.query.text`](/docs/attributes-registry/db.md) +* [`server.address`](/docs/attributes-registry/server.md) +* [`server.port`](/docs/attributes-registry/server.md) + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + + + + + + + + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/database/mssql.md b/docs/database/mssql.md index ecb59c4654..3b44bc09ab 100644 --- a/docs/database/mssql.md +++ b/docs/database/mssql.md @@ -22,7 +22,7 @@ The Semantic Conventions for the *Microsoft SQL Server* extend and override the | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the SQL table that the operation is acting upon. [1] | `users`; `dbo.products` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. [3] | `instance1.products`; `customers` | `Conditionally Required` If available. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection, qualified by the instance name. [3] | `instance1.products`; `customers` | `Conditionally Required` If available without an additional network call. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [4] | `SELECT`; `INSERT`; `UPDATE`; `DELETE`; `CREATE`; `mystoredproc` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [Microsoft SQL Server error](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) number represented as a string. [6] | `102`; `40020` | `Conditionally Required` If response has ended with warning or an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | @@ -37,8 +37,19 @@ For batch operations, if the individual operations are known to have the same co **[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. -**[3]:** When connecting to a default instance, `db.namespace` SHOULD be set to the name of the database. 
When connecting to a [named instance](https://learn.microsoft.com/sql/connect/jdbc/building-the-connection-url#named-and-multiple-sql-server-instances), `db.namespace` SHOULD be set to the combination of instance and database name following the `{instance_name}.{database_name}` pattern. -For commands that switch the database, this SHOULD be set to the target database (even if the command fails). +**[3]:** When connected to a default instance, `db.namespace` SHOULD be set to the name of +the database. When connected to a [named instance](https://learn.microsoft.com/sql/connect/jdbc/building-the-connection-url#named-and-multiple-sql-server-instances), +`db.namespace` SHOULD be set to the combination of instance and database name following the `{instance_name}.{database_name}` pattern. + +A connection's currently associated database may change during its lifetime, e.g. from executing `USE `. + +If instrumentation is unable to capture the connection's currently associated database on each query +without triggering an additional query to be executed (e.g. `SELECT DB_NAME()`), +then it is RECOMMENDED to fall back to the database provided when the connection was established. + +Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. **[4]:** This SHOULD be the SQL command such as `SELECT`, `INSERT`, `UPDATE`, `CREATE`, `DROP`. In the case of `EXEC`, this SHOULD be the stored procedure name that is being executed. diff --git a/docs/database/mysql.md b/docs/database/mysql.md new file mode 100644 index 0000000000..2a11f73098 --- /dev/null +++ b/docs/database/mysql.md @@ -0,0 +1,135 @@ + + +# Semantic Conventions for MySQL + +**Status**: [Experimental][DocumentStatus] + +The Semantic Conventions for *MySQL* extend and override the [Database Semantic Conventions](database-spans.md).
+ +`db.system` MUST be set to `"mysql"` and SHOULD be provided **at span creation time**. + +## Attributes + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the SQL table that the operation is acting upon. [1] | `users`; `dbo.products` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection. [3] | `products`; `customers` | `Conditionally Required` If available without an additional network call. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [4] | `SELECT`; `INSERT`; `UPDATE`; `DELETE`; `CREATE`; `mystoredproc` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). [6] | `1005`; `MY-010016` | `Conditionally Required` If response has ended with warning or an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. 
[8] | `80`; `8080`; `443` | `Conditionally Required` [9] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [10] | `SELECT * FROM wuser_table where username = ?`; `SET mykey "WuValue"` | `Recommended` [11] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [12] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.parameter.`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [13] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. +If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix. +For batch operations, if the individual operations are known to have the same collection name then that collection name SHOULD be used, otherwise `db.collection.name` SHOULD NOT be captured. + +**[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. + +**[3]:** A connection's currently associated database may change during its lifetime, e.g. from executing `USE `. + +If instrumentation is unable to capture the connection's currently associated database on each query +without triggering an additional query to be executed (e.g. 
`SELECT DATABASE()`), +then it is RECOMMENDED to fall back to the database provided when the connection was established. + +Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. + +**[4]:** This SHOULD be the SQL command such as `SELECT`, `INSERT`, `UPDATE`, `CREATE`, `DROP`. +In the case of `EXEC`, this SHOULD be the stored procedure name that is being executed. + +**[5]:** If readily available. The operation name MAY be parsed from the query text, in which case it SHOULD be the first operation name found in the query. + +**[6]:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database +return code which is adopted by some database systems like PostgreSQL. +See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) +for the details. + +Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific +error codes. Database SQL drivers usually provide access to both properties. +For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) +class reports them with `getSQLState()` and `getErrorCode()` methods. + +Instrumentations SHOULD populate the `db.response.status_code` with +the most specific code available to them. + +Here's a non-exhaustive list of databases that report vendor-specific +codes with granularity higher than SQLSTATE (or don't report SQLSTATE +at all): + +- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql)
+- [MariaDB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) +- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) +- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) +- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) +- [SQLite result codes](https://www.sqlite.org/rescode.html) + +These systems SHOULD set the `db.response.status_code` to a +known vendor-specific error code. If only SQLSTATE is available, +it SHOULD be used. + +When multiple error codes are available and specificity is unclear, +instrumentation SHOULD set the `db.response.status_code` to the +concatenated string of all codes with '/' used as a separator. + +For example, generic DB instrumentation that detected an error and has +SQLSTATE `"42000"` and vendor-specific `1071` should set +`db.response.status_code` to `"42000/1071"`. + +**[7]:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of the exception that occurred. +When using the canonical exception type name, instrumentation SHOULD make a best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. +Instrumentations SHOULD document how `error.type` is populated. + +**[8]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[9]:** If using a port other than the default port for this DBMS and if `server.address` is set. + +**[10]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
+For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable. +Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. + +**[11]:** SHOULD be collected by default only if there is sanitization that excludes sensitive information. See [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). + +**[12]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +**[13]:** Query parameters should only be captured when `db.query.text` is parameterized with placeholders. +If a parameter has no name and instead is referenced only by index, then `` SHOULD be the 0-based index. + + + +The following attributes can be important for making sampling decisions +and SHOULD be provided **at span creation time** (if provided at all): + +* [`db.collection.name`](/docs/attributes-registry/db.md) +* [`db.namespace`](/docs/attributes-registry/db.md) +* [`db.operation.name`](/docs/attributes-registry/db.md) +* [`db.query.text`](/docs/attributes-registry/db.md) +* [`server.address`](/docs/attributes-registry/server.md) +* [`server.port`](/docs/attributes-registry/server.md) + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + + + + + + + + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/database/postgresql.md b/docs/database/postgresql.md new file mode 100644 index 0000000000..c92ed52fb4 --- /dev/null +++ b/docs/database/postgresql.md @@ -0,0 +1,142 @@ + + +# Semantic Conventions for PostgreSQL + +**Status**: [Experimental][DocumentStatus] + +The Semantic Conventions for *PostgreSQL* extend and override the [Database Semantic Conventions](database-spans.md). + +`db.system` MUST be set to `"postgresql"` and SHOULD be provided **at span creation time**. + +## Attributes + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the SQL table that the operation is acting upon. [1] | `users`; `dbo.products` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The schema associated with the connection, qualified by the database name. [3] | `mydatabase.products`; `mydatabase.customers` | `Conditionally Required` If available without an additional network call. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. 
[4] | `SELECT`; `INSERT`; `UPDATE`; `DELETE`; `CREATE`; `mystoredproc` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [PostgreSQL error code](https://www.postgresql.org/docs/current/errcodes-appendix.html). [6] | `08000`; `08P01` | `Conditionally Required` If response has ended with warning or an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [8] | `80`; `8080`; `443` | `Conditionally Required` [9] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [10] | `SELECT * FROM wuser_table where username = ?`; `SET mykey "WuValue"` | `Recommended` [11] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [12] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`db.query.parameter.`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [13] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. 
+If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix. +For batch operations, if the individual operations are known to have the same collection name then that collection name SHOULD be used, otherwise `db.collection.name` SHOULD NOT be captured. + +**[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. + +**[3]:** `db.namespace` SHOULD be set to the combination of database and schema name following the `{database}.{schema}` pattern. + +A connection's currently associated schema may change during its lifetime, e.g. from executing `SET search_path TO `. +If the search path has multiple schemas, the first schema in the search path SHOULD be used. + +If instrumentation is unable to capture the connection's currently associated schema on each query +without triggering an additional query to be executed (e.g. `SELECT current_schema()`), +then it is RECOMMENDED to fall back to the schema provided when the connection was established. + +Instrumentation SHOULD document if `db.namespace` reflects the schema provided when the connection was established. + +Instrumentation MAY use the user name provided when the connection was established as a stand-in for the schema name. + +Instrumentation SHOULD document if `db.namespace` reflects the user provided when the connection was established. + +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. + +**[4]:** This SHOULD be the SQL command such as `SELECT`, `INSERT`, `UPDATE`, `CREATE`, `DROP`. +In the case of `EXEC`, this SHOULD be the stored procedure name that is being executed. + +**[5]:** If readily available.
The operation name MAY be parsed from the query text, in which case it SHOULD be the first operation name found in the query. + +**[6]:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database +return code which is adopted by some database systems like PostgreSQL. +See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) +for the details. + +Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific +error codes. Database SQL drivers usually provide access to both properties. +For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) +class reports them with `getSQLState()` and `getErrorCode()` methods. + +Instrumentations SHOULD populate the `db.response.status_code` with +the most specific code available to them. + +Here's a non-exhaustive list of databases that report vendor-specific +codes with granularity higher than SQLSTATE (or don't report SQLSTATE +at all): + +- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql) +- [MariaDB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) +- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) +- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) +- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) +- [SQLite result codes](https://www.sqlite.org/rescode.html) + +These systems SHOULD set the `db.response.status_code` to a +known vendor-specific error code. If only SQLSTATE is available, +it SHOULD be used. + +When multiple error codes are available and specificity is unclear, +instrumentation SHOULD set the `db.response.status_code` to the +concatenated string of all codes with '/' used as a separator.
+ +For example, generic DB instrumentation that detected an error and has +SQLSTATE `"42000"` and vendor-specific `1071` should set +`db.response.status_code` to `"42000/1071"`. + +**[7]:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of the exception that occurred. +When using the canonical exception type name, instrumentation SHOULD make a best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. +Instrumentations SHOULD document how `error.type` is populated. + +**[8]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[9]:** If using a port other than the default port for this DBMS and if `server.address` is set. + +**[10]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext). +For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable. +Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk. + +**[11]:** SHOULD be collected by default only if there is sanitization that excludes sensitive information. See [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
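The PostgreSQL `db.namespace` rule in footnote [3] follows a fallback order. The current search path is preferred, then the schema provided at connection time, then (optionally) the connection user name as a stand-in. The sketch below is hypothetical: the function and parameter names are illustrative, and returning `None` to mean "leave the attribute unset" is an assumption based on the attribute being required only when available without an additional network call. The sketch also does not resolve special search-path entries such as `$user`.

```python
from typing import Optional, Sequence


def postgresql_db_namespace(
    database: str,
    tracked_search_path: Optional[Sequence[str]] = None,
    connect_schema: Optional[str] = None,
    connect_user: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper building db.namespace as {database}.{schema}."""
    # Prefer the first schema of the search path the instrumentation has
    # observed (e.g. from an intercepted `SET search_path TO ...`).
    if tracked_search_path:
        schema = tracked_search_path[0]
    # Otherwise fall back to the schema provided when the connection was established.
    elif connect_schema:
        schema = connect_schema
    # The connection's user name MAY stand in for the schema name.
    elif connect_user:
        schema = connect_user
    else:
        # No schema is known without an extra round trip: leave the attribute unset.
        return None
    # Values are used exactly as provided, without case normalization.
    return f"{database}.{schema}"
```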
+ +**[12]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + +**[13]:** Query parameters should only be captured when `db.query.text` is parameterized with placeholders. +If a parameter has no name and instead is referenced only by index, then `` SHOULD be the 0-based index. + + + +The following attributes can be important for making sampling decisions +and SHOULD be provided **at span creation time** (if provided at all): + +* [`db.collection.name`](/docs/attributes-registry/db.md) +* [`db.namespace`](/docs/attributes-registry/db.md) +* [`db.operation.name`](/docs/attributes-registry/db.md) +* [`db.query.text`](/docs/attributes-registry/db.md) +* [`server.address`](/docs/attributes-registry/server.md) +* [`server.port`](/docs/attributes-registry/server.md) + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + + + + + + + + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/database/redis.md b/docs/database/redis.md index 3289822458..186d12efa9 100644 --- a/docs/database/redis.md +++ b/docs/database/redis.md @@ -21,7 +21,7 @@ The Semantic Conventions for [Redis](https://redis.com/) extend and override the | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| -| [`db.namespace`](/docs/attributes-registry/db.md) | string | The index of the database being accessed as used in the [`SELECT` command](https://redis.io/commands/select) (captured as a string). [1] | `0`; `1`; `15` | `Conditionally Required` If and only if it can be captured reliably. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The [database index] associated with the connection, represented as a string. [1] | `0`; `1`; `15` | `Conditionally Required` If and only if it can be captured reliably. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [2] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` [3] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The Redis [simple error](https://redis.io/docs/latest/develop/reference/protocol-spec/#simple-errors) prefix. [4] | `ERR`; `WRONGTYPE`; `CLUSTERDOWN` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. 
[6] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | @@ -32,9 +32,13 @@ The Semantic Conventions for [Redis](https://redis.com/) extend and override the | [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [12] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.query.parameter.`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `` being the parameter name, and the attribute value being a string representation of the parameter value. [13] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -**[1]:** The database index for current connection can be changed by the application dynamically. Instrumentations MAY use the initial database index provided in the connection string and keep track of the currently selected database to capture the `db.namespace`. -Instrumentations SHOULD NOT set this attribute if capturing it would require additional network calls to Redis. -For commands that switch the database, this SHOULD be set to the target database (even if the command fails). +**[1]:** A connection's currently associated database index may change during its lifetime, e.g. from executing a `SELECT` command. + +If instrumentation is unable to capture the connection's currently associated database index on each query +without triggering an additional query to be executed, +then it is RECOMMENDED to fall back to the database index provided when the connection was established. + +Instrumentation SHOULD document if `db.namespace` reflects the database index provided when the connection was established.
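As a rough illustration of the fallback rule in note [1], the sketch below picks the `db.namespace` value for a Redis span without issuing any extra command. It is a minimal sketch under stated assumptions, not any Redis client's API: the `RedisConnectionState` class, its fields, and `redis_db_namespace` are hypothetical stand-ins for whatever state a given client library exposes locally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedisConnectionState:
    """Hypothetical snapshot of what an instrumented Redis client knows locally."""

    # Index supplied when the connection was established (e.g. from the connection URL).
    initial_db_index: int
    # Index the client itself tracks after SELECT commands; None if it does not track it.
    tracked_db_index: Optional[int] = None


def redis_db_namespace(state: RedisConnectionState) -> str:
    """Return the db.namespace value without triggering an additional query."""
    if state.tracked_db_index is not None:
        # The current index is known locally, so no extra round trip is needed.
        return str(state.tracked_db_index)
    # Fallback: the index provided when the connection was established.
    # Instrumentation SHOULD document that it reports this value.
    return str(state.initial_db_index)


print(redis_db_namespace(RedisConnectionState(initial_db_index=0)))  # 0
print(redis_db_namespace(RedisConnectionState(initial_db_index=0, tracked_db_index=15)))  # 15
```

The result is converted with `str()` because `db.namespace` is a string attribute, even though Redis database indexes are integers.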
**[2]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. If the operation name is parsed from the query text, it SHOULD be the first operation name found in the query. diff --git a/docs/database/sql.md b/docs/database/sql.md index fa3fbdb71e..8dae35e9d7 100644 --- a/docs/database/sql.md +++ b/docs/database/sql.md @@ -46,7 +46,7 @@ Instrumentations applied to generic SQL drivers SHOULD adhere to SQL semantic co | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`db.collection.name`](/docs/attributes-registry/db.md) | string | The name of the SQL table that the operation is acting upon. [1] | `users`; `dbo.products` | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. [3] | `customers`; `test.users` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection, fully qualified within the server address and port. [3] | `customers`; `test.users` | `Conditionally Required` If available without an additional network call. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [4] | `SELECT`; `INSERT`; `UPDATE`; `DELETE`; `CREATE`; `mystoredproc` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Database response code recorded as string. 
[6] | `ORA-17027`; `1052`; `2201B` | `Conditionally Required` If response has ended with warning or an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [7] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | @@ -61,7 +61,7 @@ For batch operations, if the individual operations are known to have the same co **[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. -**[3]:** If a database system has multiple namespace components, they SHOULD be concatenated +**[3]:** If a database system has multiple namespace components (e.g. schema name and database name), they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid. @@ -69,17 +69,15 @@ the more general namespaces, to ensure that "startswith" queries for the more ge Unless specified by the system-specific semantic convention, the `db.namespace` attribute matches the name of the database being accessed. -The database name can usually be obtained with database driver API such as -[JDBC `Connection.getCatalog()`](https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#getCatalog--) -or [.NET `SqlConnection.Database`](https://learn.microsoft.com/dotnet/api/system.data.sqlclient.sqlconnection.database). +A connection's currently associated database may change during its lifetime, e.g. from executing a `USE` statement.
-Some database drivers don't detect when the current database is changed (for example, with SQL `USE database` statement). -Instrumentations that parse SQL statements MAY use the database name provided -in the connection string and keep track of the currently selected database name. +If instrumentation is unable to capture the connection's currently associated database on each query +without triggering an additional query to be executed (e.g. `SELECT DATABASE()`), +then it is RECOMMENDED to fall back to the database provided when the connection was established. -For commands that switch the database, this SHOULD be set to the target database (even if the command fails). +Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. -If instrumentation cannot reliably determine the current database name, it SHOULD NOT set `db.namespace`. +It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. **[4]:** This SHOULD be the SQL command such as `SELECT`, `INSERT`, `UPDATE`, `CREATE`, `DROP`. In the case of `EXEC`, this SHOULD be the stored procedure name that is being executed. diff --git a/model/database/spans.yaml b/model/database/spans.yaml index 10b56c8ad6..6f0dcbd91d 100644 --- a/model/database/spans.yaml +++ b/model/database/spans.yaml @@ -98,13 +98,22 @@ groups: attributes: - ref: db.namespace sampling_relevant: true - brief: The name of the database, fully qualified within the server address and port. - note: - When connecting to a default instance, `db.namespace` SHOULD be set to the name of - the database. When connecting to a [named instance](https://learn.microsoft.com/sql/connect/jdbc/building-the-connection-url#named-and-multiple-sql-server-instances), + brief: > + The database associated with the connection, qualified by the instance name.
+ note: | + When connected to a default instance, `db.namespace` SHOULD be set to the name of + the database. When connected to a [named instance](https://learn.microsoft.com/sql/connect/jdbc/building-the-connection-url#named-and-multiple-sql-server-instances), `db.namespace` SHOULD be set to the combination of instance and database name following the `{instance_name}.{database_name}` pattern. - For commands that switch the database, this SHOULD be set to the target database (even if the command fails). + A connection's currently associated database may change during its lifetime, e.g. from executing a `USE` statement. + + If instrumentation is unable to capture the connection's currently associated database on each query + without triggering an additional query to be executed (e.g. `SELECT DB_NAME()`), + then it is RECOMMENDED to fall back to the database provided when the connection was established. + + Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + + It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. examples: ["instance1.products", "customers"] - ref: db.response.status_code brief: > @@ -114,6 +123,90 @@ groups: Microsoft SQL Server does not report SQLSTATE. examples: ["102", "40020"] + - id: db.postgresql + type: span + extends: db.sql + brief: > + Attributes for PostgreSQL + attributes: + - ref: db.namespace + sampling_relevant: true + brief: > + The schema associated with the connection, qualified by the database name. + note: | + `db.namespace` SHOULD be set to the combination of database and schema name following the `{database}.{schema}` pattern. + + A connection's currently associated schema may change during its lifetime, e.g. from executing a `SET search_path` statement. + If the search path has multiple schemas, the first schema in the search path SHOULD be used.
+ + If instrumentation is unable to capture the connection's currently associated schema on each query + without triggering an additional query to be executed (e.g. `SELECT current_schema()`), + then it is RECOMMENDED to fall back to the schema provided when the connection was established. + + Instrumentation SHOULD document if `db.namespace` reflects the schema provided when the connection was established. + + Instrumentation MAY use the user name provided when the connection was established as a stand-in for the schema name. + + Instrumentation SHOULD document if `db.namespace` reflects the user provided when the connection was established. + + It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. + examples: ["mydatabase.products", "mydatabase.customers"] + - ref: db.response.status_code + brief: > + [PostgreSQL error code](https://www.postgresql.org/docs/current/errcodes-appendix.html). + examples: ["08000", "08P01"] + + - id: db.mysql + type: span + extends: db.sql + brief: > + Attributes for MySQL + attributes: + - ref: db.namespace + sampling_relevant: true + brief: The database associated with the connection. + note: | + A connection's currently associated database may change during its lifetime, e.g. from executing a `USE` statement. + + If instrumentation is unable to capture the connection's currently associated database on each query + without triggering an additional query to be executed (e.g. `SELECT DATABASE()`), + then it is RECOMMENDED to fall back to the database provided when the connection was established. + + Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + + It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
+ examples: ["products", "customers"] + - ref: db.response.status_code + brief: > + [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). + examples: ["1005", "MY-010016"] + + - id: db.mariadb + type: span + extends: db.sql + brief: > + Attributes for MariaDB + attributes: + - ref: db.namespace + sampling_relevant: true + brief: The database associated with the connection. + note: | + A connection's currently associated database may change during its lifetime, e.g. from executing a `USE` statement. + + If instrumentation is unable to capture the connection's currently associated database on each query + without triggering an additional query to be executed (e.g. `SELECT DATABASE()`), + then it is RECOMMENDED to fall back to the database provided when the connection was established. + + Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + + It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. + examples: ["products", "customers"] + - ref: db.response.status_code + brief: > + [MariaDB error code](https://mariadb.com/kb/en/mariadb-error-code-reference/) + represented as a string. + examples: ["1008", "3058"] + - id: db.cassandra type: span stability: experimental @@ -123,8 +216,7 @@ groups: attributes: - ref: db.namespace sampling_relevant: true - brief: The Cassandra keyspace name. - note: For commands that switch the keyspace, this SHOULD be set to the target keyspace (even if the command fails). + brief: The keyspace associated with the session. examples: ["mykeyspace"] requirement_level: conditionally_required: If available. @@ -161,10 +253,6 @@ groups: brief: The HBase namespace. requirement_level: conditionally_required: If applicable.
- note: > - When performing table-related operations, the instrumentations SHOULD extract the namespace from the table name according to - the [HBase table naming conventions](https://hbase.apache.org/book.html#namespace_creation). If namespace is not provided, - instrumentation SHOULD set `db.namespace` value to `default`. examples: ['mynamespace'] - ref: db.collection.name sampling_relevant: true @@ -220,18 +308,17 @@ groups: - ref: db.namespace sampling_relevant: true brief: > - The index of the database being accessed as used in the [`SELECT` command](https://redis.io/commands/select) - (captured as a string). + The [database index] associated with the connection, represented as a string. requirement_level: conditionally_required: If and only if it can be captured reliably. - note: > - The database index for current connection can be changed by the application dynamically. Instrumentations MAY use - the initial database index provided in the connection string and keep track of the currently selected - database to capture the `db.namespace`. + note: | + A connection's currently associated database index may change during its lifetime, e.g. from executing a `SELECT` command. - Instrumentations SHOULD NOT set this attribute if capturing it would require additional network calls to Redis. + If instrumentation is unable to capture the connection's currently associated database index on each query + without triggering an additional query to be executed, + then it is RECOMMENDED to fall back to the database index provided when the connection was established. - For commands that switch the database, this SHOULD be set to the target database (even if the command fails). + Instrumentation SHOULD document if `db.namespace` reflects the database index provided when the connection was established. examples: ["0", "1", "15"] - ref: db.query.text sampling_relevant: true @@ -359,10 +446,13 @@ groups: brief: The name of the SQL table that the operation is acting upon.
examples: ['users', 'dbo.products'] - ref: db.namespace + brief: > + The database associated with the connection, + fully qualified within the server address and port. requirement_level: - conditionally_required: If available. + conditionally_required: If available without an additional network call. note: | - If a database system has multiple namespace components, they SHOULD be concatenated + If a database system has multiple namespace components (e.g. schema name and database name), they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid. @@ -370,18 +460,15 @@ groups: Unless specified by the system-specific semantic convention, the `db.namespace` attribute matches the name of the database being accessed. - The database name can usually be obtained with database driver API such as - [JDBC `Connection.getCatalog()`](https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#getCatalog--) - or [.NET `SqlConnection.Database`](https://learn.microsoft.com/dotnet/api/system.data.sqlclient.sqlconnection.database). - - Some database drivers don't detect when the current database is changed (for example, with SQL `USE database` statement). - Instrumentations that parse SQL statements MAY use the database name provided - in the connection string and keep track of the currently selected database name. + A connection's currently associated database may change during its lifetime, e.g. from executing a `USE` statement. - For commands that switch the database, this SHOULD be set to the target database (even if the command fails). + If instrumentation is unable to capture the connection's currently associated database on each query + without triggering an additional query to be executed (e.g.
`SELECT DATABASE()`), + then it is RECOMMENDED to fall back to the database provided when the connection was established. - If instrumentation cannot reliably determine the current database name, it SHOULD NOT set `db.namespace`. + Instrumentation SHOULD document if `db.namespace` reflects the database provided when the connection was established. + It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. - ref: db.response.status_code brief: > Database response code recorded as string. From 5298ea906487737fc2cc16fb43fc0fd3824d8135 Mon Sep 17 00:00:00 2001 From: Trask Stalnaker Date: Thu, 3 Oct 2024 08:45:17 -0700 Subject: [PATCH 3/5] Copy database semconv warning from spans/metrics to general README (#1445) Co-authored-by: Liudmila Molkova --- docs/database/README.md | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/docs/database/README.md b/docs/database/README.md index 7fcb8d42a8..fcc769028a 100644 --- a/docs/database/README.md +++ b/docs/database/README.md @@ -13,11 +13,30 @@ This document defines semantic conventions for database client spans as well as database metrics and logs. > **Warning** +> > Existing database instrumentations that are using -> [v1.24.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/database/README.md) -> (or prior) SHOULD NOT change the version of the database conventions that they emit by default -> until a transition plan to the (future) stable semantic conventions has been published. -> Conventions include, but are not limited to, attributes, metric and span names, and unit of measure. +> [v1.24.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/database/database-spans.md) +> (or prior): +> +> * SHOULD NOT change the version of the database conventions that they emit by default +> until the database semantic conventions are marked stable.
+> Conventions include, but are not limited to, attributes, +> metric and span names, and unit of measure. +> * SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN` +> in the existing major version, whose value is a comma-separated list of values. +> If the list of values includes: +> * `database` - emit the new, stable database conventions, +> and stop emitting the old experimental database conventions +> that the instrumentation emitted previously. +> * `database/dup` - emit both the old and the stable database conventions, +> allowing for a seamless transition. +> * The default behavior (in the absence of one of these values) is to continue +> emitting whatever version of the old experimental database conventions +> the instrumentation was emitting previously. +> * Note: `database/dup` has higher precedence than `database` in case both values are present. +> * SHOULD maintain (security patching at a minimum) the existing major version +> for at least six months after it starts emitting both sets of conventions. +> * SHOULD drop the environment variable in the next major version.
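The opt-in rules above reduce to a small parsing step. The following is a minimal sketch, assuming that whitespace around each comma-separated value is ignored (the warning does not say either way). `DbSemconvMode` and `db_semconv_mode` are hypothetical names for illustration, not part of any OpenTelemetry SDK.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class DbSemconvMode(Enum):
    """Hypothetical modes an instrumentation could switch between."""

    OLD = "old"        # default: keep emitting the previous experimental conventions
    STABLE = "stable"  # `database`: emit only the new stable conventions
    DUP = "dup"        # `database/dup`: emit both old and stable conventions


def db_semconv_mode(environ: Optional[Mapping[str, str]] = None) -> DbSemconvMode:
    """Resolve which database conventions to emit from OTEL_SEMCONV_STABILITY_OPT_IN."""
    env = os.environ if environ is None else environ
    raw = env.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    # Comma-separated list; trimming surrounding whitespace is an assumption.
    values = {item.strip() for item in raw.split(",") if item.strip()}
    if "database/dup" in values:  # `database/dup` takes precedence over `database`
        return DbSemconvMode.DUP
    if "database" in values:
        return DbSemconvMode.STABLE
    return DbSemconvMode.OLD


print(db_semconv_mode({"OTEL_SEMCONV_STABILITY_OPT_IN": "http,database"}))  # DbSemconvMode.STABLE
```

Values that belong to other convention areas, such as `http` in the usage line, are ignored here. This lets a single environment variable carry opt-ins for several areas at once. Checking `database/dup` first encodes the precedence rule directly, so the order of the values in the list does not matter.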
Semantic conventions for database operations are defined for the following signals: From 32b75a8d465ddea8af396666cd4020c15f4859e1 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 4 Oct 2024 09:23:41 -0700 Subject: [PATCH 4/5] Introduce per-message structured GenAI events instead of prompt/completion span events (#980) --- .chloggen/980.yaml | 4 + docs/attributes-registry/gen-ai.md | 60 ++- docs/gen-ai/README.md | 1 + docs/gen-ai/gen-ai-events.md | 383 ++++++++++++++++++ docs/gen-ai/gen-ai-spans.md | 66 +-- .../deprecated/registry-deprecated.yaml | 12 + model/gen-ai/events.yaml | 48 +++ model/gen-ai/registry.yaml | 12 - model/gen-ai/spans.yaml | 29 -- 9 files changed, 481 insertions(+), 134 deletions(-) create mode 100644 .chloggen/980.yaml create mode 100644 docs/gen-ai/gen-ai-events.md create mode 100644 model/gen-ai/events.yaml diff --git a/.chloggen/980.yaml b/.chloggen/980.yaml new file mode 100644 index 0000000000..e8bc588dfe --- /dev/null +++ b/.chloggen/980.yaml @@ -0,0 +1,4 @@ +change_type: breaking +component: gen_ai +note: Deprecate `gen_ai.prompt` and `gen_ai.completion` attributes, introduce log-based events for GenAI inputs and outputs. +issues: [834, 980] diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md index 0dc935e462..12c5283980 100644 --- a/docs/attributes-registry/gen-ai.md +++ b/docs/attributes-registry/gen-ai.md @@ -14,34 +14,28 @@ This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses. 
-| Attribute | Type | Description | Examples | Stability | -| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- | -| `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.operation.name` | string | The name of the operation being performed. [2] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [3] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.frequency_penalty` | double | The frequency penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.max_tokens` | int | The maximum number of tokens the model generates for a request. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.model` | string | The name of the GenAI model a request is being made to. | `gpt-4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.presence_penalty` | double | The presence penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.stop_sequences` | string[] | List of sequences that the model will use to stop generating further tokens. | `["forest", "lived"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.temperature` | double | The temperature setting for the GenAI request. 
| `0.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.top_k` | double | The top_k sampling setting for the GenAI request. | `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.request.top_p` | double | The top_p sampling setting for the GenAI request. | `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]`; `["stop", "length"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [4] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.usage.input_tokens` | int | The number of tokens used in the GenAI input (prompt). | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.usage.output_tokens` | int | The number of tokens used in the GenAI response (completion). 
| `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | - -**[1]:** It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - -**[2]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. - -**[3]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - -**[4]:** The `gen_ai.system` describes a family of GenAI models with specific model identified +| Attribute | Type | Description | Examples | Stability | +| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | -------------------------------- | ---------------------------------------------------------------- | +| `gen_ai.operation.name` | string | The name of the operation being performed. [1] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.frequency_penalty` | double | The frequency penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.max_tokens` | int | The maximum number of tokens the model generates for a request. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.model` | string | The name of the GenAI model a request is being made to. | `gpt-4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.presence_penalty` | double | The presence penalty setting for the GenAI request. 
| `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.stop_sequences` | string[] | List of sequences that the model will use to stop generating further tokens. | `["forest", "lived"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.temperature` | double | The temperature setting for the GenAI request. | `0.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.top_k` | double | The top_k sampling setting for the GenAI request. | `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.request.top_p` | double | The top_p sampling setting for the GenAI request. | `1.0` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]`; `["stop", "length"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [2] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.usage.input_tokens` | int | The number of tokens used in the GenAI input (prompt). | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.usage.output_tokens` | int | The number of tokens used in the GenAI response (completion). 
| `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value. + +**[2]:** The `gen_ai.system` describes a family of GenAI models with specific model identified by `gen_ai.request.model` and `gen_ai.response.model` attributes. The actual GenAI product may differ from the one identified by the client. @@ -104,7 +98,9 @@ Thie group defines attributes for OpenAI. Describes deprecated `gen_ai` attributes. -| Attribute | Type | Description | Examples | Stability | -| -------------------------------- | ---- | ----------------------------------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------ | -| `gen_ai.usage.completion_tokens` | int | Deprecated, use `gen_ai.usage.output_tokens` instead. | `42` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Replaced by `gen_ai.usage.output_tokens` attribute. | -| `gen_ai.usage.prompt_tokens` | int | Deprecated, use `gen_ai.usage.input_tokens` instead. | `42` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Replaced by `gen_ai.usage.input_tokens` attribute. | +| Attribute | Type | Description | Examples | Stability | +| -------------------------------- | ------ | --------------------------------------------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | +| `gen_ai.completion` | string | Deprecated, use Event API to report completions contents. | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Removed, no replacement at this time. | +| `gen_ai.prompt` | string | Deprecated, use Event API to report prompt contents. | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Removed, no replacement at this time. | +| `gen_ai.usage.completion_tokens` | int | Deprecated, use `gen_ai.usage.output_tokens` instead. | `42` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Replaced by `gen_ai.usage.output_tokens` attribute. | +| `gen_ai.usage.prompt_tokens` | int | Deprecated, use `gen_ai.usage.input_tokens` instead. | `42` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Replaced by `gen_ai.usage.input_tokens` attribute. | diff --git a/docs/gen-ai/README.md b/docs/gen-ai/README.md index 086a6d327f..020fd0a4ca 100644 --- a/docs/gen-ai/README.md +++ b/docs/gen-ai/README.md @@ -16,6 +16,7 @@ use the conventions in limited non-critical workloads and share the feedback Semantic conventions for Generative AI operations are defined for the following signals: +* [Events](gen-ai-events.md): Semantic Conventions for Generative AI inputs and outputs - *events*. * [Metrics](gen-ai-metrics.md): Semantic Conventions for Generative AI operations - *metrics*. * [Spans](gen-ai-spans.md): Semantic Conventions for Generative AI requests - *spans*. diff --git a/docs/gen-ai/gen-ai-events.md b/docs/gen-ai/gen-ai-events.md new file mode 100644 index 0000000000..ea394a7c3c --- /dev/null +++ b/docs/gen-ai/gen-ai-events.md @@ -0,0 +1,383 @@ + + +# Semantic Conventions for GenAI events + +**Status**: [Experimental][DocumentStatus] + + + + + +- [Common attributes](#common-attributes) +- [System event](#system-event) +- [User event](#user-event) +- [Assistant event](#assistant-event) + - [`ToolCall` object](#toolcall-object) + - [`Function` object](#function-object) +- [Tool event](#tool-event) +- [Choice event](#choice-event) + - [`Message` object](#message-object) +- [Custom events](#custom-events) +- [Examples](#examples) + - [Chat completion](#chat-completion) + - [Tools](#tools) + - [Chat completion with multiple choices](#chat-completion-with-multiple-choices) + + + +GenAI instrumentations MAY capture user inputs sent to the model and responses received from it as [events](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.33.0/specification/logs/event-api.md). + +> Note: +> Event API is experimental and not yet available in some languages. Check [spec-compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md#events) to see the implementation status in corresponding language. 
+ +Instrumentations MAY capture inputs and outputs if and only if the application has enabled the collection of this data. +This is for three primary reasons: + +1. Data privacy concerns. End users of GenAI applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend. +2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some GenAI systems allow for extremely large context windows that end users may take full advantage of. +3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application. + +Body fields that contain user input, model output, or other potentially sensitive and verbose data +SHOULD NOT be captured by default. + +Semantic conventions for individual systems which extend content events SHOULD document all additional body fields and specify whether they +should be captured by default or require the application to opt in to capturing them. + +Telemetry consumers SHOULD expect to receive unknown body fields. + +Instrumentations SHOULD NOT capture undocumented body fields and MUST follow the documented defaults for known fields. +Instrumentations MAY offer configuration options that allow disabling events or capturing all fields. + +## Common attributes + +The following attributes apply to all GenAI events. + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation.
[1] | `openai` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +**[1]:** The `gen_ai.system` describes a family of GenAI models with specific model identified +by `gen_ai.request.model` and `gen_ai.response.model` attributes. + +The actual GenAI product may differ from the one identified by the client. +For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` +is set to `openai` based on the instrumentation's best knowledge. + +For custom model, a custom friendly name SHOULD be used. +If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. + + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + +## System event + +This event describes the instructions passed to the GenAI model. + +The event name MUST be `gen_ai.system.message`. + +| Body Field | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `role` | string | The actual role of the message author as passed in the message. | `"system"`, `"instructions"` | `Conditionally Required`: if available and not equal to `system` | +| `content` | `AnyValue` | The contents of the system message. | `"You're a friendly bot that answers questions about OpenTelemetry."` | `Opt-In` | + +## User event + +This event describes the prompt message specified by the user. + +The event name MUST be `gen_ai.user.message`. 
+ +| Body Field | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| `role` | string | The actual role of the message author as passed in the message. | `"user"`, `"customer"` | `Conditionally Required`: if available and if not equal to `user` | +| `content` | `AnyValue` | The contents of the user message. | `What telemetry is reported by OpenAI instrumentations?` | `Opt-In` | + +## Assistant event + +This event describes the assistant message. + +The event name MUST be `gen_ai.assistant.message`. + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|--------------|--------------------------------|----------------------------------------|-------------------------------------------------|-------------------| +| `role` | string | The actual role of the message author as passed in the message. | `"assistant"`, `"bot"` | `Conditionally Required`: if available and if not equal to `assistant` | +| `content` | `AnyValue` | The contents of the assistant message. | `Spans, events, metrics defined by the GenAI semantic conventions.` | `Opt-In` | +| `tool_calls` | [ToolCall](#toolcall-object)[] | The tool calls generated by the model, such as function calls. 
| `[{"id":"call_mszuSIzqtI65i1wAUOE8w5H4", "function":{"name":"get_link_to_otel_semconv", "arguments":{"semconv":"gen_ai"}}, "type":"function"}]` | `Conditionally Required`: if available | + +### `ToolCall` object + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|------------|-----------------------------|------------------------------------|-------------------------------------------------|-------------------| +| `id` | string | The id of the tool call | `call_mszuSIzqtI65i1wAUOE8w5H4` | `Required` | +| `type` | string | The type of the tool | `function` | `Required` | +| `function` | [Function](#function-object)| The function that the model called | `{"name":"get_link_to_otel_semconv", "arguments":{"semconv":"gen_ai"}}` | `Required` | + +### `Function` object + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|-------------|------------|----------------------------------------|----------------------------|-------------------| +| `name` | string | The name of the function to call | `get_link_to_otel_semconv` | `Required` | +| `arguments` | `AnyValue` | The arguments to pass to the function | `{"semconv": "gen_ai"}` | `Opt-In` | + +## Tool event + +This event describes the output of the tool or function submitted back to the model. + +The event name MUST be `gen_ai.tool.message`. + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|----------------|--------|-----------------------------------------------|---------------------------------|-------------------| +| `role` | string | The actual role of the message author as passed in the message.
| `"tool"`, `"function"` | `Conditionally Required`: if available and if not equal to `tool` | +| `content` | AnyValue | The contents of the tool message. | `opentelemetry.io` | `Opt-In` | +| `id` | string | Tool call that this message is responding to. | `call_mszuSIzqtI65i1wAUOE8w5H4` | `Required` | + +## Choice event + +This event describes model-generated individual chat response (choice). +If GenAI model returns multiple choices, each choice SHOULD be recorded as an individual event. + +When response is streamed, instrumentations that report response events MUST reconstruct and report the full message and MUST NOT report individual chunks as events. +If the request to GenAI model fails with an error before content is received, instrumentation SHOULD report an event with truncated content (if enabled). If `finish_reason` was not received, it MUST be set to `error`. + +The event name MUST be `gen_ai.choice`. + +Choice event body has the following fields: + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|-----------------|----------------------------|-------------------------------------------------|----------------------------------------|-------------------| +| `finish_reason` | string | The reason the model stopped generating tokens. | `stop`, `tool_calls`, `content_filter` | `Required` | +| `index` | int | The index of the choice in the list of choices. 
| `1` | `Required` | +| `message` | [Message](#message-object) | GenAI response message | `{"content":"The OpenAI semantic conventions are available at opentelemetry.io"}` | `Recommended` | + +### `Message` object + +| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | +|----------------|--------------------------------|-----------------------------------------------|---------------------------------|-------------------| +| `role` | string | The actual role of the message author as passed in the message. | `"assistant"`, `"bot"` | `Conditionally Required`: if available and if not equal to `assistant` | +| `content` | `AnyValue` | The contents of the assistant message. | `Spans, events, metrics defined by the GenAI semantic conventions.` | `Opt-In` | +| `tool_calls` | [ToolCall](#toolcall-object)[] | The tool calls generated by the model, such as function calls. | `[{"id":"call_mszuSIzqtI65i1wAUOE8w5H4", "function":{"name":"get_link_to_otel_semconv", "arguments":"{\"semconv\":\"gen_ai\"}"}, "type":"function"}]` | `Conditionally Required`: if available | + +## Custom events + +System-specific events that are not covered in this document SHOULD be documented in corresponding Semantic Conventions extensions and +SHOULD follow `gen_ai.{gen_ai.system}.*` naming pattern for system-specific events. 
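The choice event rules above combine two requirements: streamed chunks are reconstructed into one full message, and content fields stay out of the body unless capture is enabled. A minimal sketch in Python shows how they could fit together. `ChoiceAccumulator`, its `add_chunk` parameters, and the chunk shape are hypothetical and not part of any OpenTelemetry or OpenAI SDK. The sketch only builds the event body as a plain dictionary. Emitting it through the experimental Event API is out of scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChoiceAccumulator:
    """Hypothetical helper that rebuilds one streamed choice into a single gen_ai.choice body."""

    index: int
    content_parts: List[str] = field(default_factory=list)
    # Tool calls keyed by id; dicts keep insertion order, so calls stay in stream order.
    tool_calls: Dict[str, dict] = field(default_factory=dict)
    finish_reason: Optional[str] = None

    def add_chunk(
        self,
        content: Optional[str] = None,
        tool_call_id: Optional[str] = None,
        tool_name: Optional[str] = None,
        arguments_fragment: Optional[str] = None,
        finish_reason: Optional[str] = None,
    ) -> None:
        """Buffer one streamed chunk; chunks are never reported as events on their own."""
        if content:
            self.content_parts.append(content)
        if tool_call_id is not None:
            call = self.tool_calls.setdefault(
                tool_call_id, {"id": tool_call_id, "name": None, "arguments": ""}
            )
            if tool_name:
                call["name"] = tool_name
            if arguments_fragment:
                call["arguments"] += arguments_fragment
        if finish_reason is not None:
            self.finish_reason = finish_reason

    def to_event_body(self, capture_content: bool = False) -> dict:
        """Build the full gen_ai.choice body; content fields are opt-in."""
        message: dict = {}
        if capture_content and self.content_parts:
            message["content"] = "".join(self.content_parts)  # Opt-In
        if self.tool_calls:
            calls = []
            for call in self.tool_calls.values():
                function = {"name": call["name"]}  # Required
                if capture_content:
                    function["arguments"] = call["arguments"]  # Opt-In
                # Assumption: every tool call in this sketch is a function call.
                calls.append({"id": call["id"], "type": "function", "function": function})
            message["tool_calls"] = calls
        return {
            "index": self.index,
            # No finish_reason received (for example, the stream failed): report "error".
            "finish_reason": self.finish_reason or "error",
            "message": message,
        }


acc = ChoiceAccumulator(index=0)
acc.add_chunk(content="Follow GenAI semantic conventions ")
acc.add_chunk(content="available at opentelemetry.io.", finish_reason="stop")
print(acc.to_event_body(capture_content=False))
# {'index': 0, 'finish_reason': 'stop', 'message': {}}
```

When capture is disabled, the sketch drops `content` and `function.arguments` but keeps the required `id`, `type`, and `function.name`. This matches the "without content" event bodies shown in the examples.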
+ +## Examples + +### Chat completion + +This example covers the following scenario: + +- The user requests a chat completion from the OpenAI GPT-4 model for the following prompt: + - System message: `You're a friendly bot that answers questions about OpenTelemetry.` + - User message: `How to instrument GenAI library with OTel?` + +- The model responds with the `"Follow GenAI semantic conventions available at opentelemetry.io."` message + +Span: + +| Attribute name | Value | +|---------------------------------|--------------------------------------------| +| Span name | `"chat gpt-4"` | +| `gen_ai.system` | `"openai"` | +| `gen_ai.request.model` | `"gpt-4"` | +| `gen_ai.request.max_tokens` | `200` | +| `gen_ai.request.top_p` | `1.0` | +| `gen_ai.response.id` | `"chatcmpl-9J3uIL87gldCFtiIbyaOvTeYBRA3l"` | +| `gen_ai.response.model` | `"gpt-4-0613"` | +| `gen_ai.usage.output_tokens` | `47` | +| `gen_ai.usage.input_tokens` | `52` | +| `gen_ai.response.finish_reasons`| `["stop"]` | + +Events: + +1. `gen_ai.system.message` + + | Property | Value | + |---------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body | `{"content": "You're a friendly bot that answers questions about OpenTelemetry."}` | + +2. `gen_ai.user.message` + + | Property | Value | + |---------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body | `{"content":"How to instrument GenAI library with OTel?"}` | + +3. `gen_ai.choice` + + | Property | Value | + |---------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (with content enabled) | `{"index":0,"finish_reason":"stop","message":{"content":"Follow GenAI semantic conventions available at opentelemetry.io."}}` | + | Event body (without content) | `{"index":0,"finish_reason":"stop","message":{}}` | + +### Tools + +This example covers the following scenario: + +1.
The application requests a chat completion from the OpenAI GPT-4 model and provides a function definition. + + - The application provides the following prompt: + - User message: `How to instrument GenAI library with OTel?` + - The application defines a tool (a function) named `get_link_to_otel_semconv` with a single string argument named `semconv` + +2. The model responds with a tool call request, which the application executes. +3. The application requests a chat completion again, now with the tool execution result. + +Here's the telemetry generated for each step in this scenario: + +1. Chat completion resulting in a tool call. + + | Attribute name | Value | + |---------------------|-------------------------------------------------------| + | Span name | `"chat gpt-4"` | + | `gen_ai.system` | `"openai"` | + | `gen_ai.request.model`| `"gpt-4"` | + | `gen_ai.request.max_tokens`| `200` | + | `gen_ai.request.top_p`| `1.0` | + | `gen_ai.response.id`| `"chatcmpl-9J3uIL87gldCFtiIbyaOvTeYBRA3l"` | + | `gen_ai.response.model`| `"gpt-4-0613"` | + | `gen_ai.usage.output_tokens`| `17` | + | `gen_ai.usage.input_tokens`| `47` | + | `gen_ai.response.finish_reasons`| `["tool_calls"]` | + + Events parented to this span: + + - `gen_ai.user.message` (not reported when capturing content is disabled) + + | Property | Value | + |---------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body | `{"content":"How to instrument GenAI library with OTel?"}` | + + - `gen_ai.choice` + + | Property | Value | + |---------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (with content) | `{"index":0,"finish_reason":"tool_calls","message":{"tool_calls":[{"id":"call_VSPygqKTWdrhaFErNvMV18Yl","function":{"name":"get_link_to_otel_semconv","arguments":"{\"semconv\":\"GenAI\"}"},"type":"function"}]}}` | + | Event body (without content) |
`{"index":0,"finish_reason":"tool_calls","message":{"tool_calls":[{"id":"call_VSPygqKTWdrhaFErNvMV18Yl","function":{"name":"get_link_to_otel_semconv"},"type":"function"}]}}` | + +2. The application executes the tool call. The application may create a span for the tool execution; such spans are not covered by these semantic conventions. +3. Final chat completion call + + | Attribute name | Value | + |---------------------------------|-------------------------------------------------------| + | Span name | `"chat gpt-4"` | + | `gen_ai.system` | `"openai"` | + | `gen_ai.request.model` | `"gpt-4"` | + | `gen_ai.request.max_tokens` | `200` | + | `gen_ai.request.top_p` | `1.0` | + | `gen_ai.response.id` | `"chatcmpl-call_VSPygqKTWdrhaFErNvMV18Yl"` | + | `gen_ai.response.model` | `"gpt-4-0613"` | + | `gen_ai.usage.output_tokens` | `52` | + | `gen_ai.usage.input_tokens` | `47` | + | `gen_ai.response.finish_reasons`| `["stop"]` | + + Events parented to this span: + (in this example, the event content matches the original messages, but applications may also drop messages or change their content) + + - `gen_ai.user.message` (not reported when capturing content is not enabled) + + | Property | Value | + |----------------------------------|------------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body | `{"content":"How to instrument GenAI library with OTel?"}` | + + - `gen_ai.assistant.message` + + | Property | Value | + |----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (content enabled) | `{"tool_calls":[{"id":"call_VSPygqKTWdrhaFErNvMV18Yl","function":{"name":"get_link_to_otel_semconv","arguments":"{\"semconv\":\"GenAI\"}"},"type":"function"}]}` | + | Event body (content not enabled) |
`{"tool_calls":[{"id":"call_VSPygqKTWdrhaFErNvMV18Yl","function":{"name":"get_link_to_otel_semconv"},"type":"function"}]}` | + + - `gen_ai.tool.message` + + | Property | Value | + |----------------------------------|------------------------------------------------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (content enabled) | `{"content":"opentelemetry.io/semconv/gen-ai","id":"call_VSPygqKTWdrhaFErNvMV18Yl"}` | + | Event body (content not enabled) | `{"id":"call_VSPygqKTWdrhaFErNvMV18Yl"}` | + + - `gen_ai.choice` + + | Property | Value | + |----------------------------------|-------------------------------------------------------------------------------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (content enabled) | `{"index":0,"finish_reason":"stop","message":{"content":"Follow OTel semconv available at opentelemetry.io/semconv/gen-ai"}}` | + | Event body (content not enabled) | `{"index":0,"finish_reason":"stop","message":{}}` | + +### Chat completion with multiple choices + +This example covers the following scenario: + +- The user requests two chat completions from the OpenAI GPT-4 model for the following prompt: + + - System message: `You're a friendly bot that answers questions about OpenTelemetry.` + - User message: `How to instrument GenAI library with OTel?` + +- The model responds with two choices: + + - `"Follow GenAI semantic conventions available at opentelemetry.io."` message + - `"Use OpenAI instrumentation library."` message + +Span: + +| Attribute name | Value | +|---------------------|--------------------------------------------| +| Span name | `"chat gpt-4"` | +| `gen_ai.system` | `"openai"` | +| `gen_ai.request.model`| `"gpt-4"` | +| `gen_ai.request.max_tokens`| `200` | +| `gen_ai.request.top_p`| `1.0` | +| `gen_ai.response.id`| `"chatcmpl-9J3uIL87gldCFtiIbyaOvTeYBRA3l"` | +| `gen_ai.response.model`| `"gpt-4-0613"` | +|
`gen_ai.usage.output_tokens`| `77` | +| `gen_ai.usage.input_tokens`| `52` | +| `gen_ai.response.finish_reasons`| `["stop"]` | + +Events: + +1. `gen_ai.system.message`: the same as in the [Chat Completion](#chat-completion) example +2. `gen_ai.user.message`: the same as in the previous example +3. `gen_ai.choice` + + | Property | Value | + |------------------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (content enabled) | `{"index":0,"finish_reason":"stop","message":{"content":"Follow GenAI semantic conventions available at opentelemetry.io."}}` | + +4. `gen_ai.choice` + + | Property | Value | + |------------------------------|-------------------------------------------------------| + | `gen_ai.system` | `"openai"` | + | Event body (content enabled) | `{"index":1,"finish_reason":"stop","message":{"content":"Use OpenAI instrumentation library."}}` | + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md index 0a3eec44b4..ed63699ae3 100644 --- a/docs/gen-ai/gen-ai-spans.md +++ b/docs/gen-ai/gen-ai-spans.md @@ -2,7 +2,7 @@ linkTitle: Generative AI traces ---> -# Semantic Conventions for GenAI operations +# Semantic Conventions for GenAI spans **Status**: [Experimental][DocumentStatus] @@ -11,9 +11,8 @@ linkTitle: Generative AI traces - [Name](#name) -- [Configuration](#configuration) - [GenAI attributes](#genai-attributes) -- [Events](#events) +- [Capturing inputs and outputs](#capturing-inputs-and-outputs) @@ -27,15 +26,6 @@ GenAI spans MUST follow the overall [guidelines for span names](https://github.c The **span name** SHOULD be `{gen_ai.operation.name} {gen_ai.request.model}`. Semantic conventions for individual GenAI systems and frameworks MAY specify different span name format. -## Configuration - -Instrumentations for Generative AI clients MAY capture prompts and completions. 
-Instrumentations that support it, MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons: - -1. Data privacy concerns. End users of GenAI applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend. -2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some GenAI systems allow for extremely large context windows that end users may take full advantage of. -3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application. - ## GenAI attributes These attributes track input data and metadata for a request to an GenAI model. Each attribute represents a concept that is common to most Generative AI clients. @@ -125,54 +115,8 @@ Instrumentations SHOULD document the list of errors they report. -## Events - -In the lifetime of a GenAI span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation. - - - - - - - - -The event name MUST be `gen_ai.content.prompt`. +## Capturing inputs and outputs -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|---|---|---|---| -| [`gen_ai.prompt`](/docs/attributes-registry/gen-ai.md) | string | The full prompt sent to the GenAI model. 
[1] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | `Conditionally Required` if and only if corresponding event is enabled | ![Experimental](https://img.shields.io/badge/-experimental-blue) | - -**[1]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - - - - - - - - - - - - - - - - -The event name MUST be `gen_ai.content.completion`. - -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|---|---|---|---| -| [`gen_ai.completion`](/docs/attributes-registry/gen-ai.md) | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | `Conditionally Required` if and only if corresponding event is enabled | ![Experimental](https://img.shields.io/badge/-experimental-blue) | - -**[1]:** It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - - - - - - - - +User inputs and model responses may be recorded as events parented to GenAI operation span. See [Semantic Conventions for GenAI events](./gen-ai-events.md) for the details. -[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/model/gen-ai/deprecated/registry-deprecated.yaml b/model/gen-ai/deprecated/registry-deprecated.yaml index 04a2968a74..2115482f36 100644 --- a/model/gen-ai/deprecated/registry-deprecated.yaml +++ b/model/gen-ai/deprecated/registry-deprecated.yaml @@ -16,3 +16,15 @@ groups: deprecated: Replaced by `gen_ai.usage.output_tokens` attribute. brief: "Deprecated, use `gen_ai.usage.output_tokens` instead." 
examples: [42] + - id: gen_ai.prompt + type: string + stability: experimental + deprecated: "Removed, no replacement at this time." + brief: "Deprecated, use Event API to report prompt contents." + examples: ["[{'role': 'user', 'content': 'What is the capital of France?'}]"] + - id: gen_ai.completion + type: string + stability: experimental + deprecated: "Removed, no replacement at this time." + brief: "Deprecated, use Event API to report completions contents." + examples: ["[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]"] diff --git a/model/gen-ai/events.yaml b/model/gen-ai/events.yaml new file mode 100644 index 0000000000..72a4e692f0 --- /dev/null +++ b/model/gen-ai/events.yaml @@ -0,0 +1,48 @@ +groups: + - id: gen_ai.common.event.attributes + type: attribute_group + stability: experimental + brief: > + Describes common Gen AI event attributes. + attributes: + - ref: gen_ai.system + + - id: gen_ai.system.message + name: gen_ai.system.message + type: event + stability: experimental + brief: > + This event describes the instructions passed to the GenAI system inside the prompt. + extends: gen_ai.common.event.attributes + + - id: gen_ai.user.message + name: gen_ai.user.message + type: event + stability: experimental + brief: > + This event describes the prompt message specified by the user. + extends: gen_ai.common.event.attributes + + - id: gen_ai.assistant.message + name: gen_ai.assistant.message + type: event + stability: experimental + brief: > + This event describes the assistant message passed to GenAI system or received from it. + extends: gen_ai.common.event.attributes + + - id: gen_ai.tool.message + name: gen_ai.tool.message + type: event + stability: experimental + brief: > + This event describes the tool or function response message. + extends: gen_ai.common.event.attributes + + - id: gen_ai.choice + name: gen_ai.choice + type: event + stability: experimental + brief: > + This event describes the Gen AI response message. 
+ extends: gen_ai.common.event.attributes diff --git a/model/gen-ai/registry.yaml b/model/gen-ai/registry.yaml index 5b3d1cff79..816f457093 100644 --- a/model/gen-ai/registry.yaml +++ b/model/gen-ai/registry.yaml @@ -119,18 +119,6 @@ groups: brief: 'Output tokens (completion, response, etc.)' brief: The type of token being counted. examples: ['input', 'output'] - - id: gen_ai.prompt - stability: experimental - type: string - brief: The full prompt sent to the GenAI model. - note: It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - examples: ["[{'role': 'user', 'content': 'What is the capital of France?'}]"] - - id: gen_ai.completion - stability: experimental - type: string - brief: The full response received from the GenAI model. - note: It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - examples: ["[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]"] - id: gen_ai.operation.name stability: experimental type: diff --git a/model/gen-ai/spans.yaml b/model/gen-ai/spans.yaml index d634d94473..86ddfc4a82 100644 --- a/model/gen-ai/spans.yaml +++ b/model/gen-ai/spans.yaml @@ -54,35 +54,6 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. - events: - - gen_ai.content.prompt - - gen_ai.content.completion - - - id: gen_ai.content.prompt - name: gen_ai.content.prompt - type: event - brief: > - In the lifetime of an GenAI span, events for prompts sent and completions received - may be created, depending on the configuration of the instrumentation. 
- attributes: - - ref: gen_ai.prompt - requirement_level: - conditionally_required: if and only if corresponding event is enabled - note: > - It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - - - id: gen_ai.content.completion - name: gen_ai.content.completion - type: event - brief: > - In the lifetime of an GenAI span, events for prompts sent and completions received - may be created, depending on the configuration of the instrumentation. - attributes: - - ref: gen_ai.completion - requirement_level: - conditionally_required: if and only if corresponding event is enabled - note: > - It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation) - id: trace.gen_ai.client extends: trace.gen_ai.client.common From 6e77ed5f6c39c2b26e96a85bb3b030c1ee2d20dc Mon Sep 17 00:00:00 2001 From: Joao Grassi <5938087+joaopgrassi@users.noreply.github.com> Date: Mon, 7 Oct 2024 11:36:02 +0200 Subject: [PATCH 5/5] Mark *.size messaging attributes as opt-in (#1442) --- .chloggen/size-attributes-opt-in.yaml | 4 ++++ docs/messaging/kafka.md | 10 +++++----- docs/messaging/messaging-spans.md | 24 ++++++++++++------------ docs/messaging/rabbitmq.md | 14 +++++++------- docs/messaging/rocketmq.md | 10 +++++----- model/messaging/spans.yaml | 8 ++++++-- 6 files changed, 39 insertions(+), 31 deletions(-) create mode 100755 .chloggen/size-attributes-opt-in.yaml diff --git a/.chloggen/size-attributes-opt-in.yaml b/.chloggen/size-attributes-opt-in.yaml new file mode 100755 index 0000000000..2062d885e1 --- /dev/null +++ b/.chloggen/size-attributes-opt-in.yaml @@ -0,0 +1,4 @@ +change_type: breaking +component: messaging +note: Mark *.size messaging attributes as Opt-In +issues: [474] diff --git a/docs/messaging/kafka.md b/docs/messaging/kafka.md index 5890dcac3f..b86f3a6f50 100644 --- a/docs/messaging/kafka.md +++ 
b/docs/messaging/kafka.md @@ -43,9 +43,9 @@ For Apache Kafka, the following additional attributes are defined: | [`messaging.destination.partition.id`](/docs/attributes-registry/messaging.md) | string | String representation of the partition id the message (or batch) is sent to or received from. | `1` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.kafka.message.key`](/docs/attributes-registry/messaging.md) | string | Message keys in Kafka are used for grouping alike messages to ensure they're processed on the same partition. They differ from `messaging.message.id` in that they're not unique. If the key is `null`, the attribute MUST NOT be set. [9] | `myKey` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.kafka.offset`](/docs/attributes-registry/messaging.md) | int | The offset of a record in the corresponding Kafka partition. | `42` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. [10] | `1439` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.id`](/docs/attributes-registry/messaging.md) | string | A value used by the messaging system as an identifier for the message, represented as a string. | `452a7c7c7c7048c2f887f61572b18fc2` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. 
[11] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [10] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. Only applicable for spans describing single message operations. [11] | `1439` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality. @@ -84,10 +84,10 @@ the broker doesn't have such notion, the destination name SHOULD uniquely identi **[9]:** If the key type is not string, it's string representation has to be supplied for the attribute. If the key has no unambiguous, canonical string form, don't include its value. -**[10]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed -body size should be used. +**[10]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. -**[11]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. +**[11]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed +body size should be used. 
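For Kafka, this hunk makes `messaging.message.body.size` `Opt-In`. Its new brief limits it to spans that describe single-message operations, and footnote [11] says to prefer the uncompressed size when both sizes are known. The sketch below shows one way an instrumentation might apply those three rules.

It is a hypothetical helper, not part of any OpenTelemetry SDK. The `body_size_opt_in` flag and the parameter names are assumptions standing in for whatever configuration and message metadata a real Kafka instrumentation has.

```python
from typing import Optional

BODY_SIZE_ATTR = "messaging.message.body.size"


def kafka_body_size_attributes(
    body_size_opt_in: bool,
    message_count: int,
    uncompressed_size: Optional[int] = None,
    compressed_size: Optional[int] = None,
) -> dict:
    """Return the body-size attribute for a Kafka span, or an empty dict.

    Hypothetical helper: `body_size_opt_in` stands in for the configuration
    switch an instrumentation would use to enable Opt-In attributes.
    """
    # Opt-In: record nothing unless the user explicitly enabled it.
    if not body_size_opt_in:
        return {}
    # Only applicable for spans describing single-message operations.
    if message_count != 1:
        return {}
    # Prefer the uncompressed size when known; otherwise fall back to the
    # compressed size, which the footnote also allows.
    size = uncompressed_size if uncompressed_size is not None else compressed_size
    if size is None:
        return {}
    return {BODY_SIZE_ATTR: size}


if __name__ == "__main__":
    print(kafka_body_size_attributes(True, 1, uncompressed_size=1439, compressed_size=512))
```

Run as a script, the helper prints `{'messaging.message.body.size': 1439}`. It returns an empty dict when the attribute is not enabled or when the span covers a batch.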
diff --git a/docs/messaging/messaging-spans.md b/docs/messaging/messaging-spans.md index 5ffc36c97d..1c6ac90035 100644 --- a/docs/messaging/messaging-spans.md +++ b/docs/messaging/messaging-spans.md @@ -346,13 +346,13 @@ Messaging system-specific attributes MUST be defined in the corresponding `messa | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [14] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`messaging.client.id`](/docs/attributes-registry/messaging.md) | string | A unique identifier for the client that consumes or produces a message. | `client-5`; `myhost@8742@s8083jm` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.destination.partition.id`](/docs/attributes-registry/messaging.md) | string | The identifier of the partition messages are sent to or received from, unique within the `messaging.destination.name`. | `1` | `Recommended` When applicable. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. [15] | `1439` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.conversation_id`](/docs/attributes-registry/messaging.md) | string | The conversation ID identifying the conversation to which the message belongs, represented as a string. Sometimes called "Correlation ID". | `MyConversationId` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`messaging.message.envelope.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body and metadata in bytes. 
[16] | `2738` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.id`](/docs/attributes-registry/messaging.md) | string | A value used by the messaging system as an identifier for the message, represented as a string. | `452a7c7c7c7048c2f887f61572b18fc2` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the messaging intermediary node where the operation was performed. [17] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` If applicable for this messaging system. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the messaging intermediary node where the operation was performed. [15] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` If applicable for this messaging system. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`network.peer.port`](/docs/attributes-registry/network.md) | int | Peer port of the messaging intermediary node where the operation was performed. | `65123` | `Recommended` if and only if `network.peer.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [18] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [16] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. 
[17] | `1439` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`messaging.message.envelope.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body and metadata in bytes. [18] | `2738` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** The actual messaging system may differ from the one known by the client. For example, when using Kafka client libraries to communicate with Azure Event Hubs, the `messaging.system` is set to `kafka` based on the instrumentation's best knowledge. @@ -401,17 +401,17 @@ the broker doesn't have such notion, the destination name SHOULD uniquely identi **[14]:** Server domain name of the broker if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. -**[15]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed -body size should be used. - -**[16]:** This can refer to both the compressed or uncompressed size. If both sizes are known, the uncompressed -size should be used. - -**[17]:** Semantic conventions for individual messaging systems SHOULD document whether `network.peer.*` attributes are applicable. +**[15]:** Semantic conventions for individual messaging systems SHOULD document whether `network.peer.*` attributes are applicable. Network peer address and port are important when the application interacts with individual intermediary nodes directly, If a messaging operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used. -**[18]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. 
+**[16]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[17]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed +body size should be used. + +**[18]:** This can refer to both the compressed or uncompressed size. If both sizes are known, the uncompressed +size should be used. diff --git a/docs/messaging/rabbitmq.md b/docs/messaging/rabbitmq.md index 20ad57d91c..1089f87a1c 100644 --- a/docs/messaging/rabbitmq.md +++ b/docs/messaging/rabbitmq.md @@ -31,12 +31,12 @@ In RabbitMQ, the destination is defined by an *exchange* and a *routing key*. | [`messaging.rabbitmq.destination.routing_key`](/docs/attributes-registry/messaging.md) | string | RabbitMQ message routing key. | `myKey` | `Conditionally Required` If not empty. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.rabbitmq.message.delivery_tag`](/docs/attributes-registry/messaging.md) | int | RabbitMQ message delivery tag | `123` | `Conditionally Required` When available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [5] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. 
[6] | `1439` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.conversation_id`](/docs/attributes-registry/messaging.md) | string | Message [correlation Id](https://www.rabbitmq.com/tutorials/tutorial-six-java#correlation-id) property. | `MyConversationId` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.id`](/docs/attributes-registry/messaging.md) | string | A value used by the messaging system as an identifier for the message, represented as a string. | `452a7c7c7c7048c2f887f61572b18fc2` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the network connection - IP address or Unix domain socket name. [7] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the network connection - IP address or Unix domain socket name. [6] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`network.peer.port`](/docs/attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [8] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [7] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. 
[8] | `1439` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality. @@ -67,12 +67,12 @@ the broker doesn't have such notion, the destination name SHOULD uniquely identi **[5]:** Server domain name of the broker if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. -**[6]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed -body size should be used. +**[6]:** If an operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used. -**[7]:** If an operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used. +**[7]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. -**[8]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. +**[8]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed +body size should be used. diff --git a/docs/messaging/rocketmq.md b/docs/messaging/rocketmq.md index 48741ebd08..326d8dee63 100644 --- a/docs/messaging/rocketmq.md +++ b/docs/messaging/rocketmq.md @@ -35,13 +35,13 @@ Specific attributes for Apache RocketMQ are defined below. | [`messaging.rocketmq.message.group`](/docs/attributes-registry/messaging.md) | string | It is essential for FIFO message. Messages that belong to the same message group are always processed one by one within the same consumer group. | `myMessageGroup` | `Conditionally Required` If the message type is FIFO. 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [9] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`messaging.client.id`](/docs/attributes-registry/messaging.md) | string | A unique identifier for the client that consumes or produces a message. | `client-5`; `myhost@8742@s8083jm` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. [10] | `1439` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.message.id`](/docs/attributes-registry/messaging.md) | string | A value used by the messaging system as an identifier for the message, represented as a string. | `452a7c7c7c7048c2f887f61572b18fc2` | `Recommended` If span describes operation on a single message. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.rocketmq.consumption_model`](/docs/attributes-registry/messaging.md) | string | Model of message consumption. This only applies to consumer spans. | `clustering`; `broadcasting` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.rocketmq.message.keys`](/docs/attributes-registry/messaging.md) | string[] | Key(s) of message, another way to mark message besides message id. | `["keyA", "keyB"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.rocketmq.message.tag`](/docs/attributes-registry/messaging.md) | string | The secondary classifier of message besides topic. 
| `tagA` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`messaging.rocketmq.message.type`](/docs/attributes-registry/messaging.md) | string | Type of message. | `normal`; `fifo`; `delay` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [11] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [10] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`messaging.message.body.size`](/docs/attributes-registry/messaging.md) | int | The size of the message body in bytes. [11] | `1439` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality. @@ -80,10 +80,10 @@ the broker doesn't have such notion, the destination name SHOULD uniquely identi **[9]:** Server domain name of the broker if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. -**[10]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed -body size should be used. +**[10]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. -**[11]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. +**[11]:** This can refer to both the compressed or uncompressed body size. If both sizes are known, the uncompressed +body size should be used. 
diff --git a/model/messaging/spans.yaml b/model/messaging/spans.yaml index cc10819f8a..c955d4feb2 100644 --- a/model/messaging/spans.yaml +++ b/model/messaging/spans.yaml @@ -64,7 +64,9 @@ groups: sampling_relevant: true - ref: messaging.message.conversation_id - ref: messaging.message.envelope.size + requirement_level: opt_in - ref: messaging.message.body.size + requirement_level: opt_in - ref: messaging.batch.message_count requirement_level: conditionally_required: If the span describes an operation on a batch of messages. @@ -111,6 +113,7 @@ groups: brief: > Message [correlation Id](https://www.rabbitmq.com/tutorials/tutorial-six-java#correlation-id) property. - ref: messaging.message.body.size + requirement_level: opt_in - id: messaging.kafka type: attribute_group @@ -141,8 +144,8 @@ groups: conditionally_required: If the span describes an operation on a batch of messages. - ref: messaging.client.id - ref: messaging.message.body.size - requirement_level: - recommended: If span describes operation on a single message. + requirement_level: opt_in + brief: The size of the message body in bytes. Only applicable for spans describing single message operations. - id: messaging.rocketmq type: attribute_group @@ -172,6 +175,7 @@ groups: - ref: messaging.rocketmq.consumption_model - ref: messaging.client.id - ref: messaging.message.body.size + requirement_level: opt_in - ref: messaging.batch.message_count requirement_level: conditionally_required: If the span describes an operation on a batch of messages.
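The first hunk of `model/messaging/spans.yaml` now declares both `messaging.message.body.size` and `messaging.message.envelope.size` with `requirement_level: opt_in`. Under that level, instrumentations leave the attributes out unless a user asks for them.

The sketch below shows one way to express that gating. The function name, the plain-string level values and the `enabled_opt_in` set are illustrative assumptions, not an OpenTelemetry API.

```python
from typing import Any, Dict, Set

# Requirement levels of the size attributes after this patch, mirroring the
# `requirement_level: opt_in` entries added to model/messaging/spans.yaml.
MESSAGING_SIZE_LEVELS: Dict[str, str] = {
    "messaging.message.body.size": "opt_in",
    "messaging.message.envelope.size": "opt_in",
}


def filter_opt_in(
    attributes: Dict[str, Any],
    levels: Dict[str, str],
    enabled_opt_in: Set[str],
) -> Dict[str, Any]:
    """Drop Opt-In attributes that the user has not explicitly enabled.

    Attributes with no entry in `levels` are passed through unchanged.
    """
    result: Dict[str, Any] = {}
    for name, value in attributes.items():
        if levels.get(name) == "opt_in" and name not in enabled_opt_in:
            continue  # Opt-In and not enabled: do not record it.
        result[name] = value
    return result


if __name__ == "__main__":
    attrs = {
        "messaging.system": "kafka",
        "messaging.message.body.size": 1439,
        "messaging.message.envelope.size": 2738,
    }
    # Nothing enabled: both size attributes are dropped.
    print(filter_opt_in(attrs, MESSAGING_SIZE_LEVELS, set()))
    # Only the body size enabled: the envelope size is still dropped.
    print(filter_opt_in(attrs, MESSAGING_SIZE_LEVELS, {"messaging.message.body.size"}))
```

Keeping the level table separate from the filtering logic makes it easy to update if a later model revision changes requirement levels again.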