Add LLM semantic conventions #639
Conversation
Hi @nirga, thanks for the contribution! Please refer to the CONTRIBUTION guide. The markdown attribute tables are generated automatically, and you need to define the attributes via YAML files.
Good start, thanks @nirga !
docs/ai/llm-spans.md
Outdated
<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
How about adding `llm.response.duration`? This is requested to check the latency.
Isn't it covered already by the fact that an LLM request is a single span, which has a duration?
You mean we'd get this info just from the OTel span, right?
Yes, this information would be available from the Span. In the case of a streaming response some people want to know "time to first token", "max time/pause between tokens".
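As a rough sketch of how an instrumentation could derive those streaming timings itself, using a plain dict to stand in for span attributes (the attribute names here are hypothetical, not part of this PR):

```python
import time

def instrument_stream(token_iter, attributes):
    """Wrap a streaming LLM response, recording hypothetical timing
    attributes; overall duration still comes from the span itself."""
    start = time.monotonic()
    first_token_at = None
    prev = None
    max_gap = 0.0
    for token in token_iter:
        now = time.monotonic()
        if first_token_at is None:
            first_token_at = now - start        # time to first token
        if prev is not None:
            max_gap = max(max_gap, now - prev)  # max pause between tokens
        prev = now
        yield token
    # Hypothetical attribute names, for illustration only.
    attributes["llm.response.time_to_first_token"] = first_token_at
    attributes["llm.response.max_token_gap"] = max_gap
```

The wrapper yields tokens through unchanged, so the caller's streaming loop is unaffected; the attributes are filled in once the stream is exhausted.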
Thanks! I was merely copying and adapting #483. I'll work on converting this to YAML as well.
Some initial feedback. Once we define the yaml model files, some other efficiencies (duplication and naming conventions) become evident.
Also, can we add the openai metrics back into this PR or do you want that to be in a separate PR?
docs/ai/llm-spans.md
Outdated
<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
Yes, this information would be available from the Span. In the case of a streaming response some people want to know "time to first token", "max time/pause between tokens".
docs/ai/llm-spans.md
Outdated
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required |
Do we expect llm.request.model and llm.response.model to be different? The request and response are all recorded on one span, so these would be redundant?
Yes. For example, in OpenAI you ask for `gpt-4` and then get back a specific version like `gpt-4-0613` (I've also seen this in Anthropic, Replicate, and others).
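A minimal sketch of that distinction, recording both attributes on the same span (a plain dict stands in for span attributes, and the response payload shape is OpenAI-like but purely illustrative):

```python
def record_model_attributes(attributes, requested_model, response_payload):
    """Record the alias the caller asked for and the concrete model the
    vendor actually served; they often differ (gpt-4 vs gpt-4-0613)."""
    attributes["llm.request.model"] = requested_model
    # Vendors typically echo the resolved model name in the response body;
    # fall back to the requested name if they don't.
    attributes["llm.response.model"] = response_payload.get("model", requested_model)

attrs = {}
record_model_attributes(attrs, "gpt-4", {"model": "gpt-4-0613"})
```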
docs/ai/llm-spans.md
Outdated
|---|---|---|---|---|
| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended |
| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
`max_tokens` is prefixed with `request`, whereas other parameters such as temperature are not prefixed. Perhaps we should remove the `request` prefix or add it to the others. Instead of `request`, perhaps `parameter` is better.
Hmm... `parameter` would sound weird for `model`, no? `llm.parameter.model`. I've added the `request` prefix to all request parameters.
@joaopgrassi YAML files were added.
Great start!
Left some suggestions and questions.
brief: The name of the LLM foundation model vendor, if applicable.
examples: 'openai'
tag: llm-generic-request
- id: request.model
Why not just `llm.model`? Also, I assume there could be multiple model properties; perhaps `llm.model.name` would be more future-proof?
The reason is there are many providers (like OpenAI, Anthropic, Render, etc.) where you ask for a general version (like `gpt-4-turbo-preview`) but then get a specific version (like `gpt-4-0125-preview`). So we need a separation between the "request" model and the "actual" model.
Makes sense. Is it a common case that request and response models are different?
The question still applies: `llm.model.name`, could the `model` contain more info other than the name? Like `llm.request.model.name|version` etc.? Do we want to make "model" a top namespace that `request` and `response` can then reuse?
> The question still applies: `llm.model.name`, could the `model` contain more info other than the name? Like `llm.request.model.name|version` etc.? Do we want to make "model" a top namespace that `request` and `response` can then reuse?
This is a single input parameter in the services I've seen, not a separate name and version. The response model could be a different qualified identifier. For example, the request could be for 'gpt4' and the response could say 'gpt4-32k-turbo'.
tag: llm-generic-request
- id: request.stop_sequences
type: string
brief: Array of strings the LLM uses as a stop sequence.
If it's an array, should the type be `string[]`? If there are good reasons (such as perf) to keep it as string, how are values separated? Could you also provide an example of an array in examples?
What's the resolution here? Should it be of type `string[]`?
Yes, it should likely be `string[]`.
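For illustration, OTel attribute values may be homogeneous arrays of primitives, so `string[]` fits naturally; a small sketch with a plain dict standing in for span attributes:

```python
def set_string_array(attributes, key, values):
    """Store a string[] attribute, mirroring OTel's rule that array
    attribute values must be homogeneous primitives."""
    if not all(isinstance(v, str) for v in values):
        raise TypeError(f"{key}: string[] requires all-string values")
    attributes[key] = list(values)

attrs = {}
set_string_array(attrs, "llm.request.stop_sequences", ["\n\nHuman:", "###"])
```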
- llm.content.openai.tool
- llm.content.openai.completion.choice

- id: llm.content.openai.prompt
Do we need this event? It could be `llm.content.prompt` with OpenAI-specific attributes.
That's a good question. Given that OpenAI (specifically) has a really different way of modeling prompts and completions, I wonder if it won't be cumbersome to use the same event for both?
We can start with one event and add more once it proves too difficult; the spec will stay experimental for now anyway.
For the time being we can just list OpenAI-specific attributes and mention that they would appear on the events.
(Unless you already have good reasons to keep events separate)
Some frameworks (e.g. vllm) have OpenAI-compatible serving APIs.
Using the same event name can benefit this use case.
Adding OpenAI metrics and fixing markdown errors
A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
It MAY be a name of the API endpoint for the LLM being called.
We don't usually put endpoints in span names. Perhaps we can stay vague and say that it should contain a specific operation name (e.g. `create_chat_completions`).
See also a comment on metrics regarding introducing an `llm.operation` attribute.
## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons:
Please set the requirement level on the corresponding attributes to opt-in; then there will be no need to specify this requirement: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attribute-requirement-level.md
We can just say that prompts and completions could be sensitive (and keep the explanation below).
2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.
3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application.

By default, these configurations SHOULD NOT capture prompts and completions.
> By default, these configurations SHOULD NOT capture prompts and completions.

We need to change the requirement level to opt-in, and then this sentence is redundant.
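An opt-in capture gate might look like this sketch (the environment variable name is made up purely for illustration; the real knob would be whatever the spec settles on):

```python
import os

def maybe_capture_content(attributes, prompt, completion):
    """Attach prompt/completion only when the user opts in, since the
    content can be sensitive, large, and costly to export."""
    opted_in = os.environ.get(
        "OTEL_INSTRUMENTATION_LLM_CAPTURE_CONTENT", "false"
    ).lower() == "true"
    if opted_in:
        attributes["llm.prompt"] = prompt
        attributes["llm.completion"] = completion
```

With the attributes marked opt-in in the YAML model, an instrumentation following this pattern is compliant by default without any extra normative text.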
<!-- semconv llm.request -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended |
BTW, is it important for observability? How would I use it?
brief: The name of the LLM foundation model vendor, if applicable.
examples: 'openai'
tag: llm-generic-request
- id: request.model
Makes sense. Is it a common case that request and response models are different?
|---|---|---|---|---|
| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |

**[1]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention.
It's OK to put JSON into this attribute. Once we have a way to specify what goes into the event payload, we'll move it there, and JSON, XML, or plain text will be perfectly fine.
Prompt vs. completion reflects the old Completions API, which OpenAI has deprecated; many other model providers never even started with a Completions API (e.g. Mistral: https://docs.mistral.ai/api/).
The current OpenAI API is ChatCompletion, which takes a messages array as input and produces one message as output.
https://platform.openai.com/docs/guides/text-generation/chat-completions-api
Also, if we consider the llm-span to be the "base" to be extended for different model providers and APIs, I'm not sure we should include things like prompt/completion (input/output) as part of it: different models/APIs will have totally different inputs/outputs, so trying to define a base input/output here doesn't help.
It probably makes more sense to define a span type for each API, not each vendor?
OpenAI has chat completion, embeddings, image generation, GPT-4V... it will be hard to capture all that in a single span type 'openai'.
> It probably makes more sense to define span type for each api not vendor? Open AI has chat completion, embedding, image generation, gpt-4v ... it will be hard to capture that in a single span type 'openai'.

Yes, I have the same concern. Do you think "inputs" and "outputs" would be the more generic representation across various APIs? Then we could add API-specific attributes for chat completions, image generation, etc.: `gen_ai.openai.chatcompletions.*`, `gen_ai.openai.images.*`, and so on. We should discuss this in the working group.
|---|---|---|---|---|
| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | Recommended |

**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.
I don't understand this sentence. Why leave this attribute blank and not put JSON there?
Also, I think we should create an event per message in the completion, at least when the response is streamed.
@@ -0,0 +1,372 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: MEtrics
- linkTitle: MEtrics
+ linkTitle: LLM metrics

(or OpenAI metrics depending on the discussion below)
@@ -0,0 +1,25 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: AI
nit: since it's LLM semconv, I think it should be in the llm folder and should have LLM title
to: database/README.md
--->

# Semantic Conventions for AI systems
- # Semantic Conventions for AI systems
+ # Semantic Conventions for LLM clients
brief: The name of the LLM foundation model vendor, if applicable.
examples: 'openai'
tag: llm-generic-request
- id: request.model
The question still applies: `llm.model.name`, could the `model` contain more info other than the name? Like `llm.request.model.name|version` etc.? Do we want to make "model" a top namespace that `request` and `response` can then reuse?
brief: The total number of tokens used in the LLM prompt and response.
examples: [280]
tag: llm-generic-response
- id: prompt
Should this be inside `request.prompt`?
> Should this be inside `request.prompt`?

These were intended to be attributes on span events, but we will be moving them to the Event body.
brief: The full prompt string sent to an LLM in a request.
examples: ['\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:']
tag: llm-generic-events
- id: completion
- - id: completion
+ - id: response.completion

No?
Thanks @nirga for the great work!
Added some comments.
| Value | Description |
|---|---|
| `prompt` | prompt |
| `completion` | completion |
Does embedding have a `completion` token type?
examples: ["stop1"]
tag: llm-generic-request
- id: response.id
type: string[]
Should `response.id` be of `string` type (instead of `string[]`)?
- ref: llm.request.max_tokens
tag: tech-specific-openai-request
- ref: llm.request.temperature
tag: tech-specific-openai-request
What is the default requirement level if not specified?
Default is Recommended.
- llm.content.openai.tool
- llm.content.openai.completion.choice

- id: llm.content.openai.prompt
Some frameworks (e.g. vllm) have OpenAI-compatible serving APIs.
Using the same event name can benefit this use case.
| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended |
| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [2] | `gpt-4-0613` | Required |
Is this mutually exclusive with `llm.request.model`?
Yes, the requested model identifier is sometimes different than the response model identifier. For example, Azure OpenAI allows for a deployment name as the request model, but responds with the actual LLM model name. Other systems will add the current variant at the end of the model name in the response.
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |
Besides the plain text prompts, OpenAI's chat completion API also supports complicated inputs/outputs like images, function calls, etc. How do we plan to record these kinds of payloads into trace?
Besides the chat completion API, OpenAI also has a lot of other APIs like Embeddings, Images, Assistants, etc. How do we plan to support those scenarios?
We are discussing in the working group the requirements for an initial minimum PR to get into semantic-conventions. After this initial merge, we will be able to create additional proposals, issues, and PRs. We will likely reduce the surface area of this initial PR. Then proposals can be submitted for embeddings, images, etc.
Work continued in #825
Advancement towards #327
Changes
Continuing the work from #483. Introduces semantic conventions for modern AI systems.
I tried focusing on a minimal set, specifically supporting LLMs in general with some specific semantic conventions for OpenAI as its API is far more complex than others like Anthropic. Future PRs will address more foundation models as well as vector DBs and frameworks.
I'm trying to match this to what we've already started building with OpenLLMetry and will make the needed changes there once this is approved.
Merge requirement checklist