Code generation: how to avoid naming collisions #1118

lmolkova · 2024-06-03T17:31:55Z

As we discovered in #1031, certain semantic convention changes (together with default code generator behavior) result in ambiguous constant names in the generated code.

For example, when foo.bar_baz is renamed to foo.bar.baz, the code generator produces the same constant name for both: FOO_BAR_BAZ, FooBarBaz, etc., depending on the preferred casing.

We don't remove attributes/enum members/metrics from the semantic conventions anymore (old property is deprecated), so both constants would exist in the same version.
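
To make the collision concrete, here is a minimal Python sketch. The two helpers are illustrative stand-ins for the default naming rules, not the actual build-tools or weaver filters; they assume both `.` and `_` are treated as word separators.

```python
import re


def screaming_snake(attr: str) -> str:
    # Hypothetical default rule: '.' becomes '_', existing '_' is kept as-is.
    return attr.replace(".", "_").upper()


def pascal_case(attr: str) -> str:
    # Hypothetical default rule: both '.' and '_' act as word boundaries.
    return "".join(part.capitalize() for part in re.split(r"[._]", attr))


old_attr, new_attr = "foo.bar_baz", "foo.bar.baz"

# Both attribute keys map to the same constant name in either casing.
print(screaming_snake(old_attr), screaming_snake(new_attr))
print(pascal_case(old_attr), pascal_case(new_attr))
```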

So, the semantic conventions together with the codegen should prevent such collisions from happening:

The default tooling behavior should result in collision-free code.
There could be additional policies added to semconv to prevent certain renames.

Please comment/vote on specific options listed in the comments.

Note: there could be edge cases when ambiguity is fine and tolerable or another collision resolution approach could be used. Here we want to pick a default behavior - it does not prevent someone from implementing a different approach.


lmolkova · 2024-06-03T17:35:18Z

Option 1: `foo.bar_baz` is generated as `FOO_BAR__BAZ`, `FooBar_Baz`, ...

any renames are allowed (for experimental conventions)
_ is preserved in constant names

Cons:

__ is ugly in snake_case and _ is ugly in PascalCase

lmolkova · 2024-06-03T17:38:29Z

Option 1.5: `foo.bar_baz` is generated as `FOO__BAR_BAZ`, `Foo_BarBaz`, ...

(same as Option 1 but a different format is used)
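
A rough Python sketch of how Options 1 and 1.5 differ. The function names are made up for illustration; the only inputs taken from the proposals are the expected outputs for `foo.bar_baz`.

```python
def option1_snake(attr: str) -> str:
    # Option 1: '_' is preserved by doubling it, '.' becomes a single '_'.
    return attr.replace("_", "__").replace(".", "_").upper()


def option1_pascal(attr: str) -> str:
    # Option 1: '.' is a plain word boundary, '_' stays visible between words.
    return "".join(
        "_".join(word.capitalize() for word in segment.split("_"))
        for segment in attr.split(".")
    )


def option1_5_snake(attr: str) -> str:
    # Option 1.5: '.' becomes '__', '_' stays a single '_'.
    return attr.replace(".", "__").upper()


def option1_5_pascal(attr: str) -> str:
    # Option 1.5: '.' stays visible as '_', '_' is a plain word boundary.
    return "_".join(
        "".join(word.capitalize() for word in segment.split("_"))
        for segment in attr.split(".")
    )


for attr in ("foo.bar_baz", "foo.bar.baz"):
    print(attr, option1_snake(attr), option1_pascal(attr),
          option1_5_snake(attr), option1_5_pascal(attr))
```

Note that under Option 1.5 every attribute containing a `.` gets a different constant name than today, while under Option 1 only attributes containing `_` change.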

lmolkova · 2024-06-03T17:42:43Z

Option 2: `foo.bar_baz` is generated as `FOO_BARBAZ`, `FooBarbaz`, ...

renames that only remove or add _ are not allowed even for experimental attributes.
_ is removed during code generation

Cons:

limited rename options. Partially mitigated by the fact that we didn't recall making such renames in the past.
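
A minimal Python sketch of Option 2, assuming `_` is simply stripped before the usual casing rules apply. The helper names, including the policy check `is_underscore_only_rename`, are hypothetical and only illustrate the two bullet points above.

```python
def option2_snake(attr: str) -> str:
    # '_' is removed, then '.' becomes '_'.
    return attr.replace("_", "").replace(".", "_").upper()


def option2_pascal(attr: str) -> str:
    # '_' is removed inside each '.'-separated segment before capitalizing.
    return "".join(seg.replace("_", "").capitalize() for seg in attr.split("."))


def is_underscore_only_rename(old: str, new: str) -> bool:
    # Policy sketch: a rename that only adds or removes '_' would still collide.
    return old != new and old.replace("_", "") == new.replace("_", "")


print(option2_snake("foo.bar_baz"), option2_pascal("foo.bar_baz"))
print(option2_snake("foo.bar.baz"), option2_pascal("foo.bar.baz"))
print(is_underscore_only_rename("foo.bar_baz", "foo.barbaz"))
```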

lmolkova · 2024-06-03T17:47:01Z

Option 3: do nothing

any renames are allowed (for experimental conventions)
_ and . are not distinguishable in constant names

Cons:

possible attribute name and schema version mismatch (instrumentation thinks it sets foo.bar_baz, but it's foo.bar.baz now).

lmolkova · 2024-06-03T17:52:28Z

Option 4: `foo.bar_baz` is generated as `FOO_BAR_BAZ`, `FooBarBaz`. When `foo.bar.baz` is added, the collision is detected, so the new attribute is called `FOO_BAR_BAZ_NEW`, `FooBarBazNew`.

any renames are allowed (for experimental conventions)
resolve collisions by giving new names

Cons:

new attribute would have a 'bad' name and deprecated attribute would have a 'good' name forever.

The rename can be done for the old attribute, resulting in Option 3.
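
A small Python sketch of Option 4, assuming the generator sees attributes in the order they were introduced and appends `_NEW` on a collision. The function `assign_names` is hypothetical.

```python
def assign_names(attrs_in_order):
    """Map each attribute key to a constant name, suffixing later collisions."""
    taken = set()
    result = {}
    for attr in attrs_in_order:
        name = attr.replace(".", "_").upper()
        # A later attribute that collides gets '_NEW' appended until unique.
        while name in taken:
            name += "_NEW"
        taken.add(name)
        result[attr] = name
    return result


print(assign_names(["foo.bar_baz", "foo.bar.baz"]))
print(assign_names(["foo.bar.baz", "foo.bar_baz"]))
```

The output depends on the order of introduction, so the generator needs to know which attribute came first.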

marcalff · 2024-06-03T19:36:06Z

When 2 different symbols in semantic conventions generate the same symbol in the generated code, that is, when there is a collision, one of the name mappings has to change, and this change is by definition a breaking change: instrumented code that refers to a given name has to be adjusted to use the adjusted name instead.

To minimize the overall impact, it is preferable that only the least frequent symbols get broken.

Any solution that changes the output for foo.bar to be different, which is the most common case, should be discarded.

So, solution 1.5 is definitively out in my opinion, it will break everything.

Assuming semantic conventions containing the `_` character are relatively less represented (in all the semconv that exist), the change should be in how `_` is represented, to minimize the overall impact, without touching `.`.

~~Option 2 is out, it collides on foo.bar_baz and foo.barbaz.~~

[Edit] Looks like I misunderstood option 2. Sounds viable if these collisions are forbidden, and it avoids causing renames with the current generated code. +1 then

Option 4 does not scale in my opinion, after several renames, and it assumes there is some "state" stored somewhere to remember if a collision ever existed in the past or not. Does not sound viable.

Option 3 does not resolve the issue; this is bound to be repeated. For new semantic conventions, a desirable goal is to be able to pick the most natural name for a given semconv without having to look at possible collisions that would prevent using a good name ... especially when the colliding semconv is deprecated and will eventually be removed over time.

The only viable solution looks like option 1 or option 2.

[EDIT 2024-06-11]

See the newer proposal, option 5, in this thread, which is better than options 1 and 2 in my opinion.

codeboten · 2024-06-04T19:03:24Z

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

I'm specifically trying to understand what this looks like in the context of the message client id change, does option 2 mean that the client ID change will be rolled back?

austinlparker · 2024-06-04T19:09:31Z

Option 2: foo.bar_baz is generated as FOO_BARBAZ, FooBarbaz, ...

renames that only remove or add _ are not allowed even for experimental attributes.

_ is removed during code generation

Cons:

limited rename options. Partially mitigated by the fact that we didn't recall making such renames in the past.

just to check, does that mean that messaging.client_id -> messaging.client.id gets rolled back and there's no change?

lmolkova · 2024-06-04T23:02:02Z

According to the Option2, there will be no collision for messaging.client_id -> messaging.client.id:

messaging.client_id would be MESSAGING_CLIENTID or MessagingClientid
messaging.client.id would be MESSAGING_CLIENT_ID or MessagingClientId

So I don't think we need to roll it back.

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

That's correct. Any option we pick (except opt3) will result in breaking changes in generated code. In most languages this is tolerable since semconv artifact is experimental.
So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway.
It will be a problem for PHP.

joaopgrassi · 2024-06-05T08:24:19Z

Another point in this specific case: the new attribute messaging.client.id needs to produce a breaking change. We do not want the old const name to carry the new attribute key value. That would cause instrumentations to send data in a mixed "schema" where some attributes are old and some use the new keys.

To me, option 2 is the one that makes the most sense.

marcalff · 2024-06-05T08:36:33Z

Another point that in this specific case: The new attribute messaging.client.id needs to produce a breaking change.

It will not.

Existing code using MESSAGING_CLIENT_ID or MessagingClientId, corresponding to messaging.client_id, will need to be fixed to use MESSAGING_CLIENTID or MessagingClientid instead, once code generation is fixed.

joaopgrassi · 2024-06-05T08:39:59Z

Ah yes, that for sure. I think I didn't do a good job explaining, but what I was after is: whatever we do, the old const name can't carry the new attribute name. Because as you said, that wouldn't break, and be even worse because it would start sending the new attribute name.

marcalff · 2024-06-05T08:41:18Z

See my previous comment:

When 2 different symbols in semantic conventions generate the same symbol in the generated code, that is, when there is a collision, one of the name mappings has to change, and this change is by definition a breaking change: instrumented code that refers to a given name has to be adjusted to use the adjusted name instead.

I don't think there is any way around this.

Failure to update the name will use the new semantic convention, so using MESSAGING_CLIENT_ID or MessagingClientId will mean messaging.client.id while it meant messaging.client_id before.
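
To illustrate the silent change, here is a toy Python sketch. The two dictionaries stand in for generated constants from two hypothetical releases, and `instrument` stands in for instrumentation code that was never touched between them.

```python
# Hypothetical generated constants before and after the codegen change.
SEMCONV_BEFORE = {"MESSAGING_CLIENT_ID": "messaging.client_id"}
SEMCONV_AFTER = {
    "MESSAGING_CLIENTID": "messaging.client_id",
    "MESSAGING_CLIENT_ID": "messaging.client.id",
}


def instrument(semconv: dict) -> dict:
    # The instrumentation only knows the constant name, not the key behind it.
    return {semconv["MESSAGING_CLIENT_ID"]: "client-42"}


print(instrument(SEMCONV_BEFORE))
print(instrument(SEMCONV_AFTER))
```

The same source line emits a different attribute key after the upgrade, without any compile-time signal.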

joaopgrassi · 2024-06-05T08:48:33Z

You're right, seems I also misunderstood option 2. I somehow thought the new one would be changed and not be used automatically, I see it now.

I think this is rather bad :(. It will be hard to not overlook something and end up with this wrong situation of old instrumentation producing/using the new attributes while not fully yet converted to the stable conventions.

The only way to not have this, would be to break both old and new generated consts. But breaking the mapping of . to _ in code gen is also bad.

dyladan · 2024-06-05T20:30:22Z

So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway.

In JS we're just marking everything currently exported by the package as deprecated and keeping it until we go to 2.0. We're exporting all the new names with a new style anyway to be friendlier to tree shakers and code minifiers (important in JS) so any breaking name changes due to a change in code generation are fine. If it happens again we'll just rev the major version as this package is entirely separate from the rest of the JS client.

In short: option 2 is fine for JS and is my preference

dyladan · 2024-06-06T14:22:57Z

Here's what the option 2 codegen looks like in JS open-telemetry/opentelemetry-js@dbb8328

note: we aren't merging any changes until this issue is resolved, this is just to prototype and make sure we're ready to move quickly when a final decision is made.

lmolkova · 2024-06-06T18:39:33Z

There is overwhelming support for option 2:

_ should not be rendered when generating constants (fields, class, method, file, etc) names
semconv will have a policy to prevent renames that only add or remove underscores.

We're working on the tooling updates and have a couple of draft PRs which show what things would look like:

Java: Semconv codegen should produce different constant names if attribute is renamed `_` -> `.` semantic-conventions-java#75
Python: Semconv codegen should produce different constant names if attribute is renamed `_` -> `.` opentelemetry-python#3927

Before we ship tooling update, we'd like to get any last-minute feedback from everyone interested, specifically @lzchen, @jack-berg, @trask on the above PRs.

jack-berg · 2024-06-06T18:41:33Z

Discussed at today's Java SIG and option 2 was the preference. Thanks!

lmolkova · 2024-06-07T03:13:46Z

Option 2 does not look good in real life:

DB_CASSANDRA_CONSISTENCYLEVEL
DbCassandraConsistencylevelValues
AWSELASTICBEANSTALK

open-telemetry/semantic-conventions-java#75

AWS__ELASTIC__BEAN__STALK, etc would be more readable.

I'd like to go back and explore Option 1 (__) or Option 3 (do nothing - _ and . are the same in the code).

Option 3+ may look like:

non-deprecated attribute gets the constant name, value is updated. The deprecated one is not generated - this could be tolerable for experimental conventions
semconv policy would prevent 2 non-deprecated attributes to have the same const name.
we won't allow to rename stable attributes (without major version bump) anyway

We can also prohibit . <-> _ renames for all attributes including experimental.

lmolkova · 2024-06-09T18:38:24Z

Here's more details on the Option 3.5:

allow collisions between deprecated and non-deprecated attributes, ignoring deprecated.
don't allow collisions between non-deprecated attributes.

Pros:

the code is not ugly (AWS_ELASTIC_BEAN_STALK vs AWS__ELASTIC__BEAN__STALK or AWSELASTICBEANSTALK)
we generate the same code, no breaking changes
we have a policy to prevent collisions for non-deprecated attributes
we can still clean up experimental attributes if we want to rename them
it won't affect stable attributes - we should not rename them

Cons:

possible attribute name and schema version mismatch (instrumentation thinks it sets foo.bar_baz, but it's foo.bar.baz now).

Prototypes:

Java example: Collisions: manual conflict resolution semantic-conventions-java#76
Build-tools policy + codegen impl: Attribute naming collisions and resolution build-tools#324
Weaver codegen example is on the way
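
A minimal Python sketch of the Option 3.5 rules, assuming each attribute carries a `deprecated` flag. The function names are hypothetical, and the handling of a collision among only deprecated attributes is not covered by the proposal, so this sketch treats it as an error.

```python
from collections import defaultdict


def const_name(attr: str) -> str:
    # Current default behavior: '.' and '_' are indistinguishable.
    return attr.replace(".", "_").upper()


def generate(attrs):
    """attrs: iterable of (name, deprecated) pairs. Returns {constant: key}."""
    groups = defaultdict(list)
    for name, deprecated in attrs:
        groups[const_name(name)].append((name, deprecated))

    constants = {}
    for const, members in groups.items():
        if len(members) == 1:
            constants[const] = members[0][0]
            continue
        active = [name for name, deprecated in members if not deprecated]
        if len(active) != 1:
            # Two non-deprecated attributes collide (policy violation), or
            # only deprecated ones collide (unspecified; assumed an error here).
            raise ValueError(f"unresolvable collision on {const}: {members}")
        # The non-deprecated attribute wins; deprecated ones are not generated.
        constants[const] = active[0]
    return constants


print(generate([("messaging.client_id", True), ("messaging.client.id", False)]))
```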

dyladan · 2024-06-10T15:06:39Z

I think option 3.5 with prohibiting . <-> _ for all including experimental would be my preferred solution as it generates the nicest code and retains compatibility with the current generation. I think it is important to include that restriction in order to avoid telemetry shape changing without user intervention and the mentioned name and schema version mismatch.

The biggest question IMO would be then what to do about the collisions that have already happened. We can probably just accept the renames that have already happened as an accident of history.

trisch-me · 2024-06-10T15:34:36Z

I feel that prohibiting . <-> _ will limit our options for extracting a namespace out of existing attributes in the future.

consider x.user_id and x.user_name which should become x.user.id and x.user.name
This is especially relevant for embedded feature where we could embed existing namespaces/fields under other namespaces

marcalff · 2024-06-10T15:52:33Z

Here's more details on the Option 3.5:

* allow collisions between deprecated and non-deprecated attributes, ignoring deprecated.

I don't understand how this would work, what is the code generation supposed to do then:

do not generate code at all for deprecated semconv, because of collisions ? What is the point of "deprecated" then ?
keep some deprecated semconv but not others (when a collision exists), which raises the questions:
- how to know if a deprecated semconv has a collision or not ?
- New property in the semconv metadata ?
- Support in build-tools for this new property ?

To expand on this, if the plan is to allow deprecated and non-deprecated semconv to coexist when there is a collision, "ignoring the deprecated", how about removing (instead of deprecating) the old semconv ... removal will surely resolve the collision problem.

trask · 2024-06-10T18:50:40Z

I'll take this (back) to the Java SIG on Thursday.

I suspect that we may prefer this trade off:

keep the nice readable constant names
allow breaking changes in our experimental semconv artifact when there is a conflicting rename (e.g. foo.bar_baz → foo.bar.baz)

note that Java already recommends that libraries make copies of experimental attributes in order to avoid the diamond dependency problem:

Generated code for experimental semantic conventions.
NOTE: This artifact has the -alpha and comes with no compatibility guarantees. Libraries can use this for testing, but should make copies of the attributes to avoid possible runtime errors from version conflicts.

see open-telemetry/semantic-conventions-java#50 (comment) for a bit more background on this recommendation:

opentelemetry-semconv-incubating contains stable and incubating semantic conventions

Classes live in io.opentelemetry.semconv.incubating package. Note this is different than stable artifact to make java module system work.

Stable attributes will be annotated @deprecated with javadoc pointing to the equivalent location in the stable artifact

The semantic-conventions repo will keep incubating attributes around indefinitely (or for a long time) and mark as deprecated. These deprecated, experimental attributes will be annotated @deprecated.

This artifact will always be marked alpha, since we may eventually remove some deprecated experimental attributes, and since the types of experimental attributes are subject to change.

We will discourage library authors from using this artifact. They can use it for making assertions in tests, but should make copies of any experimental attributes they depend on in instrumentation.

lmolkova · 2024-06-10T19:01:56Z

Discussed at Semconv and Maintainers calls:

there is a trade-off between readable/idiomatic code and collisions
the collisions affect experimental attributes only. we'll not rename stable attributes
languages/SIGs should be able to generate (or not generate) both constants (i.e., regarding the earlier comment in this issue: removing an attribute from semconv is not an option)
we probably have other candidates for _ <-> . renames (I'll share an update), if we restrict such renames now, we'll never have a chance to fix things.

One of the possible solutions includes a grace period during which _ <-> . are allowed.

lmolkova · 2024-06-10T19:29:24Z

List of existing (non-deprecated) attributes with _

k8s.container.restart_count
k8s.container.status.last_terminated_reason
aws.request_id
aws.dynamodb.table_names
aws.dynamodb.consumed_capacity
aws.dynamodb.item_collection_metrics
aws.dynamodb.provisioned_read_capacity
aws.dynamodb.provisioned_write_capacity
aws.dynamodb.consistent_read
aws.dynamodb.attributes_to_get
aws.dynamodb.index_name
aws.dynamodb.global_secondary_indexes
aws.dynamodb.local_secondary_indexes
aws.dynamodb.exclusive_start_table
aws.dynamodb.table_count
aws.dynamodb.scan_forward
aws.dynamodb.total_segments
aws.dynamodb.scanned_count
aws.dynamodb.attribute_definitions
aws.dynamodb.global_secondary_index_updates
aws.lambda.invoked_arn
aws.s3.copy_source
aws.s3.upload_id
aws.s3.part_number
gen_ai.system
gen_ai.request.model
gen_ai.request.max_tokens
gen_ai.request.temperature
gen_ai.request.top_p
gen_ai.response.id
gen_ai.response.model
gen_ai.response.finish_reasons
gen_ai.usage.prompt_tokens
gen_ai.usage.completion_tokens
gen_ai.token.type
gen_ai.prompt
gen_ai.completion
gen_ai.operation.name
user.full_name
opentracing.ref_type
container.image.repo_digests
container.command_line
container.command_args
aspnetcore.rate_limiting.policy
aspnetcore.rate_limiting.result
aspnetcore.routing.is_fallback
aspnetcore.request.is_unhandled
aspnetcore.routing.match_status
log.file.name_resolved
log.file.path_resolved
android.os.api_level
http.request.method_original
http.request.resend_count
http.response.status_code
session.previous_id
url.registered_domain
url.top_level_domain
otel.status_code
otel.status_description
tls.client.certificate_chain
tls.client.not_after
tls.client.not_before
tls.client.server_name
tls.client.supported_ciphers
tls.next_protocol
tls.server.certificate_chain
tls.server.not_after
tls.server.not_before
db.cassandra.consistency_level
db.cassandra.page_size
db.cassandra.speculative_execution_count
db.cosmosdb.client_id
db.cosmosdb.connection_mode
db.cosmosdb.operation_type
db.cosmosdb.request_charge
db.cosmosdb.request_content_length
db.cosmosdb.status_code
db.cosmosdb.sub_status_code
db.elasticsearch.path_parts
user_agent.original
user_agent.name
user_agent.version
cloud.resource_id
cloud.availability_zone
feature_flag.key
feature_flag.provider_name
feature_flag.variant
faas.max_memory
faas.invoked_name
faas.invoked_provider
faas.invoked_region
faas.invocation_id
heroku.release.creation_timestamp
rpc.connect_rpc.error_code
rpc.connect_rpc.request.metadata
rpc.connect_rpc.response.metadata
rpc.grpc.status_code
rpc.jsonrpc.error_code
rpc.jsonrpc.error_message
rpc.jsonrpc.request_id
rpc.message.compressed_size
rpc.message.uncompressed_size
process.parent_pid
process.session_leader.pid
process.group_leader.pid
process.command_line
process.command_args
process.real_user.id
process.real_user.name
process.saved_user.id
process.saved_user.name
process.context_switch_type
process.paging.fault_type
cloudevents.event_id
cloudevents.event_source
cloudevents.event_spec_version
cloudevents.event_type
cloudevents.event_subject
gcp.cloud_run.job.execution
gcp.cloud_run.job.task_index
os.build_id
system.cpu.logical_number
messaging.batch.message_count
messaging.destination_publish.anonymous
messaging.destination_publish.name
messaging.message.conversation_id
messaging.rabbitmq.destination.routing_key
messaging.rabbitmq.message.delivery_tag
messaging.rocketmq.client_group
messaging.rocketmq.consumption_model
messaging.rocketmq.message.delay_time_level
messaging.rocketmq.message.delivery_timestamp
messaging.gcp_pubsub.message.ordering_key
messaging.gcp_pubsub.message.ack_id
messaging.gcp_pubsub.message.ack_deadline
messaging.gcp_pubsub.message.delivery_attempt
messaging.servicebus.message.delivery_count
messaging.servicebus.message.enqueued_time
messaging.servicebus.destination.subscription_name
messaging.servicebus.disposition_status
messaging.eventhubs.message.enqueued_time

Some examples where _ -> . could make sense:

messaging.gcp_pubsub.* -> messaging.gcp.pubsub (it could make sense to use {provider}.{service} format instead of {provider}_{service})
process.command_line, process.command_args -> process.command.* - in case there are multiple properties that describe the same thing. Similar:
- process.parent_pid -> process.parent.pid|name|...
- rpc.jsonrpc.request_id -> rpc.jsonrpc.request.id
- db.cosmosdb.request_charge, db.cosmosdb.request_content_length -> db.cosmosdb.request.*
- aws.dynamodb.table_names -> aws.dynamodb.table.*

...

I believe we don't always know whether more attributes about something (e.g. request) are expected and default to adding an attribute with _.

Some vague semconv guidance could be:

default to . whenever possible. E.g. a thing and its properties should usually be separated with . (table.name, request.id, operation.name) since there could be multiple.
only use _ when . does not make sense or would alter the meaning. e.g. cloud_run, rate_limiting, max_tokens, resend_count, availability_zone
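
One way to spot rename candidates like the ones above is to look for attributes in the same namespace that share the word before the first `_`. This is only a heuristic sketch in Python; `shared_prefix_candidates` is a made-up helper, and the result still needs human judgment (e.g. `cloud_run` should stay as is).

```python
from collections import defaultdict


def shared_prefix_candidates(attr_names):
    """Group attributes whose last segment shares the word before the first '_'."""
    groups = defaultdict(list)
    for name in attr_names:
        namespace, _, leaf = name.rpartition(".")
        if "_" not in leaf:
            continue
        head = leaf.split("_", 1)[0]
        prefix = f"{namespace}.{head}" if namespace else head
        groups[prefix].append(name)
    # Only prefixes shared by several attributes hint at a missing namespace.
    return {p: sorted(names) for p, names in groups.items() if len(names) > 1}


sample = [
    "process.command_line",
    "process.command_args",
    "process.parent_pid",
    "db.cosmosdb.request_charge",
    "db.cosmosdb.request_content_length",
]
print(shared_prefix_candidates(sample))
```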

marcalff · 2024-06-11T10:24:17Z

Changing code generation rules in general for _ will affect a lot of existing, non deprecated, non colliding semantic conventions, which is not desirable.

OPTION 5

How about:

Define a new property, generate_as, in build tools / weaver
In semantic conventions, manually resolve collisions by providing an alternate generate_as name for the generated symbol, while not changing the associated value

For example:

semconv messaging.client.id is untouched.
semconv messaging.client_id is modified with the property generate_as = messaging.client_id.deprecated (since, after all, it is deprecated. Could be named .old, .v1, or anything different)

Taking C++ as an example, code generation will look like:

  kMessagingClientId = "messaging.client.id"; // new semconv
  kMessagingClientIdDeprecated = "messaging.client_id"; // old semconv.

[EDIT 2024-06-11]

Alternate example, if we want to change the new name instead:

semconv messaging.client.id is modified with the property generate_as = messaging.client.id.2 (to avoid the collision)
semconv messaging.client_id is untouched.

Taking C++ as an example, code generation will look like:

  kMessagingClientId2 = "messaging.client.id"; // new semconv
  kMessagingClientId = "messaging.client_id"; // old semconv.

The major benefit I see is that existing semconv like:

k8s.container.restart_count
etc

are unchanged.

Only symbols that actually collide need special treatment, and only user code that depends on these symbols needs to change (to use kMessagingClientIdDeprecated if this is really what the instrumentation wants).

What generate_as provides, is to decouple symbol names from semantic conventions values, so resolving a conflict in values (due to differences only with _ and .) does not force to revise code generation rules names for everything, impacting many unrelated symbols (all the existing semconv that contains a _ today).
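
A rough Python sketch of how a generator could honor `generate_as`. The property is only a proposal in this thread, and the dictionary shape and helper names below are assumptions for illustration; the output mimics the C++ lines above.

```python
import re


def symbol_source(attr: dict) -> str:
    # Use the proposed 'generate_as' override for the symbol, if present.
    return attr.get("generate_as", attr["name"])


def cpp_constant(attr: dict) -> str:
    parts = re.split(r"[._]", symbol_source(attr))
    return "k" + "".join(part.capitalize() for part in parts)


def render(attr: dict) -> str:
    # The value always comes from 'name'; only the symbol is remapped.
    return f'{cpp_constant(attr)} = "{attr["name"]}";'


attrs = [
    {"name": "messaging.client.id"},
    {"name": "messaging.client_id",
     "generate_as": "messaging.client_id.deprecated"},
    {"name": "k8s.container.restart_count"},
]
for attr in attrs:
    print(render(attr))
```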

marcalff · 2024-06-11T10:41:38Z

@lmolkova Please see proposal for option 5 above.

lmolkova · 2024-06-11T22:00:47Z

@marcalff thanks!

It does not make sense to generate the deprecated attribute differently - nobody will update their code to point to a deprecated attribute - we don't want them to.

I.e. having generate_as would only make sense on the new attribute. But also we don't want to change how the new attribute is generated in the stable conventions.

But I really like the idea of manual resolution:

the collisions would be rare
linters would detect duplicates, I.e. it'd be quite hard to ship the artifact without first resolving collisions
the manual resolution would be one of the following:
- drop the deprecated attribute which has a collision with a new one - some languages could be fine with it.
- give a new name to the new attribute - I'd leave it up to each language to pick a pattern they like.
  - E.g. messaging.client.id would become incubating-semconv.MessagingAttributes.MESSAGING_CLIENT_ID2 = "messaging.client.id" and stable-semconv.MessagingAttributes.MESSAGING_CLIENT_ID = "messaging.client.id"
- someone can still do Option 5 and give a new name to the OLD attribute
  - E.g. messaging.client_id would become incubating-semconv.MessagingAttributes.MESSAGING_CLIENT_ID_DEPRECATED = "messaging.client_id"

More context on this here: open-telemetry/semantic-conventions-java#75 (comment)

and here's the prototype: open-telemetry/semantic-conventions-java#76 - effectively there is a way to remap constant name and/or remove attribute.

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

marcalff · 2024-06-13T09:42:42Z

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

I don't understand.

I was hoping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Now it looks like each SIG, in their code generation scripts, has to add code for special cases.

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

In this case, opentelemetry-cpp already complies: it drops the deprecated messaging.client_id.

lmolkova · 2024-06-14T02:06:38Z

I was hoping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Assuming there is an alternative constant name in the schema, would you drop the attribute or would you generate both attributes (with different names) in C++?

The prescriptive solution coming from semconv would be:

messaging.client_id (old) should be marked somehow so it can be dropped. E.g. collides_with: messaging.client.id
messaging.client.id (new) would be marked with an alternative constant name. E.g. unique_const_name: messaging_client_id2.

Jinja scripts would need to be changed to either drop based on collides_with or use the constant name from unique_const_name (converting it to the proper case).

Now it looks like each SIG, in their code generation scripts, has to add code for special cases.

The templates need to be changed now to support dropping/renaming. If a collision happens again, the change would be to add another attribute name to the list of excluded attributes (or to a manually maintained map). E.g. this change is java open-telemetry/semantic-conventions-java#76 adds all you need to either drop or rename constants by making a small config change.
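
As a sketch of what such a config-driven resolution could look like (in Python rather than the actual Jinja templates; `generate_constants` and its parameters are hypothetical): the generator takes an exclusion set and a manually maintained rename map, and fails loudly if a collision is left unresolved.

```python
def default_const(attr: str) -> str:
    return attr.replace(".", "_").upper()


def generate_constants(attr_names, excluded=frozenset(), renamed=None):
    """Return {constant: attribute key}, applying manual drops and renames."""
    renamed = renamed or {}
    constants = {}
    for name in attr_names:
        if name in excluded:
            continue  # manual resolution: drop this attribute
        const = renamed.get(name, default_const(name))
        if const in constants:
            raise ValueError(
                f"{name} and {constants[const]} both generate {const}; "
                "add one of them to the exclusion list or the rename map"
            )
        constants[const] = name
    return constants


names = ["messaging.client_id", "messaging.client.id"]

# Resolution A: drop the deprecated attribute.
print(generate_constants(names, excluded={"messaging.client_id"}))
# Resolution B: keep both, giving the new attribute a different constant.
print(generate_constants(names, renamed={"messaging.client.id": "MESSAGING_CLIENT_ID2"}))
```

When the next collision shows up, only the exclusion set or the rename map needs a new entry.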

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

I think it's a path forward and not a conclusion.
We need to add checks to semconv to know when they happen before we release. We should probably block such renames until we have a reasonable conclusion.

Manual resolution seems like an easy and cheap thing to do to solve this and similar collisions in the same way. We can automate it, but it's still not clear to me what are all the options we need to provide.

Note:
We are migrating tooling from build-tools to weaver. The code generation migration will involve some Jinja updates, so this would be a good time to add the 'final' solution.

TL;DR:

semconv should detect codegen collisions when they are added
I want to see how SIGs handle the current collision to understand which options we need to provide
while we're discussing this, we should block renames that cause such collisions

lmolkova · 2024-06-17T16:07:41Z

Discussed at Semconv WG 6/17

we need to detect codegen collisions when they are added to semconv
we should put a moratorium on renames that'd result in collision until we have tooling/checks in place
we need to update attribute naming guidance and suggest to use . when it makes sense
we'll probably make different choices for different attributes when collision happens
- in case of messaging.client_id dropping the deprecated/experimental/slightly-used attribute is reasonable
- we could decide differently for process.command_line
- semconv should be the source of guidance on dropping/giving new constant names
- we need tooling for it

So:

phase 1:
- add collision checks to semconv tooling
- put moratorium on renames resulting in collisions
phase 2: Implement code-generation hints to drop/rename attributes in case of a collision #1462
- tooling work to support dropping/remapping based on the schema
phase 3:
- lift moratorium and allow colliding renames
- rename things
- ...
phase 4:
- eventually prohibit renames that result in collisions
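
For phase 1, the collision check itself can be quite small. Below is a minimal Python sketch, assuming the check normalizes names the way the default generators do; it is not the actual semconv policy, and `find_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def const_name(attr: str) -> str:
    # Assumed normalization: '.' and '_' are indistinguishable in constants.
    return attr.replace(".", "_").upper()


def find_collisions(attr_names):
    """Return {constant: [attribute keys]} for constants claimed more than once."""
    by_const = defaultdict(set)
    for name in attr_names:
        by_const[const_name(name)].add(name)
    return {c: sorted(names) for c, names in by_const.items() if len(names) > 1}


registry = [
    "messaging.client_id",
    "messaging.client.id",
    "k8s.container.restart_count",
]
print(find_collisions(registry))
```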

dyladan · 2024-07-10T02:47:33Z

Has there been any resolution on this?

lmolkova · 2024-07-10T14:22:51Z

Yes, we're adding collision checks - #1209 - necessary tooling changes just landed.

github-actions bot assigned AlexanderWert Jun 3, 2024

lmolkova assigned lmolkova and unassigned AlexanderWert Jun 3, 2024

lmolkova mentioned this issue Jun 3, 2024

messaging.client_id -> messaging.client.id rename causes issues with code generation #1031

Closed

trentm mentioned this issue Jun 5, 2024

Semantic Conventions Update Options open-telemetry/opentelemetry-js#4771

Closed

trask mentioned this issue Jun 6, 2024

Update to semantic-conventions 1.26.0 open-telemetry/semantic-conventions-java#73

Merged

lmolkova mentioned this issue Jun 6, 2024

Semconv codegen should produce different constant names if attribute is renamed `_` -> `.` open-telemetry/opentelemetry-python#3927

Closed

8 tasks

lquerel mentioned this issue Jun 6, 2024

feat(forge): Add semconv_const filter to support semantic convention namespacing rules. open-telemetry/weaver#200

Merged

lmolkova mentioned this issue Jun 6, 2024

Semconv codegen should produce different constant names if attribute is renamed `_` -> `.` open-telemetry/semantic-conventions-java#75

Closed

lmolkova mentioned this issue Jun 9, 2024

Attribute naming collisions and resolution open-telemetry/build-tools#324

Closed

lmolkova mentioned this issue Jun 10, 2024

Add pre-massaged data into jinja/jq context when generating registry open-telemetry/weaver#204

Closed

joaopgrassi mentioned this issue Jun 13, 2024

Add additional LLM span attributes #1059

Merged

3 tasks

ChrsMark mentioned this issue Jun 25, 2024

[receiver/kubeletstats] Add k8s.{container,pod}.memory.node.utilization metrics open-telemetry/opentelemetry-collector-contrib#33591

Merged

trask mentioned this issue Jun 28, 2024

[profiles] introduce semantic convention #1188

Merged

3 tasks

This was referenced Jul 11, 2024

Attribute names: unicode on OTLP, only [a-z0-9._] in OTel semcov #1124

Closed

Rename cloud.resource_id to cloud.resource.id (blocked - we don't allow such renames for now) #1256

Open

lmolkova mentioned this issue Jul 24, 2024

Add attribute name and const names checks #1209

Merged

trentm mentioned this issue Aug 2, 2024

Use weaver to generate latest semconv 1.27 open-telemetry/opentelemetry-js#4690

Merged

trask mentioned this issue Aug 22, 2024

Added rules for capturing Apache Camel metrics exposed by JMX MBeans open-telemetry/opentelemetry-java-instrumentation#11901

Merged

lmolkova mentioned this issue Oct 9, 2024

Implement code-generation hints to drop/rename attributes in case of a collision #1462

Open
