Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generation: how to avoid naming collisions #1118

Open
lmolkova opened this issue Jun 3, 2024 · 34 comments
Open

Code generation: how to avoid naming collisions #1118

lmolkova opened this issue Jun 3, 2024 · 34 comments
Assignees

Comments

@lmolkova
Copy link
Contributor

lmolkova commented Jun 3, 2024

As we discovered in #1031, certain semantic conventions changes (together with default code generator behavior) result in ambiguous constant names generated in the code.

For example, when foo.bar_baz is renamed to foo.bar.baz, the code generator produces the same constant name for both -FOO_BAR_BAR , FooBarBaz, etc depending on the preferred casing.

We don't remove attributes/enum members/metrics from the semantic conventions anymore (old property is deprecated), so both constants would exist in the same version.

So, the semantic conventions together with the codegen should prevent such collisions from happening:

  • The default tooling behavior should result in collision-free code.
  • There could be additional policies added to semconv to prevent certain renames.

Please comment/vote on specific options listed in the comments.

Note: there could be edge cases when ambiguity is fine and tolerable or another collision resolution approach could be used. Here we want to pick a default behavior - it does not prevent someone from implementing a different approach.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 3, 2024

Option 1: foo.bar_baz is generated as FOO_BAR__BAZ, FooBar_Baz, ...

  • any renames are allowed (for experimental conventions)
  • _ is preserved in constant names

Cons:

  • __ is ugly in snake_case and _ is ugly in PascalCase

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 3, 2024

Option 1.5: foo.bar_baz is generated as FOO__BAR_BAZ, Foo_BarBaz, ...

(same as Option 1 but a different format is used)

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 3, 2024

Option 2: foo.bar_baz is generated as FOO_BARBAZ, FooBarbaz, ...

  • renames that only remove or add _ are not allowed even for experimental attributes.
  • _ is removed during code generation

Cons:

  • limited rename options. Partially mitigated by the fact that we didn't recall making such renames in the past.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 3, 2024

Option 3: do nothing

  • any renames are allowed (for experimental conventions)
  • _ and . are not distinguishable in constant names

Cons:

  • possible attribute name and schema version mismatch (instrumentation thinks it sets foo.bar_baz, but it's foo.bar.baz now).

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 3, 2024

Option 4: foo.bar_baz is generated as FOO_BAR_BAZ, FooBarBaz. When foo.bar.baz is added, the collision is detected, so the new attribute is called FOO_BAR_BAZ_NEW, FooBarBazNew.

  • any renames are allowed (for experimental conventions)
  • resolve collisions by giving new names

Cons:

  • new attribute would have a 'bad' name and deprecated attribute would have a 'good' name forever.

The rename can be done for the old attribute, resulting in Option 3.

@marcalff
Copy link
Member

marcalff commented Jun 3, 2024

When 2 different symbols in semantic conventions generate the same symbol on the generated code,
that is, when there is a collision, one of the name mapping has to change, and this change is by definition a breaking change: code instrumented that refers to a given name has to be adjusted to use the adjusted name instead.

To minimize the overall impact, it is preferable that only symbols that are the least frequent gets to be broken.

Any solution that changes the output for foo.bar to be different, which is the most common case, should be discarded.

So, solution 1.5 is definitively out in my opinion, it will break everything.

Assuming semantic conventions containing the _ character are relatively less represented (in all the semconv that exists), the change should be on how _ is represented, to minimize the overall impact, without touching ..

Option 2 is out, it collides on foo.bar_baz and foo.barbaz.

[Edit] Looks like I misunderstood option 2. Sounds viable if these collisions are forbidden, and it avoids causing renames with the current generated code. +1 then

Option 4 does not scale in my opinion, after several renames, and it assumes there is some "state" stored somewhere to remember if a collision ever existed in the past or not. Does not sound viable.

Option 3 does not resolve the issue, this is bound to be repeated. For new semantic conventions, a desirable goal is to be able to decide the most natural name for a given semconv, without having to look at possible collisions, that would prevent to use a good name ... especially when the colliding semconv is deprecated and will eventually be removed over time.

The only viable solution looks like option 1 or option 2.

[EDIT 2024-06-11]

See newer proposal as option 5 in this thread, which is better to 1 and 2 in my opinion.

@codeboten
Copy link

codeboten commented Jun 4, 2024

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

I'm specifically trying to understand what this looks like in the context of the message client id change, does option 2 mean that the client ID change will be rolled back?

@austinlparker
Copy link
Member

Option 2: foo.bar_baz is generated as FOO_BARBAZ, FooBarbaz, ...

  • renames that only remove or add _ are not allowed even for experimental attributes.
  • _ is removed during code generation

Cons:

  • limited rename options. Partially mitigated by the fact that we didn't recall making such renames in the past.

just to check, does that mean that messaging.client_id -> messaging.client.id gets rolled back and there's no change?

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 4, 2024

According to the Option2, there will be no collision for messaging.client_id -> messaging.client.id:

  • messaging.client_id would be MESSAGING_CLIENTID or MessagingClientid
  • messaging.client.id would be MESSAGING_CLIENT_ID or MessagingClientId

So I don't think we need to roll it back.

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

That's correct. Any option we pick (except opt3) will result in breaking changes in generated code. In most languages this is tolerable since semconv artifact is experimental.
So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway.
It will be a problem for PHP.

@joaopgrassi
Copy link
Member

Another point that in this specific case: The new attribute messaging.client.id needs to produce a breaking change. We do not want to have the old const name carry the new attribute key value. That would cause instrumentations to send data in a mixed "schema" where some attributes are old and some are using the new keys.

To me, option 2 is the one that makes the most sense.

@marcalff
Copy link
Member

marcalff commented Jun 5, 2024

Another point that in this specific case: The new attribute messaging.client.id needs to produce a breaking change.

It will not.

Existing code using MESSAGING_CLIENT_ID or MessagingClientId, corresponding to messaging.client_id, will need to be fixed to use MESSAGING_CLIENTID or MessagingClientid instead, once code generation is fixed.

@joaopgrassi
Copy link
Member

Ah yes, that for sure. I think I didn't do a good job explaining, but what I was after is: whatever we do, the old const name can't carry the new attribute name. Because as you said, that wouldn't break, and be even worse because it would start sending the new attribute name.

@marcalff
Copy link
Member

marcalff commented Jun 5, 2024

See my previous comment:

When 2 different symbols in semantic conventions generate the same symbol on the generated code,
that is, when there is a collision, one of the name mapping has to change, and this change is by definition a breaking change: code instrumented that refers to a given name has to be adjusted to use the adjusted name instead.

I don't think there is any way around this.

Failure to update the name will use the new semantic convention, so using MESSAGING_CLIENT_ID or MessagingClientId will mean messaging.client.id while it meant messaging.client_id before.

@joaopgrassi
Copy link
Member

You're right, seems I also misunderstood option 2. I somehow thought the new one would be changed and not be used automatically, I see it now.

I think this is rather bad :(. It will be hard to not overlook something and end up with this wrong situation of old instrumentation producing/using the new attributes while not fully yet converted to the stable conventions.

The only way to not have this, would be to break both old and new generated consts. But breaking the mapping of . to _ in code gen is also bad.

@dyladan
Copy link
Member

dyladan commented Jun 5, 2024

So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway.

In JS we're just marking everything currently exported by the package as deprecated and keeping it until we go to 2.0. We're exporting all the new names with a new style anyway to be friendlier to tree shakers and code minifiers (important in JS) so any breaking name changes due to a change in code generation are fine. If it happens again we'll just rev the major version as this package is entirely separate from the rest of the JS client.

In short: option 2 is fine for JS and is my preference

@dyladan
Copy link
Member

dyladan commented Jun 6, 2024

Here's what the option 2 codegen looks like in JS open-telemetry/opentelemetry-js@dbb8328

note: we aren't merging any changes until this issue is resolved, this is just to prototype and make sure we're ready to move quickly when a final decision is made.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 6, 2024

There is an overwhelming support for option 2:

  • _ should not be rendered when generating constants (fields, class, method, file, etc) names
  • semconv will have a policy to prevent renames that only add or remove underscores.

We're working on the tooling updates and have a couple of draft PRs which show how things would look like:

Before we ship tooling update, we'd like to get any last-minute feedback from everyone interested, specifically @lzchen, @jack-berg, @trask on the above PRs.

@jack-berg
Copy link
Member

Discussed at today's Java SIG and option 2 was the preference. Thanks!

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 7, 2024

Option 2 does not look good in the real life:

  • DB_CASSANDRA_CONSISTENCYLEVEL
  • DbCassandraConsistencylevelValues
  • AWSELASTICBEANSTALK

open-telemetry/semantic-conventions-java#75

AWS__ELASTIC__BEAN__STALK, etc would be more readable.

I'd like to go back and explore Option 1 (__) or Option 3 (do nothing - _ and . are the same in the code).

Option 3+ may look like:

  • non-deprecated attribute gets the constant name, value is updated. The deprecated one is not generated - this could be tolerable for experimental conventions
  • semconv policy would prevent 2 non-deprecated attributes to have the same const name.
  • we won't allow to rename stable attributes (without major version bump) anyway

We can also prohibit . <-> _ renames for all attributes including experimental.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 9, 2024

Here's more details on the Option 3.5:

  • allow collisions between deprecated and non-deprecated attributes, ignoring deprecated.
  • don't allow collisions between non-deprecated attributes.

Pros:

  • the code is not ugly (AWS_ELASTIC_BEAN_STALK vs AWS__ELASTIC__BEAN__STALK or AWSELASTICBEANSTALK)
  • we generate the same code, no breaking changes
  • we have a policy to prevent collisions for non-deprecated attributes
  • we can still clean up experimental attributes if we want to rename them
  • it won't affect stable attributes - we should not rename them

Cons:

  • possible attribute name and schema version mismatch (instrumentation thinks it sets foo.bar_baz, but it's foo.bar.baz now).

Prototypes:

@dyladan
Copy link
Member

dyladan commented Jun 10, 2024

I think option 3.5 with prohibiting . <-> _ for all including experimental would be my preferred solution as it generates the nicest code and retains compatibility with the current generation. I think it is important to include that restriction in order to avoid telemetry shape changing without user intervention and the mentioned name and schema version mismatch.

The biggest question IMO would be then what to do about the collisions that have already happened. We can probably just accept the renames that have already happened as an accident of history.

@trisch-me
Copy link
Contributor

trisch-me commented Jun 10, 2024

i'm feeling prohibiting . <-> _ will limit our option to extract a namespace out of existing attributes in the future.

consider x.user_id and x.user_name which should become x.user.id and x.user.name
This is especially relevant for embedded feature where we could embed existing namespaces/fields under other namespaces

@marcalff
Copy link
Member

marcalff commented Jun 10, 2024

Here's more details on the Option 3.5:

* allow collisions between deprecated and non-deprecated attributes, ignoring deprecated.

I don't understand how this would work, what is the code generation supposed to do then:

  • do not generate code at all for deprecated semconv, because of collisions ? What is the point of "deprecated" then ?
  • keep some deprecated semconv but not others (when a collision exists), which raises the questions:
    • how to know if a deprecated semconv has a collision or not ?
    • New property in the semconv metadata ?
    • Support in build-tools for this new property ?

To expand on this, if the plan is to allow deprecated and non-deprecated semconv to coexist when there is a collision, "ignoring the deprecated", how about removing (instead of deprecating) the old semconv ... removal will surely resolve the collision problem.

@trask
Copy link
Member

trask commented Jun 10, 2024

I'll take this (back) to the Java SIG on Thursday.

I suspect that we may prefer this trade off:

  • keep the nice readable constant names
  • allow breaking changes in our experimental semconv artifact when there is a conflicting rename (e.g. foo.bar_bazfoo.bar.baz)

note that Java already recommends that libraries make copies of experimental attributes in order to avoid the diamond dependency problem:

Generated code for experimental semantic conventions.
NOTE: This artifact has the -alpha and comes with no compatibility guarantees. Libraries can use this for testing, but should make copies of the attributes to avoid possible runtime errors from version conflicts.

see open-telemetry/semantic-conventions-java#50 (comment) for a bit more background on this recommendation:

  • opentelemetry-semconv-incubating contains stable and incubating semantic conventions
    • Classes live in io.opentelemetry.semconv.incubating package. Note this is different than stable artifact to make java module system work.
    • Stable attributes will be annotated @deprecated with javadoc pointing to the equivalent location in the stable artifact
    • The semantic-conventions repo will keep incubating attributes around indefinitely (or for a long time) and mark as deprecated. These deprecated, experimental attributes will be annotated @deprecated.
    • This artifact will always be marked alpha, since we may eventually remove some deprecated experimental attributes, and since the types of experimental attributes are subject to change.
    • We will discourage library authors from using this artifact. They can use it for making assertions in tests, but should make copies of any experimental attributes they depend on in instrumentation.

@lmolkova
Copy link
Contributor Author

Discussed at Semconv and Maintainers calls:

  • there is a trade-off between readable/idiomatic code and collisions
  • the collisions affect experimental attributes only. we'll not rename stable attributes
  • languages/SIGs should be able to generate (or not generate) both constants (i.e. Code generation: how to avoid naming collisions #1118 (comment) - removing an attribute from semconv is not an option)
  • we probably have other candidates for _ <-> . renames (I'll share an update), if we restrict such renames now, we'll never have a chance to fix things.

One of the possible solutions includes a grace period during which _ <-> . are allowed.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 10, 2024

List of existing (non-deprecated) attributes with _
  • k8s.container.restart_count
  • k8s.container.status.last_terminated_reason
  • aws.request_id
  • aws.dynamodb.table_names
  • aws.dynamodb.consumed_capacity
  • aws.dynamodb.item_collection_metrics
  • aws.dynamodb.provisioned_read_capacity
  • aws.dynamodb.provisioned_write_capacity
  • aws.dynamodb.consistent_read
  • aws.dynamodb.attributes_to_get
  • aws.dynamodb.index_name
  • aws.dynamodb.global_secondary_indexes
  • aws.dynamodb.local_secondary_indexes
  • aws.dynamodb.exclusive_start_table
  • aws.dynamodb.table_count
  • aws.dynamodb.scan_forward
  • aws.dynamodb.total_segments
  • aws.dynamodb.scanned_count
  • aws.dynamodb.attribute_definitions
  • aws.dynamodb.global_secondary_index_updates
  • aws.lambda.invoked_arn
  • aws.s3.copy_source
  • aws.s3.upload_id
  • aws.s3.part_number
  • gen_ai.system
  • gen_ai.request.model
  • gen_ai.request.max_tokens
  • gen_ai.request.temperature
  • gen_ai.request.top_p
  • gen_ai.response.id
  • gen_ai.response.model
  • gen_ai.response.finish_reasons
  • gen_ai.usage.prompt_tokens
  • gen_ai.usage.completion_tokens
  • gen_ai.token.type
  • gen_ai.prompt
  • gen_ai.completion
  • gen_ai.operation.name
  • user.full_name
  • opentracing.ref_type
  • container.image.repo_digests
  • container.command_line
  • container.command_args
  • aspnetcore.rate_limiting.policy
  • aspnetcore.rate_limiting.result
  • aspnetcore.routing.is_fallback
  • aspnetcore.request.is_unhandled
  • aspnetcore.routing.match_status
  • log.file.name_resolved
  • log.file.path_resolved
  • android.os.api_level
  • http.request.method_original
  • http.request.resend_count
  • http.response.status_code
  • session.previous_id
  • url.registered_domain
  • url.top_level_domain
  • otel.status_code
  • otel.status_description
  • tls.client.certificate_chain
  • tls.client.not_after
  • tls.client.not_before
  • tls.client.server_name
  • tls.client.supported_ciphers
  • tls.next_protocol
  • tls.server.certificate_chain
  • tls.server.not_after
  • tls.server.not_before
  • db.cassandra.consistency_level
  • db.cassandra.page_size
  • db.cassandra.speculative_execution_count
  • db.cosmosdb.client_id
  • db.cosmosdb.connection_mode
  • db.cosmosdb.operation_type
  • db.cosmosdb.request_charge
  • db.cosmosdb.request_content_length
  • db.cosmosdb.status_code
  • db.cosmosdb.sub_status_code
  • db.elasticsearch.path_parts
  • user_agent.original
  • user_agent.name
  • user_agent.version
  • cloud.resource_id
  • cloud.availability_zone
  • feature_flag.key
  • feature_flag.provider_name
  • feature_flag.variant
  • faas.max_memory
  • faas.invoked_name
  • faas.invoked_provider
  • faas.invoked_region
  • faas.invocation_id
  • heroku.release.creation_timestamp
  • rpc.connect_rpc.error_code
  • rpc.connect_rpc.request.metadata
  • rpc.connect_rpc.response.metadata
  • rpc.grpc.status_code
  • rpc.jsonrpc.error_code
  • rpc.jsonrpc.error_message
  • rpc.jsonrpc.request_id
  • rpc.message.compressed_size
  • rpc.message.uncompressed_size
  • process.parent_pid
  • process.session_leader.pid
  • process.group_leader.pid
  • process.command_line
  • process.command_args
  • process.real_user.id
  • process.real_user.name
  • process.saved_user.id
  • process.saved_user.name
  • process.context_switch_type
  • process.paging.fault_type
  • cloudevents.event_id
  • cloudevents.event_source
  • cloudevents.event_spec_version
  • cloudevents.event_type
  • cloudevents.event_subject
  • gcp.cloud_run.job.execution
  • gcp.cloud_run.job.task_index
  • os.build_id
  • system.cpu.logical_number
  • messaging.batch.message_count
  • messaging.destination_publish.anonymous
  • messaging.destination_publish.name
  • messaging.message.conversation_id
  • messaging.rabbitmq.destination.routing_key
  • messaging.rabbitmq.message.delivery_tag
  • messaging.rocketmq.client_group
  • messaging.rocketmq.consumption_model
  • messaging.rocketmq.message.delay_time_level
  • messaging.rocketmq.message.delivery_timestamp
  • messaging.gcp_pubsub.message.ordering_key
  • messaging.gcp_pubsub.message.ack_id
  • messaging.gcp_pubsub.message.ack_deadline
  • messaging.gcp_pubsub.message.delivery_attempt
  • messaging.servicebus.message.delivery_count
  • messaging.servicebus.message.enqueued_time
  • messaging.servicebus.destination.subscription_name
  • messaging.servicebus.disposition_status
  • messaging.eventhubs.message.enqueued_time

Some examples where _ -> . could make sense:

  • messaging.gcp_pubsub.* -> messaging.gcp.pubsub (it could make sense to use {provider}.{service} format instead of {provider}_{service})
  • process.command_line, process.command_args -> process.command.* - in case there are multiple properties that describe the same thing. Similar:
    • process.parent_pid -> process.parent.pid|name|...
    • rpc.jsonrpc.request_id -> rpc.jsonrpc.request.id
    • db.cosmosdb.request_charge, db.cosmosdb.request_content_length -> db.cosmosdb.request.*
    • aws.dynamodb.table_names -> aws.dynamodb.table.*

...

I believe we don't always know whether more attributes about something (e.g. request) are expected and default to adding an attribute with _.

Some vague semconv guidance could be:

  • default to . whenever possible. E.g. a thing and its properties should usually be separated with . (table.name, request.id, operation.name) since there could be multiple.
  • only use _ when . does not make sense or would alter the meaning. e.g. cloud_run, rate_limiting, max_tokens, resend_count, availability_zone

@marcalff
Copy link
Member

marcalff commented Jun 11, 2024

Changing code generation rules in general for _ will affect a lot of existing, non deprecated, non colliding semantic conventions, which is not desirable.

OPTION 5

How about:

  • Define a new property, generate_as, in build tools / weaver
  • In semantic conventions, manually resolve collisions by providing an alternate generate_as name for the generated symbol, while not changing the associated value

For example:

  • semconv messaging.client.id is untouched.
  • semconv messaging.client_id is modified with the property generate_as = messaging.client_id.deprecated (since, after all, it is deprecated. Could be named .old, .v1, or anything different)

Taking C++ as an example, code generation will look like:

  kMessagingClientId = "messaging.client.id"; // new semconv
  kMessagingClientIdDeprecated = "messaging.client_id"; // old semconv.

[EDIT 2024-06-11]

Alternate example, if we want to change the new name instead:

  • semconv messaging.client.id is modified with the property generate_as = messaging.client.id.2 (to avoid the collision)
  • semconv messaging.client_id is untouched.

Taking C++ as an example, code generation will look like:

  kMessagingClientId2 = "messaging.client.id"; // new semconv
  kMessagingClientId = "messaging.client_id"; // old semconv.

The major benefit I see is that existing semconv like:

  • k8s.container.restart_count
  • etc

are unchanged.

Only symbols that actually collide need special treatment, and only user code that depends on these symbols needs to change (to use kMessagingClientIdDeprecated if this is really what the instrumentation wants).

What generate_as provides, is to decouple symbol names from semantic conventions values, so resolving a conflict in values (due to differences only with _ and .) does not force to revise code generation rules names for everything, impacting many unrelated symbols (all the existing semconv that contains a _ today).

@marcalff
Copy link
Member

@lmolkova Please see proposal for option 5 above.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 11, 2024

@marcalff thanks!

It does not make send to generate the deprecated attribute differently - nobody will update their code to point to a deprecated attribute - we don't want them to.

I.e. having generate_as would only make sense on the new attribute. But also we don't want to change how the new attribute is generated in the stable conventions.

But I really like the idea of manual resolution:

  • the collisions would be rare
  • linters would detect duplicates, I.e. it'd be quite hard to ship the artifact without first resolving collisions
  • the manual resolution would be one of the following:
    • drop the deprecated attribute which has a collision with a new one - some languages could be fine with it.
    • give a new name to the new attribute - I'd leave it up to each language to pick a pattern they like.
      • E.g. messaging.client.id would become incubating-semconv.MessagingAttributes.MESSAGING_CLIENT_ID2 = "messaging.client.id" and stable-semconv.MessagingAttributes.MESSAGING_CLIENT_ID = "messaging.client.id"
    • someone can still do Option 5 and give a new name to the OLD attribute
      • E.g. messaging.client_id would become incubating-semconv.MessagingAttributes.MESSAGING_CLIENT_ID_DEPRECATED = "messaging.client_id"

More context on this here: open-telemetry/semantic-conventions-java#75 (comment)

and here's the prototype: open-telemetry/semantic-conventions-java#76 - effectively there is a way to remap constant name and/or remove attribute.

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

@marcalff
Copy link
Member

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

I don't understand.

I was hopping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Now it looks like each SIG, in their code generation scripts, it to add code for special cases.

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

In this case, opentelemetry-cpp already complies: it drops the deprecated messaging.client_id.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 14, 2024

I was hopping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Assuming there is an alternative constant name in the schema, would you drop the attribute or would you generate both attributes (with different names) in C++?

The prescriptive solution coming from semconv would be:

  • messaging.client_id (old) should be marked somehow so it can be dropped. E.g. collides_with: messaging.client.id
  • messaging.client.id (new) would be marked with alternative constant name. E.g. unque_const_name: messaging_client_id2.

Jinja scripts would need to be changed to either drop based one collides_with or use constant name from the unque_const_name (converting it to proper case).

Now it looks like each SIG, in their code generation scripts, it to add code for special cases.

The templates need to be changed now to support dropping/renaming. If a collision happens again, the change would be to add another attribute name to the list of excluded attributes (or to a manually maintained map). E.g. this change is java open-telemetry/semantic-conventions-java#76 adds all you need to either drop or rename constants by making a small config change.

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

I think it's a path forward and not a conclusion.
We need need to add checks to semconv to know when they happen before we release. We should probably block such renames until we have a reasonable conclusion.

Manual resolution seems like an easy and cheap thing to do to solve this and similar collisions in the same way. We can automate it, but it's still not clear to me what are all the options we need to provide.

Note:
We are migrating tooling from build-tools to weaver. Code generation migration which would involve some jinja updates and this would be a good time to add the `final' solution.

TL;DR:

  • semconv should detect codegen collisions when they are added
  • I want to see how SIGs handle the current collision to understand which options we need to provide
  • while we're discussing this, we should block renames that cause such collisions

@lmolkova
Copy link
Contributor Author

lmolkova commented Jun 17, 2024

Discussed at Semconv WG 6/17

  • we need to detect codegen collisions when they are added to semconv

  • we should put a moratorium on renames that'd result in collision until we have tooling/checks in place

  • we need to update attribute naming guidance and suggest to use . when it makes sense

  • we'll probably make different choices for different attributes when collision happens

    • in case of messaging.client_id dropping the deprecated/experimental/slightly-used attribute is reasonable
    • we could decide differently for process.command_line
    • semconv should be the source of guidance on dropping/giving new constant names
    • we need tooling for it

So:

@dyladan
Copy link
Member

dyladan commented Jul 10, 2024

Has there been any resolution on this?

@lmolkova
Copy link
Contributor Author

Yes, we're adding collision checks - #1209 - necessary tooling changes just landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants