Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move existing Kafka metrics to a JMX specific namespace #269

Closed
pyohannes opened this issue Aug 17, 2023 · 5 comments · Fixed by #338
Closed

Move existing Kafka metrics to a JMX specific namespace #269

pyohannes opened this issue Aug 17, 2023 · 5 comments · Fixed by #338
Assignees

Comments

@pyohannes
Copy link
Contributor

pyohannes commented Aug 17, 2023

Problem

The Kafka metrics in the messaging semantic conventions are based on metrics as they are reported by the JMX reporter. Although the JMX reporter is the default reporter, different reporters can be configured.

In their current form, the definition of those metrics causes confusion, as they suggest being an authoritative definition of metrics as they are reported by Kafka. However, they look different from the metrics in the official Kafka documentation, and they are different from the OTLP metrics defined in KIP-714.

Based on feedback from @AndrewJSchofield, people involved in the Kafka project currently don't have an interest in defining and maintaining a definition of Kafka metrics as part of the OTel semantic conventions.

Desired solution

The semantic conventions should make it clear that the existing metrics are JMX specific, ideally in the metric name. This allows support for conventions for Kafka metrics that are sent via a different reporter.

Even more importantly, this avoids confusion, as it makes clear that the OTel semantic conventions are not the place that provides an authoritative definition of Kafka metrics.

@jack-berg
Copy link
Member

Related issue in the specification.

@trask
Copy link
Member

trask commented Aug 21, 2023

I like the approach @jack-berg took to a similar problem in open-telemetry/opentelemetry-java-instrumentation#6138:

I've updated the PR to be a general purpose bridge between kafka client metrics and opentelemetry metric, rather than implementing as an allow list of known metrics. My reasoning is as follows:

  • There are 202 metrics exposed by kafka client metrics. I made it through mapping about 60% of them before I decided that manually mapping all the metrics is madness. Its error prone, brittle against changes between kafka client versions, and adds little value.
  • The metrics kafka client exposes are conceptually similar what we do in our micrometer shim: Kafka client has its own internal metrics system. Its authors have already done the analysis to decide what is important to monitor when using the kafka client, and they've provided a hook for to access the metrics in a generic format. In bridging these metrics to opentelemetry we should use a light hand. We should map the metrics to the appropriate opentelemetry instrument types, but should minimize changes to metric names, descriptions, etc. Trying to map these to semantic conventions is folly, because we don't control the instrumentation at its source and the authors could change the metrics drastically in the next version.
  • The lack of mapping to semantic conventions doesn't mean this data is not useful. This data is very useful to folks operating applications that produce and consume from kafka. We'd be doing a disservice to opentelemetry users and hampering the adoption of opentelemetry by rejecting good telemetry data because of a lack of semantic conventions. Furthermore, I don't think we should try to codify these metrics in the semantic conventions: 202 metrics is just too much data to review. Additionally, there's likely to be significant differences in the metrics that are exposed in kafka clients from language to language. Does that mean we should withhold this data that's available to java kafka client users? No. There's a place for semantic conventions, and there's a place for bridging in telemetry data that is available but not under our control.

@pyohannes
Copy link
Contributor Author

The lack of mapping to semantic conventions doesn't mean this data is not useful.

I fully agree, .NET runtime metrics are a great example of this.

The conception that telemetry data needs to be covered by OTel semantic conventions in order to be useful leads to unwanted side effects:

  1. Semantic conventions are written to comply with existing instrumentation, whereas it should be the other way round.
  2. As a consequence of the point above, we end up with a bunch of experimental semantic conventions with no clearly defined ownership.

My take is that telemetry conventions referring to a specific component (e. g. Kafka broker, or .NET runtime) can be part of OTel semantic conventions, but can as well live apart of them. In any case, they should be maintained by people working on that specific component, and not by the wider community.

@joaopgrassi
Copy link
Member

My take is that telemetry conventions referring to a specific component (e. g. Kafka broker, or .NET runtime) can be part of OTel semantic conventions, but can as well live apart of them. In any case, they should be maintained by people working on that specific component, and not by the wider community.

I agree as well. For me, what is a bit unclear is: If I specify such conventions outside OTel, should they follow the OTel guidelines? For ex, metrics names, instrument types and etc?

To me it makes sense that if such library/framework etc uses OTel components to record such data (tracer, meter) they must follow the OTel guidelines.

@pyohannes
Copy link
Contributor Author

Discussed in the semantic conventions SIG meeting on September 18th.

There was an agreement to remove experimental Kafka metrics from the semantic conventions. If a definition or reference of those metric is needed, it should be provided by the documentation of the package that synthesizes those metrics. It seems this documentation is already available in the kafkametricsreceiver collector component.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: V1 - Stable Semantics
Development

Successfully merging a pull request may close this issue.

5 participants