Move existing Kafka metrics to a JMX specific namespace #269

pyohannes · 2023-08-17T16:26:35Z

Problem

The Kafka metrics in the messaging semantic conventions are based on metrics as they are reported by the JMX reporter. Although the JMX reporter is the default reporter, different reporters can be configured.

In their current form, the definition of those metrics causes confusion, as they suggest being an authoritative definition of metrics as they are reported by Kafka. However, they look different from the metrics in the official Kafka documentation, and they are different from the OTLP metrics defined in KIP-714.

Based on feedback from @AndrewJSchofield, people involved in the Kafka project currently don't have an interest in defining and maintaining a definition of Kafka metrics as part of the OTel semantic conventions.

Desired solution

The semantic conventions should make it clear that the existing metrics are JMX specific, ideally in the metric name. This allows support for conventions for Kafka metrics that are sent via a different reporter.

Even more importantly, this avoids confusion, as it makes clear that the OTel semantic conventions are not the place that provides an authoritative definition of Kafka metrics.

jack-berg · 2023-08-21T15:49:45Z

Related issue in the specification.

trask · 2023-08-21T17:38:02Z

I like the approach @jack-berg took to a similar problem in open-telemetry/opentelemetry-java-instrumentation#6138:

I've updated the PR to be a general purpose bridge between kafka client metrics and opentelemetry metric, rather than implementing as an allow list of known metrics. My reasoning is as follows:

There are 202 metrics exposed by kafka client metrics. I made it through mapping about 60% of them before I decided that manually mapping all the metrics is madness. Its error prone, brittle against changes between kafka client versions, and adds little value.

The metrics kafka client exposes are conceptually similar what we do in our micrometer shim: Kafka client has its own internal metrics system. Its authors have already done the analysis to decide what is important to monitor when using the kafka client, and they've provided a hook for to access the metrics in a generic format. In bridging these metrics to opentelemetry we should use a light hand. We should map the metrics to the appropriate opentelemetry instrument types, but should minimize changes to metric names, descriptions, etc. Trying to map these to semantic conventions is folly, because we don't control the instrumentation at its source and the authors could change the metrics drastically in the next version.

The lack of mapping to semantic conventions doesn't mean this data is not useful. This data is very useful to folks operating applications that produce and consume from kafka. We'd be doing a disservice to opentelemetry users and hampering the adoption of opentelemetry by rejecting good telemetry data because of a lack of semantic conventions. Furthermore, I don't think we should try to codify these metrics in the semantic conventions: 202 metrics is just too much data to review. Additionally, there's likely to be significant differences in the metrics that are exposed in kafka clients from language to language. Does that mean we should withhold this data that's available to java kafka client users? No. There's a place for semantic conventions, and there's a place for bridging in telemetry data that is available but not under our control.

pyohannes · 2023-08-22T10:33:49Z

The lack of mapping to semantic conventions doesn't mean this data is not useful.

I fully agree, .NET runtime metrics are a great example of this.

The conception that telemetry data needs to be covered by OTel semantic conventions in order to be useful leads to unwanted side effects:

Semantic conventions are written to comply with existing instrumentation, whereas it should be the other way round.
As a consequence of the point above, we end up with a bunch of experimental semantic conventions with no clearly defined ownership.

My take is that telemetry conventions referring to a specific component (e. g. Kafka broker, or .NET runtime) can be part of OTel semantic conventions, but can as well live apart of them. In any case, they should be maintained by people working on that specific component, and not by the wider community.

joaopgrassi · 2023-08-22T11:58:29Z

My take is that telemetry conventions referring to a specific component (e. g. Kafka broker, or .NET runtime) can be part of OTel semantic conventions, but can as well live apart of them. In any case, they should be maintained by people working on that specific component, and not by the wider community.

I agree as well. For me, what is a bit unclear is: If I specify such conventions outside OTel, should they follow the OTel guidelines? For ex, metrics names, instrument types and etc?

To me it makes sense that if such library/framework etc uses OTel components to record such data (tracer, meter) they must follow the OTel guidelines.

pyohannes · 2023-09-21T14:15:45Z

Discussed in the semantic conventions SIG meeting on September 18th.

There was an agreement to remove experimental Kafka metrics from the semantic conventions. If a definition or reference of those metric is needed, it should be provided by the documentation of the package that synthesizes those metrics. It seems this documentation is already available in the kafkametricsreceiver collector component.

github-actions bot assigned arminru Aug 17, 2023

pyohannes mentioned this issue Sep 21, 2023

Remove experimental Kafka metrics #338

Merged

3 tasks

reyang closed this as completed in #338 Sep 21, 2023

trask mentioned this issue May 8, 2024

Add experimental go runtime metrics semantic conventions #981

Merged

lmolkova mentioned this issue Sep 8, 2024

Add VCS metrics from Github receiver #1383

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move existing Kafka metrics to a JMX specific namespace #269

Move existing Kafka metrics to a JMX specific namespace #269

pyohannes commented Aug 17, 2023 •

edited

Loading

jack-berg commented Aug 21, 2023

trask commented Aug 21, 2023

pyohannes commented Aug 22, 2023

joaopgrassi commented Aug 22, 2023

pyohannes commented Sep 21, 2023

Move existing Kafka metrics to a JMX specific namespace #269

Move existing Kafka metrics to a JMX specific namespace #269

Comments

pyohannes commented Aug 17, 2023 • edited Loading

Problem

Desired solution

jack-berg commented Aug 21, 2023

trask commented Aug 21, 2023

pyohannes commented Aug 22, 2023

joaopgrassi commented Aug 22, 2023

pyohannes commented Sep 21, 2023

pyohannes commented Aug 17, 2023 •

edited

Loading