chore: adding kafka brokers metrics #196

Closed
wants to merge 10 commits

Conversation

jcountsNR

No description provided.

@jcountsNR
Author

Adding a note: this is a continuation of an earlier PR. I just could not make changes to that fork, so I needed to create a new one.

@jcountsNR
Author

@jsuereth, I'm tagging you since it looks like you were the assignee on this one. We have a lot of eyes on our end wanting to get this one to the finish line, so please let me know if there is anything I can do to make sure this one is good to go. Thanks!

Member

@dmitryax left a comment

Please move the metrics to another section, `### Broker Metrics`.

@jcountsNR
Author

Hi @dmitryax, I'd like to make some progress on this while I'm waiting on approval for the semantic conventions PR. Can you tell me what the CI failure is? I'm not seeing any actual error code, so I don't know what needs to be fixed.

Contributor

@lmolkova left a comment

I'm not sure whether this PR documents what's already done for Kafka and there are limitations on what can be changed, but I suggest staying as close to the general OTel metric requirements as possible.
I left several comments on this.

Also, I'd recommend using messaging.kafka.broker (not brokers) as a namespace.

| messaging.kafka.brokers.count | UpDownCounter | Int64 | brokers | `{broker}` | sum of brokers in the cluster | | |
| messaging.kafka.brokers.consumer.fetch.rate | Gauge | Double | fetches per second | `{fetch}/s` | Average consumer fetch Rate. | `state` | `in`, `out` |
| messaging.kafka.brokers.network.io | Counter | Int64 | bytes | `By` | The bytes received or sent by the broker. | | |
| messaging.kafka.brokers.requests.latency | Gauge | Double | ms | `{ms}` | Average Request latency in ms. | | |
Contributor

Is it possible to report it as a histogram? Then we'd have percentiles, and messaging.kafka.brokers.requests.rate could also be derived from it.


Agreed. I would much prefer to see this as a histogram. Even better, in my opinion, would be an exponential histogram.

Author

Hmm, I don't have any experience converting histograms from metrics.go into histograms for OTel. Are there other examples where someone has done this in the past? The Kafka consumers, for example, are already tracking lag as a gauge metric, so I thought the same would make sense here.

I could see it potentially being an upgrade; I'm just not sure how to apply it to update the collector.


| Name | Instrument | Value type | Unit | Unit ([UCUM](/docs/general/metrics.md#instrument-units)) | Description | Attribute Key | Attribute Values |
| ---------------------------------------------| ------------- | ---------- | ------ | -------------------------------------------- | -------------- | ------------- | ---------------- |
| messaging.kafka.brokers.count | UpDownCounter | Int64 | brokers | `{broker}` | sum of brokers in the cluster | | |
Contributor

Is this metric necessary?

Assuming brokers report a unique instance id (e.g. the standard service.instance.id attribute), it can be derived from other metrics in this list. If brokers also report standard metrics (CPU, memory, etc.), it can be derived from those as well.

Author

That might be true; this is primarily here to replace the kafka.brokers metric which already exists. So doing away with this completely might impact metrics that are being collected now, although I do see the point that it could be derived if we added the attribute.

| Name | Instrument | Value type | Unit | Unit ([UCUM](/docs/general/metrics.md#instrument-units)) | Description | Attribute Key | Attribute Values |
| ---------------------------------------------| ------------- | ---------- | ------ | -------------------------------------------- | -------------- | ------------- | ---------------- |
| messaging.kafka.brokers.count | UpDownCounter | Int64 | brokers | `{broker}` | sum of brokers in the cluster | | |
| messaging.kafka.brokers.consumer.fetch.rate | Gauge | Double | fetches per second | `{fetch}/s` | Average consumer fetch Rate. | `state` | `in`, `out` |
Contributor

If brokers can report messaging.kafka.brokers.consumer.fetch.count, it can provide more information and the rate can be derived from it - can it be changed?


Indeed. I thought that it was preferred not to have rate metrics but instead use a counter from which the rate can be derived for whatever time period is desired.

Author

Hmm, yeah I think that is correct. This should be updated.
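For illustration, here is a minimal sketch of the counter-based approach using the OpenTelemetry Go metric API; the metric name and the broker id attribute are placeholders for this discussion, not agreed conventions. The backend can then derive the fetch rate over any time window.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	meter := otel.Meter("kafka-broker-example")

	// Monotonic counter of consumer fetches; backends derive the rate over any window.
	fetchCount, err := meter.Int64Counter(
		"messaging.kafka.broker.consumer.fetch.count", // placeholder name, not an agreed convention
		metric.WithUnit("{fetch}"),
		metric.WithDescription("Consumer fetches handled by the broker."),
	)
	if err != nil {
		panic(err)
	}

	// Record one fetch, attributed to a placeholder broker id attribute.
	fetchCount.Add(ctx, 1, metric.WithAttributes(attribute.Int("messaging.kafka.broker.id", 1)))
}
```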

| messaging.kafka.brokers.consumer.fetch.rate | Gauge | Double | fetches per second | `{fetch}/s` | Average consumer fetch Rate. | `state` | `in`, `out` |
| messaging.kafka.brokers.network.io | Counter | Int64 | bytes | `By` | The bytes received or sent by the broker. | | |
| messaging.kafka.brokers.requests.latency | Gauge | Double | ms | `{ms}` | Average Request latency in ms. | | |
| messaging.kafka.brokers.requests.rate | Gauge | Double | requests per second | `{request}/s`| Average request rate per second. | | |
Contributor

same here, messaging.kafka.brokers.request.count is more flexible and the rate can be derived from it.

Author

agreed.

| messaging.kafka.brokers.network.io | Counter | Int64 | bytes | `By` | The bytes received or sent by the broker. | | |
| messaging.kafka.brokers.requests.latency | Gauge | Double | ms | `{ms}` | Average Request latency in ms. | | |
| messaging.kafka.brokers.requests.rate | Gauge | Double | requests per second | `{request}/s`| Average request rate per second. | | |
| messaging.kafka.brokers.requsts.size | Gauge | Double | bytes | `By` | Average request size in bytes. | | |
Contributor

This would probably be best represented with a histogram to capture the distribution, and then a rate counter is not needed.
Another possibility would be to only report messaging.kafka.brokers.network.io with a direction attribute, or to use a messaging.kafka.brokers.request.bytes counter counting all the bytes.

Author

Hi @lmolkova, is there an example of a histogram metric available, and how would that be defined?

Contributor

You can find examples throughout this repo; for example, take a look at http.client.request.duration. There is a lot of information in the spec repo, and also take a look at the metrics docs on opentelemetry.io.
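For illustration, a minimal sketch of how such a histogram could be defined with the OpenTelemetry Go metric API; the metric name, unit, and broker id attribute below are placeholders, not conventions agreed in this PR:

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	meter := otel.Meter("kafka-broker-example")

	// Histogram of request durations; percentiles and request rate can both be derived from it.
	reqDuration, err := meter.Float64Histogram(
		"messaging.kafka.broker.request.duration", // placeholder name, not an agreed convention
		metric.WithUnit("s"),
		metric.WithDescription("Duration of requests handled by the broker."),
	)
	if err != nil {
		panic(err)
	}

	start := time.Now()
	// ... handle a request ...
	reqDuration.Record(ctx, time.Since(start).Seconds(),
		metric.WithAttributes(attribute.Int("messaging.kafka.broker.id", 1))) // placeholder attribute
}
```

The exponential histogram suggested earlier would typically be an SDK aggregation choice (e.g. configured through a view) rather than a different instrument.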

| messaging.kafka.brokers.requests.latency | Gauge | Double | ms | `{ms}` | Average Request latency in ms. | | |
| messaging.kafka.brokers.requests.rate | Gauge | Double | requests per second | `{request}/s`| Average request rate per second. | | |
| messaging.kafka.brokers.requsts.size | Gauge | Double | bytes | `By` | Average request size in bytes. | | |
| messaging.kafka.brokers.responses.rate | Gauge | Double | responses per second| `{response}/s`| Average response rate per second. | | |
Contributor

same comment as on request rate

| messaging.kafka.brokers.requests.rate | Gauge | Double | requests per second | `{request}/s`| Average request rate per second. | | |
| messaging.kafka.brokers.requsts.size | Gauge | Double | bytes | `By` | Average request size in bytes. | | |
| messaging.kafka.brokers.responses.rate | Gauge | Double | responses per second| `{response}/s`| Average response rate per second. | | |
| messaging.kafka.brokers.response_size | Gauge | Double | bytes | `By` | Average response size in bytes. | | |
Contributor

same comment as on request size

| messaging.kafka.brokers.requsts.size | Gauge | Double | bytes | `By` | Average request size in bytes. | | |
| messaging.kafka.brokers.responses.rate | Gauge | Double | responses per second| `{response}/s`| Average response rate per second. | | |
| messaging.kafka.brokers.response_size | Gauge | Double | bytes | `By` | Average response size in bytes. | | |
| messaging.kafka.brokers.requests.in.flight | Gauge | Int64 | requests | `{request}` | Requests in flight. | | |
Contributor

maybe messaging.kafka.brokers.active_requests to stay consistent with HTTP semantic conventions.
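As a rough illustration of that suggestion, an in-flight requests metric could be modeled as an UpDownCounter, similar to how the HTTP conventions model active requests; this is a sketch with the OpenTelemetry Go metric API, and the names are placeholders:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	meter := otel.Meter("kafka-broker-example")

	// UpDownCounter tracking requests currently being processed by the broker.
	activeRequests, err := meter.Int64UpDownCounter(
		"messaging.kafka.broker.active_requests", // placeholder name, not an agreed convention
		metric.WithUnit("{request}"),
		metric.WithDescription("Requests currently being processed by the broker."),
	)
	if err != nil {
		panic(err)
	}

	attrs := metric.WithAttributes(attribute.Int("messaging.kafka.broker.id", 1)) // placeholder attribute
	activeRequests.Add(ctx, 1, attrs)  // request accepted
	// ... handle the request ...
	activeRequests.Add(ctx, -1, attrs) // request completed
}
```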


**Description:** Kafka Broker level metrics.

| Name | Instrument | Value type | Unit | Unit ([UCUM](/docs/general/metrics.md#instrument-units)) | Description | Attribute Key | Attribute Values |
Contributor

@lmolkova Jul 20, 2023

I don't see any attributes - are there any? We should have at least standard service.* ones


I would suggest using the broker's id (which I would call "node id" instead of "broker id" following Kafka best practice).


| ---------------------------------------------| ------------- | ---------- | ------ | -------------------------------------------- | -------------- | ------------- | ---------------- |
| messaging.kafka.brokers.count | UpDownCounter | Int64 | brokers | `{broker}` | sum of brokers in the cluster | | |
| messaging.kafka.brokers.consumer.fetch.rate | Gauge | Double | fetches per second | `{fetch}/s` | Average consumer fetch Rate. | `state` | `in`, `out` |
| messaging.kafka.brokers.network.io | Counter | Int64 | bytes | `By` | The bytes received or sent by the broker. | | |


Wouldn't we want to measure bytes sent and bytes received separately?

Author

That was the original thought, but it was changed. I think changing it back makes sense; the PR I have for the kafka collector keeps them separated.
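For illustration, a single bytes counter with a direction attribute (as suggested earlier in the review) could cover both sent and received bytes; this is a minimal sketch with the OpenTelemetry Go metric API, and the metric and attribute names are placeholders:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	meter := otel.Meter("kafka-broker-example")

	// One monotonic byte counter; a direction attribute splits sent vs. received.
	networkIO, err := meter.Int64Counter(
		"messaging.kafka.broker.network.io", // placeholder name, not an agreed convention
		metric.WithUnit("By"),
		metric.WithDescription("Bytes sent or received by the broker."),
	)
	if err != nil {
		panic(err)
	}

	brokerID := attribute.Int("messaging.kafka.broker.id", 1) // placeholder attribute
	networkIO.Add(ctx, 4096, metric.WithAttributes(brokerID, attribute.String("direction", "receive")))
	networkIO.Add(ctx, 1024, metric.WithAttributes(brokerID, attribute.String("direction", "transmit")))
}
```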



@jcountsNR
Author

Hi @AndrewJSchofield @lmolkova @dmitryax. I made some updates and pushed the commit. It looks correct to me now; let me know if I've missed anything.

@jcountsNR
Author

One other follow-up: I don't think that node and broker are interchangeable, and since this is specific to broker metrics, I'm inclined to say we should leave it as broker.id. I did add the attribute to everything except the broker count.

@jcountsNR
Author

Sorry for the delayed response to this. This is for a very specific integration which uses sarama metrics, and it has some limitations. Since that is the case, I'm not sure that making significant updates to these metrics makes sense, because they aren't possible with the one integration so far that will be using them.

The broker-related metrics, for example, don't use counts but rather rates for all of the metrics they translate for Kafka. I think this PR should probably either be cancelled or updated to be a more general naming translation for the metrics we can get from Confluent Cloud brokers.

@pyohannes
Contributor

In a semantic conventions SIG meeting, an agreement was reached to remove Kafka broker metrics from the semantic conventions. See #338, which removes them and gives a list of reasons.

@jcountsNR, if you have any opinions please weigh in on #338. Otherwise let's close this PR when/if #338 is merged.

@jcountsNR
Author

@pyohannes I'm good to close it, I don't think it's valid anymore.
