chore: adding kafka brokers metrics #196
Changes from 4 commits
4a26acc
fbe1e33
1fb25d1
25f4535
fbcd579
c3d6579
f88093b
6f17ce9
57a96c6
75f0112
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,7 +111,6 @@ This section defines how to apply semantic conventions when collecting Kafka met | |
| messaging.kafka.controllers.active | UpDownCounter | Int64 | controllers | `{controller}` | The number of active controllers in the broker. | | | | ||
| messaging.kafka.leader.elections | Counter | Int64 | elections | `{election}` | Leader election rate (increasing values indicates broker failures). | | | | ||
| messaging.kafka.leader.unclean-elections | Counter | Int64 | elections | `{election}` | Unclean leader election rate (increasing values indicates broker failures). | | | | ||
| messaging.kafka.brokers | UpDownCounter | Int64 | brokers | `{broker}` | Number of brokers in the cluster. | | | | ||
| messaging.kafka.topic.partitions | UpDownCounter | Int64 | partitions | `{partition}` | Number of partitions in topic. | `topic` | The ID (integer) of a topic | | ||
| messaging.kafka.partition.current_offset | Gauge | Int64 | partition offset | `{partition offset}` | Current offset of partition of topic. | `topic` | The ID (integer) of a topic | | ||
| | | | | | | `partition` | The number (integer) of the partition | | ||
|
@@ -159,4 +158,20 @@ This section defines how to apply semantic conventions when collecting Kafka met | |
| messaging.kafka.consumer.lag_sum | Gauge | Int64 | lag sum | `{lag sum}` | Current approximate sum of consumer group lag across all partitions of topic | `group` | The ID (string) of a consumer group | | ||
| | | | | | | `topic` | The ID (integer) of a topic | | ||
|
||
### Broker Metrics | ||
|
||
**Description:** Kafka Broker level metrics. | ||
|
||
| Name | Instrument | Value type | Unit | Unit ([UCUM](/docs/general/metrics.md#instrument-units)) | Description | Attribute Key | Attribute Values | | ||
| ---------------------------------------------| ------------- | ---------- | ------ | -------------------------------------------- | -------------- | ------------- | ---------------- | | ||
| messaging.kafka.brokers.count | UpDownCounter | Int64 | brokers | `{broker}` | sum of brokers in the cluster | | | | ||
Is this metric necessary? Assuming brokers report unique instance id (e.g. standard

That might be true, this is primarily here to replace the
||
| messaging.kafka.brokers.consumer.fetch.rate | Gauge | Double | fetches per second | `{fetch}/s` | Average consumer fetch Rate. | `state` | `in`, `out` | | ||
If brokers can report

Indeed. I thought that it was preferred not to have rate metrics but instead use a counter from which the rate can be derived for whatever time period is desired.

Hmm, yeah I think that is correct. This should be updated.
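To illustrate the point made in this thread, here is a minimal sketch of how a rate can be derived from a cumulative counter for any time window. The `CounterSample` type, the `rate_per_second` helper, and the sample values are hypothetical and only serve the example; they are not part of the PR or of any OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of a cumulative, monotonically increasing counter (hypothetical type)."""
    timestamp_s: float  # seconds since an arbitrary epoch
    value: int          # total fetches observed so far


def rate_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Average rate between two samples of the same counter."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.value - prev.value
    if delta < 0:
        # A drop means the counter was reset (for example, a broker restart);
        # treat the current value as the increase since the reset.
        delta = curr.value
    return delta / elapsed


# Made-up readings of a fetch counter at t = 0 s, 60 s, and 300 s.
samples = [
    CounterSample(0.0, 0),
    CounterSample(60.0, 1200),
    CounterSample(300.0, 3600),
]
one_minute_rate = rate_per_second(samples[0], samples[1])
five_minute_rate = rate_per_second(samples[0], samples[2])
```

The same counter yields both a one-minute and a five-minute rate, which a pre-averaged gauge cannot do after the fact.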
||
| messaging.kafka.brokers.network.io | Counter | Int64 | bytes | `By` | The bytes received or sent by the broker. | | | | ||
Wouldn't we want to measure bytes sent and bytes received separately?

That was the original thought, but it was changed. I think changing it back makes sense; the PR I have for the Kafka collector is separated.
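One possible way to keep sent and received bytes apart is a single counter split by an attribute. The sketch below assumes a hypothetical `direction` attribute with the values `in` and `out` (borrowed from the `state` values used in the fetch-rate row); the thread does not say whether an attribute or two separate metrics is intended, so treat this purely as an illustration.

```python
from collections import Counter

# Assumed attribute values; the PR does not define them for this metric.
DIRECTIONS = ("in", "out")


class DirectionalByteCounter:
    """Monotonic byte counter keyed by a hypothetical `direction` attribute."""

    def __init__(self) -> None:
        self._totals = Counter()

    def add(self, num_bytes: int, direction: str) -> None:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        if num_bytes < 0:
            raise ValueError("a counter can only increase")
        self._totals[direction] += num_bytes

    def total(self, direction: str) -> int:
        return self._totals[direction]


network_io = DirectionalByteCounter()
network_io.add(1024, "in")
network_io.add(4096, "out")
network_io.add(512, "in")
```

Keeping the two directions as attribute values lets a backend sum them for total traffic while still being able to chart each one on its own.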
||
| messaging.kafka.brokers.requests.latency | Gauge | Double | ms | `{ms}` | Average Request latency in ms. | | | | ||
Is it possible to report it as a histogram? Then we'll have percentiles and

Agreed. I would much prefer to see this as a histogram. Even better, in my opinion, would be an exponential histogram.

Hmm, I don't have any experience converting histograms from metrics.go into histograms for OTel. Are there other examples where someone has done this in the past? The Kafka consumers for example is already tracking I could see it potentially being an upgrade, I'm just not sure how to apply it to update the collector.
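For readers unfamiliar with the suggestion, the following is a minimal, library-free sketch of an explicit-bucket histogram. It is not the OpenTelemetry SDK and not the collector code; the class name, the bucket bounds, and the recorded latencies are all made up for illustration. It shows that a histogram still yields the average (from sum and count) while also bounding percentiles.

```python
from bisect import bisect_left


class LatencyHistogram:
    """Explicit-bucket histogram sketch (hypothetical, not an OTel API)."""

    def __init__(self, bounds_ms):
        self.bounds_ms = sorted(bounds_ms)
        # One bucket per upper bound, plus an overflow bucket at the end.
        self.bucket_counts = [0] * (len(self.bounds_ms) + 1)
        self.count = 0
        self.sum_ms = 0.0

    def record(self, latency_ms):
        # bisect_left places a value equal to a bound in that bound's bucket,
        # so each bucket's upper bound is inclusive.
        self.bucket_counts[bisect_left(self.bounds_ms, latency_ms)] += 1
        self.count += 1
        self.sum_ms += latency_ms

    def mean_ms(self):
        return self.sum_ms / self.count if self.count else 0.0

    def quantile_upper_bound_ms(self, q):
        """Upper bound of the bucket that contains the q-quantile."""
        if not 0 < q <= 1:
            raise ValueError("q must be in (0, 1]")
        if self.count == 0:
            raise ValueError("no measurements recorded")
        target = q * self.count
        seen = 0
        for index, bucket_count in enumerate(self.bucket_counts):
            seen += bucket_count
            if seen >= target:
                if index < len(self.bounds_ms):
                    return self.bounds_ms[index]
                return float("inf")  # value fell in the overflow bucket
        return float("inf")


histogram = LatencyHistogram([5, 10, 25, 50, 100])
for latency in (1, 4, 7, 9, 12, 30, 80, 200):
    histogram.record(latency)
```

An exponential histogram, as proposed in the thread, would replace the fixed `bounds_ms` list with bucket boundaries computed from a scale factor; that variant is not shown here.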
||
| messaging.kafka.brokers.requests.rate | Gauge | Double | requests per second | `{request}/s`| Average request rate per second. | | | | ||
Same here,

Agreed.
||
| messaging.kafka.brokers.requests.size | Gauge | Double | bytes | `By` | Average request size in bytes. | | | | ||
This would probably be best represented with a histogram to allow distribution, and then a rate counter is not needed.

Hi @lmolkova, is there an example of a histogram metric available and how that would be defined?

You can find examples throughout this repo, for example, take a look at
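The claim that a histogram makes a separate rate counter unnecessary follows from the fact that a histogram carries a cumulative count and sum. A small sketch under that assumption is shown below; the `HistogramSnapshot` type, the helper names, and the numbers are hypothetical.

```python
from typing import NamedTuple


class HistogramSnapshot(NamedTuple):
    """Cumulative count and sum as exported with a histogram (hypothetical type)."""
    timestamp_s: float
    count: int      # number of request sizes recorded so far
    sum_bytes: int  # sum of all recorded request sizes so far


def request_rate(prev: HistogramSnapshot, curr: HistogramSnapshot) -> float:
    """Requests per second between two snapshots, taken from the count alone."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return (curr.count - prev.count) / elapsed


def average_size(prev: HistogramSnapshot, curr: HistogramSnapshot) -> float:
    """Mean request size in bytes over the interval between two snapshots."""
    requests = curr.count - prev.count
    if requests == 0:
        return 0.0
    return (curr.sum_bytes - prev.sum_bytes) / requests


# Made-up snapshots taken 10 seconds apart.
earlier = HistogramSnapshot(timestamp_s=0.0, count=100, sum_bytes=51_200)
later = HistogramSnapshot(timestamp_s=10.0, count=600, sum_bytes=307_200)
rate = request_rate(earlier, later)
mean_bytes = average_size(earlier, later)
```

Both the request rate and the average request size come out of one instrument, so the separate rate and size gauges would become redundant.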
||
| messaging.kafka.brokers.responses.rate | Gauge | Double | responses per second| `{response}/s`| Average response rate per second. | | | | ||
Same comment as on request rate.
||
| messaging.kafka.brokers.response_size | Gauge | Double | bytes | `By` | Average response size in bytes. | | | | ||
Same comment as on request size.
||
| messaging.kafka.brokers.requests.in.flight | Gauge | Int64 | requests | `{request}` | Requests in flight. | | | | ||
Maybe
||
|
||
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md |
I don't see any attributes - are there any? We should have at least standard `service.*` ones.

I would suggest using the broker's id (which I would call "node id" instead of "broker id" following Kafka best practice).