Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: expand kafka broker metrics #24259

Closed
wants to merge 56 commits into from
Closed

Conversation

jcountsNR
Copy link

Rebasing and resubmitting from the original PR due to new team ownership.

ORIGINAL PR: 23646

@jcountsNR jcountsNR requested a review from a team July 13, 2023 16:22
@jcountsNR jcountsNR requested a review from dmitryax as a code owner July 13, 2023 16:22
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 13, 2023

@jcountsNR
Copy link
Author

@dmitryax , I didn't have a way to update the other PR since I did not raise it, so unfortunately I had to create a new one. I did however rebase with main and run make gotidy.

For the question that was posed in the previous PR -
'Isn't count the cumulative sum?' - New Relic views count as a tally of number of metrics rather than a sum.
For example, if 10 metrics are sent each with a value of 10, the count would be 10, and the sum would be 100. Hopefully that clarifies.

@dmitryax
Copy link
Member

'Isn't count the cumulative sum?' - New Relic views count as a tally of number of metrics rather than a sum.
For example, if 10 metrics are sent each with a value of 10, the count would be 10, and the sum would be 100. Hopefully that clarifies.

Ok, but this is not a New Relic specific component. Sums are usually preferable in OpenTelemetry because they are more resilient to data points misses. You can use https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/cumulativetodeltaprocessor to translate all the cumulative metrics to deltas.

Also, please take a look at metrics already defined in the Semantic Conventions https://github.com/open-telemetry/semantic-conventions/blob/ca881074f6df330bacdaa89372f2851c26545fee/docs/messaging/kafka.md?plain=1#L104C13-L104C29. messaging.kafka.network.io is already defined as a sum (cumulative counter) and the definition seems like the same as kafka.brokers.incoming_byte_rate and kafka.brokers.outgoing_byte_rate. We should probably suggest changing it to messaging.kafka.brokers.network.io.

@jcountsNR
Copy link
Author

'Isn't count the cumulative sum?' - New Relic views count as a tally of number of metrics rather than a sum.
For example, if 10 metrics are sent each with a value of 10, the count would be 10, and the sum would be 100. Hopefully that clarifies.

Ok, but this is not a New Relic specific component. Sums are usually preferable in OpenTelemetry because they are more resilient to data points misses. You can use https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/cumulativetodeltaprocessor to translate all the cumulative metrics to deltas.

Also, please take a look at metrics already defined in the Semantic Conventions https://github.com/open-telemetry/semantic-conventions/blob/ca881074f6df330bacdaa89372f2851c26545fee/docs/messaging/kafka.md?plain=1#L104C13-L104C29. messaging.kafka.network.io is already defined as a sum (cumulative counter) and the definition seems like the same as kafka.brokers.incoming_byte_rate and kafka.brokers.outgoing_byte_rate. We should probably suggest changing it to messaging.kafka.brokers.network.io.

Got it. I'm just catching up on this integration, but I'm thinking that count does work as a sum, and we might have misunderstood that originally.

For the second point, changing the name to messaging.kafka.brokers.network.io makes total sense to me. Will this track both incoming and outgoing byte rate though? Because we had 2 unique counters defined here.

@dmitryax
Copy link
Member

Will this track both incoming and outgoing byte rate though? Because we had 2 unique counters defined here.

Yes, the metric currently defined in the spec messaging.kafka.network.io has state attribute with in and out values

@dmitryax
Copy link
Member

There is a PR to semantic conventions originally submitted with this change open-telemetry/semantic-conventions#113. I added a couple of comments, and hopefully we will get more attention from approvers/maintainers of that project

@jcountsNR
Copy link
Author

understood, I may have to resubmit that one as well because I don't have privileges on that fork. I see your comments though, I'll make those changes and hopefully make that easy for approvers.

Just so I'm clear, what is the order of approval on both of these? Do I need to get that one approved prior to this one or the other way around?

@dmitryax
Copy link
Member

Just so I'm clear, what is the order of approval on both of these? Do I need to get that one approved prior to this one or the other way around?

Ideally, yes. Otherwise, metrics submitted here are more likely to be changed later. But feel free to work on both in parallel. This PR has some CI failures that should be fixed. We also need to start adding them to the new messaging. namespace even if other metrics are not migrated yet.

@jcountsNR
Copy link
Author

@dmitryax I had the same issue with other PR, I didn't have access to make changes to that fork so I had to create a new one. I did review your suggestions and I made the requested changes though, so hopefully this new one is in a more ready state.

open-telemetry/semantic-conventions#196

This was referenced Jul 26, 2023
@jcountsNR
Copy link
Author

@dmitryax commenting back on this one. I saw the request to run make generate in the failure logs, but in my local it's giving me an error that the executable file is not found in $PATH (mac). I'm not sure what file needs to be in the PATH so I'm having a hard time adding it.

@dmitryax
Copy link
Member

@dmitryax commenting back on this one. I saw the request to run make generate in the failure logs, but in my local it's giving me an error that the executable file is not found in $PATH (mac). I'm not sure what file needs to be in the PATH so I'm having a hard time adding it.

You don't have make installed? xcode-select --install should install all the dev tools

@jcountsNR
Copy link
Author

@dmitryax commenting back on this one. I saw the request to run make generate in the failure logs, but in my local it's giving me an error that the executable file is not found in $PATH (mac). I'm not sure what file needs to be in the PATH so I'm having a hard time adding it.

You don't have make installed? xcode-select --install should install all the dev tools

I do have it installed. It starts to run but it gives an error on mdatagen executable.

doc.go:5: running "mdatagen": exec: "mdatagen": executable file not found in $PATH

@dmitryax
Copy link
Member

Please try make install-tools first

@jcountsNR
Copy link
Author

@jcountsNR can you please add the broker metrics to the integration test?

Sorry for the delay on this one. I could not get the test to run locally, so I had to run it in EC2, hence all the commits. I added the broker metrics to the integration test, and it passes when I run it, and I verified that the metrics are being collected as well by running the collector with everything enabled. The only thing I'm missing is writing to the expected.yaml, for some reason it keeps writing a blank object, but it is working so I don't know why it's doing that.

@jcountsNR
Copy link
Author

@dmitryax , let me know if this works. I added the metrics to the test, and I also tested the collector to make sure they were coming.

// scraperinttest.WriteExpected(), // TODO remove
scraperinttest.WithCompareOptions(
// pmetrictest.IgnoreMetricValues(),
pmetrictest.IgnoreStartTimestamp(),
pmetrictest.IgnoreTimestamp(),
),
scraperinttest.WriteExpected(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this. This line is intender to be executed once locally to produce the expected.yaml file

Comment on lines +76 to +96
rCfg.Metrics.KafkaConsumerGroupLag.Enabled = true
rCfg.Metrics.KafkaConsumerGroupLagSum.Enabled = true
rCfg.Metrics.KafkaConsumerGroupMembers.Enabled = true
rCfg.Metrics.KafkaConsumerGroupOffset.Enabled = true
rCfg.Metrics.KafkaConsumerGroupOffsetSum.Enabled = true
rCfg.Metrics.KafkaPartitionCurrentOffset.Enabled = true
rCfg.Metrics.KafkaPartitionOldestOffset.Enabled = true
rCfg.Metrics.KafkaPartitionReplicas.Enabled = true
rCfg.Metrics.KafkaPartitionReplicasInSync.Enabled = true
rCfg.Metrics.KafkaTopicPartitions.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerConsumerFetchCount.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerConsumerFetchRate.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerCount.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerIncomingByteRate.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerOutgoingByteRate.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerRequestLatency.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerRequestRate.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerRequestSize.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerRequestsInFlight.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerResponseRate.Enabled = true
rCfg.Metrics.MessagingKafkaBrokerResponseSize.Enabled = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have all the metrics enabled, but the expected.yaml output doesn't have anything new. It means the integration test doesn't work some reason

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I never got it to write anything to expected.yaml, but I was getting metrics from it when I ran the collector. That does seem to indicate it's an issue in the integration test, but the only thing I've changed here is just enabling the metrics.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 18, 2023
Copy link
Contributor

github-actions bot commented Nov 2, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants