Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit process metrics #334

Open
braydonk opened this issue Sep 20, 2023 · 10 comments
Open

Revisit process metrics #334

braydonk opened this issue Sep 20, 2023 · 10 comments

Comments

@braydonk
Copy link
Contributor

Pursuant to #330 which should be merged soon, I am opening this issue to revisit the process metrics schema while we are making breaking changes anyway.

As of writing, the only changes to the process schema will be to namespace every attribute. There are a couple of changes that should be discussed among the Hostmetrics WG to improve the schema.

Following are the current points up for discussion:

  • Should the attributes be considered exhaustive?
  • process.threads -> process.thread.count
  • process.context_switches -> process.context_switch.count
  • process.network.io.direction, transmit -> send
  • process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

Please comment if you have any additional points to discuss. I will comment my own opinion on each below so I don't bias the root issue post with my own opinions. If any of these need a larger discussion I will spin them into child issues.

@braydonk
Copy link
Contributor Author

braydonk commented Sep 20, 2023

Should the attributes be considered exhaustive?

I think each attribute in the current process metrics schema is exhaustive:

  • both direction attributes have their respective in and out equivalents
  • process.paging.faults.type, is there such thing other than a major or minor page fault?
  • process.context_switches.type, voluntary and involuntary on their own are grammatically exhaustive, and I can't imagine any other value for this.
  • process.cpu.state is the only one where it could be up in the air for some super niche edge case; I can't think of any CPU state other than the existing system, user, and wait personally but would love to learn if there is something

@braydonk
Copy link
Contributor Author

braydonk commented Sep 20, 2023

process.threads -> process.thread.count

This was discussed in https://github.com/open-telemetry/semantic-conventions/pull/330/files#r1330094680

While I think it makes sense to merge with ECS where possible, I'm not sold on the rename to process.threads.count for 2 reasons:

Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.

@braydonk
Copy link
Contributor Author

process.context_switches -> process.context_switch.count

This is related to #260 guidance; this is a monotonic counter, so process.context_switch.count seems like a logical progression of this metric name to match with that.

@braydonk
Copy link
Contributor Author

process.network.io.direction, transmit -> send

This is mostly my own idea, I'm not married to it. However, I think send is a better attribute value than transmit for two reasons:

  • It's a closer logical opposite to receive
  • It matches with existing nomenclature from socket APIs

@braydonk
Copy link
Contributor Author

braydonk commented Sep 20, 2023

process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

This metric is computed from process.cpu.time and (insert name of CPU count metric here). Is there some better way to indicate within the semconv generation that this metric is indeed intended to be computed from other known values? At the moment it's only outlined in the description, and not from any actual semantic convention terminology.

@ChrsMark
Copy link
Member

ChrsMark commented Dec 7, 2023

Every metric in the process.threads namespace in the linked ECS docs all seem to give information about a single process thread "resource" so to speak. This metric is about the thread count of a process "resource", so I fear this metric being in a process.thread namespace is confusing.

I see the point @braydonk ! I agree that process.threads.count (violates the namespace pluralization rule) or process.threads is the accurate one in that case.

@AlexanderWert @trisch-me feel free to add your input here.

@trisch-me
Copy link
Contributor

I also do agree that existing ECS schema for process.thread is about thread itself and process.threads is about process itself having multiple threads and therefore they are having different semantic meaning. process.thread already exist in ECS and therefore we should be careful when introducing new naming conflicting with it.

Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.

We are currently actively adding new fields from ECS into Otel. Please also check this PR about consideration when adding new fields which might exist already in ECS

@mx-psi
Copy link
Member

mx-psi commented Jan 18, 2024

Discussed on January 18th System Semantic Conventions WG meeting, @braydonk to update after discussion in #330

@dmitryax
Copy link
Member

dmitryax commented May 9, 2024

@braydonk are we good to proceed on this issue or it's still blocked?

@braydonk
Copy link
Contributor Author

braydonk commented May 10, 2024

Yeah this should be good to proceed. Here's the current state of things:

Decisions Made

process.threads -> process.thread.count

Yes
It is an updowncounter so this new name follows the naming rule, done in #330.

process.context_switches -> process.context_switch.count

No
It is a counter so it remains as is to follow the naming rule.

process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

No
As long as I'm not mistaken, there isn't a tooling solution for this and the best way seems to be that this is called out in the description of the metric. Currently it sort of is:

brief:
Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs
available to the process.

Should process.cpu.state be exhaustive?

No
The metric is being merged into a single cpu.state metric, where the list of possible values will be wider than the 3 expected to be used in the process metrics. See #1026.

No Decision Yet

Should the other attributes in this comment be made exhaustive?

I'm tempted to say no since these metrics have now been moved to the registry and it makes less sense to be prescriptive even though these metrics will likely only have these values when used in process metrics.

network.io.direction value transmit -> send

I think I still agree with my reasoning in the comment above, but I'm starting to lean towards not rocking the boat and leaving that where it is. I think I'm splitting hairs too much with it and am fine to leave it as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

7 participants