Revisit process metrics #334

braydonk · 2023-09-20T14:32:35Z

Pursuant to #330 which should be merged soon, I am opening this issue to revisit the process metrics schema while we are making breaking changes anyway.

As of writing, the only changes to the process schema will be to namespace every attribute. There are a couple of changes that should be discussed among the Hostmetrics WG to improve the schema.

Following are the current points up for discussion:

Should the attributes be considered exhaustive?
process.threads -> process.thread.count
process.context_switches -> process.context_switch.count
process.network.io.direction, transmit -> send
process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

Please comment if you have any additional points to discuss. I will comment my own opinion on each below so I don't bias the root issue post with my own opinions. If any of these need a larger discussion I will spin them into child issues.

The text was updated successfully, but these errors were encountered:

braydonk · 2023-09-20T14:36:10Z

Should the attributes be considered exhaustive?

I think each attribute in the current process metrics schema is exhaustive:

both direction attributes have their respective in and out equivalents
process.paging.faults.type, is there such thing other than a major or minor page fault?
process.context_switches.type, voluntary and involuntary on their own are grammatically exhaustive, and I can't imagine any other value for this.
process.cpu.state is the only one where it could be up in the air for some super niche edge case; I can't think of any CPU state other than the existing system, user, and wait personally but would love to learn if there is something

braydonk · 2023-09-20T14:40:49Z

`process.threads` -> `process.thread.count`

This was discussed in https://github.com/open-telemetry/semantic-conventions/pull/330/files#r1330094680

While I think it makes sense to merge with ECS where possible, I'm not sold on the rename to process.threads.count for 2 reasons:

Every metric in the process.threads namespace in the linked ECS docs all seem to give information about a single process thread "resource" so to speak. This metric is about the thread count of a process "resource", so I fear this metric being in a process.thread namespace is confusing.
The guidance in Analyze naming conventions for (monotonic) counter metrics #260 could be relevant for this, however this is a non-monotonic counter. Does the guidance in Analyze naming conventions for (monotonic) counter metrics #260 apply in the negative sense as well, where metrics should only be named count iff they are a monotonic counter?

Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.

braydonk · 2023-09-20T14:41:55Z

`process.context_switches` -> `process.context_switch.count`

This is related to #260 guidance; this is a monotonic counter, so process.context_switch.count seems like a logical progression of this metric name to match with that.

braydonk · 2023-09-20T14:48:42Z

`process.network.io.direction`, `transmit` -> `send`

This is mostly my own idea, I'm not married to it. However, I think send is a better attribute value than transmit for two reasons:

It's a closer logical opposite to receive
It matches with existing nomenclature from socket APIs

braydonk · 2023-09-20T14:50:39Z

`process.cpu.utilization`, do we have a better way to indicate that this metric is intended to be computed from another?

This metric is computed from process.cpu.time and (insert name of CPU count metric here). Is there some better way to indicate within the semconv generation that this metric is indeed intended to be computed from other known values? At the moment it's only outlined in the description, and not from any actual semantic convention terminology.

ChrsMark · 2023-12-07T14:30:36Z

Every metric in the process.threads namespace in the linked ECS docs all seem to give information about a single process thread "resource" so to speak. This metric is about the thread count of a process "resource", so I fear this metric being in a process.thread namespace is confusing.

I see the point @braydonk ! I agree that process.threads.count (violates the namespace pluralization rule) or process.threads is the accurate one in that case.

@AlexanderWert @trisch-me feel free to add your input here.

trisch-me · 2023-12-07T16:36:13Z

I also do agree that existing ECS schema for process.thread is about thread itself and process.threads is about process itself having multiple threads and therefore they are having different semantic meaning. process.thread already exist in ECS and therefore we should be careful when introducing new naming conflicting with it.

Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.

We are currently actively adding new fields from ECS into Otel. Please also check this PR about consideration when adding new fields which might exist already in ECS

mx-psi · 2024-01-18T16:38:08Z

Discussed on January 18th System Semantic Conventions WG meeting, @braydonk to update after discussion in #330

dmitryax · 2024-05-09T14:46:14Z

@braydonk are we good to proceed on this issue or it's still blocked?

braydonk · 2024-05-10T12:32:53Z

Yeah this should be good to proceed. Here's the current state of things:

Decisions Made

process.threads -> process.thread.count

Yes
It is an updowncounter so this new name follows the naming rule, done in #330.

process.context_switches -> process.context_switch.count

No
It is a counter so it remains as is to follow the naming rule.

process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

No
As long as I'm not mistaken, there isn't a tooling solution for this and the best way seems to be that this is called out in the description of the metric. Currently it sort of is:

semantic-conventions/model/metrics/process-metrics.yaml

Lines 19 to 21 in c316b1f

    
           brief: 
        
             Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs 
        
             available to the process.

Should process.cpu.state be exhaustive?

No
The metric is being merged into a single cpu.state metric, where the list of possible values will be wider than the 3 expected to be used in the process metrics. See #1026.

No Decision Yet

Should the other attributes in this comment be made exhaustive?

I'm tempted to say no since these metrics have now been moved to the registry and it makes less sense to be prescriptive even though these metrics will likely only have these values when used in process metrics.

network.io.direction value transmit -> send

I think I still agree with my reasoning in the comment above, but I'm starting to lean towards not rocking the boat and leaving that where it is. I think I'm splitting hairs too much with it and am fine to leave it as is.

github-actions bot assigned arminru Sep 20, 2023

ChrsMark mentioned this issue Dec 7, 2023

Add additional process attributes to registry #564

Merged

3 tasks

mx-psi added the system-semconv-wg-ga-blocker label Jan 18, 2024

github-actions bot added the Stale label Feb 13, 2024

joaopgrassi removed the Stale label Feb 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisit process metrics #334

Revisit process metrics #334

braydonk commented Sep 20, 2023

braydonk commented Sep 20, 2023 •

edited

Loading

braydonk commented Sep 20, 2023 •

edited

Loading

braydonk commented Sep 20, 2023

braydonk commented Sep 20, 2023

braydonk commented Sep 20, 2023 •

edited

Loading

ChrsMark commented Dec 7, 2023

trisch-me commented Dec 7, 2023

mx-psi commented Jan 18, 2024

dmitryax commented May 9, 2024

braydonk commented May 10, 2024 •

edited

Loading

Revisit process metrics #334

Revisit process metrics #334

Comments

braydonk commented Sep 20, 2023

braydonk commented Sep 20, 2023 • edited Loading

Should the attributes be considered exhaustive?

braydonk commented Sep 20, 2023 • edited Loading

process.threads -> process.thread.count

braydonk commented Sep 20, 2023

process.context_switches -> process.context_switch.count

braydonk commented Sep 20, 2023

process.network.io.direction, transmit -> send

braydonk commented Sep 20, 2023 • edited Loading

process.cpu.utilization, do we have a better way to indicate that this metric is intended to be computed from another?

ChrsMark commented Dec 7, 2023

trisch-me commented Dec 7, 2023

mx-psi commented Jan 18, 2024

dmitryax commented May 9, 2024

braydonk commented May 10, 2024 • edited Loading

Decisions Made

No Decision Yet

braydonk commented Sep 20, 2023 •

edited

Loading

braydonk commented Sep 20, 2023 •

edited

Loading

`process.threads` -> `process.thread.count`

`process.context_switches` -> `process.context_switch.count`

`process.network.io.direction`, `transmit` -> `send`

braydonk commented Sep 20, 2023 •

edited

Loading

`process.cpu.utilization`, do we have a better way to indicate that this metric is intended to be computed from another?

braydonk commented May 10, 2024 •

edited

Loading