Add k8s.{pod,node}.network.{io,errors} metrics #1427

ChrsMark · 2024-09-24T07:50:48Z

Part of #1032

Changes

This PR adds the following metrics:

k8s.pod.network.io
k8s.pod.network.errors
k8s.node.network.io
k8s.node.network.errors

The above metrics are already used in the Collector in the kubeletstats receiver and the naming is aligned with the System metrics.

These metrics come with the attributes network.io.direction and system.device, which are already part of the SemConv registry and are used in the System metrics. The corresponding attributes in the Collector will need to be adjusted to follow the recent SemConv rules.

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
schema-next.yaml updated with changes to existing conventions.

TylerHelmuth · 2024-09-30T14:46:29Z

model/k8s/metrics.yaml

+    instrument: counter
+    unit: "By"
+    attributes:
+      - ref: system.device


Is this the equivalent of interface?

Yes, it is modeled on the hostmetricsreceiver, which will also need to be updated to the latest SemConv once the System metrics are stable.

However, I'm not really sure if that's a good and settled decision right now: #308 (comment)

If #1492 makes it, we will use network.interface.name here.

TylerHelmuth · 2024-09-30T14:52:58Z

docs/system/k8s-metrics.md

+| [`network.io.direction`](/docs/attributes-registry/network.md) | string | The network IO operation direction. | `transmit` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`system.device`](/docs/attributes-registry/system.md) | string | The device identifier | `(identifier)` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |


To save ourselves work later and to start preparing end users, can we start a list somewhere, maybe in the readme, of the breaking changes that will be included in the stable versions of these semantics?

I know this particular change is additive in this repo, but considering that the kubeletstatsreceiver has been emitting this metric with the attributes interface and direction since its inception, this change will be very impactful for Collector users.
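To make the impact concrete, here is a minimal sketch of the attribute rename a Collector user would see. The `interface` to `system.device` correspondence is confirmed earlier in this thread; the `direction` to `network.io.direction` pairing is an assumption based on the names, and `ATTRIBUTE_RENAMES` and `rename_attributes` are hypothetical names used only for illustration. If #1492 lands, the target for `interface` would change.

```python
# Hypothetical old -> new attribute names (not an official mapping).
ATTRIBUTE_RENAMES = {
    "interface": "system.device",
    "direction": "network.io.direction",  # assumed pairing
}


def rename_attributes(attrs: dict) -> dict:
    """Return a copy of attrs with old Collector keys replaced by SemConv keys.

    Keys that are not in ATTRIBUTE_RENAMES are passed through unchanged.
    """
    return {ATTRIBUTE_RENAMES.get(key, key): value for key, value in attrs.items()}


old = {"interface": "eth0", "direction": "transmit"}
print(rename_attributes(old))
```

Any dashboard or alert that filters on the old keys would need the same translation, which is why an early list of such diffs seems useful.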

Good idea. I wonder where this information would fit best. @open-telemetry/specs-semconv-maintainers @open-telemetry/specs-semconv-approvers any suggestions?

I think, ideally, we should be able to understand breaking changes and document them in Schema URL eventually for this purpose.

We don't have a good mechanism right now to do that, but @trask and @lmolkova have the most experience having stabilized HTTP and soon, DB.

> I think, ideally, we should be able to understand breaking changes and document them in Schema URL eventually for this purpose.

The thing here is that even if those are "fresh" metrics/attributes for the SemConv project, they are different compared to what we have so far in the Collector. So technically they are not breaking changes for SemConv. Will the Schema URL approach be able to support this?

Alternatively, one thing we can do for the Collector specifically is to file a meta issue that collects all these breaking changes.
We could also update the documentation of the affected k8s components to explicitly mention this breaking-change meta issue.

We usually have a migration plan as part of the stabilization process, like this one:

semantic-conventions/docs/database/README.md

Lines 18 to 39 in cf4ce09

> [v1.24.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/database/database-spans.md)
> (or prior):
>
> * SHOULD NOT change the version of the database conventions that they emit by default
>   until the database semantic conventions are marked stable.
>   Conventions include, but are not limited to, attributes,
>   metric and span names, and unit of measure.
> * SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN`
>   in the existing major version which is a comma-separated list of values.
>   If the list of values includes:
>   * `database` - emit the new, stable database conventions,
>     and stop emitting the old experimental database conventions
>     that the instrumentation emitted previously.
>   * `database/dup` - emit both the old and the stable database conventions,
>     allowing for a seamless transition.
>   * The default behavior (in the absence of one of these values) is to continue
>     emitting whatever version of the old experimental database conventions
>     the instrumentation was emitting previously.
>   * Note: `database/dup` has higher precedence than `database` in case both values are present
> * SHOULD maintain (security patching at a minimum) the existing major version
>   for at least six months after it starts emitting both sets of conventions.
> * SHOULD drop the environment variable in the next major version.

In a few words:

- The instrumentation should keep doing what it's doing now (following semconv or not).
- It should have an option to enable the new conventions (for the Collector I'd assume it won't be an env var, but it'd make sense to have a consistent way of opting in across different receivers).
- The instrumentation can drop the old conventions, or just switch to the new ones by default, with the next major version.
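The opt-in rules quoted above can be sketched roughly as follows. This is only an illustration of the precedence logic, not code from any SDK or from the Collector: `resolve_semconv_mode` and its return labels (`"old"`, `"new"`, `"dup"`) are hypothetical, and the function takes the raw comma-separated string as an argument so that it does not matter whether the value comes from `OTEL_SEMCONV_STABILITY_OPT_IN` or from some receiver-level setting.

```python
from typing import Optional


def resolve_semconv_mode(opt_in: Optional[str], category: str = "database") -> str:
    """Decide which conventions to emit for one category.

    Returns "dup" (emit old and new), "new" (emit only the stable
    conventions), or "old" (default: keep emitting what was emitted before).
    """
    # Split the comma-separated list, ignoring whitespace and empty entries.
    values = {item.strip() for item in (opt_in or "").split(",") if item.strip()}
    # "<category>/dup" has higher precedence than "<category>".
    if f"{category}/dup" in values:
        return "dup"
    if category in values:
        return "new"
    return "old"


print(resolve_semconv_mode("database,database/dup"))
```

A K8s equivalent would presumably use its own category value; which value that would be is not decided in this thread.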

We've also been creating migration guides like these: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/non-normative/http-migration.md or https://github.com/open-telemetry/semantic-conventions/blob/main/docs/non-normative/db-migration.md, which outline the list of changes.

You can probably do something similar and just mention undocumented (in semconv) old attributes there along with their replacements.

Thanks for the information @lmolkova! If I understand correctly, could we create a similar migration plan file for K8s already and start collecting the diffs there?

If I understood @TylerHelmuth's initial comment here correctly, we would like to start informing users about such changes/diffs already, even if we are far from stability. That way users would know in advance what changes to expect.

lmolkova · 2024-10-11T17:39:11Z

model/k8s/metrics.yaml

+    type: metric
+    metric_name: k8s.node.network.io
+    stability: experimental
+    brief: "Node network IO"


could you please elaborate? Is it a number of bytes transmitted by the node?

lmolkova · 2024-10-11T17:40:23Z

model/k8s/metrics.yaml

+    type: metric
+    metric_name: k8s.node.network.errors
+    stability: experimental
+    brief: "Node network errors"


Could you elaborate on this one? Is there some external documentation we can link, what kinds of errors does it cover?

The best external reference we can find is from the Kubelet's stats API: https://github.com/kubernetes/kubernetes/blob/v1.31.1/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L189-L204

lmolkova · 2024-10-11T17:41:00Z

model/k8s/metrics.yaml

+    unit: "{error}"
+    attributes:
+      - ref: system.device
+      - ref: network.io.direction


is there any information about the error we can include? We have a very vague error.type attribute intended for this purpose.

I don't think we have any information about the errors' types. It seems these are aggregated errors over a specific interface: https://github.com/kubernetes/kubernetes/blob/v1.31.1/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L189-L204
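As a rough illustration of how the aggregated per-interface counters could map onto the four metrics in this PR, here is a minimal sketch. The JSON field names (`name`, `rxBytes`, `txBytes`, `rxErrors`, `txErrors`) are written from memory of the linked `InterfaceStats` type and should be checked against that link; `interface_to_points` and the tuple layout are hypothetical. No `error.type` is set, since the source only exposes totals.

```python
# (stats field, metric suffix, network.io.direction value)
FIELDS = [
    ("rxBytes", "io", "receive"),
    ("txBytes", "io", "transmit"),
    ("rxErrors", "errors", "receive"),
    ("txErrors", "errors", "transmit"),
]


def interface_to_points(scope: str, iface: dict) -> list:
    """Convert one interface entry into (metric name, value, attributes) tuples.

    scope is "pod" or "node". Counters missing from the entry are skipped.
    """
    points = []
    for field, kind, direction in FIELDS:
        value = iface.get(field)
        if value is None:
            continue
        attributes = {
            "system.device": iface["name"],
            "network.io.direction": direction,
        }
        points.append((f"k8s.{scope}.network.{kind}", value, attributes))
    return points


for point in interface_to_points("pod", {"name": "eth0", "rxBytes": 100, "txErrors": 2}):
    print(point)
```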

lmolkova · 2024-10-11T17:41:44Z

model/k8s/metrics.yaml

+    type: metric
+    metric_name: k8s.pod.network.errors
+    stability: experimental
+    brief: "Pod network errors"


similar to the node errors - could you please elaborate on what it means?

lmolkova · 2024-10-11T17:41:58Z

model/k8s/metrics.yaml

+    unit: "{error}"
+    attributes:
+      - ref: system.device
+      - ref: network.io.direction


can we report any meaningful error.type here?

lmolkova · 2024-10-11T17:42:25Z

model/k8s/metrics.yaml

+    type: metric
+    metric_name: k8s.pod.network.io
+    stability: experimental
+    brief: "Pod network IO"


Suggested change

-    brief: "Pod network IO"
+    brief: "Number of bytes transmitted by the pod"?

It's not only transmitted but also received. That's why I guess we used IO in the kubeletstats' docs.

Makes sense! Is there something more elaborate we can say than "Pod network IO" ?

I made it "Network bytes for the Pod". Guess it's more descriptive?

Signed-off-by: ChrsMark <[email protected]>

ChrsMark requested review from a team as code owners September 24, 2024 07:50

github-actions bot assigned joaopgrassi Sep 24, 2024

ChrsMark mentioned this pull request Sep 24, 2024

Define semantic conventions for k8s metrics #1032

Open

ChrsMark force-pushed the add_k8s_network_metrics branch 2 times, most recently from 5d093f4 to a7fe139 Compare September 24, 2024 08:11

TylerHelmuth reviewed Sep 30, 2024

ChrsMark mentioned this pull request Oct 3, 2024

Revisit system network metrics attributes #308

Open

lmolkova reviewed Oct 11, 2024

ChrsMark requested review from a team and tigrannajaryan as code owners October 18, 2024 06:58

ChrsMark added 2 commits October 18, 2024 09:59

Add k8s.{pod,node}.network.{io,errors} metrics

2ef9907

Signed-off-by: ChrsMark <[email protected]>

improve description

c7f3653

Signed-off-by: ChrsMark <[email protected]>

ChrsMark force-pushed the add_k8s_network_metrics branch from ddd42d2 to c7f3653 Compare October 18, 2024 06:59

ChrsMark requested review from a team and removed request for a team and tigrannajaryan October 18, 2024 06:59
