-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add k8s.{pod,node}.network.{io,errors} metrics #1427
base: main
Are you sure you want to change the base?
Conversation
5d093f4
to
a7fe139
Compare
instrument: counter | ||
unit: "By" | ||
attributes: | ||
- ref: system.device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the equivalent of interface
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is designed based on the hostmetricsreceiver which will also need to be updated to the latest SemConv once the System metrics are stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I'm not really sure if that's a good and settled decision right now: #308 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If #1492 makes it, we will use network.interface.name
here.
| [`network.io.direction`](/docs/attributes-registry/network.md) | string | The network IO operation direction. | `transmit` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | ||
| [`system.device`](/docs/attributes-registry/system.md) | string | The device identifier | `(identifier)` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save ourselves work later and to start preparing end users, can we start a list somewhere, maybe in the readme, of the breaking changes that will be included in the stable versions of these semantics?
I know this particular change is additive in this repo, but considering that the kubeletstatsreceiver has been emitting this metric with the attributes interface
and direction
since its inception this change will be very impactful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I wonder where this information would fit best. @open-telemetry/specs-semconv-maintainers @open-telemetry/specs-semconv-approvers any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, ideally, we should be able to understand breaking changes and document them in Schema URL eventually for this purpose.
The thing here is that even if those are "fresh" metrics/attributes for the SemConv project, they are different compared to what we have so far in the Collector. So technically they are not breaking changes for SemConv. Will the Schema URL approach be able to support this?
Alternatively, one thing we can do for the Collector specifically is to file a meta issue that will collect all this breaking changes.
We could also update the k8s' components that will be affected to have their documentation explicitly mentioning this breaking change meta issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually have a migration plan as a part of stabilization process - like this one
semantic-conventions/docs/database/README.md
Lines 18 to 39 in cf4ce09
> [v1.24.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/database/database-spans.md) | |
> (or prior): | |
> | |
> * SHOULD NOT change the version of the database conventions that they emit by default | |
> until the database semantic conventions are marked stable. | |
> Conventions include, but are not limited to, attributes, | |
> metric and span names, and unit of measure. | |
> * SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN` | |
> in the existing major version which is a comma-separated list of values. | |
> If the list of values includes: | |
> * `database` - emit the new, stable database conventions, | |
> and stop emitting the old experimental database conventions | |
> that the instrumentation emitted previously. | |
> * `database/dup` - emit both the old and the stable database conventions, | |
> allowing for a seamless transition. | |
> * The default behavior (in the absence of one of these values) is to continue | |
> emitting whatever version of the old experimental database conventions | |
> the instrumentation was emitting previously. | |
> * Note: `database/dup` has higher precedence than `database` in case both values are present | |
> * SHOULD maintain (security patching at a minimum) the existing major version | |
> for at least six months after it starts emitting both sets of conventions. | |
> * SHOULD drop the environment variable in the next major version. |
In a few words
- the instrumentation should keep doing what it's doing now (following semconv or not)
- it should have an option to enable new conventions (for collector I'd assume it won't be an env var, but it'd make sense to have a consistent way of opting in across different receivers).
- instrumentation can drop old conventions or just switch to a new ones by default with the next major version.
We've been also creating migration guides like these https://github.com/open-telemetry/semantic-conventions/blob/main/docs/non-normative/http-migration.md or https://github.com/open-telemetry/semantic-conventions/blob/main/docs/non-normative/db-migration.md which outline the list of changes.
You can probably do something similar and just mention undocumented (in semconv) old attributes there along with their replacements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank's for the information @lmolkova! If I understand correctly could we create a similar migration plan file for K8s already and start collecting the diffs there?
If I understood @TylerHelmuth initial comment here, we would like to start informing users already about such changes/diffs even if we are far from stability. In this way users could already be prepared about what changes to expect.
model/k8s/metrics.yaml
Outdated
type: metric | ||
metric_name: k8s.node.network.io | ||
stability: experimental | ||
brief: "Node network IO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please elaborate? Is it a number of bytes transmitted by the node?
type: metric | ||
metric_name: k8s.node.network.errors | ||
stability: experimental | ||
brief: "Node network errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this one? Is there some external documentation we can link, what kinds of errors does it cover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best external reference we can find is from the Kubelet's stats API: https://github.com/kubernetes/kubernetes/blob/v1.31.1/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L189-L204
unit: "{error}" | ||
attributes: | ||
- ref: system.device | ||
- ref: network.io.direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any information about the error we can include? We have a very vague error.type
attribute intended for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have any information about the errors' types. It seems these are aggregated errors over a specific interface: https://github.com/kubernetes/kubernetes/blob/v1.31.1/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go#L189-L204
type: metric | ||
metric_name: k8s.pod.network.errors | ||
stability: experimental | ||
brief: "Pod network errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to the node errors - could you please elaborate on what it means?
unit: "{error}" | ||
attributes: | ||
- ref: system.device | ||
- ref: network.io.direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we report any meaningful error.type
here?
model/k8s/metrics.yaml
Outdated
type: metric | ||
metric_name: k8s.pod.network.io | ||
stability: experimental | ||
brief: "Pod network IO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brief: "Pod network IO" | |
brief: "Number of bytes transmitted by the pod"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only transmitted but also received. That's why I guess we used IO
in the kubeletstats' docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Is there something more elaborate we can say than "Pod network IO" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it Network bytes for the Pod
. Guess it's more descriptive?
Signed-off-by: ChrsMark <[email protected]>
Signed-off-by: ChrsMark <[email protected]>
ddd42d2
to
c7f3653
Compare
Part of #1032
Fixes #1487
Changes
This PR adds the following metrics:
k8s.pod.network.io
k8s.pod.network.errors
k8s.node.network.io
k8s.node.network.errors
The above metrics are already used in the Collector in the kubeletstats receiver and the naming is aligned with the System metrics.
These metrics come with the following attributes
network.io.direction
andsystem.device
which are already part of SemConv registry and are used in the System metrics. The attributes in the Collector will need to be adjusted to follow the recent SemConv rules.Merge requirement checklist
[chore]