
Recommended histogram bucket sizes for HTTP connection duration #336

Open
JamesNK opened this issue Sep 21, 2023 · 3 comments

@JamesNK

JamesNK commented Sep 21, 2023

The http.server.request.duration histogram recommends bucket sizes:

[`ExplicitBucketBoundaries`](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/metrics/api.md#instrument-advice)
of `[ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ]`.

A server library has a histogram to track HTTP connection duration. It should have defined bucket sizes, but I'm unsure what values to set. The HTTP request duration buckets are too small: a connection could last minutes, hours, or even days.
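
A minimal Python sketch of the problem, assuming the upper-inclusive bucket layout that explicit-bucket histograms use in OpenTelemetry (`(-inf, b0]`, `(b0, b1]`, ..., `(b_last, +inf)`). With the request-duration boundaries, any connection longer than 10 seconds lands in the single overflow bucket, so minutes, hours and days all look the same. The helper names and sample lifetimes are illustrative only.

```python
from bisect import bisect_left

# Boundaries (seconds) recommended for http.server.request.duration, as quoted above.
REQUEST_DURATION_BOUNDARIES = [0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1,
                               0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]


def bucket_index(boundaries, value):
    """Bucket index for `value`, assuming upper-inclusive buckets:
    0 -> (-inf, b[0]], i -> (b[i-1], b[i]], len(b) -> (b[-1], +inf)."""
    return bisect_left(boundaries, value)


def is_overflow(boundaries, value):
    """True if `value` falls into the final, unbounded bucket."""
    return bucket_index(boundaries, value) == len(boundaries)


# Hypothetical connection lifetimes: 30 s, 5 min, 1 h, 1 day.
for seconds in [30, 300, 3600, 86400]:
    print(seconds, is_overflow(REQUEST_DURATION_BOUNDARIES, seconds))
```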

Is there any agreement in the OTel ecosystem about good histogram buckets for HTTP connection duration (or for longer-running tasks in general)?

@trask
Member

trask commented Sep 21, 2023

related: #316

@samsp-msft

I am suggesting:
[0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]

in open-telemetry/opentelemetry-dotnet#4922

It uses an approximately 2x step between buckets (a 1-2-5 sequence up to 10 seconds), with minute-aligned values at the end.
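
A small Python check of that growth pattern, computing the ratio between consecutive non-zero boundaries of the proposed list. The ratios stay between 2 and 3; the single 3x step (10 to 30) is where the list switches to minute-aligned values. The function name is illustrative.

```python
# Boundaries (seconds) proposed above for HTTP connection duration.
PROPOSED = [0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]


def growth_ratios(boundaries):
    """Ratio between each pair of consecutive non-zero boundaries, rounded to 2 places."""
    nonzero = [b for b in boundaries if b > 0]
    return [round(hi / lo, 2) for lo, hi in zip(nonzero, nonzero[1:])]


ratios = growth_ratios(PROPOSED)
print(ratios)
print("min:", min(ratios), "max:", max(ratios))
```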

It doesn't go up to hours, but the main benefit of longer connections is that you don't pay the connection setup cost on every request. Once a connection lasts on the order of minutes, the incremental benefit of keeping it open longer diminishes rapidly. This should be a good balance.
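
The diminishing-returns argument can be made concrete with some back-of-the-envelope arithmetic. This is a sketch under loudly hypothetical numbers: a 50 ms setup cost (e.g. TCP plus TLS handshake) and a steady one request per second on the connection. The amortized setup cost per request is `setup / (rate * lifetime)`. Going from 1 s to 60 s saves tens of milliseconds per request, while going from 5 minutes to an hour saves well under a millisecond.

```python
SETUP_COST_MS = 50.0        # hypothetical connection setup cost (TCP + TLS handshake)
REQUESTS_PER_SECOND = 1.0   # hypothetical steady request rate on one connection


def amortized_setup_ms(lifetime_seconds):
    """Setup cost per request when one connection serves every request in its lifetime."""
    requests = max(1.0, REQUESTS_PER_SECOND * lifetime_seconds)
    return SETUP_COST_MS / requests


for lifetime in [1, 10, 60, 300, 3600]:
    print(f"{lifetime:>5} s -> {amortized_setup_ms(lifetime):.3f} ms/request")
```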

@JamesNK
Author

JamesNK commented Oct 7, 2023

Up to 300 seconds is much better than 10 seconds. I think there are situations where a connection could live quite a long time, for example WebSockets in the browser (e.g. SignalR) and server-to-server scenarios where a client is reused for a long time.

I removed some of the smaller values and added capacity for up to an hour.

Before:
[0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]

After:
[0, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600]
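
A Python sketch comparing the two lists, again assuming upper-inclusive buckets. Both lists have 15 boundaries (16 buckets), so the change keeps the bucket count the same and trades sub-second resolution for coverage up to an hour. The sample durations (30 ms, 15 min, 45 min) and the helper name are hypothetical.

```python
from bisect import bisect_left

BEFORE = [0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]
AFTER = [0, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600]


def bucket_bounds(boundaries, value):
    """(lower, upper) of the bucket holding `value`; upper is inf for the overflow bucket."""
    i = bisect_left(boundaries, value)
    lower = float("-inf") if i == 0 else boundaries[i - 1]
    upper = float("inf") if i == len(boundaries) else boundaries[i]
    return lower, upper


for seconds in [0.03, 900, 2700]:
    print(seconds, "before:", bucket_bounds(BEFORE, seconds), "after:", bucket_bounds(AFTER, seconds))
```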

TBH I'm not sure exactly where most connection lifetimes end up. I would be ok with tracking up to 300 seconds and then adjusting if needed.

Update:
ASP.NET Core Kestrel is using: [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]
