Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azkeys: each operation always emits 2 traces, the first one being a 401 failure #23480

Open
ItalyPaleAle opened this issue Sep 23, 2024 · 5 comments

Comments

@ItalyPaleAle
Copy link
Member

Bug Report

  • Relevant packages:
    • github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/azkeys @ v1.1.0
    • github.com/Azure/azure-sdk-for-go/sdk/azcore/tracing @ v1.14.0
    • github.com/Azure/azure-sdk-for-go/sdk/tracing/azotel @ v0.4.0
  • Go version: 1.23.0

I have an app that makes calls to Azure Key Vault's data plane using the azkeys SDK. These calls include wrapping and unwrapping keys using key material stored in the AKV.

Calls to AKV are authenticated using tokens issued by Azure AD. AKV is configured to use Azure RBAC for authz.

I have enabled tracing using OpenTelemetry, through the official "tracing/azotel" SDK.

When I look at the traces, I see that every time my app makes a request to AKV, there are 2 traces:

  • The first one fails with 401
  • The second one succeeds

The operation overall completes successfully, and I understand by reading from issues such as Azure/azure-sdk-for-net#4914 that the 401 failures are expected and are due to how data plane authentication for AKV works.

However, while the SDK itself hides the exceptions so they are not logged, it does not hide the traces containing the failed requests. Because of that, when I look at the traces for my app, I see a lot of failures which are "false positives" and cause noise.

Screenshot 2024-09-23 at 07 56 52
@github-actions github-actions bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Sep 23, 2024
@jhendrixMSFT
Copy link
Member

@lmolkova should we be suppressing the intermediate HTTP traces that are part of challenge auth?

@jhendrixMSFT jhendrixMSFT added Azure.Core and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Sep 23, 2024
@lmolkova
Copy link
Member

Traces show the reality - the first request fails and that's why the second one is made. If it happens on every request, it's a good indication that SDK does not pre-populate cached token on the Authentication header.

Instead of hiding it from users (including additional source of latency and possible SDK bugs) we should optimize the SDK and make it reuse tokens. We'd reduce app latency and load on the backend service.

@ItalyPaleAle
Copy link
Member Author

If it happens on every request

It does appear on every request, at least in my app.

I agree making it re-use tokens would be good.

I agree also that hiding the trace may be confusing. However, if the error is expected, could there be a way to not make it look like a failed request?
If there's a failure in one trace, tools could highlight the error as a false positive, causing noise (and possibly incorrect alerts)

@lmolkova
Copy link
Member

lmolkova commented Sep 23, 2024

could there be a way to not make it look like a failed request?

yes, that should be possible. Core tracing policy needs some extra context from the upstream that would help it classify something as not an error.

E.g. bearer token policy can express that 401 (on the first attempt) is not an error.
Methods like createIfNotExists in the client libs can express that 409 or similar is not an error.

However this is a tricky place. Application developers should look into the logical spans/metrics to get the failure rate (after all auth/tries) - there are always tons of transient issues in the high-scale network applications and if you look at them you see too much noise.

@lmolkova
Copy link
Member

That's the .NET API that helps with classification that's reused between tracing and logs https://learn.microsoft.com/en-us/dotnet/api/azure.core.responseclassifier?view=azure-dotnet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants