fix: Serialize access to pod logs #53

Merged

merged 1 commit into main from fix/serialize-logs on Jul 14, 2024
Conversation

meyfa (Member) commented on Jul 14, 2024

Pod logs are a major performance bottleneck, both due to enormous data transfers from the Kubernetes API and due to formatting being expensive. Before this patch, if multiple clients requested the same log simultaneously, these expensive operations were done multiple times in parallel. With this patch, mutexes are added around log retrieval and log formatting such that only one such operation can execute at any time.

In the multi-client scenario, the client arriving first will lock the mutex and retrieve the logs. When this client unlocks the mutex and another client is allowed through, the logs will already be cached and the request returns immediately.
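
For illustration, here is a minimal sketch of this pattern in TypeScript using the async-mutex package; the identifiers (`getPodLogs`, `fetchLogsFromKubernetes`, `formatLogs`, the cache and its TTL) are hypothetical placeholders and not taken from this repository:

```typescript
import { Mutex } from 'async-mutex'

// Placeholder stand-ins for the real Kubernetes client call and formatter
// used by the project (names are hypothetical).
async function fetchLogsFromKubernetes (podName: string): Promise<string> {
  return `...raw logs for ${podName}...`
}

function formatLogs (raw: string): string {
  return raw.trim()
}

const logMutex = new Mutex()
const logCache = new Map<string, { text: string, fetchedAt: number }>()
const CACHE_TTL_MS = 10_000

async function getPodLogs (podName: string): Promise<string> {
  // Only one retrieval + formatting pass runs at any time; concurrent
  // callers queue on the mutex instead of duplicating the work.
  return await logMutex.runExclusive(async () => {
    const cached = logCache.get(podName)
    if (cached !== undefined && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
      // A client that queued behind the first request finds the cache
      // already warm and returns immediately.
      return cached.text
    }
    const raw = await fetchLogsFromKubernetes(podName) // expensive transfer
    const formatted = formatLogs(raw)                  // expensive CPU work
    logCache.set(podName, { text: formatted, fetchedAt: Date.now() })
    return formatted
  })
}
```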

Performance tests using the `ab` CLI tool confirm this: one of the requests still takes the same time as before, but all other requests complete much faster.

Memory requirements are reduced from O(client count) to O(1).

If multiple clients each request a different log, the worst-case response time may be slightly worse, but probably not by much on average: requesting logs from Kubernetes in parallel is likely no faster than requesting them in series due to storage or network bottlenecks, and since Foreman is single-threaded, formatting in parallel is not faster either.

Additional Context

N/A

Checklist

  • The pull request title meets the Conventional Commits specification and optionally includes the scope, for example: feat: Add social login

meyfa requested a review from a team as a code owner on July 14, 2024 15:41
lusu007 merged commit 9cbc552 into main on Jul 14, 2024
5 checks passed
lusu007 deleted the fix/serialize-logs branch on July 14, 2024 15:49