fix: Serialize access to pod logs #53

Merged

merged 1 commit into main from fix/serialize-logs on Jul 14, 2024
Conversation

meyfa (Member) commented on Jul 14, 2024

Pod logs are a major performance bottleneck, both due to enormous data transfers from the Kubernetes API and due to formatting being expensive. Before this patch, if multiple clients requested the same log simultaneously, these expensive operations were done multiple times in parallel. With this patch, mutexes are added around log retrieval and log formatting such that only one such operation can execute at any time.

In the multi-client scenario, the client arriving first will lock the mutex and retrieve the logs. When this client unlocks the mutex and another client is allowed through, the logs will already be cached and the request returns immediately.
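
For illustration, here is a minimal sketch of this pattern in TypeScript using the async-mutex package; the identifiers (`getPodLogs`, `fetchLogsFromKubernetes`, `formatLogs`, the cache and its TTL) are hypothetical placeholders and not taken from this repository:

```typescript
import { Mutex } from 'async-mutex'

// Placeholder stand-ins for the real Kubernetes client call and formatter
// used by the project (names are hypothetical).
async function fetchLogsFromKubernetes (podName: string): Promise<string> {
  return `...raw logs for ${podName}...`
}

function formatLogs (raw: string): string {
  return raw.trim()
}

const logMutex = new Mutex()
const logCache = new Map<string, { text: string, fetchedAt: number }>()
const CACHE_TTL_MS = 10_000

async function getPodLogs (podName: string): Promise<string> {
  // Only one retrieval + formatting pass runs at any time; concurrent
  // callers queue on the mutex instead of duplicating the work.
  return await logMutex.runExclusive(async () => {
    const cached = logCache.get(podName)
    if (cached !== undefined && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
      // A client that queued behind the first request finds the cache
      // already warm and returns immediately.
      return cached.text
    }
    const raw = await fetchLogsFromKubernetes(podName) // expensive transfer
    const formatted = formatLogs(raw)                  // expensive CPU work
    logCache.set(podName, { text: formatted, fetchedAt: Date.now() })
    return formatted
  })
}
```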

Performance tests using the `ab` CLI tool confirm this: one of the requests still takes the same time as before, but all other requests complete much faster.

Memory requirements are reduced from O(client count) to O(1).

If multiple clients each request a different log, the worst-case response time may be slightly worse, but probably not by much on average: requesting logs from Kubernetes in parallel is likely no faster than requesting them in series due to storage or network bottlenecks, and since Foreman is single-threaded, formatting in parallel is not faster either.

Additional Context

N/A

Checklist

  • The pull request title meets the Conventional Commits specification and optionally includes the scope, for example: feat: Add social login

meyfa requested a review from a team as a code owner on July 14, 2024 15:41
lusu007 merged commit 9cbc552 into main on Jul 14, 2024
5 checks passed
lusu007 deleted the fix/serialize-logs branch on July 14, 2024 15:49