
Rework threading model #7778
Merged · 13 commits · Oct 4, 2024
Conversation

@NBKelly (Collaborator) commented Sep 27, 2024

This turned out to be 200% easier than I thought it would be.

We're supposedly not meant to use virtual threads for anything that touches atoms, because atoms already do their own funny concurrency things, so we're still using claypoole's thread pools. But now we keep a fixed set of pools (cores + 2) and assign each lobby a pool at random from among the least occupied ones. The only extra state we have to track is an occupancy counter (a single integer) per pool. Other than that, this plugs 1:1 into the other code I wrote, which is neat.
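A minimal sketch of that scheme, assuming claypoole pools of one thread each; the identifiers (`pools`, `occupancy`, `assign-pool!`) are illustrative, not the PR's actual names:

```clojure
(ns sketch.pools
  (:require [com.climate.claypoole :as cp]))

;; A fixed set of single-threaded pools, sized cores + 2.
(def pool-count (+ (cp/ncpus) 2))
(def pools (vec (repeatedly pool-count #(cp/threadpool 1))))

;; The only extra bookkeeping: one occupancy counter per pool.
(def occupancy (atom (vec (repeat pool-count 0))))

(defn assign-pool!
  "Pick a pool at random from the least-occupied ones, bump its
  counter, and return its index for the lobby to remember.
  (Read-then-swap isn't atomic, but a slightly stale pick is harmless.)"
  []
  (let [counts     @occupancy
        lowest     (apply min counts)
        candidates (keep-indexed (fn [i n] (when (= n lowest) i)) counts)
        idx        (rand-nth (vec candidates))]
    (swap! occupancy update idx inc)
    idx))

(defn release-pool!
  "Decrement a pool's counter when its lobby closes."
  [idx]
  (swap! occupancy update idx dec))
```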

I also enabled the simple-auto-threading? option on sente's socket server, which I'll try out on playtest, but that might be overkill. Probably best to trial that later, instead of doing too much at once.
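For reference, enabling it is a single option on sente's socket-server constructor; a sketch, with the http-kit adapter standing in for whatever adapter the server actually uses:

```clojure
(ns sketch.ws
  (:require [taoensso.sente :as sente]
            [taoensso.sente.server-adapters.http-kit
             :refer [get-sch-adapter]]))

(def chsk-server
  (sente/make-channel-socket-server!
    (get-sch-adapter)
    ;; opt in to sente's built-in auto-threading for incoming events
    {:simple-auto-threading? true}))
```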

I'm going to sample it on playtest over the next week or two and see if it causes any issues or is noticeable at all.


Summary of changes:

  • We use a fixed pool of threads to manage games. Each game is assigned a thread to work with from the pool, and once there are enough games you'll have two, three, or more lobbies on the same thread.
  • Everything is still queued up sequentially on its thread, so events are still processed in order (see the ordering sketch after this list)
  • The lobby itself uses its own thread, so lobby updates won't slow down games, and games will (mostly) not slow down other games or the lobby
  • We now log the throughput time of each ws interaction as it's handled, and display that in our metrics (number of invocations, average, percentiles)
  • Prune idle users a little more aggressively (1 hour of no interaction)
  • The list of lobbies gets stripped and sorted only once, instead of once per user. (Note that the sorted result could theoretically be cached back into the state, which would amortize the cost across invocations; I don't think it's worth it though.)
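The in-order guarantee in the second bullet falls out of the single-threaded pools: an executor with one thread runs its queue FIFO. A tiny demonstration, assuming claypoole:

```clojure
(require '[com.climate.claypoole :as cp])

(def game-pool (cp/threadpool 1)) ; one thread, so tasks run FIFO

;; Two events for the same lobby land on the same pool, so the second
;; can never start before the first finishes:
(cp/future game-pool (println "resolve event 1"))
(cp/future game-pool (println "resolve event 2")) ; always prints second
```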

@NoahTheDuke (Collaborator)

cool, this looks closer to what we need

@NBKelly (Collaborator, Author) commented Sep 28, 2024

I decided it would be worthwhile to trace execution time on a per-endpoint basis. The buckets get cleared out every 5 minutes when we resolve telemetry, so they shouldn't explode the state.

[screenshot: telemetry_001 — per-endpoint timing metrics]
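A sketch of what such buckets might look like (names hypothetical): an atom mapping endpoint to recorded durations, drained each time telemetry resolves:

```clojure
(defonce endpoint-timings (atom {})) ; endpoint-id -> vector of millis

(defmacro with-timing
  "Run body, recording its wall-clock duration under endpoint-id."
  [endpoint-id & body]
  `(let [start#  (System/nanoTime)
         result# (do ~@body)]
     (swap! endpoint-timings update ~endpoint-id (fnil conj [])
            (/ (- (System/nanoTime) start#) 1e6))
     result#))

(defn resolve-telemetry!
  "Summarize and clear the buckets (run every 5 minutes) so the
  atom can't grow without bound."
  []
  (let [[timings _] (reset-vals! endpoint-timings {})]
    (into {}
          (for [[endpoint ms] timings
                :let [sorted (vec (sort ms))
                      n      (count sorted)]]
            [endpoint {:count n
                       :mean  (/ (reduce + sorted) n)
                       :p95   (nth sorted (int (* 0.95 (dec n))))}]))))
```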

@NBKelly (Collaborator, Author) commented Sep 28, 2024

I cleaned up broadcast-lobby-list a little.
I noticed our workflow was:

  • Filter the list (per user)
  • Sort it (per user, on their own filtered version)
  • Apply summaries (strip info)
  • Wrap it in a vec

Which is exactly equivalent to:

  • Sort it once
  • Strip it once
  • per user, filter and wrap in a vec

So we can cut out around half of the work for that, which is nice.
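In code, the reworked shape is roughly this; the helpers (`lobby-summary`, `visible-to?`, `send!`) and the `:date` sort key are placeholders, declared only so the sketch compiles:

```clojure
(declare lobby-summary visible-to? send!)

(defn broadcast-lobby-list [lobbies users]
  (let [summaries (->> lobbies
                       (sort-by :date)       ; sort once
                       (map lobby-summary))] ; strip once
    (doseq [user users]
      ;; only the cheap per-user filter and vec wrap remain
      (send! user (vec (filter #(visible-to? user %) summaries))))))
```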

@NBKelly (Collaborator, Author) commented Sep 28, 2024

I changed the lobby sub timeout from 6 hours to 1 hour, made it so stale subs actually get pruned (I missed a case before), and set them up to be cleaned up periodically as well (currently every 5 or so minutes).
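A hypothetical loop covering both halves, pruning subs idle for over an hour on a roughly 5-minute cadence (all names are mine):

```clojure
(def sub-timeout-ms (* 60 60 1000)) ; 1 hour

(defn prune-idle-subs!
  "Drop subscribers whose last interaction is older than the timeout."
  [subs]
  (let [now (System/currentTimeMillis)]
    (swap! subs
           (fn [m]
             (into {}
                   (remove (fn [[_ {:keys [last-seen]}]]
                             (> (- now last-seen) sub-timeout-ms))
                           m))))))

(defn start-prune-loop!
  "Kick off the periodic cleanup, roughly every 5 minutes."
  [subs]
  (future
    (while true
      (Thread/sleep (* 5 60 1000))
      (prune-idle-subs! subs))))
```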

@NBKelly (Collaborator, Author) commented Sep 29, 2024

right now it all seems to be working on playtest, so I'm going to mark this as ready for review

NBKelly marked this pull request as ready for review on September 29, 2024 00:35
@NBKelly (Collaborator, Author) commented Oct 4, 2024

just realized I made a really dumb mistake. I'll commit a few more times in a minute.

NBKelly marked this pull request as draft on October 4, 2024 02:30
@NBKelly (Collaborator, Author) commented Oct 4, 2024

ok so it turns out I was passing the already-executed function bodies to the thread pools, so they weren't really doing anything at all 😂. I converted them to macros (which took an embarrassingly long time to figure out), and they appear to function correctly now, so we should actually be load balancing stuff.
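The distinction in miniature (names hypothetical): a function's arguments are evaluated before the call, so the work had already run on the caller's thread by the time the pool saw it, while a macro receives the unevaluated body and can hand it to the pool intact:

```clojure
(require '[com.climate.claypoole :as cp])

;; Broken: (run-on-pool-broken! pool (handle-event! lobby msg)) runs
;; handle-event! on the caller's thread; the pool just gets its result.
(defn run-on-pool-broken! [pool result]
  (cp/future pool result))

;; Fixed: the body stays unevaluated until claypoole's future runs it
;; on the pool's own thread.
(defmacro run-on-pool! [pool & body]
  `(cp/future ~pool ~@body))
```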

I also made it so that if for some reason we try to execute an op on a nil lobby, and the function doesn't already check for that, we execute it on the lobby thread rather than on a non-existent thread from the non-existent lobby.

@NBKelly NBKelly marked this pull request as ready for review October 4, 2024 03:18
@NoahTheDuke (Collaborator)

i say go for it. can't get much worse

NBKelly merged commit 1e5b0d1 into mtgred:master on Oct 4, 2024
3 checks passed