
feat(rdb_load): add support for loading huge streams #3855

Open
wants to merge 2 commits into main
Conversation

andydunstall
Contributor

@andydunstall andydunstall commented Oct 3, 2024

Follows #3850 to add support for loading huge streams (#3760).

This loads the stream entries in partial reads, but loads the stream metadata and consumer groups in a single read (on the assumption that consumer groups are relatively small and do not need partial reads).

As with lists, streams are loaded in segments of 512 nodes, since each stream node can contain up to 4 KB of elements.

This also removes the outer Ltrace::arr, as we now only use a single array; that makes YieldIfNeeded redundant, so it has been removed as well.

Comparing a 5GB stream:

  • main: 4.8s / ~13GB RSS
  • load-huge-streams: 2.6s / ~7GB RSS
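The segmented loading described above can be sketched roughly as follows. This is an illustrative standalone sketch, not Dragonfly's actual code; the name `SegmentSizes` and the constant are assumptions, though the 512 figure comes from the PR description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: split `total` stream nodes into segments of at
// most kSegmentSize, so each segment can be read and applied
// separately instead of materializing the whole stream in one read.
constexpr std::size_t kSegmentSize = 512;

std::vector<std::size_t> SegmentSizes(std::size_t total) {
  std::vector<std::size_t> segments;
  while (total > 0) {
    std::size_t n = total < kSegmentSize ? total : kSegmentSize;
    segments.push_back(n);
    total -= n;
  }
  return segments;
}
```

Bounding each read this way is what keeps the peak RSS closer to the final in-memory size of the stream, rather than roughly double it as in the `main` numbers above.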

ec_ = RdbError(errc::rdb_file_corrupted);
return;
}
// We only load the stream_trace on the final read, so if not read we
Collaborator

I did not understand this comment. Can you explain please?

Contributor Author

I've updated the comment.

What I mean is that ReadStreams is split into two sections:

  • Reading the stream entries (ltrace->arr)
  • Reading the stream metadata and consumer groups (ltrace->stream_trace)

Loading the stream metadata and consumer groups in partial reads would be quite complex, and I'm guessing they aren't expected to be large enough to require partial reads, so I wasn't sure whether it's worth trying to load consumer groups that way.

The simplest option seems to be to load just the stream entries (ltrace->arr) in partial reads, then on the final read also load the stream metadata and consumer groups (ltrace->stream_trace).
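The two-section scheme can be sketched like this. The type and field names (`StreamChunk`, `StreamTrace`, `LoadStream`) are illustrative assumptions, not Dragonfly's actual API; the point is only that entries arrive incrementally while the trace travels with the final read.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the two-section scheme: stream entries arrive
// in partial reads, and the metadata/consumer groups (stream_trace)
// are attached only to the final read.
struct StreamTrace {
  std::string metadata;  // stands in for metadata + consumer groups
};

struct StreamChunk {
  std::vector<std::string> entries;  // partial read of entry nodes
  std::optional<StreamTrace> trace;  // set only on the final read
};

// Applies a sequence of chunks: entries accumulate incrementally,
// while the trace is applied exactly once, with the last chunk.
std::pair<std::size_t, std::string> LoadStream(
    const std::vector<StreamChunk>& chunks) {
  std::size_t entry_count = 0;
  std::string metadata;
  for (const auto& chunk : chunks) {
    entry_count += chunk.entries.size();
    if (chunk.trace)  // only the final read carries stream_trace
      metadata = chunk.trace->metadata;
  }
  return {entry_count, metadata};
}
```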

@@ -124,7 +124,15 @@ tuple<const CommandId*, absl::InlinedVector<string, 5>> GeneratePopulateCommand(
}
json[json.size() - 1] = '}'; // Replace last ',' with '}'
args.push_back(json);
} else if (type == "STREAM") {
Contributor Author

@andydunstall andydunstall Oct 3, 2024

Added this as it's useful for testing, though it is a bit different from the other populate commands: XADD adds a single stream entry with multiple elements in that entry, so the key still has only a single entry after each call, which is why the test calls populate 2000 times.

I can remove it if preferred and just move this logic into the test (though this sped up the test and is useful for manual testing).
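The shape of the populate command being discussed can be sketched as below. This is a hypothetical helper, not the actual diff: `BuildXAddArgs` and the field-naming scheme are assumptions, but it mirrors the described behaviour where one XADD appends a single entry containing many field/value pairs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: build one XADD command that appends a single
// stream entry with `elements` field/value pairs. Each call adds only
// one entry to the key, so populating N entries takes N calls (hence
// the test invoking populate 2000 times).
std::vector<std::string> BuildXAddArgs(const std::string& key,
                                       std::size_t elements) {
  std::vector<std::string> args = {"XADD", key, "*"};  // '*' = auto entry ID
  for (std::size_t i = 0; i < elements; ++i) {
    args.push_back("field" + std::to_string(i));
    args.push_back("value" + std::to_string(i));
  }
  return args;
}
```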

@andydunstall andydunstall marked this pull request as ready for review October 4, 2024 04:09