
feat: add zstd compression support #60

Open · wants to merge 1 commit into `main`
Conversation

shanipribadi

The zstd `experimental` feature is enabled to calculate the upper bound of the `Vec` capacity to allocate while decompressing data, using `zstd::bulk::Decompressor::upper_bound`.

Supported levels are -128~22, with 0 defaulting to level 3 (due to zstd library behaviour).

Benchmark: `Load block from disk`

```
Load block from disk/1024 KiB [no compression] time:   [6.0854 µs 6.2072 µs 6.3423 µs]
Load block from disk/1024 KiB [lz4]            time:   [7.2133 µs 7.3160 µs 7.4288 µs]
Load block from disk/1024 KiB [miniz]          time:   [10.358 µs 10.542 µs 10.740 µs]
Load block from disk/1024 KiB [zstd(-3)]       time:   [9.2772 µs 9.5900 µs 9.9649 µs]
Load block from disk/1024 KiB [zstd(-1)]       time:   [9.0652 µs 9.1867 µs 9.3248 µs]
Load block from disk/1024 KiB [zstd(1)]        time:   [9.0680 µs 9.2127 µs 9.3672 µs]
Load block from disk/1024 KiB [zstd(3)]        time:   [9.0162 µs 9.1445 µs 9.2872 µs]
Load block from disk/1024 KiB [zstd(12)]       time:   [10.432 µs 10.605 µs 10.795 µs]

Load block from disk/131072 KiB [no compression] time:   [150.79 µs 153.20 µs 155.90 µs]
Load block from disk/131072 KiB [lz4]            time:   [193.57 µs 196.56 µs 199.61 µs]
Load block from disk/131072 KiB [miniz]          time:   [232.92 µs 236.82 µs 241.00 µs]
Load block from disk/131072 KiB [zstd(-3)]       time:   [197.10 µs 199.85 µs 203.01 µs]
Load block from disk/131072 KiB [zstd(-1)]       time:   [198.58 µs 201.26 µs 204.22 µs]
Load block from disk/131072 KiB [zstd(1)]        time:   [201.25 µs 203.61 µs 206.26 µs]
Load block from disk/131072 KiB [zstd(3)]        time:   [197.61 µs 199.59 µs 201.84 µs]
Load block from disk/131072 KiB [zstd(12)]       time:   [217.46 µs 220.58 µs 224.06 µs]
```

marvin-j97 added the `enhancement` (New feature or request) and `api` labels on Sep 25, 2024
shanipribadi (Author) commented Sep 26, 2024

There are a few things about this PR that might need some feedback/discussion.

  1. zstd-rs is a binding to the zstd C library, so it's not pure Rust; I'm not sure whether being pure Rust is a design goal for lsm-tree.
  2. I've used `zstd::bulk` because the interface is convenient (it works with `Vec`), and the docs say it can be faster than the streaming interface because it allocates its buffers in memory. This means that if the data being compressed/decompressed is large, the memory that needs to be allocated is larger as well. I don't have a feel for how big the data typically processed for compression/decompression in lsm-tree is. I'm also not sure it's even possible to use the streaming interface given the compression interface lsm-tree currently exposes.
  3. I wanted feedback on the serialization of `Zstd(value)`. zstd itself accepts -(1<<17) ~ 22 as compression levels; so far, with the u8 limit on the serialization, I have only mapped -128 ~ 22. Based on testing (the Load block from disk bench), -128 is faster than lz4 (I forgot to include it in the description above and will have to run it again later). If we want to expose the full range of possible fast levels, we need to figure out how best to map -(1<<17) ~ -1 onto -128 ~ -1 given the u8 limit.
  4. I'd appreciate some guidelines on how the benchmarks in https://fjall-rs.github.io/post/announcing-fjall-2/ were done, so I can replicate them and see whether adding zstd is actually worth it. In my experience, zstd typically provides a better CPU/compression trade-off than DEFLATE, but it doesn't really compete with lz4 on pure decompression performance (unless, I guess, we use extremely fast/negative levels). I'm hoping zstd's better compression ratio and tunability would still be useful.
  5. There are also a few unrelated changes (e.g. the `benches/tree.rs` block -> data_block rename and some renamed tests); please let me know if you'd rather I split those out into a separate PR/commit.
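The single-byte mapping described in point 3 can be sketched roughly as follows (a hypothetical illustration, not the PR's actual serialization code; the function names are made up):

```rust
// zstd accepts levels -(1 << 17)..=22, but a single signed byte in the
// block header can only represent -128..=22 directly as an i8.

fn encode_level(level: i32) -> i8 {
    // Levels below i8::MIN collapse to -128; levels above 22 are not
    // valid zstd levels, so they are clamped down to 22 as well.
    level.clamp(-128, 22) as i8
}

fn decode_level(byte: i8) -> i32 {
    byte as i32
}

fn main() {
    assert_eq!(encode_level(3), 3);
    assert_eq!(encode_level(-(1 << 17)), -128); // fastest levels collapse
    assert_eq!(decode_level(encode_level(-5)), -5);
}
```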

marvin-j97 (Contributor) commented Sep 26, 2024

> zstd-rs is a binding to the zstd library, so it's not pure rust, not sure if it's a design goal for lsm-tree to be pure rust or not

I would like it to be, but there's no production-ready library out there right now. KillingSpark/zstd-rs#65 has some encoding efforts going on, but it's far from usable. Ideally, the binding could just be swapped out if there is ever a worthy contender.

> This does mean that if the data being compressed/uncompressed is large then the memory that needed to be allocated is larger as well

It never is. Blocks tend to be 4-64 KB in size; blobs maybe up to a couple of MB at most. The data also needs to be in memory because its size must be known (for the block header).

> If we want to expose the full range of possible fast levels, then need to figure out how to best map -(1<<17)~-1 to -128-1 due to the u8 limit.

Hmm yeah, interesting. The problem is that the block header needs to be fixed-size, so I went with 2 bytes because I hadn't looked at how many compression levels there tend to be. Miniz just has 10 or so.

With a u8 we can go from 20 down to -234. Most sources tend to recommend something along the lines of -7 to 20, so I'm not sure how important it even is to support negative levels much lower than -200. That would need some benchmarking: if -8000 is barely faster than -127 with much worse space savings, there's no point in supporting it, I think.
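The biased-u8 encoding implied here could be sketched like this (a hypothetical illustration; the exact window and bias are a design choice, not something the PR has settled on):

```rust
// With an offset, a single unsigned byte can cover a contiguous window
// of negative and positive zstd levels; here the window is -233..=22,
// since -233 + 233 = 0 and 22 + 233 = 255 exactly span a u8.
const BIAS: i32 = 233;

fn encode_level(level: i32) -> u8 {
    (level.clamp(-BIAS, 22) + BIAS) as u8
}

fn decode_level(byte: u8) -> i32 {
    byte as i32 - BIAS
}

fn main() {
    assert_eq!(decode_level(encode_level(3)), 3);
    assert_eq!(decode_level(encode_level(-200)), -200);
    assert_eq!(encode_level(-(1 << 17)), 0); // very fast levels clamp to -233
}
```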

> so I could replicate it to see whether it's actually worth it to add zstd

For the benchmarks in the first chapter I used this project: https://gist.github.com/marvin-j97/22dfbe2ae2d9a8b9bcc938c8d48e54c7 - it needs a corpus of text documents on disk (`DOCS_FOLDER`) that it will ingest.

You'll need to use fjall 2.0.1+ because I had to fix a bug.
