Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reusing the same context across multiple Encoders and Decoders #247

Closed
wants to merge 2 commits into from

Conversation

DouglasDwyer
Copy link
Contributor

From profiling, I have found that creating a zstd::stream::Encoder and using it to encode a small amount of bytes (less than 100) takes over 200 microseconds per encode on my machine. The vast majority of this time is related to the creation of new zstd contexts, so this PR adds a simple method on Encoder and Decoder that allows one to supply an external context from zstd_safe instead. With these changes, performance is improved - the previously-mentioned benchmark takes less than 2 microseconds per encode.

This PR also makes some minor changes to the WASM shim, as compilation appears to be broken for it.

@esemeniuc
Copy link

would love to see this!

@esemeniuc
Copy link

bump @gyscos for visibility

@gyscos
Copy link
Owner

gyscos commented Jun 17, 2024

Thanks for the PR!

A few questions:

  • Since this is all internal API, why separate MaybeOwned and MaybeOwnedInner types?
  • Why store a * mut () + PhantomData rather than directly a &'a mut T?

This PR also makes some minor changes to the WASM shim, as compilation appears to be broken for it.

I don't seem to see these changes anymore, were they already merged through another PR?

@DouglasDwyer
Copy link
Contributor Author

Since this is all internal API, why separate MaybeOwned and MaybeOwnedInner types?

My instinct was to separate them for readability. However, since it's an internal API, they don't necessarily have to be separate. I will go ahead and change this.

Why store a * mut () + PhantomData rather than directly a &'a mut T?

The goal for doing this was to ensure that Encoder<'a> and Decoder<'a> were covariant over 'a. If &'a mut T was used, the function with_context<'b: 'a>(reader: R, context: &'a mut zstd_safe::DCtx<'b>) would not compile - it would be impossible to convert a &'a mut zstd_safe::DCtx<'b> to a &'a mut zstd_safe::DCtx<'a>.

However! Looking at this again, MaybeOwned<T> actually appears to be unsound (see this playground for example). As such, I am going to refactor with_context to just be fn with_context(reader: R, context: &'a mut zstd_safe::DCtx<'static>). I am unable to find a safe way to support having a borrowed DCtx<'a> for an arbitrary 'a - my only idea is to add another lifetime parameter to Encoder and Decoder, but that breaking change doesn't seem worth it. If someone knows of a better approach, please let me know.

I don't seem to see these changes anymore, were they already merged through another PR?

Yes, the WASM shim now works properly on main.


Taking another look at this PR, I feel like the cleanest solution might be to create an entirely separate Encoder and Decoder class for borrowed contexts, rather than trying to support both owned/borrowed contexts with one struct. But I don't have the time do any more API design than this. Please let me know if you have any other suggestions for pushing the PR forward. Thanks!

@DouglasDwyer
Copy link
Contributor Author

Ping @gyscos

@esemeniuc
Copy link

I found a workaround that reuses the dictionary: https://gist.github.com/esemeniuc/ad889c617cc4297043fc344136be563f
I'd guess its probably 95% as fast

@esemeniuc
Copy link

Bump @gyscos

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

Thanks for the work! Let's start with that, we'll see later if/how we could allow both referencing a dict and a context.

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

CI failures looks weird - maybe you need to rebase?

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

Ended up testing and merging the PR locally. Thanks for the work!

@gyscos gyscos closed this Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants