
ENH: Use brotli compression by default if possible for nextstrain.org requests #214

Open · corneliusroemer opened this issue Aug 2, 2022 · 1 comment
Labels: enhancement (New feature or request)

corneliusroemer (Member) commented:

(This is probably not the right repo for this issue, but I couldn't find a better one; please move it if you know of a better place, @tsibley)

Context

I noticed that brotli compresses auspice trees much better than gzip, and checked whether we were using brotli when downloading resources from AWS. It turns out we don't.
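For anyone who wants to reproduce the check, here's a minimal sketch of inspecting the negotiated encoding (assuming the `requests` package; the dataset URL is only illustrative):

```python
# Ask for brotli explicitly and see which Content-Encoding comes back.
import requests

url = "https://nextstrain.org/charon/getDataset?prefix=ncov/open/global/6m"
resp = requests.get(url, headers={"Accept-Encoding": "br, gzip"})
# Prints "gzip" today; it would print "br" if brotli were supported end to end.
print(resp.headers.get("Content-Encoding"))
```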

Description

It would be great if we supported brotli as the default compression for trees downloaded from AWS (ncov-data, etc.).

Examples

Compression with brotli is much better; see the comparison below for the Nextclade reference build with 4k tips:

[Figure: file sizes of the same tree compressed with gzip vs. brotli]

Brotli compresses 4x better than gzip.
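As a reproducible stand-in for the screenshot, here's a quick local comparison (assuming the third-party `brotli` package and an Auspice JSON on disk; the filename is hypothetical):

```python
# Compare gzip vs. brotli on the same Auspice JSON, each at its max setting.
import gzip
import brotli  # pip install brotli

data = open("tree.json", "rb").read()  # any Auspice dataset JSON
gz = gzip.compress(data, compresslevel=9)
br = brotli.compress(data, quality=11)
print(f"raw:    {len(data):>12,} bytes")
print(f"gzip:   {len(gz):>12,} bytes")
print(f"brotli: {len(br):>12,} bytes  ({len(gz) / len(br):.1f}x smaller than gzip)")
```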

Possible solution

Apparently it's not too hard to enable brotli compression on the AWS end: https://aws.amazon.com/about-aws/whats-new/2020/09/cloudfront-brotli-compression/

We may also need to change the charon request headers, though; I'm not sure where those are set.

I think we should try to use brotli wherever possible, including for things like Auspice JSONs; it generally does better than gzip.

Finally, we could also consider using brotli compression for nextstrain remote download, though the need there is smaller, I think.

corneliusroemer added the enhancement (New feature or request) label on Aug 2, 2022
tsibley (Member) commented Aug 22, 2022:

A couple thoughts:

It should be possible to swap gzip for brotli, but we'll have to support a mix of the two for a long time (potentially ~forever) because it will be impossible to coordinate all sources.
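To make that concrete, every client (charon, the CLI, etc.) would need a decode path that accepts either encoding indefinitely. A minimal sketch of such a path, assuming the third-party `brotli` package (the function name is mine, not from any codebase):

```python
import gzip
import brotli  # pip install brotli

def decode_body(body: bytes, content_encoding: str | None) -> bytes:
    """Decode a fetched object based on its Content-Encoding header."""
    if content_encoding == "br":
        return brotli.decompress(body)
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body  # stored/served uncompressed
```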

While the compression benchmarks in isolation show clear benefits, it's not clear to me that swapping is worth it once the full effort is considered: the time to engineer the swap (plan, write, test, etc.), the ongoing complexity of supporting both, and the opportunity cost of working on this instead of something else.

We don't use CloudFront's dynamic compression, since not all access goes through CloudFront: a lot goes directly to S3. So we pre-compress and store compressed objects on S3. IIRC, CloudFront's dynamic compression also has (or at least used to have) fairly low upper limits on the uncompressed size it supports.
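In that pre-compression setup, adding brotli would mean writing objects with the matching metadata at upload time. A rough sketch of what that could look like with boto3 (bucket, key, and filename are illustrative, not our actual layout):

```python
import boto3
import brotli  # pip install brotli

s3 = boto3.client("s3")
data = open("global.json", "rb").read()  # hypothetical dataset file

# Store the pre-compressed object with Content-Encoding metadata so that
# S3 (and CloudFront in front of it) serves it with the right header.
s3.put_object(
    Bucket="example-data-bucket",
    Key="ncov_global.json",
    Body=brotli.compress(data, quality=11),
    ContentType="application/json",
    ContentEncoding="br",
)
```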
