

Use blosc2 package instead of bundled blosc #538

Closed
wants to merge 4 commits into from


dstansby
Contributor

This is a first attempt at fixing #262 by replacing the bundled blosc library with the blosc2 package available on PyPI.

Several tests are currently marked as xfail and still need investigating and potentially fixing. I'm opening this now to avoid anyone else duplicating the work done so far, and to see if anyone wants to help investigate the pytest.xfail markers I had to add to get the tests passing on my local machine.

normanrz
Contributor

I wonder whether blosc2 should instead be a separate codec, given the missing forward compatibility. From the README:

Note: Python-Blosc2 is meant to be backward compatible with Python-Blosc data. That means that it can read data generated with Python-Blosc, but the opposite is not true (i.e. there is no forward compatibility).

Also, the blosc C code base is used for other codecs that are bundled with blosc, e.g. zstd and lz4.
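To illustrate why a separate codec ID helps with the missing forward compatibility, here is a minimal sketch of a reader that dispatches on a codec ID stored with the data. Everything in it is hypothetical: CodecRegistry, UnknownCodecError, the "blosc"/"blosc2" IDs and the placeholder decoders are illustrative names, not numcodecs or zarr APIs. The point is that an older reader which only knows "blosc" fails loudly on "blosc2" data, instead of handing Blosc2 bytes to a Blosc1 decoder.

```python
# Hypothetical sketch: these names are illustrative, not numcodecs/zarr APIs.
from typing import Callable, Dict


class UnknownCodecError(KeyError):
    """Raised when a reader meets a codec ID it was not built with."""


class CodecRegistry:
    def __init__(self) -> None:
        # Maps a codec ID string to a function that decodes raw bytes.
        self._decoders: Dict[str, Callable[[bytes], bytes]] = {}

    def register(self, codec_id: str, decoder: Callable[[bytes], bytes]) -> None:
        self._decoders[codec_id] = decoder

    def decode(self, config: dict, data: bytes) -> bytes:
        # Dispatch on the ID stored alongside the data, so an old reader
        # rejects blosc2 chunks rather than passing them to a Blosc1 decoder.
        codec_id = config["id"]
        if codec_id not in self._decoders:
            raise UnknownCodecError(codec_id)
        return self._decoders[codec_id](data)


def fake_blosc1_decode(data: bytes) -> bytes:
    # Placeholder standing in for a real Blosc1 decompressor.
    return data


def fake_blosc2_decode(data: bytes) -> bytes:
    # Placeholder standing in for a real Blosc2 decompressor.
    return data


# An "old" reader only knows Blosc1; a "new" one knows both formats.
old_reader = CodecRegistry()
old_reader.register("blosc", fake_blosc1_decode)

new_reader = CodecRegistry()
new_reader.register("blosc", fake_blosc1_decode)
new_reader.register("blosc2", fake_blosc2_decode)
```

With one shared ID, the old reader would have no way to tell the two formats apart before attempting to decode.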

dstansby
Contributor Author

In that case I might close this PR. I couldn't get the blosc package to install on my local machine, because it's no longer maintained (last upload Dec '22) and wheels aren't available for Python 3.12.

dstansby closed this Jun 21, 2024
d-v-b
Contributor

d-v-b commented Jun 21, 2024

In that case I might close this PR - I couldn't get the blosc package to install on my local machine, because it's no longer maintained (last upload Dec '22) and wheels aren't available for Python 3.12

I think the blosc developers want people to use blosc2, and understandably don't have much interest in blosc1 maintenance. We should definitely get blosc2 set up as a zarr codec for this reason.

dstansby deleted the blosc-unbundle branch June 21, 2024 07:51
mkitti
Contributor

mkitti commented Jun 21, 2024

We really should not be encoding new data in the Blosc-1 chunk format, given that upstream support is sparse and the upstream authors strongly encourage us to migrate.

Blosc1 chunk format:
https://github.com/Blosc/c-blosc/blob/main/README_CHUNK_FORMAT.rst

Blosc2 contiguous frame format:
https://www.blosc.org/c-blosc2/format/cframe_format.html
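As a concrete anchor for the Blosc1 chunk format linked above, here is a minimal sketch that reads its fixed 16-byte header with the standard library. It assumes the layout described in that document: four single-byte fields (version, versionlz, flags, typesize) followed by three little-endian uint32 fields (nbytes, blocksize, ctbytes). Blosc1Header and parse_blosc1_header are hypothetical names, not part of any blosc package.

```python
import struct
from typing import NamedTuple

# Assumed layout: 4 single bytes + 3 little-endian uint32 = 16 bytes.
BLOSC1_HEADER = struct.Struct("<BBBBIII")


class Blosc1Header(NamedTuple):
    version: int     # Blosc format version
    versionlz: int   # version of the internal compressor format
    flags: int       # shuffle / memcpy / compressor bits
    typesize: int    # item size in bytes, used by the shuffle filter
    nbytes: int      # uncompressed size of the chunk
    blocksize: int   # size of each internal block
    ctbytes: int     # compressed size of the chunk, header included


def parse_blosc1_header(chunk: bytes) -> Blosc1Header:
    # Hypothetical helper: decode only the fixed header, not the blocks.
    if len(chunk) < BLOSC1_HEADER.size:
        raise ValueError("chunk shorter than the 16-byte Blosc1 header")
    return Blosc1Header(*BLOSC1_HEADER.unpack_from(chunk))
```

This only covers the Blosc1 chunk header; the Blosc2 contiguous frame (cframe) format linked above is a separate container layout with its own specification.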

martindurant
Member

blosc c code base is used for other codecs that are bundled with blosc, e.g. zstd, lz4.

But these codecs are also available without blosc, albeit in an incompatible form, because of the extra framing that blosc adds.
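A toy illustration of why extra framing breaks compatibility: wrap a compressor's output in a 16-byte header, and a plain decoder for that compressor can no longer read the bytes directly. This is a simplified sketch, not blosc's real layout (real Blosc1 chunks also split data into blocks with per-stream size prefixes), and it uses zlib from the Python standard library as a stand-in for zstd or lz4 so that it runs without extra packages. frame_encode, frame_decode and the header field values are hypothetical.

```python
import struct
import zlib

# Toy 16-byte header, loosely modelled on a blosc-style prefix:
# four single bytes, then three little-endian uint32 size fields.
FRAME = struct.Struct("<BBBBIII")


def frame_encode(data: bytes) -> bytes:
    # Compress with plain zlib, then prepend the toy header.
    payload = zlib.compress(data)
    ctbytes = FRAME.size + len(payload)
    header = FRAME.pack(2, 1, 0, 1, len(data), len(data), ctbytes)
    return header + payload


def frame_decode(framed: bytes) -> bytes:
    # Read the sizes from the header, then decompress the payload after it.
    _, _, _, _, nbytes, _, ctbytes = FRAME.unpack_from(framed)
    data = zlib.decompress(framed[FRAME.size:ctbytes])
    if len(data) != nbytes:
        raise ValueError("decompressed size does not match header")
    return data


original = b"zarr " * 100
framed = frame_encode(original)
```

Here zlib.decompress(framed) raises zlib.error because the header bytes are not a valid zlib stream header, while frame_decode(framed) round-trips the data. The compressor is the same in both cases; only the framing differs.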

normanrz
Contributor

I would love to see a ZEP that adds blosc2 to the Zarr spec.

dstansby restored the blosc-unbundle branch August 31, 2024 19:36
5 participants