Add wrappers for zarr v3 #524
base: main
I am not sure about the idea of using a URL that does not actually resolve to anything useful.
pcodec is actually an "Array to Bytes" codec: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/pcodec.py How would that fit in here?
Any thoughts about what to do with numcodecs codecs not defined in this repo, but currently used via entrypoints?
Seconding this sentiment, a URL that doesn't resolve to anything is rather confusing. I think
Could we ask those codecs to implement Zarr codec entrypoints directly? Which codecs do you have in mind? The challenge is that the V3 codecs are quite a bit more explicit in their typing (Array to Bytes, Bytes to Bytes, etc.) than legacy numcodecs codecs. So automatically translating an arbitrary numcodecs codec to a V3 codec is not possible.
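To illustrate the typing point above, here is a minimal sketch, not the PR's implementation. The names `CodecKind`, `LegacyZlib`, `LegacyDelta`, `KIND_BY_CODEC` and `kind_of` are hypothetical and use only the standard library. Both legacy-style classes expose the same `encode`/`decode` pair, so nothing in their interface says whether they are array-to-array or bytes-to-bytes; the kind has to be declared by hand.

```python
import zlib
from enum import Enum
from itertools import accumulate


class CodecKind(Enum):
    # The explicit codec categories mentioned in the thread.
    ARRAY_TO_ARRAY = "array -> array"
    ARRAY_TO_BYTES = "array -> bytes"
    BYTES_TO_BYTES = "bytes -> bytes"


class LegacyZlib:
    """Legacy-style codec (hypothetical): bytes in, bytes out."""

    def encode(self, buf):
        return zlib.compress(buf)

    def decode(self, buf):
        return zlib.decompress(buf)


class LegacyDelta:
    """Legacy-style codec (hypothetical): list of ints in, list of ints out."""

    def encode(self, values):
        if not values:
            return []
        # Keep the first value, then store successive differences.
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def decode(self, values):
        # A running sum undoes the differencing.
        return list(accumulate(values))


# The kind cannot be inferred from encode/decode alone, so it is declared
# explicitly per codec class (an assumption of this sketch).
KIND_BY_CODEC = {
    LegacyZlib: CodecKind.BYTES_TO_BYTES,
    LegacyDelta: CodecKind.ARRAY_TO_ARRAY,
}


def kind_of(codec):
    """Look up the declared kind, refusing codecs nobody has classified."""
    try:
        return KIND_BY_CODEC[type(codec)]
    except KeyError:
        raise TypeError(f"no declared codec kind for {type(codec).__name__}") from None
```

A codec from a third-party package would fail `kind_of` until someone registers its kind, which is the manual step an automatic translation cannot supply.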
I am thinking of https://github.com/fsspec/kerchunk/blob/main/kerchunk/codecs.py and imagecodecs. There are probably others.
I had asked @MSanKeys963 to set up the respective redirects to the numcodecs docs. That should solve that.
Must have missed pcodec. I'll add it.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #524      +/-   ##
==========================================
- Coverage   99.91%   90.97%   -8.94%
==========================================
  Files          59       62       +3
  Lines        2328     2593     +265
==========================================
+ Hits         2326     2359      +33
- Misses          2      234     +232
Could you say something a bit about why this code makes sense to be in
The idea was that the
That makes sense - I'll try and give this a proper review in the next couple of days!
I haven't quite looked at everything, but here are some initial comments.
numcodecs/zarr3.py
class NumcodecsCodec:
    codec_config: dict[str, JSON]

    def __init__(self, *, codec_id: str | None = None, codec_config: dict[str, JSON]) -> None:
Can the codec_id type be narrowed to a Literal list of supported codecs?
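One way the suggested narrowing could look is sketched below. It is an illustration only: the class name `NumcodecsCodecSketch`, the alias `SupportedCodecId` and the three codec names listed are assumptions, not the set supported by the PR. `Literal` only helps static type checkers, so the sketch also validates at runtime with `typing.get_args`.

```python
from typing import Literal, Optional, get_args

# Hypothetical subset of codec names; the real list would come from the PR.
SupportedCodecId = Literal["numcodecs.blosc", "numcodecs.zstd", "numcodecs.delta"]


class NumcodecsCodecSketch:
    """Illustrative stand-in for the reviewed class, with a narrowed codec_id."""

    def __init__(
        self,
        *,
        codec_id: Optional[SupportedCodecId] = None,
        codec_config: dict,
    ) -> None:
        # Type checkers reject unknown literals; this check covers callers
        # that pass a plain str at runtime.
        if codec_id is not None and codec_id not in get_args(SupportedCodecId):
            raise ValueError(f"unsupported codec_id: {codec_id!r}")
        self.codec_id = codec_id
        self.codec_config = codec_config
```

The trade-off is that every newly supported codec has to be added to the `Literal` alias, which may or may not be acceptable for a wrapper meant to cover many codecs.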
I haven't reviewed the code in detail because I haven't got the time to get the context on zarr v3 I need to do that. I've left some comments on what I'd like to see addressed before this is merged though, mainly to ensure we maintain compatibility with zarr v2.
I also wonder if we should make all of the zarr3 API private, just exposing it via the entrypoints, to give more flexibility for modifying it in the future?
* Sync with zarr 3 beta
* Update zarr version in ci
* dont install zarr python 3 in workflows running 3.10
Co-authored-by: David Stansby <[email protected]>
@dstansby I incorporated your feedback:
Not sure what is going on with codecov. I think all lines should be tested by at least a few CI runs. Help appreciated.
Thinking about how users might interact with this PR in zarr-developers/zarr-python#2398, I think we should make
Making it public sounds good - the API reference will need adding to https://numcodecs--524.org.readthedocs.build/en/524/api.html if it's going to be public. I'll have a look and try and debug the codecov issue now.
@@ -15,6 +15,7 @@ jobs:
        python-version: ["3.10", "3.11", "3.12", "3.13.0"]
        # macos-12 is an intel runner, macos-14 is a arm64 runner
        platform: [ubuntu-latest, windows-latest, macos-12, macos-14]
        zarr-version: ["zarr>=2,<3", "zarr==3.0.0b0"]
Sorry if I lost track of this - I think you convinced me earlier in the comments of this PR that we could just install zarr==3 here and not bother testing with zarr v2?
The numcodecs issue is because the version check isn't working. If you check the logs of the runs with zarr v3, they contain the line
|
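The thread does not show the failing version check or the log line, so the following is only a sketch of one way such a check could be written, not the code under review. The helper names `major_version` and `is_zarr_v3` are hypothetical. It compares the major version only, which treats a pre-release string such as the `3.0.0b0` from the CI matrix as version 3. As a general caveat, under PEP 440 ordering `3.0.0b0` sorts before `3.0.0`, so a `>= 3.0.0` comparison would exclude the beta.

```python
import re


def major_version(version: str) -> int:
    """Extract the leading major version number from a version string."""
    match = re.match(r"\s*v?(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1))


def is_zarr_v3(version: str) -> bool:
    """Return True for any 3.x version string, including pre-releases."""
    return major_version(version) >= 3
```

In practice the version string would come from the installed package metadata, for example via `importlib.metadata.version("zarr")`.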
The Zarr v3 specification only lists a few codecs that are officially supported. However, it is desirable to expose the codecs in numcodecs for use with v3 arrays as well. This PR adds wrapper classes for numcodecs support.
The names of the codecs are prefixed with `numcodecs.` to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec. Also, there is a warning that numcodecs codecs are not officially supported and will likely not work in any other Zarr implementation.

Most array-to-array ("filters") and bytes-to-bytes codecs are supported. Absent are the variable-length codecs as well as json, msgpack and pickle.
Here is an example of the persisted configuration:
Use of numcodecs in v2 arrays is not affected.
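The prefixing and warning behaviour described in this PR description can be sketched roughly as follows. This is a standard-library illustration under assumptions: the function names `to_v3_name` and `parse_v3_name` and the warning text are hypothetical and are not taken from the PR.

```python
import warnings

CODEC_PREFIX = "numcodecs."


def to_v3_name(codec_id: str) -> str:
    """Build the prefixed name used in v3 metadata, e.g. 'zstd' -> 'numcodecs.zstd'."""
    return CODEC_PREFIX + codec_id


def parse_v3_name(name: str) -> str:
    """Strip the prefix and warn that the codec is outside the Zarr v3 spec."""
    if not name.startswith(CODEC_PREFIX):
        raise ValueError(f"expected a name starting with {CODEC_PREFIX!r}, got {name!r}")
    warnings.warn(
        f"{name} is a numcodecs codec that is not part of the Zarr v3 "
        "specification and may not work in other Zarr implementations.",
        UserWarning,
        stacklevel=2,
    )
    return name[len(CODEC_PREFIX):]
```

Because the prefix is applied uniformly, a codec later added to the Zarr spec under its bare name would not clash with the wrapped numcodecs version.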
Fixes #502