Add Icechunk Support #256

Merged
merged 90 commits into from
Oct 22, 2024

Conversation

mpiannucci
Contributor

@mpiannucci commented on Oct 15, 2024

Adds the ability to write to an Icechunk store. Co-developed with @TomNicholas.

See earth-mover/VirtualiZarr#1 for more information (ported from that branch)

  • Closes #xxxx
  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation

@mpiannucci
Contributor Author

Okay, I think we're finally solid.

Member

@TomNicholas left a comment


I'm psyched about this! Just a few last comments.

virtualizarr/tests/test_writers/test_icechunk.py (outdated, resolved)
Comment on lines 173 to 175
```python
# Check with xarray
ds = open_zarr(store=icechunk_filestore, zarr_format=3, consolidated=False)
assert np.allclose(ds.air.to_numpy(), expected_ds.air.to_numpy())
```
Member


Use assert_identical here too, rather than comparing only the values with np.allclose.
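The difference between the two checks shows up in a small sketch, assuming hypothetical in-memory data rather than the test's fixtures: np.allclose only compares array values, while xarray.testing.assert_identical also compares variable names, dimensions, coordinates, and attributes. The helper make_air_dataset is illustrative only.

```python
import numpy as np
import xarray as xr


def make_air_dataset(units: str) -> xr.Dataset:
    """Build a tiny stand-in for the test's "air" dataset (hypothetical data)."""
    data = np.arange(12, dtype="float64").reshape(3, 4)
    return xr.Dataset({"air": (("y", "x"), data, {"units": units})})


expected_ds = make_air_dataset(units="K")
# Same values, but the "units" attribute did not survive the round trip
roundtripped_ds = make_air_dataset(units="degK")

# Values-only comparison, as in the reviewed test: this passes
values_match = np.allclose(roundtripped_ds.air.to_numpy(), expected_ds.air.to_numpy())

# assert_identical also compares names, dims, coords and attrs, so it fails here
try:
    xr.testing.assert_identical(roundtripped_ds, expected_ds)
    identical = True
except AssertionError:
    identical = False

print(values_match, identical)
```

A metadata regression in the writer (for example, dropped or altered attributes) would slip past the np.allclose check but be caught by assert_identical.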

Comment on lines 271 to 285
```python
root_group = group(store=icechunk_filestore)
air_array = root_group["air"]
assert isinstance(air_array, Array)
assert air_array.shape == (3, 4)
assert air_array.dtype == np.dtype("float64")
assert air_array.attrs["units"] == "km"
assert np.allclose(air_array[:], la_v[:])

pres_array = root_group["pres"]
assert isinstance(pres_array, Array)
assert pres_array.shape == (3, 4)
assert pres_array.dtype == np.dtype("int32")
expected_ds = open_dataset(simple_netcdf4)
expected_array = expected_ds["foo"].to_numpy()
npt.assert_equal(pres_array, expected_array)
```
Copy link
Member


All of this could also be checked by calling open_zarr and then using a single xarray assertion, but it might be valuable to keep these lower-level assertions too.

virtualizarr/tests/test_writers/test_icechunk.py (outdated, resolved)
virtualizarr/writers/icechunk.py (outdated, resolved)
@TomNicholas
Member

Please also add a note to docs/releases.rst! That would be a good place to mention the kerchunk version weirdness too.

@mpiannucci
Contributor Author

There is some weirdness when comparing the Icechunk datasets using the xarray testing tools that doesn't happen if you check the individual pieces of the dataset. Leaving further refinement for another PR.
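The comment does not say what the weirdness is. One way to localize such a discrepancy, sketched here as a hypothetical debugging helper rather than anything added in this PR, is to compare the dataset-level attributes and then each variable separately with DataArray.identical, reporting which pieces differ.

```python
import numpy as np
import xarray as xr


def find_mismatches(actual: xr.Dataset, expected: xr.Dataset) -> list:
    """Return the names of the pieces that differ (hypothetical helper).

    Checks dataset-level attrs (assumes plain, non-array attribute values),
    then every data variable and coordinate with DataArray.identical.
    """
    problems = []
    if actual.attrs != expected.attrs:
        problems.append("<dataset attrs>")
    for name in sorted(set(actual.variables) | set(expected.variables)):
        # A variable present on only one side is reported as a mismatch
        if name not in actual.variables or name not in expected.variables:
            problems.append(name)
        elif not actual[name].identical(expected[name]):
            problems.append(name)
    return problems


expected = xr.Dataset({"air": ("x", np.arange(3.0)), "pres": ("x", np.arange(3))})
# Same values everywhere, but "pres" gains an attribute
actual = expected.assign(pres=expected["pres"].assign_attrs(units="Pa"))

print(find_mismatches(actual, expected))
```

Here the helper reports only "pres", which narrows a failing whole-dataset assertion down to one variable's metadata.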

@TomNicholas merged commit 775c2c8 into zarr-developers:main on Oct 22, 2024
11 checks passed
@TomNicholas
Member

So psyched for this!!!

@TomNicholas added the Icechunk 🧊 label (Relates to Icechunk library / spec) on Oct 23, 2024
Labels
  • dependencies: Updates a dependency
  • enhancement: New feature or request
  • Icechunk 🧊: Relates to Icechunk library / spec
  • references formats: Storing byte range info on disk
  • zarr-python: Relevant to zarr-python upstream

3 participants