Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list_refs #318

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion icechunk-python/python/icechunk/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
StorageConfig,
StoreConfig,
VirtualRefConfig,
Ref,
__version__,
pyicechunk_store_create,
pyicechunk_store_exists,
Expand All @@ -30,9 +31,9 @@
"SnapshotMetadata",
"StoreConfig",
"VirtualRefConfig",
"Ref",
]


class IcechunkStore(Store, SyncMixin):
_store: PyIcechunkStore

Expand Down Expand Up @@ -627,6 +628,9 @@ def supports_listing(self) -> bool:
def supports_deletes(self) -> bool:
return self._store.supports_deletes

def list_refs(self) -> list[Ref]:
return self._store.list_refs()

def list(self) -> AsyncGenerator[str, None]:
"""Retrieve all keys in the store.

Expand Down
7 changes: 7 additions & 0 deletions icechunk-python/python/icechunk/_icechunk_python.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ import datetime
from collections.abc import AsyncGenerator
from typing import Any

class Ref:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the model we want to return to users?

Alternatives could be:

  1. Arraylake style Pydantic models: BranchName(str) and TagName(str)
  2. or,
class RefType(Enum):
    BRANCH = auto()
    TAG = auto()

class Ref:
    name: str
    kind: RefType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, forgot to comment. i like your approach best, but I'd like to see how easy/hard is to pattern match the result in python. Probably tests will show me that.

Other option would be to have separate functions, list_tags and list_branches, or maybe the three of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that tags and branches can be in separate functions and that simplifies a lot from the Python user perspective

class Tag:
name: str
class Branch:
name: str

class PyIcechunkStore:
def as_bytes(self) -> bytes: ...
def set_mode(self, read_only: bool) -> None: ...
Expand All @@ -12,6 +18,7 @@ class PyIcechunkStore:
def change_set_bytes(self) -> bytes: ...
@property
def branch(self) -> str | None: ...
def list_refs(self) -> list[Ref]: ...
def checkout_snapshot(self, snapshot_id: str) -> None: ...
async def async_checkout_snapshot(self, snapshot_id: str) -> None: ...
def checkout_branch(self, branch: str) -> None: ...
Expand Down
52 changes: 52 additions & 0 deletions icechunk-python/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,38 @@ struct PyIcechunkStore {
store: Arc<RwLock<Store>>,
}

#[pyclass(name = "Ref")]
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub enum PyRef {
Tag { name: String },
Branch { name: String },
}

impl From<Ref> for PyRef {
fn from(reference: Ref) -> Self {
match reference {
Ref::Tag(name) => Self::Tag { name },
Ref::Branch(name) => Self::Branch { name },
}
}
}

#[pymethods]
impl PyRef {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful

fn __repr__(&self) -> PyResult<String> {
match self {
PyRef::Branch { name } => Ok(format!("Branch('{}')", name)),
PyRef::Tag { name } => Ok(format!("Tag('{}')", name)),
}
}
fn __str__(&self) -> PyResult<String> {
match self {
PyRef::Branch { name } => Ok(name.to_owned()),
PyRef::Tag { name } => Ok(name.to_owned()),
}
}
}

#[pyclass(name = "StoreConfig")]
#[derive(Clone, Debug, Default)]
struct PyStoreConfig {
Expand Down Expand Up @@ -432,6 +464,25 @@ impl PyIcechunkStore {
Ok(snapshot_id.to_string())
}

fn list_refs<'py>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also provide async_list_refs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a python test that exercises this new function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can do both. I'd like some input on #318 (comment) first before I tackle those.

&'py self,
py: Python<'py>,
) -> PyIcechunkStoreResult<Bound<'py, PyList>> {
pyo3_asyncio_0_21::tokio::get_runtime().block_on(async move {
let store = self.store.read().await;
let list = store
.list_refs()
.await?
.into_iter()
.map(|parent| {
let parent = Into::<PyRef>::into(parent);
Python::with_gil(|py| parent.into_py(py))
})
.collect::<Vec<_>>();
Ok(PyList::new_bound(py, list))
})
}

fn async_commit<'py>(
&'py self,
py: Python<'py>,
Expand Down Expand Up @@ -1044,6 +1095,7 @@ fn _icechunk_python(py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PyStoreConfig>()?;
m.add_class::<PySnapshotMetadata>()?;
m.add_class::<PyVirtualRefConfig>()?;
m.add_class::<PyRef>()?;
m.add_function(wrap_pyfunction!(pyicechunk_store_exists, m)?)?;
m.add_function(wrap_pyfunction!(async_pyicechunk_store_exists, m)?)?;
m.add_function(wrap_pyfunction!(pyicechunk_store_create, m)?)?;
Expand Down
9 changes: 8 additions & 1 deletion icechunk/src/zarr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ use crate::{
snapshot::{NodeData, UserAttributesSnapshot},
ByteRange, ChunkOffset, IcechunkFormatError, SnapshotId,
},
refs::{update_branch, BranchVersion, Ref, RefError},
refs::{list_refs, update_branch, BranchVersion, Ref, RefError},
repository::{
get_chunk, ArrayShape, ChunkIndices, ChunkKeyEncoding, ChunkPayload, ChunkShape,
Codec, DataType, DimensionNames, FillValue, Path, RepositoryError,
Expand Down Expand Up @@ -524,6 +524,13 @@ impl Store {
Ok(())
}

/// Lists all references namely branch and tag names.
pub async fn list_refs(&self) -> StoreResult<Vec<Ref>> {
let guard = self.repository.read().await;
let storage = guard.storage().as_ref();
Ok(list_refs(storage).await?)
}

/// Returns the sequence of parents of the current session, in order of latest first.
pub async fn ancestry(
&self,
Expand Down
Loading