Commit
fix: fix ray sink error when there are no data to write (#2919)
Python code to reproduce:

```python
import ray
from lance.ray.sink import LanceDatasink

ray.init()

sink = LanceDatasink("./data.lance")

# The filter drops every row (ids are 0..9), so nothing reaches the sink.
(
    ray.data.range(10)
    .filter(lambda row: row["id"] > 10)
    .map(lambda x: {"id": x["id"], "str": f"str-{x['id']}"})
    .write_datasink(sink)
)
```

When the Lance Ray sink is used to write a Lance file, an empty write, which may be caused by a filter operator in Ray Data, raises this exception:

```
  File "/opt/conda/lib/python3.11/site-packages/ray/data/dataset.py", line 3621, in write_datasink
    datasink.on_write_complete(write_results)
  File "/opt/conda/lib/python3.11/site-packages/lance/ray/sink.py", line 141, in on_write_complete
    op = lance.LanceOperation.Overwrite(schema, fragments)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in __init__
  File "/opt/conda/lib/python3.11/site-packages/lance/dataset.py", line 1962, in __post_init__
    raise TypeError(
TypeError: schema must be pyarrow.Schema, got <class 'NoneType'>
```

The `on_write_complete` function derives `schema` from `fragments`. If there are no fragments, `schema` stays `None`, and constructing `LanceOperation.Overwrite` fails with the `TypeError` above.
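The failure mode can be illustrated without Ray or Lance installed. The sketch below is not the code from this commit: `collect_fragments`, the `commit` callback, and the `(fragment, schema)` tuple layout are hypothetical stand-ins chosen for illustration. It shows one possible guard, skipping the commit when no fragments were written; the actual fix may handle the empty case differently.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Assumed shape (hypothetical): each write task returns a list of
# (fragment, schema) pairs; a task that received no rows returns [].
WriteResult = List[Tuple[Any, Any]]


def collect_fragments(
    write_results: Sequence[WriteResult],
) -> Tuple[List[Any], Optional[Any]]:
    """Flatten per-task results. The schema is taken from the fragments,
    so it stays None when no fragment was written at all."""
    fragments: List[Any] = []
    schema: Optional[Any] = None
    for task_result in write_results:
        for fragment, fragment_schema in task_result:
            fragments.append(fragment)
            schema = fragment_schema
    return fragments, schema


def on_write_complete(
    write_results: Sequence[WriteResult],
    commit: Callable[[Any, List[Any]], None],
) -> bool:
    """Commit the written fragments; return False if there was nothing to commit."""
    fragments, schema = collect_fragments(write_results)
    if not fragments:
        # Nothing was written, so there is no schema to build an operation from.
        # Returning early avoids passing schema=None to the commit step.
        return False
    commit(schema, fragments)
    return True
```

With this guard, a pipeline whose filter removes every row finishes without raising, and the commit callback is never invoked with a `None` schema.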