Skip to content

v0.3.8 Improved random access for non-numeric columns and duckdb extension

Compare
Choose a tag to compare
@changhiskhan changhiskhan released this 24 Feb 05:06
· 1489 commits to main since this release

You can now query lance datasets outside of python using duckdb! Thanks to @dacort for making the lance extension play nice with duckdb. dbt-duckdb-lance anyone? You can find the extension under integration/duckdb_lance.

We're also very excited to release a very substantial performance optimization for random access for non-numeric columns.
Previously, if you wanted to fetch a string or blob column along with nearest neighbor search results, the non-optimized binary decoder take could add up to 5-20x latency overhead, depending on the sparsity of the indices. In this release we've optimized the take performance so this is basically a free operation.

While most of the work in Rust is completed for filter pushdown, we've had to delay the general release for this feature until we're able to overcome some rough edges making pyarrow compute Expressions play nice with datafusion and sqlparser-rs. It'll be worth the wait though we promise!

Cosine similarity is shipped but the recall performance is lower, due to some issues during index creation. We recommend that you stick with the default L2 distance metric until we address this in the coming few releases.

We'd love to hear from you!

What's Changed

New Contributors

Full Changelog: v0.3.7...v0.3.8