Releases: lancedb/lance

v0.3.17 Support for nested dict columns

22 Mar 02:05

A warm welcome to @haoxins, a new contributor who has helped improve the Lance documentation.

This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).

This release also includes various vector index improvements for scalability and more progress towards an OPQ implementation.

What's Changed

New Contributors

Full Changelog: v0.3.16...v0.3.17

v0.3.16 Filter pushdown improvements

18 Mar 06:48

Welcome @wangfenjin to the Lance contributors. Thanks for submitting a bug fix for the Lance DuckDB extensions 🔥

This release contains two workarounds for Arrow limitations:

  1. Lance datasets now accept <field> LIKE '%' and <field> IN (<values>) filters passed in as strings. Generic SQL syntax supported by datafusion is now accepted. This is a break from standard pyarrow Dataset behavior, which only accepts an Arrow compute Expression. That type is not available in Rust, and it does not support introspection in Python, so developers cannot build a custom adapter on top of it.

  2. When concatenating Arrow dictionary arrays, the dict values are duplicated. There are currently no concrete plans to change this behavior in Arrow, so we fix it at write time in Lance.
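
To make the second workaround concrete, here is a minimal pure-Python sketch of the idea. It is not Lance's or Arrow's actual code: the (indices, values) pair standing in for a dictionary array, and the helper names naive_concat, dedup_dictionary and decode, are all hypothetical and exist only for illustration. It assumes concatenation appends each chunk's values and shifts its indices, which is what produces the duplicates.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model of a dictionary-encoded array: (indices, values),
# where logical element i is values[indices[i]].
DictArray = Tuple[List[int], List[str]]


def naive_concat(chunks: Sequence[DictArray]) -> DictArray:
    """Append each chunk's values and shift its indices (assumed behavior).

    A value present in more than one chunk ends up stored more than once.
    """
    indices: List[int] = []
    values: List[str] = []
    for chunk_indices, chunk_values in chunks:
        offset = len(values)
        indices.extend(i + offset for i in chunk_indices)
        values.extend(chunk_values)
    return indices, values


def dedup_dictionary(array: DictArray) -> DictArray:
    """Store every distinct value once and remap the indices accordingly."""
    indices, values = array
    position: Dict[str, int] = {}  # value -> index in the deduplicated dictionary
    unique_values: List[str] = []
    remap: List[int] = []  # old dictionary index -> new dictionary index
    for value in values:
        if value not in position:
            position[value] = len(unique_values)
            unique_values.append(value)
        remap.append(position[value])
    return [remap[i] for i in indices], unique_values


def decode(array: DictArray) -> List[str]:
    """Expand a dictionary-encoded array back into plain values."""
    indices, values = array
    return [values[i] for i in indices]


a: DictArray = ([0, 1, 0], ["cat", "dog"])
b: DictArray = ([1, 0], ["dog", "bird"])

combined = naive_concat([a, b])       # "dog" is now stored twice
compact = dedup_dictionary(combined)  # same logical data, smaller dictionary
```

Both combined and compact decode to the same logical values; only the stored dictionary shrinks.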

What's Changed

New Contributors

Full Changelog: v0.3.15...v0.3.16

v0.3.15 Bug fix for combining vector search and filter predicate

16 Mar 06:04

Thanks to @cemoody for the bug report!

What's Changed

Full Changelog: v0.3.14...v0.3.15

v0.3.14 Timestamp support

15 Mar 21:09

This is a patch release that adds support for the Arrow Timestamp type. Thanks @kesavkolla for the bug report!

Thanks to @Renkai, we also have an optimized Take for Boolean arrays.

What's Changed

Full Changelog: v0.3.13...v0.3.14

v0.3.13 Support fast Take for variable length list

10 Mar 00:19

What's Changed

Full Changelog: v0.3.12...v0.3.13

v0.3.12 Upgrade arrow-rs and bug fixes

08 Mar 22:53
70560f6
  • Upgraded the arrow-rs dependency to 33.0 (waiting on datafusion for the 34.0 upgrade).
  • Nested Dictionary fields are now parsed and written correctly.
  • More progress towards OPQ implementation.

What's Changed

Full Changelog: v0.3.11...v0.3.12

v0.3.11 Bug fix release

07 Mar 05:59

Bug fix for reading variable length list arrays (welcome @gsilvestrin).

We're working on Windows support (welcome to @dnsco) and an OPQ implementation for the vector index, so stay tuned!

What's Changed

New Contributors

Full Changelog: v0.3.10...v0.3.11

v0.3.10 Easier debugging for vector index

01 Mar 05:35

You can now choose to bypass the ANN index, even if one is available, and perform vector search using brute force. This helps with debugging ANN results. Note that SIMD still applies during brute-force search.
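
As a rough illustration of why a brute-force path helps with debugging, the sketch below computes exact nearest neighbors in plain Python and measures how many of them an approximate result recovered. It does not call the Lance API; the function names and the sample ANN ids are hypothetical, and L2 distance is assumed.

```python
from typing import List, Sequence


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance; it ranks neighbors the same way as L2 itself."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[int]:
    """Return the row ids of the k exact nearest neighbors of the query."""
    order = sorted(range(len(vectors)), key=lambda i: l2_squared(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact neighbors that the approximate result also found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
query = [0.9, 0.1]

exact = brute_force_knn(vectors, query, k=2)

# Hypothetical ids that an ANN index might return for the same query.
ann_ids = [1, 2]
recall = recall_at_k(ann_ids, exact)
```

A recall well below 1.0 on a sample of queries suggests the index, rather than the data, is responsible for surprising results.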

What's Changed

  • [Bug] Fix passing metric type during PQ index building by @eddyxu in #644
  • [python] Allow user to bypass ANN index and search using brute-force … by @changhiskhan in #645
  • expand tilde paths in python by @ananis25 in #621
  • Fix binary encoder handling array buffer slicing by @eddyxu in #649

Full Changelog: v0.3.9...v0.3.10

v0.3.9 limited python support for predicate pushdown

25 Feb 07:31

By default, pyarrow compute Expressions don't serialize to SQL strings. This patch release enables a limited set of filter pushdowns via Python. Supported syntax:

  1. field references
  2. Operators: > < >= <= = == !=
  3. conjunctions / disjunctions

This enables querying via duckdb without needing to load the whole dataset into memory first.

e.g., duckdb.query("SELECT * FROM dataset WHERE id=5")
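
To show what serializing such a limited filter to a SQL string can look like, here is a small self-contained sketch. It is not how Lance or pyarrow implement it: the Field, Compare and BoolOp classes and the to_sql function are hypothetical, and rewriting == as = is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Field:
    """A field reference, e.g. the column `id`."""
    name: str


@dataclass(frozen=True)
class Compare:
    """A comparison between a field and a literal."""
    left: Field
    op: str
    right: Union[int, float, str]


@dataclass(frozen=True)
class BoolOp:
    """A conjunction ("AND") or disjunction ("OR") of two expressions."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Compare, BoolOp]

COMPARISON_OPS = {">", "<", ">=", "<=", "=", "==", "!="}


def literal_to_sql(value: Union[int, float, str]) -> str:
    """Quote strings (doubling embedded single quotes); print numbers as-is."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(expr: Expr) -> str:
    """Serialize the hypothetical expression tree to a SQL filter string."""
    if isinstance(expr, Compare):
        if expr.op not in COMPARISON_OPS:
            raise ValueError(f"unsupported operator: {expr.op}")
        op = "=" if expr.op == "==" else expr.op  # assumption: normalize == to =
        return f"{expr.left.name} {op} {literal_to_sql(expr.right)}"
    if isinstance(expr, BoolOp):
        if expr.op not in ("AND", "OR"):
            raise ValueError(f"unsupported boolean operator: {expr.op}")
        return f"({to_sql(expr.left)}) {expr.op} ({to_sql(expr.right)})"
    raise TypeError(f"unsupported expression: {expr!r}")


expr = BoolOp(
    "AND",
    Compare(Field("id"), "==", 5),
    Compare(Field("label"), "!=", "cat"),
)
sql = to_sql(expr)
```

Anything outside the three supported categories raises an error in this sketch instead of being silently dropped, so a filter is never partially applied.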

What's Changed

  • [Rust] Handle double equals in filter by @eddyxu in #639

Full Changelog: v0.3.8...v0.3.9

v0.3.8 Improved random access for non-numeric columns and duckdb extension

24 Feb 05:06

You can now query lance datasets outside of python using duckdb! Thanks to @dacort for making the lance extension play nice with duckdb. dbt-duckdb-lance anyone? You can find the extension under integration/duckdb_lance.

We're also very excited to release a substantial performance optimization for random access on non-numeric columns.
Previously, if you wanted to fetch a string or blob column along with nearest neighbor search results, the non-optimized binary decoder take could add 5-20x latency overhead, depending on the sparsity of the indices. In this release we've optimized the take performance so this is basically a free operation.

While most of the Rust work for filter pushdown is complete, we've had to delay the general release of this feature until we overcome some rough edges in making pyarrow compute Expressions play nice with datafusion and sqlparser-rs. It'll be worth the wait, though, we promise!

Cosine similarity has shipped, but its recall is lower because of some issues during index creation. We recommend that you stick with the default L2 distance metric until we address this in the coming few releases.
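
For readers who need cosine-style ranking in the meantime, one general property may be useful: on unit-normalized vectors, squared L2 distance equals 2 - 2 * cosine similarity, so both metrics rank neighbors identically. The sketch below checks this in plain Python. It is an illustration of the math only, not a workaround prescribed by these release notes, and all names in it are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


vectors = [[3.0, 4.0], [1.0, 0.0], [-2.0, 1.0], [0.5, 0.5]]
query = [1.0, 1.0]

# Rank by cosine similarity on the raw vectors (most similar first).
by_cosine = sorted(
    range(len(vectors)), key=lambda i: -cosine_similarity(vectors[i], query)
)

# Rank by L2 distance after normalizing both the data and the query.
unit_vectors = [normalize(v) for v in vectors]
unit_query = normalize(query)
by_l2 = sorted(
    range(len(vectors)), key=lambda i: l2_squared(unit_vectors[i], unit_query)
)
```

Here by_cosine and by_l2 contain the same row ids in the same order.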

We'd love to hear from you!

What's Changed

New Contributors

Full Changelog: v0.3.7...v0.3.8