Releases · lancedb/lance

22 Mar 02:05

v0.3.17

ec02352

v0.3.17 Support for nested dict columns

A warm welcome to @haoxins, a new contributor who has helped improve the Lance documentation.

This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).

Also included in this release are various vector index improvements for scalability and more progress towards OPQ implementation.

What's Changed

docs: fix the links by @haoxins in #701
repair macos build for duckdb extension by @changhiskhan in #705
filter evaluation with flat search by @changhiskhan in #704
fix flaky test by @changhiskhan in #706
[Bug] Fix transpose in MatrixView.data() by @eddyxu in #711
Refactored variable length encoders by @gsilvestrin in #710
add notebook for q&a bot by @changhiskhan in #707
Allow iteratively train PQ by @eddyxu in #712
Use relative eq and fix a compiling warning by @eddyxu in #714
docs: fix the mod path by @haoxins in #718
Composable vector search pipeline by @eddyxu in #716
Fix CI failure by increasing epsilon for test_train_pq_iteratively by @eddyxu in #719
Implement support for list of Dictionaries by @gsilvestrin in #664

New Contributors

@haoxins made their first contribution in #701

Full Changelog: v0.3.16...v0.3.17

Contributors

eddyxu, gsilvestrin, and 3 other contributors

18 Mar 06:48

changhiskhan

v0.3.16

27d36e8

v0.3.16 Filter pushdown improvements

Welcome to @wangfenjin, a new Lance contributor. Thanks for submitting a bug fix for the Lance DuckDB extension 🔥

This release contains two workarounds for Arrow limitations:

Lance datasets now accept filters such as <field> LIKE '%' and <field> IN (<values>) passed in as strings; generic SQL syntax supported by DataFusion is accepted. This is a break from standard pyarrow Dataset behavior, which accepts only Arrow compute Expressions. Those are not available in Rust, and in Python they do not support introspection, so developers cannot build a custom adapter on top of them.
When Arrow dictionary arrays are concatenated, the dictionary values are duplicated. There are currently no concrete plans to change this behavior in Arrow, so Lance fixes it at write time instead.
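The dictionary duplication described in the second item can be illustrated with a small pure-Python sketch. It models a dictionary-encoded array as a pair of plain lists and ignores nulls; the function names (`naive_concat`, `dedup_concat`, `decode`) are hypothetical, and this is not Lance's or Arrow's actual code, only an illustration of the idea of deduplicating values and remapping indices at write time.

```python
from typing import Dict, List, Sequence, Tuple

# A dictionary-encoded array is modeled as (indices, values):
# each index points into the list of dictionary values.
DictArray = Tuple[List[int], List[str]]


def naive_concat(chunks: Sequence[DictArray]) -> DictArray:
    """Concatenate by appending value lists, so repeated values are duplicated."""
    indices: List[int] = []
    values: List[str] = []
    for chunk_indices, chunk_values in chunks:
        offset = len(values)  # shift indices past the values already appended
        indices.extend(i + offset for i in chunk_indices)
        values.extend(chunk_values)
    return indices, values


def dedup_concat(chunks: Sequence[DictArray]) -> DictArray:
    """Concatenate while keeping one entry per distinct value and remapping indices."""
    indices: List[int] = []
    values: List[str] = []
    position: Dict[str, int] = {}  # value -> index in the merged dictionary
    for chunk_indices, chunk_values in chunks:
        remap = []
        for value in chunk_values:
            if value not in position:
                position[value] = len(values)
                values.append(value)
            remap.append(position[value])
        indices.extend(remap[i] for i in chunk_indices)
    return indices, values


def decode(array: DictArray) -> List[str]:
    """Expand a dictionary-encoded array back into its logical values."""
    indices, values = array
    return [values[i] for i in indices]


a = ([0, 1, 0], ["cat", "dog"])
b = ([1, 0], ["dog", "bird"])

naive = naive_concat([a, b])
dedup = dedup_concat([a, b])
print(naive[1])  # ['cat', 'dog', 'dog', 'bird']
print(dedup[1])  # ['cat', 'dog', 'bird']
```

Both results decode to the same logical values; only the size of the stored dictionary differs.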

What's Changed

Changed encoders to handle multiple Arrays by @gsilvestrin in #681
Train kmeans iteratively by @eddyxu in #688
Changed writers to handle multiple Arrays by @gsilvestrin in #691
Streaming PQ by @eddyxu in #689
[Bug] PQ training generates empty centroids by @eddyxu in #693
Allow append mode even if dataset doesn't already exist by @ananis25 in #690
Support "LIKE" and "IN" in filters by @eddyxu in #696
fix typo by @wangfenjin in #697
Improve indexing performance by @eddyxu in #699
Compute PQ distortion. by @eddyxu in #695
Bugfix for BinaryEncoder positions by @gsilvestrin in #698

New Contributors

@wangfenjin made their first contribution in #697

Full Changelog: v0.3.15...v0.3.16

Contributors

eddyxu, gsilvestrin, and 2 other contributors

16 Mar 06:04

changhiskhan

v0.3.15

d92e77e

v0.3.15 Bug fix for combining vector search and filter predicate

Thanks to @cemoody for the bug report!

What's Changed

Missing column when both nearest and filter are applied by @changhiskhan in #686

Full Changelog: v0.3.14...v0.3.15

Contributors

changhiskhan and cemoody

15 Mar 21:09

changhiskhan

v0.3.14

fa1847b

v0.3.14 Timestamp support

This is a patch release that adds support for Arrow Timestamp type. Thanks @kesavkolla for the bug report!

Thanks to @Renkai, we also have an optimized Take for Boolean arrays.

What's Changed

OPQ rotation matrix training by @eddyxu in #669
Optimize boolean by @Renkai in #676
Support timestamp type by @eddyxu in #684

Full Changelog: v0.3.13...v0.3.14

Contributors

eddyxu, kesavkolla, and Renkai

10 Mar 00:19

changhiskhan

v0.3.13

211dc1e

v0.3.13 Support fast Take for variable length list

What's Changed

update arrow-rs version in duckdb-ext for lance as well by @changhiskhan in #670
Support take operation on List by @eddyxu in #671

Full Changelog: v0.3.12...v0.3.13

Contributors

eddyxu and changhiskhan

08 Mar 22:53

changhiskhan

v0.3.12

70560f6

v0.3.12 Upgrade arrow-rs and bug fixes

Upgraded the arrow-rs dependency to 33.0 (waiting on DataFusion for the 34.0 upgrade).
Nested Dictionary fields are now parsed and written correctly.
More progress towards OPQ implementation.

What's Changed

Matrix mul and transpose by @eddyxu in #661
Recursively set dictionaries in struct fields by @gsilvestrin in #662
Upgrading arrow version to 33.0 by @gsilvestrin in #665
[Rust] sampling over matrix. by @eddyxu in #666
Sorting dataset versions by @gsilvestrin in #668

Full Changelog: v0.3.11...v0.3.12

Contributors

eddyxu and gsilvestrin

07 Mar 05:59

changhiskhan

v0.3.11

7e1471c

v0.3.11 Bug fix release

Bug fix for reading variable length list arrays (welcome @gsilvestrin).

We're working on Windows support (welcome to @dnsco) and an OPQ implementation for the vector index, so stay tuned!

What's Changed

Windows support by @dnsco in #651
Trigger CI when workflow changes by @changhiskhan in #653
Compute SVD by @eddyxu in #658
Fix offsets when reading arrays by @gsilvestrin in #657

New Contributors

@dnsco made their first contribution in #651
@gsilvestrin made their first contribution in #657

Full Changelog: v0.3.10...v0.3.11

Contributors

eddyxu, dnsco, and 2 other contributors

01 Mar 05:35

changhiskhan

v0.3.10

da58e4d

v0.3.10 Easier debugging for vector index

You can now choose to bypass the ANN index, even when one is available, and perform vector search using brute force. This helps with debugging ANN results. Note that SIMD still applies during brute-force search.
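As a rough illustration of why a brute-force path helps with debugging, the sketch below computes exact nearest neighbors by scanning every vector and then measures how many of them an approximate result recovered. It is plain Python with hypothetical helper names (`brute_force_knn`, `recall_at_k`) and made-up data; it does not use the Lance API, and the "ANN" ids are hard-coded stand-ins for whatever an index might return.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Exact top-k search: score every vector, keep the k closest as (row id, distance)."""
    scored = ((l2_distance(v, query), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact neighbors that the approximate result also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact.intersection(approx_ids)) / len(exact)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [0.9, 0.1]

exact_ids = [i for i, _ in brute_force_knn(vectors, query, k=2)]
ann_ids = [1, 2]  # hypothetical output of an approximate index

print(exact_ids)                        # [1, 0]
print(recall_at_k(ann_ids, exact_ids))  # 0.5
```

A recall well below 1.0 on a sample of queries suggests the index, rather than the data, is responsible for surprising results.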

What's Changed

[Bug] Fix passing metric type during PQ index building by @eddyxu in #644
[python] Allow user to bypass ANN index and search using brute-force … by @changhiskhan in #645
expand tilde paths in python by @ananis25 in #621
Fix binary encoder handling array buffer slicing by @eddyxu in #649

Full Changelog: v0.3.9...v0.3.10

Contributors

eddyxu, changhiskhan, and ananis25

25 Feb 07:31

changhiskhan

v0.3.9

73786e5

v0.3.9 limited python support for predicate pushdown

By default, pyarrow compute Expressions don't serialize to SQL strings. This patch release enables a limited set of filter pushdowns via Python. Supported syntax:

field references
Operators: > < >= <= = == !=
conjunctions / disjunctions

This enables querying via duckdb without needing to load the whole dataset into memory first.

e.g., duckdb.query("SELECT * FROM dataset WHERE id=5")
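To make the idea of turning a filter expression into a SQL string concrete, here is a minimal sketch covering only the syntax listed above: field references, the comparison operators, and conjunctions / disjunctions. The `Field`, `Compare`, `BoolOp`, and `to_sql` names are invented for illustration; this is not how pyarrow or Lance represent or serialize expressions.

```python
from dataclasses import dataclass
from typing import Union

# Comparison operators from the "Supported syntax" list.
SUPPORTED_OPS = {">", "<", ">=", "<=", "=", "==", "!="}


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Compare:
    field: Field
    op: str
    value: Union[int, float, str]


@dataclass(frozen=True)
class BoolOp:
    op: str  # "AND" (conjunction) or "OR" (disjunction)
    left: "Expr"
    right: "Expr"


Expr = Union[Compare, BoolOp]


def _literal(value: Union[int, float, str]) -> str:
    """Render a Python value as a SQL literal, quoting and escaping strings."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(expr: Expr) -> str:
    """Serialize the hypothetical expression tree into a SQL filter string."""
    if isinstance(expr, Compare):
        if expr.op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator: {expr.op}")
        return f"{expr.field.name} {expr.op} {_literal(expr.value)}"
    if isinstance(expr, BoolOp):
        if expr.op not in ("AND", "OR"):
            raise ValueError(f"unsupported boolean operator: {expr.op}")
        return f"({to_sql(expr.left)} {expr.op} {to_sql(expr.right)})"
    raise TypeError(f"cannot serialize {type(expr).__name__}")


expr = BoolOp(
    "AND",
    Compare(Field("id"), "==", 5),
    Compare(Field("label"), "!=", "cat"),
)
print(to_sql(expr))  # (id == 5 AND label != 'cat')
```

Anything outside the supported set raises an error instead of being silently dropped, which matches the "limited set" framing of this release.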

What's Changed

[Rust] Handle double equals in filter by @eddyxu in #639

Full Changelog: v0.3.8...v0.3.9

Contributors

eddyxu

24 Feb 05:06

changhiskhan

v0.3.8

175bc47

v0.3.8 Improved random access for non-numeric columns and duckdb extension

You can now query lance datasets outside of python using duckdb! Thanks to @dacort for making the lance extension play nice with duckdb. dbt-duckdb-lance anyone? You can find the extension under integration/duckdb_lance.

We're also excited to release a substantial performance optimization for random access on non-numeric columns.
Previously, if you wanted to fetch a string or blob column along with nearest neighbor search results, the non-optimized binary decoder take could add 5-20x latency overhead, depending on the sparsity of the indices. In this release we've optimized the take performance so that this is essentially a free operation.
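For readers unfamiliar with how random access works on variable-length data, the sketch below shows the general offsets-plus-buffer layout used for strings and blobs in columnar formats, and a `take` that reads only the requested rows. The release notes do not describe the actual optimization, so this is an illustration of the access pattern under that assumed layout, with hypothetical `encode` and `take` helpers rather than Lance's decoder.

```python
from typing import List, Sequence, Tuple


def encode(values: Sequence[bytes]) -> Tuple[bytes, List[int]]:
    """Pack values into one contiguous buffer plus an offsets list of length n + 1."""
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value))
    return b"".join(values), offsets


def take(data: bytes, offsets: Sequence[int], indices: Sequence[int]) -> List[bytes]:
    """Fetch only the requested rows: row i spans data[offsets[i]:offsets[i + 1]]."""
    return [data[offsets[i]:offsets[i + 1]] for i in indices]


data, offsets = encode([b"apple", b"", b"kiwi", b"banana"])
print(offsets)                         # [0, 5, 5, 9, 15]
print(take(data, offsets, [3, 0, 1]))  # [b'banana', b'apple', b'']
```

Because each row's byte range is known from two adjacent offsets, the cost of a take depends on the number of requested rows rather than on the size of the column.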

While most of the Rust work for filter pushdown is complete, we've had to delay the general release of this feature until we can smooth out some rough edges in making pyarrow compute Expressions play nice with datafusion and sqlparser-rs. It'll be worth the wait, we promise!

Cosine similarity has shipped, but its recall is lower due to some issues during index creation. We recommend that you stick with the default L2 distance metric until we address this in the next few releases.

We'd love to hear from you!

What's Changed

Update extension for v0.7.0 compatibility by @dacort in #599
Remove -j from DuckDB build script by @changhiskhan in #601
a minor preparatory refactor by @changhiskhan in #598
fix gha duckdb trigger paths by @changhiskhan in #602
Use MetricType to specify the metric / distance compute function by @eddyxu in #600
[Python] Specify metric type in Dataset.create_index by @eddyxu in #603
[Rust] Implement a datafusion phyiscal expr Column that can reads nested columns by @eddyxu in #610
benchmark query performance on 768D vectors by @changhiskhan in #607
Parse sql filter clause to create datafusion physical expression by @eddyxu in #609
Schema exclude fields by @eddyxu in #613
Exec filter during Scan by @eddyxu in #612
workaround to prevent the segfault until we figure out the real problem by @changhiskhan in #616
Improve random access on binary encoding by @eddyxu in #615
[Python] Support filter pushdown from Python Dataset API by @eddyxu in #618
refactor benchmark to use cosine similarity by @changhiskhan in #611
Encoding shared slices of arrays. by @eddyxu in #620
Fix plain encoding by @eddyxu in #622
Fix crash with column projection with ann search by @eddyxu in #624
Relax data type matching float numbers in filter pushdown by @eddyxu in #625
python integration tests for vector index by @changhiskhan in #623
Remove filter pushdown from python api for now by @changhiskhan in #628
PlainDecoder take on boolean values by @eddyxu in #627
remove debug prints by @changhiskhan in #633
Scan node to detect channel close and gracefully break the scan. by @eddyxu in #635

New Contributors

@dacort made their first contribution in #599

Full Changelog: v0.3.7...v0.3.8

Contributors

dacort, eddyxu, and changhiskhan

