Releases: lancedb/lance
v0.11.0
What's Changed
Breaking Changes 🛠
- feat(rust)!: use BoxedError in Error::IO by @broccoliSpicy in #2329
New Features 🎉
- feat: add v2 support to fragment merge / update paths by @westonpace in #2311
- feat: add priority to I/O scheduler by @westonpace in #2315
- feat: add take_rows operation to the v2 file reader's python bindings by @westonpace in #2331
- feat: added example for reading and writing dataset in rust by @raunaks13 in #2349
- feat: new HNSW implementation by @BubbleCal in #2353
- feat: add fragment take / fixed-size-binary support to v2 format by @westonpace in #2354
Bug Fixes 🐛
- fix: recognize a simple expression like 'is_foo' as a scalar index query by @westonpace in #2356
- fix: rework list encoder to handle list-struct by @westonpace in #2344
- fix: minor bug fixes for v2 by @westonpace in #2361
Documentation 📚
- docs: clarify comments in table.proto -> message DataFragment -> physical_rows by @broccoliSpicy in #2346
Performance Improvements 🚀
- perf: use the file metadata cache in scalar indices by @westonpace in #2330
Other Changes
- chore: remove `m_max` and `use_heuristic` params from HNSW builder by @BubbleCal in #2336
- fix(java): fix JNI jar loader issue by @LuQQiu in #2340
- ci: fix labeler permissions by @wjones127 in #2348
- fix: rework decoding to fix bugs in nested struct decoding by @westonpace in #2337
New Contributors
- @broccoliSpicy made their first contribution in #2346
- @raunaks13 made their first contribution in #2349
Full Changelog: v0.10.18...v0.11.0
v0.10.18
What's Changed
New Features 🎉
- feat: don't load all list items before returning a batch by @westonpace in #2262
- feat(java): support dataset/fragment scan with string filter, columns filter by @LuQQiu in #2266
- feat: support storage options in write fragments by @eddyxu in #2289
- feat: ray sink takes storage options parameter by @eddyxu in #2293
- feat(python): support auto conversion between fixed size int array to pytorch tensor by @eddyxu in #2294
- feat: add experimental limit variables for GCS by @wjones127 in #2295
- feat: add support for large string & large binary to the v2 encoder & decoder by @westonpace in #2297
- feat: hamming distance by @eddyxu in #2110
- feat: add projection support to the v2 format by @westonpace in #2296
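For readers unfamiliar with the metric named in the "hamming distance" entry above, here is a minimal sketch of what it computes over bit-packed vectors. It is pure Python for illustration only; the function name `hamming_distance` is hypothetical and this is not Lance's implementation or API.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length bit-packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits per byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# 0b1010 and 0b0110 differ in the two highest of their four low bits.
distance = hamming_distance(bytes([0b1010]), bytes([0b0110]))
```

Here `distance` is 2. A production implementation would typically operate on wider words and use hardware popcount rather than per-byte string conversion.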
Bug Fixes 🐛
- fix: fix some corner cases that can arise in v2 list encoding by @westonpace in #2291
- fix: flush after writing pages so that we don't yield to the user with in-progress writes by @westonpace in #2298
- fix: fix a panic that could happen when scanning only the row id from fragment with deleted rows by @westonpace in #2302
- fix: relax 'take out of bounds' check which could cause failure if flat searching deleted rows by @westonpace in #2314
Documentation 📚
- docs: update quickstart doc for supporting IVF_HNSW_* by @BubbleCal in #2301
Performance Improvements 🚀
- perf: re-enable late materialization on full scans by @westonpace in #2290
Other Changes
- perf: do HNSW search with threads of CPU runtime by @BubbleCal in #2251
- chore: add ossrh plugins to jni package by @eddyxu in #2265
Full Changelog: v0.10.17...v0.10.18
v0.10.18-beta.1
What's Changed
New Features 🎉
- feat: don't load all list items before returning a batch by @westonpace in #2262
- feat(java): support dataset/fragment scan with string filter, columns filter by @LuQQiu in #2266
- feat: support storage options in write fragments by @eddyxu in #2289
- feat: ray sink takes storage options parameter by @eddyxu in #2293
- feat(python): support auto conversion between fixed size int array to pytorch tensor by @eddyxu in #2294
- feat: add experimental limit variables for GCS by @wjones127 in #2295
- feat: add support for large string & large binary to the v2 encoder & decoder by @westonpace in #2297
- feat: hamming distance by @eddyxu in #2110
Bug Fixes 🐛
- fix: fix some corner cases that can arise in v2 list encoding by @westonpace in #2291
- fix: flush after writing pages so that we don't yield to the user with in-progress writes by @westonpace in #2298
- fix: fix a panic that could happen when scanning only the row id from fragment with deleted rows by @westonpace in #2302
Performance Improvements 🚀
- perf: re-enable late materialization on full scans by @westonpace in #2290
Other Changes
- perf: do HNSW search with threads of CPU runtime by @BubbleCal in #2251
Full Changelog: v0.10.17...v0.10.18-beta.1
v0.10.17 Batch size bug fix, document spilling workaround
What's Changed
New Features 🎉
- feat: add nullability and u64 support to v2 list codec by @westonpace in #2255
Bug Fixes 🐛
- fix: don't use pushdown scan if a custom batch size has been set by @westonpace in #2277
Documentation 📚
- docs: document LANCE_BYPASS_SPILLING environment variable by @alexkohler in #2276
Full Changelog: v0.10.16...v0.10.17
v0.10.16
Bug fixes
- fix: use 1/ln(m) as the `mL` value for better performance by @BubbleCal in #2239
- fix: various fixes to ensure that the v2 writer is used by the ray sink if requested by @westonpace in #2248
- fix: bugs in list codec when dealing with a list of empty lists or multiple lists in a single page by @westonpace in #2222
- fix: creating IVF_HNSW index fails with cosine metric type by @BubbleCal in #2235
- fix: build & search causes panic with empty partition for IVF_HNSW by @BubbleCal in #2172
- fix: clippy on main by @chebbyChefNEQ in #2254
- fix: copy arrays when placing them in the v2 writer's accumulation queue by @westonpace in #2249
- fix: out of bound panics in take rows by @chebbyChefNEQ in #2259
- fix: allow for all-null pages in btree by @westonpace in #2245
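As background for the `mL` fix listed above: in the standard HNSW formulation, `mL` is the level multiplier used when assigning a new node to a layer, with the layer drawn as floor(-ln(u) * mL) for uniform u. The sketch below illustrates that rule with mL = 1/ln(m). It is an assumption-laden illustration, not Lance's Rust code, and the helper names `level_multiplier` and `sample_level` are hypothetical.

```python
import math
import random


def level_multiplier(m: int) -> float:
    """Return mL = 1 / ln(m), where m is the per-node connection count."""
    if m < 2:
        raise ValueError("m must be at least 2")
    return 1.0 / math.log(m)


def sample_level(m: int, rng: random.Random) -> int:
    """Draw a layer index as floor(-ln(u) * mL) with u uniform in (0, 1]."""
    u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1]
    return int(-math.log(u) * level_multiplier(m))


rng = random.Random(0)
levels = [sample_level(16, rng) for _ in range(10_000)]
# Fraction of nodes that live only on the base layer.
share_on_base = levels.count(0) / len(levels)
```

With this choice of `mL`, a node lands on level 0 only with probability 1 - 1/m, so each higher layer holds roughly 1/m as many nodes as the one below it.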
New Features
- feat: add use_experimental_writer to write_fragments API by @westonpace in #2226
- feat: use v2 writer api in ray data sink by @eddyxu in #2231
- feat(java): add LanceOperation, commitAppend for batch write by @LuQQiu in #2207
- feat(python): expose FragmentMetadata attrs and compaction repr by @wjones127 in #2191
- feat: update arrow to 51, datafusion to 37 by @westonpace in #2240
- feat: improvement of Ray sink API by @eddyxu in #2237
- feat: forbid creating index if num_sub_vectors doesn't divide dim by @BubbleCal in #2234
- feat: add create_index to vector index extensions by @chebbyChefNEQ in #2250
- feat: support delta merge for IVF_HNSW_SQ by @BubbleCal in #2132
- feat(java): add version latestVersion api by @LuQQiu in #2238
- feat: raise GCS object size limit to 2.5TB by @wjones127 in #2261
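The `num_sub_vectors` entry above refers to a constraint of product quantization: each vector is split into equal-length sub-vectors, so the sub-vector count must divide the dimension evenly. A minimal sketch of such a check follows. The function name `sub_vector_length` and the error messages are hypothetical and do not reflect Lance's actual validation code.

```python
def sub_vector_length(dim: int, num_sub_vectors: int) -> int:
    """Return the length of each sub-vector, or raise if dim cannot be split evenly."""
    if num_sub_vectors <= 0:
        raise ValueError("num_sub_vectors must be positive")
    if dim % num_sub_vectors != 0:
        raise ValueError(
            f"num_sub_vectors ({num_sub_vectors}) must divide dim ({dim})"
        )
    return dim // num_sub_vectors


# A 128-dimensional vector splits cleanly into 16 sub-vectors of length 8.
length = sub_vector_length(128, 16)
```

Rejecting the configuration up front gives a clear error at index creation time instead of a failure partway through training.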
Performance Improvements
- perf: concurrent building for HNSW graph by @BubbleCal in #2210
- perf: v2 fsl decode perf fix and some benchmarking utilities by @westonpace in #2214
- perf: avoid copying of creating memory dist calculator by @BubbleCal in #2219
- perf: keep less edges in non-base levels to improve greedy search performance by @BubbleCal in #2244
Other changes
- chore: remove --preview from ruff invocations by @westonpace in #2220
- chore: fix formatting in test_schema.py by @westonpace in #2227
- chore: bump to ruff 0.4.1 by @eddyxu in #2229
- chore: bump minimal python version to 39 by @eddyxu in #2230
- chore: use VectorStorage in HNSWBuilder by @chebbyChefNEQ in #2242
- test: test write methods against more varied layouts by @wjones127 in #2233
Full Changelog: v0.10.15...v0.10.16
v0.10.15: fix for LanceSchema pickling and empty directories
What's Changed
- fix: pickle LanceSchema nested fields correctly by @wjones127 in #2223
- fix: do not create directory when opening a non-existing dataset by @eddyxu in #2215
- perf: build HNSW and quantization storage in parallel by @BubbleCal in #2196
- chore: remove extend_candidates option by @BubbleCal in #2217
- perf: avoid copying every vector multiple times during building by @BubbleCal in #2216
- chore: search for enough candidates to build connections by @BubbleCal in #2218
- feat: add support v2 fragments on the scan path by @westonpace in #2213
- feat: allow loading extension vector index by @chebbyChefNEQ in #2221
Full Changelog: v0.10.14...v0.10.15
v0.10.14: Fixes for schema evolution
What's Changed
- fix: llm pre-training docs by @tanaymeh in #2205
- refactor: rework fragment API in preparation for lance v2 by @westonpace in #2194
- feat: add v2 writer to the write dataset path by @westonpace in #2206
- refactor: migrate scan path from v1-only to generic reader by @westonpace in #2201
- feat: replace `LanceFragment.add_columns` with `merge_columns` by @wjones127 in #2208
- feat: support complex schemas in append by @wjones127 in #2209
Full Changelog: v0.10.13...v0.10.14
v0.10.13
What's Changed
- feat: add FragmentWriter and Commiter for streaming ray write by @eddyxu in #2190
- perf: optimize construction & search for HNSW by @BubbleCal in #2193
- feat(java): add lance jni create empty dataset, create fragment, and get schema by @LuQQiu in #2175
- refactor: move existing utilities from file reader into fragment-level operations by @westonpace in #2197
- feat: more validation of fragments by @wjones127 in #2203
- feat: debug functions by @wjones127 in #2202
Full Changelog: v0.10.12...v0.10.13
v0.10.12 allow spill pool to be bypassed
What's Changed
- feat: add v2 nullability for primitive / fsl by @westonpace in #2169
- feat: add support for binary encoding by @westonpace in #2183
- feat: python binding for HNSW utils by @BubbleCal in #2182
- fix: add env variables to bypass/configure spill pool size by @westonpace in #2189
Full Changelog: v0.10.11...v0.10.12
v0.10.11: Ray data sink, fix for field ids
New features
- feat: lance ray data sink by @eddyxu in #2180
- feat: cast ivf_centroids to appropriate type by @wjones127 in #2097
- feat: add python bindings for the v2 reader/writer by @westonpace in #2158
- feat: mvp for lance version 0.2 reader / writer by @westonpace in #1965
- feat: add string arrays to the set of v2 encodings by @westonpace in #2163
Bug fixes
- fix: handle varying field ids by @wjones127 in #2187
Other changes
- fix: pin h5py version to avoid missing arm wheels by @westonpace in #2178
- fix: select the wrong neighbors by @BubbleCal in #2181
- chore: refine the IVF_HNSW index building path by @BubbleCal in #2162
- ci: full python 3.12 ci by @eddyxu in #2170
- perf: impl heuristic pruning for HNSW by @BubbleCal in #2171
- chore: fix license headers by @westonpace in #2177
- docs: ray integration by @eddyxu in #2185
- fix: fix ray test broken on windows by @westonpace in #2188
Full Changelog: v0.10.10...v0.10.11