Releases: lancedb/lance
v0.10.10: Easier S3 config
Features
- feat: easier and consistent S3 configuration by @wjones127 in #2147
- When using AWS S3, you no longer have to specify the region.
- S3 configuration is now more consistent in how it is picked up. It will now always read explicit storage_options first before looking at environment variables.
- feat: add encoders/decoders for basic types by @westonpace in #2142
- feat: initial reader/writer for the v2 format by @westonpace in #2153
- feat: add take to the v2 schedulers by @westonpace in #2156
- feat: add ef as query parameter by @BubbleCal in #2155
- feat: fast l2 for uint8 by @eddyxu in #2161
- perf: use fast u8 l2 route in hnsw beam search by @eddyxu in #2164
- feat: make substrait optional to allow avoiding libgit2 by @westonpace in #2168
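The lookup order described for #2147 (explicit storage_options first, then environment variables) can be pictured with a small sketch. This is an illustration only, not Lance's actual implementation: the helper name resolve_s3_config and the key-to-variable mapping are assumptions, using the standard AWS environment variable names and the storage_options keys that appear in the v0.10.7 notes.

```python
import os

# Assumed mapping from storage_options keys to standard AWS environment
# variables; Lance's real lookup table may differ.
ENV_FALLBACKS = {
    "region": "AWS_REGION",
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "session_token": "AWS_SESSION_TOKEN",
}


def resolve_s3_config(storage_options=None, env=None):
    """Hypothetical helper: explicit options win, env vars fill the gaps."""
    storage_options = storage_options or {}
    env = os.environ if env is None else env
    resolved = {}
    for key, env_name in ENV_FALLBACKS.items():
        if key in storage_options:
            # An explicit value always takes precedence.
            resolved[key] = storage_options[key]
        elif env_name in env:
            # Fall back to the environment only when nothing was passed.
            resolved[key] = env[env_name]
    return resolved
```

Keys missing from both sources are simply left out of the result, which mirrors the note that the region no longer has to be specified.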
Bug fixes
- fix: set prost to minimum 0.12.2 by @albertlockett in #2167
Other changes
- chore: apply clippy suggestions newly introduced in latest compiler by @westonpace in #2150
Full Changelog: v0.10.9...v0.10.10
v0.10.9 small robustness fix in cloud storage
What's Changed
- feat: support IVF_HNSW_SQ in Python by @BubbleCal in #2149
- feat: add outer retry loop to CloudObjectReader::size by @westonpace in #2151
- docs: fix typo in lance-arrow by @rgbkrk in #2152
New Contributors
Full Changelog: v0.10.8...v0.10.9
v0.10.8 bug fixes, support filter with count rows
What's Changed
- fix: may get lower recall from HNSW with quantization by @BubbleCal in #2145
- feat: add core traits for encoders & decoders in the v2 format by @westonpace in #2141
- feat: support filter in count rows by @eddyxu in #2146
- fix: variable file fragments by @wjones127 in #2148
Full Changelog: v0.10.7...v0.10.8
v0.10.7: major bugfix for drop_columns, storage_options in Python
Bug fixes
❗ There was a bug with drop_columns(). If you've called this on your dataset, you should check whether your dataset was affected by running dataset.validate(). If this raises an error, you can call dataset.delete("false") to force a repair operation on your dataset. Afterward it will work as expected.
- fix: remove data files with all dropped columns by @wjones127 in #2130
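The check-and-repair steps from the drop_columns() note can be wrapped in a small helper. This is a minimal sketch: the function name repair_after_drop_columns is hypothetical, and because the notes do not say which exception type validate() raises, the sketch catches Exception broadly.

```python
def repair_after_drop_columns(dataset):
    """Return True if a repair was triggered, False if the dataset was valid.

    `dataset` is expected to expose validate() and delete(predicate), as the
    release note describes for a Lance dataset.
    """
    try:
        dataset.validate()
        return False
    except Exception:
        # Assumption: any error from validate() means the dataset is affected.
        # The predicate "false" matches no rows, so no data is deleted; per
        # the release note, the call forces a repair operation.
        dataset.delete("false")
        # Re-check so that a failed repair surfaces as an error.
        dataset.validate()
        return True
```

With a real dataset this would be called as repair_after_drop_columns(lance.dataset("s3://bucket/path")), assuming write access to that location.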
New Features
🚀 You can now configure object storage connection in the kwargs of lance.dataset() and lance.write_dataset() with storage_options. For example:
    import lance

    ds = lance.dataset(
        "s3://bucket/path",
        storage_options={
            "region": "us-east-1",
            "access_key_id": "my-access-key",
            "secret_access_key": "my-secret-key",
            "session_token": "my-session-token",
        },
    )
Read more in https://lancedb.github.io/lance/read_and_write.html#object-store-configuration
- feat(python): expose storage options by @wjones127 in #2131
- feat: extend datagen to cover more types by @westonpace in #2138
- feat: add a protobuf file describing encodings by @westonpace in #2137
- feat: add a basic encodings crate by @westonpace in #2139
- feat: support IVF_HNSW_SQ by @BubbleCal in #2136
Other Changes
- chore: expose dynamic projection on fragment API by @chebbyChefNEQ in #2144
Full Changelog: v0.10.6...v0.10.7
v0.10.6 Better fp16 perf in python, fix memory issues in scalar indices
What's Changed
- feat: expose migration check by @wjones127 in #2074
- chore: loading HNSW levels in parallel by @BubbleCal in #2093
- chore: construct the dist table only once while searching by @BubbleCal in #2094
- feat: enable fp16kernels in Mac and x86 Linux Python wheels by @wjones127 in #2098
- feat: support IVF_HNSW index by @BubbleCal in #2080
- perf: create ood dataset in bigann benchmark by @eddyxu in #2084
- chore: fix the time unit in logs by @BubbleCal in #2101
- chore: dynamically detect the schema while shuffling data by @BubbleCal in #2105
- chore: expose find_partition method by @chebbyChefNEQ in #2106
- fix: very low recall on IVF_HNSW by @BubbleCal in #2104
- fix: load_partition return Error when partition_id out of bounds by @LeoReeYang in #2107
- chore: expose query residualization by @chebbyChefNEQ in #2108
- chore: move java core api to sub java module by @LuQQiu in #2115
- docs: update image_to_tensor to to_tensor by @vipul-maheshwari in #2116
- perf: independent parallel building for IVF_HNSW partitions by @BubbleCal in #2109
- chore: add example of IVF_HNSW by @BubbleCal in #2112
- perf: fully building HNSW partitions in parallel by @BubbleCal in #2117
- perf: load HNSW levels in parallel by @BubbleCal in #2111
- chore: drop data after copied to reduce memory footprint by @BubbleCal in #2120
- fix: populate index cache at the end of loading by @chebbyChefNEQ in #2123
- fix: use the fair spill pool instead of the greedy spill pool by @westonpace in #2126
- feat: support create IVF_HNSW_PQ index in Python by @BubbleCal in #2127
- feat: add scalar quantizer by @BubbleCal in #2134
- fix: the HNSW index doesn't respect the refine factor by @BubbleCal in #2122
- feat: add sq storage and transformer by @BubbleCal in #2135
New Contributors
- @LeoReeYang made their first contribution in #2107
- @vipul-maheshwari made their first contribution in #2116
Full Changelog: v0.10.5...v0.10.6
v0.10.5 fix panic when reading datasets
Fixes a potential panic when reading a fragment that had multiple data files.
What's Changed
- chore: expose internal index APIs by @chebbyChefNEQ in #2082
- chore: expose prefilter traits by @chebbyChefNEQ in #2083
- feat: add java fragment create by @LuQQiu in #2081
- chore: enable codecov by @chebbyChefNEQ in #2088
- perf: add a set of benchmark dataset by @eddyxu in #2090
- perf: text2image benchmark by @eddyxu in #2091
- docs: add llm training example by @tanaymeh in #2087
- fix: only read the data file's fields from the page table and not the whole dataset's fields by @westonpace in #2095
New Contributors
Full Changelog: v0.10.4...v0.10.5
v0.10.4 Faster merge insert, fix for compaction race condition
What's Changed
- fix: assume default rust toolchain as stable by @kerryeon in #2055
- chore: extend JNI to get strings by @eddyxu in #2047
- chore: bump datafusion version by @universalmind303 in #2035
- chore: skip empty batch while chunking batches by @BubbleCal in #2026
- chore: utility to convert RecordBatchStream to FFI_ArrowArrayStream by @eddyxu in #2065
- feat: use a scalar index, if available, during a merge insert operation by @westonpace in #1987
- chore: write HNSW partitions by @BubbleCal in #2056
- chore: add struct for parameters of HNSW by @BubbleCal in #2057
- docs: add llm dataset creation example by @tanaymeh in #2060
- fix: update merge_insert code to use latest df version by @westonpace in #2071
- feat: support filter while searching in HNSW by @BubbleCal in #2058
- feat(java): fragment reader by @eddyxu in #2072
- fix: force fragments to be stored in the manifest in id-order by @westonpace in #2075
- chore: building IVF_HNSW index by @BubbleCal in #2066
- fix: fix bug in indexed merge insert where new data could cause merge insert to panic by @westonpace in #2076
- feat: emit warnings if f16 kernels not built by @westonpace in #2077
New Contributors
Full Changelog: v0.10.3...v0.10.4
v0.10.3 Temporal scalar indices and low RAM scalar index training
What's Changed
- ci: fix compilers for release by @wjones127 in #2032
- fix: use None not zero for limit default by @wjones127 in #2033
- fix: stronger numeric guarantees for distance kernels by @wjones127 in #2013
- feat(java): dataset get fragments by @eddyxu in #2034
- feat: config plumbing for vector benchmark framework by @chebbyChefNEQ in #2036
- fix: pin away from pyarrow 15.0.1 as it is causing segmentation fault in tests by @westonpace in #2045
- chore: upgrade chrono and fix deprecation warnings by @eddyxu in #2048
- feat: add support for scalar indices on temporal columns by @westonpace in #1968
- feat: use out-of-core sort to train btree indices by @westonpace in #2043
- chore: store ivf in arrow schema by @eddyxu in #2053
- chore: on disk pq storage by @eddyxu in #2049
Full Changelog: v0.10.2...v0.10.3
v0.10.2 Cosine, HNSW, f16 bug fixes
What's Changed
- chore(java): provide conversion utilities between jni objects and rust objects by @eddyxu in #2009
- perf: avoid re-calculating the distances while building HNSW by @BubbleCal in #2010
- chore: select less but enough neighbors to establish edges by @BubbleCal in #2011
- fix: fp16 kernels computed in f32 by @wjones127 in #1990
- chore: add fixture test for IVF index by @chebbyChefNEQ in #2014
- perf: greedy search for finding entry point by @BubbleCal in #2012
- chore: normalize transform by @eddyxu in #2017
- refactor: refactor ivf into transformer by @eddyxu in #2023
- docs: fix typo by @krlmlr in #2025
- chore: fix cosine residual calculation by @eddyxu in #2015
- feat: add recall report notebook by @chebbyChefNEQ in #2018
- ci: fix nightly build on apple silicon GHA by @eddyxu in #2024
- chore: update HNSW example by @BubbleCal in #2016
New Contributors
Full Changelog: v0.10.1...v0.10.2
v0.10.1 jvm support poc and fix bug with selecting nested field
What's Changed
- feat: implement java bindings by @beinan in #1928
- chore: simplify output schema in scanner by @chebbyChefNEQ in #1999
- fix: escape column names correctly by @chebbyChefNEQ in #2007
- fix: crate publish by @chebbyChefNEQ in #2008
New Contributors
Full Changelog: v0.10.0...v0.10.1