v0.8.19: Stats-aware scanning, faster GPU index training, and prefiltering bug fixes
New Features
- feat: add a tensor dataset that shares the same behavior as the Lance torch Dataset by @eddyxu in #1679
- feat: add option to pass in a precomputed row_id -> IVF partition mapping and compute partitions on GPU by @chebbyChefNEQ in #1680
- feat: add batch buffering and async loading to torch.LanceDataset by @chebbyChefNEQ in #1687
- feat: optimized pushdown scanner by @wjones127 in #1328
Bug Fixes
- fix: don't use scalar indices unless we are prefiltering by @westonpace in #1678
- fix: lance pytorch dataset parameter to load with row_id by @eddyxu in #1676
- fix: make sure to prefilter the flat portion of a combined knn by @westonpace in #1583
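
For readers unfamiliar with the term, the sketch below illustrates the general difference between prefiltering and postfiltering in a flat (brute-force) kNN search. It is a plain-Python illustration only, not Lance's implementation: the `knn` function, its parameters, and the sample rows are all hypothetical.

```python
import heapq
import math


def knn(query, rows, k, predicate=None, prefilter=True):
    """Flat (brute-force) kNN over rows of (row_id, vector, metadata).

    Hypothetical helper for illustration; not part of the Lance API.
    """
    def nearest(candidates):
        # Pick the k rows whose vectors are closest to the query.
        return heapq.nsmallest(k, candidates, key=lambda r: math.dist(query, r[1]))

    if predicate is None:
        return [r[0] for r in nearest(rows)]

    if prefilter:
        # Prefilter: drop non-matching rows first, then search what is left.
        matching = [r for r in rows if predicate(r[2])]
        return [r[0] for r in nearest(matching)]

    # Postfilter: search everything, then drop non-matching results.
    # This can return fewer than k rows.
    return [r[0] for r in nearest(rows) if predicate(r[2])]


rows = [
    (0, (0.0, 0.0), "a"),
    (1, (1.0, 0.0), "b"),
    (2, (2.0, 0.0), "a"),
    (3, (3.0, 0.0), "a"),
]
is_a = lambda label: label == "a"

print(knn((0.0, 0.0), rows, k=2, predicate=is_a, prefilter=True))   # [0, 2]
print(knn((0.0, 0.0), rows, k=2, predicate=is_a, prefilter=False))  # [0]
```

With prefiltering, the filter is applied before the nearest-neighbor search, so the result still contains k matching rows when enough exist. With postfiltering, rows that fail the filter can crowd matching rows out of the top k.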
Performance Improvements
- perf: use datafusion to shuffle index partition data by @wjones127 in #1645
Other Changes
- chore: add utility to compute ground truth for benchmarks by @eddyxu in #1668
- chore: add new python benchmarks for testing scalar indices by @westonpace in #1658
Full Changelog: v0.8.18...v0.8.19