v0.8.19: Stats-aware scanning, faster GPU index training, and prefiltering bug fixes

westonpace released this 06 Dec 14:18


New Features

feat: add a tensor dataset that shares the same behavior as the Lance torch Dataset by @eddyxu in #1679
feat: add option to pass in precomputed row_id -> ivf partition mapping and compute partition on GPU by @chebbyChefNEQ in #1680
feat: add batch buffering and async loading to torch.LanceDataset by @chebbyChefNEQ in #1687
feat: optimized pushdown scanner by @wjones127 in #1328
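
For readers unfamiliar with the idea behind a stats-aware pushdown scan, the sketch below illustrates it in plain Python. It is not Lance's implementation and uses no Lance API: the `Page` class, `make_page`, and `scan_greater_than` are hypothetical names introduced only for illustration. The assumption is that each page of a column stores min/max statistics, so a scan with a filter such as `value > threshold` can skip any page whose maximum already rules out a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical page of column data with precomputed statistics."""
    values: List[int]
    min_value: int
    max_value: int


def make_page(values: List[int]) -> Page:
    # Statistics are computed once, when the page is written.
    return Page(values=values, min_value=min(values), max_value=max(values))


def scan_greater_than(pages: List[Page], threshold: int) -> Tuple[List[int], int]:
    """Return (matching values, number of pages actually read)."""
    matches: List[int] = []
    pages_read = 0
    for page in pages:
        if page.max_value <= threshold:
            # The statistics prove no row in this page can match: skip it.
            continue
        pages_read += 1
        matches.extend(v for v in page.values if v > threshold)
    return matches, pages_read


pages = [make_page([1, 2, 3]), make_page([4, 5, 6]), make_page([7, 8, 9])]
matches, pages_read = scan_greater_than(pages, 6)
print(matches, pages_read)
```

In this toy setup only the last page is read, because the first two pages have maxima at or below the threshold.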

Bug Fixes

fix: don't use scalar indices unless we are prefiltering by @westonpace in #1678
fix: lance pytorch dataset parameter to load with row_id by @eddyxu in #1676
fix: make sure to prefilter the flat portion of a combined knn by @westonpace in #1583
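
Two of the fixes above concern prefiltering. As a rough illustration of why the distinction matters, the sketch below contrasts prefiltering with postfiltering in a brute-force (flat) nearest-neighbor search. It is plain Python and does not use Lance; the row layout and the function names `knn_prefilter` and `knn_postfilter` are assumptions made for this example only.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_postfilter(rows: List[Dict], query, k: int, predicate: Callable) -> List[int]:
    # Find the k nearest rows first, then drop those failing the filter.
    # This can return fewer than k results, or none at all.
    nearest = heapq.nsmallest(k, rows, key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in nearest if predicate(r)]


def knn_prefilter(rows: List[Dict], query, k: int, predicate: Callable) -> List[int]:
    # Apply the filter first, then search only among the surviving rows.
    candidates = [r for r in rows if predicate(r)]
    nearest = heapq.nsmallest(k, candidates, key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in nearest]


rows = [
    {"id": 0, "vec": (0.0, 0.0), "keep": False},
    {"id": 1, "vec": (1.0, 0.0), "keep": False},
    {"id": 2, "vec": (2.0, 0.0), "keep": True},
    {"id": 3, "vec": (3.0, 0.0), "keep": True},
]
query = (0.0, 0.0)
keep = lambda r: r["keep"]

print(knn_postfilter(rows, query, 2, keep))
print(knn_prefilter(rows, query, 2, keep))
```

Here postfiltering returns nothing because both of the two nearest rows fail the filter, while prefiltering returns the two nearest rows that pass it. A search that prefilters one part of the data but not another would mix these two behaviors in a single result set.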

Performance Improvements

perf: use datafusion to shuffle index partition data by @wjones127 in #1645

Other Changes

chore: add utility to compute ground truth for benchmarks by @eddyxu in #1668
chore: add new python benchmarks for testing scalar indices by @westonpace in #1658

Full Changelog: v0.8.18...v0.8.19

Contributors

eddyxu, westonpace, and 2 other contributors
