TPC-H benchmarks should create parquet/feather files with multiple row groups #34

westonpace · 2021-09-17T11:45:34Z

Right now, generating TPC-H data produces a single huge row group / record batch containing all of the data. Arrow should be able to handle that reasonably well, but currently it doesn't, and one giant row group is probably not a realistic scenario anyway. We could instead split the data into row groups of 1M rows each. The writers should have options to control row group / record batch size even when the input to the writer is one huge table.
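For illustration, a row group size option on a writer largely comes down to slicing one big table into fixed-size pieces. The sketch below is plain Python and only computes the boundaries; the helper name `row_group_bounds` is hypothetical (not an Arrow API), and the 1M default is just the size suggested above.

```python
from typing import List, Tuple


def row_group_bounds(num_rows: int, row_group_size: int = 1_000_000) -> List[Tuple[int, int]]:
    """Hypothetical helper: split num_rows into (offset, length) chunks.

    Every chunk has exactly row_group_size rows except possibly the last,
    which holds the remainder.
    """
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    bounds = []
    for offset in range(0, num_rows, row_group_size):
        # The final chunk may be shorter than row_group_size.
        length = min(row_group_size, num_rows - offset)
        bounds.append((offset, length))
    return bounds
```

In pyarrow, each `(offset, length)` pair could be passed to `Table.slice(offset, length)`. Note that `pyarrow.parquet.write_table` already accepts a `row_group_size` argument and `pyarrow.feather.write_feather` a `chunksize` argument, which may be enough for the data generator without any custom slicing.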
