[FEA] Implement a better JNI function to assemble the output columns from `cudf::read_json` #17002

ttnghia · 2024-10-04T18:23:47Z

After reading data with `cudf::read_json`, if a read schema is given, we need to rearrange the columns of the output table so that the final column order matches the column order in the input schema.

Currently, this process can copy many columns (hundreds of them) from the output table of `cudf::read_json` into a structs column, which adds significant overhead. We can do much better by moving the columns instead, so that no data is copied at all.
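To illustrate the idea, here is a minimal sketch of reordering by moving ownership rather than copying. It does not use cudf: `Column`, `ColumnPtr`, and `reorder_by_schema` are hypothetical stand-ins, with `std::unique_ptr<Column>` playing the role of the owning column pointers that a table hands out when it is released (for example via `cudf::table::release()`). The real JNI function would likely also need to handle nested schemas and columns missing from the output, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device column; NOT a cudf type.
struct Column {
  std::vector<int> data;
};

using ColumnPtr = std::unique_ptr<Column>;

// Reorders `columns` (named by `names`) to follow `schema_order`.
// Only ownership is transferred; no column data is copied.
std::vector<ColumnPtr> reorder_by_schema(std::vector<ColumnPtr> columns,
                                         std::vector<std::string> const& names,
                                         std::vector<std::string> const& schema_order)
{
  assert(columns.size() == names.size());

  // Map each column name to its position in the reader output.
  std::unordered_map<std::string, std::size_t> index_of;
  for (std::size_t i = 0; i < names.size(); ++i) { index_of[names[i]] = i; }

  std::vector<ColumnPtr> result;
  result.reserve(schema_order.size());
  for (auto const& name : schema_order) {
    auto const it = index_of.find(name);
    // A moved-from unique_ptr is null, so this also catches duplicate requests.
    if (it == index_of.end() || !columns[it->second]) {
      throw std::invalid_argument("column missing or requested twice: " + name);
    }
    result.push_back(std::move(columns[it->second]));
  }
  return result;
}
```

Because each element is moved, the underlying buffers keep their addresses; the cost is proportional to the number of columns, not to the amount of data they hold.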

ttnghia added the `feature request` (New feature or request) and `Spark` (Functionality that helps Spark RAPIDS) labels Oct 4, 2024

ttnghia self-assigned this Oct 4, 2024

ttnghia mentioned this issue Oct 4, 2024

[FEA] Improve GpuJsonToStructs performance NVIDIA/spark-rapids#11560

Open