[FEA] Implement a better JNI function to assemble the output columns from cudf::read_json #17002

Open
ttnghia opened this issue Oct 4, 2024 · 0 comments
Labels: feature request (New feature or request), Spark (Functionality that helps Spark RAPIDS)

ttnghia commented Oct 4, 2024

After reading data with cudf::read_json, if a read schema is given, we need to rearrange the columns of the output table so that the final column order matches the column order given in the input schema.

Currently, this process can copy a large number of columns (hundreds of them) from the output table of cudf::read_json into a structs column, which adds significant overhead. We can do much better by moving the columns instead, so that no data is copied at all.
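To illustrate the idea, here is a minimal, self-contained sketch of reordering by moving ownership rather than copying. It is not cudf code: `Column`, `ColumnPtr`, and `reorder_by_schema` are hypothetical stand-ins, and a real implementation would operate on the `std::unique_ptr<cudf::column>` objects owned by the output table. It also assumes a flat schema with unique column names.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device column; this is NOT a cudf type.
struct Column {
  std::vector<int> data;
};

using ColumnPtr = std::unique_ptr<Column>;

// Reorder `columns` (named by `names`) to match `schema_order` by moving
// ownership of each column. Only pointers change hands; no column data is copied.
// Assumption: names are unique and the schema is flat (no nested children).
std::vector<ColumnPtr> reorder_by_schema(std::vector<ColumnPtr>&& columns,
                                         std::vector<std::string> const& names,
                                         std::vector<std::string> const& schema_order)
{
  if (columns.size() != names.size()) {
    throw std::invalid_argument("columns and names must have the same size");
  }

  // Map each column name to its position in the reader output.
  std::unordered_map<std::string, std::size_t> index_of;
  for (std::size_t i = 0; i < names.size(); ++i) {
    index_of.emplace(names[i], i);
  }

  std::vector<ColumnPtr> result;
  result.reserve(schema_order.size());
  for (auto const& name : schema_order) {
    auto const it = index_of.find(name);
    // A null slot means the column was already moved out (requested twice).
    if (it == index_of.end() || !columns[it->second]) {
      throw std::invalid_argument("column missing or requested twice: " + name);
    }
    result.push_back(std::move(columns[it->second]));
  }
  return result;
}
```

Columns that the schema does not request are left in the input vector and are freed when the caller destroys it. The reordered vector can then be handed to whatever builds the final structs column, again by move.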

ttnghia added the feature request and Spark labels on Oct 4, 2024
ttnghia self-assigned this on Oct 4, 2024