Forward-merge branch-23.08 to branch-23.10 #3783

Merged: 1 commit, Aug 14, 2023

GPUtester
Contributor

Forward-merge triggered by push to branch-23.08 that creates a PR to keep branch-23.10 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

When multiple graphs are created from the same dask_cudf dataframe, a metadata mismatch occurs if one or more partitions are empty. During the second graph creation, which reuses a dask_cudf dataframe that was already used (and modified) by the first, the metadata of partitions holding empty dataframes is not preserved. This happens because the second graph creation reuses a _reference_ to the input dataframe, which was partially modified during the first graph creation.

This PR makes a copy of the input dataframe right after the repartition call so that the caller's dataframe is no longer altered.
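The aliasing problem and the copy-based fix can be illustrated with a minimal pure-Python sketch. It does not use cuGraph or dask_cudf: partitions are modeled as plain dicts, `make_partitions`, `create_graph`, and the `copy_input` flag are hypothetical names, and clearing an empty partition's column list is an assumed stand-in for whatever the real graph-creation code modifies. `copy.deepcopy` plays the role of the dataframe copy taken after repartitioning.

```python
import copy

EXPECTED_META = ["src", "dst", "weight"]


def make_partitions():
    # Hypothetical partitioned edge list; the second partition is empty.
    # Each partition gets its own list of column names ("meta").
    return [
        {"meta": list(EXPECTED_META), "rows": [(0, 1, 1.0), (1, 2, 2.0)]},
        {"meta": list(EXPECTED_META), "rows": []},
    ]


def create_graph(partitions, copy_input=True):
    """Hypothetical graph construction that mutates the partitions it processes.

    With copy_input=True it works on a deep copy (standing in for the copy
    taken right after repartitioning), so the caller's partitions are untouched.
    Returns the number of edges seen.
    """
    if copy_input:
        partitions = copy.deepcopy(partitions)

    # All partitions must agree on their metadata before building the graph.
    metas = {tuple(p["meta"]) for p in partitions}
    if len(metas) != 1:
        raise ValueError("metadata mismatch across partitions")

    num_edges = 0
    for p in partitions:
        if not p["rows"]:
            # Assumed side effect: an empty partition loses its column
            # metadata in place while being processed.
            p["meta"].clear()
        num_edges += len(p["rows"])
    return num_edges


if __name__ == "__main__":
    shared = make_partitions()
    create_graph(shared, copy_input=False)  # succeeds, but mutates `shared`
    try:
        create_graph(shared, copy_input=False)  # reuses the mutated reference
    except ValueError as err:
        print("without copy:", err)

    fresh = make_partitions()
    print("with copy:", create_graph(fresh), create_graph(fresh))
```

Without the copy, the first call succeeds but leaves the caller's empty partition with no metadata, so the second call over the same object fails; with the copy, both calls succeed and the input is left intact.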

Authors:
   - jnke2016

Approvers:
   - Vibhu Jawa (https://github.com/VibhuJawa)
   - Alex Barghi (https://github.com/alexbarghi-nv)
   - Rick Ratzel (https://github.com/rlratzel)
GPUtester requested a review from a team as a code owner August 14, 2023 14:13
GPUtester merged commit 0917af8 into branch-23.10 Aug 14, 2023
14 checks passed
@GPUtester
Contributor Author

SUCCESS - forward-merge complete.
