[BUG] Avoid allocating and using `size_input` vector while computing output col sizes when lists are present. #16985
Labels: 0 - Backlog, bug, cuIO, libcudf
Describe the bug
We currently use a temporary vector called `size_input` of size `num_keys = input_cols.size() x max_depth x total_number_of_pages` when computing output column lengths when lists are present. This leads to OOM for ultra-wide and deeply nested tables if `num_keys` becomes too large (e.g. 25k input cols x 5 max depth x 25k total pages = 3.12B). Note that even if this does not OOM on a larger GPU, the loop using `num_keys` will certainly generate a runtime error or, worse, a logical error downstream. To avoid this, we should update the `sizes` and `PageNestingInfo.page_start_value` fields using `cuda::atomic_ref` if `num_keys` is > 2B (to avoid breaking this loop).

Steps/Code to reproduce bug
On any RDS machine
Expected behavior
We should not OOM. Here is an unrefined alternative loop to update the `sizes` vector. We need to redefine the `reduction_keys` iterator and correspondingly update `PageNestingInfo.page_start_value` for each page of each input column.

Environment details
cudf: branch-24.12
on RDS machine: dgx-05
running dev-container: cuda12.5-conda

Additional context
N/A