[FEA]: CUB large input support #50

Open
7 of 22 tasks
Tracked by #47
jrhemstad opened this issue Apr 21, 2023 · 1 comment
Labels: cub (For all items related to CUB), feature request (New feature or request.)

Comments

@jrhemstad
Collaborator

jrhemstad commented Apr 21, 2023

As a lower-level interface, CUB should optimize for flexibility and performance. As a result, CUB will not guarantee that large inputs work by default. However, it should enable users to specify their desired offset type.

This means CUB should not perform any dynamic dispatch based on the input size. Instead, users should have a way to statically specify the offset type. In previous discussion, we favored making the type of num_items a template parameter and inferring the offset type from it.
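To make the idea of inferring the offset type from the type of num_items concrete, here is a minimal host-only C++ sketch. The status table later in this issue refers to a helper named choose_offset_t; the trait below is an illustrative stand-in with the same name inside a hypothetical `sketch` namespace, not CUB's actual implementation. The 32-bit/64-bit mapping and the `max_addressable_items` function are assumptions made for the example.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch {

// Hypothetical trait (assumption): map the type of num_items to an unsigned
// offset type that is 32 bits wide for num_items types of up to 4 bytes and
// 64 bits wide otherwise.
template <typename NumItemsT>
struct choose_offset
{
  static_assert(std::is_integral<NumItemsT>::value,
                "NumItemsT must be an integral type");
  using type = typename std::conditional<(sizeof(NumItemsT) <= 4),
                                         std::uint32_t,
                                         std::uint64_t>::type;
};

template <typename NumItemsT>
using choose_offset_t = typename choose_offset<NumItemsT>::type;

// Hypothetical algorithm entry point: the offset type is fixed at compile
// time by the type of the num_items argument, with no runtime branch on its
// value. It returns the largest item count the chosen offset type can index.
template <typename NumItemsT>
constexpr std::uint64_t max_addressable_items(NumItemsT /*num_items*/)
{
  using offset_t = choose_offset_t<NumItemsT>;
  return std::numeric_limits<offset_t>::max();
}

} // namespace sketch
```

With a scheme like this, a caller who passes an `int` gets 32-bit offsets, and a caller who needs more than INT_MAX items opts in by passing a 64-bit num_items, so the choice is visible at the call site and costs nothing at run time.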

Design-related research

  1. elstehle
  2. 3 of 4
    elstehle

Testing large number of items

  1. fbusato
  2. fbusato

Enable large num_items in CUB algorithms that are sensitive to the choice of offset_t

  1. 7 of 7
    elstehle
  2. elstehle
  3. 8 of 9
    elstehle
  4. 3 of 5
    elstehle
  5. elstehle

Clean up interim testing infrastructure

Documentation

jrhemstad changed the title from "Determine and finalize design for large input support in CUB" to "CUB large input support" on Apr 21, 2023
miscco added the "feature request" and "cub" labels on Jul 12, 2023
miscco changed the title from "CUB large input support" to "[FEA]: CUB large input support" on Jul 12, 2023
@elstehle
Collaborator

elstehle commented Feb 21, 2024

legend for offset type: ✅ considered done | 🟡 considered lower priority | 🟠 considered higher priority, as it prevents usage for larger-than-INT_MAX number of items | ⏳ in progress

legend for testing columns: ✅ considered done | 🟡 to be done | 🟠 needs to support wider offset types first

| algorithm | offset type | tests larger-than-INT_MAX | tests close to [U]INT_MAX |
|---|---|---|---|
| device_adjacent_difference.cuh | choose_offset_t | ✅ 2^33, sanity check, iterators | 🟡 |
| device_copy.cuh | 🟡 num_ranges: uint32_t<br>🟡 buffer sizes: `iterator_traits<SizeIteratorT>::value_type` | 🟡 | 🟡 |
| device_for.cuh | 🟡 NumItemsT: ForEachN, ForEachCopyN, Bulk<br>difference_type: ForEach, ForEachCopy | 🟡 | 🟡 |
| device_histogram.cuh | 🟡 dynamic dispatch: int for (num_rows * row_stride_bytes) < INT_MAX;<br>OffsetT otherwise | 🟡 | 🟡 |
| device_memcpy.cuh | 🟡 num_ranges: uint32_t<br>🟡 buffer sizes: `iterator_traits<SizeIteratorT>::value_type` | 🟡 | 🟡 |
| device_merge_sort.cuh | 🟡 NumItemsT | ✅ extensive check | ✅ extensive check |
| device_partition.cuh | int: Flagged, If<br>🟠 int: ThreeWayPartition | 🟠 | 🟠 |
| device_radix_sort.cuh | choose_offset_t | ✅ extensive check | ✅ extensive check |
| device_reduce.cuh | choose_offset_t: Reduce, Sum, Min, Max, ReduceByKey, TransformReduce<br>⚠️ (note) int: ArgMin, ArgMax | ✅ sanity, 2^{30,31,33} | ✅ sanity, 2^32-1 |
| device_run_length_encode.cuh | 🟠 int | 🟠 | 🟠 |
| device_scan.cuh | ✅ choose_offset_t: DeviceScan<br>⏳ choose_offset_t: DeviceScanByKey | 🟠 | 🟠 |
| device_segmented_radix_sort.cuh | 🟠 num_items & num_segments: int | 🟠 | 🟠 |
| device_segmented_reduce.cuh | 🟡 `common_iterator_value_t({Begin,End}OffsetIteratorT)`: Reduce, Sum, Min, Max<br>⚠️ (note) int: ArgMin, ArgMax<br>num_segments: int | ✅ sanity, rnd [2^31; 2^33] | 🟡 |
| device_segmented_sort.cuh | 🟠 num_items & num_segments: int | 🟠 | 🟠 |
| device_select.cuh | choose_offset_t: UniqueByKey<br>int: Flagged, If, Unique | | |
| device_spmv.cuh | 🟠 int | 🟠 | 🟠 |
| device_merge.cuh | 🟠 int | 🟡 | 🟡 |
| device_transform.cuh | 🟠 int | 🟡 | 🟡 |
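The device_histogram.cuh row describes dynamic dispatch, which is the pattern the issue description wants CUB to move away from. For contrast, here is a minimal host-only C++ sketch of what such a runtime dispatch could look like. All names (`sketch::run_with_offset`, `sketch::dispatch_by_extent`) are hypothetical and are not taken from CUB; the sketch assumes non-negative arguments whose product fits in 64 bits.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for an algorithm instantiated on a given offset type.
// It only reports how wide the instantiated offset type is.
template <typename OffsetT>
constexpr std::size_t run_with_offset()
{
  return sizeof(OffsetT);
}

// Runtime ("dynamic") dispatch: both instantiations are compiled, and the
// int path is taken whenever the byte extent is below INT_MAX.
// Assumes non-negative inputs; the product is widened before comparing.
template <typename OffsetT>
constexpr std::size_t dispatch_by_extent(OffsetT num_rows, OffsetT row_stride_bytes)
{
  return (static_cast<unsigned long long>(num_rows)
            * static_cast<unsigned long long>(row_stride_bytes)
          < static_cast<unsigned long long>(INT_MAX))
           ? run_with_offset<int>()
           : run_with_offset<OffsetT>();
}

} // namespace sketch
```

The trade-off is that every call site pays for two instantiations of the algorithm, whether or not it ever sees a large input. Deriving the offset type statically from the num_items type avoids that cost and leaves the decision to the caller.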
