
vendor agnostic #5

Open · wants to merge 5 commits into master
Conversation

@bjarthur (Collaborator) commented Apr 25, 2023

Fixes #1.

Not merged yet because benchmarks are slower by ~10%:

[Screenshot: benchmark results, 2023-04-25]

The large regression in batched_dot can be partially fixed by specifying CUDABackend(prefer_blocks=true), but that is not vendor agnostic. See https://discourse.julialang.org/t/kernelabstractions-get-backend-keyword-arguments/97895
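For readers following along, a minimal sketch of the two backend-selection paths being contrasted here; the array `x` is a placeholder, not code from this PR:

```julia
using KernelAbstractions

# Vendor-agnostic path: infer the backend from the array that holds the data
# (CUDABackend(), ROCBackend(), CPU(), ... depending on where `x` lives).
backend = get_backend(x)

# CUDA-only workaround that helps batched_dot: prefer launching more blocks.
# This requires a direct dependency on CUDA.jl, which defeats the vendor-agnostic goal.
using CUDA
backend = CUDABackend(prefer_blocks=true)
```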

@bjarthur (Collaborator, Author) commented:

Second pass at KernelAbstractions (KA):

[Screenshot: benchmark results, 2024-06-11]

Hard-coding the number of threads at 32 in the first (and only) dimension to maximize block utilization mostly alleviates the regression in batched_dot (sketch below).

See JuliaGPU/KernelAbstractions.jl#479.
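A minimal sketch of how a fixed workgroup size of 32 can be passed when instantiating a KernelAbstractions kernel while staying vendor agnostic; the kernel below is an illustrative placeholder, not the actual batched_dot kernel from this PR:

```julia
using KernelAbstractions

# Illustrative element-wise kernel; stands in for the real batched_dot kernel.
@kernel function scale!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

backend = get_backend(y)          # still vendor agnostic
kernel! = scale!(backend, 32)     # workgroup (thread-block) size fixed at 32
kernel!(y, a, x; ndrange=length(y))
KernelAbstractions.synchronize(backend)
```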

Successfully merging this pull request may close these issues: refactor to be vendor agnostic