Skip configurations with fewer than 4 warps in tuning #188

Open: 1 commit to be merged into base branch master

Commits on Jan 25, 2024

  1. Skip configurations with fewer than 4 warps in tuning

    Since SMs in Volta, Turing, Ampere, and Hopper have four processing
    blocks, each with one warp scheduler, I don't think it makes sense to
    try configurations during tuning where the number of warps per CTA is
    less than 4: at least one warp scheduler would have nothing to issue.
    This reduces the search space by 18.75% (assuming that every
    combination of WARPS_M and WARPS_N yields the same number of valid
    kernels, which is probably not true).
    
    We could also bump the limit to 8, so that each processing block gets
    at least 2 warps. That allows its warp scheduler to switch to another
    warp if one warp stalls. This would reduce the search space by another
    18.75%.
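
The two percentages above can be checked by counting. The following is a minimal sketch, not code from this PR. It assumes that the tuner draws WARPS_M and WARPS_N independently from {1, 2, 4, 8}, which the commit does not state but which is the grid that makes 18.75% equal to 3 of 16 combinations. The names `WARP_OPTIONS` and `fraction_skipped` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption (not stated in the commit): both WARPS_M and WARPS_N are
# tuned over these four values, giving 16 combinations in total.
WARP_OPTIONS = (1, 2, 4, 8)


def fraction_skipped(min_warps):
    """Fraction of (WARPS_M, WARPS_N) pairs with fewer than min_warps warps per CTA."""
    pairs = list(product(WARP_OPTIONS, repeat=2))
    skipped = [(m, n) for m, n in pairs if m * n < min_warps]
    return Fraction(len(skipped), len(pairs))


print(float(fraction_skipped(4)))  # 0.1875 (limit of 4 warps)
print(float(fraction_skipped(8)))  # 0.375  (limit of 8 warps)
```

Under this assumption, a limit of 4 skips 1x1, 1x2, and 2x1, and a limit of 8 additionally skips 1x4, 2x2, and 4x1, which is where the second 18.75% comes from.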
    
    We might even want to restrict this further. For example, I don't think
    a configuration like WARPS_M = 1, WARPS_N = 8 makes sense, as it has
    less data reuse across the M dimension than the configuration
    WARPS_M = 2, WARPS_N = 4. We might therefore want to try only the
    following configurations:
    
    - 2 x 4
    - 4 x 2
    - 4 x 4
    - 8 x 4
    - 4 x 8
    
    That would reduce the search space by 68.75% in total.
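
The reuse argument and the 68.75% figure can be illustrated with a small sketch. It is not code from this PR and rests on two assumptions: WARPS_M and WARPS_N are each tuned over {1, 2, 4, 8}, and every warp computes an equally sized square tile. Under that simplified model, one step along K loads WARPS_M tiles of A and WARPS_N tiles of B and updates WARPS_M * WARPS_N output tiles. The function `reuse_proxy` is a hypothetical measure built from that ratio, not a metric used by the tuner.

```python
from fractions import Fraction
from itertools import product

# Assumption (not stated in the commit): the tuned values per dimension.
WARP_OPTIONS = (1, 2, 4, 8)

# The five configurations listed in the commit message.
SHORTLIST = {(2, 4), (4, 2), (4, 4), (8, 4), (4, 8)}


def reuse_proxy(warps_m, warps_n):
    """Output tiles updated per operand tile loaded, for one step along K.

    Simplified model: assumes equally sized square warp tiles.
    """
    return Fraction(warps_m * warps_n, warps_m + warps_n)


all_pairs = set(product(WARP_OPTIONS, repeat=2))
removed = Fraction(len(all_pairs - SHORTLIST), len(all_pairs))

print(reuse_proxy(1, 8), reuse_proxy(2, 4))  # 8/9 4/3
print(float(removed))                        # 0.6875
```

Both configurations use 8 warps, but in this model 2 x 4 does more work per operand tile loaded than 1 x 8. Keeping only the 5 listed configurations out of the assumed 16 removes 11 of them, which matches the 68.75% stated above.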
    thomasfaingnaert committed Jan 25, 2024
    Commit 6776fe1