Fix memory benchmarks for unexpected gl_SubgroupSize #44

Merged: 1 commit, Nov 27, 2023

@dneto0 (Collaborator) commented Nov 24, 2023

Some Intel GPUs have flexible subgroup sizes.
subgroupSize can be 32 but minSubgroupSize can be smaller. In this case, unless you forcibly control the subgroup size at pipeline creation time, gl_SubgroupSize will report 32 but the actual number of invocations in the subgroup may be 8.

In the memory benchmarks, use a bitcount of the ballot to compute the dynamic (actual) size of the subgroup. The alternative is to use the much more recent (and less portable) subgroup size control extension.

Fixes: #43
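
The ballot-based computation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `Output` buffer, its `sizes` member, and the workgroup size are hypothetical names and values chosen for the example. It assumes the `GL_KHR_shader_subgroup_basic` and `GL_KHR_shader_subgroup_ballot` GLSL extensions are available.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_ballot : enable

// Hypothetical workgroup size, chosen only for illustration.
layout(local_size_x = 64) in;

// Hypothetical output buffer, not taken from the benchmark sources.
layout(set = 0, binding = 0) buffer Output {
  uint sizes[];
} result;

void main() {
  // Every invocation that reaches this point votes true, so the number of
  // set bits in the ballot is the number of active invocations in this
  // subgroup.
  uvec4 ballot = subgroupBallot(true);
  uint dynamic_size = subgroupBallotBitCount(ballot);

  // gl_SubgroupSize may report the device-level subgroupSize (e.g. 32) even
  // when the subgroup actually holds fewer invocations (e.g. 8), so record
  // the ballot-derived value instead.
  result.sizes[gl_GlobalInvocationID.x] = dynamic_size;
}
```

Note that the ballot counts only active invocations, so the result matches the subgroup's real size only when it is computed where all invocations in the subgroup are active, for example at the top of `main` before any divergent control flow.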

@dneto0 dneto0 requested a review from kuhar November 24, 2023 22:29

@kuhar (Collaborator) left a comment

Wow, this is surprising to me since gl_SubgroupSize is not even a constant.

When I used the subgroup tutorial (https://www.khronos.org/blog/vulkan-subgroup-tutorial) as the reference, I understood that using gl_SubgroupSize was the way to get the actual subgroup size.

IIUC, this means that gl_SubgroupSize matches the subgroupSize from VkPhysicalDeviceSubgroupProperties instead of the dynamic number of invocations? Is there some reference that we could add as a comment to explain why this subgroup size calculation is necessary?

@antiagainst (Collaborator)

TIL. Thanks David! I overlooked this tricky part before.

> Wow, this is surprising to me since gl_SubgroupSize is not even a constant.

I've also posted some useful resources in #43 (comment).

> When I used the subgroup tutorial (https://www.khronos.org/blog/vulkan-subgroup-tutorial) as the reference, I understood that using gl_SubgroupSize was the way to get the actual subgroup size.

That tutorial is quite early. There have apparently been further developments and new extensions since then to improve things.

@dneto0 dneto0 merged commit 90866c8 into main Nov 27, 2023
8 checks passed
@dneto0 dneto0 deleted the issue-43 branch November 27, 2023 15:55

Successfully merging this pull request may close these issues.

Inconsistent gl_SubgroupSize across different GPUs and Vulkan versions/extensions