Commit

For M3 / M4, leading block dimensions always fix to 32.
liuliu committed Aug 16, 2024
1 parent cde0b15 commit 8d6197e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/nnc/mfa/v2/GEMMKernelDescriptor.cpp
@@ -49,7 +49,7 @@ GEMMKernelDescriptor::GEMMKernelDescriptor(simd::ushort3 blockDimensions, GEMMOp

std::pair<simd::ushort3, std::optional<simd::ushort3>> GEMMKernelDescriptor::getBlockDimensions(MTL::Device* const mtlDevice, const uint32_t coreCount, const simd::uint3 matrixDimensions, const int64_t batchDimension, const GEMMOperandPrecisions memoryPrecisions, const simd::uchar3 transposeState) noexcept {
  if (mtlDevice->supportsFamily(MTL::GPUFamily(1009))) {
-   return std::make_pair(simd::ushort3 { 32, 32, 8 }, std::nullopt);
+   return std::make_pair(simd::ushort3 { 32, 32, 8 }, simd::ushort3 { 32, 32, 32 });
  }

// Find the actual number of threadgroups, with a large block size.
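
The effect of the change on the early-return branch can be sketched in isolation. This is a minimal illustration, not the library's code: `UShort3` is a hypothetical stand-in for `simd::ushort3`, `blockDimensionsFor` and its `supportsFamily1009` flag are hypothetical names replacing the `mtlDevice->supportsFamily(MTL::GPUFamily(1009))` check, and the fallback values are placeholders because the heuristic path is not shown in the diff.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for simd::ushort3 so the sketch builds without Apple headers.
using UShort3 = std::array<std::uint16_t, 3>;
using BlockDimensions = std::pair<UShort3, std::optional<UShort3>>;

// Hypothetical helper mirroring only the branch touched by the commit.
// `supportsFamily1009` replaces mtlDevice->supportsFamily(MTL::GPUFamily(1009)).
BlockDimensions blockDimensionsFor(bool supportsFamily1009) {
  if (supportsFamily1009) {
    // After the commit: the second element is no longer std::nullopt;
    // the leading block dimensions are pinned to { 32, 32, 32 }.
    return BlockDimensions{UShort3{32, 32, 8}, UShort3{32, 32, 32}};
  }
  // Placeholder only: the real heuristic is elided in the diff.
  return BlockDimensions{UShort3{32, 32, 32}, std::nullopt};
}
```

A caller that previously received `std::nullopt` on this branch would now see `second.has_value()` return true and could read the pinned value through `*second`.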