Optimize aarch64 GEMM kernel #32

Merged

robertknight merged 1 commit into main from aarch64-gemm-v2

Jan 5, 2024

Commits on Jan 5, 2024

Optimize aarch64 GEMM kernel
```
Revise the aarch64 kernel to use SIMD intrinsics. The structure is the same as
the AVX2 / FMA kernel, but the tile size is set to 8x8 as that performed best.

On an M1 Mac, performance for an M=N=K=1024 matmul increases from ~334 to ~418
GFLOPS.
```
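For readers unfamiliar with register-tiled GEMM kernels, here is a minimal scalar sketch of what an 8x8 tile update might look like. It is not the kernel from this commit: the names `kernel_8x8`, `gflops`, `MR` and `NR` and the packed panel layouts are assumptions made for illustration, and plain Rust loops stand in for the SIMD intrinsics. The `gflops` helper assumes the usual convention of counting 2·M·N·K floating point operations per matmul.

```rust
/// Tile dimensions. 8x8 matches the tile size named in the commit message;
/// the constant names are hypothetical.
const MR: usize = 8;
const NR: usize = 8;

/// Hypothetical scalar micro-kernel: C[MR x NR] += A_panel * B_panel.
///
/// Assumed layouts (not taken from the commit):
/// - `a` is packed so that step `k` holds MR values at `a[k * MR + i]`
/// - `b` is packed so that step `k` holds NR values at `b[k * NR + j]`
/// - `c` is a row-major MR x NR tile
fn kernel_8x8(c: &mut [f32; MR * NR], a: &[f32], b: &[f32], depth: usize) {
    assert!(a.len() >= depth * MR && b.len() >= depth * NR);

    // Accumulate the whole tile locally. A SIMD version would keep these
    // accumulators in vector registers for the duration of the k loop.
    let mut acc = [[0.0f32; NR]; MR];
    for k in 0..depth {
        let a_col = &a[k * MR..(k + 1) * MR];
        let b_row = &b[k * NR..(k + 1) * NR];
        for i in 0..MR {
            for j in 0..NR {
                // Rank-1 update: one multiply-add per tile element.
                acc[i][j] += a_col[i] * b_row[j];
            }
        }
    }

    // Write the accumulated tile back to C once, after the k loop.
    for i in 0..MR {
        for j in 0..NR {
            c[i * NR + j] += acc[i][j];
        }
    }
}

/// Throughput in GFLOPS for an M x N x K matmul that took `seconds`,
/// counting 2*M*N*K floating point operations.
fn gflops(m: usize, n: usize, k: usize, seconds: f64) -> f64 {
    (2.0 * m as f64 * n as f64 * k as f64) / seconds / 1e9
}
```

Under that flop-counting convention, an M=N=K=1024 matmul is about 2.15 GFLOP, so ~334 GFLOPS corresponds to roughly 6.4 ms per matmul and ~418 GFLOPS to roughly 5.1 ms, a speedup of about 25%.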
robertknight committed Jan 5, 2024
Commit 31622c1

Browse repository at this point
Copy the full SHA

31622c1 View commit details

Browse the repository at this point in the history