
Use LoopVectorization for faster Loops #100

Open
13 of 16 tasks
avik-pal opened this issue Jul 22, 2024 · 0 comments

Recently I refactored the code base to use loops on the CPU instead of broadcasting. This makes the code quite a bit faster, but more importantly it allows us to easily swap in LoopVectorization.
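To illustrate the refactor, here is a minimal sketch (the `bias_act` kernel and names are hypothetical, not from the code base) of the same computation written first as a broadcast and then as an explicit loop that `@turbo` from LoopVectorization can annotate directly:

```julia
using LoopVectorization

# Broadcast version (the style used before the refactor)
bias_act_broadcast(x, b) = tanh.(x .+ b)

# Loop version: identical computation, but written as an explicit loop
# so `@turbo` can vectorize it on CPU arrays.
function bias_act_loop!(y, x, b)
    @turbo for j in axes(x, 2), i in axes(x, 1)
        y[i, j] = tanh(x[i, j] + b[i])
    end
    return y
end
```

The loop form is what makes the swap trivial: replacing `@turbo` with a plain `for` (or `@simd`) recovers the non-LoopVectorization path without restructuring the kernel.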

See commit history of #97 for more details.

Improvements over NNlib Functions

I don't think NNlib will accept LoopVectorization as a dependency, so we implement these here ourselves:

  • batched_mul --> A batched_matmul that checks whether the array is a CPU array that can be loop-vectorized; if so, we use a loop-vectorized kernel, otherwise we forward to NNlib.batched_mul.
  • conv --> Bypass the CPU conv routines in fused_conv with ones written using LoopVectorization.
  • pooling operations
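A minimal sketch of the batched_matmul dispatch described above (the `can_loopvec` check and exact gating conditions are hypothetical; only the fallback to `NNlib.batched_mul` is from the issue):

```julia
using LoopVectorization
import NNlib

# Hypothetical gate: only plain CPU arrays with LV-friendly element
# types go down the loop-vectorized path.
can_loopvec(xs...) = all(x -> x isa Array{<:Union{Float32, Float64}}, xs)

function batched_matmul(A::AbstractArray{<:Any, 3}, B::AbstractArray{<:Any, 3})
    can_loopvec(A, B) || return NNlib.batched_mul(A, B)
    C = similar(A, size(A, 1), size(B, 2), size(A, 3))
    for l in axes(C, 3)  # batch dimension; @turbo handles the inner GEMM
        @turbo for n in axes(C, 2), m in axes(C, 1)
            acc = zero(eltype(C))
            for k in axes(A, 2)
                acc += A[m, k, l] * B[k, n, l]
            end
            C[m, n, l] = acc
        end
    end
    return C
end
```

The inner loop mirrors the standard LoopVectorization GEMM pattern; the outer batch loop is left as a plain `for` so each slice stays a contiguous 2-D problem.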

Implementations where LoopVectorization will help

Reductions

  • We have the infrastructure setup in impl/fast_ops.jl. For LoopedArrayOp we need to simply use VectorizedStatistics.jl (and maybe VectorizedReductions.jl).
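The dispatch this describes might look like the following sketch. The trait types and `fast_mean` name are stand-ins for whatever impl/fast_ops.jl actually defines, and `vmean` is assumed to be the relevant VectorizedStatistics.jl export:

```julia
using Statistics
using VectorizedStatistics: vmean  # assumed export name

# Hypothetical array-op traits mirroring the impl/fast_ops.jl setup
abstract type AbstractArrayOpMode end
struct LoopedArrayOp <: AbstractArrayOpMode end
struct GenericBroadcastOp <: AbstractArrayOpMode end

# SIMD reduction on the looped path, generic fallback elsewhere
fast_mean(::LoopedArrayOp, x::AbstractArray) = vmean(x)
fast_mean(::GenericBroadcastOp, x::AbstractArray) = mean(x)
```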

Automatic Differentiation

We don't need to worry about ChainRules; it already has rrules defined as of now. But Enzyme is really not happy with LoopVectorization. Use custom rules for the following:
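The idea behind such rules is to shield the AD system from ever tracing into a `@turbo` loop by supplying the derivative by hand. As a concrete illustration, here is a ChainRules-style pullback for a hypothetical loop-vectorized kernel; Enzyme custom rules (via EnzymeCore's EnzymeRules) would follow the same shield-the-kernel pattern with a different rule API:

```julia
using ChainRulesCore, LoopVectorization

# Hypothetical loop-vectorized kernel
function fast_relu(x::AbstractVector)
    y = similar(x)
    @turbo for i in eachindex(x)
        y[i] = max(x[i], zero(eltype(x)))
    end
    return y
end

# Hand-written rule: AD sees only the rule, never the @turbo internals.
function ChainRulesCore.rrule(::typeof(fast_relu), x::AbstractVector)
    y = fast_relu(x)
    function fast_relu_pullback(ȳ)
        x̄ = similar(x)
        @turbo for i in eachindex(x)
            x̄[i] = ifelse(x[i] > zero(eltype(x)), ȳ[i], zero(eltype(x)))
        end
        return NoTangent(), x̄
    end
    return y, fast_relu_pullback
end
```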
