Adding torch.compile compatibility
#9
+14
−4
Hey, e3nn developer here. Thanks for the great work! I'd be interested in upstreaming the kernels into the main repo if you are. We recently landed full PT2 compatibility, so I re-ran the benchmarks and the results were pretty dramatic. This was on an RTX A5500; I was wondering whether it's possible to rerun the benchmarks on an XPU.
I'm interested in figuring out when to pivot to custom kernels and when to let Inductor take charge.
Also, please feel free to edit my PR to suit your needs. I just wanted to point out the flag changes.
Thanks again!
Update: I did not check numerical accuracy, so that might account for some of the gains. The flipping of the order is also interesting, since I would have expected the approach highlighted in the paper to do better at higher Ls.
Speedup = e3nn time / custom Triton kernel time
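As a sanity check on how that ratio is read, here is a minimal timing sketch. The two functions are hypothetical stand-ins for the real workloads (the actual benchmark compares e3nn's tensor-product ops against the custom Triton kernels on GPU); only the speedup arithmetic matches the definition above.

```python
import timeit

# Hypothetical placeholder workloads -- NOT the real e3nn or Triton code.
def e3nn_baseline():
    return sum(i * i for i in range(10_000))

def custom_triton_kernel():
    return sum(i * i for i in range(1_000))

# Speedup = e3nn time / custom Triton kernel time,
# so a value > 1 means the custom kernel is faster.
e3nn_time = timeit.timeit(e3nn_baseline, number=50)
triton_time = timeit.timeit(custom_triton_kernel, number=50)
speedup = e3nn_time / triton_time
print(f"speedup: {speedup:.2f}x")
```

For the real kernels the same harness would wrap the compiled (`torch.compile`) and custom-kernel call sites, with a CUDA/XPU synchronize before reading each timer.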