Yeah, I've actually been working on a PR that improves `conv` quite a bit by allowing direct convolution and adding multithreading to most of the algorithms. It would be simple to add in-place convolution to that. I've been pretty busy with my research and sort of stalled out on the PR, but I'll try to pick it up again soon.
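(Not the PR's actual code, but roughly the idea for the direct algorithm: parallelise over output samples so each thread owns the entries it writes and there are no conflicting writes. The function name and loop structure here are just for illustration.)

```julia
# Sketch only: a direct convolution threaded over output samples.
# Each outer-loop iteration writes a distinct out[k], so Threads.@threads
# introduces no write conflicts.
function threaded_direct_conv(u::Vector{Float64}, v::Vector{Float64})
    nu, nv = length(u), length(v)
    out = Vector{Float64}(undef, nu + nv - 1)
    Threads.@threads for k in 1:(nu + nv - 1)
        acc = 0.0
        # out[k] = sum_j u[k - j + 1] * v[j], over the valid range of j
        @inbounds for j in max(1, k - nu + 1):min(nv, k)
            acc += u[k - j + 1] * v[j]
        end
        out[k] = acc
    end
    return out
end
```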
It would also be good to have a no-allocation `conv!` where you can pre-allocate everything it needs and pass it in. I have to run the same convolutions hundreds/thousands/millions of times, and the allocations mean I pretty much can't use DSP.jl for that.
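Something like this is what I have in mind; the name and signature below are just a sketch, not anything that exists in DSP.jl:

```julia
# Hypothetical preallocated API: the caller owns the output buffer, so
# repeated calls allocate nothing. Plain direct convolution for illustration;
# assumes 1-based indexing and length(out) == length(u) + length(v) - 1.
function conv_into!(out::AbstractVector, u::AbstractVector, v::AbstractVector)
    fill!(out, zero(eltype(out)))
    @inbounds for j in eachindex(v), i in eachindex(u)
        out[i + j - 1] += u[i] * v[j]
    end
    return out
end

u, v = rand(10_000), rand(64)
out = similar(u, length(u) + length(v) - 1)  # allocate once...
conv_into!(out, u, v)                        # ...then reuse across many calls
```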
By the way, if you want to improve small-matrix direct conv performance: I noticed that the direct windowing method in DynamicGrids.jl seems to be ~3x faster than DSP.jl for 3×3 kernels, although it's not written specifically for convolutions.
It uses `@generated` functions with StaticArrays and blocking, and the compiler seems to inline and vectorise it pretty well (or something). It's OK for 5×5 but gets a lot slower from there.
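For a sense of what that looks like (this is not DynamicGrids.jl's code, just a minimal static-kernel windowing pass over the interior of an image; the fixed 3×3 bounds and statically sized `SMatrix` kernel let the compiler unroll the inner loops):

```julia
using StaticArrays

# Minimal 3×3 windowed correlation (flip the kernel for a true convolution).
# The kernel is an SMatrix, so its size is known at compile time and the
# m/n loops can be unrolled with everything kept in registers.
function window3x3!(out::AbstractMatrix, img::AbstractMatrix, k::SMatrix{3,3})
    @inbounds for j in 2:size(img, 2) - 1, i in 2:size(img, 1) - 1
        acc = zero(eltype(out))
        for n in 1:3, m in 1:3
            acc += img[i + m - 2, j + n - 2] * k[m, n]
        end
        out[i, j] = acc
    end
    return out
end

img = rand(256, 256)
k = @SMatrix rand(3, 3)
out = zeros(size(img))
window3x3!(out, img, k)
```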
Could we get an in-place `conv!` function? Looking at the code, it seems like you'd only need to add a function that compares which of the inputs is larger and calls `_conv!` directly.
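E.g. something like this (I haven't checked `_conv!`'s exact signature, so the argument order below is a guess):

```julia
# Sketch of the requested wrapper. Assumes the internal _conv! takes the
# output buffer first and expects the longer input before the shorter one;
# the real internal signature may differ.
function conv!(out, u, v)
    length(u) >= length(v) ? _conv!(out, u, v) : _conv!(out, v, u)
end
```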