
Arm support #172

Closed

Conversation

anthony-linaro

This PR adds ARM64 support by way of the reference implementations.

NOTE: This goes in tandem with RenderKit/mkl-dnn#1

Performance is... not great.

From oidnBenchmark.exe:

Windows ARM64 (using clang-cl 16.0.6):
RT.hdr_alb_nrm.1920x1080 ... 1.04168e+06 msec/image

Linux ARM64 (using GCC 12.2.0):
RT.hdr_alb_nrm.1920x1080 ... 635797 msec/image

Windows Emulated x64 (using the release from GitHub):
RT.hdr_alb_nrm.1920x1080 ... 4174.18 msec/image

Command lines used:

Windows (within a vcvarsall ARM64 window, VS has clang installed as part of it):

cmake -G"Ninja" .. -DTBB_ROOT=<path\to\oneTBB\compiled\for\ARM64> -DISPC_EXECUTABLE=<path\to\ISPC> -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER="clang-cl" -DCMAKE_CXX_COMPILER="clang-cl"

Linux:

cmake -G"Ninja" .. -DTBB_ROOT=<path/to/oneTBB/compiled/for/ARM64> -DISPC_EXECUTABLE=<path/to/ISPC> -DCMAKE_BUILD_TYPE=Release

And then both built with:

cmake --build .

Clearly, there is serious need for improvement here (250x slowdown!). What are the options? Would a PR of say, DirectML enablement, for Windows be acceptable?

I tried enabling Arm CL, which (after some faff) compiled and linked in, but none of the implementations seemed to be a match - it always fell back to the reference implementation, and if that was disabled, a segfault occurred.

The other ARM64 JIT implementations present are not applicable in this case, as they require SVE-512 (which is not widely implemented yet outside of a few select recent CPUs).

@anthony-linaro
Author

A kind request for feedback from @atafra :)

@atafra
Collaborator

atafra commented Jul 14, 2023

Thanks for your efforts, but unfortunately using a reference implementation that is 250x slower does not meet the standards of Open Image Denoise, which aims to provide high performance regardless of platform. DirectML wouldn't be a good solution either, because it would be limited to Windows and would still not enable ARM CPUs.

At this point I think the only acceptable solution would be writing a custom, reasonably fast convolution kernel using NEON, instead of using oneDNN. ISPC would probably be the most straightforward way to implement this; the other kernels are already implemented in ISPC.

@anthony-linaro
Author

Closing in favour of an ISPC-based PR that I will open shortly.
