
Arm support #172

Closed

Conversation

anthony-linaro

This PR adds ARM64 support by way of the reference implementations.

NOTE: This goes in tandem with RenderKit/mkl-dnn#1

Performance is... not great.

From oidnBenchmark.exe:

Windows ARM64 (using clang-cl 16.0.6):
RT.hdr_alb_nrm.1920x1080 ... 1.04168e+06 msec/image

Linux ARM64 (using GCC 12.2.0):
RT.hdr_alb_nrm.1920x1080 ... 635797 msec/image

Windows Emulated x64 (using the release from GitHub):
RT.hdr_alb_nrm.1920x1080 ... 4174.18 msec/image

Command lines used:

Windows (within a vcvarsall ARM64 window, VS has clang installed as part of it):

cmake -G"Ninja" .. -DTBB_ROOT=<path\to\oneTBB\compiled\for\ARM64> -DISPC_EXECUTABLE=<path\to\ISPC> -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER="clang-cl" -DCMAKE_CXX_COMPILER="clang-cl"

Linux:

cmake -G"Ninja" .. -DTBB_ROOT=<path/to/oneTBB/compiled/for/ARM64> -DISPC_EXECUTABLE=<path/to/ISPC> -DCMAKE_BUILD_TYPE=Release

And then both built with:

cmake --build .

Clearly, there is serious need for improvement here (250x slowdown!). What are the options? Would a PR of say, DirectML enablement, for Windows be acceptable?

I tried enabling Arm CL, which (after some faff) compiled and linked in, but none of the implementations seemed to be a match - it always fell back to the reference implementation, and if that was disabled, a segfault occurred.

The other ARM64 JIT implementations present are not applicable in this case, as they require SVE-512 (which is not widely implemented yet outside of a few select recent CPUs).

@anthony-linaro
Author

A kind request for feedback from @atafra :)

@atafra
Collaborator

atafra commented Jul 14, 2023

Thanks for your efforts, but unfortunately using a reference implementation that is 250x slower does not meet the standards of Open Image Denoise, which aims to provide high performance regardless of platform. DirectML wouldn't be a good solution either, because it would be limited to Windows and would still not enable ARM CPUs.

At this point I think the only acceptable solution would be writing a custom, reasonably fast convolution kernel using NEON, instead of using oneDNN. ISPC would probably be the most straightforward way to implement this; the other kernels are already implemented in ISPC.

@anthony-linaro
Author

Closing in favour of an ISPC-based PR that I will open shortly.
