Use OpenImageDenoise via the command line to benefit from GPU-based LightmapGI denoising #7640

Closed
Calinou opened this issue Sep 11, 2023 · 9 comments


@Calinou
Member

Calinou commented Sep 11, 2023

Describe the project you are working on

The Godot editor 🙂

Describe the problem or limitation you are having in your project

Godot currently supports denoising lightmaps using OpenImageDenoise (OIDN), but this is slow for three reasons:

  • We use an old version of OIDN because recent versions are difficult to build from source, even though they feature many optimizations not found in older versions.
  • We don't have access to GPU-based denoising, which is only available in recent OIDN versions. On modern GPUs, this can be over 50× faster than multithreaded CPU-based denoising, with identical output quality.
  • Godot's implementation only uses a single CPU thread, as we don't link against Intel's Threading Building Blocks (TBB) library. That library is also known to be difficult to integrate into an existing project, especially one that doesn't use CMake.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

There are many advantages to using OIDN via the command line:

  • We don't have to bother with building it from source, which is a notoriously difficult task. Recent versions of OIDN require a specific compiler called ISPC, as well as other large libraries. Up-to-date versions of these aren't readily available in Linux distributions, and they are even more difficult to build from source on Windows and macOS (not to mention that build times can be long). We strive to keep Godot easy to build from source, so using a recent OIDN as a library doesn't seem possible (unless we use an approach similar to godot#72831, "[macOS/Windows] Add optional ANGLE backed OpenGL renderer support (runtime backend selection)").
  • We no longer have to integrate the OIDN library into the Godot editor binary. This reduces binary size by roughly 7 MB, which is significant.

Performance results on Linux:

```
# Intel Core i9-13900K
$ oidnBenchmark --device cpu
RT.hdr_alb_nrm.1920x1080 ... 353.716 msec/image
RT.ldr_alb_nrm.1920x1080 ... 357.93 msec/image
RT.hdr_alb_nrm.3840x2160 ... 1457.19 msec/image
RT.ldr_alb_nrm.3840x2160 ... 1452.21 msec/image
RT.hdr_alb_nrm.1280x720 ... 155.315 msec/image
RT.ldr_alb_nrm.1280x720 ... 153.774 msec/image
RTLightmap.hdr.2048x2048 ... 670.833 msec/image
RTLightmap.hdr.4096x4096 ... 2950.94 msec/image
RTLightmap.hdr.1024x1024 ... 167.51 msec/image

# GeForce RTX 4090
$ oidnBenchmark --device cuda
RT.hdr_alb_nrm.1920x1080 ... 6.7645 msec/image   # 52× faster than Intel Core i9-13900K
RT.ldr_alb_nrm.1920x1080 ... 6.59508 msec/image  # 54× faster
RT.hdr_alb_nrm.3840x2160 ... 27.8542 msec/image  # 52× faster
RT.ldr_alb_nrm.3840x2160 ... 27.8098 msec/image  # 52× faster
RT.hdr_alb_nrm.1280x720 ... 2.98997 msec/image   # 52× faster
RT.ldr_alb_nrm.1280x720 ... 2.96565 msec/image   # 52× faster
RTLightmap.hdr.2048x2048 ... 12.5533 msec/image  # 53× faster
RTLightmap.hdr.4096x4096 ... 55.1833 msec/image  # 53× faster
RTLightmap.hdr.1024x1024 ... 3.0971 msec/image   # 54× faster
```

Denoising a 4K lightmap goes from a multi-second operation to a near-instant one. Even if your GPU is 10 times slower in compute than an RTX 4090, it will still handily beat the i9-13900K in this test.
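As a quick sanity check, the speedup comments in the benchmark output can be recomputed from the raw timings. This is a minimal sketch in POSIX `sh` and `awk`; the `speedups` function name is only for illustration, and the numbers are copied verbatim from the `oidnBenchmark` results above.

```shell
#!/bin/sh
# Recompute CPU/GPU speedups from the oidnBenchmark timings listed above.
# Columns: benchmark name, i9-13900K msec/image, RTX 4090 msec/image.
speedups() {
  awk '{ printf "%s %.1f\n", $1, $2 / $3 }' <<'EOF'
RT.hdr_alb_nrm.1920x1080 353.716 6.7645
RT.ldr_alb_nrm.1920x1080 357.93 6.59508
RT.hdr_alb_nrm.3840x2160 1457.19 27.8542
RT.ldr_alb_nrm.3840x2160 1452.21 27.8098
RT.hdr_alb_nrm.1280x720 155.315 2.98997
RT.ldr_alb_nrm.1280x720 153.774 2.96565
RTLightmap.hdr.2048x2048 670.833 12.5533
RTLightmap.hdr.4096x4096 2950.94 55.1833
RTLightmap.hdr.1024x1024 167.51 3.0971
EOF
}

speedups
```

Each ratio rounds to the same whole number as the comments above (52× to 54×). For the "10 times slower GPU" scenario, 10 × 55.18 ms is about 552 ms for the 4096×4096 lightmap, which is still more than five times faster than the CPU's 2950.94 ms.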

System VRAM utilization doesn't exceed 5.3 GB when denoising the 4K lightmap, so it looks like 8 GB GPUs should handle this fine (perhaps even 6 GB GPUs for smaller lightmaps; remember that the editor will be running at the same time).

There are some caveats though:

  • For GPU acceleration to work, the user must have a functional CUDA setup (on NVIDIA), HIP setup (on AMD) or SYCL setup (on Intel). If this is not available, multithreaded CPU-based denoising can be used instead, which is still a net performance win over the current implementation.
  • Official OIDN binaries don't support saving and loading OpenEXR images, as they don't link against OpenImageIO. Manually built binaries can add this support. Also, Intel does not officially support using the CLI (it's only meant for evaluation and benchmarking purposes).
    • This means we'd have to provide our own OIDN binaries compiled from source, but not having to integrate OIDN into SCons should already make the process much easier. We already do something similar for FBX2glTF.
    • In the meantime, you can use this command line to handle conversion from and to OpenEXR (requires ImageMagick to be installed):
```shell
# On Windows, use `%TEMP%` (cmd) or `$env:TEMP` (PowerShell) instead of `/tmp`.
# With ImageMagick 7, `magick` can be used in place of `convert`.
convert lightmap.exr -endian LSB /tmp/lightmap.pfm \
  && oidnDenoise --filter RTLightmap --hdr /tmp/lightmap.pfm --output /tmp/lightmap_denoised.pfm \
  && convert /tmp/lightmap_denoised.pfm lightmap.exr
```
  • Calling CLI programs is not possible in the Android and Web editors, but this isn't much of an issue: our version of OIDN doesn't support arm64, so the denoiser already doesn't work in the Android editor. Also, you probably won't be baking lightmaps on those platforms given their performance constraints. (The Web editor currently doesn't support baking lightmaps at all, as it only runs OpenGL.)
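The EXR round-trip one-liner above can be wrapped in a small reusable script. This is a hedged sketch, not an official tool: it assumes ImageMagick's `convert` and `oidnDenoise` are on `PATH` and uses only the commands and flags from the one-liner; the `denoise_lightmap` name is hypothetical. It uses `mktemp -d` so intermediate PFM files don't collide with other runs and are cleaned up afterwards.

```shell
#!/bin/sh
# Hypothetical wrapper around the EXR -> PFM -> oidnDenoise -> EXR round trip
# shown above. Usage: denoise_lightmap INPUT.exr OUTPUT.exr
denoise_lightmap() {
  src="$1"
  dst="$2"
  # Fail early with a clear message if a required tool is missing.
  for tool in convert oidnDenoise; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "denoise_lightmap: '$tool' not found in PATH" >&2
      return 1
    fi
  done
  # Private scratch directory for the intermediate PFM files.
  tmp="$(mktemp -d)" || return 1
  status=0
  convert "$src" -endian LSB "$tmp/lightmap.pfm" \
    && oidnDenoise --filter RTLightmap --hdr "$tmp/lightmap.pfm" \
         --output "$tmp/lightmap_denoised.pfm" \
    && convert "$tmp/lightmap_denoised.pfm" "$dst" \
    || status=$?
  # Remove the scratch directory whether or not denoising succeeded.
  rm -rf "$tmp"
  return "$status"
}
```

For example, `denoise_lightmap lightmap.exr lightmap.exr` denoises a lightmap in place, since the intermediate files never touch the project folder.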

This proposal effectively supersedes godotengine/godot#47344, as OIDN would become an external program called by Godot, similar to FBX2glTF (for .fbx import) and Blender (for .blend import).

It's also been mentioned that we could use an algorithm such as this one or this one as a fallback to OIDN when the CLI binary isn't installed. This can be implemented as a Vulkan compute shader to provide universal GPU acceleration.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

  • When the path to the OIDN CLI binary (oidnDenoise) is configured in the Editor Settings and the binary is actually present, write lightmaps to a temporary location instead of the project folder.
  • Call the OIDN CLI binary after baking lightmaps, with the temporary file as the input path and a path within the project folder as the output path.
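The two steps above can be sketched as follows. This is a hedged illustration in shell rather than the editor's actual C++ code: `oidn_path` stands in for the hypothetical Editor Setting holding the path to `oidnDenoise`, the lightmap is assumed to be handed over as a PFM file (since official OIDN binaries can't read EXR), and `run_oidn_cli_denoise` is a made-up name. A return value of 2 tells the caller to fall back to the built-in denoiser.

```shell
#!/bin/sh
# Sketch of the proposed flow: use the external oidnDenoise binary only if the
# configured path points to an executable file; otherwise signal a fallback.
# Usage: run_oidn_cli_denoise OIDN_PATH RAW_PFM OUTPUT_PFM
run_oidn_cli_denoise() {
  oidn_path="$1"   # hypothetical Editor Setting value (path to oidnDenoise)
  raw_pfm="$2"     # non-denoised lightmap written to a temporary location
  output_pfm="$3"  # denoised result, written inside the project folder
  if [ ! -f "$oidn_path" ] || [ ! -x "$oidn_path" ]; then
    echo "OIDN CLI not configured or missing; using built-in denoiser" >&2
    return 2
  fi
  "$oidn_path" --filter RTLightmap --hdr "$raw_pfm" --output "$output_pfm" || return 1
  # Only the denoised file ends up in the project, so it is imported once.
  rm -f "$raw_pfm"
}
```

Keeping the raw lightmap outside the project folder is the design point here: the editor never sees the non-denoised texture, which avoids the double import described below.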

If this enhancement will not be used often, can it be worked around with a few lines of script?

This can be worked around by disabling Use Denoiser in LightmapGI and calling the above command line, but it must be done manually every time after baking lightmaps. Using watchexec can improve this somewhat, but you still need to start it every time you open the project in the editor.

If you do this, the lightmap texture will also be imported twice by the Godot editor (once in its non-denoised form, once in its denoised form). This further slows down the lightmap baking process, especially if VRAM compression is enabled on the lightmap texture. Writing the non-denoised lightmap to a temporary location prevents this issue, but it can't be done from the Godot editor itself.

Is there a reason why this should be core and not an add-on in the asset library?

This can't be worked around with an add-on efficiently (see above).

@mrezai

mrezai commented Sep 15, 2023

Wouldn't it be better to load OIDN's dynamic library using dlopen/LoadLibrary instead of using it from the command line? That way, data buffers can be passed directly to it, so the OpenEXR issue would be solved.

@Calinou
Member Author

Calinou commented Sep 15, 2023

> Wouldn't it be better to load OIDN's dynamic library using dlopen/LoadLibrary instead of using it from the command line? That way, data buffers can be passed directly to it, so the OpenEXR issue would be solved.

Dynamic linking only solves part of the problem here, as we'd still need to compile and link the OIDN library on the same system the Godot executable was built on (due to Linux binary compatibility issues).

Also, we want the Godot editor to be distributable as a single binary with no dependencies on other files. This allows you to run it directly from a ZIP archive that hasn't been extracted, for instance. (People frequently do that, even if you don't necessarily do it yourself 🙂)

That said, godotengine/godot#81659 has progressed a lot, so it should provide a pretty good fallback if you don't want to set up OIDN.

@mrezai

mrezai commented Sep 16, 2023

I mean something like how Godot works with Blender or FBX files:

The CLI application in the current OIDN distribution is linked against the dynamic libraries in its lib directory, not static ones, so we'd need to build it anyway.

@YuriSizov
Contributor

Implemented in godotengine/godot#81659.

Calinou added this to the 4.2 milestone on Sep 27, 2023
@Calinou
Member Author

Calinou commented Sep 27, 2023

> Implemented in godotengine/godot#81659.

To clarify, godotengine/godot#81659 doesn't implement OIDN CLI support, but provides an alternative GPU-based denoiser that is pretty effective (while being much faster than our current implementation). It also doesn't require installing any additional software to work (not even the CUDA/HIP/SYCL setup that GPU acceleration with the OIDN CLI requires). We consider this a satisfactory solution, but we may explore adding support for the OIDN CLI if there is a strong need for it.

@atirut-w

Should this be reopened, then?

@Calinou
Member Author

Calinou commented Sep 28, 2023

> Should this be reopened, then?

No, as we don't expect this proposal to be relevant in the near future (as I said in my previous comment).

@YuriSizov
Contributor

Sorry for the confusion that I caused here, got lost in the abbreviations 😅 We can reopen it to track that we still want to offer OIDN via an external executable.

@akien-mga
Member

Implemented by godotengine/godot#82832.
