
Panic/hang in VulkanSwapChain::acquire during window resize #8185

Open
emezeske opened this issue Oct 8, 2024 · 5 comments

Labels: vulkan (Issues with the Vulkan backend)

emezeske commented Oct 8, 2024

⚠️ Issues not using this template will be systematically closed.

Describe the bug
Rapid window resizes (e.g. from the user dragging an edge of the window) eventually cause Filament's Vulkan backend to crash. I initially noticed this in my own Filament-based app, but I was able to reproduce it with the gltf_viewer.exe demo as well. However, it is more difficult to reproduce with gltf_viewer.exe, because it doesn't resize the render window until the user has finished resizing it; my app resizes the render window continuously as the user drags, which makes the bug much more likely to appear.

I first found out about this when a user ran my app on an Intel Iris Xe integrated graphics chipset. I happen to have an Intel Iris Xe as well, and the issue is VERY easy to reproduce on that chip. It doesn't crash there, but hangs, and it does so deterministically within about half a second of resizing the window.

I thought it was a bug specific to the Intel Iris Xe, but eventually I reproduced it on my NVIDIA RTX 2080 Ti as well. On the NVIDIA card the issue is just a lot harder to trigger, but it's there. The fact that it's harder to reproduce on NVIDIA makes me think some resource is being exhausted, and the NVIDIA card simply has more of it.

Also, this page seems relevant:
https://docs.vulkan.org/samples/latest/samples/api/swapchain_recreation/README.html

Note that the swapchain may be recreated without a second acquire. This means that the swapchain could be recreated while there are pending old swapchains to be destroyed. The destruction of both old swapchains must now be deferred to when the first QP of the new swapchain has been processed. If an application resizes the window constantly and at a high rate, we would keep accumulating old swapchains and not free them until it stops.
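
To make the quoted scenario concrete, here is a rough sketch (my own illustration, not Filament's actual code) of the bookkeeping that sample describes: retired swapchains are parked in a list and destroyed only once their images are known to be idle.

#include <vector>
#include <vulkan/vulkan.h>

struct SwapChainHolder {
  VkDevice device = VK_NULL_HANDLE;
  VkSwapchainKHR swapChain = VK_NULL_HANDLE;
  std::vector<VkSwapchainKHR> retired;  // old swapchains awaiting destruction

  void recreate(VkSwapchainCreateInfoKHR info) {
    info.oldSwapchain = swapChain;  // lets the driver reuse resources...
    if (swapChain != VK_NULL_HANDLE) {
      retired.push_back(swapChain);  // ...but we still own the old handle
    }
    vkCreateSwapchainKHR(device, &info, nullptr, &swapChain);
    // If recreate() runs again before destroyRetired(), 'retired' keeps
    // growing -- exactly the accumulation the quote above warns about.
  }

  // Safe only once the old swapchains' images are no longer in use,
  // e.g. after the first present of the new swapchain has completed.
  void destroyRetired() {
    for (VkSwapchainKHR old : retired) {
      vkDestroySwapchainKHR(device, old, nullptr);
    }
    retired.clear();
  }
};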

To Reproduce
Steps to reproduce the behavior:

  1. Open gltf_viewer.exe on Windows in VULKAN mode.
  2. Rapidly resize the window repeatedly. I found that I could only do this fast enough by putting the window near the top of the screen and repeatedly double-clicking the title bar, cycling it between maximized and regular sizes as quickly as I could click.
  3. Eventually the app will crash in VulkanSwapChain::acquire(). On NVIDIA hardware this can take a while, because the crash is rare there.

Expected behavior
Not crashing.

Screenshots
N/A

Logs
I1008 15:47:53.8515384 9276.0 model_renderer.cc:66] [Filament I]: vkCreateSwapchain: 1809x1528, 44, 0, swapchain-size=3, identity-transform=true, depth=126
I1008 15:47:53.9954227 9276.0 model_renderer.cc:66] [Filament I]: vkCreateSwapchain: 483x765, 44, 0, swapchain-size=3, identity-transform=true, depth=126
I1008 15:47:54.0431041 9276.0 model_renderer.cc:66] [Filament I]: vkCreateSwapchain: 483x980, 44, 0, swapchain-size=3, identity-transform=true, depth=126
I1008 15:47:54.1264958 9276.0 model_renderer.cc:66] [Filament I]: vkCreateSwapchain: 483x1352, 44, 0, swapchain-size=3, identity-transform=true, depth=126
E1008 15:47:54.2013704 9276.0 model_renderer.cc:56] [Filament E]: Postcondition
in acquire:130
reason: Cannot acquire in swapchain.
E1008 15:47:54.2013983 9276.0 model_renderer.cc:56] [Filament E]:
E1008 15:47:55.0510578 9276.0 logging.cc:43] *** SIGABRT received at time=1728402475 ***
E1008 15:47:55.0513225 9276.0 logging.cc:43] @ 00007FF70A7943DC (unknown) abort
E1008 15:47:55.0514756 9276.0 logging.cc:43] @ 00007FF70A55A73E (unknown) utils::TPanic<utils::PostconditionPanic>::panic
E1008 15:47:55.0515894 9276.0 logging.cc:43] @ 00007FF70A55A668 (unknown) utils::TPanic<utils::PostconditionPanic>::panic
E1008 15:47:55.0517386 9276.0 logging.cc:43] @ 00007FF70A512568 (unknown) filament::backend::VulkanSwapChain::acquire
E1008 15:47:55.0518643 9276.0 logging.cc:43] @ 00007FF70A4CC0F3 (unknown) filament::backend::VulkanDriver::makeCurrent
E1008 15:47:55.0520594 9276.0 logging.cc:43] @ 00007FF70A49246D (unknown) filament::backend::CommandStream::CommandStream
E1008 15:47:55.0522953 9276.0 logging.cc:43] @ 00007FF70A492550 (unknown) filament::backend::CommandStream::execute
E1008 15:47:55.0524554 9276.0 logging.cc:43] @ 00007FF70A3CCE44 (unknown) filament::FEngine::execute
E1008 15:47:55.0526161 9276.0 logging.cc:43] @ 00007FF70A3CEC2E (unknown) filament::FEngine::loop
E1008 15:47:55.0528188 9276.0 logging.cc:43] @ 00007FF70A3C580F (unknown) std::thread::_Invoke<std::tuple<int (__cdecl filament::FEngine::*)(void) __ptr64,filament::FEngine * __ptr64>,0,1>
E1008 15:47:55.0529430 9276.0 logging.cc:43] @ 00007FF70A794496 (unknown) thread_start<unsigned int (__cdecl*)(void *),1>
E1008 15:47:55.0530671 9276.0 logging.cc:43] @ 00007FFA3212257D (unknown) BaseThreadInitThunk
E1008 15:47:55.0531682 9276.0 logging.cc:43] @ 00007FFA341EAF28 (unknown) RtlUserThreadStart

Desktop (please complete the following information):

  • OS: Windows
  • GPU: NVIDIA RTX 2080 Ti, or Intel Iris Xe
  • Backend: Vulkan

Smartphone (please complete the following information):

  • Device: N/A
  • OS: N/A

Additional context
N/A


emezeske commented Oct 8, 2024

I've continued trying to debug this, and I have now confirmed that something is being leaked when the window is resized.

The tests below were done on my NVIDIA RTX 2080 Ti. I ran similar tests on my Intel Iris Xe, where it crashes within about a second of resizing -- same issue, but the integrated chipset runs out of whatever resource is leaked much more quickly.

In this screenshot, I show the Windows Task Manager GPU memory graph. There are two regions where I am resizing the window wildly with the mouse, resulting in hundreds of calls to mPlatform->recreate() inside VulkanSwapChain::acquire(). In both regions you can see GPU memory use grow extremely rapidly:

[Screenshot: resize-a-lot]

Once it hits peak memory usage, window resizing becomes erratic and glitchy, and the app eventually crashes.

In this second screenshot, I show the GPU memory graph again, but this time I added calls to Engine::createSwapChain() and Engine::createRenderer() (along with the corresponding Engine::destroy() calls for each) on every frame. The result clearly shows that something is leaked whenever the swap chain is destroyed and recreated.

[Screenshot: destroy-create-each-frame]
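
For reference, the per-frame test was roughly the following (a sketch assuming a Filament Engine* named engine, a View* named view, and a native window handle nativeWindow -- not the exact code I ran):

// Recreate the swap chain and renderer on every frame. GPU memory should
// stay flat across iterations; on Windows/Vulkan it climbs instead.
void renderOneFrame(filament::Engine* engine, void* nativeWindow, filament::View* view) {
  filament::SwapChain* swapChain = engine->createSwapChain(nativeWindow);
  filament::Renderer* renderer = engine->createRenderer();
  if (renderer->beginFrame(swapChain)) {
    renderer->render(view);
    renderer->endFrame();
  }
  engine->destroy(renderer);
  engine->destroy(swapChain);
}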

I think this is about as far as I can take the debugging. Hopefully this is helpful -- right now the Vulkan implementation on Windows is quite unstable due to this particular issue.


emezeske commented Oct 8, 2024

One last note: for a SUPER easy repro, even with a good graphics card, you can just write a for loop that resizes the window:

// Win32: MoveWindow(hwnd, x, y, width, height, repaint) -- vary the width
// each iteration so the swap chain must be recreated.
for (int i = 0; i < 10000; ++i) {
  MoveWindow(hwnd, 100, 100, 100 + i % 100, 100, TRUE);
}

I did this and it crashes in less than a second. I guess for the demos it might be:

// SDL equivalent for the Filament demos.
for (int i = 0; i < 10000; ++i) {
  SDL_SetWindowSize(window, 100 + i % 100, 100);
}


emezeske commented Oct 8, 2024

Ah. It looks like the bug is that vkDestroySwapchainKHR() is never called after vkCreateSwapchainKHR() is used to recreate the swap chain, so the old swap chain is leaked. Note that setting VkSwapchainCreateInfoKHR::oldSwapchain at creation time does not cause the old swapchain to be freed -- the calling code still has to destroy it once it is no longer in use, or it leaks.
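
In other words, the recreate path needs something like the following (a sketch with assumed handle names, not the actual VulkanSwapChain code):

VkSwapchainCreateInfoKHR info = {};
info.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
// ... extent, format, image count, etc. ...
info.oldSwapchain = oldSwapChain;  // retires the old swapchain; does NOT free it

VkSwapchainKHR newSwapChain = VK_NULL_HANDLE;
vkCreateSwapchainKHR(device, &info, nullptr, &newSwapChain);

// The missing step: once the old swapchain's images are no longer in use,
// the handle must be destroyed explicitly or its memory is leaked.
vkDestroySwapchainKHR(device, oldSwapChain, nullptr);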


emezeske commented Oct 9, 2024

See also KhronosGroup/Vulkan-Docs#1678 for info on how to destroy the swap chain safely. Yuck.
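
The bluntest option that discussion allows is to drain the device before destroying the retired swapchain (a sketch; heavy-handed because it stalls all GPU work, and as the linked issue explains, even this is not formally guaranteed to cover in-flight presents without VK_EXT_swapchain_maintenance1):

// Wait for all queued GPU work, then destroy the retired swapchain.
// Finer-grained schemes use per-frame fences, or the present fences
// added by VK_EXT_swapchain_maintenance1, to avoid the full stall.
vkDeviceWaitIdle(device);
vkDestroySwapchainKHR(device, oldSwapChain, nullptr);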

poweifeng commented:

Thanks for bringing this bug to our attention and doing the research. It's a bit tricky to address correctly. I'll try to resolve it soon.
