
Deadlocks when DPG is called from a (heavily loaded) user thread #2053

Open
v-ein opened this issue Mar 3, 2023 · 11 comments
Labels: state: pending (not addressed yet) · type: bug

v-ein commented Mar 3, 2023

Version of Dear PyGui

Version: 1.8.0
Operating System: Windows 10

My Issue/Question

If there are several threads calling DPG functions, there's a chance they get into a deadlock where one thread is holding the GIL and waiting for the mutex, while the other does the opposite. Here's an example of how this may happen:

  • Thread A is a Python thread started by the user - probably in response to a button callback or another DPG event. Thread B is the DPG thread that processes handlers/callbacks.
  • Thread A locks the mutex with dpg.mutex() and starts manipulating the widget tree (e.g. building a large table).
  • Thread B starts a callback, which is a Python callable, and therefore starts waiting for the GIL.
  • After 5 ms, Python forcibly takes the GIL away from thread A and lets the operating system switch to another thread (thread A thus starts waiting for the GIL - still holding the mutex!).
  • Thread B runs the callback (holding the GIL).
  • The callback calls a DPG function, which needs to look into the widget tree and therefore attempts to lock the mutex. The mutex is held by thread A, the GIL is held by thread B, and they're waiting for each other.

The issue never occurs if DPG is only run on the main thread and the callbacks thread - the renderer always releases the GIL before attempting to lock the mutex. The rest of the code, however, holds the GIL when locking the mutex.

As soon as there's a DPG call in a thread other than those two "standard" threads, there's a chance of a deadlock. A call to dpg.mutex() is not necessary - the same deadlock may happen with any DPG call; dpg.mutex() just increases the chances of bumping into it.
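
For illustration, here's the same lock-ordering pattern reduced to pure Python (not DPG code: the two threading.Lock objects stand in for the GIL and DPG's internal mutex, and the sleeps force the unlucky interleaving):

import threading
import time

gil = threading.Lock()        # stands in for the Python GIL
dpg_mutex = threading.Lock()  # stands in for DPG's internal mutex

def thread_a():
    with dpg_mutex:           # A holds the mutex (like dpg.mutex())
        time.sleep(0.1)       # A gets preempted while holding it...
        with gil:             # ...and now needs the "GIL" back
            pass

def thread_b():
    with gil:                 # B took the "GIL" to run a callback
        time.sleep(0.1)
        with dpg_mutex:       # the callback now needs the mutex
            pass

a = threading.Thread(target=thread_a, daemon=True)
b = threading.Thread(target=thread_b, daemon=True)
a.start(); b.start()
a.join(timeout=2); b.join(timeout=2)
print("deadlocked" if a.is_alive() or b.is_alive() else "finished")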

To Reproduce

Note: to reproduce this, one needs PR #2004 to be merged first. Without that, the mutex is not held by DPG callers.

Steps to reproduce the behavior in general:

  1. Add a callback that's called pretty often (e.g. the "visible" or "hover" callback) and calls a DPG function.
  2. Start a new Python thread.
  3. Lock the DPG mutex in that thread and start doing a heavy job - not necessarily calling DPG functions; any job will do as long as it lasts longer than 5 ms.
  4. Wait for the application to hang.

With the example below:

  1. Hover the "hover me" text and make sure it shows quickly changing numbers.
  2. Leave the checkbox unset and click the button to start a new thread. You'll see how CPU usage goes up.
  3. Hover the text again and wait for a while. Make sure it doesn't hang.
  4. Close the application and restart it.
  5. Set the checkbox and click the button again.
  6. Hover the button and the checkbox to see DPG is still functioning.
  7. Hover the text and wait for a while. Eventually it hangs (it should actually hang right away; a 5 ms delay is not visible to the naked eye).

Expected behavior

No deadlocks.

Screenshots/Video

(screenshot omitted)

Standalone, minimal, complete and verifiable example

#!/usr/local/bin/python3

from random import randrange
from threading import Thread
import dearpygui.dearpygui as dpg


dpg.create_context()
dpg.create_viewport(title="Test", width=500, height=400)

def heavy_task():
    lock_it = dpg.get_value("lock-mutex")
    while True:
        count = 0
        if lock_it:
            dpg.lock_mutex()
        # busy work: any CPU-bound job lasting longer than the ~5 ms GIL
        # switch interval will do
        for _ in range(1000):
            count += 1
        if lock_it:
            # give other threads a chance to run DPG stuff (including DPG renderer)
            dpg.unlock_mutex()

dpg.setup_dearpygui()
with dpg.window(pos=(100, 100), width=350, height=200, no_title_bar=False):
    with dpg.item_handler_registry() as handlers:
        dpg.add_item_hover_handler(callback=lambda: dpg.set_value("my-text", f"{randrange(100000):5}"))
    dpg.add_text("Start a thread then hover me!", tag="my-text")
    dpg.bind_item_handler_registry(dpg.last_item(), handlers)

    dpg.add_checkbox(label="Lock mutex in the user thread", tag="lock-mutex")
    dpg.add_button(label="Start thread", callback=lambda: Thread(target=heavy_task).start())

dpg.show_viewport()
dpg.start_dearpygui()
dpg.destroy_context()
v-ein commented Mar 3, 2023

Got a fix for this, going to open a PR later.

Breakthrough commented Sep 12, 2023

@v-ein did you ever get a fix for this uploaded? I'm running into the same issue which makes it difficult to use modern versions of DPG for async video decoding.

This seems like a pretty critical issue for any applications requiring threading to offload CPU-heavy tasks. Are there any suggested workarounds? All guidance I can find suggests using a render callback, which is no longer supported. I also found that swapping the textures in the render loop seems to trigger the deadlock more frequently in some cases.

E.g. I run into this issue when I update a texture in a background thread while also dragging a slider quickly. The application suddenly deadlocks indefinitely.

Edit: Thank you for your work on the project as well!

v-ein commented Sep 13, 2023

No, not yet. I'll see if I get some spare time to do that.

I've got more fixes related to synchronization, and hoped to get a chance some day to combine them and push them as a single PR. Most of them are massive edits and thus difficult to review, so my idea was to reduce the grind for reviewers. I'll see what I can do.

v-ein commented Sep 13, 2023

Offloading CPU-heavy tasks is probably best done in a background thread, like you suggested. However, DPG is not entirely thread-safe, and won't be even if we fix all the sync issues: it maintains state (dpg.last_item() and the container stack, at the very least). Since that state is global, i.e. shared between threads, certain DPG functions are unsafe and may cause race conditions.

Each alternative has its own drawback:

  • Doing anything heavy directly in callbacks/handlers is highly undesirable: it can overload the handlers thread, making DPG skip events and therefore misbehave.
  • It's possible to do heavy tasks in the main thread - instead of start_dearpygui, create your own rendering loop and do the heavy work between frames. This may reduce FPS and make the UI unresponsive.
  • Another option is to do the heavy stuff in a background thread without calling DPG, put the results into a queue, and process that queue either in the rendering loop or in the handlers thread (e.g. with an "item visible" handler) - see the sketch after this list. Sounds overcomplicated.
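
A rough sketch of that queue-based option (the worker, do_heavy_computation, and the "my-text" tag are illustrative placeholders, not DPG API; context/viewport setup is omitted):

import queue
import threading
import dearpygui.dearpygui as dpg

results = queue.Queue()

def do_heavy_computation():
    # illustrative placeholder for the real CPU-bound work
    return sum(range(100_000))

def worker():
    while True:
        results.put(do_heavy_computation())  # no DPG calls in this thread

threading.Thread(target=worker, daemon=True).start()

# ... context/viewport/widget setup as in the example above ...

# custom rendering loop instead of start_dearpygui()
while dpg.is_dearpygui_running():
    try:
        while True:  # drain everything queued since the last frame
            dpg.set_value("my-text", results.get_nowait())
    except queue.Empty:
        pass
    dpg.render_dearpygui_frame()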

I hope one day I can make DPG fully thread-safe by making its state thread-local. That's a distant goal, and might not be 100% doable, so don't count on it.

axeldavy commented:

The first alternative you describe is the easiest to put in place, but indeed, if events pile up, important callbacks can get skipped.

One improvement to this alternative could be a flag telling DPG to skip an event when a newer one is already available.

For example, a mouse handler might prefer to skip old mouse positions and go directly to the most recent one.

Imagine a plot with heavy rendering in a callback (drawing lines, circles, etc.) - by heavy, I mean in terms of CPU, not GPU.
Doing the rendering in a callback is a good solution, as the mouse can still interact smoothly with the plot. The rendering can be performed on a hidden layer, and then when it is done, the layer is shown, and the previous one hidden (double buffering).
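
A minimal sketch of that double-buffering idea with two draw layers (the tags and the draw_fn placeholder are illustrative, not anyone's actual implementation):

import dearpygui.dearpygui as dpg

dpg.create_context()

with dpg.window():
    with dpg.drawlist(width=400, height=300):
        dpg.add_draw_layer(tag="layer-a", show=True)   # currently visible
        dpg.add_draw_layer(tag="layer-b", show=False)  # hidden back buffer

front, back = "layer-a", "layer-b"

def redraw(draw_fn):
    # draw_fn is an illustrative placeholder for the heavy drawing code
    global front, back
    dpg.delete_item(back, children_only=True)  # clear the hidden layer
    draw_fn(parent=back)                       # heavy CPU-side drawing
    dpg.configure_item(back, show=True)        # flip: show the new frame
    dpg.configure_item(front, show=False)      # and hide the old one
    front, back = back, front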

DearPyGui is ideal for the above scenario. The only issue is if mouse/keyboard events pile up. What do you think of introducing handler configuration options to skip redundant events in the callback queue?

v-ein commented Jul 27, 2024

What do you think of introducing handler configuration options to skip redundant events in the callback queue ?

It's an interesting idea, though it doesn't fit well into the current DPG API. I'll need to think about it. It can probably be done in Python as well, but DPG itself can surely skip events better (because it can forward-check the queue before running the handler, whereas on the Python side we'd have to run all handlers first, and only then actually perform the task associated with this kind of event).

The handler could probably have a kind of "skip repetitive events" flag, but we need to clearly define what a repetitive event is. Also, I wonder whether the mouse handler is the only handler where we can implement such filtering.

v-ein commented Jul 27, 2024

BTW it can also be done in Python with manual callback management, but #2208 will need to get fixed first.
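
For reference, a rough sketch of what such Python-side coalescing could look like with manual callback management (assuming the documented get_callback_queue/run_callbacks API; keying on the callback object is a simplification, and the #2208 caveat above applies):

import dearpygui.dearpygui as dpg

# must be set before setup_dearpygui()
dpg.configure_app(manual_callback_management=True)

# ... context/viewport/widget setup ...

while dpg.is_dearpygui_running():
    jobs = dpg.get_callback_queue()      # retrieves and clears pending jobs
    if jobs:
        latest = {}
        for job in jobs:
            latest[job[0]] = job         # keep only the newest job per callback
        dpg.run_callbacks(list(latest.values()))
    dpg.render_dearpygui_frame()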

axeldavy commented:

By releasing the GIL in dpg.mutex(), and wrapping all DPG calls in dpg.mutex(), I was able to avoid any deadlocks despite having several threads submitting DPG commands. This can be a simple workaround for anyone affected.
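
In other words (assuming a build where the mutex helpers release the GIL, as described above), every DPG call made from a user thread gets wrapped like this ("my-text" is the tag from the example above):

import dearpygui.dearpygui as dpg

def set_text_safely(value):
    # in a user thread: take DPG's mutex around every DPG call
    with dpg.mutex():
        dpg.set_value("my-text", value)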

v-ein commented Jul 29, 2024

Exactly, that was my solution too. Most of the time when locking the mutex, DPG also needs to release the GIL before acquiring the mutex.

axeldavy commented Aug 1, 2024

Even with this solution I ended up with hangs. It looks like the more keyboard and mouse handlers I add, the more likely these become...
In the end I had to resort to not using the frame callback, sticking to the default rendering loop, and doing all the drawing in callbacks on a single thread.

v-ein commented Aug 1, 2024

I hope to eventually (in a couple of months) push some fixes for the synchronization issues - who knows, it might help. It's not going to be soon though (the fixes are complete in my local repo, but I'll need to revisit the code and also take #2275 into account).
