
Fix: use max_workers and add worker timeout #18

Open

wants to merge 1 commit into master

Conversation

geektophe

This patch forces the module to take the `max_workers` parameter into account.

The previous implementation spawned as many workers as the machine has cores, using the `multiprocessing` library. The main issue is that the first operation `multiprocessing` performs when spawning a `Process` is a `fork()` system call. Since the module runs inside the scheduler, it is the scheduler itself that gets forked. With big configurations, the memory consumption is large enough to crash the whole machine.

The number of workers is now controlled by the `max_workers` parameter, which was previously ignored.
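
For illustration, here is a minimal sketch of the bounded spawn; the names `spawn_workers` and `export_chunk` are hypothetical, not the module's actual API:

```python
import multiprocessing

def export_chunk(chunk):
    # Placeholder for the real export work (e.g. flushing a batch of
    # scheduler objects to MongoDB).
    for item in chunk:
        pass

def spawn_workers(jobs, max_workers):
    """Run the export in at most max_workers processes.

    Every multiprocessing.Process() starts with a fork() of the calling
    process; since this code runs inside the scheduler, an unbounded
    pool (one process per core) forks the whole scheduler once per CPU.
    """
    # Round-robin the jobs into at most max_workers chunks.
    chunks = [jobs[i::max_workers] for i in range(max_workers)]
    workers = [multiprocessing.Process(target=export_chunk, args=(chunk,))
               for chunk in chunks if chunk]
    for proc in workers:
        proc.start()
    return workers
```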

The other fix is the addition of a `worker_timeout` parameter.

The previous implementation waited 30 seconds for the spawned workers to join, then simply moved on even if the job wasn't finished. Again with big configurations, if the MongoDB server does not handle the queries quickly enough, stalled scheduler processes (the spawned workers) can be left behind.

With this patch, the scheduler waits for `worker_timeout` seconds, then kills the remaining jobs and logs an error.

Setting `worker_timeout` to `0` forces the scheduler to wait for all children to finish before going further.
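
Under the same caveat (`wait_for_workers` is an illustrative name, not the patch's actual code), the timeout handling could look like this:

```python
import logging
import time

def wait_for_workers(workers, worker_timeout):
    """Join the workers; terminate and log any that outlive the timeout.

    worker_timeout == 0 means no deadline: join() is called without a
    timeout and blocks until every child has finished.
    """
    deadline = time.time() + worker_timeout if worker_timeout else None
    for proc in workers:
        # One shared deadline for the whole pool, rather than a full
        # worker_timeout per worker.
        remaining = None if deadline is None else max(0.0, deadline - time.time())
        proc.join(remaining)
        if proc.is_alive():
            logging.error("worker %s did not finish within %ss, terminating it",
                          proc.name, worker_timeout)
            proc.terminate()
            proc.join()  # reap the killed child
```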


maethor commented Oct 8, 2020

Hi @geektophe, I just pushed a new version which does not use workers anymore. I invite you to test it.
