[Bug] Worker keeps accepting WFT after Worker Thread terminates due to ERR_WORKER_OUT_OF_MEMORY #1536

Open
mjameswh opened this issue Sep 26, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

A user reported the following sequence of events:

  • They have a TS Workflow Worker running normally;
  • At some point, they get the following ERROR-level log message: { code:"ERR_WORKER_OUT_OF_MEMORY" }.
  • After that point, the Worker appears to make no further progress on pending Workflow Tasks, even though the Worker process is still running.
  • On the Server side, metrics show a considerable increase in calls to PollWorkflowExecutionHistory and a considerable reduction in calls to PollWorkflowTaskQueue. The data provided does not identify which Worker those calls came from (they normally run about 12 active Workers on that Namespace).
  • The situation continues until the Worker process is restarted, after which Workflow progress resumes.

Analysis

ERR_WORKER_OUT_OF_MEMORY is an error raised by Node itself. It means that Node terminated a Worker Thread because that thread reached its memory limit.
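For reference, here is a minimal sketch of how this error surfaces on a plain Node `worker_threads` Worker. It is not taken from the SDK: `isWorkerOutOfMemory` and `spawnLeakyThread` are hypothetical helper names, and the 16 MB limit is an arbitrary value chosen only to make the thread run out of memory quickly.

```typescript
import { Worker } from 'node:worker_threads';

/** Returns true when `err` carries Node's ERR_WORKER_OUT_OF_MEMORY code. */
export function isWorkerOutOfMemory(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === 'ERR_WORKER_OUT_OF_MEMORY'
  );
}

/**
 * Hypothetical reproduction helper (not part of any SDK): spawns a thread
 * with a small heap limit that allocates until Node terminates it, and
 * resolves with whatever the thread emits on 'error' (or undefined if the
 * thread exits without emitting an error).
 */
export function spawnLeakyThread(maxOldGenerationSizeMb = 16): Promise<unknown> {
  return new Promise((resolve) => {
    const thread = new Worker(
      'const chunks = []; while (true) chunks.push(new Array(1e6).fill(0));',
      { eval: true, resourceLimits: { maxOldGenerationSizeMb } },
    );
    thread.once('error', resolve);
    thread.once('exit', () => resolve(undefined));
  });
}
```

Calling `spawnLeakyThread().then(isWorkerOutOfMemory)` should be one way to observe the error locally. Note that the parent process keeps running after the thread dies, which matches the reported symptom of a live process that no longer makes progress.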

Given the symptoms, it is reasonable to assume that the thread killed by Node was the Workflow Worker Thread, which means the Temporal Worker can no longer process incoming Workflow Tasks.

What happens next is not clear, but the correct behavior would be for the Worker to initiate shutdown. There’s very little we can do at that point, but at least, let’s not pretend that everything’s all right. We should probably print a clear CRITICAL-level message to the log and terminate the process ASAP.
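The proposed behavior could look roughly like the sketch below. This is only an illustration of the fail-fast idea, not the SDK's actual implementation: `failFastOnThreadError` and `FailFastOptions` are hypothetical names, and how the Workflow Worker Thread would be wired into it inside the SDK is an assumption. The `log` and `exit` hooks are injectable so the handler can be exercised without killing the process.

```typescript
import { EventEmitter } from 'node:events';

/** Hypothetical options; both hooks default to console.error / process.exit. */
interface FailFastOptions {
  log?: (message: string) => void;
  exit?: (code: number) => void;
}

/**
 * Attaches a one-shot 'error' listener to a thread-like emitter. When the
 * thread dies, logs a CRITICAL message and terminates the process instead
 * of leaving a Worker that can no longer process Workflow Tasks.
 */
export function failFastOnThreadError(thread: EventEmitter, opts: FailFastOptions = {}): void {
  const log = opts.log ?? ((message: string) => console.error(message));
  const exit = opts.exit ?? ((code: number) => process.exit(code));

  thread.once('error', (err: unknown) => {
    // Prefer the Node error code (e.g. ERR_WORKER_OUT_OF_MEMORY) when present.
    const code = (err as { code?: unknown } | null)?.code;
    log(`CRITICAL: Workflow Worker Thread terminated (${String(code ?? err)}); shutting down`);
    exit(1);
  });
}
```

A `worker_threads` Worker is an EventEmitter, so it can be passed in directly. Exiting with a non-zero code lets a process supervisor restart the Worker, which is what the user had to do manually.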

@mjameswh mjameswh added the bug Something isn't working label Sep 26, 2024
@mjameswh mjameswh self-assigned this Sep 26, 2024