Hm, ... is that not something the StepFunction could / should do itself?
E.g. if the Step Function completes correctly, it pushes out a WRSC event with SUCCEEDED; if it fails (in any state), it can push out a WRSC event with FAILED (and an appropriate / useful payload)?
> if it fails (in any state) it can push out a WRSC with FAILED (and an appropriate / useful payload)?
Would it not need to then capture a potential failure in every task? I'm not sure it'd be able to capture everything.
I saw this more for step functions failing in ways we don't expect them to fail (like the secrets rotator notifications), not for if a workflow run has gone rogue.
> Would it not need to then capture a potential failure in every task?
Maybe... I am not sure there is a global catch for the stepfunction as a whole. Could wrap the actual SF into a wrapper SF with a single step that has a catch... but ugly...
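A rough sketch of what that wrapper could look like, as an ASL definition built in Python. The state names, the `example.wrapper` source, and the event bus name are all made up for illustration, and `WRSC` is used as a placeholder detail type; the inner workflow is started through the `startExecution.sync:2` service integration and a single `Catch` on `States.ALL` routes any failure to a FAILED event. As far as I know, `States.ALL` does not catch `States.DataLimitExceeded` or runtime errors, so this would still not capture everything.

```python
import json


def _emit_state(status, event_bus_name, detail_extra, next_state=None):
    """Build a Task state that publishes one event via the EventBridge integration."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::events:putEvents",
        "Parameters": {
            "Entries": [
                {
                    "EventBusName": event_bus_name,
                    "Source": "example.wrapper",  # hypothetical source name
                    "DetailType": "WRSC",  # placeholder detail type
                    "Detail": {"status": status, **detail_extra},
                }
            ]
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_wrapper_definition(inner_state_machine_arn, event_bus_name):
    """Return an ASL definition that wraps one inner state machine with a single catch."""
    return {
        "StartAt": "RunInner",
        "States": {
            "RunInner": {
                "Type": "Task",
                # Runs the inner state machine and waits for it to finish.
                "Resource": "arn:aws:states:::states:startExecution.sync:2",
                "Parameters": {
                    "StateMachineArn": inner_state_machine_arn,
                    "Input.$": "$",
                },
                # One catch for the whole inner workflow.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "EmitFailed",
                    }
                ],
                "Next": "EmitSucceeded",
            },
            "EmitSucceeded": _emit_state("SUCCEEDED", event_bus_name, {}),
            "EmitFailed": _emit_state(
                "FAILED", event_bus_name, {"error.$": "$.error"}, next_state="MarkFailed"
            ),
            # Keep the wrapper execution itself marked as failed.
            "MarkFailed": {"Type": "Fail", "Error": "InnerExecutionFailed"},
        },
    }


if __name__ == "__main__":
    definition = build_wrapper_definition(
        "arn:aws:states:ap-southeast-2:123456789012:stateMachine:inner-example",
        "example-event-bus",
    )
    print(json.dumps(definition, indent=2))
```

The cost is one extra state machine per workflow, which is the "ugly" part.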
> step functions failing in ways we don't expect them to fail
Is it important to then know that a SF has failed, or more generally that an analysis has not progressed as it should have?
(The second part should be covered as soon as we pre-generate workflow runs: we'd know what is supposed to run, and we can check which of those have not progressed.)
For production workloads it's important we know if the glue fails.
TODO
Getting an example of a failed state event from AWS CloudTrail doesn't seem to be straightforward; I can only find StartExecution events.
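An alternative to digging through CloudTrail might be the "Step Functions Execution Status Change" events that Step Functions sends to EventBridge. Below is a minimal sketch of an event pattern for the failure statuses, plus a simplified local matcher for trying it out. The sample event is hand-written for illustration (not captured from AWS), the ARNs are placeholders, and the matcher only handles exact-value lists and nested objects, not the full EventBridge pattern syntax.

```python
FAILURE_STATUSES = ["FAILED", "TIMED_OUT", "ABORTED"]


def build_failure_pattern(state_machine_arns=None):
    """Event pattern for Step Functions executions that did not succeed."""
    pattern = {
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {"status": list(FAILURE_STATUSES)},
    }
    if state_machine_arns:
        # Optionally restrict the rule to specific state machines.
        pattern["detail"]["stateMachineArn"] = list(state_machine_arns)
    return pattern


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist and hold a listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hand-written illustrative event; field values are placeholders.
SAMPLE_FAILED_EVENT = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {
        "executionArn": "arn:aws:states:ap-southeast-2:123456789012:execution:example-sfn:example-run",
        "stateMachineArn": "arn:aws:states:ap-southeast-2:123456789012:stateMachine:example-sfn",
        "name": "example-run",
        "status": "FAILED",
    },
}

if __name__ == "__main__":
    print(matches(build_failure_pattern(), SAMPLE_FAILED_EVENT))
```

A rule with this pattern would fire regardless of which state failed, so the state machines themselves would not need a catch on every task.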
Simple Slack notifier logic can be found here: https://github.com/umccr/infrastructure/blob/master/cdk/apps/icav2_credentials/lambdas/slack_notifier/notify_slack.py
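For reference, a minimal sketch of turning a Step Functions status-change event into a Slack message. This is not the code in the linked `notify_slack.py`; the function names and message layout are assumptions, and it presumes a Slack incoming webhook that accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_payload(event):
    """Format a Step Functions status-change event as a Slack webhook payload."""
    detail = event.get("detail", {})
    lines = [
        f"Step Function execution {detail.get('status', 'UNKNOWN')}",
        f"State machine: {detail.get('stateMachineArn', 'unknown')}",
        f"Execution: {detail.get('executionArn', 'unknown')}",
    ]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def handler(event, context, webhook_url=None):
    """Lambda-style entry point; the webhook URL lookup is left to the caller."""
    payload = build_slack_payload(event)
    if webhook_url is not None:
        post_to_slack(webhook_url, payload)
    return payload
```

In a real deployment the webhook URL would come from a secret or parameter store rather than being passed in directly.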