Temporal Nexus: References (Commands, Errors, Failures) #3060

Open: wants to merge 9 commits into `main`
18 changes: 17 additions & 1 deletion docs/references/commands.mdx
@@ -119,7 +119,23 @@ This Command is triggered by a call to "upsert" Workflow [Search Attributes](/vi

### ProtocolMessageCommand

This Command helps guarantee ordering constraints for features such as Updates.

This Command points at the message from which the Event is created.
Therefore, just from the Command, you can't predict the resulting Event type.

### ScheduleNexusOperation

This Command is triggered by a call to execute a Nexus Operation in the caller Workflow.

- Awaitable: Yes, a Workflow Execution can await on the action resulting from this Command.
- Corresponding Event: NexusOperationScheduled

By default, a Workflow Execution can't schedule more than 30 Nexus Operations concurrently.
During Public Preview, completed operations are not yet deleted, so the limit applies to the total number of Nexus Operations in the Workflow.
See [Limits](/workflows#workflow-execution-nexus-operation-limits) for details.

### CancelNexusOperation

This Command is triggered by a call to request the cancellation of a Nexus Operation.

- Awaitable: No, a Workflow Execution cannot await on the action resulting from this Command.
- Corresponding Event: NexusOperationCancelRequested
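The Command-to-Event correspondences in this reference can be pictured as a lookup. The sketch below is illustrative only, not SDK or server code: `resolve_event`, the two dictionaries, and the Update message body names (`Acceptance`, `Response`) with their Event types are assumptions. It shows why `ProtocolMessageCommand` differs from the Nexus Commands: its Event type depends on the message it points at.

```python
from typing import Optional

# Commands whose resulting Event type is fixed (from the reference above).
FIXED_COMMAND_EVENTS = {
    "ScheduleNexusOperation": "NexusOperationScheduled",
    "CancelNexusOperation": "NexusOperationCancelRequested",
}

# Assumption: for Updates, the Event type follows from the body of the
# protocol message that ProtocolMessageCommand points at.
PROTOCOL_MESSAGE_EVENTS = {
    "Acceptance": "WorkflowExecutionUpdateAccepted",
    "Response": "WorkflowExecutionUpdateCompleted",
}


def resolve_event(command: str, message_body: Optional[str] = None) -> str:
    """Return the Event type a Command produces (hypothetical helper)."""
    if command in FIXED_COMMAND_EVENTS:
        return FIXED_COMMAND_EVENTS[command]
    if command == "ProtocolMessageCommand":
        # The Command alone is not enough; the referenced message decides.
        if message_body is None:
            raise ValueError("ProtocolMessageCommand needs the referenced message")
        return PROTOCOL_MESSAGE_EVENTS[message_body]
    raise KeyError(f"unknown command: {command}")


print(resolve_event("ScheduleNexusOperation"))  # NexusOperationScheduled
```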
22 changes: 21 additions & 1 deletion docs/references/errors.mdx
@@ -64,7 +64,7 @@ Restart the [Worker](/workers) that the [Workflow](/workflows) and [Activity](/a
This error indicates that the [Workflow Task](/workers#workflow-task) failed to validate attributes on a property in the Upsert Memo or in a payload.
These attributes are either unset or exceeding size limits.

Reset any unset and empty attributes.
Adjust the size of the [Memo](/workflows#memo) or payload to fit within the system's limits.

## Bad Record Marker Attributes {#bad-record-marker-attributes}
@@ -100,6 +100,12 @@ This error indicates unset or invalid attributes for [`ScheduleActivityTask`](/r
Reset any unset or empty attributes.
Adjust the size of the received payload to stay within the given size limit.

## Bad Schedule Nexus Operation Attributes {#bad-schedule-nexus-operation-attributes}

This error indicates unset or invalid attributes for `ScheduleNexusOperation`; for example, the Nexus Endpoint name used in the caller Workflow doesn't exist.

Reset any unset or empty attributes and adjust the size of the received payload to stay within the given size limit.
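A minimal sketch of the kind of checks behind this error, assuming a simplified attribute set. `ScheduleNexusOperationAttrs`, `validate`, `known_endpoints`, and `max_payload_bytes` are hypothetical names; the real server-side validation and size limits may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScheduleNexusOperationAttrs:
    # Hypothetical subset of the Command's attributes.
    endpoint: str
    service: str
    operation: str
    input_payload: bytes


def validate(attrs: ScheduleNexusOperationAttrs,
             known_endpoints: set[str],
             max_payload_bytes: int) -> list[str]:
    """Collect reasons the Command would be rejected (illustrative checks)."""
    problems = []
    # Unset or empty attributes.
    for name in ("endpoint", "service", "operation"):
        if not getattr(attrs, name):
            problems.append(f"{name} is unset or empty")
    # The Endpoint named by the caller Workflow must exist.
    if attrs.endpoint and attrs.endpoint not in known_endpoints:
        problems.append(f"endpoint {attrs.endpoint!r} does not exist")
    # The input payload must fit within the size limit.
    if len(attrs.input_payload) > max_payload_bytes:
        problems.append("input payload exceeds size limit")
    return problems
```

An empty list means none of these hypothetical checks would reject the Command.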

## Bad Search Attributes {#bad-search-attributes}

This error indicates that the [Workflow Task](/workers#workflow-task) has unset or invalid [Search Attributes](/visibility#search-attribute).
@@ -208,6 +214,20 @@ Therefore, the [Workflow Task](/workers#workflow-task) was failed to prevent add

Wait for the system to finish any currently running Child Workflows before redeploying this Task.

## Pending Nexus Operations Limit Exceeded {#pending-nexus-operations-limit-exceeded}

This error indicates that the Workflow has reached capacity for pending Nexus Operations.
Therefore, the [Workflow Task](/workers#workflow-task) was failed to prevent the creation of another Nexus Operation.

:::note

In Public Preview, Nexus Operations are not yet deleted from mutable state when they complete, so the limit is for the total count of Nexus Operations in a given Workflow.

:::

Let the Workflow complete any current Nexus Operations before retrying the Task.

See [Per Workflow Nexus Operation Limits](/cloud/limits#per-workflow-nexus-operation-limits) for details.
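The difference between counting only pending operations and counting every operation (the Public Preview behavior described in the note) can be sketched with a small, hypothetical model. `PendingNexusOperationLimit` and its methods are illustrative names, not Temporal APIs; the default of 30 comes from the ScheduleNexusOperation Command reference.

```python
class PendingNexusOperationLimit:
    """Hypothetical model of the per-Workflow Nexus Operation limit.

    count_completed=True mimics Public Preview: completed operations stay in
    mutable state and keep counting against the limit.
    """

    def __init__(self, limit: int = 30, count_completed: bool = False):
        self.limit = limit
        self.count_completed = count_completed
        self.pending = 0
        self.completed = 0

    def in_use(self) -> int:
        extra = self.completed if self.count_completed else 0
        return self.pending + extra

    def schedule(self) -> None:
        if self.in_use() >= self.limit:
            # Models failing the Workflow Task instead of adding the operation.
            raise RuntimeError("pending Nexus Operations limit exceeded")
        self.pending += 1

    def complete(self) -> None:
        self.pending -= 1
        self.completed += 1
```

With `count_completed=True`, completing an operation does not free capacity, which is why the limit behaves like a total during Public Preview.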

## Pending Request Cancel Limit Exceeded {#pending-request-cancel-limit-exceeded}

This error indicates that the [Workflow Task](/workers#workflow-task) failed after attempting to add more cancel requests.
141 changes: 134 additions & 7 deletions docs/references/failures.mdx
@@ -36,8 +36,8 @@ The base [Failure proto message](https://api-docs.temporal.io/#temporal.api.fail

## Application Failure

Workflow, Activity, and Nexus Operation code use Application Failures to communicate application-specific failures.
This is the only type of Temporal Failure created and thrown by user code.

- TypeScript: [ApplicationFailure](https://typescript.temporal.io/api/classes/common.ApplicationFailure)
- Java: [ApplicationFailure](https://www.javadoc.io/doc/io.temporal/temporal-sdk/latest/io/temporal/failure/ApplicationFailure.html)
@@ -82,26 +82,130 @@ During conversion, the following Application Failure fields are set:

When an [Activity Execution](/activities#activity-execution) fails, the Application Failure from the last Activity Task is the `cause` field of the [ActivityFailure](#activity-failure) thrown in the Workflow.

### Errors in Nexus Operations

An error in a Nexus Operation can cause either a Nexus Operation Task Failure (the Task will be retried) or a Nexus Operation Execution Failure (the Nexus Operation is marked as failed).

#### Nexus Operation Task Failures

A Nexus Operation Task Failure occurs when a handler unexpectedly fails to process a Nexus Operation Task.
For example, throwing an unknown error in your Nexus handler code causes a Nexus Operation Task Failure.
These failures cause the Nexus Operation Task to be retried.

#### Nexus Operation Execution Failures

A non-retryable Application Failure can be thrown by a Nexus Operation handler to fail the overall Nexus Operation Execution.
Nexus Operation Execution Failures put the Nexus Operation Execution into the "Failed" state and no more attempts will be made to complete the Nexus Operation.
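The split between the two failure kinds can be sketched as a classification of what a handler throws. This is a simplified model: `ApplicationFailure` here is a hypothetical stand-in for an SDK's Application Failure type, and `outcome` is an illustrative helper, not an SDK function.

```python
class ApplicationFailure(Exception):
    """Hypothetical stand-in for an SDK's Application Failure."""

    def __init__(self, message: str, non_retryable: bool = False):
        super().__init__(message)
        self.non_retryable = non_retryable


def outcome(err: Exception) -> str:
    """Classify a handler error as described above (illustrative only)."""
    if isinstance(err, ApplicationFailure) and err.non_retryable:
        # Fails the overall Nexus Operation Execution; no more attempts.
        return "execution_failure"
    # Unknown errors and retryable Application Failures fail only this
    # Nexus Operation Task, which is then retried.
    return "task_failure"
```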

#### Propagation of Workflow errors

Application Errors thrown from a Workflow created by a Nexus `NewWorkflowRunOperation` handler are automatically propagated to the caller as non-retryable errors and result in a Nexus Operation Execution Failure.

#### Using Failures in a Nexus handler

In a Nexus Operation handler, you can throw an Application Failure, a Nexus Error, or another error to fail either the individual Nexus Operation Task or the overall Nexus Operation Execution.

Unknown errors are converted to a retryable Application Failure. During conversion, the following fields are set on the Application Failure:

- Non_retryable is set to false.
- Type is set to the error's type name.
- Message is set to the error message.
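A sketch of the conversion rules above, assuming a simplified failure shape; `ApplicationFailure` and `convert_unknown_error` are hypothetical names for illustration, not an SDK's converter.

```python
from dataclasses import dataclass


@dataclass
class ApplicationFailure:
    # Hypothetical, simplified view of the converted failure's fields.
    type: str
    message: str
    non_retryable: bool


def convert_unknown_error(err: Exception) -> ApplicationFailure:
    """Mirror the conversion rules listed above (illustrative sketch)."""
    return ApplicationFailure(
        type=type(err).__name__,  # the error's type name
        message=str(err),         # the error message
        non_retryable=False,      # unknown errors stay retryable
    )


print(convert_unknown_error(ValueError("bad input")))
```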

#### Retryable failures

Retryable Nexus Operation Task failures, such as an unknown error, are automatically retried with a built-in Retry Policy.
When a Nexus Operation Task fails, the caller Workflow records the attempt failure on the pending Nexus Operation and sets the following fields:

- State is set to the new state, for example BackingOff.
- Attempt is set to an incremented count.
- Next_attempt_schedule_time is set to the time when the Nexus Operation Task will be retried.
- Last_attempt_failure is set with the following fields:
  - Message is set to the error message.
  - Failure_info is set to the Application Failure.
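A sketch of how the recorded fields change on each failed attempt. `PendingNexusOperation` and `record_attempt_failure` are hypothetical, the initial field values are assumptions, and the caller passes in the backoff because the built-in Retry Policy's intervals are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PendingNexusOperation:
    # Hypothetical, simplified view of the fields listed above.
    state: str = "Scheduled"
    attempt: int = 0
    next_attempt_schedule_time: Optional[datetime] = None
    last_attempt_failure: Optional[dict] = None


def record_attempt_failure(op: PendingNexusOperation, message: str,
                           now: datetime, backoff: timedelta) -> None:
    """Update the pending operation after a failed Nexus Operation Task."""
    op.state = "BackingOff"
    op.attempt += 1
    op.next_attempt_schedule_time = now + backoff
    # Mirrors the shape of LastAttemptFailure in the CLI output.
    op.last_attempt_failure = {"message": message, "applicationFailureInfo": {}}
```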

For example, an unknown error thrown in a Nexus handler will surface as:

```
temporal workflow describe -w my-workflow-id
...
Pending Nexus Operations: 1

Endpoint myendpoint
Service my-hello-service
Operation echo
OperationID
State BackingOff
Attempt 6
ScheduleToCloseTimeout 0s
NextAttemptScheduleTime 20 seconds from now
LastAttemptCompleteTime 11 seconds ago
LastAttemptFailure {"message":"unexpected response status: \"500 Internal Server Error\": internal error","applicationFailureInfo":{}}
```

### Non-retryable

When an Activity or Workflow throws an Application Failure, the Failure's `type` field is matched against a Retry Policy's list of [non-retryable errors](/encyclopedia/retry-policies#non-retryable-errors) to determine whether to retry the Activity or Workflow.
Activities and Workflows can also avoid retrying by setting an Application Failure's `non_retryable` flag to `true`.

When a Nexus Operation handler throws an Application Failure, it is retried by default using a built-in Retry Policy that cannot be customized.
Nexus Operation handlers can avoid retrying by setting an Application Failure's `non_retryable` flag to `true`.
When a non-retryable error is returned from a Nexus handler, the overall Nexus Operation Execution is failed and the error is returned to the caller’s Workflow Execution as a Nexus Operation Failure.
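The retry decision described above, sketched as a single function. `ApplicationFailure` and `should_retry` are hypothetical stand-ins; a real Retry Policy also applies attempt and timeout limits that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ApplicationFailure:
    # Hypothetical, simplified stand-in for an Application Failure.
    type: str
    non_retryable: bool = False


def should_retry(failure: ApplicationFailure,
                 non_retryable_types: Iterable[str] = (),
                 nexus_handler: bool = False) -> bool:
    if failure.non_retryable:
        # The flag stops retries for Activities, Workflows, and Nexus handlers.
        return False
    if nexus_handler:
        # Nexus handlers use a built-in Retry Policy that can't be customized,
        # so there is no list of non-retryable types to match against.
        return True
    # Activities and Workflows: match `type` against the Retry Policy's list.
    return failure.type not in non_retryable_types
```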

### Setting the Next Retry Delay {#next-retry-delay}

Activities and Workflows can set the Next Retry Delay for a given Application Failure, to tell the server to wait that amount of time before trying the Activity or Workflow again.
This will override whatever the Retry Policy would have computed for your specific exception.

Nexus Operations do not currently allow the handler to customize the Next Retry Delay for a given Application Failure.

Java: [NextRetryDelay](/develop/java/failure-detection#next-retry-delay)
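A sketch of how an explicit Next Retry Delay takes precedence over the computed interval for Activities and Workflows. The fallback uses a simplified exponential backoff (`initial * coefficient ** (attempt - 1)`, capped at `maximum`); `retry_delay` and its default parameter values are assumptions for illustration. Nexus handlers can't set this delay, so the sketch doesn't model them.

```python
from datetime import timedelta
from typing import Optional


def retry_delay(attempt: int,
                initial: timedelta = timedelta(seconds=1),
                coefficient: float = 2.0,
                maximum: timedelta = timedelta(seconds=100),
                next_retry_delay: Optional[timedelta] = None) -> timedelta:
    """Delay before retrying an Activity or Workflow (simplified model)."""
    if next_retry_delay is not None:
        # An explicit Next Retry Delay overrides the computed interval.
        return next_retry_delay
    # Otherwise use the exponential backoff, capped at the maximum interval.
    computed = initial * (coefficient ** (attempt - 1))
    return min(computed, maximum)
```

For example, `retry_delay(3)` is 4 seconds, while `retry_delay(3, next_retry_delay=timedelta(seconds=30))` is 30 seconds.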

### Nexus errors {#nexus-errors}

#### Default mapping

By default, Application Failures thrown from a Nexus Operation handler are mapped to the following Nexus errors, based on the value of `non_retryable`:

| non_retryable | Nexus error | HTTP status code |
| :-------------- | :------------------------- | :------------------------ |
| false (default) | HandlerErrorTypeInternal | 500 Internal Server Error |
| true | UnsuccessfulOperationError | 424 Failed Dependency |
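The default mapping in the table can be sketched as a function. `default_nexus_error` is a hypothetical helper, and the status codes use Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus
from typing import Tuple


def default_nexus_error(non_retryable: bool) -> Tuple[str, HTTPStatus]:
    """Map an Application Failure's non_retryable flag to the default Nexus error."""
    if non_retryable:
        return "UnsuccessfulOperationError", HTTPStatus.FAILED_DEPENDENCY  # 424
    return "HandlerErrorTypeInternal", HTTPStatus.INTERNAL_SERVER_ERROR  # 500
```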

#### Use Nexus Errors directly

For improved semantics and HTTP status code mapping for external Nexus callers (when supported), we recommend that Nexus Operation handlers throw a Nexus Error directly.
The Nexus error types and their retry semantics are listed below.

For example, the Nexus Go SDK provides:

- `nexus.HandlerError(nexus.HandlerErrorType, msg)`
- `nexus.UnsuccessfulOperationError{state, failure}`

#### Retryable Nexus errors

| Nexus error type | non_retryable |
| :-------------------------------- | :------------ |
| HandlerErrorTypeResourceExhausted | false |
| HandlerErrorTypeInternal | false |
| HandlerErrorTypeNotImplemented | false |
| HandlerErrorTypeUnavailable | false |

#### Non-retryable Nexus errors

| Nexus error type | non_retryable |
| :------------------------------ | :------------ |
| HandlerErrorTypeBadRequest | true |
| HandlerErrorTypeUnauthenticated | true |
| HandlerErrorTypeUnauthorized | true |
| HandlerErrorTypeNotFound | true |
| UnsuccessfulOperationError | true |
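The two tables can be sketched as a lookup. `is_retryable` is a hypothetical helper that only knows the error types listed here.

```python
RETRYABLE_NEXUS_ERRORS = frozenset({
    "HandlerErrorTypeResourceExhausted",
    "HandlerErrorTypeInternal",
    "HandlerErrorTypeNotImplemented",
    "HandlerErrorTypeUnavailable",
})

NON_RETRYABLE_NEXUS_ERRORS = frozenset({
    "HandlerErrorTypeBadRequest",
    "HandlerErrorTypeUnauthenticated",
    "HandlerErrorTypeUnauthorized",
    "HandlerErrorTypeNotFound",
    "UnsuccessfulOperationError",
})


def is_retryable(error_type: str) -> bool:
    """Return whether a Nexus error type from the tables above is retried."""
    if error_type in RETRYABLE_NEXUS_ERRORS:
        return True
    if error_type in NON_RETRYABLE_NEXUS_ERRORS:
        return False
    raise ValueError(f"not listed in the tables: {error_type}")
```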

## Cancelled Failure

When [Cancellation](/activities#cancellation) of a Workflow, Activity or Nexus Operation is requested, SDKs represent the cancellation to the user in language-specific ways.
For example, in TypeScript, in some cases a Cancelled Failure is thrown directly by a Workflow API function, and in other cases the Cancelled Failure is wrapped in a different Failure.
To check both types of cases, TypeScript has the [isCancellation](https://typescript.temporal.io/api/namespaces/workflow#iscancellation) helper.
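A sketch of the kind of check that TypeScript's `isCancellation` helper performs, assuming a simplified failure hierarchy where each failure has an optional `cause`. The classes and `is_cancellation` below are hypothetical and do not reproduce the TypeScript implementation; they only check the thrown error and its direct `cause`.

```python
from typing import Optional


class Failure(Exception):
    """Hypothetical base failure with an optional `cause`."""

    def __init__(self, message: str, cause: Optional["Failure"] = None):
        super().__init__(message)
        self.cause = cause


class CancelledFailure(Failure):
    pass


class NexusOperationFailure(Failure):
    pass


def is_cancellation(err: BaseException) -> bool:
    """True if err is a Cancelled Failure or directly wraps one."""
    if isinstance(err, CancelledFailure):
        return True
    return isinstance(err, Failure) and isinstance(err.cause, CancelledFailure)
```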

{/* TODO also link to Workflow Cancellation concept */}

When a Workflow, Activity or Nexus Operation is successfully Cancelled, a Cancelled Failure is the `cause` field of the Activity Failure, Nexus Operation Failure or "Workflow failed" error.

- TypeScript: [CancelledFailure](https://typescript.temporal.io/api/classes/common.CancelledFailure)
- Java: [CanceledFailure](https://www.javadoc.io/doc/io.temporal/temporal-sdk/latest/io/temporal/failure/CanceledFailure.html)
@@ -122,6 +226,29 @@ For example, if an Activity Execution times out, the `cause` is a [Timeout Failu
- Python: [ActivityError](https://python.temporal.io/temporalio.exceptions.ActivityError.html)
- Proto: [ActivityFailureInfo](https://api-docs.temporal.io/#temporal.api.failure.v1.ActivityFailureInfo) and [Failure](https://api-docs.temporal.io/#temporal.api.failure.v1.Failure)

## Nexus Operation Failure

A Nexus Operation Failure is delivered to the Workflow Execution when a Nexus Operation fails.
It contains information about the failure and the Nexus Operation Execution; for example, the Nexus Operation name and Nexus Operation ID.
The reason for the failure is in the message and cause (typically an Application Error or a Canceled Error).

- Go: NexusOperationError
- Proto: NexusOperationFailureInfo

A Nexus Operation Failure includes the following fields:

- Endpoint is set to the name of the endpoint.
- Service is set to the name of the service.
- Operation is set to the name of the operation.
- Operation_id is set to the ID of the operation, if this is an async operation.
- Scheduled_event_id is set to the ID of the caller’s Event that scheduled the operation.
- Message is set to a generic unsuccessful error message.
- Cause is set to the underlying Application Failure with the following fields:
  - Non_retryable is set to true.
  - Type is set to the error's type name.
  - Message is set to the error message.
  - Nexus_error_code is set to the underlying Nexus error code.
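A sketch of the fields listed above, assuming a simplified shape. `NexusOperationFailure`, `ApplicationFailure`, `to_nexus_operation_failure`, the `GENERIC_MESSAGE` text, and the `"BAD_REQUEST"` error code in the usage line are hypothetical; the real proto and SDK types differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder for the generic unsuccessful error message.
GENERIC_MESSAGE = "nexus operation completed unsuccessfully"


@dataclass
class ApplicationFailure:
    type: str
    message: str
    non_retryable: bool
    nexus_error_code: str


@dataclass
class NexusOperationFailure:
    endpoint: str
    service: str
    operation: str
    operation_id: Optional[str]  # only set for async operations
    scheduled_event_id: int      # caller's Event that scheduled the operation
    message: str
    cause: ApplicationFailure


def to_nexus_operation_failure(endpoint: str, service: str, operation: str,
                               scheduled_event_id: int, err: Exception,
                               nexus_error_code: str,
                               operation_id: Optional[str] = None) -> NexusOperationFailure:
    """Assemble the fields listed above from a handler error (illustrative)."""
    cause = ApplicationFailure(
        type=type(err).__name__,
        message=str(err),
        non_retryable=True,
        nexus_error_code=nexus_error_code,
    )
    return NexusOperationFailure(endpoint, service, operation, operation_id,
                                 scheduled_event_id, GENERIC_MESSAGE, cause)


failure = to_nexus_operation_failure(
    "myendpoint", "my-hello-service", "echo", scheduled_event_id=5,
    err=ValueError("bad name"), nexus_error_code="BAD_REQUEST",  # hypothetical code
)
```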

## Child Workflow Failure

A Child Workflow Failure is delivered to the Workflow Execution when a Child Workflow Execution fails.