-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Centralize exception handling and fix behavior for RequestTimeoutException #3063
Conversation
…ption Signed-off-by: Chase Engelbrecht <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>
List<OpenTelemetryLog> logs; | ||
|
||
if (Context.current().isCancelled()) { | ||
throw new RequestCancelledException("Cancelled by client"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QUES: Is there reason to favor throwing exception rather than responseObserver.onError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the request times out at the Armeria level, the request is terminated before the responseObserver has a chance to be populated. I wanted to centralize the logic for exception handling + metrics as well as cover all cases with a single pattern. Throwing the exception and then handling it in the GrpcRequestExceptionHandler allows for that.
The other caveat is that there is an edge case where we end up emitting both a requestTimeout metric and a badRequest/second requestTimeout/internalServerException/successRequest metric if we only handle the Armeria timeouts and continue to have the original metric/error handling here. This happens if we start serving the request before it times out but it times out before the request is fully served.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I did not realize it is captured by the exception handler.
} | ||
|
||
LOG.error("Failed to write the request of size {} due to:", request.toString().length(), e); | ||
throw new BufferWriteException(e.getMessage(), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question as above
Objects.requireNonNull(message); | ||
if (e instanceof IOException) { | ||
badRequestsCounter.increment(); | ||
return HttpResponse.of(HttpStatus.BAD_REQUEST, MediaType.ANY_TYPE, message); | ||
} else if (e instanceof TimeoutException) { | ||
} else if (e instanceof TimeoutException || e instanceof RequestTimeoutException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between TimeoutException
and RequestTimeoutException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeoutException is thrown by the BlockingBuffer, RequestTimeoutException is thrown by Armeria when the requestTimeout has expired
public class RequestExceptionHandler { | ||
public class HttpRequestExceptionHandler implements ExceptionHandlerFunction { | ||
static final String ARMERIA_REQUEST_TIMEOUT_MESSAGE = "Timeout waiting for request to be served. This is usually due to the buffer being full."; | ||
static final String DEFAULT_MESSAGE = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any particular reason this is empty? Should we have something generic like "Internal Server Error"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I just pulled this out into a constant from the original implementation. This empty string could be populated for non-5XX exceptions as well so I'm not sure if there's a good generic message. Open to suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you are saying this was the default behavior.
if (Context.current().isCancelled()) { | ||
requestTimeoutCounter.increment(); | ||
responseObserver.onError(Status.CANCELLED.withDescription("Cancelled by client").asRuntimeException()); | ||
if (ServiceRequestContext.current().isTimedOut()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the fix you were referring to here?
Requests that are queued in the backend service's BlockingTaskExecutor will still be executed even if the request has already timed out. This adds unnecessary load to the pipeline by trying to process requests that have already returned a 503 (now 408/429 depending on http vs grpc) to the client. Added a check that a request is not timed out before attempting to process it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
if (e instanceof RequestTimeoutException) { | ||
message = ARMERIA_REQUEST_TIMEOUT_MESSAGE; | ||
} else { | ||
message = e.getMessage() == null ? DEFAULT_MESSAGE : e.getMessage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This DEFAULT_MESSAGE
could possibly be improved since the handleExceptions
method already determines what type it is. So we can probably remove DEFAULT_MESSAGE
and derive it from the Status
.
Assuming that Status::toString
returns a useful string, this could be:
message = e.getMessage() == null ? status.toString() : e.getMessage();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one's a little trickier. The toString
method is
return MoreObjects.toStringHelper(this)
.add("code", code.name())
.add("description", description)
.add("cause", cause != null ? getStackTraceAsString(cause) : cause)
.toString();
The cause and description are null by default. We could add the whole exception as the cause but I don't think we want to return the stacktrace in the HTTP response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add something generic like Unexpected exception encountered: <exception class>
. Let me know what your thoughts are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just code.name()
instead of Status.toString()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, status.getCode.name()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, will do
public class RequestExceptionHandler { | ||
public class HttpRequestExceptionHandler implements ExceptionHandlerFunction { | ||
static final String ARMERIA_REQUEST_TIMEOUT_MESSAGE = "Timeout waiting for request to be served. This is usually due to the buffer being full."; | ||
static final String DEFAULT_MESSAGE = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you are saying this was the default behavior.
message = cause.getMessage() == null ? DEFAULT_MESSAGE : cause.getMessage(); | ||
} | ||
|
||
return handleException(cause, message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default message issue can be resolved here.
This private handleException
method could be refactored to return HttpStatus
. All the return values appear to generate the same HttpResponse
aside from the status.
Then this line here becomes:
HttpStatus status = handleException(cause, /*message no longer needed*/);
return HttpResponse.of(status, MediaType.ANY_TYPE, cause.getMessage() == null ? status.toString() : cause.getMessage());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will refactor
Signed-off-by: Chase Engelbrecht <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
When a request hits an Armeria server and cannot be processed within the request_timeout, a RequestTimeoutException occurs. Currently this case returns a 503 to the client. This PR fixes that by adding a custom exception handler to the push-based sources that properly handles RequestTimeoutExceptions.
As part of this work, I stumbled upon a couple other issues/quirks of the Armeria servers
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.