Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endpoint never reconnects to RabbitMQ after channel timeout #1194

Closed
TraGicCode opened this issue Jan 31, 2023 · 7 comments
Closed

Endpoint never reconnects to RabbitMQ after channel timeout #1194

TraGicCode opened this issue Jan 31, 2023 · 7 comments

Comments

@TraGicCode
Copy link

We are currently running into an issue and would like some assistance/clarity on how we can properly handle this. Currently, we have a handler that is calling an API. This api some crazy timeout ( over 50 minutes ) and we have no ability to customize tweak this via a cancellation token OR a custom timeout value. When this ThirdParty performs a deployment, the api in flight just hangs for the duration of the timeout. This is causing an issue because the RabbitMQ channel timeout ( as shown below ) kicks in.

Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0

Then what happens is, the connection to the queue is killed ( which means there is no consumer on the queue ) and our endpoint essentially runs like this forever and never reconnects which is a problem.

Is this because the API call is still waiting?
Can someone help me understand the correct way or a good way in which we can handle this?

**NOTE: I suppose it's possible our endpoint will reconnect after the api timeout occurs but usually our alarms go off by then and someone has manually restarted the endpoint. **

NServiceBus Version:
7.5.0
NServiceBus.RabbitMQ Version:
6.1.0
RabbitMQ.Client Version:
6.2.2
RabbitMQ Server Version:
3.9.11

@awright18
Copy link
Contributor

After looking at the recent releases, I noticed that version 6.1.3 fixed a bug that may be related to this issue. However, there have been a few other patches since that release, however, version 6.1.5 is the latest of the 6.* releases. Can you try that and let me know if it solves your issue?

Out of curiosity what is the type of API you are calling that won't let you set a timeout? That seems like an interesting challenge.

@TraGicCode
Copy link
Author

salesforce-marketingcloud/FuelSDK-CSharp#104

I created an issue and a pr a while back but their team said it's not a priority for them which is annoying....

@awright18
Copy link
Contributor

That's too bad looks like it hasn't had any significant changes in a while.

@TraGicCode
Copy link
Author

TraGicCode commented Feb 1, 2023

Hey @awright18 ,

It looks like upgrading the the latest patch version fixes the issue! It appears that once i upgrade the CircuitBreaker is actually kicking in and reconnecting to the broker. The version i posted the circuit breaker never triggers and the connection is never re-established.

CC: @bording

@TraGicCode TraGicCode reopened this Feb 1, 2023
@TraGicCode
Copy link
Author

TraGicCode commented Feb 1, 2023

One last question before i close this out, I'm trying to make sure i understand the behavior of what I'm seeing locally.

When i put a thread.sleep for 10 minute in a handler, and then set a very low delivery ack timeout in rabbitmq, it appears the circuit breaker starts arming and disarming pretty much constantly ( every 10 seconds ). Once the thread sleep finally finishes, the follow error message is shown

{ "application": "OrderConfirmation", "environment": "local", "time": "2023-02-01 04:46:50.4695", "level": "ERROR", "logger": "NServiceBus.Transport.RabbitMQ.MessagePump", "message": "Failed to acknowledge message 'c8eb9ee5-9d04-420f-9b5b-5eaf27025021' because the handler execution time exceeded the broker delivery acknowledgement timeout. Increase the length of the timeout on the broker. The message was returned to the
 queue.", "exception": { "type": "AlreadyClosedException", "message": "Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 10000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0", "method": "Void Transmit(RabbitMQ.Client.Im
pl.OutgoingCommand ByRef)", "stackTrace": "RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 10000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0\r\n   at RabbitMQ.Clie
nt.Impl.SessionBase.Transmit(OutgoingCommand& cmd)\r\n   at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, ReadOnlyMemory`1 body)\r\n   at RabbitMQ.Client.Framing.Impl.Model.BasicAck(UInt64 deliveryTag, Boolean multiple)\r\n   at RabbitMQ.Client.Impl.RecoveryAwareModel.BasicAck(UInt64 deliveryTag, Boolean multiple)\r\n   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicAck(UInt64 d
eliveryTag, Boolean multiple)\r\n   at NServiceBus.Transport.RabbitMQ.ModelExtensions.<>c.<BasicAckSingle>b__1_0(Object state)\r\n   at System.Threading.Tasks.Task.InnerInvoke()\r\n   at System.Threading.Tasks.Task.Execute()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerSer
vices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)\r\n   at NServiceBus.Transport.RabbitMQ.MessagePump.<Process>d__36.MoveNext()" } }

and the circuit breaker stops constantly tripping.

  1. Can someone explain if this is documented anywhere and/or the explanation on what is going on, why this is going on, and why its expected?
  2. How does this also affect the endpoint when concurrency is configured to process messages in parallel?

@bording
Copy link
Member

bording commented Feb 1, 2023

@TraGicCode That all sounds normal and expected when you configure a consumer ack timeout that is very low compared to how long it takes for a message to process.

When you exceed the ack timeout, the broker cancels the consumer. Once that has happened, there is no clean way for us to keep the channel and connection open and create a new consumer, so we consider that a reconnection scenario: link

That means with a 10 minute handler and a 10 second timeout, you are seeing the ack timeout trigger, the circuit breaker arming to tell you about the consumer being canceled, and then the connection being shut down and reconnected. This will happen every 10 seconds until the handler finishes.

Once the handler finally finishes, you see the error message we added to "Increase the length of the timeout on the broker".

Hitting the ack timeout is bad, and it does mean that all in-flight messages can't be successfully acked and are returned to the queue.

That's why it's very important to make sure your ack timeout is set to be large enough allow a message to be processed and acked successfully.

@TraGicCode
Copy link
Author

Thanks @bording ! I created a github issue to recommend documenting the behavior in this special case scenario.

Closing this issue now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants