Endpoint never reconnects to RabbitMQ after channel timeout #1194

TraGicCode · 2023-01-31T18:32:59Z

We are currently running into an issue and would like some assistance/clarity on how we can properly handle this. Currently, we have a handler that is calling an API. This api some crazy timeout ( over 50 minutes ) and we have no ability to customize tweak this via a cancellation token OR a custom timeout value. When this ThirdParty performs a deployment, the api in flight just hangs for the duration of the timeout. This is causing an issue because the RabbitMQ channel timeout ( as shown below ) kicks in.

Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0

Then what happens is, the connection to the queue is killed ( which means there is no consumer on the queue ) and our endpoint essentially runs like this forever and never reconnects which is a problem.

Is this because the API call is still waiting?
Can someone help me understand the correct way or a good way in which we can handle this?

**NOTE: I suppose it's possible our endpoint will reconnect after the api timeout occurs but usually our alarms go off by then and someone has manually restarted the endpoint. **

NServiceBus Version:
7.5.0
NServiceBus.RabbitMQ Version:
6.1.0
RabbitMQ.Client Version:
6.2.2
RabbitMQ Server Version:
3.9.11

The text was updated successfully, but these errors were encountered:

awright18 · 2023-01-31T19:19:31Z

After looking at the recent releases, I noticed that version 6.1.3 fixed a bug that may be related to this issue. However, there have been a few other patches since that release, however, version 6.1.5 is the latest of the 6.* releases. Can you try that and let me know if it solves your issue?

Out of curiosity what is the type of API you are calling that won't let you set a timeout? That seems like an interesting challenge.

TraGicCode · 2023-01-31T20:08:19Z

salesforce-marketingcloud/FuelSDK-CSharp#104

I created an issue and a pr a while back but their team said it's not a priority for them which is annoying....

awright18 · 2023-01-31T20:50:18Z

That's too bad looks like it hasn't had any significant changes in a while.

TraGicCode · 2023-02-01T10:20:48Z

Hey @awright18 ,

It looks like upgrading the the latest patch version fixes the issue! It appears that once i upgrade the CircuitBreaker is actually kicking in and reconnecting to the broker. The version i posted the circuit breaker never triggers and the connection is never re-established.

CC: @bording

TraGicCode · 2023-02-01T10:48:37Z

One last question before i close this out, I'm trying to make sure i understand the behavior of what I'm seeing locally.

When i put a thread.sleep for 10 minute in a handler, and then set a very low delivery ack timeout in rabbitmq, it appears the circuit breaker starts arming and disarming pretty much constantly ( every 10 seconds ). Once the thread sleep finally finishes, the follow error message is shown

{ "application": "OrderConfirmation", "environment": "local", "time": "2023-02-01 04:46:50.4695", "level": "ERROR", "logger": "NServiceBus.Transport.RabbitMQ.MessagePump", "message": "Failed to acknowledge message 'c8eb9ee5-9d04-420f-9b5b-5eaf27025021' because the handler execution time exceeded the broker delivery acknowledgement timeout. Increase the length of the timeout on the broker. The message was returned to the
 queue.", "exception": { "type": "AlreadyClosedException", "message": "Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 10000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0", "method": "Void Transmit(RabbitMQ.Client.Im
pl.OutgoingCommand ByRef)", "stackTrace": "RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=406, text='PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 10000 ms. This timeout value can be configured, see consumers doc guide to learn more', classId=0, methodId=0\r\n   at RabbitMQ.Clie
nt.Impl.SessionBase.Transmit(OutgoingCommand& cmd)\r\n   at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, ReadOnlyMemory`1 body)\r\n   at RabbitMQ.Client.Framing.Impl.Model.BasicAck(UInt64 deliveryTag, Boolean multiple)\r\n   at RabbitMQ.Client.Impl.RecoveryAwareModel.BasicAck(UInt64 deliveryTag, Boolean multiple)\r\n   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicAck(UInt64 d
eliveryTag, Boolean multiple)\r\n   at NServiceBus.Transport.RabbitMQ.ModelExtensions.<>c.<BasicAckSingle>b__1_0(Object state)\r\n   at System.Threading.Tasks.Task.InnerInvoke()\r\n   at System.Threading.Tasks.Task.Execute()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerSer
vices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)\r\n   at NServiceBus.Transport.RabbitMQ.MessagePump.<Process>d__36.MoveNext()" } }

and the circuit breaker stops constantly tripping.

Can someone explain if this is documented anywhere and/or the explanation on what is going on, why this is going on, and why its expected?
How does this also affect the endpoint when concurrency is configured to process messages in parallel?

bording · 2023-02-01T15:13:14Z

@TraGicCode That all sounds normal and expected when you configure a consumer ack timeout that is very low compared to how long it takes for a message to process.

When you exceed the ack timeout, the broker cancels the consumer. Once that has happened, there is no clean way for us to keep the channel and connection open and create a new consumer, so we consider that a reconnection scenario: link

That means with a 10 minute handler and a 10 second timeout, you are seeing the ack timeout trigger, the circuit breaker arming to tell you about the consumer being canceled, and then the connection being shut down and reconnected. This will happen every 10 seconds until the handler finishes.

Once the handler finally finishes, you see the error message we added to "Increase the length of the timeout on the broker".

Hitting the ack timeout is bad, and it does mean that all in-flight messages can't be successfully acked and are returned to the queue.

That's why it's very important to make sure your ack timeout is set to be large enough allow a message to be processed and acked successfully.

TraGicCode · 2023-02-01T15:29:01Z

Thanks @bording ! I created a github issue to recommend documenting the behavior in this special case scenario.

Closing this issue now!

TraGicCode closed this as completed Feb 1, 2023

TraGicCode reopened this Feb 1, 2023

TraGicCode closed this as completed Feb 1, 2023

TraGicCode mentioned this issue Feb 1, 2023

Document behavior of RabbitMQ During delivery acknowledgement timeout Particular/docs.particular.net#5993

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Endpoint never reconnects to RabbitMQ after channel timeout #1194

Endpoint never reconnects to RabbitMQ after channel timeout #1194

TraGicCode commented Jan 31, 2023

awright18 commented Jan 31, 2023

TraGicCode commented Jan 31, 2023

awright18 commented Jan 31, 2023

TraGicCode commented Feb 1, 2023 •

edited

Loading

TraGicCode commented Feb 1, 2023 •

edited

Loading

bording commented Feb 1, 2023

TraGicCode commented Feb 1, 2023

Endpoint never reconnects to RabbitMQ after channel timeout #1194

Endpoint never reconnects to RabbitMQ after channel timeout #1194

Comments

TraGicCode commented Jan 31, 2023

awright18 commented Jan 31, 2023

TraGicCode commented Jan 31, 2023

awright18 commented Jan 31, 2023

TraGicCode commented Feb 1, 2023 • edited Loading

TraGicCode commented Feb 1, 2023 • edited Loading

bording commented Feb 1, 2023

TraGicCode commented Feb 1, 2023

TraGicCode commented Feb 1, 2023 •

edited

Loading

TraGicCode commented Feb 1, 2023 •

edited

Loading