
Quorum queue replica that was shut down can rejoin after a restart and queue deletion, re-declaration #12366

Open
luos opened this issue Sep 24, 2024 · 4 comments

@luos
Contributor

luos commented Sep 24, 2024

Describe the bug

Hi,

The issue below involves deleting and re-declaring a queue while a node is down, so most users will not be affected by it.

We've identified an issue with quorum queues where an out-of-date replica can come back as the leader and resend past log entries, causing the now-follower members to re-apply local effects, so the new consumer receives messages that were already processed.

This leads to duplicate message delivery, even though the messages were acknowledged properly and the queue processed the acks. The log is replayed in its entirety, meaning messages processed days ago can reappear.

The effect of it is similar to rabbitmq/ra#387.

In some scenarios this issue leaves the queue genuinely broken, but that is to be expected given the bad internal state.

We know that the proper solution is to not delete the queue, but ra should probably also have some built-in protection so that out-of-date members cannot rejoin the cluster, or at least cannot become leaders.

I think a potential solution would be to include a cluster ID in the pre_vote and request_vote_rpc messages. As far as I can tell, there is currently no shared cluster ID for ra clusters. There is a uid, but it belongs to the server, not the cluster.
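To illustrate the idea, here is a minimal sketch (plain Python, not ra's actual API; `PreVote`, `Member`, and `handle_pre_vote` are hypothetical names) of a pre-vote check that carries a shared cluster ID and refuses votes from a different incarnation of the cluster:

```python
# Hypothetical sketch of a cluster-ID check during pre-vote.
# All names here are illustrative; ra's real messages differ.
from dataclasses import dataclass

@dataclass(frozen=True)
class PreVote:
    cluster_id: str      # shared ID minted when the cluster is first created
    term: int
    last_log_index: int

@dataclass
class Member:
    cluster_id: str
    term: int

def handle_pre_vote(member: Member, req: PreVote) -> bool:
    # A replica left over from a deleted incarnation of the queue still
    # carries the old cluster_id, so its pre-vote is refused and it can
    # never regain leadership in the re-declared cluster.
    if req.cluster_id != member.cluster_id:
        return False
    return req.term >= member.term

old = Member(cluster_id="q-test-v1", term=5)
new = Member(cluster_id="q-test-v2", term=1)
# Pre-vote request from the stale, pre-deletion incarnation:
stale_req = PreVote(cluster_id="q-test-v1", term=6, last_log_index=100)
print(handle_pre_vote(new, stale_req))  # False: ID mismatch, vote denied
```

Even though the stale replica has a higher term and a longer log, the ID mismatch stops it from winning an election in the new cluster.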

Reproduction steps

  1. Use a three-node cluster.
  2. Connect to "rmq1" with a consumer.
  3. Create a quorum queue named "test" on "rmq1".
  4. Create a consumer on "rmq1" for queue "test".
  5. Publish a single message with a unique identifier (e.g. the current time).
  6. Acknowledge the message on the consumer.
  7. Shut down "rmq1"; the client is disconnected.
  8. The client reconnects to one of the remaining nodes ("rmq2").
  9. The client deletes and re-declares the queue.
  10. Create a consumer for queue "test".
  11. Restart the down node, "rmq1".
  12. The queue starts up on "rmq1" and notices that it is more up to date than the newly created replica on "rmq2"; "rmq1" becomes leader and the other nodes revert to followers.
  13. "rmq1" notices that the (newly created) followers are missing some indexes, so it:
    1. resends the append_entries for these log items
    2. logs "setting last index to 3, next_index 4 for…"
  14. The followers receive the entries from the log and apply them on top of the bad initial state, meaning they send out the message to the current local consumer.
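Steps 11 to 14 can be modelled with a toy simulation (plain Python, no RabbitMQ or Raft library; the election and catch-up rules are deliberately simplified) showing why the restarted stale replica wins and why the re-declared follower redelivers an already-acked message:

```python
# Toy model of the failure: the restarted stale replica has the longest
# log, wins the election, and back-fills the freshly re-declared follower,
# which replays old delivery effects to its current local consumer.

def elect_leader(replicas):
    # Raft grants leadership to the most up-to-date candidate; the stale
    # replica beats the empty, re-declared one on (term, log length).
    return max(replicas, key=lambda r: (r["term"], len(r["log"])))

def catch_up(leader, follower, consumer_inbox):
    # Simplified append_entries back-fill: the follower applies every
    # entry it lacks, replaying delivery effects to its local consumer.
    for entry in leader["log"][len(follower["log"]):]:
        follower["log"].append(entry)
        if entry[0] == "enqueue":
            consumer_inbox.append(entry[1])

rmq1 = {"name": "rmq1", "term": 2,
        "log": [("enqueue", "msg-2024-09-24T10:00"),
                ("ack", "msg-2024-09-24T10:00")]}
rmq2 = {"name": "rmq2", "term": 1, "log": []}  # queue deleted & re-declared

leader = elect_leader([rmq1, rmq2])
inbox = []
catch_up(leader, rmq2, inbox)
print(leader["name"], inbox)
# rmq1 ['msg-2024-09-24T10:00'] -- the already-acked message is redelivered
```

The real mechanism involves Raft effects rather than a literal inbox, but the shape of the problem is the same: the new incarnation's empty log makes it accept the stale leader's entire history.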

Expected behavior

One or all of the following: :-)

  • Deleted replicas cannot rejoin the ra cluster
  • Deleted replicas delete themselves
  • Deleted replicas cannot become leader

Additional context

I can share some traces or debug output, but I'm not sure they make sense without context.

Attached is the "restart sequence", nothing special.

restart.sh.txt

@luos luos added the bug label Sep 24, 2024
@kjnilsson
Contributor

You'd have to start recording each member's assigned "UId" in the queue record and base the recovery of the member on whether the current UId for the given cluster name matches or not.
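A rough sketch of that recovery check (hypothetical names and record shape, not RabbitMQ's actual schema): the queue record stores the UId assigned to each member, and on node recovery a local member is only restarted if its stored UId still matches the one in the current queue record.

```python
# Hypothetical recovery check: only restart a local member whose UId
# matches the one recorded for this node in the current queue record.
queue_record = {
    "name": "test",
    "members": {"rmq1": "uid-new", "rmq2": "uid-b"},
}

def should_recover(queue_record, node, local_uid):
    # A replica left over from a deleted incarnation of the queue holds a
    # UId the re-declared queue no longer references, so it is not started.
    return queue_record["members"].get(node) == local_uid

print(should_recover(queue_record, "rmq1", "uid-old"))  # False: stale replica
print(should_recover(queue_record, "rmq1", "uid-new"))  # True: current member
```

With such a check, the stale replica on the restarted node would simply never be booted, so it could not campaign at all.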

@kjnilsson
Contributor

Even so, you could reproduce a similar issue by partitioning a node, deleting and re-creating the queue on the majority side, then re-joining the partitioned node.

@luos
Contributor Author

luos commented Sep 24, 2024

I see; in the case you are proposing, it is more RabbitMQ's responsibility to recover (or not recover) the member if the uid changed.

I think it would be a bit more resilient if this were included in ra, i.e. pre_vote would check membership.

Though thinking about it more, both sides are needed, and probably more:

one mechanism for RabbitMQ to clean up (or not start) removed members on startup, an implementation in ra to prevent the partitioned node from becoming leader again, and another where RabbitMQ is notified when a member with an out-of-date uid shows up, so it can do the cleanup.

@kjnilsson
Contributor

The uids aren't exchanged in the Raft commands, so changing Ra would not be easy. It is better to put the responsibility for ensuring the right members are running on the system running them, if at all possible.

@michaelklishin michaelklishin changed the title Quorum queue replica on down node can rejoin after queue deletion Quorum queue replica that was shut down can rejoin after a restart and queue deletion, re-declaration Sep 24, 2024