-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor MDL BF aborting through ha_abort_transaction() #307
base: 10.6
Are you sure you want to change the base?
Conversation
{ | ||
victim_trx->lock.set_wsrep_victim(); | ||
} | ||
victim_trx->mutex_unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the point where:
Can victim transaction move from ACTIVE to COMMITTED_IN_MEMORY so that flag is still there ? or it is cleared during commit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transaction state is changed to TRX_STATE_COMMITTED_IN_MEMORY
in trx_t::commit_state()
. This is protected by TMTrxGuard
and happens before the end of trx_t::commit_in_memory()
where the lock.was_chose_as_deadlock_victim
is set to false.
If the lock.was_chosen_as_deadlock_victim
is set just before commit, the commit can still proceed as the flag is not checked in commit code path, and the flag is cleared at the end of trx_t::commit_in_memory()
. With this fix the flag cannot be set to true after the transaction has passed trx_t::commit_in_memory()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add comment in code which clarifies this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After code reading I can see that you are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one more possible race when autocommit non-locking transaction changes its state to TRX_STATE_NOT_STARTED
in commit_in_memory()
. Not sure what to do about it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have one question but overall this looks very promising and better what we have now.
|
BF aborting through ha_abort_transaction() has proven to be problematic. The InnoDB implementation of wsrep_abort_transaction is racy with respect to THD/trx object access because THD mutexes cannot be held over whole BF abort process to avoid deadlocks when wsrep_abort_transaction calls back to server. Change the logic for BF aborts which originate from MDL in the following way: * First check if the BF abort should progress by calling wsrep-lib bf_abort() which checks the wsrep transaction and Galera side transaction states. * If the BF abort should progress, call ha_abort_transaction() without releasing LOCK_thd_data in between. * Remove call back to server in InnoDB wsrep_abort_transaction(). This makes the wsrep transaction state check and InnoDB transaction kill atomic, while simultaneously removing the need to call back to server from InnoDB. MTR test changes: - MDEV-25717: Introduce separate sync point in wsrep_abort_thd() to control thread execution during MDL induced BF abort. - galera_create_table_as_select: CTAS may now fail also with ER_QUERY_INTERRUPTED due to use of THD::awake().
0143cde
to
0d8dc5d
Compare
BF aborting through ha_abort_transaction() has proven to be
problematic. The InnoDB implementation of wsrep_abort_transaction
is racy with respect to THD/trx object access because THD mutexes
cannot be held over whole BF abort process to avoid deadlocks when
wsrep_abort_transaction calls back to server.
Change the logic for BF aborts which originate from MDL in the
following way:
wsrep-lib bf_abort() which checks the wsrep transaction
and Galera side transaction states.
without releasing LOCK_thd_data in between.
This makes the wsrep transaction state check and InnoDB
transaction kill atomic, while simultaneously removing the
need to call back to server from InnoDB.
TODO: Is THD::wsrep_aborter needed anymore? Always calling
wsrep-lib bf_abort() guards against multiple BF aborts.
Maybe still needed for InnoDB.
Change for MDEV-25717: Introduce separate sync point in
wsrep_abort_thd() to control thread execution during MDL
induced BF abort.
Change for galera_create_table_as_select: CTAS may now fail
also with ER_QUERY_INTERRUPTED due to use of THD::awake().