-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-29293 Refactor MDL BF abort logic #314
base: 10.5
Are you sure you want to change the base?
Conversation
The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data * applier has innodb side global lock mutex and victim trx mutex * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex * applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill The fix in this commit, removes the TOI replication of KILL command, and makes KILL execution less intrusive operation. Aborting of the victim happens now by using wsrep_abort_thd(), which is the same method as used for aborting victims of DDL execution. wsrep_thd_abort(), will start the victim aborting from inside innodb, holding the lock_sys mutex and victim trx mutex. Therefore the locking protocol is same as used in regular applier BF aborting procedure. wsrep_abort_thd will eventually call also THD::awake (as regular KILL would), and now awake is passed the user chosen kill signal, in case of KILL command execution. Applier BF aborting, otoh, will use KILL_QUERY_HARD signal. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill The fix is to reset wsrep client error state, before a THD is reused for next connetion * Releasing THD locks, in wsrep_abort_transaction, when locking innodb mutexes, this guarantees same locking order as with applier BF aborting * Handling BF aborting of idle victim of KILL QUERY (and lower signals) with background rollbacker. Kill signals higher than KILL_CONNECTION, otoh, will now skip background rollbacker treatment. This is because KILL_CONNECTION will wake up the victim so early, that victim execution may interfere with the rollbacker execution. * wsrep-lib is now using new branch: KILL_command, which has changed server_service::background_rollback() to return true/false depending on if the background rollbacking was started or not. * Avoiding to overwrite victim THD's error code to deadlock error, if aborting was due to manual KILL, this preserves the native error code for KILL victims
test_pr |
3 similar comments
test_pr |
test_pr |
test_pr |
@@ -121,7 +125,7 @@ CREATE TABLE t1 AS SELECT * FROM t2; | |||
UNLOCK TABLES; | |||
|
|||
--connection node_1 | |||
--error ER_TABLE_EXISTS_ERROR,ER_LOCK_DEADLOCK | |||
--error ER_TABLE_EXISTS_ERROR,ER_QUERY_INTERRUPTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ER_LOCK_DEADLOCK -> ER_QUERY_INTERRUPTED change comes from changed BF abort path for operations which don't have open wsrep transaction. This should be fixed for CTAS specifically by opening a wsrep transaction for CTAS operation before MDL wait.
test_pr |
f31f824
to
69067e0
Compare
@@ -29,14 +29,12 @@ extern "C" my_bool wsrep_on(const THD *thd) | |||
|
|||
extern "C" void wsrep_thd_LOCK(const THD *thd) | |||
{ | |||
mysql_mutex_lock(&thd->LOCK_thd_kill); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting LOCK_thd_kill
and LOCK_thd_data
to separate calls to allow BF aborting of SR transaction from InnoDB lock manager. BF aborting SR transaction grabs LOCK_thd_kill
in THD::reset_globals()
which causes
LOCK_thd_kill
to be locked twice if wsrep_thd_LOCK()
locks it too.
sql/wsrep_server_service.cc
Outdated
@@ -146,6 +146,7 @@ void Wsrep_server_service::release_high_priority_service(wsrep::high_priority_se | |||
void Wsrep_server_service::background_rollback(wsrep::client_state& client_state) | |||
{ | |||
Wsrep_client_state& cs= static_cast<Wsrep_client_state&>(client_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: No need for special handling for KILL
command here, KILL
does not go through wsrep-lib
in this PR.
69067e0
to
d4d2163
Compare
In this commit, the BF abort logic is refactored to eliminate the need to call back to server from InnoDB wsrep_abort_transaction(). This is to simplify the MDL BF abort codepath so that the direction is always only server -> InnoDB. The main change is in wsrep_abort_thd(), which first does the BF abort through wsrep-lib, and then continues to ha_abort_transaction() if the wsrep-lib BF abort was successful and no background rollback was started for victim. KILL command handling for wsrep threads was changed so that the thread to be killed is flagged first with wsrep_abort_by_kill to avoid calling storage engine ha_kill_query() from awake_no_mutex(). KILL for wsrep thread is executed by awake_no_mutex() combined with call to ha_abort_transaction(). The locking order for wsrep BF aborts is now: lock_sys.mutex trx.mutex LOCK_thd_kill LOCK_thd_data With this, BF aborts which originate from InnoDB lock manager don't have to release lock_sys.mutex and trx.mutex in between, which reduces chance for race conditions. In order to make the above locking order work for wsrep_abort_transaction(), the function records InnoDB transaction ID and releases victim LOCK_thd_kill and LOCK_thd_data. InnoDB lock_sys.mutex is then grabbed and the victim InnoDB trx object is looked up by InnoDB transaction ID. Locking/unlocking LOCK_thd_kill and LOCK_thd_data was reorganized so that it happens in the visible scope as much as possible. Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more fine grained locking for SR BF abort which may require locking of victim LOCK_thd_kill. Added explicit call for wsrep_thd_kill_LOCK/UNLOCK where appropriate. Wsrep-lib was updated to version which allows external locking for BF abort calls. Changes to MTR tests: - Disable galera_bf_abort_group_commit. This test is going to be removed (MDEV-30855). - Make galera_var_retry_autocommit result more readable by echoing cases and expectations into result. Only one expected result for reap to verify that server returns expected status for query. - Record galera_gcache_recover_manytrx as result file was incomplete. Trivial change. - Make galera_create_table_as_select more deterministic: Wait until CTAS execution has reached MDL wait for multi-master conflict case. Expected error from multi-master conflict is ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open wsrep transaction when it is waiting for MDL, query gets interrupted instead of BF aborted. This should be addressed in separate task. - Test galera_bf_abort_registering to check that registering trx gets BF aborted through MDL. Co-authored-by: Jan Lindström <[email protected]>
d4d2163
to
50ac97f
Compare
bf8f134
to
9628c07
Compare
When wsrep transaction has reached committing state, it must become immune for KILL. Accompanied MTR test can be used to reproduce assertion failure if the transaction is not immune. Implemented additional protection in wsrep_kill_thd() against KILL in committing states.
8248f73
to
fb990d4
Compare
If wsrep thread is on commit codepath when KILL happens, the kill_signal is saved and restored in wsrep_after_commit() so that it will be processed after commit is over.
bc4bc85
to
ad8d7b1
Compare
This changes the locking order to lock_sys.mutex LOCK_thd_kill LOCK_thd_data trx.mutex Order LOCK_thd_data -> trx.mutex is required by group commit. In order to avoid trx going out of scope, grap reference to victim before releasing the mutex.
ad8d7b1
to
f6d98e7
Compare
Postpone kill only if the thread is supposed to commit the transaction. For transactions which will roll back, keep the killed state. Added THD::wsrep_killed_state for debugging.
There is a cycle trx reference -> LOCK_commit_order -> LOCK_thd_data -> trx reference which may prevent the victim transaction committing while BF aborter is trying to lock LOCK_thd_data. Break this cycle by using wsrep_thd_TRYLOCK() and checking if the victim is already committed in memory if the lock cannot be taken. Locking will be retried until the lock gets taken or the victim is found committed in memory.
6ecf3fe
to
b1b4f11
Compare
The manipulation of the variable can be done solely on server side. Moreover, it is now debug only variable and could be excluded from optimized builds. Also fix THD::reset_killed() to check if wsrep is enabled for THD.
This is to ensure that killed state will be restored before statement execution is over.
In rare case of false negative, checking the counter would cause test failure-
In this commit, the BF abort logic is refactored to eliminate
the need to call back to server from InnoDB wsrep_abort_transaction().
This is to simplify the MDL BF abort codepath so that the direction
is always only server -> InnoDB. The main change is in wsrep_abort_thd(),
which first does the BF abort through wsrep-lib, and then continues to
ha_abort_transaction() if the wsrep-lib BF abort was successful and no
background rollback was started for victim.
KILL command handling for wsrep threads was changed so that the
thread to be killed is flagged first with wsrep_abort_by_kill to
avoid calling storage engine ha_kill_query() from awake_no_mutex().
KILL for wsrep thread is executed by awake_no_mutex() combined
with call to ha_abort_transaction().
The locking order for wsrep BF aborts is now:
lock_sys.mutex
trx.mutex
LOCK_thd_kill
LOCK_thd_data
With this, BF aborts which originate from InnoDB lock manager
don't have to release lock_sys.mutex and trx.mutex in between,
which reduces chance for race conditions.
In order to make the above locking order work for
wsrep_abort_transaction(), the function records InnoDB transaction
ID and releases victim LOCK_thd_kill and LOCK_thd_data. InnoDB
lock_sys.mutex is then grabbed and the victim InnoDB trx object
is looked up by InnoDB transaction ID.
Locking/unlocking LOCK_thd_kill and LOCK_thd_data was reorganized
so that it happens in the visible scope as much as possible.
Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.
Changes to MTR tests:
be removed (MDEV-30855).
cases and expectations into result. Only one expected result for
reap to verify that server returns expected status for query.
Trivial change.
Wait until CTAS execution has reached MDL wait for multi-master
conflict case. Expected error from multi-master conflict is
ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
wsrep transaction when it is waiting for MDL, query gets interrupted
instead of BF aborted. This should be addressed in separate task.
BF aborted through MDL.
Co-authored-by: Jan Lindström [email protected]