Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sequencer): rewrite check_tx to be more efficient and fix regression #1515

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

Lilyjjo
Copy link
Contributor

@Lilyjjo Lilyjjo commented Sep 18, 2024

Summary

Previously handle_check_tx()'s processing of CheckTx::Recheck would always run all stateless checks and attempt insertion. This PR modified the function to only run the checks on transactions that have not been removed or are not contained inside of the app side mempool.

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

This PR also restores the original behavior of CheckTX to not kick out transaction needlessly on rechecks, this regression was added when upgrading the mempool in #1323.

Background

A lot of the functionality from the original handle_check_tx() has moved into the app side mempool itself, including the balance and nonce checks. All of the remaining checks will not change between rechecks and only need to be ran when considering a transaction for insertion into the appside mempool.

Testing

Ran locally and added unit tests to ensure future nonces can be added and rechecks do not remove transactions needlessly.

Metrics

  • Deleted check_tx_removed_stale_nonce
  • Added check_tx_duration_seconds_check_tracked, mempool_tx_logic_error
  • Changed check_tx_duration_seconds_check_nonce to check_tx_duration_seconds_fetch_nonce

Related Issues

closes #1481

@github-actions github-actions bot added the sequencer pertaining to the astria-sequencer crate label Sep 18, 2024
@Lilyjjo Lilyjjo changed the title separate out checks sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title sequencer, mempool: rewrite check_tx to be more efficient chore: sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title chore: sequencer, mempool: rewrite check_tx to be more efficient chore: sequencer: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title chore: sequencer: rewrite check_tx to be more efficient fix(sequencer): rewrite check_tx to be more efficient and fix regression Sep 18, 2024
@Lilyjjo Lilyjjo force-pushed the lilyjjo/check-tx-rewrite branch 3 times, most recently from 577d0c4 to 0de80ec Compare September 24, 2024 18:19
@Lilyjjo Lilyjjo marked this pull request as ready for review September 24, 2024 19:17
@Lilyjjo Lilyjjo requested a review from a team as a code owner September 24, 2024 19:17
parked.add(
timemarked_tx,
match parked.add(
timemarked_tx.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can do let id = timemarked_tx.id() before this, then use that later on in the function and remove the clone()

Comment on lines 153 to 155
if let Some(rsp) = insert_into_mempool(mempool, &state, signed_tx, metrics).await {
return rsp;
}
Copy link
Collaborator

@noot noot Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function returning an Option that's only set if there's an error response is somewhat confusing, would prefer for this function to return a Result<(), response::CheckTx> so the error variant is the error response

same for check_removed_comet_bft

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it, feels much better that way

Comment on lines 259 to 238
/// Returns a [`response::CheckTx`] if the transaction fails any of the checks.
/// Otherwise, it returns the [`SignedTransaction`] to be inserted into the mempool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a Result<SignedTransaction, response::CheckTx> as this function's return

@noot
Copy link
Collaborator

noot commented Sep 26, 2024

I think this PR looks good, just a few smaller comments!

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

@Lilyjjo
Copy link
Contributor Author

Lilyjjo commented Sep 26, 2024

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

Hmmm. I think I misheard why we wanted to allow rechecks to add to the appside mempool. I checked out the CometBFT code and didn't see it persisting the transactions anywhere that was accessible, (I wonder if I missed it). In the scenario where CometBFT doesn't persist, then the allowing recheck would only be useful if CometBFT doesn't crash if the sequencer app crashses. I'll do some more research

crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 1, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 1, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will follow @Fraser999 approval, but I have made some comments here and there.

I want to especially call attention to the bit which is acquiring a write-lock in a loop: can you leave a comment there why that's acceptable?

}

#[tokio::test]
async fn receck_adds_non_tracked_tx() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn receck_adds_non_tracked_tx() {
async fn recheck_adds_non_tracked_tx() {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a spelling issue here.

};

#[tokio::test]
async fn future_nonce_ok() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe?

Suggested change
async fn future_nonce_ok() {
async fn future_nonces_are_accepted() {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a more descriptive test description, like what I suggested or something else.

@@ -1,3 +1,6 @@
#[cfg(test)]
mod tests;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very mild nit: modules are usually sorted after imports. It's odd that rustfmt does not do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually move this below the imports please. We are following the official Rust style guide: https://doc.rust-lang.org/stable/style-guide/items.html

@@ -109,6 +109,10 @@ impl TimemarkedTransaction {
pub(super) fn cost(&self) -> &HashMap<IbcPrefixed, u128> {
&self.cost
}

pub(super) fn id(&self) -> [u8; 32] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this using our new TransactionId?

That's something for a followup, but we should use that newtype instead of the inner array.

Also (@Fraser999 effort because references are 8 bytes on the stack whereas this array is 32 bytes on the stack):

Suggested change
pub(super) fn id(&self) -> [u8; 32] {
pub(super) fn id(&self) -> &[u8; 32] {

crates/astria-sequencer/src/mempool/mod.rs Show resolved Hide resolved
@Lilyjjo Lilyjjo enabled auto-merge October 2, 2024 17:39
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@Lilyjjo Lilyjjo changed the title fix(sequencer): rewrite check_tx to be more efficient and fix regression fix(sequencer)!: rewrite check_tx to be more efficient and fix regression Oct 2, 2024
@Lilyjjo Lilyjjo changed the title fix(sequencer)!: rewrite check_tx to be more efficient and fix regression fix(sequencer): rewrite check_tx to be more efficient and fix regression Oct 2, 2024
Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm blocking this so I can have a second pass over this.

Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone over the PR and left some comments. Reading the PR description this patch attempts to tackle 3 things at the same time:

  1. avoid rerunning expensive checks that have already been performed once. I believe this new functionality is covered by the changes in crate::service::mempool. It is enabled by the functions check_removed_comet_bft and is_tracked. The is_tracked makes use of the new contained_txs hashset. Question: is this an optimization to avoid using pending and parked mempool fields? Clearly the mempool needed to track the transactions contained therein before - so what's the reason to have an extra index of transactions? It would be nice to see that explained in the patch description. It would furthermore be useful to understand how this new index and the previous caches will be kept in sync.

  2. fix a regression for CheckTxKind::Recheck. Grepping this PR I only find recheck mention in tests. Looking at the fn handle_check_tx function the request::CheckTx is immediately deconstructed and the CheckTxKind::Recheck never actively considered. It hence looks to me as if we don't discriminate between checks of new transactions and rechecked of transactions existing in the mempool. Given that this is a regression called out in both the patch title and text, it would be nice to have this explained.

  3. introduce a way to maintain state on sequencer restarts. This part seems to be completely orthogonal to the goals 1. and 2. All related code should be removed from this patch and moved into a new PR. Having said that, it seems that the index of checked transactions (the hashset mentioned in the PR description) was introduced to enable the main goal of the PR (namely skipping unnecessary checks summarized under 1.). Enabling restart logic thus seems to be a useful side effect? Either way, it is confusing to the reviewer in how far this should be made part of the patch, and why it wasn't moved to a new patch. The patch description should be amended accordingly and better explained.


The semantics of adding or removing transactions from the mempool-tracked txs is weird. Looking at the code that makes use of add_from_contained_txs and remove_from_contained_txs it looks like you would be better served by creating a custom guard like this (I have intentionally left out the metrics stuff to keep the example minimal; add them as necessary):

struct TxLock<'a> {
    mempool: &'a Mempool,
    txs: RwLockWriteGuard<'a, HashSet<[u8; 32]>>,
}

impl<'a> TxLock<'a> {
    fn add(&mut self, id: [u8; 32]) {
        self.tx.insert(id);
    }
    fn remove(&mut self, id: [u8; 32]) {
        self.tx.remove(id);
    }
}

impl Mempool {
    async fn lock_txs(&self) -> TxLock<'_> {
        TxLock {
            mempool: self,
            txs: mempool.contained_txs.write().await,
        }
    }
}

crates/astria-sequencer/src/mempool/mod.rs Show resolved Hide resolved

/// Removes a transaction from the mempool's tracked transactions.
/// Will increment logic error metrics and log error if transaction is not present.
fn remove_from_contained_txs(&self, tx_hash: [u8; 32], contained_txs: &mut HashSet<[u8; 32]>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My arguments for the mempool bits above also apply here: this is not removing a transactions from the mempool tracked transactions. It's mutating an argument. This method seems to be a vehicle to access the metrics.

Change or remove please.

}

#[tokio::test]
async fn receck_adds_non_tracked_tx() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a spelling issue here.

@@ -1,3 +1,6 @@
#[cfg(test)]
mod tests;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually move this below the imports please. We are following the official Rust style guide: https://doc.rust-lang.org/stable/style-guide/items.html


// tx is valid, push to mempool with current state
// generate address for the signed transaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This address construction is not necessary I believe (but fixing it is the scope of this PR).

Please remove this comment and create a followup issue to just us the address bytes to fetch all relevant information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked: #1620

}
RemovalReason::LowerNonceInvalidated => {
return Err(response::CheckTx {
code: Code::Err(AbciErrorCode::LOWER_NONCE_INVALIDATED.value()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent behind info was to give a human readable description of the abci code. So please use AbciErrorCode::LOWER_NONCE_INVALIDATED.to_string() or, if you need a more granular reason, add a new abci code.

Alternatively, provide more background information to the log line.

info: "transaction removed from app mempool due to lower nonce being \
invalidated"
.into(),
log: "Transaction removed from app mempool due to lower nonce being \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower case. Also, log and info seem to provide the same information. That doesn't seem useful.

return Err(response::CheckTx {
code: Code::Err(AbciErrorCode::INVALID_NONCE.value()),
info: "transaction removed from app mempool due to stale nonce".into(),
log: "Transaction from app mempool due to stale nonce".into(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower case - also info and log seem to have the same text. Doesn't seem useful.

if let Err(error) =
pending.add(ttx, current_account_nonce, &current_account_balances)
{
// remove from tracked
// note: this branch is not expected to be hit so grabbing the lock inside
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convention is to use upper case XXX, TODO, FIXME, NOTE etc for comments like this.

Suggested change
// note: this branch is not expected to be hit so grabbing the lock inside
// NOTE: this branch is not expected to be hit so grabbing the lock inside

if let Err(error) =
pending.add(ttx, current_account_nonce, &current_account_balances)
{
// remove from tracked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If code doing something as simple as this needs to be commented, then its either not self evident (leave a comment tagged with XXX or NOTE, see below), or adjust the called methods to make it evident.

tx_hash: [u8; 32],
mempool: &AppMempool,
metrics: &Metrics,
) -> Result<(), response::CheckTx> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just return a RemovalReason here and instead implement a new trait IntoCheckTxResponse on it so that you don't need to do the conversion inline.

state: &S,
signed_tx: SignedTransaction,
metrics: &'static Metrics,
) -> Result<(), response::CheckTx> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noted it for RemovalReason, but it also applies here: all the error-to-response conversions could be made a lot neater by adding a trait IntoCheckTxResponse and moving the conversions there instead of having them inlined in the function body.

@@ -18,6 +21,7 @@ use astria_core::{
},
};
use astria_eyre::eyre::WrapErr as _;
use bytes::Bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this import used anywhere here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sequencer pertaining to the astria-sequencer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sequencer/mempool: CheckTX fails all rechecks
4 participants