fix(sequencer): rewrite check_tx to be more efficient and fix regression #1515

Lilyjjo · 2024-09-18T16:14:01Z

Summary

Previously handle_check_tx()'s processing of CheckTx::Recheck would always run all stateless checks and attempt insertion. This PR modified the function to only run the checks on transactions that have not been removed or are not contained inside of the app side mempool.

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

This PR also restores the original behavior of CheckTX to not kick out transaction needlessly on rechecks, this regression was added when upgrading the mempool in #1323.

Background

A lot of the functionality from the original handle_check_tx() has moved into the app side mempool itself, including the balance and nonce checks. All of the remaining checks will not change between rechecks and only need to be ran when considering a transaction for insertion into the appside mempool.

Testing

Ran locally and added unit tests to ensure future nonces can be added and rechecks do not remove transactions needlessly.

Metrics

Deleted check_tx_removed_stale_nonce
Added check_tx_duration_seconds_check_tracked, mempool_tx_logic_error
Changed check_tx_duration_seconds_check_nonce to check_tx_duration_seconds_fetch_nonce

Related Issues

closes #1481

noot · 2024-09-26T20:17:05Z

crates/astria-sequencer/src/mempool/mod.rs

-                parked.add(
-                    timemarked_tx,
+                match parked.add(
+                    timemarked_tx.clone(),


i think you can do let id = timemarked_tx.id() before this, then use that later on in the function and remove the clone()

noot · 2024-09-26T20:24:53Z

crates/astria-sequencer/src/service/mempool/mod.rs

+    if let Some(rsp) = insert_into_mempool(mempool, &state, signed_tx, metrics).await {
+        return rsp;
+    }


this function returning an Option that's only set if there's an error response is somewhat confusing, would prefer for this function to return a Result<(), response::CheckTx> so the error variant is the error response

same for check_removed_comet_bft

Changed it, feels much better that way

noot · 2024-09-26T20:26:45Z

crates/astria-sequencer/src/service/mempool/mod.rs

+/// Returns a [`response::CheckTx`] if the transaction fails any of the checks.
+/// Otherwise, it returns the [`SignedTransaction`] to be inserted into the mempool.


you can use a Result<SignedTransaction, response::CheckTx> as this function's return

noot · 2024-09-26T20:54:56Z

I think this PR looks good, just a few smaller comments!

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

Lilyjjo · 2024-09-26T22:40:05Z

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

Hmmm. I think I misheard why we wanted to allow rechecks to add to the appside mempool. I checked out the CometBFT code and didn't see it persisting the transactions anywhere that was accessible, (I wonder if I missed it). In the scenario where CometBFT doesn't persist, then the allowing recheck would only be useful if CometBFT doesn't crash if the sequencer app crashses. I'll do some more research

crates/astria-sequencer/src/mempool/mod.rs

crates/astria-sequencer/src/service/mempool/mod.rs

crates/astria-sequencer/src/mempool/mod.rs

SuperFluffy

I will follow @Fraser999 approval, but I have made some comments here and there.

I want to especially call attention to the bit which is acquiring a write-lock in a loop: can you leave a comment there why that's acceptable?

SuperFluffy · 2024-10-02T13:05:40Z

crates/astria-sequencer/src/service/mempool/tests.rs

+}
+
+#[tokio::test]
+async fn receck_adds_non_tracked_tx() {


Suggested change

async fn receck_adds_non_tracked_tx() {

async fn recheck_adds_non_tracked_tx() {

There is still a spelling issue here.

SuperFluffy · 2024-10-02T13:06:16Z

crates/astria-sequencer/src/service/mempool/tests.rs

+};
+
+#[tokio::test]
+async fn future_nonce_ok() {


Maybe?

Suggested change

async fn future_nonce_ok() {

async fn future_nonces_are_accepted() {

Please use a more descriptive test description, like what I suggested or something else.

SuperFluffy · 2024-10-02T13:08:30Z

crates/astria-sequencer/src/service/mempool/mod.rs

@@ -1,3 +1,6 @@
+#[cfg(test)]
+mod tests;


Very mild nit: modules are usually sorted after imports. It's odd that rustfmt does not do this.

Actually move this below the imports please. We are following the official Rust style guide: https://doc.rust-lang.org/stable/style-guide/items.html

SuperFluffy · 2024-10-02T13:11:22Z

crates/astria-sequencer/src/mempool/transactions_container.rs

@@ -109,6 +109,10 @@ impl TimemarkedTransaction {
    pub(super) fn cost(&self) -> &HashMap<IbcPrefixed, u128> {
        &self.cost
    }
+
+    pub(super) fn id(&self) -> [u8; 32] {


Is this using our new TransactionId?

That's something for a followup, but we should use that newtype instead of the inner array.

Also (@Fraser999 effort because references are 8 bytes on the stack whereas this array is 32 bytes on the stack):

Suggested change

pub(super) fn id(&self) -> [u8; 32] {

pub(super) fn id(&self) -> &[u8; 32] {

crates/astria-sequencer/src/mempool/mod.rs

SuperFluffy

I'm blocking this so I can have a second pass over this.

SuperFluffy

I have gone over the PR and left some comments. Reading the PR description this patch attempts to tackle 3 things at the same time:

avoid rerunning expensive checks that have already been performed once. I believe this new functionality is covered by the changes in crate::service::mempool. It is enabled by the functions check_removed_comet_bft and is_tracked. The is_tracked makes use of the new contained_txs hashset. Question: is this an optimization to avoid using pending and parked mempool fields? Clearly the mempool needed to track the transactions contained therein before - so what's the reason to have an extra index of transactions? It would be nice to see that explained in the patch description. It would furthermore be useful to understand how this new index and the previous caches will be kept in sync.
fix a regression for CheckTxKind::Recheck. Grepping this PR I only find recheck mention in tests. Looking at the fn handle_check_tx function the request::CheckTx is immediately deconstructed and the CheckTxKind::Recheck never actively considered. It hence looks to me as if we don't discriminate between checks of new transactions and rechecked of transactions existing in the mempool. Given that this is a regression called out in both the patch title and text, it would be nice to have this explained.
introduce a way to maintain state on sequencer restarts. This part seems to be completely orthogonal to the goals 1. and 2. All related code should be removed from this patch and moved into a new PR. Having said that, it seems that the index of checked transactions (the hashset mentioned in the PR description) was introduced to enable the main goal of the PR (namely skipping unnecessary checks summarized under 1.). Enabling restart logic thus seems to be a useful side effect? Either way, it is confusing to the reviewer in how far this should be made part of the patch, and why it wasn't moved to a new patch. The patch description should be amended accordingly and better explained.

The semantics of adding or removing transactions from the mempool-tracked txs is weird. Looking at the code that makes use of add_from_contained_txs and remove_from_contained_txs it looks like you would be better served by creating a custom guard like this (I have intentionally left out the metrics stuff to keep the example minimal; add them as necessary):

struct TxLock<'a> {
    mempool: &'a Mempool,
    txs: RwLockWriteGuard<'a, HashSet<[u8; 32]>>,
}

impl<'a> TxLock<'a> {
    fn add(&mut self, id: [u8; 32]) {
        self.tx.insert(id);
    }
    fn remove(&mut self, id: [u8; 32]) {
        self.tx.remove(id);
    }
}

impl Mempool {
    async fn lock_txs(&self) -> TxLock<'_> {
        TxLock {
            mempool: self,
            txs: mempool.contained_txs.write().await,
        }
    }
}

crates/astria-sequencer/src/mempool/mod.rs

SuperFluffy · 2024-10-03T11:45:21Z

crates/astria-sequencer/src/mempool/mod.rs

+
+    /// Removes a transaction from the mempool's tracked transactions.
+    /// Will increment logic error metrics and log error if transaction is not present.
+    fn remove_from_contained_txs(&self, tx_hash: [u8; 32], contained_txs: &mut HashSet<[u8; 32]>) {


My arguments for the mempool bits above also apply here: this is not removing a transactions from the mempool tracked transactions. It's mutating an argument. This method seems to be a vehicle to access the metrics.

Change or remove please.

SuperFluffy · 2024-10-03T12:25:54Z

crates/astria-sequencer/src/service/mempool/tests.rs

+}
+
+#[tokio::test]
+async fn receck_adds_non_tracked_tx() {


There is still a spelling issue here.

SuperFluffy · 2024-10-03T12:26:11Z

crates/astria-sequencer/src/service/mempool/mod.rs

@@ -1,3 +1,6 @@
+#[cfg(test)]
+mod tests;


Actually move this below the imports please. We are following the official Rust style guide: https://doc.rust-lang.org/stable/style-guide/items.html

SuperFluffy · 2024-10-03T12:30:04Z

crates/astria-sequencer/src/service/mempool/mod.rs


-    // tx is valid, push to mempool with current state
+    // generate address for the signed transaction


This address construction is not necessary I believe (but fixing it is the scope of this PR).

Please remove this comment and create a followup issue to just us the address bytes to fetch all relevant information.

Tracked: #1620

SuperFluffy · 2024-10-03T12:54:37Z

crates/astria-sequencer/src/service/mempool/mod.rs

+            }
+            RemovalReason::LowerNonceInvalidated => {
+                return Err(response::CheckTx {
+                    code: Code::Err(AbciErrorCode::LOWER_NONCE_INVALIDATED.value()),


The intent behind info was to give a human readable description of the abci code. So please use AbciErrorCode::LOWER_NONCE_INVALIDATED.to_string() or, if you need a more granular reason, add a new abci code.

Alternatively, provide more background information to the log line.

SuperFluffy · 2024-10-03T12:55:04Z

crates/astria-sequencer/src/service/mempool/mod.rs

+                    info: "transaction removed from app mempool due to lower nonce being \
+                           invalidated"
+                        .into(),
+                    log: "Transaction removed from app mempool due to lower nonce being \


Lower case. Also, log and info seem to provide the same information. That doesn't seem useful.

SuperFluffy · 2024-10-03T12:55:40Z

crates/astria-sequencer/src/service/mempool/mod.rs

+                return Err(response::CheckTx {
+                    code: Code::Err(AbciErrorCode::INVALID_NONCE.value()),
+                    info: "transaction removed from app mempool due to stale nonce".into(),
+                    log: "Transaction from app mempool due to stale nonce".into(),


Lower case - also info and log seem to have the same text. Doesn't seem useful.

SuperFluffy · 2024-10-03T12:58:48Z

crates/astria-sequencer/src/mempool/mod.rs

                    if let Err(error) =
                        pending.add(ttx, current_account_nonce, &current_account_balances)
                    {
+                        // remove from tracked
+                        // note: this branch is not expected to be hit so grabbing the lock inside


Convention is to use upper case XXX, TODO, FIXME, NOTE etc for comments like this.

Suggested change

// note: this branch is not expected to be hit so grabbing the lock inside

// NOTE: this branch is not expected to be hit so grabbing the lock inside

SuperFluffy · 2024-10-03T13:00:24Z

crates/astria-sequencer/src/mempool/mod.rs

                    if let Err(error) =
                        pending.add(ttx, current_account_nonce, &current_account_balances)
                    {
+                        // remove from tracked


If code doing something as simple as this needs to be commented, then its either not self evident (leave a comment tagged with XXX or NOTE, see below), or adjust the called methods to make it evident.

SuperFluffy · 2024-10-03T13:17:49Z

crates/astria-sequencer/src/service/mempool/mod.rs

+    tx_hash: [u8; 32],
+    mempool: &AppMempool,
+    metrics: &Metrics,
+) -> Result<(), response::CheckTx> {


You could just return a RemovalReason here and instead implement a new trait IntoCheckTxResponse on it so that you don't need to do the conversion inline.

SuperFluffy · 2024-10-03T13:19:12Z

crates/astria-sequencer/src/service/mempool/mod.rs

+    state: &S,
+    signed_tx: SignedTransaction,
+    metrics: &'static Metrics,
+) -> Result<(), response::CheckTx> {


I have noted it for RemovalReason, but it also applies here: all the error-to-response conversions could be made a lot neater by adding a trait IntoCheckTxResponse and moving the conversions there instead of having them inlined in the function body.

SuperFluffy · 2024-10-03T13:36:24Z

crates/astria-sequencer/src/service/mempool/mod.rs

@@ -18,6 +21,7 @@ use astria_core::{
    },
 };
 use astria_eyre::eyre::WrapErr as _;
+use bytes::Bytes;


Is this import used anywhere here?

github-actions bot added the sequencer pertaining to the astria-sequencer crate label Sep 18, 2024

Lilyjjo changed the title ~~separate out checks~~ sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024

Lilyjjo changed the title ~~sequencer, mempool: rewrite check_tx to be more efficient~~ chore: sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024

Lilyjjo changed the title ~~chore: sequencer, mempool: rewrite check_tx to be more efficient~~ chore: sequencer: rewrite check_tx to be more efficient Sep 18, 2024

Lilyjjo changed the title ~~chore: sequencer: rewrite check_tx to be more efficient~~ fix(sequencer): rewrite check_tx to be more efficient and fix regression Sep 18, 2024

Lilyjjo force-pushed the lilyjjo/check-tx-rewrite branch 3 times, most recently from 577d0c4 to 0de80ec Compare September 24, 2024 18:19

Lilyjjo marked this pull request as ready for review September 24, 2024 19:17

Lilyjjo requested a review from a team as a code owner September 24, 2024 19:17

Lilyjjo requested a review from Fraser999 September 24, 2024 19:17

Lilyjjo force-pushed the lilyjjo/check-tx-rewrite branch from 0a4f435 to 56ff638 Compare September 26, 2024 12:57

Lilyjjo requested review from noot and SuperFluffy September 26, 2024 13:06

noot reviewed Sep 26, 2024

View reviewed changes

Lilyjjo requested a review from noot September 26, 2024 22:40

Fraser999 mentioned this pull request Sep 27, 2024

ENG-33 chore(sequencer): add cache for immutable check-tx results #1271

Closed

Fraser999 reviewed Sep 27, 2024

View reviewed changes

Lilyjjo requested a review from Fraser999 September 28, 2024 17:38

Lilyjjo added 3 commits September 30, 2024 11:06

rebase work on main

9dd9628

respond to comments from @noot

83e4718

respond to comment from @Fraser999

af24538

Lilyjjo force-pushed the lilyjjo/check-tx-rewrite branch from 345fcab to af24538 Compare September 30, 2024 15:06

Fraser999 reviewed Oct 1, 2024

View reviewed changes

Lilyjjo added 2 commits October 1, 2024 10:59

update error logs and rename alert

ddffdd8

Merge branch 'main' into lilyjjo/check-tx-rewrite

2f3a9c8

Lilyjjo added this pull request to the merge queue Oct 1, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 1, 2024

Lilyjjo added this pull request to the merge queue Oct 2, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024

Lilyjjo added this pull request to the merge queue Oct 2, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024

Lilyjjo added this pull request to the merge queue Oct 2, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024

SuperFluffy reviewed Oct 2, 2024

View reviewed changes

Lilyjjo and others added 2 commits October 2, 2024 13:33

Merge branch 'main' into lilyjjo/check-tx-rewrite

2670373

add comment

021faa2

Lilyjjo enabled auto-merge October 2, 2024 17:39