From b16237ad6f019667a59b0e3e726f6ac20e2d0a1c Mon Sep 17 00:00:00 2001 From: Alexandru Gheorghe <49718502+alexggh@users.noreply.github.com> Date: Thu, 26 Sep 2024 15:11:00 +0300 Subject: [PATCH] [5 / 5] Introduce approval-voting-parallel (#4849) This is the implementation of the approach described here: https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2150321612 & https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154357547 & https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154721395. ## Description of changes The end goal is to have an architecture where we have single subsystem(`approval-voting-parallel`) and multiple worker types that would full-fill the work that currently is fulfilled by the `approval-distribution` and `approval-voting` subsystems. The main loop of the new subsystem would do just the distribution of work to the workers. The new subsystem will have: - N approval-distribution workers: This would do the work that is currently being done by the approval-distribution subsystem and in addition to that will also perform the crypto-checks that an assignment is valid and that a vote is correctly signed. Work is assigned via the following formula: `worker_index = msg.validator % WORKER_COUNT`, this guarantees that all assignments and approvals from the same validator reach the same worker. - 1 approval-voting worker: This would receive an already valid message and do everything the approval-voting currently does, except the crypto-checking that has been moved already to the approval-distribution worker. On the hot path of processing messages **no** synchronisation and waiting is needed between approval-distribution and approval-voting workers. Screenshot 2024-06-07 at 11 28 08 ## Guidelines for reading The full implementation is broken in 5 PRs and all of them are self-contained and improve things incrementally even without the parallelisation being implemented/enabled, the reason this approach was taken instead of a big-bang PR, is to make things easier to review and reduced the risk of breaking this critical subsystems. After reading the full description of this PR, the changes should be read in the following order: 1. https://github.com/paritytech/polkadot-sdk/pull/4848, some other micro-optimizations for networks with a high number of validators. This change gives us a speed up by itself without any other changes. 2. https://github.com/paritytech/polkadot-sdk/pull/4845 , this contains only interface changes to decouple the subsystem from the `Context` and be able to run multiple instances of the subsystem on different threads. **No functional changes** 3. https://github.com/paritytech/polkadot-sdk/pull/4928, moving of the crypto checks from approval-voting in approval-distribution, so that the approval-distribution has no reason to wait after approval-voting anymore. This change gives us a speed up by itself without any other changes. 4. https://github.com/paritytech/polkadot-sdk/pull/4846, interface changes to make approval-voting runnable on a separate thread. **No functional changes** 5. This PR, where we instantiate an `approval-voting-parallel` subsystem that runs on different workers the logic currently in `approval-distribution` and `approval-voting`. 6. The next step after this changes get merged and deploy would be to bring all the files from approval-distribution, approval-voting, approval-voting-parallel into a single rust crate, to make it easier to maintain and understand the structure. ## Results Running subsystem-benchmarks with 1000 validators 100 fully ocuppied cores and triggering all assignments and approvals for all tranches #### Approval does not lags behind. Master ``` Chain selection approved after 72500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a ``` With this PoC ``` Chain selection approved after 3500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a ``` #### Gathering enough assignments Enough assignments are gathered in less than 500ms, so that gives un a guarantee that un-necessary work does not get triggered, on master on the same benchmark because the subsystems fall behind on work, that number goes above 32 seconds on master. Screenshot 2024-06-20 at 15 48 22 #### Cpu usage: Master ``` CPU usage, seconds total per block approval-distribution 96.9436 9.6944 approval-voting 117.4676 11.7468 test-environment 44.0092 4.4009 ``` With this PoC ``` CPU usage, seconds total per block approval-distribution 0.0014 0.0001 --- unused approval-voting 0.0437 0.0044. --- unused approval-voting-parallel 5.9560 0.5956 approval-voting-parallel-0 22.9073 2.2907 approval-voting-parallel-1 23.0417 2.3042 approval-voting-parallel-2 22.0445 2.2045 approval-voting-parallel-3 22.7234 2.2723 approval-voting-parallel-4 21.9788 2.1979 approval-voting-parallel-5 23.0601 2.3060 approval-voting-parallel-6 22.4805 2.2481 approval-voting-parallel-7 21.8330 2.1833 approval-voting-parallel-db 37.1954 3.7195. --- the approval-voting thread. ``` # Enablement strategy Because just some trivial plumbing is needed in approval-distribution and approval-voting to be able to run things in parallel and because this subsystems plays a critical part in the system this PR proposes that we keep both ways of running the approval work, as separated subsystems and just a single subsystem(`approval-voting-parallel`) which has multiple workers for the distribution work and one worker for the approval-voting work and switch between them with a comandline flag. The benefits for this is twofold. 1. With the same polkadot binary we can easily switch just a few validators to use the parallel approach and gradually make this the default way of running, if now issues arise. 2. In the worst case scenario were it becomes the default way of running things, but we discover there are critical issues with it we have the path to quickly disable it by asking validators to adjust their command line flags. # Next steps - [x] Make sure through various testing we are not missing anything - [x] Polish the implementations to make them production ready - [x] Add Unittest Tests for approval-voting-parallel. - [x] Define and implement the strategy for rolling this change, so that the blast radius is minimal(single validator) in case there are problems with the implementation. - [x] Versi long running tests. - [x] Add relevant metrics. @ordian @eskimor @sandreim @AndreiEres, let me know what you think. --------- Signed-off-by: Alexandru Gheorghe --- .gitlab/pipeline/zombienet/polkadot.yml | 8 + Cargo.lock | 46 + Cargo.toml | 2 + .../src/lib.rs | 1 + polkadot/cli/src/cli.rs | 7 + polkadot/cli/src/command.rs | 1 + .../core/approval-voting-parallel/Cargo.toml | 55 + .../core/approval-voting-parallel/src/lib.rs | 957 +++++++++++++ .../approval-voting-parallel/src/metrics.rs | 236 ++++ .../approval-voting-parallel/src/tests.rs | 1178 +++++++++++++++++ .../approval-voting-regression-bench.rs | 1 + .../dispute-coordinator/src/initialized.rs | 33 +- .../node/core/dispute-coordinator/src/lib.rs | 4 +- .../core/dispute-coordinator/src/tests.rs | 1 + .../network/approval-distribution/src/lib.rs | 9 +- .../approval-distribution/src/metrics.rs | 63 +- .../approval-distribution/src/tests.rs | 75 +- polkadot/node/network/bridge/src/rx/mod.rs | 66 +- polkadot/node/network/bridge/src/rx/tests.rs | 1 + polkadot/node/overseer/src/dummy.rs | 4 + polkadot/node/overseer/src/lib.rs | 30 +- polkadot/node/overseer/src/tests.rs | 11 +- polkadot/node/primitives/src/approval/mod.rs | 2 +- polkadot/node/service/Cargo.toml | 2 + polkadot/node/service/src/lib.rs | 13 + polkadot/node/service/src/overseer.rs | 261 +++- .../node/service/src/relay_chain_selection.rs | 65 +- polkadot/node/service/src/tests.rs | 1 + polkadot/node/subsystem-bench/Cargo.toml | 1 + .../examples/approvals_no_shows.yaml | 1 + .../examples/approvals_throughput.yaml | 1 + .../approvals_throughput_best_case.yaml | 1 + .../src/lib/approval/helpers.rs | 29 +- .../subsystem-bench/src/lib/approval/mod.rs | 170 ++- .../src/lib/availability/mod.rs | 13 +- .../node/subsystem-bench/src/lib/display.rs | 17 + .../subsystem-bench/src/lib/environment.rs | 30 +- .../subsystem-bench/src/lib/mock/dummy.rs | 1 + .../node/subsystem-bench/src/lib/mock/mod.rs | 1 + .../src/lib/mock/network_bridge.rs | 27 +- .../subsystem-bench/src/lib/statement/mod.rs | 5 +- .../node/subsystem-bench/src/lib/usage.rs | 6 +- polkadot/node/subsystem-types/src/messages.rs | 97 ++ polkadot/node/test/service/src/lib.rs | 2 + .../adder/collator/src/main.rs | 1 + .../undying/collator/src/main.rs | 1 + .../node/approval/approval-voting-parallel.md | 30 + .../0009-approval-voting-coalescing.toml | 2 +- .../0016-approval-voting-parallel.toml | 120 ++ .../0016-approval-voting-parallel.zndsl | 35 + prdoc/pr_4849.prdoc | 47 + umbrella/Cargo.toml | 7 +- umbrella/src/lib.rs | 4 + 53 files changed, 3565 insertions(+), 217 deletions(-) create mode 100644 polkadot/node/core/approval-voting-parallel/Cargo.toml create mode 100644 polkadot/node/core/approval-voting-parallel/src/lib.rs create mode 100644 polkadot/node/core/approval-voting-parallel/src/metrics.rs create mode 100644 polkadot/node/core/approval-voting-parallel/src/tests.rs create mode 100644 polkadot/roadmap/implementers-guide/src/node/approval/approval-voting-parallel.md create mode 100644 polkadot/zombienet_tests/functional/0016-approval-voting-parallel.toml create mode 100644 polkadot/zombienet_tests/functional/0016-approval-voting-parallel.zndsl create mode 100644 prdoc/pr_4849.prdoc diff --git a/.gitlab/pipeline/zombienet/polkadot.yml b/.gitlab/pipeline/zombienet/polkadot.yml index 93fc4bbb578a..e25bc4ca2293 100644 --- a/.gitlab/pipeline/zombienet/polkadot.yml +++ b/.gitlab/pipeline/zombienet/polkadot.yml @@ -223,6 +223,14 @@ zombienet-polkadot-functional-0015-coretime-shared-core: --local-dir="${LOCAL_DIR}/functional" --test="0015-coretime-shared-core.zndsl" +zombienet-polkadot-functional-0016-approval-voting-parallel: + extends: + - .zombienet-polkadot-common + script: + - /home/nonroot/zombie-net/scripts/ci/run-test-local-env-manager.sh + --local-dir="${LOCAL_DIR}/functional" + --test="0016-approval-voting-parallel.zndsl" + zombienet-polkadot-smoke-0001-parachains-smoke-test: extends: - .zombienet-polkadot-common diff --git a/Cargo.lock b/Cargo.lock index 61f485bcecb3..c20c8f71c80a 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -14268,6 +14268,49 @@ dependencies = [ "tracing-gum", ] +[[package]] +name = "polkadot-node-core-approval-voting-parallel" +version = "7.0.0" +dependencies = [ + "assert_matches", + "async-trait", + "futures", + "futures-timer", + "itertools 0.11.0", + "kvdb-memorydb", + "log", + "parking_lot 0.12.3", + "polkadot-approval-distribution", + "polkadot-node-core-approval-voting", + "polkadot-node-jaeger", + "polkadot-node-metrics", + "polkadot-node-network-protocol", + "polkadot-node-primitives", + "polkadot-node-subsystem", + "polkadot-node-subsystem-test-helpers", + "polkadot-node-subsystem-util", + "polkadot-overseer", + "polkadot-primitives", + "polkadot-primitives-test-helpers", + "polkadot-subsystem-bench", + "rand", + "rand_chacha", + "rand_core 0.6.4", + "sc-keystore", + "schnorrkel 0.11.4", + "sp-application-crypto 30.0.0", + "sp-consensus", + "sp-consensus-babe", + "sp-consensus-slots", + "sp-core 28.0.0", + "sp-keyring", + "sp-keystore 0.34.0", + "sp-runtime 31.0.1", + "sp-tracing 16.0.0", + "thiserror", + "tracing-gum", +] + [[package]] name = "polkadot-node-core-av-store" version = "7.0.0" @@ -15397,6 +15440,7 @@ dependencies = [ "polkadot-network-bridge", "polkadot-node-collation-generation", "polkadot-node-core-approval-voting", + "polkadot-node-core-approval-voting-parallel", "polkadot-node-core-av-store", "polkadot-node-core-backing", "polkadot-node-core-bitfield-signing", @@ -15736,6 +15780,7 @@ dependencies = [ "polkadot-network-bridge", "polkadot-node-collation-generation", "polkadot-node-core-approval-voting", + "polkadot-node-core-approval-voting-parallel", "polkadot-node-core-av-store", "polkadot-node-core-backing", "polkadot-node-core-bitfield-signing", @@ -15892,6 +15937,7 @@ dependencies = [ "polkadot-availability-recovery", "polkadot-erasure-coding", "polkadot-node-core-approval-voting", + "polkadot-node-core-approval-voting-parallel", "polkadot-node-core-av-store", "polkadot-node-core-chain-api", "polkadot-node-metrics", diff --git a/Cargo.toml b/Cargo.toml index b7c9c0cdcbf1..c92254242fc9 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -158,6 +158,7 @@ members = [ "polkadot/erasure-coding/fuzzer", "polkadot/node/collation-generation", "polkadot/node/core/approval-voting", + "polkadot/node/core/approval-voting-parallel", "polkadot/node/core/av-store", "polkadot/node/core/backing", "polkadot/node/core/bitfield-signing", @@ -1032,6 +1033,7 @@ polkadot-gossip-support = { path = "polkadot/node/network/gossip-support", defau polkadot-network-bridge = { path = "polkadot/node/network/bridge", default-features = false } polkadot-node-collation-generation = { path = "polkadot/node/collation-generation", default-features = false } polkadot-node-core-approval-voting = { path = "polkadot/node/core/approval-voting", default-features = false } +polkadot-node-core-approval-voting-parallel = { path = "polkadot/node/core/approval-voting-parallel", default-features = false } polkadot-node-core-av-store = { path = "polkadot/node/core/av-store", default-features = false } polkadot-node-core-backing = { path = "polkadot/node/core/backing", default-features = false } polkadot-node-core-bitfield-signing = { path = "polkadot/node/core/bitfield-signing", default-features = false } diff --git a/cumulus/client/relay-chain-inprocess-interface/src/lib.rs b/cumulus/client/relay-chain-inprocess-interface/src/lib.rs index 4fea055203d8..f0a082dce534 100644 --- a/cumulus/client/relay-chain-inprocess-interface/src/lib.rs +++ b/cumulus/client/relay-chain-inprocess-interface/src/lib.rs @@ -379,6 +379,7 @@ fn build_polkadot_full_node( execute_workers_max_num: None, prepare_workers_hard_max_num: None, prepare_workers_soft_max_num: None, + enable_approval_voting_parallel: false, }, )?; diff --git a/polkadot/cli/src/cli.rs b/polkadot/cli/src/cli.rs index 3e5a6ccdd3c2..1445ade08e20 100644 --- a/polkadot/cli/src/cli.rs +++ b/polkadot/cli/src/cli.rs @@ -151,6 +151,13 @@ pub struct RunCmd { /// TESTING ONLY: disable the version check between nodes and workers. #[arg(long, hide = true)] pub disable_worker_version_check: bool, + + /// Enable approval-voting message processing in parallel. + /// + ///**Dangerous!** This is an experimental feature and should not be used in production, unless + /// explicitly advised to. + #[arg(long)] + pub enable_approval_voting_parallel: bool, } #[allow(missing_docs)] diff --git a/polkadot/cli/src/command.rs b/polkadot/cli/src/command.rs index 89e21bf135b6..16576e4b272b 100644 --- a/polkadot/cli/src/command.rs +++ b/polkadot/cli/src/command.rs @@ -244,6 +244,7 @@ where execute_workers_max_num: cli.run.execute_workers_max_num, prepare_workers_hard_max_num: cli.run.prepare_workers_hard_max_num, prepare_workers_soft_max_num: cli.run.prepare_workers_soft_max_num, + enable_approval_voting_parallel: cli.run.enable_approval_voting_parallel, }, ) .map(|full| full.task_manager)?; diff --git a/polkadot/node/core/approval-voting-parallel/Cargo.toml b/polkadot/node/core/approval-voting-parallel/Cargo.toml new file mode 100644 index 000000000000..e62062eab409 --- /dev/null +++ b/polkadot/node/core/approval-voting-parallel/Cargo.toml @@ -0,0 +1,55 @@ +[package] +name = "polkadot-node-core-approval-voting-parallel" +version = "7.0.0" +authors.workspace = true +edition.workspace = true +license.workspace = true +description = "Approval Voting Subsystem running approval work in parallel" + +[lints] +workspace = true + +[dependencies] +async-trait = { workspace = true } +futures = { workspace = true } +futures-timer = { workspace = true } +gum = { workspace = true } +itertools = { workspace = true } +thiserror = { workspace = true } + +polkadot-node-core-approval-voting = { workspace = true, default-features = true } +polkadot-approval-distribution = { workspace = true, default-features = true } +polkadot-node-subsystem = { workspace = true, default-features = true } +polkadot-node-subsystem-util = { workspace = true, default-features = true } +polkadot-overseer = { workspace = true, default-features = true } +polkadot-primitives = { workspace = true, default-features = true } +polkadot-node-primitives = { workspace = true, default-features = true } +polkadot-node-jaeger = { workspace = true, default-features = true } +polkadot-node-network-protocol = { workspace = true, default-features = true } +polkadot-node-metrics = { workspace = true, default-features = true } + +sc-keystore = { workspace = true, default-features = false } +sp-consensus = { workspace = true, default-features = false } +sp-consensus-slots = { workspace = true, default-features = false } +sp-application-crypto = { workspace = true, default-features = false, features = ["full_crypto"] } +sp-runtime = { workspace = true, default-features = false } + +rand = { workspace = true } +rand_core = { workspace = true } +rand_chacha = { workspace = true } + +[dev-dependencies] +async-trait = { workspace = true } +parking_lot = { workspace = true } +sp-keyring = { workspace = true, default-features = true } +sp-keystore = { workspace = true, default-features = true } +sp-core = { workspace = true, default-features = true } +sp-consensus-babe = { workspace = true, default-features = true } +sp-tracing = { workspace = true } +polkadot-node-subsystem-test-helpers = { workspace = true, default-features = true } +assert_matches = { workspace = true } +kvdb-memorydb = { workspace = true } +polkadot-primitives-test-helpers = { workspace = true, default-features = true } +log = { workspace = true, default-features = true } +polkadot-subsystem-bench = { workspace = true, default-features = true } +schnorrkel = { workspace = true, default-features = true } diff --git a/polkadot/node/core/approval-voting-parallel/src/lib.rs b/polkadot/node/core/approval-voting-parallel/src/lib.rs new file mode 100644 index 000000000000..18c73cfba1fc --- /dev/null +++ b/polkadot/node/core/approval-voting-parallel/src/lib.rs @@ -0,0 +1,957 @@ +// Copyright (C) Parity Technologies (UK) Ltd. +// This file is part of Polkadot. + +// Polkadot is free software: you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. + +// Polkadot is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License +// along with Polkadot. If not, see . + +//! The Approval Voting Parallel Subsystem. +//! +//! This subsystem is responsible for orchestrating the work done by +//! approval-voting and approval-distribution subsystem, so they can +//! do their work in parallel, rather than serially, when they are run +//! as independent subsystems. +use itertools::Itertools; +use metrics::{Meters, MetricsWatcher}; +use polkadot_node_core_approval_voting::{Config, RealAssignmentCriteria}; +use polkadot_node_metrics::metered::{ + self, channel, unbounded, MeteredReceiver, MeteredSender, UnboundedMeteredReceiver, + UnboundedMeteredSender, +}; + +use polkadot_node_primitives::{ + approval::time::{Clock, SystemClock}, + DISPUTE_WINDOW, +}; +use polkadot_node_subsystem::{ + messages::{ApprovalDistributionMessage, ApprovalVotingMessage, ApprovalVotingParallelMessage}, + overseer, FromOrchestra, SpawnedSubsystem, SubsystemError, SubsystemResult, +}; +use polkadot_node_subsystem_util::{ + self, + database::Database, + runtime::{Config as RuntimeInfoConfig, RuntimeInfo}, +}; +use polkadot_overseer::{OverseerSignal, Priority, SubsystemSender, TimeoutExt}; +use polkadot_primitives::{CandidateIndex, Hash, ValidatorIndex, ValidatorSignature}; +use rand::SeedableRng; + +use sc_keystore::LocalKeystore; +use sp_consensus::SyncOracle; + +use futures::{channel::oneshot, prelude::*, StreamExt}; +pub use metrics::Metrics; +use polkadot_node_core_approval_voting::{ + approval_db::common::Config as DatabaseConfig, ApprovalVotingWorkProvider, +}; +use std::{ + collections::{HashMap, HashSet}, + fmt::Debug, + sync::Arc, + time::Duration, +}; +use stream::{select_with_strategy, PollNext, SelectWithStrategy}; +pub mod metrics; + +#[cfg(test)] +mod tests; + +pub(crate) const LOG_TARGET: &str = "parachain::approval-voting-parallel"; +// Value rather arbitrarily: Should not be hit in practice, it exists to more easily diagnose dead +// lock issues for example. +const WAIT_FOR_SIGS_GATHER_TIMEOUT: Duration = Duration::from_millis(2000); + +/// The number of workers used for running the approval-distribution logic. +pub const APPROVAL_DISTRIBUTION_WORKER_COUNT: usize = 4; + +/// The default channel size for the workers, can be overridden by the user through +/// `overseer_channel_capacity_override` +pub const DEFAULT_WORKERS_CHANNEL_SIZE: usize = 64000 / APPROVAL_DISTRIBUTION_WORKER_COUNT; + +fn prio_right<'a>(_val: &'a mut ()) -> PollNext { + PollNext::Right +} + +/// The approval voting parallel subsystem. +pub struct ApprovalVotingParallelSubsystem { + /// `LocalKeystore` is needed for assignment keys, but not necessarily approval keys. + /// + /// We do a lot of VRF signing and need the keys to have low latency. + keystore: Arc, + db_config: DatabaseConfig, + slot_duration_millis: u64, + db: Arc, + sync_oracle: Box, + metrics: Metrics, + spawner: Arc, + clock: Arc, + overseer_message_channel_capacity_override: Option, +} + +impl ApprovalVotingParallelSubsystem { + /// Create a new approval voting subsystem with the given keystore, config, and database. + pub fn with_config( + config: Config, + db: Arc, + keystore: Arc, + sync_oracle: Box, + metrics: Metrics, + spawner: impl overseer::gen::Spawner + 'static + Clone, + overseer_message_channel_capacity_override: Option, + ) -> Self { + ApprovalVotingParallelSubsystem::with_config_and_clock( + config, + db, + keystore, + sync_oracle, + metrics, + Arc::new(SystemClock {}), + spawner, + overseer_message_channel_capacity_override, + ) + } + + /// Create a new approval voting subsystem with the given keystore, config, clock, and database. + pub fn with_config_and_clock( + config: Config, + db: Arc, + keystore: Arc, + sync_oracle: Box, + metrics: Metrics, + clock: Arc, + spawner: impl overseer::gen::Spawner + 'static, + overseer_message_channel_capacity_override: Option, + ) -> Self { + ApprovalVotingParallelSubsystem { + keystore, + slot_duration_millis: config.slot_duration_millis, + db, + db_config: DatabaseConfig { col_approval_data: config.col_approval_data }, + sync_oracle, + metrics, + spawner: Arc::new(spawner), + clock, + overseer_message_channel_capacity_override, + } + } + + /// The size of the channel used for the workers. + fn workers_channel_size(&self) -> usize { + self.overseer_message_channel_capacity_override + .unwrap_or(DEFAULT_WORKERS_CHANNEL_SIZE) + } +} + +#[overseer::subsystem(ApprovalVotingParallel, error = SubsystemError, prefix = self::overseer)] +impl ApprovalVotingParallelSubsystem { + fn start(self, ctx: Context) -> SpawnedSubsystem { + let future = run::(ctx, self) + .map_err(|e| SubsystemError::with_origin("approval-voting-parallel", e)) + .boxed(); + + SpawnedSubsystem { name: "approval-voting-parallel-subsystem", future } + } +} + +// It starts worker for the approval voting subsystem and the `APPROVAL_DISTRIBUTION_WORKER_COUNT` +// workers for the approval distribution subsystem. +// +// It returns handles that can be used to send messages to the workers. +#[overseer::contextbounds(ApprovalVotingParallel, prefix = self::overseer)] +async fn start_workers( + ctx: &mut Context, + subsystem: ApprovalVotingParallelSubsystem, + metrics_watcher: &mut MetricsWatcher, +) -> SubsystemResult<(ToWorker, Vec>)> +where +{ + gum::info!(target: LOG_TARGET, "Starting approval distribution workers"); + + // Build approval voting handles. + let (to_approval_voting_worker, approval_voting_work_provider) = build_worker_handles( + "approval-voting-parallel-db".into(), + subsystem.workers_channel_size(), + metrics_watcher, + prio_right, + ); + let mut to_approval_distribution_workers = Vec::new(); + let slot_duration_millis = subsystem.slot_duration_millis; + + for i in 0..APPROVAL_DISTRIBUTION_WORKER_COUNT { + let mut network_sender = ctx.sender().clone(); + let mut runtime_api_sender = ctx.sender().clone(); + let mut approval_distribution_to_approval_voting = to_approval_voting_worker.clone(); + + let approval_distr_instance = + polkadot_approval_distribution::ApprovalDistribution::new_with_clock( + subsystem.metrics.approval_distribution_metrics(), + subsystem.slot_duration_millis, + subsystem.clock.clone(), + Arc::new(RealAssignmentCriteria {}), + ); + let task_name = format!("approval-voting-parallel-{}", i); + let (to_approval_distribution_worker, mut approval_distribution_work_provider) = + build_worker_handles( + task_name.clone(), + subsystem.workers_channel_size(), + metrics_watcher, + prio_right, + ); + + metrics_watcher.watch(task_name.clone(), to_approval_distribution_worker.meter()); + + subsystem.spawner.spawn_blocking( + task_name.leak(), + Some("approval-voting-parallel"), + Box::pin(async move { + let mut state = + polkadot_approval_distribution::State::with_config(slot_duration_millis); + let mut rng = rand::rngs::StdRng::from_entropy(); + let mut session_info_provider = RuntimeInfo::new_with_config(RuntimeInfoConfig { + keystore: None, + session_cache_lru_size: DISPUTE_WINDOW.get(), + }); + + loop { + let message = match approval_distribution_work_provider.next().await { + Some(message) => message, + None => { + gum::info!( + target: LOG_TARGET, + "Approval distribution stream finished, most likely shutting down", + ); + break; + }, + }; + if approval_distr_instance + .handle_from_orchestra( + message, + &mut approval_distribution_to_approval_voting, + &mut network_sender, + &mut runtime_api_sender, + &mut state, + &mut rng, + &mut session_info_provider, + ) + .await + { + gum::info!( + target: LOG_TARGET, + "Approval distribution worker {}, exiting because of shutdown", i + ); + }; + } + }), + ); + to_approval_distribution_workers.push(to_approval_distribution_worker); + } + + gum::info!(target: LOG_TARGET, "Starting approval voting workers"); + + let sender = ctx.sender().clone(); + let to_approval_distribution = ApprovalVotingToApprovalDistribution(sender.clone()); + polkadot_node_core_approval_voting::start_approval_worker( + approval_voting_work_provider, + sender.clone(), + to_approval_distribution, + polkadot_node_core_approval_voting::Config { + slot_duration_millis: subsystem.slot_duration_millis, + col_approval_data: subsystem.db_config.col_approval_data, + }, + subsystem.db.clone(), + subsystem.keystore.clone(), + subsystem.sync_oracle, + subsystem.metrics.approval_voting_metrics(), + subsystem.spawner.clone(), + "approval-voting-parallel-db", + "approval-voting-parallel", + subsystem.clock.clone(), + ) + .await?; + + Ok((to_approval_voting_worker, to_approval_distribution_workers)) +} + +// The main run function of the approval parallel voting subsystem. +#[overseer::contextbounds(ApprovalVotingParallel, prefix = self::overseer)] +async fn run( + mut ctx: Context, + subsystem: ApprovalVotingParallelSubsystem, +) -> SubsystemResult<()> { + let mut metrics_watcher = MetricsWatcher::new(subsystem.metrics.clone()); + gum::info!( + target: LOG_TARGET, + "Starting workers" + ); + + let (to_approval_voting_worker, to_approval_distribution_workers) = + start_workers(&mut ctx, subsystem, &mut metrics_watcher).await?; + + gum::info!( + target: LOG_TARGET, + "Starting main subsystem loop" + ); + + run_main_loop(ctx, to_approval_voting_worker, to_approval_distribution_workers, metrics_watcher) + .await +} + +// Main loop of the subsystem, it shouldn't include any logic just dispatching of messages to +// the workers. +// +// It listens for messages from the overseer and dispatches them to the workers. +#[overseer::contextbounds(ApprovalVotingParallel, prefix = self::overseer)] +async fn run_main_loop( + mut ctx: Context, + mut to_approval_voting_worker: ToWorker, + mut to_approval_distribution_workers: Vec>, + metrics_watcher: MetricsWatcher, +) -> SubsystemResult<()> { + loop { + futures::select! { + next_msg = ctx.recv().fuse() => { + let next_msg = match next_msg { + Ok(msg) => msg, + Err(err) => { + gum::info!(target: LOG_TARGET, ?err, "Approval voting parallel subsystem received an error"); + return Err(err); + } + }; + + match next_msg { + FromOrchestra::Signal(msg) => { + if matches!(msg, OverseerSignal::ActiveLeaves(_)) { + metrics_watcher.collect_metrics(); + } + + for worker in to_approval_distribution_workers.iter_mut() { + worker + .send_signal(msg.clone()).await?; + } + + to_approval_voting_worker.send_signal(msg.clone()).await?; + if matches!(msg, OverseerSignal::Conclude) { + break; + } + }, + FromOrchestra::Communication { msg } => match msg { + // The message the approval voting subsystem would've handled. + ApprovalVotingParallelMessage::ApprovedAncestor(_, _,_) | + ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate(_, _) => { + to_approval_voting_worker.send_message( + msg.try_into().expect( + "Message is one of ApprovedAncestor, GetApprovalSignaturesForCandidate + and that can be safely converted to ApprovalVotingMessage; qed" + ) + ).await; + }, + // Now the message the approval distribution subsystem would've handled and need to + // be forwarded to the workers. + ApprovalVotingParallelMessage::NewBlocks(msg) => { + for worker in to_approval_distribution_workers.iter_mut() { + worker + .send_message( + ApprovalDistributionMessage::NewBlocks(msg.clone()), + ) + .await; + } + }, + ApprovalVotingParallelMessage::DistributeAssignment(assignment, claimed) => { + let worker = assigned_worker_for_validator(assignment.validator, &mut to_approval_distribution_workers); + worker + .send_message( + ApprovalDistributionMessage::DistributeAssignment(assignment, claimed) + ) + .await; + + }, + ApprovalVotingParallelMessage::DistributeApproval(vote) => { + let worker = assigned_worker_for_validator(vote.validator, &mut to_approval_distribution_workers); + worker + .send_message( + ApprovalDistributionMessage::DistributeApproval(vote) + ).await; + + }, + ApprovalVotingParallelMessage::NetworkBridgeUpdate(msg) => { + if let polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + peer_id, + msg, + ) = msg + { + let (all_msgs_from_same_validator, messages_split_by_validator) = validator_index_for_msg(msg); + + for (validator_index, msg) in all_msgs_from_same_validator.into_iter().chain(messages_split_by_validator.into_iter().flatten()) { + let worker = assigned_worker_for_validator(validator_index, &mut to_approval_distribution_workers); + + worker + .send_message( + ApprovalDistributionMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + peer_id, msg, + ), + ), + ).await; + } + } else { + for worker in to_approval_distribution_workers.iter_mut() { + worker + .send_message_with_priority::( + ApprovalDistributionMessage::NetworkBridgeUpdate(msg.clone()), + ).await; + } + } + }, + ApprovalVotingParallelMessage::GetApprovalSignatures(indices, tx) => { + handle_get_approval_signatures(&mut ctx, &mut to_approval_distribution_workers, indices, tx).await; + }, + ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate(lag) => { + for worker in to_approval_distribution_workers.iter_mut() { + worker + .send_message( + ApprovalDistributionMessage::ApprovalCheckingLagUpdate(lag) + ).await; + } + }, + }, + }; + + }, + }; + } + Ok(()) +} + +// It sends a message to all approval workers to get the approval signatures for the requested +// candidates and then merges them all together and sends them back to the requester. +#[overseer::contextbounds(ApprovalVotingParallel, prefix = self::overseer)] +async fn handle_get_approval_signatures( + ctx: &mut Context, + to_approval_distribution_workers: &mut Vec>, + requested_candidates: HashSet<(Hash, CandidateIndex)>, + result_channel: oneshot::Sender< + HashMap, ValidatorSignature)>, + >, +) { + let mut sigs = HashMap::new(); + let mut signatures_channels = Vec::new(); + for worker in to_approval_distribution_workers.iter_mut() { + let (tx, rx) = oneshot::channel(); + worker.send_unbounded_message(ApprovalDistributionMessage::GetApprovalSignatures( + requested_candidates.clone(), + tx, + )); + signatures_channels.push(rx); + } + + let gather_signatures = async move { + let Some(results) = futures::future::join_all(signatures_channels) + .timeout(WAIT_FOR_SIGS_GATHER_TIMEOUT) + .await + else { + gum::warn!( + target: LOG_TARGET, + "Waiting for approval signatures timed out - dead lock?" + ); + return; + }; + + for result in results { + let worker_sigs = match result { + Ok(sigs) => sigs, + Err(_) => { + gum::error!( + target: LOG_TARGET, + "Getting approval signatures failed, oneshot got closed" + ); + continue; + }, + }; + sigs.extend(worker_sigs); + } + + if let Err(_) = result_channel.send(sigs) { + gum::debug!( + target: LOG_TARGET, + "Sending back approval signatures failed, oneshot got closed" + ); + } + }; + + if let Err(err) = ctx.spawn("approval-voting-gather-signatures", Box::pin(gather_signatures)) { + gum::warn!(target: LOG_TARGET, "Failed to spawn gather signatures task: {:?}", err); + } +} + +// Returns the worker that should receive the message for the given validator. +fn assigned_worker_for_validator( + validator: ValidatorIndex, + to_approval_distribution_workers: &mut Vec>, +) -> &mut ToWorker { + let worker_index = validator.0 as usize % to_approval_distribution_workers.len(); + to_approval_distribution_workers + .get_mut(worker_index) + .expect("Worker index is obtained modulo len; qed") +} + +// Returns the validators that initially created this assignments/votes, the validator index +// is later used to decide which approval-distribution worker should receive the message. +// +// Because this is on the hot path and we don't want to be unnecessarily slow, it contains two logic +// paths. The ultra fast path where all messages have the same validator index and we don't do +// any cloning or allocation and the path where we need to split the messages into multiple +// messages, because they have different validator indices, where we do need to clone and allocate. +// In practice most of the message will fall on the ultra fast path. +fn validator_index_for_msg( + msg: polkadot_node_network_protocol::ApprovalDistributionMessage, +) -> ( + Option<(ValidatorIndex, polkadot_node_network_protocol::ApprovalDistributionMessage)>, + Option>, +) { + match msg { + polkadot_node_network_protocol::Versioned::V1(ref message) => match message { + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments(msgs) => + if let Ok(validator) = msgs.iter().map(|(msg, _)| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|(msg, claimed_candidates)| { + ( + msg.validator, + polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments( + vec![(msg.clone(), *claimed_candidates)] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Approvals(msgs) => + if let Ok(validator) = msgs.iter().map(|msg| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|vote| { + ( + vote.validator, + polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Approvals( + vec![vote.clone()] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + }, + polkadot_node_network_protocol::Versioned::V2(ref message) => match message { + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments(msgs) => + if let Ok(validator) = msgs.iter().map(|(msg, _)| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|(msg, claimed_candidates)| { + ( + msg.validator, + polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments( + vec![(msg.clone(), *claimed_candidates)] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals(msgs) => + if let Ok(validator) = msgs.iter().map(|msg| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|vote| { + ( + vote.validator, + polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals( + vec![vote.clone()] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + }, + polkadot_node_network_protocol::Versioned::V3(ref message) => match message { + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments(msgs) => + if let Ok(validator) = msgs.iter().map(|(msg, _)| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|(msg, claimed_candidates)| { + ( + msg.validator, + polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments( + vec![(msg.clone(), claimed_candidates.clone())] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals(msgs) => + if let Ok(validator) = msgs.iter().map(|msg| msg.validator).all_equal_value() { + (Some((validator, msg)), None) + } else { + let split = msgs + .iter() + .map(|vote| { + ( + vote.validator, + polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals( + vec![vote.clone()] + ), + ), + ) + }) + .collect_vec(); + (None, Some(split)) + }, + }, + } +} + +/// A handler object that both type of workers use for receiving work. +/// +/// In practive this is just a wrapper over two channels Receiver, that is injected into +/// approval-voting worker and approval-distribution workers. +type WorkProvider = WorkProviderImpl< + SelectWithStrategy< + MeteredReceiver>, + UnboundedMeteredReceiver>, + Clos, + State, + >, +>; + +pub struct WorkProviderImpl(T); + +impl Stream for WorkProviderImpl +where + T: Stream> + Unpin + Send, +{ + type Item = FromOrchestra; + + fn poll_next( + mut self: std::pin::Pin<&mut Self>, + cx: &mut std::task::Context<'_>, + ) -> std::task::Poll> { + self.0.poll_next_unpin(cx) + } +} + +#[async_trait::async_trait] +impl ApprovalVotingWorkProvider for WorkProviderImpl +where + T: Stream> + Unpin + Send, +{ + async fn recv(&mut self) -> SubsystemResult> { + self.0.next().await.ok_or(SubsystemError::Context( + "ApprovalVotingWorkProviderImpl: Channel closed".to_string(), + )) + } +} + +impl WorkProvider +where + M: Send + Sync + 'static, + Clos: FnMut(&mut State) -> PollNext, + State: Default, +{ + // Constructs a work providers from the channels handles. + fn from_rx_worker(rx: RxWorker, prio: Clos) -> Self { + let prioritised = select_with_strategy(rx.0, rx.1, prio); + WorkProviderImpl(prioritised) + } +} + +/// Just a wrapper for implementing `overseer::SubsystemSender` and +/// `overseer::SubsystemSender`. +/// +/// The instance of this struct can be injected into the workers, so they can talk +/// directly with each other without intermediating in this subsystem loop. +pub struct ToWorker( + MeteredSender>, + UnboundedMeteredSender>, +); + +impl Clone for ToWorker { + fn clone(&self) -> Self { + Self(self.0.clone(), self.1.clone()) + } +} + +impl ToWorker { + async fn send_signal(&mut self, signal: OverseerSignal) -> Result<(), SubsystemError> { + self.1 + .unbounded_send(FromOrchestra::Signal(signal)) + .map_err(|err| SubsystemError::QueueError(err.into_send_error())) + } + + fn meter(&self) -> Meters { + Meters::new(self.0.meter(), self.1.meter()) + } +} + +impl overseer::SubsystemSender for ToWorker { + fn send_message<'life0, 'async_trait>( + &'life0 mut self, + msg: T, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + 'life0: 'async_trait, + Self: 'async_trait, + { + async { + if let Err(err) = + self.0.send(polkadot_overseer::FromOrchestra::Communication { msg }).await + { + gum::error!( + target: LOG_TARGET, + "Failed to send message to approval voting worker: {:?}, subsystem is probably shutting down.", + err + ); + } + } + .boxed() + } + + fn try_send_message(&mut self, msg: T) -> Result<(), metered::TrySendError> { + self.0 + .try_send(polkadot_overseer::FromOrchestra::Communication { msg }) + .map_err(|result| { + let is_full = result.is_full(); + let msg = match result.into_inner() { + polkadot_overseer::FromOrchestra::Signal(_) => + panic!("Cannot happen variant is never built"), + polkadot_overseer::FromOrchestra::Communication { msg } => msg, + }; + if is_full { + metered::TrySendError::Full(msg) + } else { + metered::TrySendError::Closed(msg) + } + }) + } + + fn send_messages<'life0, 'async_trait, I>( + &'life0 mut self, + msgs: I, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + I: IntoIterator + Send, + I::IntoIter: Send, + I: 'async_trait, + 'life0: 'async_trait, + Self: 'async_trait, + { + async { + for msg in msgs { + self.send_message(msg).await; + } + } + .boxed() + } + + fn send_unbounded_message(&mut self, msg: T) { + if let Err(err) = + self.1.unbounded_send(polkadot_overseer::FromOrchestra::Communication { msg }) + { + gum::error!( + target: LOG_TARGET, + "Failed to send unbounded message to approval voting worker: {:?}, subsystem is probably shutting down.", + err + ); + } + } + + fn send_message_with_priority<'life0, 'async_trait, P>( + &'life0 mut self, + msg: T, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + P: 'async_trait + Priority, + 'life0: 'async_trait, + Self: 'async_trait, + { + match P::priority() { + polkadot_overseer::PriorityLevel::Normal => self.send_message(msg), + polkadot_overseer::PriorityLevel::High => + async { self.send_unbounded_message(msg) }.boxed(), + } + } + + fn try_send_message_with_priority( + &mut self, + msg: T, + ) -> Result<(), metered::TrySendError> { + match P::priority() { + polkadot_overseer::PriorityLevel::Normal => self.try_send_message(msg), + polkadot_overseer::PriorityLevel::High => Ok(self.send_unbounded_message(msg)), + } + } +} + +/// Handles that are used by an worker to receive work. +pub struct RxWorker( + MeteredReceiver>, + UnboundedMeteredReceiver>, +); + +// Build all the necessary channels for sending messages to an worker +// and for the worker to receive them. +fn build_channels( + channel_name: String, + channel_size: usize, + metrics_watcher: &mut MetricsWatcher, +) -> (ToWorker, RxWorker) { + let (tx_work, rx_work) = channel::>(channel_size); + let (tx_work_unbounded, rx_work_unbounded) = unbounded::>(); + let to_worker = ToWorker(tx_work, tx_work_unbounded); + + metrics_watcher.watch(channel_name, to_worker.meter()); + + (to_worker, RxWorker(rx_work, rx_work_unbounded)) +} + +/// Build the worker handles used for interacting with the workers. +/// +/// `ToWorker` is used for sending messages to the workers. +/// `WorkProvider` is used by the workers for receiving the messages. +fn build_worker_handles( + channel_name: String, + channel_size: usize, + metrics_watcher: &mut MetricsWatcher, + prio_right: Clos, +) -> (ToWorker, WorkProvider) +where + M: Send + Sync + 'static, + Clos: FnMut(&mut State) -> PollNext, + State: Default, +{ + let (to_worker, rx_worker) = build_channels(channel_name, channel_size, metrics_watcher); + (to_worker, WorkProviderImpl::from_rx_worker(rx_worker, prio_right)) +} + +/// Just a wrapper for implementing `overseer::SubsystemSender`, so +/// that we can inject into the approval voting subsystem. +#[derive(Clone)] +pub struct ApprovalVotingToApprovalDistribution>( + S, +); + +impl> + overseer::SubsystemSender for ApprovalVotingToApprovalDistribution +{ + #[allow(clippy::type_complexity, clippy::type_repetition_in_bounds)] + fn send_message<'life0, 'async_trait>( + &'life0 mut self, + msg: ApprovalDistributionMessage, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + 'life0: 'async_trait, + Self: 'async_trait, + { + self.0.send_message(msg.into()) + } + + fn try_send_message( + &mut self, + msg: ApprovalDistributionMessage, + ) -> Result<(), metered::TrySendError> { + self.0.try_send_message(msg.into()).map_err(|err| match err { + // Safe to unwrap because it was built from the same type. + metered::TrySendError::Closed(msg) => + metered::TrySendError::Closed(msg.try_into().unwrap()), + metered::TrySendError::Full(msg) => + metered::TrySendError::Full(msg.try_into().unwrap()), + }) + } + + #[allow(clippy::type_complexity, clippy::type_repetition_in_bounds)] + fn send_messages<'life0, 'async_trait, I>( + &'life0 mut self, + msgs: I, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + I: IntoIterator + Send, + I::IntoIter: Send, + I: 'async_trait, + 'life0: 'async_trait, + Self: 'async_trait, + { + self.0.send_messages(msgs.into_iter().map(|msg| msg.into())) + } + + fn send_unbounded_message(&mut self, msg: ApprovalDistributionMessage) { + self.0.send_unbounded_message(msg.into()) + } + + fn send_message_with_priority<'life0, 'async_trait, P>( + &'life0 mut self, + msg: ApprovalDistributionMessage, + ) -> ::core::pin::Pin< + Box + ::core::marker::Send + 'async_trait>, + > + where + P: 'async_trait + Priority, + 'life0: 'async_trait, + Self: 'async_trait, + { + self.0.send_message_with_priority::

(msg.into()) + } + + fn try_send_message_with_priority( + &mut self, + msg: ApprovalDistributionMessage, + ) -> Result<(), metered::TrySendError> { + self.0.try_send_message_with_priority::

(msg.into()).map_err(|err| match err { + // Safe to unwrap because it was built from the same type. + metered::TrySendError::Closed(msg) => + metered::TrySendError::Closed(msg.try_into().unwrap()), + metered::TrySendError::Full(msg) => + metered::TrySendError::Full(msg.try_into().unwrap()), + }) + } +} diff --git a/polkadot/node/core/approval-voting-parallel/src/metrics.rs b/polkadot/node/core/approval-voting-parallel/src/metrics.rs new file mode 100644 index 000000000000..1b4ab4bd9b88 --- /dev/null +++ b/polkadot/node/core/approval-voting-parallel/src/metrics.rs @@ -0,0 +1,236 @@ +// Copyright (C) Parity Technologies (UK) Ltd. +// This file is part of Polkadot. + +// Polkadot is free software: you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. + +// Polkadot is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License +// along with Polkadot. If not, see . + +//! The Metrics for Approval Voting Parallel Subsystem. + +use std::collections::HashMap; + +use polkadot_node_metrics::{metered::Meter, metrics}; +use polkadot_overseer::prometheus; + +#[derive(Default, Clone)] +pub struct Metrics(Option); + +/// Approval Voting parallel metrics. +#[derive(Clone)] +pub struct MetricsInner { + // The inner metrics of the approval distribution workers. + approval_distribution: polkadot_approval_distribution::metrics::Metrics, + // The inner metrics of the approval voting workers. + approval_voting: polkadot_node_core_approval_voting::Metrics, + + // Time of flight metrics for bounded channels. + to_worker_bounded_tof: prometheus::HistogramVec, + // Number of elements sent to the worker's bounded queue. + to_worker_bounded_sent: prometheus::GaugeVec, + // Number of elements received by the worker's bounded queue. + to_worker_bounded_received: prometheus::GaugeVec, + // Number of times senders blocked while sending messages to the worker. + to_worker_bounded_blocked: prometheus::GaugeVec, + // Time of flight metrics for unbounded channels. + to_worker_unbounded_tof: prometheus::HistogramVec, + // Number of elements sent to the worker's unbounded queue. + to_worker_unbounded_sent: prometheus::GaugeVec, + // Number of elements received by the worker's unbounded queue. + to_worker_unbounded_received: prometheus::GaugeVec, +} + +impl Metrics { + /// Get the approval distribution metrics. + pub fn approval_distribution_metrics( + &self, + ) -> polkadot_approval_distribution::metrics::Metrics { + self.0 + .as_ref() + .map(|metrics_inner| metrics_inner.approval_distribution.clone()) + .unwrap_or_default() + } + + /// Get the approval voting metrics. + pub fn approval_voting_metrics(&self) -> polkadot_node_core_approval_voting::Metrics { + self.0 + .as_ref() + .map(|metrics_inner| metrics_inner.approval_voting.clone()) + .unwrap_or_default() + } +} + +impl metrics::Metrics for Metrics { + /// Try to register the metrics. + fn try_register( + registry: &prometheus::Registry, + ) -> std::result::Result { + Ok(Metrics(Some(MetricsInner { + approval_distribution: polkadot_approval_distribution::metrics::Metrics::try_register( + registry, + )?, + approval_voting: polkadot_node_core_approval_voting::Metrics::try_register(registry)?, + to_worker_bounded_tof: prometheus::register( + prometheus::HistogramVec::new( + prometheus::HistogramOpts::new( + "polkadot_approval_voting_parallel_worker_bounded_tof", + "Duration spent in a particular approval voting worker channel from entrance to removal", + ) + .buckets(vec![ + 0.0001, 0.0004, 0.0016, 0.0064, 0.0256, 0.1024, 0.4096, 1.6384, 3.2768, + 4.9152, 6.5536, + ]), + &["worker_name"], + )?, + registry, + )?, + to_worker_bounded_sent: prometheus::register( + prometheus::GaugeVec::::new( + prometheus::Opts::new( + "polkadot_approval_voting_parallel_worker_bounded_sent", + "Number of elements sent to approval voting workers' bounded queues", + ), + &["worker_name"], + )?, + registry, + )?, + to_worker_bounded_received: prometheus::register( + prometheus::GaugeVec::::new( + prometheus::Opts::new( + "polkadot_approval_voting_parallel_worker_bounded_received", + "Number of elements received by approval voting workers' bounded queues", + ), + &["worker_name"], + )?, + registry, + )?, + to_worker_bounded_blocked: prometheus::register( + prometheus::GaugeVec::::new( + prometheus::Opts::new( + "polkadot_approval_voting_parallel_worker_bounded_blocked", + "Number of times approval voting workers blocked while sending messages to a subsystem", + ), + &["worker_name"], + )?, + registry, + )?, + to_worker_unbounded_tof: prometheus::register( + prometheus::HistogramVec::new( + prometheus::HistogramOpts::new( + "polkadot_approval_voting_parallel_worker_unbounded_tof", + "Duration spent in a particular approval voting worker channel from entrance to removal", + ) + .buckets(vec![ + 0.0001, 0.0004, 0.0016, 0.0064, 0.0256, 0.1024, 0.4096, 1.6384, 3.2768, + 4.9152, 6.5536, + ]), + &["worker_name"], + )?, + registry, + )?, + to_worker_unbounded_sent: prometheus::register( + prometheus::GaugeVec::::new( + prometheus::Opts::new( + "polkadot_approval_voting_parallel_worker_unbounded_sent", + "Number of elements sent to approval voting workers' unbounded queues", + ), + &["worker_name"], + )?, + registry, + )?, + to_worker_unbounded_received: prometheus::register( + prometheus::GaugeVec::::new( + prometheus::Opts::new( + "polkadot_approval_voting_parallel_worker_unbounded_received", + "Number of elements received by approval voting workers' unbounded queues", + ), + &["worker_name"], + )?, + registry, + )?, + }))) + } +} + +/// The meters to watch. +#[derive(Clone)] +pub struct Meters { + bounded: Meter, + unbounded: Meter, +} + +impl Meters { + pub fn new(bounded: &Meter, unbounded: &Meter) -> Self { + Self { bounded: bounded.clone(), unbounded: unbounded.clone() } + } +} + +/// A metrics watcher that watches the meters and updates the metrics. +pub struct MetricsWatcher { + to_watch: HashMap, + metrics: Metrics, +} + +impl MetricsWatcher { + /// Create a new metrics watcher. + pub fn new(metrics: Metrics) -> Self { + Self { to_watch: HashMap::new(), metrics } + } + + /// Watch the meters of a worker with this name. + pub fn watch(&mut self, worker_name: String, meters: Meters) { + self.to_watch.insert(worker_name, meters); + } + + /// Collect all the metrics. + pub fn collect_metrics(&self) { + for (name, meter) in &self.to_watch { + let bounded_readouts = meter.bounded.read(); + let unbounded_readouts = meter.unbounded.read(); + if let Some(metrics) = self.metrics.0.as_ref() { + metrics + .to_worker_bounded_sent + .with_label_values(&[name]) + .set(bounded_readouts.sent as u64); + + metrics + .to_worker_bounded_received + .with_label_values(&[name]) + .set(bounded_readouts.received as u64); + + metrics + .to_worker_bounded_blocked + .with_label_values(&[name]) + .set(bounded_readouts.blocked as u64); + + metrics + .to_worker_unbounded_sent + .with_label_values(&[name]) + .set(unbounded_readouts.sent as u64); + + metrics + .to_worker_unbounded_received + .with_label_values(&[name]) + .set(unbounded_readouts.received as u64); + + let hist_bounded = metrics.to_worker_bounded_tof.with_label_values(&[name]); + for tof in bounded_readouts.tof { + hist_bounded.observe(tof.as_f64()); + } + + let hist_unbounded = metrics.to_worker_unbounded_tof.with_label_values(&[name]); + for tof in unbounded_readouts.tof { + hist_unbounded.observe(tof.as_f64()); + } + } + } + } +} diff --git a/polkadot/node/core/approval-voting-parallel/src/tests.rs b/polkadot/node/core/approval-voting-parallel/src/tests.rs new file mode 100644 index 000000000000..215a707147fc --- /dev/null +++ b/polkadot/node/core/approval-voting-parallel/src/tests.rs @@ -0,0 +1,1178 @@ +// Copyright (C) Parity Technologies (UK) Ltd. +// This file is part of Polkadot. + +// Polkadot is free software: you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. + +// Polkadot is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License +// along with Polkadot. If not, see . + +//! The tests for Approval Voting Parallel Subsystem. + +use std::{ + collections::{HashMap, HashSet}, + future::Future, + sync::Arc, + time::Duration, +}; + +use crate::{ + build_worker_handles, metrics::MetricsWatcher, prio_right, run_main_loop, start_workers, + validator_index_for_msg, ApprovalVotingParallelSubsystem, Metrics, WorkProvider, +}; +use assert_matches::assert_matches; +use futures::{channel::oneshot, future, stream::PollNext, StreamExt}; +use itertools::Itertools; +use polkadot_node_core_approval_voting::{ApprovalVotingWorkProvider, Config}; +use polkadot_node_network_protocol::{peer_set::ValidationVersion, ObservedRole, PeerId, View}; +use polkadot_node_primitives::approval::{ + time::SystemClock, + v1::{ + AssignmentCert, AssignmentCertKind, IndirectAssignmentCert, IndirectSignedApprovalVote, + RELAY_VRF_MODULO_CONTEXT, + }, + v2::{ + AssignmentCertKindV2, AssignmentCertV2, CoreBitfield, IndirectAssignmentCertV2, + IndirectSignedApprovalVoteV2, + }, +}; +use polkadot_node_subsystem::{ + messages::{ApprovalDistributionMessage, ApprovalVotingMessage, ApprovalVotingParallelMessage}, + FromOrchestra, +}; +use polkadot_node_subsystem_test_helpers::{mock::new_leaf, TestSubsystemContext}; +use polkadot_overseer::{ActiveLeavesUpdate, OverseerSignal, SpawnGlue, TimeoutExt}; +use polkadot_primitives::{CandidateHash, CoreIndex, Hash, ValidatorIndex}; +use sc_keystore::{Keystore, LocalKeystore}; +use sp_consensus::SyncOracle; +use sp_consensus_babe::{VrfPreOutput, VrfProof, VrfSignature}; +use sp_core::{testing::TaskExecutor, H256}; +use sp_keyring::Sr25519Keyring; +type VirtualOverseer = + polkadot_node_subsystem_test_helpers::TestSubsystemContextHandle; + +const SLOT_DURATION_MILLIS: u64 = 6000; + +pub mod test_constants { + pub(crate) const DATA_COL: u32 = 0; + pub(crate) const NUM_COLUMNS: u32 = 1; +} + +fn fake_assignment_cert(block_hash: Hash, validator: ValidatorIndex) -> IndirectAssignmentCert { + let ctx = schnorrkel::signing_context(RELAY_VRF_MODULO_CONTEXT); + let msg = b"WhenParachains?"; + let mut prng = rand_core::OsRng; + let keypair = schnorrkel::Keypair::generate_with(&mut prng); + let (inout, proof, _) = keypair.vrf_sign(ctx.bytes(msg)); + let preout = inout.to_preout(); + + IndirectAssignmentCert { + block_hash, + validator, + cert: AssignmentCert { + kind: AssignmentCertKind::RelayVRFModulo { sample: 1 }, + vrf: VrfSignature { pre_output: VrfPreOutput(preout), proof: VrfProof(proof) }, + }, + } +} + +fn fake_assignment_cert_v2( + block_hash: Hash, + validator: ValidatorIndex, + core_bitfield: CoreBitfield, +) -> IndirectAssignmentCertV2 { + let ctx = schnorrkel::signing_context(RELAY_VRF_MODULO_CONTEXT); + let msg = b"WhenParachains?"; + let mut prng = rand_core::OsRng; + let keypair = schnorrkel::Keypair::generate_with(&mut prng); + let (inout, proof, _) = keypair.vrf_sign(ctx.bytes(msg)); + let preout = inout.to_preout(); + + IndirectAssignmentCertV2 { + block_hash, + validator, + cert: AssignmentCertV2 { + kind: AssignmentCertKindV2::RelayVRFModuloCompact { core_bitfield }, + vrf: VrfSignature { pre_output: VrfPreOutput(preout), proof: VrfProof(proof) }, + }, + } +} + +/// Creates a meaningless signature +pub fn dummy_signature() -> polkadot_primitives::ValidatorSignature { + sp_core::crypto::UncheckedFrom::unchecked_from([1u8; 64]) +} + +fn build_subsystem( + sync_oracle: Box, +) -> ( + ApprovalVotingParallelSubsystem, + TestSubsystemContext>, + VirtualOverseer, +) { + sp_tracing::init_for_tests(); + + let pool = sp_core::testing::TaskExecutor::new(); + let (context, virtual_overseer) = polkadot_node_subsystem_test_helpers::make_subsystem_context::< + ApprovalVotingParallelMessage, + _, + >(pool.clone()); + + let keystore = LocalKeystore::in_memory(); + let _ = keystore.sr25519_generate_new( + polkadot_primitives::PARACHAIN_KEY_TYPE_ID, + Some(&Sr25519Keyring::Alice.to_seed()), + ); + + let clock = Arc::new(SystemClock {}); + let db = kvdb_memorydb::create(test_constants::NUM_COLUMNS); + let db = polkadot_node_subsystem_util::database::kvdb_impl::DbAdapter::new(db, &[]); + + ( + ApprovalVotingParallelSubsystem::with_config_and_clock( + Config { + col_approval_data: test_constants::DATA_COL, + slot_duration_millis: SLOT_DURATION_MILLIS, + }, + Arc::new(db), + Arc::new(keystore), + sync_oracle, + Metrics::default(), + clock.clone(), + SpawnGlue(pool), + None, + ), + context, + virtual_overseer, + ) +} + +#[derive(Clone)] +struct TestSyncOracle {} + +impl SyncOracle for TestSyncOracle { + fn is_major_syncing(&self) -> bool { + false + } + + fn is_offline(&self) -> bool { + unimplemented!("not used in network bridge") + } +} + +fn test_harness( + num_approval_distro_workers: usize, + prio_right: Clos, + subsystem_gracefully_exits: bool, + test_fn: impl FnOnce( + VirtualOverseer, + WorkProvider, + Vec>, + ) -> T, +) where + T: Future, + Clos: Clone + FnMut(&mut State) -> PollNext, + State: Default, +{ + let (subsystem, context, virtual_overseer) = build_subsystem(Box::new(TestSyncOracle {})); + let mut metrics_watcher = MetricsWatcher::new(subsystem.metrics.clone()); + let channel_size = 5; + + let (to_approval_voting_worker, approval_voting_work_provider) = + build_worker_handles::( + "to_approval_voting_worker".into(), + channel_size, + &mut metrics_watcher, + prio_right.clone(), + ); + + let approval_distribution_channels = { 0..num_approval_distro_workers } + .into_iter() + .map(|worker_index| { + build_worker_handles::( + format!("to_approval_distro/{}", worker_index), + channel_size, + &mut metrics_watcher, + prio_right.clone(), + ) + }) + .collect_vec(); + + let to_approval_distribution_workers = + approval_distribution_channels.iter().map(|(tx, _)| tx.clone()).collect_vec(); + let approval_distribution_work_providers = + approval_distribution_channels.into_iter().map(|(_, rx)| rx).collect_vec(); + + let subsystem = async move { + let result = run_main_loop( + context, + to_approval_voting_worker, + to_approval_distribution_workers, + metrics_watcher, + ) + .await; + + if subsystem_gracefully_exits && result.is_err() { + result + } else { + Ok(()) + } + }; + + let test_fut = test_fn( + virtual_overseer, + approval_voting_work_provider, + approval_distribution_work_providers, + ); + + futures::pin_mut!(test_fut); + futures::pin_mut!(subsystem); + + futures::executor::block_on(future::join( + async move { + let _overseer = test_fut.await; + }, + subsystem, + )) + .1 + .unwrap(); +} + +const TIMEOUT: Duration = Duration::from_millis(2000); + +async fn overseer_signal(overseer: &mut VirtualOverseer, signal: OverseerSignal) { + overseer + .send(FromOrchestra::Signal(signal)) + .timeout(TIMEOUT) + .await + .expect(&format!("{:?} is more than enough for sending signals.", TIMEOUT)); +} + +async fn overseer_message(overseer: &mut VirtualOverseer, msg: ApprovalVotingParallelMessage) { + overseer + .send(FromOrchestra::Communication { msg }) + .timeout(TIMEOUT) + .await + .expect(&format!("{:?} is more than enough for sending signals.", TIMEOUT)); +} + +async fn run_start_workers() { + let (subsystem, mut context, _) = build_subsystem(Box::new(TestSyncOracle {})); + let mut metrics_watcher = MetricsWatcher::new(subsystem.metrics.clone()); + let _workers = start_workers(&mut context, subsystem, &mut metrics_watcher).await.unwrap(); +} + +// Test starting the workers succeeds. +#[test] +fn start_workers_succeeds() { + futures::executor::block_on(run_start_workers()); +} + +// Test main loop forwards messages to the correct worker for all type of messages. +#[test] +fn test_main_loop_forwards_correctly() { + let num_approval_distro_workers = 4; + test_harness( + num_approval_distro_workers, + prio_right, + true, + |mut overseer, mut approval_voting_work_provider, mut rx_approval_distribution_workers| async move { + // 1. Check Signals are correctly forwarded to the workers. + let signal = OverseerSignal::ActiveLeaves(ActiveLeavesUpdate::start_work(new_leaf( + Hash::random(), + 1, + ))); + overseer_signal(&mut overseer, signal.clone()).await; + let approval_voting_receives = approval_voting_work_provider.recv().await.unwrap(); + assert_matches!(approval_voting_receives, FromOrchestra::Signal(_)); + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + let approval_distribution_receives = + rx_approval_distribution_worker.next().await.unwrap(); + assert_matches!(approval_distribution_receives, FromOrchestra::Signal(_)); + } + + let (test_tx, _rx) = oneshot::channel(); + let test_hash = Hash::random(); + let test_block_nr = 2; + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::ApprovedAncestor(test_hash, test_block_nr, test_tx), + ) + .await; + assert_matches!( + approval_voting_work_provider.recv().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalVotingMessage::ApprovedAncestor(hash, block_nr, _) + } => { + assert_eq!(hash, test_hash); + assert_eq!(block_nr, test_block_nr); + } + ); + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + + // 2. Check GetApprovalSignaturesForCandidate is correctly forwarded to the workers. + let (test_tx, _rx) = oneshot::channel(); + let test_hash = CandidateHash(Hash::random()); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate( + test_hash, test_tx, + ), + ) + .await; + + assert_matches!( + approval_voting_work_provider.recv().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalVotingMessage::GetApprovalSignaturesForCandidate(hash, _) + } => { + assert_eq!(hash, test_hash); + } + ); + + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + + // 3. Check NewBlocks is correctly forwarded to the workers. + overseer_message(&mut overseer, ApprovalVotingParallelMessage::NewBlocks(vec![])).await; + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::NewBlocks(blocks) + } => { + assert!(blocks.is_empty()); + } + ); + } + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + // 4. Check DistributeAssignment is correctly forwarded to the workers. + let validator_index = ValidatorIndex(17); + let assignment = + fake_assignment_cert_v2(Hash::random(), validator_index, CoreIndex(1).into()); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::DistributeAssignment(assignment.clone(), 1.into()), + ) + .await; + + for (index, rx_approval_distribution_worker) in + rx_approval_distribution_workers.iter_mut().enumerate() + { + if index == validator_index.0 as usize % num_approval_distro_workers { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::DistributeAssignment(cert, bitfield) + } => { + assert_eq!(cert, assignment); + assert_eq!(bitfield, 1.into()); + } + ); + } else { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + } + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + // 5. Check DistributeApproval is correctly forwarded to the workers. + let validator_index = ValidatorIndex(26); + let expected_vote = IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: validator_index, + signature: dummy_signature(), + }; + + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::DistributeApproval(expected_vote.clone()), + ) + .await; + + for (index, rx_approval_distribution_worker) in + rx_approval_distribution_workers.iter_mut().enumerate() + { + if index == validator_index.0 as usize % num_approval_distro_workers { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::DistributeApproval(vote) + } => { + assert_eq!(vote, expected_vote); + } + ); + } else { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + } + + // 6. Check NetworkBridgeUpdate::PeerMessage is correctly forwarded just to one of the + // workers. + let approvals = vec![ + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 2.into(), + validator: validator_index, + signature: dummy_signature(), + }, + ]; + let expected_msg = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals( + approvals.clone(), + ), + ); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + PeerId::random(), + expected_msg.clone(), + ), + ), + ) + .await; + + for (index, rx_approval_distribution_worker) in + rx_approval_distribution_workers.iter_mut().enumerate() + { + if index == validator_index.0 as usize % num_approval_distro_workers { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + _, + msg, + ), + ) + } => { + assert_eq!(msg, expected_msg); + } + ); + } else { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + } + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + // 7. Check NetworkBridgeUpdate::PeerConnected is correctly forwarded to all workers. + let expected_peer_id = PeerId::random(); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerConnected( + expected_peer_id, + ObservedRole::Authority, + ValidationVersion::V3.into(), + None, + ), + ), + ) + .await; + + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerConnected( + peer_id, + role, + version, + authority_id, + ), + ) + } => { + assert_eq!(peer_id, expected_peer_id); + assert_eq!(role, ObservedRole::Authority); + assert_eq!(version, ValidationVersion::V3.into()); + assert_eq!(authority_id, None); + } + ); + } + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + // 8. Check ApprovalCheckingLagUpdate is correctly forwarded to all workers. + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate(7), + ) + .await; + + for rx_approval_distribution_worker in rx_approval_distribution_workers.iter_mut() { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::ApprovalCheckingLagUpdate( + lag + ) + } => { + assert_eq!(lag, 7); + } + ); + } + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + overseer_signal(&mut overseer, OverseerSignal::Conclude).await; + + overseer + }, + ); +} + +/// Test GetApprovalSignatures correctly gatheres the signatures from all workers. +#[test] +fn test_handle_get_approval_signatures() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + true, + |mut overseer, mut approval_voting_work_provider, mut rx_approval_distribution_workers| async move { + let (tx, rx) = oneshot::channel(); + let first_block = Hash::random(); + let second_block = Hash::random(); + let expected_candidates: HashSet<_> = + vec![(first_block, 2), (second_block, 3)].into_iter().collect(); + + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::GetApprovalSignatures( + expected_candidates.clone(), + tx, + ), + ) + .await; + + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + let mut all_votes = HashMap::new(); + for (index, rx_approval_distribution_worker) in + rx_approval_distribution_workers.iter_mut().enumerate() + { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::GetApprovalSignatures( + candidates, tx + ) + } => { + assert_eq!(candidates, expected_candidates); + let to_send: HashMap<_, _> = {0..10}.into_iter().map(|validator| { + let validator_index = ValidatorIndex(validator as u32 * num_approval_distro_workers as u32 + index as u32); + (validator_index, (first_block, vec![2, 4], dummy_signature())) + }).collect(); + tx.send(to_send.clone()).unwrap(); + all_votes.extend(to_send.clone()); + + } + ); + } + + let received_votes = rx.await.unwrap(); + assert_eq!(received_votes, all_votes); + overseer_signal(&mut overseer, OverseerSignal::Conclude).await; + + overseer + }, + ) +} + +/// Test subsystem exits with error when approval_voting_work_provider exits. +#[test] +fn test_subsystem_exits_with_error_if_approval_voting_worker_errors() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + false, + |overseer, approval_voting_work_provider, _rx_approval_distribution_workers| async move { + // Drop the approval_voting_work_provider to simulate an error. + std::mem::drop(approval_voting_work_provider); + + overseer + }, + ) +} + +/// Test subsystem exits with error when approval_distribution_workers exits. +#[test] +fn test_subsystem_exits_with_error_if_approval_distribution_worker_errors() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + false, + |overseer, _approval_voting_work_provider, rx_approval_distribution_workers| async move { + // Drop the approval_distribution_workers to simulate an error. + std::mem::drop(rx_approval_distribution_workers.into_iter().next().unwrap()); + overseer + }, + ) +} + +/// Test signals sent before messages are processed in order. +#[test] +fn test_signal_before_message_keeps_receive_order() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + true, + |mut overseer, mut approval_voting_work_provider, mut rx_approval_distribution_workers| async move { + let signal = OverseerSignal::ActiveLeaves(ActiveLeavesUpdate::start_work(new_leaf( + Hash::random(), + 1, + ))); + overseer_signal(&mut overseer, signal.clone()).await; + + let validator_index = ValidatorIndex(17); + let assignment = + fake_assignment_cert_v2(Hash::random(), validator_index, CoreIndex(1).into()); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::DistributeAssignment(assignment.clone(), 1.into()), + ) + .await; + + let approval_voting_receives = approval_voting_work_provider.recv().await.unwrap(); + assert_matches!(approval_voting_receives, FromOrchestra::Signal(_)); + let rx_approval_distribution_worker = rx_approval_distribution_workers + .get_mut(validator_index.0 as usize % num_approval_distro_workers) + .unwrap(); + let approval_distribution_receives = + rx_approval_distribution_worker.next().await.unwrap(); + assert_matches!(approval_distribution_receives, FromOrchestra::Signal(_)); + assert_matches!( + rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::DistributeAssignment(_, _) + } + ); + + overseer_signal(&mut overseer, OverseerSignal::Conclude).await; + overseer + }, + ) +} + +/// Test signals sent after messages are processed with the highest priority. +#[test] +fn test_signal_is_prioritized_when_unread_messages_in_the_queue() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + true, + |mut overseer, mut approval_voting_work_provider, mut rx_approval_distribution_workers| async move { + let validator_index = ValidatorIndex(17); + let assignment = + fake_assignment_cert_v2(Hash::random(), validator_index, CoreIndex(1).into()); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::DistributeAssignment(assignment.clone(), 1.into()), + ) + .await; + + let signal = OverseerSignal::ActiveLeaves(ActiveLeavesUpdate::start_work(new_leaf( + Hash::random(), + 1, + ))); + overseer_signal(&mut overseer, signal.clone()).await; + + let approval_voting_receives = approval_voting_work_provider.recv().await.unwrap(); + assert_matches!(approval_voting_receives, FromOrchestra::Signal(_)); + let rx_approval_distribution_worker = rx_approval_distribution_workers + .get_mut(validator_index.0 as usize % num_approval_distro_workers) + .unwrap(); + let approval_distribution_receives = + rx_approval_distribution_worker.next().await.unwrap(); + assert_matches!(approval_distribution_receives, FromOrchestra::Signal(_)); + assert_matches!( + rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::DistributeAssignment(_, _) + } + ); + + overseer_signal(&mut overseer, OverseerSignal::Conclude).await; + overseer + }, + ) +} + +/// Test peer view updates have higher priority than normal messages. +#[test] +fn test_peer_view_is_prioritized_when_unread_messages_in_the_queue() { + let num_approval_distro_workers = 4; + + test_harness( + num_approval_distro_workers, + prio_right, + true, + |mut overseer, mut approval_voting_work_provider, mut rx_approval_distribution_workers| async move { + let validator_index = ValidatorIndex(17); + let approvals = vec![ + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 2.into(), + validator: validator_index, + signature: dummy_signature(), + }, + ]; + let expected_msg = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals( + approvals.clone(), + ), + ); + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + PeerId::random(), + expected_msg.clone(), + ), + ), + ) + .await; + + overseer_message( + &mut overseer, + ApprovalVotingParallelMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerViewChange( + PeerId::random(), + View::default(), + ), + ), + ) + .await; + + for (index, rx_approval_distribution_worker) in + rx_approval_distribution_workers.iter_mut().enumerate() + { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerViewChange( + _, + _, + ), + ) + } => { + } + ); + if index == validator_index.0 as usize % num_approval_distro_workers { + assert_matches!(rx_approval_distribution_worker.next().await.unwrap(), + FromOrchestra::Communication { + msg: ApprovalDistributionMessage::NetworkBridgeUpdate( + polkadot_node_subsystem::messages::NetworkBridgeEvent::PeerMessage( + _, + msg, + ), + ) + } => { + assert_eq!(msg, expected_msg); + } + ); + } else { + assert!(rx_approval_distribution_worker + .next() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + } + } + + assert!(approval_voting_work_provider + .recv() + .timeout(Duration::from_millis(200)) + .await + .is_none()); + + overseer_signal(&mut overseer, OverseerSignal::Conclude).await; + overseer + }, + ) +} + +// Test validator_index_for_msg with empty messages. +#[test] +fn test_validator_index_with_empty_message() { + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); + + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); + + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); + + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Approvals(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); + + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); + + let result = validator_index_for_msg(polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals(vec![]), + )); + + assert_eq!(result, (None, Some(vec![]))); +} + +// Test validator_index_for_msg when all the messages are originating from the same validator. +#[test] +fn test_validator_index_with_all_messages_from_the_same_validator() { + let validator_index = ValidatorIndex(3); + let v1_assignment = polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments(vec![ + (fake_assignment_cert(H256::random(), validator_index), 1), + (fake_assignment_cert(H256::random(), validator_index), 3), + ]), + ); + let result = validator_index_for_msg(v1_assignment.clone()); + + assert_eq!(result, (Some((validator_index, v1_assignment)), None)); + + let v1_approval = polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Approvals(vec![ + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 1, + validator: validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 1, + validator: validator_index, + signature: dummy_signature(), + }, + ]), + ); + let result = validator_index_for_msg(v1_approval.clone()); + + assert_eq!(result, (Some((validator_index, v1_approval)), None)); + + let validator_index = ValidatorIndex(3); + let v2_assignment = polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments(vec![ + (fake_assignment_cert(H256::random(), validator_index), 1), + (fake_assignment_cert(H256::random(), validator_index), 3), + ]), + ); + let result = validator_index_for_msg(v2_assignment.clone()); + + assert_eq!(result, (Some((validator_index, v2_assignment)), None)); + + let v2_approval = polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals(vec![ + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 1, + validator: validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 1, + validator: validator_index, + signature: dummy_signature(), + }, + ]), + ); + let result = validator_index_for_msg(v2_approval.clone()); + + assert_eq!(result, (Some((validator_index, v2_approval)), None)); + + let validator_index = ValidatorIndex(3); + let v3_assignment = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments(vec![ + ( + fake_assignment_cert_v2(H256::random(), validator_index, CoreIndex(1).into()), + 1.into(), + ), + ( + fake_assignment_cert_v2(H256::random(), validator_index, CoreIndex(3).into()), + 3.into(), + ), + ]), + ); + let result = validator_index_for_msg(v3_assignment.clone()); + + assert_eq!(result, (Some((validator_index, v3_assignment)), None)); + + let v3_approval = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals(vec![ + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: validator_index, + signature: dummy_signature(), + }, + ]), + ); + let result = validator_index_for_msg(v3_approval.clone()); + + assert_eq!(result, (Some((validator_index, v3_approval)), None)); +} + +// Test validator_index_for_msg when all the messages are originating from different validators, +// so the function should split them by validator index, so we can forward them separately to the +// worker they are assigned to. +#[test] +fn test_validator_index_with_messages_from_different_validators() { + let first_validator_index = ValidatorIndex(3); + let second_validator_index = ValidatorIndex(4); + let assignments = vec![ + (fake_assignment_cert(H256::random(), first_validator_index), 1), + (fake_assignment_cert(H256::random(), second_validator_index), 3), + ]; + let v1_assignment = polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments( + assignments.clone(), + ), + ); + let result = validator_index_for_msg(v1_assignment.clone()); + + assert_matches!(result, (None, Some(_))); + let messsages_split_by_validator = result.1.unwrap(); + assert_eq!(messsages_split_by_validator.len(), assignments.len()); + for (index, (validator_index, message)) in messsages_split_by_validator.into_iter().enumerate() + { + assert_eq!(validator_index, assignments[index].0.validator); + assert_eq!( + message, + polkadot_node_network_protocol::Versioned::V1( + polkadot_node_network_protocol::v1::ApprovalDistributionMessage::Assignments( + assignments.get(index).into_iter().cloned().collect(), + ), + ) + ); + } + + let v2_assignment = polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments( + assignments.clone(), + ), + ); + let result = validator_index_for_msg(v2_assignment.clone()); + + assert_matches!(result, (None, Some(_))); + let messsages_split_by_validator = result.1.unwrap(); + assert_eq!(messsages_split_by_validator.len(), assignments.len()); + for (index, (validator_index, message)) in messsages_split_by_validator.into_iter().enumerate() + { + assert_eq!(validator_index, assignments[index].0.validator); + assert_eq!( + message, + polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Assignments( + assignments.get(index).into_iter().cloned().collect(), + ), + ) + ); + } + + let first_validator_index = ValidatorIndex(3); + let second_validator_index = ValidatorIndex(4); + let v2_assignments = vec![ + ( + fake_assignment_cert_v2(H256::random(), first_validator_index, CoreIndex(1).into()), + 1.into(), + ), + ( + fake_assignment_cert_v2(H256::random(), second_validator_index, CoreIndex(3).into()), + 3.into(), + ), + ]; + + let approvals = vec![ + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 1, + validator: first_validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVote { + block_hash: H256::random(), + candidate_index: 2, + validator: second_validator_index, + signature: dummy_signature(), + }, + ]; + let v2_approvals = polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals( + approvals.clone(), + ), + ); + let result = validator_index_for_msg(v2_approvals.clone()); + + assert_matches!(result, (None, Some(_))); + let messsages_split_by_validator = result.1.unwrap(); + assert_eq!(messsages_split_by_validator.len(), approvals.len()); + for (index, (validator_index, message)) in messsages_split_by_validator.into_iter().enumerate() + { + assert_eq!(validator_index, approvals[index].validator); + assert_eq!( + message, + polkadot_node_network_protocol::Versioned::V2( + polkadot_node_network_protocol::v2::ApprovalDistributionMessage::Approvals( + approvals.get(index).into_iter().cloned().collect(), + ), + ) + ); + } + + let v3_assignment = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments( + v2_assignments.clone(), + ), + ); + let result = validator_index_for_msg(v3_assignment.clone()); + + assert_matches!(result, (None, Some(_))); + let messsages_split_by_validator = result.1.unwrap(); + assert_eq!(messsages_split_by_validator.len(), v2_assignments.len()); + for (index, (validator_index, message)) in messsages_split_by_validator.into_iter().enumerate() + { + assert_eq!(validator_index, v2_assignments[index].0.validator); + assert_eq!( + message, + polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Assignments( + v2_assignments.get(index).into_iter().cloned().collect(), + ), + ) + ); + } + + let approvals = vec![ + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 1.into(), + validator: first_validator_index, + signature: dummy_signature(), + }, + IndirectSignedApprovalVoteV2 { + block_hash: H256::random(), + candidate_indices: 2.into(), + validator: second_validator_index, + signature: dummy_signature(), + }, + ]; + let v3_approvals = polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals( + approvals.clone(), + ), + ); + let result = validator_index_for_msg(v3_approvals.clone()); + + assert_matches!(result, (None, Some(_))); + let messsages_split_by_validator = result.1.unwrap(); + assert_eq!(messsages_split_by_validator.len(), approvals.len()); + for (index, (validator_index, message)) in messsages_split_by_validator.into_iter().enumerate() + { + assert_eq!(validator_index, approvals[index].validator); + assert_eq!( + message, + polkadot_node_network_protocol::Versioned::V3( + polkadot_node_network_protocol::v3::ApprovalDistributionMessage::Approvals( + approvals.get(index).into_iter().cloned().collect(), + ), + ) + ); + } +} diff --git a/polkadot/node/core/approval-voting/benches/approval-voting-regression-bench.rs b/polkadot/node/core/approval-voting/benches/approval-voting-regression-bench.rs index 0b03f1127ee8..db0396a83193 100644 --- a/polkadot/node/core/approval-voting/benches/approval-voting-regression-bench.rs +++ b/polkadot/node/core/approval-voting/benches/approval-voting-regression-bench.rs @@ -53,6 +53,7 @@ fn main() -> Result<(), String> { stop_when_approved: false, workdir_prefix: "/tmp".to_string(), num_no_shows_per_candidate: 0, + approval_voting_parallel_enabled: true, }; println!("Benchmarking..."); diff --git a/polkadot/node/core/dispute-coordinator/src/initialized.rs b/polkadot/node/core/dispute-coordinator/src/initialized.rs index 5096fe5e6891..9cf9047b7273 100644 --- a/polkadot/node/core/dispute-coordinator/src/initialized.rs +++ b/polkadot/node/core/dispute-coordinator/src/initialized.rs @@ -34,8 +34,9 @@ use polkadot_node_primitives::{ }; use polkadot_node_subsystem::{ messages::{ - ApprovalVotingMessage, BlockDescription, ChainSelectionMessage, DisputeCoordinatorMessage, - DisputeDistributionMessage, ImportStatementsResult, + ApprovalVotingMessage, ApprovalVotingParallelMessage, BlockDescription, + ChainSelectionMessage, DisputeCoordinatorMessage, DisputeDistributionMessage, + ImportStatementsResult, }, overseer, ActivatedLeaf, ActiveLeavesUpdate, FromOrchestra, OverseerSignal, RuntimeApiError, }; @@ -117,6 +118,7 @@ pub(crate) struct Initialized { /// `CHAIN_IMPORT_MAX_BATCH_SIZE` and put the rest here for later processing. chain_import_backlog: VecDeque, metrics: Metrics, + approval_voting_parallel_enabled: bool, } #[overseer::contextbounds(DisputeCoordinator, prefix = self::overseer)] @@ -130,7 +132,13 @@ impl Initialized { highest_session_seen: SessionIndex, gaps_in_cache: bool, ) -> Self { - let DisputeCoordinatorSubsystem { config: _, store: _, keystore, metrics } = subsystem; + let DisputeCoordinatorSubsystem { + config: _, + store: _, + keystore, + metrics, + approval_voting_parallel_enabled, + } = subsystem; let (participation_sender, participation_receiver) = mpsc::channel(1); let participation = Participation::new(participation_sender, metrics.clone()); @@ -148,6 +156,7 @@ impl Initialized { participation_receiver, chain_import_backlog: VecDeque::new(), metrics, + approval_voting_parallel_enabled, } } @@ -1059,9 +1068,21 @@ impl Initialized { // 4. We are waiting (and blocking the whole subsystem) on a response right after - // therefore even with all else failing we will never have more than // one message in flight at any given time. - ctx.send_unbounded_message( - ApprovalVotingMessage::GetApprovalSignaturesForCandidate(candidate_hash, tx), - ); + if self.approval_voting_parallel_enabled { + ctx.send_unbounded_message( + ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate( + candidate_hash, + tx, + ), + ); + } else { + ctx.send_unbounded_message( + ApprovalVotingMessage::GetApprovalSignaturesForCandidate( + candidate_hash, + tx, + ), + ); + } match rx.await { Err(_) => { gum::warn!( diff --git a/polkadot/node/core/dispute-coordinator/src/lib.rs b/polkadot/node/core/dispute-coordinator/src/lib.rs index 34d9ddf3a97c..84408eb96305 100644 --- a/polkadot/node/core/dispute-coordinator/src/lib.rs +++ b/polkadot/node/core/dispute-coordinator/src/lib.rs @@ -122,6 +122,7 @@ pub struct DisputeCoordinatorSubsystem { store: Arc, keystore: Arc, metrics: Metrics, + approval_voting_parallel_enabled: bool, } /// Configuration for the dispute coordinator subsystem. @@ -164,8 +165,9 @@ impl DisputeCoordinatorSubsystem { config: Config, keystore: Arc, metrics: Metrics, + approval_voting_parallel_enabled: bool, ) -> Self { - Self { store, config, keystore, metrics } + Self { store, config, keystore, metrics, approval_voting_parallel_enabled } } /// Initialize and afterwards run `Initialized::run`. diff --git a/polkadot/node/core/dispute-coordinator/src/tests.rs b/polkadot/node/core/dispute-coordinator/src/tests.rs index f97a625a9528..b41cdb94b4d2 100644 --- a/polkadot/node/core/dispute-coordinator/src/tests.rs +++ b/polkadot/node/core/dispute-coordinator/src/tests.rs @@ -580,6 +580,7 @@ impl TestState { self.config, self.subsystem_keystore.clone(), Metrics::default(), + false, ); let backend = DbBackend::new(self.db.clone(), self.config.column_config(), Metrics::default()); diff --git a/polkadot/node/network/approval-distribution/src/lib.rs b/polkadot/node/network/approval-distribution/src/lib.rs index 971b6de5f8f6..2fcb639338ef 100644 --- a/polkadot/node/network/approval-distribution/src/lib.rs +++ b/polkadot/node/network/approval-distribution/src/lib.rs @@ -73,7 +73,8 @@ use std::{ time::Duration, }; -mod metrics; +/// Approval distribution metrics. +pub mod metrics; #[cfg(test)] mod tests; @@ -99,7 +100,7 @@ const MAX_BITFIELD_SIZE: usize = 500; pub struct ApprovalDistribution { metrics: Metrics, slot_duration_millis: u64, - clock: Box, + clock: Arc, assignment_criteria: Arc, } @@ -2668,7 +2669,7 @@ impl ApprovalDistribution { Self::new_with_clock( metrics, slot_duration_millis, - Box::new(SystemClock), + Arc::new(SystemClock), assignment_criteria, ) } @@ -2677,7 +2678,7 @@ impl ApprovalDistribution { pub fn new_with_clock( metrics: Metrics, slot_duration_millis: u64, - clock: Box, + clock: Arc, assignment_criteria: Arc, ) -> Self { Self { metrics, slot_duration_millis, clock, assignment_criteria } diff --git a/polkadot/node/network/approval-distribution/src/metrics.rs b/polkadot/node/network/approval-distribution/src/metrics.rs index 10553c352966..2f677ba415e4 100644 --- a/polkadot/node/network/approval-distribution/src/metrics.rs +++ b/polkadot/node/network/approval-distribution/src/metrics.rs @@ -79,31 +79,19 @@ impl Metrics { .map(|metrics| metrics.time_import_pending_now_known.start_timer()) } - pub fn on_approval_already_known(&self) { - if let Some(metrics) = &self.0 { - metrics.approvals_received_result.with_label_values(&["known"]).inc() - } - } - - pub fn on_approval_entry_not_found(&self) { - if let Some(metrics) = &self.0 { - metrics.approvals_received_result.with_label_values(&["noapprovalentry"]).inc() - } - } - - pub fn on_approval_recent_outdated(&self) { + pub(crate) fn on_approval_recent_outdated(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["outdated"]).inc() } } - pub fn on_approval_invalid_block(&self) { + pub(crate) fn on_approval_invalid_block(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["invalidblock"]).inc() } } - pub fn on_approval_unknown_assignment(&self) { + pub(crate) fn on_approval_unknown_assignment(&self) { if let Some(metrics) = &self.0 { metrics .approvals_received_result @@ -112,94 +100,73 @@ impl Metrics { } } - pub fn on_approval_duplicate(&self) { + pub(crate) fn on_approval_duplicate(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["duplicate"]).inc() } } - pub fn on_approval_out_of_view(&self) { + pub(crate) fn on_approval_out_of_view(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["outofview"]).inc() } } - pub fn on_approval_good_known(&self) { + pub(crate) fn on_approval_good_known(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["goodknown"]).inc() } } - pub fn on_approval_bad(&self) { + pub(crate) fn on_approval_bad(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["bad"]).inc() } } - pub fn on_approval_unexpected(&self) { - if let Some(metrics) = &self.0 { - metrics.approvals_received_result.with_label_values(&["unexpected"]).inc() - } - } - - pub fn on_approval_bug(&self) { + pub(crate) fn on_approval_bug(&self) { if let Some(metrics) = &self.0 { metrics.approvals_received_result.with_label_values(&["bug"]).inc() } } - pub fn on_assignment_already_known(&self) { - if let Some(metrics) = &self.0 { - metrics.assignments_received_result.with_label_values(&["known"]).inc() - } - } - - pub fn on_assignment_recent_outdated(&self) { + pub(crate) fn on_assignment_recent_outdated(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["outdated"]).inc() } } - pub fn on_assignment_invalid_block(&self) { + pub(crate) fn on_assignment_invalid_block(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["invalidblock"]).inc() } } - pub fn on_assignment_duplicate(&self) { + pub(crate) fn on_assignment_duplicate(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["duplicate"]).inc() } } - pub fn on_assignment_out_of_view(&self) { + pub(crate) fn on_assignment_out_of_view(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["outofview"]).inc() } } - pub fn on_assignment_good_known(&self) { + pub(crate) fn on_assignment_good_known(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["goodknown"]).inc() } } - pub fn on_assignment_bad(&self) { + pub(crate) fn on_assignment_bad(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["bad"]).inc() } } - pub fn on_assignment_duplicatevoting(&self) { - if let Some(metrics) = &self.0 { - metrics - .assignments_received_result - .with_label_values(&["duplicatevoting"]) - .inc() - } - } - - pub fn on_assignment_far(&self) { + pub(crate) fn on_assignment_far(&self) { if let Some(metrics) = &self.0 { metrics.assignments_received_result.with_label_values(&["far"]).inc() } diff --git a/polkadot/node/network/approval-distribution/src/tests.rs b/polkadot/node/network/approval-distribution/src/tests.rs index 4ee9320e0e45..068559dea762 100644 --- a/polkadot/node/network/approval-distribution/src/tests.rs +++ b/polkadot/node/network/approval-distribution/src/tests.rs @@ -54,7 +54,7 @@ type VirtualOverseer = fn test_harness>( assignment_criteria: Arc, - clock: Box, + clock: Arc, mut state: State, test_fn: impl FnOnce(VirtualOverseer) -> T, ) -> State { @@ -555,16 +555,15 @@ fn try_import_the_same_assignment() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; - // setup peers + setup_peer_with_view(overseer, &peer_a, view![], ValidationVersion::V1).await; setup_peer_with_view(overseer, &peer_b, view![hash], ValidationVersion::V1).await; setup_peer_with_view(overseer, &peer_c, view![hash], ValidationVersion::V1).await; - // Set up a gossip topology, where a, b, c and d are topology neighbors to the node // under testing. let peers_with_optional_peer_id = peers .iter() @@ -661,7 +660,7 @@ fn try_import_the_same_assignment_v2() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -772,7 +771,7 @@ fn delay_reputation_change() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_with_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -845,7 +844,7 @@ fn spam_attack_results_in_negative_reputation_change() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -942,7 +941,7 @@ fn peer_sending_us_the_same_we_just_sent_them_is_ok() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1043,7 +1042,7 @@ fn import_approval_happy_path_v1_v2_peers() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1183,7 +1182,7 @@ fn import_approval_happy_path_v2() { let candidate_hash_second = polkadot_primitives::CandidateHash(Hash::repeat_byte(0xCC)); let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1314,7 +1313,7 @@ fn multiple_assignments_covered_with_one_approval_vote() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1524,7 +1523,7 @@ fn unify_with_peer_multiple_assignments_covered_with_one_approval_vote() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1723,7 +1722,7 @@ fn import_approval_bad() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1810,7 +1809,7 @@ fn update_our_view() { let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1858,7 +1857,7 @@ fn update_our_view() { let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1877,7 +1876,7 @@ fn update_our_view() { let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -1905,7 +1904,7 @@ fn update_peer_view() { let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2004,7 +2003,7 @@ fn update_peer_view() { let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2064,7 +2063,7 @@ fn update_peer_view() { let finalized_number = 4_000_000_000; let state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2106,7 +2105,7 @@ fn update_peer_authority_id() { let _state = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2287,7 +2286,7 @@ fn import_remotely_then_locally() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2393,7 +2392,7 @@ fn sends_assignments_even_when_state_is_approved() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2499,7 +2498,7 @@ fn sends_assignments_even_when_state_is_approved_v2() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2625,7 +2624,7 @@ fn race_condition_in_local_vs_remote_view_update() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2711,7 +2710,7 @@ fn propagates_locally_generated_assignment_to_both_dimensions() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -2841,7 +2840,7 @@ fn propagates_assignments_along_unshared_dimension() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3000,7 +2999,7 @@ fn propagates_to_required_after_connect() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3165,7 +3164,7 @@ fn sends_to_more_peers_after_getting_topology() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), State::default(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3303,7 +3302,7 @@ fn originator_aggression_l1() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3484,7 +3483,7 @@ fn non_originator_aggression_l1() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3609,7 +3608,7 @@ fn non_originator_aggression_l2() { let aggression_l2_threshold = state.aggression_config.l2_threshold.unwrap(); let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3794,7 +3793,7 @@ fn resends_messages_periodically() { state.aggression_config.resend_unfinalized_period = Some(2); let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -3958,7 +3957,7 @@ fn import_versioned_approval() { let candidate_hash = polkadot_primitives::CandidateHash(Hash::repeat_byte(0xBB)); let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), state, |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -4131,7 +4130,7 @@ fn batch_test_round(message_count: usize) { let subsystem = ApprovalDistribution::new_with_clock( Default::default(), Default::default(), - Box::new(SystemClock {}), + Arc::new(SystemClock {}), Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), ); let mut rng = rand_chacha::ChaCha12Rng::seed_from_u64(12345); @@ -4318,7 +4317,7 @@ fn subsystem_rejects_assignment_in_future() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(89) }), - Box::new(DummyClock {}), + Arc::new(DummyClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -4384,7 +4383,7 @@ fn subsystem_rejects_bad_assignments() { Arc::new(MockAssignmentCriteria { tranche: Err(InvalidAssignment(criteria::InvalidAssignmentReason::NullAssignment)), }), - Box::new(DummyClock {}), + Arc::new(DummyClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -4447,7 +4446,7 @@ fn subsystem_rejects_wrong_claimed_assignments() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(DummyClock {}), + Arc::new(DummyClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; @@ -4531,7 +4530,7 @@ fn subsystem_accepts_tranche0_duplicate_assignments() { let _ = test_harness( Arc::new(MockAssignmentCriteria { tranche: Ok(0) }), - Box::new(DummyClock {}), + Arc::new(DummyClock {}), state_without_reputation_delay(), |mut virtual_overseer| async move { let overseer = &mut virtual_overseer; diff --git a/polkadot/node/network/bridge/src/rx/mod.rs b/polkadot/node/network/bridge/src/rx/mod.rs index 7745c42f78a1..9a2357f02d8b 100644 --- a/polkadot/node/network/bridge/src/rx/mod.rs +++ b/polkadot/node/network/bridge/src/rx/mod.rs @@ -45,8 +45,9 @@ use polkadot_node_subsystem::{ errors::SubsystemError, messages::{ network_bridge_event::NewGossipTopology, ApprovalDistributionMessage, - BitfieldDistributionMessage, CollatorProtocolMessage, GossipSupportMessage, - NetworkBridgeEvent, NetworkBridgeRxMessage, StatementDistributionMessage, + ApprovalVotingParallelMessage, BitfieldDistributionMessage, CollatorProtocolMessage, + GossipSupportMessage, NetworkBridgeEvent, NetworkBridgeRxMessage, + StatementDistributionMessage, }, overseer, ActivatedLeaf, ActiveLeavesUpdate, FromOrchestra, OverseerSignal, SpawnedSubsystem, }; @@ -89,6 +90,7 @@ pub struct NetworkBridgeRx { validation_service: Box, collation_service: Box, notification_sinks: Arc>>>, + approval_voting_parallel_enabled: bool, } impl NetworkBridgeRx { @@ -105,6 +107,7 @@ impl NetworkBridgeRx { peerset_protocol_names: PeerSetProtocolNames, mut notification_services: HashMap>, notification_sinks: Arc>>>, + approval_voting_parallel_enabled: bool, ) -> Self { let shared = Shared::default(); @@ -125,6 +128,7 @@ impl NetworkBridgeRx { validation_service, collation_service, notification_sinks, + approval_voting_parallel_enabled, } } } @@ -156,6 +160,7 @@ async fn handle_validation_message( peerset_protocol_names: &PeerSetProtocolNames, notification_service: &mut Box, notification_sinks: &mut Arc>>>, + approval_voting_parallel_enabled: bool, ) where AD: validator_discovery::AuthorityDiscovery + Send, { @@ -276,6 +281,7 @@ async fn handle_validation_message( ], sender, &metrics, + approval_voting_parallel_enabled, ) .await; @@ -329,6 +335,7 @@ async fn handle_validation_message( NetworkBridgeEvent::PeerDisconnected(peer), sender, &metrics, + approval_voting_parallel_enabled, ) .await; } @@ -398,7 +405,13 @@ async fn handle_validation_message( network_service.report_peer(peer, report.into()); } - dispatch_validation_events_to_all(events, sender, &metrics).await; + dispatch_validation_events_to_all( + events, + sender, + &metrics, + approval_voting_parallel_enabled, + ) + .await; }, } } @@ -652,6 +665,7 @@ async fn handle_network_messages( mut validation_service: Box, mut collation_service: Box, mut notification_sinks: Arc>>>, + approval_voting_parallel_enabled: bool, ) -> Result<(), Error> where AD: validator_discovery::AuthorityDiscovery + Send, @@ -669,6 +683,7 @@ where &peerset_protocol_names, &mut validation_service, &mut notification_sinks, + approval_voting_parallel_enabled, ).await, None => return Err(Error::EventStreamConcluded), }, @@ -727,6 +742,7 @@ async fn run_incoming_orchestra_signals( sync_oracle: Box, metrics: Metrics, notification_sinks: Arc>>>, + approval_voting_parallel_enabled: bool, ) -> Result<(), Error> where AD: validator_discovery::AuthorityDiscovery + Clone, @@ -766,6 +782,7 @@ where local_index, }), ctx.sender(), + approval_voting_parallel_enabled, ); }, FromOrchestra::Communication { @@ -787,6 +804,7 @@ where dispatch_validation_event_to_all_unbounded( NetworkBridgeEvent::UpdatedAuthorityIds(peer_id, authority_ids), ctx.sender(), + approval_voting_parallel_enabled, ); }, FromOrchestra::Signal(OverseerSignal::Conclude) => return Ok(()), @@ -826,6 +844,7 @@ where finalized_number, &metrics, ¬ification_sinks, + approval_voting_parallel_enabled, ); note_peers_count(&metrics, &shared); } @@ -875,6 +894,7 @@ where validation_service, collation_service, notification_sinks, + approval_voting_parallel_enabled, } = bridge; let (task, network_event_handler) = handle_network_messages( @@ -887,6 +907,7 @@ where validation_service, collation_service, notification_sinks.clone(), + approval_voting_parallel_enabled, ) .remote_handle(); @@ -900,6 +921,7 @@ where sync_oracle, metrics, notification_sinks, + approval_voting_parallel_enabled, ); futures::pin_mut!(orchestra_signal_handler); @@ -926,6 +948,7 @@ fn update_our_view( finalized_number: BlockNumber, metrics: &Metrics, notification_sinks: &Arc>>>, + approval_voting_parallel_enabled: bool, ) { let new_view = construct_view(live_heads.iter().map(|v| v.hash), finalized_number); @@ -970,6 +993,7 @@ fn update_our_view( dispatch_validation_event_to_all_unbounded( NetworkBridgeEvent::OurViewChange(our_view.clone()), ctx.sender(), + approval_voting_parallel_enabled, ); dispatch_collation_event_to_all_unbounded( @@ -1081,8 +1105,15 @@ async fn dispatch_validation_event_to_all( event: NetworkBridgeEvent, ctx: &mut impl overseer::NetworkBridgeRxSenderTrait, metrics: &Metrics, + approval_voting_parallel_enabled: bool, ) { - dispatch_validation_events_to_all(std::iter::once(event), ctx, metrics).await + dispatch_validation_events_to_all( + std::iter::once(event), + ctx, + metrics, + approval_voting_parallel_enabled, + ) + .await } async fn dispatch_collation_event_to_all( @@ -1095,6 +1126,7 @@ async fn dispatch_collation_event_to_all( fn dispatch_validation_event_to_all_unbounded( event: NetworkBridgeEvent, sender: &mut impl overseer::NetworkBridgeRxSenderTrait, + approval_voting_parallel_enabled: bool, ) { event .focus() @@ -1106,11 +1138,20 @@ fn dispatch_validation_event_to_all_unbounded( .ok() .map(BitfieldDistributionMessage::from) .and_then(|msg| Some(sender.send_unbounded_message(msg))); - event - .focus() - .ok() - .map(ApprovalDistributionMessage::from) - .and_then(|msg| Some(sender.send_unbounded_message(msg))); + + if approval_voting_parallel_enabled { + event + .focus() + .ok() + .map(ApprovalVotingParallelMessage::from) + .and_then(|msg| Some(sender.send_unbounded_message(msg))); + } else { + event + .focus() + .ok() + .map(ApprovalDistributionMessage::from) + .and_then(|msg| Some(sender.send_unbounded_message(msg))); + } event .focus() .ok() @@ -1131,6 +1172,7 @@ async fn dispatch_validation_events_to_all( events: I, sender: &mut impl overseer::NetworkBridgeRxSenderTrait, _metrics: &Metrics, + approval_voting_parallel_enabled: bool, ) where I: IntoIterator>, I::IntoIter: Send, @@ -1160,7 +1202,11 @@ async fn dispatch_validation_events_to_all( for event in events { send_message!(event, StatementDistributionMessage); send_message!(event, BitfieldDistributionMessage); - send_message!(event, ApprovalDistributionMessage); + if approval_voting_parallel_enabled { + send_message!(event, ApprovalVotingParallelMessage); + } else { + send_message!(event, ApprovalDistributionMessage); + } send_message!(event, GossipSupportMessage); } } diff --git a/polkadot/node/network/bridge/src/rx/tests.rs b/polkadot/node/network/bridge/src/rx/tests.rs index 601dca5cb8a3..a96817eb2546 100644 --- a/polkadot/node/network/bridge/src/rx/tests.rs +++ b/polkadot/node/network/bridge/src/rx/tests.rs @@ -529,6 +529,7 @@ fn test_harness>( validation_service, collation_service, notification_sinks, + approval_voting_parallel_enabled: false, }; let network_bridge = run_network_in(bridge, context) diff --git a/polkadot/node/overseer/src/dummy.rs b/polkadot/node/overseer/src/dummy.rs index fc5f0070773b..6f9cd9d00403 100644 --- a/polkadot/node/overseer/src/dummy.rs +++ b/polkadot/node/overseer/src/dummy.rs @@ -88,6 +88,7 @@ pub fn dummy_overseer_builder( DummySubsystem, DummySubsystem, DummySubsystem, + DummySubsystem, >, SubsystemError, > @@ -131,6 +132,7 @@ pub fn one_for_all_overseer_builder( Sub, Sub, Sub, + Sub, >, SubsystemError, > @@ -155,6 +157,7 @@ where + Subsystem, SubsystemError> + Subsystem, SubsystemError> + Subsystem, SubsystemError> + + Subsystem, SubsystemError> + Subsystem, SubsystemError> + Subsystem, SubsystemError> + Subsystem, SubsystemError> @@ -183,6 +186,7 @@ where .statement_distribution(subsystem.clone()) .approval_distribution(subsystem.clone()) .approval_voting(subsystem.clone()) + .approval_voting_parallel(subsystem.clone()) .gossip_support(subsystem.clone()) .dispute_coordinator(subsystem.clone()) .dispute_distribution(subsystem.clone()) diff --git a/polkadot/node/overseer/src/lib.rs b/polkadot/node/overseer/src/lib.rs index 23adf4f4d8a4..10a4320433a1 100644 --- a/polkadot/node/overseer/src/lib.rs +++ b/polkadot/node/overseer/src/lib.rs @@ -76,13 +76,13 @@ use sc_client_api::{BlockImportNotification, BlockchainEvents, FinalityNotificat use self::messages::{BitfieldSigningMessage, PvfCheckerMessage}; use polkadot_node_subsystem_types::messages::{ - ApprovalDistributionMessage, ApprovalVotingMessage, AvailabilityDistributionMessage, - AvailabilityRecoveryMessage, AvailabilityStoreMessage, BitfieldDistributionMessage, - CandidateBackingMessage, CandidateValidationMessage, ChainApiMessage, ChainSelectionMessage, - CollationGenerationMessage, CollatorProtocolMessage, DisputeCoordinatorMessage, - DisputeDistributionMessage, GossipSupportMessage, NetworkBridgeRxMessage, - NetworkBridgeTxMessage, ProspectiveParachainsMessage, ProvisionerMessage, RuntimeApiMessage, - StatementDistributionMessage, + ApprovalDistributionMessage, ApprovalVotingMessage, ApprovalVotingParallelMessage, + AvailabilityDistributionMessage, AvailabilityRecoveryMessage, AvailabilityStoreMessage, + BitfieldDistributionMessage, CandidateBackingMessage, CandidateValidationMessage, + ChainApiMessage, ChainSelectionMessage, CollationGenerationMessage, CollatorProtocolMessage, + DisputeCoordinatorMessage, DisputeDistributionMessage, GossipSupportMessage, + NetworkBridgeRxMessage, NetworkBridgeTxMessage, ProspectiveParachainsMessage, + ProvisionerMessage, RuntimeApiMessage, StatementDistributionMessage, }; pub use polkadot_node_subsystem_types::{ @@ -550,6 +550,7 @@ pub struct Overseer { BitfieldDistributionMessage, StatementDistributionMessage, ApprovalDistributionMessage, + ApprovalVotingParallelMessage, GossipSupportMessage, DisputeDistributionMessage, CollationGenerationMessage, @@ -595,7 +596,19 @@ pub struct Overseer { RuntimeApiMessage, ])] approval_voting: ApprovalVoting, - + #[subsystem(blocking, message_capacity: 64000, ApprovalVotingParallelMessage, sends: [ + AvailabilityRecoveryMessage, + CandidateValidationMessage, + ChainApiMessage, + ChainSelectionMessage, + DisputeCoordinatorMessage, + RuntimeApiMessage, + NetworkBridgeTxMessage, + ApprovalVotingMessage, + ApprovalDistributionMessage, + ApprovalVotingParallelMessage, + ])] + approval_voting_parallel: ApprovalVotingParallel, #[subsystem(GossipSupportMessage, sends: [ NetworkBridgeTxMessage, NetworkBridgeRxMessage, // TODO @@ -613,6 +626,7 @@ pub struct Overseer { AvailabilityStoreMessage, AvailabilityRecoveryMessage, ChainSelectionMessage, + ApprovalVotingParallelMessage, ])] dispute_coordinator: DisputeCoordinator, diff --git a/polkadot/node/overseer/src/tests.rs b/polkadot/node/overseer/src/tests.rs index 8e78d8fc8921..cb0add03e2e7 100644 --- a/polkadot/node/overseer/src/tests.rs +++ b/polkadot/node/overseer/src/tests.rs @@ -950,7 +950,7 @@ fn test_prospective_parachains_msg() -> ProspectiveParachainsMessage { // Checks that `stop`, `broadcast_signal` and `broadcast_message` are implemented correctly. #[test] fn overseer_all_subsystems_receive_signals_and_messages() { - const NUM_SUBSYSTEMS: usize = 23; + const NUM_SUBSYSTEMS: usize = 24; // -4 for BitfieldSigning, GossipSupport, AvailabilityDistribution and PvfCheckerSubsystem. const NUM_SUBSYSTEMS_MESSAGED: usize = NUM_SUBSYSTEMS - 4; @@ -1028,6 +1028,11 @@ fn overseer_all_subsystems_receive_signals_and_messages() { handle .send_msg_anon(AllMessages::ApprovalDistribution(test_approval_distribution_msg())) .await; + handle + .send_msg_anon(AllMessages::ApprovalVotingParallel( + test_approval_distribution_msg().into(), + )) + .await; handle .send_msg_anon(AllMessages::ApprovalVoting(test_approval_voting_msg())) .await; @@ -1101,6 +1106,7 @@ fn context_holds_onto_message_until_enough_signals_received() { let (chain_selection_bounded_tx, _) = metered::channel(CHANNEL_CAPACITY); let (pvf_checker_bounded_tx, _) = metered::channel(CHANNEL_CAPACITY); let (prospective_parachains_bounded_tx, _) = metered::channel(CHANNEL_CAPACITY); + let (approval_voting_parallel_tx, _) = metered::channel(CHANNEL_CAPACITY); let (candidate_validation_unbounded_tx, _) = metered::unbounded(); let (candidate_backing_unbounded_tx, _) = metered::unbounded(); @@ -1125,6 +1131,7 @@ fn context_holds_onto_message_until_enough_signals_received() { let (chain_selection_unbounded_tx, _) = metered::unbounded(); let (pvf_checker_unbounded_tx, _) = metered::unbounded(); let (prospective_parachains_unbounded_tx, _) = metered::unbounded(); + let (approval_voting_parallel_unbounded_tx, _) = metered::unbounded(); let channels_out = ChannelsOut { candidate_validation: candidate_validation_bounded_tx.clone(), @@ -1150,6 +1157,7 @@ fn context_holds_onto_message_until_enough_signals_received() { chain_selection: chain_selection_bounded_tx.clone(), pvf_checker: pvf_checker_bounded_tx.clone(), prospective_parachains: prospective_parachains_bounded_tx.clone(), + approval_voting_parallel: approval_voting_parallel_tx.clone(), candidate_validation_unbounded: candidate_validation_unbounded_tx.clone(), candidate_backing_unbounded: candidate_backing_unbounded_tx.clone(), @@ -1174,6 +1182,7 @@ fn context_holds_onto_message_until_enough_signals_received() { chain_selection_unbounded: chain_selection_unbounded_tx.clone(), pvf_checker_unbounded: pvf_checker_unbounded_tx.clone(), prospective_parachains_unbounded: prospective_parachains_unbounded_tx.clone(), + approval_voting_parallel_unbounded: approval_voting_parallel_unbounded_tx.clone(), }; let (mut signal_tx, signal_rx) = metered::channel(CHANNEL_CAPACITY); diff --git a/polkadot/node/primitives/src/approval/mod.rs b/polkadot/node/primitives/src/approval/mod.rs index 79f4cfa9e0be..42342f9889a9 100644 --- a/polkadot/node/primitives/src/approval/mod.rs +++ b/polkadot/node/primitives/src/approval/mod.rs @@ -124,7 +124,7 @@ pub mod v1 { } /// Metadata about a block which is now live in the approval protocol. - #[derive(Debug)] + #[derive(Debug, Clone)] pub struct BlockApprovalMeta { /// The hash of the block. pub hash: Hash, diff --git a/polkadot/node/service/Cargo.toml b/polkadot/node/service/Cargo.toml index 89f8212bf9d8..8d50b54b2fdc 100644 --- a/polkadot/node/service/Cargo.toml +++ b/polkadot/node/service/Cargo.toml @@ -116,6 +116,7 @@ polkadot-gossip-support = { optional = true, workspace = true, default-features polkadot-network-bridge = { optional = true, workspace = true, default-features = true } polkadot-node-collation-generation = { optional = true, workspace = true, default-features = true } polkadot-node-core-approval-voting = { optional = true, workspace = true, default-features = true } +polkadot-node-core-approval-voting-parallel = { optional = true, workspace = true, default-features = true } polkadot-node-core-av-store = { optional = true, workspace = true, default-features = true } polkadot-node-core-backing = { optional = true, workspace = true, default-features = true } polkadot-node-core-bitfield-signing = { optional = true, workspace = true, default-features = true } @@ -160,6 +161,7 @@ full-node = [ "polkadot-network-bridge", "polkadot-node-collation-generation", "polkadot-node-core-approval-voting", + "polkadot-node-core-approval-voting-parallel", "polkadot-node-core-av-store", "polkadot-node-core-backing", "polkadot-node-core-bitfield-signing", diff --git a/polkadot/node/service/src/lib.rs b/polkadot/node/service/src/lib.rs index 794243568804..d029f3be53eb 100644 --- a/polkadot/node/service/src/lib.rs +++ b/polkadot/node/service/src/lib.rs @@ -660,6 +660,8 @@ pub struct NewFullParams { #[allow(dead_code)] pub malus_finality_delay: Option, pub hwbench: Option, + /// Enable approval voting processing in parallel. + pub enable_approval_voting_parallel: bool, } #[cfg(feature = "full-node")] @@ -753,6 +755,7 @@ pub fn new_full< execute_workers_max_num, prepare_workers_soft_max_num, prepare_workers_hard_max_num, + enable_approval_voting_parallel, }: NewFullParams, ) -> Result { use polkadot_availability_recovery::FETCH_CHUNKS_THRESHOLD; @@ -784,6 +787,14 @@ pub fn new_full< Some(backoff) }; + // Running approval voting in parallel is enabled by default on all networks except Polkadot and + // Kusama, unless explicitly enabled by the commandline option. + // This is meant to be temporary until we have enough confidence in the new system to enable it + // by default on all networks. + let enable_approval_voting_parallel = (!config.chain_spec.is_kusama() && + !config.chain_spec.is_polkadot()) || + enable_approval_voting_parallel; + let disable_grandpa = config.disable_grandpa; let name = config.network.node_name.clone(); @@ -806,6 +817,7 @@ pub fn new_full< overseer_handle.clone(), metrics, Some(basics.task_manager.spawn_handle()), + enable_approval_voting_parallel, ) } else { SelectRelayChain::new_longest_chain(basics.backend.clone()) @@ -1016,6 +1028,7 @@ pub fn new_full< dispute_coordinator_config, chain_selection_config, fetch_chunks_threshold, + enable_approval_voting_parallel, }) }; diff --git a/polkadot/node/service/src/overseer.rs b/polkadot/node/service/src/overseer.rs index 3c071e34fe11..a98b6bcb3083 100644 --- a/polkadot/node/service/src/overseer.rs +++ b/polkadot/node/service/src/overseer.rs @@ -58,6 +58,9 @@ pub use polkadot_network_bridge::{ }; pub use polkadot_node_collation_generation::CollationGenerationSubsystem; pub use polkadot_node_core_approval_voting::ApprovalVotingSubsystem; +pub use polkadot_node_core_approval_voting_parallel::{ + ApprovalVotingParallelSubsystem, Metrics as ApprovalVotingParallelMetrics, +}; pub use polkadot_node_core_av_store::AvailabilityStoreSubsystem; pub use polkadot_node_core_backing::CandidateBackingSubsystem; pub use polkadot_node_core_bitfield_signing::BitfieldSigningSubsystem; @@ -139,9 +142,16 @@ pub struct ExtendedOverseerGenArgs { /// than the value put in here we always try to recovery availability from backers. /// The presence of this parameter here is needed to have different values per chain. pub fetch_chunks_threshold: Option, + /// Enable approval-voting-parallel subsystem and disable the standalone approval-voting and + /// approval-distribution subsystems. + pub enable_approval_voting_parallel: bool, } /// Obtain a prepared validator `Overseer`, that is initialized with all default values. +/// +/// The difference between this function and `validator_with_parallel_overseer_builder` is that this +/// function enables the standalone approval-voting and approval-distribution subsystems +/// and disables the approval-voting-parallel subsystem. pub fn validator_overseer_builder( OverseerGenArgs { runtime_client, @@ -174,6 +184,7 @@ pub fn validator_overseer_builder( dispute_coordinator_config, chain_selection_config, fetch_chunks_threshold, + enable_approval_voting_parallel, }: ExtendedOverseerGenArgs, ) -> Result< InitializedOverseerBuilder< @@ -203,6 +214,7 @@ pub fn validator_overseer_builder( CollatorProtocolSubsystem, ApprovalDistributionSubsystem, ApprovalVotingSubsystem, + DummySubsystem, GossipSupportSubsystem, DisputeCoordinatorSubsystem, DisputeDistributionSubsystem, @@ -223,7 +235,8 @@ where let spawner = SpawnGlue(spawner); let network_bridge_metrics: NetworkBridgeMetrics = Metrics::register(registry)?; - + let approval_voting_parallel_metrics: ApprovalVotingParallelMetrics = + Metrics::register(registry)?; let builder = Overseer::builder() .network_bridge_tx(NetworkBridgeTxSubsystem::new( network_service.clone(), @@ -241,6 +254,7 @@ where peerset_protocol_names, notification_services, notification_sinks, + enable_approval_voting_parallel, )) .availability_distribution(AvailabilityDistributionSubsystem::new( keystore.clone(), @@ -310,18 +324,19 @@ where rand::rngs::StdRng::from_entropy(), )) .approval_distribution(ApprovalDistributionSubsystem::new( - Metrics::register(registry)?, + approval_voting_parallel_metrics.approval_distribution_metrics(), approval_voting_config.slot_duration_millis, Arc::new(RealAssignmentCriteria {}), )) .approval_voting(ApprovalVotingSubsystem::with_config( - approval_voting_config, + approval_voting_config.clone(), parachains_db.clone(), keystore.clone(), Box::new(sync_service.clone()), - Metrics::register(registry)?, + approval_voting_parallel_metrics.approval_voting_metrics(), Arc::new(spawner.clone()), )) + .approval_voting_parallel(DummySubsystem) .gossip_support(GossipSupportSubsystem::new( keystore.clone(), authority_discovery_service.clone(), @@ -332,6 +347,229 @@ where dispute_coordinator_config, keystore.clone(), Metrics::register(registry)?, + enable_approval_voting_parallel, + )) + .dispute_distribution(DisputeDistributionSubsystem::new( + keystore.clone(), + dispute_req_receiver, + authority_discovery_service.clone(), + Metrics::register(registry)?, + )) + .chain_selection(ChainSelectionSubsystem::new(chain_selection_config, parachains_db)) + .prospective_parachains(ProspectiveParachainsSubsystem::new(Metrics::register(registry)?)) + .activation_external_listeners(Default::default()) + .span_per_active_leaf(Default::default()) + .active_leaves(Default::default()) + .supports_parachains(runtime_client) + .metrics(metrics) + .spawner(spawner); + + let builder = if let Some(capacity) = overseer_message_channel_capacity_override { + builder.message_channel_capacity(capacity) + } else { + builder + }; + Ok(builder) +} + +/// Obtain a prepared validator `Overseer`, that is initialized with all default values. +/// +/// The difference between this function and `validator_overseer_builder` is that this +/// function enables the approval-voting-parallel subsystem and disables the standalone +/// approval-voting and approval-distribution subsystems. +pub fn validator_with_parallel_overseer_builder( + OverseerGenArgs { + runtime_client, + network_service, + sync_service, + authority_discovery_service, + collation_req_v1_receiver: _, + collation_req_v2_receiver: _, + available_data_req_receiver, + registry, + spawner, + is_parachain_node, + overseer_message_channel_capacity_override, + req_protocol_names, + peerset_protocol_names, + notification_services, + }: OverseerGenArgs, + ExtendedOverseerGenArgs { + keystore, + parachains_db, + candidate_validation_config, + availability_config, + pov_req_receiver, + chunk_req_v1_receiver, + chunk_req_v2_receiver, + statement_req_receiver, + candidate_req_v2_receiver, + approval_voting_config, + dispute_req_receiver, + dispute_coordinator_config, + chain_selection_config, + fetch_chunks_threshold, + enable_approval_voting_parallel, + }: ExtendedOverseerGenArgs, +) -> Result< + InitializedOverseerBuilder< + SpawnGlue, + Arc, + CandidateValidationSubsystem, + PvfCheckerSubsystem, + CandidateBackingSubsystem, + StatementDistributionSubsystem, + AvailabilityDistributionSubsystem, + AvailabilityRecoverySubsystem, + BitfieldSigningSubsystem, + BitfieldDistributionSubsystem, + ProvisionerSubsystem, + RuntimeApiSubsystem, + AvailabilityStoreSubsystem, + NetworkBridgeRxSubsystem< + Arc, + AuthorityDiscoveryService, + >, + NetworkBridgeTxSubsystem< + Arc, + AuthorityDiscoveryService, + >, + ChainApiSubsystem, + CollationGenerationSubsystem, + CollatorProtocolSubsystem, + DummySubsystem, + DummySubsystem, + ApprovalVotingParallelSubsystem, + GossipSupportSubsystem, + DisputeCoordinatorSubsystem, + DisputeDistributionSubsystem, + ChainSelectionSubsystem, + ProspectiveParachainsSubsystem, + >, + Error, +> +where + RuntimeClient: RuntimeApiSubsystemClient + ChainApiBackend + AuxStore + 'static, + Spawner: 'static + SpawnNamed + Clone + Unpin, +{ + use polkadot_node_subsystem_util::metrics::Metrics; + + let metrics = ::register(registry)?; + let notification_sinks = Arc::new(Mutex::new(HashMap::new())); + + let spawner = SpawnGlue(spawner); + + let network_bridge_metrics: NetworkBridgeMetrics = Metrics::register(registry)?; + let approval_voting_parallel_metrics: ApprovalVotingParallelMetrics = + Metrics::register(registry)?; + let builder = Overseer::builder() + .network_bridge_tx(NetworkBridgeTxSubsystem::new( + network_service.clone(), + authority_discovery_service.clone(), + network_bridge_metrics.clone(), + req_protocol_names.clone(), + peerset_protocol_names.clone(), + notification_sinks.clone(), + )) + .network_bridge_rx(NetworkBridgeRxSubsystem::new( + network_service.clone(), + authority_discovery_service.clone(), + Box::new(sync_service.clone()), + network_bridge_metrics, + peerset_protocol_names, + notification_services, + notification_sinks, + enable_approval_voting_parallel, + )) + .availability_distribution(AvailabilityDistributionSubsystem::new( + keystore.clone(), + IncomingRequestReceivers { + pov_req_receiver, + chunk_req_v1_receiver, + chunk_req_v2_receiver, + }, + req_protocol_names.clone(), + Metrics::register(registry)?, + )) + .availability_recovery(AvailabilityRecoverySubsystem::for_validator( + fetch_chunks_threshold, + available_data_req_receiver, + &req_protocol_names, + Metrics::register(registry)?, + )) + .availability_store(AvailabilityStoreSubsystem::new( + parachains_db.clone(), + availability_config, + Box::new(sync_service.clone()), + Metrics::register(registry)?, + )) + .bitfield_distribution(BitfieldDistributionSubsystem::new(Metrics::register(registry)?)) + .bitfield_signing(BitfieldSigningSubsystem::new( + keystore.clone(), + Metrics::register(registry)?, + )) + .candidate_backing(CandidateBackingSubsystem::new( + keystore.clone(), + Metrics::register(registry)?, + )) + .candidate_validation(CandidateValidationSubsystem::with_config( + candidate_validation_config, + keystore.clone(), + Metrics::register(registry)?, // candidate-validation metrics + Metrics::register(registry)?, // validation host metrics + )) + .pvf_checker(PvfCheckerSubsystem::new(keystore.clone(), Metrics::register(registry)?)) + .chain_api(ChainApiSubsystem::new(runtime_client.clone(), Metrics::register(registry)?)) + .collation_generation(CollationGenerationSubsystem::new(Metrics::register(registry)?)) + .collator_protocol({ + let side = match is_parachain_node { + IsParachainNode::Collator(_) | IsParachainNode::FullNode => + return Err(Error::Overseer(SubsystemError::Context( + "build validator overseer for parachain node".to_owned(), + ))), + IsParachainNode::No => ProtocolSide::Validator { + keystore: keystore.clone(), + eviction_policy: Default::default(), + metrics: Metrics::register(registry)?, + }, + }; + CollatorProtocolSubsystem::new(side) + }) + .provisioner(ProvisionerSubsystem::new(Metrics::register(registry)?)) + .runtime_api(RuntimeApiSubsystem::new( + runtime_client.clone(), + Metrics::register(registry)?, + spawner.clone(), + )) + .statement_distribution(StatementDistributionSubsystem::new( + keystore.clone(), + statement_req_receiver, + candidate_req_v2_receiver, + Metrics::register(registry)?, + rand::rngs::StdRng::from_entropy(), + )) + .approval_distribution(DummySubsystem) + .approval_voting(DummySubsystem) + .approval_voting_parallel(ApprovalVotingParallelSubsystem::with_config( + approval_voting_config, + parachains_db.clone(), + keystore.clone(), + Box::new(sync_service.clone()), + approval_voting_parallel_metrics, + spawner.clone(), + overseer_message_channel_capacity_override, + )) + .gossip_support(GossipSupportSubsystem::new( + keystore.clone(), + authority_discovery_service.clone(), + Metrics::register(registry)?, + )) + .dispute_coordinator(DisputeCoordinatorSubsystem::new( + parachains_db.clone(), + dispute_coordinator_config, + keystore.clone(), + Metrics::register(registry)?, + enable_approval_voting_parallel, )) .dispute_distribution(DisputeDistributionSubsystem::new( keystore.clone(), @@ -407,6 +645,7 @@ pub fn collator_overseer_builder( DummySubsystem, DummySubsystem, DummySubsystem, + DummySubsystem, >, Error, > @@ -439,6 +678,7 @@ where peerset_protocol_names, notification_services, notification_sinks, + false, )) .availability_distribution(DummySubsystem) .availability_recovery(AvailabilityRecoverySubsystem::for_collator( @@ -481,6 +721,7 @@ where .statement_distribution(DummySubsystem) .approval_distribution(DummySubsystem) .approval_voting(DummySubsystem) + .approval_voting_parallel(DummySubsystem) .gossip_support(DummySubsystem) .dispute_coordinator(DummySubsystem) .dispute_distribution(DummySubsystem) @@ -537,9 +778,15 @@ impl OverseerGen for ValidatorOverseerGen { "create validator overseer as mandatory extended arguments were not provided" .to_owned(), )))?; - validator_overseer_builder(args, ext_args)? - .build_with_connector(connector) - .map_err(|e| e.into()) + if ext_args.enable_approval_voting_parallel { + validator_with_parallel_overseer_builder(args, ext_args)? + .build_with_connector(connector) + .map_err(|e| e.into()) + } else { + validator_overseer_builder(args, ext_args)? + .build_with_connector(connector) + .map_err(|e| e.into()) + } } } diff --git a/polkadot/node/service/src/relay_chain_selection.rs b/polkadot/node/service/src/relay_chain_selection.rs index c0b1ce8b0ebe..e48874f01ca6 100644 --- a/polkadot/node/service/src/relay_chain_selection.rs +++ b/polkadot/node/service/src/relay_chain_selection.rs @@ -39,8 +39,8 @@ use super::{HeaderProvider, HeaderProviderProvider}; use futures::channel::oneshot; use polkadot_node_primitives::MAX_FINALITY_LAG as PRIMITIVES_MAX_FINALITY_LAG; use polkadot_node_subsystem::messages::{ - ApprovalDistributionMessage, ApprovalVotingMessage, ChainSelectionMessage, - DisputeCoordinatorMessage, HighestApprovedAncestorBlock, + ApprovalDistributionMessage, ApprovalVotingMessage, ApprovalVotingParallelMessage, + ChainSelectionMessage, DisputeCoordinatorMessage, HighestApprovedAncestorBlock, }; use polkadot_node_subsystem_util::metrics::{self, prometheus}; use polkadot_overseer::{AllMessages, Handle}; @@ -169,6 +169,7 @@ where overseer: Handle, metrics: Metrics, spawn_handle: Option, + approval_voting_parallel_enabled: bool, ) -> Self { gum::debug!(target: LOG_TARGET, "Using dispute aware relay-chain selection algorithm",); @@ -179,6 +180,7 @@ where overseer, metrics, spawn_handle, + approval_voting_parallel_enabled, )), } } @@ -230,6 +232,7 @@ pub struct SelectRelayChainInner { overseer: OH, metrics: Metrics, spawn_handle: Option, + approval_voting_parallel_enabled: bool, } impl SelectRelayChainInner @@ -244,8 +247,15 @@ where overseer: OH, metrics: Metrics, spawn_handle: Option, + approval_voting_parallel_enabled: bool, ) -> Self { - SelectRelayChainInner { backend, overseer, metrics, spawn_handle } + SelectRelayChainInner { + backend, + overseer, + metrics, + spawn_handle, + approval_voting_parallel_enabled, + } } fn block_header(&self, hash: Hash) -> Result { @@ -284,6 +294,7 @@ where overseer: self.overseer.clone(), metrics: self.metrics.clone(), spawn_handle: self.spawn_handle.clone(), + approval_voting_parallel_enabled: self.approval_voting_parallel_enabled, } } } @@ -448,13 +459,25 @@ where // 2. Constrain according to `ApprovedAncestor`. let (subchain_head, subchain_number, subchain_block_descriptions) = { let (tx, rx) = oneshot::channel(); - overseer - .send_msg( - ApprovalVotingMessage::ApprovedAncestor(subchain_head, target_number, tx), - std::any::type_name::(), - ) - .await; - + if self.approval_voting_parallel_enabled { + overseer + .send_msg( + ApprovalVotingParallelMessage::ApprovedAncestor( + subchain_head, + target_number, + tx, + ), + std::any::type_name::(), + ) + .await; + } else { + overseer + .send_msg( + ApprovalVotingMessage::ApprovedAncestor(subchain_head, target_number, tx), + std::any::type_name::(), + ) + .await; + } match rx .await .map_err(Error::ApprovedAncestorCanceled) @@ -476,13 +499,23 @@ where // task for sending the message to not block here and delay finality. if let Some(spawn_handle) = &self.spawn_handle { let mut overseer_handle = self.overseer.clone(); + let approval_voting_parallel_enabled = self.approval_voting_parallel_enabled; let lag_update_task = async move { - overseer_handle - .send_msg( - ApprovalDistributionMessage::ApprovalCheckingLagUpdate(lag), - std::any::type_name::(), - ) - .await; + if approval_voting_parallel_enabled { + overseer_handle + .send_msg( + ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate(lag), + std::any::type_name::(), + ) + .await; + } else { + overseer_handle + .send_msg( + ApprovalDistributionMessage::ApprovalCheckingLagUpdate(lag), + std::any::type_name::(), + ) + .await; + } }; spawn_handle.spawn( diff --git a/polkadot/node/service/src/tests.rs b/polkadot/node/service/src/tests.rs index 195432bcb75d..85b750ddad67 100644 --- a/polkadot/node/service/src/tests.rs +++ b/polkadot/node/service/src/tests.rs @@ -83,6 +83,7 @@ fn test_harness>( context.sender().clone(), Default::default(), None, + false, ); let target_hash = case_vars.target_block; diff --git a/polkadot/node/subsystem-bench/Cargo.toml b/polkadot/node/subsystem-bench/Cargo.toml index ae798cf2640a..293df9f6e6d5 100644 --- a/polkadot/node/subsystem-bench/Cargo.toml +++ b/polkadot/node/subsystem-bench/Cargo.toml @@ -80,6 +80,7 @@ serde_yaml = { workspace = true } serde_json = { workspace = true } polkadot-node-core-approval-voting = { workspace = true, default-features = true } +polkadot-node-core-approval-voting-parallel = { workspace = true, default-features = true } polkadot-approval-distribution = { workspace = true, default-features = true } sp-consensus-babe = { workspace = true, default-features = true } sp-runtime = { workspace = true } diff --git a/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml b/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml index cae1a30914da..1423d324df3f 100644 --- a/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml +++ b/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml @@ -9,6 +9,7 @@ TestConfiguration: coalesce_tranche_diff: 12 num_no_shows_per_candidate: 10 workdir_prefix: "/tmp/" + approval_voting_parallel_enabled: false n_validators: 500 n_cores: 100 min_pov_size: 1120 diff --git a/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml b/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml index 7edb48e302a4..87c6103a5d0a 100644 --- a/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml +++ b/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml @@ -9,6 +9,7 @@ TestConfiguration: coalesce_tranche_diff: 12 num_no_shows_per_candidate: 0 workdir_prefix: "/tmp" + approval_voting_parallel_enabled: true n_validators: 500 n_cores: 100 min_pov_size: 1120 diff --git a/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml b/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml index 7c24f50e6af5..5e2ea3817d17 100644 --- a/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml +++ b/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml @@ -8,6 +8,7 @@ TestConfiguration: stop_when_approved: true coalesce_tranche_diff: 12 num_no_shows_per_candidate: 0 + approval_voting_parallel_enabled: false workdir_prefix: "/tmp/" n_validators: 500 n_cores: 100 diff --git a/polkadot/node/subsystem-bench/src/lib/approval/helpers.rs b/polkadot/node/subsystem-bench/src/lib/approval/helpers.rs index 4b2b91696824..a3a475ac6b98 100644 --- a/polkadot/node/subsystem-bench/src/lib/approval/helpers.rs +++ b/polkadot/node/subsystem-bench/src/lib/approval/helpers.rs @@ -21,8 +21,11 @@ use polkadot_node_network_protocol::{ View, }; use polkadot_node_primitives::approval::time::{Clock, SystemClock, Tick}; +use polkadot_node_subsystem::messages::{ + ApprovalDistributionMessage, ApprovalVotingParallelMessage, +}; use polkadot_node_subsystem_types::messages::{ - network_bridge_event::NewGossipTopology, ApprovalDistributionMessage, NetworkBridgeEvent, + network_bridge_event::NewGossipTopology, NetworkBridgeEvent, }; use polkadot_overseer::AllMessages; use polkadot_primitives::{ @@ -121,6 +124,7 @@ pub fn generate_topology(test_authorities: &TestAuthorities) -> SessionGridTopol pub fn generate_new_session_topology( test_authorities: &TestAuthorities, test_node: ValidatorIndex, + approval_voting_parallel_enabled: bool, ) -> Vec { let topology = generate_topology(test_authorities); @@ -129,14 +133,29 @@ pub fn generate_new_session_topology( topology, local_index: Some(test_node), }); - vec![AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(event))] + vec![if approval_voting_parallel_enabled { + AllMessages::ApprovalVotingParallel(ApprovalVotingParallelMessage::NetworkBridgeUpdate( + event, + )) + } else { + AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(event)) + }] } /// Generates a peer view change for the passed `block_hash` -pub fn generate_peer_view_change_for(block_hash: Hash, peer_id: PeerId) -> AllMessages { +pub fn generate_peer_view_change_for( + block_hash: Hash, + peer_id: PeerId, + approval_voting_parallel_enabled: bool, +) -> AllMessages { let network = NetworkBridgeEvent::PeerViewChange(peer_id, View::new([block_hash], 0)); - - AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(network)) + if approval_voting_parallel_enabled { + AllMessages::ApprovalVotingParallel(ApprovalVotingParallelMessage::NetworkBridgeUpdate( + network, + )) + } else { + AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(network)) + } } /// Helper function to create a a signature for the block header. diff --git a/polkadot/node/subsystem-bench/src/lib/approval/mod.rs b/polkadot/node/subsystem-bench/src/lib/approval/mod.rs index 9d85039b8880..29ebc4a419ae 100644 --- a/polkadot/node/subsystem-bench/src/lib/approval/mod.rs +++ b/polkadot/node/subsystem-bench/src/lib/approval/mod.rs @@ -49,20 +49,21 @@ use itertools::Itertools; use orchestra::TimeoutExt; use overseer::{metrics::Metrics as OverseerMetrics, MetricsTrait}; use polkadot_approval_distribution::ApprovalDistribution; +use polkadot_node_core_approval_voting_parallel::ApprovalVotingParallelSubsystem; use polkadot_node_primitives::approval::time::{ slot_number_to_tick, tick_to_slot_number, Clock, ClockExt, SystemClock, }; use polkadot_node_core_approval_voting::{ - ApprovalVotingSubsystem, Config as ApprovalVotingConfig, Metrics as ApprovalVotingMetrics, - RealAssignmentCriteria, + ApprovalVotingSubsystem, Config as ApprovalVotingConfig, RealAssignmentCriteria, }; use polkadot_node_network_protocol::v3 as protocol_v3; use polkadot_node_primitives::approval::{self, v1::RelayVRFStory}; -use polkadot_node_subsystem::{overseer, AllMessages, Overseer, OverseerConnector, SpawnGlue}; +use polkadot_node_subsystem::{ + messages::{ApprovalDistributionMessage, ApprovalVotingMessage, ApprovalVotingParallelMessage}, + overseer, AllMessages, Overseer, OverseerConnector, SpawnGlue, +}; use polkadot_node_subsystem_test_helpers::mock::new_block_import_info; -use polkadot_node_subsystem_types::messages::{ApprovalDistributionMessage, ApprovalVotingMessage}; -use polkadot_node_subsystem_util::metrics::Metrics; use polkadot_overseer::Handle as OverseerHandleReal; use polkadot_primitives::{ BlockNumber, CandidateEvent, CandidateIndex, CandidateReceipt, Hash, Header, Slot, ValidatorId, @@ -138,6 +139,9 @@ pub struct ApprovalsOptions { /// The number of no shows per candidate #[clap(short, long, default_value_t = 0)] pub num_no_shows_per_candidate: u32, + /// Enable approval voting parallel. + #[clap(short, long, default_value_t = true)] + pub approval_voting_parallel_enabled: bool, } impl ApprovalsOptions { @@ -272,7 +276,7 @@ pub struct ApprovalTestState { /// Total unique sent messages. total_unique_messages: Arc, /// Approval voting metrics. - approval_voting_metrics: ApprovalVotingMetrics, + approval_voting_parallel_metrics: polkadot_node_core_approval_voting_parallel::Metrics, /// The delta ticks from the tick the messages were generated to the the time we start this /// message. delta_tick_from_generated: Arc, @@ -330,7 +334,10 @@ impl ApprovalTestState { total_sent_messages_from_node: Arc::new(AtomicU64::new(0)), total_unique_messages: Arc::new(AtomicU64::new(0)), options, - approval_voting_metrics: ApprovalVotingMetrics::try_register(&dependencies.registry) + approval_voting_parallel_metrics: + polkadot_node_core_approval_voting_parallel::Metrics::try_register( + &dependencies.registry, + ) .unwrap(), delta_tick_from_generated: Arc::new(AtomicU64::new(630720000)), configuration: configuration.clone(), @@ -456,6 +463,14 @@ impl ApprovalTestState { }) .collect() } + + fn subsystem_name(&self) -> &'static str { + if self.options.approval_voting_parallel_enabled { + "approval-voting-parallel-subsystem" + } else { + "approval-distribution-subsystem" + } + } } impl ApprovalTestState { @@ -597,13 +612,16 @@ impl PeerMessageProducer { // so when the approval-distribution answered to it, we know it doesn't have anything // else to process. let (tx, rx) = oneshot::channel(); - let msg = ApprovalDistributionMessage::GetApprovalSignatures(HashSet::new(), tx); - self.send_overseer_message( - AllMessages::ApprovalDistribution(msg), - ValidatorIndex(0), - None, - ) - .await; + let msg = if self.options.approval_voting_parallel_enabled { + AllMessages::ApprovalVotingParallel( + ApprovalVotingParallelMessage::GetApprovalSignatures(HashSet::new(), tx), + ) + } else { + AllMessages::ApprovalDistribution( + ApprovalDistributionMessage::GetApprovalSignatures(HashSet::new(), tx), + ) + }; + self.send_overseer_message(msg, ValidatorIndex(0), None).await; rx.await.expect("Failed to get signatures"); self.notify_done.send(()).expect("Failed to notify main loop"); gum::info!("All messages processed "); @@ -743,7 +761,11 @@ impl PeerMessageProducer { for validator in 1..self.state.test_authorities.validator_authority_id.len() as u32 { let peer_id = self.state.test_authorities.peer_ids.get(validator as usize).unwrap(); let validator = ValidatorIndex(validator); - let view_update = generate_peer_view_change_for(block_info.hash, *peer_id); + let view_update = generate_peer_view_change_for( + block_info.hash, + *peer_id, + self.state.options.approval_voting_parallel_enabled, + ); self.send_overseer_message(view_update, validator, None).await; } @@ -808,24 +830,12 @@ fn build_overseer( let system_clock = PastSystemClock::new(SystemClock {}, state.delta_tick_from_generated.clone()); - let approval_voting = ApprovalVotingSubsystem::with_config_and_clock( - TEST_CONFIG, - Arc::new(db), - Arc::new(keystore), - Box::new(TestSyncOracle {}), - state.approval_voting_metrics.clone(), - Arc::new(system_clock.clone()), - Arc::new(SpawnGlue(spawn_task_handle.clone())), - ); + let keystore = Arc::new(keystore); + let db = Arc::new(db); - let approval_distribution = ApprovalDistribution::new_with_clock( - Metrics::register(Some(&dependencies.registry)).unwrap(), - SLOT_DURATION_MILLIS, - Box::new(system_clock.clone()), - Arc::new(RealAssignmentCriteria {}), - ); let mock_chain_api = MockChainApi::new(state.build_chain_api_state()); - let mock_chain_selection = MockChainSelection { state: state.clone(), clock: system_clock }; + let mock_chain_selection = + MockChainSelection { state: state.clone(), clock: system_clock.clone() }; let mock_runtime_api = MockRuntimeApi::new( config.clone(), state.test_authorities.clone(), @@ -840,11 +850,14 @@ fn build_overseer( network_interface.subsystem_sender(), state.test_authorities.clone(), ); - let mock_rx_bridge = MockNetworkBridgeRx::new(network_receiver, None); + let mock_rx_bridge = MockNetworkBridgeRx::new( + network_receiver, + None, + state.options.approval_voting_parallel_enabled, + ); let overseer_metrics = OverseerMetrics::try_register(&dependencies.registry).unwrap(); - let dummy = dummy_builder!(spawn_task_handle, overseer_metrics) - .replace_approval_distribution(|_| approval_distribution) - .replace_approval_voting(|_| approval_voting) + let task_handle = spawn_task_handle.clone(); + let dummy = dummy_builder!(task_handle, overseer_metrics) .replace_chain_api(|_| mock_chain_api) .replace_chain_selection(|_| mock_chain_selection) .replace_runtime_api(|_| mock_runtime_api) @@ -853,8 +866,45 @@ fn build_overseer( .replace_availability_recovery(|_| MockAvailabilityRecovery::new()) .replace_candidate_validation(|_| MockCandidateValidation::new()); - let (overseer, raw_handle) = - dummy.build_with_connector(overseer_connector).expect("Should not fail"); + let (overseer, raw_handle) = if state.options.approval_voting_parallel_enabled { + let approval_voting_parallel = ApprovalVotingParallelSubsystem::with_config_and_clock( + TEST_CONFIG, + db.clone(), + keystore.clone(), + Box::new(TestSyncOracle {}), + state.approval_voting_parallel_metrics.clone(), + Arc::new(system_clock.clone()), + SpawnGlue(spawn_task_handle.clone()), + None, + ); + dummy + .replace_approval_voting_parallel(|_| approval_voting_parallel) + .build_with_connector(overseer_connector) + .expect("Should not fail") + } else { + let approval_voting = ApprovalVotingSubsystem::with_config_and_clock( + TEST_CONFIG, + db.clone(), + keystore.clone(), + Box::new(TestSyncOracle {}), + state.approval_voting_parallel_metrics.approval_voting_metrics(), + Arc::new(system_clock.clone()), + Arc::new(SpawnGlue(spawn_task_handle.clone())), + ); + + let approval_distribution = ApprovalDistribution::new_with_clock( + state.approval_voting_parallel_metrics.approval_distribution_metrics(), + TEST_CONFIG.slot_duration_millis, + Arc::new(system_clock.clone()), + Arc::new(RealAssignmentCriteria {}), + ); + + dummy + .replace_approval_voting(|_| approval_voting) + .replace_approval_distribution(|_| approval_distribution) + .build_with_connector(overseer_connector) + .expect("Should not fail") + }; let overseer_handle = OverseerHandleReal::new(raw_handle); (overseer, overseer_handle) @@ -943,11 +993,18 @@ pub async fn bench_approvals_run( // First create the initialization messages that make sure that then node under // tests receives notifications about the topology used and the connected peers. let mut initialization_messages = env.network().generate_peer_connected(|e| { - AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(e)) + if state.options.approval_voting_parallel_enabled { + AllMessages::ApprovalVotingParallel(ApprovalVotingParallelMessage::NetworkBridgeUpdate( + e, + )) + } else { + AllMessages::ApprovalDistribution(ApprovalDistributionMessage::NetworkBridgeUpdate(e)) + } }); initialization_messages.extend(generate_new_session_topology( &state.test_authorities, ValidatorIndex(NODE_UNDER_TEST), + state.options.approval_voting_parallel_enabled, )); for message in initialization_messages { env.send_message(message).await; @@ -1012,7 +1069,14 @@ pub async fn bench_approvals_run( state.total_sent_messages_to_node.load(std::sync::atomic::Ordering::SeqCst) as usize; env.wait_until_metric( "polkadot_parachain_subsystem_bounded_received", - Some(("subsystem_name", "approval-distribution-subsystem")), + Some(( + "subsystem_name", + if state.options.approval_voting_parallel_enabled { + "approval-voting-parallel-subsystem" + } else { + "approval-distribution-subsystem" + }, + )), |value| { gum::debug!(target: LOG_TARGET, ?value, ?at_least_messages, "Waiting metric"); value >= at_least_messages as f64 @@ -1029,11 +1093,22 @@ pub async fn bench_approvals_run( CandidateEvent::CandidateIncluded(receipt_fetch, _head, _, _) => { let (tx, rx) = oneshot::channel(); - let msg = ApprovalVotingMessage::GetApprovalSignaturesForCandidate( - receipt_fetch.hash(), - tx, - ); - env.send_message(AllMessages::ApprovalVoting(msg)).await; + let msg = if state.options.approval_voting_parallel_enabled { + AllMessages::ApprovalVotingParallel( + ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate( + receipt_fetch.hash(), + tx, + ), + ) + } else { + AllMessages::ApprovalVoting( + ApprovalVotingMessage::GetApprovalSignaturesForCandidate( + receipt_fetch.hash(), + tx, + ), + ) + }; + env.send_message(msg).await; let result = rx.await.unwrap(); @@ -1057,7 +1132,7 @@ pub async fn bench_approvals_run( state.total_sent_messages_to_node.load(std::sync::atomic::Ordering::SeqCst) as usize; env.wait_until_metric( "polkadot_parachain_subsystem_bounded_received", - Some(("subsystem_name", "approval-distribution-subsystem")), + Some(("subsystem_name", state.subsystem_name())), |value| { gum::debug!(target: LOG_TARGET, ?value, ?at_least_messages, "Waiting metric"); value >= at_least_messages as f64 @@ -1098,5 +1173,8 @@ pub async fn bench_approvals_run( state.total_unique_messages.load(std::sync::atomic::Ordering::SeqCst) ); - env.collect_resource_usage(&["approval-distribution", "approval-voting"]) + env.collect_resource_usage( + &["approval-distribution", "approval-voting", "approval-voting-parallel"], + true, + ) } diff --git a/polkadot/node/subsystem-bench/src/lib/availability/mod.rs b/polkadot/node/subsystem-bench/src/lib/availability/mod.rs index 32dc8ae2c8dc..f28adff315f8 100644 --- a/polkadot/node/subsystem-bench/src/lib/availability/mod.rs +++ b/polkadot/node/subsystem-bench/src/lib/availability/mod.rs @@ -210,7 +210,7 @@ pub fn prepare_test( state.test_authorities.clone(), ); let network_bridge_rx = - network_bridge::MockNetworkBridgeRx::new(network_receiver, Some(chunk_req_v2_cfg)); + network_bridge::MockNetworkBridgeRx::new(network_receiver, Some(chunk_req_v2_cfg), false); let runtime_api = MockRuntimeApi::new( state.config.clone(), @@ -372,7 +372,7 @@ pub async fn benchmark_availability_read( ); env.stop().await; - env.collect_resource_usage(&["availability-recovery"]) + env.collect_resource_usage(&["availability-recovery"], false) } pub async fn benchmark_availability_write( @@ -506,9 +506,8 @@ pub async fn benchmark_availability_write( ); env.stop().await; - env.collect_resource_usage(&[ - "availability-distribution", - "bitfield-distribution", - "availability-store", - ]) + env.collect_resource_usage( + &["availability-distribution", "bitfield-distribution", "availability-store"], + false, + ) } diff --git a/polkadot/node/subsystem-bench/src/lib/display.rs b/polkadot/node/subsystem-bench/src/lib/display.rs index b153d54a7c36..c47dd9a07900 100644 --- a/polkadot/node/subsystem-bench/src/lib/display.rs +++ b/polkadot/node/subsystem-bench/src/lib/display.rs @@ -96,6 +96,23 @@ pub struct TestMetric { value: f64, } +impl TestMetric { + pub fn name(&self) -> &str { + &self.name + } + + pub fn value(&self) -> f64 { + self.value + } + + pub fn label_value(&self, label_name: &str) -> Option<&str> { + self.label_names + .iter() + .position(|name| name == label_name) + .and_then(|index| self.label_values.get(index).map(|s| s.as_str())) + } +} + impl Display for TestMetric { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { write!( diff --git a/polkadot/node/subsystem-bench/src/lib/environment.rs b/polkadot/node/subsystem-bench/src/lib/environment.rs index a63f90da50b3..4de683ad6487 100644 --- a/polkadot/node/subsystem-bench/src/lib/environment.rs +++ b/polkadot/node/subsystem-bench/src/lib/environment.rs @@ -351,10 +351,14 @@ impl TestEnvironment { } } - pub fn collect_resource_usage(&self, subsystems_under_test: &[&str]) -> BenchmarkUsage { + pub fn collect_resource_usage( + &self, + subsystems_under_test: &[&str], + break_down_cpu_usage_per_task: bool, + ) -> BenchmarkUsage { BenchmarkUsage { network_usage: self.network_usage(), - cpu_usage: self.cpu_usage(subsystems_under_test), + cpu_usage: self.cpu_usage(subsystems_under_test, break_down_cpu_usage_per_task), } } @@ -378,7 +382,11 @@ impl TestEnvironment { ] } - fn cpu_usage(&self, subsystems_under_test: &[&str]) -> Vec { + fn cpu_usage( + &self, + subsystems_under_test: &[&str], + break_down_per_task: bool, + ) -> Vec { let test_metrics = super::display::parse_metrics(self.registry()); let mut usage = vec![]; let num_blocks = self.config().num_blocks as f64; @@ -392,6 +400,22 @@ impl TestEnvironment { total: total_cpu, per_block: total_cpu / num_blocks, }); + + if break_down_per_task { + for metric in subsystem_cpu_metrics.all() { + if metric.name() != "substrate_tasks_polling_duration_sum" { + continue; + } + + if let Some(task_name) = metric.label_value("task_name") { + usage.push(ResourceUsage { + resource_name: format!("{}/{}", subsystem, task_name), + total: metric.value(), + per_block: metric.value() / num_blocks, + }); + } + } + } } let test_env_cpu_metrics = diff --git a/polkadot/node/subsystem-bench/src/lib/mock/dummy.rs b/polkadot/node/subsystem-bench/src/lib/mock/dummy.rs index 8783b35f1c04..092a8fc5f4c1 100644 --- a/polkadot/node/subsystem-bench/src/lib/mock/dummy.rs +++ b/polkadot/node/subsystem-bench/src/lib/mock/dummy.rs @@ -96,5 +96,6 @@ mock!(NetworkBridgeTx); mock!(ChainApi); mock!(ChainSelection); mock!(ApprovalVoting); +mock!(ApprovalVotingParallel); mock!(ApprovalDistribution); mock!(RuntimeApi); diff --git a/polkadot/node/subsystem-bench/src/lib/mock/mod.rs b/polkadot/node/subsystem-bench/src/lib/mock/mod.rs index da4ac05e33b7..2ca47d9fc082 100644 --- a/polkadot/node/subsystem-bench/src/lib/mock/mod.rs +++ b/polkadot/node/subsystem-bench/src/lib/mock/mod.rs @@ -47,6 +47,7 @@ macro_rules! dummy_builder { // All subsystem except approval_voting and approval_distribution are mock subsystems. Overseer::builder() .approval_voting(MockApprovalVoting {}) + .approval_voting_parallel(MockApprovalVotingParallel {}) .approval_distribution(MockApprovalDistribution {}) .availability_recovery(MockAvailabilityRecovery {}) .candidate_validation(MockCandidateValidation {}) diff --git a/polkadot/node/subsystem-bench/src/lib/mock/network_bridge.rs b/polkadot/node/subsystem-bench/src/lib/mock/network_bridge.rs index d70953926d13..f5474a61e3dc 100644 --- a/polkadot/node/subsystem-bench/src/lib/mock/network_bridge.rs +++ b/polkadot/node/subsystem-bench/src/lib/mock/network_bridge.rs @@ -24,13 +24,13 @@ use crate::{ use futures::{channel::mpsc::UnboundedSender, FutureExt, StreamExt}; use polkadot_node_network_protocol::Versioned; use polkadot_node_subsystem::{ - messages::NetworkBridgeTxMessage, overseer, SpawnedSubsystem, SubsystemError, -}; -use polkadot_node_subsystem_types::{ messages::{ - ApprovalDistributionMessage, BitfieldDistributionMessage, NetworkBridgeEvent, - StatementDistributionMessage, + ApprovalDistributionMessage, ApprovalVotingParallelMessage, NetworkBridgeTxMessage, }, + overseer, SpawnedSubsystem, SubsystemError, +}; +use polkadot_node_subsystem_types::{ + messages::{BitfieldDistributionMessage, NetworkBridgeEvent, StatementDistributionMessage}, OverseerSignal, }; use sc_network::{request_responses::ProtocolConfig, RequestFailure}; @@ -57,6 +57,8 @@ pub struct MockNetworkBridgeRx { network_receiver: NetworkInterfaceReceiver, /// Chunk request sender chunk_request_sender: Option, + /// Approval voting parallel enabled. + approval_voting_parallel_enabled: bool, } impl MockNetworkBridgeTx { @@ -73,8 +75,9 @@ impl MockNetworkBridgeRx { pub fn new( network_receiver: NetworkInterfaceReceiver, chunk_request_sender: Option, + approval_voting_parallel_enabled: bool, ) -> MockNetworkBridgeRx { - Self { network_receiver, chunk_request_sender } + Self { network_receiver, chunk_request_sender, approval_voting_parallel_enabled } } } @@ -199,9 +202,15 @@ impl MockNetworkBridgeRx { Versioned::V3( polkadot_node_network_protocol::v3::ValidationProtocol::ApprovalDistribution(msg) ) => { - ctx.send_message( - ApprovalDistributionMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage(peer_id, polkadot_node_network_protocol::Versioned::V3(msg))) - ).await; + if self.approval_voting_parallel_enabled { + ctx.send_message( + ApprovalVotingParallelMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage(peer_id, polkadot_node_network_protocol::Versioned::V3(msg))) + ).await; + } else { + ctx.send_message( + ApprovalDistributionMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage(peer_id, polkadot_node_network_protocol::Versioned::V3(msg))) + ).await; + } } Versioned::V3( polkadot_node_network_protocol::v3::ValidationProtocol::StatementDistribution(msg) diff --git a/polkadot/node/subsystem-bench/src/lib/statement/mod.rs b/polkadot/node/subsystem-bench/src/lib/statement/mod.rs index bd47505f56ae..e2d50f285687 100644 --- a/polkadot/node/subsystem-bench/src/lib/statement/mod.rs +++ b/polkadot/node/subsystem-bench/src/lib/statement/mod.rs @@ -135,7 +135,8 @@ fn build_overseer( network_interface.subsystem_sender(), state.test_authorities.clone(), ); - let network_bridge_rx = MockNetworkBridgeRx::new(network_receiver, Some(candidate_req_cfg)); + let network_bridge_rx = + MockNetworkBridgeRx::new(network_receiver, Some(candidate_req_cfg), false); let dummy = dummy_builder!(spawn_task_handle, overseer_metrics) .replace_runtime_api(|_| mock_runtime_api) @@ -445,5 +446,5 @@ pub async fn benchmark_statement_distribution( ); env.stop().await; - env.collect_resource_usage(&["statement-distribution"]) + env.collect_resource_usage(&["statement-distribution"], false) } diff --git a/polkadot/node/subsystem-bench/src/lib/usage.rs b/polkadot/node/subsystem-bench/src/lib/usage.rs index 883e9aa7ad0a..5f691ae2db39 100644 --- a/polkadot/node/subsystem-bench/src/lib/usage.rs +++ b/polkadot/node/subsystem-bench/src/lib/usage.rs @@ -32,14 +32,14 @@ impl std::fmt::Display for BenchmarkUsage { write!( f, "\n{}\n{}\n\n{}\n{}\n", - format!("{:<32}{:>12}{:>12}", "Network usage, KiB", "total", "per block").blue(), + format!("{:<64}{:>12}{:>12}", "Network usage, KiB", "total", "per block").blue(), self.network_usage .iter() .map(|v| v.to_string()) .sorted() .collect::>() .join("\n"), - format!("{:<32}{:>12}{:>12}", "CPU usage, seconds", "total", "per block").blue(), + format!("{:<64}{:>12}{:>12}", "CPU usage, seconds", "total", "per block").blue(), self.cpu_usage .iter() .map(|v| v.to_string()) @@ -134,7 +134,7 @@ pub struct ResourceUsage { impl std::fmt::Display for ResourceUsage { fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { - write!(f, "{:<32}{:>12.4}{:>12.4}", self.resource_name.cyan(), self.total, self.per_block) + write!(f, "{:<64}{:>12.4}{:>12.4}", self.resource_name.cyan(), self.total, self.per_block) } } diff --git a/polkadot/node/subsystem-types/src/messages.rs b/polkadot/node/subsystem-types/src/messages.rs index 854a9da158be..fafc700e739f 100644 --- a/polkadot/node/subsystem-types/src/messages.rs +++ b/polkadot/node/subsystem-types/src/messages.rs @@ -955,6 +955,103 @@ pub struct BlockDescription { pub candidates: Vec, } +/// Message to the approval voting parallel subsystem running both approval-distribution and +/// approval-voting logic in parallel. This is a combination of all the messages ApprovalVoting and +/// ApprovalDistribution subsystems can receive. +/// +/// The reason this exists is, so that we can keep both modes of running in the same polkadot +/// binary, based on the value of `--approval-voting-parallel-enabled`, we decide if we run with two +/// different subsystems for approval-distribution and approval-voting or run the approval-voting +/// parallel which has several parallel workers for the approval-distribution and a worker for +/// approval-voting. +/// +/// This is meant to be a temporary state until we can safely remove running the two subsystems +/// individually. +#[derive(Debug, derive_more::From)] +pub enum ApprovalVotingParallelMessage { + /// Gets mapped into `ApprovalVotingMessage::ApprovedAncestor` + ApprovedAncestor(Hash, BlockNumber, oneshot::Sender>), + + /// Gets mapped into `ApprovalVotingMessage::GetApprovalSignaturesForCandidate` + GetApprovalSignaturesForCandidate( + CandidateHash, + oneshot::Sender, ValidatorSignature)>>, + ), + /// Gets mapped into `ApprovalDistributionMessage::NewBlocks` + NewBlocks(Vec), + /// Gets mapped into `ApprovalDistributionMessage::DistributeAssignment` + DistributeAssignment(IndirectAssignmentCertV2, CandidateBitfield), + /// Gets mapped into `ApprovalDistributionMessage::DistributeApproval` + DistributeApproval(IndirectSignedApprovalVoteV2), + /// An update from the network bridge, gets mapped into + /// `ApprovalDistributionMessage::NetworkBridgeUpdate` + #[from] + NetworkBridgeUpdate(NetworkBridgeEvent), + + /// Gets mapped into `ApprovalDistributionMessage::GetApprovalSignatures` + GetApprovalSignatures( + HashSet<(Hash, CandidateIndex)>, + oneshot::Sender, ValidatorSignature)>>, + ), + /// Gets mapped into `ApprovalDistributionMessage::ApprovalCheckingLagUpdate` + ApprovalCheckingLagUpdate(BlockNumber), +} + +impl TryFrom for ApprovalVotingMessage { + type Error = (); + + fn try_from(msg: ApprovalVotingParallelMessage) -> Result { + match msg { + ApprovalVotingParallelMessage::ApprovedAncestor(hash, number, tx) => + Ok(ApprovalVotingMessage::ApprovedAncestor(hash, number, tx)), + ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate(candidate, tx) => + Ok(ApprovalVotingMessage::GetApprovalSignaturesForCandidate(candidate, tx)), + _ => Err(()), + } + } +} + +impl TryFrom for ApprovalDistributionMessage { + type Error = (); + + fn try_from(msg: ApprovalVotingParallelMessage) -> Result { + match msg { + ApprovalVotingParallelMessage::NewBlocks(blocks) => + Ok(ApprovalDistributionMessage::NewBlocks(blocks)), + ApprovalVotingParallelMessage::DistributeAssignment(assignment, claimed_cores) => + Ok(ApprovalDistributionMessage::DistributeAssignment(assignment, claimed_cores)), + ApprovalVotingParallelMessage::DistributeApproval(vote) => + Ok(ApprovalDistributionMessage::DistributeApproval(vote)), + ApprovalVotingParallelMessage::NetworkBridgeUpdate(msg) => + Ok(ApprovalDistributionMessage::NetworkBridgeUpdate(msg)), + ApprovalVotingParallelMessage::GetApprovalSignatures(candidate_indicies, tx) => + Ok(ApprovalDistributionMessage::GetApprovalSignatures(candidate_indicies, tx)), + ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate(lag) => + Ok(ApprovalDistributionMessage::ApprovalCheckingLagUpdate(lag)), + _ => Err(()), + } + } +} + +impl From for ApprovalVotingParallelMessage { + fn from(msg: ApprovalDistributionMessage) -> Self { + match msg { + ApprovalDistributionMessage::NewBlocks(blocks) => + ApprovalVotingParallelMessage::NewBlocks(blocks), + ApprovalDistributionMessage::DistributeAssignment(cert, bitfield) => + ApprovalVotingParallelMessage::DistributeAssignment(cert, bitfield), + ApprovalDistributionMessage::DistributeApproval(vote) => + ApprovalVotingParallelMessage::DistributeApproval(vote), + ApprovalDistributionMessage::NetworkBridgeUpdate(msg) => + ApprovalVotingParallelMessage::NetworkBridgeUpdate(msg), + ApprovalDistributionMessage::GetApprovalSignatures(candidate_indicies, tx) => + ApprovalVotingParallelMessage::GetApprovalSignatures(candidate_indicies, tx), + ApprovalDistributionMessage::ApprovalCheckingLagUpdate(lag) => + ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate(lag), + } + } +} + /// Response type to `ApprovalVotingMessage::ApprovedAncestor`. #[derive(Clone, Debug)] pub struct HighestApprovedAncestorBlock { diff --git a/polkadot/node/test/service/src/lib.rs b/polkadot/node/test/service/src/lib.rs index b12387884861..f879aa93df9f 100644 --- a/polkadot/node/test/service/src/lib.rs +++ b/polkadot/node/test/service/src/lib.rs @@ -101,6 +101,7 @@ pub fn new_full( execute_workers_max_num: None, prepare_workers_hard_max_num: None, prepare_workers_soft_max_num: None, + enable_approval_voting_parallel: false, }, ), sc_network::config::NetworkBackendType::Litep2p => @@ -123,6 +124,7 @@ pub fn new_full( execute_workers_max_num: None, prepare_workers_hard_max_num: None, prepare_workers_soft_max_num: None, + enable_approval_voting_parallel: false, }, ), } diff --git a/polkadot/parachain/test-parachains/adder/collator/src/main.rs b/polkadot/parachain/test-parachains/adder/collator/src/main.rs index e8588274df27..4660b4d38f7c 100644 --- a/polkadot/parachain/test-parachains/adder/collator/src/main.rs +++ b/polkadot/parachain/test-parachains/adder/collator/src/main.rs @@ -98,6 +98,7 @@ fn main() -> Result<()> { execute_workers_max_num: None, prepare_workers_hard_max_num: None, prepare_workers_soft_max_num: None, + enable_approval_voting_parallel: false, }, ) .map_err(|e| e.to_string())?; diff --git a/polkadot/parachain/test-parachains/undying/collator/src/main.rs b/polkadot/parachain/test-parachains/undying/collator/src/main.rs index 7198a831a477..3dfa714e6d16 100644 --- a/polkadot/parachain/test-parachains/undying/collator/src/main.rs +++ b/polkadot/parachain/test-parachains/undying/collator/src/main.rs @@ -100,6 +100,7 @@ fn main() -> Result<()> { execute_workers_max_num: None, prepare_workers_hard_max_num: None, prepare_workers_soft_max_num: None, + enable_approval_voting_parallel: false, }, ) .map_err(|e| e.to_string())?; diff --git a/polkadot/roadmap/implementers-guide/src/node/approval/approval-voting-parallel.md b/polkadot/roadmap/implementers-guide/src/node/approval/approval-voting-parallel.md new file mode 100644 index 000000000000..84661b7bf9b3 --- /dev/null +++ b/polkadot/roadmap/implementers-guide/src/node/approval/approval-voting-parallel.md @@ -0,0 +1,30 @@ +# Approval voting parallel + +The approval-voting-parallel subsystem acts as an orchestrator for the tasks handled by the [Approval Voting](approval-voting.md) +and [Approval Distribution](approval-distribution.md) subsystems. Initially, these two systems operated separately and interacted +with each other and other subsystems through orchestra. + +With approval-voting-parallel, we have a single subsystem that creates two types of workers: +- Four approval-distribution workers that operate in parallel, each handling tasks based on the validator_index of the message + originator. +- One approval-voting worker that performs the tasks previously managed by the standalone approval-voting subsystem. + +This subsystem does not maintain any state. Instead, it functions as an orchestrator that: +- Spawns and initializes each workers. +- Forwards each message and signal to the appropriate worker. +- Aggregates results for messages that require input from more than one worker, such as GetApprovalSignatures. + +## Forwarding logic + +The messages received and forwarded by approval-voting-parallel split in three categories: +- Signals which need to be forwarded to all workers. +- Messages that only the `approval-voting` worker needs to handle, `ApprovalVotingParallelMessage::ApprovedAncestor` + and `ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate` +- Control messages that all `approval-distribution` workers need to receive `ApprovalVotingParallelMessage::NewBlocks`, + `ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate` and all network bridge variants `ApprovalVotingParallelMessage::NetworkBridgeUpdate` + except `ApprovalVotingParallelMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage)` +- Data messages `ApprovalVotingParallelMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage)` which need to be sent + just to a single `approval-distribution` worker based on the ValidatorIndex. The logic for assigning the work is: + ``` + assigned_worker_index = validator_index % number_of_workers; + ``` diff --git a/polkadot/zombienet_tests/functional/0009-approval-voting-coalescing.toml b/polkadot/zombienet_tests/functional/0009-approval-voting-coalescing.toml index 19c7015403d7..113de0e73aa1 100644 --- a/polkadot/zombienet_tests/functional/0009-approval-voting-coalescing.toml +++ b/polkadot/zombienet_tests/functional/0009-approval-voting-coalescing.toml @@ -18,7 +18,7 @@ requests = { memory = "2G", cpu = "1" } [[relaychain.node_groups]] name = "alice" - args = [ "-lparachain=trace,runtime=debug" ] + args = [ "-lparachain=debug,runtime=debug" ] count = 13 [[parachains]] diff --git a/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.toml b/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.toml new file mode 100644 index 000000000000..c035e23639c1 --- /dev/null +++ b/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.toml @@ -0,0 +1,120 @@ +[settings] +timeout = 1000 + +[relaychain] +default_image = "{{ZOMBIENET_INTEGRATION_TEST_IMAGE}}" +chain = "rococo-local" + +[relaychain.genesis.runtimeGenesis.patch.configuration.config] + needed_approvals = 4 + relay_vrf_modulo_samples = 2 + +[relaychain.genesis.runtimeGenesis.patch.configuration.config.approval_voting_params] + max_approval_coalesce_count = 5 + +[relaychain.default_resources] +limits = { memory = "4G", cpu = "2" } +requests = { memory = "2G", cpu = "1" } + + [[relaychain.node_groups]] + name = "alice" + args = ["-lparachain=debug,runtime=debug", "--enable-approval-voting-parallel"] + count = 8 + + [[relaychain.node_groups]] + name = "bob" + args = ["-lparachain=debug,runtime=debug"] + count = 7 + +[[parachains]] +id = 2000 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=1" + + [parachains.collator] + name = "collator01" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--pvf-complexity=1", "--parachain-id=2000"] + +[[parachains]] +id = 2001 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=10" + + [parachains.collator] + name = "collator02" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--parachain-id=2001", "--pvf-complexity=10"] + +[[parachains]] +id = 2002 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=100" + + [parachains.collator] + name = "collator03" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--parachain-id=2002", "--pvf-complexity=100"] + +[[parachains]] +id = 2003 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=20000 --pvf-complexity=300" + + [parachains.collator] + name = "collator04" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=20000", "--parachain-id=2003", "--pvf-complexity=300"] + +[[parachains]] +id = 2004 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=300" + + [parachains.collator] + name = "collator05" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--parachain-id=2004", "--pvf-complexity=300"] + +[[parachains]] +id = 2005 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=20000 --pvf-complexity=400" + + [parachains.collator] + name = "collator06" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=20000", "--pvf-complexity=400", "--parachain-id=2005"] + +[[parachains]] +id = 2006 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=300" + + [parachains.collator] + name = "collator07" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--pvf-complexity=300", "--parachain-id=2006"] + +[[parachains]] +id = 2007 +addToGenesis = true +genesis_state_generator = "undying-collator export-genesis-state --pov-size=100000 --pvf-complexity=300" + + [parachains.collator] + name = "collator08" + image = "{{COL_IMAGE}}" + command = "undying-collator" + args = ["-lparachain=debug", "--pov-size=100000", "--pvf-complexity=300", "--parachain-id=2007"] + +[types.Header] +number = "u64" +parent_hash = "Hash" +post_state = "Hash" \ No newline at end of file diff --git a/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.zndsl b/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.zndsl new file mode 100644 index 000000000000..d70707747474 --- /dev/null +++ b/polkadot/zombienet_tests/functional/0016-approval-voting-parallel.zndsl @@ -0,0 +1,35 @@ +Description: Check finality works with approval voting parallel enabled +Network: ./0016-approval-voting-parallel.toml +Creds: config + +# Check authority status. +alice: reports node_roles is 4 + +# Ensure parachains are registered. +alice: parachain 2000 is registered within 60 seconds +alice: parachain 2001 is registered within 60 seconds +alice: parachain 2002 is registered within 60 seconds +alice: parachain 2003 is registered within 60 seconds +alice: parachain 2004 is registered within 60 seconds +alice: parachain 2005 is registered within 60 seconds +alice: parachain 2006 is registered within 60 seconds +alice: parachain 2007 is registered within 60 seconds + +# Ensure parachains made progress. +alice: parachain 2000 block height is at least 10 within 300 seconds +alice: parachain 2001 block height is at least 10 within 300 seconds +alice: parachain 2002 block height is at least 10 within 300 seconds +alice: parachain 2003 block height is at least 10 within 300 seconds +alice: parachain 2004 block height is at least 10 within 300 seconds +alice: parachain 2005 block height is at least 10 within 300 seconds +alice: parachain 2006 block height is at least 10 within 300 seconds +alice: parachain 2007 block height is at least 10 within 300 seconds + +alice: reports substrate_block_height{status="finalized"} is at least 30 within 180 seconds +bob: reports substrate_block_height{status="finalized"} is at least 30 within 180 seconds + +alice: reports polkadot_parachain_approval_checking_finality_lag < 3 +bob: reports polkadot_parachain_approval_checking_finality_lag < 3 + +alice: reports polkadot_parachain_approvals_no_shows_total < 3 within 10 seconds +bob: reports polkadot_parachain_approvals_no_shows_total < 3 within 10 seconds diff --git a/prdoc/pr_4849.prdoc b/prdoc/pr_4849.prdoc new file mode 100644 index 000000000000..185295151068 --- /dev/null +++ b/prdoc/pr_4849.prdoc @@ -0,0 +1,47 @@ +title: Introduce approval-voting-parallel subsystem + +doc: + - audience: Node Dev + description: | + This introduces a new subsystem called approval-voting-parallel. It combines the tasks + previously handled by the approval-voting and approval-distribution subsystems. + + The new subsystem is enabled by default on all test networks. On production networks + like Polkadot and Kusama, the legacy system with two separate subsystems is still in use. + However, there is a CLI option --enable-approval-voting-parallel to gradually roll out + the new subsystem on specific nodes. Once we are confident that it works as expected, + it will be enabled by default on all networks. + + The approval-voting-parallel subsystem coordinates two groups of workers: + - Four approval-distribution workers that operate in parallel, each handling tasks based + on the validator_index of the message originator. + - One approval-voting worker that performs the tasks previously managed by the standalone + approval-voting subsystem. + +crates: + - name: polkadot-overseer + bump: major + - name: polkadot-node-primitives + bump: major + - name: polkadot-node-subsystem-types + bump: major + - name: polkadot-service + bump: major + - name: polkadot-approval-distribution + bump: major + - name: polkadot-node-core-approval-voting + bump: major + - name: polkadot-node-core-approval-voting-parallel + bump: major + - name: polkadot-network-bridge + bump: major + - name: polkadot-node-core-dispute-coordinator + bump: major + - name: cumulus-relay-chain-inprocess-interface + bump: major + - name: polkadot-cli + bump: major + - name: polkadot + bump: major + - name: polkadot-sdk + bump: minor diff --git a/umbrella/Cargo.toml b/umbrella/Cargo.toml index b7c1c375094f..83cbebbc61cd 100644 --- a/umbrella/Cargo.toml +++ b/umbrella/Cargo.toml @@ -600,7 +600,7 @@ runtime = [ "sp-wasm-interface", "sp-weights", ] -node = ["asset-test-utils", "bridge-hub-test-utils", "cumulus-client-cli", "cumulus-client-collator", "cumulus-client-consensus-aura", "cumulus-client-consensus-common", "cumulus-client-consensus-proposer", "cumulus-client-consensus-relay-chain", "cumulus-client-network", "cumulus-client-parachain-inherent", "cumulus-client-pov-recovery", "cumulus-client-service", "cumulus-relay-chain-inprocess-interface", "cumulus-relay-chain-interface", "cumulus-relay-chain-minimal-node", "cumulus-relay-chain-rpc-interface", "cumulus-test-relay-sproof-builder", "emulated-integration-tests-common", "fork-tree", "frame-benchmarking-cli", "frame-remote-externalities", "frame-support-procedural-tools", "generate-bags", "mmr-gadget", "mmr-rpc", "pallet-contracts-mock-network", "pallet-revive-mock-network", "pallet-transaction-payment-rpc", "parachains-runtimes-test-utils", "polkadot-approval-distribution", "polkadot-availability-bitfield-distribution", "polkadot-availability-distribution", "polkadot-availability-recovery", "polkadot-cli", "polkadot-collator-protocol", "polkadot-dispute-distribution", "polkadot-erasure-coding", "polkadot-gossip-support", "polkadot-network-bridge", "polkadot-node-collation-generation", "polkadot-node-core-approval-voting", "polkadot-node-core-av-store", "polkadot-node-core-backing", "polkadot-node-core-bitfield-signing", "polkadot-node-core-candidate-validation", "polkadot-node-core-chain-api", "polkadot-node-core-chain-selection", "polkadot-node-core-dispute-coordinator", "polkadot-node-core-parachains-inherent", "polkadot-node-core-prospective-parachains", "polkadot-node-core-provisioner", "polkadot-node-core-pvf", "polkadot-node-core-pvf-checker", "polkadot-node-core-pvf-common", "polkadot-node-core-pvf-execute-worker", "polkadot-node-core-pvf-prepare-worker", "polkadot-node-core-runtime-api", "polkadot-node-jaeger", "polkadot-node-metrics", "polkadot-node-network-protocol", "polkadot-node-primitives", "polkadot-node-subsystem", "polkadot-node-subsystem-types", "polkadot-node-subsystem-util", "polkadot-overseer", "polkadot-parachain-lib", "polkadot-rpc", "polkadot-service", "polkadot-statement-distribution", "polkadot-statement-table", "sc-allocator", "sc-authority-discovery", "sc-basic-authorship", "sc-block-builder", "sc-chain-spec", "sc-cli", "sc-client-api", "sc-client-db", "sc-consensus", "sc-consensus-aura", "sc-consensus-babe", "sc-consensus-babe-rpc", "sc-consensus-beefy", "sc-consensus-beefy-rpc", "sc-consensus-epochs", "sc-consensus-grandpa", "sc-consensus-grandpa-rpc", "sc-consensus-manual-seal", "sc-consensus-pow", "sc-consensus-slots", "sc-executor", "sc-executor-common", "sc-executor-polkavm", "sc-executor-wasmtime", "sc-informant", "sc-keystore", "sc-mixnet", "sc-network", "sc-network-common", "sc-network-gossip", "sc-network-light", "sc-network-statement", "sc-network-sync", "sc-network-transactions", "sc-network-types", "sc-offchain", "sc-proposer-metrics", "sc-rpc", "sc-rpc-api", "sc-rpc-server", "sc-rpc-spec-v2", "sc-service", "sc-state-db", "sc-statement-store", "sc-storage-monitor", "sc-sync-state-rpc", "sc-sysinfo", "sc-telemetry", "sc-tracing", "sc-transaction-pool", "sc-transaction-pool-api", "sc-utils", "snowbridge-runtime-test-common", "sp-blockchain", "sp-consensus", "sp-core-hashing", "sp-core-hashing-proc-macro", "sp-database", "sp-maybe-compressed-blob", "sp-panic-handler", "sp-rpc", "staging-chain-spec-builder", "staging-node-inspect", "staging-tracking-allocator", "std", "subkey", "substrate-build-script-utils", "substrate-frame-rpc-support", "substrate-frame-rpc-system", "substrate-prometheus-endpoint", "substrate-rpc-client", "substrate-state-trie-migration-rpc", "substrate-wasm-builder", "tracing-gum", "xcm-emulator", "xcm-simulator"] +node = ["asset-test-utils", "bridge-hub-test-utils", "cumulus-client-cli", "cumulus-client-collator", "cumulus-client-consensus-aura", "cumulus-client-consensus-common", "cumulus-client-consensus-proposer", "cumulus-client-consensus-relay-chain", "cumulus-client-network", "cumulus-client-parachain-inherent", "cumulus-client-pov-recovery", "cumulus-client-service", "cumulus-relay-chain-inprocess-interface", "cumulus-relay-chain-interface", "cumulus-relay-chain-minimal-node", "cumulus-relay-chain-rpc-interface", "cumulus-test-relay-sproof-builder", "emulated-integration-tests-common", "fork-tree", "frame-benchmarking-cli", "frame-remote-externalities", "frame-support-procedural-tools", "generate-bags", "mmr-gadget", "mmr-rpc", "pallet-contracts-mock-network", "pallet-revive-mock-network", "pallet-transaction-payment-rpc", "parachains-runtimes-test-utils", "polkadot-approval-distribution", "polkadot-availability-bitfield-distribution", "polkadot-availability-distribution", "polkadot-availability-recovery", "polkadot-cli", "polkadot-collator-protocol", "polkadot-dispute-distribution", "polkadot-erasure-coding", "polkadot-gossip-support", "polkadot-network-bridge", "polkadot-node-collation-generation", "polkadot-node-core-approval-voting", "polkadot-node-core-approval-voting-parallel", "polkadot-node-core-av-store", "polkadot-node-core-backing", "polkadot-node-core-bitfield-signing", "polkadot-node-core-candidate-validation", "polkadot-node-core-chain-api", "polkadot-node-core-chain-selection", "polkadot-node-core-dispute-coordinator", "polkadot-node-core-parachains-inherent", "polkadot-node-core-prospective-parachains", "polkadot-node-core-provisioner", "polkadot-node-core-pvf", "polkadot-node-core-pvf-checker", "polkadot-node-core-pvf-common", "polkadot-node-core-pvf-execute-worker", "polkadot-node-core-pvf-prepare-worker", "polkadot-node-core-runtime-api", "polkadot-node-jaeger", "polkadot-node-metrics", "polkadot-node-network-protocol", "polkadot-node-primitives", "polkadot-node-subsystem", "polkadot-node-subsystem-types", "polkadot-node-subsystem-util", "polkadot-overseer", "polkadot-parachain-lib", "polkadot-rpc", "polkadot-service", "polkadot-statement-distribution", "polkadot-statement-table", "sc-allocator", "sc-authority-discovery", "sc-basic-authorship", "sc-block-builder", "sc-chain-spec", "sc-cli", "sc-client-api", "sc-client-db", "sc-consensus", "sc-consensus-aura", "sc-consensus-babe", "sc-consensus-babe-rpc", "sc-consensus-beefy", "sc-consensus-beefy-rpc", "sc-consensus-epochs", "sc-consensus-grandpa", "sc-consensus-grandpa-rpc", "sc-consensus-manual-seal", "sc-consensus-pow", "sc-consensus-slots", "sc-executor", "sc-executor-common", "sc-executor-polkavm", "sc-executor-wasmtime", "sc-informant", "sc-keystore", "sc-mixnet", "sc-network", "sc-network-common", "sc-network-gossip", "sc-network-light", "sc-network-statement", "sc-network-sync", "sc-network-transactions", "sc-network-types", "sc-offchain", "sc-proposer-metrics", "sc-rpc", "sc-rpc-api", "sc-rpc-server", "sc-rpc-spec-v2", "sc-service", "sc-state-db", "sc-statement-store", "sc-storage-monitor", "sc-sync-state-rpc", "sc-sysinfo", "sc-telemetry", "sc-tracing", "sc-transaction-pool", "sc-transaction-pool-api", "sc-utils", "snowbridge-runtime-test-common", "sp-blockchain", "sp-consensus", "sp-core-hashing", "sp-core-hashing-proc-macro", "sp-database", "sp-maybe-compressed-blob", "sp-panic-handler", "sp-rpc", "staging-chain-spec-builder", "staging-node-inspect", "staging-tracking-allocator", "std", "subkey", "substrate-build-script-utils", "substrate-frame-rpc-support", "substrate-frame-rpc-system", "substrate-prometheus-endpoint", "substrate-rpc-client", "substrate-state-trie-migration-rpc", "substrate-wasm-builder", "tracing-gum", "xcm-emulator", "xcm-simulator"] tuples-96 = [ "frame-support-procedural?/tuples-96", "frame-support?/tuples-96", @@ -1967,6 +1967,11 @@ path = "../polkadot/node/core/approval-voting" default-features = false optional = true +[dependencies.polkadot-node-core-approval-voting-parallel] +path = "../polkadot/node/core/approval-voting-parallel" +default-features = false +optional = true + [dependencies.polkadot-node-core-av-store] path = "../polkadot/node/core/av-store" default-features = false diff --git a/umbrella/src/lib.rs b/umbrella/src/lib.rs index b7b9c15fe588..4a653dab99b6 100644 --- a/umbrella/src/lib.rs +++ b/umbrella/src/lib.rs @@ -796,6 +796,10 @@ pub use polkadot_node_collation_generation; #[cfg(feature = "polkadot-node-core-approval-voting")] pub use polkadot_node_core_approval_voting; +/// Approval Voting Subsystem running approval work in parallel. +#[cfg(feature = "polkadot-node-core-approval-voting-parallel")] +pub use polkadot_node_core_approval_voting_parallel; + /// The Availability Store subsystem. Wrapper over the DB that stores availability data and /// chunks. #[cfg(feature = "polkadot-node-core-av-store")]