feat(utxo): prioritize electrum connections #1966
base: dev
Conversation
Thank you for your effort!
The PR seems to be in its early stages right now. This is not a full review; I just did a quick pass to point out some areas that should not be forgotten.
Since this is still in progress, I left some comments that can help you improve the code.
I will do a more thorough review when this is ready for review!
Please also merge with latest dev and start adding doc comments.
mm2src/coins/utxo/rpc_clients.rs
Outdated
pub struct ElectrumClientImpl {
    weak_self: Mutex<Option<Weak<ElectrumClientImpl>>>,
This is a very bad design choice; a struct should never reference itself, to avoid any unstable behavior.
Why not just introduce a new struct? e.g.
struct ElectrumClientImplWeak(Mutex<Option<Weak<ElectrumClientImpl>>>)
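As a rough sketch of the suggested newtype (the field on `ElectrumClientImpl` and the helper methods here are illustrative, not the actual KDF code), the weak back-reference lives in its own type instead of inside the client struct itself:

```rust
use std::sync::{Arc, Mutex, Weak};

// Hypothetical stand-in for the real `ElectrumClientImpl`.
struct ElectrumClientImpl {
    servers: Vec<String>,
}

// The suggested newtype: holds the weak back-reference separately,
// so `ElectrumClientImpl` never references itself.
struct ElectrumClientImplWeak(Mutex<Option<Weak<ElectrumClientImpl>>>);

impl ElectrumClientImplWeak {
    fn new() -> Self {
        ElectrumClientImplWeak(Mutex::new(None))
    }

    // Record a weak reference to an already-constructed client.
    fn set(&self, client: &Arc<ElectrumClientImpl>) {
        *self.0.lock().unwrap() = Some(Arc::downgrade(client));
    }

    // Upgrade to a strong reference if the client is still alive.
    fn upgrade(&self) -> Option<Arc<ElectrumClientImpl>> {
        self.0.lock().unwrap().as_ref().and_then(Weak::upgrade)
    }
}
```

Because the newtype only ever holds a `Weak`, dropping the last `Arc<ElectrumClientImpl>` still frees the client; `upgrade()` then simply returns `None`.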
mm2src/coins/utxo/rpc_clients.rs
Outdated
struct ConnMng(Arc<ConnMngImpl>);

impl ConnMng {
    async fn suspend_server(self, address: String) -> Result<(), String> {
Since this is a new implementation, please start using specific error types and MmError
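For illustration only, a minimal sketch of what a specific error type could look like in place of `Result<(), String>` (the enum, its variants, and the `suspend_server` signature are hypothetical; the real code would additionally wrap such errors in KDF's `MmError` for call-site tracing):

```rust
use std::fmt;

// Hypothetical error type for the connection manager.
#[derive(Debug, PartialEq)]
enum ConnectionManagerErr {
    UnknownAddress(String),
    NotInitialized,
}

impl fmt::Display for ConnectionManagerErr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectionManagerErr::UnknownAddress(addr) => {
                write!(f, "unknown electrum server address: {}", addr)
            }
            ConnectionManagerErr::NotInitialized => {
                write!(f, "connection manager not initialized")
            }
        }
    }
}

// Illustrative: reject suspension of addresses the manager doesn't know about,
// returning a typed error instead of a bare String.
fn suspend_server(known: &[&str], address: &str) -> Result<(), ConnectionManagerErr> {
    if known.contains(&address) {
        Ok(())
    } else {
        Err(ConnectionManagerErr::UnknownAddress(address.to_string()))
    }
}
```

Typed variants let callers match on the failure kind instead of parsing a string.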
mm2src/coins/utxo/rpc_clients.rs
Outdated
spawner.spawn(async move {
    let state = conn_mng.0.guarded.lock().await;
    state.connecting.store(false, AtomicOrdering::Relaxed);
})
Can you please elaborate on this more? ConnectingStateCtx will be considered dropped before `connecting` is set to false if the lock was held by another operation for some time.
I certainly should have been more detailed! Thanks for pointing this out! This place is the heart of selective connectivity.
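The pattern under discussion can be sketched with a simplified, synchronous stand-in for `ConnectingStateCtx` (the names and the plain `AtomicBool` are illustrative; in the real code the reset happens in a spawned task because the state sits behind an async mutex, which is exactly where the "considered dropped before `connecting` is set to false" window comes from):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// While this guard is alive, the manager is considered to be connecting;
// dropping it clears the flag.
struct ConnectingGuard(Arc<AtomicBool>);

impl ConnectingGuard {
    fn new(flag: Arc<AtomicBool>) -> Self {
        flag.store(true, Ordering::Relaxed);
        ConnectingGuard(flag)
    }
}

impl Drop for ConnectingGuard {
    fn drop(&mut self) {
        // In the async version this store is deferred to a spawned task that
        // must first acquire the state lock, so observers can see the guard
        // as dropped while `connecting` is still true.
        self.0.store(false, Ordering::Relaxed);
    }
}
```

In the synchronous sketch the reset is immediate; the review question is precisely about what happens when it is not.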
mm2src/coins/utxo/rpc_clients.rs
Outdated
let address: String = {
    if address.is_some() {
        address.map(|internal| internal.to_string())
    } else {
        guard.active.as_ref().cloned()
    }
}?;
Suggested change — replace the if/else block above with:

let address = address.map_or_else(|| guard.active.as_ref().cloned(), |internal| Some(internal.to_string()))?;
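A quick check that the suggested one-liner behaves like the original if/else (the `pick_address` wrapper and its arguments are illustrative; `active` stands in for `guard.active`):

```rust
// Returns the explicit address if one was given, otherwise falls back to the
// currently active address, mirroring the original if/else logic.
fn pick_address(address: Option<&str>, active: &Option<String>) -> Option<String> {
    address.map_or_else(|| active.as_ref().cloned(), |internal| Some(internal.to_string()))
}
```

`Option::map_or_else` takes the fallback closure first and the mapping closure second, so the two branches of the original block map directly onto its two arguments.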
Force-pushed from 38adc56 to cfbeea0
splitting the big function into smaller ones and sharing the shareable parts between native and wasm
Next iteration! I wanted to finish the review completely this time, but decided to give it another look after the comments below are addressed!
Please fix failing CI steps. One thing to add to the tracking issue checklist (ref. #1966 (comment)) is an RPC to get the list of currently connected / used electrums, so we remember to add this in the future.
Force-pushed from 22c21e0 to 555b958
…s fail
This is kind of doing the connection manager's work as a last-resort fail-safe mechanism when an electrum request fails; that is, we establish all connections sequentially and try them. Things we need to consider:
- we probably need to introduce `wait_till_connected` back (now that we don't wait forever)
- we don't disconnect any of the established connections; they will disconnect automatically when not used for a long time, but during that time we might break the `max_connected` rule.
Let the background thread of the connection manager respect the `max_connected` threshold (not entirely; we can reach up to `max_connected + 1`). This is done by refactoring connection establishment to open only the connections that we know we will maintain for sure, instead of establishing all of them. This is also lighter overall than the old approach, but slower, since connections are tried sequentially. We don't need the speed here anyway: if an electrum request fails, we fall back to trying all connections, which is logic that isn't controlled by the connection manager in any way.
removed write_maintained_connections and added maintain & unmaintain methods.
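A rough sketch of what `maintain`/`unmaintain` bookkeeping with a `max_connected` cap might look like (the `MaintainedSet` type and its methods are hypothetical, not the actual implementation, which the commit notes can still overshoot to `max_connected + 1`):

```rust
use std::collections::HashSet;

// Hypothetical bookkeeping for the maintained-servers set.
struct MaintainedSet {
    max_connected: usize,
    servers: HashSet<String>,
}

impl MaintainedSet {
    // Try to mark a server as maintained; refuse if that would
    // grow the set past the `max_connected` threshold.
    fn maintain(&mut self, address: String) -> bool {
        if self.servers.len() >= self.max_connected && !self.servers.contains(&address) {
            return false; // would exceed `max_connected`
        }
        self.servers.insert(address);
        true
    }

    // Stop maintaining a server; returns whether it was maintained.
    fn unmaintain(&mut self, address: &str) -> bool {
        self.servers.remove(address)
    }
}
```

Centralizing the cap check in `maintain` is what lets the background thread stay (mostly) within the threshold without a separate `write_maintained_connections` pass.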
Force-pushed from 555b958 to f9866b1
Disconnecting like that broke a lot of requests; we probably shouldn't disconnect from some thread, since we don't know if another thread is holding the connection to use it.
Force-pushed from 5d03fe4 to 6638b6a
Thanks for the fixes! I reviewed the latest changes only; still one more review to go!
This connection manager can become a great crate in the future, to be used by bitcoin wallet devs or any other UTXO wallet dev; I don't think they will find a better electrum client than this. But in time they will just use KDF straight away :)
A few notes for you @mariocynicys / me to remember:
- I guess we need to merge chore(tests): don't use `.wait()` and use `block_on` instead (#2220) first.
- There is a failing test in CI that you should check: `mm2_tests::mm2_tests_inner::test_cancel_all_orders`
This reverts commit d98fd21. This should make a lot of tests fail again, proving we shouldn't just randomly disconnect a connection from different threads.
… checking
When doing version checking, the connection isn't marked as maintained yet, which means flagging it as not needed will disconnect it immediately. A fix for this is to keep a `Connecting` state, for example, to avoid disconnecting a connection that is in progress. An easier solution is to not ask for disconnection of connections that we didn't force-connect; so only if we caused the connection to connect will we call for it to be not needed anymore.
for the fallback case. But delegate whether to disconnect or not to the connection manager. This helps with concurrent requests disconnecting for no reason (so, no error) and breaking other requests.
This was used to make sure the background task has found a working connection by the time we start querying electrum requests. We don't need this now, though, because if there are no working/maintained connections we fall back to using any/all other connections (and we establish the connection inline first).
The test was failing because the expected request ID was 0 and not 1. This is because the server version request was constructed AFTER the get-verbose-transaction request. This might not always be the case: the background thread might in some (rare) instances be fast enough to establish the connection and query for the version before get-verbose-transaction is called, so ID 0 is then reserved for the server version query. Let's just remove the ID part from the expected string, as we don't really care about it.
to avoid potential starvation due to a busy-waiting loop
Force-pushed from 02a5d69 to b546bbd
This currently isn't really problematic, but if re-used later it could cause hard-to-debug nightmares. The solution here would be to totally order both operations, favoring abortion over finishing. In `AbortableQueue` we would need to realize this by checking the future ID in question and not calling `on_future_finished` if it's already aborted.
This was the side effect we had (described in the previous commit) in the abortable queue: on_future_finished was called AFTER the abortable queue had been fully cleared and reset. We need to first make sure that an abort handle for the future actually exists in the abort handlers vector and that the future is aborted (i.e. the same future ID wasn't used to run a new future that is still running).
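The described fix can be modeled with a generation counter, so a stale `on_future_finished` call (issued before an abort/reset, possibly after its ID was reused) is ignored. This is an illustrative model, not the actual `AbortableQueue` code:

```rust
use std::collections::HashMap;

// A token identifying one particular spawn of a future.
struct FinishToken {
    id: usize,
    generation: u64,
}

// Toy model of the queue's bookkeeping: future ID -> generation it was spawned in.
struct AbortableQueueSketch {
    running: HashMap<usize, u64>,
    generation: u64,
    next_id: usize,
}

impl AbortableQueueSketch {
    fn new() -> Self {
        AbortableQueueSketch { running: HashMap::new(), generation: 0, next_id: 0 }
    }

    fn spawn(&mut self) -> FinishToken {
        let id = self.next_id;
        self.next_id += 1;
        self.running.insert(id, self.generation);
        FinishToken { id, generation: self.generation }
    }

    // Abort everything and reset; future IDs may be handed out again afterwards.
    fn abort_all(&mut self) {
        self.running.clear();
        self.generation += 1;
        self.next_id = 0;
    }

    // The fix: only clear the slot if this exact spawn is still the one recorded,
    // i.e. the ID exists AND belongs to the same generation as the token.
    fn on_future_finished(&mut self, token: &FinishToken) -> bool {
        match self.running.get(&token.id).copied() {
            Some(gen) if gen == token.generation => {
                self.running.remove(&token.id);
                true
            }
            _ => false, // stale token: queue was aborted/reset, or the ID was reused
        }
    }
}
```

The generation check is one way to "totally order" abortion over finishing: any finish notification from before the reset can no longer touch state owned by a newer future.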
Force-pushed from c5e75d9 to 99beb7e
Force-pushed from 5777be3 to 2f9c568
This PR refactors the electrum part of the RPC clients, and also introduces a new feature related to connection management decisions.

Within the electrum enable request, one can now specify two parameters (`min_connected` & `max_connected`):
- `min_connected`: KDF needs to keep at least `min_connected` working active (maintained) connections at all times.
- `max_connected`: KDF needs to keep at most `max_connected` working active (maintained) connections at any time.

For a `selective`/single-server effect, set these parameters to (1, 1). For a `multiple`/all-servers effect, set these parameters to (1, len(servers)), or don't set them at all, as this is used as the default. In that case we may establish up to `len(servers)` connections and keep them maintained. As some connections get lost for whatever reason, they will not be retried right away. Retries only take place when we fall below `min_connected`, which is `1` in this case.

As an example with larger limits, say `min_connected` is `10` and `max_connected` is `68`: we want to maintain at least `10` servers at all times, and if we ever fall below `10` we fire another round of reconnections, after which we are guaranteed to have <= `68` connections. Say we have `10` connections now, we lose `1` and have `9`. A round of reconnections starts and we end up with some number of active connections between `9` (assuming only our 9 connected servers were the non-faulty ones) and `68`. If after the reconnections round we still fall below `10` = `min_connected` connections, we keep retrying.

The parameters must satisfy: 0 < `min_connected` <= `max_connected` <= `len(servers)`.

The connection manager maintains these maintained (working active) connections, and these are the ones used for any electrum request (they are all used concurrently, and whichever returns a success result first is considered the response for that request).

For requests directed at specific servers (like server_version, get_block_count_from, etc.), the connections to these servers are established first if they are not in the maintained servers set, and then the request is performed. So even if a server isn't maintained/active, it can still be queried.

The servers' order in the electrum enable request encodes an implicit priority (previously in this PR, this was encoded as `primary` & `secondary`). Servers that appear first in the list are assumed to be more robust, and thus have higher priority compared to the servers after them. Since we do an altruistic round of retries every 10 minutes, if we ever had the case where a high-priority server was down (and thus wasn't maintained by us) and is now back online, we will prefer maintaining it over another already-maintained low-priority server (this is assuming the maintained servers set is full, i.e. `len(maintained servers) == min_connected`). Simply put, high-priority servers will kick out low-priority ones when they come back online if we have no room to fit both.

Addresses #791
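The parameter rules above can be sketched as two small checks (function names are illustrative, not the actual KDF API):

```rust
// Validate 0 < min_connected <= max_connected <= len(servers).
fn validate_limits(min_connected: usize, max_connected: usize, num_servers: usize) -> bool {
    0 < min_connected && min_connected <= max_connected && max_connected <= num_servers
}

// A reconnection round is fired only once the number of maintained
// connections falls below `min_connected`; lost connections are not
// retried right away.
fn needs_reconnect_round(active_connections: usize, min_connected: usize) -> bool {
    active_connections < min_connected
}
```

With the defaults (1, len(servers)), `needs_reconnect_round` only fires once every maintained connection is gone, which matches the "retries only when we fall below `min_connected`" behavior described above.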