update_agent: put state behind RwLock #577

kelvinfan001 · 2021-06-10T02:44:34Z

We would like to put update agent state behind a RwLock so ticks
can be handled sequentially while allowing messages to the
update_agent actor to be handled concurrently, in general.

Closes: #569

lucab · 2021-06-10T13:53:54Z

src/update_agent/mod.rs

@@ -370,7 +372,7 @@ pub(crate) struct UpdateAgent {
    /// Update strategy.
    strategy: UpdateStrategy,
    /// Current status for agent state machine.
-    state: UpdateAgentState,
+    state: Arc<RwLock<UpdateAgentState>>,


Self-note: I instinctively had some doubts about this Arc, but I have no reason to believe it is wrong. It may need a short explanation to clarify why a plain RwLock is not enough.

(Also added as a comment in the code) I believe we need to use an Arc because consumers of this field will likely need to own it (e.g. consumers in futures).
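To illustrate the ownership point, here is a minimal sketch, not the code from this PR. It assumes `std::sync::RwLock` and `std::thread` as stand-ins for whatever lock type and futures the agent actually uses, and the two-variant enum and the `advance_in_background` helper are hypothetical. A spawned task must own everything it touches, so a reference borrowed from `self` is not enough, while a cloned `Arc` handle is.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical, reduced stand-in for the agent state enum.
#[derive(Debug, Clone, PartialEq)]
enum UpdateAgentState {
    StartState,
    EndState,
}

struct UpdateAgent {
    // Shared, lockable state. The Arc lets several owners hold a handle.
    state: Arc<RwLock<UpdateAgentState>>,
}

impl UpdateAgent {
    // Hypothetical helper: spawns work that outlives the borrow of `self`.
    // `thread::spawn` requires a `'static` closure, so a plain
    // `&RwLock<_>` borrowed from `self` would be rejected by the compiler.
    // Cloning the Arc gives the closure its own owned handle instead.
    fn advance_in_background(&self) -> thread::JoinHandle<()> {
        let state = Arc::clone(&self.state);
        thread::spawn(move || {
            let mut guard = state.write().unwrap();
            *guard = UpdateAgentState::EndState;
        })
    }
}

// Runs the background transition and returns the state seen afterwards.
fn demo_shared_state() -> UpdateAgentState {
    let agent = UpdateAgent {
        state: Arc::new(RwLock::new(UpdateAgentState::StartState)),
    };
    agent.advance_in_background().join().unwrap();
    let current = agent.state.read().unwrap().clone();
    current
}
```

Calling `demo_shared_state()` from a `main` function should show the agent observing the write made by the spawned task, since both sides hold handles to the same lock.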

kelvinfan001 · 2021-07-27T02:31:21Z

src/update_agent/actor.rs

                log::trace!(
                    "scheduling next agent refresh in {} seconds",
                    pause.as_secs()
                );
                Self::tick_later(ctx, pause);
            } else {
                let update_timestamp = chrono::Utc::now();
-                actor.state_changed = update_timestamp;
+                actor.info.state_changed = update_timestamp;


Not exactly related to this PR, but I noticed that the logic for updating this state_changed field is tightly coupled with the logic around refresh_delay(), which in turn depends on whether the state actually changed. I think this made sense back in simpler times, when we called tick_now if and only if the state changed; however, there are now cases where a change in state does not necessarily trigger tick_now.
Anyway, IIUC, the logic for updating this state_changed field should be refactored into a more intuitive location.

Just wondering, would this state_changed field fit more naturally into the state itself?

I thought about that, but UpdateAgentState is an enum. Would it be a good idea to attach the data to the enum the same way we attach e.g. Release to some of the variants?

I was thinking more about the opposite: turning state into its own struct and making the enum just one of its fields, next to the timestamp.
On second thought, I don't see right now what we are using this state_changed information for. Possibly it's unused at the moment, in which case you can either decide to drop it or clarify its location and semantics (e.g. so that it could be sanely queried, maybe via dbus).
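The struct-with-enum idea could look roughly like the sketch below. All names here (`StateKind`, `TimestampedState`, `transition`) are hypothetical, the enum is reduced to two variants, and `std::time::SystemTime` stands in for the `chrono` timestamp used in the real code. The timestamp is only bumped when the state actually changes.

```rust
use std::time::SystemTime;

// Hypothetical, reduced version of the state machine enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StateKind {
    StartState,
    EndState,
}

// The state becomes a struct: the enum is one field, next to the timestamp.
#[derive(Debug)]
struct TimestampedState {
    kind: StateKind,
    changed_at: SystemTime,
}

impl TimestampedState {
    fn new(kind: StateKind) -> Self {
        Self {
            kind,
            changed_at: SystemTime::now(),
        }
    }

    // Moves to `next` and refreshes the timestamp only on a real change.
    // Returns true if the state changed.
    fn transition(&mut self, next: StateKind) -> bool {
        if self.kind == next {
            return false;
        }
        self.kind = next;
        self.changed_at = SystemTime::now();
        true
    }
}
```

Keeping the timestamp update inside `transition` means callers cannot change the state without also recording when it changed.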

kelvinfan001 · 2021-07-27T02:55:33Z

Some groundwork that happened in this PR includes:

1. use async/await (instead of Actix-style future combinators) within the tick functions and their helpers
2. convert the ticks and some of their helpers to methods of the new UpdateAgentInfo instead of UpdateAgent

Item 1 solved the issue of it being super cumbersome to change all the parameters and return types of helper functions to always take ownership of state and return ownership of state once they're done with it. It also makes the code cleaner to read, IMO.
Item 2 was necessary because the async block does not have access to (ownership of) an entire UpdateAgent instance; it now only has access to a clone of the "read-only" side of UpdateAgent and a lock guard to the UpdateAgent's state.

We would like to put most of the "writable" parts of the update agent behind a RwLock in order to make sure that, when the update_agent actor receives multiple messages, it sequentially processes the messages that would like to write/update the agent state, avoiding races. In order to do this, we explicitly separate out a `state` field and an `info` field in the `UpdateAgent` struct, where the `state` field holds the `UpdateAgentState` that is behind a RwLock. Though nothing enforces this, all the fields (except for `state_changed`) in `UpdateAgentInfo` should be read-only, and need not be behind any lock.

During each tick, because we are passing the asynchronously acquired lock guard of the agent state across futures, we can no longer make use of Actix's `ResponseActFuture` (which contains the actor state and context). We now use the newer `async/await` syntax where we can, to make it easier to move data across futures and generally make the code easier to read. For context, the reason we previously stuck to the less ergonomic future combinator (`.then()`) syntax was that we wanted to make use of `ResponseActFuture`.
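The `info`/`state` split described above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's code: `std::sync::RwLock` and threads stand in for the async lock and Actix futures, the `node_name` field, the `Initialized` variant, and the `tick` and `run_two_ticks` functions are hypothetical. Each tick gets a clone of the read-only info plus a handle to the locked state, and holds the write guard for the whole read-modify-write, so concurrent ticks are serialized.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Read-only side of the agent; cheap to clone, needs no lock.
#[derive(Clone)]
struct UpdateAgentInfo {
    node_name: String, // hypothetical field
}

// Hypothetical, reduced state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateAgentState {
    StartState,
    Initialized,
    EndState,
}

struct UpdateAgent {
    info: UpdateAgentInfo,
    state: Arc<RwLock<UpdateAgentState>>,
}

// One tick: advance the state machine by a single step.
// The write guard is held from reading the old state to storing the new
// one, so two ticks can never act on the same old state.
fn tick(info: &UpdateAgentInfo, state: &RwLock<UpdateAgentState>) -> String {
    let mut guard = state.write().unwrap();
    let old = *guard;
    let next = match old {
        UpdateAgentState::StartState => UpdateAgentState::Initialized,
        UpdateAgentState::Initialized | UpdateAgentState::EndState => {
            UpdateAgentState::EndState
        }
    };
    *guard = next;
    format!("{}: {:?} -> {:?}", info.node_name, old, next)
}

// Runs two ticks concurrently and returns the final state.
fn run_two_ticks() -> UpdateAgentState {
    let agent = UpdateAgent {
        info: UpdateAgentInfo {
            node_name: "node-a".to_string(),
        },
        state: Arc::new(RwLock::new(UpdateAgentState::StartState)),
    };
    let handles: Vec<_> = (0..2)
        .map(|_| {
            // Each task owns a clone of the info and a handle to the state.
            let info = agent.info.clone();
            let state = Arc::clone(&agent.state);
            thread::spawn(move || tick(&info, &state))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let current = *agent.state.read().unwrap();
    current
}
```

Whichever tick acquires the lock first moves the state to `Initialized` and the other then moves it to `EndState`, so the outcome does not depend on scheduling. Readers that only need `info` never touch the lock at all.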

lucab · 2021-07-27T14:11:44Z

src/update_agent/actor.rs

+                UpdateAgentState::EndState => Ok(()),
+            };
+
+            if modify_state_outcome.is_err() {


It looks like most of the tick_* functions are internally doing error logging/handling, so the error type here is not really used (in fact, it's a ()). Do you think it would be possible to drop the Result altogether? That would probably allow factoring out the .await too, I think.

Ah right, there are actually only a couple tick-functions that even make use of this Result. I was originally thinking to make minimal changes to the areas that didn't have to be changed. Taking another look to see if not returning Result is doable.

Follow up to this here #610
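As a rough sketch of what dropping the Result could look like (this is not the code from the follow-up): each tick function logs its own failure and returns nothing, so the dispatching match has no outcome to inspect. The helper names `try_initialize` and `tick_initialize`, the `fail` flag, and the use of `eprintln!` in place of the project's logging are assumptions for illustration.

```rust
// Hypothetical, reduced state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateAgentState {
    StartState,
    EndState,
}

// Hypothetical fallible step; `fail` simulates an error condition.
fn try_initialize(fail: bool) -> Result<(), String> {
    if fail {
        Err("initialization failed".to_string())
    } else {
        Ok(())
    }
}

// The tick handles and logs its own error, so it has nothing to return.
// On failure the state is left untouched and a later tick can retry.
fn tick_initialize(state: &mut UpdateAgentState, fail: bool) {
    match try_initialize(fail) {
        Ok(()) => *state = UpdateAgentState::EndState,
        Err(e) => eprintln!("tick failed, will retry: {}", e),
    }
}

// The dispatcher no longer collects a Result from its arms.
fn tick(state: &mut UpdateAgentState, fail: bool) {
    match *state {
        UpdateAgentState::StartState => tick_initialize(state, fail),
        UpdateAgentState::EndState => {}
    }
}
```

With this shape, a failed tick is visible only through the log and the unchanged state, which matches the observation that the `()` error type carried no information.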

lucab · 2021-07-27T14:14:23Z

This is amazing! The side-effect of switching to plain async is possibly even nicer than the improved locking granularity 🎉
I've left a couple of comments about possible low-hanging fruit / cleanups, but the PR itself already looks good.

kelvinfan001 added kind/groundwork area/updates status/on-hold labels Jun 10, 2021

lucab reviewed Jun 10, 2021

kelvinfan001 commented Jul 27, 2021

kelvinfan001 changed the title ~~WIP: lock update agent state~~ Put update agent state behind RwLock Jul 27, 2021

kelvinfan001 removed the status/on-hold label Jul 27, 2021

lucab reviewed Jul 27, 2021

lucab enabled auto-merge July 27, 2021 14:42

lucab added this to the vNext milestone Jul 27, 2021

lucab changed the title ~~Put update agent state behind RwLock~~ update_agent: put state behind RwLock Jul 27, 2021

lucab approved these changes Jul 27, 2021

lucab merged commit 10a9c40 into coreos:main Jul 27, 2021

This was referenced Jul 27, 2021

Split out state_changed field from UpdateAgentInfo #609

Closed

update_agent: split state_changed out of UpdateAgentInfo #614

Merged
