Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] Split full update flow by components / linear components workflow #179

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

l0kix2
Copy link
Collaborator

@l0kix2 l0kix2 commented Mar 21, 2024

Disclaimer: This is an incomplete draft of change, so some tests are not green yet (since not everything even implemented) and some code is messy, though main intention can be seen. Also I've intentionally haven't removed unused code for ease of review (it will be done in separate review for convenience).

Motivation

Make introducing changes in flow/components easier by simplifying flow, replacing bidirectional dependency with one-directional and more straightforward component update flow.
Make components more easy to test independently.
Improve ops experience of operator by introducing linear components flow.

Short version

Before: orchestrator sets statuses and components act according to them
After: components don't check any statuses but their own

Before: full update flow with mix of actions on ytsaurus and components sync
After: only order of components is set in the main loop

Before: order of component updates are dynamic
After: order of components update is declared in code

For now it is no sense to read all the change, since it is noisy.
It makes sense to check

Long version

I've removed so called "full update flow" (parts of it can be seen in sync.go and ytsaurus_client.go and components ). As a result main flow of components updating orchestration can be significantly simplified.
Of course things, that happened in the "full update flow" can't be thrown away, but they can be nicely placed in corresponding components.

So instead of:

[Check Possibility] -> [Safe Mode] -> [Delete Tablet Cells] -> [MasterSnapshots] -> [All Pods Rebuilt] -> [Exit Master Read Only] -> [Recover Tablet Cells] -> [Update Op Archive] -> [Init QT] -> [Disable Safe Mode]

we will have:

- Master flow: [Check Possibility] -> [Safe Mode] -> [MasterSnapshots] -> [Master rebuilt] -> [Exit Master Read Only] -> [Disable Safe Mode]
- Tablet Nodes flow: [Delete Tablet Cells] -> [Tablet Nodes Rebuilt] -> [Recover Tablet Cells]
- Scheduler flow [Scheduler Rebuilt] -> [Update Op Archive]
- QT flow: [QT Rebuilt] -> [Init QT] 

Why it is better? That way components doesn't need to check main resource status/update status and they become more loosely coupled with main loop and each other. Instead two way link (ytsaurus checking status of components <-> components checking state of ytsaurus) we have simple contract: component should report their status and main loop will scheduler sync when needed. So components became easier to write, because if their sync was called they must advance in their flow without checking anything other their own state. Also it is easier to test components separately example (quick and dirty) can be seen here.

Another consequence of change if we can easily implement linear order of updating components as described in #115. It can look somewhat like this.

Also with moving pieces of full update inside the componets I've redesigned their flow in a more declarative way, because 1) current version relied on update statuses, which are set from outside and code should be updated 2) current flow is not straightforward and hard to update it without breaking something.

To be done

  • Fix all existing tests
  • Add e2e test with all the components to ensure that nothing broken (currently only main components are tested) — this should be done in main branch in separate PR
  • support full update mode for master (possibly after rolling master update implemented with new setting)

Base automatically changed from l0kix2/testutil_improvements to main March 21, 2024 16:32
@l0kix2 l0kix2 force-pushed the l0kix2/115_incapsulate_full_update branch 4 times, most recently from 5dea6a8 to 100761c Compare March 22, 2024 21:00
@l0kix2 l0kix2 force-pushed the l0kix2/115_incapsulate_full_update branch 3 times, most recently from 73ee666 to 73fce2f Compare April 9, 2024 12:10
@l0kix2 l0kix2 force-pushed the l0kix2/115_incapsulate_full_update branch 2 times, most recently from fc1e2eb to e602bfb Compare April 25, 2024 11:40
@l0kix2 l0kix2 force-pushed the l0kix2/115_incapsulate_full_update branch from a6878d1 to d03f2e5 Compare May 24, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant