Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating TaskManager into Discovery #451

Merged
merged 2 commits into from
Sep 29, 2022

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Sep 26, 2022

Description

This PR focuses on updating the Discovery domain to make use of the TaskManager.

Issues Fixed

Tasks

  • 1. Update the Discovery queue to make use of the TaskManager queuing.
  • 2. Remove any tests that are no longer required, I.E. anything testing the async queue aspect of the Discovery since it's handled via TaskManager now.
  • ~ 5. re-enable Discovery tests in the generated CI tests ~ - The problem is ultimately a connection timeout, not addressed here.
  • 3. Update any tests that need it.
  • 4. Add any new tests as needed.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this Sep 26, 2022
@ghost
Copy link

ghost commented Sep 26, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

@tegefaulkes tegefaulkes changed the title wip: Discovery depends on TaskManager Integrating TaskManager into Discovery Sep 26, 2022
@CMCDragonkai
Copy link
Member

We can limit the scope of discovery operations because we only care about:

  1. Our own gestalt
  2. Any gestalt we trust
  3. Any gestalt that the user triggers manually

We don't need to crawl transitive gestalts (friends of friends...).

@CMCDragonkai
Copy link
Member

Discovery is likely to be push and pull at the same time. But right now everything in PK is pull-based cause it's easier to implement.

So we can do something simple.

At most we might have 1000 gestalts trusted, and each gestalt is likely a handful of nodes, say 5. Then at the most have 5000 nodes to check in with.

We can say 1 day TTL to revisit each node.

When the push-based gossip protocol is available, we can actually increase the TTL, and that TTL acts like the last-resort in the sense that if no update has occurred from there we can check in.

It could be even simpler of we can expect that any given node of a gestalt should be up to date with their own gestalt's information.

At this point in time, let's make it configurable and see how it goes.

The goal here for this PR is not to make our discovery perfect UI/UX-wise, but to simply integrate the task manager system, simplify the code in discovery so it becomes stateless (with respect to task management), and also incorporate timedCancellable where possible.

@CMCDragonkai
Copy link
Member

So timedCancellable for #450 will be in another PR, but this starts it out for the discovery domain at least.

@tegefaulkes tegefaulkes force-pushed the feature-Discovery_TaskManager_integration branch from ab75115 to 00ce3e1 Compare September 27, 2022 08:12
@tegefaulkes
Copy link
Contributor Author

General changes so far:

  • The parts of discovery that handles queuing has been removed. This includes the 'plug' promises, database levels and the discovery ID generator.
  • The logic of the queue where it handled the vertex was separated out into a new method. This logic included connecting to a vertex, getting the claims' data, iterating over the claims, verifying the claims and adding them to the gestaltGraph, and adding the new vertices to the list.
  • Adding a vertex to the list now consists of creating a singleton task. each task focuses on processing a single vertex.
  • I tried to cancel all tasks on discovery.destroy() but it would require the taskManager to be started at the time since we need to iterate over active tasks at the time. For now cancel the tasks when starting with fresh.
  • To avoid re-trying tasks, a visitedVertices set is used, this hasn't been updated. A more complex mechanism is need for this that has persistence. I think this will be left with the more complex discovery changes later.
  • Discovery tests have been updated. none removed or added so far. The change was just using taskManager.

@tegefaulkes tegefaulkes force-pushed the feature-Discovery_TaskManager_integration branch 2 times, most recently from b5d7f5f to 02912dd Compare September 28, 2022 05:07
@tegefaulkes
Copy link
Contributor Author

Re-based on staging and a quick fix. Should pass CI now.

@tegefaulkes
Copy link
Contributor Author

The Discovery failures is due to proxy.stop() taking a very long time to complete. I could dig into it, but ultimately the problem will be addressed by the cancel-ability changes PR after this one. For now i'll keep the discovery tests disabled.

console.time
    proxy.stop(): 19142 ms
      at Object.<anonymous> (tests/discovery/Discovery.test.ts:240:13)

@tegefaulkes tegefaulkes force-pushed the feature-Discovery_TaskManager_integration branch from 20c1eab to 8408a8f Compare September 28, 2022 05:49
Copy link
Contributor Author

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes and discussion needed.

src/discovery/Discovery.ts Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
src/nodes/NodeManager.ts Outdated Show resolved Hide resolved
src/nodes/NodeManager.ts Outdated Show resolved Hide resolved
tests/client/service/gestaltsDiscoveryByNode.test.ts Outdated Show resolved Hide resolved
@tegefaulkes tegefaulkes force-pushed the feature-Discovery_TaskManager_integration branch from b4842f4 to 2d004ee Compare September 28, 2022 07:04
@tegefaulkes
Copy link
Contributor Author

This should be ready to merge pending CI. @CMCDragonkai did you still want to review?

@CMCDragonkai
Copy link
Member

Yes give me a moment.

@tegefaulkes tegefaulkes marked this pull request as ready for review September 28, 2022 07:56
Copy link
Member

@CMCDragonkai CMCDragonkai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review. Haven't gone into the discovery code yet.

src/PolykeyAgent.ts Outdated Show resolved Hide resolved
src/discovery/types.ts Show resolved Hide resolved
tests/discovery/Discovery.test.ts Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
src/discovery/Discovery.ts Show resolved Hide resolved
src/discovery/Discovery.ts Outdated Show resolved Hide resolved
@CMCDragonkai
Copy link
Member

The Discovery failures is due to proxy.stop() taking a very long time to complete. I could dig into it, but ultimately the problem will be addressed by the cancel-ability changes PR after this one. For now i'll keep the discovery tests disabled.

console.time
    proxy.stop(): 19142 ms
      at Object.<anonymous> (tests/discovery/Discovery.test.ts:240:13)

Please link the next PR to here. Applying timedCancellable across the entire codebase.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 29, 2022

As per: https://github.com/MatrixAI/Polykey/pull/451/files/2d004eeb5bc7270320ecef7fb0cb3ebfe43877c8#r983028305

It is possible to remove these mocks from our tests:

const mockedRefreshBucket = jest.spyOn(
  NodeManager.prototype,
  'refreshBucket',
);

If we realise that what we need is to have 2-stage start for some of our objects.

The NodeManager.start is similar to TaskManager.start in the sense that there are 2 stages here.

  1. Making the object ready to be interacted with
  2. Actually starting the background tasks (live side-effects)

We can reuse the pattern of lazy parameter, by putting a lazy parameter into NodeManager.start.

Then we also need to move NodeConnectionManager.start's ops in setting the seed node to node manager into NodeManager.syncNodeGraph:

    // Adding seed nodes
    for (const nodeIdEncoded in this.seedNodes) {
      const nodeId = nodesUtils.decodeNodeId(nodeIdEncoded);
      if (nodeId == null) never();
      await this.nodeManager.setNode(
        nodeId,
        this.seedNodes[nodeIdEncoded],
        true,
      );
    }

The above code seems that it can just be done as part of NodeManager.syncSeedNodes.

Then in NodeManager.start, we enclose the background operations as part of:

    if (!lazy) {
      await this.startSync();
    }

So this startSync would be similar to TaskManager.startProcessing, in that it would perform these 2 operations:

    async startSync() {
      await this.syncSeedNodes(); // Replaced name for `syncNodeGraph`
      await this.setupRefreshBucketTasks();
    }

Can we then have an equivalent stopSync? Similar to stopProcessing? Or does this not matter.

I guess it would whatever NodeManager.stop is currently doing, but just factored out to stopSync.

Atm, it looks like syncSeedNodeGraph doesn't actually get cancelled if we call NodeManager.stop, therefore there is a place to create a stopSync method that NodeManager.stop ends up calling as well.

This would basically make NodeManager very similar to TaskManager.

Then in the PolykeyAgent, under createPolykeyAgent, it would need the lazy: true, and then under start, it would do something like:

      await this.taskManager.start({ fresh, lazy: true });
      await this.nodeManager.start({ lazy: true });
      await this.nodeConnectionManager.start({ 
        nodeManager: this.nodeManager 
      });
      await this.nodeGraph.start({ fresh });
      await this.discovery.start({ fresh });
      await this.vaultManager.start({ fresh });
      await this.notificationsManager.start({ fresh });
      await this.sessionManager.start({ fresh });
      if (lazy) {
        await this.nodeManager.startSync();
        await this.taskManager.startProcessing();
      }

Notice the lazy boolean here, this propagates the pattern to PolykeyAgent. Which would mean upon testing PolykeyAgent such as in the discovery tests, rather than mocking out specific side-effects, we can just start the PolykeyAgent lazily, meaning none of the background tasks are started. If we want to do that, we would need another method PolykeyAgent.startX (not sure what X should be called). But that would then start the background tasks for PolykeyAgent.


What does it mean to do tests on PK agent without having it run background tasks? It seems that this means that it would avoid interference if we are trying to do a deterministic test.

So imagine you want to test the PK agent to do a specific direct connection or a specific call, and you would prefer that these background tasks aren't running. Then it does make sense to have this ability of staging the second stage of start.

So now we have 3/4 stages:

  1. Async creation
  2. Sync construction
  3. Async start
  4. Async start background stuff

@CMCDragonkai
Copy link
Member

The fwxProxy logger name should just be Proxy.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 29, 2022

@tegefaulkes can you create a new issue for discovery refactoring: #451 (comment) (needs TTL) - this can be done in separate PR

And also a new issue for this: #451 (comment) - this can be done separate PR

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Sep 29, 2022

All review addressed now. I'll clean up the commit history again, merge, and create the new issues/PRs.

- The parts of discovery that handles queuing has been removed. This includes the 'plug' promises, database levels and the discovery ID generator.
- The logic of the queue where it handled the vertex was separated out into a new method. This logic included connecting to a vertex, getting the claims' data, iterating over the claims, verifying the claims and adding them to the gestaltGraph, and adding the new vertices to the list.
- Adding a vertex to the list now consists of creating a singleton task. each task focuses on processing a single vertex.
- I tried to cancel all tasks on `discovery.destroy()` but it would require the `taskManager` to be started at the time since we need to iterate over active tasks at the time. For now cancel the tasks when starting with fresh.
- To avoid re-trying tasks, a `visitedVertices` set is used, this hasn't been updated. A more complex mechanism is need for this that has persistence. I think this will be left with the more complex discovery changes later.
- Discovery tests have been updated. none removed or added so far. The change was just using `taskManager`.
- Removed discovery utils and types. These are not needed anymore since we don't make use of a discovery ID generator.
- Added locking on the vertex to avoid duplicating a task due to racing. Also catching and throwing the singleton reason, so we don't get any errors on tasks that were canceled for this reason.
- Discovery tasks are canceled during fresh start and destroy.
-`Discovery` stops any active tasks when stopping
- Added a check for if `TaskManager` is still processing when calling stop on `Discovery` and `NodeManager`.
@tegefaulkes tegefaulkes force-pushed the feature-Discovery_TaskManager_integration branch from 25f4ad3 to 6d8f606 Compare September 29, 2022 06:48
@tegefaulkes
Copy link
Contributor Author

Squashed and re-based on staging.

@CMCDragonkai
Copy link
Member

Please link the new issues you've created here too.

@tegefaulkes
Copy link
Contributor Author

Issue #461 created to address #451 (comment).

@tegefaulkes
Copy link
Contributor Author

Issue #462 created to address further discovery refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants