Supporting trusted seed nodes (`nodes` domain refactoring, static sources for seed nodes) #289

joshuakarp · 2021-11-11T04:10:02Z

Description

Supporting trusted seed nodes configuration.

Issues Fixed

Fixes Embed Trusted Default Seed Nodes into src/config.ts for bootstrapping the P2P network #269
Does not address Propagate networking connection error handling into NodeConnection error handling #224

Tasks

Refactor nodes domain:
- - Refactor all notions of broker into seed nodes
- - Remove separation of brokerNodeConnections - the seed nodes should be treated as regular nodes, that get added to the NodeGraph (Kademlia system)
- - Change NodeAddress.ip to NodeAddress.host
DNS resolution
- - resolveHost function to resolve a hostname into an IP address (Host type)
- - figure out where resolveHost calls are required, and add them in
Support various seed node sources:
- - Resolve all 3 sources to a common object type (NodeMapping: NodeId -> NodeAddress - i.e. NodeBucket without the lastUpdated field)
- CLI arguments
  - - Parser for nodeId@host:port structure
  - ~~resolve issue with variadic options and argParser Using Option.parseArgs with variadic and default options tj/commander.js#1641~~ no longer a problem - see Embed Trusted Default Seed Nodes into src/config.ts for bootstrapping the P2P network #269 (comment)
- Environment variable PK_SEED_NODES
  - - Parser for semicolon separated structure
  - - Figure out if possible to use commander's .env with this parsing logic (instead of just a straight read from environment)
- src/config.ts
  - - Parser for config.ts POJO structure
- verification:
  - - single CLI argument
  - - multiple CLI arguments
  - - single in PK_SEED_NODES environment variable (and takes precedence over src/config.ts)
  - - multiple in PK_SEED_NODES environment variable (and takes precedence over src/config.ts)
  - - single in src/config.ts
  - - multiple in src/config.ts
  - - combination of CLI, environment, src/config.ts - should only take CLI arguments
  - - combination of CLI, environment, src/config.ts with <seed-nodes> flag - should take CLI arguments as well as src/config.ts
- tests:
  - - combination of CLI, environment, src/config.ts - should only take CLI arguments
  - - combination of environment, src/config.ts and <seed-nodes> flag - should take all

~~4. Resolve propagation of NodeConnection errors (from networking domain) - see #224~~ - will now be handled in #225

Final checklist

joshuakarp · 2021-11-11T04:11:01Z

This PR has been branched from PR #283. Ideally merging order should then be: Gitlab MR -> #283 -> this PR

joshuakarp · 2021-11-11T04:12:14Z

In the process of refactoring nodes, when thinking about on NodeGraph.start and how it will potentially re-add existing broker nodes into the Kademlia system, I discovered a small error when updating a node in a full bucket. When a new node is added to a full bucket, the expectation is that the least active node is removed from the bucket, in favour of this new node. However, there was an odd error that I was encountering where the most recently added node was the one being removed on a full bucket. For some reason, the date comparison in:

      const leastActive = bucketEntries.reduce((prev, curr) => {
        return prev[1].lastUpdated < curr[1].lastUpdated ? prev : curr;
      });

was saying that 2021-11-11T03:42:22.700Z was not earlier than (<) 2021-11-11T03:42:22.791Z, as per the following:

    prev {
      address: { ip: '1.1.1.1', port: 1 },
      lastUpdated: '2021-11-11T03:42:22.700Z'
    } curr {
      address: { ip: '0.0.0.1', port: 1234 },
      lastUpdated: 2021-11-11T03:42:22.791Z
    }
	
	// prev[1].lastUpdated < curr[1].lastUpdated = false
	// prev[1].lastUpdated = 2021-11-11T03:42:22.700Z 
	// curr[1].lastUpdated = 2021-11-11T03:42:22.791Z

What I realised was that it was actually doing a string comparison, and not a Date comparison. Note the lack of ' around curr.lastUpdated - this is because it's a Date object, whereas prev.lastUpdated is a string. This was because the string date was the date being acquired from the database, whereas the Date date was being added to the bucket locally, and hadn't been retrieved from the database.

Therefore, an easy fix:

      const leastActive = bucketEntries.reduce((prev, curr) => {
        return new Date(prev[1].lastUpdated) < new Date(curr[1].lastUpdated) ? prev : curr;
      });

I've also added a test for this scenario in 747a009

joshuakarp · 2021-11-11T06:07:43Z

To provide support for both hostnames and IP addresses in our Kademlia system, I'm hoping to make to make this as simple as possible by simply extending the NodeAddress type:

type NodeAddress = {
  ip: Host;
  port: Port;
};

to the following:

type NodeAddress = {
  host: IPAddress | Hostname;
  port: Port;
};

where both IPAddress and Hostname are both opaque strings. Then, to determine whether we need to resolve the host to an IP address, we simply do a typeof on the host field.

However, thinking about this some more, this won't work. The NodeAddress will be stored in leveldb. Upon retrieval, there's no notion of typing: the host field will strictly be a string.

Perhaps instead of this then, we'll need a function to parse the host on retrieval, and determine whether the host field is either an IP address or hostname. From memory, we already have an isIPv4 function somewhere, so that would be sufficient. To save ourselves from having to parse the host every time we retrieve, perhaps we should also add a boolean flag to the NodeAddress - isHostname or something of the sort. I'm not a huge fan of this though, as it makes it messy.

CMCDragonkai · 2021-11-11T06:41:46Z

If you change Host, there's alot of other places where it is used. Maybe instead just use Host and Hostname?

That way you know that Hostname is DNS. But Host is always IP.

See: https://en.wikipedia.org/wiki/Hostname it's always DNS.

joshuakarp · 2021-11-11T06:54:56Z

If you change Host, there's alot of other places where it is used. Maybe instead just use Host and Hostname?

That way you know that Hostname is DNS. But Host is always IP.

See: https://en.wikipedia.org/wiki/Hostname it's always DNS.

Yeah that seems fine, but I don't think changing the types actually resolves the problem of supporting hostnames, because they're just stored as strings in leveldb. So whenever we retrieve from the database, there's no type information to discern whether it's an IP address or a hostname. So I'm thinking we'll need a parser to check what it is? Is there a simpler solution?

CMCDragonkai · 2021-11-11T06:57:16Z

If you have a function that can take Host or Hostname, you would need another function to parse/check what it is and react accordingly. This should be done on the boundary of your domain at least. Then your domain internals can handle specifically Host.

In fact I reckon there should be only 1 or 2 places where Hostname type is used. And it's always at the boundary. I'd recommend keeping it around as part of node connection metadata though, could be useful for debugging or analytics.

CMCDragonkai · 2021-11-11T06:57:59Z

See: https://nodejs.org/api/net.html#netisipinput for checking if it is an IP. All other things are going to be hostnames.

I think we do use this a bit?

Tests if input is an IP address. Returns 0 for invalid strings, returns 4 for IP version 4 addresses, and returns 6 for IP version 6 addresses.

CMCDragonkai · 2021-11-12T01:47:47Z

BTW this requires fixing:

      const [client,] = await Promise.all([
        GRPCClientAgent.createGRPCClientAgent({
          nodeId: this.targetNodeId,
          host: this.ingressHost,
          port: this.ingressPort,
          proxyConfig: this.proxyConfig,
          logger: this.logger.getChild(GRPCClientAgent.name),
        }),
        Array.from(brokerConnections, ([_, conn]) =>
          conn.sendHolePunchMessage(
            this.keyManager.getNodeId(),
            this.targetNodeId,
            egressAddress,
            signature,
          ),
        ),
      ]);

It nests the promise array: Promise.all([..., [...]]).

You should be spreading it. Use ...Array.from to spread it into the root array. That way Promise.all can wait for all it.

Right now this is promise leak.

joshuakarp · 2021-11-12T02:23:38Z

If you have a function that can take Host or Hostname, you would need another function to parse/check what it is and react accordingly. This should be done on the boundary of your domain at least. Then your domain internals can handle specifically Host.

In fact I reckon there should be only 1 or 2 places where Hostname type is used. And it's always at the boundary. I'd recommend keeping it around as part of node connection metadata though, could be useful for debugging or analytics.

There's a few instances where Hostname will be used:

when a seed node is read from its source (environment variable, src/config.ts, config file)
when retrieved from the database, a node will either have an IP address or hostname (Host or Hostname)
if a node's address is a hostname, then this will also always be transferred across the network when required/discovered (as opposed to its resolved IP address) - i.e. we read from the database, and immediately transfer it across

Furthermore, as far as I can tell, the only place we will need to resolve a hostname is when creating a connection to it (or, more generally, when we need to actually perform any network-related actions to the host).

As such:

resolving the hostname in NodeManager.getNode doesn't make sense, as we'll sometimes need the Hostname (when transferring, or placing in NodeConnection metadata)
resolving the hostname before inserting into the database doesn't make sense either, as per above. The resolved IP address should be seen as dynamic

Therefore, I see the best solution as only resolving the hostname to an IP when it's required for the networking work. This would mean that it is resolved internally, as opposed to on the boundary. It seems that in all other places, it's preferable to be preserving it as a Hostname.

EDIT: There was some discussion in our sprint meeting today (15/11/2021) about whether this might be the best approach.

We need to think about whether there are any implications of allowing hostnames for any arbitrary keynode. (Right now, we treat the seed nodes as an arbitrary keynode, and as such, a Hostname is a perfectly valid entry in the NodeGraph)

Right now, the expectation is that only seed nodes would be utilising a hostname. Are there consequences for potentially allowing this for regular keynodes?

joshuakarp · 2021-11-12T04:30:08Z

It's quite useful to have extended the NodeAddress type for the ip field to be either a Host | Hostname:

type NodeAddress = {
  ip: Host | Hostname;
  port: Port;
};

for type safety. That is, when retrieving from the database, or getting some NodeId -> NodeAddress mapping from somewhere, the compiler can tell you explicitly when you'll need to resolve to a Host type for the networking-related functionality.

CMCDragonkai · 2021-11-12T04:56:25Z

Can that be?

type NodeAddress = {
  host: Host | Hostname;
  port: Port;
};

joshuakarp · 2021-11-12T04:57:03Z

Can that be?

type NodeAddress = {
  host: Host | Hostname;
  port: Port;
};

Yeah, that's easy enough, I'll change it.

EDIT: This is done: 491bcd8

joshuakarp · 2021-11-12T04:58:51Z

I've realised that #283 hasn't been rebased on master after I merged the changes from #284, and therefore I'm also missing those changes here. I'd thought that this had already been done, but I'll do it now for both these branches.

joshuakarp · 2021-11-12T05:05:13Z

I've rebased this on master now, so it includes the changes from #284.

joshuakarp · 2021-11-12T06:18:50Z

nodes domain changes:

changed NodeAddress.ip to NodeAddress.host

`NodeManager`:

replaced usage of broker with seed
removed brokerNodeConnections as a separate map - connections to seed nodes are added to the regular NodeConnectionMap like any other node. Furthermore, connections to the seed nodes are instantiated on NodeManager.start
- createConnectionToBroker has therefore been removed
- I'm still retaining a separate map of the seed nodes' NodeId -> NodeAddress mappings
- note that currently, NodeConnection requires a map of seed node connections to establish its connection (in order to relay a hole punch message to a target node). Right now, I've added a getConnectionsToSeedNodes function that simply retrieves each seed node's NodeConnection from the NodeConnectionMap - this is assumed to be a constant time lookup, as the connections are established on start
any usage of NodeAddress.host for forming connections has been replaced with resolveHost. Otherwise, this has remained the same

`NodeGraph`:

fixed problem with date comparison on lastUpdated field Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) #289 (comment)

`NodeConnection`:

replaced usage of broker with seed
added a field for Hostname (defined if the provided host is a Hostname, else undefined) as per Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) #289 (comment)

joshuakarp · 2021-11-17T03:02:51Z

I've been having some troubles with commander and getting the argParser to work as expected for our seed nodes.

We expect that a user can supply multiple seed nodes on start. For example: pk agent start --seed-node [email protected]:1314 [email protected]:1314

This means that, like the --ingress-host flag, we need to make this variadic. However, we also want to default to the seed nodes from the environment variable, or from the config.ts.

However, I'm noticing some issues with the usage of argParser - specifically, when parsing the provided CLI arguments, it also parses the default and environment variable values. I've created an upstream issue tj/commander.js#1641 to try to see if there's a means of getting around this. See the issue there for more details on this.

joshuakarp · 2021-11-17T06:41:49Z

Specifically documenting the issue:

For the following in src/bin/options.ts:

function parseSeedNodes(rawSeedNode: string): NodeMapping {
  const idHostPort = rawSeedNode.split(/[@:]/);
  if (idHostPort.length != 3) {
    throw new commander.InvalidOptionArgumentError(`${rawSeedNode} is not of format 'nodeId@host:port'`)
  }
  if (!nodesUtils.isNodeId(idHostPort[0])) {
    throw new commander.InvalidOptionArgumentError(`${idHostPort[0]} is not a valid node ID`);
  }
  if (!networkUtils.isValidHostname(idHostPort[1])) {
    throw new commander.InvalidOptionArgumentError(`${idHostPort[1]} is not a valid hostname`);
  }
  const port = parseNumber(idHostPort[2]);
  const seedNode: NodeMapping = {
    id: idHostPort[0] as NodeId,
    host: idHostPort[1] as Host | Hostname,
    port: port as Port,
  };
  return seedNode;
  // remove curr prev, just try curr
}

const seedNodes = new commander.Option(
  '-sn, --seed-node [nodeId@host:port...]',
  'Seed Node address mapping',
)
  .argParser(parseSeedNodes)
  .env('PK_SEED_NODES') // TODO: need to figure out how to parse env
  .default(getConfigSeedNodes());

And the following test:

    describe('Providing seed nodes on start', () => {
      const seedNode1Id = makeNodeId('vi3et1hrpv2m2lrplcm7cu913kr45v51cak54vm68anlbvuf83ra0');
      const seedNode1Host = 'testnet.polykey.io' as Hostname;
      const seedNode1Port = 1314 as Port;

      const seedNode2Id = makeNodeId('v359vgrgmqf1r5g4fvisiddjknjko6bmm4qv7646jr7fi9enbfuug');
      const seedNode2Host = 'testnet.polykey.io' as Hostname;
      const seedNode2Port = 1315 as Port;

      test.only('multiple -sn CLI arguments', async () => {
        const result = await testUtils.pk([
          'agent',
          'start',
          '-np',
          foregroundNodePath,
          '--password-file',
          passwordFile,
          '-sn',
          `${seedNode1Id}@${seedNode1Host}:${seedNode1Port}`,
          `${seedNode2Id}@${seedNode2Host}:${seedNode2Port}`,
        ]);

Logging options in src/bin/agent/start.ts provides the following output:

    {
      format: 'human',
      fresh: false,
      clientHost: 'localhost',
      clientPort: 0,
      ingressHost: [ '0.0.0.0' ],
      ingressPort: 0,
      seedNode: {
        id: 'v359vgrgmqf1r5g4fvisiddjknjko6bmm4qv7646jr7fi9enbfuug',
        host: 'testnet.polykey.io',
        port: 1315
      },
      nodePath: '/run/user/1000/polykey-test-hSlgpb/foreground',
      passwordFile: '/run/user/1000/polykey-test-hSlgpb/passwordFile'
    }

i.e. only one of the seed nodes present when using argParser without the curr and prev style.

joshuakarp · 2021-11-19T04:31:43Z

I'm having some problems with the session token seemingly not being created on pk unlock for my tests. e.g. trying to do the following:

pk agent start (with some seed node)
pk agent unlock --password-file
pk node find (the seed node - to verify it's propagated correctly)

When attempting to create and start GRPCClientClient with these commands, I'm getting the following error:

Error:
Cannot read property 'socket' of null

I've traced this to GRPCClient.start's call to const socket = session.socket as TLSSocket;.

Given that the session authentication has changed a bit in the Gitlab MR, I'm considering waiting to rebase once it's merged such that these problems might just resolve themselves. Some discussion of changes from here onwards: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213#note_737728551

joshuakarp · 2021-11-24T23:46:26Z

I've squashed the commits of this PR down. Given that I'd initially branched from #283, I've squashed those commits from #283 down to a single, separate commit d4fde43. There's some commits I'm missing from #283, but I expect that this will be resolved after #283 is merged into master, and I make another rebase on master after that.

joshuakarp · 2021-11-24T23:47:19Z

Lots of merge conflicts to resolve after rebasing on master! Spending some time on it.

joshuakarp · 2021-11-26T02:37:01Z

Didn't end up rebasing yesterday (there were a few issues to resolve in #283), but have rebased from #283 now (which was rebased on master).

joshuakarp · 2021-11-28T23:44:39Z

Current status with this PR:

rebased on Implementing BIP39 recovery, Unattended Bootstrap and Agent Start and Status #283 now (which brought the changes in from master too)
still need to complete the pk agent start tests at the top-level description
for now, I'm going to cross off the 'Resolve propagation of NodeConnection errors (from networking domain)' checklist item, given that we'll be undertaking a big refactoring effort of NodeConnection in Refactoring Nodes Domain: NodeConnection & NodeConnectionManager and Fixing Bugs #225. All of this refactoring effort will be in a separate PR

I know there's been some refactoring in #283, so depending on merge order, it may make sense for me to rebase on #283 again before merging.

joshuakarp · 2021-11-29T02:19:45Z

I was having some initial issues with setting up the required tests here: #289 (comment)

Given that the session refactoring has changed this a fair bit, if I don't encounter these issues anymore, then I can hopefully get these tests completed by today. After that, this PR will be ready for review and merge.

joshuakarp · 2021-11-29T06:21:38Z

Given that I've been working on the roadmap and issue planning this afternoon, haven't had a chance to look at the bin tests today. Will be picking them up tomorrow morning.

joshuakarp · 2021-12-15T02:45:40Z

Unrelated to the actual PR, but with ZenHub, I had to manually remove #224 from the "connected issues" such that I could move the issue around on the project board independently of #269 (which is being fixed in this PR). Worthwhile to know for future PRs too.

joshuakarp · 2021-12-17T06:26:17Z

Rebased on master

CMCDragonkai · 2021-12-17T06:32:38Z

Will need to look at:

tests/agent
tests/nodes
tests/claims
...

src/bin/utils/options.ts

src/config.ts

src/bin/utils/parsers.ts

src/bin/agent/CommandStart.ts

src/errors.ts

src/PolykeyAgent.ts

CMCDragonkai · 2021-12-20T01:34:18Z

Assigned to @tegefaulkes to help finish this off.

src/nodes/NodeGraph.ts

tegefaulkes · 2021-12-20T05:59:55Z

Only 2 tests are failing now.

rpcNodes.test.ts>should ping a node (online + offline)
rpcGestalts.test.ts>should discover gestalt via Node

Looking into them now.

tegefaulkes · 2021-12-20T06:26:40Z

should ping a node (online + offline) is returning true when the agent is stopped. adding a sleep(60000) between the stop and ping fixes it. This suggests that the connection needs to time out after stopping the agent. I think we've had issues with connections not being closed properly. Is this being addressed anywhere? @joshuakarp @CMCDragonkai.

src/bin/utils/options.ts

src/bin/agent/CommandStart.ts

tests/bin/agent/start.test.ts

1. Adding support for hostnames 2. Adding parsers for seed node sources 3. --seed-nodes option added to pk agent start 4. Separating NodeGraph db sync from NodeGraph.start 5. Adding Fixing timeouts for connections. 6. Separating seed node connection establishment from NodeManager.start, and adding to PolykeyAgent.start instead (dependency on ForwardProxy being started) 7. Re-adding the seed nodes into config.ts and setting PK_SEED_NODES to be empty by default (for CLI testing to connect to 0 seed nodes by default)

CMCDragonkai · 2021-12-20T08:17:22Z

should ping a node (online + offline) is returning true when the agent is stopped. adding a sleep(60000) between the stop and ping fixes it. This suggests that the connection needs to time out after stopping the agent. I think we've had issues with connections not being closed properly. Is this being addressed anywhere? @joshuakarp @CMCDragonkai.

Not yet. We will address in CI/CD PR with overall testing overhaul.

tegefaulkes · 2021-12-20T08:19:07Z

Fixed the two above issues, squashed and linted.

tegefaulkes · 2021-12-20T08:28:29Z

CI/CD is running on this now, if that passes then we can merge this.

CMCDragonkai · 2021-12-20T08:31:05Z

Won't wait for the CI/CD. I'm expecting failures that will be fixed by the #231 PR.

joshuakarp self-assigned this Nov 11, 2021

joshuakarp changed the title ~~Supporting trusted seed nodes (nodes, input sources,~~ Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) Nov 11, 2021

joshuakarp force-pushed the seed-nodes branch from ff45e0c to d9ccb8e Compare November 12, 2021 05:04

joshuakarp force-pushed the seed-nodes branch from d44346f to 3a4edfa Compare November 24, 2021 23:44

joshuakarp force-pushed the seed-nodes branch from 3a4edfa to c5f29cf Compare November 26, 2021 02:35

joshuakarp mentioned this pull request Nov 29, 2021

Refactoring Nodes Domain: NodeConnection & NodeConnectionManager and Fixing Bugs #225

Closed

joshuakarp force-pushed the seed-nodes branch from 83cffbe to 28f1582 Compare December 17, 2021 06:26