Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) #289

Merged
merged 1 commit into from
Dec 20, 2021

Conversation

joshuakarp
Copy link
Contributor

@joshuakarp joshuakarp commented Nov 11, 2021

Description

Supporting trusted seed nodes configuration.

Issues Fixed

Tasks

  1. Refactor nodes domain:
    • - Refactor all notions of broker into seed nodes
    • - Remove separation of brokerNodeConnections - the seed nodes should be treated as regular nodes, that get added to the NodeGraph (Kademlia system)
    • - Change NodeAddress.ip to NodeAddress.host
  2. DNS resolution
    • - resolveHost function to resolve a hostname into an IP address (Host type)
    • - figure out where resolveHost calls are required, and add them in
  3. Support various seed node sources:
    • - Resolve all 3 sources to a common object type (NodeMapping: NodeId -> NodeAddress - i.e. NodeBucket without the lastUpdated field)
    • CLI arguments
    • Environment variable PK_SEED_NODES
      • - Parser for semicolon separated structure
      • - Figure out if possible to use commander's .env with this parsing logic (instead of just a straight read from environment)
    • src/config.ts
      • - Parser for config.ts POJO structure
    • verification:
      • - single CLI argument
      • - multiple CLI arguments
      • - single in PK_SEED_NODES environment variable (and takes precedence over src/config.ts)
      • - multiple in PK_SEED_NODES environment variable (and takes precedence over src/config.ts)
      • - single in src/config.ts
      • - multiple in src/config.ts
      • - combination of CLI, environment, src/config.ts - should only take CLI arguments
      • - combination of CLI, environment, src/config.ts with <seed-nodes> flag - should take CLI arguments as well as src/config.ts
    • tests:
      • - combination of CLI, environment, src/config.ts - should only take CLI arguments
      • - combination of environment, src/config.ts and <seed-nodes> flag - should take all

4. Resolve propagation of NodeConnection errors (from networking domain) - see #224 - will now be handled in #225

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@joshuakarp joshuakarp self-assigned this Nov 11, 2021
@joshuakarp
Copy link
Contributor Author

This PR has been branched from PR #283. Ideally merging order should then be: Gitlab MR -> #283 -> this PR

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 11, 2021

In the process of refactoring nodes, when thinking about on NodeGraph.start and how it will potentially re-add existing broker nodes into the Kademlia system, I discovered a small error when updating a node in a full bucket. When a new node is added to a full bucket, the expectation is that the least active node is removed from the bucket, in favour of this new node. However, there was an odd error that I was encountering where the most recently added node was the one being removed on a full bucket. For some reason, the date comparison in:

      const leastActive = bucketEntries.reduce((prev, curr) => {
        return prev[1].lastUpdated < curr[1].lastUpdated ? prev : curr;
      });

was saying that 2021-11-11T03:42:22.700Z was not earlier than (<) 2021-11-11T03:42:22.791Z, as per the following:

    prev {
      address: { ip: '1.1.1.1', port: 1 },
      lastUpdated: '2021-11-11T03:42:22.700Z'
    } curr {
      address: { ip: '0.0.0.1', port: 1234 },
      lastUpdated: 2021-11-11T03:42:22.791Z
    }
	
	// prev[1].lastUpdated < curr[1].lastUpdated = false
	// prev[1].lastUpdated = 2021-11-11T03:42:22.700Z 
	// curr[1].lastUpdated = 2021-11-11T03:42:22.791Z

What I realised was that it was actually doing a string comparison, and not a Date comparison. Note the lack of ' around curr.lastUpdated - this is because it's a Date object, whereas prev.lastUpdated is a string. This was because the string date was the date being acquired from the database, whereas the Date date was being added to the bucket locally, and hadn't been retrieved from the database.

Therefore, an easy fix:

      const leastActive = bucketEntries.reduce((prev, curr) => {
        return new Date(prev[1].lastUpdated) < new Date(curr[1].lastUpdated) ? prev : curr;
      });

I've also added a test for this scenario in 747a009

@joshuakarp joshuakarp changed the title Supporting trusted seed nodes (nodes, input sources, Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) Nov 11, 2021
@joshuakarp
Copy link
Contributor Author

To provide support for both hostnames and IP addresses in our Kademlia system, I'm hoping to make to make this as simple as possible by simply extending the NodeAddress type:

type NodeAddress = {
  ip: Host;
  port: Port;
};

to the following:

type NodeAddress = {
  host: IPAddress | Hostname;
  port: Port;
};

where both IPAddress and Hostname are both opaque strings. Then, to determine whether we need to resolve the host to an IP address, we simply do a typeof on the host field.


However, thinking about this some more, this won't work. The NodeAddress will be stored in leveldb. Upon retrieval, there's no notion of typing: the host field will strictly be a string.

Perhaps instead of this then, we'll need a function to parse the host on retrieval, and determine whether the host field is either an IP address or hostname. From memory, we already have an isIPv4 function somewhere, so that would be sufficient. To save ourselves from having to parse the host every time we retrieve, perhaps we should also add a boolean flag to the NodeAddress - isHostname or something of the sort. I'm not a huge fan of this though, as it makes it messy.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 11, 2021

If you change Host, there's alot of other places where it is used. Maybe instead just use Host and Hostname?

That way you know that Hostname is DNS. But Host is always IP.

See: https://en.wikipedia.org/wiki/Hostname it's always DNS.

@joshuakarp
Copy link
Contributor Author

If you change Host, there's alot of other places where it is used. Maybe instead just use Host and Hostname?

That way you know that Hostname is DNS. But Host is always IP.

See: https://en.wikipedia.org/wiki/Hostname it's always DNS.

Yeah that seems fine, but I don't think changing the types actually resolves the problem of supporting hostnames, because they're just stored as strings in leveldb. So whenever we retrieve from the database, there's no type information to discern whether it's an IP address or a hostname. So I'm thinking we'll need a parser to check what it is? Is there a simpler solution?

@CMCDragonkai
Copy link
Member

If you have a function that can take Host or Hostname, you would need another function to parse/check what it is and react accordingly. This should be done on the boundary of your domain at least. Then your domain internals can handle specifically Host.

In fact I reckon there should be only 1 or 2 places where Hostname type is used. And it's always at the boundary. I'd recommend keeping it around as part of node connection metadata though, could be useful for debugging or analytics.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 11, 2021

See: https://nodejs.org/api/net.html#netisipinput for checking if it is an IP. All other things are going to be hostnames.

I think we do use this a bit?

Tests if input is an IP address. Returns 0 for invalid strings, returns 4 for IP version 4 addresses, and returns 6 for IP version 6 addresses.

@CMCDragonkai
Copy link
Member

BTW this requires fixing:

      const [client,] = await Promise.all([
        GRPCClientAgent.createGRPCClientAgent({
          nodeId: this.targetNodeId,
          host: this.ingressHost,
          port: this.ingressPort,
          proxyConfig: this.proxyConfig,
          logger: this.logger.getChild(GRPCClientAgent.name),
        }),
        Array.from(brokerConnections, ([_, conn]) =>
          conn.sendHolePunchMessage(
            this.keyManager.getNodeId(),
            this.targetNodeId,
            egressAddress,
            signature,
          ),
        ),
      ]);

It nests the promise array: Promise.all([..., [...]]).

You should be spreading it. Use ...Array.from to spread it into the root array. That way Promise.all can wait for all it.

Right now this is promise leak.

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 12, 2021

If you have a function that can take Host or Hostname, you would need another function to parse/check what it is and react accordingly. This should be done on the boundary of your domain at least. Then your domain internals can handle specifically Host.

In fact I reckon there should be only 1 or 2 places where Hostname type is used. And it's always at the boundary. I'd recommend keeping it around as part of node connection metadata though, could be useful for debugging or analytics.

There's a few instances where Hostname will be used:

  • when a seed node is read from its source (environment variable, src/config.ts, config file)
  • when retrieved from the database, a node will either have an IP address or hostname (Host or Hostname)
  • if a node's address is a hostname, then this will also always be transferred across the network when required/discovered (as opposed to its resolved IP address) - i.e. we read from the database, and immediately transfer it across

Furthermore, as far as I can tell, the only place we will need to resolve a hostname is when creating a connection to it (or, more generally, when we need to actually perform any network-related actions to the host).

As such:

  • resolving the hostname in NodeManager.getNode doesn't make sense, as we'll sometimes need the Hostname (when transferring, or placing in NodeConnection metadata)
  • resolving the hostname before inserting into the database doesn't make sense either, as per above. The resolved IP address should be seen as dynamic

Therefore, I see the best solution as only resolving the hostname to an IP when it's required for the networking work. This would mean that it is resolved internally, as opposed to on the boundary. It seems that in all other places, it's preferable to be preserving it as a Hostname.


EDIT: There was some discussion in our sprint meeting today (15/11/2021) about whether this might be the best approach.

We need to think about whether there are any implications of allowing hostnames for any arbitrary keynode. (Right now, we treat the seed nodes as an arbitrary keynode, and as such, a Hostname is a perfectly valid entry in the NodeGraph)

Right now, the expectation is that only seed nodes would be utilising a hostname. Are there consequences for potentially allowing this for regular keynodes?

@joshuakarp
Copy link
Contributor Author

It's quite useful to have extended the NodeAddress type for the ip field to be either a Host | Hostname:

type NodeAddress = {
  ip: Host | Hostname;
  port: Port;
};

for type safety. That is, when retrieving from the database, or getting some NodeId -> NodeAddress mapping from somewhere, the compiler can tell you explicitly when you'll need to resolve to a Host type for the networking-related functionality.

@CMCDragonkai
Copy link
Member

Can that be?

type NodeAddress = {
  host: Host | Hostname;
  port: Port;
};

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 12, 2021

Can that be?

type NodeAddress = {
  host: Host | Hostname;
  port: Port;
};

Yeah, that's easy enough, I'll change it.


EDIT: This is done: 491bcd8

@joshuakarp
Copy link
Contributor Author

I've realised that #283 hasn't been rebased on master after I merged the changes from #284, and therefore I'm also missing those changes here. I'd thought that this had already been done, but I'll do it now for both these branches.

@joshuakarp
Copy link
Contributor Author

I've rebased this on master now, so it includes the changes from #284.

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 12, 2021

nodes domain changes:

  • changed NodeAddress.ip to NodeAddress.host

NodeManager:

  • replaced usage of broker with seed
  • removed brokerNodeConnections as a separate map - connections to seed nodes are added to the regular NodeConnectionMap like any other node. Furthermore, connections to the seed nodes are instantiated on NodeManager.start
    • createConnectionToBroker has therefore been removed
    • I'm still retaining a separate map of the seed nodes' NodeId -> NodeAddress mappings
    • note that currently, NodeConnection requires a map of seed node connections to establish its connection (in order to relay a hole punch message to a target node). Right now, I've added a getConnectionsToSeedNodes function that simply retrieves each seed node's NodeConnection from the NodeConnectionMap - this is assumed to be a constant time lookup, as the connections are established on start
  • any usage of NodeAddress.host for forming connections has been replaced with resolveHost. Otherwise, this has remained the same

NodeGraph:

NodeConnection:

@joshuakarp
Copy link
Contributor Author

I've been having some troubles with commander and getting the argParser to work as expected for our seed nodes.

We expect that a user can supply multiple seed nodes on start. For example: pk agent start --seed-node [email protected]:1314 [email protected]:1314

This means that, like the --ingress-host flag, we need to make this variadic. However, we also want to default to the seed nodes from the environment variable, or from the config.ts.

However, I'm noticing some issues with the usage of argParser - specifically, when parsing the provided CLI arguments, it also parses the default and environment variable values. I've created an upstream issue tj/commander.js#1641 to try to see if there's a means of getting around this. See the issue there for more details on this.

@joshuakarp
Copy link
Contributor Author

Specifically documenting the issue:

For the following in src/bin/options.ts:

function parseSeedNodes(rawSeedNode: string): NodeMapping {
  const idHostPort = rawSeedNode.split(/[@:]/);
  if (idHostPort.length != 3) {
    throw new commander.InvalidOptionArgumentError(`${rawSeedNode} is not of format 'nodeId@host:port'`)
  }
  if (!nodesUtils.isNodeId(idHostPort[0])) {
    throw new commander.InvalidOptionArgumentError(`${idHostPort[0]} is not a valid node ID`);
  }
  if (!networkUtils.isValidHostname(idHostPort[1])) {
    throw new commander.InvalidOptionArgumentError(`${idHostPort[1]} is not a valid hostname`);
  }
  const port = parseNumber(idHostPort[2]);
  const seedNode: NodeMapping = {
    id: idHostPort[0] as NodeId,
    host: idHostPort[1] as Host | Hostname,
    port: port as Port,
  };
  return seedNode;
  // remove curr prev, just try curr
}

const seedNodes = new commander.Option(
  '-sn, --seed-node [nodeId@host:port...]',
  'Seed Node address mapping',
)
  .argParser(parseSeedNodes)
  .env('PK_SEED_NODES') // TODO: need to figure out how to parse env
  .default(getConfigSeedNodes());

And the following test:

    describe('Providing seed nodes on start', () => {
      const seedNode1Id = makeNodeId('vi3et1hrpv2m2lrplcm7cu913kr45v51cak54vm68anlbvuf83ra0');
      const seedNode1Host = 'testnet.polykey.io' as Hostname;
      const seedNode1Port = 1314 as Port;

      const seedNode2Id = makeNodeId('v359vgrgmqf1r5g4fvisiddjknjko6bmm4qv7646jr7fi9enbfuug');
      const seedNode2Host = 'testnet.polykey.io' as Hostname;
      const seedNode2Port = 1315 as Port;

      test.only('multiple -sn CLI arguments', async () => {
        const result = await testUtils.pk([
          'agent',
          'start',
          '-np',
          foregroundNodePath,
          '--password-file',
          passwordFile,
          '-sn',
          `${seedNode1Id}@${seedNode1Host}:${seedNode1Port}`,
          `${seedNode2Id}@${seedNode2Host}:${seedNode2Port}`,
        ]);

Logging options in src/bin/agent/start.ts provides the following output:

    {
      format: 'human',
      fresh: false,
      clientHost: 'localhost',
      clientPort: 0,
      ingressHost: [ '0.0.0.0' ],
      ingressPort: 0,
      seedNode: {
        id: 'v359vgrgmqf1r5g4fvisiddjknjko6bmm4qv7646jr7fi9enbfuug',
        host: 'testnet.polykey.io',
        port: 1315
      },
      nodePath: '/run/user/1000/polykey-test-hSlgpb/foreground',
      passwordFile: '/run/user/1000/polykey-test-hSlgpb/passwordFile'
    }

i.e. only one of the seed nodes present when using argParser without the curr and prev style.

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 19, 2021

I'm having some problems with the session token seemingly not being created on pk unlock for my tests. e.g. trying to do the following:

pk agent start (with some seed node)
pk agent unlock --password-file
pk node find (the seed node - to verify it's propagated correctly)

When attempting to create and start GRPCClientClient with these commands, I'm getting the following error:

Error:
Cannot read property 'socket' of null

I've traced this to GRPCClient.start's call to const socket = session.socket as TLSSocket;.

Given that the session authentication has changed a bit in the Gitlab MR, I'm considering waiting to rebase once it's merged such that these problems might just resolve themselves. Some discussion of changes from here onwards: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213#note_737728551

@joshuakarp
Copy link
Contributor Author

I've squashed the commits of this PR down. Given that I'd initially branched from #283, I've squashed those commits from #283 down to a single, separate commit d4fde43. There's some commits I'm missing from #283, but I expect that this will be resolved after #283 is merged into master, and I make another rebase on master after that.

@joshuakarp
Copy link
Contributor Author

Lots of merge conflicts to resolve after rebasing on master! Spending some time on it.

@joshuakarp
Copy link
Contributor Author

Didn't end up rebasing yesterday (there were a few issues to resolve in #283), but have rebased from #283 now (which was rebased on master).

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 28, 2021

Current status with this PR:

I know there's been some refactoring in #283, so depending on merge order, it may make sense for me to rebase on #283 again before merging.

@joshuakarp
Copy link
Contributor Author

joshuakarp commented Nov 29, 2021

I was having some initial issues with setting up the required tests here: #289 (comment)

Given that the session refactoring has changed this a fair bit, if I don't encounter these issues anymore, then I can hopefully get these tests completed by today. After that, this PR will be ready for review and merge.

@joshuakarp
Copy link
Contributor Author

Given that I've been working on the roadmap and issue planning this afternoon, haven't had a chance to look at the bin tests today. Will be picking them up tomorrow morning.

@joshuakarp
Copy link
Contributor Author

Unrelated to the actual PR, but with ZenHub, I had to manually remove #224 from the "connected issues" such that I could move the issue around on the project board independently of #269 (which is being fixed in this PR). Worthwhile to know for future PRs too.

image

@joshuakarp
Copy link
Contributor Author

Rebased on master

@CMCDragonkai
Copy link
Member

Will need to look at:

  • tests/agent
  • tests/nodes
  • tests/claims
  • ...

src/config.ts Outdated Show resolved Hide resolved
src/config.ts Outdated Show resolved Hide resolved
src/config.ts Outdated Show resolved Hide resolved
@CMCDragonkai
Copy link
Member

Assigned to @tegefaulkes to help finish this off.

src/nodes/NodeGraph.ts Outdated Show resolved Hide resolved
@tegefaulkes
Copy link
Contributor

Only 2 tests are failing now.

  • rpcNodes.test.ts>should ping a node (online + offline)
  • rpcGestalts.test.ts>should discover gestalt via Node

Looking into them now.

@tegefaulkes
Copy link
Contributor

should ping a node (online + offline) is returning true when the agent is stopped. adding a sleep(60000) between the stop and ping fixes it. This suggests that the connection needs to time out after stopping the agent. I think we've had issues with connections not being closed properly. Is this being addressed anywhere? @joshuakarp @CMCDragonkai.

1. Adding support for hostnames
2. Adding parsers for seed node sources
3. --seed-nodes option added to pk agent start
4. Separating NodeGraph db sync from NodeGraph.start
5. Adding Fixing timeouts for connections.
6. Separating seed node connection establishment from NodeManager.start, and adding to PolykeyAgent.start instead (dependency on ForwardProxy being started)
7. Re-adding the seed nodes into config.ts and setting PK_SEED_NODES to be empty by default (for CLI testing to connect to 0 seed nodes by default)
@CMCDragonkai
Copy link
Member

should ping a node (online + offline) is returning true when the agent is stopped. adding a sleep(60000) between the stop and ping fixes it. This suggests that the connection needs to time out after stopping the agent. I think we've had issues with connections not being closed properly. Is this being addressed anywhere? @joshuakarp @CMCDragonkai.

Not yet. We will address in CI/CD PR with overall testing overhaul.

@tegefaulkes
Copy link
Contributor

Fixed the two above issues, squashed and linted.

@tegefaulkes
Copy link
Contributor

CI/CD is running on this now, if that passes then we can merge this.

@CMCDragonkai
Copy link
Member

Won't wait for the CI/CD. I'm expecting failures that will be fixed by the #231 PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Embed Trusted Default Seed Nodes into src/config.ts for bootstrapping the P2P network
3 participants