NAT-Traversal Testing with testnet.polykey.io #159

Closed
5 tasks done
CMCDragonkai opened this issue May 26, 2021 · 45 comments
Labels
development Standard development epic Big issue with multiple subissues r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@CMCDragonkai
Member

CMCDragonkai commented May 26, 2021

Specification

Automatically testing NAT-busting is a bit complex: you need to simulate the existence of multiple machines, and then simulate full-cone NAT, restricted-cone NAT, and symmetric NAT.

Since we don't have a relay proxy enabled yet, symmetric NAT is going to be left out for now. So we'll focus on all the NAT architectures except for symmetric NAT.

Additional context

Actually I don't think we should bother with QEMU or NixOS here. It's too complicated. QEMU might be a good choice for being able to run the test cross-platform, but we're lacking expertise on QEMU here (I've already worked on it with respect to the netboot work), and having more experience with network namespaces means we can do these tests on just Linux. NixOS limits our environment even further and requires running in a NixOS environment.

Note that network namespaces with Linux stateful firewalls should be perfectly capable of simulating a port-restricted firewall.
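
As a minimal sketch of that idea (assuming eth0 is the NAT router's external interface and that we only care about UDP here; none of these names are from our codebase), the stateful part is just a pair of conntrack rules on the router's forwarding path:

# Allow replies belonging to flows that were initiated from inside the NAT
iptables -A FORWARD -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Drop unsolicited inbound flows
iptables -A FORWARD -i eth0 -m conntrack --ctstate NEW -j DROP

Because a UDP conntrack entry records both endpoints' address and port, an inbound packet only matches if it comes from the exact address and port that was contacted from the inside, which is the port-restricted behaviour we want to simulate.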

Old context follows...

The best way to do this is with a VM system using QEMU.

NixOS has a multi-machine testing system that can be used to do this, however such tests can only run on NixOS: https://nixos.org/manual/nixos/unstable/index.html#sec-nixos-tests We have pre-existing code for this:

NixOS NAT Module Test
# here is the testing base file: https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/testing-python.nix
with import ../../pkgs.nix {};
let
  pk = (callPackage ../../nix/default.nix {}).package;
in
  import <nixpkgs/nixos/tests/make-test-python.nix> {
    nodes =
      {
        privateNode1 =
          { nodes, pkgs, ... }:
          {
            virtualisation.vlans = [ 1 ];
            environment.variables = {
              PK_PATH = "$HOME/polykey";
            };
            environment.systemPackages = [ pk pkgs.tcpdump ];
            networking.firewall.enable = false;
            networking.defaultGateway = (pkgs.lib.head nodes.router1.config.networking.interfaces.eth1.ipv4.addresses).address;
          };
        privateNode2 =
          { nodes, pkgs, ... }:
          {
            virtualisation.vlans = [ 2 ];
            environment.variables = {
              PK_PATH = "$HOME/polykey";
            };
            environment.systemPackages = [ pk pkgs.tcpdump ];
            networking.firewall.enable = false;
            networking.defaultGateway = (pkgs.lib.head nodes.router2.config.networking.interfaces.eth1.ipv4.addresses).address;
          };
        router1 =
          { pkgs, ... }:
          {
            virtualisation.vlans = [ 1 3 ];
            environment.systemPackages = [ pkgs.tcpdump ];
            networking.firewall.enable = false;
            networking.nat.externalInterface = "eth2";
            networking.nat.internalIPs = [ "192.168.1.0/24" ];
            networking.nat.enable = true;
          };
        router2 =
          { pkgs, ... }:
          {
            virtualisation.vlans = [ 2 3 ];
            environment.systemPackages = [ pkgs.tcpdump ];
            networking.firewall.enable = false;
            networking.nat.externalInterface = "eth2";
            networking.nat.internalIPs = [ "192.168.2.0/24" ];
            networking.nat.enable = true;
          };
        publicNode =
          { config, pkgs, ... }:
          {
            virtualisation.vlans = [ 3 ];
            environment.variables = {
              PK_PATH = "$HOME/polykey";
            };
            environment.systemPackages = [ pk pkgs.tcpdump ];
            networking.firewall.enable = false;
          };
      };
    testScript = ''
      start_all()
      # can start polykey-agent in both public and private nodes
      publicNode.succeed("pk agent start")
      privateNode1.succeed("pk agent start")
      privateNode2.succeed("pk agent start")
      # can create a new keynode in both public and private nodes
      create_node_command = "pk agent create -n {name} -e {name}@email.com -p passphrase"
      publicNode.succeed(create_node_command.format(name="publicNode"))
      privateNode1.succeed(create_node_command.format(name="privateNode1"))
      privateNode2.succeed(create_node_command.format(name="privateNode2"))
      # can add privateNode node info to publicNode
      publicNodeNodeInfo = publicNode.succeed("pk nodes get -c -b")
      privateNode1.succeed("pk nodes add -b '{}'".format(publicNodeNodeInfo))
      privateNode2.succeed("pk nodes add -b '{}'".format(publicNodeNodeInfo))
      # can add publicNode node info to privateNodes
      privateNode1NodeInfo = privateNode1.succeed("pk nodes get -c -b")
      privateNode2NodeInfo = privateNode2.succeed("pk nodes get -c -b")
      publicNode.succeed("pk nodes add -b '{}'".format(privateNode1NodeInfo))
      publicNode.succeed("pk nodes add -b '{}'".format(privateNode2NodeInfo))
      # copy public keys over to node machines
      publicNodePublicKey = publicNode.succeed("cat $HOME/.polykey/.keys/public_key")
      privateNode1PublicKey = privateNode1.succeed("cat $HOME/.polykey/.keys/public_key")
      privateNode2PublicKey = privateNode2.succeed("cat $HOME/.polykey/.keys/public_key")
      privateNode1.succeed("echo '{}' > $HOME/publicNode.pub".format(publicNodePublicKey))
      privateNode1.succeed("echo '{}' > $HOME/privateNode2.pub".format(privateNode2PublicKey))
      privateNode2.succeed("echo '{}' > $HOME/publicNode.pub".format(publicNodePublicKey))
      privateNode2.succeed("echo '{}' > $HOME/privateNode1.pub".format(privateNode1PublicKey))
      publicNode.succeed("echo '{}' > $HOME/privateNode1.pub".format(privateNode1PublicKey))
      publicNode.succeed("echo '{}' > $HOME/privateNode2.pub".format(privateNode2PublicKey))
      # modify node info to match node machines' host address
      publicNode.succeed("pk nodes update -p $HOME/privateNode1.pub -ch privateNode1")
      publicNode.succeed("pk nodes update -p $HOME/privateNode2.pub -ch privateNode2")
      privateNode1.succeed(
          "pk nodes update -p $HOME/publicNode.pub -ch publicNode -r $HOME/publicNode.pub"
      )
      privateNode2.succeed(
          "pk nodes update -p $HOME/publicNode.pub -ch publicNode -r $HOME/publicNode.pub"
      )
      # privateNodes can ping publicNode
      privateNode1.succeed("pk nodes ping -p $HOME/publicNode.pub")
      privateNode2.succeed("pk nodes ping -p $HOME/publicNode.pub")
      # can create a new vault in publicNode and clone it from both privateNodes
      publicNode.succeed("pk vaults new publicVault")
      publicNode.succeed("echo 'secret content' > $HOME/secret")
      publicNode.succeed("pk secrets new publicVault:Secret -f $HOME/secret")
      privateNode1.succeed("pk vaults clone -n publicVault -p $HOME/publicNode.pub")
      privateNode2.succeed("pk vaults clone -n publicVault -p $HOME/publicNode.pub")
      # can create a new vault in privateNode1
      privateNode1.succeed("pk vaults new privateVault1")
      # can create a new secret in privateNode1
      privateNode1.succeed("echo 'secret content' > $HOME/secret")
      privateNode1.succeed("pk secrets new privateVault1:Secret -f $HOME/secret")
      # setup a relay between privateNode1 and publicNode
      privateNode1.succeed("pk nodes relay -p $HOME/publicNode.pub")
      # add privateNode1 node info to privateNode2
      privateNode1NodeInfo = privateNode1.succeed("pk nodes get -c -b")
      privateNode2.succeed("pk nodes add -b '{}'".format(privateNode1NodeInfo))
      # add privateNode2 node info to privateNode1
      privateNode2NodeInfo = privateNode2.succeed("pk nodes get -c -b")
      privateNode1.succeed("pk nodes add -b '{}'".format(privateNode2NodeInfo))
      # can ping privateNode1 to privateNode2
      privateNode2.succeed("pk nodes ping -p ~/privateNode1.pub")
      # can pull a vault from privateNode1 to privateNode2
      privateNode2.succeed("pk vaults clone -p ~/privateNode1.pub -n privateVault1")
    '';
  }

Tasks

  1. [x] Create test harness/fixture utilities that create a multi-node situation
  2. [x] Simulate a NAT table situation by making use of network namespaces
  3. [x] This test can only run on Linux that supports virtual network namespaces.
  4. [ ] The test will have to be run separately from npm test, which runs jest. This test can be done inside GitLab CI/CD if the CI/CD on Linux supports creating network namespaces. If not, it's a manual test. Using conditional testing instead: Conditional testing for platform-specific tests #380
  5. [x] Review my gist https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f which explains how to use network namespaces. The Linux iptables firewall has to be used to simulate a NAT that allows outgoing packets but denies incoming packets, except for connections that are already live. This is called a "stateful firewall". I've done this before, but I forgot the details.
  6. [x] You'll need to use https://stackabuse.com/executing-shell-commands-with-node-js/ to run the ip netns commands. Remember to check whether the OS is Linux before allowing these tests to run.
  7. [ ] Add in testing involving testnet.polykey.io, which should run only during integration testing after the integration:deployment job (because it has to deploy to the testnet in that job). Reissued as Integration Tests for testnet.polykey.com Polykey-CLI#71
@CMCDragonkai
Member Author

This should be incorporated into our automated tests when we run jest. But that would also mean using nix-build... etc. Or it can be done outside as a separate command that is only run in our checkPhase during the building of the application/library (possibly in our release.nix since our default.nix doesn't have this).

@CMCDragonkai
Member Author

We need to implement a test for relaying the hole punching message. This is not meant to be using the notifications domain because it's part of automated connection establishment.

We need to test several situations:

  1. Test if we can do this with a designated seed node, this coincides with Testnet Node Deployment (testnet.polykey.io) #194 <- only this one for release
  2. Test if we can do this with any node on the Polykey network, thus generalising to decentralised relays.

@CMCDragonkai CMCDragonkai changed the title Automated NAT-Busting Testing NAT-Traversal Testing (non-Symmetric NAT) Aug 23, 2021
@CMCDragonkai CMCDragonkai added the development Standard development label Aug 29, 2021
@CMCDragonkai
Member Author

@joshuakarp I'm curious how exactly we are going to implement an optimised routing system for routing the hole punching relay messages?

The same algorithm can later be used for picking the optimal relay for mesh proxying to defeat symmetric NAT.

I remember we mentioned some usage of kademlia or a representation of the "closest" node.

If we just assume that we always use our seed cluster/bootstrap cluster, then this is just centralised routing. But if we enable any keynode to be a relay, then we need to understand that the PK network is a loose mesh, with loose connection lifetimes as well. Is kademlia actually useful for routing here?

This feels like a routing problem, and it seems that existing routers already have algorithms that help solve this problem. Is there any cross over with things like spanning tree algorithms https://en.wikipedia.org/wiki/Minimum_spanning_tree?

Given that all keynodes may be on the public internet, another matter is whether all live network links are equal in quality. Of course in reality they are not: latency, throughput, and reliability all matter. But if we are only distinguishing between vertices where edges can be made vs vertices where edges cannot be made, then our algorithm should converge very quickly to find the proper relaying route.

@joshuakarp
Contributor

joshuakarp commented Aug 30, 2021

@CMCDragonkai Kademlia inherently has a "closeness" mechanism. That is, the XOR value of two node IDs determines closeness (smaller = closer, larger = further away). Remember that with Kademlia, we store more node ID -> node address mappings of the nodes that are "closest" to us: this is the fundamental part of the k-buckets structure.

Isn't this inherently a routing solution?

See this too, straight from the Kademlia paper https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf:

We start with some definitions. For a k-bucket covering the distance range [2^i, 2^(i+1)), define the index of the bucket to be i. Define the depth, h, of a node to be [number of k buckets] − i, where i is the smallest index of a non-empty bucket. Define node y’s bucket height in node x to be the index of the bucket into which x would insert y minus the index of x’s least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

Our next step will be to assume the invariant that every k-bucket of every node contains at least one contact if a node exists in the appropriate range. Given this assumption, we show that the node lookup procedure is correct and takes logarithmic time. Suppose the closest node to the target ID has depth h. If none of this node’s h most significant k-buckets is empty, the lookup procedure will find a node half as close (or rather whose distance is one bit shorter) in each step, and thus turn up the node in h − log k steps. If one of the node’s k-buckets is empty, it could be the case that the target node resides in the range of the empty bucket. In this case, the final steps will not decrease the distance by half. However, the search will proceed exactly as though the bit in the key corresponding to the empty bucket had been flipped. Thus, the lookup algorithm will always return the closest node in h − log k steps.

I found a pretty good animation of this too, to showcase the lookup procedure https://kelseyc18.github.io/kademlia_vis/lookup/

As a side note, I started to read quite an interesting paper about using notions of "trust" to overcome some of the issues with malicious nodes and attack vectors on these kinds of systems: https://ieeexplore.ieee.org/document/6217954

@CMCDragonkai
Member Author

Kademlia's closeness is used to route to the relevant node that has information on the node ID to IP address. I can see how that might mean that you can trigger a hole punch relay message at that node.

Does this mean you would need to send that as an option/flag indicating that you want to pass on a hole punching message on the call to resolve a node ID? This would mean resolution and relaying of a hole punch are done at the same time.

Or you would need to know which node returned the resolution and then use that.

However there's still a problem with this mechanism. The relaying node must already have an open connection with the receiving node. If the relaying node does not have an open and live connection, and the receiving node is behind a restricted NAT, then the relaying node cannot actually relay anything, just like the sending node.

There is an assumption here that the node that resolves has an open connection to all the IP addresses. But is this actually true? There are several points here:

  1. The relaying node must already have an open and live connection to the receiving node. Thus you want to route a relay message to a node that is already open to it.
  2. The sending node must be able to open a connection to the relaying node, otherwise you have a chicken-and-egg problem here. A transitive NAT traversal problem.
  3. Kademlia doesn't have a locality optimisation based on network locality for throughput or latency. But this can be solved later.
  4. Seed/bootstrap nodes are the best candidates at the moment for relaying, but if we want to decentralise this, it should work as a mesh.
  5. Participating as part of the mesh should be optional... Or if not, then all relay messages should ideally not leak which PK node is contacting which PK node. Which sounds like an onion routing scheme.

@CMCDragonkai
Member Author

Is the kademlia contact database rebalanced/replicated across the network like a DHT?

Otherwise how does one store a contact if not by being contacted by it and contacting it in turn?

@joshuakarp
Contributor

Does this mean you would need to send that as a option/flag that means you want to pass on a hole punching message on the call to resolve a node ID? This would mean resolution and relaying a hole punch is done at the same time.

In order for Kademlia to function, there are lots of implicit connection establishments taking place. That is, every time you receive k closest nodes from another node, the idea is that you would connect to each of these received nodes and query them for their k closest nodes. If you don't already have a connection established with them, then you need to send a hole-punching packet across the network to attempt to establish connection.

So yes, as part of the resolution process, we are already sending hole punch packets to each of these nodes we need to contact.

Or you would need to know which node returned the resolution and then use that.

This could be a worthwhile optimisation.

However there's still a problem with this mechanism. The relaying node must already have an open connection with the receiving node. If the relaying node does not have an open and live connection and that the receiving node is behind a restricted NAT, then the relaying cannot actually relay anything just like the sending node.

There is an assumption here that the node that resolves has an open connection to all the IP addresses. But is this actually true?

Yeah, you're right. I remember we had some brief discussion about whether we should consider having "persistent" connections to some of the "closest" nodes in the network. That is, upon coming online, we immediately connect to these nodes. But yeah, in order to even establish these persistent connections, we have the same issue.

@joshuakarp
Contributor

Is the kademlia contact database rebalanced/replicated across the network like a DHT?

Otherwise how does one store a contact if not by being contacted by it and contacting it in turn?

Currently no. There's no rebalancing/replication across the network. There are currently two ways that nodes are added to the database:

  1. This kademlia "discovery" process, of contacting other nodes to find the k closest nodes (any found nodes that are able to be connected to are added to our database).
  2. I added an initial "sync" to a node when it comes online. That is, it contacts the provided seed nodes (if any are provided) and asks for the k closest nodes to itself. Currently, this also attempts to establish connection before adding the node to our database.

@CMCDragonkai
Member Author

I think our plan for the release is to stick with the centralised seed node cluster #194.

We can put the problem of decentralised relaying to a post-release issue. This issue is more focused on just creating a test-harness for NAT-traversal, so we should focus this issue on this problem.

In the mean time, I'll create a new issue for decentralised relaying.

@CMCDragonkai
Member Author

If you have difficulties working on this, I can ask @nzhang-zh or @Zachaccino to help advise.

@CMCDragonkai CMCDragonkai added the epic Big issue with multiple subissues label Oct 26, 2021
@CMCDragonkai
Member Author

Our tests here should probably change to be manual as soon as #194 is done, and then we can figure out how to automate these tests.

@CMCDragonkai CMCDragonkai changed the title NAT-Traversal Testing (non-Symmetric NAT) NAT-Traversal Testing (non-Symmetric NAT) with testnet.polykey.io Nov 1, 2021
@joshuakarp
Contributor

Start date changed from Nov 15th to Nov 19th (based on delays in #231).

@joshuakarp
Contributor

joshuakarp commented Nov 16, 2021

Start date changed from Friday Nov 19th to Tuesday Nov 23rd (delays in #269, #231, and CLI MR on Gitlab).

@joshuakarp
Contributor

Start date changed from Tuesday Nov 23rd to Monday Dec 6th (delayed from refactoring work in #283).

@joshuakarp
Contributor

Removing this from #291 as it should be closed as part of the testnet deployment (#194).

@CMCDragonkai
Member Author

CMCDragonkai commented Mar 1, 2022

These tests must be written outside or separately from src/tests. This way npm test does not run the NAT traversal testing. This is because NAT traversal testing may require a real network (when going to the seed nodes) or require OS simulation of NAT. A couple solutions here:

  1. Create a separate tests-nat directory - the disadvantage here is that you lose all your existing jest context and utilities, and you have to configure it again
  2. Use https://jestjs.io/docs/cli#--testpathignorepatternsregexarray if we use something like tests/nat as a subdirectory - this is advantageous for re-using all the same jest context, but it means we have to configure jest to ignore these tests by default, which may be done in package.json or jest.config.js.

It's best to continue using our jest tooling for these tests, but if we need to use OS simulation, then the jest tests may need to execute shell commands which encapsulate scripts that run inside network namespaces.
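
As a minimal sketch of option 2 (assuming the NAT tests live under tests/nat and that our jest configuration is in jest.config.js):

// jest.config.js
module.exports = {
  // ...existing configuration...
  // Ignore the NAT tests by default so that `npm test` never runs them
  testPathIgnorePatterns: ['/node_modules/', '<rootDir>/tests/nat/'],
};

The NAT tests could then be run explicitly by overriding the ignore pattern on the CLI, for example npx jest tests/nat --testPathIgnorePatterns='/node_modules/'.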

@CMCDragonkai
Member Author

This issue requires deeper specification that works out all the different cases being tested. It's going to depend on the resolution of #326, as that will finish the testnet deployment. These test cases may use testnet.polykey.io.

@CMCDragonkai
Member Author

Some ideas for initial cases:

These cases do not have a signalling server. I.e. no seed node involved in coordination.

  1. Node1 connects to Node2 - basic sanity test
  2. Node1 behind NAT connects to Node2 - here Node1 is acting like a client and it is behind a NAT, connecting to an open Node2 that isn't behind NAT
  3. Node1 connects to Node2 behind NAT - here Node1 is acting like a client, connecting to a closed Node2 that is behind a NAT
  4. Node1 behind NAT connects to Node2 behind NAT - here Node1 is acting like a client and it is behind NAT, and it is connecting to Node2 which is also behind NAT

For the NAT, we need to simulate the 4 types:

  1. Port restricted
  2. Address restricted
  3. Full cone
  4. Symmetric

I'm not sure if our Linux netns and firewall can simulate all 4, but it should at the very least be able to do port-restricted.
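
As a rough sketch of how these might map onto iptables in the router namespace (interface and address names here are placeholders, not taken from our setup): full cone is just a static SNAT/DNAT pair, the restricted variants add the stateful filtering sketched earlier in this issue, and symmetric can be approximated by randomising the port mapping per connection:

# Endpoint-independent mapping (basis of full cone and both restricted cone variants)
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth-ext -j SNAT --to-source 192.168.0.1
# Full cone additionally forwards unsolicited inbound traffic to the internal host
iptables -t nat -A PREROUTING -d 192.168.0.1 -i eth-ext -j DNAT --to-destination 10.0.0.2
# For symmetric NAT, replace the first SNAT rule with one that randomises the
# source port per connection, so different destinations see different external ports
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth-ext -j SNAT --to-source 192.168.0.1 --random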

These cases do have a signalling server:

  1. Node1 connects to Node2
  2. Node1 behind NAT connects to Node2 - here Node1 is acting like a client and it is behind a NAT, connecting to an open Node2 that isn't behind NAT
  3. Node1 connects to Node2 behind NAT - here Node1 is acting like a client, connecting to a closed Node2 that is behind a NAT
  4. Node1 behind NAT connects to Node 2 behind NAT - here Node1 is acting like a client and it is behind NAT, and it is connecting to Node2 which is also behind NAT

The signalling server is enabled by having both Node1 and Node2 already connected to the seed node. That seed node should then relay connection request messages.

That should be enough for now. No TURN relay testing yet.

Note that some tests are expected to "fail", in that we want to test the expected exceptional behaviour handling. For example, when the nodes cannot connect, how do we communicate this to the end user?

@CMCDragonkai
Member Author

In order to create these network namespaces, you have to use both ip and iptables commands to simulate the NAT architectures we're looking for. The gist guide https://gist.github.com/CMCDragonkai/3f3649d7f1be9c7df36f provides an example of the sort of things that will be called from the jest tests.

This also means that when we actually do the tests, they will be done with pkSpawn, pkExpect, or pkExec. These are high-level tests; they don't import anything inside the src/ codebase. It's all about using the pk command line and running it inside the network namespaces, which means they are similar to tests/bin.
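
As a rough sketch of what such a test would shell out to (assuming a namespace called node1 set up as per the gist; pk agent start is taken from the old CLI examples above and the real flags may differ):

# Run the pk CLI entirely inside the node1 namespace
sudo ip netns exec node1 pk agent start

From jest, one way to do this is for the pkExec/pkSpawn wrappers to prefix the command they execute with ip netns exec <namespace> whenever a namespace is specified.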

@emmacasolin
Contributor

emmacasolin commented Mar 7, 2022

After doing some prototyping today, I'm now able to set up two nodes behind two routers (four network namespaces) and have node 1 and node 2 be able to ping each other. The next step will be adding iptables rules to the routers to simulate NAT, but for now this is how I'm setting everything up before that point:

# Create four network namespaces
sudo ip netns add node1
sudo ip netns add node2
sudo ip netns add router1
sudo ip netns add router2

# Create veth interfaces to connect the namespaces such that we have
# node1 <-veth1-> router1 <-veth3-> router2 <-veth2-> node2
sudo ip link add veth1-n1 type veth peer name veth1-r1
sudo ip link add veth2-n2 type veth peer name veth2-r2
sudo ip link add veth3-r1 type veth peer name veth3-r2

# Connect up the ends to the correct namespaces
sudo ip link set veth1-r1 netns router1
sudo ip link set veth1-n1 netns node1
sudo ip link set veth2-r2 netns router2
sudo ip link set veth2-n2 netns node2
sudo ip link set veth3-r1 netns router1
sudo ip link set veth3-r2 netns router2

# Bring up loopback and the veth interfaces for all of the namespaces
sudo ip netns exec node1 ip link set lo up
sudo ip netns exec node1 ip link set veth1-n1 up
sudo ip netns exec node2 ip link set lo up
sudo ip netns exec node2 ip link set veth2-n2 up
sudo ip netns exec router1 ip link set lo up
sudo ip netns exec router1 ip link set veth1-r1 up
sudo ip netns exec router1 ip link set veth3-r1 up
sudo ip netns exec router2 ip link set lo up
sudo ip netns exec router2 ip link set veth2-r2 up
sudo ip netns exec router2 ip link set veth3-r2 up

# Create subnets for the veth interfaces such that we have
# node1 1.1.1.1 <-> 1.1.1.2 router1
# router1 3.3.3.1 <-> 3.3.3.2 router2
# router2 2.2.2.1 <-> 2.2.2.2 node2
# Note that the two ends of each veth link have to be in the same /24 subnet for the communication to work
sudo ip netns exec node1 ip addr add 1.1.1.1/24 dev veth1-n1
sudo ip netns exec router1 ip addr add 1.1.1.2/24 dev veth1-r1
sudo ip netns exec router1 ip addr add 3.3.3.1/24 dev veth3-r1
sudo ip netns exec router2 ip addr add 3.3.3.2/24 dev veth3-r2
sudo ip netns exec router2 ip addr add 2.2.2.1/24 dev veth2-r2
sudo ip netns exec node2 ip addr add 2.2.2.2/24 dev veth2-n2

# At this point everything should be able to communicate with its "neighbours" but we need to set the default routes to allow the rest of the namespaces to communicate
# Node1 should default to the interface on Router1 that it's connected to via veth1
sudo ip netns exec node1 ip route add default via 1.1.1.2
# Router1 should default to the interface on Router2 that it's connected to via veth3
sudo ip netns exec router1 ip route add default via 3.3.3.2
# Router2 should default to the interface on Router1 that it's connected to via veth3
sudo ip netns exec router2 ip route add default via 3.3.3.1
# Node2 should default to the interface on Router2 that it's connected to via veth2
sudo ip netns exec node2 ip route add default via 2.2.2.1

After running all of these commands, we should be able to ping 2.2.2.2 (Node2) from Node1 (and vice versa).

@CMCDragonkai
Member Author

To be able to do NAT simulation beyond full cone, you need a stateful firewall. In iptables this is handled by conntrack. Have a look at conntrack and stateful iptables.
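
As an aside, the state that conntrack keeps can be inspected directly inside the router namespace (assuming the conntrack-tools userspace package is available), which is handy for checking whether the firewall is actually tracking the UDP flows:

# List the connection tracking table inside router1
sudo ip netns exec router1 conntrack -L
# Or watch entries being created and destroyed live
sudo ip netns exec router1 conntrack -E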

@CMCDragonkai
Member Author

I believe that node 1 and router 1 can share the same namespace.

This is because the network namespace creates its own private network, and both Node 1 and Router 1 are on the same private network.

However it may be better for you to test with 4 namespaces first and then see how you can optimise just down to 2.

@CMCDragonkai
Member Author

If your commands require sudo permissions, then you can run the jest test script as sudo, for example sudo npm test. However, if you need dependencies from the nix shell under sudo, then sudo nix-shell is also possible.

Do note that any files created will be owned by root, so it's important that any temporary files created are deleted.

@CMCDragonkai
Member Author

Also, any command using route or ifconfig should use ip ... commands instead, because the former two are being deprecated.

@emmacasolin
Contributor

To be able to do Nat simulation beyond full cone, you need a stateful firewall. In iptables this is known as conntrack. Have a look at conntrack and stateful iptables.

From my research I think these iptables rules should replicate a stateful firewall:

# External address of our router
addr_ext="192.168.2.170"
# Port (using the same port for the private node and the router, as well as for external hosts)
port="55555"
# Accept packets belonging to flows we've already initiated
iptables -A INPUT -d $addr_ext -p udp --dport $port -m state --state ESTABLISHED,RELATED -j ACCEPT
# ...and drop unsolicited packets from hosts we haven't communicated with
iptables -A INPUT -d $addr_ext -p udp --dport $port -m state --state NEW -j DROP

I remember having a quick look at conntrack and it didn't seem like the right thing to use, but I can have another look at it.

@emmacasolin
Contributor

emmacasolin commented Mar 7, 2022

I believe that node 1 and router 1 can share the same namespace.

This is because the network namespace creates its own private network, and both Node 1 and Router 1 are on the same private network.

However it may be better for you to test with 4 namespaces first and then see how you can optimise just down to 2.

Hmm yeah that might work. I'll keep prototyping with four for now but that could be something to look into later.

I found this which might be useful for setting up namespaces that contain multiple hosts with a router: https://github.com/mininet/mininet

@emmacasolin
Contributor

I'm in the process of testing iptables rules to see if the NAT is working correctly, however I'm finding it hard to test for this. I wanted to use wireshark, but I can't open it from inside a namespace. I tried using nsenter to do this, but it doesn't seem to be working.

@tegefaulkes
Contributor

You can change the net namespace of a program using

ip netns attach NAME PID - create a new named network namespace

              If NAME is available in /var/run/netns this command attaches the network namespace of the process PID to NAME as if it were created with ip netns.

But I'm not sure how well it will work with wireshark. Alternatively you can use tcpdump or netcat for simple testing.
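
For example, with the namespaces and veth names from the setup above, traffic crossing the link between the two routers can be captured with:

sudo ip netns exec router1 tcpdump -n -i veth3-r1

And a quick UDP end-to-end check can be done with netcat (flag syntax varies between netcat variants):

# Listen on UDP port 9999 inside node2
sudo ip netns exec node2 nc -u -l 9999
# Send a datagram from node1 to node2's address
sudo ip netns exec node1 bash -c 'echo hello | nc -u 2.2.2.2 9999'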

@emmacasolin
Contributor

This test.ts is currently correctly setting up four namespaces (node1 <-> router1 <-> router2 <-> node2) where node1 and node2 are able to ping each other (and get a response back) by communicating through the two routers. I've been trying to add rules to the nat table for router1 in order to simulate full-cone NAT; however, from running simple tests it doesn't look like the rules are behaving correctly at this stage, so this is something I'll need to keep prototyping.

import { exec } from "child_process";

async function main() {
  // Namespaces
  const netnsn1 = 'node1';
  const netnsn2 = 'node2';
  const netnsr1 = 'router1';
  const netnsr2 = 'router2';
  // Veth cables (ends)
  const n1ToR1 = 'veth1-n1';
  const r1ToN1 = 'veth1-r1';
  const r2ToN2 = 'veth2-r2';
  const n2ToR2 = 'veth2-n2';
  const r1ToR2 = 'veth3-r1';
  const r2ToR1 = 'veth3-r2';
  // Subnets
  const n1ToR1Subnet = '1.1.1.1';
  const r1ToN1Subnet = '1.1.1.2';
  const r2ToN2Subnet = '2.2.2.1';
  const n2ToR2Subnet = '2.2.2.2';
  const r1ToR2Subnet = '3.3.3.1';
  const r2ToR1Subnet = '3.3.3.2';
  // Subnet mask
  const subnetMask = '/24';
  // Logger for exec commands
  const logger = (error, stdout, stderr) => {
    if (error) {
      console.log(`error: ${error.message}`);
      return;
    }
    if (stderr) {
        console.log(`stderr: ${stderr}`);
        return;
    }
    console.log(`stdout: ${stdout}`);
  }
  // Create network namespaces for two nodes with NAT routers
  exec(`ip netns add ${netnsn1}`, logger);
  exec(`ip netns add ${netnsn2}`, logger);
  exec(`ip netns add ${netnsr1}`, logger);
  exec(`ip netns add ${netnsr2}`, logger);
  // Create veth pairs to link the namespaces
  exec(`ip link add ${n1ToR1} type veth peer name ${r1ToN1}`, logger);
  exec(`ip link add ${r2ToN2} type veth peer name ${n2ToR2}`, logger);
  exec(`ip link add ${r1ToR2} type veth peer name ${r2ToR1}`, logger);
  // Link up the veth pairs to the correct namespaces
  exec(`ip link set ${n1ToR1} netns ${netnsn1}`, logger);
  exec(`ip link set ${n2ToR2} netns ${netnsn2}`, logger);
  exec(`ip link set ${r1ToN1} netns ${netnsr1}`, logger);
  exec(`ip link set ${r1ToR2} netns ${netnsr1}`, logger);
  exec(`ip link set ${r2ToN2} netns ${netnsr2}`, logger);
  exec(`ip link set ${r2ToR1} netns ${netnsr2}`, logger);
  // Loopback and veths are down by default - get them running
  exec(`ip netns exec ${netnsn1} ip link set lo up`, logger);
  exec(`ip netns exec ${netnsn1} ip link set ${n1ToR1} up`, logger);
  exec(`ip netns exec ${netnsn2} ip link set lo up`, logger);
  exec(`ip netns exec ${netnsn2} ip link set ${n2ToR2} up`, logger);
  exec(`ip netns exec ${netnsr1} ip link set lo up`, logger);
  exec(`ip netns exec ${netnsr1} ip link set ${r1ToN1} up`, logger);
  exec(`ip netns exec ${netnsr1} ip link set ${r1ToR2} up`, logger);
  exec(`ip netns exec ${netnsr2} ip link set lo up`, logger);
  exec(`ip netns exec ${netnsr2} ip link set ${r2ToN2} up`, logger);
  exec(`ip netns exec ${netnsr2} ip link set ${r2ToR1} up`, logger);
  // Create subnets for the veth pairs to communicate over
  exec(`ip netns exec ${netnsn1} ip addr add ${n1ToR1Subnet}${subnetMask} dev ${n1ToR1}`, logger);
  exec(`ip netns exec ${netnsn2} ip addr add ${n2ToR2Subnet}${subnetMask} dev ${n2ToR2}`, logger);
  exec(`ip netns exec ${netnsr1} ip addr add ${r1ToN1Subnet}${subnetMask} dev ${r1ToN1}`, logger);
  exec(`ip netns exec ${netnsr1} ip addr add ${r1ToR2Subnet}${subnetMask} dev ${r1ToR2}`, logger);
  exec(`ip netns exec ${netnsr2} ip addr add ${r2ToN2Subnet}${subnetMask} dev ${r2ToN2}`, logger);
  exec(`ip netns exec ${netnsr2} ip addr add ${r2ToR1Subnet}${subnetMask} dev ${r2ToR1}`, logger);
  // Set up the default routes for each namespace
  exec(`ip netns exec ${netnsn1} ip route add default via ${r1ToN1Subnet}`, logger);
  exec(`ip netns exec ${netnsn2} ip route add default via ${r2ToN2Subnet}`, logger);
  exec(`ip netns exec ${netnsr1} ip route add default via ${r2ToR1Subnet}`, logger);
  exec(`ip netns exec ${netnsr2} ip route add default via ${r1ToR2Subnet}`, logger);
  // Check that everything was setup correctly
  // Interfaces are up at the correct addresses
  exec(`ip netns exec ${netnsn1} ip addr`, logger);
  exec(`ip netns exec ${netnsn2} ip addr`, logger);
  exec(`ip netns exec ${netnsr1} ip addr`, logger);
  exec(`ip netns exec ${netnsr2} ip addr`, logger);
  // Routing tables are correct
  exec(`ip netns exec ${netnsn1} ip route`, logger);
  exec(`ip netns exec ${netnsn2} ip route`, logger);
  exec(`ip netns exec ${netnsr1} ip route`, logger);
  exec(`ip netns exec ${netnsr2} ip route`, logger);
  // Can ping from one node to the other
  exec(`ip netns exec ${netnsn1} ping -c 3 ${n2ToR2Subnet}`, logger);
  exec(`ip netns exec ${netnsn2} ping -c 3 ${n1ToR1Subnet}`, logger);
  // Delete the namespaces
  exec(`ip netns del ${netnsn1}`, logger);
  exec(`ip netns del ${netnsn2}`, logger);
  exec(`ip netns del ${netnsr1}`, logger);
  exec(`ip netns del ${netnsr2}`, logger);
}

main();

@emmacasolin
Contributor

I've got the NAT rules working!! For testing this I created this setup of namespaces linked with my real system (since I can only open wireshark from my real system):

[diagram: client <-> router <-> root system namespace setup]

I wanted client to act like a client behind a router (router) and for my root system to act like a server. The only default routing that was required was on client, so that packets to any address (e.g. root) would be routed through router.

sudo ip netns exec client ip route add default via 10.2.2.2

I then added the following iptables rules to the router:

# Any packets leaving on veth1 coming from the client (10.2.2.1/24) should be made to look like they're coming from the router (10.1.1.1)
iptables -t nat -A POSTROUTING -s 10.2.2.1/24 -o veth1 -j SNAT --to-source 10.1.1.1
# Any packets arriving on veth1 addressed to the router (10.1.1.1/24) should be redirected to the client (10.2.2.1)
iptables -t nat -A PREROUTING -d 10.1.1.1/24 -i veth1 -j DNAT --to-destination 10.2.2.1 

This simulates the endpoint-independent NAT mapping used by full-cone, restricted-cone, and port-restricted-cone NAT. Note that specifying the interface that the packet is arriving on for the PREROUTING rule is required (specifying the outgoing interface for the POSTROUTING rules isn't necessary but I added it in for symmetry). You can see why by looking at the PREROUTING table after adding these rules:

iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       all  --  veth1  *       0.0.0.0/0            10.1.1.0/24          to:10.2.2.1

Even though our rule specified the address to be matched as 10.1.1.1/24, it gets stored as 10.1.1.0/24, so packets sent to 10.1.1.2 (root) would also match this pattern, and packets sent from the client to 10.1.1.2 would just be redirected back to the client itself. If we specify the incoming interface as veth1, then the rule only matches packets arriving on the router's out-facing interface, not those arriving from the side facing the client.

With all of this setup done, we can now send packets to and from the client and root. From the client's perspective it is communicating directly with root: it's sending packets addressed to 10.1.1.2 and receiving packets addressed from 10.1.1.2. From root's perspective, it never knows what the client's address is, or even that it's communicating with a client behind a router, since it receives packets addressed from 10.1.1.1 and sends packets addressed back to 10.1.1.1.
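
Building on this, a possible next step (a sketch only, reusing veth1 as the router's out-facing interface): keep the SNAT/DNAT mapping above, but drop unsolicited inbound flows on the forwarding path. Replies are then only accepted from endpoints the client has already sent to, which should turn the full-cone behaviour into the port-restricted variant:

iptables -A FORWARD -i veth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i veth1 -m conntrack --ctstate NEW -j DROP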



@CMCDragonkai
Member Author

CMCDragonkai commented Mar 18, 2022

There are some relevant discussions in the PR #357 (comment) about MASQUERADE vs SNAT, the difference between TCP and UDP, and how symmetric NAT degrades to port-restricted NAT when you only have 1 external IP.

Also for our test cases, I suggest this matrix can help:

[matrix showing which pairs of NAT types can be traversed with hole punching]

It comes from https://dh2i.com/kbs/kbs-2961448-understanding-different-nat-types-and-hole-punching/

The non-routable ones should be routable with a TURN relay.

@CMCDragonkai
Member Author

@emmacasolin Can you change to address ranges instead, as that should make it easier when we have more than 1 agent behind a NAT?

@emmacasolin
Contributor

With regard to this comment #357 (comment), we only need to simulate port-restricted cone and symmetric NAT for our tests. This is because our NAT busting will work for full-cone and address-restricted cone NAT if it works for port-restricted cone and symmetric NAT, since the architectures they use are the same as or less sophisticated than port-restricted cone/symmetric NAT.

@CMCDragonkai CMCDragonkai linked a pull request Jun 10, 2022 that will close this issue
@CMCDragonkai CMCDragonkai changed the title NAT-Traversal Testing (non-Symmetric NAT) with testnet.polykey.io NAT-Traversal Testing with testnet.polykey.io Jun 14, 2022
@CMCDragonkai
Member Author

@emmacasolin

I've changed the issue name here to remove "non-Symmetric NAT" because we are in fact testing with symmetric NAT now.

This issue is blocked on testnet deployment #378.

@CMCDragonkai
Member Author

@emmacasolin can you tick off the tasks here if they are done?

@CMCDragonkai
Member Author

CMCDragonkai commented Jun 28, 2022

Earlier tasks are all ticked by the merging of #381 to staging.

I've added task 7 to address the testing for testnet.polykey.io. It can only occur after integration:deployment.

Such a test would need to be conditional as well, but this time representing tests that run during integration.

@tegefaulkes is currently working on getting our tests/bin to work during integration:* jobs, so that work would be relevant because these would be the tests that should only run after integration:builds and integration:deployment finishes.

@CMCDragonkai
Member Author

@emmacasolin you'll start on this now, and since the testnet deployment will occur on each deployment to staging, that means you'll need to trigger testnet deployment locally whenever you're fixing up anything related to the testnet.

Please go through your AWS account, and test that you can interact with ECS and ECR. You'll need to use the ./scripts/deploy-image.sh and ./scripts/deploy-service.sh that are going to be merged in #396.

Some initial related bugs include reviewing #402. Also, rename that PR to be more specific to what is being solved there.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 24, 2022
@CMCDragonkai
Member Author

The last task is now a separate issue MatrixAI/Polykey-CLI#71, so this issue can be closed.
