Testnet securely maintain a pool of recovery codes #285

Closed
5 tasks done
CMCDragonkai opened this issue Nov 1, 2021 · 13 comments
Labels
- epic: Big issue with multiple subissues
- procedure: Action that must be executed
- production: Affects a production deployment that involves customers
- r&d:polykey:core activity 4: End to End Networking behind Consumer NAT Devices
- security: Security risk

Comments

@CMCDragonkai
Member

CMCDragonkai commented Nov 1, 2021

Once we have the ability to use PK_RECOVERY_CODE to automatically bootstrap the PK keynodes, we need to create at least 1 recovery code and 1 root key to be used.

The recovery code must be kept secret; for now, I'll maintain it myself. The root key will live on AWS's block device mounted into the ECS container, where it stays safe inside AWS.

The recovery code will need to be used as an environment variable for ECS for the testnet.

Eventually we can store the recovery code inside a running Polykey node, and make use of AWS integrations, like our wiki page: "Service Deployment Secrets with AWS ECS".
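One way to deliver the recovery code as an environment variable without writing it into the ECS task definition in plain text is the container definition's `secrets` field: ECS resolves each entry from Secrets Manager when the container starts and exposes it as an environment variable. The sketch below builds that fragment as a TypeScript object. It assumes the secret is a JSON object with `RecoveryCode` and `Password` keys; the container name and the `PK_PASSWORD` variable are assumptions, not taken from this issue. The task execution role also needs `secretsmanager:GetSecretValue` on the secret.

```typescript
// Minimal shape of the ECS container-definition fields used here.
interface EcsSecret {
  name: string; // environment variable name seen inside the container
  valueFrom: string; // Secrets Manager ARN, optionally suffixed with a JSON key
}

interface ContainerDefinitionFragment {
  name: string;
  image: string;
  secrets: EcsSecret[];
}

// Reference one JSON key of a Secrets Manager secret. ECS accepts
// `<secret-arn>:<json-key>:<version-stage>:<version-id>`; leaving the stage
// and id empty selects the current version of the secret.
function secretKeyRef(secretArn: string, jsonKey: string): string {
  return `${secretArn}:${jsonKey}::`;
}

// Hypothetical container definition for a single testnet node.
function testnetContainer(image: string, secretArn: string): ContainerDefinitionFragment {
  return {
    name: "polykey-testnet-agent", // assumed container name
    image,
    secrets: [
      { name: "PK_RECOVERY_CODE", valueFrom: secretKeyRef(secretArn, "RecoveryCode") },
      // Assumed variable name for the node password.
      { name: "PK_PASSWORD", valueFrom: secretKeyRef(secretArn, "Password") },
    ],
  };
}
```

Keeping only the ARN in the task definition means the recovery code itself never appears in the task definition's revision history.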

Doing this means we don't need to preserve the volume state mounted into the ECS container: the volume only has to be mutable, and it can be deleted, since everything can be recovered from the recovery code.
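Because the volume is disposable, a container entrypoint only has to decide between reusing existing node state and recovering from the recovery code. The sketch below captures that decision as a pure function; `planStartup` and `StartupPlan` are hypothetical names, and how the state directory is located is left to the caller rather than taken from Polykey.

```typescript
// Hypothetical outcome of the startup decision.
type StartupPlan =
  | { kind: "reuse-state" }
  | { kind: "recover"; recoveryCode: string };

// Decide how a (hypothetical) entrypoint should start the node.
function planStartup(
  env: Record<string, string | undefined>,
  stateExists: boolean,
): StartupPlan {
  // The volume survived, so keep using the existing node state.
  if (stateExists) return { kind: "reuse-state" };
  const code = env.PK_RECOVERY_CODE?.trim();
  if (!code) {
    // Without state and without a recovery code the root key cannot be rebuilt.
    throw new Error("node state is missing and PK_RECOVERY_CODE is not set");
  }
  // Collapse runs of whitespace so the words are single-space separated.
  return { kind: "recover", recoveryCode: code.split(/\s+/).join(" ") };
}
```

An entrypoint would call it as `planStartup(process.env, existsSync(nodePath))`, where `nodePath` is whatever directory the volume is mounted at.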

Tasks

  1. Use pk bootstrap locally to generate a recovery code and root key.
  2. Save the recovery code securely.
  3. Run pk bootstrap on a different directory with the same recovery code and check whether the same root key is produced. Compare them.
  4. Delete the root key.
  5. Use the recovery code in the ECS task definition.
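Task 3 comes down to comparing two keys. A minimal sketch, assuming both bootstraps write the public key in the same encoding: comparing SHA-256 fingerprints of the public keys gives a short value that is safe to log and avoids handling private key material. The key file locations are not given here, so the caller supplies the paths; the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex SHA-256 fingerprint of a key encoding (e.g. the bytes of a public key file).
function fingerprint(keyBytes: Uint8Array): string {
  return createHash("sha256").update(keyBytes).digest("hex");
}

// True when both encodings are byte-identical.
function sameRootKey(a: Uint8Array, b: Uint8Array): boolean {
  return fingerprint(a) === fingerprint(b);
}

// Compare two key files on disk; paths come from the caller.
function compareKeyFiles(pathA: string, pathB: string): boolean {
  return sameRootKey(readFileSync(pathA), readFileSync(pathB));
}
```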

This will be done for 1 single testnet node. We can scale this up later.

@CMCDragonkai CMCDragonkai added the epic, procedure, production, and security labels and removed the development label Nov 1, 2021
@CMCDragonkai CMCDragonkai self-assigned this Nov 1, 2021
@joshuakarp
Contributor

Start date changed from Nov 11th to Nov 17th (based on delays in #231).

@joshuakarp
Contributor

joshuakarp commented Nov 16, 2021

Start date changed from Wednesday Nov 17th to Friday Nov 19th (delays in #269, #231, and the CLI MR on GitLab).

@joshuakarp
Contributor

joshuakarp commented Nov 29, 2021

Start date changed from Friday Nov 19th to Wednesday Dec 1 (delayed from refactoring work in #283).

@CMCDragonkai
Member Author

Going to bootstrap this from a password manager.

@CMCDragonkai
Member Author

This no longer blocks #194, as that issue can be resolved before #285 is done. Testnet nodes are still using simple passwords at the moment; we need to fix some bugs before setting up all the recovery code secrets in AWS, in particular partially addressing #403.

@CMCDragonkai CMCDragonkai changed the title Create Testnet Seed Nodes - Maintain Recovery Key and Private Key Testnet securely maintain a pool of recovery codes Jul 11, 2022
@CMCDragonkai
Member Author

This still needs to be done, but we first need to partially work out #403: whether we keep a set of Node IDs in PKI, as a gestalt graph, or just as a plain set.

I prefer going down the gestalt graph route. But for efficiency's sake, I think we still need to provide a set to the agent software and update src/config.ts with that set, while keeping the ability to discover new node IDs.

Once some level of integration testing is done, this will become the next priority, so that the testnet integration tests pass fully @tegefaulkes @emmacasolin


@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 (End to End Networking behind Consumer NAT Devices) label Jul 24, 2022
@CMCDragonkai
Copy link
Member Author

According to https://gitlab.com/MatrixAI/Engineering/Polykey/Polykey-Infrastructure/-/merge_requests/2#note_1145876524, we will inject our recovery codes into AWS secrets manager.

We can use this: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-secrets-manager/index.html to automate the injection.

That means recovery code generation is manual and the codes are fixed. It could be automated if the infrastructure code had access to the recovery code generator, which would mean bringing in PK's code base and calling generateRecoveryCode. This is only possible once we release PK as a stable library on NPM, which we'll do after #446.

The NodeId is just the public key then.

Auto-generation must be stable: if we expect the pool to hold 10 codes and there are 5, only generate 5 more. If there are already 10, we don't delete anything.
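The top-up rule above can be made idempotent by treating the pool as numbered slots and only filling the empty ones. In this sketch, `SecretStore` is a hypothetical interface, not an AWS SDK type; a real backend could wrap `SecretsManagerClient` from `@aws-sdk/client-secrets-manager` with `ListSecretsCommand` (paging through `NextToken`) and `CreateSecretCommand`. The `generate` callback is a hypothetical stand-in for recovery code generation, since its real signature isn't given here.

```typescript
// Hypothetical minimal store interface (not part of any AWS SDK).
interface SecretStore {
  listNames(): Promise<string[]>;
  create(name: string, value: string): Promise<void>;
}

// Names of the slots `${prefix}1` .. `${prefix}${target}` that do not exist yet.
function missingSecretNames(
  existing: readonly string[],
  target: number,
  prefix: string,
): string[] {
  const present = new Set(existing);
  const missing: string[] = [];
  for (let i = 1; i <= target; i++) {
    const name = `${prefix}${i}`;
    if (!present.has(name)) missing.push(name);
  }
  return missing;
}

// Fill only the missing slots; existing secrets are never deleted or overwritten.
async function topUpPool(
  store: SecretStore,
  target: number,
  prefix: string,
  generate: () => string, // hypothetical recovery code generator
): Promise<string[]> {
  const missing = missingSecretNames(await store.listNames(), target, prefix);
  for (const name of missing) {
    await store.create(name, generate());
  }
  return missing;
}
```

Running it twice in a row creates nothing the second time, and secrets outside the numbered range are simply ignored, which keeps deletion manual.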

Deletion is manual for now.

@CMCDragonkai
Member Author

Based on working through #403, the recovery code pool should be stored in AWS Secrets Manager. The secrets should be named:

polykey-testnet-agent-1 -> [RecoveryCode, Password, NodeId]
polykey-testnet-agent-2 -> [RecoveryCode, Password, NodeId]
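The naming scheme above can be pinned down with a small typed helper. This sketch assumes each secret is stored as a JSON object with `RecoveryCode`, `Password` and `NodeId` keys (the layout Secrets Manager uses for key/value secrets); the issue only lists the three fields, so the JSON layout and the helper names are assumptions.

```typescript
// Assumed payload of one pool entry.
interface AgentSecret {
  RecoveryCode: string;
  Password: string;
  NodeId: string;
}

// Secret name for agent `index` (1-based), e.g. polykey-testnet-agent-1.
function agentSecretName(index: number): string {
  if (!Number.isInteger(index) || index < 1) {
    throw new RangeError(`invalid agent index: ${index}`);
  }
  return `polykey-testnet-agent-${index}`;
}

function serializeAgentSecret(s: AgentSecret): string {
  return JSON.stringify(s);
}

// Parse a secret string and check that every expected field is a non-empty string.
function parseAgentSecret(raw: string): AgentSecret {
  const v: unknown = JSON.parse(raw);
  if (typeof v !== "object" || v === null) {
    throw new Error("secret is not a JSON object");
  }
  const o = v as Record<string, unknown>;
  for (const key of ["RecoveryCode", "Password", "NodeId"]) {
    if (typeof o[key] !== "string" || o[key] === "") {
      throw new Error(`missing or empty field: ${key}`);
    }
  }
  return {
    RecoveryCode: o.RecoveryCode as string,
    Password: o.Password as string,
    NodeId: o.NodeId as string,
  };
}
```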

Note that the NodeId entry could be removed in the future, because it can be deterministically derived from the RecoveryCode, but this is only practical once #446 is merged; otherwise keypair generation will take too long.

@CMCDragonkai
Member Author

The secret generation is not automated, since we don't have an infrastructure orchestrator at this point in time. It could be done as part of Polykey-Infrastructure later.

@CMCDragonkai
Member Author

@tegefaulkes please link the PR for Polykey-Infrastructure that addresses this.

@tegefaulkes
Contributor

The PR hasn't been made yet. I'll link it when it is created.

@tegefaulkes
Contributor

No PR was made; it was pushed directly to master. I'm resolving this issue.


3 participants