[WIP DO NOT MERGE] ZK KYC #47

Open
wants to merge 31 commits into
base: main
c0e6ae6
adding files
Mar 30, 2023
02ed055
dropdown menu for instructions
topanisto Mar 30, 2023
72d2367
added solidity contract
Mar 31, 2023
87e236b
typo fix
topanisto Mar 31, 2023
91616a7
Merge branch 'main' of https://github.com/novus677/zk-kyc
topanisto Mar 31, 2023
e20e3bd
pulling upstream
novus677 Apr 11, 2023
dcdf8d9
refactored email circuits and added nullifier
novus677 Apr 22, 2023
0c0b297
deleted old files
novus677 Apr 25, 2023
89f79bb
added kyc-email-handler
novus677 Apr 27, 2023
81f5d5c
merge conflicts
novus677 Apr 27, 2023
a1c89d4
refactored input generation
novus677 May 2, 2023
d436e6d
Merge remote-tracking branch 'upstream/refactor'
novus677 May 2, 2023
5058874
deleted files and added verifier.sol file
novus677 May 3, 2023
df22615
edit numbered step
novus677 May 4, 2023
db11c70
update gitignore
novus677 May 4, 2023
9e0a689
remove ignored files
novus677 May 4, 2023
e46df0e
added compatibility with old public keys
novus677 May 11, 2023
057c413
refactored contract
novus677 May 12, 2023
16ecfc1
Create CNAME
novus677 May 13, 2023
d219ffa
Delete CNAME
novus677 May 13, 2023
63a0bf6
added vkey
novus677 May 13, 2023
dec072f
Merge branch 'main' of https://github.com/novus677/zk-kyc
novus677 May 13, 2023
308dc32
strip quotes if bad fetch
novus677 May 15, 2023
84330f2
added nullifier check
novus677 May 15, 2023
5bba795
fixed contract tests
novus677 May 16, 2023
ca48af0
edit frontend and README
novus677 May 16, 2023
35cf980
edited frontend
novus677 May 16, 2023
42af9d7
new contract
novus677 May 17, 2023
9840031
Update README.md
novus677 Jun 8, 2023
120ddac
Update README.md
novus677 Jul 18, 2023
99501ac
Update README.md
novus677 Dec 28, 2023
7 changes: 7 additions & 0 deletions .gitignore
@@ -69,3 +69,10 @@ generate_input_log.txt
*.debug
*.env
.vscode

# from zk-kyc
/temp.py
*.pem
*.der
src/contracts/broadcast/
lib/forge-std/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -4,3 +4,6 @@
[submodule "src/contracts/lib/forge-std"]
path = src/contracts/lib/forge-std
url = https://github.com/foundry-rs/forge-std
[submodule "lib/forge-std"]
path = lib/forge-std
url = https://github.com/foundry-rs/forge-std
64 changes: 33 additions & 31 deletions README.md
@@ -1,3 +1,31 @@
# Anonymous KYC with ZK Email

Generate an anonymous proof of personhood badge at [anonkyc.com](https://anonkyc.com). Note: the website likely doesn't work right now because the DKIM public keys have changed.

## What is ZK KYC?

ZK KYC is a form of KYC (Know Your Customer) that hides particular details of the user's identity, such as name, date of birth, and citizenship. Our ZK KYC proof generator implements the most basic level of ZK KYC: a proof of personhood that reveals no other information about the user. In particular, a user can have multiple addresses but can only ever hold one proof of personhood badge. Other levels of ZK KYC could prove, for example, that the user is over the age of 21 or is a U.S. citizen.

## Motivation

The use of KYCs to prevent fraud and to comply with regulations compromises the goals of decentralized technologies by placing private information in the hands of centralized organizations. A ZK KYC provides a possible solution to both sides of the debate: the KYC component can give organizations trust in their customers and also provide Sybil resistance, while the ZK component keeps customers' private information completely hidden.

## How ZK KYC works

See the bottom of this README or [this blog post](https://blog.aayushg.com/posts/zkemail/) for an explainer on how ZK Email works. The main idea is that we use the ZK Email circuit to verify that a KYC confirmation email from a provider such as Coinbase is authentic. We also use ZK-regex circuits to match the subject of the email against that of a KYC confirmation email.

To prevent someone from minting an unlimited number of proof of personhood badges, we also attach a nullifier to every set of inputs. In our case, the nullifier is the hash of the concatenated body hashes from the Airbnb and Coinbase confirmation emails.

This is actually why we need two KYCs: one from Airbnb and one from Coinbase. At first glance, such a ZK KYC implementation could work with just Airbnb and an Airbnb KYC confirmation email, where your public nullifier is just the hash of the signature or the body hash. However, under such a setup, Airbnb would still be able to de-anonymize you from your public nullifier. Under our setup, Airbnb and Coinbase would have to collude in order to de-anonymize you. If we wanted to, we could add even more KYC requirements to make the system even more secure.
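
The nullifier construction described above can be sketched off-circuit as follows. This is a minimal illustration, assuming standard SHA-256 over the two 44-character base64 body hashes concatenated Airbnb-first. The function name `compute_nullifier` and the sample inputs are hypothetical, and the circuit's own digest may differ because it hashes a zero-filled 128-byte buffer rather than a conventionally padded message.

```python
import base64
import hashlib


def compute_nullifier(airbnb_bh_b64: str, coinbase_bh_b64: str) -> bytes:
    """Hash the concatenation of two base64 body hashes (hypothetical helper)."""
    for bh in (airbnb_bh_b64, coinbase_bh_b64):
        # A DKIM body hash is a base64-encoded SHA-256 digest: 44 characters.
        if len(bh) != 44:
            raise ValueError("expected a 44-character base64 body hash")
    concatenated = (airbnb_bh_b64 + coinbase_bh_b64).encode("ascii")
    return hashlib.sha256(concatenated).digest()


# Made-up body hashes standing in for the values taken from real emails.
airbnb_bh = base64.b64encode(hashlib.sha256(b"airbnb body").digest()).decode()
coinbase_bh = base64.b64encode(hashlib.sha256(b"coinbase body").digest()).decode()

nullifier = compute_nullifier(airbnb_bh, coinbase_bh)
```

The same pair of emails always yields the same nullifier, so a second badge from the same person can be rejected, while neither provider alone knows both body hashes that go into it.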

## Known issues

Our current setup has several limitations on who can generate a ZK KYC:
- Old public keys. Email domains typically rotate their public keys roughly every six months. As a result, older KYC confirmation emails can't be verified. We are storing public keys so that we can check against them in the future, but unfortunately we don't have access to most of the older Airbnb/Coinbase public keys.
- New email format. If Airbnb suddenly decides to change the subject of their KYC confirmation emails, we will need to build a new ZK-regex circuit to match the new subject. Such formatting changes have happened in the past with Coinbase emails.

# Legacy

# ZK Email Verify

**WIP: This tech is extremely tricky to use and very much a work in progress, and we do not recommend use in any production application right now. This is both due to unaudited code, and several theoretical gotchas such as lack of nullifiers, no signed bcc’s, non-nested reply signatures, upgradability of DNS, and hash sizings. None of these affect our current Twitter MVP usecase, but are not generally guaranteed. If you have a possible usecase, we are happy to help brainstorm if your trust assumptions are in fact correct!**
@@ -209,7 +237,9 @@ Change s3 address in the frontend to your bucket.
To do a non-chunked zkey for non-browser running,

```

yarn compile-all

```

### Really Large Circuits
@@ -383,26 +413,12 @@ const word_char = '(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|A|B|C|D|
let regex = `email was meant for @${word_char}+`;
```

To understand these better, use https://cyberzhg.github.io/toolbox/ and use the 3 regex tools for visualization of the min-DFA state.

## FAQ/Possible Errors

### Can you provide an example header for me to understand what exactly is signed?

We are hijacking DKIM signatures in order to verify parts of emails, which can be verified on chain via succinct zero knowledge proofs. Here is an example of the final, canonicalized header string that is actually signed by gmail.com's public key:

`to:"[email protected]" <[email protected]>\r\nsubject:test email\r\nmessage-id:<CAOmXgjU78_L7d-H7Wqf2qph=-uED3Kw6NEU2PzSP6jiUH0Bb+Q@mail.gmail.com>\r\ndate:Fri, 24 Mar 2023 13:02:10 +0700\r\nfrom:ZK Email <[email protected]>\r\nmime-version:1.0\r\ndkim-signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679637743; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=gCRK/FdzAYnMHic55yb00uF8AHZ/3HvyLVQJbWQ2T8o=; b=`

Thus, we can extract whatever information we want out of here via regex, including to/from/body hash! We can do the same for an email body.
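
As a rough off-circuit illustration of that extraction, the sketch below pulls the subject and the `bh=` body hash out of a shortened version of the header above with ordinary regular expressions. The sender address is a placeholder and the patterns are illustrative assumptions, not the regexes compiled into the circuits.

```python
import base64
import re

# Shortened version of the canonicalized header; the address is a placeholder.
header = (
    "subject:test email\r\n"
    "from:ZK Email <sender@example.com>\r\n"
    "dkim-signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; "
    "s=20210112; t=1679637743; "
    "bh=gCRK/FdzAYnMHic55yb00uF8AHZ/3HvyLVQJbWQ2T8o=; b="
)

# The subject runs from "subject:" to the end of that header line.
subject = re.search(r"(?:^|\r\n)subject:([^\r\n]*)", header).group(1)

# The body hash is the base64 value of the bh= tag in the DKIM-Signature field.
body_hash_b64 = re.search(r"bh=([A-Za-z0-9+/]+=*);", header).group(1)
body_hash = base64.b64decode(body_hash_b64)
```

The base64 string is 44 characters long and decodes to a 32-byte SHA-256 digest, which is why the circuits reserve 44 bytes for the revealed body hash.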

### I'm having issues with the intricacies of the SHA hashing. How do I understand the function better?

Use https://sha256algorithm.com/ as an explainer! It's a great visualization of what is going on, and our code should match what is going on there.

### I'm having trouble with regex or base64 understanding. How do I understand that better?

Use https://cyberzhg.github.io/toolbox/ to experiment with conversions to/from base64 and to/from DFAs and NFAs.

### What are the differences between generating proofs (snarkjs.groth16.fullprove) on the client vs. on a server?

If the server is generating the proof, it has to have the private input. We want people to own their own data, so client side proving is the most secure both privacy and anonymity wise. There are fancier solutions (MPC, FHE, recursive proofs etc), but those are still in the research stage.
@@ -422,21 +438,7 @@ const = result.results[0].publicKey.toString();
TypeError: Cannot read properties of undefined (reading 'toString')
```

You need an internet connection while running DKIM verification locally, in order to fetch the public key. If you have an internet connection, make sure you downloaded the email with the headers: you should see a DKIM section in the file. DKIM verification may also fail after the public keys rotate, though this has not been confirmed.

### How do I lookup the RSA pubkey for a domain?

Use [easydmarc.com/tools/dkim-lookup?domain=twitter.com](https://easydmarc.com/tools/dkim-lookup?domain=twitter.com).

### DKIM parsing/public key errors with generate_input.ts

```
Writing to file...
/Users/aayushgupta/Documents/.projects.nosync/zk-email-verify/src/scripts/generate_input.ts:190
throw new Error(`No public key found on generate_inputs result ${JSON.stringify(result)}`);
```

Depending on the "info" error at the end of the email, you probably need to go through src/helpers/dkim/\*.js and replace some ".replace" functions with ".replaceAll" instead (likely tools.js), and also potentially strip some quotes.
You need an internet connection while running DKIM verification locally, in order to fetch the public key. If you have an internet connection, make sure you downloaded the email with the headers: you should see a DKIM section in the file.

### No available storage method found.

@@ -476,9 +478,9 @@ Apologies, this part is some messy legacy code from previous projects. You use v

zkp.ts is the key file that calls the important proving functions. You should be able to just call the exported functions from there, along with setting up your own s3 bucket and setting the constants at the top.

### What is the licensing on this technology?
### Why did you choose GPL over MIT licensing?

Everything we write is MIT licensed. Note that circom and circomlib are GPL. Broadly we are pro permissive open source usage with attribution! We hope that those who derive profit from this contribute that money altruistically back to this technology and open source public good.
Since circom is GPL, we are forced to use the GPL license, which is still a highly permissive license. You can dm us if you'd like to treat non-circom parts of the repo as MIT licensed, but broadly we are pro permissive open source usage with attribution! We hope that those who derive profit from this primitive contribute that money altruistically back to this technology.

## To-Do

111 changes: 111 additions & 0 deletions circuits/email_airbnb.circom
@@ -0,0 +1,111 @@
pragma circom 2.1.5;

include "../node_modules/circomlib/circuits/bitify.circom";
include "./helpers/sha.circom";
include "./helpers/rsa.circom";
include "./helpers/base64.circom";
include "./helpers/extract.circom";

include "./regexes/from_regex.circom";
include "./regexes/tofrom_domain_regex.circom";
include "./regexes/body_hash_regex.circom";
include "./regexes/to_regex.circom";
include "./regexes/subject_regex.circom";
include "./regexes/airbnb_kyc_regex.circom";

// Here, n and k are the biginteger parameters for RSA
// This is because the number is chunked into k chunks of n bits each
// Max header bytes shouldn't need to be changed much per email,
// but the max body bytes may need to be changed to be larger if the email has a lot of e.g. HTML formatting
// TODO: split into header and body
template AirbnbEmailVerify(max_header_bytes, n, k) {
assert(max_header_bytes % 64 == 0);
// assert(n * k > 2048); // constraints for 2048 bit RSA
assert(n * k > 1024); // constraints for 1024 bit RSA
assert(n < (255 \ 2)); // we want a multiplication to fit into a circom signal

signal input in_padded[max_header_bytes]; // prehashed email data, includes up to 512 + 64? bytes of padding pre SHA256, and padded with lots of 0s at end after the length
signal input modulus[k]; // rsa pubkey, verified with smart contract + optional oracle
signal input signature[k];
signal input in_len_padded_bytes; // length of in email data including the padding, which will inform the sha256 block length

signal input email_to_idx;
signal output to_email[max_header_bytes]; // to email address of email

// Identity commitment variables
// (note we don't need to constrain the +1 due to https://geometry.xyz/notebook/groth16-malleability)
signal input address;

// Base 64 body hash variables
var LEN_SHA_B64 = 44; // ceil(32/3) * 4, should be automatically calculated.
signal input body_hash_idx;
signal output body_hash_reveal[LEN_SHA_B64];

// SHA HEADER: 506,670 constraints
// This calculates the SHA256 hash of the header, which is the "base_msg" that is RSA signed.
// The header signs the fields in the "h=Date:From:To:Subject:MIME-Version:Content-Type:Message-ID;"
// section of the "DKIM-Signature:" line, along with the body hash.
// Note that nothing above the "DKIM-Signature:" line is signed.
component sha = Sha256Bytes(max_header_bytes);
sha.in_padded <== in_padded;
sha.in_len_padded_bytes <== in_len_padded_bytes;
var msg_len = (256+n)\n;

component base_msg[msg_len];
for (var i = 0; i < msg_len; i++) {
base_msg[i] = Bits2Num(n);
}
for (var i = 0; i < 256; i++) {
base_msg[i\n].in[i%n] <== sha.out[255 - i];
}
for (var i = 256; i < n*msg_len; i++) {
base_msg[i\n].in[i%n] <== 0;
}

// VERIFY RSA SIGNATURE: 149,251 constraints
// The fields that this signature actually signs are defined as the body and the values in the header
component rsa = RSAVerify65537(n, k);
for (var i = 0; i < msg_len; i++) {
rsa.base_message[i] <== base_msg[i].out;
}
for (var i = msg_len; i < k; i++) {
rsa.base_message[i] <== 0;
}
rsa.modulus <== modulus;
rsa.signature <== signature;

// TO HEADER REGEX: X constraints
// This extracts the to email, and the precise regex format can be viewed in the README
// We cannot use to: field at all due to Hotmail
signal to_regex_out, to_regex_reveal[max_header_bytes];
(to_regex_out, to_regex_reveal) <== ToRegex(max_header_bytes)(in_padded);
to_regex_out === 1;
to_email <== VarShiftLeft(max_header_bytes, max_header_bytes)(to_regex_reveal, email_to_idx); // can probably change output length

// BODY HASH REGEX: 617,597 constraints
// This extracts the body hash from the header (i.e. the part after bh= within the DKIM-signature section)
// which is used to verify the body text matches this signed hash + the signature verifies this hash is legit
signal bh_regex_out, bh_reveal[max_header_bytes];
(bh_regex_out, bh_reveal) <== BodyHashRegex(max_header_bytes)(in_padded);
bh_regex_out === 1;
body_hash_reveal <== VarShiftLeft(max_header_bytes, LEN_SHA_B64)(bh_reveal, body_hash_idx);

// AIRBNB REGEX: X constraints
// Checks Airbnb regex matches KYC confirmation email
component airbnb_regex = AirbnbKYCRegex(max_header_bytes);
airbnb_regex.msg <== in_padded;
// This ensures we found a match at least once
component found_airbnb = IsZero();
found_airbnb.in <== airbnb_regex.out;
found_airbnb.out === 0;
// log(airbnb_regex.out);
}

// In circom, all output signals of the main component are public (and cannot be made private), the input signals of the main component are private if not stated otherwise using the keyword public as above. The rest of signals are all private and cannot be made public.
// This makes modulus and address public, alongside the to_email and body_hash_reveal outputs. hash(signature) can optionally be made public, but is not recommended since it allows the mailserver to trace who the offender is.

// Args:
// * max_header_bytes = 1024 is the max number of bytes in the header
// * n = 121 is the number of bits in each chunk of the modulus (RSA parameter)
// * k = 17 is the number of chunks in the modulus (RSA parameter)
// component main { public [ modulus, address ] } = AirbnbEmailVerify(1024, 121, 17);
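
The chunking that `n` and `k` describe can be sketched off-circuit. The snippet below splits an integer into `k` limbs of `n` bits, least-significant limb first, which is the order the `Bits2Num` loops above produce for the header digest (assuming `sha.out` is ordered most-significant bit first). The helper names `to_limbs` and `from_limbs` are hypothetical, and the modulus is a made-up number, not a real DKIM key.

```python
import hashlib


def to_limbs(value: int, n: int, k: int) -> list[int]:
    """Split value into k limbs of n bits each, least-significant limb first."""
    if value < 0 or value >= 1 << (n * k):
        raise ValueError("value does not fit in k limbs of n bits")
    mask = (1 << n) - 1
    return [(value >> (n * i)) & mask for i in range(k)]


def from_limbs(limbs: list[int], n: int) -> int:
    """Recombine limbs produced by to_limbs into the original integer."""
    return sum(limb << (n * i) for i, limb in enumerate(limbs))


N_BITS, K_LIMBS = 121, 17

# Made-up 1024-bit odd number standing in for an RSA modulus.
fake_modulus = (1 << 1023) | 12345
modulus_limbs = to_limbs(fake_modulus, N_BITS, K_LIMBS)

# The 256-bit header digest, read as a big-endian integer, fills
# msg_len = (256 + n) // n limbs; the circuit sets the remaining limbs to 0.
digest = int.from_bytes(hashlib.sha256(b"example header").digest(), "big")
msg_len = (256 + N_BITS) // N_BITS
digest_limbs = to_limbs(digest, N_BITS, K_LIMBS)
```

With `n = 121` and `k = 17` the limbs cover 2057 bits, and each limb stays below half the field size so that a product of two limbs still fits in one circom signal.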
109 changes: 109 additions & 0 deletions circuits/email_both.circom
@@ -0,0 +1,109 @@
pragma circom 2.1.5;

include "./email_airbnb.circom";
include "./email_coinbase.circom";

// Current "to email" extractor only works on certain formats of emails. In particular, if there's a name after the "To:" and before the email, the extractor will extract part of the extra text.
// The current setup works for Airbnb and Coinbase confirmation emails, although that may change in the future.

// Here, n and k are the biginteger parameters for RSA
// This is because the number is chunked into k chunks of n bits each
// Max header bytes shouldn't need to be changed much per email,
// but the max body bytes may need to be changed to be larger if the email has a lot of e.g. HTML formatting
template KYCVerify(max_header_bytes, n, k) {
assert(max_header_bytes % 64 == 0);
// assert(n * k > 2048); // constraints for 2048 bit RSA, e.g., Twitter
assert(n * k > 1024); // constraints for 1024 bit RSA, e.g., Airbnb and Coinbase
assert(n < 255 \ 2); // we want a multiplication to fit into a circom signal

var max_packed_bytes = (max_header_bytes - 1) \ 7 + 1; // ceil(max_header_bytes / 7)
signal body_hash_concat[128]; // body hash output from each email has length 44, so the pair takes 88 bytes, rounded up to the next multiple of 64

// ADDRESS INPUT SIGNALS
signal input address;

// AIRBNB INPUT SIGNALS
signal input in_padded_airbnb[max_header_bytes]; // prehashed email data, includes up to 512 + 64? bytes of padding pre SHA256, and padded with lots of 0s at end after the length
signal input modulus_airbnb[k]; // rsa pubkey, verified with smart contract + optional oracle
signal input signature_airbnb[k];
signal input in_len_padded_bytes_airbnb; // length of in email data including the padding, which will inform the sha256 block length
signal input body_hash_idx_airbnb;
signal input email_to_idx_airbnb;

// COINBASE INPUT SIGNALS
signal input in_padded_coinbase[max_header_bytes]; // prehashed email data, includes up to 512 + 64? bytes of padding pre SHA256, and padded with lots of 0s at end after the length
signal input modulus_coinbase[k]; // rsa pubkey, verified with smart contract + optional oracle
signal input signature_coinbase[k];
signal input in_len_padded_bytes_coinbase; // length of in email data including the padding, which will inform the sha256 block length
signal input body_hash_idx_coinbase;
signal input email_to_idx_coinbase;

// OUTPUT SIGNALS
// Outputs the hash of the two body hashes, which serves as the nullifier
// Currently doesn't output from/to emails for domain check but should probably add that later
var pack_size = 7;
var output_len = (32 - 1) \ pack_size + 1; // output_len = 5
signal output nullifier[output_len];

component airbnb_verify = AirbnbEmailVerify(max_header_bytes, n, k);
component coinbase_verify = CoinbaseEmailVerify(max_header_bytes, n, k);

// AIRBNB EMAIL INPUTS
airbnb_verify.in_padded <== in_padded_airbnb;
airbnb_verify.modulus <== modulus_airbnb;
airbnb_verify.signature <== signature_airbnb;
airbnb_verify.in_len_padded_bytes <== in_len_padded_bytes_airbnb;
airbnb_verify.body_hash_idx <== body_hash_idx_airbnb;
airbnb_verify.email_to_idx <== email_to_idx_airbnb;
airbnb_verify.address <== address;

// COINBASE EMAIL INPUTS
coinbase_verify.in_padded <== in_padded_coinbase;
coinbase_verify.modulus <== modulus_coinbase;
coinbase_verify.signature <== signature_coinbase;
coinbase_verify.in_len_padded_bytes <== in_len_padded_bytes_coinbase;
coinbase_verify.body_hash_idx <== body_hash_idx_coinbase;
coinbase_verify.email_to_idx <== email_to_idx_coinbase;
coinbase_verify.address <== address;

// CHECK TO-EMAILS MATCH
// Check that the to emails match
signal to_email_airbnb[max_header_bytes] <== airbnb_verify.to_email;
signal to_email_coinbase[max_header_bytes] <== coinbase_verify.to_email;
for (var i = 0; i < max_header_bytes; i++) {
to_email_airbnb[i] === to_email_coinbase[i];
}

// PACKED OUTPUT
// Nullifier output for solidity verifier
for (var i = 0; i < 44; i++) {
body_hash_concat[i] <== airbnb_verify.body_hash_reveal[i];
body_hash_concat[i + 44] <== coinbase_verify.body_hash_reveal[i];
}
for (var i = 88; i < 128; i++) {
body_hash_concat[i] <== 0;
}
component sha = Sha256Bytes(128);
sha.in_padded <== body_hash_concat;
sha.in_len_padded_bytes <== 128;

// Convert SHA output into 256/8 = 32 bytes
component sha_out[32];
for (var i = 0; i < 32; i++) {
sha_out[i] = Bits2Num(8);
}
for (var i = 0; i < 256; i++) {
sha_out[i\8].in[i%8] <== sha.out[255 - i];
}

component packer = PackBytes(32, output_len, pack_size);
for (var i = 0; i < 32; i++) {
packer.in[i] <== sha_out[i].out;
}
nullifier <== packer.out;
}

// In circom, all output signals of the main component are public (and cannot be made private), the input signals of the main component are private if not stated otherwise using the keyword public as above. The rest of signals are all private and cannot be made public.
// This makes modulus_airbnb, modulus_coinbase, and address public, alongside the nullifier output. hash(signature) can optionally be made public, but is not recommended since it allows the mailserver to trace who the offender is.
// TODO: change public signals in smart contract to match new public signals
component main { public [ modulus_airbnb, modulus_coinbase, address ] } = KYCVerify(1024, 121, 17);
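
The nullifier leaves the circuit as `output_len = 5` signals because the 32 digest bytes are packed 7 at a time. A minimal sketch of that packing follows, assuming the first byte of each group is the least significant; the byte order used by `PackBytes` is not shown in this diff, and `pack_bytes` and `unpack_bytes` are hypothetical helper names.

```python
def pack_bytes(data: bytes, pack_size: int = 7) -> list[int]:
    """Pack bytes into integers of pack_size bytes each (little-endian, assumed)."""
    output_len = (len(data) - 1) // pack_size + 1  # ceil(len(data) / pack_size)
    return [
        int.from_bytes(data[i * pack_size:(i + 1) * pack_size], "little")
        for i in range(output_len)
    ]


def unpack_bytes(packed: list[int], total_len: int, pack_size: int = 7) -> bytes:
    """Invert pack_bytes, trimming the zero bytes that pad the final group."""
    out = b"".join(value.to_bytes(pack_size, "little") for value in packed)
    return out[:total_len]


digest = bytes(range(32))  # stand-in for the 32-byte SHA-256 nullifier digest
packed = pack_bytes(digest)
```

Each packed value is below 2^56, far under the field size, so a Solidity verifier can take the five values as public inputs and a contract can recombine them to check whether the nullifier has been used before.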