-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD-0095: Extendable Output (XOF) Hashing Support #95
base: main
Are you sure you want to change the base?
Conversation
@ankeleralph Thanks for submitting this SIMD. Somewhat related: The Firedancer team has also been working on a native new cryptographic set accumulator to replace the epoch account hash, but we haven't gotten to publishing a SIMD yet. This accumulator needs a 2048-bit hash function. Conveniently, accounts are already hashed with BLAKE3, which can be trivially expanded to an XOF while calculating the final block of the hash function. Solana already suffers from a proliferation of too many hash functions, and it costs a lot of time to write optimized implementations for validator clients. We should therefore avoid standardizing two different XOF hash functions. Without considering the cryptographic quality of SHAKE vs BLAKE3, I would prefer BLAKE3: It is already widely used and native to the Solana runtime. Exposing it to smart contracts is therefore also a much smaller implementation change than introducing a new hash function. Could you amend this proposal to evaluate BLAKE3 XOF as an alternative? I'm sure there are some arguments against BLAKE3 for your proposed use cases. |
Another alternative is to implement the BulletProof zero-knowledge library as a | ||
native program entirely, however, this would limit the use cases that can | ||
additionally be enabled by supporting the customable extendable output functin | ||
cSHAKE, and the merlin transcripts. Though, supporting a native zero-knowledge | ||
proof library would likely be more efficient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like introducing syscalls for cSHAKE would only serve to accelerate computation.
When these syscalls are introduced, usually before/after benchmarks are provided.
How much faster would your proposed syscalls be compared to an eBPF implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will re-double this message. Adding new syscalls can hurt overall performance for all smart contracts. It would be useful to see an SBPF implementation as that may be sufficient, especially if only one or a handful of apps need this right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are much more benefits that other developers could have from existing cSHAKE syscalls. A cSHAKE implementation would enable to build customisable output length hash functions, domain separation for different protocol components and potential higher security due to customisation features. Overall it would make it a more versatile and potentially robust choice for some cryptographic operations within the Solana VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ankeleralph I'm comparing against a pure eBPF implementation here, not to other syscalls. A pure eBPF implementation is even more versatile than the proposed syscall because you can optimize and extend the logic (e.g. use different parameters for the sponge function) without a hard fork.
define_syscall!(fn sol_cshake128(vals: *const u8, val_len: u64, func_name: | ||
*const u8, cust_string: *const u8, hash_result: *mut u8) -> u64); | ||
define_syscall!(fn sol_cshake256(vals: *const u8, val_len: u64, func_name: | ||
*const u8, cust_string: *const u8, hash_result: *mut u8) -> u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, cshake
is only a thin wrapper over SHAKE
and Keccak
. keccak256 is already exposed via a syscall. Why not separately expose a SHAKE syscall and implement the wrapper in eBPF? This would be more flexible and have small overhead in eBPF.
Although other syscalls don't do this yet, I'd strongly recommend a batch-style API that takes multiple inputs.
On modern x86, the fastest hashing technique for the SHA2/3 and BLAKE2/3 families of hash functions is typically a SIMD approach where 8 or 16 hash states are calculated at once. This is typically 2-4x faster over hashing one message at a time.
define_syscall!(fn sol_strobe128_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_key(...) -> u64); | ||
define_syscall!(fn sol_strobe128_prf(...) -> u64); | ||
|
||
define_syscall!(fn sol_strobe256_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_key(...) -> u64); | ||
define_syscall!(fn sol_strobe256_prf(...) -> u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that separate syscalls would provide a significant speedup over an eBPF implementation of strobe that uses a batch SHAKE or Keccak syscall as described here: https://github.com/solana-foundation/solana-improvement-documents/pull/95/files#r1438427328
AFAICT, the most expensive operation in the Strobe framework is Keccak/SHAKE hashing. The rest seems to be just byte array concatenation, which is decently fast in eBPF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We need benchmarks to see how long the keccak portion takes vs the other work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its obvious that most performance will be required by the underlying call to the sponge function (in this case cSHAKE or Keccak), as the metadata operations will not require a lot of overhead. Let me know if you want to see some more concrete numbers for benchmarks, happy to quickly do a few to support the adoption of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove these metadata operations then? My overall concern as a validator maintainer is the maintenance burden.
It would take me a week to build cross-client testing infrastructure for all of these syscalls. This would also include formal verification (https://saw.galois.com/) of the Rust (Labs/Agave) and C (Firedancer) implementations for equivalence. All this overhead would be unnecessary if implemented in eBPF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the proposal is a bit confusing. It would be useful to know who would actually be the concrete app(s)/user(s) of all of these new syscalls and hashing formulations. Its seems as though the STROBE bits can be implemented in the VM and do not require their own syscalls. Benchmarks will also be necessary. I would recommend also addressing @ripatel-fd's comments.
|
||
This proposal introduces three new concepts to the Solana runtime: | ||
|
||
- Support extendable Output Functions (XOF) hashing, based on cSHAKE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLAKE3 has XOF, could we not make the existing syscall for BLAKE3 just support XOF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding, cSHAKE vs BLAKE3, as @samkim-crypto already mentioned this would require developers to change the proof generation as well, which would create an additional barrier to move the proof verification on-chain.
Another alternative is to implement the BulletProof zero-knowledge library as a | ||
native program entirely, however, this would limit the use cases that can | ||
additionally be enabled by supporting the customable extendable output functin | ||
cSHAKE, and the merlin transcripts. Though, supporting a native zero-knowledge | ||
proof library would likely be more efficient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will re-double this message. Adding new syscalls can hurt overall performance for all smart contracts. It would be useful to see an SBPF implementation as that may be sufficient, especially if only one or a handful of apps need this right now.
|
||
#### Implementation Details | ||
|
||
The Strobe designers released an official implementation in C available at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unlikely the FD will use this implementation and will roll our own.
define_syscall!(fn sol_strobe128_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_key(...) -> u64); | ||
define_syscall!(fn sol_strobe128_prf(...) -> u64); | ||
|
||
define_syscall!(fn sol_strobe256_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_key(...) -> u64); | ||
define_syscall!(fn sol_strobe256_prf(...) -> u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We need benchmarks to see how long the keccak portion takes vs the other work.
define_syscall!(fn sol_strobe128_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe128_key(...) -> u64); | ||
define_syscall!(fn sol_strobe128_prf(...) -> u64); | ||
|
||
define_syscall!(fn sol_strobe256_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_meta_ad(...) -> u64); | ||
define_syscall!(fn sol_strobe256_key(...) -> u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every new syscall adds overhead and hurts performance for the VM vs an SBPF implementation for some specific app. I would recommend finding an approach similar to what @ripatel-fd has mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concur with @lheeger-jump. The technical reasons for this is the code footprint of a compiled interpreter. Right now, all the interpreter core and all its cheap syscalls probably fit in L2 cache. The more syscalls we add, the more instruction cache pressure is increased across every program execution.
On an FPGA implementation of the VM, this is even worse as you start running into hard physical constraints.
The main application for this would be the boomerang protocol for Brave (https://arxiv.org/pdf/2401.01353.pdf), which uses Bulletproofs for the zkp. The main components that currently make the bulletproof implementation difficult is the curve operations and the support for hash function needed for Fiat-Shamir. The curve operations can use the curve25519 syscalls, but we currently do not have support for XOF hash support to implement Fiat-Shamir. In theory, the Fiat-Shamir heuristic can be implemented using any cryptographic hash function including BLAKE3 XOF. Brave expects boomerang to have about 5 million users for the boomerang protocol (@ankeleralph ). Since token22 also uses bulletproofs, this proposal can also benefit future zkp additions to token22 as well. Given that SHAKE is used in many of the newer (non-bulletproof) zk systems for Fiat-Shamir as well, I would be in favor of this SIMD given that the above points are addressed. |
I think then instead, I would propose a replacement for ZkTokenProgram as these two would be redundant |
Yes, ZkTokenProgram uses bulletproofs plus additional smaller proofs that are outside the scope of bulletproofs, but if this proposal is implemented, then the instructions in ZkTokenProgram could just be replaced with the curve25519 and XOF hash support. |
Hi @lheeger-jump, @ripatel-fd, to clarify my intended usecase, we are building a protocol (more details in our research paper, or blog post) that uses Bulletproof zero-knowledge proofs, which we would like to verify within a Solana program. Therefore, we would need support for the merlin transcripts, that are based on STROBE, and STROBE is using cSHAKE. |
@ankeleralph Thank you for the context.
Solana provides facilities for sharing eBPF code across different programs either by statically linking it in via Cargo or by deploying a shared program invoked via cross-program-invocation. Would you be open to trying that first? |
Doing an implementation on chain as @ripatel-fd mentioned is a good way to verify the feasibility of the feature. |
This proposal introduces three new concepts to the Solana runtime:
version of SHAKE
Using the above new concepts would enable regular Solana programs to:
zero-knowledge proofs, which turns interactive proofs into non-interactive proofs