Rewrite crypto.{randomBytes,randomFill} implementation in Zig #14204

Open. wpaulino (Contributor) wants to merge 1 commit into base: main.

This moves most of the validation logic for both functions into native
code. Ultimately, we want the node crypto library to be a thin wrapper
over native implementations. The actual randomness generation for both
was already handled natively through `getRandomValues`. That is still
the case, but we now invoke the random generator underlying
`getRandomValues` directly from a native context.
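
To illustrate what argument validation looks like from the caller's side, here is a minimal sketch using the public `node:crypto` API. The `describeFailure` helper is hypothetical, and the sketch only checks that invalid arguments are rejected; it makes no claim about the exact error codes or messages, which the description does not list.

```typescript
import { randomBytes, randomFillSync } from "node:crypto";

// Hypothetical helper: runs `fn` and reports the name of any error it throws.
function describeFailure(fn: () => unknown): string {
  try {
    fn();
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.name : "unknown error";
  }
}

// Valid calls: a fresh 16-byte Buffer, and a 32-byte array filled in place.
const bytes = randomBytes(16);
const filled = randomFillSync(new Uint8Array(32));

// Invalid arguments should be rejected by validation.
const negativeSize = describeFailure(() => randomBytes(-1));
const badOffset = describeFailure(() => randomFillSync(new Uint8Array(8), 16));

console.log(bytes.length, filled.length, negativeSize, badOffset);
```

Running the same script before and after a change like this is one way to check that the observable validation behaviour stays the same.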

The new implementation of `randomFill` is more efficient. Previously, we
allocated a new random buffer and copied its bytes into the caller's
buffer. We now mutate the caller's buffer directly in native code.
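
The difference between the two strategies can be sketched at the JavaScript level. This is only an illustration: `fillByCopy` is a hypothetical stand-in for the old allocate-and-copy approach, and `fillInPlace` simply calls the public `randomFillSync`. Neither function is code from this PR.

```typescript
import { randomBytes, randomFillSync } from "node:crypto";

// Assumed shape of the old approach: allocate a temporary random buffer,
// then copy its bytes into the caller's buffer.
function fillByCopy(target: Uint8Array): Uint8Array {
  const tmp = randomBytes(target.length); // extra allocation
  target.set(tmp); // extra copy
  return target;
}

// In-place approach: the caller's buffer is written directly.
function fillInPlace(target: Uint8Array): Uint8Array {
  return randomFillSync(target);
}

const copied = fillByCopy(new Uint8Array(64));
const direct = new Uint8Array(64);
const sameObject = fillInPlace(direct) === direct;

// offset and size restrict the write to a sub-range of the buffer.
const partial = new Uint8Array(16);
randomFillSync(partial, 4, 8); // writes bytes 4..11 only
const outside = Array.from(partial.subarray(0, 4)).concat(
  Array.from(partial.subarray(12)),
);
const untouched = outside.every((byte) => byte === 0);

console.log(copied.length, sameObject, untouched);
```

Both functions leave the caller with a filled buffer; the in-place version just avoids the temporary allocation and the copy.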

These changes leave `randomBytes` essentially unchanged and make `randomFillSync` faster at every measured size. Results on this branch (on an Apple M1 Pro):

```
benchmark                time (avg)             (min … max)       p75       p99      p999
----------------------------------------------------------- -----------------------------
randomBytes 64B        34.6 ns/iter     (31.11 ns … 366 ns)  31.74 ns  59.12 ns    276 ns
randomBytes 256B      77.96 ns/iter     (70.64 ns … 374 ns)  74.38 ns    260 ns    343 ns
randomBytes 1K          372 ns/iter       (335 ns … 746 ns)    359 ns    700 ns    746 ns
randomBytes 4K          891 ns/iter     (801 ns … 1'174 ns)  1'005 ns  1'131 ns  1'174 ns
randomBytes 16K       2'660 ns/iter   (2'560 ns … 3'214 ns)  2'683 ns  2'994 ns  3'214 ns
randomBytes 64K      10'092 ns/iter     (8'625 ns … 678 µs)  9'000 ns 33'708 ns    378 µs
randomBytes 256K     39'682 ns/iter    (34'333 ns … 674 µs) 35'375 ns 71'083 ns    542 µs
randomBytes 1M          157 µs/iter     (137 µs … 1'345 µs)    142 µs    597 µs    824 µs
randomFillSync 64B    24.21 ns/iter    (22.99 ns … 73.4 ns)  23.17 ns  37.82 ns  44.41 ns
randomFillSync 256B   63.55 ns/iter   (61.12 ns … 91.04 ns)  62.38 ns  76.88 ns  81.85 ns
randomFillSync 1K       307 ns/iter       (272 ns … 364 ns)    313 ns    338 ns    364 ns
randomFillSync 4K       755 ns/iter       (720 ns … 796 ns)    768 ns    788 ns    796 ns
randomFillSync 16K    2'328 ns/iter   (2'219 ns … 2'422 ns)  2'367 ns  2'420 ns  2'422 ns
randomFillSync 64K    8'715 ns/iter   (8'435 ns … 8'849 ns)  8'793 ns  8'842 ns  8'849 ns
randomFillSync 256K  37'038 ns/iter (34'708 ns … 71'250 ns) 36'334 ns 64'500 ns 66'000 ns
randomFillSync 1M       144 µs/iter       (137 µs … 193 µs)    142 µs    171 µs    185 µs
```

For comparison, bun/main:

```
benchmark                time (avg)             (min … max)       p75       p99      p999
----------------------------------------------------------- -----------------------------
randomBytes 64B       33.65 ns/iter     (30.15 ns … 363 ns)  31.15 ns  58.65 ns    283 ns
randomBytes 256B      78.38 ns/iter     (70.66 ns … 393 ns)  75.99 ns    265 ns    355 ns
randomBytes 1K          388 ns/iter       (345 ns … 844 ns)    376 ns    708 ns    844 ns
randomBytes 4K          907 ns/iter     (807 ns … 1'357 ns)    988 ns  1'229 ns  1'357 ns
randomBytes 16K       2'686 ns/iter   (2'564 ns … 3'172 ns)  2'714 ns  2'979 ns  3'172 ns
randomBytes 64K      10'118 ns/iter     (8'625 ns … 681 µs)  9'042 ns 33'125 ns    389 µs
randomBytes 256K     39'792 ns/iter    (34'416 ns … 643 µs) 35'875 ns 73'875 ns    508 µs
randomBytes 1M          159 µs/iter     (137 µs … 1'313 µs)    143 µs    621 µs    862 µs
randomFillSync 64B       59 ns/iter     (54.14 ns … 375 ns)  55.79 ns  99.47 ns    331 ns
randomFillSync 256B     107 ns/iter     (98.82 ns … 414 ns)    105 ns    286 ns    395 ns
randomFillSync 1K       416 ns/iter       (378 ns … 766 ns)    403 ns    730 ns    766 ns
randomFillSync 4K       992 ns/iter     (900 ns … 1'294 ns)  1'101 ns  1'253 ns  1'294 ns
randomFillSync 16K    2'944 ns/iter   (2'851 ns … 3'393 ns)  2'970 ns  3'054 ns  3'393 ns
randomFillSync 64K   11'014 ns/iter     (9'666 ns … 648 µs)  9'917 ns 34'542 ns    380 µs
randomFillSync 256K  50'979 ns/iter  (39'416 ns … 1'276 µs) 39'958 ns    405 µs  1'162 µs
randomFillSync 1M       198 µs/iter     (156 µs … 1'538 µs)    164 µs  1'146 µs  1'280 µs
```
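
To make the `randomFillSync` comparison easier to read, the speedup per size can be computed from the average times in the two tables. The sketch below copies those averages (converted to nanoseconds) and divides the bun/main time by the time on this branch. The variable names are illustrative only.

```typescript
// Average time per iteration in nanoseconds, copied from the two tables.
const results = [
  { size: "64B", branchNs: 24.21, mainNs: 59 },
  { size: "256B", branchNs: 63.55, mainNs: 107 },
  { size: "1K", branchNs: 307, mainNs: 416 },
  { size: "4K", branchNs: 755, mainNs: 992 },
  { size: "16K", branchNs: 2328, mainNs: 2944 },
  { size: "64K", branchNs: 8715, mainNs: 11014 },
  { size: "256K", branchNs: 37038, mainNs: 50979 },
  { size: "1M", branchNs: 144000, mainNs: 198000 },
];

// Speedup > 1 means this branch is faster than bun/main.
const speedups = results.map(({ size, branchNs, mainNs }) => ({
  size,
  speedup: mainNs / branchNs,
}));

for (const { size, speedup } of speedups) {
  console.log(`randomFillSync ${size}: ${speedup.toFixed(2)}x`);
}
```

These ratios use only the averages; the min, max, and percentile columns also differ between the two runs and are not captured here.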

switch (slice.len) {
0 => {},
// 512 bytes or less we reuse from the same cache as UUID generation.

This comment was wrong: based on the current values, the threshold is actually 256 bytes. I'm not sure whether we intended the comment or the code to be the source of truth.
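
One way to keep a comment and a threshold from drifting apart is to derive both from a single named constant. The sketch below is a TypeScript illustration of that idea, not the Zig code under review: `SMALL_REQUEST_MAX`, `cache`, `cacheOffset`, and `fillRandom` are hypothetical names, the 256-byte value is the one reported in the comment above, and the cache size and refill strategy are assumptions.

```typescript
import { randomFillSync } from "node:crypto";

// Hypothetical threshold; 256 is the value the review comment reports.
const SMALL_REQUEST_MAX = 256;

// Hypothetical pool of random bytes shared by all small requests.
const cache = new Uint8Array(SMALL_REQUEST_MAX * 16);
let cacheOffset = cache.length; // starts exhausted, so the first use refills

function fillRandom(out: Uint8Array): void {
  if (out.length === 0) return;

  if (out.length <= SMALL_REQUEST_MAX) {
    // Requests of SMALL_REQUEST_MAX bytes or less are served from the cache.
    if (cacheOffset + out.length > cache.length) {
      randomFillSync(cache);
      cacheOffset = 0;
    }
    const end = cacheOffset + out.length;
    out.set(cache.subarray(cacheOffset, end));
    cache.fill(0, cacheOffset, end); // consumed bytes are never handed out twice
    cacheOffset = end;
    return;
  }

  // Larger requests bypass the cache and fill the output directly.
  randomFillSync(out);
}

const small = new Uint8Array(32);
fillRandom(small);
const large = new Uint8Array(1024);
fillRandom(large);
console.log(small.length, large.length, cacheOffset);
```

Because the comment refers to the constant by name, changing the threshold in one place keeps the documentation and the behaviour in agreement.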
