Add Keccak #65

Draft
wants to merge 13 commits into
base: main

mkannwischer
Collaborator

WIP adding Keccak via SLOTHY.

Right now this is a hybrid 4x Keccak (2 scalar, 2 Neon). I de-interleaved the previous manually interleaved code and optimized it via SLOTHY. There is still a lot of potential for refactoring.

In the current state (slothy-optimizer/pqax@c69030c), the results look as follows:

[0|5|25|50|75|95|100] = [(7670) | 7671 | 7671 |* 7672 *| 7675 | 7697 | (7709)] (100-th AVGs of keccak_f1600_x4_hybrid_slothy)
[0|5|25|50|75|95|100] = [(6623) | 6624 | 6624 |* 6624 *| 6628 | 6646 | (6672)] (100-th AVGs of keccak_f1600_x4_hybrid_slothy_opt_a55)

For reference:

The 6624 is already noticeably faster than the 7288 reported in https://kannwischer.eu/papers/2022_armv8keccak.pdf
This is still slower than the 1x scalar one in the same paper, which was 1418 (1418*4 = 5672).
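
To put the numbers above side by side, here is a small sketch that only redoes the arithmetic on the figures quoted in this comment. The variable names and the `percent_change` helper are made up for illustration; the medians are the starred values in the benchmark output.

```python
# Figures quoted above, in the units reported by the benchmark output.
x4_hybrid_slothy = 7672    # median of keccak_f1600_x4_hybrid_slothy
x4_hybrid_opt_a55 = 6624   # median of keccak_f1600_x4_hybrid_slothy_opt_a55
paper_reference = 7288     # figure reported in the linked paper
paper_scalar_x1 = 1418     # 1x scalar figure from the same paper

# Four sequential runs of the 1x scalar code, for comparison with the 4x hybrid.
scalar_x4_equiv = 4 * paper_scalar_x1


def percent_change(new, old):
    """Relative change from old to new, in percent (negative means faster)."""
    return 100.0 * (new - old) / old


vs_paper = percent_change(x4_hybrid_opt_a55, paper_reference)
vs_scalar = percent_change(x4_hybrid_opt_a55, scalar_x4_equiv)
vs_unoptimized = percent_change(x4_hybrid_opt_a55, x4_hybrid_slothy)

print(f"vs. paper reference: {vs_paper:+.1f}%")
print(f"vs. 4 * 1x scalar:   {vs_scalar:+.1f}%")
print(f"vs. unoptimized:     {vs_unoptimized:+.1f}%")
```

So the optimized variant is roughly 9% below the 7288 figure, but still roughly 17% above four back-to-back runs of the 1x scalar code.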

Related to slothy-optimizer/pqax#6

@hanno-becker
Collaborator

Rebase on top of #81

When `split_heuristic_preprocess_naive_interleaving` is enabled,
SLOTHY preprocesses the input by naively reordering instructions
according to their depths in the computational flow graph.

This commit introduces another naive interleaving strategy,
"alternate", which makes SLOTHY alternate evenly between
instructions tagged with `interleaving_class=0/1`. This is
useful when two sequential blocks of code are to be interleaved
as evenly as possible, which is common in scalar/Neon hybrids.
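
As a rough illustration of what "alternate evenly" could mean, here is a minimal standalone sketch. It is not SLOTHY's implementation: the function name `alternate_interleave`, the `(text, class)` input format, and the tie-breaking rule are all assumptions made for this example, and it ignores data dependencies, i.e. it assumes the two blocks are independent of each other.

```python
def alternate_interleave(instructions):
    """Interleave two tagged instruction streams as evenly as possible.

    `instructions` is a list of (text, interleaving_class) pairs with
    interleaving_class in {0, 1} (hypothetical input format). The relative
    order within each class is preserved.
    """
    queues = {0: [], 1: []}
    for text, cls in instructions:
        queues[cls].append(text)

    totals = {cls: len(queue) for cls, queue in queues.items()}
    taken = {0: 0, 1: 0}
    result = []

    while taken[0] < totals[0] or taken[1] < totals[1]:
        # Only classes that still have instructions left are candidates.
        candidates = [cls for cls in (0, 1) if taken[cls] < totals[cls]]
        # Pick the class that is least far along; ties go to class 0.
        cls = min(candidates, key=lambda c: (taken[c] / totals[c], c))
        result.append(queues[cls][taken[cls]])
        taken[cls] += 1

    return result


# Two sequential blocks: a scalar one (class 0) followed by a Neon one (class 1).
block = [
    ("eor x0, x1, x2", 0),
    ("eor x3, x4, x5", 0),
    ("eor v0.16b, v1.16b, v2.16b", 1),
    ("eor v3.16b, v4.16b, v5.16b", 1),
]

for line in alternate_interleave(block):
    print(line)
```

With equally sized blocks this yields a strict scalar/Neon alternation; with unequal sizes the shorter block is spread proportionally across the longer one instead of being exhausted early.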