Optimize xxh32 and xxh64 with ARM SVE instructions #737
Comments
High-level question: does the multi-buffer method generate the same final hash value as the regular one?
Only part of the code can be vectorized in multi-buffer mode, just like the block operation in XXH3. The remaining calculation is handled separately in each job. The final hash value must be the same as the regular one.
Has it already been validated? A high-level explanation would also be welcome.
Yes, xxhash_mb_rand_test.c compares the hash values with the results from regular xxhash.
I'll try to add more explanation while I reorganize the patches.
This issue is primarily focused on a target implementation of but a generic important claim made here is that If that's confirmed, this would be a fairly important property that could be applied beyond SVE, across multiple instruction sets.
With pull request #713, XXH3 is optimized with ARM SVE instructions. This works because XXH3 divides its input into blocks, which vector instructions can process in parallel.
For XXH32 and XXH64, data is consumed as a stream, so a single hash cannot be vectorized the same way. A different method (multi-buffer) can be used to take advantage of vector instructions instead. The implementation is in https://github.com/hzhuang1/isa-l_crypto/tree/debug_xxh32. Multi-buffer means multiple jobs: with several independent jobs running in parallel, each job occupies its own vector lane, and vector instructions accelerate all of them at once.
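To make the idea concrete, here is a minimal sketch of the multi-buffer scheme for XXH32 in plain Python. It is not the code from the linked branch; the function names (`xxh32_scalar`, `xxh32_multi_buffer`, `xxh32_finish`) are hypothetical, and the inner loop over lanes only stands in for what a single predicated SVE instruction would do. The 16-byte stripe rounds run in lockstep across jobs, while the tail and avalanche stay scalar per job.

```python
import struct

MASK = 0xFFFFFFFF
# XXH32 prime constants
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1


def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def xxh32_round(acc, word):
    acc = (acc + word * P2) & MASK
    return (rotl32(acc, 13) * P1) & MASK


def init_acc(seed):
    return [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
            seed & MASK, (seed - P1) & MASK]


def xxh32_finish(data, seed, acc):
    """Scalar per-job part: merge accumulators, consume the tail, avalanche."""
    n = len(data)
    if n >= 16:
        h = (rotl32(acc[0], 1) + rotl32(acc[1], 7)
             + rotl32(acc[2], 12) + rotl32(acc[3], 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    i = n - (n % 16)  # first byte not covered by a full 16-byte stripe
    while i + 4 <= n:
        (w,) = struct.unpack_from("<I", data, i)
        h = (rotl32((h + w * P3) & MASK, 17) * P4) & MASK
        i += 4
    while i < n:
        h = (rotl32((h + data[i] * P5) & MASK, 11) * P1) & MASK
        i += 1
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def xxh32_scalar(data, seed=0):
    """Regular one-buffer-at-a-time XXH32, used as the reference."""
    acc = init_acc(seed)
    for off in range(0, len(data) - 15, 16):
        words = struct.unpack_from("<4I", data, off)
        acc = [xxh32_round(a, w) for a, w in zip(acc, words)]
    return xxh32_finish(data, seed, acc)


def xxh32_multi_buffer(jobs, seed=0):
    """Hash several independent buffers, one job per (simulated) vector lane."""
    accs = [init_acc(seed) for _ in jobs]
    stripes = [len(d) // 16 for d in jobs]
    for s in range(max(stripes, default=0)):
        # One "vector" step: the same round applied to every active lane.
        for lane, data in enumerate(jobs):
            if s < stripes[lane]:  # plays the role of the SVE predicate mask
                words = struct.unpack_from("<4I", data, 16 * s)
                accs[lane] = [xxh32_round(a, w)
                              for a, w in zip(accs[lane], words)]
    # The remaining work is not vectorized and is done per job.
    return [xxh32_finish(d, seed, a) for d, a in zip(jobs, accs)]
```

Calling `xxh32_multi_buffer([buf_a, buf_b, buf_c])` should return the same values as calling `xxh32_scalar` on each buffer in turn, because each lane performs exactly the rounds the scalar loop would and no state is shared between jobs. A real implementation would also have to schedule jobs onto lanes and swap out finished ones, which this sketch leaves out.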
The performance data above was collected on two machines: one with SVE512 (Fujitsu) and one with SVE256 (AWS).