Support base64 encoding #45
Comments
As I've mentioned in #27 (comment) the encoding is out of scope for this library, and if you want to have an optimised WebAssembly encoding, it would be much more appropriate to have a specialised library for this. It seems that you are suggesting that the output of the hash function (= digest) should be If I understood you correctly, you'd want a WebAssembly implementation of |
So I created this request for both performance and API conformance... For performance, I was misreading the Performance section of the README. It sounded on initial read that you were able to encode the hashes 20% faster compared to outside the WASM. My bad. For conformance, since you are mimicking node's |
Oh that's right, the Node Full compatibility with the Node Base64For This would again be purely for convenience and there is no performance benefit over doing it manually afterwards, as it is essentially:

    const digest = create32()
      .update("some data")
      .digest();
    const digestBase64 = btoa(digest);
    // If the digest is a standalone call (after all updates on `hash`)
    // it becomes:
    const digestBase64 = btoa(hash.digest());

If we really wanted to go into micro-optimisations, this would be faster than any implementation that works with All in all, if that functionality should be added, it needs to be considered that the convenience should not come at the expense of the default path. In that case, different methods, such as |
This would not produce the desired result as

    // Utility methods to output base64 from xxhash-wasm
    // Note that the uint hash must be written in big endian for output to be
    // platform independent and conform to canonical spec for xxhash
    import { Buffer } from "node:buffer";
    import xxhash from "xxhash-wasm";
    const hasher = await xxhash();
    // Node only version using Buffer class' toString()
    export const xx64ToString = (data, encoding) => {
      const hashBuffer = Buffer.allocUnsafe(8);
      hashBuffer.writeBigUInt64BE(hasher.h64(data));
      return hashBuffer.toString(encoding);
    };
    // Native version for browser using DataView and btoa()
    export const xx64ToBase64 = (data) => {
      const hashBuffer = new ArrayBuffer(8);
      const view = new DataView(hashBuffer);
      view.setBigUint64(0, hasher.h64(data));
      const hashBytes = new Uint8Array(hashBuffer);
      return btoa(String.fromCodePoint(...hashBytes));
    };

The node version is more versatile since it can output any encoding supported by node. For example, outputting base64url in the browser would require an extra step to replace any

I am sensitive to arguments for keeping packages scoped properly, but in this case I'd argue that at least handing the user a proper big endian buffer representation of the hash is in scope. For example, it would be just as easy above to create a typed array and pull the buffer from that:

    const hashBuffer = (new BigUint64Array([hasher.h64(data)])).buffer;

However, encoding that would produce an incorrect result on little endian platforms unless the bytes were reversed first. |
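For the browser-side base64url case mentioned above, a minimal sketch might look like the following. It assumes the hash is already available as an unsigned 64-bit BigInt (for example from `hasher.h64(data)`); the helper name `h64ToBase64Url` is hypothetical and not part of xxhash-wasm. It writes the value big endian, encodes with `btoa()`, then swaps the two URL-unsafe characters and drops the padding.

```javascript
// Hypothetical helper, not part of xxhash-wasm.
// Assumes `hash` is an unsigned 64-bit BigInt.
const h64ToBase64Url = (hash) => {
  const view = new DataView(new ArrayBuffer(8));
  // DataView writes big endian unless told otherwise, so the byte
  // order does not depend on the platform.
  view.setBigUint64(0, hash);
  const bytes = new Uint8Array(view.buffer);
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-") // base64url uses "-" instead of "+"
    .replace(/\//g, "_") // and "_" instead of "/"
    .replace(/=+$/, ""); // padding is usually omitted in base64url
};
```

Because 8 bytes always encode to 11 characters plus one `=`, the result is always 11 characters long.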
xxhash-wasm never has a buffer of the hash, however, so it's no better at producing a buffer of this data than a user-space function, and as you've demonstrated with your samples, there's not a good portable way of doing so. Fwiw, practically speaking, a lookup table is going to be the fastest portable solution here, given that we have a known number of bits to work with (meaning padding is easily handled), so something like
|
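As an illustration of the lookup-table idea only (this is not the commenter's snippet), a minimal sketch could look like the following. It assumes the hash is an unsigned 64-bit BigInt; the names `ALPHABET` and `h64ToBase64` are hypothetical. Since the input is always 64 bits, the output is always ten full 6-bit groups, one group holding the last 4 bits, and a single `=`.

```javascript
// Hypothetical helper, not part of xxhash-wasm.
// Standard base64 alphabet used as the lookup table.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Assumes `hash` is an unsigned 64-bit BigInt.
const h64ToBase64 = (hash) => {
  let out = "";
  // Bits 63..4: ten 6-bit groups, most significant first, which
  // matches the big endian byte order of the canonical form.
  for (let shift = 58n; shift >= 4n; shift -= 6n) {
    out += ALPHABET[Number((hash >> shift) & 63n)];
  }
  // Bits 3..0: the last 4 bits, padded with two zero bits.
  out += ALPHABET[Number((hash & 15n) << 2n)];
  // 64 bits always leave exactly one padding character.
  return out + "=";
};
```

No intermediate buffer is needed here, so byte order never depends on the platform.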
The latter half of that confuses me. Those methods are portable across architectures, and the 2nd is portable across JS engines as well. My point was to demonstrate (a) how to do it right, and (b) that if it is left entirely to the user, it's easy to do it wrong (e.g. by making architecture assumptions or using a little endian form of the integer). Including something in the library to return a buffer or alternate encoding would be purely for user convenience, to save the time of having to look up the canonical format for xxhash and understand byte order in general. Alternatively, you could just demonstrate the right way in the docs and I'd be happy to provide a gist for linking. |
Just a feature request to support encodings other than hex natively in the web assembly, particularly base64 and base64url for shorter digests.

Thanks for the package 👍🏻