Contributor

@NotsoanoNimus NotsoanoNimus commented Jul 11, 2025

Updating the C3 hash kit with some more modern hashes that are performant, produce well-randomized output, and match their known test vectors.

For this work, I'd like to give a special thanks to Dr. Timofey Prodanov for this article and the related benchmarks spreadsheet. Both of these items are important, interesting, and relevant for this change.

Typical benchmark results:

---------------------- BENCHMARKS -----------------------
Benchmarking non_crypto_benchmarks::fnv64a_1 ....... [COMPLETE] 12.88 ns, 25.75 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_1 ....... [COMPLETE] 10.29 ns, 20.57 CPU's clocks
Benchmarking non_crypto_benchmarks::wyhash2_1 ...... [COMPLETE] 12.70 ns, 25.40 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_1 ...... [COMPLETE] 86.14 ns, 172.27 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_1 ..... [COMPLETE] 100.94 ns, 201.86 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_1 ....... [COMPLETE] 13.87 ns, 27.74 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_1 ......... [COMPLETE] 22.43 ns, 44.84 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_4 ....... [COMPLETE] 29.84 ns, 59.66 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_4 ....... [COMPLETE] 26.93 ns, 53.84 CPU's clocks
Benchmarking non_crypto_benchmarks::wyhash2_4 ...... [COMPLETE] 13.33 ns, 26.64 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_4 ...... [COMPLETE] 55.46 ns, 110.90 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_4 ..... [COMPLETE] 65.28 ns, 130.54 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_4 ....... [COMPLETE] 15.75 ns, 31.50 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_4 ......... [COMPLETE] 26.86 ns, 53.70 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_8 ....... [COMPLETE] 54.15 ns, 108.29 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_8 ....... [COMPLETE] 47.62 ns, 95.22 CPU's clocks
Benchmarking non_crypto_benchmarks::wyhash2_8 ...... [COMPLETE] 11.48 ns, 22.97 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_8 ...... [COMPLETE] 55.05 ns, 110.08 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_8 ..... [COMPLETE] 65.42 ns, 130.82 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_8 ....... [COMPLETE] 15.82 ns, 31.63 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_8 ......... [COMPLETE] 31.26 ns, 62.50 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_16 ...... [COMPLETE] 102.64 ns, 205.26 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_16 ...... [COMPLETE] 89.49 ns, 178.97 CPU's clocks
Benchmarking non_crypto_benchmarks::wyhash2_16 ..... [COMPLETE] 11.16 ns, 22.31 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_16 ..... [COMPLETE] 59.85 ns, 119.67 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_16 .... [COMPLETE] 96.86 ns, 193.69 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_16 ...... [COMPLETE] 35.02 ns, 70.02 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_16 ........ [COMPLETE] 46.51 ns, 91.35 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_32 ...... [COMPLETE] 216.51 ns, 433.00 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_32 ...... [COMPLETE] 175.75 ns, 351.48 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_32 ..... [COMPLETE] 69.13 ns, 138.23 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_32 .... [COMPLETE] 75.88 ns, 151.74 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_32 ...... [COMPLETE] 19.27 ns, 38.54 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_32 ........ [COMPLETE] 34.71 ns, 69.39 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_64 ...... [COMPLETE] 395.49 ns, 790.95 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_64 ...... [COMPLETE] 341.06 ns, 682.08 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_64 ..... [COMPLETE] 83.42 ns, 166.83 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_64 .... [COMPLETE] 92.86 ns, 185.70 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_64 ...... [COMPLETE] 30.62 ns, 61.24 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_64 ........ [COMPLETE] 49.37 ns, 98.73 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_128 ..... [COMPLETE] 754.96 ns, 1509.88 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_128 ..... [COMPLETE] 714.43 ns, 1428.83 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_128 .... [COMPLETE] 110.52 ns, 221.02 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_128 ... [COMPLETE] 146.85 ns, 293.68 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_128 ..... [COMPLETE] 54.44 ns, 108.86 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_128 ....... [COMPLETE] 88.81 ns, 177.59 CPU's clocks

Benchmarking non_crypto_benchmarks::fnv64a_1024 .... [COMPLETE] 6638.73 ns, 13277.26 CPU's clocks
Benchmarking non_crypto_benchmarks::fnv32a_1024 .... [COMPLETE] 5644.38 ns, 11288.57 CPU's clocks
Benchmarking non_crypto_benchmarks::metro64_1024 ... [COMPLETE] 446.42 ns, 892.78 CPU's clocks
Benchmarking non_crypto_benchmarks::metro128_1024 .. [COMPLETE] 553.51 ns, 1106.99 CPU's clocks
Benchmarking non_crypto_benchmarks::a5hash_1024 .... [COMPLETE] 390.97 ns, 781.91 CPU's clocks
Benchmarking non_crypto_benchmarks::komi_1024 ...... [COMPLETE] 442.33 ns, 884.62 CPU's clocks

52 benchmarks run.
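
For context on why the FNV rows grow almost linearly (assuming the numeric suffix in each benchmark name is the input size in bytes): FNV-1a consumes one byte per iteration, and each multiply depends on the previous one. Below is a minimal C sketch of the 64-bit variant using the standard published constants. It is illustrative only and is not the C3 stdlib implementation; the function name `fnv1a_64` is chosen here for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: for every input byte, XOR it in, then multiply by the
 * FNV prime. One byte per step is why cost tracks input length closely. */
uint64_t fnv1a_64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV 64-bit prime */
    }
    return h;
}
```

Hashes that read 8 or more bytes per step avoid this per-byte dependency chain, which would be consistent with the gap seen at larger sizes above.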

@lerno Given the benchmarking results when run and Dr. Prodanov's tests, as well as the poor randomness and speed of FNV hashing, I would like to propose a change, in a different pull request, to how types are hashed by default in stdlib. If we use wyhash2 for types up to and including 16 bytes in size and metro64 for vectors, arrays, etc. beyond that (or, more generally, reevaluate which hashes in this PR are best at particular input sizes, based on speed and randomness), C3 can have much faster and more random hashing (which would hopefully speed a lot of things up)!
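
To make the size-based idea concrete, here is a rough C sketch of the dispatch. Everything in it is hypothetical: the names `pick_hash` and `hash_by_size`, the `SMALL_INPUT_MAX` cutoff (taken from the 16-byte figure above), and the two stub functions, which merely stand in for wyhash2 and metro64 and do not implement them. In C3 the choice would presumably be made at compile time from the type's size rather than at run time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*hash_fn)(const void *data, size_t len);

/* Hypothetical cutoff from the proposal: "up to and including 16 bytes". */
#define SMALL_INPUT_MAX 16

/* Placeholder stand-ins, NOT real hashes: they only mark which path ran. */
uint64_t stub_small(const void *data, size_t len) { (void)data; (void)len; return 1; }
uint64_t stub_large(const void *data, size_t len) { (void)data; (void)len; return 2; }

/* Choose the small-input hash up to the cutoff, the large-input hash beyond it. */
hash_fn pick_hash(size_t len, hash_fn small_hash, hash_fn large_hash)
{
    return len <= SMALL_INPUT_MAX ? small_hash : large_hash;
}

/* Hash `len` bytes with whichever function the size selects. */
uint64_t hash_by_size(const void *data, size_t len,
                      hash_fn small_hash, hash_fn large_hash)
{
    return pick_hash(len, small_hash, large_hash)(data, len);
}
```

The cutoff is the part worth tuning against the benchmark numbers; the sketch only fixes the shape of the decision.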

The wyhash2 speed changes are DISGUSTING - so are Metro's at a higher data size!

Anyway, just a thought. Let me know what you think.

@data-man
Contributor

modern hashes

Just for info:
https://github.com/avaneev/a5hash
https://github.com/avaneev/komihash
https://github.com/Nicoshev/rapidhash

@NotsoanoNimus
Contributor Author

modern hashes

Just for info: https://github.com/avaneev/a5hash https://github.com/avaneev/komihash https://github.com/Nicoshev/rapidhash

Now that's what I call modern!

Spent some time today and yesterday implementing a5hash and komihash, so there are even more options to choose from (sorry to the reviewers 😄).

@NotsoanoNimus
Contributor Author

That about wraps up the initial code honing and a few passes.

I won't add anything further to this PR now, provided no more CI issues or review feedback pop up. Looks ready to go!


for (; data.len >= 32; data = data[32:^32])
{
self.state[0] += ((ulong*)data.ptr)[0] * K[0]; self.state[0] = self.state[0].rotr(29) + self.state[2];
Collaborator

This is unaligned, so you need @unaligned_load
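
For readers more familiar with C: the concern is that `data.ptr` is a byte pointer, so casting it to `ulong*` and dereferencing reads 8 bytes from an address that may not be 8-byte aligned. The usual C analog of an unaligned load is a `memcpy` into a local, sketched below. This is an analogy only, not the C3 fix itself, and the helper name `load_u64_unaligned` is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from any address, aligned or not. Compilers typically
 * lower this memcpy to a single load on targets that permit it. */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```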


if (final_data.len >= 16)
{
self.state[0] += ((ulong*)final_data.ptr)[0] * K[2]; self.state[0] = self.state[0].rotr(33) * K[3];
Collaborator

Not sure, but maybe this needs unaligned_load


if (final_data.len >= 8)
{
self.state[0] += ((ulong*)final_data.ptr)[0] * K[2]; self.state[0] = self.state[0].rotr(33) * K[3];
Collaborator

unaligned_load again..

@lerno
Collaborator

lerno commented Jul 18, 2025

I found even more examples of unaligned load, so try to look through and fix those.

@lerno
Collaborator

lerno commented Jul 18, 2025

A good way to test this is to pass in data that is unaligned, for example:

char[] data_aligned = "XThe data";
char[] data = data_aligned[1..]; // This is very likely to be unaligned.
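
The same check can be sketched in C as a small harness: hash identical bytes once from an aligned buffer and once from a buffer shifted by one byte, then require the results to match. All names here are hypothetical, and `toy_hash` is a throwaway word-at-a-time hash that exists only to give the harness something to call. It is not any of the algorithms in this PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*hash_fn)(const void *data, size_t len);

/* Toy hash for demonstration only: mixes 8-byte words, then tail bytes. */
uint64_t toy_hash(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0x9e3779b97f4a7c15ULL;
    for (; len >= 8; p += 8, len -= 8) {
        uint64_t w;
        memcpy(&w, p, sizeof w);          /* safe at any alignment */
        h = (h ^ w) * 0x100000001b3ULL;
    }
    for (; len > 0; p++, len--)
        h = (h ^ *p) * 0x100000001b3ULL;
    return h;
}

/* Hash the same bytes at an 8-byte-aligned address and at that alignment
 * plus one. This sketch handles at most 256 bytes and returns false beyond. */
bool same_hash_when_misaligned(hash_fn hash, const void *bytes, size_t len)
{
    _Alignas(8) unsigned char aligned[256];
    _Alignas(8) unsigned char shifted[257];
    if (len > sizeof aligned) return false;
    memcpy(aligned, bytes, len);
    memcpy(shifted + 1, bytes, len);      /* shifted + 1 is misaligned */
    return hash(aligned, len) == hash(shifted + 1, len);
}
```

On hardware that tolerates misaligned reads, a hash with the bug may still return matching values, so in C it helps to also build with an alignment sanitizer (for example `-fsanitize=alignment` in GCC or Clang).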

@lerno
Collaborator

lerno commented Jul 19, 2025

Please check the benchmarks now that things actually use unaligned access properly! I think I fixed all the bugs, so it's time to merge this!

@lerno lerno merged commit ed92476 into c3lang:master Jul 19, 2025
41 checks passed