aes-gcm: Enable AVX-512 implementation. #2444

briansmith · 2025-03-05T15:57:02Z

No description provided.

briansmith · 2025-03-05T15:58:50Z

src/cpu/intel.rs

+            // Intel: "15.3 DETECTION OF 512-BIT INSTRUCTION GROUPS OF THE INTEL
+            // AVX-512 FAMILY".
+            // `OPENSSL_cpuid_setup` clears these bits when XCR0[7:5] isn't 0b111.
+            // doesn't AVX-512  state.


If PR #2439 is merged before this one, this will need to be updated.
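For readers unfamiliar with the XCR0 check mentioned in the diff comment, here is a minimal sketch of the OS-support test it refers to. The function and constant names are hypothetical and are not taken from ring or BoringSSL. The sketch assumes the caller has already read XCR0 (for example with the XGETBV instruction) and only shows the bit test: AVX-512 state is usable when XCR0[7:5] is 0b111, in addition to the SSE and AVX state bits.

```rust
/// XCR0 bits 1 and 2: SSE (XMM) state and AVX (upper YMM) state.
const XCR0_SSE_AVX: u64 = 0b11 << 1;
/// XCR0 bits 5, 6 and 7: opmask, ZMM_Hi256 and Hi16_ZMM state.
const XCR0_AVX512: u64 = 0b111 << 5;

/// Hypothetical helper (not ring's API): returns true when the OS has enabled
/// all of the register state that 512-bit AVX-512 code needs.
/// `xcr0` is assumed to be the value read from XCR0 by the caller.
pub fn os_enables_avx512_state(xcr0: u64) -> bool {
    let required = XCR0_SSE_AVX | XCR0_AVX512;
    (xcr0 & required) == required
}
```

If this returns false, the AVX-512 CPUID feature bits should be treated as clear, which matches the behavior the comment attributes to `OPENSSL_cpuid_setup`.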

codecov · 2025-03-05T15:59:11Z

Codecov Report

❌ Patch coverage is 30.50847% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.14%. Comparing base (f4836b0) to head (eac2860).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/aead/aes_gcm/vaesclmulavx512.rs	0.00%	36 Missing ⚠️
src/cpu/x86_64.rs	54.76%	12 Missing and 7 partials ⚠️
src/aead/gcm/vclmulavx512.rs	0.00%	16 Missing ⚠️
src/aead/aes_gcm.rs	56.25%	6 Missing and 1 partial ⚠️
src/aead/gcm.rs	0.00%	3 Missing ⚠️
src/cpu.rs	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2444      +/-   ##
==========================================
- Coverage   96.52%   96.14%   -0.38%     
==========================================
  Files         190      192       +2     
  Lines       20119    20217      +98     
  Branches      513      523      +10     
==========================================
+ Hits        19419    19437      +18     
- Misses        560      633      +73     
- Partials      140      147       +7


briansmith · 2025-03-10T17:09:16Z

This is blocked on code coverage testing.

See also these pending changes upstream:
https://boringssl-review.googlesource.com/c/boringssl/+/77168
https://boringssl-review.googlesource.com/c/boringssl/+/77167

See also issue #2469 regarding developing a workaround for this code to work in ancient binutils.

briansmith · 2025-03-10T17:13:38Z

The code coverage aspect of this comprises two parts:

Ensure that the aes-gcm-avx2 implementation is tested in the coverage job regardless of whether the GitHub Actions runner would normally use the avx512 version.
Ensure that the aes-gcm-avx512 implementation is tested in the coverage job regardless of whether the GitHub Actions runner would normally use the avx512 version.

I think GitHub is experimenting with some AVX-512-enabled actions runners so the tests might be flaky in the interim without explicitly using QEMU to target specific CPUs that would choose each implementation.

In PR #2464 I am experimenting with QEMU 9.2.2, which adds newer CPUs than are available in QEMU 8.2.2 used in GitHub Actions Ubuntu 24.04 runners.
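One way to picture why each implementation needs its own target CPU is to write the dispatch as a pure function of the detected features. The following is only a sketch under assumptions: `Features`, `AesGcmImpl`, and `select_impl` are hypothetical names, and the selection conditions are simplified, so this is not ring's actual dispatch logic.

```rust
/// Hypothetical summary of detected CPU features (not ring's real type).
#[derive(Clone, Copy, Debug, Default)]
pub struct Features {
    pub avx2: bool,
    pub vaes: bool,
    pub vpclmulqdq: bool,
    /// Assumed to mean: AVX-512 CPUID bits set and OS state enabled.
    pub avx512: bool,
}

/// Hypothetical set of AES-GCM code paths.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AesGcmImpl {
    VaesClmulAvx512,
    VaesClmulAvx2,
    Fallback,
}

/// Picks the widest implementation the features allow.
/// A CPU that qualifies for AVX-512 never reaches the AVX2 arm, which is
/// why a coverage job needs at least two distinct CPU models.
pub fn select_impl(f: Features) -> AesGcmImpl {
    let wide_crypto = f.vaes && f.vpclmulqdq;
    if wide_crypto && f.avx512 {
        AesGcmImpl::VaesClmulAvx512
    } else if wide_crypto && f.avx2 {
        AesGcmImpl::VaesClmulAvx2
    } else {
        AesGcmImpl::Fallback
    }
}
```

Under this model, a runner with AVX-512 exercises only the first arm, so the AVX2 arm stays uncovered unless a job pins a CPU model without AVX-512.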

ebiggers · 2025-03-12T02:09:08Z

crypto/fipsmodule/aes/asm/aes-gcm-avx10-x86_64.pl

-
    # GHASH the remaining data 16 bytes at a time, using xmm registers only.
 .Laad_blockbyblock$local_label_suffix:
    test            $AADLEN, $AADLEN


It looks like you're updating this function to support only len=16 again? It might be a good idea to remove this check for len==0, and update the comment "|len| must be a multiple of 16" which this change makes outdated.

But, please note that if someone actually passes in a large amount of AAD (which can happen if someone uses the AES-GCM API to compute GMAC for an authentication-only use case), breaking it into 16-byte chunks is very bad for performance.

> But, please note that if someone actually passes in a large amount of AAD (which can happen if someone uses the AES-GCM API to compute GMAC for an authentication-only use case), breaking it into 16-byte chunks is very bad for performance.

Yes, I'm aware, but I don't know of any use cases for that at all that would be relevant to ring users. I only know that Google does it for some unknown reason.

> It looks like you're updating this function to support only len=16 again? It might be a good idea to remove this check for len==0, and update the comment "|len| must be a multiple of 16" which this change makes outdated.

Thanks. I will make those changes and also rebase this on top of the BoringSSL changes from upstream.

I didn't expect any users of large amounts of AAD either, but it turns out that with enough users there will be someone doing something unusual :(

If you're only doing 16 bytes at a time anyway, did you also consider just using gcm_gmult_vpclmulqdq_avx10()? If you XOR the 16 bytes of data into the GHASH accumulator ("Xi") and call gcm_gmult_vpclmulqdq_avx10(), that is equivalent to gcm_ghash_vpclmulqdq_avx10_512() with len=16.
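The equivalence described here can be checked with a slow, portable model of GHASH. This is a sketch, not the assembly under review: `gf128_mul`, `gmult`, and `ghash` are illustrative stand-ins for the `gcm_gmult_*` and `gcm_ghash_*` routines, using the bit-by-bit multiplication from NIST SP 800-38D with blocks loaded as big-endian `u128` values. It is not constant-time and is for illustration only.

```rust
/// Reduction constant R = 11100001 || 0^120 from NIST SP 800-38D.
const R: u128 = 0xE1u128 << 120;

/// Bit-by-bit multiplication in GF(2^128) using GCM's bit ordering
/// (the most significant bit of the u128 is the lowest-degree coefficient).
pub fn gf128_mul(x: u128, y: u128) -> u128 {
    let mut z = 0u128;
    let mut v = y;
    for i in 0..128 {
        if (x >> (127 - i)) & 1 == 1 {
            z ^= v;
        }
        let carry = v & 1;
        v >>= 1;
        if carry == 1 {
            v ^= R;
        }
    }
    z
}

/// Model of a "gmult" routine: Xi = Xi * H.
pub fn gmult(xi: &mut u128, h: u128) {
    *xi = gf128_mul(*xi, h);
}

/// Model of a "ghash" routine: for each 16-byte block, Xi = (Xi ^ block) * H.
/// `data.len()` must be a multiple of 16.
pub fn ghash(xi: &mut u128, h: u128, data: &[u8]) {
    assert!(data.len() % 16 == 0);
    for chunk in data.chunks_exact(16) {
        let mut block = [0u8; 16];
        block.copy_from_slice(chunk);
        *xi = gf128_mul(*xi ^ u128::from_be_bytes(block), h);
    }
}
```

With a single 16-byte block, `ghash` performs exactly one XOR followed by one multiplication, which is the same as XORing the block into Xi and then calling `gmult`.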

> If you're only doing 16 bytes at a time anyway, did you also consider just using gcm_gmult_vpclmulqdq_avx10()? If you XOR the 16 bytes of data into the GHASH accumulator ("Xi") and call gcm_gmult_vpclmulqdq_avx10(), that is equivalent to gcm_ghash_vpclmulqdq_avx10_512() with len=16.

That is how we were doing things before with pre-VAES implementations, but we've been switching over to the (tweaked) ghash implementations because it's less fighting the rustc optimizer on the Rust side.

We had trouble, for example, getting rustc to always use SSE XOR instead of byte-by-byte XOR in some cases, though that might be resolved now. Also, we had trouble getting rustc to assume that the partial/single-block case is more likely than the multi-block case. In later rustc versions it will matter less once we can use likely/unlikely.
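To make the two optimizer issues concrete, here is a rough sketch of the kind of Rust-side code involved. It is not ring's code and all names are hypothetical. It XORs a block as one `u128` rather than byte by byte, and marks the multi-block path `#[cold]` as a stable stand-in for likely/unlikely hints. The field multiplication that would follow each XOR is deliberately omitted, and whether the compiler emits a single vector XOR still depends on the rustc version and target.

```rust
/// XOR a 16-byte block into the accumulator as one 128-bit operation,
/// giving the optimizer a single wide XOR to lower instead of 16 byte XORs.
pub fn xor_block(acc: &mut [u8; 16], block: &[u8; 16]) {
    let x = u128::from_ne_bytes(*acc) ^ u128::from_ne_bytes(*block);
    *acc = x.to_ne_bytes();
}

/// Hypothetical entry point: the single-block case is the expected one.
pub fn absorb(acc: &mut [u8; 16], data: &[u8]) {
    if data.len() == 16 {
        let mut block = [0u8; 16];
        block.copy_from_slice(data);
        xor_block(acc, &block);
    } else {
        absorb_many(acc, data);
    }
}

/// Marked cold so the compiler treats the multi-block branch as unlikely.
/// Any trailing partial block is ignored in this sketch.
#[cold]
#[inline(never)]
fn absorb_many(acc: &mut [u8; 16], data: &[u8]) {
    for chunk in data.chunks_exact(16) {
        let mut block = [0u8; 16];
        block.copy_from_slice(chunk);
        xor_block(acc, &block);
    }
}
```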

Regardless, in PR #2478 I tweaked the AVX2 version of this function to be based on the gmult implementation instead of the ghash implementation. (See also PR #2477, which attempts the same tweaks still based on the ghash implementation).

> I didn't expect any users of large amounts of AAD either, but it turns out that with enough users there will be someone doing something unusual :(

I don't think ring has any unusual users. We rely on people telling us what they need and we try to optimize for what people are actually using.

Scottmitch · 2025-10-21T22:55:41Z

Hello and thanks so much for supporting Ring. I’ve run benchmarks with this PR and the AVX-512 instructions provide a meaningful performance boost. IIUC most of the PRs/issues referenced by this PR are closed/resolved. Are there any remaining issues or blockers before merging this PR?

briansmith self-assigned this Mar 5, 2025

briansmith commented Mar 5, 2025

briansmith force-pushed the b/aes-gcm-avx512 branch 4 times, most recently from f9ead65 to 159aa07 Compare March 8, 2025 21:31

briansmith mentioned this pull request Mar 10, 2025

Work around incompatibility of aes-gcm-avx512 with ancient GNU as (GNU binutils) #2469

Open

briansmith added this to the 0.17.15 milestone Mar 11, 2025

ebiggers reviewed Mar 12, 2025

briansmith force-pushed the b/aes-gcm-avx512 branch 3 times, most recently from 34889f5 to 7ca1e37 Compare March 20, 2025 00:43

georglauterbach mentioned this pull request Jun 4, 2025

New Release #2525

Open

briansmith force-pushed the b/aes-gcm-avx512 branch 7 times, most recently from 282f193 to 429a9be Compare July 17, 2025 19:23

briansmith force-pushed the b/aes-gcm-avx512 branch 7 times, most recently from e4d1361 to 7ff364b Compare July 26, 2025 23:46

briansmith force-pushed the b/aes-gcm-avx512 branch 3 times, most recently from 925ac5f to bd0587d Compare July 27, 2025 08:17

aes-gcm: Enable AVX-512 implementation.

eac2860

briansmith force-pushed the b/aes-gcm-avx512 branch from bd0587d to eac2860 Compare July 31, 2025 21:26
