Skip to content

Conversation

@hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Oct 20, 2025

mlkem-native will only call the FIPS202 API with aligned buffers, which can be useful for users providing their own FIPS202 implementation.

This commit extends the documentation and CBMC contracts accordingly, and adds debug assertions for checking alignment. Note that CBMC does, to the best of my knowledge, not support talking about absolute pointer alignment, so we cannot rely on proof here.

This is relevant for zerorisc/expo#97

@hanno-becker hanno-becker force-pushed the fips202_align_size branch 8 times, most recently from d5305d3 to e9b12dd Compare October 20, 2025 04:28
@hanno-becker hanno-becker changed the title FIPS202: Provide alignment constraints for in/out buffers FIPS202: Provide alignment constraints for output buffers Oct 20, 2025
@hanno-becker hanno-becker marked this pull request as ready for review October 20, 2025 06:17
@hanno-becker hanno-becker requested a review from a team as a code owner October 20, 2025 06:17
mlkem-native will only call the FIPS202 API with aligned output buffers
which can be useful for users providing their own FIPS202 implementation.

This commit extends the documentation and CBMC contracts accordingly,
and adds debug assertions for checking alignment. Note that CBMC does,
to the best of my knowledge, not support talking about absolute pointer
alignment, so we cannot rely on proof here.

We also take the opportunity to align internal buffers used in the
default C implementation of mlk_shake256x4.

Signed-off-by: Hanno Becker <[email protected]>
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed needs-work labels Oct 20, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 12295 cycles 12295 cycles 1
ML-KEM-512 encaps 14953 cycles 14953 cycles 1
ML-KEM-512 decaps 19491 cycles 19501 cycles 1.00
ML-KEM-768 keypair 21347 cycles 21349 cycles 1.00
ML-KEM-768 encaps 23907 cycles 23926 cycles 1.00
ML-KEM-768 decaps 30474 cycles 30494 cycles 1.00
ML-KEM-1024 keypair 30333 cycles 30335 cycles 1.00
ML-KEM-1024 encaps 34540 cycles 34541 cycles 1.00
ML-KEM-1024 decaps 44141 cycles 44143 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 59731 cycles 59708 cycles 1.00
ML-KEM-512 encaps 67176 cycles 67100 cycles 1.00
ML-KEM-512 decaps 85904 cycles 85702 cycles 1.00
ML-KEM-768 keypair 101701 cycles 101880 cycles 1.00
ML-KEM-768 encaps 113021 cycles 113147 cycles 1.00
ML-KEM-768 decaps 139657 cycles 139940 cycles 1.00
ML-KEM-1024 keypair 154633 cycles 155297 cycles 1.00
ML-KEM-1024 encaps 171349 cycles 175881 cycles 0.97
ML-KEM-1024 decaps 207314 cycles 211447 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 9725 cycles 9609 cycles 1.01
ML-KEM-512 encaps 11041 cycles 11092 cycles 1.00
ML-KEM-512 decaps 15239 cycles 15103 cycles 1.01
ML-KEM-768 keypair 16498 cycles 16475 cycles 1.00
ML-KEM-768 encaps 17866 cycles 17760 cycles 1.01
ML-KEM-768 decaps 23402 cycles 23429 cycles 1.00
ML-KEM-1024 keypair 22467 cycles 22680 cycles 0.99
ML-KEM-1024 encaps 24296 cycles 24272 cycles 1.00
ML-KEM-1024 decaps 31947 cycles 31841 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 28948 cycles 28930 cycles 1.00
ML-KEM-512 encaps 36007 cycles 35949 cycles 1.00
ML-KEM-512 decaps 45334 cycles 45305 cycles 1.00
ML-KEM-768 keypair 48277 cycles 47999 cycles 1.01
ML-KEM-768 encaps 57865 cycles 57515 cycles 1.01
ML-KEM-768 decaps 70275 cycles 69922 cycles 1.01
ML-KEM-1024 keypair 71624 cycles 71219 cycles 1.01
ML-KEM-1024 encaps 83607 cycles 83197 cycles 1.00
ML-KEM-1024 decaps 100149 cycles 99767 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 28346 cycles 28332 cycles 1.00
ML-KEM-512 encaps 34143 cycles 34037 cycles 1.00
ML-KEM-512 decaps 44416 cycles 44301 cycles 1.00
ML-KEM-768 keypair 48277 cycles 48259 cycles 1.00
ML-KEM-768 encaps 54183 cycles 54210 cycles 1.00
ML-KEM-768 decaps 68653 cycles 68671 cycles 1.00
ML-KEM-1024 keypair 70435 cycles 70526 cycles 1.00
ML-KEM-1024 encaps 78853 cycles 78796 cycles 1.00
ML-KEM-1024 decaps 98465 cycles 98349 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 16870 cycles 16865 cycles 1.00
ML-KEM-512 encaps 18598 cycles 18609 cycles 1.00
ML-KEM-512 decaps 23994 cycles 24001 cycles 1.00
ML-KEM-768 keypair 28683 cycles 28733 cycles 1.00
ML-KEM-768 encaps 29906 cycles 29898 cycles 1.00
ML-KEM-768 decaps 37674 cycles 37704 cycles 1.00
ML-KEM-1024 keypair 41481 cycles 41518 cycles 1.00
ML-KEM-1024 encaps 43822 cycles 43893 cycles 1.00
ML-KEM-1024 decaps 54301 cycles 54381 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 11942 cycles 11879 cycles 1.01
ML-KEM-512 encaps 13427 cycles 13439 cycles 1.00
ML-KEM-512 decaps 18323 cycles 18317 cycles 1.00
ML-KEM-768 keypair 20556 cycles 20575 cycles 1.00
ML-KEM-768 encaps 21491 cycles 21509 cycles 1.00
ML-KEM-768 decaps 28763 cycles 28673 cycles 1.00
ML-KEM-1024 keypair 27917 cycles 27959 cycles 1.00
ML-KEM-1024 encaps 29609 cycles 29636 cycles 1.00
ML-KEM-1024 decaps 39042 cycles 39068 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 36416 cycles 36412 cycles 1.00
ML-KEM-512 encaps 42888 cycles 42831 cycles 1.00
ML-KEM-512 decaps 55700 cycles 55720 cycles 1.00
ML-KEM-768 keypair 59563 cycles 59631 cycles 1.00
ML-KEM-768 encaps 67561 cycles 67753 cycles 1.00
ML-KEM-768 decaps 84931 cycles 84867 cycles 1.00
ML-KEM-1024 keypair 87406 cycles 87652 cycles 1.00
ML-KEM-1024 encaps 98085 cycles 98272 cycles 1.00
ML-KEM-1024 decaps 119751 cycles 119773 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 38511 cycles 38505 cycles 1.00
ML-KEM-512 encaps 47561 cycles 47537 cycles 1.00
ML-KEM-512 decaps 60963 cycles 60960 cycles 1.00
ML-KEM-768 keypair 63878 cycles 63913 cycles 1.00
ML-KEM-768 encaps 74885 cycles 74861 cycles 1.00
ML-KEM-768 decaps 92934 cycles 92885 cycles 1.00
ML-KEM-1024 keypair 94405 cycles 94493 cycles 1.00
ML-KEM-1024 encaps 108804 cycles 108789 cycles 1.00
ML-KEM-1024 decaps 131421 cycles 131739 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 28330 cycles 28306 cycles 1.00
ML-KEM-512 encaps 34068 cycles 34098 cycles 1.00
ML-KEM-512 decaps 44360 cycles 44404 cycles 1.00
ML-KEM-768 keypair 48298 cycles 48265 cycles 1.00
ML-KEM-768 encaps 54277 cycles 54145 cycles 1.00
ML-KEM-768 decaps 68749 cycles 68639 cycles 1.00
ML-KEM-1024 keypair 70449 cycles 70481 cycles 1.00
ML-KEM-1024 encaps 78755 cycles 78889 cycles 1.00
ML-KEM-1024 decaps 98309 cycles 98498 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 16301 cycles 16268 cycles 1.00
ML-KEM-512 encaps 18424 cycles 18556 cycles 0.99
ML-KEM-512 decaps 24972 cycles 25030 cycles 1.00
ML-KEM-768 keypair 27878 cycles 29162 cycles 0.96
ML-KEM-768 encaps 31286 cycles 30470 cycles 1.03
ML-KEM-768 decaps 41010 cycles 39360 cycles 1.04
ML-KEM-1024 keypair 37568 cycles 37574 cycles 1.00
ML-KEM-1024 encaps 40416 cycles 40171 cycles 1.01
ML-KEM-1024 decaps 53882 cycles 53863 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-768 decaps 41010 cycles 39360 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 17636 cycles 17658 cycles 1.00
ML-KEM-512 encaps 20651 cycles 20644 cycles 1.00
ML-KEM-512 decaps 27078 cycles 27090 cycles 1.00
ML-KEM-768 keypair 30226 cycles 30252 cycles 1.00
ML-KEM-768 encaps 32961 cycles 32955 cycles 1.00
ML-KEM-768 decaps 42172 cycles 42187 cycles 1.00
ML-KEM-1024 keypair 43830 cycles 43858 cycles 1.00
ML-KEM-1024 encaps 48845 cycles 48895 cycles 1.00
ML-KEM-1024 decaps 61577 cycles 61572 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 35375 cycles 35402 cycles 1.00
ML-KEM-512 encaps 40792 cycles 40806 cycles 1.00
ML-KEM-512 decaps 51543 cycles 51533 cycles 1.00
ML-KEM-768 keypair 59439 cycles 58804 cycles 1.01
ML-KEM-768 encaps 65595 cycles 66166 cycles 0.99
ML-KEM-768 decaps 80068 cycles 80061 cycles 1.00
ML-KEM-1024 keypair 87688 cycles 87785 cycles 1.00
ML-KEM-1024 encaps 97075 cycles 97069 cycles 1.00
ML-KEM-1024 decaps 116268 cycles 116310 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 46117 cycles 46150 cycles 1.00
ML-KEM-512 encaps 54851 cycles 54831 cycles 1.00
ML-KEM-512 decaps 69951 cycles 69981 cycles 1.00
ML-KEM-768 keypair 76003 cycles 75991 cycles 1.00
ML-KEM-768 encaps 87052 cycles 86935 cycles 1.00
ML-KEM-768 decaps 106918 cycles 106758 cycles 1.00
ML-KEM-1024 keypair 110478 cycles 110517 cycles 1.00
ML-KEM-1024 encaps 125335 cycles 124882 cycles 1.00
ML-KEM-1024 decaps 150500 cycles 150109 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 18673 cycles 18683 cycles 1.00
ML-KEM-512 encaps 22017 cycles 22000 cycles 1.00
ML-KEM-512 decaps 28979 cycles 29006 cycles 1.00
ML-KEM-768 keypair 31899 cycles 31932 cycles 1.00
ML-KEM-768 encaps 34989 cycles 34980 cycles 1.00
ML-KEM-768 decaps 45030 cycles 45049 cycles 1.00
ML-KEM-1024 keypair 46287 cycles 46313 cycles 1.00
ML-KEM-1024 encaps 51581 cycles 51641 cycles 1.00
ML-KEM-1024 decaps 65169 cycles 65214 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 59231 cycles 59398 cycles 1.00
ML-KEM-512 encaps 68776 cycles 68910 cycles 1.00
ML-KEM-512 decaps 87506 cycles 87493 cycles 1.00
ML-KEM-768 keypair 100095 cycles 99171 cycles 1.01
ML-KEM-768 encaps 111196 cycles 111365 cycles 1.00
ML-KEM-768 decaps 135788 cycles 136304 cycles 1.00
ML-KEM-1024 keypair 148806 cycles 148930 cycles 1.00
ML-KEM-1024 encaps 164667 cycles 164465 cycles 1.00
ML-KEM-1024 decaps 195869 cycles 195927 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 38780 cycles 38795 cycles 1.00
ML-KEM-512 encaps 44906 cycles 44928 cycles 1.00
ML-KEM-512 decaps 56675 cycles 56672 cycles 1.00
ML-KEM-768 keypair 65028 cycles 64175 cycles 1.01
ML-KEM-768 encaps 72044 cycles 72738 cycles 0.99
ML-KEM-768 decaps 88047 cycles 87957 cycles 1.00
ML-KEM-1024 keypair 95700 cycles 95668 cycles 1.00
ML-KEM-1024 encaps 106316 cycles 106273 cycles 1.00
ML-KEM-1024 decaps 126732 cycles 126846 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpacemiT K1 8 (Banana Pi F3) benchmarks

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 155232 cycles 155240 cycles 1.00
ML-KEM-512 encaps 163138 cycles 163110 cycles 1.00
ML-KEM-512 decaps 206355 cycles 206370 cycles 1.00
ML-KEM-768 keypair 261038 cycles 260968 cycles 1.00
ML-KEM-768 encaps 275782 cycles 275538 cycles 1.00
ML-KEM-768 decaps 338138 cycles 337761 cycles 1.00
ML-KEM-1024 keypair 395569 cycles 395152 cycles 1.00
ML-KEM-1024 encaps 424071 cycles 422201 cycles 1.00
ML-KEM-1024 decaps 505419 cycles 506020 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker
Copy link
Contributor Author

@mkannwischer Unfortunately, one cannot fully discharge those constraints, as sometimes one hashed straight into a buffer provided at the toplevel API. One would need to either a) add alignment constraints at the toplevel, b) add a copy at the toplevel, c) avoid the alignment constraints altogether.

We should definitely extend our testing to exercise unaligned buffers at the toplevel.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 038ab4a Previous: 6048cab Ratio
ML-KEM-512 keypair 50771 cycles 51671 cycles 0.98
ML-KEM-512 encaps 59096 cycles 59384 cycles 1.00
ML-KEM-512 decaps 75960 cycles 76049 cycles 1.00
ML-KEM-768 keypair 88076 cycles 87579 cycles 1.01
ML-KEM-768 encaps 96737 cycles 95925 cycles 1.01
ML-KEM-768 decaps 118599 cycles 119410 cycles 0.99
ML-KEM-1024 keypair 130194 cycles 129911 cycles 1.00
ML-KEM-1024 encaps 141882 cycles 142563 cycles 1.00
ML-KEM-1024 decaps 173487 cycles 173582 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer
Copy link
Contributor

@mkannwischer Unfortunately, one cannot fully discharge those constraints, as sometimes one hashed straight into a buffer provided at the toplevel API. One would need to either a) add alignment constraints at the toplevel, b) add a copy at the toplevel, c) avoid the alignment constraints altogether.

We should definitely extend our testing to exercise unaligned buffers at the toplevel.

That is indeed unfortunate. Do you still want to merge this PR now? It gives the false impression that FIPS202 can assume alignment of the output.

@hanno-becker
Copy link
Contributor Author

Do you still want to merge this PR now?

No, definitely not in this form.

@hanno-becker hanno-becker marked this pull request as draft October 20, 2025 08:15
@hanno-becker hanno-becker added needs-work and removed benchmark this PR should be benchmarked in CI labels Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants