FIPS202: Provide alignment constraints for output buffers #1240
mlkem-native will only call the FIPS202 API with aligned output buffers, which can be useful for users providing their own FIPS202 implementation.

This commit extends the documentation and CBMC contracts accordingly, and adds debug assertions for checking alignment. Note that CBMC does, to the best of my knowledge, not support talking about absolute pointer alignment, so we cannot rely on proof here.

We also take the opportunity to align internal buffers used in the default C implementation of mlk_shake256x4.

Signed-off-by: Hanno Becker <[email protected]>
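As a rough illustration of what a debug assertion on output-buffer alignment could look like, here is a minimal sketch. All names (`EXAMPLE_ALIGN`, `example_is_aligned`, `example_fill`, `example_demo`) and the 32-byte value are assumptions made for this example; they are not taken from mlkem-native.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment requirement in bytes (assumed, power of two). */
#define EXAMPLE_ALIGN 32

/* Returns 1 if ptr is a multiple of align, 0 otherwise.
 * align must be a power of two. */
static int example_is_aligned(const void *ptr, size_t align)
{
  return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical stand-in for a hash-like function writing to out.
 * The assert is only active in debug builds (NDEBUG not defined). */
static void example_fill(uint8_t *out, size_t outlen)
{
  size_t i;
  assert(example_is_aligned(out, EXAMPLE_ALIGN));
  for (i = 0; i < outlen; i++)
  {
    out[i] = (uint8_t)i;
  }
}

/* Usage: an aligned stack buffer passes the check, while the same
 * buffer offset by one byte is reported as unaligned. */
static int example_demo(void)
{
  alignas(EXAMPLE_ALIGN) uint8_t buf[64];
  example_fill(buf, sizeof(buf));
  return example_is_aligned(buf, EXAMPLE_ALIGN) &&
         !example_is_aligned(buf + 1, EXAMPLE_ALIGN);
}
```

Such a check only fires at runtime on the inputs actually exercised, which is why it complements rather than replaces a proof.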
Mac Mini (M1, 2020) benchmarks
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 12295cycles | 12295cycles | 1 | 
| ML-KEM-512 encaps | 14953cycles | 14953cycles | 1 | 
| ML-KEM-512 decaps | 19491cycles | 19501cycles | 1.00 | 
| ML-KEM-768 keypair | 21347cycles | 21349cycles | 1.00 | 
| ML-KEM-768 encaps | 23907cycles | 23926cycles | 1.00 | 
| ML-KEM-768 decaps | 30474cycles | 30494cycles | 1.00 | 
| ML-KEM-1024 keypair | 30333cycles | 30335cycles | 1.00 | 
| ML-KEM-1024 encaps | 34540cycles | 34541cycles | 1.00 | 
| ML-KEM-1024 decaps | 44141cycles | 44143cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 59731cycles | 59708cycles | 1.00 | 
| ML-KEM-512 encaps | 67176cycles | 67100cycles | 1.00 | 
| ML-KEM-512 decaps | 85904cycles | 85702cycles | 1.00 | 
| ML-KEM-768 keypair | 101701cycles | 101880cycles | 1.00 | 
| ML-KEM-768 encaps | 113021cycles | 113147cycles | 1.00 | 
| ML-KEM-768 decaps | 139657cycles | 139940cycles | 1.00 | 
| ML-KEM-1024 keypair | 154633cycles | 155297cycles | 1.00 | 
| ML-KEM-1024 encaps | 171349cycles | 175881cycles | 0.97 | 
| ML-KEM-1024 decaps | 207314cycles | 211447cycles | 0.98 | 
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 9725cycles | 9609cycles | 1.01 | 
| ML-KEM-512 encaps | 11041cycles | 11092cycles | 1.00 | 
| ML-KEM-512 decaps | 15239cycles | 15103cycles | 1.01 | 
| ML-KEM-768 keypair | 16498cycles | 16475cycles | 1.00 | 
| ML-KEM-768 encaps | 17866cycles | 17760cycles | 1.01 | 
| ML-KEM-768 decaps | 23402cycles | 23429cycles | 1.00 | 
| ML-KEM-1024 keypair | 22467cycles | 22680cycles | 0.99 | 
| ML-KEM-1024 encaps | 24296cycles | 24272cycles | 1.00 | 
| ML-KEM-1024 decaps | 31947cycles | 31841cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 28948cycles | 28930cycles | 1.00 | 
| ML-KEM-512 encaps | 36007cycles | 35949cycles | 1.00 | 
| ML-KEM-512 decaps | 45334cycles | 45305cycles | 1.00 | 
| ML-KEM-768 keypair | 48277cycles | 47999cycles | 1.01 | 
| ML-KEM-768 encaps | 57865cycles | 57515cycles | 1.01 | 
| ML-KEM-768 decaps | 70275cycles | 69922cycles | 1.01 | 
| ML-KEM-1024 keypair | 71624cycles | 71219cycles | 1.01 | 
| ML-KEM-1024 encaps | 83607cycles | 83197cycles | 1.00 | 
| ML-KEM-1024 decaps | 100149cycles | 99767cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 28346cycles | 28332cycles | 1.00 | 
| ML-KEM-512 encaps | 34143cycles | 34037cycles | 1.00 | 
| ML-KEM-512 decaps | 44416cycles | 44301cycles | 1.00 | 
| ML-KEM-768 keypair | 48277cycles | 48259cycles | 1.00 | 
| ML-KEM-768 encaps | 54183cycles | 54210cycles | 1.00 | 
| ML-KEM-768 decaps | 68653cycles | 68671cycles | 1.00 | 
| ML-KEM-1024 keypair | 70435cycles | 70526cycles | 1.00 | 
| ML-KEM-1024 encaps | 78853cycles | 78796cycles | 1.00 | 
| ML-KEM-1024 decaps | 98465cycles | 98349cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 16870cycles | 16865cycles | 1.00 | 
| ML-KEM-512 encaps | 18598cycles | 18609cycles | 1.00 | 
| ML-KEM-512 decaps | 23994cycles | 24001cycles | 1.00 | 
| ML-KEM-768 keypair | 28683cycles | 28733cycles | 1.00 | 
| ML-KEM-768 encaps | 29906cycles | 29898cycles | 1.00 | 
| ML-KEM-768 decaps | 37674cycles | 37704cycles | 1.00 | 
| ML-KEM-1024 keypair | 41481cycles | 41518cycles | 1.00 | 
| ML-KEM-1024 encaps | 43822cycles | 43893cycles | 1.00 | 
| ML-KEM-1024 decaps | 54301cycles | 54381cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 11942cycles | 11879cycles | 1.01 | 
| ML-KEM-512 encaps | 13427cycles | 13439cycles | 1.00 | 
| ML-KEM-512 decaps | 18323cycles | 18317cycles | 1.00 | 
| ML-KEM-768 keypair | 20556cycles | 20575cycles | 1.00 | 
| ML-KEM-768 encaps | 21491cycles | 21509cycles | 1.00 | 
| ML-KEM-768 decaps | 28763cycles | 28673cycles | 1.00 | 
| ML-KEM-1024 keypair | 27917cycles | 27959cycles | 1.00 | 
| ML-KEM-1024 encaps | 29609cycles | 29636cycles | 1.00 | 
| ML-KEM-1024 decaps | 39042cycles | 39068cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 36416cycles | 36412cycles | 1.00 | 
| ML-KEM-512 encaps | 42888cycles | 42831cycles | 1.00 | 
| ML-KEM-512 decaps | 55700cycles | 55720cycles | 1.00 | 
| ML-KEM-768 keypair | 59563cycles | 59631cycles | 1.00 | 
| ML-KEM-768 encaps | 67561cycles | 67753cycles | 1.00 | 
| ML-KEM-768 decaps | 84931cycles | 84867cycles | 1.00 | 
| ML-KEM-1024 keypair | 87406cycles | 87652cycles | 1.00 | 
| ML-KEM-1024 encaps | 98085cycles | 98272cycles | 1.00 | 
| ML-KEM-1024 decaps | 119751cycles | 119773cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 38511cycles | 38505cycles | 1.00 | 
| ML-KEM-512 encaps | 47561cycles | 47537cycles | 1.00 | 
| ML-KEM-512 decaps | 60963cycles | 60960cycles | 1.00 | 
| ML-KEM-768 keypair | 63878cycles | 63913cycles | 1.00 | 
| ML-KEM-768 encaps | 74885cycles | 74861cycles | 1.00 | 
| ML-KEM-768 decaps | 92934cycles | 92885cycles | 1.00 | 
| ML-KEM-1024 keypair | 94405cycles | 94493cycles | 1.00 | 
| ML-KEM-1024 encaps | 108804cycles | 108789cycles | 1.00 | 
| ML-KEM-1024 decaps | 131421cycles | 131739cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 28330cycles | 28306cycles | 1.00 | 
| ML-KEM-512 encaps | 34068cycles | 34098cycles | 1.00 | 
| ML-KEM-512 decaps | 44360cycles | 44404cycles | 1.00 | 
| ML-KEM-768 keypair | 48298cycles | 48265cycles | 1.00 | 
| ML-KEM-768 encaps | 54277cycles | 54145cycles | 1.00 | 
| ML-KEM-768 decaps | 68749cycles | 68639cycles | 1.00 | 
| ML-KEM-1024 keypair | 70449cycles | 70481cycles | 1.00 | 
| ML-KEM-1024 encaps | 78755cycles | 78889cycles | 1.00 | 
| ML-KEM-1024 decaps | 98309cycles | 98498cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 16301cycles | 16268cycles | 1.00 | 
| ML-KEM-512 encaps | 18424cycles | 18556cycles | 0.99 | 
| ML-KEM-512 decaps | 24972cycles | 25030cycles | 1.00 | 
| ML-KEM-768 keypair | 27878cycles | 29162cycles | 0.96 | 
| ML-KEM-768 encaps | 31286cycles | 30470cycles | 1.03 | 
| ML-KEM-768 decaps | 41010cycles | 39360cycles | 1.04 | 
| ML-KEM-1024 keypair | 37568cycles | 37574cycles | 1.00 | 
| ML-KEM-1024 encaps | 40416cycles | 40171cycles | 1.01 | 
| ML-KEM-1024 decaps | 53882cycles | 53863cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
⚠️  Performance Alert ⚠️ 
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-768 decaps | 41010cycles | 39360cycles | 1.04 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 17636cycles | 17658cycles | 1.00 | 
| ML-KEM-512 encaps | 20651cycles | 20644cycles | 1.00 | 
| ML-KEM-512 decaps | 27078cycles | 27090cycles | 1.00 | 
| ML-KEM-768 keypair | 30226cycles | 30252cycles | 1.00 | 
| ML-KEM-768 encaps | 32961cycles | 32955cycles | 1.00 | 
| ML-KEM-768 decaps | 42172cycles | 42187cycles | 1.00 | 
| ML-KEM-1024 keypair | 43830cycles | 43858cycles | 1.00 | 
| ML-KEM-1024 encaps | 48845cycles | 48895cycles | 1.00 | 
| ML-KEM-1024 decaps | 61577cycles | 61572cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 35375cycles | 35402cycles | 1.00 | 
| ML-KEM-512 encaps | 40792cycles | 40806cycles | 1.00 | 
| ML-KEM-512 decaps | 51543cycles | 51533cycles | 1.00 | 
| ML-KEM-768 keypair | 59439cycles | 58804cycles | 1.01 | 
| ML-KEM-768 encaps | 65595cycles | 66166cycles | 0.99 | 
| ML-KEM-768 decaps | 80068cycles | 80061cycles | 1.00 | 
| ML-KEM-1024 keypair | 87688cycles | 87785cycles | 1.00 | 
| ML-KEM-1024 encaps | 97075cycles | 97069cycles | 1.00 | 
| ML-KEM-1024 decaps | 116268cycles | 116310cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 46117cycles | 46150cycles | 1.00 | 
| ML-KEM-512 encaps | 54851cycles | 54831cycles | 1.00 | 
| ML-KEM-512 decaps | 69951cycles | 69981cycles | 1.00 | 
| ML-KEM-768 keypair | 76003cycles | 75991cycles | 1.00 | 
| ML-KEM-768 encaps | 87052cycles | 86935cycles | 1.00 | 
| ML-KEM-768 decaps | 106918cycles | 106758cycles | 1.00 | 
| ML-KEM-1024 keypair | 110478cycles | 110517cycles | 1.00 | 
| ML-KEM-1024 encaps | 125335cycles | 124882cycles | 1.00 | 
| ML-KEM-1024 decaps | 150500cycles | 150109cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 18673cycles | 18683cycles | 1.00 | 
| ML-KEM-512 encaps | 22017cycles | 22000cycles | 1.00 | 
| ML-KEM-512 decaps | 28979cycles | 29006cycles | 1.00 | 
| ML-KEM-768 keypair | 31899cycles | 31932cycles | 1.00 | 
| ML-KEM-768 encaps | 34989cycles | 34980cycles | 1.00 | 
| ML-KEM-768 decaps | 45030cycles | 45049cycles | 1.00 | 
| ML-KEM-1024 keypair | 46287cycles | 46313cycles | 1.00 | 
| ML-KEM-1024 encaps | 51581cycles | 51641cycles | 1.00 | 
| ML-KEM-1024 decaps | 65169cycles | 65214cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 59231cycles | 59398cycles | 1.00 | 
| ML-KEM-512 encaps | 68776cycles | 68910cycles | 1.00 | 
| ML-KEM-512 decaps | 87506cycles | 87493cycles | 1.00 | 
| ML-KEM-768 keypair | 100095cycles | 99171cycles | 1.01 | 
| ML-KEM-768 encaps | 111196cycles | 111365cycles | 1.00 | 
| ML-KEM-768 decaps | 135788cycles | 136304cycles | 1.00 | 
| ML-KEM-1024 keypair | 148806cycles | 148930cycles | 1.00 | 
| ML-KEM-1024 encaps | 164667cycles | 164465cycles | 1.00 | 
| ML-KEM-1024 decaps | 195869cycles | 195927cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 38780cycles | 38795cycles | 1.00 | 
| ML-KEM-512 encaps | 44906cycles | 44928cycles | 1.00 | 
| ML-KEM-512 decaps | 56675cycles | 56672cycles | 1.00 | 
| ML-KEM-768 keypair | 65028cycles | 64175cycles | 1.01 | 
| ML-KEM-768 encaps | 72044cycles | 72738cycles | 0.99 | 
| ML-KEM-768 decaps | 88047cycles | 87957cycles | 1.00 | 
| ML-KEM-1024 keypair | 95700cycles | 95668cycles | 1.00 | 
| ML-KEM-1024 encaps | 106316cycles | 106273cycles | 1.00 | 
| ML-KEM-1024 decaps | 126732cycles | 126846cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 155232cycles | 155240cycles | 1.00 | 
| ML-KEM-512 encaps | 163138cycles | 163110cycles | 1.00 | 
| ML-KEM-512 decaps | 206355cycles | 206370cycles | 1.00 | 
| ML-KEM-768 keypair | 261038cycles | 260968cycles | 1.00 | 
| ML-KEM-768 encaps | 275782cycles | 275538cycles | 1.00 | 
| ML-KEM-768 decaps | 338138cycles | 337761cycles | 1.00 | 
| ML-KEM-1024 keypair | 395569cycles | 395152cycles | 1.00 | 
| ML-KEM-1024 encaps | 424071cycles | 422201cycles | 1.00 | 
| ML-KEM-1024 decaps | 505419cycles | 506020cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
@mkannwischer Unfortunately, one cannot fully discharge those constraints, because sometimes one hashes straight into a buffer provided at the top-level API. One would need to either a) add alignment constraints at the top level, b) add a copy at the top level, or c) avoid the alignment constraints altogether. We should definitely extend our testing to exercise unaligned buffers at the top level.
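To make option b) and the proposed unaligned-buffer test concrete, here is a minimal sketch. It assumes a hypothetical inner function that requires an aligned output and a top-level wrapper that accepts any pointer; every name, the 32-byte alignment, and the toy "hash" are placeholders for illustration, not mlkem-native code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_ALIGN 32  /* assumed alignment requirement (power of two) */
#define EXAMPLE_OUTLEN 32 /* assumed fixed output length in bytes */

static int example_ptr_aligned(const void *ptr, size_t align)
{
  return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical inner function that demands an aligned output buffer.
 * The XOR folding is a toy placeholder, not a real hash. */
static void example_hash_aligned(uint8_t *out, const uint8_t *in, size_t inlen)
{
  size_t i;
  assert(example_ptr_aligned(out, EXAMPLE_ALIGN));
  memset(out, 0, EXAMPLE_OUTLEN);
  for (i = 0; i < inlen; i++)
  {
    out[i % EXAMPLE_OUTLEN] ^= in[i];
  }
}

/* Option b): the top-level entry point accepts any output pointer,
 * hashes into an aligned temporary, and copies the result out.
 * Production code would also zeroize tmp if it may hold secrets. */
static void example_toplevel(uint8_t *out, const uint8_t *in, size_t inlen)
{
  alignas(EXAMPLE_ALIGN) uint8_t tmp[EXAMPLE_OUTLEN];
  example_hash_aligned(tmp, in, inlen);
  memcpy(out, tmp, EXAMPLE_OUTLEN);
}

/* Test idea: call the top-level API with a deliberately misaligned
 * output pointer and compare against an aligned reference run. */
static int example_test_unaligned(void)
{
  alignas(EXAMPLE_ALIGN) uint8_t backing[EXAMPLE_OUTLEN + 1];
  alignas(EXAMPLE_ALIGN) uint8_t ref[EXAMPLE_OUTLEN];
  const uint8_t msg[3] = {1, 2, 3};
  uint8_t *unaligned = backing + 1; /* off by one byte from alignment */

  example_toplevel(unaligned, msg, sizeof(msg));
  example_hash_aligned(ref, msg, sizeof(msg));

  return !example_ptr_aligned(unaligned, EXAMPLE_ALIGN) &&
         memcmp(unaligned, ref, EXAMPLE_OUTLEN) == 0;
}
```

The extra copy costs some cycles and stack space, which is the trade-off against option a) (pushing the constraint onto callers) and option c) (dropping the constraint).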
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: 038ab4a | Previous: 6048cab | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 50771cycles | 51671cycles | 0.98 | 
| ML-KEM-512 encaps | 59096cycles | 59384cycles | 1.00 | 
| ML-KEM-512 decaps | 75960cycles | 76049cycles | 1.00 | 
| ML-KEM-768 keypair | 88076cycles | 87579cycles | 1.01 | 
| ML-KEM-768 encaps | 96737cycles | 95925cycles | 1.01 | 
| ML-KEM-768 decaps | 118599cycles | 119410cycles | 0.99 | 
| ML-KEM-1024 keypair | 130194cycles | 129911cycles | 1.00 | 
| ML-KEM-1024 encaps | 141882cycles | 142563cycles | 1.00 | 
| ML-KEM-1024 decaps | 173487cycles | 173582cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
That is indeed unfortunate. Do you still want to merge this PR now? It gives the false impression that FIPS202 can assume alignment of the output.
No, definitely not in this form.
mlkem-native will only call the FIPS202 API with aligned buffers, which can be useful for users providing their own FIPS202 implementation.
This commit extends the documentation and CBMC contracts accordingly, and adds debug assertions for checking alignment. Note that CBMC does, to the best of my knowledge, not support talking about absolute pointer alignment, so we cannot rely on proof here.
This is relevant for zerorisc/expo#97