- 
                Notifications
    You must be signed in to change notification settings 
- Fork 38
CBMC: Refine bounds for input and output of base multiplication #906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1326e42    to
    f9b25ee      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mac Mini (M1, 2020) benchmarks
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 12257cycles | 12254cycles | 1.00 | 
| ML-KEM-512 encaps | 14848cycles | 14846cycles | 1.00 | 
| ML-KEM-512 decaps | 19725cycles | 19721cycles | 1.00 | 
| ML-KEM-768 keypair | 21048cycles | 21051cycles | 1.00 | 
| ML-KEM-768 encaps | 23677cycles | 23679cycles | 1.00 | 
| ML-KEM-768 decaps | 30515cycles | 30515cycles | 1 | 
| ML-KEM-1024 keypair | 29977cycles | 29978cycles | 1.00 | 
| ML-KEM-1024 encaps | 34479cycles | 34478cycles | 1.00 | 
| ML-KEM-1024 decaps | 43430cycles | 43442cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 9597cycles | 9477cycles | 1.01 | 
| ML-KEM-512 encaps | 11103cycles | 11217cycles | 0.99 | 
| ML-KEM-512 decaps | 15370cycles | 15166cycles | 1.01 | 
| ML-KEM-768 keypair | 16376cycles | 16303cycles | 1.00 | 
| ML-KEM-768 encaps | 17765cycles | 17790cycles | 1.00 | 
| ML-KEM-768 decaps | 23604cycles | 23444cycles | 1.01 | 
| ML-KEM-1024 keypair | 22063cycles | 22001cycles | 1.00 | 
| ML-KEM-1024 encaps | 24180cycles | 24012cycles | 1.01 | 
| ML-KEM-1024 decaps | 31927cycles | 31587cycles | 1.01 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 30097cycles | 30164cycles | 1.00 | 
| ML-KEM-512 encaps | 35523cycles | 35677cycles | 1.00 | 
| ML-KEM-512 decaps | 44923cycles | 45084cycles | 1.00 | 
| ML-KEM-768 keypair | 48474cycles | 48448cycles | 1.00 | 
| ML-KEM-768 encaps | 55666cycles | 55690cycles | 1.00 | 
| ML-KEM-768 decaps | 68997cycles | 68876cycles | 1.00 | 
| ML-KEM-1024 keypair | 74954cycles | 74942cycles | 1.00 | 
| ML-KEM-1024 encaps | 83393cycles | 83410cycles | 1.00 | 
| ML-KEM-1024 decaps | 99713cycles | 99614cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A55 (Snapdragon 888) benchmarks
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 59336cycles | 59330cycles | 1.00 | 
| ML-KEM-512 encaps | 66928cycles | 66920cycles | 1.00 | 
| ML-KEM-512 decaps | 86001cycles | 86014cycles | 1.00 | 
| ML-KEM-768 keypair | 101044cycles | 101087cycles | 1.00 | 
| ML-KEM-768 encaps | 112086cycles | 112317cycles | 1.00 | 
| ML-KEM-768 decaps | 139218cycles | 139358cycles | 1.00 | 
| ML-KEM-1024 keypair | 153559cycles | 153471cycles | 1.00 | 
| ML-KEM-1024 encaps | 172905cycles | 170285cycles | 1.02 | 
| ML-KEM-1024 decaps | 210032cycles | 207958cycles | 1.01 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 11636cycles | 11702cycles | 0.99 | 
| ML-KEM-512 encaps | 13259cycles | 13439cycles | 0.99 | 
| ML-KEM-512 decaps | 18151cycles | 18310cycles | 0.99 | 
| ML-KEM-768 keypair | 20132cycles | 20125cycles | 1.00 | 
| ML-KEM-768 encaps | 21192cycles | 21112cycles | 1.00 | 
| ML-KEM-768 decaps | 28294cycles | 28224cycles | 1.00 | 
| ML-KEM-1024 keypair | 27080cycles | 27067cycles | 1.00 | 
| ML-KEM-1024 encaps | 29207cycles | 29123cycles | 1.00 | 
| ML-KEM-1024 decaps | 38665cycles | 38639cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 17167cycles | 17264cycles | 0.99 | 
| ML-KEM-512 encaps | 18926cycles | 19057cycles | 0.99 | 
| ML-KEM-512 decaps | 24408cycles | 24513cycles | 1.00 | 
| ML-KEM-768 keypair | 29445cycles | 29552cycles | 1.00 | 
| ML-KEM-768 encaps | 30852cycles | 30875cycles | 1.00 | 
| ML-KEM-768 decaps | 38787cycles | 38640cycles | 1.00 | 
| ML-KEM-1024 keypair | 42726cycles | 42962cycles | 0.99 | 
| ML-KEM-1024 encaps | 45290cycles | 45604cycles | 0.99 | 
| ML-KEM-1024 decaps | 55637cycles | 55827cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 16123cycles | 16149cycles | 1.00 | 
| ML-KEM-512 encaps | 18258cycles | 18247cycles | 1.00 | 
| ML-KEM-512 decaps | 24788cycles | 24723cycles | 1.00 | 
| ML-KEM-768 keypair | 27724cycles | 27803cycles | 1.00 | 
| ML-KEM-768 encaps | 29501cycles | 29387cycles | 1.00 | 
| ML-KEM-768 decaps | 38979cycles | 38836cycles | 1.00 | 
| ML-KEM-1024 keypair | 37668cycles | 37578cycles | 1.00 | 
| ML-KEM-1024 encaps | 40643cycles | 40554cycles | 1.00 | 
| ML-KEM-1024 decaps | 53079cycles | 53168cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️  Performance Alert ⚠️ 
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9b25ee | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-768 encaps | 30902cycles | 29387cycles | 1.05 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 36353cycles | 36358cycles | 1.00 | 
| ML-KEM-512 encaps | 42971cycles | 42959cycles | 1.00 | 
| ML-KEM-512 decaps | 55977cycles | 55963cycles | 1.00 | 
| ML-KEM-768 keypair | 59342cycles | 59249cycles | 1.00 | 
| ML-KEM-768 encaps | 67754cycles | 67706cycles | 1.00 | 
| ML-KEM-768 decaps | 84988cycles | 84932cycles | 1.00 | 
| ML-KEM-1024 keypair | 87597cycles | 87560cycles | 1.00 | 
| ML-KEM-1024 encaps | 98261cycles | 98317cycles | 1.00 | 
| ML-KEM-1024 decaps | 120160cycles | 120104cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 39011cycles | 39026cycles | 1.00 | 
| ML-KEM-512 encaps | 47090cycles | 47097cycles | 1.00 | 
| ML-KEM-512 decaps | 60662cycles | 60745cycles | 1.00 | 
| ML-KEM-768 keypair | 63305cycles | 63315cycles | 1.00 | 
| ML-KEM-768 encaps | 74013cycles | 73946cycles | 1.00 | 
| ML-KEM-768 decaps | 91862cycles | 91861cycles | 1.00 | 
| ML-KEM-1024 keypair | 94066cycles | 94031cycles | 1.00 | 
| ML-KEM-1024 encaps | 107373cycles | 107041cycles | 1.00 | 
| ML-KEM-1024 decaps | 129368cycles | 129724cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 17808cycles | 17771cycles | 1.00 | 
| ML-KEM-512 encaps | 21034cycles | 21026cycles | 1.00 | 
| ML-KEM-512 decaps | 27670cycles | 27696cycles | 1.00 | 
| ML-KEM-768 keypair | 30684cycles | 30710cycles | 1.00 | 
| ML-KEM-768 encaps | 33605cycles | 33580cycles | 1.00 | 
| ML-KEM-768 decaps | 43197cycles | 43195cycles | 1.00 | 
| ML-KEM-1024 keypair | 44319cycles | 44315cycles | 1.00 | 
| ML-KEM-1024 encaps | 49573cycles | 49591cycles | 1.00 | 
| ML-KEM-1024 decaps | 62550cycles | 62497cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 47469cycles | 47466cycles | 1.00 | 
| ML-KEM-512 encaps | 55355cycles | 55344cycles | 1.00 | 
| ML-KEM-512 decaps | 71584cycles | 71556cycles | 1.00 | 
| ML-KEM-768 keypair | 77391cycles | 77538cycles | 1.00 | 
| ML-KEM-768 encaps | 87479cycles | 87559cycles | 1.00 | 
| ML-KEM-768 decaps | 108019cycles | 108276cycles | 1.00 | 
| ML-KEM-1024 keypair | 113202cycles | 113314cycles | 1.00 | 
| ML-KEM-1024 encaps | 126336cycles | 126306cycles | 1.00 | 
| ML-KEM-1024 decaps | 152907cycles | 152929cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4 (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 35922cycles | 35918cycles | 1.00 | 
| ML-KEM-512 encaps | 40862cycles | 40842cycles | 1.00 | 
| ML-KEM-512 decaps | 52270cycles | 52306cycles | 1.00 | 
| ML-KEM-768 keypair | 60047cycles | 60759cycles | 0.99 | 
| ML-KEM-768 encaps | 66957cycles | 67536cycles | 0.99 | 
| ML-KEM-768 decaps | 81391cycles | 81425cycles | 1.00 | 
| ML-KEM-1024 keypair | 89042cycles | 89028cycles | 1.00 | 
| ML-KEM-1024 encaps | 98838cycles | 98877cycles | 1.00 | 
| ML-KEM-1024 decaps | 117682cycles | 117716cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 18864cycles | 18877cycles | 1.00 | 
| ML-KEM-512 encaps | 22344cycles | 22358cycles | 1.00 | 
| ML-KEM-512 decaps | 29577cycles | 29566cycles | 1.00 | 
| ML-KEM-768 keypair | 32236cycles | 32250cycles | 1.00 | 
| ML-KEM-768 encaps | 35628cycles | 35598cycles | 1.00 | 
| ML-KEM-768 decaps | 45941cycles | 45965cycles | 1.00 | 
| ML-KEM-1024 keypair | 46491cycles | 46487cycles | 1.00 | 
| ML-KEM-1024 encaps | 52069cycles | 52080cycles | 1.00 | 
| ML-KEM-1024 decaps | 65916cycles | 65962cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 29410cycles | 29444cycles | 1.00 | 
| ML-KEM-512 encaps | 34613cycles | 34680cycles | 1.00 | 
| ML-KEM-512 decaps | 45213cycles | 45279cycles | 1.00 | 
| ML-KEM-768 keypair | 50051cycles | 50139cycles | 1.00 | 
| ML-KEM-768 encaps | 55328cycles | 55224cycles | 1.00 | 
| ML-KEM-768 decaps | 70158cycles | 70096cycles | 1.00 | 
| ML-KEM-1024 keypair | 73082cycles | 73098cycles | 1.00 | 
| ML-KEM-1024 encaps | 81472cycles | 81514cycles | 1.00 | 
| ML-KEM-1024 decaps | 101379cycles | 101471cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3 (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 38966cycles | 39685cycles | 0.98 | 
| ML-KEM-512 encaps | 44939cycles | 44864cycles | 1.00 | 
| ML-KEM-512 decaps | 56734cycles | 56842cycles | 1.00 | 
| ML-KEM-768 keypair | 64234cycles | 64182cycles | 1.00 | 
| ML-KEM-768 encaps | 72031cycles | 72669cycles | 0.99 | 
| ML-KEM-768 decaps | 88070cycles | 88056cycles | 1.00 | 
| ML-KEM-1024 keypair | 96228cycles | 96095cycles | 1.00 | 
| ML-KEM-1024 encaps | 106307cycles | 106211cycles | 1.00 | 
| ML-KEM-1024 decaps | 127120cycles | 127057cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 225601cycles | 225593cycles | 1.00 | 
| ML-KEM-512 encaps | 271839cycles | 271821cycles | 1.00 | 
| ML-KEM-512 decaps | 346406cycles | 346326cycles | 1.00 | 
| ML-KEM-768 keypair | 374348cycles | 374216cycles | 1.00 | 
| ML-KEM-768 encaps | 433877cycles | 433736cycles | 1.00 | 
| ML-KEM-768 decaps | 532770cycles | 532650cycles | 1.00 | 
| ML-KEM-1024 keypair | 554398cycles | 554463cycles | 1.00 | 
| ML-KEM-1024 encaps | 632515cycles | 632521cycles | 1.00 | 
| ML-KEM-1024 decaps | 754467cycles | 754358cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 52821cycles | 53196cycles | 0.99 | 
| ML-KEM-512 encaps | 60537cycles | 61239cycles | 0.99 | 
| ML-KEM-512 decaps | 77877cycles | 77578cycles | 1.00 | 
| ML-KEM-768 keypair | 89670cycles | 89929cycles | 1.00 | 
| ML-KEM-768 encaps | 97185cycles | 98860cycles | 0.98 | 
| ML-KEM-768 decaps | 121486cycles | 123436cycles | 0.98 | 
| ML-KEM-1024 keypair | 135647cycles | 134458cycles | 1.01 | 
| ML-KEM-1024 encaps | 148998cycles | 147089cycles | 1.01 | 
| ML-KEM-1024 decaps | 180865cycles | 180649cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2 (no-opt)
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 59326cycles | 59431cycles | 1.00 | 
| ML-KEM-512 encaps | 68033cycles | 68011cycles | 1.00 | 
| ML-KEM-512 decaps | 86704cycles | 86649cycles | 1.00 | 
| ML-KEM-768 keypair | 98815cycles | 98924cycles | 1.00 | 
| ML-KEM-768 encaps | 110974cycles | 110402cycles | 1.01 | 
| ML-KEM-768 decaps | 134574cycles | 134811cycles | 1.00 | 
| ML-KEM-1024 keypair | 148680cycles | 148822cycles | 1.00 | 
| ML-KEM-1024 encaps | 163724cycles | 163875cycles | 1.00 | 
| ML-KEM-1024 decaps | 195449cycles | 195498cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: ee52c4e | Previous: 8dba13b | Ratio | 
|---|---|---|---|
| ML-KEM-512 keypair | 29409cycles | 29440cycles | 1.00 | 
| ML-KEM-512 encaps | 34618cycles | 34675cycles | 1.00 | 
| ML-KEM-512 decaps | 45210cycles | 45264cycles | 1.00 | 
| ML-KEM-768 keypair | 50053cycles | 50138cycles | 1.00 | 
| ML-KEM-768 encaps | 55322cycles | 55220cycles | 1.00 | 
| ML-KEM-768 decaps | 70154cycles | 70075cycles | 1.00 | 
| ML-KEM-1024 keypair | 73076cycles | 73079cycles | 1.00 | 
| ML-KEM-1024 encaps | 81445cycles | 81517cycles | 1.00 | 
| ML-KEM-1024 decaps | 101376cycles | 101473cycles | 1.00 | 
This comment was automatically generated by workflow using github-action-benchmark.
5e5944a    to
    40541d4      
    Compare
  
    c83ec7c    to
    8055638      
    Compare
  
    ee61e85    to
    a86fbc4      
    Compare
  
    | CBMC for ML-KEM-512,768 only fails because the native contracts need adjusting and the HOL-Light proofs reworking accordingly. Checking what goes wrong with ML-KEM-1024. | 
| BLOCKER: The AVX2 basemul does not obey the new basemul bounds because it does not use lazy reduction. Unless/Until this is reworked -- which I'm not sure we should do as it was barely finished (#1097) -- we cannot proceed with this PR. | 
a86fbc4    to
    d7dbe26      
    Compare
  
    c9ffb66    to
    b518349      
    Compare
  
    | Still to do on this PR: 
 | 
f7b6b29    to
    65200d7      
    Compare
  
    - Change mlk_polyvec back to struct { mlk_poly vec[MLKEM_K]; }
- Change mlk_polymat to struct { mlk_polyvec vec[MLKEM_K]; }
- Update all function signatures to use pointer style
- Fix all implementations to use struct member access
- Update tests, benchmarks, and CBMC harnesses
- Add consistent const annotations
Signed-off-by: Hanno Becker <[email protected]>
    Previously, the base multiplication would assume that one of its inputs
is bound by 4096 in absolute value, but make no assumptions about the
other input ("b-input" henceforth) and its mulcache.
This commit refines the bounds slightly, as follows:
- The b-input is assumed to be bound by MLK_NTT_BOUND in absolute value.
  This comes for free since all values for b _are_ results of the NTT.
- The b-cache-input is assumed to be bound by MLKEM_Q in absolute value.
With those additional bounds in place, it can be showed that the result
of the base multiplication is below INT16_MAX/2 in absolute value.
Accordingly, this can be added as a precondition for the inverse NTT.
Those refined bounds can help in subsequent commits to improve the
reduction strategy inside the inverse NTT.
For the native AVX2 backend, the new output bound for the mulcache
forces an explicit zeroization of the mulcache. This is not ideal
since the cache is in fact entirely unused, but the performance
penalty should be marginal (if the compiler can't eliminate the
zeroization in the first place).
Signed-off-by: Hanno Becker <[email protected]>
    Signed-off-by: Rod Chapman <[email protected]>
Signed-off-by: Rod Chapman <[email protected]>
Signed-off-by: Rod Chapman <[email protected]>
Signed-off-by: Rod Chapman <[email protected]>
Signed-off-by: Rod Chapman <[email protected]>
polyvec_basemul_acc_montgomery_cached_asm_k[234] TODO - Re-check specifications and proof of AArch64 implementations of these 3 functions. Signed-off-by: Rod Chapman <[email protected]>
Signed-off-by: Rod Chapman <[email protected]>
65200d7    to
    dca2a1f      
    Compare
  
    
Previously, the base multiplication would assume that one of its inputs is bound by 4096 in absolute value, but make no assumptions about the other input ("b-input" henceforth) and its mulcache.
This commit refines the bounds slightly, as follows:
With those additional bounds in place, it can be showed that the result of the base multiplication is below INT16_MAX/2 in absolute value. Accordingly, this can be added as a precondition for the inverse NTT.
For the native AVX2 backend, the new output bound for the mulcache forces an explicit zeroization of the mulcache. This is not ideal since the cache is in fact entirely unused, but the performance penalty should be marginal (if the compiler can't eliminate the zeroization in the first place).