perf(lottery): per-signer Taylor cache + factored compare (~46% fewer guest cycles)#10
Merged
Conversation
The per-index lottery check recomputed the full Taylor expansion of exp(x) for every index, but the (phi_plus, phi_minus) bound sequence is a pure function of the per-signer x and the per-cert `three` — only the comparison against q is per-index. Build the sequence once per signer and replay the cached bounds; the expensive Ratio512 normalize/wide-mul work now amortises across a signer's indices (47.6 indices/signer on a real mainnet cert). Guest cycle counts (oaks_cert cycle_bench, mainnet SD certs): sd_a 301.2M -> 184.7M (-38.7%) sd_b 316.9M -> 202.4M (-36.1%) sd_c 276.1M -> 173.2M (-37.3%) genesis unchanged (no lottery path). Behaviourally a no-op: the cached bounds are bit-identical to the pre-cache series by construction. Verified by differential fuzz in `upstream_differential` (cache == old series over 2M realistic + 200k overflow-regime inputs; 0 mismatches vs a faithful re-port of upstream mithril-stm's exact BigInt lottery over 40k random inputs; 0 soundness-direction divergences in the decision-boundary sweep) and by `taylor_cache_tests` (688 synthetic + 2428 real mainnet cert indices). Documents three pre-existing lottery approximations vs upstream (from_float ~2^-52, ev_max 2^512-1, U512 Taylor overflow-panic) as divergence-registry entries #7-#9; none introduced by this change.
q.numer is always ev_max, so the ev_max*bound.denom side of each compare is constant per (signer, level). Precompute it once per cached Taylor bound and reuse it across the signer's indices, halving the per-level wide-muls. ~15-17% fewer guest cycles per SD cert. Bit-identical to the unfactored compare; pinned by factored_compare_* against crypto-ratio gt/lt and the upstream differential suite. Also adds a guest-only guest-bench feature with per-section cycle_count probes in verify_bls_multisig.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two structural optimizations to the BLS lottery — the dominant per-cert proving cost — plus the test scaffolding and divergence-registry updates that prove they hold against upstream Mithril.
Changes
Per-signer Taylor cache. The
(phi_plus, phi_minus)error-bound sequence is a pure function of the per-signerxand per-certthree; only the comparison againstqis per-index. Build it once per signer and replay the cached bounds instead of recomputing the full series for every index (~48 indices/signer on mainnet).Factor
ev_maxout of the compare.q.numeris alwaysev_max, so theev_max * bound.denomside of each cross-multiply is constant per (signer, level). Precompute it once per cached bound, halving the per-level wide-muls.Guest cycles (oaks_cert cycle_bench, mainnet SD certs)
Correctness
Both changes are bit-identical to the prior code by construction (the cache is a pure refactor; the factoring precomputes one operand of the same cross-multiply).
MithrilCertificateVerifierover the corpus + ~25 mutation axes): green.upstream_differential: the cache/compare diffed against a faithful BigInt re-port of upstream mithril-stm's lottery over 2M realistic + 200k overflow-regime + 40k random inputs and a decision-boundary sweep — 0 regressions, 0 soundness-direction divergences.factored_compare_*:q_gt_bound/q_lt_boundpinned bit-for-bit against crypto-ratio'sgt/lt, including near-equal (tied-high-limb) constructions that end-to-end fuzz cannot reach.Also documents three pre-existing lottery approximations vs upstream (
from_float~2^-52,ev_max2^512-1, U512 Taylor overflow-panic) as divergence-registry entries #7-#9, and adds a guest-onlyguest-benchfeature for per-section cycle profiling.