perf(lottery): per-signer Taylor cache + factored compare (~46% fewer guest cycles) #10

Merged: Sbcdn merged 3 commits into main from perf/taylor-per-signer-cache on Jun 11, 2026

Sbcdn commented Jun 11, 2026


Two structural optimizations to the BLS lottery (the dominant per-cert proving cost), plus the test scaffolding and divergence-registry updates that show both hold against upstream Mithril.

Changes

Per-signer Taylor cache. The (phi_plus, phi_minus) error-bound sequence is a pure function of the per-signer x and the per-cert `three`; only the comparison against q varies per index. Build the sequence once per signer and replay the cached bounds instead of recomputing the full series for every index (~48 indices/signer on mainnet).
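
A minimal sketch of the cache idea, assuming exact `i128` rationals stand in for the project's Ratio512/U512 types, a toy `EV_MAX` of 2^32, and a cap of 8 Taylor terms (upstream iterates much further). The series follows the shape of upstream mithril-stm's Taylor comparison as we understand it: each level adds the next term of exp(x) and brackets the partial sum with ±3·|next term|. `taylor_bounds`, `won_cached`, and `won_uncached` are hypothetical names, not the PR's API.

```rust
/// Exact rational with a positive, reduced denominator (toy stand-in for Ratio512).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ratio {
    num: i128,
    den: i128,
}

fn gcd(mut a: i128, mut b: i128) -> i128 {
    a = a.abs();
    b = b.abs();
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

impl Ratio {
    fn new(num: i128, den: i128) -> Self {
        assert!(den != 0, "zero denominator");
        let g = gcd(num, den);
        let sign = if den < 0 { -1 } else { 1 };
        Ratio { num: sign * num / g, den: sign * den / g }
    }
    fn add(self, o: Ratio) -> Ratio { Ratio::new(self.num * o.den + o.num * self.den, self.den * o.den) }
    fn sub(self, o: Ratio) -> Ratio { Ratio::new(self.num * o.den - o.num * self.den, self.den * o.den) }
    fn mul(self, o: Ratio) -> Ratio { Ratio::new(self.num * o.num, self.den * o.den) }
    fn div_int(self, k: i128) -> Ratio { Ratio::new(self.num, self.den * k) }
    fn abs(self) -> Ratio { Ratio { num: self.num.abs(), den: self.den } }
    // Cross-multiplied comparisons; valid because both denominators are positive.
    fn gt(self, o: Ratio) -> bool { self.num * o.den > o.num * self.den }
    fn lt(self, o: Ratio) -> bool { self.num * o.den < o.num * self.den }
}

const MAX_TERMS: usize = 8; // toy cap so i128 cannot overflow on small x
const EV_MAX: i128 = 1 << 32; // toy stand-in for the 512-bit ev_max

/// q = ev_max / (ev_max - ev): the only per-index input.
fn q_for(ev: i128) -> Ratio {
    Ratio::new(EV_MAX, EV_MAX - ev)
}

/// Per signer: build the (phi_plus, phi_minus) sequence for exp(x) once.
fn taylor_bounds(x: Ratio, three: Ratio) -> Vec<(Ratio, Ratio)> {
    let mut bounds = Vec::with_capacity(MAX_TERMS);
    let (mut phi, mut term, mut divisor) = (Ratio::new(1, 1), x, 1i128);
    for _ in 0..MAX_TERMS {
        phi = phi.add(term);
        divisor += 1;
        term = term.mul(x).div_int(divisor);
        let err = term.abs().mul(three);
        bounds.push((phi.add(err), phi.sub(err)));
    }
    bounds
}

/// Per index: replay the cached bounds; only the comparison against q runs here.
fn won_cached(q: Ratio, bounds: &[(Ratio, Ratio)]) -> bool {
    for &(phi_plus, phi_minus) in bounds {
        if q.gt(phi_plus) { return false; } // q certainly above exp(x): lose
        if q.lt(phi_minus) { return true; } // q certainly below exp(x): win
    }
    false // undecided within the cap: treated as a loss
}

/// Pre-cache shape: recompute the same series for every index.
fn won_uncached(q: Ratio, x: Ratio, three: Ratio) -> bool {
    let (mut phi, mut term, mut divisor) = (Ratio::new(1, 1), x, 1i128);
    for _ in 0..MAX_TERMS {
        phi = phi.add(term);
        divisor += 1;
        term = term.mul(x).div_int(divisor);
        let err = term.abs().mul(three);
        if q.gt(phi.add(err)) { return false; }
        if q.lt(phi.sub(err)) { return true; }
    }
    false
}
```

A caller would build `taylor_bounds(x, three)` once per signer and then call `won_cached(q_for(ev), &bounds)` for each of that signer's indices. The two paths agree by construction: they evaluate the same comparisons against the same bounds in the same order, and the cached path simply stops paying for the series after the first index.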

Factor ev_max out of the compare. q.numer is always ev_max, so the ev_max * bound.denom side of each cross-multiply is constant per (signer, level). Precompute it once per cached bound, halving the per-level wide-muls.
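
The factoring can be sketched as follows, assuming q is carried unreduced as `ev_max / q_den` (which is what keeps `q.numer == ev_max`) and every denominator is positive. `i128` stands in for the wide integers, and `FactoredBound`, `q_gt_factored`, and `q_lt_factored` are hypothetical names; they mirror the role of the PR's `q_gt_bound`/`q_lt_bound`, whose real signatures may differ.

```rust
/// A cached Taylor bound num/den with the constant half of the cross-multiply
/// (ev_max * den) computed once per (signer, level).
#[derive(Clone, Copy, Debug)]
struct FactoredBound {
    num: i128,
    ev_max_times_den: i128,
}

impl FactoredBound {
    fn new(num: i128, den: i128, ev_max: i128) -> Self {
        assert!(den > 0, "bound denominator must be positive");
        FactoredBound { num, ev_max_times_den: ev_max * den }
    }
}

// q = ev_max / q_den. With den > 0 and q_den > 0:
//   q > num/den  <=>  ev_max * den > num * q_den
// Only num * q_den remains per index: one wide-mul per comparison instead of two.
fn q_gt_factored(q_den: i128, b: &FactoredBound) -> bool {
    b.ev_max_times_den > b.num * q_den
}

fn q_lt_factored(q_den: i128, b: &FactoredBound) -> bool {
    b.ev_max_times_den < b.num * q_den
}

/// Unfactored reference: both products are computed on every comparison.
fn q_gt_plain(ev_max: i128, q_den: i128, num: i128, den: i128) -> bool {
    ev_max * den > num * q_den
}

fn q_lt_plain(ev_max: i128, q_den: i128, num: i128, den: i128) -> bool {
    ev_max * den < num * q_den
}
```

Both sides of each inequality are the same integers in either form, so the factored result is identical to the unfactored one; the saving comes purely from hoisting `ev_max * den` out of the per-index loop.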

Guest cycles (oaks_cert cycle_bench, mainnet SD certs)

| cert | before | after | Δ |
|------|--------|-------|---|
| sd_a | 301.2M | 153.2M | -49.1% |
| sd_b | 316.9M | 171.4M | -45.9% |
| sd_c | 276.1M | 147.9M | -46.4% |
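
The Δ column can be split per optimization using the cache-only counts reported in the first commit message (184.7M, 202.4M, 173.2M). The sketch below only recomputes percentages from those reported numbers; on top of the cache, the factored compare comes to roughly -17.1%, -15.3%, and -14.6% for sd_a, sd_b, and sd_c, and the two steps compose multiplicatively into the totals above. `CERTS`, `pct_change`, and `breakdown` are illustrative names.

```rust
/// (cert, baseline, cache only, cache + factored compare), in millions of guest
/// cycles, as reported in this PR and its first commit message.
const CERTS: [(&str, f64, f64, f64); 3] = [
    ("sd_a", 301.2, 184.7, 153.2),
    ("sd_b", 316.9, 202.4, 171.4),
    ("sd_c", 276.1, 173.2, 147.9),
];

/// Percentage change from `before` to `after` (negative = fewer cycles).
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Per-step and total changes for one cert: (cache step, factoring step, total).
fn breakdown(base: f64, cached: f64, factored: f64) -> (f64, f64, f64) {
    (
        pct_change(base, cached),
        pct_change(cached, factored),
        pct_change(base, factored),
    )
}
```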

Correctness

Both changes are bit-identical to the prior code by construction (the cache is a pure refactor; the factoring precomputes one operand of the same cross-multiply).

  • Full equivalence harness (real MithrilCertificateVerifier over the corpus + ~25 mutation axes): green.
  • Divergence-registry corpus verdict-equivalence gate: green.
  • upstream_differential: the cached bounds checked against the old per-index series over 2M realistic + 200k overflow-regime inputs, and the cache/compare diffed against a faithful BigInt re-port of upstream mithril-stm's lottery over 40k random inputs plus a decision-boundary sweep: 0 regressions, 0 soundness-direction divergences.
  • factored_compare_*: q_gt_bound/q_lt_bound pinned bit-for-bit against crypto-ratio's gt/lt, including near-equal (tied-high-limb) constructions that end-to-end fuzz cannot reach.
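
The harness pattern behind these gates can be sketched generically: run a fast path and a reference over the same generated inputs and tally disagreements, counting a case where the fast path accepts a lottery win the reference rejects as a soundness-direction divergence (our reading of the term). The sketch assumes a toy `EV_MAX` of 2^32, `i128` arithmetic, and a xorshift64 generator in place of whatever RNG the real suite uses; all names are hypothetical. Every fourth case is constructed to tie exactly, because a tie is the one input where `<` and `<=` disagree, and uniformly random inputs almost never produce one.

```rust
/// Marsaglia xorshift64: a tiny deterministic PRNG (seed must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

#[derive(Debug, Default, PartialEq)]
struct DiffReport {
    cases: u64,
    mismatches: u64,
    /// Fast path says "won" where the reference says "lost".
    soundness_divergences: u64,
}

/// Run `fast` and `reference` over `cases` generated inputs and tally disagreements.
fn differential<I>(
    cases: u64,
    rng: &mut XorShift64,
    generate: impl Fn(u64, &mut XorShift64) -> I,
    fast: impl Fn(&I) -> bool,
    reference: impl Fn(&I) -> bool,
) -> DiffReport {
    let mut report = DiffReport::default();
    for i in 0..cases {
        let input = generate(i, &mut *rng);
        let (got, want) = (fast(&input), reference(&input));
        report.cases += 1;
        if got != want {
            report.mismatches += 1;
            if got && !want {
                report.soundness_divergences += 1;
            }
        }
    }
    report
}

const EV_MAX: i128 = 1 << 32; // toy stand-in for the 512-bit ev_max

/// (q_den, num, den): q = EV_MAX / q_den is compared against the bound num / den.
fn gen_case(i: u64, rng: &mut XorShift64) -> (i128, i128, i128) {
    let q_den = EV_MAX - (rng.next_u64() as i128 % EV_MAX); // in 1..=EV_MAX
    if i % 4 == 0 {
        // Tied construction: num/den == EV_MAX/q_den exactly.
        let k = 1 + (rng.next_u64() % 1000) as i128;
        (q_den, EV_MAX * k, q_den * k)
    } else {
        let den = 1 + (rng.next_u64() % (1 << 20)) as i128;
        let num = (rng.next_u64() % (1 << 40)) as i128;
        (q_den, num, den)
    }
}

/// Reference decision: the index wins iff q < bound (strict), cross-multiplied.
fn won_reference(&(q_den, num, den): &(i128, i128, i128)) -> bool {
    EV_MAX * den < num * q_den
}

/// Fast-path stand-in: the same decision with the ev_max side hoisted out.
fn won_fast(&(q_den, num, den): &(i128, i128, i128)) -> bool {
    let ev_max_times_den = EV_MAX * den; // would be cached per bound in a real fast path
    ev_max_times_den < num * q_den
}
```

Swapping `<` for `<=` in the fast path passes every random case but is caught by the tied ones, and each resulting mismatch lands in the soundness direction.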

Also documents three pre-existing lottery approximations vs upstream (from_float ~2^-52, ev_max 2^512-1, U512 Taylor overflow-panic) as divergence-registry entries #7-#9, and adds a guest-only `guest-bench` feature for per-section cycle profiling.
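
For the ev_max entry, a short derivation suggests which way the approximation can push a decision. It assumes the port forms q = ev_max / (ev_max - ev) exactly as upstream does, but with E - 1 in place of upstream's E = 2^512 (a U512 cannot hold 2^512), and that ev < E - 1:

```math
\begin{aligned}
q &= \frac{E}{E - \mathit{ev}}, \qquad q' = \frac{E - 1}{E - 1 - \mathit{ev}}, \qquad E = 2^{512},\\
q' - q &= \frac{(E-1)(E-\mathit{ev}) - E\,(E-1-\mathit{ev})}{(E-\mathit{ev})(E-1-\mathit{ev})} = \frac{\mathit{ev}}{(E-\mathit{ev})(E-1-\mathit{ev})} \ge 0.
\end{aligned}
```

Since an index wins only when q falls below the Taylor lower bound on exp(x), a slightly larger q' can turn a borderline win into a loss but never a loss into a win, so under these assumptions this approximation can only err toward rejecting an index.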

Sbcdn added 3 commits June 11, 2026 22:45
The per-index lottery check recomputed the full Taylor expansion of
exp(x) for every index, but the (phi_plus, phi_minus) bound sequence is
a pure function of the per-signer x and the per-cert `three` — only the
comparison against q is per-index. Build the sequence once per signer
and replay the cached bounds; the expensive Ratio512 normalize/wide-mul
work now amortises across a signer's indices (47.6 indices/signer on a
real mainnet cert).

Guest cycle counts (oaks_cert cycle_bench, mainnet SD certs):
  sd_a  301.2M -> 184.7M  (-38.7%)
  sd_b  316.9M -> 202.4M  (-36.1%)
  sd_c  276.1M -> 173.2M  (-37.3%)
genesis unchanged (no lottery path).

Behaviourally a no-op: the cached bounds are bit-identical to the
pre-cache series by construction. Verified by differential fuzz in
`upstream_differential` (cache == old series over 2M realistic + 200k
overflow-regime inputs; 0 mismatches vs a faithful re-port of upstream
mithril-stm's exact BigInt lottery over 40k random inputs; 0
soundness-direction divergences in the decision-boundary sweep) and by
`taylor_cache_tests` (688 synthetic + 2428 real mainnet cert indices).

Documents three pre-existing lottery approximations vs upstream
(from_float ~2^-52, ev_max 2^512-1, U512 Taylor overflow-panic) as
divergence-registry entries #7-#9; none introduced by this change.

q.numer is always ev_max, so the ev_max*bound.denom side of each compare
is constant per (signer, level). Precompute it once per cached Taylor
bound and reuse it across the signer's indices, halving the per-level
wide-muls. ~15-17% fewer guest cycles per SD cert.

Bit-identical to the unfactored compare; pinned by factored_compare_*
against crypto-ratio gt/lt and the upstream differential suite.

Also adds a guest-only guest-bench feature with per-section cycle_count
probes in verify_bls_multisig.
Sbcdn merged commit 4466979 into main on Jun 11, 2026
1 check failed
Sbcdn deleted the perf/taylor-per-signer-cache branch on Jun 11, 2026 22:45