Perf/lottery fewer wide muls #9

Merged
Sbcdn merged 3 commits into main from perf/lottery-fewer-wide-muls
Jun 11, 2026

Conversation

@Sbcdn Sbcdn commented Jun 10, 2026

Collaborator

No description provided.

Sbcdn added 3 commits June 10, 2026 22:46
The Taylor lottery test computes `phi + error_term` and
`phi - error_term` each iteration. Replace the two separate adds with
the fused `Ratio::add_sub`, which shares the three cross-multiplications
between the sum and difference: 3 U512 wide-multiplies per iteration
instead of 6. Together with crypto-ratio's integer-operand mul fast
path this cuts ~22% of the per-cert wide-multiplies and ~11% of total
guest cycles, with byte-identical verification output (full mainnet
corpus equivalence and num-rational differential both green).
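
The sharing described above can be sketched on a toy fraction type. This is a minimal illustration, not crypto-ratio's code: the `Frac` type, its `i128` fields, and the separate `add`/`sub` helpers are hypothetical stand-ins for the U512-backed `Ratio`. It shows that `a/b ± c/d` needs only the products `a*d`, `c*b`, and `b*d`, which a fused operation can compute once for both results.

```rust
/// Hypothetical toy fraction (not crypto-ratio's `Ratio`): unreduced,
/// with a denominator assumed to be positive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Frac {
    num: i128,
    den: i128,
}

impl Frac {
    /// Plain add: three multiplications.
    fn add(self, rhs: Frac) -> Frac {
        Frac {
            num: self.num * rhs.den + rhs.num * self.den,
            den: self.den * rhs.den,
        }
    }

    /// Plain sub: three more multiplications.
    fn sub(self, rhs: Frac) -> Frac {
        Frac {
            num: self.num * rhs.den - rhs.num * self.den,
            den: self.den * rhs.den,
        }
    }

    /// Fused add/sub: the three products are computed once and reused
    /// for both the sum and the difference.
    fn add_sub(self, rhs: Frac) -> (Frac, Frac) {
        let ad = self.num * rhs.den;
        let cb = rhs.num * self.den;
        let bd = self.den * rhs.den;
        (
            Frac { num: ad + cb, den: bd },
            Frac { num: ad - cb, den: bd },
        )
    }
}
```

With `phi = 3/4` and `error_term = 1/10`, `phi.add_sub(error_term)` yields the unreduced pair `34/40` and `26/40`, the same values as calling `add` and `sub` separately.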

Note: crypto-ratio dependency is a local path during development; repoint
to the published version before merge.

The divisor is bounded by the Taylor iteration count, so it fits a u64.
`div_by_u64` scales the denominator with a single-limb multiply instead
of widening to U512, giving a clean, unreduced equivalent of
`div_by_uint`.
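
As a rough sketch of the single-limb idea, assuming a ratio stored as little-endian `u64` limbs: dividing `n/d` by an integer `k` leaves the numerator alone and multiplies the denominator by `k`, one limb at a time with a carry. The names `WideRatio` and `mul_by_limb` are hypothetical, and the overflow handling (returning `None`) is an assumption of this sketch rather than crypto-ratio's behaviour.

```rust
/// Hypothetical multi-limb ratio (little-endian u64 limbs), unreduced.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WideRatio<const N: usize> {
    num: [u64; N],
    den: [u64; N],
}

/// Multiply a multi-limb integer by one u64 limb, propagating the carry.
/// Returns None if the product does not fit in N limbs.
fn mul_by_limb<const N: usize>(x: [u64; N], k: u64) -> Option<[u64; N]> {
    let mut out = [0u64; N];
    let mut carry: u128 = 0;
    for (o, &limb) in out.iter_mut().zip(x.iter()) {
        // limb * k + carry always fits in u128 because carry < 2^64.
        let prod = (limb as u128) * (k as u128) + carry;
        *o = prod as u64; // low 64 bits
        carry = prod >> 64; // high 64 bits carry into the next limb
    }
    if carry == 0 {
        Some(out)
    } else {
        None
    }
}

impl<const N: usize> WideRatio<N> {
    /// (num / den) / k == num / (den * k); no GCD reduction is performed.
    fn div_by_u64(self, k: u64) -> Option<Self> {
        if k == 0 {
            return None; // division by zero
        }
        Some(WideRatio {
            num: self.num,
            den: mul_by_limb(self.den, k)?,
        })
    }
}
```

Each limb costs one 64x64 multiply, compared with the limb-by-limb cross products a full-width multiply would need if the divisor were first widened to the same size as the denominator.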

On the cycle bench this saves a further ~2.7% on top of the
add_sub/mul-fastpath stack (~12-14% total vs baseline). Validated
bit-identical to `div_by_uint` on the zkVM guest (in-guest assertion
across the full bench corpus) and via the host equivalence gate.

add_sub / div_by_u64 land in crypto-ratio 0.2.0; bump the dependency
from 0.1.0.
@Sbcdn Sbcdn force-pushed the perf/lottery-fewer-wide-muls branch from f83c5d4 to eaf4cfa Compare June 11, 2026 14:02
@Sbcdn Sbcdn marked this pull request as ready for review June 11, 2026 14:16
@Sbcdn Sbcdn merged commit 283b676 into main Jun 11, 2026
1 check failed
@Sbcdn Sbcdn deleted the perf/lottery-fewer-wide-muls branch June 11, 2026 22:45
Sbcdn added a commit that referenced this pull request Jun 18, 2026
`check_bounds` used `pos + needed > len`; on the 32-bit guest a wire length
field near `u32::MAX` wraps the add to a small value that passes the check,
and the subsequent slice then panics. Rephrase as `needed > len - pos`
(`pos <= len` is invariant) so adversarial lengths reject cleanly as
`OutOfBounds` instead of panicking. Covers all 25 read sites;
behaviour-identical on every non-wrapping input.
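
The two phrasings can be compared in a small sketch. Everything here is illustrative: the function names, the `ParseError` enum, and the use of `u32` with `wrapping_add` (to model a 32-bit `usize` in a build without overflow checks) are assumptions, not the repository's code.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    OutOfBounds,
}

/// Old phrasing, modelled with u32 and an explicit wrapping add so the
/// wrap can be reproduced on any host.
fn check_bounds_old(pos: u32, needed: u32, len: u32) -> Result<(), ParseError> {
    if pos.wrapping_add(needed) > len {
        Err(ParseError::OutOfBounds)
    } else {
        Ok(())
    }
}

/// New phrasing: `len - pos` cannot underflow while `pos <= len` holds,
/// so no value of `needed` can wrap the comparison.
fn check_bounds_new(pos: u32, needed: u32, len: u32) -> Result<(), ParseError> {
    debug_assert!(pos <= len, "pos <= len is assumed to be an invariant");
    if needed > len - pos {
        Err(ParseError::OutOfBounds)
    } else {
        Ok(())
    }
}
```

With `pos = 4`, `len = 8`, and `needed = u32::MAX - 1`, the old check wraps the sum to `2` and accepts the read, while the new check rejects it with `OutOfBounds`. For in-range values such as `needed = 4` or `needed = 5`, both versions agree.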

Add a primitive-level pin for the wrap (the only way to exercise it off
the 32-bit target), an oversized-prefix reject case, and a malformed-byte
panic-safety fuzz over the parser.

README: divergence #9 (former U512 Taylor overflow panic) is resolved by
the wide fallback; describe the residual U2048 ceiling instead.