Benchmarking post-quantum signatures and protocol-level randomness surfaces on EVM by gas cost per cryptographically meaningful bit.
- Methodology (surfaces + weakest-link)
- PQ aggregation surfaces (BLS → PQ) — why this matters
- Core Metric
- Public Review Entry Points
- Why This Exists
- New: Weakest-link + Protocol Readiness Surfaces
- Reproducible Reports & Data Policy
- Gas extraction modes (snapshot vs logs)
- XOF Vector Suite (Keccak-CTR vs FIPS SHAKE)
- Canonical test vectors + calldata packs
- Measured Protocol Surfaces (EVM/L1)
- Chart
- Current Dataset (EVM/L1) — Gas Snapshots
- What We Built
- Repo Layout
- Dataset Schema (CSV)
- Security Normalization (Explicit Assumptions)
- Quick Start
- PreA Convention (ML-DSA-65)
- Vendor benchmarks (pinned refs)
- Benchmarks Included
- Related Work / References
- Roadmap
- License
- Disclaimer
- Maintainer
- Citation
To avoid mixing benchmark scopes, the dataset supports a surface taxonomy and an optional dependency graph:
- Canonical execution surfaces (S0–S3): spec/case_graph.md
- Case catalog / baseline nodes: spec/case_catalog.md
- Weakest-link report (generated): reports/weakest_link_report.md
- One-page status summary: reports/summary.md
We separate app-facing verifier surfaces (ERC-1271 / ERC-7913 / AA) from protocol-facing surfaces
(e.g., a precompile-style verifier boundary, tagged as sig::protocol) to avoid mixing scopes.
Surface taxonomy: ERC-7913 adapters represent the app-facing verification boundary (wallets, dapps, AA integration), while sig::protocol (e.g., EIP-7932 candidate) represents a protocol-facing interface (precompile / enshrined verifier boundary).
For any pipeline record with depends_on[]:
effective_security_bits = min(security_bits(dep_i))
Where security_bits(x) is derived from records with security_metric_type ∈ {security_equiv_bits, lambda_eff, H_min}.
This repo may record multiple denominators for the same bench (e.g., lambda_eff for conservative crypto strength and security_equiv_bits for declared normalization) as separate records; comparisons must state which denominator is used.
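The weakest-link rule above can be sketched in a few lines of Python. This is an illustration only, not the repo's report generator: the `SECURITY_BITS` mapping and the key `ecdsa::l1_envelope` are hypothetical, and the bit values follow the normalization conventions stated elsewhere in this README.

```python
from typing import Dict, List

# Hypothetical lookup: record key (scheme::bench_name) -> declared security bits.
SECURITY_BITS: Dict[str, float] = {
    "ecdsa::l1_envelope": 128.0,  # hypothetical key for the L1 envelope assumption
    "falcon1024::falcon_verifySignature_log": 256.0,
}


def effective_security_bits(depends_on: List[str], bits: Dict[str, float]) -> float:
    """Return min(security_bits(dep_i)) over all listed dependencies."""
    if not depends_on:
        raise ValueError("depends_on must list at least one dependency")
    missing = [dep for dep in depends_on if dep not in bits]
    if missing:
        raise KeyError(f"no security record for: {missing}")
    return min(bits[dep] for dep in depends_on)


# A Falcon wallet path that still depends on an ECDSA-secured envelope.
pipeline_deps = ["falcon1024::falcon_verifySignature_log", "ecdsa::l1_envelope"]
print(effective_security_bits(pipeline_deps, SECURITY_BITS))  # 128.0
```

The pipeline is reported at 128 bits even though the wallet signature is declared at 256, which is the envelope-dominance case the catalog describes.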
Some vendor harnesses expose gas via Foundry snapshot lines (gas: N), while others print it via logs
(e.g., Gas used: N). Runners support both via scripts/extract_foundry_gas.py using a per-run needle
(e.g., Gas used:) so vendor repos do not need to be modified.
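A needle-based extractor might look roughly like the sketch below. It is not the contents of scripts/extract_foundry_gas.py; the function name `extract_gas` and the two sample lines are assumptions made for illustration.

```python
import re
from typing import Optional


def extract_gas(output: str, needle: str = "gas:") -> Optional[int]:
    """Return the first integer that follows `needle` in `output`, or None."""
    pattern = re.compile(re.escape(needle) + r"\s*([0-9][0-9_,]*)")
    match = pattern.search(output)
    if match is None:
        return None
    # Tolerate thousands separators such as 21,126 or 21_126.
    return int(match.group(1).replace(",", "").replace("_", ""))


# Illustrative sample lines, not captured from a real run.
snapshot_line = "[PASS] test_verify() (gas: 21126)"
log_line = "  Gas used: 10336055"

print(extract_gas(snapshot_line))                 # 21126
print(extract_gas(log_line, needle="Gas used:"))  # 10336055
```

Keeping the needle as a per-run parameter means the same parser can serve both output styles without touching vendor code.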
This repository is an experimental benchmarking lab spun out of ml-dsa-65-ethereum-verification, which provides the on-chain verification artifacts (Solidity implementation + gas harnesses) used as a primary vendor source.
It exists to answer one practical question for Ethereum engineers:
How expensive is "real security" on EVM — once you normalize gas by a declared security target and by protocol constraints?
In other words: "gas/verify" is not enough if the protocol envelope bounds end-to-end security even when the wallet uses PQ signatures.
GitHub does not render LaTeX by default, so the canonical formula is written in plain form:
gas_per_bit = gas_verify / security_metric_value
Where:
- gas_verify: on-chain gas used to verify a signature / proof (or a verifiable computation step).
- security_metric_type: what the denominator represents
  - for signatures / proofs (today): security_equiv_bits (a declared classical-equivalent bits normalization convention)
  - for randomness / VRF / protocol surfaces: H_min (min-entropy of the verified output under an explicit threat model)
- security_metric_value: the denominator value in bits.
For the current signature-only dataset, we typically report:
gas_per_secure_bit = gas_verify / security_equiv_bits
This allows apples-to-apples comparisons across schemes at different security targets, under explicit assumptions.
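As a worked example, the ratio can be recomputed from three rows of the current dataset table. The helper name `gas_per_bit` is an assumption for this sketch; the gas and bit values are copied from the table in this README.

```python
def gas_per_bit(gas_verify: int, security_metric_value: float) -> float:
    """gas_per_bit = gas_verify / security_metric_value."""
    if security_metric_value <= 0:
        raise ValueError("security_metric_value must be positive")
    return gas_verify / security_metric_value


# (bench_name, gas_verify, security_equiv_bits) taken from the dataset table.
rows = [
    ("ecdsa_verify_ecrecover_foundry", 21_126, 128),
    ("preA_compute_w_fromPackedA_ntt_rho0_log", 1_499_354, 192),
    ("falcon_verifySignature_log", 10_336_055, 256),
]

for name, gas, bits in rows:
    print(f"{name:42s} {gas_per_bit(gas, bits):>12,.3f}")
```

The three ratios come out at roughly 165, 7,809 and 40,375 gas per declared bit, matching the table. Falcon verification costs about 489 times the gas of ecrecover but about 245 times per declared bit, because its denominator is twice as large.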
If you have 10 minutes:
- reports/protocol_readiness.md — protocol constraints and why "gas/verify" can be misleading.
- spec/case_catalog.md + spec/case_graph.md — AA weakest-link (envelope dominance) cases + canonical graphs.
- spec/gas_per_secure_bit.md — definitions, normalization rules, reporting conventions.
- spec/xof_vector_suite.md + data/vectors/xof_vectors.json — canonical XOF wiring vectors (FIPS SHAKE + Keccak-CTR).
- data/results.jsonl — canonical dataset (CSV + reports are deterministically rebuilt from it).
Most public comparisons stop at "gas per verify". That hides critical differences:
- Different security levels (e.g., ECDSA ~128-bit convention vs ML-DSA-65 ~192 vs Falcon-1024 ~256)
- Different verification surfaces (EOA vs ERC-1271 vs EIP-4337 pipeline)
- Different protocol envelopes (e.g., L1 constraints can bound end-to-end security regardless of wallet scheme)
This repo focuses on:
- normalized units (gas per declared security-equivalent bit), and
- protocol-aware interpretation (weakest-link / envelope dominance).
This is designed to compare not only gas, but also what actually bounds security in protocol-aligned paths (envelopes, attestations, entropy dependencies).
Besides single-bench gas numbers, this repo also models end-to-end PQ readiness of real execution paths.
- Weakest-link security: for a pipeline record with depends_on, the effective security is the minimum across dependencies.
  - Example: AA/UserOp paths can be PQ at the wallet layer but still be bounded by the L1 envelope assumption.
- Entropy / attestation surfaces: measured protocol surfaces (RANDAO, relay attestation) with H_min denominators.
Reports:
- Mermaid diagram: spec/weakest_link.mmd
- Example (Falcon Cat5 bounded by ECDSA envelope): spec/weakest_link_falcon_ecdsa.mmd
This repository follows a single canonical source of truth model for benchmark data and reports.
To keep benchmarks comparable across implementations, this repo treats test vectors and calldata conventions as external, pinned artifacts.
Canonical packs live in pqevm-vector-packs (vectors + calldata shapes):
- repo: https://github.com/pipavlo82/pqevm-vector-packs
- purpose: single source of truth for (scheme, variant, packing, calldata) so different projects do not benchmark different conventions
This repo may reference packs via dataset metadata fields (e.g. vector_pack_ref, vector_pack_id, vector_id) when available.
- data/results.jsonl is the only canonical input.
- Each line is exactly one JSON object (JSONL).
- All edits, additions, and corrections must be done in data/results.jsonl only.
The following files are derived deterministically and must not be edited by hand:
- data/results.csv
- reports/summary.md
- reports/weakest_link_report.md
- reports/protocol_readiness.md
- docs/gas_per_secure_bit.svg
- docs/gas_per_secure_bit_big.svg
Charts are derived from data/results.csv and must not be edited by hand. They are rebuilt from data/results.jsonl.
To regenerate all derived files locally:
bash scripts/make_reports.sh
This script will:
- Rebuild data/results.csv from data/results.jsonl
- Validate JSONL integrity
- Enforce uniqueness of (scheme, bench_name, repo, commit, chain_profile)
- Generate all reports (including protocol readiness)
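The integrity and uniqueness checks could be approximated as follows. This is a sketch, not the logic inside scripts/make_reports.sh: it assumes the five key fields are readable at the top level of each record (the real dataset may nest repo and commit under provenance), and the sample records use made-up repo and commit values.

```python
import json
from typing import Iterable, List

KEY_FIELDS = ("scheme", "bench_name", "repo", "commit", "chain_profile")


def check_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL lines; raise ValueError on malformed lines or duplicate keys."""
    seen = {}
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(rec, dict):
            raise ValueError(f"line {lineno}: expected exactly one JSON object")
        key = tuple(rec.get(field) for field in KEY_FIELDS)
        if key in seen:
            raise ValueError(f"line {lineno}: duplicate of line {seen[key]}: {key}")
        seen[key] = lineno
        records.append(rec)
    return records


# Hypothetical records: same bench under two chain profiles is allowed.
base = {"scheme": "ecdsa", "bench_name": "ecdsa_verify_ecrecover_foundry",
        "repo": "example/repo", "commit": "abc123", "gas_verify": 21126}
sample = [
    json.dumps({**base, "chain_profile": "EVM/L1"}),
    json.dumps({**base, "chain_profile": "EVM/L2:arbitrum_one"}),
]
print(len(check_jsonl(sample)))  # 2
```

Appending a copy of the first line to `sample` would raise a `ValueError`, since all five key fields would then collide.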
Pipeline roles:
- scripts/parse_bench.py: ingestion, plus --regen, which rebuilds data/results.csv from data/results.jsonl
- scripts/make_reports.sh: runs sanity checks + regenerates all reports
- scripts/make_protocol_readiness.py: generates reports/protocol_readiness.md
- scripts/patch_protocol_readiness_*.py: inject pinned vendor snapshots into reports/protocol_readiness.md (markers: MLDSA65_VENDOR_*, FALCON_VENDOR_*, ETHDILITHIUM_VENDOR_*; invoked from scripts/make_reports.sh)
CI runs the same pipeline and fails if any generated file is not committed.
CI workflow: .github/workflows/reports.yml
It runs ./scripts/make_reports.sh and fails if git diff is non-empty.
git diff --stat
If the working tree is not clean after running make_reports.sh, the pull request will fail.
Multiple records may exist for the same benchmark under different chain_profile values (e.g. EVM/L1, EVM/L2:arbitrum_one).
These represent execution-equivalent measurements with different fee or threat-model contexts:
- EVM execution gas is assumed equal across profiles
- Data availability / calldata pricing differs by chain
To prevent "silent convention drift" across PQ verifier implementations, this repo includes a small XOF wiring vector suite that covers both common EVM-relevant approaches:
- FIPS SHAKE128/SHAKE256 (standard / precompile-friendly)
- Keccak-CTR-style XOF (EVM-constrained / gas-oriented)
- Spec: spec/xof_vector_suite.md
- Vectors: data/vectors/xof_vectors.json
- Verifier: scripts/verify_vectors.py
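To see why pinned vectors matter, the sketch below expands the same seed with FIPS SHAKE256 and with a counter-mode construction and shows that the two streams differ. The counter-mode function is a stand-in, not the Keccak-CTR wiring defined in spec/xof_vector_suite.md: Python's hashlib has no Ethereum keccak256, so SHA3-256 is substituted, and the 32-bit big-endian counter encoding is an assumption.

```python
import hashlib


def shake256_xof(seed: bytes, n: int) -> bytes:
    """FIPS 202 SHAKE256 output of length n bytes."""
    return hashlib.shake_256(seed).digest(n)


def ctr_xof(seed: bytes, n: int) -> bytes:
    """Counter-mode expansion: H(seed || ctr) blocks, concatenated and truncated.

    Stand-in only: SHA3-256 replaces keccak256, and the counter encoding
    (4 bytes, big-endian, starting at 0) is an assumption for this sketch.
    """
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha3_256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(out[:n])


seed = bytes(32)  # all-zero seed, for illustration
fips_stream = shake256_xof(seed, 64)
ctr_stream = ctr_xof(seed, 64)

print(fips_stream.hex())
print(ctr_stream.hex())
print("streams equal:", fips_stream == ctr_stream)  # streams equal: False
```

Two verifiers that each pass their own tests can still disagree on every derived coefficient if one uses the first construction and the other the second, which is the drift the vector suite is meant to catch.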
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python scripts/verify_vectors.py data/vectors/xof_vectors.json
A dedicated workflow validates vectors on every PR: .github/workflows/vectors.yml
Some protocol-level "surfaces" are now measured for gas on EVM/L1.
For these entries, gas is measured, while the security denominator (e.g., H_min) may still be a placeholder until the threat model is finalized.
Current measured surfaces:
- randao::l1_randao_mix_surface: gas = 5,820, H_min = 32 (placeholder)
- randao::mix_for_sample_selection_surface: gas = 13,081, H_min = 32 (placeholder) (randomness access for DAS sample selection)
- attestation::relay_attestation_surface: gas = 43,876, H_min = 128 (placeholder)
- das::verify_sample_512b_surface: gas = 2,464, das_sample_bits = 4096 (512B sample verification surface)
Reproduce measurements and refresh dataset + reports:
./scripts/run_protocol_surfaces.sh
./scripts/make_reports.sh
See also:
- reports/protocol_readiness.md (weakest-link readiness table)
- reports/weakest_link_report.md
- reports/summary.md
(Full-detail chart: docs/gas_per_secure_bit.svg)
NOTE: Charts are derived from data/results.csv (rebuilt from data/results.jsonl via ./scripts/make_reports.sh). If you change normalization conventions (e.g., ML-DSA-65 128 → 192), regenerate the dataset and charts.
Source of truth: data/results.jsonl (CSV is deterministically rebuilt by scripts/parse_bench.py --regen via ./scripts/make_reports.sh).
Normalization note: For ML-DSA-65 we report security_equiv_bits=192 (FIPS-204 Category 3 convention) in tables. Some raw vendor ingests may also record lambda_eff=128 as a budgeting denominator; those are clearly labeled as baseline and are not used for "secure-bit" comparisons.
| Scheme | Bench name | gas_verify | security_metric_value (bits) | gas / secure-bit |
|---|---|---|---|---|
| ECDSA | ecdsa_verify_ecrecover_foundry | 21,126 | 128 | 165.047 |
| ECDSA | ecdsa_erc1271_isValidSignature_foundry | 21,413 | 128 | 167.289 |
| ECDSA | ecdsa_verify_bytes65_foundry | 24,032 | 128 | 187.750 |
| Falcon | qa_getUserOpHash_foundry | 218,333 | 256 | 852.863 |
| ML-DSA-65 | preA_compute_w_fromPackedA_ntt_rho0_log | 1,499,354 | 192 | 7,809.135 |
| ML-DSA-65 | preA_compute_w_fromPackedA_ntt_rho1_log | 1,499,354 | 192 | 7,809.135 |
| Falcon | falcon_verifySignature_log | 10,336,055 | 256 | 40,375.215 |
| Falcon | qa_validateUserOp_userop_log | 10,589,132 | 256 | 41,363.797 |
| Falcon | qa_handleOps_userop_foundry | 10,966,076 | 256 | 42,836.234 |
| ML-DSA-65 | verify_poc_foundry | 68,901,612 | 192 | 358,862.563 |

Protocol surfaces (measured gas, provisional denominators):

| Scheme | Bench name | gas_verify | security_metric_value (bits) | gas / secure-bit |
|---|---|---|---|---|
| RANDAO | l1_randao_mix_surface | 5,820 | 32 (H_min) | 181.875 |
| RANDAO | mix_for_sample_selection_surface | 13,081 | 32 (H_min) | 408.781 |
| Attestation | relay_attestation_surface | 43,876 | 128 (H_min) | 342.781 |
| DAS | verify_sample_512b_surface | 2,464 | 4096 (das_sample_bits) | 0.602 |
Note: Protocol surfaces use security_metric_type=H_min; the current H_min values are declared placeholders until the threat model is pinned down. Gas numbers are measured; denominators are provisional. For surfaces, gas_verify denotes the measured gas of the surface operation/harness. For das_sample_bits, the denominator is not security bits but data size (512 bytes × 8 = 4096 bits), so this represents "gas per verified data bit" (budgeting), not "gas per secure-bit".
Dataset currently stores ML-DSA rows as lambda_eff=128; the 192-bit normalization is shown in reports vendor block / notes.
Notes:
- qa_handleOps_userop_foundry includes the full EIP-4337 pipeline (EntryPoint.handleOps), so it is not a "pure signature verify" cost.
- falcon_verifySignature_log is a clean verifySignature-only microbench extracted from QuantumAccount logs.
- verify_poc_foundry for ML-DSA-65 is a full decode + checks + w = A*z − c*t1 POC (FIPS-204 shape), built for correctness + reproducibility.
A reproducible benchmark lab with:
- Canonical dataset: data/results.jsonl (source of truth)
- Derived table: data/results.csv
- Schema/spec documents under spec/
- scripts/parse_bench.py --regen deterministically rebuilds data/results.csv from data/results.jsonl
- scripts/run_vendor_mldsa.sh: ML-DSA-65 (Foundry gas + log extraction for PreA)
- scripts/run_vendor_ethdilithium.sh: Dilithium (ZKNoxHQ/ETHDILITHIUM) benches
- scripts/run_vendor_quantumaccount.sh: QuantumAccount (Falcon) benches + log-based gas extraction
- scripts/run_ecdsa.sh: ECDSA baselines (ecrecover, bytes65 wrapper, ERC-1271)
- scripts/run_protocol_surfaces.sh: measures protocol surfaces gas (RANDAO mix + relay attestation), replaces old records, regenerates reports
- Weakest-link / envelope dominance case catalog and canonical graphs: spec/case_catalog.md, spec/case_graph.md
- Protocol readiness narrative: reports/protocol_readiness.md
- bench/: microbench contracts/tests for local measurement
- scripts/: runners + parsers
- data/: dataset outputs (CSV/JSONL)
- docs/: charts (SVG) derived from dataset
- spec/: definitions, methodology, case catalog, schema notes
- reports/: narrative reports connecting results to protocol constraints
Vendored repos may be stored under vendors/; provenance is always recorded per row. Vendor licensing remains upstream.
Tabular format: data/results.csv (derived from canonical data/results.jsonl)
Columns:
- ts_utc: timestamp UTC
- repo, commit: provenance of the implementation
- scheme: e.g., mldsa65, ecdsa, falcon1024, randao, attestation
- bench_name: benchmark identifier
- chain_profile: e.g., EVM/L1 (extendable to L2 profiles)
- gas_verify: gas used for the bench
- security_metric_type: e.g., security_equiv_bits (signatures) or H_min (randomness/VRF/protocol)
- security_metric_value: metric value in bits (e.g., 128 / 192 / 256)
- gas_per_secure_bit: computed as gas_verify / security_metric_value
- hash_profile: e.g., keccak256 or unknown
- notes: context + refs (runner, branch, extraction method)
Additional (optional) fields used for composed pipelines:
security_model— e.g.raworweakest_linkdepends_on— list of dependency record keys (scheme::bench_name) used to compute effective security
Right now (signature dataset) we primarily use:
- security_metric_type = security_equiv_bits
- security_metric_value ∈ {128.0, 192.0, 256.0}
For protocol surfaces:
- security_metric_type = H_min
- security_metric_value = declared min-entropy placeholder
- In data/results.jsonl, provenance is a nested JSON object, e.g.: {"repo":"ZKNoxHQ/ETHDILITHIUM","commit":"...","path":"vendor/ETHDILITHIUM"}.
- In data/results.csv, provenance is stored as a JSON string (CSV-escaped quotes). This is intentional: it stays parseable by standard CSV tooling + json.loads().
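The round trip between the two provenance encodings can be illustrated with the standard library alone. The two-column CSV and the scheme label `dilithium` are assumptions made to keep the sketch short; the provenance object is the example from the text, with the commit left elided as in the source.

```python
import csv
import io
import json

provenance = {"repo": "ZKNoxHQ/ETHDILITHIUM", "commit": "...", "path": "vendor/ETHDILITHIUM"}

# Write: the nested object becomes one JSON string cell; csv escapes the quotes.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["scheme", "provenance"])
writer.writeheader()
writer.writerow({"scheme": "dilithium", "provenance": json.dumps(provenance)})
print(buf.getvalue())

# Read: standard CSV parsing first, then json.loads() on the cell.
buf.seek(0)
row = next(csv.DictReader(buf))
restored = json.loads(row["provenance"])
print(restored["repo"])  # ZKNoxHQ/ETHDILITHIUM
```

The object survives unchanged, which is why the CSV keeps it as a single escaped cell instead of flattening it into extra columns.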
This repo separates:
- A scheme's security category (when applicable), and
- A declared security-equivalent bits normalization value used for comparisons: security_equiv_bits.
| Scheme | Security Category | security_equiv_bits | Notes |
|---|---|---|---|
| ECDSA (secp256k1) | - | 128 | classical security convention |
| ML-DSA-65 (FIPS-204) | Category 3 | 192 | classical-equivalent convention |
| Falcon-1024 | Category 5 | 256 | classical-equivalent convention |
Important: These are normalization conventions, not security proofs. The rule is that they are explicit and applied consistently.
Dilithium normalization is parameter-set dependent (e.g., Dilithium2/3/5). Until the vendor variant is pinned to a
declared set, ETHDILITHIUM rows are recorded with lambda_eff=128 for budgeting comparability.
When security_metric_type=lambda_eff, the resulting gas_per_secure_bit column should be interpreted as a budgeting
ratio (gas per assumed baseline), not a claim of equivalent classical security.
If you want "per 128-bit baseline" as a convenience view:
gas_per_128b = gas_verify / 128
Label it explicitly as baseline (not "secure-bit").
- Linux/WSL recommended
- git
- Foundry (forge)
- Python 3
From repo root:
cd /path/to/gas-per-secure-bit
# ECDSA (rows)
RESET_DATA=0 ./scripts/run_ecdsa.sh
# Protocol surfaces (RANDAO + relay) — measured + replace semantics
./scripts/run_protocol_surfaces.sh
# ML-DSA (rows)
RESET_DATA=0 MLDSA_REF="feature/mldsa-ntt-opt-phase12-erc7913-packedA" ./scripts/run_vendor_mldsa.sh
# QuantumAccount/Falcon (rows)
QA_REF=main RESET_DATA=0 ./scripts/run_vendor_quantumaccount.sh
# Regenerate derived artifacts (CSV + reports)
bash scripts/make_reports.sh
wc -l data/results.jsonl data/results.csv
tail -n 20 data/results.csv
Note: If you want to rebuild from scratch, delete data/results.jsonl and rerun the runners. Derived files (CSV + reports) will be regenerated automatically.
cut -d, -f4,5 data/results.csv | tail -n +2 | sort | uniq -c
python3 - <<'PY'
import csv
# Load the derived CSV and list benches from cheapest to most expensive per bit.
with open("data/results.csv", newline="") as f:
    rows = list(csv.DictReader(f))
rows.sort(key=lambda r: float(r["gas_per_secure_bit"]))
for r in rows:
    print(f'{r["scheme"]:10s} {r["bench_name"]:38s} gas={int(r["gas_verify"]):>9,d} gas/bit={float(r["gas_per_secure_bit"]):>12,.3f}')
PY
bash scripts/make_reports.sh
ls -la reports/
For benchmarks using precomputed A_ntt matrices, this repo follows a canonical calldata layout and provides on-chain execution proofs for reproducibility.
- PreA (packedA_ntt) convention: docs/preA_packedA_ntt.md
- On-chain proof runner: script/RunPreAOnChain.s.sol
# Terminal 1: Start local chain
anvil
# Terminal 2: Run on-chain proof script
forge script script/RunPreAOnChain.s.sol:RunPreAOnChain \
--rpc-url http://127.0.0.1:8545 \
--private-key $PK \
--broadcast -vv
Output:
gas_compute_w_fromPacked_A_ntt(rho0) 1499354
gas_compute_w_fromPacked_A_ntt(rho1) 1499354
This provides a wiring-consistency proof: the same packedA_ntt construction used in the microbench
is executed on-chain and produces identical rho0/rho1 measurements, with broadcast artifacts saved for audit.
Note: the on-chain runner script lives in the ML-DSA vendor repo and is executed via the pinned vendor runner; this repo records the resulting broadcast artifact for reproducibility.
See also:
- Broadcast artifact: vendors/ml-dsa-65-ethereum-verification/broadcast/RunPreAOnChain.s.sol/31337/run-latest.json
- Deployed runner contract: 0xe7f1725e7734ce288f8367e1bb143e90bb3f0512 (anvil, chainId=31337)
Vendor runners append measurements into data/results.jsonl with explicit provenance, then regenerate reports.
# ML-DSA-65 (pinned ref)
export MLDSA_REF=feature/mldsa-ntt-opt-phase12-erc7913-packedA
bash scripts/run_vendor_mldsa.sh
# Dilithium (ZKNoxHQ/ETHDILITHIUM, pinned commit by default in the runner)
bash scripts/run_vendor_ethdilithium.sh
# Falcon / QuantumAccount (if present)
bash scripts/run_vendor_quantumaccount.sh
# Rebuild CSV + reports (and inject vendor blocks)
bash scripts/make_reports.sh
ML-DSA-65:
- verify_poc_foundry: full decode + checks + w = A*z − c*t1 verify POC
- preA_compute_w_fromPackedA_ntt_rho{0,1}_log: compute_w microbench from packed A_ntt (PreA path)
ECDSA:
- ecdsa_verify_ecrecover_foundry
- ecdsa_verify_bytes65_foundry
- ecdsa_erc1271_isValidSignature_foundry
Ingested ETHDILITHIUM benches (EVM/L1 gas snapshots from the vendor repo):
- ethdilithium_eth_verify_log: Dilithium verify (ETH mode), gas extracted from logs
- ethdilithium_nist_verify_log: Dilithium verify (NIST mode), gas extracted from logs
- ethdilithium_p256verify_log: P-256 verify microbench (log-based) included by the vendor repo
Runner: scripts/run_vendor_ethdilithium.sh
Ingested QuantumAccount (Falcon) benches:
- qa_getUserOpHash_foundry: EntryPoint helper
- qa_handleOps_userop_foundry: end-to-end AA pipeline
- qa_validateUserOp_userop_log: account validation path (log-based gas)
- falcon_verifySignature_log: clean verifySignature-only microbench (log-based gas)
Local microbench copy: bench/falcon/Falcon_GasMicro.t.sol
Protocol surfaces:
- randao::l1_randao_mix_surface: Foundry gas harness (measured)
- randao::mix_for_sample_selection_surface: Foundry gas harness (measured)
- attestation::relay_attestation_surface: Foundry gas harness (measured)
- das::verify_sample_512b_surface: Foundry gas harness (measured)
- NIST FIPS-204 (ML-DSA): https://csrc.nist.gov/pubs/fips/204/final
- ZKNoxHQ:
  - ETHFALCON: https://github.com/ZKNoxHQ/ETHFALCON
  - ETHDILITHIUM: https://github.com/ZKNoxHQ/ETHDILITHIUM
- Paul Angus (Falcon discussions):
  - EthResearch profile: https://ethresear.ch/u/paulangusbark
- Falcon reference site: https://falcon-sign.info
- EIP-4337 (EntryPoint / AA): https://eips.ethereum.org/EIPS/eip-4337
- EIP-1271 (contract wallet signatures): https://eips.ethereum.org/EIPS/eip-1271
- QuantumAccount (Falcon1024 AA stack): https://github.com/Cointrol-Limited/QuantumAccount
- Foundry: https://getfoundry.sh/
- Harden spec text in spec/gas_per_secure_bit.md (definitions, assumptions, reporting rules).
- Add more schemes: Dilithium, BLS, other PQ candidates relevant to EVM.
- Expand "weakest-link" catalog with more protocol cases and explicit attacker models.
- Add VRF / randomness objects with explicit H_min denominators under stated trust models.
- Add L2 profiles (chain_profile) and standardize reporting across L1/L2.
Converge dataset schema + methodology into a draft spec others can reuse:
- reproducible runners,
- canonical case catalog,
- comparable benchmark definitions,
- explicit security normalization rules.
See LICENSE (and vendor repo licenses where applicable). This repository records benchmark artifacts and provenance; vendor code remains licensed by upstream.
This is an experimental benchmarking lab. Results are not a security proof. Use the data as comparative engineering evidence under explicitly stated assumptions.
Maintained by Pavlo Tvardovskyi (GitHub: @pipavlo82)
If you use this repository (methodology, dataset schema, runners, or benchmarks) in research or production evaluation, please cite it as:
Pavlo Tvardovskyi, gas-per-secure-bit (GitHub repository), 2025.
https://github.com/pipavlo82/gas-per-secure-bit
For reproducibility, cite a tag or commit hash.
Ethereum's BLS aggregation provides scalability via algebraic structure, but practical verification still includes linear work to reconstruct aggregate public keys from participation bitfields. Post-quantum signature families lose this algebraic aggregation property, pushing the system toward proof-based aggregation (recursive SNARKs or folding / accumulation schemes).
As a result, "gas per verify" alone is insufficient: engineering decisions require surface-aware, security-normalized benchmarks across L1/L2/AA verification surfaces and, eventually, PQ aggregation proof verification surfaces.