perf: execution and tracegen rewrite #1567

jonathanpwang · 2025-04-10T04:51:47Z

To be squash merged

What's Changed

See changelog for details.

The STARK backend `MultiStarkVerifyingKey`s for all existing VM configs remain unchanged between v1.3.0 and this PR, with the exception of the `RootVerifierProvingKey` noted below. However, the `AppVerifyingKey` has a different binary serialization due to the removal of the `as_offset` field from `MemoryDimensions`. This workflow shows that, after converting to account for this change:

- all `AppVerifyingKey`s remain unchanged
- all STARK aggregation vkeys except the Root Verifier vkey remain unchanged
- the Root Verifier vkey's trace height constraints were updated to fix a missing permutation. This changes the pre-vkey hash of the Root Verifier vkey, which affects the Halo2 static verifier's initial Fiat-Shamir transcript. However, this change does not impact the security of existing Root Verifier vkeys generated through the SDK, because Root Verifier proofs have fixed trace heights and these trace heights have been statically checked to satisfy all trace height constraints.
- the `Halo2Verifier.sol` Solidity verifier contract has changed between v1.3 and this PR: `Halo2Config::default()` now uses `verifier_k = 23` instead of `verifier_k = 24`, for a smaller default Halo2 circuit. The initial Fiat-Shamir transcript state also changed due to the Root Verifier vkey change above.

Note: this PR is not targeting `main`. I've used `TODO` and `TEMP` to mark places in code that will need to be cleaned up before merging to `main`.

Beginning the refactor of online memory to allow different host types in different address spaces. This is going to touch a lot of APIs. The focus is on stabilizing APIs; currently this PR will not improve performance. Not all tests will pass, because I have intentionally disabled some logging required for trace generation. Only execution tests will pass (or run the execute benchmark).

In future PR(s):
- [ ] make `Memory` trait for execution read/write API
- [ ] better handling of type conversions for memory image
- [ ] replace the underlying memory implementation with other implementations like mmap

Towards INT-3743

Even with wasteful conversions, execution is faster:
- Before: https://github.com/openvm-org/openvm/actions/runs/14318675080
- After: https://github.com/openvm-org/openvm/actions/runs/14371335248?pr=1559

Not merging to `main`. Add a `GuestMemory` trait and implement it for `AddressMap`. We are moving more towards a trait-based style, to reuse code when different types of memory might be swapped out.
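
Here is a minimal sketch of the trait-based style described above. The trait signature, the `ToyAddressMap` type, and the `copy_word` helper are hypothetical stand-ins, not the actual `GuestMemory`/`AddressMap` API from this PR. The sketch only shows that code written against the trait keeps working when the memory backend is swapped.

```rust
use std::collections::HashMap;

/// Hypothetical read/write API keyed by (address space, pointer).
/// The real `GuestMemory` trait in this PR may use different signatures.
pub trait GuestMemory {
    /// Reads `N` bytes starting at `ptr` in address space `addr_space`.
    fn read<const N: usize>(&self, addr_space: u32, ptr: u32) -> [u8; N];
    /// Writes `N` bytes starting at `ptr` in address space `addr_space`.
    fn write<const N: usize>(&mut self, addr_space: u32, ptr: u32, data: [u8; N]);
}

/// Toy stand-in for `AddressMap`: one sparse byte map over all address spaces.
#[derive(Default)]
pub struct ToyAddressMap {
    cells: HashMap<(u32, u32), u8>,
}

impl GuestMemory for ToyAddressMap {
    fn read<const N: usize>(&self, addr_space: u32, ptr: u32) -> [u8; N] {
        let mut out = [0u8; N];
        for (i, byte) in out.iter_mut().enumerate() {
            // Cells that were never written read as zero.
            *byte = *self.cells.get(&(addr_space, ptr + i as u32)).unwrap_or(&0);
        }
        out
    }

    fn write<const N: usize>(&mut self, addr_space: u32, ptr: u32, data: [u8; N]) {
        for (i, byte) in data.into_iter().enumerate() {
            self.cells.insert((addr_space, ptr + i as u32), byte);
        }
    }
}

/// Generic over the backend: any `GuestMemory` implementation can be plugged in.
pub fn copy_word<M: GuestMemory>(mem: &mut M, addr_space: u32, src: u32, dst: u32) {
    let word: [u8; 4] = mem.read(addr_space, src);
    mem.write(addr_space, dst, word);
}
```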

- make `VmSegmentExecutor` generic on `<Mem, Ctx, Ctrl>`, where:
  - `Mem`: struct that implements `GuestMemory`
  - `Ctx`: struct that stores host context during execution
  - `Ctrl`: struct that implements pre/post segment execution hooks, the termination condition, and instruction execution logic
- add `TracegenVmSegmentExecutor`, which implements the current execution flow
- move segmentation strategies to a new file
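
Below is a minimal sketch of how a segment executor generic over `<Mem, Ctx, Ctrl>` might be wired. It assumes a toy instruction format and a hypothetical `SegmentCtrl` trait that stands in for whatever `Ctrl` implements in the actual code. The controller owns the segment hooks, the termination check, and instruction execution, so the loop itself never needs to know the concrete memory or context types.

```rust
/// Hypothetical instruction: an opcode plus three operands.
#[derive(Clone, Copy, Debug)]
pub struct Instruction { pub opcode: u8, pub a: u32, pub b: u32, pub c: u32 }

/// Hypothetical controller trait playing the role described for `Ctrl`.
pub trait SegmentCtrl<Mem, Ctx> {
    fn on_segment_start(&mut self, _mem: &mut Mem, _ctx: &mut Ctx) {}
    fn on_segment_end(&mut self, _mem: &mut Mem, _ctx: &mut Ctx) {}
    /// Termination condition for the current segment.
    fn should_suspend(&self, pc: u32, ctx: &Ctx) -> bool;
    /// Executes one instruction and returns the next pc.
    fn execute_instruction(&mut self, pc: u32, ins: &Instruction, mem: &mut Mem, ctx: &mut Ctx) -> u32;
}

pub struct SegmentExecutor<Mem, Ctx, Ctrl> { pub mem: Mem, pub ctx: Ctx, pub ctrl: Ctrl }

impl<Mem, Ctx, Ctrl: SegmentCtrl<Mem, Ctx>> SegmentExecutor<Mem, Ctx, Ctrl> {
    /// Runs until the controller asks to suspend or the program runs out; returns the final pc.
    pub fn execute_segment(&mut self, program: &[Instruction], mut pc: u32) -> u32 {
        self.ctrl.on_segment_start(&mut self.mem, &mut self.ctx);
        while !self.ctrl.should_suspend(pc, &self.ctx) {
            let Some(ins) = program.get((pc / 4) as usize) else { break };
            pc = self.ctrl.execute_instruction(pc, ins, &mut self.mem, &mut self.ctx);
        }
        self.ctrl.on_segment_end(&mut self.mem, &mut self.ctx);
        pc
    }
}

/// Toy controller: opcode 0 is ADD on a register file; suspend after a cycle budget.
pub struct BudgetCtrl { pub max_cycles: u64 }

impl SegmentCtrl<Vec<u32>, u64> for BudgetCtrl {
    fn should_suspend(&self, _pc: u32, cycles: &u64) -> bool {
        *cycles >= self.max_cycles
    }
    fn execute_instruction(&mut self, pc: u32, ins: &Instruction, regs: &mut Vec<u32>, cycles: &mut u64) -> u32 {
        if ins.opcode == 0 {
            regs[ins.a as usize] = regs[ins.b as usize].wrapping_add(regs[ins.c as usize]);
        }
        *cycles += 1;
        pc + 4
    }
}
```

Swapping `BudgetCtrl` for a different controller changes segmentation and execution behavior without touching `execute_segment`.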

- delete the `Vm{Adapter,Core}Chip` traits
- no more records; directly use the trace buffer
- the `jal_lui` chip demonstrates the new changes, with working unit tests
- changed the unit tester
- [x] need to add some dummy volatile memory to the tester to balance based on touched addresses
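
The "no more records" approach can be sketched as follows. Rows are written straight into a preallocated, row-major trace buffer instead of first being collected as record structs. The column layout, the use of `u32` cells (a real trace would hold field elements), and the power-of-two padding are all assumptions made for illustration.

```rust
/// Hypothetical column count for a toy LUI-like chip: [from_pc, rd, imm, result].
pub const WIDTH: usize = 4;

/// Writes one row directly into its slice of the trace buffer (no intermediate record).
pub fn fill_row(row: &mut [u32], from_pc: u32, rd: u32, imm: u32) {
    debug_assert_eq!(row.len(), WIDTH);
    row[0] = from_pc;
    row[1] = rd;
    row[2] = imm;
    row[3] = imm << 12; // LUI-style result: immediate placed in the upper bits
}

/// Allocates the whole trace once (height padded to a power of two) and fills rows in place.
pub fn generate_trace(executions: &[(u32, u32, u32)]) -> Vec<u32> {
    let height = executions.len().next_power_of_two();
    let mut trace = vec![0u32; height * WIDTH];
    for (row, &(pc, rd, imm)) in trace.chunks_exact_mut(WIDTH).zip(executions) {
        fill_row(row, pc, rd, imm);
    }
    trace // padding rows are left as zeros
}
```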

…1590)
- introduce a new generic `InsExecutorE1` trait
- add `InsExecutor::execute_e1` for rv32im instructions
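
Here is a hedged sketch of what a per-instruction E1 (pure execution) executor trait could look like. The shape of `InsExecutorE1` below is assumed for illustration and is not the trait's real signature. In particular, the `VmState` with registers and `pc` and the register indices passed as arguments are inventions of this sketch. It executes rv32im `MUL` (low 32 bits of the product) with no trace bookkeeping.

```rust
/// Minimal state for pure execution: registers and pc only, no tracing.
pub struct VmState { pub regs: [u32; 32], pub pc: u32 }

/// Hypothetical E1 executor trait; the real `InsExecutorE1` may look different.
pub trait InsExecutorE1 {
    fn execute_e1(&self, state: &mut VmState, rd: usize, rs1: usize, rs2: usize);
}

/// Toy rv32im MUL executor: rd = low 32 bits of rs1 * rs2.
pub struct MulExecutor;

impl InsExecutorE1 for MulExecutor {
    fn execute_e1(&self, state: &mut VmState, rd: usize, rs1: usize, rs2: usize) {
        let value = state.regs[rs1].wrapping_mul(state.regs[rs2]);
        if rd != 0 {
            state.regs[rd] = value; // x0 is hardwired to zero
        }
        state.pc = state.pc.wrapping_add(4);
    }
}
```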

…1589) Co-authored-by: Alexander Golovanov

- fix some loadstore tests
- remove records
- wrap unsafe memory reads/writes in safe wrappers

Co-authored-by: Jonathan Wang

Closes INT-3839

Co-authored-by: Ayush Shukla

- make `Rv32HintStoreChip` use the `NewVmChipWrapper`
- rename `SingleTraceStep` to `TraceStep` and update it to work for chips whose execution creates multiple trace rows
- comment out criterion execute benchmarks for now
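
To illustrate what "execution creates multiple trace rows" means for a `TraceStep`, here is a sketch that assumes a hypothetical trait whose `execute` returns the number of rows it filled. The toy `HintBufferStep` writes one row per hinted word, loosely mirroring a hint-store style instruction. Its column layout is invented.

```rust
/// Hypothetical trace-step trait: one instruction may fill several rows.
pub trait TraceStep {
    const WIDTH: usize;
    /// Executes one instruction, writes its rows at the start of `trace`, returns rows used.
    fn execute(&mut self, trace: &mut [u32]) -> usize;
}

/// Toy step: copies `num_words` words from a hint stream, one trace row per word.
pub struct HintBufferStep { pub hints: Vec<u32>, pub num_words: usize, pub dst_ptr: u32 }

impl TraceStep for HintBufferStep {
    const WIDTH: usize = 3; // [is_first_row, write_ptr, word]

    fn execute(&mut self, trace: &mut [u32]) -> usize {
        assert!(trace.len() >= self.num_words * Self::WIDTH, "trace buffer too small");
        let words: Vec<u32> = self.hints.drain(..self.num_words).collect();
        for (i, (row, word)) in trace.chunks_exact_mut(Self::WIDTH).zip(words).enumerate() {
            row[0] = (i == 0) as u32; // marks the first row of this instruction
            row[1] = self.dst_ptr + 4 * i as u32;
            row[2] = word;
        }
        self.num_words
    }
}
```

A caller can then advance its row offset by the returned count before executing the next instruction.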

One-line fix: now that we only initialize `TracingMemory` with `new`, we should remove this line from `with_image`.

remove `memory/offline.rs` as we aren't using it anymore. Delete `VmAdapterChip` trait and `VmChipWrapper` since we also aren't using them anymore.

Made the rv32im tests pass and gave all the test files the same testing interface. Deleted `test_adapter`. All the test cases are kept unchanged. The only test case still commented out is the `store` test to address space 4. It fails because memory accesses with block size 4 are currently not supported in address space 4.

All the test files have 3 types of tests: positive, negative, and sanity tests. All the test files have 2 helper functions: `create_test_chip` and `set_and_execute`.

An important thing to notice about negative tests that expect an interaction failure (i.e., a `ChallengePhase` error) is that an imbalance might be created for the wrong reasons. For example, there might be an imbalance on the range checker bus created by these interactions:
- [send 1] (sent from the chip_air)
- [receive 2] (the execution called `add_count(2)` at some point)

This is not a "valid" failure, since 1 is still within the range of the range checker. Because of this, a manual check is needed for all the negative tests. To see all the imbalances that occurred during a test, remove the `disable_debug_builder();` line from the `run_negative_test` function and run the test. I am 95% sure that I went through all the negative tests and checked that the imbalances that occurred are correct.

`test_adapter` tried to address this issue by getting rid of interaction imbalances on the memory bus. However, a manual check was still necessary even with `test_adapter`. To solve this, I suggest that we somehow record all the interactions that occur during the test and automatically check that an invalid interaction actually happened on a specified bus.

Resolves INT-3975

Co-authored-by: Ayush Shukla
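
The suggestion above is to record every interaction and check that an invalid one actually occurred on a specified bus. It could be prototyped roughly as below. `InteractionLog` and `has_out_of_range_imbalance` are hypothetical test helpers and are not part of the existing test harness. They model the range checker as a bus whose first message field must be below `2^max_bits`. With the example from the text (send 1, receive 2), the bus is imbalanced, but no value is out of range, so the helper reports that the failure is not a "valid" one.

```rust
use std::collections::HashMap;

/// Hypothetical log of bus interactions recorded during a test.
#[derive(Default)]
pub struct InteractionLog {
    /// (bus index, message) -> net multiplicity (sends add, receives subtract).
    net: HashMap<(usize, Vec<u32>), i64>,
}

impl InteractionLog {
    pub fn send(&mut self, bus: usize, msg: Vec<u32>, count: i64) {
        *self.net.entry((bus, msg)).or_insert(0) += count;
    }

    pub fn receive(&mut self, bus: usize, msg: Vec<u32>, count: i64) {
        *self.net.entry((bus, msg)).or_insert(0) -= count;
    }

    /// Messages on `bus` whose sends and receives do not cancel out, sorted for stable output.
    pub fn imbalances(&self, bus: usize) -> Vec<(Vec<u32>, i64)> {
        let mut out: Vec<(Vec<u32>, i64)> = self
            .net
            .iter()
            .filter(|(key, n)| key.0 == bus && **n != 0)
            .map(|(key, n)| (key.1.clone(), *n))
            .collect();
        out.sort();
        out
    }
}

/// A negative range-check test is "valid" only if some imbalanced value is actually out of range.
pub fn has_out_of_range_imbalance(log: &InteractionLog, range_bus: usize, max_bits: u32) -> bool {
    log.imbalances(range_bus)
        .iter()
        .any(|(msg, _)| msg.first().map_or(false, |&v| u64::from(v) >= 1u64 << max_bits))
}
```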

Fixed an error in the divrem negative tests: the trace pranking was done incorrectly. Two instructions were being called each time (so the trace had height 2), but only one of the rows was being modified. Changed it so that only one instruction is called each time. Also made `setup_tracing` the default.

Implemented e1 and e3 for the HeapBranch, Heap, and VecHeap adapters, and updated the Bigint circuit correspondingly. This required some changes to the interfaces of the rv32im Steps. In particular:
- Changed the Reads type `([u8; N], [u8; N])` to `Into<[[u8; N]; 2]>` and the Writes type `[u8; N]` to `From<[[u8; N]; 1]>`. This mirrors what we did with the previous integration API to make the interfaces match.
- Removed `TraceAdapterContext` in a lot of places. The same Step can use different AdapterSteps that require different TraceContexts, and an AdapterStep might even require a `TraceContext` that the Step doesn't have. The easy solution was to implement AdapterSteps the way the previous integration API did, i.e., by adding the necessary fields to the AdapterStep structs. I think deleting `TraceContext` from the interface might make sense, but I am not sure whether there is a better way to do this.

Important note: the tests don't run right now. Many read/write operations are done in address space 2 with block size 32, but the memory currently only supports block size 4.

Resolves INT-3980
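
Here is a small sketch of a core step written against the `Into<[[u8; N]; 2]>` / `From<[[u8; N]; 1]>` bounds described above. `add_core` and `RegPair` are hypothetical. The point is that any adapter read type convertible into two limb arrays, and any write type constructible from one, can be plugged into the same core logic.

```rust
/// Hypothetical core step: little-endian limb-wise addition with carry,
/// generic over the adapter's read (`R`) and write (`W`) types.
pub fn add_core<R, W, const N: usize>(reads: R) -> W
where
    R: Into<[[u8; N]; 2]>,
    W: From<[[u8; N]; 1]>,
{
    let [a, b]: [[u8; N]; 2] = reads.into();
    let mut out = [0u8; N];
    let mut carry = 0u16;
    for i in 0..N {
        let sum = a[i] as u16 + b[i] as u16 + carry;
        out[i] = sum as u8; // keep the low byte
        carry = sum >> 8; // propagate overflow to the next limb
    }
    W::from([out])
}

/// Toy adapter read type that converts into the shape the core expects.
pub struct RegPair<const N: usize>(pub [u8; N], pub [u8; N]);

impl<const N: usize> From<RegPair<N>> for [[u8; N]; 2] {
    fn from(p: RegPair<N>) -> Self {
        [p.0, p.1]
    }
}
```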

Closes INT-4013

Resolves INT-3801.
- Added memory access adapters. To improve:
  - allocate the trace buffer once before filling it, as opposed to pushing to a `Vec` as is done now;
  - maybe call `get_f` less often (although I don't know how to avoid it in general).
- Added volatile and persistent boundary chip tracegen.
- Added Merkle chip tracegen as described [here](https://docs.google.com/document/d/12cH7ZYRFWHgflpPzOILb7bg5XExdyWOL4vwrQ9HFGkQ/edit?tab=t.0#heading=h.hrg0oexxgu9). To improve:
  - parallelize at least something;
  - maybe support passing this struct between segments.
- `VmChipTestBuilder` now has `::default_persistent`, so all tests in `extensions/rv32im/circuit` pass with both the volatile and persistent memory interfaces.

`cargo` complains that `uuid` has a conflicting checksum.

I was handling the creation of new blocks incorrectly when `align > initial_block_size`; now it is hopefully correct. Also added persistent base ALU tests (although nothing changed for the persistent case), and added to each of them a dummy access that used to fail.

codspeed-hq · 2025-05-15T14:35:41Z

CodSpeed Instrumentation Performance Report

Merging #1567 will not alter performance

_Comparing feat/new-execution (0c44b81) with feat/new-execution (d572c40)_

Summary

✅ 24 untouched benchmarks

This resolves INT-4012 by not using memory controller's memory in E1 execution.

Implemented e1 and e3 for the `VecHeapTwoReads` and `eq_mod` rv32 adapters. Implemented e1 and e3 for mod-builder. Updated the `algebra` and `ecc` extensions accordingly. Deleted all the pairing chips. All the tests run successfully. Also added back the address space 4 loadstore tests. Resolves INT-3914

- add codspeed walltime measurement job
- tweak execution benchmarks to be heavier and more representative

…sts) (#1659) This resolves INT-3913. As a _side effect_, this removes the `GuestMemory` trait: it is now a struct with an underlying `AddressMap<PAGE_SIZE>`. I didn't make `type GuestMemory = ...`, because the generically named `read` and `write` methods would be too vague on `AddressMap`. `VmStateMut` is generic over `MEM`, though. I didn't fully implement `TraceStep` and `StepExecutorE1` for the phantom chip. The chip is relatively simple, and I'm not sure it would be better expressed in terms of `NewVmChipWrapper`. `PhantomSubExecutor` also changed a little: for example, it now accepts `u32` instead of `F`, and `GuestMemory` instead of what it needed before.
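
Here is a rough sketch of a concrete (non-trait) guest memory backed by paged storage. It assumes a sparse map from page index to zero-initialized page for each address space. `PagedMemory` and `ToyGuestMemory` are hypothetical and only approximate the idea of a struct wrapping `AddressMap<PAGE_SIZE>`. The real types and method signatures may differ.

```rust
use std::collections::HashMap;

/// Toy paged storage: per address space, a sparse map from page index to a zeroed page.
pub struct PagedMemory<const PAGE_SIZE: usize> {
    spaces: Vec<HashMap<usize, Box<[u8; PAGE_SIZE]>>>,
}

impl<const PAGE_SIZE: usize> PagedMemory<PAGE_SIZE> {
    pub fn new(num_spaces: usize) -> Self {
        Self { spaces: (0..num_spaces).map(|_| HashMap::new()).collect() }
    }

    /// Bytes on pages that were never allocated read as zero.
    pub fn get(&self, addr_space: usize, ptr: usize) -> u8 {
        self.spaces[addr_space]
            .get(&(ptr / PAGE_SIZE))
            .map_or(0, |page| page[ptr % PAGE_SIZE])
    }

    /// Allocates the page lazily on first write.
    pub fn set(&mut self, addr_space: usize, ptr: usize, value: u8) {
        let page = self.spaces[addr_space]
            .entry(ptr / PAGE_SIZE)
            .or_insert_with(|| Box::new([0u8; PAGE_SIZE]));
        page[ptr % PAGE_SIZE] = value;
    }

    pub fn pages_touched(&self, addr_space: usize) -> usize {
        self.spaces[addr_space].len()
    }
}

/// Hypothetical concrete wrapper exposing typed reads/writes over the paged map.
pub struct ToyGuestMemory {
    pub memory: PagedMemory<4096>,
}

impl ToyGuestMemory {
    pub fn read<const N: usize>(&self, addr_space: usize, ptr: usize) -> [u8; N] {
        std::array::from_fn(|i| self.memory.get(addr_space, ptr + i))
    }

    pub fn write<const N: usize>(&mut self, addr_space: usize, ptr: usize, data: [u8; N]) {
        for (i, byte) in data.into_iter().enumerate() {
            self.memory.set(addr_space, ptr + i, byte);
        }
    }
}
```

Because the wrapper is a concrete struct, its `read`/`write` names stay scoped to guest-memory semantics, which matches the motivation given for not aliasing the type.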

breaking change: `SystemConfig::default()` now has `continuation_enabled: true` (previously it was `false`). closes INT-4526 INT-4525

I was going to rename `max_segment_length`, but it was in too many places of the CI so I didn't bother.

Copilot

Pull Request Overview

This PR introduces a comprehensive rewrite of the VM execution and trace generation architecture, specifically focusing on performance improvements through interpreter-based execution and optimized tracegen. The changes introduce multiple execution modes (E1/E2/E3) and redesign the chip complex system to support different backends and arenas.

Key changes:

- Introduces a new interpreted execution system with pre-computed function pointers for performance
- Redesigns the VM extension system with separate traits for execution, circuit, and prover extensions
- Restructures memory configuration and adds support for multiple address spaces with different cell types
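
The "pre-computed function pointers" idea can be sketched with a toy instruction set. Opcodes are decoded into handler pointers once, before execution starts, so the hot loop makes an indirect call instead of running a `match` on every step. The opcodes, `State`, and handler names are hypothetical and unrelated to the actual OpenVM interpreter types.

```rust
/// Toy machine state.
pub struct State { pub regs: [u32; 8], pub pc: usize, pub halted: bool }

/// Pre-decoded instruction: the handler is resolved once, ahead of execution.
#[derive(Clone, Copy)]
pub struct PreInsn { pub handler: fn(&mut State, [usize; 3]), pub operands: [usize; 3] }

fn op_add(s: &mut State, [rd, rs1, rs2]: [usize; 3]) {
    s.regs[rd] = s.regs[rs1].wrapping_add(s.regs[rs2]);
    s.pc += 1;
}
fn op_addi(s: &mut State, [rd, rs1, imm]: [usize; 3]) {
    s.regs[rd] = s.regs[rs1].wrapping_add(imm as u32);
    s.pc += 1;
}
fn op_sub(s: &mut State, [rd, rs1, rs2]: [usize; 3]) {
    s.regs[rd] = s.regs[rs1].wrapping_sub(s.regs[rs2]);
    s.pc += 1;
}
fn op_bnez(s: &mut State, [rs, _, target]: [usize; 3]) {
    s.pc = if s.regs[rs] != 0 { target } else { s.pc + 1 };
}
fn op_halt(s: &mut State, _: [usize; 3]) {
    s.halted = true;
}

/// Decodes every opcode into a handler pointer up front; an unknown opcode yields None.
pub fn precompute(program: &[(u8, [usize; 3])]) -> Option<Vec<PreInsn>> {
    program
        .iter()
        .map(|&(opcode, operands)| {
            let handler: fn(&mut State, [usize; 3]) = match opcode {
                0 => op_add,
                1 => op_addi,
                2 => op_sub,
                3 => op_bnez,
                4 => op_halt,
                _ => return None,
            };
            Some(PreInsn { handler, operands })
        })
        .collect()
}

/// Hot loop: no decoding, just an indirect call through the pre-computed pointer.
pub fn run(pre: &[PreInsn], state: &mut State) {
    while !state.halted {
        let insn = pre[state.pc];
        (insn.handler)(state, insn.operands);
    }
}
```

For example, a countdown loop that sums 5 + 4 + 3 + 2 + 1 into `r2` is decoded once and then runs without touching the opcode table again.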

Reviewed Changes

Copilot reviewed 157 out of 518 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `crates/vm/src/arch/interpreter.rs` | New interpreter implementation with pre-computed function pointers for E1/E2 execution |
| `crates/vm/src/arch/integration_api.rs` | Redesigned integration API with new trace filler patterns and chip wrapper abstractions |
| `crates/vm/src/arch/hasher/mod.rs` | Updated hasher trait to be thread-safe with `Send + Sync` bounds |
| `crates/vm/src/arch/extensions.rs` | Major restructuring of VM extension system with separate execution/circuit/prover traits |
| `crates/vm/src/arch/execution_mode/` | New execution mode system with pure, metered, and metered cost variants |
| `crates/vm/src/arch/execution.rs` | Updated execution traits and error handling for new interpreter system |
| `crates/vm/src/arch/config.rs` | Enhanced configuration system with address space management and memory cell types |
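
Here is a hedged sketch of how execution modes such as pure, metered, and metered cost might share one execution loop through a context trait. `ExecutionCtx`, the per-chip row counts, and the width-weighted cost model are assumptions invented for this example. They are not the definitions in `execution_mode/`.

```rust
/// Hypothetical per-instruction hook; each execution mode provides its own implementation.
pub trait ExecutionCtx {
    fn on_instruction(&mut self, chip_idx: usize, rows: usize);
}

/// Pure mode: no bookkeeping at all.
pub struct Pure;

impl ExecutionCtx for Pure {
    fn on_instruction(&mut self, _chip_idx: usize, _rows: usize) {}
}

/// Metered mode: track trace heights per chip (e.g. to decide segment boundaries).
pub struct Metered { pub heights: Vec<usize> }

impl ExecutionCtx for Metered {
    fn on_instruction(&mut self, chip_idx: usize, rows: usize) {
        self.heights[chip_idx] += rows;
    }
}

/// Metered-cost mode: fold rows into one scalar, weighted by a made-up per-chip width.
pub struct MeteredCost { pub widths: Vec<usize>, pub cost: usize }

impl ExecutionCtx for MeteredCost {
    fn on_instruction(&mut self, chip_idx: usize, rows: usize) {
        self.cost += rows * self.widths[chip_idx];
    }
}

/// One generic loop, monomorphized per mode, so `Pure` carries no metering overhead.
/// Each entry is (chip index, rows produced by that instruction).
pub fn execute<C: ExecutionCtx>(ops: &[(usize, usize)], ctx: &mut C) {
    for &(chip_idx, rows) in ops {
        ctx.on_instruction(chip_idx, rows);
    }
}
```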

Closes INT-4644, INT-4645, INT-4650, INT-4648 (links are checked automatically by vocs)

Closes INT-4569

Testing:
- Verified on a clean Ubuntu install
- Verified on a clean macOS install with `xcode-select --install` already run

Closes INT-4585

Closes INT-4790. Usage:
```
cargo openvm --verbose keygen
cargo openvm --verbose prove app
```

@nyunyunyunyu

Reduces memory fragmentation by returning pages to OS with some delay. Credited to @nyunyunyunyu

- emit a warning if `max_constraint_degree` doesn't match between `VmConfig` and `StarkFriConfig`

github-actions · 2025-08-19T15:42:48Z

| group | app.proof_time_ms | app.cycles | app.cells_used | leaf.proof_time_ms | leaf.cycles | leaf.cells_used |
| --- | --- | --- | --- | --- | --- | --- |
| verify_fibair | (+1082 [+104.2%]) 2,120 | (+322700 [+inf%]) 322,700 | (+1410672 [+8.1%]) 18,750,324 | - | - | - |
| fibonacci | 2,406 | (+1500210 [+inf%]) 1,500,210 | (+915004 [+1.8%]) 51,504,507 | (+982 [+31.5%]) 4,096 | (+1248019 [+inf%]) 1,248,019 | (+1051762 [+1.5%]) 70,886,320 |
| regex | (+501 [+7.0%]) 7,656 | (+4108597 [+inf%]) 4,108,597 | (-1784464 [-1.1%]) 164,734,992 | (-1013 [-8.2%]) 11,400 | (+3326652 [+inf%]) 3,326,652 | (-55882973 [-18.6%]) 244,539,630 |
| ecrecover | (+400 [+39.4%]) 1,416 | (+140487 [+inf%]) 140,487 | (+638816 [+7.8%]) 8,866,654 | (+317 [+3.0%]) 10,848 | (+2934903 [+inf%]) 2,934,903 | (+3647772 [+1.5%]) 247,226,574 |
| pairing | (+117 [+3.1%]) 3,942 | (+1882939 [+inf%]) 1,882,939 | (+846438 [+0.9%]) 98,834,293 | (-2287 [-29.7%]) 5,415 | (+2010444 [+inf%]) 2,010,444 | (-57513491 [-28.0%]) 148,011,675 |
| fib_e2e | 20,930 | 12,000,210 | 410,933,839 | 24,705 | 7,462,425 | 441,087,103 |
| kitchen_sink | 15,589 | 153,644 | 904,738,232 | 23,599 | 7,904,011 | 769,363,478 |

Commit: f73da4d

Benchmark Workflow

jonathanpwang force-pushed the feat/new-execution branch from ea0bdd9 to ea3775c Compare April 14, 2025 02:04

jonathanpwang force-pushed the main branch from 9901980 to 61194f6 Compare May 2, 2025 17:24

jonathanpwang and others added 12 commits May 2, 2025 10:58

feat: GuestMemory trait (#1574)

01c028d

feat(new-execution): add execute functions for rv32im instructions (#…

13d6234

feat(new-execution): AdapterTraceStep trait and rv32 ALU refactor (#…

b85bfc2

feat(new-execution): rv32im tracegen and e1 execution (#1607)

bcde3f8

fix(new-execution): fix some rv32im loadstore tests (#1611)

ae9635c

feat: access adapters (#1614)

e885ec0

fix(new-execution): make rv32im hintstore work (#1616)

8c2d6af

fix(new-execution): don't override min block size (#1619)

6c86876

fix: auipc tracegen

4a2ac13

jonathanpwang force-pushed the feat/new-execution branch from 6f764a4 to 4a2ac13 Compare May 2, 2025 18:07

jonathanpwang and others added 9 commits May 2, 2025 22:52

chore: remove OfflineMemory (#1623)

e9cabd2

feat(new-execution): add codspeed execution benchmark (#1643)

9174577

fix: Cargo lock & format (#1650)

65ab533

feat(new-execution): trigger codspeed ci on branch pushes (#1654)

164eb0e

Golovanov399 and others added 3 commits May 15, 2025 10:37

feat: new execution remove redundant controller memory e1 (#1653)

327fe86

feat(new-execution): measure walltime in codspeed ci job (#1656)

cf8bcdc

jonathanpwang added 2 commits August 15, 2025 08:40

feat(sdk): update interfaces (#1962)

e83064b

chore(benchmark): add segment_max_cells to benchmark CLI args (#1977)

ae8f139

jonathanpwang changed the title ~~perf(DON'T MERGE): [WIP] execution and tracegen rewrite~~ perf: execution and tracegen rewrite Aug 17, 2025

Merge branch 'main' into feat/new-execution

18f133c

jonathanpwang marked this pull request as ready for review August 17, 2025 00:11

Copilot AI review requested due to automatic review settings August 17, 2025 00:11

Copilot AI reviewed Aug 17, 2025

openvm-org deleted a comment from github-actions bot Aug 17, 2025

feat(cli): add openvm version to STARK/EVM proof jsons (#1986)

335f0b3

yi-sun added 3 commits August 17, 2025 18:05

feat: switch docs into vocs (#1929)

b6bbef4

feat: add --evm flag for cargo openvm setup and bump versions (#1987

f6dad57

feat: add clean install prereqs for Ubuntu and Mac (#1988)

290c601

chore: move some logging to debug level (#1991)

74a0760

feat(cli): add --verbose mode (#1994)

4db999b

docs(changelog): document change in vk binary format (#1998)

30221fb

ci(bench): tune jemalloc conf for execute benchmarks (#1999)

f2a225d

jonathanpwang and others added 2 commits August 19, 2025 08:10

docs: update changelog on halo2 verifier contract (#2001)

120f6db

fix(new-execution): warn if max_constraint_degree differ (#1979)

f73da4d

jonathanpwang merged commit 3fd48e2 into main Aug 19, 2025
47 checks passed

jonathanpwang deleted the feat/new-execution branch August 19, 2025 15:51

jonathanpwang restored the feat/new-execution branch August 19, 2025 15:52
