Skip to content

core/vm, core, miner: fix interrupt propagation to nested EVM calls#2092

Open
lucca30 wants to merge 4 commits intodevelopfrom
lmartins/interrupt-on-struct-field-to-address-nested-call-interruption
Open

core/vm, core, miner: fix interrupt propagation to nested EVM calls#2092
lucca30 wants to merge 4 commits intodevelopfrom
lmartins/interrupt-on-struct-field-to-address-nested-call-interruption

Conversation

@lucca30
Copy link
Contributor

@lucca30 lucca30 commented Feb 27, 2026

Description

Problem

The block-building interrupt flag (interruptBlockBuilding) is not propagated to nested EVM calls. When the interpreter executes opcodes that make nested calls — CALL, DELEGATECALL, STATICCALL, CALLCODE, CREATE — they all pass nil for the interrupt parameter to evm.Run(). This causes Run() to create a dummy new(atomic.Bool) that never fires, making nested contract execution completely uninterruptible.

Since virtually all real transactions involve at least one nested call (proxy patterns, DEX routers, multi-hop swaps), the per-opcode interrupt check is effectively bypassed for the bulk of EVM execution time.

Real-world impact

Block 83,527,864 on Polygon PoS mainnet took 5.654s to build instead of ~2s. The interrupt timer fired on schedule (~1.5s after block building started), but a transaction executing a complex nested call (proxy/DEX pattern) ran for an additional ~4.1 seconds because the interrupt couldn't reach the nested execution. The block was announced 3.7 seconds late.

Root cause

The interrupt flag flows through this chain for the top-level call:

worker.commitTransaction → core.ApplyTransaction → ApplyTransactionWithEVM
  → ApplyMessageNoFeeBurnOrTip → st.execute(interrupt)
    → evm.Call(..., interrupt) → evm.Run(contract, input, false, interrupt) ✓ PASSED

But when opcodes make nested calls, they all pass nil:

instructions.go  opCall:         evm.Call(..., nil)       // ← nil!
evm.go           CallCode:       evm.Run(..., nil)        // ← nil!
evm.go           DelegateCall:   evm.Run(..., nil)        // ← nil!
evm.go           StaticCall:     evm.Run(..., nil)        // ← nil!
evm.go           Create:         evm.Run(..., nil)        // ← nil!

When interrupt is nil, interpreter.go creates a dummy new(atomic.Bool) that is always false — the nested execution never checks the real interrupt flag.

Test: proving the bug exists

TestInterruptNestedCall in core/vm/runtime/runtime_test.go demonstrates the issue with two sub-tests:

  • TopLevel: An infinite loop at the top level, interrupted after 50ms. Proves the interrupt mechanism works for flat code.
  • NestedCall: The same infinite loop inside a STATICCALL. On pre-fix code, the interrupt is ignored inside the nested call.

Pre-fix results (commit 029c58680)

To reproduce the bug, checkout the test commit before the fix and run:

git checkout 029c58680
go test -run TestInterruptNestedCall -v ./core/vm/runtime/
=== RUN   TestInterruptNestedCall/TopLevel
--- PASS: (0.05s)
=== RUN   TestInterruptNestedCall/NestedCall
    runtime_test.go:204: Bug confirmed: nested call took 1.12s (interrupt ignored inside STATICCALL)
--- PASS: (1.12s)

The nested call took 1.12 seconds — it ran until gas exhaustion, completely ignoring the interrupt set at 50ms.

Post-fix results (commit c01a363ec)

git checkout c01a363ec
go test -run TestInterruptNestedCall -v ./core/vm/runtime/
=== RUN   TestInterruptNestedCall/TopLevel
--- PASS: (0.05s)
=== RUN   TestInterruptNestedCall/NestedCall
    runtime_test.go:201: Fix verified: nested call interrupted in 51ms
--- PASS: (0.05s)

The nested call now stops in ~51ms — a 22x improvement, matching the top-level interrupt latency.

Fix

Move interrupt from a function parameter to a field on the EVM struct. This ensures all call depths — top-level, nested, and deeply nested — share the same interrupt flag.

// New field on EVM struct
type EVM struct {
    // ...
    interrupt *atomic.Bool
}

// Set once at block-building start
func (evm *EVM) SetInterrupt(interrupt *atomic.Bool) {
    evm.interrupt = interrupt
}

// Run() reads from the struct — all call depths see the same flag
func (evm *EVM) Run(contract *Contract, input []byte, readOnly bool) (ret []byte, err error) {
    interrupt := evm.interrupt
    if interrupt == nil {
        interrupt = new(atomic.Bool)
    }
    for {
        if interrupt.Load() { return nil, ErrInterrupt }
        // ... execute opcode ...
    }
}

The interrupt parameter is removed from Call(), Run(), and the entire ApplyTransaction chain (29 files, removing parameter threading through ApplyMessage, ApplyMessageNoFeeBurnOrTip, execute(), etc.). The miner sets the interrupt once via evm.SetInterrupt(&w.interruptBlockBuilding) in makeEnv.

Performance validation

An earlier optimization replaced context.Done() with atomic.Bool for the per-opcode interrupt check, yielding a 25x throughput improvement (10.34 ns/op → 0.41 ns/op). We validated that our refactor does not regress this optimization.

Why there should be zero impact

Both pre-fix and post-fix produce identical hot-path code in the interpreter loop:

// Both versions: local variable used in the hot loop
interrupt := /* parameter (pre-fix) | evm.interrupt (post-fix) */
for {
    if interrupt.Load() { ... }  // ← same local *atomic.Bool in both cases
}

The only difference is one extra struct field read at function entry (evm.interrupt), which happens once per Run() call — negligible compared to millions of opcode iterations per transaction.

BenchmarkSimpleLoop comparison (100M gas, 6 runs, 5s benchtime)

                                       │   pre-fix   │              post-fix              │
                                       │   sec/op    │   sec/op     vs base               │
SimpleLoop/loop-100M                      133.6m ± 0%   134.5m ± 0%  +0.63% (p=0.002 n=6)
SimpleLoop/call-reverting-100M            465.3m ± 0%   466.5m ± 0%  +0.26% (p=0.026 n=6)
SimpleLoop/call-nonexist-100M             348.4m ± 0%   350.7m ± 0%  +0.67% (p=0.002 n=6)
SimpleLoop/call-EOA-100M                  312.9m ± 0%   315.5m ± 0%  +0.82% (p=0.002 n=6)
SimpleLoop/call-identity-100M             315.8m ± 1%   321.5m ± 0%  +1.80% (p=0.002 n=6)
SimpleLoop/staticcall-identity-100M       127.3m ± 1%   130.6m ± 1%  +2.58% (p=0.002 n=6)
SimpleLoop/loop2-100M                     155.2m ± 0%   166.8m ± 0%  +7.51% (p=0.002 n=6)

Memory allocations: unchanged across all benchmarks.

Interpretation: The critical loop-100M benchmark (pure opcode loop, most sensitive to per-opcode overhead) shows +0.63% — well within system noise. Most benchmarks fall in the 0.2–1.8% range. The loop2-100M outlier at +7.5% is attributed to thermal throttling: benchmarks were run sequentially (~4 minutes each), and the consistent slight increase across all benchmarks (never a decrease) is a classic sign of CPU thermal effects from back-to-back benchmark suites on Apple Silicon. The per-opcode interrupt check code is literally identical in both versions — the local *atomic.Bool variable and .Load() call produce the same machine code.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

No breaking changes

…truct field

The block-building interrupt flag was threaded as a function parameter
through Call(), Run(), ApplyTransaction(), ApplyMessage(), and execute().
This made it easy to pass nil at nested call sites, which is exactly what
happened — DelegateCall, StaticCall, CallCode, Create, and even the CALL
opcode all passed nil, silently disabling per-opcode interrupt checks for
nested execution.

Instead, store the interrupt once on the EVM struct via SetInterrupt()
and read it in Run(). This removes the parameter from the entire call
chain (29 files) and ensures the interrupt is checked at every opcode
regardless of call depth.
@lucca30 lucca30 requested review from a team, manav2401 and pratikspatil024 February 27, 2026 17:01
@sonarqubecloud
Copy link

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 68.62745% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.63%. Comparing base (fdbc857) to head (6514435).

Files with missing lines Patch % Lines
core/vm/evm.go 37.50% 5 Missing ⚠️
core/parallel_state_processor.go 0.00% 2 Missing ⚠️
eth/tracers/api.go 66.66% 0 Missing and 2 partials ⚠️
core/state_processor.go 87.50% 1 Missing ⚠️
core/vm/instructions.go 0.00% 1 Missing ⚠️
eth/gasestimator/gasestimator.go 0.00% 1 Missing ⚠️
eth/state_accessor.go 0.00% 1 Missing ⚠️
eth/tracers/api_bor.go 0.00% 1 Missing ⚠️
tests/bor/helper.go 0.00% 1 Missing ⚠️
tests/state_test_util.go 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (68.62%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2092      +/-   ##
===========================================
+ Coverage    50.59%   50.63%   +0.03%     
===========================================
  Files          875      875              
  Lines       151820   151824       +4     
===========================================
+ Hits         76815    76872      +57     
+ Misses       69929    69879      -50     
+ Partials      5076     5073       -3     
Files with missing lines Coverage Δ
accounts/abi/bind/backends/simulated.go 60.68% <100.00%> (ø)
consensus/bor/statefull/processor.go 0.00% <ø> (ø)
core/chain_makers.go 51.99% <100.00%> (ø)
core/state_prefetcher.go 91.56% <100.00%> (+0.10%) ⬆️
core/state_transition.go 70.00% <100.00%> (ø)
core/vm/interpreter.go 54.93% <100.00%> (+0.27%) ⬆️
core/vm/runtime/runtime.go 89.13% <100.00%> (ø)
internal/ethapi/api.go 39.70% <100.00%> (ø)
miner/worker.go 67.70% <100.00%> (-0.38%) ⬇️
core/state_processor.go 64.00% <87.50%> (ø)
... and 9 more

... and 19 files with indirect coverage changes

Files with missing lines Coverage Δ
accounts/abi/bind/backends/simulated.go 60.68% <100.00%> (ø)
consensus/bor/statefull/processor.go 0.00% <ø> (ø)
core/chain_makers.go 51.99% <100.00%> (ø)
core/state_prefetcher.go 91.56% <100.00%> (+0.10%) ⬆️
core/state_transition.go 70.00% <100.00%> (ø)
core/vm/interpreter.go 54.93% <100.00%> (+0.27%) ⬆️
core/vm/runtime/runtime.go 89.13% <100.00%> (ø)
internal/ethapi/api.go 39.70% <100.00%> (ø)
miner/worker.go 67.70% <100.00%> (-0.38%) ⬇️
core/state_processor.go 64.00% <87.50%> (ø)
... and 9 more

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant