<Performance> fuzzbug: Repeated ref.func in a tiny hot loop is much slower in Wasmtime than in Wasmer Cranelift #13295

@gaaraw

Describe the bug

Repeated ref.func in a tiny hot loop appears to be much slower in Wasmtime than in Wasmer Cranelift.

After reduction, I got a minimal reproducer that preserves essentially the same gap, plus two close controls where the gap disappears. The evidence points specifically to the per-iteration ref.func path, not to the loop scaffold or the reference sink.

test_cases.zip

Primary reproducer:

  • primary_reproducer_ref_func_hotloop.wat

Supporting controls:

  • supporting_control_ref_func_hoisted.wat
  • supporting_control_ref_null_hotloop.wat

Test Case

Primary reproducer loop body:

ref.func $f0
global.set $g0
local.get $i
i64.const 1
i64.sub
local.tee $i
i64.const 0
i64.ne
br_if $body

The reduced reproducer uses:

  • trip count: 2^30
  • one declared function $f0
  • one mutable funcref global sink
  • one declarative element entry so that ref.func remains valid

Matched controls:

  • same loop shape, but use a hoisted prebuilt non-null reference via global.get $g0
  • same loop shape, but replace ref.func with ref.null func

Steps to Reproduce

  1. Build the primary testcase:
     wat2wasm primary_reproducer_ref_func_hotloop.wat -o primary_reproducer_ref_func_hotloop.wasm
  2. Warm up once:
     wasmtime primary_reproducer_ref_func_hotloop.wasm
  3. Measure runtime:
     perf stat -r 3 -e 'task-clock' wasmtime primary_reproducer_ref_func_hotloop.wasm
  4. Run the same flow on the two supporting controls above.
  5. For comparison with Wasmer Cranelift:
     wasmer run primary_reproducer_ref_func_hotloop.wasm
     perf stat -r 3 -e 'task-clock' wasmer run primary_reproducer_ref_func_hotloop.wasm
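
The steps above can be scripted so that all three testcases go through the same flow. This is a minimal sketch: `bench_case` and `bench_all` are hypothetical helper names, and the script assumes the three `.wat` files from test_cases.zip sit in the current directory. The commands themselves are the ones listed in the steps, unchanged.

```shell
#!/bin/sh
# Sketch only: bench_case and bench_all are illustrative names, not part of any tool.

# Build, warm up, and time one testcase under both runtimes.
bench_case() {
    name=$1
    # Step 1: build the .wasm from the .wat source.
    wat2wasm "$name.wat" -o "$name.wasm" || return 1
    # Steps 2-3: one warm-up run, then three timed runs under Wasmtime.
    wasmtime "$name.wasm"
    perf stat -r 3 -e 'task-clock' wasmtime "$name.wasm"
    # Step 5: same flow under Wasmer.
    wasmer run "$name.wasm"
    perf stat -r 3 -e 'task-clock' wasmer run "$name.wasm"
}

# Step 4: repeat the flow for the primary reproducer and both controls.
bench_all() {
    for name in \
        primary_reproducer_ref_func_hotloop \
        supporting_control_ref_func_hoisted \
        supporting_control_ref_null_hotloop
    do
        bench_case "$name" || return 1
    done
}
```

Calling `bench_all` from the directory that holds the `.wat` files runs the whole comparison; `perf stat` writes its summaries to stderr.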

Expected and Actual Results

Primary reproducer and close controls

testcase                               wasmer_cranelift (s)   wasmtime (s)   ratio (wasmtime / wasmer)
primary_reproducer_ref_func_hotloop    2.7668                 38.2027        13.81x
supporting_control_ref_func_hoisted    0.5029                 0.5266         1.05x
supporting_control_ref_null_hotloop    0.4083                 0.4468         1.09x

Observed pattern:

  • the primary reproducer is dramatically slower in Wasmtime than in Wasmer Cranelift;
  • hoisting the non-null function reference out of the loop collapses the gap;
  • replacing ref.func with ref.null func also collapses the gap.

This suggests that the trigger is specifically the ref.func executed on every iteration of the hot loop.
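
For reference, the ratio column is the Wasmtime time divided by the Wasmer time. The sketch below recomputes it from the measured seconds in the table; `ratio` is a hypothetical helper name, and it relies only on standard `awk`.

```shell
# Sketch only: ratio is an illustrative helper name.
# Usage: ratio <wasmer_seconds> <wasmtime_seconds>
ratio() {
    awk -v wasmer="$1" -v wasmtime="$2" \
        'BEGIN { printf "%.2fx\n", wasmtime / wasmer }'
}

ratio 2.7668 38.2027   # primary reproducer   -> 13.81x
ratio 0.5029 0.5266    # hoisted control      -> 1.05x
ratio 0.4083 0.4468    # ref.null control     -> 1.09x
```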

Family-level consistency

The original generated ref.func seeds showed the same shape:

testcase      wasmer_cranelift (s)   wasmtime (s)   ratio (wasmtime / wasmer)
ref_func_1    2.9594                 38.5392        13.02x
ref_func_2    2.7539                 38.8498        14.11x

A related mixed testcase from the ref.is_null family also showed the same gap only when the loop used ref.func to create the non-null input each iteration:

testcase                                    wasmer_cranelift (s)   wasmtime (s)   ratio (wasmtime / wasmer)
ref_is_null_2 (ref.func + ref.is_null)      2.9624                 39.3673        13.29x
hoisted non-null control for ref.is_null    0.6419                 0.6855         1.07x

So the ref.is_null outlier seems to be explained by the same repeated-ref.func trigger, rather than by ref.is_null itself.

Versions and Environment

  • Wasmtime version: wasmtime 41.0.0 (4898322a4 2025-12-18)
  • Host OS: Ubuntu 22.04.5 LTS x64
  • Architecture: x86_64
  • CPU: 12th Gen Intel(R) Core(TM) i7-12700

Extra Info

I also checked the Wasmtime CLIF for the reduced reproducer to confirm that the ref.func work in the loop is not optimized away.

The hot loop still performs a per-iteration builtin call:

v6 = call fn0(v0, v32)
store notrap aligned table v6, v0+96

where fn0 is wasmtime_builtin_ref_func.

That builtin still performs a deeper indirect runtime call with extra frame/return-address bookkeeping:

v3 = get_frame_pointer.i64
store notrap aligned v3, v2+40
v4 = get_return_address.i64
store notrap aligned v4, v2+48
v7 = call_indirect sig0, v6(v0, v1)

In contrast, the hoisted control's hot loop is just a load/store path without wasmtime_builtin_ref_func in the loop:

v5 = load.i64 notrap aligned table v0+96
store notrap aligned table v5, v0+112

I have not confirmed the internal root cause, so I’m only reporting the measured trigger pattern:

  • repeated ref.func in a tiny hot loop;
  • slowdown remains after reduction to a minimal reproducer;
  • the gap disappears when ref.func is removed from the loop;
  • the gap also disappears for repeated ref.null func.
