Describe the bug
Repeated ref.func in a tiny hot loop appears to be much slower in Wasmtime than in Wasmer Cranelift.
After reduction, I got a minimal reproducer that preserves essentially the same gap, plus two close controls where the gap disappears. The evidence points specifically to the per-iteration ref.func path, not to the loop scaffold or the reference sink.
test_cases.zip
Primary reproducer:
primary_reproducer_ref_func_hotloop.wat
Supporting controls:
supporting_control_ref_func_hoisted.wat
supporting_control_ref_null_hotloop.wat
Test Case
Primary reproducer loop body:
ref.func $f0
global.set $g0
local.get $i
i64.const 1
i64.sub
local.tee $i
i64.const 0
i64.ne
br_if $body
The reduced reproducer uses:
- trip count: 2^30
- one declared function $f0
- one mutable funcref global sink
- one declarative element entry so that ref.func remains valid
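For readers without the attachment, the pieces listed above could be assembled roughly as follows. This is a hypothetical reconstruction, not the attached file: the output file name, the `_start` export, and the `ref.null func` initializer for the global are assumptions made only to get a complete module around the loop body shown earlier.

```shell
# Write a hypothetical stand-in for the primary reproducer.
# The file name is chosen so it does not overwrite the attached testcase.
cat > sketch_ref_func_hotloop.wat <<'EOF'
(module
  ;; the one declared function that ref.func points at
  (func $f0)
  ;; mutable funcref global used as the sink (initial value is an assumption)
  (global $g0 (mut funcref) (ref.null func))
  ;; declarative element entry so that ref.func $f0 validates
  (elem declare func $f0)
  ;; entry point name is an assumption
  (func (export "_start")
    (local $i i64)
    i64.const 1073741824 ;; 2^30 iterations
    local.set $i
    (loop $body
      ref.func $f0
      global.set $g0
      local.get $i
      i64.const 1
      i64.sub
      local.tee $i
      i64.const 0
      i64.ne
      br_if $body)))
EOF
```

The attached primary_reproducer_ref_func_hotloop.wat remains the authoritative testcase; the sketch only shows how the listed ingredients fit together.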
Matched controls:
- same loop shape, but use a hoisted prebuilt non-null reference via global.get $g0
- same loop shape, but replace ref.func with ref.null func
Steps to Reproduce
- Build the primary testcase:
wat2wasm primary_reproducer_ref_func_hotloop.wat -o primary_reproducer_ref_func_hotloop.wasm
- Warm up once:
wasmtime primary_reproducer_ref_func_hotloop.wasm
- Measure runtime:
perf stat -r 3 -e 'task-clock' wasmtime primary_reproducer_ref_func_hotloop.wasm
- Run the same flow on the two supporting controls above.
- For comparison with Wasmer Cranelift:
wasmer run primary_reproducer_ref_func_hotloop.wasm
perf stat -r 3 -e 'task-clock' wasmer run primary_reproducer_ref_func_hotloop.wasm
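The same flow can be applied to all three testcases in one pass. The following is a minimal sketch, assuming the .wat files from test_cases.zip are in the current directory; the helper names `run` and `bench_all` and the `DRY_RUN` switch are hypothetical and only there so the command list can be inspected before anything is executed.

```shell
#!/bin/sh
# Sketch: build, warm up, and measure each testcase with both runtimes.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

bench_all() {
  for name in \
    primary_reproducer_ref_func_hotloop \
    supporting_control_ref_func_hoisted \
    supporting_control_ref_null_hotloop
  do
    run wat2wasm "$name.wat" -o "$name.wasm"                 # build
    run wasmtime "$name.wasm"                                # warm up
    run perf stat -r 3 -e task-clock wasmtime "$name.wasm"   # measure Wasmtime
    run wasmer run "$name.wasm"                              # warm up
    run perf stat -r 3 -e task-clock wasmer run "$name.wasm" # measure Wasmer
  done
}

bench_all
```

Running it with `DRY_RUN=0` performs the actual measurements; the commands themselves are the ones listed in the steps above.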
Expected and actual Results
Primary reproducer and close controls
| testcase | wasmer_cranelift (s) | wasmtime (s) | ratio |
|---|---|---|---|
| primary_reproducer_ref_func_hotloop | 2.7668 | 38.2027 | 13.81x |
| supporting_control_ref_func_hoisted | 0.5029 | 0.5266 | 1.05x |
| supporting_control_ref_null_hotloop | 0.4083 | 0.4468 | 1.09x |
Observed pattern:
- the primary reproducer is dramatically slower in Wasmtime than in Wasmer Cranelift;
- hoisting the non-null function reference out of the loop collapses the gap;
- replacing ref.func with ref.null func also collapses the gap.
This makes the trigger look very specifically tied to repeated hot-loop ref.func.
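To put the gap in per-iteration terms, the hoisted control can be subtracted from the primary reproducer and the difference divided by the trip count. This is a rough estimate only: it assumes the control has the same 2^30 trip count and that process startup and loop scaffold cost are the same in both runs. The helper name `per_iter_ns` is hypothetical.

```shell
# Extra time per iteration attributable to ref.func, in nanoseconds.
# Arguments: hot-loop time (s), control time (s), trip count.
per_iter_ns() {
  awk -v hot="$1" -v ctl="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", (hot - ctl) / n * 1e9 }'
}

TRIPS=1073741824  # 2^30, from the reproducer description

# Times are taken from the table above (primary reproducer vs. hoisted control).
echo "wasmtime: $(per_iter_ns 38.2027 0.5266 "$TRIPS") ns per iteration"
echo "wasmer:   $(per_iter_ns 2.7668 0.5029 "$TRIPS") ns per iteration"
```

With the measured times this comes out to roughly 35 ns per iteration for Wasmtime versus about 2 ns for Wasmer Cranelift.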
Family-level consistency
The original generated ref.func seeds showed the same shape:
| testcase | wasmer_cranelift (s) | wasmtime (s) | ratio |
|---|---|---|---|
| ref_func_1 | 2.9594 | 38.5392 | 13.02x |
| ref_func_2 | 2.7539 | 38.8498 | 14.11x |
A related mixed testcase from the ref.is_null family also showed the same gap only when the loop used ref.func to create the non-null input each iteration:
| testcase | wasmer_cranelift (s) | wasmtime (s) | ratio |
|---|---|---|---|
| ref_is_null_2 (ref.func + ref.is_null) | 2.9624 | 39.3673 | 13.29x |
| hoisted non-null control for ref.is_null | 0.6419 | 0.6855 | 1.07x |
So the ref.is_null outlier seems to be explained by the same repeated-ref.func trigger, rather than by ref.is_null itself.
Versions and Environment
- Wasmtime version: wasmtime 41.0.0 (4898322a4 2025-12-18)
- Host OS: Ubuntu 22.04.5 LTS x64
- Architecture: x86_64
- CPU: 12th Gen Intel(R) Core(TM) i7-12700
Extra Info
I also checked the Wasmtime CLIF for the reduced reproducer to make sure the hot loop is still present and has not been optimized away.
The hot loop still performs a per-iteration builtin call:
v6 = call fn0(v0, v32)
store notrap aligned table v6, v0+96
where fn0 is wasmtime_builtin_ref_func.
That builtin still performs a deeper indirect runtime call with extra frame/return-address bookkeeping:
v3 = get_frame_pointer.i64
store notrap aligned v3, v2+40
v4 = get_return_address.i64
store notrap aligned v4, v2+48
v7 = call_indirect sig0, v6(v0, v1)
In contrast, the hoisted control's hot loop is just a load/store path without wasmtime_builtin_ref_func in the loop:
v5 = load.i64 notrap aligned table v0+96
store notrap aligned table v5, v0+112
I have not confirmed the internal root cause, so I’m only reporting the measured trigger pattern:
- repeated ref.func in a tiny hot loop;
- slowdown remains after reduction to a minimal reproducer;
- the gap disappears when ref.func is removed from the loop;
- the gap also disappears for repeated ref.null func.