diff --git a/AGENTS.md b/AGENTS.md
index 592ad76..444799e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1263,6 +1263,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR
 `v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex
 resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134,
 superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131,
-closing #29), with the version bump on the release PR. There is no active milestone; the
+closing #29), with the version bump on the release PR.
+
+A post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` and
+unreleased, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed bugs) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** — the `asan`
+preset now sets `-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and
+exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per-fixture robustness
+(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and
+an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated
+(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass).
+`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the
 highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU
 evidence on a PMU-capable microarchitecture).
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 98f8da8..0f9dcb0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,55 @@ All notable changes to this project. The format is loosely based on
 
 _Nothing yet._
 
+## [0.2.2] - 2026-06-24
+
+A security/robustness **hardening** wave plus two measured order-book **performance** wins, driven by
+a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed bugs) and flamegraph-guided
+optimization. Same honesty bar: a deterministic C++20 exchange simulator and cross-language
+differential-testing harness — **not** a production exchange, no real-market connectivity, no latency
+or profitability claims, not formal verification. Determinism preserved throughout (fixtures
+byte-identical across g++/clang++ and vs the committed copies; the OCaml differential passes).
+`make check`/`make asan` 270/270.
+
+### Fixed
+
+- **Reject out-of-domain enum bytes in the decoders (#136).** `replay::decode_command` (NewLimit /
+  NewMarket) and `protocol::decode_reject` cast raw bytes to `Side` / `TimeInForce` / `RejectReason`
+  without validating the domain. Because the replay path applies decoded commands straight to the
+  engine with no gateway risk check, a corrupt log record could silently desynchronize replayed state.
+  Both now validate via `core::is_valid` (added `is_valid(RejectReason)`) and refuse out-of-domain
+  bytes like a malformed frame.
+- **Network-path hardening (#137, #140, #143).** The TCP gateway now retries `EINTR` in its
+  read/write paths and survives transient `accept()` errors (`EINTR`/`ECONNABORTED`) instead of
+  tearing the listener down; both the threaded acceptor (back-off retry) and the epoll loop (listener
+  disarm/re-arm) survive fd exhaustion (`EMFILE`/`ENFILE`); a `TcpServerOptions::max_active_connections`
+  cap sheds load; the epoll loop bounds accepts per tick for fairness; and `UdpPublisher` counts
+  `send_failures` rather than silently dropping datagrams.
+- **CLI argument validation (#141).** `qsl-client`, `qsl-mdfeed`, and `qsl-export-fixture` parse
+  numeric arguments with `std::from_chars` and reject malformed / out-of-range input with a usage
+  message and non-zero exit, instead of `std::terminate` (from an uncaught `std::sto*` exception) or
+  silently truncating an out-of-range port.
+- **UBSan gate now actually fails (#142).** The `asan` preset adds `-fno-sanitize-recover=undefined`
+  so UBSan **aborts** on the first violation. It previously ran in recover mode (print a diagnostic,
+  exit 0), so a pure-UBSan defect passed `make asan` / CI green. The tree is UBSan-clean under the
+  strict gate.
+- **OCaml `diff_report` robustness (#144).** The differential-bundle bin guards each fixture
+  (catching `Stream_parser.Parse_error` / `Sys_error`) so one malformed or unreadable fixture cannot
+  abort the whole batch and silently lose the divergence bundles for the rest.
+
+### Performance
+
+- **`try_emplace` for baseline price levels (#138).** `OrderBook::level_for` used
+  `std::map::emplace`, which allocates and frees a node even when the price level already exists.
+  `try_emplace` avoids that on the steady-state common path. Measured back-to-back A/B on the
+  `qsl-bench profile` workload: **~+5%**.
+- **Order-index hash load-factor cap (#145).** The `OrderId → Locator` index is the busiest structure
+  on the engine hot path (1–4 point lookups per op). Capping its `max_load_factor` at 0.25 shortens
+  collision chains. Measured A/B: **~+18.6%**. Determinism is unaffected because the index is never iterated
+  for output.
+- **Flamegraph regenerated (#135, #139, #146)** against the new code, now a dense (~20k-sample),
+  fully-symbolized frame-pointer profile with zero `[unknown]` frames.
+
 ## [0.2.1] - 2026-06-21
 
 Two backlog items — reprioritized by the maintainer and delivered — plus a resume-anchor and
diff --git a/CLAUDE.md b/CLAUDE.md
index 0b0cabd..60e126e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1207,6 +1207,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR
 `v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex
 resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134,
 superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131,
-closing #29), with the version bump on the release PR. There is no active milestone; the
+closing #29), with the version bump on the release PR.
+
+A post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` and
+unreleased, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed bugs) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** — the `asan`
+preset now sets `-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and
+exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per-fixture robustness
+(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and
+an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated
+(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass).
+`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the
 highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU
 evidence on a PMU-capable microarchitecture).
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 19fc3b1..83d5b0b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,5 +1,5 @@
 cmake_minimum_required(VERSION 3.24)
-project(quant-systems-lab VERSION 0.2.1 LANGUAGES CXX)
+project(quant-systems-lab VERSION 0.2.2 LANGUAGES CXX)
 
 set(CMAKE_CXX_STANDARD 20)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d5e4a5e..1fba07a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -20,7 +20,7 @@ reviewable.
 
 ```bash
 make check                 # clang-format check + build + tests
-make asan                  # AddressSanitizer + UBSan build and tests
+make asan                  # AddressSanitizer + UBSan build and tests (UBSan aborts on first violation)
 dune runtest --root ocaml  # OCaml log verifier + independent replay + differential + mutation tests
 ```
 
diff --git a/HANDOFF.md b/HANDOFF.md
index 00271c9..0594dc6 100644
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -33,8 +33,19 @@ partial-PMU reframe, and a full documentation staleness sweep — landed as PR #
 **v0.2.1 release** then adds two reprioritized backlog items and a consistency sweep: a Codex
 resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #130,
 issue #32), the FIX-like text protocol adapter (PR #131, issue #29), and the version-bump release
-PR — merged in that order, with `v0.2.1` tagged on the release merge commit. There is no active
-milestone; the project is between releases.
+PR — merged in that order, with `v0.2.1` tagged on the release merge commit.
+
+A **post-v0.2.1 hardening + perf wave (#135–#146) is merged to `main` and
+unreleased**, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed bugs) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** —
+`-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and exited 0 (#142);
+OCaml `diff_report` robustness (#144). Perf (measured A/B): `try_emplace` for baseline price levels
+(~+5%, #138) and an order-index hash load-factor cap (~+18.6%, #145), with the flamegraph regenerated
+(#135/#139/#146). `make check`/`make asan` 270/270 (the latter now under the real UBSan gate). The
+next action is to finish this `v0.2.2` doc/artifact overhaul and cut the tag.
 
 Background — Linux perf evidence (merged, now bare-metal partial PMU):
 
@@ -77,13 +88,15 @@ Current state:
 
 - latest synced main baseline: `ded6e80` (PR #127, v0.2.0); the `v0.2.1` baseline is the release-PR
   merge commit, after PRs #129/#130/#131
-- current active branch, if active: none (work lands via scoped PRs from `main`)
-- current active status: `v0.2.1` is the current release on top of `v0.2.0`. It adds the FIX-like
-  text protocol adapter (#29), `make flamegraph` + a bare-metal flamegraph artifact (#32), and a
-  Codex resume-anchor/PMU consistency sweep. `make check` 263/263 and `make asan` 263/263 on the
-  bare-metal Apple M2 Fedora Asahi host; both new code files pass the CI CodeScene Code Health gate.
-  No active milestone
-- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`
+- current active branch, if active: `docs/post-v0.2.1-overhaul` (v0.2.2 prep + doc/artifact sweep)
+- current active status: `v0.2.1` is the latest tag; a post-v0.2.1 hardening + perf wave (#135–#146)
+  is merged to `main` and unreleased, being cut as `v0.2.2` (decoder enum rejection, network/CLI
+  hardening, a real UBSan abort gate, OCaml diff_report robustness, and two measured order-book perf
+  wins — `try_emplace` ~+5% and an index load-factor cap ~+18.6%). `make check` 270/270 and
+  `make asan` 270/270 (the latter now under the real UBSan gate) on the bare-metal Apple M2 Fedora
+  Asahi host; every touched file passes the CI CodeScene Code Health gate
+- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`;
+  `v0.2.2` prepared on this branch, not yet tagged
 - open follow-up issue: #90 — narrowed to the full cache-counter PMU set; the bare-metal Apple host
   provides real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss support
 - issues #95, #28, and #26 were closed by PR #112; issues #32 and #29 were closed by PR #134 and
@@ -94,12 +107,13 @@ Current state:
 
 ### Next milestone
 
-There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), the v0.2.0 release
-(PR #127), and the v0.2.1 content (PRs #129/#134/#131 + release PR) are merged. The highest-value
-remaining work is non-code and externally gated: issue #94 (independent external review — needs a
-human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that
-exposes cache events). The #32 (flamegraph) and #29 (FIX adapter) backlog items are now done. Do not
-invent a new milestone without an explicit human request.
+There is no active milestone. M0–M49 are merged, as are the v0.2.0/v0.2.1 releases and the
+post-v0.2.1 hardening + perf wave (#135–#146, being released as `v0.2.2`). The immediate next action
+is to finish the `v0.2.2` doc/artifact overhaul (this branch) and cut the tag. After that the
+highest-value remaining work is non-code and externally gated: issue #94 (independent external
+review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU
+microarchitecture that exposes cache events). Do not invent a new milestone without an explicit
+human request.
 
 ### Phase III / IV purpose
 
diff --git a/PROGRESS.md b/PROGRESS.md
index 53d81cc..d5ba28d 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -20,36 +20,41 @@ Do not rely on prior chat memory.
 
 ## Current state
 
-- **Active milestone:** none — `v0.2.1` released; project is between releases
-- **Status:** ☑ `v0.2.1` published (FIX-like text protocol adapter #29, perf flamegraph #32, and a
-  resume-anchor/PMU consistency sweep) on top of `v0.2.0`
-- **Active branch:** none (work lands via scoped PRs from `main`)
+- **Active milestone:** none — `v0.2.1` is the latest tag, but a post-v0.2.1 hardening + perf wave
+  (12 PRs, #135–#146) has merged to `main` and is **unreleased**; it is being cut as **`v0.2.2`**
+- **Status:** ☑ `v0.2.1` published on top of `v0.2.0`; ☐ `v0.2.2` in preparation — security/robustness
+  hardening (decoder enum-domain rejection, network/CLI hardening, a real UBSan abort gate, OCaml
+  diff_report robustness) plus two measured order-book perf wins
+- **Active branch:** `docs/post-v0.2.1-overhaul` (the v0.2.2 prep + full doc/artifact staleness sweep)
 - **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124,
-  d8c16b2); since then `v0.2.0` (PR #127, ded6e80) and the `v0.2.1` content: Codex resume-anchor
-  sweep (PR #129), perf flamegraph #32 (PR #134), and the FIX text adapter #29 (PR #131)
-- **Last completed docs sync:** v0.2.1 release prep (this PR): version bump + CHANGELOG `[0.2.1]`
-  and resume/release anchors brought current
-- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag created on the
-  squash-merge of the release PR, marked Latest) published as GitHub-only releases; no packages
-  published
-- **`make check` passing:** yes — `make check` 263/263 and `make asan` 263/263 on the bare-metal
-  Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 (includes the v0.2.1 FIX-adapter and flamegraph
-  renderer tests)
-- **Last action:** delivered the `v0.2.1` content as scoped PRs and prepared this version-bump
-  release. Two reprioritized backlog items — the FIX-like text protocol adapter (#29) and the perf
-  call-graph flamegraph (#32) — plus the Codex resume-anchor/PMU consistency sweep (#127/#128
-  follow-up). Ran Codex as an independent reviewer across the stack and resolved every finding: the
-  FIX envelope now requires MsgType as the first body field and rejects duplicate tags;
-  `flamegraph.sh` classifies zero-sample/partial runs honestly, fails hard on renderer errors, and
-  gates on the folded sample total (not perf's estimate); and the resume anchors were made
-  consistent across PROGRESS/HANDOFF/AGENTS/CLAUDE. Brought every touched file through the CodeScene
-  Code Health gate (table-driven enum maps, a `decode_typed` skeleton, split `parse_envelope`,
-  flattened `flamegraph.py`). `make check`/`make asan` 263/263.
-- **Next action:** no active milestone. Highest-value remaining work is non-code and gated:
-  issue #94 (independent external review — needs a human reviewer) and issue #90 (full
-  cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g.
-  x86_64). The #32 (flamegraph) and #29 (FIX adapter) backlog items are done — shipped in `v0.2.1`
-  (PR #134 and PR #131) — so do not reopen them.
+  d8c16b2). Releases since: `v0.2.0` (PR #127, ded6e80) and `v0.2.1` (FIX adapter #131, flamegraph
+  #134, anchor sweep #129). Post-v0.2.1 unreleased work on `main`: #135–#146 (see Last action)
+- **Last completed docs sync:** this v0.2.2-prep overhaul — every `.md`/`.txt` audited against
+  current `main`; resume/release anchors, README, CHANGELOG, and all stale `results/*.txt`
+  provenance digests brought current to HEAD
+- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag on the
+  release-PR merge, marked Latest) published as GitHub-only releases; `v0.2.2` prepared here, not yet
+  tagged; no packages published
+- **`make check` passing:** yes — `make check` 270/270 and `make asan` 270/270 (the latter now under
+  the **real** UBSan abort gate from #142) on the bare-metal Apple M2 (aarch64) Fedora Asahi host on
+  2026-06-24
+- **Last action:** post-v0.2.1 hardening + perf wave merged to `main` as 12 scoped PRs (#135–#146),
+  driven by a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed bugs) and flamegraph-guided
+  optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders
+  (#136); network hardening — EINTR retry, accept fairness, connection cap, UDP send-error tracking,
+  transient-accept survival, and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg
+  validation so the tools reject malformed input instead of `std::terminate` (#141); the `asan`
+  preset now sets `-fno-sanitize-recover=undefined` so UBSan actually fails CI — previously it ran in
+  recover mode and exited 0 (#142); OCaml `diff_report` guards each fixture so one bad file cannot
+  abort the batch (#144). Perf (measured A/B): baseline price levels use `try_emplace` (~+5%, #138)
+  and the order-index hash caps its load factor at 0.25 (~+18.6%, #145); flamegraph regenerated
+  (#135, #139, #146). Determinism preserved throughout (byte-identical fixtures, OCaml differential
+  pass). `make check`/`make asan` 270/270.
+- **Next action:** finish the `v0.2.2` overhaul (this branch): regenerate the remaining stale
+  `results/*.txt` artifacts, then cut the `v0.2.2` tag/release. After that, the highest-value
+  remaining work is non-code and gated: issue #94 (independent external review — needs a human
+  reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that
+  exposes cache events, e.g. x86_64).
 - **Blockers:** issue #90 is now a *cache-counter* PMU gap, not a host-access gap — this bare-metal
   Apple M2 exposes real `cycles`/`instructions`/`branches`/`branch-misses` but its PMU does not
   implement `cache-references`/`cache-misses`; closing it needs a PMU microarchitecture that exposes
@@ -221,15 +226,21 @@ Status key:
 
 - _none yet_
 
-Measured by `make bench` (full metadata + raw output in `results/latest.txt`). Hardware-,
-compiler-, and build-dependent — these are from one machine, not a production-latency claim.
-
-- Run: arm64, Apple clang 17, Release, seed 42, commit fbb8180 (synthetic, in-process; excludes network/disk/kernel path).
-- order book add/modify/cancel: ~126 ns/op
-- protocol NewOrder encode+decode: ~39 ns/op
-- in-process gateway session (crossing order with fill): ~270 ns/op
-- matching-engine flow apply: ~121 ns/command
-- replay from command log: ~132 ns/command
+Measured by `make bench` (full metadata + raw output in `results/latest.txt`, which is the
+authoritative source). Hardware-, compiler-, and build-dependent — from one machine, not a
+production-latency claim.
+
+- Run: aarch64 (Apple M2), GCC, Release, seed 42, Fedora Asahi Linux (synthetic, in-process;
+  excludes network/disk/kernel path). The earlier macOS Apple-clang numbers (~126/39/270/121/132 ns)
+  were superseded by the Linux regeneration and are not the current set.
+- order book add/modify/cancel: ~90 ns/op
+- protocol NewOrder encode+decode: ~16 ns/op
+- in-process gateway session (crossing order with fill): ~102 ns/op
+- matching-engine flow apply: ~91 ns/command
+- replay from command log: ~101 ns/command
+- Note: these single-process micro-benchmarks hold a near-empty order index, so they do not exercise
+  the deep-book steady state where the v0.2.2 engine wins land — `try_emplace` (~+5%, #138) and the
+  order-index load-factor cap (~+18.6%, #145) are measured on the `qsl-bench profile` workload.
 
 ---
 
@@ -431,6 +442,25 @@ Lower priority:
   release anchors and removed completed #29/#32 from every backlog list, synced AGENTS.md/CLAUDE.md
   to the v0.2.1 released state, and refreshed this release-readiness audit to 263 tests. `make
   check`/`make asan` 263/263. CodeScene MCP token still expired; CI is the authoritative gate.
+- [2026-06-24] Post-v0.2.1 hardening + perf wave (#135–#146), to be released as `v0.2.2`. Driven by a
+  multi-round adversarial bug hunt (4 rounds, converged 5→2→1→0 confirmed bugs) plus flamegraph-guided
+  optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders
+  (#136, `core::is_valid` for Side/TimeInForce/RejectReason); network hardening — EINTR retry in the
+  TCP read/write path, accept fairness (epoll `max_accepts_per_tick`), connection cap
+  (`max_active_connections`), UDP send-error counter, transient-accept survival
+  (EINTR/ECONNABORTED), and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg
+  validation via `std::from_chars` so qsl-client/qsl-mdfeed/qsl-export-fixture reject malformed input
+  instead of `std::terminate`/silent port truncation (#141); the `asan` preset now sets
+  `-fno-sanitize-recover=undefined` so UBSan **aborts** on a violation — it previously ran in recover
+  mode and exited 0, so pure-UBSan defects passed CI green; the tree is UBSan-clean under the strict
+  gate (#142); OCaml `diff_report` guards each fixture so one malformed file cannot abort the batch
+  (#144). Perf (measured back-to-back A/B on the `qsl-bench profile` workload): baseline price levels
+  use `try_emplace` (~+5%, #138) and the order-index hash caps `max_load_factor` at 0.25 (~+18.6%,
+  #145); flamegraph regenerated against the new code (#135/#139/#146). Determinism preserved
+  throughout (byte-identical fixtures across g++/clang++ and vs committed; OCaml differential pass).
+  Then a full doc/artifact staleness overhaul (this branch): every `.md`/`.txt` audited against HEAD,
+  resume/release anchors + README + CHANGELOG brought current, and the stale `results/*.txt`
+  provenance digests regenerated. `make check`/`make asan` 270/270.
 - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts.
 - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30).
 - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human.
@@ -837,14 +867,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator
 
 ## Next action remains
 
-There is no active milestone. `v0.2.1` is the current release, on top of `v0.2.0` (PR #127 ded6e80)
-and `v0.1.0`. The `v0.2.1` content is squash-merged to `main`: the Codex resume-anchor sweep
-(PR #129), the perf flamegraph #32 (PR #134, superseding the auto-closed #130), the FIX text adapter
-#29 (PR #131), and the version-bump release PR (#133), with `v0.2.1` tagged on the release merge
-commit. The committed perf artifacts remain **partial hardware PMU evidence** from this bare-metal
-Apple M2 (aarch64) Fedora Asahi host — real cycles/instructions/branches/branch-misses with
-cache-reference/cache-miss counters unsupported by the Apple Silicon PMU — not NIC-offload, latency,
-or full hardware-PMU evidence.
+`v0.2.1` is the latest tag, on top of `v0.2.0` (PR #127 ded6e80) and `v0.1.0`. A post-v0.2.1
+hardening + perf wave (#135–#146) is squash-merged to `main` and **unreleased**, being cut as
+`v0.2.2`: out-of-domain enum rejection in the decoders (#136); network hardening — EINTR retry,
+accept fairness, connection cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a real UBSan abort gate (#142); OCaml
+`diff_report` robustness (#144); and two measured order-book perf wins — `try_emplace` (~+5%, #138)
+and the order-index load-factor cap (~+18.6%, #145), with the flamegraph regenerated (#135/#139/#146).
+`make check`/`make asan` 270/270. The committed perf artifacts remain **partial hardware PMU
+evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real
+cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by
+the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence.
 
 Highest-value remaining work is non-code and gated: issue #94 (independent external review) and
 issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache
diff --git a/README.md b/README.md
index ee12fd7..319157b 100644
--- a/README.md
+++ b/README.md
@@ -98,14 +98,18 @@ methodology and caveats in [docs/benchmarking.md](docs/benchmarking.md) and
 
 | Scenario (synthetic, in-process) | Measured on this run |
 |---|---|
-| Order book add/modify/cancel | ~87 ns/op |
+| Order book add/modify/cancel | ~90 ns/op |
 | Protocol `NewOrder` encode+decode | ~16 ns/op |
-| Gateway session, crossing order with fill | ~110 ns/op |
-| Matching-engine flow (apply) | ~98 ns/command |
-| Replay from command log | ~110 ns/command |
-
-Reproduce with `make bench` (numbers will differ by machine). The differential-testing harness
-(generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to
+| Gateway session, crossing order with fill | ~102 ns/op |
+| Matching-engine flow (apply) | ~91 ns/command |
+| Replay from command log | ~101 ns/command |
+
+Reproduce with `make bench` (numbers will differ by machine). These micro-benchmarks hold a
+near-empty order index, so they do **not** exercise the deep-book steady state where the v0.2.2
+engine optimizations land: `try_emplace` for baseline price levels (#138) and capping the
+order-index hash load factor (#145) were measured by back-to-back A/B on the `qsl-bench profile`
+workload at **~+5%** and **~+18.6%** respectively (determinism preserved). The differential-testing
+harness (generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to
 [`results/differential.txt`](results/differential.txt) — kept separate so it does not disturb
 the core numbers above.
 
@@ -121,8 +125,10 @@ capture is dense (~20k samples) and stacks are fully symbolized — no `[unknown
 
 This is a **software cpu-clock sampling** hot-symbol profile, **not** PMU evidence: frame width is
 proportional to on-CPU samples, not wall-clock latency or throughput, and it is
-hardware/kernel/compiler/build dependent. The hot frames are `MatchingEngine::new_limit`/`cancel`,
-the order-book level/index operations, and the allocator. Provenance and classification are in
+hardware/kernel/compiler/build dependent. The hot frames are the matching and resting work —
+`MatchingEngine::new_limit` → `OrderBook::match_baseline` and `rest` → `level_for`, plus `cancel`;
+the per-level allocation churn and order-index lookups that previously dominated were cut by the
+v0.2.2 `try_emplace` (#138) and index load-factor (#145) wins. Provenance and classification are in
 [`results/flamegraph.txt`](results/flamegraph.txt); methodology in
 [docs/perf_analysis.md](docs/perf_analysis.md). GitHub renders the SVG statically; download the raw
 file for interactive zoom and search.
@@ -132,9 +138,12 @@ file for interactive zoom and search.
 - **Synthetic and local.** No real market data, no real venue connectivity, no order types
   beyond limit/market + GTC/IOC.
 - **Networking remains scoped.** The default TCP gateway is intentionally
-  loopback-only and unauthenticated. It now has portable threaded serving for multiple clients, and
-  Linux builds also include an opt-in `epoll` gateway prototype for event-driven readiness. These
-  are architecture and pressure-validation paths, not a production event loop or capacity claim.
+  loopback-only and unauthenticated. It has portable threaded serving for multiple clients, plus an
+  opt-in Linux `epoll` gateway prototype for event-driven readiness. Both paths were hardened in
+  v0.2.2: `EINTR` retry on read/write, survival of transient `accept()` errors and fd exhaustion
+  (`EMFILE`/`ENFILE`) instead of tearing the listener down, a connection cap, and per-tick accept
+  fairness. These are architecture and robustness paths, not a production event loop or capacity
+  claim.
 - **Benchmarks are microbenchmarks**, not end-to-end or production latency (see above).
   CPU-affinity/scheduler-migration and false-sharing studies are separate hardware-dependent
   artifacts; contiguous order-book storage is a bounded-domain architecture study, not a general
diff --git a/SECURITY.md b/SECURITY.md
index 1cce055..2993800 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -11,8 +11,10 @@ The demo network components are intentionally minimal:
   and bind to `127.0.0.1` only.
 - They are for local demonstration. **Do not expose `qsl-gateway` or `qsl-mdfeed` to untrusted
   networks**, and do not run them on a shared or public interface.
-- There is no TLS, access control, rate limiting, or DoS protection. Malformed input is handled
-  by disconnecting the peer, not by recovering the stream.
+- There is no TLS, access control, or rate limiting. The acceptors do have bounded resilience: an
+  optional connection cap, survival of transient `accept()` errors and fd exhaustion, and `EINTR`
+  retry on read/write — but this is robustness hardening, not DoS protection. Malformed input is
+  handled by disconnecting the peer, not by recovering the stream.
 
 ## Reporting
 
diff --git a/docs/binary_protocol.md b/docs/binary_protocol.md
index 1e1c948..33f4e28 100644
--- a/docs/binary_protocol.md
+++ b/docs/binary_protocol.md
@@ -68,7 +68,8 @@ buffer holds the full declared body before parsing.
 
 NewOrder enum fields are validated during decode. Out-of-range values for Side, OrderType,
 or TimeInForce return DecodeError::InvalidEnumValue and are not surfaced as internal domain
-messages.
+messages. Gateway-response decoders apply the same domain check: `decode_reject` returns
+`InvalidEnumValue` for a `RejectReason` byte outside the defined codes (#136).
 
 ## Trailing bytes and framing
 
diff --git a/docs/differential_testing.md b/docs/differential_testing.md
index 7284844..ded31d9 100644
--- a/docs/differential_testing.md
+++ b/docs/differential_testing.md
@@ -332,6 +332,9 @@ When the differential check fails in CI, the `ocaml-verifier` job runs `diff_rep
 positive fixtures and uploads a `differential-failure-bundle` artifact. For each diverging
 fixture it contains `<base>.original` (the fixture), `<base>.computed` (OCaml snapshot),
 `<base>.expected` (C++ snapshot), and `<base>.diff` (a line diff) — so a divergence can be
-debugged from the CI run without reproducing locally. The minimal-counterexample form of a
+debugged from the CI run without reproducing locally. `diff_report` guards each fixture
+independently: a malformed or unreadable fixture is reported as a comparison failure (non-zero
+exit) rather than aborting the batch and losing the remaining fixtures' bundles (#144). The
+minimal-counterexample form of a
 failing *generated* stream is produced separately by the C++ shrinker (`qsl-export-stream
 shrink`, M19).
diff --git a/docs/fix_protocol.md b/docs/fix_protocol.md
index d25c8ce..497d370 100644
--- a/docs/fix_protocol.md
+++ b/docs/fix_protocol.md
@@ -92,4 +92,6 @@ ones:
   out-of-range integers, and oversized messages;
 - signed/extreme `int64` price and `uint64` id/seq round-trips.
 
-The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text.
+The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text;
+the UBSan half now aborts on the first violation (`-fno-sanitize-recover=undefined`, #142), so a
+UBSan defect in the parser fails the `make asan` run rather than being silently recovered.
diff --git a/docs/pool_backed_storage.md b/docs/pool_backed_storage.md
index 6fa6b7e..98b2e5a 100644
--- a/docs/pool_backed_storage.md
+++ b/docs/pool_backed_storage.md
@@ -215,28 +215,28 @@ produced the earlier "intrusive is ~4-5x slower" ranking.
 
 This artifact moves engine construction, the registration prefix, and the end-of-run snapshot
 readout outside the timed interval (`Source digest:
-sha256:b606452b1bbff3d1c4eed8f59839701590cfbc824207f7b707c03ca66766353a`, `Dirty inputs: no`), so
-each row reflects per-command work. The corrected medians are:
+sha256:c1e4cd7db8472a87cbd23ece3a2d4b330f78ad876b58da412e0e54f6c4eb4cf7`, `Dirty inputs: no`), so
+each row reflects per-command work. The medians are:
 
 | Workload | Shape summary | Median ns/timed-command, fastest to slowest |
 | --- | --- | --- |
-| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 93.2, intrusive 95.4, baseline 111.0, PMR 121.4 |
-| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 66.0, contiguous 70.7, PMR 88.3, baseline 96.4 |
-| Sparse wide flow | 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Intrusive 48.2, contiguous 60.9, PMR 72.1, baseline 81.0 |
-| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 42.8, intrusive 44.3, baseline 59.7, PMR 59.8 |
-| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 69.9, intrusive 87.2, baseline 109.3, PMR 117.9 |
+| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 71.2, intrusive 80.3, baseline 89.4, PMR 100.0 |
+| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 52.2, contiguous 57.8, baseline 66.1, PMR 66.3 |
+| Sparse wide flow | 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Contiguous 40.3, intrusive 42.8, baseline 55.7, PMR 57.9 |
+| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 31.4, intrusive 36.7, baseline 49.0, PMR 54.7 |
+| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 56.8, intrusive 65.1, baseline 96.5, PMR 110.0 |
 
-### What the corrected numbers show
+### What the numbers show
 
-With per-run setup excluded the four modes cluster into a much tighter band (roughly 40-120
+With per-run setup excluded the four modes cluster into a much tighter band (roughly 30-110
 ns/command) instead of the earlier 40-486 spread, and the large earlier gaps are explained by
-per-run pool initialization rather than per-command cost. Intrusive and contiguous storage are the
-two fastest modes and trade the lead by workload shape: intrusive leads the insert/resting-heavy
-dense and sparse flows, contiguous leads the cancel/modify and traversal-heavy flows, and they are
-within a few ns/command on the general flow. Baseline `std::map`/`std::list` and PMR pooling sit
-behind both, with PMR sometimes ahead of baseline and sometimes behind. The medians above come from
-a quiet-host regeneration whose min/max ranges are tight; treat absolute values as environment- and
-build-dependent.
+per-run pool initialization rather than per-command cost. Contiguous storage is fastest on four of
+the five workloads (general, sparse, cancel/modify, match/traversal); the intrusive pool leads only
+the dense bounded flow and trails contiguous by roughly 2.5-9 ns/command elsewhere. Baseline
+`std::map`/`std::list` and PMR pooling sit behind both, with baseline ahead of PMR on all five
+workloads. The medians above come from a regeneration whose per-mode min/max ranges are tight; treat
+absolute values as environment- and build-dependent, and note these post-v0.2.2 baseline rows
+already include the `try_emplace` (#138) and index load-factor (#145) wins.
 
 This does not make the intrusive pool "free". It pays a large fixed initialization cost
 (pre-allocating 65536 order and node slots per book) that this per-command metric deliberately
diff --git a/docs/recruiting_notes.md b/docs/recruiting_notes.md
index 91ebdbe..fa42665 100644
--- a/docs/recruiting_notes.md
+++ b/docs/recruiting_notes.md
@@ -45,8 +45,9 @@
 ## Résumé bullets — Linux Engineering (conservative)
 
 - Implemented TCP order-gateway transports and a UDP market-data feed on POSIX sockets
-  (loopback), with bounded receive timeouts, sequence-gap detection, threaded portable serving,
-  epoll-based Linux serving, and disconnect-on-malformed-framing.
+  (loopback), with bounded receive timeouts, sequence-gap detection, UDP send-error counting,
+  threaded portable serving with a connection cap and accept-error/fd-exhaustion survival,
+  epoll-based Linux serving, `EINTR`-retry on read/write, and disconnect-on-malformed-framing.
 - Built CLI tools for append-only-log inspection and deterministic replay, plus a demo script
   that orchestrates a loopback gateway round-trip with port-readiness polling and clean
   process teardown.
diff --git a/docs/release_readiness.md b/docs/release_readiness.md
index 2fa1637..e148c63 100644
--- a/docs/release_readiness.md
+++ b/docs/release_readiness.md
@@ -2,17 +2,22 @@
 
 A pre-release pass verifying the repo builds, demos, reproduces, and reads honestly. This audit
 covers **M0–M49, the v0.2.0 evidence refresh** (bare-metal Linux artifact regeneration and the
-documentation/staleness sweep), **and the v0.2.1 content** (the FIX-like text protocol adapter #29,
-the perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency
-sweep). It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human after
-squash-merge.
+documentation/staleness sweep), **the v0.2.1 content** (the FIX-like text protocol adapter #29, the
+perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency sweep),
+**and the post-v0.2.1 hardening + perf wave being cut as v0.2.2** (#135–#146): out-of-domain enum
+rejection in the decoders (#136), network-path hardening — EINTR retry, accept fairness, connection
+cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion handling (#137/#140/#143),
+CLI argument validation (#141), a real UBSan abort gate (#142), OCaml `diff_report` robustness (#144),
+and two measured order-book perf wins — `try_emplace` (~+5%, #138) and an index load-factor cap
+(~+18.6%, #145). It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human
+after squash-merge.
 
 ## Verification (this session, bare-metal Apple M2 / aarch64 / GCC 16.1.1, Fedora Asahi Remix)
 
 | Check | Result |
 |---|---|
-| `make check` | 263/263 tests pass, no warnings (incl. the v0.2.1 FIX-adapter + flamegraph-renderer tests) |
-| `make asan` (ASan + UBSan) | 263/263, sanitizer-clean (the FIX text parser handles untrusted input) |
+| `make check` | 270/270 tests pass, no warnings (incl. the FIX-adapter, flamegraph-renderer, decoder enum-rejection, and CLI-arg-validation tests) |
+| `make asan` (ASan + UBSan) | 270/270, sanitizer-clean; the UBSan gate now **aborts** on the first violation (`-fno-sanitize-recover=undefined`, #142), so pure-UBSan defects no longer pass green, and the tree is clean under it |
 | `make tsan` (ThreadSanitizer) | 20/20 concurrency-labelled tests, race-clean |
 | `make check-fixtures` | committed differential fixtures match current C++ output |
 | `make check-manifest` | provenance manifest matches the committed fixtures |
@@ -88,7 +93,9 @@ verification.
 
 ## Outcome
 
-Release-ready as a portfolio artifact. The next GitHub-only release is `v0.2.1` (the FIX-like text
-protocol adapter #29, the perf flamegraph #32, and a Codex resume-anchor/PMU consistency sweep) on
-top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal evidence refresh); it
+Release-ready as a portfolio artifact. `v0.2.1` is already tagged (FIX adapter #29, perf flamegraph
+#32, anchor sweep) on top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal
+evidence refresh). The next GitHub-only release is **`v0.2.2`**, bundling the post-v0.2.1
+hardening + perf wave merged to `main` (#135–#146): decoder enum rejection, network/CLI hardening, a
+real UBSan abort gate, OCaml `diff_report` robustness, and the two measured order-book perf wins. It
 requires explicit human approval and a squash-merge before tagging.
diff --git a/docs/replay_and_recovery.md b/docs/replay_and_recovery.md
index 0c48f07..2f1caef 100644
--- a/docs/replay_and_recovery.md
+++ b/docs/replay_and_recovery.md
@@ -163,6 +163,8 @@ measurements, not a production recovery-time claim.
   not read back from the log (the log could also store events, but the engine is the source
   of truth for replay equivalence).
 - The reader loads the whole log into memory before replaying (adequate for the simulator).
-- Commands are trusted once their record checksum validates (M7); the command codec does not
-  re-validate enum domains — wire-level enum validation lives at the protocol boundary (M2)
-  and risk checks at the gateway (M5).
+- Commands are trusted once their record checksum validates (M7). The command codec also rejects
+  out-of-domain enum bytes: `replay::decode_command` refuses a `NewLimit`/`NewMarket` record whose
+  `Side` or `TimeInForce` byte is not a defined enum value (#136), so a corrupt log record cannot
+  apply garbage straight to the engine. Higher-level validation still lives at the protocol boundary
+  (M2) and the risk gateway (M5).
diff --git a/docs/socket_gateway.md b/docs/socket_gateway.md
index ad6bb36..ef914d2 100644
--- a/docs/socket_gateway.md
+++ b/docs/socket_gateway.md
@@ -38,7 +38,9 @@ The portable `TcpServer` writes responses with a send-all loop that tolerates pa
 The Linux `EpollServer` keeps a per-client outbound buffer and leaves the connection registered
 for `EPOLLOUT` until all pending response bytes are accepted by the kernel. Both write paths use
 `send(..., MSG_NOSIGNAL)` where available, and the platform socket option where available, so a
-client that drops before reading a response cannot terminate the gateway through `SIGPIPE`.
+client that drops before reading a response cannot terminate the gateway through `SIGPIPE`. Both the
+read and write paths retry on `EINTR` — a signal interruption is treated as retryable, not a
+disconnect.
 
 The epoll path treats `EAGAIN` / `EWOULDBLOCK` as normal nonblocking backpressure:
 
@@ -93,7 +95,12 @@ induces an over-cap response is disconnected.
 
 The default demo uses `TcpServer` because it is portable and easiest to inspect. The accept loop
 spawns one worker per accepted connection, so a slow or still-open client no longer prevents the
-server from accepting a later client. The shared `OrderGateway` remains protected by an internal
+server from accepting a later client. A connection cap (`TcpServerOptions::max_active_connections`,
+default `0` = unbounded) load-sheds — a freshly accepted connection at the cap is closed immediately
+rather than spawning another worker. The accept loop also survives transient `accept()` errors
+(`EINTR`/`ECONNABORTED`, retried) and file-descriptor exhaustion (`EMFILE`/`ENFILE`, a brief back-off
+retry) instead of tearing the listener down; the `EpollServer` handles the same conditions by
+disarming and re-arming the listener. The shared `OrderGateway` remains protected by an internal
 mutex; network I/O can overlap across clients, but matching-engine mutation stays serialized and
 deterministic.
 
diff --git a/docs/socket_hardening.md b/docs/socket_hardening.md
index 236786b..c806f5f 100644
--- a/docs/socket_hardening.md
+++ b/docs/socket_hardening.md
@@ -18,6 +18,11 @@ service.** Nothing here claims a production-networking stack.
 | Peer disconnect mid-write | `send(MSG_NOSIGNAL)` / `SO_NOSIGPIPE` so `SIGPIPE` can't kill the process | `Session` |
 | Indefinite blocking recv | Bounded `SO_RCVTIMEO` on the UDP client | `udp_feed` |
 | UDP burst loss | Detected via sequence gaps; receive-buffer sizing knob (below) | `udp_feed` |
+| UDP transmit failure | Counted, not silently dropped (`UdpPublisher::send_failures()`) | `udp_feed` |
+| Signal during read/write | `EINTR` retried (not treated as a disconnect) | `TcpServer`/`EpollServer` |
+| Transient accept error | `EINTR`/`ECONNABORTED` retried; listener kept alive | `TcpServer`/`EpollServer` |
+| FD exhaustion | `EMFILE`/`ENFILE` survived (back-off retry / listener disarm-rearm), not a teardown | `TcpServer`/`EpollServer` |
+| Connection-count overload | Optional cap (`max_active_connections`) load-sheds at the cap | `TcpServer` |
 
 The first five rows pre-date M30 (M9/M10); M30 adds the receive-buffer sizing knob and documents
 the loss model and the things deliberately left out.
@@ -75,9 +80,12 @@ stated plainly so the gap counter is not mistaken for reliability.
   bottleneck here. No `io_uring` code exists; none is claimed.
 - **TLS / authentication / authorization.** None. The services are loopback-only demos. Do not
   expose them on a routable interface (see `SECURITY.md`).
-- **Idle-peer timeouts, connection caps, rate limiting.** Not implemented. Heartbeats are a
-  liveness round-trip only; the gateway does not yet time out idle peers. These are reasonable
-  future hardening steps, explicitly not done today.
+- **Connection caps.** Implemented as an opt-in `TcpServer` knob (`max_active_connections`, default
+  `0` = unbounded): at the cap a freshly accepted connection is closed (load-shed) rather than
+  spawning another worker. See the posture table above.
+- **Idle-peer timeouts, rate limiting.** Not implemented. Heartbeats are a liveness round-trip only;
+  the gateway does not yet time out idle peers. These are reasonable future hardening steps,
+  explicitly not done today.
 - **`SO_REUSEADDR` / rapid rebind.** Not set; the profiling scripts dodge `TIME_WAIT` by using
   separate ports per pass instead of forcing address reuse.
 
diff --git a/results/README.md b/results/README.md
index 0f8b7aa..8515411 100644
--- a/results/README.md
+++ b/results/README.md
@@ -35,6 +35,13 @@ Benchmark results produced by `make bench` and scripts under `scripts/`.
 - `false_sharing_study.txt` — benchmark-only packed-vs-padded SPSC queue-cursor contention study
   (`make false-sharing-study`). It is research-note evidence about cache-line sharing shape, not
   a production throughput or latency claim.
+- `socket_load_summary.txt` — Linux multi-client TCP connection-scaling load experiment
+  (`make socket-load`, `scripts/socket_load.sh`): N concurrent `qsl-client`s against the threaded and
+  epoll gateways. Constrained loopback connection-setup shape only, not a production-capacity claim.
+- `socket_profile_loopback.txt` — Linux syscall/socket-path profiling of the gateway I/O path
+  (`make profile-io`, `scripts/profile_gateway_io.sh`). Loopback, constrained evidence.
+- `socket_stress_summary.txt` — UDP socket-buffer / burst-loss experiment (`make socket-stress`):
+  receive-buffer sizing vs observed sequence-gap loss on loopback. Research-note evidence only.
 - `crash_recovery_validation.txt` — M45 SIGKILL crash / torn-tail recovery validation for the
   append-only event log across durability modes (`make crash-recovery`). It is process-kill
   evidence only: it validates crash-mid-append recovery and acknowledged-record retention across
diff --git a/results/allocator_experiment.txt b/results/allocator_experiment.txt
index 3824cb3..653ba40 100644
--- a/results/allocator_experiment.txt
+++ b/results/allocator_experiment.txt
@@ -4,12 +4,12 @@ OS:          Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:4f09cf6b1db08de00d5fb480b77d0b1fe7ebb9ea70dbcc7d73807c7eb06e4598
+Git commit (informational): f9f7e98
+Source digest: sha256:e5fb637e109ffba8b25ab7a5274d325ea8edbbbf13aec8d88d1a486cdb1cc168
 Source digest scope: allocator-experiment
 Dirty inputs: no
 Generated output: results/allocator_experiment.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:29:37Z
 Dataset:     engine::Order allocation microbenchmark (new/delete vs fixed pool)
 Warmup:      iters/10 per benchmark, before timing
 Units:       latency = ns/op + ops/sec
@@ -19,6 +19,6 @@ This measures allocator mechanics for order-like objects, not end-to-end engine
 hardware/compiler/build dependent. A negative or tiny delta is acceptable.
 
 Scenario / Metric / Result:
-order new/delete              500000 ops         14.4 ns/op       69407890 ops/sec
-order pool acquire/release    500000 ops          7.0 ns/op      142345350 ops/sec
-order pool burst cycle          2000 ops       7970.4 ns/op         125464 ops/sec
+order new/delete              500000 ops         12.4 ns/op       80810144 ops/sec
+order pool acquire/release    500000 ops          7.0 ns/op      142468570 ops/sec
+order pool burst cycle          2000 ops       7368.3 ns/op         135716 ops/sec
diff --git a/results/differential.txt b/results/differential.txt
index 9cd907e..7e9495e 100644
--- a/results/differential.txt
+++ b/results/differential.txt
@@ -4,12 +4,12 @@ OS:          Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:3fe4614b9c004642e244fafaf8d01905ed2dd92ca843bbd579e33e66f5e23836
+Git commit (informational): f9f7e98
+Source digest: sha256:736ee67ee7bfbbac0b8c45c5d2a0805b9bf19a664fa44e8ec650b38a9d46a90f
 Source digest scope: differential-benchmark-suite
 Dirty inputs: no
 Generated output: results/differential.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:29:36Z
 Dataset:     property command streams (generate_property_flow, 3 symbols, 120 orders)
 Warmup:      iters/10 (or 1 throughput pass) per benchmark, before timing
 Units:       latency = ns/op + ops/sec; throughput = ns/item + items/sec
@@ -19,6 +19,6 @@ measure the differential-testing harness (generation, gateway replay, shrinking)
 production throughput; hardware/compiler/build dependent.
 
 Scenario / Metric / Result:
-property flow generation         123 items       58.0 ns/item     17228399 items/sec
-differential gateway replay       123 items       62.2 ns/item     16071425 items/sec
-shrink property flow             300 ops      31175.3 ns/op          32077 ops/sec
+property flow generation         123 items      108.7 ns/item      9200981 items/sec
+differential gateway replay       123 items      113.8 ns/item      8789365 items/sec
+shrink property flow             300 ops      59593.3 ns/op          16780 ops/sec
diff --git a/results/dpdk_environment.txt b/results/dpdk_environment.txt
index a6aa264..1e3abf8 100644
--- a/results/dpdk_environment.txt
+++ b/results/dpdk_environment.txt
@@ -8,12 +8,12 @@ CPU:         Avalanche-M2
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  n/a
 Provenance version: 1
-Git commit (informational): 8ab07b5
-Source digest: sha256:a15964c68f12b761a2e60e164dd8dfdc1f56d9fdc896cfe4f867ec5d22b3c8d0
+Git commit (informational): 081e1ec
+Source digest: sha256:ab13cbebe013b05626085319748c5fb9e6d51383be00d78ed7337542d99d67c0
 Source digest scope: dpdk-environment-check
 Dirty inputs: no
 Generated output: results/dpdk_environment.txt
-Date: 2026-06-21T05:43:48Z
+Date: 2026-06-25T02:37:40Z
 pkg-config:  /usr/bin/pkg-config
 libdpdk pkg-config status: not-available
 libdpdk version: not-found
diff --git a/results/false_sharing_study.txt b/results/false_sharing_study.txt
index 61bc6d4..461357e 100644
--- a/results/false_sharing_study.txt
+++ b/results/false_sharing_study.txt
@@ -6,12 +6,12 @@ OS:             Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:       c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:     Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:4c2b0de72788bd80e4877d9818693a37629ca2decf69260e33c4c3b0c3603c74
+Git commit (informational): f9f7e98
+Source digest: sha256:f8a12fc427a06c5795c555e77a0fca711876ea756923af1922590fee436ab5c2
 Source digest scope: false-sharing-study
 Dirty inputs: no
 Generated output: results/false_sharing_study.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:29:38Z
 Dataset:        synthetic SPSC cursor exchange (producer tail / consumer head)
 
 Host support summary: portable two-thread C++ benchmark; no PMU counters required.
@@ -29,5 +29,5 @@ index with acquire. Benchmark-only control: the padded layout puts each index on
 128-byte boundary, so the two cursors sit on distinct coherency lines even on hosts
 with 128-byte cache lines (Apple Silicon); the production SpscRing pads to 64 bytes.
 
-packed indices              4000000 cursor updates        2.9 ns/update    340900588 updates/sec checksum=4000000061052
-padded indices              4000000 cursor updates       29.6 ns/update     33747427 updates/sec checksum=4000002007457
+packed indices              4000000 cursor updates        2.8 ns/update    359899203 updates/sec checksum=4000000021261
+padded indices              4000000 cursor updates       27.2 ns/update     36771064 updates/sec checksum=4000002008455
diff --git a/results/latest.txt b/results/latest.txt
index 5e68f2a..7c24ef2 100644
--- a/results/latest.txt
+++ b/results/latest.txt
@@ -4,12 +4,12 @@ OS:          Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:1271610ad6c9c96534b350239f0341fe413ce0b989f3aca3434b2f4652395b64
+Git commit (informational): 114445a
+Source digest: sha256:c54df82614b53ea736e845a5893096a9ee12b65a8cee49be28b9c53a25d5a9df
 Source digest scope: core-benchmark-suite
 Dirty inputs: no
 Generated output: results/latest.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:36:30Z
 Dataset:     synthetic order flow (replay::generate_flow, seed 42, 4 symbols)
 Warmup:      iters/10 (or 1 throughput pass) per benchmark, before timing
 Units:       latency = ns/op + ops/sec; throughput = ns/item + items/sec
@@ -19,8 +19,8 @@ no kernel/IO path, stock allocator). NOT production exchange throughput or
 end-to-end latency; hardware/compiler/build dependent.
 
 Scenario / Metric / Result:
-order_book add/mod/cancel     200000 ops         87.3 ns/op       11458608 ops/sec
-protocol encode+decode        500000 ops         15.9 ns/op       62727387 ops/sec
-gateway session (fill)        200000 ops        109.7 ns/op        9115527 ops/sec
-matching engine flow            5004 items       98.2 ns/item     10181380 items/sec
-replay command log              5004 items      110.4 ns/item      9059370 items/sec
+order_book add/mod/cancel     200000 ops         90.6 ns/op       11043024 ops/sec
+protocol encode+decode        500000 ops         16.1 ns/op       62049736 ops/sec
+gateway session (fill)        200000 ops        102.3 ns/op        9776174 ops/sec
+matching engine flow            5004 items       91.4 ns/item     10939533 items/sec
+replay command log              5004 items      101.3 ns/item      9874313 items/sec
diff --git a/results/nic_offload_environment.txt b/results/nic_offload_environment.txt
index 109125b..14c7f86 100644
--- a/results/nic_offload_environment.txt
+++ b/results/nic_offload_environment.txt
@@ -1,4 +1,4 @@
-Command:     QSL_NIC_DEVICES=wld0 make nic-offload-check
+Command:     make nic-offload-check
 Artifact:    NIC offload and timestamping capability check (non-mutating)
 Evidence class: linux-readonly-capability-observation
 Host support summary: Linux host with read-only NIC capability inspection; no settings changed and no packet measurement ran
@@ -8,24 +8,24 @@ CPU:         Avalanche-M2
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  n/a
 Provenance version: 1
-Git commit (informational): 8ab07b5
-Source digest: sha256:904cbf8c83cd9ee0107e11be380c53f9509b28b2df2f96a2db8b54b323091a30
+Git commit (informational): 081e1ec
+Source digest: sha256:088a8ba85f514bc6264b43b5754e95f6a8f2e0a239936492f50f800031f8f782
 Source digest scope: nic-offload-environment-check
 Dirty inputs: no
 Generated output: results/nic_offload_environment.txt
-Date: 2026-06-21T05:43:48Z
+Date: 2026-06-25T02:37:40Z
 ethtool:     /usr/bin/ethtool
 ip:          /usr/bin/ip
 lspci:       /usr/bin/lspci
 phc_ctl:     not-found
 ptp4l:       not-found
-Requested Linux devices: wld0
+Requested Linux devices: docker0 tailscale0 wld0
 Missing requested devices: none
-Linux devices inspected: wld0
-Device count: 1
+Linux devices inspected: docker0 tailscale0 wld0
+Device count: 3
 Offload feature list visible: yes
 RSS indirection/hash visible: no
-Queue/channel info visible: no
+Queue/channel info visible: yes
 Hardware timestamping visible: no
 Offload settings changed: no
 RSS settings changed: no
@@ -38,6 +38,232 @@ Caveat: This artifact records read-only host and NIC capability context. It does
 not change offload flags, queue counts, RSS tables, timestamp filters, drivers,
 or interrupt affinity, and it does not support any NIC-offload or latency claim.
 
+== device docker0 summary ==
+operstate: down
+mtu:       1500
+driver:    n/a
+pci:       n/a
+rx queues: 1
+tx queues: 1
+
+== ip -details link show dev docker0 ==
+4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
+    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 netns-immutable
+    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.xx:xx:xx:xx:xx:xx designated_root 8000.xx:xx:xx:xx:xx:xx root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00 gc_timer    0.00 fdb_n_learned 0 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address xx:xx:xx:xx:xx:xx mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mdb_offload_fail_notification 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
+
+== ethtool -i docker0 ==
+driver: bridge
+version: 2.3
+firmware-version: N/A
+expansion-rom-version:
+bus-info: N/A
+supports-statistics: no
+supports-test: no
+supports-eeprom-access: no
+supports-register-dump: no
+supports-priv-flags: no
+
+== ethtool -k docker0 ==
+Features for docker0:
+rx-checksumming: off [fixed]
+tx-checksumming: on
+	tx-checksum-ipv4: off [fixed]
+	tx-checksum-ip-generic: on
+	tx-checksum-ipv6: off [fixed]
+	tx-checksum-fcoe-crc: off [fixed]
+	tx-checksum-sctp: off [fixed]
+scatter-gather: on
+	tx-scatter-gather: on
+	tx-scatter-gather-fraglist: on
+tcp-segmentation-offload: on
+	tx-tcp-segmentation: on
+	tx-tcp-ecn-segmentation: on
+	tx-tcp-mangleid-segmentation: on
+	tx-tcp6-segmentation: on
+	tx-tcp-accecn-segmentation: on
+generic-segmentation-offload: on
+generic-receive-offload: on
+large-receive-offload: off [fixed]
+rx-vlan-offload: off [fixed]
+tx-vlan-offload: on
+ntuple-filters: off [fixed]
+receive-hashing: off [fixed]
+highdma: on
+rx-vlan-filter: off [fixed]
+vlan-challenged: off [fixed]
+tx-gso-robust: on
+tx-fcoe-segmentation: on
+tx-gre-segmentation: on
+tx-gre-csum-segmentation: on
+tx-ipxip4-segmentation: on
+tx-ipxip6-segmentation: on
+tx-udp_tnl-segmentation: on
+tx-udp_tnl-csum-segmentation: on
+tx-gso-partial: on
+tx-tunnel-remcsum-segmentation: on
+tx-sctp-segmentation: on
+tx-esp-segmentation: on
+tx-udp-segmentation: on
+tx-gso-list: on
+tx-nocache-copy: off
+loopback: off [fixed]
+rx-fcs: off [fixed]
+rx-all: off [fixed]
+tx-vlan-stag-hw-insert: on
+rx-vlan-stag-hw-parse: off [fixed]
+rx-vlan-stag-filter: off [fixed]
+l2-fwd-offload: off [fixed]
+hw-tc-offload: off [fixed]
+esp-hw-offload: off [fixed]
+esp-tx-csum-hw-offload: off [fixed]
+rx-udp_tunnel-port-offload: off [fixed]
+tls-hw-tx-offload: off [fixed]
+tls-hw-rx-offload: off [fixed]
+rx-gro-hw: off [fixed]
+tls-hw-record: off [fixed]
+rx-gro-list: off
+macsec-hw-offload: off [fixed]
+rx-udp-gro-forwarding: off
+hsr-tag-ins-offload: off [fixed]
+hsr-tag-rm-offload: off [fixed]
+hsr-fwd-offload: off [fixed]
+hsr-dup-offload: off [fixed]
+
+== ethtool -l docker0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -l docker0
+
+== ethtool -x docker0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -x docker0
+
+== ethtool -T docker0 ==
+Time stamping parameters for docker0:
+Capabilities:
+	software-receive
+	software-system-clock
+PTP Hardware Clock: none
+Hardware Transmit Timestamp Modes: none
+Hardware Receive Filter Modes: none
+
+== device tailscale0 summary ==
+operstate: unknown
+mtu:       1280
+driver:    n/a
+pci:       n/a
+rx queues: 1
+tx queues: 1
+
+== ip -details link show dev tailscale0 ==
+2: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 500
+    link/none  promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
+    tun type tun pi off vnet_hdr on persist off addrgenmode random numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
+
+== ethtool -i tailscale0 ==
+driver: tun
+version: 1.6
+firmware-version:
+expansion-rom-version:
+bus-info: tun
+supports-statistics: no
+supports-test: no
+supports-eeprom-access: no
+supports-register-dump: no
+supports-priv-flags: no
+
+== ethtool -k tailscale0 ==
+Features for tailscale0:
+rx-checksumming: off [fixed]
+tx-checksumming: on
+	tx-checksum-ipv4: off [fixed]
+	tx-checksum-ip-generic: on
+	tx-checksum-ipv6: off [fixed]
+	tx-checksum-fcoe-crc: off [fixed]
+	tx-checksum-sctp: off [fixed]
+scatter-gather: on
+	tx-scatter-gather: on
+	tx-scatter-gather-fraglist: on
+tcp-segmentation-offload: on
+	tx-tcp-segmentation: on
+	tx-tcp-ecn-segmentation: off
+	tx-tcp-mangleid-segmentation: off
+	tx-tcp6-segmentation: on
+	tx-tcp-accecn-segmentation: off [fixed]
+generic-segmentation-offload: on
+generic-receive-offload: on
+large-receive-offload: off [fixed]
+rx-vlan-offload: off [fixed]
+tx-vlan-offload: on
+ntuple-filters: off [fixed]
+receive-hashing: off [fixed]
+highdma: off [fixed]
+rx-vlan-filter: off [fixed]
+vlan-challenged: off [fixed]
+tx-gso-robust: off [fixed]
+tx-fcoe-segmentation: off [fixed]
+tx-gre-segmentation: off [fixed]
+tx-gre-csum-segmentation: off [fixed]
+tx-ipxip4-segmentation: off [fixed]
+tx-ipxip6-segmentation: off [fixed]
+tx-udp_tnl-segmentation: off
+tx-udp_tnl-csum-segmentation: off
+tx-gso-partial: off [fixed]
+tx-tunnel-remcsum-segmentation: off [fixed]
+tx-sctp-segmentation: off [fixed]
+tx-esp-segmentation: off [fixed]
+tx-udp-segmentation: on
+tx-gso-list: off [fixed]
+tx-nocache-copy: off
+loopback: off [fixed]
+rx-fcs: off [fixed]
+rx-all: off [fixed]
+tx-vlan-stag-hw-insert: on
+rx-vlan-stag-hw-parse: off [fixed]
+rx-vlan-stag-filter: off [fixed]
+l2-fwd-offload: off [fixed]
+hw-tc-offload: off [fixed]
+esp-hw-offload: off [fixed]
+esp-tx-csum-hw-offload: off [fixed]
+rx-udp_tunnel-port-offload: off [fixed]
+tls-hw-tx-offload: off [fixed]
+tls-hw-rx-offload: off [fixed]
+rx-gro-hw: off [fixed]
+tls-hw-record: off [fixed]
+rx-gro-list: off
+macsec-hw-offload: off [fixed]
+rx-udp-gro-forwarding: off
+hsr-tag-ins-offload: off [fixed]
+hsr-tag-rm-offload: off [fixed]
+hsr-fwd-offload: off [fixed]
+hsr-dup-offload: off [fixed]
+
+== ethtool -l tailscale0 ==
+Channel parameters for tailscale0:
+Pre-set maximums:
+RX:		n/a
+TX:		n/a
+Other:		n/a
+Combined:	1
+Current hardware settings:
+RX:		n/a
+TX:		n/a
+Other:		n/a
+Combined:	1
+
+== ethtool -x tailscale0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -x tailscale0
+
+== ethtool -T tailscale0 ==
+Time stamping parameters for tailscale0:
+Capabilities:
+	software-transmit
+	software-receive
+	software-system-clock
+PTP Hardware Clock: none
+Hardware Transmit Timestamp Modes: none
+Hardware Receive Filter Modes: none
+
 == device wld0 summary ==
 operstate: up
 mtu:       1500
@@ -50,7 +276,7 @@ tx queues: 1
 01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4387 802.11ax Dual Band Wireless LAN Controller (rev 07)
 
 == ip -details link show dev wld0 ==
-2: wld0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000
+3: wld0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000
     link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff permaddr xx:xx:xx:xx:xx:xx promiscuity 0 allmulti 0 minmtu 68 maxmtu 1500 netns-immutable addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 parentbus pci parentdev 0000:01:00.0
     altname wlp1s0f0
     altname wlxxxxxxxxxxxxx
diff --git a/results/numa_affinity_study.txt b/results/numa_affinity_study.txt
index f95a440..5b4c8b5 100644
--- a/results/numa_affinity_study.txt
+++ b/results/numa_affinity_study.txt
@@ -1,4 +1,4 @@
-Command:     QSL_NUMA_ALLOW_CONSTRAINED=1 QSL_NUMA_BIN=build/bench/qsl-bench make numa-study
+Command:     QSL_NUMA_BIN=build/bench/qsl-bench make numa-study
 Evidence class: linux-constrained
 Host support summary: Linux host, constrained evidence
 Hardware:    aarch64
@@ -7,12 +7,12 @@ CPU:         Avalanche-M2
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:ff7a0a6b696ef700cd7bb568a531cf2e06ea16932d1031a8cfe85be6e0d21b91
+Git commit (informational): f9f7e98
+Source digest: sha256:0b9e8373fa304d7e734399a92e3b7f8bc8f4c6ee538621ad53cc35b443c67909
 Source digest scope: numa-affinity-study
 Dirty inputs: no
 Generated output: results/numa_affinity_study.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:30:17Z
 Benchmark binary: build/bench/qsl-bench
 Allowed CPUs: 0-7
 CPU chosen:  0
@@ -50,18 +50,18 @@ Pinned command:
 taskset -c 0 build/bench/qsl-bench
 
 Unpinned benchmark output:
-order_book add/mod/cancel     200000 ops        138.0 ns/op        7248959 ops/sec
-protocol encode+decode        500000 ops         20.9 ns/op       47914709 ops/sec
-gateway session (fill)        200000 ops        128.2 ns/op        7800869 ops/sec
-matching engine flow            5004 items      101.7 ns/item      9834865 items/sec
-replay command log              5004 items      113.7 ns/item      8798241 items/sec
+order_book add/mod/cancel     200000 ops        113.5 ns/op        8807360 ops/sec
+protocol encode+decode        500000 ops         20.0 ns/op       50088843 ops/sec
+gateway session (fill)        200000 ops        115.6 ns/op        8652239 ops/sec
+matching engine flow            5004 items       93.6 ns/item     10682213 items/sec
+replay command log              5004 items      101.4 ns/item      9862110 items/sec
 
 Pinned benchmark output:
-order_book add/mod/cancel     200000 ops        143.3 ns/op        6976774 ops/sec
-protocol encode+decode        500000 ops         27.7 ns/op       36063756 ops/sec
-gateway session (fill)        200000 ops        236.9 ns/op        4220492 ops/sec
-matching engine flow            5004 items      187.1 ns/item      5345523 items/sec
-replay command log              5004 items      221.8 ns/item      4508370 items/sec
+order_book add/mod/cancel     200000 ops        234.2 ns/op        4269507 ops/sec
+protocol encode+decode        500000 ops         29.9 ns/op       33473413 ops/sec
+gateway session (fill)        200000 ops        219.4 ns/op        4558316 ops/sec
+matching engine flow            5004 items      168.2 ns/item      5946192 items/sec
+replay command log              5004 items      187.6 ns/item      5329445 items/sec
 
 NUMA local benchmark output:
 NUMA node-local/remote binding skipped: fewer than two NUMA nodes found
@@ -76,10 +76,10 @@ Unpinned perf stat output:
                  0      context-switches:u
                  0      cpu-migrations:u
 
-       0.084315551 seconds time elapsed
+       0.095650252 seconds time elapsed
 
-       0.084129000 seconds user
-       0.000000000 seconds sys
+       0.094408000 seconds user
+       0.000983000 seconds sys
 
 
 
@@ -90,10 +90,10 @@ Pinned perf stat output:
                  0      context-switches:u
                  0      cpu-migrations:u
 
-       0.154719226 seconds time elapsed
+       0.144623525 seconds time elapsed
 
-       0.152299000 seconds user
-       0.001988000 seconds sys
+       0.141344000 seconds user
+       0.002985000 seconds sys
 
 
 
@@ -111,7 +111,7 @@ Core(s) per socket:                      4
 Socket(s):                               1
 Stepping:                                0x1
 Frequency boost:                         disabled
-CPU(s) scaling MHz:                      53%
+CPU(s) scaling MHz:                      100%
 CPU max MHz:                             2424.0000
 CPU min MHz:                             600.0000
 BogoMIPS:                                48.00
@@ -156,7 +156,7 @@ numactl --hardware output:
 available: 1 nodes (0)
 node 0 cpus: 0 1 2 3 4 5 6 7
 node 0 size: 7481 MB
-node 0 free: 1620 MB
+node 0 free: 2023 MB
 node distances:
 node     0
    0:   10
diff --git a/results/perf_report_linux.txt b/results/perf_report_linux.txt
index 92e4c04..3bd5be1 100644
--- a/results/perf_report_linux.txt
+++ b/results/perf_report_linux.txt
@@ -8,18 +8,18 @@ Perf:          perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type:    Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:c991d51c8076952f2c3dcd5e407f78e512d9fb4e573cb7ad65f7a700a9ed37a2
+Git commit (informational): f9f7e98
+Source digest: sha256:1837aa008369e0029dd4a16e7e780bacac293688e03351b88dbb4c586fbbf34e
 Source digest scope: perf-record-benchmark
 Dirty inputs: no
 Generated output: results/perf_report_linux.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:30:02Z
 Benchmark binary: build/bench/qsl-bench
 Benchmark status: 0
 Dataset:       qsl-bench default synthetic benchmark suite
 Record event:  cpu-clock
 Sample freq:   2000 Hz
-Sample count:  186
+Sample count:  188
 Minimum samples for hot profile: 100
 Insufficient samples: no
 Report limit:  1%
@@ -34,22 +34,22 @@ cpu-clock event is a software sampling profile for hot-symbol investigation,
 not a latency or throughput measurement.
 
 Benchmark output:
-order_book add/mod/cancel     200000 ops        141.9 ns/op        7047598 ops/sec
-protocol encode+decode        500000 ops         21.3 ns/op       47032627 ops/sec
-gateway session (fill)        200000 ops        129.1 ns/op        7743908 ops/sec
-matching engine flow            5004 items      103.0 ns/item      9713046 items/sec
-replay command log              5004 items      112.8 ns/item      8863630 items/sec
+order_book add/mod/cancel     200000 ops        131.2 ns/op        7621590 ops/sec
+protocol encode+decode        500000 ops         20.1 ns/op       49771400 ops/sec
+gateway session (fill)        200000 ops        118.7 ns/op        8425787 ops/sec
+matching engine flow            5004 items       95.1 ns/item     10520557 items/sec
+replay command log              5004 items       99.8 ns/item     10021725 items/sec
 
 Benchmark output under perf:
-order_book add/mod/cancel     200000 ops        112.7 ns/op        8873425 ops/sec
-protocol encode+decode        500000 ops         21.0 ns/op       47551868 ops/sec
-gateway session (fill)        200000 ops        127.6 ns/op        7833933 ops/sec
-matching engine flow            5004 items      101.1 ns/item      9892789 items/sec
-replay command log              5004 items      119.5 ns/item      8368038 items/sec
+order_book add/mod/cancel     200000 ops        139.1 ns/op        7190560 ops/sec
+protocol encode+decode        500000 ops         20.5 ns/op       48888534 ops/sec
+gateway session (fill)        200000 ops        117.5 ns/op        8511050 ops/sec
+matching engine flow            5004 items       92.2 ns/item     10847835 items/sec
+replay command log              5004 items       97.9 ns/item     10213751 items/sec
 
 perf record stderr:
 [ perf record: Woken up 1 times to write data ]
-[ perf record: Captured and wrote 0.028 MB build/perf/qsl-bench.perf.data (186 samples) ]
+[ perf record: Captured and wrote 0.027 MB build/perf/qsl-bench.perf.data (188 samples) ]
 
 perf report stderr:
 
@@ -59,15 +59,20 @@ perf report output:
 #
 # Total Lost Samples: 0
 #
-# Samples: 186  of event 'cpu-clock:u'
-# Event count (approx.): 93000000
+# Samples: 188  of event 'cpu-clock:u'
+# Event count (approx.): 94000000
 #
-# Overhead  Symbol                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Shared Object          IPC   [IPC Coverage]
-# ........  .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................  .....................  ....................
+# Overhead  Symbol                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Shared Object          IPC   [IPC Coverage]
+# ........  ...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................  .....................  ....................
 #
-    10.22%  [.] cfree@GLIBC_2.17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     libc.so.6              -      -
+    12.23%  [.] cfree@GLIBC_2.17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         libc.so.6              -      -
             |
-            |--1.61%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+            |--3.19%--main
+            |          __libc_start_call_main
+            |          __libc_start_main@@GLIBC_2.34
+            |          _start
+            |
+            |--1.60%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
             |          decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
             |          qsl::engine::OrderBook::cancel(unsigned long)
             |          main
@@ -75,7 +80,7 @@ perf report output:
             |          __libc_start_main@@GLIBC_2.34
             |          _start
             |
-            |--1.08%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+            |--1.06%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
             |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
             |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
             |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
@@ -84,7 +89,7 @@ perf report output:
             |          __libc_start_main@@GLIBC_2.34
             |          _start
             |
-            |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0]
+            |--1.06%--qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0]
             |          qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
             |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
             |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
@@ -94,160 +99,93 @@ perf report output:
             |          __libc_start_main@@GLIBC_2.34
             |          _start
             |
-            |--1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
-            |          main
-            |          __libc_start_call_main
-            |          __libc_start_main@@GLIBC_2.34
-            |          _start
-            |
-            |--1.08%--std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*)
-            |          qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+            |--1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
             |
-             --1.08%--main
+             --1.06%--0x5000000402b63
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     9.68%  [.] malloc                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               libc.so.6              -      -
+     6.91%  [.] malloc                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   libc.so.6              -      -
             |
-            |--6.45%--operator new(unsigned long)
-            |          |
-            |          |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-            |          |          |
-            |          |           --1.61%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-            |          |                     qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-            |          |                     main
-            |          |                     __libc_start_call_main
-            |          |                     __libc_start_main@@GLIBC_2.34
-            |          |                     _start
-            |          |
-            |          |--1.61%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-            |          |          qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-            |          |          qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-            |          |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-            |          |          main
-            |          |          __libc_start_call_main
-            |          |          __libc_start_main@@GLIBC_2.34
-            |          |          _start
+            |--3.72%--operator new(unsigned long)
             |          |
-            |          |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0]
-            |          |          qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-            |          |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-            |          |          main
-            |          |          __libc_start_call_main
-            |          |          __libc_start_main@@GLIBC_2.34
-            |          |          _start
-            |          |
-            |           --1.08%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
+            |           --1.60%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
             |                     main
             |                     __libc_start_call_main
             |                     __libc_start_main@@GLIBC_2.34
             |                     _start
             |
-             --3.23%--__posix_memalign
+             --3.19%--__posix_memalign
                        operator new(unsigned long, std::align_val_t)
                        |
-                       |--1.61%--std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
-                       |          qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+                       |--1.60%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
                        |          qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
                        |          |
-                       |           --1.08%--main
+                       |           --1.06%--main
                        |                     __libc_start_call_main
                        |                     __libc_start_main@@GLIBC_2.34
                        |                     _start
                        |
-                        --1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+                        --1.06%--qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
+                                  qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
                                   qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+     5.32%  [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            qsl-bench              -      -
+            |
+            |--3.19%--main
+            |          __libc_start_call_main
+            |          __libc_start_main@@GLIBC_2.34
+            |          _start
+            |
+             --1.60%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+                       |
+                        --1.06%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+                                  qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
                                   main
                                   __libc_start_call_main
                                   __libc_start_main@@GLIBC_2.34
                                   _start
 
-     8.60%  [.] qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  qsl-bench              -      -
+     4.79%  [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        qsl-bench              -      -
+            |
+            ---main
+               __libc_start_call_main
+               __libc_start_main@@GLIBC_2.34
+               _start
+
+     3.72%  [.] operator new(unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              libstdc++.so.6.0.35    -      -
             |
-            |--6.99%--main
+            |--1.06%--qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&)
+            |          qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+            |          qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |          qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |          qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+            |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+            |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+            |          main
             |          __libc_start_call_main
             |          __libc_start_main@@GLIBC_2.34
             |          _start
             |
-             --1.61%--qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                       qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                       qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+             --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     4.84%  [.] qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       qsl-bench              -      -
-            |
-             --4.30%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-                       |
-                        --3.76%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-                                  |
-                                  |--2.15%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-                                  |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                                  |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                                  |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-                                  |          main
-                                  |          __libc_start_call_main
-                                  |          __libc_start_main@@GLIBC_2.34
-                                  |          _start
-                                  |
-                                   --1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
-                                             |
-                                              --1.08%--main
-                                                        __libc_start_call_main
-                                                        __libc_start_main@@GLIBC_2.34
-                                                        _start
-
-     4.30%  [.] malloc@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           libstdc++.so.6.0.35    -      -
-            |
-            ---operator new(unsigned long)
-               |
-               |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-               |          |
-               |          |--1.08%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
-               |          |
-               |           --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-               |                     qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-               |                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-               |                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-               |                     main
-               |                     __libc_start_call_main
-               |                     __libc_start_main@@GLIBC_2.34
-               |                     _start
-               |
-                --1.08%--qsl::protocol::encode(qsl::protocol::Fill const&)
-                          qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
-                          main
-                          __libc_start_call_main
-                          __libc_start_main@@GLIBC_2.34
-                          _start
-
-     2.69%  [.] operator new(unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          libstdc++.so.6.0.35    -      -
-            |
-             --1.08%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-
-     2.69%  [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     qsl-bench              -      -
+     3.72%  [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         qsl-bench              -      -
             |
-            |--1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+            |--2.13%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
             |          |
-            |           --1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
+            |           --1.60%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
             |                     main
             |                     __libc_start_call_main
             |                     __libc_start_main@@GLIBC_2.34
             |                     _start
             |
-             --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+             --1.60%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
                        qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
                        qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
                        qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
@@ -256,200 +194,245 @@ perf report output:
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     2.69%  [.] qsl::engine::OrderBook::contains(unsigned long) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                qsl-bench              -      -
+     3.72%  [.] qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         qsl-bench              -      -
             |
-             --1.61%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
+            |--2.13%--qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+            |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+            |          main
+            |          __libc_start_call_main
+            |          __libc_start_main@@GLIBC_2.34
+            |          _start
+            |
+             --1.60%--qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+                       qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+                       qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     2.15%  [.] __posix_memalign                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     libc.so.6              -      -
+     3.19%  [.] malloc@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               libstdc++.so.6.0.35    -      -
             |
-            ---operator new(unsigned long, std::align_val_t)
+            ---operator new(unsigned long)
                |
-                --1.08%--std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
-                          qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
-                          qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-
-     2.15%  [.] main                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 qsl-bench              -      -
-            |
-            ---__libc_start_call_main
-               __libc_start_main@@GLIBC_2.34
-               _start
-
-     2.15%  [.] operator delete(void*)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           libstdc++.so.6.0.35    -      -
-     2.15%  [.] operator delete(void*, unsigned long, std::align_val_t)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          libstdc++.so.6.0.35    -      -
-            |
-            |--1.08%--std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*)
-            |          decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
-            |          qsl::engine::OrderBook::cancel(unsigned long)
-            |
-             --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
-                       decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
-                       qsl::engine::OrderBook::cancel(unsigned long)
-
-     2.15%  [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    qsl-bench              -      -
-            |
-            ---main
-               __libc_start_call_main
-               __libc_start_main@@GLIBC_2.34
-               _start
-
-     2.15%  [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*)                                                                                                                                                                                                                                                                                                                                              qsl-bench              -      -
-            |
-             --1.61%--decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
-                       |
-                        --1.08%--qsl::engine::OrderBook::cancel(unsigned long)
-
-     2.15%  [.] std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                libstdc++.so.6.0.35    -      -
-            |
-            ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
-               decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
-               qsl::engine::OrderBook::cancel(unsigned long)
-               main
-               __libc_start_call_main
-               __libc_start_main@@GLIBC_2.34
-               _start
+               |--1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+               |          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               |          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+               |          main
+               |          __libc_start_call_main
+               |          __libc_start_main@@GLIBC_2.34
+               |          _start
+               |
+                --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
+                          main
+                          __libc_start_call_main
+                          __libc_start_main@@GLIBC_2.34
+                          _start
 
-     2.15%  [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)                                                                                                                                                                                                                                                                                                                                                                                                                                                            qsl-bench              -      -
+     3.19%  [.] qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      qsl-bench              -      -
             |
             ---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
                qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-               main
-               __libc_start_call_main
-               __libc_start_main@@GLIBC_2.34
-               _start
+               |
+               |--2.13%--main
+               |          __libc_start_call_main
+               |          __libc_start_main@@GLIBC_2.34
+               |          _start
+               |
+                --1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+                          qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
 
-     1.61%  [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]                                                                                                                                                                                                                                                                                                                                            qsl-bench              -      -
+     3.19%  [.] qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           qsl-bench              -      -
             |
-            ---qsl::engine::OrderBook::cancel(unsigned long)
-               main
-               __libc_start_call_main
-               __libc_start_main@@GLIBC_2.34
-               _start
+            ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+               |
+                --2.66%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+                          |
+                          |--1.60%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+                          |          |
+                          |           --1.06%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
+                          |                     main
+                          |                     __libc_start_call_main
+                          |                     __libc_start_main@@GLIBC_2.34
+                          |                     _start
+                          |
+                           --1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+                                     qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+                                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+                                     qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+                                     main
+                                     __libc_start_call_main
+                                     __libc_start_main@@GLIBC_2.34
+                                     _start
+
+     3.19%  [.] qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         qsl-bench              -      -
+            |
+            ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+               |
+                --2.66%--main
+                          __libc_start_call_main
+                          __libc_start_main@@GLIBC_2.34
+                          _start
 
-     1.61%  [.] memcpy@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           qsl-bench              -      -
-     1.61%  [.] operator new(unsigned long, std::align_val_t)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        libstdc++.so.6.0.35    -      -
+     2.66%  [.] _mid_memalign                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            libc.so.6              -      -
             |
-             --1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+            |--1.60%--0x2fffff346e1a63
+            |          operator new(unsigned long, std::align_val_t)
+            |          qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
+            |          qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+            |          qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |          main
+            |          __libc_start_call_main
+            |          __libc_start_main@@GLIBC_2.34
+            |          _start
+            |
+             --1.06%--0x63ffff346e1a63
+                       operator new(unsigned long, std::align_val_t)
+                       std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
+                       qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
                        qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     1.61%  [.] operator new(unsigned long, std::align_val_t)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    libstdc++.so.6.0.35    -      -
+     2.66%  [.] qsl::engine::OrderBook::contains(unsigned long) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    qsl-bench              -      -
             |
-             --1.08%--std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&)
-                       qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
-                       qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
-                       qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+             --2.13%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     1.61%  [.] qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 qsl-bench              -      -
+     2.66%  [.] qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      qsl-bench              -      -
             |
-            ---qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
-               qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-               qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            ---decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
                |
-                --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-                          qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                          qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+                --2.13%--qsl::engine::OrderBook::cancel(unsigned long)
                           main
                           __libc_start_call_main
                           __libc_start_main@@GLIBC_2.34
                           _start
 
-     1.61%  [.] qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     qsl-bench              -      -
+     2.66%  [.] std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    libstdc++.so.6.0.35    -      -
             |
-             --1.08%--qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-                       qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+            |--1.60%--qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+            |          |
+            |           --1.06%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |                     qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |                     qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+            |                     main
+            |                     __libc_start_call_main
+            |                     __libc_start_main@@GLIBC_2.34
+            |                     _start
+            |
+             --1.06%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+                       decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+                       qsl::engine::OrderBook::cancel(unsigned long)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     1.61%  [.] std::__detail::_List_node_base::_M_unhook()@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      qsl-bench              -      -
+     2.13%  [.] __posix_memalign                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         libc.so.6              -      -
             |
-             --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
-                       decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
-                       qsl::engine::OrderBook::cancel(unsigned long)
+            |--1.06%--operator new(unsigned long, std::align_val_t)
+            |          std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
+            |          qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+            |          qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            |
+             --1.06%--0x14ffff349d51d3
+                       qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+                       qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
                        main
                        __libc_start_call_main
                        __libc_start_main@@GLIBC_2.34
                        _start
 
-     1.61%  [.] std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&)                                                                                                        qsl-bench              -      -
+     2.13%  [.] operator delete(void*)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               libstdc++.so.6.0.35    -      -
             |
-            ---qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
-               qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
-               qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-               |
-                --1.08%--qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
-                          qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int)
-                          qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
-                          qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
-                          main
-                          __libc_start_call_main
-                          __libc_start_main@@GLIBC_2.34
-                          _start
+             --1.60%--main
+                       __libc_start_call_main
+                       __libc_start_main@@GLIBC_2.34
+                       _start
 
-     1.08%  [.] __memcpy_generic                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     libc.so.6              -      -
-     1.08%  [.] _mid_memalign                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        libc.so.6              -      -
+     2.13%  [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  qsl-bench              -      -
             |
-            ---__posix_memalign
-               operator new(unsigned long, std::align_val_t)
-               std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
-               qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
-               qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+            ---qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
                main
                __libc_start_call_main
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] operator delete(void*, unsigned long)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            qsl-bench              -      -
-     1.08%  [.] qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     qsl-bench              -      -
+     1.60%  [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]                                                                                qsl-bench              -      -
             |
-            ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+             --1.06%--qsl::engine::OrderBook::cancel(unsigned long)
+                       main
+                       __libc_start_call_main
+                       __libc_start_main@@GLIBC_2.34
+                       _start
 
-     1.08%  [.] qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             qsl-bench              -      -
-     1.08%  [.] qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  qsl-bench              -      -
+     1.60%  [.] free@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 libstdc++.so.6.0.35    -      -
+     1.60%  [.] operator delete(void*, unsigned long, std::align_val_t)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              libstdc++.so.6.0.35    -      -
+     1.60%  [.] qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               qsl-bench              -      -
             |
-            ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
-               qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&)
+            ---qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+               qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+               qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
                main
                __libc_start_call_main
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        qsl-bench              -      -
+     1.60%  [.] qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  qsl-bench              -      -
             |
-            ---qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
-               qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int)
-               qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+            ---qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
                main
                __libc_start_call_main
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              qsl-bench              -      -
+     1.60%  [.] qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      qsl-bench              -      -
             |
-            ---qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
-               qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
+             --1.06%--main
+                       __libc_start_call_main
+                       __libc_start_main@@GLIBC_2.34
+                       _start
+
+     1.60%  [.] qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     qsl-bench              -      -
+            |
+            ---main
+               __libc_start_call_main
+               __libc_start_main@@GLIBC_2.34
+               _start
+
+     1.06%  [.] __memcpy_generic                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         libc.so.6              -      -
+     1.06%  [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0]  qsl-bench              -      -
+            |
+            ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+     1.06%  [.] operator new(unsigned long)@plt                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          qsl-bench              -      -
+     1.06%  [.] qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         qsl-bench              -      -
+            |
+            ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&)
+
+     1.06%  [.] qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 qsl-bench              -      -
+            |
+            ---qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
                main
                __libc_start_call_main
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      qsl-bench              -      -
+     1.06%  [.] qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               qsl-bench              -      -
             |
-            ---qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+            ---qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
+               qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
                qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long)
                qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>)
                main
@@ -457,15 +440,27 @@ perf report output:
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               qsl-bench              -      -
+     1.06%  [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   qsl-bench              -      -
             |
             ---main
                __libc_start_call_main
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               libstdc++.so.6.0.35    -      -
-     1.08%  [.] std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             libstdc++.so.6.0.35    -      -
+     1.06%  [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*)                                                                                  qsl-bench              -      -
+     1.06%  [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long)                                                                        qsl-bench              -      -
+            |
+            ---std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)
+               qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+               qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+     1.06%  [.] std::__detail::_List_node_base::_M_unhook()                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              libstdc++.so.6.0.35    -      -
+            |
+            ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+               decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+               qsl::engine::OrderBook::cancel(unsigned long)
+
+     1.06%  [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&)                                                                                                                                                                                                qsl-bench              -      -
             |
             ---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
                qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
@@ -474,9 +469,8 @@ perf report output:
                __libc_start_main@@GLIBC_2.34
                _start
 
-     1.08%  [.] std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    libstdc++.so.6.0.35    -      -
 
 
 #
-# (Tip: Compare performance results with: perf diff [<old file> <new file>])
+# (Tip: To see list of saved events and attributes: perf evlist -v)
 #
diff --git a/results/perf_stat_linux.txt b/results/perf_stat_linux.txt
index ddcd14b..1c6f521 100644
--- a/results/perf_stat_linux.txt
+++ b/results/perf_stat_linux.txt
@@ -8,12 +8,12 @@ Perf:        perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:d8856d2f599416a9e74050726279a67f88d61a3bb3d06de86eb3bf948d2a16a5
+Git commit (informational): f9f7e98
+Source digest: sha256:59d9fdbc9d64b974bd28094e55610cced29a381c6e2ec968092862a975bde281
 Source digest scope: perf-stat-benchmark
 Dirty inputs: no
 Generated output: results/perf_stat_linux.txt
-Date: 2026-06-21T05:25:23Z
+Date: 2026-06-25T02:30:17Z
 Benchmark binary: build/bench/qsl-bench
 Benchmark status: 0
 Dataset:     qsl-bench default synthetic benchmark suite
@@ -28,39 +28,39 @@ until every requested counter is supported. Profiling evidence for investigation
 not a production-latency claim.
 
 Benchmark output:
-order_book add/mod/cancel     200000 ops        143.9 ns/op        6947208 ops/sec
-protocol encode+decode        500000 ops         21.5 ns/op       46599221 ops/sec
-gateway session (fill)        200000 ops        129.7 ns/op        7710496 ops/sec
-matching engine flow            5004 items      102.4 ns/item      9769779 items/sec
-replay command log              5004 items      110.3 ns/item      9064737 items/sec
+order_book add/mod/cancel     200000 ops        137.2 ns/op        7288189 ops/sec
+protocol encode+decode        500000 ops         21.2 ns/op       47087754 ops/sec
+gateway session (fill)        200000 ops        120.8 ns/op        8277978 ops/sec
+matching engine flow            5004 items       93.2 ns/item     10733865 items/sec
+replay command log              5004 items       97.1 ns/item     10294783 items/sec
 
 Benchmark output under perf:
-order_book add/mod/cancel     200000 ops         92.7 ns/op       10785353 ops/sec
-protocol encode+decode        500000 ops         16.3 ns/op       61508483 ops/sec
-gateway session (fill)        200000 ops        110.8 ns/op        9023997 ops/sec
-matching engine flow            5004 items       98.1 ns/item     10190493 items/sec
-replay command log              5004 items      109.4 ns/item      9137639 items/sec
+order_book add/mod/cancel     200000 ops        121.9 ns/op        8202972 ops/sec
+protocol encode+decode        500000 ops         21.0 ns/op       47563493 ops/sec
+gateway session (fill)        200000 ops        120.9 ns/op        8269791 ops/sec
+matching engine flow            5004 items       95.8 ns/item     10437399 items/sec
+replay command log              5004 items       99.6 ns/item     10043348 items/sec
 
 perf stat output:
 
  Performance counter stats for 'build/bench/qsl-bench':
 
-       233,479,932      apple_avalanche_pmu/cycles/u
+       221,558,456      apple_avalanche_pmu/cycles/u
      <not counted>      apple_blizzard_pmu/cycles/u                                             (0.00%)
-     1,247,839,058      apple_avalanche_pmu/instructions/u
+     1,160,776,150      apple_avalanche_pmu/instructions/u
      <not counted>      apple_blizzard_pmu/instructions/u                                        (0.00%)
-       245,495,434      apple_avalanche_pmu/branches/u
+       233,032,815      apple_avalanche_pmu/branches/u
      <not counted>      apple_blizzard_pmu/branches/u                                           (0.00%)
-         1,272,574      apple_avalanche_pmu/branch-misses/u
+         1,143,050      apple_avalanche_pmu/branch-misses/u
      <not counted>      apple_blizzard_pmu/branch-misses/u                                        (0.00%)
    <not supported>      apple_avalanche_pmu/cache-references/u
    <not supported>      apple_blizzard_pmu/cache-references/u
    <not supported>      apple_avalanche_pmu/cache-misses/u
    <not supported>      apple_blizzard_pmu/cache-misses/u
                  0      context-switches:u
-               208      page-faults:u
+               229      page-faults:u
 
-       0.081496718 seconds time elapsed
+       0.091141580 seconds time elapsed
 
-       0.080390000 seconds user
-       0.000988000 seconds sys
+       0.090001000 seconds user
+       0.001001000 seconds sys
diff --git a/results/pool_backed_storage.txt b/results/pool_backed_storage.txt
index ceb42ed..aabf682 100644
--- a/results/pool_backed_storage.txt
+++ b/results/pool_backed_storage.txt
@@ -4,12 +4,12 @@ OS:          Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:0596425b8906deaa0bc34f841889eaddd11efa1fc39854c7f92327de1f69ad4d
+Git commit (informational): f9f7e98
+Source digest: sha256:c1e4cd7db8472a87cbd23ece3a2d4b330f78ad876b58da412e0e54f6c4eb4cf7
 Source digest scope: order-book-storage-benchmark
 Dirty inputs: no
 Generated output: results/pool_backed_storage.txt
-Date: 2026-06-21T05:25:22Z
+Date: 2026-06-25T02:29:36Z
 Dataset:     deterministic storage workloads (general, dense, sparse, cancel/modify, match/traversal)
 Scenario:    baseline OrderBook storage vs PMR pooled nodes vs intrusive OrderPool nodes vs contiguous price-indexed storage
 Warmup:      one full workload replay per storage mode before timing
@@ -43,39 +43,39 @@ Scenario / Metric / Result:
 Workload: general generated flow (seed=42)
 Purpose:  Existing deterministic generated engine flow; mixed insert, match, cancel, modify, IOC, and market activity.
 Shape:    commands=5000 events=7155 accepted=3517 trades=2238 cancel_cmds=793 modify_cmds=690 market_orders=602 ioc_orders=376 canceled_events=710 modified_events=690 final_resting=37 max_resting=72 max_bid_levels=21 max_ask_levels=22 avg_bid_levels=11.2 avg_ask_levels=12.4 max_active_levels=41 price_width=67 price_density=0.076 top_probe_calls=0
-general generated flow   baseline              5000 cmds    30 reps median     99.4 ns/cmd min     98.5 max    102.3     10059512 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
-general generated flow   pooled pmr            5000 cmds    30 reps median    114.0 ns/cmd min    113.1 max    116.2      8771914 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
-general generated flow   intrusive pool        5000 cmds    30 reps median     82.5 ns/cmd min     81.4 max     88.3     12128563 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
-general generated flow   contiguous            5000 cmds    30 reps median     73.3 ns/cmd min     72.4 max     76.3     13644128 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
+general generated flow   baseline              5000 cmds    30 reps median     89.4 ns/cmd min     88.4 max     91.6     11185657 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
+general generated flow   pooled pmr            5000 cmds    30 reps median    100.0 ns/cmd min     99.1 max    102.2      9997481 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
+general generated flow   intrusive pool        5000 cmds    30 reps median     80.3 ns/cmd min     79.6 max     84.4     12454572 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
+general generated flow   contiguous            5000 cmds    30 reps median     71.2 ns/cmd min     69.9 max     73.4     14053155 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0
 
 Workload: dense bounded flow (seed=4702)
 Purpose:  Small bounded price domain with many live levels, repeated same-price operations, and top-of-book probes after every command.
 Shape:    commands=5002 events=5558 accepted=4018 trades=1048 cancel_cmds=0 modify_cmds=492 market_orders=984 ioc_orders=492 canceled_events=0 modified_events=492 final_resting=2264 max_resting=2264 max_bid_levels=40 max_ask_levels=40 avg_bid_levels=39.2 avg_ask_levels=38.5 max_active_levels=80 price_width=136 price_density=0.147 top_probe_calls=20008
-dense bounded flow       baseline              5002 cmds    30 reps median     75.4 ns/cmd min     74.8 max     76.3     13270826 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
-dense bounded flow       pooled pmr            5002 cmds    30 reps median     78.5 ns/cmd min     78.3 max     80.6     12742618 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
-dense bounded flow       intrusive pool        5002 cmds    30 reps median     52.8 ns/cmd min     52.2 max     53.9     18952928 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
-dense bounded flow       contiguous            5002 cmds    30 reps median     57.9 ns/cmd min     57.4 max     58.8     17260715 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
+dense bounded flow       baseline              5002 cmds    30 reps median     66.1 ns/cmd min     65.9 max     67.3     15128922 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
+dense bounded flow       pooled pmr            5002 cmds    30 reps median     66.3 ns/cmd min     66.1 max     67.1     15075710 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
+dense bounded flow       intrusive pool        5002 cmds    30 reps median     52.2 ns/cmd min     51.8 max     53.4     19161667 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
+dense bounded flow       contiguous            5002 cmds    30 reps median     57.8 ns/cmd min     57.2 max     58.8     17310414 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008
 
 Workload: sparse wide flow (seed=4703)
 Purpose:  Wide in-band price domain with few active levels and many gaps.
 Shape:    commands=5000 events=5000 accepted=3344 trades=0 cancel_cmds=828 modify_cmds=828 market_orders=0 ioc_orders=0 canceled_events=828 modified_events=828 final_resting=2516 max_resting=2517 max_bid_levels=16 max_ask_levels=16 avg_bid_levels=7.8 avg_ask_levels=7.5 max_active_levels=32 price_width=985 price_density=0.004 top_probe_calls=0
-sparse wide flow         baseline              5000 cmds    30 reps median     64.1 ns/cmd min     63.6 max     65.2     15606711 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
-sparse wide flow         pooled pmr            5000 cmds    30 reps median     69.3 ns/cmd min     69.0 max     71.2     14419611 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
-sparse wide flow         intrusive pool        5000 cmds    30 reps median     40.9 ns/cmd min     40.1 max     43.0     24430047 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
-sparse wide flow         contiguous            5000 cmds    30 reps median     39.4 ns/cmd min     38.8 max     41.2     25359213 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
+sparse wide flow         baseline              5000 cmds    30 reps median     55.7 ns/cmd min     55.1 max     56.5     17964093 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
+sparse wide flow         pooled pmr            5000 cmds    30 reps median     57.9 ns/cmd min     57.6 max     60.7     17263703 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
+sparse wide flow         intrusive pool        5000 cmds    30 reps median     42.8 ns/cmd min     42.1 max     45.5     23387217 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
+sparse wide flow         contiguous            5000 cmds    30 reps median     40.3 ns/cmd min     39.8 max     41.9     24829299 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0
 
 Workload: cancel/modify-heavy flow (seed=4704)
 Purpose:  Locator-heavy workload with frequent active cancels, in-place modifies, replenishment, and duplicate active ids.
 Shape:    commands=5001 events=4801 accepted=1599 trades=0 cancel_cmds=1599 modify_cmds=1603 market_orders=0 ioc_orders=0 canceled_events=1599 modified_events=1603 final_resting=0 max_resting=62 max_bid_levels=30 max_ask_levels=30 avg_bid_levels=1.9 avg_ask_levels=1.4 max_active_levels=60 price_width=30 price_density=0.333 top_probe_calls=0
-cancel/modify-heavy flow baseline              5001 cmds    30 reps median     46.2 ns/cmd min     46.0 max     47.9     21649351 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
-cancel/modify-heavy flow pooled pmr            5001 cmds    30 reps median     53.5 ns/cmd min     53.4 max     54.8     18680801 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
-cancel/modify-heavy flow intrusive pool        5001 cmds    30 reps median     36.2 ns/cmd min     35.1 max     38.7     27610766 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
-cancel/modify-heavy flow contiguous            5001 cmds    30 reps median     31.6 ns/cmd min     31.5 max     33.4     31618479 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
+cancel/modify-heavy flow baseline              5001 cmds    30 reps median     49.0 ns/cmd min     48.8 max     50.4     20419162 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
+cancel/modify-heavy flow pooled pmr            5001 cmds    30 reps median     54.7 ns/cmd min     54.6 max     55.8     18271296 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
+cancel/modify-heavy flow intrusive pool        5001 cmds    30 reps median     36.7 ns/cmd min     36.0 max     38.6     27284333 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
+cancel/modify-heavy flow contiguous            5001 cmds    30 reps median     31.4 ns/cmd min     31.1 max     33.0     31861824 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0
 
 Workload: match/traversal-heavy flow (seed=4705)
 Purpose:  Many small maker orders per level sweep, stressing level traversal and best-price maintenance.
 Shape:    commands=5003 events=9015 accepted=5003 trades=4012 cancel_cmds=0 modify_cmds=0 market_orders=494 ioc_orders=494 canceled_events=0 modified_events=0 final_resting=3 max_resting=76 max_bid_levels=20 max_ask_levels=40 avg_bid_levels=2.5 avg_ask_levels=5.4 max_active_levels=60 price_width=81 price_density=0.370 top_probe_calls=0
-match/traversal-heavy flow baseline              5003 cmds    30 reps median     98.7 ns/cmd min     96.7 max    101.1     10133500 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
-match/traversal-heavy flow pooled pmr            5003 cmds    30 reps median    115.8 ns/cmd min    114.8 max    117.1      8638895 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
-match/traversal-heavy flow intrusive pool        5003 cmds    30 reps median     70.1 ns/cmd min     68.3 max     73.3     14262013 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
-match/traversal-heavy flow contiguous            5003 cmds    30 reps median     59.8 ns/cmd min     59.5 max     60.2     16725449 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
+match/traversal-heavy flow baseline              5003 cmds    30 reps median     96.5 ns/cmd min     95.4 max     99.1     10361739 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
+match/traversal-heavy flow pooled pmr            5003 cmds    30 reps median    110.0 ns/cmd min    108.8 max    114.2      9094975 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
+match/traversal-heavy flow intrusive pool        5003 cmds    30 reps median     65.1 ns/cmd min     64.0 max     66.8     15350534 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
+match/traversal-heavy flow contiguous            5003 cmds    30 reps median     56.8 ns/cmd min     56.5 max     57.3     17603243 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0
diff --git a/results/recovery_benchmarks.txt b/results/recovery_benchmarks.txt
index 6796022..aa24eb8 100644
--- a/results/recovery_benchmarks.txt
+++ b/results/recovery_benchmarks.txt
@@ -4,12 +4,12 @@ OS:          Linux 6.19.14-400.asahi.fc44.aarch64+16k
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Release
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:66d9841df48c833aeecd1da299536f7a7b16300ea83683d1bc209580ae0bebfe
+Git commit (informational): f9f7e98
+Source digest: sha256:89cd2b8b43602479475abeb330a8f6c854339c62fe22867b5cbdd715b8e65cd4
 Source digest scope: recovery-benchmark
 Dirty inputs: no
 Generated output: results/recovery_benchmarks.txt
-Date: 2026-06-21T05:25:22Z
+Date: 2026-06-25T02:29:37Z
 Dataset:     deterministic generated flows (seed 42, 4 symbols, 5k/20k/80k commands)
              plus synthetic non-crossing resting books (1k/10k/50k resting orders)
 Scenario:    full-replay restart (recover_log_file + replay) vs in-memory book rebuild
@@ -30,35 +30,35 @@ and build-dependent; not a production recovery-time claim.
 
 Scenario / Metric / Result:
 log 5004 commands  file 0.24 MiB  resting 37 orders
-  recover_log_file (read+verify+classify)      10 reps        0.460 ms/run         92.0 ns/record
-  replay into fresh engine (decode+apply)      10 reps        0.709 ms/run        141.8 ns/command
-  full restart (recover + replay)              10 reps        1.017 ms/run        203.2 ns/command
-  capture resting state (snapshot proto)       10 reps        0.000 ms/run          6.4 ns/order
-  rebuild book from captured state             10 reps        0.006 ms/run        168.8 ns/order
+  recover_log_file (read+verify+classify)      10 reps        0.716 ms/run        143.1 ns/record
+  replay into fresh engine (decode+apply)      10 reps        0.894 ms/run        178.6 ns/command
+  full restart (recover + replay)              10 reps        1.239 ms/run        247.6 ns/command
+  capture resting state (snapshot proto)       10 reps        0.000 ms/run          8.0 ns/order
+  rebuild book from captured state             10 reps        0.008 ms/run        216.0 ns/order
 
 log 20004 commands  file 0.95 MiB  resting 30 orders
-  recover_log_file (read+verify+classify)      10 reps        1.331 ms/run         66.5 ns/record
-  replay into fresh engine (decode+apply)      10 reps        2.270 ms/run        113.5 ns/command
-  full restart (recover + replay)              10 reps        3.668 ms/run        183.4 ns/command
+  recover_log_file (read+verify+classify)      10 reps        1.478 ms/run         73.9 ns/record
+  replay into fresh engine (decode+apply)      10 reps        1.992 ms/run         99.6 ns/command
+  full restart (recover + replay)              10 reps        3.360 ms/run        168.0 ns/command
   capture resting state (snapshot proto)       10 reps        0.000 ms/run          5.7 ns/order
-  rebuild book from captured state             10 reps        0.004 ms/run        127.4 ns/order
+  rebuild book from captured state             10 reps        0.003 ms/run        111.5 ns/order
 
 log 80004 commands  file 3.81 MiB  resting 24 orders
-  recover_log_file (read+verify+classify)      10 reps        5.653 ms/run         70.7 ns/record
-  replay into fresh engine (decode+apply)      10 reps        9.050 ms/run        113.1 ns/command
-  full restart (recover + replay)              10 reps       14.736 ms/run        184.2 ns/command
-  capture resting state (snapshot proto)       10 reps        0.000 ms/run          6.6 ns/order
-  rebuild book from captured state             10 reps        0.003 ms/run        117.9 ns/order
+  recover_log_file (read+verify+classify)      10 reps        5.691 ms/run         71.1 ns/record
+  replay into fresh engine (decode+apply)      10 reps        7.931 ms/run         99.1 ns/command
+  full restart (recover + replay)              10 reps       13.734 ms/run        171.7 ns/command
+  capture resting state (snapshot proto)       10 reps        0.000 ms/run          6.4 ns/order
+  rebuild book from captured state             10 reps        0.002 ms/run        101.7 ns/order
 
 synthetic resting book  4 symbols  1000 resting orders
-  capture resting state (snapshot proto)       10 reps        0.002 ms/run          2.1 ns/order
-  rebuild book from captured state             10 reps        0.111 ms/run        110.9 ns/order
+  capture resting state (snapshot proto)       10 reps        0.002 ms/run          2.0 ns/order
+  rebuild book from captured state             10 reps        0.090 ms/run         90.4 ns/order
 
 synthetic resting book  4 symbols  10000 resting orders
-  capture resting state (snapshot proto)       10 reps        0.054 ms/run          5.4 ns/order
-  rebuild book from captured state             10 reps        0.996 ms/run         99.6 ns/order
+  capture resting state (snapshot proto)       10 reps        0.055 ms/run          5.5 ns/order
+  rebuild book from captured state             10 reps        0.722 ms/run         72.2 ns/order
 
 synthetic resting book  4 symbols  50000 resting orders
-  capture resting state (snapshot proto)       10 reps        0.382 ms/run          7.6 ns/order
-  rebuild book from captured state             10 reps        4.775 ms/run         95.5 ns/order
+  capture resting state (snapshot proto)       10 reps        0.376 ms/run          7.5 ns/order
+  rebuild book from captured state             10 reps        3.487 ms/run         69.7 ns/order
 
diff --git a/results/socket_load_summary.txt b/results/socket_load_summary.txt
index fc59226..bf8bb9c 100644
--- a/results/socket_load_summary.txt
+++ b/results/socket_load_summary.txt
@@ -7,12 +7,12 @@ Cores:       8
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Debug
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:bc03fe85ae7177600ecff2a3acca3c7f6892e071716ae5e63529732fea798607
+Git commit (informational): f9f7e98
+Source digest: sha256:221f3a2da54f889cdd448da347faccb9a81c38d5191fee759d69a66aaf54f677
 Source digest scope: socket-load
 Dirty inputs: no
 Generated output: results/socket_load_summary.txt
-Date: 2026-06-21T05:26:56Z
+Date: 2026-06-25T02:31:10Z
 Dataset:     synthetic order flow via qsl-client (NewOrder + Heartbeat per connection)
 Scenario:    concurrent short-lived client sweep across the threaded and epoll transports
 Metric:      best (min) wall time per cell, approximate conns/s, and completion ratio
@@ -30,14 +30,14 @@ path, not matching.
 
 mode       clients   wall(s,best)     conns/s(~)  completed
 -------    -------   ------------     ----------  ---------
-threaded         1         0.0037            270        1/1
-threaded         4         0.0084            476        4/4
-threaded         8         0.0187            428        8/8
-threaded        16         0.0416            385      16/16
-epoll            1         0.0038            263        1/1
-epoll            4         0.0093            430        4/4
-epoll            8         0.0197            406        8/8
-epoll           16         0.0340            471      16/16
+threaded         1         0.0044            227        1/1
+threaded         4         0.0085            471        4/4
+threaded         8         0.0154            519        8/8
+threaded        16         0.0396            404      16/16
+epoll            1         0.0046            217        1/1
+epoll            4         0.0070            571        4/4
+epoll            8         0.0146            548        8/8
+epoll           16         0.0385            416      16/16
 
 Reading the result: compare how the best wall time grows with the client count within each
 mode. At these small loopback counts connection setup dominates and the two modes can stay close
diff --git a/results/socket_profile_loopback.txt b/results/socket_profile_loopback.txt
index 7dd6c40..c14b1c6 100644
--- a/results/socket_profile_loopback.txt
+++ b/results/socket_profile_loopback.txt
@@ -8,12 +8,12 @@ Build type:  Debug
 strace:      strace -- version 7.1
 CLK_TCK:     100
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:b5e71f8f19427217a89854903a73fcfa796acf2217b706b718acd93c3410fa4c
+Git commit (informational): f9f7e98
+Source digest: sha256:def5b5aa6ee476f193dc5e8a3054389db19c1bfd5af58d3355a5d06a1b9419df
 Source digest scope: gateway-io-profile
 Dirty inputs: no
 Generated output: results/socket_profile_loopback.txt
-Date: 2026-06-21T05:25:30Z
+Date: 2026-06-25T02:31:16Z
 Transport:   TCP over 127.0.0.1 (loopback), portable threaded TcpServer
 Load:        500 sequential client round trips (NewOrder + Heartbeat each)
 
@@ -22,12 +22,12 @@ User (engine-side) vs System (kernel/socket) CPU time splits user-space matching
 from time spent in the kernel servicing accept/read/write/close on the socket path.
 
 User (engine-side) CPU time:   0.000 s  (0 ticks)
-System (kernel/socket) CPU time: 0.040 s  (4 ticks)
+System (kernel/socket) CPU time: 0.030 s  (3 ticks)
 System share of CPU:           100.0%
-Minor page faults: 172    Major page faults: 1
-VmHWM:	    3872 kB
-voluntary_ctxt_switches:	503
-nonvoluntary_ctxt_switches:	12
+Minor page faults: 172    Major page faults: 0
+VmHWM:	    3840 kB
+voluntary_ctxt_switches:	502
+nonvoluntary_ctxt_switches:	0
 
 == Pass 2: strace -f -c (syscall mix on the gateway socket path) ==
 Call counts and in-kernel time per syscall. strace perturbs timing heavily, so read the
@@ -35,35 +35,35 @@ syscall *mix* (which calls dominate the socket path), not the absolute seconds.
 
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
- 46.85    0.041831          83       502         1 accept
- 21.65    0.019334          38       501           clone3
-  7.52    0.006711           3      2005           rt_sigprocmask
-  5.28    0.004715           9       506           close
-  5.09    0.004543           9       500           sendto
-  4.53    0.004042           8       502           rseq
-  3.63    0.003241           3      1005           read
-  3.30    0.002947           5       502           set_robust_list
-  2.03    0.001811           3       504           madvise
-  0.04    0.000036           3        11           mprotect
-  0.03    0.000025           1        24           mmap
-  0.02    0.000016           8         2         1 futex
-  0.01    0.000007           7         1           socket
-  0.01    0.000005           5         1           rt_sigaction
-  0.01    0.000005           5         1           bind
-  0.01    0.000005           0         9           munmap
-  0.00    0.000003           3         1           set_tid_address
-  0.00    0.000003           3         1           listen
-  0.00    0.000003           3         1           setsockopt
-  0.00    0.000003           1         3           brk
-  0.00    0.000002           2         1         1 ioctl
-  0.00    0.000002           0         6           fstat
+ 49.31    0.042227          84       502         1 accept
+ 18.43    0.015780          31       501           clone3
+  7.59    0.006499           3      2005           rt_sigprocmask
+  5.77    0.004942           9       506           close
+  5.41    0.004637           9       500           sendto
+  4.09    0.003499           6       502           rseq
+  3.92    0.003354           3      1005           read
+  3.19    0.002733           5       502           set_robust_list
+  2.17    0.001858           3       503           madvise
+  0.04    0.000030          30         1           socket
+  0.02    0.000021           0        23           mmap
+  0.02    0.000016          16         1           bind
+  0.01    0.000011           1         9           munmap
+  0.01    0.000008           8         1           rt_sigaction
+  0.01    0.000008           8         1           listen
+  0.01    0.000008           8         1           setsockopt
+  0.01    0.000005           0        11           mprotect
+  0.00    0.000000           0         1         1 ioctl
   0.00    0.000000           0         1         1 faccessat
   0.00    0.000000           0         5           openat
+  0.00    0.000000           0         6           fstat
+  0.00    0.000000           0         1           set_tid_address
+  0.00    0.000000           0         1           futex
+  0.00    0.000000           0         3           brk
   0.00    0.000000           0         1           execve
   0.00    0.000000           0         1           prlimit64
   0.00    0.000000           0         1           getrandom
 ------ ----------- ----------- --------- --------- ----------------
-100.00    0.089290          13      6598         4 total
+100.00    0.085636          12      6595         3 total
 
 Caveats:
 - Loopback only: no NIC, device driver, routing, or real-network behaviour is exercised.
diff --git a/results/socket_stress_summary.txt b/results/socket_stress_summary.txt
index 680178d..8d6628a 100644
--- a/results/socket_stress_summary.txt
+++ b/results/socket_stress_summary.txt
@@ -8,12 +8,12 @@ rmem_max:     4194304
 Compiler:    c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type:  Debug
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:b787480b33b16e7b8ab0d2ff6fc3baa5e5ab7162b3bc3d8de691656e4692cdc1
+Git commit (informational): f9f7e98
+Source digest: sha256:8afc6fd65ed36967ea09e87bc411638ae107cec7a1a3e68dba367c1f9479d5d5
 Source digest scope: socket-stress
 Dirty inputs: no
 Generated output: results/socket_stress_summary.txt
-Date: 2026-06-21T05:26:28Z
+Date: 2026-06-25T02:32:17Z
 Transport:   UDP unicast over 127.0.0.1 (loopback)
 Dataset:     qsl-mdfeed publish, seed 42, 20000 orders, 3 symbols
 Trials/setting: 4
@@ -25,8 +25,8 @@ clamp the request, so the effective (granted) size is read back via getsockopt.
 
 setting  requested(B) effective(B)  published  lost/trial      maxlost  seq-gaps/trial
 -------  ------------ ------------  ---------  ----------      -------  --------------
-small            2048         4096      14820  5,0,0,9               9  5,0,0,9
-default             0       212992      14820  0,11,298,0          298  0,11,298,0
+small            2048         4096      14820  0,663,347,490       663  0,663,347,490
+default             0       212992      14820  0,0,2224,0         2224  0,0,2224,0
 large         8388608      8388608      14820  0,0,0,0               0  0,0,0,0
 
 Reading the result: 'lost/trial' is the honest loss metric -- published minus received