diff --git a/AGENTS.md b/AGENTS.md
index 592ad76..444799e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1263,6 +1263,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR
`v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex
resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134,
superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131,
-closing #29), with the version bump on the release PR. There is no active milestone; the
+closing #29), with the version bump on the release PR.
+
+A post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` but unreleased and is being
+cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** (#142): the
+`asan` preset now sets `-fno-sanitize-recover=undefined`, because UBSan previously ran in recover
+mode and exited 0, letting pure-UBSan defects pass CI green; OCaml `diff_report` per-fixture robustness
+(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and
+an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), with the flamegraph regenerated
+(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass).
+`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the
highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU
evidence on a PMU-capable microarchitecture).
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 98f8da8..0f9dcb0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,55 @@ All notable changes to this project. The format is loosely based on
_Nothing yet._
+## [0.2.2] - 2026-06-24
+
+A security/robustness **hardening** wave plus two measured order-book **performance** wins, driven by
+a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed bugs) and flamegraph-guided
+optimization. Same honesty bar: a deterministic C++20 exchange simulator and cross-language
+differential-testing harness — **not** a production exchange, no real-market connectivity, no latency
+or profitability claims, not formal verification. Determinism preserved throughout (fixtures
+byte-identical across g++/clang++ and vs the committed copies; the OCaml differential passes).
+`make check`/`make asan` 270/270.
+
+### Fixed
+
+- **Reject out-of-domain enum bytes in the decoders (#136).** `replay::decode_command` (NewLimit /
+ NewMarket) and `protocol::decode_reject` cast raw bytes to `Side` / `TimeInForce` / `RejectReason`
+ without validating the domain. Since the replay path applies decoded commands straight to the
+ engine with no gateway risk check, a corrupt log record could make replayed state diverge silently.
+ Both now validate via `core::is_valid` (adding `is_valid(RejectReason)`) and reject out-of-domain
+ bytes the same way as a malformed frame.
+- **Network-path hardening (#137, #140, #143).** The TCP gateway now retries `EINTR` in its
+ read/write paths and survives transient `accept()` errors (`EINTR`/`ECONNABORTED`) instead of
+ tearing the listener down; both the threaded acceptor (back-off retry) and the epoll loop (listener
+ disarm/re-arm) survive fd exhaustion (`EMFILE`/`ENFILE`); a `TcpServerOptions::max_active_connections`
+ cap sheds load; the epoll loop bounds accepts per tick for fairness; and `UdpPublisher` counts
+ `send_failures` rather than silently dropping datagrams.
+- **CLI argument validation (#141).** `qsl-client`, `qsl-mdfeed`, and `qsl-export-fixture` parse
+ numeric arguments with `std::from_chars` and reject malformed / out-of-range input with a usage
+ message and non-zero exit, instead of hitting `std::terminate` (from an uncaught `std::sto*`
+ exception) or silently truncating an out-of-range port.
+- **UBSan gate now actually fails (#142).** The `asan` preset adds `-fno-sanitize-recover=undefined`
+ so UBSan **aborts** on the first violation. It previously ran in recover mode (print a diagnostic,
+ keep running, and exit 0 if nothing else failed), so a pure-UBSan defect passed `make asan` / CI
+ green. The tree is UBSan-clean under the strict gate.
+- **OCaml `diff_report` robustness (#144).** The differential-bundle bin guards each fixture
+ (catching `Stream_parser.Parse_error` / `Sys_error`) so one malformed or unreadable fixture cannot
+ abort the whole batch and silently lose the divergence bundles for the rest.
+
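Reviewer sketch of the `std::from_chars` pattern named in the #141 entry above. `parse_port` is a hypothetical helper, not a function from the qsl tools. It shows how a whole-string parse into `std::uint16_t` can reject empty input, trailing garbage, and out-of-range values without throwing.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper; the real qsl-* argument parsing is not shown in this diff.
// Returns the port only if the entire argument is a decimal number that fits
// in 16 bits; otherwise returns std::nullopt so the caller can print usage.
std::optional<std::uint16_t> parse_port(std::string_view text) {
    if (text.empty()) return std::nullopt;
    std::uint16_t port = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, port);
    // ec reports invalid or out-of-range input; ptr != last means trailing characters.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return port;
}
```

Because the destination type is 16 bits wide, a value such as `70000` is reported as out of range instead of being narrowed. Whether port `0` should also be rejected is a policy choice this sketch leaves to the caller.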
+### Performance
+
+- **`try_emplace` for baseline price levels (#138).** `OrderBook::level_for` used
+ `std::map::emplace`, which can allocate and then free a node even when the price level already exists.
+ `try_emplace` avoids that on the steady-state common path. Measured back-to-back A/B on the
+ `qsl-bench profile` workload: **~+5%**.
+- **Order-index hash load-factor cap (#145).** The `OrderId → Locator` index is the busiest structure
+ on the engine hot path (1–4 point lookups per op). Capping its `max_load_factor` at 0.25 shortens
+ probe chains. Measured A/B: **~+18.6%**. Determinism is unaffected — the index is never iterated
+ for output.
+- **Flamegraph regenerated (#135, #139, #146)** against the new code, now a dense (~20k-sample),
+ fully-symbolized frame-pointer profile with zero `[unknown]` frames.
+
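Reviewer sketch for the two Performance entries above: a minimal illustration of the `try_emplace` and `max_load_factor` techniques, not the project's code. `Price`, `OrderId`, `Level`, `Locator`, `level_for`, and `make_order_index` are hypothetical stand-ins. It assumes the price levels live in a `std::map` and the order index is a `std::unordered_map`; the diff names `std::map::emplace` and `max_load_factor` but does not show the containers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

// Hypothetical stand-ins; the real OrderBook types are not shown in this diff.
using Price = std::int64_t;
using OrderId = std::uint64_t;

struct Level {
    static inline int constructed = 0;  // demo-only counter of Level constructions
    std::int64_t total_qty = 0;
    Level() { ++constructed; }
};

struct Locator {
    Price price = 0;
};

// try_emplace constructs a Level only when `price` is absent; when the key
// already exists the call has no effect and returns the existing element.
Level& level_for(std::map<Price, Level>& levels, Price price) {
    return levels.try_emplace(price).first->second;
}

// Lower the load-factor ceiling before reserving, so the table keeps at least
// four buckets per stored order and bucket chains stay short.
std::unordered_map<OrderId, Locator> make_order_index(std::size_t expected_orders) {
    assert(expected_orders > 0);
    std::unordered_map<OrderId, Locator> index;
    index.max_load_factor(0.25f);
    index.reserve(expected_orders);
    return index;
}
```

The lower ceiling trades memory for shorter lookups, so whether it helps depends on the workload. The ~+5% and ~+18.6% figures in the entries above are the author's measurements on `qsl-bench profile`; this sketch does not reproduce them.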
## [0.2.1] - 2026-06-21
Two backlog items — reprioritized by the maintainer and delivered — plus a resume-anchor and
diff --git a/CLAUDE.md b/CLAUDE.md
index 0b0cabd..60e126e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1207,6 +1207,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR
`v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex
resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134,
superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131,
-closing #29), with the version bump on the release PR. There is no active milestone; the
+closing #29), with the version bump on the release PR.
+
+A post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` but unreleased and is being
+cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** (#142): the
+`asan` preset now sets `-fno-sanitize-recover=undefined`, because UBSan previously ran in recover
+mode and exited 0, letting pure-UBSan defects pass CI green; OCaml `diff_report` per-fixture robustness
+(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and
+an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), with the flamegraph regenerated
+(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass).
+`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the
highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU
evidence on a PMU-capable microarchitecture).
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 19fc3b1..83d5b0b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,5 +1,5 @@
cmake_minimum_required(VERSION 3.24)
-project(quant-systems-lab VERSION 0.2.1 LANGUAGES CXX)
+project(quant-systems-lab VERSION 0.2.2 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d5e4a5e..1fba07a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -20,7 +20,7 @@ reviewable.
```bash
make check # clang-format check + build + tests
-make asan # AddressSanitizer + UBSan build and tests
+make asan # AddressSanitizer + UBSan build and tests (UBSan aborts on first violation)
dune runtest --root ocaml # OCaml log verifier + independent replay + differential + mutation tests
```
diff --git a/HANDOFF.md b/HANDOFF.md
index 00271c9..0594dc6 100644
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -33,8 +33,19 @@ partial-PMU reframe, and a full documentation staleness sweep — landed as PR #
**v0.2.1 release** then adds two reprioritized backlog items and a consistency sweep: a Codex
resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #130,
issue #32), the FIX-like text protocol adapter (PR #131, issue #29), and the version-bump release
-PR — merged in that order, with `v0.2.1` tagged on the release merge commit. There is no active
-milestone; the project is between releases.
+PR — merged in that order, with `v0.2.1` tagged on the release merge commit.
+
+A **post-v0.2.1 hardening + perf wave (#135–#146) is merged to `main` but
+unreleased** and is being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged
+5→2→1→0 confirmed bugs) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum
+rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness,
+connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** —
+`-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and exited 0 (#142);
+OCaml `diff_report` robustness (#144). Perf (measured A/B): `try_emplace` for baseline price levels
+(~+5%, #138) and an order-index hash load-factor cap (~+18.6%, #145), with the flamegraph regenerated
+(#135/#139/#146). `make check`/`make asan` 270/270 (the latter now under the real UBSan gate). The
+next action is to finish this `v0.2.2` doc/artifact overhaul and cut the tag.
Background — Linux perf evidence (merged, now bare-metal partial PMU):
@@ -77,13 +88,15 @@ Current state:
- latest synced main baseline: `ded6e80` (PR #127, v0.2.0); the `v0.2.1` baseline is the release-PR
merge commit, after PRs #129/#130/#131
-- current active branch, if active: none (work lands via scoped PRs from `main`)
-- current active status: `v0.2.1` is the current release on top of `v0.2.0`. It adds the FIX-like
- text protocol adapter (#29), `make flamegraph` + a bare-metal flamegraph artifact (#32), and a
- Codex resume-anchor/PMU consistency sweep. `make check` 263/263 and `make asan` 263/263 on the
- bare-metal Apple M2 Fedora Asahi host; both new code files pass the CI CodeScene Code Health gate.
- No active milestone
-- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`
+- current active branch, if active: `docs/post-v0.2.1-overhaul` (v0.2.2 prep + doc/artifact sweep)
+- current active status: `v0.2.1` is the latest tag; a post-v0.2.1 hardening + perf wave (#135–#146)
+ is merged to `main` and unreleased, being cut as `v0.2.2` (decoder enum rejection, network/CLI
+ hardening, a real UBSan abort gate, OCaml diff_report robustness, and two measured order-book perf
+ wins — `try_emplace` ~+5% and an index load-factor cap ~+18.6%). `make check` 270/270 and
+ `make asan` 270/270 (the latter now under the real UBSan gate) on the bare-metal Apple M2 Fedora
+ Asahi host; every touched file passes the CI CodeScene Code Health gate
+- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`;
+ `v0.2.2` prepared on this branch, not yet tagged
- open follow-up issue: #90 — narrowed to the full cache-counter PMU set; the bare-metal Apple host
provides real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss support
- issues #95, #28, and #26 were closed by PR #112; issues #32 and #29 were closed by PR #134 and
@@ -94,12 +107,13 @@ Current state:
### Next milestone
-There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), the v0.2.0 release
-(PR #127), and the v0.2.1 content (PRs #129/#134/#131 + release PR) are merged. The highest-value
-remaining work is non-code and externally gated: issue #94 (independent external review — needs a
-human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that
-exposes cache events). The #32 (flamegraph) and #29 (FIX adapter) backlog items are now done. Do not
-invent a new milestone without an explicit human request.
+There is no active milestone. M0–M49 are merged, as are the v0.2.0/v0.2.1 releases and the
+post-v0.2.1 hardening + perf wave (#135–#146, being released as `v0.2.2`). The immediate next action
+is to finish the `v0.2.2` doc/artifact overhaul (this branch) and cut the tag. After that the
+highest-value remaining work is non-code and externally gated: issue #94 (independent external
+review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU
+microarchitecture that exposes cache events). Do not invent a new milestone without an explicit
+human request.
### Phase III / IV purpose
diff --git a/PROGRESS.md b/PROGRESS.md
index 53d81cc..d5ba28d 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -20,36 +20,41 @@ Do not rely on prior chat memory.
## Current state
-- **Active milestone:** none — `v0.2.1` released; project is between releases
-- **Status:** ☑ `v0.2.1` published (FIX-like text protocol adapter #29, perf flamegraph #32, and a
- resume-anchor/PMU consistency sweep) on top of `v0.2.0`
-- **Active branch:** none (work lands via scoped PRs from `main`)
+- **Active milestone:** none — `v0.2.1` is the latest tag, but a post-v0.2.1 hardening + perf wave
+ (12 PRs, #135–#146) has merged to `main` and is **unreleased**; it is being cut as **`v0.2.2`**
+- **Status:** ☑ `v0.2.1` published on top of `v0.2.0`; ☐ `v0.2.2` in preparation — security/robustness
+ hardening (decoder enum-domain rejection, network/CLI hardening, a real UBSan abort gate, OCaml
+ diff_report robustness) plus two measured order-book perf wins
+- **Active branch:** `docs/post-v0.2.1-overhaul` (the v0.2.2 prep + full doc/artifact staleness sweep)
- **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124,
- d8c16b2); since then `v0.2.0` (PR #127, ded6e80) and the `v0.2.1` content: Codex resume-anchor
- sweep (PR #129), perf flamegraph #32 (PR #134), and the FIX text adapter #29 (PR #131)
-- **Last completed docs sync:** v0.2.1 release prep (this PR): version bump + CHANGELOG `[0.2.1]`
- and resume/release anchors brought current
-- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag created on the
- squash-merge of the release PR, marked Latest) published as GitHub-only releases; no packages
- published
-- **`make check` passing:** yes — `make check` 263/263 and `make asan` 263/263 on the bare-metal
- Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 (includes the v0.2.1 FIX-adapter and flamegraph
- renderer tests)
-- **Last action:** delivered the `v0.2.1` content as scoped PRs and prepared this version-bump
- release. Two reprioritized backlog items — the FIX-like text protocol adapter (#29) and the perf
- call-graph flamegraph (#32) — plus the Codex resume-anchor/PMU consistency sweep (#127/#128
- follow-up). Ran Codex as an independent reviewer across the stack and resolved every finding: the
- FIX envelope now requires MsgType as the first body field and rejects duplicate tags;
- `flamegraph.sh` classifies zero-sample/partial runs honestly, fails hard on renderer errors, and
- gates on the folded sample total (not perf's estimate); and the resume anchors were made
- consistent across PROGRESS/HANDOFF/AGENTS/CLAUDE. Brought every touched file through the CodeScene
- Code Health gate (table-driven enum maps, a `decode_typed` skeleton, split `parse_envelope`,
- flattened `flamegraph.py`). `make check`/`make asan` 263/263.
-- **Next action:** no active milestone. Highest-value remaining work is non-code and gated:
- issue #94 (independent external review — needs a human reviewer) and issue #90 (full
- cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g.
- x86_64). The #32 (flamegraph) and #29 (FIX adapter) backlog items are done — shipped in `v0.2.1`
- (PR #134 and PR #131) — so do not reopen them.
+ d8c16b2). Releases since: `v0.2.0` (PR #127, ded6e80) and `v0.2.1` (FIX adapter #131, flamegraph
+ #134, anchor sweep #129). Post-v0.2.1 unreleased work on `main`: #135–#146 (see Last action)
+- **Last completed docs sync:** this v0.2.2-prep overhaul — every `.md`/`.txt` audited against
+ current `main`; resume/release anchors, README, CHANGELOG, and all stale `results/*.txt`
+ provenance digests brought current to HEAD
+- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag on the
+ release-PR merge, marked Latest) published as GitHub-only releases; `v0.2.2` prepared here, not yet
+ tagged; no packages published
+- **`make check` passing:** yes — `make check` 270/270 and `make asan` 270/270 (the latter now under
+ the **real** UBSan abort gate from #142) on the bare-metal Apple M2 (aarch64) Fedora Asahi host on
+ 2026-06-24
+- **Last action:** post-v0.2.1 hardening + perf wave merged to `main` as 12 scoped PRs (#135–#146),
+ driven by a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed) and flamegraph-guided
+ optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders
+ (#136); network hardening — EINTR retry, accept fairness, connection cap, UDP send-error tracking,
+ transient-accept survival, and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg
+ validation so the tools reject malformed input instead of `std::terminate` (#141); the `asan`
+ preset now sets `-fno-sanitize-recover=undefined` so UBSan actually fails CI — previously it ran in
+ recover mode and exited 0 (#142); OCaml `diff_report` guards each fixture so one bad file cannot
+ abort the batch (#144). Perf (measured A/B): baseline price levels use `try_emplace` (~+5%, #138)
+ and the order-index hash caps its load factor at 0.25 (~+18.6%, #145); flamegraph regenerated
+ (#135, #139, #146). Determinism preserved throughout (byte-identical fixtures, OCaml differential
+ pass). `make check`/`make asan` 270/270.
+- **Next action:** finish the `v0.2.2` overhaul (this branch): regenerate the remaining stale
+ `results/*.txt` artifacts, then cut the `v0.2.2` tag/release. After that, the highest-value
+ remaining work is non-code and gated: issue #94 (independent external review — needs a human
+ reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that
+ exposes cache events, e.g. x86_64).
- **Blockers:** issue #90 is now a *cache-counter* PMU gap, not a host-access gap — this bare-metal
Apple M2 exposes real `cycles`/`instructions`/`branches`/`branch-misses` but its PMU does not
implement `cache-references`/`cache-misses`; closing it needs a PMU microarchitecture that exposes
@@ -221,15 +226,21 @@ Status key:
- _none yet_
-Measured by `make bench` (full metadata + raw output in `results/latest.txt`). Hardware-,
-compiler-, and build-dependent — these are from one machine, not a production-latency claim.
-
-- Run: arm64, Apple clang 17, Release, seed 42, commit fbb8180 (synthetic, in-process; excludes network/disk/kernel path).
-- order book add/modify/cancel: ~126 ns/op
-- protocol NewOrder encode+decode: ~39 ns/op
-- in-process gateway session (crossing order with fill): ~270 ns/op
-- matching-engine flow apply: ~121 ns/command
-- replay from command log: ~132 ns/command
+Measured by `make bench` (full metadata + raw output in `results/latest.txt`, which is the
+authoritative source). Hardware-, compiler-, and build-dependent — from one machine, not a
+production-latency claim.
+
+- Run: aarch64 (Apple M2), GCC, Release, seed 42, Fedora Asahi Linux (synthetic, in-process;
+ excludes network/disk/kernel path). The earlier macOS Apple-clang numbers (~126/39/270/121/132 ns)
+ were superseded by the Linux regeneration and are not the current set.
+- order book add/modify/cancel: ~90 ns/op
+- protocol NewOrder encode+decode: ~16 ns/op
+- in-process gateway session (crossing order with fill): ~102 ns/op
+- matching-engine flow apply: ~91 ns/command
+- replay from command log: ~101 ns/command
+- Note: these single-process micro-benchmarks hold a near-empty order index, so they do not exercise
+ the deep-book steady state where the v0.2.2 engine wins land — `try_emplace` (~+5%, #138) and the
+ order-index load-factor cap (~+18.6%, #145) are measured on the `qsl-bench profile` workload.
---
@@ -431,6 +442,25 @@ Lower priority:
release anchors and removed completed #29/#32 from every backlog list, synced AGENTS.md/CLAUDE.md
to the v0.2.1 released state, and refreshed this release-readiness audit to 263 tests. `make
check`/`make asan` 263/263. CodeScene MCP token still expired; CI is the authoritative gate.
+- [2026-06-24] Post-v0.2.1 hardening + perf wave (#135–#146), to be released as `v0.2.2`. Driven by a
+ multi-round adversarial bug hunt (4 rounds, converged 5→2→1→0 confirmed) plus flamegraph-guided
+ optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders
+ (#136, `core::is_valid` for Side/TimeInForce/RejectReason); network hardening — EINTR retry in the
+ TCP read/write path, accept fairness (epoll `max_accepts_per_tick`), connection cap
+ (`max_active_connections`), UDP send-error counter, transient-accept survival
+ (EINTR/ECONNABORTED), and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg
+ validation via `std::from_chars` so qsl-client/qsl-mdfeed/qsl-export-fixture reject malformed input
+ instead of `std::terminate`/silent port truncation (#141); the `asan` preset now sets
+ `-fno-sanitize-recover=undefined` so UBSan **aborts** on a violation — it previously ran in recover
+ mode and exited 0, so pure-UBSan defects passed CI green; the tree is UBSan-clean under the strict
+ gate (#142); OCaml `diff_report` guards each fixture so one malformed file cannot abort the batch
+ (#144). Perf (measured back-to-back A/B on the `qsl-bench profile` workload): baseline price levels
+ use `try_emplace` (~+5%, #138) and the order-index hash caps `max_load_factor` at 0.25 (~+18.6%,
+ #145); flamegraph regenerated against the new code (#135/#139/#146). Determinism preserved
+ throughout (byte-identical fixtures across g++/clang++ and vs committed; OCaml differential pass).
+ Then a full doc/artifact staleness overhaul (this branch): every `.md`/`.txt` audited against HEAD,
+ resume/release anchors + README + CHANGELOG brought current, and the stale `results/*.txt`
+ provenance digests regenerated. `make check`/`make asan` 270/270.
- [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts.
- [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30).
- [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human.
@@ -837,14 +867,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator
## Next action remains
-There is no active milestone. `v0.2.1` is the current release, on top of `v0.2.0` (PR #127 ded6e80)
-and `v0.1.0`. The `v0.2.1` content is squash-merged to `main`: the Codex resume-anchor sweep
-(PR #129), the perf flamegraph #32 (PR #134, superseding the auto-closed #130), the FIX text adapter
-#29 (PR #131), and the version-bump release PR (#133), with `v0.2.1` tagged on the release merge
-commit. The committed perf artifacts remain **partial hardware PMU evidence** from this bare-metal
-Apple M2 (aarch64) Fedora Asahi host — real cycles/instructions/branches/branch-misses with
-cache-reference/cache-miss counters unsupported by the Apple Silicon PMU — not NIC-offload, latency,
-or full hardware-PMU evidence.
+`v0.2.1` is the latest tag, on top of `v0.2.0` (PR #127 ded6e80) and `v0.1.0`. A post-v0.2.1
+hardening + perf wave (#135–#146) is squash-merged to `main` and **unreleased**, being cut as
+`v0.2.2`: out-of-domain enum rejection in the decoders (#136); network hardening — EINTR retry,
+accept fairness, connection cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion
+handling (#137, #140, #143); CLI arg validation (#141); a real UBSan abort gate (#142); OCaml
+`diff_report` robustness (#144); and two measured order-book perf wins — `try_emplace` (~+5%, #138)
+and the order-index load-factor cap (~+18.6%, #145), with the flamegraph regenerated (#135/#139/#146).
+`make check`/`make asan` 270/270. The committed perf artifacts remain **partial hardware PMU
+evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real
+cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by
+the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence.
Highest-value remaining work is non-code and gated: issue #94 (independent external review) and
issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache
diff --git a/README.md b/README.md
index ee12fd7..319157b 100644
--- a/README.md
+++ b/README.md
@@ -98,14 +98,18 @@ methodology and caveats in [docs/benchmarking.md](docs/benchmarking.md) and
| Scenario (synthetic, in-process) | Measured on this run |
|---|---|
-| Order book add/modify/cancel | ~87 ns/op |
+| Order book add/modify/cancel | ~90 ns/op |
| Protocol `NewOrder` encode+decode | ~16 ns/op |
-| Gateway session, crossing order with fill | ~110 ns/op |
-| Matching-engine flow (apply) | ~98 ns/command |
-| Replay from command log | ~110 ns/command |
-
-Reproduce with `make bench` (numbers will differ by machine). The differential-testing harness
-(generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to
+| Gateway session, crossing order with fill | ~102 ns/op |
+| Matching-engine flow (apply) | ~91 ns/command |
+| Replay from command log | ~101 ns/command |
+
+Reproduce with `make bench` (numbers will differ by machine). These micro-benchmarks hold a
+near-empty order index, so they do **not** exercise the deep-book steady state where the v0.2.2
+engine optimizations land: `try_emplace` for baseline price levels (#138) and capping the
+order-index hash load factor (#145) were measured by back-to-back A/B on the `qsl-bench profile`
+workload at **~+5%** and **~+18.6%** respectively (determinism preserved). The differential-testing
+harness (generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to
[`results/differential.txt`](results/differential.txt) — kept separate so it does not disturb
the core numbers above.
@@ -121,8 +125,10 @@ capture is dense (~20k samples) and stacks are fully symbolized — no `[unknown
This is a **software cpu-clock sampling** hot-symbol profile, **not** PMU evidence: frame width is
proportional to on-CPU samples, not wall-clock latency or throughput, and it is
-hardware/kernel/compiler/build dependent. The hot frames are `MatchingEngine::new_limit`/`cancel`,
-the order-book level/index operations, and the allocator. Provenance and classification are in
+hardware/kernel/compiler/build dependent. The hot frames are the matching and resting work —
+`MatchingEngine::new_limit` → `OrderBook::match_baseline` and `rest` → `level_for`, plus `cancel`;
+the per-level allocation churn and order-index lookups that previously dominated were cut by the
+v0.2.2 `try_emplace` (#138) and index load-factor (#145) wins. Provenance and classification are in
[`results/flamegraph.txt`](results/flamegraph.txt); methodology in
[docs/perf_analysis.md](docs/perf_analysis.md). GitHub renders the SVG statically; download the raw
file for interactive zoom and search.
@@ -132,9 +138,12 @@ file for interactive zoom and search.
- **Synthetic and local.** No real market data, no real venue connectivity, no order types
beyond limit/market + GTC/IOC.
- **Networking remains scoped.** The default TCP gateway is intentionally
- loopback-only and unauthenticated. It now has portable threaded serving for multiple clients, and
- Linux builds also include an opt-in `epoll` gateway prototype for event-driven readiness. These
- are architecture and pressure-validation paths, not a production event loop or capacity claim.
+ loopback-only and unauthenticated. It has portable threaded serving for multiple clients, plus an
+ opt-in Linux `epoll` gateway prototype for event-driven readiness. Both paths were hardened in
+ v0.2.2: `EINTR` retry on read/write, survival of transient `accept()` errors and fd exhaustion
+ (`EMFILE`/`ENFILE`) instead of tearing the listener down, a connection cap, and per-tick accept
+ fairness. These are architecture and robustness paths, not a production event loop or capacity
+ claim.
- **Benchmarks are microbenchmarks**, not end-to-end or production latency (see above).
CPU-affinity/scheduler-migration and false-sharing studies are separate hardware-dependent
artifacts; contiguous order-book storage is a bounded-domain architecture study, not a general
diff --git a/SECURITY.md b/SECURITY.md
index 1cce055..2993800 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -11,8 +11,10 @@ The demo network components are intentionally minimal:
and bind to `127.0.0.1` only.
- They are for local demonstration. **Do not expose `qsl-gateway` or `qsl-mdfeed` to untrusted
networks**, and do not run them on a shared or public interface.
-- There is no TLS, access control, rate limiting, or DoS protection. Malformed input is handled
- by disconnecting the peer, not by recovering the stream.
+- There is no TLS, access control, or rate limiting. The acceptors do have bounded resilience: an
+ optional connection cap, survival of transient `accept()` errors and fd exhaustion, and `EINTR`
+ retry on read/write — but this is robustness hardening, not DoS protection. Malformed input is
+ handled by disconnecting the peer, not by recovering the stream.
## Reporting
diff --git a/docs/binary_protocol.md b/docs/binary_protocol.md
index 1e1c948..33f4e28 100644
--- a/docs/binary_protocol.md
+++ b/docs/binary_protocol.md
@@ -68,7 +68,8 @@ buffer holds the full declared body before parsing.
NewOrder enum fields are validated during decode. Out-of-range values for Side, OrderType,
or TimeInForce return DecodeError::InvalidEnumValue and are not surfaced as internal domain
-messages.
+messages. Gateway-response decoders apply the same domain check: `decode_reject` returns
+`InvalidEnumValue` for a `RejectReason` byte outside the defined codes (#136).
## Trailing bytes and framing
diff --git a/docs/differential_testing.md b/docs/differential_testing.md
index 7284844..ded31d9 100644
--- a/docs/differential_testing.md
+++ b/docs/differential_testing.md
@@ -332,6 +332,9 @@ When the differential check fails in CI, the `ocaml-verifier` job runs `diff_rep
positive fixtures and uploads a `differential-failure-bundle` artifact. For each diverging
fixture it contains `.original` (the fixture), `.computed` (OCaml snapshot),
`.expected` (C++ snapshot), and `.diff` (a line diff) — so a divergence can be
-debugged from the CI run without reproducing locally. The minimal-counterexample form of a
+debugged from the CI run without reproducing locally. `diff_report` guards each fixture
+independently: a malformed or unreadable fixture is reported as a comparison failure (non-zero
+exit) rather than aborting the batch and losing the remaining fixtures' bundles (#144). The
+minimal-counterexample form of a
failing *generated* stream is produced separately by the C++ shrinker (`qsl-export-stream
shrink`, M19).
diff --git a/docs/fix_protocol.md b/docs/fix_protocol.md
index d25c8ce..497d370 100644
--- a/docs/fix_protocol.md
+++ b/docs/fix_protocol.md
@@ -92,4 +92,6 @@ ones:
out-of-range integers, and oversized messages;
- signed/extreme `int64` price and `uint64` id/seq round-trips.
-The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text.
+The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text;
+the UBSan half now aborts on the first violation (`-fno-sanitize-recover=undefined`, #142), so a
+UBSan defect in the parser fails the build rather than being silently recovered.
diff --git a/docs/pool_backed_storage.md b/docs/pool_backed_storage.md
index 6fa6b7e..98b2e5a 100644
--- a/docs/pool_backed_storage.md
+++ b/docs/pool_backed_storage.md
@@ -215,28 +215,28 @@ produced the earlier "intrusive is ~4-5x slower" ranking.
This artifact moves engine construction, the registration prefix, and the end-of-run snapshot
readout outside the timed interval (`Source digest:
-sha256:b606452b1bbff3d1c4eed8f59839701590cfbc824207f7b707c03ca66766353a`, `Dirty inputs: no`), so
-each row reflects per-command work. The corrected medians are:
+sha256:c1e4cd7db8472a87cbd23ece3a2d4b330f78ad876b58da412e0e54f6c4eb4cf7`, `Dirty inputs: no`), so
+each row reflects per-command work. The medians are:
| Workload | Shape summary | Median ns/timed-command, fastest to slowest |
| --- | --- | --- |
-| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 93.2, intrusive 95.4, baseline 111.0, PMR 121.4 |
-| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 66.0, contiguous 70.7, PMR 88.3, baseline 96.4 |
-| Sparse wide flow | 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Intrusive 48.2, contiguous 60.9, PMR 72.1, baseline 81.0 |
-| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 42.8, intrusive 44.3, baseline 59.7, PMR 59.8 |
-| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 69.9, intrusive 87.2, baseline 109.3, PMR 117.9 |
+| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 71.2, intrusive 80.3, baseline 89.4, PMR 100.0 |
+| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 52.2, contiguous 57.8, baseline 66.1, PMR 66.3 |
+| Sparse wide flow | 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Contiguous 40.3, intrusive 42.8, baseline 55.7, PMR 57.9 |
+| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 31.4, intrusive 36.7, baseline 49.0, PMR 54.7 |
+| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 56.8, intrusive 65.1, baseline 96.5, PMR 110.0 |
-### What the corrected numbers show
+### What the numbers show
-With per-run setup excluded the four modes cluster into a much tighter band (roughly 40-120
+With per-run setup excluded the four modes cluster into a much tighter band (roughly 30-110
ns/command) instead of the earlier 40-486 spread, and the large earlier gaps are explained by
-per-run pool initialization rather than per-command cost. Intrusive and contiguous storage are the
-two fastest modes and trade the lead by workload shape: intrusive leads the insert/resting-heavy
-dense and sparse flows, contiguous leads the cancel/modify and traversal-heavy flows, and they are
-within a few ns/command on the general flow. Baseline `std::map`/`std::list` and PMR pooling sit
-behind both, with PMR sometimes ahead of baseline and sometimes behind. The medians above come from
-a quiet-host regeneration whose min/max ranges are tight; treat absolute values as environment- and
-build-dependent.
+per-run pool initialization rather than per-command cost. Contiguous storage is fastest on four of
+the five workloads (general, sparse, cancel/modify, match/traversal); the intrusive pool leads only
+the dense bounded flow and is close behind contiguous elsewhere. Baseline `std::map`/`std::list` and
+PMR pooling sit behind both, with baseline ahead of PMR on all five workloads. The medians above come from a
+regeneration whose per-mode min/max ranges are tight; treat absolute values as environment- and
+build-dependent, and note that these v0.2.2 baseline rows already include the `try_emplace` (#138)
+and index load-factor (#145) wins.
This does not make the intrusive pool "free". It pays a large fixed initialization cost
(pre-allocating 65536 order and node slots per book) that this per-command metric deliberately
diff --git a/docs/recruiting_notes.md b/docs/recruiting_notes.md
index 91ebdbe..fa42665 100644
--- a/docs/recruiting_notes.md
+++ b/docs/recruiting_notes.md
@@ -45,8 +45,9 @@
## Résumé bullets — Linux Engineering (conservative)
- Implemented TCP order-gateway transports and a UDP market-data feed on POSIX sockets
- (loopback), with bounded receive timeouts, sequence-gap detection, threaded portable serving,
- epoll-based Linux serving, and disconnect-on-malformed-framing.
+ (loopback), with bounded receive timeouts, sequence-gap detection, UDP send-error counting,
+ threaded portable serving with a connection cap and accept-error/fd-exhaustion survival,
+ epoll-based Linux serving, `EINTR`-retry on read/write, and disconnect-on-malformed-framing.
- Built CLI tools for append-only-log inspection and deterministic replay, plus a demo script
that orchestrates a loopback gateway round-trip with port-readiness polling and clean
process teardown.
diff --git a/docs/release_readiness.md b/docs/release_readiness.md
index 2fa1637..e148c63 100644
--- a/docs/release_readiness.md
+++ b/docs/release_readiness.md
@@ -2,17 +2,22 @@
A pre-release pass verifying the repo builds, demos, reproduces, and reads honestly. This audit
covers **M0–M49, the v0.2.0 evidence refresh** (bare-metal Linux artifact regeneration and the
-documentation/staleness sweep), **and the v0.2.1 content** (the FIX-like text protocol adapter #29,
-the perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency
-sweep). It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human after
-squash-merge.
+documentation/staleness sweep), **the v0.2.1 content** (the FIX-like text protocol adapter #29, the
+perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency sweep),
+**and the post-v0.2.1 hardening + perf wave being cut as v0.2.2** (#135–#146): out-of-domain enum
+rejection in the decoders (#136), network-path hardening — EINTR retry, accept fairness, connection
+cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion handling (#137/#140/#143),
+CLI argument validation (#141), a real UBSan abort gate (#142), OCaml `diff_report` robustness (#144),
+and two measured order-book perf wins — `try_emplace` (~+5%, #138) and an index load-factor cap
+(~+18.6%, #145). It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human
+after squash-merge.
## Verification (this session, bare-metal Apple M2 / aarch64 / GCC 16.1.1, Fedora Asahi Remix)
| Check | Result |
|---|---|
-| `make check` | 263/263 tests pass, no warnings (incl. the v0.2.1 FIX-adapter + flamegraph-renderer tests) |
-| `make asan` (ASan + UBSan) | 263/263, sanitizer-clean (the FIX text parser handles untrusted input) |
+| `make check` | 270/270 tests pass, no warnings (incl. the FIX-adapter, flamegraph-renderer, decoder enum-rejection, and CLI-arg-validation tests) |
+| `make asan` (ASan + UBSan) | 270/270, sanitizer-clean; the UBSan gate now **aborts** on the first violation (`-fno-sanitize-recover=undefined`, #142), so pure-UBSan defects no longer pass green, and the tree is clean under it |
| `make tsan` (ThreadSanitizer) | 20/20 concurrency-labelled tests, race-clean |
| `make check-fixtures` | committed differential fixtures match current C++ output |
| `make check-manifest` | provenance manifest matches the committed fixtures |
@@ -88,7 +93,9 @@ verification.
## Outcome
-Release-ready as a portfolio artifact. The next GitHub-only release is `v0.2.1` (the FIX-like text
-protocol adapter #29, the perf flamegraph #32, and a Codex resume-anchor/PMU consistency sweep) on
-top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal evidence refresh); it
+Release-ready as a portfolio artifact. `v0.2.1` is already tagged (FIX adapter #29, perf flamegraph
+#32, anchor sweep) on top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal
+evidence refresh). The next GitHub-only release is **`v0.2.2`**, bundling the post-v0.2.1
+hardening + perf wave merged to `main` (#135–#146): decoder enum rejection, network/CLI hardening, a
+real UBSan abort gate, OCaml `diff_report` robustness, and the two measured order-book perf wins. It
requires explicit human approval and a squash-merge before tagging.
diff --git a/docs/replay_and_recovery.md b/docs/replay_and_recovery.md
index 0c48f07..2f1caef 100644
--- a/docs/replay_and_recovery.md
+++ b/docs/replay_and_recovery.md
@@ -163,6 +163,8 @@ measurements, not a production recovery-time claim.
not read back from the log (the log could also store events, but the engine is the source
of truth for replay equivalence).
- The reader loads the whole log into memory before replaying (adequate for the simulator).
-- Commands are trusted once their record checksum validates (M7); the command codec does not
- re-validate enum domains — wire-level enum validation lives at the protocol boundary (M2)
- and risk checks at the gateway (M5).
+- Commands are trusted once their record checksum validates (M7). The command codec also rejects
+ out-of-domain enum bytes: `replay::decode_command` refuses a `NewLimit`/`NewMarket` record whose
+ `Side` or `TimeInForce` byte is not a defined enum value (#136), so a corrupt log record cannot
+ apply garbage straight to the engine. Higher-level validation still lives at the protocol boundary
+ (M2) and the risk gateway (M5).
diff --git a/docs/socket_gateway.md b/docs/socket_gateway.md
index ad6bb36..ef914d2 100644
--- a/docs/socket_gateway.md
+++ b/docs/socket_gateway.md
@@ -38,7 +38,9 @@ The portable `TcpServer` writes responses with a send-all loop that tolerates pa
The Linux `EpollServer` keeps a per-client outbound buffer and leaves the connection registered
for `EPOLLOUT` until all pending response bytes are accepted by the kernel. Both write paths use
`send(..., MSG_NOSIGNAL)` where available, and the platform socket option where available, so a
-client that drops before reading a response cannot terminate the gateway through `SIGPIPE`.
+client that drops before reading a response cannot terminate the gateway through `SIGPIPE`. Both the
+read and write paths retry on `EINTR` — a signal interruption is treated as retryable, not a
+disconnect.
The epoll path treats `EAGAIN` / `EWOULDBLOCK` as normal nonblocking backpressure:
@@ -93,7 +95,12 @@ induces an over-cap response is disconnected.
The default demo uses `TcpServer` because it is portable and easiest to inspect. The accept loop
spawns one worker per accepted connection, so a slow or still-open client no longer prevents the
-server from accepting a later client. The shared `OrderGateway` remains protected by an internal
+server from accepting a later client. A connection cap (`TcpServerOptions::max_active_connections`,
+default `0` = unbounded) load-sheds — a freshly accepted connection at the cap is closed immediately
+rather than spawning another worker. The accept loop also survives transient `accept()` errors
+(`EINTR`/`ECONNABORTED`, retried) and file-descriptor exhaustion (`EMFILE`/`ENFILE`, a brief back-off
+retry) instead of tearing the listener down; the `EpollServer` handles the same conditions by
+disarming and re-arming the listener. The shared `OrderGateway` remains protected by an internal
mutex; network I/O can overlap across clients, but matching-engine mutation stays serialized and
deterministic.
diff --git a/docs/socket_hardening.md b/docs/socket_hardening.md
index 236786b..c806f5f 100644
--- a/docs/socket_hardening.md
+++ b/docs/socket_hardening.md
@@ -18,6 +18,11 @@ service.** Nothing here claims a production-networking stack.
| Peer disconnect mid-write | `send(MSG_NOSIGNAL)` / `SO_NOSIGPIPE` so `SIGPIPE` can't kill the process | `Session` |
| Indefinite blocking recv | Bounded `SO_RCVTIMEO` on the UDP client | `udp_feed` |
| UDP burst loss | Detected via sequence gaps; receive-buffer sizing knob (below) | `udp_feed` |
+| UDP transmit failure | Counted, not silently dropped (`UdpPublisher::send_failures()`) | `udp_feed` |
+| Signal during read/write | `EINTR` retried (not treated as a disconnect) | `TcpServer`/`EpollServer` |
+| Transient accept error | `EINTR`/`ECONNABORTED` retried; listener kept alive | `TcpServer`/`EpollServer` |
+| FD exhaustion | `EMFILE`/`ENFILE` survived (back-off retry / listener disarm-rearm), not a teardown | `TcpServer`/`EpollServer` |
+| Connection-count overload | Optional cap (`max_active_connections`) load-sheds at the cap | `TcpServer` |
The first five rows pre-date M30 (M9/M10); M30 adds the receive-buffer sizing knob and documents
the loss model and the things deliberately left out.
@@ -75,9 +80,12 @@ stated plainly so the gap counter is not mistaken for reliability.
bottleneck here. No `io_uring` code exists; none is claimed.
- **TLS / authentication / authorization.** None. The services are loopback-only demos. Do not
expose them on a routable interface (see `SECURITY.md`).
-- **Idle-peer timeouts, connection caps, rate limiting.** Not implemented. Heartbeats are a
- liveness round-trip only; the gateway does not yet time out idle peers. These are reasonable
- future hardening steps, explicitly not done today.
+- **Connection caps.** Implemented as an opt-in `TcpServer` knob (`max_active_connections`, default
+ `0` = unbounded): at the cap a freshly accepted connection is closed (load-shed) rather than
+ spawning another worker. See the posture table above.
+- **Idle-peer timeouts, rate limiting.** Not implemented. Heartbeats are a liveness round-trip only;
+ the gateway does not yet time out idle peers. These are reasonable future hardening steps,
+ explicitly not done today.
- **`SO_REUSEADDR` / rapid rebind.** Not set; the profiling scripts dodge `TIME_WAIT` by using
separate ports per pass instead of forcing address reuse.
diff --git a/results/README.md b/results/README.md
index 0f8b7aa..8515411 100644
--- a/results/README.md
+++ b/results/README.md
@@ -35,6 +35,13 @@ Benchmark results produced by `make bench` and scripts under `scripts/`.
- `false_sharing_study.txt` — benchmark-only packed-vs-padded SPSC queue-cursor contention study
(`make false-sharing-study`). It is research-note evidence about cache-line sharing shape, not
a production throughput or latency claim.
+- `socket_load_summary.txt` — Linux multi-client TCP connection-scaling load experiment
+ (`make socket-load`, `scripts/socket_load.sh`): N concurrent `qsl-client`s against the threaded and
+ epoll gateways. Constrained loopback connection-setup shape only, not a production-capacity claim.
+- `socket_profile_loopback.txt` — Linux syscall/socket-path profiling of the gateway I/O path
+ (`make profile-io`, `scripts/profile_gateway_io.sh`). Loopback, constrained evidence.
+- `socket_stress_summary.txt` — UDP socket-buffer / burst-loss experiment (`make socket-stress`):
+ receive-buffer sizing vs observed sequence-gap loss on loopback. Research-note evidence only.
- `crash_recovery_validation.txt` — M45 SIGKILL crash / torn-tail recovery validation for the
append-only event log across durability modes (`make crash-recovery`). It is process-kill
evidence only: it validates crash-mid-append recovery and acknowledged-record retention across
diff --git a/results/allocator_experiment.txt b/results/allocator_experiment.txt
index 3824cb3..653ba40 100644
--- a/results/allocator_experiment.txt
+++ b/results/allocator_experiment.txt
@@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:4f09cf6b1db08de00d5fb480b77d0b1fe7ebb9ea70dbcc7d73807c7eb06e4598
+Git commit (informational): f9f7e98
+Source digest: sha256:e5fb637e109ffba8b25ab7a5274d325ea8edbbbf13aec8d88d1a486cdb1cc168
Source digest scope: allocator-experiment
Dirty inputs: no
Generated output: results/allocator_experiment.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:29:37Z
Dataset: engine::Order allocation microbenchmark (new/delete vs fixed pool)
Warmup: iters/10 per benchmark, before timing
Units: latency = ns/op + ops/sec
@@ -19,6 +19,6 @@ This measures allocator mechanics for order-like objects, not end-to-end engine
hardware/compiler/build dependent. A negative or tiny delta is acceptable.
Scenario / Metric / Result:
-order new/delete 500000 ops 14.4 ns/op 69407890 ops/sec
-order pool acquire/release 500000 ops 7.0 ns/op 142345350 ops/sec
-order pool burst cycle 2000 ops 7970.4 ns/op 125464 ops/sec
+order new/delete 500000 ops 12.4 ns/op 80810144 ops/sec
+order pool acquire/release 500000 ops 7.0 ns/op 142468570 ops/sec
+order pool burst cycle 2000 ops 7368.3 ns/op 135716 ops/sec
diff --git a/results/differential.txt b/results/differential.txt
index 9cd907e..7e9495e 100644
--- a/results/differential.txt
+++ b/results/differential.txt
@@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:3fe4614b9c004642e244fafaf8d01905ed2dd92ca843bbd579e33e66f5e23836
+Git commit (informational): f9f7e98
+Source digest: sha256:736ee67ee7bfbbac0b8c45c5d2a0805b9bf19a664fa44e8ec650b38a9d46a90f
Source digest scope: differential-benchmark-suite
Dirty inputs: no
Generated output: results/differential.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:29:36Z
Dataset: property command streams (generate_property_flow, 3 symbols, 120 orders)
Warmup: iters/10 (or 1 throughput pass) per benchmark, before timing
Units: latency = ns/op + ops/sec; throughput = ns/item + items/sec
@@ -19,6 +19,6 @@ measure the differential-testing harness (generation, gateway replay, shrinking)
production throughput; hardware/compiler/build dependent.
Scenario / Metric / Result:
-property flow generation 123 items 58.0 ns/item 17228399 items/sec
-differential gateway replay 123 items 62.2 ns/item 16071425 items/sec
-shrink property flow 300 ops 31175.3 ns/op 32077 ops/sec
+property flow generation 123 items 108.7 ns/item 9200981 items/sec
+differential gateway replay 123 items 113.8 ns/item 8789365 items/sec
+shrink property flow 300 ops 59593.3 ns/op 16780 ops/sec
diff --git a/results/dpdk_environment.txt b/results/dpdk_environment.txt
index a6aa264..1e3abf8 100644
--- a/results/dpdk_environment.txt
+++ b/results/dpdk_environment.txt
@@ -8,12 +8,12 @@ CPU: Avalanche-M2
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: n/a
Provenance version: 1
-Git commit (informational): 8ab07b5
-Source digest: sha256:a15964c68f12b761a2e60e164dd8dfdc1f56d9fdc896cfe4f867ec5d22b3c8d0
+Git commit (informational): 081e1ec
+Source digest: sha256:ab13cbebe013b05626085319748c5fb9e6d51383be00d78ed7337542d99d67c0
Source digest scope: dpdk-environment-check
Dirty inputs: no
Generated output: results/dpdk_environment.txt
-Date: 2026-06-21T05:43:48Z
+Date: 2026-06-25T02:37:40Z
pkg-config: /usr/bin/pkg-config
libdpdk pkg-config status: not-available
libdpdk version: not-found
diff --git a/results/false_sharing_study.txt b/results/false_sharing_study.txt
index 61bc6d4..461357e 100644
--- a/results/false_sharing_study.txt
+++ b/results/false_sharing_study.txt
@@ -6,12 +6,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:4c2b0de72788bd80e4877d9818693a37629ca2decf69260e33c4c3b0c3603c74
+Git commit (informational): f9f7e98
+Source digest: sha256:f8a12fc427a06c5795c555e77a0fca711876ea756923af1922590fee436ab5c2
Source digest scope: false-sharing-study
Dirty inputs: no
Generated output: results/false_sharing_study.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:29:38Z
Dataset: synthetic SPSC cursor exchange (producer tail / consumer head)
Host support summary: portable two-thread C++ benchmark; no PMU counters required.
@@ -29,5 +29,5 @@ index with acquire. Benchmark-only control: the padded layout puts each index on
128-byte boundary, so the two cursors sit on distinct coherency lines even on hosts
with 128-byte cache lines (Apple Silicon); the production SpscRing pads to 64 bytes.
-packed indices 4000000 cursor updates 2.9 ns/update 340900588 updates/sec checksum=4000000061052
-padded indices 4000000 cursor updates 29.6 ns/update 33747427 updates/sec checksum=4000002007457
+packed indices 4000000 cursor updates 2.8 ns/update 359899203 updates/sec checksum=4000000021261
+padded indices 4000000 cursor updates 27.2 ns/update 36771064 updates/sec checksum=4000002008455
diff --git a/results/latest.txt b/results/latest.txt
index 5e68f2a..7c24ef2 100644
--- a/results/latest.txt
+++ b/results/latest.txt
@@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:1271610ad6c9c96534b350239f0341fe413ce0b989f3aca3434b2f4652395b64
+Git commit (informational): 114445a
+Source digest: sha256:c54df82614b53ea736e845a5893096a9ee12b65a8cee49be28b9c53a25d5a9df
Source digest scope: core-benchmark-suite
Dirty inputs: no
Generated output: results/latest.txt
-Date: 2026-06-21T05:25:21Z
+Date: 2026-06-25T02:36:30Z
Dataset: synthetic order flow (replay::generate_flow, seed 42, 4 symbols)
Warmup: iters/10 (or 1 throughput pass) per benchmark, before timing
Units: latency = ns/op + ops/sec; throughput = ns/item + items/sec
@@ -19,8 +19,8 @@ no kernel/IO path, stock allocator). NOT production exchange throughput or
end-to-end latency; hardware/compiler/build dependent.
Scenario / Metric / Result:
-order_book add/mod/cancel 200000 ops 87.3 ns/op 11458608 ops/sec
-protocol encode+decode 500000 ops 15.9 ns/op 62727387 ops/sec
-gateway session (fill) 200000 ops 109.7 ns/op 9115527 ops/sec
-matching engine flow 5004 items 98.2 ns/item 10181380 items/sec
-replay command log 5004 items 110.4 ns/item 9059370 items/sec
+order_book add/mod/cancel 200000 ops 90.6 ns/op 11043024 ops/sec
+protocol encode+decode 500000 ops 16.1 ns/op 62049736 ops/sec
+gateway session (fill) 200000 ops 102.3 ns/op 9776174 ops/sec
+matching engine flow 5004 items 91.4 ns/item 10939533 items/sec
+replay command log 5004 items 101.3 ns/item 9874313 items/sec
diff --git a/results/nic_offload_environment.txt b/results/nic_offload_environment.txt
index 109125b..14c7f86 100644
--- a/results/nic_offload_environment.txt
+++ b/results/nic_offload_environment.txt
@@ -1,4 +1,4 @@
-Command: QSL_NIC_DEVICES=wld0 make nic-offload-check
+Command: make nic-offload-check
Artifact: NIC offload and timestamping capability check (non-mutating)
Evidence class: linux-readonly-capability-observation
Host support summary: Linux host with read-only NIC capability inspection; no settings changed and no packet measurement ran
@@ -8,24 +8,24 @@ CPU: Avalanche-M2
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: n/a
Provenance version: 1
-Git commit (informational): 8ab07b5
-Source digest: sha256:904cbf8c83cd9ee0107e11be380c53f9509b28b2df2f96a2db8b54b323091a30
+Git commit (informational): 081e1ec
+Source digest: sha256:088a8ba85f514bc6264b43b5754e95f6a8f2e0a239936492f50f800031f8f782
Source digest scope: nic-offload-environment-check
Dirty inputs: no
Generated output: results/nic_offload_environment.txt
-Date: 2026-06-21T05:43:48Z
+Date: 2026-06-25T02:37:40Z
ethtool: /usr/bin/ethtool
ip: /usr/bin/ip
lspci: /usr/bin/lspci
phc_ctl: not-found
ptp4l: not-found
-Requested Linux devices: wld0
+Requested Linux devices: docker0 tailscale0 wld0
Missing requested devices: none
-Linux devices inspected: wld0
-Device count: 1
+Linux devices inspected: docker0 tailscale0 wld0
+Device count: 3
Offload feature list visible: yes
RSS indirection/hash visible: no
-Queue/channel info visible: no
+Queue/channel info visible: yes
Hardware timestamping visible: no
Offload settings changed: no
RSS settings changed: no
@@ -38,6 +38,232 @@ Caveat: This artifact records read-only host and NIC capability context. It does
not change offload flags, queue counts, RSS tables, timestamp filters, drivers,
or interrupt affinity, and it does not support any NIC-offload or latency claim.
+== device docker0 summary ==
+operstate: down
+mtu: 1500
+driver: n/a
+pci: n/a
+rx queues: 1
+tx queues: 1
+
+== ip -details link show dev docker0 ==
+4: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
+ link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 netns-immutable
+ bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.xx:xx:xx:xx:xx:xx designated_root 8000.xx:xx:xx:xx:xx:xx root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 0.00 fdb_n_learned 0 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address xx:xx:xx:xx:xx:xx mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mdb_offload_fail_notification 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
+
+== ethtool -i docker0 ==
+driver: bridge
+version: 2.3
+firmware-version: N/A
+expansion-rom-version:
+bus-info: N/A
+supports-statistics: no
+supports-test: no
+supports-eeprom-access: no
+supports-register-dump: no
+supports-priv-flags: no
+
+== ethtool -k docker0 ==
+Features for docker0:
+rx-checksumming: off [fixed]
+tx-checksumming: on
+ tx-checksum-ipv4: off [fixed]
+ tx-checksum-ip-generic: on
+ tx-checksum-ipv6: off [fixed]
+ tx-checksum-fcoe-crc: off [fixed]
+ tx-checksum-sctp: off [fixed]
+scatter-gather: on
+ tx-scatter-gather: on
+ tx-scatter-gather-fraglist: on
+tcp-segmentation-offload: on
+ tx-tcp-segmentation: on
+ tx-tcp-ecn-segmentation: on
+ tx-tcp-mangleid-segmentation: on
+ tx-tcp6-segmentation: on
+ tx-tcp-accecn-segmentation: on
+generic-segmentation-offload: on
+generic-receive-offload: on
+large-receive-offload: off [fixed]
+rx-vlan-offload: off [fixed]
+tx-vlan-offload: on
+ntuple-filters: off [fixed]
+receive-hashing: off [fixed]
+highdma: on
+rx-vlan-filter: off [fixed]
+vlan-challenged: off [fixed]
+tx-gso-robust: on
+tx-fcoe-segmentation: on
+tx-gre-segmentation: on
+tx-gre-csum-segmentation: on
+tx-ipxip4-segmentation: on
+tx-ipxip6-segmentation: on
+tx-udp_tnl-segmentation: on
+tx-udp_tnl-csum-segmentation: on
+tx-gso-partial: on
+tx-tunnel-remcsum-segmentation: on
+tx-sctp-segmentation: on
+tx-esp-segmentation: on
+tx-udp-segmentation: on
+tx-gso-list: on
+tx-nocache-copy: off
+loopback: off [fixed]
+rx-fcs: off [fixed]
+rx-all: off [fixed]
+tx-vlan-stag-hw-insert: on
+rx-vlan-stag-hw-parse: off [fixed]
+rx-vlan-stag-filter: off [fixed]
+l2-fwd-offload: off [fixed]
+hw-tc-offload: off [fixed]
+esp-hw-offload: off [fixed]
+esp-tx-csum-hw-offload: off [fixed]
+rx-udp_tunnel-port-offload: off [fixed]
+tls-hw-tx-offload: off [fixed]
+tls-hw-rx-offload: off [fixed]
+rx-gro-hw: off [fixed]
+tls-hw-record: off [fixed]
+rx-gro-list: off
+macsec-hw-offload: off [fixed]
+rx-udp-gro-forwarding: off
+hsr-tag-ins-offload: off [fixed]
+hsr-tag-rm-offload: off [fixed]
+hsr-fwd-offload: off [fixed]
+hsr-dup-offload: off [fixed]
+
+== ethtool -l docker0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -l docker0
+
+== ethtool -x docker0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -x docker0
+
+== ethtool -T docker0 ==
+Time stamping parameters for docker0:
+Capabilities:
+ software-receive
+ software-system-clock
+PTP Hardware Clock: none
+Hardware Transmit Timestamp Modes: none
+Hardware Receive Filter Modes: none
+
+== device tailscale0 summary ==
+operstate: unknown
+mtu: 1280
+driver: n/a
+pci: n/a
+rx queues: 1
+tx queues: 1
+
+== ip -details link show dev tailscale0 ==
+2: tailscale0: mtu 1280 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 500
+ link/none promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
+ tun type tun pi off vnet_hdr on persist off addrgenmode random numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
+
+== ethtool -i tailscale0 ==
+driver: tun
+version: 1.6
+firmware-version:
+expansion-rom-version:
+bus-info: tun
+supports-statistics: no
+supports-test: no
+supports-eeprom-access: no
+supports-register-dump: no
+supports-priv-flags: no
+
+== ethtool -k tailscale0 ==
+Features for tailscale0:
+rx-checksumming: off [fixed]
+tx-checksumming: on
+ tx-checksum-ipv4: off [fixed]
+ tx-checksum-ip-generic: on
+ tx-checksum-ipv6: off [fixed]
+ tx-checksum-fcoe-crc: off [fixed]
+ tx-checksum-sctp: off [fixed]
+scatter-gather: on
+ tx-scatter-gather: on
+ tx-scatter-gather-fraglist: on
+tcp-segmentation-offload: on
+ tx-tcp-segmentation: on
+ tx-tcp-ecn-segmentation: off
+ tx-tcp-mangleid-segmentation: off
+ tx-tcp6-segmentation: on
+ tx-tcp-accecn-segmentation: off [fixed]
+generic-segmentation-offload: on
+generic-receive-offload: on
+large-receive-offload: off [fixed]
+rx-vlan-offload: off [fixed]
+tx-vlan-offload: on
+ntuple-filters: off [fixed]
+receive-hashing: off [fixed]
+highdma: off [fixed]
+rx-vlan-filter: off [fixed]
+vlan-challenged: off [fixed]
+tx-gso-robust: off [fixed]
+tx-fcoe-segmentation: off [fixed]
+tx-gre-segmentation: off [fixed]
+tx-gre-csum-segmentation: off [fixed]
+tx-ipxip4-segmentation: off [fixed]
+tx-ipxip6-segmentation: off [fixed]
+tx-udp_tnl-segmentation: off
+tx-udp_tnl-csum-segmentation: off
+tx-gso-partial: off [fixed]
+tx-tunnel-remcsum-segmentation: off [fixed]
+tx-sctp-segmentation: off [fixed]
+tx-esp-segmentation: off [fixed]
+tx-udp-segmentation: on
+tx-gso-list: off [fixed]
+tx-nocache-copy: off
+loopback: off [fixed]
+rx-fcs: off [fixed]
+rx-all: off [fixed]
+tx-vlan-stag-hw-insert: on
+rx-vlan-stag-hw-parse: off [fixed]
+rx-vlan-stag-filter: off [fixed]
+l2-fwd-offload: off [fixed]
+hw-tc-offload: off [fixed]
+esp-hw-offload: off [fixed]
+esp-tx-csum-hw-offload: off [fixed]
+rx-udp_tunnel-port-offload: off [fixed]
+tls-hw-tx-offload: off [fixed]
+tls-hw-rx-offload: off [fixed]
+rx-gro-hw: off [fixed]
+tls-hw-record: off [fixed]
+rx-gro-list: off
+macsec-hw-offload: off [fixed]
+rx-udp-gro-forwarding: off
+hsr-tag-ins-offload: off [fixed]
+hsr-tag-rm-offload: off [fixed]
+hsr-fwd-offload: off [fixed]
+hsr-dup-offload: off [fixed]
+
+== ethtool -l tailscale0 ==
+Channel parameters for tailscale0:
+Pre-set maximums:
+RX: n/a
+TX: n/a
+Other: n/a
+Combined: 1
+Current hardware settings:
+RX: n/a
+TX: n/a
+Other: n/a
+Combined: 1
+
+== ethtool -x tailscale0 ==
+netlink error: Operation not supported
+command failed: /usr/bin/ethtool -x tailscale0
+
+== ethtool -T tailscale0 ==
+Time stamping parameters for tailscale0:
+Capabilities:
+ software-transmit
+ software-receive
+ software-system-clock
+PTP Hardware Clock: none
+Hardware Transmit Timestamp Modes: none
+Hardware Receive Filter Modes: none
+
== device wld0 summary ==
operstate: up
mtu: 1500
@@ -50,7 +276,7 @@ tx queues: 1
01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4387 802.11ax Dual Band Wireless LAN Controller (rev 07)
== ip -details link show dev wld0 ==
-2: wld0: mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000
+3: wld0: mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000
link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff permaddr xx:xx:xx:xx:xx:xx promiscuity 0 allmulti 0 minmtu 68 maxmtu 1500 netns-immutable addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 parentbus pci parentdev 0000:01:00.0
altname wlp1s0f0
altname wlxxxxxxxxxxxxx
diff --git a/results/numa_affinity_study.txt b/results/numa_affinity_study.txt
index f95a440..5b4c8b5 100644
--- a/results/numa_affinity_study.txt
+++ b/results/numa_affinity_study.txt
@@ -1,4 +1,4 @@
-Command: QSL_NUMA_ALLOW_CONSTRAINED=1 QSL_NUMA_BIN=build/bench/qsl-bench make numa-study
+Command: QSL_NUMA_BIN=build/bench/qsl-bench make numa-study
Evidence class: linux-constrained
Host support summary: Linux host, constrained evidence
Hardware: aarch64
@@ -7,12 +7,12 @@ CPU: Avalanche-M2
Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:ff7a0a6b696ef700cd7bb568a531cf2e06ea16932d1031a8cfe85be6e0d21b91
+Git commit (informational): f9f7e98
+Source digest: sha256:0b9e8373fa304d7e734399a92e3b7f8bc8f4c6ee538621ad53cc35b443c67909
Source digest scope: numa-affinity-study
Dirty inputs: no
Generated output: results/numa_affinity_study.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:30:17Z
Benchmark binary: build/bench/qsl-bench
Allowed CPUs: 0-7
CPU chosen: 0
@@ -50,18 +50,18 @@ Pinned command:
taskset -c 0 build/bench/qsl-bench
Unpinned benchmark output:
-order_book add/mod/cancel 200000 ops 138.0 ns/op 7248959 ops/sec
-protocol encode+decode 500000 ops 20.9 ns/op 47914709 ops/sec
-gateway session (fill) 200000 ops 128.2 ns/op 7800869 ops/sec
-matching engine flow 5004 items 101.7 ns/item 9834865 items/sec
-replay command log 5004 items 113.7 ns/item 8798241 items/sec
+order_book add/mod/cancel 200000 ops 113.5 ns/op 8807360 ops/sec
+protocol encode+decode 500000 ops 20.0 ns/op 50088843 ops/sec
+gateway session (fill) 200000 ops 115.6 ns/op 8652239 ops/sec
+matching engine flow 5004 items 93.6 ns/item 10682213 items/sec
+replay command log 5004 items 101.4 ns/item 9862110 items/sec
Pinned benchmark output:
-order_book add/mod/cancel 200000 ops 143.3 ns/op 6976774 ops/sec
-protocol encode+decode 500000 ops 27.7 ns/op 36063756 ops/sec
-gateway session (fill) 200000 ops 236.9 ns/op 4220492 ops/sec
-matching engine flow 5004 items 187.1 ns/item 5345523 items/sec
-replay command log 5004 items 221.8 ns/item 4508370 items/sec
+order_book add/mod/cancel 200000 ops 234.2 ns/op 4269507 ops/sec
+protocol encode+decode 500000 ops 29.9 ns/op 33473413 ops/sec
+gateway session (fill) 200000 ops 219.4 ns/op 4558316 ops/sec
+matching engine flow 5004 items 168.2 ns/item 5946192 items/sec
+replay command log 5004 items 187.6 ns/item 5329445 items/sec
NUMA local benchmark output:
NUMA node-local/remote binding skipped: fewer than two NUMA nodes found
@@ -76,10 +76,10 @@ Unpinned perf stat output:
0 context-switches:u
0 cpu-migrations:u
- 0.084315551 seconds time elapsed
+ 0.095650252 seconds time elapsed
- 0.084129000 seconds user
- 0.000000000 seconds sys
+ 0.094408000 seconds user
+ 0.000983000 seconds sys
@@ -90,10 +90,10 @@ Pinned perf stat output:
0 context-switches:u
0 cpu-migrations:u
- 0.154719226 seconds time elapsed
+ 0.144623525 seconds time elapsed
- 0.152299000 seconds user
- 0.001988000 seconds sys
+ 0.141344000 seconds user
+ 0.002985000 seconds sys
@@ -111,7 +111,7 @@ Core(s) per socket: 4
Socket(s): 1
Stepping: 0x1
Frequency boost: disabled
-CPU(s) scaling MHz: 53%
+CPU(s) scaling MHz: 100%
CPU max MHz: 2424.0000
CPU min MHz: 600.0000
BogoMIPS: 48.00
@@ -156,7 +156,7 @@ numactl --hardware output:
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 7481 MB
-node 0 free: 1620 MB
+node 0 free: 2023 MB
node distances:
node 0
0: 10
diff --git a/results/perf_report_linux.txt b/results/perf_report_linux.txt
index 92e4c04..3bd5be1 100644
--- a/results/perf_report_linux.txt
+++ b/results/perf_report_linux.txt
@@ -8,18 +8,18 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
Perf paranoid: 2
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:c991d51c8076952f2c3dcd5e407f78e512d9fb4e573cb7ad65f7a700a9ed37a2
+Git commit (informational): f9f7e98
+Source digest: sha256:1837aa008369e0029dd4a16e7e780bacac293688e03351b88dbb4c586fbbf34e
Source digest scope: perf-record-benchmark
Dirty inputs: no
Generated output: results/perf_report_linux.txt
-Date: 2026-06-21T05:25:24Z
+Date: 2026-06-25T02:30:02Z
Benchmark binary: build/bench/qsl-bench
Benchmark status: 0
Dataset: qsl-bench default synthetic benchmark suite
Record event: cpu-clock
Sample freq: 2000 Hz
-Sample count: 186
+Sample count: 188
Minimum samples for hot profile: 100
Insufficient samples: no
Report limit: 1%
@@ -34,22 +34,22 @@ cpu-clock event is a software sampling profile for hot-symbol investigation,
not a latency or throughput measurement.
Benchmark output:
-order_book add/mod/cancel 200000 ops 141.9 ns/op 7047598 ops/sec
-protocol encode+decode 500000 ops 21.3 ns/op 47032627 ops/sec
-gateway session (fill) 200000 ops 129.1 ns/op 7743908 ops/sec
-matching engine flow 5004 items 103.0 ns/item 9713046 items/sec
-replay command log 5004 items 112.8 ns/item 8863630 items/sec
+order_book add/mod/cancel 200000 ops 131.2 ns/op 7621590 ops/sec
+protocol encode+decode 500000 ops 20.1 ns/op 49771400 ops/sec
+gateway session (fill) 200000 ops 118.7 ns/op 8425787 ops/sec
+matching engine flow 5004 items 95.1 ns/item 10520557 items/sec
+replay command log 5004 items 99.8 ns/item 10021725 items/sec
Benchmark output under perf:
-order_book add/mod/cancel 200000 ops 112.7 ns/op 8873425 ops/sec
-protocol encode+decode 500000 ops 21.0 ns/op 47551868 ops/sec
-gateway session (fill) 200000 ops 127.6 ns/op 7833933 ops/sec
-matching engine flow 5004 items 101.1 ns/item 9892789 items/sec
-replay command log 5004 items 119.5 ns/item 8368038 items/sec
+order_book add/mod/cancel 200000 ops 139.1 ns/op 7190560 ops/sec
+protocol encode+decode 500000 ops 20.5 ns/op 48888534 ops/sec
+gateway session (fill) 200000 ops 117.5 ns/op 8511050 ops/sec
+matching engine flow 5004 items 92.2 ns/item 10847835 items/sec
+replay command log 5004 items 97.9 ns/item 10213751 items/sec
perf record stderr:
[ perf record: Woken up 1 times to write data ]
-[ perf record: Captured and wrote 0.028 MB build/perf/qsl-bench.perf.data (186 samples) ]
+[ perf record: Captured and wrote 0.027 MB build/perf/qsl-bench.perf.data (188 samples) ]
perf report stderr:
@@ -59,15 +59,20 @@ perf report output:
#
# Total Lost Samples: 0
#
-# Samples: 186 of event 'cpu-clock:u'
-# Event count (approx.): 93000000
+# Samples: 188 of event 'cpu-clock:u'
+# Event count (approx.): 94000000
#
-# Overhead Symbol Shared Object IPC [IPC Coverage]
-# ........ ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ..................... ....................
+# Overhead Symbol Shared Object IPC [IPC Coverage]
+# ........ ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ..................... ....................
#
- 10.22% [.] cfree@GLIBC_2.17 libc.so.6 - -
+ 12.23% [.] cfree@GLIBC_2.17 libc.so.6 - -
|
- |--1.61%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ |--3.19%--main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ |--1.60%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
| decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
| qsl::engine::OrderBook::cancel(unsigned long)
| main
@@ -75,7 +80,7 @@ perf report output:
| __libc_start_main@@GLIBC_2.34
| _start
|
- |--1.08%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
+ |--1.06%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
| qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
| qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
| qsl::gateway::Session::on_bytes(std::span)
@@ -84,7 +89,7 @@ perf report output:
| __libc_start_main@@GLIBC_2.34
| _start
|
- |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0]
+ |--1.06%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0]
| qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
| qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
| qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
@@ -94,160 +99,93 @@ perf report output:
| __libc_start_main@@GLIBC_2.34
| _start
|
- |--1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
- | main
- | __libc_start_call_main
- | __libc_start_main@@GLIBC_2.34
- | _start
- |
- |--1.08%--std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*)
- | qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+ |--1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
|
- --1.08%--main
+ --1.06%--0x5000000402b63
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 9.68% [.] malloc libc.so.6 - -
+ 6.91% [.] malloc libc.so.6 - -
|
- |--6.45%--operator new(unsigned long)
- | |
- | |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | | |
- | | --1.61%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span)
- | | main
- | | __libc_start_call_main
- | | __libc_start_main@@GLIBC_2.34
- | | _start
- | |
- | |--1.61%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | | qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span)
- | | main
- | | __libc_start_call_main
- | | __libc_start_main@@GLIBC_2.34
- | | _start
+ |--3.72%--operator new(unsigned long)
| |
- | |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0]
- | | qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
- | | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- | | qsl::gateway::Session::on_bytes(std::span)
- | | main
- | | __libc_start_call_main
- | | __libc_start_main@@GLIBC_2.34
- | | _start
- | |
- | --1.08%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
+ | --1.60%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
| main
| __libc_start_call_main
| __libc_start_main@@GLIBC_2.34
| _start
|
- --3.23%--__posix_memalign
+ --3.19%--__posix_memalign
operator new(unsigned long, std::align_val_t)
|
- |--1.61%--std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
- | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ |--1.60%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
| qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
| |
- | --1.08%--main
+ | --1.06%--main
| __libc_start_call_main
| __libc_start_main@@GLIBC_2.34
| _start
|
- --1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ --1.06%--qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
+ qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+ 5.32% [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - -
+ |
+ |--3.19%--main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.60%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ |
+ --1.06%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 8.60% [.] qsl::protocol::decode_new_order(std::span) qsl-bench - -
+ 4.79% [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) qsl-bench - -
+ |
+ ---main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
+
+ 3.72% [.] operator new(unsigned long) libstdc++.so.6.0.35 - -
|
- |--6.99%--main
+ |--1.06%--qsl::engine::OrderBook::fill_front_order(std::__cxx11::list >&, long, qsl::engine::OrderBook::MatchContext&)
+ | qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+ | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ | qsl::gateway::Session::on_bytes(std::span)
+ | main
| __libc_start_call_main
| __libc_start_main@@GLIBC_2.34
| _start
|
- --1.61%--qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span)
+ --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 4.84% [.] qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) qsl-bench - -
- |
- --4.30%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- |
- --3.76%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- |
- |--2.15%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- | qsl::gateway::Session::on_bytes(std::span)
- | main
- | __libc_start_call_main
- | __libc_start_main@@GLIBC_2.34
- | _start
- |
- --1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
- |
- --1.08%--main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
-
- 4.30% [.] malloc@plt libstdc++.so.6.0.35 - -
- |
- ---operator new(unsigned long)
- |
- |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | |
- | |--1.08%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
- | |
- | --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- | qsl::gateway::Session::on_bytes(std::span)
- | main
- | __libc_start_call_main
- | __libc_start_main@@GLIBC_2.34
- | _start
- |
- --1.08%--qsl::protocol::encode(qsl::protocol::Fill const&)
- qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
- qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span)
- main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
-
- 2.69% [.] operator new(unsigned long) libstdc++.so.6.0.35 - -
- |
- --1.08%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-
- 2.69% [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - -
+ 3.72% [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - -
|
- |--1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ |--2.13%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
| |
- | --1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
+ | --1.60%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
| main
| __libc_start_call_main
| __libc_start_main@@GLIBC_2.34
| _start
|
- --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ --1.60%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
qsl::gateway::Session::on_bytes(std::span)
@@ -256,200 +194,245 @@ perf report output:
__libc_start_main@@GLIBC_2.34
_start
- 2.69% [.] qsl::engine::OrderBook::contains(unsigned long) const qsl-bench - -
+ 3.72% [.] qsl::protocol::decode_header(std::span) qsl-bench - -
|
- --1.61%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
+ |--2.13%--qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ | qsl::gateway::Session::on_bytes(std::span)
+ | main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.60%--qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 2.15% [.] __posix_memalign libc.so.6 - -
+ 3.19% [.] malloc@plt libstdc++.so.6.0.35 - -
|
- ---operator new(unsigned long, std::align_val_t)
+ ---operator new(unsigned long)
|
- --1.08%--std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
- qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
- qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
-
- 2.15% [.] main qsl-bench - -
- |
- ---__libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
-
- 2.15% [.] operator delete(void*)@plt libstdc++.so.6.0.35 - -
- 2.15% [.] operator delete(void*, unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - -
- |
- |--1.08%--std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*)
- | decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- | qsl::engine::OrderBook::cancel(unsigned long)
- |
- --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
- decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- qsl::engine::OrderBook::cancel(unsigned long)
-
- 2.15% [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) qsl-bench - -
- |
- ---main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
-
- 2.15% [.] std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) qsl-bench - -
- |
- --1.61%--decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- |
- --1.08%--qsl::engine::OrderBook::cancel(unsigned long)
-
- 2.15% [.] std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - -
- |
- ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
- decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- qsl::engine::OrderBook::cancel(unsigned long)
- main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
+ |--1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ | qsl::gateway::Session::on_bytes(std::span)
+ | main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long)
+ main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
- 2.15% [.] std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) qsl-bench - -
+ 3.19% [.] qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) qsl-bench - -
|
---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
+ |
+ |--2.13%--main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
- 1.61% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] qsl-bench - -
+ 3.19% [.] qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) qsl-bench - -
|
- ---qsl::engine::OrderBook::cancel(unsigned long)
- main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
+ ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ |
+ --2.66%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ |
+ |--1.60%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ | |
+ | --1.06%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
+ | main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span)
+ main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
+
+ 3.19% [.] qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl-bench - -
+ |
+ ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ |
+ --2.66%--main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
- 1.61% [.] memcpy@plt qsl-bench - -
- 1.61% [.] operator new(unsigned long, std::align_val_t) libstdc++.so.6.0.35 - -
+ 2.66% [.] _mid_memalign libc.so.6 - -
|
- --1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ |--1.60%--0x2fffff346e1a63
+ | operator new(unsigned long, std::align_val_t)
+ | qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
+ | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.06%--0x63ffff346e1a63
+ operator new(unsigned long, std::align_val_t)
+ std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
+ qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.61% [.] operator new(unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - -
+ 2.66% [.] qsl::engine::OrderBook::contains(unsigned long) const qsl-bench - -
|
- --1.08%--std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&)
- qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
- qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
- qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ --2.13%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.61% [.] qsl::engine::OrderBook::fill_front_order(std::__cxx11::list >&, long, qsl::engine::OrderBook::MatchContext&) qsl-bench - -
+ 2.66% [.] qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) qsl-bench - -
|
- ---qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
- qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ ---decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
|
- --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span)
+ --2.13%--qsl::engine::OrderBook::cancel(unsigned long)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.61% [.] qsl::protocol::decode_header(std::span) qsl-bench - -
+ 2.66% [.] std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - -
|
- --1.08%--qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span)
+ |--1.60%--qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+ | |
+ | --1.06%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ | qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ | main
+ | __libc_start_call_main
+ | __libc_start_main@@GLIBC_2.34
+ | _start
+ |
+ --1.06%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+ qsl::engine::OrderBook::cancel(unsigned long)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.61% [.] std::__detail::_List_node_base::_M_unhook()@plt qsl-bench - -
+ 2.13% [.] __posix_memalign libc.so.6 - -
|
- --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
- decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- qsl::engine::OrderBook::cancel(unsigned long)
+ |--1.06%--operator new(unsigned long, std::align_val_t)
+ | std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
+ | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ |
+ --1.06%--0x14ffff349d51d3
+ qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.61% [.] std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) qsl-bench - -
+ 2.13% [.] operator delete(void*)@plt libstdc++.so.6.0.35 - -
|
- ---qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long)
- qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
- qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- |
- --1.08%--qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
- qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int)
- qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
- qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
- main
- __libc_start_call_main
- __libc_start_main@@GLIBC_2.34
- _start
+ --1.60%--main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
- 1.08% [.] __memcpy_generic libc.so.6 - -
- 1.08% [.] _mid_memalign libc.so.6 - -
+ 2.13% [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) qsl-bench - -
|
- ---__posix_memalign
- operator new(unsigned long, std::align_val_t)
- std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
- qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
- qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] operator delete(void*, unsigned long)@plt qsl-bench - -
- 1.08% [.] qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) qsl-bench - -
+ 1.60% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] qsl-bench - -
|
- ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ --1.06%--qsl::engine::OrderBook::cancel(unsigned long)
+ main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
- 1.08% [.] qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const qsl-bench - -
- 1.08% [.] qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) qsl-bench - -
+ 1.60% [.] free@plt libstdc++.so.6.0.35 - -
+ 1.60% [.] operator delete(void*, unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - -
+ 1.60% [.] qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const qsl-bench - -
|
- ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
- qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&)
+ ---qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
+ qsl::gateway::Session::on_bytes(std::span)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - -
+ 1.60% [.] qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) qsl-bench - -
|
- ---qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
- qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int)
- qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+ ---qsl::gateway::Session::on_bytes(std::span)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) qsl-bench - -
+ 1.60% [.] qsl::protocol::decode_new_order(std::span) qsl-bench - -
|
- ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
- qsl::gateway::Session::on_bytes(std::span)
+ --1.06%--main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
+
+ 1.60% [.] qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) qsl-bench - -
+ |
+ ---main
+ __libc_start_call_main
+ __libc_start_main@@GLIBC_2.34
+ _start
+
+ 1.06% [.] __memcpy_generic libc.so.6 - -
+ 1.06% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] qsl-bench - -
+ |
+ ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+ 1.06% [.] operator new(unsigned long)@plt qsl-bench - -
+ 1.06% [.] qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) qsl-bench - -
+ |
+ ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&)
+
+ 1.06% [.] qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const qsl-bench - -
+ |
+ ---qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long)
main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - -
+ 1.06% [.] qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0] qsl-bench - -
|
- ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
+ ---qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long)
+ qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long)
qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long)
qsl::gateway::Session::on_bytes(std::span)
main
@@ -457,15 +440,27 @@ perf report output:
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) qsl-bench - -
+ 1.06% [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) qsl-bench - -
|
---main
__libc_start_call_main
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - -
- 1.08% [.] std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) libstdc++.so.6.0.35 - -
+ 1.06% [.] std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) qsl-bench - -
+ 1.06% [.] std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node, false>*, unsigned long) qsl-bench - -
+ |
+ ---std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
+ qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+
+ 1.06% [.] std::__detail::_List_node_base::_M_unhook() libstdc++.so.6.0.35 - -
+ |
+ ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+ qsl::engine::OrderBook::cancel(unsigned long)
+
+ 1.06% [.] std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) qsl-bench - -
|
---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
@@ -474,9 +469,8 @@ perf report output:
__libc_start_main@@GLIBC_2.34
_start
- 1.08% [.] std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) libstdc++.so.6.0.35 - -
#
-# (Tip: Compare performance results with: perf diff [ ])
+# (Tip: To see list of saved events and attributes: perf evlist -v)
#
diff --git a/results/perf_stat_linux.txt b/results/perf_stat_linux.txt
index ddcd14b..1c6f521 100644
--- a/results/perf_stat_linux.txt
+++ b/results/perf_stat_linux.txt
@@ -8,12 +8,12 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
Perf paranoid: 2
Build type: Release
Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:d8856d2f599416a9e74050726279a67f88d61a3bb3d06de86eb3bf948d2a16a5
+Git commit (informational): f9f7e98
+Source digest: sha256:59d9fdbc9d64b974bd28094e55610cced29a381c6e2ec968092862a975bde281
Source digest scope: perf-stat-benchmark
Dirty inputs: no
Generated output: results/perf_stat_linux.txt
-Date: 2026-06-21T05:25:23Z
+Date: 2026-06-25T02:30:17Z
Benchmark binary: build/bench/qsl-bench
Benchmark status: 0
Dataset: qsl-bench default synthetic benchmark suite
@@ -28,39 +28,39 @@ until every requested counter is supported. Profiling evidence for investigation
not a production-latency claim.
Benchmark output:
-order_book add/mod/cancel 200000 ops 143.9 ns/op 6947208 ops/sec
-protocol encode+decode 500000 ops 21.5 ns/op 46599221 ops/sec
-gateway session (fill) 200000 ops 129.7 ns/op 7710496 ops/sec
-matching engine flow 5004 items 102.4 ns/item 9769779 items/sec
-replay command log 5004 items 110.3 ns/item 9064737 items/sec
+order_book add/mod/cancel 200000 ops 137.2 ns/op 7288189 ops/sec
+protocol encode+decode 500000 ops 21.2 ns/op 47087754 ops/sec
+gateway session (fill) 200000 ops 120.8 ns/op 8277978 ops/sec
+matching engine flow 5004 items 93.2 ns/item 10733865 items/sec
+replay command log 5004 items 97.1 ns/item 10294783 items/sec
Benchmark output under perf:
-order_book add/mod/cancel 200000 ops 92.7 ns/op 10785353 ops/sec
-protocol encode+decode 500000 ops 16.3 ns/op 61508483 ops/sec
-gateway session (fill) 200000 ops 110.8 ns/op 9023997 ops/sec
-matching engine flow 5004 items 98.1 ns/item 10190493 items/sec
-replay command log 5004 items 109.4 ns/item 9137639 items/sec
+order_book add/mod/cancel 200000 ops 121.9 ns/op 8202972 ops/sec
+protocol encode+decode 500000 ops 21.0 ns/op 47563493 ops/sec
+gateway session (fill) 200000 ops 120.9 ns/op 8269791 ops/sec
+matching engine flow 5004 items 95.8 ns/item 10437399 items/sec
+replay command log 5004 items 99.6 ns/item 10043348 items/sec
perf stat output:
Performance counter stats for 'build/bench/qsl-bench':
- 233,479,932 apple_avalanche_pmu/cycles/u
+ 221,558,456 apple_avalanche_pmu/cycles/u
apple_blizzard_pmu/cycles/u (0.00%)
- 1,247,839,058 apple_avalanche_pmu/instructions/u
+ 1,160,776,150 apple_avalanche_pmu/instructions/u
apple_blizzard_pmu/instructions/u (0.00%)
- 245,495,434 apple_avalanche_pmu/branches/u
+ 233,032,815 apple_avalanche_pmu/branches/u
apple_blizzard_pmu/branches/u (0.00%)
- 1,272,574 apple_avalanche_pmu/branch-misses/u
+ 1,143,050 apple_avalanche_pmu/branch-misses/u
apple_blizzard_pmu/branch-misses/u (0.00%)
apple_avalanche_pmu/cache-references/u