diff --git a/AGENTS.md b/AGENTS.md index 592ad76..444799e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1263,6 +1263,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR `v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134, superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131, -closing #29), with the version bump on the release PR. There is no active milestone; the +closing #29), with the version bump on the release PR. + +Since `v0.2.1`, a post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` and +unreleased, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged +5→2→1→0 confirmed) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum +rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness, +connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion +handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** — the `asan` +preset now sets `-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and +exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per-fixture robustness +(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and +an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated +(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass). +`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). 
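As an illustration of the two order-book performance items summarized in the AGENTS.md note above (`try_emplace` for price levels, and a `max_load_factor` cap of 0.25 on the order index), here is a minimal, self-contained sketch. It is not the project's code. `Level`, `PriceLevels`, `OrderIndex`, `cap_load_factor`, the fields of `Locator`, and the free function `level_for` are hypothetical stand-ins, and the choice of `std::map` / `std::unordered_map` for both structures is an assumption made for the example. Only the standard-library calls themselves are real.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the project's types.
struct Level {
    std::vector<std::uint64_t> order_ids;  // resting orders at this price
};
using PriceLevels = std::map<std::int64_t, Level>;

// try_emplace looks the key up first and constructs a Level only when the
// price is absent. emplace is allowed to (and typically does) build a node
// before it discovers the key already exists, then throws that node away.
Level& level_for(PriceLevels& levels, std::int64_t price) {
    auto [it, inserted] = levels.try_emplace(price);
    (void)inserted;  // true only when a new level was created
    return it->second;
}

// Assumed shape of an index entry: where an order rests in the book.
struct Locator {
    std::int64_t price;
    std::size_t slot;
};
using OrderIndex = std::unordered_map<std::uint64_t, Locator>;

// A lower max_load_factor makes the table keep more buckets per element,
// so each bucket holds fewer entries on average and point lookups walk
// shorter chains. The cost is extra memory for the additional buckets.
void cap_load_factor(OrderIndex& index) {
    index.max_load_factor(0.25f);
}
```

Calling `level_for` twice with the same price returns the same `Level` without creating a second node, and after `cap_load_factor` the container rehashes as needed so that `load_factor()` stays at or below 0.25. Whether either change pays off depends on the workload, which is why the note above reports them as measured A/B results on one specific profile rather than as general guarantees.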
diff --git a/CHANGELOG.md b/CHANGELOG.md index 98f8da8..0f9dcb0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,55 @@ All notable changes to this project. The format is loosely based on _Nothing yet._ +## [0.2.2] - 2026-06-24 + +A security/robustness **hardening** wave plus two measured order-book **performance** wins, driven by +a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed bugs) and flamegraph-guided +optimization. Same honesty bar: a deterministic C++20 exchange simulator and cross-language +differential-testing harness — **not** a production exchange, no real-market connectivity, no latency +or profitability claims, not formal verification. Determinism preserved throughout (fixtures +byte-identical across g++/clang++ and vs the committed copies; the OCaml differential passes). +`make check`/`make asan` 270/270. + +### Fixed + +- **Reject out-of-domain enum bytes in the decoders (#136).** `replay::decode_command` (NewLimit / + NewMarket) and `protocol::decode_reject` cast raw bytes to `Side` / `TimeInForce` / `RejectReason` + without validating the domain. Since the replay path applies decoded commands straight to the + engine with no gateway risk check, a corrupt log record could silently diverge replayed state. + Both now validate via `core::is_valid` (added `is_valid(RejectReason)`) and refuse out-of-domain + bytes like a malformed frame. +- **Network-path hardening (#137, #140, #143).** The TCP gateway now retries `EINTR` in its + read/write paths and survives transient `accept()` errors (`EINTR`/`ECONNABORTED`) instead of + tearing the listener down; both the threaded acceptor (back-off retry) and the epoll loop (listener + disarm/re-arm) survive fd exhaustion (`EMFILE`/`ENFILE`); a `TcpServerOptions::max_active_connections` + cap sheds load; the epoll loop bounds accepts per tick for fairness; and `UdpPublisher` counts + `send_failures` rather than silently dropping datagrams. 
+- **CLI argument validation (#141).** `qsl-client`, `qsl-mdfeed`, and `qsl-export-fixture` parse + numeric arguments with `std::from_chars` and reject malformed / out-of-range input with a usage + message and non-zero exit, instead of `std::terminate` (from an uncaught `std::sto*` exception) or + silently truncating an out-of-range port. +- **UBSan gate now actually fails (#142).** The `asan` preset adds `-fno-sanitize-recover=undefined` + so UBSan **aborts** on the first violation. It previously ran in recover mode (print a diagnostic, + exit 0), so a pure-UBSan defect passed `make asan` / CI green. The tree is UBSan-clean under the + strict gate. +- **OCaml `diff_report` robustness (#144).** The differential-bundle bin guards each fixture + (catching `Stream_parser.Parse_error` / `Sys_error`) so one malformed or unreadable fixture cannot + abort the whole batch and silently lose the divergence bundles for the rest. + +### Performance + +- **`try_emplace` for baseline price levels (#138).** `OrderBook::level_for` used + `std::map::emplace`, which allocates and frees a node even when the price level already exists. + `try_emplace` avoids that on the steady-state common path. Measured back-to-back A/B on the + `qsl-bench profile` workload: **~+5%**. +- **Order-index hash load-factor cap (#145).** The `OrderId → Locator` index is the busiest structure + on the engine hot path (1–4 point lookups per op). Capping its `max_load_factor` at 0.25 shortens + probe chains. Measured A/B: **~+18.6%**. Determinism is unaffected — the index is never iterated + for output. +- **Flamegraph regenerated (#135, #139, #146)** against the new code, now a dense (~20k-sample), + fully-symbolized frame-pointer profile with zero `[unknown]` frames. 
+ ## [0.2.1] - 2026-06-21 Two backlog items — reprioritized by the maintainer and delivered — plus a resume-anchor and diff --git a/CLAUDE.md b/CLAUDE.md index 0b0cabd..60e126e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1207,6 +1207,19 @@ Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR `v0.2.1` then shipped two reprioritized backlog items plus a consistency sweep: a Codex resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #134, superseding the auto-closed #130, closing #32), and the FIX-like text protocol adapter (PR #131, -closing #29), with the version bump on the release PR. There is no active milestone; the +closing #29), with the version bump on the release PR. + +Since `v0.2.1`, a post-v0.2.1 hardening + perf wave (PRs #135–#146) is merged to `main` and +unreleased, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged +5→2→1→0 confirmed) plus flamegraph-guided optimization. Security/robustness: out-of-domain enum +rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness, +connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion +handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** — the `asan` +preset now sets `-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and +exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per-fixture robustness +(#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and +an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated +(#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass). +`make check`/`make asan` 270/270 (the latter now a real UBSan gate). 
After `v0.2.2`, the highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/CMakeLists.txt b/CMakeLists.txt index 19fc3b1..83d5b0b 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -1,5 +1,5 @@ cmake_minimum_required(VERSION 3.24) -project(quant-systems-lab VERSION 0.2.1 LANGUAGES CXX) +project(quant-systems-lab VERSION 0.2.2 LANGUAGES CXX) set(CMAKE_CXX_STANDARD 20) set(CMAKE_CXX_STANDARD_REQUIRED ON) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d5e4a5e..1fba07a 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -20,7 +20,7 @@ reviewable. ```bash make check # clang-format check + build + tests -make asan # AddressSanitizer + UBSan build and tests +make asan # AddressSanitizer + UBSan build and tests (UBSan aborts on first violation) dune runtest --root ocaml # OCaml log verifier + independent replay + differential + mutation tests ``` diff --git a/HANDOFF.md b/HANDOFF.md index 00271c9..0594dc6 100644 --- a/HANDOFF.md +++ b/HANDOFF.md @@ -33,8 +33,19 @@ partial-PMU reframe, and a full documentation staleness sweep — landed as PR # **v0.2.1 release** then adds two reprioritized backlog items and a consistency sweep: a Codex resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #130, issue #32), the FIX-like text protocol adapter (PR #131, issue #29), and the version-bump release -PR — merged in that order, with `v0.2.1` tagged on the release merge commit. There is no active -milestone; the project is between releases. +PR — merged in that order, with `v0.2.1` tagged on the release merge commit. + +Since `v0.2.1`, a **post-v0.2.1 hardening + perf wave (#135–#146) is merged to `main` and +unreleased**, being cut as **`v0.2.2`**. It came out of a 4-round adversarial bug hunt (converged +5→2→1→0 confirmed bugs) plus flamegraph-guided optimization. 
Security/robustness: out-of-domain enum +rejection in the replay/protocol decoders (#136); network hardening — EINTR retry, accept fairness, +connection cap, UDP send-error tracking, transient-accept survival, and threaded/epoll fd-exhaustion +handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate** — +`-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and exited 0 (#142); +OCaml `diff_report` robustness (#144). Perf (measured A/B): `try_emplace` for baseline price levels +(~+5%, #138) and an order-index hash load-factor cap (~+18.6%, #145), with the flamegraph regenerated +(#135/#139/#146). `make check`/`make asan` 270/270 (the latter now under the real UBSan gate). The +next action is to finish this `v0.2.2` doc/artifact overhaul and cut the tag. Background — Linux perf evidence (merged, now bare-metal partial PMU): @@ -77,13 +88,15 @@ Current state: - latest synced main baseline: `ded6e80` (PR #127, v0.2.0); the `v0.2.1` baseline is the release-PR merge commit, after PRs #129/#130/#131 -- current active branch, if active: none (work lands via scoped PRs from `main`) -- current active status: `v0.2.1` is the current release on top of `v0.2.0`. It adds the FIX-like - text protocol adapter (#29), `make flamegraph` + a bare-metal flamegraph artifact (#32), and a - Codex resume-anchor/PMU consistency sweep. `make check` 263/263 and `make asan` 263/263 on the - bare-metal Apple M2 Fedora Asahi host; both new code files pass the CI CodeScene Code Health gate. 
- No active milestone -- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0` +- current active branch, if active: `docs/post-v0.2.1-overhaul` (v0.2.2 prep + doc/artifact sweep) +- current active status: `v0.2.1` is the latest tag; a post-v0.2.1 hardening + perf wave (#135–#146) + is merged to `main` and unreleased, being cut as `v0.2.2` (decoder enum rejection, network/CLI + hardening, a real UBSan abort gate, OCaml diff_report robustness, and two measured order-book perf + wins — `try_emplace` ~+5% and an index load-factor cap ~+18.6%). `make check` 270/270 and + `make asan` 270/270 (the latter now under the real UBSan gate) on the bare-metal Apple M2 Fedora + Asahi host; every touched file passes the CI CodeScene Code Health gate +- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`; + `v0.2.2` prepared on this branch, not yet tagged - open follow-up issue: #90 — narrowed to the full cache-counter PMU set; the bare-metal Apple host provides real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss support - issues #95, #28, and #26 were closed by PR #112; issues #32 and #29 were closed by PR #134 and @@ -94,12 +107,13 @@ Current state: ### Next milestone -There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), the v0.2.0 release -(PR #127), and the v0.2.1 content (PRs #129/#134/#131 + release PR) are merged. The highest-value -remaining work is non-code and externally gated: issue #94 (independent external review — needs a -human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that -exposes cache events). The #32 (flamegraph) and #29 (FIX adapter) backlog items are now done. Do not -invent a new milestone without an explicit human request. +There is no active milestone. 
M0–M49 are merged, as are the v0.2.0/v0.2.1 releases and the +post-v0.2.1 hardening + perf wave (#135–#146, being released as `v0.2.2`). The immediate next action +is to finish the `v0.2.2` doc/artifact overhaul (this branch) and cut the tag. After that the +highest-value remaining work is non-code and externally gated: issue #94 (independent external +review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU +microarchitecture that exposes cache events). Do not invent a new milestone without an explicit +human request. ### Phase III / IV purpose diff --git a/PROGRESS.md b/PROGRESS.md index 53d81cc..d5ba28d 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -20,36 +20,41 @@ Do not rely on prior chat memory. ## Current state -- **Active milestone:** none — `v0.2.1` released; project is between releases -- **Status:** ☑ `v0.2.1` published (FIX-like text protocol adapter #29, perf flamegraph #32, and a - resume-anchor/PMU consistency sweep) on top of `v0.2.0` -- **Active branch:** none (work lands via scoped PRs from `main`) +- **Active milestone:** none — `v0.2.1` is the latest tag, but a post-v0.2.1 hardening + perf wave + (12 PRs, #135–#146) has merged to `main` and is **unreleased**; it is being cut as **`v0.2.2`** +- **Status:** ☑ `v0.2.1` published on top of `v0.2.0`; ☐ `v0.2.2` in preparation — security/robustness + hardening (decoder enum-domain rejection, network/CLI hardening, a real UBSan abort gate, OCaml + diff_report robustness) plus two measured order-book perf wins +- **Active branch:** `docs/post-v0.2.1-overhaul` (the v0.2.2 prep + full doc/artifact staleness sweep) - **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124, - d8c16b2); since then `v0.2.0` (PR #127, ded6e80) and the `v0.2.1` content: Codex resume-anchor - sweep (PR #129), perf flamegraph #32 (PR #134), and the FIX text adapter #29 (PR #131) -- **Last completed docs sync:** v0.2.1 release prep (this PR): version bump + 
CHANGELOG `[0.2.1]` - and resume/release anchors brought current -- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag created on the - squash-merge of the release PR, marked Latest) published as GitHub-only releases; no packages - published -- **`make check` passing:** yes — `make check` 263/263 and `make asan` 263/263 on the bare-metal - Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 (includes the v0.2.1 FIX-adapter and flamegraph - renderer tests) -- **Last action:** delivered the `v0.2.1` content as scoped PRs and prepared this version-bump - release. Two reprioritized backlog items — the FIX-like text protocol adapter (#29) and the perf - call-graph flamegraph (#32) — plus the Codex resume-anchor/PMU consistency sweep (#127/#128 - follow-up). Ran Codex as an independent reviewer across the stack and resolved every finding: the - FIX envelope now requires MsgType as the first body field and rejects duplicate tags; - `flamegraph.sh` classifies zero-sample/partial runs honestly, fails hard on renderer errors, and - gates on the folded sample total (not perf's estimate); and the resume anchors were made - consistent across PROGRESS/HANDOFF/AGENTS/CLAUDE. Brought every touched file through the CodeScene - Code Health gate (table-driven enum maps, a `decode_typed` skeleton, split `parse_envelope`, - flattened `flamegraph.py`). `make check`/`make asan` 263/263. -- **Next action:** no active milestone. Highest-value remaining work is non-code and gated: - issue #94 (independent external review — needs a human reviewer) and issue #90 (full - cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g. - x86_64). The #32 (flamegraph) and #29 (FIX adapter) backlog items are done — shipped in `v0.2.1` - (PR #134 and PR #131) — so do not reopen them. + d8c16b2). Releases since: `v0.2.0` (PR #127, ded6e80) and `v0.2.1` (FIX adapter #131, flamegraph + #134, anchor sweep #129). 
Post-v0.2.1 unreleased work on `main`: #135–#146 (see Last action) +- **Last completed docs sync:** this v0.2.2-prep overhaul — every `.md`/`.txt` audited against + current `main`; resume/release anchors, README, CHANGELOG, and all stale `results/*.txt` + provenance digests brought current to HEAD +- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag on the + release-PR merge, marked Latest) published as GitHub-only releases; `v0.2.2` prepared here, not yet + tagged; no packages published +- **`make check` passing:** yes — `make check` 270/270 and `make asan` 270/270 (the latter now under + the **real** UBSan abort gate from #142) on the bare-metal Apple M2 (aarch64) Fedora Asahi host on + 2026-06-24 +- **Last action:** post-v0.2.1 hardening + perf wave merged to `main` as 12 scoped PRs (#135–#146), + driven by a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed) and flamegraph-guided + optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders + (#136); network hardening — EINTR retry, accept fairness, connection cap, UDP send-error tracking, + transient-accept survival, and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg + validation so the tools reject malformed input instead of `std::terminate` (#141); the `asan` + preset now sets `-fno-sanitize-recover=undefined` so UBSan actually fails CI — previously it ran in + recover mode and exited 0 (#142); OCaml `diff_report` guards each fixture so one bad file cannot + abort the batch (#144). Perf (measured A/B): baseline price levels use `try_emplace` (~+5%, #138) + and the order-index hash caps its load factor at 0.25 (~+18.6%, #145); flamegraph regenerated + (#135, #139, #146). Determinism preserved throughout (byte-identical fixtures, OCaml differential + pass). `make check`/`make asan` 270/270. 
+- **Next action:** finish the `v0.2.2` overhaul (this branch): regenerate the remaining stale + `results/*.txt` artifacts, then cut the `v0.2.2` tag/release. After that, the highest-value + remaining work is non-code and gated: issue #94 (independent external review — needs a human + reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that + exposes cache events, e.g. x86_64). - **Blockers:** issue #90 is now a *cache-counter* PMU gap, not a host-access gap — this bare-metal Apple M2 exposes real `cycles`/`instructions`/`branches`/`branch-misses` but its PMU does not implement `cache-references`/`cache-misses`; closing it needs a PMU microarchitecture that exposes @@ -221,15 +226,21 @@ Status key: - _none yet_ -Measured by `make bench` (full metadata + raw output in `results/latest.txt`). Hardware-, -compiler-, and build-dependent — these are from one machine, not a production-latency claim. - -- Run: arm64, Apple clang 17, Release, seed 42, commit fbb8180 (synthetic, in-process; excludes network/disk/kernel path). -- order book add/modify/cancel: ~126 ns/op -- protocol NewOrder encode+decode: ~39 ns/op -- in-process gateway session (crossing order with fill): ~270 ns/op -- matching-engine flow apply: ~121 ns/command -- replay from command log: ~132 ns/command +Measured by `make bench` (full metadata + raw output in `results/latest.txt`, which is the +authoritative source). Hardware-, compiler-, and build-dependent — from one machine, not a +production-latency claim. + +- Run: aarch64 (Apple M2), GCC, Release, seed 42, Fedora Asahi Linux (synthetic, in-process; + excludes network/disk/kernel path). The earlier macOS Apple-clang numbers (~126/39/270/121/132 ns) + were superseded by the Linux regeneration and are not the current set. 
+- order book add/modify/cancel: ~90 ns/op +- protocol NewOrder encode+decode: ~16 ns/op +- in-process gateway session (crossing order with fill): ~102 ns/op +- matching-engine flow apply: ~91 ns/command +- replay from command log: ~101 ns/command +- Note: these single-process micro-benchmarks hold a near-empty order index, so they do not exercise + the deep-book steady state where the v0.2.2 engine wins land — `try_emplace` (~+5%, #138) and the + order-index load-factor cap (~+18.6%, #145) are measured on the `qsl-bench profile` workload. --- @@ -431,6 +442,25 @@ Lower priority: release anchors and removed completed #29/#32 from every backlog list, synced AGENTS.md/CLAUDE.md to the v0.2.1 released state, and refreshed this release-readiness audit to 263 tests. `make check`/`make asan` 263/263. CodeScene MCP token still expired; CI is the authoritative gate. +- [2026-06-24] Post-v0.2.1 hardening + perf wave (#135–#146), to be released as `v0.2.2`. Driven by a + multi-round adversarial bug hunt (4 rounds, converged 5→2→1→0 confirmed) plus flamegraph-guided + optimization. 
Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders + (#136, `core::is_valid` for Side/TimeInForce/RejectReason); network hardening — EINTR retry in the + TCP read/write path, accept fairness (epoll `max_accepts_per_tick`), connection cap + (`max_active_connections`), UDP send-error counter, transient-accept survival + (EINTR/ECONNABORTED), and threaded/epoll fd-exhaustion handling (#137, #140, #143); CLI arg + validation via `std::from_chars` so qsl-client/qsl-mdfeed/qsl-export-fixture reject malformed input + instead of `std::terminate`/silent port truncation (#141); the `asan` preset now sets + `-fno-sanitize-recover=undefined` so UBSan **aborts** on a violation — it previously ran in recover + mode and exited 0, so pure-UBSan defects passed CI green; the tree is UBSan-clean under the strict + gate (#142); OCaml `diff_report` guards each fixture so one malformed file cannot abort the batch + (#144). Perf (measured back-to-back A/B on the `qsl-bench profile` workload): baseline price levels + use `try_emplace` (~+5%, #138) and the order-index hash caps `max_load_factor` at 0.25 (~+18.6%, + #145); flamegraph regenerated against the new code (#135/#139/#146). Determinism preserved + throughout (byte-identical fixtures across g++/clang++ and vs committed; OCaml differential pass). + Then a full doc/artifact staleness overhaul (this branch): every `.md`/`.txt` audited against HEAD, + resume/release anchors + README + CHANGELOG brought current, and the stale `results/*.txt` + provenance digests regenerated. `make check`/`make asan` 270/270. - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. 
A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. @@ -837,14 +867,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator ## Next action remains -There is no active milestone. 
`v0.2.1` is the current release, on top of `v0.2.0` (PR #127 ded6e80) -and `v0.1.0`. The `v0.2.1` content is squash-merged to `main`: the Codex resume-anchor sweep -(PR #129), the perf flamegraph #32 (PR #134, superseding the auto-closed #130), the FIX text adapter -#29 (PR #131), and the version-bump release PR (#133), with `v0.2.1` tagged on the release merge -commit. The committed perf artifacts remain **partial hardware PMU evidence** from this bare-metal -Apple M2 (aarch64) Fedora Asahi host — real cycles/instructions/branches/branch-misses with -cache-reference/cache-miss counters unsupported by the Apple Silicon PMU — not NIC-offload, latency, -or full hardware-PMU evidence. +`v0.2.1` is the latest tag, on top of `v0.2.0` (PR #127 ded6e80) and `v0.1.0`. A post-v0.2.1 +hardening + perf wave (#135–#146) is squash-merged to `main` and **unreleased**, being cut as +`v0.2.2`: out-of-domain enum rejection in the decoders (#136); network hardening — EINTR retry, +accept fairness, connection cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion +handling (#137, #140, #143); CLI arg validation (#141); a real UBSan abort gate (#142); OCaml +`diff_report` robustness (#144); and two measured order-book perf wins — `try_emplace` (~+5%, #138) +and the order-index load-factor cap (~+18.6%, #145), with the flamegraph regenerated (#135/#139/#146). +`make check`/`make asan` 270/270. The committed perf artifacts remain **partial hardware PMU +evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real +cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by +the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence. Highest-value remaining work is non-code and gated: issue #94 (independent external review) and issue #90 (full cache-PMU evidence). 
Issue #90 needs a PMU **microarchitecture** that exposes cache diff --git a/README.md b/README.md index ee12fd7..319157b 100644 --- a/README.md +++ b/README.md @@ -98,14 +98,18 @@ methodology and caveats in [docs/benchmarking.md](docs/benchmarking.md) and | Scenario (synthetic, in-process) | Measured on this run | |---|---| -| Order book add/modify/cancel | ~87 ns/op | +| Order book add/modify/cancel | ~90 ns/op | | Protocol `NewOrder` encode+decode | ~16 ns/op | -| Gateway session, crossing order with fill | ~110 ns/op | -| Matching-engine flow (apply) | ~98 ns/command | -| Replay from command log | ~110 ns/command | - -Reproduce with `make bench` (numbers will differ by machine). The differential-testing harness -(generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to +| Gateway session, crossing order with fill | ~102 ns/op | +| Matching-engine flow (apply) | ~91 ns/command | +| Replay from command log | ~101 ns/command | + +Reproduce with `make bench` (numbers will differ by machine). These micro-benchmarks hold a +near-empty order index, so they do **not** exercise the deep-book steady state where the v0.2.2 +engine optimizations land: `try_emplace` for baseline price levels (#138) and capping the +order-index hash load factor (#145) were measured by back-to-back A/B on the `qsl-bench profile` +workload at **~+5%** and **~+18.6%** respectively (determinism preserved). The differential-testing +harness (generation, replay, shrinking) has its own benchmark — `make bench-diff`, written to [`results/differential.txt`](results/differential.txt) — kept separate so it does not disturb the core numbers above. @@ -121,8 +125,10 @@ capture is dense (~20k samples) and stacks are fully symbolized — no `[unknown This is a **software cpu-clock sampling** hot-symbol profile, **not** PMU evidence: frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and it is -hardware/kernel/compiler/build dependent. 
The hot frames are `MatchingEngine::new_limit`/`cancel`, -the order-book level/index operations, and the allocator. Provenance and classification are in +hardware/kernel/compiler/build dependent. The hot frames are the matching and resting work — +`MatchingEngine::new_limit` → `OrderBook::match_baseline` and `rest` → `level_for`, plus `cancel`; +the per-level allocation churn and order-index lookups that previously dominated were cut by the +v0.2.2 `try_emplace` (#138) and index load-factor (#145) wins. Provenance and classification are in [`results/flamegraph.txt`](results/flamegraph.txt); methodology in [docs/perf_analysis.md](docs/perf_analysis.md). GitHub renders the SVG statically; download the raw file for interactive zoom and search. @@ -132,9 +138,12 @@ file for interactive zoom and search. - **Synthetic and local.** No real market data, no real venue connectivity, no order types beyond limit/market + GTC/IOC. - **Networking remains scoped.** The default TCP gateway is intentionally - loopback-only and unauthenticated. It now has portable threaded serving for multiple clients, and - Linux builds also include an opt-in `epoll` gateway prototype for event-driven readiness. These - are architecture and pressure-validation paths, not a production event loop or capacity claim. + loopback-only and unauthenticated. It has portable threaded serving for multiple clients, plus an + opt-in Linux `epoll` gateway prototype for event-driven readiness. Both paths were hardened in + v0.2.2: `EINTR` retry on read/write, survival of transient `accept()` errors and fd exhaustion + (`EMFILE`/`ENFILE`) instead of tearing the listener down, a connection cap, and per-tick accept + fairness. These are architecture and robustness paths, not a production event loop or capacity + claim. - **Benchmarks are microbenchmarks**, not end-to-end or production latency (see above). 
CPU-affinity/scheduler-migration and false-sharing studies are separate hardware-dependent artifacts; contiguous order-book storage is a bounded-domain architecture study, not a general diff --git a/SECURITY.md b/SECURITY.md index 1cce055..2993800 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -11,8 +11,10 @@ The demo network components are intentionally minimal: and bind to `127.0.0.1` only. - They are for local demonstration. **Do not expose `qsl-gateway` or `qsl-mdfeed` to untrusted networks**, and do not run them on a shared or public interface. -- There is no TLS, access control, rate limiting, or DoS protection. Malformed input is handled - by disconnecting the peer, not by recovering the stream. +- There is no TLS, access control, or rate limiting. The acceptors do have bounded resilience: an + optional connection cap, survival of transient `accept()` errors and fd exhaustion, and `EINTR` + retry on read/write — but this is robustness hardening, not DoS protection. Malformed input is + handled by disconnecting the peer, not by recovering the stream. ## Reporting diff --git a/docs/binary_protocol.md b/docs/binary_protocol.md index 1e1c948..33f4e28 100644 --- a/docs/binary_protocol.md +++ b/docs/binary_protocol.md @@ -68,7 +68,8 @@ buffer holds the full declared body before parsing. NewOrder enum fields are validated during decode. Out-of-range values for Side, OrderType, or TimeInForce return DecodeError::InvalidEnumValue and are not surfaced as internal domain -messages. +messages. Gateway-response decoders apply the same domain check: `decode_reject` returns +`InvalidEnumValue` for a `RejectReason` byte outside the defined codes (#136). 
## Trailing bytes and framing diff --git a/docs/differential_testing.md b/docs/differential_testing.md index 7284844..ded31d9 100644 --- a/docs/differential_testing.md +++ b/docs/differential_testing.md @@ -332,6 +332,9 @@ When the differential check fails in CI, the `ocaml-verifier` job runs `diff_rep positive fixtures and uploads a `differential-failure-bundle` artifact. For each diverging fixture it contains `.original` (the fixture), `.computed` (OCaml snapshot), `.expected` (C++ snapshot), and `.diff` (a line diff) — so a divergence can be -debugged from the CI run without reproducing locally. The minimal-counterexample form of a +debugged from the CI run without reproducing locally. `diff_report` guards each fixture +independently: a malformed or unreadable fixture is reported as a comparison failure (non-zero +exit), not allowed to abort the batch and lose the remaining fixtures' bundles (#144). The +minimal-counterexample form of a failing *generated* stream is produced separately by the C++ shrinker (`qsl-export-stream shrink`, M19). diff --git a/docs/fix_protocol.md b/docs/fix_protocol.md index d25c8ce..497d370 100644 --- a/docs/fix_protocol.md +++ b/docs/fix_protocol.md @@ -92,4 +92,6 @@ ones: out-of-range integers, and oversized messages; - signed/extreme `int64` price and `uint64` id/seq round-trips. -The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text. +The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text; +the UBSan half now aborts on the first violation (`-fno-sanitize-recover=undefined`, #142), so a +UBSan defect in the parser fails the build rather than being silently recovered. diff --git a/docs/pool_backed_storage.md b/docs/pool_backed_storage.md index 6fa6b7e..98b2e5a 100644 --- a/docs/pool_backed_storage.md +++ b/docs/pool_backed_storage.md @@ -215,28 +215,28 @@ produced the earlier "intrusive is ~4-5x slower" ranking. 
This artifact moves engine construction, the registration prefix, and the end-of-run snapshot readout outside the timed interval (`Source digest: -sha256:b606452b1bbff3d1c4eed8f59839701590cfbc824207f7b707c03ca66766353a`, `Dirty inputs: no`), so -each row reflects per-command work. The corrected medians are: +sha256:c1e4cd7db8472a87cbd23ece3a2d4b330f78ad876b58da412e0e54f6c4eb4cf7`, `Dirty inputs: no`), so +each row reflects per-command work. The medians are: | Workload | Shape summary | Median ns/timed-command, fastest to slowest | | --- | --- | --- | -| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 93.2, intrusive 95.4, baseline 111.0, PMR 121.4 | -| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 66.0, contiguous 70.7, PMR 88.3, baseline 96.4 | -| Sparse wide flow | 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Intrusive 48.2, contiguous 60.9, PMR 72.1, baseline 81.0 | -| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 42.8, intrusive 44.3, baseline 59.7, PMR 59.8 | -| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 69.9, intrusive 87.2, baseline 109.3, PMR 117.9 | +| General generated flow | 4 symbols, 5000 timed cmds, 2238 trades, 793 cancels, 690 modifies, max 41 active levels, width 67, density 0.076 | Contiguous 71.2, intrusive 80.3, baseline 89.4, PMR 100.0 | +| Dense bounded flow | 2 symbols, 5002 timed cmds, 1048 trades, 984 market orders, 20008 probes, max 80 active levels, width 136, density 0.147 | Intrusive 52.2, contiguous 57.8, baseline 66.1, PMR 66.3 | +| Sparse wide flow 
| 4 symbols, 5000 timed cmds, no trades, 828 cancels, 828 modifies, max 32 active levels, width 985, density 0.004 | Contiguous 40.3, intrusive 42.8, baseline 55.7, PMR 57.9 | +| Cancel/modify-heavy flow | 3 symbols, 5001 timed cmds, no trades, 1599 cancels, 1603 modifies, max 60 active levels, width 30, density 0.333 | Contiguous 31.4, intrusive 36.7, baseline 49.0, PMR 54.7 | +| Match/traversal-heavy flow | 1 symbol, 5003 timed cmds, 4012 trades, 494 market orders, max 60 active levels, width 81, density 0.370 | Contiguous 56.8, intrusive 65.1, baseline 96.5, PMR 110.0 | -### What the corrected numbers show +### What the numbers show -With per-run setup excluded the four modes cluster into a much tighter band (roughly 40-120 +With per-run setup excluded the four modes cluster into a much tighter band (roughly 30-110 ns/command) instead of the earlier 40-486 spread, and the large earlier gaps are explained by -per-run pool initialization rather than per-command cost. Intrusive and contiguous storage are the -two fastest modes and trade the lead by workload shape: intrusive leads the insert/resting-heavy -dense and sparse flows, contiguous leads the cancel/modify and traversal-heavy flows, and they are -within a few ns/command on the general flow. Baseline `std::map`/`std::list` and PMR pooling sit -behind both, with PMR sometimes ahead of baseline and sometimes behind. The medians above come from -a quiet-host regeneration whose min/max ranges are tight; treat absolute values as environment- and -build-dependent. +per-run pool initialization rather than per-command cost. Contiguous storage is fastest on four of +the five workloads (general, sparse, cancel/modify, match/traversal); the intrusive pool leads only +the dense bounded flow and is close behind contiguous elsewhere. Baseline `std::map`/`std::list` and +PMR pooling sit behind both, with baseline usually ahead of PMR. 
The medians above come from a +regeneration whose per-mode min/max ranges are tight; treat absolute values as environment- and +build-dependent, and note these post-v0.2.2 baseline rows already include the `try_emplace` (#138) +and index load-factor (#145) wins. This does not make the intrusive pool "free". It pays a large fixed initialization cost (pre-allocating 65536 order and node slots per book) that this per-command metric deliberately diff --git a/docs/recruiting_notes.md b/docs/recruiting_notes.md index 91ebdbe..fa42665 100644 --- a/docs/recruiting_notes.md +++ b/docs/recruiting_notes.md @@ -45,8 +45,9 @@ ## Résumé bullets — Linux Engineering (conservative) - Implemented TCP order-gateway transports and a UDP market-data feed on POSIX sockets - (loopback), with bounded receive timeouts, sequence-gap detection, threaded portable serving, - epoll-based Linux serving, and disconnect-on-malformed-framing. + (loopback), with bounded receive timeouts, sequence-gap detection, UDP send-error counting, + threaded portable serving with a connection cap and accept-error/fd-exhaustion survival, + epoll-based Linux serving, `EINTR`-retry on read/write, and disconnect-on-malformed-framing. - Built CLI tools for append-only-log inspection and deterministic replay, plus a demo script that orchestrates a loopback gateway round-trip with port-readiness polling and clean process teardown. diff --git a/docs/release_readiness.md b/docs/release_readiness.md index 2fa1637..e148c63 100644 --- a/docs/release_readiness.md +++ b/docs/release_readiness.md @@ -2,17 +2,22 @@ A pre-release pass verifying the repo builds, demos, reproduces, and reads honestly. This audit covers **M0–M49, the v0.2.0 evidence refresh** (bare-metal Linux artifact regeneration and the -documentation/staleness sweep), **and the v0.2.1 content** (the FIX-like text protocol adapter #29, -the perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency -sweep). 
It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human after -squash-merge. +documentation/staleness sweep), **the v0.2.1 content** (the FIX-like text protocol adapter #29, the +perf call-graph flamegraph + `make flamegraph` #32, and a Codex resume-anchor/PMU consistency sweep), +**and the post-v0.2.1 hardening + perf wave being cut as v0.2.2** (#135–#146): out-of-domain enum +rejection in the decoders (#136), network-path hardening — EINTR retry, accept fairness, connection +cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion handling (#137/#140/#143), +CLI argument validation (#141), a real UBSan abort gate (#142), OCaml `diff_report` robustness (#144), +and two measured order-book perf wins — `try_emplace` (~+5%, #138) and an index load-factor cap +(~+18.6%, #145). It supersedes the v0.1.0-era audit; the actual GitHub release is cut by a human +after squash-merge. ## Verification (this session, bare-metal Apple M2 / aarch64 / GCC 16.1.1, Fedora Asahi Remix) | Check | Result | |---|---| -| `make check` | 263/263 tests pass, no warnings (incl. the v0.2.1 FIX-adapter + flamegraph-renderer tests) | -| `make asan` (ASan + UBSan) | 263/263, sanitizer-clean (the FIX text parser handles untrusted input) | +| `make check` | 270/270 tests pass, no warnings (incl. the FIX-adapter, flamegraph-renderer, decoder enum-rejection, and CLI-arg-validation tests) | +| `make asan` (ASan + UBSan) | 270/270, sanitizer-clean; the UBSan gate now **aborts** on the first violation (`-fno-sanitize-recover=undefined`, #142), so pure-UBSan defects no longer pass green, and the tree is clean under it | | `make tsan` (ThreadSanitizer) | 20/20 concurrency-labelled tests, race-clean | | `make check-fixtures` | committed differential fixtures match current C++ output | | `make check-manifest` | provenance manifest matches the committed fixtures | @@ -88,7 +93,9 @@ verification. ## Outcome -Release-ready as a portfolio artifact. 
The next GitHub-only release is `v0.2.1` (the FIX-like text -protocol adapter #29, the perf flamegraph #32, and a Codex resume-anchor/PMU consistency sweep) on -top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal evidence refresh); it +Release-ready as a portfolio artifact. `v0.2.1` is already tagged (FIX adapter #29, perf flamegraph +#32, anchor sweep) on top of `v0.2.0` (Phase III/IV systems work — M24–M49 — plus the bare-metal +evidence refresh). The next GitHub-only release is **`v0.2.2`**, bundling the post-v0.2.1 +hardening + perf wave merged to `main` (#135–#146): decoder enum rejection, network/CLI hardening, a +real UBSan abort gate, OCaml diff_report robustness, and the two measured order-book perf wins. It requires explicit human approval and a squash-merge before tagging. diff --git a/docs/replay_and_recovery.md b/docs/replay_and_recovery.md index 0c48f07..2f1caef 100644 --- a/docs/replay_and_recovery.md +++ b/docs/replay_and_recovery.md @@ -163,6 +163,8 @@ measurements, not a production recovery-time claim. not read back from the log (the log could also store events, but the engine is the source of truth for replay equivalence). - The reader loads the whole log into memory before replaying (adequate for the simulator). -- Commands are trusted once their record checksum validates (M7); the command codec does not - re-validate enum domains — wire-level enum validation lives at the protocol boundary (M2) - and risk checks at the gateway (M5). +- Commands are trusted once their record checksum validates (M7). The command codec also rejects + out-of-domain enum bytes: `replay::decode_command` refuses a `NewLimit`/`NewMarket` record whose + `Side` or `TimeInForce` byte is not a defined enum value (#136), so a corrupt log record cannot + apply garbage straight to the engine. Higher-level validation still lives at the protocol boundary + (M2) and the risk gateway (M5). 
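The enum-domain rejection described for `replay::decode_command` can be sketched as follows. This is a minimal illustration, not the repository's code: the enumerator byte values, the `decode_enum` helper, the `NewLimitFields` struct, and `decode_new_limit_enums` are all hypothetical names introduced here, and only the idea of an `is_valid` domain check comes from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical enums: the real definitions and byte values may differ.
enum class Side : std::uint8_t { Buy = 0, Sell = 1 };
enum class TimeInForce : std::uint8_t { GTC = 0, IOC = 1 };

// Domain checks in the spirit of core::is_valid (assumed shape).
constexpr bool is_valid(Side s) { return s == Side::Buy || s == Side::Sell; }
constexpr bool is_valid(TimeInForce t) {
    return t == TimeInForce::GTC || t == TimeInForce::IOC;
}

// Convert a raw byte to an enum only if it names a defined enumerator.
// Casting first is well-defined because the underlying type is fixed.
template <typename E>
std::optional<E> decode_enum(std::uint8_t raw) {
    const E candidate = static_cast<E>(raw);
    if (!is_valid(candidate)) return std::nullopt;
    return candidate;
}

// Hypothetical slice of a NewLimit record: just the two enum fields.
struct NewLimitFields {
    Side side;
    TimeInForce tif;
};

// Refuse the whole record if either enum byte is out of domain, so a
// corrupt log record is treated like a malformed frame.
std::optional<NewLimitFields> decode_new_limit_enums(std::uint8_t side_byte,
                                                     std::uint8_t tif_byte) {
    const auto side = decode_enum<Side>(side_byte);
    const auto tif = decode_enum<TimeInForce>(tif_byte);
    if (!side || !tif) return std::nullopt;
    return NewLimitFields{*side, *tif};
}
```

Returning `std::optional` keeps the rejection explicit at the call site; a decoder that reports a specific error code (such as the `InvalidEnumValue` mentioned for the protocol decoders) could use the same check with a different return type.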
diff --git a/docs/socket_gateway.md b/docs/socket_gateway.md index ad6bb36..ef914d2 100644 --- a/docs/socket_gateway.md +++ b/docs/socket_gateway.md @@ -38,7 +38,9 @@ The portable `TcpServer` writes responses with a send-all loop that tolerates pa The Linux `EpollServer` keeps a per-client outbound buffer and leaves the connection registered for `EPOLLOUT` until all pending response bytes are accepted by the kernel. Both write paths use `send(..., MSG_NOSIGNAL)` where available, and the platform socket option where available, so a -client that drops before reading a response cannot terminate the gateway through `SIGPIPE`. +client that drops before reading a response cannot terminate the gateway through `SIGPIPE`. Both the +read and write paths retry on `EINTR` — a signal interruption is treated as retryable, not a +disconnect. The epoll path treats `EAGAIN` / `EWOULDBLOCK` as normal nonblocking backpressure: @@ -93,7 +95,12 @@ induces an over-cap response is disconnected. The default demo uses `TcpServer` because it is portable and easiest to inspect. The accept loop spawns one worker per accepted connection, so a slow or still-open client no longer prevents the -server from accepting a later client. The shared `OrderGateway` remains protected by an internal +server from accepting a later client. A connection cap (`TcpServerOptions::max_active_connections`, +default `0` = unbounded) load-sheds — a freshly accepted connection at the cap is closed immediately +rather than spawning another worker. The accept loop also survives transient `accept()` errors +(`EINTR`/`ECONNABORTED`, retried) and file-descriptor exhaustion (`EMFILE`/`ENFILE`, a brief back-off +retry) instead of tearing the listener down; the `EpollServer` handles the same conditions by +disarming and re-arming the listener. The shared `OrderGateway` remains protected by an internal mutex; network I/O can overlap across clients, but matching-engine mutation stays serialized and deterministic. 
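The `EINTR` retry and accept-error handling described above can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the gateway's implementation: `read_retry_eintr`, `AcceptAction`, and `classify_accept_errno` are hypothetical names, the sketch uses plain `read()` rather than the gateway's actual receive call, and treating every other `errno` as fatal is an assumption made only for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Read up to len bytes, retrying when a signal interrupts the call.
// Any other result (data, EOF, or a different error) is returned as-is.
ssize_t read_retry_eintr(int fd, void* buf, std::size_t len) {
    for (;;) {
        const ssize_t n = ::read(fd, buf, len);
        if (n >= 0 || errno != EINTR) return n;
    }
}

// What an accept loop might do with a failed accept() (hypothetical).
enum class AcceptAction { RetryNow, BackOffThenRetry, Fatal };

// Map an errno from accept() to an action: transient errors are retried,
// fd exhaustion backs off first, and the listener is kept alive in both.
AcceptAction classify_accept_errno(int err) {
    switch (err) {
        case EINTR:
        case ECONNABORTED:
            return AcceptAction::RetryNow;
        case EMFILE:
        case ENFILE:
            return AcceptAction::BackOffThenRetry;
        default:
            return AcceptAction::Fatal;  // assumption for this sketch only
    }
}
```

Separating classification from the loop keeps the policy testable without sockets; a threaded acceptor could sleep briefly on `BackOffThenRetry`, while an epoll loop could instead disarm and later re-arm the listener.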
diff --git a/docs/socket_hardening.md b/docs/socket_hardening.md index 236786b..c806f5f 100644 --- a/docs/socket_hardening.md +++ b/docs/socket_hardening.md @@ -18,6 +18,11 @@ service.** Nothing here claims a production-networking stack. | Peer disconnect mid-write | `send(MSG_NOSIGNAL)` / `SO_NOSIGPIPE` so `SIGPIPE` can't kill the process | `Session` | | Indefinite blocking recv | Bounded `SO_RCVTIMEO` on the UDP client | `udp_feed` | | UDP burst loss | Detected via sequence gaps; receive-buffer sizing knob (below) | `udp_feed` | +| UDP transmit failure | Counted, not silently dropped (`UdpPublisher::send_failures()`) | `udp_feed` | +| Signal during read/write | `EINTR` retried (not treated as a disconnect) | `TcpServer`/`EpollServer` | +| Transient accept error | `EINTR`/`ECONNABORTED` retried; listener kept alive | `TcpServer`/`EpollServer` | +| FD exhaustion | `EMFILE`/`ENFILE` survived (back-off retry / listener disarm-rearm), not a teardown | `TcpServer`/`EpollServer` | +| Connection-count overload | Optional cap (`max_active_connections`) load-sheds at the cap | `TcpServer` | The first five rows pre-date M30 (M9/M10); M30 adds the receive-buffer sizing knob and documents the loss model and the things deliberately left out. @@ -75,9 +80,12 @@ stated plainly so the gap counter is not mistaken for reliability. bottleneck here. No `io_uring` code exists; none is claimed. - **TLS / authentication / authorization.** None. The services are loopback-only demos. Do not expose them on a routable interface (see `SECURITY.md`). -- **Idle-peer timeouts, connection caps, rate limiting.** Not implemented. Heartbeats are a - liveness round-trip only; the gateway does not yet time out idle peers. These are reasonable - future hardening steps, explicitly not done today. 
+- **Connection caps.** Implemented as an opt-in `TcpServer` knob (`max_active_connections`, default + `0` = unbounded): at the cap a freshly accepted connection is closed (load-shed) rather than + spawning another worker. See the posture table above. +- **Idle-peer timeouts, rate limiting.** Not implemented. Heartbeats are a liveness round-trip only; + the gateway does not yet time out idle peers. These are reasonable future hardening steps, + explicitly not done today. - **`SO_REUSEADDR` / rapid rebind.** Not set; the profiling scripts dodge `TIME_WAIT` by using separate ports per pass instead of forcing address reuse. diff --git a/results/README.md b/results/README.md index 0f8b7aa..8515411 100644 --- a/results/README.md +++ b/results/README.md @@ -35,6 +35,13 @@ Benchmark results produced by `make bench` and scripts under `scripts/`. - `false_sharing_study.txt` — benchmark-only packed-vs-padded SPSC queue-cursor contention study (`make false-sharing-study`). It is research-note evidence about cache-line sharing shape, not a production throughput or latency claim. +- `socket_load_summary.txt` — Linux multi-client TCP connection-scaling load experiment + (`make socket-load`, `scripts/socket_load.sh`): N concurrent `qsl-client`s against the threaded and + epoll gateways. Constrained loopback connection-setup shape only, not a production-capacity claim. +- `socket_profile_loopback.txt` — Linux syscall/socket-path profiling of the gateway I/O path + (`make profile-io`, `scripts/profile_gateway_io.sh`). Loopback, constrained evidence. +- `socket_stress_summary.txt` — UDP socket-buffer / burst-loss experiment (`make socket-stress`): + receive-buffer sizing vs observed sequence-gap loss on loopback. Research-note evidence only. - `crash_recovery_validation.txt` — M45 SIGKILL crash / torn-tail recovery validation for the append-only event log across durability modes (`make crash-recovery`). 
It is process-kill evidence only: it validates crash-mid-append recovery and acknowledged-record retention across diff --git a/results/allocator_experiment.txt b/results/allocator_experiment.txt index 3824cb3..653ba40 100644 --- a/results/allocator_experiment.txt +++ b/results/allocator_experiment.txt @@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:4f09cf6b1db08de00d5fb480b77d0b1fe7ebb9ea70dbcc7d73807c7eb06e4598 +Git commit (informational): f9f7e98 +Source digest: sha256:e5fb637e109ffba8b25ab7a5274d325ea8edbbbf13aec8d88d1a486cdb1cc168 Source digest scope: allocator-experiment Dirty inputs: no Generated output: results/allocator_experiment.txt -Date: 2026-06-21T05:25:21Z +Date: 2026-06-25T02:29:37Z Dataset: engine::Order allocation microbenchmark (new/delete vs fixed pool) Warmup: iters/10 per benchmark, before timing Units: latency = ns/op + ops/sec @@ -19,6 +19,6 @@ This measures allocator mechanics for order-like objects, not end-to-end engine hardware/compiler/build dependent. A negative or tiny delta is acceptable. 
Scenario / Metric / Result: -order new/delete 500000 ops 14.4 ns/op 69407890 ops/sec -order pool acquire/release 500000 ops 7.0 ns/op 142345350 ops/sec -order pool burst cycle 2000 ops 7970.4 ns/op 125464 ops/sec +order new/delete 500000 ops 12.4 ns/op 80810144 ops/sec +order pool acquire/release 500000 ops 7.0 ns/op 142468570 ops/sec +order pool burst cycle 2000 ops 7368.3 ns/op 135716 ops/sec diff --git a/results/differential.txt b/results/differential.txt index 9cd907e..7e9495e 100644 --- a/results/differential.txt +++ b/results/differential.txt @@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:3fe4614b9c004642e244fafaf8d01905ed2dd92ca843bbd579e33e66f5e23836 +Git commit (informational): f9f7e98 +Source digest: sha256:736ee67ee7bfbbac0b8c45c5d2a0805b9bf19a664fa44e8ec650b38a9d46a90f Source digest scope: differential-benchmark-suite Dirty inputs: no Generated output: results/differential.txt -Date: 2026-06-21T05:25:21Z +Date: 2026-06-25T02:29:36Z Dataset: property command streams (generate_property_flow, 3 symbols, 120 orders) Warmup: iters/10 (or 1 throughput pass) per benchmark, before timing Units: latency = ns/op + ops/sec; throughput = ns/item + items/sec @@ -19,6 +19,6 @@ measure the differential-testing harness (generation, gateway replay, shrinking) production throughput; hardware/compiler/build dependent. 
Scenario / Metric / Result: -property flow generation 123 items 58.0 ns/item 17228399 items/sec -differential gateway replay 123 items 62.2 ns/item 16071425 items/sec -shrink property flow 300 ops 31175.3 ns/op 32077 ops/sec +property flow generation 123 items 108.7 ns/item 9200981 items/sec +differential gateway replay 123 items 113.8 ns/item 8789365 items/sec +shrink property flow 300 ops 59593.3 ns/op 16780 ops/sec diff --git a/results/dpdk_environment.txt b/results/dpdk_environment.txt index a6aa264..1e3abf8 100644 --- a/results/dpdk_environment.txt +++ b/results/dpdk_environment.txt @@ -8,12 +8,12 @@ CPU: Avalanche-M2 Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: n/a Provenance version: 1 -Git commit (informational): 8ab07b5 -Source digest: sha256:a15964c68f12b761a2e60e164dd8dfdc1f56d9fdc896cfe4f867ec5d22b3c8d0 +Git commit (informational): 081e1ec +Source digest: sha256:ab13cbebe013b05626085319748c5fb9e6d51383be00d78ed7337542d99d67c0 Source digest scope: dpdk-environment-check Dirty inputs: no Generated output: results/dpdk_environment.txt -Date: 2026-06-21T05:43:48Z +Date: 2026-06-25T02:37:40Z pkg-config: /usr/bin/pkg-config libdpdk pkg-config status: not-available libdpdk version: not-found diff --git a/results/false_sharing_study.txt b/results/false_sharing_study.txt index 61bc6d4..461357e 100644 --- a/results/false_sharing_study.txt +++ b/results/false_sharing_study.txt @@ -6,12 +6,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:4c2b0de72788bd80e4877d9818693a37629ca2decf69260e33c4c3b0c3603c74 +Git commit (informational): f9f7e98 +Source digest: sha256:f8a12fc427a06c5795c555e77a0fca711876ea756923af1922590fee436ab5c2 Source digest scope: false-sharing-study Dirty inputs: no Generated output: results/false_sharing_study.txt -Date: 2026-06-21T05:25:24Z +Date: 2026-06-25T02:29:38Z 
Dataset: synthetic SPSC cursor exchange (producer tail / consumer head) Host support summary: portable two-thread C++ benchmark; no PMU counters required. @@ -29,5 +29,5 @@ index with acquire. Benchmark-only control: the padded layout puts each index on 128-byte boundary, so the two cursors sit on distinct coherency lines even on hosts with 128-byte cache lines (Apple Silicon); the production SpscRing pads to 64 bytes. -packed indices 4000000 cursor updates 2.9 ns/update 340900588 updates/sec checksum=4000000061052 -padded indices 4000000 cursor updates 29.6 ns/update 33747427 updates/sec checksum=4000002007457 +packed indices 4000000 cursor updates 2.8 ns/update 359899203 updates/sec checksum=4000000021261 +padded indices 4000000 cursor updates 27.2 ns/update 36771064 updates/sec checksum=4000002008455 diff --git a/results/latest.txt b/results/latest.txt index 5e68f2a..7c24ef2 100644 --- a/results/latest.txt +++ b/results/latest.txt @@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:1271610ad6c9c96534b350239f0341fe413ce0b989f3aca3434b2f4652395b64 +Git commit (informational): 114445a +Source digest: sha256:c54df82614b53ea736e845a5893096a9ee12b65a8cee49be28b9c53a25d5a9df Source digest scope: core-benchmark-suite Dirty inputs: no Generated output: results/latest.txt -Date: 2026-06-21T05:25:21Z +Date: 2026-06-25T02:36:30Z Dataset: synthetic order flow (replay::generate_flow, seed 42, 4 symbols) Warmup: iters/10 (or 1 throughput pass) per benchmark, before timing Units: latency = ns/op + ops/sec; throughput = ns/item + items/sec @@ -19,8 +19,8 @@ no kernel/IO path, stock allocator). NOT production exchange throughput or end-to-end latency; hardware/compiler/build dependent. 
Scenario / Metric / Result: -order_book add/mod/cancel 200000 ops 87.3 ns/op 11458608 ops/sec -protocol encode+decode 500000 ops 15.9 ns/op 62727387 ops/sec -gateway session (fill) 200000 ops 109.7 ns/op 9115527 ops/sec -matching engine flow 5004 items 98.2 ns/item 10181380 items/sec -replay command log 5004 items 110.4 ns/item 9059370 items/sec +order_book add/mod/cancel 200000 ops 90.6 ns/op 11043024 ops/sec +protocol encode+decode 500000 ops 16.1 ns/op 62049736 ops/sec +gateway session (fill) 200000 ops 102.3 ns/op 9776174 ops/sec +matching engine flow 5004 items 91.4 ns/item 10939533 items/sec +replay command log 5004 items 101.3 ns/item 9874313 items/sec diff --git a/results/nic_offload_environment.txt b/results/nic_offload_environment.txt index 109125b..14c7f86 100644 --- a/results/nic_offload_environment.txt +++ b/results/nic_offload_environment.txt @@ -1,4 +1,4 @@ -Command: QSL_NIC_DEVICES=wld0 make nic-offload-check +Command: make nic-offload-check Artifact: NIC offload and timestamping capability check (non-mutating) Evidence class: linux-readonly-capability-observation Host support summary: Linux host with read-only NIC capability inspection; no settings changed and no packet measurement ran @@ -8,24 +8,24 @@ CPU: Avalanche-M2 Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: n/a Provenance version: 1 -Git commit (informational): 8ab07b5 -Source digest: sha256:904cbf8c83cd9ee0107e11be380c53f9509b28b2df2f96a2db8b54b323091a30 +Git commit (informational): 081e1ec +Source digest: sha256:088a8ba85f514bc6264b43b5754e95f6a8f2e0a239936492f50f800031f8f782 Source digest scope: nic-offload-environment-check Dirty inputs: no Generated output: results/nic_offload_environment.txt -Date: 2026-06-21T05:43:48Z +Date: 2026-06-25T02:37:40Z ethtool: /usr/bin/ethtool ip: /usr/bin/ip lspci: /usr/bin/lspci phc_ctl: not-found ptp4l: not-found -Requested Linux devices: wld0 +Requested Linux devices: docker0 tailscale0 wld0 Missing requested devices: none -Linux 
devices inspected: wld0 -Device count: 1 +Linux devices inspected: docker0 tailscale0 wld0 +Device count: 3 Offload feature list visible: yes RSS indirection/hash visible: no -Queue/channel info visible: no +Queue/channel info visible: yes Hardware timestamping visible: no Offload settings changed: no RSS settings changed: no @@ -38,6 +38,232 @@ Caveat: This artifact records read-only host and NIC capability context. It does not change offload flags, queue counts, RSS tables, timestamp filters, drivers, or interrupt affinity, and it does not support any NIC-offload or latency claim. +== device docker0 summary == +operstate: down +mtu: 1500 +driver: n/a +pci: n/a +rx queues: 1 +tx queues: 1 + +== ip -details link show dev docker0 == +4: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default + link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 netns-immutable + bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.xx:xx:xx:xx:xx:xx designated_root 8000.xx:xx:xx:xx:xx:xx root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 0.00 fdb_n_learned 0 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address xx:xx:xx:xx:xx:xx mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mdb_offload_fail_notification 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 
nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 + +== ethtool -i docker0 == +driver: bridge +version: 2.3 +firmware-version: N/A +expansion-rom-version: +bus-info: N/A +supports-statistics: no +supports-test: no +supports-eeprom-access: no +supports-register-dump: no +supports-priv-flags: no + +== ethtool -k docker0 == +Features for docker0: +rx-checksumming: off [fixed] +tx-checksumming: on + tx-checksum-ipv4: off [fixed] + tx-checksum-ip-generic: on + tx-checksum-ipv6: off [fixed] + tx-checksum-fcoe-crc: off [fixed] + tx-checksum-sctp: off [fixed] +scatter-gather: on + tx-scatter-gather: on + tx-scatter-gather-fraglist: on +tcp-segmentation-offload: on + tx-tcp-segmentation: on + tx-tcp-ecn-segmentation: on + tx-tcp-mangleid-segmentation: on + tx-tcp6-segmentation: on + tx-tcp-accecn-segmentation: on +generic-segmentation-offload: on +generic-receive-offload: on +large-receive-offload: off [fixed] +rx-vlan-offload: off [fixed] +tx-vlan-offload: on +ntuple-filters: off [fixed] +receive-hashing: off [fixed] +highdma: on +rx-vlan-filter: off [fixed] +vlan-challenged: off [fixed] +tx-gso-robust: on +tx-fcoe-segmentation: on +tx-gre-segmentation: on +tx-gre-csum-segmentation: on +tx-ipxip4-segmentation: on +tx-ipxip6-segmentation: on +tx-udp_tnl-segmentation: on +tx-udp_tnl-csum-segmentation: on +tx-gso-partial: on +tx-tunnel-remcsum-segmentation: on +tx-sctp-segmentation: on +tx-esp-segmentation: on +tx-udp-segmentation: on +tx-gso-list: on +tx-nocache-copy: off +loopback: off [fixed] +rx-fcs: off [fixed] +rx-all: off [fixed] +tx-vlan-stag-hw-insert: on +rx-vlan-stag-hw-parse: off [fixed] +rx-vlan-stag-filter: off [fixed] +l2-fwd-offload: off [fixed] +hw-tc-offload: off [fixed] +esp-hw-offload: off [fixed] +esp-tx-csum-hw-offload: off [fixed] +rx-udp_tunnel-port-offload: off [fixed] +tls-hw-tx-offload: off 
[fixed] +tls-hw-rx-offload: off [fixed] +rx-gro-hw: off [fixed] +tls-hw-record: off [fixed] +rx-gro-list: off +macsec-hw-offload: off [fixed] +rx-udp-gro-forwarding: off +hsr-tag-ins-offload: off [fixed] +hsr-tag-rm-offload: off [fixed] +hsr-fwd-offload: off [fixed] +hsr-dup-offload: off [fixed] + +== ethtool -l docker0 == +netlink error: Operation not supported +command failed: /usr/bin/ethtool -l docker0 + +== ethtool -x docker0 == +netlink error: Operation not supported +command failed: /usr/bin/ethtool -x docker0 + +== ethtool -T docker0 == +Time stamping parameters for docker0: +Capabilities: + software-receive + software-system-clock +PTP Hardware Clock: none +Hardware Transmit Timestamp Modes: none +Hardware Receive Filter Modes: none + +== device tailscale0 summary == +operstate: unknown +mtu: 1280 +driver: n/a +pci: n/a +rx queues: 1 +tx queues: 1 + +== ip -details link show dev tailscale0 == +2: tailscale0: mtu 1280 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 500 + link/none promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 + tun type tun pi off vnet_hdr on persist off addrgenmode random numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 + +== ethtool -i tailscale0 == +driver: tun +version: 1.6 +firmware-version: +expansion-rom-version: +bus-info: tun +supports-statistics: no +supports-test: no +supports-eeprom-access: no +supports-register-dump: no +supports-priv-flags: no + +== ethtool -k tailscale0 == +Features for tailscale0: +rx-checksumming: off [fixed] +tx-checksumming: on + tx-checksum-ipv4: off [fixed] + tx-checksum-ip-generic: on + tx-checksum-ipv6: off [fixed] + tx-checksum-fcoe-crc: off [fixed] + tx-checksum-sctp: off [fixed] +scatter-gather: on + tx-scatter-gather: on + tx-scatter-gather-fraglist: on +tcp-segmentation-offload: on + tx-tcp-segmentation: on + tx-tcp-ecn-segmentation: off + 
tx-tcp-mangleid-segmentation: off + tx-tcp6-segmentation: on + tx-tcp-accecn-segmentation: off [fixed] +generic-segmentation-offload: on +generic-receive-offload: on +large-receive-offload: off [fixed] +rx-vlan-offload: off [fixed] +tx-vlan-offload: on +ntuple-filters: off [fixed] +receive-hashing: off [fixed] +highdma: off [fixed] +rx-vlan-filter: off [fixed] +vlan-challenged: off [fixed] +tx-gso-robust: off [fixed] +tx-fcoe-segmentation: off [fixed] +tx-gre-segmentation: off [fixed] +tx-gre-csum-segmentation: off [fixed] +tx-ipxip4-segmentation: off [fixed] +tx-ipxip6-segmentation: off [fixed] +tx-udp_tnl-segmentation: off +tx-udp_tnl-csum-segmentation: off +tx-gso-partial: off [fixed] +tx-tunnel-remcsum-segmentation: off [fixed] +tx-sctp-segmentation: off [fixed] +tx-esp-segmentation: off [fixed] +tx-udp-segmentation: on +tx-gso-list: off [fixed] +tx-nocache-copy: off +loopback: off [fixed] +rx-fcs: off [fixed] +rx-all: off [fixed] +tx-vlan-stag-hw-insert: on +rx-vlan-stag-hw-parse: off [fixed] +rx-vlan-stag-filter: off [fixed] +l2-fwd-offload: off [fixed] +hw-tc-offload: off [fixed] +esp-hw-offload: off [fixed] +esp-tx-csum-hw-offload: off [fixed] +rx-udp_tunnel-port-offload: off [fixed] +tls-hw-tx-offload: off [fixed] +tls-hw-rx-offload: off [fixed] +rx-gro-hw: off [fixed] +tls-hw-record: off [fixed] +rx-gro-list: off +macsec-hw-offload: off [fixed] +rx-udp-gro-forwarding: off +hsr-tag-ins-offload: off [fixed] +hsr-tag-rm-offload: off [fixed] +hsr-fwd-offload: off [fixed] +hsr-dup-offload: off [fixed] + +== ethtool -l tailscale0 == +Channel parameters for tailscale0: +Pre-set maximums: +RX: n/a +TX: n/a +Other: n/a +Combined: 1 +Current hardware settings: +RX: n/a +TX: n/a +Other: n/a +Combined: 1 + +== ethtool -x tailscale0 == +netlink error: Operation not supported +command failed: /usr/bin/ethtool -x tailscale0 + +== ethtool -T tailscale0 == +Time stamping parameters for tailscale0: +Capabilities: + software-transmit + software-receive + 
software-system-clock +PTP Hardware Clock: none +Hardware Transmit Timestamp Modes: none +Hardware Receive Filter Modes: none + == device wld0 summary == operstate: up mtu: 1500 @@ -50,7 +276,7 @@ tx queues: 1 01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4387 802.11ax Dual Band Wireless LAN Controller (rev 07) == ip -details link show dev wld0 == -2: wld0: mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000 +3: wld0: mtu 1500 qdisc fq_codel state UP mode DORMANT group default qlen 1000 link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff permaddr xx:xx:xx:xx:xx:xx promiscuity 0 allmulti 0 minmtu 68 maxmtu 1500 netns-immutable addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 parentbus pci parentdev 0000:01:00.0 altname wlp1s0f0 altname wlxxxxxxxxxxxxx diff --git a/results/numa_affinity_study.txt b/results/numa_affinity_study.txt index f95a440..5b4c8b5 100644 --- a/results/numa_affinity_study.txt +++ b/results/numa_affinity_study.txt @@ -1,4 +1,4 @@ -Command: QSL_NUMA_ALLOW_CONSTRAINED=1 QSL_NUMA_BIN=build/bench/qsl-bench make numa-study +Command: QSL_NUMA_BIN=build/bench/qsl-bench make numa-study Evidence class: linux-constrained Host support summary: Linux host, constrained evidence Hardware: aarch64 @@ -7,12 +7,12 @@ CPU: Avalanche-M2 Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:ff7a0a6b696ef700cd7bb568a531cf2e06ea16932d1031a8cfe85be6e0d21b91 +Git commit (informational): f9f7e98 +Source digest: sha256:0b9e8373fa304d7e734399a92e3b7f8bc8f4c6ee538621ad53cc35b443c67909 Source digest scope: numa-affinity-study Dirty inputs: no Generated output: results/numa_affinity_study.txt -Date: 2026-06-21T05:25:24Z +Date: 2026-06-25T02:30:17Z Benchmark binary: build/bench/qsl-bench Allowed CPUs: 0-7 
CPU chosen: 0 @@ -50,18 +50,18 @@ Pinned command: taskset -c 0 build/bench/qsl-bench Unpinned benchmark output: -order_book add/mod/cancel 200000 ops 138.0 ns/op 7248959 ops/sec -protocol encode+decode 500000 ops 20.9 ns/op 47914709 ops/sec -gateway session (fill) 200000 ops 128.2 ns/op 7800869 ops/sec -matching engine flow 5004 items 101.7 ns/item 9834865 items/sec -replay command log 5004 items 113.7 ns/item 8798241 items/sec +order_book add/mod/cancel 200000 ops 113.5 ns/op 8807360 ops/sec +protocol encode+decode 500000 ops 20.0 ns/op 50088843 ops/sec +gateway session (fill) 200000 ops 115.6 ns/op 8652239 ops/sec +matching engine flow 5004 items 93.6 ns/item 10682213 items/sec +replay command log 5004 items 101.4 ns/item 9862110 items/sec Pinned benchmark output: -order_book add/mod/cancel 200000 ops 143.3 ns/op 6976774 ops/sec -protocol encode+decode 500000 ops 27.7 ns/op 36063756 ops/sec -gateway session (fill) 200000 ops 236.9 ns/op 4220492 ops/sec -matching engine flow 5004 items 187.1 ns/item 5345523 items/sec -replay command log 5004 items 221.8 ns/item 4508370 items/sec +order_book add/mod/cancel 200000 ops 234.2 ns/op 4269507 ops/sec +protocol encode+decode 500000 ops 29.9 ns/op 33473413 ops/sec +gateway session (fill) 200000 ops 219.4 ns/op 4558316 ops/sec +matching engine flow 5004 items 168.2 ns/item 5946192 items/sec +replay command log 5004 items 187.6 ns/item 5329445 items/sec NUMA local benchmark output: NUMA node-local/remote binding skipped: fewer than two NUMA nodes found @@ -76,10 +76,10 @@ Unpinned perf stat output: 0 context-switches:u 0 cpu-migrations:u - 0.084315551 seconds time elapsed + 0.095650252 seconds time elapsed - 0.084129000 seconds user - 0.000000000 seconds sys + 0.094408000 seconds user + 0.000983000 seconds sys @@ -90,10 +90,10 @@ Pinned perf stat output: 0 context-switches:u 0 cpu-migrations:u - 0.154719226 seconds time elapsed + 0.144623525 seconds time elapsed - 0.152299000 seconds user - 0.001988000 seconds sys + 
0.141344000 seconds user + 0.002985000 seconds sys @@ -111,7 +111,7 @@ Core(s) per socket: 4 Socket(s): 1 Stepping: 0x1 Frequency boost: disabled -CPU(s) scaling MHz: 53% +CPU(s) scaling MHz: 100% CPU max MHz: 2424.0000 CPU min MHz: 600.0000 BogoMIPS: 48.00 @@ -156,7 +156,7 @@ numactl --hardware output: available: 1 nodes (0) node 0 cpus: 0 1 2 3 4 5 6 7 node 0 size: 7481 MB -node 0 free: 1620 MB +node 0 free: 2023 MB node distances: node 0 0: 10 diff --git a/results/perf_report_linux.txt b/results/perf_report_linux.txt index 92e4c04..3bd5be1 100644 --- a/results/perf_report_linux.txt +++ b/results/perf_report_linux.txt @@ -8,18 +8,18 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64 Perf paranoid: 2 Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:c991d51c8076952f2c3dcd5e407f78e512d9fb4e573cb7ad65f7a700a9ed37a2 +Git commit (informational): f9f7e98 +Source digest: sha256:1837aa008369e0029dd4a16e7e780bacac293688e03351b88dbb4c586fbbf34e Source digest scope: perf-record-benchmark Dirty inputs: no Generated output: results/perf_report_linux.txt -Date: 2026-06-21T05:25:24Z +Date: 2026-06-25T02:30:02Z Benchmark binary: build/bench/qsl-bench Benchmark status: 0 Dataset: qsl-bench default synthetic benchmark suite Record event: cpu-clock Sample freq: 2000 Hz -Sample count: 186 +Sample count: 188 Minimum samples for hot profile: 100 Insufficient samples: no Report limit: 1% @@ -34,22 +34,22 @@ cpu-clock event is a software sampling profile for hot-symbol investigation, not a latency or throughput measurement. 
Benchmark output: -order_book add/mod/cancel 200000 ops 141.9 ns/op 7047598 ops/sec -protocol encode+decode 500000 ops 21.3 ns/op 47032627 ops/sec -gateway session (fill) 200000 ops 129.1 ns/op 7743908 ops/sec -matching engine flow 5004 items 103.0 ns/item 9713046 items/sec -replay command log 5004 items 112.8 ns/item 8863630 items/sec +order_book add/mod/cancel 200000 ops 131.2 ns/op 7621590 ops/sec +protocol encode+decode 500000 ops 20.1 ns/op 49771400 ops/sec +gateway session (fill) 200000 ops 118.7 ns/op 8425787 ops/sec +matching engine flow 5004 items 95.1 ns/item 10520557 items/sec +replay command log 5004 items 99.8 ns/item 10021725 items/sec Benchmark output under perf: -order_book add/mod/cancel 200000 ops 112.7 ns/op 8873425 ops/sec -protocol encode+decode 500000 ops 21.0 ns/op 47551868 ops/sec -gateway session (fill) 200000 ops 127.6 ns/op 7833933 ops/sec -matching engine flow 5004 items 101.1 ns/item 9892789 items/sec -replay command log 5004 items 119.5 ns/item 8368038 items/sec +order_book add/mod/cancel 200000 ops 139.1 ns/op 7190560 ops/sec +protocol encode+decode 500000 ops 20.5 ns/op 48888534 ops/sec +gateway session (fill) 200000 ops 117.5 ns/op 8511050 ops/sec +matching engine flow 5004 items 92.2 ns/item 10847835 items/sec +replay command log 5004 items 97.9 ns/item 10213751 items/sec perf record stderr: [ perf record: Woken up 1 times to write data ] -[ perf record: Captured and wrote 0.028 MB build/perf/qsl-bench.perf.data (186 samples) ] +[ perf record: Captured and wrote 0.027 MB build/perf/qsl-bench.perf.data (188 samples) ] perf report stderr: @@ -59,15 +59,20 @@ perf report output: # # Total Lost Samples: 0 # -# Samples: 186 of event 'cpu-clock:u' -# Event count (approx.): 93000000 +# Samples: 188 of event 'cpu-clock:u' +# Event count (approx.): 94000000 # -# Overhead Symbol Shared Object IPC [IPC Coverage] -# ........ 
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ..................... .................... +# Overhead Symbol Shared Object IPC [IPC Coverage] +# ........ ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ..................... .................... # - 10.22% [.] cfree@GLIBC_2.17 libc.so.6 - - + 12.23% [.] 
cfree@GLIBC_2.17 libc.so.6 - - | - |--1.61%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + |--3.19%--main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + |--1.60%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) | decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] | qsl::engine::OrderBook::cancel(unsigned long) | main @@ -75,7 +80,7 @@ perf report output: | __libc_start_main@@GLIBC_2.34 | _start | - |--1.08%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) + |--1.06%--qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) | qsl::gateway::Session::on_bytes(std::span) @@ -84,7 +89,7 @@ perf report output: | __libc_start_main@@GLIBC_2.34 | _start | - |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0] + |--1.06%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0] | qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) @@ -94,160 +99,93 @@ perf report output: | __libc_start_main@@GLIBC_2.34 | _start | - 
|--1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) - | main - | __libc_start_call_main - | __libc_start_main@@GLIBC_2.34 - | _start - | - |--1.08%--std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) - | qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + |--1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) | - --1.08%--main + --1.06%--0x5000000402b63 __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 9.68% [.] malloc libc.so.6 - - + 6.91% [.] malloc libc.so.6 - - | - |--6.45%--operator new(unsigned long) - | | - | |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | | - | | --1.61%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span) - | | main - | | __libc_start_call_main - | | __libc_start_main@@GLIBC_2.34 - | | _start - | | - | |--1.61%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | 
qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span) - | | main - | | __libc_start_call_main - | | __libc_start_main@@GLIBC_2.34 - | | _start + |--3.72%--operator new(unsigned long) | | - | |--1.08%--qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0] - | | qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) - | | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - | | qsl::gateway::Session::on_bytes(std::span) - | | main - | | __libc_start_call_main - | | __libc_start_main@@GLIBC_2.34 - | | _start - | | - | --1.08%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) + | --1.60%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | - --3.23%--__posix_memalign + --3.19%--__posix_memalign operator new(unsigned long, std::align_val_t) | - |--1.61%--std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) - | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + |--1.60%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) | | - | --1.08%--main + | --1.06%--main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | - 
--1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + --1.06%--qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) + qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + + 5.32% [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - - + | + |--3.19%--main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.60%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | + --1.06%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 8.60% [.] qsl::protocol::decode_new_order(std::span) qsl-bench - - + 4.79% [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) qsl-bench - - + | + ---main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start + + 3.72% [.] 
operator new(unsigned long) libstdc++.so.6.0.35 - - | - |--6.99%--main + |--1.06%--qsl::engine::OrderBook::fill_front_order(std::__cxx11::list >&, long, qsl::engine::OrderBook::MatchContext&) + | qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + | qsl::gateway::Session::on_bytes(std::span) + | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | - --1.61%--qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span) + --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 4.84% [.] 
qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) qsl-bench - - - | - --4.30%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | - --3.76%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | - |--2.15%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - | qsl::gateway::Session::on_bytes(std::span) - | main - | __libc_start_call_main - | __libc_start_main@@GLIBC_2.34 - | _start - | - --1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) - | - --1.08%--main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start - - 4.30% [.] malloc@plt libstdc++.so.6.0.35 - - - | - ---operator new(unsigned long) - | - |--2.15%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | | - | |--1.08%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) - | | - | --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - | qsl::gateway::Session::on_bytes(std::span) - | main - | __libc_start_call_main - | __libc_start_main@@GLIBC_2.34 - | _start - | - --1.08%--qsl::protocol::encode(qsl::protocol::Fill const&) - qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) - qsl::gateway::Session::process_frame(std::span, std::vector 
>&, unsigned long) - qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span) - main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start - - 2.69% [.] operator new(unsigned long) libstdc++.so.6.0.35 - - - | - --1.08%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - - 2.69% [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - - + 3.72% [.] qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - - | - |--1.61%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + |--2.13%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) | | - | --1.08%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) + | --1.60%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) | main | __libc_start_call_main | __libc_start_main@@GLIBC_2.34 | _start | - --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + --1.60%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) qsl::gateway::Session::on_bytes(std::span) @@ -256,200 +194,245 @@ perf report output: __libc_start_main@@GLIBC_2.34 _start - 2.69% [.] qsl::engine::OrderBook::contains(unsigned long) const qsl-bench - - + 3.72% [.] 
qsl::protocol::decode_header(std::span) qsl-bench - - | - --1.61%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) + |--2.13%--qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + | qsl::gateway::Session::on_bytes(std::span) + | main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.60%--qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 2.15% [.] __posix_memalign libc.so.6 - - + 3.19% [.] malloc@plt libstdc++.so.6.0.35 - - | - ---operator new(unsigned long, std::align_val_t) + ---operator new(unsigned long) | - --1.08%--std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) - qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - - 2.15% [.] main qsl-bench - - - | - ---__libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start - - 2.15% [.] operator delete(void*)@plt libstdc++.so.6.0.35 - - - 2.15% [.] 
operator delete(void*, unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - - - | - |--1.08%--std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) - | decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - | qsl::engine::OrderBook::cancel(unsigned long) - | - --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - qsl::engine::OrderBook::cancel(unsigned long) - - 2.15% [.] qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) qsl-bench - - - | - ---main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start - - 2.15% [.] 
std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) qsl-bench - - - | - --1.61%--decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - | - --1.08%--qsl::engine::OrderBook::cancel(unsigned long) - - 2.15% [.] std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - - - | - ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - qsl::engine::OrderBook::cancel(unsigned long) - main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start + |--1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + | qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + | qsl::gateway::Session::on_bytes(std::span) + | main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.06%--qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) + main + __libc_start_call_main + 
__libc_start_main@@GLIBC_2.34 + _start - 2.15% [.] std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) qsl-bench - - + 3.19% [.] qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) qsl-bench - - | ---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start + | + |--2.13%--main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.06%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) - 1.61% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] qsl-bench - - + 3.19% [.] 
qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) qsl-bench - - | - ---qsl::engine::OrderBook::cancel(unsigned long) - main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start + ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | + --2.66%--qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | + |--1.60%--qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + | | + | --1.06%--qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) + | main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.06%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span) + main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start + + 3.19% [.] qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl-bench - - + | + ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | + --2.66%--main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start - 1.61% [.] memcpy@plt qsl-bench - - - 1.61% [.] operator new(unsigned long, std::align_val_t) libstdc++.so.6.0.35 - - + 2.66% [.] 
_mid_memalign libc.so.6 - - | - --1.08%--qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + |--1.60%--0x2fffff346e1a63 + | operator new(unsigned long, std::align_val_t) + | qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) + | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.06%--0x63ffff346e1a63 + operator new(unsigned long, std::align_val_t) + std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.61% [.] operator new(unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - - + 2.66% [.] qsl::engine::OrderBook::contains(unsigned long) const qsl-bench - - | - --1.08%--std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) - qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) - qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + --2.13%--qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.61% [.] 
qsl::engine::OrderBook::fill_front_order(std::__cxx11::list >&, long, qsl::engine::OrderBook::MatchContext&) qsl-bench - - + 2.66% [.] qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) qsl-bench - - | - ---qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + ---decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] | - --1.08%--qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span) + --2.13%--qsl::engine::OrderBook::cancel(unsigned long) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.61% [.] qsl::protocol::decode_header(std::span) qsl-bench - - + 2.66% [.] 
std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - - | - --1.08%--qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span) + |--1.60%--qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + | | + | --1.06%--qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + | main + | __libc_start_call_main + | __libc_start_main@@GLIBC_2.34 + | _start + | + --1.06%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + qsl::engine::OrderBook::cancel(unsigned long) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.61% [.] std::__detail::_List_node_base::_M_unhook()@plt qsl-bench - - + 2.13% [.] 
__posix_memalign libc.so.6 - - | - --1.08%--qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - qsl::engine::OrderBook::cancel(unsigned long) + |--1.06%--operator new(unsigned long, std::align_val_t) + | std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + | qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + | qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + | + --1.06%--0x14ffff349d51d3 + qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.61% [.] std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) qsl-bench - - + 2.13% [.] 
operator delete(void*)@plt libstdc++.so.6.0.35 - - | - ---qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) - qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - | - --1.08%--qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) - qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) - qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) - main - __libc_start_call_main - __libc_start_main@@GLIBC_2.34 - _start + --1.60%--main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start - 1.08% [.] __memcpy_generic libc.so.6 - - - 1.08% [.] _mid_memalign libc.so.6 - - + 2.13% [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) qsl-bench - - | - ---__posix_memalign - operator new(unsigned long, std::align_val_t) - std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) - qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] operator delete(void*, unsigned long)@plt qsl-bench - - - 1.08% [.] 
qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) qsl-bench - - + 1.60% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] qsl-bench - - | - ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + --1.06%--qsl::engine::OrderBook::cancel(unsigned long) + main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start - 1.08% [.] qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const qsl-bench - - - 1.08% [.] qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) qsl-bench - - + 1.60% [.] free@plt libstdc++.so.6.0.35 - - + 1.60% [.] operator delete(void*, unsigned long, std::align_val_t)@plt libstdc++.so.6.0.35 - - + 1.60% [.] qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const qsl-bench - - | - ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) - qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&) + ---qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) + qsl::gateway::Session::on_bytes(std::span) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - - + 1.60% [.] 
qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) qsl-bench - - | - ---qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) - qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + ---qsl::gateway::Session::on_bytes(std::span) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) qsl-bench - - + 1.60% [.] qsl::protocol::decode_new_order(std::span) qsl-bench - - | - ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) - qsl::gateway::Session::on_bytes(std::span) + --1.06%--main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start + + 1.60% [.] qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) qsl-bench - - + | + ---main + __libc_start_call_main + __libc_start_main@@GLIBC_2.34 + _start + + 1.06% [.] __memcpy_generic libc.so.6 - - + 1.06% [.] decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] qsl-bench - - + | + ---qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + + 1.06% [.] operator new(unsigned long)@plt qsl-bench - - + 1.06% [.] qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) qsl-bench - - + | + ---qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&) + + 1.06% [.] 
qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const qsl-bench - - + | + ---qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) qsl-bench - - + 1.06% [.] qsl::gateway::(anonymous namespace)::append(std::vector >&, std::vector > const&, unsigned long) [clone .isra.0] qsl-bench - - | - ---qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) + ---qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long) + qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long) qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long) qsl::gateway::Session::on_bytes(std::span) main @@ -457,15 +440,27 @@ perf report output: __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) qsl-bench - - + 1.06% [.] qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) qsl-bench - - | ---main __libc_start_call_main __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) libstdc++.so.6.0.35 - - - 1.08% [.] std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) libstdc++.so.6.0.35 - - + 1.06% [.] std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*) qsl-bench - - + 1.06% [.] 
std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node, false>*, unsigned long) qsl-bench - - + | + ---std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + + 1.06% [.] std::__detail::_List_node_base::_M_unhook() libstdc++.so.6.0.35 - - + | + ---qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + qsl::engine::OrderBook::cancel(unsigned long) + + 1.06% [.] 
std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) qsl-bench - - | ---qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) @@ -474,9 +469,8 @@ perf report output: __libc_start_main@@GLIBC_2.34 _start - 1.08% [.] std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) libstdc++.so.6.0.35 - - # -# (Tip: Compare performance results with: perf diff [ ]) +# (Tip: To see list of saved events and attributes: perf evlist -v) # diff --git a/results/perf_stat_linux.txt b/results/perf_stat_linux.txt index ddcd14b..1c6f521 100644 --- a/results/perf_stat_linux.txt +++ b/results/perf_stat_linux.txt @@ -8,12 +8,12 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64 Perf paranoid: 2 Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:d8856d2f599416a9e74050726279a67f88d61a3bb3d06de86eb3bf948d2a16a5 +Git commit (informational): f9f7e98 +Source digest: sha256:59d9fdbc9d64b974bd28094e55610cced29a381c6e2ec968092862a975bde281 Source digest scope: perf-stat-benchmark Dirty inputs: no Generated output: results/perf_stat_linux.txt -Date: 2026-06-21T05:25:23Z +Date: 2026-06-25T02:30:17Z Benchmark binary: build/bench/qsl-bench Benchmark status: 0 Dataset: qsl-bench default synthetic benchmark suite @@ -28,39 +28,39 @@ until every requested counter is supported. Profiling evidence for investigation not a production-latency claim. 
Benchmark output: -order_book add/mod/cancel 200000 ops 143.9 ns/op 6947208 ops/sec -protocol encode+decode 500000 ops 21.5 ns/op 46599221 ops/sec -gateway session (fill) 200000 ops 129.7 ns/op 7710496 ops/sec -matching engine flow 5004 items 102.4 ns/item 9769779 items/sec -replay command log 5004 items 110.3 ns/item 9064737 items/sec +order_book add/mod/cancel 200000 ops 137.2 ns/op 7288189 ops/sec +protocol encode+decode 500000 ops 21.2 ns/op 47087754 ops/sec +gateway session (fill) 200000 ops 120.8 ns/op 8277978 ops/sec +matching engine flow 5004 items 93.2 ns/item 10733865 items/sec +replay command log 5004 items 97.1 ns/item 10294783 items/sec Benchmark output under perf: -order_book add/mod/cancel 200000 ops 92.7 ns/op 10785353 ops/sec -protocol encode+decode 500000 ops 16.3 ns/op 61508483 ops/sec -gateway session (fill) 200000 ops 110.8 ns/op 9023997 ops/sec -matching engine flow 5004 items 98.1 ns/item 10190493 items/sec -replay command log 5004 items 109.4 ns/item 9137639 items/sec +order_book add/mod/cancel 200000 ops 121.9 ns/op 8202972 ops/sec +protocol encode+decode 500000 ops 21.0 ns/op 47563493 ops/sec +gateway session (fill) 200000 ops 120.9 ns/op 8269791 ops/sec +matching engine flow 5004 items 95.8 ns/item 10437399 items/sec +replay command log 5004 items 99.6 ns/item 10043348 items/sec perf stat output: Performance counter stats for 'build/bench/qsl-bench': - 233,479,932 apple_avalanche_pmu/cycles/u + 221,558,456 apple_avalanche_pmu/cycles/u apple_blizzard_pmu/cycles/u (0.00%) - 1,247,839,058 apple_avalanche_pmu/instructions/u + 1,160,776,150 apple_avalanche_pmu/instructions/u apple_blizzard_pmu/instructions/u (0.00%) - 245,495,434 apple_avalanche_pmu/branches/u + 233,032,815 apple_avalanche_pmu/branches/u apple_blizzard_pmu/branches/u (0.00%) - 1,272,574 apple_avalanche_pmu/branch-misses/u + 1,143,050 apple_avalanche_pmu/branch-misses/u apple_blizzard_pmu/branch-misses/u (0.00%) apple_avalanche_pmu/cache-references/u 
apple_blizzard_pmu/cache-references/u apple_avalanche_pmu/cache-misses/u apple_blizzard_pmu/cache-misses/u 0 context-switches:u - 208 page-faults:u + 229 page-faults:u - 0.081496718 seconds time elapsed + 0.091141580 seconds time elapsed - 0.080390000 seconds user - 0.000988000 seconds sys + 0.090001000 seconds user + 0.001001000 seconds sys diff --git a/results/pool_backed_storage.txt b/results/pool_backed_storage.txt index ceb42ed..aabf682 100644 --- a/results/pool_backed_storage.txt +++ b/results/pool_backed_storage.txt @@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:0596425b8906deaa0bc34f841889eaddd11efa1fc39854c7f92327de1f69ad4d +Git commit (informational): f9f7e98 +Source digest: sha256:c1e4cd7db8472a87cbd23ece3a2d4b330f78ad876b58da412e0e54f6c4eb4cf7 Source digest scope: order-book-storage-benchmark Dirty inputs: no Generated output: results/pool_backed_storage.txt -Date: 2026-06-21T05:25:22Z +Date: 2026-06-25T02:29:36Z Dataset: deterministic storage workloads (general, dense, sparse, cancel/modify, match/traversal) Scenario: baseline OrderBook storage vs PMR pooled nodes vs intrusive OrderPool nodes vs contiguous price-indexed storage Warmup: one full workload replay per storage mode before timing @@ -43,39 +43,39 @@ Scenario / Metric / Result: Workload: general generated flow (seed=42) Purpose: Existing deterministic generated engine flow; mixed insert, match, cancel, modify, IOC, and market activity. 
Shape: commands=5000 events=7155 accepted=3517 trades=2238 cancel_cmds=793 modify_cmds=690 market_orders=602 ioc_orders=376 canceled_events=710 modified_events=690 final_resting=37 max_resting=72 max_bid_levels=21 max_ask_levels=22 avg_bid_levels=11.2 avg_ask_levels=12.4 max_active_levels=41 price_width=67 price_density=0.076 top_probe_calls=0 -general generated flow baseline 5000 cmds 30 reps median 99.4 ns/cmd min 98.5 max 102.3 10059512 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 -general generated flow pooled pmr 5000 cmds 30 reps median 114.0 ns/cmd min 113.1 max 116.2 8771914 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 -general generated flow intrusive pool 5000 cmds 30 reps median 82.5 ns/cmd min 81.4 max 88.3 12128563 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 -general generated flow contiguous 5000 cmds 30 reps median 73.3 ns/cmd min 72.4 max 76.3 13644128 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 +general generated flow baseline 5000 cmds 30 reps median 89.4 ns/cmd min 88.4 max 91.6 11185657 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 +general generated flow pooled pmr 5000 cmds 30 reps median 100.0 ns/cmd min 99.1 max 102.2 9997481 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 +general generated flow intrusive pool 5000 cmds 30 reps median 80.3 ns/cmd min 79.6 max 84.4 12454572 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 +general generated flow contiguous 5000 cmds 30 reps median 71.2 ns/cmd min 69.9 max 73.4 14053155 cmds/sec events/run=7155 resting/run=37 last_seq/run=7155 probes/run=0 Workload: dense bounded flow (seed=4702) Purpose: Small bounded price domain with many live levels, repeated same-price operations, and top-of-book probes after every command. 
Shape: commands=5002 events=5558 accepted=4018 trades=1048 cancel_cmds=0 modify_cmds=492 market_orders=984 ioc_orders=492 canceled_events=0 modified_events=492 final_resting=2264 max_resting=2264 max_bid_levels=40 max_ask_levels=40 avg_bid_levels=39.2 avg_ask_levels=38.5 max_active_levels=80 price_width=136 price_density=0.147 top_probe_calls=20008 -dense bounded flow baseline 5002 cmds 30 reps median 75.4 ns/cmd min 74.8 max 76.3 13270826 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 -dense bounded flow pooled pmr 5002 cmds 30 reps median 78.5 ns/cmd min 78.3 max 80.6 12742618 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 -dense bounded flow intrusive pool 5002 cmds 30 reps median 52.8 ns/cmd min 52.2 max 53.9 18952928 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 -dense bounded flow contiguous 5002 cmds 30 reps median 57.9 ns/cmd min 57.4 max 58.8 17260715 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 +dense bounded flow baseline 5002 cmds 30 reps median 66.1 ns/cmd min 65.9 max 67.3 15128922 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 +dense bounded flow pooled pmr 5002 cmds 30 reps median 66.3 ns/cmd min 66.1 max 67.1 15075710 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 +dense bounded flow intrusive pool 5002 cmds 30 reps median 52.2 ns/cmd min 51.8 max 53.4 19161667 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 +dense bounded flow contiguous 5002 cmds 30 reps median 57.8 ns/cmd min 57.2 max 58.8 17310414 cmds/sec events/run=5558 resting/run=2264 last_seq/run=5558 probes/run=20008 Workload: sparse wide flow (seed=4703) Purpose: Wide in-band price domain with few active levels and many gaps. 
Shape: commands=5000 events=5000 accepted=3344 trades=0 cancel_cmds=828 modify_cmds=828 market_orders=0 ioc_orders=0 canceled_events=828 modified_events=828 final_resting=2516 max_resting=2517 max_bid_levels=16 max_ask_levels=16 avg_bid_levels=7.8 avg_ask_levels=7.5 max_active_levels=32 price_width=985 price_density=0.004 top_probe_calls=0 -sparse wide flow baseline 5000 cmds 30 reps median 64.1 ns/cmd min 63.6 max 65.2 15606711 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 -sparse wide flow pooled pmr 5000 cmds 30 reps median 69.3 ns/cmd min 69.0 max 71.2 14419611 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 -sparse wide flow intrusive pool 5000 cmds 30 reps median 40.9 ns/cmd min 40.1 max 43.0 24430047 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 -sparse wide flow contiguous 5000 cmds 30 reps median 39.4 ns/cmd min 38.8 max 41.2 25359213 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 +sparse wide flow baseline 5000 cmds 30 reps median 55.7 ns/cmd min 55.1 max 56.5 17964093 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 +sparse wide flow pooled pmr 5000 cmds 30 reps median 57.9 ns/cmd min 57.6 max 60.7 17263703 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 +sparse wide flow intrusive pool 5000 cmds 30 reps median 42.8 ns/cmd min 42.1 max 45.5 23387217 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 +sparse wide flow contiguous 5000 cmds 30 reps median 40.3 ns/cmd min 39.8 max 41.9 24829299 cmds/sec events/run=5000 resting/run=2516 last_seq/run=5000 probes/run=0 Workload: cancel/modify-heavy flow (seed=4704) Purpose: Locator-heavy workload with frequent active cancels, in-place modifies, replenishment, and duplicate active ids. 
Shape: commands=5001 events=4801 accepted=1599 trades=0 cancel_cmds=1599 modify_cmds=1603 market_orders=0 ioc_orders=0 canceled_events=1599 modified_events=1603 final_resting=0 max_resting=62 max_bid_levels=30 max_ask_levels=30 avg_bid_levels=1.9 avg_ask_levels=1.4 max_active_levels=60 price_width=30 price_density=0.333 top_probe_calls=0 -cancel/modify-heavy flow baseline 5001 cmds 30 reps median 46.2 ns/cmd min 46.0 max 47.9 21649351 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 -cancel/modify-heavy flow pooled pmr 5001 cmds 30 reps median 53.5 ns/cmd min 53.4 max 54.8 18680801 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 -cancel/modify-heavy flow intrusive pool 5001 cmds 30 reps median 36.2 ns/cmd min 35.1 max 38.7 27610766 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 -cancel/modify-heavy flow contiguous 5001 cmds 30 reps median 31.6 ns/cmd min 31.5 max 33.4 31618479 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 +cancel/modify-heavy flow baseline 5001 cmds 30 reps median 49.0 ns/cmd min 48.8 max 50.4 20419162 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 +cancel/modify-heavy flow pooled pmr 5001 cmds 30 reps median 54.7 ns/cmd min 54.6 max 55.8 18271296 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 +cancel/modify-heavy flow intrusive pool 5001 cmds 30 reps median 36.7 ns/cmd min 36.0 max 38.6 27284333 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 +cancel/modify-heavy flow contiguous 5001 cmds 30 reps median 31.4 ns/cmd min 31.1 max 33.0 31861824 cmds/sec events/run=4801 resting/run=0 last_seq/run=4801 probes/run=0 Workload: match/traversal-heavy flow (seed=4705) Purpose: Many small maker orders per level sweep, stressing level traversal and best-price maintenance. 
Shape: commands=5003 events=9015 accepted=5003 trades=4012 cancel_cmds=0 modify_cmds=0 market_orders=494 ioc_orders=494 canceled_events=0 modified_events=0 final_resting=3 max_resting=76 max_bid_levels=20 max_ask_levels=40 avg_bid_levels=2.5 avg_ask_levels=5.4 max_active_levels=60 price_width=81 price_density=0.370 top_probe_calls=0 -match/traversal-heavy flow baseline 5003 cmds 30 reps median 98.7 ns/cmd min 96.7 max 101.1 10133500 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 -match/traversal-heavy flow pooled pmr 5003 cmds 30 reps median 115.8 ns/cmd min 114.8 max 117.1 8638895 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 -match/traversal-heavy flow intrusive pool 5003 cmds 30 reps median 70.1 ns/cmd min 68.3 max 73.3 14262013 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 -match/traversal-heavy flow contiguous 5003 cmds 30 reps median 59.8 ns/cmd min 59.5 max 60.2 16725449 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 +match/traversal-heavy flow baseline 5003 cmds 30 reps median 96.5 ns/cmd min 95.4 max 99.1 10361739 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 +match/traversal-heavy flow pooled pmr 5003 cmds 30 reps median 110.0 ns/cmd min 108.8 max 114.2 9094975 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 +match/traversal-heavy flow intrusive pool 5003 cmds 30 reps median 65.1 ns/cmd min 64.0 max 66.8 15350534 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 +match/traversal-heavy flow contiguous 5003 cmds 30 reps median 56.8 ns/cmd min 56.5 max 57.3 17603243 cmds/sec events/run=9015 resting/run=3 last_seq/run=9015 probes/run=0 diff --git a/results/recovery_benchmarks.txt b/results/recovery_benchmarks.txt index 6796022..aa24eb8 100644 --- a/results/recovery_benchmarks.txt +++ b/results/recovery_benchmarks.txt @@ -4,12 +4,12 @@ OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k Compiler: c++ (GCC) 
16.1.1 20260515 (Red Hat 16.1.1-2) Build type: Release Provenance version: 1 -Git commit (informational): 33f8d11 -Source digest: sha256:66d9841df48c833aeecd1da299536f7a7b16300ea83683d1bc209580ae0bebfe +Git commit (informational): f9f7e98 +Source digest: sha256:89cd2b8b43602479475abeb330a8f6c854339c62fe22867b5cbdd715b8e65cd4 Source digest scope: recovery-benchmark Dirty inputs: no Generated output: results/recovery_benchmarks.txt -Date: 2026-06-21T05:25:22Z +Date: 2026-06-25T02:29:37Z Dataset: deterministic generated flows (seed 42, 4 symbols, 5k/20k/80k commands) plus synthetic non-crossing resting books (1k/10k/50k resting orders) Scenario: full-replay restart (recover_log_file + replay) vs in-memory book rebuild @@ -30,35 +30,35 @@ and build-dependent; not a production recovery-time claim. Scenario / Metric / Result: log 5004 commands file 0.24 MiB resting 37 orders - recover_log_file (read+verify+classify) 10 reps 0.460 ms/run 92.0 ns/record - replay into fresh engine (decode+apply) 10 reps 0.709 ms/run 141.8 ns/command - full restart (recover + replay) 10 reps 1.017 ms/run 203.2 ns/command - capture resting state (snapshot proto) 10 reps 0.000 ms/run 6.4 ns/order - rebuild book from captured state 10 reps 0.006 ms/run 168.8 ns/order + recover_log_file (read+verify+classify) 10 reps 0.716 ms/run 143.1 ns/record + replay into fresh engine (decode+apply) 10 reps 0.894 ms/run 178.6 ns/command + full restart (recover + replay) 10 reps 1.239 ms/run 247.6 ns/command + capture resting state (snapshot proto) 10 reps 0.000 ms/run 8.0 ns/order + rebuild book from captured state 10 reps 0.008 ms/run 216.0 ns/order log 20004 commands file 0.95 MiB resting 30 orders - recover_log_file (read+verify+classify) 10 reps 1.331 ms/run 66.5 ns/record - replay into fresh engine (decode+apply) 10 reps 2.270 ms/run 113.5 ns/command - full restart (recover + replay) 10 reps 3.668 ms/run 183.4 ns/command + recover_log_file (read+verify+classify) 10 reps 1.478 ms/run 73.9 ns/record + 
  replay into fresh engine (decode+apply)    10 reps    1.992 ms/run    99.6 ns/command
+ full restart (recover + replay)            10 reps    3.360 ms/run   168.0 ns/command
  capture resting state (snapshot proto)     10 reps    0.000 ms/run     5.7 ns/order
- rebuild book from captured state           10 reps    0.004 ms/run   127.4 ns/order
+ rebuild book from captured state           10 reps    0.003 ms/run   111.5 ns/order
 log 80004 commands file 3.81 MiB resting 24 orders
- recover_log_file (read+verify+classify)    10 reps    5.653 ms/run    70.7 ns/record
- replay into fresh engine (decode+apply)    10 reps    9.050 ms/run   113.1 ns/command
- full restart (recover + replay)            10 reps   14.736 ms/run   184.2 ns/command
- capture resting state (snapshot proto)     10 reps    0.000 ms/run     6.6 ns/order
- rebuild book from captured state           10 reps    0.003 ms/run   117.9 ns/order
+ recover_log_file (read+verify+classify)    10 reps    5.691 ms/run    71.1 ns/record
+ replay into fresh engine (decode+apply)    10 reps    7.931 ms/run    99.1 ns/command
+ full restart (recover + replay)            10 reps   13.734 ms/run   171.7 ns/command
+ capture resting state (snapshot proto)     10 reps    0.000 ms/run     6.4 ns/order
+ rebuild book from captured state           10 reps    0.002 ms/run   101.7 ns/order
 synthetic resting book 4 symbols 1000 resting orders
- capture resting state (snapshot proto)     10 reps    0.002 ms/run     2.1 ns/order
- rebuild book from captured state           10 reps    0.111 ms/run   110.9 ns/order
+ capture resting state (snapshot proto)     10 reps    0.002 ms/run     2.0 ns/order
+ rebuild book from captured state           10 reps    0.090 ms/run    90.4 ns/order
 synthetic resting book 4 symbols 10000 resting orders
- capture resting state (snapshot proto)     10 reps    0.054 ms/run     5.4 ns/order
- rebuild book from captured state           10 reps    0.996 ms/run    99.6 ns/order
+ capture resting state (snapshot proto)     10 reps    0.055 ms/run     5.5 ns/order
+ rebuild book from captured state           10 reps    0.722 ms/run    72.2 ns/order
 synthetic resting book 4 symbols 50000 resting orders
- capture resting state (snapshot proto)     10 reps    0.382 ms/run     7.6 ns/order
- rebuild book from captured state           10 reps    4.775 ms/run    95.5 ns/order
+ capture resting state (snapshot proto)     10 reps    0.376 ms/run     7.5 ns/order
+ rebuild book from captured state           10 reps    3.487 ms/run    69.7 ns/order
diff --git a/results/socket_load_summary.txt b/results/socket_load_summary.txt
index fc59226..bf8bb9c 100644
--- a/results/socket_load_summary.txt
+++ b/results/socket_load_summary.txt
@@ -7,12 +7,12 @@ Cores: 8
 Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type: Debug
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:bc03fe85ae7177600ecff2a3acca3c7f6892e071716ae5e63529732fea798607
+Git commit (informational): f9f7e98
+Source digest: sha256:221f3a2da54f889cdd448da347faccb9a81c38d5191fee759d69a66aaf54f677
 Source digest scope: socket-load
 Dirty inputs: no
 Generated output: results/socket_load_summary.txt
-Date: 2026-06-21T05:26:56Z
+Date: 2026-06-25T02:31:10Z
 Dataset: synthetic order flow via qsl-client (NewOrder + Heartbeat per connection)
 Scenario: concurrent short-lived client sweep across the threaded and epoll transports
 Metric: best (min) wall time per cell, approximate conns/s, and completion ratio
@@ -30,14 +30,14 @@ path, not matching.

 mode      clients  wall(s,best)  conns/s(~)  completed
 -------   -------  ------------  ----------  ---------
-threaded        1        0.0037         270  1/1
-threaded        4        0.0084         476  4/4
-threaded        8        0.0187         428  8/8
-threaded       16        0.0416         385  16/16
-epoll           1        0.0038         263  1/1
-epoll           4        0.0093         430  4/4
-epoll           8        0.0197         406  8/8
-epoll          16        0.0340         471  16/16
+threaded        1        0.0044         227  1/1
+threaded        4        0.0085         471  4/4
+threaded        8        0.0154         519  8/8
+threaded       16        0.0396         404  16/16
+epoll           1        0.0046         217  1/1
+epoll           4        0.0070         571  4/4
+epoll           8        0.0146         548  8/8
+epoll          16        0.0385         416  16/16

 Reading the result: compare how the best wall time grows with the client count within each mode.
 At these small loopback counts connection setup dominates and the two modes can stay close
diff --git a/results/socket_profile_loopback.txt b/results/socket_profile_loopback.txt
index 7dd6c40..c14b1c6 100644
--- a/results/socket_profile_loopback.txt
+++ b/results/socket_profile_loopback.txt
@@ -8,12 +8,12 @@ Build type: Debug
 strace: strace -- version 7.1
 CLK_TCK: 100
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:b5e71f8f19427217a89854903a73fcfa796acf2217b706b718acd93c3410fa4c
+Git commit (informational): f9f7e98
+Source digest: sha256:def5b5aa6ee476f193dc5e8a3054389db19c1bfd5af58d3355a5d06a1b9419df
 Source digest scope: gateway-io-profile
 Dirty inputs: no
 Generated output: results/socket_profile_loopback.txt
-Date: 2026-06-21T05:25:30Z
+Date: 2026-06-25T02:31:16Z
 Transport: TCP over 127.0.0.1 (loopback), portable threaded TcpServer
 Load: 500 sequential client round trips (NewOrder + Heartbeat each)

@@ -22,12 +22,12 @@ User (engine-side) vs System (kernel/socket) CPU time splits user-space matching from
 time spent in the kernel servicing accept/read/write/close on the socket path.

 User (engine-side) CPU time: 0.000 s (0 ticks)
-System (kernel/socket) CPU time: 0.040 s (4 ticks)
+System (kernel/socket) CPU time: 0.030 s (3 ticks)
 System share of CPU: 100.0%
-Minor page faults: 172 Major page faults: 1
-VmHWM: 3872 kB
-voluntary_ctxt_switches: 503
-nonvoluntary_ctxt_switches: 12
+Minor page faults: 172 Major page faults: 0
+VmHWM: 3840 kB
+voluntary_ctxt_switches: 502
+nonvoluntary_ctxt_switches: 0

 == Pass 2: strace -f -c (syscall mix on the gateway socket path) ==
 Call counts and in-kernel time per syscall. strace perturbs timing heavily, so read the
@@ -35,35 +35,35 @@ syscall *mix* (which calls dominate the socket path), not the absolute seconds.

 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
- 46.85    0.041831          83       502         1 accept
- 21.65    0.019334          38       501           clone3
-  7.52    0.006711           3      2005           rt_sigprocmask
-  5.28    0.004715           9       506           close
-  5.09    0.004543           9       500           sendto
-  4.53    0.004042           8       502           rseq
-  3.63    0.003241           3      1005           read
-  3.30    0.002947           5       502           set_robust_list
-  2.03    0.001811           3       504           madvise
-  0.04    0.000036           3        11           mprotect
-  0.03    0.000025           1        24           mmap
-  0.02    0.000016           8         2         1 futex
-  0.01    0.000007           7         1           socket
-  0.01    0.000005           5         1           rt_sigaction
-  0.01    0.000005           5         1           bind
-  0.01    0.000005           0         9           munmap
-  0.00    0.000003           3         1           set_tid_address
-  0.00    0.000003           3         1           listen
-  0.00    0.000003           3         1           setsockopt
-  0.00    0.000003           1         3           brk
-  0.00    0.000002           2         1         1 ioctl
-  0.00    0.000002           0         6           fstat
+ 49.31    0.042227          84       502         1 accept
+ 18.43    0.015780          31       501           clone3
+  7.59    0.006499           3      2005           rt_sigprocmask
+  5.77    0.004942           9       506           close
+  5.41    0.004637           9       500           sendto
+  4.09    0.003499           6       502           rseq
+  3.92    0.003354           3      1005           read
+  3.19    0.002733           5       502           set_robust_list
+  2.17    0.001858           3       503           madvise
+  0.04    0.000030          30         1           socket
+  0.02    0.000021           0        23           mmap
+  0.02    0.000016          16         1           bind
+  0.01    0.000011           1         9           munmap
+  0.01    0.000008           8         1           rt_sigaction
+  0.01    0.000008           8         1           listen
+  0.01    0.000008           8         1           setsockopt
+  0.01    0.000005           0        11           mprotect
+  0.00    0.000000           0         1         1 ioctl
   0.00    0.000000           0         1         1 faccessat
   0.00    0.000000           0         5           openat
+  0.00    0.000000           0         6           fstat
+  0.00    0.000000           0         1           set_tid_address
+  0.00    0.000000           0         1           futex
+  0.00    0.000000           0         3           brk
   0.00    0.000000           0         1           execve
   0.00    0.000000           0         1           prlimit64
   0.00    0.000000           0         1           getrandom
 ------ ----------- ----------- --------- --------- ----------------
-100.00    0.089290          13      6598         4 total
+100.00    0.085636          12      6595         3 total

 Caveats:
 - Loopback only: no NIC, device driver, routing, or real-network behaviour is exercised.
diff --git a/results/socket_stress_summary.txt b/results/socket_stress_summary.txt
index 680178d..8d6628a 100644
--- a/results/socket_stress_summary.txt
+++ b/results/socket_stress_summary.txt
@@ -8,12 +8,12 @@ rmem_max: 4194304
 Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
 Build type: Debug
 Provenance version: 1
-Git commit (informational): 33f8d11
-Source digest: sha256:b787480b33b16e7b8ab0d2ff6fc3baa5e5ab7162b3bc3d8de691656e4692cdc1
+Git commit (informational): f9f7e98
+Source digest: sha256:8afc6fd65ed36967ea09e87bc411638ae107cec7a1a3e68dba367c1f9479d5d5
 Source digest scope: socket-stress
 Dirty inputs: no
 Generated output: results/socket_stress_summary.txt
-Date: 2026-06-21T05:26:28Z
+Date: 2026-06-25T02:32:17Z
 Transport: UDP unicast over 127.0.0.1 (loopback)
 Dataset: qsl-mdfeed publish, seed 42, 20000 orders, 3 symbols
 Trials/setting: 4
@@ -25,8 +25,8 @@ clamp the request, so the effective (granted) size is read back via getsockopt.

 setting  requested(B)  effective(B)  published  lost/trial     maxlost  seq-gaps/trial
 -------  ------------  ------------  ---------  ----------     -------  --------------
-small            2048          4096      14820  5,0,0,9              9  5,0,0,9
-default             0        212992      14820  0,11,298,0         298  0,11,298,0
+small            2048          4096      14820  0,663,347,490      663  0,663,347,490
+default             0        212992      14820  0,0,2224,0        2224  0,0,2224,0
 large         8388608       8388608      14820  0,0,0,0              0  0,0,0,0

 Reading the result: 'lost/trial' is the honest loss metric -- published minus received