diff --git a/AGENTS.md b/AGENTS.md index abd917d..0833834 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1275,6 +1275,8 @@ exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per- (#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated (#135/#139/#146). Determinism preserved (byte-identical fixtures; OCaml differential pass). -`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the +`make check`/`make asan` 272/272 (the latter now a real UBSan gate). `v0.2.2` then also folded in a +documentation overhaul, a `PERFORMANCE.md` v0.1.0-to-v0.2.2 evidence report, and a bug/style/mermaid +sweep (#147-#150). After `v0.2.2`, the highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/CHANGELOG.md b/CHANGELOG.md index cbf2d23..bc5cf86 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,15 +7,17 @@ All notable changes to this project. The format is loosely based on _Nothing yet._ -## [0.2.2] - 2026-06-24 - -A security/robustness **hardening** wave plus two measured order-book **performance** wins, driven by -a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed bugs) and flamegraph-guided -optimization. Same honesty bar: a deterministic C++20 exchange simulator and cross-language -differential-testing harness, **not** a production exchange, no real-market connectivity, no latency -or profitability claims, not formal verification. Determinism preserved throughout (fixtures -byte-identical across g++/clang++ and vs the committed copies; the OCaml differential passes). -`make check`/`make asan` 270/270. 
+## [0.2.2] - 2026-06-25 + +A security/robustness **hardening** wave plus two measured order-book **performance** wins (driven by +a multi-round adversarial bug hunt, converged 5→2→1→0 confirmed bugs, and flamegraph-guided +optimization), then a full **documentation overhaul**: a reproducible performance-evidence report, a +rebuilt README, mermaid diagrams across the docs, and a repo-wide style sweep. Same honesty bar: a +deterministic C++20 exchange simulator and cross-language differential-testing harness, **not** a +production exchange, no real-market connectivity, no latency or profitability claims, not formal +verification. Determinism preserved throughout (fixtures byte-identical across g++/clang++ and vs the +committed copies; the OCaml differential passes). `make check`/`make asan` 272/272 (the latter a real +UBSan abort gate). ### Fixed @@ -56,6 +58,31 @@ byte-identical across g++/clang++ and vs the committed copies; the OCaml differe - **Flamegraph regenerated (#135, #139, #146)** against the new code, now a dense (~20k-sample), fully-symbolized frame-pointer profile with zero `[unknown]` frames. +### Documentation and evidence + +- **Performance-evidence report (#148, #150).** New `PERFORMANCE.md` profiles the matching-engine hot + path with Linux `perf` and flamegraphs on ARM64 (Apple M2), comparing the **v0.1.0 first release to + v0.2.2** (the same `qsl-perfeval` harness ported into a `v0.1.0` worktree, measured on the same + host): allocations/order 4.094 → 2.670 (-35%), cycles/order 310.7 → 289.5 (-6.8%), branch-miss rate + 2.01% → 1.68%, latency unchanged. Cache-miss rate is reported unavailable, never estimated (Apple + Silicon PMU; #90). A dedicated `qsl-perfeval` target (plus a `qsl-perfeval-allocs` counting build) + makes every number reproducible. 
The before/after flamegraphs render fully symbolized with zero + `[unknown]` frames; the unresolvable boundary frames were identified (an fp glibc-malloc-boundary + artifact and the vDSO `clock_gettime` leaf) and folded into their resolved caller, not hidden. +- **Documentation overhaul and README rebuild (#147, #149).** Every doc, artifact, and provenance + header was refreshed to the v0.2.2 state, and the README was rebuilt to lead with the performance + numbers and the matching-engine flamegraph. Mermaid diagrams were added across the docs (matching + rules, binary protocol, persistence, concurrency model, memory ordering, gateway accept loop, OCaml + differential), and every em dash and en dash was removed repo-wide. +- **Honesty corrections, made in the open.** Two measurement errors caught by self-review and code + review were fixed and documented rather than buried: the allocation counter had missed + over-aligned allocations (so the cumulative reduction is -35%, not the -73% an earlier draft + claimed), and a thermal-warmup p99 artifact was corrected to "latency distribution unchanged". +- **Previously-unaddressed review findings (#150).** Acted on CodeRabbit comments left open on earlier + PRs: a non-finite `strtod` guard in the bench profile timer, `ENOPROTOOPT`/`EOPNOTSUPP` added to the + threaded accept-retry set to match the epoll path, and the perfeval harness's resting-order + tracking, percentile index, and argument validation. + ## [0.2.1] - 2026-06-21 Two backlog items, reprioritized by the maintainer and delivered, plus a resume-anchor and diff --git a/CLAUDE.md b/CLAUDE.md index cd1a7ae..4829be2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1219,6 +1219,8 @@ exited 0, so pure-UBSan defects passed CI green (#142); OCaml `diff_report` per- (#144). Perf (measured back-to-back A/B): `try_emplace` for baseline price levels (~+5%, #138) and an order-index hash `max_load_factor` cap at 0.25 (~+18.6%, #145), flamegraph regenerated (#135/#139/#146). 
Determinism preserved (byte-identical fixtures; OCaml differential pass). -`make check`/`make asan` 270/270 (the latter now a real UBSan gate). After `v0.2.2`, the +`make check`/`make asan` 272/272 (the latter now a real UBSan gate). `v0.2.2` then also folded in a +documentation overhaul, a `PERFORMANCE.md` v0.1.0-to-v0.2.2 evidence report, and a bug/style/mermaid +sweep (#147-#150). After `v0.2.2`, the highest-value remaining work is non-code and gated on #94 (external review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/HANDOFF.md b/HANDOFF.md index d273564..0a78e02 100644 --- a/HANDOFF.md +++ b/HANDOFF.md @@ -43,8 +43,10 @@ connection cap, UDP send-error tracking, transient-accept survival, and threaded handling (#137, #140, #143); CLI arg validation (#141); a **real UBSan abort gate**, `-fno-sanitize-recover=undefined`, since UBSan previously ran in recover mode and exited 0 (#142); OCaml `diff_report` robustness (#144). Perf (measured A/B): `try_emplace` for baseline price levels (~+5%, #138) and an order-index hash load-factor cap (~+18.6%, #145), with the flamegraph regenerated -(#135/#139/#146). `make check`/`make asan` 270/270 (the latter now under the real UBSan gate). The -next action is to finish this `v0.2.2` doc/artifact overhaul and cut the tag. +(#135/#139/#146), then a documentation overhaul, a `PERFORMANCE.md` v0.1.0-to-v0.2.2 evidence report, +and a bug/style/mermaid sweep (#147-#150). `make check`/`make asan` 272/272 (the latter now under the +real UBSan gate). `v0.2.2` is tagged; the next high-value work is non-code (#94 external review, #90 +cache-PMU evidence). Background. 
Linux perf evidence (merged, now bare-metal partial PMU): @@ -87,13 +89,14 @@ Current state: - latest synced main baseline: `ded6e80` (PR #127, v0.2.0); the `v0.2.1` baseline is the release-PR merge commit, after PRs #129/#130/#131 -- current active branch, if active: `docs/post-v0.2.1-overhaul` (v0.2.2 prep + doc/artifact sweep) -- current active status: `v0.2.1` is the latest tag; a post-v0.2.1 hardening + perf wave (#135, #146) - is merged to `main` and unreleased, being cut as `v0.2.2` (decoder enum rejection, network/CLI - hardening, a real UBSan abort gate, OCaml diff_report robustness, and two measured order-book perf - wins, `try_emplace` ~+5% and an index load-factor cap ~+18.6%). `make check` 270/270 and - `make asan` 270/270 (the latter now under the real UBSan gate) on the bare-metal Apple M2 Fedora - Asahi host; every touched file passes the CI CodeScene Code Health gate +- current active branch, if active: none (`main` is at the `v0.2.2` release) +- current active status: **`v0.2.2` is the latest tag**: the post-v0.2.1 hardening + perf wave (#135, + #146: decoder enum rejection, network/CLI hardening, a real UBSan abort gate, OCaml diff_report + robustness, and two measured order-book perf wins, `try_emplace` ~+5% and an index load-factor cap + ~+18.6%), plus a documentation overhaul, a `PERFORMANCE.md` v0.1.0-to-v0.2.2 evidence report, and a + bug/style/mermaid sweep (#147-#150). 
`make check` 272/272 and `make asan` 272/272 (the latter now + under the real UBSan gate) on the bare-metal Apple M2 Fedora Asahi host; every touched file passes + the CI CodeScene Code Health gate - release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0`; `v0.2.2` prepared on this branch, not yet tagged - open follow-up issue: #90, narrowed to the full cache-counter PMU set; the bare-metal Apple host diff --git a/PROGRESS.md b/PROGRESS.md index 366e98b..245a0e9 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -20,24 +20,25 @@ Do not rely on prior chat memory. ## Current state -- **Active milestone:** none, `v0.2.1` is the latest tag, but a post-v0.2.1 hardening + perf wave - (12 PRs, #135, #146) has merged to `main` and is **unreleased**; it is being cut as **`v0.2.2`** -- **Status:** ☑ `v0.2.1` published on top of `v0.2.0`; ☐ `v0.2.2` in preparation, security/robustness - hardening (decoder enum-domain rejection, network/CLI hardening, a real UBSan abort gate, OCaml - diff_report robustness) plus two measured order-book perf wins -- **Active branch:** `docs/post-v0.2.1-overhaul` (the v0.2.2 prep + full doc/artifact staleness sweep) +- **Active milestone:** none, **`v0.2.2` is the latest tag**. It bundled the post-v0.2.1 + hardening + perf wave (#135, #146) plus a full documentation overhaul (#147, #149), a reproducible + performance-evidence report (#148), and a bug/style sweep with mermaid diagrams (#150) +- **Status:** ☑ `v0.2.2` published on top of `v0.2.1`: security/robustness hardening (decoder + enum-domain rejection, network/CLI hardening, a real UBSan abort gate, OCaml diff_report + robustness), two measured order-book perf wins, `PERFORMANCE.md` (v0.1.0 to v0.2.2 evidence), + README rebuild, repo-wide em-dash purge, and mermaid diagrams across the docs +- **Active branch:** none (`main` is at the `v0.2.2` release) - **Last completed milestone:** M49. 
NIC offload and low-latency networking study (PR #124, - d8c16b2). Releases since: `v0.2.0` (PR #127, ded6e80) and `v0.2.1` (FIX adapter #131, flamegraph - #134, anchor sweep #129). Post-v0.2.1 unreleased work on `main`: #135, #146 (see Last action) -- **Last completed docs sync:** this v0.2.2-prep overhaul, every `.md`/`.txt` audited against - current `main`; resume/release anchors, README, CHANGELOG, and all stale `results/*.txt` + d8c16b2). Releases since: `v0.2.0` (PR #127, ded6e80), `v0.2.1` (#131/#134/#129), and `v0.2.2` + (#135 through #150) +- **Last completed docs sync:** the v0.2.2 release sweep, every `.md`/`.txt` audited against + current `main`, em/en dashes removed repo-wide, mermaid diagrams added, and all `results/*.txt` provenance digests brought current to HEAD -- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag on the - release-PR merge, marked Latest) published as GitHub-only releases; `v0.2.2` prepared here, not yet - tagged; no packages published -- **`make check` passing:** yes, `make check` 270/270 and `make asan` 270/270 (the latter now under +- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), `v0.2.1`, and `v0.2.2` + published as GitHub-only releases; no packages published +- **`make check` passing:** yes, `make check` 272/272 and `make asan` 272/272 (the latter now under the **real** UBSan abort gate from #142) on the bare-metal Apple M2 (aarch64) Fedora Asahi host on - 2026-06-24 + 2026-06-25 - **Last action:** post-v0.2.1 hardening + perf wave merged to `main` as 12 scoped PRs (#135, #146), driven by a multi-round adversarial bug hunt (converged 5→2→1→0 confirmed) and flamegraph-guided optimization. Security/robustness: reject out-of-domain enum bytes in the replay/protocol decoders @@ -49,12 +50,12 @@ Do not rely on prior chat memory. abort the batch (#144). 
Perf (measured A/B): baseline price levels use `try_emplace` (~+5%, #138) and the order-index hash caps its load factor at 0.25 (~+18.6%, #145); flamegraph regenerated (#135, #139, #146). Determinism preserved throughout (byte-identical fixtures, OCaml differential - pass). `make check`/`make asan` 270/270. -- **Next action:** finish the `v0.2.2` overhaul (this branch): regenerate the remaining stale - `results/*.txt` artifacts, then cut the `v0.2.2` tag/release. After that, the highest-value - remaining work is non-code and gated: issue #94 (independent external review, needs a human - reviewer) and issue #90 (full cache-counter PMU evidence, needs a PMU microarchitecture that - exposes cache events, e.g. x86_64). + pass). The wave then gained a documentation overhaul (#147, #149), a `PERFORMANCE.md` evidence + report (#148), and a bug/style/mermaid sweep (#150), all tagged as `v0.2.2`. `make check`/`make + asan` 272/272. +- **Next action:** none, `v0.2.2` is released. The highest-value remaining work is non-code and + gated: issue #94 (independent external review, needs a human reviewer) and issue #90 (full + cache-counter PMU evidence, needs a PMU microarchitecture that exposes cache events, e.g. x86_64). - **Blockers:** issue #90 is now a *cache-counter* PMU gap, not a host-access gap, this bare-metal Apple M2 exposes real `cycles`/`instructions`/`branches`/`branch-misses` but its PMU does not implement `cache-references`/`cache-misses`; closing it needs a PMU microarchitecture that exposes @@ -865,14 +866,15 @@ Quant Systems Lab. Linux Systems + Exchange Infrastructure Simulator ## Next action remains -`v0.2.1` is the latest tag, on top of `v0.2.0` (PR #127 ded6e80) and `v0.1.0`. A post-v0.2.1 -hardening + perf wave (#135, #146) is squash-merged to `main` and **unreleased**, being cut as -`v0.2.2`: out-of-domain enum rejection in the decoders (#136); network hardening. 
EINTR retry, -accept fairness, connection cap, UDP send-error tracking, transient-accept survival, and fd-exhaustion -handling (#137, #140, #143); CLI arg validation (#141); a real UBSan abort gate (#142); OCaml -`diff_report` robustness (#144); and two measured order-book perf wins, `try_emplace` (~+5%, #138) -and the order-index load-factor cap (~+18.6%, #145), with the flamegraph regenerated (#135/#139/#146). -`make check`/`make asan` 270/270. The committed perf artifacts remain **partial hardware PMU +**`v0.2.2` is the latest tag**, on top of `v0.2.1`, `v0.2.0` (PR #127 ded6e80), and `v0.1.0`. It +bundled the post-v0.2.1 hardening + perf wave (#135, #146): out-of-domain enum rejection in the +decoders (#136); network hardening. EINTR retry, accept fairness, connection cap, UDP send-error +tracking, transient-accept survival, and fd-exhaustion handling (#137, #140, #143); CLI arg +validation (#141); a real UBSan abort gate (#142); OCaml `diff_report` robustness (#144); and two +measured order-book perf wins, `try_emplace` (~+5%, #138) and the order-index load-factor cap +(~+18.6%, #145), with the flamegraph regenerated (#135/#139/#146); plus a documentation overhaul, a +`PERFORMANCE.md` v0.1.0-to-v0.2.2 evidence report, and a bug/style/mermaid sweep (#147-#150). +`make check`/`make asan` 272/272. The committed perf artifacts remain **partial hardware PMU evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host, real cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by the Apple Silicon PMU, not NIC-offload, latency, or full hardware-PMU evidence. diff --git a/docs/release_readiness.md b/docs/release_readiness.md index 2eaf811..d7ea7cf 100644 --- a/docs/release_readiness.md +++ b/docs/release_readiness.md @@ -16,8 +16,8 @@ after squash-merge. | Check | Result | |---|---| -| `make check` | 270/270 tests pass, no warnings (incl. 
the FIX-adapter, flamegraph-renderer, decoder enum-rejection, and CLI-arg-validation tests) | -| `make asan` (ASan + UBSan) | 270/270, sanitizer-clean; the UBSan gate now **aborts** on the first violation (`-fno-sanitize-recover=undefined`, #142), so pure-UBSan defects no longer pass green, and the tree is clean under it | +| `make check` | 272/272 tests pass, no warnings (incl. the FIX-adapter, flamegraph-renderer, decoder enum-rejection, CLI-arg-validation, and perfeval-harness tests) | +| `make asan` (ASan + UBSan) | 272/272, sanitizer-clean; the UBSan gate now **aborts** on the first violation (`-fno-sanitize-recover=undefined`, #142), so pure-UBSan defects no longer pass green, and the tree is clean under it | | `make tsan` (ThreadSanitizer) | 20/20 concurrency-labelled tests, race-clean | | `make check-fixtures` | committed differential fixtures match current C++ output | | `make check-manifest` | provenance manifest matches the committed fixtures | @@ -93,9 +93,10 @@ verification. ## Outcome -Release-ready as a portfolio artifact. `v0.2.1` is already tagged (FIX adapter #29, perf flamegraph -issue #32, anchor sweep) on top of `v0.2.0` (Phase III/IV systems work, M24-M49, plus the bare-metal -evidence refresh). The next GitHub-only release is **`v0.2.2`**, bundling the post-v0.2.1 -hardening + perf wave merged to `main` (#135, #146): decoder enum rejection, network/CLI hardening, a -real UBSan abort gate, OCaml diff_report robustness, and the two measured order-book perf wins. It -requires explicit human approval and a squash-merge before tagging. +Release-ready as a portfolio artifact. `v0.2.2` is tagged on top of `v0.2.1` (FIX adapter #29, perf +flamegraph issue #32, anchor sweep) and `v0.2.0` (Phase III/IV systems work, M24-M49, plus the +bare-metal evidence refresh). 
`v0.2.2` bundled the post-v0.2.1 hardening + perf wave (#135-#146: +decoder enum rejection, network/CLI hardening, a real UBSan abort gate, OCaml diff_report robustness, +and the two measured order-book perf wins) plus a full documentation overhaul (#147, #149), a +reproducible performance-evidence report comparing v0.1.0 to v0.2.2 (#148), and a bug/style sweep +with mermaid diagrams (#150). Each release is a GitHub-only tag with explicit human approval.