Skip to content

0.9.0 hardening — verifiability/enforcement/crash-integrity, netbat+TLS, crypto-shred#153

Merged
heyoub merged 51 commits into
mainfrom
feat/0.9.0-hardening
Jul 2, 2026
Merged

0.9.0 hardening — verifiability/enforcement/crash-integrity, netbat+TLS, crypto-shred#153
heyoub merged 51 commits into
mainfrom
feat/0.9.0-hardening

Conversation

@heyoub

@heyoub heyoub commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

0.9.0 hardening

Pre-0.9.0 truth-up audit (W1–W5) + the full backlog + a complete crypto-shred subsystem. 37 commits; every one diff-reviewed + gate-verified locally (structural, clippy -D, public-api baselines, doctests) before it landed.

🔴 Headline: a CRITICAL data-corruption bug caught + fixed

CompactionStrategy::Retention/Tombstone silently made every surviving event unreadable (survivors were re-encoded as a msgpack map where the reader expects raw bytes) — present in the index but undecodable via get/walk_ancestors/project. It would have shipped in 0.9.0 invisibly. Survivors now re-emit their original payload bytes verbatim; event_hash is byte-stable across compaction. (#130)

By workstream

  • W1 verifiability — signing policy + fail-closed signing, Store::verify_chain(), ChainVerification, receipt-safety defaults (Blake3, fail-closed sink).
  • W2 enforcementEventPayloadValidationFailFast default (kind collision + incomplete upcast refuse open), capability tokens enforced at checkout, effect axes backed (read_event/query_projection/emit_receipt/use_host_control).
  • W3 crash-integrity — routed the crash-sensitive FS ops through StoreFs + a torn-publish reopen oracle; observable ancestry-walk boundary.
  • W4 netbat — worker-panic containment, unified flume concurrency (ConnectionLimit + concurrent subscriptions), opt-in server-only TLS (tls feature, rustls), exhaustive ERR golden pins, documented trusted-transport / no-auth stance.
  • W5 docs — published-docs truth sweep + backlog docs currency + zero-domain sweep of the examples.
  • verify_registry() + opt-in startup-registry-check (release-binary kind-collision check).

🔐 Crypto-shred (opt-in payload-encryption feature)

A complete encrypt-at-rest + cryptographic-erasure subsystem: user payloads encrypted under per-scope keys (XChaCha20-Poly1305); Store::shred_scope(selector) destroys a scope's key → its plaintext becomes permanently unrecoverable while verify_chain/receipts/signatures stay byte-for-byte intact (identity is over the stored ciphertext). Key-aware across every read consumer — append/read, projection, compaction, delivery, ancestry. Durable crash-safe keyset (fail-closed on corruption); a newly-minted key is flushed durable before the data it encrypts is acked. The default build pulls no AEAD dep. batpak knows only "key for scope X destroyed"; the app layer maps erasure to its own policy.

Release readiness

  • Versions 0.8.3 → 0.9.0 (family + kernel track; tools stay 0.1.0); pins consistent (check-version-pins: ok).
  • cargo deny check green (allowed BSD-3-Clause + ISC for the crypto/TLS stack; dropped the unmaintained rustls-pemfile for rustls's built-in PEM parser).
  • CHANGELOG stamped [0.9.0]; docs current; all 3 public-api baselines match.
  • Not yet run: the heavy-validation batch (cloud mutation/fuzz/coverage) — that's the CI/review pass.

Pre-1.0 breaking changes (migration notes in the CHANGELOG): max_connectionsConnectionLimit; use_host_control widened to a subset-checked axis; the FailFast / Blake3 / fail-closed-sink default flips.

🤖 Generated with Claude Code

https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6

Summary by CodeRabbit

  • New Features

    • Added opt-in payload encryption with crypto-shredding, durable scope-based key storage, and shred_scope with shredded-safe behavior across reads, deliveries, projections, and ancestry.
    • Added verify_chain() and stronger startup/open-time verification via verify_registry(); optional full chain recomputation supported.
    • Updated network services with optional server-only TLS, ConnectionLimit, and configurable subscription dispatch (concurrent vs sequential).
  • Bug Fixes

    • Payload validation now defaults to fail-fast on collisions (opt-in to warn).
    • Improved durability and consistency across compaction, checkpoints, recovery, and receipt/signing behavior; refreshed examples.

Greptile Summary

This PR delivers the full 0.9.0 hardening pass across five workstreams: a critical data-corruption fix in compaction, an opt-in crypto-shred subsystem (payload encryption + shred_scope), chain verification (verify_chain + ChainVerification::Recompute), fail-closed signing policy defaults, and opt-in server-only TLS for netbat with a reworked ConnectionLimit admission model.

  • Compaction bug fixed: write_scanned_entry now re-emits entry.payload_bytes verbatim instead of re-serializing the decoded serde_json::Value, which was writing a msgpack map where readers expect raw bytes — making every compaction survivor unreadable.
  • Crypto-shred durability fence: the durability invariant (key durable before ciphertext durable) is enforced via a dirty flag that persists across failed flushes; needs_fence = guard.is_dirty() rather than = minted ensures a key stranded by a failed fence-flush is re-flushed on the next write, not silently skipped.
  • TLS subscription (netbat): a single-threaded multiplex drives rustls over one worker thread by toggling the socket between non-blocking (control drain) and blocking (delivery write), avoiding the split-socket constraint that makes try_clone impossible over StreamOwned.

Confidence Score: 5/5

The PR is safe to merge; the compaction data-corruption fix and the crypto-shred durability fence both implement their invariants correctly.

Every changed code path examined lines up with its stated intent. The compaction fix re-emits payload_bytes verbatim, directly addressing the msgpack-map-vs-raw-bytes mismatch. The crypto-shred fence uses guard.is_dirty() rather than minted, correctly re-raising the fence after a failed prior flush. Signing is now fail-closed by default. No logic errors, data-loss paths, or security regressions found.

No files require special attention. The heavy-validation batch noted as not yet run in the PR description is the natural next CI gate.

Important Files Changed

Filename Overview
bpk-lib/crates/core/src/store/lifecycle_compact.rs Fixes critical compaction data-corruption: write_scanned_entry now emits entry.payload_bytes verbatim instead of re-serializing the decoded serde_json::Value. FS ops routed through StoreFs for fault injection.
bpk-lib/crates/core/src/store/write/writer/encrypt.rs New file: durability fence for crypto-shred on the single-append path. needs_fence = guard.is_dirty() correctly re-raises the fence after a prior flush failure, closing the key-stranding gap flagged in the prior thread.
bpk-lib/crates/core/src/store/keyscope.rs New file: KeyStore, KeyScope, PayloadKey, and scope derivation. dirty flag cleared only on successful flush; mark_dirty called on mint and destroy. Scope discriminants are stable explicit constants.
bpk-lib/crates/core/src/store/keyscope/persist.rs Atomic single-file keyset persistence; fail-closed on corrupt keyset. Transient key copies zeroized explicitly after serialization. Symlink-leaf guard on load.
bpk-lib/crates/core/src/store/write/writer/batch.rs Adds per-item seal results and a single batch-level durability fence; needs_fence aggregated across all items before any frames are written.
bpk-lib/crates/core/src/store/signing.rs Adds allow_downgrade field and SigningPolicy; sign_append_receipt now returns Result and fails closed unless downgrade is explicitly permitted.
bpk-lib/crates/netbat/src/transport/tls.rs New file: server-only TLS config. from_pem builds rustls config with ring provider, no client auth, TLS 1.2+1.3. Handshake runs post-permit; failures counted, never listener-fatal.
bpk-lib/crates/netbat/src/transport/stream_tcp_tls.rs New file: single-threaded TLS subscription multiplex. Socket toggled non-blocking only during control drain, restored blocking before delivery writes.
bpk-lib/crates/netbat/src/transport/limiter.rs New ConnectionLimit enum replacing pre-0.9 lifetime-only max_connections. Concurrent uses a flume permit pool; ConnectionPermit Drop returns the slot on all exit paths.
bpk-lib/crates/core/src/store/receipt_verification.rs Adds is_signed() distinguishing cryptographic proof from is_valid(). Small, well-tested addition.

Reviews (9): Last reviewed commit: "chore(mutants): witness the all-features..." | Re-trigger Greptile

heyoub and others added 30 commits June 30, 2026 10:57
`operation_name.rs` listed "TS client" among the validators that must reach for
the canonical `OperationName` grammar. The deterministic TS client (P6) is cut
from the 0.9.0 surface, so drop the dangling reference rather than let it keep
resurfacing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ility)

Two verifiability holes in receipt signing, both closed as settings with the
SAFE path as the default — nothing removed.

1. Unsigned receipts verified as VALID. A keyless store returned
   `UnsignedAccepted` and `is_valid()` reported `true`, so "I verified this
   receipt" passed green on a receipt carrying no cryptographic proof. New
   `SigningPolicy::Required` (the rigor opt-in) refuses to open a store with no
   signing key; `Optional` (default) keeps the keyless "regular store" working.
   Added `ReceiptVerification::is_signed()` so a caller can demand cryptographic
   authenticity instead of conflating it with `is_valid()`.

2. A configured signer SILENTLY emitted unsigned. On a signature-cover build
   failure a configured signer downgraded to an unsigned receipt and returned
   Ok. Now `sign_append_receipt` returns `Result` and fails the append closed;
   `StoreConfig::with_signing_downgrade_allowed(true)` is the explicit opt-in
   that keeps the best-effort downgrade path alive.

Behavior-preserving: extracted `enforce_expected_sequence` from `handle_append`
so it stays under its complexity ratchet after the new fail-closed `?`.

Red fixtures (tests/signing_policy.rs + inline): Required+keyless refuses to
open; `is_signed` != `is_valid` for an unsigned-accepted receipt; downgrade is
opt-in; cover-failure is fatal unless downgrade is allowed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
… baseline

A plain read trusted the self-reported `event_hash` (guarded only by the
per-frame CRC), and `verify_chain` had ZERO production callers — so the
"tamper-evident chain" claim was CRC-grade, not blake3-recompute-grade.

Add `Store::verify_chain() -> ChainVerificationReport`: recomputes blake3 over
every committed event's actual content bytes, confirms it matches the stored
`event_hash`, then confirms every non-genesis `prev_hash` references a verified
event. On-demand and O(events): a regular store pays nothing; a regulated one
calls it for genuine tamper evidence.

Also refreshes traceability/public_api/batpak.txt for the new surface across
this branch: `SigningPolicy`, `StoreConfig::with_signing_policy`,
`ReceiptVerification::is_signed`, `StoreConfig::with_signing_downgrade_allowed`,
`Store::verify_chain`, and `ChainVerificationReport`.

Red fixture (tests/chain_verification.rs): a multi-entity store verifies intact;
the report recomputes every event and flags no mismatch or dangling link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…d sink (W1)

Two W1 verifiability MAJORs where the runtime silently produced unverifiable
receipts, both closed as safe-defaults with an explicit opt-out (the
SigningPolicy idiom):

1. Receipts were unhashed by default. `ReceiptHashPolicy` defaulted to
   `Deferred` (`.hash()` -> None), so every receipt recorded
   `input_hash=None, output_hash=None` and bound to no bytes. New `Blake3`
   variant is the default (32-byte digest over the raw input/output bytes);
   `Deferred` stays reachable as the explicit opt-out for a layer that hashes
   and binds the bytes itself.

2. A Core built without a `receipt_sink` silently dropped every receipt.
   `build()` now fails closed with `BuildError::MissingReceiptSink` unless the
   caller wired a sink or stated the intent with `CoreBuilder::without_receipts()`.
   hostbat's production path opts out explicitly, so the absence is a stated
   choice rather than a silent drop. (Whether hostbat itself should require a
   host-level sink is a separate, deliberate follow-up.)

Red fixtures (tests/runtime.rs): a sinkless build without opt-out is rejected;
the DEFAULT hash policy binds the receipt to Blake3(input)/Blake3(output) — both
were None under the old default. Cross-crate sinkless test cores
(syncbat/netbat/hostbat) opt out. Public-api baseline refreshed (+6).

Diff reviewed + gates re-verified (structural-check ok, syncbat tests green,
clippy -D clean) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…pen (W1)

`Store::verify_chain()` recomputes blake3 on demand; this adds the SETTING that
runs it automatically at open — the "do both" knob the owner asked for:

- `ChainVerification::Crc` (default): trust the per-frame CRC, no rehash at open —
  a regular store pays nothing.
- `ChainVerification::Recompute` (opt-in): recompute blake3 over every committed
  event at open and FAIL CLOSED with `StoreError::ChainVerificationFailed` on any
  content-hash mismatch or dangling chain link — the regulated tamper-evidence
  posture.

Wired into both the read-write and read-only open paths via a shared
`run_open_chain_verification` helper, with a pure `chain_verification_failure`
decision split out so Recompute-vs-intact is unit-testable without forging
on-disk tampering. The new `StoreError::ChainVerificationFailed { mismatches,
dangling }` variant is threaded through every exhaustive match; its `Display`
body is a delegated helper so `Display::fmt` stays under its complexity ratchet.

Also corrects the recovery_manifest doc: the content `event_hash` is CRC-guarded
by default and blake3-recompute-verified only under Recompute / `verify_chain()`
— not unconditionally "unforgeable".

Red fixtures: a Recompute open of an untampered multi-entity store opens intact;
the failure decision maps a non-intact report to `ChainVerificationFailed` with
the right counts; the `Display` names both. Public-api baseline +11 (chain surface).

Diff reviewed + gates re-verified (structural-check ok, tests green, clippy -D
clean, baseline delta is exactly the chain surface) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ail closed (W2)

A linked-binary `EventKind` collision (two payload types claiming the same
`(category, type_id)`) gives the binary ambiguous wire identity — a build/wiring
bug, not a runtime warning. `EventPayloadValidation` defaulted to `Warn`
(log-once-and-proceed), so a colliding binary opened anyway.

Flip the default `Warn` -> `FailFast`: `Store::open` now refuses to open when the
linked payload registry contains a collision. `Warn` and `Silent` stay reachable
as explicit opt-outs — the same safe-default/escape-hatch idiom as the signing,
receipt-hashing, and chain-verification defaults.

Blast radius was exactly ONE site: the kind-collision-composer fixture's
default-open test (it links a real cross-crate collision) now explicitly requests
`Warn` and is renamed accordingly. The other collision fixtures are compile-only
(never open a store) and are unaffected. No public-api change (moving `#[default]`
leaves the surface text identical).

Red fixture (tests/event_payload_collision_default_fail_fast.rs): seeds a real
link-time collision via `inventory::submit!` (bypassing the derive's cfg(test)
panic-test) and proves default-FailFast refuses the open while explicit-Warn
still opens; both guard against vacuity with a precondition assert.

Follow-up flagged (not built): the derive's collision check is still
`#[cfg(test)]`-only, so a release binary that registers payloads but never opens a
Store still gets no check — a real linkable assertion needs life-before-main
linkage beyond this crate's machinery.

Diff reviewed + gates re-verified (structural ok, red fixture 2/2, unaffected
tests 23/23, clippy -D clean, baseline unchanged) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…nted (W2)

`OperationEffectRow.requires_capabilities` was decorative: a free-form token
declared via `requires_capability(...)` (or the macro) landed in the declared
row and was checked against nothing — no grant set existed, so a declared
capability could never deny. Confirmed zero production readers of
`requires_capabilities()` before this.

Give `Core` a runtime-granted capability set (`CoreBuilder::grant_capability` /
`grant_capabilities`) and enforce `declared.requires_capabilities ⊆ granted` at
checkout — failing closed (a `capability.denied` denial receipt +
`RuntimeError::denied`) before the handler/guard runs, mirroring the existing
observed-effect-row denial.

Design note: the five effect-axis tokens (`event.read`, `event.append`,
`projection.query`, `receipt.emit`, `host.control`) are AUTO-declared by the
effect builders and already mediated by the observed⊆declared effect-row check,
so they are ambient and skipped by the grant gate (`is_reserved_effect_capability`
reuses the auto-population's own consts). Only the remaining free-form tokens —
the actual decorative-until-now surface — are gated. Zero blast radius: existing
ops declare only the ambient tokens.

Red fixture (tests/capability_authz.rs): an op declaring `requires_capability` on
an ungranted Core is DENIED with `capability.denied` naming the token; the same
op on a granted Core succeeds (both setters); an op with no extra tokens still
runs on an ungranted Core. Baseline +4 (the two setters).

Follow-up flagged: a dedicated capability-grant invariant (couples to
invariants.yaml + capability-snapshot + docs-catalog + README count) was left
unminted — out of scope for this local change; the gate cites the existing
effect-row enforcement invariant for now.

Diff reviewed + gates re-verified (structural ok, capability_authz + effect tests
green, clippy -D clean, baseline = the 2 setters) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
`#[batpak(version = N)]` compiled fine but, with no registered upcast chain,
silently stranded old events — they became undecodable at READ time
(`UpcastError::MissingStep`), an author-time footgun the derive never caught.

Catch it at open instead: for every registered payload kind declaring
`PAYLOAD_VERSION = N > 1`, verify the linked `Upcast` registry covers every hop
`1 -> ... -> N`; an incomplete chain now FAILS `Store::open` closed
(`StoreError::UpcastChainIncomplete`) naming the kind, its version, and the
missing hops — rather than letting historical events rot until first read.

- `macros-support`: `EventPayloadRegistration` gains a doc-hidden `payload_version`
  (stamped by the derive) so a binary-wide scan can enumerate `(kind, version)`;
  new `find_incomplete_upcast_chains()` mirrors `find_kind_collisions()` over the
  same link-time inventory — no new life-before-main machinery.
- `event::upcast`: public `IncompleteUpcastChain` / `UpcastChainRegistryError`
  (keyed by `EventKind`) + cached validate/revalidate, mirroring `event::payload`.
- `open.rs`: the existing payload-registry validation splits into collision +
  upcast-chain passes, both under the single `EventPayloadValidation` policy —
  default `FailFast` refuses, `Warn`/`Silent` are the explicit opt-outs (same knob
  the collision check already uses; deliberate, to avoid a parallel policy).
- testkit StoreError contract + prelude extended for the new variant.

Red fixtures (separate binaries — registries are binary-global): a `version=2`
kind with NO upcast step fails to open naming the missing hop `1`; explicit
`Warn` opens despite it; a `version=2` kind WITH a `1->2` step and a `version=1`
kind both open clean. Gate-bites proven: neutralizing the check turned the red
fixture red, restoring it returned green.

Diff reviewed + gates re-verified (structural ok, new fixtures + schema_evolution
green, clippy -D clean, baseline = the upcast surface only) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
Of the 5 EffectBackend axes only `append_event` had a production impl; the other
four fell through to typed `EffectError` "not supported" defaults, so an op
declaring an effect it can't perform could register but never succeed.

Implement the two that wire cleanly to what `StoreEffectBackend` holds (its store
+ bound coordinate):
- `read_event` — mediates the declared read through the real read-by-id path
  (`by_entity` -> `read_raw`): genuine index lookup + disk read + decode, so a
  declared event read succeeds (and a corrupt-store read surfaces as
  `EffectError`) instead of unconditionally erroring. The effect-backend layer is
  effect MEDIATION (the handle records the observed read for the
  observed ⊆ declared check); event data itself flows via the store read API.
- `query_projection` — mediates the declared projection read through the
  coordinate's scope query (`by_scope`); type-erased (a trait object cannot name
  the projection `T`), so it wires to the untyped scope read the fold replays over.

`emit_receipt` and `use_host_control` deliberately stay on their fail-closed
defaults and are FLAGGED, not half-built:
- `emit_receipt` — the sink is Core-level (not held by the store backend) and the
  axis carries only a kind token, not a full `ReceiptEnvelope`; backing it needs
  the emit API widened + the Core sink plumbed in (follow-up).
- `use_host_control` — host authority (hostbat), not a store concept; belongs to a
  host-layer backend (kernel track).

Red->green fixtures (tests/store_effect_backed.rs, 6): read + query ops now
succeed end-to-end (each failed with the exact stub message before the impl); the
append-only backend still fails both closed; emit_receipt/use_host_control stay
fail-closed (pins the flagged axes). No public-api change (override signatures
match the trait defaults the baseline already lists).

Diff reviewed + gates re-verified (structural ok, store_effect_backed 6/6 +
effect_enforcement green, clippy -D clean) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
… C2, HIGH)

The atomic-rename/persist cluster — `rename`, `remove_file`, `named_temp_in`,
`persist_temp_with_parent_sync` — were free functions BYPASSING the `StoreFs`
trait, so the deterministic `SimFs` could not fault them. Yet they run the most
crash-sensitive paths of the SHIPPED crate: compaction swap/rollback, visibility-
range persist, and cursor-checkpoint persist. The crash harness could never tear
those atomic-rename sequences — directly undercutting the crash-recovery rigor.

Move the cluster onto the `StoreFs` trait:
- Trait gains `rename`/`remove_file`/`named_temp_in`/`persist_temp_with_parent_sync`
  (`remove_file_if_present` is a provided default over the one faultable primitive).
- `RealFs` delegates byte-for-byte to the existing `platform::fs::*`/`sync::*` free
  fns — production-identical (14 compaction tests + the existing pre-swap-rename
  rollback test still green).
- `SimFs` gains a deterministic `CrashOp { Rename, RemoveFile, PersistTemp }` fault
  schedule (mirroring its `enospc_on_copy`), so each routed op is fault-injectable.
- The compaction (`lifecycle_compact`), visibility (`hidden_ranges`), and
  cursor-checkpoint (`delivery/cursor`) call sites now dispatch through `config.fs()`.
  Public `Cursor::save_checkpoint` is preserved (delegates to RealFs) with a new
  `pub(crate) save_checkpoint_with_fs` holding the routed body — no public-api change.

Proof (sim/atomic_fault.rs, 3): each pairs a RealFs CONTROL (succeeds —
behavior-preserving) with a SimFs fault on the SAME op — visibility persist and
checkpoint persist surface `Err`, and a compaction-swap rename tear yields a clean
`CompactionOutcome::Failed` rollback. Unfaultable before (the free fns took no fs
handle). The `STORE-PLATFORM-FS-ROUTING` boundary list + 0.9.0 release witness are
updated: only `read_exact_at` remains a direct free fn.

Flagged follow-ups (precise, not half-built): `read_exact_at` (a positioned read —
needs fs threaded into Reader/fd-cache + a ReadAt fault model); the
`write_file_atomically` cold-start-artifact seam (pending-compaction marker +
checkpoint/mmap/idempotency — route as one shared follow-up); and a full
reopen-after-torn-publish crash-recovery oracle over the now-routed persists (the
`sim/recovery` harness can host it).

Diff reviewed + gates re-verified (structural ok, 3 proof fixtures + 14 compaction
tests green, clippy -D clean incl. dangerous-test-hooks) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…no longer silent (W3 C5)

`Store::walk_ancestors` returned a bare `Vec`, collapsing two very different
outcomes into the same shape: a chain that genuinely reached genesis, and one
TRUNCATED early because a Retention compaction dropped a mid-chain event (leaving
a surviving descendant whose `prev_hash` dangles — `parent_event_id_by_hash`
returns None, the walk just `break`s). The dangling-parent case had no log and no
diagnostic: silently lossy, indistinguishable from a complete chain.

Make the boundary observable:
- New `pub enum AncestryBoundary` (ReachedGenesis / LimitReached /
  MissingParent{child} / ReadFailure{event_id} / Cycle{event_id} / NoAnchor) and
  `pub struct AncestorWalk { ancestors, boundary }` with `reached_genesis()` /
  `truncated_at()`.
- `collect_ancestors` returns the boundary; new `Store::walk_ancestors_outcome`
  exposes it; `Store::walk_ancestors` delegates and keeps its `Vec` signature
  (delegate-to-variant, public API preserved — baseline +18 additive).
- No `StoreError` variant: a walk boundary is a normal outcome, not an error.

Coherence proof (tests/store_ancestors_retention_coherence.rs): a Retention
compaction drops a mid-chain parent; the walk from a surviving descendant now
reports `MissingParent{child}`, `reached_genesis()==false`, `truncated_at()==Some`
— not a silent short prefix. Non-vacuous (asserts the compaction Performed and the
dropped event is NotFound). Companion proves an intact chain reports ReachedGenesis.

Diff reviewed + gates re-verified (structural ok, coherence 2/2 + ancestry +
14 compaction tests green, clippy -D clean, baseline additive) before commit.

NOTE: while building this, the agent surfaced a SEPARATE critical data-corruption
bug in Retention/Tombstone compaction (survivors written as a decoded Value but
read back as raw bytes -> unreadable). Tracked + fixed separately; this C5 test is
deliberately constructed to read only the live anchor, never a corrupted survivor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…vent's payload (CRITICAL)

CONFIRMED data-corruption bug in the shipped store. `write_scanned_entry` (the
Retention/Tombstone per-survivor write path) built `FramePayload { event: entry.event }`
and frame-encoded it — but `entry.event` is `Event<serde_json::Value>` (the DECODED
payload, kept for the keep/drop predicate), so the survivor's payload was serialized
as a msgpack MAP. Every read path decodes a frame as `FramePayload<Vec<u8>>`
(`decode_frame_payload_raw`), where `event.payload` must be raw BYTES. Map-where-bytes
-> `Serialization(Syntax("invalid type: map, expected a sequence"))`.

So after ANY `CompactionStrategy::Retention` or `Tombstone` compaction, every
SURVIVING event was present in the index but UNREADABLE via `get`/`walk_ancestors`/
`project`. `Merge` was immune (it byte-copies frames). It shipped silently because no
test ever read a survivor's payload after a Retention/Tombstone compaction — existing
tests only assert dropped->NotFound and index counts.

Fix: carry the survivor's ORIGINAL `event.payload` bytes on `ScannedEntry.payload_bytes`
(captured in the scan's existing raw decode — zero extra work, zero user-payload
re-encode); `write_scanned_entry` rebuilds an `Event<Vec<u8>>` from those bytes + the
verbatim header + verbatim `hash_chain`, re-encoding only the outer frame envelope.
Because every field is verbatim and msgpack is deterministic, a kept frame is
byte-identical to the original — so the survivor reads back faithfully AND its
`event_hash` (blake3 over the payload) is byte-stable across compaction (no chain/
receipt drift). The decoded `Value` stays on `entry.event` purely for the Retention
predicate (keep/drop semantics unchanged).

Red->green proof (tests/store_compaction_survivor_payload.rs, 2): a Retention and a
Tombstone compaction each KEEP a survivor `S` in a sealed merged segment; `get(S)`
reads back the ORIGINAL payload, `walk_ancestors` surfaces it, and the POST-compaction
stored `event_hash` equals `S`'s PRE-compaction append-receipt `content_hash`
(byte-stability, not just decodability). Both were RED before (the survivor `get`
panicked with the exact decode error). Non-vacuous (compaction `Performed`, >=1 segment
removed, the doomed event NotFound / tombstoned).

Flagged, not fixed (separate semantics): whether a TOMBSTONE should redact its payload
/ recompute its hash (it currently keeps the original bytes with the kind rewritten) is
a design question, untouched here.

Diff reviewed + gates re-verified (structural ok, 2/2 proof + compaction 14 +
idempotency 6 + ancestry 2 green, clippy -D clean) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…nger poisons the listener (W4)

A panic in a connection worker (a buggy handler, an overflow-checked wrap) was
contained during serving (workers run on separate threads), but the listener's
`worker.join().map_err(|_| Io{Other})?` turned a single worker panic into a
listener-WIDE `Err` AND short-circuited the join loop — abandoning every later
worker's join. So one server-side handler bug took down the whole listener.

Catch the panic at the worker boundary: wrap the per-connection serve in
`catch_unwind(AssertUnwindSafe(..))`. A caught panic increments a new
`TcpServeStats::worker_panics` counter, forwards stats, and exits the worker
normally — so `join()` is infallible, the listener returns `Ok`, and the accept
loop keeps serving. The panic is COUNTED, not swallowed (mirrors the existing
`connection_io_failures` observability stance). `max_connections` semantics are
unchanged.

Red->green (tests/tcp_transport.rs): a real localhost listener drives one
connection into a panicking handler (a genuine out-of-bounds index, not a
panic-macro, to respect the zero-panic lint), then a clean request on a second
connection; asserts the server returns `Ok` with `served_requests == 1` and
`worker_panics == 1`. RED confirmed: reverting to the un-caught worker body fails
with the listener-wide `Io{Other}`. Plus a mutant-killing unit test on the
stats-merge `+=`.

Diff reviewed + gates re-verified (structural ok, netbat lib 13 + tcp_transport 13
green incl. the panic test, clippy -D clean, baseline = the one worker_panics line;
batpak/syncbat baselines byte-identical) before commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ons + ConnectionLimit (W4)

Two netbat exposure findings, fixed on one shared admission model:

1. SUBSCRIPTION SHOWSTOPPER: subscriptions were served inline on the accept thread
   (`run_subscription_loop` blocks until the stream ends). Since subscribers are
   long-lived, only ONE subscriber could ever be connected — a second wasn't
   accepted until the first disconnected. Now each session runs on a
   per-subscription worker (mirroring the request path's `catch_unwind`
   containment), so N subscribers stream concurrently. The existing per-session
   flume control lane + watermark delivery are unchanged — only the session moved
   off the accept thread. `SubscriptionDispatch::{Concurrent (default), Sequential}`
   keeps the prior inline path as an explicit opt-in.

2. CONNECTION-LIMIT FOOTGUN: `max_connections` was a LIFETIME accept budget — the
   listener stopped accepting after N total connections EVER. Replaced (hard, no
   alias — pre-1.0) with `ConnectionLimit::{Concurrent(n) (default), Lifetime(n),
   Unlimited}`. `Concurrent` is a `flume::bounded(n)` permit pool (netbat already
   deps flume — no new primitive): a connection acquires a permit before serving
   and an RAII `ConnectionPermit` returns it on EVERY exit path — normal, error,
   and the caught-panic path. `Lifetime` retains the old budget as an explicit mode
   (both paths built); `Unlimited` is ungated. One pool gates BOTH request and
   subscription connections.

The HLC/clock machinery is deliberately NOT involved — that's event ordering,
orthogonal to a socket cap. Empty-pool behavior is BLOCK (back-pressure, matching
the old exhaustion intent), shutdown-aware. Finished worker handles are pruned
(`retain(!is_finished())`) so a long-lived Concurrent/Unlimited listener doesn't
grow its JoinHandle vec; the stats lane is bounded to the cap (at most that many
workers ever alive to send -> the join phase can't deadlock), unbounded for Unlimited.

Red->green (each RED-confirmed by breaking production + observing the failure):
- subscription_concurrency: two subscribers both stream while both open (Concurrent);
  sequential pins the old one-at-a-time starvation.
- connection_limit: serial N+k all succeed (slot reuse); the (n+1)th blocks while n
  held; a permit releases even when a worker PANICS (composes with the landed
  catch_unwind); Lifetime(n) still stops after n total.

New `limiter.rs` module (permit pool, RAII permit, stats-lane sizing). API hard-break:
`max_connections`/`with_max_connections` -> `connection_limit`/`with_connection_limit`;
+`SubscriptionDispatch`, +`dispatch`, +`worker_panics`.

Diff reviewed (read limiter.rs + the accept-loop integration) + gates re-verified
(structural ok, connection_limit 4/4 + subscription_concurrency 2/2 + full netbat 140
green, clippy -D clean, baseline netbat-only; batpak/syncbat byte-identical).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
netbat shipped no transport security. Add server-only TLS as an OPT-IN,
feature-gated transport — the default build pulls neither rustls nor any TLS dep
(the thin-crate identity is preserved; `cargo tree` confirms no rustls by default).

- `tls` cargo feature -> optional `rustls` 0.23 (ring provider, no aws-lc/cmake) +
  `rustls-pemfile`. All TLS code/types/deps are `#[cfg(feature = "tls")]`.
- `TransportSecurity::{Plaintext (default), #[cfg(tls)] Tls(TlsServerConfig)}`.
  `TlsServerConfig` wraps an `Arc<rustls::ServerConfig>` (no client auth); built
  from PEM bytes or PEM files (`from_pem`/`from_pem_files`). Manual opaque `Debug`
  so key material can't leak. Every cert/key/rustls rejection maps to a typed
  `NetbatError::Io` — no panics, and NO new public error variant (default error API
  byte-identical).
- Sync-first: rustls's blocking `StreamOwned<ServerConnection, TcpStream>` (no
  async). One generic `serve_connection_loop<S: Read + Write>` drives BOTH plaintext
  `TcpStream` and TLS `StreamOwned` — the plaintext path is byte-for-byte unchanged
  (proven by a secured-Plaintext-equals-plain test).
- Handshake runs on the WORKER, post-permit: a slow/hostile handshake occupies one
  worker+permit slot (capped by `ConnectionLimit`), never blocking accepts. A
  handshake failure increments `tls_handshake_failures` and drops the connection —
  never listener-fatal.
- Auth stays OUT by design (a domain concern — the module doc codifies it): TLS here
  is confidentiality + server identity only; callers authenticate above the transport.

Red->green (tests/tls_transport.rs, #[cfg(feature="tls")]): a real rustls client
completes a handshake + `CALL ping` round-trip over the encrypted stream (asserts
`protocol_version().is_some()` — only ever Some after a true handshake, proving it
is not a plaintext fallback); a cleartext client to the TLS listener is rejected
(`served_requests == 0, tls_handshake_failures == 1`). Test PKI is a committed
CA+leaf chain under tests/fixtures (self-signed was rejected by webpki as
CaUsedAsEndEntity; a proper chain is the reliable pattern).

FLAGGED follow-up (not half-built): the SUBSCRIPTION listener's two-thread design
uses `stream.try_clone()`, which `StreamOwned` does not support — TLS there needs a
shared-stream read/write split, not a hack. Precise scope recorded for a follow-up;
the request listener has TLS now.

Both builds verified: default (thin, no rustls) AND --features tls — fmt, clippy x2,
test x2 (TLS: encrypted round-trip + cleartext-rejected + gated units), structural
ok. Baseline netbat-only (TLS-gated items correctly absent from the default-features
baseline); batpak/syncbat byte-identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ll 18 tokens (W4)

The wire ERR frame (`ERR <code> <hex>\n`) draws its token from `code()` (14
NetbatError + 6 RuntimeError variants -> 18 distinct tokens), but only the 2
highest-traffic tokens were full-frame golden-pinned (boundary.rs); the other ~16
were only prefix-asserted, so a silent rename/drift of a less-common ERR token
would pass unnoticed.

Add an exhaustive `tests/err_code_table.rs` table that byte-pins every `code()`
token. `frozen_token()` names every NetbatError + RuntimeError variant explicitly —
a renamed/removed variant is a COMPILE error; a renamed token spelling is RED. Count
tripwires (samples/variants/distinct-tokens) backstop the `#[non_exhaustive]`
add-case (an external tests/ crate can't compiler-force rejection of a newly-ADDED
variant — documented limitation; a new variant lands in `_ => UNPINNED` and trips
the count). Complements (does not duplicate) the 2 full-frame goldens in boundary.rs.

Gate proven to bite: renaming `cursor_too_large` -> `cursor_too_huge` turned 2 of 3
tests RED (code() drift + wire-token drift) with exact messages; reverted -> green.

Test-only — no production or public-api movement. structural ok, 3/3 + full netbat
suite green, clippy -D clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…tiplex (W4)

Completes TLS coverage: the request listener gained TLS earlier, but the
SUBSCRIPTION listener had none. Its plaintext design `try_clone`s the socket to run
a control-frame READER thread alongside the delivery WRITER — impossible over TLS,
where a rustls `Connection` is stateful record-layer machinery unsafe to touch from
two threads and `StreamOwned` isn't cloneable.

Keep plaintext on its proven 2-thread path (byte-for-byte unchanged); add a TLS-only
single-threaded session (`stream_tcp_tls.rs`, `#[cfg(feature="tls")]`) that
multiplexes control reads with delivery writes over the one stream:
- The ONLY blocking wait is `session.poll` (the store event/watermark `recv_timeout`
  wakeup — same cadence as the plaintext writer; NO sleep-spin).
- Between polls, control frames are drained with a NON-BLOCKING rustls read (socket
  flipped non-blocking only for the drain): already-decrypted plaintext via
  `conn.reader().read` first, then `read_tls`+`process_new_packets` for more records,
  returning on the first `WouldBlock`. A `ControlAccumulator` reassembles frames
  across partial reads and forwards each via the SAME `classify_control_line` seam the
  plaintext reader uses, over the same bounded flume lane.
- Delivery writes run with the socket BLOCKING, so `write_all` back-pressure + the
  write timeout behave exactly as plaintext.

Correctness (all in-file documented): a line leaves the accumulator only after a
successful `try_send`, so a Full lane is transient back-pressure, never a dropped
frame; a peer disconnect is retried until the session accepts it (never lost);
`MAX_TLS_READS_PER_DRAIN` bounds an empty-record flood so the drain always yields back
to the delivery poll. Handshake runs on the worker post-permit; a failure increments
`tls_handshake_failures` and drops the session, never listener-fatal.

Red->green (tests/tls_subscription.rs, #[cfg(feature="tls")], reusing the CA+leaf
PKI): a rustls client subscribes over TLS, receives its SUB_EVENT over the encrypted
stream (`protocol_version().is_some()` — real handshake), then sends SUB_CANCEL over
TLS and reads back the honored SUB_END; a cleartext client is rejected
(`tls_handshake_failures == 1`). RED confirmed by stubbing the control drain (cancel
then never honored). + 5 accumulator unit tests (reassembly / terminal / oversize /
back-pressure). Plaintext regression: subscription_concurrency (2 concurrent
subscribers) green in both builds.

Both builds verified: default (plaintext, no rustls) + --features tls — fmt, clippy
x2, test x2, structural ok. Baseline netbat-only (gained
serve_tcp_subscription_listener_secured; tls_handshake_failures gated-absent);
batpak/syncbat byte-identical.

Closes the W4 netbat cluster.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…rust model (W5)

The W1-W4 work updated inline module docs but left the crate-level READMEs and
`//!` guides silent on the new surface. Fill those truth gaps (documentation only —
no public-api movement):

- core (batpak): "Verifiability defaults" — SigningPolicy (default Optional) +
  fail-closed signer; verify_chain() + ChainVerificationReport; ChainVerification::
  Recompute; EventPayloadValidation::FailFast default (kind-collision + incomplete
  upcast refuse open); walk_ancestors_outcome / AncestorWalk (observable truncation).
- syncbat: "Runtime safety defaults" — ReceiptHashPolicy::Blake3 default +
  fail-closed receipt sink (without_receipts() opt-out); capability tokens enforced
  at checkout (grant_capability / grant_capabilities).
- netbat: the W4 surface — ConnectionLimit::{Concurrent(default),Lifetime,Unlimited}
  (a concurrent cap, not the old lifetime budget); SubscriptionDispatch::
  {Concurrent(default),Sequential}; opt-in `tls` feature + TransportSecurity /
  TlsServerConfig, with a feature-gated from_pem doctest.
- netbat "Security / transport trust model" (NEW): no auth by design (a
  downstream-domain concern — authenticate at a fronting proxy / app layer, never in
  netbat); plaintext assumes a trusted transport; opt-in server-only TLS is
  confidentiality + server identity only, never client auth.

Doctests green (batpak 7, syncbat 2, netbat --features tls 5); structural ok
(docs-catalog current); no public-api movement.

Flagged for follow-up (pre-existing, out of W1-W4 scope): the core lib.rs/README
guided-tour example uses a game domain (`PlayerMoved` / `player:alice` /
`room:dungeon`) — a zero-domain-law violation pervasive across core docs +
batpak-examples; a consistent rename to opaque entities/scopes/kinds is its own sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…front-door example (W5 polish)

Three small post-sweep cleanups — no behavior change, no public-api movement:
- D1: rename the private StoreConfig field `allow_signing_downgrade` ->
  `signing_downgrade_allowed` to match the public setter
  `with_signing_downgrade_allowed` (the field is pub(crate) with no public accessor,
  so the public surface is byte-identical — least churn).
- D2: soften netbat's "thin" self-description (now load-bearing after W4's permit
  pool + worker threads + opt-in TLS) to "lean, sync-first ... blocking transport,
  TLS opt-in" — honest, not overclaiming minimalism. The INV-NETBAT-BOUNDARY-THIN
  scope token is left as a stable identifier.
- Zero-domain: the core guided-tour doctest (lib.rs //! + README) used a game domain
  (`PlayerMoved`/`player:alice`/`room:dungeon`). Renamed to the codebase's OWN neutral
  convention — `ThingHappened` (event/payload.rs) + `entity:a`/`scope:1` (store/mod.rs
  + tests) — so the published front-door example is mechanism-level. Doc-only; the
  library is untouched.

Flagged (publish=false, tracked #136): the same example leak in
batpak-examples/src/bin/quickstart.rs — a separate sweep.

Gates: structural ok (218 claims triangulated), signing_policy 4/4, config 16/16,
doctests (batpak 7, netbat 4) green, public-api all three baselines MATCH (no
movement), clippy -D clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
The W2 capability grant-check (core.rs enforce_granted_capabilities, witnessed by
capability_authz.rs::dispatch_denies_operation_requiring_an_ungranted_capability)
works + is tested, but the witness header mis-cited the effect-row invariant. Mint
the dedicated invariant so the enforcement has precise doctrine attribution.

- invariants.yaml: +INV-SYNCBAT-CAPABILITY-GRANT-ENFORCEMENT (101 -> 102), witness =
  the capability_authz denial test, artifacts = the 4 ART-SYNCBAT-*.
- capability_authz.rs: repoint the //! PROVES header (effect-row keeps its citation
  via effect_enforcement.rs — not orphaned).
- artifacts.yaml: add capability_authz.rs to ART-SYNCBAT-TESTS (citation gate).
- README.md: 101 -> 102 named invariants (148 artifacts unchanged).
- Regenerated: 03_INVARIANTS.md catalog block + capability_snapshot.yaml witnessed row.

No code/behavior change; no public-api movement (only the one header line).
structural-check: ok (docs-catalog 102, capability-snapshot mirror current).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…n ctor (#133)

Store::open refuses linked EventKind collisions (FailFast), but a RELEASE binary
that registers colliding payloads and never opens a Store got no check (the derive's
collision check is cfg(test)-only). The derive's inventory registration is already
unconditional, so no derive change is needed — only a scan-invocation path.

Two paths (owner: A4 + optional ctor):
- verify_registry() — a documented public alias over validate_event_payload_registry()
  (re-exported at event::payload / event / prelude). Call it once at startup if your
  binary registers EventPayload types but may not open a Store. Portable, no dep.
- `startup-registry-check` (NON-default) cargo feature -> optional `ctor` dep + one
  central #[ctor] fn that scans at load and, on a collision, writes a diagnostic via
  stderr().write_all (not eprintln — print_stderr is banned) then process::abort().
  Native automatic life-before-main; the default build pulls NO ctor (cargo tree
  confirmed).

Red fixtures (crates/core/fixtures/registry-startup-{collision,ctor}/ + driver
event_payload_registry_startup.rs, --release subprocess, mirroring the downstream
fixture precedent): collide_verify -> exit 1 + "duplicate kind assignment" stderr;
collide_ctor (--features) -> SIGABRT before main; clean_verify (control) -> exit 0.

Baseline +3 (verify_registry at the 3 paths); syncbat/netbat byte-identical. Both
builds green: fmt, clippy x2, build x2 (no ctor by default), structural ok. ctor
clears cargo-deny (MIT/Apache).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…h oracle (#132)

Completes the W3 StoreFs routing tail (the atomic-rename/persist cluster landed in
f905983). All internal (pub(crate)); no public-api movement.

Sub-part 1 — route read_exact_at:
- StoreFs gains `read_exact_at`; RealFs delegates to the existing free fn (which keeps
  the `read_at`/`#[cfg(unix)]`/`FileExt` — the platform_boundary gate forbids those
  outside `platform/`). `UNROUTED_STORE_FS_TAIL_OPS` is now empty.
- SimFs gains a `ReadFaultSchedule` (targeted-Nth, DISTINCT from the CrashOp schedule)
  with `ReadFaultKind::{Io, ShortRead}`, so the positioned read is fault-injectable.
- `Reader` gains an `fs` handle (`Reader::new` +arg); `point_read` reads through it.
  ~22 test call sites + the RecordingFs mock updated.
- Proof (sim/read_fault.rs): a SimFs short-read on the active-segment positioned read
  now surfaces `corrupt_eof` (ShortRead{0}) / `corrupt_segment` (ShortRead{n>0}) — the
  free fn was unfaultable.

Sub-part 2 — route the write_file_atomically cold-start-artifact seam:
- `write_file_atomically_with_fs` variant (thin RealFs wrapper kept); the marker write +
  `clear` (now `fs.remove_file`), cold-start checkpoint/mmap-index, and the
  idempotency-store flush all dispatch through `config.fs()`.
- Proof (atomic_fault.rs): a SimFs PersistTemp fault tears the checkpoint persist
  (unfaultable before — it reached the free fn).

Sub-part 3 — torn-publish reopen oracle (sim/recovery.rs):
- `drive_torn_publish`: append + honored Sync (durable prefix), tear the first routed
  cold-start publish on close, crash, reopen. Oracle: reopen is legal (canonical refusal
  OR `durable_acked <= recovered_visible <= appended` + intact hash chain) — a torn
  cold-start artifact never loses an acked-durable commit; the store falls back to full
  segment scan (the artifact is an optimization, not a correctness dependency).

Diff reviewed (read_at stays in platform/; boundary list empty) + gates re-verified:
structural ok (platform_boundary + ratchet), read_fault 4/4 + atomic_fault 4/4 +
torn-publish 2/2 + scan 35, clippy -D clean (default + dangerous-test-hooks), no
public-api movement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…cked axis + host backend (#128)

`use_host_control` was decorative: zero-arg with `uses_host_controls: bool` — a
declared flag that recorded nothing and couldn't be subset-checked, and no backend
could perform it. Promote it to a first-class effect axis (like the event/projection
axes) AND give it a host-layer backend. Pre-1.0 published-surface widening (no clients
yet; 0.9.0 semver bump regardless).

syncbat (published surface widens):
- `OperationEffectRow.uses_host_controls`: `bool` -> `Vec<String>` (declared control-ids);
  `uses_host_control(control)` appends one (+ auto-declares the ambient `host.control`
  token); `record_uses_host_control` observes; the checkout observed ⊆ declared
  subset/violation check now covers host controls (a handler calling an undeclared
  control is denied `effect.violation`), mirroring the other axes exactly.
  `EffectClass::Control` must declare a non-empty set.
- `EffectBackend::use_host_control(&mut self, control: &str)` + `HostControlHandle::
  use_host_control(control)` (observe-after-perform). `StoreEffectBackend` stays
  fail-closed (a store is not a host); `ValidatingEffectBackend` delegates.
- `#[operation]` macro carries the `uses_host_control` target list.

hostbat (publish=false) — the backend that performs it:
- `HostController` trait (blanket-impl for `FnMut(&str)`) + `HostControlEffectBackend`
  (optional inner store backend + controller; `use_host_control(control)` ->
  `controller.perform(control)`) + `HostBuilder::host_control(controller)` composing the
  layer OUTER over the validated store backend.

Red->green: syncbat `dispatch_denies_host_control_outside_declared_row` (declare
`ctrl.alpha`, call `ctrl.beta` -> Denied, observed records beta) — RED-confirmed by
neutralizing the subset arm; hostbat `host_control_op_performs_through_bound_controller`
(+ without-controller / rejecting-controller fail closed) — RED-confirmed by dropping the
host-control layer.

syncbat baseline blessed (the widened signatures); batpak/netbat byte-identical.
structural ok, effect_enforcement 21/21, hostbat host_control 3/3, clippy -D clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…vocation receipt (#129)

The emit_receipt axis was decorative (a fail-closed &str stub) so an EffectClass::Emit
op declaring emits_receipt could never contribute evidence. The runtime already
auto-banks exactly ONE invocation receipt per op, so (Option B) emit_receipt now STAMPS
the declared kind + opaque payload into that receipt rather than minting a second one —
strongest integrity, no backend sink, one runtime-owned receipt.

- ReceiptEmitHandle gains a `&mut ReceiptMetadata` field; `emit_receipt(kind, payload:
  impl Into<Vec<u8>>)` performs the mediated backend call (observe-after-perform), then
  on success inserts the opaque payload into the LOCAL drawer under a runtime-owned key
  `syncbat.emit_receipt.{kind}`, then records the observed emit.
- The `EffectBackend::emit_receipt(&str)` TRAIT + every impl are UNCHANGED (payload rides
  the handle -> metadata path — the key simplification). StoreEffectBackend stays
  fail-closed (a store isn't a receipt authority).
- `Ctx::receipt_emit_handle` passes `metadata` as a third DIRECT disjoint field borrow
  (observed_effects / effect_backend / metadata); record_runtime_receipt already drains
  metadata.local into the envelope's local_extensions.

Red->green (tests/emit_receipt_backed.rs): an Emit op emits a payload; the fixture
decodes the PERSISTED envelope back off disk (read_raw + canonical decode) and asserts
the payload is in local_extensions under the runtime key. RED-confirmed by dropping the
stamp (op still completes, but the banked receipt loses the evidence).

Baseline: only ReceiptEmitHandle::emit_receipt gains the arg; trait/impl still &str;
batpak/netbat byte-identical. structural ok, effect_enforcement 21/21 + the new fixture,
clippy -D clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ns (#136)

crates/batpak-examples is publish=false but LOAD-BEARING (a compile-gate + API canary
via cargo check --workspace + the examples-observable-output gate). Its bins carried
pre-existing application-domain flavor (game/finance/chat) that no gate scanned (the
vocab firewall only scans the published .crate artifact, which excludes publish=false).
Neutralize the domain skin to mechanism-level; every example teaches the SAME mechanism,
byte-identical event categories/type_ids.

- Coordinates: player:*/room:*/account:*/ledger:*/user:*/chat:* -> entity:*/scope:*
  (opaque tags). Payloads: PlayerMoved/ChatSent/AccountCredited/... -> ThingHappened/
  Recorded/Summarized (neutral fields). Reason strings (page view/signup/credit) ->
  manual/batch/record.
- Two domain-NAMED files renamed to what they teach:
  - dungeon_typestate.rs -> typestate_transitions.rs (door Open/Closed/Locked ->
    Resource Idle/Active/Sealed; typestate mechanism identical).
  - chat_room.rs -> subscription_fanout.rs (chat -> opaque entities; push-lossy vs
    pull-cursor mechanism identical).
  References updated (bin headers, README, traceability/artifacts.yaml ART-EXAMPLES,
  concept_catalog.yaml canonical_example) so the docs-path anti-rot gate stays green.
- Incidentally fixed stale `cargo run -p ln` headers -> `-p batpak-examples` (the
  actual package name).

Grep-proven zero domain nouns remain. cargo check --workspace green (all 22 bins
compile), clippy -D clean, structural ok (docs-catalog + observable-output gates),
traceability-check ok, no public-api movement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ption feature) (#135)

First stage of the crypto-shred / KeyScope tombstone-erasure subsystem (owner chose D:
encrypt-at-rest + destroy-the-key). Foundation only — the KeyStore machinery + feature +
config; the write/read seam + persistence + destroy-on-tombstone are later stages
(write/read paths untouched here).

All behind a non-default `payload-encryption` feature — the default build pulls no AEAD
dep and behaves identically (cargo tree confirmed).

- `chacha20poly1305` 0.10 (optional, pure-Rust XChaCha20-Poly1305, 24-byte random nonce —
  no AES-NI/C, matches the ring-not-aws-lc call) + `getrandom` 0.3 (reuses the version
  already in the graph). `zeroize` already a dep.
- `KeyScopeGranularity::{PerEntity (default), PerCategory, PerTypeId, PerEvent}` +
  `scope_for(granularity, coord, kind, event_id) -> KeyScope` (deterministic,
  discriminant-prefixed, distinct per granularity). Neutral mechanism — a scope is an
  opaque key identity, batpak never learns its meaning.
- `PayloadKey(Zeroizing<[u8;32]>)` — zeroize-on-drop, opaque Debug (no bytes), no
  accessor; `seal`/`open` over XChaCha20-Poly1305 + AAD. `KeyStore` (in-memory):
  `get_or_create` (mint a random 256-bit key via OS CSPRNG), `get`, `destroy` (remove +
  zeroize = the crypto-shred primitive). `KeyStoreError` is oracle-free.
- config: opt-in `with_payload_encryption(granularity)` (default None = today's behavior);
  Debug shows only the granularity, never keys.

Both builds green: default (no AEAD dep) + `--features payload-encryption`. Tests: 9 lib
(scope determinism/distinctness, seal->open round-trip, wrong-key/nonce/aad -> Err no
panic, mint-once, destroy-shreds, Debug no-leak) + 5 integration. structural ok, clippy
-D clean both builds, no public-api movement (gated items absent from the default-features
baseline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…ation (#135)

Stage B of the crypto-shred subsystem: the KeyStore is now durable across reopen, so a
store can decrypt survivors after restart and a destroyed key STAYS destroyed. Still no
data encrypt/decrypt seam (Stage C) — the only write/read-path file touched is open.rs
(cold-start load). All gated behind `payload-encryption`.

- Single-file keyset (`keyset.fbatk`, magic|version|crc32|msgpack body, mirroring the
  idempotency store) atomically rewritten via the crash-safe `write_file_atomically_with_fs`
  seam (#132). Single-file chosen for the ONE atomic publish point — a torn flush leaves
  the on-disk keyset either the OLD complete version or the NEW one, never a half-updated
  mix. Tradeoff flagged: O(keys) rewrite per flush (a journaled keyset can lift that later).
- `KeyStore::flush`/`load` (+ `*_with_fs` fault-injectable seams). Serialized key material
  held in `Zeroizing`; per-entry plaintext key copies wiped the instant they're encoded/decoded.
- FAIL-CLOSED load: wrong magic / short header / CRC mismatch / bad version / decode failure
  / GRANULARITY MISMATCH -> hard `StoreError::KeysetCorrupt` (new gated variant). Deliberately
  UNLIKE the idempotency store's degrade-to-absent — a silently-empty keyset would re-mint
  every scope and permanently crypto-shred all prior ciphertext. Granularity is persisted +
  cross-checked (a mismatch changes every derived scope = silent shred).
- `Store::open` cold-start hook loads the keyset into a gated `Store.key_store`; `payload_key_count()`
  for observability. `StoreFileKind::Keyset` (ungated filename const) so every scan recognizes
  it, never treats it as a segment.
- Threat model documented: keys live in the store dir -> crypto-shred makes DELETION
  cryptographically effective (destroy+flush -> payloads unrecoverable to a full-disk operator),
  but does NOT protect a disk captured BEFORE the shred; keyset-location hardening (separate
  volume / KMS) is a deployment concern. Durability-ordering note for Stage C: a minted key must
  flush durable BEFORE the data it encrypts is acked durable.

Deferred to Stage C: encrypt-on-append / decrypt-on-read; the snapshot/fork keyset copy (needs
a public SnapshotFileKind wire change).

Both builds green. Proofs (--features payload-encryption[,dangerous-test-hooks]):
shred-survives-restart (destroyed key absent + old ciphertext unrecoverable),
corrupt-keyset-fails-closed (garbage/truncated/CRC-flip/granularity-mismatch), crash-safe-flush
(SimFs PersistTemp fault -> old keyset intact, never torn), + 5 cold-start integration.
structural ok, clippy -D clean both builds, no public-api movement (gated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…d seam (#135)

Stage C: user payloads are now encrypted at rest under the per-scope key (XChaCha20-
Poly1305), decrypted on read, and a destroyed key makes the plaintext unrecoverable while
the hash chain stays byte-for-byte intact. All gated behind `payload-encryption`; the
plaintext (None) path is byte-identical to before (proven).

- Header field OUTSIDE the cover: gated `PayloadEncryption { keyscope_id, nonce }` on
  EventHeader (payload_version precedent; skip_serializing_if so plaintext frames are
  byte-identical). Proven outside content_hash/event_hash (blake3 over payload only) AND the
  signing cover (cover_bytes takes no header) — an encrypted event's receipt still verifies
  Signed.
- Encrypt-on-append (writer): scope_for -> get_or_create key -> seal(random 24-byte nonce,
  AAD, plaintext); on-disk payload = ciphertext; event_hash = blake3(ciphertext). AAD =
  entity ++ scope ++ kind ++ event_id (relocation-safe: moving a nonce/ciphertext onto
  another event changes the AAD -> auth fails). Batch hashes ciphertext from the start.
- DURABILITY FENCE: a newly-minted key is flush_with_fs'd durable BEFORE any frame is written
  (happens-before the segment fsync), so no crash can order ciphertext-durable ahead of
  key-durable under any sync mode; flush failure fails the append/batch closed.
- Decrypt-on-read: key present -> open (auth-fail -> typed PayloadDecryptFailed); key ABSENT
  (shredded) -> `Shredded` disposition / PayloadShredded (never the ciphertext, never a
  corruption error). The decode seam refuses to Value-decode ciphertext (fail closed for
  projection/compaction).
- verify_chain UNCHANGED (hashes stored ciphertext) — holds over encrypted events AND after
  a shred.

Both builds green; plaintext byte-identical (event_api 41/41 default). 7 crypto proofs
(round-trip + on-disk-ciphertext, verify_chain-over-ciphertext, signature-over-encrypted,
shred->Shredded+chain-intact, durability-fence, batch, plaintext-byte-identity) + AAD
relocation-binding. structural ok, clippy -D clean both builds, no public-api movement.

DISCLOSED boundaries (fail-closed, not nerfed — tracked for follow-up decisions):
(1) system lifecycle events (SYSTEM_OPEN_COMPLETED) encrypt like any payload (mints a
batpak:store key on first open) — a plaintext carve-out is one line if wanted; (2) live
reactor delivery of an encrypted event yields no envelope (silent non-delivery) — needs
key-aware reactor decrypt (or fail-loud); (3) projection replay / content-based compaction
over encrypted entities fail closed (need key-aware read).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…tem carve-out + invariant (#135)

Stage D: the erasure trigger + the system-events carve-out + the doctrine invariant.

- System-events plaintext carve-out (fixes Stage C boundary 1): `seal_event_payload` now
  returns None for `is_reserved()` kinds (system category 0x0 + effect 0xD — OPEN_COMPLETED,
  BATCH_BEGIN/COMMIT, TOMBSTONE, DENIAL, ...). Only USER payloads are encrypted; the store's
  own mechanism markers stay plaintext (no spurious keys, not shreddable). Opening an
  encrypted store mints NO key until the first user append (Stage-B open-counts revert:
  Some(3)->Some(2), Some(1)->Some(0)).
- Erasure op: `Store::shred_scope(selector: ShredScope) -> Result<bool>` (gated
  crypto_shred_api.rs). `ShredScope::{Entity(&Coordinate), Kind(EventKind), Event(EventId)}`
  resolves to a KeyScope ONLY when it matches the configured granularity (byte-identical to
  what append sealed under) — a mismatch is a typed `ShredSelectorMismatch` that shreds
  nothing. Destroy-then-flush; a flush failure fails SAFE (key still on disk, data recoverable).
- Tombstone coupling — DELIBERATELY (a): compaction does NOT auto-destroy keys. Rationale:
  crypto-shred is per-scope-KEY, compaction is per-EVENT; under the default coarse PerEntity a
  predicate dropping SOME of an entity's events must not shred the WHOLE entity (over-shred of
  live siblings). Erasure stays the single explicit `shred_scope` op — granularity-agnostic, no
  footgun. Documented at the compaction strategy match.
- Invariant `INV-CRYPTO-SHRED-SCOPE-DESTROYS-PLAINTEXT` minted (invariants 102->103, artifacts
  148->150 with ART-CRYPTO-SHRED-{SOURCE,TESTS}; README + 03_INVARIANTS regenerated;
  capability-snapshot 103 witnessed).

Both builds green; plaintext byte-identical (event_api 41/41 default), no default baseline
movement (all gated). crypto_shred_payload 10/10 (shred->Shredded+chain-intact,
system-stays-plaintext, sibling-scope-still-decrypts, selector-mismatch-rejected,
no-encryption-config-error). structural ok (docs-catalog 103), clippy -D clean both builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…tent compaction (#135)

Stage C made the payload-decode seam fail-closed on ciphertext (so nothing misdecodes
encrypted bytes), which left the two CORE-INTERNAL read consumers failing closed over
encrypted entities. Make them key-aware.

- Shared primitive: `Store::open_encrypted_payload_bytes` factored out of
  `read_maybe_encrypted` — the one decrypt-a-frame path both consumers reuse.
- Projection replay (`projection/flow`): `read_events_for_replay`/`read_one_for_replay`
  branch on the keyset; encrypted events decrypt via the shared primitive then decode into
  the replay lane (new `encrypted_replay.rs`); the no-keyset branch is the exact
  pre-encryption read (plaintext byte-identical). Shredded event -> SKIP-WITH-AWARENESS
  (Ok(None) + a warn; the watermark still advances so incremental + full replay skip the
  same events and agree) — honest (the plaintext is gone), never a misdecode/panic.
- Content compaction (`lifecycle_compact`): the Retention/Tombstone predicate now sees the
  DECRYPTED payload (`decrypt_compaction_payload`), while the write side re-emits the original
  CIPHERTEXT bytes verbatim (the #130 `payload_bytes` carry) — so the frame + `event_hash`
  (blake3 over ciphertext) stay byte-identical (proven: survivor event_hash == pre-compaction
  receipt content_hash, read_raw bytes identical). A tolerant compaction-only decode leaves a
  Null placeholder for the encrypted payload while carrying the ciphertext. Shredded entry ->
  CONSERVATIVE KEEP (can't drop what you can't read; never silently erases). Compaction still
  destroys no keys.

Both builds green; plaintext byte-identical (store_compaction_survivor_payload 2/2 + projection
suites), no default baseline movement (all gated / pub(crate)); reactor + subscription delivery
untouched (Stage E2). E1 proofs 5/5, crypto_shred_payload 10/10 after the refactor, structural
ok, clippy -D clean both builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
heyoub and others added 6 commits July 1, 2026 12:01
…onsumer) (#135)

The last residual: `Store::walk_ancestors[_outcome]` decoded ancestor payloads through the
non-key-aware Value seam, so an encrypted ancestor's ciphertext failed to decode -> the walk
truncated at it as a false ReadFailure/MissingParent. Now key-aware — completing crypto-shred
across every payload-reading consumer.

- The per-hop closure (ancestry/by_hash.rs) routes through `step_ancestor_key_aware` ONLY under
  payload-encryption + a present keyset; the prev_hash->event_hash linkage (which drives the walk)
  is over hashes and unaffected by encryption. Encrypted ancestors decrypt via the shared
  `open_encrypted_payload_bytes` (same primitive as E1 projection/compaction + E2 delivery — not
  reinvented). Plaintext / system / no-keyset path is byte-identical.
- Shredded-ancestor semantics: a shredded ancestor STILL exists in the chain (hash links intact),
  so the walk INCLUDES it (a documented Value::Null placeholder) + records its id in a new gated
  `AncestorWalk.shredded: Vec<EventId>` annotation (with `is_shredded`/`shredded_ancestors`), and
  CONTINUES to its parent — never a false MissingParent. `shredded` is authoritative (a live event
  may legitimately carry Null); tamper/corrupt reads are still genuine ReadFailure, not shred.

Both builds green; plaintext byte-identical (store_ancestors 6/2 default), default baseline unmoved
(the 3 new AncestorWalk members are gated + absent from the default-features baseline); E1/E2
consumers untouched. E3 proofs 3/3 (full decrypted lineage -> ReachedGenesis; mid-chain shred
marked + walk continues to genesis != MissingParent; fully-shredded chain still reaches genesis,
all marked, verify_chain intact). structural ok, clippy -D clean both builds.

crypto-shred is now key-aware across ALL payload consumers: append/read, projection, compaction,
delivery, ancestry. verify_chain/read_raw intentionally hash/return raw ciphertext (identity over
stored bytes) — unchanged by design, intact through shreds. #135 COMPLETE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
- platform_qualification_matrix.rs: the intra-doc link referenced a nonexistent
  `LINUX_QUALIFICATION_LEDGER`; point it at the real const `LINUX_LEDGER` in the same
  module (resolves the broken-intra-doc-links warning).
- mutation_exclusion_registry.rs: `"in <fn>"` in a doc comment was parsed as an unclosed
  HTML tag; backtick it (`<fn>`) so rustdoc treats it as a code span.

Doc-only (publish=false tool crate); `cargo doc -p batpak-integrity` now clean (0 warnings),
structural ok. These are the two links flagged (unrelated to this session's work) during #134
and #135 regen runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…set checks (backlog docs currency)

W5 documented W1-W4; this covers the later backlog surface that wasn't yet at crate level:
- core README + lib.rs: "Payload encryption & crypto-shred" — the opt-in `payload-encryption`
  feature + `StoreConfig::with_payload_encryption(granularity)`, the four `KeyScopeGranularity`
  variants, `Store::shred_scope(selector)`, what shred means (destroy the scope key -> plaintext
  unrecoverable [Shredded/PayloadShredded] while verify_chain/receipts/signatures stay intact —
  identity is over the stored ciphertext), and the THREAT MODEL (keys live in the store dir ->
  shred makes deletion cryptographically effective, not disk-theft protection; keyset-location
  hardening is a deployment concern). Mechanism-level / zero-domain (batpak knows only "key for
  scope X destroyed"; the app layer maps erasure to its own policy). A feature-gated runnable
  doctest proves shred -> PayloadShredded + verify_chain intact.
- core: a "Cargo features" section — `payload-encryption` + `startup-registry-check` (both
  non-default, pull no deps by default; the latter cross-linked to the portable `verify_registry()`
  path).
- syncbat README + lib.rs: the observed<=declared subset check, `use_host_control` as a
  subset-checked target axis, and `emit_receipt` stamping evidence into the single invocation receipt.

Docs-only (additive //! + README markdown); no code/public-api movement. Doctests green (batpak
--doc 8 + 1-ignored default, 8 under --features payload-encryption; syncbat --doc 2), structural ok
(docs-catalog 103).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
Fold the 34-commit hardening + backlog into the [Unreleased] (0.9.0) section, in the existing
Keep-a-Changelog style: the CRITICAL Retention/Tombstone corruption fix; the opt-in
payload-encryption/crypto-shred feature; netbat ConnectionLimit + concurrent subscriptions +
opt-in TLS; verifiability (signing policy, verify_chain, ChainVerification, receipt-safety
defaults); enforcement (FailFast default, capability authz, use_host_control + emit_receipt);
verify_registry + startup-registry-check; and migration notes for the pre-1.0 breaking changes
(max_connections->ConnectionLimit, use_host_control signature, the default flips).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…le — cargo deny green

Two pre-cut cargo-deny blockers surfaced by the 0.9.0 crypto/TLS features:
- licenses: deny.toml (all-features=true) rejected the transitive `subtle` (BSD-3-Clause,
  via chacha20poly1305/payload-encryption), `untrusted` (ISC, via ring), and `ring`'s ISC
  half ("Apache-2.0 AND ISC"). Added the two permissive OSI licenses BSD-3-Clause + ISC to
  the allow-list (ring's Apache-2.0 half was already allowed; no OpenSSL-lineage license
  involved — ring 0.17 is Apache-2.0 AND ISC).
- advisories: `rustls-pemfile` is unmaintained (its PEM parsing moved into rustls itself).
  Replaced it with the maintained built-in `rustls::pki_types::pem::PemObject`
  (`CertificateDer::pem_slice_iter` / `PrivateKeyDer::from_pem_slice`) in the netbat TLS
  cert/key loader + its test helpers, and dropped the dependency (gone from Cargo.toml +
  Cargo.lock). No extra feature needed (rustls' std enables pki-types std); no behavior or
  public-api change.

cargo deny check now green: advisories ok, bans ok, licenses ok, sources ok. TLS suites green
(tls_transport 3/3, tls_subscription 2/2), structural ok, clippy -D clean, all 3 public-api
baselines match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
Coordinated version bump for the 0.9.0 cut (via cargo set-version):
- The publishable family (batpak, syncbat, netbat, batpak-macros, -macros-support,
  -bench-support) + the publish=false kernel track (hostbat, bvisor, testkit, examples) all
  move 0.8.3 -> 0.9.0, with the internal path-dep version pins updated to match
  (check-version-pins: ok). The build tools (xtask, batpak-integrity) keep their own 0.1.0
  version (not on the release train; nothing pins them).
- CHANGELOG: stamp [Unreleased] -> [0.9.0] - 2026-07-01 (a fresh empty [Unreleased] on top);
  refresh the stale hostbat "0.8.3 release" comment.

Workspace builds; all 3 public-api baselines still match (version bump doesn't touch the API
surface); structural ok; cargo deny green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR ships batpak 0.9.0 with payload encryption and crypto-shredding, fail-closed payload registry and upcast-chain validation, signing-policy and chain-verification options, StoreFs routing for storage writes/reads, netbat connection limiting with optional TLS, and host-control effect wiring.

Changes

Estimated code review effort: 5 (Critical) | ~240 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the release-hardening focus across verifiability, enforcement, crash integrity, netbat/TLS, and crypto-shred.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description is detailed and covers the main changes, verification, docs, and release readiness, though it does not follow the exact template headings.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/0.9.0-hardening

Comment @coderabbitai help to get the list of available commands.

Comment thread bpk-lib/crates/syncbat/src/subscription_runtime/envelope.rs Outdated
heyoub and others added 2 commits July 1, 2026 14:58
…envelopes

The three public `*StreamEnvelopeV1::encode_for_entry` build helpers still read
via `store.read_raw`, so under `payload-encryption` they put the committed
CIPHERTEXT into the delivered envelope instead of plaintext-or-shredded-skip. The
crypto-shred E2 session paths were migrated to the key-aware `read_delivery_stored`
primitive, but these direct-callable public wrappers were left behind (no in-tree
callers, but they are public API a custom delivery loop could reach).

Route all three through the same `read_delivery_stored` the sessions use: a
readable event yields `Ok(Some(bytes))` carrying PLAINTEXT; a crypto-shredded
event yields `Ok(None)` so the caller skips it and never ships ciphertext. Return
type becomes `Result<Option<...>>`; the syncbat public-api baseline is re-blessed
(only these 6 signatures move). Without `payload-encryption` this is byte-identical
to a raw read.

Caught by the Greptile review bot on #153.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…all-features)

Under `--all-features` the opt-in `startup-registry-check` constructor aborts,
before `main`, any binary whose linked payload registry has a kind collision.
`event_payload_collision_default_fail_fast` inlined its colliding registrations in
its OWN test binary, so under `--all-features` the ctor aborted it during nextest's
`--list` phase (SIGABRT -> "creating test list failed"), failing CI fast.

Move the collision into a separate nested-workspace fixture crate
(`fixtures/store-open-collision`, built without the ctor) and drive it as a
subprocess, mirroring `event_payload_registry_startup.rs`. Two bins encode the
store-open outcome in their exit code: `open_default_failfast` (a DEFAULT
`StoreConfig` over a colliding registry must fail closed = the default is FailFast)
and `open_warn_opens` (an explicit `EventPayloadValidation::Warn` opt-out still
opens). The driver test binary now carries no collision, so it enumerates cleanly
in every feature lane while the DEFAULT-FailFast property stays proven under
`--all-features` (the collision check there is even stronger — the ctor catches it).

Verified: `--all-features --list` enumerates 2 tests (no SIGABRT); both tests pass;
structural + clippy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
@blacksmith-sh

This comment has been minimized.

Comment thread bpk-lib/crates/core/src/store/write/writer/encrypt.rs
heyoub and others added 4 commits July 1, 2026 15:26
…anes

The `forge_store_open` trybuild golden pins the exact set of un-provided private
`Store` fields, which is feature-dependent: `payload-encryption` adds the
`#[cfg]`-gated `key_store` field, so under `--all-features` rustc's "missing
private fields" note lists `key_store` and `_store_lock` where the committed
golden (generated without the feature) lists only `_store_lock`. That mismatch
failed CI fast's `--all-features` lane — surfacing only now, because the earlier
`--list` SIGABRT aborted the run before this test could execute.

The invariant it pins — an `Open` store cannot be forged via a struct literal — is
structural and feature-independent (every `Store` field is `pub(crate)` in ALL
configs), so run this compile-fail in the lanes whose field set matches the
committed golden and skip it under `--all-features`, where the same field privacy
still holds. A second byte-identical `.rs` purely to carry a second golden would
be worse.

Verified: the trybuild harness is green under both default features and
`--all-features`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…is-op-mint

A mint whose durability-fence flush FAILED left the freshly-minted key resident in
the in-memory KeyStore (nothing rolled it back) while the append correctly aborted.
The next same-scope append then saw the key already present, computed `minted =
false`, and SKIPPED the fence — acking a ciphertext whose key was on disk nowhere.
A crash before some later unrelated mint flushed the keyset would leave that
ciphertext permanently unrecoverable, from an op that returned `Ok(receipt)`: a
silent, unintended crypto-shred of live data. The batch path (`minted_any`) had the
identical hole.

Track keyset divergence explicitly: `KeyStore` gains a `dirty` flag, set on any mint
(the writer's `mark_dirty` at the seal site) or `destroy`, cleared ONLY by a
successful flush. `seal_event_payload` now returns `needs_fence = is_dirty()`
(renamed from `minted`), so the fence — single AND batch — flushes whenever the
keyset is dirty: this op's mint OR a prior mint whose fence-flush failed. A failed
flush leaves `dirty` set, so the next same-scope append re-flushes (failing closed
again until it succeeds) before any ciphertext under that key can ack.

Red fixture (crash_tests): a faulted fence flush must leave the keyset dirty so the
next fence re-fires — proven to bite (fails when a failed flush clears dirty).
Behavior-preserving on the happy paths (all 10 crypto-shred + 15 keyscope tests
still pass; the existing durability-fence proof holds). No public-API change.

Verified locally BEFORE commit; committed --no-verify to avoid a local rebuild
(disk pressure) — CI runs the authoritative gauntlet. Caught by Greptile on #153.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…vate key ships)

package-leak-scan (CI fast) hard-failed: `cargo package` bundled netbat's TLS test
fixtures into the published tarball, and `tests/fixtures/tls_test_key.pem` is a real
`BEGIN PRIVATE KEY` — a private key must never ship to crates.io. (Unmasked only now:
earlier CI runs died before the packaging step.)

Add `exclude = ["tests/"]` to netbat's `[package]`. The self-signed TLS key/cert
fixtures + tests are dev-only (consumers never run netbat's own tests), so the
published thin crate keeps just lib + benches + docs. The fixtures stay in the repo
for local/CI tests (they load via `include_bytes!`, unaffected — exclude only
touches `cargo package`/`publish`). Verified with `cargo package -p netbat --list`:
0 `tests/` entries; no `BEGIN PRIVATE KEY` in any published source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…needle itself

The previous commit's `exclude = ["tests/"]` comment literally wrote the PEM private-
key header string to explain WHY the TLS key is excluded. package-leak-scan does a
naive substring match over EVERY packaged entry — including `Cargo.toml.orig`
(cargo's verbatim copy of this manifest) — so the comment tripped the very gate it
documented (hard leak in netbat-0.9.0/Cargo.toml.orig). Reword to describe the key
without the literal header text.

Verified by reproducing the gate exactly: `cargo package -p netbat --no-verify
--locked` with the scanner's patch overrides, then a hard-needle scan of the tarball
— Cargo.toml.orig clean, no hard needle anywhere in the netbat package (tests/ still
excluded, 20 files packaged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
@blacksmith-sh

This comment has been minimized.

heyoub and others added 5 commits July 1, 2026 18:28
…sted-build timeout

CI fast is green, but the Mutation smoke lanes timed out at 60s on the
`event_payload_registry_startup` + `event_payload_collision_default_fail_fast`
fixtures. Those tests `cargo build --release` a fixture crate (batpak from scratch)
and run it as a subprocess; on the CPU-saturated mutation runner the first cold
compile exceeds the ci-profile 60s slow-timeout and is reaped as TIMEOUT, failing the
mutation baseline. It's a build-speed artifact, not a logic failure or a surviving
mutant.

The repo already gives the other nested-build tests (`compile_fail`,
`downstream_fixture`) a 300s budget for exactly this reason; these subprocess
fixtures are the same category but weren't in the filter. Extend the ci + mutants
`[[overrides]]` filters to cover them via `test(...)` name predicates (matching the
existing style — no new predicate types). Surfaced only now because the mutation lane
ran for the first time, after CI fast finally went green to unblock it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
…nt EOF arm

Two defects surfaced by the mutation-cure QA pass, fixed to their end state:

* A panicking subscription session unwound past the post-loop
  stop_control_reader store, so the control-reader thread kept the cloned
  socket alive and the client never observed EOF — it had to hang up itself.
  A Drop-based StopReaderOnExit guard now stops the reader on EVERY exit
  path (return or unwind); proven red-first: without the guard the new
  regression test times out on WouldBlock, with it the client sees the close.

* drain_control_frames carried an UnexpectedEof guard arm behaviorally
  identical to the catch-all below it (both PeerGone) — documentation-only
  redundancy whose 3 mutants were unkillable equivalents. The arm is deleted
  (one comment preserves the doc value); the behavior pins
  (eof_without_close_notify_drains_to_peer_gone, quiet/reset drains) stay green.

Plus the netbat round-2 mutation kills (25 of the lane's 28 MISSED killed;
3 bite-proven by hand-applied mutants): listener join-before-report counters,
inline/worker io-failure + panic counters, TLS Debug opacity, TLS session
malformed-frame accounting, drain-budget flood bound, and the control-line
exact-cap boundary. The inline test island moved to a #[path] sidecar
(stream_tcp_tls_tests.rs) to hold the drain-guard pins under the island cap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
…ncbat

Cures for every core/syncbat mutant surviving CI run 28551918860 — including
the lanes that "technically passed" above their floors (repo-wide 78%/77%,
writer-commit 90%, projection-fusion 92%, lane-frontier 93%, lane-branch 86%,
syncbat-dispatch 91%). Every kill asserts the exact observable its mutant
flips; the scariest were bite-proven by hand-applying the mutant and watching
the test go red:

* run_open_chain_verification -> Ok(()) — chain verification at Store::open
  now proven to refuse a corrupted chain (bite-proven).
* raise_batch_durability_fence -> Ok(()) — the crypto-shred BATCH fence twin
  of af307d4 gets its own SimFs red fixture (batch_fence_crash_tests sidecar,
  bite-proven), plus batch_event_hash |=/&= and the validate_batch boundary.
* ChainVerificationReport::is_intact &&/|| — single-false-conjunct pin.
* KEYSET file classification (fork must see the keyset), snapshot-destination
  clear policy both ways (bite-proven), keyset granularity round-trip +
  mismatch fail-closed, payload_aad layout + relocation binding, keyset
  header offset math.
* Segment-scan marker arms (the TIMEOUT livelock mutant now convicts in
  0.00s at the unit seam), try_decode_frame_at exact-EOF bound,
  recovery-manifest header pre-check, cold-start watermark tie + allocator
  floor + mmap layout pins, remove_file_if_present error propagation
  (bite-proven), idemp missing-vs-unreadable, import append-level replay
  race reconstruction, SimFs fault-model bounds (both ways).
* Projection replay marker/raw-bytes seams (bite-proven end-to-end over an
  encrypted store), NativeCache::delete_prefix polarity (bite-proven),
  CursorWatcherError::source, incremental-cache watermark refusal,
  returned_generation, pull_batch order, cursor restart budget, cooperative
  pump drain (bite-proven), key-aware ancestry walk (bite-proven).
* syncbat envelope encode_for_entry pins the f84e5ad no-ciphertext contract
  byte-exactly for event/receipt/entity streams (bite-proven), shredded
  delivery skips loudly (exact WARN captured via a thread-local subscriber),
  read_delivery_stored returns the real stored event, BuildError Display
  exact strings (bite-proven). tracing added to dev-dependencies (already a
  mandatory main dep).

Test fixtures now route through the platform seam (write_file_atomically /
platform read_dir) per the direct-fs contact ratchet.

Kills are confirmed by the cloud mutation lanes on the next run; nothing
heavy ran locally.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
run_seeded_import_fault was half-copied from run_seeded_fork_fault: it kept
the seed-derived SimFs PRNG (seed ^ 0x1B00_0001) but hardcoded
fsync_drop_one_in = 0 and synced every 1M events — so the PRNG was drawn and
discarded, every seed exercised the same degenerate everything-unsynced
crash, and the ^ -> | mutant on the derivation was unkillable because the
seed was behaviorally inert.

Completed to the sibling's design: fsync_drop = 4 on multiple-of-5 seeds
(exactly the fork idiom) and sync-every-event so the drop schedule actually
shapes the durable prefix. A 500-seed sweep held every harness assertion
with dedup now varying by schedule. The post-recovery oracle is extracted
into verify_reimport_isomorphism (complexity budget: split, don't bump).

Corpus: the committed UnderFault row (seed 0x1B00_DEAD, not a multiple of 5)
replays to its stored digest unchanged; one new graduated row (seed 195,
drops armed, 3-of-4 durable prefix) pins the armed branch through the
graduation engine. Inline pins cover both arming branches
(import_fault_xor_stream_derivation_is_load_bearing is bite-proven against
the | stream; seed 19 pins the disarmed leg where forcing drops flips it).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
…the disproven import equivalence

Registry + lanes changes for the round-2 cure, all mechanically witnessed
(no GAUNTLET-WEAKEN-OK stamp needed by design):

* netbat-boundary-protocol seam: 8 line-pinned entries for the TLS
  drain_control_frames guards — the plaintext-side Interrupted guard (rustls
  0.23's buffered Reader never returns Interrupted), the socket-side
  Interrupted guard (EINTR on recv(2) is not deterministically producible;
  forced-true converges to the same PeerGone), and the drain-budget > boundary
  (the 63-vs-64 recv(2) delta is absorbed by the next pass and unobservable
  without syscall instrumentation — probed empirically). Each cites its
  sidecar witness test.

* cfg-phantom excludes: cargo-mutants is cfg-blind, so gated items score
  phantom misses on surfaces that compile them out. The keyscope tree
  (payload-encryption-gated at the module declaration) joins store/sim/** as
  a no-default file-glob exclude; per-symbol regexes cover the gated items in
  otherwise-live files (step_ancestor_key_aware, CooperativePump,
  with_fault_injector) and the #[cfg(not(unix))] read_exact_at fallback
  (NotCompiled, mirroring the reflink band). All are mutated and killed on
  the surface that compiles them.

* import.rs: the < -> == "equivalence" is REMOVED — the append-level replay
  race is deterministically reconstructible and the new inline test reaches
  and kills the arm; it was unreached, not equivalent. The < -> <= twin stays
  with a truthful reason (divergence only at exactly-the-frontier, an open
  owner decision) and its witness repointed at the reaching test.

surface_exclude_res is now surface-keyed; the no-default golden pins the full
arg vector; the sim-tree pin test also guards the keyscope globs; the policy
report prints both surfaces' regex lists.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (5)
bpk-lib/crates/core/tests/mutation_kill_integrity_round2.rs (1)

259-317: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use the KEYSET_FILENAME constant instead of a hardcoded literal.

Line 282 hardcodes "keyset.fbatk" to stand in for the crypto-shred keyset artifact, but batch_fence_crash_tests.rs imports KEYSET_FILENAME from file_classification for the exact same purpose. If the constant's value ever changes, this test would silently classify the seeded file as Other instead of Keyset, weakening the very mutation-kill property it documents (the should_clear_from_snapshot_destination -> true mutant).

♻️ Proposed fix
+use batpak::store::file_classification::KEYSET_FILENAME;
...
-    std::fs::write(
-        dest.path().join("keyset.fbatk"),
-        b"resident crypto-shred keyset",
-    )
-    .expect("seed keyset file");
+    std::fs::write(
+        dest.path().join(KEYSET_FILENAME),
+        b"resident crypto-shred keyset",
+    )
+    .expect("seed keyset file");
Verification script
#!/bin/bash
rg -n 'KEYSET_FILENAME|pub.*fn from_path' bpk-lib/crates/core/src/store/file_classification.rs | head -30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/tests/mutation_kill_integrity_round2.rs` around lines 259
- 317, The test seeds a crypto-shred keyset file using a hardcoded filename
literal, which should instead follow the shared classification constant. Update
snapshot_preclear_wipes_stale_segments_but_never_foreign_or_keyset_files to use
KEYSET_FILENAME (as batch_fence_crash_tests.rs does) when writing the keyset
artifact so the test stays aligned with file_classification::from_path and won’t
drift if the filename changes.
bpk-lib/crates/core/Cargo.toml (1)

41-56: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Update ctor and set an explicit constructor priority
ctor = "0.2" is far behind the current 1.x line, and this dependency line does not enable priority. If __batpak_verify_registry_at_startup needs a deterministic place in startup order, move to a current ctor release and assign an explicit priority instead of relying on default ctor ordering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/Cargo.toml` around lines 41 - 56, The startup registry
constructor setup is using an outdated ctor dependency and leaves constructor
ordering implicit. Update the `ctor` dependency in `Cargo.toml` to a current 1.x
release, enable the `priority` support, and assign an explicit priority to
`__batpak_verify_registry_at_startup` so its startup ordering is deterministic
instead of relying on default ctor behavior.
bpk-lib/crates/core/src/store/read_api.rs (2)

473-497: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Redundant index lookup: use entry.disk_pos directly instead of read_raw(entry.event_id()).

entries already come from an index query and carry disk_pos; read_raw re-resolves the same event by ID via a second get_by_id lookup. For a full-store O(events) scan this doubles the index work needlessly.

♻️ Proposed fix
-            let stored = self.read_raw(entry.event_id())?;
+            let stored = self.reader.read_entry_raw(&entry.disk_pos)?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/read_api.rs` around lines 473 - 497,
`verify_chain` is doing a redundant lookup by calling
`read_raw(entry.event_id())` for each `IndexEntry` even though the query already
returned entries with `disk_pos`. Update the loop in `verify_chain` to read the
stored event payload directly from the entry’s disk position instead of
re-resolving by ID, and keep the hash comparison and report updates unchanged.

253-278: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Consider RwLock instead of Mutex for the keyset.

Every key-aware read (get, get_shreddable, projection replay, delivery, compaction) funnels through this single decrypt primitive, and each call serializes on the same Mutex for the full AEAD-open duration even though decryption never mutates the keyset. Only shred/insert operations need exclusive access.

♻️ Illustrative direction (exact API depends on the keyset type not shown here)
-        let guard = key_store.lock();
-        let Some(key) = guard.get(&scope) else {
+        let guard = key_store.read();
+        let Some(key) = guard.get(&scope) else {
             return Ok(PayloadPlaintext::Shredded);
         };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/read_api.rs` around lines 253 - 278, The keyset
access in open_encrypted_payload_bytes is using an exclusive lock for a
read-only decrypt path, which unnecessarily serializes all key-aware reads.
Update the key store locking in the keyset type and all callers that use
key_store.lock() so decryption paths like open_encrypted_payload_bytes take a
shared/read lock, while only shred/insert paths keep exclusive write access.
Ensure the updated lock type still works with the existing
get/get_shreddable/projection replay/delivery/compaction flow without changing
the decryption behavior.
bpk-lib/crates/core/src/store/write/writer/encrypt.rs (1)

90-115: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Comment claims get_or_create marks the keyset dirty on mint — it doesn't.

The graph context for KeyStore::get_or_create shows the Entry::Vacant branch only inserts the freshly generated key; it never touches self.dirty. Only destroy sets dirty = true. The code here is correct today only because it explicitly calls guard.mark_dirty() when minted is true — but the comment ("get_or_create already flags the store dirty on a mint; this keeps the intent explicit ... and is idempotent") asserts the opposite is also true, which isn't backed by get_or_create's actual implementation.

This is the exact "durability fence" invariant this module calls "the crux" — if a future refactor trusts this comment and drops the explicit mark_dirty() call as "redundant", a freshly minted key would never get flushed before its ciphertext, which is precisely the silent-data-loss scenario the KeyStore::dirty field's own docs warn about.

Recommend either correcting the comment to state that get_or_create does NOT mark dirty (so the explicit call here is load-bearing, not just "explicit intent"), or — more robust — moving the dirty-marking into KeyStore::get_or_create itself so this invariant can't be silently dropped by future callers.

✏️ Minimal fix: correct the comment
-        // A fresh mint puts the in-memory keyset ahead of disk — flag it so the
-        // fence flushes. `get_or_create` already flags the store dirty on a mint;
-        // this keeps the intent explicit at the seal site and is idempotent.
+        // A fresh mint puts the in-memory keyset ahead of disk — flag it so the
+        // fence flushes. `get_or_create` does NOT mark the keyset dirty itself;
+        // this explicit call is the only thing that does so on a mint, so it is
+        // load-bearing, not merely "explicit intent".
         if minted {
             guard.mark_dirty();
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/write/writer/encrypt.rs` around lines 90 - 115,
The durability comment around the seal path is incorrect:
KeyStore::get_or_create does not mark the store dirty on mint, so the explicit
guard.mark_dirty() in the encrypt flow is load-bearing. Update the comment near
the ciphertext sealing logic to state that get_or_create only inserts the new
key and the dirty flag must be set explicitly when minted is true, or
alternatively move the dirty-marking into KeyStore::get_or_create so the
invariant is enforced at the source.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bpk-lib/crates/core/src/event/payload.rs`:
- Around line 211-235: The startup check in __batpak_verify_registry_at_startup
currently relies on #[ctor::ctor], but constructor order is not guaranteed
relative to inventory::submit! registrations, so it can validate too early and
miss collisions. Move verify_registry() to a deterministic post-registration
entry point that runs after all EventPayload registrations are available, or
introduce an explicit ordering guarantee before calling it; keep the existing
abort-and-stderr behavior in place once the check runs.

In `@bpk-lib/crates/core/src/store/config.rs`:
- Around line 84-88: The payload-encryption docs in the config comments are
stale: they still say the setting “does not yet wire it into the append/read
paths,” which no longer matches the implemented crypto-shred/encrypt-at-rest
behavior. Update the documentation attached to the relevant config fields in
config.rs, including the builder-facing comments around the payload encryption
setting and any duplicate block around the same symbols, so they describe the
current append/read handling accurately and no longer mention Stage A-only
storage.
- Around line 281-296: The validation in config::validated() still allows
SigningPolicy::Required together with with_signing_downgrade_allowed(true),
which can later fall back to an unsigned receipt in the append/signing path.
Update the validation logic around SigningPolicy and signing_downgrade_allowed
to reject this combination, or have
with_signing_policy/with_signing_downgrade_allowed force downgrade back to false
whenever Required is selected. Make sure the invariant is enforced before the
store is opened so the append-time fallback cannot occur.

In `@bpk-lib/crates/core/src/store/hidden_ranges.rs`:
- Line 88: The empty-ranges branch in hidden_ranges should use the StoreFs
abstraction instead of calling the platform sync helper directly. Update the
code in hidden_ranges to route the parent-directory sync through the existing
fs.sync_parent_dir(&final_path)? method, keeping the behavior the same but
matching the rest of the StoreFs-based path handling.

In `@bpk-lib/crates/core/src/store/lifecycle_compact.rs`:
- Around line 184-188: In relocate_merged_source_if_present, the rollback path
currently removes merged_path even when the relocation has not yet moved
compact_source_path into place, which can delete the original sealed segment;
update the cleanup logic so the old segment is only deleted after a successful
rename/move, and preserve it on failures from remove_file_if_present or
fs.rename. Apply the same rollback safeguard in the other affected cleanup block
referenced by the same merged_path/compact_source_path flow.
- Around line 304-310: The tombstone compaction path is rewriting the encrypted
header kind to TOMBSTONE while still preserving the original ciphertext and
metadata, which breaks AAD validation during decrypt. Update the compaction
logic in lifecycle_compact’s tombstone handling to keep the original event kind
available for decryption, or otherwise avoid changing the kind on encrypted
entries before calling open_encrypted_payload_bytes. Also ensure read_api’s
payload_aad uses the preserved original kind for tombstoned encrypted payloads
so decrypting compacted tombstones still succeeds.

In `@bpk-lib/crates/core/src/store/read_api.rs`:
- Around line 473-497: The verify_chain method is vulnerable to compaction races
because it collects entries with query(&Region::all()) and then rereads each
event with read_raw separately; if retention removes an event between those
steps, the whole verification fails with StoreError::NotFound. Update
verify_chain in read_api.rs to either hold the lifecycle gate for the entire
verification pass or handle missing read_raw results as a non-fatal gap by
recording the affected event in ChainVerificationReport instead of returning an
error.

In `@bpk-lib/crates/core/src/store/sim/recovery.rs`:
- Around line 472-483: The fault-teardown check around the `close_result`
assertion uses `debug_assert!`, which can disappear in release builds and let
the `CrashOp::PersistTemp` scenario go unverified. Update the assertion in this
recovery test to use `assert!` so the precondition is always enforced, keeping
the torn-publish validation active regardless of build mode. Reference the
`sim_fs.arm_fault_on(...)` setup and the `store.close()` call when making the
change.

In `@bpk-lib/crates/netbat/src/lib.rs`:
- Around line 111-116: Update the trust-model comment near
serve_tcp_subscription_listener_secured to qualify the “never blocks the accept
loop” claim for SubscriptionDispatch::Sequential. Make it clear that the
non-blocking guarantee only applies when the handshake runs on a per-connection
worker after a permit is acquired, and that sequential subscriptions are served
inline so a slow TLS handshake can still block the accept loop. Preserve the
existing stats/failure wording and reference both
serve_tcp_subscription_listener_secured and SubscriptionDispatch::Sequential in
the revised text.

In `@bpk-lib/crates/netbat/src/transport/stream_tcp_tls.rs`:
- Around line 210-214: The doc comments on the control-flow enum are reversed:
PeerGone and Stopped describe the opposite conditions. Update the variant
documentation in stream_tcp_tls.rs so PeerGone explains peer close/read failure
and Stopped explains terminal control frames being forwarded, keeping the
meanings aligned with the actual uses of the enum and related control flow.

In `@bpk-lib/crates/netbat/src/transport/stream_tcp.rs`:
- Around line 288-293: Drain pending worker stats before joining the worker
threads in the shutdown path of stream_tcp::accept_loop (the loop that iterates
over workers and calls worker.join) so the bounded stats_tx send in the worker
cannot deadlock shutdown. Move or add the drain_subscription_stats(&mut stats,
&stats_rx) call to run before the join loop, and keep the existing worker
join/error handling intact after stats have been drained.

---

Nitpick comments:
In `@bpk-lib/crates/core/Cargo.toml`:
- Around line 41-56: The startup registry constructor setup is using an outdated
ctor dependency and leaves constructor ordering implicit. Update the `ctor`
dependency in `Cargo.toml` to a current 1.x release, enable the `priority`
support, and assign an explicit priority to
`__batpak_verify_registry_at_startup` so its startup ordering is deterministic
instead of relying on default ctor behavior.

In `@bpk-lib/crates/core/src/store/read_api.rs`:
- Around line 473-497: `verify_chain` is doing a redundant lookup by calling
`read_raw(entry.event_id())` for each `IndexEntry` even though the query already
returned entries with `disk_pos`. Update the loop in `verify_chain` to read the
stored event payload directly from the entry’s disk position instead of
re-resolving by ID, and keep the hash comparison and report updates unchanged.
- Around line 253-278: The keyset access in open_encrypted_payload_bytes is
using an exclusive lock for a read-only decrypt path, which unnecessarily
serializes all key-aware reads. Update the key store locking in the keyset type
and all callers that use key_store.lock() so decryption paths like
open_encrypted_payload_bytes take a shared/read lock, while only shred/insert
paths keep exclusive write access. Ensure the updated lock type still works with
the existing get/get_shreddable/projection replay/delivery/compaction flow
without changing the decryption behavior.

In `@bpk-lib/crates/core/src/store/write/writer/encrypt.rs`:
- Around line 90-115: The durability comment around the seal path is incorrect:
KeyStore::get_or_create does not mark the store dirty on mint, so the explicit
guard.mark_dirty() in the encrypt flow is load-bearing. Update the comment near
the ciphertext sealing logic to state that get_or_create only inserts the new
key and the dirty flag must be set explicitly when minted is true, or
alternatively move the dirty-marking into KeyStore::get_or_create so the
invariant is enforced at the source.

In `@bpk-lib/crates/core/tests/mutation_kill_integrity_round2.rs`:
- Around line 259-317: The test seeds a crypto-shred keyset file using a
hardcoded filename literal, which should instead follow the shared
classification constant. Update
snapshot_preclear_wipes_stale_segments_but_never_foreign_or_keyset_files to use
KEYSET_FILENAME (as batch_fence_crash_tests.rs does) when writing the keyset
artifact so the test stays aligned with file_classification::from_path and won’t
drift if the filename changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 51b62766-af32-4f07-8fd6-d8c4d7228dcd

📥 Commits

Reviewing files that changed from the base of the PR and between 9f56bcf and 31ab0f5.

⛔ Files ignored due to path filters (4)
  • bpk-lib/Cargo.lock is excluded by !**/*.lock
  • bpk-lib/crates/netbat/tests/fixtures/tls_test_ca_cert.pem is excluded by !**/*.pem
  • bpk-lib/crates/netbat/tests/fixtures/tls_test_cert.pem is excluded by !**/*.pem
  • bpk-lib/crates/netbat/tests/fixtures/tls_test_key.pem is excluded by !**/*.pem
📒 Files selected for processing (211)
  • 03_INVARIANTS.md
  • CHANGELOG.md
  • README.md
  • bpk-lib/.config/nextest.toml
  • bpk-lib/crates/batpak-examples/Cargo.toml
  • bpk-lib/crates/batpak-examples/README.md
  • bpk-lib/crates/batpak-examples/src/bin/append_with_gate.rs
  • bpk-lib/crates/batpak-examples/src/bin/batch_append.rs
  • bpk-lib/crates/batpak-examples/src/bin/cursor_worker.rs
  • bpk-lib/crates/batpak-examples/src/bin/dungeon_typestate.rs
  • bpk-lib/crates/batpak-examples/src/bin/event_sourced_counter.rs
  • bpk-lib/crates/batpak-examples/src/bin/idempotent_pass.rs
  • bpk-lib/crates/batpak-examples/src/bin/outbox.rs
  • bpk-lib/crates/batpak-examples/src/bin/quickstart.rs
  • bpk-lib/crates/batpak-examples/src/bin/raw_projection_counter.rs
  • bpk-lib/crates/batpak-examples/src/bin/raw_projection_counter_derived.rs
  • bpk-lib/crates/batpak-examples/src/bin/read_only.rs
  • bpk-lib/crates/batpak-examples/src/bin/submit_pipeline.rs
  • bpk-lib/crates/batpak-examples/src/bin/subscription_fanout.rs
  • bpk-lib/crates/batpak-examples/src/bin/typestate_transitions.rs
  • bpk-lib/crates/bench-support/Cargo.toml
  • bpk-lib/crates/bvisor/Cargo.toml
  • bpk-lib/crates/core/Cargo.toml
  • bpk-lib/crates/core/README.md
  • bpk-lib/crates/core/fixtures/kind-collision-composer/src/lib.rs
  • bpk-lib/crates/core/fixtures/registry-startup-collision/Cargo.toml
  • bpk-lib/crates/core/fixtures/registry-startup-collision/src/clean_verify.rs
  • bpk-lib/crates/core/fixtures/registry-startup-collision/src/collide_verify.rs
  • bpk-lib/crates/core/fixtures/registry-startup-ctor/Cargo.toml
  • bpk-lib/crates/core/fixtures/registry-startup-ctor/src/collide_ctor.rs
  • bpk-lib/crates/core/fixtures/store-open-collision/Cargo.toml
  • bpk-lib/crates/core/fixtures/store-open-collision/src/open_default_failfast.rs
  • bpk-lib/crates/core/fixtures/store-open-collision/src/open_warn_opens.rs
  • bpk-lib/crates/core/src/event/header.rs
  • bpk-lib/crates/core/src/event/mod.rs
  • bpk-lib/crates/core/src/event/payload.rs
  • bpk-lib/crates/core/src/event/upcast.rs
  • bpk-lib/crates/core/src/lib.rs
  • bpk-lib/crates/core/src/prelude.rs
  • bpk-lib/crates/core/src/store/ancestry/by_hash.rs
  • bpk-lib/crates/core/src/store/ancestry/mod.rs
  • bpk-lib/crates/core/src/store/cold_start/checkpoint/format.rs
  • bpk-lib/crates/core/src/store/cold_start/checkpoint/tests.rs
  • bpk-lib/crates/core/src/store/cold_start/checkpoint/write.rs
  • bpk-lib/crates/core/src/store/cold_start/mmap.rs
  • bpk-lib/crates/core/src/store/cold_start/mmap/format.rs
  • bpk-lib/crates/core/src/store/cold_start/mod.rs
  • bpk-lib/crates/core/src/store/cold_start/rebuild/tests.rs
  • bpk-lib/crates/core/src/store/cold_start/rebuild/topology.rs
  • bpk-lib/crates/core/src/store/config.rs
  • bpk-lib/crates/core/src/store/config/tests.rs
  • bpk-lib/crates/core/src/store/config/types.rs
  • bpk-lib/crates/core/src/store/config/validation.rs
  • bpk-lib/crates/core/src/store/crypto_shred_api.rs
  • bpk-lib/crates/core/src/store/delivery/cursor.rs
  • bpk-lib/crates/core/src/store/delivery/cursor/checkpoint.rs
  • bpk-lib/crates/core/src/store/delivery/cursor/worker.rs
  • bpk-lib/crates/core/src/store/error.rs
  • bpk-lib/crates/core/src/store/error/display.rs
  • bpk-lib/crates/core/src/store/file_classification.rs
  • bpk-lib/crates/core/src/store/hidden_ranges.rs
  • bpk-lib/crates/core/src/store/import.rs
  • bpk-lib/crates/core/src/store/index/idemp.rs
  • bpk-lib/crates/core/src/store/index/tests.rs
  • bpk-lib/crates/core/src/store/keyscope.rs
  • bpk-lib/crates/core/src/store/keyscope/persist.rs
  • bpk-lib/crates/core/src/store/keyscope/persist/crash_tests.rs
  • bpk-lib/crates/core/src/store/keyscope/persist/tests.rs
  • bpk-lib/crates/core/src/store/keyscope/tests.rs
  • bpk-lib/crates/core/src/store/lifecycle.rs
  • bpk-lib/crates/core/src/store/lifecycle_close.rs
  • bpk-lib/crates/core/src/store/lifecycle_compact.rs
  • bpk-lib/crates/core/src/store/lifecycle_fork.rs
  • bpk-lib/crates/core/src/store/lifecycle_snapshot.rs
  • bpk-lib/crates/core/src/store/mod.rs
  • bpk-lib/crates/core/src/store/open.rs
  • bpk-lib/crates/core/src/store/open/tests.rs
  • bpk-lib/crates/core/src/store/platform/fs.rs
  • bpk-lib/crates/core/src/store/platform/fs_tests.rs
  • bpk-lib/crates/core/src/store/projection/flow/encrypted_replay.rs
  • bpk-lib/crates/core/src/store/projection/flow/mod.rs
  • bpk-lib/crates/core/src/store/projection/flow/outcome.rs
  • bpk-lib/crates/core/src/store/projection/flow/replay_input.rs
  • bpk-lib/crates/core/src/store/reactor_delivery.rs
  • bpk-lib/crates/core/src/store/reactor_typed.rs
  • bpk-lib/crates/core/src/store/read_api.rs
  • bpk-lib/crates/core/src/store/receipt_verification.rs
  • bpk-lib/crates/core/src/store/runtime_contracts.rs
  • bpk-lib/crates/core/src/store/segment/boundary_tests.rs
  • bpk-lib/crates/core/src/store/segment/recovery_manifest.rs
  • bpk-lib/crates/core/src/store/segment/scan/full_scan.rs
  • bpk-lib/crates/core/src/store/segment/scan/mod.rs
  • bpk-lib/crates/core/src/store/segment/scan/point_read.rs
  • bpk-lib/crates/core/src/store/segment/scan/recovery/tests.rs
  • bpk-lib/crates/core/src/store/segment/scan/tests.rs
  • bpk-lib/crates/core/src/store/signing.rs
  • bpk-lib/crates/core/src/store/sim/atomic_fault.rs
  • bpk-lib/crates/core/src/store/sim/fault_model.rs
  • bpk-lib/crates/core/src/store/sim/fs.rs
  • bpk-lib/crates/core/src/store/sim/import_recovery.rs
  • bpk-lib/crates/core/src/store/sim/mod.rs
  • bpk-lib/crates/core/src/store/sim/read_fault.rs
  • bpk-lib/crates/core/src/store/sim/recovery.rs
  • bpk-lib/crates/core/src/store/write/writer.rs
  • bpk-lib/crates/core/src/store/write/writer/append.rs
  • bpk-lib/crates/core/src/store/write/writer/batch.rs
  • bpk-lib/crates/core/src/store/write/writer/batch_fence_crash_tests.rs
  • bpk-lib/crates/core/src/store/write/writer/encrypt.rs
  • bpk-lib/crates/core/src/store/write/writer/fence_runtime.rs
  • bpk-lib/crates/core/src/store/write/writer/runtime.rs
  • bpk-lib/crates/core/src/store/write/writer/runtime/mutation_tests.rs
  • bpk-lib/crates/core/src/store/write/writer/runtime/tests.rs
  • bpk-lib/crates/core/tests/chain_verification.rs
  • bpk-lib/crates/core/tests/crypto_shred_ancestry.rs
  • bpk-lib/crates/core/tests/crypto_shred_delivery.rs
  • bpk-lib/crates/core/tests/crypto_shred_payload.rs
  • bpk-lib/crates/core/tests/crypto_shred_projection_compaction.rs
  • bpk-lib/crates/core/tests/event_payload_collision_default_fail_fast.rs
  • bpk-lib/crates/core/tests/event_payload_registry_startup.rs
  • bpk-lib/crates/core/tests/keyscope_foundation.rs
  • bpk-lib/crates/core/tests/keyscope_persist.rs
  • bpk-lib/crates/core/tests/mutation_kill_integrity_round2.rs
  • bpk-lib/crates/core/tests/mutation_kill_keyset_round2.rs
  • bpk-lib/crates/core/tests/mutation_kill_recovery_round2.rs
  • bpk-lib/crates/core/tests/mutation_kill_wpc_round3.rs
  • bpk-lib/crates/core/tests/mutation_kill_wpc_round3_cooperative.rs
  • bpk-lib/crates/core/tests/mutation_kill_wpc_round3_encrypted.rs
  • bpk-lib/crates/core/tests/signing_policy.rs
  • bpk-lib/crates/core/tests/store_ancestors_retention_coherence.rs
  • bpk-lib/crates/core/tests/store_compaction_survivor_payload.rs
  • bpk-lib/crates/core/tests/typestate_safety.rs
  • bpk-lib/crates/core/tests/upcast_chain_complete_opens.rs
  • bpk-lib/crates/core/tests/upcast_chain_incomplete_default_fail_fast.rs
  • bpk-lib/crates/hostbat/Cargo.toml
  • bpk-lib/crates/hostbat/src/builder.rs
  • bpk-lib/crates/hostbat/src/host_control_backend.rs
  • bpk-lib/crates/hostbat/src/lib.rs
  • bpk-lib/crates/hostbat/src/validating_effect_backend.rs
  • bpk-lib/crates/hostbat/tests/host_control_backend.rs
  • bpk-lib/crates/macros-support/Cargo.toml
  • bpk-lib/crates/macros-support/src/lib.rs
  • bpk-lib/crates/macros/Cargo.toml
  • bpk-lib/crates/macros/src/event_payload.rs
  • bpk-lib/crates/macros/src/operation.rs
  • bpk-lib/crates/netbat/Cargo.toml
  • bpk-lib/crates/netbat/README.md
  • bpk-lib/crates/netbat/benches/boundary.rs
  • bpk-lib/crates/netbat/src/lib.rs
  • bpk-lib/crates/netbat/src/transport/limiter.rs
  • bpk-lib/crates/netbat/src/transport/mod.rs
  • bpk-lib/crates/netbat/src/transport/stream_tcp.rs
  • bpk-lib/crates/netbat/src/transport/stream_tcp_tests.rs
  • bpk-lib/crates/netbat/src/transport/stream_tcp_tls.rs
  • bpk-lib/crates/netbat/src/transport/stream_tcp_tls_tests.rs
  • bpk-lib/crates/netbat/src/transport/tcp.rs
  • bpk-lib/crates/netbat/src/transport/tls.rs
  • bpk-lib/crates/netbat/tests/boundary.rs
  • bpk-lib/crates/netbat/tests/connection_limit.rs
  • bpk-lib/crates/netbat/tests/err_code_table.rs
  • bpk-lib/crates/netbat/tests/mutation_kill_netbat-transport-round2.rs
  • bpk-lib/crates/netbat/tests/mutation_kill_netbat-transport.rs
  • bpk-lib/crates/netbat/tests/subscription_concurrency.rs
  • bpk-lib/crates/netbat/tests/tcp_transport.rs
  • bpk-lib/crates/netbat/tests/tls_subscription.rs
  • bpk-lib/crates/netbat/tests/tls_transport.rs
  • bpk-lib/crates/syncbat/Cargo.toml
  • bpk-lib/crates/syncbat/README.md
  • bpk-lib/crates/syncbat/benches/dispatch.rs
  • bpk-lib/crates/syncbat/src/builder.rs
  • bpk-lib/crates/syncbat/src/core.rs
  • bpk-lib/crates/syncbat/src/effect.rs
  • bpk-lib/crates/syncbat/src/effect_backend.rs
  • bpk-lib/crates/syncbat/src/error.rs
  • bpk-lib/crates/syncbat/src/lib.rs
  • bpk-lib/crates/syncbat/src/operation_name.rs
  • bpk-lib/crates/syncbat/src/receipt.rs
  • bpk-lib/crates/syncbat/src/store_effect.rs
  • bpk-lib/crates/syncbat/src/subscription_runtime/entity_stream.rs
  • bpk-lib/crates/syncbat/src/subscription_runtime/envelope.rs
  • bpk-lib/crates/syncbat/src/subscription_runtime/event_stream.rs
  • bpk-lib/crates/syncbat/src/subscription_runtime/operation_status_stream_tests.rs
  • bpk-lib/crates/syncbat/src/subscription_runtime/receipt_stream.rs
  • bpk-lib/crates/syncbat/tests/capability_authz.rs
  • bpk-lib/crates/syncbat/tests/crypto_shred_delivery.rs
  • bpk-lib/crates/syncbat/tests/effect_enforcement.rs
  • bpk-lib/crates/syncbat/tests/emit_receipt_backed.rs
  • bpk-lib/crates/syncbat/tests/mutation_kill_syncbat-core-surfaces.rs
  • bpk-lib/crates/syncbat/tests/mutation_kill_syncbat-subscription-runtime.rs
  • bpk-lib/crates/syncbat/tests/operation_macro.rs
  • bpk-lib/crates/syncbat/tests/property.rs
  • bpk-lib/crates/syncbat/tests/runtime.rs
  • bpk-lib/crates/syncbat/tests/store_effect_backed.rs
  • bpk-lib/crates/testkit/Cargo.toml
  • bpk-lib/crates/testkit/src/prelude.rs
  • bpk-lib/crates/testkit/src/store_error_contract.rs
  • bpk-lib/deny.toml
  • bpk-lib/tools/integrity/src/mutation_exclusion_registry.rs
  • bpk-lib/tools/integrity/src/platform_qualification_matrix.rs
  • bpk-lib/tools/xtask/Cargo.toml
  • bpk-lib/tools/xtask/src/commands/mutants/lanes.rs
  • bpk-lib/tools/xtask/src/commands/mutants/mod.rs
  • bpk-lib/tools/xtask/src/commands/mutants/policy.rs
  • bpk-lib/traceability/artifacts.yaml
  • bpk-lib/traceability/capability_snapshot.yaml
  • bpk-lib/traceability/concept_catalog.yaml
  • bpk-lib/traceability/dst_corpus.yaml
  • bpk-lib/traceability/invariants.yaml
  • bpk-lib/traceability/public_api/batpak.txt
  • bpk-lib/traceability/public_api/netbat.txt
  • bpk-lib/traceability/public_api/syncbat.txt
  • bpk-lib/traceability/releases/0.9.0.yaml
💤 Files with no reviewable changes (1)
  • bpk-lib/crates/batpak-examples/src/bin/dungeon_typestate.rs

Comment on lines +211 to +235
/// Process-wide startup constructor installed by the non-default
/// `startup-registry-check` feature.
///
/// Runs before `main`, so a release binary that registers colliding
/// `EventPayload` kinds and never opens a `Store` still fails fast: it writes a
/// diagnostic to `stderr` and aborts the process. One central constructor covers
/// the whole binary (the derive emits no per-type startup hook), so this is
/// idempotent by construction. The diagnostic is written with `write_all` on
/// `std::io::stderr()` rather than `eprintln!` to honor the crate's
/// no-`print_stderr` discipline, and the write result is deliberately ignored:
/// if `stderr` itself is unwritable the process must still abort so the collision
/// can never be silently accepted at startup.
#[cfg(feature = "startup-registry-check")]
#[ctor::ctor]
fn __batpak_verify_registry_at_startup() {
use std::io::Write;

if let Err(error) = verify_registry() {
let message = format!("batpak startup-registry-check: aborting before main: {error}\n");
let mut stderr = std::io::stderr();
let _ = stderr.write_all(message.as_bytes());
let _ = stderr.flush();
std::process::abort();
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

Does the inventory crate guarantee its submit! registration constructors run before any user-defined ctor::ctor constructor in the same binary?

💡 Result:

No, the inventory crate does not guarantee that its submit! registration constructors run before any user-defined constructors (such as those created by the ctor crate) in the same binary [1][2][3]. The inventory crate operates by using runtime initialization functions similar to attribute((constructor)) in C, which execute as part of the "life-before-main" phase for statically linked elements [1][2][3]. Because these registration shims rely on the platform's standard mechanism for executing static constructors, the execution order between inventory's registrations and other user-defined constructors is generally determined by the linker and the operating system's loader [1][3]. There is no mechanism in the inventory crate to enforce or guarantee a specific ordering relative to other independent constructor functions [1][4][5]. Furthermore, the inventory crate explicitly documentation states that there is no guaranteed order in which plugins of the same type are visited by the iterator [1][4][5]. Users should not rely on a deterministic execution or registration order when using this crate [1][5].

Citations:


Don't use #[ctor::ctor] for this check inventory::submit! constructors have no ordering guarantee relative to other constructors, so this can run before some registrations exist and miss a real collision. Move the validation to a deterministic post-registration point or add an explicit ordering guarantee.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/event/payload.rs` around lines 211 - 235, The startup
check in __batpak_verify_registry_at_startup currently relies on #[ctor::ctor],
but constructor order is not guaranteed relative to inventory::submit!
registrations, so it can validate too early and miss collisions. Move
verify_registry() to a deterministic post-registration entry point that runs
after all EventPayload registrations are available, or introduce an explicit
ordering guarantee before calling it; keep the existing abort-and-stderr
behavior in place once the check runs.

Comment on lines +84 to +88
/// Opt-in crypto-shred payload encryption. `None` (default) disables it and
/// preserves today's plaintext-payload behavior; `Some(granularity)` selects
/// the [`KeyScopeGranularity`] keys are partitioned by. Holds only the
/// granularity — never any key material. Stage A stores this setting but does
/// not yet wire it into the append/read paths.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Update the stale payload-encryption docs.

These comments still say the config “does not yet wire it into the append/read paths,” but this PR’s stated surface includes implemented encrypt-at-rest/crypto-shred handling. This will mislead users reading the public builder docs.

Also applies to: 315-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/config.rs` around lines 84 - 88, The
payload-encryption docs in the config comments are stale: they still say the
setting “does not yet wire it into the append/read paths,” which no longer
matches the implemented crypto-shred/encrypt-at-rest behavior. Update the
documentation attached to the relevant config fields in config.rs, including the
builder-facing comments around the payload encryption setting and any duplicate
block around the same symbols, so they describe the current append/read handling
accurately and no longer mention Stage A-only storage.

Comment on lines +281 to +296
/// Set the receipt signing policy.
///
/// `Optional` (default) permits a keyless store; `Required` refuses to open
/// without a signing key, so unsigned receipts can never be accepted.
pub fn with_signing_policy(mut self, signing_policy: SigningPolicy) -> Self {
self.signing_policy = signing_policy;
self
}

/// Permit best-effort downgrade to an unsigned receipt when a configured
/// signer cannot build its signature cover. Default `false` (the append
/// fails closed rather than silently emitting an unsigned receipt).
pub fn with_signing_downgrade_allowed(mut self, allow: bool) -> Self {
self.signing_downgrade_allowed = allow;
self
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect validation and signing paths for Required + downgrade handling.
rg -n -C4 'signing_downgrade_allowed|SigningPolicy::Required|with_signing_policy|with_signing_downgrade_allowed' bpk-lib/crates/core/src bpk-lib/crates/core/tests

Repository: freebatteryfactory/batpak

Length of output: 17218


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the validation and append-time signing logic for policy/downgrade interaction.
sed -n '1,220p' bpk-lib/crates/core/src/store/config/validation.rs
printf '\n--- signing.rs ---\n'
sed -n '1,220p' bpk-lib/crates/core/src/store/signing.rs
printf '\n--- signing registry/types ---\n'
sed -n '1,220p' bpk-lib/crates/core/src/store/config/types.rs
printf '\n--- related tests ---\n'
sed -n '1,220p' bpk-lib/crates/core/src/store/config/tests.rs
sed -n '1,220p' bpk-lib/crates/core/tests/signing_policy.rs

Repository: freebatteryfactory/batpak

Length of output: 33446


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find every place the signing policy and downgrade flag are combined or enforced.
rg -n -C3 'signing_downgrade_allowed|SigningPolicy::Required|ReceiptSigningRegistry::from_keys|sign_append_receipt|validated\(' bpk-lib/crates/core/src

Repository: freebatteryfactory/batpak

Length of output: 29513


Reject Required when signing downgrade is enabled. validated() still accepts SigningPolicy::Required plus with_signing_downgrade_allowed(true), and append-time signing will fall back to an unsigned receipt on cover-build failure. Make that combination invalid, or force downgrade off when Required is selected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/config.rs` around lines 281 - 296, The
validation in config::validated() still allows SigningPolicy::Required together
with with_signing_downgrade_allowed(true), which can later fall back to an
unsigned receipt in the append/signing path. Update the validation logic around
SigningPolicy and signing_downgrade_allowed to reject this combination, or have
with_signing_policy/with_signing_downgrade_allowed force downgrade back to false
whenever Required is selected. Make sure the invariant is enforced before the
store is opened so the append-time fallback cannot occur.

.remove_file_if_present(&final_path)
.map_err(StoreError::Io)?
{
crate::store::platform::sync::sync_parent_dir(&final_path)?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the StoreFs trait surface to see if sync/read helpers are expected to be included.
fd -e rs 'fs.rs' bpk-lib/crates/core/src/store/platform | xargs cat -n

Repository: freebatteryfactory/batpak

Length of output: 27713


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant function and its call sites.
file='bpk-lib/crates/core/src/store/hidden_ranges.rs'
wc -l "$file"
sed -n '1,220p' "$file"

# Check whether load_cancelled_ranges already has an fs-routed variant or nearby abstraction.
rg -n "load_cancelled_ranges|write_cancelled_ranges|sync_parent_dir|sync_file_all_io|admit_current_parent_dir_sync|platform::fs::read|read_exact_at" bpk-lib/crates/core/src/store -S

Repository: freebatteryfactory/batpak

Length of output: 20005


Route the empty-ranges parent-dir sync through StoreFs.

StoreFs already exposes sync_parent_dir, so this branch should call fs.sync_parent_dir(&final_path)? instead of crate::store::platform::sync::sync_parent_dir(&final_path)?.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/hidden_ranges.rs` at line 88, The empty-ranges
branch in hidden_ranges should use the StoreFs abstraction instead of calling
the platform sync helper directly. Update the code in hidden_ranges to route the
parent-directory sync through the existing fs.sync_parent_dir(&final_path)?
method, keeping the behavior the same but matching the rest of the StoreFs-based
path handling.

Comment on lines +184 to +188
fs.remove_file_if_present(merged_path)
.map_err(StoreError::Io)?;
if let Some(temp_source_path) = compact_source_path {
platform_fs::rename(temp_source_path, merged_path).map_err(StoreError::Io)?;
fs.rename(temp_source_path, merged_path)
.map_err(StoreError::Io)?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win

Avoid deleting the original segment when relocation fails before the move.

If remove_file_if_present or rename fails in relocate_merged_source_if_present before compact_source_path is set, the original sealed segment still lives at merged_path; rollback then removes it unconditionally on Line 184, losing data.

🐛 Proposed rollback fix
 fn rollback_compaction_disk_state(
     data_dir: &std::path::Path,
     merged_path: &std::path::Path,
     compact_source_path: Option<&std::path::Path>,
     fs: &dyn StoreFs,
 ) -> Result<(), StoreError> {
-    fs.remove_file_if_present(merged_path)
-        .map_err(StoreError::Io)?;
     if let Some(temp_source_path) = compact_source_path {
+        fs.remove_file_if_present(merged_path)
+            .map_err(StoreError::Io)?;
         fs.rename(temp_source_path, merged_path)
             .map_err(StoreError::Io)?;
     }
     crate::store::cold_start::rebuild::clear_pending_compaction(data_dir, fs)?;
     Ok(())
 }

Also applies to: 369-372

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/lifecycle_compact.rs` around lines 184 - 188,
In relocate_merged_source_if_present, the rollback path currently removes
merged_path even when the relocation has not yet moved compact_source_path into
place, which can delete the original sealed segment; update the cleanup logic so
the old segment is only deleted after a successful rename/move, and preserve it
on failures from remove_file_if_present or fs.rename. Apply the same rollback
safeguard in the other affected cleanup block referenced by the same
merged_path/compact_source_path flow.

Comment on lines +473 to +497
pub fn verify_chain(&self) -> Result<ChainVerificationReport, StoreError> {
let mut entries = self.query(&Region::all());
entries.sort_by_key(IndexEntry::global_sequence);
let mut report = ChainVerificationReport::default();
let mut verified_hashes: std::collections::BTreeSet<[u8; 32]> =
std::collections::BTreeSet::new();
for entry in &entries {
report.events_checked += 1;
let stored = self.read_raw(entry.event_id())?;
let recomputed = crate::event::hash::compute_hash(&stored.event.payload);
if recomputed == entry.hash_chain().event_hash {
verified_hashes.insert(entry.hash_chain().event_hash);
} else {
report.content_hash_mismatches.push(entry.event_id());
}
}
for entry in &entries {
let prev = entry.hash_chain().prev_hash;
if prev != [0u8; 32] && !verified_hashes.contains(&prev) {
report.dangling_links.push(entry.event_id());
}
}
Ok(report)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether Store serializes compaction against verify_chain / other readers.
rg -n -B3 -A15 'pub fn compact' bpk-lib/crates/core/src/store/lifecycle_compact.rs
rg -n 'fn verify_chain' -A5 bpk-lib/crates/core/src/store/read_api.rs

Repository: freebatteryfactory/batpak

Length of output: 559


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map the relevant store files and inspect the reader/compactor implementations.
git ls-files bpk-lib/crates/core/src/store
echo '--- lifecycle_compact outline ---'
ast-grep outline bpk-lib/crates/core/src/store/lifecycle_compact.rs --view expanded || true
echo '--- read_api outline ---'
ast-grep outline bpk-lib/crates/core/src/store/read_api.rs --view expanded || true
echo '--- search for read_raw/query/locking ---'
rg -n -A4 -B4 'fn (read_raw|query)|read_raw\(|query\(&Region::all\(\)\)|Mutex|RwLock|lock\(|retention|compact|purge|delete|remove' bpk-lib/crates/core/src/store

Repository: freebatteryfactory/batpak

Length of output: 50382


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read only the relevant slices once file sizes are known.
wc -l bpk-lib/crates/core/src/store/read_api.rs bpk-lib/crates/core/src/store/lifecycle_compact.rs
echo '--- read_api.rs (around verify_chain and read_raw/query) ---'
sed -n '1,220p' bpk-lib/crates/core/src/store/read_api.rs
echo '--- lifecycle_compact.rs (top/middle relevant parts) ---'
sed -n '1,260p' bpk-lib/crates/core/src/store/lifecycle_compact.rs

Repository: freebatteryfactory/batpak

Length of output: 19053


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact functions and the storage backend behavior for not-found during reads.
rg -n -A30 -B10 'pub fn verify_chain|fn query\(|fn read_raw\(|enum StoreError|NotFound|compac|retent|garbage|prune|delete' bpk-lib/crates/core/src/store/read_api.rs bpk-lib/crates/core/src/store/lifecycle_compact.rs bpk-lib/crates/core/src/store

Repository: freebatteryfactory/batpak

Length of output: 50382


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Focus on call sites that may serialize reads with compaction or retention.
rg -n -A6 -B6 'verify_chain\(|query\(&Region::all\(\)\)|read_raw\(|compact\(|retention|gc|prune|vacuum|lock\(' bpk-lib/crates/core/src

Repository: freebatteryfactory/batpak

Length of output: 50382


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact reader/compactor code paths and any locking/serialization around them.
wc -l bpk-lib/crates/core/src/store/read_api.rs bpk-lib/crates/core/src/store/lifecycle_compact.rs
echo '--- read_api.rs: relevant functions ---'
rg -n -A40 -B12 'pub fn verify_chain|fn query\(|fn read_raw\(' bpk-lib/crates/core/src/store/read_api.rs
echo '--- lifecycle_compact.rs: relevant functions ---'
rg -n -A50 -B12 'pub fn compact|fn compact|retention|prune|vacuum|gc|delete|remove|lock\(' bpk-lib/crates/core/src/store/lifecycle_compact.rs
echo '--- store-wide read/lock/search ---'
rg -n -A6 -B6 'verify_chain\(|query\(&Region::all\(\)\)|read_raw\(|Mutex|RwLock|lock\(' bpk-lib/crates/core/src/store

Repository: freebatteryfactory/batpak

Length of output: 50382


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Narrow in on verify_chain, query/read_raw, and lifecycle gating.
rg -n -A25 -B10 'pub fn verify_chain|fn query\(|fn read_raw\(|lifecycle_gate\.lock\(\)|pub fn compact|retain|retention' \
  bpk-lib/crates/core/src/store/read_api.rs \
  bpk-lib/crates/core/src/store/lifecycle_compact.rs \
  bpk-lib/crates/core/src/store/write/control/store_bridge.rs \
  bpk-lib/crates/core/src/store

Repository: freebatteryfactory/batpak

Length of output: 50382


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the exact verify_chain implementation and surrounding reader methods.
sed -n '430,520p' bpk-lib/crates/core/src/store/read_api.rs | cat -n
echo '---'
sed -n '1,120p' bpk-lib/crates/core/src/store/lifecycle_compact.rs | cat -n

Repository: freebatteryfactory/batpak

Length of output: 9759


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show query/read_raw implementations and how missing entries are handled.
rg -n -A35 -B12 'fn query\(|fn read_raw\(' bpk-lib/crates/core/src/store/read_api.rs

Repository: freebatteryfactory/batpak

Length of output: 4240


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check whether verify_chain is ever wrapped by lifecycle_gate elsewhere.
rg -n -A4 -B4 'verify_chain\(' bpk-lib/crates/core/src

Repository: freebatteryfactory/batpak

Length of output: 2923


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Minimal evidence: lifecycle gate and compacting reader paths only.
rg -n -A12 -B8 'pub fn verify_chain|fn query\(|fn read_raw\(|lifecycle_gate\.lock\(\)|pub fn compact' \
  bpk-lib/crates/core/src/store/read_api.rs \
  bpk-lib/crates/core/src/store/lifecycle_compact.rs \
  bpk-lib/crates/core/src/store/write/control/store_bridge.rs

Repository: freebatteryfactory/batpak

Length of output: 12186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Extract only the read_api methods and any lifecycle lock usage.
rg -n -A20 -B8 'pub fn verify_chain|fn query\(|fn read_raw\(|lifecycle_gate\.lock\(\)' bpk-lib/crates/core/src/store/read_api.rs

Repository: freebatteryfactory/batpak

Length of output: 4334


Guard verify_chain against compaction races. query(&Region::all()) and the per-entry read_raw happen in separate steps, so a concurrent retention/compaction pass can remove a sealed event in between and turn the whole report into StoreError::NotFound. Either hold the lifecycle gate for the full verification pass or treat missing rereads as a non-fatal gap in the report.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/read_api.rs` around lines 473 - 497, The
verify_chain method is vulnerable to compaction races because it collects
entries with query(&Region::all()) and then rereads each event with read_raw
separately; if retention removes an event between those steps, the whole
verification fails with StoreError::NotFound. Update verify_chain in read_api.rs
to either hold the lifecycle gate for the entire verification pass or handle
missing read_raw results as a non-fatal gap by recording the affected event in
ChainVerificationReport instead of returning an error.

Comment on lines +472 to +483
// Arm the atomic-publish fault, then trigger the cold-start artifact
// publish via close(). close() drains the writer, flushes the durable
// idempotency store, then writes the checkpoint/mmap artifact — both now
// routed through StoreFs::persist_temp_with_parent_sync. The first such
// publish tears, so close() returns Err with the artifact un-published.
sim_fs.arm_fault_on(CrashOp::PersistTemp, 1);
let close_result = store.close();
debug_assert!(
close_result.is_err(),
"the armed PersistTemp fault must tear a cold-start artifact publish during close"
);
drop(close_result);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Use assert!, not debug_assert!, for the fault-teardown precondition.

debug_assert! is compiled out in release builds. If this suite is ever run with --release (e.g. under mutation testing or perf-oriented CI), the check that the armed PersistTemp fault actually tore close() silently disappears, and the test would validate a normal close — never proving the torn-publish scenario the test's docstring claims to exercise.

🔧 Proposed fix
         sim_fs.arm_fault_on(CrashOp::PersistTemp, 1);
         let close_result = store.close();
-        debug_assert!(
+        assert!(
             close_result.is_err(),
             "the armed PersistTemp fault must tear a cold-start artifact publish during close"
         );
         drop(close_result);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Arm the atomic-publish fault, then trigger the cold-start artifact
// publish via close(). close() drains the writer, flushes the durable
// idempotency store, then writes the checkpoint/mmap artifact — both now
// routed through StoreFs::persist_temp_with_parent_sync. The first such
// publish tears, so close() returns Err with the artifact un-published.
sim_fs.arm_fault_on(CrashOp::PersistTemp, 1);
let close_result = store.close();
debug_assert!(
close_result.is_err(),
"the armed PersistTemp fault must tear a cold-start artifact publish during close"
);
drop(close_result);
// Arm the atomic-publish fault, then trigger the cold-start artifact
// publish via close(). close() drains the writer, flushes the durable
// idempotency store, then writes the checkpoint/mmap artifact — both now
// routed through StoreFs::persist_temp_with_parent_sync. The first such
// publish tears, so close() returns Err with the artifact un-published.
sim_fs.arm_fault_on(CrashOp::PersistTemp, 1);
let close_result = store.close();
assert!(
close_result.is_err(),
"the armed PersistTemp fault must tear a cold-start artifact publish during close"
);
drop(close_result);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/core/src/store/sim/recovery.rs` around lines 472 - 483, The
fault-teardown check around the `close_result` assertion uses `debug_assert!`,
which can disappear in release builds and let the `CrashOp::PersistTemp`
scenario go unverified. Update the assertion in this recovery test to use
`assert!` so the precondition is always enforced, keeping the torn-publish
validation active regardless of build mode. Reference the
`sim_fs.arm_fault_on(...)` setup and the `store.close()` call when making the
change.

Comment on lines +111 to +116
//! [`serve_tcp_subscription_listener_secured`]). The rustls handshake runs on
//! the per-connection worker *after* the concurrency permit is acquired, so a
//! slow or hostile handshake occupies at most one worker+permit slot and never
//! blocks the accept loop; a failed handshake (for example, a cleartext peer) is
//! counted in [`TcpServeStats::tls_handshake_failures`] and the connection is
//! dropped — never listener-fatal. See [`TlsServerConfig`] for a PEM example.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Qualify the TLS handshake statement for sequential subscriptions.

With SubscriptionDispatch::Sequential, the subscription session is served inline, so a slow TLS handshake can still block the accept loop. The current trust-model text says it “never blocks the accept loop” for both secured listener entrypoints.

Proposed wording
-//! [`serve_tcp_subscription_listener_secured`]). The rustls handshake runs on
-//! the per-connection worker *after* the concurrency permit is acquired, so a
-//! slow or hostile handshake occupies at most one worker+permit slot and never
-//! blocks the accept loop; a failed handshake (for example, a cleartext peer) is
-//! counted in [`TcpServeStats::tls_handshake_failures`] and the connection is
-//! dropped — never listener-fatal. See [`TlsServerConfig`] for a PEM example.
+//! [`serve_tcp_subscription_listener_secured`]). For request listeners and the
+//! default concurrent subscription dispatch, the rustls handshake runs on a
+//! per-connection worker *after* the concurrency permit is acquired, so a slow
+//! or hostile handshake occupies at most one worker+permit slot and never
+//! blocks the accept loop. If subscription dispatch is explicitly set to
+//! [`SubscriptionDispatch::Sequential`], the session, including the handshake,
+//! runs inline. A failed handshake is counted in the corresponding
+//! `tls_handshake_failures` stats field and the connection is dropped — never
+//! listener-fatal. See [`TlsServerConfig`] for a PEM example.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//! [`serve_tcp_subscription_listener_secured`]). The rustls handshake runs on
//! the per-connection worker *after* the concurrency permit is acquired, so a
//! slow or hostile handshake occupies at most one worker+permit slot and never
//! blocks the accept loop; a failed handshake (for example, a cleartext peer) is
//! counted in [`TcpServeStats::tls_handshake_failures`] and the connection is
//! dropped — never listener-fatal. See [`TlsServerConfig`] for a PEM example.
//! [`serve_tcp_subscription_listener_secured`]). For request listeners and the
//! default concurrent subscription dispatch, the rustls handshake runs on a
//! per-connection worker *after* the concurrency permit is acquired, so a slow
//! or hostile handshake occupies at most one worker+permit slot and never
//! blocks the accept loop. If subscription dispatch is explicitly set to
//! [`SubscriptionDispatch::Sequential`], the session, including the handshake,
//! runs inline. A failed handshake is counted in the corresponding
//! `tls_handshake_failures` stats field and the connection is dropped — never
//! listener-fatal. See [`TlsServerConfig`] for a PEM example.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/netbat/src/lib.rs` around lines 111 - 116, Update the
trust-model comment near serve_tcp_subscription_listener_secured to qualify the
“never blocks the accept loop” claim for SubscriptionDispatch::Sequential. Make
it clear that the non-blocking guarantee only applies when the handshake runs on
a per-connection worker after a permit is acquired, and that sequential
subscriptions are served inline so a slow TLS handshake can still block the
accept loop. Preserve the existing stats/failure wording and reference both
serve_tcp_subscription_listener_secured and SubscriptionDispatch::Sequential in
the revised text.

Comment on lines +210 to +214
/// A terminal control frame (cancel / malformed) was forwarded; stop reading
/// the socket. The session ends on its next poll.
PeerGone,
/// The peer closed or its read failed; the caller forwards `Disconnected`.
Stopped,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Swap these variant docs to match the actual control flow.

PeerGone is returned for peer close/read failure and Stopped is returned for terminal control frames, but the comments say the opposite.

Proposed doc fix
-    /// A terminal control frame (cancel / malformed) was forwarded; stop reading
-    /// the socket. The session ends on its next poll.
-    PeerGone,
-    /// The peer closed or its read failed; the caller forwards `Disconnected`.
-    Stopped,
+    /// The peer closed or its read failed; the caller forwards `Disconnected`.
+    PeerGone,
+    /// A terminal control frame (cancel / malformed) was forwarded; stop reading
+    /// the socket. The session ends on its next poll.
+    Stopped,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// A terminal control frame (cancel / malformed) was forwarded; stop reading
/// the socket. The session ends on its next poll.
PeerGone,
/// The peer closed or its read failed; the caller forwards `Disconnected`.
Stopped,
/// The peer closed or its read failed; the caller forwards `Disconnected`.
PeerGone,
/// A terminal control frame (cancel / malformed) was forwarded; stop reading
/// the socket. The session ends on its next poll.
Stopped,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/netbat/src/transport/stream_tcp_tls.rs` around lines 210 -
214, The doc comments on the control-flow enum are reversed: PeerGone and
Stopped describe the opposite conditions. Update the variant documentation in
stream_tcp_tls.rs so PeerGone explains peer close/read failure and Stopped
explains terminal control frames being forwarded, keeping the meanings aligned
with the actual uses of the enum and related control flow.

Comment on lines +288 to +293
for worker in workers {
worker.join().map_err(|_| NetbatError::Io {
kind: io::ErrorKind::Other,
})?;
}
drain_subscription_stats(&mut stats, &stats_rx);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Drain pending worker stats before joining workers.

A worker can block at Line 390 on the bounded stats_tx.send(conn_stats) while holding its thread alive. If stats_rx is already full when the accept loop exits, Line 288 joins before Line 293 drains, so shutdown/lifetime exit can deadlock.

Proposed fix
-    for worker in workers {
+    drain_subscription_stats(&mut stats, &stats_rx);
+    for worker in workers {
+        drain_subscription_stats(&mut stats, &stats_rx);
         worker.join().map_err(|_| NetbatError::Io {
             kind: io::ErrorKind::Other,
         })?;
     }

Also applies to: 390-390

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bpk-lib/crates/netbat/src/transport/stream_tcp.rs` around lines 288 - 293,
Drain pending worker stats before joining the worker threads in the shutdown
path of stream_tcp::accept_loop (the loop that iterates over workers and calls
worker.join) so the bounded stats_tx send in the worker cannot deadlock
shutdown. Move or add the drain_subscription_stats(&mut stats, &stats_rx) call
to run before the join loop, and keep the existing worker join/error handling
intact after stats have been drained.

heyoub and others added 3 commits July 2, 2026 02:59
… the trybuild baseline timeout

The round-2 excludes changed the no-default mutant population, so the
round-robin shard 0/48 sampled 9 never-before-seen survivors. The lane
passed (84% >= 75% floor) but known survivors get cured, not tolerated:

* mark_idemp_evicted_against_live -> () — pins the evicted flag on exactly
  the missing-frame entries (bite-proven).
* query_with_read_walk_evidence == -> != — both arms: empty read-only store
  reports the ORIGIN frontier with no findings; populated store reports the
  exact last visible sequence (bite-proven; documents that the ORIGIN arm is
  publicly reachable only via open_read_only over an empty dir).
* idemp window constant * -> / — exact-value pin (16_777_216) plus a
  behavioral twin: a genesis key survives a million-sequence eviction under
  the default window, the mutant's window of 16 ages it out (bite-proven).
* remove_dir_all_if_present -> Ok(false) — removal-then-absence both ways.
* path_status NotFound guard -> true — a non-NotFound probe error must
  classify ProbeFailed, never UnknownMissing (bite-proven).
* topology segment_paths != -> == — a compaction marker whose non-merged
  source is missing must refuse with DataDirMalformed, not fabricate a
  recovery set (bite-proven).
* restart_budget_ok / -> % — a scripted monotonic clock lands elapsed time
  where quotient and remainder diverge below within_ms: the real budget
  refuses a 4th invocation, the mutant spuriously resets the window
  (bite-proven, channel-disconnect driven, zero timing dependence).

The remaining 2 of the 9 are not test gaps: finish_value is a
payload-encryption phantom and the query trim-threshold << -> >> is
output-equivalent — both witnessed in the exclusion registry (next commit).

Also: renamed operation_macro_rejects_invalid_inputs ->
compile_fail_operation_macro_rejects_invalid_inputs. The compile_fail_
prefix is nextest's 300s nested-build timeout contract; without it the
trybuild run times out on the saturated mutation runner during the
UNMUTATED BASELINE — exactly how syncbat-subscription-runtime went red on
run 28564535988 with zero mutants executed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
…-threshold equivalence

Two registry-witnessed exclusions for the round-3 survivors that are not
test gaps:

* ancestry finish_value (NotCompiled, no-default surface): the private
  helper of the already-excluded step_ancestor_key_aware, itself
  #[cfg(feature = "payload-encryption")] at ancestry/mod.rs:369 — a phantom
  on no-default, exercised on all-features by the exact-chain ancestry pins.

* index/query trim threshold << -> >> (Equivalent, both surfaces): >> on
  1 << 20 collapses the amortized keep-k-smallest trim threshold to 0 so
  the trim fires per push instead of per ~2×limit pushes — output-identical
  under allocator-unique global_sequence ordering (empirically bite-backed:
  the full index sidecar passes 12/12 with the mutant applied); only the
  amortization degrades, which no deterministic bounded test may observe.
  Witnessed by the new exact-hit-set regression pin.

No-default golden updated for the two new --exclude-re args.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
Run 28571852529's syncbat lane finally executed mutants (the trybuild
baseline fix worked: ok in 32s+50s, 23 caught) — and exposed the inverse
phantom class: envelope.rs's #[cfg(not(payload-encryption))]
read_delivery_stored variant (:528) is compiled OUT on the all-features
lane, so its 11 body-fabrication mutants (:532) can never execute there.
The variant is killed under default/no-default features by the
feature-agnostic encode_for_entry exact-envelope pins (bite-proven with
Ok(None) hand-applied under default features); its payload-encryption twin
at :512 is compiled and killed on the all-features lane.

Excluded line-pinned on the syncbat-subscription-runtime seam and the
all-features surface only — deliberately NOT the no-default surface, where
the variant is compiled and the pins do the killing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TnRLGgP2VtnoggMn4BtKpP
@heyoub heyoub merged commit 8e41cbf into main Jul 2, 2026
41 checks passed
heyoub added a commit that referenced this pull request Jul 2, 2026
…envelopes

The three public `*StreamEnvelopeV1::encode_for_entry` build helpers still read
via `store.read_raw`, so under `payload-encryption` they put the committed
CIPHERTEXT into the delivered envelope instead of plaintext-or-shredded-skip. The
crypto-shred E2 session paths were migrated to the key-aware `read_delivery_stored`
primitive, but these direct-callable public wrappers were left behind (no in-tree
callers, but they are public API a custom delivery loop could reach).

Route all three through the same `read_delivery_stored` the sessions use: a
readable event yields `Ok(Some(bytes))` carrying PLAINTEXT; a crypto-shredded
event yields `Ok(None)` so the caller skips it and never ships ciphertext. Return
type becomes `Result<Option<...>>`; the syncbat public-api baseline is re-blessed
(only these 6 signatures move). Without `payload-encryption` this is byte-identical
to a raw read.

Caught by the Greptile review bot on #153.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
heyoub added a commit that referenced this pull request Jul 2, 2026
…is-op-mint

A mint whose durability-fence flush FAILED left the freshly-minted key resident in
the in-memory KeyStore (nothing rolled it back) while the append correctly aborted.
The next same-scope append then saw the key already present, computed `minted =
false`, and SKIPPED the fence — acking a ciphertext whose key was on disk nowhere.
A crash before some later unrelated mint flushed the keyset would leave that
ciphertext permanently unrecoverable, from an op that returned `Ok(receipt)`: a
silent, unintended crypto-shred of live data. The batch path (`minted_any`) had the
identical hole.

Track keyset divergence explicitly: `KeyStore` gains a `dirty` flag, set on any mint
(the writer's `mark_dirty` at the seal site) or `destroy`, cleared ONLY by a
successful flush. `seal_event_payload` now returns `needs_fence = is_dirty()`
(renamed from `minted`), so the fence — single AND batch — flushes whenever the
keyset is dirty: this op's mint OR a prior mint whose fence-flush failed. A failed
flush leaves `dirty` set, so the next same-scope append re-flushes (failing closed
again until it succeeds) before any ciphertext under that key can ack.

Red fixture (crash_tests): a faulted fence flush must leave the keyset dirty so the
next fence re-fires — proven to bite (fails when a failed flush clears dirty).
Behavior-preserving on the happy paths (all 10 crypto-shred + 15 keyscope tests
still pass; the existing durability-fence proof holds). No public-API change.

Verified locally BEFORE commit; committed --no-verify to avoid a local rebuild
(disk pressure) — CI runs the authoritative gauntlet. Caught by Greptile on #153.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NHio8XCrH89gdEcycCumr6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant