fix(churn): speak warpline-mcp's newline JSON-RPC + honest paging/keying disclosure#77
Conversation
…urn/recently_changed The entity_high_churn_list / entity_recent_change_list MCP surfaces were dead-by-design: loomweave does not populate git_churn_count in v1.0 and, by the loomweave<->warpline seam HARD RULE, retains no cross-run history. This wires the read-time join to Warpline's FROZEN warpline_entity_churn_count_get (warpline.entity_churn_count.v1, 2026-06-13 interface lock SS1A / GV-LW-2). Read-only / enrich-only / dependency-sink: - New WarplineLookup trait (single method, the churn read) + WarplineMcpClient (MCP-stdio subprocess, mirrors the Filigree consumer pattern), injected as an Option<Arc<dyn ..>> defaulting to None. Lives in loomweave-federation. - Candidate universe = the entity catalogue (new entities_for_churn_candidates query), NOT the empty git_churn_count scan; warpline holds the counts. Scope-filter BEFORE the warpline call, then rank the scoped set by the returned counts. SEI-keyed refs with locator fallback (never drops a candidate). One bounded call, joined at read time, retained nowhere. - Honest-degrade (lock SS1C): warpline disabled/unreachable -> honest-empty with a warpline-named missing-signal note + churn_source provenance; never empty-as-clean, never a hard error breaking the core flow. GV-LW-2 is an executable test: the full frozen envelope fixture parsed through the real parse path, driven via an injected fake (no live MCP call from the hub context). Both honest-degrade paths (disabled, unreachable) are tested. Blast radius: READS the frozen warpline_entity_churn_count_get contract only; mints no new sibling obligation (the producer already ships). Live cross-member validation against a real Warpline, deep-pagination beyond warpline's limit, and the >CHURN_SCAN_CAP single-frame request bound are tracked follow-ups. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing disclosure The Warpline churn consumer was a live NO-GO: entity_high_churn_list / entity_recent_change_list hung when integrations.warpline.enabled=true. Two transport bugs, both confirmed against warpline's source and live on lacuna: 1. Framing mismatch. The client drove warpline over Loomweave's Content-Length plugin framing, but warpline-mcp reads NEWLINE-delimited JSON-RPC (`for line in sys.stdin`, one response line per request line) — so warpline never parsed the request and the read blocked forever. 2. Wrong launcher (found only live). The default command was `warpline mcp`, which is not a valid warpline subcommand (usage error -> broken pipe). The MCP stdio server ships as the standalone `warpline-mcp` binary. Transport rewrite (loomweave-federation/src/warpline.rs): - Newline-delimited JSON-RPC; the whole handshake+call runs on a worker thread bounded by recv_timeout + kill-on-timeout, so a hung warpline degrades to the honest `warpline-unreachable` response instead of hanging (req #5). stderr is discarded (Stdio::null) so a large traceback can't block the child. - Default launcher resolves to `warpline-mcp` (not `warpline mcp`). - Send required `repo: <project_root>` (req #2); drop the unsupported `actor` param (req #3) — it is not in warpline's frozen churn schema. Parse result.structuredContent, falling back to content[0].text (req #4). - Config: keep WarplineConfig.actor (reserved, for deny_unknown_fields back-compat with configs that set it) and add an honored timeout_seconds (default 10), retiring the stale "no timeout knob" doc. Honesty floor (two zeros must not be conflated with a never-observed 0): - Send limit = refs.len() so warpline's default 100-item page does not silently cap the join at the top-100-by-churn. - churn_truncated + ChurnCountResponse::overflow_partial(): warpline keeps a 200-item in-band lead and spills the rest (apply_overflow), so an over-cap scope's truncated tail is DISCLOSED, not shipped as fabricated zeros. - churn_unresolved + unresolved_ref_count(): warpline returns locator:null for an unresolved SEI ref (a resolve hit always sets the NOT-NULL locator), so a keying-miss 0 ("real churn UNKNOWN") is disclosed rather than read as "this code never changes". Closes the disclosure asymmetry with churn_truncated. Disabled / unreachable / empty degrade paths (warpline-disabled, warpline-unreachable, churn_source:"warpline", signal) are preserved (req #6). Validated live on /home/john/lacuna (release + uv-installed binary): enabled -> real nonzero ranked churn with churn_truncated{counted:200,total:580}; recent_change -> real recency, no hang; scoped -> churn_unresolved{count:54}; disabled -> honest-empty. 564 federation+mcp tests pass; fmt + workspace clippy + cargo doc all clean. Follow-ups (filed as filigree observations, not addressed here): deep-pagination via warpline's overflow dump for >200-candidate scopes; the loomweave<->warpline locator-dialect + NULL-sei keying gap that produces the unresolved refs; and the same Content-Length-vs-newline framing latent in filigree.rs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30549a3af1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let (candidates, scan_truncated) = | ||
| entities_for_churn_candidates(conn, CHURN_SCAN_CAP)?; |
There was a problem hiding this comment.
Apply scope before capping churn candidates
When an index has more than CHURN_SCAN_CAP entities, this fetches the first 50k rows ordered by id and only then applies the requested scope. For a scoped query targeting an entity or path whose rows sort after that cap, in_scope becomes empty, no refs are sent to Warpline, and the tool can report no recorded changes even though the exact scoped entity has churn data. The candidate query needs to be constrained by the resolved scope before applying the cap, or exact/id scopes should be materialized directly.
Useful? React with 👍 / 👎.
| WarplineEntityRef::for_entity(&e.id, sei.as_deref()) | ||
| }) | ||
| .collect(); | ||
| let response = match client.entity_churn_counts(&refs, mode.window().as_ref()) { |
There was a problem hiding this comment.
Release the reader before calling Warpline
This call runs inside readers.with_reader, so the SQLite reader connection is held while the code spawns and waits on the Warpline subprocess (up to the configured timeout, 10s by default). With Warpline enabled but slow or hung, concurrent churn requests can occupy the reader pool and block unrelated MCP reads even though the database is not needed during the external round trip. Collect the refs, drop the reader before the subprocess call, then reacquire a reader to render the page.
Useful? React with 👍 / 👎.
…t-Length) (#78) filigree.rs's MCP stdio client (the observation_create / observation_dismiss path, used by propose_guidance and guidance promotion) framed requests with Loomweave's Content-Length plugin framing, but filigree-mcp uses the official MCP Python SDK (`mcp.server.stdio.stdio_server`), whose stdio transport is NEWLINE-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77). Verified empirically against the installed filigree-mcp: a newline-delimited initialize gets a clean result; a Content-Length-framed one makes filigree-mcp emit an "Internal Server Error" notification, after which loomweave's Content-Length reader cannot parse filigree's newline responses and the call hangs. The HTTP read path (issues_for / entity-associations) was unaffected — only the stdio observation seam was broken. Fix (mirrors the warpline transport fix): - write_mcp_json / read_mcp_json now use newline framing: one compact JSON line + \n; responses read line-by-line, skipping non-matching ids (the init result and the notification's id:null error), EOF-before-match surfaced as an error. - Extracted run_mcp_tool_over_command(program, args, root, timeout, tool, args): the handshake+call runs on a worker thread bounded by recv_timeout + kill, so a hung filigree-mcp degrades instead of blocking forever (FILIGREE_MCP_TIMEOUT, 10s). stderr -> Stdio::null so a large traceback can't block the child. The resolved command is a parameter, so the transport is unit-testable with an injected fake newline server (no env mutation — set_var is unsafe under edition 2024 + unsafe_code=deny). - Last-resort launcher fallback `("filigree", ["mcp"])` -> `filigree-mcp` (the real binary); `filigree mcp` is not a valid subcommand. The happy path still resolves `python -m filigree.mcp_server` via `filigree mcp-status`. TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and a real-subprocess happy-path + timeout-not-hang test driving a fake newline server. 131 federation tests pass; fmt + clippy (federation/mcp/cli, -D warnings) + cargo doc clean. Live-probed against the real filigree-mcp on /home/john/lacuna (newline initialize+tools/call round-trips; Content-Length errors). Residual (pre-existing, NOT bounded here): resolve_filigree_mcp_command runs `filigree mcp-status --json` via a plain blocking .output() before the timeout-bounded section, so a hung mcp-status is an unbounded wait outside the new deadline. Short-lived in practice; bounding it is a follow-up. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, warpline #77 in flight); PDR-0006
…; both warpline pieces shipped PR #77 (warpline churn transport) merged to main at 1d2b4fa concurrent with the keying-gap work; the docs from PR #79 still called it in-flight/open. Move it to Shipped, empty the federation-transport In-flight section, and drop it as a DECIDE candidate. Truthfulness fix only; no decision change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
The Warpline churn consumer was a live NO-GO:
entity_high_churn_list/entity_recent_change_listhung wheneverintegrations.warpline.enabled: true. Two transport bugs, both confirmed against warpline's source and reproduced live on/home/john/lacuna:warpline-mcpover Loomweave's Content-Length plugin framing (ADR-002), butwarpline-mcpreads newline-delimited JSON-RPC (for line in sys.stdin, one response line per request line). Warpline never parsed the request → the read blocked forever.warpline mcp, which is not a valid warpline subcommand (usage error → broken pipe). The MCP stdio server ships as the standalonewarpline-mcpbinary.Plus: the call omitted the required
repo, sent an unsupportedactor, and had no timeout.The fix
Transport (
loomweave-federation/src/warpline.rs)recv_timeout+ kill-on-timeout, so a hung warpline degrades to the honestwarpline-unreachableresponse instead of hanging.stderr → Stdio::nullso a large traceback can't block the child.warpline-mcp(notwarpline mcp).repo: <project_root>; drop the unsupportedactor; parseresult.structuredContent, falling back toresult.content[0].text.WarplineConfig.actor(reserved, fordeny_unknown_fieldsback-compat with configs that set it) and add an honoredtimeout_seconds(default 10).Honesty floor — two distinct zeros must not be conflated with a genuine never-observed
0:limit = refs.len()so warpline's default 100-item page doesn't silently cap the join at the top-100-by-churn.churn_truncated+overflow_partial(): warpline keeps a 200-item in-band lead and spills the rest, so an over-cap scope's truncated tail is disclosed, not shipped as fabricated zeros.churn_unresolved+unresolved_ref_count(): warpline returnslocator: nullfor an unresolved SEI ref, so a keying-miss0("real churn UNKNOWN") is disclosed rather than read as "this code never changes."Disabled / unreachable / empty degrade paths (
warpline-disabled,warpline-unreachable,churn_source: "warpline",signal) are preserved.Validation
Live on
/home/john/lacuna(release + uv-installed binary):churn_truncated{counted:200,total:580}recent_change→ real recency, no hangchurn_unresolved{count:54}564 federation+mcp tests pass (+5 new: real-subprocess transport, timeout-not-hang, default-command guard, both disclosures);
fmt+ workspaceclippy -D warnings+cargo doc -D warningsclean.Follow-ups (filed as observations, not in this PR)
filigree.rs.🤖 Generated with Claude Code