fix(churn): speak warpline-mcp's newline JSON-RPC + honest paging/keying disclosure by tachyon-beep · Pull Request #77 · foundryside-dev/loomweave

tachyon-beep · 2026-06-28T07:48:41Z

Problem

The Warpline churn consumer was a live NO-GO: entity_high_churn_list / entity_recent_change_list hung whenever integrations.warpline.enabled: true. Two transport bugs, both confirmed against warpline's source and reproduced live on /home/john/lacuna:

Framing mismatch. The client drove warpline-mcp over Loomweave's Content-Length plugin framing (ADR-002), but warpline-mcp reads newline-delimited JSON-RPC (for line in sys.stdin, one response line per request line). Warpline never parsed the request → the read blocked forever.
Wrong launcher (found only in live validation). The default command was warpline mcp, which is not a valid warpline subcommand (usage error → broken pipe). The MCP stdio server ships as the standalone warpline-mcp binary.

Plus: the call omitted the required repo, sent an unsupported actor, and had no timeout.

The fix

Transport (loomweave-federation/src/warpline.rs)

Newline-delimited JSON-RPC; the whole handshake+call runs on a worker thread bounded by recv_timeout + kill-on-timeout, so a hung warpline degrades to the honest warpline-unreachable response instead of hanging. stderr → Stdio::null so a large traceback can't block the child.
Default launcher resolves to warpline-mcp (not warpline mcp).
Send required repo: <project_root>; drop the unsupported actor; parse result.structuredContent, falling back to result.content[0].text.
Config: keep WarplineConfig.actor (reserved, for deny_unknown_fields back-compat with configs that set it) and add an honored timeout_seconds (default 10).

Honesty floor — two distinct zeros must not be conflated with a genuine never-observed 0:

Send limit = refs.len() so warpline's default 100-item page doesn't silently cap the join at the top-100-by-churn.
churn_truncated + overflow_partial(): warpline keeps a 200-item in-band lead and spills the rest, so an over-cap scope's truncated tail is disclosed, not shipped as fabricated zeros.
churn_unresolved + unresolved_ref_count(): warpline returns locator: null for an unresolved SEI ref, so a keying-miss 0 ("real churn UNKNOWN") is disclosed rather than read as "this code never changes."

Disabled / unreachable / empty degrade paths (warpline-disabled, warpline-unreachable, churn_source: "warpline", signal) are preserved.

Validation

Live on /home/john/lacuna (release + uv-installed binary):

enabled → real nonzero ranked churn with churn_truncated{counted:200,total:580}
recent_change → real recency, no hang
scoped → churn_unresolved{count:54}
disabled → honest-empty

564 federation+mcp tests pass (+5 new: real-subprocess transport, timeout-not-hang, default-command guard, both disclosures); fmt + workspace clippy -D warnings + cargo doc -D warnings clean.

Follow-ups (filed as observations, not in this PR)

Deep-pagination via warpline's overflow dump for >200-candidate scopes.
The loomweave↔warpline locator-dialect + NULL-sei keying gap that produces the unresolved refs (federation-contract fix).
The same Content-Length-vs-newline framing latent in filigree.rs.

🤖 Generated with Claude Code

…urn/recently_changed The entity_high_churn_list / entity_recent_change_list MCP surfaces were dead-by-design: loomweave does not populate git_churn_count in v1.0 and, by the loomweave<->warpline seam HARD RULE, retains no cross-run history. This wires the read-time join to Warpline's FROZEN warpline_entity_churn_count_get (warpline.entity_churn_count.v1, 2026-06-13 interface lock SS1A / GV-LW-2). Read-only / enrich-only / dependency-sink: - New WarplineLookup trait (single method, the churn read) + WarplineMcpClient (MCP-stdio subprocess, mirrors the Filigree consumer pattern), injected as an Option<Arc<dyn ..>> defaulting to None. Lives in loomweave-federation. - Candidate universe = the entity catalogue (new entities_for_churn_candidates query), NOT the empty git_churn_count scan; warpline holds the counts. Scope-filter BEFORE the warpline call, then rank the scoped set by the returned counts. SEI-keyed refs with locator fallback (never drops a candidate). One bounded call, joined at read time, retained nowhere. - Honest-degrade (lock SS1C): warpline disabled/unreachable -> honest-empty with a warpline-named missing-signal note + churn_source provenance; never empty-as-clean, never a hard error breaking the core flow. GV-LW-2 is an executable test: the full frozen envelope fixture parsed through the real parse path, driven via an injected fake (no live MCP call from the hub context). Both honest-degrade paths (disabled, unreachable) are tested. Blast radius: READS the frozen warpline_entity_churn_count_get contract only; mints no new sibling obligation (the producer already ships). Live cross-member validation against a real Warpline, deep-pagination beyond warpline's limit, and the >CHURN_SCAN_CAP single-frame request bound are tracked follow-ups. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ing disclosure The Warpline churn consumer was a live NO-GO: entity_high_churn_list / entity_recent_change_list hung when integrations.warpline.enabled=true. Two transport bugs, both confirmed against warpline's source and live on lacuna: 1. Framing mismatch. The client drove warpline over Loomweave's Content-Length plugin framing, but warpline-mcp reads NEWLINE-delimited JSON-RPC (`for line in sys.stdin`, one response line per request line) — so warpline never parsed the request and the read blocked forever. 2. Wrong launcher (found only live). The default command was `warpline mcp`, which is not a valid warpline subcommand (usage error -> broken pipe). The MCP stdio server ships as the standalone `warpline-mcp` binary. Transport rewrite (loomweave-federation/src/warpline.rs): - Newline-delimited JSON-RPC; the whole handshake+call runs on a worker thread bounded by recv_timeout + kill-on-timeout, so a hung warpline degrades to the honest `warpline-unreachable` response instead of hanging (req #5). stderr is discarded (Stdio::null) so a large traceback can't block the child. - Default launcher resolves to `warpline-mcp` (not `warpline mcp`). - Send required `repo: <project_root>` (req #2); drop the unsupported `actor` param (req #3) — it is not in warpline's frozen churn schema. Parse result.structuredContent, falling back to content[0].text (req #4). - Config: keep WarplineConfig.actor (reserved, for deny_unknown_fields back-compat with configs that set it) and add an honored timeout_seconds (default 10), retiring the stale "no timeout knob" doc. Honesty floor (two zeros must not be conflated with a never-observed 0): - Send limit = refs.len() so warpline's default 100-item page does not silently cap the join at the top-100-by-churn. - churn_truncated + ChurnCountResponse::overflow_partial(): warpline keeps a 200-item in-band lead and spills the rest (apply_overflow), so an over-cap scope's truncated tail is DISCLOSED, not shipped as fabricated zeros. - churn_unresolved + unresolved_ref_count(): warpline returns locator:null for an unresolved SEI ref (a resolve hit always sets the NOT-NULL locator), so a keying-miss 0 ("real churn UNKNOWN") is disclosed rather than read as "this code never changes". Closes the disclosure asymmetry with churn_truncated. Disabled / unreachable / empty degrade paths (warpline-disabled, warpline-unreachable, churn_source:"warpline", signal) are preserved (req #6). Validated live on /home/john/lacuna (release + uv-installed binary): enabled -> real nonzero ranked churn with churn_truncated{counted:200,total:580}; recent_change -> real recency, no hang; scoped -> churn_unresolved{count:54}; disabled -> honest-empty. 564 federation+mcp tests pass; fmt + workspace clippy + cargo doc all clean. Follow-ups (filed as filigree observations, not addressed here): deep-pagination via warpline's overflow dump for >200-candidate scopes; the loomweave<->warpline locator-dialect + NULL-sei keying gap that produces the unresolved refs; and the same Content-Length-vs-newline framing latent in filigree.rs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30549a3af1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-28T07:56:52Z

+                let (candidates, scan_truncated) =
+                    entities_for_churn_candidates(conn, CHURN_SCAN_CAP)?;


Apply scope before capping churn candidates

When an index has more than CHURN_SCAN_CAP entities, this fetches the first 50k rows ordered by id and only then applies the requested scope. For a scoped query targeting an entity or path whose rows sort after that cap, in_scope becomes empty, no refs are sent to Warpline, and the tool can report no recorded changes even though the exact scoped entity has churn data. The candidate query needs to be constrained by the resolved scope before applying the cap, or exact/id scopes should be materialized directly.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-28T07:56:52Z

+                        WarplineEntityRef::for_entity(&e.id, sei.as_deref())
+                    })
+                    .collect();
+                let response = match client.entity_churn_counts(&refs, mode.window().as_ref()) {


Release the reader before calling Warpline

This call runs inside readers.with_reader, so the SQLite reader connection is held while the code spawns and waits on the Warpline subprocess (up to the configured timeout, 10s by default). With Warpline enabled but slow or hung, concurrent churn requests can occupy the reader pool and block unrelated MCP reads even though the database is not needed during the external round trip. Collect the refs, drop the reader before the subprocess call, then reacquire a reader to render the page.

Useful? React with 👍 / 👎.

…t-Length) (#78) filigree.rs's MCP stdio client (the observation_create / observation_dismiss path, used by propose_guidance and guidance promotion) framed requests with Loomweave's Content-Length plugin framing, but filigree-mcp uses the official MCP Python SDK (`mcp.server.stdio.stdio_server`), whose stdio transport is NEWLINE-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77). Verified empirically against the installed filigree-mcp: a newline-delimited initialize gets a clean result; a Content-Length-framed one makes filigree-mcp emit an "Internal Server Error" notification, after which loomweave's Content-Length reader cannot parse filigree's newline responses and the call hangs. The HTTP read path (issues_for / entity-associations) was unaffected — only the stdio observation seam was broken. Fix (mirrors the warpline transport fix): - write_mcp_json / read_mcp_json now use newline framing: one compact JSON line + \n; responses read line-by-line, skipping non-matching ids (the init result and the notification's id:null error), EOF-before-match surfaced as an error. - Extracted run_mcp_tool_over_command(program, args, root, timeout, tool, args): the handshake+call runs on a worker thread bounded by recv_timeout + kill, so a hung filigree-mcp degrades instead of blocking forever (FILIGREE_MCP_TIMEOUT, 10s). stderr -> Stdio::null so a large traceback can't block the child. The resolved command is a parameter, so the transport is unit-testable with an injected fake newline server (no env mutation — set_var is unsafe under edition 2024 + unsafe_code=deny). - Last-resort launcher fallback `("filigree", ["mcp"])` -> `filigree-mcp` (the real binary); `filigree mcp` is not a valid subcommand. The happy path still resolves `python -m filigree.mcp_server` via `filigree mcp-status`. TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and a real-subprocess happy-path + timeout-not-hang test driving a fake newline server. 131 federation tests pass; fmt + clippy (federation/mcp/cli, -D warnings) + cargo doc clean. Live-probed against the real filigree-mcp on /home/john/lacuna (newline initialize+tools/call round-trips; Content-Length errors). Residual (pre-existing, NOT bounded here): resolve_filigree_mcp_command runs `filigree mcp-status --json` via a plain blocking .output() before the timeout-bounded section, so a hung mcp-status is an unbounded wait outside the new deadline. Short-lived in practice; bounding it is a follow-up. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…, warpline #77 in flight); PDR-0006

…; both warpline pieces shipped PR #77 (warpline churn transport) merged to main at 1d2b4fa concurrent with the keying-gap work; the docs from PR #79 still called it in-flight/open. Move it to Shipped, empty the federation-transport In-flight section, and drop it as a DECIDE candidate. Truthfulness fix only; no decision change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tachyon-beep and others added 2 commits June 27, 2026 12:49

chatgpt-codex-connector Bot reviewed Jun 28, 2026

View reviewed changes

tachyon-beep mentioned this pull request Jun 28, 2026

fix(federation): drive filigree-mcp over newline JSON-RPC (was Content-Length) #78

Merged

tachyon-beep added a commit that referenced this pull request Jun 28, 2026

product: federation-transport reliability cycle (filigree #78 shipped…

2b5b64c

…, warpline #77 in flight); PDR-0006

tachyon-beep merged commit 1d2b4fa into main Jun 29, 2026
4 checks passed

tachyon-beep deleted the feat/warpline-churn-consumer branch June 29, 2026 09:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(churn): speak warpline-mcp's newline JSON-RPC + honest paging/keying disclosure#77

fix(churn): speak warpline-mcp's newline JSON-RPC + honest paging/keying disclosure#77
tachyon-beep merged 2 commits into
mainfrom
feat/warpline-churn-consumer

tachyon-beep commented Jun 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		let (candidates, scan_truncated) =
		entities_for_churn_candidates(conn, CHURN_SCAN_CAP)?;

Uh oh!

Conversation

tachyon-beep commented Jun 28, 2026

Problem

The fix

Validation

Follow-ups (filed as observations, not in this PR)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant