feat(proxy): tool definition drift detection (on_tool_drift) (#25) #59
Merged
…#25)
- writeAuditRecord now accepts a pre-snapshotted `{ event, mode }` drift struct so hot-reload cannot alter the audited mode after the call is gated
- Document ACTION_SEVERITY ranking semantics (enforcement intent, not reachability)
- Add three tests: log+flag_destructive current-claim escalation, require_approval timeout defaultOnTimeout allow, log mode dry_run rule stricter-of-both
This was referenced Jun 11, 2026
Description
Closes the MCP rug-pull / tool-poisoning gap reported in #25: Helio previously adopted every upstream `tools/list` definition wholesale. The annotation cache was cleared and repopulated on each refresh and never looked at input schemas, so a tool whose `readOnlyHint`/`destructiveHint` flipped after review silently rewrote its own baseline and matching policy rules stopped firing.

- `ToolAnnotationCache` switches to baseline-and-diff: the entire tool definition (`annotations`, `inputSchema`, `description`, `outputSchema`, `title`, plus an `other` catch-all) is fingerprinted on first sight (canonical JSON, hardened against JSON-parsed `__proto__` keys) and diffed on every `tools/list`. Baselines survive tool removal, so remove/re-add cycles can't reset them; reverting to the baseline clears the drift state.
- `policies.on_tool_drift: block | require_approval | log` (default `block`). Drifted tools are denied, escalated through the approval channel, or logged, respectively. In `log` mode, rules are evaluated against both the baseline and current annotations and the stricter decision wins (`dry_run` outranks the limit actions because it never forwards), so a definition flip cannot weaken enforcement in either direction.
- A `tools/list` that repeats a name is ambiguous, so the tool is marked drifted (`aspect: duplicate`) and cannot be un-drifted by entry ordering within the same payload (closes a drift-suppression bypass found in external review).
- Audit records (`policy_decision: tool_drift` / `tool_drift_reverted`, immediate flush) are written from both the runtime `tools/list` path and startup priming; blocked calls get `block_reason: tool_definition_drift` with self-repair feedback; gated/logged calls carry `evidence_chain.tool_drift { mode, changes }` with the mode snapshotted at gate time (hot-reload safe).
- Drift records are excluded from the `allowed_total` and `top_tools` aggregates so analytics keep representing tool-call outcomes; they remain visible in the feed, totals, and by-decision breakdown.

Behavior change (conservative default): a tool whose definition changes mid-session is now denied until the proxy restarts (re-baselines) or the upstream reverts. Set `policies.on_tool_drift: log` for observe-mode. Flagged in the CHANGELOG and the startup log.

Known limitations (tracked follow-ups): baselines are in-memory per process, so a restart re-baselines from whatever the upstream currently reports (follow-up issue filed; startup log warns explicitly). Drift-escalated approval tickets do not yet show the drift detail to the approver.
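For reviewers who want a mental model of the baseline-and-diff behaviour described above, here is a minimal sketch. It is not the code in this PR: `BaselineCache`, `fingerprintTool`, `canonicalJson`, and the `ToolDefinition` shape are hypothetical names, the fingerprint is the canonical string itself rather than a hash, and the handling of a tool first seen as a duplicate is an assumption.

```typescript
// Hypothetical shape of one entry in a tools/list payload.
type ToolDefinition = { name: string; [aspect: string]: unknown };
type Fingerprints = Record<string, string>;

// Aspects named in the description; everything else lands in "other".
const ASPECTS = ["annotations", "inputSchema", "description", "outputSchema", "title"];

// Sorted-key serialization, so key order never changes the fingerprint.
// Keys are only read via Object.keys, so a JSON-parsed "__proto__" key is
// treated as ordinary data and never touches a prototype.
function canonicalJson(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
  return `{${body.join(",")}}`;
}

function fingerprintTool(tool: ToolDefinition): Fingerprints {
  const prints: Fingerprints = {};
  for (const aspect of ASPECTS) {
    prints[aspect] = aspect in tool ? canonicalJson(tool[aspect]) : "<absent>";
  }
  // Null-prototype object: assigning a "__proto__" key creates a plain own property.
  const other: Record<string, unknown> = Object.create(null);
  for (const key of Object.keys(tool)) {
    if (key !== "name" && ASPECTS.indexOf(key) === -1) other[key] = tool[key];
  }
  prints.other = canonicalJson(other);
  return prints;
}

class BaselineCache {
  private baselines = new Map<string, Fingerprints>();
  private drifted = new Map<string, string[]>();

  // Process one tools/list payload; returns tool name -> drifted aspects.
  observe(tools: ToolDefinition[]): Map<string, string[]> {
    const counts = new Map<string, number>();
    for (const t of tools) counts.set(t.name, (counts.get(t.name) ?? 0) + 1);

    for (const tool of tools) {
      // A repeated name is ambiguous: gate it regardless of entry order.
      if ((counts.get(tool.name) ?? 0) > 1) {
        this.drifted.set(tool.name, ["duplicate"]);
        continue;
      }
      const current = fingerprintTool(tool);
      let baseline = this.baselines.get(tool.name);
      if (!baseline) {
        baseline = current; // first sight: record the baseline
        this.baselines.set(tool.name, current);
      }
      const base = baseline;
      const changed = Object.keys(current).filter((a) => current[a] !== base[a]);
      if (changed.length > 0) this.drifted.set(tool.name, changed);
      else this.drifted.delete(tool.name); // reverting clears the drift state
    }
    // Tools missing from the payload keep their baseline and drift state.
    return new Map(this.drifted);
  }
}
```

In this sketch a removed tool is simply not visited, which is what keeps a remove/re-add cycle from resetting its baseline.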
Credit: Maaz (Interlock) on the MCP Discord for the report.
Closes #25
Type of Change
Packages Affected
- packages/proxy
- packages/dashboard
- packages/python-sdk
- docs/
- examples/

Checklist
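- No `any` types or `@ts-ignore` without justification
- Checks pass (`pnpm secrets:scan`, `pnpm docs:check:ci`, `pnpm audit --audit-level=high`, `pnpm build`, `pnpm lint`, `pnpm format:check`, `pnpm typecheck`, `pnpm test`)
- Commit messages use conventional prefixes (`feat:`, `fix:`, `docs:`)

How to Test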
1. Start the proxy and check the startup log for `Annotation cache primed: <n> tool definitions baselined for drift detection …`.
2. Change a tool's definition upstream (so the change shows up in the next `tools/list`); confirm the console drift warning, a `tool_drift` audit record, and that a subsequent `tools/call` to that tool is denied with `block_reason: tool_definition_drift`.
3. Set `policies.on_tool_drift: log` and repeat: the call proceeds, the audit record carries `evidence_chain.tool_drift`, and a deny rule matching either the baseline or the current annotations still fires (stricter-of-both).
4. Revert the upstream definition: a `tool_drift_reverted` record is written and the tool unblocks.
5. Serve a `tools/list` with a duplicated tool name: the tool is gated (`drifted_aspects: ["duplicate"]`) regardless of entry order.
6. `pnpm --filter @gethelio/proxy test`: full suite (1434 tests) passes, including the `tool definition drift` detection and call-gating suites.

Additional Context
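As additional context, the stricter-of-both evaluation that `log` mode applies to drifted tools can be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: the action names other than `dry_run`, the numeric ranks, the first-match rule model, and the helper names `evaluate`, `stricter`, and `evaluateDrifted` are all hypothetical. The only ordering taken from this PR is that `dry_run` outranks the limit actions.

```typescript
// Hypothetical action set; only "dry_run above the limit actions" comes from the PR.
type Action = "allow" | "rate_limit" | "dry_run" | "require_approval" | "deny";

// Ranks express enforcement intent (how restrictive the outcome is meant to be),
// not whether a given action is reachable for a particular call.
const ACTION_SEVERITY: Record<Action, number> = {
  allow: 0,
  rate_limit: 1, // stands in for the "limit actions"
  dry_run: 2, // never forwards the call, so it outranks limits
  require_approval: 3, // assumed position
  deny: 4,
};

type Annotations = { readOnlyHint?: boolean; destructiveHint?: boolean };
type Rule = { action: Action; matches: (a: Annotations) => boolean };

// Assumed rule model: first matching rule wins, default allow.
function evaluate(rules: Rule[], annotations: Annotations): Action {
  for (const rule of rules) {
    if (rule.matches(annotations)) return rule.action;
  }
  return "allow";
}

function stricter(a: Action, b: Action): Action {
  return ACTION_SEVERITY[a] >= ACTION_SEVERITY[b] ? a : b;
}

// For a drifted tool, evaluate against both claims and keep the stricter result,
// so a flip in either direction cannot weaken enforcement.
function evaluateDrifted(rules: Rule[], baseline: Annotations, current: Annotations): Action {
  return stricter(evaluate(rules, baseline), evaluate(rules, current));
}
```

With a deny rule on `destructiveHint`, a tool that flips from destructive to read-only is still denied via its baseline, and one that flips the other way is denied via its current claim.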