feat(proxy): tool definition drift detection (on_tool_drift) (#25) by olivrg · Pull Request #59 · gethelio/helio

olivrg · 2026-06-11T16:24:03Z

Description

Closes the MCP rug-pull / tool-poisoning gap reported in #25. Helio previously adopted every upstream tools/list definition wholesale: the annotation cache was cleared and repopulated on each refresh, and input schemas were never inspected. A tool whose readOnlyHint/destructiveHint flipped after review therefore silently rewrote its own baseline, and matching policy rules stopped firing.

Rewrite ToolAnnotationCache to baseline-and-diff: the entire tool definition (annotations, inputSchema, description, outputSchema, title, plus an other catch-all) is fingerprinted on first sight (canonical JSON, hardened against JSON-parsed __proto__ keys) and diffed on every tools/list. Baselines survive tool removal, so remove/re-add cycles can't reset them; reverting to the baseline clears the drift state.
Add policies.on_tool_drift: block | require_approval | log (default block). Drifted tools are denied / escalated through the approval channel / logged. In log mode, rules are evaluated against both the baseline and current annotations and the stricter decision wins (dry_run outranks the limit actions because it never forwards), so a definition flip cannot weaken enforcement in either direction.
Fail closed on duplicate tool names: a tools/list that repeats a name is ambiguous, so the tool is marked drifted (aspect: duplicate) and cannot be un-drifted by entry ordering within the same payload (closes a drift-suppression bypass found in external review).
Audit every drift event (policy_decision: tool_drift / tool_drift_reverted, immediate flush) from both the runtime tools/list path and startup priming; blocked calls get block_reason: tool_definition_drift with self-repair feedback; gated/logged calls carry evidence_chain.tool_drift { mode, changes } with the mode snapshotted at gate time (hot-reload safe).
Exclude drift-event records from the dashboard's allowed_total and top_tools aggregates so analytics keep representing tool-call outcomes; they remain visible in the feed, totals, and by-decision breakdown.
Policy evaluation now always uses the baseline annotations — the definitions the operator reviewed — never the latest upstream claim.
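The baseline-and-diff idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual ToolAnnotationCache code: the names `canonicalJson`, `fingerprint`, and `BaselineCache`, the use of SHA-256, and the per-aspect digest layout are all assumptions made for the example. It also treats a missing aspect the same as `null`, which the real implementation may not do.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type ToolDefinition = { name: string; [key: string]: Json };

// Aspect names taken from the PR description; "other" is the catch-all.
const ASPECTS = ["annotations", "inputSchema", "description", "outputSchema", "title"] as const;
type Aspect = (typeof ASPECTS)[number] | "other";

const byKey = ([a]: [string, Json], [b]: [string, Json]): number => (a < b ? -1 : a > b ? 1 : 0);

// Serialize with object keys sorted, so logically equal definitions produce
// the same string regardless of upstream key order. Object.entries reads own
// properties only, so a JSON-parsed "__proto__" key is serialized like any
// other key (assumed hardening approach, not confirmed by the source).
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  const entries = Object.entries(value).sort(byKey);
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
}

function digest(value: Json): string {
  return createHash("sha256").update(canonicalJson(value)).digest("hex");
}

// One digest per aspect, so a diff can report which part of the definition changed.
function fingerprint(tool: ToolDefinition): Record<Aspect, string> {
  const known = new Set<string>([...ASPECTS, "name"]);
  const rest = Object.entries(tool).filter(([key]) => !known.has(key)).sort(byKey);
  const result = { other: digest(rest) } as Record<Aspect, string>;
  for (const aspect of ASPECTS) result[aspect] = digest(tool[aspect] ?? null);
  return result;
}

class BaselineCache {
  private readonly baselines = new Map<string, Record<Aspect, string>>();

  // Returns the aspects that differ from the baseline; an empty array means no drift.
  observe(tool: ToolDefinition): Aspect[] {
    const current = fingerprint(tool);
    const baseline = this.baselines.get(tool.name);
    if (baseline === undefined) {
      // First sight: record the baseline. It is never overwritten or deleted,
      // so a remove/re-add cycle cannot reset it.
      this.baselines.set(tool.name, current);
      return [];
    }
    return (Object.keys(current) as Aspect[]).filter((a) => baseline[a] !== current[a]);
  }
}
```

Because the cache only compares against the stored baseline, calling `observe` again with the originally reviewed definition returns an empty list, which corresponds to the "reverting clears the drift state" behavior.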

Behavior change (conservative default): a tool whose definition changes mid-session is now denied until the proxy restarts (re-baselines) or the upstream reverts. Set policies.on_tool_drift: log for observe-mode. Flagged in the CHANGELOG and the startup log.

Known limitations (tracked follow-ups): baselines are in-memory per process — a restart re-baselines from whatever the upstream currently reports (follow-up issue filed; startup log warns explicitly). Drift-escalated approval tickets do not yet show the drift detail to the approver.

Credit: Maaz (Interlock) on the MCP Discord for the report.

Closes #25

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactor (no functional changes)
Documentation
CI / build / tooling

Packages Affected

Checklist

I have read CONTRIBUTING.md
My code follows the existing style (ESLint + Prettier pass)
TypeScript strict mode — no any types or @ts-ignore without justification
I have added or updated tests for my changes
All CI checks pass (pnpm secrets:scan, pnpm docs:check:ci, pnpm audit --audit-level=high, pnpm build, pnpm lint, pnpm format:check, pnpm typecheck, pnpm test)
I have updated documentation if this changes user-facing behavior
Commit messages follow Conventional Commits (e.g. feat:, fix:, docs:)

How to Test

Point Helio at any MCP server, start it, and note the startup line Annotation cache primed: <n> tool definitions baselined for drift detection ….
Change a tool's annotations, schema, or description on the upstream mid-session (or replay a modified tools/list); confirm the console drift warning, a tool_drift audit record, and that a subsequent tools/call to that tool is denied with block_reason: tool_definition_drift.
Set policies.on_tool_drift: log and repeat: the call proceeds, the audit record carries evidence_chain.tool_drift, and a deny rule matching either the baseline or the current annotations still fires (stricter-of-both).
Revert the upstream definition: a tool_drift_reverted record is written and the tool unblocks.
Send a tools/list with a duplicated tool name: the tool is gated (drifted_aspects: ["duplicate"]) regardless of entry order.
pnpm --filter @gethelio/proxy test — full suite (1434 tests) passes, including the tool definition drift detection and call-gating suites.
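For the duplicate-name step, the order-independence property can be illustrated with a small sketch. The helper name `findDuplicateNames` is hypothetical and is not taken from the Helio codebase; the point is only that a name is flagged once it appears twice, whichever entry comes first.

```typescript
// Return every tool name that appears more than once in a tools/list payload.
// Detection depends only on the count of each name, not on entry order.
function findDuplicateNames(tools: ReadonlyArray<{ name: string }>): Set<string> {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const { name } of tools) {
    if (seen.has(name)) duplicates.add(name);
    seen.add(name);
  }
  return duplicates;
}

// The same payload in two orders flags the same name.
const benignFirst = [{ name: "delete_file" }, { name: "read_file" }, { name: "delete_file" }];
const benignLast = [...benignFirst].reverse();
console.log([...findDuplicateNames(benignFirst)], [...findDuplicateNames(benignLast)]);
```

A caller would then mark each returned name as drifted with the duplicate aspect before any per-entry baseline comparison runs, so a benign copy of the entry cannot mask a modified one.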

Additional Context

…#25)
- writeAuditRecord now accepts a pre-snapshotted `{ event, mode }` drift struct so hot-reload cannot alter the audited mode after the call is gated
- Document ACTION_SEVERITY ranking semantics (enforcement intent, not reachability)
- Add three tests: log+flag_destructive current-claim escalation, require_approval timeout defaultOnTimeout allow, log mode dry_run rule stricter-of-both
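The stricter-of-both evaluation used in log mode can be sketched as a severity comparison. The action names and numeric ranks below are assumptions for illustration; the only ordering the PR states is that dry_run outranks the limit actions. `ACTION_SEVERITY` is named in the commit message, but its real contents are not shown there.

```typescript
// Hypothetical action set and ranks; only "dry_run above limit" comes from the PR text.
type Action = "allow" | "limit" | "dry_run" | "deny";

const ACTION_SEVERITY: Record<Action, number> = { allow: 0, limit: 1, dry_run: 2, deny: 3 };

interface Annotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

// A policy rule maps a tool's annotations to an action.
type Rule = (annotations: Annotations) => Action;

function stricter(a: Action, b: Action): Action {
  return ACTION_SEVERITY[a] >= ACTION_SEVERITY[b] ? a : b;
}

// Evaluate the rule against both views of the tool and keep the stricter result,
// so a flip in either direction cannot weaken enforcement.
function decideInLogMode(rule: Rule, baseline: Annotations, current: Annotations): Action {
  return stricter(rule(baseline), rule(current));
}

const denyDestructive: Rule = (a) => (a.destructiveHint === true ? "deny" : "allow");
const reviewed: Annotations = { readOnlyHint: true, destructiveHint: false };
const claimed: Annotations = { readOnlyHint: false, destructiveHint: true };

console.log(decideInLogMode(denyDestructive, reviewed, claimed)); // deny
console.log(decideInLogMode(denyDestructive, claimed, reviewed)); // deny
```

Ranking by enforcement intent rather than reachability means the comparison stays a plain lookup: the decision does not depend on which of the two annotation sets happened to trigger the rule.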

olivrg added 13 commits June 11, 2026 12:52

feat(proxy): baseline-and-diff tool definition cache (#25)

a86ee66

fix(proxy): harden definition fingerprint against __proto__ keys (#25)

812c57d

feat(proxy): policies.on_tool_drift config option (#25)

2ef6ad3

docs: fix anchor for tool definition drift section (#25)

ad27704

feat(proxy): tool_definition_drift self-repair feedback (#25)

d8b6dd0

feat(proxy): audit tool definition drift events (#25)

d6f338b

feat(proxy): gate tools/call on drifted tool definitions (#25)

d33aead

feat(proxy): exclude drift events from audit call aggregates (#25)

a89577b

docs: changelog for tool definition drift detection (#25)

c656877

docs: update prime log example for drift baselining (#25)

a1493e0

fix(proxy): fail closed on duplicate tool names in tools/list (#25)

a1025df

fix(proxy): rank dry_run above limit actions in drift log mode (#25)

3c2e6a8

olivrg merged commit 92db2ef into main Jun 11, 2026
3 checks passed

olivrg deleted the fix/tool-definition-drift branch June 11, 2026 16:31

This was referenced Jun 11, 2026

Tool definition drift undetected - annotation/schema cache replaces wholesale, no baseline-and-diff (MCP rug-pull) #25

Closed

docs: add changelog entry for v0.4.0 #61

Merged

olivrg merged 13 commits into main from fix/tool-definition-drift
