feat: agent safety — MCP tool annotations, guide, and alegra agent guard #43

Merged
jjuanrivvera99 merged 3 commits into develop from feat/agent-safety
Jun 11, 2026

@jjuanrivvera99

Member

Makes it easy — and honest — to stop an AI agent from running destructive accounting operations.

What's here

1. MCP tool safety annotations. alegra mcp now advertises standard MCP tool annotations on every tool: reads carry readOnlyHint, writes/actions carry destructiveHint. Hosts that honor them (e.g. Codex) gate destructive operations automatically, no config. Verified via tools/list: 121 read-only, 106 destructive. (Merged into cobra Annotations, so the completion-columns annotation is never clobbered.)
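
To make the annotation shape concrete, here is a minimal sketch of how a tools/list entry could carry the two hints. It is not the code from this PR: the `Tool` and `ToolAnnotations` types, the `Annotate` and `Encode` helpers, and the tool names `invoices_list` and `invoices_void` are all hypothetical; only the `readOnlyHint` and `destructiveHint` keys come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolAnnotations holds the two MCP hints named in the PR description.
// Pointers keep an unset hint out of the JSON instead of emitting false.
type ToolAnnotations struct {
	ReadOnlyHint    *bool `json:"readOnlyHint,omitempty"`
	DestructiveHint *bool `json:"destructiveHint,omitempty"`
}

// Tool is a pared-down, hypothetical tools/list entry.
type Tool struct {
	Name        string          `json:"name"`
	Annotations ToolAnnotations `json:"annotations"`
}

// Annotate marks a tool as read-only or destructive, never both.
func Annotate(name string, readOnly bool) Tool {
	yes := true
	t := Tool{Name: name}
	if readOnly {
		t.Annotations.ReadOnlyHint = &yes
	} else {
		t.Annotations.DestructiveHint = &yes
	}
	return t
}

// Encode renders a tool as compact JSON, or "" if encoding fails.
func Encode(t Tool) string {
	b, err := json.Marshal(t)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Hypothetical tool names, used only for illustration.
	fmt.Println(Encode(Annotate("invoices_list", true)))
	fmt.Println(Encode(Annotate("invoices_void", false)))
}
```

A host that honors the hints can gate on `destructiveHint`; a host that ignores them sees only the tool name, which is why the annotations count as advisory.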

2. alegra agent guard — generates the per-host safety config, with the destructive ops derived from the live command tree (always complete, zero maintenance):

  • --host claude-code: settings.json (deny/ask rules) + a PreToolUse hook that definitively blocks the irreversible ops on both the Bash and MCP surfaces (it inspects the real command/tool name, so glob tricks can't bypass it).
  • --host codex: config.toml (read-only sandbox + untrusted approval). Honestly notes Codex has no per-command deny hook.
  • --host opencode: opencode.json (deny/ask patterns for bash and MCP tool names).
  • Default: hard-block irreversible actions (delete, void, emit, stamp, close, and compound *-delete), require approval for ordinary writes (create/update/import), allow reads. --all-writes blocks writes too; --write installs the files (never overwriting an existing config).
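
The default policy above can be sketched as a walk over a command tree that assigns one of three tiers to each leaf. This is an illustration under stated assumptions, not the PR's implementation: `Command`, `Policy`, `Classify`, and `Walk` are hypothetical names, the real command derives its list from the live cobra tree, and sending unknown verbs to the approval tier is this sketch's own choice.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is what the generated config does with a command.
type Policy string

const (
	Allow Policy = "allow" // reads run freely
	Ask   Policy = "ask"   // ordinary writes need approval
	Deny  Policy = "deny"  // irreversible actions are hard-blocked
)

// Command is a hypothetical stand-in for one node of the CLI command tree.
type Command struct {
	Name     string
	Children []Command
}

var (
	irreversible = map[string]bool{"delete": true, "void": true, "emit": true, "stamp": true, "close": true}
	writes       = map[string]bool{"create": true, "update": true, "import": true}
	reads        = map[string]bool{"list": true, "get": true, "export": true}
)

// Classify maps a leaf verb to a policy. allWrites mirrors --all-writes.
func Classify(verb string, allWrites bool) Policy {
	switch {
	case irreversible[verb] || strings.HasSuffix(verb, "-delete"):
		return Deny
	case reads[verb]:
		return Allow
	case writes[verb] && allWrites:
		return Deny
	default:
		// Ordinary writes, plus (assumption) any verb this sketch does not know.
		return Ask
	}
}

// Walk records a policy for every leaf command, keyed by its full path.
func Walk(c Command, prefix string, allWrites bool, out map[string]Policy) {
	path := strings.TrimSpace(prefix + " " + c.Name)
	if len(c.Children) == 0 {
		out[path] = Classify(c.Name, allWrites)
		return
	}
	for _, child := range c.Children {
		Walk(child, path, allWrites, out)
	}
}

func main() {
	tree := Command{Name: "alegra", Children: []Command{
		{Name: "invoices", Children: []Command{{Name: "list"}, {Name: "create"}, {Name: "void"}}},
	}}
	rules := map[string]Policy{}
	Walk(tree, "", false, rules)
	for _, path := range []string{"alegra invoices list", "alegra invoices create", "alegra invoices void"} {
		fmt.Println(path, "=>", rules[path])
	}
}
```

Deriving the rules from the tree rather than from a hand-kept list is what lets a newly added command pick up a policy without anyone editing the guard.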

3. Agent Safety guide (EN + ES) — a layered, honest explanation: layer 1 (annotations, advisory) + layer 2 (host config/hooks, the real gate) + layer 3 (CLI built-ins). Cross-referenced from the MCP, agent-skill, and vs-official-MCP pages.

Honesty (the framing Juan asked for)

The guide and command are explicit that annotations only advise: a host that ignores them runs everything. The hard block differs per host. On Claude Code it is the PreToolUse hook; on OpenCode it is the deny rules; on Codex it is the read-only sandbox (approval otherwise). Verified that the generated hook blocks void/emit/delete/*-delete on both surfaces and lets reads/creates through.
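
The decision a hook like this has to make can be sketched as follows. This is not the generated hook: the payload field names (`tool_name`, `tool_input`, `command`), the `mcp__alegra__` tool-name prefix, and the underscore-separated tool names are assumptions about the host's hook contract and the server's naming, and `ShouldBlock` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"strings"
)

// payload is the assumed shape of the JSON a host passes to a pre-tool hook.
type payload struct {
	ToolName  string          `json:"tool_name"`
	ToolInput json.RawMessage `json:"tool_input"`
}

// mcpPrefix is an assumed naming scheme for this server's MCP tools.
const mcpPrefix = "mcp__alegra__"

var irreversible = map[string]bool{"delete": true, "void": true, "emit": true, "stamp": true, "close": true}

func isIrreversible(word string) bool {
	return irreversible[word] || strings.HasSuffix(word, "-delete")
}

// ShouldBlock inspects the actual command or tool name instead of matching
// a glob. It fails closed: input that cannot be parsed is blocked.
func ShouldBlock(raw []byte) (bool, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return true, err
	}
	switch {
	case p.ToolName == "Bash":
		var in struct {
			Command string `json:"command"`
		}
		if err := json.Unmarshal(p.ToolInput, &in); err != nil {
			return true, err
		}
		seenCLI := false
		for _, word := range strings.Fields(in.Command) {
			if filepath.Base(word) == "alegra" {
				seenCLI = true
				continue
			}
			if seenCLI && isIrreversible(word) {
				return true, nil
			}
		}
	case strings.HasPrefix(p.ToolName, mcpPrefix):
		tool := strings.TrimPrefix(p.ToolName, mcpPrefix)
		for _, part := range strings.Split(tool, "_") {
			if isIrreversible(part) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	samples := []string{
		`{"tool_name":"Bash","tool_input":{"command":"alegra invoices void 42"}}`,
		`{"tool_name":"Bash","tool_input":{"command":"alegra invoices list"}}`,
		`{"tool_name":"mcp__alegra__invoices_delete","tool_input":{"id":7}}`,
	}
	for _, s := range samples {
		block, _ := ShouldBlock([]byte(s))
		fmt.Println(block, s)
	}
}
```

A real hook would read the payload from standard input and signal the block through whatever exit code or response the host defines. The sketch errs toward blocking: any irreversible verb appearing after the CLI name stops the call, even if it is only an argument value.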

Verification

  • make check: gofmt, vet, golangci-lint (0 issues), tests green.
  • New tests: TestClassifyAPICommands, TestGuard_*, TestMCPHints (annotations contract).
  • Functional test of the generated hook against sample Bash + MCP payloads.
  • mkdocs build --strict passes (EN + ES).

Docs-only changes ride along; this is a feature release (0.9.0).

alegra mcp now advertises MCP tool annotations (readOnlyHint /
destructiveHint) on every tool, so hosts that honor them (e.g. Codex)
gate destructive operations automatically. Read commands (list/get/
export, catalog, reports, doctor, version, items stock) are marked
read-only; create/update/delete and custom actions (void/emit/stamp/…)
are marked accordingly. Merged into cobra Annotations so the completion
columns annotation is never clobbered.

Adds docs/user-guide/agent-safety.md (EN + ES): a layered, honest guide
to gating destructive operations — annotations (advisory) + per-host
enforcement (Claude Code permissions & PreToolUse hooks, Codex sandbox/
approval, OpenCode permission rules) + CLI built-ins. Cross-referenced
from the MCP, skill, and vs-official-MCP pages.

New command that generates the agent-safety config for Claude Code,
Codex, or OpenCode, with the destructive operations derived from the
live command tree (always complete, zero maintenance).

- default: hard-block irreversible actions (delete, void, emit, stamp,
  close, and compound *-delete actions), require approval for ordinary
  writes (create/update/import), allow reads
- Claude Code: emits a PreToolUse hook that definitively blocks the
  irreversible ops on BOTH the Bash and MCP surfaces (it inspects the
  real command/tool name, so glob tricks can't bypass it), plus
  permission deny/ask rules
- Codex: read-only sandbox + untrusted approval (it has no per-command
  hook; honestly noted)
- OpenCode: permission deny/ask patterns for bash and MCP tool names
- flags: --all-writes (block writes too), --write (install files,
  never overwriting an existing config)

Verified the generated hook blocks void/emit/delete/*-delete on both
surfaces and lets reads/creates through. Documents it as the quick path
in the Agent Safety guide.
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@codecov-commenter

Codecov Report

❌ Patch coverage is 89.09774% with 29 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| commands/agent_hosts.go | 84.6% | 10 Missing and 8 partials ⚠️ |
| commands/agent.go | 90.2% | 11 Missing ⚠️ |

@jjuanrivvera99 jjuanrivvera99 merged commit 7b1ff81 into develop Jun 11, 2026
7 checks passed