[AAASM-2943] 🐛 (pydantic-ai): Patch concrete FunctionToolset.call_tool so function-tool governance fires #120

Merged
Chisanan232 merged 4 commits into master from v0.0.1/AAASM-2943/pydantic_function_toolset
Jun 14, 2026

@Chisanan232


Description

Function tools registered with @agent.tool_plain / @agent.tool execute through the concrete pydantic_ai.toolsets.function.FunctionToolset.call_tool, whose MRO is FunctionToolset → AbstractToolset. FunctionToolset overrides call_tool WITHOUT calling super().call_tool(...). The AAASM-2923 patch only patched AbstractToolset.call_tool (the base), so for the most common tool type the patch was shadowed: a denied function tool ran without raising PolicyViolationError — governance was silently bypassed.
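
The shadowing described above can be reproduced without Pydantic AI installed. The following is a minimal sketch using stand-in classes that only borrow the real class names; the method bodies and the `patch_base_only` helper are hypothetical and exist purely for illustration.

```python
import asyncio


class AbstractToolset:
    """Stand-in for the abstract base (not the real pydantic_ai class)."""

    async def call_tool(self, name):
        return f"base ran {name}"


class FunctionToolset(AbstractToolset):
    """Stand-in for the concrete class: overrides call_tool, never calls super()."""

    async def call_tool(self, name):
        return f"function tool ran {name}"


def patch_base_only():
    """Hypothetical base-only patch: replaces AbstractToolset.call_tool with a denier."""
    original = AbstractToolset.call_tool

    async def governed(self, name):
        raise PermissionError(f"{name} denied")

    AbstractToolset.call_tool = governed
    return original


original = patch_base_only()
try:
    # Attribute lookup finds FunctionToolset.call_tool first in the MRO,
    # so the patched base method is never reached and nothing is raised.
    result = asyncio.run(FunctionToolset().call_tool("blocked_tool"))
finally:
    AbstractToolset.call_tool = original  # undo the patch

print(result)
```

Because the override does not delegate to the base, a patch on the base alone has no effect on instances of the concrete class.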

This PR mirrors the Google ADK concrete-class approach already in this file (_load_google_adk_concrete_tool_classes):

  • Adds _load_pydantic_ai_concrete_toolset_classes(...) which loads FunctionToolset explicitly and generically discovers any AbstractToolset subclass that defines its OWN call_tool (checked via vars(cls)) in pydantic_ai.toolsets.
  • apply() patches each discovered concrete class in addition to the abstract base; revert() symmetrically reverts them (idempotent).
  • The patched-flag check now reads the class's OWN __dict__ (vars(cls)), so a patched base never masks a concrete subclass.
  • Everything stays fail-soft when Pydantic AI is absent (ImportError → no-op).
  • The full governance flow is preserved: check_tool_start → pending/approval → deny → PolicyViolationError → spawn context → record result.
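
The discovery, own-`__dict__` flag check, and idempotent revert listed above might look roughly like the sketch below. It is not the adapter's code: the helper names, the flag attribute names, and the fake module are assumptions, and plain synchronous methods are used for brevity.

```python
import importlib
import inspect
import types

_FLAG = "_governance_patched"        # hypothetical marker attribute
_ORIGINAL = "_governance_original"   # hypothetical slot for the unpatched method


def load_module_or_none(name):
    """Fail-soft import: a missing framework turns into a no-op, not an error."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def discover_concrete(module, base):
    """Subclasses of `base` in `module` that define their OWN call_tool."""
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls is not base and issubclass(cls, base) and "call_tool" in vars(cls)
    ]


def apply(classes, make_wrapper):
    for cls in classes:
        # vars(cls) reads only the class's own __dict__; getattr() would also
        # see a flag inherited from an already-patched base.
        if vars(cls).get(_FLAG):
            continue
        original = vars(cls)["call_tool"]
        setattr(cls, _ORIGINAL, original)
        setattr(cls, "call_tool", make_wrapper(original))
        setattr(cls, _FLAG, True)


def revert(classes):
    for cls in classes:
        if not vars(cls).get(_FLAG):
            continue  # already reverted, so a second call is harmless
        setattr(cls, "call_tool", vars(cls)[_ORIGINAL])
        delattr(cls, _ORIGINAL)
        delattr(cls, _FLAG)


class Base:
    def call_tool(self, name):
        return "base"


class Func(Base):
    def call_tool(self, name):  # own override, no super()
        return "func"


class Inheriting(Base):
    pass  # no own call_tool, so it is covered by the base patch


fake = types.ModuleType("fake_toolsets")  # stand-in for the real toolsets module
fake.Base, fake.Func, fake.Inheriting = Base, Func, Inheriting


def make_wrapper(original):
    def wrapper(self, name):
        return "governed:" + original(self, name)
    return wrapper


concrete = discover_concrete(fake, Base)
targets = [Base, *concrete]
apply(targets, make_wrapper)
patched = (Base().call_tool("t"), Func().call_tool("t"), Inheriting().call_tool("t"))
revert(targets)
revert(targets)  # second revert is a no-op
restored = (Base().call_tool("t"), Func().call_tool("t"))
```

Patching the base first and the concrete class second is the case the own-`__dict__` check exists for: with a `getattr`-based check, `Func` would look patched as soon as `Base` was.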

This fixes the function-tool shadowing gap introduced by AAASM-2923 and unblocks AAASM-2939 (pending a release).

Type of Change

  • ✨ New feature
  • 🔧 Bug fix
  • ♻️ Refactoring
  • 🍀 Performance improvement
  • 📚 Documentation update
  • 🚀 Release

Breaking Changes

  • No
  • Yes (please describe below)

No public API change. pydantic-ai is NOT added as a runtime dependency.

Related Issues

  • Related JIRA ticket: AAASM-2943 (fixes the function-tool shadowing gap from AAASM-2923; unblocks AAASM-2939 pending a release)
  • Related GitHub issues: N/A

Testing

Describe the testing performed for this PR:

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No tests required (explain why)

Details:

  • Unit (offline, always run): fake-class tests in test/unit/adapters/pydantic_ai/test_pydantic_ai_patch.py model a FunctionToolset-style subclass overriding call_tool without super(). They assert discovery finds it, apply() patches both base and concrete (own-dict flags), invoking the denied tool through the concrete call_tool raises PolicyViolationError, revert() restores both and is idempotent, and discovery is fail-soft without Pydantic AI.
  • Integration (empirical proof): test/integration/test_pydantic_ai_interception_integration.py now builds a real Agent(TestModel(call_tools=["blocked_tool"])) with an @agent.tool_plain tool and asserts the denied tool raises PolicyViolationError after apply(). Verified end-to-end against pydantic-ai 1.107.0 locally (installed dev-only, not committed as a dependency). Also confirmed that with only the base patch (master behavior) the denied tool does NOT raise — proving this change is what closes the gap. The same file's pre-existing real-tool-class test had a stale assertion (on >=0.3.0 Tool has no _run, so the flag lands on AbstractToolset); it now asserts the version-appropriate hook and reverts afterward.
  • Gate: uv sync; ruff check . and ruff format --check . — zero new findings (189 ruff findings are pre-existing on master and unrelated to the adapter); mypy agent_assembly — 58 errors, identical to master, none in changed files; pytest test/ — 432 passed, 11 skipped (the two real-library tests skip when pydantic-ai is absent). With pydantic-ai installed, all 43 adapter/integration tests pass.
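
As a rough illustration of what the fake-class unit tests assert, the sketch below wraps a concrete `call_tool` and checks that a denied tool raises before its body runs. Everything here is a stand-in: `FakeInterceptor`, `governed`, and the local `PolicyViolationError` are hypothetical, and the pending/approval and spawn-context steps of the real flow are omitted.

```python
import asyncio


class PolicyViolationError(Exception):
    """Local stand-in for the SDK's exception of the same name."""


class FakeInterceptor:
    """Hypothetical policy check: denies any tool name in `denied`."""

    def __init__(self, denied):
        self.denied = set(denied)
        self.recorded = []

    def check_tool_start(self, name):
        return "deny" if name in self.denied else "allow"

    def record_result(self, name, result):
        self.recorded.append((name, result))


def governed(original, interceptor):
    """Wrap an async call_tool: check first, deny by raising, then record."""

    async def call_tool(self, name, *args, **kwargs):
        if interceptor.check_tool_start(name) == "deny":
            raise PolicyViolationError(f"tool {name!r} denied by policy")
        result = await original(self, name, *args, **kwargs)
        interceptor.record_result(name, result)
        return result

    return call_tool


class FakeFunctionToolset:
    """Models a concrete toolset whose call_tool does not call super()."""

    def __init__(self):
        self.ran = []

    async def call_tool(self, name):
        self.ran.append(name)
        return f"{name} ok"


interceptor = FakeInterceptor(denied=["blocked_tool"])
FakeFunctionToolset.call_tool = governed(FakeFunctionToolset.call_tool, interceptor)

toolset = FakeFunctionToolset()
allowed = asyncio.run(toolset.call_tool("safe_tool"))
try:
    asyncio.run(toolset.call_tool("blocked_tool"))
    denied_raised = False
except PolicyViolationError:
    denied_raised = True
```

The useful assertion is not only that the exception is raised but that the denied tool's body never executed, which `toolset.ran` makes observable.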

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated if needed
  • All tests passing

🤖 Generated with Claude Code

Function tools (@agent.tool_plain / @agent.tool) execute through
pydantic_ai.toolsets.function.FunctionToolset.call_tool, which overrides
AbstractToolset.call_tool WITHOUT calling super(), so the base-class patch
from AAASM-2923 was shadowed and a denied function tool ran without raising
PolicyViolationError.

Discover concrete AbstractToolset subclasses that define their own call_tool
(FunctionToolset explicitly plus generic discovery in pydantic_ai.toolsets)
and patch each in addition to the abstract base. Patched-flag checks now read
the class's own __dict__ so a patched base never masks a concrete subclass,
and revert() symmetrically reverts the concrete classes. Stays fail-soft when
Pydantic AI is absent.

Refs AAASM-2943

Fake-class unit tests modelling a FunctionToolset-style subclass that
overrides call_tool WITHOUT super(). Assert discovery finds it, apply()
patches both base and concrete classes (own-dict flags), a denied tool
invoked through the concrete call_tool raises PolicyViolationError, revert()
restores both and is idempotent, and discovery is fail-soft when Pydantic AI
is absent.

Refs AAASM-2943

Add an integration test that builds a real Agent(TestModel(call_tools=[...]))
with an @agent.tool_plain tool and asserts the denied tool raises
PolicyViolationError after apply() — proving function-tool governance fires
through the concrete FunctionToolset.call_tool on Pydantic AI >=0.3.0.

Also fix the stale assertion in the existing real-tool-class test: on
>=0.3.0 Tool has no _run, so the patched flag lands on AbstractToolset, not
Tool. Assert the version-appropriate hook and revert() after the test.

Refs AAASM-2943

codecov Bot commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


…roof

The real-library integration tests for the Pydantic AI adapter are guarded
by importorskip, so without the framework installed they skip in CI. The
`dev` group includes `test`, and the integration-test job installs `dev`,
so adding pydantic-ai here makes the function-tool governance regression
(test_pydantic_ai_real_function_tool_deny_raises_after_apply) actually run.
Dev/test-only — NOT a runtime dependency of the SDK.

Refs AAASM-2943
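
The skip-when-absent behaviour described in this commit can be illustrated with the standard library alone. The PR's tests use pytest's importorskip; the sketch below is a `unittest` analogue under that assumption, and the test class name is hypothetical.

```python
import importlib.util
import unittest

# True only when the optional framework is importable in this environment.
HAS_PYDANTIC_AI = importlib.util.find_spec("pydantic_ai") is not None


@unittest.skipUnless(HAS_PYDANTIC_AI, "pydantic-ai not installed")
class RealLibraryOnlyTest(unittest.TestCase):
    """Runs only where the framework is installed; otherwise reported as skipped."""

    def test_framework_available(self):
        self.assertTrue(HAS_PYDANTIC_AI)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RealLibraryOnlyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test still counts as a successful run, which is why a guard like this can hide a regression until the framework is added to the environment that CI installs.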

@Chisanan232


✅ Claude Code review — ready to merge

CI: all green. Analyze (python), build-and-test (unit + integration + contract), Test Type Imports / Verify PEP 561 Compliance (the type-check workflow, triggered here by the pyproject.toml/uv.lock change), Build documentation, CI Success, SonarCloud, and codecov/patch all pass. The e2e / Deploy jobs are skipped by design. Nothing to fix.

Notably, after adding pydantic-ai to the test group (option b), the real-library integration test now executes in CI (no longer importorskip-skipped) and passes — so the function-tool governance regression is guarded going forward, not just locally.

Scope vs AAASM-2943 acceptance criteria

| Criterion | Result |
| --- | --- |
| Patch concrete toolset class(es) executing function tools, in addition to the abstract base | _load_pydantic_ai_concrete_toolset_classes loads FunctionToolset explicitly + generically discovers own-call_tool overriders in pydantic_ai.toolsets |
| context7-confirmed target | FunctionToolset (in pydantic_ai.toolsets.function) overrides call_tool without super() → base patch shadowed; _AgentFunctionToolset inherits it |
| Own-__dict__ (vars(cls)) flag checks so a patched base never masks a concrete subclass | |
| revert() symmetric & idempotent; fail-soft when Pydantic AI absent | ✅ reverts concrete classes too |
| Governance flow preserved (check → pending/approval → deny → PolicyViolationError → spawn → record) | |
| Offline fake-class unit tests for a call_tool-override-without-super() subclass | |
| Empirical proof the deny path now raises | ✅ real Agent(TestModel) + @agent.tool_plain integration test asserts PolicyViolationError; a control (base-only patch) confirmed the denied tool does not raise on master, proving the fix closes the gap, not just that the test passes |
| ruff / mypy / pytest green | ✅ zero new findings vs master; full suite 434 passed |

Notes

  • One pre-existing test made version-aware (test_pydantic_ai_real_tool_class_patch_path_when_available): its assertion only held for <0.3.0, and it leaked global patches — now version-aware + reverts. Same adapter under test, flagged transparently by the author.
  • uv.lock grows because pydantic-ai pulls in transitive dependencies. It is dev/test-only, never a runtime dependency: it was added to the test group, which the dev → test include chain installs in the integration-test job. The published wheel is unaffected (the PEP 561 type-check job stays green).

Verdict

Scope fully covered, the fix is empirically proven (with a control experiment) and now CI-guarded, all checks green. Approving for merge.

This closes the function-tool shadowing gap from AAASM-2923 and unblocks AAASM-2939 — though that example pin-relax additionally needs this fix published in an agent-assembly wheel (a release step) before it can complete.

— Claude Code

@Chisanan232 Chisanan232 merged commit a9a7f2c into master Jun 14, 2026
24 checks passed
@Chisanan232 Chisanan232 deleted the v0.0.1/AAASM-2943/pydantic_function_toolset branch June 14, 2026 13:13