[AAASM-2943] 🐛 (pydantic-ai): Patch concrete FunctionToolset.call_tool so function-tool governance fires #120
Function tools (@agent.tool_plain / @agent.tool) execute through pydantic_ai.toolsets.function.FunctionToolset.call_tool, which overrides AbstractToolset.call_tool WITHOUT calling super(), so the base-class patch from AAASM-2923 was shadowed and a denied function tool ran without raising PolicyViolationError. Discover concrete AbstractToolset subclasses that define their own call_tool (FunctionToolset explicitly plus generic discovery in pydantic_ai.toolsets) and patch each in addition to the abstract base. Patched-flag checks now read the class's own __dict__ so a patched base never masks a concrete subclass, and revert() symmetrically reverts the concrete classes. Stays fail-soft when Pydantic AI is absent. Refs AAASM-2943
Fake-class unit tests modelling a FunctionToolset-style subclass that overrides call_tool WITHOUT super(). Assert discovery finds it, apply() patches both base and concrete classes (own-dict flags), a denied tool invoked through the concrete call_tool raises PolicyViolationError, revert() restores both and is idempotent, and discovery is fail-soft when Pydantic AI is absent. Refs AAASM-2943
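The discovery step these tests exercise can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual helper: `discover_overriders`, the `fake_toolsets` module, and the fake classes are all hypothetical names made up here, and the real `_load_pydantic_ai_concrete_toolset_classes(...)` may be structured differently. It only shows the idea of finding subclasses that define their own `call_tool` via `vars(cls)` and returning nothing when the module cannot be imported.

```python
import importlib
import inspect
import sys
import types


def discover_overriders(module_name, base_name="AbstractToolset", method="call_tool"):
    """Return subclasses of `base_name` in `module_name` that define `method` themselves."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []  # fail-soft: the framework is not installed
    base = getattr(module, base_name, None)
    if not inspect.isclass(base):
        return []
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        # `method in vars(cls)` is True only when the class defines it itself,
        # not when it merely inherits it from the base.
        if cls is not base and issubclass(cls, base) and method in vars(cls)
    ]


# Fake classes standing in for the real toolset hierarchy (assumption for the demo).
class AbstractToolset:
    async def call_tool(self, name, args):
        raise NotImplementedError


class FunctionToolset(AbstractToolset):
    async def call_tool(self, name, args):  # own override, no super()
        return name


class InheritingToolset(AbstractToolset):
    pass  # inherits call_tool, so the base patch already covers it


# Register a throwaway module so import_module can find it.
fake = types.ModuleType("fake_toolsets")
fake.AbstractToolset = AbstractToolset
fake.FunctionToolset = FunctionToolset
fake.InheritingToolset = InheritingToolset
sys.modules["fake_toolsets"] = fake

found = discover_overriders("fake_toolsets")
missing = discover_overriders("no_such_framework_installed")
```

Only the overriding subclass is reported, and a missing module yields an empty list rather than an exception.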
Add an integration test that builds a real Agent(TestModel(call_tools=[...])) with an @agent.tool_plain tool and asserts the denied tool raises PolicyViolationError after apply() — proving function-tool governance fires through the concrete FunctionToolset.call_tool on Pydantic AI >=0.3.0. Also fix the stale assertion in the existing real-tool-class test: on >=0.3.0 Tool has no _run, so the patched flag lands on AbstractToolset, not Tool. Assert the version-appropriate hook and revert() after the test. Refs AAASM-2943
Codecov Report
✅ All modified and coverable lines are covered by tests.
…roof
The real-library integration tests for the Pydantic AI adapter are guarded by importorskip, so without the framework installed they skip in CI. The `dev` group includes `test`, and the integration-test job installs `dev`, so adding pydantic-ai here makes the function-tool governance regression (test_pydantic_ai_real_function_tool_deny_raises_after_apply) actually run. Dev/test-only: NOT a runtime dependency of the SDK. Refs AAASM-2943
✅ Claude Code review: ready to merge
CI: all green. Notably, after adding
Scope vs AAASM-2943 acceptance criteria
Notes
Verdict
Scope fully covered, the fix is empirically proven (with a control experiment) and now CI-guarded, all checks green. Approving for merge. This closes the function-tool shadowing gap from AAASM-2923 and unblocks AAASM-2939, though that example pin-relax additionally needs this fix published in an
Claude Code



Description
Function tools registered with `@agent.tool_plain` / `@agent.tool` execute through the concrete `pydantic_ai.toolsets.function.FunctionToolset.call_tool`, whose MRO is `FunctionToolset → AbstractToolset`. `FunctionToolset` overrides `call_tool` WITHOUT calling `super().call_tool(...)`. The AAASM-2923 patch only patched `AbstractToolset.call_tool` (the base), so for the most common tool type the patch was shadowed: a denied function tool ran without raising `PolicyViolationError`, and governance was silently bypassed.

This PR mirrors the Google ADK concrete-class approach already in this file (`_load_google_adk_concrete_tool_classes`):

- Adds `_load_pydantic_ai_concrete_toolset_classes(...)`, which loads `FunctionToolset` explicitly and generically discovers any `AbstractToolset` subclass in `pydantic_ai.toolsets` that defines its OWN `call_tool` (checked via `vars(cls)`).
- `apply()` patches each discovered concrete class in addition to the abstract base; `revert()` symmetrically reverts them (idempotent).
- Patched-flag checks read the class's own `__dict__` (`vars(cls)`), so a patched base never masks a concrete subclass.
- Stays fail-soft when Pydantic AI is absent (`ImportError` → no-op).
- Governance flow: `check_tool_start` → pending/approval → deny → `PolicyViolationError` → spawn context → record result.

This fixes the function-tool shadowing gap introduced by AAASM-2923 and unblocks AAASM-2939 (pending a release).
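The shadowing problem and the own-`__dict__` flag check can be reproduced with fake classes. The sketch below is illustrative only: `patch`, `revert`, `DENIED`, `FLAG`, `ORIGINAL`, and the stand-in `PolicyViolationError` are hypothetical names invented here, not the adapter's code, and the deny rule is reduced to a simple name lookup.

```python
import asyncio


class PolicyViolationError(Exception):
    """Stand-in for the SDK exception, defined here so the sketch runs alone."""


class AbstractToolset:  # fake base, not the real pydantic_ai class
    async def call_tool(self, name, args):
        raise NotImplementedError


class FunctionToolset(AbstractToolset):  # fake concrete class
    async def call_tool(self, name, args):
        # Overrides the base WITHOUT calling super().call_tool(...).
        return f"ran {name}"


DENIED = {"blocked_tool"}          # hypothetical policy: tool names to deny
FLAG = "_governance_patched"       # hypothetical marker attribute
ORIGINAL = "_governance_original"  # hypothetical slot for the unpatched method


def patch(cls):
    if vars(cls).get(FLAG):  # own __dict__ only, so inherited flags are ignored
        return
    original = vars(cls)["call_tool"]

    async def governed(self, name, args):
        if name in DENIED:
            raise PolicyViolationError(name)
        return await original(self, name, args)

    setattr(cls, ORIGINAL, original)
    cls.call_tool = governed
    setattr(cls, FLAG, True)


def revert(cls):
    if not vars(cls).get(FLAG):  # never patched or already reverted: no-op
        return
    cls.call_tool = vars(cls)[ORIGINAL]
    delattr(cls, ORIGINAL)
    delattr(cls, FLAG)


def demo():
    def call():
        return asyncio.run(FunctionToolset().call_tool("blocked_tool", {}))

    patch(AbstractToolset)
    base_only = call()  # the base patch is shadowed, so the denied tool runs
    inherited_flag = getattr(FunctionToolset, FLAG, False)  # found via the MRO
    own_flag = vars(FunctionToolset).get(FLAG, False)       # own dict only
    patch(FunctionToolset)
    try:
        call()
        denied = False
    except PolicyViolationError:
        denied = True
    for cls in (FunctionToolset, AbstractToolset):
        revert(cls)
        revert(cls)  # second call must be a no-op
    return base_only, inherited_flag, own_flag, denied, call()
```

Had `patch` used `getattr(cls, FLAG, False)` instead of `vars(cls).get(FLAG)`, the flag inherited from the patched base would have made it skip `FunctionToolset`, leaving the override ungoverned.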
Type of Change
Breaking Changes
No public API change.
`pydantic-ai` is NOT added as a runtime dependency.
Related Issues
Testing
Describe the testing performed for this PR:
Details:
- Unit: fake-class tests in `test/unit/adapters/pydantic_ai/test_pydantic_ai_patch.py` model a `FunctionToolset`-style subclass overriding `call_tool` without `super()`. They assert discovery finds it, `apply()` patches both base and concrete (own-dict flags), invoking the denied tool through the concrete `call_tool` raises `PolicyViolationError`, `revert()` restores both and is idempotent, and discovery is fail-soft without Pydantic AI.
- Integration: `test/integration/test_pydantic_ai_interception_integration.py` now builds a real `Agent(TestModel(call_tools=["blocked_tool"]))` with an `@agent.tool_plain` tool and asserts the denied tool raises `PolicyViolationError` after `apply()`. Verified end-to-end against pydantic-ai 1.107.0 locally (installed dev-only, not committed as a dependency). Also confirmed that with only the base patch (master behavior) the denied tool does NOT raise, which shows this change is what closes the gap. The same file's pre-existing real-tool-class test had a stale assertion (on >=0.3.0 `Tool` has no `_run`, so the flag lands on `AbstractToolset`); it now asserts the version-appropriate hook and reverts afterward.
- Local checks: `uv sync`; `ruff check .` and `ruff format --check .`: zero new findings (189 ruff findings are pre-existing on master and unrelated to the adapter); `mypy agent_assembly`: 58 errors, identical to master, none in changed files; `pytest test/`: 432 passed, 11 skipped (the two real-library tests skip when `pydantic-ai` is absent). With `pydantic-ai` installed, all 43 adapter/integration tests pass.

Checklist
🤖 Generated with Claude Code