[AAASM-2943] 🐛 (pydantic-ai): Patch concrete FunctionToolset.call_tool so function-tool governance fires by Chisanan232 · Pull Request #120 · ai-agent-assembly/python-sdk

Chisanan232 · 2026-06-14T13:02:18Z

Description

Function tools registered with @agent.tool_plain / @agent.tool execute through the concrete pydantic_ai.toolsets.function.FunctionToolset.call_tool, whose MRO is FunctionToolset → AbstractToolset. FunctionToolset overrides call_tool WITHOUT calling super().call_tool(...). The AAASM-2923 patch only patched AbstractToolset.call_tool (the base), so for the most common tool type the patch was shadowed: a denied function tool ran without raising PolicyViolationError — governance was silently bypassed.
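The shadowing described above can be reproduced without Pydantic AI installed. The following is a minimal sketch using stand-in classes: `AbstractToolset`, `FunctionToolset`, and `PolicyViolationError` here are local fakes that only mimic the override-without-`super()` shape, and `governed_call_tool` is a hypothetical wrapper, not the SDK's implementation.

```python
import asyncio


class AbstractToolset:
    """Stand-in for the abstract base; not the real Pydantic AI class."""

    async def call_tool(self, name: str) -> str:
        return f"base ran {name}"


class FunctionToolset(AbstractToolset):
    """Stand-in for a concrete toolset overriding call_tool without super()."""

    async def call_tool(self, name: str) -> str:
        return f"function tool ran {name}"


class PolicyViolationError(Exception):
    """Hypothetical stand-in for the SDK's error type."""


async def governed_call_tool(self, name: str) -> str:
    # A governance wrapper that denies every call, for illustration only.
    raise PolicyViolationError(name)


# Patch only the base class, as a base-only patch would.
AbstractToolset.call_tool = governed_call_tool

# Attribute lookup follows the MRO and stops at FunctionToolset's own
# call_tool, so the patched base method is never reached.
result = asyncio.run(FunctionToolset().call_tool("blocked_tool"))
print(result)  # function tool ran blocked_tool
```

The denied tool still runs because the subclass's own `call_tool` wins the lookup, which is why the concrete class has to be patched as well.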

This PR mirrors the Google ADK concrete-class approach already in this file (_load_google_adk_concrete_tool_classes):

Adds _load_pydantic_ai_concrete_toolset_classes(...) which loads FunctionToolset explicitly and generically discovers any AbstractToolset subclass that defines its OWN call_tool (checked via vars(cls)) in pydantic_ai.toolsets.
apply() patches each discovered concrete class in addition to the abstract base; revert() symmetrically reverts them (idempotent).
The patched-flag check now reads the class's OWN __dict__ (vars(cls)), so a patched base never masks a concrete subclass.
Everything stays fail-soft when Pydantic AI is absent (ImportError → no-op).
The full governance flow is preserved: check_tool_start → pending/approval → deny → PolicyViolationError → spawn context → record result.
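The discovery and own-`__dict__` flag logic listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `discover_concrete_toolsets`, `apply_patch`, `revert_patch`, and the flag and attribute names are hypothetical, the fake module `fake_toolsets` stands in for `pydantic_ai.toolsets`, and `call_tool` is synchronous here only for brevity.

```python
import importlib
import inspect
import sys
import types

PATCHED_FLAG = "_governance_patched"    # hypothetical flag name
ORIGINAL_ATTR = "_governance_original"  # hypothetical attribute name


def discover_concrete_toolsets(module_name, base_name="AbstractToolset"):
    """Return subclasses in `module_name` that define their OWN call_tool."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []  # fail-soft: the framework is not installed
    base = getattr(module, base_name, None)
    if base is None:
        return []
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls is not base and issubclass(cls, base) and "call_tool" in vars(cls)
    ]


def apply_patch(cls, wrap):
    """Patch cls.call_tool once; only valid for classes defining call_tool."""
    if vars(cls).get(PATCHED_FLAG):  # own __dict__, not an inherited flag
        return False
    original = vars(cls)["call_tool"]
    setattr(cls, ORIGINAL_ATTR, original)
    cls.call_tool = wrap(original)
    setattr(cls, PATCHED_FLAG, True)
    return True


def revert_patch(cls):
    """Restore the original call_tool; a second call is a no-op."""
    if not vars(cls).get(PATCHED_FLAG):
        return False
    cls.call_tool = vars(cls)[ORIGINAL_ATTR]
    delattr(cls, ORIGINAL_ATTR)
    delattr(cls, PATCHED_FLAG)
    return True


# Fake classes standing in for the real toolsets (no real library needed).
class AbstractToolset:
    def call_tool(self, name):
        return f"base:{name}"


class FunctionToolset(AbstractToolset):
    def call_tool(self, name):  # overrides without super()
        return f"function:{name}"


class InheritingToolset(FunctionToolset):
    pass  # no own call_tool, so discovery skips it


fake = types.ModuleType("fake_toolsets")
fake.AbstractToolset = AbstractToolset
fake.FunctionToolset = FunctionToolset
fake.InheritingToolset = InheritingToolset
sys.modules["fake_toolsets"] = fake


def wrap(original):
    def governed(self, name):
        return "governed:" + original(self, name)
    return governed


found = discover_concrete_toolsets("fake_toolsets")
apply_patch(AbstractToolset, wrap)
inherited_flag = getattr(FunctionToolset, PATCHED_FLAG, False)  # True via MRO
own_flag = vars(FunctionToolset).get(PATCHED_FLAG, False)       # False
for cls in found:
    apply_patch(cls, wrap)
```

After the base is patched, `getattr` on the subclass already reports the flag through inheritance, while `vars(cls)` does not. Reading the own `__dict__` is what lets the subclass still receive its own patch.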

This fixes the function-tool shadowing gap introduced by AAASM-2923 and unblocks AAASM-2939 (pending a release).

Type of Change

Breaking Changes

No

No public API change. pydantic-ai is NOT added as a runtime dependency.

Related Issues

Related JIRA ticket: AAASM-2943 (fixes the function-tool shadowing gap from AAASM-2923; unblocks AAASM-2939 pending a release)
Related GitHub issues: N/A

Testing

Describe the testing performed for this PR:

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No tests required (explain why)

Details:

Unit (offline, always run): fake-class tests in test/unit/adapters/pydantic_ai/test_pydantic_ai_patch.py model a FunctionToolset-style subclass overriding call_tool without super(). They assert discovery finds it, apply() patches both base and concrete (own-dict flags), invoking the denied tool through the concrete call_tool raises PolicyViolationError, revert() restores both and is idempotent, and discovery is fail-soft without Pydantic AI.
Integration (empirical proof): test/integration/test_pydantic_ai_interception_integration.py now builds a real Agent(TestModel(call_tools=["blocked_tool"])) with an @agent.tool_plain tool and asserts the denied tool raises PolicyViolationError after apply(). Verified end-to-end against pydantic-ai 1.107.0 locally (installed dev-only, not committed as a dependency). Also confirmed that with only the base patch (master behavior) the denied tool does NOT raise — proving this change is what closes the gap. The same file's pre-existing real-tool-class test had a stale assertion (on >=0.3.0 Tool has no _run, so the flag lands on AbstractToolset); it now asserts the version-appropriate hook and reverts afterward.
Gate: uv sync; ruff check . and ruff format --check . — zero new findings (189 ruff findings are pre-existing on master and unrelated to the adapter); mypy agent_assembly — 58 errors, identical to master, none in changed files; pytest test/ — 432 passed, 11 skipped (the two real-library tests skip when pydantic-ai is absent). With pydantic-ai installed, all 43 adapter/integration tests pass.
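The shape of the offline fake-class test described above might look like the sketch below. It is an assumption-laden illustration: `FakeInterceptor`, `govern`, and the fake toolset classes are hypothetical, `PolicyViolationError` is a local stand-in, and the pending/approval and spawn-context steps of the real flow are omitted.

```python
import asyncio


class PolicyViolationError(Exception):
    """Hypothetical stand-in for the SDK's exception."""


class FakeInterceptor:
    """Hypothetical policy client that denies the tools named in `denied`."""

    def __init__(self, denied):
        self.denied = set(denied)
        self.recorded = []

    def check_tool_start(self, name):
        return "deny" if name in self.denied else "allow"

    def record_result(self, name, result):
        self.recorded.append((name, result))


def govern(original, interceptor):
    """Wrap an async call_tool with a check before and a record after."""

    async def call_tool(self, name, *args, **kwargs):
        if interceptor.check_tool_start(name) == "deny":
            raise PolicyViolationError(f"tool {name!r} denied by policy")
        result = await original(self, name, *args, **kwargs)
        interceptor.record_result(name, result)
        return result

    return call_tool


class FakeAbstractToolset:
    async def call_tool(self, name):
        return f"base:{name}"


class FakeFunctionToolset(FakeAbstractToolset):
    async def call_tool(self, name):  # overrides without super()
        return f"function:{name}"


interceptor = FakeInterceptor(denied={"blocked_tool"})

# Patch both the base and the concrete class, each from its own __dict__.
for cls in (FakeAbstractToolset, FakeFunctionToolset):
    cls.call_tool = govern(vars(cls)["call_tool"], interceptor)

toolset = FakeFunctionToolset()
allowed = asyncio.run(toolset.call_tool("safe_tool"))
try:
    asyncio.run(toolset.call_tool("blocked_tool"))
    denied_raised = False
except PolicyViolationError:
    denied_raised = True
```

With both classes wrapped, the allowed tool runs and is recorded once, and the denied tool raises before the original `call_tool` body executes.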

Checklist

🤖 Generated with Claude Code

Function tools (@agent.tool_plain / @agent.tool) execute through pydantic_ai.toolsets.function.FunctionToolset.call_tool, which overrides AbstractToolset.call_tool WITHOUT calling super(), so the base-class patch from AAASM-2923 was shadowed and a denied function tool ran without raising PolicyViolationError. Discover concrete AbstractToolset subclasses that define their own call_tool (FunctionToolset explicitly plus generic discovery in pydantic_ai.toolsets) and patch each in addition to the abstract base. Patched-flag checks now read the class's own __dict__ so a patched base never masks a concrete subclass, and revert() symmetrically reverts the concrete classes. Stays fail-soft when Pydantic AI is absent. Refs AAASM-2943

Fake-class unit tests modelling a FunctionToolset-style subclass that overrides call_tool WITHOUT super(). Assert discovery finds it, apply() patches both base and concrete classes (own-dict flags), a denied tool invoked through the concrete call_tool raises PolicyViolationError, revert() restores both and is idempotent, and discovery is fail-soft when Pydantic AI is absent. Refs AAASM-2943

Add an integration test that builds a real Agent(TestModel(call_tools=[...])) with an @agent.tool_plain tool and asserts the denied tool raises PolicyViolationError after apply() — proving function-tool governance fires through the concrete FunctionToolset.call_tool on Pydantic AI >=0.3.0. Also fix the stale assertion in the existing real-tool-class test: on >=0.3.0 Tool has no _run, so the patched flag lands on AbstractToolset, not Tool. Assert the version-appropriate hook and revert() after the test. Refs AAASM-2943

codecov · 2026-06-14T13:03:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


…roof The real-library integration tests for the Pydantic AI adapter are guarded by importorskip, so without the framework installed they skip in CI. The `dev` group includes `test`, and the integration-test job installs `dev`, so adding pydantic-ai here makes the function-tool governance regression (test_pydantic_ai_real_function_tool_deny_raises_after_apply) actually run. Dev/test-only — NOT a runtime dependency of the SDK. Refs AAASM-2943

sonarqubecloud · 2026-06-14T13:11:39Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code


Chisanan232 · 2026-06-14T13:12:39Z

✅ Claude Code review — ready to merge

CI: all green. Analyze (python), build-and-test (unit + integration + contract), Test Type Imports / Verify PEP 561 Compliance (the type-check workflow, triggered here by the pyproject.toml/uv.lock change), Build documentation, CI Success, SonarCloud, and codecov/patch all pass. The e2e / Deploy jobs are skipped by design. Nothing to fix.

Notably, after adding pydantic-ai to the test group (option b), the real-library integration test now executes in CI (no longer importorskip-skipped) and passes — so the function-tool governance regression is guarded going forward, not just locally.

Scope vs AAASM-2943 acceptance criteria

| Criterion | Result |
| --- | --- |
| Patch concrete toolset class(es) executing function tools, in addition to the abstract base | ✅ `_load_pydantic_ai_concrete_toolset_classes` loads `FunctionToolset` explicitly + generically discovers own-`call_tool` overriders in `pydantic_ai.toolsets` |
| context7-confirmed target | ✅ `FunctionToolset` (in `pydantic_ai.toolsets.function`) overrides `call_tool` without `super()` → base patch shadowed; `_AgentFunctionToolset` inherits it |
| Own-`__dict__` (`vars(cls)`) flag checks so a patched base never masks a concrete subclass | ✅ |
| `revert()` symmetric & idempotent; fail-soft when Pydantic AI absent | ✅ reverts concrete classes too |
| Governance flow preserved (check → pending/approval → deny→`PolicyViolationError` → spawn → record) | ✅ |
| Offline fake-class unit tests for a `call_tool`-override-without-`super()` subclass | ✅ |
| Empirical proof the deny path now raises | ✅ real `Agent(TestModel)` + `@agent.tool_plain` integration test asserts `PolicyViolationError`; a control (base-only patch) confirmed the denied tool does not raise on master, proving the fix closes the gap, not just that the test passes |
| ruff / mypy / pytest green | ✅ zero new findings vs master; full suite 434 passed |

Notes

One pre-existing test made version-aware (test_pydantic_ai_real_tool_class_patch_path_when_available): its assertion only held for <0.3.0, and it leaked global patches — now version-aware + reverts. Same adapter under test, flagged transparently by the author.
uv.lock grows because pydantic-ai pulls transitive deps — it's dev/test-only, never a runtime dependency (added to the test group, which the dev→test include chain installs in the integration-test job). The published wheel is unaffected (PEP 561 type-check job stays green).

Verdict

Scope fully covered, the fix is empirically proven (with a control experiment) and now CI-guarded, all checks green. Approving for merge.

This closes the function-tool shadowing gap from AAASM-2923 and unblocks AAASM-2939 — though that example pin-relax additionally needs this fix published in an agent-assembly wheel (a release step) before it can complete.

— Claude Code

Chisanan232 added 3 commits June 14, 2026 20:54

Chisanan232 merged commit a9a7f2c into master Jun 14, 2026
24 checks passed

Chisanan232 deleted the v0.0.1/AAASM-2943/pydantic_function_toolset branch June 14, 2026 13:13

Chisanan232 merged 4 commits into master from v0.0.1/AAASM-2943/pydantic_function_toolset
