Update apply_guardrail to allow tool call deletion#24262
seph-barker wants to merge 18 commits into BerriAI:main from
Conversation
…rail_deleted flag Guardrails can now selectively remove tool calls from requests and responses by setting "guardrail_deleted": True on a tool call dict. When all tool calls in a choice are deleted, tool_calls is set to None and finish_reason changes from "tool_calls" to "stop". Uses the existing remove_items_at_indices utility for index-safe batch deletion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
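The semantics described in this commit message can be sketched as follows. This is a minimal illustration, not the actual handler code; the flat `message` dict shape and the helper name are assumptions, while `GUARDRAIL_DELETED_KEY` maps to the `"guardrail_deleted"` marker the PR introduces:

```python
GUARDRAIL_DELETED_KEY = "guardrail_deleted"


def apply_tool_call_deletions(message: dict) -> dict:
    """Hypothetical sketch: drop tool calls a guardrail marked as deleted."""
    tool_calls = message.get("tool_calls") or []
    kept = [tc for tc in tool_calls if not tc.get(GUARDRAIL_DELETED_KEY)]
    for tc in kept:
        # Strip the internal marker so it never leaks into an API request.
        tc.pop(GUARDRAIL_DELETED_KEY, None)
    if kept:
        message["tool_calls"] = kept
    else:
        # All tool calls deleted: null the field and normalize finish_reason,
        # mirroring the tool_calls -> stop transition described above.
        message["tool_calls"] = None
        if message.get("finish_reason") == "tool_calls":
            message["finish_reason"] = "stop"
    return message
```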
… handlers Extend tool call deletion via the guardrail_deleted flag to the Anthropic Messages and OpenAI Responses output handlers, matching the existing OpenAI Chat Completions behavior. Consolidate GUARDRAIL_DELETED_KEY into the shared base_translation module, fix metadata leaking into API requests for non-deleted tool calls, and narrow overly broad exception handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ement text injection Remove the _has_text_content early return from OpenAI Chat handler so tool-call-only responses reach guardrails. Allow guardrails to inject replacement text by appending extra entries to the texts list — extras beyond the original count are created as new content blocks in each handler's native format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `if guardrailed_texts:` guard to Anthropic and Responses handlers, matching the OpenAI Chat handler. Prevents empty text blocks from being injected into tool-call-only responses when a passthrough guardrail is active. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…handlers Update OpenAI Chat _apply_guardrail_responses_to_output_tool_calls docstring to describe two-pass modify/delete behavior. Add is-not-None explanatory comment to Anthropic and Responses handlers matching Chat. Add non-contiguous deletion and mixed modify-and-delete tests for the Anthropic handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Greptile Summary

This PR adds support for custom guardrails to selectively delete individual tool calls from LLM responses across OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses API formats. It also removes an early return that was blocking tool-call-only responses from reaching guardrails in the non-streaming path, and adds a replacement-text injection mechanism for those responses.

Key changes:
Issues found:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/openai/chat/guardrail_translation/handler.py | Core changes: removed _has_text_content early return to allow tool-call-only responses to reach guardrails; added two-pass (modify/delete) tool call processing gated by guardrail_handles_tool_calls; added replacement-text injection for extra texts beyond original mappings; validated content="" fix when all input tool calls are deleted. Logic is sound; the _has_text_content streaming note is accurate since non-ended streaming never passed tool calls anyway. |
| litellm/llms/anthropic/chat/guardrail_translation/handler.py | New helper methods _content_block_to_dict and _get_response_content reduce duplication; two-pass tool call deletion correctly removes tool_use blocks and updates stop_reason to end_turn when no tool_use blocks remain; text injection appends new content blocks correctly. Order of operations (text inject → tool delete) ensures stop_reason is correctly set by deletion step. |
| litellm/llms/openai/responses/guardrail_translation/handler.py | Two-pass tool call deletion for Responses API added correctly; UUID-based IDs (uuid.uuid4().hex[:12]) address the previously noted ID collision risk; no status field update is intentional and documented. Replacement text creates valid GenericResponseOutputItem objects. |
| litellm/integrations/custom_guardrail.py | Added guardrail_handles_tool_calls: bool = False constructor parameter and instance attribute; re-exports GUARDRAIL_DELETED_KEY with explicit as GUARDRAIL_DELETED_KEY public-API re-export pattern. Clean, minimal change. |
| litellm/types/guardrails.py | Added guardrail_handles_tool_calls: Optional[bool] = Field(default=False) to BaseLitellmParams with clear description. Using Optional[bool] is consistent with Pydantic convention for fields that may be absent from YAML config; runtime truthiness handles None correctly. |
| tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py | Good overall coverage of deletion, modification, non-contiguous deletion, and replacement-text injection paths. Two issues: (1) test_full_deletion_input is missing an assertion that content is set to "" (the API-validity fix); (2) MockFalseDeleteGuardrail lacks guardrail_handles_tool_calls=True, making the test_guardrail_deleted_false_does_not_delete test ineffective at catching a bug in the guardrail_deleted: False path. |
| tests/test_litellm/llms/anthropic/chat/guardrail_translation/test_anthropic_guardrail_handler.py | Comprehensive tests for partial/full deletion, non-contiguous deletion, mixed modify+delete, and replacement-text injection. All mocks correctly set guardrail_handles_tool_calls=True. |
| tests/test_litellm/llms/openai/responses/test_openai_responses_guardrail_handler.py | Tests added for Responses API tool call deletion and replacement text injection. Uses UUID IDs for injected messages, consistent with the implementation fix. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Response / Request] --> B{Has texts or\ntool_calls?}
    B -- No --> Z[Return unchanged]
    B -- Yes --> C[Step 1: Extract texts + tool_calls\ninto task_mappings]
    C --> D[Step 2: Call apply_guardrail\nwith texts + tool_calls]
    D --> E{guardrailed_texts\nnon-empty?}
    E -- Yes --> F[Step 3: Write guardrailed texts\nback to response]
    E -- No --> G
    F --> G{guardrail_handles\n_tool_calls = True?}
    G -- No --> H[Return — no tool call writeback]
    G -- Yes --> I[Step 4: Two-pass tool call processing]
    I --> J[Pass 1 — Modify:\nupdate arguments/name on\nnon-deleted tool calls]
    J --> K[Pass 2 — Delete:\nremove items at marked indices]
    K --> L{All tool calls\nremoved?}
    L -- Yes --> M[Update stop_reason / finish_reason\nOpenAI: tool_calls → stop\nAnthropic: tool_use → end_turn]
    L -- No --> N[Return]
    M --> N
    D --> O{Extra texts beyond\noriginal mappings?}
    O -- Yes --> P[Inject replacement text blocks\ninto response content]
    O -- No --> G
    P --> G
```
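The Pass 2 deletion step in the flow above depends on index-safe batch removal. A standalone version in the spirit of the `remove_items_at_indices` utility named in the PR (this implementation is an assumption for illustration, not the actual LiteLLM helper):

```python
def remove_items_at_indices(items: list, indices) -> None:
    """Delete items at the given indices in place.

    Deleting from the highest index down means earlier deletions never
    shift the positions of later targets, so a batch of marked indices
    can be applied safely in one pass.
    """
    for idx in sorted(set(indices), reverse=True):
        if 0 <= idx < len(items):
            del items[idx]
```

Deduplicating and bounds-checking the indices makes the helper tolerant of a guardrail that reports the same index twice or an index past the end of the list.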
Comments Outside Diff (2)
- tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py, lines 1797-1798 (link)

  **Missing assertion for `content = ""` after full input deletion**

  The test verifies that `tool_calls` is removed from the message, but does not assert that `content` is set to `""` when it was originally `None`. This was the core fix added to prevent a 400 error from the OpenAI API (an assistant message with `content: null` and no `tool_calls` is invalid). Without this assertion, the guard in `_apply_guardrail_responses_to_input_tool_calls`:

  ```python
  if messages[msg_idx].get("content") is None:
      messages[msg_idx]["content"] = ""
  ```

  could silently regress (e.g., accidentally removed or conditioned incorrectly) and the test would still pass.

  Rule Used: What: Flag any modifications to existing tests and... (source)
- tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py, lines 1824-1860 (link)

  **`test_guardrail_deleted_false_does_not_delete` tests the wrong condition**

  `MockFalseDeleteGuardrail` does not set `guardrail_handles_tool_calls=True`, so `guardrail_handles_tool_calls` defaults to `False`. In `process_output_response`, the writeback block is:

  ```python
  if tool_calls_to_check and guardrail_to_apply.guardrail_handles_tool_calls:
      ...  # never reached because guardrail_handles_tool_calls is False
  ```

  The tool call is preserved not because `guardrail_deleted: False` is correctly handled, but because writeback is entirely skipped. A bug where `guardrail_deleted: False` accidentally triggered deletion would be invisible to this test.

  To actually test that `guardrail_deleted: False` does not delete tool calls, the mock needs `guardrail_handles_tool_calls=True` so the writeback path is exercised:

  ```python
  class MockFalseDeleteGuardrail(CustomGuardrail):
      def __init__(self):
          super().__init__(
              guardrail_name="test-false",
              guardrail_handles_tool_calls=True,  # enable writeback so deletion logic runs
          )
  ```

  Rule Used: # Code Review Rule: Mock Test Integrity
  What:... (source)
Last reviewed commit: "fix: Anthropic input..."
…essing Gate tool call processing behind a per-guardrail flag (default False) so existing guardrails are unaffected. Only guardrails that set guardrail_handles_tool_calls=True receive tool calls and can use guardrail_deleted. Adds debug logging when tool calls are skipped. Also fixes potential IndexError in _apply_guardrail_responses_to_input_texts and extracts helpers to resolve PLR0915 lint errors in the Anthropic handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adata stripping Remove unconditional fastapi import from test file (violates no-fastapi- outside-proxy rule). Remove tool call checks from _has_text_content so the streaming path does not silently no-op on tool-call-only chunks. Strip guardrail_deleted key from non-deleted tool calls in all three output handlers, matching the input path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion Tool calls are always passed to apply_guardrail for inspection and validation (preserving backwards compatibility). The flag only controls whether modifications and deletions are written back to the response. Default remains False so existing guardrails are unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
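In other words, the flag gates writeback only, never inspection. A hypothetical toy sketch of that gate (the class, helper, and `write_back` name are illustrative; only `guardrail_handles_tool_calls` and the `"guardrail_deleted"` marker come from the PR):

```python
class Guardrail:
    """Toy guardrail: always inspects tool calls and marks them all deleted."""

    def __init__(self, handles_tool_calls: bool = False):
        self.guardrail_handles_tool_calls = handles_tool_calls

    def apply_guardrail(self, texts, tool_calls):
        # Inspection always sees tool calls, regardless of the flag.
        for tc in tool_calls:
            tc["guardrail_deleted"] = True
        return texts


def write_back(guardrail: Guardrail, message: dict) -> dict:
    tool_calls = message.get("tool_calls") or []
    guardrail.apply_guardrail([message.get("content")], tool_calls)
    # Writeback of deletions is opt-in per guardrail: without the flag,
    # the deletion markers are simply ignored and nothing is removed.
    if tool_calls and guardrail.guardrail_handles_tool_calls:
        message["tool_calls"] = [
            tc for tc in tool_calls if not tc.pop("guardrail_deleted", False)
        ] or None
    return message
```

With the flag off, the message keeps its tool calls even though the guardrail marked them; with the flag on, the same guardrail removes them, which matches the backwards-compatibility default described above.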
Expose the constant from litellm.integrations.custom_guardrail so guardrail authors can import it from a stable public path instead of the internal base_translation module. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw dict with GenericResponseOutputItem and OutputText to satisfy mypy arg-type check on response_output.append(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace sequential guardrail_msg_N IDs with uuid-based IDs to avoid collisions across invocations and retry loops. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GenericResponseOutputItem is a litellm type not in the OpenAI response output union — add type: ignore since the handler already processes both typed objects and dicts at runtime. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
if len(responses) > len(task_mappings) and response.choices:
    choice = response.choices[0]
    for extra_text in responses[len(task_mappings):]:
        if choice.message.content is None:
            choice.message.content = extra_text
        elif isinstance(choice.message.content, str):
            choice.message.content += "\n" + extra_text
        elif isinstance(choice.message.content, list):
            choice.message.content.append({"type": "text", "text": extra_text})
```
**Replacement text always injected into `choices[0]` for multi-choice responses**
When a guardrail returns extra texts (beyond the original task_mappings length) for a tool-call-only response, they are unconditionally appended to response.choices[0]. For requests with n > 1, each choice can have its own independent tool calls. If the tool calls that were deleted belonged to a non-first choice, the replacement text will still land in choices[0] rather than in the choice whose tool calls were removed.
Consider tracking the relevant choice_idx from tool_call_task_mappings and using it as the injection target, or documenting that replacement-text injection only targets the first choice.
Documented as a limitation — extras always target choices[0]. Full per-choice tracking adds significant complexity for an unlikely edge case (n>1 + tool calls + guardrail deletion + replacement text).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…andler Add isinstance/hasattr guards before .get()/.pop() on guardrailed tool calls in both input and output paths, matching the Anthropic and Responses handler patterns. Prevents AttributeError if a guardrail returns non-dict tool call objects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
# Apply deletions in reverse index order to preserve indices
for msg_idx, indices_to_delete in deletions_by_msg.items():
    message_tool_calls = messages[msg_idx].get("tool_calls", None)
    if message_tool_calls is not None and isinstance(message_tool_calls, list):
        remove_items_at_indices(message_tool_calls, indices_to_delete)

    if not message_tool_calls:
        messages[msg_idx].pop("tool_calls", None)
```
Full input tool call deletion produces an invalid assistant message
When a guardrail marks all tool calls in an assistant input message as deleted, messages[msg_idx].pop("tool_calls", None) removes the key but leaves content: null intact (or absent). The resulting message {"role": "assistant", "content": null} is rejected by the OpenAI Chat Completions API with a 400 error because a non-streaming assistant message must have either non-null content or at least one tool call.
Consider setting content to an empty string or removing the entire message when all tool calls are deleted:
```python
if not message_tool_calls:
    messages[msg_idx].pop("tool_calls", None)
    # Ensure the message remains valid: give it empty content if none exists
    if messages[msg_idx].get("content") is None:
        messages[msg_idx]["content"] = ""
```

Alternatively, document this as a known limitation that callers must handle.
Good catch. When all tool calls are deleted from an input message and content is None, we now set content to empty string to keep the message valid.
```python
if not choice_tool_calls:
    choice.message.tool_calls = None
    if choice.finish_reason == "tool_calls":
        choice.finish_reason = "stop"
```
finish_reason set to "stop" rather than "end_turn" after tool call deletion
When all tool calls are removed, finish_reason is updated from "tool_calls" to "stop". OpenAI clients generally accept "stop", but per the OpenAI spec the canonical value for a model-terminated turn is "stop". However, if the replacement-text injection path (Step 3) is also active, the combined response now has both injected text and no tool calls — a "natural" finish — so "stop" is the right choice.
One edge case: if all choices have no remaining tool calls but the overall finish reason was something other than "tool_calls" (e.g., "length"), the block is skipped and the response remains unchanged. This is already correct. Just be aware that the guard is specifically for "tool_calls", which is the only value set when the model invokes tools in OpenAI format.
I don't understand this PR comment. It sounds like it is saying that the current logic is correct. Is the feedback that we need to document more clearly?
You're right to be confused - the comment is indeed saying the current logic is correct.
The comment appears to be documentation/confirmation rather than a bug report. It's explaining:

- What the code does (sets `finish_reason` to `"stop"` when all tool calls are deleted)
- Why that's the right behavior (per the OpenAI spec, `"stop"` is canonical for model-terminated turns)
- An edge case that's already handled correctly

The confusing part is the comment title ("finish_reason set to 'stop' rather than 'end_turn'"), which sounds like it might be flagging an issue, but the body concludes "This is already correct."

No changes needed - this is just a review note documenting the behavior for future reference.
```python
guardrail_handles_tool_calls: bool = False,
**kwargs,
```
guardrail_handles_tool_calls type inconsistency between CustomGuardrail and BaseLitellmParams
CustomGuardrail.__init__ declares guardrail_handles_tool_calls: bool = False, but BaseLitellmParams.guardrail_handles_tool_calls is typed as Optional[bool]. When update_in_memory_litellm_params copies DB-loaded params onto the instance it could set self.guardrail_handles_tool_calls = None.
All three handler guard clauses check truthiness (if guardrail_to_apply.guardrail_handles_tool_calls:) so None is treated as False at runtime, but the type annotation on CustomGuardrail is misleading. Consider aligning the two type annotations:
Suggested change:

```python
guardrail_handles_tool_calls: bool = False,
**kwargs,
```
(or change BaseLitellmParams to bool = False rather than Optional[bool]).
CustomGuardrail already uses bool = False. BaseLitellmParams uses Optional[bool] which is the Pydantic convention for fields that may not be present in YAML config. Runtime truthiness handles None correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t injection Add range(min(...)) bounds guard to Anthropic _apply_guardrail_responses_to_input. Set content to empty string when all input tool calls are deleted and content is None, preventing invalid assistant messages. Fix multi-text injection to set first extra text directly rather than concatenating onto None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds guardrail_deleted support so custom guardrails can selectively delete individual tool calls from LLM responses across three API formats (OpenAI Chat Completions, Anthropic Messages, OpenAI Responses). Also enables guardrails to run on tool-call-only responses and inject replacement text, both of which are useful for tool-deletion use cases.
Changes
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added tests in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- My PR has been reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
Changes