
Update apply_guardrail to allow tool call deletion #24262

Open
seph-barker wants to merge 18 commits into BerriAI:main from predibase:apply_guardrail_deletion

Conversation

@seph-barker seph-barker commented Mar 21, 2026

Adds guardrail_deleted support so custom guardrails can selectively delete individual tool calls from LLM responses across three API formats (OpenAI Chat Completions, Anthropic Messages, OpenAI Responses). Also enables guardrails to run on tool-call-only responses and inject replacement text, since both capabilities are useful for tool-deletion use cases.

Changes

  • guardrail_deleted flag: Guardrails set tc["guardrail_deleted"] = True on individual tool call dicts to remove them. Defined as GUARDRAIL_DELETED_KEY in base_translation.py, implemented with a two-pass modify/delete approach in all three output handlers.
  • Tool-call-only responses now reach guardrails — removed the _has_text_content early return in OpenAI Chat that skipped responses with no text.
  • Replacement text injection — guardrails can append extra entries to the texts list to inject text into responses that originally had none. Each handler creates text blocks in its native format.
  • Empty-text guard — text writeback skipped when guardrail returns empty texts (prevents spurious empty text blocks on passthrough guardrails).
  • Docs: new "Handling Tool Calls" and "Adding Replacement Text" sections with examples.
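The guardrail_deleted flag and the two-pass modify/delete writeback can be sketched in plain Python. This is an illustrative sketch only: GUARDRAIL_DELETED_KEY matches the constant named in the PR, but the plain-dict tool calls and the apply_guardrail/write_back helpers below are simplified stand-ins, not LiteLLM's actual handler code.

```python
import copy

# Illustrative sketch of the guardrail_deleted flag and the two-pass
# modify/delete writeback described above. Everything except the constant
# name is a simplified stand-in for demonstration purposes.
GUARDRAIL_DELETED_KEY = "guardrail_deleted"

def apply_guardrail(tool_calls):
    # Toy guardrail: delete any call to a blocked function, tidy the rest.
    for tc in tool_calls:
        if tc["function"]["name"] == "delete_files":
            tc[GUARDRAIL_DELETED_KEY] = True  # mark for deletion
        else:
            tc["function"]["arguments"] = tc["function"]["arguments"].strip()
    return tool_calls

def write_back(original, guardrailed):
    indices_to_delete = []
    # Pass 1 (modify): copy changes back onto non-deleted tool calls,
    # stripping the marker so it never leaks into outgoing API requests.
    for i, tc in enumerate(guardrailed):
        if tc.pop(GUARDRAIL_DELETED_KEY, False):
            indices_to_delete.append(i)
        else:
            original[i] = tc
    # Pass 2 (delete): remove marked items in reverse so indices stay valid.
    for i in sorted(indices_to_delete, reverse=True):
        del original[i]
    return original

tool_calls = [
    {"function": {"name": "delete_files", "arguments": "{}"}},
    {"function": {"name": "read_file", "arguments": ' {"path": "a.txt"} '}},
]
result = write_back(tool_calls, apply_guardrail(copy.deepcopy(tool_calls)))
print(result)  # only the read_file call survives, with trimmed arguments
```

Deleting in a second pass, after all modifications, is what keeps the indices from the guardrail response aligned with the original list.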

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it solves only 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

seph-barker and others added 6 commits March 19, 2026 17:21
…rail_deleted flag

Guardrails can now selectively remove tool calls from requests and responses
by setting "guardrail_deleted": True on a tool call dict. When all tool calls
in a choice are deleted, tool_calls is set to None and finish_reason changes
from "tool_calls" to "stop". Uses the existing remove_items_at_indices utility
for index-safe batch deletion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
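The index-safe batch deletion this commit relies on can be sketched like this. This is a minimal sketch assuming the same contract as the remove_items_at_indices utility mentioned above; the real LiteLLM implementation may differ.

```python
# Minimal sketch of an index-safe batch deletion helper, assuming the same
# contract as the remove_items_at_indices utility mentioned in the commit.
def remove_items_at_indices(items, indices):
    # Deleting in descending index order means earlier removals never shift
    # the positions of items still scheduled for deletion.
    for idx in sorted(set(indices), reverse=True):
        if 0 <= idx < len(items):
            del items[idx]

calls = ["a", "b", "c", "d", "e"]
remove_items_at_indices(calls, [0, 2, 4])  # non-contiguous deletion
print(calls)  # ['b', 'd']
```

Deleting in ascending order would require re-adjusting every later index after each removal; reverse order avoids that bookkeeping entirely.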
… handlers

Extend tool call deletion via the guardrail_deleted flag to the Anthropic
Messages and OpenAI Responses output handlers, matching the existing
OpenAI Chat Completions behavior. Consolidate GUARDRAIL_DELETED_KEY into
the shared base_translation module, fix metadata leaking into API
requests for non-deleted tool calls, and narrow overly broad exception
handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ement text injection

Remove the _has_text_content early return from OpenAI Chat handler so
tool-call-only responses reach guardrails. Allow guardrails to inject
replacement text by appending extra entries to the texts list — extras
beyond the original count are created as new content blocks in each
handler's native format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `if guardrailed_texts:` guard to Anthropic and Responses handlers,
matching the OpenAI Chat handler. Prevents empty text blocks from being
injected into tool-call-only responses when a passthrough guardrail is
active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…handlers

Update OpenAI Chat _apply_guardrail_responses_to_output_tool_calls
docstring to describe two-pass modify/delete behavior. Add is-not-None
explanatory comment to Anthropic and Responses handlers matching Chat.
Add non-contiguous deletion and mixed modify-and-delete tests for the
Anthropic handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 21, 2026

The latest updates on your projects.

Project: litellm
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 21, 2026 7:13pm



codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing predibase:apply_guardrail_deletion (c22b17d) with main (b64b0d4)

Open in CodSpeed

@seph-barker changed the title from "Apply guardrail deletion" to "Update apply_guardrail to allow tool call deletion" on Mar 21, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR adds support for custom guardrails to selectively delete individual tool calls from LLM responses across OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses API formats. It also removes an early-return that was blocking tool-call-only responses from reaching guardrails in the non-streaming path, and adds a replacement-text injection mechanism for those responses.

Key changes:

  • New deletion-flag constant in base_translation.py, re-exported from litellm.integrations.custom_guardrail as a stable public API surface
  • New guardrail_handles_tool_calls: bool = False parameter on CustomGuardrail and BaseLitellmParams — writeback of tool call modifications/deletions is opt-in and does not affect existing guardrails by default
  • Two-pass (modify then delete) logic in all three output handlers, with correct stop_reason/finish_reason updates after full deletion
  • Input-path full deletion fix: when all tool calls are removed from an assistant message, content is set to an empty string to keep the message valid for the OpenAI API
  • Replacement-text injection: guardrails can append extra entries to texts to inject content into tool-call-only responses
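The opt-in writeback gate and the finish_reason update from the list above can be sketched as follows. The Guardrail class and dict-based choice here are simplified stand-ins for illustration, not LiteLLM's real CustomGuardrail or response types.

```python
# Sketch of the opt-in writeback gate and the finish_reason update described
# above, using simplified stand-in objects.
class Guardrail:
    def __init__(self, guardrail_handles_tool_calls=False):
        # Default False: existing guardrails never get tool-call writeback.
        self.guardrail_handles_tool_calls = guardrail_handles_tool_calls

def write_back_tool_calls(guardrail, choice):
    if not (choice["tool_calls"] and guardrail.guardrail_handles_tool_calls):
        return choice  # writeback is opt-in; inspection-only guardrails skip it
    choice["tool_calls"] = [
        tc for tc in choice["tool_calls"] if not tc.pop("guardrail_deleted", False)
    ]
    if not choice["tool_calls"]:
        # All tool calls deleted: the turn now ends without tool use.
        choice["tool_calls"] = None
        if choice["finish_reason"] == "tool_calls":
            choice["finish_reason"] = "stop"
    return choice

choice = {
    "tool_calls": [{"id": "t1", "guardrail_deleted": True}],
    "finish_reason": "tool_calls",
}
write_back_tool_calls(Guardrail(guardrail_handles_tool_calls=True), choice)
print(choice)  # tool_calls is None, finish_reason is "stop"
```

With the default guardrail_handles_tool_calls=False, the first guard clause returns immediately, which is how existing guardrails remain unaffected.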

Issues found:

  • test_full_deletion_input is missing an assertion that content is set to an empty string when all tool calls are deleted from a content: null assistant message — the critical API-validity fix from a previous review round is not regression-protected
  • MockFalseDeleteGuardrail in test_guardrail_deleted_false_does_not_delete lacks guardrail_handles_tool_calls=True, so the test never exercises the deletion code path and would not catch a bug where a false deletion flag accidentally triggered removal

Confidence Score: 4/5

  • PR is safe to merge with minor test coverage fixes recommended.
  • The implementation logic across all three API handlers is sound — the two-pass delete pattern is consistent, stop_reason/finish_reason updates are correct, the content="" API-validity fix is present in the code, and the opt-in guardrail_handles_tool_calls gate avoids backwards-incompatible breakage for existing guardrails. The score is 4 rather than 5 due to two test quality gaps: the test_full_deletion_input assertion gap means the content="" fix has no regression protection, and the MockFalseDeleteGuardrail gap means the guardrail_deleted: False path is untested under actual writeback conditions.
  • tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py — missing content="" assertion and ineffective false-deletion test mock

Important Files Changed

Filename | Overview

  • litellm/llms/openai/chat/guardrail_translation/handler.py: Core changes: removed _has_text_content early return to allow tool-call-only responses to reach guardrails; added two-pass (modify/delete) tool call processing gated by guardrail_handles_tool_calls; added replacement-text injection for extra texts beyond original mappings; validated content="" fix when all input tool calls are deleted. Logic is sound; the _has_text_content streaming note is accurate since non-ended streaming never passed tool calls anyway.
  • litellm/llms/anthropic/chat/guardrail_translation/handler.py: New helper methods _content_block_to_dict and _get_response_content reduce duplication; two-pass tool call deletion correctly removes tool_use blocks and updates stop_reason to end_turn when no tool_use blocks remain; text injection appends new content blocks correctly. Order of operations (text inject → tool delete) ensures stop_reason is correctly set by the deletion step.
  • litellm/llms/openai/responses/guardrail_translation/handler.py: Two-pass tool call deletion for Responses API added correctly; UUID-based IDs (uuid.uuid4().hex[:12]) address the previously noted ID collision risk; no status field update is intentional and documented. Replacement text creates valid GenericResponseOutputItem objects.
  • litellm/integrations/custom_guardrail.py: Added guardrail_handles_tool_calls: bool = False constructor parameter and instance attribute; re-exports GUARDRAIL_DELETED_KEY with explicit as GUARDRAIL_DELETED_KEY public-API re-export pattern. Clean, minimal change.
  • litellm/types/guardrails.py: Added guardrail_handles_tool_calls: Optional[bool] = Field(default=False) to BaseLitellmParams with clear description. Using Optional[bool] is consistent with Pydantic convention for fields that may be absent from YAML config; runtime truthiness handles None correctly.
  • tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py: Good overall coverage of deletion, modification, non-contiguous deletion, and replacement-text injection paths. Two issues: (1) test_full_deletion_input is missing an assertion that content is set to "" (the API-validity fix); (2) MockFalseDeleteGuardrail lacks guardrail_handles_tool_calls=True, making the test_guardrail_deleted_false_does_not_delete test ineffective at catching a bug in the guardrail_deleted: False path.
  • tests/test_litellm/llms/anthropic/chat/guardrail_translation/test_anthropic_guardrail_handler.py: Comprehensive tests for partial/full deletion, non-contiguous deletion, mixed modify+delete, and replacement-text injection. All mocks correctly set guardrail_handles_tool_calls=True.
  • tests/test_litellm/llms/openai/responses/test_openai_responses_guardrail_handler.py: Tests added for Responses API tool call deletion and replacement text injection. Uses UUID IDs for injected messages, consistent with the implementation fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Response / Request] --> B{Has texts or\ntool_calls?}
    B -- No --> Z[Return unchanged]
    B -- Yes --> C[Step 1: Extract texts + tool_calls\ninto task_mappings]
    C --> D[Step 2: Call apply_guardrail\nwith texts + tool_calls]
    D --> E{guardrailed_texts\nnon-empty?}
    E -- Yes --> F[Step 3: Write guardrailed texts\nback to response]
    E -- No --> G
    F --> G{guardrail_handles\n_tool_calls = True?}
    G -- No --> H[Return — no tool call writeback]
    G -- Yes --> I[Step 4: Two-pass tool call processing]
    I --> J[Pass 1 — Modify:\nupdate arguments/name on\nnon-deleted tool calls]
    J --> K[Pass 2 — Delete:\nremove items at marked indices]
    K --> L{All tool calls\nremoved?}
    L -- Yes --> M[Update stop_reason / finish_reason\nOpenAI: tool_calls → stop\nAnthropic: tool_use → end_turn]
    L -- No --> N[Return]
    M --> N
    D --> O{Extra texts beyond\noriginal mappings?}
    O -- Yes --> P[Inject replacement text blocks\ninto response content]
    O -- No --> G
    P --> G

Comments Outside Diff (2)

  1. tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py, line 1797-1798 (link)

    P1 Missing assertion for content = "" after full input deletion

    The test verifies that tool_calls is removed from the message, but does not assert that content is set to "" when it was originally None. This was the core fix added to prevent a 400 error from the OpenAI API (an assistant message with content: null and no tool_calls is invalid).

    Without this assertion the guard in _apply_guardrail_responses_to_input_tool_calls:

    if messages[msg_idx].get("content") is None:
        messages[msg_idx]["content"] = ""

    could silently regress (e.g., accidentally removed or conditioned incorrectly) and the test would still pass.

    Rule Used: What: Flag any modifications to existing tests and... (source)

  2. tests/test_litellm/llms/openai/chat/guardrail_translation/test_openai_guardrail_handler.py, line 1824-1860 (link)

    P2 test_guardrail_deleted_false_does_not_delete tests the wrong condition

    MockFalseDeleteGuardrail does not set guardrail_handles_tool_calls=True, so guardrail_handles_tool_calls defaults to False. In process_output_response, the writeback block is:

    if tool_calls_to_check and guardrail_to_apply.guardrail_handles_tool_calls:
        ...  # never reached because guardrail_handles_tool_calls is False

    The tool call is preserved not because guardrail_deleted: False is correctly handled, but because writeback is entirely skipped. A bug where guardrail_deleted: False accidentally triggered deletion would be invisible to this test.

    To actually test that guardrail_deleted: False does not delete tool calls, the mock needs guardrail_handles_tool_calls=True so the writeback path is exercised:

    class MockFalseDeleteGuardrail(CustomGuardrail):
        def __init__(self):
            super().__init__(
                guardrail_name="test-false",
                guardrail_handles_tool_calls=True,  # enable writeback so deletion logic runs
            )

    Rule Used: # Code Review Rule: Mock Test Integrity

    What:... (source)

Last reviewed commit: "fix: Anthropic input..."

…essing

Gate tool call processing behind a per-guardrail flag (default False)
so existing guardrails are unaffected. Only guardrails that set
guardrail_handles_tool_calls=True receive tool calls and can use
guardrail_deleted. Adds debug logging when tool calls are skipped.

Also fixes potential IndexError in _apply_guardrail_responses_to_input_texts
and extracts helpers to resolve PLR0915 lint errors in the Anthropic handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adata stripping

Remove unconditional fastapi import from test file (violates no-fastapi-
outside-proxy rule). Remove tool call checks from _has_text_content so
the streaming path does not silently no-op on tool-call-only chunks.
Strip guardrail_deleted key from non-deleted tool calls in all three
output handlers, matching the input path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Tool calls are always passed to apply_guardrail for inspection and
validation (preserving backwards compatibility). The flag only controls
whether modifications and deletions are written back to the response.
Default remains False so existing guardrails are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose the constant from litellm.integrations.custom_guardrail so
guardrail authors can import it from a stable public path instead of
the internal base_translation module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw dict with GenericResponseOutputItem and OutputText to
satisfy mypy arg-type check on response_output.append().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace sequential guardrail_msg_N IDs with uuid-based IDs to avoid
collisions across invocations and retry loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GenericResponseOutputItem is a litellm type not in the OpenAI response
output union — add type: ignore since the handler already processes
both typed objects and dicts at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +720 to +728
if len(responses) > len(task_mappings) and response.choices:
    choice = response.choices[0]
    for extra_text in responses[len(task_mappings) :]:
        if choice.message.content is None:
            choice.message.content = extra_text
        elif isinstance(choice.message.content, str):
            choice.message.content += "\n" + extra_text
        elif isinstance(choice.message.content, list):
            choice.message.content.append({"type": "text", "text": extra_text})
Contributor

P2 Replacement text always injected into choices[0] for multi-choice responses

When a guardrail returns extra texts (beyond the original task_mappings length) for a tool-call-only response, they are unconditionally appended to response.choices[0]. For requests with n > 1, each choice can have its own independent tool calls. If the tool calls that were deleted belonged to a non-first choice, the replacement text will still land in choices[0] rather than in the choice whose tool calls were removed.

Consider tracking the relevant choice_idx from tool_call_task_mappings and using it as the injection target, or documenting that replacement-text injection only targets the first choice.

Author

Documented as a limitation — extras always target choices[0]. Full per-choice tracking adds significant complexity for an unlikely edge case (n>1 + tool calls + guardrail deletion + replacement text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…andler

Add isinstance/hasattr guards before .get()/.pop() on guardrailed tool
calls in both input and output paths, matching the Anthropic and
Responses handler patterns. Prevents AttributeError if a guardrail
returns non-dict tool call objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +289 to +296
# Apply deletions in reverse index order to preserve indices
for msg_idx, indices_to_delete in deletions_by_msg.items():
    message_tool_calls = messages[msg_idx].get("tool_calls", None)
    if message_tool_calls is not None and isinstance(message_tool_calls, list):
        remove_items_at_indices(message_tool_calls, indices_to_delete)

    if not message_tool_calls:
        messages[msg_idx].pop("tool_calls", None)
Contributor

P1 Full input tool call deletion produces an invalid assistant message

When a guardrail marks all tool calls in an assistant input message as deleted, messages[msg_idx].pop("tool_calls", None) removes the key but leaves content: null intact (or absent). The resulting message {"role": "assistant", "content": null} is rejected by the OpenAI Chat Completions API with a 400 error because a non-streaming assistant message must have either non-null content or at least one tool call.

Consider setting content to an empty string or removing the entire message when all tool calls are deleted:

if not message_tool_calls:
    messages[msg_idx].pop("tool_calls", None)
    # Ensure the message remains valid: give it empty content if none exists
    if messages[msg_idx].get("content") is None:
        messages[msg_idx]["content"] = ""

Alternatively, document this as a known limitation that callers must handle.

Author

Good catch. When all tool calls are deleted from an input message and content is None, we now set content to empty string to keep the message valid.

Comment on lines +808 to +811
if not choice_tool_calls:
    choice.message.tool_calls = None
    if choice.finish_reason == "tool_calls":
        choice.finish_reason = "stop"
Contributor

P2 finish_reason set to "stop" rather than "end_turn" after tool call deletion

When all tool calls are removed, finish_reason is updated from "tool_calls" to "stop". OpenAI clients generally accept "stop", but per the OpenAI spec the canonical value for a model-terminated turn is "stop". However, if the replacement-text injection path (Step 3) is also active, the combined response now has both injected text and no tool calls — a "natural" finish — so "stop" is the right choice.

One edge case: if all choices have no remaining tool calls but the overall finish reason was something other than "tool_calls" (e.g., "length"), the block is skipped and the response remains unchanged. This is already correct. Just be aware that the guard is specifically for "tool_calls", which is the only value set when the model invokes tools in OpenAI format.

Author

I don't understand this PR comment. It sounds like it is saying that the current logic is correct. Is the feedback that we need to document more clearly?

Contributor

You're right to be confused - the comment is indeed saying the current logic is correct.

The comment appears to be documentation/confirmation rather than a bug report. It's explaining:

  1. What the code does (sets finish_reason to "stop" when all tool calls are deleted)
  2. Why that's the right behavior (per OpenAI spec, "stop" is canonical for model-terminated turns)
  3. An edge case that's already handled correctly

The confusing part is the comment title ("finish_reason set to 'stop' rather than 'end_turn'") which sounds like it might be flagging an issue, but the body concludes "This is already correct."

No changes needed - this is just a review note documenting the behavior for future reference.

Comment on lines +102 to 103
guardrail_handles_tool_calls: bool = False,
**kwargs,
Contributor

P2 guardrail_handles_tool_calls type inconsistency between CustomGuardrail and BaseLitellmParams

CustomGuardrail.__init__ declares guardrail_handles_tool_calls: bool = False, but BaseLitellmParams.guardrail_handles_tool_calls is typed as Optional[bool]. When update_in_memory_litellm_params copies DB-loaded params onto the instance it could set self.guardrail_handles_tool_calls = None.

All three handler guard clauses check truthiness (if guardrail_to_apply.guardrail_handles_tool_calls:) so None is treated as False at runtime, but the type annotation on CustomGuardrail is misleading. Consider aligning the two type annotations:

Suggested change
-    guardrail_handles_tool_calls: bool = False,
-    **kwargs,
+    guardrail_handles_tool_calls: bool = False,

Author

CustomGuardrail already uses bool = False. BaseLitellmParams uses Optional[bool] which is the Pydantic convention for fields that may not be present in YAML config. Runtime truthiness handles None correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t injection

Add range(min(...)) bounds guard to Anthropic _apply_guardrail_responses_to_input.
Set content to empty string when all input tool calls are deleted and content is
None, preventing invalid assistant messages. Fix multi-text injection to set first
extra text directly rather than concatenating onto None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
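The multi-text injection fix described in this commit can be sketched as follows. The inject_extra_texts helper below is a hypothetical stand-in for illustration, not the handler's actual code; it shows why the first extra text must be assigned directly rather than concatenated onto None.

```python
# Sketch of the multi-text injection fix described above: when content is
# None, the first extra text is assigned directly instead of concatenated
# (None + "\n" + text would raise a TypeError).
def inject_extra_texts(content, extra_texts):
    for text in extra_texts:
        if content is None:
            content = text          # first extra text: assign, don't concat
        else:
            content += "\n" + text  # later extras: append on a new line
    return content

print(inject_extra_texts(None, ["[tool call removed]", "See policy."]))
# [tool call removed]
# See policy.
```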