Skip to content

Comments

Add Lakera v2 post-call hook and tests (fixed PII masking)#21783

Open
eurogig wants to merge 7 commits intoBerriAI:mainfrom
eurogig:feature/lakera-post-call-hook-clean
Open

Add Lakera v2 post-call hook and tests (fixed PII masking)#21783
eurogig wants to merge 7 commits intoBerriAI:mainfrom
eurogig:feature/lakera-post-call-hook-clean

Conversation

@eurogig
Copy link
Contributor

@eurogig eurogig commented Feb 21, 2026

Relevant issues

[Feature]: Add missing post_call hook to Lakera guardrails #18016

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

Add Lakera v2 post-call hook and tests (fixed PII masking found by Greptile)

ailabrat and others added 5 commits December 16, 2025 20:52
…er, test location, mask order

- PII masking path: return ModelResponse instead of dict so deployment hook accepts it
- Avoid mutating request data: deep copy original_messages and messages in _mask_pii_in_messages
- Add guardrail header in PII-only return path
- Add test in tests/test_litellm/ (test_lakera_ai_v2.py) per PR checklist
- Sort PII payload spans by (start,end) descending so multiple spans in one message mask correctly

Co-authored-by: Cursor <cursoragent@cursor.com>
@vercel
Copy link

vercel bot commented Feb 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 21, 2026 8:48pm

Request Review

@CLAassistant
Copy link

CLAassistant commented Feb 21, 2026

CLA assistant check
All committers have signed the CLA.

@eurogig
Copy link
Contributor Author

eurogig commented Feb 21, 2026

@greptileai

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 21, 2026

Greptile Summary

This PR adds a post-call guardrail hook (async_post_call_success_hook) to the Lakera AI v2 guardrail, enabling content moderation of LLM responses. It also fixes two bugs in the existing PII masking logic: input message mutation (now uses copy.deepcopy) and incorrect mask application order (now sorts detections in reverse to preserve string indices).

  • Implements async_post_call_success_hook following the same pattern as the existing pre-call and during-call hooks, with support for PII masking, block mode, and monitor mode
  • Returns ModelResponse (not a raw dict) when PII is masked, ensuring compatibility with the parent _is_valid_response_type validation
  • Tracks choice indices to correctly map masked content back to response choices when some choices lack content (e.g., tool-call-only responses)
  • Fixes mutation of data["messages"] in _mask_pii_in_messages by deep-copying before modification
  • Adds reverse-sorted mask application to prevent index corruption when multiple PII entities are detected in a single message
  • Adds 4 new tests (3 in tests/guardrails_tests/, 1 in tests/test_litellm/) covering block, allow, and PII masking scenarios — all mock-only

Confidence Score: 4/5

  • This PR is safe to merge; it adds a well-structured post-call hook following existing patterns with good test coverage.
  • The post-call hook implementation follows the established pre-call and during-call patterns closely, returns the correct type (ModelResponse) for parent hook validation, and addresses all previous review feedback (index mismatch, on_flagged consistency, mutation fixes). Tests cover block, allow, and PII masking scenarios. Minor concern: the post-call monitor mode test is missing from the test suite.
  • litellm/proxy/guardrails/guardrail_hooks/lakera_ai_v2.py — the post-call hook is the main logic addition worth careful review

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/lakera_ai_v2.py Adds async_post_call_success_hook with PII masking, blocking/monitoring, and proper ModelResponse return. Fixes mutation bug in _mask_pii_in_messages with copy.deepcopy, adds reverse-sorted mask application. Addresses previous review feedback for index mismatch and on_flagged consistency.
tests/guardrails_tests/test_lakera_v2.py Adds three well-structured post-call hook tests: block on flagged content, allow clean content, and PII masking with ModelResponse validation. All tests use mocks appropriately.
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_lakera_ai_v2.py New unit test file in the required tests/test_litellm/ directory, validates that PII masking returns ModelResponse (not dict) for parent hook compatibility. Mock-only, no network calls.

Sequence Diagram

sequenceDiagram
    participant Proxy as LiteLLM Proxy
    participant Hook as async_post_call_success_deployment_hook
    participant Lakera as LakeraAIGuardrail.async_post_call_success_hook
    participant API as Lakera API v2/guard

    Proxy->>Hook: response from LLM
    Hook->>Hook: should_run_guardrail(post_call)
    Hook->>Lakera: async_post_call_success_hook(data, response)
    Lakera->>Lakera: Extract assistant messages from response
    Lakera->>Lakera: Build post_call_messages (user + assistant)
    Lakera->>API: call_v2_guard(post_call_messages)
    API-->>Lakera: LakeraAIResponse (flagged, payload, breakdown)

    alt Not flagged
        Lakera-->>Hook: return original response
    else Flagged: PII only
        Lakera->>Lakera: _mask_pii_in_messages()
        Lakera->>Lakera: Write masked content back to response_dict
        Lakera-->>Hook: return ModelResponse(**response_dict)
    else Flagged: on_flagged == "block"
        Lakera--xHook: raise HTTPException(400)
    else Flagged: on_flagged == "monitor"
        Lakera->>Lakera: Log warning
        Lakera-->>Hook: return original response
    end

    Hook->>Hook: _is_valid_response_type(result)
    Hook-->>Proxy: return result or original response
Loading

Last reviewed commit: 33257c3

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@eurogig
Copy link
Contributor Author

eurogig commented Feb 21, 2026

@greptileai

@yuneng-jiang
Copy link
Collaborator

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@eurogig
Copy link
Contributor Author

eurogig commented Feb 21, 2026

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants