Skip to content

Comments

Litellm dev 02 19 2026 p2 (#21871)#21872

Open
krrishdholakia wants to merge 2 commits intomainfrom
litellm_dev_02_21_2026_p4
Open

Litellm dev 02 19 2026 p2 (#21871)#21872
krrishdholakia wants to merge 2 commits intomainfrom
litellm_dev_02_21_2026_p4

Conversation

@krrishdholakia
Copy link
Member

  • feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

  • fix: ui updates

  • style(ui/): fix styling

  • feat: enable running ai monitor on individual guardrails

  • feat: add backend logic for guardrail monitoring

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring
@vercel
Copy link

vercel bot commented Feb 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 22, 2026 3:35am

Request Review

@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 22, 2026

Greptile Summary

This PR adds a Guardrails Monitor feature — a new dashboard page for viewing guardrail and policy performance metrics (pass/fail rates, trends, logs). It introduces:

  • Three new DB tables (LiteLLM_DailyGuardrailMetrics, LiteLLM_DailyPolicyMetrics, LiteLLM_SpendLogGuardrailIndex) with proper indexes and migrations
  • Backend usage tracking (usage_tracking.py) that piggybacks on the existing spend log pipeline to aggregate guardrail metrics
  • Dashboard API endpoints (usage_endpoints.py) for overview, detail, and log retrieval
  • New UI components for the Guardrails Monitor page with overview table, detail view, log viewer, and evaluation settings modal
  • Bug fix in GuardrailViewer.tsx for null mode values, and a behavior change in custom_guardrail.py to always attach guardrail info to metadata

Key concerns:

  • Race condition in update_spend_logs_job: The queue size check and the batch pop use two separate lock acquisitions, creating a TOCTOU gap
  • Unbounded query in guardrails_usage_detail: The previous-period metrics query has no lower bound on date, which will grow over time
  • Deprecated Tremor components: New UI files use BarChart, Card, Col, Grid, Title from @tremor/react, which are deprecated per AGENTS.md
  • Pre-submission checklist items unchecked: No tests in tests/litellm/ directory, and the PR scope covers multiple distinct changes (new feature + bug fixes + styling)

Confidence Score: 3/5

  • This PR introduces a new feature with a race condition in the spend log pipeline and an unbounded DB query that needs fixing before merge.
  • Score of 3 reflects: (1) a TOCTOU race condition in the critical update_spend_logs_job function where two separate lock acquisitions create a gap, (2) an unbounded previous-period query in guardrails_usage_detail that will degrade over time, (3) use of deprecated Tremor components in all new UI files against project guidelines, and (4) missing backend tests in tests/litellm/. The schema changes and overall architecture are sound, and the feature is well-structured.
  • Pay close attention to litellm/proxy/utils.py (race condition in update_spend_logs_job) and litellm/proxy/guardrails/usage_endpoints.py (unbounded query in detail endpoint).

Important Files Changed

Filename Overview
litellm/proxy/guardrails/usage_endpoints.py New guardrail/policy usage API endpoints. Unbounded previous-period query in detail endpoint will grow over time; unbounded guardrails table fetch in overview.
litellm/proxy/guardrails/usage_tracking.py New usage tracking logic that processes spend logs for guardrail metrics. Well-structured with proper error handling and skip_duplicates. Sequential upserts could be batched for performance at scale.
litellm/proxy/utils.py Modified spend log processing to pop batch once and pass to both spend log writer and guardrail usage tracker. TOCTOU race condition between two separate lock acquisitions in update_spend_logs_job. Import formatting changes throughout.
litellm/integrations/custom_guardrail.py Changed fallback behavior from logging a warning to creating empty metadata dict and attaching guardrail info. Minor risk of overwriting metadata if key is added concurrently.
litellm/proxy/schema.prisma Adds three new models (DailyGuardrailMetrics, DailyPolicyMetrics, SpendLogGuardrailIndex) with appropriate composite keys and indexes. Also removes spec_path from MCPServerTable.
ui/litellm-dashboard/src/components/GuardrailsMonitor/GuardrailsOverview.tsx Overview dashboard component using react-query. Uses deprecated Tremor components (Card, Col, Grid, Title) per AGENTS.md guidance. Functional logic is sound.
ui/litellm-dashboard/src/components/GuardrailsMonitor/GuardrailDetail.tsx Detail view for individual guardrails with metrics, root cause analysis (hardcoded/demo), and logs. Uses deprecated Tremor components. Functional logic is clean.
ui/litellm-dashboard/src/components/GuardrailsMonitor/ScoreChart.tsx Chart component using deprecated Tremor BarChart. Should be replaced with a non-Tremor chart library per project guidelines.

Sequence Diagram

sequenceDiagram
    participant UI as Dashboard UI
    participant API as Usage Endpoints
    participant DB as PostgreSQL
    participant SLQ as Spend Log Queue
    participant UT as Usage Tracking

    Note over SLQ,UT: Background Job Flow
    SLQ->>SLQ: _monitor_spend_logs_queue polls
    SLQ->>DB: update_spend_logs (batch write)
    SLQ->>UT: process_spend_logs_guardrail_usage
    UT->>DB: create_many SpendLogGuardrailIndex
    UT->>DB: upsert DailyGuardrailMetrics

    Note over UI,DB: Dashboard Request Flow
    UI->>API: GET /guardrails/usage/overview
    API->>DB: find_many GuardrailsTable
    API->>DB: find_many DailyGuardrailMetrics (current + prev period)
    API-->>UI: UsageOverviewResponse (rows, chart, totals)

    UI->>API: GET /guardrails/usage/detail/{id}
    API->>DB: find_unique GuardrailsTable
    API->>DB: find_many DailyGuardrailMetrics
    API-->>UI: UsageDetailResponse (metrics, time_series)

    UI->>API: GET /guardrails/usage/logs
    API->>DB: find_many SpendLogGuardrailIndex
    API->>DB: find_many SpendLogs (by request_ids)
    API-->>UI: UsageLogsResponse (paginated logs)
Loading

Last reviewed commit: 886f1a3

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +4297 to +4309
# Atomically pop batch from queue
async with prisma_client._spend_log_transactions_lock:
queue_size = len(prisma_client.spend_log_transactions)

if queue_size == 0:
return

async with prisma_client._spend_log_transactions_lock:
logs_to_process = prisma_client.spend_log_transactions[
:MAX_LOGS_PER_INTERVAL
]
prisma_client.spend_log_transactions = prisma_client.spend_log_transactions[
len(logs_to_process) :
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOCTOU race between queue-size check and pop

The queue size is checked under one lock acquisition (line 4298-4299), then released, and the actual pop happens under a second lock acquisition (line 4303-4309). Between the two, another coroutine could drain the queue, making this check unreliable. While the consequences are minor (an empty pop is harmless), the double-lock pattern is unnecessary overhead and misleading — it implies atomicity that isn't actually achieved.

Consider merging both into a single lock acquisition:

Suggested change
# Atomically pop batch from queue
async with prisma_client._spend_log_transactions_lock:
queue_size = len(prisma_client.spend_log_transactions)
if queue_size == 0:
return
async with prisma_client._spend_log_transactions_lock:
logs_to_process = prisma_client.spend_log_transactions[
:MAX_LOGS_PER_INTERVAL
]
prisma_client.spend_log_transactions = prisma_client.spend_log_transactions[
len(logs_to_process) :
]
# Atomically pop batch from queue
async with prisma_client._spend_log_transactions_lock:
queue_size = len(prisma_client.spend_log_transactions)
if queue_size == 0:
return
logs_to_process = prisma_client.spend_log_transactions[
:MAX_LOGS_PER_INTERVAL
]
prisma_client.spend_log_transactions = prisma_client.spend_log_transactions[
len(logs_to_process) :
]

Comment on lines +310 to +311
metrics_prev = await prisma_client.db.litellm_dailyguardrailmetrics.find_many(
where={"guardrail_id": guardrail_id, "date": {"lt": start}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded query fetches all historical metrics

metrics_prev has no lower-bound on date, so it fetches every historical row for this guardrail before start. Over time this will pull increasingly large result sets into memory. The overview endpoint (guardrails_usage_overview) correctly limits the previous period to 7 days — this endpoint should do the same.

Suggested change
metrics_prev = await prisma_client.db.litellm_dailyguardrailmetrics.find_many(
where={"guardrail_id": guardrail_id, "date": {"lt": start}}
start_prev = (
datetime.strptime(start, "%Y-%m-%d") - timedelta(days=7)
).strftime("%Y-%m-%d")
metrics_prev = await prisma_client.db.litellm_dailyguardrailmetrics.find_many(
where={"guardrail_id": guardrail_id, "date": {"gte": start_prev, "lt": start}}
)


try:
# Guardrails from DB
guardrails = await prisma_client.db.litellm_guardrailstable.find_many()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded find_many() loads all guardrails into memory

find_many() without any take limit will return the full guardrails table. For most deployments this is fine, but for large instances with hundreds of guardrails this could be expensive per dashboard page load. Consider adding a reasonable limit or pagination to align with the custom rule about avoiding heavy DB queries on dashboard endpoints.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@@ -0,0 +1,39 @@
import { BarChart, Card, Title } from "@tremor/react";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated Tremor components used in new feature

Per AGENTS.md, Tremor is deprecated for new features — the only exception is the Tremor Table component. This file uses BarChart, Card, and Title from @tremor/react. Consider replacing these with Ant Design or native HTML equivalents to stay consistent with the project's direction away from Tremor.

Context Used: Context from dashboard - AGENTS.md (source)

WarningOutlined,
} from "@ant-design/icons";
import { useQuery } from "@tanstack/react-query";
import { Card, Col, Grid, Title } from "@tremor/react";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated Tremor components used in new feature

Per AGENTS.md, Tremor is deprecated — only the Table component is exempt. This file imports Card, Col, Grid, and Title from @tremor/react. The same applies to GuardrailDetail.tsx. Consider using Ant Design components instead for new features.

Context Used: Context from dashboard - AGENTS.md (source)

Comment on lines +592 to +593
request_data["metadata"] = {}
_append_guardrail_info(request_data["metadata"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditionally overwrites request_data["metadata"]

If request_data has neither "metadata" nor "litellm_metadata" keys, this creates request_data["metadata"] = {}. However, if something else in the pipeline later checks for the absence of "metadata" to decide whether to initialize it, this empty dict could interfere. The original code logged a warning here, which was safer. Consider using request_data.setdefault("metadata", {}) to avoid overwriting if a key is somehow added concurrently:

Suggested change
request_data["metadata"] = {}
_append_guardrail_info(request_data["metadata"])
request_data.setdefault("metadata", {})
_append_guardrail_info(request_data["metadata"])

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant