LiteLLM Component Split #24719

harikb · 2026-03-28T05:08:56Z

I missed the town hall on the LiteLLM malware issue. However, I would still like to suggest breaking up the large repo into smaller repos, if possible. If this is already being discussed elsewhere, please mark this as a duplicate or point me to that discussion.

Split proxy_server.py (14K lines)

About 30 module-level globals (router, DB client, caches, config) are defined there and everything else imports them, creating a circular dependency that prevents extraction. The fix is mechanical: move the globals into a standalone ProxyContext singleton, then relocate function groups (streaming, spend helpers, model discovery, login, config, LLM endpoints) into their own files one PR at a time, cutting the file down to a thin FastAPI shell that just mounts routers and runs startup. No architecture changes, no DI frameworks, just moving code behind a stable import path. A few thousand lines can be moved out of proxy_server.py this way.
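To make the ProxyContext idea concrete, here is a minimal sketch of what such a holder could look like. The field names, `get_proxy_context`, and `reset_proxy_context` are hypothetical and chosen only for illustration; the real set of globals in proxy_server.py is larger and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ProxyContext:
    """Hypothetical holder for state that would otherwise be module-level globals."""

    router: Optional[Any] = None      # placeholder for the router object
    db_client: Optional[Any] = None   # placeholder for the DB client
    cache: Optional[Any] = None       # placeholder for a cache handle
    config: dict = field(default_factory=dict)


_context: Optional[ProxyContext] = None


def get_proxy_context() -> ProxyContext:
    """Return the process-wide context, creating it on first use."""
    global _context
    if _context is None:
        _context = ProxyContext()
    return _context


def reset_proxy_context() -> None:
    """Drop the singleton so each test can start from a clean state."""
    global _context
    _context = None
```

Extracted modules would then call `get_proxy_context()` instead of importing names from proxy_server.py, which is what removes the circular import.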

Untangle spend tracking from auth

Right now budget checking is inlined inside the auth middleware (user_api_key_auth). The same function that validates your API key token also reads Redis counters, compares against 10 different budget levels, fires Slack alerts, and creates async tasks for email notifications. Spend recording is spread across 4 separate files that all import globals from proxy_server.py.

Testability. Today, to test "does a key get rejected when over budget?" you need to stand up the full FastAPI auth dependency, a Redis instance, a Prisma client, and mock the proxy_server globals. If budget checking is behind check_budget(entity) -> pass/fail, you test it with a unit test that passes in a spend value and a limit.
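As a rough illustration of the `check_budget(entity) -> pass/fail` shape, the sketch below uses a hypothetical `BudgetEntity` type. Treating spend equal to the limit as over budget is an assumption made here, not something taken from the existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetEntity:
    """Hypothetical value object: anything with spend and an optional limit."""

    name: str
    spend: float
    max_budget: Optional[float] = None  # None means no limit is configured


def check_budget(entity: BudgetEntity) -> bool:
    """Return True if the entity may proceed, False if it is over budget."""
    if entity.max_budget is None:
        return True
    # Assumption: reaching the limit exactly counts as over budget.
    return entity.spend < entity.max_budget


def test_key_rejected_when_over_budget() -> None:
    # No FastAPI, Redis, or Prisma needed: just a spend value and a limit.
    assert check_budget(BudgetEntity("key-1", spend=12.0, max_budget=10.0)) is False
    assert check_budget(BudgetEntity("key-2", spend=3.0, max_budget=10.0)) is True
```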

Decoupling deploy risk. If someone changes how soft-budget alerts work and introduces a bug, it currently breaks auth, meaning all requests fail, not just alerting. Separated, a spend tracking bug means incorrect budget enforcement. An auth bug means authentication is broken. Different blast radius.
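One way to picture the smaller blast radius: the auth decision and the alert side effect live in separate functions, and an alerting failure is caught instead of propagating into the request path. All names below (`authenticate`, `send_alert_safely`, `handle_request`) are hypothetical and do not mirror the current code.

```python
import logging
from typing import Callable, Set

logger = logging.getLogger("budget_alerts")


def authenticate(api_key: str, valid_keys: Set[str]) -> bool:
    """Auth only answers one question: is this key known?"""
    return api_key in valid_keys


def send_alert_safely(send_alert: Callable[[str], None], message: str) -> bool:
    """Run the alert hook; report failure instead of raising."""
    try:
        send_alert(message)
        return True
    except Exception:
        logger.exception("budget alert failed")
        return False


def handle_request(
    api_key: str,
    valid_keys: Set[str],
    over_soft_budget: bool,
    send_alert: Callable[[str], None],
) -> bool:
    """Return True if the request is allowed to proceed."""
    if not authenticate(api_key, valid_keys):
        return False
    if over_soft_budget:
        # A bug in alerting degrades alerting only; the request still goes through.
        send_alert_safely(send_alert, "soft budget exceeded")
    return True
```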

Already separable (clean interfaces exist, low risk, but low value too)

litellm-ui — Next.js dashboard, already a separate build in ui/
litellm-cache
litellm-secrets
litellm-integrations
litellm-policy
litellm-providers

Each of these is a leaf dependency today, but nothing enforces the import boundary.
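On enforcing the import boundary: one lightweight option is a CI test that parses each module with the standard `ast` module and fails if it imports from a forbidden package. This is only a sketch; `forbidden_imports` is a hypothetical helper, and the module path in the usage example is illustrative.

```python
import ast
from typing import List


def forbidden_imports(source: str, forbidden_prefix: str) -> List[str]:
    """Return absolute imports in `source` that fall under `forbidden_prefix`."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                found.append(name)
    return found


# Usage: a leaf package such as a cache module should not reach into the proxy.
example = "import os\nfrom litellm.proxy.proxy_server import some_global\n"
violations = forbidden_imports(example, "litellm.proxy")
```

A real check would walk every file under the leaf package and assert that `violations` is empty for each one.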

jagmarques · 2026-04-06T18:10:42Z

The auth/spend separation point is the most impactful one here. Having budget checking inlined in user_api_key_auth means a spend-tracking bug can take down authentication for all requests. That blast radius difference is exactly the kind of thing that bites you in production at 2am.

The ProxyContext singleton approach is solid. One thing to watch: if you extract spend tracking into its own module, make sure the interface is async-safe from day one. The current pattern of firing async tasks for email notifications from inside the auth middleware is a sign that the boundaries are already leaking concurrency concerns across modules.
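A minimal sketch of what an async-safe spend interface might look like, assuming an in-memory store and a hypothetical `notify` coroutine. The lock matters once the read-modify-write spans an `await` (for example a Redis round trip); notifications run as tracked background tasks so the caller never waits on them.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set


class SpendTracker:
    """Hypothetical spend recorder that owns its own concurrency concerns."""

    def __init__(self, notify: Callable[[str, float], Awaitable[None]]) -> None:
        self._spend: Dict[str, float] = {}
        self._lock = asyncio.Lock()
        self._notify = notify
        self._tasks: Set[asyncio.Task] = set()

    async def record(self, entity_id: str, cost: float, soft_limit: float) -> float:
        """Add `cost` to the entity's spend and return the new total."""
        async with self._lock:
            total = self._spend.get(entity_id, 0.0) + cost
            self._spend[entity_id] = total
        if total >= soft_limit:
            # Keep a reference so the task is not garbage collected mid-flight.
            task = asyncio.create_task(self._notify(entity_id, total))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return total

    def total(self, entity_id: str) -> float:
        """Return the recorded spend for an entity (0.0 if unknown)."""
        return self._spend.get(entity_id, 0.0)

    async def drain(self) -> None:
        """Wait for pending notifications, e.g. at shutdown or in tests."""
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)
```

With this shape, the auth path would only call a budget check, while recording and alerting happen behind `record`, which can fail or slow down without touching authentication.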

The already-separable list (cache, secrets, integrations) would also make it much easier for teams that only need the routing layer without the full proxy stack.
