Replies: 1 comment
The auth/spend separation point is the most impactful one here. Having budget checking inlined in user_api_key_auth means a spend-tracking bug can take down authentication for all requests. That blast radius difference is exactly the kind of thing that bites you in production at 2am.
The ProxyContext singleton approach is solid. One thing to watch: if you extract spend tracking into its own module, make sure the interface is async-safe from day one. The current pattern of firing async tasks for email notifications from inside the auth middleware is a sign that the boundaries are already leaking concurrency concerns across modules.
The already-separable list (cache, secrets, integrations) would also make it much easier for teams that only need the routing layer without the full proxy stack.
I missed the town hall regarding the LiteLLM malware issue. However, I would still like to suggest breaking up the large repo into smaller repos, if possible. If this is already being discussed elsewhere, please mark this as a duplicate or point me to that discussion.
Split proxy_server.py (14K lines)
About 30 module-level globals (router, DB client, caches, config) are defined there and everything else imports them, creating a circular dependency that prevents extraction. The fix is mechanical: move the globals into a standalone ProxyContext singleton, then relocate function groups (streaming, spend helpers, model discovery, login, config, LLM endpoints) into their own files one PR at a time, reducing the file to a thin FastAPI shell that just mounts routers and runs startup. No architecture changes, no DI frameworks, just moving code behind a stable import path. A few thousand lines can be moved out of proxy_server.py this way.
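To make the ProxyContext idea concrete, here is a minimal sketch of what I have in mind. The field names, get_context(), and list_model_names() are all made up for illustration and are not the current LiteLLM API; the point is only that relocated helpers import a small context module instead of proxy_server.py.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ProxyContext:
    """Holds state that would otherwise be module-level globals.

    All field names here are hypothetical placeholders.
    """
    router: Optional[Any] = None
    db_client: Optional[Any] = None
    cache: dict = field(default_factory=dict)
    config: dict = field(default_factory=dict)


_context: Optional[ProxyContext] = None


def get_context() -> ProxyContext:
    """Return the process-wide context, creating it on first use."""
    global _context
    if _context is None:
        _context = ProxyContext()
    return _context


# A relocated helper depends on get_context(), not on proxy_server.py,
# so it can live in its own file without a circular import.
def list_model_names() -> list:
    ctx = get_context()
    return sorted(ctx.config.get("model_list", []))
```

With something like this in place, proxy_server.py would populate the context once at startup, and each extracted file would only need the one import.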
Untangle spend tracking from auth
Right now budget checking is inlined inside the auth middleware (user_api_key_auth). The same function that validates your API key token also reads Redis counters, compares against 10 different budget levels, fires Slack alerts, and creates async tasks for email notifications. Spend recording is spread across 4 separate files that all import globals from proxy_server.py.
Testability. Today, to test "does a key get rejected when over budget?" you need to stand up the full FastAPI auth dependency, a Redis instance, a Prisma client, and mock the proxy_server globals. If budget checking is behind check_budget(entity) -> pass/fail, you test it with a unit test that passes in a spend value and a limit.
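As a rough sketch of what that unit test could look like: BudgetEntity and its fields are hypothetical, and treating spend equal to the limit as over budget is my assumption, not necessarily how the proxy behaves today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetEntity:
    """Hypothetical snapshot of one budgeted entity (key, user, team, ...)."""
    name: str
    spend: float
    max_budget: Optional[float] = None  # None means no limit is configured


def check_budget(entity: BudgetEntity) -> bool:
    """Return True if the entity may proceed, False if it is over budget."""
    if entity.max_budget is None:
        return True
    # Assumption: reaching the limit exactly counts as over budget.
    return entity.spend < entity.max_budget


def test_key_over_budget_is_rejected():
    assert check_budget(BudgetEntity("key-1", spend=12.0, max_budget=10.0)) is False


def test_key_under_budget_passes():
    assert check_budget(BudgetEntity("key-1", spend=3.0, max_budget=10.0)) is True


def test_key_without_limit_passes():
    assert check_budget(BudgetEntity("key-1", spend=1e6)) is True
```

No FastAPI, Redis, or Prisma is needed to run these; whoever calls check_budget is responsible for loading the spend value first.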
Decoupling deploy risk. If someone changes how soft-budget alerts work and introduces a bug, it currently breaks auth, meaning all requests fail, not just alerting. Separated, a spend tracking bug means incorrect budget enforcement. An auth bug means authentication is broken. Different blast radius.
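The blast radius difference can be illustrated with a toy example. Everything below (VALID_KEYS, authenticate, notify_safely, handle_request) is hypothetical and much simpler than the real middleware; it only shows that once alerting sits behind its own boundary, a bug in it no longer fails the request.

```python
import asyncio
import logging

logger = logging.getLogger("spend")

VALID_KEYS = {"sk-demo"}  # hypothetical stand-in for the real key store


class AuthError(Exception):
    """Raised when the API key is not valid."""


def authenticate(api_key: str) -> str:
    """Auth only: validate the key and return it. No spend logic here."""
    if api_key not in VALID_KEYS:
        raise AuthError("invalid API key")
    return api_key


async def send_soft_budget_alert(api_key: str) -> None:
    """Stand-in for alerting code that contains a bug."""
    raise RuntimeError("alerting is broken")


async def notify_safely(api_key: str) -> None:
    """Run alerting in isolation; log failures instead of propagating them."""
    try:
        await send_soft_budget_alert(api_key)
    except Exception:
        logger.exception("soft-budget alert failed for %s", api_key)


async def handle_request(api_key: str) -> str:
    key = authenticate(api_key)   # an auth bug breaks this step only
    await notify_safely(key)      # an alerting bug is contained here
    return "ok"
```

Here handle_request("sk-demo") still succeeds even though the alert function raises, while an unknown key fails with AuthError as it should. Whether hard budget enforcement should fail open or closed is a separate decision that this sketch does not take a position on.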
Already separable (clean interfaces exist, low risk, but low value too)