LiteLLM Performance Roadmap #15933
- Hi @AlexsanderHamir, can you share the ENV variables, worker configuration, and DB configuration used as well? Or, if you are using Helm, can you share your values.yaml?
- @AlexsanderHamir, you're doing a great job! Kudos! I would love to see the memory issues resolved; already considering to use
- RAM use when idle at startup, on my Windows machine, set up as a "proxy". It shows up in my Task Manager as python.exe, but litellm.exe itself uses barely 5 MB. It might be because of my callback; I'm not sure.
Hi all,
Sharing our public roadmap on LiteLLM performance overheads:
As of v1.78.5, the LiteLLM AI Gateway adds an 8 ms median overhead and a 45 ms P99 overhead at 1K concurrent requests with 4 LiteLLM instances.
This is an ~80% improvement over v1.76.0. This roadmap has three key components we plan to achieve by the end of 2025:
You can read a detailed breakdown of each component below.
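For readers reproducing figures like the ones above, here is a minimal sketch of how median and P99 overhead can be computed from paired latency samples. This is not LiteLLM's actual benchmark harness; the function name and nearest-rank P99 method are illustrative assumptions.

```python
# Hypothetical sketch: gateway overhead percentiles from paired samples.
# "Overhead" here is gateway latency minus direct provider latency,
# per request; the P99 uses a simple nearest-rank method.
import statistics


def overhead_percentiles(direct_ms, via_gateway_ms):
    """Return (median, p99) of per-request overhead in milliseconds."""
    overheads = sorted(g - d for g, d in zip(via_gateway_ms, direct_ms))
    median = statistics.median(overheads)
    p99 = overheads[min(len(overheads) - 1, int(0.99 * len(overheads)))]
    return median, p99
```

Measuring overhead as a paired difference (same request with and without the gateway) isolates the gateway's contribution from provider-side latency variance.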
Roadmap & Goals
1. Reduce latency across endpoints (Target: Nov 30, 2025)
Achieve 8 ms median overhead and 45 ms P99 overhead on the setup described above for the following endpoints:
- /chat/completions
- /responses
- /embeddings
- /realtime
- /audio/speech
- /audio/transcriptions
2. Address memory issues (Target: Nov 30, 2025)
- Resolve all reported memory leaks.
- Lower LiteLLM’s overall memory footprint.
- Reduce LiteLLM’s memory allocations per request.
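One standard-library way to attribute heap allocations to a single request is Python's tracemalloc module. The sketch below is illustrative only: `allocations_for` and `handle_request` are hypothetical placeholders, not LiteLLM APIs.

```python
# Hypothetical sketch: measure Python heap allocations attributable to a
# single call using the standard-library tracemalloc module.
import tracemalloc


def allocations_for(fn, *args, **kwargs):
    """Return (net bytes allocated across the call, peak traced bytes)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    fn(*args, **kwargs)
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, peak


def handle_request(n):
    # Stand-in for a request handler that builds per-request objects.
    return [{"chunk": i} for i in range(n)]


delta, peak = allocations_for(handle_request, 1000)
```

The peak figure is usually the more useful number for per-request allocation work, since short-lived objects are freed before the "after" snapshot.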
3. Halve LiteLLM’s overhead relative to the v1.78.5 figures above (Target: Dec 31, 2025)
Feedback
Is there anything you'd like to see us address related to LiteLLM performance this year?
Comment below — we're happy to work with you.
Appendix: Test Setup
Load Testing Configuration
- Endpoint: /chat/completions
Environment
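For context on the appendix, a concurrent load test against /chat/completions can be sketched with nothing but the standard library. This is a hypothetical harness, not the roadmap's actual one; `send` is a placeholder that, in a real run, would POST to the proxy endpoint.

```python
# Hypothetical load-test sketch: fire `total` calls at a `send` callable
# with bounded concurrency and report latency percentiles in milliseconds.
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, total=1000, concurrency=100):
    def one(_):
        t0 = time.perf_counter()
        send()  # in a real run: POST to the proxy's /chat/completions
        return (time.perf_counter() - t0) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(one, range(total)))
    return {
        "median_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[int(0.99 * len(latencies))],
    }
```

Thread-based concurrency is adequate here because each worker spends almost all of its time blocked on network I/O rather than holding the GIL.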