This is expected behavior — LiteLLM is not single-threaded despite using asyncio.

### Why you see multi-core CPU usage

**1. ThreadPoolExecutor with 100 workers**

LiteLLM has a global thread pool:

```python
MAX_THREADS = 100
executor = ThreadPoolExecutor(max_workers=MAX_THREADS)
```

This is used by the

**2. Additional thread sources**
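One way to see these extra threads for yourself is to submit a few tasks to a pool like the one above and then enumerate the live threads. This is a standalone sketch — the `MAX_THREADS`/`executor` names mirror the snippet above, everything else is illustrative:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 100  # mirrors the constant above; illustrative value
executor = ThreadPoolExecutor(max_workers=MAX_THREADS)

def task():
    # Each submitted task runs on a pool worker thread, not the main thread.
    return threading.current_thread().name

futures = [executor.submit(task) for _ in range(4)]
names = [f.result() for f in futures]

# Besides MainThread, the process now holds lazily created pool workers;
# an HTTP client's connection pool or logging handlers can add more.
worker_threads = [t.name for t in threading.enumerate()
                  if t.name.startswith("ThreadPoolExecutor")]
print(names, worker_threads)
executor.shutdown()
```

Workers are created lazily, so a mostly idle proxy may hold far fewer than 100 threads — but under load the pool can grow toward `max_workers`.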
**3. GIL doesn't prevent CPU throttling**

The GIL prevents parallel Python bytecode execution, but:
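A quick way to convince yourself of this point: many C-level operations — blocking syscalls, and for example `hashlib` on large buffers — release the GIL, so two Python threads really can occupy two cores at once. A minimal sketch (not LiteLLM code):

```python
import hashlib
import threading
import time

data = b"x" * (32 * 1024 * 1024)  # 32 MiB of input

def digest():
    # CPython's hashlib releases the GIL while hashing large buffers,
    # so two of these threads can run on two cores simultaneously.
    return hashlib.sha256(data).hexdigest()

t0 = time.perf_counter()
d1 = digest()
d2 = digest()
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=digest) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

# On a multi-core machine the threaded run approaches half the sequential
# wall time: that parallel C-level work is what shows up as multi-core
# CPU usage, and what the kernel throttles against the cgroup limit.
print(f"sequential {sequential:.3f}s, threaded {threaded:.3f}s")
```

The same applies to TLS, compression, and socket I/O in the HTTP stack a proxy spends most of its time in — none of that is serialized by the GIL.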
### Recommendations
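The recommendations were cut off above, so take this as one generic mitigation rather than the maintainers' guidance: bound the event loop's default executor so the thread count tracks the container's CPU limit instead of defaulting to a large pool.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    # Assumption: with a CPU limit of 4, a small bounded pool (instead of
    # 100 workers) keeps the runnable-thread count near the cgroup quota.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=8))
    # loop.run_in_executor(None, ...) now routes through the bounded pool.
    return await loop.run_in_executor(None, sum, range(10))

print(asyncio.run(main()))  # 45
```

This only caps threads the loop's default executor spawns; separately created pools (like the global one above) need their own limits.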
Haven't deep-dived into the architecture myself yet. Deployed LiteLLM in k8s with decent traffic going through each instance. The CPU limit is 4, and there are 4 pods.
I don't quite understand why I am seeing kernel CPU throttling of the LiteLLM pods. Each pod runs FastAPI + asyncio + uvicorn, with 1 uvicorn worker configured per pod/container. Python has the GIL, and no multiprocessing is used.
I am not intimately familiar with asyncio / fastapi. Does it spawn multiple threads? I thought it was an event loop in single thread. Are the multiple threads somehow not subject to the GIL? What's going on?
Grateful for any thoughts from the devs / folks who have deep-dived into the architecture 🙏