Step 1: Put your API keys in .env

Copy the `.env.template` to `.env` and put in the relevant keys (e.g. `OPENAI_API_KEY="sk-.."`).
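For reference, a filled-in `.env` might look like the sketch below. Only `OPENAI_API_KEY` comes from the example above; the other variable names are assumptions based on the providers listed later, so confirm the exact names against `.env.template`:

```shell
# .env - keys for whichever providers you want the proxy to route to
OPENAI_API_KEY="sk-.."        # as in the example above
ANTHROPIC_API_KEY="..."       # assumed variable name - confirm against .env.template
REPLICATE_API_KEY="..."       # assumed variable name - confirm against .env.template
```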
Step 2: Test your proxy

Start your proxy server:

```shell
$ cd litellm-proxy && python3 main.py
```

Make your first call:

```python
import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
```
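The same call can be made over plain HTTP. Here is a sketch with `curl`, assuming the proxy is listening on `http://0.0.0.0:8080` as configured above and accepts the master key as a standard Bearer token (which is how the `openai` client sends `openai.api_key`):

```shell
curl http://0.0.0.0:8080/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-master-key" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hey"}]}'
```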
- Make `/chat/completions` requests for 50+ LLM models: Azure, OpenAI, Replicate, Anthropic, Hugging Face. Example: for `model`, use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`

  ```json
  {
    "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
    "messages": [
      {
        "content": "Hello, whats the weather in San Francisco??",
        "role": "user"
      }
    ]
  }
  ```

- Consistent Input/Output Format
  - Call all models using the OpenAI format - `completion(model, messages)`
  - Text responses will always be available at `['choices'][0]['message']['content']`
- Error Handling Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
- Logging - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Traceloop`, `Helicone` (any of the supported providers here: https://docs.litellm.ai/docs/)
- Token Usage & Spend - Track Input + Completion tokens used + Spend/model
- Caching - Implementation of Semantic Caching
- Streaming & Async Support - Return generators to stream text responses (see the sketch after this list)
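As a rough illustration of the consistent output format and streaming support, the sketch below pulls the reply text from `['choices'][0]['message']['content']` and then streams a second response. It assumes the proxy mirrors the OpenAI SDK's `stream=True` chunk format; treat that as an assumption rather than documented behaviour:

```python
import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

# Same call shape for every model; the text is always at ['choices'][0]['message']['content']
response = openai.ChatCompletion.create(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey"}],
)
print(response['choices'][0]['message']['content'])

# Streaming (assumption: chunks follow the OpenAI delta format)
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    stream=True,
):
    print(chunk['choices'][0].get('delta', {}).get('content', ''), end='')
```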
 
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude-2, etc.

This API endpoint accepts all inputs in raw JSON and expects the following inputs:

- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://docs.litellm.ai/docs/), e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for the function role).
- Additional optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://docs.litellm.ai/docs/
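For example, a request body that combines a few of the optional parameters could look like this (a sketch; the parameter values are purely illustrative):

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    { "role": "user", "content": "Hello, whats the weather in San Francisco??" }
  ],
  "temperature": 0.7,
  "top_p": 1,
  "n": 1,
  "stream": false
}
```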
For `claude-2`:

```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```

To make the same kind of request from Python with `requests`:

```python
import requests
import json
# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = json.dumps({
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
})
headers = {
  'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
```

All responses from the server are returned in the following format (for all LLM models). More info on the output here: https://docs.litellm.ai/docs/

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
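Because every model's response comes back in this shape, the reply text and token usage can be read the same way regardless of the underlying provider. A small sketch using only the fields shown above:

```python
import requests

# assumes the proxy from the earlier example is still running on localhost:5000
url = "http://localhost:5000/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}],
}
data = requests.post(url, json=payload).json()

# reply text is always at ['choices'][0]['message']['content']
print(data["choices"][0]["message"]["content"])

# token counts used for tracking usage & spend
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```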
- Clone the liteLLM repository to your local machine:

  ```shell
  git clone https://github.com/BerriAI/liteLLM-proxy
  ```

- Install the required dependencies using pip:

  ```shell
  pip install -r requirements.txt
  ```

- (optional) Set your LiteLLM proxy master key:

  ```python
  os.environ['LITELLM_PROXY_MASTER_KEY'] = "YOUR_LITELLM_PROXY_MASTER_KEY"
  ```

  or set `LITELLM_PROXY_MASTER_KEY` in your `.env` file

- Set your LLM API keys:

  ```python
  os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
  ```

  or set `OPENAI_API_KEY` in your `.env` file

- Run the server:

  ```shell
  python main.py
  ```
- Quick Start: Deploy on Railway
- `GCP`, `AWS`, `Azure` - This project includes a `Dockerfile`, allowing you to build and deploy a Docker container on any of these providers
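A minimal sketch of building and running that container locally; the port mapping below assumes the proxy listens on 8080, as in the quick-start example, so adjust it to whatever `main.py` actually binds:

```shell
# build the image from the included Dockerfile
docker build -t litellm-proxy .

# run it, loading keys from .env and forwarding the (assumed) proxy port
docker run --env-file .env -p 8080:8080 litellm-proxy
```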
- Our calendar
- Community Discord
- Our numbers +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails [email protected] / [email protected]
 
- Support hosted db (e.g. Supabase)
- Easily send data to places like Posthog and Sentry.
- Add a hot-cache for project spend logs - enables fast checks for user + project limits
- Implement user-based rate-limiting
- Spending controls per project - expose key creation endpoint
- Need to store a keys db -> mapping created keys to their alias (i.e. project name)
- Easily add new models as backups / as the entry-point (add this to the available model list)
 

