An API for chatting with multiple LLM models.
LLMEngine currently supports:
- Ollama
- GroqAPI
With Ollama, you can run multiple LLM models locally on your device. Some of the models supported through Ollama are:
- Llama3
- phi3
- gemma2b
Currently, Mixtral 8x7B is supported through GroqAPI.
To run the engine, run this command:

```bash
python main.py
```
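The contents of `main.py` are not shown in this README. Purely as an illustration of the overall shape, a minimal FastAPI app serving the prompt endpoint described below on port 8000 might look like the following sketch; all names, the framework choice, and the stub behavior are assumptions, not the project's actual code.

```python
# Illustrative sketch only; the real main.py may be structured differently.
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PromptRequest(BaseModel):
    prompt: str
    model_type: str

@app.post("/api/v1/prompt")
def handle_prompt(req: PromptRequest):
    # The real engine would dispatch to Ollama or GroqAPI based on
    # req.model_type; this stub just echoes the request back.
    return {"prompt": req.prompt, "model_type": req.model_type}

if __name__ == "__main__":
    # Serve on port 8000 to match the endpoint URLs in this README.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```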
The engine exposes the following API endpoints.

You can chat with the supported LLMs using this endpoint:

http://localhost:8000/api/v1/prompt

In the request body, pass a `prompt` and a `model_type`.
For example:

```json
{
    "prompt": "Fastest planet in the world",
    "model_type": "mixtral-8x7b-32768"
}
```
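For instance, you could call this endpoint from Python with the `requests` library. This is a minimal client sketch, assuming the engine is running locally on port 8000 and returns a JSON response:

```python
import requests

# Send a prompt to the engine, routed to the model named in model_type.
resp = requests.post(
    "http://localhost:8000/api/v1/prompt",
    json={
        "prompt": "Fastest planet in the world",
        "model_type": "mixtral-8x7b-32768",
    },
)
resp.raise_for_status()
print(resp.json())
```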
Retrieval Augmented Generation (RAG) is also supported in LLMEngine. Hit this endpoint:

http://localhost:8000/api/v1/rag

In the request body, pass `model_type`, `query`, and `file`.
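Since the endpoint takes a file alongside the other fields, a multipart/form-data request is the likely shape. Here is a client sketch under that assumption; the exact field names, the example file, and the accepted file types are not confirmed by this README:

```python
import requests

# Upload a document and ask a question about it via the RAG endpoint.
with open("document.pdf", "rb") as f:  # example file, assumed to exist
    resp = requests.post(
        "http://localhost:8000/api/v1/rag",
        data={"model_type": "llama3", "query": "Summarize this document"},
        files={"file": f},
    )
resp.raise_for_status()
print(resp.json())
```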

- Setup support for more models.
- Shift `llama3` support to GroqAPI as it requires a lot of compute.
- Add `SelfCorrectiveRAG` support in the API.
- Add support for different file types.
- Add `WebsiteLoader` support.