A high-performance, intent-aware router for Large Language Models.
Omnitrix is not just an API wrapper—it is an intelligent gateway that analyzes user prompts to route traffic efficiently. It optimizes for cost and latency by determining User Intent (Coding vs. Creative) and Subscription Tier (Free vs. Premium) before selecting the appropriate Model (Local vs. Cloud).
Omnitrix uses a multi-stage pipeline to process every request. Instead of blindly sending every string to an expensive LLM, it uses a "Fast Path" for pleasantries and a "Smart Path" for complex tasks.
Visual Flow: Request → Reflex check (Fast Path) or Intent Classifier → Resolver → Model (Smart Path) → Response
- Fast Path (Reflex): Uses string matching algorithms to handle phatic communication (greetings) in nanoseconds, saving GPU cycles.
- Smart Path (Resolver): Dynamically maps the user's intent to the best model for the job (e.g., coding queries go to DeepSeek, creative writing goes to Gemma).
- Zero-Latency Reflexes: Implements the Aho-Corasick algorithm to detect and respond to "Hi", "Hello", and blocked terms instantly without touching an LLM.
- Intent Classification: Utilizes a lightweight local SLM (Phi-3) to "read" the prompt and classify it into categories like `coding`, `creative`, `math`, or `general`.
- Dynamic Model Resolution: The Resolver Pattern automatically switches between providers based on the logic matrix, decoupling the intent from the execution.
- Tiered Quality of Service:
  - Free Tier: Routes to efficient local models (e.g., `Gemma-2B`, `Phi-3`).
  - Premium Tier: Routes to state-of-the-art cloud models (e.g., `Llama-3-70B` on Groq) for superior performance.
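The routing matrix above can be sketched as a small Go resolver. This is an illustrative sketch, not the code in `internal/resolver`; the Groq model identifier `llama3-70b-8192` is an assumption based on the Llama-3-70B tier described here.

```go
package main

import "fmt"

// Intent and Tier are the two inputs to model resolution.
type Intent string
type Tier string

const (
	IntentCoding   Intent = "coding"
	IntentCreative Intent = "creative"
	IntentGeneral  Intent = "general"

	TierFree    Tier = "free"
	TierPremium Tier = "premium"
)

// Resolve maps (intent, tier) to a concrete model identifier.
// Premium traffic goes to a Groq-hosted model; free traffic stays
// on local Ollama models, picked by intent.
func Resolve(intent Intent, tier Tier) string {
	if tier == TierPremium {
		return "llama3-70b-8192" // assumed Groq model id for Llama-3-70B
	}
	switch intent {
	case IntentCoding:
		return "deepseek-coder"
	case IntentCreative:
		return "gemma:2b"
	default:
		return "phi3:mini"
	}
}

func main() {
	fmt.Println(Resolve(IntentCoding, TierFree))    // deepseek-coder
	fmt.Println(Resolve(IntentCreative, TierFree))  // gemma:2b
	fmt.Println(Resolve(IntentCoding, TierPremium)) // llama3-70b-8192
}
```

Keeping the matrix in one pure function makes it trivial to extend with new intents or tiers without touching the HTTP layer.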
- Language: Go (Golang) 1.22+
- Web Framework: Gin Gonic
- Documentation: Swagger / OpenAPI 3.0
- Local Inference: Ollama (Phi-3, Gemma-2B, DeepSeek-Coder)
- Cloud Inference: Groq Cloud API
You must have Ollama installed and running locally. You also need to pull the specific models used by the Free Tier:

```shell
ollama serve

# Open a new terminal
ollama pull phi3:mini       # For Intent Classification
ollama pull gemma:2b        # For Creative tasks (Free)
ollama pull deepseek-coder  # For Coding tasks (Free)
```

If you want to test the Premium Tier, you need a Groq API Key:

```shell
export GROQ_API_KEY="gsk_your_groq_api_key_here"
```

Clone the repository and download dependencies:
```shell
git clone https://github.com/AnubhavMadhav/Omnitrix.git
cd omnitrix
go mod download
```

Then start the server:

```shell
go run cmd/server/main.go
```

| 1. Instant Greetings | 2. Blocked Content Security |
|---|---|
| Handles phatic communication instantly. | Blocks restricted keywords via Aho-Corasick. |
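The blocked-content reflex shown above relies on Aho-Corasick multi-pattern matching. A stripped-down Go version of such an engine might look like this (an illustrative sketch, not the actual `internal/reflex` implementation; the blocked terms are hypothetical):

```go
package main

import "fmt"

// node is one state in the Aho-Corasick automaton.
type node struct {
	next map[byte]*node
	fail *node
	hits []string // patterns that end at this state
}

func newNode() *node { return &node{next: map[byte]*node{}} }

// Matcher is a minimal Aho-Corasick automaton over byte strings.
type Matcher struct{ root *node }

// NewMatcher builds the trie and failure links for the given patterns.
func NewMatcher(patterns []string) *Matcher {
	root := newNode()
	// 1. Build the trie of all patterns.
	for _, p := range patterns {
		cur := root
		for i := 0; i < len(p); i++ {
			c := p[i]
			if cur.next[c] == nil {
				cur.next[c] = newNode()
			}
			cur = cur.next[c]
		}
		cur.hits = append(cur.hits, p)
	}
	// 2. BFS to set failure links (longest proper suffix of the
	//    current state that is also a prefix of some pattern).
	queue := []*node{}
	for _, child := range root.next {
		child.fail = root
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range cur.next {
			f := cur.fail
			for f != nil && f.next[c] == nil {
				f = f.fail
			}
			if f == nil {
				child.fail = root
			} else {
				child.fail = f.next[c]
			}
			// Inherit matches reachable through the failure link.
			child.hits = append(child.hits, child.fail.hits...)
			queue = append(queue, child)
		}
	}
	return &Matcher{root: root}
}

// Contains reports whether any pattern occurs in text, in one pass.
func (m *Matcher) Contains(text string) bool {
	cur := m.root
	for i := 0; i < len(text); i++ {
		c := text[i]
		for cur != m.root && cur.next[c] == nil {
			cur = cur.fail
		}
		if cur.next[c] != nil {
			cur = cur.next[c]
		}
		if len(cur.hits) > 0 {
			return true
		}
	}
	return false
}

func main() {
	blocked := NewMatcher([]string{"hack", "exploit"})
	fmt.Println(blocked.Contains("how to hack a server")) // true
	fmt.Println(blocked.Contains("hello there"))          // false
}
```

Because the automaton scans the prompt exactly once regardless of how many patterns it holds, greetings and blocked terms can be checked before any model is invoked.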
3. Classify

| Eg 1: Poem | Eg 2: Coding |
|---|---|
| (screenshot) | (screenshot) |
Once the server is running, you can access the full interactive documentation via Swagger UI:
http://localhost:8080/swagger/index.html
Free Tier (Local Model):

```shell
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a Python script to sort a list"}'
```

Premium Tier (Cloud Model):

```shell
curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-User-Tier: premium" \
  -d '{"prompt": "Write a Python script to sort a list"}'
```

Omnitrix follows the Standard Go Project Layout to ensure scalability and maintainability.
```
omnitrix/
├── cmd/
│   └── server/     # Application entry point (main.go)
├── docs/           # Generated Swagger documentation
├── internal/
│   ├── llm/        # Interfaces for AI clients (Ollama/Groq)
│   ├── provider/   # Factory pattern for AI provider selection
│   ├── reflex/     # Aho-Corasick engine for zero-latency checks
│   ├── resolver/   # Business logic for mapping Intent -> Model
│   ├── router/     # Gin HTTP handlers and routing middleware
│   └── utils/      # Shared utilities (log formatting)
└── go.mod          # Dependency definitions
```



