All endpoints are available at http://localhost:8000 by default. Interactive Swagger UI is at /docs.
Select the best model for a task without executing it. Returns a routing decision.
Request body:
{
"team_id": "team-engineering",
"complexity": "simple",
"domain": "chat",
"estimated_input_tokens": 500,
"messages": [{"role": "user", "content": "Hello"}],
"privacy": "public",
"estimated_output_tokens": 256,
"agent_depth": 0,
"preferred_model_id": null,
"max_cost_usd": null,
"workflow_id": null
}

| Field | Type | Required | Description |
|---|---|---|---|
| `team_id` | string | Yes | Team identifier for budget tracking |
| `complexity` | enum | Yes | `simple` \| `moderate` \| `complex` \| `critical` |
| `domain` | enum | Yes | `chat` \| `code` \| `reasoning` \| `extraction` \| `classification` \| `summarization` \| `creative` |
| `estimated_input_tokens` | int | Yes | Estimated prompt token count |
| `messages` | array | Yes | OpenAI message format, e.g. `[{"role": "user", "content": "..."}]` |
| `privacy` | enum | No | `public` (default) \| `internal` \| `confidential` |
| `estimated_output_tokens` | int | No | Expected output length (default: 256) |
| `agent_depth` | int | No | Current recursive agent depth (default: 0) |
| `preferred_model_id` | string | No | Preferred model; used if eligible |
| `max_cost_usd` | float | No | Per-request cost ceiling in USD |
| `workflow_id` | string | No | Workflow identifier for per-workflow budgets |
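
The field table above can be mirrored in a small client-side helper. The following is a minimal sketch, not part of Tidus itself: the function name `build_route_request` and the local validation are assumptions for illustration, while the enum values and defaults are taken from the table.

```python
from typing import Any, Optional

# Enum values copied from the request field table.
COMPLEXITIES = {"simple", "moderate", "complex", "critical"}
DOMAINS = {
    "chat", "code", "reasoning", "extraction",
    "classification", "summarization", "creative",
}
PRIVACY_LEVELS = {"public", "internal", "confidential"}


def build_route_request(
    team_id: str,
    complexity: str,
    domain: str,
    estimated_input_tokens: int,
    messages: list,
    privacy: str = "public",
    estimated_output_tokens: int = 256,
    agent_depth: int = 0,
    preferred_model_id: Optional[str] = None,
    max_cost_usd: Optional[float] = None,
    workflow_id: Optional[str] = None,
) -> dict:
    """Hypothetical helper: check enum fields locally, return a JSON-ready body."""
    if complexity not in COMPLEXITIES:
        raise ValueError(f"unknown complexity: {complexity!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if privacy not in PRIVACY_LEVELS:
        raise ValueError(f"unknown privacy level: {privacy!r}")
    body: dict[str, Any] = {
        "team_id": team_id,
        "complexity": complexity,
        "domain": domain,
        "estimated_input_tokens": estimated_input_tokens,
        "messages": messages,
        "privacy": privacy,
        "estimated_output_tokens": estimated_output_tokens,
        "agent_depth": agent_depth,
        "preferred_model_id": preferred_model_id,
        "max_cost_usd": max_cost_usd,
        "workflow_id": workflow_id,
    }
    return body


request_body = build_route_request(
    team_id="team-engineering",
    complexity="simple",
    domain="chat",
    estimated_input_tokens=500,
    messages=[{"role": "user", "content": "Hello"}],
)
```

Validating enums before sending only saves a round trip; the server remains the authority on what it accepts.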
Response 200 (accepted):
{
"task_id": "3f8a2c1d-...",
"accepted": true,
"chosen_model_id": "deepseek-v3",
"estimated_cost_usd": 0.000056,
"score": 0.096
}

Response 422 (no capable model):
{
"detail": "No capable model found",
"failure_stage": 3,
"failure_reason": "complexity_ceiling",
"rejections": [
{"model_id": "phi-4-ollama", "reason": "complexity_mismatch"},
{"model_id": "gemini-nano", "reason": "complexity_mismatch"}
]
}

Rejection reasons:

| Reason | Stage | Cause |
|---|---|---|
| `model_disabled` | 1 | Model has `enabled=false` or `deprecated=true` |
| `context_too_large` | 1 | `estimated_input_tokens > max_context` |
| `domain_not_supported` | 1 | Model lacks the required domain capability |
| `privacy_violation` | 1 | Confidential task routed to a non-local model |
| `complexity_mismatch` | 1 | Task complexity outside the model's `[min_complexity, max_complexity]` range |
| `agent_depth_exceeded` | 2 | `agent_depth > max_agent_depth` (default: 5) |
| `token_limit_exceeded` | 2 | `estimated_input_tokens > max_tokens_per_step` (default: 8,000) |
| `complexity_ceiling` | 3 | Model's tier exceeds the ceiling for this complexity level |
| `budget_exceeded` | 4 | Estimated cost would exceed team or workflow budget |
| `no_capable_model` | n/a | All candidates rejected; catch-all |
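
When debugging a 422, it can help to group the rejected models by the stage at which they were dropped. The sketch below is an illustrative client-side helper (the name `group_rejections_by_stage` is hypothetical); the reason-to-stage mapping is copied from the table above, and unknown or catch-all reasons are grouped under `None`.

```python
from collections import defaultdict

# Reason -> stage, copied from the rejection reasons table.
REJECTION_STAGE = {
    "model_disabled": 1,
    "context_too_large": 1,
    "domain_not_supported": 1,
    "privacy_violation": 1,
    "complexity_mismatch": 1,
    "agent_depth_exceeded": 2,
    "token_limit_exceeded": 2,
    "complexity_ceiling": 3,
    "budget_exceeded": 4,
}


def group_rejections_by_stage(error_body: dict) -> dict:
    """Return {stage: [model_id, ...]} for a 422 'no capable model' body."""
    grouped = defaultdict(list)
    for rejection in error_body.get("rejections", []):
        # Reasons without a stage in the table (e.g. the catch-all) map to None.
        stage = REJECTION_STAGE.get(rejection["reason"])
        grouped[stage].append(rejection["model_id"])
    return dict(grouped)


example_422 = {
    "detail": "No capable model found",
    "failure_stage": 3,
    "failure_reason": "complexity_ceiling",
    "rejections": [
        {"model_id": "phi-4-ollama", "reason": "complexity_mismatch"},
        {"model_id": "gemini-nano", "reason": "complexity_mismatch"},
    ],
}
by_stage = group_rejections_by_stage(example_422)
```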
Route and execute in one call. Tidus selects the best model, calls the vendor adapter, logs the cost, and returns the response.
Request body: Same as /api/v1/route, plus:

| Field | Type | Required | Description |
|---|---|---|---|
| `estimated_output_tokens` | int | No | Expected output length (default: 256) |
| `agent_depth` | int | No | Current recursive agent depth (default: 0) |

Response 200:
{
"task_id": "b2c3d4e5-...",
"chosen_model_id": "deepseek-v3",
"content": "Here is the answer...",
"vendor": "deepseek",
"input_tokens": 18,
"output_tokens": 42,
"cost_usd": 0.0000084,
"latency_ms": 612.3
}

Response 422 (no capable model): Same structure as the /api/v1/route rejection.

Response 422 (adapter error):
{"detail": "Adapter error: <message>"}

List all models in the registry.
Query parameters:

| Parameter | Type | Description |
|---|---|---|
| `enabled_only` | bool | Return only enabled, non-deprecated models |
| `tier` | int | Filter by tier (1–4) |
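
The two filters can also be reproduced client-side on an already fetched list, which is handy in tests. This is only a sketch of the semantics described in the table, assuming each entry carries the `enabled`, `deprecated`, and `tier` fields shown in this endpoint's response example; `filter_models` is a hypothetical name, not a Tidus function.

```python
from typing import Optional


def filter_models(models: list, enabled_only: bool = False,
                  tier: Optional[int] = None) -> list:
    """Apply the enabled_only and tier filters to a list of model dicts."""
    selected = []
    for model in models:
        # enabled_only keeps models that are enabled AND not deprecated.
        if enabled_only and (not model["enabled"] or model["deprecated"]):
            continue
        if tier is not None and model["tier"] != tier:
            continue
        selected.append(model)
    return selected


# Illustrative registry entries; the IDs are placeholders.
sample_models = [
    {"model_id": "a", "tier": 1, "enabled": True, "deprecated": False},
    {"model_id": "b", "tier": 2, "enabled": False, "deprecated": False},
    {"model_id": "c", "tier": 2, "enabled": True, "deprecated": True},
    {"model_id": "d", "tier": 2, "enabled": True, "deprecated": False},
]
usable_tier_2 = filter_models(sample_models, enabled_only=True, tier=2)
```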
Response:
[
{
"model_id": "deepseek-r1",
"vendor": "deepseek",
"tier": 1,
"max_context": 128000,
"input_price": 0.00055,
"output_price": 0.00219,
"latency_p50_ms": 2000,
"capabilities": ["reasoning", "code", "chat"],
"min_complexity": "complex",
"max_complexity": "critical",
"is_local": false,
"enabled": true,
"deprecated": false
}
]

Get a single model by ID. Returns 404 if not found.

Update model settings in memory. Changes are not persisted to models.yaml.
{
"enabled": false,
"latency_p50_ms": 2500
}

List all configured budget policies.

Create a budget policy.
{
"policy_id": "team-eng-monthly",
"scope": "team",
"scope_id": "team-engineering",
"period": "monthly",
"limit_usd": 500.00,
"warn_at_pct": 0.80,
"hard_stop": true
}

| Field | Options |
|---|---|
| `scope` | `team` \| `workflow` |
| `period` | `daily` \| `weekly` \| `monthly` \| `rolling_30d` |
| `hard_stop` | `true` = reject requests over limit; `false` = warn only |

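
How `limit_usd`, `warn_at_pct`, and `hard_stop` interact can be sketched as a small decision function. This is an assumption-laden illustration rather than Tidus's actual enforcement code: it assumes a warning fires once projected spend reaches `warn_at_pct` of the limit, and that "over limit" means spend plus the estimated cost of the next request exceeds `limit_usd`. The name `budget_status` and the state labels are hypothetical.

```python
def budget_status(spent_usd: float, limit_usd: float,
                  warn_at_pct: float = 0.80, hard_stop: bool = True,
                  estimated_cost_usd: float = 0.0) -> dict:
    """Classify a request as 'ok', 'warn', or 'reject' under a budget policy."""
    if limit_usd <= 0:
        raise ValueError("limit_usd must be positive")
    # Assumption: the next request's estimated cost counts toward the limit.
    projected = spent_usd + estimated_cost_usd
    if projected > limit_usd:
        # hard_stop=true rejects; hard_stop=false only warns.
        state = "reject" if hard_stop else "warn"
    elif projected >= warn_at_pct * limit_usd:
        state = "warn"
    else:
        state = "ok"
    return {
        "utilisation_pct": round(spent_usd / limit_usd * 100, 2),
        "state": state,
    }


status = budget_status(spent_usd=123.45, limit_usd=500.00)
```

With the example policy, a spend of 123.45 against a 500.00 limit is well below the 80% warning threshold, so the request is classified as `ok`.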
Live spend vs. limit for a team. Returns 404 if no policy exists for the team.
{
"team_id": "team-engineering",
"spent_usd": 123.45,
"limit_usd": 500.00,
"utilisation_pct": 24.69,
"is_hard_stopped": false,
"period": "monthly"
}

Cost utilisation for all teams with active budget policies.
[
{
"team_id": "team-engineering",
"current_spend_usd": 123.45,
"limit_usd": 500.00,
"utilisation_pct": 24.69,
"is_hard_stopped": false
}
]

Create a new agent session.
{
"session_id": "session-abc123",
"team_id": "team-engineering",
"max_depth": 5
}

Returns 409 if a session with that ID already exists.

Get session state: current depth, retry count, tokens used.
Terminate a session. Returns 204 No Content.
Check guardrails and increment agent depth. Returns 422 if any limit is exceeded.
{
"session_id": "session-abc123",
"input_tokens": 1200
}

Response 200:
{"allowed": true, "reason": null}

Response 422 (limit exceeded):
{"detail": "agent_depth_exceeded"}

Returns all dashboard metrics in a single call: cost KPIs, cost by model (7-day), budget utilisation, active sessions, and registry health.
{
"cost": {
"total_7d_usd": 12.45,
"total_30d_usd": 48.20,
"requests_7d": 14230,
"avg_cost_per_request_usd": 0.000875,
"estimated_monthly_usd": 208.50
},
"cost_by_model": [
{"model_id": "deepseek-v3", "vendor": "deepseek", "tier": 2, "total_usd": 5.20, "requests": 8400},
{"model_id": "claude-sonnet-4-6", "vendor": "anthropic", "tier": 2, "total_usd": 4.10, "requests": 2100}
],
"budgets": [
{
"team_id": "team-engineering",
"spent_usd": 123.45,
"limit_usd": 500.0,
"utilisation_pct": 24.69,
"is_hard_stopped": false
}
],
"sessions": [
{"session_id": "sess-001", "team_id": "team-engineering", "current_depth": 2, "total_tokens_used": 4800}
],
"registry_health": [
{"model_id": "deepseek-v3", "enabled": true, "latency_p50_ms": 820, "last_health_check": "2026-03-27T10:00:00Z"}
],
"generated_at": "2026-03-27T10:05:00Z"
}

The dashboard SPA at /dashboard/ calls this endpoint every 30 seconds.
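
A consumer of the summary response might want each model's share of the reported spend. The following is a minimal sketch, assuming the `cost_by_model` shape shown above; `cost_share_by_model` is a hypothetical helper, and the shares are relative to the listed rows only, not to `total_7d_usd`.

```python
def cost_share_by_model(summary: dict) -> dict:
    """Return {model_id: fraction of the summed cost_by_model totals}."""
    rows = summary.get("cost_by_model", [])
    listed_total = sum(row["total_usd"] for row in rows)
    if listed_total == 0:
        return {}  # nothing spent yet, avoid dividing by zero
    return {row["model_id"]: row["total_usd"] / listed_total for row in rows}


# Subset of the example summary response.
example_summary = {
    "cost_by_model": [
        {"model_id": "deepseek-v3", "vendor": "deepseek", "tier": 2,
         "total_usd": 5.20, "requests": 8400},
        {"model_id": "claude-sonnet-4-6", "vendor": "anthropic", "tier": 2,
         "total_usd": 4.10, "requests": 2100},
    ],
}
shares = cost_share_by_model(example_summary)
```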
Trigger a health probe for all enabled models immediately (normally runs every 5 minutes automatically).
{"probed": 28, "healthy": 25, "unhealthy": 3}

Trigger a price sync for all models immediately (normally runs weekly).
{"changes_detected": 2, "changes": [
{"model_id": "gpt-4o-mini", "field": "input_price", "old": 0.00015, "new": 0.00012}
]}

Liveness probe. Returns 200 when the server process is running.
{"status": "ok"}

Readiness probe. Returns 200 when the registry and DB are initialised.

- Swagger UI: `GET /docs`
- ReDoc: `GET /redoc`
- OpenAPI schema: `GET /openapi.json`