Citrus LLM Evaluation Platform - API Reference

Version: 2.4.0
Base URL: http://localhost:8000
Last Updated: March 2026

Complete API reference for the Citrus LLM Evaluation Platform.

Overview
Authentication
Chat & Evaluations
Traces & Analytics
Health & Info
Error Handling
Rate Limiting

Overview

Base URLs

Environment	URL
Development	`http://localhost:8000`
Production	`https://api.yourdomain.com`

Content Types

All endpoints accept and return application/json unless otherwise specified.

API Versioning

Current API version: v1

Versioned endpoints: /api/v1/* Legacy endpoints: /api/* (deprecated, for backward compatibility)

Interactive Documentation

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
OpenAPI Spec: http://localhost:8000/openapi.json

Authentication

Register User

Create a new user account.

POST /api/auth/register

Request Body:

{
  "email": "user@example.com",
  "name": "John Doe"
}

Response: 201 Created

{
  "message": "OTP sent to email",
  "email": "user@example.com"
}

Request OTP (Login)

Request a one-time password for authentication.

POST /api/auth/login

Request Body:

{
  "email": "user@example.com"
}

Response: 200 OK

{
  "message": "OTP sent to email",
  "email": "user@example.com"
}

Verify OTP

Verify OTP and receive JWT token.

POST /api/auth/verify-otp

Request Body:

{
  "email": "user@example.com",
  "otp": "123456"
}

Response: 200 OK

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer",
  "expires_in": 2592000,
  "user": {
    "id": "user_id",
    "email": "user@example.com",
    "name": "John Doe"
  }
}

Get Current User

Get authenticated user information.

GET /api/auth/me

Headers:

Authorization: Bearer <access_token>

Response: 200 OK

{
  "id": "user_id",
  "email": "user@example.com",
  "name": "John Doe",
  "created_at": "2026-03-23T10:00:00Z"
}

Chat & Evaluations

Generate Dual Responses (Streaming)

Generate responses from two models simultaneously with Server-Sent Events (SSE).

POST /api/v1/evaluations/dual-responses

Request Body:

{
  "user_message": "Explain quantum computing",
  "chat_history": [
    {
      "role": "user",
      "content": "Hello"
    },
    {
      "role": "assistant",
      "content": "Hi! How can I help you?"
    }
  ],
  "session_id": "session-123",
  "temperature": 0.7,
  "max_tokens": 2000
}

Response: 200 OK (SSE Stream)

event: token
data: {"model": "1", "content": "Quantum"}

event: token
data: {"model": "1", "content": " computing"}

event: token
data: {"model": "2", "content": "Quantum"}

event: done
data: {"trace_id": "trace-abc123", "total_tokens": 500}

Request Fields:

Field	Type	Required	Description
`user_message`	string	Yes	User's input message
`chat_history`	array	No	Previous conversation messages
`session_id`	string	Yes	Unique session identifier
`temperature`	float	No	Model temperature (0-2, default: 0.7)
`max_tokens`	int	No	Max tokens per response (default: 2000)

Store Preference

Store user's preference between two responses.

POST /api/v1/evaluations/store-preference

Request Body:

{
  "session_id": "session-123",
  "trace_id": "trace-abc123",
  "user_message": "Explain quantum computing",
  "response_1": "Quantum computing uses qubits...",
  "response_2": "In quantum computing, we use...",
  "choice": "response_1",
  "reasoning": "More clear and concise explanation",
  "metadata": {
    "response_time_1": 1200,
    "response_time_2": 1500
  }
}

Response: 201 Created

{
  "id": "preference-xyz789",
  "message": "Preference stored successfully",
  "timestamp": "2026-03-23T10:00:00Z"
}

Choice Values:

response_1 - User prefers first response
response_2 - User prefers second response
tie - Both responses are equally good

Get Evaluation Statistics

Get aggregated statistics for evaluations.

GET /api/v1/evaluations/stats

Query Parameters:

Parameter	Type	Default	Description
`days`	int	7	Number of days to include
`session_id`	string	-	Filter by session

Response: 200 OK

{
  "total_evaluations": 1500,
  "total_preferences": 1200,
  "preference_distribution": {
    "response_1": 45.5,
    "response_2": 42.3,
    "tie": 12.2
  },
  "average_response_time_ms": 1350,
  "models_compared": ["gemini-2.5-flash", "gemini-2.5-pro"],
  "period": {
    "start": "2026-03-16T00:00:00Z",
    "end": "2026-03-23T23:59:59Z"
  }
}

Legacy: Dual Responses (Non-versioned)

For backward compatibility only. Use versioned endpoint instead.

POST /api/dual-responses

Traces & Analytics

List Traces

Get paginated list of traces with filtering.

GET /api/v1/traces

Query Parameters:

Parameter	Type	Default	Description
`skip`	int	0	Number of records to skip
`limit`	int	50	Maximum records to return
`status`	string	-	Filter by status (success, error, running)
`session_id`	string	-	Filter by session
`user_id`	string	-	Filter by user
`start_date`	datetime	-	Filter from date
`end_date`	datetime	-	Filter to date
`sort_by`	string	timestamp	Sort field
`sort_order`	string	desc	Sort order (asc, desc)

Response: 200 OK

{
  "traces": [
    {
      "trace_id": "trace-abc123",
      "name": "dual-response-generation",
      "session_id": "session-123",
      "user_id": "user-456",
      "status": "success",
      "start_time": "2026-03-23T10:00:00Z",
      "end_time": "2026-03-23T10:00:02Z",
      "total_latency_ms": 2000,
      "total_token_usage": {
        "prompt_tokens": 150,
        "completion_tokens": 500,
        "total_tokens": 650
      },
      "privacy_score": 100,
      "vault_processed": true
    }
  ],
  "total": 1500,
  "skip": 0,
  "limit": 50
}

Get Trace by ID

Get detailed trace information including spans.

GET /api/v1/traces/{trace_id}

Path Parameters:

Parameter	Type	Description
`trace_id`	string	Unique trace identifier

Response: 200 OK

{
  "trace_id": "trace-abc123",
  "name": "dual-response-generation",
  "session_id": "session-123",
  "user_id": "user-456",
  "status": "success",
  "start_time": "2026-03-23T10:00:00Z",
  "end_time": "2026-03-23T10:00:02Z",
  "total_latency_ms": 2000,
  "total_token_usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500,
    "total_tokens": 650
  },
  "spans": [
    {
      "span_id": "span-001",
      "name": "llm_call",
      "parent_span_id": null,
      "start_time": "2026-03-23T10:00:00.100Z",
      "end_time": "2026-03-23T10:00:01.500Z",
      "duration_ms": 1400,
      "status": "success",
      "attributes": {
        "model": "gemini-2.5-flash",
        "temperature": 0.7,
        "input_tokens": 75,
        "output_tokens": 250
      },
      "input": "[MASKED:user_message]",
      "output": "Quantum computing is..."
    }
  ],
  "metadata": {
    "client_ip": "[MASKED:ip]",
    "user_agent": "Mozilla/5.0...",
    "api_version": "v1"
  },
  "privacy_score": 100,
  "vault_processed": true,
  "pii_detected": ["EMAIL_ADDRESS"],
  "encryption_map": {
    "user_message": "vault:v1:abc123..."
  }
}

Get Trace Statistics

Get aggregated statistics for traces.

GET /api/v1/traces/statistics

Query Parameters:

Parameter	Type	Default	Description
`days`	int	7	Number of days to include
`model`	string	-	Filter by model

Response: 200 OK

{
  "total_traces": 5000,
  "success_rate": 98.5,
  "error_rate": 1.5,
  "latency": {
    "p50": 1200,
    "p95": 2500,
    "p99": 3500,
    "avg": 1400
  },
  "token_usage": {
    "total_prompt_tokens": 750000,
    "total_completion_tokens": 2500000,
    "avg_prompt_tokens": 150,
    "avg_completion_tokens": 500
  },
  "by_status": {
    "success": 4925,
    "error": 75,
    "running": 0
  },
  "by_model": {
    "gemini-2.5-flash": 2500,
    "gemini-2.5-pro": 2500
  },
  "period": {
    "start": "2026-03-16T00:00:00Z",
    "end": "2026-03-23T23:59:59Z"
  }
}

Get Session Traces

Get all traces for a specific session.

GET /api/v1/traces/session/{session_id}

Path Parameters:

Parameter	Type	Description
`session_id`	string	Session identifier

Response: 200 OK

{
  "session_id": "session-123",
  "traces": [...],
  "total": 15,
  "total_tokens": 9750,
  "total_latency_ms": 18000
}

Evaluate Trace (Privacy-Preserving)

Run VaultGemma evaluation on a trace.

POST /api/v1/traces/{trace_id}/evaluate

Path Parameters:

Parameter	Type	Description
`trace_id`	string	Unique trace identifier

Request Body:

{
  "evaluation_type": "quality",
  "criteria": ["accuracy", "relevance", "coherence"]
}

Response: 200 OK

{
  "trace_id": "trace-abc123",
  "evaluation": {
    "scores": {
      "accuracy": 0.92,
      "relevance": 0.88,
      "coherence": 0.95
    },
    "overall_score": 0.917
  },
  "privacy": {
    "dp_protected": true,
    "epsilon": 1.0,
    "delta": 1e-5,
    "privacy_score": 100
  },
  "timestamp": "2026-03-23T10:05:00Z"
}

Get Model Performance

Get performance metrics for models.

GET /api/v1/models/performance

Query Parameters:

Parameter	Type	Default	Description
`days`	int	7	Number of days to include
`models`	string	-	Comma-separated model names

Response: 200 OK

{
  "models": [
    {
      "name": "gemini-2.5-flash",
      "total_requests": 2500,
      "success_rate": 99.2,
      "latency": {
        "p50": 800,
        "p95": 1500,
        "p99": 2000
      },
      "avg_tokens_per_request": 400,
      "preference_win_rate": 45.5
    },
    {
      "name": "gemini-2.5-pro",
      "total_requests": 2500,
      "success_rate": 98.8,
      "latency": {
        "p50": 1200,
        "p95": 2500,
        "p99": 3500
      },
      "avg_tokens_per_request": 600,
      "preference_win_rate": 54.5
    }
  ],
  "period": {
    "start": "2026-03-16T00:00:00Z",
    "end": "2026-03-23T23:59:59Z"
  }
}

Get Real-time Analytics

Get live dashboard metrics.

GET /api/v1/analytics/realtime

Query Parameters:

Parameter	Type	Default	Description
`minutes`	int	60	Time window in minutes

Response: 200 OK

{
  "requests_per_minute": 12.5,
  "active_sessions": 45,
  "error_rate": 0.5,
  "avg_latency_ms": 1250,
  "token_rate": 5000,
  "queue_depth": 3,
  "models_in_use": ["gemini-2.5-flash", "gemini-2.5-pro"],
  "recent_errors": [
    {
      "timestamp": "2026-03-23T09:55:00Z",
      "error_type": "rate_limit",
      "message": "Rate limit exceeded for model gemini-2.5-pro"
    }
  ],
  "time_series": {
    "requests": [...],
    "latency": [...],
    "errors": [...]
  },
  "timestamp": "2026-03-23T10:00:00Z"
}

Health & Info

Root Endpoint

Get API information and available endpoints.

GET /

Response: 200 OK

{
  "name": "Citrus - LLM Evaluation Platform",
  "version": "2.4.0",
  "status": "operational",
  "message": "Welcome to Citrus - LLM Evaluation Platform",
  "endpoints": {
    "health": "/health",
    "docs": "/docs",
    "redoc": "/redoc",
    "chat": {
      "dual_responses": "/api/v1/dual-responses",
      "store_preference": "/api/v1/store-preference"
    },
    "analytics": {
      "stats": "/api/v1/stats",
      "traces": "/api/v1/traces",
      "trace_statistics": "/api/v1/statistics",
      "model_performance": "/api/v1/models/performance",
      "realtime": "/api/v1/analytics/realtime"
    }
  },
  "timestamp": "2026-03-23T10:00:00Z"
}

Health Check

Check platform health status.

GET /health

Response: 200 OK

{
  "status": "healthy",
  "database": "connected",
  "version": "2.4.0",
  "uptime_seconds": 86400.5,
  "timestamp": "2026-03-23T10:00:00Z"
}

Status Values:

healthy - All systems operational
degraded - Some systems impaired
unhealthy - Critical systems down

API Info

Get detailed platform information.

GET /api/info

Response: 200 OK

{
  "platform": "Citrus AI",
  "version": "2.4.0",
  "features": [
    "Dual Response Generation",
    "Preference Learning",
    "Model Performance Analytics",
    "Real-time Tracing",
    "Multi-model Support"
  ],
  "supported_models": [
    "Gemini 1.5 Pro",
    "GPT-4",
    "Claude 3",
    "Custom Models"
  ],
  "capabilities": {
    "chat": true,
    "evaluations": true,
    "analytics": true,
    "tracing": true,
    "preferences": true
  },
  "limits": {
    "max_message_length": 10000,
    "max_chat_history": 20,
    "max_concurrent_requests": 100,
    "request_timeout_seconds": 300
  },
  "timestamp": "2026-03-23T10:00:00Z"
}

Error Handling

Error Response Format

All errors follow a consistent format:

{
  "error": "Error Type",
  "detail": "Human-readable error message",
  "code": "ERROR_CODE",
  "timestamp": "2026-03-23T10:00:00Z"
}

HTTP Status Codes

Code	Meaning	Description
`200`	OK	Request successful
`201`	Created	Resource created
`400`	Bad Request	Invalid request body
`401`	Unauthorized	Missing or invalid token
`403`	Forbidden	Insufficient permissions
`404`	Not Found	Resource not found
`422`	Unprocessable Entity	Validation error
`429`	Too Many Requests	Rate limit exceeded
`500`	Internal Server Error	Server error

Common Error Codes

Code	Description
`VALIDATION_ERROR`	Request validation failed
`AUTH_REQUIRED`	Authentication required
`TOKEN_EXPIRED`	JWT token has expired
`INVALID_OTP`	OTP code is invalid or expired
`RATE_LIMITED`	Request rate limit exceeded
`MODEL_ERROR`	LLM API error
`DATABASE_ERROR`	Database operation failed
`VAULT_ERROR`	Vault operation failed
`NOT_FOUND`	Resource not found
`INTERNAL_ERROR`	Unexpected server error

Example Error Response

{
  "error": "Validation Error",
  "detail": "user_message is required",
  "code": "VALIDATION_ERROR",
  "timestamp": "2026-03-23T10:00:00Z"
}

Rate Limiting

Default Limits

Endpoint Type	Limit	Window
Authentication	10 requests	1 minute
Chat/Evaluations	60 requests	1 minute
Analytics	100 requests	1 minute
Health	No limit	-

Rate Limit Headers

Responses include rate limit information:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1679616060

Rate Limit Exceeded Response

HTTP/1.1 429 Too Many Requests
Retry-After: 30

{
  "error": "Rate Limit Exceeded",
  "detail": "Too many requests. Please retry after 30 seconds.",
  "code": "RATE_LIMITED",
  "retry_after": 30,
  "timestamp": "2026-03-23T10:00:00Z"
}

SDKs and Examples

Python Client

import requests

BASE_URL = "http://localhost:8000"

# Login
response = requests.post(f"{BASE_URL}/api/auth/login", json={
    "email": "user@example.com"
})

# Verify OTP
response = requests.post(f"{BASE_URL}/api/auth/verify-otp", json={
    "email": "user@example.com",
    "otp": "123456"
})
token = response.json()["access_token"]

# Generate dual responses
headers = {"Authorization": f"Bearer {token}"}
response = requests.post(
    f"{BASE_URL}/api/v1/evaluations/dual-responses",
    headers=headers,
    json={
        "user_message": "Explain quantum computing",
        "chat_history": [],
        "session_id": "session-123"
    },
    stream=True
)

# Process SSE stream
for line in response.iter_lines():
    if line:
        print(line.decode())

JavaScript/TypeScript Client

const BASE_URL = 'http://localhost:8000';

// Login
const loginResponse = await fetch(`${BASE_URL}/api/auth/login`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ email: 'user@example.com' })
});

// Verify OTP
const verifyResponse = await fetch(`${BASE_URL}/api/auth/verify-otp`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ email: 'user@example.com', otp: '123456' })
});
const { access_token } = await verifyResponse.json();

// Generate dual responses (SSE)
const eventSource = new EventSource(
  `${BASE_URL}/api/v1/evaluations/dual-responses-stream?token=${access_token}`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data);
};

cURL Examples

# Health check
curl http://localhost:8000/health

# Login
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'

# Generate dual responses
curl -X POST http://localhost:8000/api/v1/evaluations/dual-responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "user_message": "Explain quantum computing",
    "chat_history": [],
    "session_id": "test-123"
  }'

# Get traces
curl "http://localhost:8000/api/v1/traces?limit=10&status=success" \
  -H "Authorization: Bearer <token>"

API Reference for Citrus LLM Evaluation Platform v2.4.0

FilesExpand file tree

API_REFERENCE.md

Latest commit

History

API_REFERENCE.md

File metadata and controls

Citrus LLM Evaluation Platform - API Reference

Table of Contents

Overview

Base URLs

Content Types

API Versioning

Interactive Documentation

Authentication

Register User

Request OTP (Login)

Verify OTP

Get Current User

Chat & Evaluations

Generate Dual Responses (Streaming)

Store Preference

Get Evaluation Statistics

Legacy: Dual Responses (Non-versioned)

Traces & Analytics

List Traces

Get Trace by ID

Get Trace Statistics

Get Session Traces

Evaluate Trace (Privacy-Preserving)

Get Model Performance

Get Real-time Analytics

Health & Info

Root Endpoint

Health Check

API Info

Error Handling

Error Response Format

HTTP Status Codes

Common Error Codes

Example Error Response

Rate Limiting

Default Limits

Rate Limit Headers

Rate Limit Exceeded Response

SDKs and Examples

Python Client

JavaScript/TypeScript Client

cURL Examples