Data requirements

You can start with only three fields:

Field	Required	Meaning
timestamp	yes	request arrival time; numeric seconds or a datetime-like column
input_tokens	yes	request input or prompt tokens
output_tokens	yes	generated response tokens

The package gets more useful when you also provide these:

Field	Recommended	Why it matters
cached_input_tokens	yes	some providers discount cached tokens
thinking_tokens	yes	reasoning-heavy routes can burn far more reserved capacity
max_output_tokens	yes	useful for conservative planning and admission-style estimates
class_name	yes	lets you segment chat, RAG, tool use, reasoning, and other traffic classes
latency_s	yes	helps fit a baseline-latency model from real telemetry

Minimal schema

timestamp,input_tokens,output_tokens
0.0,800,120
0.2,1200,180
0.7,600,95

Recommended schema

timestamp,class_name,input_tokens,cached_input_tokens,output_tokens,thinking_tokens,max_output_tokens,latency_s
0.0,chat,800,120,120,0,512,0.74
0.2,rag,1200,350,180,0,768,1.10
0.7,reasoning,600,0,95,210,1024,1.48

Example files in this repo

examples/input/synthetic_request_trace_baseline.csv
examples/input/synthetic_request_trace_optimized.csv

Those are fake, but structurally realistic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data requirements

Minimal schema

Recommended schema

Example files in this repo

FilesExpand file tree

data-requirements.md

Latest commit

History

data-requirements.md

File metadata and controls

Data requirements

Minimal schema

Recommended schema

Example files in this repo