@reaatech/agent-eval-harness-types
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Canonical TypeScript domain types, Zod schemas, and interfaces for the agent-eval-harness ecosystem. This package is the foundational dependency of every other package in the monorepo.
npm install @reaatech/agent-eval-harness-types
# or
pnpm add @reaatech/agent-eval-harness-types
19 domain type interfaces — Turn, Trajectory, EvalResult, JudgeScore, CostBreakdown, LatencyBudget, GoldenTrajectory, RegressionGate, and more
20 Zod schemas — runtime validation for every domain type with full type inference via z.infer
Zero runtime dependencies beyond zod
Dual ESM/CJS output — works with import and require
Golden trajectory markers — golden, expected, and quality_notes fields on every turn
CI gate types — threshold, baseline-comparison, and distribution gates with regression tracking
Suite runner types — configuration, run status, comparison, and metric regression interfaces
import { TurnSchema , type Trajectory , type EvalResult } from '@reaatech/agent-eval-harness-types' ;
const turn = TurnSchema . parse ( {
turn_id : 1 ,
role : 'user' ,
content : 'Hello' ,
timestamp : '2026-04-15T00:00:00Z' ,
} ) ;
const trajectory : Trajectory = { turns : [ turn ] , metadata : { total_turns : 1 } } ;
Name
Type
Description
Turn
interface
Single turn in a trajectory with role, content, timestamp, and optional tool calls, latency, and cost
ToolCall
interface
Tool invocation with name, arguments, and optional result
CostData
interface
Token usage and cost for a single turn
Trajectory
interface
Complete agent execution with turns array and optional metadata
EvalResult
interface
Evaluation result with overall score, per-metric scores, and issues
EvalIssue
interface
Issue found during evaluation with type, severity, and description
Name
Type
Description
JudgeScore
interface
LLM judge scoring result with score, explanation, confidence, and calibration status
Name
Type
Description
CostBreakdown
interface
Full cost breakdown for a trajectory with LLM, tool, and per-turn costs
TurnCost
interface
Cost breakdown for a single turn with token counts
Name
Type
Description
LatencyBudget
interface
Latency SLA budget with P50, P90, P99 thresholds and component breakdowns
LatencyResult
interface
Latency measurement result with percentiles, violations, and SLA status
LatencyViolation
interface
SLA violation record with turn ID, actual vs threshold values
Name
Type
Description
GoldenTrajectory
interface
Golden reference trajectory with versioning and quality markers
Name
Type
Description
RegressionGate
interface
Gate definition with threshold, baseline-comparison, or distribution types
GateResult
interface
Single gate evaluation result with pass/fail and actual vs expected values
Name
Type
Description
EvalSuiteConfig
interface
Suite configuration with metrics, judge model, budgets, gates, and parallelism
EvalRunStatus
interface
Suite run progress with status, completion counts, and timing
RunComparison
interface
Comparison of two evaluation runs with metric diffs and significance testing
MetricRegression
interface
Single regression with baseline and candidate values and change percentage
Name
Type
Description
ToolCallSchema
ZodObject
Validates tool invocation structure
CostDataSchema
ZodObject
Validates token counts and cost data
TurnSchema
ZodObject
Validates turn structure with optional tool calls, latency, and golden markers
TrajectoryMetadataSchema
ZodObject
Validates trajectory metadata
TrajectorySchema
ZodObject
Validates complete trajectory (minimum one turn, optional metadata)
EvalIssueSchema
ZodObject
Validates evaluation issue records
EvalResultSchema
ZodObject
Validates evaluation results with metrics and issues
JudgeScoreSchema
ZodObject
Validates judge scoring output
CostBreakdownSchema
ZodObject
Validates cost breakdowns with per-turn cost arrays
LatencyBudgetSchema
ZodObject
Validates latency budget configuration
LatencyViolationSchema
ZodObject
Validates latency SLA violations
LatencyResultSchema
ZodObject
Validates latency measurement results
QualityMarkersSchema
ZodObject
Validates golden trajectory quality markers
GoldenTrajectorySchema
ZodObject
Validates golden trajectories with nested trajectory and quality markers
RegressionGateSchema
ZodObject
Validates regression gate definitions
GateResultSchema
ZodObject
Validates gate evaluation results
EvalSuiteConfigSchema
ZodObject
Validates suite configuration with nested latency budget and gates
EvalRunStatusSchema
ZodObject
Validates suite run status
MetricRegressionSchema
ZodObject
Validates metric regression records
RunComparisonSchema
ZodObject
Validates run comparison results with statistical significance arrays
Package
Description
@reaatech/agent-eval-harness-types
Shared domain types and Zod schemas
@reaatech/agent-eval-harness-trajectory
Trajectory loading, evaluation, and golden comparison
@reaatech/agent-eval-harness-tool-use
Tool-use validation and schema compliance
@reaatech/agent-eval-harness-cost
Cost tracking, budgets, and reporting
@reaatech/agent-eval-harness-latency
Latency monitoring, SLA enforcement, and optimization
@reaatech/agent-eval-harness-judge
LLM-as-judge with calibration and consensus
@reaatech/agent-eval-harness-golden
Golden trajectory management and curation
@reaatech/agent-eval-harness-suite
Suite runner, results aggregation, and comparison
@reaatech/agent-eval-harness-gate
CI regression gates with JUnit and GitHub output
@reaatech/agent-eval-harness-mcp-server
MCP server with three-layer tool architecture
@reaatech/agent-eval-harness-cli
Command-line interface
@reaatech/agent-eval-harness-observability
OTel tracing, metrics, structured logging, and dashboards
MIT