This document defines the Decision Layer of ratelord.
While the Prediction Engine observes the world and forecasts risk, the Policy Engine decides what to do about it. It acts as the "judge," taking raw forecasts and agent intents as input, and issuing binding verdicts (intent_decided) as output.
Traditional rate limiters are arithmetic: if count > limit then block.
ratelord is probabilistic: if P(exhaustion) > risk_tolerance then shape.
Policies govern future risk, not past consumption. A 90% utilized pool might be perfectly safe if the reset is in 1 second. A 10% utilized pool might be critical if the burn rate is massive and reset is in 1 hour.
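The two pools described above can be compared with a few lines of arithmetic. The sketch below is not ratelord's predictor: it assumes a constant burn rate and defines the margin as time-to-exhaustion minus time-to-reset. The function name, pool sizes, and burn rates are all illustrative assumptions.

```python
def margin_seconds(remaining: float, burn_rate: float, seconds_to_reset: float) -> float:
    """Seconds of slack between projected exhaustion and the next reset.

    Positive: the pool survives until the reset. Negative: it runs dry first.
    Assumes a constant burn rate (units per second); a real forecast would not.
    """
    if burn_rate <= 0:
        return float("inf")  # nothing is being consumed, so the pool cannot exhaust
    time_to_exhaustion = remaining / burn_rate
    return time_to_exhaustion - seconds_to_reset


# Pool A: 90% utilized (500 of 5000 units left), 2 units/s, reset in 1 second.
pool_a = margin_seconds(remaining=500, burn_rate=2, seconds_to_reset=1)

# Pool B: 10% utilized (4500 of 5000 units left), 50 units/s, reset in 1 hour.
pool_b = margin_seconds(remaining=4500, burn_rate=50, seconds_to_reset=3600)

print(pool_a)  # 249.0   -> safe despite high utilization
print(pool_b)  # -3510.0 -> at risk despite low utilization
```

A purely arithmetic limiter would rank these two pools the other way round, which is the point of the comparison.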
Policies exist in a strict hierarchy. A lower-level policy cannot permit what a higher-level policy forbids.
- Global Safety (System-wide Hard Rules)
- Organization / Scope (Business Rules)
- Pool / Resource (Provider constraints)
- Identity / Agent (Local optimization)
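One way to picture the hierarchy rule is as a "most restrictive verdict wins" reduction across levels. The sketch below is a hypothetical illustration: the `Level` enum, the `SEVERITY` ranking, and `resolve` are assumptions, and `SWITCH` is left out because its position relative to the other verdicts is not specified here.

```python
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    # Lower value = higher authority, following the order of the list above.
    GLOBAL = 0
    ORG = 1
    POOL = 2
    IDENTITY = 3


# Assumed restrictiveness ranking (higher = stricter).
SEVERITY = {"approve": 0, "shape": 1, "defer": 2, "deny": 3}


def resolve(verdicts: Dict[Level, str]) -> str:
    """Combine per-level verdicts so that no level can relax a stricter one.

    Because a lower level can only tighten the outcome, the result is simply
    the strictest verdict issued at any level.
    """
    return max(verdicts.values(), key=SEVERITY.__getitem__, default="approve")


# An identity-level "approve" cannot override a global "deny".
print(resolve({Level.GLOBAL: "deny", Level.IDENTITY: "approve"}))  # deny
# A lower level may still tighten what a higher level allowed.
print(resolve({Level.GLOBAL: "approve", Level.IDENTITY: "shape"}))  # shape
```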
The goal of ratelord is to maximize successful throughput, not to block traffic.
Policies prefer Shaping (delay, defer, degrade) over Denial whenever possible. A "Soft Limit" is a signal to slow down, not a wall.
A Policy is a named collection of Rules bound to a specific Target (Scope, Pool, or Identity).
A Rule consists of:
- Condition: A logic predicate based on Forecasts, Metadata, or Time.
- Action: The decision to apply if the condition matches.
- Priority: Precedence order (critical vs. optimization).
Conditions are evaluated against fields from the `forecast_computed` and `intent_submitted` events.
- Risk Metrics: `risk.p_exhaustion`, `tte.p99`, `margin.seconds`.
- Identity Metadata: `agent.role` (prod/ci/dev), `agent.priority`.
- Pool State: `pool.remaining_percent`, `pool.is_resetting`.
- Time: `time.is_business_hours`, `time.seconds_to_reset`.
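The Policy / Rule / Condition structure described above could be modeled roughly as follows. This is a minimal sketch, not ratelord's actual types: the class layout, the flat `Context` dictionary keyed by the field names listed above, and the "higher priority number is evaluated first" ordering are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Assumed: a flattened view of forecast_computed + intent_submitted fields.
Context = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Context], bool]  # predicate over forecasts, metadata, or time
    action: str                           # decision to apply if the condition matches
    priority: int = 0                     # assumed: larger number = evaluated earlier


@dataclass
class Policy:
    name: str
    target: str  # scope, pool, or identity the policy is bound to
    rules: List[Rule] = field(default_factory=list)

    def first_match(self, ctx: Context) -> Optional[Rule]:
        """Return the highest-priority rule whose condition holds, if any."""
        for rule in sorted(self.rules, key=lambda r: r.priority, reverse=True):
            if rule.condition(ctx):
                return rule
        return None


dev_policy = Policy(
    name="dev-throttling",
    target="env:dev",
    rules=[
        Rule(
            name="slow-down-devs",
            condition=lambda c: c["agent.role"] == "dev" and c["risk.p_exhaustion"] > 0.5,
            action="shape",
            priority=50,
        )
    ],
)

ctx = {"agent.role": "dev", "risk.p_exhaustion": 0.7, "pool.remaining_percent": 35.0}
match = dev_policy.first_match(ctx)
print(match.action if match else "approve")  # shape
```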
The engine outputs one of the following verdicts:
| Action | Description | Behavior |
|---|---|---|
| APPROVE | Risk is acceptable. | Agent proceeds immediately. |
| SHAPE (Throttle) | Risk is elevated. | Agent must wait `wait_seconds` before proceeding. |
| DEFER | Risk is high, but transient. | Agent must wait until after the next reset. |
| DENY | Risk is critical or rule violated. | Agent must abort the action completely. |
| SWITCH | Pool is exhausted/risky. | Agent must retry using a different `identity_id` (fallback). |
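On the agent side, the table above translates into a small dispatch on the verdict. The sketch below is hypothetical: the decision fields (`action`, `identity_id`, `wait_seconds`, `seconds_to_reset`, `fallback_identity_id`) are assumed names, and it assumes the daemon supplies the fallback identity, which this document does not settle.

```python
import time
from typing import Callable, Optional


def handle_verdict(
    decision: dict,
    run: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Apply a verdict on the agent side.

    `run` performs the intended action with a given identity and returns its
    result. Returns None when the action is aborted (DENY).
    """
    action = decision["action"]
    identity = decision["identity_id"]

    if action == "approve":
        return run(identity)                       # proceed immediately
    if action == "shape":
        sleep(decision["wait_seconds"])            # throttle, then proceed
        return run(identity)
    if action == "defer":
        sleep(decision["seconds_to_reset"])        # wait out the current window
        return run(identity)
    if action == "switch":
        return run(decision["fallback_identity_id"])  # retry on the fallback identity
    if action == "deny":
        return None                                # abort completely
    raise ValueError(f"unknown action: {action!r}")


# Usage with a fake sleep so the example finishes instantly.
slept = []
result = handle_verdict(
    {"action": "shape", "identity_id": "key-a", "wait_seconds": 2.5},
    run=lambda identity: f"ran as {identity}",
    sleep=slept.append,
)
print(result, slept)  # ran as key-a [2.5]
```

Injecting `sleep` keeps the function testable without real delays.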
When an agent submits an intent_submitted event, the Daemon triggers the Arbitration Cycle.
- Intent: Who, What, Where (Scope), How much (Cost).
- Forecast: The latest `forecast_computed` for the relevant pool(s).
- Policies: All active policies matching the Intent's scope hierarchy.
The engine evaluates rules from Top to Bottom (Global -> Local).
- Check Hard Limits (Global/Pool):
  - Condition: `risk.p_exhaustion > 0.99` OR `pool.remaining == 0`.
  - Action: `DENY` or `DEFER`.
  - Result: If matched, stop and return.
- Check Business Rules (Org/Scope):
  - Condition: `agent.role == 'dev' AND risk.p_exhaustion > 0.5`.
  - Action: `SHAPE` (add delay) or `DENY`.
  - Result: If matched, apply modifier.
- Check Fairness/Optimization (Local/Identity):
  - Condition: `identity.burn_rate > target_share`.
  - Action: `SHAPE` (smooth out the spikes).
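The three evaluation stages above can be sketched as a single function. This is an illustration under stated assumptions rather than the engine's implementation: the delay constants, the `intent.waitable` flag used to choose between `DENY` and `DEFER`, and the reason strings are hypothetical, and only the `SHAPE` branch of the business rule is shown.

```python
from typing import Optional, Tuple

# Illustrative delays; how the wait is actually computed is not specified here.
BUSINESS_DELAY_S = 2.0
FAIRNESS_DELAY_S = 1.0


def arbitrate(ctx: dict) -> Tuple[str, float, Optional[str]]:
    """Return (action, wait_seconds, reason) for one intent."""
    # 1. Hard limits (Global/Pool): a match ends the cycle immediately.
    if ctx["risk.p_exhaustion"] > 0.99 or ctx["pool.remaining"] == 0:
        action = "defer" if ctx.get("intent.waitable", False) else "deny"
        return action, 0.0, "policy:hard_limit"

    wait, reason = 0.0, None

    # 2. Business rules (Org/Scope): a match adds a modifier; evaluation continues.
    if ctx["agent.role"] == "dev" and ctx["risk.p_exhaustion"] > 0.5:
        wait += BUSINESS_DELAY_S
        reason = "policy:dev_throttling"

    # 3. Fairness (Local/Identity): smooth out identities above their share.
    if ctx["identity.burn_rate"] > ctx["target_share"]:
        wait += FAIRNESS_DELAY_S
        reason = reason or "policy:fairness"

    if wait > 0:
        return "shape", wait, reason
    return "approve", 0.0, None


ctx = {
    "risk.p_exhaustion": 0.6,
    "pool.remaining": 120,
    "agent.role": "dev",
    "identity.burn_rate": 9.0,
    "target_share": 5.0,
}
print(arbitrate(ctx))  # ('shape', 3.0, 'policy:dev_throttling')
```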
The cycle emits an intent_decided event:
```json
{
  "event_type": "intent_decided",
  "intent_id": "uuid",
  "decision": "approve_with_modifications",
  "modifications": {
    "throttle_wait_seconds": 2.5,
    "cost_adjustment": 0
  },
  "reason": "policy:soft_limit_buffer",
  "risk_score": 0.45
}
```

Safety constraints that prevent catastrophic exhaustion.
- Trigger: `margin.seconds < 0` (We will die before reset).
- Action: `DENY` (if urgent) or `DEFER` (if waitable).
- Target: All agents, regardless of priority.
Optimization constraints to preserve buffer.
- Trigger: `risk.p_exhaustion > 0.2` (20% chance of failure).
- Action: `SHAPE` (Linear backoff).
- Goal: Slow down consumption just enough to land the plane safely at the reset time.
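A linear backoff for the soft limit could look like the sketch below. The exact formula is not specified in this document, so this is only one plausible reading: the delay grows linearly from zero at the soft threshold (0.2) to an assumed `max_wait` at the hard threshold (0.99). The function name and the `max_wait` default are hypothetical.

```python
def shape_wait_seconds(
    p_exhaustion: float,
    soft: float = 0.2,      # soft-limit trigger from the rule above
    hard: float = 0.99,     # hard-limit threshold, where the delay is capped
    max_wait: float = 30.0, # assumed ceiling for the delay, in seconds
) -> float:
    """Linearly scale the delay with how far the risk exceeds the soft limit."""
    if p_exhaustion <= soft:
        return 0.0  # below the soft limit: no shaping
    fraction = min((p_exhaustion - soft) / (hard - soft), 1.0)
    return fraction * max_wait


for p in (0.1, 0.2, 0.6, 0.99):
    print(p, shape_wait_seconds(p))
```

Capping the fraction at 1.0 keeps the delay bounded even if the forecast reports a risk above the hard threshold.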
Prevents one agent from monopolizing a shared pool.
- Trigger: `identity.burn_rate_share > 0.5` (Using >50% of pool capacity).
- Action: `SHAPE` (Throttle specific identity).
- Note: Applied per identity, preserving the pool for others.
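The fairness trigger can be sketched as a per-identity share calculation. The code below interprets `burn_rate_share` as an identity's fraction of the pool's total observed burn rate, which is one possible reading and an assumption; the function names and sample identities are illustrative.

```python
from typing import Dict, List


def burn_rate_shares(burn_by_identity: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the pool's total burn rate attributable to each identity."""
    total = sum(burn_by_identity.values())
    if total <= 0:
        return {identity: 0.0 for identity in burn_by_identity}
    return {identity: rate / total for identity, rate in burn_by_identity.items()}


def identities_to_shape(burn_by_identity: Dict[str, float], max_share: float = 0.5) -> List[str]:
    """Identities whose share exceeds the fairness threshold; only these are throttled."""
    shares = burn_rate_shares(burn_by_identity)
    return [identity for identity, share in shares.items() if share > max_share]


observed = {"ci-bot": 8.0, "web": 1.0, "cron": 1.0}  # units per second
print(identities_to_shape(observed))  # ['ci-bot']
```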
Differentiates between critical and non-critical workloads.
- Rule A: `if role == 'ci' AND risk.level == 'high' THEN DENY`.
- Rule B: `if role == 'prod' AND risk.level == 'high' THEN APPROVE`.
- Result: Production traffic eats into the safety margin; CI traffic stops to save it.
The Policy Engine must be robust against system failures (e.g., disconnected daemon, stale forecasts).
If time.now - forecast.as_of > stale_threshold:
- Policy: "Uncertainty Principle"
- Action: Widen safety margins. `risk.p_exhaustion` is treated as 100% for non-critical traffic.
- Result: Fail-safe defaults (throttle/deny) rather than fail-open (allow and crash).
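The stale-forecast rule above amounts to overriding the risk value before the normal rules run. The sketch below assumes timestamps are epoch seconds and that critical traffic keeps the last known forecast value; this document only specifies the behavior for non-critical traffic, and the function and parameter names are hypothetical.

```python
def effective_p_exhaustion(
    p_exhaustion: float,
    forecast_as_of: float,    # epoch seconds when the forecast was computed
    now: float,               # current time in epoch seconds
    stale_threshold_s: float, # maximum acceptable forecast age
    critical: bool,
) -> float:
    """Risk value the policy rules should see, failing safe on stale data."""
    is_stale = (now - forecast_as_of) > stale_threshold_s
    if is_stale and not critical:
        return 1.0  # treat exhaustion as certain: downstream rules will throttle or deny
    return p_exhaustion  # fresh forecast, or critical traffic (assumed to keep the last value)


# A 2-minute-old forecast with a 60-second threshold is stale.
print(effective_p_exhaustion(0.1, forecast_as_of=0.0, now=120.0, stale_threshold_s=60.0, critical=False))  # 1.0
print(effective_p_exhaustion(0.1, forecast_as_of=0.0, now=120.0, stale_threshold_s=60.0, critical=True))   # 0.1
```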
If the provider is down or returning errors:
- Policy: "Emergency Stop"
- Action: `DENY` all non-essential intents. `SHAPE` essential intents with massive backoff.
Policies are defined in declarative files (e.g., policies.yaml).
```yaml
policies:
  - id: "global-safety-net"
    scope: "global"
    type: "hard"
    rules:
      - name: "prevent-exhaustion"
        condition: "risk.p99_exhaustion_before_reset == true"
        action: "defer"
        priority: 100
  - id: "dev-throttling"
    scope: "env:dev"
    type: "soft"
    rules:
      - name: "slow-down-devs"
        condition: "pool.utilization > 0.50"
        action: "shape"
        params:
          algorithm: "linear"
          factor: 2.0
        priority: 50
```

- Feedback Loops: If policies throttle agents, the observed burn rate drops. The Predictor might see this drop and signal "Risk is low", causing the Policy to relax, causing a spike.
  - Mitigation: The Predictor needs to know why burn dropped (Natural vs. Artificial).
- Switching Logic: When a `SWITCH` action is issued (use backup key), how does the agent discover the backup `identity_id`? Does the daemon provide it, or is it client-side config?
- Pre-emption: Can a high-priority intent revoke a previously approved (but long-running) lower-priority intent? (Likely out of scope for Phase 1).