Resource Monitoring
Guide to monitoring and controlling agent resource consumption with the Resource Exhaustion Service.
The Resource Exhaustion Service prevents runaway agents by:
- Tracking resource usage - Files, API calls, tokens, subtasks
- Progressive intervention - Warning → Pause → Terminate
- Deliverable checkpoints - Require periodic progress markers
- Automatic enforcement - Configurable thresholds with auto-pause
| Metric | Description |
|---|---|
| `filesRead` | Number of files read |
| `filesWritten` | Number of files created |
| `filesModified` | Number of files modified |
| `apiCallsCount` | Total API calls made |
| `subtasksSpawned` | Number of subtasks created |
| `tokensConsumed` | Total tokens used |
| `timeWithoutDeliverable` | Duration since last deliverable |
Agents progress through phases based on resource consumption:
stateDiagram-v2
[*] --> Normal: Agent started
Normal --> Warning: Approaching threshold
Warning --> Normal: Deliverable recorded
Warning --> Intervention: Threshold exceeded
Intervention --> Warning: Agent resumed
Intervention --> Termination: No response
Termination --> [*]
| Phase | Description | Actions |
|---|---|---|
| Normal | Operating within limits | No action |
| Warning | Approaching limits (default 80%) | Log warning, notify |
| Intervention | Exceeded limits | Pause agent, require approval |
| Termination | Unrecoverable | Force stop agent |
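The first three phases in the table can be read as a comparison of usage against configured limits. The following is a minimal sketch of that reading, not the service's actual implementation: the `Usage` shape, the `classifyPhase` helper, and the "worst ratio wins" rule are assumptions made for illustration. Termination is left out because, per the diagram, it depends on whether a paused agent responds rather than on a ratio.

```typescript
// Hypothetical types for this sketch; the service's real types may differ.
type Phase = 'normal' | 'warning' | 'intervention';

interface Usage {
  filesAccessed: number;
  apiCalls: number;
  subtasksSpawned: number;
  tokensConsumed: number;
  timeWithoutDeliverableMs: number;
}

// Same keys as Usage, interpreted as maximums.
type Limits = Usage;

// Classify an agent by its worst usage-to-limit ratio.
// Assumption: reaching a limit exactly already counts as exceeding it.
function classifyPhase(usage: Usage, limits: Limits, warningThresholdPercent = 0.8): Phase {
  const keys = Object.keys(limits) as (keyof Usage)[];
  const worstRatio = Math.max(...keys.map((k) => usage[k] / limits[k]));
  if (worstRatio >= 1) return 'intervention';
  if (worstRatio >= warningThresholdPercent) return 'warning';
  return 'normal';
}

const limits: Limits = {
  filesAccessed: 100,
  apiCalls: 50,
  subtasksSpawned: 20,
  tokensConsumed: 100000,
  timeWithoutDeliverableMs: 300000,
};

const usage: Usage = {
  filesAccessed: 26,
  apiCalls: 12,
  subtasksSpawned: 2,
  tokensConsumed: 45000,
  timeWithoutDeliverableMs: 60000,
};

console.log(classifyPhase(usage, limits)); // 'normal' (worst ratio: tokens at 0.45)
console.log(classifyPhase({ ...usage, apiCalls: 42 }, limits)); // 'warning' (42 / 50 = 0.84)
```

Taking the maximum ratio means a single exhausted resource is enough to move the agent out of the normal phase, which matches the intent of catching runaway behaviour early.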
interface ResourceExhaustionConfig {
  enabled: boolean;
  warningThresholdPercent: number;     // Default: 0.8 (80%)
  checkIntervalMs: number;             // Default: 60000 (1 minute)
  pauseOnIntervention: boolean;        // Default: true
  autoTerminate: boolean;              // Default: false
  thresholds: ResourceThresholds;
}

interface ResourceThresholds {
  maxFilesAccessed: number;            // Default: 100
  maxApiCalls: number;                 // Default: 50
  maxSubtasksSpawned: number;          // Default: 20
  maxTokensConsumed: number;           // Default: 100000
  maxTimeWithoutDeliverableMs: number; // Default: 300000 (5 min)
}

{
  "resourceExhaustion": {
    "enabled": true,
    "warningThresholdPercent": 0.8,
    "checkIntervalMs": 60000,
    "pauseOnIntervention": true,
    "autoTerminate": false,
    "thresholds": {
      "maxFilesAccessed": 100,
      "maxApiCalls": 50,
      "maxSubtasksSpawned": 20,
      "maxTokensConsumed": 100000,
      "maxTimeWithoutDeliverableMs": 300000
    }
  }
}

Deliverables are progress markers that indicate an agent is making meaningful progress, not just consuming resources.
| Type | Description |
|---|---|
| `code_commit` | Code committed to repository |
| `test_passed` | Tests passing |
| `review_complete` | Code review finished |
| `documentation` | Documentation produced |
| `analysis_report` | Analysis or report generated |
| `deployment` | Deployment completed |
| `other` | Custom deliverable type |
Agents (or orchestrators) should record deliverables periodically:
import { getResourceExhaustionService } from '@blackms/aistack';

const resourceService = getResourceExhaustionService(store, config);

// Record a deliverable
const checkpoint = resourceService.recordDeliverable(
  agentId,
  'code_commit',
  'Implemented user authentication module',
  ['src/auth/login.ts', 'src/auth/jwt.ts']
);

Recording a deliverable:
- Creates a checkpoint in the database
- Updates `lastDeliverableAt` timestamp
- Resets agent from `warning` to `normal` phase
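To make those three effects concrete, here is a small in-memory sketch. It is not the library's code: `InMemoryTracker`, the `Checkpoint` field names (`artifacts`, `createdAt`), and the array standing in for the database are all hypothetical. It also assumes that only the warning phase is reset, so a paused agent still needs an explicit resume.

```typescript
type DeliverableType =
  | 'code_commit' | 'test_passed' | 'review_complete' | 'documentation'
  | 'analysis_report' | 'deployment' | 'other';
type Phase = 'normal' | 'warning' | 'intervention' | 'termination';

// Hypothetical checkpoint shape; the real record may carry other fields.
interface Checkpoint {
  agentId: string;
  type: DeliverableType;
  description: string;
  artifacts: string[];
  createdAt: Date;
}

interface AgentState {
  phase: Phase;
  lastDeliverableAt: Date | null;
}

class InMemoryTracker {
  // Stands in for the database table of checkpoints.
  readonly checkpoints: Checkpoint[] = [];
  private readonly agents = new Map<string, AgentState>();

  initializeAgent(agentId: string, phase: Phase = 'normal'): AgentState {
    const state: AgentState = { phase, lastDeliverableAt: null };
    this.agents.set(agentId, state);
    return state;
  }

  recordDeliverable(
    agentId: string,
    type: DeliverableType,
    description: string,
    artifacts: string[] = [],
    now: Date = new Date(),
  ): Checkpoint {
    const state = this.agents.get(agentId);
    if (!state) throw new Error(`Agent ${agentId} is not being tracked`);

    const checkpoint: Checkpoint = { agentId, type, description, artifacts, createdAt: now };
    this.checkpoints.push(checkpoint);                      // 1. create a checkpoint
    state.lastDeliverableAt = now;                          // 2. update the timestamp
    if (state.phase === 'warning') state.phase = 'normal';  // 3. reset warning to normal
    return checkpoint;
  }
}

const tracker = new InMemoryTracker();
const agentState = tracker.initializeAgent('agent-1', 'warning');
tracker.recordDeliverable('agent-1', 'code_commit', 'Implemented user authentication module', [
  'src/auth/login.ts',
]);
console.log(agentState.phase); // 'normal'
```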
import { getResourceExhaustionService } from '@blackms/aistack';

const resourceService = getResourceExhaustionService(store, config);

// Start tracking a new agent
const metrics = resourceService.initializeAgent(agentId, 'coder');

// Record file operations
resourceService.recordFileOperation(agentId, 'read');
resourceService.recordFileOperation(agentId, 'write');
resourceService.recordFileOperation(agentId, 'modify');

// Record API calls
resourceService.recordApiCall(agentId, 1500); // with token count

// Record subtask spawning
resourceService.recordSubtaskSpawn(agentId);

// Get current metrics
const metrics = resourceService.getAgentMetrics(agentId);
console.log(metrics);
// {
//   agentId: 'uuid',
//   filesRead: 15,
//   filesWritten: 3,
//   filesModified: 8,
//   apiCallsCount: 12,
//   subtasksSpawned: 2,
//   tokensConsumed: 45000,
//   phase: 'normal',
//   lastDeliverableAt: Date,
//   ...
// }

// Evaluate current phase
const phase = resourceService.evaluateAgent(agentId);
// Returns: 'normal' | 'warning' | 'intervention' | 'termination'

// Pause an agent
await resourceService.pauseAgent(agentId, 'Manual review required');

// Check if paused
const isPaused = resourceService.isAgentPaused(agentId);

// Resume agent
resourceService.resumeAgent(agentId);

// Terminate agent
resourceService.terminateAgent(agentId, 'Exceeded all limits');

const summary = resourceService.getResourceMetrics(new Date('2026-01-01'));
// {
//   totalAgentsTracked: 5,
//   agentsByPhase: { normal: 3, warning: 1, intervention: 1, termination: 0 },
//   pausedAgents: 1,
//   totalWarnings: 15,
//   totalInterventions: 3,
//   totalTerminations: 0,
//   recentEvents: [...]
// }

The Resource Exhaustion Service integrates with `system_health`:
{
  "status": "healthy",
  "checks": {
    "database": true,
    "vectorSearch": true,
    "github": true,
    "resourceExhaustion": {
      "enabled": true,
      "agentsTracked": 5,
      "agentsByPhase": {
        "normal": 3,
        "warning": 1,
        "intervention": 1
      },
      "pausedAgents": 1
    }
  }
}

When Prometheus metrics are enabled, the service exposes:
| Metric | Type | Description |
|---|---|---|
| `agent_files_accessed` | Histogram | Files accessed per agent |
| `agent_api_calls` | Histogram | API calls per agent |
| `agent_tokens_consumed` | Histogram | Tokens consumed per agent |
| `agents_paused_current` | Gauge | Currently paused agents |
| `resource_exhaustion_warnings_total` | Counter | Total warnings issued |
| `resource_exhaustion_interventions_total` | Counter | Total interventions |
| `resource_exhaustion_terminations_total` | Counter | Total terminations |
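For orientation, the sketch below renders the gauge and counters from the table in the Prometheus text exposition format, which is roughly what a scrape of these metrics might return. It is illustrative only: the `Sample` type and `toExposition` helper are hypothetical, the values are borrowed from the summary example in this guide, and the service may attach labels or prefixes that are not shown here. The histograms are omitted because their bucket layout is not documented.

```typescript
// Hypothetical sample shape for this sketch.
interface Sample {
  name: string;
  type: 'counter' | 'gauge';
  help: string;
  value: number;
}

// Render samples as HELP / TYPE / value triples, one metric after another.
function toExposition(samples: Sample[]): string {
  return (
    samples
      .map((s) =>
        [`# HELP ${s.name} ${s.help}`, `# TYPE ${s.name} ${s.type}`, `${s.name} ${s.value}`].join('\n'),
      )
      .join('\n') + '\n'
  );
}

const samples: Sample[] = [
  { name: 'agents_paused_current', type: 'gauge', help: 'Currently paused agents', value: 1 },
  { name: 'resource_exhaustion_warnings_total', type: 'counter', help: 'Total warnings issued', value: 15 },
  { name: 'resource_exhaustion_interventions_total', type: 'counter', help: 'Total interventions', value: 3 },
  { name: 'resource_exhaustion_terminations_total', type: 'counter', help: 'Total terminations', value: 0 },
];

console.log(toExposition(samples));
```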
// For exploratory/research agents - higher limits
{
  "thresholds": {
    "maxFilesAccessed": 500,
    "maxApiCalls": 100,
    "maxTimeWithoutDeliverableMs": 600000 // 10 minutes
  }
}

// For production/deployment agents - stricter limits
{
  "thresholds": {
    "maxFilesAccessed": 50,
    "maxApiCalls": 20,
    "maxSubtasksSpawned": 5,
    "maxTimeWithoutDeliverableMs": 180000 // 3 minutes
  }
}

// After completing meaningful work, record a deliverable
if (testsPassed) {
  resourceService.recordDeliverable(
    agentId,
    'test_passed',
    `All ${testCount} tests passing`
  );
}
// This resets the "time without deliverable" timer
// and transitions warning → normal

const metrics = resourceService.getAgentMetrics(agentId);
if (metrics.phase === 'warning') {
  // Agent approaching limits
  // Consider: completing current task, recording deliverable, or pausing
  console.warn(`Agent ${agentId} in warning phase`);
}

// Check if agent is paused before assigning work
if (resourceService.isAgentPaused(agentId)) {
  // Either wait for resume or use different agent
  const resumed = await resourceService.waitForResume(agentId);
  if (!resumed) {
    // Agent was terminated, handle accordingly
  }
}

Problem: Agent keeps hitting warning threshold
Solutions:
- Record deliverables more frequently
- Increase threshold limits
- Break task into smaller subtasks
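If limits only need raising for some kinds of agent, one option is to layer partial overrides on top of the defaults before building the config. The sketch below uses the default and preset values quoted earlier in this guide. The `thresholdsFor` helper and the mapping of presets to the `researcher` and `devops` agent types are assumptions for illustration; whether the service accepts per-agent-type thresholds directly is not covered here.

```typescript
interface ResourceThresholds {
  maxFilesAccessed: number;
  maxApiCalls: number;
  maxSubtasksSpawned: number;
  maxTokensConsumed: number;
  maxTimeWithoutDeliverableMs: number;
}

// Defaults as documented in the configuration reference.
const DEFAULT_THRESHOLDS: ResourceThresholds = {
  maxFilesAccessed: 100,
  maxApiCalls: 50,
  maxSubtasksSpawned: 20,
  maxTokensConsumed: 100000,
  maxTimeWithoutDeliverableMs: 300000,
};

// Hypothetical mapping from agent type to partial overrides.
const OVERRIDES: Record<string, Partial<ResourceThresholds>> = {
  researcher: { maxFilesAccessed: 500, maxApiCalls: 100, maxTimeWithoutDeliverableMs: 600000 },
  devops: {
    maxFilesAccessed: 50,
    maxApiCalls: 20,
    maxSubtasksSpawned: 5,
    maxTimeWithoutDeliverableMs: 180000,
  },
};

// Later spreads win, so overrides replace only the keys they name.
function thresholdsFor(agentType: string): ResourceThresholds {
  return { ...DEFAULT_THRESHOLDS, ...(OVERRIDES[agentType] ?? {}) };
}

console.log(thresholdsFor('researcher').maxFilesAccessed); // 500
console.log(thresholdsFor('researcher').maxSubtasksSpawned); // 20 (inherited default)
console.log(thresholdsFor('coder').maxApiCalls); // 50 (no override defined)
```

Keeping the overrides partial means a change to a default propagates to every agent type that does not explicitly override that key.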
Problem: Agents getting paused too often
Solutions:
- Increase `warningThresholdPercent` (e.g., 0.9)
- Increase absolute thresholds
- Reduce `checkIntervalMs` for more gradual detection
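To see what changing `warningThresholdPercent` does in practice, the arithmetic can be worked through for the default limits. This sketch assumes the warning level is simply the limit multiplied by the percentage, which is the plain reading of the config but is not confirmed against the implementation; `warningLevel` is a hypothetical helper.

```typescript
// Default limits quoted from the configuration in this guide.
const defaultLimits = {
  maxFilesAccessed: 100,
  maxApiCalls: 50,
  maxSubtasksSpawned: 20,
  maxTokensConsumed: 100000,
  maxTimeWithoutDeliverableMs: 300000,
};

// Assumption: warning level = limit * warningThresholdPercent, rounded to whole units.
function warningLevel(limit: number, warningThresholdPercent: number): number {
  return Math.round(limit * warningThresholdPercent);
}

for (const [name, limit] of Object.entries(defaultLimits)) {
  const at80 = warningLevel(limit, 0.8);
  const at90 = warningLevel(limit, 0.9);
  console.log(`${name}: warns at ${at80} (0.8) or ${at90} (0.9), limit ${limit}`);
}
```

With the defaults, moving from 0.8 to 0.9 shifts the API call warning from 40 to 45 calls and the deliverable timer warning from 4 minutes to 4.5 minutes, while the hard limits stay where they are.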
Problem: Metrics not being recorded
Solutions:
- Ensure `enabled: true` in config
- Call `initializeAgent()` when agent starts
- Verify service is started with `start()`