Resource Monitoring

Guide to monitoring and controlling agent resource consumption with the Resource Exhaustion Service.

Overview

The Resource Exhaustion Service prevents runaway agents by:

Tracking resource usage - Files, API calls, tokens, subtasks
Progressive intervention - Warning → Pause → Terminate
Deliverable checkpoints - Require periodic progress markers
Automatic enforcement - Configurable thresholds with auto-pause

Resource Metrics Tracked

Metric	Description
`filesRead`	Number of files read
`filesWritten`	Number of files created
`filesModified`	Number of files modified
`apiCallsCount`	Total API calls made
`subtasksSpawned`	Number of subtasks created
`tokensConsumed`	Total tokens used
`timeWithoutDeliverable`	Duration since last deliverable

Phase Progression

Agents progress through phases based on resource consumption:

stateDiagram-v2
    [*] --> Normal: Agent started
    Normal --> Warning: Approaching threshold
    Warning --> Normal: Deliverable recorded
    Warning --> Intervention: Threshold exceeded
    Intervention --> Warning: Agent resumed
    Intervention --> Termination: No response
    Termination --> [*]

Phase Descriptions

Phase	Description	Actions
Normal	Operating within limits	No action
Warning	Approaching limits (default 80%)	Log warning, notify
Intervention	Exceeded limits	Pause agent, require approval
Termination	Unrecoverable	Force stop agent
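
The table above can be read as a function of how close usage is to its limits. The following is a minimal sketch, not the service's actual implementation. It assumes the phase is driven by the most-utilized resource and that Termination is only reached when an intervention goes unanswered. `classifyPhase`, its parameters, and the keys `apiCalls` and `tokens` are hypothetical names used for illustration.

```typescript
type Phase = 'normal' | 'warning' | 'intervention' | 'termination';

/**
 * Hypothetical helper: maps the highest usage/limit ratio to a phase.
 * `warningFraction` mirrors warningThresholdPercent (0.8 = 80%).
 */
function classifyPhase(
  usage: Record<string, number>,
  limits: Record<string, number>,
  warningFraction = 0.8,
  unresponsive = false, // assumption: true when a paused agent never resumed
): Phase {
  let highest = 0;
  for (const [key, limit] of Object.entries(limits)) {
    if (limit <= 0) continue; // ignore unset or invalid limits
    highest = Math.max(highest, (usage[key] ?? 0) / limit);
  }
  if (highest > 1) return unresponsive ? 'termination' : 'intervention';
  if (highest >= warningFraction) return 'warning';
  return 'normal';
}

const exampleLimits = { apiCalls: 50, tokens: 100000 };
const phaseQuiet = classifyPhase({ apiCalls: 10, tokens: 20000 }, exampleLimits);
const phaseNear = classifyPhase({ apiCalls: 40, tokens: 20000 }, exampleLimits);
const phaseOver = classifyPhase({ apiCalls: 51, tokens: 20000 }, exampleLimits);
```

With the default limits, 40 of 50 API calls sits exactly at the 80% mark and is classified as warning, while 51 calls exceeds the limit and is classified as intervention.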

Configuration

interface ResourceExhaustionConfig {
  enabled: boolean;
  warningThresholdPercent: number;  // Fraction of each limit (not 0-100). Default: 0.8 (80%)
  checkIntervalMs: number;          // Default: 60000 (1 minute)
  pauseOnIntervention: boolean;     // Default: true
  autoTerminate: boolean;           // Default: false
  thresholds: ResourceThresholds;
}

interface ResourceThresholds {
  maxFilesAccessed: number;           // Default: 100
  maxApiCalls: number;                // Default: 50
  maxSubtasksSpawned: number;         // Default: 20
  maxTokensConsumed: number;          // Default: 100000
  maxTimeWithoutDeliverableMs: number; // Default: 300000 (5 min)
}
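
Because every field has a documented default, callers usually only need to override a few values. The sketch below shows one way to merge partial overrides onto the defaults listed in the comments above. `DEFAULT_CONFIG`, `ConfigOverrides`, and `resolveConfig` are hypothetical names, the default for `enabled` is an assumption (the source does not state one), and the range check on `warningThresholdPercent` is an assumption based on the 0.8 default.

```typescript
interface ResourceThresholds {
  maxFilesAccessed: number;
  maxApiCalls: number;
  maxSubtasksSpawned: number;
  maxTokensConsumed: number;
  maxTimeWithoutDeliverableMs: number;
}

interface ResourceExhaustionConfig {
  enabled: boolean;
  warningThresholdPercent: number;
  checkIntervalMs: number;
  pauseOnIntervention: boolean;
  autoTerminate: boolean;
  thresholds: ResourceThresholds;
}

// Defaults copied from the interface comments; `enabled: true` is assumed.
const DEFAULT_CONFIG: ResourceExhaustionConfig = {
  enabled: true,
  warningThresholdPercent: 0.8,
  checkIntervalMs: 60000,
  pauseOnIntervention: true,
  autoTerminate: false,
  thresholds: {
    maxFilesAccessed: 100,
    maxApiCalls: 50,
    maxSubtasksSpawned: 20,
    maxTokensConsumed: 100000,
    maxTimeWithoutDeliverableMs: 300000,
  },
};

type ConfigOverrides = Partial<Omit<ResourceExhaustionConfig, 'thresholds'>> & {
  thresholds?: Partial<ResourceThresholds>;
};

// Merges overrides onto the defaults, including the nested thresholds object.
function resolveConfig(overrides: ConfigOverrides = {}): ResourceExhaustionConfig {
  const merged: ResourceExhaustionConfig = {
    ...DEFAULT_CONFIG,
    ...overrides,
    thresholds: { ...DEFAULT_CONFIG.thresholds, ...overrides.thresholds },
  };
  const fraction = merged.warningThresholdPercent;
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError('warningThresholdPercent must be a fraction in (0, 1]');
  }
  return merged;
}

// Only the overridden values change; everything else keeps its default.
const strictConfig = resolveConfig({ thresholds: { maxApiCalls: 20 } });
```

Merging `thresholds` separately matters: a plain top-level spread would replace the whole nested object and silently drop the limits that were not overridden.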

Example Configuration

{
  "resourceExhaustion": {
    "enabled": true,
    "warningThresholdPercent": 0.8,
    "checkIntervalMs": 60000,
    "pauseOnIntervention": true,
    "autoTerminate": false,
    "thresholds": {
      "maxFilesAccessed": 100,
      "maxApiCalls": 50,
      "maxSubtasksSpawned": 20,
      "maxTokensConsumed": 100000,
      "maxTimeWithoutDeliverableMs": 300000
    }
  }
}

Deliverable Checkpoints

Deliverables are checkpoints showing that an agent is producing meaningful output, not just consuming resources.

Deliverable Types

Type	Description
`code_commit`	Code committed to repository
`test_passed`	Tests passing
`review_complete`	Code review finished
`documentation`	Documentation produced
`analysis_report`	Analysis or report generated
`deployment`	Deployment completed
`other`	Custom deliverable type

Recording Deliverables

Agents (or orchestrators) should record deliverables periodically:

import { getResourceExhaustionService } from '@blackms/aistack';

const resourceService = getResourceExhaustionService(store, config);

// Record a deliverable
const checkpoint = resourceService.recordDeliverable(
  agentId,
  'code_commit',
  'Implemented user authentication module',
  ['src/auth/login.ts', 'src/auth/jwt.ts']
);

Recording a deliverable:

Creates a checkpoint in the database
Updates lastDeliverableAt timestamp
Resets agent from warning to normal phase
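
The three effects listed above can be modelled in a few lines. This is an illustrative in-memory sketch only; the real service persists checkpoints to its database and its internals are not shown in this guide. `InMemoryDeliverableLog`, `Checkpoint`, `AgentState`, and the `artifacts` parameter name are hypothetical.

```typescript
type DeliverableType =
  | 'code_commit' | 'test_passed' | 'review_complete'
  | 'documentation' | 'analysis_report' | 'deployment' | 'other';

type AgentPhase = 'normal' | 'warning' | 'intervention' | 'termination';

interface Checkpoint {
  agentId: string;
  type: DeliverableType;
  description: string;
  artifacts: string[];
  createdAt: Date;
}

interface AgentState {
  phase: AgentPhase;
  lastDeliverableAt: Date | null;
}

class InMemoryDeliverableLog {
  readonly checkpoints: Checkpoint[] = [];
  private readonly agents = new Map<string, AgentState>();

  setPhase(agentId: string, phase: AgentPhase): void {
    const current = this.agents.get(agentId);
    this.agents.set(agentId, { phase, lastDeliverableAt: current?.lastDeliverableAt ?? null });
  }

  stateOf(agentId: string): AgentState | undefined {
    return this.agents.get(agentId);
  }

  recordDeliverable(
    agentId: string,
    type: DeliverableType,
    description: string,
    artifacts: string[] = [],
    now: Date = new Date(),
  ): Checkpoint {
    // 1. Create the checkpoint (an array stands in for the database).
    const checkpoint: Checkpoint = { agentId, type, description, artifacts, createdAt: now };
    this.checkpoints.push(checkpoint);

    // 2. Update the lastDeliverableAt timestamp.
    const state: AgentState = this.agents.get(agentId) ?? { phase: 'normal', lastDeliverableAt: null };
    state.lastDeliverableAt = now;

    // 3. Reset warning to normal; other phases are left untouched.
    if (state.phase === 'warning') state.phase = 'normal';
    this.agents.set(agentId, state);
    return checkpoint;
  }
}
```

Note that in this model a deliverable does not lift an agent out of intervention, which matches the phase diagram: only a resume moves an agent from Intervention back to Warning.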

Programmatic API

Initialize Agent Tracking

import { getResourceExhaustionService } from '@blackms/aistack';

const resourceService = getResourceExhaustionService(store, config);

// Start tracking a new agent
const metrics = resourceService.initializeAgent(agentId, 'coder');

Record Operations

// Record file operations
resourceService.recordFileOperation(agentId, 'read');
resourceService.recordFileOperation(agentId, 'write');
resourceService.recordFileOperation(agentId, 'modify');

// Record API calls
resourceService.recordApiCall(agentId, 1500); // with token count

// Record subtask spawning
resourceService.recordSubtaskSpawn(agentId);
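
Recording calls are easy to forget when they sit next to the real work. One option is to route file access through a small wrapper so the operation and its bookkeeping always happen together. This is a sketch under assumptions: `OperationRecorder` is a hypothetical subset of the service typed from the calls above (the `void` return type is assumed), and `withFileTracking` is not part of the library.

```typescript
type FileOperation = 'read' | 'write' | 'modify';

// Assumed shape of the one service method this helper needs.
interface OperationRecorder {
  recordFileOperation(agentId: string, operation: FileOperation): void;
}

/**
 * Hypothetical wrapper: runs `action`, then records the operation.
 * If `action` throws, nothing is recorded and the error propagates.
 */
function withFileTracking<T>(
  recorder: OperationRecorder,
  agentId: string,
  operation: FileOperation,
  action: () => T,
): T {
  const result = action();
  recorder.recordFileOperation(agentId, operation);
  return result;
}

// Stub recorder that only counts calls, standing in for the real service.
const operationCounts: Record<FileOperation, number> = { read: 0, write: 0, modify: 0 };
const stubRecorder: OperationRecorder = {
  recordFileOperation: (_agentId, operation) => {
    operationCounts[operation] += 1;
  },
};

const fileBody = withFileTracking(stubRecorder, 'agent-1', 'read', () => 'file body');
```

Whether failed operations should count against the limits is a design choice; this sketch records only successful ones.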

Check Agent Status

// Get current metrics
const metrics = resourceService.getAgentMetrics(agentId);
console.log(metrics);
// {
//   agentId: 'uuid',
//   filesRead: 15,
//   filesWritten: 3,
//   filesModified: 8,
//   apiCallsCount: 12,
//   subtasksSpawned: 2,
//   tokensConsumed: 45000,
//   phase: 'normal',
//   lastDeliverableAt: Date,
//   ...
// }

// Evaluate current phase
const phase = resourceService.evaluateAgent(agentId);
// Returns: 'normal' | 'warning' | 'intervention' | 'termination'
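
Raw counters are easier to interpret as a share of their limits. The sketch below turns a metrics object like the one above into per-resource utilization. It assumes that "files accessed" is the sum of files read, written, and modified, which this guide does not state explicitly. `utilization` and `tightest` are hypothetical helpers.

```typescript
interface UsageSnapshot {
  filesRead: number;
  filesWritten: number;
  filesModified: number;
  apiCallsCount: number;
  subtasksSpawned: number;
  tokensConsumed: number;
}

interface CountLimits {
  maxFilesAccessed: number;
  maxApiCalls: number;
  maxSubtasksSpawned: number;
  maxTokensConsumed: number;
}

// Usage as a fraction of each limit (1.0 means the limit is reached).
function utilization(m: UsageSnapshot, t: CountLimits): Record<string, number> {
  // Assumption: all three file counters count toward maxFilesAccessed.
  const filesAccessed = m.filesRead + m.filesWritten + m.filesModified;
  return {
    files: filesAccessed / t.maxFilesAccessed,
    apiCalls: m.apiCallsCount / t.maxApiCalls,
    subtasks: m.subtasksSpawned / t.maxSubtasksSpawned,
    tokens: m.tokensConsumed / t.maxTokensConsumed,
  };
}

// Returns the resource closest to its limit ('' with -Infinity for an empty report).
function tightest(report: Record<string, number>): { resource: string; ratio: number } {
  let best = { resource: '', ratio: -Infinity };
  for (const [resource, ratio] of Object.entries(report)) {
    if (ratio > best.ratio) best = { resource, ratio };
  }
  return best;
}

// Values from the example output above, checked against the default limits.
const usageReport = utilization(
  { filesRead: 15, filesWritten: 3, filesModified: 8, apiCallsCount: 12, subtasksSpawned: 2, tokensConsumed: 45000 },
  { maxFilesAccessed: 100, maxApiCalls: 50, maxSubtasksSpawned: 20, maxTokensConsumed: 100000 },
);
const mostConstrained = tightest(usageReport);
```

For the example metrics, tokens are the most constrained resource at 45% of the limit, well below the 80% warning mark.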

Manual Intervention

// Pause an agent
await resourceService.pauseAgent(agentId, 'Manual review required');

// Check if paused
const isPaused = resourceService.isAgentPaused(agentId);

// Resume agent
resourceService.resumeAgent(agentId);

// Terminate agent
resourceService.terminateAgent(agentId, 'Exceeded all limits');

Get Metrics Summary

const summary = resourceService.getResourceMetrics(new Date('2026-01-01'));
// {
//   totalAgentsTracked: 5,
//   agentsByPhase: { normal: 3, warning: 1, intervention: 1, termination: 0 },
//   pausedAgents: 1,
//   totalWarnings: 15,
//   totalInterventions: 3,
//   totalTerminations: 0,
//   recentEvents: [...]
// }

Integration with system_health

The system_health check reports the Resource Exhaustion Service status under checks.resourceExhaustion. Example response:

{
  "status": "healthy",
  "checks": {
    "database": true,
    "vectorSearch": true,
    "github": true,
    "resourceExhaustion": {
      "enabled": true,
      "agentsTracked": 5,
      "agentsByPhase": {
        "normal": 3,
        "warning": 1,
        "intervention": 1
      },
      "pausedAgents": 1
    }
  }
}
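
A monitoring script can turn this payload into actionable alerts. The following sketch types only the `resourceExhaustion` portion of the response shown above and lists reasons an operator might need to look. `ResourceExhaustionHealth` and `attentionReasons` are hypothetical names, and phase counts are treated as optional because the example omits `termination`.

```typescript
type HealthPhase = 'normal' | 'warning' | 'intervention' | 'termination';

interface ResourceExhaustionHealth {
  enabled: boolean;
  agentsTracked: number;
  agentsByPhase: Partial<Record<HealthPhase, number>>;
  pausedAgents: number;
}

// Returns human-readable reasons; an empty array means nothing needs attention.
function attentionReasons(health: ResourceExhaustionHealth): string[] {
  const reasons: string[] = [];
  if (!health.enabled) {
    reasons.push('resource exhaustion checks are disabled');
    return reasons;
  }
  const inIntervention = health.agentsByPhase.intervention ?? 0;
  const terminated = health.agentsByPhase.termination ?? 0;
  if (inIntervention > 0) reasons.push(`${inIntervention} agent(s) in intervention`);
  if (terminated > 0) reasons.push(`${terminated} agent(s) terminated`);
  if (health.pausedAgents > 0) reasons.push(`${health.pausedAgents} agent(s) paused`);
  return reasons;
}

// The resourceExhaustion object from the example response above.
const sampleHealth: ResourceExhaustionHealth = {
  enabled: true,
  agentsTracked: 5,
  agentsByPhase: { normal: 3, warning: 1, intervention: 1 },
  pausedAgents: 1,
};
const healthReasons = attentionReasons(sampleHealth);
```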

Prometheus Metrics

When Prometheus metrics are enabled, the service exposes:

Metric	Type	Description
`agent_files_accessed`	Histogram	Files accessed per agent
`agent_api_calls`	Histogram	API calls per agent
`agent_tokens_consumed`	Histogram	Tokens consumed per agent
`agents_paused_current`	Gauge	Currently paused agents
`resource_exhaustion_warnings_total`	Counter	Total warnings issued
`resource_exhaustion_interventions_total`	Counter	Total interventions
`resource_exhaustion_terminations_total`	Counter	Total terminations

Best Practices

Set Appropriate Thresholds

// For exploratory/research agents - higher limits
{
  "thresholds": {
    "maxFilesAccessed": 500,
    "maxApiCalls": 100,
    "maxTimeWithoutDeliverableMs": 600000  // 10 minutes
  }
}

// For production/deployment agents - stricter limits
{
  "thresholds": {
    "maxFilesAccessed": 50,
    "maxApiCalls": 20,
    "maxSubtasksSpawned": 5,
    "maxTimeWithoutDeliverableMs": 180000  // 3 minutes
  }
}

Record Deliverables Proactively

// After completing meaningful work, record a deliverable
if (testsPassed) {
  resourceService.recordDeliverable(
    agentId,
    'test_passed',
    `All ${testCount} tests passing`
  );
}

// This resets the "time without deliverable" timer
// and transitions warning → normal

Monitor Warning Phase

const metrics = resourceService.getAgentMetrics(agentId);

if (metrics.phase === 'warning') {
  // Agent approaching limits
  // Consider: completing current task, recording deliverable, or pausing
  console.warn(`Agent ${agentId} in warning phase`);
}

Handle Paused Agents

// Check if agent is paused before assigning work
if (resourceService.isAgentPaused(agentId)) {
  // Either wait for resume or use different agent
  const resumed = await resourceService.waitForResume(agentId);
  if (!resumed) {
    // Agent was terminated, handle accordingly
  }
}
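
When several agents can do the same job, the "use a different agent" branch can be a simple lookup instead of a wait. A minimal sketch, assuming `isAgentPaused` is synchronous and returns a boolean as in the snippet above. `PauseLookup` and `pickAvailableAgent` are hypothetical names.

```typescript
// Assumed shape of the one service method this helper needs.
interface PauseLookup {
  isAgentPaused(agentId: string): boolean;
}

// Returns the first candidate that is not paused, or undefined if all are paused.
function pickAvailableAgent(
  service: PauseLookup,
  candidates: readonly string[],
): string | undefined {
  return candidates.find((agentId) => !service.isAgentPaused(agentId));
}

// Stub standing in for the real service: agent-a is paused.
const pausedIds = new Set(['agent-a']);
const stubLookup: PauseLookup = {
  isAgentPaused: (agentId) => pausedIds.has(agentId),
};

const chosenAgent = pickAvailableAgent(stubLookup, ['agent-a', 'agent-b', 'agent-c']);
```

If every candidate is paused the helper returns `undefined`, at which point waiting for a resume is the remaining option.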

Troubleshooting

Agent Stuck in Warning

Problem: Agent keeps hitting warning threshold

Solutions:

Record deliverables more frequently
Increase threshold limits
Break task into smaller subtasks

Intervention Too Aggressive

Problem: Agents getting paused too often

Solutions:

Increase warningThresholdPercent (e.g., 0.9)
Increase absolute thresholds
Reduce checkIntervalMs so checks run more often and agents are caught in the warning phase before a limit is exceeded
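
The check interval matters because a fast agent can cross the whole warning band between two checks and go straight to a pause. The arithmetic can be sketched as follows, assuming phases are only re-evaluated at each periodic check and that consumption is roughly steady. `maxCheckIntervalMs` and `unitsPerMinute` are hypothetical names.

```typescript
/**
 * Hypothetical helper: longest check interval (ms) that still lets a
 * periodic check land inside the warning band before the limit is hit.
 */
function maxCheckIntervalMs(
  limit: number,
  warningFraction: number,
  unitsPerMinute: number,
): number {
  if (limit <= 0 || unitsPerMinute <= 0) {
    throw new RangeError('limit and unitsPerMinute must be positive');
  }
  // Units available between the warning mark and the hard limit.
  const bandWidth = limit - limit * warningFraction;
  return Math.floor((bandWidth * 60000) / unitsPerMinute);
}

// Defaults: maxApiCalls = 50, warning at 0.8, agent making 30 API calls per minute.
const suggestedIntervalMs = maxCheckIntervalMs(50, 0.8, 30);
```

With the default API limit the warning band is 10 calls wide, so an agent making 30 calls per minute crosses it in 20 seconds. The default 60-second interval would often miss the warning phase entirely for such an agent.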

Missing Metrics

Problem: Metrics not being recorded

Solutions:

Ensure enabled: true in config
Call initializeAgent() when agent starts
Verify service is started with start()
