Agents Fleet

AI coding agents like Claude Code and Codex are powerful, but they have no built-in cost controls—one runaway session can silently burn $20–$50 with no visibility into what’s happening or when to stop. Agents Fleet gives you a local web UI to launch and monitor agent sessions and automatically stop them when they hit a token or USD budget.

Local-first “mission control” for AI coding agent CLIs (and any shell commands): launch sessions in a repo, stream live output to a web UI, stop them, and keep a persisted history.

Visual Overview

AgentFleet: Stop Runaway AI Agents with Local Mission Control

✨ Recently Shipped

  • Context % indicator in session header (latest)
    • Live colored chip in the top info bar showing Claude Code's session context window usage
    • Green <70%, amber 70-89%, red ≥90% — at a glance warning when context is running low
    • Extracts ctxPct from AF statusline ctx=IN/SIZE(PCT%) pattern — always available, updates every 500ms (same polling as token updates)
    • No server changes, no DB migration — pure client-side event dispatch
  • Caveman compression in Headroom Shell
    • Caveman level dropdown (Off / Lite / Full / Ultra / Wenyan) in the Headroom Shell session form
    • Selected level is appended as --append-system-prompt to the claude command, activating the caveman plugin for that session
    • Spend Analytics → Caveman tab: sessions, output tokens, estimated tokens saved, and cost broken down by compression level; savings rates: Lite 75%, Full 75%, Ultra 85%, Wenyan 90%
    • Resume sessions (claude --resume / codex resume) now correctly track delta tokens/cost instead of cumulative history
  • AI Coach Analytics (latest)
    • New Analytics tab (next to Artifacts) on Shell and Headroom Shell sessions — shows a 4-category practice scorecard (Prompt Quality, Session Hygiene, Code Review, Tool Mastery) plus detected anti-patterns with suggestions
    • Powered by 45 detection rules ported from microsoft/AI-Engineering-Coach, all running locally — no telemetry, no network calls
    • Parses Claude/Codex log files off disk (~/.claude/projects, ~/.codex) and scopes them to the exact session time window
    • Results persisted in a new additive session_analytics table — zero impact on existing budget/replay features
    • Historical sessions backfillable: pnpm --filter @agents_fleet/server exec tsx scripts/backfill-analytics.ts
    • See AI_COACH.md for the full rule catalogue and scoring formula
  • Headroom integration
    • New Headroom tab: LiteLLM chat with transparent context compression via the headroom proxy — same model/budget controls as LiteLLM, compression is automatic
    • ~19% token reduction observed on first real session (948 tokens saved out of 4,901 input)
    • Proxy starts automatically with pnpm dev:one — installs headroom-ai via pip on first run, polls until ready before starting the app
    • All LLM calls route through http://localhost:8787/v1 → your LITELLM_BASE_URL — no external endpoints called directly
    • Telemetry disabled (HEADROOM_TELEMETRY=off), HuggingFace offline after first model download (HF_HUB_OFFLINE=1)
    • Spend Analytics → Headroom tab: lifetime + per-session compression stats (tokens saved, savings %, cost saved) pulled from the proxy's /stats endpoint — persists across restarts via ~/.headroom/proxy_savings.json
    • Sessions tagged with purple Headroom chip in sidebar, distinguished from regular LiteLLM sessions
  • AI session summary (latest)
    • One-click plain-English summary of any session — title, what the agent did, and token/cost breakdown for the summary call
    • Powered by gpt-4.1-nano via your LiteLLM proxy — under $0.001 per summary
    • Summary persisted in SQLite and surfaced as a top-level artifact alongside git diff
    • Generated session title appears in the sessions sidebar for quick scanning
  • Side-by-side git diff viewer
    • File tabs at the top — click to switch files without scrolling
    • Side-by-side split with red/green highlighting and line numbers
    • Replaces the raw diff <pre> block in the Artifacts tab
  • LiteLLM Spend Analytics tab
    • Real spend data pulled from your LiteLLM proxy (/spend/logs, /user/daily/activity)
    • Matches Agents Fleet layout: header stats, This Week chart, Weekly Budget strip, By Model and Daily tabs
    • Weekly budget resets Sunday; projects spend and flags over-budget
  • Budget 80% warning notifications
    • Native browser notification + in-app toast when a session reaches 80% of its USD or token budget
    • Toast auto-dismisses after 8s; works even when browser notifications are blocked
  • One-click session resume (latest)
    • claude --resume <uuid> and codex resume <uuid> commands are captured automatically on session exit and shown in the Artifacts tab
    • Resume button spawns a new shell session instantly — no copy-paste needed
    • Backfilled across all historical sessions in the database
  • Graceful session exit for Claude and Codex (latest)
    • Stop button sends Ctrl+C → /exit instead of hard-killing, giving Claude/Codex time to save state and print the resume command before exiting
  • Interactive Git Diff Viewer
    • Side-by-side diff display with line-by-line numbering
    • Paired removed/added lines render adjacent for easy comparison
    • File-level grouping with syntax coloring
  • Spend Analytics Dashboard
    • View total spend by month, week, or day
    • Drill down by repo, command, or model
    • Real-time cost tracking with USD budgets
  • Budget Tracking in Session Header
    • Display token budgets (input + output combined) and USD budgets side-by-side with current usage
    • Shows on all tabs: Shell, Claude (SDK), LiteLLM
    • Example: total 81,772 / 100,000 budget $1.23 / $5.00
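
As a rough illustration of the session-header format and the 80% warning described in the list above, a minimal sketch follows. The function names and the `WARNING_RATIO` constant are hypothetical and are not taken from the Agents Fleet source; only the output format and the 80% threshold come from this README.

```typescript
// Hypothetical helpers; the real UI code may be structured differently.
const WARNING_RATIO = 0.8; // 80% threshold mentioned in the budget warning notes

// Renders usage in the header style shown in the example above.
function formatBudgetHeader(
  totalTokens: number,
  budgetTokens: number,
  costUsd: number,
  budgetUsd: number,
): string {
  const fmt = (n: number): string => n.toLocaleString("en-US");
  return (
    `total ${fmt(totalTokens)} / ${fmt(budgetTokens)} ` +
    `budget $${costUsd.toFixed(2)} / $${budgetUsd.toFixed(2)}`
  );
}

// Returns true once usage reaches 80% of a configured budget.
function shouldWarn(used: number, budget: number | undefined): boolean {
  // No budget (or a non-positive one) means there is nothing to warn about.
  if (budget === undefined || budget <= 0) return false;
  return used / budget >= WARNING_RATIO;
}
```

The same `shouldWarn` check could be applied to either the token budget or the USD budget, since both are plain used/limit pairs.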

Context % Indicator Examples

Context Percentage Indicator at 2%

Context Percentage Indicator at 18%

Context Usage at 100%

This repository contains a working MVP:

  • pnpm workspace monorepo
  • React + Vite + TypeScript “Mission Control” web app
  • Node + Express + TypeScript server:
    • SQLite persistence (data/agents_fleet.sqlite)
    • session + terminal history HTTP APIs
    • WebSocket streaming (/ws):
      • live PTY output for shell/CLI sessions
      • live Claude SDK chat streaming + tool events
  • shared TypeScript types (packages/shared)

Demo

Screenshots

Mission control overview

Mission control overview

Local-first architecture

Local-first architecture

AI Coach analytics — category scorecards + anti-patterns (per-session)

AI Coach analytics scorecard

  • Detailed anti-pattern breakdown (severity, occurrences, suggestion per rule)

AI Coach analytics detailed anti-patterns

AI Coach Analytics — cross-session dashboard

  • Dashboard — avg practice score, per-category trends, top anti-patterns, daily activity, harness mix

AI Coach Analytics dashboard

  • Patterns — hour×weekday activity heatmap, session calendar, per-repo project breakdown

AI Coach Analytics patterns

  • Timeline — recent sessions with repo, harness, duration, requests, score, and cost

AI Coach Analytics timeline

  • SDLC — work-type distribution (bug fix / feature / test / config / docs / style / refactor / code review) overall and per-repo

AI Coach Analytics SDLC breakdown

Create a new session

  • Shell session

New shell session

  • Claude (SDK) session

New Claude SDK session

  • LiteLLM session

New LiteLLM session

Interactive sessions

  • Claude Code / PTY session

Claude interactive session

  • OpenAI Codex / PTY session

Codex interactive session

  • Codex scrollback / persisted terminal replay

Codex scrollable terminal

Claude SDK chat flow

  • Chat conversation view

Claude SDK chat

  • Command approval gate

Claude SDK approval gate

  • Approval accepted

Claude SDK approval accepted

  • Approval rejected

Claude SDK approval rejected

  • Persisted chat history

Claude SDK history

Per-session artifacts (git diff snapshots + AI summary)

  • Git diff viewer (side-by-side, file tabs)

Git diff viewer

  • Session summary — AI-generated title, description, and token/cost breakdown

Session summary

  • Session summary live — sidebar titles, artifacts tab, and Regenerate button

Session summary live

Spend dashboards / budget tracking

  • Default spend dashboard

Spend dashboard default view

  • Spend dashboard today

Spend dashboard today

  • Spend dashboard 7 days

Spend dashboard 7 days

  • Spend dashboard by repo

Spend dashboard by repo

  • Spend dashboard by command

Spend dashboard by command

  • Spend dashboard by model

Spend dashboard by model

Session resume

  • Resume artifact in Artifacts tab with Copy + Resume button

Session resume artifact

  • Resumed session live terminal

Session resume terminal

Headroom — context compression

  • Headroom chat tab (proxy connected)

Headroom chat tab

  • Headroom chat session in progress

Headroom chat session

  • Spend Analytics — session overview (agent usage + savings breakdown)

Headroom session overview

  • Prefix cache impact (cache reads, writes, hit rate)

Headroom prefix cache impact

  • Performance stats (token usage + pipeline breakdown)

Headroom performance stats

  • Per-model token savings + recent requests

Headroom per-model and recent requests

  • Request Log tab (per-request detail: model, tokens, latency, cache, status)

Headroom request log

  • Headroom vs LiteLLM token comparison

Headroom vs LiteLLM token comparison

LiteLLM Spend Analytics

  • By Model tab — real spend breakdown from proxy

LiteLLM spend by model

  • Daily tab — per-day requests, tokens, and spend

LiteLLM spend daily

Budget warnings

  • In-app toast (shown when browser notifications are blocked or as a persistent overlay)

Budget warning toast

  • Native browser notification

Budget warning notification

  • Browser notification permission prompt

Budget warning permission

SQLite persistence / debug views

  • Sessions table

sessions table

  • Logs table

logs table

The MVP persists several tables in data/agents_fleet.sqlite:

  • sessions: session metadata + budgets + estimated token/cost + stop reason
  • pty_chunks: raw PTY stream (ANSI included) used for Terminal (persisted) replay
  • stdin_events: input audit trail (stored separately; not injected into replay)
  • session_markers: lifecycle markers like stop_requested, budget_exceeded, process_exit
  • session_artifacts: per-session artifacts — git snapshot (changedFiles[] + diff captured on stop/exit), session resume command, and AI-generated summary (title + description via gpt-4.1-nano)
  • session_analytics: practice score + anti-patterns + per-category group scores — backs the Analytics tab, additive and independent from budget/replay data

Earlier iterations used a line-based logs table. The current design persists terminal history as raw PTY chunks (pty_chunks) for xterm.js replay, which is much closer to real scrollback (especially for TUIs like Claude/Codex).
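
To make the chunk-based replay idea concrete, here is a minimal sketch of rebuilding a replay stream from stored chunks. The `PtyChunk` shape and its `seq` and `data` fields are assumptions for illustration; the actual pty_chunks columns are not documented in this README.

```typescript
// Hypothetical row shape; the real pty_chunks schema may differ.
interface PtyChunk {
  seq: number;  // assumed capture order of the chunk
  data: string; // raw PTY output as text, ANSI escape sequences included
}

// Concatenates chunks in capture order without altering their contents,
// so ANSI sequences reach the terminal emulator exactly as recorded.
function buildReplayStream(chunks: PtyChunk[]): string {
  // Sort a copy so the caller's array is left untouched.
  return [...chunks]
    .sort((a, b) => a.seq - b.seq)
    .map((c) => c.data)
    .join("");
}
```

The resulting string could then be written into an xterm.js terminal instance to reproduce the scrollback.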

Videos

Tip: GitHub renders MP4 previews nicely in README. .mov files are ignored by default in .gitignore to avoid bloating git history.

  • screenshots/AgentFleet__AI_Mission_Control.mp4

Architecture

See ARCHITECTURE.md.

Prerequisites

  • Node.js 20.x–24.x (Node 25+ is not yet supported)
  • pnpm (Corepack is fine)

Setup

COREPACK_HOME="$PWD/.corepack" pnpm install

Run (dev) — one command

pnpm dev:one

On first run, this may prompt you for ANTHROPIC_API_KEY and save it to .env.local (gitignored). Press Enter to skip.

Environment variables:

  • ANTHROPIC_API_KEY (required for Claude SDK chat)
  • LITELLM_BASE_URL and LITELLM_API_KEY (optional; used by LiteLLM Chat via an enterprise proxy)
  • HEADROOM_PORT (optional, default 8787) — port for the headroom proxy

This will:

  • install dependencies (if needed)
  • start apps/server + apps/web in parallel

Open: http://localhost:5173

Run (dev) — manual (two terminals)

COREPACK_HOME="$PWD/.corepack" pnpm -C apps/server dev
COREPACK_HOME="$PWD/.corepack" pnpm -C apps/web dev

Create a session

  1. Open the web app (Vite prints the URL, typically http://localhost:5173).
  2. Enter:
    • Repo path: absolute path to a local repository (must be a directory)
    • Command: any shell command to run in that repo

Example commands:

node -e "console.log('hello')"
git status
node -e "setInterval(()=>console.log('tick',Date.now()),200)"
claude
codex

Interactive sessions (e.g. Claude)

Claude Code / Codex (PTY)

  • Start a session with command claude (or codex if installed).

(Recommended) Claude Code status line for accurate budget tracking

Claude Code can run a custom status line command that receives structured JSON about the current session (context window usage, estimated cost, etc.).

For the most reliable budget tracking in Agents Fleet, configure a single-line status line that prints parse-friendly key/value pairs.

  1. Create the script:
#!/bin/bash
input=$(cat)

COST=$(echo "$input" | jq -r '.cost.total_cost_usd // 0')
COST_FMT=$(printf '%.6f' "$COST")
TRANSCRIPT=$(echo "$input" | jq -r '.transcript_path // empty')

IN=0
OUT=0
if [ -n "$TRANSCRIPT" ] && [ -f "$TRANSCRIPT" ]; then
  read IN OUT < <(jq -rs '
    [.[] | .message.usage // empty] as $u
    | "\(($u | map((.input_tokens // 0) + (.cache_read_input_tokens // 0) + (.cache_creation_input_tokens // 0)) | add // 0)) \(($u | map(.output_tokens // 0) | add // 0))"
  ' "$TRANSCRIPT" 2>/dev/null)
fi

CTX_IN=$(echo "$input" | jq -r '.context_window.total_input_tokens // 0')
CTX_SIZE=$(echo "$input" | jq -r '.context_window.context_window_size // 0')
CTX_PCT=$(echo "$input" | jq -r '.context_window.used_percentage // 0' | cut -d. -f1)

echo "[AF] ctx=${CTX_IN}/${CTX_SIZE}(${CTX_PCT}%) in=${IN:-0} out=${OUT:-0} cost=\$${COST_FMT} [/AF]"

Save it as ~/.claude/agents_fleet_statusline.sh and make it executable:

chmod +x ~/.claude/agents_fleet_statusline.sh
  2. Update ~/.claude/settings.json:
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/agents_fleet_statusline.sh",
    "padding": 1,
    "refreshInterval": 1
  }
}

Notes:

  • Requires jq to be installed (brew install jq on macOS).
  • cost.total_cost_usd is an estimate computed client-side by Claude Code and may differ from your actual bill.
  • The ctx=IN/SIZE(PCT%) field in the AF statusline powers the Context % indicator chip in the session header — green <70%, amber 70-89%, red ≥90%.
  • Type directly into the Terminal (live) pane (xterm.js).
  • Use Terminal (persisted) to replay and scroll through the recorded PTY output (xterm.js replay).
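
For readers who want to see how the `[AF] ... [/AF]` status line could be consumed, the following is a minimal parsing sketch. The type and function names are hypothetical, and it assumes the line arrives free of interleaved ANSI escape codes; the field layout and the color thresholds are taken from the script and notes above.

```typescript
interface AfStatus {
  ctxIn: number;
  ctxSize: number;
  ctxPct: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Matches: [AF] ctx=IN/SIZE(PCT%) in=N out=N cost=$N.NNNNNN [/AF]
const AF_PATTERN =
  /\[AF\] ctx=(\d+)\/(\d+)\((\d+)%\) in=(\d+) out=(\d+) cost=\$(\d+(?:\.\d+)?) \[\/AF\]/;

// Returns the parsed fields, or null when the line carries no AF status.
function parseAfStatus(line: string): AfStatus | null {
  const m = AF_PATTERN.exec(line);
  if (!m) return null;
  return {
    ctxIn: Number(m[1]),
    ctxSize: Number(m[2]),
    ctxPct: Number(m[3]),
    inputTokens: Number(m[4]),
    outputTokens: Number(m[5]),
    costUsd: Number(m[6]),
  };
}

// Thresholds from the notes above: green <70%, amber 70-89%, red >=90%.
function ctxColor(pct: number): "green" | "amber" | "red" {
  if (pct >= 90) return "red";
  if (pct >= 70) return "amber";
  return "green";
}
```

Because the pattern is not anchored, it also finds the status segment when other text surrounds it on the same line.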

(Recommended) Codex status line for accurate budget tracking

Codex can also show session usage in a single-line status line. For Agents Fleet, the simplest reliable setup is to keep Codex’s built-in status line enabled and ensure it includes the usage fields below.

  1. Update ~/.codex/config.toml:
[tui]
status_line = ["model-with-reasoning","current-dir","context-remaining","context-used","total-input-tokens","total-output-tokens","weekly-limit","five-hour-limit","run-state","task-progress"]
status_line_use_color = true
  2. Make sure the output stays on one line in the Codex TUI.

Notes:

  • The config above matches the usage fields Agents Fleet can parse for budget tracking.
  • If you change the field list, keep it single-line so PTY replay remains parse-friendly.
  • Type directly into the Terminal (live) pane (xterm.js).
  • Use Terminal (persisted) to replay and scroll through the recorded PTY output (xterm.js replay).

Claude (SDK) chat (tool-calling)

Prerequisite: set ANTHROPIC_API_KEY (required). The server will reject Claude SDK requests if it’s missing.

  • Switch to Claude (SDK) in the UI.
  • Provide a repo path and chat normally.
  • The assistant can propose run_command tool calls; you must Approve or Reject each command.
  • Tool output is capped (100KB) and stored as session artifacts.
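
The 100KB cap on tool output could be applied along the following lines. This is only a sketch: the function name is hypothetical, and reading "100KB" as 100 × 1024 bytes of UTF-8 is an assumption, since the README does not say how the cap is measured.

```typescript
// Assumption: "100KB" is interpreted as 100 * 1024 bytes of UTF-8 text.
const MAX_TOOL_OUTPUT_BYTES = 100 * 1024;

interface CappedOutput {
  text: string;
  truncated: boolean;
}

// Truncates output to at most maxBytes of UTF-8 without splitting a character.
function capToolOutput(
  output: string,
  maxBytes: number = MAX_TOOL_OUTPUT_BYTES,
): CappedOutput {
  const bytes = new TextEncoder().encode(output);
  if (bytes.length <= maxBytes) return { text: output, truncated: false };
  // With stream: true the decoder holds back an incomplete trailing UTF-8
  // sequence instead of emitting a replacement character for it.
  const text = new TextDecoder("utf-8").decode(bytes.subarray(0, maxBytes), {
    stream: true,
  });
  return { text, truncated: true };
}
```

Returning a `truncated` flag lets the UI show that the stored artifact is not the full output.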

Screenshots:

  • Claude SDK session stopped by budget

Claude SDK budget stop

  • Claude SDK tool call + output

Claude SDK tool call

  • Claude SDK tool permission gate (Approve/Reject)

Claude SDK tool permission

LiteLLM Chat (proxy support)

Use your enterprise URL and API key to access multiple models through a LiteLLM proxy.

LiteLLM Chat allows you to:

  • Use your enterprise/custom LiteLLM proxy endpoint
  • Access models beyond Claude (OpenAI, Anthropic, etc.)
  • Route requests through your own infrastructure

Setup:

  1. Set environment variables:
export LITELLM_BASE_URL="https://your-litellm-proxy.com"
export LITELLM_API_KEY="your-api-key"
  2. Switch to LiteLLM in the UI.
  3. Provide a repo path and select your desired model from the dropdown.
  4. Chat and use tools normally, with the same Approve/Reject workflow as Claude SDK.

Notes:

  • LITELLM_BASE_URL must be a valid HTTPS URL pointing to your LiteLLM proxy endpoint.
  • LITELLM_API_KEY is your authentication key for the proxy.
  • The available models depend on your LiteLLM proxy configuration.
  • Tool output is capped (100KB) and stored as session artifacts, just like Claude SDK.
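
A startup check for the HTTPS requirement noted above might look like the sketch below. The function name and error messages are hypothetical; the README does not describe how the server actually validates this variable.

```typescript
// Returns an error message, or null when the value looks acceptable.
function validateLitellmBaseUrl(raw: string | undefined): string | null {
  if (!raw) return "LITELLM_BASE_URL is not set";
  let url: URL;
  try {
    url = new URL(raw); // throws on malformed input
  } catch {
    return "LITELLM_BASE_URL is not a valid URL";
  }
  if (url.protocol !== "https:") return "LITELLM_BASE_URL must use https";
  return null;
}
```

For example, `validateLitellmBaseUrl(process.env.LITELLM_BASE_URL)` could run once at server start so that a misconfigured proxy URL is reported before the first chat request.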

Enterprise/Custom LLM Integration: If you're running a local or enterprise LiteLLM proxy, Agents Fleet will route all requests through your infrastructure, giving you full control and visibility over API costs and usage.

Headroom Chat (context compression)

Headroom is a transparent context compression proxy that reduces tokens sent to your LLM by 15–95% depending on context size, with no code changes required.

Setup: pnpm dev:one handles everything automatically:

  • Installs headroom-ai via pip if not present (prompts once)
  • Downloads the kompress-v2-base compression model from HuggingFace (one-time, on first run)
  • Starts the proxy on http://localhost:8787 before the app
  • Routes all Headroom tab calls through the proxy → your LITELLM_BASE_URL

Usage:

  1. Switch to the Headroom tab
  2. Provide a repo path and select a model — same as LiteLLM
  3. Chat normally — compression is transparent
  4. View savings in Spend Analytics → Headroom tab

Notes:

  • Requires Python + pip for headroom proxy installation
  • Compression kicks in when context exceeds 500 tokens (configurable via HEADROOM_COMPRESSION_STABLE_AFTER_TURN)
  • Sessions are tagged [headroom-chat] and show a purple Headroom chip in the sidebar
  • Proxy logs: data/headroom.log
  • Port override: HEADROOM_PORT=9000 pnpm dev:one
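
As an illustration of the port override, the proxy base URL could be derived from HEADROOM_PORT roughly as follows. The function name is hypothetical; the default port 8787 and the `/v1` path are taken from this README, and the fallback behavior for invalid values is an assumption.

```typescript
const DEFAULT_HEADROOM_PORT = 8787;

// Builds the local proxy URL, falling back to the default port when
// HEADROOM_PORT is unset or not a valid TCP port (assumed behavior).
function headroomBaseUrl(env: Record<string, string | undefined>): string {
  const parsed = Number(env["HEADROOM_PORT"]);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed <= 65535;
  return `http://localhost:${valid ? parsed : DEFAULT_HEADROOM_PORT}/v1`;
}
```

In a Node server this would typically be called as `headroomBaseUrl(process.env)`.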

Budgets (estimated)

  • Optional Budget USD and/or Budget tokens apply to the entire session lifetime.
  • Token budget: counts input + output tokens combined. When total_tokens >= budget_tokens, the session stops.
  • USD budget: when estimated_cost >= budget_usd, the session stops.
  • Token estimation: ceil(text.length / 4).
  • Cost estimation:
    • shell/PTY sessions use the default rates in apps/server/src/budget.ts ($3.00 per 1M input, $15.00 per 1M output by default)
    • Claude SDK sessions use a model-based pricing table (computeModelCostUsd) and SDK-reported usage when available.
    • LiteLLM sessions use model-specific pricing from your LiteLLM proxy configuration.
  • If a budget is exceeded, the session is stopped automatically and stop_reason becomes budget_exceeded.
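
The estimation and enforcement rules above can be sketched in a few lines. This is not the code in apps/server/src/budget.ts; the names below are hypothetical, and only the ceil(length / 4) estimate, the default rates, and the >= comparisons come from the list above.

```typescript
// Default shell/PTY rates quoted above, in USD per 1M tokens.
const INPUT_USD_PER_MTOK = 3.0;
const OUTPUT_USD_PER_MTOK = 15.0;

// Token estimate: roughly one token per four characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) /
    1_000_000
  );
}

interface Budget {
  budgetTokens?: number; // combined input + output limit
  budgetUsd?: number;
}

// True when either configured budget has been reached.
function budgetExceeded(
  inputTokens: number,
  outputTokens: number,
  budget: Budget,
): boolean {
  const totalTokens = inputTokens + outputTokens;
  if (budget.budgetTokens !== undefined && totalTokens >= budget.budgetTokens) {
    return true;
  }
  const cost = estimateCostUsd(inputTokens, outputTokens);
  return budget.budgetUsd !== undefined && cost >= budget.budgetUsd;
}
```

With these defaults, one million input tokens plus one million output tokens is estimated at $18.00, so a $5.00 budget would stop the session well before that point.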

Note: USD cost is still an estimate unless you configure model pricing to match your account/contract.

Configure pricing via a remote API (PRICING_API_URL, must be https) or via local overrides (PRICING_JSON inline JSON / PRICING_JSON_PATH file path). See apps/server/src/pricing.ts for schema + env vars.

Stop a session

  • Select a running session and click Stop.
  • The server will attempt graceful termination first, then force-kill if needed (best-effort, cross-platform).

Per-session artifacts (git diff snapshots)

On session stop and/or exit, Agents Fleet can capture a git snapshot for the session repo and store it in SQLite.

  • UI: open the Artifacts tab (next to Terminal tabs) to view changed files + diff.
  • Storage: session_artifacts table.
  • Toggle: set AGENTS_FLEET_CAPTURE_GIT_ON_END=0 to disable capture.
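
The toggle could be read as in the sketch below. The function name is hypothetical, and treating every value other than "0" (including an unset variable) as enabled is an assumption; the README only documents that "0" disables capture.

```typescript
// Assumption: capture stays on unless the variable is exactly "0".
function shouldCaptureGitOnEnd(
  env: Record<string, string | undefined>,
): boolean {
  return env["AGENTS_FLEET_CAPTURE_GIT_ON_END"] !== "0";
}
```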

Resource metrics

Agents Fleet ships a scripts/metrics script that captures CPU, memory, swap, load, network I/O, disk I/O, open file descriptors, and SQLite DB size — scoped to AgentFleet processes only.

./scripts/metrics              # one-shot pretty print
./scripts/metrics --watch      # live refresh every 3s
./scripts/metrics --log        # continuous CSV log to data/metrics_<timestamp>.csv (every 5s)

To tail the log live while a session runs:

# Terminal 1
./scripts/metrics --log

# Terminal 2
tail -f data/metrics_<timestamp>.csv

Measured profile (Apple M4 Pro, 24GB RAM):

| Scenario | CPU% | Memory |
| --- | --- | --- |
| Idle (server + vite only) | 0–1% | ~165MB |
| Claude Shell session active | 1–4% | ~788MB |
| Claude SDK tool calls firing | 3–17% | 600–665MB |
| Git diff capture on stop | 12–47% | spike, clears fast |
| LiteLLM / Spend Analytics | 0–3% | ~165MB |

  • Baseline footprint is tiny — 165MB, <1% CPU when no agent is running
  • Memory is almost entirely the Claude/Codex process itself, not AgentFleet overhead
  • Git diff capture is the heaviest single event (~12% typical, up to 47% if multiple sessions stop together) — lasts <10s and clears cleanly
  • No memory leaks observed — memory returns to baseline after every session exits
  • Swap usage unchanged throughout — AgentFleet does not add swap pressure
  • Data dir grows ~3MB per SDK session — worth monitoring on frequent use

GPU utilization is not captured without sudo. On Apple Silicon (unified memory) run sudo powermetrics --samplers gpu_power separately if needed.

Scripts

  • pnpm dev:one installs deps if needed and runs dev for all workspaces (web + server).
  • pnpm dev runs dev for all workspaces (web + server) in parallel.
  • pnpm check runs lint + typecheck + test + build.
  • pnpm build builds all workspaces.
  • pnpm typecheck runs TypeScript checks across workspaces.

Tests

COREPACK_HOME="$PWD/.corepack" pnpm -C apps/server test

Notes

  • If you see Corepack cache permission errors, the COREPACK_HOME="$PWD/.corepack" prefix keeps Corepack’s cache inside the repo.
  • Node version: Node 20–24 is supported. Node 25+ is blocked by @homebridge/node-pty-prebuilt-multiarch (>=18 <25). Node 22 and 24 work fine.
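
A preflight check for the supported Node range could be sketched as follows. The function name is hypothetical and this is not part of the repository's scripts; the 20 to 24 range comes from the Prerequisites section.

```typescript
// Accepts version strings such as "v22.11.0" or "22.11.0".
function isSupportedNode(version: string): boolean {
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  // Supported majors per this README: 20 through 24 inclusive.
  return Number.isInteger(major) && major >= 20 && major < 25;
}
```

Passing `process.version` to this function before starting the server would give a clearer error than a failed native module install.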

Data location

  • SQLite DB: data/agents_fleet.sqlite (local only; do not commit).

Known limitations

  • PTY sessions do not preserve stdout/stderr separation.
  • Token/cost is an estimate unless the CLI provides actual usage.
  • Some TUIs (notably Claude) may clear/restore the alternate screen on exit. The persisted replay is a faithful stream replay, so end-of-session scrollback may differ from what you remember seeing just before exit.
  • No multi-line input in the terminal pane. The xterm.js terminal forwards keystrokes directly to the PTY; the shell owns the line and executes on Enter. Shift+Enter, Ctrl+Enter, etc. all behave the same as plain Enter — there is no way to insert a newline without submitting at the terminal protocol level. Workarounds inside the shell: end a line with \ for continuation, or use $'line1\nline2' quoting. Inside Claude Code's TUI specifically, Option+Enter inserts a newline in the prompt. For free-form multi-line composition, use the Claude (SDK) or LiteLLM chat tabs instead, where Shift+Enter works as expected.

About

AgentFleet is a local-first "mission control" for AI coding agent CLIs and other shell commands. It provides a web UI to launch, monitor, and manage sessions within a repository, streaming live output and persisting a history of sessions and logs to SQLite. It supports interactive agents and budget enforcement.
