Observability and Explainability for AI Agents
Datadog meets Chain-of-Thought — for autonomous agents
Getting Started · Features · SDK Reference · Dashboard · Architecture · Contributing · 📖 Full Docs · 🎯 Live Demo
AgentLens gives you full visibility into what your AI agents are doing, why they're doing it, and how much it costs. As AI agents become more autonomous — making decisions, calling tools, chaining actions — you need to see inside the black box.
AgentLens provides:
- Session-level tracing for every agent run
- Token and cost tracking across models and calls
- Decision traces capturing why an agent made each choice
- Human-readable explanations of agent behavior
- A real-time dashboard to monitor everything visually
| Feature | LangSmith | Helicone | Weights & Biases | AgentLens |
|---|---|---|---|---|
| Self-hosted | ❌ | ❌ | ❌ | ✅ |
| Zero external dependencies | ❌ | ❌ | ❌ | ✅ |
| Decision-level explainability | ❌ | ❌ | ❌ | ✅ |
| Built-in anomaly detection | ❌ | ❌ | ❌ | ✅ |
| Session comparison & diff | ❌ | ❌ | ❌ | ✅ |
| Cost forecasting | ❌ | Partial | ❌ | ✅ |
| No vendor lock-in | ❌ | ❌ | ❌ | ✅ |
| Free & open source | ❌ | Partial | ❌ | ✅ |
AgentLens runs entirely on your infrastructure — SQLite for storage, no cloud dependencies, no data leaving your network.
| Feature | Description |
|---|---|
| 📊 Session Tracking | Group agent actions into sessions with full execution traces |
| 🛠️ Tool Call Capture | Record every tool invocation with inputs, outputs, and duration |
| 💰 Token Usage | Track token consumption and costs across models |
| 🧠 Decision Traces | Capture the reasoning behind each agent decision |
| 📈 Visual Timeline | Interactive timeline view of agent actions in the dashboard |
| 💡 Explainability | Generate human-readable summaries of agent behavior |
| 🎨 Decorators | Zero-config instrumentation with Python decorators |
| 📈 Analytics Dashboard | Aggregate stats, model usage, hourly activity heatmap, sessions-over-time |
| ⚖️ Session Comparison | Compare two sessions side-by-side — token deltas, event breakdowns, tool usage diffs |
| 💲 Cost Estimation | Configurable model pricing, per-session/event cost tracking, cost breakdown dashboard |
| 🔔 Alert Rules | Configurable alert rules with metric thresholds and event triggers |
| 🏷️ Session Tags | Tag sessions for filtering, organization, and retention exemption |
| 📝 Annotations | Timestamped notes on sessions and events for auditing |
| 🗄️ Data Retention | Configurable retention policies with auto-purge and exempt tags |
| 🔍 Event Search | Rich filtering across sessions — by type, model, tokens, duration |
| 🔬 Anomaly Detection | Z-score statistical analysis to detect latency spikes, token surges, error bursts |
| 🏥 Health Scoring | Grade sessions A–F based on error rates, latency, tool failures |
| 💸 Cost Budgets | Per-agent and global spending limits with real-time tracking, warnings, and overage detection |
| 📖 Session Narratives | Auto-generate human-readable summaries of agent session behavior |
| 🏆 Agent Scorecards | Per-agent performance grading with composite scores and letter grades |
| 🔮 Cost Forecasting | Budget projections with what-if simulator and model breakdown |
| 📊 Token Heatmap | Calendar-style visualization of token consumption patterns |
| ⏱️ Trace Waterfall | Interactive Gantt-style event visualization for session traces |
| 🔄 Session Diff | Side-by-side visual comparison of two agent sessions |
| ❌ Error Analytics | Error grouping by type, agent, and model with trend analysis |
| 🎯 Command Center | Unified activity feed aggregating alerts, anomalies, budget warnings, and health signals |
| 📋 SLA Compliance | Track SLA targets with compliance rings, violation alerts, and history |
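The anomaly detection feature above is based on z-score analysis. As a rough illustration of the underlying technique (not AgentLens's actual implementation), a latency spike can be flagged when a value sits more than a threshold number of standard deviations from the session mean:

```python
import statistics

def zscore_outliers(latencies_ms, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(latencies_ms)
    stdev = statistics.pstdev(latencies_ms)
    if stdev == 0:
        return []  # No variance means nothing can be an outlier
    return [x for x in latencies_ms if abs(x - mean) / stdev > threshold]

# A latency spike stands out against an otherwise stable baseline
latencies = [120, 130, 125, 118, 122, 127, 900]
print(zscore_outliers(latencies, threshold=2.0))  # → [900]
```

A single extreme value inflates the standard deviation, which is why a looser threshold (2σ) is used here for a small sample; AgentLens exposes warning and critical thresholds for the same reason.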
```
┌──────────────┐      HTTP POST      ┌──────────────────┐    SQLite    ┌──────────┐
│  Your Agent  │ ──────────────────► │  AgentLens API   │ ───────────► │    DB    │
│    + SDK     │       /events       │   (Express.js)   │              └──────────┘
└──────────────┘                     └────────┬─────────┘
                                              │ REST API
                                     ┌────────┴─────────┐
                                     │    Dashboard     │
                                     │  (HTML/CSS/JS)   │
                                     └──────────────────┘
```
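The SDK-to-backend hop in the diagram is a plain HTTP POST. As a minimal sketch of that flow — the payload schema and auth header here are assumptions for illustration, not the documented wire format — an event batch might be posted like this:

```python
import json
import urllib.request

def build_event_batch(session_id, events):
    """Assemble a batch payload (shape is illustrative, not the documented format)."""
    return {"session_id": session_id, "events": events}

def post_events(endpoint, api_key, session_id, events):
    """POST a batch of events to the backend's /events ingestion endpoint."""
    data = json.dumps(build_event_batch(session_id, events)).encode()
    req = urllib.request.Request(
        f"{endpoint}/events",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In practice the SDK handles batching for you; this only shows why any language with an HTTP client can feed the backend.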
| Component | Directory | Tech Stack |
|---|---|---|
| Python SDK | `sdk/` | Python 3.9+, Pydantic, httpx |
| Backend API | `backend/` | Node.js, Express, better-sqlite3 |
| Dashboard | `dashboard/` | Vanilla HTML/CSS/JS (no build step) |
- Python 3.9+ (for the SDK)
- Node.js 18+ (for the backend)
- npm (comes with Node.js)
```bash
git clone https://github.com/sauravbhattacharya001/agentlens.git
cd agentlens
```

```bash
cd backend
npm install
node seed.js    # Load demo data (optional)
node server.js  # Starts on http://localhost:3000
```

The dashboard is served automatically at http://localhost:3000.
```bash
pip install agentlens
```

Or install from source for development:

```bash
cd sdk
pip install -e .
```

After installing the SDK, you get the `agentlens` command:
```bash
# Check backend connectivity
agentlens status

# List recent sessions
agentlens sessions --limit 10

# View cost breakdown for a session
agentlens costs <session_id>

# Search events by type or model
agentlens events --type llm_call --model gpt-4

# Export a session to JSON or CSV
agentlens export <session_id> --format csv -o report.csv

# Health score for a session (A–F grading)
agentlens health <session_id>

# Compare two sessions side-by-side
agentlens compare <session_a> <session_b>

# View aggregate analytics
agentlens analytics

# List recent alerts
agentlens alerts

# Generate incident postmortem for a session
agentlens postmortem <session_id>

# List sessions eligible for postmortem analysis
agentlens postmortem --candidates --min-errors 3

# Live session leaderboard
agentlens top

# Live-follow session events
agentlens tail <session_id>

# Generate time-range summary report
agentlens report --from 2024-01-01 --to 2024-01-31

# Generate interactive HTML flamegraph for a session
agentlens flamegraph <session_id> -o profile.html --open

# Print flamegraph statistics without generating HTML
agentlens flamegraph <session_id> --stats

# Generate self-contained HTML dashboard with interactive charts
agentlens dashboard --limit 200 -o dashboard.html --open

# Evaluate sessions against SLA policies
agentlens sla --policy production --limit 100

# Custom SLA targets with verbose output
agentlens sla --latency 2000 --error-rate 5 --token-budget 8000 --slo 95 --verbose

# SLA compliance as JSON for CI/CD pipelines
agentlens sla --policy production --json
```

📖 Full CLI reference: See docs/CLI.md for all 50+ commands with options and examples.
Configure via environment variables:
```bash
export AGENTLENS_ENDPOINT=http://localhost:3000
export AGENTLENS_API_KEY=your-key
```

Or pass `--endpoint` and `--api-key` flags to any command.
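Flags take precedence over environment variables. A sketch of that resolution order (illustrative, not the SDK's actual code):

```python
import os

DEFAULT_ENDPOINT = "http://localhost:3000"

def resolve_endpoint(flag_value=None):
    """Resolve the backend URL: CLI flag > AGENTLENS_ENDPOINT env var > default."""
    if flag_value:
        return flag_value
    return os.environ.get("AGENTLENS_ENDPOINT", DEFAULT_ENDPOINT)
```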
```python
import agentlens

# Initialize the SDK
agentlens.init(api_key="your-key", endpoint="http://localhost:3000")

# Start a tracking session
session = agentlens.start_session(agent_name="my-agent")

# Track events manually
agentlens.track(
    event_type="llm_call",
    input_data={"prompt": "What is 2+2?"},
    output_data={"response": "4"},
    model="gpt-4",
    tokens_in=12,
    tokens_out=3,
    reasoning="Simple arithmetic question, answered directly",
)

# Get a human-readable explanation
print(agentlens.explain())

# End the session
agentlens.end_session()
```

```bash
cd sdk/examples
python mock_agent.py
# Then open http://localhost:3000 to see the results
```

```python
import agentlens

# Connect to your AgentLens backend
tracker = agentlens.init(
    api_key="your-key",                # API key for authentication
    endpoint="http://localhost:3000",  # Backend URL
)
```

```python
# Start a session
session = agentlens.start_session(
    agent_name="my-agent",    # Name of the agent
    metadata={"env": "prod"}  # Optional metadata
)

# End the session (flushes all pending events)
agentlens.end_session()
```

```python
event = agentlens.track(
    event_type="llm_call",         # Event type: llm_call, tool_call, generic
    input_data={"prompt": "..."},  # Input to the operation
    output_data={"text": "..."},   # Output from the operation
    model="gpt-4",                 # Model used (if applicable)
    tokens_in=100,                 # Input tokens
    tokens_out=50,                 # Output tokens
    reasoning="...",               # Why the agent made this decision
    tool_name="search",            # Tool name (for tool calls)
    tool_input={"query": "..."},   # Tool input
    tool_output={"results": []},   # Tool output
    duration_ms=1500.0,            # Execution duration in ms
)
```

```python
from agentlens import track_agent, track_tool_call

@track_agent(model="gpt-4")
def my_agent(prompt):
    """Automatically tracked — captures input, output, and timing."""
    return call_llm(prompt)

@track_tool_call(tool_name="web_search")
def search(query):
    """Automatically tracked — captures tool input/output."""
    return do_search(query)
```

```python
# Get a human-readable explanation of agent behavior
explanation = agentlens.explain()
print(explanation)
# Output: "The agent received a question about arithmetic.
# It called GPT-4 which responded with '4'.
# Total tokens used: 15 (12 in, 3 out)."
```

```python
# Compare two sessions side-by-side
result = agentlens.compare_sessions(
    session_a="abc123",
    session_b="def456",
)

# Result includes metrics, deltas, and shared breakdowns
print(f"Token delta: {result['deltas']['total_tokens']['percent']}%")
print(f"Session A events: {result['session_a']['event_count']}")
print(f"Session B events: {result['session_b']['event_count']}")
print(f"Shared tools: {result['shared']['tools']}")
```

```python
# Get cost breakdown for the current session
costs = agentlens.get_costs()
print(f"Total cost: ${costs['total_cost']:.4f}")
print(f"Input cost: ${costs['total_input_cost']:.4f}")
print(f"Output cost: ${costs['total_output_cost']:.4f}")

# Per-model breakdown
for model, mc in costs['model_costs'].items():
    print(f"  {model}: ${mc['total_cost']:.4f} ({mc['calls']} calls)")

# View/update model pricing (per 1M tokens, USD)
pricing = agentlens.get_pricing()
print(pricing['pricing'])  # Current pricing config

# Set custom pricing
agentlens.set_pricing({
    "my-custom-model": {
        "input_cost_per_1m": 5.00,
        "output_cost_per_1m": 15.00,
    }
})
```

```python
# Search events with rich filtering
results = tracker.search_events(
    q="error",                     # Full-text search
    event_type="tool_call",        # Filter by type
    model="gpt-4",                 # Filter by model
    min_tokens=100,                # Minimum token count
    has_tools=True,                # Only events with tool calls
    after="2024-01-01T00:00:00Z",  # Date range
    limit=50,                      # Max results
)
for event in results["events"]:
    print(f"{event['event_type']}: {event.get('model', 'N/A')}")
```

```python
# Add tags to the current session
tracker.add_tags(["production", "v2.0", "critical"])

# Remove specific tags
tracker.remove_tags(["v2.0"])

# Get tags for a session
tags = tracker.get_tags()

# List all tags across sessions
all_tags = tracker.list_all_tags()

# Find sessions by tag
sessions = tracker.list_sessions_by_tag("production")
```

```python
# Annotate a session with timestamped notes
tracker.annotate(
    "Latency spike detected at step 5",
    annotation_type="warning",
    author="monitoring-bot",
)
tracker.annotate(
    "Reached goal state",
    annotation_type="milestone",
)

# Retrieve annotations
annotations = tracker.get_annotations(annotation_type="warning")
for ann in annotations["annotations"]:
    print(f"[{ann['type']}] {ann['text']}")

# Update or delete annotations
tracker.update_annotation("ann-id-123", text="Updated note")
tracker.delete_annotation("ann-id-456")
```

```python
# Create an alert rule
tracker.create_alert_rule(
    name="High Error Rate",
    metric="error_rate",
    condition="gt",
    threshold=0.1,
    description="Fires when error rate exceeds 10%",
)

# List and evaluate rules
rules = tracker.list_alert_rules()
alerts = tracker.evaluate_alerts()  # Check all rules against recent data
alert_events = tracker.get_alert_events(limit=20)
```

```python
from agentlens import AnomalyDetector, AnomalyDetectorConfig

config = AnomalyDetectorConfig(
    warning_threshold=2.0,   # 2σ = warning
    critical_threshold=3.0,  # 3σ = critical
)
detector = AnomalyDetector(config)

# Analyze a session for anomalies
report = detector.analyze(session_events)
print(f"Found {len(report.anomalies)} anomalies")
for anomaly in report.anomalies:
    print(f"  [{anomaly.severity.value}] {anomaly.kind.value}: {anomaly.description}")
```

```python
from agentlens import HealthScorer, HealthThresholds

scorer = HealthScorer()
report = scorer.score(session_events)
print(f"Overall: {report.overall_grade.value} ({report.overall_score:.0f}/100)")
for metric in report.metrics:
    print(f"  {metric.name}: {metric.grade.value} ({metric.score:.0f}/100)")
```

```python
# Configure retention policy
tracker.set_retention_config(
    max_age_days=30,             # Delete sessions older than 30 days
    max_sessions=10000,          # Keep max 10k sessions
    exempt_tags=["production"],  # Never delete production sessions
    auto_purge=True,             # Enable automatic cleanup
)

# Preview what would be purged
preview = tracker.purge(dry_run=True)
print(preview["message"])

# Actually purge
result = tracker.purge()
print(f"Purged {result['purged_sessions']} sessions")
```

| Model | Description |
|---|---|
| `AgentEvent` | A single observable event (LLM call, tool use, decision) |
| `ToolCall` | A tool/function invocation with input and output |
| `DecisionTrace` | The reasoning behind an agent's decision |
| `Session` | A collection of events for one agent run |
| `AlertRule` | A configurable alert rule with metric and threshold |
| `Anomaly` | A detected statistical anomaly in session metrics |
| `HealthReport` | Graded health assessment of a session (A–F) |
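The real SDK models are Pydantic classes (see the tech stack below). As a rough stdlib sketch of the shapes involved — field names beyond those shown in the table and SDK examples are assumptions — the core models relate like this:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToolCall:
    """A tool/function invocation with input and output."""
    tool_name: str
    tool_input: dict = field(default_factory=dict)
    tool_output: dict = field(default_factory=dict)

@dataclass
class AgentEvent:
    """A single observable event (LLM call, tool use, decision)."""
    event_type: str                        # "llm_call", "tool_call", or "generic"
    model: Optional[str] = None
    tokens_in: int = 0
    tokens_out: int = 0
    reasoning: Optional[str] = None        # The decision trace behind this event
    tool_call: Optional[ToolCall] = None

@dataclass
class Session:
    """A collection of events for one agent run."""
    agent_name: str
    events: list = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(e.tokens_in + e.tokens_out for e in self.events)
```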
The dashboard provides a real-time view of your agent sessions:
- Sessions List — Filter by status (active, completed, error)
- Session Comparison — Select two sessions and compare side-by-side with visual diffs
- Analytics Overview — Click 📈 Analytics to see aggregate stats, model usage, hourly activity, and top agents
- Timeline View — Interactive timeline of every event in a session
- Token Charts — Per-event and cumulative token usage visualization
- Explain Tab — Human-readable behavior summaries
- Costs Tab — Per-event and per-model cost breakdowns, cumulative cost chart, configurable model pricing
- Cost Forecast — Budget projections with what-if simulator and model breakdown
- Agent Scorecards — Per-agent performance grading with composite scores, letter grades, and sparkline trends
- Token Heatmap — Calendar-style visualization of daily token consumption
- Trace Waterfall — Gantt-style visualization of event timing within a session
- Session Diff Viewer — Side-by-side comparison of two sessions with event-level diffs
- Error Analytics — Error grouping by type, agent, and model with trends
- SLA Compliance — Compliance rings, violation alerts, and history charts
The dashboard is a lightweight HTML/CSS/JS app served directly by the backend — no build step required.
The backend exposes a comprehensive REST API with 80+ endpoints across 18 route groups:
| Route Group | Endpoints | Description |
|---|---|---|
| Sessions | 8 | CRUD, search, explain, export, compare |
| Events | 1 | Batch event ingestion (up to 500/call) |
| Analytics | 4 | Aggregate stats, performance, heatmaps, cache |
| Pricing & Costs | 4 | Model pricing config, per-session cost calculation |
| Alerts | 8 | Alert rules CRUD, evaluation, acknowledgment |
| Webhooks | 6 | Webhook CRUD, test delivery, delivery history |
| Correlations | 10 | Correlation rules, groups, event correlations |
| Correlation Scheduler | 6 | SSE stream, schedule management, scheduler control |
| Tags | 5 | Session tagging, tag-based filtering |
| Bookmarks | 4 | Session bookmarking |
| Annotations | 5 | Timestamped notes on sessions and events |
| Baselines | 5 | Agent performance baselines and drift detection |
| Error Analysis | 5 | Error grouping by type, agent, model with trends |
| Dependencies | 5 | Service dependency graph, co-occurrence, critical paths |
| Leaderboard | 1 | Agent performance ranking |
| Postmortem | 2 | Incident report generation and candidate listing |
| Retention | 4 | Retention config, stats, manual purge |
| Health | 1 | Health check |
📖 Full API reference with request/response examples: docs/API.md
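As with the SDK, the REST API can be consumed directly from any HTTP client. A minimal sketch — the `/sessions` path, query parameters, and auth scheme here are assumptions inferred from the route table, so check docs/API.md for the real shapes:

```python
import json
import urllib.parse
import urllib.request

def sessions_url(endpoint, **params):
    """Build a query URL against the sessions route group."""
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    return f"{endpoint}/sessions" + (f"?{query}" if query else "")

def list_sessions(endpoint, api_key, limit=10):
    """Fetch recent sessions as JSON (paths and auth are illustrative)."""
    req = urllib.request.Request(
        sessions_url(endpoint, limit=limit),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```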
- Python SDK: Pydantic for data validation, httpx for async HTTP
- Backend: Express.js with better-sqlite3 for zero-config persistence
- Dashboard: Vanilla JS with Canvas-based charts (no framework dependencies)
- Database: SQLite (embedded, no external DB setup needed)
Contributions are welcome! Here's how to get started:
- Fork the repo
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests: `cd sdk && pytest`
- Commit (`git commit -m 'Add amazing feature'`)
- Push (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Backend (with auto-reload)
cd backend && npm install && node server.js

# SDK (editable install with dev deps)
cd sdk && pip install -e ".[dev]"

# Run SDK tests
cd sdk && pytest
```

MIT — see LICENSE for details.
Built by Saurav Bhattacharya
Because if you can't see what your agents are doing, you can't trust them.