XaresAICoder automatically captures and logs all LLM API conversations from workspaces, enabling automatic documentation generation and learning reflection.
- Overview
- Architecture
- Features
- How It Works
- Using the Feature
- Documentation Types
- API Endpoints
- Configuration
- Technical Details
- Privacy & Security
- Troubleshooting
This feature transparently captures all LLM API traffic from workspace containers through the mitmproxy-based network proxy. Captured conversations include:
- Full request prompts (including system prompts, user messages, tool calls)
- Complete AI responses (text, streaming SSE responses, tool outputs)
- Token usage and cost metrics
- Model information and parameters
- Timestamps and metadata
The captured data enables:
- Automatic Documentation: Generate markdown documentation from AI coding sessions
- Learning Reflection: Students can review their AI-assisted development process
- Cost Tracking: Monitor token usage across projects
- Debugging: Investigate AI behavior and prompt engineering
┌─────────────────────┐
│ Workspace Container │
│ (with AI tools) │
└──────────┬───────────┘
│ HTTP/HTTPS
▼
┌──────────────────────────┐
│ mitmproxy-logger │
│ - SSL interception │
│ - Domain recording │ ← ALL requests
│ - LLM request/response │ ← LLM API calls only
│ body capture │
│ - SSE parsing │
└──────────┬───────────────┘
│ Writes JSON logs
▼
┌──────────────────────────┐
│ /var/log/mitmproxy/ │
│ ├── llm_conversations/ │ (LLM API calls)
│ │ ├── 172.30.0.5/ │
│ │ │ ├── 2025-*.json│
│ │ │ └── ... │
│ │ └── 172.30.0.6/ │
│ └── domains/ │ (ALL accessed domains)
│ ├── 172.30.0.5.json │
│ └── 172.30.0.6.json │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ Backend API │
│ - Retrieve LLM logs │
│ - Generate docs │
│ - Get recorded domains │
│ - Apply as whitelist │
└──────────────────────────┘
- ✅ Captures all LLM API calls (OpenAI, Anthropic, Google, OpenCode, etc.)
- ✅ Supports streaming responses (Server-Sent Events)
- ✅ Per-workspace organization by container IP
- ✅ No configuration required in workspaces
- ✅ Works with all pre-installed AI tools
- ✅ Records ALL accessed domains (not just LLM APIs)
- ✅ Per-workspace tracking with hit count, first/last seen timestamps
- ✅ Auto-categorization (Package Managers, AI APIs, Documentation, etc.)
- ✅ Apply recorded domains as Security Proxy whitelist
- ✅ Survives mitmproxy restarts (persistent storage)
- ✅ Two documentation types: Clean and Detailed
- ✅ Markdown format for easy reading/sharing
- ✅ Token usage summaries
- ✅ Conversation history with timestamps
- ✅ Downloadable documentation files
- ✅ View all conversations in browser
- ✅ Filter by model, date range
- ✅ Delete individual conversations
- ✅ Clear all workspace conversations
- ✅ Search and pagination support
When a workspace has proxy enabled:
# Environment variables set in container
HTTP_PROXY=http://mitmproxy-logger:8080
HTTPS_PROXY=http://mitmproxy-logger:8080
All HTTP/HTTPS traffic routes through mitmproxy, which:
- Intercepts LLM API calls to known domains
- Captures request body (prompts, parameters)
- Captures response body (AI output, token usage)
- Parses API-specific formats (JSON, SSE)
- Writes complete conversation to JSON file
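The interception flow above can be sketched in pure Python (an illustration only, not the actual llm-logger.py source; the domain subset and function names are assumptions):

```python
import json
from datetime import datetime, timezone

# Hypothetical subset of the monitored LLM API domains
LLM_DOMAINS = {"api.openai.com", "api.anthropic.com", "generativelanguage.googleapis.com"}

def is_llm_host(host: str) -> bool:
    """Match exact monitored domains or any of their subdomains."""
    return any(host == d or host.endswith("." + d) for d in LLM_DOMAINS)

def build_log_record(client_ip: str, method: str, url: str, host: str, request_body: str):
    """Assemble the JSON record written for an intercepted LLM call.
    Returns None for non-LLM traffic (only the domain gets recorded there)."""
    if not is_llm_host(host):
        return None
    return {
        "client_ip": client_ip,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "url": url,
        "body": request_body,
    }
```

In a real mitmproxy addon, this logic would run inside the `response()` hook, with the record then serialized to the per-IP log directory.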
The logger monitors these domains:
- api.openai.com - OpenAI (ChatGPT, GPT-4, etc.)
- api.anthropic.com - Anthropic (Claude)
- generativelanguage.googleapis.com - Google (Gemini)
- api.google.dev - Google AI
- api.opencode.ai - OpenCode
- opencode.ai - OpenCode free tier
- api.huggingface.co - Hugging Face
For streaming APIs (Claude Code, Claude API):
event: message_start
data: {"type":"message_start","message":{"id":"msg_123","model":"claude-3-5-sonnet-20241022"}}
event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Here"}}
event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" is"}}
The logger:
- Detects SSE format (lines starting with event:)
- Parses each data: line as JSON
- Extracts text from content_block_delta events
- Combines chunks into complete response
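The chunk-combining step can be sketched as a small parser (a simplified illustration, not the actual llm-logger.py code):

```python
import json

def combine_sse_text(raw_body: str) -> str:
    """Reassemble the streamed assistant text from an Anthropic-style SSE body."""
    chunks = []
    for line in raw_body.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore non-JSON data lines such as "[DONE]"
        if payload.get("type") == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                chunks.append(delta.get("text", ""))
    return "".join(chunks)
```

Fed the example stream above, this returns "Here is".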
Each conversation is stored as:
{
"client_ip": "172.30.0.5",
"timestamp": "2025-01-15T10:30:45.123Z",
"method": "POST",
"url": "https://api.anthropic.com/v1/messages",
"headers": {...},
"body": "{\"model\":\"claude-3-5-sonnet-20241022\",\"messages\":[...]}",
"parsed_request": {
"model": "claude-3-5-sonnet-20241022",
"messages": [...],
"max_tokens": 4096,
"temperature": 1.0
},
"response": {
"status_code": 200,
"headers": {...},
"body": "event: message_start\ndata: {...}",
"timestamp": "2025-01-15T10:30:47.456Z"
},
"parsed_response": {
"id": "msg_abc123",
"model": "claude-3-5-sonnet-20241022",
"content": [{"type": "text", "text": "Here is the code..."}],
"usage": {
"input_tokens": 1234,
"output_tokens": 567
}
}
}
1. Enable proxy for workspace (required for logging)
   - Check "Use Network Proxy" when creating workspace
   - Or enable globally with ENABLE_PROXY=true in .env
2. Open AI Conversations Modal
   - Look for network icon (⊕) next to workspace name
   - Click document icon (📄) or "AI Conversations" button
   - View list of all captured conversations
3. Browse Conversations
   - Click conversation headers to expand/collapse
   - View full request/response JSON
   - See timestamps, models, token usage
Method 1: From Modal
- Open AI Conversations modal
- Click 📄 Clean Docs or 📋 Detailed Docs
- Confirm generation
- Documentation downloads automatically
Method 2: Via API
curl -X POST http://localhost/api/projects/{projectId}/generate-documentation \
-H "Content-Type: application/json" \
  -d '{"format":"markdown","type":"clean"}'
Delete Individual Conversation:
- Open AI Conversations modal
- Click delete icon (🗑️) next to conversation
- Confirm deletion
Clear All Conversations:
- Open AI Conversations modal
- Click "Clear All" button
- Confirm deletion
Via API:
# Delete all
curl -X DELETE http://localhost/api/projects/{projectId}/llm-conversations
# Delete specific conversation
curl -X DELETE http://localhost/api/projects/{projectId}/llm-conversations/{timestamp}
In addition to LLM conversation capture, mitmproxy records every domain accessed through the proxy. This enables teachers to discover which domains a project needs and to create a targeted whitelist.
What gets recorded:
- Domain name (not full URLs) for every HTTP/HTTPS request
- Hit count per domain
- First seen and last seen timestamps
- Organized per workspace IP address
Storage format (/var/log/mitmproxy/domains/{client_ip}.json):
{
"pypi.org": {"count": 15, "first_seen": "2026-02-14T10:00:00Z", "last_seen": "2026-02-14T11:30:00Z"},
"api.anthropic.com": {"count": 3, "first_seen": "2026-02-14T10:05:00Z", "last_seen": "2026-02-14T11:00:00Z"},
"github.com": {"count": 8, "first_seen": "2026-02-14T10:02:00Z", "last_seen": "2026-02-14T11:25:00Z"}
}
Recording behavior:
- New domains written to disk immediately
- Hit counts and last_seen flushed to disk every 60 seconds
- Existing data loaded from disk on mitmproxy startup (survives restarts)
- Domain tracking happens in both request() and response() hooks
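The recording behavior described above can be sketched as follows (an illustration of the stated policy; the class and method names are assumptions, not the real addon code):

```python
import json
import time
from pathlib import Path

class DomainRecorder:
    """Per-client domain stats: new domains are written to disk immediately,
    count/last_seen updates are flushed periodically (60 s in the doc)."""
    FLUSH_INTERVAL = 60  # seconds

    def __init__(self, log_dir: str):
        self.log_dir = Path(log_dir)
        self.stats = {}  # client_ip -> {domain: {count, first_seen, last_seen}}
        self._last_flush = time.monotonic()

    def record(self, client_ip: str, domain: str, now_iso: str) -> None:
        per_client = self.stats.setdefault(client_ip, {})
        entry = per_client.get(domain)
        if entry is None:
            per_client[domain] = {"count": 1, "first_seen": now_iso, "last_seen": now_iso}
            self._flush_client(client_ip)  # new domain: persist immediately
        else:
            entry["count"] += 1
            entry["last_seen"] = now_iso
            if time.monotonic() - self._last_flush >= self.FLUSH_INTERVAL:
                self._flush_client(client_ip)  # periodic flush of updated counts

    def _flush_client(self, client_ip: str) -> None:
        self.log_dir.mkdir(parents=True, exist_ok=True)
        path = self.log_dir / f"{client_ip}.json"
        path.write_text(json.dumps(self.stats[client_ip], indent=2))
        self._last_flush = time.monotonic()
```

Loading the existing JSON files back into `self.stats` at startup would give the restart persistence the document describes.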
- Create a workspace with LLM Logging Proxy mode
- Work in the workspace (install packages, use AI tools, browse docs)
- Click the globe icon next to the workspace name
- View domains grouped by auto-detected category:
- Package Managers: pypi.org, npmjs.org, maven.org, etc.
- AI APIs: openai.com, anthropic.com, googleapis.com, etc.
- Documentation: docs.python.org, stackoverflow.com, etc.
- Version Control: github.com, gitlab.com
- System: debian.org, ubuntu.com
- Other: everything else
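The auto-categorization can be illustrated with simple suffix matching (the rules below are hypothetical examples, not the server's actual mapping):

```python
# Illustrative category rules keyed by domain suffix
CATEGORY_SUFFIXES = {
    "Package Managers": ("pypi.org", "npmjs.org", "maven.org"),
    "AI APIs": ("openai.com", "anthropic.com", "googleapis.com"),
    "Documentation": ("docs.python.org", "stackoverflow.com"),
    "Version Control": ("github.com", "gitlab.com"),
    "System": ("debian.org", "ubuntu.com"),
}

def categorize(domain: str) -> str:
    """Return the first category whose suffix matches the domain, else 'Other'."""
    for category, suffixes in CATEGORY_SUFFIXES.items():
        if any(domain == s or domain.endswith("." + s) for s in suffixes):
            return category
    return "Other"
```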
- In the Recorded Domains modal, check/uncheck domains as needed
- Click "Apply as Security Proxy Whitelist"
- The selected domains are sent to PUT /api/whitelist
- The server merges them with base defaults, normalizes them to squid format, and reconfigures squid
- New Security Proxy workspaces immediately use the updated whitelist
Via API:
# View recorded domains for a workspace
curl http://localhost/api/projects/{projectId}/recorded-domains
# Apply domains as whitelist
curl -X PUT http://localhost/api/whitelist \
-H "Content-Type: application/json" \
-d '{"domains": ["pypi.org", "api.anthropic.com", "github.com", "registry.npmjs.org"]}'
# View current whitelist
curl http://localhost/api/whitelist
Purpose: Easy-to-read conversation summaries for sharing and review
Includes:
- User messages only
- AI assistant responses only
- Token usage summary
- Model information
- Timestamps
Excludes:
- System prompts
- Tool calls and results
- Internal system messages
- <system-reminder> tags
Example Output:
# AI Coding Session Documentation (Clean)
**Generated:** 2025-01-15T14:30:00.000Z
**Total Conversations:** 5
**Type:** Clean (User/Assistant conversation only)
## Summary
- **claude-3-5-sonnet-20241022**: 3 conversations, 12,456 tokens
- **gpt-4**: 2 conversations, 8,234 tokens
## Conversation History
### Conversation 1
**Time:** 1/15/2025, 2:30:45 PM
**Model:** claude-3-5-sonnet-20241022
#### Request
**user:** Please help me implement user authentication
#### Response
I'll help you implement user authentication...
**Tokens:** Prompt: 234, Completion: 567, Total: 801
Best for:
- Project summaries
- Sharing with team members
- Student learning reflection
- Quick conversation review
Purpose: Complete technical details for debugging and analysis
Includes:
- ALL messages (system, user, assistant, tool)
- System prompts (truncated at 5000 chars)
- Request parameters (max_tokens, temperature)
- Response metadata (status codes, IDs)
- Complete token usage breakdown
- Tool use indicators
- Message numbering and role labels
Example Output:
# AI Coding Session Documentation (Detailed)
**Generated:** 2025-01-15T14:30:00.000Z
**Total Conversations:** 5
**Type:** Detailed (Complete technical details)
## Conversation History
### Conversation 1
**Time:** 1/15/2025, 2:30:45 PM
**Model:** claude-3-5-sonnet-20241022
**Endpoint:** https://api.anthropic.com/v1/messages
**Request ID:** msg_abc123
#### Request
**Message 1 (system):** You are Claude Code, Anthropic's official CLI... (5000+ char system prompt)
**Message 2 (user):**
Please help me implement user authentication
**Request Parameters:**
- Max Tokens: 4096
- Temperature: 1.0
#### Response
I'll help you implement user authentication...
**Token Usage:**
- Prompt: 1234
- Completion: 567
- Total: 1801
**Response Metadata:**
- Status Code: 200
- Response Time: 2025-01-15T14:30:47.456Z
Best for:
- Debugging AI behavior
- Prompt engineering analysis
- Cost tracking and optimization
- Understanding system prompts
- Technical troubleshooting
GET /api/projects/:projectId/llm-conversations
Query Parameters:
- limit (default: 100) - Number of conversations to return
- offset (default: 0) - Pagination offset
- model - Filter by model name
- dateFrom - Filter conversations after date (ISO 8601)
- dateTo - Filter conversations before date (ISO 8601)
Response:
{
"success": true,
"projectId": "abc-123",
"ipAddress": "172.30.0.5",
"conversations": [...],
"count": 50
}
POST /api/projects/:projectId/generate-documentation
Request Body:
{
"format": "markdown",
"type": "clean"
}
Parameters:
- format: markdown or json (default: markdown)
- type: clean or detailed (default: clean)
Response:
{
"success": true,
"projectId": "abc-123",
"format": "markdown",
"type": "clean",
"documentation": "# AI Coding Session Documentation...",
"conversationCount": 50
}
DELETE /api/projects/:projectId/llm-conversations
Response:
{
"success": true,
"message": "All conversations deleted successfully"
}
DELETE /api/projects/:projectId/llm-conversations/:timestamp
Parameters:
- timestamp: Conversation timestamp (from filename, e.g., 2025-01-15T10-30-45-123Z)
Response:
{
"success": true,
"message": "Conversation deleted successfully"
}
.env file:
# Enable proxy feature globally (affects all new workspaces)
ENABLE_PROXY=true
# LLM logging is always enabled when proxy is enabled
# No separate toggle needed
Each workspace can individually enable or disable the proxy:
- With proxy: Conversations are logged
- Without proxy: No logging, direct internet access
To enable for specific workspace:
- Check "Use Network Proxy" during creation
- Or use API:
{"useProxy": true}
Conversations are stored indefinitely. To manage storage:
Manual cleanup:
# Delete all conversations for workspace
curl -X DELETE http://localhost/api/projects/{projectId}/llm-conversations
Automated cleanup (future):
- Environment variable: LLM_LOG_RETENTION_DAYS=30
- Cron job to delete old logs (not yet implemented)
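The planned cleanup job could look roughly like this (a sketch of the not-yet-implemented feature; the function name and directory layout assumptions are illustrative):

```python
import time
from pathlib import Path

def delete_old_logs(base_dir: str, retention_days: int) -> int:
    """Delete conversation files older than the retention window.
    Expects the documented layout: {base_dir}/{client_ip}/{timestamp}.json."""
    cutoff = time.time() - retention_days * 86400
    deleted = 0
    for path in Path(base_dir).glob("*/*.json"):
        if path.stat().st_mtime < cutoff:  # file modification time as age proxy
            path.unlink()
            deleted += 1
    return deleted
```

A cron entry inside the mitmproxy container could invoke this daily with the value of LLM_LOG_RETENTION_DAYS.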
Container: xaresaicoder-mitmproxy-logger
Command:
mitmproxy \
--mode regular \
--listen-host 0.0.0.0 \
--listen-port 8080 \
--set ssl_insecure=true \
  -s /scripts/llm-logger.py
Volumes:
- ./mitmproxy/llm-logger.py:/scripts/llm-logger.py:ro - Logging script
- mitmproxy_logs:/var/log/mitmproxy - Log storage
- ./squid/certs:/certs:ro - CA certificate (for future use)
Workspace containers trust the mitmproxy CA certificate:
# code-server/Dockerfile
COPY squid/certs/squid-ca-cert.pem /usr/local/share/ca-certificates/squid-ca.crt
RUN update-ca-certificates
This enables HTTPS interception without SSL errors.
/var/log/mitmproxy/llm_conversations/
├── 172.30.0.5/ # Workspace container IP
│ ├── 2025-01-15T10-30-45-123Z.json
│ ├── 2025-01-15T10-31-12-456Z.json
│ └── ...
├── 172.30.0.6/ # Different workspace
│ └── 2025-01-15T11-00-00-789Z.json
└── ...
- One directory per workspace IP address
- One JSON file per conversation
- Filename is ISO 8601 timestamp (colons/dots replaced with hyphens)
- Files sorted chronologically
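The filename convention can be expressed as a one-line helper (illustrative, not part of the codebase):

```python
def timestamp_to_filename(iso_ts: str) -> str:
    """ISO 8601 timestamp -> log filename: colons and dots become hyphens."""
    return iso_ts.replace(":", "-").replace(".", "-") + ".json"
```

Because the substitutions preserve character order, lexicographic sorting of filenames still matches chronological order.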
Backend maps workspace IP to project ID:
// Get workspace IP from project ID
const ipAddress = await dockerService.getWorkspaceIPAddress(projectId);
// Files stored at: /var/log/mitmproxy/llm_conversations/{ipAddress}/
Note: IP addresses are dynamic and reused when workspaces are deleted or recreated. Conversation logs remain associated with the IP, not the project ID.
Clean documentation filtering:
// Only include user and assistant messages
const conversationMessages = conv.parsed_request.messages.filter(msg =>
msg.role === 'user' || msg.role === 'assistant'
);
// Skip system-reminder tags
content = msg.content.filter(block =>
block.type === 'text' && !block.text?.includes('<system-reminder>')
);
Detailed documentation:
- Includes ALL roles: system, user, assistant, tool
- Shows complete system prompts (truncated at 5000 chars)
- Displays tool use/result indicators
What's logged:
- ✅ User prompts and questions
- ✅ AI responses and code suggestions
- ✅ API keys in request headers (if sent)
- ✅ Model parameters and settings
- ✅ Token usage and costs
What's NOT logged:
- ❌ Workspace file contents (unless sent to AI)
- ❌ Terminal commands (unless sent to AI)
- ❌ Git credentials
- ❌ Non-LLM network traffic
Risk: API keys appear in request headers and are logged in JSON files.
Mitigation:
- Logs stored in Docker volume (not exposed to network)
- Only accessible via backend API (requires project access)
- Consider encrypting log files (future enhancement)
Best practice:
- Use API keys with minimal scopes
- Rotate keys regularly
- Delete conversations when no longer needed
For Educational Use:
- Students should be informed of conversation logging
- Provide option to review/delete their conversations
- Consider data retention policies (30-90 days)
For Commercial Use:
- May need user consent for logging AI interactions
- Consider GDPR/privacy law implications
- Implement automated log deletion after retention period
To completely disable conversation logging:
Option 1: Disable proxy globally
# .env
ENABLE_PROXY=false
Option 2: Per-workspace
- Uncheck "Use Network Proxy" during creation
- Workspace bypasses mitmproxy, no logging occurs
Option 3: Remove mitmproxy service
# docker-compose.yml
# Comment out the mitmproxy-logger service
Problem: AI Conversations modal shows "No conversations found"
Solutions:
1. Check proxy enabled:
   - Workspace must have proxy enabled
   - Look for network icon (⊕) next to workspace name
2. Verify mitmproxy is running:
   docker ps | grep mitmproxy
   docker logs xaresaicoder-mitmproxy-logger
3. Check workspace environment:
   docker exec workspace-{id} env | grep -i proxy
   # Should show: HTTP_PROXY=http://mitmproxy-logger:8080
4. Verify LLM API calls:
   - Make a prompt in the workspace AI tool
   - Check mitmproxy logs for interception:
   docker logs xaresaicoder-mitmproxy-logger | grep "Logged LLM"
Problem: AI tools fail with SSL certificate errors
Solution:
# In workspace container
sudo update-ca-certificates
Prevention:
- Ensure squid/certs/squid-ca-cert.pem exists
- Rebuild the code-server image to include the certificate
Problem: Documentation shows requests but empty responses
Causes:
1. Streaming responses not parsed correctly
   - Check mitmproxy logs for SSE parsing errors
2. Old logs created before SSE support
   - Generate new conversations
   - Old logs can't be retroactively fixed
3. Non-standard API format
   - Some APIs may use different response formats
   - Update llm-logger.py to support new formats
Problem: Detailed documentation is 50+ MB and slow to download
Cause: System prompts are very large (e.g., Claude Code has ~100KB system prompt per conversation)
Solutions:
- Use clean documentation - Excludes system prompts
- Limit conversations:
  # Generate docs for the last 10 conversations only
  curl -X POST .../generate-documentation \
    -d '{"type":"detailed","limit":10}'
- System prompt truncation - Already implemented (5000 char limit)
Problem: Conversations disappear after Docker restart
Cause: mitmproxy_logs volume not configured properly
Solution:
# Check volume exists
docker volume ls | grep mitmproxy
# Recreate volume if missing
docker-compose down
docker volume create mitmproxy_logs
docker-compose up -d
Problem: Workspace IP changed, can't find old conversations
Cause: Docker reassigns IPs when containers restart
Solution:
1. Check old IP directories:
   docker exec xaresaicoder-mitmproxy-logger \
     ls -la /var/log/mitmproxy/llm_conversations/
2. Manual retrieval:
   # List all conversations across all IPs
   docker exec xaresaicoder-mitmproxy-logger \
     find /var/log/mitmproxy/llm_conversations -name "*.json"
3. Prevention:
- Use static IP assignment (future enhancement)
- Associate logs with project ID instead of IP
1. Log Encryption
   - Encrypt JSON files to protect API keys
   - Decrypt on demand during documentation generation
2. Automated Retention
   - Environment variable: LLM_LOG_RETENTION_DAYS
   - Cron job to delete old logs
   - Per-project retention settings
3. Cost Analytics Dashboard
   - Token usage visualization
   - Cost estimates by model
   - Per-project spending reports
   - Monthly usage trends
4. Advanced Filtering
   - Search conversations by content
   - Filter by token usage range
   - Group by project/model/date
   - Export to CSV/Excel
5. Conversation Replay
   - Replay conversations in the UI
   - Interactive prompt/response viewer
   - Diff tool for prompt iterations
6. Static IP Assignment
   - Assign fixed IPs to workspaces
   - Persist conversation history across restarts
   - Associate logs with project ID
7. Multiple Export Formats
   - PDF documentation
   - HTML with syntax highlighting
   - LaTeX for academic papers
   - JSON for programmatic access
8. Collaborative Features
   - Share conversations via link
   - Comment on specific prompts/responses
   - Team conversation library
To extend LLM conversation logging:
1. Add a new LLM provider:
   # mitmproxy/llm-logger.py
   LLM_DOMAINS = [
       'api.openai.com',
       'your-llm-provider.com',  # Add here
   ]
2. Add a new response format:
   # Handle custom API response format
   if 'x-custom-api' in flow.response.headers:
       log_data['parsed_response'] = parse_custom_format(resp_body)
3. Add a new documentation format:
   // server/src/services/documentation.js
   function generateCustomDocumentation(conversations) {
       // Your custom format logic
   }
- mitmproxy Documentation
- Anthropic API - Streaming
- OpenAI API Reference
- Server-Sent Events Specification
For issues or questions:
- GitHub Issues: https://github.com/anthropics/xares-aicoder/issues
- Documentation: /docs/
- Feature Branch: feature/llm-conversation-logging