Paper Autopilot - Automatic Document Processing

Automatic PDF processing from your ScanSnap scanner using the OpenAI Responses API

Paper Autopilot continuously monitors your scanner's inbox folder and automatically processes PDFs as they arrive. No manual intervention needed - just scan your documents and let the autopilot handle the rest.

What's New

Week 3 - Vector Store Observability (January 2025)

Observability:

Real-time cost tracking ($0.10/GB/day after 1GB free tier)
Performance metrics (P50/P95/P99 search latency)
Upload success rate monitoring (target >95%)
Structured JSON logging with embedded metrics
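
As a rough illustration of how the figures above could be derived, the sketch below estimates daily storage cost from the stated pricing and computes P50/P95/P99 from a batch of latency samples. The function names are hypothetical and are not taken from the project's code; only the rate and free tier come from the list above.

```python
import statistics
from typing import Dict, Sequence

FREE_TIER_GB = 1.0       # free tier, from the pricing note above
RATE_PER_GB_DAY = 0.10   # USD per GB per day, from the pricing note above


def estimate_daily_cost(stored_gb: float) -> float:
    """Return the estimated daily storage cost in USD (hypothetical helper)."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    billable_gb = max(0.0, stored_gb - FREE_TIER_GB)
    return billable_gb * RATE_PER_GB_DAY


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return P50/P95/P99 for latency samples (needs at least 2 samples)."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Under this pricing, anything at or below 1 GB costs nothing, and each additional gigabyte adds $0.10 per day.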

Documentation:

ADR-030: Vector Store Integration (458 lines)
Complete usage guide (683 lines) with troubleshooting
Cost management strategies and optimization tips

Discovery:

Phase 4 (Error Handling) already complete from Wave 1 ✓
CompensatingTransaction pattern with LIFO rollback
30 tests (100% passing) for transaction safety
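
For readers unfamiliar with the pattern, here is a minimal sketch of a compensating transaction with LIFO rollback. It only illustrates the idea; the class body and the `register` method are assumptions, and the project's own CompensatingTransaction may differ.

```python
from typing import Callable, List


class CompensatingTransaction:
    """Collects undo actions and runs them newest-first if the block fails."""

    def __init__(self) -> None:
        self._compensations: List[Callable[[], None]] = []

    def register(self, undo: Callable[[], None]) -> None:
        """Record how to undo a step that has just succeeded (assumed API)."""
        self._compensations.append(undo)

    def __enter__(self) -> "CompensatingTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            # LIFO: undo the most recent step first.
            while self._compensations:
                undo = self._compensations.pop()
                undo()
        return False  # never swallow the original exception
```

If a later step raises inside the `with` block, the upload is undone before the database write, mirroring the reverse of the order in which the steps ran.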

Wave 2 - Type Safety & Cache (October 2025)

Type Safety:

100% type annotation coverage with MyPy strict mode
Zero Any type leakage from external libraries
Pre-commit hooks enforce type safety

Performance:

Production-ready embedding cache with <0.1ms latency
70%+ cache hit rate with temporal locality
1M lookups/sec throughput
SHA-256 cache keys with LRU eviction
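
The combination of SHA-256 keys and LRU eviction can be sketched as follows. This is a simplified stand-in built on `collections.OrderedDict`, not the code in src/cache.py; the class name, method names, and `max_entries` parameter are assumptions.

```python
import hashlib
from collections import OrderedDict
from typing import List, Optional


class EmbeddingCache:
    """Tiny LRU cache keyed by the SHA-256 digest of the input text."""

    def __init__(self, max_entries: int = 1000) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()

    @staticmethod
    def key_for(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Optional[List[float]]:
        key = self.key_for(text)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, text: str, embedding: List[float]) -> None:
        key = self.key_for(text)
        self._entries[key] = embedding
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

Hashing the text keeps keys a fixed size regardless of document length, and moving an entry to the end on every hit is what makes eviction least-recently-used rather than first-in-first-out.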

Quality:

41 new cache tests (unit + integration + performance)
Property-based testing with Hypothesis framework
6 new ADRs documenting technical decisions (ADR-027 through ADR-030)

See CHANGELOG.md for full details.

What It Does

Watches your scanner's inbox folder (/Users/krisstudio/Paper/InboxA)
Detects new PDFs instantly using filesystem events
Validates PDF integrity and waits for scanner to finish writing
Processes documents using the OpenAI Responses API for metadata extraction
Stores results in SQLite database with full audit trail
Uploads to OpenAI vector store for semantic search
Moves processed PDFs to organized folders
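
The "waits for scanner to finish writing" step can be approximated by polling the file size until it stops changing, then checking the PDF header. The sketch below is one possible approach under that assumption; the function names, polling interval, and thresholds are illustrative and not taken from src/daemon.py.

```python
import time
from pathlib import Path


def wait_until_stable(path: Path, checks: int = 3, interval: float = 1.0,
                      timeout: float = 60.0) -> bool:
    """Return True once the size is non-zero and unchanged for `checks` polls."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_polls = 0
    while time.monotonic() < deadline:
        try:
            size = path.stat().st_size
        except FileNotFoundError:
            size = -1
        if size > 0 and size == last_size:
            stable_polls += 1
            if stable_polls >= checks:
                return True
        else:
            stable_polls = 0
        last_size = size
        time.sleep(interval)
    return False


def looks_like_pdf(path: Path) -> bool:
    """Cheap integrity check: PDF files begin with the bytes %PDF-."""
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"
```

A header check only catches obviously wrong files; a full integrity check would need a PDF parser.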

Quick Start

1. Install Dependencies

# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install required packages
pip install -r requirements.txt

2. Configure API Key

The daemon will automatically load your OpenAI API key from ~/.OPENAI_API_KEY:

# Save your API key to the file
echo "sk-your-actual-key-here" > ~/.OPENAI_API_KEY
chmod 600 ~/.OPENAI_API_KEY

Or set it as an environment variable:

export OPENAI_API_KEY=sk-your-actual-key-here
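
The two options above suggest a lookup along these lines. This is a sketch only: the precedence (environment variable first, then the key file) is an assumption, and `load_api_key` is a hypothetical name rather than the daemon's actual function.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: Path = Path.home() / ".OPENAI_API_KEY") -> Optional[str]:
    """Return the API key from the environment or the key file, else None."""
    # Assumed precedence: an exported variable overrides the file.
    env_key = os.environ.get("OPENAI_API_KEY", "").strip()
    if env_key:
        return env_key
    if key_file.is_file():
        # strip() drops the trailing newline that `echo` writes.
        file_key = key_file.read_text(encoding="utf-8").strip()
        return file_key or None
    return None
```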

3. Run the Daemon

# Start the automatic processing daemon
python3 run_daemon.py

The daemon will:

Create necessary directories if they don't exist
Start watching /Users/krisstudio/Paper/InboxA for new PDFs
Process documents automatically as they arrive
Log all activities to logs/paper_autopilot.log

4. Configure Your Scanner

Set your ScanSnap scanner to save PDFs to:

/Users/krisstudio/Paper/InboxA

See docs/scansnap-ix1600-setup.md for detailed scanner configuration.

Automatic Startup (macOS)

To have Paper Autopilot start automatically on login:

# Copy LaunchAgent plist
cp com.paperautopilot.daemon.plist ~/Library/LaunchAgents/

# Load and start the daemon
launchctl load ~/Library/LaunchAgents/com.paperautopilot.daemon.plist

# Verify it's running
launchctl list | grep paperautopilot

The daemon will now start automatically every time you log in to your Mac.

Repository Structure

.
├── run_daemon.py          # Entry point for automatic daemon
├── src/
│   ├── daemon.py          # File watching and automatic processing
│   ├── processor.py       # Document processing pipeline
│   ├── config.py          # Configuration management (Pydantic V2)
│   ├── cache.py           # LRU embedding cache (NEW)
│   ├── database.py        # SQLite database operations
│   ├── api_client.py      # OpenAI Responses API client
│   └── vector_store.py    # Vector store management
├── docs/
│   ├── DAEMON_MODE.md     # Detailed daemon setup guide
│   ├── RUNBOOK.md         # Production operations guide
│   ├── DEVELOPMENT_MODEL.md  # Parallel execution guide (NEW)
│   └── scansnap-ix1600-setup.md  # Scanner configuration
└── com.paperautopilot.daemon.plist  # macOS LaunchAgent config

Configuration

All settings can be configured via environment variables:

# Required
OPENAI_API_KEY=sk-...              # OpenAI API key

# Paths (defaults shown)
PAPER_AUTOPILOT_INBOX_PATH=/Users/krisstudio/Paper/InboxA
PAPER_AUTOPILOT_DB_URL=sqlite:///paper_autopilot.db

# Processing
OPENAI_MODEL=gpt-5-mini           # gpt-5-mini, gpt-5-nano, gpt-5, gpt-5-pro, gpt-4.1
API_TIMEOUT_SECONDS=300           # API call timeout (30-600s)
MAX_RETRIES=5                     # Retry attempts (1-10)

# Logging
LOG_LEVEL=INFO                    # DEBUG, INFO, WARNING, ERROR
LOG_FORMAT=json                   # json or text
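
To show how the documented defaults and ranges might be enforced, here is a standard-library sketch. The project's src/config.py uses Pydantic V2, so this is not its implementation; `read_int` and `load_settings` are hypothetical helpers.

```python
import os
from typing import Any, Dict, Mapping


def read_int(env: Mapping[str, str], name: str, default: int,
             low: int, high: int) -> int:
    """Read an integer setting and reject values outside low..high."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the allowed range {low}-{high}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the settings listed above, applying the documented defaults."""
    return {
        "model": env.get("OPENAI_MODEL", "gpt-5-mini"),
        "timeout_seconds": read_int(env, "API_TIMEOUT_SECONDS", 300, 30, 600),
        "max_retries": read_int(env, "MAX_RETRIES", 5, 1, 10),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "log_format": env.get("LOG_FORMAT", "json"),
    }
```

Failing fast on an out-of-range value at startup is usually easier to debug than a daemon that silently clamps it.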

See docs/DAEMON_MODE.md for complete configuration reference.

Monitoring

View daemon logs in real-time:

# Application logs (structured JSON)
tail -f logs/paper_autopilot.log | jq .

# Daemon stdout
tail -f logs/daemon_stdout.log

# Daemon errors
tail -f logs/daemon_stderr.log

Check daemon status:

# macOS LaunchAgent
launchctl list | grep paperautopilot

# View recent activity
grep "Processing complete" logs/paper_autopilot.log | tail -10

Folder Organization

/Users/krisstudio/Paper/
├── InboxA/          # Scanner drops PDFs here
├── Processed/       # Successfully processed PDFs
└── Failed/          # PDFs that failed processing

The daemon automatically moves PDFs to the appropriate folder after processing.
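
A minimal sketch of that move step, assuming the folder names shown above. The `archive_pdf` helper and its handling of name collisions (appending a counter) are illustrative assumptions, not documented behavior.

```python
import shutil
from pathlib import Path


def archive_pdf(pdf: Path, root: Path, succeeded: bool) -> Path:
    """Move `pdf` into root/Processed or root/Failed and return its new path."""
    target_dir = root / ("Processed" if succeeded else "Failed")
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / pdf.name
    counter = 1
    # Assumed policy: never overwrite an existing file with the same name.
    while destination.exists():
        destination = target_dir / f"{pdf.stem}_{counter}{pdf.suffix}"
        counter += 1
    shutil.move(str(pdf), str(destination))
    return destination
```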

Supported Models

Paper Autopilot uses only OpenAI Frontier models per project requirements:

gpt-5-mini (default) - Fast, cost-efficient
gpt-5-nano - Fastest, most cost-efficient
gpt-5 - Best for coding and agentic tasks
gpt-5-pro - Smarter and more precise
gpt-4.1 - Smartest non-reasoning model

Important: Never use gpt-4o or any Chat Completions model. Paper Autopilot calls only the Responses API endpoint (/v1/responses).
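
A policy like this is easy to enforce at configuration time. The sketch below checks a model name against the list above; `validate_model` is a hypothetical helper and is not the logic in scripts/check_model_policy.py.

```python
ALLOWED_MODELS = frozenset(
    {"gpt-5-mini", "gpt-5-nano", "gpt-5", "gpt-5-pro", "gpt-4.1"}
)
DEFAULT_MODEL = "gpt-5-mini"


def validate_model(name: str) -> str:
    """Return the model name if it is on the allowed list, else raise."""
    candidate = name.strip()
    if candidate not in ALLOWED_MODELS:
        allowed = ", ".join(sorted(ALLOWED_MODELS))
        raise ValueError(
            f"Model {candidate!r} is not permitted; choose one of: {allowed}"
        )
    return candidate
```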

Documentation

Daemon Mode Guide - Complete daemon setup and troubleshooting
Production Runbook - Operations guide for production deployments
Development Model - Parallel execution with git worktrees (NEW)
Scanner Setup - ScanSnap iX1600 configuration
Code Architecture - System architecture and design
Processor Guide - Document processing pipeline details

Contributing

Review AGENTS.md for project conventions, testing expectations, and security practices. Key points:

Follow PEP 8, run black before commits
Use pytest for automated testing
Never commit sample PDFs or raw API responses
Keep model selections aligned with Frontier models only

Run policy checks before PRs:

python scripts/check_model_policy.py --diff
pytest tests/test_model_policy.py

Architecture

Paper Autopilot implements a production-grade document processing pipeline:

File Watching: Real-time detection with filesystem events (watchdog library)
File Stabilization: Handles scanner's phased writes (waits for OCR completion)
Deduplication: SHA-256 hash-based duplicate detection
Processing Pipeline: Responses API → Schema Validation → Database Storage
Vector Search: Automatic upload to OpenAI vector store with LRU embedding cache
Audit Trail: Complete processing history with costs and timing
Error Handling: Automatic retries with exponential backoff
Type Safety: MyPy strict mode with 100% annotation coverage
Performance: <0.1ms cache latency, 70%+ hit rate, >1M ops/sec throughput
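
Two of the pieces above, hash-based deduplication and retries with exponential backoff, can be sketched in a few lines. Both helpers are hypothetical; the delay values, the jitter, and the reading of `max_retries` as retries after the first attempt are assumptions rather than the project's documented behavior.

```python
import hashlib
import random
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def retry_with_backoff(operation: Callable[[], T], max_retries: int = 5,
                       base_delay: float = 1.0, max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Two PDFs with the same digest can be treated as duplicates before any API call is made. Real code would normally retry only on transient errors (timeouts, rate limits) rather than on every exception.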

License

See LICENSE file for details.

Maintained By: Platform Engineering Team
Version: 1.0.0
