paralleldrive · ericelliott · Jan 23, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,35 @@
 node_modules/
 .esm-cache/
+.env
+.claude/settings.local.json
+### Generated by gibo (https://github.com/simonwhitaker/gibo)
+### https://raw.github.com/github/gitignore/fc6ce5da28a8c3480cc8a5acad050449f72a9261/Global/macOS.gitignore
+
+# General
+.DS_Store
+__MACOSX/
+.AppleDouble
+.LSOverride
+Icon?
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+CLAUDE.md
+.claude/
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,43 @@
+# AI Agent Guidelines
+
+This project uses AI-assisted development with structured guidance in the `ai/` directory.
+
+## Directory Structure
+
+Agents should examine the `ai/*` directory listings to understand the available commands, rules, and workflows.
+
+## Index Files
+
+Each folder in the `ai/` directory contains an `index.md` file that describes the purpose and contents of that folder. Agents can read these index files to learn the function of files in each folder without needing to read every file.
+
+**Important:** The `ai/**/index.md` files are auto-generated from frontmatter. Do not create or edit these files manually—they will be overwritten by the pre-commit hook.
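The hook's source is not part of this diff. As a rough, hypothetical sketch of the idea, a generator could read a `description:` key from each file's frontmatter and list it in the folder's `index.md`. The `description` key, the `buildIndex` and `readDescription` helpers, and the output layout are all assumptions, not the project's actual pre-commit hook:

```javascript
// Hypothetical sketch: build a folder's index.md from each file's frontmatter.
// Assumes frontmatter is a leading block delimited by `---` lines and that each
// file describes itself with a single-line `description:` key.
function readDescription(content) {
  const match = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return '';
  const line = match[1]
    .split(/\r?\n/)
    .find((l) => l.startsWith('description:'));
  return line ? line.slice('description:'.length).trim() : '';
}

// files: [{ name: 'foo.md', content: '---\ndescription: ...\n---\n...' }]
function buildIndex(folderName, files) {
  const entries = files
    // Skip non-Markdown files and the generated index itself.
    .filter((f) => f.name.endsWith('.md') && f.name !== 'index.md')
    .map((f) => `- \`${f.name}\`: ${readDescription(f.content) || '(no description)'}`);
  return [`# ${folderName}`, '', ...entries, ''].join('\n');
}

// Usage with an illustrative folder named "rules".
const index = buildIndex('rules', [
  { name: 'javascript.md', content: '---\ndescription: JavaScript style rules\n---\n# Rules' },
  { name: 'index.md', content: 'old generated index' },
]);
console.log(index);
```

Regenerating the whole file from frontmatter on every commit is what makes manual edits pointless: any hand-written change would be replaced on the next run.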
+
+## Progressive Discovery
+
+Agents should only consume the root index until they need subfolder contents. For example:
+- If the project is Python, there is no need to read JavaScript-specific folders
+- If working on backend logic, frontend UI folders can be skipped
+- Only drill into subfolders when the task requires that specific domain knowledge
+
+This approach minimizes context consumption and keeps agent responses focused.
+
+## Vision Document Requirement
+
+**Before creating or running any task, agents must first read the vision document (`vision.md`) in the project root.**
+
+The vision document serves as the source of truth for:
+- Project goals and objectives
+- Key constraints and non-negotiables
+- Architectural decisions and rationale
+- User experience principles
+- Success criteria
+
+## Conflict Resolution
+
+If any conflicts are detected between a requested task and the vision document, agents must:
+
+1. Stop and identify the specific conflict
+2. Explain how the task conflicts with the stated vision
+3. Ask the user to clarify how to resolve the conflict before proceeding
+
+Never proceed with a task that contradicts the vision without explicit user approval.
diff --git a/ARCHIVE-ORGANIZATION-SUMMARY.md b/ARCHIVE-ORGANIZATION-SUMMARY.md
@@ -0,0 +1,126 @@
+# Epic Archive Organization - Summary
+
+**Date**: February 2, 2026  
+**Task**: Organize Riteway AI Testing Framework epic documentation
+
+## Actions Completed
+
+### 1. ✅ Updated EPIC-REVIEW-2026-02-02.md
+
+**Updates Made**:
+- Updated Cursor CLI requirement status (line 149):
+  - Before: `Cursor: agent chat with --api-key | ⚠️ Modified | OAuth-aligned (no API keys), uses cursor agent`
+  - After: `Cursor: agent with OAuth | ✅ Updated | Uses agent --print --output-format json, OAuth-only (no API keys)`
+
+- Added new section: "Post-PR Updates: Cursor CLI Implementation (2026-02-02)"
+  - Documented Cursor CLI testing and validation
+  - Explained OAuth-only authentication decision
+  - Included UAT validation results
+  - Referenced CURSOR-CLI-TESTING-2026-02-02.md
+
+- Updated removed files list:
+  - Added: `source/fixtures/sample-test.sudo` (removed per PR review guidance)
+
+### 2. ✅ Created Archive Directory Structure
+
+Created organized epic archive:
+```
+tasks/archive/2026-01-22-riteway-ai-testing-framework/
+├── README.md                                        (NEW - Archive overview)
+├── 2026-01-22-riteway-ai-testing-framework.md      (MOVED - Original epic task)
+├── EPIC-REVIEW-2026-02-02.md                       (MOVED - Epic review)
+└── CURSOR-CLI-TESTING-2026-02-02.md                (MOVED - Cursor CLI testing)
+```
+
+### 3. ✅ Created Archive README
+
+**File**: `tasks/archive/2026-01-22-riteway-ai-testing-framework/README.md`
+
+**Contents**:
+- Epic overview and timeline
+- Key features delivered
+- Document descriptions (all 3 archived docs)
+- Implementation statistics
+- Key technical decisions
+- Command reference
+- Agent authentication setup
+- Future enhancement opportunities
+- Related PRs
+
+### 4. ✅ Verified Codebase Health
+
+**Git Status**:
+- Archive directory properly organized
+- Old locations marked for deletion
+- All changes tracked
+
+**Test Results**:
+- ✅ All 73 unit tests passing
+- ✅ No broken references
+- ✅ Clean codebase state
+
+## Archive Contents
+
+### Document Hierarchy
+
+1. **README.md** (4.3K)
+   - Entry point for understanding the epic
+   - Quick reference for commands and setup
+   - Links to detailed documentation
+
+2. **2026-01-22-riteway-ai-testing-framework.md** (15K)
+   - Original task specification
+   - Functional requirements
+   - Acceptance criteria
+   - Technical requirements
+
+3. **EPIC-REVIEW-2026-02-02.md** (22K)
+   - Complete implementation review
+   - Task completion verification
+   - Architecture documentation
+   - Test coverage analysis
+   - Post-PR updates section
+
+4. **CURSOR-CLI-TESTING-2026-02-02.md** (4.6K)
+   - Cursor CLI validation
+   - OAuth implementation details
+   - UAT test results
+   - Technical specifications
+
+## Benefits of This Organization
+
+1. **Clean Project Root**: Epic documentation moved from project root to dedicated archive
+2. **Clear Context**: All epic-related docs in one place with overview README
+3. **Easy Navigation**: README provides quick access to all epic information
+4. **Historical Record**: Complete audit trail from planning → implementation → testing
+5. **PR Ready**: Clean codebase state ready for next development cycle
+
+## Next Steps
+
+The codebase is now clean and ready for:
+- Creating new feature branches
+- Starting next epic
+- PR review process
+- Documentation updates
+
+## Related Files
+
+### Modified (Current Changes):
+- `README.md` - Cursor CLI authentication section updated
+- `bin/riteway` - Cursor agent config (OAuth-only)
+- `bin/riteway.test.js` - Tests updated for OAuth
+- `source/ai-runner.js` - Authentication error messages updated
+
+### Deleted (Cleanup):
+- `source/fixtures/sample-test.sudo` - Removed per PR guidance
+- `ai-evals/*.tap.md` - Test output files (generated, not source)
+- Root-level epic documents - Moved to archive
+
+### Added:
+- `tasks/archive/2026-01-22-riteway-ai-testing-framework/` - Complete epic archive
+
+---
+
+**Archive Organization Complete** ✅
+
+The Riteway AI Testing Framework epic is now properly documented and archived, with the codebase clean and ready for the next development cycle.
diff --git a/README.md b/README.md
@@ -27,6 +27,178 @@ Riteway's structured approach makes it ideal for AIDD:
 - **Simple API**: Minimal surface area reduces AI confusion and hallucinations
 - **Token efficient**: Concise syntax saves valuable context window space
 
+## Testing AI Prompts with `riteway ai`
+
+Riteway includes a `riteway ai` CLI command for testing AI agent prompts and evaluating their outputs. Test your prompts as rigorously as you test your code.
+
+### Prerequisites
+
+Riteway AI tests run through agent CLIs that authenticate with OAuth rather than API keys, so test runs count against your subscription instead of incurring usage-based API charges.
+
+**Claude CLI** (default):
+```bash
+# Set up OAuth token (one-time)
+claude setup-token
+```
+Docs: https://docs.claude.ai/docs/cli-authentication
+
+**OpenCode CLI**:
+Follow provider setup instructions at https://opencode.ai/docs/cli/
+
+**Cursor CLI**:
+```bash
+# Set up OAuth authentication (one-time)
+agent login
+
+# Verify authentication
+agent status
+```
+Docs: https://docs.cursor.com/context/rules-for-ai
+
+### Quick Start
+
+```bash
+# Run AI prompt tests with default settings (4 runs, 75% pass threshold)
+riteway ai test/my-prompt.sudo
+
+# Customize test runs and pass threshold
+riteway ai test/my-prompt.sudo --runs 10 --threshold 80
+
+# Use different AI agents
+riteway ai test/my-prompt.sudo --agent opencode
+riteway ai test/my-prompt.sudo --agent cursor
+```
+
+### How It Works
+
+AI prompt testing handles non-deterministic AI responses by:
+
+1. **Multiple runs**: Executes each test multiple times (default: 4)
+2. **Pass threshold**: Requires a percentage of runs to pass (default: 75%)
+3. **Parallel execution**: Runs tests in parallel for speed
+4. **Isolated contexts**: Each run gets a fresh subprocess for clean state
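The multi-run, threshold-based evaluation described above can be sketched as follows. This is a hypothetical illustration, not the code in `source/ai-runner.js`: `runAgentOnce` stands in for whatever spawns one agent subprocess, and the sketch assumes a set of runs passes when the pass rate is at least the threshold (so the defaults need 3 of 4 runs).

```javascript
// Hypothetical sketch of multi-run evaluation with a pass threshold.
// `runAgentOnce(runIndex)` stands in for spawning one fresh agent subprocess;
// it must resolve to true (the run passed) or false (it failed).
async function evaluatePrompt({ runAgentOnce, runs = 4, threshold = 75 }) {
  // Launch every run concurrently; Promise.all rejects if any run throws.
  const results = await Promise.all(
    Array.from({ length: runs }, (_, runIndex) => runAgentOnce(runIndex)),
  );
  const passCount = results.filter(Boolean).length;
  return {
    results,
    passCount,
    passRate: (passCount / runs) * 100, // for display only
    // Integer comparison equivalent to passCount / runs >= threshold / 100.
    passed: passCount * 100 >= threshold * runs,
  };
}

// Minimum number of passing runs needed for a given run count and threshold.
const requiredPasses = (runs, threshold) => Math.ceil((runs * threshold) / 100);

// Usage with a fake agent that fails only on the second run (index 1).
evaluatePrompt({ runAgentOnce: async (runIndex) => runIndex !== 1 }).then((summary) => {
  console.log(summary.passCount, summary.passRate, summary.passed); // 3 75 true
});
```

Comparing `passCount * 100` with `threshold * runs` keeps the pass/fail decision in integer arithmetic, which avoids floating-point surprises such as `0.57 * 100` evaluating to `56.99999999999999`. For example, `--runs 10 --threshold 80` needs 8 passing runs.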
+
+### Test File Format
+
+AI test files are written in SudoLang/Markdown and passed directly to AI agents:
+
+```markdown
+# Sample AI Test
+
+Test the basic math capability of the AI.
+
+## Requirements
+
+- Given a simple addition prompt, should correctly add 2 + 2
+- Given the result, should output the number 4
+- Given the response format, should provide a JSON response with "passed" and "output" fields
+
+## Test Prompt
+
+What is 2 + 2? Please respond with JSON in this format:
+{
+  "passed": true,
+  "output": "2 + 2 = 4"
+}
+```
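Because agents often wrap their JSON in prose or a Markdown code fence, extracting the `passed`/`output` object takes some care. A minimal sketch, assuming the reply contains exactly one JSON object; `parseAgentResult` is a hypothetical helper, not part of Riteway's API:

```javascript
// Hypothetical helper: pull the {"passed", "output"} object out of a raw agent reply.
// Assumes the reply contains a single JSON object, possibly surrounded by prose
// or a Markdown code fence.
function parseAgentResult(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end < start) {
    throw new Error('No JSON object found in agent response');
  }
  const result = JSON.parse(text.slice(start, end + 1));
  if (typeof result.passed !== 'boolean') {
    throw new Error('Agent response is missing a boolean "passed" field');
  }
  return { passed: result.passed, output: String(result.output ?? '') };
}

const reply = 'Here you go:\n```json\n{ "passed": true, "output": "2 + 2 = 4" }\n```';
console.log(parseAgentResult(reply)); // { passed: true, output: '2 + 2 = 4' }
```

Slicing from the first `{` to the last `}` is deliberately simple; a reply containing two separate JSON objects would fail to parse and surface as an error rather than a silent pass.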
+
+### Test Output
+
+Results are saved to `ai-evals/` with:
+
+- **TAP format**: Standard Test Anything Protocol output
+- **Rich formatting**: Colorized, markdown-compatible
+- **Unique identifiers**: Date-stamped with unique slugs
+- **Browser preview**: Automatically opens results in your browser
+
+Example output path:
+```
+ai-evals/2026-01-23-my-prompt-abc123.tap.md
+```
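The naming scheme can be approximated as below. The slug length and character set are assumptions (this README shows both a six-character slug, `abc123`, and a five-character placeholder, `xxxxx`), and `resultPath` and `randomSlug` are hypothetical helpers:

```javascript
// Hypothetical sketch of the ai-evals/ naming scheme:
//   ai-evals/YYYY-MM-DD-<test name>-<random slug><extension>
// Slug length and alphabet are assumptions. Math.random is adequate for a
// collision-avoiding file suffix, not for anything security-sensitive.
function randomSlug(length = 6) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let slug = '';
  for (let i = 0; i < length; i++) {
    slug += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return slug;
}

// Expects a "/"-separated test file path.
function resultPath(testFile, { date = new Date(), extension = '.tap.md', slug = randomSlug() } = {}) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const base = testFile.split('/').pop().replace(/\.[^.]+$/, ''); // "my-prompt.sudo" -> "my-prompt"
  return `ai-evals/${day}-${base}-${slug}${extension}`;
}

console.log(resultPath('test/my-prompt.sudo', { date: new Date('2026-01-23T12:00:00Z'), slug: 'abc123' }));
// ai-evals/2026-01-23-my-prompt-abc123.tap.md
```

Passing `extension: '.debug.log'` yields the debug-log variant of the same pattern.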
+
+### Supported Agents
+
+Riteway supports multiple AI agents:
+
+- **Claude Code** (default): `--agent claude`
+- **OpenCode**: `--agent opencode`
+- **Cursor Agent**: `--agent cursor`
+
+Each agent runs in its own subprocess, configured to return JSON output.
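A lookup table is one way to keep per-agent subprocess settings in one place. This sketch is not the configuration in `bin/riteway`: the Cursor flags (`--print --output-format json`) come from this PR's archive notes, the Claude Code flags are its non-interactive JSON mode, and the OpenCode entry uses `opencode run` without a JSON flag because this README does not name one. Verify each flag against the CLI's own documentation.

```javascript
// Hypothetical agent table; check each CLI's docs before relying on these flags.
const AGENTS = {
  claude: { command: 'claude', args: ['-p', '--output-format', 'json'] },
  opencode: { command: 'opencode', args: ['run'] }, // JSON output flag not specified here
  cursor: { command: 'agent', args: ['--print', '--output-format', 'json'] },
};

// Returns [command, args] for a spawn-style API, with the prompt as the final argument.
function agentCommand(name, prompt) {
  const agent = AGENTS[name];
  if (!agent) {
    throw new Error(`Unknown agent "${name}". Use one of: ${Object.keys(AGENTS).join(', ')}`);
  }
  return [agent.command, [...agent.args, prompt]];
}

console.log(agentCommand('cursor', 'What is 2 + 2?'));
```

The returned pair matches the `(command, args)` arguments of Node's `child_process.spawn`, and spawning a new process per run gives each run the clean state described under How It Works.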
+
+### CLI Options
+
+```bash
+riteway ai <test-file> [options]
+
+Options:
+  --runs <n>            Number of test runs (default: 4)
+  --threshold <p>       Required pass percentage 0-100 (default: 75)
+  --agent <name>        AI agent to use: claude|opencode|cursor (default: claude)
+  --debug               Enable debug output to console
+  --debug-log           Enable debug output and save to auto-generated log file
+```
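A sketch of how these options might be parsed with Node's built-in `util.parseArgs` (available in Node 18.3 and later). `parseAiArgs` is hypothetical, and its validation rules, such as requiring a positive integer run count, are assumptions rather than documented Riteway behavior:

```javascript
const { parseArgs } = require('node:util');

// Hypothetical option parser mirroring the options listed above.
function parseAiArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the test file is a positional argument
    options: {
      runs: { type: 'string' },
      threshold: { type: 'string' },
      agent: { type: 'string' },
      debug: { type: 'boolean' },
      'debug-log': { type: 'boolean' },
    },
  });

  const [testFile] = positionals;
  if (!testFile) throw new Error('Usage: riteway ai <test-file> [options]');

  // Apply the documented defaults: 4 runs, 75% threshold, claude agent.
  const runs = Number(values.runs ?? 4);
  const threshold = Number(values.threshold ?? 75);
  const agent = values.agent ?? 'claude';

  if (!Number.isInteger(runs) || runs < 1) throw new Error('--runs must be a positive integer');
  if (!(threshold >= 0 && threshold <= 100)) throw new Error('--threshold must be between 0 and 100');
  if (!['claude', 'opencode', 'cursor'].includes(agent)) throw new Error(`Unknown agent: ${agent}`);

  // --debug-log enables debug output as well as the log file.
  const debugLog = Boolean(values['debug-log']);
  return { testFile, runs, threshold, agent, debug: Boolean(values.debug) || debugLog, debugLog };
}

console.log(parseAiArgs(['test/my-prompt.sudo', '--runs', '10', '--threshold', '80']));
```

`parseArgs` runs in strict mode by default, so an unrecognized flag throws instead of being silently ignored.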
+
+#### Debug Mode and Logging
+
+Use `--debug` to see detailed information about test execution in the console:
+
+```bash
+# Debug output to console only
+riteway ai test/my-prompt.sudo --debug
+
+# Debug output to console AND save to auto-generated log file
+riteway ai test/my-prompt.sudo --debug-log
+```
+
+When using `--debug-log`, the log file is created automatically in the `ai-evals/` directory with a date-stamped filename following the pattern `YYYY-MM-DD-testname-xxxxx.debug.log`.
+
+Debug mode shows:
+- Agent command execution details
+- Prompt lengths and content previews
+- JSON parsing steps
+- Test extraction and evaluation results
+- Process exit codes and timing
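Console-plus-file logging fits naturally into a logger with two output sinks. This is a hypothetical sketch, not Riteway's implementation: `createDebugLogger` and its options are illustrative, and the sinks are injected functions so the example runs without touching the filesystem. In a CLI, the file sink could wrap Node's `fs.appendFileSync(logPath, line + '\n')`.

```javascript
// Hypothetical debug logger. With --debug, lines go to the console only;
// with --debug-log, the caller also supplies a file sink.
function createDebugLogger({ debug = false, writeToConsole = console.error, writeToFile = null } = {}) {
  return (label, details) => {
    if (!debug) return; // debugging disabled: log nothing
    const suffix = details === undefined ? '' : `: ${details}`;
    const line = `[${new Date().toISOString()}] ${label}${suffix}`;
    writeToConsole(line);
    if (writeToFile) writeToFile(line);
  };
}

// Usage: collect "file" lines in memory instead of writing a real log file.
const fileLines = [];
const log = createDebugLogger({ debug: true, writeToFile: (line) => fileLines.push(line) });
log('agent exit code', 0);
```

Checking `details === undefined` rather than truthiness keeps values like an exit code of `0` in the log line.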
+
+### Example: Testing a Prompt
+
+Create a test file `tests/math-test.sudo`:
+
+```markdown
+# Math Test
+
+## Test Prompt
+Solve: What is 15 + 27?
+
+Respond in JSON:
+{
+  "passed": true,
+  "output": "42"
+}
+```
+
+Run the test:
+
+```bash
+riteway ai tests/math-test.sudo --runs 5 --threshold 80
+```
+
+Output:
+
+```
+TAP version 13
+ok 1
+ok 2
+ok 3
+ok 4
+ok 5
+1..5
+# tests 5
+# pass  5
+
+Test results saved to: ai-evals/2026-01-23-math-test-xyz789.tap.md
+```
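The TAP summary lines in the output above follow a simple pattern. A minimal sketch of producing them from per-run results; `formatTap` is hypothetical, and the `# fail` line for failing runs is an assumption because the example shows only passing runs:

```javascript
// Hypothetical TAP version 13 formatter for per-run results (true = pass).
function formatTap(results) {
  const lines = ['TAP version 13'];
  // One numbered test point per run.
  results.forEach((passed, i) => lines.push(`${passed ? 'ok' : 'not ok'} ${i + 1}`));
  const passCount = results.filter(Boolean).length;
  // Plan line followed by summary comments.
  lines.push(`1..${results.length}`, `# tests ${results.length}`, `# pass  ${passCount}`);
  if (passCount < results.length) lines.push(`# fail  ${results.length - passCount}`);
  return lines.join('\n');
}

console.log(formatTap([true, true, false]));
```

Placing the `1..N` plan after the test points is valid TAP, which lets a runner emit results as each run finishes.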
+
 ## The 5 Questions Every Test Must Answer
 
 There are [5 questions every unit test must answer](https://medium.com/javascript-scene/what-every-unit-test-needs-f6cd34d9836d). Riteway forces you to answer them.