Draft
89 commits
4d256c4
feat(ai-runner): implement core module with TDD (Task 2 partial)
ericelliott Jan 23, 2026
b5dceea
fix(ai-runner): address PR review feedback
ianwhitedeveloper Jan 23, 2026
6d32496
feat: add AI test runner with timeout and TAP output
ianwhitedeveloper Jan 23, 2026
a283915
feat(cli): add agent config and error-causes
ianwhitedeveloper Jan 23, 2026
5ad85ba
feat(ai): add E2E tests and complete documentation
ianwhitedeveloper Jan 23, 2026
1f8a35b
Update path for example SudoLang source code
ericelliott Jan 23, 2026
935862e
feat(ai-runner): implement core AI test infrastructure
ianwhitedeveloper Feb 2, 2026
3fc4a32
feat(cli): add AI test runner CLI support
ianwhitedeveloper Feb 2, 2026
3edb04b
feat(debug): add debug logging infrastructure
ianwhitedeveloper Feb 2, 2026
d6c2ecf
feat(output): add TAP colorization and security
ianwhitedeveloper Feb 2, 2026
d0a502e
docs(fixtures): add test fixtures and documentation
ianwhitedeveloper Feb 2, 2026
ab0c897
build: add dependencies for AI test runner
ianwhitedeveloper Feb 2, 2026
8efb7d2
docs(epic): organize AI testing framework epic
ianwhitedeveloper Feb 2, 2026
b0737f8
fix(ai): add OpenCode agent support with correct CLI configuration
ianwhitedeveloper Feb 5, 2026
c22c51b
docs: add PR #394 remediation plan
ianwhitedeveloper Feb 6, 2026
4aee50a
fix(ai-runner): resolve PR #394 review findings
ianwhitedeveloper Feb 6, 2026
fb5976a
feat(ai-runner): implement core module with TDD (Task 2 partial)
ericelliott Jan 23, 2026
59a4f10
fix(ai-runner): address PR review feedback
ianwhitedeveloper Jan 23, 2026
a122d25
feat: add AI test runner with timeout and TAP output
ianwhitedeveloper Jan 23, 2026
cbd22f8
feat(cli): add agent config and error-causes
ianwhitedeveloper Jan 23, 2026
03e797a
feat(ai): add E2E tests and complete documentation
ianwhitedeveloper Jan 23, 2026
5d27d22
Update path for example SudoLang source code
ericelliott Jan 23, 2026
0e9024c
feat(ai-runner): implement core AI test infrastructure
ianwhitedeveloper Feb 2, 2026
7c9b76a
feat(cli): add AI test runner CLI support
ianwhitedeveloper Feb 2, 2026
ad9d8ed
feat(debug): add debug logging infrastructure
ianwhitedeveloper Feb 2, 2026
bdbecf9
feat(output): add TAP colorization and security
ianwhitedeveloper Feb 2, 2026
0fd3674
docs(fixtures): add test fixtures and documentation
ianwhitedeveloper Feb 2, 2026
448f031
build: add dependencies for AI test runner
ianwhitedeveloper Feb 2, 2026
499998d
docs(epic): organize AI testing framework epic
ianwhitedeveloper Feb 2, 2026
3156cce
fix(ai): add OpenCode agent support with correct CLI configuration
ianwhitedeveloper Feb 5, 2026
bac3434
docs: add PR #394 remediation plan
ianwhitedeveloper Feb 6, 2026
864830c
fix(ai-runner): resolve PR #394 review findings
ianwhitedeveloper Feb 6, 2026
3d9c55f
chore: rebase main and regen lockfile
ianwhitedeveloper Feb 9, 2026
dfed8dd
docs: create draft remediation doc for PR 394
ianwhitedeveloper Feb 9, 2026
c8e99da
docs(pr394): expand remediation plan with architecture analysis
ianwhitedeveloper Feb 9, 2026
ee720c2
docs(pr394): clarify orchestrator is an AI agent
ianwhitedeveloper Feb 9, 2026
c3120cd
docs: remove redundant architecture document; remove draft status fro…
ianwhitedeveloper Feb 9, 2026
24f67da
chore: add macOS related files to gitignore
ianwhitedeveloper Feb 10, 2026
ac6697f
docs: add re-architected planning review with concerns
ianwhitedeveloper Feb 10, 2026
1a271ed
docs(pr394): add 3-agent architecture review with synthesis
ianwhitedeveloper Feb 10, 2026
328f478
Merge riteway-ai-testing-framework-implementation
claude Feb 10, 2026
6140beb
docs(ai-framework): add comprehensive architecture diagrams and analysis
claude Feb 10, 2026
e343f65
chore: ignore claude related files for now
ianwhitedeveloper Feb 10, 2026
63d1aed
Merge branch 'claude/testing-framework-diagrams-wpf8J' into riteway-a…
ianwhitedeveloper Feb 10, 2026
d6c638f
docs: two-agent refactor architecture + epic
ianwhitedeveloper Feb 10, 2026
33502b2
chore: archive outdated planning documents
ianwhitedeveloper Feb 10, 2026
9dc352a
test: remove IIFEs + use Try for error tests
ianwhitedeveloper Feb 11, 2026
9ce41c7
feat: add buildResultPrompt + buildJudgePrompt + parseTAPYAML
ianwhitedeveloper Feb 11, 2026
94cfc40
feat: add normalizeJudgment + averageScore
ianwhitedeveloper Feb 11, 2026
1465893
fix: address code review findings
ianwhitedeveloper Feb 11, 2026
b277379
feat: agent-directed imports + remove parseImports
ianwhitedeveloper Feb 11, 2026
efdfd13
feat: add score/actual/expected TAP diagnostics
ianwhitedeveloper Feb 11, 2026
a871726
docs: update epic progress T1-T7, T9 complete
ianwhitedeveloper Feb 11, 2026
1797153
feat: two-agent flow in runAITests + rawOutput
ianwhitedeveloper Feb 11, 2026
d581936
docs: update epic progress T1-T9 complete (9/13)
ianwhitedeveloper Feb 11, 2026
5090cb5
feat: Zod schema validation + centralized defaults
ianwhitedeveloper Feb 11, 2026
7c0a258
refactor: error-causes switch in ai-runner.js
ianwhitedeveloper Feb 11, 2026
971cbc8
docs: update epic progress T1-T11 complete (11/13)
ianwhitedeveloper Feb 11, 2026
35d9738
refactor: eliminate mutations in runAICommand
ianwhitedeveloper Feb 11, 2026
41b07bd
docs: update epic progress T1-T12 complete (12/13)
ianwhitedeveloper Feb 11, 2026
a11656f
docs: failure fixture + architecture flowchart
ianwhitedeveloper Feb 11, 2026
ee48a14
docs: epic complete 13/13 + flowchart rework
ianwhitedeveloper Feb 11, 2026
fe34517
refactor: review remediation H1-H3 + rename
ianwhitedeveloper Feb 11, 2026
6d68e81
fix: Zod v4 API + parseTAPYAML type annotation
ianwhitedeveloper Feb 11, 2026
af7761a
fix(color): apply --color to terminal output
ianwhitedeveloper Feb 11, 2026
84fbcbc
chore: remove unused dev dependencies
ianwhitedeveloper Feb 11, 2026
cc83873
feat(cli): add --agent-config flag
ianwhitedeveloper Feb 11, 2026
bef6f72
docs: add PR #394 remediation epic to plan
ianwhitedeveloper Feb 11, 2026
4d2830f
refactor(cli): centralize defaults, drop flag
ianwhitedeveloper Feb 11, 2026
308b05a
refactor: merge desc/requirement, add projectRoot
ianwhitedeveloper Feb 11, 2026
8531d97
docs: fix Wave 1 review findings
ianwhitedeveloper Feb 11, 2026
9bff900
refactor(tap): replace mutation with array-join
ianwhitedeveloper Feb 11, 2026
04744dc
refactor(test): convert try/catch to Try helper
ianwhitedeveloper Feb 11, 2026
3b15711
refactor(cli): extract formatZodError helper
ianwhitedeveloper Feb 11, 2026
e5cf46f
docs: mark Wave 2 complete in epic
ianwhitedeveloper Feb 11, 2026
fcb1561
refactor(ai-runner): extract to focused modules
ianwhitedeveloper Feb 11, 2026
5f9b06b
refactor: consolidate errors, colocate tests
ianwhitedeveloper Feb 11, 2026
70a7190
refactor(extractor): extract to focused modules
ianwhitedeveloper Feb 11, 2026
6372804
refactor(cli): extract to focused modules
ianwhitedeveloper Feb 11, 2026
5a967da
docs: mark Wave 3 complete in epic
ianwhitedeveloper Feb 11, 2026
4e93f1e
refactor: remove re-exports, fix formatZodError DRY
ianwhitedeveloper Feb 11, 2026
b0c7149
fix: always show auth guidance on failure
ianwhitedeveloper Feb 11, 2026
3cc0a60
test(e2e): add wrong-prompt failure fixture
ianwhitedeveloper Feb 11, 2026
98cbfc3
docs: update epic — Wave 4 complete
ianwhitedeveloper Feb 11, 2026
87c64b4
docs: update architecture with post-remediation module map
ianwhitedeveloper Feb 11, 2026
34c8e6c
docs: add test scripts + finalize epic
ianwhitedeveloper Feb 11, 2026
c855f8b
chore: archive completed remediation epic
ianwhitedeveloper Feb 11, 2026
acd15bb
docs: update flowchart + archive completed docs
ianwhitedeveloper Feb 11, 2026
5339047
fix(cli): unify error registries
ianwhitedeveloper Feb 11, 2026
33 changes: 33 additions & 0 deletions .gitignore
@@ -1,2 +1,35 @@
node_modules/
.esm-cache/
.env
.claude/settings.local.json
### Generated by gibo (https://github.com/simonwhitaker/gibo)
### https://raw.github.com/github/gitignore/fc6ce5da28a8c3480cc8a5acad050449f72a9261/Global/macOS.gitignore

# General
.DS_Store
__MACOSX/
.AppleDouble
.LSOverride
Icon[
]

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
CLAUDE.md
.claude/
43 changes: 43 additions & 0 deletions AGENTS.md
@@ -0,0 +1,43 @@
# AI Agent Guidelines

This project uses AI-assisted development with structured guidance in the `ai/` directory.

## Directory Structure

Agents should examine the `ai/*` directory listings to understand the available commands, rules, and workflows.

## Index Files

Each folder in the `ai/` directory contains an `index.md` file that describes the purpose and contents of that folder. Agents can read these index files to learn the function of files in each folder without needing to read every file.

**Important:** The `ai/**/index.md` files are auto-generated from frontmatter. Do not create or edit these files manually—they will be overwritten by the pre-commit hook.

## Progressive Discovery

Agents should only consume the root index until they need subfolder contents. For example:
- If the project is Python, there is no need to read JavaScript-specific folders
- If working on backend logic, frontend UI folders can be skipped
- Only drill into subfolders when the task requires that specific domain knowledge

This approach minimizes context consumption and keeps agent responses focused.

## Vision Document Requirement

**Before creating or running any task, agents must first read the vision document (`vision.md`) in the project root.**

The vision document serves as the source of truth for:
- Project goals and objectives
- Key constraints and non-negotiables
- Architectural decisions and rationale
- User experience principles
- Success criteria

## Conflict Resolution

If any conflicts are detected between a requested task and the vision document, agents must:

1. Stop and identify the specific conflict
2. Explain how the task conflicts with the stated vision
3. Ask the user to clarify how to resolve the conflict before proceeding

Never proceed with a task that contradicts the vision without explicit user approval.
126 changes: 126 additions & 0 deletions ARCHIVE-ORGANIZATION-SUMMARY.md
@@ -0,0 +1,126 @@
# Epic Archive Organization - Summary

**Date**: February 2, 2026
**Task**: Organize Riteway AI Testing Framework epic documentation

## Actions Completed

### 1. ✅ Updated EPIC-REVIEW-2026-02-02.md

**Updates Made**:
- Updated Cursor CLI requirement status (line 149):
  - Before: `Cursor: agent chat with --api-key | ⚠️ Modified | OAuth-aligned (no API keys), uses cursor agent`
  - After: `Cursor: agent with OAuth | ✅ Updated | Uses agent --print --output-format json, OAuth-only (no API keys)`

- Added new section: "Post-PR Updates: Cursor CLI Implementation (2026-02-02)"
  - Documented Cursor CLI testing and validation
  - Explained OAuth-only authentication decision
  - Included UAT validation results
  - Referenced CURSOR-CLI-TESTING-2026-02-02.md

- Updated removed files list:
  - Added: `source/fixtures/sample-test.sudo` (removed per PR review guidance)

### 2. ✅ Created Archive Directory Structure

Created organized epic archive:
```
tasks/archive/2026-01-22-riteway-ai-testing-framework/
├── README.md (NEW - Archive overview)
├── 2026-01-22-riteway-ai-testing-framework.md (MOVED - Original epic task)
├── EPIC-REVIEW-2026-02-02.md (MOVED - Epic review)
└── CURSOR-CLI-TESTING-2026-02-02.md (MOVED - Cursor CLI testing)
```

### 3. ✅ Created Archive README

**File**: `tasks/archive/2026-01-22-riteway-ai-testing-framework/README.md`

**Contents**:
- Epic overview and timeline
- Key features delivered
- Document descriptions (all 3 archived docs)
- Implementation statistics
- Key technical decisions
- Command reference
- Agent authentication setup
- Future enhancement opportunities
- Related PRs

### 4. ✅ Verified Codebase Health

**Git Status**:
- Archive directory properly organized
- Old locations marked for deletion
- All changes tracked

**Test Results**:
- ✅ All 73 unit tests passing
- ✅ No broken references
- ✅ Clean codebase state

## Archive Contents

### Document Hierarchy

1. **README.md** (4.3K)
   - Entry point for understanding the epic
   - Quick reference for commands and setup
   - Links to detailed documentation

2. **2026-01-22-riteway-ai-testing-framework.md** (15K)
   - Original task specification
   - Functional requirements
   - Acceptance criteria
   - Technical requirements

3. **EPIC-REVIEW-2026-02-02.md** (22K)
   - Complete implementation review
   - Task completion verification
   - Architecture documentation
   - Test coverage analysis
   - Post-PR updates section

4. **CURSOR-CLI-TESTING-2026-02-02.md** (4.6K)
   - Cursor CLI validation
   - OAuth implementation details
   - UAT test results
   - Technical specifications

## Benefits of This Organization

1. **Clean Project Root**: Epic documentation moved from project root to dedicated archive
2. **Clear Context**: All epic-related docs in one place with overview README
3. **Easy Navigation**: README provides quick access to all epic information
4. **Historical Record**: Complete audit trail from planning → implementation → testing
5. **PR Ready**: Clean codebase state ready for next development cycle

## Next Steps

The codebase is now clean and ready for:
- Creating new feature branches
- Starting next epic
- PR review process
- Documentation updates

## Related Files

### Modified (Current Changes):
- `README.md` - Cursor CLI authentication section updated
- `bin/riteway` - Cursor agent config (OAuth-only)
- `bin/riteway.test.js` - Tests updated for OAuth
- `source/ai-runner.js` - Authentication error messages updated

### Deleted (Cleanup):
- `source/fixtures/sample-test.sudo` - Removed per PR guidance
- `ai-evals/*.tap.md` - Test output files (generated, not source)
- Root-level epic documents - Moved to archive

### Added:
- `tasks/archive/2026-01-22-riteway-ai-testing-framework/` - Complete epic archive

---

**Archive Organization Complete** ✅

The Riteway AI Testing Framework epic is now properly documented and archived, with the codebase clean and ready for the next development cycle.
172 changes: 172 additions & 0 deletions README.md
@@ -27,6 +27,178 @@ Riteway's structured approach makes it ideal for AIDD:
- **Simple API**: Minimal surface area reduces AI confusion and hallucinations
- **Token efficient**: Concise syntax saves valuable context window space

## Testing AI Prompts with `riteway ai`

Riteway includes a CLI command, `riteway ai`, for testing AI agent prompts and evaluating their outputs, so you can test your prompts as rigorously as you test your code.

### Prerequisites

Riteway AI tests use CLI tools with OAuth authentication (not API keys). This ensures tests run against your subscription rather than usage-based billing.

**Claude CLI** (default):
```bash
# Set up OAuth token (one-time)
claude setup-token
```
Docs: https://docs.claude.ai/docs/cli-authentication

**OpenCode CLI**:
Follow provider setup instructions at https://opencode.ai/docs/cli/

**Cursor CLI**:
```bash
# Set up OAuth authentication (one-time)
agent login

# Verify authentication
agent status
```
Docs: https://docs.cursor.com/context/rules-for-ai

### Quick Start

```bash
# Run AI prompt tests with default settings (4 runs, 75% pass threshold)
riteway ai test/my-prompt.sudo

# Customize test runs and pass threshold
riteway ai test/my-prompt.sudo --runs 10 --threshold 80

# Use different AI agents
riteway ai test/my-prompt.sudo --agent opencode
riteway ai test/my-prompt.sudo --agent cursor
```

### How It Works

AI prompt testing handles non-deterministic AI responses by:

1. **Multiple runs**: Executes each test multiple times (default: 4)
2. **Pass threshold**: Requires a percentage of runs to pass (default: 75%)
3. **Parallel execution**: Runs tests in parallel for speed
4. **Isolated contexts**: Each run gets a fresh subprocess for clean state
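
The multiple-run and pass-threshold steps above can be sketched roughly as follows. This is a minimal illustration, not Riteway's actual implementation: `runWithThreshold` and `runOnce` are hypothetical names, and it assumes that a pass rate exactly equal to the threshold counts as a pass. Subprocess isolation is left out; `runOnce` stands in for whatever launches a fresh agent process.

```javascript
// Hypothetical helper (not part of Riteway's API): run one test several
// times in parallel and apply a pass-percentage threshold to the results.
const runWithThreshold = async ({ runOnce, runs = 4, threshold = 75 }) => {
  // Start every run up front so they execute concurrently.
  const results = await Promise.all(
    Array.from({ length: runs }, (_, index) => runOnce(index))
  );

  // Count only runs that explicitly reported `passed: true`.
  const passCount = results.filter(({ passed }) => passed === true).length;
  const passRate = (passCount / runs) * 100;

  // Assumption: meeting the threshold exactly is treated as a pass.
  return { passCount, runs, passRate, passed: passRate >= threshold };
};
```

With the defaults, three passing runs out of four give a pass rate of 75, which meets the default threshold.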

### Test File Format

AI test files are written in SudoLang/Markdown and passed directly to AI agents:

```markdown
# Sample AI Test

Test the basic math capability of the AI.

## Requirements

- Given a simple addition prompt, should correctly add 2 + 2
- Given the result, should output the number 4
- Given the response format, should provide a JSON response with "passed" and "output" fields

## Test Prompt

What is 2 + 2? Please respond with JSON in this format:
{
  "passed": true,
  "output": "2 + 2 = 4"
}
```

### Test Output

Results are saved to `ai-evals/` with:

- **TAP format**: Standard Test Anything Protocol output
- **Rich formatting**: Colorized, markdown-compatible
- **Unique identifiers**: Date-stamped with unique slugs
- **Browser preview**: Automatically opens results in your browser

Example output path:
```
ai-evals/2026-01-23-my-prompt-abc123.tap.md
```
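
One way such a path could be assembled is sketched below. `buildOutputPath` is a hypothetical helper, not a Riteway function; the slug is passed in by the caller because the README does not say how the unique suffix is generated, and the date is assumed to be taken in UTC.

```javascript
// Hypothetical helper: builds a date-stamped result path such as
// "ai-evals/2026-01-23-my-prompt-abc123.tap.md".
const buildOutputPath = ({ testFile, slug, date = new Date(), dir = 'ai-evals' }) => {
  // "test/my-prompt.sudo" -> "my-prompt"
  const baseName = testFile.split('/').pop().replace(/\.[^.]+$/, '');

  // Keep only the YYYY-MM-DD part of the ISO timestamp (UTC, by assumption).
  const stamp = date.toISOString().slice(0, 10);

  return `${dir}/${stamp}-${baseName}-${slug}.tap.md`;
};
```

Taking the slug as an argument keeps the function deterministic, which makes it easy to unit test.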

### Supported Agents

Riteway supports multiple AI agents:

- **Claude Code** (default): `--agent claude`
- **OpenCode**: `--agent opencode`
- **Cursor Agent**: `--agent cursor`

Each agent runs in a subprocess with proper JSON output configuration.

### CLI Options

```bash
riteway ai <test-file> [options]

Options:
  --runs <n>        Number of test runs (default: 4)
  --threshold <p>   Required pass percentage 0-100 (default: 75)
  --agent <name>    AI agent to use: claude|opencode|cursor (default: claude)
  --debug           Enable debug output to console
  --debug-log       Enable debug output and save to auto-generated log file
```
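
As a rough illustration of how these defaults and ranges fit together, the sketch below merges parsed flags with the documented defaults and checks them. It is hand-rolled for clarity and is not the project's validation code; `normalizeOptions`, `defaults`, and `supportedAgents` are hypothetical names, and the error messages are invented for the example.

```javascript
// Documented defaults from the options table above.
const defaults = { runs: 4, threshold: 75, agent: 'claude', debug: false, debugLog: false };
const supportedAgents = ['claude', 'opencode', 'cursor'];

// Hypothetical helper: merge parsed flags with defaults and validate ranges.
const normalizeOptions = (flags = {}) => {
  const options = { ...defaults, ...flags };

  // CLI values usually arrive as strings, so coerce before checking.
  const runs = Number(options.runs);
  const threshold = Number(options.threshold);

  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError(`--runs must be a positive integer, got ${options.runs}`);
  }
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 100) {
    throw new RangeError(`--threshold must be between 0 and 100, got ${options.threshold}`);
  }
  if (!supportedAgents.includes(options.agent)) {
    throw new Error(`--agent must be one of ${supportedAgents.join('|')}, got ${options.agent}`);
  }

  // --debug-log also enables debug output, per the option description.
  return { ...options, runs, threshold, debug: options.debug || options.debugLog };
};
```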

#### Debug Mode and Logging

Use `--debug` to see detailed information about test execution in the console:

```bash
# Debug output to console only
riteway ai test/my-prompt.sudo --debug

# Debug output to console AND save to auto-generated log file
riteway ai test/my-prompt.sudo --debug-log
```

When using `--debug-log`, the log file is automatically created in the `ai-evals/` directory with a timestamped filename matching the pattern: `YYYY-MM-DD-testname-xxxxx.debug.log`.

Debug mode shows:
- Agent command execution details
- Prompt lengths and content previews
- JSON parsing steps
- Test extraction and evaluation results
- Process exit codes and timing

### Example: Testing a Prompt

Create a test file `tests/math-test.sudo`:

```markdown
# Math Test

## Test Prompt
Solve: What is 15 + 27?

Respond in JSON:
{
  "passed": true,
  "output": "42"
}
```

Run the test:

```bash
riteway ai tests/math-test.sudo --runs 5 --threshold 80
```

Output:

```
TAP version 13
ok 1
ok 2
ok 3
ok 4
ok 5
1..5
# tests 5
# pass 5

Test results saved to: ai-evals/2026-01-23-math-test-xyz789.tap.md
```
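
For reference, TAP text in the shape shown above can be produced from a list of run results with a few lines of code. This is only a sketch: `formatTap` is a hypothetical name, and Riteway's real output may carry additional diagnostics or colorization that are not reproduced here.

```javascript
// Hypothetical helper: render run results as TAP text in the same shape
// as the sample output (version line, one line per run, plan, summary).
const formatTap = (results) => {
  const lines = results.map(
    ({ passed }, index) => `${passed ? 'ok' : 'not ok'} ${index + 1}`
  );
  const passCount = results.filter(({ passed }) => passed).length;

  // Build the report as an array and join once, avoiding string mutation.
  return [
    'TAP version 13',
    ...lines,
    `1..${results.length}`,
    `# tests ${results.length}`,
    `# pass ${passCount}`,
  ].join('\n');
};
```

A failing run would appear as a `not ok <n>` line and would lower the `# pass` count.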

## The 5 Questions Every Test Must Answer

There are [5 questions every unit test must answer](https://medium.com/javascript-scene/what-every-unit-test-needs-f6cd34d9836d). Riteway forces you to answer them.