2 changes: 1 addition & 1 deletion AGENTS.md
@@ -86,4 +86,4 @@ Update docs in the same PR when changing:
- Security: `docs/SECURITY.md`
- Reliability: `docs/RELIABILITY.md`
- Autonomous E2E workflow: `docs/runbooks/autonomous-agent-e2e.md`

- PR evaluation harness: `docs/runbooks/pr-eval-harness.md`
29 changes: 29 additions & 0 deletions README.md
@@ -31,6 +31,35 @@ A file-first knowledge base with Git-backed approval workflows. Edit notes local
- [ARCHITECTURE.md](ARCHITECTURE.md) — Domain boundaries and flows
- [AGENTS.md](AGENTS.md) — Entry point for AI agents
- [docs/](docs/) — Design docs, runbooks, security, reliability
- [docs/runbooks/pr-eval-harness.md](docs/runbooks/pr-eval-harness.md) — Review another branch from an isolated `main`-based worktree

## Review Another PR Locally

Use the local harness when you want to evaluate someone else's branch without
switching your active checkout:

```bash
python3 scripts/eval_pr.py <target-ref>
```

Common examples:

```bash
python3 scripts/eval_pr.py origin/some-branch
python3 scripts/eval_pr.py my-local-branch --keep-temp
python3 scripts/eval_pr.py HEAD --tests-only
```

The harness:

- creates a temp worktree based on local `main`
- resolves symbolic refs like `HEAD` in your current checkout before populating that worktree
- runs stable targeted tests for changed areas
- can launch `kb-server` and `vault-sync` in `tmux`
- exercises the current mainline write/sync workflow end to end

See [docs/runbooks/pr-eval-harness.md](docs/runbooks/pr-eval-harness.md) for
prerequisites, flags, and failure inspection.

## Docs Checks Before PR

7 changes: 3 additions & 4 deletions docs/exec-plans/active/README.md
@@ -1,10 +1,9 @@
---
owner: platform
status: draft
last_verified: 2026-03-21
last_verified: 2026-03-16
source_of_truth:
- ../completed/README.md
- ../../PLANS.md
related_code:
- ../../../scripts/docs_lint.py
related_tests:
@@ -14,7 +13,7 @@ review_cycle_days: 30

# Active Plans

Place in-progress execution plans here.
Move in-progress execution plans here while work is underway.

- Add one file per active plan.
- Move completed plans to `../completed/`.
- Keep active plan filenames stable so `docs/PLANS.md` and agent navigation remain valid.
4 changes: 2 additions & 2 deletions docs/generated/api-surface.md
@@ -1,7 +1,7 @@
---
owner: platform
status: generated
last_verified: 2026-03-07
last_verified: 2026-03-22
source_of_truth:
- ../../kb-server/app/api/routes/health.py
- ../../kb-server/app/api/routes/notes.py
@@ -15,7 +15,7 @@ review_cycle_days: 7

# API Surface (Generated)

Generated on `2026-03-07` from route handlers.
Generated on `2026-03-22` from route handlers.

| Method | Path |
| --- | --- |
4 changes: 2 additions & 2 deletions docs/generated/env-catalog.md
@@ -1,7 +1,7 @@
---
owner: platform
status: generated
last_verified: 2026-03-07
last_verified: 2026-03-22
source_of_truth:
- ../../kb-server/.env.example
- ../../kb-server/app/core/config.py
@@ -16,7 +16,7 @@ review_cycle_days: 7

# Environment Catalog (Generated)

Generated on `2026-03-07` from settings and env sources.
Generated on `2026-03-22` from settings and env sources.

## kb-server `.env.example`

3 changes: 2 additions & 1 deletion docs/index.md
@@ -49,4 +49,5 @@ review_cycle_days: 14
- `runbooks/incident-response.md`
- `runbooks/backup-restore.md`
- `runbooks/autonomous-agent-e2e.md`

- `runbooks/local-role-auth-e2e.md`
- `runbooks/pr-eval-harness.md`
137 changes: 137 additions & 0 deletions docs/runbooks/pr-eval-harness.md
@@ -0,0 +1,137 @@
---
owner: platform
status: draft
last_verified: 2026-03-16
source_of_truth:
- ../../scripts/eval_pr.py
- ../../kb-server/app/api/routes/notes.py
- ../../vault-sync/vault_sync/api_client.py
related_code:
- ../../kb-server/app/services/git_batcher.py
- ../../vault-sync/vault_sync/sync.py
related_tests:
- ../../kb-server/tests/test_current_view.py
- ../../kb-server/tests/test_source_and_delete.py
- ../../vault-sync/tests/test_api_client.py
- ../../vault-sync/tests/test_sync.py
review_cycle_days: 14
---

# PR Evaluation Harness

Use `scripts/eval_pr.py` to evaluate a target ref in an isolated worktree created
from local `main`.

The harness is meant for reviewing other people's branches without touching your
active checkout. It provisions a temp worktree, temp vault, temp bare remote,
temp SQLite database, and a tmux-backed local stack.
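
As a rough illustration of how those temp resources could sit under a single root, here is a minimal sketch. The directory and file names (`worktree`, `vault`, `remote.git`, `kb.sqlite3`, `logs`) and the `fd-eval-` prefix are hypothetical; this runbook only states that each resource is temporary.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TempLayout:
    """Paths for one harness run. Every name here is illustrative."""

    root: Path
    worktree: Path
    vault: Path
    bare_remote: Path
    sqlite_db: Path
    logs: Path


def make_layout(prefix: str = "fd-eval-") -> TempLayout:
    # One temp root keeps all artifacts of a run together for later inspection.
    root = Path(tempfile.mkdtemp(prefix=prefix))
    layout = TempLayout(
        root=root,
        worktree=root / "worktree",
        vault=root / "vault",
        bare_remote=root / "remote.git",
        sqlite_db=root / "kb.sqlite3",
        logs=root / "logs",
    )
    # Only plain directories are created here. The worktree and the bare
    # remote would be created by git itself, and the database by the server.
    layout.vault.mkdir()
    layout.logs.mkdir()
    return layout
```

Keeping everything under one root means a preserved run can be inspected, and later deleted, as a single directory.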

## Prerequisites

- local `main` exists and is recent enough to serve as the comparison base
- repo-local virtualenvs already exist:
  - `kb-server/.venv`
  - `vault-sync/.venv`
- `git` and `tmux` are installed

## Default flow

```bash
cd /path/to/flight-deck
python3 scripts/eval_pr.py <target-ref>
```

Example refs:

```bash
python3 scripts/eval_pr.py origin/some-branch
python3 scripts/eval_pr.py my-local-branch
python3 scripts/eval_pr.py HEAD
```

Symbolic refs such as `HEAD` are resolved in your current checkout before the temp
worktree is created, so the harness evaluates the commit you asked for rather than
the temp worktree's initial `main` HEAD.
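
The ordering matters because `HEAD` means something different inside the new worktree. A minimal sketch of that sequence, assuming plain `git` subprocess calls; the function names are hypothetical and the real `scripts/eval_pr.py` may be structured differently:

```python
import subprocess
from pathlib import Path


def rev_parse_cmd(ref: str) -> list[str]:
    # "^{commit}" peels tags and rejects refs that do not name a commit.
    return ["git", "rev-parse", "--verify", f"{ref}^{{commit}}"]


def worktree_cmds(worktree: Path, sha: str, base: str = "main") -> list[list[str]]:
    return [
        # Create the worktree from the base branch without checking out a branch.
        ["git", "worktree", "add", "--detach", str(worktree), base],
        # Then move the worktree to the commit that was resolved beforehand.
        ["git", "-C", str(worktree), "checkout", "--detach", sha],
    ]


def prepare_worktree(ref: str, repo: Path, worktree: Path) -> str:
    # Resolve first, in the caller's checkout, where HEAD still means
    # "the commit I am on" rather than the worktree's initial main HEAD.
    resolved = subprocess.run(
        rev_parse_cmd(ref), cwd=repo, check=True, capture_output=True, text=True
    )
    sha = resolved.stdout.strip()
    for cmd in worktree_cmds(worktree, sha):
        subprocess.run(cmd, cwd=repo, check=True)
    return sha
```

Passing the resolved SHA to the second step, instead of the original ref string, is what keeps symbolic refs from being reinterpreted inside the worktree.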

By default the harness:

- creates a temp git worktree from `main`
- resolves `<target-ref>` to a commit in your current checkout, then checks out that commit inside the temp worktree
- runs targeted `kb-server` and `vault-sync` tests when those areas changed
- starts `kb-server` in `tmux`
- starts `vault-sync` in `tmux`
- validates current main behavior:
  - API key enforcement using `KB_API_KEY`
  - `source=human` writes commit directly to `main`
  - default API writes batch into `kb-api/YYYY-MM-DD`
  - `view=current` includes pending branch content
  - `vault-sync` pulls `view=current` and pushes local edits with `source=human`
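
Two rules from the list above are easy to sketch: choosing which test areas to run from the changed files, and forming the `kb-api/YYYY-MM-DD` batch branch name. The function names are hypothetical, matching on the top-level directory is an assumption, and this runbook does not say whether the date is local or UTC.

```python
from datetime import date

# Top-level directories named in this runbook.
AREAS = ("kb-server", "vault-sync")


def changed_areas(changed_files: list[str]) -> list[str]:
    """Return the areas whose targeted tests should run, in a stable order."""
    # Assumption: an area counts as changed when any file under it changed.
    touched = {path.split("/", 1)[0] for path in changed_files}
    return [area for area in AREAS if area in touched]


def batch_branch(day: date) -> str:
    # date.isoformat() yields YYYY-MM-DD, matching the pattern in the list above.
    return f"kb-api/{day.isoformat()}"
```

For example, a change set touching only `docs/index.md` selects no test area, while one touching `kb-server/app/api/routes/notes.py` selects `kb-server`.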

The default `kb-server` test subset intentionally avoids `tests/test_notes_api.py`
because that file is currently failing on `main`; auth behavior is covered by the
harness's own HTTP smoke checks instead.
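
One way to express such an exclusion is pytest's `--ignore` option. The sketch below is an assumption about the approach, not a description of the script: the real harness may instead pass an explicit list of test files, and the `.venv/bin/python` path assumes a POSIX virtualenv layout.

```python
from pathlib import Path

# File this runbook says is currently failing on main.
KNOWN_FAILING = ("tests/test_notes_api.py",)


def pytest_cmd(
    python: Path,
    test_root: str = "tests",
    ignored: tuple[str, ...] = KNOWN_FAILING,
) -> list[str]:
    # Run pytest through the area's own interpreter so its dependencies resolve.
    cmd = [str(python), "-m", "pytest", test_root]
    # --ignore=<path> skips collection of that file entirely.
    cmd += [f"--ignore={path}" for path in ignored]
    return cmd
```

Called as `pytest_cmd(Path("kb-server/.venv/bin/python"))`, this would be run with `kb-server` as the working directory so the relative test paths resolve.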

## Useful flags

```bash
python3 scripts/eval_pr.py <target-ref> --keep-temp
python3 scripts/eval_pr.py <target-ref> --tests-only
python3 scripts/eval_pr.py <target-ref> --e2e-only
python3 scripts/eval_pr.py <target-ref> --no-sync
python3 scripts/eval_pr.py <target-ref> --port 8021
python3 scripts/eval_pr.py <target-ref> --tmux-session-name fd-review
```
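
For orientation, the flags above could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser: the help strings are inferred from the flag names, and the defaults are not stated in this runbook, so `None` stands in for "use the script's own default".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="eval_pr.py")
    parser.add_argument("target_ref", help="branch, tag, or commit to evaluate")
    parser.add_argument("--keep-temp", action="store_true",
                        help="preserve the temp environment after a successful run")
    # The next three descriptions are guesses based on the flag names.
    parser.add_argument("--tests-only", action="store_true",
                        help="run targeted tests only")
    parser.add_argument("--e2e-only", action="store_true",
                        help="run the end-to-end checks only")
    parser.add_argument("--no-sync", action="store_true",
                        help="do not start vault-sync")
    # Real defaults are unknown here; None is a placeholder.
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--tmux-session-name", default=None)
    return parser
```

`argparse` maps `--keep-temp` to `args.keep_temp` and `--tmux-session-name` to `args.tmux_session_name`, so the attribute names follow directly from the flags.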

## Recommended review flow

For a normal review pass:

```bash
git fetch origin
python3 scripts/eval_pr.py origin/<branch-name>
```

For a quicker check when you only want test feedback:

```bash
python3 scripts/eval_pr.py origin/<branch-name> --tests-only
```

For debugging a runtime issue and keeping the temp environment around:

```bash
python3 scripts/eval_pr.py origin/<branch-name> --keep-temp
```

## Artifacts and inspection

The script prints:

- temp root
- temp worktree path
- changed files relative to `main`
- interpreter paths
- logs directory
- tmux session name

On failure it keeps the temp root and any harness-created worktree/tmux session so
you can inspect:

- env files
- runtime logs
- worktree contents
- tmux panes

If the requested tmux session name already exists, the harness stops before
starting services and leaves that pre-existing session untouched.
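
A pre-flight check like this can be built on `tmux has-session`, which exits with status 0 when the session exists. The sketch below is an assumption about how such a guard might look; the function names and the error message are hypothetical.

```python
import subprocess
from typing import Callable


def session_exists(
    name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    # The "=" prefix asks tmux for an exact session-name match
    # rather than its default prefix matching.
    result = run(["tmux", "has-session", "-t", f"={name}"], capture_output=True)
    return result.returncode == 0


def ensure_session_free(
    name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    # Stop before starting any service so the existing session is never touched.
    if session_exists(name, run):
        raise SystemExit(f"tmux session {name!r} already exists; refusing to reuse it")
```

The injectable `run` argument exists only so the check can be exercised without a tmux server.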

Useful commands after a failed run:

```bash
tmux attach -t <session-name>
tmux capture-pane -pt <session-name>:0.0
tmux capture-pane -pt <session-name>:0.1
```

`--keep-temp` uses the same preservation behavior after a successful run. Remove the
temp directory and any preserved worktree/tmux session yourself after inspection.
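
A minimal sketch of that manual cleanup, expressed as the commands one might run. It assumes the paths and session name are the ones the script printed; the helper name is hypothetical, and the `rm -rf` step should only ever point at the harness's temp root.

```python
import shlex
from pathlib import Path


def cleanup_cmds(repo: Path, worktree: Path, temp_root: Path, session: str) -> list[list[str]]:
    return [
        # Stop the services first so nothing is still writing into the temp root.
        ["tmux", "kill-session", "-t", session],
        # Remove the worktree and its bookkeeping in the main repository.
        ["git", "-C", str(repo), "worktree", "remove", "--force", str(worktree)],
        # Drop any stale worktree entries left behind.
        ["git", "-C", str(repo), "worktree", "prune"],
        # Finally delete the remaining temp artifacts.
        ["rm", "-rf", str(temp_root)],
    ]


def render(cmds: list[list[str]]) -> str:
    # shlex.join quotes each argument so the lines can be pasted into a shell.
    return "\n".join(shlex.join(cmd) for cmd in cmds)
```

Printing `render(cleanup_cmds(...))` and reviewing the output before running it is safer than executing the commands directly.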