diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index ee5f161..1bf7dd4 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -10,6 +10,14 @@
     "license": "MIT"
   },
   "plugins": [
+    {
+      "name": "agentic-harness",
+      "source": "./agentic-harness",
+      "description": "Stand up, assess, and maintain an agentic harness in an existing repo — generate project-specific agent teams and the skills they use, then assess how effectively they are used",
+      "version": "0.1.0",
+      "category": "engineering",
+      "tags": ["harness", "agents", "skills", "scaffolding", "orchestration", "multi-agent", "meta-skill"]
+    },
     {
       "name": "developer-tools",
       "source": "./developer-tools",
diff --git a/CLAUDE.md b/CLAUDE.md
index 7d8b7e2..1a98469 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -47,3 +47,9 @@ This is a static marketplace — no build steps. Plugins are collections of mark
 ## Shell script conventions
 
 Scripts must be macOS (BSD) compatible — no `grep -P`, no `head -n -1`. Use `grep -E`/`grep -oE` and `awk` instead. Avoid `((VAR++))` with `set -e` (fails when VAR=0); use `VAR=$((VAR + 1))`.
+
+## Git / PR Working Policy
+- Worktree usage: disabled
+- Worktree location: n/a (topic branch in main checkout)
+- Base branch: main
+- Recorded on: 2026-06-01
diff --git a/README.md b/README.md
index da835bb..5c5c2c4 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ A curated collection of [Claude Code](https://claude.com/claude-code) plugins fo
 
 | Plugin | Description | Category | Version |
 |--------|-------------|----------|---------|
+| [agentic-harness](./agentic-harness) | Stand up, assess, and maintain an agentic harness — generate project-specific agent teams and the skills they use, then assess how effectively they are used | Engineering | 0.1.0 |
 | [developer-tools](./developer-tools) | Developer environment tooling — devcontainer generation, stack detection, infrastructure config | Engineering | 2.4.0 |
 | [human-resources](./human-resources) | HR interview workflow — job descriptions, pre-screening, interview prep, evaluation, compliance | Human Resources | 0.2.0 |
 | [kaizen](./kaizen) | Continuous improvement loops — recursive optimization engine with profiles for Claude Code usage, refactoring, and process improvement | Engineering | 1.0.0 |
@@ -13,6 +14,8 @@ A curated collection of [Claude Code](https://claude.com/claude-code) plugins fo
 | [project-management](./project-management) | SOW writing, review, estimation, and PMI-compliant PERT analysis — integrated project management pipeline | Operations | 1.0.0 |
 | [tech-writing](./tech-writing) | Technical writing support — documentation structure, style guides, content review | Documentation | 0.1.0 |
 
+> The [`agentic-harness`](./agentic-harness) plugin is inspired by the open-source `harness` plugin by revfactory (Apache-2.0). It is an independent reimplementation under MIT, with its own structure and prose.
+
 ## Installation
 
 Add this marketplace to Claude Code:
diff --git a/agentic-harness/.claude-plugin/plugin.json b/agentic-harness/.claude-plugin/plugin.json
new file mode 100644
index 0000000..a62ff7f
--- /dev/null
+++ b/agentic-harness/.claude-plugin/plugin.json
@@ -0,0 +1,22 @@
+{
+  "name": "agentic-harness",
+  "version": "0.1.0",
+  "description": "Stand up, assess, and maintain an agentic harness in an existing repo. A meta-tool that generates project-specific agent teams and the skills they use, then assesses how effectively they are used. Two skills: harness-setup (build, extend, maintain) and harness-review (read-only assessment).",
+  "author": {
+    "name": "MrBogomips",
+    "url": "https://github.com/MrBogomips"
+  },
+  "license": "MIT",
+  "keywords": [
+    "harness",
+    "agentic-harness",
+    "agent-team",
+    "skill-architect",
+    "meta-skill",
+    "orchestration",
+    "scaffolding",
+    "agent-scaffolding",
+    "multi-agent",
+    "claude-code-plugin"
+  ]
+}
diff --git a/agentic-harness/README.md b/agentic-harness/README.md
new file mode 100644
index 0000000..b28c12d
--- /dev/null
+++ b/agentic-harness/README.md
@@ -0,0 +1,28 @@
+# agentic-harness
+
+Stand up, assess, and maintain an **agentic harness** inside an existing repository — the project-local agents, skills, and orchestrator that make a repo work well with Claude Code.
+
+This plugin does not do your domain work. It builds and maintains the agents and skills that do.
+
+## The two skills
+
+- **`harness-setup`** — the writer. Analyzes the project, designs an agent team and the skills they use, generates them into `.claude/`, builds an orchestrator, and registers a pointer in `CLAUDE.md`. Also extends an existing harness, applies a review context, and records every change.
+- **`harness-review`** — read-only. Inventories the harness, detects drift, and assesses how effectively the skills and agents are actually used (from project memory, the `CLAUDE.md` pointer, and the `.claude/` inventory), then produces a prioritized *review context* that `harness-setup` can act on.
+
+The loop: **review → setup → review again.**
+
+## Skills, not commands
+
+Invoke a skill directly (`/agentic-harness:harness-setup`) or let Claude trigger it from context. This plugin ships no slash commands — Claude Code merged commands into skills.
+
+## Execution modes
+
+`harness-setup` defaults to an **agent team** and falls back to **subagents** when the experimental team tools are unavailable. See `shared/execution-modes.md`.
+
+## Relationship to `kaizen`
+
+`kaizen` optimizes assets against measured KPIs. `agentic-harness` builds, assesses, and maintains the harness those assets live in. They compose.
+
+## Inspired by
+
+The harness concept is inspired by prior work in the community; this is an independent implementation. Credit lives in the repository README.
diff --git a/agentic-harness/shared/claude-md-pointer.md b/agentic-harness/shared/claude-md-pointer.md
new file mode 100644
index 0000000..d3cdf53
--- /dev/null
+++ b/agentic-harness/shared/claude-md-pointer.md
@@ -0,0 +1,61 @@
+# The CLAUDE.md pointer
+
+A target project's `CLAUDE.md` is loaded into context at the start of every session. That
+makes it the right place to record that a harness exists and when it should fire — and the
+wrong place for anything the file system already holds.
+
+## Register a minimal pointer
+
+After a harness is built or changed, write (or update) one short section in the target
+project's `CLAUDE.md`. It carries three things and nothing more:
+
+1. The harness's **goal**, in one line.
+2. The **trigger rule** — which orchestrator skill to use, and for which kind of request.
+3. A **change-history** table.
+
+This is enough for a fresh session: the trigger rule routes domain requests to the
+orchestrator, and the orchestrator handles the rest from the files under `.claude/`.
+
+### Template
+
+````markdown
+## Harness: {domain}
+
+**Goal:** {one line on what this harness produces}
+
+**Trigger:** For {domain} work, use the `{orchestrator-skill-name}` skill. Answer simple
+questions directly.
+
+**Change history:**
+| Date | Change | Target | Reason |
+|------|--------|--------|--------|
+| {YYYY-MM-DD} | Initial setup | All | — |
+````
+
+## What not to put here
+
+Leave these out of `CLAUDE.md`:
+
+- The agent list or the skill list — they live in `.claude/agents/` and `.claude/skills/`,
+  and the orchestrator already knows them. Copying them here creates a second source of
+  truth that drifts.
+- The directory structure — readable straight from the file system.
+- Detailed execution rules — they belong in the skills and the orchestrator.
+
+The pointer is a signpost, not a manifest. Keep it small enough that it stays correct.
+
+## The change-history table
+
+Every write to the harness appends a row. The columns are fixed:
+
+| Date | Change | Target | Reason |
+|------|--------|--------|--------|
+| 2026-04-05 | Initial setup | All | — |
+| 2026-04-07 | Added QA agent | `agents/qa.md` | deliverable-quality gaps reported |
+| 2026-04-10 | Added tone guidance | `skills/content-writer` | output read as too stiff |
+
+The table earns its place: it shows how the harness has evolved and why, which makes
+regressions visible and gives the next reviewer a starting point. Recording history is a
+required step of every harness write — not an optional courtesy.
+
+On periodic review this table can be pruned, preserving valuable information and keeping it readable.
diff --git a/agentic-harness/shared/execution-modes.md b/agentic-harness/shared/execution-modes.md
new file mode 100644
index 0000000..3a2baa4
--- /dev/null
+++ b/agentic-harness/shared/execution-modes.md
@@ -0,0 +1,85 @@
+# Execution modes
+
+A harness coordinates more than one agent. There are three ways to run that coordination.
+Pick one per harness — or, in hybrid, one per phase — and state the choice explicitly in
+the orchestrator.
+
+## Before you choose: the team-tools caveat
+
+The agent-team mode depends on experimental tools: `TeamCreate`, `SendMessage`,
+`TaskCreate`/`TaskUpdate`, `TeamDelete`. These are **not guaranteed to be available** in a
+given Claude Code build, session, or permission setup. Treat their presence as a runtime
+fact to check, not an assumption.
+
+Because of that, every team-mode harness must define a **subagent fallback**. The subagent
+mode uses only the `Agent` tool, which is broadly available. A harness that can only run as
+a team is brittle; a harness that prefers a team and falls back to subagents is not. Build
+the fallback at the same time as the team path, not later.
+
+When the team tools are missing, do not fail — re-route to the subagent equivalent
+(`Agent` with `run_in_background`) and note in the run that team coordination was
+unavailable. The mapping below makes that re-route mechanical.
+
+## The three modes
+
+| Mode | Use when | Mechanism |
+|------|----------|-----------|
+| **Agent team** (default) | Two or more agents collaborate and benefit from real-time exchange — sharing findings, challenging each other, reconciling conflicts | Members run as peers; coordinate via `TeamCreate` + `SendMessage` + a shared task list (`TaskCreate`/`TaskUpdate`) |
+| **Subagent** (fallback and lightweight default) | A single agent's work, or several independent jobs where only the result matters and inter-agent talk would be overhead | The orchestrator calls the `Agent` tool directly; parallelize with `run_in_background` and collect return values |
+| **Hybrid** | Phases differ in character — e.g. independent collection, then consensus integration | Choose the mode per phase; state each phase's mode in the orchestrator |
+
+The team mode is the *preferred* default when agents genuinely need to talk: cross-checking
+and shared discovery raise quality in a way isolated subagents cannot. But "preferred" is
+conditional on the tools being present and on the work actually needing coordination. When
+either is false, subagents are the right call, not a downgrade.
+
+## Decision order
+
+1. Is this a single agent? → **subagent**. One agent needs no team.
+2. Two or more agents: do they need to exchange information mid-task (challenge findings,
+   resolve conflicts, hand off partial state)? → if yes, **agent team**; if no — only the
+   final results combine — **subagent** is enough and cheaper.
+3. Are the team tools unavailable? → **subagent fallback**, using the mapping below.
+4. Do phases differ markedly in whether coordination helps? → **hybrid**, mode stated per
+   phase.
+
+## Team-to-subagent fallback mapping
+
+When team mode is unavailable, translate it mechanically:
+
+| Team-mode mechanism | Subagent equivalent |
+|---------------------|---------------------|
+| `TeamCreate` + member prompts | One `Agent` call per member, each with its role prompt |
+| Parallel members | `Agent` calls with `run_in_background: true`, collected after |
+| `SendMessage` between members | No direct channel — members can't talk; route shared context through files the orchestrator writes and each agent reads |
+| `TaskCreate`/`TaskUpdate` shared list | The orchestrator holds the task list itself and assigns work in the spawn prompts |
+| Leader synthesis after team | Orchestrator reads each agent's output file/return value and integrates |
+
+The cost of the fallback is real: subagents cannot debate or revise each other mid-flight.
+Where the team relied on that, compensate by adding an explicit integration or review step
+that reconciles the independent outputs afterward.
+
+## Data-passing per mode
+
+State the data-passing method inside the orchestrator. Match it to the mode:
+
+| Method | How | Mode | Fits |
+|--------|-----|------|------|
+| Message-based | direct member-to-member via `SendMessage` | team | real-time coordination, lightweight state hand-off |
+| Task-based | shared work state via `TaskCreate`/`TaskUpdate` | team | progress tracking, dependency management |
+| File-based | write and read files at agreed paths | team + subagent | large or structured deliverables, audit trail |
+| Return-value-based | the `Agent` tool's return message | subagent | the orchestrator collects results directly |
+
+- **Team mode:** task-based for coordination + file-based for deliverables + message-based
+  for live exchange.
+- **Subagent mode:** return-value-based for results + file-based for anything large.
+- **Hybrid:** apply the matching combination per phase, and check the hand-off at phase
+  boundaries — when a team phase feeds a subagent phase, the team's output files become the
+  subagent's inputs.
+
+### File-passing convention
+
+- Keep intermediate work under a `_agents_workspace/` directory in the working tree.
+- Name files `{phase}_{agent}_{artifact}.{ext}` — e.g. `01_analyst_requirements.md`.
+- Write only the final deliverable to the user's target path; preserve `_agents_workspace/` for
+  later inspection rather than deleting it.
diff --git a/agentic-harness/shared/harness-model.md b/agentic-harness/shared/harness-model.md
new file mode 100644
index 0000000..ab925e5
--- /dev/null
+++ b/agentic-harness/shared/harness-model.md
@@ -0,0 +1,55 @@
+# The harness model
+
+A *harness* is the project-local configuration that lets a repository work well with
+agents. It has three parts, and keeping them separate is what makes a harness
+maintainable.
+
+## The three parts
+
+| Part | Question it answers | Where it lives |
+|------|---------------------|----------------|
+| **Agent** | *Who* does the work | `.claude/agents/{name}.md` |
+| **Skill** | *How* the work is done | `.claude/skills/{name}/SKILL.md` |
+| **Orchestrator** | *When*, and *in what order*, the agents collaborate | a skill, usually `.claude/skills/{domain}-orchestrator/` |
+
+An **agent** is an expert role: a persona, its working principles, its input/output
+contract, and — in team mode — how it communicates with other agents. An agent is small.
+It says who is acting and what that actor cares about, not the step-by-step procedure.
+
+A **skill** is procedural knowledge: the workflow an agent follows, the format it
+produces, the references it consults. A skill can be large. It says how a job is done,
+independent of who does it.
+
+An **orchestrator** is a skill whose subject is the team itself. Where individual skills
+describe one agent's job, the orchestrator describes the collaboration: which agents take
+part, what each produces, how their outputs flow together, and how failures are handled.
+
+## Why separate who from how
+
+The separation is not bookkeeping. It buys three concrete things:
+
+- **Reuse.** A skill written for "how to review an API contract" is usable by any agent
+  that needs it, in this project or the next. Bind the procedure to a single named agent
+  and it stops being portable.
+- **Independent change.** When a deliverable is weak, you revise the skill. When you need
+  a new kind of reviewer, you add an agent. When the workflow order is wrong, you edit the
+  orchestrator. Each fault has one obvious place to fix it.
+- **Survival across sessions.** An agent defined only inline in a prompt vanishes when the
+  session ends. An agent defined as a file is there next time, with its protocol intact
+  and its memory.
+
+## The file rule
+
+Every agent is a file under `.claude/agents/`, even when it uses a built-in type
+(`general-purpose`, `Explore`, `Plan`). Put the built-in type in the call that spawns the
+agent; put the role, principles, and protocol in the file. A role written only into a
+spawn prompt is not reusable and carries no collaboration contract — so it is not part of
+the harness.
+
+## How the parts connect
+
+One agent uses one or more skills; a skill may be shared by several agents. The
+orchestrator names the agents and points each at its skills. The pointer in the target
+project's `CLAUDE.md` names only the orchestrator and its trigger rule — see
+`claude-md-pointer.md`. Nothing in the harness duplicates what the file system already
+states: the agent and skill lists live in `.claude/`, not in `CLAUDE.md`.
diff --git a/agentic-harness/skills/harness-review/SKILL.md b/agentic-harness/skills/harness-review/SKILL.md
new file mode 100644
index 0000000..b0304b7
--- /dev/null
+++ b/agentic-harness/skills/harness-review/SKILL.md
@@ -0,0 +1,134 @@
+---
+name: harness-review
+description: "Assess an existing agentic harness — read-only. Use to review, audit, or assess a harness; to ask how well its skills and agents are actually used; to check for drift between the files and the CLAUDE.md record; or to validate that skills trigger and agents wire up correctly. It inventories .claude/agents and .claude/skills, reads the CLAUDE.md pointer and change history, judges effective usage from project memory and the inventory, and produces a prioritized review context — what works well and what to improve — that hands off to harness-setup. This skill writes nothing; it diagnoses. To build or change a harness, use harness-setup instead."
+model: opus
+---
+
+# Harness review — assess an existing harness (read-only)
+
+This skill diagnoses an existing agentic harness and produces a *review context*: a
+prioritized account of what works well and what to improve. It is the **reader** half of the
+plugin — it writes nothing. The fixes it identifies are carried out by `harness-setup`.
+
+The harness model it assesses against is in `${CLAUDE_PLUGIN_ROOT}/shared/harness-model.md`
+(agent = who, skill = how, orchestrator = when/order).
+
+## This skill vs harness-setup
+
+`harness-review` reads and judges; `harness-setup` writes. A request to "review / assess /
+audit a harness" or "how well is the harness used" is this skill. A request that creates or
+changes anything is `harness-setup`. The output here is a review context that hands off to
+`harness-setup` — this skill never applies a change itself.
+
+## The read-only contract
+
+Make no edits — not to `.claude/agents/`, not to `.claude/skills/`, not to `CLAUDE.md`, not
+to project memory. Run nothing that mutates the project. When a fix is obvious, write it into
+the review context as a recommendation for `harness-setup`; do not apply it. The value of a
+reader is that its findings are trustworthy precisely because it changed nothing.
+
+## Step 1: Inventory the harness
+
+1. List `.claude/agents/` and `.claude/skills/`; identify the orchestrator skill.
+2. For each agent, read its frontmatter and role; for each skill, read its name and
+   description. Note the declared execution mode.
+3. Map which skills each agent uses and how the orchestrator wires them together.
+
+## Step 2: Read the record
+
+Read the harness section of `CLAUDE.md` — the goal, the trigger rule, and the change-history
+table. The convention is in `${CLAUDE_PLUGIN_ROOT}/shared/claude-md-pointer.md`. The history
+tells you how the harness has evolved and where recent changes concentrated.
+
+## Step 3: Detect drift
+
+Compare three views of the harness:
+
+- the files in `.claude/agents/` and `.claude/skills/`,
+- the orchestrator's stated composition,
+- the `CLAUDE.md` pointer and change history.
+
+List every discrepancy: an agent or skill present in the files but absent from the record or
+the orchestrator; something the record names that no longer exists; an orchestrator that
+references a member it never forms. Do not reconcile drift — report it. Reconciliation is a
+write, and belongs to `harness-setup`.
+
+## Step 4: Assess effective usage
+
+Drift tells you what *exists*; usage tells you what is *used*. Judge effective usage from a
+fixed, deterministic signal set — read each, infer nothing you cannot ground in one:
+
+- **Project auto-memory** at `~/.claude/projects/<project-hash>/memory/` — what the project
+  has recorded about how work actually happens.
+- **The `CLAUDE.md` pointer and change history** — whether the trigger rule is current and
+  what has been changing.
+- **The `.claude/` inventory** — what the harness offers.
+- **The tools registry**, if present (a `tools.md` in the orchestrator's `references/`
+  directory) — which roles are filled by which tools, and their alternatives.
+
+From these, classify each skill and agent as **used**, **unused**, **bypassed** (the work
+happens but around the harness), or **drifted** (present but out of sync). Apply the same lens
+to registered tools: is each one still used, and is there now a better alternative for its
+role? Flag tools that look unnecessary or superseded — as findings for `harness-setup`, not
+changes you make. The method for reading each signal and making the call is in
+`references/usage-assessment.md`. This step is strictly read-only.
+
+## Step 5: Validate
+
+- **Structural.** Agents are files in the right place (including built-in types); skill
+  frontmatter has `name` and `description`; cross-references between agents are consistent;
+  no `commands/` directory was generated.
+- **Triggering.** For each skill description, write should-trigger and should-NOT-trigger
+  queries — the should-NOT set built from near-misses, including ones that belong to a
+  *different* skill. Confirm the descriptions separate cleanly and don't collide. The method
+  is in `references/skill-testing-guide.md`.
+- **Effectiveness.** Where feasible, compare the deliverable with the skill against the same
+  task without it, to confirm the skill adds value. Also in `references/skill-testing-guide.md`.
+- **Dry-run.** Walk the orchestrator: is the phase order logical, do data-passing paths have
+  no dead links, does each phase's input match the previous phase's output, is each error
+  fallback executable? Check the declared execution mode against
+  `${CLAUDE_PLUGIN_ROOT}/shared/execution-modes.md` — in particular, that a team-mode
+  harness has the subagent fallback it needs.
+- **QA agent, if present.** Check it against the QA methodology — does it compare across
+  boundaries rather than only confirm existence, and run incrementally rather than once at
+  the end? See `references/qa-agent-guide.md`.
+
+## Step 6: Emit the review context
+
+Produce a prioritized review context — the deliverable of this skill — and hand it to
+`harness-setup`:
+
+````markdown
+## Harness review: {domain} — {date}
+
+### Works well (keep)
+- {finding} — {brief evidence}
+
+### To improve (act on)
+| Priority | Finding | Target | Why |
+|----------|---------|--------|-----|
+| high | {what's wrong} | {agent / skill / orchestrator / CLAUDE.md} | {evidence and impact} |
+| medium | ... | ... | ... |
+
+### Drift
+- {discrepancy between files, orchestrator, and record}
+
+### Suggested next step
+Hand this context to `harness-setup` to act on the "to improve" items in priority order.
+````
+
+Order the "to improve" items by impact, name a concrete target for each (so `harness-setup`
+knows where the fix lands), and ground each in evidence from the steps above — not in
+speculation. Keep "works well" honest: it tells `harness-setup` what not to touch.
+
+## References
+
+- `references/usage-assessment.md` — the deterministic signal set and how to read each to
+  classify a skill or agent as used, unused, bypassed, or drifted.
+- `references/skill-testing-guide.md` — trigger validation (should / should-not), with-skill
+  vs baseline effectiveness checks, and the iterative-improvement method.
+- `references/qa-agent-guide.md` — what a good QA agent does (cross-boundary comparison,
+  boundary-bug patterns, incremental QA), for assessing a harness that includes one.
+- `${CLAUDE_PLUGIN_ROOT}/shared/harness-model.md`,
+  `${CLAUDE_PLUGIN_ROOT}/shared/claude-md-pointer.md`,
+  `${CLAUDE_PLUGIN_ROOT}/shared/execution-modes.md` — shared concepts.
diff --git a/agentic-harness/skills/harness-review/references/qa-agent-guide.md b/agentic-harness/skills/harness-review/references/qa-agent-guide.md
new file mode 100644
index 0000000..af75fcd
--- /dev/null
+++ b/agentic-harness/skills/harness-review/references/qa-agent-guide.md
@@ -0,0 +1,137 @@
+# QA agent guide
+
+What a good QA agent does, so you can assess one in an existing harness or specify one for
+`harness-setup` to build. The method is drawn from common boundary-bug patterns — recurring
+failure modes that surface where two correctly-built components meet.
+
+## Table of contents
+
+1. [The bugs QA agents miss](#1-the-bugs-qa-agents-miss)
+2. [Why static checks don't catch them](#2-why-static-checks-dont-catch-them)
+3. [Cross-boundary verification](#3-cross-boundary-verification)
+4. [Design principles](#4-design-principles)
+5. [Integration-consistency checklist](#5-integration-consistency-checklist)
+6. [QA agent definition template](#6-qa-agent-definition-template)
+
+---
+
+## 1. The bugs QA agents miss
+
+The most common defect is a **boundary mismatch**: two components are each correct on their
+own, but the contract between them is broken at the seam. Each passes its own review; nobody
+compares the two sides. Recurring forms:
+
+| Boundary | Mismatch | Why it's missed |
+|----------|----------|-----------------|
+| response → consumer type | the producer returns `{ items: [...] }`; the consumer expects a bare array | each is fine alone; no one compares the shapes |
+| field name across layers | one layer uses camelCase, another snake_case for the same field | a generic cast hides it from the compiler |
+| route path → link target | a page lives under one path prefix; a link points at another | file layout and link values aren't cross-checked |
+| state map → update sites | the transition map allows a move the code never performs (or vice versa) | only the map's existence is confirmed, not every update site |
+| endpoint → caller | an endpoint exists but nothing calls it (or a caller hits a missing one) | the endpoint list and caller list aren't mapped 1:1 |
+| immediate vs async result | the immediate response lacks a field the consumer reads off it | only the type is checked, not sync vs async timing |
+
+## 2. Why static checks don't catch them
+
+- **Generic casts defeat the type checker.** A typed fetch annotated to return one shape will
+  compile even when the runtime value has another — the annotation is a claim, not a check.
+- **A passing build is not correct behavior.** With casts, `any`, or generics in play, the
+  build succeeds and the code still fails at runtime.
+- **Existence is not connection.** "Does the endpoint exist?" and "does its response match
+  what the caller expects?" are different questions; only the second catches boundary bugs.
+
+## 3. Cross-boundary verification
+
+The core technique: compare the two sides of a contract directly, never one in isolation.
+
+- **Response shape ↔ consumer type.** Extract the shape the producer actually returns and the
+  type the consumer expects, and compare — including any wrapper (does the producer return
+  `{ data: [...] }` while the consumer reads a bare array?), and field-name casing across
+  layers, and the difference between an immediate acknowledgement and the eventual result.
+- **Route path ↔ link target.** Derive each page's real path from the file layout and compare
+  it against every link, redirect, and navigation target in the code; account for path
+  prefixes and dynamic segments.
+- **State map ↔ update sites.** List the transitions the map allows, then find every place the
+  code changes state; flag transitions the code performs that the map forbids, and transitions
+  the map allows that the code never performs.
+- **Endpoint ↔ caller.** List every endpoint and every call site and pair them; an endpoint no
+  one calls is either dead or a missing wire-up — decide which.
+
+## 4. Design principles
+
+- **Use the `general-purpose` type, not `Explore`.** Effective QA greps for patterns, runs
+  comparison scripts, and sometimes fixes — `Explore` is read-only and can't. Give the agent a
+  "verify → report → request fix" protocol instead of read-only tooling.
+- **Prefer cross-boundary comparison over existence checks.** "Does the response shape match
+  the consumer's type?" beats "does the endpoint exist?"; "does every update obey the state
+  map?" beats "is the map defined?".
+- **Read both sides at once.** To catch a seam bug, open the producer and the consumer
+  together — the route and its caller, the state map and the update code, the file layout and
+  the links. State this explicitly in the agent definition.
+- **Run incrementally, not just at the end.** QA placed only after everything is built lets
+  early mismatches propagate and raises the cost of every fix. Verify each module's boundaries
+  as it lands.
+
+## 5. Integration-consistency checklist
+
+A template to fold into a QA agent for a typical web application:
+
+```markdown
+### Integration consistency
+
+#### Response ↔ consumer
+- [ ] Every response shape matches the consumer's expected type
+- [ ] Wrapped responses ({ items: [...] }) are unwrapped on the consumer side
+- [ ] camelCase/snake_case is consistent across layers
+- [ ] Immediate acknowledgements and eventual results are distinguished on the consumer
+- [ ] Every endpoint has a caller, and every caller hits a real endpoint
+
+#### Routing
+- [ ] Every link / redirect / navigation target resolves to a real page path
+- [ ] Path prefixes and route groups are accounted for
+- [ ] Dynamic segments are filled with the right parameters
+
+#### State machine
+- [ ] Every transition the code performs is allowed by the map (no unauthorized moves)
+- [ ] Every transition the map allows is reachable in the code (no dead transitions)
+- [ ] Intermediate-to-final transitions are present, not missing
+- [ ] State-based branches on the consumer are actually reachable
+
+#### Data flow
+- [ ] Field names match from storage through response to consumer type
+- [ ] Optional-field null/undefined handling is consistent on both sides
+```
+
+## 6. QA agent definition template
+
+```markdown
+---
+name: qa-inspector
+description: "QA verification specialist. Checks spec compliance, integration consistency, and design quality across module boundaries."
+model: opus
+---
+
+# QA inspector
+
+## Core role
+Verify implementation quality against the spec, and **integration consistency between
+modules** — boundary mismatch is the leading cause of runtime failure.
+
+## Verification priority
+1. Integration consistency (highest)
+2. Functional spec compliance
+3. Design quality
+4. Code quality (dead code, naming)
+
+## Method: read both sides at once
+| Target | Producer side | Consumer side |
+|--------|---------------|---------------|
+| response shape | where the response is built | where it is consumed and typed |
+| routing | page path from the file layout | link / redirect / navigation target |
+| state | the transition map | the state-update sites |
+| data | storage field names | response field → consumer type |
+
+## Team communication protocol
+- On a finding, send a specific fix request (location + fix) to the responsible agent.
+- For a boundary issue, notify **both** sides.
+- To the leader: a report separating passed / failed / unverified.
+```
diff --git a/agentic-harness/skills/harness-review/references/skill-testing-guide.md b/agentic-harness/skills/harness-review/references/skill-testing-guide.md
new file mode 100644
index 0000000..50d5fa7
--- /dev/null
+++ b/agentic-harness/skills/harness-review/references/skill-testing-guide.md
@@ -0,0 +1,103 @@
+# Skill testing guide
+
+How to test whether a harness's skills trigger correctly and add value, and how to improve
+them. Supplements the validation step.
+
+## Table of contents
+
+1. [Two kinds of evaluation](#1-two-kinds-of-evaluation)
+2. [Writing test prompts](#2-writing-test-prompts)
+3. [With-skill vs baseline](#3-with-skill-vs-baseline)
+4. [Assertion-based scoring](#4-assertion-based-scoring)
+5. [Trigger validation](#5-trigger-validation)
+6. [The improvement loop](#6-the-improvement-loop)
+
+---
+
+## 1. Two kinds of evaluation
+
+Skill quality is judged two ways, and most skills need both:
+
+| Kind | How | Fits |
+|------|-----|------|
+| Qualitative | the user reads the deliverable | subjective quality — prose, design, judgment calls |
+| Quantitative | assertion-based scoring | objectively checkable results — a file created, data extracted, code generated |
+
+The loop is the same either way: **write a test → run it → evaluate → improve → re-run**.
+
+## 2. Writing test prompts
+
+A test prompt should read like a real request a real user would type — specific and natural.
+Abstract prompts ("process the data", "generate a chart") have little test value.
+
+A good prompt carries concrete detail:
+
+```text
+In the quarterly sales sheet in my downloads folder, add a margin column from the revenue and
+cost columns, then sort by margin descending.
+```
+
+Vary the prompts so the set covers more than the happy path:
+
+- Mix formal and casual phrasing.
+- Mix explicit intent (states the file type) with implicit (must be inferred from context).
+- Mix simple and multi-step tasks.
+- Start with two or three: one core case, one edge case, optionally one composite.
+
+## 3. With-skill vs baseline
+
+To confirm a skill actually adds value, run the same prompt twice in parallel — once with the
+skill, once without — and compare:
+
+- **With-skill:** the agent reads the skill and does the work.
+- **Baseline:** the same prompt, no skill. When improving an *existing* skill, the baseline is
+  the previous version (keep a snapshot), not an empty run.
+
+Keep each run's outputs in their own directory so iterations don't overwrite each other. If
+you capture timing or token counts, save them from the completion notification immediately —
+that data is available only at that moment and cannot be recovered later.
+
+## 4. Assertion-based scoring
+
+When the deliverable is objectively checkable, write assertions. A good assertion can be
+judged true or false, has a descriptive name, and tests the skill's core value. A bad one is
+subjective ("it reads well") or passes regardless of the skill ("output exists").
+
+Watch for **non-discriminating** assertions — ones that pass in both the with-skill and the
+baseline run. They measure nothing about the skill; drop them or replace them with a harder
+one. Where an assertion can be checked by code, script it: faster, repeatable, and reusable
+across iterations. Record results with the field names `text`, `passed`, `evidence` (the same
+schema the setup-side skill-writing guide defines).
+
+## 5. Trigger validation
+
+A skill's description is its only trigger. Validate it with two query sets:
+
+- **Should-trigger (8–10):** the same need phrased many ways — formal and casual, explicit and
+  implicit, including cases where this skill should win over a competing one.
+- **Should-NOT-trigger (8–10):** **near-misses are the point.** A query with overlapping
+  keywords but where a *different* tool is the right fit tests the boundary; an obviously
+  unrelated query ("write a Fibonacci function") tests nothing. Build the should-NOT set from
+  queries that belong to adjacent skills — especially, for this plugin's own two skills, a
+  "review my harness" that must not fire `harness-setup` and an "extend my harness" that must
+  not fire `harness-review`.
+
+Then check for collisions: confirm the should-trigger queries don't wrongly fire a *different*
+existing skill. Where two descriptions overlap, sharpen the boundary wording in both.
+
+## 6. The improvement loop
+
+When a test surfaces a problem:
+
+1. **Generalize the fix.** Repair the principle, not the one example that failed — a narrow
+   patch that fits only the test case is overfitting.
+2. **Cut what doesn't earn its place.** If the transcript shows the skill making the agent do
+   unproductive work, remove that part.
+3. **Explain the why.** Even terse feedback has a reason behind it; fold the reason into the
+   skill so it generalizes.
+4. **Bundle repetition.** If every run writes the same helper, pre-bundle it under `scripts/`.
+
+Re-run all cases in a fresh iteration directory, show the user the result against the previous
+one, and repeat. Stop when the user is satisfied, when feedback comes back empty, or when
+there is no more meaningful improvement to make. Draft, then re-read with fresh eyes — don't
+expect the first pass to be the final one.
diff --git a/agentic-harness/skills/harness-review/references/usage-assessment.md b/agentic-harness/skills/harness-review/references/usage-assessment.md
new file mode 100644
index 0000000..fc3ab8e
--- /dev/null
+++ b/agentic-harness/skills/harness-review/references/usage-assessment.md
@@ -0,0 +1,80 @@
+# Usage assessment
+
+How to judge whether a harness is actually *used*, not just whether it *exists*. This reads a
+fixed, deterministic signal set and classifies each skill, agent, and tool. It changes
+nothing — every output is a finding for `harness-setup`, never an edit.
+
+## Why a fixed signal set
+
+Usage is inferred, not measured directly — there is no usage counter. So ground every
+judgment in a concrete signal rather than a guess. Four signals, all read-only. Read each,
+state what it shows, and make the call only from evidence one of them supports. When the
+signals are thin, say so and lower your confidence rather than inventing certainty.
+
+## Signal 1: project auto-memory
+
+Location: `~/.claude/projects/<project-hash>/memory/`. The `<project-hash>` is the project's
+absolute path with the separators turned into dashes — e.g. a project at
+`/Users/me/work/app` becomes a directory like `-Users-me-work-app`. Find the one whose slug
+matches the project you are reviewing.
+
+Read `MEMORY.md` (the index) and the memory files it points to. This is where the project has
+recorded how work actually happens. What to look for:
+
+- Does the memory mention the orchestrator skill, the agents, or the harness at all? Silence
+  is itself a signal — a harness the project never records using is likely unused.
+- Does it describe work being done **by hand** in the harness's domain? That points to the
+  orchestrator being bypassed.
+- Does it record friction ("the agent kept doing X")? That points to a quality or trigger
+  problem to flag.
+
+## Signal 2: the CLAUDE.md pointer and change history
+
+Read the harness section of `CLAUDE.md`. The pointer's trigger rule tells you what *should*
+route to the orchestrator; compare it against how the project actually describes its work
+(Signal 1). The change-history table tells you the evolution:
+
+- A pointer that names an orchestrator the files no longer contain → drift.
+- A long history that stops abruptly → the harness may have been abandoned.
+- History concentrated on one agent or skill → that area has been unstable; worth a look.
+
+## Signal 3: the `.claude/` inventory
+
+List `.claude/agents/` and `.claude/skills/`, and read the orchestrator's composition. This
+is the ground truth of what the harness offers. Cross-check it against the other signals:
+something in the inventory that no signal ever mentions is a candidate for "unused"; something
+the memory or pointer references that is missing from the inventory is drift.
+
+## Signal 4: the tools registry
+
+If the orchestrator has a `tools.md` registry (from the tool-discovery step), read it: the
+roles, the preferred tool and alternative per role, and the `Last reviewed` dates. What to
+look for:
+
+- A registered tool no agent or skill references by role → unused.
+- A `Last reviewed` date that is far in the past → due for re-evaluation.
+- A role whose preferred tool the project's memory shows failing, so the alternative is what
+  runs in practice → flag the preferred tool as a candidate to swap.
+
+## Classifying from the signals
+
+Combine the signals and label each skill, agent, and tool:
+
+| Class | What the signals show |
+|-------|------------------------|
+| **Used** | the inventory has it, and memory or history shows it being invoked or changed |
+| **Unused** | it exists in the inventory, but no other signal ever references it |
+| **Bypassed** | the work in its domain happens (memory shows it), but around the harness rather than through the orchestrator |
+| **Drifted** | the pointer, history, or orchestrator names it, but the files disagree (missing, moved, or out of sync) |
+
+A label is only as good as its evidence. For each, name the signal that supports it. "Unused
+— not referenced in memory, history, or the orchestrator" is a finding; "probably unused" with
+nothing behind it is not.
+
+## Staying read-only
+
+This assessment writes nothing — not the files, not `CLAUDE.md`, not memory. When the right
+fix is obvious (retire an unused agent, swap a superseded tool, widen a trigger that memory
+shows missing), record it as a prioritized recommendation in the review context and hand it to
+`harness-setup`. The separation is the point: a reader that never writes can be trusted, and
+its findings stay clean for the writer to act on.
diff --git a/agentic-harness/skills/harness-setup/SKILL.md b/agentic-harness/skills/harness-setup/SKILL.md
new file mode 100644
index 0000000..6407e3d
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/SKILL.md
@@ -0,0 +1,221 @@
+---
+name: harness-setup
+description: "Build, extend, and maintain a project's agentic harness — the agents, skills, and orchestrator under .claude/. This skill writes files. Use it to set up or build a harness, scaffold or design an agent team and the skills they use, add or change an agent or skill, update or rebuild the harness, sync it after drift, or apply a review context produced by harness-review. On explicit request it can also discover and configure external tools — MCP servers and plugins — that fit the project, and register the approved ones in the harness tools registry. Also triggers on follow-ups such as 'extend the harness', 'the harness needs a new agent', 're-run setup', 'act on the review', and 'find tools or MCPs for this project'. For read-only assessment of how well an existing harness is used, use harness-review instead — this skill is the writer, that one is the reader."
+model: opus
+---
+
+# Harness setup — build and maintain the agent team
+
+This skill builds and maintains a project-local **agentic harness**: a team of agent
+definitions, the skills those agents use, an orchestrator that coordinates them, and a
+pointer in the project's `CLAUDE.md`. It writes files. It does not do the project's domain
+work — it builds the agents and skills that do.
+
+Read the model first: `${CLAUDE_PLUGIN_ROOT}/shared/harness-model.md` defines the three
+parts (agent = who, skill = how, orchestrator = when/order) and why they stay separate.
+
+## This skill vs harness-review
+
+`harness-setup` is the **writer** — every change to the harness goes through it. `harness-review`
+is the **reader** — it assesses an existing harness and writes nothing. When a request is
+"assess / review / audit / how well is it used," that is `harness-review`. When a request
+creates or changes anything, it is this skill. After `harness-review` hands off a *review
+context*, this skill is what acts on it.
+
+## Step 0: Orient before writing
+
+Always check the current state first, then pick the path. Do not start generating until the
+plan is confirmed.
+
+1. Read `.claude/agents/`, `.claude/skills/`, and the harness section of `CLAUDE.md`.
+2. Branch on what you find:
+   - **New build** — no harness, or empty agent/skill dirs → run Steps 1–7 in full.
+   - **Extend** — a harness exists and the request adds or changes an agent or skill → run
+     only the needed steps, per the extension matrix in `references/maintenance.md`.
+   - **Apply a review context** — `harness-review` produced a prioritized list → work it as
+     an interactive improvement pass (see `references/maintenance.md`).
+   - **Sync** — the files and the `CLAUDE.md` record disagree → reconcile and record the
+     correction.
+3. Summarize what you found and the plan you propose; get the user's confirmation.
+
+## Step 1: Analyze the domain
+
+1. Identify the domain and the core task types (creation, validation, editing, analysis).
+2. Explore the codebase — tech stack, data models, key modules — so agents and skills fit
+   the project rather than a generic template.
+3. Check for conflicts or overlap with any existing agents and skills from Step 0.
+4. Read the user's technical level from the conversation and match your wording to it.
+   Explain a term like "assertion" or "schema" when the cues suggest it is unfamiliar.
+
+## Step 1b: Discover tools — optional, on request
+
+Run this only when the user asks for it ("find tools / MCPs / plugins for this project").
+It is never automatic. It can run inside a build or standalone against an existing harness.
+
+This skill proposes nothing of its own — the candidates come from a live search, not a
+built-in catalog:
+
+1. Hand a **search-optimized subagent** (`general-purpose`, with web search) a tight context
+   — the project's domain, stack, and task types from Step 1 — and have it find candidate
+   MCP servers and plugins that fit. Also inspect the local and session configuration for
+   tools already available, so you don't propose what is already there.
+2. Present each candidate with the **role** it would fill, what it does, and its trade-off.
+   The user **accepts or rejects each one explicitly**. Adopt only what is accepted.
+3. Register accepted tools **by role** in the tools registry under the orchestrator — a
+   `tools.md` file in `.claude/skills/{domain}-orchestrator/references/`. It is a lookup of
+   role → preferred tool → alternative (for when the preferred one is unavailable) → status.
+   Agents and skills reference a tool by its **role**, never by a hard tool name, so the
+   harness falls back to the alternative when a tool is missing.
+
+The subagent's context template, the acceptance flow, and the registry schema are in
+`references/tool-discovery.md`. Registered tools are reviewed periodically — see
+`references/maintenance.md`.
+
+## Step 2: Choose the execution mode and the architecture pattern
+
+**Execution mode.** Default to an **agent team** when two or more agents genuinely need to
+exchange information mid-task; fall back to **subagents** when they do not, or when the
+experimental team tools are unavailable. The team-tools caveat and the mechanical fallback
+mapping are in `${CLAUDE_PLUGIN_ROOT}/shared/execution-modes.md` — read it and decide the
+mode before designing the team, because the mode shapes the agent definitions and the
+orchestrator.
+
+**Architecture pattern.** Decompose the work into areas of expertise and pick a structure.
+The six patterns — Pipeline, Fan-out/Fan-in, Expert Pool, Producer-Reviewer, Supervisor,
+Hierarchical Delegation — with their fit and their team-mode suitability are in
+`references/agent-design-patterns.md`. Composite patterns are common; the same reference
+covers them.
+
+**Split agents** along four axes — expertise, parallelism, context, reusability. The
+criteria table is in `references/agent-design-patterns.md`. Prefer a few focused agents over
+many thin ones; coordination cost grows with team size.
+
+## Step 3: Generate the agent definitions
+
+Write every agent as a file under `.claude/agents/{name}.md` — including agents that use a
+built-in type (`general-purpose`, `Explore`, `Plan`). Put the built-in type in the spawn
+call; put the role, principles, and protocol in the file. The reason is in the harness
+model: a role defined only inline is not reusable next session and carries no collaboration
+contract.
+
+Each agent file states: core role, working principles, input/output protocol, error
+handling, and collaboration. In team mode, add a **team communication protocol** section —
+who it messages, who messages it, and what it claims from the shared task list. The
+definition template and worked agent files are in `references/agent-design-patterns.md` and
+`references/team-examples.md`.
+
+**Model.** Set each agent's model explicitly, in both the agent file and the spawn call. A
+harness's quality tracks its agents' reasoning, so use the strongest reasoning model for
+roles that depend on judgment rather than throughput.
+
+**If the team includes a QA agent.** Use the `general-purpose` type (`Explore` is read-only
+and cannot run validation). Make its core method *cross-boundary comparison* — read both
+sides of a contract together (the producer and the consumer), not each in isolation — and
+run it incrementally as each module lands, not once at the end. The full methodology,
+boundary-bug patterns, and a QA agent template are in the `qa-agent-guide` reference under
+the `harness-review` skill.
+
+## Step 4: Create the skills
+
+Create each skill the agents use at `.claude/skills/{name}/SKILL.md`. The authoring guide —
+description writing, body principles, progressive disclosure, data-schema standards — is in
+`references/skill-writing-guide.md`. The essentials:
+
+- **Description.** It is the only trigger mechanism. Write it to be specific about what the
+  skill does and when it should fire, slightly pushy to offset conservative triggering, and
+  worded to stay clear of skills that should *not* fire on the same request.
+- **Body.** Explain the *why* rather than issuing bare "ALWAYS/NEVER" rules — an agent that
+  understands the reason handles edge cases correctly. Keep it lean (aim under 500 lines;
+  move detail to `references/`). Generalize to the principle instead of overfitting to one
+  example. Write imperatively.
+- **Progressive disclosure.** Metadata is always in context; the body loads on trigger;
+  `references/` load only when needed. Split large or domain-specific detail into
+  `references/` so only the relevant file loads.
+- **Linking.** One agent uses one or more skills; a skill may be shared across agents. The
+  skill holds *how*; the agent holds *who*.
+
+## Step 5: Build the orchestrator and register the pointer
+
+The orchestrator is a skill whose subject is the team: which agents take part, what each
+produces, how outputs flow, and how failures are handled. Templates for team, subagent, and
+hybrid modes — with data-passing, error handling, and test scenarios — are in
+`references/orchestrator-template.md`.
+
+Build into the orchestrator:
+
+- **A context-check first phase** so it distinguishes an initial run from a follow-up or a
+  partial re-run (branch on whether `_agents_workspace/` already exists).
+- **Follow-up trigger keywords** in its description ("re-run", "update", "modify",
+  "supplement", "improve the previous result", and everyday domain phrasings). Without
+  these, the harness goes unused after its first run.
+- **Data-passing** stated explicitly, matched to the mode — see
+  `${CLAUDE_PLUGIN_ROOT}/shared/execution-modes.md`.
+- **Error handling** that does not assume success: retry once, then proceed without the
+  missing result and note the omission; never delete conflicting data — record it with its
+  source.
+- **The tools registry**, when tool discovery (Step 1b) has run: it lives in this
+  orchestrator's `references/` directory as `tools.md`, and agents and skills reference tools
+  by role from it.
+
+When extending rather than building new, modify the existing orchestrator — do not create a
+second one. Reflect a new agent in the team composition, task assignment, data flow, and
+trigger keywords.
+
+Then **register the pointer** in the project's `CLAUDE.md`: goal, trigger rule, and the
+change-history table — and nothing the file system already holds. The convention and the
+template are in `${CLAUDE_PLUGIN_ROOT}/shared/claude-md-pointer.md`.
+
+## Step 6: Record the change in history
+
+Every write to the harness appends a row to the change-history table in `CLAUDE.md`
+(`Date | Change | Target | Reason`). This is a required step, not optional — it is how
+evolution stays visible and regressions stay catchable. The table format is in
+`${CLAUDE_PLUGIN_ROOT}/shared/claude-md-pointer.md`.
+
+## Step 7: Capture feedback
+
+A harness is a system that keeps changing, not a one-time artifact. After a run, offer the
+user the chance to feed back ("anything to improve in the result, the team, or the
+workflow?"). If there is none, move on — do not force it. When there is, route it to the
+right place: output quality → the relevant skill; missing role → a new agent definition;
+wrong order → the orchestrator; missing trigger → a description. The routing table and the
+evolution triggers (recurring feedback, repeated failures, the user bypassing the
+orchestrator) are in `references/maintenance.md`.
+
+## Deliverable checklist
+
+Before calling a setup or change complete:
+
+- [ ] Every agent is a file under `.claude/agents/` — including built-in types.
+- [ ] Skills exist under `.claude/skills/` with valid `name` + `description` frontmatter.
+- [ ] One orchestrator, with data flow, error handling, and test scenarios.
+- [ ] Execution mode is stated (team / subagent / hybrid; per-phase if hybrid), with the
+      subagent fallback covered when team mode is the default.
+- [ ] Each agent's model is set explicitly.
+- [ ] No `commands/` directory was generated.
+- [ ] No conflict with existing agents or skills.
+- [ ] Skill and orchestrator descriptions are pushy and include follow-up keywords.
+- [ ] Each SKILL.md body is within ~500 lines; overflow moved to `references/`.
+- [ ] The orchestrator's first phase does a context check (initial / follow-up / partial).
+- [ ] The `CLAUDE.md` pointer is registered (goal + trigger + change history).
+- [ ] The change-history table records this change.
+- [ ] If tool discovery ran: nothing was adopted without explicit approval, and accepted
+      tools are registered by role (with alternatives) in the orchestrator's `tools.md`.
+
+## References
+
+- `references/agent-design-patterns.md` — execution-mode comparison, the six architecture
+  patterns, agent-split criteria, the agent-definition template.
+- `references/team-examples.md` — worked agent teams across generic domains, with full
+  sample agent files.
+- `references/orchestrator-template.md` — orchestrator templates by mode, with data-passing,
+  error handling, and test scenarios.
+- `references/skill-writing-guide.md` — skill authoring: descriptions, body style,
+  progressive disclosure, data-schema standards.
+- `references/maintenance.md` — extending an existing harness (the extension matrix),
+  applying a review context, syncing drift, feedback routing, and periodic tool review.
+- `references/tool-discovery.md` — the optional, on-request tool-discovery step: the
+  search subagent's context, the explicit-acceptance flow, and the `tools.md` registry schema.
+- `${CLAUDE_PLUGIN_ROOT}/shared/harness-model.md`,
+  `${CLAUDE_PLUGIN_ROOT}/shared/execution-modes.md`,
+  `${CLAUDE_PLUGIN_ROOT}/shared/claude-md-pointer.md` — shared concepts.
diff --git a/agentic-harness/skills/harness-setup/references/agent-design-patterns.md b/agentic-harness/skills/harness-setup/references/agent-design-patterns.md
new file mode 100644
index 0000000..7a62e61
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/agent-design-patterns.md
@@ -0,0 +1,223 @@
+# Agent design patterns
+
+How to structure a team, which architecture pattern fits the work, when to split a role
+into its own agent, and the shape of an agent definition file.
+
+## Table of contents
+
+1. [Execution modes, in brief](#1-execution-modes-in-brief)
+2. [The six architecture patterns](#2-the-six-architecture-patterns)
+3. [Composite patterns](#3-composite-patterns)
+4. [Agent type selection](#4-agent-type-selection)
+5. [Splitting vs merging agents](#5-splitting-vs-merging-agents)
+6. [Skill vs agent](#6-skill-vs-agent)
+7. [Agent definition template](#7-agent-definition-template)
+
+---
+
+## 1. Execution modes, in brief
+
+Two modes, with a third that mixes them. The full decision logic, the experimental
+team-tools caveat, and the team-to-subagent fallback mapping are in the shared execution-modes
+doc — read that before designing the team. The short version:
+
+- **Agent team** — members run as peers and coordinate directly (`SendMessage`) over a
+  shared task list (`TaskCreate`). Use it when agents need to exchange information while they
+  work: sharing findings, challenging each other, reconciling conflicts. Only one team is
+  active per session, but a team can be disbanded between phases and a new one formed. The
+  team tools are experimental — always design a subagent fallback.
+- **Subagent** — the orchestrator spawns each agent with the `Agent` tool; the agent returns
+  its result and does not talk to siblings. Use it for a single agent, or for independent
+  jobs where only the combined result matters. Parallelize with `run_in_background`.
+
+Each pattern below notes whether a team earns its cost or a subagent is the better fit.
+
+## 2. The six architecture patterns
+
+### Pipeline
+
+Stages run in order; each consumes the previous stage's output.
+
+```
+[Analyze] → [Design] → [Build] → [Verify]
+```
+
+Fits work with strong sequential dependency. A slow stage stalls the whole line, so make
+each stage as independent as the work allows. Team mode adds little to a pure sequence —
+unless a stage has parallel sub-work, in which case a team helps there.
+
+### Fan-out / Fan-in
+
+Independent work in parallel, then integration.
+
+```
+            ┌→ [Specialist A] ─┐
+[Distribute]┼→ [Specialist B] ─┼→ [Integrate]
+            └→ [Specialist C] ─┘
+```
+
+Fits when one input needs several independent perspectives. The integration stage governs
+final quality. This is the most natural fit for a team: members surface findings to each
+other and one member's discovery can redirect another's work mid-flight, which a set of
+isolated subagents cannot do. Build it as a team when the tools allow.
+
+### Expert Pool
+
+Route to the one specialist the input needs.
+
+```
+[Router] → { Specialist A | Specialist B | Specialist C }
+```
+
+Fits when different input types need different handling. The router's classification is the
+weak point. Subagents usually fit better — only the chosen specialist runs, so a standing
+team is wasted.
+
+### Producer–Reviewer
+
+A producer and a reviewer work as a pair, looping until the deliverable passes.
+
+```
+[Produce] → [Review] → (issues?) → back to [Produce]
+```
+
+Fits when quality matters and there are objective review criteria. Cap the retry count (two
+or three) to avoid an endless loop. A team lets producer and reviewer exchange feedback
+directly; subagents work too when a written review hand-off is enough.
+
+### Supervisor
+
+A central agent holds the work state and hands out work as it goes.
+
+```
+              ┌→ [Worker A]
+[Supervisor] ─┼→ [Worker B]
+              └→ [Worker C]
+```
+
+Fits variable workloads where assignment is decided at runtime, not fixed up front (the
+difference from fan-out). Keep delegation units large enough that the supervisor is not the
+bottleneck. The team's shared task list matches this naturally: register work, let members
+claim it.
+
+### Hierarchical Delegation
+
+A higher agent decomposes a problem and delegates to lower agents, recursively.
+
+```
+[Lead] → [Sub-lead A] → [Worker A1], [Worker A2]
+       → [Sub-lead B] → [Worker B1]
+```
+
+Fits problems that decompose cleanly into a tree. Keep it to two levels — three or more adds
+latency and loses context. Teams cannot nest (a member cannot form its own team), so
+implement level one as a team and level two as subagents, or flatten to a single team.
+
+## 3. Composite patterns
+
+Real harnesses usually combine patterns:
+
+| Composite | Shape | Example domain |
+|-----------|-------|----------------|
+| Fan-out + Producer–Reviewer | parallel production, then per-item review | translate several variants in parallel, each reviewed separately |
+| Pipeline + Fan-out | parallelize one stage of a sequence | sequential analysis → parallel build → sequential integration test |
+| Supervisor + Expert Pool | supervisor routes to the right specialist on demand | inbound triage that assigns each case to a domain specialist |
+
+Default composites to a team when members benefit from talking; drop to subagents only for a
+stage that is genuinely isolated and one-off.
+
+## 4. Agent type selection
+
+Pass the type via the `Agent` tool's `subagent_type`. Built-in types:
+
+| Type | Tool access | Fits |
+|------|-------------|------|
+| `general-purpose` | full (incl. web, scripts) | research, validation, any work needing tools |
+| `Explore` | read-only | reading and analyzing a codebase |
+| `Plan` | read-only | design and planning |
+
+A custom type is an agent you defined under `.claude/agents/{name}.md`, invoked with
+`subagent_type: "{name}"`; it has full tool access. Choose by:
+
+| Situation | Choice |
+|-----------|--------|
+| Complex role reused across sessions | custom type (a file) |
+| Simple collection where a prompt suffices | `general-purpose` + a detailed prompt |
+| Read-only analysis or review | `Explore` |
+| Design or planning only | `Plan` |
+| Work that modifies files | custom type |
+
+Whatever the type, write the agent as a file (see the harness model's file rule). For a QA
+agent specifically, use `general-purpose` — `Explore` cannot run validation.
+
+## 5. Splitting vs merging agents
+
+Judge along four axes:
+
+| Axis | Split when | Merge when |
+|------|------------|------------|
+| Expertise | the domains differ | the domains overlap |
+| Parallelism | they can run independently | they are sequentially dependent |
+| Context | the context load is large | both are light and fast |
+| Reusability | other teams use it too | only this team uses it |
+
+Prefer a few focused agents over many thin ones — coordination cost rises with team size.
+
+## 6. Skill vs agent
+
+| Aspect | Skill | Agent |
+|--------|-------|-------|
+| What it is | procedural knowledge + tools | an expert persona + principles |
+| Lives in | `.claude/skills/` | `.claude/agents/` |
+| Triggered by | matching a request | explicit invocation via the `Agent` tool |
+| Size | small to large (a workflow) | small (a role) |
+| Answers | *how* | *who* |
+
+An agent leverages a skill in one of three ways:
+
+| Way | How | Fits |
+|-----|-----|------|
+| Skill invocation | the agent prompt says to invoke `/skill-name` via the Skill tool | reusable, independently invocable skills |
+| Inline | the skill content sits inside the agent definition | short (≤50 lines), exclusive to that agent |
+| Reference load | the agent reads a `references/` file when needed | large, only conditionally relevant content |
+
+## 7. Agent definition template
+
+```markdown
+---
+name: agent-name
+description: "One or two sentences on the role. List the trigger keywords."
+model: opus
+---
+
+# Agent Name — one-line role
+
+You are an expert [role] in [domain].
+
+## Core role
+1. ...
+2. ...
+
+## Working principles
+- ...
+
+## Input / output protocol
+- Input: where it reads from, and what
+- Output: where it writes, and what
+- Format: file format and structure
+
+## Team communication protocol   (team mode only)
+- Receives: from whom, and what
+- Sends: to whom, and what
+- Claims: what it takes from the shared task list
+
+## Error handling
+- on failure: ...
+- on timeout: ...
+
+## Collaboration
+- relationships with the other agents
+```
+
+Set `model` explicitly. For roles whose quality depends on judgment rather than throughput,
+use the strongest reasoning model available.
diff --git a/agentic-harness/skills/harness-setup/references/maintenance.md b/agentic-harness/skills/harness-setup/references/maintenance.md
new file mode 100644
index 0000000..08da66e
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/maintenance.md
@@ -0,0 +1,117 @@
+# Maintaining a harness
+
+A harness is a system that keeps changing, not a one-time build. This covers the four ways it
+changes after the first build: extending it, applying a review context, syncing drift, and
+folding in feedback. Every one of them ends by recording the change in history.
+
+## Table of contents
+
+1. [The extension matrix](#1-the-extension-matrix)
+2. [Applying a review context](#2-applying-a-review-context)
+3. [Syncing drift](#3-syncing-drift)
+4. [Feedback routing](#4-feedback-routing)
+5. [Evolution triggers](#5-evolution-triggers)
+6. [The operations workflow](#6-the-operations-workflow)
+7. [Periodic tool review](#7-periodic-tool-review)
+
+---
+
+## 1. The extension matrix
+
+When extending an existing harness, do not re-run the full build. Run only the steps the
+change needs (step numbers are from the `harness-setup` SKILL.md):
+
+| Change | Domain analysis (1) | Mode/pattern (2) | Agents (3) | Skills (4) | Orchestrator (5) | History (6) |
+|--------|---------------------|------------------|------------|------------|------------------|-------------|
+| Add an agent | reuse Step 0 | placement only | yes | only if it needs a new skill | update composition + triggers | yes |
+| Add / change a skill | skip | skip | skip | yes | only if wiring changes | yes |
+| Architecture change | skip | yes | only affected agents | only affected skills | yes | yes |
+
+When adding an agent, modify the existing orchestrator — do not spawn a second one. Reflect
+the new agent in the team composition, task assignment, data flow, and trigger keywords.
+
+## 2. Applying a review context
+
+`harness-review` hands off a prioritized list: what works well (leave it), and what to
+improve (act on it). Treat it as an interactive improvement pass, not a batch rewrite:
+
+1. Read the review context and confirm the priority order with the user.
+2. Take items one at a time, highest priority first.
+3. For each, identify the right target with the routing table below, make the smallest change
+   that addresses it, and record it in history.
+4. After each change, re-check the specific concern the item raised before moving on.
+
+Working one item at a time keeps each change attributable and easy to roll back.
+
+## 3. Syncing drift
+
+Drift is when the files and the `CLAUDE.md` record disagree — an agent or skill exists that
+the record doesn't mention, or the record names something no longer present.
+
+1. List `.claude/agents/` and `.claude/skills/`; compare against the orchestrator's
+   composition and the `CLAUDE.md` pointer.
+2. For each discrepancy, decide with the user which side is correct — the files or the
+   record.
+3. Reconcile toward the correct side, then record the correction in history.
+
+The files are usually the truth; the record is what falls behind. But confirm rather than
+assume — a missing file can mean a deletion that was never recorded *or* an accidental loss.
+
+## 4. Feedback routing
+
+Different feedback lands in different places. Route by type:
+
+| Feedback | Fix where | Example |
+|----------|-----------|---------|
+| Output quality | the agent's skill | "the analysis is too shallow" → add depth criteria to the skill |
+| Missing role | a new agent definition | "we also need a security pass" → add an agent |
+| Wrong order | the orchestrator | "validation should come first" → reorder the phases |
+| Team composition | orchestrator + agents | "merge these two" → combine the agents |
+| Missing trigger | the skill description | "it doesn't fire when I phrase it this way" → widen the description |
+
+The reason this table exists is the separation in the harness model: who, how, and when each
+have one home, so each kind of fault has one place to fix it.
+
+## 5. Evolution triggers
+
+Propose a change not only when the user asks, but when the signals say it is due:
+
+- The same kind of feedback recurs two or more times.
+- An agent fails the same way repeatedly.
+- The user is working around the orchestrator by hand — a sign it does not fit the real task.
+
+When you see these, raise it; don't wait to be told.
+
+## 6. The operations workflow
+
+For an audit-fix-sync request on an existing harness:
+
+1. **Audit.** Compare `.claude/agents/` and `.claude/skills/` against the orchestrator's
+   composition; produce a discrepancy list; report it to the user. (Read-only inventory and
+   usage assessment are `harness-review`'s job — call it for the deeper read.)
+2. **Change incrementally.** Add, modify, or remove one agent or skill at a time. Sync after
+   each change rather than batching.
+3. **Record history.** Append the date, change, target, and reason to the `CLAUDE.md` table.
+4. **Validate.** Re-check the changed agents and skills structurally; if the change affects
+   triggering, re-check the descriptions; for large changes (an architecture change, or
+   adding/removing several agents), re-run the relevant tests. Finally, confirm the
+   `CLAUDE.md` record matches the actual files.
+
+## 7. Periodic tool review
+
+If the harness has a tools registry (`references/tools.md` under the orchestrator, produced
+by the tool-discovery step), the tools in it are not permanent. Re-evaluate them
+periodically — not on a fixed clock, but when a signal warrants it:
+
+- A registered tool is no longer being used by any agent or skill.
+- A tool has stopped being maintained, or a better alternative has appeared.
+- The preferred tool fails often enough that agents are running on its alternative in
+  practice.
+
+The split follows the rest of this plugin: assessing whether a tool is still earning its
+place is a read activity, so it belongs to `harness-review` (it reads the registry as one of
+its usage signals); swapping, adding, or retiring a tool is a write, so it comes back here.
+When you change the registry, update the affected row's `Last reviewed` date, keep every
+role's alternative current, and record the change in the `CLAUDE.md` history like any other.
+Because agents and skills reference tools by role, swapping the tool behind a role needs no
+edits to their files.
diff --git a/agentic-harness/skills/harness-setup/references/orchestrator-template.md b/agentic-harness/skills/harness-setup/references/orchestrator-template.md
new file mode 100644
index 0000000..b78f365
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/orchestrator-template.md
@@ -0,0 +1,208 @@
+# Orchestrator template
+
+The orchestrator is the top-level skill that coordinates the whole team. It comes in three
+shapes, one per execution mode. Pick the shape from the mode chosen in Step 2; for hybrid,
+state the mode per phase.
+
+## Table of contents
+
+- [Orchestrator template](#orchestrator-template)
+  - [Table of contents](#table-of-contents)
+  - [Template A — agent team (default)](#template-a--agent-team-default)
+  - [Template B — subagent (fallback / lightweight)](#template-b--subagent-fallback--lightweight)
+  - [Template C — hybrid](#template-c--hybrid)
+  - [Authoring rules](#authoring-rules)
+  - [Follow-up keywords](#follow-up-keywords)
+
+---
+
+## Template A — agent team (default)
+
+The first choice when two or more agents need to talk while they work. Build the team with
+`TeamCreate`; coordinate over a shared task list and `SendMessage`.
+
+````markdown
+---
+name: {domain}-orchestrator
+description: "Coordinates the {domain} agent team to produce {deliverable}. {initial keywords}. Also use for follow-ups: re-run, update, modify, supplement, improve the previous result, and everyday {domain} requests."
+model: opus
+---
+
+# {Domain} orchestrator
+
+Coordinates the {domain} agent team to produce {final deliverable}.
+
+## Execution mode: agent team
+
+## Team
+
+| Member | Type | Role | Skill | Output |
+|--------|------|------|-------|--------|
+| {member-1} | {custom or built-in} | {role} | {skill} | {file} |
+| {member-2} | ... | ... | ... | ... |
+
+## Workflow
+
+### Phase 0: context check
+Branch on whether prior work exists:
+- `_agents_workspace/` absent → initial run; go to Phase 1.
+- `_agents_workspace/` present + a partial-change request → partial re-run; re-invoke only
+  the affected member and overwrite only its output.
+- `_agents_workspace/` present + new input → new run; move the old `_agents_workspace/` to
+  `_agents_workspace/archive/{YYYYMMDD_HHMMSS}/`, then go to Phase 1.
+On a partial re-run, pass the prior output path into the member's prompt so it reads the
+existing result and folds in the change.
+
+### Phase 1: prepare
+1. Read the input and identify {what to identify}.
+2. Create `_agents_workspace/` (or recreate it after archiving the old one on a new run).
+3. Save the input under `_agents_workspace/00_input/`.
+
+### Phase 2: form the team
+1. `TeamCreate(team_name, members: [...])` — each member with its name, type, model, and a
+   role prompt.
+2. `TaskCreate(tasks: [...])` — roughly five to six tasks per member; declare dependencies
+   with `depends_on`.
+
+### Phase 3: {the main work}
+Members claim tasks from the shared list and work independently. State the communication
+rules: who passes what to whom via `SendMessage`; that each member saves its output to a
+file and notifies the leader on completion; that a member requesting another's result asks
+over `SendMessage`. The leader watches progress (it is notified when a member goes idle),
+nudges or reassigns a stuck member, and checks state with `TaskGet`.
+
+| Member | Output path |
+|--------|-------------|
+| {member-1} | `_agents_workspace/{phase}_{member-1}_{artifact}.md` |
+
+### Phase 4: integrate
+1. Wait for all tasks to complete (`TaskGet`).
+2. Read each member's output.
+3. Apply the integration logic; where members conflict, record both with their sources.
+4. Write the final deliverable to the user's target path.
+
+### Phase 5: tear down
+1. Ask members to stop (`SendMessage`); `TeamDelete`.
+2. Keep `_agents_workspace/` (do not delete intermediate work).
+3. Report a summary to the user.
+
+## Error handling
+
+| Situation | Strategy |
+|-----------|----------|
+| One member fails or stalls | leader checks via `SendMessage`, restarts or replaces it |
+| Most members fail | tell the user, confirm whether to continue |
+| Timeout | use the partial results, stop the unfinished members |
+| Members disagree | record both with sources; do not delete either |
+| Task state lags | leader confirms with `TaskGet`, corrects with `TaskUpdate` |
+
+## Test scenarios
+- **Normal:** input → analysis → team of N + M tasks → self-coordinated work → integration →
+  teardown → the deliverable exists at its path.
+- **Error:** a member stops mid-phase → leader gets the idle notice → restart attempt →
+  on repeated failure, reassign its task → integrate the rest → note the gap in the report.
+````
+
+## Template B — subagent (fallback / lightweight)
+
+When team communication is unnecessary, or the team tools are unavailable. Spawn each agent
+with the `Agent` tool and collect return values.
+
+````markdown
+---
+name: {domain}-orchestrator
+description: "Coordinates {domain} agents to produce {deliverable}. {initial keywords} + follow-up keywords."
+model: opus
+---
+
+## Execution mode: subagent
+
+## Agents
+
+| Agent | subagent_type | Role | Skill | Output |
+|-------|---------------|------|-------|--------|
+| {agent-1} | {built-in or custom} | {role} | {skill} | {file} |
+
+## Workflow
+
+### Phase 0: context check
+Same branch as Template A — on whether `_agents_workspace/` exists.
+
+### Phase 1: prepare
+Read the input; create `_agents_workspace/` (archiving any old one on a new run).
+
+### Phase 2: run in parallel
+Invoke the agents in a single message:
+
+| Agent | Input | Output | model | run_in_background |
+|-------|-------|--------|-------|-------------------|
+| {agent-1} | {source} | `_agents_workspace/{phase}_{agent}_{artifact}.md` | opus | true |
+
+### Phase 3: integrate
+Collect each return value; read file outputs; apply the integration logic; write the final
+deliverable.
+
+### Phase 4: finish
+Keep `_agents_workspace/`; report a summary.
+
+## Error handling
+- One agent fails: retry once; on a second failure, note the omission and continue.
+- Most fail: tell the user and confirm.
+- Timeout: use the partial results.
+````
+
+## Template C — hybrid
+
+A different mode per phase. State `**Execution mode:** {team | subagent}` at the top of each
+phase.
+
+````markdown
+---
+name: {domain}-orchestrator
+description: "{domain} orchestrator (hybrid). {keywords} + follow-up keywords."
+model: opus
+---
+
+## Execution mode: hybrid
+
+| Phase | Mode | Why |
+|-------|------|-----|
+| Phase 2 (parallel collection) | subagent | independent collection, no coordination needed |
+| Phase 3 (consensus integration) | agent team | reconcile conflicting inputs by discussion |
+| Phase 4 (independent verification) | subagent | one QA agent verifies objectively |
+
+### Phase 2: collect — **Execution mode:** subagent
+Invoke N agents in parallel (`run_in_background: true`); save each to
+`_agents_workspace/02_{agent}_raw.md`.
+
+### Phase 3: integrate — **Execution mode:** agent team
+`TeamCreate` an integration team; `TaskCreate` work that reads the Phase 2 files; members
+reconcile conflicts via `SendMessage`; write `_agents_workspace/03_integrated.md`;
+`TeamDelete`.
+
+### Phase 4: verify — **Execution mode:** subagent
+One QA subagent reads `_agents_workspace/03_integrated.md` and writes a verification report.
+````
+
+**Transition rules.** Team → subagent: `TeamDelete` before any `Agent` call. Subagent →
+team: pass the subagent's file outputs to members as read paths. Team → team: tear down the
+old team before the next `TeamCreate` (only one team is active per session).
+
+## Authoring rules
+
+1. State the execution mode at the top. For hybrid, a per-phase mode table is required.
+2. For team mode, be concrete about `TeamCreate` / `TaskCreate` / `SendMessage` usage.
+3. For subagent mode, fully specify each `Agent` call (name, type, prompt, model,
+   `run_in_background`).
+4. Use clear paths under `_agents_workspace/`; avoid ambiguous relative paths.
+5. State inter-phase dependencies — which phase needs which phase's output. For hybrid,
+   call out the transition points.
+6. Make error handling realistic — do not assume everything succeeds.
+7. Include at least one normal and one error test scenario.
+
+## Follow-up keywords
+
+Initial-run keywords alone leave the harness unused after its first run. Put follow-up
+phrasings in the description: re-run, run again, update, modify, supplement; "only the
+{part} again"; "based on the previous result", "improve the result"; and everyday domain
+requests (for a launch-planning harness: "launch", "promotion", and the like).
diff --git a/agentic-harness/skills/harness-setup/references/skill-writing-guide.md b/agentic-harness/skills/harness-setup/references/skill-writing-guide.md
new file mode 100644
index 0000000..b968501
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/skill-writing-guide.md
@@ -0,0 +1,154 @@
+# Skill writing guide
+
+How to write the skills a generated harness uses, so they trigger reliably and earn their
+context cost. Supplements Step 4.
+
+## Table of contents
+
+- [Skill writing guide](#skill-writing-guide)
+  - [Table of contents](#table-of-contents)
+  - [1. Writing the description](#1-writing-the-description)
+  - [2. Body style](#2-body-style)
+  - [3. Output formats and examples](#3-output-formats-and-examples)
+  - [4. Progressive disclosure](#4-progressive-disclosure)
+  - [5. When to bundle a script](#5-when-to-bundle-a-script)
+  - [6. Data-schema standards](#6-data-schema-standards)
+  - [7. What to leave out](#7-what-to-leave-out)
+
+---
+
+## 1. Writing the description
+
+The description is the only thing that decides whether a skill triggers. The model judges
+from the name and description alone, and it tends to skip a skill for a task it thinks it can
+handle bare-handed. So:
+
+1. Say both **what the skill does** and **the specific situations that should trigger it**.
+2. Mark the boundary — name the near-miss cases where a *different* tool is the right fit, so
+   this skill does not fire on them.
+3. Lean slightly pushy, to offset the conservative default.
+
+**Weak:** `"A skill that processes data"` — which data, which task? Too vague to trigger.
+
+**Strong:** `"All spreadsheet operations — add columns, compute formulas, format, chart,
+clean data — for .xlsx / .csv / .tsv. Use whenever the user mentions a spreadsheet file,
+even in passing. Not for a Word document or a PDF: use the matching skill for those."`
+
+The strong version enumerates concrete actions, states the trigger, and draws the boundary.
+
+## 2. Body style
+
+**Explain the why.** A rule the agent understands generalizes; a bare command does not.
+
+Weak:
+```text
+ALWAYS use a table-aware extractor. NEVER use a plain text extractor for tables.
+```
+Strong:
+```text
+Use a table-aware extractor for tables: a plain text extractor loses the row and column
+structure, while a table-aware one recognizes cell boundaries and returns structured rows.
+```
+
+**Generalize, don't overfit.** Fix at the level of the principle, not the one example that
+failed.
+
+Weak: `If a column is named "Q4 Revenue", convert it to numbers.`
+Strong: `If a column name implies a numeric quantity (revenue, amount, count), convert it to
+a number; if conversion fails, keep the original value.`
+
+**Write imperatively.** A skill is an instruction sheet: "do X", not "the skill does X".
+
+**Spend context deliberately.** The window is shared. For each sentence ask: does the agent
+already know this (cut it), would it err without it (keep it), would one example replace the
+explanation (swap it)?
+
+## 3. Output formats and examples
+
+When the deliverable's shape matters, show the template:
+
+```text
+## Report structure
+# {Title}
+## Summary
+## Findings
+## Recommendations
+```
+
+Keep it short — a concrete example teaches faster than a long spec. For transformations,
+pair input with output:
+
+```text
+Input:  Add token-based user authentication
+Output: feat(auth): add token-based authentication
+```
+
+## 4. Progressive disclosure
+
+A skill loads in three stages: metadata (always in context), the SKILL.md body (on trigger),
+and `references/` files (only when read). Use the stages:
+
+- As the body nears ~500 lines, move detail into `references/` and leave a one-line pointer
+  that says when to read it.
+- Give any reference over ~300 lines a table of contents at the top.
+- Split domain- or framework-specific variants into separate reference files, so only the
+  relevant one loads:
+
+```text
+deploy-skill/
+├── SKILL.md            (workflow + which file to load)
+└── references/
+    ├── provider-a.md
+    ├── provider-b.md
+    └── provider-c.md
+```
+
+## 5. When to bundle a script
+
+Watch the agents' transcripts during testing. Bundle when a pattern repeats:
+
+| Signal | Action |
+|--------|--------|
+| The same helper script is written in every test run | bundle it under `scripts/` |
+| The same dependency is installed every time | state the install step in the skill |
+| The same multi-step approach recurs | write it up as a standard procedure in the body |
+| The same workaround follows the same error every time | document the issue and its fix |
+| The same document or report is read or produced | write it up and define a json data schema for input/output payload to interact with it |
+
+A bundled script must pass an execution test of its own.
+
+## 6. Data-schema standards
+
+When a harness tests its own generated skills, a shared schema keeps results comparable.
+
+Test-case metadata:
+```json
+{
+  "eval_id": 0,
+  "eval_name": "descriptive-name",
+  "prompt": "the user's task prompt",
+  "assertions": ["the deliverable contains X", "a file in format Y was produced"]
+}
+```
+
+Assertion-based grading — use the field names `text`, `passed`, `evidence` exactly:
+```json
+{
+  "expectations": [
+    { "text": "a margin column was added", "passed": true, "evidence": "column E holds margin_pct" }
+  ],
+  "summary": { "passed": 2, "failed": 1, "total": 3, "pass_rate": 0.67 }
+}
+```
+
+Timing — capture from the completion notification immediately; it cannot be recovered later:
+```json
+{ "total_tokens": 84852, "duration_ms": 23332 }
+```
+
+## 7. What to leave out
+
+- Supplementary docs (README, CHANGELOG, install guides).
+- Meta-notes about how the skill was built (test logs, iteration history).
+- End-user manuals — a skill is an instruction sheet for an agent, not a human.
+- General knowledge the model already has.
diff --git a/agentic-harness/skills/harness-setup/references/team-examples.md b/agentic-harness/skills/harness-setup/references/team-examples.md
new file mode 100644
index 0000000..4e56718
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/team-examples.md
@@ -0,0 +1,157 @@
+# Team examples
+
+Worked examples across generic domains. Each shows the architecture pattern, the execution
+mode, the agent composition, and the coordination shape. Adapt them — do not copy them
+literally.
+
+## Table of contents
+
+- [Team examples](#team-examples)
+  - [Table of contents](#table-of-contents)
+  - [1. Multi-source research (team, fan-out/fan-in)](#1-multi-source-research-team-fan-outfan-in)
+  - [2. Long-form authoring (team, pipeline + fan-out)](#2-long-form-authoring-team-pipeline--fan-out)
+    - [A full agent file — `consistency-reviewer.md`](#a-full-agent-file--consistency-reviewermd)
+  - [3. Generate-and-review (subagent, producer–reviewer)](#3-generate-and-review-subagent-producerreviewer)
+  - [4. Code review (team, fan-out/fan-in + debate)](#4-code-review-team-fan-outfan-in--debate)
+  - [5. Large migration (team, supervisor)](#5-large-migration-team-supervisor)
+  - [What every example shares](#what-every-example-shares)
+
+---
+
+## 1. Multi-source research (team, fan-out/fan-in)
+
+Several researchers cover different source types in parallel; the leader integrates.
+
+| Member | Type | Scope | Output |
+|--------|------|-------|--------|
+| primary-source-researcher | general-purpose | official / first-party sources | `research_primary.md` |
+| press-researcher | general-purpose | reporting and analysis | `research_press.md` |
+| community-researcher | general-purpose | forums and social discussion | `research_community.md` |
+| context-researcher | general-purpose | background, comparables, prior art | `research_context.md` |
+| (leader = orchestrator) | — | integrate into one report | `synthesis-report.md` |
+
+The researchers use the `general-purpose` built-in type but are still defined as files —
+each file states the scope and the team communication protocol so the roles are reusable and
+the collaboration has a contract.
+
+Coordination: the four work independently, but pass relevant finds to each other over
+`SendMessage` (the press researcher hands an acquisition rumor to the context researcher;
+the community researcher flags a press item to the press researcher). Conflicting claims are
+debated directly and, in the final report, recorded side by side with their sources. This
+cross-talk is why it is a team rather than four isolated subagents.
+
+## 2. Long-form authoring (team, pipeline + fan-out)
+
+A document built in stages, with parallel groundwork.
+
+```text
+Phase 1 (team, parallel):  structure-planner + source-curator + outline-builder
+                           — keep each other consistent via SendMessage
+Phase 2 (subagent):        section-writer drafts from the Phase 1 outputs
+Phase 3 (team, parallel):  fact-checker + consistency-reviewer review the draft
+                           — share findings via SendMessage
+Phase 4 (subagent):        section-writer revises to reflect the review
+```
+
+Phase 1 forms a team, then tears it down. Phase 2 needs no team (one writer working alone),
+so it runs as a subagent reading the Phase 1 files from `_agents_workspace/`. Phase 3 forms a
+new review team (only one team is active at a time, but Phase 1's was already disbanded).
+Phase 4 is again a lone subagent.
+
+### A full agent file — `consistency-reviewer.md`
+
+```markdown
+---
+name: consistency-reviewer
+description: "Checks a long-form draft for internal consistency — terminology, claims, and cross-references that contradict each other."
+model: opus
+---
+
+# Consistency reviewer
+
+You check a draft for internal contradictions. You do not judge prose quality — that is the
+writer's concern. You judge whether the document agrees with itself.
+
+## Core role
+1. Flag terms used with two different meanings.
+2. Flag claims in one section that a later section contradicts.
+3. Flag cross-references that point to the wrong place or to nothing.
+
+## Working principles
+- Report file and location for every issue, so the writer can act without searching.
+- Distinguish a true contradiction from a stylistic variation; report only the former.
+
+## Input / output protocol
+- Input: the draft at `_agents_workspace/02_section-writer_draft.md`.
+- Output: `_agents_workspace/03_consistency_report.md`.
+- Format: one entry per issue — location, the two conflicting statements, suggested fix.
+
+## Team communication protocol
+- To fact-checker: SendMessage when a consistency issue depends on a factual question.
+- To the leader: a report distinguishing confirmed issues from open questions.
+
+## Error handling
+- If the draft is missing, report that and stop rather than inventing content.
+
+## Collaboration
+- Works alongside fact-checker; the two divide internal vs external correctness.
+```
+
+## 3. Generate-and-review (subagent, producer–reviewer)
+
+Two agents — a producer and a reviewer — looping until the output passes. With only two
+agents and a hand-off that matters more than conversation, subagent mode fits; cap the loop
+at two or three rounds.
+
+| Agent | subagent_type | Role |
+|-------|---------------|------|
+| asset-producer | custom | generate the batch of outputs |
+| asset-reviewer | custom | inspect each, mark PASS / FIX / REDO |
+
+The reviewer judges on objective criteria (completeness, consistency, legibility), not
+taste, and writes a per-item verdict with a specific fix instruction. Items marked REDO go
+back to the producer with that instruction; after the retry cap, a still-failing item is
+passed with a warning. If a large fraction needs REDO, the reviewer proposes revising the
+generation prompt rather than looping further.
+
+## 4. Code review (team, fan-out/fan-in + debate)
+
+Reviewers with different lenses examine the same change and talk to each other directly.
+
+```text
+[leader] → TeamCreate(review-team)
+    ├── security-reviewer:    vulnerabilities, input handling, authz
+    ├── performance-reviewer: hot paths, query patterns, allocations
+    └── test-reviewer:        coverage and meaningful assertions
+    → reviewers cross-message; leader synthesizes
+```
+
+The value is in the cross-talk: the security reviewer flags an injectable query and asks the
+performance reviewer to weigh in; the performance reviewer finds an N+1 query and asks the
+test reviewer whether a test would have caught it. Reviewers reach each other without routing
+through the leader, which catches cross-domain issues a set of siloed reviews would miss.
+
+## 5. Large migration (team, supervisor)
+
+A supervisor analyzes the file set and hands out batches dynamically.
+
+```text
+[supervisor] → analyze file list → register batches as tasks
+    ├→ migrator-1 (claims a batch)
+    ├→ migrator-2 (claims a batch)
+    └→ migrator-3 (claims a batch)
+    ← on completion, claims the next; on failure, supervisor diagnoses and reassigns
+```
+
+Unlike fan-out, batches are not fixed in advance — workers claim the next item from the
+shared task list as they free up, which keeps fast workers busy and lets the supervisor
+rebalance around failures. When all batches are done, the supervisor runs the integration
+test.
+
+## What every example shares
+
+- Every agent is a file under `.claude/agents/`, with the required sections and — in team
+  mode — a team communication protocol.
+- Skills live under `.claude/skills/{name}/SKILL.md`.
+- The orchestrator names the agents and the workflow and states the execution mode. See
+  `references/orchestrator-template.md`.
diff --git a/agentic-harness/skills/harness-setup/references/tool-discovery.md b/agentic-harness/skills/harness-setup/references/tool-discovery.md
new file mode 100644
index 0000000..8cd3a60
--- /dev/null
+++ b/agentic-harness/skills/harness-setup/references/tool-discovery.md
@@ -0,0 +1,96 @@
+# Tool discovery
+
+How the optional, on-request tool-discovery step works: how to brief the search subagent,
+how candidates are accepted, and how accepted tools are registered so agents and skills can
+use them without hard-coding a tool name. Supplements Step 1b.
+
+## When it runs
+
+Only when the user asks ("find tools / MCPs / plugins for this project"). It is never part of
+an automatic build. It runs equally well during an initial build or standalone against an
+existing harness. This skill ships **no catalog of recommendations** — every candidate comes
+from a live search and the local configuration, and nothing is adopted without the user's
+explicit approval.
+
+## Step 1: Brief the search subagent
+
+Dispatch a search-optimized subagent and give it a tight context so its search is grounded in
+this project, not generic. Use the `general-purpose` type (it has web search and fetch); run
+it in the background while you continue. The brief:
+
+```text
+You are searching for external tools — MCP servers and Claude Code plugins — that would help
+this project's harness. Project context:
+- Domain: {from Step 1}
+- Stack / languages / frameworks: {from Step 1}
+- Core task types: {creation / validation / analysis / ...}
+- Pain points or repeated manual work the user mentioned: {if any}
+
+For each candidate, return: the ROLE it fills (e.g. "knowledge base", "language server",
+"issue tracker CLI"), the concrete tool, what it does, its maturity/maintenance signal, and
+its main trade-off or cost. Do not rank or recommend — just surface grounded candidates with
+evidence. Prefer well-maintained, widely-used tools; flag anything experimental.
+```
+
+The categories are illustrative, not prescriptive — a document-heavy project might benefit
+from a markup-conversion MCP, a code project from a language-server tool, a project tracked
+in an external issue tracker from that tracker's CLI or MCP. What actually fits is whatever the
+search and the project context turn up.
+
+## Step 2: Check the local configuration
+
+Before proposing anything, inspect what is already available in the local and session
+configuration — connected MCP servers, installed plugins, CLIs on the path. The point is
+twofold: don't propose a tool the project already has, and surface useful tools already
+present that the harness isn't using yet. Fold both into the candidate list.
+
+## Step 3: Get explicit acceptance
+
+Present the candidates to the user, each with its role, what it does, and its trade-off. The
+user **accepts or rejects each one individually**. Adopt only the accepted ones. Do not
+batch-accept, and do not adopt a tool on the user's behalf because it "seems useful" — an
+external tool is a dependency and a trust decision, and that decision is the user's.
+
+For an accepted tool that needs configuration (an MCP server, a plugin), set it up only as
+far as the user authorizes, and note any credentials or installation the user must complete
+themselves.
+
+## Step 4: Register accepted tools
+
+Record accepted tools **by role** in the registry under the orchestrator:
+`.claude/skills/{domain}-orchestrator/references/tools.md`. One registry per harness; the
+orchestrator owns it. The schema:
+
+```markdown
+# Tools registry
+
+Agents and skills reference a tool by its **role**, never by a hard tool name. When the
+preferred tool is unavailable, fall back to the alternative. Reviewed on the dates below.
+
+| Role | Preferred tool | Alternative (if unavailable) | Status | Last reviewed |
+|------|----------------|------------------------------|--------|---------------|
+| knowledge-base | {accepted MCP} | built-in file search | active | {YYYY-MM-DD} |
+| language-server | {accepted tool} | manual code reading | active | {YYYY-MM-DD} |
+| issue-tracker | {accepted CLI} | none — note the gap | active | {YYYY-MM-DD} |
+```
+
+Every role needs an **alternative**, even if the alternative is "none, and here is what the
+agent does without it." The alternative is what keeps the harness working when a tool is
+absent or unreachable — the same graceful-degradation idea the marketplace applies to its own
+optional connectors.
+
+## Step 5: Reference tools by role
+
+In an agent definition or a skill, name the **role**, not the tool. For example: "retrieve
+context using the `knowledge-base` tool from the orchestrator's tools registry; if it is
+unavailable, use the registered alternative." This keeps the concrete tool swappable: change
+the registry row and every agent and skill follows, with no edits to their files.
+
+## Step 6: Periodic review
+
+Registered tools are not permanent. A tool can fall out of use, stop being maintained, or be
+overtaken by a better option. Re-evaluate the registry periodically — whether each tool is
+still earning its place, and whether a better alternative now exists — and update the `Last
+reviewed` date. Assessing tool usage is a read activity that belongs to `harness-review`;
+acting on it (swapping or retiring a tool) is a write that comes back here. The cadence and
+the triggers are in `references/maintenance.md`.