OpenCode Swarm is built on a simple premise: multi-agent systems fail when they're unstructured.
Most frameworks throw agents at a problem and hope coherence emerges. It doesn't. You get race conditions, conflicting changes, lost context, and code that doesn't work.
Swarm enforces discipline:
- One Architect owns all decisions
- One task executes at a time
- Every task gets QA'd before the next starts
- Project state persists in files, not memory
                    ┌─────────────┐
                    │  ARCHITECT  │
                    │  (control)  │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│   EXPLORER    │  │     SMEs      │  │   PIPELINE    │
│  (discovery)  │  │  (advisory)   │  │  (execution)  │
└───────────────┘  └───────────────┘  └───────────────┘
                                              │
                                   ┌──────────┴──────────┐
                                   │                     │
                                   ▼                     ▼
                            ┌─────────────┐       ┌─────────────┐
                            │    CODER    │       │     QA      │
                            │ (implement) │       │  (verify)   │
                            └─────────────┘       └─────────────┘
- Owns the plan
- Makes all delegation decisions
- Synthesizes inputs from other agents
- Handles failures and escalations
- Maintains project memory
Fast codebase scanner and factual mapping agent. Explorer is strictly observational — it reports what is observed without judgment, verdict, or directive.
- Explorer analyzes directly — does NOT delegate to sub-agents
- Output is under 2000 characters
- Read-only: cannot write, edit, or patch code
Explorer systematically reports across four dimensions:
| Dimension | Content |
|---|---|
| STRUCTURE | Entry points and call chains (≤3 levels), public API surface (exports with signatures), internal/external dependencies |
| PATTERNS | Design patterns (factory, observer, strategy), error handling approach (throw/Result/error callback), state management (global/module/passed), configuration pattern (env/config/hardcoded) |
| COMPLEXITY INDICATORS | Structural complexity — cyclomatic complexity, deep nesting, complex control flow, large files (>500 lines), deep inheritance/type hierarchies |
| RUNTIME/BEHAVIORAL CONCERNS | Missing error handling paths, platform-specific assumptions (path separators, line endings, OS APIs) |
| Section | When Populated | Example |
|---|---|---|
| `OBSERVED CHANGES` | INPUT referenced specific files | What changed in those targets |
| `CONSUMERS_AFFECTED` | Integration impact mode | Files importing changed symbols |
| `RELEVANT CONSTRAINTS` | Always | Architectural patterns, error handling coverage, platform assumptions, established conventions |
| `FOLLOW-UP CANDIDATE AREAS` | Always | Observable conditions warranting later review (e.g., "function has 12 parameters — consider splitting") |
| `DOMAINS` | Always | Relevant SME domains (typescript, nodejs, powershell, etc.) |
Explorer output uses observational language only. The following were removed:
- `VERDICT` → replaced with `COMPATIBILITY SIGNALS` (COMPATIBLE / INCOMPATIBLE / UNCERTAIN)
- `MIGRATION_NEEDED` → replaced with `MIGRATION_SURFACE` (observable call signatures affected)
- `REVIEW NEEDED`, `dead`, `missing` labels → removed
- All directive language → recast as observable conditions
Activates when delegated with "Integration impact analysis" or INPUT lists contract changes. Uses diff + imports tools to classify each change as BREAKING or COMPATIBLE and list all affected consumer files.
Automatically activates during codebase reality check at plan ingestion. Uses doc_scan to build a manifest index of project docs, then doc_extract to surface relevant constraints per task using Jaccard bigram similarity scoring.
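The Jaccard bigram scoring mentioned above can be illustrated with a minimal sketch. It assumes word-level bigrams over lowercased alphanumeric tokens; the helper names `wordBigrams` and `jaccardBigramSimilarity` are hypothetical, and the tokenization that `doc_extract` actually uses may differ.

```typescript
// Hypothetical helper: split text into lowercase word bigrams.
// Assumption: words are runs of letters/digits; the real tokenizer may differ.
function wordBigrams(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
  const grams = new Set<string>();
  for (let i = 0; i + 1 < words.length; i++) {
    grams.add(`${words[i]} ${words[i + 1]}`);
  }
  return grams;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two bigram sets.
function jaccardBigramSimilarity(a: string, b: string): number {
  const gramsA = wordBigrams(a);
  const gramsB = wordBigrams(b);
  let shared = 0;
  gramsA.forEach((gram) => {
    if (gramsB.has(gram)) shared++;
  });
  const union = gramsA.size + gramsB.size - shared;
  return union === 0 ? 0 : shared / union; // inputs without bigrams score 0
}
```

A task description and a doc excerpt that share most of their adjacent word pairs score close to 1, while unrelated text scores 0, which is what makes the score usable for ranking constraints per task.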
Explorer also operates in two curator modes for phase boundary consolidation:
| Mode | Trigger | Purpose |
|---|---|---|
| `CURATOR_INIT` | Session start | Consolidates prior session knowledge into architect briefing; flags contradictions |
| `CURATOR_PHASE` | Phase complete | Extends running digest with phase outcomes; observes workflow deviations and knowledge update candidates |
Both curator modes are dispatched via the standard MODE: EXPLORER with the appropriate curator mode trigger (CURATOR_INIT or CURATOR_PHASE). Explorer uses OBSERVATIONS (not KNOWLEDGE_UPDATES) and reports with observational language — no directives.
Explorer's behavioral boundaries were tightened across Phases 3–5 to eliminate residual routing authority and preserve its discovery function.
Explorer's scope is strictly factual. Explorer reports what is observed without judgment, verdict, or directive. Explorer does not route tasks, prioritize findings, or declare areas out-of-scope.
Architect owns all routing decisions. DOMAINS and FOLLOW-UP CANDIDATE AREAS are Explorer output fields — they are informational hints for the architect to consider. Explorer does not delegate, dispatch, or determine next steps. Only the architect decides what to act on.
Reviewer must validate Explorer findings. Explorer observations are not authoritative. Reviewer and other consumers of Explorer output must independently verify factual claims before using them in decisions or implementations.
Critic remains a pure challenge layer. Critic does not map code, scan files, or make routing decisions. Its role is to challenge the architect's plan — not to replace the explorer.
Speed preservation is a design goal. Explorer's efficiency is intentional. Lightweight, fast scans serve the architect better than exhaustive analysis that delays routing decisions.
Explorer output is for architect consumption, not autonomous action. No agent other than the architect should treat Explorer output as a directive or verdict.
Test coverage. Role boundary hardening is validated by explorer-role-boundary.test.ts (25 tests) and explorer-consumer-contract.test.ts (58 tests). All 182 explorer tests pass. The test suite enforces that Explorer output contains only observational language, that VERDICT and MIGRATION_NEEDED never appear, and that routing authority is never asserted by Explorer.
Before and after examples:
The following judgmental language was systematically removed from Explorer output and replaced with observational language only:
| Removed (judgmental) | Added (observational) |
|---|---|
| `VERDICT: INCOMPATIBLE — needs rewrite` | `COMPATIBILITY SIGNALS: INCOMPATIBLE — observable call signatures affected: 3` |
| `VERDICT: COMPATIBLE — safe to use` | `COMPATIBILITY SIGNALS: COMPATIBLE — no observable call signature conflicts detected` |
| `REVIEW NEEDED — dead code` | `FOLLOW-UP CANDIDATE AREAS: function has 12 parameters, 8 of which are unused in current call sites` |
| `MIGRATION_NEEDED: X must be rewritten` | `MIGRATION_SURFACE: 5 call sites reference the affected function signature` |
| `REVIEW NEEDED — missing labels` | `FOLLOW-UP CANDIDATE AREAS: 3 exported functions lack doc comments` |
All directive language (must, should, needs, verdict, review needed, dead) was recast as observable conditions. Explorer reports facts; the architect decides what to do with them.
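A check in the spirit of the role-boundary tests described above might look like the following. This is only a sketch: `FORBIDDEN_LABELS` and `findForbiddenLabels` are hypothetical names, the label list is taken from the removals listed in this section, and the real test suite may match differently.

```typescript
// Hypothetical list of labels that Explorer output should no longer contain.
const FORBIDDEN_LABELS = ["VERDICT", "MIGRATION_NEEDED", "REVIEW NEEDED"];

// Returns every forbidden label found as a whole word (case-sensitive,
// so ordinary lowercase prose such as "verdict" is not flagged).
function findForbiddenLabels(explorerOutput: string): string[] {
  return FORBIDDEN_LABELS.filter((label) =>
    new RegExp(`\\b${label}\\b`).test(explorerOutput),
  );
}
```

An empty result means the output uses only the observational field names such as `COMPATIBILITY SIGNALS` and `MIGRATION_SURFACE`.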
- UI/UX specification agent
- Generates component scaffolds and design tokens before coding begins on UI-heavy tasks
- Runs in MODE: EXECUTE before Coder (Rule 9)
- Single open-domain expert (any domain: security, ios, rust, kubernetes, etc.)
- Consulted serially, one call per domain
- Guidance cached in context.md
- Read-only (cannot write code)
- Coder: Implements one task at a time
- Reviewer: Dual-pass review — general correctness first, then automatic security-only pass for security-sensitive files (OWASP Top 10 categories)
- Test Engineer: Generates verification tests + adversarial tests (attack vectors, boundary violations, injection attempts)
- Gates: Automated `diff`, `imports`, `lint`, and `secretscan` tools verify contracts, dependencies, style, and security before/during review.
- Reviews architect's plan BEFORE implementation begins
- Returns APPROVED / NEEDS_REVISION / REJECTED
- Read-only (cannot write code)
- Documentation synthesizer
- Automatically updates READMEs, API docs, and guides based on implementation changes
- Runs in Phase 6 as part of project wrap-up
Is .swarm/plan.md present?
├── YES → Read plan.md and context.md
│ Find current phase and task
│ Resume execution
│
└── NO → New project
Proceed to Phase 1
Is the user request clear?
├── YES → Proceed to Phase 2
│
└── NO → Ask up to 3 clarifying questions
Wait for answers
Then proceed
@explorer analyzes codebase
│
├── Project structure
├── Languages and frameworks
├── Key files
├── Patterns observed
└── Relevant SME domains
For each relevant domain:
│
├── Check context.md for cached guidance
│ └── If cached → Skip this SME
│
└── If not cached:
├── Delegate to @sme with DOMAIN: [domain]
├── Wait for response
└── Cache guidance in context.md
Create/Update .swarm/plan.md:
│
├── Project overview
├── Phases (logical groupings)
│ └── Tasks (atomic units of work)
│ ├── Dependencies
│ ├── Acceptance criteria
│ └── Complexity estimate
│
└── Status tracking
MODE: CRITIC-GATE
@critic reviews plan
│
├── APPROVED → Proceed to MODE: EXECUTE
├── NEEDS_REVISION → Revise plan, resubmit (max 2 cycles)
└── REJECTED → Escalate to user
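The critic gate above can be read as a small loop. The sketch below is illustrative only: `runCriticGate` and `reviewPlan` are hypothetical names, and escalating to the user once the two revision cycles are used up is an assumption, since the flow above does not say what happens at that point.

```typescript
type CriticVerdict = "APPROVED" | "NEEDS_REVISION" | "REJECTED";
type GateOutcome = "EXECUTE" | "ESCALATE_TO_USER";

const MAX_REVISION_CYCLES = 2; // "max 2 cycles" from the flow above

// reviewPlan stands in for a delegation to @critic; revision 0 is the first submission.
function runCriticGate(reviewPlan: (revision: number) => CriticVerdict): GateOutcome {
  for (let revision = 0; revision <= MAX_REVISION_CYCLES; revision++) {
    const verdict = reviewPlan(revision);
    if (verdict === "APPROVED") return "EXECUTE";
    if (verdict === "REJECTED") return "ESCALATE_TO_USER";
    // NEEDS_REVISION: the architect revises the plan and the loop resubmits it.
  }
  // Assumption: running out of revision cycles is treated like a rejection.
  return "ESCALATE_TO_USER";
}
```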
For each task in current phase:
│
├── Check dependencies complete
│ └── If blocked → Skip, mark [BLOCKED]
│
├── 5a. @coder implements (ONE task only)
│ └── → REQUIRED: Print task start confirmation
│
├── 5b. diff + imports tools analyze changes + semantic diff injection
│ ├── Detect contract changes (exports, interfaces, types)
│ ├── Track import dependencies across files
│ ├── system-enhancer injects AST-based semantic diff summary into reviewer context
│ │ - Computes AST diffs for changed files (up to 10, from `declaredCoderScope`)
│ │ - Enriches with blast radius (consumers count from repo graph)
│ │ - Generates risk-ranked markdown for reviewer prioritization
│ │ - Fully error-resilient: never throws, returns null on failure
│ └── → REQUIRED: Print change summary
│
├── 5c. syntax_check validates code syntax (v6.9.0)
│ ├── Tree-sitter parse validation for 20 languages
│ ├── Catches syntax errors before review
│ └── → REQUIRED: Print syntax status
│
├── 5d. placeholder_scan detects incomplete code (v6.9.0)
│ ├── Scans for TODO/FIXME comments
│ ├── Detects placeholder text and stub implementations
│ └── → REQUIRED: Print placeholder scan results
│
├── 5e. lint fix → lint:check (auto-fix then verify)
│ ├── Run `lint` tool with fix mode, then check mode
│ └── → REQUIRED: Print lint status
│
├── 5f. imports audit analyzes dependencies (AST-based)
│ ├── Track import dependencies across files
│ └── → REQUIRED: Print import analysis
│
├── 5g. build_check verifies compilation (v6.9.0)
│ ├── Runs repo-native build/typecheck commands
│ ├── Validates code compiles correctly
│ └── → REQUIRED: Print build status
│
├── 5h. pre_check_batch runs parallel verification (v6.10.0)
│ ├── Runs 4 tools in parallel with p-limit (max 4 concurrent):
│ │ ├── lint:check (code quality verification - hard gate)
│ │ ├── secretscan (secret detection - hard gate)
│ │ ├── sast_scan (security analysis - hard gate)
│ │ └── quality_budget (maintainability metrics)
│ ├── Returns unified result with gates_passed boolean
│ ├── If gates_passed === false → Return to coder with specific failures
│ └── → REQUIRED: Print gates_passed status
├── 5i. @reviewer reviews (correctness, edge-cases, performance)
│ ├── APPROVED → Continue
│ └── REJECTED → Retry from 5a (max 5)
│ └── → REQUIRED: Print approval decision
│
├── 5j. @reviewer security-only pass (if file matches security globs
│ or coder output contains security keywords)
│ ├── Security globs: auth, crypto, session, token, middleware, api, security
│ ├── Uses OWASP Top 10 2021 categories
│ └── → REQUIRED: Print security approval
│
├── 5k. @test_engineer generates AND runs verification tests
│ ├── PASS → Continue
│ └── FAIL → Send failures to @coder, retry from 5a with RETRY protocol
│ └── → REQUIRED: Print test results
│
├── 5l. @test_engineer adversarial testing pass
│ ├── Attack vectors, boundary violations, injection attempts
│ ├── PASS → Continue
│ └── FAIL → Send failures to @coder, retry from 5a with RETRY protocol
│ └── → REQUIRED: Print adversarial test results
│
├── 5m. architect regression sweep (scope:"graph")
│ ├── Runs test_runner with scope:"graph" to find cross-task test regressions
│ ├── If no related tests beyond task scope → SKIP. Print "regression-sweep: SKIPPED — no related tests beyond task scope"
│ ├── If additional tests pass → PASS. Print "regression-sweep: PASS [N additional tests, M files]"
│ ├── If additional tests FAIL → return to coder: "REGRESSION DETECTED: Your changes broke [N] tests in [test files]"
│ └── → REQUIRED: Print regression-sweep result
│
├── 5n. ⛔ HARD STOP: Pre-commit checklist (v6.11.0)
│ ├── [ ] All QA gates passed (lint:check, secretscan, sast_scan)
│ ├── [ ] Reviewer approval documented
│ ├── [ ] Tests pass with evidence
│ └── [ ] No security findings
│ └── → REQUIRED: Print checklist completion
│ **No override. A commit without completed QA gate is a workflow violation.**
│
└── 5o. TASK COMPLETION CHECKLIST (v6.11.0)
├── Evidence written to .swarm/evidence/{taskId}/
├── update_task_status called with status='completed' (advances state machine to 'complete')
└── → REQUIRED: Print completion confirmation
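Step 5h's unified result can be pictured as a fold over the individual tool results. The sketch below covers only that aggregation, not the parallel execution; `CheckResult`, `BatchSummary`, and `summarizeBatch` are hypothetical names, and treating `quality_budget` as advisory rather than a hard gate is an assumption based on how the step list labels the other three tools.

```typescript
interface CheckResult {
  tool: string;
  passed: boolean;
  hardGate: boolean; // assumption: only hard gates can fail the batch
}

interface BatchSummary {
  gates_passed: boolean;
  failures: string[]; // names of hard-gate tools that failed
}

// Collapse per-tool results into the single gates_passed flag.
function summarizeBatch(results: CheckResult[]): BatchSummary {
  const failures = results
    .filter((result) => result.hardGate && !result.passed)
    .map((result) => result.tool);
  return { gates_passed: failures.length === 0, failures };
}
```

When `gates_passed` is false, the `failures` list is what would be handed back to the coder as the specific failures.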
All tasks in phase done
│
├── 1. @explorer - Rescan codebase after changes
├── 2. @docs - Update documentation for all changes in this phase
├── 3. Update context.md with learnings and decisions
├── 4. Write retrospective evidence via write_retro tool
├── 4.5. Run evidence_check to verify all completed tasks have required evidence
├── 5. Run sbom_generate with scope='changed' for dependency snapshot
├── 5.5. Defense-in-depth drift check: Delegate to @critic_drift_verifier BEFORE phase_complete
│ - Returns early feedback on plan drift
│ - Architect calls `write_drift_evidence(phase, verdict, summary)` tool after critic_drift_verifier returns
│ - Writes drift verification evidence to .swarm/evidence/{phase}/drift-verifier.json
│ - Verdict automatically normalized: APPROVED → approved, NEEDS_REVISION → rejected
│ - Skip this step if spec.md does not exist
├── 5.6. Verify mandatory gate evidence exists:
│ - .swarm/evidence/{phase}/completion-verify.json (auto-written by completion-verify gate)
│ - .swarm/evidence/{phase}/drift-verifier.json (written by @critic_drift_verifier)
│ - .swarm/evidence/{phase}/hallucination-guard.json (if hallucination_guard enabled; written by write_hallucination_evidence)
│ - .swarm/evidence/{phase}/mutation-gate.json (if mutation_test enabled; written by write_mutation_evidence after generate_mutants + mutation_test)
│ If any required evidence file is missing: run the missing gate first
│ Note: All gates automatically bypassed in turbo mode
├── 6. Call phase_complete (enforces four phase gates automatically)
│ - Gate 1: completion-verify — deterministic identifier check in source files
│ - Gate 2: drift verifier evidence — reads drift-verifier.json for approved verdict
│ - Gate 3: hallucination guard — reads hallucination-guard.json for approved verdict (if enabled)
│ - Gate 4: mutation gate — reads mutation-gate.json for pass/warn/fail verdict (if enabled)
│ - All four gates bypassed when turbo mode is active
└── 7. Ask user: "Ready for Phase [N+1]?"
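The phase_complete tool enforces up to four gates before marking a phase complete: two mandatory (completion-verify, drift-verifier) and two optional (hallucination-guard, mutation-test):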
| Gate | Purpose | Blocking Reason | Turbo Bypass |
|---|---|---|---|
| `completion-verify` | Deterministic check that plan task identifiers exist in source files | `COMPLETION_INCOMPLETE`: zero identifiers found in target files | Yes |
| `drift-verifier` | Evidence-based check that critic_drift_verifier approved the implementation | `DRIFT_VERIFICATION_MISSING` or `DRIFT_VERIFICATION_REJECTED` | Yes |
| `hallucination-guard` | Evidence-based check that critic_hallucination_verifier approved plan/implementation claims | `HALLUCINATION_VERIFICATION_MISSING` or `HALLUCINATION_VERIFICATION_REJECTED` | Yes |
| `mutation-test` | Evidence-based check that mutation tests achieved a passing kill rate | `MUTATION_GATE_MISSING` or `MUTATION_GATE_FAIL` | Yes |
Gate 1: Completion Verify
- Parses plan task descriptions for identifiers (backtick, camelCase, PascalCase, config keys)
- Scans target source files for matches
- Blocks if zero identifiers found in any task's target files
- Non-blocking on errors — treats as warning and continues
Gate 2: Drift Verifier Evidence
- Reads `.swarm/evidence/{phase}/drift-verifier.json`
- Checks for entry with `type` containing 'drift' and `verdict` of 'approved'
- Blocks if evidence missing or verdict is 'rejected'
- Defense-in-depth: architect should delegate to `@critic_drift_verifier` BEFORE calling `phase_complete` for early feedback
- Uses `write_drift_evidence` tool to persist verification results (verdict is normalized automatically)
Gate 3: Hallucination Guard (optional)
- Only enforced when `hallucination_guard: true` in the QA gate profile
- Reads `.swarm/evidence/{phase}/hallucination-guard.json`
- Checks for entry with `type` containing 'hallucination' and `verdict` of 'approved'
- Blocks if evidence missing or verdict is 'rejected'
- Architect should delegate to `@critic_hallucination_verifier` and call `write_hallucination_evidence` before `phase_complete`
- Enabled via `set_qa_gates` tool with `hallucination_guard: true`
Gate 4: Mutation Gate (optional)
- Only enforced when `mutation_test: true` in the QA gate profile (default: OFF)
- Reads `.swarm/evidence/{phase}/mutation-gate.json`
- Checks for entry with `type` of 'mutation-gate' and verdict of 'pass', 'warn', or 'skip'
- Blocks if verdict is 'fail': resolve surviving mutants or lower threshold before continuing
- Warns (proceeds) if verdict is 'warn'
- Allows if verdict is 'pass' or 'skip'
- Architect workflow: call `generate_mutants` to create patches, pipe patches to `mutation_test` tool, then call `write_mutation_evidence` with verdict and kill rate metrics before `phase_complete`
- Enabled via `set_qa_gates` tool with `mutation_test: true`
- Expensive: requires one LLM call per mutation phase due to LLM-based patch generation; OFF by default to avoid cost on all projects
Turbo Mode Behavior:
When hasActiveTurboMode() returns true, all four gates are automatically bypassed (completion-verify, drift-verifier, hallucination-guard, mutation-test). The phase_complete tool logs a warning and proceeds without enforcement.
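The evidence-based gates and the turbo bypass can be summarized in a short sketch. It is an illustration under assumptions, not the `phase_complete` implementation: `PhaseGateInput`, `normalizeCriticVerdict`, and `blockingReasons` are hypothetical names, the input shape is invented for the example, and Gate 1 (the source-file identifier scan) is left out. The blocking reason strings are the ones listed in the gate table.

```typescript
type EvidenceFile = "drift-verifier" | "hallucination-guard" | "mutation-gate";

interface PhaseGateInput {
  turbo: boolean; // stands in for hasActiveTurboMode()
  hallucinationGuard: boolean; // hallucination_guard in the QA gate profile
  mutationTest: boolean; // mutation_test in the QA gate profile
  verdicts: Partial<Record<EvidenceFile, string>>; // verdict per evidence file, if present
}

// Verdict normalization as described for write_drift_evidence.
function normalizeCriticVerdict(verdict: "APPROVED" | "NEEDS_REVISION"): "approved" | "rejected" {
  return verdict === "APPROVED" ? "approved" : "rejected";
}

function blockingReasons(input: PhaseGateInput): string[] {
  if (input.turbo) return []; // all gates bypassed in turbo mode
  const reasons: string[] = [];

  const drift = input.verdicts["drift-verifier"];
  if (drift === undefined) reasons.push("DRIFT_VERIFICATION_MISSING");
  else if (drift !== "approved") reasons.push("DRIFT_VERIFICATION_REJECTED");

  if (input.hallucinationGuard) {
    const guard = input.verdicts["hallucination-guard"];
    if (guard === undefined) reasons.push("HALLUCINATION_VERIFICATION_MISSING");
    else if (guard !== "approved") reasons.push("HALLUCINATION_VERIFICATION_REJECTED");
  }

  if (input.mutationTest) {
    const mutation = input.verdicts["mutation-gate"];
    if (mutation === undefined) reasons.push("MUTATION_GATE_MISSING");
    else if (mutation === "fail") reasons.push("MUTATION_GATE_FAIL"); // pass, warn, skip proceed
  }
  return reasons;
}
```

An empty list corresponds to a phase that may complete; a 'warn' mutation verdict proceeds without adding a reason.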
The plan_cursor config enables a compact representation of the project plan that is injected into the LLM context, keeping the total token count under a configurable limit.
{
"plan_cursor": {
"enabled": true,
"max_tokens": 1500,
"lookahead_tasks": 2
}
}
- enabled – When `true` (default), Swarm injects a plan cursor instead of the full `plan.md`.
- max_tokens – Upper bound on tokens emitted for the cursor (default 1500). The cursor includes the current phase summary, the full current task, and up to `lookahead_tasks` upcoming tasks. Earlier phases are reduced to one‑line summaries.
- lookahead_tasks – Number of future tasks to include in full detail (default 2). Set to `0` to show only the current task.
Disabling ("enabled": false) falls back to the pre‑v6.13 behavior of injecting the entire plan text.
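To make the `lookahead_tasks` setting concrete, here is a minimal sketch of selecting which tasks appear in full detail. `PlanTask` and `selectCursorTasks` are hypothetical names, "current task" is assumed to mean the first task not yet done, and token budgeting via `max_tokens` is deliberately left out.

```typescript
interface PlanTask {
  id: string;
  description: string;
  done: boolean;
}

// Returns the current task followed by up to lookaheadTasks upcoming tasks.
// Assumption: the current task is the first one that is not done.
function selectCursorTasks(tasks: PlanTask[], lookaheadTasks: number): PlanTask[] {
  const current = tasks.findIndex((task) => !task.done);
  if (current === -1) return []; // every task is complete
  return tasks.slice(current, current + 1 + lookaheadTasks);
}
```

With the default of 2, a phase whose first task is finished would show the second task in full plus the next two; with `0`, only the second task.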
Controls the size of tool outputs sent back to the LLM.
{
"tool_output": {
"truncation_enabled": true,
"max_lines": 150,
"per_tool": {
"diff": 200,
"symbols": 100,
"search": 200,
"batch_symbols": 150,
"suggest_patch": 50
}
}
}
- truncation_enabled – Global switch (default true).
- max_lines – Default line limit for any tool output.
- per_tool – Overrides `max_lines` for specific tools. `diff`, `symbols`, `search`, and `batch_symbols` are truncated by default because their outputs can be very large. `suggest_patch` uses a conservative limit since patch output is typically compact.
When truncation is active, a footer is appended to the output:
---
[output truncated to {maxLines} lines – use `tool_output.per_tool.<tool>` to adjust]
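The truncation rules can be sketched in a few lines. This is an illustration of the configuration semantics described above, not Swarm's own code: `ToolOutputConfig` and `truncateToolOutput` are hypothetical names, and counting lines by splitting on `\n` is an assumption.

```typescript
interface ToolOutputConfig {
  truncation_enabled: boolean;
  max_lines: number;
  per_tool: Partial<Record<string, number>>;
}

function truncateToolOutput(tool: string, output: string, config: ToolOutputConfig): string {
  if (!config.truncation_enabled) return output;
  // A per-tool limit overrides the global default.
  const maxLines = config.per_tool[tool] ?? config.max_lines;
  const lines = output.split("\n");
  if (lines.length <= maxLines) return output;
  const footer =
    "---\n[output truncated to " + maxLines +
    " lines – use `tool_output.per_tool.<tool>` to adjust]";
  return lines.slice(0, maxLines).join("\n") + "\n" + footer;
}
```

Output at or under the limit is returned unchanged, so the footer only ever appears when lines were actually dropped.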
Added as the default structured search path for workspace pattern lookup. Replaces shell grep workarounds with a machine-readable JSON output.
| Property | Type | Default | Description |
|---|---|---|---|
| `pattern` | `string` | required | Search pattern (literal or regex) |
| `workspace` | `string` | required | Root directory to search |
| `mode` | `"literal" \| "regex"` | `"literal"` | Match mode |
| `glob` | `string[]` | `[]` | Include globs (e.g. `["**/*.ts"]`) |
| `exclude` | `string[]` | `[]` | Exclude globs |
| `maxResults` | `number` | `100` | Hard cap on returned matches |
| `maxLines` | `number` | `10` | Max lines per match snippet |
| `context` | `number` | `0` | Surrounding lines per match (only when > 0) |
Returns structured JSON with file, line, text, and truncated fields per hit. Falls back to a graceful error when ripgrep is unavailable. Not a structural AST search — combine with symbols and imports for full module analysis.
Registered for: architect, coder, reviewer, explorer, test_engineer.
Generates structured diff hunks for a target file without modifying it. Used by the reviewer agent to deliver actionable remediation artifacts in read-only review passes.
| Property | Type | Default | Description |
|---|---|---|---|
| `file` | `string` | required | Target file path |
| `patch` | `string` | required | Unified diff body |
| `context` | `number` | `3` | Context lines per hunk |
Returns structured patch output with hunk anchors. Detects and reports stale context when the file content has changed since the patch was authored. Not a write tool — no file is modified. Not registered for coder agents.
Registered for: reviewer, architect.
Extracts exported symbols from multiple files in a single call. Each file is processed independently with per-file error isolation — a parse failure in one file does not affect others.
| Property | Type | Default | Description |
|---|---|---|---|
| `files` | `string[]` | required | List of file paths to analyse |
| `includeTopLevel` | `boolean` | `false` | Include non-exported top-level definitions |
Returns a per-file symbol summary with exports, topLevel (optional), parseError, and empty fields. Benchmarks show 75–98% call reduction versus sequential single-file calls. Reuses symbols.ts parsing logic.
Registered for: architect, explorer, reviewer.
Canonical helper for stripping namespace prefixes from tool names (e.g. mega:search → search, mega.search → search). Exposed via src/hooks/normalize-tool-name.ts.
| Function | Description |
|---|---|
| `normalizeToolName(name)` | Returns the bare tool name string |
| `normalizeToolNameLowerCase(name)` | Returns the bare tool name in lowercase |
Replaces 13 inline replace(/^[^:]+[:.]/, '') regex patterns that were duplicated across guardrails.ts, scope-guard.ts, index.ts, delegation-gate.ts, self-review.ts, and delegation-ledger.ts. Test-file sites are excluded — they validate the raw pattern behavior directly.
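A minimal sketch of the two helpers, built directly on the inline regex quoted above. The bodies are an assumption about how the helpers are written; the actual code in `src/hooks/normalize-tool-name.ts` may handle additional cases.

```typescript
// Strips everything up to and including the last ":" or "." that occurs
// before any ":" in the name (the greedy [^:]+ decides which separator wins).
function normalizeToolName(name: string): string {
  return name.replace(/^[^:]+[:.]/, "");
}

function normalizeToolNameLowerCase(name: string): string {
  return normalizeToolName(name).toLowerCase();
}
```

Names without a prefix pass through unchanged, so the helper is safe to apply to every tool name before comparing it against an allow-list.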
Swarm now explicitly distinguishes five architect modes, which affect what injection blocks are added to the LLM prompt.
| Mode | When Injected |
|---|---|
| `DISCOVER` | After the explorer finishes scanning the codebase. |
| `PLAN` | When the architect writes or updates the plan. |
| `EXECUTE` | During task implementation (the normal pipeline). |
| `PHASE-WRAP` | After all tasks in a phase are completed, before docs are updated. |
| `UNKNOWN` | Fallback when the current state does not match any known mode. |
The Architect workflow uses explicit MODE labels internally to distinguish architect execution phases from project plan phases:
| MODE | Description |
|---|---|
| `MODE: RESUME` | Detect and restore previous session state |
| `MODE: CLARIFY` | Ask clarifying questions for ambiguous requirements |
| `MODE: DISCOVER` | Explore codebase structure and patterns |
| `MODE: CONSULT` | Consult SMEs for domain guidance |
| `MODE: PLAN` | Create or update project plan (includes CODEBASE REALITY CHECK on brownfield) |
| `MODE: CRITIC-GATE` | Plan review checkpoint before execution |
| `MODE: EXECUTE` | Task implementation with QA gates |
| `MODE: PHASE-WRAP` | Phase completion and retrospective |
CODEBASE REALITY CHECK (v6.29.2): Before any planning (MODE: PLAN), spec generation (MODE: SPECIFY), or plan import, the Architect dispatches Explorer in targeted scoped chunks to verify the current state of every referenced item. Produces a CODEBASE REALITY REPORT with statuses: NOT STARTED, PARTIALLY DONE, ALREADY COMPLETE, or ASSUMPTION INCORRECT. This prevents planning against stale assumptions (e.g., assuming a function exists when it was already fixed). Skipped for purely greenfield projects with no existing codebase references.
NAMESPACE RULE: MODE labels refer to the architect's internal workflow phases. Project plan phases (in .swarm/plan.md) remain as "Phase N" to avoid confusion.
All blocking steps require explicit printed output for visibility:
→ REQUIRED: Print {description}
This ensures:
- Clear progress tracking through gates
- Determinable failure points
- Evidence of execution for debugging
On gate failure, emit structured rejection:
RETRY #{count}/5
FAILED GATE: {gate_name}
REASON: {specific failure}
REQUIRED FIX: {actionable instruction}
RESUME AT: {step_5x}
Failure Counting: Track retry count, escalate to user after 5 failures.
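The structured rejection and the failure count can be produced by a small formatter. The sketch below is illustrative; `GateFailure`, `formatRejection`, and `shouldEscalate` are hypothetical names, and reading "after 5 failures" as a count of 5 or more is an assumption.

```typescript
interface GateFailure {
  count: number; // retry number for this task
  gate: string;
  reason: string;
  requiredFix: string;
  resumeAt: string; // step to resume at, e.g. "5a"
}

const MAX_RETRIES = 5;

// Renders the rejection block in the field order shown above.
function formatRejection(failure: GateFailure): string {
  return [
    `RETRY #${failure.count}/${MAX_RETRIES}`,
    `FAILED GATE: ${failure.gate}`,
    `REASON: ${failure.reason}`,
    `REQUIRED FIX: ${failure.requiredFix}`,
    `RESUME AT: ${failure.resumeAt}`,
  ].join("\n");
}

// Assumption: the fifth failure is the point at which the user is involved.
function shouldEscalate(count: number): boolean {
  return count >= MAX_RETRIES;
}
```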
The following rationalization patterns are explicitly blocked:
- "It's a simple change"
- "Just updating docs"
- "Only a config tweak"
- "Hotfix, no time for QA"
- "The tests pass locally"
- "I'll clean it up later"
- "No logic changes"
- "Already reviewed the pattern"
Rule: There are NO simple changes. There are NO exceptions to the QA gate sequence.
⛔ HARD STOP before marking any task complete:
- All QA gates passed (no overrides)
- Reviewer approval documented
- Tests pass with evidence
- No security findings
There is no override. A commit without a completed QA gate is a workflow violation.
Tasks classified by size with strict decomposition rules:
| Size | Criteria | Decomposition Required |
|---|---|---|
| SMALL | 1 file, single verb, <2 hours | No |
| MEDIUM | 1-2 files, compound action, <4 hours | No |
| LARGE | >2 files OR compound verbs | Yes |
Task Atomicity Checks (Critic validates):
- Max 2 files per task (otherwise decompose)
- No compound verbs ("and", "plus", "with") in task descriptions
- Clear acceptance criteria required
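The atomicity checks above lend themselves to a mechanical pre-check. The following is a sketch under assumptions: `TaskDraft` and `atomicityViolations` are hypothetical names, and matching the listed words as whole lowercase tokens is a simplification of whatever the Critic actually does.

```typescript
interface TaskDraft {
  description: string;
  files: string[];
  acceptance: string;
}

// Compound wording listed in the atomicity checks above.
const COMPOUND_WORDS = ["and", "plus", "with"];

function atomicityViolations(task: TaskDraft): string[] {
  const violations: string[] = [];
  if (task.files.length > 2) {
    violations.push(`touches ${task.files.length} files (max 2)`);
  }
  const words = task.description.toLowerCase().split(/[^a-z]+/);
  const compound = COMPOUND_WORDS.filter((word) => words.indexOf(word) !== -1);
  if (compound.length > 0) {
    violations.push(`compound wording: ${compound.join(", ")}`);
  }
  if (task.acceptance.trim().length === 0) {
    violations.push("missing acceptance criteria");
  }
  return violations;
}
```

A task with any violation would be a candidate for decomposition before it reaches the execution pipeline.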
project/
├── .swarm/
│ ├── plan.md # Legacy phased roadmap (migrated to plan.json)
│ ├── plan.json # Machine-readable plan with Zod-validated schema
│ ├── context.md # Project knowledge, SME cache
│ ├── evidence/ # Per-task execution evidence
│ │ ├── 1.1/ # Evidence for task 1.1
│ │ └── 2.3/ # Evidence for task 2.3
│ └── history/
│ ├── phase-1.md # Archived phase summaries
│ └── phase-2.md
│
├── src/
│ ├── index.ts # Plugin entry — registers 8 hook types
│ ├── state.ts # Shared swarm state singleton (zero imports)
│ ├── agents/ # Agent definitions and factory
│ ├── config/ # Schema, constants, loader
│ ├── commands/ # Slash command handlers (12 commands)
│ │ ├── index.ts # Factory + dispatcher (createSwarmCommandHandler)
│ │ ├── status.ts # /swarm status
│ │ ├── plan.ts # /swarm plan [N]
│ │ ├── agents.ts # /swarm agents
│ │ ├── evidence.ts # /swarm evidence [task]
│ │ ├── archive.ts # /swarm archive [--dry-run]
│ │ └── reset.ts # /swarm reset --confirm
│ ├── hooks/ # Hook handlers
│ │ ├── index.ts # Barrel exports
│ │ ├── utils.ts # safeHook, composeHandlers, readSwarmFileAsync, estimateTokens
│ │ ├── extractors.ts # Plan/context file parsers
│ │ ├── pipeline-tracker.ts # Message transform (pipeline logging)
│ │ ├── context-budget.ts # Message transform (token budget warnings)
│ │ ├── system-enhancer.ts # System prompt transform + cross-agent context
│ │ ├── compaction-customizer.ts # Session compaction enrichment
│ │ ├── agent-activity.ts # Tool hooks (activity tracking + flush)
│ │ ├── delegation-tracker.ts # Chat message hook (active agent tracking)
│ │ └── cc-command-intercept.ts # CC command interception (hard-block /reset, soft-correct /plan)
│ ├── lang/ # Language profiles, framework detector (PHP, Laravel, etc.)
│ ├── tools/ # Domain detector, file extractor, gitingest, diff, retrieve-summary
│ ├── plan/ # Plan management
│ │ └── manager.ts # load/save/migrate/derive plan operations
│ └── evidence/ # Evidence bundle management
│ ├── index.ts # Barrel exports
│ └── manager.ts # CRUD: save/load/list/delete/archive evidence
│
├── tests/unit/ # 1211 tests across 54+ files (bun test)
│ ├── agents/ # creation (64), factory (20), architect-v6-prompt (15),
│ │ # security-categories (12)
│ ├── config/ # constants (14), schema (35), loader (17), plan-schema (40),
│ │ # evidence-schema (23), evidence-config (8),
│ │ # review-integration-schemas (20)
│ ├── hooks/ # pipeline-tracker (16), utils (25), system-enhancer (58),
│ │ # compaction-customizer (26), context-budget (23),
│ │ # extractors (32), agent-activity (14), delegation-tracker (16),
│ │ # guardrails (39), system-enhancer-v6 (18), cc-command-intercept (23 unit + 22 adversarial)
│ ├── commands/ # status (6), plan (9), agents (28), index (11),
│ │ # archive (8), benchmark (5)
│ ├── evidence/ # manager (25)
│ ├── plan/ # manager (40)
│ ├── tools/ # domain-detector (30), file-extractor (16), gitingest (5),
│ │ # diff (22), retrieve-summary (28)
│ ├── smoke/ # packaging (8)
│ └── state.test.ts # Shared state (31)
│
├── tests/adversarial/ # 22 adversarial tests for cc-command-intercept (bun test)
└── dist/ # Build output (ESM)
# Project: [Name]
Created: [ISO date]
Last Updated: [ISO date]
Current Phase: [N]
## Overview
[1-2 paragraph project summary]
## Phase 1: [Name] [STATUS]
Estimated: [SMALL/MEDIUM/LARGE]
- [x] Task 1.1: [Description] [SIZE]
- Acceptance: [Criteria]
- [ ] Task 1.2: [Description] [SIZE] (depends: 1.1)
- Acceptance: [Criteria]
- Attempt 1: REJECTED - [Reason]
- Attempt 2: REJECTED - [Reason]
- [BLOCKED] Task 1.3: [Description]
- Reason: [Why blocked]
## Phase 2: [Name] [PENDING]
...
# Project Context: [Name]
## Summary
[What the project does, who it's for]
## Technical Decisions
- [Decision]: [Rationale]
## Architecture
[Key patterns, file organization]
## SME Guidance Cache
### [Domain] (Phase [N])
- [Guidance point]
## Patterns Established
- [Pattern]: [Where/how used]
## Known Issues / Tech Debt
- [ ] [Issue to address later]
## File Map
- [path]: [Purpose]
| Agent | Read | Write | Execute | Delegate |
|---|---|---|---|---|
| architect | ✅ | ✅ | ✅ | ✅ |
| explorer | ✅ | ❌ | ❌ | ❌ |
| sme | ✅ | ❌ | ❌ | ❌ |
| coder | ✅ | ✅ | ✅ | ❌ |
| reviewer | ✅ | ❌ | ❌ | ❌ |
| critic | ✅ | ❌ | ❌ | ❌ |
| test_engineer | ✅ | ✅ | ✅ | ❌ |
Some tools are restricted to the architect and cannot be called by other agents:
| Tool | Purpose |
|---|---|
| `update_task_status` | Mark plan tasks as `pending` \| `in_progress` \| `completed` \| `blocked` |
| `write_retro` | Write retrospective evidence bundles before `phase_complete` |
| `declare_scope` | Pre-declare which files the coder may modify for a given task |
Swarm uses file locking to prevent concurrent writes from corrupting shared state files. Even though tasks execute serially, agents may still race on plan.json and events.jsonl during phase transitions or when multiple sessions interact with the same project.
- Library: `proper-lockfile` with `retries: 0` (fail-fast, no polling)
- Lock acquisition: `tryAcquireLock(directory, filename, agentName, taskId)` before every write to a shared state file
- Lock release: `_release()` called in a `finally` block to ensure cleanup even on error
- Tagging: Each lock records the agent name and task context for diagnostics
| File | Tool | Lock Behavior |
|---|---|---|
| `.swarm/plan.json` | `update_task_status` | Exclusive lock acquired before calling `updateTaskStatus()`. Lock losers return `success: false` with `recovery_guidance: "retry"`. |
| `.swarm/events.jsonl` | `phase_complete` | Lock acquired before `appendFileSync`. If lock is unavailable, logs a warning and proceeds without lock protection (non-blocking). |
Caller A: tryAcquireLock(plan.json) → acquired=true → write → _release()
Caller B: tryAcquireLock(plan.json) → acquired=false → returns { success: false, recovery_guidance: "retry" }
Only one call wins. The loser receives:
{
"success": false,
"message": "Task status write blocked: plan.json is locked by architect (task: update-task-status-1.1-1234567890)",
"errors": ["Concurrent plan write detected — retry after the current write completes"],
"recovery_guidance": "Wait a moment and retry update_task_status. The lock will expire automatically if the holding agent fails."
}
The architect should retry after a short delay. Sequential calls (no contention) always succeed.
Tasks execute serially, but the architect may dispatch multiple concurrent agent sessions or tool calls that race on plan/event file writes. Locking provides a second layer of protection against corruption from:
- Concurrent `update_task_status` calls from different sessions
- Concurrent `phase_complete` appends to `events.jsonl`
- Race conditions during session rehydration or recovery
If the lock holder crashes, the OS or lock library will eventually clean up the stale lock file. On retry, the next call will acquire the lock and proceed. Swarm does not auto-retry on lock contention — the architect receives the error and decides when to retry.
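The fail-fast contract described above can be illustrated with a small in-memory stand-in. This is only a sketch: `tryAcquireLockSketch`, `writeWithLock`, and the `heldLocks` map are hypothetical names, and the real implementation locks files on disk through `proper-lockfile` rather than a `Map`.

```typescript
// Hypothetical in-memory stand-in for the on-disk lock (illustration only).
const heldLocks = new Map<string, string>(); // filename -> holding agent

interface LockAttempt {
  acquired: boolean;
  release: () => void;
}

function tryAcquireLockSketch(filename: string, agentName: string): LockAttempt {
  if (heldLocks.has(filename)) {
    // Fail fast, mirroring `retries: 0`: no polling, no waiting.
    return { acquired: false, release: () => {} };
  }
  heldLocks.set(filename, agentName);
  return { acquired: true, release: () => { heldLocks.delete(filename); } };
}

interface WriteOutcome {
  success: boolean;
  recovery_guidance?: string;
}

function writeWithLock(filename: string, agentName: string, write: () => void): WriteOutcome {
  const lock = tryAcquireLockSketch(filename, agentName);
  if (!lock.acquired) {
    // The loser does not retry on its own; the caller decides when to retry.
    return { success: false, recovery_guidance: "retry" };
  }
  try {
    write();
    return { success: true };
  } finally {
    lock.release(); // released even if write() throws
  }
}
```

Releasing in `finally` is what keeps a throwing write from leaving the lock held, which matches the cleanup behavior listed for `_release()`.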
Attempt 1: @coder implements
@reviewer rejects with feedback
Attempt 2: @coder fixes based on feedback
@reviewer rejects again
Attempt 3: @coder fixes again
@reviewer rejects
Escalation: Architect handles directly
OR re-scopes task
Document in plan.md
Task cannot proceed (external dependency):
├── Mark [BLOCKED] in plan.md
├── Record reason
├── Skip to next unblocked task
└── Inform user
Agent times out or errors:
├── Retry once
├── If still failing:
│ └── Architect handles directly
└── Document in context.md
When an agent encounters a transient model error (rate limit, 429, 503, timeout, overloaded, model not found), the guardrails hook detects the failure and triggers fallback behavior:
Detection: The toolAfter hook in guardrails.ts checks for null/undefined tool output combined with error messages matching TRANSIENT_MODEL_ERROR_PATTERN:
const TRANSIENT_MODEL_ERROR_PATTERN =
/rate.?limit|429|503|timeout|overloaded|model.?not.?found|temporarily unavailable|server error/i;
Behavior on transient error:
- Increment `session.model_fallback_index` (position in `fallback_models` array)
- Set `session.modelFallbackExhausted = true` (prevents advisory spam)
- Inject `MODEL FALLBACK` advisory into `session.pendingAdvisoryMessages`
Behavior on success:
- Reset `session.model_fallback_index = 0`
- Reset `session.modelFallbackExhausted = false`
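A minimal sketch of the detection and state updates listed above. The pattern is copied from the source. `FallbackSessionSketch`, `isTransientModelError`, `applyToolOutcome`, and the advisory text are hypothetical. Using the exhausted flag to gate the advisory is an assumption about how the "prevents advisory spam" behavior works.

```typescript
const TRANSIENT_MODEL_ERROR_PATTERN =
  /rate.?limit|429|503|timeout|overloaded|model.?not.?found|temporarily unavailable|server error/i;

// Only the three fields described above; the real session object has more.
interface FallbackSessionSketch {
  model_fallback_index: number;
  modelFallbackExhausted: boolean;
  pendingAdvisoryMessages: string[];
}

function isTransientModelError(output: unknown, errorMessage: string): boolean {
  const noOutput = output === null || output === undefined;
  return noOutput && TRANSIENT_MODEL_ERROR_PATTERN.test(errorMessage);
}

// Hypothetical helper: applies the state changes after a tool call completes.
function applyToolOutcome(
  session: FallbackSessionSketch,
  output: unknown,
  errorMessage: string,
): void {
  if (isTransientModelError(output, errorMessage)) {
    session.model_fallback_index++;
    if (!session.modelFallbackExhausted) {
      // Assumption: the flag gates the advisory so repeated failures do not spam.
      session.pendingAdvisoryMessages.push("MODEL FALLBACK: transient model error detected");
      session.modelFallbackExhausted = true;
    }
    return;
  }
  if (output !== null && output !== undefined) {
    // Success path: reset both fields.
    session.model_fallback_index = 0;
    session.modelFallbackExhausted = false;
  }
}
```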
Configuration: Per-agent fallback_models array (max 3) in AgentOverrideConfigSchema:
export const AgentOverrideConfigSchema = z.object({
model: z.string().optional(),
temperature: z.number().min(0).max(2).optional(),
disabled: z.boolean().optional(),
fallback_models: z.array(z.string()).max(3).optional(),
});
When a task requires multiple coder attempts (e.g., reviewer rejections), the guardrails hook tracks revision counts and warns when limits are approached.
State fields:
- `session.coderRevisions`: Number of times the coder has been re-delegated for the current task
- `session.revisionLimitHit`: Boolean flag set when `coderRevisions >= max_coder_revisions`
Detection: The toolAfter hook in guardrails.ts increments session.coderRevisions when a coder delegation completes:
// In guardrails.ts toolAfter handler
if (delegation.isDelegation && delegation.targetAgent === 'coder') {
if (!session.revisionLimitHit) {
session.coderRevisions++;
const maxRevisions = cfg.max_coder_revisions ?? 5;
if (session.coderRevisions >= maxRevisions) {
session.revisionLimitHit = true;
session.pendingAdvisoryMessages.push(
`CODER REVISION LIMIT: Agent has been revised ${session.coderRevisions} times ` +
`(max: ${maxRevisions}) for task ${session.currentTaskId ?? 'unknown'}. ` +
`Escalate to user or consider a fundamentally different approach.`
);
}
}
}
Reset: `coderRevisions` resets to 0 when a new coder delegation is dispatched (unless `revisionLimitHit` is already true).
Configuration: guardrails.max_coder_revisions in GuardrailsConfigSchema (default: 5, range: 1–20).
Snapshot persistence: Both coderRevisions and revisionLimitHit are serialized in SerializedAgentSession and restored on session rehydration.
Parallel execution causes:
- Race conditions in file modifications
- Context inconsistency between agents
- Non-deterministic outputs
- Debugging nightmares
Serial execution provides:
- Predictable order of operations
- Clear causal chain
- Reproducible results
- Easy debugging
Correctness > Speed
QA at the end causes:
- Accumulated bugs
- Cascading failures (Task 3 builds on buggy Task 2)
- Massive rework
- Lost context on what each task was supposed to do
QA per task provides:
- Immediate feedback
- Issues fixed while context is fresh
- No bug accumulation
- Clear task boundaries
v6.9.0 "Quality & Anti-Slop Tooling" adds 6 automated gates to the pre-reviewer pipeline. v6.10.0 adds parallel batch execution for faster QA gates:
| Gate | Purpose | Local-Only |
|---|---|---|
| `syntax_check` | Tree-sitter parse validation across 20 languages | ✅ |
| `placeholder_scan` | Anti-slop detection for TODO/FIXME/stubs | ✅ |
| `sast_scan` | Static security analysis with 63+ rules | ✅ |
| `sbom_generate` | CycloneDX SBOM generation for dependencies | ✅ |
| `build_check` | Build/typecheck verification | ✅ |
| `pre_check_batch` | Parallel verification batch (4x faster) | ✅ |
| `quality_budget` | Maintainability threshold enforcement | ✅ |
Local-Only Guarantee: All v6.9.0 gates run without Docker, network connections, external APIs, or cloud services. Optional enhancement via Semgrep (if already on PATH).
Session-only memory causes:
- Lost progress on session end
- No way to resume projects
- Re-explaining context every time
- No institutional knowledge
Persistent .swarm/ files provide:
- Resume any project instantly
- Knowledge transfer between sessions
- Audit trail of decisions
- Cached SME guidance (no re-asking)
The hooks system is the foundation of v5.1.x+, extended in v6.0.0 with config-aware hint injection. All features are built as hook handlers registered on OpenCode's Plugin API.
- `safeHook(handler)`: Wraps any hook handler in a try/catch. Errors are logged at warning level; the original payload is returned unchanged. This ensures no hook can crash the plugin.
- `composeHandlers<I,O>(...handlers)`: Composes multiple handlers for the same hook type into a single handler. Runs handlers sequentially on shared mutable output. Each handler is individually wrapped in `safeHook`.
- `readSwarmFileAsync(directory, filename)`: Reads `.swarm/` files using `Bun.file().text()`. Returns empty string on missing files.
- `estimateTokens(text)`: Conservative token estimation: `Math.ceil(text.length * 0.33)`.
| Hook Type | Handler | Purpose |
|---|---|---|
| `experimental.chat.messages.transform` | `composeHandlers(pipelineTracker, contextBudget, ccCommandInterceptHook)` | Pipeline logging + token budget warnings; progressive task disclosure; deliberation preamble injection; tier-based behavioral prompt trimming; CC command interception (hard-block /reset, soft-correct /plan, log HIGH severity) |
| `experimental.chat.system.transform` | `systemEnhancerHook` | Inject phase/task/decisions + cross-agent context; reviewer receives semantic AST diff summary with blast radius (consumers count) from `buildSemanticDiffBlock()` |
| `experimental.session.compacting` | `compactionHook` | Enrich compaction with plan.md + context.md data |
| `command.execute.before` | `safeHook(commandHandler)` | Handle `/swarm` slash commands |
| `tool.execute.before` | `safeHook(activityHooks.toolBefore)` | Track tool usage per agent; append written file paths to `modifiedFilesThisCoderTask`; reset tracking on coder delegation |
| `tool.execute.after` | `safeHook(activityHooks.toolAfter)` | Record tool results + trigger flush; advance per-task state machine on gate completions; populate `lastGateOutcome`; check scope containment after coder task; set `lastScopeViolation` on drift |
| `chat.message` | `safeHook(delegationHandler)` | Track active agent per session; extract `FILE:` directives into `declaredCoderScope`; advance state to `coder_delegated` on new coder delegation |
The OpenCode Plugin API allows one handler per hook type. When multiple features need the same hook type (e.g., pipeline-tracker and context-budget both use experimental.chat.messages.transform), they must be composed via composeHandlers() into a single registered handler.
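The composition rule can be sketched as follows. This is a simplified, synchronous illustration, not the plugin's code: `Handler`, `safeHookSketch`, and `composeHandlersSketch` are hypothetical names, and the real handlers are registered on OpenCode's Plugin API and may be asynchronous.

```typescript
// A handler receives an input and mutates a shared output object in place.
type Handler<I, O> = (input: I, output: O) => void;

// Wrap a handler so a thrown error is logged instead of propagating.
function safeHookSketch<I, O>(handler: Handler<I, O>): Handler<I, O> {
  return (input, output) => {
    try {
      handler(input, output);
    } catch (err) {
      console.warn("hook handler failed:", err);
    }
  };
}

// Compose several handlers for one hook type into a single handler.
// Each handler is wrapped individually, so one failure does not stop the rest.
function composeHandlersSketch<I, O>(...handlers: Handler<I, O>[]): Handler<I, O> {
  const wrapped = handlers.map((h) => safeHookSketch(h));
  return (input, output) => {
    for (const h of wrapped) {
      h(input, output);
    }
  };
}
```

Wrapping each handler separately, rather than wrapping the composed result once, means a failure in one feature does not stop the features registered after it.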
Five new tools extend the architect's decision-making capabilities with intelligence gathering and QA auditing, plus a framework detection layer for first-class framework-specific tooling.
Extracts TODO, FIXME, and HACK annotations across the codebase using regex matching and file discovery (Node.js native glob for cross-platform safety).
Usage: Phase 0 (resume check) or Phase 2 (discovery) to identify pre-existing work items and prioritize planning.
Input: paths (directory whitelist), tags (annotation types), exclude (directory patterns)
Output: Structured JSON with file, line, tag, and content for each annotation
Safety: Validates paths against workspace root, rejects shell metacharacters, enforces file size limits
Audits completed tasks in .swarm/evidence/ against required evidence types (review, test, diff, approval) and ensures a valid retrospective evidence bundle exists before a phase can be completed.
Usage: Phase 6 (phase complete) to verify every task has sufficient QA artifacts
Input: Task ID pattern (wildcard support)
Output: JSON with per-task evidence status, missing types, and overall completeness score
Safety: Validates task ID format (N.M, N.M.P, retro-N, or internal tool IDs), skips symlinks, reads JSON with size limits
Read-only tool to query the gate status of a specific task. Reads .swarm/evidence/{taskId}.json and EvidenceBundle entries and returns structured JSON describing which gates have passed, which are missing, and the overall task status.
Usage: Any phase to check task completion status without mutating evidence
Input: task_id (task identifier in N.M, N.M.P, retro-N format, or internal tool ID like "sast_scan", "quality_budget", etc.)
Output: JSON with taskId, status (all_passed|incomplete|no_evidence), required_gates, passed_gates, missing_gates, gates map, message, todo_scan (advisory TODO count if available), and secretscan_verdict (v6.33)
Secretscan verdict (v6.33): The tool now scans EvidenceBundle entries for secretscan type evidence using the isSecretscanEvidence type guard. When secretscan entries are found, it reports:
- `pass`: No secrets found (verdict: pass, approved, or info)
- `fail`: Secrets detected (verdict: fail or rejected); status becomes `incomplete` and task is BLOCKED
- `not_run`: No secretscan evidence found for this task
Safety: Validates task ID format against three accepted patterns (canonical N.M or N.M.P numeric format, retrospective format retro-N, or internal tool IDs like sast_scan/quality_budget/syntax_check/placeholder_scan/sbom_generate/build/secretscan), enforces path containment within the workspace .swarm/evidence/ directory, and is read-only (no writes)
Wraps npm audit, pip-audit, cargo audit, and composer audit via Bun.spawn to identify security vulnerabilities in project dependencies.
Usage: Phase 2 (discovery) or Phase 6 (phase complete) to scope security risk and feed results to reviewer
Input: ecosystem (npm|pip|cargo|composer), days (vulnerability age), top_n (limit results)
Output: Structured CVE data with severity, patched versions, and advisory URLs
Safety: Validates enum args strictly, bounds-checks integers (1-365 days, 1-100 results), enforces timeout via Promise.race
Combines cyclomatic complexity analysis with git churn metrics to identify high-risk modules before implementation.
Usage: Phase 0/2 (early warning) or Phase 6 (post-implementation assessment) to flag modules needing stricter QA
Input: paths (file patterns), metrics (complexity|churn|both)
Output: Ranked list of risky files with complexity score, recent commits, and risk level
Safety: Uses Bun.spawn for git commands (not shell pipes), parses output in JavaScript, cross-platform path handling
Compares OpenAPI specification files against actual route implementations to surface undocumented routes and phantom spec paths.
Usage: Phase 6 (when API routes were modified) to catch documentation drift before release
Input: spec_file (path to OpenAPI JSON/YAML), routes_dir (implementation directory)
Output: Drift report with missing implementations, extra routes, and parameter mismatches
Safety: Validates spec file extension whitelist and size limits (<10MB), uses lstatSync to skip symlinks, YAML parsing with regex g flag for multi-line patterns
Implements a two-pass progressive disclosure for documentation: Pass 1 indexes docs at plan time, Pass 2 extracts relevant constraints at task start.
Scans project documentation files and builds an index manifest at .swarm/doc-manifest.json.
Usage: Phase 4 (PLAN mode) to build documentation index before tasks begin
Input: force (optional boolean, force re-scan even if cache is valid)
Output: JSON with success, files_count, cached, and full manifest object
Manifest schema:
interface DocManifest {
schema_version: 1;
scanned_at: string; // ISO timestamp
files: DocManifestFile[]; // Sorted by path
}
interface DocManifestFile {
path: string; // Relative to project root
title: string; // First # heading or filename
summary: string; // First non-empty paragraph (max 200 chars)
lines: number; // Total line count
mtime: number; // fs.statSync().mtimeMs for cache invalidation
}
Discovery patterns: Uses `doc_patterns` from `DocsConfigSchema` plus extras (`ARCHITECTURE.md`, `CLAUDE.md`, `AGENTS.md`, `.github/*.md`, `doc/**/*.md`). Skips `node_modules`, `.git`, `.swarm`, test files, and type definitions.
Caching: mtime-based — only re-scans if any indexed file has changed since last scan.
Reads the manifest, scores docs against task context using Jaccard bigram similarity, and extracts actionable constraints into .swarm/knowledge.jsonl.
Usage: Phase 5 (EXECUTE mode) at task start to load relevant documentation constraints
Input: task_files (array of file paths), task_description (string)
Output: JSON with success, extracted count, skipped count, and per-doc details with score and constraints
Algorithm:
- Read `.swarm/doc-manifest.json` (or generate via `doc_scan` if missing)
- Build task context from `task_files` + `task_description`
- Compute Jaccard bigram similarity between task context and each doc's `{path, title, summary}`
- For docs with score > 0.1 relevance threshold, read full content and extract constraints
- Extract lines matching constraint patterns: `MUST`, `MUST NOT`, `SHOULD`, `SHOULD NOT`, `DO NOT`, `ALWAYS`, `NEVER`, `REQUIRED`; or bullet points with action words
- Dedup against existing `.swarm/knowledge.jsonl` entries via `findNearDuplicate` (0.6 similarity threshold)
- Append non-duplicate constraints as `SwarmKnowledgeEntry` objects
Constraints: Max 5 per document, 15-200 characters each, markdown stripped.
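The scoring step can be sketched as below. The source does not say whether the bigrams are over characters or words; this sketch assumes character bigrams over lowercased, whitespace-normalized text. The function names are hypothetical.

```typescript
// Collect the set of distinct character bigrams in a string.
function bigrams(text: string): Set<string> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i < normalized.length - 1; i++) {
    grams.add(normalized.slice(i, i + 2));
  }
  return grams;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two bigram sets.
function jaccardBigramSimilarity(a: string, b: string): number {
  const setA = bigrams(a);
  const setB = bigrams(b);
  if (setA.size === 0 || setB.size === 0) return 0;
  let intersection = 0;
  setA.forEach((gram) => {
    if (setB.has(gram)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// Relevance threshold taken from the algorithm described above.
const RELEVANCE_THRESHOLD = 0.1;

function isRelevant(taskContext: string, docText: string): boolean {
  return jaccardBigramSimilarity(taskContext, docText) > RELEVANCE_THRESHOLD;
}
```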
Detects Laravel projects via multi-signal logic in src/lang/framework-detector.ts. Requires 2-of-3 signals for a positive detection:
| Signal | Check |
|---|---|
| `artisan_file` | `artisan` CLI script exists at project root |
| `laravel_dep` | `laravel/framework` present in `composer.json` |
| `app_config` | `config/app.php` exists |
Usage: During discovery (Phase 2) or before test command resolution, to activate Laravel-specific tooling overlays.
Output:
interface LaravelDetectionSignals {
hasArtisanFile: boolean;
hasLaravelDep: boolean;
hasAppConfig: boolean;
signals: ('artisan_file' | 'laravel_dep' | 'app_config')[];
detected: boolean; // true when 2+ signals present
}
interface LaravelCommandOverlay {
test: string; // "php artisan test"
lint: string[]; // ["vendor/bin/pint", "vendor/bin/php-cs-fixer fix"]
analyze: string[]; // ["vendor/bin/phpstan analyse", "vendor/bin/phpstan analyse --memory-limit=1G"]
audit: string; // "composer audit --locked --format=json"
}
SAST rules active for Laravel projects: `sast/php-laravel-sql-injection`, `sast/php-laravel-mass-assignment`, `sast/php-laravel-destructive-migration`
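The 2-of-3 rule can be expressed as a pure function over the three checks. This is a sketch under the assumption that the file-system and `composer.json` checks have already been reduced to booleans. `evaluateLaravelSignals` is a hypothetical name. The result shape follows `LaravelDetectionSignals` above.

```typescript
type LaravelSignal = "artisan_file" | "laravel_dep" | "app_config";

interface LaravelDetectionSignals {
  hasArtisanFile: boolean;
  hasLaravelDep: boolean;
  hasAppConfig: boolean;
  signals: LaravelSignal[];
  detected: boolean; // true when 2+ signals present
}

// Hypothetical helper: combine pre-computed checks into a detection result.
function evaluateLaravelSignals(
  hasArtisanFile: boolean,
  hasLaravelDep: boolean,
  hasAppConfig: boolean,
): LaravelDetectionSignals {
  const signals: LaravelSignal[] = [];
  if (hasArtisanFile) signals.push("artisan_file");
  if (hasLaravelDep) signals.push("laravel_dep");
  if (hasAppConfig) signals.push("app_config");
  return {
    hasArtisanFile,
    hasLaravelDep,
    hasAppConfig,
    signals,
    detected: signals.length >= 2,
  };
}
```

Requiring two signals keeps a lone `artisan` file or a stray `config/app.php` from triggering the Laravel overlay.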
**Common Security Patterns**
All five tools follow strict security practices:
- Path validation: `path.resolve()` + `startsWith(workspaceRoot + path.sep)` prevents traversal bypass
- Command execution: Bun.spawn with array args (never string concat) to prevent shell injection
- Timeout protection: Promise.race on all async operations to prevent hangs
- Input validation: Enum/range checks on all user-supplied arguments
- File access: Node.js native fs (not shell grep/find) for cross-platform safety
The context pruning system now incorporates several new controls introduced in v6.14.12:
- Provider-aware model limits: token limits are looked up per model via `context_budget.model_limits` (e.g., a higher limit for Claude Sonnet). This allows the system to adapt the budget based on the active provider.
- Priority-based pruning tiers: messages are classified using the `MessagePriority` tiers (CRITICAL, HIGH, MEDIUM, LOW, DISPOSABLE). Lower-priority messages are removed first when the token budget is exceeded.
- Agent-switch enforcement: when `enforce_on_agent_switch` is true, a hard context reset is triggered whenever the active agent changes (e.g., from `explorer` to `coder`).
- Tool-output masking: large tool outputs are masked/truncated once they exceed `tool_output_mask_threshold` tokens, preventing budget overruns.
These enhancements work together to keep the architect’s context within limits while preserving the most important information.
Context pruning manages the architect's context window to prevent overflow.
Registered on experimental.chat.messages.transform (composed with pipeline-tracker):
- Estimates total tokens across all message parts using `estimateTokens()`
- Looks up model-specific token limit from `context_budget.model_limits` config (default: 128,000)
- At `warn_threshold` (default 70%): injects `[CONTEXT WARNING]` message
- At `critical_threshold` (default 90%): injects `[CONTEXT CRITICAL]` message
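A sketch of the budget check using the documented estimate and default thresholds. `classifyBudget` and `BudgetLevel` are hypothetical names. Summing per-message estimates is an assumption about how the total is computed.

```typescript
// Documented estimate: roughly one token per three characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length * 0.33);
}

type BudgetLevel = "ok" | "warn" | "critical";

// Hypothetical helper: compare estimated usage against the model limit.
function classifyBudget(
  messages: string[],
  modelLimit: number = 128000,
  warnThreshold: number = 0.7,
  criticalThreshold: number = 0.9,
): { tokens: number; level: BudgetLevel } {
  let tokens = 0;
  for (const message of messages) {
    tokens += estimateTokens(message);
  }
  const usage = tokens / modelLimit;
  let level: BudgetLevel = "ok";
  if (usage >= criticalThreshold) {
    level = "critical"; // would inject [CONTEXT CRITICAL]
  } else if (usage >= warnThreshold) {
    level = "warn"; // would inject [CONTEXT WARNING]
  }
  return { tokens, level };
}
```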
Registered on experimental.session.compacting:
- Reads `.swarm/plan.md`: extracts current phase + incomplete tasks
- Reads `.swarm/context.md`: extracts decisions + patterns
- Injects these as compaction context strings (max 500 chars each)
- When `.swarm/summaries/` exists and contains files:
  - Injects `[CONTEXT OPTIMIZATION]` hint: instructs LLM to replace large tool output blocks (bash, test_runner, lint, diff) with retrieve references, preserving tool name, exit status, and errors
  - Injects `[STORED OUTPUTS]` count showing how many tool outputs are stored
- Guides OpenCode's built-in compaction to preserve swarm-relevant context
Registered on experimental.chat.system.transform:
- Injects current phase + task from plan.md (~200 chars)
- Injects top 3 most recent decisions from context.md
- Keeps agents focused even after conversation history is compacted
- Respects `max_injection_tokens` budget (default: 4,000 tokens)
- Priority ordering: phase → task → decisions → agent context
- Lower-priority items dropped when budget is exhausted
- v6.0.0: Injects config override hints for `always_security_review` and `integration_analysis.enabled` when non-default values are detected
The evidence system persists verifiable execution artifacts per task.
| Type | Fields | Purpose |
|---|---|---|
| `review` | `risk`, `issues[]` | Reviewer findings |
| `test` | `tests_passed`, `tests_failed` | Test engineer results |
| `diff` | `files_changed[]`, `additions`, `deletions` | Code change summary |
| `approval` | (base fields only) | Explicit approval record |
| `note` | (base fields only) | Free-form annotation |
| `secretscan` | `findings_count`, `scan_directory`, `files_scanned`, `skipped_files` | Secret scan results (v6.33) |
| `mutation-gate` | `verdict`, `killRate`, `adjustedKillRate`, `summary`, `survivedMutants[]` | Phase-close mutation testing gate results, written by the `write_mutation_evidence` tool; blocks phase completion when verdict is `fail` (v6.68+) |
.swarm/evidence/
├── 1.1/
│ └── evidence.json # EvidenceBundleSchema (array of entries)
└── 2.3/
├── evidence.json
└── diff.patch # Optional raw diff
- Task IDs are sanitized: regex `^[\w-]+(\.[\w-]+)*$`, rejects `..`, null bytes, control chars
- Two-layer path validation: sanitize task ID + `validateSwarmPath()` on full path
validateSwarmPath()on full path - Size limits: JSON 500KB, diff.patch 5MB, total per task 20MB
- Atomic writes via temp+rename pattern
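The first validation layer can be sketched as below. The regex is taken from the list above. `isValidTaskIdSketch` is a hypothetical name. The explicit `..` and control-character checks are redundant with the regex and are shown only as defense in depth.

```typescript
const TASK_ID_PATTERN = /^[\w-]+(\.[\w-]+)*$/;

// Hypothetical sketch of the task ID sanitization layer.
function isValidTaskIdSketch(taskId: string): boolean {
  // Reject null bytes and other control characters outright.
  if (/[\x00-\x1f\x7f]/.test(taskId)) return false;
  // Reject parent-directory sequences before any path is built.
  if (taskId.indexOf("..") !== -1) return false;
  // Dot-separated segments of word characters and hyphens only.
  return TASK_ID_PATTERN.test(taskId);
}
```

Because `\w` excludes `/` and `\`, a valid ID cannot introduce a path separator when it is joined under `.swarm/evidence/`.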
Configurable via evidence config:
- `max_age_days`: Archive evidence older than N days (default: 90)
- `max_bundles`: Maximum evidence bundles before auto-archive (default: 1000)
- `auto_archive`: Enable automatic archiving (default: false)
Six new automated gates enforce code quality before human review. All gates run locally without Docker or network dependencies.
| Gate | Function | Fail Action |
|---|---|---|
| `syntax_check` | Tree-sitter parse validation | Return to coder for fix |
| `placeholder_scan` | Detect TODO/FIXME/stubs | Return to coder to complete |
| `sast_scan` | Static security analysis (66 rules) | Return to coder for fix |
| `sbom_generate` | CycloneDX SBOM generation | Log for audit trail |
| `build_check` | Build/typecheck verification | Return to coder for fix |
| `pre_check_batch` | Parallel verification (v6.10.0) | Return to coder for fix |
| `quality_budget` | Maintainability enforcement | Return to coder or adjust limits |
Uses Tree-sitter grammars for 20 languages:
- TypeScript/JavaScript
- Python
- Rust
- Go
- Java
- C/C++
- Ruby
- PHP
- C#
Fail condition: Parse errors, unclosed brackets, invalid syntax
Resolution: Coder fixes syntax errors before review
Detects patterns indicating incomplete implementation:
- `TODO`, `FIXME`, `XXX`, `HACK` comments
- Placeholder strings (`placeholder`, `stub`, `implement me`)
- Empty function bodies
- Hardcoded dummy values
Fail condition: Any placeholder pattern in changed files
Resolution: Coder completes implementation before review
66+ security rules across 9 languages covering:
- SQL injection vectors (including Laravel-specific `DB::raw()` concatenation)
- Path traversal patterns
- Hardcoded secrets
- Insecure crypto usage
- XSS vulnerabilities
- Command injection
- Laravel-specific: mass-assignment via empty `$guarded`, destructive migrations without rollback

Offline operation: Built-in rule engine, no external API calls
Optional enhancement: Semgrep Tier B rules if available on PATH
Fail condition: High/critical severity findings
Resolution: Coder fixes security issues before review
Generates CycloneDX SBOMs from manifest files:
- `package.json` + `package-lock.json` (npm)
- `requirements.txt`, `Pipfile`, `poetry.lock` (Python)
- `Cargo.toml` + `Cargo.lock` (Rust)
- `go.mod` + `go.sum` (Go)
- `pom.xml`, `build.gradle` (Java)
- `Gemfile.lock` (Ruby)
- `composer.lock` (PHP)
- `.csproj` + `packages.lock.json` (C#)

Output: CycloneDX JSON format
Purpose: Security auditing, license compliance
Fail condition: None (informational gate)
Runs repository-native build commands:
- `npm run build` / `tsc --noEmit` (TypeScript)
- `cargo build` / `cargo check` (Rust)
- `go build` (Go)
- `javac` / Maven / Gradle (Java)
- `python -m py_compile` (Python)

Fail condition: Build errors, type check failures
Resolution: Coder fixes build errors before review
Enforces configurable thresholds on code changes:
| Budget | Default | Description |
|---|---|---|
| `max_complexity_delta` | 5 | Maximum cyclomatic complexity increase |
| `max_public_api_delta` | 10 | Maximum new public API surface |
| `max_duplication_ratio` | 0.05 | Maximum code duplication ratio (5%) |
| `min_test_to_code_ratio` | 0.3 | Minimum test-to-code ratio (30%) |

Fail condition: Budget exceeded
Resolution: Refactor code or adjust budget thresholds
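How the four budgets might be evaluated against a change can be sketched as follows. The defaults are taken from the table above. `ChangeMetrics`, `checkQualityBudget`, and the metric field names are hypothetical, and how the real gate measures each metric is not covered here.

```typescript
interface QualityBudget {
  max_complexity_delta: number;
  max_public_api_delta: number;
  max_duplication_ratio: number;
  min_test_to_code_ratio: number;
}

const DEFAULT_QUALITY_BUDGET: QualityBudget = {
  max_complexity_delta: 5,
  max_public_api_delta: 10,
  max_duplication_ratio: 0.05,
  min_test_to_code_ratio: 0.3,
};

// Hypothetical shape for the measured values of a change.
interface ChangeMetrics {
  complexityDelta: number;
  publicApiDelta: number;
  duplicationRatio: number;
  testToCodeRatio: number;
}

// Returns the names of exceeded budgets; an empty array means the gate passes.
function checkQualityBudget(
  metrics: ChangeMetrics,
  budget: QualityBudget = DEFAULT_QUALITY_BUDGET,
): string[] {
  const violations: string[] = [];
  if (metrics.complexityDelta > budget.max_complexity_delta) violations.push("max_complexity_delta");
  if (metrics.publicApiDelta > budget.max_public_api_delta) violations.push("max_public_api_delta");
  if (metrics.duplicationRatio > budget.max_duplication_ratio) violations.push("max_duplication_ratio");
  if (metrics.testToCodeRatio < budget.min_test_to_code_ratio) violations.push("min_test_to_code_ratio");
  return violations;
}
```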
Runs four verification tools in parallel for 4x faster gate execution:
| Tool | Purpose | Gate Type |
|---|---|---|
| `lint:check` | Code quality verification | Hard gate |
| `secretscan` | Secret/credential detection | Hard gate |
| `sast_scan` | Static security analysis | Hard gate |
| `quality_budget` | Maintainability metrics | Hard gate |
Parallel Execution:
- Uses `p-limit` with max 4 concurrent operations
- 60-second timeout per tool
- 500KB combined output limit
- Individual failures don't cascade
Return Value:
{
"gates_passed": true,
"lint": { "ran": true, "result": {}, "duration_ms": 1200 },
"secretscan": { "ran": true, "result": {}, "duration_ms": 800 },
"sast_scan": { "ran": true, "result": {}, "duration_ms": 2500 },
"quality_budget": { "ran": true, "result": {}, "duration_ms": 400 },
"total_duration_ms": 3200
}
Configuration:
{
"pipeline": {
"parallel_precheck": true // default: true
}
}
Fail condition: Any hard gate fails (lint errors, secrets found, SAST findings, budget exceeded)
Resolution: Fix specific failures identified in tool results and retry
All v6.9.0 quality gates:
- ✅ Run entirely locally
- ✅ No Docker containers required
- ✅ No network connections
- ✅ No external APIs
- ✅ No cloud services
Optional enhancement:
- Semgrep CLI (if already installed on PATH)
Twelve commands registered under /swarm:
| Command | Description |
|---|---|
| `/swarm status` | Shows current phase, task progress (completed/total), and agent count |
| `/swarm plan` | Displays full plan.md content |
| `/swarm plan N` | Displays only Phase N from plan.md |
| `/swarm agents` | Lists all registered agents with model, temperature, read-only status, and guardrail profiles |
| `/swarm history` | View completed phases with status icons |
| `/swarm config` | View current resolved plugin configuration |
| `/swarm diagnose` | Health check for .swarm/ files, plan structure, and evidence completeness |
| `/swarm export` | Export plan and context as portable JSON |
| `/swarm reset --confirm` | Clear swarm state files (with safety gate) |
| `/swarm evidence [task]` | View evidence bundles for a task or list all tasks with evidence |
| `/swarm archive [--dry-run]` | Archive old evidence bundles with retention policy |
| `/swarm benchmark` | Run performance benchmarks and display metrics |
| `/swarm retrieve [id]` | Retrieve auto-summarized tool outputs by ID |
Commands are registered in two steps:
- `config` hook: Adds `swarm` command to OpenCode's command registry
- `command.execute.before` hook: Intercepts `/swarm` commands and routes to handlers
The command handler uses a factory pattern: createSwarmCommandHandler(directory, agents) creates a closure over the project directory and agent definitions, returning a handler function.
Four features added to delegation-gate.ts and guardrails.ts reduce context waste and improve model decision quality.
When the last user message contains more than 5 task lines (matching - [ ] or - [x] with task IDs), messagesTransform trims the list to a context window: the current task (from session.currentTaskId), 2 tasks before it, and 3 tasks after it. All other message content is preserved. A [Task window: showing N of M tasks] comment marks the trim point. If no currentTaskId is set, no trimming occurs.
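The windowing rule can be sketched over a list of task IDs. `taskWindow` is a hypothetical helper, and leaving the list untouched when the current task is not found in it is an assumption. The real `messagesTransform` operates on message text and also inserts the `[Task window: ...]` marker.

```typescript
// Hypothetical helper: keep the current task, 2 before it, and 3 after it.
function taskWindow(
  taskIds: string[],
  currentTaskId: string | null,
  before: number = 2,
  after: number = 3,
): string[] {
  // Trimming applies only when there are more than 5 task lines
  // and a current task is known.
  if (taskIds.length <= 5 || currentTaskId === null) return taskIds;
  const index = taskIds.indexOf(currentTaskId);
  if (index === -1) return taskIds; // assumption: unknown task means no trimming
  const start = Math.max(0, index - before);
  const end = Math.min(taskIds.length, index + after + 1);
  return taskIds.slice(start, end);
}

const exampleIds = ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.8", "1.9", "1.10"];
const visible = taskWindow(exampleIds, "1.5");
console.log(`[Task window: showing ${visible.length} of ${exampleIds.length} tasks]`);
```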
At the end of every messagesTransform invocation, a preamble is prepended to the last user message:
[Last gate: {tool} {result} for task {taskId}]
[DELIBERATE: Before proceeding — what is the SINGLE next task? What gates must it pass?]
The preamble is sourced from session.lastGateOutcome. On first invocation (lastGateOutcome is null), an introductory variant fires instead:
[DELIBERATE: Identify the first task from the plan. What gates must it pass before marking complete?]
src/config/constants.ts exports:
- `LOW_CAPABILITY_MODELS: string[]`: Substrings that identify smaller models: `'mini'`, `'nano'`, `'small'`, `'free'`
- `isLowCapabilityModel(modelId: string): boolean`: Returns `true` if `modelId` (case-insensitive) contains any entry in `LOW_CAPABILITY_MODELS`. Uses `session.activeModel` as the input, the same field used for rate limiting in `model-limits.ts`.
Three `<!-- BEHAVIORAL_GUIDANCE_START -->` … `<!-- BEHAVIORAL_GUIDANCE_END -->` marker pairs wrap verbose behavioral sections in `src/agents/architect.ts`:
- The BATCHING DETECTION section
- The ARCHITECT CODING BOUNDARIES section
- The QA gate behavioral description paragraphs
When isLowCapabilityModel(session.activeModel) returns true, guardrails.ts messagesTransform strips all text between every marker pair (inclusive) and replaces each removed block with [Enforcement: programmatic gates active]. If session.activeModel is null or undefined, the prompt is left unchanged.
Rationale: Smaller models benefit from shorter, more directive prompts. The programmatic enforcement mechanisms (state machine, hard blocks, scope containment) provide equivalent safety guarantees without verbose behavioral instructions consuming context.
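The tier check and marker stripping can be sketched together. The substring list, markers, and replacement text come from the description above. `trimBehavioralGuidance` is a hypothetical name, and the regex is one plausible way to match each marker pair inclusively.

```typescript
const LOW_CAPABILITY_MODELS: string[] = ["mini", "nano", "small", "free"];

function isLowCapabilityModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return LOW_CAPABILITY_MODELS.some((fragment) => id.indexOf(fragment) !== -1);
}

// Non-greedy match so each marker pair is replaced separately.
const GUIDANCE_BLOCK =
  /<!-- BEHAVIORAL_GUIDANCE_START -->[\s\S]*?<!-- BEHAVIORAL_GUIDANCE_END -->/g;

// Hypothetical helper: strip guidance blocks only for low-capability models.
function trimBehavioralGuidance(
  prompt: string,
  activeModel: string | null | undefined,
): string {
  if (!activeModel || !isLowCapabilityModel(activeModel)) return prompt;
  return prompt.replace(GUIDANCE_BLOCK, "[Enforcement: programmatic gates active]");
}
```

The non-greedy `[\s\S]*?` matters here: a greedy match would span from the first start marker to the last end marker and delete the text between separate blocks.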
Architect-only tool that pre-declares which files a coder delegation is allowed to modify.
Input:
{
taskId: string, // N.M or N.M.P format
files: string[], // file paths the coder may modify
whitelist?: string[], // additional allowed paths
working_directory?: string
}
Validation:
- `taskId` must match `N.M` or `N.M.P` format
- `files` must be non-empty; no null bytes, path traversal (`..`), or overly long paths
- `working_directory` must exist and contain `.swarm/plan.json`
- `taskId` must exist in the plan; task must not already be in `'complete'` state
On success: Sets session.declaredCoderScope = mergedFiles (files + whitelist) and session.lastScopeViolation = null on ALL active architect sessions. Returns { success: true, taskId, fileCount }.
On failure: Returns structured error with { success: false, message, errors[] }.
Automatic alternative: delegation-gate.ts extracts FILE: directive values from any coder delegation envelope and stores them as session.declaredCoderScope automatically. An explicit declare_scope call is only needed when scope must be declared before the delegation text is composed.
Tracking (guardrails.ts toolBefore):
- When the architect uses any file-modifying tool (write, edit, patch, create_file, insert, replace), the target file path is appended to `session.modifiedFilesThisCoderTask`.
- When a coder Task delegation is dispatched, `modifiedFilesThisCoderTask` resets to `[]`.
Checking (guardrails.ts toolAfter):
- After a coder Task delegation completes, `modifiedFilesThisCoderTask` is compared against `declaredCoderScope`.
- If `declaredCoderScope` is non-null and more than 2 files in `modifiedFilesThisCoderTask` are outside the declared scope: `session.lastScopeViolation` is set to a message listing the undeclared file paths.
- `modifiedFilesThisCoderTask` resets to `[]` after the check.
Warning injection (guardrails.ts messagesTransform):
- If `session.scopeViolationDetected` is set, a scope violation warning is injected into the next architect message.
- The flag clears immediately (before nested conditionals) to prevent stale state across turns.
Agent awareness tracks what each agent is doing and shares relevant context across agents via system prompts. The architect remains the sole orchestrator — there is no direct inter-agent communication.
src/state.ts exports a module-scoped singleton (swarmState) with:
- `activeAgent: Map<sessionId, agentName>`: Which agent is active in each session (updated by the `chat.message` hook)
- `agentSessions: Map<sessionId, AgentSessionState>`: Per-session guardrail tracking. Key fields:
  - `toolCallCount`, `startTime`, `delegationActive`: Guardrail counters
  - `taskWorkflowStates: Map<string, TaskWorkflowState>`: Per-task state machine. States: `'idle' | 'coder_delegated' | 'pre_check_passed' | 'reviewer_run' | 'tests_run' | 'complete'`. Transitions are forward-only; `complete` can only be reached from `tests_run`.
  - `lastGateOutcome: { gate, taskId, passed, timestamp } | null`: Most recent gate result, populated by `guardrails.ts` `toolAfter`. Used for deliberation preamble injection in Phase 4 context engineering.
  - `declaredCoderScope: string[] | null`: File paths from the coder delegation `FILE:` directives or explicit `declare_scope` tool call (Phase 5). Null means no scope has been declared.
  - `lastScopeViolation: string | null`: Last scope containment violation message (Phase 5). Set when coder modifies >2 files outside declared scope; cleared after warning is injected.
  - `modifiedFilesThisCoderTask: string[]`: File paths the architect has written during the current coder task (Phase 5). Reset to `[]` when the next coder delegation starts.
  - `scopeViolationDetected?: boolean`: One-shot flag (Phase 5). Set when a scope violation is found; cleared immediately after the warning is injected into the next architect message.
  - `model_fallback_index: number`: Current index into the `fallback_models` array (v6.33). Incremented on transient model failure; resets to 0 on success.
  - `modelFallbackExhausted: boolean`: Flag set when all fallback models have been exhausted (v6.33). Prevents advisory spam on consecutive transient errors.
- `eventCounter: number`: Tracks events for flush threshold
- `flushLock: Promise | null`: Serializes context.md writes
- `resetSwarmState()`: Clears all state (used in tests)
State machine helpers: advanceTaskState(session, taskId, newState) enforces forward-only transitions (throws INVALID_TASK_STATE_TRANSITION on illegal transitions); getTaskState(session, taskId) returns 'idle' for unknown tasks.
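A minimal sketch of the forward-only state machine. The state names and error code come from the description above. `WorkflowSessionSketch` is a hypothetical reduced session type. Whether intermediate states may be skipped is not stated; this sketch allows forward skips except into `complete`.

```typescript
type TaskWorkflowState =
  | "idle"
  | "coder_delegated"
  | "pre_check_passed"
  | "reviewer_run"
  | "tests_run"
  | "complete";

const STATE_ORDER: TaskWorkflowState[] = [
  "idle", "coder_delegated", "pre_check_passed", "reviewer_run", "tests_run", "complete",
];

// Hypothetical reduced session: only the per-task state map.
interface WorkflowSessionSketch {
  taskWorkflowStates: Map<string, TaskWorkflowState>;
}

function getTaskState(session: WorkflowSessionSketch, taskId: string): TaskWorkflowState {
  return session.taskWorkflowStates.get(taskId) ?? "idle"; // unknown tasks are idle
}

function advanceTaskState(
  session: WorkflowSessionSketch,
  taskId: string,
  newState: TaskWorkflowState,
): void {
  const current = getTaskState(session, taskId);
  const movesForward = STATE_ORDER.indexOf(newState) > STATE_ORDER.indexOf(current);
  // `complete` is reachable only from `tests_run`.
  const completeAllowed = newState !== "complete" || current === "tests_run";
  if (!movesForward || !completeAllowed) {
    throw new Error(`INVALID_TASK_STATE_TRANSITION: ${current} -> ${newState}`);
  }
  session.taskWorkflowStates.set(taskId, newState);
}
```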
Scope containment helper: isInDeclaredScope(filePath, scopeEntries) in guardrails.ts resolves both the candidate path and each scope entry with path.resolve() then checks containment with path.relative(). This handles directory entries correctly (a scope entry of src/ covers all files below it) without brittle string includes() matching.
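A minimal sketch of the containment check described above, using only Node's `path.resolve()`, `path.relative()`, and `path.isAbsolute()`. The function body is an assumption reconstructed from the description; the real helper in guardrails.ts may differ in detail.

```typescript
import * as path from "node:path";

// Sketch: true when filePath equals, or lies beneath, any declared scope entry.
function isInDeclaredScope(filePath: string, scopeEntries: string[]): boolean {
  const candidate = path.resolve(filePath);
  return scopeEntries.some((entry) => {
    const rel = path.relative(path.resolve(entry), candidate);
    // "" means the same path. A leading ".." segment, or an absolute result
    // (for example a different drive on Windows), means the candidate is outside.
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

With this approach `src/` covers `src/hooks/guardrails.ts` but not `src-old/index.ts`, which a naive `includes("src")` check would wrongly accept.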
The module has zero imports — it's pure TypeScript with no project dependencies.
When a subagent finishes and returns control to the architect, there's a race condition between the chat.message hook (which updates activeAgent) and the tool.execute.before hook (which checks guardrails). To prevent the architect from inheriting subagent limits during this transition:
- Stale delegation window: If `lastToolCallTime` is >10 seconds old, the session is considered stale and reverts to architect
- Delegation active flag: If `delegationActive=false` (subagent finished), immediately revert to architect
- Early exemption: Three name-based architect checks in the guardrails hook provide defense-in-depth
The 10-second window is tight enough to prevent architect misidentification but loose enough to allow slow subagent operations (file I/O, network).
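The three safeguards can be sketched as a single resolution function. The names `resolveEffectiveAgent` and `DelegationState` are hypothetical, and the sketch collapses the three name-based architect checks into one; it only illustrates the order of the checks, not the real hook code.

```typescript
const STALE_DELEGATION_MS = 10 * 1000; // the 10-second window described above
const ARCHITECT = "architect";

// Hypothetical subset of the per-session state used by this check.
interface DelegationState {
  delegationActive: boolean;
  lastToolCallTime: number; // epoch milliseconds
}

function resolveEffectiveAgent(
  activeAgent: string | undefined,
  state: DelegationState,
  now: number,
): string {
  // Early exemption: the architect never inherits subagent limits.
  if (activeAgent === undefined || activeAgent === ARCHITECT) return ARCHITECT;
  // Delegation flag: the subagent has finished, so control is back with the architect.
  if (!state.delegationActive) return ARCHITECT;
  // Stale window: no tool call for more than 10 seconds.
  if (now - state.lastToolCallTime > STALE_DELEGATION_MS) return ARCHITECT;
  return activeAgent;
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes the stale-window boundary easy to test.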
chat.message hook:
- Extract agent name (strip prefix: `paid_`, `local_`, `mega_`, `default_`)
- Update `activeAgent` map
tool.execute.before hook:
- Read active agent from `swarmState`
- Log: "agent X using tool Y"
tool.execute.after hook:
- Record tool result (success heuristic)
- Increment event counter
- If counter >= 20: flush to context.md (promise-based lock)
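The prefix stripping and the threshold flush can be sketched as below. `ActivityRecorder`, its `write` callback, and the line format are hypothetical stand-ins for the real hooks. The sketch also starts `flushLock` as a resolved promise rather than `null`, purely to keep the chaining short.

```typescript
const AGENT_PREFIXES = ["paid_", "local_", "mega_", "default_"];
const FLUSH_THRESHOLD = 20;

// Strip a known swarm prefix so "paid_coder" and "coder" map to the same agent.
function stripAgentPrefix(name: string): string {
  const prefix = AGENT_PREFIXES.find((p) => name.startsWith(p));
  return prefix ? name.slice(prefix.length) : name;
}

class ActivityRecorder {
  private eventCounter = 0;
  private pending: string[] = [];
  // Promise-based lock: each flush is chained onto the previous one,
  // so two writes to context.md never overlap.
  private flushLock: Promise<void> = Promise.resolve();
  private readonly write: (lines: string[]) => Promise<void>;

  constructor(write: (lines: string[]) => Promise<void>) {
    this.write = write;
  }

  record(agent: string, tool: string): void {
    this.pending.push(`${stripAgentPrefix(agent)} used ${tool}`);
    this.eventCounter += 1;
    if (this.eventCounter >= FLUSH_THRESHOLD) {
      const batch = this.pending;
      this.pending = [];
      this.eventCounter = 0;
      this.flushLock = this.flushLock
        .then(() => this.write(batch))
        .catch(() => {
          // A failed flush is swallowed so later flushes are not blocked.
        });
    }
  }

  get bufferedCount(): number {
    return this.pending.length;
  }
}
```

The batch is detached from `pending` before the write is scheduled, so events recorded during a slow flush accumulate in a fresh buffer.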
The system-enhancer reads the ## Agent Activity section from context.md and maps agent names to context labels:
- `coder` → implementation context
- `reviewer` → review findings
- `test_engineer` → test results
- Other agents → general context
Injected text is truncated to hooks.agent_awareness_max_chars (default: 300 characters).
v6.7 is GUI-first and background-first: Slash commands remain the primary control surface, but background automation provides autonomous operation when enabled.
Three modes control background-first rollout:
| Mode | Behavior | Use Case |
|---|---|---|
| `manual` | No background automation, all actions via slash commands (default) | Conservative rollout, full control |
| `hybrid` | Background automation for safe operations, slash commands for sensitive ones | Gradual feature rollout |
| `auto` | Full background automation (target state) | Future production use |
Default: manual for backward compatibility. Enable automation via config.
All v6.7 automation features are gated behind explicit feature flags (all default false):
| Feature Flag | Description | Security |
|---|---|---|
| `plan_sync` | Plan auto-heal: regenerate plan.md from canonical plan.json when out of sync | Safe - read-only regeneration |
| `phase_preflight` | Phase-boundary preflight checks before agent execution | Safe - validation-only |
| `config_doctor_on_startup` | Config Doctor runs on startup to validate/fix configuration | Moderate - auto-fix requires explicit opt-in |
| `config_doctor_autofix` | Auto-fix mode for Config Doctor (requires `config_doctor_on_startup`) | Safety: Defaults to false - autofix requires explicit opt-in |
| `evidence_auto_summaries` | Generate automatic summaries for evidence bundles | Safe - read-only aggregation |
| `decision_drift_detection` | Detect drift between planned and actual decisions | Moderate - drift detection only |
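The v6.7 defaults and the autofix dependency can be sketched as a small resolver. The type names and `resolveAutomation` are hypothetical. Forcing `config_doctor_autofix` off when `config_doctor_on_startup` is disabled is one plausible reading of "requires", not confirmed behavior.

```typescript
type AutomationMode = "manual" | "hybrid" | "auto";

interface AutomationCapabilities {
  plan_sync: boolean;
  phase_preflight: boolean;
  config_doctor_on_startup: boolean;
  config_doctor_autofix: boolean;
  evidence_auto_summaries: boolean;
  decision_drift_detection: boolean;
}

interface AutomationConfig {
  mode: AutomationMode;
  capabilities: AutomationCapabilities;
}

// Hypothetical shape of the user-supplied overrides.
interface AutomationOverrides {
  mode?: AutomationMode;
  capabilities?: Partial<AutomationCapabilities>;
}

// v6.7 defaults: manual mode, every feature flag off.
const DEFAULT_AUTOMATION: AutomationConfig = {
  mode: "manual",
  capabilities: {
    plan_sync: false,
    phase_preflight: false,
    config_doctor_on_startup: false,
    config_doctor_autofix: false,
    evidence_auto_summaries: false,
    decision_drift_detection: false,
  },
};

function resolveAutomation(user: AutomationOverrides = {}): AutomationConfig {
  const capabilities: AutomationCapabilities = {
    ...DEFAULT_AUTOMATION.capabilities,
    ...user.capabilities,
  };
  // Assumption: autofix is ignored unless the startup doctor is also enabled.
  if (!capabilities.config_doctor_on_startup) {
    capabilities.config_doctor_autofix = false;
  }
  return { mode: user.mode ?? DEFAULT_AUTOMATION.mode, capabilities };
}
```

An empty config therefore resolves to `manual` with all capabilities disabled, which matches the backward-compatible default.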
Typed event system for internal automation events. Events:
- Queue events (`enqueued`, `dequeued`, `completed`, `failed`, `retry scheduled`)
- Worker events (`started`, `stopped`, `error`)
- Circuit breaker events (`opened`, `half-open`, `closed`, `callSuccess`, `callFailure`)
- Loop protection events (`triggered`)
- Preflight events (`requested`, `triggered`, `skipped`, `completed`)
- Phase boundary events (`detected`, `checked`)
- Task events (`completed`)
- Evidence summary events (`generated`, `error`)
All events include timestamp, payload, and optional source identifier.
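A typed event bus along these lines could look like the sketch below. The event keys (`"queue:enqueued"` and so on), the payload fields, and the `AutomationEventBus` class are all hypothetical; the source lists event categories but not their exact identifiers or payload shapes.

```typescript
// Hypothetical event map: key is the event type, value is its payload shape.
interface AutomationEventMap {
  "queue:enqueued": { itemId: string };
  "worker:started": { workerId: string };
  "loop:triggered": { operationKey: string };
}

interface AutomationEvent<K extends keyof AutomationEventMap> {
  type: K;
  timestamp: number;
  payload: AutomationEventMap[K];
  source?: string | undefined; // optional source identifier
}

class AutomationEventBus {
  private listeners = new Map<string, Array<(event: unknown) => void>>();

  on<K extends keyof AutomationEventMap>(
    type: K,
    listener: (event: AutomationEvent<K>) => void,
  ): void {
    const list = this.listeners.get(type) ?? [];
    // Stored untyped; type safety is enforced at the on/emit boundary.
    list.push(listener as (event: unknown) => void);
    this.listeners.set(type, list);
  }

  emit<K extends keyof AutomationEventMap>(
    type: K,
    payload: AutomationEventMap[K],
    source?: string,
  ): void {
    const event: AutomationEvent<K> = { type, timestamp: Date.now(), payload, source };
    for (const listener of this.listeners.get(type) ?? []) listener(event);
  }
}
```

Because the payload type is looked up from the event key, emitting `"queue:enqueued"` with a worker payload fails at compile time rather than at runtime.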
Lightweight in-process queue with:
- Priority levels: `critical` > `high` > `normal` > `low`
- FIFO ordering within priority
- Exponential backoff retry (configurable max)
- Max queue size protection (default 1000)
- Retry metadata tracking (attempts, next attempt time, backoff)
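The queue behavior listed above can be sketched in a few dozen lines. `AutomationQueue`, the retry limit of 3, and the 100 ms base delay are illustrative assumptions; only the priority order, FIFO tie-breaking, exponential backoff, and the 1000-item cap come from the text.

```typescript
type Priority = "critical" | "high" | "normal" | "low";
const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, normal: 2, low: 3 };

interface QueueItem<T> {
  payload: T;
  priority: Priority;
  seq: number; // insertion order, used for FIFO within a priority
  attempts: number;
  nextAttemptAt: number; // epoch ms; 0 means ready immediately
}

class AutomationQueue<T> {
  private items: QueueItem<T>[] = [];
  private seq = 0;
  private readonly maxSize: number;
  private readonly maxRetries: number;
  private readonly baseDelayMs: number;

  constructor(maxSize = 1000, maxRetries = 3, baseDelayMs = 100) {
    this.maxSize = maxSize;
    this.maxRetries = maxRetries;
    this.baseDelayMs = baseDelayMs;
  }

  enqueue(payload: T, priority: Priority = "normal"): boolean {
    if (this.items.length >= this.maxSize) return false; // queue full
    this.items.push({ payload, priority, seq: this.seq++, attempts: 0, nextAttemptAt: 0 });
    return true;
  }

  // Returns the highest-priority item whose backoff has elapsed, oldest first.
  dequeue(now: number): QueueItem<T> | undefined {
    const ready = this.items
      .filter((item) => item.nextAttemptAt <= now)
      .sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.seq - b.seq);
    const next = ready[0];
    if (next === undefined) return undefined;
    this.items.splice(this.items.indexOf(next), 1);
    return next;
  }

  // Re-queue a failed item with exponential backoff; false once retries are exhausted.
  retry(item: QueueItem<T>, now: number): boolean {
    if (item.attempts >= this.maxRetries) return false;
    item.attempts += 1;
    item.nextAttemptAt = now + this.baseDelayMs * 2 ** (item.attempts - 1);
    this.items.push(item);
    return true;
  }
}
```

With these assumed values the delays grow as 100 ms, 200 ms, 400 ms, after which `retry` returns false and the caller can emit a failure event.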
Lifecycle manager for background workers:
- Register workers with handler functions
- Configurable concurrency (default 1)
- Auto-start support
- Processing loop with idle detection
- Statistics tracking (processed count, error count, queue size)
Fault tolerance primitive:
- States: `closed` (normal) → `open` (fail fast) → `half-open` (testing recovery)
- Configurable failure threshold (default 5) and reset timeout (default 30s)
- Call timeout support (default 10s)
- Success threshold for half-open → closed (default 3)
Prevents cascading failures during background automation.
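A synchronous sketch of the three-state breaker, using the defaults listed above. It is deliberately simplified and makes several assumptions: the clock is passed in as `now`, the call timeout is omitted, failures are counted consecutively, and any failure while half-open reopens the circuit.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly successThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold = 5, successThreshold = 3, resetTimeoutMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  getState(now: number): BreakerState {
    // After the reset timeout an open breaker lets trial calls through.
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
      this.successes = 0;
    }
    return this.state;
  }

  call<T>(fn: () => T, now: number): T {
    if (this.getState(now) === "open") throw new Error("circuit open: failing fast");
    try {
      const result = fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure(now);
      throw err;
    }
  }

  private onSuccess(): void {
    if (this.state === "half-open") {
      this.successes += 1;
      if (this.successes >= this.successThreshold) {
        this.state = "closed";
        this.failures = 0;
      }
    } else {
      this.failures = 0; // assumption: only consecutive failures count
    }
  }

  private onFailure(now: number): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
      this.failures = 0;
    }
  }
}
```

Once open, calls are rejected without running `fn` at all, which is what keeps a failing background service from being hammered by retries.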
Infinite loop prevention:
- Tracks operation frequency over time window (default 10s)
- Configurable max iterations (default 5)
- Operation key for tracking specific operations
- Automatic detection and abort when the threshold is exceeded
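A sliding-window sketch of this check, using the defaults above. `LoopProtection` and `check` are hypothetical names, and the sketch assumes the abort happens on the first run beyond the configured maximum (the sixth run within 10 seconds), which the source does not spell out.

```typescript
class LoopProtection {
  private calls = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxIterations: number;

  constructor(windowMs = 10000, maxIterations = 5) {
    this.windowMs = windowMs;
    this.maxIterations = maxIterations;
  }

  // Records one run of operationKey at time `now` (epoch ms) and throws when
  // the operation has run more than maxIterations times within the window.
  check(operationKey: string, now: number): void {
    const recent = (this.calls.get(operationKey) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    recent.push(now);
    this.calls.set(operationKey, recent);
    if (recent.length > this.maxIterations) {
      throw new Error(`loop protection triggered for "${operationKey}"`);
    }
  }
}
```

Each operation key has its own window, so a runaway plan sync does not block an unrelated preflight run.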
Passive status writer for GUI visibility:
- Writes to `.swarm/automation-status.json`
- Tracks mode, capabilities, phase, triggers, outcomes
- GUI-friendly summary with readable status text
v6.7 Task 5.1: Automatic plan.json ↔ plan.md synchronization.
loadPlan(directory):
1. Try to load plan.json
├─ VALID → Check if plan.md in sync
│ ├─ In sync → Return plan
│ └─ Out of sync → Auto-regenerate plan.md from plan.json
│
└─ INVALID → Try to migrate from plan.md
├─ plan.md exists → Migrate, save both files, return plan
└─ plan.md doesn't exist → Fall through
2. Try to load plan.md only (no auto-migration)
├─ Exists → Return migrated plan
└─ Doesn't exist → Fall through
3. Neither exists → Return null
Content hash uses natural numeric sorting for task IDs:
- `"1.2"` < `"1.10"` (not `"1.10"` < `"1.2"`, as plain string comparison would give)
- Example: `1.1, 1.2, 1.10, 1.11, 2.1` (sorted correctly)
- Hash stored in plan.md header as `<!-- PLAN_HASH: <hash> -->`
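A natural numeric comparator for dotted task IDs can be sketched as follows. `compareTaskIds` is a hypothetical name, the sketch assumes every segment is a plain integer, and it treats a missing segment as 0. Hashing itself is left out.

```typescript
// Compare dotted task IDs segment by segment as numbers, not as strings.
function compareTaskIds(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const taskIds = ["2.1", "1.10", "1.2", "1.11", "1.1"];

const naturalOrder = [...taskIds].sort(compareTaskIds);
// naturalOrder: ["1.1", "1.2", "1.10", "1.11", "2.1"]

const lexicalOrder = [...taskIds].sort();
// lexicalOrder: ["1.1", "1.10", "1.11", "1.2", "2.1"]
```

Sorting before hashing matters because the hash should depend on the plan's content, not on the order in which tasks happened to be serialized.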
Plan.json writes use temp+rename pattern for atomicity:
- Write to `plan.json.tmp.{timestamp}`
- Atomic rename to `plan.json`
- Derive plan.md with hash comment
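The first two steps can be sketched with Node's synchronous `fs` API. `writePlanAtomically` is a hypothetical helper, the plan object is a placeholder, and deriving plan.md is left out. The rename is atomic only when the temp file and the target are on the same filesystem.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch of temp+rename: readers see either the old plan.json or the new one,
// never a partially written file.
function writePlanAtomically(directory: string, plan: unknown): string {
  const target = path.join(directory, "plan.json");
  const temp = `${target}.tmp.${Date.now()}`;
  try {
    fs.writeFileSync(temp, JSON.stringify(plan, null, 2), "utf8");
    fs.renameSync(temp, target);
  } catch (err) {
    fs.rmSync(temp, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
  return target;
}

// Usage against a throwaway directory.
const planDir = fs.mkdtempSync(path.join(os.tmpdir(), "swarm-plan-"));
const planPath = writePlanAtomically(planDir, { phases: [] });
```

Writing the temp file next to the target, rather than in the system temp directory, is what keeps the rename on a single filesystem.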
Validates project state before agent execution:
- Checks plan completeness
- Validates evidence requirements per task
- Detects blockers and missing dependencies
- Returns actionable findings with severity levels
Gated behind: automation.capabilities.phase_preflight
Startup service that validates and fixes configuration:
- Validates config schema and types
- Detects stale/invalid settings
- Classifies findings by severity (info/warn/error)
- Proposes safe auto-fixes
Security: Defaults to scan-only mode. Autofix requires explicit automation.capabilities.config_doctor_autofix = true.
Backups: Creates encrypted backups in .swarm/ before auto-fix. Supports restore via /swarm config doctor --restore <backup-id>.
Detects drift between planned and actual decisions:
- Stale decisions (age/phase mismatch)
- Contradictory decisions (use vs don't use, keep vs remove, etc.)
- Caches decisions from the `## Decisions` section in context.md
- Returns structured drift signals for context injection
Gated behind: automation.capabilities.decision_drift_detection
Aggregates evidence per task and phase:
- Machine-readable JSON summary in `.swarm/evidence-summary.json`
- Human-readable markdown in `.swarm/evidence-summary.md`
- Per-task completion status
- Phase-level blockers (missing evidence, incomplete tasks, blocked tasks)
Gated behind: automation.capabilities.evidence_auto_summaries
Commands expose service functionality without blocking UI:
| Command | Function | Security |
|---|---|---|
| `/swarm preflight` | Run preflight checks on current plan | Safe - validation-only |
| `/swarm config doctor [--fix] [--restore <id>]` | Config Doctor with optional auto-fix and restore | Moderate - auto-fix opt-in |
| `/swarm doctor tools` | Tool registration coherence, AGENT_TOOL_MAP alignment, and Class 3 binary readiness check | Safe - validation-only |
| `/swarm sync-plan` | Force plan.md regeneration from plan.json | Safe - read-only |
All commands:
- Non-blocking (fire and forget for background ops)
- Async execution (don't block OpenCode UI)
- Log results to console
- Store artifacts in `.swarm/`
AutomationStatusArtifact provides passive status for GUI:
{
  "timestamp": 1234567890,
  "mode": "manual",
  "enabled": false,
  "currentPhase": 2,
  "lastTrigger": null,
  "pendingActions": 0,
  "lastOutcome": null,
  "capabilities": {
    "plan_sync": false,
    "phase_preflight": false,
    "config_doctor_on_startup": false,
    "config_doctor_autofix": false,
    "evidence_auto_summaries": false,
    "decision_drift_detection": false
  }
}
GUI uses getGuiSummary() for display:
- Status (Disabled / Hybrid / Auto)
- Current phase
- Last trigger time
- Pending actions count
- Last outcome (success/failure/skipped)
v6.8 builds on v6.7 with automatic execution triggers and persistent background workers.
createPhaseMonitorHook() in src/hooks/phase-monitor.ts:
- Detects phase transitions in the execution pipeline
- Automatically triggers preflight checks when phase changes
- Registers on the `PhaseChangeHook` event hook
- Configurable via `automation.capabilities.phase_preflight` (requires explicit enable)
createPreflightIntegration() wires the phase monitor into the hook chain:
Phase Monitor Hook → Preflight Service → Auto-trigger if phase changes
Benefits:
- No manual `/swarm preflight` command needed during execution
- Consistent preflight checks at every phase boundary
- Automatic blocker detection before agent execution
PlanSyncWorker class in src/background/plan-sync-worker.ts:
File Watching:
- `fs.watch` on `plan.json` for real-time synchronization
- 2-second polling fallback if `fs.watch` fails (network/mount issues)
Debouncing:
- 300ms debounce before processing changes
- Prevents multiple rapid updates from triggering unnecessary regenerations
- Batch writes for better performance
Overlap Lock:
- Exclusive lock on plan.json during regeneration
- Concurrent reads during regeneration (readers/writer pattern)
- Safe shutdown with grace period
Safe Shutdown:
- Graceful shutdown on plugin unload
- Cancel pending operations
- Release all locks before exit
Background service that auto-generates evidence summaries:
- Scheduled generation for long-running tasks
- Aggregates per-task evidence into phase-level summaries
- Writes to `.swarm/evidence-summary.json`
- Triggers via the `/swarm evidence summary` command
| Configuration | v6.7 Default | v6.8 Default | Reason |
|---|---|---|---|
| `evidence_auto_summaries` | `false` | `true` | Long-running tasks benefit from automatic summaries |
| `plan_sync` | `false` | `true` | Auto-healing plan.json ↔ plan.md is safe and recommended |
v6.7 configs are fully compatible with v6.8:
- Set `evidence_auto_summaries: true` in config to enable automation
- Set `plan_sync: true` in config to enable background synchronization
- Previous configs remain valid (defaults preserved for disabled features)
808 new tests across 7 new test files:
- `evidence-summary-init.test.ts`: Evidence summary service initialization
- `evidence-summary-init.adversarial.test.ts`: Error handling and recovery
- `evidence-summary-automation.test.ts`: Auto-generation triggers
- `phase-preflight-auto.test.ts`: Phase monitor and auto-trigger
- `plan-sync-worker.test.ts`: Worker core functionality
- `plan-sync-worker.adversarial.test.ts`: Edge cases and failures
- `plan-sync-init.test.ts`: Worker initialization
Total: 4008 tests across 136 files
Plugin Init
├── EvidenceSummaryIntegration (auto-generates summaries)
├── PhaseMonitorHook (detects phase changes)
└── PreflightIntegration (wires phase monitor to preflight)
Plugin Init
└── WorkerManager
└── Register PlanSyncWorker (auto-sync plan.json → plan.md)
Manual trigger for evidence summary generation:
// Command handler in src/commands/evidence.ts
evidenceCommand.execute(async (args) => {
  await EvidenceSummaryService.generate();
  return { success: true, message: 'Evidence summary generated' };
});
For Architects:
- Less manual intervention during long-running tasks
- Automatic plan synchronization without refresh
- Consistent preflight checks at every phase boundary
For Projects:
- Resumable, maintainable execution state
- Automatic evidence aggregation
- Reduced risk of plan drift
- Comprehensive audit trail
For Users:
- "Set it and forget it" automation
- No breaking changes to existing configs
- Clear visibility via status artifacts
- Graceful degradation on failures
Research findings (Claude Code, Windsurf, JetBrains, Trajectory Miner) support
separating .swarm/context.md into distinct concerns:
.swarm/
├── context.md ← Human-authored rules & architecture (static, version-controlled)
├── patterns.md ← Agent-discovered patterns (auto-updated, 200-line limit)
├── plan.json / plan.md ← Current plan state
├── evidence/
│ ├── retro-1/evidence.json ← Phase 1 retrospective
│ ├── retro-2/evidence.json ← Phase 2 retrospective
│ └── {task-id}/evidence.json ← Task-level evidence bundles
├── events.jsonl ← Event log
└── telemetry.jsonl ← Session observability (JSONL, 10MB rotation)
Key principles:
- Static rules (`context.md`) always override auto-learned patterns (`patterns.md`)
- Retrospectives are keyed by the `retro-{phase_number}` convention
- User directives with `scope: project` are persisted to `context.md`
- Agent-discovered patterns require 2+ session frequency before persisting
Note: The actual restructuring of `context.md` is deferred to v6.14. This section documents the target architecture for planning purposes.
PR 1 of the stacked parallelization release lands dark infrastructure with zero behavior change. See docs/archive/dev/pr1-foundation.md for the full design note covering:
- What is dark now vs. deferred to PR 2 and PR 3
- Process-global variable audit and isolation classification
- Invariants that prove runtime behavior is unchanged