Releases: luiseiman/dotforge

v3.7.1 — Evidence-based compaction policy (80% threshold)

05 May 14:23

Evidence-based compaction policy

Research combining academic work (Liu et al., Stanford 2023; Chroma Research; Greg Kamradt) and field practice on X (Boris Cherny and Cat Wu of Anthropic, Daniel San, Avthar, Paweł Huryn), consolidated into an operational dotforge policy. Canonical threshold: 80% of the context window (not 50%, and not the 96.7% default).

New pieces

  • domain/compaction-strategy.md (new domain rule, 70 lines): evidence-based policy covering the 80% threshold, the /compact vs /clear vs subagent distinction, confirmed anti-patterns, cache economy (Paweł Huryn: prefix-cache invalidation), and re-anchoring to mitigate "lost in the middle" (Liu et al.). Citations include URLs.
  • /forge compact-task (new slash command): a wrapper around /compact with a standardized dotforge hint. It preserves decisions, files modified, pending TODOs, disabled behaviors, and the last commit, and drops verbose tool output. This resolves the anti-pattern of running /compact without custom instructions.
  • /forge context-status (new slash command): a read-only report on estimated context-window usage, a cache-health proxy (based on the tool-latency.sh p50), recent edits, and a recommended action. It does not compact.
  • pre-compact-warning.sh (new hook, wired on UserPromptSubmit): proactive alert at 80% (warning) and 90% (urgent). Estimate: transcript bytes / 5. Configurable via the env vars CLAUDE_CONTEXT_LIMIT, CLAUDE_COMPACT_WARN_PCT, and CLAUDE_COMPACT_URGENT_PCT. Smoke-tested in 3 scenarios (below threshold, warning, urgent).
  • docs/internal/compaction-strategy.md (~200 lines): the canonical operational guide, with an ASCII flow chart, a /compact vs /clear vs subagent decision table, per-project-type configuration (light/standard/heavy), and a full bibliography (academic + X).
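The pre-compact-warning.sh estimate above reduces to a few lines of arithmetic. The sketch below restates it in Python for illustration; the real hook is a shell script. The 200000-token default for CLAUDE_CONTEXT_LIMIT is an assumption, since the notes do not state the default. The function also takes the transcript size as an argument instead of reading the file.

```python
import os


def context_usage(transcript_bytes: int, env=None) -> tuple[float, str]:
    """Estimate context-window usage from transcript size (tokens ~= bytes / 5)."""
    env = os.environ if env is None else env
    # Assumed default limit; the warn/urgent defaults follow the 80% / 90% above.
    limit = int(env.get("CLAUDE_CONTEXT_LIMIT", "200000"))
    warn_pct = float(env.get("CLAUDE_COMPACT_WARN_PCT", "80"))
    urgent_pct = float(env.get("CLAUDE_COMPACT_URGENT_PCT", "90"))

    est_tokens = transcript_bytes / 5
    pct = 100 * est_tokens / limit
    if pct >= urgent_pct:
        level = "urgent"
    elif pct >= warn_pct:
        level = "warning"
    else:
        level = "ok"
    return pct, level
```

With a 200K limit, the warning would fire at roughly 800 KB of transcript and the urgent alert at roughly 900 KB.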

Wiring

  • .claude/settings.json: new UserPromptSubmit block with pre-compact-warning.sh (3s timeout)
  • template/settings.json.tmpl: same change, for propagation to the 12 projects
  • template/hooks/pre-compact-warning.sh: propagable copy
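The sketch below shows one way to merge the new UserPromptSubmit block into a settings.json dict without duplicating it on re-sync. The nesting (event → list of groups → hooks with type, command, and timeout in seconds) follows Claude Code's hooks settings format. The helper name wire_hook and the command path are hypothetical, and this is not dotforge's wire_hooks_all.py.

```python
import json

# Hypothetical command path; the real template may reference the hook differently.
HOOK = ".claude/hooks/pre-compact-warning.sh"


def wire_hook(settings: dict, event: str, command: str, timeout: int) -> dict:
    """Add a command hook under settings["hooks"][event] unless it is already wired."""
    groups = settings.setdefault("hooks", {}).setdefault(event, [])
    for group in groups:
        for hook in group.get("hooks", []):
            if hook.get("command") == command:
                return settings  # already present; re-running is a no-op
    groups.append({"hooks": [{"type": "command", "command": command, "timeout": timeout}]})
    return settings


if __name__ == "__main__":
    print(json.dumps(wire_hook({}, "UserPromptSubmit", HOOK, 3), indent=2))
```

Because the merge is idempotent, it can run again on every sync across all projects.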

Key research findings

  • Liu et al. (Stanford 2023): 30%+ accuracy loss for information placed in the middle of the context (7-50% depth)
  • Chroma Research (2024): 18 frontier models degrade within their declared window; Sonnet 200K shows drops starting at 50K tokens
  • Greg Kamradt: GPT-4 recall degrades beyond 73K tokens
  • Boris Cherny (Anthropic): auto-compact fires at ~155K tokens; accepting a plan auto-clears the context
  • Cat Wu (Anthropic): defends auto-compact as a way to preserve critical information
  • Daniel San (X): keeps auto-compact OFF and uses a hook at 80% in production ("every time it triggered for me, I lost important context")
  • Avthar (X): "actively clear context yourself using /clear or /compact rather than waiting for auto-compact to happen mid-task, which can hurt performance"
  • Paweł Huryn (X): cache economics dominate the decision; a March 2026 bug caused 20× cost inflation through a broken cache

Convergence of the evidence

| Threshold | Verdict |
|---|---|
| 50% | Too aggressive. Loses the recent thread, and the summary accumulates degradation |
| 80% | Sweet spot. Daniel San, Avthar, and the academic evidence converge here, with a safety margin |
| 96.7% (default) | Too late. Quality has already degraded by the time auto-compact fires |

v3.7.0 — Smart init — startup snapshot + drift + Setup validation

05 May 13:11

Smart init: startup snapshot + drift detection + Setup validation

Four new pieces that complete the symmetry with auto-compact (v3.6.3): SessionStart now captures, compares, and persists the initial state, and the Setup hook validates invariants before any tool call.

New hooks

  • .claude/hooks/session-startup.sh (wired on SessionStart, for every source except compact):

    • Captures the branch, short HEAD, working-tree change count, .claude/ files edited in the last 24h, pending TODOs/FIXMEs, and disabled behaviors
    • Compares the current HEAD with the HEAD of the last snapshot in startup-history/ → emits a "drift" line with commits-ahead
    • Writes .claude/session/last-startup.md (full snapshot) + startup-history/<ISO>.md (rotating, last 5)
    • Injects a brief on stdout (Claude receives it as initial context) ONLY when something is notable: dirty tree, recent edits, drift, behaviors off, pending TODOs. Silent when everything is clean
    • Silent on source=compact (delegated to session-restore.sh)
  • .claude/hooks/pre-session-check.sh (wired on Setup, matchers init and maintenance):

    • Validates invariants on claude --init-only / claude --maintenance:
      1. settings.json is valid JSON
      2. block-destructive.sh is present + executable (security baseline)
      3. behaviors/index.yaml is valid YAML (if present)
      4. All wired hooks exist and are executable
      5. DOTFORGE_DIR resolves (warn only)
    • Exit 2 blocks session start on critical errors
    • Output: silent on success, full checklist on failure
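The pre-session invariants can be sketched in Python as follows. This is an illustration, not the shipped bash hook. It assumes that wired commands reference hook scripts by project-relative paths ending in .sh, and it skips the YAML check because the standard library has no YAML parser. DOTFORGE_DIR produces a warning only.

```python
import json
import os
import sys
from pathlib import Path


def _executable(path: Path) -> bool:
    return path.is_file() and os.access(path, os.X_OK)


def check_invariants(project: Path) -> list[str]:
    """Return critical errors; an empty list means all checked invariants pass."""
    try:
        settings = json.loads((project / ".claude" / "settings.json").read_text())
    except (OSError, json.JSONDecodeError) as exc:
        # Without valid settings, the wired-hook checks cannot run.
        return [f"settings.json is not valid JSON: {exc}"]

    errors = []
    if not _executable(project / ".claude" / "hooks" / "block-destructive.sh"):
        errors.append("block-destructive.sh missing or not executable")

    # Every *.sh token in a wired command must exist and be executable.
    for groups in settings.get("hooks", {}).values():
        for group in groups:
            for hook in group.get("hooks", []):
                for token in hook.get("command", "").split():
                    if not token.endswith(".sh"):
                        continue
                    path = project / token
                    if not path.is_file():
                        errors.append(f"Wired hook missing: {token}")
                    elif not os.access(path, os.X_OK):
                        errors.append(f"Wired hook not executable: {token}")
    return errors


def main() -> int:
    if not os.path.isdir(os.environ.get("DOTFORGE_DIR", "")):
        print("warning: DOTFORGE_DIR does not resolve", file=sys.stderr)  # warn only
    errors = check_invariants(Path.cwd())
    if errors:
        print(f"Errors ({len(errors)}):", file=sys.stderr)
        for err in errors:
            print(f"  ✗ {err}", file=sys.stderr)
        return 2  # non-zero exit signals critical errors
    return 0
```

Run as a hook, the script would end with `sys.exit(main())`.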

Changes

  • template/settings.json.tmpl: newly wired hooks:
    • SessionStart gains a third entry: session-startup.sh (10s timeout)
    • Setup, with matchers init and maintenance, points to pre-session-check.sh
  • template/hooks/session-startup.sh and template/hooks/pre-session-check.sh: propagable copies, picked up by the 12 projects on the next /forge sync
  • domain/hook-events.md: documents dotforge's wiring on SessionStart (3 hooks) and Setup (pre-session-check). It reflects the enabled state of the 3 hooks in index.yaml and the Setup matchers.

Verification

Smoke tests on the dotforge project itself:

```
$ printf '{"source":"startup"}' | bash .claude/hooks/session-startup.sh
## Session Startup Brief
**Branch:** main @ fffc0b6
**Working tree:** 6 changed files
**Recent .claude/ edits (24h):** 14
**Behaviors disabled:** search-first,plan-before-code,objection-format

$ bash .claude/hooks/pre-session-check.sh
✓ dotforge pre-session check: all invariants pass

$ printf '{"source":"compact"}' | bash .claude/hooks/session-startup.sh
(silent — delegated to session-restore.sh)

$ # Inject broken hook reference, run check
$ bash .claude/hooks/pre-session-check.sh
── dotforge pre-session check ──
Errors (1):
  ✗ Wired hook missing: .claude/hooks/nonexistent.sh
─────────────────────────────────
exit=2
```

What this closes from the initial audit

| Pre-v3.7.0 | Post-v3.7.0 |
|---|---|
| SessionStart re-injects context only on compact | Covers all 4 sources (startup, resume, compact, clear) |
| No drift detection between sessions | session-startup.sh compares HEAD against the last snapshot |
| Setup hook never wired despite being documented | pre-session-check.sh validates invariants on --init-only |
| No history of session starts | startup-history/, rotating, last 5 |
| No visibility into disabled behaviors at startup | The brief includes an explicit list |

v3.6.3 — Smart auto-compact — filter + rotating history

05 May 13:11

Smart auto-compact: filtering and history

A filtering layer on top of the compact_summary that Claude Code generates, with two concrete improvements to the existing pipeline (PostCompact → last-compact.md → SessionStart restore):

  • scripts/compact-filter.py (new): a conservative pipe filter that shrinks the summary before it is persisted. Safe heuristics:
    • Fenced blocks (```) over 40 lines → first 5 + last 5 + elision note
    • Runs of ≥30 unprotected lines (no markdown structure, no paths, no decision/error/fix/pending keywords) → first 3 + last 3 + note
    • Paragraphs duplicated ≥3 times → a single copy
    • Runs of more than 2 consecutive newlines → collapsed to 2
    • Never filtered: lines with #, -, |, > or = markup, paths (.md/.sh/.py/etc.), critical tokens (decision, error, fix, pending, next step, commit, todo, blocker, warning, fail), and the first 10 lines
    • Output goes to stdout; metrics (in/out bytes, ratio) go to stderr
    • Tests: 2253B → 730B = 68% reduction on a verbose summary; 22453B → 22447B = ~0% on an already dense summary (no damage).
  • .claude/hooks/post-compact.sh + template/hooks/post-compact.sh: pipe the summary through compact-filter, falling back to the raw summary if the filter fails. The [compact-filter] metric is recorded in the checkpoint's frontmatter.
  • Rotating history: the last 5 compactions are kept under .claude/session/compact-history/<ISO>.md. This allows diffing consecutive compactions, or recovering if last-compact.md went stale.
  • domain/context-window-optimization.md: updated with a note on the hook's new behavior.
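The rotating history (last 5 snapshots) amounts to a write followed by a prune. Below is a minimal sketch. It assumes a compact UTC timestamp as the filename so that lexical order matches chronological order; the actual <ISO> format used by the hooks is not specified here, and write_rotating is a hypothetical name. The same pattern fits startup-history/.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_rotating(history_dir: Path, content: str, keep: int = 5, now=None) -> Path:
    """Write a timestamped snapshot, then delete all but the newest `keep` files."""
    history_dir.mkdir(parents=True, exist_ok=True)
    now = now or datetime.now(timezone.utc)
    # Assumed filename format: sorts lexically in time order and avoids ':'.
    path = history_dir / (now.strftime("%Y%m%dT%H%M%SZ") + ".md")
    path.write_text(content)
    snapshots = sorted(history_dir.glob("*.md"))
    for old in snapshots[:-keep]:
        old.unlink()
    return path
```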

Verification

End-to-end smoke test with synthetic JSON (40 filler lines + decision + next steps):

[compact-filter] in=2253B  out=730B  saved=1523B  ratio=0.32

On the real last-compact.md from the current session (22 KB of dense summary):

[compact-filter] in=22453B  out=22447B  saved=6B  ratio=1.00

Expected behavior: dense summaries pass through almost untouched, while summaries with verbose tool dumps shrink by 30-70%. In the worst case the file stays the same: the filter is a safety measure, not aggressive compression.
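Two of the compact-filter heuristics can be sketched as below:

- fenced blocks over 40 lines keep their first and last 5 lines;
- runs of more than two newlines collapse to two.

The metric line goes to stderr. This is an illustrative subset, not scripts/compact-filter.py itself. It omits the protected-line rules, long unprotected runs, and duplicate-paragraph removal, and the wording of the elision note is an assumption.

```python
import re
import sys

FENCE = "`" * 3  # triple backtick, built indirectly so it cannot close a surrounding fence


def elide_long_fences(text: str, max_lines: int = 40, keep: int = 5) -> str:
    """Shorten fenced blocks longer than max_lines to their first/last `keep` lines."""
    out, block, in_fence = [], [], False
    for line in text.split("\n"):
        if line.strip().startswith(FENCE):
            if not in_fence:
                in_fence, block = True, []
                out.append(line)
            else:
                in_fence = False
                if len(block) > max_lines:
                    elided = len(block) - 2 * keep
                    block = block[:keep] + [f"[... {elided} lines elided ...]"] + block[-keep:]
                out.extend(block)
                out.append(line)
            continue
        (block if in_fence else out).append(line)
    if in_fence:  # unterminated fence: keep its body untouched
        out.extend(block)
    return "\n".join(out)


def collapse_blank_runs(text: str) -> str:
    """Collapse runs of 3+ newlines down to exactly 2."""
    return re.sub(r"\n{3,}", "\n\n", text)


def filter_summary(text: str) -> str:
    out = collapse_blank_runs(elide_long_fences(text))
    n_in, n_out = len(text.encode()), len(out.encode())
    ratio = n_out / n_in if n_in else 1.0
    print(f"[compact-filter] in={n_in}B  out={n_out}B  saved={n_in - n_out}B  ratio={ratio:.2f}",
          file=sys.stderr)
    return out
```

Used as a pipe, the entry point would be `sys.stdout.write(filter_summary(sys.stdin.read()))`. Short blocks pass through unchanged, which matches the "dense summaries barely change" result above.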

v3.6.2 — Audit follow-ups — signal gate, metric rename, parallel-sessions split

05 May 13:11

Closing out audit follow-ups

Applied the 4 follow-ups recorded in v3.6.1.

  • hooks/detect-claude-changes.sh: signal gate. Skip the auto-stub if TOTAL < 15 files AND there are no structural changes (agents/commands/skills = 0). This removes inbox noise that the user had no way to evaluate.
  • Honest metric: not-applicable → informational in practices/metrics.yml (35 entries), skills/update-practices/SKILL.md, practices/active/*.md (11 frontmatters), practices/README.md, docs/config-validation.md, and docs/internal/config-validation-flow.md. The validation rate is now computed over the 19 trackable practices (not all 54), giving 0/19 = 0%. That is a realistic metric, not one inflated by general information.
  • registry/projects.yml: header rewritten as an explicit "EXAMPLE / REFERENCE FILE". It clarifies that the source of truth is projects.local.yml (gitignored) and explains why there are two files.
  • domain/parallel-sessions.md: 81 → 38 lines. The CLI-flag sections unrelated to parallelism moved to the new domain/cli-flags.md (53 lines). It has different globs (CLAUDE.md, agents/*, skills/**/SKILL.md, scripts/**/*.sh, .github/workflows/*.yml), so it loads in different contexts.
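The signal gate and the revised validation rate are both small predicates. Below is a sketch of each. It assumes status strings named validated, monitoring, and informational, as used in these notes; the function names are hypothetical.

```python
def should_write_stub(total_files: int, agents: int, commands: int, skills: int) -> bool:
    """Skip the auto-stub when the change set is small (< 15 files)
    AND touches no agents, commands, or skills."""
    structural = agents + commands + skills
    return not (total_files < 15 and structural == 0)


def validation_rate(statuses: list[str]) -> float:
    """Share of trackable practices that are validated; informational ones are excluded."""
    trackable = [s for s in statuses if s != "informational"]
    if not trackable:
        return 0.0
    return sum(s == "validated" for s in trackable) / len(trackable)
```

With 35 informational entries and 19 trackable entries of which none are validated, the rate is 0/19 = 0.0 instead of being diluted across all 54.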

Domain rules over 50 lines after this pass (8 remaining, non-critical)

| File | Lines | Over limit |
|---|---|---|
| rule-effectiveness.md | 68 | +18 |
| hook-architecture.md | 63 | +13 |
| auto-mode.md | 62 | +12 |
| permission-managed-settings.md | 60 | +10 |
| permission-model.md | 59 | +9 |
| agent-orchestration.md | 59 | +9 |
| context-control-patterns.md | 54 | +4 |
| cli-flags.md | 53 | +3 |
| plugin-distribution.md | 52 | +2 |
| context-window-optimization.md | 52 | +2 |

Diminishing returns: trim wording in the next iteration, without further fragmentation.

v3.6.1 — Audit fixes — search-first off, permission-model split

05 May 13:11

Critical audit + quality polish

A thorough audit session found three real degradations, and the cheap, high-return fixes were applied.

Changes

  • behaviors/index.yaml: search-first.enabled: false. Evidence: counter=7, it escalated to soft_block, and the user disabled it manually mid-session. The current design (the flag is consumed after each Write/Edit) produces false positives in sessions after compaction or when context is already loaded through an initial Read. Revisit once a "sticky-flag" mode exists.
  • Generated search-first hooks removed: deleted from .claude/hooks/generated/ and from settings.json. PreToolUse: 8 → 6 hooks. Lower net latency; the remaining hooks (block-destructive, no-destructive-git, respect-todo-state×2, verify-before-done×2) stay active.
  • domain/permission-model.md split: 112 lines → 59 (core: modes, cascade, prefix detection, core rules, auto-approvals tightening, glob/grep platform note). The new domain/permission-managed-settings.md (60 lines) absorbs enterprise managed settings, MCP server config, and dynamic permissions from hooks. It has different globs (managed-settings.json, .mcp.json), so it loads only when applicable.
  • Filesystem cleanup: deleted 9 orphaned settings.json.bak.20260428-* backups (dotforge + 8 projects). The zombie worktree reverent-banzai no longer appeared.

Audit: what DOES add value (with evidence)

  • block-destructive.sh: active in 12 projects, never disabled, intercepts new patterns (find -delete, xargs rm)
  • session-report.sh fix (v3.3.1): corrected a silent metrics bug that had gone unnoticed for 5 months
  • tool-latency.sh: data is arriving. Bash p50=53ms, Edit p50=11ms (hooks are not a bottleneck)
  • Domain rules with specific globs: they load only when applicable
  • scripts/audit_all.py + sync_all.py + wire_hooks_all.py: real automation, 12× → 1×

Audit: open items (non-critical)

  • 8 domain rules are still over 50 lines (our own limit). Fold them into the next refactor; no urgency.
  • practices/metrics.yml: 35/54 = not-applicable. This metric is misleading: "validated" should mean "prevented an error", not "5 cycles with nothing happening". Rename the field to informational and exclude it from the validation rate.
  • Registry shadow: projects.yml (committed, example) vs projects.local.yml (gitignored, real). Clarify this in the docs.
  • Automatic inbox/*-session-changes.md entries with no detail are noise. Filter them in the post-session hook when they contain only counts.

v3.6.0 — Sync from CC v2.1.120-128 round 2 + Setup hook coverage

05 May 13:11

Sync from Claude Code v2.1.120-128 — round 2 (deeper coverage)

Seven practices captured this morning from a fresh /forge watch pass — all incorporated. One auto-stub rejected. Inbox: 0.

Domain rule updates

  • domain/hook-architecture.md — added Setup event to Session-level cadence (32 events total now). Documents Setup lifecycle: fires for --init-only / --maintenance runs with matchers init and maintenance, distinct from SessionStart (every session) — Setup only fires on explicit request. Also added design tradeoff note for PostToolUse.updatedToolOutput: now works for ALL tools (v2.1.121+), not just MCP — but rewriting can hide errors and breaks audit trail. Prefer additionalContext for augmentation; reserve updatedToolOutput for redaction or compression.

  • domain/hook-events.md — generalized updatedToolOutput from MCP-only to all tools (v2.1.121+). New Setup event payload: matchers init | maintenance, non-blockable, used for credential rotation / env-var provisioning / prerequisite checks BEFORE session starts.

  • domain/permission-model.md — added 5 managed-only enterprise fields (allowManagedPermissionRulesOnly, network.allowManagedDomainsOnly, filesystem.allowManagedReadPathsOnly, strictKnownMarketplaces, blockedMarketplaces) plus pluginTrustMessage for org-specific guidance. New MCP server config section consolidating enableAllProjectMcpServers, enabledMcpjsonServers, etc., with new alwaysLoad: true option (v2.1.121+) that bypasses tool-search deferral per server, and workspace reserved name (v2.1.128+).

  • domain/rule-effectiveness.md — new Runtime placeholders in skill content (v2.1.120+) section: ${CLAUDE_EFFORT} resolves to active effort tier in skill markdown body (not just frontmatter). New Settings fields worth knowing (beyond permissions) section: availableModels, effortLevel, defaultShell, viewMode, enableWeakerNestedSandbox, pluginTrustMessage.

  • domain/parallel-sessions.md: the --init-only / --maintenance flag entry now cross-references the Setup hook (matchers init / maintenance) and points to hook-events.md.
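The advice to prefer additionalContext over rewriting tool output can be illustrated with a small PostToolUse hook that only augments. This sketch assumes duration_ms arrives as a top-level field of the hook input. The 5-second threshold and the function names are hypothetical. The response shape follows Claude Code's hookSpecificOutput convention.

```python
import json
import sys
from typing import Optional

SLOW_MS = 5_000  # hypothetical threshold for "slow" tool calls


def build_output(payload: dict) -> Optional[dict]:
    """Return a response that adds context, or None to add nothing.
    The tool output itself is never rewritten, so the audit trail stays intact."""
    duration = payload.get("duration_ms")  # assumed top-level input field
    if duration is None or duration < SLOW_MS:
        return None
    tool = payload.get("tool_name", "tool")
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": f"Note: {tool} took {duration} ms.",
        }
    }


def main() -> int:
    response = build_output(json.load(sys.stdin))
    if response is not None:
        print(json.dumps(response))
    return 0
```

As a hook script, the entry point would be `sys.exit(main())`. Returning None for fast calls keeps the hook silent in the common case.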

Docs

  • docs/usage-guide.md — new section 5b. CI / automation covering claude ultrareview [target] non-interactive code review (v2.1.120+, exit 0/1 contract, --json/--timeout flags, GitHub Actions sketch with claude setup-token). Subprocess attribution note: AI_AGENT=claude-code auto-set in subprocesses for platforms that surface it.

  • docs/best-practices.md — new Minor tooling tips (v2.1.120-128) subsection batching: --plugin-dir .zip, claude plugin prune / --prune cascade, AI_AGENT subprocess env, ANTHROPIC_BEDROCK_SERVICE_TIER, --channels API-key auth (channelsEnabled: true requirement), workspace reserved MCP name, claude install [version|stable|latest] for CI pinning.

  • integrations/channels/README.md — added API-key auth note (v2.1.128+): console / API-key users must set channelsEnabled: true; Claude.ai sessions don't need this flag.

Practices

  • 7 practices moved inbox/ → active/, frontmatter incorporated_in: ['3.6.0'].
  • 1 rejected (invisigtht-session-changes — auto-stub, summary-only).
  • Inbox: 0 pending.
  • metrics.yml: 1 new monitoring (posttooluse-updated-output-all-tools — error_type=logic), 6 not-applicable.

v3.5.0 — Sync from Claude Code v2.1.120-128 + agent memory checklist

05 May 13:11

Sync from Claude Code v2.1.120 → v2.1.128 + agent memory checklist

Six practices incorporated. Three security-relevant (monitoring), one auto-stub rejected.

New domain rule

  • .claude/rules/domain/plugin-distribution.md — covers ${CLAUDE_PLUGIN_DATA} (v2.1.126+ persistent state for plugins surviving updates), CLAUDE_CODE_PLUGIN_SEED_DIR multi-dir layered overlays (base + corporate + personal), managed marketplace governance (strictKnownMarketplaces, blockedMarketplaces, allowManagedPermissionRulesOnly, pluginTrustMessage), reserved server names (workspace since v2.1.128), and lifecycle hygiene (claude plugin prune, --plugin-dir .zip).
  • Migration of dotforge's practices/metrics.yml and inbox/ to ${CLAUDE_PLUGIN_DATA} is documented as a candidate but explicitly out of scope this release (multi-commit work).

Skill / docs / agent updates

  • skills/reset-project/SKILL.md — new Step 5b suggesting claude project purge $PWD post-reset (v2.1.126+) to drop orphaned transcripts, task lists, and ~/.claude.json entry. Verifies CLI availability before suggesting; never runs automatically.
  • docs/usage-guide.md — new "Layered distribution (multi-seed)" subsection covering CLAUDE_CODE_PLUGIN_SEED_DIR overlay pattern; new "PR review flow tip" noting /resume accepts pasted PR URLs (v2.1.122+, GitHub/Enterprise/GitLab/Bitbucket).
  • docs/security-checklist.md — new "--dangerously-skip-permissions tradeoffs (v2.1.121+)" subsection documenting that the flag now bypasses prompts for .claude/skills,agents,commands writes, with explicit warning against pairing with prompts that include unverified content (injection vector that can now write to template files unprompted).
  • agents/{architect,code-reviewer,implementer,security-auditor}.md — appended a "Memory persistence" section to each agent prompt with concrete checklist on when (and when not) to write to .claude/agent-memory/<agent>.md. Targets the agent-memory-underused finding from /forge insights 2026-04-21 (≤2 entries per agent across 5 months).

Practices

  • 6 practices moved inbox/ → active/, frontmatter incorporated_in: ['3.5.0'].
  • 1 rejected (tradingbot-session-changes — auto-stub, summary-only).
  • Inbox: 0 pending.
  • metrics.yml: 4 new monitoring entries (plugin-data-variable, claude-project-purge, skip-permissions-claude-paths, agent-memory-underused), 2 not-applicable.

Verified against

  • Claude Code v2.1.128 (latest as of 2026-05-04). Watch-upstream pass surfaced additional v2.1.120-128 deltas captured for next cycle (PostToolUse.updatedToolOutput for all tools, Setup hook event, alwaysLoad MCP option, claude ultrareview, ${CLAUDE_EFFORT} placeholder, missing settings fields).

v3.4.1 — Backtesting ADR gate rule

05 May 13:11

New rule — stacks/trading/rules/backtesting-adr-gate.md

Captured from a real ADR retrospective in the tradingview repo: a "Dual Momentum SPY/QQQ/BIL 12m" strategy was declared the official baseline of the passive-US sleeve based on walk-forward OOS Sharpe 1.08 vs QQQ B&H 1.04 (delta = +0.04) and Calmar 2.78 vs 1.66. After fixing a look-ahead bug in the rebalancer, the OOS metrics deflated to 1.06 vs 1.04. Computing PSR(QQQ B&H) per Bailey & López de Prado (2012) for all 9 strategies tested in the repo showed none passed the 0.95 threshold — the "best" strategy gave 70% probability of beating B&H, i.e. 30% probability of being worse.

The new rule encodes:

  • PSR(benchmark) > 0.95 required to claim "baseline", "winner", or "supersedes" in any ADR
  • DSR (Deflated Sharpe Ratio) required when testing > 5 strategies in the same project (multiple-testing correction)
  • Below threshold: ADR may document the strategy as alternative, but must not use the strong words
  • Implementation: ~50 lines stdlib-only via statistics.NormalDist; no scipy needed
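The sketch below is a stdlib-only PSR/DSR implementation in the spirit of the implementation note above. It is not the rule's own code. It works with per-period (non-annualized) Sharpe ratios and raw (non-excess) kurtosis. PSR follows Bailey & López de Prado (2012), and the expected-maximum benchmark follows their later Deflated Sharpe Ratio paper. sr_variance is the variance of the Sharpe estimates across the trials tested.

```python
import math
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()  # mean 0, sigma 1
EULER_GAMMA = 0.5772156649015329


def moments(returns):
    """Per-period Sharpe ratio, skewness, and raw kurtosis (population moments)."""
    mu = fmean(returns)
    sd = pstdev(returns, mu)
    skew = fmean(((r - mu) / sd) ** 3 for r in returns)
    kurt = fmean(((r - mu) / sd) ** 4 for r in returns)
    return mu / sd, skew, kurt


def psr(sr_hat, sr_star, n_obs, skew=0.0, kurt=3.0):
    """Probabilistic Sharpe Ratio: Pr(true SR > sr_star) given the estimate sr_hat
    from n_obs returns. The defaults (skew 0, kurt 3) correspond to normal returns."""
    denom = math.sqrt(1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat ** 2)
    z = (sr_hat - sr_star) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


def expected_max_sr(sr_variance, n_trials):
    """Expected maximum Sharpe ratio across n_trials strategies with zero true SR;
    it serves as the benchmark for the Deflated Sharpe Ratio."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def dsr(sr_hat, n_obs, sr_variance, n_trials, skew=0.0, kurt=3.0):
    """Deflated Sharpe Ratio: PSR measured against the multiple-testing benchmark."""
    return psr(sr_hat, expected_max_sr(sr_variance, n_trials), n_obs, skew, kurt)
```

Read against the rule, "baseline" would require psr(sr_strategy, sr_benchmark, n_obs, skew, kurt) > 0.95. Once more than 5 strategies have been tested, dsr(...) should also be reported.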

Generalization beyond trading: when ranking N options by a noisy metric, compute Pr(top option genuinely better than alternatives). Below threshold, the ranking is decoration — don't anchor decisions on it.

Changed

  • stacks/trading/plugin.json: bumped to v2.1.0, components.rules now lists both rules.
  • practices/active/2026-04-27-psr-gate-baseline-adrs.md: incorporated_in ['3.4.1'].
  • metrics.yml: monitoring entry, error_type=logic.

Inbox processing

  • 1 accepted (psr-gate-baseline-adrs → above)
  • 1 rejected (tradingview-session-changes — auto-stub, summary-only)
  • 1 deferred (agent-memory-underused — low-priority, needs more usage data to evaluate)

v3.4.0 — Sync from Claude Code v2.1.92→v2.1.119 + audit/behavior fixes

26 Apr 21:48

Choose a tag to compare

Highlights

/forge watch pass against code.claude.com covering Claude Code v2.1.92 → v2.1.119. 14 practices accepted, 6 auto-generated rejects, 1 deferred.

Domain rules refreshed

  • Hook event catalogue → 33+ with UserPromptExpansion (slash-command expansion, blockable) and PostToolBatch (end-of-batch validation, blockable). Documented mcp_tool as a fifth hook type with ${tool_input.*} substitution (v2.1.118+). PostToolUse/PostToolUseFailure now carry duration_ms (v2.1.119+). UserPromptSubmit can return hookSpecificOutput.sessionTitle (v2.1.94+).
  • Auto mode "$defaults" placeholder (v2.1.118+): extends autoMode.allow|soft_deny|environment instead of replacing them. Removes the all-or-nothing trade-off when shipping custom rules.
  • Permission tightening (v2.1.113+): Bash(find:*) allow rules no longer auto-approve -exec/-delete; deny rules now match env/sudo/watch/ionice/setsid wrappers; macOS /private/{etc,var,tmp,home} are dangerous removal targets under Bash(rm:*).
  • Native macOS/Linux builds (v2.1.117+) replace Glob/Grep with embedded bfs/ugrep via Bash. Glob(...)/Grep(...) permission specifiers and hook matchers are now platform-dependent.
  • TUI + idle-return recap: tui setting + /tui toggle (v2.1.110+); awaySummaryEnabled + /recap (v2.1.108+, default-on for telemetry-disabled deployments since v2.1.110). Coexists with dotforge's last-compact.md — different problems (idle return vs compaction survival).
  • Git attribution refresh: attribution.commit/attribution.pr supersede includeCoAuthoredBy; prUrlTemplate for self-hosted GitHub/GitLab/Bitbucket.
  • CLI surface fully documented in domain/parallel-sessions.md: 16+ flags and 6 subcommands (claude install, auth, agents, auto-mode, remote-control, setup-token).

Operational fixes

  • Audit checklist item 14 — scoring v3 behavior coverage now requires ENFORCEMENT (compiled hook under .claude/hooks/generated/ AND a settings.json reference), not just a behaviors/index.yaml declaration. Closes the false-positive that scored projects 1/1 with no runtime effect.
  • verify-before-done regex extended to match bash tests/*.sh, bash <path>/test-*.sh, ./tests/*.sh. Fixes legitimate git push from dotforge being soft-blocked after bash tests/test-*.sh runs. Recompiled hook included so the change takes runtime effect (caveat: behavior YAML edits are inert until `scripts/compiler/compile.sh` is rerun).
  • docs/claude-vs-forge.md: `/usage` is the canonical command (`/cost` and `/stats` are aliases since v2.1.118).
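verify-before-done now accepts three command shapes, and a regex along the lines below would match them. The pattern is a hypothetical approximation written for these notes, not the one compiled into the hook.

```python
import re

# Hypothetical approximation of: bash tests/*.sh, bash <path>/test-*.sh, ./tests/*.sh,
# each at the start of the command or right after a ;, & or | separator.
TEST_RUN = re.compile(
    r"(?:^|[;&|])\s*"
    r"(?:bash\s+(?:\S*/)?(?:tests/\S+\.sh|test-\S*\.sh)"
    r"|\./tests/\S+\.sh)"
)


def is_test_run(command: str) -> bool:
    return TEST_RUN.search(command) is not None
```

Anchoring on command starts and separators keeps unrelated scripts such as `bash deploy.sh` from counting as a test run.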

Verified

33/33 tests pass — 19 skills + 8 runtime + 1 compiler + 5 behavior CLI.

Catch-up note

Latest published release was v3.0.0 (2026-04-13). Versions v3.1.0 → v3.3.1 shipped between releases — see docs/changelog.md for the full intervening history.


Full diff: v3.0.0...v3.4.0

v3.0.0 — Behavior Governance

13 Apr 20:30

dotforge v3.0.0 — Behavior Governance

dotforge v3 ships a runtime behavior governance layer on top of the v2.9 configuration layer. Behaviors are declarative policies on tool calls, compiled to `PreToolUse` hooks that share a session-scoped state file. Opt-in and non-breaking — v2.9 projects run unchanged.

What's new

  • Behavior catalogue (`behaviors/`): `no-destructive-git`, `search-first`, `verify-before-done`, `respect-todo-state` (core, on by default) + `plan-before-code`, `objection-format` (opinionated, opt-in)
  • Declarative DSL: `behavior.yaml` with closed field/operator set, 5-level escalation (silent → nudge → warning → soft_block → hard_block), flag-based temporal gating, template rendering
  • Compiler: YAML → bash hooks + `settings.json` snippet. Conditions enforced at runtime via `regex_match`, `contains`, `starts_with`, …
  • Runtime: mkdir-based locking, TTL 24h, counters, flags, pending_block reinvocation detection for override audit
  • CLI: `/forge behavior list | describe | status | on | off | strict | relaxed` with project and session scopes
  • Audit dimension 14: `/forge audit` now scores v3 behavior coverage (0-1)
  • 33 tests green across runtime, compiler, CLI, and per-behavior scenarios
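mkdir-based locking works because directory creation is atomic: exactly one process succeeds, and the others get FileExistsError. The sketch below pairs such a lock with a counter file that uses a 24h TTL. It is written in Python, whereas the real runtime is bash. The JSON state layout (created, counters) and the function names are assumptions.

```python
import json
import os
import time
from contextlib import contextmanager

TTL_SECONDS = 24 * 3600


@contextmanager
def mkdir_lock(lock_dir, timeout=2.0, poll=0.05):
    """Hold an exclusive lock by creating lock_dir; os.mkdir is atomic."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_dir}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.rmdir(lock_dir)


def bump_counter(state_path, lock_dir, key, now=None):
    """Increment a per-behavior counter; reset all state once it is older than the TTL."""
    now = time.time() if now is None else now
    with mkdir_lock(lock_dir):
        try:
            with open(state_path) as f:
                state = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        if now - state.get("created", now) > TTL_SECONDS:
            state = {}  # expired: start a fresh session state
        state.setdefault("created", now)
        counters = state.setdefault("counters", {})
        counters[key] = counters.get(key, 0) + 1
        with open(state_path, "w") as f:
            json.dump(state, f)
        return counters[key]
```

The lock is released in a finally block, so a failing read or write cannot leave it stuck.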

Example

```
Bash(git push origin main --force)
PreToolUse:Bash hook returned blocking error
PreToolUse:Bash says: Destructive git operation blocked: force push,
hard reset, clean -f, and forced branch delete
are not allowed.
Error: Hook PreToolUse:Bash denied this tool
```
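In Claude Code, a PreToolUse command hook can block a call by exiting with code 2 and writing the reason to stderr. Below is a hypothetical Python equivalent of a no-destructive-git check. Its regexes are illustrative approximations of the four cases named in the message above, not the compiled behavior.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical approximations of the four blocked cases.
DESTRUCTIVE_GIT = [
    (re.compile(r"\bgit\s+push\b.*(?:\s--force(?!-with-lease)\b|\s-f\b)"), "force push"),
    (re.compile(r"\bgit\s+reset\b.*\s--hard\b"), "hard reset"),
    (re.compile(r"\bgit\s+clean\b.*\s-[a-zA-Z]*f"), "clean -f"),
    (re.compile(r"\bgit\s+branch\b.*\s-D\b"), "forced branch delete"),
]


def check(command: str) -> Optional[str]:
    """Return the label of the first destructive pattern found, else None."""
    for pattern, label in DESTRUCTIVE_GIT:
        if pattern.search(command):
            return label
    return None


def main() -> int:
    payload = json.load(sys.stdin)  # PreToolUse input includes tool_name and tool_input
    if payload.get("tool_name") != "Bash":
        return 0
    label = check(payload.get("tool_input", {}).get("command", ""))
    if label is None:
        return 0
    print(f"Destructive git operation blocked: {label}.", file=sys.stderr)
    return 2  # exit code 2 blocks the tool call; stderr is shown to Claude
```

Wired as a hook, the script would end with `sys.exit(main())`. The `--force-with-lease` exclusion is a design choice of this sketch, not something the release notes specify.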

Spec of record

Breaking changes

None. v3 is purely additive. A v2.9.1 project upgraded to v3.0.0 continues to work with zero changes until the user explicitly creates `behaviors/` and wires compiled hooks into `settings.json`.

Test suite

  • runtime: 8
  • compiler: 1
  • CLI: 5
  • search-first: 5
  • no-destructive-git: 2
  • verify-before-done: 3
  • respect-todo-state: 2
  • plan-before-code: 3
  • objection-format: 2

Total: 33 tests green (up from 18 in v3.0.0-alpha.1).