Conventions steering + adherence scorer + guardrail hardening by Filip-Podstavec · Pull Request #7 · Filip-Podstavec/claude-leverage

Filip-Podstavec · 2026-06-03T09:08:37Z

Summary

Three bodies of work from a stack self-review.

Guardrail hardening

Behavioral tests for block-secrets-precommit (token shapes, allowlist marker, redaction, quote-evasion) — it previously had zero coverage.
windows-latest CI job so the bash hook tests actually run on Windows (they silently skip without bash); the step fails if bash/git is missing.
Surgical prune of the archived token-savings benchmark (raw transcripts + regenerable git fixtures; 16 → 4.3 MB), keeping the documentary evidence.
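The "fails if bash/git is missing" guard can be illustrated with a minimal sketch. This is not the repository's CI step; the function names and the injectable `which` parameter are assumptions made for illustration. The idea is only that a missing tool raises an error instead of leading to a quiet skip.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# bash and git are the tools named in the PR; everything else here is illustrative.
REQUIRED_TOOLS = ("bash", "git")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def require_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> None:
    """Raise instead of skipping, so a missing tool fails the step."""
    missing = missing_tools(tools, which)
    if missing:
        raise RuntimeError("required tools not found on PATH: " + ", ".join(missing))
```

Calling `require_tools()` before the test run turns "tests skipped because bash is absent" into a hard failure, which is the behavior the summary describes.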

Adherence scorer (Phase 1) — scripts/score_adherence.py

Deterministic naming/casing/structure scoring, --repo / --diff, language-pluggable (Python first). A clean-vs-dirty separation gate guards calibration.
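As a rough illustration of what a deterministic casing metric can look like, here is a minimal sketch. It is not the code in scripts/score_adherence.py: the regexes, the function names, and the "share of names in the dominant style" formula are all assumptions.

```python
import ast
import re
from collections import Counter

# Illustrative style definitions; the real scorer may classify differently.
SNAKE = re.compile(r"^_*[a-z][a-z0-9]*(_[a-z0-9]+)*_*$")
CAMEL = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$")
PASCAL = re.compile(r"^([A-Z][a-z0-9]*)+$")


def casing_of(name: str) -> str:
    """Classify one identifier by casing style."""
    if SNAKE.match(name):
        return "snake_case"
    if CAMEL.match(name):
        return "camelCase"
    if PASCAL.match(name):
        return "PascalCase"
    return "other"


def function_names(source: str) -> list:
    """Collect function names from Python source using the stdlib ast module."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def casing_consistency(names: list) -> float:
    """Fraction of names that share the most common casing style."""
    if not names:
        return 1.0
    counts = Counter(casing_of(name) for name in names)
    return counts.most_common(1)[0][1] / len(names)
```

Because the score depends only on the parsed source, the same input always yields the same number, which is what makes a clean-vs-dirty separation gate possible.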

Conventions steering (Phase 2)

conventions.yml → build-context-map.py folds it into _meta.conventions → the context-surface hook surfaces a compact conventions block before source-file edits (role computed at runtime; even for files with no anchor entry) → ai-first-nudge advisory on casing/vague drift in the edit blob (not the whole file).
/conventions-init skill (15th) drafts conventions.yml; never invents house rules, never overwrites a populated file, Python-first casing.
Dogfooded on this repo (conventions.yml at root).
bench/conventions-eval/ documents the full-repo A/B protocol (the valid quality eval) + a score_diff helper.
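Computing a file's role at runtime, so that files without an anchor entry still get a conventions block, might look like the sketch below. The role names, glob patterns, and `role_for` helper are hypothetical; the actual schema of conventions.yml is not shown in this PR.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical role table: role name -> glob patterns, checked in order.
ROLES = {
    "test": ["tests/*", "*_test.py"],
    "script": ["scripts/*"],
    "hook": ["hooks/*.sh"],
}


def role_for(path: str, roles: dict = ROLES, default: str = "source") -> str:
    """Return the first role whose pattern matches, else a default role."""
    # Normalize Windows separators so the same patterns work on every platform.
    normalized = PurePosixPath(path.replace("\\", "/")).as_posix()
    for role, patterns in roles.items():
        # Note: fnmatch's "*" also matches "/" so "tests/*" covers nested files.
        if any(fnmatchcase(normalized, pattern) for pattern in patterns):
            return role
    return default
```

Falling back to a default role means every source file resolves to some role, with or without an explicit entry.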

The conventions parser ships a stdlib fallback (no PyYAML dependency); a review caught it dropping block-lists and that is fixed + tested.
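To make the block-list case concrete, here is a minimal sketch of a stdlib-only parser for a small YAML subset: top-level `key: value` scalars and `key:` followed by `- item` lines. It is not the shipped fallback parser; the function names and the sample keys are assumptions.

```python
def _unquote(value: str) -> str:
    """Strip one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def parse_simple_yaml(text: str) -> dict:
    """Parse top-level scalars and block-lists; anything else is out of scope."""
    result = {}
    current_list_key = None  # key whose block-list is currently being filled
    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        if stripped.startswith("- "):
            if current_list_key is None:
                raise ValueError("list item without a preceding 'key:' line")
            result[current_list_key].append(_unquote(stripped[2:].strip()))
            continue
        key, separator, value = stripped.partition(":")
        if not separator:
            raise ValueError("expected 'key: value', got: " + stripped)
        key, value = key.strip(), value.strip()
        if value:
            result[key] = _unquote(value)
            current_list_key = None
        else:
            # An empty value opens a block-list; items follow on later lines.
            result[key] = []
            current_list_key = key
    return result
```

A parser that handles only the `key: value` branch would silently lose every `- item` line, which is the kind of bug the review caught.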

Test plan

pytest tests/ -v — 177 passed, 0 failed, 0 skipped
python scripts/check_version_sync.py — OK
python scripts/gen-codex-agents.py --check — OK
CI: shellcheck (hook changes were not lint-checked locally), the new windows-latest pytest job, and full pytest on ubuntu — gated by this PR.

🤖 Generated with Claude Code

Closes three gaps found auditing the stack's own guardrails.

- Behavioral tests for block-secrets-precommit.sh (block/allow/edge): token shapes, staged-diff parsing, allowlist marker, non-commit ignore, quote evasion, preview redaction. The secret-scanning hook previously had zero behavioral coverage while block-dangerous-git was thoroughly tested.
- A windows-latest pytest job so the bash hook tests actually run on the maintainer's platform instead of silently skipping; the step fails if bash or git is absent so a skip cannot masquerade as a pass.
- Surgical prune of the archived token-savings benchmark: drop 816 raw per-run transcripts (~15 MB) and 118 regenerable bare-repo fixtures, keeping charts, manifests, and reports as the honest-history evidence. Stop tracking _remotes/ (was re-included via a gitignore negation; regenerable by build_fixtures.py) and ignore regenerated raw/.
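The behaviors those tests cover (token shapes, staged-diff parsing, allowlist marker, preview redaction) can be sketched in a few lines. This is not a port of block-secrets-precommit.sh: the two patterns are the publicly known AWS access key ID and classic GitHub token shapes, while the `allow-secret` marker text and all function names are assumptions.

```python
import re

# Two well-known token shapes; the hook's actual pattern list is not shown in the PR.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}
ALLOW_MARKER = "allow-secret"  # hypothetical marker text


def redact(token: str, keep: int = 4) -> str:
    """Keep a short prefix for the preview and mask the rest."""
    return token[:keep] + "*" * (len(token) - keep)


def scan_staged_diff(diff_text: str) -> list:
    """Return (kind, redacted_preview) for tokens on added diff lines."""
    findings = []
    for line in diff_text.splitlines():
        # Only added lines count; "+++" is treated as the file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if ALLOW_MARKER in line:
            continue  # explicitly allowlisted line
        for kind, pattern in TOKEN_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((kind, redact(match.group(0))))
    return findings
```

Redacting before reporting matters because the preview itself ends up in terminal output and logs.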

Approved brainstorm output: deterministic adherence scorer + a hand-confirmed conventions.yml surfaced pre-write via the context-surface hook + a synthetic A/B eval, sequenced measurement-first. No model on the critical path; no client code, name, or description in the repo (eval runs on a committed synthetic fixture). Next: implementation plan, scorer phase first.

8 TDD tasks (extraction -> 3 metrics -> report -> CLI -> clean/dirty gate -> catalogue) with complete code per step. Executed via subagent-driven loop.

Adds convention_violation_nudge() to ai-first-nudge.sh: when a Write/Edit/MultiEdit touches a .py file in a git repo that has conventions.yml, the hook runs flag_blob_violations() against the new content and emits a non-blocking stderr nudge naming any identifiers that drift from the declared casing or the vague-name denylist. The nudge is frequency-capped per file per day via conv-nudges-<date>.txt.

Key implementation note: the blob is passed via the BLOB_DATA env var rather than stdin because `printf ... | python - <<'PY'` does not deliver the piped data in bash. The heredoc replaces the pipe as stdin, so Python reads the heredoc as its script and the script then sees an empty stdin. The env var sidesteps this cleanly.

Tests: 3 new @requires_git cases in test_hook_behavior.py (fires on violation, silent on clean edit, silent when no conventions.yml). Full suite: 172/172 pass.
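The env-var hand-off described in this commit message can be mirrored in Python as a minimal sketch. The BLOB_DATA name comes from the commit message; the child script and the `run_child_with_blob` helper are assumptions. The child receives its script on stdin, as with the heredoc, and reads the payload from the environment.

```python
import os
import subprocess
import sys

# The child script arrives on stdin (as with `python - <<'PY'`),
# so the payload has to travel by another channel: the environment.
CHILD_SCRIPT = """\
import os, sys
sys.stdout.write(os.environ.get("BLOB_DATA", ""))
"""


def run_child_with_blob(blob: str) -> str:
    """Run a child interpreter whose stdin is the script and whose env holds the blob."""
    env = dict(os.environ, BLOB_DATA=blob)
    completed = subprocess.run(
        [sys.executable, "-"],  # "-" tells Python to read the program from stdin
        input=CHILD_SCRIPT,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

One trade-off with this approach: environment variables cannot contain NUL bytes and are subject to platform size limits, so it suits edit-sized blobs rather than arbitrarily large files.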

…ount

Adds skills/conventions-init/SKILL.md (15th skill): bootstraps conventions.yml with inferred casing, seeded denylist, and scaffolded house-rules block. Bumps skill count 14 -> 15 in README badge, install blurb, Codex copy line, and smoke-test confirm line.

… fix stale skill count

- README: 14 -> 15 cross-tool skills (only stale instance)
- convention_violation_nudge: use canon_path() for both the grep cap-check and the printf append, matching per_dir_agents_md_nudge's approach
- tests: add test_nudge_fires_on_convention_violation_in_multiedit
- tests: add test_nudge_frequency_cap_silences_second_identical_edit

Filip-Podstavec added 30 commits June 2, 2026 14:29

docs(plan): phase-1 adherence scorer implementation plan

e9805ba

feat(scorer): python identifier extraction

f349ca8

feat(scorer): naming_clarity metric

61d434b

feat(scorer): casing_consistency metric

df78faf

feat(scorer): structure metric

cd0dbdf

feat(scorer): assemble report with language coverage

16d3a17

feat(scorer): --repo and --diff CLI modes

a30d250

fix(scorer): count multiline-signature + nested function bodies; robust collect_diff

0dc8924

test(scorer): clean-vs-dirty separation gate

aef8f0b

docs(scorer): list score_adherence in the catalogue

0bfe95c

docs(scorer): fold scorer into the scripts row (table is directory-scoped)

b9d3ac3

docs(spec): phase 2 conventions steering design

3f49cc8

docs(spec): phase 2 steering — revise per design review (nudge blob-scoring, source-gate, phasing)

7d9a119

docs(plan): phase 2a conventions delivery loop

6782ad3

feat(conventions): conventions.yml parser + role matcher

b218ba4

feat(context-map): fold conventions.yml into _meta.conventions

7642045

feat(context-surface): surface conventions for source-file edits

df925cf

feat(conventions): dogfood conventions.yml + ship template + catalogue

b1860c0

fix(conventions): fallback parser drops block-lists; add fallback + truncation tests

050f2db

docs(spec): conventions live in _meta (hook computes role at runtime for anchor-less files)

6939ae3

docs(plan): phase 2b conventions-init + nudge

a27ac41

feat(scorer): flag_blob_violations for the conventions nudge

e50e01b

docs(session): 2026-06-03 conventions steering + scorer (phase 1 & 2)

735dd54

feat(conventions-eval): full-repo A/B protocol + score_diff helper

55dad2d

chore(release): v1.11.0 — conventions steering + scorer + hardening

f6344ec

Filip-Podstavec merged commit 4645d18 into main Jun 3, 2026
5 checks passed

Filip-Podstavec deleted the feat/adherence-scorer branch June 3, 2026 09:14

Filip-Podstavec merged 30 commits into main from feat/adherence-scorer
