Conventions steering + adherence scorer + guardrail hardening #7
Merged
Closes three gaps found while auditing the stack's own guardrails.
- Behavioral tests for block-secrets-precommit.sh (block/allow/edge): token shapes, staged-diff parsing, allowlist marker, non-commit ignore, quote evasion, preview redaction. The secret-scanning hook previously had zero behavioral coverage while block-dangerous-git was thoroughly tested.
- A windows-latest pytest job so the bash hook tests actually run on the maintainer's platform instead of silently skipping; the step fails if bash or git is absent so a skip cannot masquerade as a pass.
- Surgical prune of the archived token-savings benchmark: drop 816 raw per-run transcripts (~15 MB) and 118 regenerable bare-repo fixtures, keeping charts, manifests, and reports as the honest-history evidence. Stop tracking _remotes/ (was re-included via a gitignore negation; regenerable by build_fixtures.py) and ignore regenerated raw/.
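The "fail if bash or git is absent" idea can be sketched as a small preflight check. This is only an illustration, not the PR's actual CI step: the names `missing_tools` and `preflight_exit_code`, and the injectable `which` parameter, are hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


def preflight_exit_code(
    tools: Iterable[str] = ("bash", "git"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> int:
    """Return 1 if any required tool is absent, else 0.

    Hypothetical helper: a CI step could pass this value to sys.exit() so
    that a missing tool fails the job instead of letting the tests skip.
    """
    absent = missing_tools(tools, which)
    if absent:
        print("missing required tools: " + ", ".join(absent))
        return 1
    return 0
```

Injecting `which` keeps the check testable on machines where both tools are installed.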
Approved brainstorm output: deterministic adherence scorer + a hand-confirmed conventions.yml surfaced pre-write via the context-surface hook + a synthetic A/B eval, sequenced measurement-first. No model on the critical path; no client code, name, or description in the repo (eval runs on a committed synthetic fixture). Next: implementation plan, scorer phase first.
8 TDD tasks (extraction -> 3 metrics -> report -> CLI -> clean/dirty gate -> catalogue) with complete code per step. Executed via subagent-driven loop.
…coring, source-gate, phasing)
…for anchor-less files)
Adds convention_violation_nudge() to ai-first-nudge.sh: when a Write/Edit/MultiEdit touches a .py file in a git repo that has conventions.yml, the hook runs flag_blob_violations() against the new content and emits a non-blocking stderr nudge naming any identifiers that drift from the declared casing or hit the vague-name denylist. Frequency-capped per file per day via conv-nudges-<date>.txt. Key implementation note: the blob is passed via the BLOB_DATA env var rather than stdin, because in `printf ... | python - <<'PY'` both the pipe and the heredoc target stdin and the heredoc redirection wins: Python reads the heredoc as the script source and the piped blob is never seen. The env var sidesteps this cleanly. Tests: 3 new @requires_git cases in test_hook_behavior.py (fires on violation, silent on clean edit, silent when no conventions.yml). Full suite: 172/172 pass.
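The same constraint can be reproduced from Python: once the script itself occupies the child's stdin (`python -`), the data has to travel by another channel. A minimal sketch, assuming a child that only counts lines; `CHILD_SCRIPT` and `count_lines_in_child` are hypothetical names, and only `BLOB_DATA` comes from the commit message.

```python
import os
import subprocess
import sys

# Hypothetical child script, fed to the interpreter on stdin (like a heredoc).
CHILD_SCRIPT = """
import os
blob = os.environ.get("BLOB_DATA", "")
print(len(blob.splitlines()))
"""


def count_lines_in_child(blob: str) -> int:
    """Run CHILD_SCRIPT via `python -` and hand it the blob through the environment."""
    # stdin is already taken by the script source, so the blob goes in an env var.
    env = dict(os.environ, BLOB_DATA=blob)
    result = subprocess.run(
        [sys.executable, "-"],
        input=CHILD_SCRIPT,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Environment variables have platform-dependent size limits, so a very large blob would likely need a temporary file instead.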
…ount Adds skills/conventions-init/SKILL.md (15th skill): bootstraps conventions.yml with inferred casing, seeded denylist, and scaffolded house-rules block. Bumps skill count 14 -> 15 in README badge, install blurb, Codex copy line, and smoke-test confirm line.
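To illustrate what "inferred casing" and "drift from the declared casing or denylist" could mean in practice, here is a rough sketch using the standard `ast` module. It is not the skill's or the hook's actual logic: `function_names`, `classify`, `infer_casing`, and `flag_drift` are hypothetical helpers, only function names are inspected, and the input must be syntactically complete Python (an edit fragment may not be).

```python
import ast
from collections import Counter
from typing import Iterable, List, Optional, Set


def function_names(source: str) -> List[str]:
    """Collect the names of all function definitions in the source."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def classify(name: str) -> Optional[str]:
    """Return 'snake_case', 'camelCase', or None when the name is style-neutral."""
    core = name.strip("_")  # ignore private/dunder underscores
    if not core:
        return None
    if "_" in core and core == core.lower():
        return "snake_case"
    if "_" not in core and core[0].islower() and core != core.lower():
        return "camelCase"
    return None  # single lowercase words, PascalCase, and mixed forms


def infer_casing(names: Iterable[str]) -> Optional[str]:
    """Pick the most common classified style, or None if nothing classifies."""
    counts = Counter(style for style in map(classify, names) if style)
    return counts.most_common(1)[0][0] if counts else None


def flag_drift(names: Iterable[str], casing: str, denylist: Set[str]) -> List[str]:
    """Return names that are on the denylist or use a different classified style."""
    flagged = []
    for name in names:
        style = classify(name)
        if name in denylist or (style is not None and style != casing):
            flagged.append(name)
    return flagged
```

Treating single lowercase words as style-neutral avoids flagging names like `load` under either convention.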
… fix stale skill count - README: 14 -> 15 cross-tool skills (only stale instance) - convention_violation_nudge: use canon_path() for both the grep cap-check and the printf append, matching per_dir_agents_md_nudge's approach - tests: add test_nudge_fires_on_convention_violation_in_multiedit - tests: add test_nudge_frequency_cap_silences_second_identical_edit
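The per-file, per-day cap with a canonicalized path can be sketched as follows. The real hook is bash and uses its own canon_path(); this Python version is only an illustration in which `should_nudge` is a hypothetical name, `os.path.realpath` stands in for canon_path(), and the ISO date in the log filename is an assumption about the `<date>` format.

```python
import datetime as dt
import os
from pathlib import Path


def should_nudge(log_dir: Path, file_path: str, today: dt.date) -> bool:
    """Return True the first time a file is seen on a given day, False afterwards."""
    # Canonicalize once and use the same form for both the check and the append,
    # so "src/a.py" and "./src/a.py" count as the same file.
    canonical = os.path.realpath(file_path)
    log_file = log_dir / f"conv-nudges-{today.isoformat()}.txt"

    seen = set()
    if log_file.exists():
        seen = set(log_file.read_text(encoding="utf-8").splitlines())
    if canonical in seen:
        return False

    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(canonical + "\n")
    return True
```

Using one canonical form on both sides is the point of the fix: if the check and the append disagree on spelling, the cap never matches.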
Summary
Three bodies of work from a stack self-review.
Guardrail hardening
- Behavioral tests for block-secrets-precommit (token shapes, allowlist marker, redaction, quote-evasion); it previously had zero coverage.
- windows-latest CI job so the bash hook tests actually run on Windows (they silently skip without bash); the step fails if bash/git is missing.
Adherence scorer (Phase 1)
- scripts/score_adherence.py (--repo / --diff), language-pluggable (Python first). A clean-vs-dirty separation gate guards calibration.
Conventions steering (Phase 2)
- conventions.yml → build-context-map.py folds it into _meta.conventions → the context-surface hook surfaces a compact conventions block before source-file edits (role computed at runtime, even for files with no anchor entry) → ai-first-nudge advisory on casing/vague drift in the edit blob (not the whole file).
- /conventions-init skill (15th) drafts conventions.yml; never invents house rules, never overwrites a populated file, Python-first casing.
- Committed synthetic fixture (conventions.yml at root).
- bench/conventions-eval/ documents the full-repo A/B protocol (the valid quality eval) plus a score_diff helper.
The conventions parser ships a stdlib fallback (no PyYAML dependency); a review caught it dropping block-lists, and that is now fixed and tested.
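For readers unfamiliar with the block-list case, a stdlib-only parser for a tiny YAML subset might look like the sketch below. It is not the PR's parser: `parse_simple_yaml` and `_unquote` are hypothetical names, the keys in the usage example are invented, and only top-level scalars, inline lists, and block lists are handled.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, List[str]]


def _unquote(value: str) -> str:
    """Strip one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def parse_simple_yaml(text: str) -> Dict[str, Value]:
    """Parse top-level `key: value`, `key: [a, b]`, and `key:` + `- item` lists."""
    result: Dict[str, Value] = {}
    current_key: Optional[str] = None  # key whose block list is being collected
    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        if stripped.startswith("- "):
            # Block-list item: it belongs to the most recent `key:` line.
            if current_key is None:
                raise ValueError(f"list item without a key: {raw_line!r}")
            items = result[current_key]
            assert isinstance(items, list)
            items.append(_unquote(stripped[2:].strip()))
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"unsupported line: {raw_line!r}")
        key, value = key.strip(), value.strip()
        if value == "":
            # Simplification: an empty value always opens a (possibly empty) list.
            result[key] = []
            current_key = key
        elif value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()
            result[key] = [_unquote(p.strip()) for p in inner.split(",")] if inner else []
            current_key = None
        else:
            result[key] = _unquote(value)
            current_key = None
    return result
```

For example, parsing `"denylist:\n  - data\n  - tmp\n"` yields `{"denylist": ["data", "tmp"]}`. A parser that only handled `key: value` lines would drop those items.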
Test plan
- pytest tests/ -v: 177 passed, 0 failed, 0 skipped
- python scripts/check_version_sync.py: OK
- python scripts/gen-codex-agents.py --check: OK
- windows-latest pytest job, and full pytest on ubuntu: gated by this PR.
🤖 Generated with Claude Code