Conventions steering + adherence scorer + guardrail hardening #7

Merged
Filip-Podstavec merged 30 commits into main from feat/adherence-scorer on Jun 3, 2026

@Filip-Podstavec
Owner

Summary

Three bodies of work from a stack self-review.

Guardrail hardening

  • Behavioral tests for block-secrets-precommit (token shapes, allowlist marker, redaction, quote-evasion) — it previously had zero coverage.
  • windows-latest CI job so the bash hook tests actually run on Windows (they silently skip without bash); the step fails if bash/git is missing.
  • Surgical prune of the archived token-savings benchmark (raw transcripts + regenerable git fixtures; 16 → 4.3 MB), keeping the documentary evidence.
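The "fail instead of skip" idea behind the windows-latest job can be sketched as a small preflight check. This is only an illustration: `missing_tools` and `require_tools` are hypothetical names, not functions from this PR, and the real CI step may be implemented differently (for example directly in the workflow file).

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def require_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> None:
    """Raise instead of skipping, so a missing tool fails the step outright."""
    missing = missing_tools(names, which)
    if missing:
        raise RuntimeError("required tools not found: " + ", ".join(missing))
```

Calling `require_tools(["bash", "git"])` before the hook tests run would turn an environment without bash into a hard failure rather than a run of silently skipped tests.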

Adherence scorer (Phase 1): scripts/score_adherence.py

  • Deterministic naming/casing/structure scoring, --repo / --diff, language-pluggable (Python first). A clean-vs-dirty separation gate guards calibration.
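As a rough illustration of what deterministic casing scoring with a clean-vs-dirty gate might look like, here is a minimal sketch. It is not the code in scripts/score_adherence.py: the function name `casing_adherence`, the regexes, and the assumption that functions should be snake_case and classes PascalCase are all hypothetical.

```python
import ast
import re

# Assumed conventions for the sketch: snake_case functions, PascalCase classes.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9]*(_[a-z0-9]+)*_*$")
PASCAL_CASE = re.compile(r"^_*[A-Z][A-Za-z0-9]*$")


def casing_adherence(source: str) -> float:
    """Return the fraction of function and class names that match the assumed casing."""
    tree = ast.parse(source)
    total = 0
    matching = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            total += 1
            matching += bool(SNAKE_CASE.match(node.name))
        elif isinstance(node, ast.ClassDef):
            total += 1
            matching += bool(PASCAL_CASE.match(node.name))
    # A file with no definitions has nothing to violate.
    return matching / total if total else 1.0


clean = "class UserStore:\n    def load_user(self):\n        pass\n"
dirty = "class user_store:\n    def LoadUser(self):\n        pass\n"

# A separation gate in this spirit: the clean fixture must outscore the dirty one.
assert casing_adherence(clean) > casing_adherence(dirty)
```

Because the score depends only on the parsed source, the same input always yields the same number, which is what makes a separation gate like the final assertion usable as a calibration check.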

Conventions steering (Phase 2)

  • conventions.yml → build-context-map.py folds it into _meta.conventions → the context-surface hook surfaces a compact conventions block before source-file edits (role computed at runtime, even for files with no anchor entry) → ai-first-nudge emits an advisory on casing or vague-name drift in the edit blob (not the whole file).
  • /conventions-init skill (15th) drafts conventions.yml; never invents house rules, never overwrites a populated file, Python-first casing.
  • Dogfooded on this repo (conventions.yml at root).
  • bench/conventions-eval/ documents the full-repo A/B protocol (the valid quality eval) + a score_diff helper.

The conventions parser ships a stdlib fallback (no PyYAML dependency). A review caught the fallback dropping block lists; that is now fixed and tested.
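To make the block-list case concrete, here is a minimal sketch of a stdlib-only parser for a tiny YAML subset. It is not the parser shipped in this PR: `parse_simple_yaml` and the keys `casing` and `vague_denylist` are hypothetical, and the sketch handles only top-level scalars and `- item` lists (its comment stripping is naive and would break on a quoted `#`).

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def parse_simple_yaml(text: str) -> Dict[str, Value]:
    """Parse top-level `key: value` pairs and `key:` followed by `- item` lines."""
    result: Dict[str, Value] = {}
    current_list_key = None
    for raw_line in text.splitlines():
        # Naive comment stripping; fine for a sketch, wrong for quoted '#'.
        stripped = raw_line.split("#", 1)[0].strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            # Block-list item: attach it to the most recent `key:` line.
            result[current_list_key].append(stripped[2:].strip())
            continue
        key, separator, value = stripped.partition(":")
        if not separator:
            continue
        key, value = key.strip(), value.strip()
        if value:
            result[key] = value
            current_list_key = None
        else:
            # `key:` with no inline value opens a block list.
            result[key] = []
            current_list_key = key
    return result
```

The branch that handles `- item` lines is the part a fallback parser can easily miss: without it, a key with a block list would come back empty or absent.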

Test plan

  • pytest tests/ -v — 177 passed, 0 failed, 0 skipped
  • python scripts/check_version_sync.py — OK
  • python scripts/gen-codex-agents.py --check — OK
  • CI: shellcheck (hook changes were not lint-checked locally), the new windows-latest pytest job, and full pytest on ubuntu — gated by this PR.

🤖 Generated with Claude Code

Closes three gaps found auditing the stack's own guardrails.

- Behavioral tests for block-secrets-precommit.sh (block/allow/edge): token
  shapes, staged-diff parsing, allowlist marker, non-commit ignore, quote
  evasion, preview redaction. The secret-scanning hook previously had zero
  behavioral coverage while block-dangerous-git was thoroughly tested.
- A windows-latest pytest job so the bash hook tests actually run on the
  maintainer's platform instead of silently skipping; the step fails if bash
  or git is absent so a skip cannot masquerade as a pass.
- Surgical prune of the archived token-savings benchmark: drop 816 raw per-run
  transcripts (~15 MB) and 118 regenerable bare-repo fixtures, keeping charts,
  manifests, and reports as the honest-history evidence. Stop tracking
  _remotes/ (was re-included via a gitignore negation; regenerable by
  build_fixtures.py) and ignore regenerated raw/.

Approved brainstorm output: deterministic adherence scorer + a hand-confirmed
conventions.yml surfaced pre-write via the context-surface hook + a synthetic
A/B eval, sequenced measurement-first. No model on the critical path; no client
code, name, or description in the repo (eval runs on a committed synthetic
fixture). Next: implementation plan, scorer phase first.

8 TDD tasks (extraction -> 3 metrics -> report -> CLI -> clean/dirty gate ->
catalogue) with complete code per step. Executed via subagent-driven loop.

Adds convention_violation_nudge() to ai-first-nudge.sh: when a Write/Edit/MultiEdit
touches a .py file in a git repo that has conventions.yml, the hook runs
flag_blob_violations() against the new content and emits a non-blocking stderr
nudge naming any identifiers that drift from the declared casing or vague denylist.
Frequency-capped per file per day via conv-nudges-<date>.txt.
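The per-file, per-day cap can be sketched as follows. The real hook is a bash script, so this Python version only illustrates the logic: `should_nudge` is a hypothetical name, the ISO date format in the filename is an assumption, and `os.path.realpath` stands in for the hook's own path canonicalization.

```python
import datetime
import os
import tempfile


def should_nudge(state_dir: str, file_path: str, today: datetime.date) -> bool:
    """Return True the first time file_path is seen on a given day, False afterwards."""
    canonical = os.path.realpath(file_path)
    # Assumed filename layout: one log per day, one canonical path per line.
    log_path = os.path.join(state_dir, f"conv-nudges-{today.isoformat()}.txt")
    seen = set()
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as handle:
            seen = {line.rstrip("\n") for line in handle}
    if canonical in seen:
        return False
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(canonical + "\n")
    return True
```

Keying the log on the date means the cap resets on its own each day, with no cleanup step; canonicalizing the path before both the lookup and the append keeps two spellings of the same file from counting as different files.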

Key implementation note: blob is passed via BLOB_DATA env var rather than stdin
because `printf ... | python - <<'PY'` is ambiguous in bash — the heredoc wins
as the script source and stdin reads as empty. Env var sidesteps this cleanly.
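The same constraint can be reproduced from Python: when the interpreter reads its script from stdin (`python -`), stdin is no longer available for data, so the data has to travel another way. The sketch below is an illustration rather than the hook's code; `count_blob_lines` and the child script are hypothetical, and only the `BLOB_DATA` variable name comes from the note above.

```python
import os
import subprocess
import sys

# The child script arrives on stdin, as it would from a heredoc,
# so it reads the blob from the environment instead.
CHILD_SCRIPT = """
import os, sys
blob = os.environ.get("BLOB_DATA", "")
sys.stdout.write(str(len(blob.splitlines())))
"""


def count_blob_lines(blob: str) -> int:
    """Run a child interpreter with the script on stdin and the blob in BLOB_DATA."""
    env = dict(os.environ, BLOB_DATA=blob)
    completed = subprocess.run(
        [sys.executable, "-"],
        input=CHILD_SCRIPT,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout)
```

One trade-off of this approach is that environment variables have platform-dependent size limits and cannot contain NUL bytes, which is usually acceptable for a single edit blob but would not suit arbitrarily large inputs.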

Tests: 3 new @requires_git cases in test_hook_behavior.py (fires on violation,
silent on clean edit, silent when no conventions.yml). Full suite: 172/172 pass.
…ount

Adds skills/conventions-init/SKILL.md (15th skill): bootstraps conventions.yml
with inferred casing, seeded denylist, and scaffolded house-rules block.
Bumps skill count 14 -> 15 in README badge, install blurb, Codex copy line,
and smoke-test confirm line.
… fix stale skill count

- README: 14 -> 15 cross-tool skills (only stale instance)
- convention_violation_nudge: use canon_path() for both the grep cap-check
  and the printf append, matching per_dir_agents_md_nudge's approach
- tests: add test_nudge_fires_on_convention_violation_in_multiedit
- tests: add test_nudge_frequency_cap_silences_second_identical_edit

Filip-Podstavec merged commit 4645d18 into main on Jun 3, 2026
5 checks passed
Filip-Podstavec deleted the feat/adherence-scorer branch on June 3, 2026 09:14