M7: surface unclassifiedFallback in audit + validate/doctor lint #314

Merged
LanNguyenSi merged 3 commits into master from fix/m7-risk-gate-unclassified-audit-validate-lint on Jun 27, 2026

Conversation

@LanNguyenSi

Owner

What

M7 (discovery 2026-06-10): surface the Risk Gate's unclassifiedFallback signal end to end, and lint the risk-without-env-scope footgun.

The when: evaluator already computed unclassifiedFallback, which is true when a risk clause matched only because the action was unclassified (the "unknown is not safe" fail-closed path), but the flag surfaced only in explain-policy. An operator reviewing a deny therefore could not tell a genuine critical-severity match from a fail-closed match on an unclassified command.
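The distinction described above can be sketched as follows. This is a minimal illustration, not the harness's evaluator: the types ActionRisk and RiskClauseResult, the severity ordering, and the function evalSeverityAtLeast are all hypothetical names introduced here. Only the clause name risk.severity_at_least and the unclassifiedFallback flag come from this PR.

```typescript
// Hypothetical shapes for illustration; the real harness types may differ.
type Severity = "low" | "medium" | "high" | "critical";

interface ActionRisk {
  // undefined models an action the classifier did not recognise.
  severity?: Severity;
}

interface RiskClauseResult {
  matched: boolean;
  // True only when the clause matched solely because the action was unclassified.
  unclassifiedFallback: boolean;
}

// Assumed ordering, lowest to highest.
const SEVERITY_ORDER: Severity[] = ["low", "medium", "high", "critical"];

// Sketch of a risk.severity_at_least arm with the "unknown is not safe" rule.
function evalSeverityAtLeast(risk: ActionRisk, threshold: Severity): RiskClauseResult {
  if (risk.severity === undefined) {
    // Fail closed: treat the unknown action as matching, and record why.
    return { matched: true, unclassifiedFallback: true };
  }
  const matched =
    SEVERITY_ORDER.indexOf(risk.severity) >= SEVERITY_ORDER.indexOf(threshold);
  // A classified action never sets the fallback flag, whether it matches or not.
  return { matched, unclassifiedFallback: false };
}
```

Under these assumptions, a critical action and an unclassified action both match a severity_at_least: critical clause, but only the second carries the flag. That difference is the signal this PR threads through to the audit surfaces.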

Changes

  • Audit visibility. PolicyDecision.whenUnclassifiedFallback is set when a match was fail-closed and is serialised into the policy_decision ledger row. Three surfaces render it: harness audit (an [unclassified-fallback] annotation on the table's reason column), harness audit --json (a field), and harness explain <policy> --trace [--json] (a trace projection field). The non-ux deny message appends a cause note before the "To satisfy" hint. The ux surface stays operator-curated and its message is unchanged; the flag still rides the audit record.
  • New validate lint checkPolicyRiskWithoutEnvScope: warns when a when: gates on risk.severity_at_least / risk.category_in / action.reversible without an environment.name scope, because those clauses fail closed and therefore match every unclassified command in every environment.
  • doctor parity. doctor now delegates to the same shared check. This also fixes a stale doctor walker and comment that wrongly excluded action.reversible; when-eval treats it like the other arms.
  • docs/risk-gate.md: deny-unclassified-in-production example.
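The lint rule in the list above can be approximated in a few lines. This is a sketch under stated assumptions: it models when: as a nested object, and the names PolicySketch, WhenSketch, LintWarning, and checkRiskWithoutEnvScopeSketch are hypothetical. The real checkPolicyRiskWithoutEnvScope, its input schema, and its warning format may differ; only the four clause names are taken from this PR.

```typescript
// Hypothetical policy shape; the real schema lives in the harness and may differ.
interface WhenSketch {
  risk?: { severity_at_least?: string; category_in?: string[] };
  action?: { reversible?: boolean };
  environment?: { name?: string };
}

interface PolicySketch {
  name: string;
  when?: WhenSketch;
}

interface LintWarning {
  policy: string;
  message: string;
}

// Warn on policies that gate on a fail-closed clause without an environment scope.
function checkRiskWithoutEnvScopeSketch(policies: PolicySketch[]): LintWarning[] {
  const warnings: LintWarning[] = [];
  for (const policy of policies) {
    const when = policy.when;
    if (when === undefined) continue;

    // All three arms fail closed on unclassified actions, so all three count.
    const gatesOnRisk =
      when.risk?.severity_at_least !== undefined ||
      when.risk?.category_in !== undefined ||
      when.action?.reversible !== undefined;
    const hasEnvScope = when.environment?.name !== undefined;

    if (gatesOnRisk && !hasEnvScope) {
      warnings.push({
        policy: policy.name,
        message:
          "risk clause without environment.name scope matches every unclassified command in every environment",
      });
    }
  }
  return warnings;
}
```

Keeping the check in one shared function is what lets validate and doctor stay in parity: both call the same code, so an arm such as action.reversible cannot be covered by one and missed by the other.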

Verification

  • build plus full suite green (2550 passed, 1 skipped).
  • New tests: intercept (flag set/absent, ux carve-out), ledger-record (encode/decode round-trip), validate (3 positive, 3 negative), doctor (action.reversible parity), audit (json plus table render), explain (--trace --json). All mutation-validated.
  • Two independent reviewer passes; all findings (MEDIUM render-gap, MEDIUM audit test-gap, LOWs) addressed.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)

nguyen-si-pp and others added 3 commits June 27, 2026 16:28
…tor lint

M7 (discovery 2026-06-10). The Risk Gate now distinguishes a genuine
classification hit from a fail-closed "unknown is not safe" match:

- PolicyDecision and the serialised audit row carry an optional
  whenUnclassifiedFallback flag; the neutral deny message appends a note
  when a block was caused by an unclassified command rather than a real
  risk classification. The ux surface is unchanged (operator-curated).
- New validate lint checkPolicyRiskWithoutEnvScope warns when a policy
  gates on risk.severity_at_least / risk.category_in / action.reversible
  without an environment.name scope (those clauses fail-closed to match
  every unclassified command in every environment).
- doctor delegates to the same check so doctor and validate stay in
  parity; this also corrects a stale doctor comment + walker that wrongly
  excluded action.reversible (when-eval treats it like the other arms).
- docs/risk-gate.md gains a deny-unclassified-in-production example.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address review findings on the M7 change:

- explain --trace [--json] and harness audit now surface the
  whenUnclassifiedFallback flag (TraceProjection field; AuditDecisionRow
  field + [unclassified-fallback] annotation on the reason column). The
  flag was persisted but no read surface rendered it, so a ux-block deny
  had no operator-visible fail-closed signal; CHANGELOG/docs now name the
  three render surfaces accurately instead of overclaiming.
- intercept: build whenFallbackMap in an explicit loop instead of as a
  side effect inside the .filter() predicate (a future filter refactor
  could otherwise silently drop the audit flag).
- intercept: place the fail-closed note before the "To satisfy" hint so
  the cause precedes the remedy in the deny message.
- tests: explain --trace --json renders/omits the flag; ux-path carve-out
  (flag on the decision, clause NOT in the agent-facing reason); a direct
  payloadFromDecision->encode->decode round-trip.
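The whenFallbackMap refactor in the notes above can be illustrated as follows. This is a sketch only: PolicyEvalSketch and collectMatches are hypothetical names, and the real intercept code is not shown in this PR page. It contrasts with the earlier approach of populating the map as a side effect inside the .filter() predicate.

```typescript
// Hypothetical per-policy evaluation result.
interface PolicyEvalSketch {
  policyName: string;
  matched: boolean;
  unclassifiedFallback: boolean;
}

function collectMatches(results: PolicyEvalSketch[]): {
  matched: PolicyEvalSketch[];
  whenFallbackMap: Map<string, boolean>;
} {
  // The predicate stays pure: it only decides membership.
  const matched = results.filter((r) => r.matched);

  // The audit flag is collected in its own explicit loop, so rewriting the
  // filter above (for example into a reduce or an early return) cannot
  // silently drop it.
  const whenFallbackMap = new Map<string, boolean>();
  for (const r of matched) {
    if (r.unclassifiedFallback) {
      whenFallbackMap.set(r.policyName, true);
    }
  }
  return { matched, whenFallbackMap };
}
```

The map is keyed by policy name here, which assumes names are unique across policies.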

Refs: harness-discovery-2026-06-10/M7 (b519df5c)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tor parity

Final review notes:

- Add two audit tests (harness audit --json field, table reason-column
  [unclassified-fallback] annotation), each with a classified negative
  control. The audit render surface was advertised in CHANGELOG/docs but
  had no mutation-test guard; reverting either audit.ts render line now
  fails a test.
- CHANGELOG: note that harness doctor delegates to the same shared check
  and now also warns on action.reversible-unscoped policies (doctor
  previously excluded it on an incorrect fail-close assumption).
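The two audit render paths and their negative controls, as described in the notes above, might look roughly like this. The row shape AuditRowSketch and the helpers renderReasonCell and toJsonRow are assumptions made for illustration; the actual audit.ts code and the exact placement of the annotation within the reason column are not shown in this PR page.

```typescript
// Hypothetical audit row; only the flag name and the annotation text come from the PR.
interface AuditRowSketch {
  policy: string;
  decision: "allow" | "deny";
  reason: string;
  whenUnclassifiedFallback?: boolean;
}

// Table surface: annotate the reason column when the match was fail-closed.
function renderReasonCell(row: AuditRowSketch): string {
  return row.whenUnclassifiedFallback === true
    ? `${row.reason} [unclassified-fallback]`
    : row.reason;
}

// JSON surface: emit the field only when it is set, so classified rows are unchanged.
function toJsonRow(row: AuditRowSketch): Record<string, unknown> {
  const out: Record<string, unknown> = {
    policy: row.policy,
    decision: row.decision,
    reason: row.reason,
  };
  if (row.whenUnclassifiedFallback === true) {
    out.whenUnclassifiedFallback = true;
  }
  return out;
}
```

Pairing each positive case with a classified row is what gives the tests their mutation-guard value: deleting either render branch makes the flagged case fail, while the control case catches a render that annotates every row.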

The remaining LOW (whenFallbackMap keyed by policy.name) is consciously
accepted: policy-name uniqueness is enforced by schema/policies.ts.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LanNguyenSi LanNguyenSi merged commit 77050a5 into master Jun 27, 2026
1 check passed