M7: surface unclassifiedFallback in audit + validate/doctor lint #314
Merged
LanNguyenSi merged 3 commits on Jun 27, 2026
…tor lint

M7 (discovery 2026-06-10). The Risk Gate now distinguishes a genuine classification hit from a fail-closed "unknown is not safe" match:

- PolicyDecision and the serialised audit row carry an optional whenUnclassifiedFallback flag; the neutral deny message appends a note when a block was caused by an unclassified command rather than a real risk classification. The ux surface is unchanged (operator-curated).
- New validate lint checkPolicyRiskWithoutEnvScope warns when a policy gates on risk.severity_at_least / risk.category_in / action.reversible without an environment.name scope (those clauses fail-closed to match every unclassified command in every environment).
- doctor delegates to the same check so doctor and validate stay in parity; this also corrects a stale doctor comment + walker that wrongly excluded action.reversible (when-eval treats it like the other arms).
- docs/risk-gate.md gains a deny-unclassified-in-production example.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
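The distinction between a genuine classification hit and a fail-closed match can be illustrated with a minimal sketch. The type names, the severity scale, and the function `evalSeverityAtLeast` are assumptions made for illustration; the real when-eval code is not shown on this page and may differ.

```typescript
// Hypothetical shapes: the real harness types are not shown in this PR page.
type Severity = "low" | "medium" | "high" | "critical"; // assumed scale

interface Classification {
  severity: Severity;
}

interface RiskMatch {
  matched: boolean;
  // True only when the clause matched because no classification existed.
  unclassifiedFallback: boolean;
}

const SEVERITY_ORDER: Severity[] = ["low", "medium", "high", "critical"];

// Evaluate a risk.severity_at_least style clause. An unclassified action
// (classification === undefined) fails closed: it matches, and the result
// records that the match was a fallback rather than a real classification hit.
function evalSeverityAtLeast(
  threshold: Severity,
  classification: Classification | undefined,
): RiskMatch {
  if (classification === undefined) {
    return { matched: true, unclassifiedFallback: true };
  }
  const matched =
    SEVERITY_ORDER.indexOf(classification.severity) >=
    SEVERITY_ORDER.indexOf(threshold);
  return { matched, unclassifiedFallback: false };
}

const genuine = evalSeverityAtLeast("high", { severity: "critical" });
const failClosed = evalSeverityAtLeast("high", undefined);
const noMatch = evalSeverityAtLeast("high", { severity: "low" });
```

Both `genuine` and `failClosed` report a match, which is why a separate flag is needed for an operator to tell them apart.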
Address review findings on the M7 change:

- explain --trace [--json] and harness audit now surface the whenUnclassifiedFallback flag (TraceProjection field; AuditDecisionRow field + [unclassified-fallback] annotation on the reason column). The flag was persisted but no read surface rendered it, so a ux-block deny had no operator-visible fail-closed signal; CHANGELOG/docs now name the three render surfaces accurately instead of overclaiming.
- intercept: build whenFallbackMap in an explicit loop instead of as a side effect inside the .filter() predicate (a future filter refactor could otherwise silently drop the audit flag).
- intercept: place the fail-closed note before the "To satisfy" hint so the cause precedes the remedy in the deny message.
- tests: explain --trace --json renders/omits the flag; ux-path carve-out (flag on the decision, clause NOT in the agent-facing reason); a direct payloadFromDecision->encode->decode round-trip.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
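The two intercept changes above (an explicit loop for whenFallbackMap, and cause before remedy in the deny message) might look roughly like this. Only the name whenFallbackMap and the "To satisfy" hint come from the commit message; `matchPolicies`, `buildDenyMessage`, the type shapes, and the wording of the note are hypothetical.

```typescript
// Hypothetical shapes, for illustration only.
interface Policy {
  name: string;
}

interface WhenResult {
  matched: boolean;
  unclassifiedFallback: boolean;
}

// Stand-in for the real when: evaluator.
type Evaluate = (policy: Policy) => WhenResult;

// Collect matches and the fallback flag in one explicit loop, so the map is
// not a hidden side effect of a .filter() predicate.
function matchPolicies(policies: Policy[], evaluate: Evaluate) {
  const matched: Policy[] = [];
  // Keyed by policy name; the commit message says uniqueness is schema-enforced.
  const whenFallbackMap = new Map<string, boolean>();
  for (const policy of policies) {
    const result = evaluate(policy);
    if (!result.matched) continue;
    matched.push(policy);
    whenFallbackMap.set(policy.name, result.unclassifiedFallback);
  }
  return { matched, whenFallbackMap };
}

// Put the fail-closed note (cause) before the "To satisfy" hint (remedy).
function buildDenyMessage(base: string, wasFallback: boolean, hint: string): string {
  const parts = [base];
  if (wasFallback) {
    // Assumed wording; the real note text is not shown in the source.
    parts.push("Note: this command was not classified, so the risk clause matched fail-closed.");
  }
  parts.push(`To satisfy: ${hint}`);
  return parts.join(" ");
}

const samplePolicies: Policy[] = [{ name: "deny-risky" }, { name: "allow-docs" }];
const result = matchPolicies(samplePolicies, (p) => ({
  matched: p.name === "deny-risky",
  unclassifiedFallback: p.name === "deny-risky",
}));
const message = buildDenyMessage(
  "Blocked by policy deny-risky.",
  result.whenFallbackMap.get("deny-risky") ?? false,
  "request operator approval",
);
```

Keeping the map write next to the `matched.push` call means a later change to the matching logic has to touch both lines together.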
…tor parity

Final review notes:

- Add two audit tests (harness audit --json field, table reason-column [unclassified-fallback] annotation), each with a classified negative control. The audit render surface was advertised in CHANGELOG/docs but had no mutation-test guard; reverting either audit.ts render line now fails a test.
- CHANGELOG: note that harness doctor delegates to the same shared check and now also warns on action.reversible-unscoped policies (doctor previously excluded it on an incorrect fail-close assumption).

The remaining LOW (whenFallbackMap keyed by policy.name) is consciously accepted: policy-name uniqueness is enforced by schema/policies.ts.

Refs: harness-discovery-2026-06-10/M7 (b519df5c)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
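A rough sketch of the two audit render surfaces and their negative controls follows. The AuditDecisionRow name, the whenUnclassifiedFallback field, and the [unclassified-fallback] annotation come from the commit messages; the other fields and the `renderReason` helper are assumptions, and the real audit.ts may be structured differently.

```typescript
// Only whenUnclassifiedFallback is named in the source; other fields are assumed.
interface AuditDecisionRow {
  policy: string;
  decision: "allow" | "deny";
  reason: string;
  whenUnclassifiedFallback?: boolean;
}

// Table surface: annotate the reason column when the match was fail-closed.
function renderReason(row: AuditDecisionRow): string {
  return row.whenUnclassifiedFallback
    ? `${row.reason} [unclassified-fallback]`
    : row.reason;
}

const fallbackRow: AuditDecisionRow = {
  policy: "deny-risky",
  decision: "deny",
  reason: "risk gate",
  whenUnclassifiedFallback: true,
};

// Negative control: a genuinely classified match leaves the flag unset.
const classifiedRow: AuditDecisionRow = {
  policy: "deny-risky",
  decision: "deny",
  reason: "risk gate",
};

// JSON surface: an unset optional field is simply absent after a round-trip.
const fallbackJson = JSON.parse(JSON.stringify(fallbackRow));
const classifiedJson = JSON.parse(JSON.stringify(classifiedRow));
```

Pairing each positive case with a classified control is what lets a test fail if either render line is reverted.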
What

M7 (discovery 2026-06-10): surface the Risk Gate's unclassifiedFallback signal end to end, and lint the risk-without-env-scope footgun.

The when: evaluator already computed unclassifiedFallback (true when a risk clause matched ONLY because the action was unclassified, the "unknown is not safe" fail-close), but it surfaced only in explain-policy. An operator reviewing a deny could not tell a genuine critical-severity match from a fail-closed unclassified command.

Changes

- PolicyDecision.whenUnclassifiedFallback is set when a match was fail-closed, serialised into the policy_decision ledger row, and rendered on three surfaces: harness audit (table annotation [unclassified-fallback]), harness audit --json (field), harness explain <policy> --trace [--json] (trace projection). The non-ux deny message appends a cause note before the "To satisfy" hint. The ux surface is left operator-curated; the flag still rides the audit record.
- checkPolicyRiskWithoutEnvScope: warns when a when: gates on risk.severity_at_least / risk.category_in / action.reversible without an environment.name scope (those clauses fail closed and match every unclassified command in every environment).
- doctor delegates to the same check so doctor and validate stay in parity; this also corrects a stale doctor comment and walker that wrongly excluded action.reversible (when-eval treats it like the other arms).
- docs/risk-gate.md: deny-unclassified-in-production example.

Verification
Refs: harness-discovery-2026-06-10/M7 (b519df5c)
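
The risk-without-env-scope lint described under Changes could be sketched as follows. This is not the project's checkPolicyRiskWithoutEnvScope: the function name `lintRiskWithoutEnvScope`, the flat dotted-key representation of a when: clause, and the warning text are all assumptions, and the real policy schema may nest these keys differently.

```typescript
// Assumed representation: a when: clause as a flat map of dotted keys.
type WhenClause = Record<string, unknown>;

interface Policy {
  name: string;
  when?: WhenClause;
}

// Clause keys that, per the PR description, fail closed on unclassified commands.
const RISK_KEYS = ["risk.severity_at_least", "risk.category_in", "action.reversible"];

// Warn for every policy that gates on a risk clause without an
// environment.name scope, since it would match unclassified commands everywhere.
function lintRiskWithoutEnvScope(policies: Policy[]): string[] {
  const warnings: string[] = [];
  for (const policy of policies) {
    const when: WhenClause = policy.when ?? {};
    const riskKeys = RISK_KEYS.filter((key) => key in when);
    if (riskKeys.length > 0 && !("environment.name" in when)) {
      warnings.push(
        `policy "${policy.name}": ${riskKeys.join(", ")} used without environment.name scope`,
      );
    }
  }
  return warnings;
}

const lintWarnings = lintRiskWithoutEnvScope([
  { name: "unscoped", when: { "risk.severity_at_least": "high" } },
  { name: "scoped", when: { "action.reversible": false, "environment.name": "production" } },
  { name: "no-risk", when: { "environment.name": "staging" } },
]);
```

Sharing one such function between validate and doctor is one way to keep the two commands in parity, as the PR describes.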