Skip to content

fix: stop suppressing native signals ruff doesn't actually emit (dogfooding)#22

Merged
maruwork merged 15 commits into
mainfrom
claude/dreamy-brattain-c0180a
Jun 21, 2026
Merged

fix: stop suppressing native signals ruff doesn't actually emit (dogfooding)#22
maruwork merged 15 commits into
mainfrom
claude/dreamy-brattain-c0180a

Conversation

@maruwork

Copy link
Copy Markdown
Owner

概要

ACI を ACI 自身にかけて(dogfooding)見つけた欠陥です。外部解析を有効にすると、ネイティブの検知が減っていました。

  • --profile full(ネイティブのみ):8件、うち CI02_LONG_FUNCTION 2件
  • 同じスキャン+外部解析:6件、long-function が消失

根本原因

ACI は ruff が「ready」というだけで、5つのネイティブシグナル(CI02_LONG_FUNCTION / CI03_TODO_HACK / CI18_PARAMETER_CLUSTER / CI21_BROAD_EXCEPTION_SWALLOW / CI26_RACE_HAZARD)を「ruff が covers する」前提で事前抑制していました。しかし ACI は ruff を既定ルールで起動しており、C901/PLR0915/PLR0913/BLE001/TD はいずれも既定では発火しません。結果、委譲先が何も出さないのにネイティブを抑制 →「解析を増やすと検知が減る」状態でした。

是正

正しい機構は事後の cross-lane dedup(_deduplicate_findings、既存・テスト済み)です。これは「外部レーンが実際に同じ (ci_id, file, line) を報告したときだけ」ネイティブを除去します。事前委譲(_delegated_native_signals / _RUFF_DELEGATED_SIGNALS)を撤去し、dedup に一本化しました。ネイティブは常に走り、ruff が実際に出したものだけが重複除去されます。

検証

  • dogfooding 再確認:自己スキャンの外部込み=ネイティブのみと一致(8件、CI02_LONG_FUNCTION 2件とも保持)。
  • 欠陥を「正解」として固定していたテストを、正しい挙動(ruff が沈黙ならネイティブ保持)に書き換え。
  • 全スイート 831 passed / 1 skipped、ruff・mypy クリーン。
  • CI_REFERENCE.md の「overlaps」記述を、ruff の設定次第である旨に精緻化。

🤖 Generated with Claude Code

saiganakato and others added 15 commits June 21, 2026 11:43
The native detector lane scanned bundled third-party code (pip/_vendor,
vendor/, third_party/, site-packages). On pip, 280 of 362 native findings
(77%) came from _vendor/ — code the project ships but does not own and
cannot fix.

Add _vendor, vendor, vendored, third_party, third-party, site-packages,
bower_components to DEFAULT_GENERATED_PATH_SEGMENTS. After the fix pip
reports 74 findings, 0 from _vendor, while its own code is still scanned.

This is the native twin of the eslint built-JS fix: both lanes were
scanning code that is not the project's own source. Locked by
test_vendored_third_party_code_is_not_scanned; sample-report mirrors and
ACI_EVIDENCE.md (§2i) updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…policy

§2h (eslint) and §2i (native) were the same defect — scanning code that is
not the project's own source — fixed twice because each lane kept its own
skip set and they had drifted apart. The borrowed-analyzer skip sets were
each missing ten segments the native policy already had, including every
vendored one (_vendor, vendor, vendored, third_party, site-packages,
bower_components).

The report already dropped these (external findings are bounded to the
native-discovered file set), but the borrowed tools still processed the
code: on a pip-scale _vendor/ tree, handing hundreds of bundled files to
mypy can exhaust the per-analyzer timeout and starve the project's own
code of findings.

Derive _PYTHON_ANALYZER_SKIP_SEGMENTS / _SEMGREP_SKIP_SEGMENTS /
_ESLINT_SKIP_SEGMENTS from DEFAULT_GENERATED_PATH_SEGMENTS plus lane-
specific extras, so they cannot drift again. Add a drift-guard test and a
vendored file-discovery test; ACI_EVIDENCE.md §2j records the root cause.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reviewing a branch that deletes files reports the deleted path in git's
changed set while it is gone from the working tree. Verified by a real
git rename+delete probe that the scan handles this gracefully (the
deleted file is simply absent from the rglob'd target set); this test
locks that guarantee so a future refactor cannot reintroduce a crash or a
phantom finding for a vanished file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The operations baseline is the backbone of the "only new issues" review,
and the source of three earlier defects (all from encoding a finding's
identity by line number). Verified by a real scan->baseline->edit->rescan
probe that the workflow is sound:

- existing findings keep their identity after an unrelated line shift
  (fingerprint match short-circuits the stale line in the baseline entry),
- a genuinely new finding is flagged new,
- a fixed finding is reported resolved.

This end-to-end test ties fingerprint stability to operations matching in
one guarantee, where the prior tests covered each half separately. The
waiver path (active-waiver -> accepted/accepted-residual) was probed too
and is already covered by existing tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…scan crash

ACI ingests JSON/SARIF from 13 external tools. The parse sites caught only
json.JSONDecodeError, so output that was well-formed but shape-shifted (a
renamed key, a null where a table was, a list that became an object -- the
ordinary result of a tool version bump) raised KeyError/TypeError/
AttributeError instead. That escaped the parse-failure path and aborted the
whole scan, taking the native findings and every other analyzer down with
one borrowed tool's drift.

Verified: raw ruff/eslint/trivy/osv output in a drifted shape, plus codeql
SARIF with a wrong-typed `runs` and a gitleaks report with a wrong-typed
field, all crashed before the fix.

Broaden the three parse sites (stdout JSON, gitleaks report, codeql SARIF)
to a shared _ANALYZER_PARSE_ERRORS set so any structural mismatch becomes a
recorded parse-failure for that one analyzer while the scan continues. One
definition, used at all three sites, so the lanes cannot drift apart.
Regression tests cover the drift cases and confirm valid output still parses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ACI's purpose includes making users aware that even its declared checks are
not 100% -- a clean result does not prove the absence of issues. That
disclosure already rides in the JSON report, but the GitHub PR summary, the
surface where a reviewer actually concludes "ACI passed, the code is clean,"
omitted it. A zero-finding pass is exactly where that false confidence is
highest, yet the summary said only "Gate: pass / Findings: 0."

Add the disclosure as a footer to build_github_summary_markdown, sourced
from the report's own detection_disclosure key (single source of truth), so
the honest bound travels to the point of use. Reports without the key still
render. This serves the "make the bound explicit, at the point of use" half
of ACI's purpose, which until now lived only in docs and the machine report.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… report

Adopting ACI on an existing codebase means accepting today's findings as
pre-existing so future scans surface only NEW ones. That baseline was
hand-authored TOML -- run a scan, read the JSON, copy each fingerprint by
hand. For the tool's central adoption workflow, the first step was manual
TOML editing, which is the opposite of "easy to use."

Add `aci emit-baseline --report report.json --output ops.toml`, following the
existing emit-sarif/emit-annotations/emit-github-summary pattern (report in,
artifact out, same --report-scope-class/--report-owner-lane filters). The
whole adoption loop is now two commands:

  aci scan --target . --output report.json
  aci emit-baseline --report report.json --output ops.toml
  aci scan --target . --operations-file ops.toml --fail-on-new-findings

Each entry is keyed by the finding's fingerprint (stable across line shifts),
never the line number -- encoding identity by line was the root of three
earlier defects. Output is sorted for a clean diff on regeneration, and the
TOML is escaped so paths with metacharacters still parse. Verified end to end:
generate -> load_operations_state -> re-scan marks findings existing-baseline.
QUICKSTART and CONFIGURATION updated to lead with the generated baseline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ooding)

Found by running ACI on its own source: enabling external analyzers DROPPED
native findings. `--profile full` (native only) reported 8 findings incl. 2
CI02_LONG_FUNCTION; the same scan with external analyzers reported 6, the
long-function findings gone.

Root cause: ACI pre-emptively suppressed five native signals
(CI02_LONG_FUNCTION, CI03_TODO_HACK, CI18_PARAMETER_CLUSTER,
CI21_BROAD_EXCEPTION_SWALLOW, CI26_RACE_HAZARD) whenever ruff was merely
*ready*, on the assumption ruff covered them. But ACI invokes ruff with its
default rule set, which emits none of C901/PLR0915/PLR0913/BLE001/TD — so the
delegation suppressed native coverage with nothing replacing it. Turning on
more analysis silently produced fewer findings.

The post-hoc cross-lane dedup (_deduplicate_findings) is the correct and
already-tested mechanism: it removes a native finding only where the external
lane actually reported the same (ci_id, file, line). Remove the pre-emptive
_delegated_native_signals / _RUFF_DELEGATED_SIGNALS machinery and rely on the
dedup. Native now always runs; ruff dedupes only what it genuinely emits.

Verified by dogfooding: ACI's self-scan with external analyzers now matches
native-only (8 findings incl. both CI02_LONG_FUNCTION). Rewrote the test that
encoded the buggy behavior; CI_REFERENCE wording tightened.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@maruwork maruwork merged commit c91f448 into main Jun 21, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant