Merged
2 changes: 2 additions & 0 deletions .github/workflows/action-validate.yml
@@ -30,6 +30,8 @@ jobs:
      - uses: ./action
        with:
          target: examples/vulnerable-mcp-server/server.py
          # ci-trust forces fail-on-critical + min-score 70; smoke test only checks outputs.
          ci-trust: "false"
          fail-on-critical: "false"

      - name: Upload SARIF for GitHub Code Scanning
19 changes: 19 additions & 0 deletions .github/workflows/test-gate.yml
@@ -48,6 +48,25 @@ jobs:
      - run: uv run pytest -m integration tests/integration/
      - run: uv run python scripts/validate_trust_layer.py
        if: matrix.python-version == '3.12'
      - name: Fact coverage gate (enforce, 80%)
        if: matrix.python-version == '3.12'
        run: |
          uv run python - <<'PY'
          from mcts.core.config import ScanConfig
          from mcts.core.scanner import Scanner
          from mcts.reporting.evidence_provenance import fact_coverage

          report = Scanner(
              ScanConfig(
                  target="examples/vulnerable-mcp-server/server.py",
                  findings_trust_mode="enforce",
              )
          ).run()
          fc = fact_coverage(report.findings)
          pct = float(fc.get("pct", 0))
          assert pct >= 80.0, fc
          print(f"fact_coverage pct={pct}")
          PY
      - run: |
          uv run python - <<'PY'
          from mcts.testing.regression_harness import REGRESSION_THRESHOLD, REGRESSION_TECHNIQUES, evaluate_technique
1 change: 1 addition & 0 deletions .mcts/policy.yaml.example
@@ -23,6 +23,7 @@ findings_trust_mode: enforce
# min_security_score: 50
# max_absolute_risk: 400
# max_risk_level: high
# max_worst_absolute_risk: 500
# min_category_score_v2:
#   injection: 80

20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -30,6 +30,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Optional analyzer skip rows** — `npm_audit`, `yara_metadata`, `cloud_inspect`, `llm_judge`, `llm_metadata_triage`, `virustotal` emit hygiene findings when deps/keys missing
- **`--ignore-policy` on pentest/readiness** — auxiliary CLI paths can opt out of policy merge
- **GitHub Action** — `max-high`, `max-critical`, and `ignore-policy` inputs
- **SARIF coverage filter** — compliance `finding_kind=coverage` rows excluded from SARIF by default (`include_coverage_findings=True` to export)
- **SARIF v2 metadata** — per-finding `mcts/v2RiskContribution` for top contributors; run-level `mcts/v2TopContributors`
- **Hygiene bronze facts** — readiness, live/static discovery meta, and protocol probe rows emit bronze `evidence.facts`
- **Compliance trust validation** — compliance meta-findings pass through `validate_findings()` when trust is active
- **Fact coverage CI gate** — enforce scans must meet ≥50% structured-fact coverage (ramp toward 80%)
- **JSON truncation** — `max_json_findings` on `ScanConfig` truncates JSON export with scan note
- **MCP IDE scan params** — `scan_mcp_target` accepts `scoring_mode`, trust mode, and v2 gate thresholds
- **HTML letter grade** — dashboard grade uses v2 `security_score` when present
- **GitHub Action default** — `ci-trust` defaults to `true` (display-aligned CI gates)
- **Auxiliary v2 gates** — `build_gate_scan_report()` computes `score_v2` when v2 YAML/CLI gates are set
- **Bronze facts completion** — compliance, readiness OPA/LLM judge via `build_hygiene_finding`
- **CLI `--max-json-findings`** — truncates JSON export with scan note
- **Readiness JSON** — exports `scoring_mode`, `score_v2_note`, and v2 snapshots when scoring is v2/both
- **Vet v2 snapshot** — `scan_score_snapshot` in vet JSON from synthetic gate scoring
- **Fleet `max_worst_absolute_risk`** — machine-wide and inventory `--scan-all` gate
- **Bronze counterfactual (R17 partial)** — analyzer findings with facts get counterfactual under trust
- **fact_coverage CI gate** — raised to **80%** on enforce scans
- **v2 gauge chart** — uses `security_score` when v2 benchmark is available
- **Terminal v2-first** — when `scoring_version=both`, Absolute Risk / Security Score appear first
- **MCP IDE** — `min_category_score_v2` comma gates on `scan_mcp_target`
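
The SARIF coverage filter entry in the list above amounts to dropping rows whose `finding_kind` is `coverage` unless the caller opts in. A minimal sketch of that behavior, assuming findings are plain dicts: `select_sarif_findings` is a hypothetical helper and not part of MCTS; only the `finding_kind` value and the `include_coverage_findings` flag name come from this PR.

```python
from typing import Any


def select_sarif_findings(
    findings: list[dict[str, Any]],
    include_coverage_findings: bool = False,
) -> list[dict[str, Any]]:
    """Drop compliance coverage rows unless explicitly requested (illustrative only)."""
    if include_coverage_findings:
        return list(findings)
    return [f for f in findings if f.get("finding_kind") != "coverage"]


# Sample rows are made up for the illustration.
rows = [
    {"id": "sqli-1", "analyzer": "sast", "finding_kind": None},
    {"id": "owasp-map", "analyzer": "compliance", "finding_kind": "coverage"},
]
print([f["id"] for f in select_sarif_findings(rows)])
print([f["id"] for f in select_sarif_findings(rows, include_coverage_findings=True)])
```

With the default flag only `sqli-1` survives; passing `include_coverage_findings=True` keeps both rows.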

### Fixed

2 changes: 2 additions & 0 deletions README.md
@@ -227,6 +227,8 @@ mcts scan . -o report.sarif --format sarif

Gate cheat sheet: [scoring guide](docs/reporting/scoring-guide.md#ci-gates--pick-one-strategy) · [CI integration](docs/platform/ci-integration.md) · [GitHub Action](action/README.md)

The GitHub Action defaults to `ci-trust: true` (display-aligned gates). Set `ci-trust: false` for legacy template-mode scans.

### Themes

```bash
2 changes: 1 addition & 1 deletion action/README.md
@@ -105,7 +105,7 @@ If the action lives in your repo under `action/`:
| `weights-profile` | `manual_v1` | v2 weights profile when `scoring` is `v2` or `both` |
| `assets-path` | — | Optional `.mcts/assets.yaml` for v2 asset-value overrides |
| `findings-trust-mode` | `off` | Trust layer: `off`, `warn`, or `enforce` |
-| `ci-trust` | `false` | Shorthand for enforce + aligned gates (same as `mcts --ci-trust`) |
+| `ci-trust` | `true` | Shorthand for enforce + aligned gates (same as `mcts --ci-trust`). Set `false` for template-mode scans. |
| `fail-on-priority-min` | — | Fail when any finding priority_score ≥ threshold (enforce only) |
| `min-evidence-strength` | — | With priority gate: minimum evidence strength |
| `max-high` | — | Fail when high findings exceed count (display under enforce) |
2 changes: 1 addition & 1 deletion action/action.yml
@@ -65,7 +65,7 @@ inputs:
     description: >
       Shorthand for findings-trust-mode enforce with fail-on-critical (same as mcts --ci-trust).
     required: false
-    default: "false"
+    default: "true"
   fail-on-priority-min:
     description: >
       Fail when any security finding priority_score is at or above this value (0-100).
17 changes: 15 additions & 2 deletions docs/platform/ci-integration.md
@@ -101,7 +101,7 @@ Full reference: [action/README.md](../../action/README.md)
| `max-risk-level` | — | v2 band gate (`low` … `critical`) |
| `min-category-score-v2` | — | Comma-separated `category:min` for v2 OWASP tiles |
| `findings-trust-mode` | `off` | Trust layer: `off`, `warn`, or `enforce` (prefer `enforce` / `ci-trust` for CI) |
-| `ci-trust` | `false` | Shorthand: enforce + aligned gates (same as `mcts --ci-trust`) |
+| `ci-trust` | `true` | Shorthand: enforce + aligned gates (same as `mcts --ci-trust`). Set `false` for template-mode scans. |
| `fail-on-priority-min` | — | Fail when priority ≥ threshold (**enforce** only) |
| `min-evidence-strength` | — | Optional filter for priority gate |
| `extras` | `mcp,sast` | Optional extras to install (`all` for full set) |
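
The `min-category-score-v2` row above takes a comma-separated `category:min` string. How such a value could be turned into per-category thresholds is sketched below. This is an assumption about the format's handling, not MCTS code: `parse_category_minimums` is a hypothetical name, and the `auth` category is invented for the example (`injection` appears in `.mcts/policy.yaml.example`).

```python
def parse_category_minimums(raw: str) -> dict[str, float]:
    """Parse 'category:min' pairs separated by commas (illustrative only)."""
    result: dict[str, float] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        category, sep, minimum = item.partition(":")
        if not sep or not category.strip():
            raise ValueError(f"expected 'category:min', got {item!r}")
        result[category.strip()] = float(minimum)
    return result


print(parse_category_minimums("injection:80, auth:70"))
```

Rejecting malformed pairs early keeps a typo in a workflow input from silently disabling a gate.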
@@ -191,7 +191,7 @@ mcts scan ./server.py \
--min-score 70
```

-GitHub Action:
+GitHub Action (default `ci-trust: true`; set `false` to opt out):

```yaml
- uses: MCP-Audit/MCTS@v1
@@ -377,6 +377,19 @@ Pair MCTS gates with required CI checks on `main`. See [CONTRIBUTING.md](../../C
| HTML artifacts | Self-contained; no exfiltration, but contains full scan |
| Secrets in repos | MCTS may flag secrets in scanned source — rotate if leaked in CI logs |

### Fleet gates (`--machine-wide`, inventory `--scan-all`)

Per-server gates run via `collect_gate_violations()`. Fleet-wide v2 cap:

```bash
mcts scan --machine-wide --scoring both --max-worst-absolute-risk 500
mcts inventory --scan-all --scoring both --max-worst-absolute-risk 500
```

YAML: `max_worst_absolute_risk` in `.mcts/policy.yaml` (see `.mcts/policy.yaml.example`).

**Dual exit heuristic:** If no explicit gate fires, machine-wide and inventory scan-all may still exit 1 when any server has critical/high display counts (or v2 `risk_level` high/critical). Prefer explicit `--max-critical` / `--max-worst-absolute-risk` for predictable CI.
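
The interaction between the explicit fleet cap and the fallback heuristic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MCTS implementation: `ServerResult`, `fleet_exit_code`, and their fields are hypothetical names, and the cap is assumed to fail only when the worst server's absolute risk is strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerResult:
    """Hypothetical per-server summary; field names are illustrative only."""
    name: str
    absolute_risk: float
    critical: int = 0
    high: int = 0
    risk_level: str = "low"


def fleet_exit_code(
    results: list[ServerResult],
    max_worst_absolute_risk: Optional[float] = None,
) -> int:
    """Return 1 if the fleet should fail CI, else 0 (assumed semantics)."""
    worst = max((r.absolute_risk for r in results), default=0.0)
    # Explicit gate: the worst server's absolute risk must not exceed the cap.
    if max_worst_absolute_risk is not None and worst > max_worst_absolute_risk:
        return 1
    # Fallback heuristic: any critical/high count or a high/critical v2 band.
    for r in results:
        if r.critical or r.high or r.risk_level in {"high", "critical"}:
            return 1
    return 0


fleet = [ServerResult("a", 120.0), ServerResult("b", 610.0, high=2, risk_level="high")]
print(fleet_exit_code(fleet, max_worst_absolute_risk=500))
print(fleet_exit_code([ServerResult("a", 120.0)], max_worst_absolute_risk=500))
```

In this sketch a server under the cap can still fail the run through the fallback branch, which is why the explicit gates are the more predictable choice in CI.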

---

## Planned CI capabilities
9 changes: 6 additions & 3 deletions docs/reporting/findings-trust-phase0.md
@@ -508,7 +508,7 @@ All mature analyzers including optional/metadata-heavy paths (`npm_audit`, `vuln

### Still raw `Finding()` outside analyzers

-Readiness, compliance, and probe/discovery helpers still construct raw `Finding()` rows. Compliance meta-findings set `finding_kind=coverage` and receive `rule_stability` after the trust pipeline (excluded from priority/bronze security gates). **`fuzz/classifier.py` is migrated** — fuzz findings emit bronze `evidence.facts` via `build_analyzer_finding`. Deferred paths pass through `apply_trust_layer()` but do not emit bronze facts unless migrated. The bronze gate applies only to **`experimental`** analyzers when `--enforce-bronze-facts` is set.
+Compliance meta-findings use `build_hygiene_finding()` with `finding_kind=coverage` (bronze facts, excluded from security gates). Readiness heuristics, OPA, LLM judge, live/static discovery meta, and protocol probe emit bronze facts via `build_hygiene_finding`. **`fuzz/classifier.py` is migrated** — fuzz findings emit bronze `evidence.facts` via `build_analyzer_finding`. The bronze gate applies only to **`experimental`** analyzers when `--enforce-bronze-facts` is set.

Vulnerable fixture under enforce: **100%** of security findings have `evidence.facts`; **3 display critical** remain (real issues, not overlap noise).
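
A coverage figure like the one above is a ratio: findings that carry structured facts over all findings considered. The sketch below is an assumption about that arithmetic, not the real `fact_coverage` from `mcts.reporting.evidence_provenance`. The dict-based finding shape, the `total` and `with_facts` keys, and reporting 100% for an empty list are all hypothetical; only the `pct` key is taken from the CI gate in this PR.

```python
from typing import Any


def fact_coverage_sketch(findings: list[dict[str, Any]]) -> dict[str, Any]:
    """Percentage of findings whose evidence has a non-empty `facts` list."""
    total = len(findings)
    with_facts = sum(1 for f in findings if f.get("evidence", {}).get("facts"))
    # Assumption: an empty scan counts as fully covered.
    pct = 100.0 * with_facts / total if total else 100.0
    return {"total": total, "with_facts": with_facts, "pct": round(pct, 1)}


# Sample findings are made up for the illustration.
findings = [
    {"id": "a", "evidence": {"facts": [{"rule_id": "r1", "match": "eval(", "field": "source"}]}},
    {"id": "b", "evidence": {"facts": []}},
    {"id": "c", "evidence": {}},
    {"id": "d", "evidence": {"facts": [{"rule_id": "r2", "match": "token", "field": "description"}]}},
]
fc = fact_coverage_sketch(findings)
print(fc)
```

Two of the four sample findings carry facts, so this sample would pass a 50% gate and fail the 80% gate.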

@@ -554,7 +554,7 @@ When `findings_trust_mode=enforce`, v2 scoring reads **display** severity for:

`finding.severity` (template) is **unchanged** — `RiskScoringEngineV2.verify()` still passes.

-Corpus Spearman recalibration is **deferred** until a maintainer run confirms score drift.
+Corpus Spearman gate passes at ρ=0.955 (maintainer `--write-package-stats` optional).

---

@@ -565,8 +565,11 @@ Shipped in-tree:
- Shared `apply_trust_layer()` for scan, fuzz, and inventory entry points
- Bronze CI gate (`--enforce-bronze-facts`) for experimental analyzers without `evidence.facts` (**enforce only**)
- All `src/mcts/analyzers/` paths on `FindingBuilder` / bronze facts
- SARIF excludes `finding_kind=coverage` by default (`build_sarif(..., include_coverage_findings=True)` to export)
- GitHub Action `ci-trust` defaults to `true`
- Hygiene bronze facts on readiness / live/static discovery / protocol probe paths

-**Next (product / soak):** flip GitHub Action default to `--ci-trust` after opt-in period; corpus Spearman recalibration post-B2; optional bronze migration for fuzz/readiness/compliance paths.
+**Next:** persona tabs; counterfactual on inferrer-only paths without bronze facts; ramp corpus QA.

### Gap fixes (pre-Phase 3)

10 changes: 10 additions & 0 deletions scripts/validate_trust_layer.py
@@ -321,6 +321,16 @@ def main() -> int:
    else:
        check("vulnerable v2 verify", True)

    from mcts.reporting.evidence_provenance import fact_coverage

    fc = fact_coverage(vuln.findings)
    check("fact_coverage pct >= 80", fc.get("pct", 0) >= 80.0, str(fc))
    vuln_sarif = build_sarif(vuln)
    comp_results = [
        r for r in vuln_sarif["runs"][0]["results"] if r.get("properties", {}).get("analyzer") == "compliance"
    ]
    check("SARIF excludes compliance coverage rows", len(comp_results) == 0)

    print(f"\n=== {len(FAILURES)} failure(s) ===")
    for f in FAILURES:
        print(f" - {f}")
4 changes: 1 addition & 3 deletions src/mcts/analyzers/embedding_secrets.py
@@ -73,9 +73,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]:
     finding_id="embedding-secrets-semantic-skipped",
     analyzer="embedding_secrets",
     title="Semantic credential detection skipped",
-    description=(
-        "Semantic embedding model unavailable; only regex and phrase fallback ran."
-    ),
+    description=("Semantic embedding model unavailable; only regex and phrase fallback ran."),
     recommendation=(
         "Install sentence-transformers and model weights, or disable semantic_secrets."
     ),
42 changes: 42 additions & 0 deletions src/mcts/analyzers/finding_facts.py
@@ -55,6 +55,48 @@ def build_analyzer_finding(
    return builder.build()


def build_hygiene_finding(
    *,
    finding_id: str,
    analyzer: str,
    title: str,
    description: str,
    severity: Severity,
    recommendation: str,
    rule_id: str,
    match: str,
    field: str,
    tool: str | None = None,
    technique_id: str | None = None,
    confidence: float = 0.7,
    extra_evidence: dict[str, Any] | None = None,
    finding_kind: str | None = None,
) -> Finding:
    """Hygiene/readiness/meta row with bronze facts (R6 migration path)."""
    builder = FindingBuilder(
        finding_id=finding_id,
        analyzer=analyzer,
        title=title,
        description=description,
        severity=severity,
        recommendation=recommendation,
    ).confidence(confidence)
    if tool:
        builder = builder.tool(tool)
    if technique_id:
        builder = builder.technique(technique_id)
    fact_kwargs: dict[str, Any] = {"rule_id": rule_id, "match": match, "field": field}
    if tool:
        fact_kwargs["tool"] = tool
    builder = builder.fact(**fact_kwargs)
    if extra_evidence:
        builder = builder.evidence(**extra_evidence)
    row = builder.build()
    if finding_kind:
        return row.model_copy(update={"finding_kind": finding_kind})
    return row


def build_skip_finding(
    *,
    finding_id: str,