Conversation
… book Phase 64 (FJ-773→FJ-780): 8/8 tickets Done — governance & audit intelligence. Phase 65 defined: operational readiness & deep analysis. Book updated with validate, graph, status Phase 64 examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…(2274→2292) New CLI flags: - validate --check-dependency-exists: verify depends_on targets exist - validate --check-path-conflicts-strict: detect same file path on same machine - graph --topological-sort: output valid execution order (Kahn's algorithm) - graph --critical-path-resources: show resources on longest chain - status --resource-apply-age: time since last apply per resource - status --machine-uptime: time since first apply per machine - status --resource-churn: apply frequency per resource from event log - apply --notify-slack-webhook: Slack webhook notification (arg wiring) 18 new tests (2274→2292), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… book Phase 65 (FJ-781→FJ-788): 8/8 tickets Done — operational readiness. Phase 66 defined: fleet intelligence & compliance. Book updated with validate, graph, status Phase 65 examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…2311) New CLI flags: - validate --check-duplicate-names: detect duplicate base names across groups - validate --check-resource-groups: verify resource groups are non-empty - graph --sink-resources: show resources with no dependents (leaf nodes) - graph --bipartite-check: check if dependency graph is bipartite (2-coloring) - status --last-drift-time: show timestamp of last drift per resource - status --machine-resource-count: show resource count per machine - status --convergence-score: weighted convergence score across fleet - apply --notify-telegram: Telegram notification (arg wiring) New file: status_fleet_detail.rs. 19 new tests (2292→2311), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… book Phase 66 (FJ-789→FJ-796): 8/8 tickets Done — fleet intelligence. Phase 67 defined: advanced graph analysis & monitoring. Book updated with validate, graph, status Phase 66 examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…J-804, 2329 tests) Validate: --check-orphan-resources (FJ-797), --check-machine-arch (FJ-801) Graph: --strongly-connected via Tarjan SCC (FJ-799), --dependency-matrix-csv (FJ-803) Status: --apply-success-rate (FJ-800), --error-rate (FJ-802), --fleet-health-summary (FJ-804) Split graph_export.rs → graph_advanced.rs to stay under 500-line limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…812, 2329→2350) Validate: --check-resource-health-conflicts (FJ-805), --check-resource-overlap (FJ-809) Status: --machine-convergence-history (FJ-806), --drift-history (FJ-810), --resource-failure-rate (FJ-812) Graph: --resource-weight (FJ-807), --dependency-depth-per-resource (FJ-811) Apply: Wire --notify-pagerduty into NotifyOpts with PagerDuty Events v2 API (FJ-808) Split validate_safety.rs -> validate_advanced.rs, tests_graph_core 1/2 -> core_6. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…50→2373) - validate --check-resource-tags (FJ-813): tag convention enforcement - status --machine-last-apply (FJ-814): last apply timestamp per machine - graph --resource-fanin (FJ-815): fan-in count per resource - apply --notify-discord-webhook (FJ-816): Discord rich embed notifications - validate --check-resource-state-consistency (FJ-817): state/type validation - status --fleet-drift-summary (FJ-818): aggregated drift across fleet - graph --isolated-subgraphs (FJ-819): disconnected subgraph detection - status --resource-apply-duration (FJ-820): avg apply duration per type - Split status_fleet_detail.rs → status_operational.rs (500-line limit) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…→2396) - validate --check-resource-dependencies-complete (FJ-821): dep target existence - status --machine-resource-health (FJ-822): per-machine health breakdown - graph --resource-dependency-chain (FJ-823): full chain from root to leaf - apply --notify-teams-webhook (FJ-824): MS Teams adaptive card notifications - validate --check-machine-connectivity (FJ-825): address format validation - status --fleet-convergence-trend (FJ-826): convergence % across fleet - graph --bottleneck-resources (FJ-827): high fan-in + fan-out detection - status --resource-state-distribution (FJ-828): state counts across fleet Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al paths (2396→2419) Validate: --check-resource-naming-pattern, --check-resource-provider-support Status: --machine-apply-count, --fleet-apply-history, --resource-hash-changes Graph: --critical-dependency-path, --resource-depth-histogram Apply: --notify-slack-blocks Split graph_advanced.rs → graph_paths.rs (FJ-823/827/831/835) to stay under 500-line limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nce times (2419→2442) Validate: --check-resource-secret-refs, --check-resource-idempotency-hints Status: --machine-uptime-estimate, --fleet-resource-type-breakdown, --resource-convergence-time Graph: --resource-coupling-score, --resource-change-frequency Apply: --notify-custom-template New status_insights.rs module. Split try_status_phase68 + try_status_phase71 helpers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 tickets: validate --check-resource-dependency-depth, --check-resource-machine-affinity, status --machine-drift-age, --fleet-failed-resources, --resource-dependency-health, graph --resource-impact-score, --resource-stability-score, apply --notify-custom-webhook. Split validate_advanced→validate_governance (500-line limit). Extract try_graph_paths helper (cognitive complexity). 2442→2463 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 tickets: validate --check-resource-drift-risk, --check-resource-tag-coverage, status --machine-resource-age-distribution, --fleet-convergence-velocity, --resource-failure-correlation, graph --resource-dependency-fanout, --resource-dependency-weight, apply --notify-custom-headers. Extract try_validate_governance helper. 2463→2484 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement 8 resource lifecycle & operational intelligence commands: - FJ-861: validate --check-resource-lifecycle-hooks - FJ-862: status --machine-resource-churn-rate - FJ-863: graph --resource-dependency-bottleneck - FJ-864: apply --notify-custom-json - FJ-865: validate --check-resource-provider-version - FJ-866: status --fleet-resource-staleness - FJ-867: graph --resource-type-clustering - FJ-868: status --machine-convergence-trend Split graph_paths→graph_scoring, status_insights→status_predictive. 2507 tests pass, all commands dogfooded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement 8 capacity planning & configuration analytics commands: - FJ-869: validate --check-resource-naming-convention - FJ-870: status --machine-capacity-utilization - FJ-871: graph --resource-dependency-cycle-risk - FJ-872: apply --notify-custom-filter - FJ-873: validate --check-resource-idempotency - FJ-874: status --fleet-configuration-entropy - FJ-875: graph --resource-impact-radius - FJ-876: status --machine-resource-freshness Extract try_status_phase73, collect_type_entropy, flatten find_cycle_risks. 2530 tests pass, all commands dogfooded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 77 — Operational Maturity & Compliance Automation: - FJ-877: validate --check-resource-documentation - FJ-878: status --machine-error-budget - FJ-879: graph --resource-dependency-health-map - FJ-880: apply --notify-custom-retry - FJ-881: validate --check-resource-ownership - FJ-882: status --fleet-compliance-score - FJ-883: graph --resource-change-propagation - FJ-884: status --machine-mean-time-to-recovery 2553 tests pass. All commands dogfooded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 78 — Automation Intelligence & Fleet Optimization: - FJ-885: validate --check-resource-secret-exposure - FJ-886: status --machine-resource-dependency-health - FJ-887: graph --resource-dependency-depth-analysis - FJ-888: apply --notify-custom-transform - FJ-889: validate --check-resource-tag-standards - FJ-890: status --fleet-resource-type-health - FJ-891: graph --resource-dependency-fan-analysis - FJ-892: status --machine-resource-convergence-rate 2576 tests passing. Extracted validate_ownership.rs module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 79 — Security Hardening & Operational Insights: - FJ-893: validate --check-resource-privilege-escalation - FJ-894: status --machine-resource-failure-correlation - FJ-895: graph --resource-dependency-isolation-score - FJ-896: apply --notify-custom-batch - FJ-897: validate --check-resource-update-safety - FJ-898: status --fleet-resource-age-distribution - FJ-899: graph --resource-dependency-stability-score - FJ-900: status --machine-resource-rollback-readiness 2599 tests passing. Milestone: FJ-900 reached. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 80 — Operational Resilience & Configuration Intelligence: - FJ-901: validate --check-resource-cross-machine-consistency - FJ-902: status --machine-resource-health-trend - FJ-903: graph --resource-dependency-critical-path-length - FJ-904: apply --notify-custom-deduplicate - FJ-905: validate --check-resource-version-pinning - FJ-906: status --fleet-resource-drift-velocity - FJ-907: graph --resource-dependency-redundancy-score - FJ-908: status --machine-resource-apply-success-trend 2622 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Predictive Infrastructure Intelligence: dependency completeness validation, MTTR estimation, centrality scoring, state coverage, convergence forecasting, bridge detection, error budget forecasting, custom throttle notifications. 2645 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Infrastructure Insight & Configuration Maturity: rollback safety validation, dependency lag detection, clustering coefficient, custom aggregate notifications, config maturity scoring, fleet dependency lag, modularity scoring, config drift rate. 2668 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfig-merge (Refs PMAT-035) PMAT-041: Drift-aware deployment blocking (#21) — pre-apply drift check PMAT-042: --why change explanation (#106) — plan --why shows reasons PMAT-043: Convergence budget enforcement (#85) — policy.convergence_budget PMAT-044: Pre-apply state snapshots (#129) — policy.snapshot_generations PMAT-045: Reversibility classification (#130) — classify destroy actions PMAT-046: Config merge CLI (#121) — forjar config-merge 22 new tests, 7198 passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove [patch.crates-io] path overrides and /mnt/nvme-raid0 references. These break clean-room CI builds. Spec: sovereign-stack-protected-branch-strategy.md (Section 5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: remove hard-coded paths and patch overrides
These pre-existing workflows are superseded by the clean-room gate system (ci.yml). They fail due to path dependencies and run on GitHub-hosted runners, wasting CI minutes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use github.ref instead of github.sha so that multiple pushes to the same branch/PR correctly cancel stale CI runs rather than running in parallel with conflicting container names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bashrs SC1035 ("Missing space after 'in' keyword") triggers false
positives on `in` inside quoted strings (e.g., Docker image name
`jaegertracing/all-in-one:1.54`). This blocks sovereign-ai-cookbook
08-observability stack convergence in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bashrs SC1xxx (syntax) rules have false positives on generated scripts: - SC1035: `in` inside quoted strings (Docker image names) - SC1020: `]` in heredocs and template strings SC2xxx (semantic) rules are retained. The SC1xxx false positives will be fixed properly in bashrs; this unblocks sovereign-ai-cookbook CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto-formatted with cargo fmt (Rust 1.93.0). Prerequisite for unified CI lint gate. Co-authored-by: Noah Gift <noah@paiml.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Deploys unified CI template with: - Lint gate: fmt, clippy -D warnings, cargo deny, pmat quality-gate - CPU gates: Mode A (publish sim) + Mode B (source verify) - GPU gates: Mode C (conditional, CUDA repos only) - Deterministic: rust-toolchain.toml pin, cargo-nextest, sccache - Quality: pmat quality-gate --fail-on-violation Spec: docs/specifications/unified-ci-pipeline.md Generated by deploy-unified-ci.sh Co-authored-by: Noah Gift <noah@paiml.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate #[allow(clippy::too_many_arguments)] on cmd_plan. Replace indexed loops with iterator patterns in graph_advanced, graph_export, and staleness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract helpers from resolve_resource_templates, read_conda_zip, parse_resolved_version, and 8 other functions. Reword Design: comments to remove SATD patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Relative path assertions (benches/, src/) fail if working directory
differs from manifest dir. Use env!("CARGO_MANIFEST_DIR") to resolve
paths absolutely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bootstrap merge — clean-room gate workflow deployment. Generated by machines/clean-room/deploy-workflows.sh Spec: sovereign-stack-protected-branch-strategy.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When nvidia-smi works (driver present), accept it regardless of version mismatch. Inside --gpus-all containers (Lambda Labs, RunPod), the host driver is passed through and cannot be changed via apt. Previously, a version mismatch (e.g. host=535, requested=550) would attempt apt-get install nvidia-driver-550, which fails on vendor images. check_script: reports match whenever nvidia-smi is functional apply_script: prints NOTICE on mismatch instead of apt-get install Refactored apply_script_nvidia into smaller helpers to reduce cognitive complexity below pre-commit threshold. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
Mar 6, 2026
…drift forensics (Refs PMAT-038) - #124 Stack diff: `forjar stack-diff` compares resources/machines/params/outputs between configs - #37 Security scanner: 10-rule IaC scanner (SS-1 through SS-10) with `forjar security-scan` CLI - #35 Policy-as-code: `policy.security_gate` blocks apply on findings above severity threshold - #20 Drift forensics: `operator` and `config_hash` fields on ApplyStarted events for attribution - Book: security scanning section with rule table and policy gate examples - Score: 98 → 101/166 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
Mar 20, 2026
…drift forensics (Refs PMAT-038) - #124 Stack diff: `forjar stack-diff` compares resources/machines/params/outputs between configs - #37 Security scanner: 10-rule IaC scanner (SS-1 through SS-10) with `forjar security-scan` CLI - #35 Policy-as-code: `policy.security_gate` blocks apply on findings above severity threshold - #20 Drift forensics: `operator` and `config_hash` fields on ApplyStarted events for attribution - Book: security scanning section with rule table and policy gate examples - Score: 98 → 101/166 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8cf6817 to
f100dab
Compare
noahgift
added a commit
that referenced
this pull request
Mar 21, 2026
…drift forensics (Refs PMAT-038) - #124 Stack diff: `forjar stack-diff` compares resources/machines/params/outputs between configs - #37 Security scanner: 10-rule IaC scanner (SS-1 through SS-10) with `forjar security-scan` CLI - #35 Policy-as-code: `policy.security_gate` blocks apply on findings above severity threshold - #20 Drift forensics: `operator` and `config_hash` fields on ApplyStarted events for attribution - Book: security scanning section with rule table and policy gate examples - Score: 98 → 101/166 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
nvidia-smiworks, accept the installed driver regardless of version mismatch--gpus allcontainers (Lambda Labs, RunPod), the host driver is passed through and cannot be changed via aptapt-get install nvidia-driver-550, which fails on vendor imagescheck_script: reportsmatchwhenevernvidia-smiis functionalapply_script: printsNOTICEon mismatch instead ofapt-get installapply_script_nvidiainto smaller helpers to reduce cognitive complexityRefs FJ-1009
Test plan
🤖 Generated with Claude Code