**Complexity-Adaptive & Context-Aware Reasoning Fabric** - A research-grade architectural blueprint and decision-intelligence simulation for more reliable and transparent data-driven decision making and agentic AI systems. It combines complexity and context adaptability, causal inference, Bayesian methods for quantifying uncertainty and complexity, and epistemic awareness.
Modern AI systems often act as black boxes, producing confident-sounding outputs without clarifying their reasoning, certainty level, or the nature of the problem they're addressing. This creates a "trust gap": users cannot easily distinguish whether an AI answer is based on solid causal evidence, a probabilistic inference, or a simple guess.
CYNEPIC (CYNefin-EPIstemic Cockpit) solves this by enforcing epistemic awareness. The system explicitly classifies every query by its inherent complexity using the Cynefin Framework, then routes it to the appropriate analytical engine -- ensuring the right tool is used for the right problem.
| Problem Type | Analysis Method | Example |
|---|---|---|
| Clear (Obvious) | Rule lookup | "What is the capital of France?" |
| Complicated (Knowable) | Causal Inference (DoWhy) | "Does offering a discount reduce churn?" |
| Complex (Emergent) | Bayesian Inference (PyMC) | "What is the likely conversion rate?" |
| Chaotic (Crisis) | Circuit Breaker | System alert, require human action |
| Disorder (Ambiguous) | Human Escalation | Input is unclear or contradictory |
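The routing table above can be read as a dispatch map from Cynefin domain to handler. The sketch below is illustrative only: the `Domain` enum, the handler functions, and the `min_confidence` fallback are hypothetical names and assumptions, not CARF's actual router API.

```python
from enum import Enum
from typing import Callable, Dict


class Domain(Enum):
    CLEAR = "clear"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"
    DISORDER = "disorder"


# Hypothetical handlers: each returns a label instead of running a real engine.
def rule_lookup(query: str) -> str:
    return f"rule lookup: {query}"


def causal_inference(query: str) -> str:
    return f"causal inference: {query}"


def bayesian_inference(query: str) -> str:
    return f"bayesian inference: {query}"


def circuit_breaker(query: str) -> str:
    return f"circuit breaker: {query}"


def human_escalation(query: str) -> str:
    return f"human escalation: {query}"


HANDLERS: Dict[Domain, Callable[[str], str]] = {
    Domain.CLEAR: rule_lookup,
    Domain.COMPLICATED: causal_inference,
    Domain.COMPLEX: bayesian_inference,
    Domain.CHAOTIC: circuit_breaker,
    Domain.DISORDER: human_escalation,
}


def route(query: str, domain: Domain, confidence: float,
          min_confidence: float = 0.6) -> str:
    """Dispatch to the handler for `domain`.

    Assumption: a classification below `min_confidence` is treated as Disorder.
    """
    if confidence < min_confidence:
        domain = Domain.DISORDER
    return HANDLERS[domain](query)


print(route("Does offering a discount reduce churn?", Domain.COMPLICATED, 0.92))
```

Keeping the mapping in one dictionary makes the "right tool for the right problem" rule easy to audit: adding a domain means adding one entry.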
All outputs are filtered through a Guardian Layer that enforces organizational policies (e.g., "require human approval for decisions affecting >$1M budget") and logs an audit trail for compliance (e.g., EU AI Act).
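As a rough illustration of the policy example above, a Guardian-style check could compare a decision's budget against an approval threshold and record every verdict. The `Decision` and `Guardian` classes and their field names are hypothetical; the real Guardian Layer combines YAML, CSL-Core, and OPA and is not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Decision:
    action: str
    budget_usd: float


@dataclass
class Guardian:
    # Threshold taken from the example policy: human approval above $1M.
    approval_threshold_usd: float = 1_000_000.0
    audit_log: List[dict] = field(default_factory=list)

    def review(self, decision: Decision) -> str:
        """Return "escalate" above the threshold, otherwise "approve", and log the verdict."""
        if decision.budget_usd > self.approval_threshold_usd:
            verdict = "escalate"
        else:
            verdict = "approve"
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": decision.action,
            "budget_usd": decision.budget_usd,
            "verdict": verdict,
        })
        return verdict


guardian = Guardian()
print(guardian.review(Decision("expand warehouse network", 2_500_000)))
print(guardian.review(Decision("renew tooling licence", 40_000)))
```

Every call appends to `audit_log`, so approved decisions leave the same trail as escalated ones.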
- Cynefin-based Routing: Automatic classification of query complexity across 5 domains.
- Causal Inference Engine: Discover DAGs, estimate effects, and run refutation tests via DoWhy/EconML.
- Bayesian Exploration: Quantify uncertainty and update beliefs with new evidence via PyMC.
- Guardian Policy Layer: Multi-layer enforcement (YAML + CSL-Core + OPA), human-in-the-loop, and audit trails.
- Currency-Aware Financial Guardrails: Guardian + CSL enforce monetary thresholds with FX normalization (`CARF_FX_RATES_JSON`) and fail-safe blocking when conversion evidence is unavailable.
- ChimeraOracle Fast Predictions: Pre-trained CausalForestDML models for <100ms causal effect scoring.
- What-If Simulation Framework: Multi-scenario comparison with 6 built-in realistic data generators.
- CSL-Core Policy Verification: Formal, deterministic policy rules with fail-closed safety.
- Policy Scaffolding & Refinement: Auto-generate domain-specific policies with adaptive refinement agents.
- Four-View Dashboard: Tailored views for Analysts, Developers, Executives, and Governance.
- Dark/Light Theme: Full dark mode support with system preference detection.
- Actionable Insights: Persona-specific recommendations, action items with effort badges, and analysis roadmaps.
- Smart Reflector: Hybrid heuristic + LLM self-correction for policy violations with observability.
- Experience Buffer: Semantic memory using sentence-transformers (all-MiniLM-L6-v2) with TF-IDF fallback for similar past analysis retrieval and domain pattern tracking.
- Library API: Notebook-friendly wrappers (`from src.api.library import classify_query, run_pipeline`).
- Agent Transparency: Track LLM usage, latency, cost, and quality scores across workflows.
- Multi-Source Data Loading: Load data from JSON, CSV, APIs, or Neo4j with automatic quality assessment.
- Streaming Query Mode: Server-sent events for real-time progressive responses.
- EU AI Act Compliance: Built-in compliance reporting and audit trail generation.
- Data Lineage Tracking: Full provenance chain for audit and reproducibility.
- Router Retraining Pipeline: Extract domain override feedback for DistilBERT fine-tuning.
- MCP Server: 18 cognitive tools exposed via Model Context Protocol for agentic AI integration.
- Agentic Chat Actions: Natural-language UI actions (e.g., onboarding launch, latest-analysis simulation compare, governance tab routing).
- Governance Semantic Graph: Purpose-built policy/domain/conflict topology view with explainability (`Why this?`, `How confident?`, `Based on what?`).
- RAG-Augmented Policy Search: In-memory retrieval-augmented generation for governance policy queries with auto-ingestion at startup.
- Agent Memory: Persistent agent memory with compaction and recall for cross-session knowledge retention.
- Document Processor: Upload and ingest PDF/text documents for RAG indexing and policy extraction.
- Embedding Engine: Sentence-transformer embeddings (all-MiniLM-L6-v2) with TF-IDF fallback for semantic search.
- Deployment Profiles: Environment-aware presets (research/staging/production) controlling CORS, auth, rate limiting, and governance defaults.
- Security Middleware: Profile-aware API key auth, per-IP rate limiting, and request size enforcement.
- Causal World Model (Phase 17): Structural Causal Models with do-calculus interventions, forward simulation, and Pearl's 3-step counterfactual reasoning.
- Neurosymbolic Engine (Phase 17): Tight neural-symbolic loop — LLM fact extraction, forward-chaining, shortcut detection, Neo4j graph grounding.
- H-Neuron Sentinel (Phase 17): Hallucination detection via weighted signal fusion (8 signals, configurable weights).
- 3-Layer NeSy-Augmented RAG (Phase 17): Vector + Graph + Symbolic retrieval with Reciprocal Rank Fusion.
- Firebase Auth + Cloud SQL (Phase 17): JWT authentication, SQLite/PostgreSQL factory, per-user analysis history.
- Drift Detection (Phase 18): KL-divergence monitoring of routing distribution for feedback loop safety.
- Bias Auditing (Phase 18): Chi-squared fairness tests on accumulated agent memory.
- Plateau Detection (Phase 18): Convergence monitoring for router retraining pipeline.
- ChimeraOracle Fast-Path (Phase 18): StateGraph-integrated fast causal predictions with Guardian enforcement.
- Supervised Recursive Refinement (SRR): Formally bounded self-improvement model — 4 RSI safety gaps closed, TLA+ verified.
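The Currency-Aware Financial Guardrails item describes FX normalization with fail-safe blocking. A minimal sketch of that idea follows, assuming the rates JSON maps each currency code to the USD value of one unit (an interpretation of the `CARF_FX_RATES_JSON` format, not confirmed by the source). The function names and the `"allow"`/`"escalate"`/`"block"` verdicts are hypothetical.

```python
import json
from typing import Dict, Optional

# Assumed shape: USD value of one unit of each currency.
FX_RATES_JSON = '{"USD": 1.0, "EUR": 1.08, "JPY": 0.0067}'


def to_usd(amount: float, currency: str, rates: Dict[str, float]) -> Optional[float]:
    """Convert to USD, or return None when no usable rate is available."""
    rate = rates.get(currency.upper())
    if rate is None or rate <= 0:
        return None
    return amount * rate


def check_budget(amount: float, currency: str,
                 limit_usd: float = 1_000_000.0,
                 rates_json: str = FX_RATES_JSON) -> str:
    """Fail closed: block when the amount cannot be normalized to USD."""
    rates = json.loads(rates_json)
    usd = to_usd(amount, currency, rates)
    if usd is None:
        return "block"
    return "escalate" if usd > limit_usd else "allow"


print(check_budget(950_000, "USD"))  # under the limit in USD
print(check_budget(950_000, "EUR"))  # over the limit once converted
print(check_budget(10, "GBP"))       # no rate configured, so blocked
```

The point of the fail-closed branch is that a missing rate never silently passes a threshold check, even for small amounts.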
CARF is evaluated against 43 falsifiable hypotheses (H0--H43) across 11 benchmark categories, using synthetic and realistic enterprise data with known ground truth and a raw LLM baseline (same model, no pipeline) for comparison. All benchmarks use fixed random seeds for full reproducibility.
| # | Hypothesis | Measured | Threshold | Result |
|---|---|---|---|---|
| H0 | Router Accuracy -- Cynefin classification on 456 queries | 89.5% (F1 0.895) | >= 85% | PASS |
| H1 | Causal Accuracy -- DoWhy ATE vs raw LLM | MSE ratio 0.0009 (1,138x more accurate) | >= 50% lower | PASS |
| H2 | Bayesian Calibration -- posterior coverage | 100% well-calibrated | >= 90% | PASS |
| H3 | Violation Detection -- Guardian catches all violations | 100% detection | 100% | PASS |
| H4 | Determinism -- same input, same Guardian decision | 100% (50x repetitions) | 100% | PASS |
| H5 | EU AI Act Compliance -- Art. 9, 12, 13, 14 | 100% | >= 90% | PASS |
| H6 | Latency Overhead -- CARF vs raw LLM | 1.9x | <= 5x | PASS |
| H7 | Hallucination Reduction -- grounded queries | 100% (0% both) | >= 40% | PASS |
| H8 | ChimeraOracle Speedup -- fast causal predictions | 40.7x faster | >= 10x | PASS |
| H9 | Memory Stability -- 500+ queries | -37.3% RSS growth | <= 10% | PASS |
| H10 | MAP Accuracy -- cross-domain link detection | 90% | >= 70% | PASS |
| H11 | PRICE Precision -- cost computation | 100% (max err 2.8e-05) | >= 95% | PASS |
| H12 | Governance Latency -- P95 non-blocking | 0.58ms | < 50ms | PASS |
| H13 | PRICE Expanded -- 15-case cost test | 100% | >= 95% | PASS |
| H14 | RESOLVE Conflict Detection -- 30 cases | 86.7% | >= 80% | PASS |
| H15 | Board Lifecycle -- CRUD operations | 100% | 100% | PASS |
| H16 | Policy Roundtrip -- YAML export/import fidelity | 100% | >= 95% | PASS |
| H17 | Counterfactual Accuracy -- vs raw LLM | +25pp (CARF 100% vs LLM 75%) | >= 10pp | PASS |
| H18 | Tau-Bench Agent Compliance -- policy-guided | 100% (30/30) | >= 95% | PASS |
| H19 | Hallucination at Scale -- rate ceiling | 7.0% | <= 10% | PASS |
| H21 | Cross-LLM Agreement -- provider parity | 100% | >= 85% | PASS |
| H22 | CLEAR Composite -- cost/latency/efficacy/alignment | 0.77 | >= 0.75 | PASS |
| H23 | OWASP Injection Block -- prompt injection defense | 100% | >= 90% | PASS |
| H24 | Adversarial Causal Robustness | 70% | >= 70% | PASS |
| H25 | Red Team Defense -- 8 attack surfaces | 100% | >= 85% | PASS |
| H26 | Fairness -- demographic parity ratio | 1.0 | >= 0.80 | PASS |
| H27 | XAI Fidelity -- explainability quality | 80% (3/3 dimensions) | >= 80% | PASS |
| H28 | ALCOA+ Audit Trail -- compliance | 100% | >= 95% | PASS |
| H29 | Energy Proportionality -- Clear < Complicated < Complex | 100% | 100% | PASS |
| H30 | Scope 3 Attribution -- emission accuracy | 100% | >= 85% | PASS |
| H31 | SUS Usability -- System Usability Scale | 68.4 | >= 68 | PASS |
| H32 | Task Completion -- success rate | 100% | >= 90% | PASS |
| H33 | WCAG 2.2 Level A -- accessibility violations | 0 | 0 | PASS |
| H34 | Supply Chain Prediction -- precision | 94% | >= 70% | PASS |
| H35 | Healthcare CATE -- vs RCT ground truth | 98% | >= 90% | PASS |
| H36 | Finance VaR -- Kupiec backtest | p = 1.0 | > 0.05 | PASS |
| H37 | Load Test -- P95 at 25 concurrent users | 42ms | <= 15s | PASS |
| H38 | Chaos Cascade -- containment rate | 100% | >= 80% | PASS |
| H39 | Soak Test -- memory growth over 1000 queries | -1.5% | <= 5% | PASS |
| H40 | Drift Detection -- routing shift sensitivity | 100% sensitivity, 100% specificity | >= 90% | PASS |
| H41 | Bias Audit -- memory corpus fairness detection | 100% accuracy, 0% false alarm | >= 90% | PASS |
| H42 | Plateau Detection -- retraining convergence | 100% detection, 0% false plateau | >= 90% | PASS |
| H43 | Fast-Path Guardian -- ChimeraOracle enforcement | 100% (all paths through Guardian) | 100% | PASS |
Full machine-readable results: `benchmarks/reports/benchmark_report.json` | Text report: `benchmark_report.txt`
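For readers unfamiliar with the Kupiec backtest behind H36, the sketch below implements the standard proportion-of-failures likelihood-ratio test. It is not taken from the CARF benchmark code, and the function name and example counts are illustrative. It shows why a p-value of 1.0 arises when the observed exception rate exactly equals the expected rate.

```python
import math


def kupiec_pof_pvalue(exceptions: int, observations: int, expected_rate: float) -> float:
    """p-value of Kupiec's proportion-of-failures test (chi-squared, 1 degree of freedom).

    `expected_rate` is the exception probability implied by the VaR level,
    e.g. 0.05 for a 95% VaR. It must lie strictly between 0 and 1.
    """
    def log_lik(prob: float) -> float:
        # Binomial log-likelihood without the constant term; 0 * log(0) is treated as 0.
        ll = 0.0
        if exceptions > 0:
            ll += exceptions * math.log(prob)
        if observations - exceptions > 0:
            ll += (observations - exceptions) * math.log(1.0 - prob)
        return ll

    observed_rate = exceptions / observations
    # Likelihood ratio statistic; clamp tiny negative values from rounding.
    lr = max(0.0, -2.0 * (log_lik(expected_rate) - log_lik(observed_rate)))
    # For one degree of freedom, P(chi2 > lr) = erfc(sqrt(lr / 2)).
    return math.erfc(math.sqrt(lr / 2.0))


print(kupiec_pof_pvalue(50, 1000, 0.05))  # observed rate equals expected rate
print(kupiec_pof_pvalue(80, 1000, 0.05))  # too many exceptions
```

A p-value above 0.05 means the exception count is statistically compatible with the stated VaR level, which is the pass criterion listed for H36.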
Based on the benchmark evidence, CARF is particularly suited for:
| Use Case | Why CARF Helps | Supporting Evidence |
|---|---|---|
| Causal Decision Support -- supply chain, marketing attribution, policy evaluation | Separates cause from correlation with statistical rigor | H1: 1,138x more accurate, H17: +25pp vs LLM on confounded scenarios |
| Risk Quantification Under Uncertainty -- investment, insurance, clinical trials | Calibrated posteriors with epistemic/aleatoric decomposition | H2: 100% calibrated across all Bayesian scenarios |
| Regulated AI Systems -- EU AI Act, financial audit, healthcare decision support | Deterministic, compliant, and fully auditable | H3--H5: 100% violation detection, determinism, and compliance |
| Enterprise Governance -- multi-domain policy orchestration, cost intelligence | MAP-PRICE-RESOLVE framework with conflict detection and audit | H10: 90% MAP accuracy, H11--H16: 100% cost precision, 86.7% conflict detection, full board lifecycle |
| Security-Critical Deployments -- financial services, government, healthcare | Blocked every tested injection and red-team attack; fairness-verified | H23: 100% OWASP block, H25: 100% red team defense, H26: demographic parity ratio 1.0 |
| High-Throughput Analysis -- real-time scoring, batch processing | Fast oracle + stable memory under sustained load | H8: 40.7x speedup, H37: 42ms P95 at 25 users, H39: no memory growth |
| Strategic Analysis -- market entry, R&D allocation, scenario planning | Cynefin routing ensures the right analytical method per problem type | H0: 89.5% router accuracy, F1 = 0.895 across 5 domains |
| Operational Monitoring -- routing drift, memory bias, model staleness | Continuous monitoring of self-improvement feedback loops | H40-H42: 100% detection accuracy across drift, bias, and plateau scenarios |
All evaluation data is synthetic with known ground truth, enabling objective measurement. No proprietary datasets are required to reproduce results.
| Category | Description | Details |
|---|---|---|
| Causal (Synthetic) | 5 DGP types with known ATEs (linear, nonlinear, interaction, threshold, confounded) | n=500 each, 60 scenarios with CI calibration |
| Causal (Industry) | 5 sector-specific DGPs with realistic confounding | Supply chain, Healthcare, Marketing, Sustainability, Education |
| Bayesian | 8 scenarios (4 continuous, 4 binomial) | Known ground truth posteriors for calibration checking |
| Router | 456-query labeled test set across 5 Cynefin domains | Clear (101), Complicated (102), Complex (101), Chaotic (50), Disorder (102) |
| Governance | MAP (50), PRICE (15), RESOLVE (30), Tau-Bench (30), board lifecycle, policy roundtrip | Cross-domain link detection, cost precision, conflict detection, agent compliance |
| Security | OWASP LLM Top 10 (45 cases), Red Team (8 surfaces, 40 attacks) | Injection, PII detection, sanitization, multi-vector adversarial |
| Compliance | Fairness (80 variations), XAI fidelity, ALCOA+ audit (50 queries) | Demographic parity, explanation stability, audit trail completeness |
| Sustainability | Energy proportionality per domain, Scope 3 attribution | Clear < Complicated < Complex energy ordering |
| Industry | Supply chain prediction, Healthcare CATE, Finance VaR | Disruption lead time, treatment effect vs RCT, Kupiec backtest |
| UX | SUS usability (68.4), task completion, WCAG 2.2 Level A | System Usability Scale, success rate, accessibility audit |
| Performance | Load (1--25 concurrent), chaos cascade, soak (1000 queries) | P95 latency, fault containment, memory stability |
| Monitoring | Drift (5 enterprise scenarios), Bias (5 memory corpus patterns), Plateau (5 training curves), Guardian enforcement | Enterprise-realistic routing distributions, DistilBERT training curves |
| Baselines | Raw LLM (same model, no pipeline) on identical data | DeepSeek without CARF pipeline for fair comparison |
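To make "known ground truth" concrete for the confounded case in the Causal (Synthetic) row, here is a small standard-library sketch of a data-generating process with a known ATE. The coefficients, sample size, and function names are illustrative and are not the benchmark's actual generators. It contrasts a naive difference in means with a backdoor adjustment by stratifying on the confounder.

```python
import random
from statistics import mean
from typing import List, Tuple

Row = Tuple[int, int, float]  # (confounder z, treatment t, outcome y)


def simulate(n: int = 5000, true_ate: float = 2.0, seed: int = 0) -> List[Row]:
    """Confounded DGP: z raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = true_ate * t + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, t, y))
    return rows


def naive_ate(rows: List[Row]) -> float:
    """Difference in mean outcome between treated and untreated (biased here)."""
    treated = mean(y for _, t, y in rows if t == 1)
    control = mean(y for _, t, y in rows if t == 0)
    return treated - control


def adjusted_ate(rows: List[Row]) -> float:
    """Backdoor adjustment: per-stratum effect, weighted by stratum share."""
    total = 0.0
    for stratum in (0, 1):
        sub = [r for r in rows if r[0] == stratum]
        effect = (mean(y for _, t, y in sub if t == 1)
                  - mean(y for _, t, y in sub if t == 0))
        total += effect * len(sub) / len(rows)
    return total


rows = simulate()
print(f"naive: {naive_ate(rows):.2f}  adjusted: {adjusted_ate(rows):.2f}  truth: 2.00")
```

Because the true effect is fixed by construction, any estimator can be scored against it directly, which is what makes the benchmark objective.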
Benchmark reports include two quality gates:
- Performance gate: hypothesis pass/fail and grade (`A+` to `D`)
- Realism gate: realism (55/100), reliability (81/100), feasibility (89/100), evidence (100/100) from `realism_manifest.json`
# Generate reports
python benchmarks/reports/generate_report.py
python benchmarks/reports/check_result_evidence.py

# Clone the repository
git clone https://github.com/eljaplacido/projectcarfcynepic.git
cd projectcarfcynepic
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e ".[dev]"
# Configure environment
cp .env.example .env
# Edit .env with your API keys
# Start the API server
python -m src.main
# In a new terminal, start the React cockpit
cd carf-cockpit
npm install
npm run dev

# Start all services (API, Dashboard, Neo4j, Kafka, OPA)
docker compose up --build
# With demo data seeding
docker compose --profile demo up --build

Services:
- API: http://localhost:8000
- React Cockpit: http://localhost:5175
- Neo4j Browser: http://localhost:7474
- OPA: http://localhost:8181
# Set test mode to use offline stubs
# Linux/macOS:
export CARF_TEST_MODE=1
# Windows PowerShell:
$env:CARF_TEST_MODE="1"
# Run with mocked LLM responses
python -m src.main

Create a .env file in the project root:
# Required: LLM Provider
LLM_PROVIDER=deepseek # or "openai"
DEEPSEEK_API_KEY=sk-... # Your DeepSeek API key
# OPENAI_API_KEY=sk-... # If using OpenAI
# Optional: Human-in-the-Loop
HUMANLAYER_API_KEY=hl-... # For Slack/Email approvals
# Optional: Observability
LANGSMITH_API_KEY=ls-... # For LangSmith tracing
# Optional: Data Storage
CARF_DATA_DIR=./var # Dataset storage location
# Optional: Services
NEO4J_URI=bolt://localhost:7687
KAFKA_ENABLED=false
OPA_ENABLED=false
# Optional: CSL-Core Policy Engine
CSL_ENABLED=false # Enable formal policy verification
CSL_POLICY_DIR=config/policies # Directory for CSL policy files
CSL_FAIL_CLOSED=true # Fail-closed on CSL errors (recommended)
# Optional: Currency normalization for financial policies
CARF_FX_RATES_JSON={"USD":1.0,"EUR":1.08,"JPY":0.0067}

CARF services can be used directly in Jupyter notebooks or Python scripts:
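# Top-level await works in Jupyter/IPython; in a plain script, wrap these calls
# in an async function and run it with asyncio.run().
from src.api.library import classify_query, run_causal, run_bayesian, run_pipeline, query_memory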
# Classify a query
result = await classify_query("Why did costs increase 15%?")
print(result["domain"], result["confidence"])
# Run full pipeline
pipeline = await run_pipeline("Does supplier diversification reduce disruptions?")
print(pipeline["response"])
# Search past analyses
similar = await query_memory("supply chain risk")
print(similar["matches"])

Query -> Memory Augmentation -> Cynefin Router -> RAG Context (3-layer) ->
Clear -> Deterministic Runner (lookup)
Complicated -> ChimeraOracle Fast-Path (if model available, Phase 18)
OR Causal Inference Engine (DoWhy/EconML)
Complex -> Bayesian Active Inference (PyMC)
Chaotic -> Circuit Breaker (emergency stop)
Disorder -> Human Escalation
All paths -> H-Neuron Sentinel -> Guardian (YAML + CSL-Core + OPA) ->
Approved -> Governance (MAP-PRICE-RESOLVE) -> END
Rejected -> Smart Reflector (heuristic + LLM repair, max 2 retries) -> Retry
Escalated -> HumanLayer (3-point context) -> END
All results -> Drift Detector (routing distribution monitoring, Phase 18)
-> Experience Buffer + Agent Memory (semantic retrieval)
-> Kafka (audit trail)
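The Rejected branch of the flow above (Smart Reflector, max 2 retries) amounts to a bounded check-and-repair loop. The sketch below illustrates that control flow only. The `guarded_run` function, the callables it takes, and the choice to escalate once retries are exhausted are assumptions, since the flow does not state what happens after the last retry.

```python
from typing import Callable, Tuple

MAX_RETRIES = 2  # matches "max 2 retries" in the flow


def guarded_run(proposal: dict,
                check: Callable[[dict], bool],
                repair: Callable[[dict], dict]) -> Tuple[str, dict, int]:
    """Check a proposal, repairing it at most MAX_RETRIES times.

    Returns (verdict, final_proposal, repairs_used).
    Assumption: a proposal that still fails after the last repair is escalated.
    """
    repairs_used = 0
    while True:
        if check(proposal):
            return "approved", proposal, repairs_used
        if repairs_used >= MAX_RETRIES:
            return "escalated", proposal, repairs_used
        proposal = repair(proposal)
        repairs_used += 1


# Hypothetical policy and repair: budgets must be <= 100; repair halves the budget.
def within_budget(p: dict) -> bool:
    return p["budget"] <= 100


def halve_budget(p: dict) -> dict:
    return {**p, "budget": p["budget"] / 2}


print(guarded_run({"budget": 300}, within_budget, halve_budget))
print(guarded_run({"budget": 1000}, within_budget, halve_budget))
```

Bounding the loop keeps self-correction from running indefinitely and guarantees every query ends in either an approved or an escalated state.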
| Domain | Description | Handler | Use Case |
|---|---|---|---|
| Clear | Cause-effect obvious | Deterministic automation | Standard procedures |
| Complicated | Requires expert analysis | Causal inference engine | Impact estimation |
| Complex | Emergent, probe required | Bayesian active inference | Uncertainty exploration |
| Chaotic | Crisis mode | Circuit breaker | Emergency response |
| Disorder | Cannot classify | Human escalation | Ambiguous inputs |
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | System health check |
| `/query` | POST | Process query through CARF pipeline |
| `/query/transparent` | POST | Query with full transparency metrics |
| `/domains` | GET | List Cynefin domains |
| `/scenarios` | GET | List demo scenarios |
| `/scenarios/{id}` | GET | Fetch scenario payload |
| `/datasets` | POST | Upload dataset to registry |
| `/datasets` | GET | List stored datasets |
| `/datasets/{id}/preview` | GET | Preview dataset rows |
| Endpoint | Method | Description |
|---|---|---|
| `/insights/generate` | POST | Generate persona-based insights |
| `/insights/enhanced` | POST | Enhanced insights with action items and roadmap |
| `/insights/types` | GET | List available insight types |
| `/experience/similar` | GET | Find similar past analyses (semantic memory) |
| `/experience/patterns` | GET | Aggregated domain-level patterns |
| `/transparency/reliability` | POST | Assess analysis reliability |
| `/transparency/agents` | GET | Get agent registry info |
| `/guardian/status` | GET | Get compliance status |
| `/guardian/policies` | GET | List configured policies |
| `/feedback/retraining-readiness` | GET | Check Router retraining readiness |
| Endpoint | Method | Description |
|---|---|---|
| `/workflow/start` | POST | Start workflow tracking |
| `/workflow/complete` | POST | Complete workflow and aggregate metrics |
| `/workflow/trace/{id}` | GET | Get full execution trace |
| `/workflow/recent` | GET | Get recent workflow traces |
| `/agents/stats` | GET | Get agent performance statistics |
| `/agents/comparison` | GET | Get agent comparison data |
| Endpoint | Method | Description |
|---|---|---|
| `/data/load/json` | POST | Load JSON data with quality assessment |
| `/data/load/csv` | POST | Load CSV data with quality assessment |
| `/data/{id}` | GET | Retrieve loaded data by ID |
| `/data/quality/levels` | GET | Get available quality levels |
| `/data/detect-schema` | POST | Auto-detect schema from uploaded data |
| `/data/cache` | DELETE | Clear data cache |
| Endpoint | Method | Description |
|---|---|---|
| `/simulations/run` | POST | Run what-if scenario simulation |
| `/simulations/compare` | POST | Compare multiple simulation results |
| `/simulations/{id}/status` | GET | Get simulation status |
| `/simulations/{id}/rerun` | POST | Rerun a simulation |
| `/simulations/generators` | GET | List available data generators |
| `/simulations/generate` | POST | Generate synthetic data with causal structure |
| `/simulations/assess-realism` | POST | Assess scenario realism score |
| `/simulations/run-transparent` | POST | Enhanced simulation with transparency |
| Endpoint | Method | Description |
|---|---|---|
| `/oracle/models` | GET | List trained oracle models |
| `/oracle/train` | POST | Train CausalForestDML model on scenario data |
| `/oracle/predict` | POST | Fast causal prediction (<100ms) |
| `/oracle/models/{id}` | GET | Get model metadata for scenario |
| Endpoint | Method | Description |
|---|---|---|
| `/api/visualization-config` | GET | Combined domain + context visualization config |
| `/config/visualization` | GET | Context-aware visualization settings |
| `/config/status` | GET | System configuration status |
| `/config/validate` | POST | Validate configuration |
| `/router/config` | GET/PUT/PATCH | Manage Cynefin Router configuration |
| `/guardian/config` | GET/PUT/PATCH | Manage Guardian policy configuration |
| Endpoint | Method | Description |
|---|---|---|
| `/query/stream` | POST | Streaming query with server-sent events |
| `/query/fast` | POST | Fast query mode via Chimera Oracle |
| `/chat` | POST | Chat interface with Socratic mode |
| Endpoint | Method | Description |
|---|---|---|
| `/escalations` | GET | List pending human escalations |
| `/escalations/{id}` | GET | Get escalation details |
| `/escalations/{id}/resolve` | POST | Resolve an escalation |
| `/transparency/compliance` | POST | EU AI Act compliance report |
| `/transparency/data-quality` | POST | Assess data quality |
| `/transparency/guardian` | POST | Guardian decision transparency |
| `/sessions/{id}/lineage` | GET | Data lineage and provenance tracking |
| Endpoint | Method | Description |
|---|---|---|
| `/governance/domains` | GET/POST | List or create governance domains |
| `/governance/policies` | GET/POST | List or create federated policies |
| `/governance/policies/{ns}` | PUT/DELETE | Update or remove a policy by namespace |
| `/governance/policies/extract` | POST | Extract governance rules from unstructured policy text (LLM-powered) |
| `/governance/conflicts` | GET | List policy conflicts (optionally unresolved only) |
| `/governance/conflicts/{id}/resolve` | POST | Resolve a detected policy conflict |
| `/governance/triples` | GET | Query MAP context triples |
| `/governance/triples/impact/{domain}` | GET | Triple impact analysis for a domain |
| `/governance/compliance/{framework}` | GET | Compliance score for EU AI Act, CSRD, GDPR, ISO 27001 |
| `/governance/cost/breakdown/{session}` | GET | Token-level cost breakdown per session |
| `/governance/cost/aggregate` | GET | Aggregated cost intelligence across sessions |
| `/governance/cost/roi` | GET | ROI analysis for LLM spend |
| `/governance/audit` | GET | Governance audit log (filterable) |
| `/governance/health` | GET | Governance subsystem health check |
| `/governance/semantic-graph` | GET | Semantic governance topology (domains, policies, conflicts, MAP triples) |
| `/governance/boards` | GET/POST | Governance board lifecycle management |
| `/governance/boards/templates` | GET | List governance board templates |
| `/governance/boards/from-template` | POST | Create board from template |
| `/governance/boards/{id}` | GET/PUT/DELETE | Board CRUD operations |
| `/governance/boards/{id}/compliance` | GET | Board-level compliance check |
| `/governance/export` | POST | Export governance spec (JSON/YAML) |
| `/governance/seed/{template}` | POST | Seed domain from template |
| `/governance/rag/status` | GET | RAG index status |
| `/governance/rag/query` | POST | RAG-augmented policy search |
| `/governance/rag/ingest-policies` | POST | Re-ingest policies into RAG index |
| `/governance/rag/ingest-text` | POST | Ingest arbitrary text into RAG |
| `/governance/documents/upload-file` | POST | Upload document for RAG ingestion |
| `/governance/documents/status` | GET | Document processing status |
| `/governance/memory/status` | GET | Agent memory status |
| `/governance/memory/compact` | POST | Compact agent memory |
| `/governance/memory/recall` | POST | Recall from agent memory |
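The RAG endpoints above sit on top of the 3-layer retrieval (Vector + Graph + Symbolic) that the feature list says is merged with Reciprocal Rank Fusion. Below is a minimal sketch of RRF itself. The document IDs are invented, and `k = 60` is the constant commonly used for RRF, not a confirmed CARF setting.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the three retrieval layers.
vector_hits = ["p1", "p2", "p3"]
graph_hits = ["p2", "p4", "p1"]
symbolic_hits = ["p2", "p1"]

print(reciprocal_rank_fusion([vector_hits, graph_hits, symbolic_hits]))
# -> ['p2', 'p1', 'p4', 'p3']
```

RRF only needs ranks, not comparable scores, which is why it suits combining retrievers as different as vector similarity and graph traversal.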
| Endpoint | Method | Description |
|---|---|---|
| `/monitoring/drift` | GET | Routing distribution drift status (KL-divergence) |
| `/monitoring/drift/history` | GET | Recent drift detection snapshots |
| `/monitoring/drift/reset` | POST | Reset drift baseline |
| `/monitoring/bias-audit` | GET | Run bias audit on agent memory (chi-squared, quality, verdicts) |
| `/monitoring/convergence` | GET | Retraining convergence/plateau status |
| `/monitoring/convergence/record` | POST | Record retraining accuracy measurement |
| `/monitoring/status` | GET | Unified monitoring status (drift + bias + convergence) |
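The drift endpoints above report KL-divergence over the routing distribution. As a sketch of what such a check computes, the snippet below compares current routing counts against a baseline. The smoothing constant, the `0.1` alert threshold, and the function names are assumptions rather than CARF's configured values.

```python
import math
from typing import Dict

DOMAINS = ("clear", "complicated", "complex", "chaotic", "disorder")


def kl_divergence(current: Dict[str, int], baseline: Dict[str, int],
                  eps: float = 1e-9) -> float:
    """KL(current || baseline) over routing counts, with additive smoothing."""
    cur_total = sum(current.get(d, 0) for d in DOMAINS) + eps * len(DOMAINS)
    base_total = sum(baseline.get(d, 0) for d in DOMAINS) + eps * len(DOMAINS)
    kl = 0.0
    for d in DOMAINS:
        p = (current.get(d, 0) + eps) / cur_total
        q = (baseline.get(d, 0) + eps) / base_total
        kl += p * math.log(p / q)
    return kl


def drift_detected(current: Dict[str, int], baseline: Dict[str, int],
                   threshold: float = 0.1) -> bool:
    """Flag drift when the divergence exceeds a (hypothetical) threshold."""
    return kl_divergence(current, baseline) > threshold


baseline = {"clear": 100, "complicated": 100, "complex": 100, "chaotic": 50, "disorder": 100}
crisis_heavy = {"clear": 10, "complicated": 10, "complex": 10, "chaotic": 400, "disorder": 20}

print(drift_detected(baseline, baseline))      # False: identical distributions
print(drift_detected(crisis_heavy, baseline))  # True: routing has shifted to Chaotic
```

Smoothing keeps the logarithm finite when a domain has zero traffic in either window.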
| Endpoint | Method | Description |
|---|---|---|
| `/developer/state` | GET | Full system state dump |
| `/developer/logs` | GET | Filtered log entries (layer, level, limit) |
| `/developer/ws` | WebSocket | Live log streaming |
| `/analyze` | POST | File analysis for CSV/JSON |
| `/agent/suggest-improvements` | POST | Automated improvement suggestions |
| `/explain` | POST | Generate explanations for analyses |
| `/explain/{domain}/{element}` | GET | Domain-specific element explanations |
| `/benchmarks/run-all` | POST | Run all benchmark suites |
| `/feedback` | POST | Submit analysis feedback |
| `/summary/executive` | POST | Generate executive summary |
Simple Query:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "Why did our costs increase by 15%?"}'

Causal Analysis:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Estimate impact of discount on churn",
"causal_estimation": {
"treatment": "discount",
"outcome": "churn",
"covariates": ["region", "tenure"],
"data": [
{"discount": 0.1, "churn": 0, "region": "NA", "tenure": 12},
{"discount": 0.0, "churn": 1, "region": "EU", "tenure": 3}
]
}
}'

Bayesian Inference:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Update belief on conversion rate",
"bayesian_inference": {
"successes": 42,
"trials": 100
}
}'

Tip: The platform is ready to test with built-in demo scenarios or your own data. No complex setup needed -- just start the servers and explore.
CYNEPIC includes 17 pre-built scenarios covering all 5 Cynefin domains across 7 verticals:
| Scenario | Domain | Analysis Type | What It Tests |
|---|---|---|---|
| Scope 3 Attribution | Complicated | Causal (DoWhy) | Supplier sustainability impact on emissions (2000 records) |
| Discount vs Churn | Complicated | Causal (DoWhy) | Causal effect of discounts on customer retention (2000 records) |
| Conversion Belief Update | Complex | Bayesian (PyMC) | Prior/posterior belief updates with binomial data |
| Renewable Energy ROI | Complicated | Causal (DoWhy) | ROI estimation across facilities with regional variation (800 records) |
| Shipping Mode Analysis | Complicated | Causal (DoWhy) | Carbon footprint impact of freight mode switching (1200 records) |
| Supply Chain Resilience | Complicated | Causal (DoWhy) | Climate stress impact on disruption risk (2000 records) |
| Pricing Optimization | Complicated | Causal (DoWhy) | Price elasticity and sales volume effects (1500 records) |
| Market Adoption | Complex | Bayesian (PyMC) | Uncertainty modeling for new product launch |
| Crisis Response | Chaotic | Circuit Breaker | Critical supplier failure requiring immediate stabilization |
| Inventory Data Lookup | Clear | Deterministic | Simple stock level and product queries |
| CSRD Double Materiality | Complicated | Causal (DoWhy) | Climate transition risk impact on operating costs (ESRS) |
| ESRS E1 Climate Disclosure | Complicated | Causal (DoWhy) | Emission reduction program effectiveness analysis |
| ESRS S1 Workforce Assessment | Complicated | Causal (DoWhy) | Training investment impact on workforce productivity |
| Energy Mix Optimization | Complicated | Causal (DoWhy) | Renewable energy mix cost/target optimization |
| Energy Demand Forecast | Complex | Bayesian (PyMC) | Seasonal energy demand uncertainty modeling |
| Manufacturing Quality Control | Complicated | Causal (DoWhy) | Process temperature effect on defect rates |
| Process Line Optimization | Complicated | Causal (DoWhy) | Production parameter throughput optimization |
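The Conversion Belief Update scenario, like the Bayesian curl example (42 successes in 100 trials), is a binomial belief update. The sketch below works that update by hand with a conjugate Beta prior. It assumes a uniform Beta(1, 1) prior and uses Monte Carlo draws for the interval; CARF itself uses PyMC, and its actual prior and sampler settings are not shown in this document.

```python
import random
from typing import Tuple


def beta_binomial_update(successes: int, trials: int,
                         prior_alpha: float = 1.0,
                         prior_beta: float = 1.0) -> Tuple[float, float]:
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + s, b + f)."""
    return prior_alpha + successes, prior_beta + (trials - successes)


def credible_interval(alpha: float, beta: float, level: float = 0.95,
                      draws: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Equal-tailed credible interval estimated from sorted posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper


alpha, beta = beta_binomial_update(successes=42, trials=100)
posterior_mean = alpha / (alpha + beta)  # 43 / 102
low, high = credible_interval(alpha, beta)
print(f"posterior Beta({alpha:.0f}, {beta:.0f}), mean {posterior_mean:.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

The interval, not just the mean, is what lets the Complex-domain path report how uncertain the conversion estimate still is.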
To run a demo:
- Open the React dashboard: http://localhost:5175
- Select a scenario card from the list.
- Click a suggested query to run the analysis.
- Explore the Cynefin classification, Causal DAG, and Guardian Panel.
See docs/DEMO_WALKTHROUGH.md for a step-by-step guide.
Bring your own CSV to run causal analysis:
- Generate Sample Data (optional): `python scripts/generate_chain_data.py` to create `supply_chain_resilience.csv`.
- Open Data Onboarding: In the dashboard, click "Upload your own data".
- Map Variables: Identify the Treatment (e.g., `climate_stress_index`), Outcome (e.g., `disruption_risk_percent`), and Confounders.
- Run Analysis: The platform will automatically classify the query, build a causal model, and display results.
See docs/END_USER_TESTING_GUIDE.md for detailed instructions.
projectcarf/
├── src/
│ ├── core/ # State schemas (EpistemicState), LLM config, deployment profiles, database
│ ├── services/ # 30+ services: Causal, Bayesian, World Model, NeSy Engine,
│ │ # ChimeraOracle, H-Neuron, Drift Detector, Bias Auditor,
│ │ # Governance, RAG, Agent Memory, Embedding Engine
│ ├── workflows/ # LangGraph graph (incl. chimera_fast_path), Guardian, Router
│ ├── utils/ # Telemetry, caching, circuit breaker, currency normalization
│ ├── api/ # FastAPI routers (17 routers, 90+ endpoints)
│ ├── mcp/ # MCP server (18 cognitive tools for agentic integration)
│ └── main.py # FastAPI entry point
├── carf-cockpit/ # React (Vite + TypeScript) dashboard — 59 components, 4 views
├── config/
│ ├── agents.yaml # Agent configurations
│ ├── policies.yaml # Guardian YAML policies
│ ├── policies/ # CSL-Core formal policy definitions (35 rules)
│ ├── federated_policies/ # Domain-owner governance policies (6 YAML files)
│ ├── governance_boards/ # Compliance board templates (EU AI Act, CSRD, etc.)
│ └── policy_scaffolds/ # Domain-specific policy templates
├── models/ # Trained models (DistilBERT router + 5 CausalForest models)
├── demo/ # 17 demo scenarios + 11 sample datasets
├── tests/
│ ├── unit/ # 58+ unit test files (1,130+ tests)
│ ├── deepeval/ # LLM quality evaluation tests
│ ├── e2e/ # End-to-end gold standard tests
│ └── integration/ # API flow integration tests
├── benchmarks/ # Technical & use-case benchmarks (43 hypotheses + realism validation)
├── tla_specs/ # TLA+ formal verification specs (StateGraph, EscalationProtocol)
├── .agent/skills/ # 12 agent skills
├── scripts/ # 13 scripts (training, generation, migration, seeding)
├── docs/ # 40+ documentation files
└── docker-compose.yml # Full stack deployment
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
# Run manual test suite
python scripts/test_carf.py
# Type checking
mypy src/ --strict
# Linting
ruff check src/ tests/

CARF includes comprehensive LLM output quality evaluation using DeepEval:
# Install with evaluation dependencies
pip install -e ".[dev,evaluation]"
# Run DeepEval tests
pytest tests/deepeval/ -v
# Run with DeepEval CLI (parallel execution)
deepeval test run tests/deepeval/ -n 4

Quality Metrics Evaluated:
- Relevancy Score: How well responses address user queries
- Hallucination Risk: Detection of fabricated content
- Reasoning Depth: Quality of reasoning and justification
- UIX Compliance: Adherence to transparency standards (Why? How confident? Based on what?)
See Evaluation Framework Documentation for details.
- Query input with intelligent suggestions
- Simulation controls (sliders)
- Cynefin classification with domain scores and entropy
- Bayesian belief state with distribution chart
- Causal DAG visualization
- Guardian policy check with approval workflow
- Transparency Panel with agent reliability and data quality
- Insights Panel with actionable recommendations
- Execution trace timeline
- Performance metrics (latency, tokens, cost)
- DAG structure explorer
- State snapshots (JSON)
- Agent Comparison Panel with LLM performance tracking
- Data Flow Visualization
- Live log streaming via WebSocket
- DeepEval quality metrics integration
- Monitoring Panel (Phase 18): Drift detection, bias audit, retraining convergence
- Domain-Specific Views for all 5 Cynefin domains:
  - Clear: Decision checklist with step tracking
  - Complicated: Expert analysis with causal effect summary
  - Complex: Uncertainty exploration with epistemic/aleatoric breakdown
  - Chaotic: Circuit breaker with rapid response controls
  - Disorder: Clarification prompts with human escalation
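
The views above show a Cynefin classification with domain scores and entropy. As a rough illustration of how those two quantities can relate, the sketch below computes normalized Shannon entropy over a set of domain scores and falls back to Disorder when the scores are too evenly spread. The function names, the 0.9 threshold, and the fallback rule are assumptions made for this example, not the project's actual router logic.

```python
import math
from typing import Dict, Tuple

DOMAINS = ("Clear", "Complicated", "Complex", "Chaotic", "Disorder")


def normalized_entropy(scores: Dict[str, float]) -> float:
    """Shannon entropy of the scores, scaled to [0, 1] by the maximum log(n)."""
    total = sum(scores.values())
    if len(scores) < 2 or total <= 0:
        raise ValueError("need at least two domains with a positive total score")
    probs = [v / total for v in scores.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(scores))


def route(scores: Dict[str, float], entropy_threshold: float = 0.9) -> Tuple[str, float]:
    """Pick the top-scoring domain, or Disorder when the scores are ambiguous.

    The threshold is an illustrative assumption, not a tuned value.
    """
    h = normalized_entropy(scores)
    if h >= entropy_threshold:
        return "Disorder", h
    return max(scores, key=scores.get), h


if __name__ == "__main__":
    example = {"Clear": 0.02, "Complicated": 0.90, "Complex": 0.05,
               "Chaotic": 0.02, "Disorder": 0.01}
    domain, h = route(example)
    print(domain, round(h, 3))
```

Low entropy means one domain dominates and the query can be routed with some confidence. Entropy near 1 means the classifier cannot separate the domains, which matches the Disorder row in the routing table (human escalation).
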
- Expected impact hero card
- Dynamic KPI Dashboard (0-10 scoring with real data)
- Routing Drift, Memory Bias, Retraining Health KPI cards (Phase 18)
- Proposed action summary
- Policy compliance overview
- Actionable Insights for decision-makers
- Export and share functionality
- Spec Map: ReactFlow visualization of governance domains and policy nodes
- Cost Intelligence: KPI cards with recharts cost breakdown (token pricing, ROI, risk exposure)
- Policy Federation: Domain sidebar, policy cards with conflict detection and resolution
- Compliance Audit: Framework selector (EU AI Act, CSRD, GDPR, ISO 27001), score gauge, article accordion
- Semantic Graph: Interactive policy/conflict topology with explainability annotations
- Policy Ingestion: Upload documents for RAG indexing and automated rule extraction
- Monitoring (Phase 18): Drift detection, bias audit, plateau detection — operational intelligence for SRR safety
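
The monitoring items above mention routing drift detection. One simple way to think about it is to compare the share of queries routed to each Cynefin domain in a baseline window against a recent window. The sketch below uses total variation distance for that comparison. The function names, the choice of distance, and the 0.2 threshold are all assumptions for illustration; the project may detect drift in an entirely different way.

```python
from collections import Counter
from typing import Dict, Sequence


def routing_distribution(domains: Sequence[str]) -> Dict[str, float]:
    """Share of routing decisions per domain within one window."""
    if not domains:
        raise ValueError("window must contain at least one routing decision")
    counts = Counter(domains)
    return {d: c / len(domains) for d, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two distributions, in [0, 1]."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_detected(baseline: Sequence[str], recent: Sequence[str],
                   threshold: float = 0.2) -> bool:
    """Flag drift when the two windows differ by more than the threshold.

    The threshold is a hypothetical value, not one taken from the project.
    """
    distance = total_variation(routing_distribution(baseline),
                               routing_distribution(recent))
    return distance > threshold
```

Total variation is bounded and symmetric, so it needs no smoothing for domains that are missing from one window. A production monitor would likely also account for window size and sampling noise.
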
- PlotlyChart unified wrapper supporting waterfall, radar, sankey, and gauge charts
- CynefinVizConfig backend-driven domain-specific visualization strategy
- Context-adaptive charts: color schemes, chart types, and interaction modes adapt per Cynefin domain and business context (sustainability, financial, operational, risk)
- useVisualizationConfig React hook with caching and offline fallbacks
- Dark/Light Mode Toggle in header
- System preference detection
- Persistent theme preference (localStorage)
- Quick Start Guide - Get running in 5 minutes
- Complete Walkthrough - Comprehensive guide for all user types (Analyst, Developer, Executive)
- Demo Walkthrough - Step-by-step demo scenarios
- End-User Testing Guide - Validate the demo flow and integrations
- PRD and Blueprint - Product requirements
- Data Layer - Data architecture
- Phase 17 Architecture - Causal world model, NeSy engine, H-Neuron
- RSI Safety Analysis - Supervised Recursive Refinement (SRR) model
- UI/UX Guidelines - Design system
- LLM Agentic Strategy - LLM roles, guardrails, multi-agent scaling
- Self-Healing Architecture - SRR, reflection, human escalation
- End-to-End Context Flow - 6-layer state propagation and monitoring
- Evaluation Framework - 43 hypotheses, DeepEval quality metrics
- Intellectual Property - Complete IP registry
- OPA Policy - Enterprise policy setup
- Security Guidelines - Release readiness checklist
- Integration Guide - Enterprise integration patterns (ERP, Cloud)
- Future Roadmap - Development path and vision
We welcome contributions! Please review the following guides based on your interest:
| I want to... | Start here |
|---|---|
| Add a new demo use case | docs/RFC_UIX_001_SCENARIO_REGISTRY.md - How to add new scenarios to the registry. |
| Integrate CYNEPIC with another system (ERP, Cloud) | docs/INTEGRATION_GUIDE.md - API usage, data ingestion patterns, security. |
| Test the platform demos end-to-end | docs/DEMO_WALKTHROUGH.md and docs/END_USER_TESTING_GUIDE.md |
| Understand the future vision | docs/FUTURE_ROADMAP.md - Planned features and areas for improvement. |
| Review contribution guidelines | CONTRIBUTING.md - Code standards, commit messages, PR process. |
- Fork the repository.
- Create a feature branch: `git checkout -b feature/my-feature`
- Make changes and run tests: `pytest tests/ -v`
- Commit with a descriptive message: `git commit -m "Add my feature"`
- Push: `git push origin feature/my-feature`
- Open a Pull Request.
Business Source License 1.1 (BSL) - see LICENSE for details. For production use, see COMMERCIAL_LICENSE.
This software is source-available under BSL 1.1. You may freely use it for development, testing, academic research, and personal projects. Production use is permitted provided it is not competitive with Cisuregen's products (see LICENSE for exact terms).
If your use case involves offering CARF-based functionality as a hosted service or embedding it in a commercial product, contact licensing@cisuregen.com for a commercial license.
On February 19, 2030, this version converts to Apache License 2.0.
The architectural innovations in this project -- including the entropy-aware Cynefin routing mechanism, the deterministic Guardian policy enforcement layer, the integrated causal-Bayesian-neurosymbolic pipeline, and the EpistemicState provenance schema -- are original works of Cisuregen. See IP_CLASSIFICATION.md for the full IP tier mapping.
"CARF", "CYNEPIC", and related marks are trademarks of Cisuregen. See NOTICE for trademark usage guidelines.
- LangGraph - Workflow orchestration
- DoWhy - Causal inference
- PyMC - Bayesian modeling
- HumanLayer - Human-in-the-loop SDK