Skip to content

eljaplacido/projectcarfcynepic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project CARF: CYNEPIC Architecture 0.5

image **Complexity-Adaptive & Context-Aware Reasoning Fabric** - A research grade Architectural Blueprint & Decision Intelligence Simulation for more reliable and transparent data-driven decision making and Agentic AI Systems, combining complexity & context adaptability, causal inference, bayesian methods to quantify uncertainty & complexity and epistemic awareness.

Python 3.11+ License: BSL 1.1 FastAPI React

CARF architecture

Overview

Modern AI systems often act as black boxes, producing confident-sounding outputs without clarifying their reasoning, certainty level, or the nature of the problem they're addressing. This creates a "trust gap": users cannot easily distinguish whether an AI answer is based on solid causal evidence, a probabilistic inference, or a simple guess.

CYNEPIC (CYNefin-EPIstemic Cockpit) solves this by enforcing epistemic awareness. The system explicitly classifies every query by its inherent complexity using the Cynefin Framework, then routes it to the appropriate analytical engine -- ensuring the right tool is used for the right problem.

Problem Type Analysis Method Example
Clear (Obvious) Rule lookup "What is the capital of France?"
Complicated (Knowable) Causal Inference (DoWhy) "Does offering a discount reduce churn?"
Complex (Emergent) Bayesian Inference (PyMC) "What is the likely conversion rate?"
Chaotic (Crisis) Circuit Breaker System alert, require human action
Disorder (Ambiguous) Human Escalation Input is unclear or contradictory

All outputs are filtered through a Guardian Layer that enforces organizational policies (e.g., "require human approval for decisions affecting >$1M budget") and logs an audit trail for compliance (e.g., EU AI Act).

Key Features

  • Cynefin-based Routing: Automatic classification of query complexity across 5 domains.
  • Causal Inference Engine: Discover DAGs, estimate effects, and run refutation tests via DoWhy/EconML.
  • Bayesian Exploration: Quantify uncertainty and update beliefs with new evidence via PyMC.
  • Guardian Policy Layer: Multi-layer enforcement (YAML + CSL-Core + OPA), human-in-the-loop, and audit trails.
  • Currency-Aware Financial Guardrails: Guardian + CSL enforce monetary thresholds with FX normalization (CARF_FX_RATES_JSON) and fail-safe blocking when conversion evidence is unavailable.
  • ChimeraOracle Fast Predictions: Pre-trained CausalForestDML models for <100ms causal effect scoring.
  • What-If Simulation Framework: Multi-scenario comparison with 6 built-in realistic data generators.
  • CSL-Core Policy Verification: Formal, deterministic policy rules with fail-closed safety.
  • Policy Scaffolding & Refinement: Auto-generate domain-specific policies with adaptive refinement agents.
  • Four-View Dashboard: Tailored views for Analysts, Developers, Executives, and Governance.
  • Dark/Light Theme: Full dark mode support with system preference detection.
  • Actionable Insights: Persona-specific recommendations, action items with effort badges, and analysis roadmaps.
  • Smart Reflector: Hybrid heuristic + LLM self-correction for policy violations with observability.
  • Experience Buffer: Semantic memory using sentence-transformers (all-MiniLM-L6-v2) with TF-IDF fallback for similar past analysis retrieval and domain pattern tracking.
  • Library API: Notebook-friendly wrappers (from src.api.library import classify_query, run_pipeline).
  • Agent Transparency: Track LLM usage, latency, cost, and quality scores across workflows.
  • Multi-Source Data Loading: Load data from JSON, CSV, APIs, or Neo4j with automatic quality assessment.
  • Streaming Query Mode: Server-sent events for real-time progressive responses.
  • EU AI Act Compliance: Built-in compliance reporting and audit trail generation.
  • Data Lineage Tracking: Full provenance chain for audit and reproducibility.
  • Router Retraining Pipeline: Extract domain override feedback for DistilBERT fine-tuning.
  • MCP Server: 18 cognitive tools exposed via Model Context Protocol for agentic AI integration.
  • Agentic Chat Actions: Natural-language UI actions (e.g., onboarding launch, latest-analysis simulation compare, governance tab routing).
  • Governance Semantic Graph: Purpose-built policy/domain/conflict topology view with explainability (Why this?, How confident?, Based on what?).
  • RAG-Augmented Policy Search: In-memory retrieval-augmented generation for governance policy queries with auto-ingestion at startup.
  • Agent Memory: Persistent agent memory with compaction and recall for cross-session knowledge retention.
  • Document Processor: Upload and ingest PDF/text documents for RAG indexing and policy extraction.
  • Embedding Engine: Sentence-transformer embeddings (all-MiniLM-L6-v2) with TF-IDF fallback for semantic search.
  • Deployment Profiles: Environment-aware presets (research/staging/production) controlling CORS, auth, rate limiting, and governance defaults.
  • Security Middleware: Profile-aware API key auth, per-IP rate limiting, and request size enforcement.
  • Causal World Model (Phase 17): Structural Causal Models with do-calculus interventions, forward simulation, and Pearl's 3-step counterfactual reasoning.
  • Neurosymbolic Engine (Phase 17): Tight neural-symbolic loop — LLM fact extraction, forward-chaining, shortcut detection, Neo4j graph grounding.
  • H-Neuron Sentinel (Phase 17): Hallucination detection via weighted signal fusion (8 signals, configurable weights).
  • 3-Layer NeSy-Augmented RAG (Phase 17): Vector + Graph + Symbolic retrieval with Reciprocal Rank Fusion.
  • Firebase Auth + Cloud SQL (Phase 17): JWT authentication, SQLite/PostgreSQL factory, per-user analysis history.
  • Drift Detection (Phase 18): KL-divergence monitoring of routing distribution for feedback loop safety.
  • Bias Auditing (Phase 18): Chi-squared fairness tests on accumulated agent memory.
  • Plateau Detection (Phase 18): Convergence monitoring for router retraining pipeline.
  • ChimeraOracle Fast-Path (Phase 18): StateGraph-integrated fast causal predictions with Guardian enforcement.
  • Supervised Recursive Refinement (SRR): Formally bounded self-improvement model — 4 RSI safety gaps closed, TLA+ verified.

Data & Analytical Flows in CARF architecture

Dataflow blueprint

User Interface -- React Cockpit (CARF Cockpit)


Benchmark Results

CARF is evaluated against 43 falsifiable hypotheses (H0--H43) across 11 benchmark categories, using synthetic and realistic enterprise data with known ground truth and a raw LLM baseline (same model, no pipeline) for comparison. All benchmarks use fixed random seeds for full reproducibility.

Overall Grade: A+ -- 43/43 Hypotheses Passed (100%)

# Hypothesis Measured Threshold Result
H0 Router Accuracy -- Cynefin classification on 456 queries 89.5% (F1 0.895) >= 85% PASS
H1 Causal Accuracy -- DoWhy ATE vs raw LLM MSE ratio 0.0009 (1,138x more accurate) >= 50% lower PASS
H2 Bayesian Calibration -- posterior coverage 100% well-calibrated >= 90% PASS
H3 Violation Detection -- Guardian catches all violations 100% detection 100% PASS
H4 Determinism -- same input, same Guardian decision 100% (50x repetitions) 100% PASS
H5 EU AI Act Compliance -- Art. 9, 12, 13, 14 100% >= 90% PASS
H6 Latency Overhead -- CARF vs raw LLM 1.9x <= 5x PASS
H7 Hallucination Reduction -- grounded queries 100% (0% both) >= 40% PASS
H8 ChimeraOracle Speedup -- fast causal predictions 40.7x faster >= 10x PASS
H9 Memory Stability -- 500+ queries -37.3% RSS growth <= 10% PASS
H10 MAP Accuracy -- cross-domain link detection 90% >= 70% PASS
H11 PRICE Precision -- cost computation 100% (max err 2.8e-05) >= 95% PASS
H12 Governance Latency -- P95 non-blocking 0.58ms < 50ms PASS
H13 PRICE Expanded -- 15-case cost test 100% >= 95% PASS
H14 RESOLVE Conflict Detection -- 30 cases 86.7% >= 80% PASS
H15 Board Lifecycle -- CRUD operations 100% 100% PASS
H16 Policy Roundtrip -- YAML export/import fidelity 100% >= 95% PASS
H17 Counterfactual Accuracy -- vs raw LLM +25pp (CARF 100% vs LLM 75%) >= 10pp PASS
H18 Tau-Bench Agent Compliance -- policy-guided 100% (30/30) >= 95% PASS
H19 Hallucination at Scale -- rate ceiling 7.0% <= 10% PASS
H21 Cross-LLM Agreement -- provider parity 100% >= 85% PASS
H22 CLEAR Composite -- cost/latency/efficacy/alignment 0.77 >= 0.75 PASS
H23 OWASP Injection Block -- prompt injection defense 100% >= 90% PASS
H24 Adversarial Causal Robustness 70% >= 70% PASS
H25 Red Team Defense -- 8 attack surfaces 100% >= 85% PASS
H26 Fairness -- demographic parity ratio 1.0 >= 0.80 PASS
H27 XAI Fidelity -- explainability quality 80% (3/3 dimensions) >= 80% PASS
H28 ALCOA+ Audit Trail -- compliance 100% >= 95% PASS
H29 Energy Proportionality -- Clear < Complicated < Complex 100% 100% PASS
H30 Scope 3 Attribution -- emission accuracy 100% >= 85% PASS
H31 SUS Usability -- System Usability Scale 68.4 >= 68 PASS
H32 Task Completion -- success rate 100% >= 90% PASS
H33 WCAG 2.2 Level A -- accessibility violations 0 0 PASS
H34 Supply Chain Prediction -- precision 94% >= 70% PASS
H35 Healthcare CATE -- vs RCT ground truth 98% >= 90% PASS
H36 Finance VaR -- Kupiec backtest p = 1.0 > 0.05 PASS
H37 Load Test -- P95 at 25 concurrent users 42ms <= 15s PASS
H38 Chaos Cascade -- containment rate 100% >= 80% PASS
H39 Soak Test -- memory growth over 1000 queries -1.5% <= 5% PASS
H40 Drift Detection -- routing shift sensitivity 100% sensitivity, 100% specificity >= 90% PASS
H41 Bias Audit -- memory corpus fairness detection 100% accuracy, 0% false alarm >= 90% PASS
H42 Plateau Detection -- retraining convergence 100% detection, 0% false plateau >= 90% PASS
H43 Fast-Path Guardian -- ChimeraOracle enforcement 100% (all paths through Guardian) 100% PASS

Full machine-readable results: benchmarks/reports/benchmark_report.json | Text report: benchmark_report.txt

Indicated Use Cases

Based on the benchmark evidence, CARF is particularly suited for:

Use Case Why CARF Helps Supporting Evidence
Causal Decision Support -- supply chain, marketing attribution, policy evaluation Separates cause from correlation with statistical rigor H1: 1,138x more accurate, H17: +25pp vs LLM on confounded scenarios
Risk Quantification Under Uncertainty -- investment, insurance, clinical trials Calibrated posteriors with epistemic/aleatoric decomposition H2: 100% calibrated across all Bayesian scenarios
Regulated AI Systems -- EU AI Act, financial audit, healthcare decision support Deterministic, compliant, and fully auditable H3--H5: 100% violation detection, determinism, and compliance
Enterprise Governance -- multi-domain policy orchestration, cost intelligence MAP-PRICE-RESOLVE framework with conflict detection and audit H10: 90% MAP accuracy, H11--H16: 100% cost precision, 86.7% conflict detection, full board lifecycle
Security-Critical Deployments -- financial services, government, healthcare Injection-proof, red-team-tested, fairness-verified H23: 100% OWASP block, H25: 100% red team defense, H26: perfect fairness
High-Throughput Analysis -- real-time scoring, batch processing Fast oracle + stable memory under sustained load H8: 40.7x speedup, H37: 42ms P95 at 25 users, H39: no memory growth
Strategic Analysis -- market entry, R&D allocation, scenario planning Cynefin routing ensures the right analytical method per problem type H0: 89.5% router accuracy, F1 = 0.895 across 5 domains
Operational Monitoring -- routing drift, memory bias, model staleness Continuous monitoring of self-improvement feedback loops H40-H42: 100% detection accuracy across drift, bias, and plateau scenarios

Benchmark Data Sources & Methodology

All evaluation data is synthetic with known ground truth, enabling objective measurement. No proprietary datasets are required to reproduce results.

Category Description Details
Causal (Synthetic) 5 DGP types with known ATEs (linear, nonlinear, interaction, threshold, confounded) n=500 each, 60 scenarios with CI calibration
Causal (Industry) 5 sector-specific DGPs with realistic confounding Supply chain, Healthcare, Marketing, Sustainability, Education
Bayesian 8 scenarios (4 continuous, 4 binomial) Known ground truth posteriors for calibration checking
Router 456-query labeled test set across 5 Cynefin domains Clear (101), Complicated (102), Complex (101), Chaotic (50), Disorder (102)
Governance MAP (50), PRICE (15), RESOLVE (30), Tau-Bench (30), board lifecycle, policy roundtrip Cross-domain link detection, cost precision, conflict detection, agent compliance
Security OWASP LLM Top 10 (45 cases), Red Team (8 surfaces, 40 attacks) Injection, PII detection, sanitization, multi-vector adversarial
Compliance Fairness (80 variations), XAI fidelity, ALCOA+ audit (50 queries) Demographic parity, explanation stability, audit trail completeness
Sustainability Energy proportionality per domain, Scope 3 attribution Clear < Complicated < Complex energy ordering
Industry Supply chain prediction, Healthcare CATE, Finance VaR Disruption lead time, treatment effect vs RCT, Kupiec backtest
UX SUS usability (68.4), task completion, WCAG 2.2 Level A System Usability Scale, success rate, accessibility audit
Performance Load (1--25 concurrent), chaos cascade, soak (1000 queries) P95 latency, fault containment, memory stability
Monitoring Drift (5 enterprise scenarios), Bias (5 memory corpus patterns), Plateau (5 training curves), Guardian enforcement Enterprise-realistic routing distributions, DistilBERT training curves
Baselines Raw LLM (same model, no pipeline) on identical data DeepSeek without CARF pipeline for fair comparison

Benchmark reports include two quality gates:

  1. Performance gate: hypothesis pass/fail and grade (A+ to D)
  2. Realism gate: realism (55/100), reliability (81/100), feasibility (89/100), evidence (100/100) from realism_manifest.json
# Generate reports
python benchmarks/reports/generate_report.py
python benchmarks/reports/check_result_evidence.py

Quick Start

Option 1: Local Development (Recommended)

# Clone the repository
git clone https://github.com/eljaplacido/projectcarfcynepic.git
cd projectcarf

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start the API server
python -m src.main

# In a new terminal, start the React cockpit
cd carf-cockpit
npm install
npm run dev

Option 2: Docker Compose (Full Stack)

# Start all services (API, Dashboard, Neo4j, Kafka, OPA)
docker compose up --build

# With demo data seeding
docker compose --profile demo up --build

Services:

Option 3: Test Mode (No API Keys Required)

# Set test mode to use offline stubs

# Linux/macOS:
export CARF_TEST_MODE=1

# Windows PowerShell:
$env:CARF_TEST_MODE="1"

# Run with mocked LLM responses
python -m src.main

Configuration

Create a .env file in the project root:

# Required: LLM Provider
LLM_PROVIDER=deepseek          # or "openai"
DEEPSEEK_API_KEY=sk-...        # Your DeepSeek API key
# OPENAI_API_KEY=sk-...        # If using OpenAI

# Optional: Human-in-the-Loop
HUMANLAYER_API_KEY=hl-...      # For Slack/Email approvals

# Optional: Observability
LANGSMITH_API_KEY=ls-...       # For LangSmith tracing

# Optional: Data Storage
CARF_DATA_DIR=./var            # Dataset storage location

# Optional: Services
NEO4J_URI=bolt://localhost:7687
KAFKA_ENABLED=false
OPA_ENABLED=false

# Optional: CSL-Core Policy Engine
CSL_ENABLED=false              # Enable formal policy verification
CSL_POLICY_DIR=config/policies # Directory for CSL policy files
CSL_FAIL_CLOSED=true           # Fail-closed on CSL errors (recommended)

# Optional: Currency normalization for financial policies
CARF_FX_RATES_JSON={"USD":1.0,"EUR":1.08,"JPY":0.0067}

Library Usage (Notebooks & Data Pipelines)

CARF services can be used directly in Jupyter notebooks or Python scripts:

from src.api.library import classify_query, run_causal, run_bayesian, run_pipeline, query_memory

# Classify a query
result = await classify_query("Why did costs increase 15%?")
print(result["domain"], result["confidence"])

# Run full pipeline
pipeline = await run_pipeline("Does supplier diversification reduce disruptions?")
print(pipeline["response"])

# Search past analyses
similar = await query_memory("supply chain risk")
print(similar["matches"])

Core Architecture

Query -> Memory Augmentation -> Cynefin Router -> RAG Context (3-layer) ->
  Clear        -> Deterministic Runner (lookup)
  Complicated  -> ChimeraOracle Fast-Path (if model available, Phase 18)
              OR Causal Inference Engine (DoWhy/EconML)
  Complex      -> Bayesian Active Inference (PyMC)
  Chaotic      -> Circuit Breaker (emergency stop)
  Disorder     -> Human Escalation

All paths -> H-Neuron Sentinel -> Guardian (YAML + CSL-Core + OPA) ->
  Approved  -> Governance (MAP-PRICE-RESOLVE) -> END
  Rejected  -> Smart Reflector (heuristic + LLM repair, max 2 retries) -> Retry
  Escalated -> HumanLayer (3-point context) -> END

All results -> Drift Detector (routing distribution monitoring, Phase 18)
           -> Experience Buffer + Agent Memory (semantic retrieval)
           -> Kafka (audit trail)

Cynefin Domains

Domain Description Handler Use Case
Clear Cause-effect obvious Deterministic automation Standard procedures
Complicated Requires expert analysis Causal inference engine Impact estimation
Complex Emergent, probe required Bayesian active inference Uncertainty exploration
Chaotic Crisis mode Circuit breaker Emergency response
Disorder Cannot classify Human escalation Ambiguous inputs

API Endpoints

Core Endpoints

Endpoint Method Description
/health GET System health check
/query POST Process query through CARF pipeline
/query/transparent POST Query with full transparency metrics
/domains GET List Cynefin domains
/scenarios GET List demo scenarios
/scenarios/{id} GET Fetch scenario payload
/datasets POST Upload dataset to registry
/datasets GET List stored datasets
/datasets/{id}/preview GET Preview dataset rows

Insights & Transparency Endpoints

Endpoint Method Description
/insights/generate POST Generate persona-based insights
/insights/enhanced POST Enhanced insights with action items and roadmap
/insights/types GET List available insight types
/experience/similar GET Find similar past analyses (semantic memory)
/experience/patterns GET Aggregated domain-level patterns
/transparency/reliability POST Assess analysis reliability
/transparency/agents GET Get agent registry info
/guardian/status GET Get compliance status
/guardian/policies GET List configured policies
/feedback/retraining-readiness GET Check Router retraining readiness

Agent Tracking Endpoints

Endpoint Method Description
/workflow/start POST Start workflow tracking
/workflow/complete POST Complete workflow and aggregate metrics
/workflow/trace/{id} GET Get full execution trace
/workflow/recent GET Get recent workflow traces
/agents/stats GET Get agent performance statistics
/agents/comparison GET Get agent comparison data

Data Loading Endpoints

Endpoint Method Description
/data/load/json POST Load JSON data with quality assessment
/data/load/csv POST Load CSV data with quality assessment
/data/{id} GET Retrieve loaded data by ID
/data/quality/levels GET Get available quality levels
/data/detect-schema POST Auto-detect schema from uploaded data
/data/cache DELETE Clear data cache

Simulation & What-If Endpoints

Endpoint Method Description
/simulations/run POST Run what-if scenario simulation
/simulations/compare POST Compare multiple simulation results
/simulations/{id}/status GET Get simulation status
/simulations/{id}/rerun POST Rerun a simulation
/simulations/generators GET List available data generators
/simulations/generate POST Generate synthetic data with causal structure
/simulations/assess-realism POST Assess scenario realism score
/simulations/run-transparent POST Enhanced simulation with transparency

ChimeraOracle (Fast Predictions) Endpoints

Endpoint Method Description
/oracle/models GET List trained oracle models
/oracle/train POST Train CausalForestDML model on scenario data
/oracle/predict POST Fast causal prediction (<100ms)
/oracle/models/{id} GET Get model metadata for scenario

Visualization & Configuration Endpoints

Endpoint Method Description
/api/visualization-config GET Combined domain + context visualization config
/config/visualization GET Context-aware visualization settings
/config/status GET System configuration status
/config/validate POST Validate configuration
/router/config GET/PUT/PATCH Manage Cynefin Router configuration
/guardian/config GET/PUT/PATCH Manage Guardian policy configuration

Advanced Query Modes

Endpoint Method Description
/query/stream POST Streaming query with server-sent events
/query/fast POST Fast query mode via Chimera Oracle
/chat POST Chat interface with Socratic mode

Escalation & Compliance Endpoints

Endpoint Method Description
/escalations GET List pending human escalations
/escalations/{id} GET Get escalation details
/escalations/{id}/resolve POST Resolve an escalation
/transparency/compliance POST EU AI Act compliance report
/transparency/data-quality POST Assess data quality
/transparency/guardian POST Guardian decision transparency
/sessions/{id}/lineage GET Data lineage and provenance tracking

Governance & Policy Endpoints (18 endpoints)

Endpoint Method Description
/governance/domains GET/POST List or create governance domains
/governance/policies GET/POST List or create federated policies
/governance/policies/{ns} PUT/DELETE Update or remove a policy by namespace
/governance/policies/extract POST Extract governance rules from unstructured policy text (LLM-powered)
/governance/conflicts GET List policy conflicts (optionally unresolved only)
/governance/conflicts/{id}/resolve POST Resolve a detected policy conflict
/governance/triples GET Query MAP context triples
/governance/triples/impact/{domain} GET Triple impact analysis for a domain
/governance/compliance/{framework} GET Compliance score for EU AI Act, CSRD, GDPR, ISO 27001
/governance/cost/breakdown/{session} GET Token-level cost breakdown per session
/governance/cost/aggregate GET Aggregated cost intelligence across sessions
/governance/cost/roi GET ROI analysis for LLM spend
/governance/audit GET Governance audit log (filterable)
/governance/health GET Governance subsystem health check
/governance/semantic-graph GET Semantic governance topology (domains, policies, conflicts, MAP triples)
/governance/boards GET/POST Governance board lifecycle management
/governance/boards/templates GET List governance board templates
/governance/boards/from-template POST Create board from template
/governance/boards/{id} GET/PUT/DELETE Board CRUD operations
/governance/boards/{id}/compliance GET Board-level compliance check
/governance/export POST Export governance spec (JSON/YAML)
/governance/seed/{template} POST Seed domain from template
/governance/rag/status GET RAG index status
/governance/rag/query POST RAG-augmented policy search
/governance/rag/ingest-policies POST Re-ingest policies into RAG index
/governance/rag/ingest-text POST Ingest arbitrary text into RAG
/governance/documents/upload-file POST Upload document for RAG ingestion
/governance/documents/status GET Document processing status
/governance/memory/status GET Agent memory status
/governance/memory/compact POST Compact agent memory
/governance/memory/recall POST Recall from agent memory

Monitoring & Operational Intelligence Endpoints (Phase 18)

Endpoint Method Description
/monitoring/drift GET Routing distribution drift status (KL-divergence)
/monitoring/drift/history GET Recent drift detection snapshots
/monitoring/drift/reset POST Reset drift baseline
/monitoring/bias-audit GET Run bias audit on agent memory (chi-squared, quality, verdicts)
/monitoring/convergence GET Retraining convergence/plateau status
/monitoring/convergence/record POST Record retraining accuracy measurement
/monitoring/status GET Unified monitoring status (drift + bias + convergence)

Developer & Diagnostics Endpoints

Endpoint Method Description
/developer/state GET Full system state dump
/developer/logs GET Filtered log entries (layer, level, limit)
/developer/ws WebSocket Live log streaming
/analyze POST File analysis for CSV/JSON
/agent/suggest-improvements POST Automated improvement suggestions
/explain POST Generate explanations for analyses
/explain/{domain}/{element} GET Domain-specific element explanations
/benchmarks/run-all POST Run all benchmark suites
/feedback POST Submit analysis feedback
/summary/executive POST Generate executive summary

Example API Calls

Simple Query:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Why did our costs increase by 15%?"}'

Causal Analysis:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Estimate impact of discount on churn",
    "causal_estimation": {
      "treatment": "discount",
      "outcome": "churn",
      "covariates": ["region", "tenure"],
      "data": [
        {"discount": 0.1, "churn": 0, "region": "NA", "tenure": 12},
        {"discount": 0.0, "churn": 1, "region": "EU", "tenure": 3}
      ]
    }
  }'

Bayesian Inference:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Update belief on conversion rate",
    "bayesian_inference": {
      "successes": 42,
      "trials": 100
    }
  }'

Get Started: Test the Platform

Tip

The platform is ready to test with built-in demo scenarios or your own data. No complex setup needed -- just start the servers and explore.

Option A: Run a Pre-Built Demo Scenario

CYNEPIC includes 17 pre-built scenarios covering all 5 Cynefin domains across 7 verticals:

Scenario Domain Analysis Type What It Tests
Scope 3 Attribution Complicated Causal (DoWhy) Supplier sustainability impact on emissions (2000 records)
Discount vs Churn Complicated Causal (DoWhy) Causal effect of discounts on customer retention (2000 records)
Conversion Belief Update Complex Bayesian (PyMC) Prior/posterior belief updates with binomial data
Renewable Energy ROI Complicated Causal (DoWhy) ROI estimation across facilities with regional variation (800 records)
Shipping Mode Analysis Complicated Causal (DoWhy) Carbon footprint impact of freight mode switching (1200 records)
Supply Chain Resilience Complicated Causal (DoWhy) Climate stress impact on disruption risk (2000 records)
Pricing Optimization Complicated Causal (DoWhy) Price elasticity and sales volume effects (1500 records)
Market Adoption Complex Bayesian (PyMC) Uncertainty modeling for new product launch
Crisis Response Chaotic Circuit Breaker Critical supplier failure requiring immediate stabilization
Inventory Data Lookup Clear Deterministic Simple stock level and product queries
CSRD Double Materiality Complicated Causal (DoWhy) Climate transition risk impact on operating costs (ESRS)
ESRS E1 Climate Disclosure Complicated Causal (DoWhy) Emission reduction program effectiveness analysis
ESRS S1 Workforce Assessment Complicated Causal (DoWhy) Training investment impact on workforce productivity
Energy Mix Optimization Complicated Causal (DoWhy) Renewable energy mix cost/target optimization
Energy Demand Forecast Complex Bayesian (PyMC) Seasonal energy demand uncertainty modeling
Manufacturing Quality Control Complicated Causal (DoWhy) Process temperature effect on defect rates
Process Line Optimization Complicated Causal (DoWhy) Production parameter throughput optimization

To run a demo:

  1. Open the React dashboard: http://localhost:5175
  2. Select a scenario card from the list.
  3. Click a suggested query to run the analysis.
  4. Explore the Cynefin classification, Causal DAG, and Guardian Panel.

See docs/DEMO_WALKTHROUGH.md for a step-by-step guide.

Option B: Onboard Your Own Data

Bring your own CSV to run causal analysis:

  1. Generate Sample Data (optional): python scripts/generate_chain_data.py to create supply_chain_resilience.csv.
  2. Open Data Onboarding: In the dashboard, click "Upload your own data".
  3. Map Variables: Identify the Treatment (e.g., climate_stress_index), Outcome (e.g., disruption_risk_percent), and Confounders.
  4. Run Analysis: The platform will automatically classify the query, build a causal model, and display results.

See docs/END_USER_TESTING_GUIDE.md for detailed instructions.


Project Structure

projectcarf/
├── src/
│   ├── core/           # State schemas (EpistemicState), LLM config, deployment profiles, database
│   ├── services/       # 30+ services: Causal, Bayesian, World Model, NeSy Engine,
│   │                   # ChimeraOracle, H-Neuron, Drift Detector, Bias Auditor,
│   │                   # Governance, RAG, Agent Memory, Embedding Engine
│   ├── workflows/      # LangGraph graph (incl. chimera_fast_path), Guardian, Router
│   ├── utils/          # Telemetry, caching, circuit breaker, currency normalization
│   ├── api/            # FastAPI routers (17 routers, 90+ endpoints)
│   ├── mcp/            # MCP server (18 cognitive tools for agentic integration)
│   └── main.py         # FastAPI entry point
├── carf-cockpit/       # React (Vite + TypeScript) dashboard — 59 components, 4 views
├── config/
│   ├── agents.yaml     # Agent configurations
│   ├── policies.yaml   # Guardian YAML policies
│   ├── policies/       # CSL-Core formal policy definitions (35 rules)
│   ├── federated_policies/ # Domain-owner governance policies (6 YAML files)
│   ├── governance_boards/  # Compliance board templates (EU AI Act, CSRD, etc.)
│   └── policy_scaffolds/   # Domain-specific policy templates
├── models/             # Trained models (DistilBERT router + 5 CausalForest models)
├── demo/               # 17 demo scenarios + 11 sample datasets
├── tests/
│   ├── unit/           # 58+ unit test files (1,130+ tests)
│   ├── deepeval/       # LLM quality evaluation tests
│   ├── e2e/            # End-to-end gold standard tests
│   └── integration/    # API flow integration tests
├── benchmarks/         # Technical & use-case benchmarks (43 hypotheses + realism validation)
├── tla_specs/          # TLA+ formal verification specs (StateGraph, EscalationProtocol)
├── .agent/skills/      # 12 agent skills
├── scripts/            # 13 scripts (training, generation, migration, seeding)
├── docs/               # 40+ documentation files
└── docker-compose.yml  # Full stack deployment

Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=src --cov-report=term-missing

# Run manual test suite
python scripts/test_carf.py

# Type checking
mypy src/ --strict

# Linting
ruff check src/ tests/

LLM Quality Evaluation (DeepEval)

CARF includes comprehensive LLM output quality evaluation using DeepEval:

# Install with evaluation dependencies
pip install -e ".[dev,evaluation]"

# Run DeepEval tests
pytest tests/deepeval/ -v

# Run with DeepEval CLI (parallel execution)
deepeval test run tests/deepeval/ -n 4

Quality Metrics Evaluated:

  • Relevancy Score: How well responses address user queries
  • Hallucination Risk: Detection of fabricated content
  • Reasoning Depth: Quality of reasoning and justification
  • UIX Compliance: Adherence to transparency standards (Why? How confident? Based on what?)

See Evaluation Framework Documentation for details.

Dashboard Views

Analyst View

  • Query input with intelligent suggestions
  • Simulation controls (sliders)
  • Cynefin classification with domain scores and entropy
  • Bayesian belief state with distribution chart
  • Causal DAG visualization
  • Guardian policy check with approval workflow
  • Transparency Panel with agent reliability and data quality
  • Insights Panel with actionable recommendations

Developer View

  • Execution trace timeline
  • Performance metrics (latency, tokens, cost)
  • DAG structure explorer
  • State snapshots (JSON)
  • Agent Comparison Panel with LLM performance tracking
  • Data Flow Visualization
  • Live log streaming via WebSocket
  • DeepEval quality metrics integration
  • Monitoring Panel (Phase 18): Drift detection, bias audit, retraining convergence
  • Domain-Specific Views for all 5 Cynefin domains:
    • Clear: Decision checklist with step tracking
    • Complicated: Expert analysis with causal effect summary
    • Complex: Uncertainty exploration with epistemic/aleatoric breakdown
    • Chaotic: Circuit breaker with rapid response controls
    • Disorder: Clarification prompts with human escalation

Executive View

  • Expected impact hero card
  • Dynamic KPI Dashboard (0-10 scoring with real data)
  • Routing Drift, Memory Bias, Retraining Health KPI cards (Phase 18)
  • Proposed action summary
  • Policy compliance overview
  • Actionable Insights for decision-makers
  • Export and share functionality

Governance View

  • Spec Map: ReactFlow visualization of governance domains and policy nodes
  • Cost Intelligence: KPI cards with recharts cost breakdown (token pricing, ROI, risk exposure)
  • Policy Federation: Domain sidebar, policy cards with conflict detection and resolution
  • Compliance Audit: Framework selector (EU AI Act, CSRD, GDPR, ISO 27001), score gauge, article accordion
  • Semantic Graph: Interactive policy/conflict topology with explainability annotations
  • Policy Ingestion: Upload documents for RAG indexing and automated rule extraction
  • Monitoring (Phase 18): Drift detection, bias audit, plateau detection — operational intelligence for SRR safety

Data Visualization

  • PlotlyChart unified wrapper supporting waterfall, radar, sankey, and gauge charts
  • CynefinVizConfig backend-driven domain-specific visualization strategy
  • Context-adaptive charts: color schemes, chart types, and interaction modes adapt per Cynefin domain and business context (sustainability, financial, operational, risk)
  • useVisualizationConfig React hook with caching and offline fallbacks

Theme Support

  • Dark/Light Mode Toggle in header
  • System preference detection
  • Persistent theme preference (localStorage)

Documentation

Getting Started

Architecture & Design

Operations & Integration

For Contributors

We welcome contributions! Please review the following guides based on your interest:

I want to... Start here
Add a new demo use case docs/RFC_UIX_001_SCENARIO_REGISTRY.md - How to add new scenarios to the registry.
Integrate CYNEPIC with another system (ERP, Cloud) docs/INTEGRATION_GUIDE.md - API usage, data ingestion patterns, security.
Test the platform demos end-to-end docs/DEMO_WALKTHROUGH.md and docs/END_USER_TESTING_GUIDE.md
Understand the future vision docs/FUTURE_ROADMAP.md - Planned features and areas for improvement.
Review contribution guidelines CONTRIBUTING.md - Code standards, commit messages, PR process.

Quick Contribution Steps

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make changes and run tests: pytest tests/ -v
  4. Commit with a descriptive message: git commit -m "Add my feature"
  5. Push: git push origin feature/my-feature
  6. Open a Pull Request.

License

Business Source License 1.1 (BSL) - see LICENSE for details. For production use, see COMMERCIAL_LICENSE.

Commercial Use

This software is source-available under BSL 1.1. You may freely use it for development, testing, academic research, and personal projects. Production use is permitted provided it is not competitive with Cisuregen's products (see LICENSE for exact terms).

If your use case involves offering CARF-based functionality as a hosted service or embedding it in a commercial product, contact licensing@cisuregen.com for a commercial license.

On February 19, 2030, this version converts to Apache License 2.0.

Intellectual Property

The architectural innovations in this project -- including the entropy-aware Cynefin routing mechanism, the deterministic Guardian policy enforcement layer, the integrated causal-Bayesian-neurosymbolic pipeline, and the EpistemicState provenance schema -- are original works of Cisuregen. See IP_CLASSIFICATION.md for the full IP tier mapping.

"CARF", "CYNEPIC", and related marks are trademarks of Cisuregen. See NOTICE for trademark usage guidelines.

Acknowledgments