Project CARF: CYNEPIC Architecture 0.5

**Complexity-Adaptive & Context-Aware Reasoning Fabric** - A research grade Architectural Blueprint & Decision Intelligence Simulation for more reliable and transparent data-driven decision making and Agentic AI Systems, combining complexity & context adaptability, causal inference, bayesian methods to quantify uncertainty & complexity and epistemic awareness.

Overview

Modern AI systems often act as black boxes, producing confident-sounding outputs without clarifying their reasoning, certainty level, or the nature of the problem they're addressing. This creates a "trust gap": users cannot easily distinguish whether an AI answer is based on solid causal evidence, a probabilistic inference, or a simple guess.

CYNEPIC (CYNefin-EPIstemic Cockpit) solves this by enforcing epistemic awareness. The system explicitly classifies every query by its inherent complexity using the Cynefin Framework, then routes it to the appropriate analytical engine -- ensuring the right tool is used for the right problem.

Problem Type	Analysis Method	Example
Clear (Obvious)	Rule lookup	"What is the capital of France?"
Complicated (Knowable)	Causal Inference (DoWhy)	"Does offering a discount reduce churn?"
Complex (Emergent)	Bayesian Inference (PyMC)	"What is the likely conversion rate?"
Chaotic (Crisis)	Circuit Breaker	System alert, require human action
Disorder (Ambiguous)	Human Escalation	Input is unclear or contradictory

All outputs are filtered through a Guardian Layer that enforces organizational policies (e.g., "require human approval for decisions affecting >$1M budget") and logs an audit trail for compliance (e.g., EU AI Act).

Key Features

Cynefin-based Routing: Automatic classification of query complexity across 5 domains.
Causal Inference Engine: Discover DAGs, estimate effects, and run refutation tests via DoWhy/EconML.
Bayesian Exploration: Quantify uncertainty and update beliefs with new evidence via PyMC.
Guardian Policy Layer: Multi-layer enforcement (YAML + CSL-Core + OPA), human-in-the-loop, and audit trails.
Currency-Aware Financial Guardrails: Guardian + CSL enforce monetary thresholds with FX normalization (CARF_FX_RATES_JSON) and fail-safe blocking when conversion evidence is unavailable.
ChimeraOracle Fast Predictions: Pre-trained CausalForestDML models for <100ms causal effect scoring.
What-If Simulation Framework: Multi-scenario comparison with 6 built-in realistic data generators.
CSL-Core Policy Verification: Formal, deterministic policy rules with fail-closed safety.
Policy Scaffolding & Refinement: Auto-generate domain-specific policies with adaptive refinement agents.
Four-View Dashboard: Tailored views for Analysts, Developers, Executives, and Governance.
Dark/Light Theme: Full dark mode support with system preference detection.
Actionable Insights: Persona-specific recommendations, action items with effort badges, and analysis roadmaps.
Smart Reflector: Hybrid heuristic + LLM self-correction for policy violations with observability.
Experience Buffer: Semantic memory using sentence-transformers (all-MiniLM-L6-v2) with TF-IDF fallback for similar past analysis retrieval and domain pattern tracking.
Library API: Notebook-friendly wrappers (from src.api.library import classify_query, run_pipeline).
Agent Transparency: Track LLM usage, latency, cost, and quality scores across workflows.
Multi-Source Data Loading: Load data from JSON, CSV, APIs, or Neo4j with automatic quality assessment.
Streaming Query Mode: Server-sent events for real-time progressive responses.
EU AI Act Compliance: Built-in compliance reporting and audit trail generation.
Data Lineage Tracking: Full provenance chain for audit and reproducibility.
Router Retraining Pipeline: Extract domain override feedback for DistilBERT fine-tuning.
MCP Server: 18 cognitive tools exposed via Model Context Protocol for agentic AI integration.
Agentic Chat Actions: Natural-language UI actions (e.g., onboarding launch, latest-analysis simulation compare, governance tab routing).
Governance Semantic Graph: Purpose-built policy/domain/conflict topology view with explainability (Why this?, How confident?, Based on what?).
RAG-Augmented Policy Search: In-memory retrieval-augmented generation for governance policy queries with auto-ingestion at startup.
Agent Memory: Persistent agent memory with compaction and recall for cross-session knowledge retention.
Document Processor: Upload and ingest PDF/text documents for RAG indexing and policy extraction.
Embedding Engine: Sentence-transformer embeddings (all-MiniLM-L6-v2) with TF-IDF fallback for semantic search.
Deployment Profiles: Environment-aware presets (research/staging/production) controlling CORS, auth, rate limiting, and governance defaults.
Security Middleware: Profile-aware API key auth, per-IP rate limiting, and request size enforcement.
Causal World Model (Phase 17): Structural Causal Models with do-calculus interventions, forward simulation, and Pearl's 3-step counterfactual reasoning.
Neurosymbolic Engine (Phase 17): Tight neural-symbolic loop — LLM fact extraction, forward-chaining, shortcut detection, Neo4j graph grounding.
H-Neuron Sentinel (Phase 17): Hallucination detection via weighted signal fusion (8 signals, configurable weights).
3-Layer NeSy-Augmented RAG (Phase 17): Vector + Graph + Symbolic retrieval with Reciprocal Rank Fusion.
Firebase Auth + Cloud SQL (Phase 17): JWT authentication, SQLite/PostgreSQL factory, per-user analysis history.
Drift Detection (Phase 18): KL-divergence monitoring of routing distribution for feedback loop safety.
Bias Auditing (Phase 18): Chi-squared fairness tests on accumulated agent memory.
Plateau Detection (Phase 18): Convergence monitoring for router retraining pipeline.
ChimeraOracle Fast-Path (Phase 18): StateGraph-integrated fast causal predictions with Guardian enforcement.
Supervised Recursive Refinement (SRR): Formally bounded self-improvement model — 4 RSI safety gaps closed, TLA+ verified.

Data & Analytical Flows in CARF architecture

User Interface -- React Cockpit (CARF Cockpit)

Benchmark Results

CARF is evaluated against 43 falsifiable hypotheses (H0--H43) across 11 benchmark categories, using synthetic and realistic enterprise data with known ground truth and a raw LLM baseline (same model, no pipeline) for comparison. All benchmarks use fixed random seeds for full reproducibility.

Overall Grade: A+ -- 43/43 Hypotheses Passed (100%)

#	Hypothesis	Measured	Threshold	Result
H0	Router Accuracy -- Cynefin classification on 456 queries	89.5% (F1 0.895)	>= 85%	PASS
H1	Causal Accuracy -- DoWhy ATE vs raw LLM	MSE ratio 0.0009 (1,138x more accurate)	>= 50% lower	PASS
H2	Bayesian Calibration -- posterior coverage	100% well-calibrated	>= 90%	PASS
H3	Violation Detection -- Guardian catches all violations	100% detection	100%	PASS
H4	Determinism -- same input, same Guardian decision	100% (50x repetitions)	100%	PASS
H5	EU AI Act Compliance -- Art. 9, 12, 13, 14	100%	>= 90%	PASS
H6	Latency Overhead -- CARF vs raw LLM	1.9x	<= 5x	PASS
H7	Hallucination Reduction -- grounded queries	100% (0% both)	>= 40%	PASS
H8	ChimeraOracle Speedup -- fast causal predictions	40.7x faster	>= 10x	PASS
H9	Memory Stability -- 500+ queries	-37.3% RSS growth	<= 10%	PASS
H10	MAP Accuracy -- cross-domain link detection	90%	>= 70%	PASS
H11	PRICE Precision -- cost computation	100% (max err 2.8e-05)	>= 95%	PASS
H12	Governance Latency -- P95 non-blocking	0.58ms	< 50ms	PASS
H13	PRICE Expanded -- 15-case cost test	100%	>= 95%	PASS
H14	RESOLVE Conflict Detection -- 30 cases	86.7%	>= 80%	PASS
H15	Board Lifecycle -- CRUD operations	100%	100%	PASS
H16	Policy Roundtrip -- YAML export/import fidelity	100%	>= 95%	PASS
H17	Counterfactual Accuracy -- vs raw LLM	+25pp (CARF 100% vs LLM 75%)	>= 10pp	PASS
H18	Tau-Bench Agent Compliance -- policy-guided	100% (30/30)	>= 95%	PASS
H19	Hallucination at Scale -- rate ceiling	7.0%	<= 10%	PASS
H21	Cross-LLM Agreement -- provider parity	100%	>= 85%	PASS
H22	CLEAR Composite -- cost/latency/efficacy/alignment	0.77	>= 0.75	PASS
H23	OWASP Injection Block -- prompt injection defense	100%	>= 90%	PASS
H24	Adversarial Causal Robustness	70%	>= 70%	PASS
H25	Red Team Defense -- 8 attack surfaces	100%	>= 85%	PASS
H26	Fairness -- demographic parity ratio	1.0	>= 0.80	PASS
H27	XAI Fidelity -- explainability quality	80% (3/3 dimensions)	>= 80%	PASS
H28	ALCOA+ Audit Trail -- compliance	100%	>= 95%	PASS
H29	Energy Proportionality -- Clear < Complicated < Complex	100%	100%	PASS
H30	Scope 3 Attribution -- emission accuracy	100%	>= 85%	PASS
H31	SUS Usability -- System Usability Scale	68.4	>= 68	PASS
H32	Task Completion -- success rate	100%	>= 90%	PASS
H33	WCAG 2.2 Level A -- accessibility violations	0	0	PASS
H34	Supply Chain Prediction -- precision	94%	>= 70%	PASS
H35	Healthcare CATE -- vs RCT ground truth	98%	>= 90%	PASS
H36	Finance VaR -- Kupiec backtest	p = 1.0	> 0.05	PASS
H37	Load Test -- P95 at 25 concurrent users	42ms	<= 15s	PASS
H38	Chaos Cascade -- containment rate	100%	>= 80%	PASS
H39	Soak Test -- memory growth over 1000 queries	-1.5%	<= 5%	PASS
H40	Drift Detection -- routing shift sensitivity	100% sensitivity, 100% specificity	>= 90%	PASS
H41	Bias Audit -- memory corpus fairness detection	100% accuracy, 0% false alarm	>= 90%	PASS
H42	Plateau Detection -- retraining convergence	100% detection, 0% false plateau	>= 90%	PASS
H43	Fast-Path Guardian -- ChimeraOracle enforcement	100% (all paths through Guardian)	100%	PASS

Full machine-readable results: benchmarks/reports/benchmark_report.json | Text report: benchmark_report.txt

Indicated Use Cases

Based on the benchmark evidence, CARF is particularly suited for:

Use Case	Why CARF Helps	Supporting Evidence
Causal Decision Support -- supply chain, marketing attribution, policy evaluation	Separates cause from correlation with statistical rigor	H1: 1,138x more accurate, H17: +25pp vs LLM on confounded scenarios
Risk Quantification Under Uncertainty -- investment, insurance, clinical trials	Calibrated posteriors with epistemic/aleatoric decomposition	H2: 100% calibrated across all Bayesian scenarios
Regulated AI Systems -- EU AI Act, financial audit, healthcare decision support	Deterministic, compliant, and fully auditable	H3--H5: 100% violation detection, determinism, and compliance
Enterprise Governance -- multi-domain policy orchestration, cost intelligence	MAP-PRICE-RESOLVE framework with conflict detection and audit	H10: 90% MAP accuracy, H11--H16: 100% cost precision, 86.7% conflict detection, full board lifecycle
Security-Critical Deployments -- financial services, government, healthcare	Injection-proof, red-team-tested, fairness-verified	H23: 100% OWASP block, H25: 100% red team defense, H26: perfect fairness
High-Throughput Analysis -- real-time scoring, batch processing	Fast oracle + stable memory under sustained load	H8: 40.7x speedup, H37: 42ms P95 at 25 users, H39: no memory growth
Strategic Analysis -- market entry, R&D allocation, scenario planning	Cynefin routing ensures the right analytical method per problem type	H0: 89.5% router accuracy, F1 = 0.895 across 5 domains
Operational Monitoring -- routing drift, memory bias, model staleness	Continuous monitoring of self-improvement feedback loops	H40-H42: 100% detection accuracy across drift, bias, and plateau scenarios

Benchmark Data Sources & Methodology

All evaluation data is synthetic with known ground truth, enabling objective measurement. No proprietary datasets are required to reproduce results.

Category	Description	Details
Causal (Synthetic)	5 DGP types with known ATEs (linear, nonlinear, interaction, threshold, confounded)	n=500 each, 60 scenarios with CI calibration
Causal (Industry)	5 sector-specific DGPs with realistic confounding	Supply chain, Healthcare, Marketing, Sustainability, Education
Bayesian	8 scenarios (4 continuous, 4 binomial)	Known ground truth posteriors for calibration checking
Router	456-query labeled test set across 5 Cynefin domains	Clear (101), Complicated (102), Complex (101), Chaotic (50), Disorder (102)
Governance	MAP (50), PRICE (15), RESOLVE (30), Tau-Bench (30), board lifecycle, policy roundtrip	Cross-domain link detection, cost precision, conflict detection, agent compliance
Security	OWASP LLM Top 10 (45 cases), Red Team (8 surfaces, 40 attacks)	Injection, PII detection, sanitization, multi-vector adversarial
Compliance	Fairness (80 variations), XAI fidelity, ALCOA+ audit (50 queries)	Demographic parity, explanation stability, audit trail completeness
Sustainability	Energy proportionality per domain, Scope 3 attribution	Clear < Complicated < Complex energy ordering
Industry	Supply chain prediction, Healthcare CATE, Finance VaR	Disruption lead time, treatment effect vs RCT, Kupiec backtest
UX	SUS usability (68.4), task completion, WCAG 2.2 Level A	System Usability Scale, success rate, accessibility audit
Performance	Load (1--25 concurrent), chaos cascade, soak (1000 queries)	P95 latency, fault containment, memory stability
Monitoring	Drift (5 enterprise scenarios), Bias (5 memory corpus patterns), Plateau (5 training curves), Guardian enforcement	Enterprise-realistic routing distributions, DistilBERT training curves
Baselines	Raw LLM (same model, no pipeline) on identical data	DeepSeek without CARF pipeline for fair comparison

Benchmark reports include two quality gates:

Performance gate: hypothesis pass/fail and grade (A+ to D)
Realism gate: realism (55/100), reliability (81/100), feasibility (89/100), evidence (100/100) from realism_manifest.json

# Generate reports
python benchmarks/reports/generate_report.py
python benchmarks/reports/check_result_evidence.py

Quick Start

Option 1: Local Development (Recommended)

# Clone the repository
git clone https://github.com/eljaplacido/projectcarfcynepic.git
cd projectcarf

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start the API server
python -m src.main

# In a new terminal, start the React cockpit
cd carf-cockpit
npm install
npm run dev

Option 2: Docker Compose (Full Stack)

# Start all services (API, Dashboard, Neo4j, Kafka, OPA)
docker compose up --build

# With demo data seeding
docker compose --profile demo up --build

Services:

Option 3: Test Mode (No API Keys Required)

# Set test mode to use offline stubs

# Linux/macOS:
export CARF_TEST_MODE=1

# Windows PowerShell:
$env:CARF_TEST_MODE="1"

# Run with mocked LLM responses
python -m src.main

Configuration

Create a .env file in the project root:

# Required: LLM Provider
LLM_PROVIDER=deepseek          # or "openai"
DEEPSEEK_API_KEY=sk-...        # Your DeepSeek API key
# OPENAI_API_KEY=sk-...        # If using OpenAI

# Optional: Human-in-the-Loop
HUMANLAYER_API_KEY=hl-...      # For Slack/Email approvals

# Optional: Observability
LANGSMITH_API_KEY=ls-...       # For LangSmith tracing

# Optional: Data Storage
CARF_DATA_DIR=./var            # Dataset storage location

# Optional: Services
NEO4J_URI=bolt://localhost:7687
KAFKA_ENABLED=false
OPA_ENABLED=false

# Optional: CSL-Core Policy Engine
CSL_ENABLED=false              # Enable formal policy verification
CSL_POLICY_DIR=config/policies # Directory for CSL policy files
CSL_FAIL_CLOSED=true           # Fail-closed on CSL errors (recommended)

# Optional: Currency normalization for financial policies
CARF_FX_RATES_JSON={"USD":1.0,"EUR":1.08,"JPY":0.0067}

Library Usage (Notebooks & Data Pipelines)

CARF services can be used directly in Jupyter notebooks or Python scripts:

from src.api.library import classify_query, run_causal, run_bayesian, run_pipeline, query_memory

# Classify a query
result = await classify_query("Why did costs increase 15%?")
print(result["domain"], result["confidence"])

# Run full pipeline
pipeline = await run_pipeline("Does supplier diversification reduce disruptions?")
print(pipeline["response"])

# Search past analyses
similar = await query_memory("supply chain risk")
print(similar["matches"])

Core Architecture

Query -> Memory Augmentation -> Cynefin Router -> RAG Context (3-layer) ->
  Clear        -> Deterministic Runner (lookup)
  Complicated  -> ChimeraOracle Fast-Path (if model available, Phase 18)
              OR Causal Inference Engine (DoWhy/EconML)
  Complex      -> Bayesian Active Inference (PyMC)
  Chaotic      -> Circuit Breaker (emergency stop)
  Disorder     -> Human Escalation

All paths -> H-Neuron Sentinel -> Guardian (YAML + CSL-Core + OPA) ->
  Approved  -> Governance (MAP-PRICE-RESOLVE) -> END
  Rejected  -> Smart Reflector (heuristic + LLM repair, max 2 retries) -> Retry
  Escalated -> HumanLayer (3-point context) -> END

All results -> Drift Detector (routing distribution monitoring, Phase 18)
           -> Experience Buffer + Agent Memory (semantic retrieval)
           -> Kafka (audit trail)

Cynefin Domains

Domain	Description	Handler	Use Case
Clear	Cause-effect obvious	Deterministic automation	Standard procedures
Complicated	Requires expert analysis	Causal inference engine	Impact estimation
Complex	Emergent, probe required	Bayesian active inference	Uncertainty exploration
Chaotic	Crisis mode	Circuit breaker	Emergency response
Disorder	Cannot classify	Human escalation	Ambiguous inputs

API Endpoints

Core Endpoints

Endpoint	Method	Description
`/health`	GET	System health check
`/query`	POST	Process query through CARF pipeline
`/query/transparent`	POST	Query with full transparency metrics
`/domains`	GET	List Cynefin domains
`/scenarios`	GET	List demo scenarios
`/scenarios/{id}`	GET	Fetch scenario payload
`/datasets`	POST	Upload dataset to registry
`/datasets`	GET	List stored datasets
`/datasets/{id}/preview`	GET	Preview dataset rows

Insights & Transparency Endpoints

Endpoint	Method	Description
`/insights/generate`	POST	Generate persona-based insights
`/insights/enhanced`	POST	Enhanced insights with action items and roadmap
`/insights/types`	GET	List available insight types
`/experience/similar`	GET	Find similar past analyses (semantic memory)
`/experience/patterns`	GET	Aggregated domain-level patterns
`/transparency/reliability`	POST	Assess analysis reliability
`/transparency/agents`	GET	Get agent registry info
`/guardian/status`	GET	Get compliance status
`/guardian/policies`	GET	List configured policies
`/feedback/retraining-readiness`	GET	Check Router retraining readiness

Agent Tracking Endpoints

Endpoint	Method	Description
`/workflow/start`	POST	Start workflow tracking
`/workflow/complete`	POST	Complete workflow and aggregate metrics
`/workflow/trace/{id}`	GET	Get full execution trace
`/workflow/recent`	GET	Get recent workflow traces
`/agents/stats`	GET	Get agent performance statistics
`/agents/comparison`	GET	Get agent comparison data

Data Loading Endpoints

Endpoint	Method	Description
`/data/load/json`	POST	Load JSON data with quality assessment
`/data/load/csv`	POST	Load CSV data with quality assessment
`/data/{id}`	GET	Retrieve loaded data by ID
`/data/quality/levels`	GET	Get available quality levels
`/data/detect-schema`	POST	Auto-detect schema from uploaded data
`/data/cache`	DELETE	Clear data cache

Simulation & What-If Endpoints

Endpoint	Method	Description
`/simulations/run`	POST	Run what-if scenario simulation
`/simulations/compare`	POST	Compare multiple simulation results
`/simulations/{id}/status`	GET	Get simulation status
`/simulations/{id}/rerun`	POST	Rerun a simulation
`/simulations/generators`	GET	List available data generators
`/simulations/generate`	POST	Generate synthetic data with causal structure
`/simulations/assess-realism`	POST	Assess scenario realism score
`/simulations/run-transparent`	POST	Enhanced simulation with transparency

ChimeraOracle (Fast Predictions) Endpoints

Endpoint	Method	Description
`/oracle/models`	GET	List trained oracle models
`/oracle/train`	POST	Train CausalForestDML model on scenario data
`/oracle/predict`	POST	Fast causal prediction (<100ms)
`/oracle/models/{id}`	GET	Get model metadata for scenario

Visualization & Configuration Endpoints

Endpoint	Method	Description
`/api/visualization-config`	GET	Combined domain + context visualization config
`/config/visualization`	GET	Context-aware visualization settings
`/config/status`	GET	System configuration status
`/config/validate`	POST	Validate configuration
`/router/config`	GET/PUT/PATCH	Manage Cynefin Router configuration
`/guardian/config`	GET/PUT/PATCH	Manage Guardian policy configuration

Advanced Query Modes

Endpoint	Method	Description
`/query/stream`	POST	Streaming query with server-sent events
`/query/fast`	POST	Fast query mode via Chimera Oracle
`/chat`	POST	Chat interface with Socratic mode

Escalation & Compliance Endpoints

Endpoint	Method	Description
`/escalations`	GET	List pending human escalations
`/escalations/{id}`	GET	Get escalation details
`/escalations/{id}/resolve`	POST	Resolve an escalation
`/transparency/compliance`	POST	EU AI Act compliance report
`/transparency/data-quality`	POST	Assess data quality
`/transparency/guardian`	POST	Guardian decision transparency
`/sessions/{id}/lineage`	GET	Data lineage and provenance tracking

Governance & Policy Endpoints (18 endpoints)

Endpoint	Method	Description
`/governance/domains`	GET/POST	List or create governance domains
`/governance/policies`	GET/POST	List or create federated policies
`/governance/policies/{ns}`	PUT/DELETE	Update or remove a policy by namespace
`/governance/policies/extract`	POST	Extract governance rules from unstructured policy text (LLM-powered)
`/governance/conflicts`	GET	List policy conflicts (optionally unresolved only)
`/governance/conflicts/{id}/resolve`	POST	Resolve a detected policy conflict
`/governance/triples`	GET	Query MAP context triples
`/governance/triples/impact/{domain}`	GET	Triple impact analysis for a domain
`/governance/compliance/{framework}`	GET	Compliance score for EU AI Act, CSRD, GDPR, ISO 27001
`/governance/cost/breakdown/{session}`	GET	Token-level cost breakdown per session
`/governance/cost/aggregate`	GET	Aggregated cost intelligence across sessions
`/governance/cost/roi`	GET	ROI analysis for LLM spend
`/governance/audit`	GET	Governance audit log (filterable)
`/governance/health`	GET	Governance subsystem health check
`/governance/semantic-graph`	GET	Semantic governance topology (domains, policies, conflicts, MAP triples)
`/governance/boards`	GET/POST	Governance board lifecycle management
`/governance/boards/templates`	GET	List governance board templates
`/governance/boards/from-template`	POST	Create board from template
`/governance/boards/{id}`	GET/PUT/DELETE	Board CRUD operations
`/governance/boards/{id}/compliance`	GET	Board-level compliance check
`/governance/export`	POST	Export governance spec (JSON/YAML)
`/governance/seed/{template}`	POST	Seed domain from template
`/governance/rag/status`	GET	RAG index status
`/governance/rag/query`	POST	RAG-augmented policy search
`/governance/rag/ingest-policies`	POST	Re-ingest policies into RAG index
`/governance/rag/ingest-text`	POST	Ingest arbitrary text into RAG
`/governance/documents/upload-file`	POST	Upload document for RAG ingestion
`/governance/documents/status`	GET	Document processing status
`/governance/memory/status`	GET	Agent memory status
`/governance/memory/compact`	POST	Compact agent memory
`/governance/memory/recall`	POST	Recall from agent memory

Monitoring & Operational Intelligence Endpoints (Phase 18)

Endpoint	Method	Description
`/monitoring/drift`	GET	Routing distribution drift status (KL-divergence)
`/monitoring/drift/history`	GET	Recent drift detection snapshots
`/monitoring/drift/reset`	POST	Reset drift baseline
`/monitoring/bias-audit`	GET	Run bias audit on agent memory (chi-squared, quality, verdicts)
`/monitoring/convergence`	GET	Retraining convergence/plateau status
`/monitoring/convergence/record`	POST	Record retraining accuracy measurement
`/monitoring/status`	GET	Unified monitoring status (drift + bias + convergence)

Developer & Diagnostics Endpoints

Endpoint	Method	Description
`/developer/state`	GET	Full system state dump
`/developer/logs`	GET	Filtered log entries (layer, level, limit)
`/developer/ws`	WebSocket	Live log streaming
`/analyze`	POST	File analysis for CSV/JSON
`/agent/suggest-improvements`	POST	Automated improvement suggestions
`/explain`	POST	Generate explanations for analyses
`/explain/{domain}/{element}`	GET	Domain-specific element explanations
`/benchmarks/run-all`	POST	Run all benchmark suites
`/feedback`	POST	Submit analysis feedback
`/summary/executive`	POST	Generate executive summary

Example API Calls

Simple Query:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Why did our costs increase by 15%?"}'

Causal Analysis:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Estimate impact of discount on churn",
    "causal_estimation": {
      "treatment": "discount",
      "outcome": "churn",
      "covariates": ["region", "tenure"],
      "data": [
        {"discount": 0.1, "churn": 0, "region": "NA", "tenure": 12},
        {"discount": 0.0, "churn": 1, "region": "EU", "tenure": 3}
      ]
    }
  }'

Bayesian Inference:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Update belief on conversion rate",
    "bayesian_inference": {
      "successes": 42,
      "trials": 100
    }
  }'

Get Started: Test the Platform

Tip

The platform is ready to test with built-in demo scenarios or your own data. No complex setup needed -- just start the servers and explore.

Option A: Run a Pre-Built Demo Scenario

CYNEPIC includes 17 pre-built scenarios covering all 5 Cynefin domains across 7 verticals:

Scenario	Domain	Analysis Type	What It Tests
Scope 3 Attribution	Complicated	Causal (DoWhy)	Supplier sustainability impact on emissions (2000 records)
Discount vs Churn	Complicated	Causal (DoWhy)	Causal effect of discounts on customer retention (2000 records)
Conversion Belief Update	Complex	Bayesian (PyMC)	Prior/posterior belief updates with binomial data
Renewable Energy ROI	Complicated	Causal (DoWhy)	ROI estimation across facilities with regional variation (800 records)
Shipping Mode Analysis	Complicated	Causal (DoWhy)	Carbon footprint impact of freight mode switching (1200 records)
Supply Chain Resilience	Complicated	Causal (DoWhy)	Climate stress impact on disruption risk (2000 records)
Pricing Optimization	Complicated	Causal (DoWhy)	Price elasticity and sales volume effects (1500 records)
Market Adoption	Complex	Bayesian (PyMC)	Uncertainty modeling for new product launch
Crisis Response	Chaotic	Circuit Breaker	Critical supplier failure requiring immediate stabilization
Inventory Data Lookup	Clear	Deterministic	Simple stock level and product queries
CSRD Double Materiality	Complicated	Causal (DoWhy)	Climate transition risk impact on operating costs (ESRS)
ESRS E1 Climate Disclosure	Complicated	Causal (DoWhy)	Emission reduction program effectiveness analysis
ESRS S1 Workforce Assessment	Complicated	Causal (DoWhy)	Training investment impact on workforce productivity
Energy Mix Optimization	Complicated	Causal (DoWhy)	Renewable energy mix cost/target optimization
Energy Demand Forecast	Complex	Bayesian (PyMC)	Seasonal energy demand uncertainty modeling
Manufacturing Quality Control	Complicated	Causal (DoWhy)	Process temperature effect on defect rates
Process Line Optimization	Complicated	Causal (DoWhy)	Production parameter throughput optimization

To run a demo:

Open the React dashboard: http://localhost:5175
Select a scenario card from the list.
Click a suggested query to run the analysis.
Explore the Cynefin classification, Causal DAG, and Guardian Panel.

See docs/DEMO_WALKTHROUGH.md for a step-by-step guide.

Option B: Onboard Your Own Data

Bring your own CSV to run causal analysis:

Generate Sample Data (optional): python scripts/generate_chain_data.py to create supply_chain_resilience.csv.
Open Data Onboarding: In the dashboard, click "Upload your own data".
Map Variables: Identify the Treatment (e.g., climate_stress_index), Outcome (e.g., disruption_risk_percent), and Confounders.
Run Analysis: The platform will automatically classify the query, build a causal model, and display results.

See docs/END_USER_TESTING_GUIDE.md for detailed instructions.

Project Structure

projectcarf/
├── src/
│   ├── core/           # State schemas (EpistemicState), LLM config, deployment profiles, database
│   ├── services/       # 30+ services: Causal, Bayesian, World Model, NeSy Engine,
│   │                   # ChimeraOracle, H-Neuron, Drift Detector, Bias Auditor,
│   │                   # Governance, RAG, Agent Memory, Embedding Engine
│   ├── workflows/      # LangGraph graph (incl. chimera_fast_path), Guardian, Router
│   ├── utils/          # Telemetry, caching, circuit breaker, currency normalization
│   ├── api/            # FastAPI routers (17 routers, 90+ endpoints)
│   ├── mcp/            # MCP server (18 cognitive tools for agentic integration)
│   └── main.py         # FastAPI entry point
├── carf-cockpit/       # React (Vite + TypeScript) dashboard — 59 components, 4 views
├── config/
│   ├── agents.yaml     # Agent configurations
│   ├── policies.yaml   # Guardian YAML policies
│   ├── policies/       # CSL-Core formal policy definitions (35 rules)
│   ├── federated_policies/ # Domain-owner governance policies (6 YAML files)
│   ├── governance_boards/  # Compliance board templates (EU AI Act, CSRD, etc.)
│   └── policy_scaffolds/   # Domain-specific policy templates
├── models/             # Trained models (DistilBERT router + 5 CausalForest models)
├── demo/               # 17 demo scenarios + 11 sample datasets
├── tests/
│   ├── unit/           # 58+ unit test files (1,130+ tests)
│   ├── deepeval/       # LLM quality evaluation tests
│   ├── e2e/            # End-to-end gold standard tests
│   └── integration/    # API flow integration tests
├── benchmarks/         # Technical & use-case benchmarks (43 hypotheses + realism validation)
├── tla_specs/          # TLA+ formal verification specs (StateGraph, EscalationProtocol)
├── .agent/skills/      # 12 agent skills
├── scripts/            # 13 scripts (training, generation, migration, seeding)
├── docs/               # 40+ documentation files
└── docker-compose.yml  # Full stack deployment

Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=src --cov-report=term-missing

# Run manual test suite
python scripts/test_carf.py

# Type checking
mypy src/ --strict

# Linting
ruff check src/ tests/

LLM Quality Evaluation (DeepEval)

CARF includes comprehensive LLM output quality evaluation using DeepEval:

# Install with evaluation dependencies
pip install -e ".[dev,evaluation]"

# Run DeepEval tests
pytest tests/deepeval/ -v

# Run with DeepEval CLI (parallel execution)
deepeval test run tests/deepeval/ -n 4

Quality Metrics Evaluated:

Relevancy Score: How well responses address user queries
Hallucination Risk: Detection of fabricated content
Reasoning Depth: Quality of reasoning and justification
UIX Compliance: Adherence to transparency standards (Why? How confident? Based on what?)

See Evaluation Framework Documentation for details.

Dashboard Views

Analyst View

Query input with intelligent suggestions
Simulation controls (sliders)
Cynefin classification with domain scores and entropy
Bayesian belief state with distribution chart
Causal DAG visualization
Guardian policy check with approval workflow
Transparency Panel with agent reliability and data quality
Insights Panel with actionable recommendations

Developer View

Execution trace timeline
Performance metrics (latency, tokens, cost)
DAG structure explorer
State snapshots (JSON)
Agent Comparison Panel with LLM performance tracking
Data Flow Visualization
Live log streaming via WebSocket
DeepEval quality metrics integration
Monitoring Panel (Phase 18): Drift detection, bias audit, retraining convergence
Domain-Specific Views for all 5 Cynefin domains:
- Clear: Decision checklist with step tracking
- Complicated: Expert analysis with causal effect summary
- Complex: Uncertainty exploration with epistemic/aleatoric breakdown
- Chaotic: Circuit breaker with rapid response controls
- Disorder: Clarification prompts with human escalation

Executive View

Expected impact hero card
Dynamic KPI Dashboard (0-10 scoring with real data)
Routing Drift, Memory Bias, Retraining Health KPI cards (Phase 18)
Proposed action summary
Policy compliance overview
Actionable Insights for decision-makers
Export and share functionality

Governance View

Spec Map: ReactFlow visualization of governance domains and policy nodes
Cost Intelligence: KPI cards with recharts cost breakdown (token pricing, ROI, risk exposure)
Policy Federation: Domain sidebar, policy cards with conflict detection and resolution
Compliance Audit: Framework selector (EU AI Act, CSRD, GDPR, ISO 27001), score gauge, article accordion
Semantic Graph: Interactive policy/conflict topology with explainability annotations
Policy Ingestion: Upload documents for RAG indexing and automated rule extraction
Monitoring (Phase 18): Drift detection, bias audit, plateau detection — operational intelligence for SRR safety

Data Visualization

PlotlyChart unified wrapper supporting waterfall, radar, sankey, and gauge charts
CynefinVizConfig backend-driven domain-specific visualization strategy
Context-adaptive charts: color schemes, chart types, and interaction modes adapt per Cynefin domain and business context (sustainability, financial, operational, risk)
useVisualizationConfig React hook with caching and offline fallbacks

Theme Support

Dark/Light Mode Toggle in header
System preference detection
Persistent theme preference (localStorage)

Documentation

Getting Started

Quick Start Guide - Get running in 5 minutes
Complete Walkthrough - Comprehensive guide for all user types (Analyst, Developer, Executive)
Demo Walkthrough - Step-by-step demo scenarios
End-User Testing Guide - Validate the demo flow and integrations

Architecture & Design

PRD and Blueprint - Product requirements
Data Layer - Data architecture
Phase 17 Architecture - Causal world model, NeSy engine, H-Neuron
RSI Safety Analysis - Supervised Recursive Refinement (SRR) model
UI/UX Guidelines - Design system
LLM Agentic Strategy - LLM roles, guardrails, multi-agent scaling
Self-Healing Architecture - SRR, reflection, human escalation
End-to-End Context Flow - 6-layer state propagation and monitoring
Evaluation Framework - 43 hypotheses, DeepEval quality metrics
Intellectual Property - Complete IP registry

Operations & Integration

OPA Policy - Enterprise policy setup
Security Guidelines - Release readiness checklist
Integration Guide - Enterprise integration patterns (ERP, Cloud)
Future Roadmap - Development path and vision

For Contributors

We welcome contributions! Please review the following guides based on your interest:

I want to...	Start here
Add a new demo use case	docs/RFC_UIX_001_SCENARIO_REGISTRY.md - How to add new scenarios to the registry.
Integrate CYNEPIC with another system (ERP, Cloud)	docs/INTEGRATION_GUIDE.md - API usage, data ingestion patterns, security.
Test the platform demos end-to-end	docs/DEMO_WALKTHROUGH.md and docs/END_USER_TESTING_GUIDE.md
Understand the future vision	docs/FUTURE_ROADMAP.md - Planned features and areas for improvement.
Review contribution guidelines	CONTRIBUTING.md - Code standards, commit messages, PR process.

Quick Contribution Steps

Fork the repository.
Create a feature branch: git checkout -b feature/my-feature
Make changes and run tests: pytest tests/ -v
Commit with a descriptive message: git commit -m "Add my feature"
Push: git push origin feature/my-feature
Open a Pull Request.

License

Business Source License 1.1 (BSL) - see LICENSE for details. For production use, see COMMERCIAL_LICENSE.

Commercial Use

This software is source-available under BSL 1.1. You may freely use it for development, testing, academic research, and personal projects. Production use is permitted provided it is not competitive with Cisuregen's products (see LICENSE for exact terms).

If your use case involves offering CARF-based functionality as a hosted service or embedding it in a commercial product, contact licensing@cisuregen.com for a commercial license.

On February 19, 2030, this version converts to Apache License 2.0.

Intellectual Property

The architectural innovations in this project -- including the entropy-aware Cynefin routing mechanism, the deterministic Guardian policy enforcement layer, the integrated causal-Bayesian-neurosymbolic pipeline, and the EpistemicState provenance schema -- are original works of Cisuregen. See IP_CLASSIFICATION.md for the full IP tier mapping.

"CARF", "CYNEPIC", and related marks are trademarks of Cisuregen. See NOTICE for trademark usage guidelines.

Acknowledgments

LangGraph - Workflow orchestration
DoWhy - Causal inference
PyMC - Bayesian modeling
HumanLayer - Human-in-the-loop SDK

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.agent		.agent
.github/workflows		.github/workflows
benchmarks		benchmarks
carf-cockpit		carf-cockpit
config		config
debugging_screenshots		debugging_screenshots
demo		demo
docs		docs
modeltraining		modeltraining
notebooks		notebooks
projectcarfpics		projectcarfpics
scripts		scripts
src		src
tests		tests
tla_specs		tla_specs
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLA.md		CLA.md
COMMERCIAL_LICENSE.md		COMMERCIAL_LICENSE.md
CONTRIBUTING.md		CONTRIBUTING.md
CURRENT_STATUS.md		CURRENT_STATUS.md
DEV_REFERENCE.md		DEV_REFERENCE.md
Dockerfile		Dockerfile
HANDOFF.md		HANDOFF.md
IP_CLASSIFICATION.md		IP_CLASSIFICATION.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
SECURITY.md		SECURITY.md
THIRD_PARTY_LICENSES.md		THIRD_PARTY_LICENSES.md
docker-compose.yml		docker-compose.yml
firebase.json		firebase.json
improvements.md		improvements.md
pyproject.toml		pyproject.toml
research.md		research.md
start_backend.bat		start_backend.bat
start_dashboard.bat		start_dashboard.bat

Folders and files

Latest commit

History

Repository files navigation

Project CARF: CYNEPIC Architecture 0.5

Overview

Key Features

Data & Analytical Flows in CARF architecture

User Interface -- React Cockpit (CARF Cockpit)

Benchmark Results

Overall Grade: A+ -- 43/43 Hypotheses Passed (100%)

Indicated Use Cases

Benchmark Data Sources & Methodology

Quick Start

Option 1: Local Development (Recommended)

Option 2: Docker Compose (Full Stack)

Option 3: Test Mode (No API Keys Required)

Configuration

Library Usage (Notebooks & Data Pipelines)

Core Architecture

Cynefin Domains

API Endpoints

Core Endpoints

Insights & Transparency Endpoints

Agent Tracking Endpoints

Data Loading Endpoints

Simulation & What-If Endpoints

ChimeraOracle (Fast Predictions) Endpoints

Visualization & Configuration Endpoints

Advanced Query Modes

Escalation & Compliance Endpoints

Governance & Policy Endpoints (18 endpoints)

Monitoring & Operational Intelligence Endpoints (Phase 18)

Developer & Diagnostics Endpoints

Example API Calls

Get Started: Test the Platform

Option A: Run a Pre-Built Demo Scenario

Option B: Onboard Your Own Data

Project Structure

Testing

LLM Quality Evaluation (DeepEval)

Dashboard Views

Analyst View

Developer View

Executive View

Governance View

Data Visualization

Theme Support

Documentation

Getting Started

Architecture & Design

Operations & Integration

For Contributors

Quick Contribution Steps

License

Commercial Use

Intellectual Property

Acknowledgments

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages