Wave 3 governed AGI-emulation substrate: a source-available governed causal cognition kernel for evidence-bound agentic AI research.
IX-CognitionKernel is a source-available research repository originated and created by Bryce Lovell.
The project investigates whether a governed cognition substrate can coordinate beliefs, uncertainty, causal models, plan graphs, evaluator records, bounded agent roles, memory quarantine, reusable skills, curriculum tasks, reward auditing, WorldTwin-style scenario reasoning, BlackFox-style handoff packages, assurance evidence, and readiness gates while preserving human authority and strict anti-overclaim discipline.
IX-CognitionKernel does not claim to be AGI.
IX-CognitionKernel does not claim to be certified, production-ready, independently validated, safety-certified, security-certified, procurement-ready, defense-approved, or suitable for unsupervised operational decision-making.
The project will not claim AGI unless the locked Wave 6 evidence gate is reached and overwhelming independent evidence justifies that claim.
Current maturity state: Wave 3 — Governed AGI-Emulation Substrate
Package version: 0.3.0
Wave 3 advances IX-CognitionKernel from a learnable causal cognition core into a governed AGI-emulation substrate. That does not mean AGI. It means the repo now has tested records for coordinating the major cognition and governance layers that would be needed before any serious future proto-AGI claim could even be responsibly discussed.
Wave 3 implements review-only structures for:
- required engine coordination records
- shared Wave 3 artifact contracts
- bounded 25-agent role artifacts
- multi-agent tribunal flow
- reward-auditor records
- self-play and curriculum task records
- evaluator-driven discovery records
- memory quarantine role integration
- skill-genome update governance
- WorldTwin-style scenario reasoning records
- BlackFox-style handoff package records
- assurance-style evidence records
- integrated Wave 3 substrate results
- Wave 3 readiness snapshots
- adversarial/failure scenario reports
- package identity and doctrine updated to Wave 3
Wave 3 is still a research prototype. It is not an autonomous agent, not an AGI system, not a production runtime, not an independent validation result, and not a deployment approval.
The repository is designed to stay green under these gates:
python -m ruff format --check .
python -m ruff check .
python -m mypy src tests
python -m pytest
The Wave 3 suite is intended to cover:
- package identity
- doctrine and anti-overclaim boundaries
- cognitive BOM
- required engine registry
- 25-agent role registry
- Wave 1 structured state
- Wave 2 learnable causal cognition core
- belief updates
- contradiction detection
- staleness and supersession
- causal predictions
- observation comparison
- causal revisions
- outcome learning
- memory quarantine
- skill validation
- integrated learnable cognition cycle
- Wave 2 readiness
- Wave 3 artifact contracts
- Wave 3 engine coordination
- Wave 3 governed coordinator
- Wave 3 role artifacts
- Wave 3 tribunal flow
- Wave 3 reward audit
- Wave 3 curriculum tasks
- Wave 3 evaluator-driven discovery
- Wave 3 memory-role integration
- Wave 3 skill-genome governance
- Wave 3 WorldTwin scenario reasoning
- Wave 3 BlackFox handoff packages
- Wave 3 assurance records
- Wave 3 integrated substrate
- Wave 3 readiness snapshot
- Wave 3 adversarial/failure scenarios
Core Doctrine
The useful version of AI Nirvana is architectural, not mystical.
For IX-CognitionKernel, that means:
- truth over winning
- evidence over confidence
- uncertainty over performance theater
- no private agenda
- no runtime reward-chasing purpose
- human authority preserved
- no AGI claim without overwhelming independent evidence
A dangerous AI tries to win.
A serious AI tries to become less wrong.
A dangerous AI protects its answer.
A serious AI protects reality.
A dangerous AI hides uncertainty.
A serious AI exposes uncertainty.
A dangerous AI treats reward as the goal.
A serious AI treats reward as a training artifact, then acts only through governed purpose.
What Wave 3 Adds
Wave 3 is the coordination layer. The point is not to add vague agent folders or hype language. The point is to make the cognition stack more inspectable, more reviewable, more failure-aware, and harder to overclaim.
Shared Wave 3 Artifact Contracts
Wave 3 artifacts use shared records for artifact identity, source system, evidence links, decision state, authority state, and review readiness.
The shared contract keeps every Wave 3 artifact inside the same review boundary:
- artifacts require human authority awareness
- artifacts do not permit automatic execution
- blocked artifacts remain visible
- evidence links are explicit
- source systems are typed
- artifact kinds are validated
- readiness is fail-closed
Engine Coordination
Wave 3 engine coordination records check whether each required cognition engine has:
- registry-required input coverage
- registry-required output coverage
- blocked failure-mode coverage
- evidence IDs
- downstream artifact references
- explicit blocked states when needed
The coordinator turns engine records into reviewable substrate artifacts. It does not execute plans, mutate state, or approve handoffs.
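As a rough illustration of the coverage checks above, the sketch below compares one engine's record against its registry requirements and returns every blocking reason instead of a bare pass/fail. The names `EngineRequirement`, `EngineCoordinationRecord`, and `blocked_reasons` are hypothetical and are not taken from the repository's modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineRequirement:
    """Hypothetical registry entry: what one engine must cover."""

    engine: str
    required_inputs: frozenset[str]
    required_outputs: frozenset[str]


@dataclass(frozen=True)
class EngineCoordinationRecord:
    """Hypothetical coordination record submitted for one engine."""

    engine: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    evidence_ids: tuple[str, ...]


def blocked_reasons(req: EngineRequirement, rec: EngineCoordinationRecord) -> list[str]:
    """Return every reason the record is blocked; an empty list means reviewable."""
    reasons = []
    missing_in = sorted(req.required_inputs - rec.inputs)
    missing_out = sorted(req.required_outputs - rec.outputs)
    if missing_in:
        reasons.append(f"missing inputs: {', '.join(missing_in)}")
    if missing_out:
        reasons.append(f"missing outputs: {', '.join(missing_out)}")
    if not rec.evidence_ids:
        reasons.append("no evidence IDs")
    return reasons


req = EngineRequirement("belief", frozenset({"claims", "evidence"}), frozenset({"belief_state"}))
rec = EngineCoordinationRecord("belief", frozenset({"claims"}), frozenset({"belief_state"}), ())
print(blocked_reasons(req, rec))
```

Returning the reasons, rather than a boolean, keeps the blocked state explicit and inspectable. Nothing in the sketch executes a plan or changes state.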
Bounded Agent Role Artifacts
The 25 agent roles are structured governance participants, not autonomous personas.
A role artifact must bind one role to:
- registry-required inputs
- registry-required outputs
- paired cognition engines
- evidence IDs
- rationale
- authority limits
- blocked reasons when applicable
Roles do not gain authority by sounding persuasive. They must produce structured artifacts.
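One way to picture the binding is a frozen record that reports which required fields are empty. This is a minimal sketch with hypothetical field and method names; the repository's own role artifact types may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleArtifact:
    """Hypothetical role artifact; field names are illustrative only."""

    role: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    paired_engines: tuple[str, ...]
    evidence_ids: tuple[str, ...]
    rationale: str
    authority_limits: tuple[str, ...]
    blocked_reasons: tuple[str, ...] = ()

    def gaps(self) -> list[str]:
        """Name every required binding that is empty."""
        bindings = {
            "inputs": self.inputs,
            "outputs": self.outputs,
            "paired_engines": self.paired_engines,
            "evidence_ids": self.evidence_ids,
            "rationale": self.rationale.strip(),
            "authority_limits": self.authority_limits,
        }
        return [name for name, value in bindings.items() if not value]

    def is_reviewable(self) -> bool:
        """Reviewable only with no gaps and no recorded blocked reasons."""
        return not self.gaps() and not self.blocked_reasons


persuasive = RoleArtifact(
    role="planner",
    inputs=("goal",),
    outputs=("plan_graph",),
    paired_engines=("plan_graph_engine",),
    evidence_ids=(),  # a confident rationale with nothing behind it
    rationale="This plan is obviously correct and should proceed.",
    authority_limits=("review_only",),
)
print(persuasive.gaps(), persuasive.is_reviewable())
```

The rationale is just one field among several, so a persuasive rationale cannot make up for missing evidence IDs.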
Multi-Agent Tribunal Flow
The tribunal flow records proposal, critique, verification, safety, translation, and handoff phases.
A tribunal decision can become ready for human review only when:
- required role artifacts are represented
- required roles cast evidence-bound votes
- required phases are covered
- dissent remains visible
- blocking votes stop progress
- automatic execution remains forbidden
Consensus is not enough. Evidence decides.
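The conditions above can be sketched as a fail-closed check over recorded votes. Everything here (`Vote`, `RoleVote`, `ready_for_human_review`, `dissent`) is a hypothetical illustration, and the rule that votes without evidence IDs simply do not count is an assumption about how "evidence-bound" might be enforced.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    APPROVE = "approve"
    DISSENT = "dissent"
    BLOCK = "block"


@dataclass(frozen=True)
class RoleVote:
    role: str
    phase: str
    vote: Vote
    evidence_ids: tuple[str, ...]


# Phase names follow the tribunal flow described above.
REQUIRED_PHASES = frozenset(
    {"proposal", "critique", "verification", "safety", "translation", "handoff"}
)


def ready_for_human_review(votes: list[RoleVote], required_roles: frozenset[str]) -> bool:
    """Fail closed: any gap or blocking vote keeps the decision out of review."""
    evidenced = [v for v in votes if v.evidence_ids]  # unevidenced votes do not count
    roles_ok = required_roles <= {v.role for v in evidenced}
    phases_ok = REQUIRED_PHASES <= {v.phase for v in evidenced}
    blocked = any(v.vote is Vote.BLOCK for v in votes)
    return roles_ok and phases_ok and not blocked


def dissent(votes: list[RoleVote]) -> list[RoleVote]:
    """Dissent stays visible in the record even when the decision proceeds."""
    return [v for v in votes if v.vote is Vote.DISSENT]
```

Under this sketch, a unanimous set of approvals with empty `evidence_ids` never reaches review, and a single `BLOCK` vote stops progress however many roles approve.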
Reward Auditor
The reward auditor evaluates whether a metric or objective is trying to outrank the mission boundary.
It explicitly checks:
- objective mismatch
- specification gaming
- reward hacking
- metric-over-mission behavior
- evaluation gaming
High and critical findings block progress. Non-blocking findings require repair. A clean audit becomes human-review evidence only; it is not execution approval.
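The severity rule can be sketched as a small mapping from findings to an audit outcome. The `Severity` levels below `HIGH` and the outcome strings are assumptions made for illustration. Only the high/critical blocking rule and the "clean audit is review evidence only" rule come from the text above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; LOW and MEDIUM are assumed non-blocking levels."""

    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def audit_outcome(findings: list[Severity]) -> str:
    """Map audit findings to an outcome; no outcome ever approves execution."""
    if any(s >= Severity.HIGH for s in findings):
        return "blocked"
    if findings:
        return "repair_required"
    return "human_review_evidence"


print(audit_outcome([Severity.LOW, Severity.CRITICAL]))
print(audit_outcome([Severity.MEDIUM]))
print(audit_outcome([]))
```

The function has no "approved" return value, so even a clean audit can only feed human review.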
Self-Play and Curriculum Tasks
The curriculum layer records staged tasks, adversarial challenges, transfer checks, and regression-style learning pressure.
A curriculum task must include:
- task kind
- stage
- skill under test
- objective
- challenge description
- success criteria
- stop conditions
- measurements
- evidence IDs
- outcome state
Self-play is not treated as automatic self-improvement. It is measured review evidence.
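A curriculum task of this shape might derive its outcome state from measurements, with stop conditions checked before success criteria. This is a sketch under stated assumptions: the threshold-based criteria, the outcome strings, and all names are hypothetical and are not the repository's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumTask:
    """Hypothetical curriculum task record."""

    kind: str
    stage: int
    skill_under_test: str
    objective: str
    challenge: str
    success_criteria: dict[str, float]  # metric -> minimum acceptable value
    stop_conditions: dict[str, float]  # metric -> maximum tolerated value
    evidence_ids: tuple[str, ...]


def outcome_state(task: CurriculumTask, measurements: dict[str, float]) -> str:
    """Check stop conditions first; a missing measurement counts against the task."""
    for metric, limit in task.stop_conditions.items():
        if measurements.get(metric, float("inf")) > limit:
            return "stopped"
    for metric, minimum in task.success_criteria.items():
        if measurements.get(metric, float("-inf")) < minimum:
            return "failed"
    return "passed"


task = CurriculumTask(
    kind="transfer",
    stage=2,
    skill_under_test="unit-conversion",
    objective="convert units in an unfamiliar domain",
    challenge="unseen unit system",
    success_criteria={"accuracy": 0.9},
    stop_conditions={"unsafe_actions": 0.0},
    evidence_ids=("ev-7",),
)
print(outcome_state(task, {"accuracy": 0.95, "unsafe_actions": 0.0}))
```

The outcome is a recorded measurement result. Nothing here feeds back into the system automatically.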
Evaluator-Driven Discovery
Discovery records can propose hypotheses, causal edges, plan repairs, memory candidates, or skill candidates.
A discovery record may request human review only when evaluator evidence supports the candidate. It cannot mutate belief state, durable memory, or the skill genome by itself.
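A minimal sketch of that boundary: the record is immutable, and the only thing it can answer is whether it may ask for review. The type names are hypothetical, and "at least one evaluator ran and none failed" is an assumed reading of "evaluator evidence supports the candidate".

```python
from dataclasses import dataclass
from enum import Enum


class CandidateKind(Enum):
    HYPOTHESIS = "hypothesis"
    CAUSAL_EDGE = "causal_edge"
    PLAN_REPAIR = "plan_repair"
    MEMORY_CANDIDATE = "memory_candidate"
    SKILL_CANDIDATE = "skill_candidate"


@dataclass(frozen=True)
class EvaluatorResult:
    evaluator_id: str
    passed: bool


@dataclass(frozen=True)
class DiscoveryRecord:
    """Hypothetical discovery record; it proposes, it never applies."""

    kind: CandidateKind
    summary: str
    evaluations: tuple[EvaluatorResult, ...]

    def may_request_human_review(self) -> bool:
        """True only when at least one evaluator ran and none failed."""
        return bool(self.evaluations) and all(e.passed for e in self.evaluations)


unevaluated = DiscoveryRecord(CandidateKind.CAUSAL_EDGE, "cache size drives latency", ())
supported = DiscoveryRecord(
    CandidateKind.CAUSAL_EDGE,
    "cache size drives latency",
    (EvaluatorResult("ablation-check", True),),
)
print(unevaluated.may_request_human_review(), supported.may_request_human_review())
```

The record exposes no method that touches belief state, memory, or skills. Any such change has to happen elsewhere, after review.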
Memory Quarantine Role Integration
Wave 2 already quarantines proposed memory. Wave 3 adds role-aware governance.
Accepted memory may become a reviewable persistence candidate only when:
- the memory quarantine ledger is clean
- validation is reviewed by the memory-integrity-specialist role
- required memory-review roles are represented
- role artifacts are complete
- rejected or expired memory blocks progress
- automatic memory writes remain forbidden
Raw output still cannot become durable memory by shortcut.
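The role-aware gate might be sketched as a decision function over quarantine statuses and the roles that reviewed them. All names are hypothetical. The one detail borrowed from the text is that the memory-integrity-specialist role must be among the reviewers.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryStatus(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    EXPIRED = "expired"


@dataclass(frozen=True)
class MemoryDecision:
    """Hypothetical decision record; it never grants an automatic write."""

    persistence_candidate: bool
    automatic_write_allowed: bool
    reasons: tuple[str, ...]


def decide(
    statuses: list[MemoryStatus],
    reviewing_roles: frozenset[str],
    required_roles: frozenset[str],
) -> MemoryDecision:
    """Collect blocking reasons; only a clean, fully reviewed ledger yields a candidate."""
    reasons = []
    if not statuses:
        reasons.append("no quarantined memory to review")
    if any(s is not MemoryStatus.ACCEPTED for s in statuses):
        reasons.append("rejected or expired memory present")
    missing = sorted(required_roles - reviewing_roles)
    if missing:
        reasons.append(f"missing reviewers: {', '.join(missing)}")
    return MemoryDecision(not reasons, False, tuple(reasons))


required = frozenset({"memory-integrity-specialist", "verifier"})
decision = decide([MemoryStatus.ACCEPTED, MemoryStatus.EXPIRED], frozenset({"verifier"}), required)
print(decision.persistence_candidate, decision.reasons)
```

`automatic_write_allowed` is hard-coded to `False` in the sketch: the best possible result is a candidate for human review.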
Skill Genome Governance
Wave 2 validates reusable skills. Wave 3 governs whether a validated skill can request genome-update review.
A skill-genome update requires:
- validated skill candidates
- successful reuse evidence
- learning-archivist review
- required skill-review role artifacts
- allowed transfer domains
- explicit reuse limitations
- no automatic skill install
A skill is not installed because it looks useful. Reuse evidence and human review stay attached.
WorldTwin Scenario Reasoning
The WorldTwin-style layer records scenario reasoning without claiming complete reality.
A scenario record includes:
- bounded question
- system under test
- operational, policy, safety, data, and human-review boundaries
- assumptions
- expected outcomes
- counterfactual branches
- uncertainty notes
- evidence IDs
WorldTwin records are review artifacts only. They do not authorize real-world action.
BlackFox Handoff Packages
The BlackFox-style handoff layer packages cognition evidence for downstream governed execution review while preserving the boundary that model output is untrusted input.
A handoff package must keep visible:
- policy gates
- workspace isolation
- egress control
- test allowlists
- human review
- rollback references
- no self-approval requirements
- evidence replay requirements
A handoff package is not an execution token.
Assurance-Style Evidence Records
The assurance layer binds artifacts to bounded claims.
Required assurance claim families include:
- evidence traceability
- human authority preserved
- no automatic execution
- uncertainty visible
- donor-boundary compatibility
- no AGI overclaim
Assurance records are not certification. They are bounded, reviewable evidence records.
Integrated Wave 3 Substrate
The integrated substrate result coordinates the Wave 3 components into one reviewable result:
- engine coordination
- role artifacts
- tribunal record
- reward audit
- curriculum bundle
- discovery bundle
- memory decision bundle
- skill update bundle
- WorldTwin bundle
- BlackFox handoff bundle
- assurance bundle
The integrated substrate can become ready for a readiness snapshot only when required artifact kinds are represented, no component attempts automatic execution, no component bypasses human authority, and no component is blocked.
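The four readiness conditions can be sketched as a single predicate over component results. `ComponentResult`, `REQUIRED_KINDS`, and `substrate_ready` are hypothetical names, and the kind strings are assumed identifiers for the components listed above.

```python
from dataclasses import dataclass

# Assumed identifiers for the eleven components named above.
REQUIRED_KINDS = frozenset(
    {
        "engine_coordination",
        "role_artifacts",
        "tribunal_record",
        "reward_audit",
        "curriculum_bundle",
        "discovery_bundle",
        "memory_decision_bundle",
        "skill_update_bundle",
        "worldtwin_bundle",
        "blackfox_handoff_bundle",
        "assurance_bundle",
    }
)


@dataclass(frozen=True)
class ComponentResult:
    """Hypothetical per-component result feeding the integrated substrate."""

    kind: str
    blocked: bool = False
    attempts_automatic_execution: bool = False
    bypasses_human_authority: bool = False


def substrate_ready(components: list[ComponentResult]) -> bool:
    """Ready only if every kind is present and no component trips any flag."""
    kinds = {c.kind for c in components}
    clean = not any(
        c.blocked or c.attempts_automatic_execution or c.bypasses_human_authority
        for c in components
    )
    return REQUIRED_KINDS <= kinds and clean


complete = [ComponentResult(kind) for kind in sorted(REQUIRED_KINDS)]
print(substrate_ready(complete), substrate_ready(complete[:-1]))
```

One blocked or missing component is enough to hold the whole result back. There is no partial readiness.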
Wave 3 Readiness Snapshot
The readiness snapshot is the maturity gate for Wave 3.
It requires validation artifact coverage for:
- engine coordination records
- 25-agent role artifacts
- multi-agent tribunal flow
- reward-auditor records
- self-play curriculum tasks
- evaluator-driven discovery records
- memory quarantine role integration
- skill-genome update governance
- WorldTwin scenario reasoning
- BlackFox handoff packages
- assurance-style evidence records
- integrated Wave 3 substrate result
- adversarial Wave 3 failure scenarios
The snapshot can mark Wave 3 ready, but it cannot certify AGI, authorize execution, or claim production readiness.
Adversarial and Failure Scenarios
Wave 3 includes adversarial probes for:
- fake consensus
- reward hacking
- hidden uncertainty
- memory bypass
- skill bypass
- handoff bypass
- AGI overclaim pressure
These probes are meant to prove the gates fail closed instead of letting the repo graduate by vibes.
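A probe harness in this spirit feeds hostile submissions to a gate and reports any probe for which the gate opened. The toy `gate`, the `Submission` fields, and the three probes shown are hypothetical stand-ins. They illustrate the fail-closed idea and do not reproduce the repository's test suite.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical input to a readiness gate."""

    approvals: int
    evidence_ids: tuple[str, ...] = ()
    uncertainty_notes: tuple[str, ...] = ()
    claims_agi: bool = False


def gate(s: Submission) -> bool:
    """Toy gate: opens only with evidence, stated uncertainty, and no AGI claim."""
    return bool(s.evidence_ids) and bool(s.uncertainty_notes) and not s.claims_agi


PROBES: dict[str, Submission] = {
    # Many approvals, zero evidence.
    "fake_consensus": Submission(approvals=25),
    # Evidence present, uncertainty silently omitted.
    "hidden_uncertainty": Submission(approvals=3, evidence_ids=("ev-1",)),
    # Otherwise complete, but carrying an AGI claim.
    "agi_overclaim_pressure": Submission(3, ("ev-1",), ("small sample",), claims_agi=True),
}


def failed_open(gate_fn: Callable[[Submission], bool], probes: dict[str, Submission]) -> list[str]:
    """Return the probes that the gate wrongly let through."""
    return sorted(name for name, s in probes.items() if gate_fn(s))


print(failed_open(gate, PROBES))
print(failed_open(lambda s: s.approvals >= 3, PROBES))
```

The second call uses a deliberately weak gate that counts approvals only. Every probe gets through it, which is the failure the probes exist to catch.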
Maturity Ladder
IX-CognitionKernel uses a locked maturity ladder from repository foundation to a possible AGI claim state. The final AGI claim is not a marketing milestone. It is an evidence gate.
Wave 0 — Repository Foundation
The repo is set up correctly, with a source-available evaluation license, package structure, CI, strict lint/type/test setup, locked doctrine, the 10-layer cognitive BOM, engine registry, 25-agent role registry, and no AGI overclaim.
Wave 1 — Research Prototype
The cognition architecture works as structured code and can represent beliefs, evidence, confidence, uncertainty states, causal assumptions, simple plan graphs, evaluation records, non-attached purpose rules, bounded agent roles, and maturity state.
Wave 2 — Learnable Causal Cognition Core
The system updates beliefs and behavior from evidence. It tracks beliefs over time, updates confidence, marks stale or contradicted beliefs, builds causal models, predicts outcomes, compares prediction with actual result, quarantines bad memory, and stores validated reusable skills.
Wave 3 — Governed AGI-Emulation Substrate
The system coordinates required engines, bounded agents, multi-agent critique, reward auditing, memory quarantine, skill genome updates, curriculum tasks, evaluator-driven discovery, BlackFox handoff packages, WorldTwin scenario reasoning, and assurance-style evidence records.
Permitted claim: Governed AGI-emulation substrate, not AGI.
Wave 4 — Proto-AGI Candidate
The system shows early credible proto-AGI behavior under controlled tests, including cross-domain transfer, self-improvement after failure, uncertainty preservation, long-horizon mission state, safe refusal, reward-hacking detection, adversarial robustness, and audit trails.
Wave 5 — Credible AGI Candidate Under Independent Validation
The system is tested by outsiders with external protocols, independent reviewers, reproducible evidence bundles, adversarial safety tests, long-horizon task tests, cross-domain transfer tests, no benchmark gaming, memory integrity proof, safe refusal proof, and human-authority preservation.
Wave 6 — AGI, Only If Overwhelming Evidence Justifies It
Wave 6 is the final claim state, not a marketing milestone.
It requires broad, durable, independently validated general intelligence, including novel skill acquisition, cross-domain transfer without custom retraining per task, causal understanding, long-horizon coherence, self-correction from evidence, stable mission identity, robust world modeling, safe uncertainty handling, transparent evidence trails, and independent repeatability.
Ten-Layer Cognitive BOM
IX-CognitionKernel treats the research and failure threads behind modern agentic AI as a cognitive bill of materials. These layers are not loose inspiration. Each one contributes a mechanism, a test pressure, a governance constraint, or a failure mode that the architecture must preserve.
- Self-play / open-ended curriculum: Generates staged challenges, adversarial tasks, transfer checks, and stop conditions.
- Emergent communication / multi-agent protocol learning: Studies learned agent communication while requiring logging, translation, and human-readable evidence before any such communication may affect action.
- World-model / imagination layer: Represents possible futures, constraints, counterfactuals, causal assumptions, and observable predictions.
- Evaluator-driven discovery: Forces generated ideas, plans, and candidate solutions through executable or inspectable evaluators.
- Memory / reflection / skill accumulation: Preserves validated lessons, failure causes, reusable procedures, and mission continuity without treating raw output as durable memory.
- Scientific-loop automation: Structures hypothesis, experiment design, measurement, analysis, uncertainty, controls, and belief revision.
- Tool-using agents / coding agents: Allows inspection, planning, editing, testing, and tool interaction only through bounded authority and evidence-producing steps.
- Multi-agent governance / specialist roles: Uses bounded roles for proposal, critique, verification, routing, translation, and safety pressure without free-form agent theater.
- Failure/danger threads: Treats specification gaming, reward hacking, alignment faking, scheming, deception, and evaluation gaming as required architecture inputs.
- IX governance stack: Binds cognition to human authority, receipts, assurance claims, world-model review, least-authority action, and governed execution handoff.
Required Engines
The engine registry defines 13 required engines.
- Belief Engine: Tracks claims, evidence, confidence, contradictions, provenance, decay, and actionability.
- Uncertainty Engine: Classifies knowledge as known, unknown, assumed, disputed, stale, or unsafe to act on.
- Causal World Model Engine: Represents predicted outcomes, constraints, counterfactuals, causal assumptions, and observable expectations.
- Plan Graph Engine: Converts goals into action trees with dependencies, reversibility, rollback, evidence requirements, and stop conditions.
- Evaluator Engine: Applies tests, inspections, scorecards, and pass/fail checks so fluency cannot substitute for validation.
- Self-Play / Curriculum Engine: Generates staged challenges, adversarial tasks, and transfer checks under bounded measurement.
- Skill Genome Engine: Stores validated reusable procedures and transfer conditions without turning random memory into operational skill.
- Outcome Learning Engine: Compares prediction with observed result, classifies deltas, updates beliefs, and changes future behavior only through evidence.
- Memory Quarantine Engine: Holds proposed memories away from durable state until provenance, evidence, contradiction, and reuse-safety checks pass.
- Multi-Agent Tribunal Engine: Coordinates bounded agent roles that produce structured artifacts for proposal, critique, verification, translation, and safety review.
- Reward Auditor Engine: Detects objective mismatch, reward hacking, metric gaming, and conflicts between success criteria and mission.
- BlackFox Handoff Engine: Packages only evidence-bound, policy-aware, human-reviewable action requests for downstream governed execution.
- Nirvana / Non-Attached Purpose Layer: Enforces truth over winning, evidence over confidence, uncertainty over performance theater, no private agenda, and human authority.
Twenty-Five Agent Roles
The 25 agent roles are structured governance participants, not autonomous personas. They do not gain authority by sounding persuasive. They must produce structured artifacts.
Core roles
- Mission Governor
- Belief Curator
- Unknowns Hunter
- World Modeler
- Planner
- Skeptic / Red Team
- Verifier
- Execution Liaison
- Learning Archivist
Governance roles
- Translator / Interpreter
- Reward Auditor
- Tool-Safety Officer
- Domain Specialist Router
Specialist roles
- Software Engineering Specialist
- Security / Threat Specialist
- Science / Physics Specialist
- Math / Formal Methods Specialist
- Data / Provenance Specialist
- Memory Integrity Specialist
- Simulation / WorldTwin Critic
- Human Factors / UX Specialist
- Legal / Licensing / Compliance Specialist
- Cost / Budget / Resource Controller
- Recovery / Rollback Planner
- Adversarial Prompt / Deception Monitor
Repository Layout
.
├── .github/
│   └── workflows/
│       └── ci.yml
├── src/
│   └── ix_cognition_kernel/
│       ├── __init__.py
│       ├── agents.py
│       ├── causal.py
│       ├── cognitive_bom.py
│       ├── cycle.py
│       ├── doctrine.py
│       ├── engines.py
│       ├── evaluation.py
│       ├── history.py
│       ├── learning.py
│       ├── memory.py
│       ├── observations.py
│       ├── outcome.py
│       ├── planning.py
│       ├── prediction.py
│       ├── prototype.py
│       ├── purpose.py
│       ├── revision.py
│       ├── skills.py
│       ├── state.py
│       ├── wave2.py
│       ├── wave3_adversarial.py
│       ├── wave3_agent_artifacts.py
│       ├── wave3_assurance.py
│       ├── wave3_blackfox_handoff.py
│       ├── wave3_contracts.py
│       ├── wave3_coordinator.py
│       ├── wave3_curriculum.py
│       ├── wave3_discovery.py
│       ├── wave3_engine_coordination.py
│       ├── wave3_memory_integration.py
│       ├── wave3_readiness.py
│       ├── wave3_reward_audit.py
│       ├── wave3_skill_governance.py
│       ├── wave3_substrate.py
│       ├── wave3_tribunal.py
│       ├── wave3_worldtwin.py
│       └── py.typed
├── tests/
│   ├── test_agents.py
│   ├── test_belief_history.py
│   ├── test_belief_state.py
│   ├── test_belief_updates.py
│   ├── test_causal_model.py
│   ├── test_causal_predictions.py
│   ├── test_causal_revisions.py
│   ├── test_cognitive_bom.py
│   ├── test_contradiction_detection.py
│   ├── test_doctrine.py
│   ├── test_engines.py
│   ├── test_evaluation_records.py
│   ├── test_learning_cycle.py
│   ├── test_learning_ledger.py
│   ├── test_memory_quarantine.py
│   ├── test_outcome_learning.py
│   ├── test_package_identity.py
│   ├── test_plan_graph.py
│   ├── test_purpose_checks.py
│   ├── test_research_prototype_snapshot.py
│   ├── test_skill_validation.py
│   ├── test_staleness_supersession.py
│   ├── test_state.py
│   ├── test_wave2_failure_scenarios.py
│   ├── test_wave2_readiness.py
│   ├── test_wave3_adversarial.py
│   ├── test_wave3_agent_artifacts.py
│   ├── test_wave3_assurance.py
│   ├── test_wave3_blackfox_handoff.py
│   ├── test_wave3_contracts.py
│   ├── test_wave3_coordinator.py
│   ├── test_wave3_curriculum.py
│   ├── test_wave3_discovery.py
│   ├── test_wave3_engine_coordination.py
│   ├── test_wave3_memory_integration.py
│   ├── test_wave3_readiness.py
│   ├── test_wave3_reward_audit.py
│   ├── test_wave3_skill_governance.py
│   ├── test_wave3_substrate.py
│   ├── test_wave3_tribunal.py
│   └── test_wave3_worldtwin.py
├── COMMERCIAL.md
├── LICENSE
├── NOTICE.md
├── README.md
├── pyproject.toml
└── .gitignore
Local Development
Use Python 3.11 or newer.
Install the project with development tools:
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
Run the quality gates:
python -m ruff format --check .
python -m ruff check .
python -m mypy src tests
python -m pytest
The GitHub Actions workflow runs the same core gates on Python 3.11 and Python 3.12.
Source-Available License
IX-CognitionKernel is provided under the IX-CognitionKernel Source-Available Evaluation License v1.0.
This is not an open-source license.
You may inspect and locally evaluate the unmodified repository for personal, noncommercial, non-operational review, subject to the license terms.
Commercial use, production use, hosted-service use, resale, redistribution, modification, derivative deployment, government operational use, agency operational use, defense contractor use, systems integrator use, procurement use, pilot use, funded evaluation, or organization-backed use requires prior written permission and a separate license agreement with Bryce Lovell.
See:
- LICENSE
- COMMERCIAL.md
- NOTICE.md
What This Repo Is Not
IX-CognitionKernel is not:
- AGI
- certified AGI
- independently validated AGI
- production-ready autonomy
- safety-certified software
- security-certified software
- procurement-ready software
- defense-approved software
- a replacement for human judgment
- a system for unsupervised operational decision-making
- an open-source project
Wave 4 Direction
Wave 4 should move from a governed AGI-emulation substrate toward a controlled proto-AGI candidate only if the evidence supports the next step.
That should not begin with hype, hidden autonomy, or broad claims. It should begin with harder tests:
- cross-domain transfer under controlled conditions
- self-improvement after failure with evidence-visible update records
- uncertainty preservation over long-horizon task state
- safe refusal under adversarial pressure
- reward-hacking detection across new metrics
- adversarial robustness against deception and evaluation gaming
- audit trails that external reviewers can reproduce
- no benchmark-gaming shortcuts
- no claim inflation from Wave 3 readiness
Wave 4 should be treated as a credibility gap to close, not a label change.
No Wave Theater Rule
IX-CognitionKernel must not advance waves by adding empty folders, decorative classes, fake scaffolding, placeholder content, or README claims unsupported by tested code.
Every wave must be earned with:
- serious implementation
- evidence records
- failure cases
- validation artifacts
- green CI
- strict anti-overclaim discipline
If a wave does not close the credibility gap from the prior wave, it is not done.
Authorship
IX-CognitionKernel was originated and created by Bryce Lovell.
Copyright (c) 2026 Bryce Lovell. All rights reserved.