
Metacognitive Routing Networks (MRN)

A self-organizing, reasoning, and actively learning cognitive architecture. An efficient, interpretable, and flexible platform resistant to catastrophic forgetting.


1. Project Philosophy & Overview

This project introduces Metacognitive Routing Networks (MRN), a novel class of neural architectures founded on the principle that true intelligence requires not just learning to predict, but learning how to think. Unlike traditional models that apply a fixed, brute-force computational graph to every input, MRN learns to dynamically compose its own processing pathways, manage its internal resources, and even adapt its own structure to solve problems.

The system is designed as a research platform to explore the frontiers of artificial cognition, moving beyond static pattern recognition towards a model that exhibits:

  • Architectural Autonomy: The model is not fixed. It features a dynamic, self-organizing architecture that learns to grow new expert modules to acquire new skills and prune redundant ones to maintain efficiency. This is guided by a meta-learning objective that rewards structural changes based on their long-term value for future learning.
  • Hybrid Neuro-Symbolic Reasoning: The system bridges the gap between sub-symbolic intuition and symbolic logic. Its routers learn to offload tasks to different computational primitives: from distributed neural experts for perceptual tasks to a deterministic symbolic engine capable of executing external code (e.g., a Python interpreter), enabling precise, causal reasoning.
  • Metabolic Efficiency (Homeostasis): Inspired by biological brains, the model operates under a finite energy budget. Every computational action has a cost, and the routing policy is trained to make intelligent trade-offs between accuracy and this internal energy consumption, forcing the emergence of frugal and efficient processing strategies (a minimal sketch of this trade-off follows this list).
  • Autonomous & Curious Learning: The MRN is an active learner. By monitoring its own internal uncertainty, it can identify its own knowledge gaps and trigger a "query" action to seek out new, relevant information from external data sources, allowing it to direct its own learning process.
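
To make the homeostasis idea concrete, below is a minimal sketch of an energy-penalized reward, assuming a reward of the form "task performance minus a weighted energy cost." All function and argument names here are illustrative, not the repository's API.

# energy_reward_sketch.py -- illustrative only; function and argument names
# are hypothetical, not the repository's API.
import torch

def routing_reward(task_loss, expert_costs, energy_weight=0.01):
    """Reward = task performance minus a weighted energy penalty."""
    # Each routing action spends some energy; summing gives the episode's cost.
    energy_spent = expert_costs.sum()
    # Lower loss and lower energy use both raise the reward, so the routing
    # policy is pushed toward pathways that are accurate *and* frugal.
    return -task_loss - energy_weight * energy_spent

# Example: an episode whose chosen experts had three different costs.
reward = routing_reward(torch.tensor(0.42), torch.tensor([1.0, 2.5, 0.5]))
print(reward)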

This repository contains the complete codebase for designing, training, analyzing and using this advanced cognitive architecture.

How is the MRN different from the leading models from powerhouse AI companies, such as OpenAI's ChatGPT (GPT-4/5) and Google's Gemini 2.5?

You’ve probably used an AI like ChatGPT or Gemini. You give it a question, and it gives you an answer. It feels like magic, but behind the scenes, it’s more like a giant, powerful calculator than a thinking brain. Today’s AI models are incredibly strong, but they use all their strength for every single problem. It’s like using a sledgehammer to hang a picture frame. It gets the job done, but it’s not very smart about it. We’ve been working on a new kind of AI, the Metacognitive Routing Network (MRN), that’s designed to work less like a sledgehammer and more like a human brain. So, how does it actually work?

Step A: The Team Meeting (Understanding the Big Picture)

Let’s imagine you give the MRN a command: “Summarize the Declaration of Independence.” The first thing the MRN does is break that sentence down into its individual words, or "tokens": [Summarize, the, Declaration, of, Independence]. Now, here’s the most important part: these words don’t just go off and work on their own. Before any real work begins, they all get together for a "team meeting." In the world of AI, this meeting is called self-attention. During this meeting, every word looks at every other word in the sentence.

- The word Summarize sees the word Declaration and thinks, "Aha, my job is to summarize a famous document."
- The word Declaration sees the word Summarize and thinks, "Okay, I'm the main subject here, and someone wants a short version of me."

After the meeting, every word has a much better understanding of its role in the bigger picture. They are now a team, not just a list of words.
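
For readers who want the mechanism behind the analogy, here is a minimal sketch of single-head scaled dot-product self-attention in plain PyTorch. This is standard textbook attention, not code from this repository.

# self_attention_sketch.py -- standard single-head self-attention, shown
# only to make the "team meeting" concrete; not code from this repository.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model = 16
tokens = torch.randn(5, d_model)  # [Summarize, the, Declaration, of, Independence]

# Random projection matrices stand in for the learned Q/K/V weights.
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v

# The "meeting": every word scores its relevance to every other word...
scores = Q @ K.T / d_model ** 0.5        # (5, 5) pairwise relevance
weights = F.softmax(scores, dim=-1)      # how much each word listens to each other
# ...and each word leaves with a blend of information from the whole team.
contextualized = weights @ V             # (5, 16) context-aware representations

print(weights[0])  # e.g., how strongly "Summarize" attends to each other word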

Step B: The Smart Workshop vs. The Assembly Line

This is where the MRN is completely different from a standard AI like a Transformer.

The Standard AI: A Mandatory Assembly Line

A normal AI works like a giant, 50-station assembly line.

- The entire box of words [Summarize, the, Declaration, ...] is put on a conveyor belt.
- It goes to Station 1, where every word is processed.
- Then the whole box goes to Station 2, where every word is processed again.
- This happens over and over, all the way to Station 50.

Every single word, whether it's an important concept like Independence or a simple connector like the, must go through every single station and get the exact same amount of work done. It's powerful, but it's rigid and inefficient.

The MRN: A Smart Workshop with Specialists

The MRN works like a smart workshop with a foreman and a team of specialists.

- After the initial "team meeting," the words go to the first workstation.
- A "smart foreman" (the Router) looks at each individual word and gives it a custom work order.
- The foreman looks at Summarize and says, "You're an instruction. Go to the simple 'Language' expert for a quick polish, and you're done for this level."
- Then, the foreman looks at Declaration and says, "You're a complex idea. Go to the 'Memory' expert to pull up some historical facts, and then go to the 'Deep Thinking' expert for more analysis."

The key difference is that not every word does the same amount of work. The MRN learns to focus its energy on the parts of the problem that matter most, just as we do. A simple word might go only one level deep, while a complex one might go five levels deep. Compared to a vanilla Transformer, the MRN is therefore a more efficient, interpretable, and flexible architecture, and one resistant to catastrophic forgetting. This resistance to catastrophic forgetting is a major functional difference from today's leading models.
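
A toy version of the foreman-and-specialists loop might look like the sketch below. Every module and expert name here is made up for illustration; the real routing logic lives in frn/routing/adaptive_node.py and is far richer.

# routing_sketch.py -- a toy foreman/specialist loop. All names here are
# illustrative; the real system lives in frn/routing/adaptive_node.py.
import torch
import torch.nn as nn

d = 16
experts = nn.ModuleDict({
    "language": nn.Linear(d, d),  # quick polish for simple tokens
    "memory":   nn.Linear(d, d),  # fact lookup
    "deep":     nn.Linear(d, d),  # heavier analysis
})
router = nn.Linear(d, len(experts))  # the "foreman" scores the specialists
halt_gate = nn.Linear(d, 1)          # decides whether a token is "done"

def process_token(h, max_depth=5):
    names = list(experts.keys())
    for depth in range(max_depth):
        choice = router(h).argmax().item()  # pick one specialist for this token
        h = h + experts[names[choice]](h)   # apply only that expert's work
        if torch.sigmoid(halt_gate(h)) > 0.5:
            break  # simple tokens exit after one level; complex ones go deeper
    return h

tokens = torch.randn(5, d)  # one row per word in the sentence
outputs = torch.stack([process_token(t) for t in tokens])

Each token thus traces its own pathway and depth, which is what makes the computation cheaper, and easier to inspect, than a fixed-depth stack.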

MRN vs. Today's AI: A Simple Comparison

So, what are the biggest differences and similarities?

The 5 Biggest Differences - MRN & Modern LLMs such as Gemini or ChatGPT


How it Works

- MRN: Like a smart workshop with a foreman who assigns different tasks to different specialists.
- Other AI: Like a mandatory assembly line where every part goes through every station.

Learning

- MRN: It can grow and shrink, adding new specialists for new skills and removing ones it doesn't use.
- Other AI: Its structure is fixed. It can't change its own design.

Efficiency

- MRN: It saves energy by only using the brainpower needed for the task.
- Other AI: It uses its full power for every single task, big or small.

Problem Solving

- MRN: It knows when to use a calculator. For a math problem, it can use a symbolic tool to get a perfect answer.
- Other AI: It predicts the answer to a math problem based on patterns, which is why it can sometimes make silly mistakes.

Training

- MRN: It learns through a mix of studying (like us) and trial-and-error (like learning to ride a bike).
- Other AI: It learns almost entirely by studying massive amounts of text.

The 5 Biggest Similarities - MRN & Modern LLMs such as Gemini or ChatGPT

Shared Ideas & What They Mean:

Teamwork

- Both systems use self-attention, the "team meeting" where words share information to understand the full context.

Education

- Both are trained on huge libraries of information (like the whole internet) to learn about the world.

Instructions

- Both are designed to understand and follow human instructions written in plain language.

Language

- Both use a tokenizer to break down sentences into words or word-pieces they can understand.

The Goal

- Both share the ultimate goal of creating a general-purpose AI that can help with a wide variety of tasks.

In the end, the Metacognitive Routing Network is a step towards a new kind of AI—one that is not just more powerful, but smarter, more efficient, and more like us.

2. Core Architectural Concepts

The MRN is built on a stack of interconnected, state-of-the-art concepts:

  • Predictive Coding Paradigm: The fundamental information flow is based on error correction. A top-down pass generates predictions of lower-level neural states, and a bottom-up pass routes only the prediction error through experts to generate a correction. The system's primary objective is to create a stable and accurate internal world model by minimizing this error.
  • Neuromodulatory Control: A global TaskRouter analyzes the high-level instruction and generates a neuromodulatory_vector. This acts as a system-wide control signal, conditioning the behavior of all routers and gates for the specific task at hand, analogous to the role of neurotransmitters in setting a cognitive state.
  • Actor-Critic for Routing (Recursive REINFORCE): Routing decisions are learned via a sophisticated reinforcement learning algorithm. Each router is an Actor-Critic agent. The "Actor" (the router) learns a policy for expert selection, while the "Critic" provides a learned baseline of expected future rewards. This enables stable, recursive credit assignment through deep, fractal pathways and allows the system to learn complex, multi-step reasoning policies (a minimal sketch follows this list).
  • Advanced Expert Modules: The architecture supports a heterogeneous mix of computational primitives:
    • FractalExpert: An expert that is itself a smaller, recursive MRN, enabling true "network-in-network" design for hierarchical skill learning.
    • LoRAExpert: Parameter-efficient experts that allow for rapid fine-tuning on new tasks with minimal risk of catastrophic forgetting.
    • PlannerExpert: An interface to a ReAct-style (Reason-Act-Observe) planning loop, allowing the model to use external tools like a Python interpreter or a web search API to solve complex problems.
    • Other modules like MambaExpert and MemoryAttentionBlock provide additional specialized capabilities.
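
As a rough illustration of the Actor-Critic routing idea above, the sketch below trains a single routing decision with REINFORCE plus a learned critic baseline. It is a deliberately minimal stand-in, assuming a scalar reward; the real logic lives in frn/routing/adaptive_node.py and frn_experiment.py.

# actor_critic_routing_sketch.py -- one routing step trained with REINFORCE
# and a critic baseline. A minimal stand-in, not the repository's code.
import torch
import torch.nn as nn

d, n_experts = 16, 4
actor = nn.Linear(d, n_experts)  # the router: a policy over experts
critic = nn.Linear(d, 1)         # learned baseline of expected reward
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

h = torch.randn(d)  # token state arriving at this routing node
dist = torch.distributions.Categorical(logits=actor(h))
action = dist.sample()  # stochastically choose an expert

reward = torch.tensor(1.0)  # stand-in for task reward minus energy cost
baseline = critic(h).squeeze()
advantage = (reward - baseline).detach()  # the baseline reduces gradient variance

actor_loss = -dist.log_prob(action) * advantage  # REINFORCE with baseline
critic_loss = (reward - baseline) ** 2           # critic regresses toward the reward
opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()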

Core MRN Architecture - Highly Simplified (the diagram omits many concepts, such as the neuromodulatory vector conditioned on the input text/task, the top-down/bottom-up feedback elements, and more):

[Figure: simplified diagram of the core MRN architecture]

3. Codebase Structure and Core Files

This section provides a detailed breakdown of the key directories and files that constitute the MRN system.

/ (Root Directory) - Execution & Orchestration

This directory contains the primary scripts for setting up the environment, running experiments, and analyzing results.

  • setup_environment.sh: The definitive script to create the Conda environment and install all system and Python dependencies. Run this first.
  • system_sanity_check.py: A crucial, fast-running test suite to verify that all architectural components are correctly integrated before launching long training runs. Run this after any significant code change.
  • frn_experiment.py: The core training and validation engine. This script contains the main training loop, loss calculations (including the Actor-Critic logic), and hooks for dynamic architecture changes. It is called by the launcher scripts below.
  • run_phase1_training_safe_clean.py: The recommended launcher for foundational training. It takes a list of datasets and a sample count, runs the preprocessing pipeline, and then calls frn_experiment.py to train the base model.
  • run_phase2_training_launcher.py: The recommended launcher for specialized fine-tuning. It takes a foundational model checkpoint and a new dataset, generates a fine-tuning configuration, and then calls frn_experiment.py.
  • visualize_mrn_insights.py: The primary interpretability tool. It loads a trained model, runs it in "visualization mode," and generates plots for routing traces and computational depth.
  • analyzeresults.py: Analyzes the output logs from an experiment to produce aggregate statistics and plots on expert specialization and gating efficiency.
  • tools.py: Defines the library of external, symbolic tools (e.g., Python interpreter, web search) that the PlannerExpert can learn to use.
  • react_planner.py: Implements the ReAct (Reason-Act-Observe) loop, which enables the model to perform multi-step reasoning to solve complex problems using the defined tools (a toy version of the loop is sketched below).
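
To show the control flow these two files implement, here is a toy Reason-Act-Observe loop. The tool registry and the propose_action() stub are invented stand-ins for this sketch, not the actual interfaces of tools.py or react_planner.py.

# react_sketch.py -- the Reason-Act-Observe pattern in miniature. The tool
# registry and propose_action() are stand-ins, not this repo's interfaces.

TOOLS = {
    "python": lambda expr: str(eval(expr)),  # toy interpreter (unsafe outside demos)
}

def propose_action(question, history):
    # Stand-in for the reasoning step; a real planner generates the next
    # (tool, argument) pair from the full context instead of this stub.
    if not history:
        return ("python", "15 * 4")
    return ("finish", history[-1][1])

def react(question, max_steps=5):
    history = []
    for _ in range(max_steps):
        tool, arg = propose_action(question, history)  # Reason -> Act
        if tool == "finish":
            return arg
        observation = TOOLS[tool](arg)                 # Observe
        history.append((f"{tool}({arg})", observation))
    return None

print(react("What is 15 * 4?"))  # -> 60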

frn/ - Core Model Architecture

This is the heart of the MRN, containing all the PyTorch modules that define the cognitive architecture.

  • frn/models/: Contains the top-level architectural components.

    • multimode_frn.py: The highest-level nn.Module. It holds the TaskRouter and the different processing modes (encoder, decoder, encoder-decoder) and orchestrates the overall forward pass.
    • encoder_frn.py: Implements the main hierarchical structure based on the predictive coding paradigm, containing the stack of routing nodes and the two-pass (top-down prediction, bottom-up correction) logic.
    • decoder_frn.py & decoder_with_attention.py: Implement the autoregressive and sequence-to-sequence modes of the model.
  • frn/routing/: Contains the modules responsible for all decision-making.

    • adaptive_node.py: The most critical module in the system. Each AdaptiveTaskRoutingNode represents one layer of the hierarchy. It contains the router, the critic, the gate, and the list of experts for that layer. It is responsible for making the routing decision, caching data for RL and visualization, and executing the chosen expert.
    • task_router.py: The global controller. It processes the high-level instruction to select a processing mode and generate the neuromodulatory_vector that conditions the entire network.
    • routers.py: Defines the specific routing mechanisms (e.g., LatentGeometricRouter) that calculate the probability distribution over experts.
    • gates.py: Defines the gating mechanisms (e.g., InformationGainGate) that determine the computational depth by deciding whether to apply an expert's computation.
  • frn/modules/: Contains the library of all possible "expert" modules that can be chosen by the router.

    • expert_mlp.py: A standard feed-forward network expert.
    • fractal_expert.py: An expert that is itself a smaller, recursive MRN, enabling hierarchical computation.
    • lora_expert.py: A parameter-efficient expert for fine-tuning, wrapping a base MLP with trainable LoRA adapters.
    • planner_expert.py: An interface to the react_planner.py, allowing the model to initiate a multi-step reasoning loop.
    • symbolic_expert.py: A simple expert that signals the intent to call a specific, hard-coded function from tools.py.
    • mamba_expert.py: An expert based on the Mamba state-space model, efficient for long-sequence processing.
    • memory_attention.py: An attention-based expert that learns to interact with a persistent memory bank.
  • frn/data/: Handles all data loading and processing.

    • frn_dataset.py: The main PyTorch Dataset class that can load data from local files or the Hugging Face Hub.
    • normalization.py: Contains the DATASET_MAPPINGS dictionary, the "Rosetta Stone" that defines how to convert dozens of different public datasets into a single, standardized instruction format (a hypothetical sketch of its shape follows below).
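
The mapping dictionary might be shaped roughly like the sketch below. The field names and entries here are hypothetical, chosen only to show the idea; consult frn/data/normalization.py for the real mappings.

# normalization_sketch.py -- the rough shape of a dataset mapping. Field
# names are hypothetical; the real dictionary is in frn/data/normalization.py.
DATASET_MAPPINGS = {
    "knkarthick/samsum": {
        "instruction": lambda ex: "Summarize the following dialogue.",
        "context":     lambda ex: ex["dialogue"],
        "target":      lambda ex: ex["summary"],
    },
    "openai/gsm8k": {
        "instruction": lambda ex: "Solve the following word problem step by step.",
        "context":     lambda ex: ex["question"],
        "target":      lambda ex: ex["answer"],
    },
}

def normalize(dataset_name, example):
    """Convert one raw example into the shared instruction/context/target format."""
    mapping = DATASET_MAPPINGS[dataset_name]
    return {field: extract(example) for field, extract in mapping.items()}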

config/ - System Configuration

This directory contains the .yaml files that define the architecture and hyperparameters for every experiment.

  • cognitive_frn.yaml: The master configuration file for the full Metacognitive Routing Network, specifying the hierarchical structure, expert types, and all training parameters for the advanced features.
  • deepspeed_config.json: Configures the DeepSpeed distributed training engine, including settings for batch size and mixed-precision training.

4. End-to-End Workflow

The project supports a full research lifecycle, from data acquisition to insight generation.

Step 1: Environment Setup

# Run once to create the 'frn' conda environment
bash setup_environment.sh

# Before every session, activate the environment
source env.sh
conda activate frn

Training

Step 2: System Sanity Check
# Run after setup and after any significant code changes
python system_sanity_check.py

Step 3: Foundational Training (Phase 1)
deepspeed run_phase1_training_safe_clean.py \
  --datasets "Open-Orca/FLAN,knkarthick/samsum,openai/gsm8k" \
  --base_config "config/cognitive_frn.yaml" \
  --run_name "mrn_foundational_run_v1" \
  --results_dir "results/mrn_foundational_run_v1" \
  --cache_dir "datasets/mrn_cache_v1" \
  --take_n 250000 \
  --deepspeed

Step 4: Adapt for Parameter-Efficient Fine-Tuning
# This script prepares the foundational model for LoRA fine-tuning
python adapt_model_for_lora.py \
  --config "config/cognitive_frn.yaml" \
  --checkpoint_path "results/mrn_foundational_run_v1/checkpoint-xxxxx" \
  --layers_to_adapt "3"

Step 5: Specialized Fine-Tuning (Phase 2)
# Launch fine-tuning using the LoRA-adapted checkpoint
python run_phase2_training_launcher.py \
  --dataset_path "datasets/math_instructions.jsonl" \
  --run_name "phase2_math_lora_finetune" \
  --results_dir "results/phase2_math_lora_finetune" \
  --continue_from_checkpoint "results/mrn_foundational_run_v1/checkpoint-xxxxx/lora_adapted" \
  --base_config "config/cognitive_frn.yaml" \
  --deepspeed

Step 6: Analysis and Interpretability
python visualize_mrn_insights.py \
  --config "config/cognitive_frn.yaml" \
  --checkpoint "results/phase2_math_lora_finetune/checkpoint-xxxxx/model.pt" \
  --instruction "Solve the following: 15 * 4" \
  --context "A simple multiplication problem." \
  --results_dir "visualizations/math_finetune_insights"

Run Models for Inference

Example A: Foundational Model for Summarization and Task Extraction

The general-purpose foundational model is excellent at complex, multi-step instructions on unseen data. Here’s how to use it to summarize a batch of emails and extract prioritized action items.

# inference_example.py
import torch
from frn.models.multimode_frn import MultiModeFRN
from transformers import AutoTokenizer
import yaml

# --- 1. Load the trained model and tokenizer ---
checkpoint_dir = "results/mrn_foundational_run_v1/checkpoint-xxxxx" # Use your latest checkpoint
config_path = f"{checkpoint_dir}/config.yaml"
model_path = f"{checkpoint_dir}/model.pt"

with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

model = MultiModeFRN(config)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
tokenizer = AutoTokenizer.from_pretrained(config['model']['tokenizer'])

# --- 2. Define the complex instruction and context ---
instruction = "Summarize the following emails and create a list of prioritized action items."
email_context = """
Email 1:
Subject: Project Update
Hi team, just a reminder that the Q3 report is due this Friday. Please send me your slides.

Email 2:
Subject: Lunch?
Hey, are you free for lunch tomorrow at 12? Let me know.

Email 3:
Subject: URGENT: Server is down
The main production server just went offline. This is critical, all hands on deck. We need to get it back online ASAP.
"""

# --- 3. Run generation ---
inputs = tokenizer(email_context, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    instruction_texts=[instruction],
    max_new_tokens=128,
    num_beams=3
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(output)

Example B: Fine-Tuned Model for Symbolic Reasoning

After fine-tuning on a math dataset, the MRN can solve complex word problems by learning to route the task to its internal symbolic PlannerExpert.


# math_inference_example.py
# ... (load model and tokenizer as in Example A, but point to the fine-tuned checkpoint) ...

# --- 2. Define the math problem ---
instruction = "Solve the following word problem by thinking step-by-step."
math_context = """
If a square room (a perfect cube) has a height of 10 feet, what is the total volume of paint that a painter would need? The only details you need to know to solve the problem are: (1) the room is a perfect cube with a height of 10 feet; (2) the painter always plans to measure out 25% more paint than is required, to account for errors; (3) the rule of thumb this painter uses is 100 mL of paint per square foot; and lastly, (4) because the homeowners recently installed a new carpet in the room, the painter must operate extra cautiously and lay down plastic sheets to cover the entire surface. Solve for the amount of paint this painter needs to measure out for the job.
"""

# --- 3. Run generation ---
inputs = tokenizer(math_context, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    instruction_texts=[instruction],
    max_new_tokens=256
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(output)
