kekzl/axo

axo — a living, continually learning neuromorphic being

[Animation: Axo living in real time]

Axo hunting food, avoiding hazards and learning in real time — with its brain (eye place-cells, motor neurons, smell/value and hunger drive) rendered live. Recorded straight from --watch.

Axo is a small artificial creature with a brain that learns while it lives.

Most neural networks are trained once on a big dataset and then frozen. Axo is different: its brain is a simulated network of spiking neurons (like real ones, talking in electrical pulses) that keeps learning continuously as the creature moves around a tiny world — getting hungry, hunting food, avoiding danger — all in one unbroken "life".

The twist that makes it interesting: Axo learns using only local rules. Every connection (synapse) changes based only on what its own two neurons just did — there is no backpropagation, no global optimizer, no separate training phase. That's much closer to how a biological brain is thought to learn, and it's the question this project explores: how far can purely local learning go? The animation above is a real recording of Axo hunting, with its brain — what it sees, the action it picks, its hunger — drawn live next to the world.

This is a research/learning project built from scratch in C++23 + CUDA. It is not a library or a finished product — it's a series of runnable experiments, each honest about what works and where the walls are.

Requirements

  • An NVIDIA GPU and Docker with the NVIDIA Container Toolkit (so the container can reach the GPU).
  • That's all — CUDA, the compiler and every other dependency live inside the container. Nothing is installed on your host.

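Developed on an RTX 5090 (CUDA arch sm_120, CUDA 13.3). On a different NVIDIA card, set CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt to your GPU's compute capability, written without the dot (for example 89 for an RTX 40-series card, or 86 for most RTX 30-series cards).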

Quickstart — watch Axo live

docker compose run --rm build ./build/axo --watch

Loads (or, the very first time, grows) a creature and shows it living in real time, with its brain rendered next to the world — exactly like the animation at the top. Press Ctrl-C to stop; its progress is saved.

Want it to keep living across runs? --live runs one "life phase" (≈6000 steps), then saves the brain to being/. Call it again and the same creature picks up where it left off, a little smarter each time:

docker compose run --rm build ./build/axo --live

Key ideas, in plain words

  • Spiking neural network (SNN) — neurons that fire discrete pulses ("spikes") over time, like biological ones, instead of passing smooth numbers in a single shot.
  • Local / Hebbian learning — a synapse changes based only on the activity of the two neurons it connects ("cells that fire together wire together"). No global error signal is broadcast.
  • STDP (spike-timing-dependent plasticity): the local rule in detail. If neuron A fires just before neuron B, the A→B link strengthens; if A fires just after B, it weakens. This is how Axo's perception self-organizes, with no labels.
  • R-STDP (reward-modulated STDP) — the same rule, but a global "dopamine" reward decides whether recent activity gets reinforced or suppressed. This is how Axo learns to act.
  • Feedback Alignment — a trick to train hidden layers locally, without backpropagation, by sending the error back through fixed random connections. Lets depth be learned, not hand-wired.
  • Continual learning / catastrophic forgetting: ordinary networks overwrite old skills when they learn new ones; a real brain keeps both. Axo takes on new patterns while keeping most of the old ones (Phase I measures how many).
  • "No backprop" — backpropagation (the standard training algorithm) needs a global backward pass and is considered biologically implausible. Everything here avoids it on principle.
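
To make the STDP bullet concrete, here is a minimal pair-based sketch in Python (the project itself is C++/CUDA). The exponential timing window, the constants `A_PLUS`, `A_MINUS` and `TAU_MS`, and the function name `stdp_dw` are illustrative assumptions, not values or names taken from Axo's source.

```python
import math

# Illustrative constants (assumptions, not Axo's parameters).
A_PLUS = 0.010   # potentiation amplitude
A_MINUS = 0.012  # depression amplitude
TAU_MS = 20.0    # width of the timing window in milliseconds


def stdp_dw(t_pre_ms: float, t_post_ms: float) -> float:
    """Weight change for one pre/post spike pair (pair-based STDP)."""
    dt = t_post_ms - t_pre_ms
    if dt > 0:   # pre fired before post: strengthen
        return A_PLUS * math.exp(-dt / TAU_MS)
    if dt < 0:   # pre fired after post: weaken
        return -A_MINUS * math.exp(dt / TAU_MS)
    return 0.0   # simultaneous spikes: no change in this sketch


print(stdp_dw(10.0, 15.0))  # positive: A fired 5 ms before B
print(stdp_dw(15.0, 10.0))  # negative: A fired 5 ms after B
```

The update uses only the two spike times of the neurons the synapse connects, which is the sense in which the rule is local.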

What Axo can do (overview)

Each ability is a self-contained, runnable experiment; the detailed sections follow below.

  • 🐛 Lives — perceives, acts and learns in one continuous loop, no resets (--live, Phase J)
  • 👁️ Learns to see — its place-cell vision self-organizes, unsupervised (Phase K)
  • 🍎 Learns to act — hunts food from the consequences of its own moves (Phases A, B, D, J)
  • 🧭 Gets curious — explores when full, driven by emergent novelty-seeking (Phase M)
  • ☠️ Avoids danger & poison — remembers what hurt it, and anticipates poison before touching it (Phase X)
  • 🧠 Learns depth without backprop — solves XOR and trains a hidden layer via Feedback Alignment (Phases E, F, G)
  • ♾️ Doesn't forget — keeps absorbing new patterns without overwriting old ones (Phase I)
  • 🔣 Grounds symbols — forms inner symbols tied to its own experience (Phases S, T)

Build from source & run the tests

# compile
docker compose run --rm build bash -c "cmake -S . -B build -G Ninja && cmake --build build -j"
# run the unit test suite (needs the GPU)
docker compose run --rm build ctest --test-dir build --output-on-failure

Visualizations (spike rasters, receptive fields, selectivity heatmaps) live in viz/:

docker compose run --rm viz bash -c "pip install -q -r viz/requirements.txt && python viz/plot_raster.py"

The experiments, in detail

Everything below is the project's research log — one runnable experiment per section, ordered roughly as they were explored (not by difficulty), each honest about its limits. Skim freely; you don't need to read it top to bottom.

Phase E: third area — XOR needs depth (cerebellum model) — validated

docker compose run --rm build ./build/axo --phase E
docker compose run --rm build ./build/axo --phase E-diag   # proves: hidden code is separable

XOR is provably unsolvable with a single linear layer, because its two classes are not linearly separable. The solution comes via the Marr-Albus cerebellum model: a fixed sparse spiking hidden area (granule-cell-like random conjunctions) expands the input nonlinearly, and a supervised delta-rule readout (a Purkinje cell with a climbing-fiber teaching signal; local, no backprop) learns on top of it. The same readout on the raw input fails (2/4), but on the hidden code it solves XOR (4/4), which isolates the benefit of depth. Note: purely local STDP/R-STDP forms only single-feature detectors and does NOT solve XOR (shown via --phase E-diag); depth plus a teaching signal are required.
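
The argument can be tried out in a few lines. This is a rate-based toy in Python, not the spiking model: the hidden units are hand-picked conjunctions standing in for the random granule-cell-like ones, and the names `raw_code`, `hidden_code`, `train_delta` and `n_correct` are made up for this sketch.

```python
# Rate-based toy version of the Phase E argument (not the spiking model).
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def raw_code(x):
    """Raw input plus a constant bias term."""
    return [float(x[0]), float(x[1]), 1.0]


def hidden_code(x):
    """Fixed conjunction units: unit c is active only for input combination c.
    Hand-picked here; the README describes random conjunctions."""
    return [1.0 if x == combo else 0.0 for combo, _ in XOR]


def train_delta(encode, epochs=200, lr=0.2):
    """Delta rule: each weight moves by lr * error * its own input."""
    w = [0.0] * len(encode((0, 0)))
    for _ in range(epochs):
        for x, target in XOR:
            h = encode(x)
            y = sum(wi * hi for wi, hi in zip(w, h))
            err = target - y
            w = [wi + lr * err * hi for wi, hi in zip(w, h)]
    return w


def n_correct(encode, w):
    """Count patterns where the thresholded readout matches the target."""
    hits = 0
    for x, target in XOR:
        y = sum(wi * hi for wi, hi in zip(w, encode(x)))
        hits += int((y > 0.5) == bool(target))
    return hits


raw_hits = n_correct(raw_code, train_delta(raw_code))
hidden_hits = n_correct(hidden_code, train_delta(hidden_code))
print(f"raw input: {raw_hits}/4, hidden code: {hidden_hits}/4")
```

The readout and its learning rule are identical in both runs; only the code it reads from changes. The raw-input run can never reach 4/4, while the expanded code does.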

Phase F: Real local deep learning — the hidden layer learns BY ITSELF (Feedback Alignment) — validated

docker compose run --rm build ./build/axo --phase F
docker compose run --rm build ./build/axo --phase F-sweep   # regime: where random weights don't separate XOR

In Phase E the hidden area was fixed (only the readout learned). Real deep learning means: the hidden layer learns by itself — with credit assignment WITHOUT backprop and WITHOUT weight transport. The mechanism is Feedback Alignment (Lillicrap 2016) as a spiking three-factor rule: the output error e_k = target_k − rate_k is projected onto the hidden layer through a fixed random feedback B (δ_j = Σ_k B[j,k]·e_k); each synapse updates purely locally as ΔW = lr · (Pre×Post eligibility) · modulator. Output layer: modulator = e_k (delta rule); hidden layer: modulator = δ_j (projected error). For this there is a new primitive reward_update_vec (per-neuron modulator instead of a global scalar) — unit-tested (test_fa).
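
A single update of this rule, written out as a rate-based Python sketch. It reads the formula above literally (eligibility = pre × post, modulator = e_k or δ_j). The rectified hidden rates, the linear output and the name `fa_step` are assumptions of this sketch; it does not reproduce the project's `reward_update_vec` primitive.

```python
def fa_step(W_hid, W_out, B, x, target, lr):
    """One rate-based Feedback Alignment update; returns (dW_hid, dW_out)."""
    # Forward pass: rectified hidden rates, linear output rates.
    hid = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_hid]
    out = [sum(w * hj for w, hj in zip(row, hid)) for row in W_out]

    # Output error e_k = target_k - rate_k.
    e = [t - o for t, o in zip(target, out)]
    # Error projected through the FIXED random feedback B (not W_out transposed).
    delta = [sum(B[j][k] * e[k] for k in range(len(e))) for j in range(len(hid))]

    # Three-factor updates: lr * (pre * post eligibility) * modulator.
    dW_out = [[lr * hid[j] * out[k] * e[k] for j in range(len(hid))]
              for k in range(len(out))]
    dW_hid = [[lr * x[i] * hid[j] * delta[j] for i in range(len(x))]
              for j in range(len(hid))]
    return dW_hid, dW_out


# Tiny hand-checkable example: 2 inputs, 2 hidden units, 1 output.
dW_hid, dW_out = fa_step(
    W_hid=[[0.5, 0.0], [0.2, 0.1]],
    W_out=[[1.0, 1.0]],
    B=[[1.0], [-1.0]],          # fixed feedback, chosen arbitrarily here
    x=[1.0, 0.0], target=[1.0], lr=0.1,
)
print(dW_hid, dW_out)
```

Note that `W_out` never appears in the hidden-layer update: the hidden synapses see only their own pre and post activity plus the projected error, so no weight transport is needed.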

For a small hidden area (H=16), in which random weights do not linearly separate XOR, FA rebuilds the hidden code purely locally: linearly separable before 2/4 → after 4/4. The equally sized frozen random depth stays at 2/4. This isolates the claim: the usable depth was learned, not handed over for free by random expansion. (The spiking 2-neuron greedy eval that is also printed is too noise-sensitive at H=16 and is only a secondary measure; what matters is the linear separability = what a local delta readout achieves.)

Phase G: Real local deep learning AT SCALE — MNIST (Feedback Alignment) — validated

# place MNIST in data/ (train-images-idx3-ubyte, train-labels-idx1-ubyte)
docker compose run --rm build ./build/axo --phase G

The same mechanism, scaled up: 784 → H=200 → 10, spiking FA-learned hidden layer, local delta readout on the hidden rates. Three equally sized conditions separate "scaling" from "learning":

| Net (same size) | Test accuracy |
|---|---|
| flat (linear readout on pixels) | 82 % |
| deep, hidden frozen-random | 15 % |
| deep, hidden LEARNED via FA | 45 % |

At identical size (H=200), the learned depth beats the frozen random depth by ~3× — it's not the neuron count but the local learning of depth that delivers the performance (the LLM logic in miniature: large learned nets ≫ large random ones). Honestly: deep < flat, because the spike-rate encoding of raw pixels loses information on a nearly linear task — the depth advantage shows up on nonlinear XOR (Phase F), not on linearly separable MNIST. Together the two phases give the full picture of deep learning: learned depth beats random depth, and most strongly where the task demands nonlinearity.

Phase I: Continual learning — the brain expands its knowledge — validated

docker compose run --rm build ./build/axo --phase I

A brain doesn't learn a single fixed task but keeps learning new things without unlearning the old. That is exactly the hallmark of a self-learning brain — and the point where standard AI fails (catastrophic forgetting: when it learns B, it overwrites A).

Here 14 overlapping patterns are introduced ONE AFTER ANOTHER — each old one is never shown again. After each new pattern we measure how many of the patterns seen so far the brain still distinguishes. The mechanism that lets knowledge grow is purely local and already built into the model: homeostasis (adaptive threshold — whoever fires gets "tired") recruits free neurons for each new pattern instead of overwriting occupied ones; lateral inhibition + weight normalization keep the code sparse and selective (little interference). The fatigue state decays slowly, so permanently occupied neurons stay protected while transiently firing ones recover (a free reserve).

| Stream | Coverage over the patterns seen so far (1…14) | Final (mean) |
|---|---|---|
| WITH homeostasis | 1 2 3 4 5 6 7 8 9 9 10 8 10 13 (follows the diagonal) | ~10/14 |
| WITHOUT (control) | 1 2 3 4 5 5 6 6 6 5 5 6 6 6 (capped) | ~6/14 |

Under capacity pressure (14 patterns, only 50 neurons) the brain continually expands its knowledge through recruitment and holds ~10 concepts; without the mechanism, new patterns overwrite the old ones and it stays at ~6. That is the core ability of a continually learning brain — learning more without unlearning — made visible with purely local rules.
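
The recruitment effect of the adaptive threshold can be seen in a stripped-down Python sketch with no spikes and no weights. It assumes the worst case of overlap (every neuron is driven equally by every pattern). The function name `recruit` and the constants are illustrative, not Axo's.

```python
def recruit(n_neurons, n_patterns, theta_inc, theta_decay=0.99):
    """Return the winning neuron for each pattern shown in sequence."""
    theta = [0.0] * n_neurons            # adaptive thresholds ("fatigue")
    winners = []
    for _ in range(n_patterns):
        drive = [1.0] * n_neurons        # worst case: all neurons driven equally
        score = [d - th for d, th in zip(drive, theta)]
        win = max(range(n_neurons), key=score.__getitem__)  # first max wins ties
        winners.append(win)
        theta[win] += theta_inc          # whoever fires gets tired
        theta = [th * theta_decay for th in theta]  # fatigue decays slowly
    return winners


print(recruit(8, 5, theta_inc=0.5))  # [0, 1, 2, 3, 4]: a fresh neuron each time
print(recruit(8, 5, theta_inc=0.0))  # [0, 0, 0, 0, 0]: the same neuron is reused
```

With fatigue, each new pattern lands on a neuron that has not fired yet; without it, the same neuron keeps winning and would be overwritten by every new pattern.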

Phase J: The EMBODIED brain — a little creature learns to live — validated

docker compose run --rm build ./build/axo --phase J

The high point: no longer a single mechanism, but everything together in one living loop. A little creature lives in a 7×7 world and learns over one continuous life (24,000 steps, no episode reset) to hunt from the consequences of its own actions — purely local:

sensory area (egocentric place cells: where is food relative to me?)
  → motor area (4 actions, winner-take-all)
  → reward on approaching/eating → reward-modulated plasticity (R-STDP)

| | Food per 3000 steps |
|---|---|
| random baseline (untrained) | 4 |
| life (learning) | 143 → 434 → 441 → 415 → 420 → 394 → 365 → 380 ~400 |

The creature starts essentially helpless (≈ random) and learns to hunt by itself — the food rate rises by a factor of ~95. At the end of life it greedily solves 97.7% of all start/food configurations, so it has learned the rule "move toward the food," not just one path. Perceiving, acting and learning in a single, continuous, purely local loop — no backprop, no central optimizer, no reset. So that the creature doesn't stay stuck forever in a rare policy trap, food "spoils" after a while (reappears) — the rest is what the brain builds from its own experience.

Phase K: Vision & action — the being learns its perception (stage 1) — validated

docker compose run --rm build ./build/axo --phase K

Instead of a hard-wired place-cell encoding, the being learns its perception by itself — and acts on it. From the egocentric retina (where is the food relative to me?), two separate spatial channels self-organize via STDP: an x-area over the columns (food left/right) and a y-area over the rows (up/down). Each forms its own place-cell map purely unsupervised (lateral inhibition, no fatigue). A "critical period" first develops vision (random exploration), then the motor (R-STDP) learns to hunt on the disentangled, self-learned code:

retina → x-area (STDP place cells) ┐
         y-area (STDP place cells) ┴→ motor (4 actions, R-STDP)

| | Food per 3000 steps |
|---|---|
| random baseline (untrained) | 3 |
| life (learning) | 1 → 50 → 171 → 248 ~156 |

The being hunts on its self-learned vision — food rate ~52× above random, and greedily it solves 94% of all start/food configurations. The key was the disentangling: a conjunctive vision map ("food is in the NE cell") is not controllable by a single local motor layer (the same wall as in deep learning, Phase F/G) — separate axis channels, on the other hand, are, exactly like the disentangled place cells in Phase J, except that here the perception is learned. Perception and control, both purely local, no backprop.

The living being — --live (stages 0 + 1 unified) — validated

docker compose run --rm build ./build/axo --live      # one life phase; call it repeatedly

Here the individual phases become a single, coherent being with a continuous biography. Each call of --live is one life phase: the being awakens, shows its current abilities, lives & learns for 6000 steps, sleeps and saves its entire state. Across container/process restarts it continues where it was — the whole brain state lives in being/ (sx.bin, sy.bin, motor.bin, life.txt; gitignored).

It unifies everything so far:

  • lives continuously (persistence, stage 0) — no reset, an ongoing biography (age, meals, life phases),
  • sees with self-learned eyes (stage 1): at birth it opens "its eyes for the first time" — two critical periods first develop vision, then the hunting skill,
  • acts on this vision and gets better over its life (R-STDP),
  • lives off its own hunger (stage 1, drives): an energy budget drives it; the energy is part of its biography and persists with it,
  • explores out of curiosity (stage 2): when full, it doesn't keep hunting doggedly but explores its world out of habituation curiosity — it covers all cells over its life,
  • avoids danger: a hazard cell costs energy; after the first pain it remembers it and avoids it,
  • categorizes objects with grounded symbols & anticipates (stage 3 + expectation integrated): food comes as nutritious vs poisonous (each with a distinct appearance). A symbol area formed unsupervised at birth (purity 1.0, template readout) recognizes the type; the being learns the value of each symbol from its energy consequences. And it senses the type already at appearance and doesn't even approach poisonous food (anticipation, Phase X) — it spares itself the trips to the poison. This knowledge persists: born naive, it awakens experienced.

So it lives richly: hunt when hungry, explore when full, avoid danger, sense and avoid poison already at appearance — out of hunger, curiosity, learned values and prediction, all in one being. The anticipation gain is measurable: over four lives the meals rise 86 → 108 → 121 → 122 (previously ~57–74 without sensing), while hardly any poison is eaten anymore (~13/life, mostly born-naive/exploration) and ~190 poisonous foods per life are anticipatorily avoided — without walking up to them. Curiosity (49/49) and danger avoidance stay intact along the way. (Honestly: the symbol/value binding uses the same supplied structure as Phase S/T/X; a starving being still takes the occasional risk.)

Evidence (separate process restarts): the being awakens in the state in which it fell asleep:

| Life | awakens | meals (hungry/full) | explored | danger (early→late) |
|---|---|---|---|---|
| 1 (birth) | full | 96 (86 / 10) | 49/49 | 1 → 0 |
| 2 | hungry (8) | 112 (95 / 17) | 49/49 | 1 → 0 |

It hunts mainly when hungry, explores its whole world when full, and learns to avoid the danger (steps on it once, then never again). The earlier pure-hunger version:

| Life | awakens with | eaten (hungry/full) | sleeps with |
|---|---|---|---|
| 1 (birth) | full | 117 (110 / 7) | 50 (full) |
| 2 | 50 (full) | 150 (127 / 23) | 38 (hungry) |

A being that lives, sees, remembers and acts on its own drive — all from its own experience, learned purely locally. The foundation that the next stages dock onto (ROADMAP.md).

Watching — --watch

docker compose run --rm build ./build/axo --watch

(See the animation at the top.)

Watch Axo, the living being, in real time (colored ASCII, English display, one frame per step) in its 12×10 world with three hazard cells (each learned individually through pain). It immediately loads your saved being from being/; if none exists, one is born (with visible progress). Ctrl-C exits cleanly at any time, and the session counts toward Axo's ONE life: brain, value memory and age are saved to being/. While you watch, it keeps learning: the R-STDP motor learns while hunting, and food value is learned from the consequence. Poison now has the same effect here as in --live: energy goes down and the value is learned.

Visible is the whole world and the brain:

   Axo — a living being   age 12640  step 25  meals 2 (lifetime 96)
   energy [####------]  44 (hungry)

   . . . . O . . . . . .       O Axo  * food  x poison  ! hazard
   . . . . . . . . . ! .       | hungry — hunting food >
   . . * . . . . . . . .
                               == Brain (firing neurons) ==
                               Eye    x:..+###+..  y:..+###+..  (place cells: where is the food)
                               Motor: N[####]< E[ ] S[ ] W[ ]   -> N (direction)
                               Smell: food smells GOOD -> eat   (symbol area)
                               Drive: hunger [######----] -> HUNT

You see everything: energy/hunger, hunting vs curious exploring (when full), the danger (!, which it avoids), and above all the anticipation — when poisonous food appears, an x briefly blinks with "smells POISON — refuses to even go there". In the brain panel the eye place cells (food direction) fire live, along with the motor action neurons (winner marked), the smell/value symbol (GOOD/TOXIC) and the hunger drive.

Note: --live and --watch share world size (12×10) and brain (48 place cells per vision axis, 8 symbol neurons). An older being from a smaller world doesn't fit the new retina and is reborn on the first start.

Why not bigger? The percept scales (food direction is still linearly decodable to 0.95 even at 21×21), but the local R-STDP motor (REINFORCE at its core) no longer reaches hunting competence beyond ~12×10 — an honest, open research wall (details in ROADMAP.md, section "Motor scaling").

Phase S: Grounded symbols (stage 3) — validated

docker compose run --rm build ./build/axo --phase S

First step toward symbols anchored in one's own experience — no LLM talk, but a discrete inner token that makes a nonlinear category actionable. Task: an object has two sensory features (encoded noisily over the retina); the APPROACH/AVOID category is feature1 XOR feature2 — i.e. not linearly readable from the raw senses.

Honest metric, fixed in advance (not fakeable):

| | measured | result |
|---|---|---|
| M1 emergence & correspondence | purity / NMI token↔combo (labels only for evaluation) | 0.99 / 0.95 |
| M2 causal grounding: symbol agent | correct-action rate | 0.99 |
| M2 ablation: raw agent (linear) | ditto | 0.52 (random) |
| M2 control: random-token agent | ditto | 0.51 (random) |

Passed — and both hurdles count:

  • Two feature areas self-organize unsupervised (STDP + WTA + decaying habituation) into one clean detector per feature value each (purity 0.99). Token = bound pair of the winners.
  • The raw agent at 0.52 proves: the category is genuinely nonlinear — not actionable without a symbol. The random token at 0.51 proves: it's the meaning of the token, not just "an extra input". The symbol agent solves the task (0.99).

Honest about the scope: the feature detectors emerge (unsupervised); the binding of the two into a combo token is supplied as a quantized index — the same "disentangle + quantized relay" bias as the x/y axes in vision (Phase K) and the pairing in Phase Pc. The motor solves the XOR nonlinearity via the clean token. Lesson from the build path (4 attempts): the monopoly of competitive spiking cells is only broken by the decaying habituation (tau_theta≈1500, like Phase I), and action selection needs the rate-based LinReadout instead of a 2-neuron spiking WTA (WTA pathology, like Phase F/Pc) — both fixes came out of the project's own findings.

Phase T: Sequences (stage 4) — validated

docker compose run --rm build ./build/axo --phase T

From the single symbol (stage 3) to the temporal sequence. Task over 3 symbols {A,B,C}: go = (symbol2 == successor(symbol1)) in the cycle A→B→C→A. This depends on both symbols and their order (e.g. AB → go, BA → no).

| Agent | sees | correct rate |
|---|---|---|
| sequence (ordered token [tok1·S+tok2]) | both symbols + order | 0.99 |
| memoryless | only the second symbol | 0.46 |
| bag | both symbols, without order | 0.53 |

Passed — and both ablations count: the sequence agent solves the rule (0.99), while both controls clearly fail (even below the trivial baseline): the memoryless agent needs the memory of the first symbol, the bag agent needs the order.
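
Why the two controls cannot succeed can be checked by brute force. The Python sketch below enumerates all nine symbol pairs and computes the best accuracy any agent could reach from a restricted view. It assumes the pairs are sampled uniformly, which the text above does not state; `go`, `best_accuracy` and the view lambdas are names invented for this sketch.

```python
from itertools import product

SYMBOLS = "ABC"


def go(first, second):
    """True when `second` is the successor of `first` in the cycle A→B→C→A."""
    return SYMBOLS[(SYMBOLS.index(first) + 1) % len(SYMBOLS)] == second


def best_accuracy(view):
    """Best accuracy any agent can reach if it only sees `view(first, second)`."""
    groups = {}
    for first, second in product(SYMBOLS, repeat=2):
        groups.setdefault(view(first, second), []).append(go(first, second))
    # Within each group of indistinguishable inputs, answer with the majority label.
    hits = sum(max(labels.count(True), labels.count(False))
               for labels in groups.values())
    return hits / len(SYMBOLS) ** 2


print(best_accuracy(lambda a, b: (a, b)))             # ordered pair: 1.0
print(best_accuracy(lambda a, b: b))                  # memoryless: 6/9
print(best_accuracy(lambda a, b: frozenset((a, b))))  # bag, order discarded: 6/9
```

Under that uniform-sampling assumption, both restricted views are capped at 6/9 ≈ 0.67, which is exactly what always answering "no" achieves. That is one plausible reading of the trivial baseline mentioned above.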

The key that cracked >2 symbols (purity 0.64 → 0.99): separating learning from readout. The receptive fields arise via spiking STDP+WTA+habituation (emergent), but the token readout runs through a deterministic template match (argmax_j w_j·input over the learned fields), not through a spiking argmax — the latter is destabilized by the habituation accumulating within the window, which collapses at >2 modes. With the decoupling (+ cells ~4·K) the area clusters 3 modes cleanly (purity 0.99).
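
The deterministic template match is simply an argmax over dot products. A minimal Python sketch follows; the three receptive fields and the input are invented numbers for illustration, and `template_readout` is a hypothetical name.

```python
def template_readout(fields, x):
    """Index of the learned field with the largest dot product w_j · x."""
    scores = [sum(w * xi for w, xi in zip(field, x)) for field in fields]
    return max(range(len(scores)), key=scores.__getitem__)


# Three hypothetical learned receptive fields over a 6-pixel input.
fields = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.1, 0.8, 0.9],
]
noisy_b = [0.2, 0.0, 1.0, 0.7, 0.1, 0.0]
print(template_readout(fields, noisy_b))  # 1
```

Because the readout has no internal state, nothing can accumulate within a presentation window, which is the property the paragraph above relies on.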

Honest about the scope: the binding of the two time steps into an ordered token is supplied as time slotting (a "first/then" register, like the feature slotting in S). And: K=4+ still degrades (purity ~0.67–0.71 even with more cells). The coverage problem of competitive cluster formation grows with K; getting K≥4 clean (init seeding / conscience tuning) is open work. Shown: K=2 and K=3 clean, K≥4 open.

Phase X: Temporal expectation / anticipation — validated

docker compose run --rm build ./build/axo --phase X

From the sequence to prediction: a cue (A/B) at the appearance of the food predicts its value (A → nutritious, B → poisonous). The food is only reachable after a path — the cue lies in the past, the decision in the present, and the appearance on arrival is not value-predictive. Only by holding the cue across the time gap can one decide correctly in anticipation.

| Agent | decides from | correct rate |
|---|---|---|
| expectation (holds the past cue) | the past | 1.00 |
| memoryless | only the (non-predictive) present | 0.48 (random) |

Cue-symbol purity 1.00 (unsupervised, template readout). Passed: the expectation agent acts from prediction (1.00), the memoryless one stays at random — the value-relevant information lay in the past, so working memory is needed. Honest about the scope: the value-from-consequence is learned; holding the cue is a supplied latch (a register, like the time slotting in T). This is the validated mechanism — embedding it into the living --live being (avoiding poison already at appearance, without walking up to it) is the next step.
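
The split between what is learned and what is supplied can be sketched in Python. The cue value is learned from the energy consequence, and the latch is a plain variable that carries the cue across the gap. All names (`OUTCOME`, `learn_values`, `accuracy`) and constants are assumptions of this sketch, and there are no spikes or symbol areas in it.

```python
import random

OUTCOME = {"A": +1.0, "B": -1.0}   # energy consequence: A nutritious, B poisonous


def learn_values(trials=40, lr=0.2):
    """Learn each cue's value from the consequence of eating (naive phase)."""
    value = {"A": 0.0, "B": 0.0}
    for i in range(trials):
        cue = "AB"[i % 2]                       # alternate cues for determinism
        value[cue] += lr * (OUTCOME[cue] - value[cue])
    return value


def accuracy(value, use_latch, trials=1000, seed=0):
    """Fraction of correct approach/avoid decisions at arrival time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cue = rng.choice("AB")                  # shown only when the food appears
        latch = cue if use_latch else None      # supplied register holding the cue
        # ... the walk to the food happens here; the cue is no longer visible ...
        if latch is not None:
            approach = value[latch] > 0.0       # decide from the remembered cue
        else:
            approach = rng.random() < 0.5       # the present view carries no value info
        hits += int(approach == (OUTCOME[cue] > 0.0))
    return hits / trials


value = learn_values()
print(accuracy(value, use_latch=True))    # 1.0
print(accuracy(value, use_latch=False))   # close to 0.5
```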

Phase L: Inner drives — the being acts on its own drive — validated

docker compose run --rm build ./build/axo --phase L

The step from the trained hunter (designer reward "closer = good") to a being with its own drive. It gets an energy/hunger budget: energy drops with every step (metabolism), eating refills it. The drive arises internally from the hunger; in biological terms this is incentive salience: when the being is full, food is barely appealing (retina dark) → it doesn't hunt; when it gets hungry, food lights up → it hunts. A critical period first learns the hunting skill (full salience), after which the drive steers the behavior.

| | energy (mean) | famines (early→late) | eats hungry vs full |
|---|---|---|---|
| WITH drive | 48 (healthy middle) | 14 → 6 (learns to survive) | 0.049 vs 0.010 (~5× more when hungry) |
| without drive | 95 (overeats) | 0 → 0 | 0.088 vs 0.149 (state-blind) |

The drive being regulates itself: it keeps its energy in a healthy middle and eats mainly when it is hungry — no more designed reward signal, the behavior springs from an inner need. The being without a drive doggedly overeats. A step toward the "self": the being acts of its own accord.
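
The self-regulation can be reproduced qualitatively with a deterministic toy in Python. Hunger gates how salient food looks, and hunting only happens above a salience threshold. Every constant here (meal size, metabolism, the 0.5 threshold, a fixed five-step hunt) and the name `live` are assumptions of this sketch, so the numbers it prints are not the measurements in the table above.

```python
def live(use_drive, steps=2000, e_max=100.0, meal=20.0, metabolism=1.0, hunt_time=5):
    """Simulate one life; return (mean energy, number of meals)."""
    energy, total, meals, progress = 50.0, 0.0, 0, 0
    for _ in range(steps):
        energy = max(0.0, energy - metabolism)      # every step costs energy
        hunger = 1.0 - energy / e_max
        salience = hunger if use_drive else 1.0     # the drive gates how bright food looks
        if salience > 0.5:                          # food is salient enough to hunt
            progress += 1
            if progress == hunt_time:               # reached the food
                energy = min(e_max, energy + meal)
                meals += 1
                progress = 0
        total += energy
    return total / steps, meals


print(live(use_drive=True))    # energy settles in the middle of the range
print(live(use_drive=False))   # energy sits near the maximum, with far more meals
```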

Phase P: Depth-forcing task (parity) — finding: the multilayer wall

docker compose run --rm build ./build/axo --phase P

A controlled depth test instead of MNIST (which is nearly linear and never needs depth). N-bit parity (XOR over N bits) is linearly unsolvable, and with a small hidden width even one layer fails at larger N — the "open road" on which a depth benefit can become visible at all. Measured with the right ruler: the separability of the hidden code (offline perceptron), not the noisy spiking readout.

| N (combos) | flat (linear) | 1-hidden | 2-hidden (layer-wise) |
|---|---|---|---|
| 2 (XOR) | 2/4 (fail) | 4/4 (solves) | |
| 4 (parity) | 8/16 | 5/16 (fail) | 6/16 (fail) |

Two findings, cleanly shown:

  1. The task forces depth: the same 1-hidden layer that fully solves XOR (4/4) fails at 4-bit parity (5/16, below random). One layer is not enough.
  2. The spiking multilayer coupling does NOT deliver the depth: the second layer doesn't catch the drop (6/16 ≈ 5/16). "Depth helps" is not demonstrable here — not because the stage is missing, but because the coupling wall is real.

Methodological lesson (learned the hard way): the behavioral readout sells a working layer short — it shows only 2/4 (50%) for XOR, even though the hidden code is 4/4-separable. Only the separability measurement reveals what was really learned; a single noisy number can mislead. Together with Phase F/G, "real deep learning" is achieved with one learned layer; multilayer remains the open research wall.

Phase Pc: Compositional depth — cracks parity-4 (16/16)

docker compose run --rm build ./build/axo --phase Pc

The multilayer wall (Phase P) cracked through decomposition: parity-4 = XOR(XOR(b0,b1), XOR(b2,b3)) — two XOR modules (each on one bit pair) plus a spiking combiner, the same "disentangle" lesson as in vision (Phase K). The decomposition (pairing + subgoals) is the supplied inductive bias, like the x/y axes in vision.

| Path | Result |
|---|---|
| module A / B (XOR on 1 pair each) | 4/4 / 4/4 |
| ensemble relay A / B (majority over modules) | [0110] / [0110] (clean) |
| B) combiner over learned modules → quantized relay | 16/16 behavioral, 16/16 separable |
| C) oracle relay [0110] (control) | 16/16 separable (12/16 behav., readout seed) |
| A) combiner over distributed code [codeA\|codeB] | 6/16 (fails) |

The real path solves parity-4 with 16/16 (learned XOR modules → ensemble-decoded relay → spiking FA combiner; deterministically reproduced). Three ingredients are needed — and that is the payoff:

  1. Robust base operation. Decisive was the exact Phase-F XOR config (16 active neurons/bit, w_norm=0.2·N, present=60). With a weak input (8/bit, w_norm=0.5·D) the base XOR is only marginally separable and composition fails (8/16). The wall was never "depth" but the robustness of the base operation. Marginal XOR can't be stacked, solid XOR can.
  2. Quantized inter-area relay. Over the distributed code it keeps failing (A: 6/16) — the signal between the areas must be a quantized decision bit (like labeled-line spikes in the brain), not a raw distributed code. An ensemble (population coding, "many neurons") makes the relay bit reliably clean.
  3. Compositional structure instead of a monolithic layer that is supposed to disentangle the entangled whole.
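
The structure of the working path (modules → majority-voted relay bit → combiner) can be written down in a few lines of Python. The modules here are exact XOR stand-ins rather than learned spiking areas, and one copy per ensemble is made deliberately faulty to show what the vote is for. `xor_module`, `ensemble_relay` and `parity4` are names invented for this sketch.

```python
from itertools import product


def xor_module(a, b):
    """Stand-in for one learned XOR module (here simply exact XOR)."""
    return a ^ b


def ensemble_relay(a, b, copies=5, faulty=1):
    """Majority vote over module copies, quantized to one clean bit.
    The first `faulty` copies answer wrongly to mimic noisy modules."""
    ballots = [xor_module(a, b) ^ (1 if i < faulty else 0) for i in range(copies)]
    return int(2 * sum(ballots) > copies)


def parity4(bits):
    """parity-4 = XOR(XOR(b0, b1), XOR(b2, b3)) via two relays and a combiner."""
    relay_a = ensemble_relay(bits[0], bits[1])
    relay_b = ensemble_relay(bits[2], bits[3])
    return xor_module(relay_a, relay_b)


relay_pattern = [ensemble_relay(a, b) for a, b in product((0, 1), repeat=2)]
solved = sum(parity4(bits) == sum(bits) % 2 for bits in product((0, 1), repeat=4))
print(relay_pattern)    # [0, 1, 1, 0]
print(f"{solved}/16")   # 16/16
```

The combiner only ever sees two clean bits, never the raw outputs of the individual copies, which mirrors the quantized-relay ingredient in the list above.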

How this finding came about (an honest research arc): an initial run reported "13/16 cracks it" — an artifact (the online readout collapsed to a constant relay [1111], the offline probe overfit noise). The correction then yielded an apparently decisive negative finding (8/16, "XOR fundamentally marginal") — which was, however, config-dependent: with the weak input. Only the robust Phase-F config revealed the true picture: 16/16. Lesson to myself: behavioral + oracle control + high repetition against measurement artifacts — and check negative findings for config dependence before calling them "fundamental".

Phase M: Emergent curiosity via habituation (stage 2) — validated

docker compose run --rm build ./build/axo --phase M

Stage 2: intrinsic motivation that emerges from a neuronal building block instead of a hand-coded counter. No food, no goal, no reward — the novelty is the homeostasis (habituation, theta) in a spiking place-cell area: visited places habituate (their place signal goes weak), and the being is drawn toward the un-habituated (the new). World: two rooms connected by a single door; walls are discovered as a map.

| Step | 300 | 800 | 1500 | 6000 |
|---|---|---|---|---|
| curious | 85 | 85 | 108 | 111 |
| random | 47 | 55 | 56 | 111 |

The curious being finds the door and covers the world much faster than a random wanderer that stays trapped in the starting room (@800: 85 vs 55 of 111 cells). The decisive emergent addition: because habituation slowly decays, the old becomes new again → the being stays curious for life. In the last third of its life it still visits 95 distinct cells (vs 56 for random) — it never "ends" the exploration but keeps its world fresh. A (monotonic) counter couldn't do that. Nicely: the same homeostasis that protects knowledge from forgetting in Phase I drives the curiosity here.

Honestly: the novelty drive is now neuronal/emergent (habituation), and the lifelong behavior follows from the theta decay dynamics. The action selection still explicitly compares the habituation of the neighboring cells (not a pure spike reflex). A first attempt to drive this purely via an R-STDP motor failed instructively: a reward-driven policy converges to a fixed habit and then precisely cannot explore anymore — curiosity needs a never-freezing driving force, and the decaying habituation provides it.
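
The selection rule (step to the least habituated neighbor, with habituation that decays) fits in a short Python sketch. To keep it checkable by hand it uses a ring of cells instead of the two-room world, so it shows the rule, not the door-finding result. `explore` and its constants are assumptions of this sketch.

```python
import random


def explore(n_cells, steps, curious, decay=0.95, seed=0):
    """Walk a ring of cells; return the set of cells visited."""
    rng = random.Random(seed)
    theta = [0.0] * n_cells                 # habituation per place cell
    pos, visited = 0, set()
    for _ in range(steps):
        visited.add(pos)
        theta[pos] += 1.0                   # the visited place habituates
        theta = [t * decay for t in theta]  # habituation slowly wears off
        neighbors = [(pos + 1) % n_cells, (pos - 1) % n_cells]
        if curious:
            pos = min(neighbors, key=lambda c: theta[c])  # drawn to the least habituated
        else:
            pos = rng.choice(neighbors)     # control: random wanderer
    return visited


print(len(explore(12, 12, curious=True)))   # 12: every cell seen after one lap
print(len(explore(12, 12, curious=False)))  # random walk: usually fewer
```

Because the cell just left is always more habituated than the one ahead, the curious walker never turns back on the ring, and the decay keeps that ordering intact on every later lap.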

Phase A: Agent learns from reward (R-STDP) — validated

Associative choice: 4 stimuli, 4 actions, immediate reward. Runs in seconds.

docker compose run --rm build ./build/axo --phase A

Result: the moving hit rate rises from ~25% (random) to ~99% — the agent learns, via reward-modulated plasticity (eligibility traces + global dopamine signal d(t)), to choose the right action per stimulus. Two-area mini-brain: sensory input → motor area (winner-take-all), learning via the reward signal. Next stage: delayed reward (1D food search).
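
Stripped of spikes and timing, the update is "eligibility times dopamine". The Python sketch below keeps only that core. The stimulus-to-action map `CORRECT`, the ±1 dopamine values, the fixed presentation order and the name `train` are all assumptions chosen so the run is deterministic.

```python
N_STIMULI, N_ACTIONS = 4, 4
CORRECT = [2, 0, 3, 1]            # hypothetical stimulus -> rewarded action map


def train(trials=40, lr=0.1):
    """Reward-modulated update: dw = lr * eligibility * dopamine."""
    w = [[0.0] * N_ACTIONS for _ in range(N_STIMULI)]
    for t in range(trials):
        s = t % N_STIMULI                                   # present stimuli in turn
        a = max(range(N_ACTIONS), key=w[s].__getitem__)     # winner-take-all motor area
        # Eligibility: only the synapse whose pre (stimulus) and post (action) fired.
        eligibility = 1.0
        dopamine = 1.0 if a == CORRECT[s] else -1.0         # global scalar reward
        w[s][a] += lr * eligibility * dopamine
    return w


w = train()
greedy = [max(range(N_ACTIONS), key=w[s].__getitem__) for s in range(N_STIMULI)]
print(greedy == CORRECT)  # True
```

A wrong choice is suppressed, so the winner shifts to an untried action on the next presentation; once the rewarded action has been found, it is reinforced and stays the winner.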

Phase B: 1D food search with delayed reward (R-STDP) — validated

docker compose run --rm build ./build/axo --phase B

The agent senses its position on a track (food in the middle), moves left/right and only gets the reward at the food. Via eligibility traces, R-STDP distributes the delayed reward back onto the steps that led there. With learning-rate annealing it is stably ~99% successful, mean steps ~2.3 (optimum ~2.0). First step with temporal credit assignment; next stage: variable food position / 2D.
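
How an eligibility trace spreads one late reward over the earlier steps can be shown with plain arithmetic. In this Python sketch the decay of 0.5 per step, the learning rate and the name `credit_from_delayed_reward` are illustrative assumptions.

```python
def credit_from_delayed_reward(path, reward, lr=0.1, trace_decay=0.5):
    """Return the weight change for each (position, action) step of one episode."""
    trace = {}
    for step in path:
        # Older steps fade; the step just taken is tagged at full strength.
        trace = {k: v * trace_decay for k, v in trace.items()}
        trace[step] = trace.get(step, 0.0) + 1.0
    # The reward arrives only now and is shared out in proportion to the traces.
    return {step: lr * reward * tr for step, tr in trace.items()}


episode = [(0, "right"), (1, "right"), (2, "right")]   # food reached after 3 moves
print(credit_from_delayed_reward(episode, reward=1.0))
# {(0, 'right'): 0.025, (1, 'right'): 0.05, (2, 'right'): 0.1}
```

The last move before the food gets the most credit, and each earlier move gets half as much as the one after it.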

Phase D: variable food position — generalization (R-STDP) — validated

docker compose run --rm build ./build/axo --phase D

The food changes every episode. The agent perceives position AND food via two place-cell populations and learns the relational rule "move toward the food". Greedy eval over ALL (position×food) pairs: 42/42 solved (100% generalization) — the brain learns a rule, not just a mapping.

Finding: episode-wide sparse reward fails here (a random policy reaches the food ~50% of the time anyway → no learning gradient). Only reward shaping (immediate reward per step: closer to the food = +1) gives a clean signal. A classic RL principle, made visible.

Phase C: Self-organization (synthetic patterns) — validated

No external dataset needed, runs in seconds.

  1. Run: docker compose run --rm build ./build/axo --phase C
  2. Heatmap: docker compose run --rm viz bash -c "pip install -q -r viz/requirements.txt && python viz/plot_selectivity.py selectivity_phaseC.bin viz/out/selectivity.png"

Result: the network organizes its neurons by itself so that each of the 6 patterns is represented by its own selective neurons — patterns_covered = 6/6, mean_selectivity ≈ 0.98. This is the validated proof that the brain forms structure from a raw spike stream unsupervised. These stabilized learning parameters (lateral inhibition + weight normalization) carry over into Phase A (agentic, R-STDP).

Phase 3: unsupervised MNIST learning (infrastructure, tuning open)

  1. Place MNIST in data/ (train-images-idx3-ubyte, train-labels-idx1-ubyte).
  2. Training: docker compose run --rm build ./build/axo --phase 3
  3. Fields: docker compose run --rm viz bash -c "pip install -q -r viz/requirements.txt && python viz/plot_receptive_fields.py weights_phase3.bin viz/out/receptive_fields.png"
  4. Accuracy: docker compose run --rm build ./build/axo --phase 3-eval

Status: the pipeline runs fully, but digit selectivity still needs hyperparameter tuning (cf. Phase C, where the learning dynamics were cleanly stabilized on controlled patterns). Currently classification accuracy is at random level — deliberately not tuned further, since the focus is on the agentic direction (Phase A).

License

MIT — see LICENSE.
