ExPhil - Neural Network Architecture Lab for Melee AI

Experimenting with neural network architectures to build competitive Super Smash Bros. Melee bots in Elixir.

ExPhil is a platform for training and evaluating AI agents that play Melee via Slippi. It supports 30+ backbone architectures — from MLPs to Mamba SSMs to Transformers — all implemented in Elixir using Nx/Axon. The project uses imitation learning from human replays, with reinforcement learning refinement planned.

The long-term goal is competitive bots for Melee's lower-tier characters (Mewtwo, Ganondorf, Link, Zelda, Ice Climbers, Game & Watch), but high-tier replay data is more abundant, so that's where architecture iteration happens first.

Why Elixir?

Most game AI research uses Python/PyTorch. ExPhil uses Elixir deliberately:

BEAM concurrency for parallel self-play (multiple game instances, Elo matchmaking, population training)
Nx/Axon + EXLA gives GPU acceleration via XLA without Python overhead
Fault tolerance — OTP supervisors keep training alive through transient failures
Edifice — our companion library with 92+ ML architecture implementations in pure Elixir

Architecture Experiments

The core question: which neural network architecture best captures Melee's temporal dynamics at 60 FPS?

Benchmarked Backbones

Backbone	Type	Inference	60 FPS	Val Loss	Notes
MLP	Single-frame	9 ms	Yes	3.11	No temporal context
LSTM	Recurrent	229 ms	No	2.95	Best accuracy, too slow
GRU	Recurrent	~150 ms	No	~3.0	Faster LSTM variant
Attention	Transformer	17 ms	Borderline	3.07	Parallelizable
Mamba	State Space	8.9 ms	Yes	3.00	Best speed/accuracy tradeoff
Jamba	Hybrid SSM	~20 ms	Needs opt	~3.0	Mamba + Attention layers

3-epoch benchmark, RTX 4090, batch_size=256, 50 replays

Additional Architectures

All available for experimentation: Griffin, xLSTM, RetNet, RWKV, GLA, HGRN, KAN, S4, S4D, S5, H3, Liquid, Decision Transformer, Zamba, Mamba-2, GatedSSM, Performer, DeltaNet, FNet, Perceiver, TTT, Hopfield, NTM, Reservoir, SNN, Bayesian — plus Diffusion, ACT, and Flow Matching policy types.

See Architecture Guide for beginner-friendly explanations, or Architectures Reference for benchmarks and CLI options.

How It Works

Slippi Replays (.slp)
        |
    Peppi Parser (Rust NIF)
        |
    Frame Extraction + Learned Embedding (288 dims)
        |
    Temporal Backbone (Mamba / Attention / LSTM / ...)
        |
    Autoregressive Policy Head (6 heads)
        |
    Controller Output → Dolphin

State Embedding: Player positions, velocities, action states, damage, stocks, stage, projectiles. Uses learned embeddings by default (288 dims) — trainable action/character/stage embeddings replace one-hot vectors, enabling 6x larger networks at the same training speed.

Temporal Backbone: Processes 60 frames of game context. Swappable via --backbone flag.

Policy Head: 6-head autoregressive controller output:

Buttons (8 independent), main stick (17x17), c-stick (17x17), shoulder (5 levels)

Training Pipeline

Behavioral Cloning — learn from human replays via cross-entropy loss
Self-Play RL — PPO refinement against itself (infrastructure complete, scaling next)

Quick Start

git clone https://github.com/blasphemetheus/exphil.git
cd exphil
mix deps.get

# Interactive setup (recommended for beginners)
mix exphil.setup

# Train with presets
mix run scripts/train_from_replays.exs --preset quick --replays ./replays
mix run scripts/train_from_replays.exs --preset production --online-robust

# Try different architectures
mix run scripts/train_from_replays.exs --backbone mamba --temporal --replays ./replays
mix run scripts/train_from_replays.exs --backbone attention --temporal --replays ./replays
mix run scripts/train_from_replays.exs --backbone griffin --temporal --replays ./replays

# Evaluate
mix run scripts/eval_model.exs --checkpoint checkpoints/model.axon

# Play against the AI (requires Dolphin + Slippi)
mix run scripts/play_dolphin_async.exs --policy checkpoints/model_policy.bin

Training Presets

Preset	Use Case	Time (RTX 4090)
`quick`	Code testing	~2 min
`gpu_standard`	Balanced training	~30 min
`production`	Maximum quality	~2-3 hours
`mewtwo`	Mewtwo specialist	~2-3 hours
`ganondorf`	Ganondorf specialist	~2-3 hours

Key Features

Feature	Description
30+ Backbones	Swap architectures with a single CLI flag
Learned Embeddings	288-dim trainable embeddings (6x more efficient than one-hot)
Frame Delay Augmentation	Train with 0-18 frame random delays for online robustness
Embedding Caching	2-3x training speedup, ~100x with augmented caching
Model EMA	Exponential moving average for better generalization
ONNX Export	INT8 quantization for 0.55ms inference
Dolphin Integration	Sync and async runners for live play
Self-Play Infrastructure	PPO, population training, Elo matchmaking

Tech Stack

Language: Elixir on the BEAM VM
ML: Nx / Axon with EXLA (XLA backend)
Architectures: Edifice — 92+ architecture implementations in Elixir
Replay Parsing: Peppi via Rustler NIF
Game Interface: libmelee (Python, via Port)
Inference: ONNX Runtime for production deployment

Current Status

2469 tests passing
Imitation learning pipeline (single-frame + temporal)
30+ backbone architectures benchmarked or available
4 policy types (Standard, Diffusion, ACT, Flow Matching)
Dolphin integration for live play
PPO trainer and self-play infrastructure built
Training features: EMA, cosine scheduling, augmentation, caching, checkpointing

Next: Large-scale self-play training on GPU clusters, scaling experiments with more data and larger models.

We Want Your Replays

Training data makes or breaks imitation learning. If you play Melee competitively, your replays are valuable — especially for lower-tier characters where data is scarce.

Characters we especially need: Mewtwo, Ganondorf, Link, Zelda, Ice Climbers, Game & Watch

Find your replays at ~/.slippi/replays/ and reach out: dleie on Discord (spikenard)

Research References

Builds on:

slippi-ai — Vladfi's BC+RL framework (TensorFlow)
Project Nabla — Lessons on policy collapse in fighting game AI
Phillip — Original deep RL Melee bot

Key lessons applied:

Two-stage training: BC first, RL second (pure RL struggles with Melee's complexity)
Autoregressive action heads: Sample buttons, stick, c-stick sequentially
Population-based training: Maintain diverse agents to avoid policy collapse
Frame delay training: Essential for online play with 18+ frames of latency

Contributing

See CONTRIBUTING.md for setup and guidelines. PRs welcome!

License

MIT License — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 766 Commits
.github/workflows		.github/workflows
config		config
docs		docs
lib		lib
native		native
notebooks		notebooks
priv		priv
scripts		scripts
test		test
.credo.exs		.credo.exs
.dockerignore		.dockerignore
.formatter.exs		.formatter.exs
.gitignore		.gitignore
.muzak.exs		.muzak.exs
.tool-versions		.tool-versions
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile.cpu		Dockerfile.cpu
Dockerfile.filter		Dockerfile.filter
Dockerfile.gpu		Dockerfile.gpu
Dockerfile.play		Dockerfile.play
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
mix.exs		mix.exs
mix.lock		mix.lock
shell.nix		shell.nix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ExPhil - Neural Network Architecture Lab for Melee AI

Why Elixir?

Architecture Experiments

Benchmarked Backbones

Additional Architectures

How It Works

Training Pipeline

Quick Start

Training Presets

Key Features

Tech Stack

Current Status

We Want Your Replays

Research References

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ExPhil - Neural Network Architecture Lab for Melee AI

Why Elixir?

Architecture Experiments

Benchmarked Backbones

Additional Architectures

How It Works

Training Pipeline

Quick Start

Training Presets

Key Features

Tech Stack

Current Status

We Want Your Replays

Research References

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages