Experimenting with neural network architectures to build competitive Super Smash Bros. Melee bots in Elixir.
ExPhil is a platform for training and evaluating AI agents that play Melee via Slippi. It supports 30+ backbone architectures — from MLPs to Mamba SSMs to Transformers — all implemented in Elixir using Nx/Axon. The project uses imitation learning from human replays, with reinforcement learning refinement planned.
The long-term goal is competitive bots for Melee's lower-tier characters (Mewtwo, Ganondorf, Link, Zelda, Ice Climbers, Game & Watch), but high-tier replay data is more abundant, so that's where architecture iteration happens first.
Most game AI research uses Python/PyTorch. ExPhil uses Elixir deliberately:
- BEAM concurrency for parallel self-play (multiple game instances, Elo matchmaking, population training)
- Nx/Axon + EXLA gives GPU acceleration via XLA without Python overhead
- Fault tolerance — OTP supervisors keep training alive through transient failures
- Edifice — our companion library with 92+ ML architecture implementations in pure Elixir
The core question: which neural network architecture best captures Melee's temporal dynamics at 60 FPS?
| Backbone | Type | Inference | 60 FPS | Val Loss | Notes |
|---|---|---|---|---|---|
| MLP | Single-frame | 9 ms | Yes | 3.11 | No temporal context |
| LSTM | Recurrent | 229 ms | No | 2.95 | Best accuracy, too slow |
| GRU | Recurrent | ~150 ms | No | ~3.0 | Faster LSTM variant |
| Attention | Transformer | 17 ms | Borderline | 3.07 | Parallelizable |
| Mamba | State Space | 8.9 ms | Yes | 3.00 | Best speed/accuracy tradeoff |
| Jamba | Hybrid SSM | ~20 ms | Needs opt | ~3.0 | Mamba + Attention layers |
3-epoch benchmark, RTX 4090, batch_size=256, 50 replays
All available for experimentation: Griffin, xLSTM, RetNet, RWKV, GLA, HGRN, KAN, S4, S4D, S5, H3, Liquid, Decision Transformer, Zamba, Mamba-2, GatedSSM, Performer, DeltaNet, FNet, Perceiver, TTT, Hopfield, NTM, Reservoir, SNN, Bayesian — plus Diffusion, ACT, and Flow Matching policy types.
See Architecture Guide for beginner-friendly explanations, or Architectures Reference for benchmarks and CLI options.
Slippi Replays (.slp)
|
Peppi Parser (Rust NIF)
|
Frame Extraction + Learned Embedding (288 dims)
|
Temporal Backbone (Mamba / Attention / LSTM / ...)
|
Autoregressive Policy Head (6 heads)
|
Controller Output → Dolphin
State Embedding: Player positions, velocities, action states, damage, stocks, stage, projectiles. Uses learned embeddings by default (288 dims) — trainable action/character/stage embeddings replace one-hot vectors, enabling 6x larger networks at the same training speed.
Temporal Backbone: Processes 60 frames of game context. Swappable via --backbone flag.
Policy Head: 6-head autoregressive controller output:
- Buttons (8 independent), main stick (17x17), c-stick (17x17), shoulder (5 levels)
- Behavioral Cloning — learn from human replays via cross-entropy loss
- Self-Play RL — PPO refinement against itself (infrastructure complete, scaling next)
git clone https://github.com/blasphemetheus/exphil.git
cd exphil
mix deps.get
# Interactive setup (recommended for beginners)
mix exphil.setup
# Train with presets
mix run scripts/train_from_replays.exs --preset quick --replays ./replays
mix run scripts/train_from_replays.exs --preset production --online-robust
# Try different architectures
mix run scripts/train_from_replays.exs --backbone mamba --temporal --replays ./replays
mix run scripts/train_from_replays.exs --backbone attention --temporal --replays ./replays
mix run scripts/train_from_replays.exs --backbone griffin --temporal --replays ./replays
# Evaluate
mix run scripts/eval_model.exs --checkpoint checkpoints/model.axon
# Play against the AI (requires Dolphin + Slippi)
mix run scripts/play_dolphin_async.exs --policy checkpoints/model_policy.bin| Preset | Use Case | Time (RTX 4090) |
|---|---|---|
quick |
Code testing | ~2 min |
gpu_standard |
Balanced training | ~30 min |
production |
Maximum quality | ~2-3 hours |
mewtwo |
Mewtwo specialist | ~2-3 hours |
ganondorf |
Ganondorf specialist | ~2-3 hours |
| Feature | Description |
|---|---|
| 30+ Backbones | Swap architectures with a single CLI flag |
| Learned Embeddings | 288-dim trainable embeddings (6x more efficient than one-hot) |
| Frame Delay Augmentation | Train with 0-18 frame random delays for online robustness |
| Embedding Caching | 2-3x training speedup, ~100x with augmented caching |
| Model EMA | Exponential moving average for better generalization |
| ONNX Export | INT8 quantization for 0.55ms inference |
| Dolphin Integration | Sync and async runners for live play |
| Self-Play Infrastructure | PPO, population training, Elo matchmaking |
- Language: Elixir on the BEAM VM
- ML: Nx / Axon with EXLA (XLA backend)
- Architectures: Edifice — 92+ architecture implementations in Elixir
- Replay Parsing: Peppi via Rustler NIF
- Game Interface: libmelee (Python, via Port)
- Inference: ONNX Runtime for production deployment
- 2469 tests passing
- Imitation learning pipeline (single-frame + temporal)
- 30+ backbone architectures benchmarked or available
- 4 policy types (Standard, Diffusion, ACT, Flow Matching)
- Dolphin integration for live play
- PPO trainer and self-play infrastructure built
- Training features: EMA, cosine scheduling, augmentation, caching, checkpointing
Next: Large-scale self-play training on GPU clusters, scaling experiments with more data and larger models.
Training data makes or breaks imitation learning. If you play Melee competitively, your replays are valuable — especially for lower-tier characters where data is scarce.
Characters we especially need: Mewtwo, Ganondorf, Link, Zelda, Ice Climbers, Game & Watch
Find your replays at ~/.slippi/replays/ and reach out: dleie on Discord (spikenard)
Builds on:
- slippi-ai — Vladfi's BC+RL framework (TensorFlow)
- Project Nabla — Lessons on policy collapse in fighting game AI
- Phillip — Original deep RL Melee bot
Key lessons applied:
- Two-stage training: BC first, RL second (pure RL struggles with Melee's complexity)
- Autoregressive action heads: Sample buttons, stick, c-stick sequentially
- Population-based training: Maintain diverse agents to avoid policy collapse
- Frame delay training: Essential for online play with 18+ frames of latency
See CONTRIBUTING.md for setup and guidelines. PRs welcome!
MIT License — see LICENSE for details.