This repository implements Reinforcement Learning (RL) techniques from scratch for the game of Connect 4. It serves as the companion repo to the blog post series about Connect-Zero. Currently it implements:
- basic REINFORCE -- blog post
- REINFORCE with baseline -- blog posts on theory and implementation
- A2C (Actor-Critic with one-step TD value bootstrapping) -- blog posts on theory, implementation, and evaluation
- PPO -- blog posts on theory, implementation forthcoming
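
The methods listed above differ mainly in how they turn an episode's rewards into a learning signal. The following is a minimal sketch of that difference in plain Python, not code taken from this repository: the function names (`discounted_returns`, `baseline_advantages`, `td_advantages`) are hypothetical, the value estimates are assumed to come from some critic, and the episode is viewed from a single player's perspective, ignoring the sign flips that self-play in a two-player game may need.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, as used by basic REINFORCE."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def baseline_advantages(returns: List[float], values: List[float]) -> List[float]:
    """REINFORCE with baseline: A_t = G_t - V(s_t)."""
    return [g - v for g, v in zip(returns, values)]


def td_advantages(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    """One-step TD bootstrapping: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    The value after the final step is taken to be 0 (terminal state).
    """
    advantages = []
    for t, reward in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        advantages.append(reward + gamma * next_value - values[t])
    return advantages


if __name__ == "__main__":
    # A three-move episode that ends in a win (reward 1 on the last move).
    rewards = [0.0, 0.0, 1.0]
    values = [0.25, 0.5, 0.5]  # hypothetical critic estimates V(s_t)
    returns = discounted_returns(rewards, gamma=0.5)
    print(returns)
    print(baseline_advantages(returns, values))
    print(td_advantages(rewards, values, gamma=0.5))
```

In each case the resulting signal would weight the log-probability of the move taken at step t. The Monte Carlo variants wait for the end of the game, while the TD variant relies on the critic's estimate of the next state.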
It also contains utility scripts for having models play single games or tournaments against each other, performing pretraining, evaluating performance on tactical puzzles, and exporting models to ONNX format.
The scripts require torch, matplotlib, numpy, and click.
To run the examples, navigate to the train/ subdirectory and execute e.g.
$ python example3-rwb.py
The webapp/ subdirectory contains a JavaScript applet for interactively playing against an ONNX-exported model.
If you want to play against a live version, the strongest public version is currently hosted in the applet in the RwB post.