This repository contains my implementation of Transformers from scratch, inspired by Andrej Karpathy's video. While the original implementation focused on a decoder-only model, I have extended it by adding an encoder as well. The goal of this project is to deeply understand the inner workings of Transformers by building them step by step without relying on high-level libraries like Hugging Face Transformers.
- Implements a full Transformer model (Encoder-Decoder architecture)
- Single-file implementation (gpt.py) for simplicity
- Includes essential components:
  - Token Embeddings
  - Positional Encodings
  - Multi-Head Self-Attention
  - Feedforward Layers
  - Layer Normalization
  - Encoder and Decoder Blocks
- Trained on sample text data to demonstrate functionality
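The first two components listed above, token embeddings and positional encodings, can be sketched in a few lines. This is a NumPy illustration, not code from gpt.py. It assumes the fixed sine/cosine encodings from the original Transformer paper, whereas gpt.py may instead use a learned position-embedding table (a second lookup indexed by position). The vocabulary size, model width, and token ids are arbitrary toy values.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sine/cosine position encodings; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]    # (1, d_model // 2): the even indices 2i
    angles = positions / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)  # PE(pos, 2i + 1)
    return pe

rng = np.random.default_rng(0)
vocab_size, d_model = 50, 16  # toy sizes, not gpt.py's settings
# One (normally learnable) row per token id; random here for illustration.
token_embedding = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([3, 14, 15, 9, 2])
# An embedding lookup is just row indexing; position information is added element-wise.
x = token_embedding[token_ids] + sinusoidal_positional_encoding(len(token_ids), d_model)
print(x.shape)  # (5, 16)
```

Adding (rather than concatenating) the position signal keeps the model width unchanged, so every later layer sees vectors of size d_model.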
To run the implementation, clone this repository and install the required dependencies:
```bash
git clone https://github.com/ahmetz3lka/transformers_from_scratch.git
cd transformers_from_scratch
pip install -r requirements.txt
```

Run the script with:

```bash
python gpt.py
```

Modify gpt.py to experiment with different model hyperparameters.
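One way to organize such experiments is sketched below: hyperparameters grouped in a frozen dataclass, with variants derived via `dataclasses.replace`. The class name `HyperParams`, its field names, and its defaults are hypothetical and not taken from gpt.py. The one real constraint it encodes is that multi-head attention splits the embedding width evenly across heads.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class HyperParams:
    # Hypothetical names and defaults; gpt.py's actual variables may differ.
    batch_size: int = 32
    block_size: int = 64      # maximum context length in tokens
    n_embd: int = 128         # model width (d_model)
    n_head: int = 4
    n_layer: int = 4          # blocks per encoder/decoder stack
    dropout: float = 0.1
    learning_rate: float = 3e-4

    def __post_init__(self):
        # Each head gets n_embd // n_head dimensions, so the split must be exact.
        if self.n_embd % self.n_head != 0:
            raise ValueError(f"n_embd={self.n_embd} is not divisible by n_head={self.n_head}")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")

    @property
    def head_size(self) -> int:
        return self.n_embd // self.n_head

base = HyperParams()
wider = replace(base, n_embd=256, n_head=8)  # one experiment variant; validation reruns
print(base.head_size, wider.head_size)  # 32 32
```

Because `replace` builds a new instance through `__init__`, an invalid combination such as `n_head=3` with `n_embd=128` fails immediately instead of deep inside the attention code.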
The entire Transformer model is implemented within a single file, gpt.py, to keep things simple and easy to follow. The key sections include:
- Embedding Layer: Converts tokens into dense vector representations.
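- Self-Attention Mechanism: Lets each token attend to other tokens in the sequence, capturing the relationships between them.
- Feedforward Network: Applies a small MLP to each position independently, adding non-linearity and depth.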
- Encoder-Decoder Architecture: Implements both parts of the Transformer model.
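The attention, feedforward, and encoder-decoder pieces can be illustrated with a compact NumPy sketch. It is not the code in gpt.py: it assumes pre-norm residual blocks (layer normalization applied before each sub-layer), omits biases, dropout, learnable normalization parameters, and layer stacking, and uses random weights. The helper names (`multi_head_attention`, `encoder_block`, `decoder_block`, `make_params`) are hypothetical. The decoder applies a causal mask in its self-attention and then cross-attends to the encoder output.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q_in, kv_in, w_q, w_k, w_v, w_o, n_head, causal=False):
    """Scaled dot-product attention over n_head heads.

    q_in:  (T_q, d_model), source of the queries.
    kv_in: (T_k, d_model), source of keys and values (same as q_in for self-attention).
    Returns the (T_q, d_model) output and the (n_head, T_q, T_k) attention weights.
    """
    T_q, d_model = q_in.shape
    T_k = kv_in.shape[0]
    hs = d_model // n_head  # per-head size
    # Project, then split the last dimension into heads: (n_head, T, hs).
    q = (q_in @ w_q).reshape(T_q, n_head, hs).transpose(1, 0, 2)
    k = (kv_in @ w_k).reshape(T_k, n_head, hs).transpose(1, 0, 2)
    v = (kv_in @ w_v).reshape(T_k, n_head, hs).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hs)  # (n_head, T_q, T_k)
    if causal:
        # Position i may only attend to positions <= i.
        future = np.triu(np.ones((T_q, T_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    # Concatenate the heads back to (T_q, d_model), then apply the output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(T_q, d_model)
    return out @ w_o, weights

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector; learnable scale/shift omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, w1, w2):
    # Position-wise MLP: expand, ReLU, project back (biases omitted).
    return np.maximum(0.0, x @ w1) @ w2

def make_params(rng, d_model, d_ff):
    def w(*shape):
        return rng.normal(scale=0.02, size=shape)
    return {
        "self": [w(d_model, d_model) for _ in range(4)],   # W_q, W_k, W_v, W_o
        "cross": [w(d_model, d_model) for _ in range(4)],  # used only by the decoder
        "ffn": [w(d_model, d_ff), w(d_ff, d_model)],
    }

def encoder_block(x, p, n_head):
    n = layer_norm(x)
    h, _ = multi_head_attention(n, n, *p["self"], n_head)  # bidirectional self-attention
    x = x + h
    return x + feed_forward(layer_norm(x), *p["ffn"])

def decoder_block(y, enc_out, p, n_head):
    n = layer_norm(y)
    h, _ = multi_head_attention(n, n, *p["self"], n_head, causal=True)  # masked self-attention
    y = y + h
    h, _ = multi_head_attention(layer_norm(y), enc_out, *p["cross"], n_head)  # cross-attention
    y = y + h
    return y + feed_forward(layer_norm(y), *p["ffn"])

rng = np.random.default_rng(0)
d_model, d_ff, n_head = 16, 64, 4
src = rng.normal(size=(6, d_model))  # stand-in for embedded source tokens
tgt = rng.normal(size=(5, d_model))  # stand-in for embedded target tokens
enc_out = encoder_block(src, make_params(rng, d_model, d_ff), n_head)
dec_out = decoder_block(tgt, enc_out, make_params(rng, d_model, d_ff), n_head)
print(enc_out.shape, dec_out.shape)  # (6, 16) (5, 16)
```

Pre-norm is one common choice; the original Transformer paper instead applied layer normalization after each residual addition. Note that the cross-attention weights have shape (n_head, target length, source length), which is how each decoder position reads from the whole encoded source.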
Possible next steps:
- Fine-tune on larger datasets
- Experiment with different tokenization techniques
- Optimize performance and efficiency
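As a reference point for experimenting with tokenization, the simplest baseline is a character-level tokenizer, sketched below. Whether gpt.py uses one is not stated here, so treat this as an assumed starting point; `build_char_tokenizer` is a hypothetical helper. Subword schemes such as byte-pair encoding (BPE) trade a larger vocabulary for shorter token sequences.

```python
def build_char_tokenizer(text: str):
    """Character-level tokenizer: every distinct character gets an integer id."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # string -> id
    itos = {i: ch for ch, i in stoi.items()}      # id -> string

    def encode(s: str) -> list[int]:
        return [stoi[c] for c in s]

    def decode(ids: list[int]) -> str:
        return "".join(itos[i] for i in ids)

    return encode, decode, len(chars)

encode, decode, vocab_size = build_char_tokenizer("hello world")
ids = encode("hello")
print(vocab_size, ids, decode(ids))  # 8 [3, 2, 4, 4, 5] hello
```

Any replacement tokenizer only needs to provide the same `encode`/`decode` round trip and a vocabulary size for the embedding table.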
This project is open-source under the MIT License.