
YapFormer : A Transformer Implementation with Modern Optimizations ✧ദ്ദി(˵ •̀ ᴗ - ˵ ) ✧

Demo video: yapformer_demo.mp4

YapFormer is a transformer model built entirely from scratch, featuring modern architectural components and efficient training optimizations.
The final model contains ~56 million parameters and was trained for 15,000 steps (~4.5 hours) on the TinyStories dataset.

Despite the small size and short training time, YapFormer produces surprisingly high-quality short stories, demonstrating that well-designed architectures can go a long way even with limited compute.


What is YapFormer? ૮ ◕ ﻌ ◕ა

YapFormer is a from-scratch GPT-style autoregressive transformer that integrates many techniques used in contemporary LLMs:

  • Rotary Embeddings (RoPE)

  • Grouped Query Attention (GQA)

  • KV caching for fast inference

  • RMSNorm

  • SwiGLU feed-forward layers

  • Mixed precision training

  • Gradient accumulation

  • Cosine decay learning rate

  • Gradient clipping

This project serves as both a learning exercise and a practical lightweight generative model.


How It Works ૮₍ • ᴥ • ₎ა

1. Input & Embeddings

  • Input text is tokenized with a custom tokenizer and mapped to learned token embeddings.

  • RoPE is applied to the attention queries/keys instead of absolute positional embeddings (see the sketch below).
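
Below is a minimal sketch of how rotary embeddings can be applied to query/key tensors. The helper names (`rope_angles`, `apply_rope`) and the interleaved channel-pair layout are illustrative assumptions, not necessarily how this repository implements it.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0):
    # Hypothetical helper: position-dependent rotation angles for each channel pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (batch, n_heads, seq_len, head_dim). Rotate each (even, odd) channel pair
    # by an angle that depends on the token's position, encoding relative offsets.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)
```

Both queries and keys would pass through `apply_rope` before attention scores are computed; values are left unrotated.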

2. Attention (with GQA + KV Cache)

  • Grouped Query Attention (GQA):
    Multiple query heads share a smaller number of key/value heads → faster and more memory-efficient.

  • KV Caching:
    During inference, previously computed keys/values are cached, so each new token only needs its own query/key/value computed while attending over the full cached context (see the sketch below).
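
The following is a rough sketch, under assumptions, of grouped-query attention with a simple key/value cache. The class name, the `(k, v)` tuple cache layout, and the use of `F.scaled_dot_product_attention` are illustrative choices that may differ from the actual code; RoPE application to the queries/keys is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    # Hypothetical module: n_q_heads query heads share n_kv_heads key/value heads.
    def __init__(self, dim: int, n_q_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.head_dim = dim // n_q_heads
        self.wq = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, dim, bias=False)

    def forward(self, x, kv_cache=None):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(B, T, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(B, T, self.n_kv, self.head_dim).transpose(1, 2)
        if kv_cache is not None:
            # Reuse cached keys/values; only the new token's K/V are computed above.
            k = torch.cat([kv_cache[0], k], dim=2)
            v = torch.cat([kv_cache[1], v], dim=2)
        new_cache = (k, v)
        # Each group of query heads shares one K/V head: expand K/V to match.
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        # A causal mask is only needed when a whole prompt is processed at once;
        # a single decoded token may attend to everything in the cache.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.wo(out), new_cache
```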

3. Transformer Blocks

Each block contains (see the block sketch after this list):

  • RMSNorm

  • Multi-Head Attention (with RoPE, GQA, KV cache)

  • SwiGLU feed-forward network

  • Residual connections
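
A minimal sketch of one such block, writing out RMSNorm and SwiGLU explicitly and reusing the `GroupedQueryAttention` sketch above; the pre-norm layout and the constructor arguments are assumptions rather than the repository's exact structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Rescale by the root-mean-square of the features; no mean subtraction, no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    # Gated feed-forward: silu(x W1) * (x W3), projected back down by W2.
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class TransformerBlock(nn.Module):
    # Pre-norm block: x + attn(norm(x)), then x + ffn(norm(x)).
    def __init__(self, dim: int, n_q_heads: int, n_kv_heads: int, ffn_hidden: int):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = GroupedQueryAttention(dim, n_q_heads, n_kv_heads)  # sketch above
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, ffn_hidden)

    def forward(self, x, kv_cache=None):
        attn_out, new_cache = self.attn(self.attn_norm(x), kv_cache)
        x = x + attn_out                      # residual around attention
        x = x + self.ffn(self.ffn_norm(x))    # residual around the feed-forward
        return x, new_cache
```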

4. Output Projection

  • Final RMSNorm

  • Linear layer → logits → softmax for next-token prediction (see the sampling sketch below)
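
As an illustration of that last step, here is a small temperature-sampling helper; the name `sample_next_token` and the default temperature are made up for this sketch.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_next_token(logits: torch.Tensor, temperature: float = 0.8) -> torch.Tensor:
    # logits: (batch, vocab_size) taken from the last position of the sequence.
    if temperature == 0.0:
        return logits.argmax(dim=-1, keepdim=True)      # greedy decoding
    probs = F.softmax(logits / temperature, dim=-1)     # softmax over the vocabulary
    return torch.multinomial(probs, num_samples=1)      # one sampled token id per sequence
```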

5. Training Loop

The training loop combines several modern GPU-friendly techniques (a combined sketch follows this list):

  • AMP mixed precision for speed + memory efficiency

  • Gradient accumulation to simulate large batch sizes

  • Cosine LR decay for smooth convergence

  • Gradient clipping to prevent instability
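
A sketch of how these pieces can fit together in a single loop. `model`, `train_loader`, and every hyperparameter below are placeholders, not values taken from this repository.

```python
import torch

# Placeholders: `model` and `train_loader` are assumed to be defined elsewhere.
device = "cuda"
accum_steps, max_steps, clip_norm = 8, 15_000, 1.0      # illustrative values

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps)
scaler = torch.cuda.amp.GradScaler()                     # AMP loss scaling

step = 0
for batch_idx, (inputs, targets) in enumerate(train_loader):
    inputs, targets = inputs.to(device), targets.to(device)
    with torch.autocast(device_type="cuda", dtype=torch.float16):   # mixed precision forward
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1)
        )
    scaler.scale(loss / accum_steps).backward()          # accumulate scaled gradients
    if (batch_idx + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)                       # unscale before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        scheduler.step()                                 # cosine learning-rate decay
        step += 1
        if step >= max_steps:
            break
```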


Architecture ૮ - ﻌ • ა

Model Structure (Decoder-Only Transformer)

Token Embedding
        ↓
N × Transformer Blocks
 ├─ RMSNorm
 ├─ Grouped Query Attention (RoPE + GQA + KV Cache)
 ├─ Residual Connection
 ├─ RMSNorm
 ├─ SwiGLU Feed-Forward
 └─ Residual Connection
        ↓
Final RMSNorm
        ↓
Linear Language Modeling Head
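
Read as code, the diagram amounts to the following top-level skeleton, reusing the `TransformerBlock` and `RMSNorm` sketches from earlier; the vocabulary size, width, depth, and head counts are placeholders rather than the trained model's actual configuration.

```python
import torch
import torch.nn as nn

class YapFormerSketch(nn.Module):
    # Illustrative decoder-only stack matching the diagram; sizes are placeholders.
    def __init__(self, vocab_size=8192, dim=512, n_layers=8, n_q_heads=8, n_kv_heads=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.blocks = nn.ModuleList(
            [TransformerBlock(dim, n_q_heads, n_kv_heads, ffn_hidden=4 * dim)
             for _ in range(n_layers)]
        )
        self.final_norm = RMSNorm(dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.tok_emb(token_ids)          # token embedding
        for block in self.blocks:            # N × transformer blocks (RoPE inside attention)
            x, _ = block(x)
        x = self.final_norm(x)               # final RMSNorm
        return self.lm_head(x)               # logits over the vocabulary
```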

Technology Stack ૮ฅ・ﻌ・აฅ

  • Language: Python

  • Framework: PyTorch

  • Built With:

    • Custom attention mechanisms

    • Custom embeddings

    • Custom RMSNorm + SwiGLU layers

    • Mixed precision training tools

  • Ecosystem Tools:

    • 🤗 Hugging Face (datasets/tokenization)

    • PyTorch (core autograd & tensor ops)


How to Run ૮⎚ﻌ⎚ა

1. Clone the Repository

git clone https://github.com/Aravind-808/YapFormer
cd YapFormer

2. Install Dependencies

pip install -r requirements.txt

3. Generate Text

python inference.py

4. Enter Your Prompt

Prompt: Once upon a time

5. Example Output

Once upon a time there was a tiny mouse who loved reading stories...
