GitHub - kanishkez/Paper-implementations: Implementation of papers I read and stuff

In this repo I try to implement the architectures behind modern LLMs.

Done so far:

Transformer
KV Caching
Flash Attention (just a rough version of the algorithm, no CUDA code)
Rotary Positional Embeddings
Minimal GPT implementation (just the model, without training or inference code)
Mixture of Experts module
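
To give a feel for one of the items above, here is a minimal sketch of rotary positional embeddings in plain Python. It is not the code from this repo, only an illustration under a few assumptions: the function names `rope` and `dot` are made up for this example, pairs of adjacent dimensions are rotated (the interleaved convention; some implementations split the vector in halves instead), and the frequency base is the commonly used 10000. The point it shows is that after rotation, the query-key dot product depends only on the relative offset between positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d).

    Hypothetical helper for illustration; assumes an even dimension d.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        # i steps by 2, so base**(-i/d) equals base**(-2j/d) for pair index j = i/2.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 2.0]
k = [1.5, 0.4, -0.6, 0.9]

# Both pairs of positions have the same relative offset (3),
# so the attention scores should match up to floating-point error.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
```

Because each pair is only rotated, the vector norm is unchanged too, so the embedding adds position information without rescaling the query or key.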

Also check out my blog where I sometimes explain these concepts: https://kanishkez.github.io/#blogs

Files:

GPT2.py
README.md
RoPE.py
flashattention.py
kvcache.py
moe.py
transformer.py
