In this repo, I implement the architectures behind modern LLMs.
Implemented so far:
- Transformer
- KV Caching
- Flash Attention (a rough version of the algorithm only; no CUDA code)
- Rotary Positional Embeddings
- Minimal GPT implementation (model definition only; no training or inference code)
- Mixture of Experts (MoE) module
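The core idea behind Flash Attention, independent of any GPU kernel, is to walk over the keys and values in blocks while keeping only a running maximum, a running softmax denominator, and an unnormalized output accumulator ("online softmax"). This means the full attention score matrix never has to be stored. The snippet below is a minimal pure-Python sketch of that idea for a single query vector, checked against ordinary softmax attention. It is an illustration under those simplifying assumptions, not the code in this repo, and the names `naive_attention` and `tiled_attention` are hypothetical.

```python
import math

# Illustrative sketch only (not this repo's implementation): one query vector,
# plain Python lists, no masking, no batching, no GPU tiling.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_attention(q, K, V):
    """Standard softmax attention: materializes every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in K]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(V[0])
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(dim_v)]

def tiled_attention(q, K, V, block_size=2):
    """Flash-Attention-style pass over K/V in blocks using an online softmax.

    Keeps a running max (m), running denominator (l) and an unnormalized
    output (acc); earlier partial results are rescaled whenever m grows.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim_v = len(V[0])
    m = -math.inf
    l = 0.0
    acc = [0.0] * dim_v
    for start in range(0, len(K), block_size):
        K_blk = K[start:start + block_size]
        V_blk = V[start:start + block_size]
        scores = [dot(q, k) * scale for k in K_blk]
        m_new = max(m, max(scores))
        # Rescale previous partial sums to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, V_blk):
            w = math.exp(s - m_new)
            l += w
            for j in range(dim_v):
                acc[j] += w * v[j]
        m = m_new
    # Normalize once at the end.
    return [a / l for a in acc]

q = [1.0, 0.5]
K = [[0.2, 0.1], [1.0, -1.0], [0.5, 0.5], [-0.3, 0.8], [0.9, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.0, 3.0]]
out_naive = naive_attention(q, K, V)
out_tiled = tiled_attention(q, K, V, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(out_naive, out_tiled)))  # True
```

The block size only changes how much of K/V is touched at a time, not the result; the real algorithm applies the same rescaling trick to tiles of queries and keys held in fast on-chip memory.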
Also check out my blog, where I sometimes explain these concepts: https://kanishkez.github.io/#blogs