22 changes: 11 additions & 11 deletions README.md
@@ -56,11 +56,11 @@ Apply what you've learned to real-world machine learning and AI problems.
| 8 | [Unsupervised Learning: Clustering & Dimensionality Reduction](./chapters/chapter-08-unsupervised-learning/) | 8h | ✅ Available |
| 9 | [Deep Learning Fundamentals](./chapters/chapter-09-deep-learning-fundamentals/) | 12h | ✅ Available |
| 10 | [Natural Language Processing Basics](./chapters/chapter-10-natural-language-processing-basics/) | 8–10h | ✅ Available |
| 11 | Large Language Models & Transformers | 10h | 🔄 Coming Soon |
| 12 | Prompt Engineering & In-Context Learning | 6h | 🔄 Coming Soon |
| 13 | Retrieval-Augmented Generation (RAG) | 8h | 🔄 Coming Soon |
| 14 | Fine-tuning & Adaptation Techniques | 8h | 🔄 Coming Soon |
| 15 | MLOps & Model Deployment | 8h | 🔄 Coming Soon |
| 11 | [Large Language Models & Transformers](./chapters/chapter-11-large-language-models-and-transformers/) | 10h | ✅ Available |
| 12 | [Prompt Engineering & In-Context Learning](./chapters/chapter-12-prompt-engineering-and-in-context-learning/) | 6h | ✅ Available |
| 13 | [Retrieval-Augmented Generation (RAG)](./chapters/chapter-13-retrieval-augmented-generation/) | 8h | ✅ Available |
| 14 | [Fine-tuning & Adaptation Techniques](./chapters/chapter-14-fine-tuning-and-adaptation/) | 8h | ✅ Available |
| 15 | [MLOps & Model Deployment](./chapters/chapter-15-mlops-and-model-deployment/) | 8h | ✅ Available |

### Advanced & Specialization Track (Master Complex Topics)
Dive deep into cutting-edge techniques and specialized domains.
@@ -268,12 +268,12 @@ pie title Curriculum Breakdown
"Community Requested" : 999
```

- **Chapters Available Now**: 9 (76 hours of content)
- **Chapters Available Now**: 15 (116 hours of content) — Foundation + Practitioner tracks complete
- **Total Planned Chapters**: 25+
- **Jupyter Notebooks**: 21 interactive notebooks
- **SVG Diagrams**: 21 professional diagrams
- **Exercises**: 37 problems with solutions
- **Datasets**: 5 practice datasets
- **Jupyter Notebooks**: 45 interactive notebooks
- **SVG/Mermaid Diagrams**: 36 professional diagrams
- **Exercises**: 60+ problems with solutions
- **Datasets**: 30+ practice datasets
- **Community-Requested Chapters**: Growing daily

---
@@ -370,5 +370,5 @@ Every share helps more people learn AI. Thank you! 🙏

**Created by Luigi Pascal Rondanini | Generated by Berta AI**

*Last Updated: March 2026*
*Last Updated: May 2026*
*All chapters maintained and continuously improved based on community feedback.*
24 changes: 12 additions & 12 deletions ROADMAP.md
@@ -8,11 +8,11 @@ Our vision for the future of AI education. This is a living document—prioritie

**Master Repository**: ✅ Live
**Foundation Track**: ✅ Complete (5 chapters available)
**Practitioner Track**: 🔄 In progress (4 of 10 chapters available)
**Practitioner Track**: ✅ Complete (10 of 10 chapters available)
**Advanced Track**: 📋 Planned (10 chapters)
**Community Requests**: 🚀 Starting (unlimited)
**Total Planned**: 25+ chapters, 500+ hours of content
**Currently Available**: 9 chapters, 76 hours of content, 27 SVG diagrams
**Currently Available**: 15 chapters, 116 hours of content, 36 diagrams

---

@@ -21,7 +21,7 @@ Our vision for the future of AI education. This is a living document—prioritie
### Objectives
- ✅ Establish master repository (DONE)
- ✅ Complete Foundation Track (DONE)
- ✅ Begin Practitioner Track (Ch 6-9 available)
- ✅ Complete Practitioner Track (Ch 6-15 available)
- 🔄 Establish community request process
- 🔄 Build first 100 community chapters
- ✅ Create core infrastructure and documentation (DONE)
@@ -37,11 +37,11 @@ Our vision for the future of AI education. This is a living document—prioritie
- One new chapter released per week
- New chapters unlock after reaching **10 newsletter subscribers**
- ✅ Foundation Track complete (Chapters 1-5)
- ✅ Practitioner Track started (Chapters 6-9)
- ✅ Practitioner Track complete (Chapters 6-15)

### Metrics to Track
- Newsletter subscribers (target: 10 to unlock weekly releases)
- Chapters completed: 9 / 25
- Chapters completed: 15 / 25
- Community requests received
- Stars on master repo

@@ -50,7 +50,7 @@ Our vision for the future of AI education. This is a living document—prioritie
## Phase 2: Practitioner Track & Community Scale

### Objectives
- 🔄 Complete Practitioner Track (10 chapters, releasing one per week)
- ✅ Complete Practitioner Track (10 of 10 chapters released)
- 🔄 Scale community chapters to 50+
- 🔄 Establish quality standards and review process
- 🔄 Begin analytics and learner tracking
@@ -61,12 +61,12 @@ Our vision for the future of AI education. This is a living document—prioritie
- [x] Chapter 7: Supervised Learning (Regression & Classification)
- [x] Chapter 8: Unsupervised Learning
- [x] Chapter 9: Deep Learning Fundamentals
- [ ] Chapter 10: Natural Language Processing Basics
- [ ] Chapter 11: Large Language Models & Transformers
- [ ] Chapter 12: Prompt Engineering
- [ ] Chapter 13: Retrieval-Augmented Generation (RAG)
- [ ] Chapter 14: Fine-tuning & Adaptation
- [ ] Chapter 15: MLOps & Deployment
- [x] Chapter 10: Natural Language Processing Basics
- [x] Chapter 11: Large Language Models & Transformers
- [x] Chapter 12: Prompt Engineering
- [x] Chapter 13: Retrieval-Augmented Generation (RAG)
- [x] Chapter 14: Fine-tuning & Adaptation
- [x] Chapter 15: MLOps & Deployment

### Infrastructure Improvements
- [ ] GitHub Actions for automated testing
38 changes: 19 additions & 19 deletions SYLLABUS.md
@@ -18,12 +18,12 @@ graph TD
CH7["Ch 7: Supervised Learning<br/>10h | Available"]
CH8["Ch 8: Unsupervised Learning<br/>8h | Available"]
CH9["Ch 9: Deep Learning<br/>12h | Available"]
CH10["Ch 10: NLP Basics<br/>10h | Coming Soon"]
CH11["Ch 11: LLMs & Transformers<br/>10h | Coming Soon"]
CH12["Ch 12: Prompt Engineering<br/>6h | Coming Soon"]
CH13["Ch 13: RAG<br/>8h | Coming Soon"]
CH14["Ch 14: Fine-tuning<br/>8h | Coming Soon"]
CH15["Ch 15: MLOps<br/>8h | Coming Soon"]
CH10["Ch 10: NLP Basics<br/>10h | Available"]
CH11["Ch 11: LLMs & Transformers<br/>10h | Available"]
CH12["Ch 12: Prompt Engineering<br/>6h | Available"]
CH13["Ch 13: RAG<br/>8h | Available"]
CH14["Ch 14: Fine-tuning<br/>8h | Available"]
CH15["Ch 15: MLOps<br/>8h | Available"]

CH1 --> CH2
CH1 --> CH3
@@ -58,15 +58,15 @@ graph TD
style CH7 fill:#4caf50,color:#fff
style CH8 fill:#4caf50,color:#fff
style CH9 fill:#4caf50,color:#fff
style CH10 fill:#f3e5f5
style CH11 fill:#f3e5f5
style CH12 fill:#f3e5f5
style CH13 fill:#f3e5f5
style CH14 fill:#f3e5f5
style CH15 fill:#f3e5f5
style CH10 fill:#4caf50,color:#fff
style CH11 fill:#4caf50,color:#fff
style CH12 fill:#4caf50,color:#fff
style CH13 fill:#4caf50,color:#fff
style CH14 fill:#4caf50,color:#fff
style CH15 fill:#4caf50,color:#fff
```

**Legend**: Green = Available | Purple = Practitioner (Coming Soon) | Chapters 1-9 fully available with SVG diagrams
**Legend**: Green = Available | Practitioner Track (Chapters 6–15) is now complete; Advanced Track (Chapters 16+) is planned

---

@@ -83,12 +83,12 @@ graph TD
| 7 | [Supervised Learning](./chapters/chapter-07-supervised-learning/) | Practitioner | 10h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs |
| 8 | [Unsupervised Learning](./chapters/chapter-08-unsupervised-learning/) | Practitioner | 8h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs |
| 9 | [Deep Learning Fundamentals](./chapters/chapter-09-deep-learning-fundamentals/) | Practitioner | 12h | Available | 3 notebooks, scripts, 5 exercises, 3 SVGs |
| 10 | Natural Language Processing | Practitioner | 10h | Planned | - |
| 11 | LLMs & Transformers | Practitioner | 10h | Planned | - |
| 12 | Prompt Engineering | Practitioner | 6h | Planned | - |
| 13 | RAG | Practitioner | 8h | Planned | - |
| 14 | Fine-tuning & Adaptation | Practitioner | 8h | Planned | - |
| 15 | MLOps & Deployment | Practitioner | 8h | Planned | - |
| 10 | [Natural Language Processing](./chapters/chapter-10-natural-language-processing-basics/) | Practitioner | 10h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 11 | [LLMs & Transformers](./chapters/chapter-11-large-language-models-and-transformers/) | Practitioner | 10h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 12 | [Prompt Engineering](./chapters/chapter-12-prompt-engineering-and-in-context-learning/) | Practitioner | 6h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 13 | [RAG](./chapters/chapter-13-retrieval-augmented-generation/) | Practitioner | 8h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 14 | [Fine-tuning & Adaptation](./chapters/chapter-14-fine-tuning-and-adaptation/) | Practitioner | 8h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 15 | [MLOps & Deployment](./chapters/chapter-15-mlops-and-model-deployment/) | Practitioner | 8h | Available | 3 notebooks, scripts, 4 exercises, 3 diagrams |
| 16 | Multi-Agent Systems | Advanced | 10h | Planned | - |
| 17 | Advanced RAG | Advanced | 10h | Planned | - |
| 18 | Reinforcement Learning | Advanced | 12h | Planned | - |
140 changes: 140 additions & 0 deletions chapters/chapter-11-large-language-models-and-transformers/README.md
@@ -0,0 +1,140 @@
# Chapter 11: Large Language Models & Transformers

**Track**: Practitioner | **Time**: 10 hours | **Prerequisites**: [Chapter 10: Natural Language Processing Basics](../chapter-10-natural-language-processing-basics/)

---

Large language models (LLMs) and the **Transformer** architecture power most of modern AI: ChatGPT, Claude, Gemini, Llama, and the embedding/RAG systems built on top of them. This chapter takes the attention and transfer-learning ideas from Chapter 10 and builds them up into a full understanding of how transformers work, how pretrained LLMs are used, and how to build real applications around them.

You will implement **scaled dot-product attention**, **multi-head attention**, **positional encodings**, and a **transformer block** in pure NumPy; work with **pretrained models** (BERT, DistilBERT, GPT-style) through a graceful Hugging Face fallback; generate embeddings; explore **decoding strategies** (greedy, top-k, top-p, temperature); and study **scaling laws**, **evaluation**, and how to ship LLM-powered features.
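
As a taste of the NumPy work ahead, here is a minimal sketch of scaled dot-product attention, the softmax(QK^T / sqrt(d_k)) V core the notebooks build on. The function name and toy shapes are illustrative rather than the notebooks' exact code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Minimal attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k) arrays; V: (seq_len, d_v) array.
    mask: optional boolean (seq_len, seq_len) array; True positions are blocked.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len)
    if mask is not None:
        scores = np.where(mask, -1e9, scores)       # block masked positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Tiny smoke test on random data (self-attention: Q = K = V = X)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, d_model = 8
out, attn = scaled_dot_product_attention(X, X, X)
print(out.shape, attn.shape)                        # (4, 8) (4, 4)
```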

---

## Learning Objectives

By the end of this chapter, you will be able to:

1. **Explain the Transformer architecture** — self-attention, multi-head attention, positional encoding, residuals, layer norm
2. **Implement attention from scratch** — scaled dot-product and multi-head attention in NumPy
3. **Distinguish encoder, decoder, and encoder–decoder models** — and pick the right family for a task
4. **Use pretrained LLMs** — tokenize, extract embeddings, run inference with Hugging Face `transformers`
5. **Apply LLM embeddings to downstream tasks** — similarity search and frozen-embedding classifiers
6. **Generate text with controlled decoding** — greedy, sampling, temperature, top-k, top-p, repetition penalty
7. **Evaluate LLMs** — perplexity, BLEU/ROUGE, win-rate, and the limits of LLM-as-judge (a toy perplexity calculation follows this list)
8. **Design LLM-powered systems** — chunking, streaming, function calling, and the road to RAG and fine-tuning
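
For objective 7, perplexity is just the exponentiated average negative log-likelihood a model assigns to the reference tokens. A toy calculation with made-up probabilities:

```python
import numpy as np

# Toy per-token probabilities a model might assign to a 4-token reference text.
token_probs = np.array([0.25, 0.10, 0.60, 0.05])
nll = -np.log(token_probs)            # negative log-likelihood per token
perplexity = np.exp(nll.mean())       # PPL = exp(mean NLL)
print(round(float(perplexity), 2))    # ~6.04: roughly a 6-way choice per token
```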

---

## Prerequisites

- **Chapter 10: Natural Language Processing Basics** — tokenization, embeddings, attention intuition, transfer learning
- **Chapter 9: Deep Learning Fundamentals** — backprop, layers, optimizers, training loops
- Comfort with NumPy, linear algebra (matmul, softmax), and basic probability
- Optional: PyTorch for the deeper sections (the chapter runs without it)

---

## What You'll Build

- **Mini-Transformer in NumPy** — scaled dot-product attention, multi-head attention, positional encoding, and a single encoder block you can run end-to-end
- **Embedding service** — wrap a pretrained model (or fallback) to turn text into vectors and search by similarity
- **Frozen-embedding classifier** — sentence embeddings + scikit-learn for a fast, strong text classifier
- **Decoding playground** — greedy, temperature, top-k, and top-p samplers operating on real logit distributions (a minimal sampler sketch follows this list)
- **LLM application sketch** — chunking, prompt assembly, and streaming patterns that lead into Chapter 12 (Prompt Engineering) and Chapter 13 (RAG)
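
To preview the decoding playground, here is a sketch of a temperature + top-k sampler, assuming a raw logit vector over a toy vocabulary; the helper name is made up for illustration.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from raw logits with temperature and optional top-k."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    if top_k is not None:
        top_k = min(top_k, logits.size)
        cutoff = np.sort(logits)[-top_k]                      # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)   # drop everything else
    probs = np.exp(logits - logits.max())                     # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

fake_logits = [2.0, 1.0, 0.5, -1.0, -3.0]                     # pretend 5-token vocabulary
print(sample_next_token(fake_logits, temperature=0.7, top_k=3))
```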

---

## Time Commitment

| Section | Time |
|---------|------|
| Notebook 01: Transformer Architecture (attention, multi-head, positional encoding, blocks) | 3 hours |
| Notebook 02: Pretrained LLMs (tokenizers, embeddings, classification, model selection) | 3 hours |
| Notebook 03: Advanced LLMs (decoding, KV cache, scaling, evaluation, apps) | 2.5 hours |
| Exercises (Problem Sets 1 & 2) | 1.5 hours |
| **Total** | **10 hours** |

---

## Technology Stack

- **Numerics**: `numpy`, `pandas`, `scikit-learn`
- **Visualization**: `matplotlib`
- **Notebooks**: `jupyter`, `ipywidgets`
- **Optional (LLMs)**: `transformers`, `tokenizers`, `accelerate`, `datasets`, `sentencepiece`, `huggingface-hub`
- **Optional (DL)**: `torch` for the deeper transformer/embedding sections

---

## Quick Start

1. **Clone and enter the chapter**
```bash
cd chapters/chapter-11-large-language-models-and-transformers
```

2. **Create a virtual environment and install dependencies**
```bash
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt
# Optional, for the pretrained-LLM sections:
# pip install torch transformers tokenizers accelerate datasets sentencepiece huggingface-hub
```

3. **Run the notebooks**
```bash
jupyter notebook notebooks/
```
Start with `01_transformer_architecture.ipynb`, then `02_pretrained_llms.ipynb`, then `03_advanced_llms.ipynb`.

---

## Notebook Guide

| Notebook | Focus |
|----------|--------|
| **01_transformer_architecture.ipynb** | From RNN limits to attention; scaled dot-product and multi-head attention in NumPy; sinusoidal positional encoding; encoder block; encoder/decoder/decoder-only families; tokenization (BPE/WordPiece) intuition |
| **02_pretrained_llms.ipynb** | Loading pretrained models with `transformers` (with fallback); `AutoTokenizer`; extracting and visualizing embeddings; mean pooling for sentence vectors; frozen-embedding classification; choosing BERT vs RoBERTa vs DistilBERT vs GPT |
| **03_advanced_llms.ipynb** | Decoding strategies (greedy, sampling, temperature, top-k, top-p); KV cache shapes; scaling laws; evaluation (perplexity, BLEU/ROUGE, LLM-as-judge); building LLM apps (chunking, streaming, function calling); capstone design |
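
Notebook 02's mean-pooling step for sentence vectors reduces to a masked average over token embeddings. A minimal sketch, assuming you already have a `(seq_len, hidden)` array of token vectors and a 0/1 attention mask (the function name is illustrative):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions.

    token_embeddings: (seq_len, hidden) array, e.g. a model's last hidden state.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(float)     # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)   # (hidden,)
    count = np.clip(mask.sum(), 1.0, None)           # avoid division by zero
    return summed / count

# Toy example: 3 real tokens + 1 padding token, hidden size 4
emb = np.arange(16, dtype=float).reshape(4, 4)
mask = np.array([1, 1, 1, 0])
print(mean_pool(emb, mask))                          # [4. 5. 6. 7.]
```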

---

## Exercise Guide

- **Problem Set 1** (`exercises/problem_set_1.ipynb`) — implement scaled dot-product attention; build sinusoidal positional encoding (a starter sketch follows this list); plot an attention heatmap; tokenize text and reason about BPE; multi-head attention shape check; compare encoder/decoder/encoder–decoder
- **Problem Set 2** (`exercises/problem_set_2.ipynb`) — implement top-k sampling; build a tiny transformer block from scratch; compute perplexity; train an embedding-based classifier; reason about prompt vs context-window trade-offs; evaluate generations
- **Solutions** — in `exercises/solutions/` with runnable code, explanations, and alternatives
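
As a starting point for the positional-encoding exercise, here is a sketch of the classic sinusoidal scheme from "Attention Is All You Need"; the function name is illustrative, and the problem set may ask for a different interface.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sin/cos positional encodings: even dims get sin, odd dims get cos."""
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                            # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                              # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)   # (10, 16)
```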

---

## How to Run Locally

- Use Python 3.9+ and the versions in `requirements.txt` for reproducibility.
- The numpy-only sections (Notebook 01, large parts of 03, all Problem Set 1) require **no** transformer installs.
- For Notebook 02 and the embedding sections, install the optional `transformers` / `torch` extras shown above.
- Scripts in `scripts/` can be run from the chapter root; the notebooks assume that root as their working directory.

---

## Common Troubleshooting

- **`transformers` not installed** — Notebooks fall back to NumPy/sklearn stubs and print a `pip install transformers` hint; install when you want the real models (a sketch of this pattern follows the list)
- **Hugging Face download blocked / offline** — Set `HF_HUB_OFFLINE=1` and use a locally cached model, or rely on the fallback paths in the notebooks
- **Out-of-memory loading a large model** — Switch `MODEL_NAME` in `scripts/config.py` to `distilbert-base-uncased` or `sentence-transformers/all-MiniLM-L6-v2`
- **CUDA/GPU** — Optional; everything runs on CPU. Set `CUDA_VISIBLE_DEVICES=""` to force CPU if a GPU is misbehaving
- **Slow first run** — Pretrained model download can take a few minutes; subsequent runs hit the local cache
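
The fallback behaviour in the first item follows a common try/except import pattern. The sketch below shows one way it could look; the helper name, pooling choice, and TF-IDF fallback are assumptions for illustration, not the notebooks' exact code.

```python
# Sketch of a graceful-degradation pattern (names and fallback are illustrative).
try:
    from transformers import AutoModel, AutoTokenizer
    HAS_TRANSFORMERS = True
except ImportError:
    HAS_TRANSFORMERS = False
    print("transformers not installed; falling back to TF-IDF features. "
          "Run `pip install torch transformers` for the real models.")

def embed(texts):
    """Return one vector per text, using a pretrained model when available."""
    if HAS_TRANSFORMERS:
        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        model = AutoModel.from_pretrained("distilbert-base-uncased")
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**batch).last_hidden_state      # (batch, seq, hidden)
        return hidden.mean(dim=1).detach().numpy()     # crude mean pooling
    # Fallback: sparse TF-IDF vectors from scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    return TfidfVectorizer().fit_transform(texts).toarray()

print(embed(["attention is all you need", "transformers power modern NLP"]).shape)
```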

---

## Next Steps

- **Chapter 12: Prompt Engineering** — Now that you understand how LLMs tokenize, attend, and decode, Chapter 12 turns to *steering* them: prompt patterns, few-shot, chain-of-thought, structured output, and evaluation of prompts.

---

**Generated by Berta AI**

Part of [Berta Chapters](https://github.com/your-org/berta-chapters) — open-source AI curriculum.
*March 2026 — Berta Chapters*
@@ -0,0 +1,12 @@
graph LR
X["Input X (batch, seq, d_model)"] --> SP["Split into h heads"]
SP --> H1["Head 1: Attention(Q1, K1, V1)"]
SP --> H2["Head 2: Attention(Q2, K2, V2)"]
SP --> H3["..."]
SP --> Hh["Head h: Attention(Qh, Kh, Vh)"]
H1 --> C["Concat (batch, seq, d_model)"]
H2 --> C
H3 --> C
Hh --> C
C --> P["Projection Wo"]
P --> O["Output (batch, seq, d_model)"]
@@ -0,0 +1,11 @@
graph LR
X["Input X"] --> Q["Q = X * Wq"]
X --> K["K = X * Wk"]
X --> V["V = X * Wv"]
Q --> S["Scores = Q * K^T / sqrt(d_k)"]
K --> S
S --> M["Optional Mask"]
M --> SM["Softmax"]
SM --> A["Attention Weights"]
A --> O["Output = A * V"]
V --> O
@@ -0,0 +1,26 @@
graph TB
A["Input Tokens"] --> B["Token Embedding"]
B --> C["+ Positional Encoding"]
C --> D["Encoder Block x N"]
D --> E["Encoder Output"]

F["Target Tokens (shifted)"] --> G["Token Embedding"]
G --> H["+ Positional Encoding"]
H --> I["Decoder Block x N"]
E -.->|Cross-Attention| I
I --> J["Linear + Softmax"]
J --> K["Output Probabilities"]

subgraph Encoder Block
D1["Multi-Head Self-Attention"] --> D2["Add & LayerNorm"]
D2 --> D3["Feed-Forward"]
D3 --> D4["Add & LayerNorm"]
end

subgraph Decoder Block
I1["Masked Multi-Head Self-Attention"] --> I2["Add & LayerNorm"]
I2 --> I3["Cross-Attention"]
I3 --> I4["Add & LayerNorm"]
I4 --> I5["Feed-Forward"]
I5 --> I6["Add & LayerNorm"]
end