From-Scratch Code for LLM Interviews · Community-Driven Voting · Real-Time Rankings
📖 Master these 100 topics and ace your LLM interview coding challenges
"Asked to implement Multi-Head Attention from scratch?"
"Can you write PPO, DPO, GRPO and explain the differences?"
"How does KV Cache work? What's the core idea of Flash Attention?"
| Feature | Description |
|---|---|
| 🎯 Real Interview Questions | Rankings driven by community votes |
| 📝 Detailed Comments | Every line of code clearly annotated |
| 🔥 Production-Ready | Numerical stability, edge cases handled |
| 🆚 Method Comparisons | Side-by-side comparisons of similar methods |
| ❓ Q&A Sections | Common questions answered proactively |
Legend: 🔥🔥🔥 Must Know | 🔥🔥 High Frequency | 🔥 Occasional | No mark = Good to know
👉 Vote Now to help calibrate real interview frequencies!
📖 LLM Basics → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 1 | Gradient & Backprop | 🔥🔥 | ⭐⭐ | Chain rule, foundation of deep learning |
| 2 | Linear Regression | 🔥 | ⭐ | y = Wx + b, simplest model |
| 3 | Logistic Regression | 🔥🔥 | ⭐⭐ | sigmoid(Wx + b), binary classification |
| 4 | Softmax Regression | 🔥 | ⭐⭐ | Multi-class, LLM output layer |
| 5 | MLP | 🔥🔥 | ⭐⭐ | Universal approximator, FFN basis |
| 6 | Activation Functions | 🔥🔥 | ⭐ | ReLU/GELU/SiLU and gradients |
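
To give a flavor of what these entries expand into, here is a minimal NumPy sketch covering topics #1–#3: one gradient step of logistic regression derived with the chain rule, using a numerically stable sigmoid. Function names and shapes are illustrative, not the repo's actual API.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable: never exponentiates a large positive number.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def logistic_step(W, b, X, y, lr=0.1):
    """One SGD step on binary cross-entropy. X: (N, D), y: (N,) in {0, 1}."""
    p = sigmoid(X @ W + b)       # forward: p = sigmoid(XW + b)
    dz = (p - y) / len(y)        # chain rule: sigmoid + BCE collapses to p - y
    W = W - lr * (X.T @ dz)      # dL/dW = X^T dz
    b = b - lr * dz.sum()        # dL/db = sum(dz)
    return W, b
```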
🧠 Attention Mechanisms → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 7 | Scaled Dot-Product Attention | 🔥🔥🔥 | ⭐⭐⭐ | softmax(QK^T/√d)V, the foundation |
| 8 | Multi-Head Attention | 🔥🔥🔥 | ⭐⭐⭐⭐ | Parallel heads, different subspaces |
| 9 | Causal Mask | 🔥🔥🔥 | ⭐⭐ | Lower triangular, prevent future peeking |
| 10 | GQA | 🔥🔥🔥 | ⭐⭐⭐⭐ | Q heads > KV heads, LLaMA2 standard |
| 11 | MQA | 🔥🔥 | ⭐⭐⭐ | All Q share one KV |
| 12 | Flash Attention | 🔥🔥 | ⭐⭐⭐⭐⭐ | Tiled computation, IO-aware, O(N) memory |
| 13 | KV Cache | 🔥🔥🔥 | ⭐⭐⭐⭐ | Cache historical KV, avoid recomputation |
| 14 | Cross Attention | 🔥 | ⭐⭐⭐ | Q from decoder, KV from encoder |
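
As a taste of this section, here is a minimal PyTorch sketch of topic #7 (scaled dot-product attention) with topic #9's causal mask folded in; tensor names and shapes are illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """q, k, v: (batch, seq, d). Returns softmax(QK^T / sqrt(d)) V, causally masked."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)        # (B, S, S)
    # Strictly upper-triangular mask: position i may only attend to j <= i.
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```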
📏 Normalization → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 15 | Layer Normalization | 🔥🔥🔥 | ⭐⭐ | Normalize across features, Transformer standard |
| 16 | RMS Normalization | 🔥🔥🔥 | ⭐⭐ | No mean, just RMS, LLaMA uses it |
| 17 | Batch Normalization | 🔥 | ⭐⭐ | Normalize across batch, CNN common |
| 18 | Pre-Norm vs Post-Norm | 🔥🔥 | ⭐ | Pre-Norm more stable, modern LLM standard |
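
The contrast between topics #15 and #16 fits in a few lines; a minimal sketch, with the learnable scale/shift parameters omitted for brevity:

```python
import torch

def layer_norm(x, eps=1e-5):
    # Subtract the mean, divide by the standard deviation (per feature vector).
    mu = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    # No mean subtraction: divide by the root mean square only (LLaMA-style).
    return x / torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
```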
📍 Position Encoding → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 19 | Sinusoidal PE | 🔥 | ⭐⭐ | sin/cos fixed, original Transformer |
| 20 | Learnable PE | 🔥 | ⭐ | Learnable embeddings, BERT/GPT |
| 21 | RoPE | 🔥🔥🔥 | ⭐⭐⭐⭐ | Complex rotation, relative position, LLM standard |
| 22 | ALiBi | 🔥🔥 | ⭐⭐⭐ | Linear bias in attention, good extrapolation |
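
A minimal sketch of topic #21 (RoPE): each consecutive pair of dimensions is rotated by a position-dependent angle, so attention scores depend only on relative position. Variable names are illustrative.

```python
import torch

def rope(x, base=10000.0):
    """x: (seq, d) with d even. Rotates dimension pairs instead of adding PEs."""
    seq, d = x.shape
    theta = base ** (-torch.arange(0, d, 2).float() / d)        # (d/2,) frequencies
    angles = torch.arange(seq).float()[:, None] * theta[None]   # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # standard 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```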
🎲 Sampling Strategies → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 23 | Greedy Decoding | 🔥 | ⭐ | Pick argmax each step |
| 24 | Temperature Sampling | 🔥🔥🔥 | ⭐⭐ | logits/T controls randomness |
| 25 | Top-k Sampling | 🔥🔥 | ⭐⭐ | Sample from top-k only |
| 26 | Top-p Sampling | 🔥🔥🔥 | ⭐⭐⭐ | Cumulative probability cutoff |
| 27 | Beam Search | 🔥🔥 | ⭐⭐⭐ | Keep k best sequences |
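
Topics #24 and #26 compose naturally: temperature scaling followed by nucleus (top-p) filtering on a single logits vector. A minimal sketch with illustrative names:

```python
import torch

def sample_top_p(logits, temperature=0.8, p=0.9):
    """logits: (vocab,). Returns one sampled token id."""
    probs = torch.softmax(logits / temperature, dim=-1)   # temperature (#24)
    sorted_p, idx = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_p, dim=-1)
    # Keep the smallest prefix whose mass reaches p; the top token always survives.
    keep = (cum - sorted_p) < p
    sorted_p = torch.where(keep, sorted_p, torch.zeros_like(sorted_p))
    sorted_p = sorted_p / sorted_p.sum()                  # renormalize (#26)
    return idx[torch.multinomial(sorted_p, 1)].item()
```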
📉 Loss Functions → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 28 | Cross Entropy Loss | 🔥🔥🔥 | ⭐⭐⭐ | -log(p_true), classification standard |
| 29 | LM Loss | 🔥🔥🔥 | ⭐⭐ | Autoregressive CE, next token prediction |
| 30 | KL Divergence | 🔥🔥 | ⭐⭐⭐ | Distribution difference, distillation/RLHF |
| 31 | MSE Loss | 🔥 | ⭐ | (y-ŷ)², regression |
| 32 | Focal Loss | 🔥 | ⭐⭐⭐ | Down-weight easy samples |
| 33 | SFT Loss | 🔥🔥 | ⭐⭐ | Masked CE, response only |
| 34 | Reward Model Loss | 🔥🔥 | ⭐⭐⭐ | -log σ(r_w - r_l), preference learning |
| 35 | Contrastive Loss | 🔥 | ⭐⭐⭐ | Pull positives, push negatives |
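
Topics #28, #29, and #33 chain together: SFT loss is just next-token cross entropy with prompt tokens masked out. A minimal sketch, assuming labels share the input's token positions; shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, loss_mask):
    """logits: (B, S, V); labels: (B, S); loss_mask: (B, S), 1 on response tokens."""
    # Shift so position t predicts token t+1 (autoregressive LM loss).
    logits, labels = logits[:, :-1], labels[:, 1:]
    mask = loss_mask[:, 1:].float()
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         labels.reshape(-1), reduction="none")
    return (ce * mask.reshape(-1)).sum() / mask.sum()   # average over response only
```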
⚡ Optimizers → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 36 | SGD | 🔥 | ⭐ | Basic w -= lr * grad |
| 37 | SGD + Momentum | 🔥 | ⭐⭐ | Add momentum, faster convergence |
| 38 | Adam | 🔥🔥🔥 | ⭐⭐⭐ | Adaptive LR, 1st & 2nd moments |
| 39 | AdamW | 🔥🔥🔥 | ⭐⭐⭐ | Decoupled weight decay, LLM standard |
| 40 | LR Schedule | 🔥🔥 | ⭐⭐ | Warmup + Cosine/Linear decay |
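
The heart of topic #39 is one update rule. A minimal NumPy sketch, where the weight decay term is applied to the weights directly rather than mixed into the gradient (the "decoupled" part):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW step. t is the 1-based step count; m and v start at zero."""
    m = b1 * m + (1 - b1) * g          # 1st moment: running mean of gradients
    v = b2 * v + (1 - b2) * g * g      # 2nd moment: running mean of squares
    m_hat = m / (1 - b1 ** t)          # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)   # decoupled decay
    return w, m, v
```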
🎮 Reinforcement Learning (RLHF) → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 41 | REINFORCE | 🔥 | ⭐⭐⭐ | Policy gradient ∇log π × R |
| 42 | GAE | 🔥🔥🔥 | ⭐⭐⭐⭐ | Advantage estimation, bias-variance tradeoff |
| 43 | PPO | 🔥🔥🔥 | ⭐⭐⭐⭐⭐ | Clip ratio, RLHF core |
| 44 | PPO-Clip | 🔥🔥🔥 | ⭐⭐⭐⭐ | Clipped objective version |
| 45 | DPO | 🔥🔥🔥 | ⭐⭐⭐⭐ | Direct preference optimization, no RM |
| 46 | GRPO | 🔥🔥🔥 | ⭐⭐⭐⭐⭐ | Group relative policy, DeepSeek uses |
| 47 | KL Penalty | 🔥🔥 | ⭐⭐ | Prevent diverging from reference |
| 48 | Reward Shaping | 🔥 | ⭐⭐⭐ | Reward engineering, sparse → dense |
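
Of these, topic #45 (DPO) has the most compact core: push the policy's log-ratio margin on chosen vs. rejected responses above the reference model's. A minimal sketch; inputs are summed per-sequence log-probs, and names are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """All args: (B,) log-probs of full responses under policy / reference model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()   # -log sigmoid(beta * margin)
```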
🚀 Efficient Training → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 49 | LoRA | 🔥🔥🔥 | ⭐⭐⭐⭐ | Low-rank decomposition W + BA |
| 50 | QLoRA | 🔥🔥 | ⭐⭐⭐⭐ | LoRA + 4bit quantization |
| 51 | Gradient Checkpointing | 🔥🔥 | ⭐⭐⭐ | Trade time for memory |
| 52 | Mixed Precision | 🔥🔥 | ⭐⭐⭐ | FP16/BF16, less memory, faster |
| 53 | Gradient Accumulation | 🔥🔥 | ⭐⭐ | Small batch simulates large batch |
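
Topic #49 in brief: freeze the pretrained weight W and learn a rank-r update BA, scaled by alpha/r. A minimal PyTorch sketch with illustrative names and initialization:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)               # frozen pretrained W
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))         # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T  (the update has rank at most r)
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```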
⚡ Inference Optimization → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 54 | KV Cache | 🔥🔥🔥 | ⭐⭐⭐⭐ | Cache KV, speed up autoregressive |
| 55 | Paged Attention | 🔥🔥 | ⭐⭐⭐⭐ | Paged KV management, vLLM core |
| 56 | Speculative Decoding | 🔥🔥 | ⭐⭐⭐⭐ | Small model drafts, large verifies |
| 57 | Continuous Batching | 🔥🔥 | ⭐⭐⭐ | Dynamic batching, higher throughput |
| 58 | Quantization | 🔥🔥 | ⭐⭐⭐ | INT8/INT4, halves memory or better |
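
Topic #54 is the one to internalize first: each decode step appends its new key/value to a cache and attends over everything so far, instead of recomputing K and V for the whole prefix. A minimal single-head sketch with illustrative names:

```python
import math
import torch
import torch.nn.functional as F

def decode_step(q_new, k_new, v_new, cache):
    """q_new, k_new, v_new: (B, 1, d). cache: dict holding 'k' and 'v', or empty."""
    if "k" in cache:
        cache["k"] = torch.cat([cache["k"], k_new], dim=1)   # (B, t, d)
        cache["v"] = torch.cat([cache["v"], v_new], dim=1)
    else:
        cache["k"], cache["v"] = k_new, v_new
    scores = q_new @ cache["k"].transpose(-2, -1) / math.sqrt(q_new.size(-1))
    # No causal mask needed: the cache holds only past and current positions.
    return F.softmax(scores, dim=-1) @ cache["v"]
```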
🏗️ Transformer Architecture → View
| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 59 | Encoder-Only (BERT) | 🔥 | ⭐⭐⭐ | Bidirectional, understanding tasks |
| 60 | Decoder-Only (GPT) | 🔥🔥🔥 | ⭐⭐⭐ | Causal attention, generation, LLM standard |
| 61 | Encoder-Decoder (T5) | 🔥 | ⭐⭐⭐ | Seq2seq, translation/summarization |
| 62 | FFN | 🔥🔥 | ⭐⭐ | 2-layer MLP, 4x expansion |
| 63 | SwiGLU | 🔥🔥 | ⭐⭐⭐ | Gated FFN, LLaMA uses |
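
Closing the list, a minimal sketch of topic #63 (SwiGLU, as in LLaMA): a gated FFN where a SiLU-activated gate multiplies the up-projection elementwise before the down-projection. Dimensions are illustrative.

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)   # gate projection
        self.w3 = nn.Linear(d_model, d_hidden, bias=False)   # up projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)   # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))      # SiLU(xW1) * xW3, then W2
```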
Community-driven, updated hourly via GitHub Actions
Last updated: 2026-04-18
| Rank | Topic | Category | Votes |
|---|---|---|---|
| 🥇 | Scaled Dot-Product Attention | Attention | 🔥 4 |
| 🥈 | Gradient & Backprop | Basics | 🔥 3 |
| 🥉 | Linear Regression | Basics | 🔥 3 |
| 4 | Multi-Head Attention | Attention | 🔥 3 |
| 5 | Logistic Regression | Basics | 🔥 2 |
| 6 | BatchNorm | Norm | 🔥 2 |
| 7 | Cross Entropy | Loss | 🔥 2 |
| 8 | LayerNorm | Norm | 🔥 2 |
| 9 | GQA | Attention | 🔥 1 |
| 10 | RoPE | Position | 🔥 1 |
| 11 | DPO | RL | 🔥 1 |
| 12 | Causal Mask | Attention | 🔥 1 |
| 13 | Top-k | Sampling | 🔥 1 |
| 14 | Top-p | Sampling | 🔥 1 |
| 15 | Beam Search | Sampling | 🔥 1 |
| 16 | Decoder-Only | Arch | 🔥 1 |
| 17 | FFN | Arch | 🔥 1 |
| 18 | GAE | RL | 🔥 1 |
| 19 | GRPO | RL | 🔥 1 |
| 20 | KL Penalty | RL | 🔥 1 |
Your interview experience matters! Help calibrate real interview frequencies.
- 🗳️ Vote for topics you've seen in interviews
- 🏆 Real-time leaderboard
- 💬 Share your experience
Contributions welcome! New topics, bug fixes, documentation improvements.
- Fork this repo
- Create a feature branch: `git checkout -b feature/new-topic`
- Commit your changes: `git commit -m 'Add: XXX'`
- Push the branch: `git push origin feature/new-topic`
- Submit a Pull Request
- Attention Is All You Need
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Training language models to follow instructions with human feedback
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
If this helps, please give it a star ⭐!
Made with ❤️ for LLM Interview Preparation
The Hot 100 for the LLM Era
#LLMHot100