
🔥 LLM Interview Hot 100

The Hot 100 for the LLM Era

Hand-Written Code for LLM Interviews · Community-Driven Voting · Real-Time Rankings

GitHub stars License: MIT PRs Welcome Vote

English | 中文

📖 Master these 100 topics, ace your LLM interview coding challenges


"Asked to implement Multi-Head Attention from scratch?"

"Can you write PPO, DPO, GRPO and explain the differences?"

"How does KV Cache work? What's the core idea of Flash Attention?"

👉 Vote: Which topics appear most often? 👈


✨ Features

| Feature | Description |
|---|---|
| 🎯 Real Interview Questions | Rankings driven by community votes |
| 📝 Detailed Comments | Every line of code clearly annotated |
| 🔥 Production-Ready | Numerical stability, edge cases handled |
| 🆚 Method Comparisons | Side-by-side comparisons of similar methods |
| Q&A Sections | Common questions answered proactively |

📋 Complete Topic List

Legend: 🔥🔥🔥 Must Know | 🔥🔥 High Frequency | 🔥 Occasional | No mark = Good to know

👉 Vote Now to help calibrate the real interview frequency!

📖 LLM Basics → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 1 | Gradient & Backprop | 🔥🔥 | ⭐⭐ | Chain rule, foundation of deep learning |
| 2 | Linear Regression | 🔥 | | y = Wx + b, simplest model |
| 3 | Logistic Regression | 🔥🔥 | ⭐⭐ | sigmoid(Wx + b), binary classification |
| 4 | Softmax Regression | 🔥 | ⭐⭐ | Multi-class, LLM output layer |
| 5 | MLP | 🔥🔥 | ⭐⭐ | Universal approximator, FFN basis |
| 6 | Activation Functions | 🔥🔥 | | ReLU/GELU/SiLU and gradients |
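To give a flavor of the basics above, here is a minimal sketch (not the repo's reference solution) of logistic regression trained with plain gradient descent; the data, shapes, and learning rate are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 4)                    # 64 samples, 4 features
y = (X[:, 0] > 0).float()                 # toy binary labels

W = torch.zeros(4, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(100):
    logits = X @ W + b                    # (64,)
    p = torch.sigmoid(logits)             # sigmoid(Wx + b)
    loss = -(y * torch.log(p + 1e-8) + (1 - y) * torch.log(1 - p + 1e-8)).mean()
    loss.backward()                       # chain rule / backprop
    with torch.no_grad():                 # plain SGD update
        W -= 0.1 * W.grad
        b -= 0.1 * b.grad
        W.grad.zero_(); b.grad.zero_()

print(f"final loss: {loss.item():.4f}")
```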

🧠 Attention Mechanisms → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 7 | Scaled Dot-Product Attention | 🔥🔥🔥 | ⭐⭐⭐ | softmax(QK^T/√d)V, the foundation |
| 8 | Multi-Head Attention | 🔥🔥🔥 | ⭐⭐⭐⭐ | Parallel heads, different subspaces |
| 9 | Causal Mask | 🔥🔥🔥 | ⭐⭐ | Lower triangular, prevent future peeking |
| 10 | GQA | 🔥🔥🔥 | ⭐⭐⭐⭐ | Q heads > KV heads, LLaMA2 standard |
| 11 | MQA | 🔥🔥 | ⭐⭐⭐ | All Q share one KV |
| 12 | Flash Attention | 🔥🔥 | ⭐⭐⭐⭐⭐ | Tiled computation, IO-aware, O(N) memory |
| 13 | KV Cache | 🔥🔥🔥 | ⭐⭐⭐⭐ | Cache historical KV, avoid recomputation |
| 14 | Cross Attention | 🔥 | ⭐⭐⭐ | Q from decoder, KV from encoder |
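As a quick illustration of topics 7 and 9 above, here is a minimal sketch (not the repo's reference solution) of scaled dot-product attention with an optional causal mask; the `sdpa` name and tensor shapes are my own choices.

```python
import math
import torch

def sdpa(q, k, v, causal=True):
    """softmax(QK^T / sqrt(d)) V with an optional lower-triangular mask.
    q, k, v: (batch, heads, seq, head_dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)       # (B, H, T, T)
    if causal:
        T = q.size(-2)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))   # hide future tokens
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 2, 5, 8)
print(sdpa(q, k, v).shape)   # torch.Size([1, 2, 5, 8])
```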

📏 Normalization → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 15 | Layer Normalization | 🔥🔥🔥 | ⭐⭐ | Normalize across features, Transformer standard |
| 16 | RMS Normalization | 🔥🔥🔥 | ⭐⭐ | No mean, just RMS, LLaMA uses it |
| 17 | Batch Normalization | 🔥 | ⭐⭐ | Normalize across batch, CNN common |
| 18 | Pre-Norm vs Post-Norm | 🔥🔥 | | Pre-Norm more stable, modern LLM standard |
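For topic 16, a minimal RMSNorm sketch (assumed LLaMA-style: no mean subtraction, no bias), not the repo's reference solution:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """x / rms(x) * g — scale by the root-mean-square only."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x / rms * self.weight

x = torch.randn(2, 5, 16)
print(RMSNorm(16)(x).shape)   # torch.Size([2, 5, 16])
```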

📍 Position Encoding → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 19 | Sinusoidal PE | 🔥 | ⭐⭐ | sin/cos fixed, original Transformer |
| 20 | Learnable PE | 🔥 | | Learnable embeddings, BERT/GPT |
| 21 | RoPE | 🔥🔥🔥 | ⭐⭐⭐⭐ | Complex rotation, relative position, LLM standard |
| 22 | ALiBi | 🔥🔥 | ⭐⭐⭐ | Linear bias in attention, good extrapolation |
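For topic 21, a minimal RoPE sketch using the "rotate half" convention (one of several equivalent layouts); the `rope` helper and shapes are assumptions for illustration, not the repo's reference solution:

```python
import torch

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (batch, seq, heads, head_dim).
    head_dim must be even."""
    B, T, H, D = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2).float() / D))   # (D/2,)
    t = torch.arange(T).float()
    freqs = torch.outer(t, inv_freq)                                  # (T, D/2)
    cos = torch.cos(freqs)[None, :, None, :]                          # (1, T, 1, D/2)
    sin = torch.sin(freqs)[None, :, None, :]
    x1, x2 = x[..., : D // 2], x[..., D // 2 :]
    # Rotate each (x1, x2) pair by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 6, 2, 8)
print(rope(q).shape)   # torch.Size([1, 6, 2, 8])
```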

🎲 Sampling Strategies → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 23 | Greedy Decoding | 🔥 | | Pick argmax each step |
| 24 | Temperature Sampling | 🔥🔥🔥 | ⭐⭐ | logits/T controls randomness |
| 25 | Top-k Sampling | 🔥🔥 | ⭐⭐ | Sample from top-k only |
| 26 | Top-p Sampling | 🔥🔥🔥 | ⭐⭐⭐ | Cumulative probability cutoff |
| 27 | Beam Search | 🔥🔥 | ⭐⭐⭐ | Keep k best sequences |
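Topics 24 and 26 combined in one minimal sketch (not the repo's reference solution); the `sample_top_p` name and default values are illustrative assumptions:

```python
import torch

def sample_top_p(logits, temperature=0.8, top_p=0.9):
    """Temperature-scaled nucleus sampling over a 1-D logits vector."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens once the cumulative mass before them exceeds top_p
    # (the highest-probability token is always kept).
    cutoff = cumulative - sorted_probs > top_p
    sorted_probs[cutoff] = 0.0
    sorted_probs /= sorted_probs.sum()
    next_token = sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]
    return next_token.item()

logits = torch.randn(100)   # pretend vocabulary of 100 tokens
print(sample_top_p(logits))
```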

📉 Loss Functions → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 28 | Cross Entropy Loss | 🔥🔥🔥 | ⭐⭐⭐ | -log(p_true), classification standard |
| 29 | LM Loss | 🔥🔥🔥 | ⭐⭐ | Autoregressive CE, next token prediction |
| 30 | KL Divergence | 🔥🔥 | ⭐⭐⭐ | Distribution difference, distillation/RLHF |
| 31 | MSE Loss | 🔥 | | (y-ŷ)², regression |
| 32 | Focal Loss | 🔥 | ⭐⭐⭐ | Down-weight easy samples |
| 33 | SFT Loss | 🔥🔥 | ⭐⭐ | Masked CE, response only |
| 34 | Reward Model Loss | 🔥🔥 | ⭐⭐⭐ | -log σ(r_w - r_l), preference learning |
| 35 | Contrastive Loss | 🔥 | ⭐⭐⭐ | Pull positives, push negatives |
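For topics 33 and 34, minimal sketches (not the repo's reference solutions); the function names and shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_w - r_l)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def sft_loss(logits, labels, loss_mask):
    """Masked cross entropy: only response tokens (mask = 1) contribute.
    logits: (B, T, V); labels, loss_mask: (B, T)."""
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         labels.reshape(-1), reduction="none")
    ce = ce * loss_mask.reshape(-1).float()
    return ce.sum() / loss_mask.sum().clamp(min=1)

r_w, r_l = torch.randn(8), torch.randn(8)
print(reward_model_loss(r_w, r_l).item())
```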

⚡ Optimizers → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 36 | SGD | 🔥 | | Basic w -= lr * grad |
| 37 | SGD + Momentum | 🔥 | ⭐⭐ | Add momentum, faster convergence |
| 38 | Adam | 🔥🔥🔥 | ⭐⭐⭐ | Adaptive LR, 1st & 2nd moments |
| 39 | AdamW | 🔥🔥🔥 | ⭐⭐⭐ | Decoupled weight decay, LLM standard |
| 40 | LR Schedule | 🔥🔥 | ⭐⭐ | Warmup + Cosine/Linear decay |
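For topic 39, a minimal sketch of one AdamW update on a single tensor (not the repo's reference solution; `adamw_step` and the `state` dict are illustrative assumptions):

```python
import torch

def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update. state holds first/second moments m, v and step count t."""
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad            # 1st moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad     # 2nd moment
    m_hat = state["m"] / (1 - b1 ** state["t"])               # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    param -= lr * weight_decay * param                        # decoupled weight decay
    param -= lr * m_hat / (v_hat.sqrt() + eps)
    return param

w = torch.randn(10)
state = {"m": torch.zeros(10), "v": torch.zeros(10), "t": 0}
w = adamw_step(w, torch.randn(10), state)
```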

🎮 Reinforcement Learning (RLHF) → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 41 | REINFORCE | 🔥 | ⭐⭐⭐ | Policy gradient ∇log π × R |
| 42 | GAE | 🔥🔥🔥 | ⭐⭐⭐⭐ | Advantage estimation, bias-variance tradeoff |
| 43 | PPO | 🔥🔥🔥 | ⭐⭐⭐⭐⭐ | Clip ratio, RLHF core |
| 44 | PPO-Clip | 🔥🔥🔥 | ⭐⭐⭐⭐ | Clipped objective version |
| 45 | DPO | 🔥🔥🔥 | ⭐⭐⭐⭐ | Direct preference optimization, no RM |
| 46 | GRPO | 🔥🔥🔥 | ⭐⭐⭐⭐⭐ | Group relative policy, DeepSeek uses it |
| 47 | KL Penalty | 🔥🔥 | ⭐⭐ | Prevent diverging from reference |
| 48 | Reward Shaping | 🔥 | ⭐⭐⭐ | Reward engineering, sparse → dense |
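For topics 45 and 46, minimal sketches (not the repo's reference solutions); inputs are assumed to be per-sequence summed log-probabilities, and the function names are my own:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO: -log sigmoid(beta * ((logpi_w - logpi_ref_w) - (logpi_l - logpi_ref_l)))."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

def grpo_advantages(rewards):
    """GRPO-style group-relative advantages: z-score rewards within a group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

lp_w, lp_l = torch.randn(4), torch.randn(4)
ref_w, ref_l = torch.randn(4), torch.randn(4)
print(dpo_loss(lp_w, lp_l, ref_w, ref_l).item())
print(grpo_advantages(torch.tensor([1.0, 0.0, 0.5, 2.0])))
```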

🚀 Efficient Training → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 49 | LoRA | 🔥🔥🔥 | ⭐⭐⭐⭐ | Low-rank decomposition W + BA |
| 50 | QLoRA | 🔥🔥 | ⭐⭐⭐⭐ | LoRA + 4-bit quantization |
| 51 | Gradient Checkpointing | 🔥🔥 | ⭐⭐⭐ | Trade time for memory |
| 52 | Mixed Precision | 🔥🔥 | ⭐⭐⭐ | FP16/BF16, less memory, faster |
| 53 | Gradient Accumulation | 🔥🔥 | ⭐⭐ | Small batch simulates large batch |
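For topic 49, a minimal LoRA linear layer sketch (not the repo's reference solution); `LoRALinear` and the rank/alpha defaults are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + x A^T B^T * (alpha / r); only A and B are trained."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init => no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```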

⚡ Inference Optimization → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 54 | KV Cache | 🔥🔥🔥 | ⭐⭐⭐⭐ | Cache KV, speed up autoregressive decoding |
| 55 | Paged Attention | 🔥🔥 | ⭐⭐⭐⭐ | Paged KV management, vLLM core |
| 56 | Speculative Decoding | 🔥🔥 | ⭐⭐⭐⭐ | Small model drafts, large model verifies |
| 57 | Continuous Batching | 🔥🔥 | ⭐⭐⭐ | Dynamic batching, higher throughput |
| 58 | Quantization | 🔥🔥 | ⭐⭐⭐ | INT8/INT4, half the memory or less |
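For topic 54, a minimal single-head decoding sketch with a KV cache (not the repo's reference solution): at each step only the new token's K/V are computed and appended, and its Q attends over the cached history. Weights and the "next token embedding" are dummies for illustration.

```python
import math
import torch

d = 8
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

x = torch.randn(1, d)                 # embedding of the current token
for step in range(6):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    k_cache.append(k); v_cache.append(v)
    K = torch.cat(k_cache, dim=0)     # (t, d) — grows by one row per step
    V = torch.cat(v_cache, dim=0)
    attn = torch.softmax(q @ K.T / math.sqrt(d), dim=-1)   # (1, t)
    out = attn @ V                    # (1, d)
    x = out                           # stand-in for the next token's embedding

print(K.shape)   # torch.Size([6, 8])
```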

🏗️ Transformer Architecture → View

| # | Topic | Hot | Difficulty | One-liner |
|---|---|---|---|---|
| 59 | Encoder-Only (BERT) | 🔥 | ⭐⭐⭐ | Bidirectional, understanding tasks |
| 60 | Decoder-Only (GPT) | 🔥🔥🔥 | ⭐⭐⭐ | Causal attention, generation, LLM standard |
| 61 | Encoder-Decoder (T5) | 🔥 | ⭐⭐⭐ | Seq2seq, translation/summarization |
| 62 | FFN | 🔥🔥 | ⭐⭐ | 2-layer MLP, 4x expansion |
| 63 | SwiGLU | 🔥🔥 | ⭐⭐⭐ | Gated FFN, LLaMA uses |
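For topic 63, a minimal SwiGLU FFN sketch (not the repo's reference solution); the hidden size here is an arbitrary illustrative choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """LLaMA-style gated FFN: down( silu(gate(x)) * up(x) )."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

ffn = SwiGLU(dim=64, hidden_dim=172)        # roughly 8/3 * dim, as in LLaMA
print(ffn(torch.randn(2, 5, 64)).shape)     # torch.Size([2, 5, 64])
```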

🔥 Hot Top 20

Community-driven, updated hourly via GitHub Actions

Last updated: 2026-04-18

| Rank | Topic | Category | Votes |
|---|---|---|---|
| 🥇 | Scaled Dot-Product Attention | Attention | 🔥 4 |
| 🥈 | Gradient & Backprop | Basics | 🔥 3 |
| 🥉 | Linear Regression | Basics | 🔥 3 |
| 4 | Multi-Head Attention | Attention | 🔥 3 |
| 5 | Logistic Regression | Basics | 🔥 2 |
| 6 | BatchNorm | Norm | 🔥 2 |
| 7 | Cross Entropy | Loss | 🔥 2 |
| 8 | LayerNorm | Norm | 🔥 2 |
| 9 | GQA | Attention | 🔥 1 |
| 10 | RoPE | Position | 🔥 1 |
| 11 | DPO | RL | 🔥 1 |
| 12 | Causal Mask | Attention | 🔥 1 |
| 13 | Top-k | Sampling | 🔥 1 |
| 14 | Top-p | Sampling | 🔥 1 |
| 15 | Beam Search | Sampling | 🔥 1 |
| 16 | Decoder-Only | Arch | 🔥 1 |
| 17 | FFN | Arch | 🔥 1 |
| 18 | GAE | RL | 🔥 1 |
| 19 | GRPO | RL | 🔥 1 |
| 20 | KL Penalty | RL | 🔥 1 |

🗳️ Vote Now

Your interview experience matters! Help calibrate real interview frequencies.

👉 Go to Voting Page 👈

  • 🗳️ Vote for topics you've seen in interviews
  • 🏆 Real-time leaderboard
  • 💬 Share your experience

🤝 Contributing

Contributions welcome! New topics, bug fixes, documentation improvements.

  1. Fork this repo
  2. Create a feature branch: `git checkout -b feature/new-topic`
  3. Commit your changes: `git commit -m 'Add: XXX'`
  4. Push the branch: `git push origin feature/new-topic`
  5. Submit a Pull Request

📚 References


⭐ Star History

If this helps, please give it a star ⭐!

Star History Chart


Made with ❤️ for LLM Interview Preparation

The Hot 100 for the LLM Era

#LLMHot100

Report Bug · Request Feature · Vote Now
