Transformer Architecture
QUESTION 01
Explain the transformer architecture at a high level. What problems did it solve compared to RNNs and LSTMs?
📖 DEFINITION:
The Transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need" that processes sequential data using self-attention mechanisms instead of recurrent connections. Unlike RNNs and LSTMs that process tokens sequentially, Transformers process all tokens in parallel while using attention to dynamically model relationships between them. This fundamental shift from recurrence to attention revolutionized how neural networks handle sequential data.
⚙️ HOW IT WORKS:
The architecture consists of an encoder and decoder stack, each containing identical layers. Each encoder layer has two sublayers: multi-head self-attention and a position-wise feed-forward network, with residual connections and layer normalization around each. The decoder has three sublayers: masked self-attention (preventing attention to future positions), cross-attention over encoder outputs, and feed-forward network. Positional encodings are added to input embeddings to inject sequence order information, since self-attention is permutation-invariant. During training, masks prevent the decoder from attending to future tokens, ensuring autoregressive generation. The attention mechanism computes relevance scores between all pairs of positions, allowing the model to weigh information from different tokens based on context.
💡 WHY IT MATTERS:
Transformers solved three critical limitations of RNNs/LSTMs. First, parallelization: RNNs process tokens sequentially, making them slow to train on modern hardware because each step depends on the previous hidden state. Transformers parallelize computation across the sequence length, reducing training time from weeks to days and enabling scaling to massive datasets. Second, long-range dependencies: RNNs struggle with vanishing gradients, which in practice limits their effective context to a few hundred tokens even with LSTM gating mechanisms. Transformers can directly attend to any token regardless of distance, with attention providing an O(1) path length between any two positions versus O(n) in RNNs. Third, scaling: Transformers exhibit more predictable scaling laws with data, compute, and parameters, enabling the development of large language models like GPT-4, BERT, and Claude. This has democratized AI capabilities, making state-of-the-art NLP accessible.
📝 EXAMPLE:
Consider translating "The cat that chased the mouse that lived in the house that Jack built was tired" to French. An RNN must process each word sequentially, maintaining hidden state across all 16 words. By the time it reaches "was tired," the subject "cat" may be forgotten or degraded due to vanishing gradients. A Transformer's self-attention lets "was tired" directly attend to "cat" despite the 13 intervening positions, maintaining the subject-verb relationship correctly for accurate translation. This enables handling of complex nested structures that RNNs fundamentally struggle with.
QUESTION 02
What is the self-attention mechanism and how does it compute attention scores?
📖 DEFINITION:
Self-attention is a mechanism that allows each token in a sequence to compute a weighted representation of all other tokens, where weights represent the relevance or importance of each token to the current one. It enables the model to capture contextual relationships within the same sequence, creating context-aware representations where each token's meaning depends on its surrounding context. This is fundamentally different from earlier models where token representations were fixed regardless of context.
⚙️ HOW IT WORKS:
For each input token, the model computes three vectors through learned linear projections: Query (Q), Key (K), and Value (V). The Query asks "what am I looking for?", the Key answers "what information do I contain?", and the Value provides "the actual content to pass through". For a sequence of n tokens, we have matrices Q, K, V of shape (n, d_k), where d_k is the dimension per head. The attention scores between all tokens are computed as the dot product QKᵀ, producing an (n, n) matrix where position (i,j) represents the relevance of token j to token i. These scores are scaled by dividing by √d_k to prevent dot products from growing too large (the variance of the entries of QKᵀ is approximately d_k). Softmax is applied row-wise to convert scores to probabilities summing to 1. Finally, the output for each token is the weighted sum of all Value vectors using these probabilities: Attention(Q,K,V) = softmax(QKᵀ/√d_k)V. Multi-head attention runs this process h times with different projections, concatenating the results.
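The steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a reference implementation; the dimensions and random weights are made up for the demo:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v      # (n, d_k) each
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, n) relevance matrix
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # (n, d_k) weighted sum of values

rng = np.random.default_rng(0)
n, d_model, d_k = 5, 16, 8                   # illustrative sizes
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 8)
```

Each output row is a context-dependent mixture of all five tokens' values, which is exactly the "weighted representation of all other tokens" described above.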
💡 WHY IT MATTERS:
Self-attention gives Transformers their power to understand context in ways RNNs cannot. Unlike RNNs that compress entire history into a fixed-size hidden state (information bottleneck), self-attention provides direct access to any token's representation at any layer. This enables modeling of long-range dependencies spanning hundreds or thousands of tokens, coreference resolution (linking pronouns to nouns across paragraphs), capturing nuanced syntactic relationships like subject-verb agreement across clauses, and semantic similarity where words like "bank" disambiguate based on context. Self-attention is the foundation for contextual embeddings where each token's representation is a function of all tokens, enabling the rich understanding seen in modern LLMs. The attention patterns are also interpretable, allowing researchers to visualize what the model focuses on.
📝 EXAMPLE:
In the sentence "The bank refused the loan because it was busy," self-attention helps determine what "it" refers to. The token "it" computes Query based on its position and need for an antecedent. Keys from all tokens are compared: "bank" (financial institution, capable of being busy), "refused" (action, not an entity), "loan" (thing, cannot be busy). The attention mechanism computes higher scores between "it" and "bank" (0.7) than between "it" and "loan" (0.1) based on semantic compatibility learned during training. The Value from "bank" (its full representation) is weighted heavily in "it"'s output, allowing the model to correctly understand that the bank was busy, not the loan. This coreference resolution happens automatically through learned attention patterns.
QUESTION 03
What is multi-head attention and why is it used instead of single-head attention?
📖 DEFINITION:
Multi-head attention runs multiple self-attention operations in parallel, each with different learned linear projections, allowing the model to capture different types of relationships and patterns in the data simultaneously. The outputs from all heads are concatenated and projected to produce the final representation. This is analogous to having multiple experts analyzing the same text from different perspectives, then combining their insights.
⚙️ HOW IT WORKS:
Instead of computing attention once with d_model-dimensional Q, K, V, the model projects queries, keys, and values h times (typically 8-16 heads) with different learned linear transformations, reducing dimension to d_k = d_model/h per head. Each head performs independent scaled dot-product attention, producing output vectors of dimension d_k. These h outputs (each d_k-dimensional) are concatenated to restore d_model dimensions, then passed through a final linear projection W^O. Mathematically: MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V). Each head learns different projection matrices, enabling specialization. The number of heads is a hyperparameter, with more heads increasing model capacity and ability to capture diverse patterns, but also computational cost (each head adds parameters and operations).
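The split-project-concatenate pattern can be sketched in NumPy. This is an illustrative sketch (single projection matrices reshaped into heads, random weights), not any particular library's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (n, d_model); projection matrices: (d_model, d_model); h heads."""
    n, d_model = X.shape
    d_k = d_model // h
    # Project once, then split the last dimension into h heads of size d_k
    Q = (X @ W_q).reshape(n, h, d_k).transpose(1, 0, 2)    # (h, n, d_k)
    K = (X @ W_k).reshape(n, h, d_k).transpose(1, 0, 2)
    V = (X @ W_v).reshape(n, h, d_k).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)       # (h, n, n)
    heads = softmax(scores) @ V                            # (h, n, d_k)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # restore d_model
    return concat @ W_o                                    # final projection

rng = np.random.default_rng(1)
n, d_model, h = 6, 32, 4
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h)
print(out.shape)  # (6, 32)
```

Note that each head attends over the full sequence but in its own d_k-dimensional subspace, which is what lets heads specialize.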
💡 WHY IT MATTERS:
Single-head attention averages out different types of relationships into one attention pattern, limiting expressiveness because a single distribution over tokens must serve multiple purposes. Multi-head attention provides several critical benefits: First, it allows the model to attend to information from different representation subspaces simultaneously, capturing both local syntax and long-range semantics in parallel. Second, different heads can specialize in different linguistic phenomena: one head might track subject-verb agreement, another captures coreference relationships, a third focuses on semantic roles, and a fourth attends to positional proximity. Third, it provides redundancy and robustness - if one head misses important information due to noisy patterns, others may capture it. Fourth, it increases model capacity without making the network deeper, enabling better performance on complex tasks while maintaining computational efficiency. Fifth, different heads often learn interpretable patterns, aiding model analysis and debugging.
📝 EXAMPLE:
In processing "The teacher who inspired generations of students retired yesterday," multiple attention heads work in parallel to build comprehensive understanding. Head 1 might focus on subject-verb agreement, computing high attention between "teacher" and "retired" across the relative clause. Head 2 captures the relative clause relationship, connecting "teacher" with "inspired" to understand who did what. Head 3 links "generations" with "students" based on semantic relatedness. Head 4 might track temporal aspects, linking "retired" with "yesterday". Head 5 might handle determiner relationships. The concatenated output from all heads combines these perspectives into a rich, multi-faceted representation that single-head attention could not achieve, enabling the model to simultaneously understand syntax, semantics, and temporal relationships.
QUESTION 04
Explain the role of positional encoding in transformers. Why is it necessary?
📖 DEFINITION:
Positional encoding is a technique used in Transformers to inject information about the position of tokens in a sequence into the model's representations. Since the self-attention mechanism is permutation-invariant (it processes tokens as a set without inherent order), positional encodings provide the necessary sequential information for the model to understand word order and token positions. They are vectors added to token embeddings before they enter the first transformer layer.
⚙️ HOW IT WORKS:
The original Transformer used sinusoidal positional encodings with different frequencies to create unique patterns for each position. For position pos and dimension i, the encoding is: PE(pos,2i) = sin(pos/10000^(2i/d_model)) for even dimensions and PE(pos,2i+1) = cos(pos/10000^(2i/d_model)) for odd dimensions. This creates a continuous encoding where nearby positions have similar patterns, and the encoding can extrapolate to sequence lengths beyond training. The sinusoidal functions allow the model to easily learn to attend by relative position since PE(pos+k) can be represented as a linear function of PE(pos). Modern models often use learned positional embeddings instead, where each position up to maximum context length has a trainable vector. More recent approaches use relative position encodings (like in RoPE or ALiBi) that capture distances between tokens rather than absolute positions, which often generalizes better to longer sequences.
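The sinusoidal scheme above is easy to implement directly. A minimal NumPy sketch (the dimensions are illustrative; the dot-product check at the end simply demonstrates that nearby positions get more similar encodings than distant ones):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); cos for odd dimensions."""
    pos = np.arange(max_len)[:, None]          # (max_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                # even dims: sine
    pe[:, 1::2] = np.cos(angle)                # odd dims: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
# Nearby positions have more similar encodings than distant ones
near = np.dot(pe[10], pe[11])
far = np.dot(pe[10], pe[100])
print(near > far)  # True
```

In the original architecture this matrix is simply added to the token embeddings before the first layer.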
💡 WHY IT MATTERS:
Without positional encoding, the Transformer would treat "dog bites man" and "man bites dog" identically, as both contain the same bag of words with the same embeddings. Position matters critically in language for syntax (subject-verb-object order), semantics ("almost won" vs "won almost"), and meaning ("the police shot the man" vs "the man shot the police"). Positional encodings enable the model to understand word order, distance between tokens, and sequential patterns essential for tasks like translation (where word order differs across languages), question answering (where proximity between question terms and answer span matters), and any language task where sequence order carries meaning. The choice of positional encoding affects how well models handle long sequences and extrapolate to unseen lengths. Sinusoidal encodings extrapolate better theoretically, while learned embeddings often perform better within training distribution but fail on longer sequences.
📝 EXAMPLE:
In the sentence "The cat sat on the mat and the dog slept," positional encoding helps the model establish a complete spatial and temporal understanding. The encoding tells the model that "cat" appears at position 1 (0-indexed with "The" at 0), "sat" at position 2, "mat" at position 5, "dog" at position 8, and "slept" at position 9. When processing "slept," the model can attend to "dog" knowing they're adjacent in the sequence (positions 8 and 9), while "cat" is farther away (position 1). This positional awareness is crucial for correctly understanding that the dog slept, not the cat, even though both nouns appear in the sentence. Additionally, relative distances help the model learn that subjects typically appear immediately before their verbs in English, a pattern it can only learn with positional information.
QUESTION 05
What is the difference between encoder-only, decoder-only, and encoder-decoder transformer models? Give examples of each.
📖 DEFINITION:
These three architectures represent different ways of organizing transformer layers for different tasks. Encoder-only models stack only encoder layers with bidirectional attention, allowing each token to attend to all tokens for deep understanding. Decoder-only models stack only decoder layers with causal masking, where tokens attend only to previous positions for autoregressive generation. Encoder-decoder models combine both: an encoder processes input bidirectionally, and a decoder generates output with cross-attention to encoder representations for sequence transformation tasks.
⚙️ HOW IT WORKS:
Encoder-only models (like BERT) use bidirectional self-attention without masking, meaning each token can attend to all tokens to its left and right. They produce rich contextual embeddings where each token's representation incorporates the full context. They're typically pretrained with masked language modeling (predicting masked tokens) and next sentence prediction. Decoder-only models (like GPT) use causal masking (an upper-triangular mask of -∞) ensuring the token at position t can only attend to positions ≤ t. They're pretrained with autoregressive next-token prediction and generate sequentially during inference. Encoder-decoder models (like T5) have an encoder with bidirectional attention processing the input, and a decoder with both self-attention (causal) and cross-attention to encoder outputs. Cross-attention lets the decoder query encoder representations at each step, conditioning generation on the input. They're pretrained with span corruption (masking spans of text) or denoising objectives.
💡 WHY IT MATTERS:
Each architecture optimizes for different use cases, and choosing the right one impacts performance, efficiency, and capability. Encoder-only models excel at tasks requiring deep understanding of input: classification, sentiment analysis, named entity recognition, extractive QA, and generating embeddings for retrieval. They're efficient because they process input once and can be used for many downstream tasks. Decoder-only models shine in open-ended generation: chatbots, creative writing, code completion, and any task requiring fluent continuation. They've become dominant for general-purpose LLMs due to their flexibility and strong few-shot performance. Encoder-decoder models are ideal for transformation tasks where input and output differ structurally: translation, summarization, paraphrasing, and generative QA where answer isn't a span from input. They excel when output length or structure differs significantly from input.
📝 EXAMPLE:
For sentiment classification of a movie review, BERT (encoder-only) reads the entire review bidirectionally, builds comprehensive understanding, and uses the [CLS] token representation to classify as positive/negative efficiently. For a chatbot conversation, GPT-4 (decoder-only) generates responses autoregressively, with each new token depending on all previous conversation turns, enabling natural dialogue flow. For translating English "Hello, how are you?" to French "Bonjour, comment allez-vous?", T5 (encoder-decoder) encodes the English sentence once, then decodes French tokens one by one, with each French word attending to relevant English words via cross-attention, ensuring translation fidelity despite different sentence structures.
QUESTION 06
What is the scaled dot-product attention formula? Why do we scale by the square root of d_k?
📖 DEFINITION:
The scaled dot-product attention formula is the core mathematical operation in transformer attention mechanisms. It computes a weighted sum of values based on the compatibility between queries and keys, with scaling to maintain numerical stability. The formula is: Attention(Q,K,V) = softmax(QKᵀ/√d_k)V, where Q (queries), K (keys), and V (values) are matrices, and d_k is the dimension of the key vectors.
⚙️ HOW IT WORKS:
The computation proceeds in several steps. First, the dot product QKᵀ computes compatibility scores between every query (rows of Q) and every key (columns of Kᵀ), producing an n×n matrix where n is the sequence length. Each element (i,j) represents how much token i should attend to token j. These scores are then divided by √d_k, where d_k is typically 64-128 per head. The scaling factor is applied element-wise to all scores. Next, softmax is applied row-wise, converting the scores into a probability distribution over tokens for each position. Finally, these attention weights are multiplied by V to produce weighted sums, where each output token is a combination of all values weighted by the attention probabilities.
💡 WHY IT MATTERS:
Scaling by √d_k is crucial for stable training. Without scaling, for large d_k, the dot products QKᵀ grow large in magnitude because each dot product is a sum of d_k terms, and the variance of the sum is proportional to d_k (assuming unit-variance inputs). These large values push the softmax function into regions of extremely small gradients, where most probabilities are near 0 or 1. When softmax saturates, gradients become vanishingly small, and the model stops learning effectively. Dividing by √d_k normalizes the variance to approximately 1, keeping the dot products in a range where softmax has meaningful gradients. This scaling enables stable gradient flow through attention layers, allowing training of deep transformers with many layers. The √d_k factor is derived from the variance of dot products of independent random variables with zero mean and unit variance.
📝 EXAMPLE:
Consider d_k=64 with unscaled attention. If queries and keys have unit variance, the dot-product variance is 64, so scores typically range from -24 to +24 (three standard deviations). The softmax of [24, 0, 0] is essentially [1, 0, 0], with vanishing gradients for the zero positions. After scaling by √64=8, the variance becomes 1 and scores range from about -3 to +3. The softmax of [3, 0, 0] is roughly [0.91, 0.045, 0.045], leaving meaningful probability mass (and hence gradient signal) on the non-maximum positions. This allows the model to adjust attention weights gradually rather than being stuck with nearly one-hot distributions that never change.
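A quick NumPy check makes the saturation concrete (the scores [24, 0, 0] and [3, 0, 0] are the illustrative values from this example; the printed probabilities come from actually evaluating the softmax):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Unscaled scores for d_k=64 vs. the same scores after dividing by sqrt(64)=8
unscaled = softmax(np.array([24.0, 0.0, 0.0]))
scaled = softmax(np.array([3.0, 0.0, 0.0]))

print(np.round(unscaled, 6))  # [1. 0. 0.] -> saturated, gradients vanish
print(np.round(scaled, 3))    # [0.909 0.045 0.045] -> mass left on other tokens
```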
QUESTION 07
What are residual connections and layer normalization, and why are they important in transformers?
📖 DEFINITION:
Residual connections (or skip connections) add the input of a sublayer to its output, following the pattern x + Sublayer(x). Layer normalization is a technique that normalizes activations across the feature dimension for each token independently, computing mean and variance and scaling to unit statistics with learnable parameters. Together, they form the backbone of transformer training stability.
⚙️ HOW IT WORKS:
In transformers, each sublayer (multi-head attention and feed-forward network) has a residual connection followed by layer normalization. The original "post-norm" architecture applied LayerNorm after the residual addition: LayerNorm(x + Sublayer(x)). Modern "pre-norm" architectures apply LayerNorm before the sublayer: x + Sublayer(LayerNorm(x)), which often trains more stably. Layer normalization computes statistics per token across features: μ = (1/d)Σx_i, σ² = (1/d)Σ(x_i - μ)², then normalizes: (x - μ)/√(σ² + ε), followed by a learnable scale γ and bias β. This ensures each token's representation has consistent statistics regardless of layer depth.
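Both pieces fit in a few lines of NumPy. A minimal sketch of layer normalization and the pre-norm residual pattern (the sublayer and input values are placeholders for the demo):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token (row) across its feature dimension."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def pre_norm_block(x, sublayer, gamma, beta):
    # Pre-norm residual pattern: x + Sublayer(LayerNorm(x))
    return x + sublayer(layer_norm(x, gamma, beta))

rng = np.random.default_rng(2)
d = 8
x = rng.normal(loc=5.0, scale=3.0, size=(4, d))  # deliberately off-scale input
gamma, beta = np.ones(d), np.zeros(d)
normed = layer_norm(x, gamma, beta)
print(np.allclose(normed.mean(axis=-1), 0.0, atol=1e-6))  # True: zero mean per token
print(np.allclose(normed.var(axis=-1), 1.0, atol=1e-3))   # True: unit variance per token
```

Note how the per-token statistics are restored to zero mean and unit variance regardless of how far off-scale the input was.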
💡 WHY IT MATTERS:
These components are critically important for training deep transformers. Residual connections solve the vanishing gradient problem by providing a direct gradient highway from output to input. In networks with hundreds of layers, gradients through non-linear transformations would vanish without residuals. The identity mapping ensures gradients can flow backward unchanged, enabling effective training of models with 100+ layers like GPT-3. Layer normalization addresses the internal covariate shift problem - as activations flow through layers, their distributions change, causing training instability. By normalizing each token's features, layer normalization ensures consistent activation magnitudes, preventing one layer's outputs from destabilizing subsequent layers. It also makes training less sensitive to learning rate and initialization. Together, these components enable transformers to scale to hundreds of layers and billions of parameters where earlier architectures would collapse.
📝 EXAMPLE:
In GPT-3 with 96 layers, consider gradient flow during backpropagation. Without residual connections, the gradient at layer 1 would be the product of 96 Jacobian matrices, each potentially <1 in norm, causing vanishing gradients. With residuals, each layer's gradient includes a direct identity term that doesn't shrink, ensuring early layers receive meaningful updates. Simultaneously, layer normalization keeps activations stable: after 96 layers of transformations, activations could explode or vanish without normalization, but layer norm maintains consistent scale. A concrete effect: training a 96-layer transformer with post-norm often diverges; with pre-norm and proper residuals, it trains stably, which is why all modern large models use this architecture.
QUESTION 08
What is the feed-forward network (FFN) layer in a transformer block and what does it do?
📖 DEFINITION:
The feed-forward network (FFN) is a per-token multi-layer perceptron applied independently to each position after the attention mechanism. It consists of two linear transformations with a non-linear activation function in between: FFN(x) = W₂·σ(W₁·x + b₁) + b₂. It processes each token's representation identically but independently, adding non-linear transformation capacity to the model.
⚙️ HOW IT WORKS:
The FFN operates on each token position separately, with the same learned weights applied at every position (like a 1×1 convolution in CNNs). Typically, the hidden dimension is expanded by a factor of 4 (e.g., from 768 to 3072 in BERT-base, or from 12,288 to 49,152 in GPT-3). Common activation functions include GELU (Gaussian Error Linear Unit) in GPT models, ReLU in the original Transformer, and SwiGLU in more recent models like LLaMA. After the activation, a second linear projection maps back to the original model dimension. The FFN is applied after layer normalization in pre-norm architectures, and its output is added to its input via a residual connection: x = x + FFN(LayerNorm(x)).
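A minimal NumPy sketch of the expand-activate-project pattern, using the common tanh approximation of GELU (the sizes and small random weights are illustrative, not any model's actual parameters):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in GPT-style models
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, W1, b1, W2, b2):
    """Position-wise FFN: expand to d_ff, apply non-linearity, project back."""
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(3)
n, d_model = 5, 64
d_ff = 4 * d_model                       # typical 4x expansion
x = rng.normal(size=(n, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.02, np.zeros(d_model)
out = ffn(x, W1, b1, W2, b2)
print(out.shape)  # (5, 64) -- same shape in and out, applied per token
```

Because the same W1/W2 are applied at every position, the FFN mixes features within a token but never across tokens; cross-token mixing is attention's job.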
💡 WHY IT MATTERS:
The FFN is where much of the model's knowledge storage and complex pattern transformation happens. Attention mechanisms perform weighted averaging of token information - a linear operation in terms of the values. Without non-linearities, stacking attention layers would still result in a linear transformation overall, severely limiting expressiveness. The FFN provides the essential non-linearity that allows the model to learn complex functions, pattern matching, and factual knowledge. Each FFN can be viewed as a key-value memory where the first layer projects to a high-dimensional space (like addressing memory), and the second layer projects back (like reading memory). Research suggests FFNs store substantial factual knowledge in their parameters. The expansion factor (typically 4×) creates a richer representation space where patterns can be more easily separated before projection back to the model dimension.
📝 EXAMPLE:
After attention identifies that "Paris" is related to "capital" and "France" in the context "Paris is the capital of France," the FFN transforms that contextual representation to recall specific facts about Paris. The first linear layer might activate neurons corresponding to "European cities," "French landmarks," "population statistics." The activation function introduces non-linearity, allowing combinations like "European city" AND "capital" to activate different patterns than either alone. The second layer then projects to output representations containing specific facts: "Eiffel Tower," "population 2.1 million," "located on the Seine." This enables the model to answer follow-up questions like "What river is Paris on?" even though that information wasn't explicitly in the immediate context - the FFN stored it from training.
QUESTION 09
How does the attention mask work in a decoder-only model during training?
📖 DEFINITION:
In decoder-only models, a causal mask (also called triangular mask) prevents tokens from attending to future tokens during training, ensuring the model learns proper autoregressive next-token prediction. The mask is applied to the attention scores before softmax, setting scores for future positions to negative infinity so they receive zero attention weight after softmax.
⚙️ HOW IT WORKS:
The causal mask is an upper-triangular matrix of shape (seq_len, seq_len) with 0s on and below the diagonal (allowed positions) and -∞ above the diagonal (masked positions). Before softmax, this mask is added to the scaled attention scores: masked_scores = (QKᵀ/√d_k) + mask. When softmax is applied, the -∞ values become effectively zero probability because exp(-∞) = 0. This ensures that at position t, attention can only look at positions ≤ t. During training, the model processes all positions of each sequence in parallel but with this mask applied, so the prediction for the token at position t+1 depends only on tokens 0..t. The loss is computed only on predictions for actual tokens (excluding padding), typically using cross-entropy between the predicted next-token distributions and the actual tokens.
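The mask construction and its effect on the attention weights can be demonstrated directly. A small NumPy sketch using uniform (all-zero) scores purely for illustration:

```python
import numpy as np

def causal_mask(n):
    """0 on and below the diagonal, -inf above it (future positions)."""
    return np.triu(np.full((n, n), -np.inf), k=1)

def masked_softmax_rows(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n = 4
scores = np.zeros((n, n))                 # uniform scores, just for the demo
weights = masked_softmax_rows(scores + causal_mask(n))
print(np.round(weights, 2))
# Row t spreads attention uniformly over positions 0..t, with zero on the future:
# [[1.   0.   0.   0.  ]
#  [0.5  0.5  0.   0.  ]
#  [0.33 0.33 0.33 0.  ]
#  [0.25 0.25 0.25 0.25]]
```

The exp(-∞) = 0 trick is what lets all positions be computed in one parallel matrix operation while still enforcing causality.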
💡 WHY IT MATTERS:
The causal mask is essential for autoregressive language modeling. Without it, the model could "cheat" by attending to future tokens during training, learning to simply copy the next token rather than predict it from context. This would result in a model that fails at generation, where future tokens aren't available. The mask ensures training matches inference conditions exactly: during generation, when producing token t+1, the model only has access to tokens 0..t. This consistency is crucial for the model to learn proper next-token prediction. The mask also enables efficient parallel training - despite the sequential nature of language, the transformer can process all positions simultaneously because the mask enforces the causal constraint mathematically rather than requiring sequential computation. This parallelization is what makes transformer training so efficient compared to RNNs.
📝 EXAMPLE:
Training on the sentence "The cat sat on the mat." With sequence length 7, we have tokens: [The, cat, sat, on, the, mat, .]. The causal mask at position 4 (the "the" before "mat") allows attending to positions 0-4: [The, cat, sat, on, the], but blocks positions 5-6 ("mat", "."). When computing the loss for position 5 (predicting "mat"), the model uses only tokens 0-4. It cannot see that the actual next token is "mat" - it must predict it from context. This forces the model to learn that after "sat on the," "mat" is likely. During actual generation, when the model has produced "The cat sat on the," it predicts "mat" based on the same pattern learned during training.
QUESTION 10
What is the difference between cross-attention and self-attention?
📖 DEFINITION:
Self-attention has queries, keys, and values all coming from the same sequence, allowing tokens to attend to other tokens within the same input to understand internal relationships. Cross-attention has queries coming from one sequence (typically decoder states) while keys and values come from another sequence (typically encoder outputs), allowing one sequence to attend to another to condition generation on input.
⚙️ HOW IT WORKS:
In self-attention, the Q, K, V projections all receive the same input sequence X, producing Q = XW_Q, K = XW_K, V = XW_V. The attention then computes relationships within X: each token attends to all tokens in X. In cross-attention, the queries come from one sequence (e.g., decoder hidden states Y), while keys and values come from another sequence (e.g., encoder outputs H): Q = YW_Q, K = HW_K, V = HW_V. The attention computes how each token in Y should attend to tokens in H, allowing information flow from H to Y. In encoder-decoder models, the encoder uses self-attention to understand input, the decoder uses masked self-attention on generated tokens, and cross-attention between decoder and encoder to incorporate input information during generation.
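The only structural change from self-attention is where Q versus K/V come from. A minimal NumPy sketch (decoder length m, encoder length n, random placeholder states):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(Y, H, W_q, W_k, W_v):
    """Queries from decoder states Y (m, d); keys/values from encoder outputs H (n, d)."""
    Q, K, V = Y @ W_q, H @ W_k, H @ W_v
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (m, n): decoder tokens over encoder tokens
    return weights @ V                          # (m, d_k)

rng = np.random.default_rng(4)
m, n, d_model, d_k = 3, 7, 16, 16
Y = rng.normal(size=(m, d_model))   # decoder states (e.g. partial French output)
H = rng.normal(size=(n, d_model))   # encoder outputs (e.g. English source)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = cross_attention(Y, H, W_q, W_k, W_v)
print(out.shape)  # (3, 16) -- one source-conditioned vector per decoder position
```

Note the attention matrix is (m, n) rather than square: every decoder position gets its own distribution over the encoder sequence.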
💡 WHY IT MATTERS:
Self-attention builds rich, context-aware representations of a single sequence, capturing internal dependencies like syntax, coreference, and semantic relationships. It's the foundation for understanding tasks. Cross-attention enables conditional generation where the output must be grounded in input, essential for tasks where output depends on external information. In translation, cross-attention lets each French word attend to relevant English words. In summarization, it lets summary sentences attend to important parts of source document. In question answering, it lets answer generation attend to retrieved context. Without cross-attention, decoder-only models must include all input in the context window, which is less efficient for long documents and doesn't provide the same focused attention mechanism.
📝 EXAMPLE:
In an English-to-French translation task, self-attention in the encoder processes the English sentence "The cat sat on the mat," building representations where each English word understands its context. In the decoder, self-attention processes the partially generated French translation "Le chat s'est assis sur" ensuring fluency and grammatical agreement within French. Then cross-attention lets each French token query the English encoder representations: when generating "assis" (sat), cross-attention computes high weights with the English token "sat" to ensure correct verb translation; when generating "chat," it attends heavily to "cat"; when generating "tapis" (mat), it attends to "mat." This selective attention to relevant input tokens ensures translation accuracy while maintaining target language fluency.
QUESTION 11
Explain the concept of key, query, and value (K, Q, V) in attention. What do they represent intuitively?
📖 DEFINITION:
In the attention mechanism, Query (Q) represents what a token is looking for, Key (K) represents what information a token offers or how it can be matched, and Value (V) represents the actual content that will be passed through if the token is selected. This is analogous to a retrieval system where Query searches against Keys to retrieve relevant Values.
⚙️ HOW IT WORKS:
Each token in the input sequence is projected into three different vector spaces via learned weight matrices: Q = xW_Q, K = xW_K, V = xW_V. The Query from token i is compared with the Keys from all tokens via dot product, producing similarity scores. These scores determine how much of each token's Value is mixed into token i's output. Mathematically, for token i: output_i = Σ_j softmax_j(Q_i·K_j/√d_k) V_j. The projection matrices are learned during training to produce Q, K, V spaces that optimize this retrieval for the model's objectives. The dimensions of these spaces (d_k, d_v) are hyperparameters, typically with d_k = d_v = d_model/h for multi-head attention.
WHY IT MATTERS:
This separation of concerns into Q, K, V is what makes attention so powerful and flexible. The Query encodes what information the current token needs - this could be syntactic (looking for a subject if token is a verb), semantic (looking for related concepts), or positional (looking for nearby tokens). The Key encodes what information each token can provide - its role, category, and relevance to different types of queries. The Value carries the actual content that will be passed forward. This decomposition allows the model to learn sophisticated information retrieval patterns: a token can have a Key that says "I'm relevant to verb queries" while its Value carries its actual meaning. The same token can be retrieved for multiple purposes by different Queries. This flexibility enables the rich contextual understanding seen in modern LLMs.
EXAMPLE:
In the sentence "The cat chased the mouse," consider the token "chased" (a verb). Its Query might be structured to look for agents (who performs the action) and patients (who receives the action). The token "cat" has a Key that signals "I'm a noun, animate, could be an agent" and a Value containing "cat" identity. The token "mouse" has a Key signaling "I'm a noun, animate, could be a patient" and Value containing "mouse." When "chased" computes attention, its Query dot product with "cat" Key is high (matching agent-seeking pattern), and with "mouse" Key is also high (matching patient-seeking pattern). Their Values are then weighted and combined, allowing "chased" to understand both who chased and who was chased, producing a representation that encodes the full action.
QUESTION 12
What is the computational complexity of self-attention with respect to sequence length?
DEFINITION:
Standard self-attention has O(n²) time and memory complexity with respect to sequence length n, where n is the number of tokens. This quadratic scaling is the primary computational bottleneck for processing long sequences with transformers, as both the number of operations and the memory required grow with the square of the sequence length.
HOW IT WORKS:
For a sequence of length n with hidden dimension d, self-attention computes an n×n attention matrix representing relationships between every pair of tokens. Computing QKᵀ requires O(n²·d) operations (matrix multiplication of n×d with d×n). Storing the attention scores requires O(n²) memory. The subsequent softmax and weighted sum with the values also require O(n²) operations. With multi-head attention having h heads, this is repeated h times, but each head operates on dimension d/h, so the total complexity remains O(n²·d). The quadratic term n² dominates for long sequences because d is constant (typically 512-4096) while n can grow to hundreds of thousands.
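The quadratic growth is easy to see numerically. This small sketch, assuming 4-byte FP32 scores, computes the per-head size of the n×n score matrix alone:

```python
# Memory for the n x n attention-score matrix, per head per layer,
# assuming 4-byte FP32 scores.
def attn_score_bytes(n, bytes_per_float=4):
    return n * n * bytes_per_float

for n in (1_000, 10_000, 100_000):
    gb = attn_score_bytes(n) / 1e9
    print(f"n = {n:>7}: {gb:.3f} GB")
```

Going from 1k to 100k tokens (100× longer) multiplies the score-matrix memory by 10,000×, from megabytes to tens of gigabytes per head.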
WHY IT MATTERS:
Quadratic complexity fundamentally limits how long a context transformers can process. A 1,000-token sequence requires 1 million attention computations (manageable). A 100,000-token sequence requires 10 billion computations, 10,000× more. This 10,000× increase in compute and memory makes long contexts prohibitively expensive on current hardware. For example, processing a 1M-token book would require 1 trillion attention operations per layer, infeasible for practical applications. This limitation drives research into efficient attention variants: sparse attention (attending to subsets of tokens), linear attention (approximating attention with O(n) complexity), recurrent memory mechanisms, and better hardware utilization. Understanding this complexity is crucial for practitioners designing systems for long documents, multi-turn conversations, or any application requiring extensive context.
EXAMPLE:
Consider processing a 300-page book with 300,000 tokens. Full self-attention would require 90 billion attention pairs (300k²). If each attention operation takes 1 nanosecond (optimistic), that is 90 seconds just for attention in one layer. With 32 layers, that's 48 minutes per forward pass - impossible for practical use. Even with modern GPUs, 100k tokens push the limits of available memory (100k² = 10B scores ≈ 40GB in FP32, per head per layer, just for attention scores). This explains why models like GPT-4 with 128k context are remarkable achievements requiring significant optimizations like sparse attention patterns and FlashAttention.
QUESTION 13
How does the transformer handle variable-length inputs?
DEFINITION:
Transformers handle variable-length inputs primarily through padding and attention masking. During training and inference, sequences of different lengths are padded to a common maximum length within a batch, and attention masks ensure that padded positions do not affect the computation or loss. This allows efficient batch processing while maintaining correctness for individual sequences.
HOW IT WORKS:
During batching, all sequences in a batch are padded with special [PAD] tokens to match the length of the longest sequence in that batch. Two types of masks are typically used: padding masks and attention masks. Padding masks (usually binary) indicate which positions are real tokens vs. padding. These masks are applied in two ways: first, they're added to attention scores (with -inf for padding) to prevent tokens from attending to padding positions; second, they're used in loss calculation to ignore loss contributions from padding tokens. For decoder-only models, a causal mask is combined with the padding mask. During inference with a single sequence, no padding is needed - the model processes whatever length is provided, up to its maximum context window. The key insight is that transformer computations are identical for each token position; masking simply prevents certain positions from influencing others.
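A minimal sketch of these masks in NumPy, assuming token id 0 is [PAD] and using a large negative constant in place of -inf:

```python
# Padding mask + causal mask as additive biases on attention scores.
import numpy as np

NEG_INF = -1e9
batch = np.array([[5, 7, 0, 0],      # sequence with 2 real tokens + padding
                  [3, 9, 4, 0]])     # sequence with 3 real tokens + padding

pad_mask = (batch != 0)              # True where the token is real
# Block attention TO padding positions; broadcast over query positions.
attn_bias = np.where(pad_mask[:, None, :], 0.0, NEG_INF)  # (batch, 1, key)

# Causal mask for decoder-only models: query i may only see keys j <= i.
n = batch.shape[1]
causal = np.triu(np.full((n, n), NEG_INF), k=1)           # (query, key)

total_bias = attn_bias + causal      # broadcasts to (batch, query, key);
                                     # added to raw scores before softmax
```

After the softmax, any position whose bias is NEG_INF receives (effectively) zero attention weight, so padding never influences real tokens.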
WHY IT MATTERS:
Real-world data has inherently variable lengths - user queries range from 3 to 300 words, documents from paragraphs to books, conversations from 2 to 100 turns. Efficient handling of this variability is crucial for practical deployment. Batching with padding enables parallel processing of different-length sequences, dramatically improving throughput compared to processing each sequence individually. Without efficient padding and masking, GPU utilization would be poor because short sequences would waste compute. The choice of padding strategy affects both efficiency and correctness: dynamic batching (grouping similar-length sequences) minimizes wasted padding tokens. Attention masking ensures that despite variable lengths within a batch, the mathematical correctness of each sequence's processing is preserved - no token ever incorrectly attends to padding from another sequence.
EXAMPLE:
A batch contains three sequences: "Hello" (2 tokens after tokenization), "How are you?" (5 tokens), and "I'm doing well, thank you for asking" (10 tokens). All are padded to length 10 with [PAD] tokens. When computing attention for the first token in the first sequence, the attention mask ensures it can attend to token 2 (real) but not tokens 3-10 (padding). The loss for positions 3-10 in the first sequence is masked out, so the model isn't penalized for incorrectly predicting padding tokens. This allows the GPU to process all three sequences in parallel, achieving ~3x throughput compared to sequential processing, while maintaining training correctness.
QUESTION 14
What is Flash Attention and why does it matter for training large models?
DEFINITION:
FlashAttention is an optimized attention algorithm that reduces memory usage and improves speed by avoiding materialization of the large attention matrix. It uses tiling to compute attention in smaller blocks on fast SRAM, fusing multiple operations together rather than writing intermediate matrices to slower HBM (High Bandwidth Memory).
HOW IT WORKS:
Standard attention computes S = QKᵀ (an n×n matrix), writes it to HBM, reads it back for the softmax, writes P = softmax(S) to HBM, then reads it back for P·V - requiring O(n²) HBM storage and traffic. FlashAttention processes attention in blocks: it loads tiles of Q, K, V from HBM into fast SRAM, computes attention for that tile, updates the output incrementally using an online softmax, and writes back only the final result. It uses kernel fusion to combine operations, and in the backward pass it recomputes attention matrices on the fly rather than storing them. This reduces HBM reads/writes from Θ(n² + n·d) to Θ(n²·d²/M), where M is the SRAM size, achieving 2-4× speedups and cutting the attention memory footprint from O(n²) to O(n).
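The core numerical trick is an online (streaming) softmax over key/value blocks, so the full n×n matrix never exists at once. Real FlashAttention is a fused GPU kernel; this NumPy sketch only illustrates the math, not the performance:

```python
# Tiled attention with an online softmax: process K/V in blocks while
# maintaining a running row-max (m) and running denominator (l).
import numpy as np

def tiled_attention(Q, K, V, block=2):
    n, d = Q.shape
    out = np.zeros_like(V, dtype=np.float64)
    m = np.full(n, -np.inf)          # running max of scores seen so far
    l = np.zeros(n)                  # running softmax denominator
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)    # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)    # rescale previously accumulated sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]          # normalize once at the end
```

Because the rescaling by `exp(m - m_new)` keeps all partial sums consistent, the result is mathematically identical to full softmax attention.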
WHY IT MATTERS:
Standard attention's memory bottleneck limits practical sequence lengths. For a 64k sequence with 64 heads, storing the attention scores alone requires 64k² × 2 bytes × 64 ≈ 524GB per layer - impossible. FlashAttention reduces attention memory from O(n²) to O(n), enabling processing of much longer sequences under the same hardware constraints. This has several critical implications: 1) It enables training with 128k-1M context windows that were previously infeasible. 2) It allows larger batch sizes for the same sequence length, improving training efficiency. 3) It makes attention compute-bound rather than memory-bound, better utilizing GPU compute capacity. 4) It democratizes long-context research by making it accessible on fewer GPUs. FlashAttention-2 further improves efficiency with better parallelism and work partitioning.
EXAMPLE:
Consider training a model with 128k context. Standard attention would require storing 128k² ≈ 16.4B scores per head per layer. With 32 heads and 32 layers, that's roughly 16.8 trillion values - over 30TB even in FP16, impossible on any current hardware. FlashAttention instead tiles the computation: with on-chip SRAM on the order of 128KB, it processes attention in blocks of a few hundred tokens, never materializing the full matrix. The same 128k sequence then fits in memory on a single A100 GPU, enabling training that was previously impossible. This is why models like GPT-4 with 128k context and Claude with 200k context are now possible - FlashAttention and similar optimizations made them feasible.
QUESTION 15
What is the role of softmax in the attention mechanism?
DEFINITION:
Softmax is a mathematical function that converts raw attention scores (logits) into a probability distribution over tokens. It ensures that attention weights are non-negative and sum to 1, making the attention mechanism a proper weighted average of value vectors. The function is defined as softmax(z_i) = exp(z_i) / Σ_j exp(z_j) for a vector z.
HOW IT WORKS:
After computing scaled dot products between queries and keys, we have raw scores s_ij for each pair of positions. These scores can be any real number - positive or negative, large or small. Softmax transforms each row of this score matrix independently: first, it exponentiates each score, making all values positive and amplifying differences (a score of 2 gets exp(2)=7.4, while 1 gets exp(1)=2.7). Then it normalizes by dividing by the sum of exponentials in that row, ensuring all weights sum to 1. The resulting attention weights w_ij are between 0 and 1 and represent the proportion of token j's value that should be included in token i's output. The temperature parameter can adjust the sharpness of the distribution by scaling scores before softmax.
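A minimal, numerically stable implementation subtracts the row maximum before exponentiating; this leaves the result unchanged because softmax is shift-invariant, but prevents exp() from overflowing on large scores:

```python
# Numerically stable softmax over a vector of raw attention scores.
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())    # shift by max: same result, no overflow
    return e / e.sum()

w = softmax([2.0, 1.0, 0.1, -0.5])
# w is non-negative and sums to 1, with the largest score dominating
```

Applied row by row to the score matrix, this turns each token's raw scores into the attention weights used to average the value vectors.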
WHY IT MATTERS:
Softmax is essential for several reasons. First, it ensures attention produces a convex combination of values - the output is always within the convex hull of the input values, providing stability. Second, the exponential creates competition between tokens - small differences in raw scores are amplified, encouraging the model to make crisp decisions about which tokens are most relevant. Third, the normalization makes the total attention budget fixed (sums to 1), forcing tokens to compete for influence. Fourth, the resulting probabilities are interpretable as "how much attention" each token receives. Fifth, softmax with its exponential is differentiable, enabling gradient-based learning. Without softmax, raw scores could produce arbitrary weighted sums with negative weights, leading to instability and outputs outside reasonable ranges. The softmax attention mechanism is what allows models to selectively focus on relevant information while ignoring irrelevant tokens.
EXAMPLE:
After computing raw scores for a token, we might have [2.0, 1.0, 0.1, -0.5] relative to four other tokens. Softmax computes exponentials: [7.39, 2.72, 1.11, 0.61], sum = 11.83. Normalized weights: [0.62, 0.23, 0.09, 0.05]. The first token gets 62% of the value contribution, the second 23%, etc. This creates a clear focus on the most relevant token while still incorporating some information from others. If we had used raw scores directly (without softmax) as weights, the output could be arbitrarily scaled and potentially dominated by negative weights causing instability.
QUESTION 16
How does a transformer model generate text autoregressively at inference time?
DEFINITION:
Autoregressive generation is the process where a transformer produces text one token at a time, with each new token conditioned on all previously generated tokens. The model computes a probability distribution over the vocabulary for the next token, selects one based on a decoding strategy, appends it to the context, and repeats until a stopping condition is met.
HOW IT WORKS:
Generation begins with a prompt (user input) tokenized into a sequence of tokens. The model performs a forward pass through all layers, computing hidden states for each position. At the final layer, a language modeling head (typically a linear layer with softmax) produces a probability distribution over the entire vocabulary for the next token position. A decoding strategy then selects a token from this distribution: greedy decoding picks the highest probability token, sampling picks randomly according to probabilities (optionally with temperature scaling), top-k samples from the k most likely tokens, and beam search maintains multiple candidate sequences. The selected token is appended to the context. Crucially, the model caches key and value vectors (KV cache) from previous tokens, so for subsequent steps it only computes attention for the new token while reusing cached K,V from earlier tokens, dramatically speeding up generation. This process repeats until the model generates an end-of-sequence token or reaches a maximum length limit.
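The loop structure can be sketched with a stand-in for the model's forward pass. The toy `next_token_logits` below is hypothetical (a real model would run a transformer and manage the KV cache internally), but the greedy decoding skeleton around it is the real pattern:

```python
# Skeleton of an autoregressive decoding loop with greedy selection.
import numpy as np

def next_token_logits(tokens, vocab_size=10):
    # Toy stand-in for a model forward pass: deterministically favors
    # (last_token + 1) mod vocab_size.
    logits = np.zeros(vocab_size)
    logits[(tokens[-1] + 1) % vocab_size] = 5.0
    return logits

def greedy_decode(prompt, eos=9, max_new=20):
    tokens = list(prompt)
    for _ in range(max_new):
        logits = next_token_logits(tokens)
        tok = int(np.argmax(logits))   # greedy: pick the highest-logit token
        tokens.append(tok)             # extend the context with the new token
        if tok == eos:                 # stop on end-of-sequence
            break
    return tokens

print(greedy_decode([3]))  # -> [3, 4, 5, 6, 7, 8, 9]
```

Swapping the `argmax` for sampling from `softmax(logits / temperature)`, or keeping several candidate sequences, gives temperature sampling and beam search respectively.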
WHY IT MATTERS:
Autoregressive generation is how all modern chat models, code completers, and writing assistants work. Understanding it is essential for controlling output quality, latency, and cost. Decoding strategy choices dramatically affect results: greedy decoding is fast but can lead to repetitive or dull text; sampling with temperature creates diversity but risks incoherence; beam search improves quality for tasks like translation but is slower. The KV cache is critical for acceptable latency - without it, generating 1,000 tokens would require 1,000 full forward passes over the entire sequence, each O(n²) in the current length, which is prohibitively slow. With the KV cache, each new token costs O(n) attention (only the new token's query against the cached keys), making generation practical. The sequential nature creates inherent latency-quality tradeoffs that system designers must navigate.
EXAMPLE:
Generating a response to "What is the capital of France?" Step 1: Model processes prompt, computes distribution: P(Paris)=0.7, P(Lyon)=0.2, P(Marseille)=0.1. Greedy picks "Paris". KV cache stores all K,V from this forward pass. Step 2: Context is now "What is the capital of France? Paris". Model computes only new token's attention using cached K,V, distribution: P(is)=0.9, P(was)=0.05, P(,)=0.03. Greedy picks "is". Continue until model generates "." or max length. Total time: 1 full forward pass for prompt + N incremental passes for N generated tokens, enabled by KV cache.
QUESTION 17
What are the main bottlenecks in scaling transformer models?
DEFINITION:
Scaling transformer models faces multiple interconnected bottlenecks across compute, memory, communication, and data that limit how large models can grow and how efficiently they can be trained. These constraints arise from fundamental hardware limitations and algorithmic scaling properties.
HOW IT WORKS:
The primary bottlenecks include: 1) Attention's O(n²) complexity: as sequence length increases, attention computation and memory grow quadratically, eventually dominating all other costs. 2) Memory for parameters: model weights themselves require significant memory - a 175B-parameter model in FP16 requires 350GB just for weights, plus optimizer states (another 700GB for Adam) and gradients (350GB), totaling ~1.4TB per replica. 3) Activation memory: intermediate activations from the forward pass must be stored for the backward pass, scaling with batch size × sequence length × layers. 4) Communication bandwidth: in distributed training, gradients must be synchronized across GPUs via all-reduce operations, and bandwidth becomes the bottleneck as model size increases. 5) Data pipeline: training data must be processed and fed to GPUs faster than they can consume it, requiring efficient data loading and preprocessing. 6) I/O and checkpointing: saving and loading large model checkpoints can take hours.
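The parameter-memory arithmetic above can be written as a small calculator. It assumes FP16 weights and gradients plus two FP16 Adam states per parameter (4 bytes); real training setups vary, for example by keeping FP32 master weights:

```python
# Back-of-envelope training memory for a dense model trained with Adam.
def training_memory_gb(params_billion, bytes_weights=2,
                       bytes_grads=2, bytes_optim=4):
    # bytes_optim = 4: two Adam states (momentum, variance) at 2 bytes each
    p = params_billion * 1e9
    return p * (bytes_weights + bytes_grads + bytes_optim) / 1e9

print(f"{training_memory_gb(175):,.0f} GB")   # 175B params -> 1,400 GB
```

This is why a 175B model cannot be trained with plain data parallelism on any single accelerator, and why sharding schemes like ZeRO exist.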
WHY IT MATTERS:
Understanding these bottlenecks guides architecture decisions, hardware selection, and optimization strategies. Different bottlenecks dominate at different scales: at small scales, compute may be limiting; at medium scales, memory becomes critical; at large scales, communication and data pipeline dominate. This understanding has driven innovations like FlashAttention (addressing attention memory), ZeRO/FSDP (addressing parameter memory), tensor/pipeline parallelism (addressing compute distribution), and gradient checkpointing (addressing activation memory). Scaling laws research shows optimal model size depends on compute budget, but practical deployment is constrained by these bottlenecks regardless of theoretical optimality. For practitioners, these bottlenecks determine what's possible with available hardware and guide optimization priorities.
EXAMPLE:
Training GPT-3 (175B parameters): weights are 350GB in FP16. With Adam (two optimizer states per parameter), optimizer state adds ~700GB and gradients another 350GB, for ~1.4TB per replica - impossible on any single GPU with plain data parallelism. The solution is model parallelism (sharding parameters across GPUs) combined with ZeRO-3 (sharding optimizer states and gradients). With 1,000 A100 GPUs (80GB each), parameters shard down to ~350MB per GPU, making training possible. Communication then becomes the bottleneck: synchronizing 175B gradients across 1,000 GPUs moves ~350GB of FP16 gradient data per iteration, limiting training speed. This is why infrastructure engineering is as important as model architecture for large-scale training.
QUESTION 18
Explain the difference between pre-norm and post-norm transformer variants.
DEFINITION:
Pre-norm and post-norm refer to the placement of layer normalization relative to the sublayers (attention and FFN) in transformer blocks. Post-norm, used in the original Transformer, applies normalization after the residual addition. Pre-norm, common in modern large models, applies normalization before the sublayer while keeping the residual connection clean.
HOW IT WORKS:
Post-norm architecture: x = LayerNorm(x + Sublayer(x)). Each sublayer output is added to its input, then normalized. Pre-norm architecture: x = x + Sublayer(LayerNorm(x)). The input is normalized, then passed through the sublayer, and the result is added to the original unnormalized input. In pre-norm, the residual path remains untouched by normalization, maintaining a clean identity mapping throughout the network. Both architectures typically use multiple blocks stacked sequentially, with final layer normalization after all blocks in pre-norm variants. The choice affects gradient flow and training dynamics significantly.
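The two orderings differ by a single line. With `sublayer` standing in for attention or the FFN and `norm` for LayerNorm, the block structures are:

```python
# The two transformer block orderings, side by side.

def post_norm_block(x, sublayer, norm):
    # Original Transformer: normalization sits ON the residual path.
    return norm(x + sublayer(x))

def pre_norm_block(x, sublayer, norm):
    # Modern variant: residual path is a clean identity map; only the
    # sublayer's input is normalized.
    return x + sublayer(norm(x))
```

In `pre_norm_block`, a gradient flowing backward always has the untouched `+ x` identity path available, which is exactly the stability property discussed below.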
WHY IT MATTERS:
This seemingly minor architectural difference has major implications for training stability and model depth. Post-norm can be unstable for very deep transformers because gradients must flow through layer normalization on the residual path, which can cause gradient vanishing or explosion. Pre-norm stabilizes training by keeping the residual path clean, allowing gradients to flow directly through the network unchanged. This enables training of much deeper models (hundreds of layers) without special initialization or careful tuning. Pre-norm is now standard in large models like GPT-3, LLaMA, and most modern architectures. However, post-norm can sometimes achieve slightly better performance at moderate depths, and some research suggests post-norm with careful initialization can work well. The difference highlights how architecture details matter for scalability.
EXAMPLE:
Training a 100-layer transformer with post-norm: Gradients at layer 1 are the product of 100 gradients through normalization layers. Each normalization introduces scaling that can shrink gradients if mean activations are large. Without careful initialization and tuning, gradients vanish, and early layers don't learn. With pre-norm: Residual paths provide direct gradient highways from output to input. Gradients at layer 1 include a term from the identity path that's unaffected by depth, ensuring early layers receive updates even in 100-layer networks. This is why GPT-3 (96 layers) and LLaMA (up to 80 layers) use pre-norm - it simply works at extreme depths where post-norm would fail.
QUESTION 19
What is sparse attention and how does it reduce the quadratic complexity of standard attention?
DEFINITION:
Sparse attention restricts each token to attending to only a subset of tokens using predefined patterns, reducing computational and memory complexity from O(n²) to O(n·k), where k is the fixed number of tokens each position attends to. Various patterns balance coverage and efficiency for different tasks.
HOW IT WORKS:
Instead of computing the full n×n attention matrix, sparse attention defines for each position a fixed set of positions it can attend to. Common patterns include: 1) Local window attention: each token attends to w tokens on each side (2w+1 total). 2) Strided patterns: attend to every s-th token for long-range coverage. 3) Global tokens: a few tokens (like [CLS]) attend to all positions and are attended to by all. 4) Block-sparse attention: divide the sequence into blocks and attend within and between blocks in fixed patterns. 5) Dilated attention: attend to tokens at increasing intervals. These patterns can be combined (e.g., local + global) for better coverage. In practice this is implemented by masking the attention matrix to zero out disallowed connections and using block-sparse matrix multiplication kernels for efficiency.
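A simplified, Longformer-style mask combining a local band with a few global positions can be built like this (boolean mask, where True marks an allowed query-key pair; the window size and global positions are illustrative):

```python
# Boolean sparse-attention mask: local window plus global tokens.
import numpy as np

def sparse_mask(n, window=2, global_positions=()):
    i = np.arange(n)
    # Local band: |i - j| <= window.
    mask = np.abs(i[:, None] - i[None, :]) <= window
    for g in global_positions:
        mask[g, :] = True    # global token attends to every position
        mask[:, g] = True    # every position attends to the global token
    return mask

m = sparse_mask(8, window=1, global_positions=(0,))
# Allowed pairs grow as O(n * window) instead of O(n^2).
```

Converting `False` entries to a large negative bias before the softmax (as with padding masks) zeroes out the disallowed connections.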
WHY IT MATTERS:
Sparse attention enables processing sequences much longer than full attention would allow. At a 64k sequence length, full attention requires about 4B score computations per head, while a local window of 256 requires only about 16M - a ~250× reduction. This makes tasks like book processing, long-video understanding, and genomic sequence analysis feasible. Different patterns suit different data types: local windows work well for text where nearby tokens matter most, strided patterns capture long-range dependencies, and global tokens provide an overview. The trade-off is potentially missing important long-range dependencies if the patterns aren't designed appropriately. Models like Longformer, BigBird, and the Sparse Transformer use these techniques to handle sequences of 100k+ tokens, and research continues on learnable sparse patterns that adapt to the data.
EXAMPLE:
Processing a 100,000-token book with local window attention (window = 256): full attention needs 10B score computations, while local attention, with each token attending to 256 neighbors (128 on each side), needs only 25.6M - a ~390× reduction. Memory drops similarly, from 40GB to about 100MB per head. To capture long-range dependencies across chapters, add global tokens at chapter boundaries that attend to all tokens and are attended to by all, ensuring information can flow across the entire book. This hybrid approach makes book-length context practical while keeping compute reasonable.
QUESTION 20
How would you explain the transformer architecture to a non-technical stakeholder?
DEFINITION:
A transformer is like a brilliant reader that can look at all words in a text simultaneously and instantly understand how they relate, rather than reading word-by-word like older AI models. It's the technology behind modern AI assistants like ChatGPT, Claude, and Gemini.
HOW IT WORKS:
Imagine you're reading a complex sentence: 'The cat that chased the mouse that lived in the house was tired.' A transformer processes this by drawing mental connections between words - it connects 'cat' to 'was tired' even though they're far apart, links 'chased' to 'mouse' to understand the action, and keeps track of all these relationships simultaneously. It does this using something called 'attention' - basically deciding which words are most important to each other. The transformer learns these connection patterns by reading massive amounts of text (billions of pages) and adjusting its internal understanding until it can predict words accurately.
WHY IT MATTERS:
Before transformers, AI read text sequentially like humans, which was slow and limited - they'd often forget the beginning of a paragraph by the time they reached the end. Transformers revolutionized AI because they can handle much longer contexts (like entire book chapters), understand complex relationships, and learn much faster by processing all words at once. This is why today's AI can write essays, answer follow-up questions remembering what you said earlier, translate languages accurately, and even write computer code. They're the reason AI suddenly became much more capable around 2018-2020.
EXAMPLE:
When you ask ChatGPT a question like 'What were the main causes of World War I and how did they compare to World War II?', the transformer reads your entire question at once, identifies key concepts ('World War I', 'causes', 'World War II', 'compare'), draws connections between them, and constructs an answer word-by-word while remembering everything it just said. It's like having an assistant who reads your entire email before responding, remembers every detail of the conversation, and can reference things you mentioned hours ago. This parallel understanding and generation capability is what makes modern AI feel so natural and intelligent.