
LLM & AI Interview Questions

Two free questions to get you started. Unlock all question types (Normal, Code & Logic) with Pro.

Free sample
1

What is the Transformer architecture and why did it replace RNNs for NLP? Normal

The Transformer (Vaswani et al., 2017) replaces recurrence with self-attention. Each token can attend to every other token, and all positions are processed in parallel, so training is much faster and long-range dependencies are easier to learn. RNNs process tokens sequentially and suffer from vanishing gradients and slow training. Transformers are the backbone of GPT, BERT, and modern LLMs. Key components: multi-head attention, feed-forward layers, layer normalization, residual connections, and positional encodings.
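The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production layer: the learned Q/K/V projection matrices are omitted (the input is used directly), and there is no masking or multi-head split.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n): every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
# In a real layer, Q, K, V come from learned linear projections of X.
out, w = scaled_dot_product_attention(X, X, X)
```

Note that the whole (n, n) score matrix is computed in one matrix multiply, which is exactly why attention parallelizes across positions where an RNN cannot.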

2

Fine-tuning vs RAG — when would you use each? Normal

Fine-tuning updates the model’s weights on your data (e.g. domain-specific text, task examples). Good when you need the model to internalize new knowledge or a specific style; requires training infrastructure and data. RAG (Retrieval-Augmented Generation) keeps the model fixed and retrieves relevant documents at query time, then injects them into the prompt. Good when knowledge changes often or you don’t want to retrain; faster to implement and easier to update. Use RAG for dynamic/knowledge-heavy Q&A; use fine-tuning when you need consistent behavior or domain language.
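The RAG side of this trade-off can be sketched with a toy retriever. This is illustrative only: the hashed bag-of-words embedding is a stand-in for a real embedding model, the three documents are made up, and in a real pipeline the final prompt would be sent to a fixed LLM.

```python
import re
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding (stand-in for a real embedding model)."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        v[zlib.crc32(word.encode()) % dim] += 1.0  # deterministic hash bucket
    n = np.linalg.norm(v)
    return v / n if n else v

# Hypothetical knowledge base; in practice these come from your document store.
docs = [
    "Refunds are processed within 5 business days.",
    "Pro accounts unlock all interview questions.",
    "The Transformer uses self-attention instead of recurrence.",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    sims = doc_vecs @ embed(query)       # vectors are unit-norm, so dot = cosine
    top = np.argsort(sims)[::-1][:k]
    return [docs[i] for i in top]

query = "When are refunds processed?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The key property: updating knowledge means re-embedding documents, not retraining the model, which is why RAG wins when facts change often.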

Unlock all LLM & AI interview questions

30+ questions across attention, prompting, RAG, evaluation, and production — with full answers.

Upgrade to Pro
Attention & architecture
3

Explain the attention mechanism (Query, Key, Value). Normal

Unlock with Pro for the answer.

4

Write pseudo-code or formula for scaled dot-product attention. Code

Unlock with Pro for the answer.

5

Why do we use multi-head attention instead of single-head? Logic

Unlock with Pro for the answer.

Prompting & control
6

What is prompt engineering? Name 3 techniques. Normal

Unlock with Pro for the answer.

7

Write a system prompt and user prompt for a “support chatbot” with constraints. Code

Unlock with Pro for the answer.

8

How would you reduce hallucination in a production LLM app? Logic

Unlock with Pro for the answer.

RAG & retrieval
9

What are the main components of a RAG pipeline? Normal

Unlock with Pro for the answer.

10

How do you chunk documents for RAG (strategy and code)? Code

Unlock with Pro for the answer.

11

Design a RAG system for internal docs: embedding model, retriever, and LLM choice. Logic

Unlock with Pro for the answer.

Evaluation & production
12

How do you evaluate LLM outputs (BLEU, ROUGE, human eval, LLM-as-judge)? Normal

Unlock with Pro for the answer.

13

What is tokenization and why does it matter for context length? Code

Unlock with Pro for the answer.

14

How would you deploy an LLM API with rate limiting and cost control? Logic

Unlock with Pro for the answer.