
LLM & AI Interview Questions

Two free questions to get you started. Unlock all question types (Normal, Code & Logic) with Pro.

Free sample
1

What is the Transformer architecture and why did it replace RNNs for NLP? Normal

The Transformer (Vaswani et al., 2017) replaces recurrence with self-attention. Each token can attend to every other token, and all positions are processed in parallel, so training is much faster and long-range dependencies are easier to learn. RNNs process tokens sequentially and suffer from vanishing gradients and slow training. Transformers are the backbone of GPT, BERT, and modern LLMs. Key components: multi-head attention, feed-forward layers, layer normalization, residual connections, and positional encodings.
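The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production layer: the learned Q/K/V projection matrices are omitted (the input is used directly), and there is no masking or multi-head split.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n): every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
# In a real layer, Q, K, V come from learned linear projections of X.
out, w = scaled_dot_product_attention(X, X, X)
```

Note that the whole (n, n) score matrix is computed in one matrix multiply, which is exactly why attention parallelizes across positions where an RNN cannot.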

2

Fine-tuning vs RAG — when would you use each? Normal

Fine-tuning updates the model’s weights on your data (e.g. domain-specific text, task examples). Good when you need the model to internalize new knowledge or a specific style; requires training infrastructure and data. RAG (Retrieval-Augmented Generation) keeps the model fixed and retrieves relevant documents at query time, then injects them into the prompt. Good when knowledge changes often or you don’t want to retrain; faster to implement and easier to update. Use RAG for dynamic/knowledge-heavy Q&A; use fine-tuning when you need consistent behavior or domain language.
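The RAG side of this trade-off can be sketched with a toy retriever. This is illustrative only: the hashed bag-of-words embedding is a stand-in for a real embedding model, the three documents are made up, and in a real pipeline the final prompt would be sent to a fixed LLM.

```python
import re
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding (stand-in for a real embedding model)."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        v[zlib.crc32(word.encode()) % dim] += 1.0  # deterministic hash bucket
    n = np.linalg.norm(v)
    return v / n if n else v

# Hypothetical knowledge base; in practice these come from your document store.
docs = [
    "Refunds are processed within 5 business days.",
    "Pro accounts unlock all interview questions.",
    "The Transformer uses self-attention instead of recurrence.",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    sims = doc_vecs @ embed(query)       # vectors are unit-norm, so dot = cosine
    top = np.argsort(sims)[::-1][:k]
    return [docs[i] for i in top]

query = "When are refunds processed?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The key property: updating knowledge means re-embedding documents, not retraining the model, which is why RAG wins when facts change often.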

Unlock all LLM & AI interview questions

30+ questions across attention, prompting, RAG, evaluation, and production — with full answers.

Upgrade to Pro
Attention & architecture
3

Explain the attention mechanism (Query, Key, Value). Normal

Unlock with Pro for the answer.

4

Write pseudo-code or formula for scaled dot-product attention. Code

Unlock with Pro for the answer.

5

Why do we use multi-head attention instead of single-head? Logic

Unlock with Pro for the answer.

Prompting & control
6

What is prompt engineering? Name 3 techniques. Normal

Unlock with Pro for the answer.

7

Write a system prompt and user prompt for a “support chatbot” with constraints. Code

Unlock with Pro for the answer.

8

How would you reduce hallucination in a production LLM app? Logic

Unlock with Pro for the answer.

RAG & retrieval
9

What are the main components of a RAG pipeline? Normal

Unlock with Pro for the answer.

10

How do you chunk documents for RAG (strategy and code)? Code

Unlock with Pro for the answer.

11

Design a RAG system for internal docs: embedding model, retriever, and LLM choice. Logic

Unlock with Pro for the answer.

Evaluation & production
12

How do you evaluate LLM outputs (BLEU, ROUGE, human eval, LLM-as-judge)? Normal

Unlock with Pro for the answer.

13

What is tokenization and why does it matter for context length? Code

Unlock with Pro for the answer.

14

How would you deploy an LLM API with rate limiting and cost control? Logic

Unlock with Pro for the answer.