The Transformer (Vaswani et al., 2017) uses self-attention instead of recurrence. Each token can attend to every other token in parallel, so training is much faster and long-range dependencies are easier to learn. RNNs, by contrast, process tokens sequentially and suffer from vanishing gradients and slow training. Transformers are the backbone of GPT, BERT, and modern LLMs. Key components: multi-head attention, position-wise feed-forward layers, residual connections with layer normalization, and positional encodings.
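The core of self-attention is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, where every token scores its similarity to every other token in one matrix multiply. A minimal NumPy sketch (the function name and the toy inputs are illustrative; a real model would project x into separate Q, K, V matrices per head):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights

# Toy example: 3 tokens with d_k = 4. Here Q, K, V all reuse x for brevity;
# in a Transformer each comes from a learned linear projection of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)
# out has shape (3, 4); each row of w sums to 1 (a distribution over tokens)
```

Because `scores` is computed for all token pairs at once, nothing is sequential, which is exactly the parallelism advantage over RNNs described above.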
Fine-tuning updates the model’s weights on your data (e.g. domain-specific text, task examples). Good when you need the model to internalize new knowledge or a specific style; requires training infrastructure and data. RAG (Retrieval-Augmented Generation) keeps the model fixed and retrieves relevant documents at query time, then injects them into the prompt. Good when knowledge changes often or you don’t want to retrain; faster to implement and easier to update. Use RAG for dynamic/knowledge-heavy Q&A; use fine-tuning when you need consistent behavior or domain language.
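The RAG flow above (retrieve relevant documents, then inject them into the prompt) can be sketched with a toy bag-of-words retriever; the corpus, function names, and prompt template here are all illustrative stand-ins for a real vector store and embedding model:

```python
from collections import Counter
import math

# Toy corpus; a production system would use an embedding index instead.
DOCS = [
    "The Transformer uses self-attention instead of recurrence.",
    "RAG retrieves documents at query time and injects them into the prompt.",
    "Fine-tuning updates model weights on domain-specific data.",
]

def bow(text):
    """Bag-of-words term counts (crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Inject retrieved context into the prompt; the model stays frozen."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("How does RAG inject documents into the prompt?", DOCS)
```

Updating knowledge here means editing `DOCS`, not retraining, which is why RAG suits fast-changing, knowledge-heavy Q&A while fine-tuning suits stable behavior and domain language.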