From zero understanding to "oh, THAT'S how ChatGPT works!" in one chapter. No math yet, just pure intuition.
A language model is a computer program that has read so much text that it can predict what word comes next in a sentence. That's it. That's the whole idea.
Your phone keyboard does this! When you type "I am going to the", it suggests "store" or "park" or "gym." Your phone is a tiny language model. It learned from billions of text messages what words usually follow other words.
Now imagine that keyboard on steroids: trained on the entire internet, every book ever written, all of Wikipedia. That's an LLM. It got SO good at predicting the next word that it can write essays, answer questions, code, and have conversations!
Think of an LLM as a slot machine. You pull the lever (give it a prompt: "The capital of France is"), and the wheels spin through every possible next word: "Paris" (99% chance), "London" (0.01%), "pizza" (0.001%). It picks the most likely one. Then uses "The capital of France is Paris" as input and predicts the NEXT word. And the next. And the next. That's autoregressive generation: one word at a time, each based on everything before it.
Complete the sentence yourself: "The cat sat on the ___". Whatever word you picked ("mat"? "floor"?), you just did exactly what an LLM does, except it weighs thousands of options at once!
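To make the slot machine concrete, here's the whole loop as a toy Python program. The probability table is invented for illustration; a real LLM computes these probabilities with billions of parameters, but the loop around it looks just like this:

```python
# A toy autoregressive loop. `next_word_probs` is a hand-written lookup
# table standing in for a real model; only the loop structure is real.
def next_word_probs(context):
    table = {
        "The capital of France is": {"Paris": 0.99, "London": 0.0001, "pizza": 0.00001},
        "The capital of France is Paris": {"<end>": 1.0},
    }
    return table.get(context, {"<end>": 1.0})

def generate(prompt, max_words=10):
    text = prompt
    for _ in range(max_words):
        probs = next_word_probs(text)
        best = max(probs, key=probs.get)  # greedy: take the most likely word
        if best == "<end>":
            break
        text = text + " " + best
    return text

print(generate("The capital of France is"))  # The capital of France is Paris
```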
LLMs didn't appear overnight. Here's the journey from counting word pairs to ChatGPT:
"After seeing 'ice' followed by 'cream' 10,000 times, the next word after 'ice' is probably 'cream'." Simple counting. Works for 2-3 words but fails for long sentences.
Word embeddings (Word2Vec, 2013). Breakthrough: represent words as vectors (lists of numbers) where similar words are close together. "King" - "Man" + "Woman" ≈ "Queen". Words finally have mathematical meaning!
RNNs and LSTMs: neural networks that process text sequentially (word by word). Good for short text, but they forget the beginning by the time they reach the end of a long document. Like reading a book one word at a time through a keyhole.
The Transformer (2017): Google researchers published "Attention Is All You Need", the paper that changed everything. The Transformer architecture can look at ALL words simultaneously (not one by one). Massively parallelizable. This is the foundation of every modern LLM.
Pre-training era (2018): Google's BERT and OpenAI's GPT-1 showed that pre-training on massive text, then fine-tuning for specific tasks, works incredibly well. GPT-1 had 117 million parameters.
GPT-3 (2020): OpenAI scaled up to 175B parameters. GPT-3 could write essays, code, translate, and answer questions without any task-specific training. The era of "few-shot learning" began.
ChatGPT and beyond (2022 onward): RLHF (reinforcement learning from human feedback) made models helpful and conversational. Open-source models (LLaMA, Mistral) democratized access. Now anyone can fine-tune an LLM on their laptop!
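That first counting idea really does fit in a few lines of Python. A minimal sketch of a bigram model (word pairs), using a tiny made-up corpus:

```python
from collections import Counter, defaultdict

corpus = "ice cream is nice and ice cream is cold".split()

# Count how often each word follows each other word (bigram counts).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

# The most frequent follower of "ice" becomes the prediction.
print(counts["ice"].most_common(1))  # [('cream', 2)]
```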
Here are the major players. Think of it like the smartphone market: a few big companies, each with their own approach:
OpenAI (GPT series). The one that started the revolution. Closed-source. Powers ChatGPT.
Anthropic (Claude). Known for being helpful and safe. Long context windows.
Google DeepMind (Gemini). Multimodal (text + images + video). Powers Google products.
Meta (LLaMA). Open-source! Anyone can download and use it. Community favorite.
Mistral AI (France). Open-source. Amazingly efficient for its size.
DeepSeek (China). Open-source. Great for code and reasoning. Very efficient.
Let's walk through what happens from the moment you type a question to the moment you see an answer. No jargon, just plain English (with a short code sketch after the list):
1. You type a question ("What is DNA?")
2. Tokenize: your text is split into tokens (small pieces) and converted to numbers
3. Embed: each number becomes a rich vector (a list of 4,096+ numbers capturing meaning)
4. Transform: the vectors pass through dozens of Transformer layers. Each layer uses Attention to figure out how every word relates to every other word
5. Predict: the model outputs a probability for every possible next word and picks the best one
6. Repeat: the predicted word is added to the input, and steps 2-5 repeat until the answer is complete
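Here is a minimal sketch of those six steps, using GPT-2 via the HuggingFace libraries introduced below. GPT-2 is our choice for illustration; any causal language model plugs into the same loop:

```python
# Steps 1-6 with a real (small) model: tokenize, run the Transformer,
# pick the most likely next token, append, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("What is DNA?", return_tensors="pt").input_ids  # steps 1-2
with torch.no_grad():
    for _ in range(20):                                  # step 6: repeat
        logits = model(ids).logits         # steps 3-4: embed + transform
        next_id = logits[0, -1].argmax()   # step 5: most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```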
Before we dive into building, here's every tool in your toolkit. Think of this as the "ingredients list" before cooking a recipe. For each one, we explain what it is, why LLMs need it, and exactly where you'll use it in this course.
Building an LLM is like cooking a complex dish. You need a stove (PyTorch), ingredients (Datasets), a recipe book (HuggingFace), a food processor (Tokenizers), measuring cups (NumPy), a serving plate (FastAPI), and spice adjustments (PEFT/LoRA). Each tool has a specific job. Here they all are:
PyTorch
What: An open-source deep learning framework created by Meta (Facebook). It's the language you use to define, train, and run neural networks.
Why LLMs need it: Nearly every open LLM you can download (GPT-2, LLaMA, Mistral) is built on PyTorch. It provides tensors (multi-dimensional arrays that run on GPUs), autograd (automatic calculation of gradients for backpropagation), and nn.Module (building blocks for layers like Attention, LayerNorm, Linear).
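A minimal sketch of what that looks like: a one-layer model wrapped in an nn.Module, with autograd computing gradients for free (the layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# The smallest possible "model": one linear layer inside an nn.Module.
class TinyModel(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel()
out = model(torch.randn(2, 8))  # a batch of 2 input vectors
out.sum().backward()            # autograd fills in gradients automatically
```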
How you'll use it:
Defining every model you build as an nn.Module class

HuggingFace Transformers
What: A Python library and platform with 200,000+ pre-trained models. Download GPT-2, LLaMA, or Mistral in 3 lines of code.
Why LLMs need it: Training an LLM from absolute scratch takes millions of dollars. HuggingFace gives you pre-trained models that already know language. You just fine-tune them for your specific task (customer support, code generation, medical Q&A).
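The "3 lines of code" claim, sketched with GPT-2 (our example; swap in any model name from the Hub):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("The capital of France is", max_new_tokens=5)[0]["generated_text"])
```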
How you'll use it:
Fine-tuning pre-trained models with the Trainer API

Tokenizers
What: Algorithms that convert raw text into numbers (tokens) that neural networks can process. "Hello world" → [15496, 995].
Why LLMs need it: Neural networks can't read English; they only understand numbers. The tokenizer is the first and last step of every LLM interaction: text → tokens → model → tokens → text. GPT-4's tokenizer uses ~100,000 unique tokens.
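You can verify the "Hello world" example yourself with the GPT-2 tokenizer:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok.encode("Hello world")
print(ids)              # [15496, 995]
print(tok.decode(ids))  # Hello world
```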
How you'll use it:
Tokenizing text before training and decoding the model's output tokens back into text
NumPy
What: A Python library for fast numerical operations: arrays, matrices, linear algebra, random numbers.
Why LLMs need it: LLMs are fundamentally giant matrix multiplication machines. NumPy is the foundation that PyTorch is built on. You use it for data preparation, computing metrics, and understanding the math behind attention and embeddings.
How you'll use it: Data loading, computing softmax manually, generating positional encodings, evaluation metrics, and quick prototyping before moving to PyTorch.
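For example, here is the softmax you'll compute manually: it turns a model's raw scores into the next-word probabilities described earlier.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps np.exp from overflowing on large scores.
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

print(softmax(np.array([5.0, 1.0, 0.1])))  # three probabilities summing to 1
```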
PEFT / LoRA
What: Parameter-Efficient Fine-Tuning. Instead of updating all 7 billion weights in a model, LoRA adds tiny "adapter" matrices (0.1% of parameters) and trains only those.
Why LLMs need it: Fine-tuning a 7B model normally requires 4× A100 GPUs ($50K+). With QLoRA (4-bit quantization + LoRA), you can fine-tune on a single consumer GPU with 16GB VRAM. This is how individuals and startups customize LLMs.
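A minimal sketch with the PEFT library, assuming GPT-2 as the base model (the rank and target module here are common illustrative choices, not requirements):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
model = get_peft_model(model, config)  # freeze base weights, add adapters
model.print_trainable_parameters()     # only a fraction of 1% is trainable
```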
How you'll use it:
Fine-tuning a 7B model with LoRA adapters on a single consumer GPU
bitsandbytes
What: A library that compresses model weights from 32-bit floats to 8-bit or 4-bit integers. A 7B model goes from 28GB → 4GB.
Why LLMs need it: LLMs are huge. LLaMA-7B is 28GB at full precision, which won't fit on most GPUs. Quantization shrinks it to 4GB with minimal quality loss. This is how you run models on a laptop or a $10/month cloud GPU.
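A minimal 4-bit loading sketch (requires a CUDA GPU; the model name is an illustrative choice):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights as 4-bit integers
    bnb_4bit_compute_dtype=torch.float16,  # but compute in half precision
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb, device_map="auto"
)
```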
How you'll use it:
Loading models in 4-bit with BitsAndBytesConfig

Datasets
What: A library with 100,000+ datasets. Load any dataset in one line: load_dataset("tiny_shakespeare").
Why LLMs need it: LLMs are trained on massive text โ books, Wikipedia, code, conversations. The Datasets library handles streaming (load data that's too big for RAM), tokenization mapping, and train/test splitting automatically.
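The one-liner from above, plus a first look at the data:

```python
from datasets import load_dataset

# The dataset named above; some datasets versions may additionally
# require trust_remote_code=True for script-based datasets like this one.
ds = load_dataset("tiny_shakespeare")
print(ds["train"][0]["text"][:100])  # first 100 characters of the corpus
```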
How you'll use it:
Loading training text, mapping the tokenizer over it, and splitting train/test sets
FastAPI
What: A modern Python web framework for building APIs. It's async, auto-generates documentation, and is the standard for ML model serving.
Why LLMs need it: A trained model sitting on your laptop is useless to others. FastAPI wraps your model in an HTTP API: send a POST request with a prompt, get back generated text. This is how ChatGPT works behind the scenes: it's an API.
How you'll use it:
Building a /generate endpoint that accepts prompts and returns completions
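A minimal sketch of that endpoint, with GPT-2 standing in for your fine-tuned model (save as main.py and run with uvicorn main:app):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    # POST a JSON body like {"text": "Once upon a time"} to /generate.
    out = generator(prompt.text, max_new_tokens=50)
    return {"completion": out[0]["generated_text"]}
```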
pip install torch numpy transformers datasets tokenizers
pip install peft bitsandbytes accelerate trl
pip install fastapi uvicorn
Requires Python 3.9+. GPU recommended from Module 6 onward (CPU works too, just slower). We'll remind you to install each tool when the time comes in each module!