Build a Working LLM From Scratch

Large Language Models

From "What is a language model?" all the way to building a working GPT-like text generator. Every concept explained like you're 5, then we code it for real.

Curated by Fakhruddin Khambaty

Your LLM Journey: From Zero to a Working Model

📖 What is an LLM? → 🔤 Text to Numbers → 🧠 Neural Net Basics → 👁️ Attention Mechanism → 🏗️ Transformer Architecture → 🔥 Train Your LLM → 🎯 Fine-Tune & RLHF → 🚀 Deploy & Use

Course Modules

📖 Module 1: What Are LLMs? (✅ READY)

The big picture. What is a language model, why is everyone talking about it, and how does ChatGPT actually work at a very high level?

🔤 Module 2: Text to Numbers (✅ READY)

Computers only understand numbers. How do we convert words into math that a neural network can process? Tokenization, embeddings, and positional encoding.

🧠 Module 3: Neural Network Refresher (✅ READY)

A fast-track refresher on the building blocks every LLM uses: neurons, layers, activation functions, backpropagation, and PyTorch basics.

👁️ Module 4: The Attention Mechanism (✅ READY)

The breakthrough that made LLMs possible. We break down "Attention Is All You Need" piece by piece, with visual examples and real math.

🏗️ Module 5: The Transformer Architecture (✅ READY)

The full picture: encoder, decoder, layer normalization, feed-forward networks, residual connections. We build the entire Transformer block by block.

🔥 Module 6: Training Your Own LLM (✅ READY)

The main event! We train a GPT-style language model from scratch on real text data. Data preparation, training loop, loss curves, and text generation.

🎯 Module 7: Fine-Tuning & RLHF (✅ READY)

How ChatGPT went from "raw text predictor" to "helpful assistant." Supervised fine-tuning, LoRA, and Reinforcement Learning from Human Feedback.

🚀 Module 8: Deployment & Real-World Use (✅ READY)

Take your model from a notebook to production. Inference optimization, quantization, APIs, RAG, prompt engineering, and safety.

🏆 Module 9: Capstone Project (CAPSTONE)

Put it all together! Build a complete, working text-generating LLM from scratch: tokenizer, transformer, training, fine-tuning, and a simple web interface.

🛠️ Tech Stack & Skillset Roadmap

Every tool and library you'll learn in this course, what it does in plain English, where you'll use it, and how they all connect.

📋 Prerequisites (Before You Start)

These skills are assumed. If you're shaky on any, brush up first — the links go to our courses!

🐍 Python (Intermediate)

Functions, classes, list comprehensions, file I/O. You should be comfortable writing 50+ line scripts.

→ Our Python course
📐 Basic Linear Algebra

Vectors, matrices, dot products, matrix multiplication. Don't worry — Module 3 refreshes everything you need.

→ Our Math Foundations

🔧 Core Tools (You'll Master These)

These are the main tools used to build, train, and run LLMs. We teach every one from scratch in this course.

🔥 PyTorch (Modules 3-6): THE deep learning framework for LLMs

Tensors (like NumPy arrays, but on GPU), automatic differentiation (autograd), neural network layers (nn.Module), optimizers (Adam), and the full training loop. Most major LLMs (GPT, LLaMA, Mistral) are built and trained with PyTorch.
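
For a feel of how those pieces fit together, here is a minimal sketch of one training step on toy data (the layer sizes and random batch are made up purely for illustration):

# A tiny model, an Adam optimizer, and one training step (toy shapes, for illustration only)
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 8)   # a batch of 32 random input vectors
y = torch.randn(32, 1)   # matching random targets

loss = loss_fn(model(x), y)   # forward pass + loss
loss.backward()               # autograd computes gradients
optimizer.step()              # Adam updates the weights
optimizer.zero_grad()         # clear gradients for the next step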

🤗 HuggingFace Transformers (Modules 7-9): the "app store" for pre-trained models

Download pre-trained LLMs (LLaMA, Mistral, GPT-2) in 3 lines of code. Fine-tune them with the Trainer API. The Hub hosts 200,000+ models. You'll use the library to load, fine-tune, and deploy models.
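
Roughly what those "3 lines of code" look like; GPT-2 and the prompt are just example choices:

# Load a pre-trained model from the Hub and generate text (model name and prompt are examples)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))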

📝 Tokenizers (BPE, SentencePiece) (Module 2): convert text → numbers that models understand

Byte-Pair Encoding (BPE) splits words into subwords: "playing" → ["play", "ing"]. We build one from scratch, then use HuggingFace's fast, Rust-based tokenizers library. GPT-4's tokenizer has a vocabulary of roughly 100K tokens.
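
One quick way to see subword splits for yourself, using GPT-2's BPE tokenizer (the word is just an example; exact splits depend on the vocabulary):

# Inspect how a BPE tokenizer splits a word into subword pieces
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("playing"))   # the subword pieces the vocabulary produces
print(tok.encode("playing"))     # the matching token IDs the model actually sees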

📊 NumPy (throughout): the math engine behind everything

Matrix operations, random number generation, array manipulation. PyTorch tensors are NumPy arrays on steroids (with GPU support). You'll use NumPy for data prep, evaluation metrics, and quick prototyping.
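
A few of those operations in one place, plus the NumPy-to-PyTorch hand-off (the shapes are arbitrary toy values):

# Matrix math in NumPy and conversion to a PyTorch tensor (arbitrary toy shapes)
import numpy as np
import torch

a = np.random.randn(4, 8)    # random 4x8 matrix
b = np.random.randn(8, 2)    # random 8x2 matrix
c = a @ b                    # matrix multiplication -> 4x2
t = torch.from_numpy(c)      # NumPy arrays convert directly to tensors
print(c.shape, t.shape)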

🧩 PEFT (LoRA / QLoRA) (Module 7): fine-tune billion-parameter models on a laptop

Parameter-Efficient Fine-Tuning. Instead of retraining all 7 billion weights, LoRA adds tiny "adapter" matrices (~0.1% of params). QLoRA combines this with 4-bit quantization. Fine-tune LLaMA-7B on a single GPU!
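
A sketch of what attaching LoRA adapters looks like with the peft library; GPT-2 stands in for a large model here, and the hyperparameters (r, alpha, dropout) are illustrative rather than a recommended recipe:

# Attach LoRA adapters to a pre-trained model (base model and hyperparameters are illustrative)
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")      # small stand-in for a 7B model
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()   # shows how small the trainable fraction is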

🗜️ bitsandbytes quantization (Module 8): shrink models from 28GB → 4GB

Quantization compresses model weights from 32-bit to 8-bit or 4-bit. A 7B model goes from 28GB to about 4GB and runs on consumer GPUs. We use the bitsandbytes library for 4-bit loading and NF4 quantization.
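
A sketch of loading a model in 4-bit NF4 via transformers + bitsandbytes; it needs a CUDA GPU, and the model name below is only an example:

# Load a 7B model with 4-bit NF4 quantization (requires a CUDA GPU; model name is an example)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)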

🚀 FastAPI (Modules 8-9): serve your model as a REST API

Build a /generate endpoint that accepts a prompt and returns model output. FastAPI is async, auto-generates docs, and is a go-to choice for serving ML models. Your capstone project uses it to create a chat API.
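
A bare-bones version of such an endpoint; the GPT-2 pipeline and the field names are placeholder choices to keep the sketch self-contained:

# Minimal /generate endpoint (model and field names are placeholders)
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")   # swap in your own model

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    result = generator(prompt.text, max_new_tokens=50)
    return {"output": result[0]["generated_text"]}

# Run with: uvicorn main:app --reload   (assuming this file is saved as main.py)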

💾 HuggingFace Datasets (Modules 6-7): load any dataset in one line

Access 100,000+ datasets: text corpora, instruction-following pairs, preference data. Built-in streaming for massive datasets. We use TinyShakespeare for training and Alpaca for fine-tuning.
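
The one-liner in practice; the Alpaca dataset ID below is a common Hub name used here as an example:

# Load an instruction-tuning dataset from the Hub in one line (dataset ID is an example)
from datasets import load_dataset

alpaca = load_dataset("tatsu-lab/alpaca")
print(alpaca["train"][0])   # one instruction/response pair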

🧠 Key Concepts You'll Master

Beyond tools — these are the ideas and architectures you'll understand deeply.

🔤 Tokenization & BPE: text → numbers
📍 Embeddings & Positional Encoding: meaning + order
👁️ Self-Attention & Multi-Head: the Q, K, V magic
🏗️ Transformer Architecture: the full GPT blueprint
🎯 Next-Token Prediction: how LLMs learn
🌡️ Temperature & Sampling: creative vs. safe output
🎓 RLHF & DPO: human preference alignment
🔍 RAG (Retrieval-Augmented Generation): give LLMs real docs

⚡ Quick Setup

Install everything you need for this course with a few pip commands:

# Install all LLM course dependencies
pip install torch numpy transformers datasets tokenizers
pip install peft bitsandbytes accelerate trl
pip install fastapi uvicorn langchain
# Optional: for Jupyter notebooks
pip install jupyter matplotlib

Requires Python 3.9+. GPU recommended for Module 6+ (but not required — CPU works, just slower).
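
If you want to confirm the install worked (and whether a GPU is visible), here is a quick sanity check:

# Quick sanity check after installing
import torch, transformers
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("GPU available:", torch.cuda.is_available())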