Make your queries blazing fast! From full table scans to instant lookups.
Imagine you have a 500-page textbook and you need to find every mention of "photosynthesis." You have two choices: (1) Start on page 1 and read every single page until the end — this is what PostgreSQL does without an index (called a sequential scan). Or (2) flip to the index at the back of the book, look up "photosynthesis," and it tells you: "pages 42, 87, 156." You jump straight there — done in seconds! A database index works exactly the same way. It's a separate data structure that tells PostgreSQL exactly where to find the rows you're looking for, without reading the entire table.
Old-school phone books were organized alphabetically by last name. If you wanted to find "Smith, John," you'd flip to the S section and find it in seconds. But if someone asked: "Find everyone on Maple Street" — you'd have to read the ENTIRE phone book because it's not organized by street. That's why the choice of which columns to index matters so much. You can only look things up fast if there's an index on that particular column.
PostgreSQL automatically creates indexes on PRIMARY KEY and UNIQUE columns. But for other columns you query often, you create indexes yourself:
```sql
-- Basic syntax
CREATE INDEX index_name ON table_name (column_name);

-- Example: index on the email column of a users table
CREATE INDEX idx_users_email ON users (email);

-- Now this query will be FAST:
SELECT * FROM users WHERE email = 'alice@example.com';
```
Use the pattern idx_tablename_columnname for index names. For multi-column indexes: idx_orders_customer_id_date. Clear names make it easy to know what an index is for when you're debugging performance months later.
Small tables don't show performance differences. Let's create a table with 1 million rows to see the dramatic difference indexes make:
```sql
-- Create a big table
CREATE TABLE big_orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    amount DECIMAL(10,2),
    status VARCHAR(20),
    created_at TIMESTAMP
);

-- Insert 1 million rows of random data
INSERT INTO big_orders (customer_id, amount, status, created_at)
SELECT
    (RANDOM() * 10000)::INTEGER,
    (RANDOM() * 1000)::DECIMAL(10,2),
    CASE (RANDOM() * 3)::INTEGER
        WHEN 0 THEN 'pending'
        WHEN 1 THEN 'completed'
        WHEN 2 THEN 'cancelled'
        ELSE 'refunded'
    END,
    NOW() - (RANDOM() * 365)::INTEGER * INTERVAL '1 day'
FROM generate_series(1, 1000000);
```
Remember the "guess a number between 1 and 100" game? You don't start at 1 and count up. You start in the middle: "Is it higher or lower than 50?" Then: "Higher or lower than 75?" Then: "Higher or lower than 62?" In just 7 guesses, you can find any number from 1 to 100. That's called binary search, and it's exactly how a B-tree index works! Instead of checking every row, it narrows down by half at each step. For 1 million rows, that's only about 20 steps instead of 1,000,000.
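You can check that "about 20 steps" figure yourself. The math is simply ceil(log2(n)) comparisons for a binary search over n items, and PostgreSQL can compute it directly:

```sql
-- Number of halving steps needed to narrow n rows down to 1
SELECT CEIL(LOG(2, 1000000)) AS steps_for_1m,   -- 20
       CEIL(LOG(2, 100))     AS steps_for_100;  -- 7
```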
A B-tree (Balanced tree) organizes data like a family tree where every branch is the same height:
```
              [50]                      ← Root node
            /      \
        [25]        [75]                ← Internal nodes
       /    \      /    \
[10,20] [30,40] [60,70] [80,90]         ← Leaf nodes (point to actual rows)
```

Looking for customer_id = 70:

1. Root: 70 > 50 → go right
2. Node: 70 < 75 → go left
3. Leaf: found 70! → points to the actual row on disk

Only 3 steps instead of scanning 1,000,000 rows!
When you write CREATE INDEX without specifying a type, PostgreSQL creates a B-tree index. It's the most versatile and handles these operations efficiently:
| Operation | Example | B-tree Can Help? |
|---|---|---|
| Equality | WHERE email = 'alice@test.com' | ✅ Yes |
| Range (less/greater) | WHERE price > 100 | ✅ Yes |
| BETWEEN | WHERE age BETWEEN 20 AND 30 | ✅ Yes |
| ORDER BY | ORDER BY created_at DESC | ✅ Yes (avoids sorting!) |
| IS NULL | WHERE deleted_at IS NULL | ✅ Yes |
| LIKE prefix | WHERE name LIKE 'Al%' | ✅ Yes |
| LIKE middle/suffix | WHERE name LIKE '%ice' | ❌ No |
A B-tree works like a phone book — organized from left to right. LIKE 'Al%' works because you can flip to the "Al" section. But LIKE '%ice' is like asking "find all names ending in 'ice'" — you'd have to read the entire book. For suffix/contains searches, you need a different index type (GIN with trigrams).
```sql
-- BEFORE creating an index (slow!)
EXPLAIN ANALYZE SELECT * FROM big_orders WHERE customer_id = 42;
-- Seq Scan on big_orders  (cost=0.00..20834.00 rows=100 width=52)
--   Filter: (customer_id = 42)
-- Execution Time: 85.432 ms   ← SLOW!

-- Create the index
CREATE INDEX idx_big_orders_customer_id ON big_orders (customer_id);

-- AFTER creating the index (fast!)
EXPLAIN ANALYZE SELECT * FROM big_orders WHERE customer_id = 42;
-- Index Scan using idx_big_orders_customer_id  (cost=0.42..8.87 rows=100 width=52)
--   Index Cond: (customer_id = 42)
-- Execution Time: 0.215 ms   ← 400x FASTER!
```
B-tree is the Swiss Army knife, but PostgreSQL has specialized index types for specific use cases:
When you check your coat at a restaurant, they give you a ticket number. To get your coat back, you hand in the number and they go straight to that hook. They can't help you find "all coats that are darker than navy blue" — it's exact match only. That's a hash index: incredibly fast for = checks, but useless for ranges, sorting, or anything else.
```sql
CREATE INDEX idx_users_uuid ON users USING HASH (uuid);

-- ✅ Fast:
SELECT * FROM users WHERE uuid = 'abc-123-def';

-- ❌ Cannot use this index:
SELECT * FROM users WHERE uuid > 'abc';
SELECT * FROM users ORDER BY uuid;
```
In a textbook concordance, each keyword has a list of every page where it appears: "photosynthesis: 42, 87, 156." GIN (Generalized Inverted Index) works the same way — for each value or keyword, it stores a list of all rows that contain it. Perfect when a single column contains multiple values (arrays, JSONB, or full-text documents).
```sql
-- Full-text search on articles
CREATE INDEX idx_articles_search ON articles USING GIN (
    to_tsvector('english', title || ' ' || body)
);

-- Search for articles about "database performance"
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('database & performance');

-- JSONB index for flexible schemas
CREATE INDEX idx_events_data ON events USING GIN (metadata jsonb_path_ops);

-- Array containment
CREATE INDEX idx_products_tags ON products USING GIN (tags);
SELECT * FROM products WHERE tags @> ARRAY['organic', 'vegan'];
```
GiST (Generalized Search Tree) is built for data where values can overlap or contain one another, such as geometric shapes and ranges:

```sql
-- Spatial data (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);

-- Range types
CREATE INDEX idx_reservations_period ON reservations USING GIST (date_range);
```
Instead of indexing every single house, BRIN (Block Range INdex) just remembers: "Block 1 has addresses 1–100, Block 2 has 101–200." Super tiny and fast to maintain. Works brilliantly when data is naturally sorted — like a log table where rows are always inserted in order of timestamp.
```sql
-- Perfect for time-series data where rows arrive in order
CREATE INDEX idx_logs_created_at ON server_logs USING BRIN (created_at);

-- Tiny index size: ~50 KB instead of ~200 MB for a B-tree on 100M rows!
```
| Index Type | Best For | Size | Supports |
|---|---|---|---|
| B-tree | General purpose | Medium | =, <, >, BETWEEN, ORDER BY, LIKE 'prefix%' |
| Hash | Exact equality only | Small | = only |
| GIN | Full-text, arrays, JSONB | Large | Contains, search, @>, @@ |
| GiST | Geometry, ranges | Medium | Overlaps, contains, nearest-neighbor |
| BRIN | Huge sorted tables | Tiny | =, <, > on naturally ordered data |
B-tree handles 90% of use cases. Only reach for specialized types when you have a specific need: GIN for full-text/JSONB, GiST for spatial, BRIN for append-only time-series. Don't over-optimize — start with B-tree and switch only when you measure a problem.
When you're sick, the doctor doesn't just guess — they take an X-ray to see what's happening inside your body. EXPLAIN ANALYZE is an X-ray for your SQL queries. It shows you exactly what PostgreSQL does to find your data: which tables it scans, which indexes it uses, how many rows it processes, and how long each step takes. Instead of guessing why a query is slow, you can see the bottleneck.
| Command | What It Does | Runs the Query? |
|---|---|---|
| EXPLAIN | Shows the plan (estimated costs and rows) | ❌ No — just estimates |
| EXPLAIN ANALYZE | Shows the plan AND actual timing/row counts | ✅ Yes — actually runs it |
Be careful with EXPLAIN ANALYZE on DELETE or UPDATE statements — they will actually modify your data! Wrap them in a transaction and rollback: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;
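Spelled out, the safe pattern looks like this (a sketch; the DELETE is a stand-in for whatever statement you're profiling):

```sql
BEGIN;

EXPLAIN ANALYZE
DELETE FROM big_orders WHERE status = 'cancelled';
-- Read the plan and timing here...

ROLLBACK;  -- undoes the delete; nothing was permanently changed
```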
```sql
EXPLAIN ANALYZE
SELECT * FROM big_orders
WHERE customer_id = 42 AND status = 'completed';
```

Sample output (with an index on customer_id):

```
Index Scan using idx_big_orders_customer_id on big_orders
    (cost=0.42..12.55 rows=25 width=52)
    (actual time=0.035..0.128 rows=23 loops=1)
  Index Cond: (customer_id = 42)
  Filter: (status = 'completed'::text)
  Rows Removed by Filter: 77
Planning Time: 0.125 ms
Execution Time: 0.168 ms
```
Let's decode each piece:
| Part | Meaning |
|---|---|
| Index Scan using idx_... | PostgreSQL used an index — great! |
| cost=0.42..12.55 | Estimated startup cost..total cost (abstract units) |
| rows=25 (estimated) | PostgreSQL guessed 25 rows would match |
| actual time=0.035..0.128 | Real time: 0.035ms to first row, 0.128ms to all rows |
| rows=23 (actual) | Actually found 23 rows (close to the estimate!) |
| Index Cond | The condition that used the index |
| Filter | Additional filtering done AFTER the index lookup |
| Rows Removed by Filter: 77 | Found 100 rows via index, then filtered out 77 |
| Execution Time | Total wall-clock time — the number that matters! |
Adding BUFFERS shows disk I/O (cache hits vs. actual reads): EXPLAIN (ANALYZE, BUFFERS) SELECT .... A high "Buffers: shared read" count means the data wasn't cached and had to come from disk. A high "shared hit" count means PostgreSQL read from its memory cache, which is much faster.
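For example, against the big_orders table from earlier (the output line below is illustrative; your numbers will differ):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_orders WHERE customer_id = 42;

-- Illustrative output line:
--   Buffers: shared hit=98 read=5
-- hit  = 8 KB pages served from PostgreSQL's shared memory cache
-- read = pages that had to be fetched from outside that cache
```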
Imagine putting a sticky bookmark on every single page of a book. Now the book is twice as thick, the bookmarks don't help you find anything faster, and every time you add a new page you have to add a new bookmark too. That's what happens when you create too many indexes! Indexes have a cost — storage space and slower writes. They're only worth it when they actually speed up your reads.
| Situation | Why Index Doesn't Help | Example |
|---|---|---|
| Very small tables | Reading 100 rows is already instant — an index adds overhead for no benefit | A countries table with 200 rows |
| Low-cardinality columns | If only 3 unique values exist, the index can't narrow things down much | A status column with values: active, inactive, pending |
| Write-heavy tables | Every INSERT/UPDATE/DELETE must also update the index — slows writes | A high-volume logging table with 10,000 inserts/second |
| Columns rarely queried | An index on a column nobody searches is pure waste | An internal notes field nobody filters on |
| Queries returning most rows | If 80% of rows match, a sequential scan is faster than using an index | WHERE active = true when 95% of users are active |
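You can watch the planner make this last call itself. Assuming the big_orders table from earlier, even with an index on status, a filter that matches a large share of the rows will typically be answered with a sequential scan anyway:

```sql
CREATE INDEX idx_big_orders_status ON big_orders (status);

-- Matches roughly a quarter of 1M rows: expect the plan to show
-- "Seq Scan" rather than "Index Scan", because reading most of a
-- table through an index is slower than just reading the table.
EXPLAIN SELECT * FROM big_orders WHERE status = 'pending';
```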
```sql
-- Check the size of all indexes on a table
SELECT indexname,
       pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'big_orders';
```
```sql
-- Find unused indexes (waste of space!)
SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```
Think of a small coffee shop menu with 5 items. Does it need an alphabetical index? Absolutely not — you can read the whole thing in 2 seconds. But a 200-page restaurant menu at a food court? Yes, you'd want a "Cuisine Type" index! Size matters when deciding whether to index.
These are PostgreSQL power features that let you create smarter, smaller indexes.
Instead of putting everyone's name in a phone book, imagine you only put VIP customers in a special booklet. It's much smaller, faster to search, and only contains the people you actually care about looking up. A partial index does the same — it indexes only rows that match a condition.
```sql
-- Only index pending orders (ignore completed/cancelled ones)
CREATE INDEX idx_orders_active ON big_orders (customer_id)
    WHERE status = 'pending';

-- This query uses the partial index:
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'pending';

-- This query does NOT use it (different status):
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'completed';
```
If only 5% of orders are "pending," a partial index is 20x smaller than a full index. Smaller index = fits in memory = faster lookups. This is one of PostgreSQL's most underused superpowers.
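You can verify the size difference yourself. A sketch using the big_orders table from earlier (the index names are just for this comparison):

```sql
-- Full index vs. partial index on the same column
CREATE INDEX idx_orders_cust_full ON big_orders (customer_id);
CREATE INDEX idx_orders_cust_pending ON big_orders (customer_id)
    WHERE status = 'pending';

-- Compare how much disk each one uses
SELECT pg_size_pretty(pg_relation_size('idx_orders_cust_full'))    AS full_size,
       pg_size_pretty(pg_relation_size('idx_orders_cust_pending')) AS partial_size;
```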
Imagine a school directory sorted by students' nicknames instead of full names. Even though "nickname" isn't stored anywhere, the directory computes it and sorts by it. Expression indexes let you index the result of a function or expression, not just a raw column.
```sql
-- Index on LOWER(email) for case-insensitive email lookups
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- This query now uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- ❌ This does NOT use it (no LOWER function):
SELECT * FROM users WHERE email = 'alice@example.com';
```
```sql
-- Index on the extracted year from a timestamp
CREATE INDEX idx_orders_year ON big_orders (EXTRACT(YEAR FROM created_at));

-- Fast lookup of all 2024 orders
SELECT * FROM big_orders WHERE EXTRACT(YEAR FROM created_at) = 2024;
```
If the index is on LOWER(email), your WHERE clause must also use LOWER(email). Using email ILIKE or just email = won't match the index. The expression in the query must be identical to the expression in the index definition.
A phone book is sorted by Last Name first, then by First Name within the same last name. So you can quickly find "Smith" (last name only), or "Smith, John" (both names). But can you find everyone named "John" regardless of last name? Nope — you'd have to read the whole book, because it's not organized by first name first. Multi-column indexes work the same way. The order of columns matters!
```sql
-- Create a multi-column index
CREATE INDEX idx_orders_cust_status ON big_orders (customer_id, status);
```
A multi-column index on (A, B, C) can be used for queries on:
| Query Filters On | Uses Index? | Why |
|---|---|---|
| WHERE A = ? | ✅ Yes | Leftmost column |
| WHERE A = ? AND B = ? | ✅ Yes | First two columns (left prefix) |
| WHERE A = ? AND B = ? AND C = ? | ✅ Yes | All three columns |
| WHERE B = ? | ❌ No | Skips the leftmost column! |
| WHERE C = ? | ❌ No | Skips A and B |
| WHERE A = ? AND C = ? | ⚠️ Partial | Uses index for A, then filters C |
It's like a filing cabinet organized by: Year → Month → Day. You can quickly find "all files from 2024" (Year only), or "all files from March 2024" (Year + Month). But "all files from the 15th of any month in any year"? You'd have to check every drawer — the cabinet isn't organized that way.
```sql
-- ✅ Uses the index (customer_id is the leftmost column)
EXPLAIN ANALYZE
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'pending';

-- ❌ Cannot use the index (status is NOT the leftmost column)
EXPLAIN ANALYZE
SELECT * FROM big_orders WHERE status = 'pending';
```
Put the most selective (most unique values) column first. If customer_id has 10,000 unique values and status has only 4, put customer_id first. This narrows down the results most quickly at the first level of the tree.
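A quick way to check which column is more selective before deciding the order (a sketch against the big_orders table from earlier; the counts shown in comments follow from how that table was generated):

```sql
-- Higher distinct count = more selective = better leading column
SELECT COUNT(DISTINCT customer_id) AS distinct_customers,  -- ~10,000
       COUNT(DISTINCT status)      AS distinct_statuses    -- 4
FROM big_orders;
```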
Here are the pitfalls that trip up even experienced developers:
```sql
-- ❌ SLOW: If orders.customer_id has no index, this scans the
-- entire orders table for EVERY customer
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- ✅ FIX: Create an index on the foreign key
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```
Unlike some other databases, PostgreSQL does not automatically create indexes on foreign key columns. It only auto-creates indexes on PRIMARY KEY and UNIQUE constraints. You must create FK indexes yourself! This is the #1 performance mistake in PostgreSQL.
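If you want to audit an existing database for this mistake, a query along these lines finds foreign keys with no index leading on the FK column (a heuristic sketch; it only checks single-column foreign keys):

```sql
SELECT c.conrelid::regclass AS table_name,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]
WHERE c.contype = 'f'                   -- foreign key constraints only
  AND array_length(c.conkey, 1) = 1     -- single-column FKs only
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        AND i.indkey[0] = c.conkey[1]   -- some index leads with the FK column
  );
```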
```sql
-- ❌ SLOW: UPPER() wraps the column, so the index on "email" can't be used
SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';

-- ✅ FIX Option A: Create an expression index
CREATE INDEX idx_users_email_upper ON users (UPPER(email));

-- ✅ FIX Option B: Store data normalized, query without a function
SELECT * FROM users WHERE email = 'alice@example.com';
```
```sql
-- ❌ SLOW: Leading % means "anything before" — can't use a B-tree
SELECT * FROM products WHERE name LIKE '%phone%';

-- ✅ FIX: Use a GIN trigram index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);
-- Now '%phone%' can use the GIN index!
```
```sql
-- ❌ BAD: Indexing every column wastes space and slows writes
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (status);
CREATE INDEX idx3 ON orders (amount);
CREATE INDEX idx4 ON orders (created_at);
CREATE INDEX idx5 ON orders (customer_id, status);
CREATE INDEX idx6 ON orders (customer_id, created_at);
-- 6 indexes on one table! Each INSERT now updates 6 indexes + the table.

-- ✅ BETTER: Identify actual query patterns and create only what's needed
CREATE INDEX idx_orders_cust_status ON orders (customer_id, status);
CREATE INDEX idx_orders_created ON orders (created_at);
-- 2 indexes that cover the real query patterns
```
```sql
-- ❌ Fetches ALL columns — can't use an "Index Only Scan"
SELECT * FROM big_orders WHERE customer_id = 42;

-- ✅ Only fetch what you need — might enable an "Index Only Scan"
SELECT id, amount FROM big_orders WHERE customer_id = 42;
```
- Every column used in a JOIN or WHERE that references another table should have an index.
- Don't guess — measure! Run EXPLAIN ANALYZE on slow queries to find sequential scans.
- Use expression indexes if you must filter by LOWER(col) or EXTRACT(... FROM col).
- If you always query WHERE status = 'active', a partial index is smaller and faster.
- Check pg_stat_user_indexes for indexes with idx_scan = 0. Drop them to save space and speed up writes.
Time to apply your indexing knowledge!
You have a users table with 5 million rows. This query takes 4 seconds: SELECT * FROM users WHERE email = 'test@example.com'. What's the likely problem, and how do you fix it?
The email column likely has no index, so PostgreSQL does a sequential scan of all 5 million rows.
```sql
-- First, confirm with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';
-- Look for "Seq Scan" in the output

-- Fix: Create an index (UNIQUE if emails are unique)
CREATE UNIQUE INDEX idx_users_email ON users (email);
```
You need to search a products table for items where the description column contains the word "organic". Which index type would you use and why?
A GIN index with full-text search, because you need to search inside text content:
```sql
CREATE INDEX idx_products_description ON products USING GIN (
    to_tsvector('english', description)
);

SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('organic');
```
Your app always queries orders like: WHERE customer_id = ? AND created_at > ?. Design the optimal index.
```sql
-- customer_id first (equality), then created_at (range)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at);
```
Put the equality column (customer_id = ?) first, and the range column (created_at > ?) second. This lets PostgreSQL narrow down by customer first, then efficiently scan dates within that customer.
Your orders table has 10 million rows. Only 50,000 are status = 'pending'. You frequently query pending orders by customer. Create the most efficient index possible.
```sql
-- Partial index: only index the 50K pending rows!
CREATE INDEX idx_orders_pending_customer
    ON orders (customer_id)
    WHERE status = 'pending';

-- This index is ~200x smaller than a full index
-- and perfectly matches: WHERE customer_id = ? AND status = 'pending'
```
Before moving on, test yourself:

- What does EXPLAIN ANALYZE do differently than EXPLAIN?
- You have a multi-column index on (A, B, C). Which queries can use it?
- Why does WHERE UPPER(email) = 'TEST' not use an index on email?

You now have the tools to make any PostgreSQL query blazing fast. Here's what you learned:
| Concept | What It Does | Key Syntax |
|---|---|---|
| B-tree Index | General-purpose index for =, <, >, ORDER BY | CREATE INDEX idx ON tbl (col) |
| Hash Index | Equality-only lookups | CREATE INDEX ... USING HASH (col) |
| GIN Index | Full-text, arrays, JSONB | CREATE INDEX ... USING GIN (col) |
| GiST Index | Spatial data, ranges | CREATE INDEX ... USING GIST (col) |
| BRIN Index | Huge naturally-sorted tables | CREATE INDEX ... USING BRIN (col) |
| EXPLAIN ANALYZE | X-ray for query performance | EXPLAIN ANALYZE SELECT ... |
| Partial Index | Index only matching rows | CREATE INDEX ... WHERE condition |
| Expression Index | Index a computed value | CREATE INDEX ... ON tbl (LOWER(col)) |
| Multi-Column Index | Index on multiple columns | CREATE INDEX ... ON tbl (a, b) |
Now that you can write fast queries, it's time to learn about PostgreSQL's built-in functions and how to create your own! You'll master string functions, date/time manipulation, mathematical operations, and even write custom stored functions in PL/pgSQL. Think of functions as reusable mini-programs that live inside your database — ready to transform data on demand!