Make your queries blazing fast! From full table scans to instant lookups.
Imagine you have a 500-page textbook and you need to find every mention of "photosynthesis." You have two choices: (1) Start on page 1 and read every single page until the end — this is what PostgreSQL does without an index (called a sequential scan). Or (2) flip to the index at the back of the book, look up "photosynthesis," and it tells you: "pages 42, 87, 156." You jump straight there — done in seconds! A database index works exactly the same way. It's a separate data structure that tells PostgreSQL exactly where to find the rows you're looking for, without reading the entire table.
Old-school phone books were organized alphabetically by last name. If you wanted to find "Smith, John," you'd flip to the S section and find it in seconds. But if someone asked: "Find everyone on Maple Street" — you'd have to read the ENTIRE phone book because it's not organized by street. That's why the choice of which columns to index matters so much. You can only look things up fast if there's an index on that particular column.
PostgreSQL automatically creates indexes on PRIMARY KEY and UNIQUE columns. But for other columns you query often, you create indexes yourself:
```sql
-- Basic syntax
CREATE INDEX index_name ON table_name (column_name);

-- Example: index on the email column of a users table
CREATE INDEX idx_users_email ON users (email);

-- Now this query will be FAST:
SELECT * FROM users WHERE email = 'alice@example.com';
```
Use the pattern idx_tablename_columnname for index names. For multi-column indexes: idx_orders_customer_id_date. Clear names make it easy to know what an index is for when you're debugging performance months later.
Small tables don't show performance differences. Let's create a table with 1 million rows to see the dramatic difference indexes make:
```sql
-- Create a big table
CREATE TABLE big_orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    amount DECIMAL(10,2),
    status VARCHAR(20),
    created_at TIMESTAMP
);

-- Insert 1 million rows of random data
INSERT INTO big_orders (customer_id, amount, status, created_at)
SELECT
    (RANDOM() * 10000)::INTEGER,
    (RANDOM() * 1000)::DECIMAL(10,2),
    CASE (RANDOM() * 3)::INTEGER
        WHEN 0 THEN 'pending'
        WHEN 1 THEN 'completed'
        WHEN 2 THEN 'cancelled'
        ELSE 'refunded'
    END,
    NOW() - (RANDOM() * 365)::INTEGER * INTERVAL '1 day'
FROM generate_series(1, 1000000);
```
Remember the "guess a number between 1 and 100" game? You don't start at 1 and count up. You start in the middle: "Is it higher or lower than 50?" Then: "Higher or lower than 75?" Then: "Higher or lower than 62?" In just 7 guesses, you can find any number from 1 to 100. That's called binary search, and it's exactly how a B-tree index works! Instead of checking every row, it narrows down by half at each step. For 1 million rows, that's only about 20 steps instead of 1,000,000.
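You can check that "about 20 steps" figure yourself. The math is simply ceil(log2(n)) comparisons for a binary search over n items, and PostgreSQL can compute it directly:

```sql
-- Number of halving steps needed to narrow n rows down to 1
SELECT CEIL(LOG(2, 1000000)) AS steps_for_1m,   -- 20
       CEIL(LOG(2, 100))     AS steps_for_100;  -- 7
```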
A B-tree (Balanced tree) organizes data like a family tree where every branch is the same height:
```
              [50]                      ← Root node
            /      \
        [25]        [75]                ← Internal nodes
       /    \      /    \
[10,20] [30,40] [60,70] [80,90]         ← Leaf nodes (point to actual rows)
```

Looking for customer_id = 70:

1. Root: 70 > 50 → go right
2. Node: 70 < 75 → go left
3. Leaf: found 70! → points to the actual row on disk

Only 3 steps instead of scanning 1,000,000 rows!
When you write CREATE INDEX without specifying a type, PostgreSQL creates a B-tree index. It's the most versatile and handles these operations efficiently:
| Operation | Example | B-tree Can Help? |
|---|---|---|
| Equality | WHERE email = 'alice@test.com' | ✅ Yes |
| Range (less/greater) | WHERE price > 100 | ✅ Yes |
| BETWEEN | WHERE age BETWEEN 20 AND 30 | ✅ Yes |
| ORDER BY | ORDER BY created_at DESC | ✅ Yes (avoids sorting!) |
| IS NULL | WHERE deleted_at IS NULL | ✅ Yes |
| LIKE prefix | WHERE name LIKE 'Al%' | ✅ Yes |
| LIKE middle/suffix | WHERE name LIKE '%ice' | ❌ No |
A B-tree works like a phone book — organized from left to right. LIKE 'Al%' works because you can flip to the "Al" section. But LIKE '%ice' is like asking "find all names ending in 'ice'" — you'd have to read the entire book. For suffix/contains searches, you need a different index type (GIN with trigrams).
```sql
-- BEFORE creating an index (slow!)
EXPLAIN ANALYZE SELECT * FROM big_orders WHERE customer_id = 42;
-- Seq Scan on big_orders  (cost=0.00..20834.00 rows=100 width=52)
--   Filter: (customer_id = 42)
-- Execution Time: 85.432 ms   ← SLOW!

-- Create the index
CREATE INDEX idx_big_orders_customer_id ON big_orders (customer_id);

-- AFTER creating the index (fast!)
EXPLAIN ANALYZE SELECT * FROM big_orders WHERE customer_id = 42;
-- Index Scan using idx_big_orders_customer_id  (cost=0.42..8.87 rows=100 width=52)
--   Index Cond: (customer_id = 42)
-- Execution Time: 0.215 ms   ← 400x FASTER!
```
B-tree is the Swiss Army knife, but PostgreSQL has specialized index types for specific use cases:
When you check your coat at a restaurant, they give you a ticket number. To get your coat back, you hand in the number and they go straight to that hook. They can't help you find "all coats that are darker than navy blue" — it's exact match only. That's a hash index: incredibly fast for = checks, but useless for ranges, sorting, or anything else.
```sql
CREATE INDEX idx_users_uuid ON users USING HASH (uuid);

-- ✅ Fast:
SELECT * FROM users WHERE uuid = 'abc-123-def';

-- ❌ Cannot use this index:
SELECT * FROM users WHERE uuid > 'abc';
SELECT * FROM users ORDER BY uuid;
```
In a textbook concordance, each keyword has a list of every page where it appears: "photosynthesis: 42, 87, 156." GIN (Generalized Inverted Index) works the same way — for each value or keyword, it stores a list of all rows that contain it. Perfect when a single column contains multiple values (arrays, JSONB, or full-text documents).
```sql
-- Full-text search on articles
CREATE INDEX idx_articles_search ON articles USING GIN (
    to_tsvector('english', title || ' ' || body)
);

-- Search for articles about "database performance"
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('database & performance');

-- JSONB index for flexible schemas
CREATE INDEX idx_events_data ON events USING GIN (metadata jsonb_path_ops);

-- Array containment
CREATE INDEX idx_products_tags ON products USING GIN (tags);
SELECT * FROM products WHERE tags @> ARRAY['organic', 'vegan'];
```
GiST (Generalized Search Tree) is built for data where values can overlap or contain one another, such as geometric shapes and ranges:

```sql
-- Spatial data (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);

-- Range types
CREATE INDEX idx_reservations_period ON reservations USING GIST (date_range);
```
Instead of indexing every single house, BRIN (Block Range INdex) just remembers: "Block 1 has addresses 1–100, Block 2 has 101–200." Super tiny and fast to maintain. Works brilliantly when data is naturally sorted — like a log table where rows are always inserted in order of timestamp.
```sql
-- Perfect for time-series data where rows arrive in order
CREATE INDEX idx_logs_created_at ON server_logs USING BRIN (created_at);

-- Tiny index size: ~50 KB instead of ~200 MB for a B-tree on 100M rows!
```
| Index Type | Best For | Size | Supports |
|---|---|---|---|
| B-tree | General purpose | Medium | =, <, >, BETWEEN, ORDER BY, LIKE 'prefix%' |
| Hash | Exact equality only | Small | = only |
| GIN | Full-text, arrays, JSONB | Large | Contains, search, @>, @@ |
| GiST | Geometry, ranges | Medium | Overlaps, contains, nearest-neighbor |
| BRIN | Huge sorted tables | Tiny | =, <, > on naturally ordered data |
B-tree handles 90% of use cases. Only reach for specialized types when you have a specific need: GIN for full-text/JSONB, GiST for spatial, BRIN for append-only time-series. Don't over-optimize — start with B-tree and switch only when you measure a problem.
When you're sick, the doctor doesn't just guess — they take an X-ray to see what's happening inside your body. EXPLAIN ANALYZE is an X-ray for your SQL queries. It shows you exactly what PostgreSQL does to find your data: which tables it scans, which indexes it uses, how many rows it processes, and how long each step takes. Instead of guessing why a query is slow, you can see the bottleneck.
| Command | What It Does | Runs the Query? |
|---|---|---|
| EXPLAIN | Shows the plan (estimated costs and rows) | ❌ No — just estimates |
| EXPLAIN ANALYZE | Shows the plan AND actual timing/row counts | ✅ Yes — actually runs it |
Be careful with EXPLAIN ANALYZE on DELETE or UPDATE statements — they will actually modify your data! Wrap them in a transaction and rollback: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;
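Spelled out, the safe pattern looks like this (a sketch; the DELETE is a stand-in for whatever statement you're profiling):

```sql
BEGIN;

EXPLAIN ANALYZE
DELETE FROM big_orders WHERE status = 'cancelled';
-- Read the plan and timing here...

ROLLBACK;  -- undoes the delete; nothing was permanently changed
```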
```sql
EXPLAIN ANALYZE
SELECT * FROM big_orders
WHERE customer_id = 42 AND status = 'completed';
```

Sample output (with an index on customer_id):

```
Index Scan using idx_big_orders_customer_id on big_orders
    (cost=0.42..12.55 rows=25 width=52)
    (actual time=0.035..0.128 rows=23 loops=1)
  Index Cond: (customer_id = 42)
  Filter: (status = 'completed'::text)
  Rows Removed by Filter: 77
Planning Time: 0.125 ms
Execution Time: 0.168 ms
```
Let's decode each piece:
| Part | Meaning |
|---|---|
| Index Scan using idx_... | PostgreSQL used an index — great! |
| cost=0.42..12.55 | Estimated startup cost..total cost (abstract units) |
| rows=25 (estimated) | PostgreSQL guessed 25 rows would match |
| actual time=0.035..0.128 | Real time: 0.035ms to first row, 0.128ms to all rows |
| rows=23 (actual) | Actually found 23 rows (close to the estimate!) |
| Index Cond | The condition that used the index |
| Filter | Additional filtering done AFTER the index lookup |
| Rows Removed by Filter: 77 | Found 100 rows via index, then filtered out 77 |
| Execution Time | Total wall-clock time — the number that matters! |
Adding BUFFERS shows disk I/O (cache hits vs. actual reads): EXPLAIN (ANALYZE, BUFFERS) SELECT .... A high "Buffers: shared read" count means the data wasn't cached and had to come from disk. A high "shared hit" count means PostgreSQL read from its memory cache, which is much faster.
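For example, against the big_orders table from earlier (the output line below is illustrative; your numbers will differ):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_orders WHERE customer_id = 42;

-- Illustrative output line:
--   Buffers: shared hit=98 read=5
-- hit  = 8 KB pages served from PostgreSQL's shared memory cache
-- read = pages that had to be fetched from outside that cache
```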
Imagine putting a sticky bookmark on every single page of a book. Now the book is twice as thick, the bookmarks don't help you find anything faster, and every time you add a new page you have to add a new bookmark too. That's what happens when you create too many indexes! Indexes have a cost — storage space and slower writes. They're only worth it when they actually speed up your reads.
| Situation | Why Index Doesn't Help | Example |
|---|---|---|
| Very small tables | Reading 100 rows is already instant — an index adds overhead for no benefit | A countries table with 200 rows |
| Low-cardinality columns | If only 3 unique values exist, the index can't narrow things down much | A status column with values: active, inactive, pending |
| Write-heavy tables | Every INSERT/UPDATE/DELETE must also update the index — slows writes | A high-volume logging table with 10,000 inserts/second |
| Columns rarely queried | An index on a column nobody searches is pure waste | An internal notes field nobody filters on |
| Queries returning most rows | If 80% of rows match, a sequential scan is faster than using an index | WHERE active = true when 95% of users are active |
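You can watch the planner make this last call itself. Assuming the big_orders table from earlier, even with an index on status, a filter that matches a large share of the rows will typically be answered with a sequential scan anyway:

```sql
CREATE INDEX idx_big_orders_status ON big_orders (status);

-- Matches roughly a quarter of 1M rows: expect the plan to show
-- "Seq Scan" rather than "Index Scan", because reading most of a
-- table through an index is slower than just reading the table.
EXPLAIN SELECT * FROM big_orders WHERE status = 'pending';
```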
```sql
-- Check the size of all indexes on a table
SELECT indexname,
       pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'big_orders';
```
```sql
-- Find unused indexes (waste of space!)
SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```
Think of a small coffee shop menu with 5 items. Does it need an alphabetical index? Absolutely not — you can read the whole thing in 2 seconds. But a 200-page restaurant menu at a food court? Yes, you'd want a "Cuisine Type" index! Size matters when deciding whether to index.
These are PostgreSQL power features that let you create smarter, smaller indexes.
Instead of putting everyone's name in a phone book, imagine you only put VIP customers in a special booklet. It's much smaller, faster to search, and only contains the people you actually care about looking up. A partial index does the same — it indexes only rows that match a condition.
```sql
-- Only index pending orders (ignore completed/cancelled ones)
CREATE INDEX idx_orders_active ON big_orders (customer_id)
    WHERE status = 'pending';

-- This query uses the partial index:
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'pending';

-- This query does NOT use it (different status):
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'completed';
```
If only 5% of orders are "pending," a partial index is 20x smaller than a full index. Smaller index = fits in memory = faster lookups. This is one of PostgreSQL's most underused superpowers.
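You can verify the size difference yourself. A sketch using the big_orders table from earlier (the index names are just for this comparison):

```sql
-- Full index vs. partial index on the same column
CREATE INDEX idx_orders_cust_full ON big_orders (customer_id);
CREATE INDEX idx_orders_cust_pending ON big_orders (customer_id)
    WHERE status = 'pending';

-- Compare how much disk each one uses
SELECT pg_size_pretty(pg_relation_size('idx_orders_cust_full'))    AS full_size,
       pg_size_pretty(pg_relation_size('idx_orders_cust_pending')) AS partial_size;
```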
Imagine a school directory sorted by students' nicknames instead of full names. Even though "nickname" isn't stored anywhere, the directory computes it and sorts by it. Expression indexes let you index the result of a function or expression, not just a raw column.
```sql
-- Index on LOWER(email) for case-insensitive email lookups
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- This query now uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- ❌ This does NOT use it (no LOWER function):
SELECT * FROM users WHERE email = 'alice@example.com';
```
```sql
-- Index on the extracted year from a timestamp
CREATE INDEX idx_orders_year ON big_orders (EXTRACT(YEAR FROM created_at));

-- Fast lookup of all 2024 orders
SELECT * FROM big_orders WHERE EXTRACT(YEAR FROM created_at) = 2024;
```
If the index is on LOWER(email), your WHERE clause must also use LOWER(email). Using email ILIKE or just email = won't match the index. The expression in the query must be identical to the expression in the index definition.
A phone book is sorted by Last Name first, then by First Name within the same last name. So you can quickly find "Smith" (last name only), or "Smith, John" (both names). But can you find everyone named "John" regardless of last name? Nope — you'd have to read the whole book, because it's not organized by first name first. Multi-column indexes work the same way. The order of columns matters!
```sql
-- Create a multi-column index
CREATE INDEX idx_orders_cust_status ON big_orders (customer_id, status);
```
A multi-column index on (A, B, C) can be used for queries on:
| Query Filters On | Uses Index? | Why |
|---|---|---|
| WHERE A = ? | ✅ Yes | Leftmost column |
| WHERE A = ? AND B = ? | ✅ Yes | First two columns (left prefix) |
| WHERE A = ? AND B = ? AND C = ? | ✅ Yes | All three columns |
| WHERE B = ? | ❌ No | Skips the leftmost column! |
| WHERE C = ? | ❌ No | Skips A and B |
| WHERE A = ? AND C = ? | ⚠️ Partial | Uses index for A, then filters C |
It's like a filing cabinet organized by: Year → Month → Day. You can quickly find "all files from 2024" (Year only), or "all files from March 2024" (Year + Month). But "all files from the 15th of any month in any year"? You'd have to check every drawer — the cabinet isn't organized that way.
```sql
-- ✅ Uses the index (customer_id is the leftmost column)
EXPLAIN ANALYZE
SELECT * FROM big_orders WHERE customer_id = 42 AND status = 'pending';

-- ❌ Cannot use the index (status is NOT the leftmost column)
EXPLAIN ANALYZE
SELECT * FROM big_orders WHERE status = 'pending';
```
Put the most selective (most unique values) column first. If customer_id has 10,000 unique values and status has only 4, put customer_id first. This narrows down the results most quickly at the first level of the tree.
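A quick way to check which column is more selective before deciding the order (a sketch against the big_orders table from earlier; the counts shown in comments follow from how that table was generated):

```sql
-- Higher distinct count = more selective = better leading column
SELECT COUNT(DISTINCT customer_id) AS distinct_customers,  -- ~10,000
       COUNT(DISTINCT status)      AS distinct_statuses    -- 4
FROM big_orders;
```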
Here are the pitfalls that trip up even experienced developers:
```sql
-- ❌ SLOW: If orders.customer_id has no index, this scans the
-- entire orders table for EVERY customer
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- ✅ FIX: Create an index on the foreign key
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```
Unlike some other databases, PostgreSQL does not automatically create indexes on foreign key columns. It only auto-creates indexes on PRIMARY KEY and UNIQUE constraints. You must create FK indexes yourself! This is the #1 performance mistake in PostgreSQL.
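If you want to audit an existing database for this mistake, a query along these lines finds foreign keys with no index leading on the FK column (a heuristic sketch; it only checks single-column foreign keys):

```sql
SELECT c.conrelid::regclass AS table_name,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]
WHERE c.contype = 'f'                   -- foreign key constraints only
  AND array_length(c.conkey, 1) = 1     -- single-column FKs only
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        AND i.indkey[0] = c.conkey[1]   -- some index leads with the FK column
  );
```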
```sql
-- ❌ SLOW: UPPER() wraps the column, so the index on "email" can't be used
SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';

-- ✅ FIX Option A: Create an expression index
CREATE INDEX idx_users_email_upper ON users (UPPER(email));

-- ✅ FIX Option B: Store data normalized, query without a function
SELECT * FROM users WHERE email = 'alice@example.com';
```
```sql
-- ❌ SLOW: Leading % means "anything before" — can't use a B-tree
SELECT * FROM products WHERE name LIKE '%phone%';

-- ✅ FIX: Use a GIN trigram index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);
-- Now '%phone%' can use the GIN index!
```
```sql
-- ❌ BAD: Indexing every column wastes space and slows writes
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (status);
CREATE INDEX idx3 ON orders (amount);
CREATE INDEX idx4 ON orders (created_at);
CREATE INDEX idx5 ON orders (customer_id, status);
CREATE INDEX idx6 ON orders (customer_id, created_at);
-- 6 indexes on one table! Each INSERT now updates 6 indexes + the table.

-- ✅ BETTER: Identify actual query patterns and create only what's needed
CREATE INDEX idx_orders_cust_status ON orders (customer_id, status);
CREATE INDEX idx_orders_created ON orders (created_at);
-- 2 indexes that cover the real query patterns
```
```sql
-- ❌ Fetches ALL columns — can't use an "Index Only Scan"
SELECT * FROM big_orders WHERE customer_id = 42;

-- ✅ Only fetch what you need — might enable an "Index Only Scan"
SELECT id, amount FROM big_orders WHERE customer_id = 42;
```
- Every column used in a JOIN or WHERE that references another table should have an index.
- Don't guess — measure! Run EXPLAIN ANALYZE on slow queries to find sequential scans.
- Use expression indexes if you must filter by LOWER(col) or EXTRACT(... FROM col).
- If you always query WHERE status = 'active', a partial index is smaller and faster.
- Check pg_stat_user_indexes for indexes with idx_scan = 0. Drop them to save space and speed up writes.
Time to apply your indexing knowledge!
You have a users table with 5 million rows. This query takes 4 seconds: SELECT * FROM users WHERE email = 'test@example.com'. What's the likely problem, and how do you fix it?
The email column likely has no index, so PostgreSQL does a sequential scan of all 5 million rows.
```sql
-- First, confirm with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';
-- Look for "Seq Scan" in the output

-- Fix: Create an index (UNIQUE if emails are unique)
CREATE UNIQUE INDEX idx_users_email ON users (email);
```
You need to search a products table for items where the description column contains the word "organic". Which index type would you use and why?
A GIN index with full-text search, because you need to search inside text content:
```sql
CREATE INDEX idx_products_description ON products USING GIN (
    to_tsvector('english', description)
);

SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('organic');
```
Your app always queries orders like: WHERE customer_id = ? AND created_at > ?. Design the optimal index.
```sql
-- customer_id first (equality), then created_at (range)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at);
```
Put the equality column (customer_id = ?) first, and the range column (created_at > ?) second. This lets PostgreSQL narrow down by customer first, then efficiently scan dates within that customer.
Your orders table has 10 million rows. Only 50,000 are status = 'pending'. You frequently query pending orders by customer. Create the most efficient index possible.
```sql
-- Partial index: only index the 50K pending rows!
CREATE INDEX idx_orders_pending_customer
    ON orders (customer_id)
    WHERE status = 'pending';

-- This index is ~200x smaller than a full index
-- and perfectly matches: WHERE customer_id = ? AND status = 'pending'
```
Before moving on, test yourself:

- What does EXPLAIN ANALYZE do differently than EXPLAIN?
- You have a multi-column index on (A, B, C). Which queries can use it?
- Why does WHERE UPPER(email) = 'TEST' not use an index on email?

You now have the tools to make any PostgreSQL query blazing fast. Here's what you learned:
| Concept | What It Does | Key Syntax |
|---|---|---|
| B-tree Index | General-purpose index for =, <, >, ORDER BY | CREATE INDEX idx ON tbl (col) |
| Hash Index | Equality-only lookups | CREATE INDEX ... USING HASH (col) |
| GIN Index | Full-text, arrays, JSONB | CREATE INDEX ... USING GIN (col) |
| GiST Index | Spatial data, ranges | CREATE INDEX ... USING GIST (col) |
| BRIN Index | Huge naturally-sorted tables | CREATE INDEX ... USING BRIN (col) |
| EXPLAIN ANALYZE | X-ray for query performance | EXPLAIN ANALYZE SELECT ... |
| Partial Index | Index only matching rows | CREATE INDEX ... WHERE condition |
| Expression Index | Index a computed value | CREATE INDEX ... ON tbl (LOWER(col)) |
| Multi-Column Index | Index on multiple columns | CREATE INDEX ... ON tbl (a, b) |
Now that you can write fast queries, it's time to learn about PostgreSQL's built-in functions and how to create your own! You'll master string functions, date/time manipulation, mathematical operations, and even write custom stored functions in PL/pgSQL. Think of functions as reusable mini-programs that live inside your database — ready to transform data on demand!