🚀 Production PostgreSQL

Take your database to the real world! VACUUM, WAL, backup, replication, security, and performance tuning.

1️⃣ VACUUM & Autovacuum — Spring Cleaning

🧒 ELI5 — The Messy Roommate

Imagine you live with a roommate who never throws anything away. Every time they update their shopping list, they don’t erase the old one — they write a brand new list and leave the old one on the table. After a few months, you have thousands of old shopping lists piling up everywhere.

That’s PostgreSQL with MVCC! Remember from Module 11: when you UPDATE or DELETE a row, the old version isn’t immediately removed. It stays there as a “dead tuple” (dead row) taking up space. Over time, these dead rows pile up, making your tables bloated and slow.

VACUUM is the spring cleaning service. It walks through the table, identifies dead rows that no active transaction can see anymore, and marks that space as reusable. AUTOVACUUM is the robot vacuum that runs automatically in the background so you don’t have to clean manually.

Dead Tuples — The Problem

-- See how many dead tuples (dead rows) each table has
SELECT relname AS table_name,
       n_live_tup AS live_rows,
       n_dead_tup AS dead_rows,
       ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Result:
-- table_name | live_rows | dead_rows | dead_pct | last_vacuum | last_autovacuum
-- orders     | 1000000   | 250000    | 20.00    | NULL        | 2025-03-18 03:00
-- products   | 50000     | 1200      | 2.34     | NULL        | 2025-03-18 03:15

⚠️ 20% Dead Rows Is a Warning Sign

If a table has more than 10-20% dead rows, autovacuum might not be keeping up. This causes table bloat (wasted disk space) and slower queries (PostgreSQL has to skip over dead rows). Check your autovacuum settings!

Manual VACUUM

-- Regular VACUUM: marks dead space as reusable, doesn't lock the table
VACUUM orders;

-- VACUUM VERBOSE: shows detailed stats about what it did
VACUUM VERBOSE orders;
-- INFO: "orders": removed 250000 dead row versions in 1847 pages
-- INFO: "orders": found 250000 removable, 1000000 nonremovable row versions

-- VACUUM ANALYZE: vacuum + update statistics (helps query planner)
VACUUM ANALYZE orders;

VACUUM vs VACUUM FULL

| Feature | VACUUM (Regular) | VACUUM FULL |
|---|---|---|
| Locks the table? | No (runs alongside queries) | Yes! (exclusive lock) |
| Reclaims disk space to OS? | No (only marks reusable) | Yes (rewrites entire table) |
| Speed | Fast | Slow (rewrites everything) |
| Downtime required? | No | Effectively yes |
| When to use | Routine maintenance | Emergency: severe bloat only |

🌎 The Rule of Thumb

Use regular VACUUM 99.9% of the time (or just let autovacuum handle it). Only use VACUUM FULL as a last resort when a table has extreme bloat (e.g., after a massive DELETE of millions of rows) and you need to reclaim disk space to the operating system. It’s like tearing down your house and rebuilding it vs. just tidying up the rooms.

Autovacuum — The Robot Vacuum

-- Check if autovacuum is enabled (it should be!)
SHOW autovacuum;
-- Result: on

-- Key autovacuum settings
SHOW autovacuum_vacuum_threshold;
-- Default: 50 (minimum dead rows before vacuum kicks in)

SHOW autovacuum_vacuum_scale_factor;
-- Default: 0.2 (vacuum when 20% of rows are dead)

-- The formula: vacuum triggers when dead rows > threshold + (scale_factor × total rows)
-- For a 1M-row table: 50 + (0.2 × 1,000,000) = 200,050 dead rows triggers vacuum

-- For high-write tables, you can tune per-table:
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,   -- trigger at 5% dead
    autovacuum_vacuum_threshold = 1000       -- minimum 1000 dead rows
);
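To confirm that per-table overrides like these were actually stored, you can read them back from the system catalog (the `orders` table here is the example from above):

```sql
-- Per-table storage overrides live in pg_class.reloptions
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';
-- reloptions shows e.g. {autovacuum_vacuum_scale_factor=0.05,autovacuum_vacuum_threshold=1000}
```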

💡 Pro Tip — Never Disable Autovacuum

Some people disable autovacuum to “improve performance.” This is a terrible idea. Without autovacuum, dead rows pile up endlessly, your tables bloat, queries slow to a crawl, and eventually you might hit transaction ID wraparound — a catastrophic situation where PostgreSQL refuses to accept any new writes until you vacuum. Just leave it on and tune the settings if needed.

2️⃣ WAL — Write-Ahead Log

🧒 ELI5 — The Airplane’s Black Box

You know the “black box” on an airplane? It records every single thing that happens during a flight. If the plane crashes, investigators can replay the black box recording to understand exactly what happened and reconstruct the sequence of events.

The Write-Ahead Log (WAL) is PostgreSQL’s black box. Before PostgreSQL writes any change to the actual data files on disk, it first writes a record of that change to the WAL. This is the “write-ahead” part — the log entry is written ahead of (before) the actual data change.

If PostgreSQL crashes, it simply replays the WAL from the last checkpoint to recover all committed transactions. No data loss. It’s what makes the “D” in ACID (Durability) possible.

How WAL Works (Step by Step)

1. You execute: UPDATE accounts SET balance = 500 WHERE id = 1; PostgreSQL receives your command and figures out what data needs to change.

2. WAL record is written to disk FIRST. A description of the change (“set row 1 balance to 500”) is appended to the WAL file on disk. This is a sequential write — very fast!

3. Data change happens in memory (shared buffers). The actual table data is modified in RAM. The disk version of the table is NOT updated yet — it’s now “dirty” (out of sync with disk).

4. COMMIT confirms success. PostgreSQL tells the client “COMMIT successful” as soon as the WAL record is safely on disk. Even if a crash happens now, the WAL has the record.

5. Background writer eventually flushes to data files. Later, the bgwriter and checkpointer processes write the dirty data from RAM to the actual table files on disk, at their own pace.

In short: WAL gives you 💾 fast sequential writes and 🛡️ guaranteed crash recovery.

Key WAL Settings

-- Check the current WAL level
SHOW wal_level;
-- Default: replica (supports streaming replication)
-- Options: minimal, replica, logical

-- See where WAL files are stored
SHOW data_directory;
-- WAL files live in: {data_directory}/pg_wal/

-- Current WAL position (useful for replication monitoring)
SELECT pg_current_wal_lsn();
-- Result: 0/16B3748  (Log Sequence Number)

-- How much WAL has been generated?
SELECT pg_size_pretty(pg_wal_lsn_diff(
    pg_current_wal_lsn(),
    '0/0'
)) AS total_wal_generated;
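You can also peek at the WAL segment files themselves from SQL (available since PostgreSQL 10; requires sufficient privileges):

```sql
-- List the most recently modified WAL segment files in pg_wal/
-- (each segment is 16MB by default)
SELECT name, size, modification
FROM pg_ls_waldir()
ORDER BY modification DESC
LIMIT 5;
```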

Checkpoints — The Sync Point

-- A checkpoint flushes all dirty data from RAM to disk
-- After a checkpoint, all WAL before that point can be recycled

SHOW checkpoint_timeout;
-- Default: 5min (checkpoint at least every 5 minutes)

SHOW max_wal_size;
-- Default: 1GB (force checkpoint if WAL exceeds this)

-- Force a manual checkpoint (rarely needed)
CHECKPOINT;

🌎 Why WAL Matters for You

WAL is the foundation of three critical features: crash recovery (replay WAL after restart), replication (stream WAL to replicas), and point-in-time recovery (replay WAL up to a specific moment to undo a mistake). If you understand WAL, you understand how PostgreSQL stays reliable, how backups work, and how replication is even possible.

3️⃣ Backup & Restore

🧒 ELI5 — Photocopying Your Diary

Imagine your diary contains every important memory of your life. What if your house catches fire? You’d want a photocopy stored somewhere safe. Database backups are exactly that — copies of your data stored somewhere safe so you can recover from disasters like hardware failure, accidental deletions, or ransomware attacks.

PostgreSQL gives you two types of backups: logical backups (like photocopying each page — produces SQL scripts) and physical backups (like cloning the entire diary including the binding — copies raw data files).

pg_dump — Logical Backup (Most Common)

# Backup a single database to a SQL file
pg_dump -U postgres -d mystore > mystore_backup.sql

# Backup in custom format (compressed, supports selective restore)
pg_dump -U postgres -d mystore -Fc -f mystore_backup.dump

# Backup only specific tables
pg_dump -U postgres -d mystore -t orders -t products -Fc -f orders_products.dump

# Backup schema only (no data — just table definitions)
pg_dump -U postgres -d mystore --schema-only -f mystore_schema.sql

# Backup data only (no table definitions)
pg_dump -U postgres -d mystore --data-only -f mystore_data.sql

Backup Formats Compared

| Format | Flag | Compressed? | Selective Restore? | Best For |
|---|---|---|---|---|
| Plain SQL | (default) | No | No | Simple, readable backups |
| Custom | -Fc | Yes | Yes | Production backups (recommended) |
| Directory | -Fd | Yes | Yes | Parallel backup/restore |
| Tar | -Ft | No | Yes | Archive-compatible format |

pg_restore — Restoring Backups

# Restore a plain SQL backup
psql -U postgres -d mystore_new < mystore_backup.sql

# Restore a custom-format backup
pg_restore -U postgres -d mystore_new mystore_backup.dump

# Restore only specific tables from a custom backup
pg_restore -U postgres -d mystore_new -t orders mystore_backup.dump

# Restore with --clean (drops objects before recreating)
pg_restore -U postgres -d mystore_new --clean mystore_backup.dump

# List contents of a backup without restoring
pg_restore -l mystore_backup.dump
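For large databases, custom- and directory-format backups can be restored with multiple parallel workers. A sketch (pick a `-j` worker count that matches your CPU cores):

```shell
# Parallel restore with 4 workers (custom or directory format only)
pg_restore -U postgres -d mystore_new -j 4 mystore_backup.dump
```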

pg_dumpall — Backup Everything

# Backup ALL databases, roles, and tablespaces
pg_dumpall -U postgres > full_cluster_backup.sql

# Backup only roles (users & permissions)
pg_dumpall -U postgres --roles-only > roles_backup.sql

pg_basebackup — Physical Backup

# Physical backup: copies the entire data directory
# This is the foundation for PITR and setting up replicas
pg_basebackup -U replicator -h localhost -D /backups/base_backup -Fp -Xs -P

# Flags explained:
# -D : destination directory
# -Fp : plain format (raw files)
# -Xs : stream WAL during backup
# -P  : show progress

Automated Backup Script

#!/bin/bash
# Daily backup script — add to crontab
# crontab: 0 2 * * * /scripts/daily_backup.sh

BACKUP_DIR="/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="mystore"
RETENTION_DAYS=7

# Create backup
pg_dump -U postgres -d $DB_NAME -Fc -f "$BACKUP_DIR/${DB_NAME}_${DATE}.dump"

# Delete backups older than retention period
find $BACKUP_DIR -name "*.dump" -mtime +$RETENTION_DAYS -delete

# Log the result
echo "$(date): Backup completed — ${DB_NAME}_${DATE}.dump" >> /var/log/pg_backup.log

⚠️ The #1 Backup Rule: TEST YOUR RESTORES

A backup that you’ve never tested restoring is NOT a backup. It’s a hope and a prayer. Regularly restore your backups to a test server to verify they actually work. Many teams have learned this the hard way — their “backup” files were corrupt or incomplete, and they only discovered this during an actual disaster.

Point-in-Time Recovery (PITR) — The Time Machine

1. Take a base backup with pg_basebackup. This is your starting point — a full copy of the data directory.

2. Archive WAL files continuously. Configure archive_mode = on and archive_command to copy WAL files to a safe location as they’re completed.

3. Disaster strikes! Someone drops a table at 3:42 PM. Don’t panic. You have a base backup and all the WAL files since that backup.

4. Restore the base backup and replay WAL up to 3:41 PM. Set recovery_target_time = '2025-03-18 15:41:00' and PostgreSQL replays every change up to one minute before the disaster. Your data is back!
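The archiving and recovery steps above map to a handful of settings. A sketch with example paths (the /archive directory is an assumption — point it at your own safe storage):

```
# postgresql.conf on the primary — continuous WAL archiving
archive_mode = on
archive_command = 'cp %p /archive/%f'    # %p = path to WAL file, %f = file name

# For recovery (PostgreSQL 12+): restore the base backup, set these,
# create an empty recovery.signal file in the data directory, then start the server
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-03-18 15:41:00'
```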

💡 Pro Tip — Backup Strategy

For production databases, use a 3-2-1 backup strategy: 3 copies of your data, on 2 different storage types, with 1 copy offsite (cloud storage, different data center). Combine daily pg_dump for quick table restores with continuous WAL archiving for point-in-time recovery.

4️⃣ Replication — Cloning Your Database

🧒 ELI5 — Twins That Stay in Sync

Imagine you’re a teacher with 30 students and only one textbook. Everyone has to wait their turn to read it — very slow! But what if you made exact copies of the textbook and gave one to every table? Now everyone can read at the same time. That’s replication.

You have one primary (the original textbook) where all changes happen. Then you have one or more replicas (copies) that automatically stay in sync. Reads can go to any copy, but writes always go to the primary. This lets you handle 10x more read traffic without the primary breaking a sweat.

Types of Replication

| Type | How It Works | Use Case |
|---|---|---|
| Streaming Replication | Primary streams WAL to replicas in real-time | High availability, read scaling |
| Logical Replication | Replicates specific tables using publish/subscribe | Selective sync, cross-version migration |
| Synchronous | Primary waits for replica to confirm write | Zero data loss (slower writes) |
| Asynchronous | Primary doesn’t wait (default) | Better performance (tiny lag OK) |

Streaming Replication (Most Common)

Configure the Primary Server

# In postgresql.conf on the PRIMARY:
wal_level = replica
max_wal_senders = 5           # how many replicas can connect
wal_keep_size = 1GB           # keep WAL around for slow replicas

-- Then, in psql on the primary, create a replication user:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password_here';

Allow Replication Connections (pg_hba.conf)

# In pg_hba.conf on the PRIMARY:
# TYPE  DATABASE        USER          ADDRESS         METHOD
host    replication     replicator    192.168.1.0/24  scram-sha-256

Create the Replica with pg_basebackup

# On the REPLICA server:
pg_basebackup -h primary-server -U replicator -D /var/lib/postgresql/16/main -Fp -Xs -P -R

# The -R flag creates standby.signal and sets primary_conninfo
# This tells the replica "I'm a replica, connect to this primary"

Start the Replica — It Automatically Streams WAL

# Start PostgreSQL on the replica
sudo systemctl start postgresql

# On the PRIMARY — check connected replicas:
SELECT client_addr, state, sent_lsn, write_lsn, replay_lsn
FROM pg_stat_replication;

Monitoring Replication Lag

-- On the PRIMARY: check how far behind each replica is
SELECT client_addr,
       state,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replay_lag,
       NOW() - backend_start AS connection_duration
FROM pg_stat_replication;

-- On the REPLICA: check lag from replica's perspective
SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;

Logical Replication (Selective Table Sync)

-- On the PRIMARY: create a publication
CREATE PUBLICATION orders_pub FOR TABLE orders, products;

-- On the SUBSCRIBER (another PostgreSQL instance):
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary-server dbname=mystore user=replicator password=...'
    PUBLICATION orders_pub;
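To verify the subscription is connected and receiving changes, the subscriber exposes its own stats view:

```sql
-- On the SUBSCRIBER: is the subscription receiving data from the publisher?
SELECT subname, received_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```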

🌎 Real-World Replication Setup

Read replicas for scaling: Your web app sends all writes to the primary, but reads (which are usually 80-90% of traffic) go to replicas. This is how Netflix, Instagram, and Uber handle billions of database queries — lots of read replicas behind a load balancer.

High availability: If the primary dies, a replica can be promoted to become the new primary in seconds. Tools like Patroni, repmgr, and pg_auto_failover automate this failover process.

5️⃣ Security: Roles & Permissions

🧒 ELI5 — Hotel Key Cards

In a hotel, you get a key card that opens only your room. You can’t open other guests’ rooms, the manager’s office, or the server room. The front desk (admin) can program any key card to open any door. Housekeeping gets a card that opens all rooms but can’t access the safe. That’s exactly how PostgreSQL roles and permissions work.

Each person (or application) gets a role with specific permissions. The principle is simple: give the minimum permissions needed (principle of least privilege). Your app doesn’t need to DROP tables in production, so don’t give it that power!

Creating Roles

-- Create a role (user) with a password and login ability
CREATE ROLE app_user WITH LOGIN PASSWORD 'strong_password_here';

-- Create a read-only role (no login — it's a group)
CREATE ROLE readonly_group;

-- Create a read-write role (group)
CREATE ROLE readwrite_group;

-- Create an admin role with CREATEDB ability
CREATE ROLE db_admin WITH LOGIN PASSWORD 'admin_pass' CREATEDB;

GRANT & REVOKE Permissions

-- Grant read-only access to the readonly_group
GRANT CONNECT ON DATABASE mystore TO readonly_group;
GRANT USAGE ON SCHEMA public TO readonly_group;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_group;

-- Automatically grant SELECT on future tables too
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO readonly_group;

-- Grant read-write access
GRANT CONNECT ON DATABASE mystore TO readwrite_group;
GRANT USAGE ON SCHEMA public TO readwrite_group;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO readwrite_group;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO readwrite_group;

-- Add a user to a group
GRANT readonly_group TO app_user;

-- Revoke a permission
REVOKE DELETE ON orders FROM readwrite_group;
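A quick way to double-check what your GRANTs and REVOKEs actually produced is the standard information_schema views (shown here for the orders table):

```sql
-- Which roles hold which privileges on orders?
SELECT grantee, privilege_type
FROM information_schema.table_privileges
WHERE table_name = 'orders'
ORDER BY grantee, privilege_type;
```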

Role Hierarchy Example

| Role | Can SELECT? | Can INSERT/UPDATE? | Can DELETE? | Can DROP? |
|---|---|---|---|---|
| readonly_group | ✅ | ❌ | ❌ | ❌ |
| readwrite_group | ✅ | ✅ | ✅ | ❌ |
| db_admin | ✅ | ✅ | ✅ | ✅ |
| analytics_team | ✅ | ❌ | ❌ | ❌ |

Row Level Security (RLS) — Row-by-Row Control

-- RLS lets you control which ROWS a user can see, not just which tables

-- Enable RLS on the orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Policy: users can only see their own orders
CREATE POLICY user_sees_own_orders ON orders
    FOR SELECT
    USING (user_id = current_setting('app.current_user_id')::INTEGER);

-- Policy: users can only insert orders for themselves
CREATE POLICY user_inserts_own_orders ON orders
    FOR INSERT
    WITH CHECK (user_id = current_setting('app.current_user_id')::INTEGER);

-- Admin policy: admins can see everything
CREATE POLICY admin_full_access ON orders
    FOR ALL
    TO db_admin
    USING (TRUE);
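The policies above read app.current_user_id, a custom session setting that the application must provide on each connection. A minimal sketch of how a session would use it:

```sql
-- The app sets the current user's id at the start of each session or transaction
SET app.current_user_id = '42';

-- Now SELECTs on orders return only rows where user_id = 42,
-- and INSERTs with a different user_id are rejected by the WITH CHECK policy
SELECT * FROM orders;
```

Note that by default RLS is not applied to superusers or the table owner; use ALTER TABLE ... FORCE ROW LEVEL SECURITY if the owner should be restricted too.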

🌎 RLS in Action: Supabase

If you use Supabase (the popular open-source Firebase alternative), you’re using RLS every day! Supabase’s security model is built entirely on PostgreSQL’s Row Level Security. When a user logs in and queries their data through the Supabase API, RLS policies ensure they can only see and modify their own data — even though they’re hitting the database directly.

pg_hba.conf — Connection Authentication

# pg_hba.conf controls WHO can connect and HOW they authenticate
# TYPE    DATABASE    USER          ADDRESS           METHOD

# Local connections (Unix socket) — use peer auth
local   all         postgres                          peer

# Local network — password authentication
host    all         all           127.0.0.1/32        scram-sha-256
host    all         all           ::1/128             scram-sha-256

# Application servers on the internal network
host    mystore     app_user      192.168.1.0/24      scram-sha-256

# Replication connections
host    replication replicator    192.168.1.0/24      scram-sha-256

# DANGER: never do this in production!
# host  all         all           0.0.0.0/0           trust

⚠️ Security Best Practices

  • Never use the postgres superuser for application connections
  • Use scram-sha-256 authentication (not md5 or trust)
  • Restrict IP addresses in pg_hba.conf — don’t allow 0.0.0.0/0
  • Rotate passwords regularly and use secrets managers
  • Use SSL/TLS for all remote connections (ssl = on in postgresql.conf)
  • Principle of least privilege: give only the permissions each role actually needs

6️⃣ Configuration Tuning

🧒 ELI5 — Tuning a Race Car

Out of the box, PostgreSQL is configured for a small laptop — like driving a Ferrari in first gear. The default settings are extremely conservative to work on any machine. For production, you need to adjust the settings to match your server’s actual RAM, CPU, and storage. It’s like adjusting the seat, mirrors, and gear ratios for the actual driver and track.

The Big Five Settings (postgresql.conf)

| Setting | Default | What It Does | Recommended |
|---|---|---|---|
| shared_buffers | 128MB | RAM for caching data pages | 25% of total RAM |
| work_mem | 4MB | RAM per sort/hash operation | 50-256MB (depends on connections) |
| maintenance_work_mem | 64MB | RAM for VACUUM, CREATE INDEX | 512MB – 2GB |
| effective_cache_size | 4GB | Planner’s estimate of available cache | 50-75% of total RAM |
| max_connections | 100 | Maximum concurrent connections | 100-300 (use pooling!) |

Example: Tuning for a 32GB RAM Server

# postgresql.conf tuning for a 32GB RAM, SSD server

# Memory
shared_buffers = '8GB'              # 25% of 32GB
work_mem = '128MB'                  # per sort/hash operation
maintenance_work_mem = '1GB'        # for VACUUM and CREATE INDEX
effective_cache_size = '24GB'       # 75% of 32GB

# Connections
max_connections = 200               # use connection pooling!

# WAL
wal_buffers = '64MB'                # WAL buffer size
min_wal_size = '1GB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9  # spread checkpoint over 90% of interval

# Planner
random_page_cost = 1.1              # SSD (default 4.0 is for spinning disks)
effective_io_concurrency = 200      # SSD can handle parallel IO

# Logging
log_min_duration_statement = 1000   # log queries slower than 1 second
log_line_prefix = '%t [%p] %u@%d ' # timestamp, pid, user, database

💡 Pro Tip — Use PGTune

Don’t guess! Use PGTune — a free online tool. Enter your server’s RAM, CPU count, storage type, and application type. It generates optimal postgresql.conf settings instantly. It’s what most PostgreSQL professionals use as a starting point.

pg_stat_statements — Find Slow Queries

-- Enable the extension (add to postgresql.conf first):
-- shared_preload_libraries = 'pg_stat_statements'

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries (by total time)
SELECT
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    calls,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    ROUND(rows::numeric / NULLIF(calls, 0), 0) AS avg_rows,
    LEFT(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Top 10 most frequently called queries
SELECT
    calls,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    LEFT(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics (do this periodically)
SELECT pg_stat_statements_reset();

Connection Pooling with PgBouncer

The problem: each PostgreSQL connection uses ~5-10MB of RAM, so 200 connections × 10MB = 2GB just for connections! The solution: PgBouncer sits between your app and PostgreSQL. Your app opens 200 connections to PgBouncer, but PgBouncer uses only 20 connections to PostgreSQL. Connections are reused and shared — like a carpool!

# pgbouncer.ini (simplified)
[databases]
mystore = host=127.0.0.1 port=5432 dbname=mystore

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = scram-sha-256
pool_mode = transaction     # release connection after each transaction
default_pool_size = 20      # 20 actual PostgreSQL connections
max_client_conn = 500       # 500 app connections allowed
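With this config, clients simply connect to PgBouncer’s port 6432 instead of PostgreSQL’s 5432 — nothing else about the connection changes:

```shell
# Connect through PgBouncer rather than directly to PostgreSQL
psql -h 127.0.0.1 -p 6432 -U app_user -d mystore
```

One caveat of transaction pool mode: session-level features (prepared statements, SET, advisory locks) need care, because consecutive transactions from the same client may run on different server connections.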

🌎 Why Connection Pooling Matters

Without PgBouncer, a spike to 500 simultaneous users means 500 PostgreSQL connections, eating up 5GB of RAM and causing the database to crawl. With PgBouncer in “transaction mode,” those 500 users share just 20-50 actual connections, because most connections are idle between queries. Amazon RDS, Supabase, and Railway all use connection pooling by default.

7️⃣ Monitoring — Watching Your Database

🧒 ELI5 — The Dashboard on Your Car

When you drive a car, you have a dashboard showing speed, fuel level, engine temperature, and warning lights. You don’t need to be a mechanic to notice when the temperature gauge is in the red zone. PostgreSQL has its own dashboard — system views that tell you what’s happening right now, what’s slow, and what needs attention.

pg_stat_activity — Who’s Doing What Right Now?

-- See all active connections and what they're doing
SELECT
    pid,
    usename AS user,
    datname AS database,
    state,
    NOW() - query_start AS duration,
    wait_event_type,
    LEFT(query, 60) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

Finding Long-Running Queries

-- Queries running for more than 5 minutes
SELECT
    pid,
    usename,
    NOW() - query_start AS duration,
    state,
    query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < NOW() - INTERVAL '5 minutes'
ORDER BY query_start;

-- Kill a specific runaway query (use with caution!)
SELECT pg_cancel_backend(12345);    -- graceful: cancel the query
SELECT pg_terminate_backend(12345); -- forceful: terminate the connection
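Rather than hunting down runaways after the fact, you can also cap query time up front. A sketch using the app_user role from the security section:

```sql
-- Any query from app_user that runs longer than 30 seconds is cancelled
ALTER ROLE app_user SET statement_timeout = '30s';

-- Or per-session, for a one-off risky job:
SET statement_timeout = '5min';
```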

pg_stat_user_tables — Table Health

-- Table health overview: reads, writes, dead rows, vacuums
SELECT
    relname AS table_name,
    seq_scan AS full_scans,
    idx_scan AS index_scans,
    n_tup_ins AS inserts,
    n_tup_upd AS updates,
    n_tup_del AS deletes,
    n_dead_tup AS dead_rows,
    last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

⚠️ Red Flag: seq_scan >> idx_scan

If a table shows far more sequential scans (full table scans) than index scans, it means queries are reading the entire table instead of using indexes. This is the #1 indicator that you need to add an index. For example, if orders has 500,000 seq_scans and only 1,000 idx_scans, something is very wrong.

Lock Monitoring

-- Find blocked queries and who is blocking them
-- pg_blocking_pids() (PostgreSQL 9.6+) does the lock analysis for us
SELECT
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    LEFT(blocked.query, 50) AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    LEFT(blocking.query, 50) AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
    ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));

Database Size Monitoring

-- Total database size
SELECT pg_size_pretty(pg_database_size('mystore')) AS db_size;
-- Result: 2.4 GB

-- Size of each table (including indexes and toast)
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Unused indexes (candidates for removal)
SELECT
    indexrelname AS index_name,
    relname AS table_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

💡 Pro Tip — Monitoring Checklist

Set up alerts for: (1) Connections approaching max_connections, (2) Queries running longer than 5 minutes, (3) Replication lag exceeding 1 minute, (4) Disk usage above 80%, (5) Dead rows exceeding 20% on any table. Tools like pgAdmin, Datadog, Grafana + Prometheus, and pg_stat_monitor make this easy.

🏋️ Practice Exercises

Exercise 1: Check Dead Rows

Write a query using pg_stat_user_tables to find tables with more than 10% dead rows. Show table name, live rows, dead rows, and dead percentage.

Solution:
SELECT
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(
        n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2
    ) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) > 0.1
ORDER BY dead_pct DESC;

Exercise 2: Backup and Restore

Write the pg_dump command to backup a database called shopdb in custom format to /backups/shopdb.dump. Then write the command to restore it to a new database called shopdb_restored.

Solution:
# Backup
pg_dump -U postgres -d shopdb -Fc -f /backups/shopdb.dump

# Create the target database first
createdb -U postgres shopdb_restored

# Restore
pg_restore -U postgres -d shopdb_restored /backups/shopdb.dump

Exercise 3: Create Roles

Create a role called analyst that can only SELECT from all tables in the public schema. Create another role called api_service that can SELECT, INSERT, and UPDATE (but NOT DELETE).

Solution:
-- Analyst role (read-only)
CREATE ROLE analyst WITH LOGIN PASSWORD 'analyst_pass';
GRANT CONNECT ON DATABASE mystore TO analyst;
GRANT USAGE ON SCHEMA public TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;

-- API service role (read-write, no delete)
CREATE ROLE api_service WITH LOGIN PASSWORD 'api_pass';
GRANT CONNECT ON DATABASE mystore TO api_service;
GRANT USAGE ON SCHEMA public TO api_service;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO api_service;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO api_service;

Exercise 4: Find Slow Queries

Write a query using pg_stat_statements to find the top 5 queries by average execution time that have been called more than 100 times.

Solution:
SELECT
    calls,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    LEFT(query, 100) AS query_preview
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 5;

📝 Quiz — Test Your Knowledge!

Question 1: What does VACUUM do?

Question 2: What is the Write-Ahead Log (WAL)?

Question 3: Which pg_dump format is recommended for production backups?

Question 4: In streaming replication, which server accepts writes?

Question 5: What should shared_buffers typically be set to?

Question 6: What is the principle of least privilege?

Question 7: What does PgBouncer do?

Question 8: Which view shows currently running queries?

🏆 Course Complete — You Made It!

13 modules completed · 40+ lessons mastered · 100+ SQL examples practiced · 🏅 PostgreSQL Hero!

Congratulations! You’ve gone from “What’s a database?” to understanding production-grade PostgreSQL. Here’s everything you’ve learned across all 13 modules:

| Module | What You Learned |
|---|---|
| 1. Meet PostgreSQL | What databases are, why PostgreSQL, who uses it |
| 2. Installation | Setting up PostgreSQL and DBeaver on any platform |
| 3. SQL Basics | CREATE, INSERT, SELECT, UPDATE, DELETE — the CRUD operations |
| 4. Data Types | INTEGER, VARCHAR, TIMESTAMP, BOOLEAN, constraints (PK, FK, UNIQUE, CHECK) |
| 5. Querying | WHERE, ORDER BY, LIMIT, LIKE, BETWEEN, IN, NULL handling |
| 6. Aggregations | COUNT, SUM, AVG, GROUP BY, HAVING |
| 7. JOINs | INNER, LEFT, RIGHT, FULL OUTER, CROSS, Self JOIN |
| 8. Subqueries & CTEs | Nested queries, WITH clause, recursive CTEs |
| 9. Indexes | B-tree, GIN, GiST, EXPLAIN ANALYZE, query optimization |
| 10. Functions | Built-in functions, CREATE FUNCTION, stored procedures, triggers |
| 11. Transactions | ACID, BEGIN/COMMIT/ROLLBACK, isolation levels, MVCC, deadlocks |
| 12. JSON & Arrays | JSONB operators, GIN indexes, arrays, hybrid data modeling |
| 13. Production | VACUUM, WAL, backup, replication, security, tuning, monitoring |

🔮 Where to Go From Here

Practice: Build a real project — an e-commerce database, a blog platform, or a task management system. Nothing beats hands-on experience.

Advanced topics to explore: Window functions, partitioning, full-text search, PostGIS (geospatial), foreign data wrappers, pgvector (AI embeddings), and database design patterns.

Certifications: Consider the EDB PostgreSQL certification to validate your skills professionally.

Community: Join the PostgreSQL mailing lists, attend PGConf events, and contribute to the ecosystem. The PostgreSQL community is one of the most welcoming in all of tech.

🧒 A Final Message

Databases are the backbone of every application you use — from Instagram to banking to healthcare. By mastering PostgreSQL, you’ve learned a skill that will serve you throughout your entire career, no matter which programming language, framework, or cloud provider you work with. PostgreSQL isn’t going anywhere — it’s been around since 1986 and it’s only getting better. You’ve made an excellent investment in yourself. Now go build something amazing! 🚀
