PostgreSQL is also a document database! Store, query, and index JSON. Plus arrays for multi-value columns.
Imagine you write a letter and photocopy it. The photocopy is JSON — it stores the exact text you wrote, character for character, including extra spaces and the order you put things. Every time someone wants to read one sentence, they have to scan through the whole letter from the beginning.
Now imagine you type that same letter into a computer and it gets organized into a searchable database. That’s JSONB (the “B” stands for Binary). It parses your text, throws away useless whitespace, sorts the keys, and stores it in a special binary format that’s lightning-fast to search. You can even build an index on it!
Bottom line: Almost always use JSONB. It’s faster to query, supports indexing, and only costs a tiny bit more storage.
| Feature | JSON | JSONB |
|---|---|---|
| Storage format | Raw text (exact copy) | Decomposed binary |
| Preserves key order? | Yes | No (sorts keys) |
| Preserves whitespace? | Yes | No (strips it) |
| Duplicate keys? | Keeps all | Keeps last value |
| Write speed | Slightly faster | Slightly slower (must parse) |
| Read/query speed | Slower (must re-parse) | Much faster |
| Supports GIN index? | No | Yes! |
| Containment operators (@>)? | No | Yes |
```sql
-- Quick demo: JSON stores your exact text
SELECT '{"b": 2, "a": 1}'::json;
-- Result: {"b": 2, "a": 1}  (exact text preserved)

-- JSONB parses, sorts keys, strips whitespace
SELECT '{"b": 2, "a": 1}'::jsonb;
-- Result: {"a": 1, "b": 2}  (sorted, clean)
```
Only use JSON (not JSONB) if you need to preserve the exact original text — for example, logging raw API responses where byte-for-byte fidelity matters. In every other case, use JSONB.
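To make the duplicate-key row in the comparison table concrete, here's a quick sketch you can run in any psql session:

```sql
-- json keeps the text verbatim, duplicate keys and all
SELECT '{"a": 1, "a": 2}'::json;
-- Result: {"a": 1, "a": 2}

-- jsonb keeps only the LAST value for a duplicated key
SELECT '{"a": 1, "a": 2}'::jsonb;
-- Result: {"a": 2}
```

If your source data might contain duplicate keys and you need to detect them rather than silently drop them, that's another (rare) reason to stick with plain JSON.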
Every house has a junk drawer where random stuff goes — batteries, tape, keys, weird cables. A JSONB column is like that drawer, except PostgreSQL secretly organizes everything inside so you can find anything instantly. You throw in whatever structure you want, and PostgreSQL handles the rest.
```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    category VARCHAR(100),
    price NUMERIC(10,2),
    attributes JSONB  -- flexible attributes column!
);
```
```sql
-- A laptop has screen_size, ram, storage
INSERT INTO products (name, category, price, attributes) VALUES
('MacBook Pro 16', 'Laptops', 2499.00,
 '{"screen_size": "16 inch", "ram": "32GB", "storage": "1TB SSD", "color": "Space Gray"}');

-- A t-shirt has size, material, pattern — completely different attributes!
INSERT INTO products (name, category, price, attributes) VALUES
('Cotton T-Shirt', 'Clothing', 29.99,
 '{"size": "L", "material": "100% Cotton", "pattern": "Striped", "washable": true}');

-- A blender has wattage, speeds, capacity
INSERT INTO products (name, category, price, attributes) VALUES
('Pro Blender 3000', 'Kitchen', 149.99,
 '{"wattage": 1200, "speeds": 10, "capacity_liters": 2.5, "bpa_free": true}');
```
In a traditional relational design, you’d need separate columns for screen_size, ram, size, material, wattage, etc. — most of which would be NULL for most products. With JSONB, each product stores only the attributes that apply to it. This is exactly how Amazon, Shopify, and eBay handle their product catalogs.
```sql
INSERT INTO products (name, category, price, attributes) VALUES
('Gaming Monitor', 'Monitors', 599.99,
 '{
    "resolution": "4K",
    "refresh_rate": "144Hz",
    "panel": "IPS",
    "ports": ["HDMI 2.1", "DisplayPort 1.4", "USB-C"],
    "dimensions": {
      "width": "27 inch",
      "height": "16 inch",
      "weight_kg": 6.5
    }
 }');
```
Notice how ports is a JSON array and dimensions is a nested object. JSONB handles both beautifully.
Think of JSON operators as different hands you can reach into the junk drawer with. Some grab the whole item (still in its wrapper), some unwrap it first, some can reach deep into nested compartments, and some just check if something’s in there without grabbing it.
| Operator | What It Does | Returns |
|---|---|---|
| -> | Get object field by key (or array element by index) | JSON / JSONB |
| ->> | Get object field by key, as text | TEXT |
| #> | Get value at a nested path | JSON / JSONB |
| #>> | Get value at a nested path, as text | TEXT |
| @> | Does left contain right? | BOOLEAN |
| ? | Does the key exist? | BOOLEAN |
**-> vs ->> — JSON vs Text**

```sql
-- -> returns the value as JSONB (still wrapped)
SELECT name, attributes->'ram' AS ram_jsonb
FROM products
WHERE category = 'Laptops';
-- Result: "32GB" (with quotes — it's still JSONB type)

-- ->> returns the value as plain TEXT (unwrapped)
SELECT name, attributes->>'ram' AS ram_text
FROM products
WHERE category = 'Laptops';
-- Result: 32GB (no quotes — plain text)
```
Use -> (single arrow) when you need to chain operators or keep the JSON type. Use ->> (double arrow) when you need the final text value for display, comparison, or casting to another type. Think: single = still JSON, double = text.
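To see why the single arrow matters for chaining, here's a sketch against the Gaming Monitor row inserted earlier:

```sql
-- Chain -> to walk into the nested object, then ->> for the final text value
SELECT attributes->'dimensions'->>'weight_kg' AS weight
FROM products
WHERE name = 'Gaming Monitor';
-- weight is TEXT: 6.5

-- This would fail: ->> returns TEXT, and TEXT has no -> operator
-- SELECT attributes->>'dimensions'->'weight_kg' FROM products;  -- ERROR
```

The rule of thumb: double arrow only on the last hop of a chain.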
**#> and #>> — Path Navigation**

```sql
-- Reach into nested JSON using a path
-- Get the "width" from the nested "dimensions" object
SELECT name,
       attributes#>'{dimensions, width}'  AS width_jsonb,
       attributes#>>'{dimensions, width}' AS width_text
FROM products
WHERE name = 'Gaming Monitor';
-- width_jsonb: "27 inch" (JSONB)
-- width_text:  27 inch   (TEXT)

-- Get the second port from the "ports" array (0-indexed!)
SELECT attributes#>>'{ports, 1}' AS second_port
FROM products
WHERE name = 'Gaming Monitor';
-- Result: DisplayPort 1.4
```
**@> Containment — “Does This JSON Contain...?”**

```sql
-- Find all products whose attributes contain {"bpa_free": true}
SELECT name, price
FROM products
WHERE attributes @> '{"bpa_free": true}';
-- Finds the blender! This operator is VERY fast with a GIN index.

-- Find laptops with specific RAM
SELECT name, price
FROM products
WHERE attributes @> '{"ram": "32GB"}';
```
**? Existence — “Does This Key Exist?”**

```sql
-- Find all products that HAVE a "screen_size" attribute
SELECT name, category
FROM products
WHERE attributes ? 'screen_size';
-- Returns laptops and monitors (not t-shirts or blenders)

-- ? also has variants:
-- ?| (ANY of these keys exist)
SELECT name FROM products
WHERE attributes ?| ARRAY['wattage', 'screen_size'];

-- ?& (ALL of these keys exist)
SELECT name FROM products
WHERE attributes ?& ARRAY['ram', 'storage'];
```
Now you know how to reach into the drawer. But what if you want to dump the whole drawer onto the table and examine each item one by one? Or find all drawers that contain a specific type of item? That’s what JSON functions do — they let you unpack, transform, and analyze JSON data in powerful ways.
```sql
-- Find products with wattage greater than 1000
SELECT name, price, (attributes->>'wattage')::INTEGER AS watts
FROM products
WHERE (attributes->>'wattage')::INTEGER > 1000;

-- Find products that are washable
SELECT name, price
FROM products
WHERE (attributes->>'washable')::BOOLEAN = TRUE;

-- Partial text match inside JSON
SELECT name, attributes->>'material' AS material
FROM products
WHERE attributes->>'material' LIKE '%Cotton%';
```
The ->> operator always returns TEXT. If you need to compare numbers or booleans, you must cast using ::INTEGER, ::NUMERIC, ::BOOLEAN, etc. Otherwise '1000' > '200' would be false because it compares text alphabetically!
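You can see the pitfall directly with a couple of standalone comparisons:

```sql
-- Text comparison is alphabetical, not numeric: '1' sorts before '2'
SELECT '1000' > '200' AS text_compare;   -- false
SELECT 1000 > 200 AS numeric_compare;    -- true

-- So always cast values pulled out of JSON before numeric comparison
SELECT name
FROM products
WHERE (attributes->>'wattage')::INTEGER > 200;
```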
```sql
-- Create a GIN index on the entire JSONB column
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- Now these queries are FAST (they use the GIN index):
SELECT * FROM products WHERE attributes @> '{"ram": "32GB"}';
SELECT * FROM products WHERE attributes ? 'wattage';

-- For targeting a SPECIFIC key (smaller index if you only query that key):
CREATE INDEX idx_products_attr_ports ON products USING GIN ((attributes->'ports'));
```
```sql
-- Turn each JSON key-value pair into a row
SELECT p.name, kv.key AS attribute_name, kv.value AS attribute_value
FROM products p, jsonb_each(p.attributes) AS kv(key, value)
WHERE p.name = 'MacBook Pro 16';
-- Result:
-- name           | attribute_name | attribute_value
-- MacBook Pro 16 | color          | "Space Gray"
-- MacBook Pro 16 | ram            | "32GB"
-- MacBook Pro 16 | screen_size    | "16 inch"
-- MacBook Pro 16 | storage        | "1TB SSD"
```
```sql
-- Expand the "ports" array into individual rows
SELECT p.name, port.value AS port_name
FROM products p, jsonb_array_elements(p.attributes->'ports') AS port(value)
WHERE p.name = 'Gaming Monitor';
-- Result:
-- name           | port_name
-- Gaming Monitor | "HDMI 2.1"
-- Gaming Monitor | "DisplayPort 1.4"
-- Gaming Monitor | "USB-C"

-- Use jsonb_array_elements_text to get TEXT instead of JSONB:
SELECT p.name, port AS port_name
FROM products p, jsonb_array_elements_text(p.attributes->'ports') AS port
WHERE p.name = 'Gaming Monitor';
```
```sql
-- Convert JSON attributes to proper typed columns
SELECT p.name, p.price, attr.*
FROM products p,
     jsonb_to_record(p.attributes) AS attr(
         ram TEXT,
         storage TEXT,
         screen_size TEXT
     )
WHERE p.category = 'Laptops';
-- Result:
-- name           | price   | ram  | storage | screen_size
-- MacBook Pro 16 | 2499.00 | 32GB | 1TB SSD | 16 inch
```
You can’t erase one word on a printed sticker, but you CAN peel off the old sticker and put on a new one with the word changed. That’s how JSONB modification works: PostgreSQL creates a new version of the JSON with your changes applied. It feels like editing in-place, but under the hood it’s replace-and-swap.
```sql
-- Change the RAM of a laptop from 32GB to 64GB
UPDATE products
SET attributes = jsonb_set(attributes, '{ram}', '"64GB"')
WHERE name = 'MacBook Pro 16';

-- Update a nested value (change the weight)
UPDATE products
SET attributes = jsonb_set(attributes, '{dimensions, weight_kg}', '5.8')
WHERE name = 'Gaming Monitor';

-- Add a NEW key (create_missing = true by default)
UPDATE products
SET attributes = jsonb_set(attributes, '{warranty_years}', '3')
WHERE name = 'MacBook Pro 16';
```
```sql
-- Add multiple new keys at once by merging
UPDATE products
SET attributes = attributes || '{"on_sale": true, "discount_pct": 15}'::jsonb
WHERE name = 'Cotton T-Shirt';

-- If a key already exists, || overwrites it with the new value
UPDATE products
SET attributes = attributes || '{"price_note": "Holiday sale!"}'::jsonb
WHERE category = 'Clothing';
```
```sql
-- Remove a single key
UPDATE products
SET attributes = attributes - 'on_sale'
WHERE name = 'Cotton T-Shirt';

-- Remove multiple keys at once
UPDATE products
SET attributes = attributes - 'discount_pct' - 'price_note'
WHERE name = 'Cotton T-Shirt';

-- Remove a nested path
UPDATE products
SET attributes = attributes #- '{dimensions, weight_kg}'
WHERE name = 'Gaming Monitor';
```
```sql
-- Remove all keys that have null values
SELECT jsonb_strip_nulls(
    '{"name": "Widget", "color": null, "size": "M", "note": null}'::jsonb
);
-- Result: {"name": "Widget", "size": "M"}
```
All JSONB modifications happen within a transaction, so they’re atomic. If you need to update multiple JSON keys, chain them: jsonb_set(jsonb_set(attributes, '{a}', '"x"'), '{b}', '"y"'). Or use the || operator to merge an entire object of changes at once — it’s cleaner.
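Both styles, sketched against the products table from earlier:

```sql
-- Option 1: chained jsonb_set calls to update two keys in one statement
UPDATE products
SET attributes = jsonb_set(
        jsonb_set(attributes, '{ram}', '"64GB"'),
        '{storage}', '"2TB SSD"'
    )
WHERE name = 'MacBook Pro 16';

-- Option 2: merge a whole object of changes with || (usually cleaner)
UPDATE products
SET attributes = attributes || '{"ram": "64GB", "storage": "2TB SSD"}'::jsonb
WHERE name = 'MacBook Pro 16';
```

One caveat: || is a shallow merge. If a top-level key holds a nested object, || replaces that entire object rather than merging into it, so use jsonb_set for targeted nested updates.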
Normally, a column holds one value per row — like one name, one price, one date. But what if a blog post has multiple tags? Or a recipe has multiple ingredients? You COULD create a separate “tags” table... but sometimes an array column is simpler. It’s like a cell in a spreadsheet that can hold a comma-separated list, except PostgreSQL actually understands and can search inside it.
```sql
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title VARCHAR(300) NOT NULL,
    tags TEXT[],            -- array of text strings
    ratings INTEGER[],      -- array of integers
    scores NUMERIC(5,2)[]   -- array of decimals
);
```
```sql
-- Method 1: ARRAY constructor
INSERT INTO articles (title, tags, ratings) VALUES
('PostgreSQL JSON Guide',
 ARRAY['postgresql', 'json', 'database', 'tutorial'],
 ARRAY[5, 4, 5, 3, 5]);

-- Method 2: Curly-brace syntax
INSERT INTO articles (title, tags, ratings) VALUES
('Learning SQL Joins', '{sql, joins, beginner}', '{4, 5, 4, 4}');

-- Method 3: Cast a string
INSERT INTO articles (title, tags, ratings) VALUES
('Advanced Indexing',
 '{indexing, performance, b-tree, gin}'::TEXT[],
 '{5, 5, 4, 5, 5}'::INTEGER[]);
```
| Operator / Function | What It Does | Example |
|---|---|---|
| ANY(array) | Value equals any element | WHERE 'sql' = ANY(tags) |
| ALL(array) | Value equals every element | WHERE 5 = ALL(ratings) |
| @> | Array contains another array | WHERE tags @> ARRAY['sql'] |
| <@ | Array is contained by another | WHERE ARRAY['sql'] <@ tags |
| && | Arrays overlap (any common element) | WHERE tags && ARRAY['json','sql'] |
| array_length | Length of the array | array_length(tags, 1) |
| unnest | Expand array into rows | SELECT unnest(tags) |
| array_agg | Aggregate rows into array | SELECT array_agg(name) |
```sql
-- Find articles tagged with 'postgresql'
SELECT title, tags FROM articles WHERE 'postgresql' = ANY(tags);

-- Find articles tagged with BOTH 'sql' AND 'beginner'
SELECT title, tags FROM articles WHERE tags @> ARRAY['sql', 'beginner'];

-- Find articles with ANY overlap with these tags
SELECT title, tags FROM articles WHERE tags && ARRAY['json', 'performance'];

-- Access a specific array element (1-indexed!)
SELECT title, tags[1] AS first_tag FROM articles;

-- Get array length
SELECT title, array_length(tags, 1) AS num_tags FROM articles;
```
```sql
-- Turn each tag into its own row
SELECT a.title, t.tag
FROM articles a, unnest(a.tags) AS t(tag);

-- Count how many articles use each tag
SELECT tag, COUNT(*) AS article_count
FROM articles, unnest(tags) AS tag
GROUP BY tag
ORDER BY article_count DESC;
```
```sql
-- Combine all employee names in each department into one array
SELECT department, array_agg(name ORDER BY name) AS employees
FROM employees
GROUP BY department;
-- Result:
-- department  | employees
-- Engineering | {Alice, Bob, Carol}
-- Marketing   | {Dave, Eve}
```
Unlike most programming languages (where arrays start at 0), PostgreSQL arrays start at index 1. So tags[1] is the first element, tags[2] is the second, and so on. This catches many programmers off guard!
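A quick sketch of the indexing rules, runnable without any tables:

```sql
-- PostgreSQL arrays are 1-indexed
SELECT (ARRAY['a', 'b', 'c'])[1];   -- 'a', not 'b'

-- Slices use [start:end], inclusive on BOTH ends
SELECT (ARRAY['a', 'b', 'c', 'd'])[2:3];   -- {b,c}

-- Out-of-range access returns NULL rather than raising an error
SELECT (ARRAY['a', 'b', 'c'])[10];   -- NULL
```

That last behavior is another common surprise: an off-by-one bug produces silent NULLs, not an exception.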
Knowing JSON operators is like knowing how to chop, sauté, and boil. But real cooking means combining techniques into actual dishes. Here are battle-tested patterns that professional developers use every day with JSON and arrays in PostgreSQL.
```sql
CREATE TABLE blog_posts (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    tags TEXT[] DEFAULT '{}'
);

-- GIN index for fast tag queries
CREATE INDEX idx_posts_tags ON blog_posts USING GIN (tags);

-- Find posts with a specific tag
-- (@> can use the GIN index; 'python' = ANY(tags) cannot)
SELECT title FROM blog_posts WHERE tags @> ARRAY['python'];

-- Add a tag to an existing post
UPDATE blog_posts SET tags = array_append(tags, 'featured') WHERE id = 1;

-- Remove a tag
UPDATE blog_posts SET tags = array_remove(tags, 'draft') WHERE id = 1;
```
```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    preferences JSONB DEFAULT '{
        "theme": "light",
        "notifications": true,
        "language": "en",
        "timezone": "UTC"
    }'::jsonb
);

-- Update just the theme preference
UPDATE users
SET preferences = jsonb_set(preferences, '{theme}', '"dark"')
WHERE id = 1;

-- Toggle notifications off
UPDATE users
SET preferences = jsonb_set(preferences, '{notifications}', 'false')
WHERE id = 1;

-- Find users who prefer dark theme
SELECT email FROM users WHERE preferences @> '{"theme": "dark"}';
```
```sql
CREATE TABLE api_logs (
    id BIGSERIAL PRIMARY KEY,
    endpoint VARCHAR(500),
    method VARCHAR(10),
    status_code INTEGER,
    request_body JSONB,
    response_body JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Find recent failed API calls that reported an error message
SELECT endpoint, status_code, response_body->>'error_message' AS error
FROM api_logs
WHERE status_code >= 400
  AND response_body ? 'error_message'
ORDER BY created_at DESC
LIMIT 20;
```
```sql
-- The "hybrid" approach: fixed columns for common fields,
-- JSONB for variable attributes
CREATE TABLE catalog (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(300) NOT NULL,
    category VARCHAR(100) NOT NULL,
    price NUMERIC(12,2) NOT NULL,
    in_stock BOOLEAN DEFAULT TRUE,
    specs JSONB DEFAULT '{}',      -- variable attributes
    images TEXT[] DEFAULT '{}',    -- multiple image URLs
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_catalog_specs ON catalog USING GIN (specs);

-- Search for laptops with exactly 16GB RAM under $2000
SELECT name, price, specs->>'ram' AS ram
FROM catalog
WHERE category = 'Laptops'
  AND specs @> '{"ram": "16GB"}'
  AND price < 2000;
```
Amazon and Shopify use this exact hybrid pattern. Core product data (name, price, SKU) lives in normal columns with proper types and indexes. Category-specific specs (screen size for monitors, thread count for sheets, engine size for cars) live in JSONB. It’s the best of both worlds: relational structure where it matters, flexibility where you need it.
The biggest mistake developers make is storing everything in JSON because it feels easy. “Just throw it all in a JSONB column!” This turns your relational database into a slow, unstructured mess. JSON is powerful, but it’s a tool, not a crutch.
A suitcase (JSON) is great for travel — you throw in whatever you need, it’s flexible. But you don’t use a suitcase as your permanent closet at home. For everyday clothes, a wardrobe with organized shelves (regular columns) is much better — you can find things faster, things stay neat, and you know exactly what’s in each drawer. Use suitcases for trips (variable data), wardrobes for daily life (fixed-structure data).
| Scenario | Use JSON? | Why |
|---|---|---|
| Every user has a name, email, birthday | ✗ No | Fixed structure — use VARCHAR, DATE columns |
| Every order has a total, status, created_at | ✗ No | Known fields that need constraints & indexing |
| You frequently JOIN on or GROUP BY the field | ✗ No | JOINs on JSON fields are slow and awkward |
| The field needs UNIQUE or FOREIGN KEY constraints | ✗ No | JSON fields can’t have FK/UNIQUE constraints |
| Product specs vary wildly by category | ✔ Yes | Different products have different attributes |
| Storing user preferences/settings | ✔ Yes | Highly variable, rarely JOINed on |
| Logging raw API requests/responses | ✔ Yes | Unpredictable structure, mostly for auditing |
| Metadata that changes frequently | ✔ Yes | Avoids constant ALTER TABLE migrations |
The best practice is the hybrid approach: use regular typed columns for data that’s well-defined and frequently queried (name, price, status, dates). Use JSONB for truly variable, supplementary data (custom attributes, preferences, raw logs). This gives you the speed and integrity of relational columns where it counts, and the flexibility of JSON where you need it.
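One mitigation worth knowing: although JSONB columns can't carry FOREIGN KEY or UNIQUE constraints directly, a CHECK constraint can still enforce a minimal shape on the document. A sketch (the table and key names here are hypothetical):

```sql
-- Enforce that prefs is always a JSON object, not an array or scalar,
-- using PostgreSQL's jsonb_typeof function in a CHECK constraint
CREATE TABLE user_settings (
    id SERIAL PRIMARY KEY,
    prefs JSONB NOT NULL DEFAULT '{}'::jsonb,
    CONSTRAINT prefs_is_object CHECK (jsonb_typeof(prefs) = 'object')
);

-- This insert would now fail with a check-constraint violation:
-- INSERT INTO user_settings (prefs) VALUES ('[1, 2, 3]'::jsonb);
```

This keeps the flexibility of JSONB while ruling out the most common class of malformed data.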
Create a table called events with columns: id (SERIAL PK), event_name (VARCHAR), metadata (JSONB), and tags (TEXT[]). Insert an event named "Tech Conference" with metadata {"location": "NYC", "capacity": 500, "virtual": true} and tags ['tech', 'conference', '2025'].
```sql
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    event_name VARCHAR(300),
    metadata JSONB,
    tags TEXT[]
);

INSERT INTO events (event_name, metadata, tags) VALUES
('Tech Conference',
 '{"location": "NYC", "capacity": 500, "virtual": true}',
 ARRAY['tech', 'conference', '2025']);
```
Write a query that finds all products where the attributes JSONB column contains {"bpa_free": true}. Also select the product name and price.
```sql
SELECT name, price
FROM products
WHERE attributes @> '{"bpa_free": true}';
```
Update the "Gaming Monitor" product to add a key "hdr" with value true to its attributes. Then remove the "panel" key from its attributes.
```sql
-- Add the "hdr" key
UPDATE products
SET attributes = jsonb_set(attributes, '{hdr}', 'true')
WHERE name = 'Gaming Monitor';

-- Remove the "panel" key
UPDATE products
SET attributes = attributes - 'panel'
WHERE name = 'Gaming Monitor';
```
Find all articles that have the tag 'postgresql'. Then write a query that counts how many articles each tag appears in, ordered by popularity.
```sql
-- Articles with 'postgresql' tag
SELECT title, tags FROM articles WHERE 'postgresql' = ANY(tags);

-- Tag popularity
SELECT tag, COUNT(*) AS article_count
FROM articles, unnest(tags) AS tag
GROUP BY tag
ORDER BY article_count DESC;
```
You now have the skills to use PostgreSQL as both a relational and a document database. Here’s what you learned:
| Topic | Key Takeaway |
|---|---|
| JSON vs JSONB | Almost always use JSONB — faster queries, supports indexing |
| Storing JSON | Perfect for flexible, variable-structure data like product attributes |
| JSON Operators | -> (JSON), ->> (text), @> (containment), ? (key exists) |
| Querying | GIN index + containment operators = blazing-fast JSON queries |
| Modifying | jsonb_set, || (merge), - (remove key) |
| Arrays | ANY, @>, &&, unnest, array_agg — multi-value columns |
| Best Practice | Hybrid approach: regular columns for fixed data, JSONB for variable data |
You’ve learned how to build databases. Now it’s time to learn how to run them in the real world. The next module covers VACUUM (cleaning dead rows), WAL (crash recovery), backup & restore, replication (database clones), security (roles & permissions), and performance tuning. This is the module that turns you from a SQL developer into a database administrator.