LESSON 5 · CORE

๐Ÿ—๏ธ Models & Materializations โ€” The Heart of dbt

Every SQL file is a model. Every model becomes a table or view. Let's master this.

📄 What is a Model?

A model is just a SQL SELECT statement saved in a file. That's it. Seriously, that's the whole thing.

dbt takes your SELECT and turns it into a real table or view in your warehouse. It's like writing a recipe on a card: dbt reads the card and cooks the dish for you. You write the instructions, dbt does the cooking!

Let's look at the simplest possible model. Create a file called my_first_model.sql inside your models/ folder:

models/my_first_model.sql
-- This entire file IS the model. Nothing else needed!

SELECT
    1 AS id,
    'Hello dbt!' AS greeting

When you run dbt run, dbt reads this file and executes something like:

What dbt actually runs in your warehouse
CREATE VIEW my_first_model AS (
    SELECT
        1 AS id,
        'Hello dbt!' AS greeting
);

The Golden Rule of dbt Models:

📄 One .sql file = One model = One table or view in your warehouse

If you have 50 SQL files in your models/ folder, running dbt run creates 50 tables/views. It's that simple. No config files, no registration, no boilerplate: just write SQL and go.

Here's a more realistic model, a staging model that cleans up raw customer data:

models/staging/stg_customers.sql
SELECT
    id            AS customer_id,
    first_name,
    last_name,
    email_addr    AS email,
    created_at::DATE AS signup_date,
    CASE
        WHEN status = 'A' THEN 'active'
        WHEN status = 'I' THEN 'inactive'
        ELSE 'unknown'
    END AS customer_status
FROM
    {{ source('raw', 'customers') }}

Notice there's no CREATE TABLE or CREATE VIEW in the file. You only write the SELECT. dbt wraps it in the appropriate DDL for you based on the materialization you choose. That's the magic!
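To make that wrapping concrete, here's roughly the DDL dbt would run if stg_customers were configured as a table instead of a view. This is a sketch: the exact syntax varies by warehouse, the dev schema name is an assumption, and dbt actually builds into a temporary relation and swaps it in atomically.

```sql
-- Rough sketch of what dbt runs for a table materialization.
-- Your SELECT is wrapped in CREATE TABLE ... AS:
CREATE TABLE dev.stg_customers AS (
    SELECT
        id            AS customer_id,
        first_name,
        last_name,
        email_addr    AS email,
        created_at::DATE AS signup_date
    FROM raw.customers
);
```

Swap CREATE TABLE for CREATE VIEW and you have the view materialization; the SELECT itself never changes.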

🧱 Materializations: How dbt Builds Your Models

A "materialization" is just a fancy word for HOW dbt creates your model in the warehouse. Think of it like choosing how to serve food:

🪟 View = Look through a window (query runs fresh every time)

📸 Table = Print a photo (data is stored physically)

📸+ Incremental = Add new photos to an album (only process new data)

📝 Ephemeral = A sticky note (exists only temporarily)

📷 Snapshot = A daily photo diary (tracks changes over time)

dbt has 5 materialization types. Choosing the right one is one of the most important decisions you'll make. Let's deep-dive into each one.

🪟 Materialization 1: View (Default)

A view is like a window into your data. Every time someone looks through the window, the query runs fresh. Nothing is stored; it's just a saved query. Imagine a window in your house that always shows the live street outside. You don't take a photo of the street; you just look through the window whenever you want to see what's happening right now.

models/staging/stg_orders.sql
{{ config(materialized='view') }}

-- This creates a VIEW in your warehouse
-- The query runs fresh every time someone queries stg_orders

SELECT
    id         AS order_id,
    user_id    AS customer_id,
    order_date,
    status,
    amount / 100.0 AS amount_dollars
FROM
    {{ source('raw', 'orders') }}

When to use views:

✅ Pros

  • No storage cost (nothing is physically stored)
  • Always shows the latest data
  • Builds instantly (just saves the query)
  • Perfect for staging layers

โŒ Cons

  • Slow to query (runs the full SQL every time)
  • Stacks up: if view A references view B which references view C, querying A runs all three
  • Not suitable for dashboards with many users

📸 Materialization 2: Table

A table is like printing a photo. The data is physically stored in your warehouse. It's fast to look at (no waiting for the camera to take a new picture), but you need to reprint the entire photo (rebuild the table) whenever the underlying data changes.

models/marts/fct_daily_revenue.sql
{{ config(materialized='table') }}

-- This creates a physical TABLE in your warehouse
-- Data is stored โ€” queries are fast!

SELECT
    DATE_TRUNC('day', order_date) AS revenue_date,
    COUNT(*) AS total_orders,
    SUM(amount_dollars) AS total_revenue,
    AVG(amount_dollars) AS avg_order_value
FROM
    {{ ref('stg_orders') }}
WHERE
    status = 'completed'
GROUP BY 1

When to use tables:

✅ Pros

  • Fast to query (data is pre-computed)
  • Great for dashboards and BI tools
  • No cascading query chains

โŒ Cons

  • Uses warehouse storage
  • Full rebuild every dbt run (drops and recreates)
  • Rebuild can be slow for large datasets

View vs Table: The Restaurant Analogy

🪟 View = A made-to-order dish. The chef cooks it fresh every time someone orders. Great for quality, but slow if 100 people order at once.

📸 Table = A buffet dish. The chef cooks a big batch in advance. Everyone grabs a plate instantly. But if the recipe changes, you need to cook a whole new batch.

📸+ Materialization 3: Incremental

Imagine you have a photo album with 10,000 photos. Every day, you take 50 new photos. Would you reprint ALL 10,000 photos every single day? Of course not! You'd only print the 50 new ones and add them to the album.

That's exactly what incremental models do. Instead of rebuilding the entire table every run, dbt only processes the new or changed rows and inserts/merges them. This saves massive amounts of time and compute cost.

models/marts/fct_events.sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'
) }}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    page_url,
    device_type
FROM
    {{ ref('stg_events') }}

-- This WHERE clause is the magic!
-- It is SKIPPED on the first build (full initial load)
-- On every later run, it filters to only NEW rows
{% if is_incremental() %}
    WHERE event_timestamp > (
        SELECT MAX(event_timestamp)
        FROM {{ this }}
    )
{% endif %}

How It Works: Step by Step

Full Refresh vs Incremental: Visual Comparison

โŒ Table (Full Rebuild)

Day 1: Process 10,000 rows
Day 2: Process 10,050 rows
Day 3: Process 10,100 rows
Day 4: Process 10,150 rows
Total processed: 40,300 rows

✅ Incremental

Day 1: Process 10,000 rows (first run)
Day 2: Process 50 new rows
Day 3: Process 50 new rows
Day 4: Process 50 new rows
Total processed: 10,150 rows

That's a 75% reduction in processing! For tables with billions of rows, this is the difference between minutes and hours.

The 3 Incremental Strategies

Strategy      | How It Works                                                   | Best For
append        | Simply adds new rows to the table. No duplicate checking.      | Event logs, immutable data (rows never change)
merge         | Inserts new rows AND updates existing rows (using unique_key). | Data that can be updated (orders, user profiles)
delete+insert | Deletes matching rows first, then inserts the new version.     | When merge isn't supported or you need a clean swap
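As a quick sketch of the append strategy, here's what an append-only incremental model for immutable log data might look like. The model and column names (stg_logs, log_id, logged_at) are illustrative, not from this project:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

-- Append-only: no unique_key needed, because log rows never change
SELECT
    log_id,
    logged_at,
    message
FROM {{ ref('stg_logs') }}

{% if is_incremental() %}
    -- Only pick up rows newer than what's already in the table
    WHERE logged_at > (SELECT MAX(logged_at) FROM {{ this }})
{% endif %}
```

Because append never checks for duplicates, use it only when upstream rows are truly immutable; otherwise reach for merge with a unique_key.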

The Full-Refresh Escape Hatch: If your incremental model gets out of sync or you change the logic, you can force a complete rebuild with:

dbt run --full-refresh -s fct_events

This drops the existing table and rebuilds it from scratch, like reprinting the entire photo album. Use it when you change column definitions or fix bugs in your incremental logic.

The {{ this }} variable is special: it refers to the current model's table in the warehouse. So SELECT MAX(event_timestamp) FROM {{ this }} gets the latest timestamp already in your table, and you only process rows newer than that.
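To see how the pieces resolve, here's roughly what the fct_events model compiles to on a subsequent (incremental) run. The dev schema name is an assumption; your target schema may differ:

```sql
-- Sketch of fct_events compiled on an incremental run:
-- {{ ref('stg_events') }} and {{ this }} resolve to real relations
SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    page_url,
    device_type
FROM dev.stg_events
WHERE event_timestamp > (
    SELECT MAX(event_timestamp)
    FROM dev.fct_events   -- this is {{ this }}: the table being updated
)
```

On the very first build the is_incremental() block is skipped, so the WHERE clause disappears entirely and the full history loads.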

๐Ÿ“ Materialization 4: Ephemeral

An ephemeral model is like a sticky note you use while cooking but throw away after. You write "chop onions, mince garlic" on a sticky note as a reminder, but you don't frame it and hang it on the wall. It exists only to help you while you're cooking.

In dbt, an ephemeral model doesn't create anything in your warehouse. Instead, it gets injected as a CTE (Common Table Expression) inside whatever model references it. It's invisible to your warehouse; only dbt knows it exists.

models/intermediate/int_order_amounts.sql
{{ config(materialized='ephemeral') }}

-- This model won't create a table or view!
-- It becomes a CTE inside any model that ref()s it

SELECT
    order_id,
    customer_id,
    amount_dollars,
    amount_dollars * 0.1 AS tax_amount,
    amount_dollars * 1.1 AS total_with_tax
FROM
    {{ ref('stg_orders') }}

When another model references this ephemeral model, dbt compiles it as a CTE:

What dbt actually compiles (behind the scenes)
-- dbt injects the ephemeral model as a CTE
WITH int_order_amounts AS (
    SELECT
        order_id,
        customer_id,
        amount_dollars,
        amount_dollars * 0.1 AS tax_amount,
        amount_dollars * 1.1 AS total_with_tax
    FROM dev.stg_orders  -- what {{ ref('stg_orders') }} compiles to
)
SELECT * FROM int_order_amounts
WHERE total_with_tax > 100

When to use ephemeral: sparingly, and only for truly intermediate calculations.

Careful! You can't query an ephemeral model directly in your SQL editor (it doesn't exist in the warehouse), and you can't run dbt test on it directly.

📷 Materialization 5: Snapshot

Snapshots are like taking a photo of your fridge every day. Monday: 6 eggs, whole milk, cheddar cheese. Tuesday: 4 eggs, whole milk, swiss cheese. The milk brand didn't change, but the cheese did! By looking at all your daily photos, you can see exactly what changed and when.

In data terms, this is called Slowly Changing Dimension Type 2 (SCD Type 2). Don't let the fancy name scare you โ€” it just means "keep a history of how rows change over time."

snapshots/snapshot_customers.sql
{% snapshot snapshot_customers %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

SELECT
    customer_id,
    email,
    plan_type,
    updated_at
FROM {{ source('raw', 'customers') }}

{% endsnapshot %}
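The timestamp strategy above relies on a trustworthy updated_at column. When a source lacks one, dbt also offers a check strategy that compares the listed columns between runs. A sketch with a hypothetical products source (product names and columns are illustrative):

```sql
{% snapshot snapshot_products %}

{{
    config(
        target_schema='snapshots',
        unique_key='product_id',
        strategy='check',
        check_cols=['price', 'category'],
    )
}}

-- A new history row is written whenever price or category changes
SELECT
    product_id,
    price,
    category
FROM {{ source('raw', 'products') }}

{% endsnapshot %}
```

Prefer timestamp when a reliable updated_at exists; it's cheaper because dbt compares one column instead of several.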

SCD Type 2 โ€” How It Tracks Changes

Imagine customer #42 changes their email and upgrades their plan. Here's what the snapshot table looks like over time:

customer_id | email         | plan_type | dbt_valid_from | dbt_valid_to
42          | old@email.com | free      | 2024-01-01     | 2024-03-15
42          | old@email.com | pro       | 2024-03-15     | 2024-06-20
42          | new@email.com | pro       | 2024-06-20     | null (current)

The row with dbt_valid_to = null is always the current version. Previous versions have an end date. It's like a time machine for your data: you can ask "What was customer 42's email on February 1st?" and get the answer!
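Answering that "time machine" question is just a range filter on the validity columns. A sketch, using the schema and snapshot name from the config above:

```sql
-- What was customer 42's email on 2024-02-01?
SELECT email
FROM snapshots.snapshot_customers
WHERE customer_id = 42
  AND dbt_valid_from <= '2024-02-01'
  AND (dbt_valid_to > '2024-02-01' OR dbt_valid_to IS NULL)
```

Against the example rows above, this returns old@email.com, because that version was valid from 2024-01-01 until 2024-03-15.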

When to use snapshots:

📊 Tracking customer plan changes for churn analysis

💰 Tracking product price changes over time

🏢 Tracking employee department transfers

📦 Tracking order status changes (pending → shipped → delivered)

🌳 The Materialization Decision Tree

Not sure which materialization to pick? Follow this flowchart:

Start Here: What kind of model?

🔍 Is it a staging model? (light cleaning, renaming) → View

📊 Is the table huge? (>1M rows: events, logs, transactions) → Incremental

📷 Do you need historical tracking? (what changed and when) → Snapshot

📝 Is it only used inside other models? (helper logic, no direct queries) → Ephemeral

🍽️ Otherwise: mart models and dashboards that need to be fast to query → Table

When in doubt, start with view (it's the default). If queries are too slow, upgrade to table. If the table is too big to rebuild every run, upgrade to incremental. You can always change the materialization later; it's just one line of config!

๐Ÿ›๏ธ The 3-Layer Architecture

Professional dbt projects organize models into three layers, like a restaurant kitchen with three stations. Raw ingredients come in, get prepped, get cooked, and come out as a beautiful dish. Every model belongs to one of these layers.

🥬 Staging (Prep Station)
Clean, rename, cast
materialized = view
→
🍳 Intermediate (Cooking Station)
Join, combine, enrich
materialized = ephemeral / view
→
🍽️ Marts (Plating Station)
Final tables for dashboards
materialized = table / incremental
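In practice you rarely repeat the config() line in every file. You can set layer-wide materialization defaults in dbt_project.yml and override per model only when needed. A sketch (the project name my_project is a placeholder for your own):

```yaml
# dbt_project.yml (excerpt): folder-level materialization defaults.
# Folder keys mirror subdirectories of models/.
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

A {{ config(materialized='...') }} line inside a model always wins over these folder defaults.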

🥬 Layer 1: Staging (stg_)

Purpose: One staging model per source table. Clean up the raw data: rename ugly columns, cast data types, filter out junk. No joins, no aggregations, no business logic.

Think of it like washing vegetables. You don't cook them yet. You just wash off the dirt, peel the skin, and chop them into uniform pieces so they're ready for the next station.

models/staging/stg_customers.sql
{{ config(materialized='view') }}

SELECT
    id            AS customer_id,
    first_name,
    last_name,
    LOWER(email_addr)  AS email,
    created_at::DATE AS signup_date,
    CASE
        WHEN status = 'A' THEN 'active'
        WHEN status = 'I' THEN 'inactive'
        ELSE 'unknown'
    END AS customer_status
FROM
    {{ source('raw', 'customers') }}

๐Ÿณ Layer 2: Intermediate (int_)

Purpose: Combine staging models. Join tables, apply business logic, create calculated fields. This is where the real cooking happens.

models/intermediate/int_customer_orders.sql
{{ config(materialized='ephemeral') }}

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    c.signup_date,
    o.order_id,
    o.order_date,
    o.amount_dollars,
    o.status AS order_status
FROM
    {{ ref('stg_customers') }} c
LEFT JOIN
    {{ ref('stg_orders') }} o
    ON c.customer_id = o.customer_id

๐Ÿฝ๏ธ Layer 3: Marts (fct_ and dim_)

Purpose: The final, polished tables that your dashboards and analysts query. Marts come in two flavors:

📊 Fact Tables (fct_)

Events that happened: orders placed, payments made, clicks recorded. They have numbers you can count and sum.

Examples: fct_orders, fct_payments, fct_page_views

👤 Dimension Tables (dim_)

Descriptive attributes: customer details, product info, store locations. They describe the "who, what, where" of your facts.

Examples: dim_customers, dim_products, dim_stores

models/marts/fct_customer_summary.sql
{{ config(materialized='table') }}

SELECT
    customer_id,
    first_name,
    last_name,
    email,
    signup_date,
    COUNT(order_id) AS total_orders,
    SUM(amount_dollars) AS lifetime_revenue,
    AVG(amount_dollars) AS avg_order_value,
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS last_order_date,
    CASE
        WHEN COUNT(order_id) >= 10 THEN 'vip'
        WHEN COUNT(order_id) >= 5  THEN 'regular'
        WHEN COUNT(order_id) >= 1  THEN 'new'
        ELSE 'prospect'
    END AS customer_tier
FROM
    {{ ref('int_customer_orders') }}
GROUP BY
    customer_id, first_name, last_name, email, signup_date

Naming Conventions Cheat Sheet

Prefix | Layer             | Materialization     | Example             | Purpose
stg_   | Staging           | View                | stg_customers       | Clean raw data, rename columns
int_   | Intermediate      | Ephemeral / View    | int_customer_orders | Join and combine staging models
fct_   | Marts (Fact)      | Table / Incremental | fct_orders          | Event/transaction tables with metrics
dim_   | Marts (Dimension) | Table               | dim_customers       | Descriptive lookup tables

Real-World Example: E-Commerce Data Flow

Let's trace how raw e-commerce data flows through all 3 layers:

📦 Raw Sources
→
🥬 stg_customers, stg_orders, stg_products
→
🍳 int_customer_orders, int_order_products
→
🍽️ fct_orders, dim_customers, dim_products
→
📊 Dashboards & Reports

๐Ÿ› ๏ธ Hands-On: Build Your First 3 Models

Time to get your hands dirty! Let's build a complete 3-layer pipeline from scratch. We'll create a staging model, an intermediate model, and a mart model.

Step 1: Create the Staging Model

Create the file models/staging/stg_customers.sql:

models/staging/stg_customers.sql
{{ config(materialized='view') }}

SELECT
    id            AS customer_id,
    first_name,
    last_name,
    LOWER(email)   AS email,
    created_at::DATE AS signup_date
FROM
    {{ source('raw', 'customers') }}

Step 2: Create the Intermediate Model

Create the file models/intermediate/int_customer_orders.sql:

models/intermediate/int_customer_orders.sql
{{ config(materialized='ephemeral') }}

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    c.signup_date,
    o.order_id,
    o.order_date,
    o.amount_dollars
FROM
    {{ ref('stg_customers') }} c
LEFT JOIN
    {{ ref('stg_orders') }} o
    ON c.customer_id = o.customer_id

Step 3: Create the Mart Model

Create the file models/marts/fct_customer_summary.sql:

models/marts/fct_customer_summary.sql
{{ config(materialized='table') }}

SELECT
    customer_id,
    first_name,
    last_name,
    email,
    signup_date,
    COUNT(order_id)       AS total_orders,
    COALESCE(SUM(amount_dollars), 0) AS lifetime_revenue,
    COALESCE(AVG(amount_dollars), 0) AS avg_order_value,
    MIN(order_date)       AS first_order_date,
    MAX(order_date)       AS last_order_date
FROM
    {{ ref('int_customer_orders') }}
GROUP BY
    customer_id, first_name, last_name, email, signup_date

Step 4: Run It!

Open your terminal and run:

Terminal
$ dbt run

Running with dbt=1.7.0
Found 3 models, 0 tests, 0 snapshots, 0 seeds

Concurrency: 4 threads (target='dev')

1 of 2 START view model dev.stg_customers .............. [RUN]
1 of 2 OK created view model dev.stg_customers .......... [CREATE VIEW in 0.12s]

2 of 2 START table model dev.fct_customer_summary ........ [RUN]
2 of 2 OK created table model dev.fct_customer_summary ... [SELECT 1,247 in 0.45s]

Finished running 1 view model, 1 table model in 0 hours 0 minutes and 0.89 seconds (0.89s).
Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Notice the output says TOTAL=2, not 3! That's because int_customer_orders is ephemeral: it doesn't create anything in the warehouse. It was silently injected as a CTE inside fct_customer_summary. Sneaky and efficient!

📊 Materialization Comparison Table

Here's the ultimate cheat sheet. Bookmark this โ€” you'll come back to it often!

Feature               | View                      | Table               | Incremental          | Ephemeral    | Snapshot
Creates in warehouse? | Yes (view)                | Yes (table)         | Yes (table)          | No           | Yes (table)
Stores data?          | No                        | Yes                 | Yes                  | No           | Yes
Build speed           | Instant                   | Slow (full rebuild) | Fast (new rows only) | N/A          | Fast (changed rows)
Query speed           | Slow                      | Fast                | Fast                 | N/A          | Fast
Storage cost          | None                      | High                | High                 | None         | High
Tracks history?       | No                        | No                  | No                   | No           | Yes
Best for              | Staging, light transforms | Marts, dashboards   | Large fact tables    | Helper CTEs  | SCD Type 2
Typical layer         | Staging                   | Marts               | Marts                | Intermediate | Snapshots folder

🧠 Quick Quiz: Test Your Understanding!

Let's see if you've mastered models and materializations. Pick the answer you think is correct:

Question 1: What is a dbt model?

A) A Python script that transforms data
B) A SQL SELECT statement saved in a .sql file
C) A YAML configuration file
D) A stored procedure in the warehouse

Question 2: Which materialization should you use for a staging model?

A) Table
B) View
C) Incremental
D) Snapshot

Question 3: You have a table with 500 million event rows that grows by 1 million per day. Which materialization saves the most compute?

A) View
B) Table
C) Incremental
D) Ephemeral

Question 4: What does an ephemeral model create in the warehouse?

A) A view
B) A table
C) A temporary table
D) Nothing (it becomes a CTE inside other models)

Can you explain the difference between a view and a table to a non-technical friend? Try it out loud! If you can explain it using the "window vs photo" analogy, you've truly understood it.

  • A model = a SQL SELECT in a .sql file → becomes a table or view
  • View = saved query, runs fresh every time (staging layer)
  • Table = physically stored data, fast to query (mart layer)
  • Incremental = only processes new/changed rows (large fact tables)
  • 3-Layer Architecture: staging (stg_) → intermediate (int_) → marts (fct_ / dim_)
  • {{ config(materialized='...') }} at the top of a model sets its type
  • Naming conventions: stg_, int_, fct_, dim_
  • Ephemeral = invisible CTE, doesn't create warehouse objects
  • Snapshot = SCD Type 2 history tracking (lives in snapshots/ folder)
  • Incremental strategies: append, merge, delete+insert
  • {{ this }} = refers to the current model's existing table
  • --full-refresh flag to force a complete rebuild of incremental models