Every SQL file is a model. Every model becomes a table or view. Let's master this.
A model is just a SQL SELECT statement saved in a file. That's it. Seriously โ that's the whole thing.
dbt takes your SELECT and turns it into a real table or view in your warehouse. It's like writing a recipe on a card โ dbt reads the card and cooks the dish for you. You write the instructions, dbt does the cooking!
Let's look at the simplest possible model. Create a file called my_first_model.sql inside your models/ folder:
-- This entire file IS the model. Nothing else needed!
SELECT
1 AS id,
'Hello dbt!' AS greeting
When you run dbt run, dbt reads this file and executes something like:
CREATE VIEW my_first_model AS (
SELECT
1 AS id,
'Hello dbt!' AS greeting
);
The Golden Rule of dbt Models:
๐ One .sql file = One model = One table or view in your warehouse
If you have 50 SQL files in your models/ folder, running dbt run creates 50 tables/views. It's that simple. No config files, no registration, no boilerplate โ just write SQL and go.
Here's a more realistic model โ a staging model that cleans up raw customer data:
SELECT
id AS customer_id,
first_name,
last_name,
email_addr AS email,
created_at::DATE AS signup_date,
CASE
WHEN status = 'A' THEN 'active'
WHEN status = 'I' THEN 'inactive'
ELSE 'unknown'
END AS customer_status
FROM
{{ source('raw', 'customers') }}
Notice there's no CREATE TABLE or CREATE VIEW in the file. You only write the SELECT. dbt wraps it in the appropriate DDL for you based on the materialization you choose. That's the magic!
A "materialization" is just a fancy word for HOW dbt creates your model in the warehouse. Think of it like choosing how to serve food:
๐ช View = Look through a window (query runs fresh every time)
๐ธ Table = Print a photo (data is stored physically)
๐ธ+ Incremental = Add new photos to an album (only process new data)
๐ Ephemeral = A sticky note (exists only temporarily)
๐ท Snapshot = A daily photo diary (tracks changes over time)
dbt has 5 materialization types. Choosing the right one is one of the most important decisions you'll make. Let's deep-dive into each one.
A view is like a window into your data. Every time someone looks through the window, the query runs fresh. Nothing is stored โ it's just a saved query. Imagine a window in your house that always shows the live street outside. You don't take a photo of the street; you just look through the window whenever you want to see what's happening right now.
{{ config(materialized='view') }}
-- This creates a VIEW in your warehouse
-- The query runs fresh every time someone queries stg_orders
SELECT
id AS order_id,
user_id AS customer_id,
order_date,
status,
amount / 100.0 AS amount_dollars
FROM
{{ source('raw', 'orders') }}
When to use views:
A table is like printing a photo. The data is physically stored in your warehouse. It's fast to look at (no waiting for the camera to take a new picture), but you need to reprint the entire photo (rebuild the table) whenever the underlying data changes.
{{ config(materialized='table') }}
-- This creates a physical TABLE in your warehouse
-- Data is stored โ queries are fast!
SELECT
DATE_TRUNC('day', order_date) AS revenue_date,
COUNT(*) AS total_orders,
SUM(amount_dollars) AS total_revenue,
AVG(amount_dollars) AS avg_order_value
FROM
{{ ref('stg_orders') }}
WHERE
status = 'completed'
GROUP BY 1
When to use tables:
dbt run (drops and recreates)View vs Table โ The Restaurant Analogy:
๐ช View = A made-to-order dish. The chef cooks it fresh every time someone orders. Great for quality, but slow if 100 people order at once.
๐ธ Table = A buffet dish. The chef cooks a big batch in advance. Everyone grabs a plate instantly. But if the recipe changes, you need to cook a whole new batch.
Imagine you have a photo album with 10,000 photos. Every day, you take 50 new photos. Would you reprint ALL 10,000 photos every single day? Of course not! You'd only print the 50 new ones and add them to the album.
That's exactly what incremental models do. Instead of rebuilding the entire table every run, dbt only processes the new or changed rows and inserts/merges them. This saves massive amounts of time and compute cost.
{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge'
) }}
SELECT
event_id,
user_id,
event_type,
event_timestamp,
page_url,
device_type
FROM
{{ ref('stg_events') }}
-- This WHERE clause is the magic!
-- It only runs on the FIRST build (full load)
-- On subsequent runs, it filters to only NEW rows
{% if is_incremental() %}
WHERE event_timestamp > (
SELECT MAX(event_timestamp)
FROM {{ this }}
)
{% endif %}
Full Refresh vs Incremental โ Visual Comparison
โ Table (Full Rebuild)
โ Incremental
That's a 75% reduction in processing! For tables with billions of rows, this is the difference between minutes and hours.
| Strategy | How It Works | Best For |
|---|---|---|
append |
Simply adds new rows to the table. No duplicate checking. | Event logs, immutable data (rows never change) |
merge |
Inserts new rows AND updates existing rows (using unique_key). |
Data that can be updated (orders, user profiles) |
delete+insert |
Deletes matching rows first, then inserts the new version. | When merge isn't supported or you need a clean swap |
The Full-Refresh Escape Hatch: If your incremental model gets out of sync or you change the logic, you can force a complete rebuild with:
dbt run --full-refresh -s fct_events
This drops the existing table and rebuilds it from scratch โ like reprinting the entire photo album. Use it when you change column definitions or fix bugs in your incremental logic.
The {{ this }} variable is special โ it refers to the current model's table in the warehouse. So SELECT MAX(event_timestamp) FROM {{ this }} gets the latest timestamp already in your table, and you only process rows newer than that.
An ephemeral model is like a sticky note you use while cooking but throw away after. You write "chop onions, mince garlic" on a sticky note as a reminder, but you don't frame it and hang it on the wall. It exists only to help you while you're cooking.
In dbt, an ephemeral model doesn't create anything in your warehouse. Instead, it gets injected as a CTE (Common Table Expression) inside whatever model references it. It's invisible to your warehouse โ only dbt knows it exists.
{{ config(materialized='ephemeral') }}
-- This model won't create a table or view!
-- It becomes a CTE inside any model that ref()s it
SELECT
order_id,
customer_id,
amount_dollars,
amount_dollars * 0.1 AS tax_amount,
amount_dollars * 1.1 AS total_with_tax
FROM
{{ ref('stg_orders') }}
When another model references this ephemeral model, dbt compiles it as a CTE:
-- dbt injects the ephemeral model as a CTE
WITH int_order_amounts AS (
SELECT
order_id,
customer_id,
amount_dollars,
amount_dollars * 0.1 AS tax_amount,
amount_dollars * 1.1 AS total_with_tax
FROM raw.orders
)
SELECT * FROM int_order_amounts
WHERE total_with_tax > 100
When to use ephemeral:
Careful! You can't query an ephemeral model directly in your SQL editor โ it doesn't exist in the warehouse. You also can't use dbt test on it directly. Use ephemeral sparingly and only for truly intermediate calculations.
Snapshots are like taking a photo of your fridge every day. Monday: 6 eggs, whole milk, cheddar cheese. Tuesday: 4 eggs, whole milk, swiss cheese. The milk brand didn't change, but the cheese did! By looking at all your daily photos, you can see exactly what changed and when.
In data terms, this is called Slowly Changing Dimension Type 2 (SCD Type 2). Don't let the fancy name scare you โ it just means "keep a history of how rows change over time."
{% snapshot snapshot_customers %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
)
}}
SELECT
customer_id,
email,
plan_type,
updated_at
FROM {{ source('raw', 'customers') }}
{% endsnapshot %}
Imagine customer #42 changes their email and upgrades their plan. Here's what the snapshot table looks like over time:
| customer_id | plan_type | dbt_valid_from | dbt_valid_to | |
|---|---|---|---|---|
| 42 | old@email.com | free | 2024-01-01 | 2024-03-15 |
| 42 | old@email.com | pro | 2024-03-15 | 2024-06-20 |
| 42 | new@email.com | pro | 2024-06-20 | null (current) |
The row with dbt_valid_to = null is always the current version. Previous versions have an end date. It's like a time machine for your data โ you can ask "What was customer 42's email on February 1st?" and get the answer!
When to use snapshots:
๐ Tracking customer plan changes for churn analysis
๐ฐ Tracking product price changes over time
๐ข Tracking employee department transfers
๐ฆ Tracking order status changes (pending โ shipped โ delivered)
Not sure which materialization to pick? Follow this flowchart:
When in doubt, start with view (it's the default). If queries are too slow, upgrade to table. If the table is too big to rebuild every run, upgrade to incremental. You can always change the materialization later โ it's just one line of config!
Professional dbt projects organize models into three layers, like a restaurant kitchen with three stations. Raw ingredients come in, get prepped, get cooked, and come out as a beautiful dish. Every model belongs to one of these layers.
stg_)Purpose: One staging model per source table. Clean up the raw data โ rename ugly columns, cast data types, filter out junk. No joins, no aggregations, no business logic.
Think of it like washing vegetables. You don't cook them yet. You just wash off the dirt, peel the skin, and chop them into uniform pieces so they're ready for the next station.
{{ config(materialized='view') }}
SELECT
id AS customer_id,
first_name,
last_name,
LOWER(email_addr) AS email,
created_at::DATE AS signup_date,
CASE
WHEN status = 'A' THEN 'active'
WHEN status = 'I' THEN 'inactive'
ELSE 'unknown'
END AS customer_status
FROM
{{ source('raw', 'customers') }}
int_)Purpose: Combine staging models. Join tables, apply business logic, create calculated fields. This is where the real cooking happens.
{{ config(materialized='ephemeral') }}
SELECT
c.customer_id,
c.first_name,
c.last_name,
c.email,
c.signup_date,
o.order_id,
o.order_date,
o.amount_dollars,
o.status AS order_status
FROM
{{ ref('stg_customers') }} c
LEFT JOIN
{{ ref('stg_orders') }} o
ON c.customer_id = o.customer_id
fct_ and dim_)Purpose: The final, polished tables that your dashboards and analysts query. Marts come in two flavors:
fct_)Events that happened โ orders placed, payments made, clicks recorded. They have numbers you can count and sum.
Examples: fct_orders, fct_payments, fct_page_views
dim_)Descriptive attributes โ customer details, product info, store locations. They describe the "who, what, where" of your facts.
Examples: dim_customers, dim_products, dim_stores
{{ config(materialized='table') }}
SELECT
customer_id,
first_name,
last_name,
email,
signup_date,
COUNT(order_id) AS total_orders,
SUM(amount_dollars) AS lifetime_revenue,
AVG(amount_dollars) AS avg_order_value,
MIN(order_date) AS first_order_date,
MAX(order_date) AS last_order_date,
CASE
WHEN COUNT(order_id) >= 10 THEN 'vip'
WHEN COUNT(order_id) >= 5 THEN 'regular'
WHEN COUNT(order_id) >= 1 THEN 'new'
ELSE 'prospect'
END AS customer_tier
FROM
{{ ref('int_customer_orders') }}
GROUP BY
customer_id, first_name, last_name, email, signup_date
| Prefix | Layer | Materialization | Example | Purpose |
|---|---|---|---|---|
stg_ | Staging | View | stg_customers | Clean raw data, rename columns |
int_ | Intermediate | Ephemeral / View | int_customer_orders | Join and combine staging models |
fct_ | Marts (Fact) | Table / Incremental | fct_orders | Event/transaction tables with metrics |
dim_ | Marts (Dimension) | Table | dim_customers | Descriptive lookup tables |
Let's trace how raw e-commerce data flows through all 3 layers:
Time to get your hands dirty! Let's build a complete 3-layer pipeline from scratch. We'll create a staging model, an intermediate model, and a mart model.
Create the file models/staging/stg_customers.sql:
{{ config(materialized='view') }}
SELECT
id AS customer_id,
first_name,
last_name,
LOWER(email) AS email,
created_at::DATE AS signup_date
FROM
{{ source('raw', 'customers') }}
Create the file models/intermediate/int_customer_orders.sql:
{{ config(materialized='ephemeral') }}
SELECT
c.customer_id,
c.first_name,
c.last_name,
c.email,
c.signup_date,
o.order_id,
o.order_date,
o.amount_dollars
FROM
{{ ref('stg_customers') }} c
LEFT JOIN
{{ ref('stg_orders') }} o
ON c.customer_id = o.customer_id
Create the file models/marts/fct_customer_summary.sql:
{{ config(materialized='table') }}
SELECT
customer_id,
first_name,
last_name,
email,
signup_date,
COUNT(order_id) AS total_orders,
COALESCE(SUM(amount_dollars), 0) AS lifetime_revenue,
COALESCE(AVG(amount_dollars), 0) AS avg_order_value,
MIN(order_date) AS first_order_date,
MAX(order_date) AS last_order_date
FROM
{{ ref('int_customer_orders') }}
GROUP BY
customer_id, first_name, last_name, email, signup_date
Open your terminal and run:
$ dbt run
Running with dbt=1.7.0
Found 3 models, 0 tests, 0 snapshots, 0 seeds
Concurrency: 4 threads (target='dev')
1 of 3 START view model dev.stg_customers .............. [RUN]
1 of 3 OK created view model dev.stg_customers .......... [CREATE VIEW in 0.12s]
2 of 3 START table model dev.fct_customer_summary ........ [RUN]
2 of 3 OK created table model dev.fct_customer_summary ... [SELECT 1,247 in 0.45s]
Finished running 1 view model, 1 table model in 0 hours 0 minutes and 0.89 seconds (0.89s).
Completed successfully
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
Notice the output says TOTAL=2, not 3! That's because int_customer_orders is ephemeral โ it doesn't create anything in the warehouse. It was silently injected as a CTE inside fct_customer_summary. Sneaky and efficient!
Here's the ultimate cheat sheet. Bookmark this โ you'll come back to it often!
| Feature | View | Table | Incremental | Ephemeral | Snapshot |
|---|---|---|---|---|---|
| Creates in warehouse? | Yes (view) | Yes (table) | Yes (table) | No | Yes (table) |
| Stores data? | No | Yes | Yes | No | Yes |
| Build speed | Instant | Slow (full rebuild) | Fast (new rows only) | N/A | Fast (changed rows) |
| Query speed | Slow | Fast | Fast | N/A | Fast |
| Storage cost | None | High | High | None | High |
| Tracks history? | No | No | No | No | Yes |
| Best for | Staging, light transforms | Marts, dashboards | Large fact tables | Helper CTEs | SCD Type 2 |
| Typical layer | Staging | Marts | Marts | Intermediate | Snapshots folder |
Let's see if you've mastered models and materializations. Click the answer you think is correct:
Can you explain the difference between a view and a table to a non-technical friend? Try it out loud! If you can explain it using the "window vs photo" analogy, you've truly understood it.
stg_) โ intermediate (int_) โ marts (fct_ / dim_)stg_, int_, fct_, dim_