LESSON 4 · BEGINNER

๐Ÿ“ dbt Project Structure โ€” Every File Explained

Understanding your dbt project is like learning the layout of a new kitchen — once you know where everything is, cooking becomes effortless.

🏠 The Big Picture

A dbt project is like a well-organized kitchen. Every drawer has a purpose, every shelf holds specific items. The spoons don't go in the fridge, and the milk doesn't go in the cutlery drawer!

Right now, you're standing at the kitchen door for the first time. Let's open every drawer and see what's inside. By the end of this lesson, you'll know exactly where everything goes — and why.

Your Entire dbt Project at a Glance

When you run dbt init my_project, dbt creates a folder structure that looks like this. Don't panic — we'll explain every single line:

๐Ÿ“ my_dbt_project/ ๐Ÿ“„ dbt_project.yml  โ† The recipe book cover ๐Ÿ“„ packages.yml  โ† Your shopping list for add-ons ๐Ÿ“ models/  โ† Where the magic happens (SQL files) ๐Ÿ“ staging/  โ† Prep station (wash & chop) ๐Ÿ“„ stg_customers.sql ๐Ÿ“„ stg_orders.sql ๐Ÿ“„ _stg_models.yml ๐Ÿ“ intermediate/  โ† Cooking station (combine) ๐Ÿ“„ int_orders_joined.sql ๐Ÿ“ marts/  โ† Plating station (serve!) ๐Ÿ“„ fct_orders.sql ๐Ÿ“„ dim_customers.sql ๐Ÿ“„ _mart_models.yml ๐Ÿ“ macros/  โ† Secret recipes (reusable SQL) ๐Ÿ“„ cents_to_dollars.sql ๐Ÿ“ seeds/  โ† Pantry staples (CSV data) ๐Ÿ“„ country_codes.csv ๐Ÿ“ snapshots/  โ† Food diary (point-in-time) ๐Ÿ“„ snapshot_orders.sql ๐Ÿ“ tests/  โ† Quality control (custom tests) ๐Ÿ“„ assert_positive_revenue.sql ๐Ÿ“ analyses/  โ† Scratch pad (ad-hoc queries) ๐Ÿ“„ monthly_revenue_check.sql

Spot the file-type pattern: .sql files hold transformations, .yml files hold configuration and tests, and .csv files hold data. You'll see this pattern everywhere in dbt!

Think of this folder tree as a map of your kitchen. Each folder is a different station, and each file is a specific tool or ingredient. Let's tour every station now.

📖 dbt_project.yml — The Recipe Book Cover

This is like the cover page of your recipe book — it says what the book is called, who wrote it, and how the recipes are organized. Every dbt project must have this file. Without it, dbt doesn't even know it's looking at a dbt project!

This is the single most important file in your project. It lives at the root of your project folder and tells dbt everything it needs to know about your project. Let's look at a real example:

dbt_project.yml
# ── What is this project called? ──
name: 'my_online_store'       # Your project's unique name (no spaces!)
version: '1.0.0'              # Version number (like a book edition)

# ── Which kitchen are we cooking in? ──
profile: 'my_online_store'    # Links to your profiles.yml (warehouse connection)

# ── Where are things stored? ──
model-paths: ["models"]       # Where dbt looks for SQL models
analysis-paths: ["analyses"]  # Where ad-hoc queries live
test-paths: ["tests"]         # Where custom tests live
seed-paths: ["seeds"]         # Where CSV seed files live
macro-paths: ["macros"]       # Where reusable macros live
snapshot-paths: ["snapshots"] # Where snapshot definitions live

# ── How should models be built by default? ──
models:
  my_online_store:
    staging:
      +materialized: view      # Staging models = lightweight views
    intermediate:
      +materialized: view      # Intermediate = also views
    marts:
      +materialized: table     # Marts = full tables (fast for dashboards!)

Key Settings Explained

| Setting       | What It Does                                             | Kitchen Analogy                                        |
|---------------|----------------------------------------------------------|--------------------------------------------------------|
| name          | Unique identifier for your project                       | The title on your recipe book cover                    |
| version       | Tracks which edition of your project this is             | "2nd Edition" on the book spine                        |
| profile       | Links to your warehouse credentials in profiles.yml      | Which kitchen (warehouse) you're cooking in            |
| model-paths   | Tells dbt where to find your SQL models                  | Which shelf has the recipe cards                       |
| +materialized | How dbt should build models (view, table, incremental)   | Should we serve on a plate (table) or a tray (view)?   |

The name field must match the top-level key under the models: section. If your project is called my_online_store, then the models: config must also start with my_online_store:. Mismatch = errors!
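These folder-level defaults are only defaults: any individual model can override them with a config() block at the top of its file, which takes precedence over dbt_project.yml. A minimal sketch (the model and source names here are hypothetical):

```sql
-- models/staging/stg_payments.sql (hypothetical model)
-- Built as a table even though the staging folder defaults to view
{{ config(materialized='table') }}

SELECT
    id AS payment_id,
    amount_cents
FROM
    {{ source('raw', 'payments') }}
```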

✨ models/ — Where the Magic Happens

This is the heart and soul of your dbt project. Every SQL file in this folder is a "model" — a transformation that turns raw data into something useful. Most of your time as a dbt developer will be spent here.

Think of the models/ folder as the main cooking area of your restaurant kitchen. It has three stations, and every dish passes through them in order — just like an assembly line!

The 3-Layer Architecture

Professional dbt projects organize models into three layers. This isn't just a suggestion — it's the industry-standard best practice used by companies like Spotify, GitLab, and Shopify.

🥬 Staging (Prep Station: wash, chop, rename)  →  🍳 Intermediate (Cooking Station: combine, join, enrich)  →  🍽️ Marts (Plating Station: ready to serve!)

Restaurant analogy — making a Caesar salad:

🥬 Staging (Prep Station) — Wash the lettuce, chop the croutons, grate the parmesan. You're just cleaning and renaming raw ingredients. No mixing yet!

🍳 Intermediate (Cooking Station) — Toss the lettuce with dressing, mix in croutons and parmesan. You're combining prepped ingredients into something new.

🍽️ Marts (Plating Station) — Arrange beautifully on a plate, add a lemon wedge. This is the final dish that goes to the customer (your dashboard).

🥬 Layer 1: Staging Models (stg_)

Staging models are your first touch on raw data. They do simple, boring-but-essential work: rename columns, cast data types, and apply light cleaning. One staging model per source table.

models/staging/stg_customers.sql
-- Staging model: clean up raw customer data
-- One staging model per source table. Keep it simple!

SELECT
    id          AS customer_id,
    first_name,
    last_name,
    email_addr  AS email,           -- Rename for clarity
    created_at::DATE AS signup_date  -- Cast to date
FROM
    {{ source('raw', 'customers') }}

๐Ÿณ Layer 2: Intermediate Models (int_)

Intermediate models combine staging models. They join tables, aggregate data, and create business logic. Think of them as the "cooking" step.

models/intermediate/int_orders_joined.sql
-- Intermediate: join customers with their orders

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_id,
    c.first_name,
    c.email
FROM
    {{ ref('stg_orders') }} o
LEFT JOIN
    {{ ref('stg_customers') }} c
    ON o.customer_id = c.customer_id

๐Ÿฝ๏ธ Layer 3: Mart Models (fct_ and dim_)

Mart models are the final, polished tables that your dashboards and analysts query directly. They come in two flavors:

📊 Fact Tables (fct_)

Events that happened — orders placed, payments made, clicks recorded. They have numbers you can count and sum.

👤 Dimension Tables (dim_)

Descriptive attributes — customer details, product info, store locations. They describe the "who, what, where" of your facts.

models/marts/fct_orders.sql
-- Mart: final orders fact table for dashboards

SELECT
    order_id,
    customer_id,
    order_date,
    amount,
    CASE
        WHEN amount > 100 THEN 'high_value'
        WHEN amount > 50  THEN 'medium_value'
        ELSE 'low_value'
    END AS order_tier
FROM
    {{ ref('int_orders_joined') }}
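The folder tree also lists dim_customers.sql, which the lesson doesn't show. A minimal dimension model could be as simple as this sketch, reusing the columns from stg_customers above:

```sql
-- Mart: customer dimension table (illustrative sketch)
-- One row per customer, descriptive attributes only

SELECT
    customer_id,
    first_name,
    last_name,
    email,
    signup_date
FROM
    {{ ref('stg_customers') }}
```

Notice there's no aggregation or CASE logic here: dimensions describe things, while facts measure events.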

Naming Conventions Cheat Sheet

| Prefix | Layer             | Example           | Purpose                                |
|--------|-------------------|-------------------|----------------------------------------|
| stg_   | Staging           | stg_customers     | Clean raw data, rename columns         |
| int_   | Intermediate      | int_orders_joined | Join and combine staging models        |
| fct_   | Marts (Fact)      | fct_orders        | Event/transaction tables with metrics  |
| dim_   | Marts (Dimension) | dim_customers     | Descriptive lookup tables              |

Naming conventions aren't just cosmetic — they let anyone on your team instantly understand what a model does just by reading its name. stg_orders? That's a staging model for orders. fct_revenue? That's a final fact table for revenue. No guessing needed!

🧙 macros/ — Your Secret Recipes

Macros are like those cooking shortcuts your grandma taught you — instead of writing the same 10 steps every time you make gravy, you just say "make gravy the grandma way" and everyone knows what to do. Write once, reuse everywhere!

A macro is a reusable piece of SQL written with Jinja (a templating language). You define it once in the macros/ folder and then call it from any model. This keeps your code DRY (Don't Repeat Yourself).

macros/cents_to_dollars.sql
-- Macro: convert cents to dollars (reusable!)

{% macro cents_to_dollars(column_name) %}
    (ROUND({{ column_name }} / 100.0, 2))
{% endmacro %}

Now you can use it in any model like this:

Using the macro in a model
SELECT
    order_id,
    {{ cents_to_dollars('amount_cents') }} AS amount_dollars
FROM
    {{ ref('stg_payments') }}

Without a macro: You write ROUND(amount_cents / 100.0, 2) in 15 different models. One day the boss says "use 4 decimal places." You have to find and change all 15 files. 😱

With a macro: You change the formula in one file (cents_to_dollars.sql), and all 15 models are instantly updated. 🎉
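If you're curious what the warehouse actually receives, dbt expands the macro at compile time. The compiled SQL for the model above looks roughly like this (the analytics schema name is illustrative; yours depends on your profile settings):

```sql
-- Compiled output (illustrative: schema/table names will differ)
SELECT
    order_id,
    (ROUND(amount_cents / 100.0, 2)) AS amount_dollars
FROM
    analytics.stg_payments
```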

🧂 seeds/ — Your Pantry Staples

Seeds are like the salt, pepper, and spices in your pantry — small, rarely changing reference data that you always keep handy. You don't buy new salt every day; it just sits there, ready when you need it.

Seeds are CSV files that dbt loads directly into your warehouse as tables. They're perfect for small, static reference data that doesn't come from a source system.

seeds/country_codes.csv
country_code,country_name,region
US,United States,North America
GB,United Kingdom,Europe
IN,India,Asia
AU,Australia,Oceania
BR,Brazil,South America

Run dbt seed and this CSV becomes a real table in your warehouse that you can join with any model using {{ ref('country_codes') }}.
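From there, the seed behaves like any other model. As a sketch, a model could enrich customers with region names, assuming a country_code column exists on stg_customers (that column is hypothetical, not part of the earlier example):

```sql
-- Illustrative: join customers to the country_codes seed
SELECT
    c.customer_id,
    c.first_name,
    cc.country_name,
    cc.region
FROM
    {{ ref('stg_customers') }} c
LEFT JOIN
    {{ ref('country_codes') }} cc
    ON c.country_code = cc.country_code   -- country_code column is assumed
```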

When to Use Seeds vs. Sources

🌱 Use Seeds When...

  • Data is small (< 1,000 rows)
  • Data rarely changes
  • Data doesn't come from a source system
  • Examples: country codes, status mappings, category labels

📡 Use Sources When...

  • Data is large or growing
  • Data changes frequently
  • Data comes from apps, APIs, or databases
  • Examples: orders, users, transactions, events

📸 snapshots/ — Your Food Diary

Snapshots are like taking a photo of your fridge every day. Yesterday there were 6 eggs. Today there are 4. Tomorrow there might be 12 (you went shopping!). By looking at all the photos, you can see exactly what changed and when.

In the data world, this is called Slowly Changing Dimension Type 2 (SCD Type 2). Sounds scary, but it's simple: dbt keeps a history of how rows change over time.

snapshots/snapshot_orders.sql
{% snapshot snapshot_orders %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

SELECT * FROM {{ source('raw', 'orders') }}

{% endsnapshot %}

How SCD Type 2 Works — Visually

Imagine a customer changes their email address. Here's what the snapshot table looks like:

| customer_id | email         | dbt_valid_from | dbt_valid_to   |
|-------------|---------------|----------------|----------------|
| 42          | old@email.com | 2024-01-01     | 2024-06-15     |
| 42          | new@email.com | 2024-06-15     | null (current) |

The row with dbt_valid_to = null is always the current version. Previous versions have an end date, so you can see the full history. It's like a time machine for your data!
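That null convention also makes "give me the current state" a one-line filter. A sketch, assuming the snapshot_orders snapshot above has been built:

```sql
-- Current version of every order: the row whose validity is still open
SELECT *
FROM {{ ref('snapshot_orders') }}
WHERE dbt_valid_to IS NULL
```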

🧪 tests/ — Quality Control

Tests are like the health inspector visiting your kitchen — they check that everything is safe, clean, and correct before any food reaches a customer. If something is wrong, the inspector shuts things down and tells you exactly what to fix.

The tests/ folder holds custom SQL tests — queries that should return zero rows if everything is correct. If any rows come back, the test fails (something is wrong!).

tests/assert_positive_revenue.sql
-- This test FAILS if any order has negative revenue
-- (A passing test returns zero rows)

SELECT
    order_id,
    amount
FROM
    {{ ref('fct_orders') }}
WHERE
    amount < 0

Two types of tests in dbt:

📋 Schema tests — Defined in YAML files (e.g., unique, not_null). Quick, declarative, no SQL needed.

📝 Custom tests — SQL files in the tests/ folder. For complex business logic that schema tests can't cover.

Run dbt test after every dbt run. Think of it as tasting your food before serving it. Never skip this step!

๐Ÿ“ analyses/ โ€” Your Scratch Pad

This is your notepad where you try new recipe ideas before adding them to the official recipe book. Maybe you want to experiment with a new sauce โ€” you scribble notes here first. If it works, you promote it to a real model. If not, no harm done!

Analysis files are SQL files that dbt compiles but does NOT run. They're perfect for one-off questions, sanity checks, and exploratory queries you want to keep in version control:

analyses/monthly_revenue_check.sql
-- Quick check: monthly revenue trend
-- This compiles (so ref() works) but doesn't create a table

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_revenue,
    COUNT(*) AS order_count
FROM
    {{ ref('fct_orders') }}
GROUP BY 1
ORDER BY 1 DESC

The key difference: models create tables/views in your warehouse. Analyses just compile SQL (resolving ref() and source()) so you can copy-paste the compiled query into your SQL editor. They're for humans, not for the warehouse.

๐Ÿท๏ธ schema.yml โ€” The Label Maker

Imagine you have a label maker in your kitchen. You stick labels on every jar: "Sugar", "Salt", "Flour โ€” do NOT confuse with powdered sugar!" The schema.yml file is your label maker โ€” it describes every model, every column, and adds tests to make sure nothing goes wrong.

Schema YAML files (usually named _models.yml or schema.yml) live alongside your models and serve three purposes:

  • 📖 Documentation — Describe what each model and column means
  • 🧪 Testing — Define tests like unique, not_null, accepted_values
  • 🔗 Relationships — Define how models connect to each other

models/marts/_mart_models.yml
version: 2

models:
  - name: fct_orders
    description: "Final orders fact table. One row per order."
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - unique            # No duplicate order IDs!
          - not_null          # Every row must have an ID

      - name: amount
        description: "Order total in dollars"
        tests:
          - not_null

      - name: order_tier
        description: "Classification: high_value, medium_value, low_value"
        tests:
          - accepted_values:
              values: ['high_value', 'medium_value', 'low_value']

  - name: dim_customers
    description: "Customer dimension table. One row per customer."
    columns:
      - name: customer_id
        description: "Unique customer identifier"
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id

How it all connects: When you run dbt test, dbt reads these YAML files and automatically generates SQL queries to check every test you defined. It's like having a health inspector who reads your labels and verifies every jar actually contains what it says.
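For example, the unique test on order_id compiles to a query in the same "zero rows = pass" style as custom tests. Roughly, under the caveat that the exact SQL dbt generates varies by version:

```sql
-- Approximately what dbt runs for `unique` on fct_orders.order_id:
-- any row returned is a duplicated key, so the test fails
SELECT
    order_id,
    COUNT(*) AS n_records
FROM {{ ref('fct_orders') }}
GROUP BY order_id
HAVING COUNT(*) > 1
```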

🧠 Quick Quiz — Test Your Understanding!

Let's see if you can navigate the dbt kitchen with your eyes closed. Pick the answer you think is correct:

Question 1: Where do your SQL transformation files live?

A) seeds/
B) models/
C) macros/
D) analyses/

Question 2: What naming prefix indicates a staging model?

A) fct_
B) dim_
C) stg_
D) raw_

Question 3: What is the purpose of the seeds/ folder?

A) Store your SQL transformation models
B) Store reusable macros
C) Store small, static CSV reference data
D) Store point-in-time snapshots

Close your eyes and try to draw the dbt project folder tree from memory. Can you name all six main folders and explain what each one does? If you can, you're ready for the next lesson!

  • dbt_project.yml — The main config file; every project must have one
  • models/ — SQL transformations organized in 3 layers: staging → intermediate → marts
  • Naming: stg_ (staging), int_ (intermediate), fct_ (facts), dim_ (dimensions)
  • macros/ — Reusable SQL snippets (write once, use everywhere)
  • seeds/ — Small CSV reference data loaded into the warehouse
  • tests/ — Custom SQL tests that should return zero rows when passing
  • schema.yml — Describes models, columns, and defines tests
  • snapshots/ — SCD Type 2 history tracking (important but advanced)
  • analyses/ — Ad-hoc queries that compile but don't create tables
  • packages.yml — Third-party package management (covered in a later lesson)
  • The specific materialization settings in dbt_project.yml (covered in Models lesson)