LESSON 9 ยท INTERMEDIATE

๐Ÿง™ Jinja & Macros โ€” SQL That Writes SQL

The secret superpower that makes dbt 10x more powerful than plain SQL

๐Ÿค” What is Jinja?

Imagine you're writing a birthday card to 100 friends. Instead of writing each card by hand โ€” "Dear Alice, Happy Birthday!", "Dear Bob, Happy Birthday!", "Dear Charlie..." โ€” you create a template:

"Dear {name}, Happy Birthday! You're turning {age}!"

Then you let a machine fill in each name and age automatically. That's Jinja โ€” a templating language that lets you write dynamic SQL. Instead of writing the same SQL 20 times with tiny changes, you write it once and let Jinja do the repetitive work.

Why Does dbt Use Jinja?

Plain SQL is static โ€” what you write is exactly what runs. But real-world data problems need dynamic behavior:

โŒ Plain SQL Problems

Copy-paste the same logic 20 times. Change one thing? Update all 20 files. Want different behavior in dev vs prod? Maintain two separate queries.

โœ… Jinja + SQL Solutions

Write logic once, reuse everywhere. One change updates everything. Dev vs prod? One if statement handles both automatically.

The Three Jinja Delimiters

Jinja uses three special bracket styles. Think of them as three different types of instructions:

Syntax Name What It Does Analogy
{{ }} Expressions Outputs a value (prints something) The blank in a Mad Libs: "I ate a ___"
{% %} Statements Logic โ€” if/else, for loops, macros The instructions: "If rainy, bring umbrella"
{# #} Comments Notes that don't appear in output Sticky notes to yourself: "TODO: fix this"

How dbt Compiles Jinja

When you run dbt run, dbt takes your Jinja+SQL file and compiles it into pure SQL before sending it to your database. Your database never sees the Jinja โ€” it only sees clean SQL.

๐Ÿ“
You Write
Jinja + SQL
โ†’
โš™๏ธ
dbt
Compiles
โ†’
๐Ÿ“„
Pure SQL
Generated
โ†’
๐Ÿ—„๏ธ
Database
Executes
What you write (Jinja + SQL)
SELECT
    customer_id,
    first_name,
    last_name
FROM
    {{ ref('stg_customers') }}
WHERE
    status = 'active'
What dbt compiles it to (pure SQL sent to database)
SELECT
    customer_id,
    first_name,
    last_name
FROM
    analytics.staging.stg_customers
WHERE
    status = 'active'

You can always see the compiled SQL by running dbt compile or checking the target/compiled/ folder. This is incredibly useful for debugging โ€” if your Jinja isn't producing the SQL you expect, look at the compiled output!

๐ŸŽฏ Variables & Filters

Variables are like blanks in a Mad Libs game. You leave a blank: "The ___ jumped over the ___" and someone fills in the words. In dbt, you define variables once and use them everywhere.

Filters are like autocorrect โ€” they automatically clean up or transform text. "bob" becomes "BOB" with the upper filter. No manual work needed!

Project Variables with var()

Define variables in your dbt_project.yml and use them across your entire project:

dbt_project.yml
name: 'my_project'
version: '1.0.0'

vars:
  start_date: '2023-01-01'
  currency: 'USD'
  enable_debug: false
models/marts/fct_orders.sql โ€” Using variables
SELECT
    order_id,
    customer_id,
    amount,
    order_date
FROM
    {{ ref('stg_orders') }}
WHERE
    order_date >= '{{ var('start_date') }}'

{# Compiles to: WHERE order_date >= '2023-01-01' #}

You can also override variables from the command line:

Terminal โ€” Override a variable at runtime
# Use a different start date without changing any files
$ dbt run --vars '{"start_date": "2024-06-01"}'

# Now the query uses 2024-06-01 instead of 2023-01-01

Jinja Filters

Filters transform values using the pipe | symbol. They're chained after a value to modify it:

Filter What It Does Example Result
upperUPPERCASE{{ "hello" | upper }}HELLO
lowerlowercase{{ "HELLO" | lower }}hello
trimRemove whitespace{{ " hi " | trim }}hi
replaceSwap text{{ "cat" | replace("c","b") }}bat
defaultFallback value{{ undefined_var | default("N/A") }}N/A
list | joinCombine list items{{ ["a","b","c"] | join(", ") }}a, b, c

Think of filters like a factory assembly line. A raw material (your value) goes in one end, passes through machines (filters) that shape, polish, and transform it, and a finished product comes out the other end. You can chain multiple filters: {{ " hello " | trim | upper }} โ†’ HELLO

๐Ÿ”€ if/else Logic

It's like a recipe that says: "If you're cooking for 2 people, use 1 cup of rice. If you're cooking for 10 people, use 5 cups." The recipe adapts based on the situation. Jinja's if/else lets your SQL adapt based on the environment, variables, or any condition you want.

The Basic Pattern

Jinja if/else syntax
{% if condition %}
    -- SQL when condition is true
{% else %}
    -- SQL when condition is false
{% endif %}

Use Case: Dev vs Prod

This is the #1 most common use of Jinja if/else in dbt. In development, you don't want to process millions of rows โ€” it's slow and expensive. So you limit the data:

models/marts/fct_orders.sql
SELECT
    order_id,
    customer_id,
    amount,
    order_date
FROM
    {{ ref('stg_orders') }}

{% if target.name == 'dev' %}
WHERE order_date >= DATEADD('day', -30, CURRENT_DATE)
{# In dev: only last 30 days โ€” fast! #}
{% endif %}

{# In prod: no WHERE clause โ€” process ALL data #}

๐Ÿ› ๏ธ In Dev (target.name = 'dev')

SELECT ... FROM stg_orders
WHERE order_date >= DATEADD('day', -30, CURRENT_DATE)

Only 30 days of data โ†’ runs in seconds

๐Ÿš€ In Prod (target.name = 'prod')

SELECT ... FROM stg_orders

All data โ†’ complete and accurate

Use Case: Conditional Columns

models/marts/dim_customers.sql
SELECT
    customer_id,
    first_name,
    last_name,
    email

    {% if var('include_pii', false) %}
    , phone_number
    , home_address
    {# Only include sensitive data if explicitly enabled #}
    {% endif %}

FROM
    {{ ref('stg_customers') }}

The target object gives you information about the current environment. target.name is the profile target (dev, prod, staging). target.schema is the database schema. target.database is the database name. Use these to make your models environment-aware!

๐Ÿ”„ for Loops

Instead of writing SUM(jan_sales), SUM(feb_sales), SUM(mar_sales), ... SUM(dec_sales) by hand โ€” all 12 lines โ€” you write a loop that does it for all 12 months automatically. It's like telling a robot: "Stamp every envelope in this pile" instead of stamping each one yourself.

The Basic Pattern

Jinja for loop syntax
{% for item in list %}
    -- SQL that repeats for each item
    {{ item }}
{% endfor %}

Example: Generating Monthly Columns

models/marts/fct_monthly_revenue.sql โ€” What you write
{% set months = [
    'jan', 'feb', 'mar', 'apr', 'may', 'jun',
    'jul', 'aug', 'sep', 'oct', 'nov', 'dec'
] %}

SELECT
    customer_id,
    {% for month in months %}
    SUM({{ month }}_sales) AS total_{{ month }}_sales{% if not loop.last %},{% endif %}
    {% endfor %}
FROM
    {{ ref('stg_sales') }}
GROUP BY customer_id
Compiled SQL โ€” What the database sees
SELECT
    customer_id,
    SUM(jan_sales) AS total_jan_sales,
    SUM(feb_sales) AS total_feb_sales,
    SUM(mar_sales) AS total_mar_sales,
    SUM(apr_sales) AS total_apr_sales,
    SUM(may_sales) AS total_may_sales,
    SUM(jun_sales) AS total_jun_sales,
    SUM(jul_sales) AS total_jul_sales,
    SUM(aug_sales) AS total_aug_sales,
    SUM(sep_sales) AS total_sep_sales,
    SUM(oct_sales) AS total_oct_sales,
    SUM(nov_sales) AS total_nov_sales,
    SUM(dec_sales) AS total_dec_sales
FROM
    analytics.staging.stg_sales
GROUP BY customer_id

๐Ÿ”‘ The loop Object

Inside a for loop, Jinja gives you a special loop variable:

loop.index โ€” Current iteration (1, 2, 3...)

loop.first โ€” True on the first iteration

loop.last โ€” True on the last iteration (perfect for trailing commas!)

loop.length โ€” Total number of items

Example: UNION ALL Multiple Sources

models/staging/stg_all_payments.sql
{% set payment_methods = ['credit_card', 'paypal', 'bank_transfer', 'apple_pay'] %}

{% for method in payment_methods %}

SELECT
    order_id,
    amount,
    '{{ method }}' AS payment_method
FROM
    {{ ref('stg_' ~ method ~ '_payments') }}

{% if not loop.last %} UNION ALL {% endif %}

{% endfor %}

Adding a 5th payment method? Without Jinja, you'd copy-paste a whole SELECT block and manually add UNION ALL. With Jinja, you just add 'venmo' to the list. One word change instead of 6 lines. Now imagine doing this with 20 payment methods โ€” Jinja saves you from writing 120+ lines of repetitive SQL!

๐Ÿ› ๏ธ Writing Your Own Macros

A macro is like a cooking shortcut. Instead of writing "preheat oven to 350ยฐF, grease the pan, line with parchment paper" every time you bake, you just say "prep_oven()" and it does all three steps automatically.

In dbt, a macro is a reusable Jinja function. You define it once, and call it from any model. It's the ultimate DRY (Don't Repeat Yourself) tool.

Where Do Macros Live?

๐Ÿ“ my_dbt_project/ ๐Ÿ“ models/ ๐Ÿ“„ fct_orders.sql ๐Ÿ“„ dim_customers.sql ๐Ÿ“ macros/ โ† Your macros go here! ๐Ÿ“„ cents_to_dollars.sql ๐Ÿ“„ generate_schema_name.sql ๐Ÿ“„ limit_data_in_dev.sql ๐Ÿ“„ dbt_project.yml

Example 1: cents_to_dollars

Many payment systems store amounts in cents (e.g., $49.99 = 4999). You constantly need to convert:

macros/cents_to_dollars.sql โ€” Define the macro
{% macro cents_to_dollars(column_name, decimal_places=2) %}
    ROUND( {{ column_name }} / 100.0, {{ decimal_places }} )
{% endmacro %}
models/marts/fct_orders.sql โ€” Call the macro
SELECT
    order_id,
    customer_id,
    {{ cents_to_dollars('amount_cents') }} AS amount_usd,
    {{ cents_to_dollars('tax_cents') }} AS tax_usd,
    {{ cents_to_dollars('shipping_cents') }} AS shipping_usd
FROM
    {{ ref('stg_orders') }}
Compiled SQL โ€” What the database sees
SELECT
    order_id,
    customer_id,
    ROUND( amount_cents / 100.0, 2 ) AS amount_usd,
    ROUND( tax_cents / 100.0, 2 ) AS tax_usd,
    ROUND( shipping_cents / 100.0, 2 ) AS shipping_usd
FROM
    analytics.staging.stg_orders

Example 2: limit_data_in_dev

Remember the if/else for dev vs prod? Let's turn that into a reusable macro:

macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name, dev_days=3) %}

{% if target.name == 'dev' %}
WHERE {{ column_name }} >= DATEADD('day', -{{ dev_days }}, CURRENT_DATE)
{% endif %}

{% endmacro %}
models/marts/fct_orders.sql โ€” Now every model can use it!
SELECT
    order_id,
    customer_id,
    amount,
    order_date
FROM
    {{ ref('stg_orders') }}

{{ limit_data_in_dev('order_date') }}

Without macros vs. with macros:

โŒ Copy-paste the same WHERE clause into 30 models โ†’ change the logic? Update 30 files

โœ… One macro, called in 30 models โ†’ change the logic? Update 1 file

That's the power of DRY (Don't Repeat Yourself)!

Example 3: generate_schema_name (Override)

dbt has a built-in macro called generate_schema_name that decides which schema a model is built in. You can override it:

macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}

    {% if custom_schema_name is none %}
        {{ target.schema }}
    {% elif target.name == 'prod' %}
        {{ custom_schema_name | trim }}
    {% else %}
        {{ target.schema }}_{{ custom_schema_name | trim }}
    {% endif %}

{% endmacro %}

{# In prod: models go to the exact schema name (e.g., "marts") #}
{# In dev: models go to "dev_john_marts" โ€” isolated per developer #}

Start small! Your first macro should solve a real pain point โ€” something you've copy-pasted more than 3 times. The cents_to_dollars and limit_data_in_dev macros are perfect first macros because they're simple, useful, and easy to understand.

๐Ÿ“ฆ Built-in dbt Macros

dbt comes with a toolbox full of pre-built macros โ€” like a kitchen that already has a blender, toaster, and microwave. You don't have to build these appliances yourself; they're ready to use on day one!

You've already been using some of these without realizing it. ref() and source() are macros!

The Most Useful Built-in Macros

Macro What It Does Example
ref() References another model (builds the DAG) {{ ref('stg_orders') }}
source() References a raw source table {{ source('raw_shop', 'orders') }}
config() Sets model configuration (materialization, etc.) {{ config(materialized='table') }}
var() Reads a project variable {{ var('start_date') }}
run_query() Executes SQL and returns results at compile time {% set results = run_query(sql) %}
log() Prints a message to the dbt log {{ log("Building model...", info=True) }}
exceptions.raise_compiler_error() Stops compilation with a custom error {{ exceptions.raise_compiler_error("Bad config!") }}
adapter.dispatch() Calls different SQL per database (Snowflake vs Postgres) {{ adapter.dispatch('my_macro')() }}

run_query() โ€” SQL at Compile Time

This is a powerful macro that lets you query the database while dbt is compiling your models. Use it to dynamically generate SQL based on actual data:

models/marts/fct_pivoted_payments.sql
{# Get all distinct payment methods from the database #}
{% set payment_query %}
    SELECT DISTINCT payment_method
    FROM {{ ref('stg_payments') }}
    ORDER BY 1
{% endset %}

{% set results = run_query(payment_query) %}
{% set methods = results.columns[0].values() %}

SELECT
    order_id,
    {% for method in methods %}
    SUM(CASE WHEN payment_method = '{{ method }}' THEN amount ELSE 0 END) AS {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
FROM
    {{ ref('stg_payments') }}
GROUP BY order_id

Why is this amazing? If someone adds a new payment method (say "crypto"), you don't need to change any code. The next time dbt runs, it queries the database, finds "crypto" in the list, and automatically generates a crypto_amount column. The SQL writes itself!

๐ŸŒ Jinja + SQL: Real-World Examples

Let's see how Jinja transforms real, messy, copy-paste-heavy SQL into clean, maintainable code. These are patterns you'll use every single week as a dbt developer.

Pattern 1: DRY Column Transformations

โŒ Before (Repetitive)

SELECT
  LOWER(TRIM(first_name)) AS first_name,
  LOWER(TRIM(last_name)) AS last_name,
  LOWER(TRIM(email)) AS email,
  LOWER(TRIM(city)) AS city,
  LOWER(TRIM(state)) AS state
FROM raw_customers

5 lines of nearly identical code

โœ… After (DRY with Jinja)

{% set cols = ['first_name',
  'last_name', 'email',
  'city', 'state'] %}

SELECT
  {% for col in cols %}
  LOWER(TRIM({{ col }})) AS {{ col }}
  {% if not loop.last %},{% endif %}
  {% endfor %}
FROM raw_customers

Add a column? Just add to the list!

Pattern 2: Environment-Aware Queries

macros/get_database.sql
{% macro get_database() %}
    {% if target.name == 'prod' %}
        PRODUCTION_DB
    {% elif target.name == 'staging' %}
        STAGING_DB
    {% else %}
        DEV_DB
    {% endif %}
{% endmacro %}

Pattern 3: Grant Permissions Macro

macros/grant_select.sql
{% macro grant_select(schema, role) %}

    {% set grant_sql %}
        GRANT SELECT ON ALL TABLES IN SCHEMA {{ schema }}
        TO ROLE {{ role }};
    {% endset %}

    {{ log("Granting SELECT to " ~ role, info=True) }}
    {% do run_query(grant_sql) %}

{% endmacro %}
80%
Less Copy-Paste

Macros eliminate most repetitive SQL

1
Place to Update

Change logic once, it updates everywhere

โˆž
Reusability

Use the same macro in every model

โš ๏ธ Common Jinja Mistakes

Even experienced dbt developers trip over these. Learning the common pitfalls now will save you hours of debugging later. Think of this as the "don't touch the hot stove" section โ€” learn from others' burns!

๐Ÿ› Mistake #1: Forgetting {% endif %} or {% endfor %}

Every {% if %} needs an {% endif %}. Every {% for %} needs an {% endfor %}. Forget one and dbt gives you a cryptic error like "unexpected end of template".

โŒ {% if target.name == 'dev' %}
   WHERE date > '2024-01-01'
   {# Missing {% endif %}! #}

๐Ÿ› Mistake #2: Whitespace Issues

Jinja blocks can add unwanted blank lines to your compiled SQL. Use the dash syntax {%- -%} to trim whitespace:

{% for col in columns %} โ† Adds blank lines
{%- for col in columns -%} โ† Clean output, no extra whitespace

๐Ÿ› Mistake #3: Trailing Commas in Loops

When generating comma-separated lists, the last item shouldn't have a trailing comma. Always use loop.last:

โŒ {{ col }}, โ† Last item gets a trailing comma โ†’ SQL error!
โœ… {{ col }}{% if not loop.last %},{% endif %} โ† No trailing comma

๐Ÿ› Mistake #4: Using Jinja in .yml Where It's Not Supported

Jinja works in .sql files and some parts of .yml files (like descriptions and test arguments). But it does not work in all YAML fields. If your Jinja isn't rendering, check whether that YAML field supports it.

โœ… description: "{{ doc('my_doc') }}" โ† Works!
โœ… to: ref('stg_customers') โ† Works in test args!
โŒ name: "{{ var('model_name') }}" โ† Does NOT work

Debugging tip: When your Jinja isn't working, run dbt compile and check the output in target/compiled/. This shows you the exact SQL that dbt generated. If the Jinja didn't render, you'll see the raw {{ }} tags still there โ€” that's your clue!

๐Ÿง  Quick Quiz โ€” Test Your Understanding!

Let's see if you've unlocked the Jinja superpower. Click the answer you think is correct:

Question 1: What does {{ }} do in Jinja?

A) Defines a for loop
B) Outputs (prints) a value into the compiled SQL
C) Creates a comment that won't appear in the output
D) Defines a macro function

Question 2: What is a dbt macro?

A) A type of database table that stores temporary data
B) A YAML configuration file for model settings
C) A reusable Jinja function defined in the macros/ folder
D) A built-in SQL function provided by your database

Question 3: How do you avoid a trailing comma in a Jinja for loop?

A) Use {% break %} on the last iteration
B) dbt automatically removes trailing commas
C) Use {% if not loop.last %},{% endif %} to conditionally add the comma
D) Put the comma before the column name instead of after

Can you explain to a colleague what Jinja is and why dbt uses it? Can you write a simple macro and call it from a model? Can you use a for loop to generate dynamic columns? If yes โ€” you've unlocked the Jinja superpower!

  • Jinja basics โ€” {{ }} for expressions, {% %} for logic, {# #} for comments
  • Variables โ€” var('name') reads project variables from dbt_project.yml
  • if/else โ€” Conditional SQL, especially for dev vs prod behavior
  • for loops โ€” Generate repetitive SQL dynamically (columns, UNION ALL, etc.)
  • Macros โ€” Reusable Jinja functions in the macros/ folder
  • Built-in macros โ€” ref(), source(), config(), var()
  • dbt compile โ€” See the compiled SQL to debug Jinja issues
  • Jinja filters โ€” | upper, | lower, | trim, | replace, | default
  • run_query() โ€” Execute SQL at compile time for dynamic model generation
  • adapter.dispatch() โ€” Write database-agnostic macros
  • generate_schema_name โ€” Override dbt's default schema naming
  • Whitespace control โ€” {%- -%} for clean compiled output
{PW}