LESSON 1 ยท BEGINNER

๐Ÿ‘‹ Meet dbt โ€” Your Data's Best Friend

Discover the tool that changed how the world transforms data. No jargon, just fun analogies and "aha!" moments.

๐Ÿค” What is dbt?

Imagine you have a HUGE messy room full of toys (data). Lego bricks are mixed with action figures, puzzle pieces are under the bed, and crayons are in your sock drawer. It's chaos!

Now imagine you have a magical robot helper that sorts every toy into neat, labelled boxes โ€” Legos in one box, puzzles in another โ€” and then checks that nothing is broken or missing. All you have to do is write simple instructions on sticky notes (SQL files) like "put all the red Legos together."

That magical robot is dbt.

The Technical Definition

dbt stands for Data Build Tool. It's an open-source command-line tool that lets analytics engineers transform raw data inside a data warehouse using plain SQL.

Instead of moving data out of the warehouse, cleaning it somewhere else, and loading it back (the old way), dbt says: "Just write SELECT statements. I'll handle the rest."

dbt doesn't extract or load data โ€” it only handles the T (Transform) in ELT. Think of it as the chef in the kitchen, not the delivery truck.

How dbt Works โ€” In 30 Seconds

Here's the entire magic trick in four steps:

๐Ÿ“ฆ
Raw Data
โ†’
๐Ÿ”„
dbt
(Transform)
โ†’
โœจ
Clean Tables
โ†’
๐Ÿ“Š
Dashboard
Messy Data dbt Transform Clean Tables Dashboards

You write a SQL file (called a model). dbt reads it, figures out dependencies, runs it in your warehouse, and creates a clean table or view. That's it. No fancy GUI, no drag-and-drop โ€” just SQL and a sprinkle of magic (Jinja templating).

๐Ÿงฉ The Three Things dbt Does

  1. Transforms โ€” Turns messy raw data into clean, business-ready tables using SQL
  2. Tests โ€” Automatically checks your data for problems (nulls, duplicates, bad values)
  3. Documents โ€” Generates a beautiful website showing every table, column, and how they connect

๐Ÿณ The Restaurant Kitchen Analogy

The best way to understand dbt is to think of a restaurant kitchen. Let's map every part of the data world to something you already know:

๐Ÿฅฌ Raw ingredients (potatoes, tomatoes, chicken) = Raw data arriving from apps, databases, and APIs

๐Ÿช The kitchen (with ovens, stoves, and countertops) = Your data warehouse (Snowflake, BigQuery, Redshift, Postgres)

๐Ÿ“ Recipes (step-by-step cooking instructions) = dbt models (SQL files that describe transformations)

๐Ÿ‘จโ€๐Ÿณ The head chef (reads recipes, manages cooking order) = dbt itself (reads SQL files, manages execution order)

๐Ÿงช Quality checks (taste-testing before serving) = dbt tests (automated data quality checks)

๐Ÿ“‹ The menu (what dishes are available and what's in them) = dbt documentation (auto-generated docs for your data)

๐Ÿšš Raw Ingredients (Raw Data) ๐Ÿช THE KITCHEN (Data Warehouse) ๐Ÿ‘จโ€๐Ÿณ dbt (the Chef) Follows recipes (SQL) ๐Ÿงช Quality Check (dbt Tests) ๐Ÿฝ๏ธ Served Dish (Dashboard) ๐Ÿ“ Recipe (SQL) SELECT name, SUM(qty) FROM raw_orders ๐Ÿ“‹ The Menu (dbt Documentation)

Without dbt, it's like having 10 different cooks in the kitchen, each with their own secret recipe, nobody writing anything down, and no one taste-testing the food before it goes to the customer. Chaos!

With dbt, there's one recipe book (your SQL models in Git), one head chef (dbt), and automatic taste-testing (dbt tests) before anything reaches the customer (your dashboard).

๐Ÿ”€ ETL vs ELT โ€” The Big Shift

Before we go deeper into dbt, you need to understand the biggest revolution in data engineering in the last decade: the shift from ETL to ELT.

๐Ÿš๏ธ Traditional ETL

Extract โ†’ Transform โ†’ Load

  1. Extract data from source systems
  2. Transform data on a separate ETL server (outside the warehouse)
  3. Load the cleaned data into the warehouse

โš ๏ธ Slow, expensive, hard to change

๐Ÿš€ Modern ELT (with dbt)

Extract โ†’ Load โ†’ Transform

  1. Extract data from source systems
  2. Load raw data directly into the warehouse
  3. Transform data inside the warehouse using dbt

โœ… Fast, flexible, version-controlled

ETL is like cooking dinner at home, packing it in Tupperware, driving it to a restaurant, and serving it to customers. By the time it arrives, it might be cold and you can't easily change the recipe.

ELT is like bringing fresh groceries straight to the restaurant kitchen and letting the chef (dbt) cook right there โ€” using the restaurant's amazing ovens and stoves (the warehouse's compute power). Much faster, much fresher!

๐Ÿš๏ธ OLD WAY: ETL โ‘  EXTRACT ๐Ÿ“ค โ‘ก TRANSFORM โš ๏ธ Outside warehouse! โ‘ข LOAD ๐Ÿ“ฅ Warehouse ๐Ÿข vs ๐Ÿš€ NEW WAY: ELT (with dbt) โ‘  EXTRACT ๐Ÿ“ค โ‘ก LOAD ๐Ÿ“ฅ ๐Ÿข Warehouse โ‘ข dbt TRANSFORMS โœจ โœ… Inside warehouse!

Side-by-Side Comparison

Aspect Traditional ETL Modern ELT + dbt
Where transformation happensSeparate ETL serverInside the data warehouse
LanguageProprietary GUI / Java / PythonSQL (the language you already know!)
Version controlDifficult or impossibleGit-based, just like software
TestingManual, after deploymentAutomated, built into every run
DocumentationExternal wikis, often outdatedAuto-generated, always current
ScalabilityLimited by ETL server hardwareScales with your warehouse
CostHigh (Informatica, SSIS licenses)Free (dbt Core) or low (dbt Cloud)
Speed of changeDays to weeksMinutes to hours

The shift from ETL to ELT was made possible by cloud data warehouses (Snowflake, BigQuery, Redshift) that have massive compute power. Since the warehouse can handle heavy transformations, there's no need for a separate ETL server anymore. dbt takes full advantage of this.

๐Ÿงฑ Where dbt Fits in the Modern Data Stack

The "Modern Data Stack" is a set of cloud-based tools that work together to move, transform, and visualize data. Think of it like an assembly line in a factory. Each tool has one job, and they all work together.

Imagine a pizza delivery chain:

๐ŸŒพ Farmers grow wheat and tomatoes = Data Sources (your apps, databases, APIs)

๐Ÿš› Delivery trucks bring ingredients to the kitchen = Ingestion tools (Fivetran, Airbyte)

๐Ÿช The kitchen where everything is stored = Data Warehouse (Snowflake, BigQuery)

๐Ÿ‘จโ€๐Ÿณ The chef who makes the pizza = dbt (transforms raw ingredients into delicious pizza)

๐Ÿ• The menu board customers see = BI Tools (Metabase, Looker, Tableau)

๐Ÿ“ฑ
Data Sources
Apps, DBs, APIs
โ†’
๐Ÿš›
Ingestion
Fivetran / Airbyte
โ†’
๐Ÿข
Warehouse
Snowflake / BQ
โ†’
๐Ÿ”„
dbt
Transform!
โ†’
๐Ÿ“Š
BI Tools
Metabase / Looker

Each Piece Explained Simply

๐Ÿ“ฑ 1. Data Sources โ€” Where data is born

Your company's apps, databases, payment systems, marketing tools, etc. For example: Shopify stores order data, Stripe stores payment data, Google Analytics stores website visit data.

Analogy: These are the farms and factories that produce raw ingredients.

๐Ÿš› 2. Ingestion Tools โ€” The delivery trucks

Tools like Fivetran, Airbyte, or Stitch connect to your data sources and copy the raw data into your warehouse. They handle the "Extract" and "Load" parts.

Analogy: Delivery trucks that pick up ingredients from farms and bring them to the restaurant kitchen.

๐Ÿข 3. Data Warehouse โ€” The kitchen

Snowflake, Google BigQuery, Amazon Redshift, or Databricks. This is where all your data lives and where transformations happen. Modern warehouses can process terabytes of data in seconds.

Analogy: A massive, state-of-the-art restaurant kitchen with industrial ovens and unlimited counter space.

๐Ÿ”„ 4. dbt โ€” The head chef (YOU ARE HERE!)

dbt reads your SQL "recipes" (models), figures out the right order to cook things (dependency graph), transforms raw data into clean tables, tests everything, and generates documentation. This is the heart of the modern data stack.

Analogy: The brilliant head chef who follows recipes, manages the cooking order, taste-tests everything, and updates the menu.

๐Ÿ“Š 5. BI Tools โ€” The dining room

Metabase, Looker, Tableau, or Power BI. These tools connect to the clean tables dbt created and display beautiful dashboards and reports for business users.

Analogy: The dining room where customers enjoy the beautifully plated dishes.

dbt is the glue that makes the modern data stack work. Without it, you'd have raw data sitting in your warehouse with no way to turn it into useful insights โ€” like having a kitchen full of ingredients but no chef.

๐ŸŒ Who Uses dbt?

dbt isn't some niche tool โ€” it's used by thousands of companies, from tiny startups to massive enterprises. Here are some mind-blowing numbers:

100,000+
Active Projects
75%
Top Data Teams
9,000+
Companies
50+
Database Adapters

Companies That Trust dbt

Here are just a few of the companies using dbt every single day:

  • ๐ŸŽต Spotify โ€” Models music streaming data for 500M+ users
  • ๐Ÿ’ณ Stripe โ€” Processes billions in payment data transformations
  • ๐ŸฆŠ GitLab โ€” Product usage analytics and DevOps metrics
  • ๐Ÿ›๏ธ Shopify โ€” E-commerce analytics for millions of merchants
  • โœˆ๏ธ JetBlue โ€” Flight operations and customer experience analytics
  • ๐Ÿ’ฌ Discord โ€” Community engagement and usage analytics
  • ๐Ÿฆ Monzo โ€” Real-time financial data modeling
  • ๐Ÿ“ฆ Deliveroo โ€” Delivery logistics and restaurant analytics

If data were a sport, dbt would be the most popular piece of equipment. Almost every serious data team in the world uses it โ€” just like almost every soccer team uses a soccer ball. It's that fundamental.

๐Ÿš€ Why YOU Should Learn dbt

Still not convinced? Here's why learning dbt is one of the best career moves you can make right now:

๐Ÿ’ฐ

High Demand, High Pay

Analytics Engineers (dbt experts) earn $120Kโ€“$180K+ in the US. dbt is the #1 most-requested skill in analytics job postings.

๐Ÿ“ˆ

Career Growth

dbt created an entirely new role: Analytics Engineer. It bridges the gap between data engineering and data analysis.

๐Ÿง 

Easy to Learn

If you know SQL, you already know 80% of dbt. The learning curve is gentle โ€” you can build your first project in a day.

๐ŸŒ

Massive Community

50,000+ members in dbt Slack, tons of free resources, packages, and a welcoming community that loves helping beginners.

๐Ÿ†“

Free & Open Source

dbt Core is completely free. You can start learning today with zero cost โ€” just install it and go.

๐Ÿ”ฎ

Future-Proof

Every company is becoming data-driven. dbt skills will be relevant for decades as data only grows in importance.

Career hack: If you're a data analyst who wants to level up, learning dbt is the fastest path to becoming an Analytics Engineer โ€” a role that typically pays 30-50% more than a pure analyst role.

โ˜๏ธ dbt Core vs dbt Cloud

dbt comes in two flavors. Think of it like cooking at home vs. using a meal-kit delivery service:

๐Ÿ–ฅ๏ธ dbt Core (Free, Open Source)

Like cooking from scratch at home

  • โœ… Completely free forever
  • โœ… Full power and flexibility
  • โœ… Run from your terminal
  • โœ… You manage scheduling & deployment
  • โš ๏ธ Requires Python & command line skills
  • โš ๏ธ You handle infrastructure

โ˜๏ธ dbt Cloud (Managed Service)

Like using a meal-kit service

  • โœ… Web-based IDE (no terminal needed)
  • โœ… Built-in scheduling & orchestration
  • โœ… Team collaboration features
  • โœ… One-click deployment
  • โœ… Free tier for 1 developer
  • ๐Ÿ’ฐ Paid plans for teams ($100/seat/month)

dbt Core is like having a fully equipped kitchen at home โ€” you have all the tools, but you need to buy groceries, manage the oven timer, and clean up yourself.

dbt Cloud is like a fancy cooking class where someone sets up the kitchen for you, gives you pre-measured ingredients, and handles the cleanup. You just focus on cooking!

For learning, either works great. We'll cover both in this course.

๐Ÿง  Quick Quiz โ€” Test Your Understanding!

Let's see if you absorbed the key concepts. Click the answer you think is correct:

Question 1: What does dbt stand for?

A) Data Build Tool
B) Database Backup Technology
C) Data Business Transformer
D) Digital Batch Transfer

Question 2: In the ELT approach, where does transformation happen?

A) On a separate ETL server
B) In your local computer
C) Inside the data warehouse
D) In the source database

Question 3: Which of these is NOT something dbt does?

A) Transform data using SQL
B) Test data quality automatically
C) Extract data from source systems
D) Generate documentation

๐Ÿ‘€ What's Coming Next?

Now that you know what dbt is and why it matters, in the next lesson we'll explore the fascinating history of how dbt was born โ€” from one frustrated analyst's "eureka moment" to a tool used by 9,000+ companies.

๐Ÿ“œ Next Lesson: History & Evolution of dbt

  • The "Data Chaos Era" โ€” what life was like before dbt
  • Drew Banin's eureka moment at RJMetrics
  • The founding of Fishtown Analytics (now dbt Labs)
  • dbt Cloud launch and the $4.2B valuation
  • Where dbt is heading in 2025 and beyond

In your own words, explain to a friend (or a rubber duck) what dbt does and why the shift from ETL to ELT matters. If you can explain it simply, you truly understand it!

  • dbt = Data Build Tool โ€” transforms raw data into clean tables using SQL inside your warehouse
  • ELT > ETL โ€” load raw data first, then transform inside the warehouse (faster, cheaper, more flexible)
  • dbt does 3 things: Transform, Test, Document
  • Modern Data Stack: Sources โ†’ Ingestion โ†’ Warehouse โ†’ dbt โ†’ BI Tools
  • dbt Core is free and open source; dbt Cloud is the managed service
  • Specific company adoption stories (Spotify, Stripe, etc.)
  • Exact salary numbers (they vary by region and experience)
  • dbt Cloud pricing tiers
  • The pizza delivery chain analogy (fun but not tested!)