From a frustrated analyst's side project to the backbone of modern data teams
Imagine 10 people trying to cook in the same kitchen, but nobody wrote down any recipes. Everyone makes the same dish differently. One person adds salt, another doesn't. One person bakes at 350°F, another at 400°F. The food comes out different every single time. That's what data teams looked like before dbt.
Between 2012 and 2015, the data world was like the Wild West. Companies were collecting more data than ever before (from websites, mobile apps, payment systems, marketing tools), but they had no good way to transform and organise all that raw data into something useful.
SQL queries lived in random places: someone's laptop, a shared Google Doc, buried in an email thread, or hard-coded inside a Tableau dashboard. There was no single source of truth.
Nobody tracked who changed a query, when they changed it, or why. If a dashboard broke on Monday morning, the team spent hours playing detective.
There were no automated checks. Bad data (duplicates, nulls, impossible values) flowed straight into executive dashboards. The CEO would ask, "Why does this number look wrong?" and nobody had a quick answer.
Marketing said revenue was $5M. Finance said $4.8M. Product said $5.2M. Everyone used slightly different SQL to calculate "revenue." Meetings became arguments about whose number was right.
Real-world example: Imagine you work at a pizza chain. Store A counts "revenue" as all orders placed. Store B counts only delivered orders. Store C counts orders minus refunds. The CEO looks at a report and sees three different revenue numbers. Who's right? Nobody knows, because there's no agreed-upon recipe.
The Chaos: SQL Scattered Everywhere
The fundamental problem wasn't that people were bad at SQL. The problem was that there was no system (no framework, no workflow, no process) for managing data transformations the way software engineers managed code.
Our story begins with a young data analyst named Drew Banin, working at a company called RJMetrics in Philadelphia. RJMetrics was a business intelligence platform; they helped other companies understand their data.
Drew was smart, hard-working, and increasingly frustrated.
Every single day, Drew found himself doing the same painful thing: maintaining identical SQL transformations across five different tools. He'd write a query to calculate monthly recurring revenue (MRR) in one tool, then copy-paste it into another tool, then tweak it slightly for a third tool, and so on.
It's like writing the same essay for 5 different teachers, by hand, every single time. And every time one teacher wants a small change, you have to update all 5 copies manually. Miss one? That teacher gets the wrong essay. Now imagine doing this every day for years.
Drew's entire week revolved around keeping those five copies in sync.
Real-world analogy: Imagine you're a chef who has perfected a chocolate cake recipe. But instead of writing it down once in a recipe book, you have to memorise it and recite it from scratch every time someone asks. Sometimes you forget the baking soda. Sometimes you say "2 cups of sugar" instead of "1.5 cups." The cake comes out different every time, and you can never figure out which version was the "right" one.
One evening, after yet another fire-drill caused by a broken dashboard (someone had updated the SQL in one place but forgotten the other four), Drew sat back and asked himself a simple but revolutionary question:
"Why can't data transformations work like software development?"
Software engineers had solved this problem decades ago. They had version control to track every change, code review to catch mistakes, automated tests to guard quality, and documentation so anyone could understand the code.
Data analysts had none of this. They were stuck in the stone age while software engineers were living in the future.
Drew's co-founder Tristan Handy later described the problem perfectly: "We had all these amazing data warehouses (Redshift, BigQuery, Snowflake) that could process billions of rows in seconds. But the way we managed the SQL that ran inside them was stuck in 2005."
Drew didn't just complain about the problem; he built a solution. Working nights and weekends, he created the first version of dbt: a simple Python command-line tool that could read SQL files, understand their dependencies, and execute them in the right order against a data warehouse.
Drew basically built a robot that could read recipe cards (SQL files) and cook dishes in order, automatically. The robot knew that you have to make the sauce before you can put it on the pasta. It knew that you need to chop the vegetables before you can make the salad. And it did everything in the right order, every single time, without forgetting a step.
Write your transformations as plain SQL files with Jinja templating. No proprietary language to learn: if you knew SQL, you knew dbt.
Instead of hard-coding table names, use {{ ref('model_name') }}. dbt automatically figures out the execution order.
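For instance (the model and column names here are illustrative, not from the original project), a downstream model can join two upstream models purely via ref(), and dbt will infer that both must be built first:

```sql
-- customer_orders.sql (illustrative model name)
-- dbt sees the two ref() calls below and builds stg_orders and
-- stg_customers before this model, with no manual ordering needed.
SELECT
    o.order_id,
    o.amount,
    c.email
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_customers') }} AS c
    ON o.customer_id = c.customer_id
```

When the project is compiled, each ref() call resolves to the fully qualified table name in your warehouse, so the same SQL works across environments.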
Tell dbt whether your model should be a view or a table with a single config line. No manual DDL needed.
Install with pip install dbt, run with dbt run. Simple, clean, no bloated GUI: just a terminal and your SQL files.
-- One of the very first dbt models ever written
-- Simple, clean, revolutionary
{{ config(materialized='table') }}
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    created_at
FROM {{ ref('raw_customers') }}
WHERE customer_id IS NOT NULL
It looks simple, right? Almost too simple. But that was the genius. Drew didn't try to reinvent SQL. He just added a thin layer of intelligence on top of it:
{{ config(materialized='table') }} tells dbt to create a physical table.
{{ ref('raw_customers') }} creates a dependency link, so dbt knows to build raw_customers first.
Real-world analogy: Think of dbt like the invention of the recipe book. Before recipe books, grandma's secret soup recipe lived only in her head. If she forgot an ingredient, the soup tasted different. A recipe book doesn't change how you cook (you still chop, stir, and bake), but it gives you a reliable, repeatable system so the soup tastes the same every time.
Drew made a critical decision: he open-sourced dbt. Anyone could use it, modify it, and contribute to it for free. This was a bold move: most data tools at the time were expensive, proprietary products from companies like Informatica and IBM.
The open-source decision meant that dbt's growth would be driven by the community, not by a sales team. And that community would grow faster than anyone expected.
By 2018, dbt was gaining real traction. Drew Banin and Tristan Handy founded a company called Fishtown Analytics (named after the Fishtown neighbourhood in Philadelphia where they worked) to build a business around dbt.
Imagine you built an amazing free recipe app in your garage. People love it. Now you start a company to make a premium version with extra features β like meal planning, grocery lists, and cooking timers β while keeping the basic recipe app free forever. That's what Fishtown Analytics did with dbt.
Some brave, forward-thinking companies became early dbt adopters, and they reported dramatic improvements.
In 2019, dbt added one of its most important features: a built-in testing framework. Now you could write simple YAML declarations to automatically check your data:
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
          - unique
Real-world analogy: Adding tests to dbt was like adding a quality inspector to a factory assembly line. Before, defective products (bad data) would reach customers (dashboards). Now, the inspector catches problems before they leave the factory.
2020 was the year everything changed, for the world and for dbt.
Fishtown Analytics launched dbt Cloud: a web-based platform that let you develop, test, schedule, and monitor your dbt projects without touching the command line. It was dbt with training wheels, a dashboard, and a scheduling engine all rolled into one.
Write and test dbt models directly in your browser. No local setup required: just log in and start building.
Schedule your dbt runs to execute automatically: every hour, every day, or on a custom cron schedule.
Multiple team members can work on the same project with Git integration, code review, and environment management.
See which models ran, how long they took, which tests passed or failed, all in a beautiful dashboard.
Then COVID-19 hit. Suddenly, millions of knowledge workers were working from home. Data teams that relied on "walking over to someone's desk to ask about a query" were in trouble. They needed tools that enabled remote, asynchronous collaboration.
dbt was perfectly positioned. It was Git-based (so changes were tracked), it had documentation (so you could understand models without asking someone), and dbt Cloud worked in a browser (so you didn't need a corporate VPN).
Imagine your school suddenly switches to online classes. The teacher who already had all their lessons on YouTube and Google Docs was fine. The teacher who only taught from handwritten notes on a whiteboard was in big trouble. dbt was the teacher with everything online β ready for remote work from day one.
By 2021, dbt wasn't a scrappy open-source project any more; it was the centre of the Modern Data Stack. The company raised $52 million in Series B funding and renamed itself from Fishtown Analytics to dbt Labs to reflect its growth.
In December 2021, dbt Labs released dbt 1.0, a huge milestone. The "1.0" label signalled stability: a mature, production-ready tool with a commitment to backwards compatibility.
Real-world analogy: Think of dbt 1.0 like a restaurant getting its health inspection certificate. Before the certificate, adventurous foodies would eat there. After the certificate, even cautious corporate event planners would book it. The food didn't change, but the trust did.
dbt Labs introduced the Semantic Layer: a way to define business metrics (like "revenue," "churn rate," "active users") in one place and have every downstream tool (Looker, Tableau, Mode, etc.) use the exact same definition.
Remember the chaos from Section 1, where Marketing, Finance, and Product all had different revenue numbers? The Semantic Layer was dbt's answer: define it once, use it everywhere.
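As a sketch of the idea, a shared metric definition looked something like the YAML below. The exact keys have changed across dbt versions, and the model and column names here are illustrative, not taken from any real project:

```yaml
# Illustrative sketch; the exact schema varies by dbt version
metrics:
  - name: revenue
    label: Total Revenue
    model: ref('orders')            # one agreed-upon source model
    calculation_method: sum         # how to aggregate
    expression: amount              # which column to aggregate
    timestamp: ordered_at           # time column for grouping
    time_grains: [day, week, month]
```

Every BI tool that queries the Semantic Layer then computes "revenue" from this single definition instead of from its own hand-written SQL.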
dbt now worked with over 20 data platforms.
Fast-forward to today, and dbt is no longer just a tool; it's an ecosystem. It's the backbone of how modern data teams build, test, document, and deploy data transformations worldwide.
dbt Copilot uses artificial intelligence to help you write SQL, generate documentation, and suggest tests. It's like having a senior analytics engineer looking over your shoulder, offering suggestions as you type.
Imagine you're writing an essay, and a really smart friend sits next to you whispering, "Hey, you should probably add a paragraph about X" or "That sentence has a grammar mistake." That's dbt Copilot: an AI assistant that helps you write better data transformations.
Big companies (think: hundreds of data engineers across dozens of teams) needed a way to manage multiple interconnected dbt projects. dbt Mesh lets teams own their own projects while still sharing models across team boundaries, like departments in a company that have their own budgets but share a cafeteria.
Early dbt could tell you that model_A depends on model_B. Modern dbt can tell you that column revenue in model_A comes from column amount in model_B after being multiplied by exchange_rate from model_C. That's column-level lineage, and it's a game-changer for debugging and compliance.
dbt now connects to over 50 data platforms, from traditional warehouses to modern lakehouses, from cloud giants to open-source databases. If your company stores data somewhere, there's probably a dbt adapter for it.
The dbt community now has its own annual conference called Coalesce (a clever data pun: "coalesce" is a SQL function that returns the first non-null value). Thousands of data professionals attend to share best practices, new features, and war stories.
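To see the pun, here is the SQL function itself in action:

```sql
-- COALESCE returns its first non-NULL argument
SELECT COALESCE(NULL, NULL, 'dbt') AS first_non_null;
-- returns 'dbt'
```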
Let's step back and appreciate just how far dbt has come from Drew Banin's side project:
To put this in perspective: dbt went from a side project by one frustrated analyst in 2016 to a $4.2 billion company used by 9,000+ organisations in under a decade. That's like going from selling lemonade on your street corner to running a global beverage empire, in less time than it takes most people to finish a PhD.
Here's the full journey at a glance:
2016: Drew Banin builds the first version of dbt as a side project and open-sources it. The ref() function is born.
2018: Drew Banin and Tristan Handy found Fishtown Analytics in Philadelphia to build a business around dbt.
2019: Built-in tests (unique, not_null) and auto-generated documentation arrive. Community grows rapidly.
2020: dbt Cloud launches, and the shift to remote work accelerates adoption.
2021: $52 million Series B funding, the rename to dbt Labs, and the dbt 1.0 release in December.
Today: The Semantic Layer, dbt Mesh, dbt Copilot, column-level lineage, and 50+ supported platforms.
Let's see how much you remember from the story of dbt! Answer these 3 questions:
Who created dbt, and where was he working at the time?
What event in 2020 caused dbt downloads to increase 10x?
What was the original company name before it became "dbt Labs"?
In one sentence: Why do you think dbt succeeded where other data tools failed? Think about what made it different β the open-source model, the focus on SQL, the community, or something else?