From a frustrated analyst's side project to the backbone of modern data teams
Imagine 10 people trying to cook in the same kitchen, but nobody wrote down any recipes. Everyone makes the same dish differently. One person adds salt, another doesn't. One person bakes at 350°F, another at 400°F. The food comes out different every single time. That's what data teams looked like before dbt.
Between 2012 and 2015, the data world was like the Wild West. Companies were collecting more data than ever before (from websites, mobile apps, payment systems, marketing tools), but they had no good way to transform and organise all that raw data into something useful.
SQL queries lived in random places: someone's laptop, a shared Google Doc, buried in an email thread, or hard-coded inside a Tableau dashboard. There was no single source of truth.
Nobody tracked who changed a query, when they changed it, or why. If a dashboard broke on Monday morning, the team spent hours playing detective.
There were no automated checks. Bad data (duplicates, nulls, impossible values) flowed straight into executive dashboards. The CEO would ask, "Why does this number look wrong?" and nobody had a quick answer.
Marketing said revenue was $5M. Finance said $4.8M. Product said $5.2M. Everyone used slightly different SQL to calculate "revenue." Meetings became arguments about whose number was right.
Real-world example: Imagine you work at a pizza chain. Store A counts "revenue" as all orders placed. Store B counts only delivered orders. Store C counts orders minus refunds. The CEO looks at a report and sees three different revenue numbers. Who's right? Nobody knows, because there's no agreed-upon recipe.
The Chaos: SQL Scattered Everywhere
The fundamental problem wasn't that people were bad at SQL. The problem was that there was no system (no framework, no workflow, no process) for managing data transformations the way software engineers managed code.
Our story begins with a young data analyst named Drew Banin, working at a company called RJMetrics in Philadelphia. RJMetrics was a business intelligence platform; they helped other companies understand their data.
Drew was smart, hard-working, and increasingly frustrated.
Every single day, Drew found himself doing the same painful thing: maintaining identical SQL transformations across five different tools. He'd write a query to calculate monthly recurring revenue (MRR) in one tool, then copy-paste it into another tool, then tweak it slightly for a third tool, and so on.
It's like writing the same essay for 5 different teachers, by hand, every single time. And every time one teacher wants a small change, you have to update all 5 copies manually. Miss one? That teacher gets the wrong essay. Now imagine doing this every day for years.
Drew's entire week revolved around keeping those five copies in sync.
Real-world analogy: Imagine you're a chef who has perfected a chocolate cake recipe. But instead of writing it down once in a recipe book, you have to memorise it and recite it from scratch every time someone asks. Sometimes you forget the baking soda. Sometimes you say "2 cups of sugar" instead of "1.5 cups." The cake comes out different every time, and you can never figure out which version was the "right" one.
One evening, after yet another fire-drill caused by a broken dashboard (someone had updated the SQL in one place but forgotten the other four), Drew sat back and asked himself a simple but revolutionary question:
"Why can't data transformations work like software development?"
Software engineers had solved this problem decades ago. They had version control to track every change, code review to catch mistakes, automated tests to guard quality, and documentation so anyone could understand the code.
Data analysts had none of this. They were stuck in the stone age while software engineers were living in the future.
Drew's co-founder Tristan Handy later described the problem perfectly: "We had all these amazing data warehouses (Redshift, BigQuery, Snowflake) that could process billions of rows in seconds. But the way we managed the SQL that ran inside them was stuck in 2005."
Drew didn't just complain about the problem; he built a solution. Working nights and weekends, he created the first version of dbt: a simple Python command-line tool that could read SQL files, understand their dependencies, and execute them in the right order against a data warehouse.
Drew basically built a robot that could read recipe cards (SQL files) and cook dishes in order, automatically. The robot knew that you have to make the sauce before you can put it on the pasta. It knew that you need to chop the vegetables before you can make the salad. And it did everything in the right order, every single time, without forgetting a step.
Write your transformations as plain SQL files with Jinja templating. No proprietary language to learn: if you knew SQL, you knew dbt.
Instead of hard-coding table names, use {{ ref('model_name') }}. dbt automatically figures out the execution order.
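For instance (the model and column names here are illustrative, not from the original project), a downstream model can join two upstream models purely via ref(), and dbt will infer that both must be built first:

```sql
-- customer_orders.sql (illustrative model name)
-- dbt sees the two ref() calls below and builds stg_orders and
-- stg_customers before this model, with no manual ordering needed.
SELECT
    o.order_id,
    o.amount,
    c.email
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_customers') }} AS c
    ON o.customer_id = c.customer_id
```

When the project is compiled, each ref() call resolves to the fully qualified table name in your warehouse, so the same SQL works across environments.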
Tell dbt whether your model should be a view or a table with a single config line. No manual DDL needed.
Install with pip install dbt, run with dbt run. Simple, clean, no bloated GUI: just a terminal and your SQL files.
-- One of the very first dbt models ever written
-- Simple, clean, revolutionary
{{ config(materialized='table') }}
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    created_at
FROM {{ ref('raw_customers') }}
WHERE customer_id IS NOT NULL
It looks simple, right? Almost too simple. But that was the genius. Drew didn't try to reinvent SQL. He just added a thin layer of intelligence on top of it:
{{ config(materialized='table') }} tells dbt to create a physical table.
{{ ref('raw_customers') }} creates a dependency link, so dbt knows to build raw_customers first.
Real-world analogy: Think of dbt like the invention of the recipe book. Before recipe books, grandma's secret soup recipe lived only in her head. If she forgot an ingredient, the soup tasted different. A recipe book doesn't change how you cook (you still chop, stir, and bake), but it gives you a reliable, repeatable system so the soup tastes the same every time.
Drew made a critical decision: he open-sourced dbt. Anyone could use it, modify it, and contribute to it for free. This was a bold move: most data tools at the time were expensive, proprietary products from companies like Informatica and IBM.
The open-source decision meant that dbt's growth would be driven by the community, not by a sales team. And that community would grow faster than anyone expected.
By 2018, dbt was gaining real traction. Drew Banin and Tristan Handy founded a company called Fishtown Analytics (named after the Fishtown neighbourhood in Philadelphia where they worked) to build a business around dbt.
Imagine you built an amazing free recipe app in your garage. People love it. Now you start a company to make a premium version with extra features β like meal planning, grocery lists, and cooking timers β while keeping the basic recipe app free forever. That's what Fishtown Analytics did with dbt.
Some brave, forward-thinking companies became early dbt adopters, and they reported dramatic improvements.
In 2019, dbt added one of its most important features: a built-in testing framework. Now you could write simple YAML declarations to automatically check your data:
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
          - unique
Real-world analogy: Adding tests to dbt was like adding a quality inspector to a factory assembly line. Before, defective products (bad data) would reach customers (dashboards). Now, the inspector catches problems before they leave the factory.
2020 was the year everything changed, for the world and for dbt.
Fishtown Analytics launched dbt Cloud: a web-based platform that let you develop, test, schedule, and monitor your dbt projects without touching the command line. It was dbt with training wheels, a dashboard, and a scheduling engine all rolled into one.
Write and test dbt models directly in your browser. No local setup required: just log in and start building.
Schedule your dbt runs to execute automatically: every hour, every day, or on a custom cron schedule.
Multiple team members can work on the same project with Git integration, code review, and environment management.
See which models ran, how long they took, which tests passed or failed, all in a beautiful dashboard.
Then COVID-19 hit. Suddenly, millions of knowledge workers were working from home. Data teams that relied on "walking over to someone's desk to ask about a query" were in trouble. They needed tools that enabled remote, asynchronous collaboration.
dbt was perfectly positioned. It was Git-based (so changes were tracked), it had documentation (so you could understand models without asking someone), and dbt Cloud worked in a browser (so you didn't need a corporate VPN).
Imagine your school suddenly switches to online classes. The teacher who already had all their lessons on YouTube and Google Docs was fine. The teacher who only taught from handwritten notes on a whiteboard was in big trouble. dbt was the teacher with everything online β ready for remote work from day one.
By 2021, dbt wasn't a scrappy open-source project any more; it was the centre of the Modern Data Stack. The company raised $52 million in Series B funding and renamed itself from Fishtown Analytics to dbt Labs to reflect its growth.
In December 2021, dbt Labs released dbt 1.0, a huge milestone. The "1.0" label signalled stability: a mature, production-ready tool with a commitment to backwards compatibility.
Real-world analogy: Think of dbt 1.0 like a restaurant getting its health inspection certificate. Before the certificate, adventurous foodies would eat there. After the certificate, even cautious corporate event planners would book it. The food didn't change, but the trust did.
dbt Labs introduced the Semantic Layer: a way to define business metrics (like "revenue," "churn rate," "active users") in one place and have every downstream tool (Looker, Tableau, Mode, etc.) use the exact same definition.
Remember the chaos from Section 1, where Marketing, Finance, and Product all had different revenue numbers? The Semantic Layer was dbt's answer: define it once, use it everywhere.
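As a sketch of the idea, a shared metric definition looked something like the YAML below. The exact keys have changed across dbt versions, and the model and column names here are illustrative, not taken from any real project:

```yaml
# Illustrative sketch; the exact schema varies by dbt version
metrics:
  - name: revenue
    label: Total Revenue
    model: ref('orders')            # one agreed-upon source model
    calculation_method: sum         # how to aggregate
    expression: amount              # which column to aggregate
    timestamp: ordered_at           # time column for grouping
    time_grains: [day, week, month]
```

Every BI tool that queries the Semantic Layer then computes "revenue" from this single definition instead of from its own hand-written SQL.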
dbt now worked with over 20 data platforms.
Fast-forward to today, and dbt is no longer just a tool; it's an ecosystem. It's the backbone of how modern data teams build, test, document, and deploy data transformations worldwide.
dbt Copilot uses artificial intelligence to help you write SQL, generate documentation, and suggest tests. It's like having a senior analytics engineer looking over your shoulder, offering suggestions as you type.
Imagine you're writing an essay, and a really smart friend sits next to you whispering, "Hey, you should probably add a paragraph about X" or "That sentence has a grammar mistake." That's dbt Copilot: an AI assistant that helps you write better data transformations.
Big companies (think: hundreds of data engineers across dozens of teams) needed a way to manage multiple interconnected dbt projects. dbt Mesh lets teams own their own projects while still sharing models across team boundaries, like departments in a company that have their own budgets but share a cafeteria.
Early dbt could tell you that model_A depends on model_B. Modern dbt can tell you that column revenue in model_A comes from column amount in model_B after being multiplied by exchange_rate from model_C. That's column-level lineage, and it's a game-changer for debugging and compliance.
dbt now connects to over 50 data platforms, from traditional warehouses to modern lakehouses, from cloud giants to open-source databases. If your company stores data somewhere, there's probably a dbt adapter for it.
The dbt community now has its own annual conference called Coalesce (a clever data pun: "coalesce" is a SQL function that returns the first non-null value). Thousands of data professionals attend to share best practices, new features, and war stories.
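To see the pun, here is the SQL function itself in action:

```sql
-- COALESCE returns its first non-NULL argument
SELECT COALESCE(NULL, NULL, 'dbt') AS first_non_null;
-- returns 'dbt'
```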
Let's step back and appreciate just how far dbt has come from Drew Banin's side project:
To put this in perspective: dbt went from a side project by one frustrated analyst in 2016 to a $4.2 billion company used by 9,000+ organisations in under a decade. That's like going from selling lemonade on your street corner to running a global beverage empire, in less time than it takes most people to finish a PhD.
Here's the full journey at a glance:
2016: Drew Banin builds the first version of dbt as a side project and open-sources it. The ref() function is born.
2018: Drew Banin and Tristan Handy found Fishtown Analytics in Philadelphia to build a business around dbt.
2019: Built-in tests (unique, not_null) and auto-generated documentation arrive. Community grows rapidly.
2020: dbt Cloud launches, and the shift to remote work accelerates adoption.
2021: $52 million Series B funding, the rename to dbt Labs, and the dbt 1.0 release in December.
Today: The Semantic Layer, dbt Mesh, dbt Copilot, column-level lineage, and 50+ supported platforms.
Let's see how much you remember from the story of dbt! Answer these 3 questions:
Who created dbt, and where was he working at the time?
What event in 2020 caused dbt downloads to increase 10x?
What was the original company name before it became "dbt Labs"?
In one sentence: Why do you think dbt succeeded where other data tools failed? Think about what made it different β the open-source model, the focus on SQL, the community, or something else?