👶 ABSOLUTE BEGINNER FRIENDLY

🌳 Decision Trees & Random Forests

Learn how machines make decisions like humans do - by asking yes/no questions!

Chapter 1: What is a Decision Tree?

👶 Explain Like I'm 5

Remember playing "20 Questions"? 🎮

"Is it an animal?" → YES

"Does it have 4 legs?" → YES

"Does it bark?" → YES

"It's a DOG!" 🐕

A Decision Tree works exactly like this! It asks simple questions to make predictions.

📌 In One Sentence

A decision tree is a flowchart of yes/no questions: you start at the top, answer each question, follow the branch, and when you reach a leaf you get the prediction (e.g. "play tennis" or "don't play"). The algorithm learns which questions to ask and in what order from the training data.

โ“ Why Do We Need Random Forest When One Tree Works?

A single tree can overfit: it memorizes the training data and gets confused on new data. A Random Forest builds many trees (each on a random subset of the data and features) and combines their votes. That usually gives a more stable and accurate model, like asking many people instead of one.

🌳 Example: Should I Play Tennis Today?

                   🌤️ What's the weather?
                  /         |          \
             Sunny      Overcast       Rainy
              /             |              \
    💨 Is it windy?   ✅ YES, PLAY!   💨 Is it windy?
        /     \                          /     \
      Yes      No                      Yes      No
       |        |                       |        |
  ❌ NO PLAY  ✅ YES PLAY         ❌ NO PLAY  ✅ YES PLAY

🎯 Key Insight

Each question splits the data into smaller groups.

The goal? Make each group as "pure" as possible (all Yes or all No).

Chapter 2: How Decision Trees Work

The Tree Anatomy

Part              What It Is                          Example
🏠 Root Node      The first question (top of tree)    "What's the weather?"
🔀 Internal Node  Questions in the middle              "Is it windy?"
🏹 Branch         The answer paths                     "Yes" or "No"
🍃 Leaf Node      Final prediction (bottom)            "Play Tennis" or "Don't Play"

How Does It Pick Questions?

🤔 The Algorithm Thinks:

"Which question separates my data best?"

Imagine a box of 50 red balls and 50 blue balls:

  • Bad split: After asking, I have 40 red + 35 blue in one group (still mixed!)
  • Good split: After asking, I have 48 red + 2 blue in one group (almost pure!)

The algorithm uses math (Gini Impurity or Entropy) to measure "purity."

📊 Gini Impurity Explained Simply

Gini = 0 → Perfectly pure (all same class) ✅

Gini = 0.5 → Completely mixed (50-50 split) ❌

The algorithm picks the question that reduces Gini the most!
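
To make "reduces Gini the most" concrete, here is a tiny sketch (not part of the original lesson) that computes Gini impurity for the red/blue ball example above:

def gini(counts):
    """Gini impurity = 1 - sum(p_i^2) over the class proportions in a group."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([50, 50]))   # 0.5    -> completely mixed (the starting box)
print(gini([40, 35]))   # ~0.498 -> still mixed, a bad split
print(gini([48, 2]))    # ~0.077 -> almost pure, a good split
print(gini([50, 0]))    # 0.0    -> perfectly pure

The tree tries each candidate question, computes the impurity of the groups it would create, and keeps the question with the biggest impurity drop.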

Chapter 3: Decision Trees in Python

Building Your First Tree

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import pandas as pd

# Load the famous Iris dataset (flowers!)
iris = load_iris()
X = iris.data       # Features (petal length, width, etc.)
y = iris.target     # Labels (Setosa, Versicolor, Virginica)

# Split data: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the Decision Tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Check accuracy
accuracy = tree.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2%}")
# Output: Accuracy: 100.00%

Visualizing the Tree

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Create a beautiful tree visualization
plt.figure(figsize=(20, 10))
plot_tree(tree, 
          feature_names=iris.feature_names,
          class_names=iris.target_names,
          filled=True,
          rounded=True,
          fontsize=10)
plt.title("Decision Tree for Iris Classification")
plt.show()
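
If you just want a quick look without a plot window, scikit-learn can also print the same tree as indented text; a small optional sketch:

from sklearn.tree import export_text

# Print the learned rules as plain text in the console
print(export_text(tree, feature_names=list(iris.feature_names)))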

🌸 What the Tree Learned (Simplified)

                🌺 Is petal_length ≤ 2.45?
                     /           \
                  YES             NO
                   |               |
              🟡 SETOSA    Is petal_width ≤ 1.75?
              (50/0/0)          /           \
                             YES             NO
                              |               |
                       🔵 VERSICOLOR    🟣 VIRGINICA
                        (0/49/1)         (0/1/45)

Chapter 4: The Problem with One Tree

🤔 Single Tree = Single Opinion

Imagine asking one person for restaurant advice.

They might be biased! Maybe they hate spicy food, or only know cheap places.

What if you asked 100 people and went with the majority vote? 🗳️

That's the idea behind Random Forests!

Problem        What Happens
Overfitting    Tree memorizes training data, fails on new data
High Variance  Small change in data = completely different tree
Instability    Remove one data point, entire tree changes

🎮 Interactive: Tree Depth vs Overfitting

A shallow tree (depth 1) underfits. A deep tree (depth 10) overfits.

[Interactive chart: training error and test error plotted against tree depth; around depth = 3 the two are balanced ✅]

Watch: training error always drops with depth, but test error rises after the sweet spot. That's overfitting!
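
There is no slider on this page, but you can run the same experiment yourself. A small sketch, reusing X_train, X_test, y_train, y_test and DecisionTreeClassifier from Chapter 3 (exact numbers depend on your data; iris is small, so the effect is mild):

# Compare training vs test accuracy as the tree gets deeper
for depth in [1, 2, 3, 5, 10]:
    t = DecisionTreeClassifier(max_depth=depth, random_state=42)
    t.fit(X_train, y_train)
    print(f"depth={depth:2d}  "
          f"train={t.score(X_train, y_train):.2f}  "
          f"test={t.score(X_test, y_test):.2f}")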

Chapter 5: Random Forest - Many Trees!

🌲🌲🌲 What is a Random Forest?

It's exactly what it sounds like - a forest of decision trees!

Instead of 1 tree, we build 100+ trees. Each tree votes, majority wins! 🗳️

🌳 Single Decision Tree

  • One tree, one decision
  • High risk of overfitting
  • Can be unstable

🌲🌲🌲 Random Forest

  • 100+ trees voting together
  • Much more robust
  • Wisdom of the crowd!

The "Random" Part

🎲 Why Random?

Each tree is built differently:

  1. Random Data: Each tree sees a random sample of the data (with replacement)
  2. Random Features: At each split, only consider a random subset of features

This creates diverse trees that make different mistakes!

When they vote together, mistakes cancel out! 🎯
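
Here is a rough sketch of idea 1 (the bootstrap sample) using NumPy, just to show what "with replacement" means; the array here is made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
row_ids = np.arange(10)          # pretend we have 10 training rows

# Draw 10 rows WITH replacement: some rows appear more than once,
# others are left out entirely - each tree trains on a sample like this
bootstrap_sample = rng.choice(row_ids, size=10, replace=True)
print(sorted(bootstrap_sample))

Idea 2 (random features per split) is what RandomForestClassifier's max_features parameter controls, e.g. max_features="sqrt".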

Random Forest in Python

from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest with 100 trees
forest = RandomForestClassifier(
    n_estimators=100,        # 100 trees in the forest
    max_depth=5,             # Max depth of each tree
    random_state=42
)

# Train the forest
forest.fit(X_train, y_train)

# Check accuracy
accuracy = forest.score(X_test, y_test)
print(f"Random Forest Accuracy: {accuracy:.2%}")
# Output: Random Forest Accuracy: 100.00%

Feature Importance

# See which features matter most!
import pandas as pd

importance = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': forest.feature_importances_
}).sort_values('Importance', ascending=False)

print(importance)

#          Feature  Importance
# 2   petal length       0.44  ← Most important!
# 3    petal width       0.42
# 0   sepal length       0.10
# 1    sepal width       0.04  ← Least important

💡 What This Tells Us

Petal length and petal width are the most useful features for classifying iris flowers!

This is a FREE bonus from Random Forests - you learn which features matter most! 🎁
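
If you prefer a picture, the same table can be plotted; a quick sketch, assuming the importance DataFrame from the block above is still in memory:

import matplotlib.pyplot as plt

# Horizontal bar chart of the importance table built above
importance.plot.barh(x='Feature', y='Importance', legend=False)
plt.xlabel("Importance")
plt.title("Random Forest Feature Importance (Iris)")
plt.tight_layout()
plt.show()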

Chapter 6: When to Use What?

Situation                        Use This               Why?
Need to explain decisions        🌳 Decision Tree       Easy to visualize and explain to stakeholders
Need high accuracy               🌲🌲🌲 Random Forest    More robust, less overfitting
Want to know important features  🌲🌲🌲 Random Forest    Provides feature importance scores
Small dataset                    🌳 Decision Tree       Simpler, less likely to overfit
Large dataset                    🌲🌲🌲 Random Forest    Can capture complex patterns

Summary

Concept             Simple Explanation
Decision Tree       Asks yes/no questions to make predictions (like 20 Questions)
Root Node           First question at the top
Leaf Node           Final prediction at the bottom
Gini Impurity       Measures how "mixed" a group is (0 = pure, 0.5 = mixed)
Random Forest       Many trees voting together (wisdom of the crowd)
Feature Importance  Which features matter most for predictions

🎉 You've Mastered Tree-Based Models!

Decision Trees and Random Forests are among the most powerful and widely-used algorithms in data science!

🚫 Common Mistakes with Decision Trees & Random Forest

  • Letting one tree grow too deep: deep trees overfit; use max_depth or min_samples_leaf to limit size.
  • Not tuning the number of trees: more trees usually help up to a point; too many just slow things down (see the tuning sketch after this list).
  • Ignoring feature scaling: trees don't need scaling, but if you mix them with other algorithms later, scale then.
  • Treating feature importance as causation: importance only says "this feature was useful for splitting"; it doesn't prove cause and effect.
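
One simple way to tune these settings is a small cross-validated grid search; a hedged sketch (the grid values are just examples, and X_train, y_train come from Chapter 3):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

# Try every combination with 5-fold cross-validation, keep the best
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, f"{search.best_score_:.2%}")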

💭 Short reflection

In one sentence: why does a Random Forest usually generalize better than a single deep decision tree on the same data?

✅ CORE (Must know)

  • Decision tree: splits on features (yes/no questions); root → internal nodes → leaf (prediction).
  • Gini impurity / entropy: measure how mixed a node is; the tree chooses splits that minimize impurity.
  • Overfitting: deep trees overfit; use max_depth, min_samples_leaf, or pruning.
  • Random Forest: ensemble of trees on bootstrap samples + random feature subsets; vote for classification, average for regression.
  • Feature importance: from split improvement (e.g. Gini decrease) across the forest.
  • When to use: interpretability → single tree; accuracy → Random Forest.

📚 NON-CORE (Good to know)

  • Information gain and entropy formula.
  • Pruning (post-prune vs pre-prune).
  • Bagging vs Random Forest (RF adds random feature subset per split).
  • Out-of-bag (OOB) error estimate (see the sketch below).
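
For the OOB point: each tree never sees the rows left out of its bootstrap sample, so those rows give a free validation estimate. A minimal sketch, assuming X_train, y_train from Chapter 3:

from sklearn.ensemble import RandomForestClassifier

# oob_score=True scores each training sample using only
# the trees that did NOT see it during training
forest_oob = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
forest_oob.fit(X_train, y_train)
print(f"OOB accuracy estimate: {forest_oob.oob_score_:.2%}")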