Learn how companies like Facebook, Netflix, and Amazon test new features to make data-driven decisions!
Imagine you're selling lemonade and want to know which sign attracts more customers:
You show Sign A to half the people walking by, and Sign B to the other half.
After counting who bought more, you know which sign is better!
That's A/B Testing!
A/B testing means randomly showing one version (A = control) to some users and another version (B = variant) to others, then comparing a metric (e.g. conversion rate) and using a statistical test (e.g. t-test) to decide if the difference is real or just luck. If p < 0.05, we say the result is significant and we can choose the winner.
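In practice, the random split is usually done with a stable hash of the user ID rather than a fresh coin flip, so the same person always sees the same version. Here's a minimal sketch of that idea (the user IDs and the md5-based bucketing are just illustrative, not any particular company's implementation):

```python
import hashlib

def assign_variant(user_id: str) -> str:
    """Deterministically assign a user to group 'A' or 'B' (a 50/50 split)."""
    # Hash the user ID so the same person always lands in the same group
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100           # a number from 0 to 99
    return "A" if bucket < 50 else "B"       # 0-49 -> A, 50-99 -> B

# The assignment is stable: the same user gets the same answer on every visit
for uid in ["user_001", "user_002", "user_003"]:
    print(uid, "->", assign_variant(uid))
```

Hash-based bucketing keeps each user's experience consistent across visits, which matters for the "randomize properly" rule later in this lesson.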
Here's the whole flow at a glance:

                    100% of Users
                          |
           +--------------+--------------+
           |                             |
           v                             v
       Version A                     Version B
    (Current Design)               (New Design)
     [Blue Button]                [Green Button]
           |                             |
        50 Users                      50 Users
           |                             |
      5 Purchases                   12 Purchases
     (10% Convert)                 (24% Convert)
           |                             |
           +--------------+--------------+
                          |
                          v
                  VERSION B WINS!
"Will changing the button color from blue to green increase sign-ups?"
- Version A (Control): the current design (blue button)
- Version B (Variant): the new design (green button)
- 50% of users see Version A, 50% see Version B (randomly assigned)
- Count conversions (sign-ups, purchases, clicks) for each version (see the sketch just below)
- Use a t-test to check if the difference is REAL or just random chance
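Here's a rough sketch of the "split and count conversions" part on a made-up per-user log (the column names and values are hypothetical); the t-test itself comes later in the lesson:

```python
import pandas as pd

# Hypothetical per-user log: one row per visitor, recording which version
# they saw and whether they converted (all values are made up)
log = pd.DataFrame({
    "user_id":   ["u1", "u2", "u3", "u4", "u5", "u6"],
    "variant":   ["A",  "B",  "A",  "B",  "A",  "B"],
    "converted": [0,     1,    0,    1,    1,    0],
})

# Count users and conversions per version, then compute each conversion rate
summary = log.groupby("variant")["converted"].agg(users="count", conversions="sum")
summary["conversion_rate"] = summary["conversions"] / summary["users"]
print(summary)
```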
Version A mockup: the current website design with a blue "Sign Up" button (this is what users see now).
Version B mockup: the new design with a green "Sign Up" button (this is what we're testing).
Suppose Version A had a 10% conversion rate and Version B had 12%.
But wait! Is that 2-point difference REAL, or just random luck?
Maybe if we tested again, A would do better?
That's why we use statistics!
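A quick simulation shows why. In the sketch below both versions have exactly the same true 10% conversion rate (all numbers are made up), yet the observed rates still drift apart from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(42)

true_rate = 0.10    # pretend BOTH versions really convert at exactly 10%
n_users = 500       # users shown each version

# Repeat the same "experiment" a few times with NO real difference between A and B
for trial in range(5):
    conv_A = rng.binomial(n_users, true_rate) / n_users
    conv_B = rng.binomial(n_users, true_rate) / n_users
    print(f"Trial {trial + 1}: A = {conv_A:.1%}, B = {conv_B:.1%}, gap = {conv_B - conv_A:+.1%}")

# The gaps wander around zero even though nothing changed --
# which is exactly why we need a statistical test before picking a winner.
```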
The T-Test answers: "Is the difference between two groups REAL or just coincidence?"
It gives us a p-value:
| p-value | Meaning | Decision |
|---|---|---|
| p < 0.01 | Very strong evidence | Definitely implement B! |
| p < 0.05 | Strong evidence | Safe to implement B |
| p < 0.10 | Weak evidence | Maybe test longer |
| p ≥ 0.10 | No evidence | Probably no real difference |
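If you'd like the table as code, here's a tiny helper that maps a p-value to the same rule-of-thumb decisions (the thresholds are conventions, not laws of nature):

```python
def interpret_p_value(p: float) -> str:
    """Turn a p-value into the rule-of-thumb decision from the table above."""
    if p < 0.01:
        return "Very strong evidence: definitely implement B!"
    elif p < 0.05:
        return "Strong evidence: safe to implement B."
    elif p < 0.10:
        return "Weak evidence: maybe test longer."
    else:
        return "No evidence: probably no real difference."

print(interpret_p_value(0.000347))   # the p-value we'll get later in this lesson
print(interpret_p_value(0.20))
```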
Download this CSV to follow along with the code examples below.
import pandas as pd
from scipy import stats

# Load A/B test data (download from the link above!)
# This data has conversion rates for 35 days
data = pd.read_csv("AB_testing_data.csv")

# Let's look at the data
print(data.head(10))

# Output:
#    Day  Conversion fraction A  Conversion fraction B
# 0    1                  0.102                  0.189
# 1    2                  0.095                  0.178
# 2    3                  0.108                  0.192
# ...
- `import pandas as pd` → lets us use DataFrames and read the CSV.
- `from scipy import stats` → gives us the statistical test (the t-test) we'll run later.
- `pd.read_csv("AB_testing_data.csv")` → loads the A/B test file; its columns are Day, Conversion fraction A, and Conversion fraction B.
- `print(data.head(10))` → shows the first 10 rows so you can see the conversion rates for each day.
# Calculate the average conversion rate for each version
avg_A = data['Conversion fraction A'].mean()
avg_B = data['Conversion fraction B'].mean()

print(f"Version A average conversion: {avg_A:.1%}")
print(f"Version B average conversion: {avg_B:.1%}")
print(f"Difference: {(avg_B - avg_A):.1%}")

# Output:
# Version A average conversion: 10.2%
# Version B average conversion: 18.5%
# Difference: 8.3%

# Wow! B looks much better! But is it STATISTICALLY significant?
# Get the conversion rates for each version
group_A = data['Conversion fraction A']
group_B = data['Conversion fraction B']

# Run the t-test!
t_stat, p_value = stats.ttest_ind(group_A, group_B)

print("=" * 50)
print("  A/B TEST RESULTS")
print("=" * 50)
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value:     {p_value:.6f}")
print("=" * 50)

# Output:
# ==================================================
#   A/B TEST RESULTS
# ==================================================
# T-statistic: -3.74
# P-value:     0.000347
# ==================================================
# Make a decision based on the p-value
alpha = 0.05  # Significance threshold (5%)

if p_value < alpha:
    print("STATISTICALLY SIGNIFICANT!")
    print("The difference is REAL, not random luck.")
    # Note: 1 - p isn't literally the probability that B is better, but a
    # p-value this tiny means the gap is very unlikely to be pure chance.
    print(f"We are {(1 - p_value) * 100:.2f}% confident Version B is better!")
    print("\nRECOMMENDATION: Implement Version B!")
else:
    print("NOT statistically significant.")
    print("The difference might just be random chance.")
    print("\nRECOMMENDATION: Keep Version A or test longer.")

# Output:
# STATISTICALLY SIGNIFICANT!
# The difference is REAL, not random luck.
# We are 99.97% confident Version B is better!
#
# RECOMMENDATION: Implement Version B!
With p-value = 0.000347 (much less than 0.05), we can confidently say:
"Version B truly performs better - it's not just luck!"
| Mistake | Why It's Bad | What to Do Instead |
|---|---|---|
| Stopping too early | Small sample = unreliable results | Wait for enough data (usually 1,000+ users per version; see the sample-size sketch below) |
| Testing too many things | Can't tell which change made the difference | Change ONE thing at a time |
| Peeking at results | Leads to false positives | Set a fixed end date before starting |
| Not randomizing properly | Biased groups | Use proper random assignment |
| Ignoring seasonality | Weekend vs weekday behavior differs | Test for at least 1-2 full weeks |
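How much data is "enough"? A common rule of thumb is the two-proportion sample-size formula sketched below; the baseline rate, the lift you want to detect, the 5% significance level, and 80% power are all numbers you choose, and the ones here are purely illustrative:

```python
from scipy.stats import norm

def users_per_version(p_baseline, p_target, alpha=0.05, power=0.80):
    """Rough sample size per group for comparing two conversion rates."""
    z_alpha = norm.ppf(1 - alpha / 2)    # two-sided 5% significance by default
    z_power = norm.ppf(power)            # 80% chance of detecting a real lift
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return int((z_alpha + z_power) ** 2 * variance / effect ** 2) + 1

# Illustrative numbers: a 10% baseline, hoping to detect a lift to 12%
print(users_per_version(0.10, 0.12))   # roughly 3,800 users per version
```

Detecting a small lift reliably usually takes far more users than people expect, which is why stopping too early is such a common trap.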
The course source uses ab_testing_data.csv (or similar), with control vs. variant groups and a conversion metric. The key steps: split by group, compute conversion rates, and run a t-test or z-test for significance. Download ab_testing_data.csv from the datasets page, and see AB testing and Market Basket Analysis.pdf in the course source for the slides.
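The notebook runs a t-test on daily conversion fractions, but when you have raw counts (conversions and visitors per group), a two-proportion z-test is the more direct option. Here's a sketch using statsmodels (assuming it is installed), fed with the counts from the diagram near the top of this lesson:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [5, 12]    # purchases in Version A and Version B (diagram numbers)
visitors    = [50, 50]   # users who saw each version

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z-statistic: {z_stat:.2f}")
print(f"p-value:     {p_value:.4f}")

# With only 50 users per version, even a 10% vs 24% gap may not clear the
# 0.05 bar -- one more reason to collect enough data before declaring a winner.
```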
Every line of code from the course notebook, verbatim:
# --- Code cell 1 ---
from IPython.core.display import HTML
HTML("""
<style>
h2 { color: blue !important; }
h3 { color: green !important; }
</style>
""")
# --- Code cell 4 ---
import pandas as pd
from scipy import stats
# --- Code cell 5 ---
data = pd.read_csv("AB_testing_data.csv")
# --- Code cell 6 ---
len(data)
# --- Code cell 7 ---
data.head(10)
# --- Code cell 8 ---
data.info()
# --- Code cell 9 ---
data.describe()
# --- Code cell 11 ---
samples_set1 = data['Conversion fraction A']
samples_set2 = data['Conversion fraction B']
stat, p = stats.ttest_ind(samples_set1, samples_set2,equal_var = True)
print("AB test results: ")
print("p-value : ", p)
print("")
print("")
# --- Code cell 12 ---
1-0.00034704350989135126
# --- Code cell 13 ---
1-0.05
# --- Code cell 14 ---
# p value < 0.05 so two versions of website have different means for conversion rate - more than 95% confidence
In one sentence: why is it important to run an A/B test for at least one full week (or more) before deciding a winner?
| Concept | Simple Explanation |
|---|---|
| A/B Test | Comparing two versions to see which performs better |
| Control (A) | The current version (what we're comparing against) |
| Variant (B) | The new version we're testing |
| Conversion Rate | % of users who took the desired action |
| p-value | How often we'd see a difference this big by luck alone, if there were no real difference |
| Significance (p < 0.05) | A gap this big would show up by luck less than 5% of the time, so we treat the difference as real |
You now understand A/B testing - a skill used by data scientists at top tech companies!