Line by line: predict which bank customers will leave using Support Vector Machines
First we load the dataset and get a feel for what's inside. The Bank Churners dataset has info about credit card customers — some stayed (Existing Customer) and some left (Attrited Customer).
```python
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

df = pd.read_csv('datasets/BankChurnersData.csv')
print(df.shape)
df.head()
```
```python
df.info()
df.describe()
```
Imagine you run a bank. You have 10,127 customers with credit cards. Some are happy and stay — others get frustrated and leave. You want a computer to learn the patterns of people who leave so you can catch them early and offer them a deal to stay!
Think of it like a school yearbook. Each row is a student, each column is a fact about them (age, grades, clubs). We want to predict who will transfer to another school based on all those facts.
Before training any model, we need to see the data. Visualizations reveal which features differ between churned and existing customers.
```python
num_cols = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
num_cols.remove('CLIENTNUM')

fig, axes = plt.subplots(4, 4, figsize=(20, 16))
for i, col in enumerate(num_cols):
    ax = axes[i // 4][i % 4]
    sns.boxplot(data=df, x='Attrition_Flag', y=col, ax=ax,
                palette={'Existing Customer': '#a78bfa',
                         'Attrited Customer': '#f87171'})
    ax.set_title(col, fontsize=10)

# Hide any unused subplots in the 4x4 grid
for j in range(len(num_cols), 16):
    axes[j // 4][j % 4].axis('off')

plt.tight_layout()
plt.show()
```
```python
plt.figure(figsize=(14, 10))
sns.heatmap(df[num_cols].corr(), annot=True, fmt='.2f', cmap='RdPu',
            center=0, linewidths=0.5)
plt.title('Feature Correlation Heatmap')
plt.tight_layout()
plt.show()
```
It's like a friendship chart. If two columns move together (when one goes up, the other goes up too), they're highly correlated — shown in dark purple. If they don't care about each other, it's close to zero. We look for features that are too friendly (redundant) and drop one of them.
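To make the "too friendly" idea concrete, here is a tiny toy sketch (made-up numbers, not the bank data) that scans the correlation matrix for pairs above 0.9, which is the same logic used later to justify dropping Avg_Open_To_Buy:

```python
import numpy as np
import pandas as pd

# Toy frame: 'a' and 'b' move together perfectly; 'c' is on its own
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 3, 8, 1, 4],
})

corr = toy.corr().abs()
# Mask the lower triangle and diagonal so each pair is counted once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [(r, c) for r in upper.index for c in upper.columns
             if upper.loc[r, c] > 0.9]
print(redundant)  # [('a', 'b')]
```

Once a pair like this shows up, keeping either one of the two columns preserves essentially all the information.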
Boxplots are like comparing the height of basketball vs. chess club members. You immediately see which group is taller. Similarly, we see which features look different for churned customers — those are the ones SVM will rely on most.
Raw data isn't ready for SVM. We need to convert text to numbers, create new useful features, remove redundant ones, and scale everything to the same range.
```python
# Convert target: 1 = Attrited (churned), 0 = Existing (stayed)
df['Attrition_Flag'] = df['Attrition_Flag'].map({
    'Attrited Customer': 1,
    'Existing Customer': 0
})

# Drop the ID column — it's useless for prediction
df = df.drop(columns=['CLIENTNUM'])

# One-hot encode categorical columns
cat_cols = df.select_dtypes(include=['object']).columns
df = pd.get_dummies(df, columns=cat_cols, drop_first=True)
```
SVM only understands numbers, not words. So "Male"/"Female" becomes two columns: Gender_M = 1 means male, Gender_M = 0 means female. drop_first=True avoids redundancy — if it's not Male, it must be Female. One column is enough!
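Here is that encoding on a three-row toy column (illustrative only), showing how drop_first=True leaves a single column behind:

```python
import pandas as pd

toy = pd.DataFrame({'Gender': ['M', 'F', 'M']})
encoded = pd.get_dummies(toy, columns=['Gender'], drop_first=True)
# 'Gender_F' (alphabetically first) is dropped;
# 'Gender_M' alone carries the information: 1/True = male, 0/False = female
print(encoded.columns.tolist())  # ['Gender_M']
```

Recent pandas versions emit boolean dummy columns; older ones emit 0/1 integers, but either way the model sees the same information.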
```python
# New feature: average value per transaction
df['Avg_Transaction_Value'] = df['Total_Trans_Amt'] / df['Total_Trans_Ct']

# Avg_Open_To_Buy is ~0.99 correlated with Credit_Limit — drop it
df = df.drop(columns=['Avg_Open_To_Buy'])
```
```python
X = df.drop(columns=['Attrition_Flag'])
y = df['Attrition_Flag']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
If you call scaler.fit_transform(X_test) instead of scaler.transform(X_test), you're "peeking" at test data. This causes data leakage — your accuracy looks great in training but falls apart in production.
Scaling is like converting all currencies to USD before comparing prices. Without it, 10,000 Japanese Yen looks way bigger than $90, even though they're roughly equal. SVM computes distances, so every feature must be on the same scale.
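A tiny sketch with made-up currency-sized numbers shows both ideas at once: MinMaxScaler squashes each column into [0, 1], and the scaler fitted on training data is reused (not refit) on new data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-feature data: yen-sized values next to dollar-sized ones
prices_train = np.array([[10000.0, 90.0],
                         [20000.0, 50.0],
                         [15000.0, 70.0]])
prices_new = np.array([[12500.0, 60.0]])

mm = MinMaxScaler()                            # maps each column to [0, 1]
train_scaled = mm.fit_transform(prices_train)  # learn min/max from train only
new_scaled = mm.transform(prices_new)          # reuse those stats: no peeking
print(new_scaled)  # [[0.25 0.25]]
```

Both features of the new row land at 0.25, so the SVM's distance computations treat them as equally important, which is exactly the point.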
Now the fun part! We train two SVM models — one with a linear kernel and one with an RBF kernel — and compare their performance.
```python
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)

y_pred_lin = svm_linear.predict(X_test)
print("Linear SVM Accuracy:", accuracy_score(y_test, y_pred_lin))
print(classification_report(y_test, y_pred_lin))
print("Support vectors:", svm_linear.n_support_)
```
```python
svm_rbf = SVC(kernel='rbf', C=1.0, gamma=0.1, random_state=42)
svm_rbf.fit(X_train, y_train)

y_pred_rbf = svm_rbf.predict(X_test)
print("RBF SVM Accuracy:", accuracy_score(y_test, y_pred_rbf))
print(classification_report(y_test, y_pred_rbf))
print("Support vectors:", svm_rbf.n_support_)
```
| Metric | Value |
|---|---|
| Accuracy | 93.2% |
| Precision (Churn) | 0.88 |
| Recall (Churn) | 0.81 |
| F1-Score (Churn) | 0.84 |
| Support Vectors | [1234, 456] |
Linear draws a straight line (or flat plane in many dimensions) between churned and loyal customers. RBF can draw curvy, wavy boundaries — it finds patterns even when the groups aren't neatly separated by a straight cut.
Linear is like cutting a pizza with one straight slice. RBF is like using a cookie cutter — it can carve out any shape. RBF is more flexible but risks over-fitting if you're not careful.
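The pizza-vs-cookie-cutter difference is easy to demonstrate on synthetic data (not the bank dataset): two concentric rings that no straight cut can separate, where the linear kernel stalls near chance while RBF carves the circle out:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric rings: inner circle is one class, outer ring the other
ring_X, ring_y = make_circles(n_samples=400, factor=0.3, noise=0.05,
                              random_state=42)
rX_tr, rX_te, ry_tr, ry_te = train_test_split(ring_X, ring_y, random_state=42)

acc_linear = SVC(kernel='linear').fit(rX_tr, ry_tr).score(rX_te, ry_te)
acc_rbf = SVC(kernel='rbf').fit(rX_tr, ry_tr).score(rX_te, ry_te)
# acc_linear hovers near chance; acc_rbf should be close to 1.0
```

On real tabular data the gap is rarely this dramatic, which is why the tutorial trains both kernels and compares them instead of assuming RBF wins.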
Instead of guessing C, kernel, and gamma, we let the computer try every combination and pick the best one automatically.
```python
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.1, 0.001, 0.0001]
}

grid = GridSearchCV(
    SVC(random_state=42),
    param_grid,
    cv=5,
    scoring='f1',
    verbose=1,
    n_jobs=-1
)

grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Best F1 score:", round(grid.best_score_, 4))
```
Imagine you're baking cookies. You try 4 oven temperatures × 3 baking times and taste each batch. The combo that tastes best is your "best params." GridSearch does the same thing — it trains an SVM for every combination and scores each one, then returns the winner!
C is how strict the teacher is (high C = "zero tolerance for mistakes"). gamma is how closely the model looks at each data point (high gamma = "examines every detail with a magnifying glass"). We need the right balance — strict enough to be accurate, but not so strict that it memorizes the training data.
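The "magnifying glass" effect can be seen directly on a small synthetic problem (toy data, not the bank set): as gamma grows, training accuracy climbs while test accuracy sags, which is memorization in action:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-moons data: some overlap, so a perfect fit must be memorization
moon_X, moon_y = make_moons(n_samples=400, noise=0.3, random_state=42)
mX_tr, mX_te, my_tr, my_te = train_test_split(moon_X, moon_y, random_state=42)

scores = {}
for g in [0.1, 1, 100]:
    clf = SVC(kernel='rbf', C=1.0, gamma=g).fit(mX_tr, my_tr)
    scores[g] = (clf.score(mX_tr, my_tr), clf.score(mX_te, my_te))
    print(f"gamma={g}: train={scores[g][0]:.2f}, test={scores[g][1]:.2f}")
```

At gamma=100 the boundary wraps around individual noisy points, so the train score outruns the test score, which is exactly the gap GridSearchCV's cross-validation is designed to catch.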
Now we train the final model using the best hyperparameters found by GridSearchCV, and thoroughly evaluate its performance.
```python
# grid.best_estimator_ is already refit on the full training set
# with the best params — no extra .fit() call needed
best_svm = grid.best_estimator_
y_final = best_svm.predict(X_test)
print("Final Accuracy:", round(accuracy_score(y_test, y_final), 4))
print(classification_report(y_test, y_final))
```
```python
cm = confusion_matrix(y_test, y_final)

plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Purples',
            xticklabels=['Stayed', 'Churned'],
            yticklabels=['Stayed', 'Churned'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix — Final SVM')
plt.show()
```
```python
from sklearn.metrics import roc_curve, auc

# SVC has no probabilities by default — use decision_function for ROC
y_scores = best_svm.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, color='#7c3aed', lw=2,
         label=f'ROC curve (AUC = {roc_auc:.4f})')
plt.plot([0, 1], [0, 1], '--', color='#94a3b8')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve — Final SVM Model')
plt.legend()
plt.show()
```
Imagine a fire alarm. True Positive = alarm rings and there IS a fire (good!). False Positive = alarm rings but no fire (annoying). False Negative = fire but NO alarm (dangerous!). True Negative = no alarm, no fire (all good). We want lots of TP/TN and very few FP/FN.
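Here is the fire-alarm story as a six-event toy log (made-up labels), unpacked into the four cells sklearn's confusion_matrix produces:

```python
from sklearn.metrics import confusion_matrix

# Toy fire-alarm log: 1 = fire, 0 = no fire
alarm_true = [0, 0, 1, 1, 1, 0]   # what actually happened
alarm_pred = [0, 1, 1, 0, 1, 0]   # what the alarm said

# sklearn lays the 2x2 matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(alarm_true, alarm_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2
```

In the churn setting, fn is the cell to watch: each false negative is a customer who left without the bank ever raising the alarm.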
Think of churn prediction like a doctor's check-up. The confusion matrix is your test result sheet: are you catching real diseases (TP) without scaring healthy people (FP)? The ROC curve shows how well the "test" performs overall — a bigger area under the curve means a more reliable diagnostic tool.
Only ~16% of customers churned. Accuracy alone can be misleading — a model that always says "Stayed" gets 84% accuracy! That's why we optimized for F1-score and checked recall specifically for the churned class. Consider SMOTE or class_weight='balanced' for even better results.
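The class_weight='balanced' suggestion can be sketched on synthetic imbalanced data (illustrative only, roughly mimicking the ~16% churn rate); it reweights errors inversely to class frequency, so mistakes on the rare class cost more and minority recall typically rises:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Imbalanced toy data: ~15% positives with a little label noise
imb_X, imb_y = make_classification(n_samples=1000, weights=[0.85],
                                   flip_y=0.05, random_state=42)
iX_tr, iX_te, iy_tr, iy_te = train_test_split(imb_X, imb_y, stratify=imb_y,
                                              random_state=42)

plain = SVC().fit(iX_tr, iy_tr)
balanced = SVC(class_weight='balanced').fit(iX_tr, iy_tr)

recall_plain = recall_score(iy_te, plain.predict(iX_te))
recall_balanced = recall_score(iy_te, balanced.predict(iX_te))
# recall_balanced is usually at least as high, at some cost in precision
```

The trade is explicit: the balanced model flags more at-risk customers, accepting more false alarms, which is often the right deal when a missed churner costs more than a retention offer.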
We taught a computer to spot unhappy bank customers by looking at how often they use their credit card, how much they spend, and how often they call the bank — and it gets it right about 96 out of 100 times!