Every line of the regularization (Ridge/Lasso) code explained in simple words. Use the same dataset as in the lesson.
Keep auto_mpg.csv in the current folder so that pd.read_csv("auto_mpg.csv") works.
We load the libraries we need.
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression, Ridge, Lasso
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")
Read the CSV file into a DataFrame.
data = pd.read_csv("auto_mpg.csv")
This reads auto_mpg.csv from the current folder and stores it in a variable called data. Each row is a car; the columns are mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin, and car name. See the first rows and column types.
data.head(15)  # First 15 rows
data.info()    # Column names, types, and non-null counts
In this dataset, missing values in horsepower are stored as ?. We replace them with the average and convert to numbers.
data['horsepower'] = data['horsepower'].str.replace('?', 'NaN').astype(float)
data['horsepower'].fillna(data['horsepower'].mean(), inplace=True)
data['horsepower'] = data['horsepower'].astype(int)
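To see what these three lines do in isolation, here is a tiny self-contained example on a made-up column (not the lesson dataset):

```python
import pandas as pd

# A made-up column with one '?' placeholder, like the raw horsepower data
hp = pd.Series(['130', '?', '150'])

hp = hp.str.replace('?', 'NaN').astype(float)  # '?' becomes NaN, the rest become floats
hp.fillna(hp.mean(), inplace=True)             # NaN -> mean of 130 and 150 = 140.0
hp = hp.astype(int)

print(hp.tolist())  # [130, 140, 150]
```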
This replaces every ? in the horsepower column with NaN (missing), then converts the column to float so we can do math. inplace=True means we change the column in place. We predict mpg (miles per gallon) from the other columns. We drop the car name (not useful for prediction), then split into train and test.
data.drop(columns=['car name'], inplace=True)

from sklearn.model_selection import train_test_split
train, test = train_test_split(data, test_size=0.20, random_state=0)
y_train = train.pop('mpg')
y_test = test.pop('mpg')
X_train = train
X_test = test
random_state=0 makes the split the same every time. Ridge shrinks all weights toward zero (but rarely to exactly zero). alpha is the strength of the penalty (the lambda in the formula).
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=10.0)
ridge.fit(X_train, y_train)
print(f"Ridge R² on test: {ridge.score(X_test, y_test):.3f}")
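For the curious: with fit_intercept=False, Ridge has a closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy. A small sketch on synthetic data (the made-up X, y and fit_intercept=False are assumptions for the demo, not part of the lesson):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with known true weights [1.0, -2.0, 0.5]
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 10.0
# Closed form: w = (X^T X + alpha * I)^-1 X^T y
w_manual = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

ridge = Ridge(alpha=alpha, fit_intercept=False)
ridge.fit(X, y)

print(np.allclose(w_manual, ridge.coef_))  # True (up to numerical tolerance)
```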
alpha=10.0 is a strong penalty; try 0.1 or 1.0 for weaker regularization. Lasso can set some weights to exactly zero, so it also does feature selection.
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
print(f"Lasso R² on test: {lasso.score(X_test, y_test):.3f}")

# See which features Lasso kept (non-zero coefficients)
for name, coef in zip(X_train.columns, lasso.coef_):
    if coef != 0:
        print(f"  {name}: {coef:.3f}")
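For comparison, the LinearRegression imported at the top is the unregularized baseline. A self-contained sketch on synthetic data (not the lesson dataset) showing how the three models treat the same weights:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data: 4 features, one of which (the third) has no real effect
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([3.0, -1.5, 0.0, 0.5]) + rng.normal(scale=0.5, size=60)

lr = LinearRegression().fit(X, y)    # no penalty: the baseline
ridge = Ridge(alpha=50.0).fit(X, y)  # shrinks every weight a bit
lasso = Lasso(alpha=0.5).fit(X, y)   # shrinks and zeroes some weights

print("LR    sum|w|:", np.abs(lr.coef_).sum())
print("Ridge sum|w|:", np.abs(ridge.coef_).sum())  # smaller than LR
print("Lasso zeros :", (lasso.coef_ == 0).sum())   # at least the useless feature
```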
Try different alpha values and see how R² and the number of non-zero coefficients change. Higher alpha → stronger regularization → simpler model.
# Try different alphas
for alpha in [0.01, 0.1, 1.0, 10.0]:
    m = Lasso(alpha=alpha)
    m.fit(X_train, y_train)
    nz = sum(1 for c in m.coef_ if c != 0)
    print(f"alpha={alpha}: R²={m.score(X_test, y_test):.3f}, non-zero coefs={nz}")
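One caveat the alpha sweep hides: Ridge and Lasso penalize coefficients by their size, so a feature on a big scale (weight in thousands of pounds) ends up with a tiny coefficient and is barely penalized, while small-scale features get hit hard. Scaling features first is common practice; a sketch using sklearn's StandardScaler and make_pipeline on synthetic data (these tools and the made-up data are additions, not part of the lesson):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two equally useful features, but the second lives on a 1000x larger scale
X = np.column_stack([rng.normal(size=80), rng.normal(size=80) * 1000])
y = X[:, 0] + X[:, 1] / 1000 + rng.normal(scale=0.1, size=80)

# Without scaling: the large-scale feature's coefficient is ~0.001,
# so the L1 penalty barely touches it while shrinking the other one hard
raw = Lasso(alpha=0.5).fit(X, y)
print(raw.coef_)

# With scaling: both features are penalized evenly
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
pipe.fit(X, y)
print(pipe.named_steps['lasso'].coef_)
```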