
Ridge Regression: Extending Linear Regression

An illustration covering Ridge Regression as an extension of Linear Regression using scikit-learn, complete with code and interactive visualizations.



Introduction

From our previous report on Linear Regression:
In its most basic form, linear regression is a statistical technique that attempts to model the relationship between variables by fitting a linear equation. One variable is usually considered the target or dependent variable (the desired output), and the others are known as explanatory variables or features (the inputs).

In the last report we used the method of least squares, one of the most common methods for estimating regression coefficients. Let's extend this method to what is commonly called Ridge Regression.

Ridge Regression Method

Given some data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, if we want to find the linear relationship between $x$ and $y$ (i.e. $y \approx x^T \beta$), our linear regression formulation of the method is:

$$\text{minimize:} \quad \frac{1}{n} \sum_{i=1}^{n} (x_i^T \beta - y_i)^2$$

Ridge Regression takes this a bit further by adding a penalty term on the size of the coefficients.
$$\text{minimize:} \quad \frac{1}{n} \sum_{i=1}^{n} (x_i^T \beta - y_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

where $p$ is the number of features and $\lambda \ge 0$ controls the strength of the penalty.
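To make the objective concrete, here is a minimal NumPy sketch (with randomly generated toy data, purely for illustration) that solves it in closed form: setting the gradient of the averaged objective to zero gives $\beta = (X^T X + n\lambda I)^{-1} X^T y$.

import numpy as np

# Toy data, purely for illustration: n samples, p features
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

lam = 0.75  # regularization strength (lambda)

# Closed-form minimizer of (1/n)||X b - y||^2 + lam * ||b||^2:
#   b = (X^T X + n * lam * I)^(-1) X^T y
beta = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

mse_term = np.mean((X @ beta - y) ** 2)   # (1/n) * sum of squared residuals
penalty_term = lam * np.sum(beta ** 2)    # lam * sum of squared coefficients
print(beta, mse_term + penalty_term)

Note that scikit-learn's Ridge minimizes the un-averaged sum of squared errors plus alpha times the squared coefficient norm, so its alpha is not numerically identical to the $\lambda$ in the averaged formulation above.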


Motivation

Why do we add another term for this?
Simply put: when the explanatory variables (features) are highly correlated with each other, plain least squares regression becomes unreliable. The estimated coefficients can grow very large and swing wildly from sample to sample, i.e. the model has high statistical variance. One approach is to simply remove the variables that cause multicollinearity, but more often than not, as a Data Scientist or Machine Learning Engineer, you won't have that freedom. That's where the penalty term comes in: it penalizes large coefficients, shrinking them toward zero and making the model far more robust to multicollinearity.
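As a quick illustration (a sketch with synthetic data, not from the original report), the snippet below builds two nearly duplicate features and compares plain least squares with Ridge. The exact numbers will vary, but the ridge coefficients are typically much smaller and more stable.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # often large and mutually offsetting
print("Ridge coefficients:", ridge.coef_)  # shrunk toward smaller, similar values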

Using sklearn

scikit-learn makes it incredibly easy to use Ridge Regression. The Ridge estimator is available in the sklearn.linear_model submodule and has a very simple API.
from sklearn.linear_model import Ridge

x, y = get_dataset()           # placeholder: load your features and targets here
model = Ridge(alpha=0.75)      # alpha controls the regularization strength
model.fit(x, y)
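Once fitted, the model behaves like any other scikit-learn estimator, so the usual inspection and prediction calls apply (a quick sketch, assuming the x and y loaded above):

print(model.coef_, model.intercept_)   # learned coefficients and intercept
print(model.score(x, y))               # R^2 on the training data
predictions = model.predict(x)         # predictions for the given inputs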
It's even easier to plot the learning curve using Weights & Biases. For instance, the graph below was plotted using the following line:
wandb.sklearn.plot_learning_curve(model, x, y)
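The W&B plotting utilities log to an active run, so in practice the call sits inside an initialized run. A minimal sketch (the project name here is just an example, and get_dataset() is the placeholder from above):

import wandb
from sklearn.linear_model import Ridge

wandb.init(project="ridge-regression")   # example project name

x, y = get_dataset()                      # placeholder dataset loader
model = Ridge(alpha=0.75)
model.fit(x, y)

wandb.sklearn.plot_learning_curve(model, x, y)
wandb.finish()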

[W&B panel: learning curve for the run set of 3 runs]


Summary

In this article, you saw how Ridge Regression is a simple extension of Linear Regression, how to use scikit-learn to train a Ridge Regression model, and how to plot valuable metrics and data using a suite of W&B tools.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and from-scratch code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
