
Ridge Regression: Extending Linear Regression

An illustration covering Ridge Regression as an extension of Linear Regression using scikit-learn, complete with code and interactive visualizations.



Introduction

From our previous report on Linear Regression:
In its most basic form, linear regression is a statistical technique that attempts to model the relationship between variables by fitting a linear equation. One variable is usually considered the target or dependent variable (the desired output), and the others are known as explanatory variables or features (the inputs).

In the last report we used the method of least squares, one of the most common methods for estimating regression coefficients. Let's extend this method to what is commonly called Ridge Regression.

Ridge Regression Method

Given some data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, if we want to find the linear relationship between $x$ and $y$ (i.e. $y \approx x^T \beta$), our linear regression formulation of the method is:

$$\text{minimize:} \quad \frac{1}{n} \sum_{i=1}^{n} (x_i^T \beta - y_i)^2$$

Ridge Regression takes this a bit further by adding a penalty term on the size of the coefficients.
$$\text{minimize:} \quad \frac{1}{n} \sum_{i=1}^{n} (x_i^T \beta - y_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

where $p$ is the number of features and $\lambda \ge 0$ controls the strength of the penalty.
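To make the objective concrete, here is a minimal NumPy sketch (with randomly generated toy data, purely for illustration) that solves it in closed form: setting the gradient of the averaged objective to zero gives $\beta = (X^T X + n\lambda I)^{-1} X^T y$.

import numpy as np

# Toy data, purely for illustration: n samples, p features
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

lam = 0.75  # regularization strength (lambda)

# Closed-form minimizer of (1/n)||X b - y||^2 + lam * ||b||^2:
#   b = (X^T X + n * lam * I)^(-1) X^T y
beta = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

mse_term = np.mean((X @ beta - y) ** 2)   # (1/n) * sum of squared residuals
penalty_term = lam * np.sum(beta ** 2)    # lam * sum of squared coefficients
print(beta, mse_term + penalty_term)

Note that scikit-learn's Ridge minimizes the un-averaged sum of squared errors plus alpha times the squared coefficient norm, so its alpha is not numerically identical to the $\lambda$ in the averaged formulation above.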


Motivation

Why do we add another term for this?
Simply put: when the explanatory variables (features) are highly correlated with each other, plain least squares regression becomes unreliable. The estimated coefficients can grow very large and swing wildly from sample to sample, i.e. the model has high statistical variance. One approach is to simply remove the variables that cause multicollinearity, but more often than not, as a Data Scientist or Machine Learning Engineer, you won't have that freedom. That's where the penalty term comes in: it penalizes large coefficients, shrinking them toward zero and making the model far more robust to multicollinearity.
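As a quick illustration (a sketch with synthetic data, not from the original report), the snippet below builds two nearly duplicate features and compares plain least squares with Ridge. The exact numbers will vary, but the ridge coefficients are typically much smaller and more stable.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # often large and mutually offsetting
print("Ridge coefficients:", ridge.coef_)  # shrunk toward smaller, similar values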

Using sklearn

scikit-learn makes it incredibly easy to use Ridge Regression. The Ridge estimator is available in the sklearn.linear_model submodule and has a very simple API.
from sklearn.linear_model import Ridge

x, y = get_dataset()           # placeholder: load your features and targets here
model = Ridge(alpha=0.75)      # alpha controls the regularization strength
model.fit(x, y)
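Once fitted, the model behaves like any other scikit-learn estimator, so the usual inspection and prediction calls apply (a quick sketch, assuming the x and y loaded above):

print(model.coef_, model.intercept_)   # learned coefficients and intercept
print(model.score(x, y))               # R^2 on the training data
predictions = model.predict(x)         # predictions for the given inputs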
It's even easier to plot the learning curve using Weights & Biases. For instance, the graph below was plotted using the following line:
wandb.sklearn.plot_learning_curve(model, x, y)
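The W&B plotting utilities log to an active run, so in practice the call sits inside an initialized run. A minimal sketch (the project name here is just an example, and get_dataset() is the placeholder from above):

import wandb
from sklearn.linear_model import Ridge

wandb.init(project="ridge-regression")   # example project name

x, y = get_dataset()                      # placeholder dataset loader
model = Ridge(alpha=0.75)
model.fit(x, y)

wandb.sklearn.plot_learning_curve(model, x, y)
wandb.finish()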

[W&B panel: learning curve for the run set of 3 runs]


Summary

In this article, you saw how Ridge Regression is a simple extension of Linear Regression, how to use scikit-learn to train a Ridge Regression model, and how to plot valuable metrics and data using a suite of W&B tools.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and from-scratch code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
