
Visualize Scikit-Learn Models with Weights & Biases

This article explores how to visualize the performance of your scikit-learn model with just a few lines of code using Weights & Biases.
In this article, I'll show you how to visualize your scikit-learn model's performance with just a few lines of code. We'll also explore how each of these plots helps us understand our model better.
Creating these plots is simple.

Step 1: Import Weights & Biases and initialize a new run.

import wandb
wandb.init(project="visualize-sklearn")

Step 2: Visualize individual plots.

# Visualize single plot
wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels)

Or visualize all plots at once:

# Visualize all the plots in the Classification section below with one line of code
wandb.sklearn.plot_classifier(clf, X_train, X_test, y_train, y_test, y_pred, y_probas, labels,
model_name='SVC', feature_names=None)

# Visualize all the plots in the Regression section below with one line of code
wandb.sklearn.plot_regressor(reg, X_train, X_test, y_train, y_test, model_name='Ridge')

# Visualize all the plots in the Clustering section below with one line of code
wandb.sklearn.plot_clusterer(kmeans, X_train, cluster_labels, labels=None, model_name='KMeans')

If you have any questions, we'd love to answer them in our Slack community.

Classification

The Dataset

In this report, I trained several models on the Titanic dataset, which describes the passengers aboard the Titanic. Our goal is to predict whether a given passenger survived.

Learning Curve

Trains the model on datasets of varying sizes and generates a plot of cross-validated scores vs. dataset size, for both training and test sets.
Here we can observe that our model is overfitting. While it performs well on the training set right off the bat, the test accuracy gradually improves but never quite achieves parity with the training accuracy.
Example
wandb.sklearn.plot_learning_curve(model, X, y)
  • model (clf or reg): Takes in a fitted regressor or classifier.
  • X (arr): Dataset features.
  • y (arr): Dataset labels.
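
For instance, here's a minimal end-to-end sketch (the dataset and model are stand-in choices; it assumes wandb has been imported and a run initialized as in Step 1):

# Stand-in example: any fitted classifier or regressor works here
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier().fit(X, y)
wandb.sklearn.plot_learning_curve(clf, X, y)  # assumes wandb.init() has already run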

ROC Curve

ROC curves plot true positive rate (y-axis) vs false positive rate (x-axis). The ideal score is a TPR = 1 and FPR = 0, which is the point on the top left. Typically we calculate the area under the ROC curve (AUC-ROC), and the greater the AUC-ROC the better.
Here we can see our model is slightly better at predicting the class Survived, as evidenced by the larger AUC-ROC.
Example
wandb.sklearn.plot_roc(y_true, y_probas, labels)
  • y_true (arr): Test set labels.
  • y_probas (arr): Test set predicted probabilities.
  • labels (list): Named labels for target variable (y).
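
To make the y_probas argument concrete, here's a minimal sketch (the dataset, model, and label names are stand-ins; wandb.init() from Step 1 is assumed):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_probas = clf.predict_proba(X_test)  # one probability column per class
wandb.sklearn.plot_roc(y_test, y_probas, ['malignant', 'benign'])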

Class Proportions

Plots the distribution of target classes in training and test sets. Useful for detecting imbalanced classes and ensuring that one class doesn't have a disproportionate influence on the model.
Here we can see we have more examples of passengers who didn't survive than of those who survived. The training and test sets seem to share a similar distribution of target classes, which is great news for generalizing our model outputs.
Example
wandb.sklearn.plot_class_proportions(y_train, y_test, ['dog', 'cat', 'owl'])
  • y_train (arr): Training set labels.
  • y_test (arr): Test set labels.
  • labels (list): Named labels for target variable (y).
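
A minimal sketch (stand-in dataset and label names; a run is assumed to be initialized):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
wandb.sklearn.plot_class_proportions(y_train, y_test, ['malignant', 'benign'])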

Precision Recall Curve

Computes the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
High scores for both show that the classifier is returning accurate results (high precision) as well as returning a majority of all positive results (high recall). The PR curve is especially useful when the classes are very imbalanced.
Example
wandb.sklearn.plot_precision_recall(y_true, y_probas, labels)
  • y_true (arr): Test set labels.
  • y_probas (arr): Test set predicted probabilities.
  • labels (list): Named labels for target variable (y).
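
The setup mirrors the ROC curve sketch above; only the plotting call changes (again, the dataset and labels are stand-ins):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_probas = clf.predict_proba(X_test)
wandb.sklearn.plot_precision_recall(y_test, y_probas, ['malignant', 'benign'])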

Feature Importances

Evaluates and plots the importance of each feature for the classification task. Only works with classifiers that have a `feature_importances_` attribute, like trees.
Here we can see that `Title` (Miss, Mrs, Mr, Master) was highly indicative of who survived. This makes sense because `Title` simultaneously captures the gender, age and the social status of the passengers. It's curious that `name_length` was the second most predictive feature, and it might be interesting to dig into why that was the case.
Example
wandb.sklearn.plot_feature_importances(model, ['width', 'height', 'length'])
  • model (clf): Takes in a fitted classifier.
  • feature_names (list): Names for features. Makes plots easier to read by replacing feature indexes with corresponding names.
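
A minimal sketch with a tree-based model (the dataset is a stand-in; wandb.init() is assumed):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier().fit(data.data, data.target)  # exposes feature_importances_
wandb.sklearn.plot_feature_importances(clf, list(data.feature_names))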

Calibration Curve

Plots how well-calibrated the predicted probabilities of a classifier are, and how to calibrate an uncalibrated classifier. Compares the estimated predicted probabilities of a baseline logistic regression model, of the model passed in as an argument, and of its isotonic and sigmoid calibrations.
The closer the calibration curves are to the diagonal, the better. A transposed-sigmoid-like curve indicates an overfitted classifier, while a sigmoid-like curve indicates an underfitted one. By training isotonic and sigmoid calibrations of the model and comparing their curves, we can figure out whether the model is over- or underfitting, and if so, which calibration (sigmoid or isotonic) might help fix it.
For more details, check out sklearn's docs.
In this case, we can see that vanilla AdaBoost suffers from overfitting (as evidenced by the transposed sigmoid curve), potentially because of redundant features (like `title`) which violate the feature-independence assumption. Calibrating AdaBoost using sigmoid calibration seems to be most effective in fixing this overfitting.
Example
wandb.sklearn.plot_calibration_curve(clf, X, y, 'RandomForestClassifier')
  • model (clf): Takes in a fitted classifier.
  • X (arr): Training set features.
  • y (arr): Training set labels.
  • model_name (str): Model name. Defaults to 'Classifier'.
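
A minimal sketch, using naive Bayes as a stand-in model (its probabilities are often poorly calibrated, which makes the plot interesting; a run is assumed to be initialized):

from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
clf = GaussianNB().fit(X, y)
wandb.sklearn.plot_calibration_curve(clf, X, y, 'GaussianNB')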


Confusion Matrix

Computes the confusion matrix to evaluate the accuracy of a classification. It's useful for assessing the quality of model predictions and finding patterns in the predictions the model gets wrong. The diagonal represents the predictions the model got right, i.e. where the actual label is equal to the predicted label.
Example
wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels)
  • y_true (arr): Test set labels.
  • y_pred (arr): Test set predicted labels.
  • labels (list): Named labels for target variable (y).
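
A minimal sketch showing where y_pred comes from (stand-in dataset and model; wandb.init() is assumed):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)  # predicted labels, not probabilities
wandb.sklearn.plot_confusion_matrix(y_test, y_pred, ['setosa', 'versicolor', 'virginica'])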


Summary Metrics

Calculates summary metrics for both classification (e.g. f1, accuracy, precision, recall) and regression (e.g. mse, mae, r2 score) algorithms.
Example
wandb.sklearn.plot_summary_metrics(model, X_train, y_train, X_test, y_test)
  • model (clf or reg): Takes in a fitted regressor or classifier.
  • X (arr): Training set features.
  • y (arr): Training set labels.
  • X_test (arr): Test set features.
  • y_test (arr): Test set labels.
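
A minimal sketch (the model and dataset are stand-ins; a run is assumed to be initialized):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = SVC().fit(X_train, y_train)
wandb.sklearn.plot_summary_metrics(model, X_train, y_train, X_test, y_test)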

Clustering

Elbow Plot

Measures and plots the percentage of variance explained as a function of the number of clusters, along with training times. Useful in picking the optimal number of clusters.
Here we can see that the optimal number of clusters according to the elbow plot is 3, which is reflective of the dataset (which has 3 classes – Iris Setosa, Iris Versicolour, Iris Virginica).
Example
wandb.sklearn.plot_elbow_curve(model, X_train)
  • model (clusterer): Takes in a fitted clusterer.
  • X (arr): Training set features.
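
A minimal sketch on the Iris data mentioned above (the KMeans settings are stand-ins; wandb.init() is assumed):

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)  # ignore the labels; clustering is unsupervised
kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
wandb.sklearn.plot_elbow_curve(kmeans, X)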

Silhouette Plot

Measures & plots how close each point in one cluster is to points in the neighboring clusters. The thickness of the clusters corresponds to the cluster size. The vertical line represents the average silhouette score of all the points.
Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters.
A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters and negative values indicate that those samples might have been assigned to the wrong cluster.
In general, we want all silhouette cluster scores to be above average (past the red line) and as close to 1 as possible. We also prefer cluster sizes that reflect the underlying patterns in the data.
Example
wandb.sklearn.plot_silhouette(model, X_train, ['spam', 'not spam'])
  • model (clusterer): Takes in a fitted clusterer.
  • X (arr): Training set features.
  • cluster_labels (list): Names for cluster labels. Makes plots easier to read by replacing cluster indexes with corresponding names.
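
A minimal sketch (the cluster names are arbitrary stand-ins, since clusters have no inherent labels):

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
wandb.sklearn.plot_silhouette(kmeans, X, ['cluster A', 'cluster B', 'cluster C'])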

Regression

Outlier Candidates Plot

Measures each datapoint's influence on the regression model via Cook's distance. Instances with heavily skewed influences could potentially be outliers, making this useful for outlier detection.
Example
wandb.sklearn.plot_outlier_candidates(model, X, y)
  • model (regressor): Takes in a fitted regressor.
  • X (arr): Training set features.
  • y (arr): Training set labels.
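
A minimal sketch with a stand-in regression dataset (a run is assumed to be initialized):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
reg = Ridge().fit(X, y)
wandb.sklearn.plot_outlier_candidates(reg, X, y)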


Residuals Plot

Measures and plots the predicted target values (y-axis) vs the difference between actual and predicted target values (x-axis), as well as the distribution of the residual error.
Generally, the residuals of a well-fit model should be randomly distributed because good models will account for most phenomena in a data set, except for random error.
Here we can see most of the error made by our model is between +/-5, and is evenly distributed for both training and test datasets.
Example
wandb.sklearn.plot_residuals(model, X, y)
  • model (regressor): Takes in a fitted regressor.
  • X (arr): Training set features.
  • y (arr): Training set labels.
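
A minimal sketch, reusing the same stand-in regression setup as in the outlier candidates example:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
reg = Ridge().fit(X, y)
wandb.sklearn.plot_residuals(reg, X, y)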

Try it for yourself

Let's walk through a complete example.
!pip install wandb -qq
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
import pandas as pd
import wandb
wandb.init(project="sklearn")

# Load data
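# Note: load_boston was removed in scikit-learn 1.2; on newer versions,
# substitute another regression dataset (e.g. sklearn.datasets.load_diabetes)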
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model, get predictions
reg = Ridge()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Visualize all regression plots
wandb.sklearn.plot_regressor(reg, X_train, X_test, y_train, y_test, 'Ridge')

# Make individual plots
wandb.sklearn.plot_outlier_candidates(reg, X, y)

If you have any questions, we'd love to answer them in our Slack community.









