
Confusion Matrix: Usage and Examples

In this article, we review the usage and examples for a multi-class confusion matrix using Weights & Biases.
Created on November 5 | Last edited on November 16

Method: wandb.plot.confusion_matrix()

Log a multi-class confusion matrix in one line:
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=ground_truth,
    preds=predictions,
    class_names=class_names)})
You can log this wherever your code has access to:
  • a model's predicted labels on a set of examples (preds) or the normalized probability scores (probs). The probabilities must have the shape (number of examples, number of classes). You can supply either probabilities or predictions but not both.
  • the corresponding ground truth labels for those examples (y_true)
  • a full list of the labels/class names as strings (class_names, e.g. class_names=["cat", "dog", "bird"] if index 0 means cat, 1 means dog, 2 means bird, etc.)
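As a minimal sketch of these requirements (the arrays and class names below are invented, not from a real run), you can verify the expected shapes with numpy before logging, and derive preds from probs if you prefer to pass hard labels:

```python
import numpy as np

# Illustrative data: 6 examples, 3 classes.
class_names = ["cat", "dog", "bird"]
ground_truth = np.array([0, 1, 2, 1, 0, 2])  # integer class ids

# Normalized probability scores, shape (number of examples, number of classes).
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.6, 0.1],
                  [0.9, 0.05, 0.05],
                  [0.2, 0.2, 0.6]])

assert probs.shape == (len(ground_truth), len(class_names))

# To pass preds instead of probs, take the argmax per example.
predictions = probs.argmax(axis=1)

# Inside a wandb run you would then log one of these (but not both):
# wandb.log({"conf_mat": wandb.plot.confusion_matrix(
#     probs=probs, y_true=ground_truth, class_names=class_names)})
# wandb.log({"conf_mat": wandb.plot.confusion_matrix(
#     preds=predictions, y_true=ground_truth, class_names=class_names)})
```

The actual logging calls are commented out since they require an active wandb.init() session.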

Try it yourself via Colab →

Basic Usage

In this toy example, I fine-tune a CNN to predict one of 10 classes of living things in a photo (plants, animals, insects, etc.) while varying the number of training epochs (E, or pretrain_epochs) and the number of training examples (NT, or num_train). I log a confusion matrix at each validation step, after each training epoch, so only the final confusion matrix from the end of training is visualized.

Powerful Interactions

In this confusion matrix chart, you can
  • easily review the relative performance of each model at a glance
  • focus on particular models by toggling the eye symbol next to each run in the table below to show/hide that run
  • hover for details: hold your mouse over the different bars in each cell to see the exact count for a given model in a given cell
  • filter to a subset of classes using the gear icon in the top right corner of the chart. This lets you type comma-separated class names (e.g. Amphibia,Reptilia) to zoom in on a particular subset of classes
  • normalize the counts using the gear icon to toggle between raw counts and probabilities in each cell of the confusion matrix. This is especially convenient for comparing model performance across different dataset sizes.
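Normalizing simply divides each cell by its row total (the number of true examples for that class), turning raw counts into per-class probabilities. A minimal numpy sketch of what the toggle computes, using a made-up 3-class matrix:

```python
import numpy as np

# Hypothetical raw confusion matrix: rows = true class, columns = predicted class.
counts = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 4, 16]], dtype=float)

# Divide each row by its total so every row sums to 1.0; this makes
# models trained on different dataset sizes directly comparable.
row_totals = counts.sum(axis=1, keepdims=True)
normalized = counts / row_totals

print(normalized[0])  # first row: [0.8, 0.2, 0.0]
```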

Observations on This Matrix

Runs colored closer to blue/violet above correspond to more training examples/more epochs, and these generally show stronger performance along the diagonal, compared to runs colored closer to red. Mollusks classified as just "animals" and Amphibians classified as Reptiles are the two most common mistakes of the largest model (trained on 10,000 examples for 10 epochs). It's interesting to see the blue "NT 1000, E 10" model outperform this largest violet "NT 10000, E 10" model in several diagonal cells, even with 10 times less data—perhaps due to overfitting.


[Chart: "Vary num train and num epochs" — 9 runs]


Logging Details

In my validation step, I have access to val_data and the corresponding val_labels for all my validation examples, as well as my full list of possible labels for the model: all_labels=["Amphibia", "Animalia", ... "Reptilia"] (so an integer class label of 0 means Amphibia, 1 means Animalia, ... and 9 means Reptilia). Referencing the model trained so far, I call the following in my validation callback.
val_predictions = model.predict(val_data)
top_pred_ids = val_predictions.argmax(axis=1)
ground_truth_ids = val_labels.argmax(axis=1)
wandb.log({"my_conf_mat_id": wandb.plot.confusion_matrix(
    preds=top_pred_ids,
    y_true=ground_truth_ids,
    class_names=all_labels)})
This creates a confusion matrix and logs it to the "Custom Charts" section of my Workspace, under the specified key my_conf_mat_id. Keep this key fixed to display multiple runs on the same confusion matrix.
Note: I explicitly take the argmax of the prediction scores to return the class ids of the top predictions (highest confidence score) across the images: one per image. While this is the most common scenario for a confusion matrix, the W&B implementation allows for other ways of computing the relevant prediction class id to log. For example, you could use an embedding or distance function to find the most likely class, or you could easily account for top-N accuracy across many classes. You could also log precomputed probabilities via the probs argument, making sure these have the shape (number of examples, number of classes).
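To make the note above concrete, here is a hedged numpy sketch (the scores are invented) of the standard argmax path alongside one possible top-N variant, where an example is credited to its true class whenever the true label appears among the model's N highest-scoring classes:

```python
import numpy as np

# Invented prediction scores for 4 examples over 5 classes.
val_predictions = np.array([[0.10, 0.50, 0.20, 0.10, 0.10],
                            [0.30, 0.10, 0.40, 0.10, 0.10],
                            [0.20, 0.20, 0.20, 0.30, 0.10],
                            [0.05, 0.05, 0.10, 0.20, 0.60]])
true_ids = np.array([1, 0, 3, 4])

# Standard path: one top prediction per example.
top_pred_ids = val_predictions.argmax(axis=1)

# Top-N variant: credit the true class if it is among the N best scores,
# otherwise fall back to the argmax. This is one way to fold top-N
# accuracy into the same confusion-matrix logging call.
N = 2
top_n = np.argsort(val_predictions, axis=1)[:, -N:]
in_top_n = (top_n == true_ids[:, None]).any(axis=1)
preds_top_n = np.where(in_top_n, true_ids, top_pred_ids)
```

Either array of class ids can then be passed as preds to wandb.plot.confusion_matrix().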
See the API definition for wandb.plot.confusion_matrix()→

Customize As You Wish

By editing the Vega spec, you can adjust various aspects of the chart using the Vega visualization grammar. For example, below I try two different color palettes for the model variants, and focus the confusion matrix on different subsets of classes using the gear pop-up menu in the top right corner. You can try this yourself in the last chart of this report by
  • clicking on the gear icon in the top right of the chart
  • checking the box to enable class filtering
  • typing in one or more comma-separated class names (the full list: Amphibia, Animalia, Arachnida, Aves, Fungi, Insecta, Mammalia, Mollusca, Plantae, Reptilia)
  • clicking the "X" when you're done
    



[Chart: "Vary num train and num epochs" — 6 runs]


Customization Details

Change the Color Scheme

  • Open the Vega spec for a Confusion Matrix chart in your report or workspace by hovering over the top right corner of the panel and clicking on the "edit" pencil icon
  • Click on the "Edit" button to edit the default Confusion Matrix preset
  • You will see the full Vega spec for the preset. Make the following changes:
    • on line 40, change "range": {"data": "wandb", "field": "color"} to "range": {"scheme": "plasma"} (or "rainbow", "viridis", or any Vega color scheme option)
    • on line 251, change "fill": {"field": "color"} to "fill": {"scale": "colorScale", "field": "name"}

Zoom In on a Subset of Classes

  • Open the Vega spec to "edit" as above.
  • On line 67, change "value": false to "value": true
  • On line 75, type your class selection into "value", e.g. "value": "Animalia,Plantae"

Save Your Changes

  • If you'd like to save this custom version for future use, click "Save as" in the top left and give your new preset a memorable descriptive name. You will now be able to use it across projects under your username. Otherwise, click on "Apply custom panel visualization" in the bottom right to apply changes to the current chart instance only.

One Last Comparison

This toy model tends to over-predict plants and insects even when training on the full dataset. You can try your own experiments with a custom confusion matrix in this Colab. Please ask any questions & let me know how it goes in the comments below!

[Panel: "Toy CNN runs" — 4 runs]

Axel Jacobsen •  
Howdy! I wonder if there is a way to construct a visualization from an already-computed confusion matrix? Thanks! Axel
1 reply
Alberto Presta •  
How can I put a title to the confusion matrix?
Reply
James Bonello •
I seem to only be able to get the confusion matrix for the last step rather than for all the steps (all epochs). How can I use the log() method to aggregate the counts for all steps like the way we use it to log loss and such?
1 reply
Sayak Paul •  
Fascinating feature. Is it possible to look through the confusion matrices epoch-wise?
1 reply
Frankie Robertson •  
Is there any way to look at the confusion matrix results historically across a single training session?
1 reply
Ayush Thakur •  
Hey Stacey, this is a great feature. This is going to make classifier comparison so easy. Icing on the cake is the ability to filter to a subset of classes. A great addition to this feature would be the ability to do a few computations in the backend. For example, if I want to visualize the confusion matrix for the top 10 confused classes. This can be helpful especially for classifiers with 100 classes. Also, this might not be a good place to ask, but what do you think is the best practice for using a color-based chart like this as a partially color-blind individual? How can the existing feature be minimally modified to support such users?
1 reply