
Confusion Matrix: Usage and Examples

In this article, we review the usage and examples for a multi-class confusion matrix using Weights & Biases.
Created on November 5 | Last edited on November 16

Method: wandb.plot.confusion_matrix()

Log a multi-class confusion matrix in one line:
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=ground_truth,
    preds=predictions,
    class_names=class_names)})
You can log this wherever your code has access to:
  • a model's predicted labels on a set of examples (preds) or the normalized probability scores (probs). The probabilities must have the shape (number of examples, number of classes). You can supply either probabilities or predictions but not both.
  • the corresponding ground truth labels for those examples (y_true)
  • a full list of the labels/class names as strings (class_names, e.g. class_names=["cat", "dog", "bird"] if index 0 means cat, 1 means dog, 2 means bird, etc.)
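As a minimal sketch of these requirements (the arrays and class names below are invented, not from a real run), you can verify the expected shapes with numpy before logging, and derive preds from probs if you prefer to pass hard labels:

```python
import numpy as np

# Illustrative data: 6 examples, 3 classes.
class_names = ["cat", "dog", "bird"]
ground_truth = np.array([0, 1, 2, 1, 0, 2])  # integer class ids

# Normalized probability scores, shape (number of examples, number of classes).
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.6, 0.1],
                  [0.9, 0.05, 0.05],
                  [0.2, 0.2, 0.6]])

assert probs.shape == (len(ground_truth), len(class_names))

# To pass preds instead of probs, take the argmax per example.
predictions = probs.argmax(axis=1)

# Inside a wandb run you would then log one of these (but not both):
# wandb.log({"conf_mat": wandb.plot.confusion_matrix(
#     probs=probs, y_true=ground_truth, class_names=class_names)})
# wandb.log({"conf_mat": wandb.plot.confusion_matrix(
#     preds=predictions, y_true=ground_truth, class_names=class_names)})
```

The actual logging calls are commented out since they require an active wandb.init() session.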

Try it yourself via Colab →

Basic Usage

In this toy example, I fine-tune a CNN to predict one of 10 classes of living things in a photo (plants, animals, insects, etc.) while varying the number of training epochs (E, or pretrain_epochs) and the number of training examples (NT, or num_train). I log a confusion matrix at each validation step, after each training epoch, so only the final confusion matrix from the end of training is visualized.

Powerful Interactions

In this confusion matrix chart, you can
  • easily review the relative performance of each model at a glance
  • focus on particular models by toggling the eye symbol next to each run in the table below to show/hide that run
  • hover for details: hold your mouse over the different bars in each cell to see the exact count for a given model in a given cell
  • filter to a subset of classes using the gear icon in the top right corner of the chart. This lets you type comma-separated class names (e.g. Amphibia,Reptilia) to zoom in on a particular subset of classes
  • normalize the counts using the gear icon to toggle between raw counts and probabilities in each cell of the confusion matrix. This is especially convenient for comparing model performance across different dataset sizes.
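Normalizing simply divides each cell by its row total (the number of true examples for that class), turning raw counts into per-class probabilities. A minimal numpy sketch of what the toggle computes, using a made-up 3-class matrix:

```python
import numpy as np

# Hypothetical raw confusion matrix: rows = true class, columns = predicted class.
counts = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 4, 16]], dtype=float)

# Divide each row by its total so every row sums to 1.0; this makes
# models trained on different dataset sizes directly comparable.
row_totals = counts.sum(axis=1, keepdims=True)
normalized = counts / row_totals

print(normalized[0])  # first row: [0.8, 0.2, 0.0]
```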

Observations on This Matrix

Runs colored closer to blue/violet above correspond to more training examples/more epochs, and these generally show stronger performance along the diagonal, compared to runs colored closer to red. Mollusks classified as just "animals" and Amphibians classified as Reptiles are the two most common mistakes of the largest model (trained on 10,000 examples for 10 epochs). It's interesting to see the blue "NT 1000, E 10" model outperform this largest violet "NT 10000, E 10" model in several diagonal cells, even with 10 times less data—perhaps due to overfitting.


[Chart: "Vary num train and num epochs" — 9 runs]


Logging Details

In my validation step, I have access to val_data and the corresponding val_labels for all my validation examples, as well as my full list of possible labels for the model: all_labels=["Amphibia", "Animalia", ... "Reptilia"] (so an integer class label of 0 means Amphibia, 1 means Animalia, ... and 9 means Reptilia). Referencing the model trained so far, I call the following in my validation callback.
val_predictions = model.predict(val_data)
top_pred_ids = val_predictions.argmax(axis=1)
ground_truth_ids = val_labels.argmax(axis=1)
wandb.log({"my_conf_mat_id": wandb.plot.confusion_matrix(
    preds=top_pred_ids,
    y_true=ground_truth_ids,
    class_names=all_labels)})
This creates a confusion matrix and logs it to the "Custom Charts" section of my Workspace, under the specified key my_conf_mat_id. Keep this key fixed to display multiple runs on the same confusion matrix.
Note: I explicitly take the argmax of the prediction scores to return the class ids of the top predictions (highest confidence score) across the images: one per image. While this is the most common scenario for a confusion matrix, the W&B implementation allows for other ways of computing the relevant prediction class id to log. For example, you could use an embedding or distance function to find the most likely class, or you could easily account for top-N accuracy across many classes. You could also log precomputed probabilities via the probs argument, making sure these have the shape (number of examples, number of classes).
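To make the note above concrete, here is a hedged numpy sketch (the scores are invented) of the standard argmax path alongside one possible top-N variant, where an example is credited to its true class whenever the true label appears among the model's N highest-scoring classes:

```python
import numpy as np

# Invented prediction scores for 4 examples over 5 classes.
val_predictions = np.array([[0.10, 0.50, 0.20, 0.10, 0.10],
                            [0.30, 0.10, 0.40, 0.10, 0.10],
                            [0.20, 0.20, 0.20, 0.30, 0.10],
                            [0.05, 0.05, 0.10, 0.20, 0.60]])
true_ids = np.array([1, 0, 3, 4])

# Standard path: one top prediction per example.
top_pred_ids = val_predictions.argmax(axis=1)

# Top-N variant: credit the true class if it is among the N best scores,
# otherwise fall back to the argmax. This is one way to fold top-N
# accuracy into the same confusion-matrix logging call.
N = 2
top_n = np.argsort(val_predictions, axis=1)[:, -N:]
in_top_n = (top_n == true_ids[:, None]).any(axis=1)
preds_top_n = np.where(in_top_n, true_ids, top_pred_ids)
```

Either array of class ids can then be passed as preds to wandb.plot.confusion_matrix().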
See the API definition for wandb.plot.confusion_matrix()→

Customize As You Wish

By editing the Vega spec, you can adjust various aspects of the chart using the Vega visualization grammar. For example, below I try two different color palettes for the model variants, and focus the confusion matrix on different subsets of classes using the gear pop-up menu in the top right corner. You can try this yourself in the last chart of this report by
  • clicking on the gear icon in the top right of the chart
  • checking the box to enable class filtering
  • typing in one or more comma-separated class names (the full list: Amphibia, Animalia, Arachnida, Aves, Fungi, Insecta, Mammalia, Mollusca, Plantae, Reptilia)
  • clicking the "X" when you're done
    



[Chart: "Vary num train and num epochs" — 6 runs]


Customization Details

Change the Color Scheme

  • Open the Vega spec for a Confusion Matrix chart in your report or workspace by hovering over the top right corner of the panel and clicking on the "edit" pencil icon
  • Click on the "Edit" button to edit the default Confusion Matrix preset
  • You will see the full Vega spec for the preset. Make the following changes:
    • on line 40, change "range": {"data": "wandb", "field": "color"} to "range": {"scheme": "plasma"} (or "rainbow", "viridis", or any Vega color scheme option)
    • on line 251, change "fill": {"field": "color"} to "fill": {"scale": "colorScale", "field": "name"}

Zoom In on a Subset of Classes

  • Open the Vega spec to "edit" as above.
  • On line 67, change "value": false to "value": true
  • On line 75, type your class selection into "value", e.g. "value": "Animalia,Plantae"

Save Your Changes

  • If you'd like to save this custom version for future use, click "Save as" in the top left and give your new preset a memorable descriptive name. You will now be able to use it across projects under your username. Otherwise, click on "Apply custom panel visualization" in the bottom right to apply changes to the current chart instance only.

One Last Comparison

This toy model tends to over-predict plants and insects even when training on the full dataset. You can try your own experiments with a custom confusion matrix in this Colab. Please ask any questions & let me know how it goes in the comments below!

[Panel: "Toy CNN runs" — 4 runs]

Axel Jacobsen •  
Howdy! I wonder if there is a way to construct a visualization from an already-computed confusion matrix? Thanks! Axel
1 reply
Alberto Presta •  
How can I put a title to the confusion matrix?
Reply
James Bonello •
I seem to only be able to get the confusion matrix for the last step rather than for all the steps (all epochs). How can I use the log() method to aggregate the counts for all steps like the way we use it to log loss and such?
1 reply
Sayak Paul •  
Fascinating feature. Is it possible to look through the confusion matrices epoch-wise?
1 reply
Frankie Robertson •  
Is there any way to look at the confusion matrix results historically across a single training session?
1 reply
Ayush Thakur •  
Hey Stacey, this is a great feature. This is going to make classifier comparison so easy. Icing on the cake is the ability to filter to a subset of classes. A great addition to this feature would be the ability to do a few computations in the backend. For example, if I want to visualize the confusion matrix for the top 10 confused classes. This can be helpful especially for classifiers with 100 classes. Also, this might not be a good place to ask, but what do you think is the best practice for using a color-based chart like this as a partially color-blind individual? How can the existing feature be minimally modified to support such users?
1 reply