Visualize Data for Image Classification

Version and interactively explore data and predictions across train/val/test

Overview

This is a walkthrough of dataset and prediction visualization (DSViz) and Artifacts for image classification on W&B. Here I fine-tune a convnet in Keras on 10,000 photos from iNaturalist 2017 to identify 10 classes of living things (plants, insects, birds, etc.).

Follow along in this Colab →

Train on versioned data and visualize predictions

  1. Upload raw data
  2. Create a balanced split (train/val/test)
  3. Train model and validate predictions
  4. Run inference & explore results

Project workflow

Here is the artifacts graph for this project, with datasets, predictions, and models connected by the runs that create and consume them.




1. Upload raw data

My raw data for this project contains 10,000 images organized into 10 subfolders. The name of each subfolder is the ground-truth label for the images it contains (Amphibia, Animalia, Arachnida...Reptilia). With Artifacts, I can upload my full raw dataset and automatically track and version all the different ways I may subsequently decide to generate my train/val/test splits (how many items per split or per class, balanced or unbalanced, whether to hold out a test set, etc.).
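A minimal sketch of this upload step follows. The project name, artifact name, and folder path are illustrative, but the layout matches the one-subfolder-per-label structure described above:

```python
import wandb

# Log the raw dataset folder as a versioned artifact.
# "dsviz-nature", "inaturalist-raw", and "raw_images/" are illustrative names.
run = wandb.init(project="dsviz-nature", job_type="upload")
raw_data = wandb.Artifact("inaturalist-raw", type="dataset")
raw_data.add_dir("raw_images/")  # 10 subfolders, one per ground-truth label
run.log_artifact(raw_data)
run.finish()
```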



2. Create a balanced split

Verify data distribution: Group by "split"

Confirm the data distribution across labels for each split. Here I have 400 images for each label in train, 50 in val, and 50 in test.
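Here is one way such a split might be created and logged, a sketch that consumes the raw artifact from step 1 and builds a browsable table. The counts match the 400/50/50 figures above; all names are illustrative:

```python
import os
import random
import wandb

PER_LABEL = {"train": 400, "val": 50, "test": 50}  # counts from above

run = wandb.init(project="dsviz-nature", job_type="split")
raw_dir = run.use_artifact("inaturalist-raw:latest").download()

split_data = wandb.Artifact("inaturalist-split", type="dataset")
data_table = wandb.Table(columns=["image", "label", "split"])

for label in sorted(os.listdir(raw_dir)):
    files = sorted(os.listdir(os.path.join(raw_dir, label)))
    random.shuffle(files)  # balanced but random assignment within each label
    start = 0
    for split, count in PER_LABEL.items():
        for fname in files[start:start + count]:
            path = os.path.join(raw_dir, label, fname)
            data_table.add_data(wandb.Image(path), label, split)
        start += count

split_data.add(data_table, "data_table")  # renders interactively in DSViz
run.log_artifact(split_data)
run.finish()
```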



Preview the images: Group by "label"

Group by "label" to see all the images by their true class. You can scroll horizontally through the images in each cell using the arrows.




3. Train model and validate predictions

Check the predictions: Group by "truth"

See the distribution of predictions and scores for a given correct class. Interestingly, in this example many of the "Animalia" in the second row are sea creatures, which are easily confused with mollusks.
  • sort by "truth" (true label name) to alphabetize the table
  • group by "truth" to see a distribution of guesses for each true label
Sort and group actions are available for each relevant column. Hover over the column header, click on the three-dot menu on the right, and select an action.
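A sketch of how such a predictions table might be logged from Keras follows. It assumes a trained `model` and validation data (`val_images`, `val_paths`, `val_labels`) from the training code; the column names (truth, guess, score_<label>) mirror the ones used in this report, and the exact class list is an assumption based on the labels mentioned here:

```python
import numpy as np
import wandb

# Assumed 10 iNaturalist supercategories; order must match the model outputs.
CLASSES = ["Amphibia", "Animalia", "Arachnida", "Aves", "Fungi",
           "Insecta", "Mammalia", "Mollusca", "Plantae", "Reptilia"]

run = wandb.init(project="dsviz-nature", job_type="evaluate")
columns = ["image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
preds = wandb.Table(columns=columns)

# model, val_images, val_paths, val_labels are assumed from the training code.
probs = model.predict(val_images)  # shape (n_examples, 10), softmax scores
for path, truth, p in zip(val_paths, val_labels, probs):
    guess = CLASSES[int(np.argmax(p))]
    preds.add_data(wandb.Image(path), truth, guess, *p.tolist())

val_art = wandb.Artifact("val-predictions", type="predictions")
val_art.add(preds, "val_preds")
run.log_artifact(val_art)
run.finish()
```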


Dive into the confusing classes: Group by "guess"

When the model guesses a particular class, what is the distribution of true labels for those guesses? Here we can see that "Mollusca" is a popular confound for "Animalia" (second row, "truth" column).
  • sort by "guess" to alphabetize; group by "guess" to see distributions of true labels


Focus on a subset: Filter by true class

Let's look at just the mollusks, animals, and insects. Crustaceans and slugs are especially confusing because of the context: the model may be picking up on common backgrounds (underwater, tide pools, grass) or hands (frequently holding the smaller creatures).
  • click the "Filter" button to enter a query selecting the true labels of interest. You can type and see hints in the dropdown to formulate an expression like x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"
  • sort and group by "truth"
  • edit the first three score columns via the header menu to rearrange them and prioritize the score columns for the classes you've just selected (Animalia, Mollusca, and Insecta)


Focus on the confounds: Filter by "guess"

Let's look at only images with a true label of Amphibia which were misclassified.
  • filter query: x["truth"] = "Amphibia" and x["guess"] != "Amphibia"
  • then sort by score_Amphibia to show the closest guesses/worst errors
  • or group by "guess" to look for systematic errors per class


Compare across model versions

Here I compare two versions of my model, one trained for a single epoch (left) and one trained for five epochs (right). You can see that performance and confidence generally improve with more training.
To compare versions from any model artifact view:
  • Using the left sidebar, select one version of the model, then hover over another version and click "Compare" to select it. Then change the dropdown on the right of the table name from WbTableFile to Split Panel -> WbTableFile
  • sort by "truth", group by "truth" as before—now you can compare the prediction distributions across the two models
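For reference, the two checkpoints being compared could be logged as versions of a single model artifact, roughly like this sketch (the artifact name is illustrative, and `build_model`, `train_ds`, and `val_ds` are assumed from the training code):

```python
import wandb

# Each run that logs an artifact under the same name creates a new version
# (v0, v1, ...), which is what populates the version sidebar for comparison.
for n_epochs in (1, 5):
    run = wandb.init(project="dsviz-nature", job_type="train")
    model = build_model()  # assumed Keras model factory
    model.fit(train_ds, validation_data=val_ds, epochs=n_epochs)
    model.save("model.keras")
    model_art = wandb.Artifact("nature-convnet", type="model")
    model_art.add_file("model.keras")
    run.log_artifact(model_art)
    run.finish()
```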



Focus on confusing classes: Filter to misclassified images

After 1 epoch, the model misclassifies 30 images. After 5 epochs, this drops to 5!
Filter query: x["truth"] = "Plantae" and x["guess"] != "Plantae"

Filter by specific guess and sort by the guessed score for more detail:
Filter query: x["guess"] = "Reptilia" and x["truth"] != "Reptilia" and sort by "score_Reptilia"


Focus on score distributions across models

If you log an id column for the validation images, you can compare score distributions for each class over the course of training. From the default comparison view, group by "truth". In these histograms, 0 (blue) shows the predictions after 1 epoch and 1 (orange) after 3 epochs. With more training, high-confidence scores become more frequent (taller orange bars at the right of the prediction score range). Animalia is the largest and most confusing category; after three epochs, the model makes different (and slightly fewer) mistakes, as shown by some smaller and some entirely new orange bars.
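Concretely, the id can just be added as the first column of the predictions table from step 3. This continues the evaluation sketch above (same assumed variables); using the image filename as the id is one simple choice:

```python
import os

# Same predictions table as before, plus a stable per-image "id" column so
# W&B can line up the same example across logged table versions.
columns = ["id", "image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
preds = wandb.Table(columns=columns)
for path, truth, p in zip(val_paths, val_labels, probs):
    guess = CLASSES[int(np.argmax(p))]
    preds.add_data(os.path.basename(path), wandb.Image(path),
                   truth, guess, *p.tolist())
```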




4. Run inference and explore results

Test predictions in a split view table: 1 epoch (left) vs 5 epochs (right)

Here I compare two versions of the same model trained for one (left) versus five (right) epochs on the same set of test images. In many scenarios, we won't have labels for test data, but we do here for illustration purposes.
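The inference run might look like this sketch, loading a model version from the artifact logged during training and writing a test predictions table (names and variables follow the earlier sketches; `test_images`, `test_paths`, and `test_labels` are assumed to come from the "test" split):

```python
import os
import numpy as np
import tensorflow as tf
import wandb

run = wandb.init(project="dsviz-nature", job_type="inference")
model_dir = run.use_artifact("nature-convnet:v1").download()  # pick a version
model = tf.keras.models.load_model(os.path.join(model_dir, "model.keras"))

# test_images, test_paths, test_labels are assumed from the "test" split.
probs = model.predict(test_images)
columns = ["id", "image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
test_table = wandb.Table(columns=columns)
for path, truth, p in zip(test_paths, test_labels, probs):
    test_table.add_data(os.path.basename(path), wandb.Image(path),
                        truth, CLASSES[int(np.argmax(p))], *p.tolist())

test_art = wandb.Artifact("test-predictions", type="predictions")
test_art.add(test_table, "test_preds")
run.log_artifact(test_art)
run.finish()
```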

Compare across the same images

Sort and group by truth as before. Note that the model does tend to improve on the right, with more peaked guess distributions.


Filter to a subset of classes

With more epochs, there is less confusion among mollusks, animals, and insects.

  • Filter query: x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"

  • Sort and group by truth


See most confused classes: Group by guess

After longer training, the model generally makes fewer mistakes. However, the "Fungi" mistakes on the right (right column, fourth row down) are new and interesting—perhaps an effect of the background or overall shape?

Compare across individual images

See how the class predictions for the same images change with more training—generally more confident on the right.


Focus on top confused images for a particular (truth, guess) pair

Optionally filter by labels, then sort by score.




Interesting finds

Context is everything

Here are some Plantae from the dataset which the models (L: 1 epoch, R: 5 epochs) failed to identify as plants. It looks like the background/visual context of the living thing might influence the prediction. In the last row on the left, the photos of field/forest scenes are more canonical for images of mammals. In the first row on the right, the bare earth is more typical context for Fungi photos. And the pitcher plant in the bottom right, guessed as a reptile, definitely fooled me.


False positives as evolutionary advantage

The "eyespot" patterns on the butterfly in the third row image look amazingly like the face of a reptile or snake (you may need to zoom in on this page to see the nostril spots). There's some evidence that such coloration evolved to discourage predators. Note that the confidence scores for the last image are almost evenly split across Insecta, Reptilia, Amphibia, and even Arachnida (a bit lower for this last one). If we wanted to ship this model to production, adding a minimum score threshold for detecting any particular class would filter out particularly confusing images like this one. best butterfly frog.png

Add your own?

If you find any interesting insights or patterns in our interactive example, please comment below—we'd love to see them!