Visualize Data for Image Classification

Version and interactively explore data and predictions across train/val/test

Overview

This is a walkthrough of dataset and prediction visualization (DSViz) and Artifacts for image classification on W&B. Here I fine-tune a convnet in Keras on 10,000 photos from iNaturalist 2017 to identify 10 classes of living things (plants, insects, birds, etc.).

Follow along in this Colab →

Train on versioned data and visualize predictions

  1. Upload raw data
  2. Create a balanced split (train/val/test)
  3. Train model and validate predictions
  4. Run inference & explore results

Project workflow

Here is the artifacts graph for this project, with datasets, predictions, and models connected by the runs that create and consume them.




1. Upload raw data

My raw data for this project contains 10,000 images organized into 10 subfolders. The name of each subfolder is the ground-truth label for the images it contains (Amphibia, Animalia, Arachnida...Reptilia). With Artifacts, I can upload my full raw dataset and automatically track and version all the different ways I may subsequently decide to generate my train/val/test splits (how many items per split or per class, balanced or unbalanced, whether to hold out a test set, etc.).
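A minimal sketch of this upload step follows. The project name, artifact name, and folder path are illustrative, but the layout matches the one-subfolder-per-label structure described above:

```python
import wandb

# Log the raw dataset folder as a versioned artifact.
# "dsviz-nature", "inaturalist-raw", and "raw_images/" are illustrative names.
run = wandb.init(project="dsviz-nature", job_type="upload")
raw_data = wandb.Artifact("inaturalist-raw", type="dataset")
raw_data.add_dir("raw_images/")  # 10 subfolders, one per ground-truth label
run.log_artifact(raw_data)
run.finish()
```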



2. Create a balanced split

Verify data distribution: Group by "split"

Confirm the data distribution across labels for each split. Here I have 400 images for each label in train, 50 in val, and 50 in test.
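Here is one way such a split might be created and logged, a sketch that consumes the raw artifact from step 1 and builds a browsable table. The counts match the 400/50/50 figures above; all names are illustrative:

```python
import os
import random
import wandb

PER_LABEL = {"train": 400, "val": 50, "test": 50}  # counts from above

run = wandb.init(project="dsviz-nature", job_type="split")
raw_dir = run.use_artifact("inaturalist-raw:latest").download()

split_data = wandb.Artifact("inaturalist-split", type="dataset")
data_table = wandb.Table(columns=["image", "label", "split"])

for label in sorted(os.listdir(raw_dir)):
    files = sorted(os.listdir(os.path.join(raw_dir, label)))
    random.shuffle(files)  # balanced but random assignment within each label
    start = 0
    for split, count in PER_LABEL.items():
        for fname in files[start:start + count]:
            path = os.path.join(raw_dir, label, fname)
            data_table.add_data(wandb.Image(path), label, split)
        start += count

split_data.add(data_table, "data_table")  # renders interactively in DSViz
run.log_artifact(split_data)
run.finish()
```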



Preview the images: Group by "label"

Group by "label" to see all the images by their true class. You can scroll horizontally through the images in each cell using the arrows.




3. Train model and validate predictions

Check the predictions: Group by "truth"

See the distribution of predictions and scores for a given correct class. Interestingly, in this example many of the "Animalia" in the second row are sea creatures, which are easily confused with mollusks.
  • sort by "truth" (true label name) to alphabetize the table
  • group by "truth" to see a distribution of guesses for each true label
Sort and group actions are available for each relevant column. Hover over the column header, click on the three-dot menu on the right, and select an action.
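A sketch of how such a predictions table might be logged from Keras follows. It assumes a trained `model` and validation data (`val_images`, `val_paths`, `val_labels`) from the training code; the column names (truth, guess, score_<label>) mirror the ones used in this report, and the exact class list is an assumption based on the labels mentioned here:

```python
import numpy as np
import wandb

# Assumed 10 iNaturalist supercategories; order must match the model outputs.
CLASSES = ["Amphibia", "Animalia", "Arachnida", "Aves", "Fungi",
           "Insecta", "Mammalia", "Mollusca", "Plantae", "Reptilia"]

run = wandb.init(project="dsviz-nature", job_type="evaluate")
columns = ["image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
preds = wandb.Table(columns=columns)

# model, val_images, val_paths, val_labels are assumed from the training code.
probs = model.predict(val_images)  # shape (n_examples, 10), softmax scores
for path, truth, p in zip(val_paths, val_labels, probs):
    guess = CLASSES[int(np.argmax(p))]
    preds.add_data(wandb.Image(path), truth, guess, *p.tolist())

val_art = wandb.Artifact("val-predictions", type="predictions")
val_art.add(preds, "val_preds")
run.log_artifact(val_art)
run.finish()
```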


Dive into the confusing classes: Group by "guess"

When the model guesses a particular class, what is the distribution of true labels for those guesses? Here we can see that "Mollusca" is a popular confound for "Animalia" (second row, "truth" column).
  • sort by "guess" to alphabetize; group by "guess" to see distributions of true labels


Focus on a subset: Filter by true class

Let's look at just the mollusks, animals, and insects. Crustaceans and slugs are especially confusing because of the context: the model may be picking up on common backgrounds (underwater, tide pools, grass) or hands (frequently holding the smaller creatures).
  • click the "Filter" button to enter a query selecting the true labels of interest. You can type and see hints in the dropdown to formulate an expression like x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"
  • sort and group by "truth"
  • edit the first three score columns via the header menu to rearrange them and prioritize the score columns for the classes you've just selected (Animalia, Mollusca, and Insecta)


Focus on the confounds: Filter by "guess"

Let's look at only images with a true label of Amphibia which were misclassified.
  • filter query: x["truth"] = "Amphibia" and x["guess"] != "Amphibia"
  • then sort by score_Amphibia to show the closest guesses/worst errors
  • or group by "guess" to look for systematic errors per class


Compare across model versions

Here I compare two versions of my model, one trained for a single epoch (left) and one trained for five epochs (right). You can see that performance and confidence generally improve with more training.
To compare versions from any model artifact view:
  • Using the left sidebar, select one version of the model, then hover over another version and click "Compare" to select it. Then change the dropdown on the right of the table name from WbTableFile to Split Panel -> WbTableFile
  • sort by "truth", group by "truth" as before—now you can compare the prediction distributions across the two models
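For reference, the two checkpoints being compared could be logged as versions of a single model artifact, roughly like this sketch (the artifact name is illustrative, and `build_model`, `train_ds`, and `val_ds` are assumed from the training code):

```python
import wandb

# Each run that logs an artifact under the same name creates a new version
# (v0, v1, ...), which is what populates the version sidebar for comparison.
for n_epochs in (1, 5):
    run = wandb.init(project="dsviz-nature", job_type="train")
    model = build_model()  # assumed Keras model factory
    model.fit(train_ds, validation_data=val_ds, epochs=n_epochs)
    model.save("model.keras")
    model_art = wandb.Artifact("nature-convnet", type="model")
    model_art.add_file("model.keras")
    run.log_artifact(model_art)
    run.finish()
```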



Focus on confusing classes: Filter to misclassified images

After 1 epoch, the model misclassifies 30 images. After 5 epochs, this drops to 5!
Filter query: x["truth"] = "Plantae" and x["guess"] != "Plantae"

Filter by specific guess and sort by the guessed score for more detail:
Filter query: x["guess"] = "Reptilia" and x["truth"] != "Reptilia" and sort by "score_Reptilia"


Focus on score distributions across models

If you log an id column for the validation images, you can compare score distributions for each class over the course of training. From the default comparison view, group by "truth". In these histograms, 0 (blue) shows the predictions after 1 epoch and 1 (orange) after 3 epochs. With more training, high-confidence scores become more frequent (taller orange bars at the right of the prediction score range). Animalia is the largest and most confusing category; after three epochs, the model makes different (and slightly fewer) mistakes, as shown by some smaller and some entirely new orange bars.
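Concretely, the id can just be added as the first column of the predictions table from step 3. This continues the evaluation sketch above (same assumed variables); using the image filename as the id is one simple choice:

```python
import os

# Same predictions table as before, plus a stable per-image "id" column so
# W&B can line up the same example across logged table versions.
columns = ["id", "image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
preds = wandb.Table(columns=columns)
for path, truth, p in zip(val_paths, val_labels, probs):
    guess = CLASSES[int(np.argmax(p))]
    preds.add_data(os.path.basename(path), wandb.Image(path),
                   truth, guess, *p.tolist())
```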




4. Run inference and explore results

Test predictions in a split view table: 1 epoch (left) vs 5 epochs (right)

Here I compare two versions of the same model trained for one (left) versus five (right) epochs on the same set of test images. In many scenarios, we won't have labels for test data, but we do here for illustration purposes.
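The inference run might look like this sketch, loading a model version from the artifact logged during training and writing a test predictions table (names and variables follow the earlier sketches; `test_images`, `test_paths`, and `test_labels` are assumed to come from the "test" split):

```python
import os
import numpy as np
import tensorflow as tf
import wandb

run = wandb.init(project="dsviz-nature", job_type="inference")
model_dir = run.use_artifact("nature-convnet:v1").download()  # pick a version
model = tf.keras.models.load_model(os.path.join(model_dir, "model.keras"))

# test_images, test_paths, test_labels are assumed from the "test" split.
probs = model.predict(test_images)
columns = ["id", "image", "truth", "guess"] + [f"score_{c}" for c in CLASSES]
test_table = wandb.Table(columns=columns)
for path, truth, p in zip(test_paths, test_labels, probs):
    test_table.add_data(os.path.basename(path), wandb.Image(path),
                        truth, CLASSES[int(np.argmax(p))], *p.tolist())

test_art = wandb.Artifact("test-predictions", type="predictions")
test_art.add(test_table, "test_preds")
run.log_artifact(test_art)
run.finish()
```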

Compare across the same images

Sort and group by truth as before. Note that the model does tend to improve on the right, with more peaked guess distributions.


Filter to a subset of classes

With more epochs, there is less confusion among mollusks, animals, and insects.

  • Filter query: x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"

  • Sort and group by truth


See most confused classes: Group by guess

After longer training, the model generally makes fewer mistakes. However, the "Fungi" mistakes on the right (right column, fourth row down) are new and interesting—perhaps an effect of the background or overall shape?

Compare across individual images

See how the class predictions for the same images change with more training—generally more confident on the right.


Focus on top confused images for a particular (truth, guess) pair

Optionally filter by labels, then sort by score.




Interesting finds

Context is everything

Here are some Plantae from the dataset which the models (L: 1 epoch, R: 5 epochs) failed to identify as plants. It looks like the background/visual context of the living thing might influence the prediction. In the last row on the left, the photos of field/forest scenes are more canonical for images of mammals. In the first row on the right, the bare earth is more typical context for Fungi photos. And the pitcher plant in the bottom right, guessed as a reptile, definitely fooled me.


False positives as evolutionary advantage

The "eyespot" patterns on the butterfly in the third row image look amazingly like the face of a reptile or snake (you may need to zoom in on this page to see the nostril spots). There's some evidence that such coloration evolved to discourage predators. Note that the confidence scores for the last image are almost evenly split across Insecta, Reptilia, Amphibia, and even Arachnida (a bit lower for this last one). If we wanted to ship this model to production, adding a minimum score threshold for detecting any particular class would filter out particularly confusing images like this one. best butterfly frog.png

Add your own?

If you find any interesting insights or patterns in our interactive example, please comment below—we'd love to see them!