Visualize Data for Image Classification
Overview
This is a walkthrough of dataset and prediction visualization (DSViz) and Artifacts for image classification on W&B. Here I finetune a convnet in Keras on 10,000 photos from iNaturalist 2017 to identify 10 classes of living things (plants, insects, birds, etc.).
Follow along in this colab →
Train on versioned data and visualize predictions
- Upload raw data
- Create a balanced split (train/val/test)
- Train model and validate predictions
- Run inference & explore results
Project workflow
Here is the artifacts graph for this project, with datasets, predictions, and models connected by the runs creating/consuming them.
1. Upload raw data
My raw data for this project contains 10,000 images, organized into 10 subfolders. The name of each subfolder is the ground truth label for the images it contains (Amphibia, Animalia, Arachnida, ..., Reptilia). With Artifacts, I can upload my full raw dataset and automatically track and version all the different ways I may subsequently decide to generate my train/val/test splits (how many items per split or per class, balanced or unbalanced, whether or not to hold out a test set, etc.).
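The upload step is only a few lines. Here is a minimal sketch, assuming the photos live locally in an inaturalist_12K/ folder with one subfolder per class; the project and artifact names are illustrative:

```python
import wandb

# Start a run whose only job is to upload and version the raw data
run = wandb.init(project="dsviz-demo", job_type="upload")

# Bundle the raw images into a single versioned artifact
raw_data = wandb.Artifact("inat_raw_data", type="raw_data")
raw_data.add_dir("inaturalist_12K/")  # recursively adds every image file
run.log_artifact(raw_data)
run.finish()
```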
2. Create a balanced split
Verify data distribution: Group by "split"

Preview the images: Group by "label"

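To make the "split" and "label" groupings above possible, the split run can log a wandb.Table with one row per image. This is a hedged sketch assuming an 80/10/10 split per class; the artifact, table, and column names are placeholders:

```python
import os
import random
import wandb

run = wandb.init(project="dsviz-demo", job_type="data_split")

# Pull the exact version of the raw data this split is derived from
raw_dir = run.use_artifact("inat_raw_data:latest").download()

split_table = wandb.Table(columns=["image", "label", "split"])
for label in sorted(os.listdir(raw_dir)):
    files = sorted(os.listdir(os.path.join(raw_dir, label)))
    random.shuffle(files)
    for i, fname in enumerate(files):
        # 80/10/10 per class keeps every split balanced across labels
        frac = i / len(files)
        split = "train" if frac < 0.8 else ("val" if frac < 0.9 else "test")
        path = os.path.join(raw_dir, label, fname)
        split_table.add_data(wandb.Image(path), label, split)

split_artifact = wandb.Artifact("inat_split", type="balanced_data")
split_artifact.add(split_table, "data_split")
run.log_artifact(split_artifact)
run.finish()
```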
3. Train model and validate predictions
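Before digging into the prediction views, here is a rough sketch of the training run itself: it records the data version it consumes, finetunes a Keras convnet, and saves the result as a model artifact. The backbone, hyperparameters, and the train_generator/val_generator data pipelines are illustrative assumptions rather than the exact setup from the colab:

```python
import wandb
from tensorflow import keras

run = wandb.init(project="dsviz-demo", job_type="train")
run.use_artifact("inat_split:latest")  # record exactly which data version was used

# Finetune a pretrained backbone with a fresh 10-class softmax head
base = keras.applications.InceptionV3(
    include_top=False, pooling="avg", input_shape=(299, 299, 3), weights="imagenet")
base.trainable = False
model = keras.Sequential([base, keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# train_generator / val_generator are assumed to be built from the split above
model.fit(train_generator, validation_data=val_generator, epochs=5)

# Version the trained weights so prediction tables can point back to them
model.save("convnet.h5")
model_artifact = wandb.Artifact("convnet", type="model")
model_artifact.add_file("convnet.h5")
run.log_artifact(model_artifact)
run.finish()
```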
Check the predictions: Group by "truth"
- sort by "truth" (true label name) to alphabetize the table
- group by "truth" to see a distribution of guesses for each true label

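The table behind these views has one row per validation image, with the image itself, its true label ("truth"), the model's top prediction ("guess"), and one confidence column per class ("score_Amphibia", "score_Animalia", ...). A sketch of how such a table might be logged, assuming val_images is a preprocessed batch, val_paths the corresponding file paths, and val_labels the true label names:

```python
import numpy as np
import wandb

# The 10 iNaturalist super-categories; order assumed to match the model's output units
CLASSES = ["Amphibia", "Animalia", "Arachnida", "Aves", "Fungi",
           "Insecta", "Mammalia", "Mollusca", "Plantae", "Reptilia"]

def log_val_predictions(run, model, val_images, val_paths, val_labels):
    columns = ["image", "truth", "guess"] + ["score_" + c for c in CLASSES]
    preds_table = wandb.Table(columns=columns)

    probs = model.predict(val_images)  # shape: (num_images, 10)
    for path, truth, scores in zip(val_paths, val_labels, probs):
        guess = CLASSES[int(np.argmax(scores))]
        preds_table.add_data(wandb.Image(path), truth, guess,
                             *[float(s) for s in scores])

    preds_artifact = wandb.Artifact("val_predictions", type="predictions")
    preds_artifact.add(preds_table, "val_preds")
    run.log_artifact(preds_artifact)
```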
Dive into the confusing classes: Group by "guess"
- sort by "guess" to alphabetize; group by "guess" to see distributions of true labels

Focus on a subset: Filter by true class
- click the "Filter" button to enter a query selecting the true labels of interest. As you type, the dropdown shows hints to help you formulate an expression like x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"
- sort and group by "truth"
- edit the first three score columns via the header menu (to rearrange them and prioritize the score columns for the classes you've just selected: Animalia, Mollusca, and Insecta)

Focus on the confounds: Filter by "guess"
- filter query: x["truth"] = "Amphibia" and x["guess"] != "Amphibia"
- then sort by score_Amphibia to show the closest guesses/worst errors
- or group by guess to try to see systematic errors per class

Compare across model versions
- Using the left sidebar, select one version of the model, then hover over another version and click "Compare" to select it. Then change the dropdown to the right of the table name from "WbTableFile" to "Split Panel -> WbTableFile"
- sort by "truth", group by "truth" as before—now you can compare the prediction distributions across the two models

Focus on confusing classes: Filter by mislabeled images


Focus on score distributions across models

4. Run inference and explore results
Test predictions table and split view table: 1 epoch (left) vs. 5 epochs (right)
Here I compare two versions of the same model trained for one (left) versus five (right) epochs on the same set of test images. In many scenarios, we won't have labels for test data, but we do here for illustration purposes.
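Generating these test tables is a matter of pulling each model version back out of the artifact store and running inference. A hedged sketch, reusing the (assumed) names from the training sketch above:

```python
import os
import wandb
from tensorflow import keras

run = wandb.init(project="dsviz-demo", job_type="inference")

# Compare e.g. the 1-epoch and 5-epoch checkpoints of the same model artifact
for version in ["v0", "v1"]:
    model_dir = run.use_artifact(f"convnet:{version}").download()
    model = keras.models.load_model(os.path.join(model_dir, "convnet.h5"))
    # ...run model.predict on the test images and log a "test_predictions"
    # table exactly like the validation table above...

run.finish()
```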
Compare across the same images
Sort and group by truth as before. Note that the model does tend to improve on the right, with more peaked guess distributions.
Filter to a subset of classes
With more epochs, there is less confusion among mollusks, animals, and insects.
- Filter query: x["truth"] = "Mollusca" or x["truth"] = "Animalia" or x["truth"] = "Insecta"
- Sort and group by truth
See the most confused classes: Group by "guess"
After longer training, the model generally makes fewer mistakes. However, the "Fungi" mistakes on the right (right column, fourth row down) are new and interesting; perhaps this is an effect of the background or overall shape?
Compare across individual images
See how the class predictions for the same images change with more training—generally more confident on the right.
Focus on top confused images for a particular (truth, guess) pair
Optionally filter by labels, then sort by score.
Interesting finds
Context is everything
Here are some Plantae from the dataset which the models (L: 1 epoch, R: 5 epochs) failed to identify as plants. It looks like the background/visual context of the living thing might influence the prediction. In the last row on the left, the photos of field/forest scenes are more canonical for images of mammals. In the first row on the right, the bare earth is a more typical context for Fungi photos. And the pitcher plant in the bottom right, guessed as a reptile, definitely fooled me.
False positives as evolutionary advantage
The "eyespot" patterns on the butterfly in the third row image look amazingly like the face of a reptile or snake (you may need to zoom in on this page to see the nostril spots). There's some evidence that such coloration evolved to discourage predators. Note that the confidence scores for the last image are almost evenly split across Insecta, Reptilia, Amphibia, and even Arachnida (a bit lower for this last one). If we wanted to ship this model to production, adding a minimum score threshold for detecting any particular class would filter out particularly confusing images like this one.
Add your own?
If you find any interesting insights or patterns in our interactive example, please comment below—we'd love to see them!