
Disentangling Model Predictions

Created on January 23 | Last edited on December 17
We can visualize a model's performance by clustering its predictions or confidence scores. Below, I examine the predictions of toy CNNs fine-tuned to identify 10 classes of living things (birds, mammals, reptiles, etc.; more details in this report). With W&B's newest Embedding Projector panel, it's easy and fascinating to explore patterns in a model's classifications. In the live charts below, you can
  • hover your cursor over any point to see the image at that location
  • click & drag on the chart area to pan
  • scroll up/down to zoom in/out on a particular region
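Before any of these projections, each image needs a numeric representation; here that is the model's per-class confidence vector (the softmax over its logits). A minimal numpy sketch, with made-up logits standing in for real model outputs:

```python
import numpy as np

def softmax(logits, axis=-1):
    # subtract the max for numerical stability before exponentiating
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical logits for 4 images over the 10 living-thing classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
confidences = softmax(logits)

# each row is a probability distribution over the 10 classes
assert np.allclose(confidences.sum(axis=1), 1.0)
```

One such row per image is the high-dimensional point that the projector then maps down to 2-D.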

One Model

Embeddings shown: top left, PCA; top right, t-SNE; bottom, UMAP
  • PCA shows the least clean separation, though birds (Aves, teal) are most reliably distinct/clustered. Insects + Arachnids show substantial overlap
  • t-SNE quality varies substantially over rounds, with some reliably-separated clusters of the most canonical images
  • UMAP reliably shows confusion across Plants, Insects, and Arachnids, with Animalia as the least distinct cluster. This makes sense: Animalia is technically a higher-level category in the biological taxonomy than several of the other classes and is the easiest to mistake (e.g. in this dataset it contains many sea creatures, which closely resemble mollusks)
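The three projections above can be reproduced outside the panel with a few lines of scikit-learn. A sketch, using random confidence vectors as a stand-in for real model scores:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
# stand-in for per-image confidence vectors: (n_images, n_classes)
scores = rng.random((100, 10))

# linear projection: fast and deterministic, but often the least separated
pca_2d = PCA(n_components=2).fit_transform(scores)

# nonlinear projection: emphasizes local neighborhoods; output varies by seed,
# matching the round-to-round variation noted above
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=42).fit_transform(scores)

# UMAP follows the same fit_transform pattern via the umap-learn package:
#   umap.UMAP(n_components=2).fit_transform(scores)
```

Each result is an (n_images, 2) array of coordinates, one point per image in the scatter plots.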



Comparing Two Models?

First row: baseline model (Inception-v3, fine-tuned for just one epoch)
Second row: better model (double FC layer, fine-tuned for 5 epochs)


Baseline


Better model?


Better model


Interesting examples

Plants, insects, and spiders often confused

Scenes containing all three look very similar, and some samples legitimately contain multiple representatives (here, plants and an insect that might be hard to spot)



Unclear what is being photographed