
Classify the Natural World with Weights & Biases

This article explains how to train and fine-tune convolutional neural networks (CNNs) to identify species beyond ImageNet's classes, using Weights & Biases.
In this project, I explore different models to identify plant and animal species from photos. I've trained small convnets and compared fine-tuned versions of existing architectures (Inception V3, ResNet, Inception ResNet V2, Xception) with different freeze layers. I also analyzed performance on higher-level categories (e.g. mammals versus insects versus birds).

This is my log of progress as I try different approaches.
Here's a sample of the images from the iNaturalist 2017 dataset:


Fine-Tune Standard CNNs on Small Data



[Run set: Experiments with epochs of pretraining (8 runs)]



Findings

3-10 Epochs of Pre-Training and More Data Help

  • more data: adding more data (5K to 10K to 50K) clearly helps: training accuracy goes from ~81% to ~85% to ~95%. However, validation accuracy mostly just stabilizes and saturates at around 84% even with the full dataset. That seems to be our limit; need to look into per-class accuracy.
  • pre-train for 3 epochs: the longer the pre-training, the slower the accuracy rises. All runs converge to around the same range, although 25 pre-train epochs is too much and 10 is surprisingly still good. There is no significant difference between 1, 3, 5, and 10, though 3 actually seems to do best; maybe longer than 3 would be better on more data. Without any pretraining, acc and val acc are both around 4% lower.
  • pre-training the whole network before freezing some layers => faster learning & higher acc (red baseline = no pretraining: lower & slower)
  • val acc is also less noisy with pretraining
  • val acc doesn't change that much from 5K to 50K: we're overfitting

Notes Fri Jan 18

Got basic code running: load InceptionV3, add a fully-connected (fc) layer of configurable size on top, then an output layer of size num_classes. Pretrain for pe epochs, currently using rmsprop. Then freeze everything up to layer fl as untrainable and train the remaining layers for e epochs, currently using SGD with momentum and a lower learning rate. These defaults come directly from the Keras tutorial on fine-tuning InceptionV3 on a new set of classes.
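For concreteness, here's a minimal sketch of this setup in Keras, following that tutorial (where the pre-train phase trains only the new top layers with the base frozen). The train_gen/val_gen generators are hypothetical, and the specific pe/e/fl/fc values are placeholders rather than the exact configuration of my runs:

```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD

num_classes = 10        # e.g. the 10 taxonomic classes
fc_size = 1024          # configurable fc layer size
pe, e, fl = 3, 47, 155  # pre-train epochs, train epochs, freeze layer

# Load InceptionV3 pretrained on ImageNet, without its top classifier.
base_model = InceptionV3(weights='imagenet', include_top=False)

# Add a configurable fc layer, then an output layer of size num_classes.
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(fc_size, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Phase 1: pretrain the new top layers for pe epochs with rmsprop,
# keeping the whole base frozen (as in the Keras tutorial).
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
# train_gen / val_gen: hypothetical generators over the iNaturalist images.
model.fit_generator(train_gen, epochs=pe, validation_data=val_gen)

# Phase 2: freeze layers 0..fl, train the rest for e epochs with SGD
# at a low learning rate, with momentum.
for layer in model.layers[:fl + 1]:
    layer.trainable = False
for layer in model.layers[fl + 1:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_gen, epochs=e, validation_data=val_gen)
```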
  • The split between pre-train epochs (letting the new layers settle in) and actual training epochs matters; investigate this space. Note that we don't have a lot of data, so it's probably massively overfitting. Right now 1 to 10 epochs of pretraining look about the same. Why would we want more or less pretraining?
  • More generally, how can we see an effect with so little data? Try to train with full data, and see how that compares.
Experiment 1
  • How does the number of pre-train epochs affect the situation? Try a few in 1-25.
TODO
  • standardize code so that we don't duplicate model logging and all params (see the sketch after this list)
  • bring back per-class accuracy
  • how to parameterize which network we're loading
  • adding tags
  • types of optimizer in pre-train and train: what matters?
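One way to avoid duplicating the param logging: declare everything once in the run config and let wandb's Keras integration handle the metrics. A sketch, reusing the model and generators from the snippet above; the project name and config keys are my placeholders:

```python
import wandb
from wandb.keras import WandbCallback

# Declare every hyperparameter once in the run config, so all runs
# log the same fields and can be compared in the W&B UI.
wandb.init(project="inaturalist", config={
    "base_model": "InceptionV3",
    "fc_size": 1024,
    "pretrain_epochs": 3,
    "train_epochs": 47,
    "freeze_layer": 155,
})

# WandbCallback logs losses and metrics each epoch with no custom code.
model.fit_generator(train_gen, epochs=wandb.config.train_epochs,
                    validation_data=val_gen, callbacks=[WandbCallback()])
```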

Fine-Tuning Inception V3

Freeze Fewer Layers for Higher Accuracy

  • Freeze fewer layers: when adapting an existing network to new data, we can choose how many layers to freeze versus train/fine-tune.
    • fl = index of the last frozen layer in these runs, so fl 54 means freezing layers 0 to 54 and fine-tuning the rest (see the sketch after this list)
    • Freeze fewer layers (lower fl): training is slower, but the fit to our data is better.
    • Freeze more layers (higher fl): training is faster, with better generalization to unseen data.
  • freezing up to layer 54 or 155 outperforms 249 or 311 (higher train and val acc)
  • still generalizes well (higher val acc)
  • not much difference within either group, so we could pick the higher layer index for efficiency (InceptionV3 has 313 layers). Layers 54 and 155 get to 87/85 validation accuracy; layers 249 and 311 get to 76/78.
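To make sense of these fl values, it helps to see where InceptionV3's inception blocks end, since block boundaries are natural freeze points. A quick sketch (layer counts vary slightly across Keras versions):

```python
from keras.applications.inception_v3 import InceptionV3

base_model = InceptionV3(weights='imagenet', include_top=False)

# Print layer indices and names to locate inception-block boundaries,
# which are natural candidates for the freeze layer fl.
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# fl = index of the last frozen layer: freeze 0..fl, fine-tune the rest.
fl = 155
for layer in base_model.layers[:fl + 1]:
    layer.trainable = False
for layer in base_model.layers[fl + 1:]:
    layer.trainable = True
```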
Notes Thurs Jan 24
per-class accuracy TBD: it crashed, probably from requesting far too many batches; consider dividing by the actual size of the validation set in the Keras callback (see the sketch after the list below)
  • How does per-class accuracy look for these?
  • How do different freeze layers affect the result?
  • How do different base models affect the result?
  • learning rate/optimizer modifications don't seem super relevant at this point, probably not worth it
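Here's a hedged sketch of what that callback could look like; PerClassPrecision, val_gen, num_val, and batch_size are hypothetical names. The key fix is computing the number of steps from the actual validation set size instead of requesting too many batches:

```python
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import precision_score

class PerClassPrecision(Callback):
    """Log precision for each class at the end of every epoch."""

    def __init__(self, val_gen, num_val, batch_size):
        super(PerClassPrecision, self).__init__()
        self.val_gen = val_gen
        # Request only as many batches as the validation set actually
        # contains; asking for more is the likely cause of the crash.
        self.steps = int(np.ceil(num_val / float(batch_size)))

    def on_epoch_end(self, epoch, logs=None):
        y_true, y_pred = [], []
        for _ in range(self.steps):
            x, y = next(self.val_gen)
            y_true.extend(np.argmax(y, axis=1))
            y_pred.extend(np.argmax(self.model.predict(x), axis=1))
        # average=None returns one precision value per class.
        for cls, p in enumerate(precision_score(y_true, y_pred, average=None)):
            print("precision[class %d] = %.3f" % (cls, p))
```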

Visualizing Differences Between Freeze Layers



[Run set: Freeze layer (4 runs)]


Per-Class Precision with InceptionV3

More Data for Kingdoms

How well do we perform on each of the 10 taxonomic classes? Look at per-class precision for different models (5K examples, hence the noisy/jagged plots).
  • per-class precision: Birds are best (up to 95). Animalia and Plantae seem to be the worst (70-80). Mollusca and Reptilia are slightly worse than Amphibia, Arachnida, Fungi, Insecta, Mammalia (all 80-90). Birds have the most species represented and may have more consistent data. Animalia is a weird catch-all subset, and plants are a different kingdom from everything else (and more diverse than fungi). Next level of analysis: where does more data help?
  • more data: 10K vs 5K helps most in Animalia, maybe Reptilia, maybe Insecta; not so much in Mammalia, Fungi, Amphibia, Arachnida, Mollusca, Aves. Also super noisy at this level. Maybe run an experiment with per-class accuracy enabled while also turning up the data by 10K. Animalia is a pretty diverse class (lots of different subclasses grouped together).
  • pre-training definitely helps (red curve is generally lower)
Notes Mon Jan 28
TODO
  • how to parameterize which network we're loading
  • how to set a sensible freeze layer for each model
Experiments
  • How do different base models affect the result?
  • How does varying fc size affect the result?
Running commands
  • python adv_finetune.py -m "fl_155_pt_5_per_class_3" -t "per_class" -g "0" -pe 5 -e 45 -fl 155 --per_class
  • python adv_finetune.py -m "pt_0" -pe 0 --per_class -g 1
  • python adv_finetune.py -m "fl_155_pt_3_10K" -t "per_class" -g 2 -fl 155 -nt 10000 -nv 1600 -pe 3 -e 47 --per_class
  • python adv_finetune.py -m "fl_54_pt_5" -t "per_class" -g 3 --per_class -fl 54 -pe 3 -e 47
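For readability, here's my best guess at what those flags mean, reconstructed as an argparse sketch; the real adv_finetune.py may differ, and the defaults are assumptions:

```python
import argparse

# Hypothetical reconstruction of adv_finetune.py's CLI, inferred from
# the commands above; flag meanings and defaults are guesses.
parser = argparse.ArgumentParser()
parser.add_argument("-m", help="run name, e.g. fl_155_pt_5_per_class_3")
parser.add_argument("-t", help="tag for grouping related runs")
parser.add_argument("-g", help="GPU index to run on")
parser.add_argument("-pe", type=int, default=3, help="pre-train epochs")
parser.add_argument("-e", type=int, default=47, help="fine-tune epochs")
parser.add_argument("-fl", type=int, default=155,
                    help="index of the last frozen layer")
parser.add_argument("-nt", type=int, default=5000,
                    help="number of training examples")
parser.add_argument("-nv", type=int, default=800,
                    help="number of validation examples")
parser.add_argument("--per_class", action="store_true",
                    help="log per-class precision")
args = parser.parse_args()
```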

Per-Class Precision with Inception



[Run set: Per class (4 runs)]



Varying Base Models

Use InceptionV3

Fine-tuning a well-known, high-accuracy convnet (pretrained on ImageNet) is a great strategy for vision tasks, especially for nature photos (very similar to ImageNet). Which base network should we choose?
  • InceptionV3 & Inception ResNet V2 (IRV2): same train/val acc; Xception: a bit lower
  • IRV2 uses double the memory & params for minimal acc difference, so let's choose InceptionV3
  • val acc only changes by ~10%, so we need more data (this is 5K)
  • a Keras version update is needed to instantiate ResNet correctly
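One possible way to parameterize which network we load (a recurring TODO); a sketch using the constructors in keras.applications:

```python
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.resnet50 import ResNet50
from keras.applications.xception import Xception

# Map a config string to a constructor so the base network is a single
# parameter; note each model expects its own input size and has its own
# preprocess_input function.
BASE_MODELS = {
    "inception_v3": InceptionV3,
    "inception_resnet_v2": InceptionResNetV2,
    "resnet50": ResNet50,
    "xception": Xception,
}

def build_base(name):
    """Instantiate the chosen base model, pretrained on ImageNet."""
    return BASE_MODELS[name](weights="imagenet", include_top=False)

base_model = build_base("inception_v3")
```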
TODO
  • how to set a sensible freeze layer for each model
  • epochs not matching expectations
  • low val acc on resnet; update Keras

Visualizing the Results of Varying Base Models



[Run set: Base Models (4 runs)]


Vary Freeze Layer and Base Model

Long Runs; Not Much Difference

These hyperparameters are theoretically interesting, but sweeping them would take a lot of compute; stick with InceptionV3 for now.

Results of Varying Freeze Layer and Base Model



[Run sets: Base Models (5 runs); Vary Base & Freeze Layer (3 runs)]
