Classify the Natural World with Weights & Biases
This article explains how to train and fine-tune convolutional neural networks (CNNs) to identify species beyond ImageNet's classes, using Weights & Biases.
In this project, I explore different models to identify plant and animal species from photos. I've trained small convnets and compared fine-tuned versions of existing architectures (Inception V3, ResNet, Inception ResNet V2, Xception) with different freeze layers. I also analyzed performance on higher-level categories (e.g. mammals versus insects versus birds).
Table of Contents
- Fine-Tune Standard CNNs on Small Data
- Findings
- Fine-Tuning Inception V3
- Visualizing Differences Between Freeze Layers
- Per-Class Precision with InceptionV3
- Per-Class Precision with Inception
- Varying Base Models
- Visualizing the Results of Varying Base Models
- Vary Freeze Layer and Base Model
- Results of Varying Freeze Layer and Base Model
This is my log of progress as I try different approaches.

Fine-Tune Standard CNNs on Small Data
[Run set: Experiments with epochs of pretraining (8 runs)]
Findings
3-10 Epochs of Pre-Training and More Data Help
- more data: adding more data (5K to 10K to 50K) clearly helps; training accuracy goes from ~81% to ~85% to ~95%. However, validation accuracy mostly stabilizes and saturates around 84% even with the full dataset. That seems to be our limit; we need to look into per-class accuracy.
- pre-train for 3 epochs: the longer the pre-training, the slower the accuracy rises. All runs converge in around the same range, although 25 pre-train epochs is too much and 10 is surprisingly still good. There is no significant difference between 1, 3, 5, and 10 epochs, though 3 actually seems to do best. Maybe longer than 3 would be better on more data. Without any pretraining, train and validation accuracy are both around 4% lower.
- pre-training the whole network before freezing some layers => faster learning & higher accuracy (red baseline = no pretraining: lower & slower)
- val acc less noisy
- val acc doesn’t change that much from 5K to 50K: we’re overfitting
Notes Fri Jan 18
Got basic code running: load InceptionV3, add a fully-connected (fc) layer of configurable size on top, then an output layer of size num_classes. Pre-train for pe epochs, currently using rmsprop. Freeze everything up to layer fl as untrainable. Train the remaining layers for e epochs, currently using SGD with momentum and a lower learning rate. These defaults come directly from the Keras tutorial on fine-tuning InceptionV3 on a new set of classes.
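Here is a minimal sketch of that setup, following the Keras fine-tuning tutorial rather than the actual adv_finetune.py script. train_gen and val_gen stand in for data generators defined elsewhere, and the constants mirror the pe/fl/e flags used later in this report:

```python
# Sketch of the setup described above (Keras fine-tuning tutorial recipe);
# train_gen / val_gen are assumed data generators, not defined here.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

NUM_CLASSES = 10      # the 10 taxonomic classes
FC_SIZE = 1024        # configurable fully-connected layer size
PRETRAIN_EPOCHS = 3   # pe
FREEZE_LAYER = 155    # fl: index of the last frozen layer
EPOCHS = 47           # e

# Load InceptionV3 without its classifier and add the new head.
base = InceptionV3(weights="imagenet", include_top=False)
x = GlobalAveragePooling2D()(base.output)
x = Dense(FC_SIZE, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

# Pre-train: freeze the entire base and let the new head settle in.
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, epochs=PRETRAIN_EPOCHS, validation_data=val_gen)

# Fine-tune: unfreeze everything above layer fl, train slowly with momentum.
for layer in model.layers[:FREEZE_LAYER + 1]:
    layer.trainable = False
for layer in model.layers[FREEZE_LAYER + 1:]:
    layer.trainable = True
model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, epochs=EPOCHS, validation_data=val_gen)
```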
- The balance between pre-train epochs (letting the new head settle in) and actual training epochs matters; investigate this space. Note that we don't have a lot of data, so the model is probably massively overfitting. Right now 1 to 10 epochs of pretraining look about the same. Why would we want more or less pretraining?
- More generally, how can we see an effect with so little data? Try to train with full data, and see how that compares.
Experiment 1
- How does the number of pre-train epochs affect the situation? Try a few in 1-25.
TODO
- standardize code so that we don't duplicate model logging and all params (see the sketch after this list)
- bring back per-class accuracy
- how to parameterize which network we're loading
- adding tags
- types of optimizer in pre-train and train: what matters?
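For the first TODO item, one option is a single W&B setup shared by all scripts, so the config is logged once and read back as the single source of truth. This is a hedged sketch: the project name and config keys are made up, not taken from the actual code.

```python
# Hypothetical centralized W&B setup so every script logs the same params;
# project name and config keys are illustrative.
import wandb
from wandb.keras import WandbCallback

run = wandb.init(
    project="nature-classifier",
    config={
        "base_model": "inception_v3",
        "fc_size": 1024,
        "freeze_layer": 155,
        "pretrain_epochs": 3,
        "epochs": 47,
        "optimizer": "sgd",
    },
)
config = wandb.config  # read params back so there is one source of truth

# later: pass WandbCallback() to model.fit to log metrics automatically, e.g.
# model.fit(train_gen, epochs=config.epochs, callbacks=[WandbCallback()])
```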
Fine-Tuning Inception V3
Freeze Fewer Layers for Higher Accuracy
- Freeze fewer layers: when adapting an existing network to new data, we can choose how many layers to freeze versus train/fine-tune.
- fl = index of the last frozen layer in these runs, so fl 54 means freezing layers 0 to 54 and fine-tuning the rest (the sketch after this list shows how these indices map onto InceptionV3's blocks)
- Freezing fewer layers (lower fl): trains slower, fits our data better.
- Freezing more layers (higher fl): trains faster, generalizes better to unseen data.
- freezing up to layers 54/155 outperforms 249/311 (higher train and val acc)
- still generalizes well (higher val acc)
- not much difference within each of the two groups, so we could pick a higher layer index for efficiency (InceptionV3 has 313 layers). fl 54 and 155 get to 87/85; fl 249 and 311 get to 76/78 validation accuracy.
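To pick sensible fl values, it helps to see where InceptionV3's Inception blocks end, so the freeze boundary falls between blocks rather than inside one. A small sketch (in Keras the block-ending concatenation layers are the ones named "mixed..."):

```python
# Print where InceptionV3's Inception blocks end, to choose fl at a
# block boundary instead of mid-block.
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False)
print(f"total layers: {len(base.layers)}")
for i, layer in enumerate(base.layers):
    if layer.name.startswith("mixed"):  # concatenation closing each block
        print(i, layer.name)
```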
Notes Thurs Jan 24
per-class accuracy tbd: the run crashed, probably because we asked for way too many batches; derive the step count from the actual size of the validation set in the Keras callback
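A hypothetical version of that callback, with the step count computed from the validation set's real size. val_gen is assumed to be a Keras directory iterator exposing .samples; this is illustrative, not the code that crashed:

```python
# Hypothetical per-class precision callback. The earlier crash was likely
# from requesting too many batches, so steps comes from the set's real size.
import numpy as np
from tensorflow.keras.callbacks import Callback

class PerClassPrecision(Callback):
    def __init__(self, val_gen, num_classes, batch_size):
        super().__init__()
        self.val_gen = val_gen
        self.num_classes = num_classes
        # ceil(n / batch_size) batches covers the set exactly once
        self.steps = int(np.ceil(val_gen.samples / batch_size))

    def on_epoch_end(self, epoch, logs=None):
        true_pos = np.zeros(self.num_classes)
        predicted = np.zeros(self.num_classes)
        for _ in range(self.steps):
            x, y = next(self.val_gen)
            preds = self.model.predict(x).argmax(axis=1)
            trues = y.argmax(axis=1)
            for p, t in zip(preds, trues):
                predicted[p] += 1
                true_pos[p] += p == t
        precision = true_pos / np.maximum(predicted, 1)  # avoid divide-by-zero
        print({f"precision_{c}": round(float(p), 3)
               for c, p in enumerate(precision)})
```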
- How does per-class accuracy look for these?
- How do different freeze layers affect the result?
- How do different base models affect the result?
- learning rate/optimizer modifications don't seem super relevant at this point, probably not worth it
Visualizing Differences Between Freeze Layers
[Run set: Freeze layer (4 runs)]
Per-Class Precision with InceptionV3
More Data for Kingdoms
How well do we perform on each of the 10 taxonomic classes? Look at per-class precision for different models (5K examples, hence the noisy/jagged plots).
- per-class precision: Birds are best (up to 95). Animalia and Plantae seem to be the worst (70-80). Molluscs and Reptiles are slightly worse than Amphibia, Arachnida, Fungi, Insects, and Mammals (all 80-90). Birds have the most species represented and may have more consistent data. Animalia is a weird catch-all subset, and plants are a different kingdom from everything else (and more diverse than fungi). Next level of analysis: where does more data help?
- more data: 10K vs 5K helps most in Animalia, maybe Reptilia, maybe Insecta; not so much in Mammalia, Fungi, Amphibia, Arachnida, Mollusca, Aves. Also super noisy at this level. Maybe run an experiment with per-class accuracy enabled while also scaling the data up by 10K. Animalia is a pretty diverse class (lots of different subclasses grouped together).
- pre-training definitely helps (red curve is generally lower)
Notes Mon Jan 28
TODO
- how to parameterize which network we're loading
- how to set a sensible freeze layer for each model
Experiments
- How do different base models affect the result?
- How does varying fc size affect the result?
Running commands
- python adv_finetune.py -m "fl_155_pt_5_per_class_3" -t "per_class" -g "0" -pe 5 -e 45 -fl 155 --per_class
- python adv_finetune.py -m "pt_0" -pe 0 --per_class -g 1
- python adv_finetune.py -m "fl_155_pt_3_10K" -t "per_class" -g 2 -fl 155 -nt 10000 -nv 1600 -pe 3 -e 47 --per_class
- python adv_finetune.py -m "fl_54_pt_5" -t "per_class" -g 3 --per_class -fl 54 -pe 3 -e 47
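For reference, a hypothetical parser reconstructing the flags used in the commands above; adv_finetune.py itself isn't shown in this report, and the defaults and help strings here are guesses:

```python
# Illustrative argument parser matching the adv_finetune.py flags above;
# defaults are assumptions, not the script's actual values.
import argparse

parser = argparse.ArgumentParser(description="Fine-tune a pretrained convnet")
parser.add_argument("-m", default="baseline", help="run name for W&B logging")
parser.add_argument("-t", default="", help="tag for grouping runs")
parser.add_argument("-g", default="0", help="GPU id to run on")
parser.add_argument("-pe", type=int, default=3, help="pre-train epochs (head only)")
parser.add_argument("-e", type=int, default=47, help="fine-tune epochs")
parser.add_argument("-fl", type=int, default=155, help="index of last frozen layer")
parser.add_argument("-nt", type=int, default=5000, help="number of training examples")
parser.add_argument("-nv", type=int, default=1600, help="number of validation examples")
parser.add_argument("--per_class", action="store_true", help="log per-class precision")
args = parser.parse_args()
```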
Per-Class Precision with Inception
[Run set: Per class (4 runs)]
Varying Base Models
Use InceptionV3
Fine-tuning a well-known, high-accuracy convnet (pretrained on ImageNet) is a great strategy for vision tasks, especially for nature photos, which are very similar to ImageNet. Which base network should we choose? (A sketch of parameterizing the base model follows the notes below.)
- InceptionV3 & Inception ResNet V2 (IRV2): same train/val accuracy; Xception: a bit lower
- IRV2 uses double the memory & params for minimal accuracy difference, so let's choose InceptionV3
- val acc only varies by ~10%, so we need more data (this is 5K)
- a Keras version update is needed to instantiate ResNet correctly
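One lightweight way to parameterize which network gets loaded (addressing the TODO below); the mapping is illustrative, not lifted from adv_finetune.py:

```python
# Minimal sketch: select a pretrained base network by name.
from tensorflow.keras import applications

BASE_MODELS = {
    "inception_v3": applications.InceptionV3,
    "inception_resnet_v2": applications.InceptionResNetV2,
    "xception": applications.Xception,
    "resnet50": applications.ResNet50,
}

def load_base(name):
    """Instantiate a pretrained base network without its classifier head."""
    return BASE_MODELS[name](weights="imagenet", include_top=False)

base = load_base("inception_v3")
```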
TODO
- how to set a sensible freeze layer for each model
- epochs not matching expectations
- low val acc on resnet; update Keras
Visualizing the Results of Varying Base Models
[Run set: Base Models (4 runs)]
Vary Freeze Layer and Base Model
Long Runs; Not Much Difference
This combination of hyperparameters is theoretically interesting, but exploring it properly would take a lot of compute; stick with InceptionV3 for now.
Results of Varying Freeze Layer and Base Model
[Run sets: Base Models (5 runs); Vary Base & Freeze Layer (3 runs)]
Tags: Advanced, Computer Vision, Object Detection, Keras, Experiment, CNN, Plots, ImageNet, iNaturalist, Exemplary