Curriculum Learning in Nature

Applying human learning strategies to neural nets on iNaturalist 2017. Made by Stacey Svetlichnaya using Weights & Biases


Could the human strategy of learning by progressively increasing specificity or difficulty, known as curriculum learning, benefit a neural network?

I train a small CNN to identify plant and animal species from photos in the iNaturalist 2017 dataset. To set up for curriculum learning, I filter the dataset to a balanced subset of 5 taxonomic classes and 25 constituent species, then train in two stages: pre-training on the 5 classes (e.g. birds, insects, mammals) and regular training on the 25 species (e.g. blackbirds, bears, butterflies). I test various combinations of learning rates, optimizers, and architectures for curriculum learning. The relevant (highly exploratory!) code is here.
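Concretely, the two-stage setup needs each image labeled at both taxonomic levels. A minimal sketch of that mapping (the class and species names below are illustrative assumptions, not the exact 5-class, 25-species subset used):

```python
# Map each species label to its parent taxonomic class (illustrative subset,
# not the actual split used in the experiments).
SPECIES_TO_CLASS = {
    "red-winged blackbird": "Aves",
    "great horned owl": "Aves",
    "monarch butterfly": "Insecta",
    "honey bee": "Insecta",
    "black bear": "Mammalia",
    "red fox": "Mammalia",
}

# Stage 1 (pre-training) targets: one of the coarse classes.
# Stage 2 (fine-tuning) targets: one of the fine-grained species.
classes = sorted(set(SPECIES_TO_CLASS.values()))
species = sorted(SPECIES_TO_CLASS)

def targets(label):
    """Return (class_index, species_index) for a species label."""
    return classes.index(SPECIES_TO_CLASS[label]), species.index(label)

print(targets("black bear"))  # → (2, 0)
```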

For an intuition on curriculum learning, consider how learning to identify species may be harder in the left scenario below (many different kinds of castilleja) and easier, as well as more generalizable, in the right scenario of a hypothetical curriculum from mammals to bear species.


Preparing a small baseline CNN

Quickly explore layer configuration and batch size

This network is very small and is trained on only 2000 or 5000 total examples across 10 classes; the highest validation accuracy is around 45%. Below, you can activate one or more run groups (by checking the box to the left of the group name) to see the results.
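For reference, a baseline of roughly this scale might look like the following Keras sketch. The layer sizes and input shape are assumptions for illustration; the actual layer config and batch size are exactly what the sweep below varies.

```python
from tensorflow import keras

NUM_CLASSES = 10  # baseline task size, per the text above

def build_small_cnn(num_classes=NUM_CLASSES, input_shape=(32, 32, 3)):
    """A deliberately small CNN; the exact layer config was tuned in the sweep."""
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_small_cnn()
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```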


Results of tuning batch size, layer config, and dropout

Dropout & Optimizers

Too early for dropout; rmsprop > adam

Results of varying dropout and optimizers
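These comparisons can be expressed as a grid sweep in Weights & Biases. A sketch of such a config follows; the specific dropout values and metric name are assumptions, not the exact grid from these runs.

```python
# A grid sweep over dropout and optimizer, in the style of a W&B sweep config.
# Values are illustrative, not the exact grid used in the report.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "dropout": {"values": [0.0, 0.25, 0.5]},
        "optimizer": {"values": ["rmsprop", "adam"]},
    },
}

# Launched via the W&B client, e.g.:
# sweep_id = wandb.sweep(sweep_config, project="curriculum-inat")  # hypothetical project name
```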

Curriculum learning: Pretrain on class, then species

Pre-train on class to try to beat species baseline

Does pretraining to predict one of 5 classes before finetuning on one of 25 species help?

Below, I pretrain the network on the easy task first: predict one of 5 taxonomic classes, for C epochs total. Then I switch to training/finetuning on the harder task: predict one of 25 species, for S epochs total (by reloading the learned weights into a new network with the same architecture). I vary C from 0 (species baseline in red) to 15. All runs in the "switch" condition initially track the class baseline in blue, drop in accuracy substantially when the switch happens, and quickly catch up to the species baseline.
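The switch step, reloading the pretrained weights into a fresh network whose only difference is the output layer, might be sketched as follows. The architecture here is an illustrative assumption; the key point is copying every layer's weights except the final head.

```python
from tensorflow import keras

def build_net(num_outputs, input_shape=(32, 32, 3)):
    """Same body for both stages; only the output layer width differs.
    Layer sizes are illustrative assumptions."""
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_outputs, activation="softmax"),
    ])

# Stage 1: pretrain on the 5 coarse classes for C epochs (fit call omitted).
class_net = build_net(5)

# Stage 2: copy every layer's weights except the final head into a fresh
# 25-way species network, then continue training for S epochs.
species_net = build_net(25)
for src, dst in zip(class_net.layers[:-1], species_net.layers[:-1]):
    dst.set_weights(src.get_weights())
```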

These initial results are mixed. The run with C=5, S=45 reaches a higher training accuracy than the species baseline, with C=3 and C=10 also matching or exceeding it at some points. Validation accuracy is noisier: C=5 is only slightly better, and C=15 tracks the species baseline most closely.

Next steps

Pretrain on Class, switch to Species

Learning rate experiments

SGD and Adam do not beat baseline

Results of learning rate experiments