Could the human strategy of learning by progressively increasing specificity or difficulty, known as curriculum learning, benefit a neural network?
I train a small CNN to identify plant and animal species from photos in the iNaturalist 2017 dataset. I set up for curriculum learning by filtering the dataset to a balanced 5 classes with 25 subclasses each, then training in two stages: pretraining on 5 taxonomic classes (birds, insects, mammals) and regular training on the 25 constituent species (blackbirds, bears, butterflies). I test various combinations of learning rates, optimizers, and architectures for curriculum learning. The relevant (highly exploratory!) code is here.
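As a rough illustration of the filtering step, here is a minimal sketch of building a balanced subset in which every image carries both a coarse (class) label and a fine (species) label. It is not the code from this project: the function name `balanced_subset`, the `(path, class, species)` record format, and the toy counts are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_subset(records, n_classes, species_per_class, per_species, seed=0):
    """Pick a balanced two-level subset from (path, class_name, species_name) records.

    Hypothetical helper: returns a list of (path, class_label, species_label)
    plus the two name -> integer label mappings.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    species_class = {}
    for path, cls, species in records:
        by_species[species].append(path)
        species_class[species] = cls

    # Keep only species with enough images, grouped by their parent class.
    eligible = defaultdict(list)
    for species, paths in by_species.items():
        if len(paths) >= per_species:
            eligible[species_class[species]].append(species)

    # Keep only classes that still have enough eligible species.
    classes = sorted(c for c, sps in eligible.items() if len(sps) >= species_per_class)
    classes = classes[:n_classes]
    if len(classes) < n_classes:
        raise ValueError("not enough classes with enough species and images")

    class_index = {c: i for i, c in enumerate(classes)}
    species_index = {}
    subset = []
    for c in classes:
        for species in sorted(eligible[c])[:species_per_class]:
            species_index[species] = len(species_index)
            # Same number of images for every species keeps both levels balanced.
            for path in rng.sample(by_species[species], per_species):
                subset.append((path, class_index[c], species_index[species]))
    return subset, class_index, species_index


# Toy data (made up): 3 classes, 3 species each, 4 images per species.
toy_records = [
    (f"{cls}/{cls}_sp{s}/img{i}.jpg", cls, f"{cls}_sp{s}")
    for cls in ("birds", "insects", "mammals")
    for s in range(3)
    for i in range(4)
]
subset, class_index, species_index = balanced_subset(
    toy_records, n_classes=2, species_per_class=2, per_species=3
)
```

Because each example keeps both labels, the same images can serve the class-level pretraining stage and the species-level stage without re-filtering.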
For an intuition on curriculum learning, consider how learning to identify the species may be harder in the left scenario below (all different kinds of castilleja) and easier, as well as more generalizable, in the right scenario of a hypothetical curriculum from mammals to bear species.
This network is very small and trained on only 2000 or 5000 total examples of 10 classes. The highest validation accuracy is around 45%. Below you can activate one or more tabs (by checking the box to the left of the group name) to see the results.
Does pretraining to predict one of 5 classes before finetuning on one of 25 species help?
Below, I pretrain the network on the easy task first: predict one of 5 taxonomic classes, for C epochs total. Then I switch to training/finetuning on the harder task, by reloading the learned weights into a new network with the same architecture: predict one of 25 species, for S epochs total. I vary C from 0 (species baseline in red) to 15. All runs in the "switch" condition initially track the class baseline in blue, drop substantially in accuracy when the switch happens, and then quickly catch up to the species baseline.
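The switch described above can be sketched in framework-free Python. This is only an illustration under stated assumptions, not the project's code: the text does not say how the output layer is handled at the switch, so the sketch assumes the final layer has 5 units in the first stage and 25 in the second, and that only layers whose shapes match are copied while the mismatched head keeps its fresh initialization. The layer names, shapes, and the helpers `init_network`, `transfer_weights`, and `curriculum_schedule` are hypothetical.

```python
import random
from math import prod


def init_network(n_outputs, seed):
    """Return randomly initialized weights as {layer_name: {"shape", "values"}}.

    Hypothetical stand-in for a small CNN: two conv layers plus an output head.
    """
    rng = random.Random(seed)
    shapes = {
        "conv1": (3, 3, 3, 16),
        "conv2": (3, 3, 16, 32),
        "head": (32, n_outputs),  # assumption: only this layer depends on the task
    }
    return {
        name: {
            "shape": shape,
            "values": [rng.uniform(-0.1, 0.1) for _ in range(prod(shape))],
        }
        for name, shape in shapes.items()
    }


def transfer_weights(source, target):
    """Copy every layer whose name and shape match; leave the rest untouched."""
    copied = []
    for name, layer in target.items():
        src = source.get(name)
        if src is not None and src["shape"] == layer["shape"]:
            layer["values"] = list(src["values"])
            copied.append(name)
    return copied


def curriculum_schedule(c_epochs, s_epochs):
    """Task to train on at each epoch: C class epochs, then S species epochs."""
    return ["class"] * c_epochs + ["species"] * s_epochs


class_net = init_network(n_outputs=5, seed=0)
# ... train class_net on the 5-way class task for C epochs here ...
species_net = init_network(n_outputs=25, seed=1)
copied = transfer_weights(class_net, species_net)

schedule = curriculum_schedule(c_epochs=5, s_epochs=45)
```

Under this assumption, a freshly initialized output layer at the switch would be one plausible reason for the sharp drop in accuracy, since the species predictions start from random weights on top of the pretrained features. Setting `c_epochs=0` reproduces the species-only baseline schedule.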
These initial results are mixed. The C=5, S=45 run reaches a higher training accuracy than the species baseline, with C=3 and C=10 also matching or exceeding the species baseline at some points. Validation accuracy is noisier: C=5 is only slightly better, with C=15 tracking the species baseline most closely.