Intro to Meta-Learning

Getting started with meta-learning for image classification. Made by Stacey Svetlichnaya using Weights & Biases

Meta-learning is a generalization of machine learning

Rather than solve any problem perfectly, meta-learning seeks to improve the process of learning itself. It's appealing from a cognitive science perspective: humans need way fewer examples than a deep net to understand a pattern, and we can often pick up new skills and habits faster if we're more self-aware and intentional about reaching a certain goal.

Higher accuracy with fewer examples

In regular deep learning, we apply gradient descent over training examples to learn the best parameters for a particular task (like classifying a photo of an animal into one of 5 possible species). In meta-learning, the task itself becomes a training example: we apply a learning algorithm over many tasks to learn the best parameters for a particular problem type (e.g. classification of photos into N classes).

In "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" (MAML) from ICML 2017, the meta-learning algorithm is, elegantly, gradient descent, and it works for any inner model type that is itself trained with gradient descent (hence "model-agnostic"). Finn, Abbeel, and Levine apply this to classification, regression, and reinforcement learning problems and tune a meta-model (the outer model) that can learn quickly (1-10 gradient updates) on a new task with only a few examples (1-5 per class for 2-20 classes). How well does this work in practice, and how can we best apply the meta-model to new datasets?
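To make the two nested loops concrete, here is a minimal sketch of the MAML idea on a toy problem. This is not the paper's code: I assume a one-parameter model y = w * x, tasks of the form y = a * x with a drawn from [2, 4], and a single inner gradient step. The names (sample_task, maml_train, inner_lr, meta_lr) are mine, chosen for illustration. Because the model is a scalar, the derivative through the inner update can be written out by hand instead of relying on automatic differentiation.

```python
import random

def grad(w, xs, ys):
    # d/dw of the mean squared error for the toy model y_hat = w * x.
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def hessian(xs):
    # Second derivative of the same loss with respect to w (constant in w).
    return sum(2 * x * x for x in xs) / len(xs)

def sample_task(rng, k=5):
    # Hypothetical task family: y = a * x, with a different slope per task.
    a = rng.uniform(2.0, 4.0)
    xs_support = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    xs_query = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    support = (xs_support, [a * x for x in xs_support])
    query = (xs_query, [a * x for x in xs_query])
    return support, query

def maml_train(steps=2000, meta_batch=4, inner_lr=0.1, meta_lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0  # meta-parameter: the initialization shared by all tasks
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(meta_batch):
            (xs_s, ys_s), (xs_q, ys_q) = sample_task(rng)
            # Inner loop: one gradient step on the task's support examples.
            w_adapted = w - inner_lr * grad(w, xs_s, ys_s)
            # Outer loop: gradient of the query loss at the adapted weight,
            # taken with respect to the ORIGINAL w. The chain rule through
            # the inner step contributes the (1 - inner_lr * hessian) factor.
            d_adapted_d_w = 1.0 - inner_lr * hessian(xs_s)
            meta_grad += grad(w_adapted, xs_q, ys_q) * d_adapted_d_w
        # Meta-update: plain gradient descent over the batch of tasks.
        w -= meta_lr * meta_grad / meta_batch
    return w
```

Calling maml_train() should return a w close to 3, the middle of the assumed slope range: an initialization from which one gradient step gets close to any task in the family. In the real setting w is the full set of network weights and the chain-rule factor is computed by backpropagating through the inner updates.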

Few-shot classification on mini-ImageNet (MIN)

In this report, I focus on MAML for few-shot image classification, instrumenting the original code for the paper. Below are some examples from the mini-ImageNet (MIN) dataset with my best guesses as to the labels (which could be more specific or more general categories in actuality). This is fairly representative of ImageNet: diversity of images and views of the target object, balanced with mostly center crops and strict, not-always-intuitive definitions (e.g. the "wolf" and "bird" classes could more narrowly intend a particular species).

N-way, K-shot image classification

From the MAML paper: "According to the conventional terminology, K-shot classification tasks use K input/output pairs from each class, for a total of NK data points for N-way classification."

Here are the relevant settings (argument flags) in the provided code:

So, 5-way, 1-shot MIN considers 1 labeled image from each of 5 classes (a total of 5 images). 5-way, 5-shot MIN considers 5 labeled images from each of 5 classes (a total of 25 images). Some example scenarios are shown below. Note how much the diversity of classes in a given N-way task may vary: e.g. different species of similar-looking dogs or the range of visuals used to represent "lipstick" may be much harder to learn.
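As a rough illustration of the counting above, here is a small sketch that samples the labeled examples for one N-way, K-shot task. The function name, the toy dataset, and the file names are placeholders of mine, not taken from the MAML code.

```python
import random

def sample_support_set(images_by_class, n_way, k_shot, seed=None):
    """Return n_way * k_shot (image, label) pairs for one N-way, K-shot task."""
    rng = random.Random(seed)
    # Pick which N classes appear in this task.
    class_names = rng.sample(sorted(images_by_class), n_way)
    support = []
    for label, name in enumerate(class_names):
        # Labels are re-assigned per task as 0..N-1.
        for image in rng.sample(images_by_class[name], k_shot):
            support.append((image, label))
    rng.shuffle(support)
    return support

# Toy stand-in for a dataset: 8 hypothetical classes with 20 image ids each.
toy_dataset = {
    f"class_{c}": [f"class_{c}/img_{i}.jpg" for i in range(20)] for c in range(8)
}
one_shot = sample_support_set(toy_dataset, n_way=5, k_shot=1, seed=0)
five_shot = sample_support_set(toy_dataset, n_way=5, k_shot=5, seed=0)
print(len(one_shot), len(five_shot))  # 5 25
```

Note that the class-to-label mapping is drawn fresh for every task, so the model cannot memorize that a given species is always "class 3".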

Other important flags for training dynamics

Initial observations

Here I compare meta-learning runs with K=1 shot learning (1 example for each class) while varying the number of classes (num_classes), the number of inner gradient updates (num_updates), the effective batch size, and the number of filters learned. All charts are shown with smoothing 0.8.

Use three repos and a gist

Training data setup

For mini-ImageNet (MIN), the data is split into train (64 classes), validation (16 classes), and test (20 classes). Each class contains 600 images, each 84 x 84 pixels. The data-loading code in the main repo randomly picks classes, and then randomly picks the right number of samples per class (K in K-shot learning), from the right split depending on the mode (training, evaluation, or testing). One confusing detail is that the source code increments the inner batch size K by 15 when generating training data, which may affect the correctness of image shuffling. I trained some models with and without this modification to try to isolate its impact and necessity.
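The sampling logic described here can be sketched as follows. This is my own simplification, not the repo's code: it works on integer indices rather than image files, and the names SPLIT_CLASSES, sample_task_indices, and extra_per_class are hypothetical. One plausible reading of the extra 15 images per class is that they serve as held-out examples for the outer (meta) update; I treat that as an assumption here and expose it as a parameter so both variants can be compared.

```python
import random

# Class counts per split and images per class, as stated above.
SPLIT_CLASSES = {"train": 64, "val": 16, "test": 20}
IMAGES_PER_CLASS = 600

def sample_task_indices(mode, n_way, k_shot, extra_per_class=0, seed=None):
    """Pick n_way classes from the split for `mode`, then draw
    k_shot + extra_per_class distinct image indices per class."""
    rng = random.Random(seed)
    classes = rng.sample(range(SPLIT_CLASSES[mode]), n_way)
    per_class = k_shot + extra_per_class
    task = {}
    for class_index in classes:
        picked = rng.sample(range(IMAGES_PER_CLASS), per_class)
        task[class_index] = {
            "support": picked[:k_shot],  # the K labeled examples
            "query": picked[k_shot:],    # assumption: extras held out for the outer update
        }
    return task

# 5-way, 1-shot training task with the extra 15 images per class.
train_task = sample_task_indices("train", n_way=5, k_shot=1, extra_per_class=15, seed=0)
```

With extra_per_class=0 this reduces to plain K-shot sampling, which is the variant without the modification.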

Next experiments


File "/home/stacey/.pyenv/versions/mm/lib/python3.7/site-packages/tensorflow/python/framework/", line 1950, in __init__
    "Cannot create a tensor proto whose content is larger than 2GB.")