
Few Shot Learning

An exploration into few-shot learning
Created on October 31|Last edited on November 28
While a majority of recent research and tooling has been dedicated to large models such as LLMs, ViT, and DALL·E 2, many advances in computer vision have gone unnoticed in the past few years. CNNs are still surprisingly relevant in applied AI: they provide lightweight models for inference at the edge and are economical to train in terms of compute, time, and labeled data. Leveraging the best practices of transfer learning, we explore few-shot learning.

What is Few-Shot Learning?

Few-shot learning is the practice of fine-tuning a generalized base model on a dataset with only a few examples per class. We intentionally under-fit the model on the complete dataset and use the model that generalizes best to the larger dataset. To accomplish this, we move from a training-heavy splitting paradigm to a testing-heavy one.

- Training Set: 70% → 2%
- Validation Set: 20% → 18%
- Test Set: 10% → 50%

With a minimalist approach to training data, dataset imbalances can be exacerbated, leaving some classes with no training examples at all. So instead of percentages, we think of these splits in terms of examples per class.
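The examples-per-class split described above can be sketched as follows. The function name, the `shots_per_class` parameter, and the `val_fraction` default (18/68 ≈ 0.26 of the non-training remainder, matching the 18%/50% split) are illustrative assumptions, not the report's actual code:

```python
import random
from collections import defaultdict

def few_shot_split(samples, shots_per_class, val_fraction=0.26, seed=0):
    """Split (sample, label) pairs so the training set holds a fixed
    number of examples per class rather than a percentage of the data.
    The remainder is divided between validation and test sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in samples:
        by_class[label].append((sample, label))
    train, val, test = [], [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        train.extend(items[:shots_per_class])        # n "shots" per class
        rest = items[shots_per_class:]
        n_val = int(len(rest) * val_fraction)
        val.extend(rest[:n_val])
        test.extend(rest[n_val:])                    # bulk goes to test
    return train, val, test
```

Splitting per class rather than globally guarantees that even rare classes contribute their few shots to training instead of vanishing entirely.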


A Convenient Byproduct

By using few-shot learning, we also end up with far more statistically significant accuracy metrics, since roughly half the data lands in the test set. If a model performs well on 98% of that data, it has a better chance of generalizing to real-world data.
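To make the "more statistically significant" claim concrete, a quick normal-approximation (Wald) confidence interval shows how a larger test split tightens the error bars on a measured accuracy. The dataset sizes below are hypothetical, chosen only to contrast a 10% and a 50% test split:

```python
import math

def accuracy_ci(acc, n, z=1.96):
    """95% normal-approximation (Wald) confidence interval for an
    accuracy measured over n test examples."""
    half = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)

# The same 98% accuracy is a much tighter claim over a 50% test split
# (here 5,000 images) than over a 10% split (1,000 images):
wide = accuracy_ci(0.98, 1000)
tight = accuracy_ci(0.98, 5000)
```

Quintupling the test set shrinks the interval by a factor of √5, which is exactly the payoff of the test-heavy split.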

Determining Boundaries

Our Best Backbone

For this experiment we switch between a ResNet-50 from torchvision, pre-trained on ImageNet, and a ResNeXt-50 pre-trained on ImageNet-21k as laid out in Big Transfer (BiT).

[Chart panel: run set, 380 runs]

Minimum Viable Dataset Size

We must also determine the minimum viable dataset size that can reliably produce a production-worthy classifier. Remarkably, the two backbones are clearly delineated in the scatter plot below; there is an obvious winner.


[Sweep panels: putliaj4 1 (182 runs), putliaj4 2 (0 runs)]

So what? We have learned a set of best practices for getting a pre-trained model to perform well on a toy dataset. In reality, performance on an academic dataset such as CIFAR-10 is not indicative of real-world performance.
The real world is full of dirty, imbalanced data with all sorts of nuanced decision boundaries. For the second half of the experiment, we will try to train classifiers on the dirtiest of dirty-data Kaggle datasets 😲😱😳🤢🤮
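One common first defence against that kind of imbalance is to oversample rare classes during training. A minimal sketch using PyTorch's `WeightedRandomSampler` (the helper name is ours, not from the report):

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    """Weight each example inversely to its class frequency so batches
    drawn with this sampler see classes roughly uniformly."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return WeightedRandomSampler(weights,
                                 num_samples=len(labels),
                                 replacement=True)
```

Passing the result as `sampler=` to a `DataLoader` replaces shuffling; with only a few shots per class, rebalancing at the sampler level is cheaper than collecting more data.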

Not Too Shabby: Real World Dirty Data

[Chart panel: run set, 653 runs]

Analysis by Dataset

[Chart panel: run set, 118 runs]


Confusion Matrices

[Chart panel: run set, 653 runs]
