A couple of years ago, Andrej Karpathy posted a tweet on the most common mistakes people make when training neural nets:
A year later, he followed it up with a comprehensive blog post covering all the steps he takes when building a neural network training pipeline that avoids the aforementioned mistakes (or at least makes them easy to fix). Given the sheer detail and depth of Andrej's post, it is impossible to cover all of its points in a single report. Over the course of a series of reports, I will try to put some of the steps in that recipe into practice and see how each of them impacts the quality of the network we end up with.
Note: You might be wondering where the Al Dente comes from and how on earth it is related to training a neural net. Al Dente pasta is firm to the tooth. It is neither too hard and raw, nor too mushy and soft. Similarly, an Al Dente neural net neither underfits nor overfits. It just works for your data. Which is why having a good recipe is important :wink:.
Neural Net Training is a Leaky Abstraction:
Neural Nets fail Silently:
In short, debugging a neural net is hard. While suffering is unavoidable :sweat:, you can take some measures to make sure that the suffering is worth your while :sweat_smile:.
There are 6 steps in the recipe, each with its own sub-steps. In this report, I will be focusing on the first of them:
One of the most important steps in building a robust pipeline is knowing your data inside and out. This involves completely forgetting about your network code and spending time inspecting (yes, manually) your data. This can be a real challenge, since the "Fast and Furious" researcher in us wants to code up that neural net and watch it train to SOTA. However, by learning the quirks of the data you are working with, its biases, its limitations and its patterns, you can better design your pipeline to draw out the maximum possible juice from it.
One with your data you must become young padawan! Source: starwars.com
In his recipe, Andrej mentions looking out for the following:
Fasten your seatbelts and follow along as we explore some of these points below.
For the purpose of demonstration, I chose the CIFAR-10 dataset, which consists of 10 object classes, namely Airplanes, Automobiles, Birds, Cats, Deer, Dogs, Frogs, Horses, Ships, and Trucks. Each class has 5,000 training examples and 1,000 test examples, which gives 50k training images, 10k test images, and 60k images in all. Each image in the dataset is a color image of resolution $32 \times 32$.
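A quick sanity check on those numbers is to count the labels per class. The snippet below is a minimal sketch rather than the code used for this report: `class_counts` and `is_balanced` are hypothetical helper names, and the label list is a synthetic stand-in for the real CIFAR-10 training labels (which use the integer order 0 = airplane through 9 = truck).

```python
from collections import Counter

# Standard CIFAR-10 label order (index 0 = airplane, ..., 9 = truck)
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def class_counts(labels, class_names=CLASS_NAMES):
    """Return {class_name: count} for a sequence of integer labels."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}

def is_balanced(counts, expected_per_class):
    """True if every class has exactly the expected number of examples."""
    return all(n == expected_per_class for n in counts.values())

# Synthetic stand-in for the training labels: 10 classes x 5,000 examples.
# Replace this with the labels from your own data loader.
train_labels = [i % 10 for i in range(50_000)]
train_counts = class_counts(train_labels)
total_train = sum(train_counts.values())
```

With real labels, a class that comes back with anything other than 5,000 training examples would be the first thing worth investigating.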
Let's now load a few of them up and visualize them to get a sense of the image quality, object variety and so forth.
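One way to pick what to look at is to draw a fixed number of random images from every class, so that each class gets equal screen time. This is a sketch under assumptions: `sample_per_class` is a hypothetical helper, and the labels are synthetic placeholders for the real dataset labels.

```python
import random
from collections import defaultdict

def sample_per_class(labels, k, seed=0):
    """Pick up to k random dataset indices for each class label."""
    rng = random.Random(seed)  # fixed seed so the same grid can be revisited
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    return {label: rng.sample(idxs, min(k, len(idxs)))
            for label, idxs in sorted(by_class.items())}

# Synthetic stand-in labels; swap in the real ones from your loader.
labels = [i % 10 for i in range(200)]
grid = sample_per_class(labels, k=8)
```

Each row of `grid` is a list of indices that can be handed to whatever plotting tool you prefer (for example, one row of image subplots per class). Changing `seed` gives a fresh batch to inspect.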
For each class in the dataset, I first visualized images to see how varied they were and how consistent the quality of the labeling was. I manually inspected several batches of images and identified ones that I felt would be difficult for the model to classify, ones which were incorrectly labeled, and ones where there were multiple objects in the image. Doing this gives me a good sense of how to evaluate my model when it makes mistakes. It also gives me an idea of the cleanliness of the labeling.
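Manual inspection is easier to reuse later if the findings are written down as you go. The following is a minimal sketch of such a log, not the tooling used for this report: `InspectionLog` and the three tag names are hypothetical, chosen to mirror the three kinds of samples described above, and the indices are arbitrary placeholders rather than actual findings.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical tags mirroring the categories described in the text
ISSUES = ("hard", "mislabeled", "multi_object")

@dataclass
class InspectionLog:
    # Maps dataset index -> set of issue tags recorded for that image
    flags: dict = field(default_factory=dict)

    def flag(self, index, issue):
        """Record an issue tag for the image at the given dataset index."""
        if issue not in ISSUES:
            raise ValueError(f"unknown issue: {issue!r}")
        self.flags.setdefault(index, set()).add(issue)

    def summary(self):
        """Count how many images carry each issue tag."""
        return Counter(tag for tags in self.flags.values() for tag in tags)

# Arbitrary indices for illustration only, not real observations
log = InspectionLog()
log.flag(12, "mislabeled")
log.flag(12, "hard")
log.flag(305, "multi_object")
summary = log.summary()
```

Keeping the flagged indices around makes it possible to check, once a model is trained, whether its mistakes line up with the samples you already suspected.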
Just going through 10 classes' worth of images took me several hours, and finding odd samples in the dataset was even more challenging. I can only imagine how many days it must have taken Andrej to go through ImageNet in its entirety. No wonder he is called the Human ImageNet Classifier :sweat_smile:.
Now that we've explored the nuances of the individual classes, let's put our assumptions to the test. Particularly, let's look at a few more things:
Clearly, there's a lot more analysis that can be done and many more outliers can be found. What's more important is that we keep these aspects in mind when designing the model, analyzing its performance and thinking of ways to improve it. Ideally, we should strive to clean the dataset so that there are as few of these outliers as possible (ideally none). In summary, here are some of the things I found manually going through this dataset:
I will leave you to do more exploratory analysis, but going through this process has given me a lot of insight into how to go about the next steps of the recipe. Until the next part, enjoy your al-dente data :wink:.
A Recipe for Training Neural Networks: https://karpathy.github.io/2019/04/25/recipe/
Yes you should understand backprop: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b