The Al-Dente Neural Network: Part I
Training a neural network is easy to learn but takes a lifetime to master. This article explains how to make your own neural network.
A couple of years ago, Andrej Karpathy posted a tweet on the most common mistakes people make when training neural nets:

A year later, he followed it up with a comprehensive blog post covering the steps he takes when building a neural network training pipeline that avoids all the aforementioned mistakes (or at least makes them easy to fix). Given the sheer detail and depth of Andrej's post, it is impossible to cover all of its points in a single report. Over the course of this series, I will put some of the steps in that recipe into practice and see how each of them impacts the quality of the network we end up with.
Note: You might be wondering where "Al Dente" comes from and how on earth it is related to training a neural net. Al dente pasta is firm to the tooth: neither too hard and raw, nor too mushy and soft. Similarly, an al-dente neural net neither underfits nor overfits. It just works for your data. This is why having a good recipe is important 😉.
Table of Contents
- The Premise
- The Recipe
- Become One with the Data
- What's our data anyway?
- Visualizing Some Samples
- Of Duplicates, Noisy Labels, and More
- Pixel Statistics and t-SNE
- Mean Class Images
- Summary
- References
The Premise
- Neural Net Training is a Leaky Abstraction:
- It's easy to code up a neural net thanks to the numerous libraries and frameworks available today.
- However, neural nets aren't a plug-and-play technology and you must understand what happens behind the scenes.
- Neural Nets fail Silently:
- You can have syntactically correct code and still have things fail on you because you put things together in the wrong order.
- You might have performed partial augmentation (transforming the image but not its label) and your network can still appear to train decently (a sketch of this pitfall follows this list).
- You might have initialized a network from a pretrained checkpoint but ignored the original preprocessing (e.g., mean subtraction), and the list goes on and on.
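To make the partial-augmentation pitfall concrete, here is a minimal, hypothetical sketch for a detection-style task (the helper name and bounding-box convention are mine, not from Andrej's post). The silent bug is applying the flip to the image alone; the version below keeps the image and its label in sync:

```python
import numpy as np

def flip_horizontal(image, bbox):
    """Horizontally flip an image AND its bounding-box label.

    image: array of shape (H, W, C)
    bbox:  (x_min, y_min, x_max, y_max) in pixel coordinates

    Flipping only the image is a classic silent failure: training
    still runs and the loss still falls, but every label now points
    at the mirrored location of the object.
    """
    _, w = image.shape[:2]
    x_min, y_min, x_max, y_max = bbox
    flipped_image = image[:, ::-1, :]
    flipped_bbox = (w - x_max, y_min, w - x_min, y_max)
    return flipped_image, flipped_bbox
```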
In short, debugging a neural net is hard. While suffering is unavoidable 😓, you can take some measures to make sure the suffering is worth your while 😅.
The Recipe
There are 6 steps in the recipe, each with its own sub-steps. In this article, I will be focusing on the first of them:
Become One with the Data
One of the most important steps in building a robust pipeline is knowing your data inside and out. This involves completely forgetting about your network code and spending time inspecting your data (yes, manually). Oftentimes this is a real challenge, since the "Fast and Furious" researcher in us wants to code up that neural net and watch it train to SOTA. However, by learning the quirks of the data you are working with, its biases, its limitations, and its patterns, you can better design your pipeline to squeeze the maximum possible juice out of it.

One with your data you must become, young Padawan! Source: starwars.com
In his recipe, Andrej mentions looking out for the following:
- The distribution of the data and its patterns
- Duplicate examples and/or corrupted images/labels
- Imbalances and Biases in the data
- Your own approach for classifying the data
Fasten your seatbelts and follow along as we explore some of these points below.
👉 The colab notebook for this report can be found here 👈
What's our data anyway?
For the purpose of demonstration, I chose the CIFAR-10 dataset, which consists of 10 object classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Each class has 5,000 training examples and 1,000 test examples, which gives 60,000 images in all. Each image in the dataset is a color image of resolution $32 \times 32$.

Let's now load a few of them up and visualize them to get a sense of the image quality, object variety and so forth.
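To keep things concrete, here is one way to load the dataset; this is a minimal sketch using the Keras loader, and the companion notebook may do it differently:

```python
import numpy as np
from tensorflow.keras.datasets import cifar10

# x_train: (50000, 32, 32, 3) uint8 images, y_train: (50000, 1) integer labels
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
```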
Visualizing Some Samples
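A quick way to eyeball the data is a grid with a few random samples per class, sketched here with matplotlib (assuming the arrays loaded above):

```python
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
n_samples = 8  # images shown per class

fig, axes = plt.subplots(10, n_samples, figsize=(n_samples, 10))
for row, name in enumerate(class_names):
    class_idx = np.where(y_train.flatten() == row)[0]
    for col, i in enumerate(rng.choice(class_idx, n_samples, replace=False)):
        ax = axes[row, col]
        ax.imshow(x_train[i])
        ax.set_xticks([])
        ax.set_yticks([])
    # Label each row of the grid with its class name
    axes[row, 0].set_ylabel(name, rotation=0, ha="right", fontsize=8)
plt.tight_layout()
plt.show()
```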
[Panel: a grid of random sample images from each of the 10 classes]
From the images above, we can see that each class contains images with a wide variety of poses, lighting, sizes, and environments. Since the images are only $32 \times 32$, there isn't a high level of detail we could use to distinguish corner cases. But the recipe wasn't written in such painstaking detail for us to stop at cursory conclusions. Let's take a few samples from each of the 10 classes and try to find out whether they contain mislabeled examples, difficult cases, noisy data, and more.
Of Duplicates, Noisy Labels, and More
For each class in the dataset, I first visualized images to see how varied they were and how consistent the quality of the labeling was. I manually inspected several batches of images and identified ones that I felt would be difficult for the model to classify, ones which were incorrectly labeled, and ones where there were multiple objects in the image. Doing this gives me a good sense of how to evaluate my model when it makes mistakes. It also gives me an idea of the cleanliness of the labeling.
Just going through 10 classes' worth of images took me several hours, and finding odd samples in the dataset was even more challenging. I can only imagine how many days it would have taken Andrej to go through ImageNet in its entirety. No wonder he is called the Human ImageNet Classifier 😅.
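Manual inspection doesn't scale, and exact duplicates in particular are easy to catch programmatically. As a starting point, here is a quick, hypothetical sketch (assuming the x_train array from earlier) that hashes the raw pixel bytes; note it only catches byte-identical copies, not near-duplicates:

```python
import hashlib

def find_exact_duplicates(images):
    """Return (first_index, duplicate_index) pairs of byte-identical images.

    Near-duplicates (re-crops, re-encodes) would instead need perceptual
    hashing or nearest-neighbor search in a feature space.
    """
    seen = {}
    duplicates = []
    for idx, img in enumerate(images):
        key = hashlib.md5(img.tobytes()).hexdigest()
        if key in seen:
            duplicates.append((seen[key], idx))
        else:
            seen[key] = idx
    return duplicates

print(f"Exact duplicate pairs: {len(find_exact_duplicates(x_train))}")
```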
Airplanes
Below you can see a random subset of images from the airplane class. There is a huge variety in the types of airplanes, from older propeller planes to newer stealth aircraft. Further, the colors, sizes, locations, and poses of the objects vary widely. Some airplanes are flying in a sunset sky while others are in clear blue skies; some are on the tarmac while others are toy models. Given how often airplanes appear against blue backgrounds, it will be interesting to see how our model does when it gets a sample of a ship in the ocean 😉
[Panel: a random subset of images from the airplane class]
Pixel Statistics and t-SNE
Now that we've explored the nuances of the individual classes, let's put our assumptions to the test. Particularly, let's look at a few more things:
- What would an average image of a given class look like?
- What do the color distributions of similar classes look like?
- Finally, what if we clustered our images with t-SNE? Do they form well-separated clusters?
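For the clustering question, here is a minimal sketch using scikit-learn's t-SNE on raw pixels (assuming the x_train and y_train arrays from earlier; the companion notebook may do this differently):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# t-SNE is expensive, so embed a random subsample rather than
# all 50,000 training images.
n = 2000
idx = np.random.RandomState(0).choice(len(x_train), n, replace=False)
pixels = x_train[idx].reshape(n, -1) / 255.0  # flatten to 3072-dim vectors

embedding = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(pixels)

plt.scatter(embedding[:, 0], embedding[:, 1],
            c=y_train.flatten()[idx], cmap="tab10", s=4)
plt.colorbar(label="class index")
plt.show()
```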
Mean Class Images
Let's compute the mean image for each class in the training set. This will give us a rough idea of the most popular colors, the average pose of the objects, and perhaps a vague shape of the object that the model will see over the course of training.
For this I wrote a simple function shown here:
```python
import numpy as np

def compute_mean_image(images, opt="mean"):
    """Compute and return the mean (or median) image
    given a batch of images.

    images: batch of shape (N x W x H x C)
    """
    images = images / 255.0
    if opt == "mean":
        return np.mean(images, axis=0, dtype=np.float64)
    return np.median(images, axis=0)
```
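Applying it per class might look like this (assuming the x_train, y_train, and class_names variables from the loading sketch above):

```python
mean_images = {
    name: compute_mean_image(x_train[y_train.flatten() == i], opt="median")
    for i, name in enumerate(class_names)
}
```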
While experimenting, I found that the `median` yielded a subtly sharper result than the `mean`, at least visually. Below are the mean images for each of the classes.
[Panel: the mean/median image for each of the 10 classes]
Summary
Clearly, there's a lot more analysis that could be done, and many more outliers could be found. What's more important is that we keep these aspects in mind when designing the model, analyzing its performance, and thinking about ways to improve it. Ideally, we should strive to clean the dataset so that there are as few of these outliers as possible (ideally none). In summary, here are some of the things I found by manually going through this dataset:
- It has balanced classes (5,000 training and 1,000 test samples per class)
- However, there are mislabeled examples in almost all of the classes
- There are samples that have been squished or padded to fit the dataset dimensions
- There is inconsistency in the labeling; for example, a van is labeled both as a car and as a truck
- Some of the images are really grainy, and it's pretty hard to identify the object in question
- There are images where multiple objects are present, e.g., a human and a deer, a human and a truck, or multiple cars
- The samples for some classes have been drawn from sources like posters, toys, and mock-ups, which will make things challenging for our model
- For some of the classes, color might bias our model and force it to lean one way. For example, ships and airplanes are both often seen against blue backgrounds, the dataset contains quite a few red cars and trucks, and deer and horses frequently appear against green backgrounds (a quick way to check this is sketched below)
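As a quick sanity check on that last point, here is a hypothetical snippet (again assuming x_train, y_train, and class_names from earlier) that prints the average RGB value per class:

```python
# Per-class mean RGB: a rough proxy for the color-bias hunch above
# (e.g., blue-heavy airplane/ship classes, green-heavy deer/horse).
for i, name in enumerate(class_names):
    mean_rgb = x_train[y_train.flatten() == i].mean(axis=(0, 1, 2))
    print(f"{name:>10}: R={mean_rgb[0]:6.1f}  G={mean_rgb[1]:6.1f}  B={mean_rgb[2]:6.1f}")
```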
I will leave you to do more exploratory analysis, but going through this process has given me a lot of insight into how to go about the next steps of the recipe. Until the next part, enjoy your al-dente data 😉.
References
- A Recipe for Training Neural Networks: http://karpathy.github.io/2019/04/25/recipe/
- Yes you should understand backprop: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b