Hyperparameters of a Simple CNN Trained on Fashion MNIST
This article explores various hyperparameters of a convolutional neural network (CNN) trained on Fashion MNIST to identify 10 types of clothing
In this project, we explore some hyperparameters of a simple convolutional neural network (CNN) to build intuitions for a manageable example.
The dataset is Fashion MNIST: 60,000 images of 10 classes of clothing (dress, shirt, sneaker, etc.).
Table of Contents
- Varying Basic Hyperparameters: Batch Size, Dropout, Learning Rate
- Results of Varying Hyperparameters
- Varying Layer Size
- Results of Varying Layer Size
- Combinations of Layer Sizes
- Results of Different Layer Sizes
- Hyperparameters More Generally
- Parallel Coordinates Chart
- Evaluating Specific Examples
- Logged Example Predictions from Different Runs
Findings so far
- varying most of the straightforward hyperparameters (like batch size, dropout, and learning rate) doesn't have a strong effect on validation accuracy.
- the most promising direction is increasing layer sizes (validation accuracy increases by about 1%) and exploring different ratios of consecutive layer sizes (perhaps building up to more complex architectures).
- some of the classes are harder for a human to distinguish than others, so investigating class-specific accuracy may prove useful.

Varying Basic Hyperparameters: Batch Size, Dropout, Learning Rate
I train a small convolutional network (2 convolutional layers with max pooling followed by dropout and a fully-connected layer) on Fashion MNIST.
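For reference, here is a minimal Keras sketch of such an architecture. The filter counts, kernel size, dropout rate, and SGD optimizer are illustrative assumptions; only the overall shape (two convolutional layers with max pooling, dropout, and a fully-connected layer) comes from the description above.

```python
# Minimal sketch of the baseline architecture; filter counts, kernel
# size, dropout rate, and optimizer choice are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(conv1=32, conv2=64, hidden=128, dropout=0.25):
    model = models.Sequential([
        layers.Conv2D(conv1, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(conv2, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(dropout),
        layers.Flatten(),
        layers.Dense(hidden, activation="relu"),
        layers.Dense(10, activation="softmax"),  # 10 clothing classes
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```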
The baseline accuracy is already impressive: 0.9366 training / 0.9146 validation (the gap suggests slight overfitting). What happens as we increase dropout, vary the batch size, and change the learning rate? You can see the effect of each hyperparameter on train/val accuracy by selecting one of the three tabs in the "Results of Varying Hyperparameters" section below.
- Batch size: No significant effect; the default of 32 performs well.
- Dropout: Increasing dropout predictably decreases training accuracy without improving validation accuracy.
- Learning rate: The baseline of 0.01, and lower values generally, perform better. Setting the learning rate too high (0.1) leads to sudden divergence.
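A hypothetical W&B sweep over these three hyperparameters might look like the following; the value grids, project name, and `train` function are assumptions, not the report's actual configuration.

```python
# Hypothetical sweep config; the value grids are illustrative, not the
# exact settings behind the charts below.
import wandb

sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64, 128]},
        "dropout": {"values": [0.1, 0.25, 0.4, 0.5, 0.6, 0.75]},
        "learning_rate": {"values": [0.001, 0.003, 0.01, 0.03, 0.1]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="fashion-mnist-cnn")
# wandb.agent(sweep_id, function=train)  # train() builds and fits the model
```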
Results of Varying Hyperparameters
[Interactive panels, one tab per hyperparameter: Batch size (4 runs), Dropout (6 runs), Learning rate (5 runs)]
Varying Layer Size
What happens if we vary the sizes of the three layers (two convolutional, one fully-connected)? You can select the tabs in the following section to see the results.
Hidden (fc) Layer Size
Increasing the size of the penultimate fully-connected layer leads to lower training loss and slightly faster learning but doesn't significantly affect the validation accuracy (although a size of 512 performs well and may be worth exploring).
Layers 1 & 2
Increasing both layers gives the model more predictive capacity (more parameters) and increases validation accuracy.
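As a rough illustration of the capacity claim, using the hypothetical build_model sketch above (the widths are arbitrary):

```python
# Parameter count grows quickly as the conv layer widths increase.
for conv1, conv2 in [(16, 32), (32, 64), (64, 128)]:
    model = build_model(conv1=conv1, conv2=conv2)
    print(f"conv1={conv1}, conv2={conv2}: {model.count_params():,} params")
```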
Results of Varying Layer Size
[Interactive panels: Hidden layer size (5 runs), Layers 1 & 2 (7 runs)]
Combinations of Layer Sizes
By increasing all the layers while maintaining their relative sizes, the validation accuracy goes up by about 1% from baseline.
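With the hypothetical build_model sketch above, this kind of scaling widens the whole network while keeping the ratios between layers fixed (the baseline widths and factors are illustrative):

```python
# Scale all three layer widths by a common factor, preserving their
# relative sizes; assumes x_train, y_train, x_val, y_val are in scope.
for scale in (1, 2, 4):
    model = build_model(conv1=32 * scale, conv2=64 * scale, hidden=128 * scale)
    # model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
```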
Next Steps
- consider class-specific accuracy: is the model better at identifying certain items of clothing? are other items particularly problematic? (see the per-class accuracy sketch after this list)
- explore the learning optimizer space (settings for optimizer, learning rate, decay, and momentum)
- broader architecture search: number and kinds of layers, kernel size, etc.
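As a sketch of the first item, per-class accuracy can be read off a confusion matrix (assuming a trained `model` and integer-labeled validation arrays `x_val`, `y_val`):

```python
# Per-class validation accuracy via a confusion matrix.
import numpy as np

preds = np.argmax(model.predict(x_val), axis=1)
confusion = np.zeros((10, 10), dtype=int)
for true, pred in zip(y_val, preds):
    confusion[true, pred] += 1
per_class_accuracy = confusion.diagonal() / confusion.sum(axis=1)
print(per_class_accuracy.round(3))  # one accuracy per clothing class
```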
Results of Different Layer Sizes
[Interactive panel: Layer size variations (6 runs)]
Hyperparameters More Generally
Below you can see a parallel coordinates chart that shows correlations between hyperparameters and an output metric of choice. In this case, I'm using validation accuracy, which you can see in the colorful column on the right. Most of the experiments so far have a high validation accuracy of around 0.9. Some hyperparameters like dropout, batch size, and layer 2 size have been sampled more extensively and do not seem to have a strong effect on performance. Others, like momentum, have not been varied and could be promising candidates for further experimentation.
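For hyperparameters to appear as columns in this chart, each run logs them to its config; a minimal sketch (the names and values here are illustrative):

```python
# Hyperparameters in wandb.config become columns in the parallel
# coordinates chart; logged metrics (val_accuracy) become the output axis.
import wandb

run = wandb.init(project="fashion-mnist-cnn",
                 config={"batch_size": 32, "dropout": 0.25,
                         "learning_rate": 0.01, "hidden": 128})
# ... training loop, logging metrics each epoch ...
wandb.log({"val_accuracy": 0.915})  # placeholder value
run.finish()
```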
Parallel Coordinates Chart
[Parallel coordinates chart: all 33 runs]
Evaluating Specific Examples
In this view, we can browse through predictions on specific examples from different runs. One common misclassification you'll notice as you browse is between bags and shirts, e.g. when a bag handle resembles a neckline. Even a human can have trouble with these.
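A minimal sketch of how such example predictions can be logged (assumes `x_val`, `y_val`, `preds`, and a `class_names` list are in scope):

```python
# Log a handful of validation images with predicted vs. true labels.
import wandb

examples = [
    wandb.Image(x_val[i],
                caption=f"pred: {class_names[preds[i]]} / true: {class_names[y_val[i]]}")
    for i in range(32)
]
wandb.log({"example_predictions": examples})
```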

Logged Example Predictions from Different Runs
[Example prediction panels: all 33 runs]