In this project I explore the hyperparameters of a simple convolutional neural network (CNN) to build intuition on a manageable example. The dataset is Fashion MNIST: 60,000 grayscale images spanning 10 classes of clothing (dress, shirt, sneaker, etc.).
I train a small convolutional network (two convolutional layers with max pooling, followed by dropout and a fully-connected layer) on this dataset.
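The architecture described above can be sketched in Keras as follows. This is a minimal sketch, not the exact model from the experiments: the filter counts, dense size, and default dropout rate are illustrative assumptions.

```python
# Sketch of a small CNN for Fashion MNIST (28x28 grayscale, 10 classes).
# Layer sizes and defaults below are assumptions, not the exact baseline.
import tensorflow as tf

def build_model(conv1=32, conv2=64, dense=128, dropout=0.25, lr=1e-3):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(conv1, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(conv2, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(dense, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Exposing the layer sizes, dropout rate, and learning rate as arguments makes it easy to rebuild the model for each experiment below.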
The baseline accuracy is already impressive: 0.9366 training / 0.9146 validation, suggesting slight overfitting. What happens as we increase dropout, vary batch size, and change the learning rate? You can see the effect of these hyperparameters on train/val accuracy by selecting one of the three tabs in the "Results of varying hyperparameters" section below.
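One simple way to organize these experiments is a grid over the three hyperparameters. The sketch below shows the idea; the specific grid values are assumptions for illustration, not the values actually swept.

```python
# Hypothetical grid over the three hyperparameters varied in this section.
# The candidate values are illustrative assumptions.
from itertools import product

dropouts = [0.1, 0.25, 0.4]
batch_sizes = [32, 64, 128]
learning_rates = [1e-2, 1e-3, 1e-4]

configs = [
    {"dropout": d, "batch_size": b, "lr": lr}
    for d, b, lr in product(dropouts, batch_sizes, learning_rates)
]
# Each config would drive one training run, e.g.:
#   for cfg in configs:
#       run_experiment(**cfg)   # hypothetical training helper
print(len(configs))  # 27 runs for a full 3x3x3 grid
```

In practice one often varies a single hyperparameter at a time (as in the tabs above) rather than running the full grid, which keeps the comparisons easy to read.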
What happens if we vary the sizes of the three layers (two convolutional, one fully-connected)? You can enable the tabs in the following section to see the results.
Increasing the size of the penultimate fully-connected layer leads to lower training loss and slightly faster learning but doesn't significantly affect the validation accuracy (although a size of 512 performs well and may be worth exploring).
Increasing the sizes of both convolutional layers gives the model more parameters, and thus more predictive capacity, and increases validation accuracy.
Increasing all three layer sizes while maintaining their relative proportions raises validation accuracy by about 1% over the baseline.
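Scaling all layers while keeping their ratios fixed amounts to multiplying each size by a common factor. A tiny sketch, where the baseline sizes are illustrative assumptions:

```python
# Scale all three layer sizes by a common factor, preserving their ratios.
# The baseline sizes (32, 64, 128) are assumptions, not the exact model.
def scaled_sizes(base=(32, 64, 128), factor=2):
    return tuple(int(s * factor) for s in base)

print(scaled_sizes(factor=2))  # (64, 128, 256)
print(scaled_sizes(factor=4))  # (128, 256, 512)
```

The resulting tuples can be passed straight into a model-building function that accepts the three layer sizes as arguments.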
Below you can see a parallel coordinates chart that shows correlations between hyperparameters and an output metric of choice. In this case, I'm using validation accuracy, which you can see in the colorful column on the right. Most of the experiments so far have a high validation accuracy around 0.9. Some hyperparameters like dropout, batch size, and layer 2 size have been sampled more extensively and do not seem to have a strong effect on performance. Others, like momentum, have not been varied and could be promising candidates for further experimentation.
In this view we can browse through predictions on specific examples from different runs. One common misclassification you'll notice as you browse these examples is between bags and shirts, e.g. because a bag handle resembles a neckline. Even a human can have trouble with these: