
CIFAR10 classification with ResNet and a simple convnet.

This is a short report for the Stepik Computer Vision course. Two general model architectures, a typical ConvNet with max-pooling and a ResNet, were applied to the classification task on the CIFAR10 dataset. The deep but narrow ResNet20 was compared with the much larger ResNet18 designed for the ImageNet task. For the ResNet20 model, a hyperparameter search was conducted over 1) the dropout rate of the two convolutional layers in each ResNet block (0 < p < 0.2) and 2) the weight decay of the Adam optimizer (1e-5 < wd < 5e-3).

ResNet20 (270k parameters) vs ResNet18 (11,690k parameters, outputs 1000 classes) vs CIFARnet (1,282k parameters)

The deep but narrow ResNet20 was compared with the much larger ResNet18 designed for the ImageNet task and with a modification of the LeNet architecture (using max-pooling), here called CIFARnet. Even though CIFARnet contains roughly 5x as many parameters as ResNet20, it computes only about 1/8 of the FLOPs of ResNet18 on a 3x32x32 input (5.28 MMAC vs 41.12 MMAC), which may partly explain its better performance once the estimation errors of both networks are reduced with regularization.
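The parameter and MAC counts above can be reproduced with a few lines of PyTorch. The snippet below is a sketch under the assumption that a FLOPs counter such as ptflops was used; the report does not state which tool produced the 5.28/41.12 MMAC figures.

```python
# Sketch: counting parameters and multiply-accumulates (MACs) on a CIFAR10-sized
# input. ptflops is assumed here; the report does not say which counter was used.
import torchvision.models as models
from ptflops import get_model_complexity_info

resnet18 = models.resnet18()  # stock ImageNet ResNet18, 1000 output classes

# Parameter count (~11,690k for ResNet18)
n_params = sum(p.numel() for p in resnet18.parameters())
print(f"ResNet18 parameters: {n_params / 1e3:.0f}k")

# MACs for a 3x32x32 input (reported above as ~41.12 MMAC)
macs, params = get_model_complexity_info(
    resnet18, (3, 32, 32), as_strings=True, print_per_layer_stat=False
)
print(f"ResNet18 on 3x32x32: {macs}, {params}")
```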




[Two chart panels: metric vs. training step (up to ~30k steps); run set of 3 runs]


Dropout2d tuning for ResNet blocks

We can observe that even a very modest dropout rate of 0.03 successfully prevents overfitting (early stopping was not triggered for any of the runs with Dropout2d enabled). This effect was not nearly as pronounced for the traditional convolutional architecture of CIFARnet (see below) and may be related to the high depth-to-width ratio of the network.
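For illustration, a residual block with per-convolution Dropout2d might look like the sketch below; the layer names and ordering are assumptions, and the actual implementation is in CIFARnet.py in the linked repository.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block with Dropout2d after each convolution (illustrative sketch;
    see CIFARnet.py in the repository for the actual implementation)."""

    def __init__(self, channels: int, p_drop: float = 0.03):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.drop1 = nn.Dropout2d(p_drop)  # drops whole feature maps, not single units
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop2 = nn.Dropout2d(p_drop)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.drop1(self.relu(self.bn1(self.conv1(x))))
        out = self.drop2(self.bn2(self.conv2(out)))
        return self.relu(out + x)  # identity shortcut
```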




[Charts: Dropout2d rate sweep; run set of 8 runs]


Tuning lambda for L2 regularization (weight decay); optimizer: Adam.
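A minimal sketch of how such a weight-decay sweep can be configured with Adam is shown below; the specific grid values and the learning rate are illustrative assumptions, as the report only fixes the 1e-5 to 5e-3 range. Note that Adam's weight_decay argument implements the classic L2 penalty added to the gradients (not the decoupled decay of AdamW).

```python
import torch

# Hypothetical weight-decay grid inside the stated 1e-5..5e-3 range;
# the actual values tried in the sweep are not listed in the report.
weight_decays = [1e-5, 1e-4, 5e-4, 1e-3, 5e-3]

def make_optimizer(model: torch.nn.Module, wd: float) -> torch.optim.Optimizer:
    # lr is a placeholder; Adam's weight_decay adds an L2 penalty to the gradients.
    return torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=wd)
```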




[Charts: weight-decay sweep; run set of 4 runs]


Comparison of CIFARnet and ResNet after applying small dropout

ResNet seems to gain more from the dropout (a gain of 0.07 vs 0.01 in validation accuracy for ResNet and CIFARnet respectively), which may simply be a result of the larger number of stochastically 'dropped' activations due to the larger number of dropout layers (18 vs 3).




[Charts: CIFARnet vs ResNet with small dropout; run set of 5 runs]


Comparison of ResNet110 training efficiency with and without BatchNorm2d

Without BatchNorm2d, i.e. channel-wise normalization of intermediate activations across the images in a batch, we could not train ResNet110 with its 54 ResNet blocks. With BN2d the network approached its plateau (judging visually from the validation-loss-vs-steps curve) at around 25k steps, which is only slightly later than the much shallower ResNet20 (~20k steps with the same regularization and learning rate).
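One way to express the with/without-BatchNorm2d variants is a simple flag that swaps the normalization layer for nn.Identity; the conv_bn_relu helper and its use_bn flag below are hypothetical illustrations, not the code used in the runs.

```python
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int, use_bn: bool = True) -> nn.Sequential:
    """Conv -> (BatchNorm2d | Identity) -> ReLU stage; use_bn=False gives the
    'without BatchNorm2d' variant. Hypothetical helper for illustration."""
    norm = nn.BatchNorm2d(out_ch) if use_bn else nn.Identity()
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=not use_bn),
        norm,
        nn.ReLU(inplace=True),
    )
```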




[Charts: ResNet110 with vs without BatchNorm2d; run set of 2 runs]


Comparing ResNet110 with its denser, shallower counterpart.

MyResNet adds a fourth bundle of ResNet blocks to the architecture: the number of blocks per bundle is (5, 7, 7, 5) and the number of channels is (32, 64, 128, 256). Additionally, several data augmentation techniques are used when training these networks. The MyResNet architecture has more parameters than ResNet110 despite having only 50 layers.
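The report does not list the exact augmentation techniques; a typical CIFAR10 training pipeline, assumed here purely for illustration, combines random crops with padding and horizontal flips:

```python
import torchvision.transforms as T

# Assumed augmentation pipeline for CIFAR10 training; the exact set of
# techniques used in these runs may differ (see the linked notebook).
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),       # random shifts via zero-padding + crop
    T.RandomHorizontalFlip(),          # left-right flips
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465),   # commonly used CIFAR10 channel means
                (0.2470, 0.2435, 0.2616)),  # and standard deviations
])
```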




[Charts: ResNet110 vs MyResNet; run set of 4 runs]


Link to the code: https://github.com/culpritgene/fineopia/blob/master/LaunchPad.ipynb (see the CIFARnet.py file for the exact implementation of each model).