
CIFAR10 classification with ResNet and a simple convnet.

This is a short report for the Stepik Computer Vision course. Two general model architectures, a typical ConvNet with max-pooling and a ResNet, were applied to the classification task on the CIFAR10 dataset. The deep but narrow ResNet20 was compared with the much larger ResNet18 designed for the ImageNet task. For the ResNet20 model, a hyperparameter search was conducted over 1) the dropout rate of the two convolutional layers in each ResNet block (0 < p < 0.2) and 2) the weight decay of the Adam optimizer (1e-5 < wd < 5e-3).

ResNet20 (270k parameters) vs ResNet18 (11,690k parameters, outputs 1000 classes) vs CIFARnet (1,282k parameters)

The deep but narrow ResNet20 was compared with the much larger ResNet18 designed for the ImageNet task and with a modification of the LeNet architecture (using max-pooling), here called CIFARnet. Even though CIFARnet contains roughly 5x as many parameters as ResNet20, it computes only about 1/8 of the FLOPs of ResNet18 on a 3x32x32 input (5.28 MMAC vs 41.12 MMAC), which may partly explain its better performance once the estimation errors of both networks are reduced with regularization.
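The parameter and MAC counts above can be reproduced with a few lines of PyTorch. The snippet below is a sketch under the assumption that a FLOPs counter such as ptflops was used; the report does not state which tool produced the 5.28/41.12 MMAC figures.

```python
# Sketch: counting parameters and multiply-accumulates (MACs) on a CIFAR10-sized
# input. ptflops is assumed here; the report does not say which counter was used.
import torchvision.models as models
from ptflops import get_model_complexity_info

resnet18 = models.resnet18()  # stock ImageNet ResNet18, 1000 output classes

# Parameter count (~11,690k for ResNet18)
n_params = sum(p.numel() for p in resnet18.parameters())
print(f"ResNet18 parameters: {n_params / 1e3:.0f}k")

# MACs for a 3x32x32 input (reported above as ~41.12 MMAC)
macs, params = get_model_complexity_info(
    resnet18, (3, 32, 32), as_strings=True, print_per_layer_stat=False
)
print(f"ResNet18 on 3x32x32: {macs}, {params}")
```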




[Two chart panels: metric vs. training step (up to ~30k steps); run set of 3 runs]


Dropout2d tuning for ResNet blocks

We can observe that even a very modest dropout rate of 0.03 successfully prevents overfitting (early stopping was not triggered for any of the runs with Dropout2d enabled). This effect was not nearly as pronounced for the traditional convolutional architecture of CIFARnet (see below) and may be related to the high depth-to-width ratio of the network.
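For illustration, a residual block with per-convolution Dropout2d might look like the sketch below; the layer names and ordering are assumptions, and the actual implementation is in CIFARnet.py in the linked repository.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block with Dropout2d after each convolution (illustrative sketch;
    see CIFARnet.py in the repository for the actual implementation)."""

    def __init__(self, channels: int, p_drop: float = 0.03):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.drop1 = nn.Dropout2d(p_drop)  # drops whole feature maps, not single units
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop2 = nn.Dropout2d(p_drop)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.drop1(self.relu(self.bn1(self.conv1(x))))
        out = self.drop2(self.bn2(self.conv2(out)))
        return self.relu(out + x)  # identity shortcut
```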




[Charts: Dropout2d rate sweep; run set of 8 runs]


Tuning lambda for L2 regularization (weight decay); optimizer: Adam.
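A minimal sketch of how such a weight-decay sweep can be configured with Adam is shown below; the specific grid values and the learning rate are illustrative assumptions, as the report only fixes the 1e-5 to 5e-3 range. Note that Adam's weight_decay argument implements the classic L2 penalty added to the gradients (not the decoupled decay of AdamW).

```python
import torch

# Hypothetical weight-decay grid inside the stated 1e-5..5e-3 range;
# the actual values tried in the sweep are not listed in the report.
weight_decays = [1e-5, 1e-4, 5e-4, 1e-3, 5e-3]

def make_optimizer(model: torch.nn.Module, wd: float) -> torch.optim.Optimizer:
    # lr is a placeholder; Adam's weight_decay adds an L2 penalty to the gradients.
    return torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=wd)
```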




[Charts: weight-decay sweep; run set of 4 runs]


Comparison of CIFARnet and ResNet after applying small dropout

ResNet seems to gain more from the dropout (a gain of 0.07 vs 0.01 in validation accuracy for ResNet and CIFARnet respectively), which may simply be a result of the larger number of stochastically 'dropped' activations due to the larger number of dropout layers (18 vs 3).




[Charts: CIFARnet vs ResNet with small dropout; run set of 5 runs]


Comparison of ResNet110 training efficiency with and without BatchNorm2d

Without BatchNorm2d, i.e. channel-wise normalization of intermediate activations across the images in a batch, we could not train ResNet110 with its 54 ResNet blocks. With BN2d the network approached its plateau (judging visually from the validation-loss-vs-steps curve) at around 25k steps, which is only slightly later than the much shallower ResNet20 (~20k steps with the same regularization and learning rate).
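One way to express the with/without-BatchNorm2d variants is a simple flag that swaps the normalization layer for nn.Identity; the conv_bn_relu helper and its use_bn flag below are hypothetical illustrations, not the code used in the runs.

```python
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int, use_bn: bool = True) -> nn.Sequential:
    """Conv -> (BatchNorm2d | Identity) -> ReLU stage; use_bn=False gives the
    'without BatchNorm2d' variant. Hypothetical helper for illustration."""
    norm = nn.BatchNorm2d(out_ch) if use_bn else nn.Identity()
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=not use_bn),
        norm,
        nn.ReLU(inplace=True),
    )
```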




[Charts: ResNet110 with vs without BatchNorm2d; run set of 2 runs]


Comparing ResNet110 with its denser, shallower counterpart.

MyResNet adds a fourth bundle of ResNet blocks to the architecture: the number of blocks per bundle is (5, 7, 7, 5) and the number of channels is (32, 64, 128, 256). Additionally, several data augmentation techniques are used when training these networks. The MyResNet architecture has more parameters than ResNet110 despite having only 50 layers.
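The report does not list the exact augmentation techniques; a typical CIFAR10 training pipeline, assumed here purely for illustration, combines random crops with padding and horizontal flips:

```python
import torchvision.transforms as T

# Assumed augmentation pipeline for CIFAR10 training; the exact set of
# techniques used in these runs may differ (see the linked notebook).
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),       # random shifts via zero-padding + crop
    T.RandomHorizontalFlip(),          # left-right flips
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465),   # commonly used CIFAR10 channel means
                (0.2470, 0.2435, 0.2616)),  # and standard deviations
])
```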




[Charts: ResNet110 vs MyResNet; run set of 4 runs]


Link to the code: https://github.com/culpritgene/fineopia/blob/master/LaunchPad.ipynb (see the CIFARnet.py file for the exact implementation of each model).