
Boost performance: Achieve Better Results, Faster - 25% Fewer Epochs

A comparison between a vanilla ResNet-18 trained on CIFAR-10 and the same ResNet-18 trained with three additional, easy-to-use Composer functions.
I implemented a slightly modified ResNet-18 on the CIFAR-10 dataset, which reached 93.72% accuracy after training for 60 epochs. By incorporating three additional Composer library functions, the enhanced ResNet-18 reached a higher accuracy of 93.94% at the 43rd epoch. Notably, these improvements saved both compute and overall training time.

Section 1


[Charts: three panels comparing the two runs in the run set against training step (y-ranges roughly 0.4–1.6, 40–80, and 92.5–95).]


Composer Functions

The three Composer functions I used while training the model:
  1. Label Smoothing
  2. RandAugment
  3. MixUp

Label Smoothing

Label smoothing was proposed by Szegedy et al. in this paper. It acts as a regularization technique, and Composer makes it very easy to use.
How to use it?
import composer.functional as cf

for X, y in train_loader:
    y_hat = model(X)
    # note: if you were to modify the variable y here, it is a good
    # idea to set y back to the original targets after computing the loss
    smoothed_targets = cf.smooth_labels(y_hat, y, smoothing=0.1)
    loss = loss_fn(y_hat, smoothed_targets)
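
For intuition, label smoothing replaces the hard one-hot targets with a blend of the one-hot vector and a uniform distribution over the classes. The sketch below only illustrates that idea; it is not Composer's exact implementation:

import torch
import torch.nn.functional as F

def smooth_one_hot(y, num_classes, smoothing=0.1):
    # illustrative only: blend one-hot targets with a uniform distribution
    one_hot = F.one_hot(y, num_classes).float()
    return one_hot * (1.0 - smoothing) + smoothing / num_classes

# e.g. class 2 of 4 with smoothing=0.1 -> tensor([[0.0250, 0.0250, 0.9250, 0.0250]])
print(smooth_one_hot(torch.tensor([2]), num_classes=4, smoothing=0.1))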

RandAugment

RandAugment sequentially applies a random set of image augmentations (e.g. translation, shear, contrast), where the depth parameter controls how many are applied and the severity parameter (0 to 10) controls how strong each one is. Used as a regularization method during training, it improves the network's generalization.
It was proposed by Cubuk et al. (2020) in this paper.
How to use it?
import torchvision.transforms as transforms
from composer.algorithms.randaugment import RandAugmentTransform

randaugment_transform = RandAugmentTransform(severity=9,
                                             depth=2,
                                             augmentation_set="all")
composed = transforms.Compose([randaugment_transform, ....])
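
For context, here is a sketch of how the composed transform could be attached to a dataset; the CIFAR-10 loading code and the transform ordering here are assumptions for illustration, not the exact pipeline of the original run:

import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from composer.algorithms.randaugment import RandAugmentTransform

# RandAugment operates on PIL images, so it is placed before ToTensor
train_transform = transforms.Compose([
    RandAugmentTransform(severity=9, depth=2, augmentation_set="all"),
    transforms.ToTensor(),
])
train_dataset = CIFAR10(root="./data", train=True, download=True,
                        transform=train_transform)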

MixUp

The following paragraph is quoted from the Bag of Tricks paper (He et al., 2019):
"Here we consider another augmentation method called mixup. In mixup, each time we randomly sample two examples (xi, yi) and (xj , yj ). Then we form a new example by a weighted linear interpolation of these two examples:
x-hat = λxi + (1 − λ)xj
y-hat = λyi + (1 − λ)yj
where λ ∈ [0, 1] is a random number drawn from the Beta(α, α) distribution. In mixup training, we only use the new example (x-hat, y-hat)."
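
As a concrete, hypothetical illustration of this interpolation (in practice λ is drawn from Beta(α, α); here it is fixed to 0.7):

import torch

lam = 0.7  # in practice: lam = torch.distributions.Beta(0.2, 0.2).sample()
x_i, x_j = torch.rand(3, 32, 32), torch.rand(3, 32, 32)  # two random "images"
y_i = torch.tensor([1.0, 0.0])  # one-hot label for class 0
y_j = torch.tensor([0.0, 1.0])  # one-hot label for class 1

x_hat = lam * x_i + (1 - lam) * x_j
y_hat = lam * y_i + (1 - lam) * y_j  # -> tensor([0.7000, 0.3000])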

How to use it?
import composer.functional as cf

for epoch in range(num_epochs):
    for X, y in train_loader:
        X_mixed, y_perm, mixing = cf.mixup_batch(X, y, alpha=0.2)
        y_hat = model(X_mixed)
        loss = (1 - mixing) * loss_fn(y_hat, y) + mixing * loss_fn(y_hat, y_perm)
        loss.backward()