Boost Performance: Achieve Better Results Faster with 25% Fewer Epochs
A comparison between a vanilla ResNet-18 trained on CIFAR-10 and a ResNet-18 trained with three additional, easy-to-use Composer functions.
I implemented a slightly modified ResNet-18 on the CIFAR-10 dataset, which reached 93.72% accuracy after training for 60 epochs. I then incorporated three functions from the Composer library, and the enhanced ResNet-18 reached a higher accuracy of 93.94% by the 43rd epoch. These additions saved both computation and overall training time.
Composer Functions
The three Composer functions I used while training the model:
- Label Smoothing
- RandAugment
- MixUp
Label Smoothing
Label smoothing was proposed by Christian Szegedy et al. in this paper. It acts as a regularization technique: instead of hard one-hot targets, the model is trained against slightly softened targets. Composer makes it very easy to use.
How to use it?
import composer.functional as cf

for X, y in train_loader:
    y_hat = model(X)
    # note: if you modify the variable y here, it is a good idea to set y
    # back to the original targets after computing the loss
    smoothed_targets = cf.smooth_labels(y_hat, y, smoothing=0.1)
    loss = loss_fn(y_hat, smoothed_targets)
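For intuition, here is a minimal sketch of the underlying idea: each one-hot target is blended with the uniform distribution over classes. The helper smooth_one_hot and the class count are my own illustration, not Composer's internal implementation.

import torch
import torch.nn.functional as F

def smooth_one_hot(y, num_classes, smoothing=0.1):
    # turn integer labels into one-hot vectors, then blend each vector
    # with the uniform distribution over the classes
    one_hot = F.one_hot(y, num_classes).float()
    return one_hot * (1.0 - smoothing) + smoothing / num_classes

y = torch.tensor([2, 0])
print(smooth_one_hot(y, num_classes=4))
# tensor([[0.0250, 0.0250, 0.9250, 0.0250],
#         [0.9250, 0.0250, 0.0250, 0.0250]])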
RandAugment
RandAugment sequentially applies a number (depth) of randomly chosen image augmentations from a fixed set (e.g. translation, shear, contrast), each with a severity value sampled between 0 and 10. Applying this regularization during training improves network generalization.
How to use it?
import torchvision.transforms as transforms
from composer.algorithms.randaugment import RandAugmentTransform

randaugment_transform = RandAugmentTransform(severity=9,
                                             depth=2,
                                             augmentation_set="all")
composed = transforms.Compose([randaugment_transform, ...])  # ... the rest of your transforms
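As a usage example, the composed transform can be dropped into a standard torchvision dataset. The CIFAR-10 pipeline below is just an assumed setup (the data path, batch size, and extra transforms are mine, not from the original run):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    randaugment_transform,   # the RandAugmentTransform defined above
    transforms.ToTensor(),   # convert to tensors after augmentation
])
train_set = datasets.CIFAR10("./data", train=True, download=True,
                             transform=train_transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)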
MixUp
"Here we consider another augmentation method called mixup. In mixup, each time we randomly sample two examples (xi, yi) and (xj , yj ). Then we form a new example by a weighted linear interpolation of these two examples:
x-hat = λxi + (1 − λ)xj
y-hat = λyi + (1 − λ)yj
where λ ∈ [0, 1] is a random number drawn from the Beta(α, α) distribution. In mixup training, we only use the new example (x-hat, y-hat)."
How to use it?
import composer.functional as cf

for epoch in range(num_epochs):
    for X, y in train_loader:
        X_mixed, y_perm, mixing = cf.mixup_batch(X, y, alpha=0.2)
        y_hat = model(X_mixed)
        loss = (1 - mixing) * loss_fn(y_hat, y) + mixing * loss_fn(y_hat, y_perm)
        loss.backward()
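For intuition, here is a rough sketch of the interpolation described by the mixup equations above. The helper mixup_manual is hypothetical and assumes one-hot targets; it is not Composer's API. Composer's mixup_batch instead returns the permuted targets and the mixing coefficient so the loss can be interpolated directly, as in the snippet above.

import torch

def mixup_manual(X, y_one_hot, alpha=0.2):
    # sample the interpolation weight lambda from Beta(alpha, alpha)
    lam = torch.distributions.Beta(alpha, alpha).sample()
    # pair each example with a randomly permuted partner from the same batch
    perm = torch.randperm(X.size(0))
    # weighted linear interpolation of both inputs and targets
    X_mixed = lam * X + (1 - lam) * X[perm]
    y_mixed = lam * y_one_hot + (1 - lam) * y_one_hot[perm]
    return X_mixed, y_mixed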