
How to Properly Use PyTorch's CosineAnnealingWarmRestarts Scheduler

This article provides a short tutorial on how to use the CosineAnnealingWarmRestarts Scheduler in PyTorch, along with code and interactive visualizations.
In this article, we'll look at how you can use the CosineAnnealingWarmRestarts Scheduler in PyTorch for writing efficient training loops.
Unlike TensorFlow, PyTorch provides a simple interface to a wide range of Learning Rate Schedulers, which you can drop straight into the training loop!
For a closer look at the various Learning Rate Schedulers available in PyTorch, refer to the official documentation.

Code

Most PyTorch training loops are of the following form:
optimizer = ...

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample
        optimizer.zero_grad()

        # Forward Pass
        outputs = model(inputs)
        # Compute Loss and Perform Backpropagation
        loss = loss_fn(outputs, labels)
        loss.backward()
        # Update Optimizer
        optimizer.step()
This keeps the learning rate fixed at whatever value the optimizer was initialized with. However, experiments have shown that adjusting the learning rate over the course of training with a Learning Rate Scheduler improves stability and leads to better convergence.
That's why we usually add a Learning Rate Scheduler such as ReduceLROnPlateau. Let's look at how you can instantiate a scheduler and use it inside the training loop:
optimizer = ...
scheduler = ReduceLROnPlateau(optimizer, 'min')

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        # Forward Pass
        # Compute Loss and Perform Backpropagation
        # Update Optimizer
        optimizer.step()

    # ReduceLROnPlateau needs the metric it monitors (here, the validation loss)
    scheduler.step(val_loss)  # <----- Update Learning Rate
While this would work for most Schedulers, the CosineAnnealingWarmRestarts Scheduler requires some extra steps to function properly.
In order to properly change the Learning Rate over a long training run, you should pass the fractional epoch, i.e. epoch + i / iters, when invoking the step() function. Like so:
optimizer = ...
scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, ...)
iters = len(dataloader)

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        # Forward Pass
        # Compute Loss and Perform Backpropagation
        # Update Optimizer
        optimizer.step()
        scheduler.step(epoch + i / iters)
If you call step() with no argument instead, the scheduler has no idea where you are within the epoch, so the learning rate changes in coarse, erratic jumps rather than following the smooth, gradual cosine curve you'd expect.
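A quick way to sanity-check the schedule is to record the learning rate at every step and plot or log it. Below is a minimal, self-contained sketch of this idea; the dummy model, the T_0=10, T_mult=2, eta_min=1e-5 settings, and the 30 epochs of 100 batches are illustrative assumptions rather than values from a real training run:
import torch
from torch.optim import SGD, lr_scheduler

# Dummy model and optimizer, used only to drive the scheduler (illustrative values)
model = torch.nn.Linear(10, 1)
optimizer = SGD(model.parameters(), lr=0.1)

# First restart after T_0 epochs; each later cycle is T_mult times longer
scheduler = lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-5
)

epochs, iters = 30, 100  # assumed epoch count and batches per epoch
lrs = []

for epoch in range(epochs):
    for i in range(iters):
        optimizer.step()  # no real training here, just advancing the optimizer
        scheduler.step(epoch + i / iters)
        lrs.append(scheduler.get_last_lr()[0])

# lrs traces a smooth cosine decay with a warm restart at epoch 10;
# plot it, or log each value (e.g. wandb.log({"lr": lr})) to inspect it in W&B.
If you log the learning rate alongside your training metrics, the warm restarts show up as sharp jumps back to the initial learning rate, which makes it easy to line them up with changes in your loss curve.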

Summary

In this article, you saw how to use the CosineAnnealingWarmRestarts Scheduler in your PyTorch deep learning models, and how monitoring your metrics with Weights & Biases can lead to valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
