
How to Properly Use PyTorch's CosineAnnealingWarmRestarts Scheduler

This article provides a short tutorial on how to use the CosineAnnealingWarmRestarts Scheduler in PyTorch, along with code and interactive visualizations.
In this article, we'll look at how you can use the CosineAnnealingWarmRestarts Scheduler in PyTorch for writing efficient training loops.
Unlike TensorFlow, PyTorch provides a simple interface to a wide range of Learning Rate Schedulers, which you can drop straight into the training loop!
For a closer look at the various Learning Rate Schedulers available in PyTorch, refer to the official documentation.

Code

Most PyTorch training loops are of the following form:
optimizer = ...

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample
        optimizer.zero_grad()

        # Forward Pass
        outputs = model(inputs)
        # Compute Loss and Perform Backpropagation
        loss = loss_fn(outputs, labels)
        loss.backward()
        # Update Optimizer
        optimizer.step()
This keeps the learning rate fixed at whatever value the optimizer was initialized with. However, experiments have shown that adjusting the learning rate over the course of training with a Learning Rate Scheduler improves stability and leads to better convergence.
That's why we usually add a Learning Rate Scheduler such as ReduceLROnPlateau. Let's look at how you can instantiate a scheduler and use it inside the training loop:
optimizer = ...
scheduler = ReduceLROnPlateau(optimizer, 'min')

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        # Forward Pass
        # Compute Loss and Perform Backpropagation
        # Update Optimizer
        optimizer.step()

    # ReduceLROnPlateau needs the metric it monitors (here, the validation loss)
    scheduler.step(val_loss)  # <----- Update Learning Rate
While this would work for most Schedulers, the CosineAnnealingWarmRestarts Scheduler requires some extra steps to function properly.
In order to properly change the Learning Rate over a long training run, you should pass the fractional epoch, i.e. epoch + i / iters, when invoking the step() function. Like so:
optimizer = ...
scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, ...)
iters = len(dataloader)

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        # Forward Pass
        # Compute Loss and Perform Backpropagation
        # Update Optimizer
        optimizer.step()
        scheduler.step(epoch + i / iters)
If you call step() with no argument instead, the scheduler has no idea where you are within the epoch, so the learning rate changes in coarse, erratic jumps rather than following the smooth, gradual cosine curve you'd expect.
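A quick way to sanity-check the schedule is to record the learning rate at every step and plot or log it. Below is a minimal, self-contained sketch of this idea; the dummy model, the T_0=10, T_mult=2, eta_min=1e-5 settings, and the 30 epochs of 100 batches are illustrative assumptions rather than values from a real training run:
import torch
from torch.optim import SGD, lr_scheduler

# Dummy model and optimizer, used only to drive the scheduler (illustrative values)
model = torch.nn.Linear(10, 1)
optimizer = SGD(model.parameters(), lr=0.1)

# First restart after T_0 epochs; each later cycle is T_mult times longer
scheduler = lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-5
)

epochs, iters = 30, 100  # assumed epoch count and batches per epoch
lrs = []

for epoch in range(epochs):
    for i in range(iters):
        optimizer.step()  # no real training here, just advancing the optimizer
        scheduler.step(epoch + i / iters)
        lrs.append(scheduler.get_last_lr()[0])

# lrs traces a smooth cosine decay with a warm restart at epoch 10;
# plot it, or log each value (e.g. wandb.log({"lr": lr})) to inspect it in W&B.
If you log the learning rate alongside your training metrics, the warm restarts show up as sharp jumps back to the initial learning rate, which makes it easy to line them up with changes in your loss curve.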

Summary

In this article, you saw how to use the CosineAnnealingWarmRestarts Scheduler in your PyTorch deep learning models, and how monitoring your metrics with Weights & Biases can lead to valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
