PyTorch Lightning with Weights & Biases

Comparing PyTorch and PyTorch Lightning with Weights & Biases
Created on September 19|Last edited on September 19
PyTorch Lightning lets you decouple science code from engineering code. Try this quick tutorial to visualize Lightning models and optimize hyperparameters with an easy Weights & Biases integration.
Try PyTorch Lightning →, or explore this integration in a live dashboard →.
Installation and Introduction
Installing PyTorch Lightning is very simple:
pip install pytorch-lightning
To use it in our PyTorch code, we’ll import the necessary PyTorch Lightning modules:
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger
We’ll use WandbLogger to track our experiment results and log them directly to W&B.
Creating Our Lightning Class
Research often involves editing boilerplate code to try new experimental variations, and most errors are introduced into a codebase during this tinkering. PyTorch Lightning significantly reduces boilerplate by providing a definite structure for defining and training models.
To create a neural network class in PyTorch, we subclass torch.nn.Module. Similarly, with PyTorch Lightning, we subclass pl.LightningModule.
Let’s create our class, which we’ll use to train a model for classifying the MNIST dataset. We’ll use the same example as the one in the official documentation in order to compare our results.
import torch

class LightningMNISTClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()

        # MNIST images are (1, 28, 28): (channels, height, width)
        self.layer_1 = torch.nn.Linear(28 * 28, 128)
        self.layer_2 = torch.nn.Linear(128, 256)
        self.layer_3 = torch.nn.Linear(256, 10)
As you can see above, apart from the base class, the code is much the same as it would be in plain PyTorch.
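The class above only declares its layers. A minimal sketch of a forward pass that chains them is shown here. It assumes ReLU activations between the layers and a log-softmax output (chosen to pair with the NLL loss used in the vanilla PyTorch comparison in this post), and it subclasses plain torch.nn.Module under the hypothetical name MNISTNet so that it runs without Lightning installed.

```python
import torch
from torch.nn import functional as F


class MNISTNet(torch.nn.Module):
    """Hypothetical stand-in with the same three layers as the class above."""

    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(28 * 28, 128)
        self.layer_2 = torch.nn.Linear(128, 256)
        self.layer_3 = torch.nn.Linear(256, 10)

    def forward(self, x):
        # Flatten (batch, 1, 28, 28) images into (batch, 784) vectors
        x = x.view(x.size(0), -1)
        x = F.relu(self.layer_1(x))
        x = F.relu(self.layer_2(x))
        # Log-probabilities over the 10 digit classes
        return F.log_softmax(self.layer_3(x), dim=1)


net = MNISTNet()
log_probs = net(torch.randn(4, 1, 28, 28))
print(log_probs.shape)
```

In a LightningModule, the forward method would have the same body.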
In PyTorch, data loading can be done anywhere in your main training file.
In PyTorch Lightning, it is done in three specific methods of the LightningModule:
train_dataloader()
val_dataloader()
test_dataloader()
A fourth method is meant for data preparation and downloading:
prepare_data()
Here’s how to do this in code.
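# Imports needed at the top of the file
import os
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

    # Methods of LightningMNISTClassifier
    def prepare_data(self):
        # Prepare transforms standard to MNIST
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])
        # Download the data and hold out part of the training set for validation
        mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
        self.mnist_test = MNIST(os.getcwd(), train=False, download=True, transform=transform)
        self.mnist_train, self.mnist_val = random_split(mnist_train, [55000, 5000])

    def train_dataloader(self):
        # Load the training dataset
        return DataLoader(self.mnist_train, batch_size=64)

    def val_dataloader(self):
        # Load the validation dataset
        return DataLoader(self.mnist_val, batch_size=64)

    def test_dataloader(self):
        # Load the test dataset
        return DataLoader(self.mnist_test, batch_size=64)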
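To see what these loader methods hand back without downloading MNIST, here is a small sketch that uses a synthetic dataset as a stand-in. The tensor sizes, the 80/20 split and the variable names are assumptions made for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Hold out 20 samples for validation, as a prepare_data() method might do
train_set, val_set = random_split(dataset, [80, 20])

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

# Each iteration yields one mini-batch of images and labels
x, y = next(iter(train_loader))
print(x.shape, y.shape, len(train_loader), len(val_loader))
```

Each loader yields (images, labels) mini-batches, which is the shape of input the step functions receive.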
The optimizer code is the same for Lightning, except that it goes in the configure_optimizers() method of the LightningModule.
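As a rough sketch, the method can simply return the optimizer. The example uses a hypothetical TinyModule built on plain torch.nn.Module so it runs without Lightning, and it assumes Adam with a learning rate of 1e-3, matching the vanilla PyTorch comparison in this post.

```python
import torch


class TinyModule(torch.nn.Module):
    """Hypothetical stand-in for a LightningModule."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 2)

    def configure_optimizers(self):
        # Same optimizer construction as in plain PyTorch, just inside a method
        return torch.optim.Adam(self.parameters(), lr=1e-3)


optimizer = TinyModule().configure_optimizers()
print(type(optimizer).__name__, optimizer.param_groups[0]["lr"])
```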
If we consider a traditional PyTorch training pipeline, we need to implement the loop over epochs, iterate over the mini-batches, perform a forward pass for each mini-batch, compute the loss, backpropagate for each batch, and finally update the weights.
To do the same in Lightning, we pull the main parts of the training loop and the validation loop out into three functions:
training_step
validation_step
validation_end
The prototypes of these functions are:
 def training_step(self, train_batch, batch_idx):
 def validation_step(self, val_batch, batch_idx):
 def validation_end(self, outputs):
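The three prototypes could be filled in roughly as follows. This is a sketch, not the definitive implementation: the StepSketch class is a hypothetical stand-in built on torch.nn.Module with a single linear layer, and the returned dictionary keys ('loss', 'log', 'val_loss', 'avg_val_loss') follow the convention of Lightning releases from around the time of this post, as far as I can tell.

```python
import torch
from torch.nn import functional as F


class StepSketch(torch.nn.Module):
    """Hypothetical stand-in showing possible bodies for the three hooks."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return F.log_softmax(self.layer(x.view(x.size(0), -1)), dim=1)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        loss = F.nll_loss(self(x), y)
        # Assumed convention: 'loss' is backpropagated, 'log' goes to the logger
        return {"loss": loss, "log": {"train_loss": loss}}

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        return {"val_loss": F.nll_loss(self(x), y)}

    def validation_end(self, outputs):
        # Average the per-batch validation losses collected during the epoch
        avg_loss = torch.stack([out["val_loss"] for out in outputs]).mean()
        return {"avg_val_loss": avg_loss, "log": {"val_loss": avg_loss}}


sketch = StepSketch()
batch = (torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
train_out = sketch.training_step(batch, 0)
val_outs = [sketch.validation_step(batch, i) for i in range(3)]
summary = sketch.validation_end(val_outs)
print(train_out["loss"].item(), summary["avg_val_loss"].item())
```

Note that none of these functions calls backward() or steps the optimizer; that part is left to the Trainer.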
Using these functions, PyTorch Lightning will automate the training part of the pipeline. We’ll get to that, but first let’s see how PyTorch Lightning integrates with Weights & Biases to track experiments and create visualizations you can monitor from anywhere.
Track PyTorch Lightning Model Performance with W&B
Let’s see how the WandbLogger integrates with Lightning.
from pytorch_lightning.loggers import WandbLogger
wandb_logger = WandbLogger(name='Adam-32-0.001', project='pytorchlightning')
Here, we’ve created a WandbLogger object, which holds the details of the project and the run being logged.
Training Loop
Now, let’s jump into the most important part of training any model: the training loop. Because we are using PyTorch Lightning, most of the work is taken care of behind the scenes. We just need to specify a few hyperparameters, and the training process runs automatically. As an added benefit, you’ll also get a progress bar for each iteration.
model = LightningMNISTClassifier()
model.prepare_data()
model.train_dataloader()
trainer = pl.Trainer(max_epochs=5, logger=wandb_logger)
The important part of this code for visualization is where the WandbLogger object is passed as the logger to Lightning’s Trainer. The Trainer then uses it automatically to log the results.
trainer.fit(model)
This is all you need to do to train your PyTorch model with Lightning. This one line replaces the hand-written training loop you would otherwise maintain in vanilla PyTorch.
Lightning also gives you a nice progress bar that keeps track of each iteration.
Visualizing Performance with Weights & Biases
Let’s have a look at the visualizations generated for this run in the dashboard.
Train loss and validation loss for each run are automatically logged to the dashboard in real time as the model is being trained.
We can repeat the same training step with different hyperparameters to compare different runs. We’ll change the name of the logger to uniquely identify each run.
wandb_logger = WandbLogger(name='Adam-32-0.001',project='pytorchlightning')
wandb_logger = WandbLogger(name='Adam-64-0.01',project='pytorchlightning')
wandb_logger = WandbLogger(name='sgd-64-0.01',project='pytorchlightning')
Here I’ve used a convention to name the runs: the first part is the optimizer, the second is the mini-batch size, and the third is the learning rate. For example, the name ‘Adam-32-0.001’ means the optimizer is Adam, the batch size is 32, and the learning rate is 0.001.
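If you launch many runs, the naming convention can be generated and parsed programmatically. The two helper functions in this sketch, make_run_name and parse_run_name, are hypothetical names introduced for illustration and are not part of wandb or Lightning.

```python
def make_run_name(optimizer: str, batch_size: int, learning_rate: float) -> str:
    """Build a run name such as 'Adam-32-0.001'."""
    return f"{optimizer}-{batch_size}-{learning_rate}"


def parse_run_name(name: str) -> dict:
    """Split a run name back into its three parts."""
    # maxsplit=2 keeps learning rates such as '1e-05' intact
    optimizer, batch_size, learning_rate = name.split("-", 2)
    return {
        "optimizer": optimizer,
        "batch_size": int(batch_size),
        "learning_rate": float(learning_rate),
    }


name = make_run_name("Adam", 32, 0.001)
print(name, parse_run_name(name))
```

The resulting string can be passed as the name argument of WandbLogger, as in the lines above.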
Here’s how our models are faring so far.
These visualizations persist in your project, which makes it much easier to compare variations with different hyperparameters, restore the best-performing model, and share results with your team.
Multi-GPU Training
Lightning provides a simple API for data parallelism and multi-GPU training. You don’t need to wrap the model in torch’s data-parallel classes or set up a distributed sampler yourself. You just need to specify the parallelism mode and the number of GPUs you wish to use.
There are multiple ways of training:
Data Parallel (distributed_backend='dp') (multiple GPUs, 1 machine)
DistributedDataParallel (distributed_backend='ddp') (multiple GPUs across many machines)
DistributedDataParallel2 (distributed_backend='ddp2') (dp within a machine, ddp across machines)
TPUs (num_tpu_cores=8|x) (TPU or TPU pod)
We’ll use the data parallel backend in this post. Here’s how we can incorporate it in the existing code.
trainer = pl.Trainer(max_epochs=5, logger=wandb_logger, gpus=1, distributed_backend='dp')
Here I’m using only 1 GPU, as I’m working on Google Colab.
As you use more GPUs, you can monitor the difference in memory usage between configurations in wandb.
Early Stopping
PyTorch Lightning provides two ways to incorporate early stopping. Here’s how you can use them:
A) Set early_stop_callback to True. Lightning will look for 'val_loss' in the dict returned by validation_end(); if it is not found, an error is raised.
from pytorch_lightning import Trainer
trainer = Trainer(early_stop_callback=True)
B) Or configure your own callback:
from pytorch_lightning.callbacks import EarlyStopping
early_stop_callback = EarlyStopping(
    monitor='val_loss',
    min_delta=0.00,
    patience=3,
    verbose=False,
    mode='min',
)
trainer = Trainer(early_stop_callback=early_stop_callback)
Since we’ve defined the validation_end() function, we can simply set early_stop_callback=True:
trainer = pl.Trainer(max_epochs=5, logger=wandb_logger, gpus=1, distributed_backend='dp', early_stop_callback=True)
16-bit Precision
Depending on the requirements of a project, you might need to increase or decrease the precision of a model’s weights. Reducing precision allows you to fit bigger models into your GPU memory. Let’s see how we can incorporate 16-bit precision in PyTorch Lightning.
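To make the memory argument concrete, here is a small sketch in plain PyTorch that compares the storage needed for the same weights in 32-bit and 16-bit floats. The tensor size and the size_in_bytes helper are arbitrary choices for illustration.

```python
import torch

weights_fp32 = torch.zeros(1000, 1000, dtype=torch.float32)
weights_fp16 = weights_fp32.half()  # same values stored as 16-bit floats


def size_in_bytes(tensor: torch.Tensor) -> int:
    """Memory taken by the tensor's elements."""
    return tensor.element_size() * tensor.nelement()


print(size_in_bytes(weights_fp32), size_in_bytes(weights_fp16))
```

Each element shrinks from 4 bytes to 2, so the same weights take half the memory.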
First, we need to install NVIDIA Apex. To do that, we’ll create a shell script in Colab and execute it.
%%writefile setup.sh
git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir ./apex
Then run the script from a separate cell:
!sh setup.sh
You’ll need to restart the runtime after installing Apex.
Now we can pass the required value directly to the precision parameter of the trainer.
trainer = pl.Trainer(max_epochs=100, logger=wandb_logger, gpus=1, distributed_backend='dp', early_stop_callback=True, amp_level='O1', precision=16)
Saving And Loading Models
Often during research, you’ll need to train a model in intervals. This means stopping the training, saving the state, loading the saved state later, and then resuming training where you stopped.
Being able to save and restore models also allows you to collaborate more effectively with your team and return to experiments from a few weeks ago.
To save PyTorch Lightning models with W&B, we use:
import wandb
trainer.save_checkpoint('EarlyStoppingADam-32-0.001.pth')
wandb.save('EarlyStoppingADam-32-0.001.pth')
This creates a checkpoint file in the local runtime and uploads it to wandb. Now, when we decide to resume training, even on a different system, we can restore the checkpoint file from wandb and load it into our program like so:
wandb.restore('EarlyStoppingADam-32-0.001.pth')
# load_from_checkpoint is a classmethod that returns a new model instance
model = LightningMNISTClassifier.load_from_checkpoint('EarlyStoppingADam-32-0.001.pth')
Now the checkpoint has been loaded into the model, and training can be resumed with the desired training setup.
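For intuition about what saving and restoring state involves, here is a rough sketch of the same round trip in plain PyTorch with torch.save and torch.load. The small linear model, the file name and the dictionary keys are assumptions for illustration; this is not Lightning’s own checkpoint format.

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")  # hypothetical file name

    # Save what is needed to resume: weights, optimizer state, progress
    torch.save(
        {
            "epoch": 3,
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )

    # Later (possibly on another machine): rebuild the objects, then restore
    restored = torch.nn.Linear(4, 2)
    restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-3)
    checkpoint = torch.load(path)
    restored.load_state_dict(checkpoint["state_dict"])
    restored_optimizer.load_state_dict(checkpoint["optimizer"])
    start_epoch = checkpoint["epoch"] + 1

print(start_epoch)
```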
Comparison With PyTorch
Now that we’ve seen the simple framework that Lightning provides, let’s have a quick look at how it compares with PyTorch. In Lightning, we can train the model with automatic callbacks as well as progress bars by just creating a trainer and calling its fit() method.
Let’s see how the same can be achieved using vanilla PyTorch.
# PyTorch
import torch
from torch.nn import functional as F

# MNISTClassifier is assumed to be the plain torch.nn.Module version of the model;
# mnist_train and mnist_val are assumed to be the training and validation DataLoaders.
pytorch_model = MNISTClassifier()
optimizer = torch.optim.Adam(pytorch_model.parameters(), lr=1e-3)

# ----------------
# LOSS
# ----------------
def cross_entropy_loss(logits, labels):
    # nll_loss expects log-probabilities, e.g. from a log_softmax output layer
    return F.nll_loss(logits, labels)

# ----------------
# TRAINING LOOP
# ----------------
num_epochs = 1
for epoch in range(num_epochs):

    # TRAINING LOOP
    for train_batch in mnist_train:
        x, y = train_batch

        logits = pytorch_model(x)
        loss = cross_entropy_loss(logits, y)
        print('train loss: ', loss.item())

        loss.backward()

        optimizer.step()
        optimizer.zero_grad()

    # VALIDATION LOOP
    with torch.no_grad():
        val_loss = []
        for val_batch in mnist_val:
            x, y = val_batch
            logits = pytorch_model(x)
            val_loss.append(cross_entropy_loss(logits, y).item())
        val_loss = torch.mean(torch.tensor(val_loss))
        print('val_loss: ', val_loss.item())
You can see how complicated the training code can get, and we haven’t even included the modifications needed for multi-GPU training, early stopping, or tracking performance with wandb yet.
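For adding distributed training in PyTorch, we need to use DistributedSampler to sample our dataset.
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train_dataloader(self):
    dataset = MNIST(...)
    sampler = None

    # Shard the dataset across processes when running on TPU
    if self.on_tpu:
        sampler = DistributedSampler(dataset)

    return DataLoader(dataset, sampler=sampler)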
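To show what the sampler actually does, here is a small sketch that shards a ten-element toy dataset between two simulated processes. It assumes num_replicas and rank are passed by hand so that no process group has to be initialized; in real distributed training these values come from the launcher.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))

# Passing num_replicas and rank explicitly avoids needing an initialized
# process group; each simulated rank gets its own slice of the dataset.
shards = []
for rank in range(2):
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    loader = DataLoader(dataset, sampler=sampler, batch_size=5)
    shards.append([int(v) for (batch,) in loader for v in batch])

print(shards)
```

Each rank sees a disjoint half of the data, so together the two processes cover the dataset once per epoch.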
You’ll also need to write a custom function to incorporate early stopping.
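Such a custom helper could look like the sketch below. The EarlyStopper class is a hypothetical name, its patience and min_delta parameters mirror those of the EarlyStopping callback shown earlier, and it assumes a loss that should be minimized. The validation losses are made-up values for illustration.

```python
class EarlyStopper:
    """Stop when a monitored loss has not improved for `patience` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=3)
val_losses = [1.0, 0.9, 0.95, 0.96, 0.97, 0.5]  # made-up values for illustration
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break

print(stopped_at)
```

In a vanilla training loop, should_stop would be called once per epoch with the averaged validation loss.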
But with Lightning, all of this can be accomplished in a couple of lines of code.
# PyTorch Lightning
trainer = pl.Trainer(max_epochs=5, logger=wandb_logger, gpus=1, distributed_backend='dp', early_stop_callback=True)
trainer.fit(model)
That’s all for this post.