Implementing and Tracking the Performance of a CNN in PyTorch

This article provides a guide to implementing and tracking the performance of a Convolutional Neural Network (CNN) in PyTorch.
Ayush Thakur
Created on July 6 | Last edited on October 26
In this article, we will show you how to implement a Convolutional Neural Network in PyTorch. We will define the model's architecture, train the CNN, and leverage Weights & Biases to observe the effect of changing hyperparameters (like filter and kernel sizes) on model performance.
A CNN can extract spatial and temporal relationships in data with a known grid-like topology, e.g., images (2D grid of pixels) and audio or time series data (1D grid of samples at regular intervals). You can see an example of a convolutional operation below (source):
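To make the operation concrete without any library, here is a minimal sketch of a 2D convolution as deep learning frameworks usually compute it (a cross-correlation, with no padding and a stride of 1). The function name `cross_correlate2d` and the toy `image` and `kernel` values are our own illustration, not part of the original report.

```python
def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the kernel with the image patch whose top-left corner is (i, j)
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 single-channel "image" and 2x2 filter
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = cross_correlate2d(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

A 4x4 input and a 2x2 filter give a 3x3 feature map, because the filter fits in three positions along each axis.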

﻿
﻿Full code in colab →﻿
﻿
Table of Contents
1. Download and Prepare Data
2. Define the CNN Model in PyTorch
   Define the Model
   Define the Convolution
3. Train the Model
4. Visualize the Model Performance
5. Observing the Effect of Tweaking Hyperparameters
Weights & Biases

1. Download and Prepare Data
For this report, we will use the CIFAR-10 dataset. With torchvision, loading CIFAR-10 takes only a few lines.
import torch
import torchvision
import torchvision.transforms as transforms

BATCH_SIZE = 32

# Convert PIL images to tensors in [0, 1], then normalize each channel to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)
# ... load test dataset ...

CLASS_NAMES = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
The torchvision dataset (trainset) returns PIL images. transforms.ToTensor converts them to tensors with values in the range [0, 1], and transforms.Normalize, with a mean and standard deviation of 0.5 per channel, rescales them to the range [-1, 1].
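The arithmetic behind this normalization is simple enough to check by hand. The sketch below applies the same per-channel formula, (value - mean) / std, to a few scalar pixel values; the helper name `normalize` is ours and only mirrors the formula, it is not a torchvision function.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel normalization formula to a single pixel value."""
    return (value - mean) / std


# The endpoints and midpoint of the [0, 1] range map to -1, 0 and 1
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```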
2. Define the CNN Model in PyTorch
Define the Model
In PyTorch, a model is defined by subclassing the torch.nn.Module class. We define our model, the Net class, this way.
The model is defined in two steps: first, we specify the layers (and therefore the parameters) of our model, then we outline how they are applied to the inputs. The __init__ method initializes the layers used in our model. In our example, these are the Conv2d, MaxPool2d, and Linear layers.
The forward method defines the feed-forward operation on the input data x.
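import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB) -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5x5 after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten each sample to a vector of 400 features
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw class scores (logits)
        return x
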
Define the Convolution
Using torch.nn.Conv2d, we can apply a 2D convolution over an input signal (images in our dataset). The most important parameters of the convolutional layer are:
in_channels and out_channels: number of input and output channels, respectively.
kernel_size: the size of the convolutional filter (5x5 in our example).
stride: the step size with which the filter moves across the input; it can be a single number or a tuple.
padding: the amount of implicit zero padding added to both sides of each spatial dimension.
Our conv1 layer is initialized with 3 input channels (one per RGB color channel), 6 output channels, and a kernel size of 5.
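These parameters determine the spatial size of each layer's output, which is where the 16 * 5 * 5 input size of fc1 comes from. The sketch below uses the standard output-size formula, floor((size + 2 * padding - kernel_size) / stride) + 1, assuming a dilation of 1; the helper name `conv_output_size` is ours, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                               # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=5)            # conv1: 32 -> 28
size = conv_output_size(size, kernel_size=2, stride=2)  # pool:  28 -> 14
size = conv_output_size(size, kernel_size=5)            # conv2: 14 -> 10
size = conv_output_size(size, kernel_size=2, stride=2)  # pool:  10 -> 5

flattened_features = 16 * size * size                   # 16 feature maps of 5x5
print(size, flattened_features)  # 5 400
```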
Next, we add a pooling layer, torch.nn.MaxPool2d, which downsamples our feature maps by summarizing features in patches of the feature map. The most critical parameters for this layer are:
kernel_size – the size of the window to take a max over
stride – the stride of the window. The default value is kernel_size
padding – implicit zero padding to be added on both sides
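As a small illustration of what the pooling layer does, the sketch below applies the same nn.MaxPool2d(2, 2) configuration used in our model to a hand-written 4x4 feature map. The input values are made up for the example.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One sample, one channel, 4x4 spatial size: shape (batch, channels, height, width)
feature_map = torch.tensor([[1., 3., 2., 0.],
                            [4., 2., 1., 5.],
                            [0., 1., 7., 2.],
                            [6., 2., 3., 8.]]).reshape(1, 1, 4, 4)

pooled = pool(feature_map)
print(pooled.shape)      # torch.Size([1, 1, 2, 2])
print(pooled.squeeze())  # the maximum of each non-overlapping 2x2 window
```

Each 2x2 window is reduced to its largest value, so the height and width are halved while the number of channels stays the same.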
Next, we flatten the output of the last pooling layer so it can be fed into fully connected layers, which map the extracted features to their corresponding classes. In PyTorch, fully connected layers are created with nn.Linear.
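The flattening step can be sketched in isolation as follows. We assume a random batch of 4 samples with the shape produced by the second pooling layer of our model (16 channels of 5x5), and use torch.flatten, which for this input is equivalent to the x.view(-1, 16 * 5 * 5) call in the Net class.

```python
import torch
import torch.nn as nn

# Stand-in for the output of the last pooling layer: (batch, channels, height, width)
features = torch.randn(4, 16, 5, 5)

# Keep the batch dimension and collapse everything else into one vector per sample
flat = torch.flatten(features, start_dim=1)

fc1 = nn.Linear(16 * 5 * 5, 120)
hidden = fc1(flat)

print(flat.shape)    # torch.Size([4, 400])
print(hidden.shape)  # torch.Size([4, 120])
```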
In the forward method, we apply the ReLU activation (using F.relu) to the output of every layer except the last. ReLU introduces non-linearity and helps mitigate the vanishing gradient problem.
import torch
import wandb

# Note: steps_per_epoch is accepted but not used in this excerpt
def train(model, device, train_loader, optimizer, criterion, epoch, steps_per_epoch=20):
  # Switch model to training mode. This is necessary for layers like dropout, batchnorm, etc., which behave differently in training and evaluation mode
  model.train()

  train_loss = 0
  train_total = 0
  train_correct = 0

  # We loop over the data iterator, feed the inputs to the network, and adjust the weights.
  for batch_idx, (data, target) in enumerate(train_loader, start=0):

    # Move the input features and labels of this batch to the target device
    data, target = data.to(device), target.to(device)

    # Reset the gradients to 0 for all learnable weight parameters
    optimizer.zero_grad()

    # Forward pass: pass image data from the training dataset and predict the class each image belongs to (0-9 in this case)
    output = model(data)

    # Compute the loss with the loss function passed in as criterion
    loss = criterion(output, target)
    train_loss += loss.item()

    # The predicted class is the index of the highest score
    scores, predictions = torch.max(output.data, 1)
    train_total += target.size(0)
    train_correct += (predictions == target).sum().item()

    # Backward pass: compute the gradients of the loss w.r.t. the model's parameters
    loss.backward()

    # Update the neural network weights
    optimizer.step()

  acc = round((train_correct / train_total) * 100, 2)
  print('Epoch [{}], Loss: {}, Accuracy: {}'.format(epoch, train_loss/train_total, acc), end='')
  wandb.log({'Train Loss': train_loss/train_total, 'Train Accuracy': acc})
  
Now let us train our model and use Weights & Biases to measure its performance. 
3. Train the Model
In PyTorch, the core of the training step looks like this:
output_batch = model(train_batch) # get the model predictions
loss = loss_fn(output_batch, labels_batch)  # calculate the loss

optimizer.zero_grad()  # clear previous gradients - note: this step is very important!

loss.backward() # compute gradients of the loss w.r.t. all model parameters

optimizer.step() # update the network using the calculated gradients
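To see these five lines run end to end, here is a minimal sketch on toy data. The tiny linear model, the random batch, the nn.CrossEntropyLoss loss, and the SGD optimizer with a learning rate of 0.1 are all assumptions chosen for illustration; the report does not state which loss function or optimizer it uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and data
model = nn.Linear(4, 3)                       # 4 input features, 3 classes
loss_fn = nn.CrossEntropyLoss()               # assumed loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed optimizer

train_batch = torch.randn(8, 4)
labels_batch = torch.randint(0, 3, (8,))

weights_before = model.weight.detach().clone()

output_batch = model(train_batch)             # get the model predictions
loss = loss_fn(output_batch, labels_batch)    # calculate the loss
optimizer.zero_grad()                         # clear previous gradients
loss.backward()                               # compute gradients of the loss w.r.t. the parameters
optimizer.step()                              # update the parameters

# Re-evaluate the same batch to confirm the step moved the weights in a useful direction
loss_after = loss_fn(model(train_batch), labels_batch)
print(loss.item(), loss_after.item())
```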
The test step is similar, with two key differences:
In the training step, we use model.train(), while in the test step, we use model.eval().
In the test step, we do not backpropagate the loss, so no gradients need to be computed and the forward pass can be wrapped in torch.no_grad().
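A test step along these lines might look like the sketch below. The function name `evaluate` and its signature are our own and mirror the train function above; the report's actual test function is in the linked colab and may differ. Note that the method that switches a PyTorch module to evaluation mode is model.eval().

```python
import torch
import torch.nn as nn


def evaluate(model, device, test_loader, criterion):
    """Hypothetical test step: returns (average loss per sample, accuracy in percent)."""
    model.eval()  # evaluation mode for layers such as dropout and batchnorm
    test_loss, correct, total = 0.0, 0, 0

    with torch.no_grad():  # no gradients are needed, since we do not backpropagate
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # criterion returns the batch mean, so weight it by the batch size
            test_loss += criterion(output, target).item() * target.size(0)
            predictions = output.argmax(dim=1)
            correct += (predictions == target).sum().item()
            total += target.size(0)

    return test_loss / total, 100.0 * correct / total


# Usage on toy data: a tiny linear model and two random batches
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(2)]
avg_loss, accuracy = evaluate(toy_model, torch.device("cpu"), toy_loader, nn.CrossEntropyLoss())
print(avg_loss, accuracy)
```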
4. Visualize the Model Performance
We used wandb.log() in step 2 to log our Train Accuracy and Train Loss. Weights & Biases helps us save everything we need to compare and reproduce models: architecture, hyperparameters, weights, model predictions, GPU usage, git commits, and even datasets.
Let us train the model for 10 epochs and see the results automatically logged by Weights & Biases. We can see that the model trained well but quickly overfitted on the dataset. We can observe this from the diverging validation loss curve.
﻿
﻿
﻿
5. Observing the Effect of Tweaking Hyperparameters
Next, we'll change the value of a hyperparameter (kernel_size) and observe its effect on model performance using Weights & Biases. With Sweeps, you can try many values for many hyperparameters with just a few lines of code.
Check out this colab for the full code for running a Sweep with a PyTorch model.
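For orientation, a sweep over kernel sizes could be configured roughly as follows. This is only a sketch: the parameter names `conv1_kernel_size` and `conv2_kernel_size`, the candidate values, the grid search method, the project name, and the `launch_sweep` helper are all assumptions, and the colab's actual configuration may differ. The metric name test_loss is the one that appears in the report's plots.

```python
# Hypothetical sweep configuration: try every combination of kernel sizes
sweep_config = {
    "method": "grid",
    "metric": {"name": "test_loss", "goal": "minimize"},
    "parameters": {
        "conv1_kernel_size": {"values": [3, 5, 7]},  # assumed candidate values
        "conv2_kernel_size": {"values": [3, 5, 7]},  # assumed candidate values
    },
}


def launch_sweep(train_fn, project="cnn-kernel-size-sweep"):
    """Register the sweep and run it; requires wandb to be installed and logged in."""
    import wandb  # imported lazily so the configuration can be inspected without wandb

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn)
    return sweep_id


# Number of runs a grid search would launch for this configuration
num_combinations = 1
for spec in sweep_config["parameters"].values():
    num_combinations *= len(spec["values"])
print(num_combinations)  # 9
```

Here `train_fn` stands for a function that calls wandb.init(), reads the chosen kernel sizes from wandb.config, trains the model, and logs test_loss.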
Observations:
This experiment showcases the effect of changing kernel size, i.e., the size of the filter that strides over the input and performs convolutions. Weights & Biases automatically generates a few helpful plots that help us analyze the results of our hyperparameter search.
We can see that a kernel_size of 3 for both the convolutional layers led to the best model performance. Since the network was shallow, the small kernel sizes resulted in better resolution for feature extraction.
From the parameter importance plot, we can see that the first layer's kernel size is more strongly correlated with lower values of test_loss.
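One practical detail when varying kernel_size is that the flattened feature count feeding the first fully connected layer changes with it, so the hard-coded 16 * 5 * 5 only holds for 5x5 kernels. The sketch below shows one way to handle this; the class name `ConfigurableNet` and its constructor arguments are hypothetical and not taken from the report's colab.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConfigurableNet(nn.Module):
    """Hypothetical variant of Net whose kernel sizes are constructor arguments."""

    def __init__(self, conv1_kernel_size=5, conv2_kernel_size=5, image_size=32):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, conv1_kernel_size)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, conv2_kernel_size)

        # Unpadded, stride-1 convolutions shrink the size by (kernel - 1); each pool halves it
        size = (image_size - conv1_kernel_size + 1) // 2
        size = (size - conv2_kernel_size + 1) // 2
        self.flat_features = 16 * size * size

        self.fc1 = nn.Linear(self.flat_features, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


batch = torch.randn(2, 3, 32, 32)  # two random CIFAR-10-sized images
for k in (3, 5):
    net = ConfigurableNet(conv1_kernel_size=k, conv2_kernel_size=k)
    print(k, net.flat_features, net(batch).shape)
```

With 3x3 kernels the flattened size is 16 * 6 * 6 = 576, and with 5x5 kernels it is the familiar 16 * 5 * 5 = 400.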
﻿
﻿
Weights & Biases
Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.
﻿
﻿
Tags: Intermediate, Computer Vision, Domain Agnostic, Object Detection, PyTorch, Tutorial, CNN, Plots, Sweeps, CIFAR10