Implementing and Tracking the Performance of a CNN in PyTorch

This article provides a guide to implementing and tracking the performance of a Convolutional Neural Network (CNN) in PyTorch.
In this article, we will show you how to implement a Convolutional Neural Network in PyTorch. We will define the model's architecture, train the CNN, and leverage Weights & Biases to observe the effect of changing hyperparameters (like filter and kernel sizes) on model performance.
A CNN can extract spatial and temporal relationships in data with a known grid-like topology, e.g., images (a 2D grid of pixels) and audio or time-series data (a 1D grid of samples at regular intervals). You can see an example of a convolution operation below (source):

[Animation: a convolutional filter sliding over a 2D input grid]

1. Download and Prepare Data

For this report, we will use the CIFAR-10 dataset. Using torchvision, loading CIFAR-10 takes only a few lines.
import torch
import torchvision
import torchvision.transforms as transforms

BATCH_SIZE = 32

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)
# ... load test dataset ...

CLASS_NAMES = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
The outputs of a torchvision dataset (here, trainset) are PIL images with values in the range [0, 1]. The transforms.Compose pipeline above converts them into tensors normalized to [-1, 1].
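As a quick sanity check, we can pull a single batch from trainloader and confirm the tensor shapes and the normalized value range (a minimal sketch using the objects defined above):
images, labels = next(iter(trainloader))
print(images.shape)                              # torch.Size([32, 3, 32, 32]): (batch, channels, height, width)
print(images.min().item(), images.max().item())  # values now lie in [-1, 1]
print([CLASS_NAMES[l] for l in labels[:4].tolist()])  # human-readable labels for the first four images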

2. Define the CNN Model in PyTorch

Define the Model

In PyTorch, a model is defined by subclassing the torch.nn.Module class. We define our model, the Net class, in this way.
The model is defined in two steps: first, we specify the parameters of the model, then we outline how they are applied to the inputs. The __init__ method initializes the layers used in our model – in our example, these are the Conv2d, MaxPool2d, and Linear layers.
The forward method defines the feed-forward operation on the input data x.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened feature maps -> 120 units
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


Define the Convolution

Using torch.nn.Conv2d, we can apply a 2D convolution over an input signal (images in our dataset). The most important parameters of the convolutional layer are:
  • in_channels and out_channels: number of input and output channels, respectively.
  • kernel_size: the size of the convolutional filter (5x5 in our example).
  • stride: controls the stride for the cross-correlation, and can be a single number or a tuple.
  • padding: the number of implicit zero-padding points added to each side of the input, for each spatial dimension.
Our conv1 layer is initialized with 3 input channels (one per RGB color channel), 6 output channels, and a kernel size of 5.
Next, we add a pooling layer, torch.nn.MaxPool2d, which downsamples the feature maps by summarizing each patch with its maximum value. The most critical parameters for this layer are:
  • kernel_size – the size of the window to take a max over
  • stride – the stride of the window. The default value is kernel_size
  • padding – implicit zero padding to be added on both sides
Next, we flatten the output of the last convolutional or pooling layer so it can be fed into a fully connected network, which maps the extracted features to their corresponding classes. In PyTorch, this is done using an nn.Linear layer.
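To see where the 16 * 5 * 5 input size of fc1 comes from, it helps to trace a dummy CIFAR-10 batch through the layers (a small sketch reusing the Net class above; each 5x5 convolution without padding shrinks a side by 4, and each 2x2 pooling halves it):
net = Net()
x = torch.randn(1, 3, 32, 32)       # one dummy 32x32 RGB image
x = net.pool(F.relu(net.conv1(x)))
print(x.shape)                      # (1, 6, 14, 14): conv1 32 -> 28, pool 28 -> 14
x = net.pool(F.relu(net.conv2(x)))
print(x.shape)                      # (1, 16, 5, 5): conv2 14 -> 10, pool 10 -> 5, so 16 * 5 * 5 = 400 features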
In the forward method, you will see that we apply the ReLU activation (using F.relu) to each layer's output to introduce non-linearity and mitigate the vanishing gradient problem. With the model defined, here is the training function we will use; we unpack its core steps in the next section.
import wandb

def train(model, device, train_loader, optimizer, criterion, epoch):
    # Switch model to training mode. This is necessary for layers like
    # dropout and batchnorm, which behave differently in training and
    # evaluation mode.
    model.train()

    train_loss = 0
    train_total = 0
    train_correct = 0

    # Loop over the data iterator, feed the inputs to the network, and adjust the weights.
    for batch_idx, (data, target) in enumerate(train_loader, start=0):
        # Load the input features and labels from the training dataset
        data, target = data.to(device), target.to(device)
        # Reset the gradients to 0 for all learnable weight parameters
        optimizer.zero_grad()
        # Forward pass: predict the class each image belongs to (0-9 in this case)
        output = model(data)
        # Compute the loss
        loss = criterion(output, target)
        train_loss += loss.item()

        scores, predictions = torch.max(output.data, 1)
        train_total += target.size(0)
        train_correct += (predictions == target).sum().item()

        # Backward pass: compute the gradients of the loss w.r.t. the model's parameters
        loss.backward()
        # Update the neural network weights
        optimizer.step()

    acc = round((train_correct / train_total) * 100, 2)
    avg_loss = train_loss / len(train_loader)
    print('Epoch [{}], Loss: {:.4f}, Accuracy: {:.2f}%'.format(epoch, avg_loss, acc))
    wandb.log({'Train Loss': avg_loss, 'Train Accuracy': acc})
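To put the pieces together, a minimal driver for this function might look like the sketch below. The project name, learning rate, and optimizer choice are illustrative assumptions, not values from the original run:
import torch.optim as optim

wandb.init(project='pytorch-cnn-demo')  # hypothetical project name

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(1, 11):  # train for 10 epochs
    train(model, device, trainloader, optimizer, criterion, epoch)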
Now let us train our model and use Weights & Biases to measure its performance.

3. Train the Model

In PyTorch, the core of the training step looks like this:
output_batch = model(train_batch) # get the model predictions
loss = loss_fn(output_batch, labels_batch) # calculate the loss

optimizer.zero_grad() # clear previous gradients - note: this step is very important!

loss.backward() # compute gradients of all variables w.r.t. the loss

optimizer.step() # update the network using the calculated gradients
The test step is similar, with two key differences:
  • In the training step, we call model.train(), while in the test step, we call model.eval() (there is no model.test() in PyTorch).
  • In the test step, we do not backpropagate the loss, and we typically wrap the forward pass in torch.no_grad() so no gradients are computed (see the sketch below).
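For completeness, here is a sketch of such a test step, assuming a testloader built the same way as trainloader and the same criterion as in training:
def test(model, device, test_loader, criterion):
    model.eval()  # switch dropout/batchnorm layers to evaluation behavior
    test_loss = 0
    test_total = 0
    test_correct = 0

    with torch.no_grad():  # disable gradient tracking; we only need predictions
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            _, predictions = torch.max(output, 1)
            test_total += target.size(0)
            test_correct += (predictions == target).sum().item()

    acc = round((test_correct / test_total) * 100, 2)
    wandb.log({'Test Loss': test_loss / len(test_loader), 'Test Accuracy': acc})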

4. Visualize the Model Performance

We used wandb.log() in step 2 to log our Train Accuracy and Train Loss. Weights & Biases helps us save everything we need to compare and reproduce models — architecture, hyperparameters, weights, model predictions, GPU usage, git commits, and even datasets.
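If you also want gradient and parameter histograms, wandb.watch can hook into the model with one extra line after wandb.init (the log frequency below is just an illustrative choice):
wandb.watch(model, log='gradients', log_freq=100)  # log gradient histograms every 100 batches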
Let us train the model for 10 epochs and see the results automatically logged by Weights & Biases. We can see that the model trained well but quickly overfit the dataset, as the diverging validation loss curve shows.



[W&B panel: training and validation curves for this run]



5. Observing the Effect of Tweaking Hyperparameters

Next, we'll change the values of a hyperparameter (kernel_size) and observe its effect on model performance using Weights & Biases. You can try a wide range of values for many hyperparameters with just a few lines of code using Sweeps.
Check out this Colab for the full code for running a Sweep with a PyTorch model.
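As an illustration, a grid sweep over the two kernel sizes could be configured like the sketch below. The value grid, the metric name, and the train_sweep entry point are assumptions for this example, and Net would need to accept the kernel sizes as constructor arguments:
sweep_config = {
    'method': 'grid',
    'metric': {'name': 'Test Loss', 'goal': 'minimize'},
    'parameters': {
        'kernel_size_1': {'values': [3, 5, 7]},
        'kernel_size_2': {'values': [3, 5, 7]},
    },
}

def train_sweep():
    # hypothetical entry point: wandb.agent calls this once per configuration
    with wandb.init() as run:
        k1, k2 = run.config.kernel_size_1, run.config.kernel_size_2
        # build a Net variant with these kernel sizes, then train and log
        # 'Test Loss' exactly as in the sections above
        ...

sweep_id = wandb.sweep(sweep_config, project='pytorch-cnn-demo')
wandb.agent(sweep_id, function=train_sweep)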

Observations:

This experiment showcases the effect of changing the kernel size, i.e., the size of the filter that strides over the input and performs convolutions. Weights & Biases automatically generates several plots that help us analyze the results of our hyperparameter search.
  • We can see that a kernel_size of 3 for both the convolutional layers led to the best model performance. Since the network was shallow, the small kernel sizes resulted in better resolution for feature extraction.
  • From the parameter importance plot, we can see that the first layer's kernel size is more strongly correlated with lower values of test_loss.


[W&B panel: sweep results across 20 runs]



Weights & Biases

Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.