In this tutorial, we will show you how to implement a Convolutional Neural Network in PyTorch. We will define the model's architecture, train the CNN, and leverage Weights & Biases to observe the effect of changing hyperparameters (like filter and kernel sizes) on model performance.
A Convolutional Neural Network can extract spatial and temporal relationships in data with a known grid-like topology, e.g., images (a 2D grid of pixels) and audio or time-series data (a 1D grid of samples at regular intervals).
For this report, we will use the CIFAR-10 dataset. Using torchvision, it is effortless to load CIFAR-10.
import torch
import torchvision
import torchvision.transforms as transforms

BATCH_SIZE = 32

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)

# ... load test dataset ...

CLASS_NAMES = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
The output of a torchvision dataset (trainset) is a set of PILImage images with values in the range [0, 1]. Using the transform pipeline defined above (transforms.ToTensor and transforms.Normalize), we convert them into tensors normalized to the range [-1, 1].
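As a quick sanity check (a minimal sketch using the objects defined above), we can pull one batch from trainloader and confirm the shapes and value range:

# Fetch one batch and inspect it
images, labels = next(iter(trainloader))
print(images.shape)                              # torch.Size([32, 3, 32, 32])
print(images.min().item(), images.max().item())  # approximately -1.0 and 1.0 after normalization
print(CLASS_NAMES[labels[0].item()])             # human-readable class of the first image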
In PyTorch, a model is defined by subclassing the torch.nn.Module class. We define our model, the Net class, this way.
The model is defined in two steps: first, we specify the parameters of our model; then, we outline how they are applied to the inputs. The __init__ method initializes the layers used in our model – in our example, these are the Conv2d, MaxPool2d, and Linear layers. The forward method defines the feed-forward operation on the input data x.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 input channels, 16 output channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened 16x5x5 feature maps -> 120 units
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 84 units -> 10 class scores

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten the feature maps into a vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Using torch.nn.Conv2d, we can apply a 2D convolution over an input signal (the images in our dataset). The most important parameters of the convolutional layer are:

- in_channels and out_channels: the number of input and output channels, respectively.
- kernel_size: the size of the convolving kernel.
- stride: controls the stride for the cross-correlation; can be a single number or a tuple.
- padding: the amount of implicit zero-padding added on both sides for each dimension.

Our conv1 layer is initialized with 3 input channels, 6 output channels, and a kernel size of 5.
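To make these parameters concrete, here is a small shape check (a sketch; the output size follows the formula (in - kernel_size + 2*padding) / stride + 1):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized RGB image
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)  # stride=1, padding=0 by default
padded = nn.Conv2d(3, 6, kernel_size=5, padding=2)  # padding=2 preserves the 32x32 spatial size
print(conv(x).shape)    # torch.Size([1, 6, 28, 28]): (32 - 5 + 0) / 1 + 1 = 28
print(padded(x).shape)  # torch.Size([1, 6, 32, 32]): (32 - 5 + 4) / 1 + 1 = 32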
Next, we add a pooling layer, torch.nn.MaxPool2d, which downsamples our feature maps by summarizing features in patches of the feature map. The most critical parameters for this layer are:

- kernel_size – the size of the window to take a max over.
- stride – the stride of the window. The default value is kernel_size.
- padding – implicit zero padding to be added on both sides.

Next, we flatten the last convolutional (or pooling) layer's output so it can be fed into a fully connected neural network, which maps the extracted features to their corresponding classes. In PyTorch, this is done using the nn.Linear layer.
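This flattening is also where the 16 * 5 * 5 input size of fc1 comes from. A quick shape trace through the convolutional stack (a sketch, reusing the imports above):

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized input
x = nn.MaxPool2d(2, 2)(torch.relu(nn.Conv2d(3, 6, 5)(x)))   # conv1: 32 -> 28, pool: 28 -> 14
x = nn.MaxPool2d(2, 2)(torch.relu(nn.Conv2d(6, 16, 5)(x)))  # conv2: 14 -> 10, pool: 10 -> 5
print(x.view(-1, 16 * 5 * 5).shape)  # torch.Size([1, 400]): the input size fc1 expects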
In the forward method, you will see that we apply the ReLU activation (using F.relu) to each layer's output to help avoid the vanishing gradient problem.
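As a tiny illustration (a sketch), ReLU's gradient is exactly 1 for every positive input, so repeated activations don't shrink gradients as they flow backward through the network:

x = torch.tensor([-2.0, 3.0], requires_grad=True)
F.relu(x).sum().backward()
print(x.grad)  # tensor([0., 1.]): zero for negative inputs, one for positive inputs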
def train(model, device, train_loader, optimizer, criterion, epoch):
    # Switch the model to training mode. This is necessary for layers like
    # dropout and batchnorm, which behave differently in training and evaluation mode.
    model.train()
    train_loss = 0
    train_total = 0
    train_correct = 0

    # Loop over the data iterator, feed the inputs to the network, and adjust the weights.
    for batch_idx, (data, target) in enumerate(train_loader):
        # Load the input features and labels from the training dataset
        data, target = data.to(device), target.to(device)

        # Reset the gradients to 0 for all learnable weight parameters
        optimizer.zero_grad()

        # Forward pass: predict the class (0-9 in this case) each image belongs to
        output = model(data)

        # Compute the loss
        loss = criterion(output, target)
        train_loss += loss.item()

        # Track accuracy
        scores, predictions = torch.max(output.data, 1)
        train_total += target.size(0)
        train_correct += (predictions == target).sum().item()

        # Backward pass: compute the gradients of the loss w.r.t. the model's parameters
        loss.backward()

        # Update the neural network weights
        optimizer.step()

    acc = round((train_correct / train_total) * 100, 2)
    print('Epoch [{}], Loss: {}, Accuracy: {}'.format(epoch, train_loss / train_total, acc))
    wandb.log({'Train Loss': train_loss / train_total, 'Train Accuracy': acc})
Now let us train our model and use Weights & Biases to measure its performance.
In PyTorch, the core of the training step looks like this:
output_batch = model(train_batch) # get the model predictions
loss = loss_fn(output_batch, labels_batch) # calculate the loss
optimizer.zero_grad() # clear previous gradients - note: this step is very important!
loss.backward() # compute gradients of all variables w.r.t. the loss
optimizer.step() # update the network using the calculated gradients
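Putting it all together, a driver loop for 10 epochs might look like the following sketch (the project name and the SGD hyperparameters are our assumptions, not values from this report):

import torch.optim as optim
import wandb

wandb.init(project='pytorch-cnn-cifar10')  # hypothetical project name
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
criterion = nn.CrossEntropyLoss()  # standard loss for multi-class classification
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # assumed hyperparameters

for epoch in range(1, 11):
    train(model, device, trainloader, optimizer, criterion, epoch)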
The test step is similar, with two key differences:

- In the training step, we call model.train(), while in the test step, we call model.eval().
- The test step performs only a forward pass: there is no backpropagation and no weight update.

We used wandb.log() in the training step to log our Train Accuracy and Train Loss. Weights & Biases helps us save everything we need to compare and reproduce models — architecture, hyperparameters, weights, model predictions, GPU usage, git commits, and even datasets.
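Based on those two differences, a minimal test step might look like the following sketch (torch.no_grad() and the logged metric names are our assumptions):

def test(model, device, test_loader, criterion):
    model.eval()  # switch layers like dropout/batchnorm to evaluation behavior
    test_loss, test_total, test_correct = 0, 0, 0
    with torch.no_grad():  # forward pass only: no gradients, no weight updates
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            _, predictions = torch.max(output.data, 1)
            test_total += target.size(0)
            test_correct += (predictions == target).sum().item()
    acc = round((test_correct / test_total) * 100, 2)
    wandb.log({'Test Loss': test_loss / test_total, 'Test Accuracy': acc})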
Let us train the model for 10 epochs and see the results automatically logged by Weights & Biases. We can see that the model trained well but quickly overfitted on the dataset. We can observe this from the diverging validation loss curve.
Next, we'll change the value of a hyperparameter (kernel_size) and observe its effect on model performance using Weights & Biases. With Sweeps, you can try a wide range of values for any number of hyperparameters with just a few lines of code.
Check out this colab for full code for running a Sweep with a PyTorch model.
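For reference, a sweep configuration for this experiment could be written like this (a sketch; the parameter names kernel_size_1/kernel_size_2 and the project name are assumptions):

sweep_config = {
    'method': 'grid',  # try every combination of the listed values
    'metric': {'name': 'Test Loss', 'goal': 'minimize'},
    'parameters': {
        'kernel_size_1': {'values': [3, 5, 7]},  # hypothetical name: first conv layer's kernel size
        'kernel_size_2': {'values': [3, 5, 7]},  # hypothetical name: second conv layer's kernel size
    }
}

sweep_id = wandb.sweep(sweep_config, project='pytorch-cnn-cifar10')  # hypothetical project name
wandb.agent(sweep_id, function=train_with_config)  # train_with_config: your hypothetical sweep entry point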
This experiment showcases the effect of changing kernel size, i.e., the size of the filter that strides over the input and performs convolutions. Weights & Biases automatically generates several plots that make it easy to analyze the results of our hyperparameter search.
We can see that a kernel_size of 3 for both convolutional layers led to the best model performance. Since the network was shallow, the small kernel sizes resulted in better resolution for feature extraction.
From the parameter importance plot, we can see that the first layer's kernel size is more strongly correlated with lower values of test_loss.
Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.