In this tutorial, we will show you how to implement a Convolutional Neural Network in PyTorch. We will define the model's architecture, train the CNN, and leverage Weights and Biases to observe the effect of changing hyperparameters (like filter and kernel sizes) on model performance.

A Convolutional Neural Network can extract spatial and temporal relationships in data with a known grid-like topology, e.g., images (a 2D grid of pixels) and audio or time series data (a 1D grid of samples taken at regular intervals). You can see an example of a convolution operation below (source):

[Animation: a convolution operation]

Full code in colab →

1. Download and prepare data

For this report, we will use the CIFAR-10 dataset, which torchvision makes easy to load.


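import torch
import torchvision
import torchvision.transforms as transforms

BATCH_SIZE = 4  # assumed value; the batch size is not stated in this excerpt

# Convert PIL images to tensors in [0, 1], then normalize each channel to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)
# ... load test dataset ...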

CLASS_NAMES = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

The torchvision dataset (trainset) returns PILImage images. transforms.ToTensor converts them to tensors with values in the range [0, 1], and transforms.Normalize, with a mean and standard deviation of 0.5 per channel, then maps those values to the range [-1, 1].
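Normalize computes (x - mean) / std for each channel, so with mean 0.5 and std 0.5 a pixel value of 0 becomes -1, 0.5 becomes 0, and 1 becomes 1. The sketch below applies that arithmetic by hand to a tiny fake image; the `normalize` helper is a hypothetical name used only for illustration, not part of torchvision.

```python
import torch

def normalize(img: torch.Tensor, mean, std) -> torch.Tensor:
    """Per-channel (x - mean) / std, the formula transforms.Normalize applies."""
    # Reshape mean/std to (C, 1, 1) so they broadcast over height and width
    mean_t = torch.tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std_t = torch.tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img - mean_t) / std_t

# Fake 3-channel image of size 1x3 whose pixels already lie in [0, 1] (as after ToTensor)
img = torch.tensor([0.0, 0.5, 1.0]).view(1, 1, 3).repeat(3, 1, 1)
out = normalize(img, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(out[0])  # each channel becomes -1, 0, 1
```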

2. Define the CNN model in PyTorch

Define the model

In PyTorch, a model is defined by subclassing the torch.nn.Module class. This is how we define our model, the Net class.

The model is defined in two steps: first, we specify the layers (and therefore the parameters) of our model, then we outline how they are applied to the inputs. The __init__ method initializes the layers used in our model; in our example, these are the Conv2d, MaxPool2d, and Linear layers.

The forward method defines the feed-forward operation on the input data x.

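import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 3 input channels (RGB) -> 6 feature maps, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # 2x2 max pooling with stride 2 halves height and width
        self.pool = nn.MaxPool2d(2, 2)
        # 6 -> 16 feature maps, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fully connected layers: 16 maps of 5x5 -> 120 -> 84 -> 10 classes
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (N, 6, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 16, 5, 5)
        x = x.view(-1, 16 * 5 * 5)            # flatten to (N, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                       # raw class scores (logits)
        return x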
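Where does 16 * 5 * 5 come from? One quick check is to push a dummy batch through the same layers and record the shapes. The sketch below rebuilds only the convolution and pooling layers of Net, with the same arguments, and assumes CIFAR-10's 3x32x32 inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The same feature-extraction layers as in Net
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.zeros(4, 3, 32, 32)  # dummy batch: 4 RGB images of 32x32 pixels
shapes = {"input": tuple(x.shape)}
x = F.relu(conv1(x))
shapes["conv1"] = tuple(x.shape)    # 32 - 5 + 1 = 28
x = pool(x)
shapes["pool1"] = tuple(x.shape)    # 28 / 2 = 14
x = F.relu(conv2(x))
shapes["conv2"] = tuple(x.shape)    # 14 - 5 + 1 = 10
x = pool(x)
shapes["pool2"] = tuple(x.shape)    # 10 / 2 = 5
x = x.view(-1, 16 * 5 * 5)
shapes["flatten"] = tuple(x.shape)  # 16 * 5 * 5 = 400 features per image

for name, shape in shapes.items():
    print(f"{name:>8}: {shape}")
```

If you change a kernel size or add a layer, rerunning a check like this shows the new input size that fc1 needs.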

Define the convolution

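Using torch.nn.Conv2d, we can apply a 2D convolution over an input signal (the images in our dataset). The most important parameters of the convolutional layer are:

- in_channels: the number of channels in the input (3 for RGB images)
- out_channels: the number of feature maps (filters) the layer produces
- kernel_size: the height and width of each convolution filter
- stride: how far the filter moves at each step (default 1)
- padding: how many zeros are added around the input border (default 0)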

Our conv1 layer is initialized with 3 input channels (the RGB channels of a CIFAR-10 image), 6 output channels, and a kernel size of 5.
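The spatial size of a convolution (or pooling) output along each dimension is floor((in + 2 * padding - kernel_size) / stride) + 1, ignoring dilation. Plugging in Net's layers (no padding, stride 1 for the convolutions, stride 2 for pooling) reproduces the 5x5 feature maps that fc1 expects. `conv_output_size` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension for Conv2d/MaxPool2d (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32                                               # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=5)            # conv1 -> 28
size = conv_output_size(size, kernel_size=2, stride=2)  # pool  -> 14
size = conv_output_size(size, kernel_size=5)            # conv2 -> 10
size = conv_output_size(size, kernel_size=2, stride=2)  # pool  -> 5
print(16 * size * size)  # 400, the in_features of fc1
```

With padding=1 and a 3x3 kernel the formula returns the input size unchanged, which is why that combination is common when you want to keep the spatial resolution.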

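Next, we add a pooling layer, torch.nn.MaxPool2d, which downsamples our feature maps by keeping only the maximum value in each patch of the feature map. The most critical parameters for this layer are:

- kernel_size: the size of the window over which the maximum is taken
- stride: the step between windows (defaults to kernel_size)

In Net, nn.MaxPool2d(2, 2) uses 2x2 windows with a stride of 2, which halves the height and width of each feature map.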
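To make the downsampling concrete, the sketch below applies the same 2x2, stride-2 pooling as Net to a single 4x4 feature map filled with the values 0 to 15 (an arbitrary toy input).

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One 4x4 single-channel feature map, shape (batch, channels, height, width)
fmap = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)
pooled = pool(fmap)

# Each non-overlapping 2x2 patch is replaced by its maximum:
# top-left {0, 1, 4, 5} -> 5, top-right {2, 3, 6, 7} -> 7,
# bottom-left {8, 9, 12, 13} -> 13, bottom-right {10, 11, 14, 15} -> 15
print(pooled.shape)  # torch.Size([1, 1, 2, 2])
```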

Next, we flatten the output of the last pooling layer (x.view(-1, 16 * 5 * 5) in the forward method) so it can be fed into a fully connected network that maps the extracted features to their corresponding classes. In PyTorch, these fully connected layers are defined with nn.Linear.
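A minimal sketch of the flatten-then-Linear step on a random tensor shaped like conv2's pooled output. `torch.flatten(x, 1)` is shown as an equivalent alternative to the `view` call used in Net; it flattens every dimension except the batch dimension.

```python
import torch
import torch.nn as nn

features = torch.randn(8, 16, 5, 5)  # batch of 8 pooled feature maps, as produced inside Net

flat_view = features.view(-1, 16 * 5 * 5)      # what Net.forward does
flat_fn = torch.flatten(features, start_dim=1)  # same values, batch dimension kept

fc1 = nn.Linear(16 * 5 * 5, 120)
hidden = fc1(flat_view)
print(flat_view.shape, hidden.shape)  # torch.Size([8, 400]) torch.Size([8, 120])
```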

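In the forward method, we apply the ReLU activation (F.relu) to the output of every layer except the last. ReLU adds non-linearity, and because its gradient is exactly 1 for positive inputs, it is less prone to the vanishing gradient problem than saturating activations such as sigmoid or tanh.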
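The vanishing-gradient argument can be checked with autograd. The derivative of the sigmoid, s(x)(1 - s(x)), is at most 0.25 and approaches 0 for large |x|, while ReLU's derivative is 1 for every positive input. The input values below are arbitrary.

```python
import torch

x = torch.tensor([-4.0, 0.5, 4.0, 10.0], requires_grad=True)

# Gradient of ReLU at each point
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

# Clear the stored gradient, then compute the sigmoid's gradient at the same points
x.grad = None
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

print(relu_grad)     # 0 for the negative input, 1 for the positive ones
print(sigmoid_grad)  # never above 0.25; roughly 4.5e-5 at x = 10
```

Stacking many layers multiplies such factors together, which is why small sigmoid derivatives can shrink the gradient reaching early layers.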

import torch
import wandb

def train(model, device, train_loader, optimizer, criterion, epoch, steps_per_epoch=20):
  # Switch model to training mode. This is necessary for layers like dropout, batchnorm, etc., which behave differently in training and evaluation mode
  model.train()

  train_loss = 0
  train_total = 0
  train_correct = 0

  # We loop over the data iterator, feed the inputs to the network, and adjust the weights.
  for batch_idx, (data, target) in enumerate(train_loader, start=0):
    # Load the input features and labels from the training dataset onto the device
    data, target = data.to(device), target.to(device)
    # Reset the gradients to 0 for all learnable weight parameters
    optimizer.zero_grad()
    # Forward pass: pass image data from the training dataset and predict the class each image belongs to (0-9 in this case)
    output = model(data)
    # Compute the loss with the criterion passed in (e.g. nn.CrossEntropyLoss)
    loss = criterion(output, target)
    # criterion returns the batch mean, so weight it by the batch size before summing
    train_loss += loss.item() * target.size(0)

    # The predicted class is the index of the highest score
    scores, predictions = torch.max(output.data, 1)
    train_total += target.size(0)
    train_correct += (predictions == target).sum().item()

    # Backward pass: compute the gradients of the loss w.r.t. the model's parameters
    loss.backward()
    # Update the neural network weights
    optimizer.step()

  acc = round((train_correct / train_total) * 100, 2)
  print('Epoch [{}], Loss: {}, Accuracy: {}'.format(epoch, train_loss/train_total, acc), end='')
  wandb.log({'Train Loss': train_loss/train_total, 'Train Accuracy': acc})

Now let us train our model and use Weights and Biases to measure its performance.

3. Train the model

In PyTorch, the core of the training step looks like this:

output_batch = model(train_batch) # get the model predictions
loss = loss_fn(output_batch, labels_batch)  # calculate the loss

optimizer.zero_grad()  # clear previous gradients - note: this step is very important!

loss.backward() # compute gradients of the loss w.r.t. all model parameters

optimizer.step() # update the network using the calculated gradients
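Why is `optimizer.zero_grad()` so important? PyTorch adds each new gradient to whatever is already stored in `.grad` instead of overwriting it. The toy one-parameter example below (not from the original code) shows the gradient doubling when it is not cleared.

```python
import torch

w = torch.nn.Parameter(torch.tensor(2.0))
optimizer = torch.optim.SGD([w], lr=0.1)

def loss_fn():
    return (w * 3.0) ** 2  # d(loss)/dw = 18 * w = 36 at w = 2

loss_fn().backward()
first = w.grad.item()        # 36.0

loss_fn().backward()         # no zero_grad(): the new gradient is added to the old one
accumulated = w.grad.item()  # 72.0

optimizer.zero_grad()        # clear the stored gradient
loss_fn().backward()
cleared = w.grad.item()      # 36.0 again

print(first, accumulated, cleared)  # 36.0 72.0 36.0
```

Calling `optimizer.step()` on the accumulated value would take a step twice as large as intended, which is the silent bug this line prevents.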

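The test step is similar, with two key differences:

- we switch the model to evaluation mode with model.eval(), so layers such as dropout and batchnorm use their inference behavior, and
- we run the loop inside torch.no_grad() and skip loss.backward() and optimizer.step(), because the weights are not updated during testing.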
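A minimal sketch of such a test step, assuming it mirrors train(): it calls model.eval(), runs inside torch.no_grad(), and never calls backward() or step(). The function name `evaluate`, its signature, and the toy model and random data in the usage part are hypothetical stand-ins for Net and the CIFAR-10 test loader.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, device, test_loader, criterion):
    """Hypothetical test step: no gradients, no weight updates."""
    model.eval()  # dropout off, batchnorm uses its running statistics
    test_loss, test_total, test_correct = 0.0, 0, 0
    with torch.no_grad():  # do not build the autograd graph
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # criterion returns the batch mean, so weight it by the batch size
            test_loss += criterion(output, target).item() * target.size(0)
            predictions = output.argmax(dim=1)
            test_total += target.size(0)
            test_correct += (predictions == target).sum().item()
    return test_loss / test_total, 100.0 * test_correct / test_total

# Toy usage on random data
torch.manual_seed(0)
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss, acc = evaluate(model, torch.device("cpu"), loader, nn.CrossEntropyLoss())
print(f"Test loss: {loss:.4f}, Test accuracy: {acc:.2f}%")
```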

4. Visualize the model performance

We used wandb.log() in step 2 to log our Train Accuracy and Train Loss. Weights & Biases helps us save everything we need to compare and reproduce models — architecture, hyperparameters, weights, model predictions, GPU usage, git commits, and even datasets.

Let us train the model for 10 epochs and look at the results automatically logged by Weights & Biases. The model trains well but quickly overfits the training data, as the diverging validation loss curve shows.


5. Observing the Effect of Tweaking Hyperparameters

Next, we'll change the value of a hyperparameter (kernel_size) and observe its effect on model performance using Weights & Biases. With Sweeps, you can try many values for many hyperparameters with just a few lines of code.
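A sweep is driven by a configuration dictionary. The sketch below is a hypothetical configuration for this experiment, not the one from the linked Colab: a grid search over `kernel_size` that reuses the 'Train Loss' key logged by train(). The kernel values, the project name, and the `launch_sweep` helper are placeholders.

```python
# Hypothetical sweep configuration: grid search over kernel_size
sweep_config = {
    "method": "grid",  # try every listed value; "random" and "bayes" are also supported
    "metric": {"name": "Train Loss", "goal": "minimize"},  # must match a key passed to wandb.log()
    "parameters": {
        "kernel_size": {"values": [3, 5, 7]},  # illustrative values
    },
}

def launch_sweep(train_fn, project="cnn-kernel-size"):
    """Register the sweep and let an agent run train_fn once per config (needs a W&B login)."""
    import wandb

    sweep_id = wandb.sweep(sweep_config, project=project)
    # train_fn is expected to call wandb.init() and read wandb.config.kernel_size
    wandb.agent(sweep_id, function=train_fn)
```

In practice a validation metric is usually a better sweep target than training loss, since it reflects the overfitting visible in the earlier plots.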

Check out this Colab for the full code for running a Sweep with a PyTorch model.


This experiment shows the effect of changing the kernel size, i.e., the size of the filter that slides over the input and performs the convolution. Weights & Biases automatically generates a few helpful plots for analyzing the results of our hyperparameter search.
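To sweep over the kernel size, the model has to accept it as an argument, and the hard-coded 16 * 5 * 5 in Net only holds for kernel_size=5. A minimal sketch, assuming 32x32 inputs: `KernelNet` (a hypothetical name) keeps Net's layer layout but measures the flattened size with a dummy forward pass in `__init__`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelNet(nn.Module):
    """Hypothetical variant of Net with a configurable convolution kernel size."""

    def __init__(self, kernel_size: int = 5, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size)
        # The flattened size depends on kernel_size, so measure it on a dummy 32x32 image
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, 3, 32, 32)).shape[1]
        self.fc1 = nn.Linear(n_flat, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return torch.flatten(x, 1)

    def forward(self, x):
        x = self._features(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

for k in (3, 5, 7):
    model = KernelNet(kernel_size=k)
    out = model(torch.zeros(2, 3, 32, 32))
    print(k, model.fc1.in_features, tuple(out.shape))
# 3 576 (2, 10)
# 5 400 (2, 10)
# 7 144 (2, 10)
```

Larger kernels shrink the feature maps faster (32 -> 26 -> 13 -> 7 -> 3 for a 7x7 kernel), so fc1 receives fewer inputs, one reason kernel size affects both capacity and accuracy.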


Weights & Biases

Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
