PyTorch Dropout for regularization - tutorial
Learn how to regularize your PyTorch model with Dropout, complete with a code tutorial and interactive visualizations
Created on July 2|Last edited on December 13
In this report, we'll see an example of adding dropout regularization to a PyTorch model and observe the effect dropout has on the model's performance by tracking our models in Weights & Biases.
For the unfamiliar, we'll first discuss what dropout does. If you already know this, you can jump to the tutorial.
What We'll Be Covering
- What is Dropout regularization in machine learning?
- Understanding torch.nn.Dropout
- Adding Dropout to a PyTorch model
  - 1. Add Dropout to a PyTorch Model
  - 2. Observe the Effect of Dropout on Model performance
- Impact of using Dropout regularization in PyTorch
- Try Weights & Biases
- Where to apply Dropout in your neural network
- Advanced dropout techniques and considerations
- Conclusion
What is Dropout regularization in machine learning?
Dropout regularization is a machine learning technique in which units of a neural net are randomly removed (or "dropped out") during training, which approximates training a large number of thinned architectures simultaneously. Importantly, dropout can substantially reduce the chance of overfitting during training.
Overfitting occurs when a neural network learns the training data too well, including its noise and specific patterns, leading to poor performance on new, unseen data. Dropout addresses this by introducing randomness during training. By randomly "dropping out" (setting to zero) a fraction of neurons in each training iteration, you prevent the network from relying too heavily on any single neuron or feature. This encourages the network to learn more distributed and robust representations.
This also mitigates the co-adaptation of neurons, where neurons become overly specialized to work together, hindering generalization. The concept of dropout was introduced by Hinton et al. in the 2012 paper "Improving neural networks by preventing co-adaptation of feature detectors". It has since become a staple in deep learning because it improves the generalization capabilities of models.
Run an example of dropout in PyTorch in this Colab →
Understanding torch.nn.Dropout
PyTorch provides a convenient way to implement dropout using the torch.nn.Dropout class. During training, this class randomly zeroes elements of an input tensor, effectively "dropping out" neurons, and scales the remaining elements by 1/(1 - p) so that the expected activation is unchanged. In evaluation mode, it passes the input through untouched. The key parameters are p, which defines the probability of zeroing an element, and inplace, which determines whether the operation is performed in-place.
The p parameter, often referred to as the dropout rate, controls the fraction of neurons that are randomly deactivated during each training step. A higher p value means more neurons are dropped, leading to stronger regularization. Typical values for p range from 0.2 to 0.5, but the optimal value depends on the specific network and dataset.
The inplace parameter, when set to True, modifies the input tensor directly, saving memory. However, in-place operations overwrite the original values, so they are unsuitable when the unmodified tensor is needed later, for example by another branch of the network or for gradient computation.
Here's a simple code example demonstrating how to instantiate and use nn.Dropout
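A minimal sketch of such an example follows. The tensor shape and the value p=0.5 are arbitrary choices for illustration. It shows that dropout is active in training mode and becomes a no-op in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)  # inplace defaults to False
x = torch.ones(2, 8)

# Modules start in training mode; train() is called here only to be explicit.
dropout.train()
train_out = dropout(x)
# Surviving elements are scaled by 1 / (1 - p) = 2.0; dropped ones are 0.
print(train_out)

# In evaluation mode, dropout passes the input through unchanged.
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))  # True
```

Because inplace is left at its default of False, `x` itself is not modified by either call.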
Adding Dropout to a PyTorch model
Here are simple step-by-step instructions on how to add dropout regularization to a PyTorch model.
1. Add Dropout to a PyTorch Model
Adding dropout to your PyTorch models is very straightforward with the torch.nn.Dropout class, which takes in the dropout rate – the probability of a neuron being deactivated – as a parameter.
self.dropout = nn.Dropout(0.25)
We can apply dropout after any non-output layer.
2. Observe the Effect of Dropout on Model performance
To observe the effect of dropout, we'll train a model to do image classification. We'll first train an unregularized network, followed by a network regularized through Dropout. The models are trained on the CIFAR-10 dataset for 15 epochs each.
A complete example of adding dropout to a PyTorch model:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input_shape=(3, 32, 32)):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.pool = nn.MaxPool2d(2, 2)
        # _get_conv_output and _forward_features are helper methods not shown in this excerpt
        n_size = self._get_conv_output(input_shape)
        self.fc1 = nn.Linear(n_size, 512)
        self.fc2 = nn.Linear(512, 10)
        # Define proportion of neurons to drop out
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        x = self._forward_features(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        # Apply dropout
        x = self.dropout(x)
        x = self.fc2(x)
        return x
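The class above calls two helper methods, `_get_conv_output` and `_forward_features`, whose bodies are not shown. The sketch below fills them in so the model can be run end to end. Both bodies are assumptions: the feature extractor is taken to be three conv, ReLU, max-pool stages, and the output size is found by passing a dummy tensor through them. The actual Colab may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, input_shape=(3, 32, 32)):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.pool = nn.MaxPool2d(2, 2)
        n_size = self._get_conv_output(input_shape)
        self.fc1 = nn.Linear(n_size, 512)
        self.fc2 = nn.Linear(512, 10)
        self.dropout = nn.Dropout(0.25)

    def _forward_features(self, x):
        # Assumed layout: conv -> ReLU -> max-pool, three times.
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        return x

    def _get_conv_output(self, shape):
        # Assumed approach: run a dummy batch through the conv stack
        # and read off the flattened feature size.
        with torch.no_grad():
            features = self._forward_features(torch.zeros(1, *shape))
        return features.view(1, -1).size(1)

    def forward(self, x):
        x = self._forward_features(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)            # dropout on the flattened conv features
        x = F.relu(self.fc1(x))
        x = self.dropout(x)            # dropout after the hidden activation
        x = self.fc2(x)                # no dropout on the output layer
        return x


model = Net()
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Under these assumptions, a 32x32 input shrinks to a 128x2x2 feature map, so fc1 receives 512 input features.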
By using wandb.log() in your training function, you can automatically track the performance of your model. See docs for full details.
import wandb

def train(model, device, train_loader, optimizer, criterion, epoch, steps_per_epoch=20):
    # Log gradients and model parameters
    wandb.watch(model)
    # Loop over the data iterator, feed the inputs to the network, and adjust the weights.
    for batch_idx, (data, target) in enumerate(train_loader, start=0):
        # ...
    acc = round((train_correct / train_total) * 100, 2)
    # Log metrics to visualize performance
    wandb.log({'Train Loss': train_loss/train_total, 'Train Accuracy': acc})
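Because dropout behaves differently during training and evaluation, a validation pass should switch the model to evaluation mode first. The following is a hedged sketch of a companion to `train`. The function name `evaluate`, the metric keys, and the toy model and data are all hypothetical and are not taken from the original Colab. It returns the metrics as a dictionary rather than logging them, so it runs without a W&B account.

```python
import torch
import torch.nn as nn


def evaluate(model, device, val_loader, criterion):
    """Hypothetical validation helper; returns metrics instead of logging them."""
    model.eval()  # disables dropout for the validation pass
    val_loss, val_correct, val_total = 0.0, 0, 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # criterion returns the batch mean, so weight it by the batch size.
            val_loss += criterion(output, target).item() * target.size(0)
            val_correct += (output.argmax(dim=1) == target).sum().item()
            val_total += target.size(0)
    return {
        "Validation Loss": val_loss / val_total,
        "Validation Accuracy": round(val_correct / val_total * 100, 2),
    }


# Toy model and random data, used only to show that the function runs.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.25), nn.Linear(16, 3))
toy_batches = [(torch.randn(5, 8), torch.randint(0, 3, (5,))) for _ in range(4)]
metrics = evaluate(toy_model, torch.device("cpu"), toy_batches, nn.CrossEntropyLoss())
print(metrics)
```

The returned dictionary can be passed straight to wandb.log(). Remember to call model.train() again before the next training epoch so that dropout is re-enabled.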
Impact of using Dropout regularization in PyTorch
You may be asking, "What is the impact of using Dropout regularization?" Comparing the two runs, you'll see:
- The unregularized network quickly overfits the training dataset. Notice how the validation loss for the without-dropout run diverges sharply after just a few epochs. This accounts for its higher generalization error.
- Training with two dropout layers, each with a dropout probability of 25%, prevents the model from overfitting. However, it also lowers the training accuracy, which means a regularized network has to be trained longer.
- Dropout improves the model's generalization. Even though the training accuracy is lower than that of the unregularized network, the validation accuracy is higher. This accounts for a lower generalization error.
And that concludes this short tutorial on using dropout in your PyTorch models.
Try our dropout colab yourself →
Try Weights & Biases
Weights & Biases helps you keep track of your machine learning experiments. Try our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
To run two quick experiments on Replit and see how W&B can help organise your work, follow the instructions below:
Instructions:
- Click the green "Run" button below (the first time you click Run, Replit will take approx 30-45 seconds to allocate a machine)
- Follow the prompts in the terminal window (the bottom right pane below)
- You can resize the terminal window (bottom right) for a larger view
Where to apply Dropout in your neural network
Where you place dropout layers within a neural network affects how well they work. Dropout is most commonly applied after the activation functions of hidden layers, and sometimes to the input layer.
Applying dropout after hidden-layer activations helps to prevent overfitting by reducing the co-adaptation of neurons within those layers. However, the optimal placement can vary depending on the specific architecture and task.
For example, in convolutional networks such as the one in this tutorial, dropout is often applied to the fully connected layers near the output, which tend to hold most of the parameters. Applying dropout to the input layer acts like input noise: it forces the model to learn more robust representations by not relying on any single input feature. However, too much dropout on the input layer can lead to underfitting, so input dropout rates are usually kept lower than hidden-layer rates.
The decision of where to apply dropout is often an empirical one, requiring experimentation to find the best configuration for a given task.
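As one illustration of these placement choices, here is a sketch of a small fully connected network. The class name `MLPWithDropout`, the layer sizes, and the rates p_input=0.1 and p_hidden=0.5 are assumptions chosen for the example, not recommendations from the tutorial. It uses light dropout on the inputs, dropout after each hidden activation, and none on the output layer.

```python
import torch
import torch.nn as nn


class MLPWithDropout(nn.Module):
    """Hypothetical MLP illustrating common dropout placements."""

    def __init__(self, in_features=20, hidden=64, num_classes=3,
                 p_input=0.1, p_hidden=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(p_input),             # light dropout on the raw inputs
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(p_hidden),            # after the first hidden activation
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p_hidden),            # after the second hidden activation
            nn.Linear(hidden, num_classes),  # no dropout on the output layer
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
mlp = MLPWithDropout()
batch = torch.randn(8, 20)

mlp.train()
train_a, train_b = mlp(batch), mlp(batch)  # different random masks per call

mlp.eval()
eval_a, eval_b = mlp(batch), mlp(batch)    # dropout disabled, so deterministic

print(torch.equal(train_a, train_b), torch.equal(eval_a, eval_b))
```

Exposing the two rates as constructor arguments makes it easy to sweep over them when searching for the best configuration empirically.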
Advanced dropout techniques and considerations
While standard dropout is effective, there are advanced techniques and considerations that can further improve its performance. These include adaptive dropout rates and dropout variants.
Adaptive dropout rates involve adjusting the dropout rate dynamically during training, based on the model's performance. This can help to fine-tune the regularization strength and improve the model's generalization capabilities.
Other dropout variants, such as variational dropout, introduce more sophisticated mechanisms for randomly dropping neurons. These variants can sometimes lead to better performance than standard dropout, especially in complex models.
Choosing the appropriate dropout rate for different layers is also crucial. It's often beneficial to experiment with different dropout rates for different layers to find the optimal configuration for a given task.
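One simple way to change the dropout rate during training is to update the `p` attribute of each nn.Dropout module between epochs. The sketch below uses a fixed linear schedule rather than a performance-driven rule, which is a deliberate simplification. The helpers `set_dropout_rate` and `linear_dropout_schedule`, and the start and end rates, are hypothetical. A performance-based variant would replace the schedule function with one that looks at validation metrics.

```python
import torch
import torch.nn as nn


def set_dropout_rate(model: nn.Module, p: float) -> None:
    """Set p on every nn.Dropout module in the model (hypothetical helper)."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p


def linear_dropout_schedule(epoch: int, num_epochs: int,
                            p_start: float = 0.5, p_end: float = 0.1) -> float:
    """Linearly interpolate the dropout rate from p_start to p_end."""
    if num_epochs <= 1:
        return p_end
    fraction = epoch / (num_epochs - 1)
    return p_start + fraction * (p_end - p_start)


model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 2))
num_epochs = 5
rates = []
for epoch in range(num_epochs):
    p = linear_dropout_schedule(epoch, num_epochs)
    set_dropout_rate(model, p)
    rates.append(round(p, 2))
    # ... run one training epoch here ...

print(rates)  # [0.5, 0.4, 0.3, 0.2, 0.1]
```

This works because nn.Dropout reads its `p` attribute on every forward pass, so a new value takes effect from the next batch onward.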
Conclusion
Dropout is a powerful and versatile regularization technique that plays a crucial role in preventing overfitting in neural networks. By randomly deactivating neurons during training, dropout encourages the network to learn more robust and generalizable features. Understanding how to use torch.nn.Dropout in PyTorch effectively is essential for building high-performing deep learning models.
