
How to Set Random Seeds in PyTorch and TensorFlow

Learn how to set the random seed for everything in PyTorch and TensorFlow in this short tutorial, complete with code and interactive visualizations.
Created on September 9|Last edited on July 9
As many others have said about deep learning, most models aren't really learning anything. This holds even in Kaggle competitions, particularly the "RSNA-MICCAI Brain Tumor Radiogenomic Classification" competition, where it shows up as:
  1. A gap between the Public Leaderboard and the local CV AUC
  2. Models not training in the first place
The motivation for these experiments comes from the Chai Time Kaggle Talks episode with Anjum Sayed (Datasaurus) on the Weights & Biases channel. Anjum mentioned that a good way to check whether your models are learning anything is to simply change the random seed and see if it affects performance.


Here's what we'll cover:

Table of Contents

  The Code To Set Up Random Seeds
  Motivation For The Experiment
  Summary
Let's jump in!

The Code To Set Up Random Seeds

Following are code snippets you can readily drop into your codebase to set random seeds for all the libraries involved in a PyTorch or TensorFlow pipeline.

Setting Up Random Seeds In PyTorch

import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # When running on the CuDNN backend, two further options must be set
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Set a fixed value for the hash seed
    os.environ["PYTHONHASHSEED"] = str(seed)
    print(f"Random seed set as {seed}")
What most people forget while setting random seeds in PyTorch and TensorFlow is that you also need to seed NumPy, Python's built-in random module, and the Python hash seed. This function sets all of them to the same fixed value. Note that PYTHONHASHSEED is read when the interpreter starts, so setting it inside a running process mainly affects subprocesses you launch later; for full effect, set it in the environment before starting your script.
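The core idea, independent of any framework, is that a seeded pseudo-random generator always reproduces the same sequence. A minimal, dependency-free sketch using only Python's standard library illustrates what all the calls above buy you:

```python
import random


def sample_with_seed(seed: int, n: int = 5) -> list:
    """Seed the generator, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# The same seed always reproduces the same sequence ...
assert sample_with_seed(42) == sample_with_seed(42)
# ... while a different seed almost certainly gives a different one.
assert sample_with_seed(42) != sample_with_seed(1337)
```

Each library (NumPy, PyTorch, TensorFlow) keeps its own independent generator, which is exactly why `set_seed` has to seed every one of them.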

Setting Up Random Seeds In TensorFlow

import os
import random

import numpy as np
import tensorflow as tf


def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    tf.experimental.numpy.random.seed(seed)
    # When running on the CuDNN backend, two further options must be set
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    # Set a fixed value for the hash seed
    os.environ["PYTHONHASHSEED"] = str(seed)
    print(f"Random seed set as {seed}")
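A framework-agnostic way to sanity-check that a seeding function actually works is to run the same pipeline step twice under the same seed and compare outputs. A minimal sketch, where `set_seed` only seeds Python's built-in generator to stay dependency-free, and `make_batch` is a hypothetical stand-in for one step of a data pipeline:

```python
import random


def set_seed(seed: int = 42) -> None:
    # Stand-in for the full PyTorch/TensorFlow version above; here we
    # only seed Python's built-in generator.
    random.seed(seed)


def make_batch(n: int = 4) -> list:
    # Hypothetical stand-in for one pipeline step that consumes
    # randomness (shuffling, sampling, augmentation, ...).
    return [random.randint(0, 100) for _ in range(n)]


set_seed(123)
first = make_batch()
set_seed(123)
second = make_batch()
assert first == second  # identical seed -> identical pipeline output
```

If this check fails in a real pipeline, some source of randomness (a worker process, a GPU kernel, a library with its own generator) is not being seeded.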

Motivation For The Experiment

The following panels from various experiments make my case that, for this particular problem, the training setup is extremely fragile and our models aren't really learning anything at all.

EfficientNet3D b0 with different seeds 🌱

  1. The Validation Loss Curve is all over the place.
  2. The Training Loss changes by over 8% just by changing the random seed (Refer to the table below the plot).

[W&B panel: run set of 16 runs, loss curves across different seeds]

EfficientNet3D b0 vs b1 vs b2

  1. Increasing the model size leads to sporadic changes in validation and training loss: b0 has a lower training loss but a higher validation loss compared to b2.

[W&B panel: run set of 12 runs comparing EfficientNet3D b0, b1, and b2]

No effect of Augmentation

Even with an augmentation pipeline, there's no significant change to the training loss (~3%).

[W&B panel: run set of 8 runs, with and without augmentation]
As we can see, scaling the model backbone and adding augmentation barely affect performance at all, whereas changing the random seed alone shifts it by 8%.
💡 I'd strongly recommend that the first set of trials you run in a Kaggle competition establishes a seed, and that you only scale up from there.
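One way to act on this advice is to run the same configuration across several seeds and look at the spread of the metric before trusting any single run. A minimal sketch, where `train_one_run` is a hypothetical stand-in you would replace with your actual training loop:

```python
import random
import statistics


def train_one_run(seed: int) -> float:
    # Hypothetical stand-in for training a model with a given seed and
    # returning its validation metric.
    random.seed(seed)
    return random.uniform(0.5, 0.6)  # pretend validation AUC


seeds = [0, 1, 2, 42, 1337]
scores = [train_one_run(s) for s in seeds]
spread = max(scores) - min(scores)
print(f"mean={statistics.mean(scores):.4f}  spread={spread:.4f}")
# If the spread across seeds rivals the gains you see from scaling the
# backbone or adding augmentation, those "gains" may just be noise.
```

Only once the seed-to-seed spread is small relative to the changes you care about is it worth comparing architectures or augmentation strategies.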

Summary

In this article, you saw how to set the random seed for virtually every package in a PyTorch or TensorFlow training pipeline to write reproducible code, and how monitoring your metrics with Weights & Biases can lead to valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and from-scratch code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
