Modern Scalable Hyperparameter Tuning Methods With Weights & Biases
This article compares random search, Bayesian search with HyperOpt, Bayesian search combined with Asynchronous Hyperband, and Population-Based Training.
In this report, we'll compare the following hyperparameter optimization methods:
- Random Search
- Bayesian Search with HyperOpt
- Bayesian Search combined with Asynchronous Hyperband
- Population-Based Training
To do so, we'll train a simple DCGAN on the MNIST dataset and tune the model to maximize the Inception score.
We'll use Ray Tune to perform these experiments and track the results on the W&B dashboard.
Table of Contents
- Random Search
- Bayesian Search With HyperOpt
- Bayesian Search with Asynchronous HyperBand
- Population-Based Training
- Ending Note
Why Use Ray Tune With W&B for Hyperparameter Tuning?
Using Ray Tune with W&B offers several advantages:
- Tune provides scalable implementations of state-of-the-art hyperparameter tuning algorithms.
- Experiments can be scaled easily from a notebook to GPU-powered servers without any change in code.
- Experiments can be parallelized across GPUs in two lines of code.
- With W&B experiment tracking, you have all your stats in one place for drawing useful inferences.
- Using W&B with Ray Tune, you never lose any progress: metrics are logged as the trials run.
The Search Space
We'll use the same search space for all the experiments in order to make the comparison fair.
config = {"netG_lr": lambda: np.random.uniform(1e-2, 1e-5),"netD_lr": lambda: np.random.uniform(1e-2, 1e-5),"beta1": [0.3,0.5,0.8]}
Enable W&B Tracking
There are two ways of tracking progress with W&B when using Tune:
- You can pass WandbLogger as a logger when calling tune.run. This tracks all the metrics reported to Tune.
- You can use the @wandb_mixin function decorator and invoke wandb.log to track your desired metrics. Tune initializes the W&B run using the information passed in the config dictionary (a sketch of this approach follows below).
config = {...."wandb":{"project": "Project_name",#Additional wandb.init() parameters}}...tune.run(...,config = config,...)
Random Search
Let's perform Random Search across the search space to see how well it optimizes. This will also act as the baseline for our comparison.
Our experimental setup has 2 GPUs and 4 CPUs. We'll parallelize the trials across both GPUs. Tune does this automatically for you if you specify resources_per_trial.
analysis = tune.run(
    dcgan_train,
    resources_per_trial={'gpu': 1, 'cpu': 2},  # Tune will use this information to parallelize the tuning operation
    num_samples=10,
    config=config,
)
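Once the trials finish, the analysis object returned by tune.run can be queried for the best configuration found. A minimal sketch, assuming the trainable reports the metric as is_score:

# Configuration of the trial with the highest reported Inception score
best_config = analysis.get_best_config(metric="is_score", mode="max")
print(best_config)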
Let's take a look at the results:
Run set: 10 runs
Bayesian Search With HyperOpt
The basic idea behind Bayesian hyperparameter tuning is to not be completely random in your choice of hyperparameters, but instead use the information from prior runs to choose the hyperparameters for the next run. Tune supports HyperOpt, which implements Bayesian search algorithms. Here's how you do it.
# Step 1: Specify the search space
hyperopt_space = {
    "netG_lr": hp.uniform("netG_lr", 1e-5, 1e-2),
    "netD_lr": hp.uniform("netD_lr", 1e-5, 1e-2),
    "beta1": hp.choice("beta1", [0.3, 0.5, 0.8]),
}

# Step 2: Initialize the search_alg object and (optionally) set the number of concurrent runs
hyperopt_alg = HyperOptSearch(space=hyperopt_space, metric="is_score", mode="max")
hyperopt_alg = ConcurrencyLimiter(hyperopt_alg, max_concurrent=2)

# Step 3: Start the tuner
analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,  # Specify the search algorithm
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config,
)
Here's what the results look like:
Run set: 10 runs
Bayesian Search with Asynchronous HyperBand
The idea behind Asynchronous Hyperband is to terminate poorly performing runs early. It makes sense to combine this method with Bayesian search to see if we can further reduce the resources wasted on runs that don't optimize well. We only need a small change in our code to accommodate Hyperband.
from ray.tune.schedulers import AsyncHyperBandScheduler

sha_scheduler = AsyncHyperBandScheduler(metric="is_score", mode="max", max_t=300)

analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,  # Specify the search algorithm
    scheduler=sha_scheduler,  # Specify the scheduler
    ...
)
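For Asynchronous Hyperband to stop a trial early, the trainable has to report its metric periodically rather than only once at the end. Here's a minimal sketch of that reporting pattern; the training step and the metric value are placeholders:

from ray import tune
import random

def dcgan_train(config):
    # Build the generator/discriminator and optimizers from config here
    for iteration in range(300):        # matches max_t=300 in the scheduler above
        # ... run one training step on MNIST ...
        is_score = random.random()      # placeholder for the real Inception score
        # Reporting every iteration lets ASHA compare trials and stop the weak ones early
        tune.report(is_score=is_score)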
Let us now see how this performs:
Run set: 20 runs
Population-Based Training

The last tuning algorithm that we'll cover is Population-Based Training (PBT), introduced by DeepMind. The basic idea behind the algorithm, in layman's terms:
- Run the optimization process for a number of samples for a given number of iterations T.
- After every T iterations, compare the runs, copy the weights of the well-performing runs to the poorly performing ones, and change their hyperparameter values to be close to the values of the runs that performed well.
- Terminate the worst-performing runs.
Although the idea behind the algorithm seems simple, there is a lot of complex optimization math that goes into building this from scratch. Tune provides a scalable and easy-to-use implementation of the SOTA PBT algorithm.
Here's how you can quickly incorporate it into your program:
# Step 1: Initialize the PBT scheduler
pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",  # Set the time attribute to training iterations
    metric="is_score",
    mode="max",
    perturbation_interval=5,  # The time interval T after which the PBT operation is performed
    hyperparam_mutations={
        # distributions for resampling
        "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "beta1": [0.3, 0.5, 0.8],
    },
)

# Step 2: Run the tuner
analysis = tune.run(
    dcgan_train,
    scheduler=pbt_scheduler,  # For using PBT
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=5,
    config=config,
)
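One thing to keep in mind: PBT's exploit step copies the weights of strong trials into weak ones, so the trainable needs to save and restore checkpoints. The sketch below assumes Ray Tune's function-API checkpointing (the exact API differs between Ray versions), and build_models and train_one_step are hypothetical helpers standing in for the actual DCGAN code:

import os
import torch
from ray import tune

def dcgan_train(config, checkpoint_dir=None):
    netG, netD = build_models(config)    # hypothetical helper that builds the DCGAN networks
    if checkpoint_dir:                   # PBT restores from here when it exploits a better trial
        state = torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
        netG.load_state_dict(state["netG"])
        netD.load_state_dict(state["netD"])
    for step in range(300):
        is_score = train_one_step(netG, netD, config)     # hypothetical single training step
        with tune.checkpoint_dir(step=step) as ckpt_dir:  # PBT copies these files between trials
            torch.save(
                {"netG": netG.state_dict(), "netD": netD.state_dict()},
                os.path.join(ckpt_dir, "checkpoint.pt"),
            )
        tune.report(is_score=is_score)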
Let us now look at the results:
Run set: 10 runs
Reducing the Number of Runs
We'll now reduce the number of runs to 5 in order to make things difficult for PBT. Let's see how it performs under these restricted circumstances.
Run set: 5 runs
Comparing Average Inception Scores Across Runs
Here's what the final comparison of the average Inception scores looks like. We've averaged across 5 run sets:
- Random Search - 10 Runs (Job Type - mnist-random)
- Bayesian Search - 10 Runs (Job Type - mnist-hyperopt)
- Bayesian Search with Hyperband - 20 Runs (Job Type - mnist-SHA2-hyperopt)
- PBT Scheduler - 10 Runs (Job Type - mnist-pbt2)
- PBT Scheduler - 5 Runs (Job Type - mnist-pbt3)
Run set: 55 runs
Ending Note
Some important points to note from these experiments:
- All the experiments were parallelized across 2 GPUs automatically by Tune.
- These experiments can be scaled up or down without changing the code.
- We have all the important metrics, inferences, and even this report in one place, where they can be easily shared.
- These inferences can be used to quantify the resources saved by choosing a suitable search method.
- This overall structure makes teams more productive.