
Modern Scalable Hyperparameter Tuning Methods With Weights & Biases

This article compares random search, Bayesian search using HyperOpt, Bayesian search combined with Asynchronous Hyperband, and Population-Based Training.
In this report, we'll compare the following hyperparameter optimization methods:
  • Random Search
  • Bayesian Search using HyperOpt
  • Bayesian Search combined with Asynchronous Hyperband
  • Population-Based Training
To do so, we'll train a simple DCGAN on the MNIST dataset and optimize the model to maximize the Inception score.
We'll use Ray Tune to run these experiments and track the results on the W&B dashboard.


Why Use Ray Tune With W&B for Hyperparameter Tuning?

Using Ray Tune with W&B offers several advantages:
  • Tune provides scalable implementations of state-of-the-art hyperparameter tuning algorithms.
  • Experiments can be scaled from a notebook to GPU-powered servers without any code changes.
  • Experiments can be parallelized across GPUs in 2 lines of code.
  • With W&B experiment tracking, all your stats are in one place, making it easy to draw useful inferences.
  • Using W&B with Ray Tune, you never lose any progress.

The Search Space

We'll use the same search space for all the experiments in order to make the comparison fair.
import numpy as np

config = {
    "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
    "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
    "beta1": [0.3, 0.5, 0.8]
}
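For reference, the same space can also be written with Tune's built-in sampling primitives. The snippet below is only a sketch (it assumes Ray ≥ 1.0 and isn't used in the experiments that follow):

from ray import tune

# Equivalent search space using Tune's native API
tune_config = {
    "netG_lr": tune.uniform(1e-5, 1e-2),
    "netD_lr": tune.uniform(1e-5, 1e-2),
    "beta1": tune.choice([0.3, 0.5, 0.8]),
}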

Enabling W&B Tracking

There are two ways of tracking progress with W&B when using Tune:
  • You can pass WandbLogger as a logger when calling tune.run. This tracks all the metrics reported to Tune.
  • You can use the @wandb_mixin function decorator and call wandb.log to track your desired metrics. Tune initializes the W&B run using the information passed in the config dictionary.
config = {
    ...,
    "wandb": {
        "project": "Project_name",
        # Additional wandb.init() parameters
    }
}
...
tune.run(
    ...,
    config=config,
    ...
)
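For the decorator route, the training function itself logs to W&B. Here's a minimal sketch of the pattern (the dummy loop below stands in for the actual DCGAN training, and the import path assumes Ray 1.x):

import numpy as np
import wandb
from ray import tune
from ray.tune.integration.wandb import wandb_mixin

@wandb_mixin
def dcgan_train(config):
    # Placeholder loop: in the real trainable, the generator and discriminator
    # are trained here and is_score is the computed Inception score.
    for step in range(10):
        is_score = np.random.uniform(1.0, 2.0)  # dummy metric for illustration
        wandb.log({"is_score": is_score})       # logged to the W&B run
        tune.report(is_score=is_score)          # reported back to Tune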

Random Search

Let's perform a random search across the search space to see how well it optimizes. This will also act as the baseline for our comparison.
Our experimental setup has 2 GPUs and 4 CPUs, and we'll parallelize the trials across both GPUs. Tune does this automatically if you specify resources_per_trial.
analysis = tune.run(
    dcgan_train,
    resources_per_trial={'gpu': 1, 'cpu': 2},  # Tune uses this information to parallelize the tuning operation
    num_samples=10,
    config=config
)
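Once the trials finish, the analysis object returned by tune.run can be queried for the best-performing trial. A quick sketch, using the is_score metric reported throughout this report:

# Retrieve the hyperparameters of the trial with the highest Inception score
best_config = analysis.get_best_config(metric="is_score", mode="max")
print("Best config:", best_config)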
Let's take a look at the results:

[W&B panel: Random Search results, 10 runs]



Bayesian Search With HyperOpt

The basic idea behind Bayesian hyperparameter tuning is to not choose hyperparameters completely at random, but instead use information from the prior runs to choose the hyperparameters for the next run. Tune supports HyperOpt, which implements Bayesian search algorithms. Here's how you do it:
# Imports (Ray 1.x paths)
from hyperopt import hp
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

# Step 1: Specify the search space
hyperopt_space = {
    "netG_lr": hp.uniform("netG_lr", 1e-5, 1e-2),
    "netD_lr": hp.uniform("netD_lr", 1e-5, 1e-2),
    "beta1": hp.choice("beta1", [0.3, 0.5, 0.8])
}

# Step 2: Initialize the search_alg object and (optionally) limit the number of concurrent trials
hyperopt_alg = HyperOptSearch(space=hyperopt_space, metric="is_score", mode="max")
hyperopt_alg = ConcurrencyLimiter(hyperopt_alg, max_concurrent=2)

# Step 3: Start the tuner
analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,  # Specify the search algorithm
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config
)
Here's what the results look like:

[W&B panel: Bayesian Search (HyperOpt) results, 10 runs]


Bayesian Search With Asynchronous Hyperband

The idea behind Asynchronous Hyperband is to terminate runs that don't perform well early. It makes sense to combine this method with Bayesian search to see if we can further reduce the resources wasted on runs that don't optimize well. We only need a small change in our code to accommodate Hyperband.
from ray.tune.schedulers import AsyncHyperBandScheduler

sha_scheduler = AsyncHyperBandScheduler(metric="is_score",
                                        mode="max",
                                        max_t=300)

analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,   # Specify the search algorithm
    scheduler=sha_scheduler,   # Specify the scheduler
    ...
)
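Besides max_t, the scheduler exposes a couple of knobs that control how aggressively trials are eliminated. Here's a sketch of a more explicit configuration (the grace_period and reduction_factor values are illustrative, not the ones used in these experiments):

sha_scheduler = AsyncHyperBandScheduler(
    metric="is_score",
    mode="max",
    max_t=300,           # maximum number of training iterations per trial
    grace_period=10,     # let every trial run at least this long before it can be stopped
    reduction_factor=3,  # roughly one in three trials is promoted at each halving round
)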
Let us now see how this performs:

[W&B panel: Bayesian Search with Asynchronous Hyperband results, 20 runs]



Population-Based Training



The last tuning algorithm we'll cover is Population-Based Training (PBT), introduced by DeepMind. In layman's terms, the basic idea behind the algorithm is:
  • Run the optimization process for a population of samples for a given number of time steps (or iterations) T.
  • After every T iterations, compare the runs, copy the weights of the well-performing runs over to the poorly-performing runs, and perturb their hyperparameter values to be close to the values of the runs that performed well.
  • Terminate the worst-performing runs.
Although the idea behind the algorithm seems simple, a lot of complex optimization math goes into building it from scratch. Tune provides a scalable and easy-to-use implementation of the state-of-the-art PBT algorithm. Here's how you can quickly incorporate it into your program.
from ray.tune.schedulers import PopulationBasedTraining

# Step 1: Initialize the PBT scheduler
pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",  # Set the time attribute to training iterations
    metric="is_score",
    mode="max",
    perturbation_interval=5,  # The time interval T after which the PBT operation is performed
    hyperparam_mutations={
        # Distributions for resampling
        "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "beta1": [0.3, 0.5, 0.8]
    })

# Step 2: Run the tuner
analysis = tune.run(
    dcgan_train,
    scheduler=pbt_scheduler,  # For using PBT
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config
)
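Because PBT copies weights from strong trials into weak ones, the trainable has to save and restore checkpoints. A minimal sketch of that pattern with Tune's function API (Ray 1.x; the stand-in models and dummy metric are illustrative only):

import os

import torch
from ray import tune

def dcgan_train(config, checkpoint_dir=None):
    # Stand-in networks; a real trainable would build the DCGAN here.
    netG = torch.nn.Linear(100, 784)
    netD = torch.nn.Linear(784, 1)

    # Restore weights when PBT clones a well-performing trial into this one.
    if checkpoint_dir:
        state = torch.load(os.path.join(checkpoint_dir, "checkpoint"))
        netG.load_state_dict(state["netG"])
        netD.load_state_dict(state["netD"])

    for step in range(300):
        is_score = float(step)  # placeholder for the real Inception score

        # Save weights periodically so PBT can exploit them in other trials.
        if step % 5 == 0:
            with tune.checkpoint_dir(step=step) as ckpt_dir:
                torch.save(
                    {"netG": netG.state_dict(), "netD": netD.state_dict()},
                    os.path.join(ckpt_dir, "checkpoint"),
                )

        tune.report(is_score=is_score)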

Let us now look at the results:

[W&B panel: Population-Based Training results, 10 runs]



Reducing the Number of Runs

We'll now reduce the number of runs to 5 in order to make things more difficult for PBT, and see how it performs under this constraint.

[W&B panel: Population-Based Training results, 5 runs]


Comparing Average Inception Scores Across Runs

Here's what the final comparison of the average Inception scores looks like. We've averaged across 5 run sets:
  • Random Search - 10 Runs (Job Type - mnist-random)
  • Bayesian Search - 10 Runs (Job Type - mnist-hyperopt)
  • Bayesian Search with Hyperband - 20 Runs (Job Type - mnist-SHA2-hyperopt)
  • PBT Scheduler - 10 Runs (Job Type - mnist-pbt2)
  • PBT Scheduler - 5 Runs (Job Type - mnist-pbt3)
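These job types are simply labels attached to each experiment's W&B runs. One way to set such a label is through the wandb block of the Tune config described earlier, which Tune uses to initialize the W&B run; the value below is illustrative:

config["wandb"] = {
    "project": "Project_name",
    "job_type": "mnist-random",  # label used to group this experiment's runs in W&B
}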

[W&B panel: comparison of average Inception scores across all 55 runs]



Ending Note

Some important points to note from these experiments:
  • All the experiments were parallelized across 2 GPUs automatically by Tune.
  • These experiments can be scaled up or down without changing the code.
  • All the important metrics, inferences, and even this report live in one place and can be shared easily.
  • These inferences can be used to quantify the resources saved by choosing a suitable search method.
  • This overall structure leads to more productive teams.
