
Modern Scalable Hyperparameter Tuning Methods With Weights & Biases

This article compares random search, Bayesian search using HyperOpt, Bayesian search combined with Asynchronous Hyperband, and Population-Based Training.
In this report, we'll compare the following hyperparameter optimization methods:
  • Random Search
  • Bayesian Search using HyperOpt
  • Bayesian Search combined with Asynchronous Hyperband
  • Population-Based Training
To do so, we'll train a simple DCGAN on the MNIST dataset and optimize the model to maximize the Inception score.
We'll use Ray Tune to run these experiments and track the results on the W&B dashboard.


Why Use Ray Tune With W&B for Hyperparameter Tuning?

Using Ray Tune with W&B offers several advantages:
  • Tune provides scalable implementations of state-of-the-art hyperparameter tuning algorithms.
  • Experiments can be scaled from a notebook to GPU-powered servers without any code changes.
  • Experiments can be parallelized across GPUs in 2 lines of code.
  • With W&B experiment tracking, all your stats are in one place, making it easy to draw useful inferences.
  • Using W&B with Ray Tune, you never lose any progress.

The Search Space

We'll use the same search space for all the experiments in order to make the comparison fair.
import numpy as np

config = {
    "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
    "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
    "beta1": [0.3, 0.5, 0.8]
}
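For reference, the same space can also be written with Tune's built-in sampling primitives. The snippet below is only a sketch (it assumes Ray ≥ 1.0 and isn't used in the experiments that follow):

from ray import tune

# Equivalent search space using Tune's native API
tune_config = {
    "netG_lr": tune.uniform(1e-5, 1e-2),
    "netD_lr": tune.uniform(1e-5, 1e-2),
    "beta1": tune.choice([0.3, 0.5, 0.8]),
}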

Enabling W&B Tracking

There are two ways of tracking progress with W&B when using Tune:
  • You can pass WandbLogger as a logger when calling tune.run. This tracks all the metrics reported to Tune.
  • You can use the @wandb_mixin function decorator and call wandb.log to track your desired metrics. Tune initializes the W&B run using the information passed in the config dictionary.
config = {
    ...,
    "wandb": {
        "project": "Project_name",
        # Additional wandb.init() parameters
    }
}
...
tune.run(
    ...,
    config=config,
    ...
)
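For the decorator route, the training function itself logs to W&B. Here's a minimal sketch of the pattern (the dummy loop below stands in for the actual DCGAN training, and the import path assumes Ray 1.x):

import numpy as np
import wandb
from ray import tune
from ray.tune.integration.wandb import wandb_mixin

@wandb_mixin
def dcgan_train(config):
    # Placeholder loop: in the real trainable, the generator and discriminator
    # are trained here and is_score is the computed Inception score.
    for step in range(10):
        is_score = np.random.uniform(1.0, 2.0)  # dummy metric for illustration
        wandb.log({"is_score": is_score})       # logged to the W&B run
        tune.report(is_score=is_score)          # reported back to Tune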

Random Search

Let's perform a random search across the search space to see how well it optimizes. This will also act as the baseline for our comparison.
Our experimental setup has 2 GPUs and 4 CPUs, and we'll parallelize the trials across both GPUs. Tune does this automatically if you specify resources_per_trial.
analysis = tune.run(
    dcgan_train,
    resources_per_trial={'gpu': 1, 'cpu': 2},  # Tune uses this information to parallelize the tuning operation
    num_samples=10,
    config=config
)
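Once the trials finish, the analysis object returned by tune.run can be queried for the best-performing trial. A quick sketch, using the is_score metric reported throughout this report:

# Retrieve the hyperparameters of the trial with the highest Inception score
best_config = analysis.get_best_config(metric="is_score", mode="max")
print("Best config:", best_config)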
Let's take a look at the results:

[W&B panel: Random Search results, 10 runs]



Bayesian Search With HyperOpt

The basic idea behind Bayesian hyperparameter tuning is to not choose hyperparameters completely at random, but instead use information from the prior runs to choose the hyperparameters for the next run. Tune supports HyperOpt, which implements Bayesian search algorithms. Here's how you do it:
# Imports (Ray 1.x paths)
from hyperopt import hp
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

# Step 1: Specify the search space
hyperopt_space = {
    "netG_lr": hp.uniform("netG_lr", 1e-5, 1e-2),
    "netD_lr": hp.uniform("netD_lr", 1e-5, 1e-2),
    "beta1": hp.choice("beta1", [0.3, 0.5, 0.8])
}

# Step 2: Initialize the search_alg object and (optionally) limit the number of concurrent trials
hyperopt_alg = HyperOptSearch(space=hyperopt_space, metric="is_score", mode="max")
hyperopt_alg = ConcurrencyLimiter(hyperopt_alg, max_concurrent=2)

# Step 3: Start the tuner
analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,  # Specify the search algorithm
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config
)
Here's what the results look like:

[W&B panel: Bayesian Search (HyperOpt) results, 10 runs]


Bayesian Search With Asynchronous Hyperband

The idea behind Asynchronous Hyperband is to terminate runs that don't perform well early. It makes sense to combine this method with Bayesian search to see if we can further reduce the resources wasted on runs that don't optimize well. We only need a small change in our code to accommodate Hyperband.
from ray.tune.schedulers import AsyncHyperBandScheduler

sha_scheduler = AsyncHyperBandScheduler(metric="is_score",
                                        mode="max",
                                        max_t=300)

analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,   # Specify the search algorithm
    scheduler=sha_scheduler,   # Specify the scheduler
    ...
)
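Besides max_t, the scheduler exposes a couple of knobs that control how aggressively trials are eliminated. Here's a sketch of a more explicit configuration (the grace_period and reduction_factor values are illustrative, not the ones used in these experiments):

sha_scheduler = AsyncHyperBandScheduler(
    metric="is_score",
    mode="max",
    max_t=300,           # maximum number of training iterations per trial
    grace_period=10,     # let every trial run at least this long before it can be stopped
    reduction_factor=3,  # roughly one in three trials is promoted at each halving round
)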
Let us now see how this performs:

[W&B panel: Bayesian Search with Asynchronous Hyperband results, 20 runs]



Population-Based Training



The last tuning algorithm we'll cover is Population-Based Training (PBT), introduced by DeepMind. In layman's terms, the basic idea behind the algorithm is:
  • Run the optimization process for a population of samples for a given number of time steps (or iterations) T.
  • After every T iterations, compare the runs, copy the weights of the well-performing runs over to the poorly-performing runs, and perturb their hyperparameter values to be close to the values of the runs that performed well.
  • Terminate the worst-performing runs.
Although the idea behind the algorithm seems simple, a lot of complex optimization math goes into building it from scratch. Tune provides a scalable and easy-to-use implementation of the state-of-the-art PBT algorithm. Here's how you can quickly incorporate it into your program.
from ray.tune.schedulers import PopulationBasedTraining

# Step 1: Initialize the PBT scheduler
pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",  # Set the time attribute to training iterations
    metric="is_score",
    mode="max",
    perturbation_interval=5,  # The time interval T after which the PBT operation is performed
    hyperparam_mutations={
        # Distributions for resampling
        "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "beta1": [0.3, 0.5, 0.8]
    })

# Step 2: Run the tuner
analysis = tune.run(
    dcgan_train,
    scheduler=pbt_scheduler,  # For using PBT
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config
)
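Because PBT copies weights from strong trials into weak ones, the trainable has to save and restore checkpoints. A minimal sketch of that pattern with Tune's function API (Ray 1.x; the stand-in models and dummy metric are illustrative only):

import os

import torch
from ray import tune

def dcgan_train(config, checkpoint_dir=None):
    # Stand-in networks; a real trainable would build the DCGAN here.
    netG = torch.nn.Linear(100, 784)
    netD = torch.nn.Linear(784, 1)

    # Restore weights when PBT clones a well-performing trial into this one.
    if checkpoint_dir:
        state = torch.load(os.path.join(checkpoint_dir, "checkpoint"))
        netG.load_state_dict(state["netG"])
        netD.load_state_dict(state["netD"])

    for step in range(300):
        is_score = float(step)  # placeholder for the real Inception score

        # Save weights periodically so PBT can exploit them in other trials.
        if step % 5 == 0:
            with tune.checkpoint_dir(step=step) as ckpt_dir:
                torch.save(
                    {"netG": netG.state_dict(), "netD": netD.state_dict()},
                    os.path.join(ckpt_dir, "checkpoint"),
                )

        tune.report(is_score=is_score)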

Let us now look at the results:

[W&B panel: Population-Based Training results, 10 runs]



Reducing the Number of Runs

We'll now reduce the number of runs to 5 in order to make things more difficult for PBT, and see how it performs under this constraint.

[W&B panel: Population-Based Training results, 5 runs]


Comparing Average Inception Scores Across Runs

Here's what the final comparison of the average Inception scores looks like. We've averaged across 5 run sets:
  • Random Search - 10 Runs (Job Type - mnist-random)
  • Bayesian Search - 10 Runs (Job Type - mnist-hyperopt)
  • Bayesian Search with Hyperband - 20 Runs (Job Type - mnist-SHA2-hyperopt)
  • PBT Scheduler - 10 Runs (Job Type - mnist-pbt2)
  • PBT Scheduler - 5 Runs (Job Type - mnist-pbt3)
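These job types are simply labels attached to each experiment's W&B runs. One way to set such a label is through the wandb block of the Tune config described earlier, which Tune uses to initialize the W&B run; the value below is illustrative:

config["wandb"] = {
    "project": "Project_name",
    "job_type": "mnist-random",  # label used to group this experiment's runs in W&B
}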

[W&B panel: comparison of average Inception scores across all 55 runs]



Ending Note

Some important points to note from these experiments:
  • All the experiments were parallelized across 2 GPUs automatically by Tune.
  • These experiments can be scaled up or down without changing the code.
  • All the important metrics, inferences, and even this report live in one place and can be shared easily.
  • These inferences can be used to quantify the resources saved by choosing a suitable search method.
  • This overall structure leads to more productive teams.
