In this report, we'll compare the following hyperparameter optimization methods: random search, Bayesian optimization (via HyperOpt), Asynchronous HyperBand (ASHA), and Population Based Training (PBT).
To do so, we'll train a simple DCGAN on the MNIST dataset and tune it to maximize the Inception Score.
We'll use Ray Tune to run these experiments and track the results on the W&B dashboard.
Using Ray Tune together with W&B has some clear advantages:
We'll use the same search space for all the experiments in order to make the comparison fair.
from ray import tune

config = {
    "netG_lr": tune.uniform(1e-5, 1e-2),   # generator learning rate
    "netD_lr": tune.uniform(1e-5, 1e-2),   # discriminator learning rate
    "beta1": tune.choice([0.3, 0.5, 0.8])  # Adam beta1
}
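Every experiment below calls the same Tune trainable, dcgan_train. The real implementation builds the DCGAN and evaluates the Inception Score on generated samples; the skeleton here is only a rough sketch (the training step and the metric value are placeholders, not the actual code) showing how the sampled config is consumed and how the metric is reported back to Tune:

from ray import tune

def dcgan_train(config):
    # Read the sampled hyperparameters
    netG_lr, netD_lr, beta1 = config["netG_lr"], config["netD_lr"], config["beta1"]
    # ... build the generator/discriminator and their optimizers here ...

    for step in range(300):
        # ... run one DCGAN training step here ...
        is_score = 0.0  # placeholder: compute the Inception Score on generated images
        tune.report(is_score=is_score)  # this is the metric every tuner below optimizes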
There are two ways of tracking progress through W&B when using Tune:
1. Pass WandbLogger as a logger when calling tune.run. This tracks all the metrics reported to Tune.
2. Use the @wandb_mixin function decorator and invoke wandb.log to track your desired metrics.
Tune initializes the W&B run using the information passed under the "wandb" key of the config dictionary.
config = {
    ...
    "wandb": {
        "project": "Project_name",
        # Additional wandb.init() parameters
    }
}
...
tune.run(...,
    config=config,
    ...
)
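For the second method, the trainable itself is decorated. A minimal sketch, assuming Ray's ray.tune.integration.wandb module and the dcgan_train skeleton from above, would look roughly like this:

import wandb
from ray.tune.integration.wandb import wandb_mixin

@wandb_mixin
def dcgan_train(config):
    # ... training loop as before ...
    # Log whatever you want straight to the W&B run that Tune created
    wandb.log({"is_score": 0.0})  # placeholder value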
Let's perform random search across the search space to see how well it optimizes. This will also act as the baseline for our comparison.
Our experimental setup has 2 GPUs and 4 CPUs, and we'll parallelize the trials across both GPUs. Tune does this automatically for you if you specify resources_per_trial.
analysis = tune.run(
    dcgan_train,
    resources_per_trial={'gpu': 1, 'cpu': 2},  # Tune will use this information to parallelize the tuning operation
    num_samples=10,
    config=config
)
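Once the trials finish, the analysis object returned by tune.run can also be queried directly; a quick sketch (the "is_score" column simply mirrors the metric our trainable reports):

# Best hyperparameter configuration found for the Inception Score
best_config = analysis.get_best_config(metric="is_score", mode="max")
print(best_config)

# Per-trial results as a pandas DataFrame, handy for custom plots
df = analysis.dataframe()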
Let's see the results.
The basic idea behind Bayesian hyperparameter tuning is to not be completely random when choosing hyperparameters, but instead to use the information from prior runs to choose the hyperparameters for the next run. Tune supports HyperOpt, which implements Bayesian search algorithms. Here's how you do it.
# Step 1: Specify the search space
from hyperopt import hp
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

hyperopt_space = {
    "netG_lr": hp.uniform("netG_lr", 1e-5, 1e-2),
    "netD_lr": hp.uniform("netD_lr", 1e-5, 1e-2),
    "beta1": hp.choice("beta1", [0.3, 0.5, 0.8])
}

# Step 2: Initialize the search_alg object and (optionally) set the number of concurrent runs
hyperopt_alg = HyperOptSearch(space=hyperopt_space, metric="is_score", mode="max")
hyperopt_alg = ConcurrencyLimiter(hyperopt_alg, max_concurrent=2)

# Step 3: Start the tuner
analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,  # Specify the search algorithm
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=10,
    config=config
)
Here's what the results look like.
The idea behind Asynchronous HyperBand is to terminate the runs that don't perform well early. It makes sense to combine this method with Bayesian search to see if we can further reduce the resources wasted on runs that don't optimize well. We just need to make a small change in our code to accommodate HyperBand.
from ray.tune.schedulers import AsyncHyperBandScheduler

asha_scheduler = AsyncHyperBandScheduler(metric="is_score", mode="max", max_t=300)

analysis = tune.run(
    dcgan_train,
    search_alg=hyperopt_alg,   # Specify the search algorithm
    scheduler=asha_scheduler,  # Specify the scheduler
    ...
)
Let us now see how this performs.
The last tuning algorithm we'll cover is Population Based Training (PBT), introduced by DeepMind. The basic idea behind the algorithm, in layman's terms, is to train a population of models in parallel; at regular intervals, the poorly performing trials copy the weights and hyperparameters of the better ones (exploit) and then perturb those hyperparameters (explore).
# Step 1: Initialize the PBT scheduler
import numpy as np
from ray.tune.schedulers import PopulationBasedTraining

pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",  # Set the time attribute as training iterations
    metric="is_score",
    mode="max",
    perturbation_interval=5,  # The time interval T after which you perform the PBT operation
    hyperparam_mutations={
        # Distributions for resampling
        "netG_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "netD_lr": lambda: np.random.uniform(1e-5, 1e-2),
        "beta1": [0.3, 0.5, 0.8]
    })
# Step 2: Run the tuner
analysis = tune.run(
    dcgan_train,
    scheduler=pbt_scheduler,  # For using PBT
    resources_per_trial={'gpu': 1, 'cpu': 2},
    num_samples=5,
    config=config
)
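One practical detail: PBT clones the weights of well-performing trials into poorly performing ones, so the trainable has to save and restore checkpoints. Below is a rough sketch of how that can be wired up with Tune's function API (the placeholder modules and the 0.0 metric are ours; the real dcgan_train uses the actual DCGAN networks and Inception Score):

import os
import torch
from ray import tune

def dcgan_train(config, checkpoint_dir=None):
    # Placeholder modules standing in for the real DCGAN generator/discriminator
    netG = torch.nn.Linear(100, 784)
    netD = torch.nn.Linear(784, 1)

    # Restore state when PBT copies this trial's weights from a better trial
    if checkpoint_dir:
        state = torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
        netG.load_state_dict(state["netG"])
        netD.load_state_dict(state["netD"])

    for step in range(300):
        # ... one DCGAN training step would go here ...
        with tune.checkpoint_dir(step=step) as ckpt_dir:  # save state for PBT to copy
            torch.save({"netG": netG.state_dict(), "netD": netD.state_dict()},
                       os.path.join(ckpt_dir, "checkpoint.pt"))
        tune.report(is_score=0.0)  # placeholder Inception Score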
Let us now look at the results.
We'll now reduce the number of runs to 5 in order to make things more difficult for PBT. Let's see how it performs under these restricted circumstances.
Here's what the final comparison of the average Inception Scores looks like. We've averaged across 5 run sets:
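For reference, the averaging itself is simple. The helper below is purely hypothetical (the name mean_best_is and the list of analysis objects are our own illustration, not part of the original code):

import numpy as np

def mean_best_is(analyses):
    # `analyses`: the ExperimentAnalysis objects returned by tune.run
    # for the repeated run sets of a single tuning method
    best_scores = [a.dataframe()["is_score"].max() for a in analyses]
    return float(np.mean(best_scores)), float(np.std(best_scores))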
Some important points to note from these experiments: