Hyperparameter Tuning for Keras and PyTorch Models

Lavanya Shukla

We’re excited to launch a powerful and efficient way to do hyperparameter tuning and optimization: W&B Sweeps, now available in both Keras and PyTorch.

With just a few lines of code, Sweeps automatically searches through high-dimensional hyperparameter spaces to find your best-performing model, with very little effort on your part.

Here’s how you can launch sophisticated hyperparameter sweeps in 3 simple steps.

Try Sweeps in Colab →

0. Integrate W&B

First, let’s install the Weights & Biases library and add it to your training script.

A. Install wandb

pip install wandb
wandb login

B. Add W&B to your training script

Import wandb and the Keras callback at the top of your script, then pass WandbCallback() to your .fit() call:
# train.py
import wandb
from wandb.keras import WandbCallback

# initialize a W&B run; the sweep's hyperparameters arrive via wandb.config
wandb.init()
config = wandb.config

# define model architecture
# compile the model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# add the WandbCallback()
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=config.epochs,
          callbacks=[WandbCallback(data_type="image", labels=labels)])
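The snippet above is Keras-specific. In PyTorch there is no callback hook, so you log metrics explicitly with wandb.log (and optionally call wandb.watch to track gradients). Here's a minimal sketch; the toy model, random batches, and project name are placeholders for your own code:

```python
# train_pytorch.py -- a minimal sketch of W&B logging in a PyTorch training loop
import torch
import torch.nn as nn

# a toy model standing in for your real architecture
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

def train(epochs=3, lr=0.01):
    import wandb
    wandb.init(project="pytorch-sweeps-demo", config={"epochs": epochs, "lr": lr})
    wandb.watch(model)  # log gradients and parameter histograms
    optimizer = torch.optim.SGD(model.parameters(), lr=wandb.config.lr)
    for epoch in range(wandb.config.epochs):
        X = torch.randn(64, 10)            # stand-in for a real training batch
        y = torch.randint(0, 2, (64,))
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        wandb.log({"epoch": epoch, "loss": loss.item()})  # stream metrics to W&B

if __name__ == "__main__":
    train()
```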


1. Define the Sweep

You can define powerful sweeps simply by creating a YAML file that specifies the parameters to search through, the search strategy, and the optimization metric.

Here’s an example:

# sweep.yaml
program: train.py
method: random
metric:
  name: val_loss
  goal: minimize
parameters:
  learning_rate:
    min: 0.00001
    max: 0.1
  optimizer:
    values: ["adam", "sgd"]
  batch_size:
    values: [96, 128, 148]
  epochs:
    value: 27
early_terminate:
  type: hyperband
  s: 2
  eta: 3
  max_iter: 27

Let’s break this YAML file down:

- program: the training script the sweep will run (train.py).
- method: the search strategy; here we sample hyperparameter combinations at random.
- metric: the metric to optimize (val_loss) and the direction to optimize it in (minimize).
- parameters: the hyperparameters to search over, specified as continuous ranges (min/max), discrete choices (values), or fixed values (value).
- early_terminate: an optional early-stopping policy; here Hyperband halts under-performing runs so compute isn’t wasted on them.

You can find a list of all the configuration options here.
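If you’d rather stay in Python, the same configuration can also be expressed as a plain dictionary and passed to wandb.sweep directly instead of writing a YAML file. A sketch mirroring sweep.yaml above (the project name in the comment is a placeholder):

```python
# Equivalent of sweep.yaml as a Python dictionary
sweep_config = {
    "program": "train.py",
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.00001, "max": 0.1},
        "optimizer": {"values": ["adam", "sgd"]},
        "batch_size": {"values": [96, 128, 148]},
        "epochs": {"value": 27},
    },
    "early_terminate": {"type": "hyperband", "s": 2, "eta": 3, "max_iter": 27},
}

# Passing it to wandb.sweep creates the sweep without a YAML file:
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="my-project")
```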

2. Set up a new sweep

Run wandb sweep with the config file you created in step 1.

This creates your sweep, and returns both a unique identifier (SWEEP_ID) and a URL to track all your runs.

wandb sweep sweep.yaml

3. Launch the sweep

It’s time to launch our sweep and train some models!

You can do so by calling wandb agent with the SWEEP_ID you got from step 2.

wandb agent SWEEP_ID

This will start training models with different hyperparameter combinations and return a URL where you can track the sweep’s progress. You can launch multiple agents concurrently. Each of these agents will fetch parameters from the W&B server and use them to train the next model.
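You can also drive an agent from Python instead of the CLI: wandb.agent takes the sweep id and a training function, and count caps how many runs this agent executes. A sketch, where train_fn is a hypothetical stand-in for your real training code and SWEEP_ID must be replaced with the id from step 2:

```python
# sweep_agent.py -- driving a sweep agent from Python instead of the CLI
def train_fn():
    """Hypothetical training function; the agent calls it once per run."""
    import wandb
    with wandb.init() as run:       # receives this run's hyperparameters
        # ... build and train a model using run.config here ...
        run.log({"val_loss": 0.5})  # stand-in for your real metric

if __name__ == "__main__":
    import wandb
    # Replace SWEEP_ID with the identifier printed by `wandb sweep` in step 2.
    wandb.agent("SWEEP_ID", function=train_fn, count=5)  # this agent runs 5 trials
```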

And voila! That's all there is to running a hyperparameter sweep!

Let’s see how we can extract insights about our model from sweeps next.

4. Visualize Sweep Results

Parallel coordinates plot

This plot maps hyperparameter values to model metrics. It’s useful for homing in on the combinations of hyperparameters that led to the best model performance.

Hyperparameter Importance Plot

The hyperparameter importance plot surfaces which hyperparameters were the best predictors of, and most highly correlated with, desirable values for your metric.

These visualizations can help you save both time and resources running expensive hyperparameter optimizations by homing in on the parameters (and value ranges) that matter most, and are therefore worthy of further exploration.

Next step - Get your hands dirty with sweeps

We created a simple training script and a few flavors of sweep configs for you to play with. We highly encourage you to give these a try. This repo also has examples to help you try more advanced sweep features like Bayesian optimization, Hyperband, and Hyperopt.

More Resources

Join our mailing list to get the latest machine learning updates.