Running Hyperparameter Sweeps to Pick the Best Model

Sayak Paul

Hyperparameters play a crucial role in determining a machine learning model’s performance. In this post we’ll talk about searching through the hyperparameter space and finding the optimal model using sweeps from Weights & Biases.

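Some people use the terms parameters and hyperparameters interchangeably. Before we get started, it’s important that we explicitly call out the difference between the two: parameters (such as a network’s weights and biases) are learned from the data during training, whereas hyperparameters (such as the learning rate or the number of epochs) are set before training begins.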
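To make the distinction concrete, here is a minimal sketch in plain Python. It fits a straight line with gradient descent: the learning rate and the number of epochs are hyperparameters we pick up front, while `w` and `b` are parameters the training loop learns. The function name `fit_line`, the toy data and the chosen values are assumptions made for this illustration only.

```python
# Hyperparameters: chosen by us before training starts.
LEARNING_RATE = 0.05
EPOCHS = 500


def fit_line(xs, ys, learning_rate, epochs):
    """Fit y = w * x + b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # Parameters: learned from the data during training.
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys, LEARNING_RATE, EPOCHS)
w_short, b_short = fit_line(xs, ys, LEARNING_RATE, epochs=2)
print(f"{EPOCHS} epochs: w={w:.3f}, b={b:.3f}")
print(f"2 epochs:   w={w_short:.3f}, b={b_short:.3f}")
```

Changing only the `epochs` hyperparameter changes how close the learned parameters get to the line that generated the data, which is the kind of effect a hyperparameter search is meant to explore.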

Parallel Coordinates Plot Visualizing the Performance of Multiple Runs Over a Set Number of Hyperparameters

Hyperparameters have a large influence on a machine learning model’s performance. For example, if you train a neural network on a large dataset for only a small number of epochs, it is very likely to underfit the data. So you might begin to ask yourself questions like:

All of these questions collectively lead us to: what model architecture should be used for the given dataset?

Google researchers address this in their seminal work on Neural Architecture Search (NAS), but one challenge stands out: NAS does not scale well. NAS is computationally expensive, and many organizations do not have the bandwidth to support it.

However, there are ways to work around this:

The first takes quite a bit of deliberate practice and may not always produce the expected or even accurate result. That is why the latter is heavily used today. In machine learning literature, the process of experimenting with different hyperparameter values to select the best model is referred to as hyperparameter tuning. The following are very popular methods for hyperparameter tuning:

This brings us to Hyperparameter Sweeps – a way to efficiently select the right model for a given dataset using Weights & Biases (wandb).

What are Hyperparameter Sweeps?

Hyperparameter Sweeps offer efficient ways of automatically finding the best possible combination of hyperparameter values for your machine learning model with respect to a particular dataset.

Conducting a hyperparameter search is not a trivial task, since not all hyperparameters in a machine learning model have equal priority when it comes to tuning them. Josh Tobin, in his Troubleshooting Deep Neural Networks presentation, has beautifully listed the hyperparameters in a typical neural network along with their priorities:

Typical Hyperparameters in Neural Network Architecture - Source

Hyperparameter Sweeps organize search in a very elegant way, allowing us to:

Running Hyperparameter Sweeps using Weights & Biases

Weights & Biases makes it really easy to run Hyperparameter Sweeps. To start, you need a model training script (more on that shortly) and a dataset. You can use this Colab notebook if you want to follow along without writing the code yourself. If you would rather take a more active role, you need to set up wandb first; if you have not done so, be sure to check this out. We will also use TensorFlow 2.0, specifically its high-level Keras API.

To start, let’s load up the FashionMNIST dataset that ships with Keras.

# Load the dataset
import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Scale the pixel values of the images to the [0, 1] range
train_images = train_images / 255.0
test_images = test_images / 255.0

Next, select the hyperparameters you want to tune. This is specified like the following:

sweep_config = {
   'method': 'grid',
   'parameters': {
       'layers': {
           'values': [32, 64, 96, 128, 256]
       }
   }
}

Where method is the hyperparameter tuning method we are going for and parameters is a dictionary containing the hyperparameters we want to tune.
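To make the grid method concrete, the sketch below expands such a configuration into the individual runs it implies: one run per combination of listed values. `expand_grid` is a hypothetical helper written for this illustration; it is not part of wandb, and wandb’s own scheduling may differ in detail.

```python
from itertools import product

sweep_config = {
    'method': 'grid',
    'parameters': {
        'layers': {'values': [32, 64, 96, 128, 256]},
    },
}


def expand_grid(config):
    """Return one dict of hyperparameter values per grid point."""
    names = list(config['parameters'])
    value_lists = [config['parameters'][name]['values'] for name in names]
    # product() yields every combination of one value per hyperparameter.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_grid(sweep_config)
print(len(runs), "runs:", runs)
```

With a single hyperparameter the grid is just the list of its values. Each additional hyperparameter multiplies the number of runs by the number of values it lists, which is why grid search gets expensive quickly.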

Next, we initialize this sweep by running:

import wandb

sweep_id = wandb.sweep(sweep_config)

When you run this, you should get a link to the sweep which you can view in the browser and use to track your sweep runs.

Once you have initialized the sweep, you need an agent. An agent runs a model training script once for each hyperparameter configuration the sweep hands it. Let’s define a simple training script:

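from wandb.keras import WandbCallback  # newer wandb releases may expose this under wandb.integration.keras

# FashionMNIST class names, in label order
labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def train():
   # Specify the hyperparameter to be tuned along with
   # an initial value
   configs = {
       'layers': 128
   }
   # Initialize wandb with a sample project name
   # ('sweep-demo' is a placeholder; use your own project name)
   wandb.init(config=configs, project='sweep-demo')
   (X_train, y_train) = train_images, train_labels
   (X_test, y_test) = test_images, test_labels
   # Specify the other hyperparameters to the configuration
   config = wandb.config
   config.epochs = 5
   # Define the model
   model = tf.keras.Sequential([
       tf.keras.layers.Flatten(input_shape=(28, 28)),
       tf.keras.layers.Dense(wandb.config.layers, activation=tf.nn.relu),
       tf.keras.layers.Dense(10, activation=tf.nn.softmax)
   ])
   # Compile the model
   model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
   # Train the model
   model.fit(X_train, y_train, epochs=config.epochs,
             validation_data=(X_test, y_test),
             callbacks=[WandbCallback(data_type="image", labels=labels)])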

It’s super important to call wandb.init() at the beginning of the training script itself; otherwise, the different runs in the hyperparameter sweep will not be able to log the specified outputs to your wandb dashboard.

As you can see, we are training a shallow fully-connected network. We let the network train for 5 epochs with the Adam optimizer, set to minimize the sparse categorical cross-entropy loss because the class labels are integers. This method is based on this example.
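For readers wondering why integer labels call for the sparse variant of the loss, here is a minimal per-example sketch in plain Python. It assumes the model already outputs softmax probabilities; the probability values are made up for illustration, and the sketch omits the numerical safeguards (such as clipping) that a real framework implementation applies.

```python
import math


def sparse_categorical_crossentropy(true_class, probabilities):
    """Loss for one example: negative log of the probability given to the true class.

    true_class is an integer index, so no one-hot encoding is needed.
    """
    return -math.log(probabilities[true_class])


# Hypothetical softmax output over three classes (illustrative values only).
probs = [0.1, 0.7, 0.2]

loss_confident = sparse_categorical_crossentropy(1, probs)  # true class got high probability
loss_mistaken = sparse_categorical_crossentropy(0, probs)   # true class got low probability
print(f"true class 1: {loss_confident:.4f}")
print(f"true class 0: {loss_mistaken:.4f}")
```

The integer label simply indexes into the predicted distribution, and the loss grows as the probability assigned to the true class shrinks.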

Once the model training script is done, you can run it as an agent to start the hyperparameter sweeping process.

wandb.agent(sweep_id, function=train)
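Conceptually, the agent repeatedly obtains the next hyperparameter configuration and runs the training function with it. The sketch below is a simplified stand-in for that loop, not wandb’s implementation: `run_agent` and `fake_train` are hypothetical names, the score is a made-up placeholder, and the real `train` function takes no arguments and reads its values from `wandb.config` instead.

```python
def run_agent(configurations, train_fn):
    """Call train_fn once per configuration and collect what it returns."""
    results = []
    for config in configurations:
        results.append((config, train_fn(config)))
    return results


def fake_train(config):
    # Placeholder for real training: returns a made-up score so the
    # example runs without TensorFlow. It is NOT a real validation loss.
    return 1.0 / config['layers']


configurations = [{'layers': n} for n in [32, 64, 96, 128, 256]]
results = run_agent(configurations, fake_train)
for config, score in results:
    print(config, round(score, 4))
```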

And that’s it! You should see something like this in your notebook:

If you click on the sweep URL you got when you initialized the sweep above, you should see a number of interesting things.

First, a dashboard giving you a summary of all the different runs (i.e. the network’s performance with different values of layers) like so:

Note each run has its unique ID (left).

Along with the network’s performance details, for each run, you also get:

Just click on a run of your choice, and you will have these buttons available on the left:

While choosing the best model from the sweep we focus on the trade-off between the validation loss and training loss, i.e. we find the runs for which the validation loss kept on decreasing with respect to the training loss. The following table can come in really handy here.

On your Weights & Biases project page (example here), you can press 'option+space' to expand the runs table and compare all the results. From here you can sort by validation accuracy (or validation loss) to find the model that performed best. You can also explore the results of the sweep by adding a parallel coordinates plot.
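The same comparison can be sketched in a few lines of Python once the run summaries are available as plain records. The run IDs and loss values below are invented for illustration and are not results from this sweep; `rank_runs` and `generalization_gap` are hypothetical helper names.

```python
# Hypothetical run summaries; the IDs and numbers are made up for illustration.
runs = [
    {'id': 'run-a', 'layers': 32, 'loss': 0.40, 'val_loss': 0.45},
    {'id': 'run-b', 'layers': 128, 'loss': 0.30, 'val_loss': 0.36},
    {'id': 'run-c', 'layers': 256, 'loss': 0.22, 'val_loss': 0.41},
]


def rank_runs(runs):
    """Sort runs by validation loss, lowest (best) first."""
    return sorted(runs, key=lambda run: run['val_loss'])


def generalization_gap(run):
    """Validation loss minus training loss; a large gap hints at overfitting."""
    return run['val_loss'] - run['loss']


ranked = rank_runs(runs)
best = ranked[0]
for run in ranked:
    print(run['id'], run['val_loss'], round(generalization_gap(run), 2))
```

In this made-up table the widest network has the lowest training loss but a larger gap to its validation loss, so sorting by validation loss rather than training loss picks a different run.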

How to Add a Parallel Coordinates Chart

Step 1: Click ‘Add visualization’ on the project page.

Step 2: Choose the parallel coordinates plot.

Step 3: Pick the dimensions (hyperparameters) you would like to visualize. Ideally, you want the last column to be the metric you are optimizing for (e.g. validation loss).

Step 4: And voilà! You have a parallel coordinates plot!

In this plot, for example, we can see that a lower learning rate and a higher number of channels in our convolutional layers lead to better validation accuracy. Parallel coordinates plots can be extremely useful for quickly spotting trends in our hyperparameter values relative to the metric we are optimizing. This, in turn, lets us narrow down our hyperparameter space and get to better accuracy faster.

If you’d like to learn more about how to choose the best model, check out these articles:


Now that you understand how to use sweeps, I encourage you to try sweeps with multiple hyperparameters. In this tutorial, we only tuned the layers hyperparameter; you could also try tuning the batch_size, epochs, optimizer and so on.
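As a starting point, a sweep over several hyperparameters might be configured along the lines below. The value lists are arbitrary choices for illustration, and the sketch assumes the training function reads `batch_size`, `epochs` and `optimizer` from `wandb.config` the same way it reads `layers`. The `random` method and the `metric` block follow the wandb sweep configuration format as I understand it; check the current documentation before relying on them.

```python
sweep_config = {
    'method': 'random',  # 'grid' would try every combination instead
    'metric': {'name': 'val_loss', 'goal': 'minimize'},
    'parameters': {
        'layers': {'values': [32, 64, 96, 128, 256]},
        'batch_size': {'values': [32, 64, 128]},
        'epochs': {'values': [5, 10, 15]},
        'optimizer': {'values': ['adam', 'sgd', 'rmsprop']},
    },
}

# Number of combinations a full grid search over this space would need.
grid_size = 1
for spec in sweep_config['parameters'].values():
    grid_size *= len(spec['values'])
print(grid_size)
```

Because the number of combinations grows multiplicatively, random search over a larger space is often paired with a cap on the number of runs the agent executes.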

I hope you enjoyed reading this tutorial. Stay tuned for the second part, where we dive deeper into more advanced hyperparameter sweeps and share many practical tips for using them effectively.
