
How to Perform Massive Hyperparameter Experiments with W&B

In this article, we look at how Weights & Biases can help you track and organize a massive experiment with thousands of runs.
Created on May 13|Last edited on March 20

Recently, I started a massive experiment of studying how well pre-trained image models transferred to other datasets. The question that I was trying to answer was:
Which pre-trained model is best for fine-tuning an image classifier?
Answering this question is complex, and there isn't a straightforward answer. Ideally, we'd like to establish the relationship between metrics like accuracy and inference time, and also define clear recipes for how to train each model type. Keeping track of the different combinations of hyperparameters can be daunting, sure, but it's one of the many ways Weights & Biases comes to the rescue.
In this article, we will show the best practice for refactoring your scripts to perform hyperparameter searches with W&B Sweeps. It's a recipe that serves to organize your training code so it can then later be orchestrated to search for hyperparameters. Let's go!
We will assume that you are already familiar with hyperparameter search; if not, the W&B Sweeps documentation is a good primer.

Creating a Training Script

I like to prototype in Jupyter notebooks; once things work, I refactor my code into a training script that is ready to be launched hundreds of times. Ideally, we'll split our code into multiple parameterized functions so that we can put everything together in a train function.
I would normally have a function to get the dataset in the right format...
def get_dataloader(batch_size, img_size, seed=42, method="crop"):
    "Use fastai to get the DataLoaders for the Oxford Pets dataset"
    dataset_path = untar_data(URLs.PETS)
    files = get_image_files(dataset_path/"images")
    dls = ImageDataLoaders.from_name_re(dataset_path, files,
                                        r'(^[a-zA-Z]+_*[a-zA-Z]+)',
                                        valid_pct=0.2,
                                        seed=seed,
                                        bs=batch_size,
                                        item_tfms=Resize(img_size, method=method))
    return dls
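The regular expression passed to from_name_re is what extracts the label from each filename. Oxford Pets filenames encode the breed before a trailing number (for example Abyssinian_100.jpg or american_bulldog_12.jpg), and the pattern captures everything up to that number. A quick standalone check of just the regex:

```python
import re

# Same pattern as in get_dataloader above
pattern = r'(^[a-zA-Z]+_*[a-zA-Z]+)'

# Pets filenames look like "<breed>_<number>.jpg":
print(re.match(pattern, "Abyssinian_100.jpg").group(1))       # Abyssinian
print(re.match(pattern, "american_bulldog_12.jpg").group(1))  # american_bulldog
```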
...and then a training loop. For this example, I am using fastai:
def train(wandb_project: str="my_fancy_project",
          batch_size: int=64,
          img_size: int=224,
          seed: int=42,
          resize_method: str="crop",
          model_name: str="convnext_tiny",
          epochs: int=5,
          learning_rate: float=2e-3):
    "Create a run and train the model"
    with wandb.init(project=wandb_project, group="timm"):
        dls = get_dataloader(batch_size, img_size, seed, resize_method)
        learn = vision_learner(dls,
                               model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(epochs, learning_rate)

😁 Because we are using fastai and its tight integration through the WandbCallback, you don't need to set up any manual logging.

Refactoring the Code

The first thing to do, instead of passing the parameters one by one, is to change the function to accept a config dictionary as an argument, so we can easily override the parameters afterward.
Pro tip: I prefer using a SimpleNamespace instead of a dict so I can access the elements as attributes (with the dot).
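To illustrate that pro tip, a SimpleNamespace behaves like a dict whose entries you access as attributes, and you can always convert it back with vars() when an API expects a plain dict:

```python
from types import SimpleNamespace

config = SimpleNamespace(batch_size=64, learning_rate=2e-3)

# Attribute access instead of config["batch_size"]:
print(config.batch_size)  # 64

# Easy to turn back into a dict when needed:
print(vars(config))       # {'batch_size': 64, 'learning_rate': 0.002}
```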
import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

config_defaults = SimpleNamespace(
    batch_size=64,
    epochs=5,
    learning_rate=2e-3,
    img_size=224,
    resize_method="crop",
    model_name="convnext_tiny",
    seed=42,
    wandb_project='my_fancy_project')

def train(config=config_defaults):
    with wandb.init(project=config.wandb_project, config=config):
        config = wandb.config  # we have to add this line, so we can inject the parameters afterwards
        dls = get_dataloader(config.batch_size, config.img_size, config.seed, config.resize_method)
        learn = vision_learner(dls,
                               config.model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(config.epochs, config.learning_rate)

if __name__ == "__main__":
    train()
As you can see, this train function takes a single config parameter. This is very typical in machine learning training scripts, where you would define these parameters in a YAML file or a Python dictionary. We can dump this into a train.py file and we are ready to perform the sweep!
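One nice side effect of the single config object is that overriding a parameter for a quick test is just a dict merge over the defaults. A minimal sketch (using a trimmed-down config_defaults for brevity):

```python
from types import SimpleNamespace

config_defaults = SimpleNamespace(batch_size=64, epochs=5, learning_rate=2e-3)

# Copy the defaults and override a single field for a quick experiment:
config = SimpleNamespace(**{**vars(config_defaults), "batch_size": 32})

print(config.batch_size)  # 32 (overridden)
print(config.epochs)      # 5 (unchanged default)
```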

Into a Script (optional)

Now we are almost there. I like my scripts to be able to run from the command line, so I usually wire them up with argparse.
There are multiple CLI options for parameterizing scripts in Python: argparse, fastai's call_parse, typer, fire, ml_collections, just to name a few. If you have one that you prefer, I would love to know why and how to integrate it into this setup.
We will use argparse, as it ships with Python's standard library:
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=config_defaults.batch_size)
    parser.add_argument('--epochs', type=int, default=config_defaults.epochs)
    parser.add_argument('--learning_rate', type=float, default=config_defaults.learning_rate)
    parser.add_argument('--img_size', type=int, default=config_defaults.img_size)
    parser.add_argument('--resize_method', type=str, default=config_defaults.resize_method)
    parser.add_argument('--model_name', type=str, default=config_defaults.model_name)
    parser.add_argument('--seed', type=int, default=config_defaults.seed)
    parser.add_argument('--wandb_project', type=str, default=config_defaults.wandb_project)
    return parser.parse_args()
...and we can use this in the main of the program. This way, we can override our training script arguments on the fly if we want to test something.
if __name__ == "__main__":
    args = parse_args()
    train(config=args)
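If you want to sanity-check the parsing logic without touching the command line, parse_args accepts an explicit argv list instead of reading sys.argv. A minimal sketch with just two of the flags:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--img_size', type=int, default=224)

# Pass an explicit list instead of letting argparse read sys.argv:
args = parser.parse_args(["--batch_size", "32"])

print(args.batch_size)  # 32 (overridden on the "command line")
print(args.img_size)    # 224 (falls back to the default)
```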

Putting it all Together

The final script train.py looks like this:
import wandb
import argparse
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

config_defaults = SimpleNamespace(
    batch_size=64,
    epochs=5,
    learning_rate=2e-3,
    img_size=224,
    resize_method="crop",
    model_name="convnext_tiny",
    seed=42,
    wandb_project='my_fancy_project',
)

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=config_defaults.batch_size)
    parser.add_argument('--epochs', type=int, default=config_defaults.epochs)
    parser.add_argument('--learning_rate', type=float, default=config_defaults.learning_rate)
    parser.add_argument('--img_size', type=int, default=config_defaults.img_size)
    parser.add_argument('--resize_method', type=str, default=config_defaults.resize_method)
    parser.add_argument('--model_name', type=str, default=config_defaults.model_name)
    parser.add_argument('--seed', type=int, default=config_defaults.seed)
    parser.add_argument('--wandb_project', type=str, default=config_defaults.wandb_project)
    return parser.parse_args()

def get_dataloader(batch_size, img_size, seed=42, method="crop"):
    dataset_path = untar_data(URLs.PETS)
    files = get_image_files(dataset_path/"images")
    dls = ImageDataLoaders.from_name_re(dataset_path, files,
                                        r'(^[a-zA-Z]+_*[a-zA-Z]+)',
                                        valid_pct=0.2,
                                        seed=seed,
                                        bs=batch_size,
                                        item_tfms=Resize(img_size, method=method))
    return dls

def train(config=config_defaults):
    with wandb.init(project=config.wandb_project, config=config):
        config = wandb.config  # we have to add this line, so we can inject the parameters afterwards
        dls = get_dataloader(config.batch_size, config.img_size, config.seed, config.resize_method)
        learn = vision_learner(dls,
                               config.model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(config.epochs, config.learning_rate)

if __name__ == "__main__":
    args = parse_args()
    train(config=args)
Now we can call this script from the command line directly and override the parameters by passing them as arguments.
The command-line interface that argparse creates; we can get a list of the available arguments by passing the --help flag.
You can now override any parameter by calling the script with the proper flag:
  • python train.py --img_size=128 --batch_size=32, for instance, will override the image size and the batch size with these values. You can also separate the flag and the value with a space, --img_size 128, instead of the =.
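Both flag styles end up parsed the same way; a quick check of the two syntaxes with a stripped-down parser:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--img_size', type=int, default=224)
parser.add_argument('--batch_size', type=int, default=64)

# argparse accepts both "--flag=value" and "--flag value":
a = parser.parse_args(["--img_size=128", "--batch_size", "32"])

print(a.img_size, a.batch_size)  # 128 32
```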

Sweep Sweep Sweep 🧹

We are ready to use our script in a hyperparameter search now; let's do it!

Prepare the sweep configuration

There are multiple ways to launch a sweep, but the preferred way is using the wandb sweep command. To do so, we need to create a YAML file with the method and the hyperparameters we will search. Just save this file along with the train.py script, and you are ready to go!
program: train.py
method: bayes
metric:
  name: valid_loss
  goal: minimize
parameters:
  model_name:
    values: ['levit_128s', 'resnet18', 'resnet34d', 'convnext_base', 'regnetx_064']
  learning_rate:
    min: 0.0001
    max: 0.1
  resize_method:
    values: ['crop', 'squish']
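If you'd rather stay in Python, the same search space can be expressed as a dictionary and registered with wandb.sweep. A sketch (the project name is the placeholder used throughout this article, and the final call requires being logged in to W&B, so it is left commented out):

```python
# The same sweep configuration as the YAML above, as a plain dict:
sweep_config = {
    "method": "bayes",
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "model_name": {"values": ["levit_128s", "resnet18", "resnet34d",
                                  "convnext_base", "regnetx_064"]},
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "resize_method": {"values": ["crop", "squish"]},
    },
}

# Registering it returns the sweep ID (needs a logged-in W&B session):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="my_fancy_project")
```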

Creating the Sweep

The sweep controller lives on the W&B server (you can also launch a local sweep controller on your own infrastructure). The controller takes care of distributing the hyperparameter combinations to the agents and gathering all the sweep results.

To launch the controller, run wandb sweep sweep.yaml (assuming you saved the configuration above as sweep.yaml); the command prints the sweep ID, which you'll need for the next step.

Launching An Agent

You are ready to go. Now you can spawn the agents (the machines that will actually do the work). On each machine where you want the sweep to run, execute the command:
$ wandb agent {SWEEP_ID}

There, the sweep is now running!
If you want to stop after a certain number of runs, pass a count: wandb agent {SWEEP_ID} --count 100 will run at most 100 experiments.

Conclusion & More

This is my preferred way of running sweeps. I really love Jupyter and actually do all the coding and preparation of the scripts inside JupyterLab. Still, for a sweep that will probably take the whole weekend to run on an expensive instance, a standalone script is the more reliable way.
