
How to Perform Massive Hyperparameter Experiments with W&B

In this article, we look at how Weights & Biases can help you track and organize a massive experiment with thousands of runs.
Created on May 13|Last edited on March 20

Recently, I started a massive experiment of studying how well pre-trained image models transferred to other datasets. The question that I was trying to answer was:
Which pre-trained model is best for fine-tuning an image classifier?
Answering this question is complex, and there isn't a straightforward answer. Ideally, we'd like to establish the relationship between metrics like accuracy and inference time, and also define clear recipes for how to train each model type. Keeping track of the different combinations of hyperparameters can be daunting, sure, but it's one of the many ways Weights & Biases comes to the rescue.
In this article, we will show the best practice for refactoring your scripts to perform hyperparameter searches with W&B Sweeps. It's a recipe that serves to organize your training code so it can then later be orchestrated to search for hyperparameters. Let's go!
We will assume that you are already familiar with hyperparameter search; if not, the W&B Sweeps documentation is a good primer.

Creating a Training Script

I like to prototype in Jupyter notebooks; once things work, I refactor my code into a training script that is ready to be launched hundreds of times. Ideally, we'll split our code into multiple parameterized functions so that we can put everything together in a train function.
I would normally have a function to get the dataset in the right format...
def get_dataloader(batch_size, img_size, seed=42, method="crop"):
    "Use fastai to get the DataLoaders for the Oxford Pets dataset"
    dataset_path = untar_data(URLs.PETS)
    files = get_image_files(dataset_path/"images")
    dls = ImageDataLoaders.from_name_re(dataset_path, files,
                                        r'(^[a-zA-Z]+_*[a-zA-Z]+)',
                                        valid_pct=0.2,
                                        seed=seed,
                                        bs=batch_size,
                                        item_tfms=Resize(img_size, method=method))
    return dls
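The regular expression passed to from_name_re is what extracts the label from each filename. Oxford Pets filenames encode the breed before a trailing number (for example Abyssinian_100.jpg or american_bulldog_12.jpg), and the pattern captures everything up to that number. A quick standalone check of just the regex:

```python
import re

# Same pattern as in get_dataloader above
pattern = r'(^[a-zA-Z]+_*[a-zA-Z]+)'

# Pets filenames look like "<breed>_<number>.jpg":
print(re.match(pattern, "Abyssinian_100.jpg").group(1))       # Abyssinian
print(re.match(pattern, "american_bulldog_12.jpg").group(1))  # american_bulldog
```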
...and then a training loop. For this example, I am using fastai:
def train(wandb_project: str="my_fancy_project",
          batch_size: int=64,
          img_size: int=224,
          seed: int=42,
          resize_method: str="crop",
          model_name: str="convnext_tiny",
          epochs: int=5,
          learning_rate: float=2e-3):
    "Create a run and train the model"
    with wandb.init(project=wandb_project, group="timm"):
        dls = get_dataloader(batch_size, img_size, seed, resize_method)
        learn = vision_learner(dls,
                               model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(epochs, learning_rate)

😁 Because we are using fastai and its tight integration through the WandbCallback, you don't need to set up any manual logging.

Refactoring the Code

The first thing to do, instead of passing the parameters one by one, is to change the function to accept a config dictionary as an argument, so we can easily override the parameters afterward.
Pro tip: I prefer using a SimpleNamespace instead of a dict so I can access the elements as attributes (with the dot).
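To illustrate that pro tip, a SimpleNamespace behaves like a dict whose entries you access as attributes, and you can always convert it back with vars() when an API expects a plain dict:

```python
from types import SimpleNamespace

config = SimpleNamespace(batch_size=64, learning_rate=2e-3)

# Attribute access instead of config["batch_size"]:
print(config.batch_size)  # 64

# Easy to turn back into a dict when needed:
print(vars(config))       # {'batch_size': 64, 'learning_rate': 0.002}
```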
import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

config_defaults = SimpleNamespace(
    batch_size=64,
    epochs=5,
    learning_rate=2e-3,
    img_size=224,
    resize_method="crop",
    model_name="convnext_tiny",
    seed=42,
    wandb_project='my_fancy_project')

def train(config=config_defaults):
    with wandb.init(project=config.wandb_project, config=config):
        config = wandb.config  # we have to add this line, so we can inject the parameters afterwards
        dls = get_dataloader(config.batch_size, config.img_size, config.seed, config.resize_method)
        learn = vision_learner(dls,
                               config.model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(config.epochs, config.learning_rate)

if __name__ == "__main__":
    train()
As you can see, this train function takes a single config parameter. This is very typical in machine learning training scripts, where you would define these parameters in a YAML file or a Python dictionary. We can dump this into a train.py file and we are ready to perform the sweep!
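One nice side effect of the single config object is that overriding a parameter for a quick test is just a dict merge over the defaults. A minimal sketch (using a trimmed-down config_defaults for brevity):

```python
from types import SimpleNamespace

config_defaults = SimpleNamespace(batch_size=64, epochs=5, learning_rate=2e-3)

# Copy the defaults and override a single field for a quick experiment:
config = SimpleNamespace(**{**vars(config_defaults), "batch_size": 32})

print(config.batch_size)  # 32 (overridden)
print(config.epochs)      # 5 (unchanged default)
```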

Into a Script (optional)

Now we are almost there. I like my scripts to be able to run from the command line, so I usually wire them up with argparse.
There are multiple CLI options for parameterizing scripts in Python: argparse, fastai's call_parse, typer, fire, ml_collections, just to name a few. If you have one that you prefer, I would love to know why and how to integrate it into this setup.
We will use argparse, as it ships with Python's standard library:
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=config_defaults.batch_size)
    parser.add_argument('--epochs', type=int, default=config_defaults.epochs)
    parser.add_argument('--learning_rate', type=float, default=config_defaults.learning_rate)
    parser.add_argument('--img_size', type=int, default=config_defaults.img_size)
    parser.add_argument('--resize_method', type=str, default=config_defaults.resize_method)
    parser.add_argument('--model_name', type=str, default=config_defaults.model_name)
    parser.add_argument('--seed', type=int, default=config_defaults.seed)
    parser.add_argument('--wandb_project', type=str, default=config_defaults.wandb_project)
    return parser.parse_args()
...and we can use this in the main of the program. This way, we can override our training script arguments on the fly if we want to test something.
if __name__ == "__main__":
    args = parse_args()
    train(config=args)
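If you want to sanity-check the parsing logic without touching the command line, parse_args accepts an explicit argv list instead of reading sys.argv. A minimal sketch with just two of the flags:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--img_size', type=int, default=224)

# Pass an explicit list instead of letting argparse read sys.argv:
args = parser.parse_args(["--batch_size", "32"])

print(args.batch_size)  # 32 (overridden on the "command line")
print(args.img_size)    # 224 (falls back to the default)
```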

Putting it all Together

The final script train.py looks like this:
import wandb
import argparse
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

config_defaults = SimpleNamespace(
    batch_size=64,
    epochs=5,
    learning_rate=2e-3,
    img_size=224,
    resize_method="crop",
    model_name="convnext_tiny",
    seed=42,
    wandb_project='my_fancy_project',
)

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=config_defaults.batch_size)
    parser.add_argument('--epochs', type=int, default=config_defaults.epochs)
    parser.add_argument('--learning_rate', type=float, default=config_defaults.learning_rate)
    parser.add_argument('--img_size', type=int, default=config_defaults.img_size)
    parser.add_argument('--resize_method', type=str, default=config_defaults.resize_method)
    parser.add_argument('--model_name', type=str, default=config_defaults.model_name)
    parser.add_argument('--seed', type=int, default=config_defaults.seed)
    parser.add_argument('--wandb_project', type=str, default=config_defaults.wandb_project)
    return parser.parse_args()

def get_dataloader(batch_size, img_size, seed=42, method="crop"):
    dataset_path = untar_data(URLs.PETS)
    files = get_image_files(dataset_path/"images")
    dls = ImageDataLoaders.from_name_re(dataset_path, files,
                                        r'(^[a-zA-Z]+_*[a-zA-Z]+)',
                                        valid_pct=0.2,
                                        seed=seed,
                                        bs=batch_size,
                                        item_tfms=Resize(img_size, method=method))
    return dls

def train(config=config_defaults):
    with wandb.init(project=config.wandb_project, config=config):
        config = wandb.config  # we have to add this line, so we can inject the parameters afterwards
        dls = get_dataloader(config.batch_size, config.img_size, config.seed, config.resize_method)
        learn = vision_learner(dls,
                               config.model_name,
                               metrics=[accuracy, error_rate],
                               cbs=WandbCallback(log_preds=False)).to_fp16()
        learn.fine_tune(config.epochs, config.learning_rate)

if __name__ == "__main__":
    args = parse_args()
    train(config=args)
Now we can call this script from the command line directly and override the parameters by passing them as arguments.
The command-line interface that argparse creates; we can get a list of the available arguments by passing the --help flag.
You can now override any parameter by calling the script with the proper flag:
  • python train.py --img_size=128 --batch_size=32, for instance, will override the image size and the batch size with these values. You can also separate the flag and the value with a space, --img_size 128, instead of the =.
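Both flag styles end up parsed the same way; a quick check of the two syntaxes with a stripped-down parser:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--img_size', type=int, default=224)
parser.add_argument('--batch_size', type=int, default=64)

# argparse accepts both "--flag=value" and "--flag value":
a = parser.parse_args(["--img_size=128", "--batch_size", "32"])

print(a.img_size, a.batch_size)  # 128 32
```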

Sweep Sweep Sweep 🧹

We are ready to use our script in a hyperparameter search now; let's do it!

Prepare the sweep configuration

There are multiple ways to launch a sweep, but the preferred way is using the wandb sweep command. To do so, we need to create a YAML file with the method and the hyperparameters we will search. Just save this file along with the train.py script, and you are ready to go!
program: train.py
method: bayes
metric:
  name: valid_loss
  goal: minimize
parameters:
  model_name:
    values: ['levit_128s', 'resnet18', 'resnet34d', 'convnext_base', 'regnetx_064']
  learning_rate:
    min: 0.0001
    max: 0.1
  resize_method:
    values: ['crop', 'squish']
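If you'd rather stay in Python, the same search space can be expressed as a dictionary and registered with wandb.sweep. A sketch (the project name is the placeholder used throughout this article, and the final call requires being logged in to W&B, so it is left commented out):

```python
# The same sweep configuration as the YAML above, as a plain dict:
sweep_config = {
    "method": "bayes",
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "model_name": {"values": ["levit_128s", "resnet18", "resnet34d",
                                  "convnext_base", "regnetx_064"]},
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "resize_method": {"values": ["crop", "squish"]},
    },
}

# Registering it returns the sweep ID (needs a logged-in W&B session):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="my_fancy_project")
```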

Creating the Sweep

The sweep controller lives on the W&B server (you can also launch a local sweep controller on your own infrastructure). The controller takes care of distributing the hyperparameter combinations to the agents and gathering all the sweep results.

To launch the controller, run wandb sweep sweep.yaml (assuming you saved the configuration above as sweep.yaml); the command prints the sweep ID, which you'll need for the next step.

Launching An Agent

You are ready to go. Now you can spawn the agents (the machines that will actually do the work). On each machine where you want the sweep to run, execute the command:
$ wandb agent {SWEEP_ID}

There, the sweep is now running!
If you want to stop after a certain number of runs, pass a count: wandb agent {SWEEP_ID} --count 100 will run at most 100 experiments.

Conclusion & More

This is my preferred way of running sweeps. I really love Jupyter and actually do all the coding and preparation of the scripts inside JupyterLab. Still, for a sweep that will probably take the whole weekend to run on an expensive instance, a standalone script is the more reliable way.
