
Hyperparameter Tuning with W&B Sweeps

Learn hyperparameter tuning using Weights & Biases. This video is sampled from the free MLOps certification course from Weights & Biases!
Hyperparameter tuning is a crucial step in the machine learning process, as it can significantly impact the performance of your model. However, finding the optimal hyperparameters can be a time-consuming and manual process involving the testing of multiple combinations and the tracking of results. In this video from our MLOps course, we show you how to use Weights & Biases Sweeps to automate the hyperparameter tuning process.
With Weights & Biases Sweeps, you can easily define the hyperparameters you want to test and the range of values for each parameter. The platform will then automatically run a series of experiments, tracking the results in real-time and providing insights into the best-performing combinations. This not only saves time but also helps you quickly find the optimal hyperparameters for your model. If you're looking to streamline your hyperparameter tuning process, be sure to watch this video.




Transcription (from Whisper)

Hello again. Let's take care of the original task: improving the model performance.
How can we make the model better and increase the intersection over union (IoU) metric?
In our previous video, we refactored the baseline notebook. Now, we'll want to export everything to a train.py script file.
There are multiple ways of doing this. You can export every single cell by hand, copy-pasting them into the train.py file, or you can use a semi-automatic way, like nbdev or nbconvert. To keep it simple, I went the manual way: I merged every single cell and copy-pasted them into the train.py file.
At the end of the file, you have the train function that depends on the config. The only thing I have added is the argparse functionality. This enables you to override arguments on the fly: it converts your Python program into an interactive command-line interface.
There are multiple tools to add this functionality. I'm using Python's built-in argparse. As we're using fastai, you could also use the call_parse decorator (from fastcore) that transforms your script into a command-line interface.
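For reference, here is a minimal sketch of that pattern, where each config default becomes a command-line flag feeding the train function. The specific flags (batch_size, lr, log_preds) are illustrative assumptions, not the course's exact config:

```python
import argparse

def train(config):
    # Stand-in for the real training loop defined earlier in train.py.
    print(f"Training with {vars(config)}")

def parse_args():
    # Every config default is exposed as a flag, so any value can be
    # overridden on the fly from the command line.
    parser = argparse.ArgumentParser(description="Train a segmentation model")
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--lr", type=float, default=2e-3)
    parser.add_argument("--log_preds", action="store_true",
                        help="log sample predictions to W&B")
    return parser.parse_args()

if __name__ == "__main__":
    train(parse_args())  # defaults unless overridden, e.g. --batch_size 16
```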
Inside the environment, you just call this file with Python: python train.py. This way, we'll run the file in the same way as it was presented in the notebook.
This is not very interesting, as it will run with default args, and we already have a bunch of runs like that. Let's just cancel the run using Ctrl-C.
Let's go to the workspace and see the new run that we logged. There it is.
We can delete this partial run by clicking on the three dots. As we said before, we have an interactive Python program now, so let's make use of that.
You can access the help menu by calling python train.py --help, and it will print out all the parameters you are able to override. Let's try a different batch size, say 16, for a change.
You can override the batch size by passing the argument name and the new value. This will create a new run with batch size equal to 16, overriding the default of 8.
Let's confirm this on the workspace. There it is.
Let's click on the run and on the overview tab, and scroll down to the config. We can confirm that the batch size is 16 now. Let's also cancel this run.
What we actually want to do is explore the hyperparameter space, but we don't want to do it manually like that. We want to define a way to orchestrate our hyperparameter optimization.
Here comes Weights & Biases Sweeps, our hyperparameter optimization tool.
With just a few lines of code and our already instrumented training script, you will be running massive hyperparameter tuning in no time.
But how do we actually do this? How do we tell Weights & Biases to run this code automatically?
This is done through a YAML configuration file.
First, you define what script you want to run, in our case, train.py.
Second, you have to define a method of exploration of the hyperparameter space.
We provide grid, random, and Bayesian optimization search. You can refer to the Weights & Biases Sweeps documentation to get more information about the algorithms.
Then, you have to define which project your sweep will live in.
Sometimes, you want to use a different project to put your sweeps in, to not pollute your main workspace.
In our case, we will use the same project as before. Then, you define a metric to monitor.
In our case, we want to maximize the mean intersection over union. And finally, you want to define your hyperparameter space. You can use these entries to override default arguments, like log predictions; this is equivalent to changing the default values in the train.py file.
You can use a distribution to sample continuous parameters. In our case, we will sample the learning rate log-uniformly between the min and max values. You can also pass a list of values for discrete parameters.
We will try a smaller batch size, so we get more optimizer updates per epoch. I had good results in the past using this trick with small datasets.
We will also increase the image size a little bit. This should improve the model performance on segmentation of small objects.
Finally, we will try different image backbones. These are my four favorite backbones from torchvision. Feel free to go to torchvision.models and try other backbones. There are plenty of them.
Depending on your task and your dataset, you may want to try bigger models. The torchvision models are retrained regularly with state-of-the-art techniques. With that, our sweep configuration file is ready.
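For reference, here is a sketch of a sweep configuration matching the description above, written as a Python dict (the course defines the same structure in a YAML file). The metric key, parameter names, and value ranges are illustrative assumptions:

```python
import wandb

# Sketch of the sweep configuration described above; the YAML file has the
# same structure. Names and ranges are assumptions, not the course's exact file.
sweep_config = {
    "program": "train.py",                 # the script each run executes
    "method": "random",                    # grid / random / bayes are available
    "metric": {"name": "miou", "goal": "maximize"},  # mean IoU (assumed key)
    "parameters": {
        "log_preds": {"value": False},     # pin a default argument
        "lr": {                            # continuous: log-uniform sampling
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "batch_size": {"values": [4, 8]},        # discrete list of values
        "image_size": {"values": [240, 320]},    # slightly larger images
        "backbone": {                            # torchvision backbones
            "values": ["resnet18", "resnet34",
                       "convnext_tiny", "regnet_x_400mf"],
        },
    },
}

# Registering the config returns the sweep id the agents will use.
sweep_id = wandb.sweep(sweep_config, project="mlops-course")  # assumed project
```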
Let's switch to our terminal now and start the sweep.
You can launch the sweep using the wandb sweep command with the sweep configuration file. The sweep has been created.
You can click the link and you will be redirected to the sweep workspace. It's still empty. But you can click on the Overview tab and see the configuration file that was used to create the sweep. And you see the proposed sweep command to launch an agent.
This is the same command suggested on the terminal.
Let's run this command. But before doing that, let's check the options available for the wandb agent command. We see that there is a count parameter, so we can limit the max number of runs per agent.
As we are doing a random search, if you don't pass any count parameter, it will run forever, so you will have to kill it manually.
Let's start with 50 runs. Running this command will launch the agent, and it will start populating your sweep workspace with runs. You can see the selected hyperparameters at the beginning of the script output.
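Programmatically, launching a capped agent looks like this; a sketch reusing the sweep_id returned by wandb.sweep above (the CLI equivalent is the suggested wandb agent command with a --count flag):

```python
import wandb

# Run at most 50 trials with this agent; without a count, a random search
# keeps sampling forever and you have to kill the agent manually.
wandb.agent(sweep_id, count=50)  # executes train.py once per sampled config
```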
Now we can switch to the workspace, and we should see the incoming run. There it is.
You will see the plot updating automatically as more runs come in. But I have an extra surprise for you. I have switched machines. I'm not using the same machine as we were using before. This machine is equipped with two GPUs.
You can check the available GPUs on your machine using the nvidia-smi command.
We see that the first GPU is being used, and the second one is just sitting idle. Let's fix that. Let's open a new terminal. We can override the CUDA_VISIBLE_DEVICES environment variable and force the code to run on the second GPU. This command will create a new agent.
We will also pass a quota of 50 runs to this second agent running on the second GPU.
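Here is a sketch of what that second terminal does, again assuming the same sweep_id. Setting CUDA_VISIBLE_DEVICES before anything initializes CUDA pins the agent's runs to the second GPU; on the CLI this is equivalent to prefixing the agent command with CUDA_VISIBLE_DEVICES=1:

```python
import os

# Expose only GPU index 1 to this process; must happen before CUDA is touched.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import wandb

wandb.agent(sweep_id, count=50)  # second agent, same 50-run quota
```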
This is really powerful when you have access to large compute, like a cluster or machines equipped with multiple GPUs. As you'd expect, you can launch agents in parallel and greatly reduce the time spent performing the sweep.
As we go to the workspace, we see that two runs are coming in parallel.
Great! Now we have two agents contributing in parallel to finish the sweep. This should roughly halve the time needed to complete our hyperparameter exploration.
In the next video, we'll explore the results of the finished sweep.