
Iteratively Fine-Tuning Neural Networks with Weights & Biases

How to train and tune neural networks with Weights & Biases
When I’m trying to solve a problem with machine learning, I always follow three steps:
  1. Inspect the data
  2. Find typical architectures for this type of problem
  3. Train and fine-tune my neural network
In this article I’ll dive into the third step, training and fine-tuning.

Before Weights & Biases

When I first tried to optimize neural networks, I was using my local desktop machine to test different combinations of hyperparameters, and I took a lot of notes.
  • I tracked hyperparameters, adding more along the way.
  • I recorded the final value of metrics, ignoring the values over epochs or batches.
  • My notes were difficult to read, but I tried to capture important changes, like updating the loss function or fixing a bug in the training script.
Does this picture remind you of your own model fine-tuning?

As I started using multiple remote machines, tracking became more difficult. In some cases, I had local changes that weren’t reflected on remote machines, and it was hard to notice the errors.
Out of desperation (and an inability to read my terrible handwriting), I turned to Excel… but manual spreadsheets still didn’t solve my real problem.
I was using handwritten notes and Excel because there were no tools or frameworks to automatically track and compare my training runs. TensorBoard only solved one part of the problem: visualizing experiments on graphs.

Unfortunately, I found that TensorBoard made it hard to compare multiple experiments, especially when I was using multiple servers. I was also trying PyTorch and Fast.ai, which made it harder to keep using TensorBoard.
I took the Full Stack Deep Learning class with Josh Tobin from OpenAI, and he guided us through best practices for training models. One of the tools he mentioned was Weights & Biases for experiment tracking, so I picked it up to see if it would help with my organization problem.
The wandb tool helped in a few different ways:
  • It logs all the hyperparameters and metrics
  • It shows quick visualizations like TensorBoard
  • It saves the trained model weights
  • I can easily sort and filter my runs
  • I can automatically log prediction samples during training, like the semantic segmentation examples below.
Here’s an example of an experiment where I was doing semantic segmentation.
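If you want to log something similar, here is a minimal sketch of how prediction samples can be attached to a run with wandb.Image. The class labels and the helper function are hypothetical, made up for illustration rather than taken from my actual project.

```python
import wandb

# Hypothetical class labels for a segmentation task (not from my actual project).
CLASS_LABELS = {0: "background", 1: "road", 2: "car"}

def log_predictions(image, pred_mask, gt_mask, step):
    """Log an input image with predicted and ground-truth masks overlaid.

    `image` is an HxWx3 array; `pred_mask` and `gt_mask` are HxW arrays
    of integer class ids.
    """
    wandb.log(
        {
            "predictions": wandb.Image(
                image,
                masks={
                    "prediction": {"mask_data": pred_mask, "class_labels": CLASS_LABELS},
                    "ground_truth": {"mask_data": gt_mask, "class_labels": CLASS_LABELS},
                },
            )
        },
        step=step,
    )
```

In the web interface, each logged image then shows the masks as toggleable overlays, which makes it easy to spot where the model struggles.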

In the web interface, it was easy to:
  • Create custom graphs (more on this later)
  • Create reports to save my reasoning and refer to it later
  • Access results from my phone! I sometimes obsess over experiments, and when I was working on the colorizer challenge, being able to quickly check on them at any time helped me keep a healthy mind!
  • Integrate wandb into my script with just a few lines of code (see the sketch below)
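To give an idea of what “a few lines of code” means, here is a minimal sketch of the integration. The project name, hyperparameters, and placeholder training functions are made up for illustration; in a real script the placeholders would be your actual training and validation loops.

```python
import random
import wandb

def train_one_epoch():
    """Placeholder for the real training loop; returns a fake loss."""
    return random.random()

def validate():
    """Placeholder for the real validation loop; returns fake metrics."""
    return random.random(), random.random()

# Hypothetical hyperparameters and project name, just for illustration.
config = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 10}
run = wandb.init(project="segmentation-experiments", config=config)

for epoch in range(config["epochs"]):
    train_loss = train_one_epoch()
    val_loss, val_iou = validate()
    # Each wandb.log call adds a point to the run's metric charts.
    wandb.log({"train_loss": train_loss, "val_loss": val_loss, "val_iou": val_iou})

run.finish()
```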

When I have to optimize a model, I run into one of the two following cases:
  • Each experiment takes a long time (for example, 5 to 45 hours for semantic segmentation problems):
    • I carefully choose my hyperparameters
    • I may run up to 4-5 experiments (in parallel if I can rent online resources)
    • After those 4-5 experiments, I analyze my results and define the next set of hyperparameters to run
  • Each experiment is relatively fast (for example, a simple classification problem):
    • My hyperparameters are defined randomly
    • I run the experiment continuously on every server I can access
    • I inspect the results from time to time and refine my range of hyperparameters (a minimal sketch of this random-search loop follows the list)
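For the fast-experiment case, the random search can be as simple as the sketch below. The search space, project name, and placeholder training function are hypothetical and only meant to show the shape of the loop.

```python
import random
import wandb

def train_and_evaluate(config):
    """Placeholder for a full training run; returns a fake validation accuracy."""
    return random.random()

# Hypothetical search space, just for illustration.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -2),
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
    "dropout": lambda: random.uniform(0.0, 0.5),
}

# Keep sampling configurations and launching runs while the machine is available.
for _ in range(20):
    config = {name: sample() for name, sample in SEARCH_SPACE.items()}
    run = wandb.init(project="random-search", config=config, reinit=True)
    val_acc = train_and_evaluate(config)
    wandb.log({"val_acc": val_acc})
    run.finish()
```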
When trying to identify the best runs, I first look at the parallel coordinates graph.

If I cannot easily tell which runs are good or bad, I also plot graphs grouped by hyperparameter. In those cases, you need to check that you get better results on average, but you may also want to compare the best result of each group (the top of the band).

Once I find a parameter value that is much better (or much worse) than the others, I filter all my runs by that value and look for the next parameter to select.
Drawing conclusions from a group of runs is much more reliable, as it reduces the effect of the random noise present in every experiment.
This process is iterative: as I refine my values, I may decide to run more experiments within my reduced range of hyperparameters until I am completely satisfied. I also save my reasoning in a report so that I can remember why I selected a particular filter.
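The same kind of filtering can also be done programmatically with the wandb public API, which is handy for quick sanity checks. The entity/project path, config key, and metric name below are placeholders for illustration.

```python
import wandb

api = wandb.Api()

# Hypothetical entity/project path and config key, just for illustration.
runs = api.runs(
    "my-entity/segmentation-experiments",
    filters={"config.optimizer": "adam"},  # keep only runs trained with Adam
)

# Sort the filtered runs by their final validation IoU, best first.
best = sorted(runs, key=lambda r: r.summary.get("val_iou", 0.0), reverse=True)
for run in best[:5]:
    print(run.name, run.config.get("learning_rate"), run.summary.get("val_iou"))
```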
I often like to start with shorter runs or reduced input size to try and get a few insights quickly. It is not completely reliable but useful when you have a limited amount of time to solve a problem.
I haven’t ventured into AutoML techniques yet, starting with Bayesian hyperparameter optimization, but I’ll try them next to see how they can fit into my current workflow.
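W&B Sweeps is one way I could try this. Here is a minimal sketch of a Bayesian sweep, with the metric name, parameter ranges, project name, and placeholder training function all made up for illustration.

```python
import random
import wandb

def train():
    """Placeholder training function; reads hyperparameters from wandb.config."""
    wandb.init()
    lr = wandb.config.learning_rate
    dropout = wandb.config.dropout
    # ... real training would happen here, using lr and dropout ...
    wandb.log({"val_loss": random.random()})

# Hypothetical sweep definition: Bayesian optimization over two hyperparameters.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "dropout": {"min": 0.0, "max": 0.5},
    },
}

sweep_id = wandb.sweep(sweep_config, project="sweep-demo")
wandb.agent(sweep_id, function=train, count=20)
```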
Please share your own workflow or any suggestions and comments you may have to make it better!