
Ray Tune: Distributed Hyperparameter Optimization at Scale

This article takes a look at how to use Ray Tune with W&B to run an effective distributed hyperparameter optimization pipeline at scale.
This article explores how to use Ray Tune with Weights & Biases. We run experiments that tune hyperparameters for generating MNIST, STL10, and CelebA images, demonstrating how the combination of these two tools provides a one-stop shop for scaling machine learning experimentation and model development.

Weights & Biases 💜 Ray Tune

Weights & Biases helps your ML team unlock their productivity by optimizing, visualizing, collaborating on, and standardizing their model and data pipelines – regardless of framework, environment, or workflow.
Used by the likes of OpenAI, Toyota, and GitHub, W&B is part of the new standard of best practices for machine learning. By saving everything you need to track and compare models – architecture, hyperparameters, weights, model predictions, GPU usage, git commits, and even datasets – W&B makes your ML workflows reproducible.


Today we're announcing an integration with a tool our community adores: Ray Tune, one of the first and most respected libraries for scalable hyperparameter optimization. With just a few lines of code, Ray Tune helps researchers optimize their models with state-of-the-art algorithms and scale their hyperparameter optimization process to hundreds of nodes and GPUs.



Why We Chose Ray Tune – Delivering Model Development and Hyperparameter Optimization at Scale

We're especially excited about the possibilities this collaboration with our friends at Ray Tune opens up. Both Weights & Biases and Ray Tune are built for scale and handle millions of models every month for teams doing some of the most cutting-edge deep-learning research.
W&B is a centralized repository for everything you need to track, reproduce, and gain insights from your models easily, while Ray Tune provides a simple interface for scaling and running distributed experiments. A few reasons why our community likes Ray Tune:
  1. Simple distributed execution: Ray Tune makes it easy to scale from a single node to multiple GPUs, and further to multiple nodes.
  2. Large number of algorithms: Ray Tune ships a wide range of search algorithms and schedulers, including Population Based Training, ASHA, and HyperBand (see the sketch after this list).
  3. Framework agnostic: Ray Tune works across frameworks, including PyTorch, Keras, TensorFlow, XGBoost, and PyTorch Lightning.
  4. Fault tolerance: Ray Tune is built on top of Ray, providing fault tolerance out of the box.
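
For example, swapping in one of these schedulers is a one-line change to tune.run. Below is a minimal, illustrative sketch using the ASHA scheduler; the trainable, metric name, and search space are assumptions for demonstration, not the DCGAN experiments covered later in this article.

from ray import tune
from ray.tune.schedulers import ASHAScheduler

def trainable(config):
    # Dummy objective: pretend the "loss" shrinks over training steps.
    for step in range(10):
        tune.report(loss=config["lr"] * (10 - step))

analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},
    scheduler=ASHAScheduler(metric="loss", mode="min"),
)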

Getting Started

There are two ways to use the W&B integration with Ray Tune.

1. The WandbLogger

from ray import tune
from ray.tune.integration.wandb import WandbLogger

tune.run(
    train,  # your trainable function
    loggers=[WandbLogger],
    config={
        "wandb": {"project": "rayTune", "monitor_gym": True}
    })
WandbLogger automatically forwards the metrics reported to Tune to the W&B dashboard of the project.
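For WandbLogger to have something to forward, the trainable simply reports metrics to Tune as usual. A minimal sketch follows; the body of the train function and the loss computation are illustrative assumptions.

from ray import tune

def train(config):
    for epoch in range(config.get("epochs", 5)):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        tune.report(loss=loss)    # picked up by Tune and forwarded to W&B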

2. The wandb_mixin

You can also use the wandb_mixin function decorator when you need to log custom metrics, charts, and other visualizations:
from ray import tune
from ray.tune.integration.wandb import wandb_mixin
import wandb

@wandb_mixin
def train(...):
    ...
    wandb.log({...})
    tune.report(metric=score)
    ...
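
When using the mixin, the W&B settings travel inside the Tune config under a "wandb" key, as the search-space dictionaries later in this article show. A minimal launch sketch follows; the search space, project name, and API key are placeholders.

tune.run(
    train,
    config={
        "lr": tune.grid_search([0.001, 0.01]),
        "wandb": {
            "project": "rayTune",
            "api_key": "...",
        },
    })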
Note: The W&B integration with Ray Tune is available in the nightly version of Ray and will be included in Ray 0.8.7. Here are the instructions to pip install the nightly version of Ray.

Hear From the Ray Tune Team

AMA

We’re delighted to host the Ray Tune team in our Slack community for an AMA on building hyperparameter optimization workflows at scale.
When: Friday, August 14 at 9 am PT
Where: W&B Slack Community, in the #ama-ml-questions channel
We invite you to start posting all your distributed hyperparameter optimization questions in #ama-ml-questions now. The Ray Tune team will answer them from 9 am to 10 am on Friday.

Ray Summit

Ray Summit is a free virtual summit for all things Ray-related! Join us to see talks by leading computer scientists and the founders of Anyscale and Weights & Biases.
Beyond these two upcoming events, we’re excited to bring you the best of distributed computing infrastructure and developer tools for machine learning to make it simple to go from model development to production in the fewest steps possible. We can’t wait to see what you’ll build.


A Deeper Dive

Let's explore the integration in more depth by running some experiments.

Generation of MNIST Images

The objective of this experiment is to train a vanilla Deep Convolutional Generative Adversarial Network (DCGAN) to generate MNIST images.
Here's the basic code structure.
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_vector_size, features=32, num_channels=1):
        super(Generator, self).__init__()
        self.latent_vector_size = latent_vector_size
        self.main = nn.Sequential(
            # network layers
        )

    def forward(self, x):
        return self.main(x)


class Discriminator(nn.Module):
    def __init__(self, features=32, num_channels=1):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            # network layers
        )

    def forward(self, x):
        return self.main(x)
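For context, the elided layer stacks in a vanilla DCGAN generator typically look something like the sketch below: the latent vector is upsampled with transposed convolutions until it reaches the image resolution. This is an illustrative assumption about the architecture (here for 32x32, single-channel images), not necessarily the exact network used in these experiments.

# Illustrative generator layer stack (assumption, for demonstration only).
nn.Sequential(
    nn.ConvTranspose2d(latent_vector_size, features * 4, 4, 1, 0, bias=False),
    nn.BatchNorm2d(features * 4),
    nn.ReLU(True),
    nn.ConvTranspose2d(features * 4, features * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(features * 2),
    nn.ReLU(True),
    nn.ConvTranspose2d(features * 2, features, 4, 2, 1, bias=False),
    nn.BatchNorm2d(features),
    nn.ReLU(True),
    nn.ConvTranspose2d(features, num_channels, 4, 2, 1, bias=False),
    nn.Tanh(),  # outputs in [-1, 1]
)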
We'll use Tune to search for the best set of hyperparameters for training a DCGAN on the MNIST dataset, eliminating the bad hyperparameter choices before training on the larger CelebA dataset. Here's the structure of the training loop:
import numpy as np
import wandb
from torchvision.utils import make_grid
from ray.tune.integration.wandb import wandb_mixin

@wandb_mixin
def train_batch(...):
    """Trains on one batch of data from the data creator."""
    real_label = 1
    fake_label = 0
    discriminator, generator = models
    optimD, optimG = optimizers

    # Compute a discriminator update for real images
    discriminator.zero_grad()
    ...
    errD_real = criterion(output, label)
    errD_real.backward()

    # Compute a discriminator update for fake images
    fake = generator(noise)
    grid = make_grid(fake, nrow=10)
    npgrid = np.transpose(grid.cpu().detach().numpy(), (1, 2, 0))
    output = discriminator(fake.detach()).view(-1)
    errD_fake = criterion(output, label)
    errD_fake.backward()
    errD = errD_real + errD_fake

    # Update the discriminator
    optimD.step()

    # Update the generator
    ...
    optimG.step()

    # Log to the W&B dashboard
    wandb.log({
        "batch_loss_g": errG.item(),
        "batch_loss_d": errD.item()})
    wandb.log({"Fake": wandb.Image(npgrid)})

    return {
        "loss_g": errG.item(),
        "loss_d": errD.item(),
    }
Below is the hyperparameter search space dictionary we used with Tune. The W&B project information (project name, API key, etc.) can be passed into this dictionary as well.
tr_config = {
    "lr": tune.grid_search([0.001, 0.01, 0.005, 0.05, 0.1]),
    "beta1": tune.grid_search([0.5, 0.8, 0.9, 0.99]),
    "beta2": tune.grid_search([0.5, 0.8, 0.9, 0.99]),
    "batch_size": tune.grid_search([16, 32, 64]),
    "epochs": 5,
    # specify the W&B project and API key
    "wandb": {
        "project": "...",
        "api_key": "...",
    }
}

ray.init()
analysis = tune.run(
    train,
    config=tr_config)
print(analysis.get_best_config(metric="metric"))  # e.g. one of the reported losses
ray.shutdown()
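The analysis object returned by tune.run can also be inspected beyond the best config. A short sketch follows; the metric name here is an assumption, and in this experiment it would be one of the reported losses such as loss_g.

# Per-trial results as a pandas DataFrame (one row per trial).
df = analysis.dataframe()

# Log directory of the best-performing trial for a given metric.
best_logdir = analysis.get_best_logdir(metric="loss_g", mode="min")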
The basic Tune workflow looks something like this.



Ray Dashboard

Ray fires up a dashboard server on a localhost port when you're running on your local system. All the information about the tasks created and the resources used is displayed there in real time. You can also connect to a remote cluster running a tuning job by passing the address argument to ray.init().
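A quick sketch of both modes; the address value is a placeholder for your own cluster.

import ray

# Local run: starts Ray and its dashboard on localhost.
ray.init()

# Remote run: connect to an existing cluster instead (placeholder address).
# ray.init(address="auto")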



W&B Dashboard

Let us now look at the runs and the metrics logged in the W&B dashboard. Here are some of the images generated by our model.


[W&B panel grid: run set of 201 runs]


Generation of STL10 Images

In this experiment, we ran a hyperparameter tuning job for the task of generating STL10 images, using the same DCGAN model after updating the hyperparameter values based on the previous experiment.


[W&B panel grid: run set of 26 runs]



Generation of CelebA Images

We chose a subset of the hyperparameters used in the previous experiments and trained the DCGAN network on this narrowed set of parameters using Tune.
ray.init()
tr_config = {
    "lr": tune.grid_search([0.0005, 0.001, 0.005, 0.0003]),
    "beta1": 0.5,
    "beta2": tune.grid_search([0.999, 0.99]),
    "batch_size": tune.grid_search([64, 512, 256]),
    "epochs": 10,
    "wandb": {
        "project": "...",
        "api_key": "...",
    }
}
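As in the MNIST experiment, this config is then handed to tune.run. A minimal sketch, assuming train is the same wandb_mixin-decorated trainable as before:

analysis = tune.run(
    train,
    config=tr_config)
ray.shutdown()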

Support for Resuming Experiments

When using Tune, you can always resume your experiment if an error causes the program to crash.
analysis = tune.run(
    train_example,
    config=tr_config,
    resume=True  # resumes the experiment from the last checkpoint
)
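Tune identifies which experiment to resume by its name and results directory, so it can help to set these explicitly when launching the run. A small sketch follows; the experiment name below is an illustrative placeholder, and local_dir defaults to ~/ray_results.

analysis = tune.run(
    train_example,
    name="dcgan_celeba",        # placeholder experiment name
    local_dir="~/ray_results",  # default Tune results directory
    config=tr_config,
    resume=True)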
Let's now look at some of the results generated using our model. Almost all of the models optimized well with the trimmed-down set of hyperparameters, successfully combining multiple facial features to form new faces. Some facial features do overlap, which could be improved further by using larger models like StyleGAN.
You can always go back to the dashboard to view the information about these experiments, group the runs by categories and write detailed reports.

[W&B panel grid: run set of 10 runs]


Conclusion

Ray Tune combined with W&B is a one-stop solution for machine learning experiment management and tracking. The integration combines two excellent tools for scaling machine learning experimentation and model development: Ray Tune makes it easy to scale from a single node to multiple GPUs, and further to multiple nodes, while W&B tracks and visualizes every run and keeps your workflows reproducible.
The Ray and Weights & Biases teams are hard at work collaborating to make developing machine learning applications simple, and we've got a number of things coming up to help the community learn more!
