
How to Compare Keras Optimizers in TensorFlow for Deep Learning

A short tutorial outlining how to compare Keras optimizers for your deep learning pipelines in TensorFlow, with a Colab to help you follow along.
Created on March 7 | Last edited on March 14


A Quick Introduction To Optimizers For Deep Learning

In deep learning, an optimizer is a function or algorithm that works with a neural network's Weights & Biases (shouldn't everyone?). The optimizer modifies the parameters (or suggests modifications) with the goal of reducing the model's loss as efficiently as possible.
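For instance, the simplest optimizer, vanilla gradient descent, nudges every parameter against the gradient of the loss, scaled by a learning rate \eta:

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t)

Optimizers like Adam and Nadam build on this basic update with momentum and adaptive per-parameter learning rates.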
Optimizers, when coupled with weight initialization, play a key role in neural network training dynamics and are currently a hot research topic. The question for many practitioners is which particular optimizer to use for which particular project.
Answering that question is the purpose of this post.
Put simply: in this report, we'll learn how you can choose the best optimizer for your deep learning project using Weights & Biases.
First things first: sadly, there is no "one-size-fits-all" optimizer that's guaranteed to give you the best performance or the quickest training time on every pipeline or task. While Adam or SGD might be the most popular first choices and give decent results, depending on your dataset's size and distribution or your weight initialization strategy, some niche optimizer might deliver the best performance and training time (as we'll explore shortly).
The solution? As the title suggests, one simple approach is to compare some state-of-the-art (SOTA) optimizers against each other and select the right one for your particular project. Weights & Biases can help you do just that, and we'll explain how below.
Before jumping in, just note that if you'd like to follow along with this piece in a Colab with executable code, you can find it right here:

Try it in a Colab Notebook →

Ok, let's get started:

Experimenting With Keras Optimizers

Today, we're going to use Keras for our codebase.
In Keras, comparing optimizers is a simple task that just involves changing the optimizer argument in the model.compile() call and adding the WandbMetricsLogger callback for visualization. Like so:
import tensorflow as tf
import wandb
from wandb.keras import WandbMetricsLogger

# Initialize the run
wandb.init(project="Optimizers")

# Create the model and datasets
# model = ...
# train_ds = ...
# val_ds = ...

# Change the optimizer to the desired value, e.g. "adadelta" or "adagrad"
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# The callback streams training metrics to W&B as the model trains
model.fit(..., callbacks=[WandbMetricsLogger()])
No, really: that's it. That's all it takes.
Now you can train your model with various optimizers, and the Weights & Biases Keras callback will automatically log your training metrics. You can easily visualize them using W&B Panels and share them across your team.
For example, here's a quick comparison using a CNN on the Flowers dataset you'll find linked in the Colab above:

[Panel: training and validation metrics for the 11-run set, grouped by optimizer]

As you can see, we have grouped our runs by the optimizer config variable, which allows us to easily visualize and compare the various optimizers tested.
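In case it isn't obvious how the optimizer name ends up in each run's config, here's a minimal sketch of a comparison loop; build_model, train_ds, and val_ds are hypothetical placeholders for your own model factory and datasets:

import tensorflow as tf
import wandb
from wandb.keras import WandbMetricsLogger

# One W&B run per optimizer, with the name stored in the run config for grouping
for opt_name in ["sgd", "adam", "nadam", "adadelta", "adagrad", "rmsprop"]:
    run = wandb.init(project="Optimizers", config={"optimizer": opt_name})

    model = build_model()  # hypothetical helper returning a fresh, uncompiled model
    model.compile(
        optimizer=opt_name,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(train_ds, validation_data=val_ds, callbacks=[WandbMetricsLogger()])

    run.finish()  # close this run before starting the next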
Another important factor to consider for bigger datasets and models is system metrics such as GPU utilization and memory usage. Luckily, Weights & Biases also tracks those metrics, allowing us to use them in the decision-making process.

[Panel: system metrics (GPU utilization, memory usage) for the 11-run set]

All the graphs above were generated by training a simple CNN on the Flowers dataset, following the aforementioned Colab, which was adapted from the official TensorFlow tutorial.
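For reference, here's a minimal sketch of the kind of model these results assume; the exact architecture lives in the Colab, so treat the layer sizes and image size below as illustrative assumptions in the style of the official TensorFlow tutorial:

import tensorflow as tf

num_classes = 5  # the Flowers dataset has five classes

# A small CNN for 180x180 RGB images (sizes are assumptions, not pulled from the Colab)
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(180, 180, 3)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes),  # logits, to pair with from_logits=True
])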
💡 As we can see from the plots, Nadam happens to be the best optimizer in terms of both performance and CPU/GPU utilization. To further improve these metrics, you can try hyperparameter tuning by changing the learning rate or the exponential decay rates. Weights & Biases Sweeps makes this incredibly easy by automatically running your pipeline using an agent.
For more details please refer to our Sweeps Quickstart Guide.
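As a hedged illustration, a sweep over optimizer choice and learning rate might look like the following; the search ranges, the metric name, and the train() function are assumptions you'd adapt to your own pipeline:

import wandb

# Illustrative sweep configuration (values are assumptions, not from the Colab)
sweep_config = {
    "method": "random",
    "metric": {"name": "epoch/val_accuracy", "goal": "maximize"},  # match whatever your logger records
    "parameters": {
        "optimizer": {"values": ["adam", "nadam", "sgd"]},
        "learning_rate": {"min": 0.0001, "max": 0.1},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Optimizers")
wandb.agent(sweep_id, function=train, count=20)  # train is your own training function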

Try it out in a Colab Notebook →

Comparing Keras Optimizers

In this article, we explored how to compare various optimizers for your deep learning pipelines using Weights & Biases to monitor your metrics. To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
