What's the Optimal Batch Size to Train a Neural Network?

A brief study on the effect of batch size on test accuracy. Made by Ayush Thakur using Weights & Biases

Introduction

While questions like "what's the optimal batch size?" almost always have the same answer ("it depends"), our goal today is to look at how different batch sizes affect accuracy, training time, and compute resources. Then, we'll look into some hypotheses that explain those differences.

Let's investigate!

We first need to establish the effect of batch size on test accuracy and training time.
To do so, let's run an ablation study: we'll train an image classifier with a range of batch sizes, holding everything else fixed so that batch size is the only variable.

Try out the ablation study on Google Colab →
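To make the setup concrete, here's a minimal training sketch. The dataset (CIFAR-10), architecture, optimizer, and epoch count below are illustrative assumptions rather than the Colab's exact setup; the only thing that varies between runs is `batch_size`:

```python
# Minimal sketch of one ablation run (illustrative assumptions; the actual
# Colab may use a different dataset/architecture). Requires TensorFlow.
import tensorflow as tf

def train_at_batch_size(batch_size, epochs=10):
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    # Only batch_size varies between runs; everything else stays fixed.
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_data=(x_test, y_test))
    return model.evaluate(x_test, y_test, verbose=0)  # [loss, accuracy]
```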

We will use a Weights & Biases Sweep to run our ablation study.
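As a rough sketch of what that looks like (the batch-size values here are illustrative, and `train_at_batch_size` is the hypothetical helper from the previous snippet, not necessarily the report's exact code):

```python
# Grid sweep over batch size with W&B Sweeps. Each agent run pulls its
# batch_size from the sweep config, trains, and logs the test metrics.
import wandb

sweep_config = {
    "method": "grid",  # exhaustively try every listed value
    "metric": {"name": "test_accuracy", "goal": "maximize"},
    "parameters": {"batch_size": {"values": [16, 32, 64, 128, 256, 512]}},
}

def sweep_run():
    with wandb.init() as run:
        loss, acc = train_at_batch_size(run.config.batch_size)
        wandb.log({"test_accuracy": acc, "test_loss": loss})

sweep_id = wandb.sweep(sweep_config, project="batch-size-ablation")
wandb.agent(sweep_id, function=sweep_run)
```

With the sweep complete, let's dig into the results. 👇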

Why do larger batch sizes lead to poorer generalization?

What might explain this strange behavior? This Stack Exchange thread has a few great hypotheses. Two of my favorites: the noise in small-batch gradient estimates acts as an implicit regularizer, helping the optimizer escape sharp regions of the loss landscape; and, relatedly, large-batch training tends to converge to sharp minimizers that generalize worse than the flat minimizers found by small-batch training (Keskar et al., 2016).
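To build intuition for the noise hypothesis, here's a toy sketch (synthetic linear-regression data, nothing from the report): it estimates how far a mini-batch gradient strays from the full-batch gradient as the batch size grows.

```python
# Toy illustration: mini-batch gradient noise shrinks as batch size grows,
# so large-batch steps are less "exploratory" than small-batch ones.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.5 * rng.normal(size=10_000)
w = np.zeros(20)  # measure gradient noise at a fixed point in weight space

def grad(idx):
    # Gradient of mean squared error over the rows selected by idx.
    err = X[idx] @ w - y[idx]
    return 2 * X[idx].T @ err / len(idx)

full = grad(np.arange(len(X)))
for b in [8, 64, 512, 4096]:
    draws = [grad(rng.choice(len(X), size=b, replace=False)) for _ in range(200)]
    noise = np.mean([np.linalg.norm(g - full) for g in draws])
    print(f"batch={b:5d}  mean ||g_batch - g_full||: {noise:.3f}")
```

The printed deviation falls roughly like 1/√batch_size, which is the noise that, per the hypothesis above, small batches exploit to escape sharp minima.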
