What is the right batch size for a conv neural net?
Generally speaking, neural network architectures are not that sensitive to the batch size: given enough epochs, the model will converge well. That said, here are my observations:
- A large batch size leads to quicker convergence in fewer epochs.
- Some architectures don't scale well and show training instability with large batch sizes.
- The choice of batch size will depend on the compute resources available to you.
Having said that, for small datasets I have observed that smaller batch sizes tend to hit the sweet spot.
Rule of thumb: use a batch size that keeps your GPU utilization close to 100%. If you have time, sweep over a few batch sizes to find the sweet spot, but in practice, with enough epochs and the right initialization, your model will converge regardless.
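If it helps, here is a minimal sketch of what such a sweep could look like in PyTorch. The tiny CNN, the random image-like data, and the candidate batch sizes are all placeholders; swap in your own model and dataset, and watch GPU memory/utilization (e.g. with nvidia-smi) as the batch size grows.

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for your model and dataset.
def make_model():
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

data = TensorDataset(torch.randn(2048, 3, 32, 32), torch.randint(0, 10, (2048,)))
device = "cuda" if torch.cuda.is_available() else "cpu"

for batch_size in (32, 64, 128, 256, 512):
    model = make_model().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)

    start = time.time()
    for x, y in loader:  # one epoch per candidate, just to compare throughput/loss
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()

    print(f"batch_size={batch_size:4d}  epoch time={time.time() - start:.2f}s  "
          f"last loss={loss.item():.3f}")
```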
Hi, Enrico! In general, larger batch sizes result in faster progress through training but don't always converge as quickly, while smaller batch sizes train more slowly but can converge faster. It's definitely problem dependent.
In general, models improve with more epochs of training, up to a point: accuracy starts to plateau as they converge. Try something like 50 epochs and plot number of epochs (x-axis) vs. accuracy (y-axis); you'll see where it levels out.
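A quick sketch of that plot, assuming you've collected one validation-accuracy value per epoch in a list (the numbers below are purely illustrative placeholders):

```python
import matplotlib.pyplot as plt

# Hypothetical history: one validation-accuracy value per epoch from your training run.
val_accuracy = [0.42, 0.55, 0.63, 0.68, 0.71, 0.73, 0.74, 0.74, 0.75, 0.75]

plt.plot(range(1, len(val_accuracy) + 1), val_accuracy, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation accuracy")
plt.title("Accuracy vs. epochs")
plt.show()
```

The point where the curve flattens is roughly how many epochs your setup needs.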
What is the type and/or shape of your data? Are these images, or just tabular data? This is an important detail.