
Testing different batch sizes

Testing different batch sizes via gradient accumulation. The original model with batch_size = 2 is `learning_curve_338_early_stopping`. The other runs are named with a `_{x}_{y}` suffix, where x is the (effective) batch size and y the mini-batch size. Performance appears to degrade as the batch size increases; a sketch of the accumulation setup follows below.
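
For reference, a minimal sketch of how gradient accumulation simulates a larger effective batch size (assuming a standard PyTorch training loop; the model, data, and hyperparameters below are illustrative placeholders, not taken from these runs):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup -- names and sizes are illustrative only.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

mini_batch_size = 2   # what fits on the GPU per forward/backward pass
accum_steps = 4       # effective batch size = mini_batch_size * accum_steps = 8

data = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
loader = DataLoader(data, batch_size=mini_batch_size, shuffle=True)

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient is an average over the window.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The optimizer only steps every `accum_steps` mini-batches, so the gradient it sees is equivalent (up to batch-norm-style statistics) to one computed on the larger effective batch.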

Section 1


[Three charts comparing the runs: two metrics plotted against Epoch, one plotted against Step.]
Run set: 6 runs