Pros and cons of minibatch

In some small-scale tests (~100 epochs, batch size ~1e3, minibatch sizes ranging from ~1e2 up to 1e3), the training time seems to drop monotonically as the minibatch size increases. So what is the benefit of using minibatches at all, when training on the whole batch minimizes the time?
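
A minimal sketch of the kind of timing test described above, assuming a small synthetic dataset and a toy model (the data shapes and the network here are placeholders, not something from this thread):

```python
import time
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# ~1e3 samples in total, i.e. the "whole batch"
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
criterion = nn.CrossEntropyLoss()

for minibatch_size in (100, 250, 500, 1000):  # ~1e2 up to the full batch
    loader = DataLoader(dataset, batch_size=minibatch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    start = time.perf_counter()
    for epoch in range(100):
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # make sure queued GPU work finishes before stopping the timer
    print(f"minibatch_size={minibatch_size}: {time.perf_counter() - start:.2f}s")
```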

Larger models usually won’t be able to process the entire dataset at once due to memory limitations, so I assume you are working on a small example that fits completely into your GPU memory (e.g. MNIST with a small CNN).
If so, also check the generalization error by calculating the final validation loss, as smaller batches commonly yield better generalization than full-batch training.
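
A minimal sketch of that validation-loss check; `model`, `criterion`, and `val_loader` are assumed to already exist from your training setup:

```python
import torch

@torch.no_grad()
def validation_loss(model, criterion, val_loader, device="cuda"):
    model.eval()
    total_loss, total_samples = 0.0, 0
    for xb, yb in val_loader:
        xb, yb = xb.to(device), yb.to(device)
        loss = criterion(model(xb), yb)
        total_loss += loss.item() * xb.size(0)  # weight by the number of samples in the batch
        total_samples += xb.size(0)
    return total_loss / total_samples
```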


Thanks so much. Then what about the training time? How is it related to the minibatch size?

Smaller batches (~2-32 for image recognition) are usually better because the network parameters are updated after every minibatch, regardless of its size. With larger batch sizes the parameters are updated less often (the gradient is only calculated once per batch), so you get fewer update steps per epoch even though each epoch may run faster. The caveat is that larger batch sizes often don’t perform as well, especially on more difficult data (worse generalization). And the performance of a model is far more important than the time it takes to train it, so it’s almost a no-brainer to take smaller batches. You can simply experiment by changing only your batch size (2, 4, 8, 16, 32, 64, … for example) and see how the model performs on your evaluation dataset.
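
A rough sketch of that batch-size sweep, assuming you have a train/validation split and a `make_model()` factory so every run starts from fresh weights (both names are placeholders, not something defined in this thread):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def batch_size_sweep(make_model, train_set, val_set, device="cuda", epochs=20):
    results = {}
    for batch_size in (2, 4, 8, 16, 32, 64):
        model = make_model().to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        criterion = nn.CrossEntropyLoss()
        train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_set, batch_size=256)

        # train with only the batch size changed between runs
        for epoch in range(epochs):
            model.train()
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                criterion(model(xb), yb).backward()
                optimizer.step()

        # evaluate generalization on the held-out set
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for xb, yb in val_loader:
                xb, yb = xb.to(device), yb.to(device)
                correct += (model(xb).argmax(dim=1) == yb).sum().item()
                total += yb.size(0)
        results[batch_size] = correct / total
        print(f"batch_size={batch_size}: val accuracy={results[batch_size]:.3f}")
    return results
```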