Through some small-scale tests with ~100 epochs, a batch size of ~1e3, and minibatch sizes ranging from ~1e2 to ~1e3, the per-epoch training time seems to drop monotonically as the minibatch size increases. So what is the benefit of using minibatches at all, when using the whole batch minimizes the training time?
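For reference, here is a minimal sketch (pure NumPy, hypothetical toy data, not my actual setup) of the effect I mean: with a larger minibatch size there are fewer optimizer steps per epoch, so the per-epoch wall-clock time shrinks while the final loss stays comparable.

```python
import time
import numpy as np

# Toy linear-regression problem (hypothetical data, stand-in for my real model).
rng = np.random.default_rng(0)
N, D = 1024, 10
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w + 0.01 * rng.normal(size=N)

def train_epoch(w, batch_size, lr=0.1):
    """One epoch of minibatch SGD on MSE; returns updated weights and step count."""
    steps = 0
    for start in range(0, N, batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of mean squared error
        w = w - lr * grad
        steps += 1
    return w, steps

for bs in (32, 256, 1024):  # bs == N is full-batch gradient descent
    w = np.zeros(D)
    t0 = time.perf_counter()
    for _ in range(20):
        w, steps = train_epoch(w, bs)
    elapsed = time.perf_counter() - t0
    loss = float(np.mean((X @ w - y) ** 2))
    print(f"batch={bs:4d}  steps/epoch={steps:2d}  time={elapsed:.4f}s  loss={loss:.5f}")
```

The larger batch sizes do fewer (but bigger) matrix operations per epoch, which is where the per-epoch speedup comes from.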
Larger models usually won’t be able to process the entire dataset at once due to memory limitations, so I assume you are working on a small example that fits completely into your GPU (e.g. MNIST with a small CNN).
If so, also check the generalization error by computing the final validation loss, as smaller batches commonly yield better generalization than full-dataset training.
Thanks so much! Then what about the training time: how is it related to the minibatch size?