You could use a smaller batch size and accumulate the gradients over a few iterations, then update the parameters using your optimizer.
Have a look at the 2nd option in this post.
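Roughly something like this (just a sketch, assuming a classification setup; `model`, `loader`, and `accumulation_steps` are placeholders for your own code):

```python
import torch
import torch.nn as nn

# Placeholder model, loss, and optimizer for illustration only
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accumulation_steps = 4  # effective batch size = loader batch size * 4
# Dummy loader with small batches of 8 samples
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    # Scale the loss so the accumulated gradient matches the large-batch gradient
    loss = criterion(output, target) / accumulation_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```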
It would yield the same behavior regarding the gradients, but note that other layers like BatchNorm will behave differently, since they see smaller batches.
If that’s problematic, e.g. when your batch size is really small, you could change the BatchNorm momentum a bit or use other normalization layers, e.g. GroupNorm, which should be more stable with respect to small batch sizes.
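E.g. (again just a sketch; the 64 channels and 8 groups are arbitrary example values):

```python
import torch.nn as nn

# Option 1: lower the BatchNorm momentum so the running stats update more slowly
bn = nn.BatchNorm2d(64, momentum=0.01)  # default momentum is 0.1

# Option 2: swap in GroupNorm, which normalizes over channel groups
# instead of the batch dimension and thus doesn't depend on the batch size
gn = nn.GroupNorm(num_groups=8, num_channels=64)
```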