How to increase the batch size without increasing GPU memory

Since .backward() accumulates gradients, can I call loss.backward() twice and then optimizer.step()?
Would that take double the GPU memory or not?
Would the effect be the same as using batch size * 2?
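Roughly, I mean something like this sketch (model, criterion, optimizer, and the two batches are just placeholder names):

```python
optimizer.zero_grad()

loss_a = criterion(model(batch_a), target_a)
loss_a.backward()   # gradients are written into the .grad buffers

loss_b = criterion(model(batch_b), target_b)
loss_b.backward()   # gradients are added on top of the previous ones

optimizer.step()    # single update using the summed gradients
```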


You could use a smaller batch size and accumulate the gradients. Then after a few iterations you could update the parameters using your optimizer.
Have a look at the 2nd option in this post.

It would yield the same behavior regarding the gradients, but note that layers like BatchNorm will behave differently, since they see the smaller batches.
If that's problematic, e.g. when your batch size is really small, you could adjust the momentum a bit or use other normalization layers, e.g. GroupNorm, which should be more stable with small batch sizes.
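A minimal sketch of that accumulation pattern (loader, model, criterion, optimizer, and accumulation_steps are placeholder names; the loss is divided by accumulation_steps so the accumulated gradient matches the average over the larger effective batch):

```python
accumulation_steps = 2            # effective batch size = loader batch size * 2
optimizer.zero_grad()

for i, (data, target) in enumerate(loader):
    output = model(data)
    loss = criterion(output, target) / accumulation_steps  # scale for the virtual batch
    loss.backward()               # gradients are summed into the .grad buffers

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()          # update with the accumulated gradients
        optimizer.zero_grad()     # reset for the next virtual batch
```

Since each backward() call frees its own computation graph, peak memory stays at the level of the smaller batch; only the .grad buffers persist between iterations, and those exist anyway.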


For option #2 from this post, do you need the flag retain_graph=True, i.e. loss.backward(retain_graph=True)? It seems like this could be important.

I think the question has been answered here: