# How to increase the batch size without increasing GPU memory usage

Since `.backward()` accumulates gradients, can I call `loss.backward()` twice and then `optim.step()`?
Would that take double the GPU memory?
Would the effect be the same as using batch size * 2?

You could use a smaller batch size and accumulate the gradients. Then after a few iterations you could update the parameters using your optimizer.
Have a look at the 2nd option in this post.
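
As a rough sketch of what gradient accumulation could look like: the toy `nn.Linear` model, the random data, and `accumulation_steps = 2` below are assumptions made for illustration and are not taken from the linked post. The snippet compares the gradient from one full batch of 8 against the gradient accumulated over two micro-batches of 4. Each `backward()` call frees the graph of its micro-batch, so activation memory should scale with the micro-batch size rather than the effective batch size; only the `.grad` buffers persist between calls, and their size does not depend on the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, chosen only for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()  # default reduction="mean"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(8, 10)
target = torch.randn(8, 1)

# Reference: gradient from a single backward pass over the full batch of 8.
optimizer.zero_grad()
criterion(model(data), target).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulation: two micro-batches of 4, one backward call per micro-batch.
accumulation_steps = 2
optimizer.zero_grad()
for x, y in zip(data.chunk(accumulation_steps), target.chunk(accumulation_steps)):
    # Divide the mean-reduced loss so the summed gradients match the full-batch mean.
    loss = criterion(model(x), y) / accumulation_steps
    loss.backward()  # adds into .grad instead of overwriting it
accumulated_grad = model.weight.grad.clone()

# Compare the two gradients before taking the optimizer step.
grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
print(grads_match)

# One parameter update for the whole effective batch, then reset the gradients.
optimizer.step()
optimizer.zero_grad()
```

Without the division by `accumulation_steps`, a mean-reduced loss would give gradients that are `accumulation_steps` times larger than those of the full batch, which acts like a larger learning rate.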

The accumulated gradients match those of the larger batch, provided you scale the loss accordingly (e.g. divide a mean-reduced loss by the number of accumulation steps). Note, however, that layers like `BatchNorm` will behave differently, since they see smaller batches.
If that’s problematic, e.g. when your batch size is really small, you could adjust the `BatchNorm` momentum a bit or use other normalization layers such as `GroupNorm`, which should be more stable with small batch sizes.
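
A minimal sketch of both options, under stated assumptions: the channel count, the number of groups, and the momentum value are arbitrary picks, and lowering the momentum (rather than raising it) is my guess at a reasonable direction for noisy small-batch statistics. The comparison at the end illustrates why `GroupNorm` is insensitive to batch size: it normalizes each sample on its own, so a sample's output does not depend on what else is in the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Option 1: keep BatchNorm but change its momentum (default is 0.1).
# A smaller value makes the running statistics update more slowly.
bn = nn.BatchNorm2d(16, momentum=0.01)

# Option 2: swap in GroupNorm; 16 channels split into 4 groups (arbitrary choice).
gn = nn.GroupNorm(num_groups=4, num_channels=16)

feats = torch.randn(4, 16, 8, 8)  # hypothetical feature map: (batch, channels, H, W)

# GroupNorm: the first sample gets the same output whether it is
# normalized inside a batch of 4 or on its own.
gn_batch_independent = torch.allclose(gn(feats)[:1], gn(feats[:1]), atol=1e-5)

# BatchNorm in training mode: statistics come from the whole batch,
# so the same sample is normalized differently in a batch of 4 vs. a batch of 2.
bn.train()
bn_batch_independent = torch.allclose(bn(feats)[:1], bn(feats[:2])[:1], atol=1e-5)

print(gn_batch_independent, bn_batch_independent)
```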

