PyTorch: updating when a single batch exceeds GPU memory

You can accumulate the gradients over a few iterations and only then run your optimizer.
This discussion might help: How to implement accumulated gradient?
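A minimal sketch of gradient accumulation, assuming a toy model, optimizer, and random data purely for illustration (substitute your own training loop):

```python
import torch
import torch.nn as nn

# Toy setup for illustration only; replace with your own model, data, and loss.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accumulation_steps = 4  # number of small batches whose gradients are summed before one update

optimizer.zero_grad()
for step in range(16):
    inputs = torch.randn(8, 10)            # a small batch that fits in GPU memory
    targets = torch.randint(0, 2, (8,))

    outputs = model(inputs)
    # Divide the loss so the summed gradients correspond to an average over the effective batch
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()                        # gradients accumulate in param.grad across iterations

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # one parameter update per accumulation_steps batches
        optimizer.zero_grad()              # clear gradients before the next accumulation window
```

The effective batch size here is the per-iteration batch size times `accumulation_steps`, at the cost of running the forward/backward pass that many times before each update.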
