PyTorch: updating when a single batch exceeds GPU memory

You can accumulate the gradients over a few iterations and only then run your optimizer.
This discussion might help: How to implement accumulated gradient?
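A minimal sketch of gradient accumulation, assuming a toy model, optimizer, and random data purely for illustration (substitute your own training loop):

```python
import torch
import torch.nn as nn

# Toy setup for illustration only; replace with your own model, data, and loss.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accumulation_steps = 4  # number of small batches whose gradients are summed before one update

optimizer.zero_grad()
for step in range(16):
    inputs = torch.randn(8, 10)            # a small batch that fits in GPU memory
    targets = torch.randint(0, 2, (8,))

    outputs = model(inputs)
    # Divide the loss so the summed gradients correspond to an average over the effective batch
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()                        # gradients accumulate in param.grad across iterations

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # one parameter update per accumulation_steps batches
        optimizer.zero_grad()              # clear gradients before the next accumulation window
```

The effective batch size here is the per-iteration batch size times `accumulation_steps`, at the cost of running the forward/backward pass that many times before each update.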
