PyTorch: updating parameters after a batch size that exceeds the GPU memory

Does PyTorch support updating the parameters after a relatively large batch, one that would exceed the GPU memory if fed in all at once?

My model can currently only be fed batch_size=32 samples at a time because the GPU has 11 GB of memory. The loss fluctuates heavily when the batch size is small, because there are 4000 categories. So I want to update the parameters after more samples, for example 128. Does anyone have any advice?

By the way, has anyone tried this? I am not sure whether it performs better than a small batch size.


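You can accumulate the gradients over a few iterations and only then run your optimizer. This works because each call to backward() adds to the gradients already stored in .grad until you zero them.
This discussion might help: "How to implement accumulated gradient?"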
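
In case it helps, here is a minimal sketch of what accumulating gradients could look like. The function name `train_with_accumulation`, the parameter `accum_steps`, the toy linear model, and the random data are all assumptions made for illustration; they are not taken from the linked discussion or from your code. With micro-batches of 32 and `accum_steps=4`, each optimizer step sees gradients from 128 samples.

```python
import torch
from torch import nn


def train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=4):
    """Step the optimizer once every `accum_steps` micro-batches.

    Hypothetical helper for illustration. Returns the number of optimizer steps.
    """
    model.train()
    optimizer.zero_grad()
    num_updates = 0
    for i, (inputs, targets) in enumerate(batches, start=1):
        loss = loss_fn(model(inputs), targets)
        # Divide by accum_steps so the summed gradients match the mean loss
        # over the whole effective batch (assumes a mean-reduced loss).
        (loss / accum_steps).backward()  # gradients add up in .grad
        if i % accum_steps == 0:
            optimizer.step()       # one update per accum_steps micro-batches
            optimizer.zero_grad()  # start the next accumulation window
            num_updates += 1
    # Note: micro-batches left over at the end (fewer than accum_steps)
    # are not applied in this sketch.
    return num_updates


# Toy stand-ins (assumptions): a small linear classifier and random data.
torch.manual_seed(0)
model = nn.Linear(16, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(32, 16), torch.randint(0, 10, (32,))) for _ in range(8)]

num_updates = train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=4)
print(num_updates)
```

Only one micro-batch of 32 is on the GPU at a time, so peak memory should stay roughly where it is now. One caveat: layers such as BatchNorm still compute their statistics per micro-batch, so the result is not identical to a true batch of 128 for models that use them.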


I am puzzled by your link. I will save it for future reading. When I have enough time, I hope I can solve this problem.