I have a rather small GPU that can compute ImageNet classification loss on up to 32 images at once, and I’d like to simulate batch sizes of 256 images.
Since I use the SGD optimizer, the loss gradient over the whole batch is simply the sum of the per-image loss gradients.
So in theory, if I had access to the gradient vector, I could compute the loss gradient for 256 images by computing it for 8 micro-batches of 32 images, summing the results into a single vector, and only then calling optimizer.step().
Is there a way to do it in PyTorch?
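For reference, the accumulation described above maps directly onto PyTorch's autograd behavior: `backward()` *adds* into each parameter's `.grad` rather than overwriting it, so summing micro-batch gradients happens automatically if `zero_grad()` is skipped between backward calls. Here is a minimal sketch of the idea; the tiny `nn.Linear` model and random tensors are hypothetical stand-ins for the real ImageNet model and data loader:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and ImageNet batches.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss(reduction="sum")  # sum, so micro-batch losses add up

full_inputs = torch.randn(256, 10)          # pretend full batch of 256 "images"
full_targets = torch.randint(0, 2, (256,))

accum_steps = 8                             # 256 images / 32 per micro-batch
optimizer.zero_grad()
for i in range(accum_steps):
    x = full_inputs[i * 32:(i + 1) * 32]
    y = full_targets[i * 32:(i + 1) * 32]
    loss = criterion(model(x), y) / 256     # normalize by the full batch size
    loss.backward()                         # grads accumulate into p.grad

accum_grads = [p.grad.clone() for p in model.parameters()]
optimizer.step()                            # one SGD update for all 256 images
optimizer.zero_grad()                       # reset before the next accumulation
```

Dividing each micro-batch loss by 256 makes the accumulated gradient equal to the mean-loss gradient over the full batch, which keeps the learning rate comparable to true large-batch training.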