Virtual batches for SGD optimization?

I have a rather small GPU that can compute ImageNet classification loss on up to 32 images at once, and I’d like to simulate batch sizes of 256 images.

Since I use the SGD optimizer, the loss gradient over the whole batch is simply the sum of the per-image loss gradients (assuming the loss is summed, not averaged, over the batch).

So in theory, if I had access to the gradient vector, I could compute the loss gradient for 256 images by computing it for 8 micro-batches of 32 images, summing the results into a single vector, and only then calling optimizer.step().

Is there a way to do it in PyTorch?

Yes — PyTorch accumulates gradients in each parameter's .grad attribute across successive backward() calls until you call optimizer.zero_grad(), so you can run several micro-batches before a single optimizer.step(). You could use the approaches described here.
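A minimal sketch of that gradient-accumulation pattern, using a stand-in linear model and random tensors in place of the real ImageNet network and dataloader (the model, loss, shapes, and learning rate here are all placeholders):

```python
import torch

# Hypothetical tiny model standing in for the real network.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# reduction="sum" so that gradients from micro-batches add up exactly
# to the gradient of the full 256-image batch.
criterion = torch.nn.CrossEntropyLoss(reduction="sum")

accumulation_steps = 8  # 8 micro-batches of 32 images -> effective batch of 256

optimizer.zero_grad()
for step in range(accumulation_steps):
    inputs = torch.randn(32, 10)          # stand-in for one micro-batch of images
    targets = torch.randint(0, 2, (32,))  # stand-in labels
    loss = criterion(model(inputs), targets)
    loss.backward()   # gradients accumulate into each parameter's .grad
optimizer.step()      # one update using the summed gradients
optimizer.zero_grad()
```

If your loss uses the default reduction="mean", divide each micro-batch loss by accumulation_steps before calling backward() so the accumulated gradient matches the mean over the full batch.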


That’s great, thank you.