I have a rather small GPU that can compute ImageNet classification loss on up to 32 images at once, and I’d like to simulate batch sizes of 256 images.
Since I use the SGD optimizer, the loss gradient over the whole batch is simply the sum of the per-image loss gradients.
So in theory, if I had access to the gradient vector, I could compute the loss gradient for 256 images by computing it for 8 micro-batches of 32 images, summing the results into a single vector, and only then calling optimizer.step().
Is there a way to do it in PyTorch?
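For reference, the accumulation described above maps directly onto PyTorch's autograd behavior: `backward()` *adds* into each parameter's `.grad` rather than overwriting it, so summing micro-batch gradients happens automatically if `zero_grad()` is skipped between backward calls. Here is a minimal sketch of the idea; the tiny `nn.Linear` model and random tensors are hypothetical stand-ins for the real ImageNet model and data loader:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and ImageNet batches.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss(reduction="sum")  # sum, so micro-batch losses add up

full_inputs = torch.randn(256, 10)          # pretend full batch of 256 "images"
full_targets = torch.randint(0, 2, (256,))

accum_steps = 8                             # 256 images / 32 per micro-batch
optimizer.zero_grad()
for i in range(accum_steps):
    x = full_inputs[i * 32:(i + 1) * 32]
    y = full_targets[i * 32:(i + 1) * 32]
    loss = criterion(model(x), y) / 256     # normalize by the full batch size
    loss.backward()                         # grads accumulate into p.grad

accum_grads = [p.grad.clone() for p in model.parameters()]
optimizer.step()                            # one SGD update for all 256 images
optimizer.zero_grad()                       # reset before the next accumulation
```

Dividing each micro-batch loss by 256 makes the accumulated gradient equal to the mean-loss gradient over the full batch, which keeps the learning rate comparable to true large-batch training.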