Backpropagation based on aggregated output of all batches

aminpdi · March 17, 2022, 6:23pm

Hi everyone,

I have a question and want to make sure I understand this correctly.

My loss function is defined on the aggregated output of all batches. That is, I don’t have a loss function that evaluates the output of each batch individually. I aggregate the outputs of all batches to generate the final output, and I apply cross-entropy to that final output.

Does backpropagation work properly with loss.backward() and optimizer.step() in this case?

Thank you

tom · March 17, 2022, 6:57pm

This would only work if you had the memory to keep the computational graph of all batches, which is highly unlikely.

Best regards

Thomas

rfmac · March 17, 2022, 7:14pm

Hi, @aminpdi!

To answer your question: yes, backpropagation works properly in the scenario you described.
But I also agree with @tom: it could require a considerable amount of memory.

To reduce the amount of memory required, you could use gradient accumulation.
The good news is that it’s quite simple to implement in PyTorch: call loss.backward() as many times as you want (in your case, once after each of the batches you want to accumulate), then call optimizer.step() when you want to update the model weights. If you’re going to repeat this process, don’t forget to call model.zero_grad() after the step.
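A minimal sketch of gradient accumulation might look like the following. The model, the data, and the batch count are hypothetical placeholders, not taken from the original question. It also assumes the total loss can be written as an average of per-batch losses; a loss defined only on an output aggregated across batches does not automatically decompose this way.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]

optimizer.zero_grad()
for inputs, targets in batches:
    # Scale each loss so the accumulated gradient matches the gradient
    # of the mean loss over all batches (batches have equal size here).
    loss = criterion(model(inputs), targets) / len(batches)
    # backward() adds to the existing .grad buffers instead of overwriting
    # them, and frees this batch's graph afterwards.
    loss.backward()

# Kept only so the accumulated gradient can be inspected after the step.
accumulated_grad = model.weight.grad.clone()

optimizer.step()       # one weight update for all accumulated batches
optimizer.zero_grad()  # reset before the next accumulation cycle
```

Because each backward() call releases the graph of its own batch, only one batch's graph needs to be held in memory at a time.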

Best Regards,
Rafael Macedo.

aminpdi · March 17, 2022, 7:56pm

Sorry, I think my question was not clear enough. Let’s say I have a mini-batch of size B * C * H * W. Then my output size is B * Cp * Hp * Wp. The output of regular cross-entropy is B*1. Then we take, for example, the average of these errors to get the final error for backpropagation.

In my case, I want to first average the OUTPUT batch of size B * Cp * Hp * Wp over all images in the current mini-batch (OUTPUT.mean(dim=0)), so that OUTPUT becomes Cp * Hp * Wp. Finally, I apply my cross-entropy loss to this averaged output.

The reason is that I don’t have supervision for each image in the mini-batch. In my case, Cp=2, Hp=1, Wp=1.
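The setup described above can be sketched roughly as follows. The network, the input sizes, and the single label for the whole mini-batch are hypothetical stand-ins. The sketch also assumes the averaged values are treated as logits, which is one possible choice among several.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

B, C, H, W = 4, 3, 8, 8
Cp = 2  # Hp = Wp = 1, as in the post

# Hypothetical stand-in network mapping (B, C, H, W) -> (B, Cp, 1, 1).
model = nn.Sequential(
    nn.Conv2d(C, Cp, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(B, C, H, W)
batch_label = torch.tensor([1])  # assumed: one label for the whole mini-batch

output = model(images)                  # shape (B, Cp, 1, 1)
pooled = output.mean(dim=0)             # shape (Cp, 1, 1)
logits = pooled.flatten().unsqueeze(0)  # shape (1, Cp), a batch of one
loss = F.cross_entropy(logits, batch_label)

optimizer.zero_grad()
loss.backward()   # gradients flow through mean(dim=0) to every image
optimizer.step()
```

Since mean(dim=0) is an ordinary differentiable operation, autograd handles it like any other layer, and the whole mini-batch sits in a single graph for one forward and backward pass.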