Compute the whole gradient of a mini-batch using the accumulated gradient of small mini-batches

Dhorka · June 12, 2019, 7:54am

It should be the same, right? I mean, take the numbers 4, 7, 12, 15. Their mean is the same whether you sum all of them and divide by the number of elements, or first compute the mean of the first two numbers, then the mean of the last two, and finally the mean of those two means.
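
As a quick check of that arithmetic, here is a minimal Python sketch. The helper name `mean` and the split into two halves are only illustrative. Note that the equality relies on both groups having the same number of elements; with unequal groups the mean of means would have to be weighted by group size.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

numbers = [4, 7, 12, 15]

# Mean over all elements at once.
full_mean = mean(numbers)

# Mean of each equally sized half, then the mean of those two means.
first_half = mean(numbers[:2])
second_half = mean(numbers[2:])
mean_of_means = mean([first_half, second_half])

print(full_mean, mean_of_means)
```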

In addition, I am accumulating the gradients: the loss is used in each small mini-batch, as explained here.
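
To make the idea concrete, the following is a rough sketch in plain Python rather than the actual training code. It assumes a hypothetical one-parameter model `w * x` with a mean squared error loss, and a hand-written gradient function `mse_grad` standing in for autograd. It also assumes every small mini-batch has the same size, so that dividing each small gradient by the number of small mini-batches reproduces the mean over the whole mini-batch.

```python
import math

def mse_grad(w, xs, ys):
    """Gradient w.r.t. w of mean((w*x - y)^2) over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy data and parameter value, chosen only for illustration.
w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [4.0, 7.0, 12.0, 15.0]

# Gradient computed on the whole mini-batch in one go.
full_grad = mse_grad(w, xs, ys)

# Same gradient, accumulated over equally sized small mini-batches.
chunk_size = 2
num_chunks = len(xs) // chunk_size
accumulated_grad = 0.0
for i in range(0, len(xs), chunk_size):
    chunk_grad = mse_grad(w, xs[i:i + chunk_size], ys[i:i + chunk_size])
    # Scale each small gradient so the sum is a mean, not a total.
    accumulated_grad += chunk_grad / num_chunks

print(full_grad, accumulated_grad)
```

If the scaling by `num_chunks` is left out, the accumulated value is the sum of the small mini-batch gradients, which is `num_chunks` times larger than the whole mini-batch gradient.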