I have a simple, basic question about how gradients are calculated: let’s say I have a mini-batch of N elements, which yields N individual losses that I want to accumulate into a single loss.
My question: does autograd average the gradients coming from a mini-batch (i.e. divide by the batch size), or does it just sum all the gradients up?
It seems that there is no averaging, just pure summing, but I’d like to get confirmation.
Here is what I mean: consider a mini-batch of N identical elements. If I sum all N individual losses, let autograd calculate the gradients, and do an optimizer step (I tried this with plain SGD), then the gradient step scales with N (so N=2 leads to a step twice as big as N=1). Hence, to get something independent of the mini-batch size, I should use the average of the individual losses, right?
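To make this concrete, here is a minimal sketch of the kind of experiment I mean. The toy model (a single weight `w`), the squared-error loss, the learning rate of 0.1 and the helper name `sgd_step_size` are all made up for illustration and are not my real code. The step is computed by hand as `lr * grad`, which is my understanding of what plain SGD without momentum or weight decay does.

```python
import torch

def sgd_step_size(n, reduction):
    """Size of one plain-SGD step for a mini-batch of n identical samples.

    Hypothetical helper for illustration only: toy model y_hat = w * x
    with a squared-error loss per sample.
    """
    w = torch.tensor([1.0], requires_grad=True)  # single toy parameter
    x = torch.full((n, 1), 2.0)                  # n identical inputs
    y = torch.zeros(n, 1)                        # n identical targets

    # One loss value per sample, shape (n,)
    per_sample_loss = ((x * w - y) ** 2).squeeze(1)

    # Reduce the n losses to a single scalar, either by summing or averaging
    if reduction == "sum":
        loss = per_sample_loss.sum()
    else:
        loss = per_sample_loss.mean()

    # autograd differentiates exactly the scalar it is given
    loss.backward()

    lr = 0.1  # assumed learning rate
    return (lr * w.grad).abs().item()

for n in (1, 2, 4):
    print(n, sgd_step_size(n, "sum"), sgd_step_size(n, "mean"))
```

As far as I can tell, the step for the summed loss grows in proportion to `n`, while the step for the averaged loss stays the same for every `n`. That would suggest autograd does no averaging of its own and simply differentiates whichever scalar I call `backward()` on.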
Thanks a lot!