What is the running mean of BatchNorm if gradients are accumulated?

crcrpar · May 30, 2018, 4:53am

Yes.
Accumulated gradients will be the same as the gradients from one large batch if you divide them by the number of accumulation iterations. I referred to the discussion linked below.
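
As a quick check of the averaging claim, here is a minimal, framework-free sketch. It is my own illustration, not code from the referenced discussion. It assumes a one-parameter model `y = w * x` with a mean-squared-error loss, equal-size chunks, and no batch-dependent layers such as BatchNorm. The names `mse_grad` and `accumulated_grad` are hypothetical helpers made up for this example.

```python
# Sketch only: compares a full-batch gradient with accumulated chunk gradients
# for a toy model y = w * x. No BatchNorm or other batch-dependent layer is involved.

def mse_grad(w, xs, ys):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over the given samples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, num_iters):
    """Sum the gradients of equal-size chunks, then divide by the number of iterations."""
    chunk = len(xs) // num_iters  # assumes len(xs) is divisible by num_iters
    total = 0.0
    for i in range(num_iters):
        lo, hi = i * chunk, (i + 1) * chunk
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])  # accumulate, no update in between
    return total / num_iters


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = mse_grad(w, xs, ys)                        # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, num_iters=4)  # 4 chunks of 2 samples each

# The two agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

In PyTorch the same effect is usually obtained by dividing each chunk's loss by the number of accumulation steps before calling `backward()`, and calling `optimizer.step()` only after the last chunk.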