What is the running mean of BatchNorm if gradients are accumulated?

crcrpar · May 30, 2018, 4:15am

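In training mode, Batch Normalization updates its running mean and variance on every call to the forward method, so with gradient accumulation the running statistics are updated once per mini-batch, not once per optimizer step.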
By default, BatchNorm updates its running mean as running_mean = alpha * mean + (1 - alpha) * running_mean, where mean is the mean of the current batch and alpha is the momentum (0.1 by default in PyTorch) (the details are here).
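
To make the update rule concrete, here is a minimal plain-Python sketch (not PyTorch's actual implementation) of how the running mean evolves when the same data is fed as two small batches versus one large batch. The helper `update_running_mean`, the toy data, and the starting value of 0.0 are assumptions for illustration only; alpha = 0.1 is assumed to match the default momentum.

```python
def update_running_mean(running_mean, batch, alpha=0.1):
    """One update step: alpha * batch_mean + (1 - alpha) * running_mean."""
    batch_mean = sum(batch) / len(batch)
    return alpha * batch_mean + (1 - alpha) * running_mean


# Toy data: two small batches, as used when accumulating gradients.
micro_batches = [[1.0, 2.0], [3.0, 4.0]]

# Gradient accumulation: one forward call, hence one update, per small batch.
accumulated = 0.0
for micro_batch in micro_batches:
    accumulated = update_running_mean(accumulated, micro_batch)

# Single large batch: one forward call, hence one update in total.
full_batch = [x for micro_batch in micro_batches for x in micro_batch]
single = update_running_mean(0.0, full_batch)

print("running mean after two small batches:", accumulated)
print("running mean after one large batch:  ", single)
```

In this sketch the two results differ, because the running mean is updated twice in the accumulated case and only once in the large-batch case.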

As for accumulating gradients, the thread “How to implement accumulated gradient？ - #8 by Gopal_Sharma” might help you.

As a side note, I don’t think the accumulated gradients and the gradient of a single large batch will be the same in the example, since BatchNorm normalizes each small batch with its own batch statistics.