Hi, @Zhang_Chi
Batch Normalization updates its running mean and variance on every call of the forward
method (as long as the module is in training mode).
Also, by default, BatchNorm updates its running mean as running_mean = momentum * mean + (1 - momentum) * running_mean, where momentum defaults to 0.1
(the details are here).
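If it helps, here is a minimal sketch showing that behavior (the layer size, batch, and the default momentum of 0.1 are just assumptions for illustration):

```python
import torch
import torch.nn as nn

# Illustrative layer; num_features=3 is arbitrary.
bn = nn.BatchNorm1d(num_features=3)
bn.train()                      # running stats are only updated in training mode

print(bn.running_mean)          # starts as zeros

x = torch.randn(8, 3)           # placeholder mini-batch
bn(x)                           # this forward call updates the running statistics

# Manual check of the update rule for the running mean (momentum defaults to 0.1)
expected = 0.1 * x.mean(dim=0) + 0.9 * torch.zeros(3)
print(torch.allclose(bn.running_mean, expected))  # True
```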
As to accumulating gradients, this thread “How to implement accumulated gradient? - #8 by Gopal_Sharma” might help you.
As a side note, I don’t think the accumulated gradients and the single-large-batch gradient will be the same in that example, since BatchNorm normalizes each mini-batch with its own batch statistics.
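For reference, a common gradient-accumulation pattern looks roughly like this (the model, data, and accumulation_steps below are placeholders for illustration, not taken from your code):

```python
import torch
import torch.nn as nn

# Hypothetical model and optimizer, just to show the accumulation loop.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

accumulation_steps = 4            # number of mini-batches to accumulate before stepping

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = torch.randn(16, 4)        # placeholder mini-batch
    target = torch.randn(16, 1)
    loss = criterion(model(x), target) / accumulation_steps  # scale the loss
    loss.backward()               # gradients accumulate in the .grad buffers

optimizer.step()                  # one update using the accumulated gradients
optimizer.zero_grad()
```

Dividing the loss by accumulation_steps keeps the accumulated gradient on roughly the same scale as one averaged large batch, but as noted above, the BatchNorm statistics still differ per mini-batch.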