What is the running mean of BatchNorm if gradients are accumulated?

Zhang_Chi · May 30, 2018, 4:29am

Thank you very much for your reply, @crcrpar.
1. So you mean the running mean is updated during the forward pass, not the backward pass?
2. Why do you think they are different? Do you mean that the accumulated gradients should be divided by 10? Is there any other difference?
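
To make both questions concrete, here is a minimal sketch in plain Python rather than PyTorch. It assumes the exponential-moving-average update that PyTorch documents for BatchNorm layers (default momentum 0.1) and assumes the "10" above refers to 10 equally sized mini-batches. `ToyRunningMean` and `grad` are hypothetical helpers made up for illustration, not library APIs.

```python
MOMENTUM = 0.1  # PyTorch's documented default momentum for BatchNorm layers


class ToyRunningMean:
    """Hypothetical one-feature stand-in for BatchNorm's running-mean tracking."""

    def __init__(self):
        self.running_mean = 0.0
        self.num_updates = 0

    def forward(self, batch):
        batch_mean = sum(batch) / len(batch)
        # The running mean is updated here, in the forward pass.
        # No backward pass or optimizer step is involved.
        self.running_mean = (1 - MOMENTUM) * self.running_mean + MOMENTUM * batch_mean
        self.num_updates += 1
        return [x - batch_mean for x in batch]


def grad(w, batch):
    """Gradient w.r.t. w of the mean loss 0.5 * (w * x)**2 over one batch."""
    return sum(w * x * x for x in batch) / len(batch)


data = [float(i) for i in range(100)]
chunks = [data[i:i + 10] for i in range(0, 100, 10)]  # 10 mini-batches of 10

# Question 1: ten forward passes give ten running-mean updates,
# one forward pass on the full batch gives a single update.
accumulated = ToyRunningMean()
for chunk in chunks:
    accumulated.forward(chunk)

full = ToyRunningMean()
full.forward(data)

# Question 2: summing per-chunk mean-loss gradients gives 10x the
# full-batch mean-loss gradient, hence the division by 10.
w = 1.0
summed_grad = sum(grad(w, chunk) for chunk in chunks)
full_grad = grad(w, data)

print(accumulated.num_updates, full.num_updates)
print(accumulated.running_mean, full.running_mean)
print(summed_grad / 10, full_grad)
```

In this sketch the two running means end up different because the accumulated run applies the moving average ten times to ten different batch means. The divided gradients agree only because the toy loss contains no normalization; with a real BatchNorm layer each mini-batch is normalized by its own statistics, so the gradients would presumably still differ from the full-batch ones even after dividing by 10.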