In the documentation of PyTorch's batchnorm there is a parameter called momentum for the running mean/variance updates. But in the original batch normalization paper by Ioffe, there is no momentum. Which batch normalization algorithm does PyTorch implement? Can you give a reference to a paper?

Looking more closely, the trick is described in Ioffe's batchnorm paper (page 4):

… their sample variances. Using moving averages instead, we can track the accuracy of a model as it trains. Since the means and …
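For what it's worth, here is a minimal sketch of how that moving average relates to the momentum parameter. The PyTorch docs give the update as running = (1 - momentum) * running + momentum * batch_stat, with momentum defaulting to 0.1. The helper name `update_running_stat` is made up for illustration and is not a PyTorch function; the batch means are toy values.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """Moving-average update in the form given in PyTorch's BatchNorm docs.

    Hypothetical helper for illustration only, not part of PyTorch.
    """
    return (1.0 - momentum) * running + momentum * batch_stat


# PyTorch initializes running_mean to 0; feed it three toy batch means.
running_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    running_mean = update_running_stat(running_mean, batch_mean)

print(running_mean)  # moves from 0 toward 1 by a factor of momentum each step
```

Note that this momentum is not the optimizer kind: a smaller value means the running statistics change more slowly, and in PyTorch momentum=None switches to a plain cumulative average.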