Which batch normalization does PyTorch implement?

kko · August 9, 2017, 2:30am

In the documentation of PyTorch’s batchnorm there is a parameter called momentum that is used for the mean/variance updates. But in the original batch normalization paper by Ioffe, there is no momentum. Which batch normalization algorithm does PyTorch implement? Can you give a reference to a paper?

dandelin · August 9, 2017, 2:35am

Looking more closely, the trick is described in Ioffe’s batchnorm paper (page 4).

… their sample variances. Using moving averages instead,
we can track the accuracy of a model as it trains.
Since the means and …
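
To make the "moving averages" idea concrete, here is a minimal sketch in plain Python (not PyTorch's actual code). It assumes the update rule given in the PyTorch batchnorm documentation, new = (1 - momentum) * old + momentum * observed, with the default momentum of 0.1 and the unbiased batch variance feeding the running variance. The function name `update_running_stats` is made up for illustration.

```python
from statistics import mean, variance


def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """One moving-average step for a single feature (hypothetical helper).

    Assumed rule: new = (1 - momentum) * old + momentum * observed.
    """
    batch_mean = mean(batch)
    batch_var = variance(batch)  # unbiased sample variance (divides by n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


# Start from mean 0 and variance 1, then feed in two small batches.
running_mean, running_var = 0.0, 1.0
for batch in ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):
    running_mean, running_var = update_running_stats(running_mean, running_var, batch)

print(running_mean, running_var)
```

Note that "momentum" here is just the weight given to the newest batch statistic, so it is not the same thing as optimizer momentum. A small value means the running estimates change slowly from batch to batch.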