In the documentation of PyTorch’s batchnorm there is a parameter called momentum, used for the running mean/variance updates. But the original batch normalization paper by Ioffe and Szegedy does not mention momentum. Which batch normalization algorithm does PyTorch implement? Can you give a reference to a paper?
Looking more closely, the trick is described in Ioffe’s batchnorm paper (page 4), which mentions moving averages of the statistics:
… their sample variances. Using moving averages instead,
we can track the accuracy of a model as it trains.
Since the means and …
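The paper only says "moving averages" and does not spell out a formula. As far as the PyTorch documentation describes it, the momentum parameter is the weight of the new batch statistic in an exponential moving average, running = (1 - momentum) * running + momentum * batch_stat, which is a different notion from optimizer momentum. Below is a minimal pure-Python sketch of that rule for a single feature. The helper names update_running_stat and batch_mean_and_unbiased_var are made up for illustration and are not PyTorch functions; the defaults (momentum 0.1, buffers starting at 0 and 1, unbiased variance for the running estimate) follow my reading of the docs.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """Exponential moving average, in the convention the PyTorch docs describe.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    return (1.0 - momentum) * running + momentum * batch_stat


def batch_mean_and_unbiased_var(batch):
    """Mean and unbiased (n - 1) variance of one feature over a batch."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / (n - 1)
    return mean, var


# Assumed initial buffer values: running mean 0, running variance 1.
running_mean, running_var = 0.0, 1.0

# Two toy training batches of a single scalar feature.
for batch in ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):
    mean, var = batch_mean_and_unbiased_var(batch)
    running_mean = update_running_stat(running_mean, mean)
    running_var = update_running_stat(running_var, var)

print(running_mean, running_var)
```

With momentum = 0.1 each new batch contributes 10% to the running estimate, so a smaller momentum gives a smoother, slower-moving average. These running values are what gets used at inference time instead of the per-batch statistics.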