I’d like to understand the motivation behind BatchNorm’s momentum being set to 0.1 by default. Keeping a running average with such a low weight seems to go against common sense, and other libraries such as TensorFlow or Lasagne default this value to 0.99 or 0.9 respectively. By momentum I mean the factor in the running-statistics update:

x_hat_new = (1 - momentum) * x_hat + momentum * x_t
Is there some work I’m not familiar with that suggests better results with a low momentum when computing dataset statistics? Or is there a typo in the module’s description and you actually meant alpha, i.e. momentum = 1 - alpha? Otherwise, maybe it makes sense to change the default to a more conventional value?
Then at least the description in the docs should be changed: either include the formulas for how the statistics are accumulated, or put the notation in order and call it alpha (as is done in Lasagne and the batch renorm paper).
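To make the notational confusion concrete, here is a minimal sketch (plain Python, no torch required) of the two conventions. The update rules shown are the documented ones for PyTorch (`momentum`) and for TensorFlow/Lasagne (`decay`/`alpha`); the helper function names are mine:

```python
def update_pytorch_style(running, batch_stat, momentum=0.1):
    # PyTorch convention: running = (1 - momentum) * running + momentum * batch
    return (1 - momentum) * running + momentum * batch_stat

def update_tf_style(running, batch_stat, decay=0.9):
    # TF/Lasagne convention: running = decay * running + (1 - decay) * batch
    return decay * running + (1 - decay) * batch_stat

r_pt = r_tf = 0.0
for batch_mean in [1.0, 2.0, 3.0]:
    r_pt = update_pytorch_style(r_pt, batch_mean)  # momentum = 0.1
    r_tf = update_tf_style(r_tf, batch_mean)       # decay    = 0.9

print(r_pt, r_tf)  # identical trajectories: momentum = 1 - decay
```

So PyTorch’s `momentum=0.1` is numerically the same smoothing as TensorFlow’s `decay=0.9`; only the name of the parameter differs, which is exactly why the docs are misleading.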
Are you sure that this is correct? From what I can see, PyTorch calls ATen, which in turn calls the cuDNN cudnnBatchNormalizationForwardTraining function, where the parameter is defined as:

exponentialAverageFactor: Factor used in the moving average computation as follows: runningMean = runningMean * (1 - factor) + newMean * factor