Batch norm momentum default value


I’d like to understand the motivation behind the batch norm momentum being set to 0.1 by default. Keeping a running average with such a low weight seems to go against common sense. Other libraries like TensorFlow or Lasagne set this value to 0.99 or 0.9 respectively by default. By momentum I mean the following value:

X_{mov_avg} = X_{mov_avg} * momentum + X_{mean} * (1 - momentum)

Is there some work I’m not familiar with that suggests better results with a low momentum when computing dataset statistics? Or maybe there’s a typo in the module’s description and you actually meant alpha, i.e. momentum = 1 - alpha? Otherwise, maybe it makes sense to change the default to a more reasonable value?

Thanks for the answer.


The naming is a bit confusing. Effectively it is equivalent to X_new_avg = X_old_avg * (1 - 0.1) + X_val * 0.1.
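The convention above is easy to check empirically. Here is a minimal sketch (the shapes and seed are arbitrary): with the default momentum of 0.1 and running_mean initialized to zero, one training-mode forward pass should leave running_mean equal to 0.1 times the batch mean.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3, momentum=0.1)  # PyTorch's default momentum
x = torch.randn(32, 3)

bn.train()
bn(x)  # one forward pass in training mode updates the running stats

batch_mean = x.mean(dim=0)
# PyTorch convention: new_running = (1 - momentum) * old_running + momentum * batch_mean
# running_mean starts at zero, so after one pass it should equal 0.1 * batch_mean
expected = 0.1 * batch_mean
print(torch.allclose(bn.running_mean, expected, atol=1e-6))
```

So the parameter weights the *new* batch statistic, not the old running average, which is the opposite of what "momentum" usually means in the optimizer sense.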


Thank you.

Then at least the description in the docs should be changed: either include the formula used to accumulate the statistics, or fix the notation and call the parameter alpha (as is done in Lasagne and the batch renorm paper).

I’ll open an issue on the GitHub page.


Are you sure that this is correct? From what I see, PyTorch calls ATen, which in turn calls the cuDNN cudnnBatchNormalizationForwardTraining function, where the parameter is defined as:

runningMean = newMean * factor + runningMean * (1 - factor)

Yeah, I’m sure that is correct, both in THNN and cuDNN. In fact, isn’t what you say just equivalent to what I said?
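To spell out the equivalence: the cuDNN formula and the one above are the same expression with the terms reordered, where factor plays the role of PyTorch's momentum. A quick check with arbitrary made-up numbers:

```python
factor = momentum = 0.1      # cuDNN's "factor" is PyTorch's "momentum"
old_avg, new_mean = 2.0, 5.0  # arbitrary values for illustration

cudnn_form = new_mean * factor + old_avg * (1 - factor)
pytorch_form = old_avg * (1 - momentum) + new_mean * momentum
print(cudnn_form == pytorch_form)  # True: same terms, just swapped order
```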


Ouch that’s true, sorry hehe.