I’d like to understand the motivation behind BatchNorm’s momentum being set to 0.1 by default. Keeping a running average with such a low weight seems to go against common sense, and other libraries such as TensorFlow or Lasagne default this value to 0.99 or 0.9 respectively. By momentum I mean the factor in the running-statistics update:

x_hat_new = (1 - momentum) * x_hat + momentum * x_t
Is there some work I’m not familiar with that suggests better results with a low momentum when computing dataset statistics? Or is there a typo in the module’s description and you actually meant alpha, i.e. momentum = 1 - alpha? Otherwise, maybe it makes sense to change the default to a more conventional value?
Then at least the description in the docs should be changed: either include the formulas for how the statistics are accumulated, or put the notation in order and call it alpha (as is done in Lasagne and the batch renorm paper).
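To make the notational confusion concrete, here is a minimal sketch (plain Python, no torch required) of the two conventions. The update rules shown are the documented ones for PyTorch (`momentum`) and for TensorFlow/Lasagne (`decay`/`alpha`); the helper function names are mine:

```python
def update_pytorch_style(running, batch_stat, momentum=0.1):
    # PyTorch convention: running = (1 - momentum) * running + momentum * batch
    return (1 - momentum) * running + momentum * batch_stat

def update_tf_style(running, batch_stat, decay=0.9):
    # TF/Lasagne convention: running = decay * running + (1 - decay) * batch
    return decay * running + (1 - decay) * batch_stat

r_pt = r_tf = 0.0
for batch_mean in [1.0, 2.0, 3.0]:
    r_pt = update_pytorch_style(r_pt, batch_mean)  # momentum = 0.1
    r_tf = update_tf_style(r_tf, batch_mean)       # decay    = 0.9

print(r_pt, r_tf)  # identical trajectories: momentum = 1 - decay
```

So PyTorch’s `momentum=0.1` is numerically the same smoothing as TensorFlow’s `decay=0.9`; only the name of the parameter differs, which is exactly why the docs are misleading.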
Are you sure that this is correct? From what I can see, PyTorch calls ATen, which in turn calls the cuDNN cudnnBatchNormalizationForwardTraining function, where the parameter is defined as:

exponentialAverageFactor: Factor used in the moving average computation as follows: runningMean = runningMean * (1 - factor) + newMean * factor