Is BatchNorm momentum backwards in PyTorch?

I just need to confirm I’m not confusing myself. I’m used to momentum (in an exponential moving average) referring to the weight placed on the historical values in the time series. Higher momentum means more weight is placed on what has happened, rather than on what is happening now. In math-speak, if we have a time series S and we want to produce an exponential moving average series V with momentum \beta:

V_t = \beta V_{t-1} + (1 - \beta) S_t

This is how Keras defines it.
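To make that convention concrete, here is a minimal sketch of the update rule in plain Python (not actual Keras code; the function name is just for illustration):

```python
def ema_update(v_prev, s_t, beta):
    # Conventional momentum: beta weights the history, (1 - beta) weights the new value.
    return beta * v_prev + (1 - beta) * s_t

# With a high beta, the average barely moves toward the new observation.
print(ema_update(0.0, 10.0, beta=0.9))  # -> 1.0
```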

But PyTorch seems to define it the other way around, as shown in the note here. So in the PyTorch variation, a higher “momentum” is the opposite of what we would intuitively expect it to be. A momentum of 1 would just reproduce S in V. Right?
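Here is the quick sanity check I would run, a sketch assuming the update from the docs, running = (1 - momentum) * running + momentum * batch_stat:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# With momentum=1.0 the running mean should simply track the current batch mean,
# completely discarding the old running value (which starts at 0).
bn = nn.BatchNorm1d(num_features=1, momentum=1.0)

x = torch.randn(64, 1) + 5.0  # batch mean is roughly 5
bn.train()
bn(x)

print(bn.running_mean)   # roughly equal to the batch mean below
print(x.mean(dim=0))     # batch mean, for comparison
```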

If so, I get that the note is there in plain sight. But even then, it should probably say something like “Yes, we know this seems backwards and runs counter to the common meaning of the word ‘momentum’.”

Yes, I think you are right and I also think the note mentions exactly this, doesn’t it?

This momentum argument is different from […] the conventional notion of momentum.


Thanks for confirming! Yes, I know it says that, and it’s probably enough for most people. Even so, I still went on a hunt because I wasn’t sure the “different” meant what I thought it did.