Momentum in SGD

It seems that in SGD the momentum term ends up multiplied by the learning rate (learning_rate * momentum), which does not match the standard SGD-with-momentum equations.

See here for details: https://github.com/pytorch/pytorch/issues/1099
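
To make the difference concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes the PyTorch-style update is `buf = momentum * buf + grad; p = p - lr * buf`, as I understand it from the linked issue, and compares it with the classical form `v = momentum * v - lr * grad; p = p + v`. The function names (`pytorch_style_step`, `classical_step`, `run`) and the toy objective f(p) = p**2 are made up for illustration only.

```python
def pytorch_style_step(p, buf, grad, lr, momentum):
    # Assumed PyTorch-style rule: the buffer accumulates raw gradients,
    # and the learning rate scales the whole buffer at update time.
    buf = momentum * buf + grad
    p = p - lr * buf
    return p, buf


def classical_step(p, v, grad, lr, momentum):
    # Classical rule: the learning rate is baked into the velocity,
    # so past contributions keep the learning rate they were taken with.
    v = momentum * v - lr * grad
    p = p + v
    return p, v


def run(step, lrs, momentum=0.9, p0=1.0):
    # Minimize the toy objective f(p) = p**2 (gradient 2 * p),
    # using one learning rate per step from the list `lrs`.
    p, state = p0, 0.0
    for lr in lrs:
        grad = 2.0 * p
        p, state = step(p, state, grad, lr, momentum)
    return p


constant_lrs = [0.1] * 5
decayed_lrs = [0.1, 0.1, 0.01, 0.01, 0.01]

# With a constant learning rate the two rules give the same parameters.
same_a = run(pytorch_style_step, constant_lrs)
same_b = run(classical_step, constant_lrs)

# Once the learning rate changes mid-run, they diverge.
diff_a = run(pytorch_style_step, decayed_lrs)
diff_b = run(classical_step, decayed_lrs)

print("constant lr:", same_a, same_b)
print("decayed lr: ", diff_a, diff_b)
```

Under these assumptions the two forms are equivalent for a fixed learning rate (the classical velocity is just `-lr * buf`), so the discrepancy only shows up when the learning rate is changed during training, e.g. by a scheduler.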