SGD momentum formula error?

PhysicsIsFun · July 12, 2021, 7:37pm

Greetings,

I am confused by PyTorch's use of the momentum SGD method.
https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
The formulas state that

v_{t+1} = \mu * v_{t} + g_{t+1} 
p_{t+1} = p_{t} - lr * v_{t+1}

The documentation remarks that this differs from the original formula used by Sutskever et al. in how the learning rate is applied. However, it also differs in that PyTorch subtracts the velocity from the parameter instead of adding it.

Shouldn’t the formula be the following?

v_{t+1} = \mu * v_{t} - g_{t+1} 
p_{t+1} = p_{t} + lr * v_{t+1}

Note that in all cases, \mu > 0 is assumed.

Regards

KFrank · July 13, 2021, 12:31am

Hi Physics!

The two formulations are equivalent. Let’s call the “velocity” in the
first (PyTorch) formulation vPytorch_{t} and in your second, proposed
version vPhysics_{t}. The two formulations differ only in a redefinition
of v, namely vPhysics_{t} = -vPytorch_{t}, and that sign drops out of the
final calculation of p_{t}.

Now you might prefer to call the thing that you add to your “position”,
p_{t}, a “velocity,” so you might prefer the second, vPhysics,
formulation. But that is purely a semantic or stylistic choice – again,
the two formulations are mathematically equivalent.
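
If you would like to check this numerically, here is a minimal sketch in plain Python (not PyTorch's actual implementation). The toy loss f(p) = 0.5 * p**2, the helper names, and the values of lr and mu are all assumptions chosen for illustration; both velocities start at zero.

```python
def grad(p):
    # Gradient of the toy loss f(p) = 0.5 * p**2 (assumed for illustration).
    return p


def run_pytorch_style(p, lr, mu, steps):
    # v_{t+1} = mu * v_t + g_{t+1};  p_{t+1} = p_t - lr * v_{t+1}
    v = 0.0
    path = []
    for _ in range(steps):
        v = mu * v + grad(p)
        p = p - lr * v
        path.append(p)
    return path


def run_physics_style(p, lr, mu, steps):
    # v_{t+1} = mu * v_t - g_{t+1};  p_{t+1} = p_t + lr * v_{t+1}
    v = 0.0
    path = []
    for _ in range(steps):
        v = mu * v - grad(p)
        p = p + lr * v
        path.append(p)
    return path


# Same starting point and hyperparameters for both formulations.
a = run_pytorch_style(1.0, lr=0.1, mu=0.9, steps=20)
b = run_physics_style(1.0, lr=0.1, mu=0.9, steps=20)

# Largest gap between the two parameter trajectories.
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_diff)
```

The two velocities have opposite signs at every step, but the parameter trajectories should agree up to floating-point rounding.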

Best.

K. Frank

PhysicsIsFun · August 6, 2021, 7:25pm

Thank you Frank, I was seeing things…