# SGD momentum formula error?

Greetings,

I am confused by PyTorch's usage of the momentum SGD method.
https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
The formulas state that

v_{t+1} = \mu * v_{t} + g_{t+1}
p_{t+1} = p_{t} - lr  * v_{t+1}


The documentation remarks that this differs from the original formula used by Sutskever et al. in how the learning rate is applied. However, it also differs in that PyTorch subtracts the velocity from the parameter instead of adding it.

Shouldn't the formula be the following?

v_{t+1} = \mu * v_{t} - g_{t+1}
p_{t+1} = p_{t} + lr * v_{t+1}


Note that in all cases, \mu > 0 is assumed.

Regards

Hi Physics!

The two formulations are equivalent. Let's call the "velocity" in the
first, PyTorch, formulation vPytorch_{t} and in your second proposed
version vPhysics_{t}. The two formulations differ only in a redefinition
of v, namely vPhysics_{t} = -vPytorch_{t}, that drops out of the
final calculation of p_{t}.
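
To spell out the substitution, here is a short sketch. The shorthand is introduced only for this derivation: u_t stands for vPhysics_{t} and v_t for vPytorch_{t}. It assumes both versions start from the same p_0 and that u_0 = -v_0 (for example, both zero). If u_t = -v_t holds at step t, it also holds at step t+1, and the parameter update is unchanged:

```latex
\begin{aligned}
u_{t+1} &= \mu\, u_t - g_{t+1} = -\left(\mu\, v_t + g_{t+1}\right) = -v_{t+1}, \\
p_{t+1} &= p_t + \mathrm{lr}\, u_{t+1} = p_t - \mathrm{lr}\, v_{t+1}.
\end{aligned}
```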

Now you might prefer to call the thing that you add to your "position",
p_{t}, a "velocity", so you might prefer the second, vPhysics,
formulation. But that is purely a semantic or stylistic choice; again,
the two formulations are mathematically equivalent.
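
If you want to convince yourself numerically, here is a minimal sketch in plain Python. It is not PyTorch's actual implementation: the toy loss f(p) = 0.5 * p**2 (so the gradient is just p), the zero initial velocity, and the function names `run_pytorch_style` and `run_physics_style` are all assumptions made up for this illustration.

```python
def grad(p):
    # Gradient of the toy loss f(p) = 0.5 * p**2 (an assumed example loss).
    return p


def run_pytorch_style(p, lr, mu, steps):
    # v_{t+1} = mu * v_t + g_{t+1};  p_{t+1} = p_t - lr * v_{t+1}
    v = 0.0  # assumed initial velocity
    history = []
    for _ in range(steps):
        g = grad(p)
        v = mu * v + g
        p = p - lr * v
        history.append((p, v))
    return history


def run_physics_style(p, lr, mu, steps):
    # v_{t+1} = mu * v_t - g_{t+1};  p_{t+1} = p_t + lr * v_{t+1}
    v = 0.0  # assumed initial velocity
    history = []
    for _ in range(steps):
        g = grad(p)
        v = mu * v - g
        p = p + lr * v
        history.append((p, v))
    return history


pytorch_run = run_pytorch_style(p=1.0, lr=0.1, mu=0.9, steps=50)
physics_run = run_physics_style(p=1.0, lr=0.1, mu=0.9, steps=50)

# Parameters should agree at every step; velocities should differ only in sign.
same_params = all(abs(pa - pb) < 1e-12
                  for (pa, _), (pb, _) in zip(pytorch_run, physics_run))
opposite_velocities = all(abs(va + vb) < 1e-12
                          for (_, va), (_, vb) in zip(pytorch_run, physics_run))
print(same_params, opposite_velocities)
```

Both checks should come out true for any choice of lr, mu, and starting point, since the sign flip in v cancels in the update of p.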

Best.

K. Frank

Thank you Frank, I was seeing things...