In torch/optim/sgd.py, we have:
p.data.add_(-group['lr'], d_p)
which performs the parameter update (correct me if I'm wrong).
Would it be more intuitive/obvious to write it as:
p.data.add_(-group['lr'] * d_p)
instead? If not, why not?
Quick test:
>>> p = torch.tensor([[0.], [3.]])
>>> p2 = p.clone()
>>> alpha = 0.5 # step length
>>> dir = torch.tensor([[5.], [10.]]) # step direction
>>> p.data.add_(-alpha, dir)  # in-place, so p is actually updated
tensor([[-2.5000],
        [-2.0000]])
>>> p2.data.add_(-alpha * dir)  # in-place, so p2 is actually updated
tensor([[-2.5000],
        [-2.0000]])
>>> torch.equal(p,p2)
True
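For what it's worth, here is a minimal sketch of the same comparison as a standalone script, using the keyword form `add_(other, alpha=...)`, which I believe newer PyTorch releases prefer over the positional `add_(alpha, other)` spelling. The names `lr`, `d_p`, `p_fused`, and `p_explicit` are placeholders of mine, not taken from sgd.py.

```python
import torch

lr = 0.5  # hypothetical learning rate (the step length above)
d_p = torch.tensor([[5.0], [10.0]])  # stand-in for the step direction

p_fused = torch.tensor([[0.0], [3.0]])
p_explicit = p_fused.clone()

# Fused form: p += alpha * d_p, with the scalar passed separately.
p_fused.add_(d_p, alpha=-lr)

# Explicit form: builds the temporary tensor (-lr * d_p) first, then adds it.
p_explicit.add_(-lr * d_p)

same = torch.equal(p_fused, p_explicit)
print(same)
```

Both forms give the same values here. The difference I can see is that the explicit form materializes `-lr * d_p` as a temporary tensor before adding it, whereas the fused form passes the scalar and the tensor in a single call.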