Why use add_(-group['lr'], d_p), instead of add_(-group['lr'] * d_p)?

tor · July 4, 2018, 11:33pm

In torch/optim/sgd.py, the parameter update is done by this line (correct me if I'm wrong):
p.data.add_(-group['lr'], d_p)

Wouldn't it be more intuitive/obvious to write it as:
p.data.add_(-group['lr'] * d_p) ?
If not, why not?

Quick test:

>>> p = torch.tensor([[0.], [3.]])
>>> p2 = p.clone()
>>> alpha = 0.5 # step length
>>> dir = torch.tensor([[5.], [10.]]) # step direction
>>> p.data.add_(-alpha, dir)  # in-place, so p itself is updated
tensor([[-2.5000],
        [-2.0000]])
>>> p2.data.add_(-alpha * dir)  # in-place, so p2 itself is updated
tensor([[-2.5000],
        [-2.0000]])
>>> torch.equal(p,p2)
True

ptrblck · July 4, 2018, 11:51pm

It will be the same, since add_(-group['lr'], d_p) is just the in-place version of the second method described here: the scalar multiplies the tensor argument before the result is added.
It seems the docs for the in-place version are currently missing, but as far as I know @richard is working on this.
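
For anyone reading this later, here is a minimal sketch of the same comparison, assuming a recent PyTorch release where the scalar is passed as the keyword argument alpha (the positional add_(scalar, tensor) overload used in this thread was later deprecated). The names lr, p_fused and p_plain are only illustrative and do not come from sgd.py.

```python
import torch

lr = 0.5  # illustrative learning rate, standing in for group['lr']
p_fused = torch.tensor([[0.0], [3.0]])
p_plain = p_fused.clone()
d_p = torch.tensor([[5.0], [10.0]])  # stand-in for the gradient / step direction

# Fused form: computes p + alpha * d_p in a single in-place call.
p_fused.add_(d_p, alpha=-lr)

# Plain form: first builds the temporary tensor (-lr * d_p), then adds it in place.
p_plain.add_(-lr * d_p)

same = torch.equal(p_fused, p_plain)
print(p_fused)
print(same)
```

Both forms give the same values here. One likely reason to prefer the fused form is that it does not need to materialize the intermediate tensor -lr * d_p, which can matter for large parameters, although that is an inference and not something stated in this thread.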