Can someone point me to the place in the PyTorch source code for optim.SGD where the actual weight update takes place?

I cannot find the place in the source code of optim.SGD where the actual weight update takes place. I expect to see something like

model.layer.weight -= lr * model.layer.weight.grad
model.layer.bias -= lr * model.layer.bias.grad

but I cannot find it…

I checked torch/optim/sgd.py and there is no obvious weight update …

Any help will be very much appreciated!

These lines of code should correspond to the parameter update.

They probably do, but again I do not see it… Assuming there is no momentum or weight decay, just pure SGD with a learning rate, where exactly does

model.layer.weight -= lr * model.layer.weight.grad
model.layer.bias -= lr * model.layer.bias.grad

take place?

Here it is:

d_p = p.grad
...
p.add_(d_p, alpha=-group['lr'])

Oh, I guess it is confusing because I do not see the multiplication. Do you know where the add_ function is defined? It is probably not that important for me, but when I use PyCharm’s “go to declaration” it brings me to an empty declaration, so the definition is lost somewhere in the class hierarchy.

I guess the weights are stored in p, right?
Is there a way to tell which layer’s weights are currently being updated?
Say I have 3 layers; how do I know which weights are being updated at any point?
E.g. I want to do something with the gradient before it is used for the weight update, but my action depends on the layer.

I’m not sure how PyCharm looks up the definition, but the method is the in-place version of torch.add, where alpha will be multiplied with other.
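To make the hidden multiplication explicit, here is a small standalone sketch (not the optimizer code itself) showing that add_ with a negative alpha is exactly the p -= lr * p.grad update you were looking for:

import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([2., 4., 6.])

# torch.add(input, other, alpha=s) computes input + s * other,
# so the multiplication is hidden in the alpha argument.
print(torch.add(a, b, alpha=-0.5))   # tensor([0., 0., 0.])

# add_ is the in-place variant: a is modified directly.
a.add_(b, alpha=-0.5)
print(a)                             # tensor([0., 0., 0.])

# The same pattern for a single parameter p, as in sgd.py:
lr = 0.1
p = torch.randn(3, requires_grad=True)
p.grad = torch.randn(3)              # pretend backward() already ran
with torch.no_grad():
    expected = p - lr * p.grad       # the update you expected to see
    p.add_(p.grad, alpha=-lr)        # what sgd.py actually does
print(torch.allclose(p, expected))   # True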

In that case, I would recommend using register_hook on the desired parameter to change the gradients before the optimizer.step() call.
Layer-dependent gradient manipulations inside the optimizer would most likely break it for any other use case.


I see… so I would do something like this:
out_ = model.layer1.register_hook(lambda grad: torch.t(torch.mm(M, torch.t(grad))))
out_2 = model.layer2.register_hook(lambda grad: torch.t(torch.mm(M, torch.t(grad))))

Is it right?

But then how would I make the optimizer use those updated weights?
E.g. the SGD optimizer.

Since you are manipulating the gradients, the optimizer will see these new gradients in its step function and will use them to update the corresponding parameter.
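If you want to convince yourself, here is a quick sketch (the model and shapes are just made up for illustration): a hook that zeroes the gradient of the first layer’s weight means optimizer.step() leaves that weight untouched, i.e. the step really uses the hooked gradient.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))

# Replace the gradient of the first layer's weight with zeros.
model[0].weight.register_hook(lambda grad: torch.zeros_like(grad))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
before = model[0].weight.clone()

loss = model(torch.randn(8, 4)).sum()
loss.backward()      # the hook fires here and replaces the gradient
optimizer.step()     # step() only sees the already-modified .grad

# The hooked weight did not move; all other parameters did.
print(torch.equal(model[0].weight, before))  # True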

Please let me know if I misunderstood the question.

Oh, I was just asking if I should do anything with the output of, say, model.layer1.register_hook, but I guess I should not; just calling model.layer.register_hook() is sufficient for the changes to be registered. That is the way I understand it.

Yes, you can manipulate the gradient directly.
However, note that you have to call register_hook on the parameter, not the module.
This should thus work: model.layer1.weight.register_hook.
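For completeness, here is a hedged version of your snippet with the hooks moved onto the weight parameters. layer1/layer2 being nn.Linear and the 4x4 shape for M are just assumptions so that the matmul works out; adjust them to your model.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 4)
        self.layer2 = nn.Linear(4, 4)

    def forward(self, x):
        return self.layer2(self.layer1(x))

model = Net()
M = torch.randn(4, 4)

# Hooks go on the parameters (the weights), not on the modules.
# The returned tensor is used as the new gradient; you only need to keep
# the handles if you want to call handle.remove() later.
h1 = model.layer1.weight.register_hook(lambda grad: torch.t(torch.mm(M, torch.t(grad))))
h2 = model.layer2.weight.register_hook(lambda grad: torch.t(torch.mm(M, torch.t(grad))))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).sum()
loss.backward()    # the hooks transform the weight gradients here
optimizer.step()   # plain SGD update with the transformed gradients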