Say I compute gradients of a model parameter manually. I then want to set the model parameter gradient to this value and use an optimizer to update the model parameter. How would one go about doing that? And what if the model parameter was instead a Variable?
That is, I don’t use .backward() at any time.
I don’t want to accidentally grow my graph at every update.
If you check out the optimizers’ source code, you’ll see that they expect p.grad to be populated for each model parameter p.
So a starting point can be something like
Yes, I’m aware of that, but I’ve become a bit confused looking at threads like the one below, for two reasons.
As far as I understand, I cannot readily assign gradients to the .grad field of a model parameter, since the gradient buffers are initialized lazily (on .backward()) and are therefore None before a (dummy) .backward() call has been performed. It would be nice to have this functionality, but I can do without it.
(see Problem on Variable.grad.data?)
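For what it’s worth, on a recent PyTorch this particular worry seems to go away: .grad does start out as None, but a tensor can be assigned to it directly, with no dummy backward pass. A minimal check (Variable-era PyTorch, as in this thread, may have behaved differently):

```python
import torch

model = torch.nn.Linear(3, 1)
w = model.weight

# Gradient buffers are lazily initialized: nothing there yet.
assert w.grad is None

# Direct assignment works without any .backward() call.
w.grad = torch.ones_like(w)
print(w.grad.shape)  # same shape as the parameter
```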
# Compute the gradients, returning a list of Tensors
gradients = compute_gradients(input)
# Assign the gradients; but in which way?
for layer, p in enumerate(model.parameters()):
    # (1) This?
    p.grad.data = gradients[layer]
    # (2) What about this? (http://pytorch.org/docs/master/tensors.html#torch.Tensor.set_)
    p.grad.data.set_(gradients[layer])
    # (3) or this?
    p.grad = Variable(gradients[layer])
    # (4) or versions using ._grad instead:
    p._grad.data = gradients[layer]
    p._grad.data.set_(gradients[layer])
    p._grad = Variable(gradients[layer])
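Here is a runnable sketch of the whole workflow on a recent PyTorch, roughly option (3) minus the (now deprecated) Variable wrapper. The manually computed gradients are stood in by dummy ones, since compute_gradients is not shown in the thread; detaching the assigned tensors keeps the autograd graph from growing across updates:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1, bias=False)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Stand-in for the thread's compute_gradients(): just ones here.
manual_grads = [torch.ones_like(p) for p in model.parameters()]

before = [p.detach().clone() for p in model.parameters()]

# Assign the manual gradients; .detach() ensures no graph is attached.
for p, g in zip(model.parameters(), manual_grads):
    p.grad = g.detach()

opt.step()

# Plain SGD update: p <- p - lr * g
for b, p, g in zip(before, model.parameters(), manual_grads):
    assert torch.allclose(p.detach(), b - 0.1 * g)
```

The optimizer holds references to the same parameter objects, so assigning to p.grad on the parameters returned by model.parameters() is enough; nothing needs to be re-registered.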
Any update on assigning gradients to parameters without calling backward()? I have a similar situation where I compute the gradients manually. I assigned the gradient as
param.grad = Variable(my_gradient_tensor)
But optimizer.step() does not update the parameters.
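Hard to say without more code, but one common pitfall (an assumption on my part, not something stated in the thread) is calling zero_grad() after assigning the gradient, which wipes it before step() runs. A small demonstration on a recent PyTorch:

```python
import torch

p = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([p], lr=0.5)

# Wrong order: zero_grad() wipes the manually assigned gradient.
p.grad = torch.tensor([2.0])
opt.zero_grad()
opt.step()
print(p.item())  # still 1.0: no gradient left at step() time

# Right order: zero first, then assign, then step.
opt.zero_grad()
p.grad = torch.tensor([2.0])
opt.step()
print(p.item())  # 1.0 - 0.5 * 2.0 = 0.0
```

Also worth checking that the gradient is assigned to the exact parameter objects passed to the optimizer’s constructor, not to copies.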