Say I compute gradients of a model parameter manually. I then want to set the model parameter's gradient to this value and use an optimizer to update the parameter. How would one go about doing that? And what if the model parameter were instead a Variable?
That is, I don’t use .backward() at any time.
I don’t want to accidentally grow my graph at every update.
If you check the optimizers' source code, you will see that they expect p.grad to be set for each model parameter.
So a starting point can be something like
l.weight.grad = torch.zeros_like(l.weight)
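A minimal end-to-end sketch of that idea, assuming SGD and using all-ones tensors as placeholders for whatever gradients you actually compute:

```python
import torch
import torch.nn as nn

# Toy layer and optimizer; the "manual" gradients below are stand-ins.
l = nn.Linear(4, 2)
opt = torch.optim.SGD(l.parameters(), lr=0.1)

# Initialize the .grad buffers so the optimizer has something to read,
# then fill them with the manually computed values (here: all ones).
for p in l.parameters():
    p.grad = torch.zeros_like(p)
    with torch.no_grad():
        p.grad.copy_(torch.ones_like(p))

w_before = l.weight.detach().clone()
opt.step()  # SGD: w <- w - lr * grad, so every weight drops by 0.1
```

Note that no .backward() call is needed anywhere; the optimizer only reads whatever is in p.grad.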
Thanks for your answer!
Yes, I'm aware of that, but I've become a bit confused looking at threads like the ones below, for two reasons:
As far as I understand, I cannot readily assign gradients to the .grad field of a model parameter, since the gradient buffers are initialized lazily (on .backward()) and are therefore None before a (dummy) .backward() has been performed. It would be nice to have this functionality, but I can do without. (See Problem on Variable.grad.data?)
I suspect that my current implementation dynamically grows the computational graph at each gradient update by saving the computational history, as discussed in What is the recommended way to re-assign/update values in a variable (or tensor)?, since computation time increases with each iteration, but I'm not sure why. See also How does one make sure that the parameters are update manually in pytorch using modules?.
# Compute the gradients, returning a list of Tensors
gradients = compute_gradients(input)
# Assign the gradients; but in which way?
for layer, p in enumerate(model.parameters()):
    # (1) This?
    p.grad.data = gradients[layer]
    # (2) What about this? (http://pytorch.org/docs/master/tensors.html#torch.Tensor.set_)
    # (3) or this
    p.grad = Variable(gradients[layer])
    # (4) or versions using ._grad instead
    p._grad.data = gradients[layer]
    p._grad = Variable(gradients[layer])
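For what it's worth, on current PyTorch a plain assignment to p.grad (option (3), without the Variable wrapper) works; the key to not growing the graph is to .detach() the gradient tensor first, so no computational history is carried along. A sketch under those assumptions, with placeholder gradients standing in for compute_gradients(input):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-in for manually computed gradients; if yours come out of
# differentiable operations, .detach() drops that history so the
# assignment cannot grow the graph at every update.
gradients = [torch.full_like(p, 0.5) for p in model.parameters()]

for layer, p in enumerate(model.parameters()):
    p.grad = gradients[layer].detach()  # direct assignment, no dummy backward()

opt.step()
```

The .data and ._grad variants are best avoided; .data bypasses autograd's version tracking and ._grad is a private attribute.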
Any update on assigning gradients to parameters without calling backward()? I have a similar situation where I compute the gradients manually. I added the gradient as
param.grad = Variable(my_gradient_tensor)
But optimizer.step() does not update the parameters.
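On PyTorch 0.4 and later, Variable has been merged into Tensor, so you can assign a plain tensor to .grad; direct assignment does drive optimizer.step(). A minimal check of that, assuming SGD, which may help isolate whether the problem lies elsewhere (e.g. the optimizer was built over a different parameter list):

```python
import torch

# Parameter starts at zeros; gradient is all ones; SGD with lr=1.0
# should move the parameter to all -1.0 in a single step.
param = torch.nn.Parameter(torch.zeros(3))
opt = torch.optim.SGD([param], lr=1.0)

my_gradient_tensor = torch.ones(3)
param.grad = my_gradient_tensor  # no backward() call anywhere

opt.step()
```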