Say I compute gradients of a model parameter manually. I then want to set the model parameter gradient to this value and use an optimizer to update the model parameter. How would one go about doing that? And what if the model parameter was instead a Variable?
That is, I don’t use .backward() at any time.
I don’t want to accidentally grow my graph at every update.
If you check out the optimizers’ source code, you’ll see that they expect p.grad to be populated for each model parameter p.
So a starting point can be something like
Yes, I’m aware of that, but I’ve become a bit confused looking at threads like the one below, for two reasons.
As far as I understand, I cannot readily assign gradients to the .grad field of a model parameter, since the gradient buffers are initialized lazily (on .backward()) and are therefore None before a (dummy) .backward() call has been performed. It would be nice to have this functionality, but I can do without it.
(see Problem on Variable.grad.data?)
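For what it’s worth, on a recent PyTorch this particular worry seems to go away: .grad does start out as None, but a tensor can be assigned to it directly, with no dummy backward pass. A minimal check (Variable-era PyTorch, as in this thread, may have behaved differently):

```python
import torch

model = torch.nn.Linear(3, 1)
w = model.weight

# Gradient buffers are lazily initialized: nothing there yet.
assert w.grad is None

# Direct assignment works without any .backward() call.
w.grad = torch.ones_like(w)
print(w.grad.shape)  # same shape as the parameter
```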
# Compute the gradients, returning a list of Tensors
gradients = compute_gradients(input)
# Assign the gradients; but in which way?
for layer, p in enumerate(model.parameters()):
    # (1) This?
    p.grad.data = gradients[layer]
    # (2) What about this? (http://pytorch.org/docs/master/tensors.html#torch.Tensor.set_)
    p.grad.data.set_(gradients[layer])
    # (3) or this?
    p.grad = Variable(gradients[layer])
    # (4) or versions using ._grad instead:
    p._grad.data = gradients[layer]
    p._grad.data.set_(gradients[layer])
    p._grad = Variable(gradients[layer])
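Here is a runnable sketch of the whole workflow on a recent PyTorch, roughly option (3) minus the (now deprecated) Variable wrapper. The manually computed gradients are stood in by dummy ones, since compute_gradients is not shown in the thread; detaching the assigned tensors keeps the autograd graph from growing across updates:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1, bias=False)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Stand-in for the thread's compute_gradients(): just ones here.
manual_grads = [torch.ones_like(p) for p in model.parameters()]

before = [p.detach().clone() for p in model.parameters()]

# Assign the manual gradients; .detach() ensures no graph is attached.
for p, g in zip(model.parameters(), manual_grads):
    p.grad = g.detach()

opt.step()

# Plain SGD update: p <- p - lr * g
for b, p, g in zip(before, model.parameters(), manual_grads):
    assert torch.allclose(p.detach(), b - 0.1 * g)
```

The optimizer holds references to the same parameter objects, so assigning to p.grad on the parameters returned by model.parameters() is enough; nothing needs to be re-registered.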
Any update on assigning gradients to parameters without calling backward()? I have a similar situation where I compute the gradients manually. I assigned the gradient as
param.grad = Variable(my_gradient_tensor)
But optimizer.step() does not update the parameters.
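Hard to say without more code, but one common pitfall (an assumption on my part, not something stated in the thread) is calling zero_grad() after assigning the gradient, which wipes it before step() runs. A small demonstration on a recent PyTorch:

```python
import torch

p = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([p], lr=0.5)

# Wrong order: zero_grad() wipes the manually assigned gradient.
p.grad = torch.tensor([2.0])
opt.zero_grad()
opt.step()
print(p.item())  # still 1.0: no gradient left at step() time

# Right order: zero first, then assign, then step.
opt.zero_grad()
p.grad = torch.tensor([2.0])
opt.step()
print(p.item())  # 1.0 - 0.5 * 2.0 = 0.0
```

Also worth checking that the gradient is assigned to the exact parameter objects passed to the optimizer’s constructor, not to copies.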