What's the difference between torch.autograd.grad and backward()?

  • Are there any functional differences between the two pieces of code?
dydx = torch.autograd.grad(outputs=y,
                           inputs=x,
                           grad_outputs=weight,
                           retain_graph=True,
                           create_graph=True,
                           only_inputs=True)

**and**
y.backward()

Both pieces of code compute the gradient. When I use optimizer.step() after executing y.backward(), the value of x is updated, but if I use torch.autograd.grad instead of y.backward(), the value of x is not updated. Is that right?


Hi,

The difference is that autograd.grad() returns the gradients to you, while .backward() populates the .grad field of the leaf Tensors that were used to compute y.

In particular, this .grad field is what the optimizers use to update the weights. So if you use autograd.grad(), you will need to populate these fields yourself from the gradients it returned before calling optimizer.step().
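A minimal sketch of what that looks like (the linear model, SGD optimizer, and random data here are placeholders for illustration, not from the thread): compute the gradients with autograd.grad, assign them to each parameter's .grad field by hand, then call optimizer.step() as usual.

```python
import torch

# Illustrative model and optimizer (any module/optimizer works the same way).
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

# autograd.grad returns the gradients instead of writing them to .grad.
params = list(model.parameters())
grads = torch.autograd.grad(outputs=loss, inputs=params)

# Populate each parameter's .grad field manually, then step as usual.
opt.zero_grad()
for p, g in zip(params, grads):
    p.grad = g
opt.step()
```

This produces the same update as loss.backward() followed by opt.step(); the only difference is who writes the .grad fields.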


Thank you so much! I finally figured it out.

Hi,
Thanks for your explanation.
I have to use autograd.grad() to return the gradients, as follows:

import torch
import torch.nn.functional as F

def loss_fn(pred, target):
    return F.cross_entropy(input=pred, target=target)

data = data.clone().detach().requires_grad_(True)
output = model(data)
loss = loss_fn(pred=output, target=label)
grad = torch.autograd.grad(outputs=loss, inputs=data, allow_unused=True)

I want to update the weights of the network as well.
As you said, I need to populate the .grad field myself from the returned gradients before calling optimizer.step(). Could you please tell me how to do that?

Thanks in advance.

Hi,

If you want the gradients for both the parameters and the input, then you only need to make sure that your input is a leaf that requires grad, which you already do with .detach().requires_grad_(True).
Then you can just call .backward() to get the gradient for all the leaves (the input as well as all the weights in the net).
You can access the gradient of the input via data.grad.
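Putting that advice together with the earlier snippet, a sketch might look like this (the linear model and random data/labels are placeholders standing in for the poster's `model`, `data`, and `label`):

```python
import torch
import torch.nn.functional as F

# Placeholder classifier and data for illustration.
model = torch.nn.Linear(10, 3)
data = torch.randn(5, 10)
label = torch.randint(0, 3, (5,))

# Make the input a leaf that requires grad so .grad is populated on it too.
data = data.clone().detach().requires_grad_(True)

loss = F.cross_entropy(input=model(data), target=label)
loss.backward()

# Gradients are now available on both the input and the parameters.
input_grad = data.grad
weight_grad = model.weight.grad
```

From here an optimizer over model.parameters() can step normally, since the parameter .grad fields were filled by backward().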


Thanks a lot. It works.

Hello

I have been working on VQ-VAE and facing some issues with the .grad field. If you look at the attachment, it shows loss.backward(). I wish to access the gradients of the loss before they reach the input again through backpropagation. But loss.grad isn't working when I apply it after backward. Kindly help.


If we want to populate the grad manually, how do you suggest doing that?

Hi @albanD, coming a bit late to this thread. Why do we need to .detach() the input tensor to get its gradient?

The .grad field is only populated for Tensors that are leaves (no gradient history). You can use .detach() to break the link with the history, thus making the Tensor a leaf.

Note that if you don't want that, you can use t.retain_grad() to force the .grad field to be populated on a non-leaf.
You can also use autograd.grad() to get the gradients wrt any Tensor you want without the .grad field being updated.
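Both options above can be shown in a few lines (a toy computation for illustration): retain_grad() makes backward() fill .grad on a non-leaf, while autograd.grad() returns the gradient wrt a non-leaf without touching its .grad field at all.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Option 1: force .grad to be populated on a non-leaf.
y = x * 2          # non-leaf: its .grad would normally stay None
y.retain_grad()
y.sum().backward()
# y.grad now holds d(sum)/dy, i.e. a tensor of ones.

# Option 2: get the gradient wrt a non-leaf without filling .grad.
w = x * 3          # another non-leaf
g, = torch.autograd.grad(outputs=w.sum(), inputs=w)
# g is the gradient; w.grad is left untouched (None).
```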