What's the difference between torch.autograd.grad and backward()?

  • Are there any functional differences between the two pieces of code?
dydx = torch.autograd.grad(outputs=y,
                           inputs=x,
                           grad_outputs=weight,
                           retain_graph=True,
                           create_graph=True,
                           only_inputs=True)

**and**
y.backward()

Both snippets compute the gradient. When I call optimizer.step() after executing y.backward(), the value of x is updated, but if I use torch.autograd.grad instead of y.backward(), the value of x is not updated. Is that right?


Hi,

The difference is that autograd.grad() returns the gradients to you, while .backward() populates the .grad field of the leaf Tensors that were used to compute y.

In particular, this .grad field is what the optimizers use to update the weights. So if you use autograd.grad(), you will need to populate these fields yourself from the gradients it returns before calling optimizer.step().
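
For example, a minimal sketch of doing that by hand (the names model, loss, and optimizer are assumed here, not taken from the snippets above):

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(outputs=loss, inputs=params)
for p, g in zip(params, grads):
    p.grad = g   # the field that .backward() would have populated
optimizer.step()  # now uses the .grad fields as usual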


Thank you so much! I finally figured it out.

Hi,
Thanks for your explanation.
I have to use autograd.grad() to return the gradients to me, as follows:

import torch
import torch.nn.functional as F

def loss_fn(pred, target):
    return F.cross_entropy(input=pred, target=target)

# make the input a leaf tensor that requires grad
data = data.clone().detach().requires_grad_(True)
output = model(data)
loss = loss_fn(pred=output, target=label)
# gradient of the loss w.r.t. the input only
grad = torch.autograd.grad(outputs=loss, inputs=data, allow_unused=True)

I want to update the weights of the network as well.
As you said, I need to populate the .grad fields myself from the gradients it returned before calling optimizer.step(). Could you please tell me how I can do that?

Thanks in advance.

Hi,

If you want the gradients for both the parameters and the input, you only need to make sure that your input is a leaf Tensor that requires grad, which you already do with .detach().requires_grad_(True).
Then you can just call .backward() to get the gradients for all the leaves (the input as well as all the weights in the net).
You can access the gradient of the input through data.grad.
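
Put together, a rough sketch (reusing the model, optimizer, label, and loss_fn names from the code above):

# leaf input that requires grad
data = data.clone().detach().requires_grad_(True)
loss = loss_fn(pred=model(data), target=label)

optimizer.zero_grad()     # clear any stale gradients on the parameters
loss.backward()           # fills .grad on the input and on every model parameter
input_grad = data.grad    # gradient of the loss w.r.t. the input
optimizer.step()          # updates the weights from their .grad fields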


Thanks a lot. It works.

Hello

I have been working on a VQ-VAE and am facing some issues with the .grad attribute. If you look at the attachment, it shows loss.backward(). I wish to access the gradients of the loss before they reach the input again through backpropagation, but loss.grad() isn't working when I apply it after backward(). Kindly help.

[attachment: doubt]
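
In case it is useful: .grad is an attribute rather than a method, and by default it is only kept on leaf tensors. One common way to inspect the gradient at an intermediate point of the network is to call .retain_grad() on that tensor before the backward pass, e.g. (a sketch only; encoder, decoder, and the other names are hypothetical, not taken from the VQ-VAE code above):

z = encoder(data)          # hypothetical intermediate tensor
z.retain_grad()            # keep its .grad after backward
loss = loss_fn(pred=decoder(z), target=target)
loss.backward()
print(z.grad)              # gradient of the loss w.r.t. the intermediate tensor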