What's the difference between torch.autograd.grad and backward()?

  • Are there any functional differences between the two pieces of code?
dydx = torch.autograd.grad(outputs=y,
                           inputs=x,
                           grad_outputs=weight,
                           retain_graph=True,
                           create_graph=True,
                           only_inputs=True)

**and**
y.backward()

Both pieces of code compute the gradient. When I use optimizer.step() after executing y.backward(), the value of x is updated, but if I use torch.autograd.grad instead of y.backward(), the value of x is not updated. Is that right?


Hi,

The difference is that autograd.grad() returns the gradients to you, while .backward() populates the .grad field of the leaf Tensors that were used to compute y.

In particular, this .grad field is what the optimizers use to update the weights. So if you use autograd.grad(), you will need to populate these fields yourself from the gradients it returned before calling optimizer.step().
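A minimal sketch of what that looks like (the linear model, SGD optimizer, and random data here are placeholders for illustration, not from the thread): compute the gradients with autograd.grad, assign them to each parameter's .grad field by hand, then call optimizer.step() as usual.

```python
import torch

# Illustrative model and optimizer (any module/optimizer works the same way).
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

# autograd.grad returns the gradients instead of writing them to .grad.
params = list(model.parameters())
grads = torch.autograd.grad(outputs=loss, inputs=params)

# Populate each parameter's .grad field manually, then step as usual.
opt.zero_grad()
for p, g in zip(params, grads):
    p.grad = g
opt.step()
```

This produces the same update as loss.backward() followed by opt.step(); the only difference is who writes the .grad fields.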


Thank you so much! I finally figured it out.

Hi,
Thanks for your explanation.
I have to use autograd.grad() to return the gradients, as follows:

import torch
import torch.nn.functional as F

def loss_fn(pred, target):
    return F.cross_entropy(input=pred, target=target)

data = data.clone().detach().requires_grad_(True)
output = model(data)
loss = loss_fn(pred=output, target=label)
grad = torch.autograd.grad(outputs=loss, inputs=data, allow_unused=True)

I want to update the weights of the network as well.
As you said, I need to populate the .grad field myself from the returned gradients before calling optimizer.step(). Could you please tell me how to do that?

Thanks in advance.

Hi,

If you want the gradients for both the parameters and the input, then you only need to make sure that your input is a leaf that requires grad, which you already do with .detach().requires_grad_(True).
Then you can just call .backward() to get the gradient for all the leaves (the input as well as all the weights in the net).
You can access the gradient of the input via data.grad.
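Putting that advice together with the earlier snippet, a sketch might look like this (the linear model and random data/labels are placeholders standing in for the poster's `model`, `data`, and `label`):

```python
import torch
import torch.nn.functional as F

# Placeholder classifier and data for illustration.
model = torch.nn.Linear(10, 3)
data = torch.randn(5, 10)
label = torch.randint(0, 3, (5,))

# Make the input a leaf that requires grad so .grad is populated on it too.
data = data.clone().detach().requires_grad_(True)

loss = F.cross_entropy(input=model(data), target=label)
loss.backward()

# Gradients are now available on both the input and the parameters.
input_grad = data.grad
weight_grad = model.weight.grad
```

From here an optimizer over model.parameters() can step normally, since the parameter .grad fields were filled by backward().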


Thanks a lot. It works.

Hello

I have been working on VQ-VAE and facing some issues with the .grad field. If you look at the attachment, it shows loss.backward(). I wish to access the gradients of the loss before they reach the input again through backpropagation. But loss.grad isn't working when I apply it after backward. Kindly help.


If we want to populate the grad manually, how do you suggest doing that?

Hi @albanD, coming a bit late to this thread. Why do we need to .detach() the input tensor to get its gradient?

The .grad field is only populated for Tensors that are leaves (no gradient history). You can use .detach() to break the link with the history, thus making the Tensor a leaf.

Note that if you don't want that, you can use t.retain_grad() to force the .grad field to be populated on a non-leaf.
You can also use autograd.grad() to get the gradients wrt any Tensor you want without the .grad field being updated.
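Both options above can be shown in a few lines (a toy computation for illustration): retain_grad() makes backward() fill .grad on a non-leaf, while autograd.grad() returns the gradient wrt a non-leaf without touching its .grad field at all.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Option 1: force .grad to be populated on a non-leaf.
y = x * 2          # non-leaf: its .grad would normally stay None
y.retain_grad()
y.sum().backward()
# y.grad now holds d(sum)/dy, i.e. a tensor of ones.

# Option 2: get the gradient wrt a non-leaf without filling .grad.
w = x * 3          # another non-leaf
g, = torch.autograd.grad(outputs=w.sum(), inputs=w)
# g is the gradient; w.grad is left untouched (None).
```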