What's the difference between torch.autograd.grad and backward()?

  • Are there any functional differences between the two pieces of code?
dydx = torch.autograd.grad(outputs=y, inputs=x)

versus

y.backward()
dydx = x.grad


Both pieces of code compute the gradient. When I use optimizer.step() after executing y.backward(), the value of x is updated, but if I use torch.autograd.grad instead of y.backward(), the value of x is not updated. Is that right?



The difference is that autograd.grad() returns the gradients to you, while .backward() populates the .grad field of the leaf Tensors that were used to compute y.

In particular, this .grad field is what the optimizers use to update the weights. So if you use autograd.grad(), you will need to populate these fields yourself, based on the gradients it returned, before calling optimizer.step().
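A minimal sketch of what that looks like (the toy model, data, and learning rate here are assumptions for illustration, not from the thread): compute the gradients with autograd.grad(), copy each one into the corresponding parameter's .grad field, then let the optimizer step as usual.

```python
import torch

# Toy setup (hypothetical): a small linear model and a scalar loss
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = model(x).pow(2).mean()

# autograd.grad() returns one gradient tensor per input parameter
params = list(model.parameters())
grads = torch.autograd.grad(outputs=y, inputs=params)

# Populate each parameter's .grad field with the returned gradient,
# which is what .backward() would have done for us
for p, g in zip(params, grads):
    p.grad = g

# Now the optimizer sees the populated .grad fields and updates the weights
optimizer.step()
```

After the loop, the state is equivalent to having called y.backward(), so optimizer.step() behaves the same way.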


Thank you so much! I finally figured it out.

Thanks for your explanation.
I have to use autograd.grad() to return the gradients to me, as follows:

import torch
import torch.nn.functional as F

def loss_fn(pred, target):
    return F.cross_entropy(input=pred, target=target)

# Make the input a leaf Tensor that requires grad
data = data.clone().detach().requires_grad_(True)
output = model(data)
loss = loss_fn(pred=output, target=label)
grad = torch.autograd.grad(outputs=loss, inputs=data, allow_unused=True)

I want to update the weights of the network as well.
As you said, I need to populate the .grad fields myself, based on the gradients it returned, before calling optimizer.step(). Could you please tell me how I can do that?

Thanks in advance.


If you want the gradients for both the parameters and the input, then you only need to make sure that your input is a leaf Tensor that requires grad, which you already do by calling .detach().requires_grad_(True).
Then you can simply call .backward() to get the gradients for all the leaves (the input as well as all the weights in the net).
You can access the gradient of the input via data.grad.
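The steps above can be sketched as follows (the model, data shapes, and labels are assumptions for illustration): a single backward() call populates .grad on the input and on every weight at once.

```python
import torch
import torch.nn.functional as F

# Toy setup (hypothetical): a small classifier and random data
model = torch.nn.Linear(10, 3)
data = torch.randn(5, 10)
label = torch.randint(0, 3, (5,))

# Make the input a leaf Tensor that requires grad
data = data.clone().detach().requires_grad_(True)

output = model(data)
loss = F.cross_entropy(input=output, target=label)

# One backward pass populates .grad on all leaves:
# the input (data.grad) and the weights (model.weight.grad, model.bias.grad)
loss.backward()

input_grad = data.grad          # gradient w.r.t. the input
weight_grad = model.weight.grad # gradient w.r.t. the weights
```

From here an optimizer.step() would update the weights as usual, and data.grad is available for whatever the input gradient is needed for.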


Thanks a lot. It works.


I have been working on VQ-VAE and am facing some issues with the .grad attribute. As the attachment shows, I call loss.backward(). I wish to access the gradients of the loss before they reach the input through backpropagation, but loss.grad() isn't working when I apply it after backward(). Kindly help.