Torch.autograd.grad() works but not torch.autograd.backward()

Hi,
I am trying to train a model with a loss defined on gradients.

torch.autograd.grad(outputs=loss, inputs=tuple(model.parameters()), create_graph=True, retain_graph=True, allow_unused=True)

works just fine.

But calling

loss.backward()

throws the following error:

RuntimeError: trying to differentiate twice a function that was marked with @once_differentiable

I am not sure why one works and the other (backward) doesn't. From the error, it is not clear which function is only differentiable once.

The reason I cannot use torch.autograd.grad() for my problem is that I need to accumulate gradients for updating my model parameters, and this is not possible with grad() since the only_inputs flag is deprecated in v0.4.
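To make the accumulation point concrete, here is a toy comparison (not my actual model) of how backward() accumulates into .grad while grad() only returns the gradient:

import torch

p = torch.randn(3, requires_grad=True)
(p ** 2).sum().backward()   # backward() accumulates into p.grad
(p ** 3).sum().backward()   # p.grad now holds the sum of both gradients
g, = torch.autograd.grad((p ** 2).sum(), p)   # grad() only returns the gradient; p.grad is left untouched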

Any ideas on how I can get backward() to work?

Hi,

When you backprop through a function of the gradients, you are actually asking for higher-order derivatives, since you need to backprop through the backward pass itself.
Unfortunately, not all Functions support that.

You can enable the anomaly detection mode to know which forward function caused this issue.
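For reference, a minimal sketch of enabling it (assuming a PyTorch version that ships the anomaly detection API; the model below is just a placeholder, not your code):

import torch
import torch.nn as nn

# Globally: when a backward fails, PyTorch also prints the traceback of the
# forward call that created the offending Function.
torch.autograd.set_detect_anomaly(True)

# Or scope it to the suspect region with the context manager.
model = nn.Linear(4, 2)
x = torch.randn(3, 4, requires_grad=True)
with torch.autograd.detect_anomaly():
    loss = model(x).sum()
    loss.backward()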

Thanks for the pointer @albanD.

My concern is that if there is a function which doesn't support second-order differentiation, torch.autograd.grad() should throw an error as well. But that doesn't seem to be the case (grad() runs fine and backward() doesn't). I am trying to understand what's causing this.

Hmm, could you give a code sample showing exactly where you do the forward and where you call backward or grad, please?

Sure.

I am using the same model defined in this imagecaptioning repo.
I am forward-propagating the features:

pred, att_w = model(fc_feats, bbox_feats, labels)
gradInput_gt = torch.autograd.grad(outputs=pred, inputs=bbox_feats, grad_outputs=one_hot_gt.float(), create_graph=True)[0] # computing gradients w.r.t. the input features
gradInput_gt_sum = gradInput_gt.sum(dim=2)
criterion = nn.MSELoss()
loss = criterion(gradInput_gt_sum.float(), bbox_scores.float())
torch.autograd.grad(outputs=loss, inputs=tuple(model.parameters()), create_graph=True, retain_graph=True, only_inputs=False, allow_unused=True)[0] # this works
loss.backward(retain_graph=True) # this throws the above mentioned error (once_differentiable) at runtime.

Hope this helps. Thanks

Why do you use create_graph=True and retain_graph=True in the second call to autograd.grad()?
If you don’t do the second autograd.grad, does the backward work?
I can’t really run your code as I don’t know what all the inputs/tensors are so I’m not sure what’s wrong here :confused:

Why do you use create_graph=True and retain_graph=True in the second call to autograd.grad()?

retain_graph defaults to the value of create_graph, if that's provided. retain_graph=True is required if we need to compute higher-order gradients, hence I have retain_graph=True in the first call.
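For illustration, a toy double-backward example (not my actual model) of why create_graph=True is needed in the first call:

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()
g, = torch.autograd.grad(y, x, create_graph=True)   # build a graph for the gradient; retain_graph defaults to create_graph
loss = g.sum()    # a loss defined on the gradient
loss.backward()   # second-order derivative: works because pow/sum support double backward
print(x.grad)     # d/dx sum(3 * x**2) = 6 * x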

If you don’t do the second autograd.grad , does the backward work?

No. It still fails.

Another interesting observation: when I set requires_grad=False on all the parameters in model.parameters() and then call loss.backward(retain_graph=True), I still get the same (once_differentiable) error. This is surprising because there are no variables for which .grad needs to be computed, yet it still throws the same error. I was under the impression that a backward() call only computes gradients w.r.t. the parameters that have requires_grad=True. Am I missing something here?
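Here is a toy reduction of the situation (not my actual model): the parameter is frozen with requires_grad=False, yet backward() on a gradient-based loss still traverses the gradient graph, because the input itself requires grad:

import torch

w = torch.randn(4)                        # "frozen" parameter, requires_grad=False
x = torch.randn(4, requires_grad=True)    # input feature still requires grad
out = (w * x.pow(2)).sum()
g, = torch.autograd.grad(out, x, create_graph=True)   # g = 2 * w * x
loss = g.pow(2).sum()
loss.backward()   # still backprops through the gradient graph (fills x.grad), so a once_differentiable backward in the model would still be hit
print(x.grad)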

Thanks