[Hessian Vector Product] RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior

I am trying to implement a Hessian-vector product, but I am running into this problem.

Here’s the code:

    # gradients were filled in by an earlier loss.backward() call
    grads_fun, grads_data = [], []
    for p in model.parameters():
        grads_fun.append(p.grad)
        grads_data.append(p.grad.data)

    for gg in grads_fun:
        gg.requires_grad = True

    zs = [0.0 * p.data for p in model.parameters()]
    rs = [g for g in grads_data]
    ds = [-1.0 * g for g in grads_data]

    # Hessian-vector product: differentiate the gradients w.r.t. the parameters
    Hd = torch.autograd.grad(grads_fun, list(model.parameters()), grad_outputs=ds,
                             only_inputs=True, retain_graph=True, allow_unused=False)

“”""""

But I get this error:

“”"""
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
“”"""

And if I set allow_unused=True, every entry of Hd comes back as None.

Could anyone point out what is wrong with the code and how I can get around this problem?

Thanks

You don’t show how you got the gradients. Did you use create_graph=True?
Also, for linear functions the gradient does not depend on the parameters, which produces an error message very much like yours.
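
Here is a tiny illustration (the names are made up for this example, not taken from your code): the gradient of a function that is linear in w never references w, so differentiating it a second time hits the same error.

    import torch

    x = torch.randn(3, requires_grad=True)
    w = torch.randn(3, requires_grad=True)

    loss = (w * x).sum()  # linear in w
    (grad_w,) = torch.autograd.grad(loss, w, create_graph=True)

    # grad_w equals x and contains no reference to w, so this raises
    # "One of the differentiated Tensors appears to not have been used
    # in the graph" unless allow_unused=True is passed.
    torch.autograd.grad(grad_w.sum(), w)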

Best regards

Thomas

Thanks Thomas. I am using loss.backward() to accumulate the gradients; I also tried setting retain_graph=True.

As for the dependency on the parameters, I tried using grads_data instead of ds to compute the Hessian-vector product, and I still get the same error message:

    Hd = torch.autograd.grad(grads_fun, params, grad_outputs=grads_data,
                             only_inputs=True, retain_graph=True, allow_unused=False)
    RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Cheers

Yeah, but if you want to take derivatives of quantities involving the gradients, you need create_graph=True, too.
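
A minimal sketch of what that looks like, assuming loss and model are the ones from your training step (the variable names are only illustrative):

    import torch

    params = list(model.parameters())

    # Build a differentiable gradient instead of relying on p.grad from loss.backward().
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Direction for the Hessian-vector product, e.g. the negative gradient.
    ds = [-g.detach() for g in grads]

    # Differentiate the gradients a second time to get the Hessian-vector product H*d.
    Hd = torch.autograd.grad(grads, params, grad_outputs=ds, retain_graph=True)

If you want to stick with loss.backward(), calling it as loss.backward(create_graph=True) also leaves differentiable gradients in p.grad.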

Best regards

Thomas

This solves the problem.

Thank you so much Thomas,

Best Regards,

chih-hao

Can you explain why, for linear functions, the gradients no longer depend on the parameters? And in that case, can’t they just be set to zero instead of raising an error?