I am trying to implement a Hessian-vector product, but I am running into a problem. Here's the code:
zs = [0.0 * p.data for p in model.parameters()]
rs = [g for g in grads_data]
ds = [-1.0 * g for g in grads_data]
Hd = torch.autograd.grad(grads_fun, list(model.parameters()), grad_outputs=ds, only_inputs=True, retain_graph=True, allow_unused=False)
But I get this error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
And if I set allow_unused=True, Hd will be all None.
Could anyone point out what the problem with the code is, and how I can get around it?
You don’t show how you got the gradients. Did you use create_graph=True when computing them?
Also, for linear functions the gradient does not depend on the parameters, which produces an error message similar to yours.
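A minimal illustration of the linear-function case (the tensors w1, w2, and x are made up for the example): if a parameter enters the loss only linearly, its gradient is a constant, so the graph built for the gradients never uses that parameter, and the second autograd.grad call fails with exactly this error:

```python
import torch

x = torch.randn(3)
w1 = torch.randn(3, requires_grad=True)   # enters the loss quadratically
w2 = torch.randn(3, requires_grad=True)   # enters the loss only linearly

loss = (w1 ** 2).sum() + (w2 * x).sum()
g1, g2 = torch.autograd.grad(loss, [w1, w2], create_graph=True)
# g1 == 2*w1 still depends on w1, but g2 == x is a constant:
# w2 never appears in the graph of the gradients.

try:
    torch.autograd.grad((g1 + g2).sum(), [w1, w2])
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph..."
```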
Thanks Thomas. I am using loss.backward() to accumulate the gradients; I also tried setting retain_graph=True.
As to the dependency on the parameters, I tried using grads_data instead of ds to compute the Hessian-vector product, and I still get the same error message:
Hd = torch.autograd.grad(grads_fun, params, grad_outputs=grads_data, only_inputs=True, retain_graph=True, allow_unused=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Yeah, but if you want to take derivatives of quantities involving the gradients, you need to compute the gradients with create_graph=True, e.g. grads_fun = torch.autograd.grad(loss, params, create_graph=True) instead of loss.backward().
This solves the problem.
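Putting the fix together, here is a self-contained sketch (the tiny linear-regression model and data are stand-ins for the poster's setup; zs/rs/ds mirror the thread's variables, and the deprecated only_inputs argument is dropped):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
params = list(model.parameters())

# The key fix: compute the gradients with create_graph=True so they stay
# differentiable, instead of accumulating them with loss.backward().
loss = ((model(x) - y) ** 2).mean()
grads_fun = torch.autograd.grad(loss, params, create_graph=True)
grads_data = [g.detach() for g in grads_fun]

zs = [torch.zeros_like(p) for p in params]   # iterates start at zero
rs = [g.clone() for g in grads_data]         # residuals r = grad
ds = [-g for g in grads_data]                # search direction d = -grad

# Hessian-vector product H @ d: differentiate the gradients themselves.
# Since H is symmetric, grad(<grads, d>) w.r.t. the parameters equals H @ d.
Hd = torch.autograd.grad(grads_fun, params, grad_outputs=ds, retain_graph=True)
```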
Thank you so much, Thomas.
Can you explain why, for linear functions, the gradients no longer depend on the parameters? And can't they just be set to zero in that case?
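On the second part of the question: yes — when a parameter enters the loss only linearly, its gradient is a constant, its second derivative is identically zero, and with allow_unused=True autograd signals this by returning None for that slot. You can zero-fill those slots yourself (a sketch with made-up tensors):

```python
import torch

x = torch.randn(3)
w1 = torch.randn(3, requires_grad=True)   # quadratic in the loss -> real curvature
w2 = torch.randn(3, requires_grad=True)   # linear in the loss -> zero curvature

loss = (w1 ** 2).sum() + (w2 * x).sum()
g1, g2 = torch.autograd.grad(loss, [w1, w2], create_graph=True)

# g2 == x is constant, so only g1 can be differentiated further.
# w2 never appears in g1's graph, so its slot comes back as None;
# replace each None with an explicit zero tensor:
Hv = torch.autograd.grad(g1.sum(), [w1, w2], allow_unused=True)
Hv = [torch.zeros_like(p) if h is None else h for h, p in zip(Hv, [w1, w2])]
```

If your PyTorch version supports it, torch.autograd.grad also accepts materialize_grads=True, which does this zero-filling for you.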