I am trying to implement a Hessian-vector product, but I am running into the problem below.
Here's the code:
for p in model.parameters():
    grads_fun.append(p.grad)
    grads_data.append(p.grad.data)
for gg in grads_fun:
    gg.requires_grad = True
zs = [0.0 * p.data for p in model.parameters()]
rs = [g for g in grads_data]
ds = [-1.0 * g for g in grads_data]
Hd = torch.autograd.grad(grads_fun, list(model.parameters()), grad_outputs=ds, only_inputs=True, retain_graph=True, allow_unused=False)
But I got this error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
And if I set allow_unused=True, every entry of Hd is None.
Could anyone point out what is wrong with the code and how I can get around this problem?
You don't show how you got the gradients. Did you use create_graph=True?
Also, for functions that are linear in the parameters, the gradient does not depend on the parameters, so differentiating it a second time produces something similar to your error message.
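To make the create_graph point concrete, here is a minimal sketch of a Hessian-vector product by double backward. The tiny model, the random data, and the MSE loss are all assumptions made up for illustration; none of them come from the original post. The idea is that the first gradient call must record its own graph, otherwise the gradients are plain tensors with no link back to the parameters, and marking them with requires_grad=True afterwards only turns them into fresh leaves.

```python
import torch

torch.manual_seed(0)

# Hypothetical model and data, chosen only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.Tanh(),
    torch.nn.Linear(4, 1),
)
x = torch.randn(8, 3)
y = torch.randn(8, 1)

params = list(model.parameters())
loss = torch.nn.functional.mse_loss(model(x), y)

# First pass: create_graph=True records the gradient computation itself,
# so the returned gradients can be differentiated again.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Direction d = -g, detached so it is treated as a constant vector.
ds = [-g.detach() for g in grads]

# Second pass: differentiating sum_i <g_i, d_i> with respect to the
# parameters gives the Hessian-vector product H d, one tensor per parameter.
Hd = torch.autograd.grad(grads, params, grad_outputs=ds)

for p, h in zip(params, Hd):
    print(tuple(p.shape), tuple(h.shape))
```

If the gradients have to come from loss.backward() instead, passing create_graph=True to that call should serve the same purpose; retain_graph=True alone keeps the forward graph alive but does not make p.grad differentiable.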
Thanks Tomas. I am using loss.backward() to accumulate the gradients, and I also tried setting retain_graph=True.
As for the dependency on the parameters, I tried using grads_data instead of ds to compute the Hessian-vector product. I still get the same error message:
Hd = torch.autograd.grad(grads_fun, params, grad_outputs=grads_data, only_inputs=True, retain_graph=True, allow_unused=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Can you explain why, for linear functions, the gradients no longer depend on the parameters? And in that case, can't the result simply be set to zero?
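One way to see it: if f(w) = a·w, then the gradient with respect to w is just the constant a, so there is nothing left to differentiate and the Hessian is zero. Autograd reports that as "not used in the graph" rather than returning zeros. The sketch below is a made-up toy example (the names w, v, a and the direction vectors are hypothetical) where w enters linearly and v quadratically; with allow_unused=True the entry for w comes back as None, and replacing it with zeros is mathematically the right value here.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # enters f linearly
v = torch.tensor([0.5, -1.0], requires_grad=True)      # enters f quadratically
a = torch.tensor([4.0, 5.0, 6.0])

f = (a * w).sum() + (v ** 2).sum()

# g_w equals a (a constant, independent of w); g_v equals 2 * v.
g_w, g_v = torch.autograd.grad(f, [w, v], create_graph=True)

# Arbitrary direction vectors for the Hessian-vector product.
d_w = torch.ones_like(w)
d_v = torch.ones_like(v)

# Scalar <g, d>; only the g_v term is connected to the graph.
gd = (g_w * d_w).sum() + (g_v * d_v).sum()

# w is unreachable from gd, so its entry is None instead of a zero tensor.
raw = torch.autograd.grad(gd, [w, v], allow_unused=True)
print(raw[0])  # None

# Substitute zeros for the unused entries.
hvp = [torch.zeros_like(p) if h is None else h for h, p in zip(raw, [w, v])]
print(hvp)
```

Note that this only explains None for parameters that really enter linearly. If every entry is None, as in the original post, the more likely cause is that the gradients were never connected to the parameters in the first place.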