I am trying to compute a Hessian-vector product with a predefined vector in a custom optimizer. The model being optimized is logistic regression, whose weight is a `Tensor` of shape `(10, 784)`, i.e. a 10-class problem with 784-dimensional inputs. For simplicity, let's assume there is no bias term. The optimizer is implemented against the PyTorch optimizer interface, using a closure to obtain the loss. I have tried examples like this
_grads = grad(loss, params, grad_outputs=None, only_inputs=True, retain_graph=True, create_graph=True)
and this
v = tuple([torch.ones_like(_, requires_grad=True) for _ in group['params']])
v_hess = grad(_grads, params, grad_outputs=v, only_inputs=True, retain_graph=True, create_graph=True)
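For reference, here are the two calls combined into a self-contained sketch (random data; the `(10, 784)` weight matches my setup, and all variable names are mine):

```python
import torch
from torch.autograd import grad

torch.manual_seed(0)
W = torch.randn(10, 784, requires_grad=True)   # logistic regression weight, no bias
x = torch.randn(32, 784)                        # dummy mini-batch
y = torch.randint(0, 10, (32,))

loss = torch.nn.functional.cross_entropy(x @ W.t(), y)
params = [W]

# First pass: create_graph=True keeps the gradients differentiable.
_grads = grad(loss, params, retain_graph=True, create_graph=True)

# Second pass: contracting the differentiable gradients with v gives H @ v.
v = tuple(torch.ones_like(p) for p in params)
v_hess = grad(_grads, params, grad_outputs=v, retain_graph=True)
print(v_hess[0].shape)  # torch.Size([10, 784])
```

Note that `v` does not actually need `requires_grad=True`; `grad_outputs` is just the vector being contracted against the gradient.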
These seem to work. However, what I actually need is to compute the Hessian-vector product for a specific gradient in `_grads` with a predefined vector (passed as `grad_outputs`), like this:
for p in group['params']:
    if p.grad is not None:
        params.append(p)
        groups.append(group)
        grads.append(p.grad)
for grad_x, param_x in zip(grads_x, params_x):
    dummy_vector = torch.ones_like(grad_x, requires_grad=True)
    Hdv = grad(grad_1_out_of_10, param_1_out_of_10, torch.ones_like(grad_1_out_of_10, requires_grad=True), allow_unused=True)
but `Hdv` is `None`. As an example of my problem, think of computing conjugate directions, where the direction vector is initialized to `-grad_1_out_of_10` and a Hessian-vector product is needed at each iteration. Is it possible, or even meaningful, to compute something like that? Could you please give an example?
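For concreteness, here is a sketch of what I am after in the conjugate-direction setting (random data; `d` plays the role of the predefined vector, initialized to the negative gradient; none of these names come from a library):

```python
import torch
from torch.autograd import grad

torch.manual_seed(0)
W = torch.randn(10, 784, requires_grad=True)   # logistic regression weight, no bias
x = torch.randn(32, 784)                        # dummy mini-batch
y = torch.randint(0, 10, (32,))

loss = torch.nn.functional.cross_entropy(x @ W.t(), y)

# Differentiable gradient: create_graph=True keeps the graph of the gradient.
(g,) = grad(loss, [W], create_graph=True)

# Predefined vector: the initial conjugate direction, -grad.
d = -g.detach()

# Hessian-vector product H @ d, taken w.r.t. the full leaf parameter W.
(Hd,) = grad(g, [W], grad_outputs=d, retain_graph=True)
print(Hd.shape)  # torch.Size([10, 784])
```

To target only one of the ten gradient rows, my guess (unverified) is that one should keep differentiating with respect to the full leaf `W` but set `d` to zero everywhere except that row; indexing `W` itself creates a new non-leaf tensor that the graph does not contain, which may be why `allow_unused=True` gives me `None`.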