I am trying to compute a Hessian-vector product with a predefined vector in a custom optimizer. The model being optimized is logistic regression, whose weight is a `Tensor` of shape `(10, 784)`, i.e. a 10-class problem with 784-dimensional inputs. For simplicity, let's assume there is no bias term. The optimizer is implemented against the PyTorch optimizer interface, using a closure to obtain the loss. I have tried examples like this
_grads = grad(loss, params, grad_outputs=None, only_inputs=True, retain_graph=True, create_graph=True)
and this
v = tuple([torch.ones_like(_, requires_grad=True) for _ in group['params']])
v_hess = grad(_grads, params, grad_outputs=v, only_inputs=True, retain_graph=True, create_graph=True)
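For reference, here are the two calls combined into a self-contained sketch (random data; the `(10, 784)` weight matches my setup, and all variable names are mine):

```python
import torch
from torch.autograd import grad

torch.manual_seed(0)
W = torch.randn(10, 784, requires_grad=True)   # logistic regression weight, no bias
x = torch.randn(32, 784)                        # dummy mini-batch
y = torch.randint(0, 10, (32,))

loss = torch.nn.functional.cross_entropy(x @ W.t(), y)
params = [W]

# First pass: create_graph=True keeps the gradients differentiable.
_grads = grad(loss, params, retain_graph=True, create_graph=True)

# Second pass: contracting the differentiable gradients with v gives H @ v.
v = tuple(torch.ones_like(p) for p in params)
v_hess = grad(_grads, params, grad_outputs=v, retain_graph=True)
print(v_hess[0].shape)  # torch.Size([10, 784])
```

Note that `v` does not actually need `requires_grad=True`; `grad_outputs` is just the vector being contracted against the gradient.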
These seem to work. However, what I actually need is to compute the Hessian-vector product for a specific gradient in `_grads` with a predefined vector (passed as `grad_outputs`), like this:
for p in group['params']:
    if p.grad is not None:
        params.append(p)
        groups.append(group)
        grads.append(p.grad)
for grad_x, param_x in zip(grads_x, params_x):
    dummy_vector = torch.ones_like(grad_x, requires_grad=True)
    Hdv = grad(grad_1_out_of_10, param_1_out_of_10, torch.ones_like(grad_1_out_of_10, requires_grad=True), allow_unused=True)
but `Hdv` is `None`. As an example of my problem, think of computing conjugate directions, where the direction vector is initialized to `-grad_1_out_of_10` and a Hessian-vector product is needed at each iteration. Is it possible, or even meaningful, to compute something like that? Could you please give an example?
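For concreteness, here is a sketch of what I am after in the conjugate-direction setting (random data; `d` plays the role of the predefined vector, initialized to the negative gradient; none of these names come from a library):

```python
import torch
from torch.autograd import grad

torch.manual_seed(0)
W = torch.randn(10, 784, requires_grad=True)   # logistic regression weight, no bias
x = torch.randn(32, 784)                        # dummy mini-batch
y = torch.randint(0, 10, (32,))

loss = torch.nn.functional.cross_entropy(x @ W.t(), y)

# Differentiable gradient: create_graph=True keeps the graph of the gradient.
(g,) = grad(loss, [W], create_graph=True)

# Predefined vector: the initial conjugate direction, -grad.
d = -g.detach()

# Hessian-vector product H @ d, taken w.r.t. the full leaf parameter W.
(Hd,) = grad(g, [W], grad_outputs=d, retain_graph=True)
print(Hd.shape)  # torch.Size([10, 784])
```

To target only one of the ten gradient rows, my guess (unverified) is that one should keep differentiating with respect to the full leaf `W` but set `d` to zero everywhere except that row; indexing `W` itself creates a new non-leaf tensor that the graph does not contain, which may be why `allow_unused=True` gives me `None`.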