Problem computing the Hessian-vector product of a network

Hi, I'm trying to compute a Hessian-vector product (HVP) for a network.

I tried to follow the HVP implementation in TensorFlow, but the following code doesn't work as expected. Does anybody know how to fix it?

Thank you in advance.

```python
import torch
from torch import autograd

input = torch.randn(1, 3, 32, 32)  # Batch size is 1.
out = net(input).sum()  # net is a neural network

para_list = [x for x in net.parameters()]

# First-order gradients, with create_graph=True so they can be differentiated again.
grads = autograd.grad([out], para_list, retain_graph=True, create_graph=True)

# list_of_v is a list of vectors v, one per parameter in para_list (same shapes).
elem_prod = [g * v for g, v in zip(grads, list_of_v)]

hvps = autograd.grad(elem_prod, para_list, create_graph=True)
```

The error says:

RuntimeError: grad can be implicitly created only for scalar outputs.
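The error is raised by `torch.autograd.grad` itself. When an output is not a scalar and no `grad_outputs` is passed, autograd cannot create the implicit all-ones gradient, and `elem_prod` is a list of full-size tensors, so it hits exactly this case. A minimal sketch reproducing it on a toy tensor (the tensor `w` is a hypothetical stand-in for one parameter, not from the post):

```python
import torch

# Hypothetical toy parameter standing in for one entry of para_list.
w = torch.randn(4, requires_grad=True)
y = w * 2.0  # non-scalar output, like each tensor in elem_prod

msg = ""
try:
    torch.autograd.grad(y, w)  # no grad_outputs and y is not a scalar
except RuntimeError as err:
    msg = str(err)
print(msg)  # the "implicitly created only for scalar outputs" error

# Either reduce to a scalar first...
(g_sum,) = torch.autograd.grad(y.sum(), w)
# ...or supply grad_outputs explicitly (here a vector of ones).
(g_out,) = torch.autograd.grad(w * 2.0, w, grad_outputs=torch.ones(4))
```

Both working calls return the same gradient here, because summing the output is equivalent to passing a vector of ones as `grad_outputs`.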

If my implementation is totally wrong, what is the correct way of doing this?

Thanks again.


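I think the problem is that `g * v` is an element-wise product rather than a dot product, so `elem_prod` is a list of tensors instead of a single scalar.
Does `elem_prod = sum([(g * v).sum() for g, v in zip(grads, list_of_v)])` compute what you want? Differentiating that scalar with respect to `para_list` should give the Hessian-vector product.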
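The suggestion above can be checked end to end. This is a minimal sketch that assumes a tiny hypothetical `nn.Sequential` model in place of `net`, a random batch `x`, and a random `list_of_v`. It takes the gradient with `create_graph=True`, sums the per-parameter dot products into one scalar g·v, and differentiates that scalar again. Because v is held constant, ∇θ(g·v) = H v. The result is compared against a central finite difference of the gradient along v.

```python
import torch
import torch.nn as nn
from torch import autograd

torch.manual_seed(0)

# Hypothetical small stand-in for `net`; float64 keeps the finite-difference check tight.
net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1)).double()
x = torch.randn(5, 3, dtype=torch.float64)


def loss_fn():
    # Same reduction as in the question: sum of the network output.
    return net(x).sum()


para_list = list(net.parameters())
# Hypothetical direction vectors, one per parameter, with matching shapes.
list_of_v = [torch.randn_like(p) for p in para_list]


def hvp(loss, params, vs):
    # First derivative; create_graph=True so it can be differentiated again.
    grads = autograd.grad(loss, params, create_graph=True)
    # Dot product g . v summed over all parameters gives a single scalar.
    dot = sum((g * v).sum() for g, v in zip(grads, vs))
    # d(g . v)/d(theta) = H v. allow_unused=True because some parameters
    # (here the last bias) do not appear in g . v at all.
    hv = autograd.grad(dot, params, allow_unused=True)
    # An unused parameter means its block of H v is zero.
    return [torch.zeros_like(p) if h is None else h for h, p in zip(hv, params)]


hv = hvp(loss_fn(), para_list, list_of_v)


# Independent check: central finite difference of the gradient along v.
def grad_shifted(eps):
    with torch.no_grad():
        for p, v in zip(para_list, list_of_v):
            p.add_(eps * v)
    g = autograd.grad(loss_fn(), para_list)
    with torch.no_grad():
        for p, v in zip(para_list, list_of_v):
            p.sub_(eps * v)  # restore the original parameters
    return g


eps = 1e-5
hv_fd = [(a - b) / (2 * eps) for a, b in zip(grad_shifted(eps), grad_shifted(-eps))]

max_err = max((a - b).abs().max().item() for a, b in zip(hv, hv_fd))
print(max_err)
```

The sketch runs into one practical detail. With `out = net(...).sum()`, the gradient of the last layer's bias is a constant, so that bias never appears in g·v, and `autograd.grad` refuses to differentiate it unless `allow_unused=True` is passed. The network in the question may behave the same way. Passing `list_of_v` as `grad_outputs` when differentiating `grads` is an equivalent alternative, but it needs similar care: that constant gradient does not require grad, and `autograd.grad` rejects such outputs.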
