I’m trying to port some of the code at https://github.com/kohpangwei/influence-release to PyTorch; right now, specifically, their code for computing the Hessian-Vector product.
I have code that (seems like it) works. My question is: am I doing this the right way? Do all the flags I’ve set to require/not require gradients make sense? And is there a more intuitive last step than calling `.backward()` on a tensor of ones?
If anyone can point me at other PyTorch code (other than the unit test, which I’ve looked at and found helpful) that uses HVPs, I would also much appreciate that.
```python
import torch
from torch.autograd import grad

def hvp(y, x, v):
    v.requires_grad = False
    # grad returns a tuple of gradients, one per input; create_graph=True
    # keeps the graph around so we can differentiate a second time
    (grad_result,) = grad(y, x, create_graph=True)
    elemwise_prods = grad_result * v
    # backward with a vector of ones sums the elementwise products, so
    # x.grad ends up holding d(grad_y . v)/dx = Hv
    elemwise_prods.backward(torch.ones_like(v))
    return x.grad
```
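For comparison, here is a sketch of one alternative to the `backward`-on-ones step: calling `torch.autograd.grad` a second time with `grad_outputs=v`, which computes the same vector-Jacobian product of the first gradient directly. This is not the influence-release authors' approach, and `hvp_v2` is a hypothetical name:

```python
import torch
from torch.autograd import grad

def hvp_v2(y, x, v):
    # First pass: dy/dx, with the graph retained for a second differentiation
    (g,) = grad(y, x, create_graph=True)
    # grad_outputs=v computes v^T (dg/dx) = Hv in one call, with no
    # elementwise multiply or backward(ones) needed
    (Hv,) = grad(g, x, grad_outputs=v)
    return Hv
```

This also avoids accumulating into `x.grad` as a side effect, since the result is returned directly.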
Then the code to test on a simple quadratic form is something like:
```python
A = torch.randn(5, 5, requires_grad=True)

def z(x):
    return 0.5 * x.t() @ A @ x

x = torch.randn(5, 1, requires_grad=True)
v = torch.randn(5, 1, requires_grad=False)

true_hvp = 0.5 * (A + A.t()) @ v
hvp(z(x), x, v)
true_hvp
```
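To turn the eyeball comparison into an explicit check, the two results can be compared with `torch.allclose`. A self-contained sketch, repeating the `hvp` definition so it runs on its own:

```python
import torch
from torch.autograd import grad

def hvp(y, x, v):
    # same two-pass scheme as above: grad, elementwise product, backward(ones)
    (g,) = grad(y, x, create_graph=True)
    (g * v).backward(torch.ones_like(v))
    return x.grad

A = torch.randn(5, 5, requires_grad=True)

def z(x):
    return 0.5 * x.t() @ A @ x

x = torch.randn(5, 1, requires_grad=True)
v = torch.randn(5, 1, requires_grad=False)

# For the quadratic form z(x) = 0.5 x^T A x, the Hessian is 0.5 (A + A^T)
true_hvp = 0.5 * (A + A.t()) @ v
result = hvp(z(x), x, v)
print(torch.allclose(result, true_hvp.detach(), atol=1e-5))  # -> True
```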