I’m trying to port some of the code at https://github.com/kohpangwei/influence-release to PyTorch; right now, specifically, their code for computing the Hessian-Vector product.
I have code that (seems like it) works. My question is: am I doing this the right way? Do all the flags I’ve set to require/not require gradients make sense? And is there a more intuitive last step than calling `.backward()` on a tensor of ones?
If anyone can point me at other PyTorch code (other than the unit test, which I’ve looked at and found helpful) that uses HVPs, I would also much appreciate that.
```python
import torch
from torch.autograd import grad

def hvp(y, x, v):
    v.requires_grad = False
    # grad returns a tuple of gradients, one per input; create_graph=True
    # keeps the graph around so we can differentiate a second time
    (grad_result,) = grad(y, x, create_graph=True)
    elemwise_prods = grad_result * v
    # backward with a vector of ones sums the elementwise products, so
    # x.grad ends up holding d(grad_y . v)/dx = Hv
    elemwise_prods.backward(torch.ones_like(v))
    return x.grad
```
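For comparison, here is a sketch of one alternative to the `backward`-on-ones step: calling `torch.autograd.grad` a second time with `grad_outputs=v`, which computes the same vector-Jacobian product of the first gradient directly. This is not the influence-release authors' approach, and `hvp_v2` is a hypothetical name:

```python
import torch
from torch.autograd import grad

def hvp_v2(y, x, v):
    # First pass: dy/dx, with the graph retained for a second differentiation
    (g,) = grad(y, x, create_graph=True)
    # grad_outputs=v computes v^T (dg/dx) = Hv in one call, with no
    # elementwise multiply or backward(ones) needed
    (Hv,) = grad(g, x, grad_outputs=v)
    return Hv
```

This also avoids accumulating into `x.grad` as a side effect, since the result is returned directly.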
Then the code to test on a simple quadratic form is something like:
```python
A = torch.randn(5, 5, requires_grad=True)

def z(x):
    return 0.5 * x.t() @ A @ x

x = torch.randn(5, 1, requires_grad=True)
v = torch.randn(5, 1, requires_grad=False)

true_hvp = 0.5 * (A + A.t()) @ v
hvp(z(x), x, v)
true_hvp
```
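To turn the eyeball comparison into an explicit check, the two results can be compared with `torch.allclose`. A self-contained sketch, repeating the `hvp` definition so it runs on its own:

```python
import torch
from torch.autograd import grad

def hvp(y, x, v):
    # same two-pass scheme as above: grad, elementwise product, backward(ones)
    (g,) = grad(y, x, create_graph=True)
    (g * v).backward(torch.ones_like(v))
    return x.grad

A = torch.randn(5, 5, requires_grad=True)

def z(x):
    return 0.5 * x.t() @ A @ x

x = torch.randn(5, 1, requires_grad=True)
v = torch.randn(5, 1, requires_grad=False)

# For the quadratic form z(x) = 0.5 x^T A x, the Hessian is 0.5 (A + A^T)
true_hvp = 0.5 * (A + A.t()) @ v
result = hvp(z(x), x, v)
print(torch.allclose(result, true_hvp.detach(), atol=1e-5))  # -> True
```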