Hessian vector product optimization

Here is a piece of code that computes a Hessian-vector product (the gradient of the gradient, contracted with a given vector).

import torch

input = torch.tensor([1.0, 2.0, 0.5, 0.2], requires_grad=True)
output = input.tanh().sum()
grads = torch.autograd.grad(output, input, create_graph=True, retain_graph=True)
flatten = torch.cat([g.reshape(-1) for g in grads if g is not None])
for i in range(100):
    v = torch.randn(4)
    hvps = torch.autograd.grad([flatten @ v], input, allow_unused=True, retain_graph=True)
    print("{} {} {}".format(output.data, grads[0].data, hvps[0].data))

PyTorch says the retain_graph=True in the hvps computation is necessary. Otherwise, this error message shows up:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

However, I am wondering if that retain_graph=True is REALLY necessary. I might be wrong, but it seems to me that computing the Hessian-vector product for v1 doesn't depend on the one for v2. Won't this incur unnecessary memory overhead? Could this code snippet be written differently, so that graphs that are no longer needed are not kept around?
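One way to sidestep retain_graph entirely, at the cost of redoing the forward pass for every vector, is to let torch.autograd.functional.hvp rebuild the graph per call. A sketch using the same function as my snippet (trading compute for memory):

```python
import torch

def f(x):
    # same scalar function as in the snippet above
    return x.tanh().sum()

x = torch.tensor([1.0, 2.0, 0.5, 0.2])

for _ in range(3):
    v = torch.randn(4)
    # hvp rebuilds the forward and backward graphs internally on each call,
    # so nothing has to be retained between iterations
    out, hv = torch.autograd.functional.hvp(f, x, v)
    print(out, hv)
```

The peak memory per iteration is then just one graph's worth, instead of keeping g's graph alive across the whole loop.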

In the backward pass, all the intermediate results are freed by default to reduce the memory footprint. To backprop through the same graph again, you would have to rebuild it; with retain_graph=True the intermediates are kept instead of being deleted.
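A minimal repro of what happens once those intermediates are freed (toy function and values, not your code):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 3).sum()
# create_graph=True builds a graph for g itself, so we can differentiate g
g, = torch.autograd.grad(y, x, create_graph=True)

# the first backward through g's graph works, but frees its buffers...
h1, = torch.autograd.grad(g.sum(), x)

# ...so a second backward through the same graph raises RuntimeError
try:
    h2, = torch.autograd.grad(g.sum(), x)
except RuntimeError as e:
    print("second backward failed:", e)
```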

Hi Kushaj,

Thanks for the prompt reply.

However, in my case, I think the memory footprint can be further optimized.

Basically, I have a function f, for which I compute the gradient g. Then, in the for loop, I compute the gradient of g @ v for different vectors v.

I believe that each iteration of the for loop depends on the computation of g, but the iterations don't depend on each other. However, PyTorch asks me to set retain_graph=True for all of them, which I think should not be necessary.
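To make the structure concrete, here is a minimal sketch (toy values): each dot product g @ v is only a tiny new node attached on top of the one shared graph for g, so every backward pass revisits the same upstream buffers:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.tanh().sum()
g, = torch.autograd.grad(y, x, create_graph=True)

v1, v2 = torch.randn(2), torch.randn(2)
s1 = g @ v1  # new dot-product node on top of g's graph
s2 = g @ v2  # another new node on top of the SAME graph

# both scalars point back to the same upstream grad_fn for g
print(s1.grad_fn.next_functions[0][0] is s2.grad_fn.next_functions[0][0])
```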

Do you see the special situation I am having here?


Use this as a reference, as it goes into more detail: link