Issues computing Hessian-vector product

I think the issue is best described with a simple example. In the following script, I’m trying to take the Hessian-vector product, where the Hessian is of f_of_theta taken w.r.t. theta and the vector is simply vector.

import torch
from torch.autograd import Variable, grad

theta = Variable(torch.randn(2,2), requires_grad=True)
f_of_theta = torch.sum(theta ** 2 + theta)
vector = Variable(torch.randn(2,2))

gradient = grad(f_of_theta, theta)[0]
gradient_vector_product = torch.sum(gradient * vector)
gradient_vector_product.requires_grad = True
hessian_vector_product = grad(gradient_vector_product, theta)[0]

gradient is being calculated correctly, but when the script tries to calculate hessian_vector_product, I get the following error:

terminate called after throwing an instance of 'std::runtime_error'
what(): differentiated input is unreachable
Aborted

So, simply put, my question is how exactly should I do what I’m trying to do? Any help with this would be greatly appreciated.

Edit: Note that I’m using a PyTorch version built from the latest commit on master (ff0ff33).

There’s a Hessian-vector product example in the autograd tests: https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L151


Yeah, I was looking at that before, but I couldn’t get it to work by doing something I thought was analogous. Turns out my hang-up was not realizing that the first backward pass needs to be given a Variable, not a Tensor.

For the sake of anyone else who may read this in the future, it appears that the following is what is needed to get my simple example working:

import torch
from torch.autograd import Variable, grad

theta = Variable(torch.randn(2,2), requires_grad=True)
f_of_theta = torch.sum(theta ** 2 + theta)
vector = Variable(torch.randn(2,2))

f_of_theta.backward(Variable(torch.ones(1), requires_grad=True), retain_variables=True)
gradient = theta.grad
gradient_vector_product = torch.sum(gradient * vector)
gradient_vector_product.backward(torch.ones(1))
hessian_vector_product = theta.grad - gradient


You don’t need to pass in a Variable nor specify retain_variables. This would be enough:

f_of_theta.backward(create_graph=True)
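For reference, here is a sketch of the full computation using torch.autograd.grad instead of accumulating into theta.grad. This assumes a build where create_graph is supported; on PyTorch 0.4 and newer, plain Tensors with requires_grad=True replace Variable:

```python
import torch

theta = torch.randn(2, 2, requires_grad=True)
vector = torch.randn(2, 2)

f_of_theta = torch.sum(theta ** 2 + theta)

# create_graph=True records the backward pass itself in the autograd
# graph, so the resulting gradient can be differentiated again.
gradient = torch.autograd.grad(f_of_theta, theta, create_graph=True)[0]

gradient_vector_product = torch.sum(gradient * vector)
hessian_vector_product = torch.autograd.grad(gradient_vector_product, theta)[0]

# Here f = sum(theta**2 + theta), so the Hessian is 2*I and the
# Hessian-vector product is just 2 * vector.
print(torch.allclose(hessian_vector_product, 2 * vector))  # True
```

Note that this way you never need to subtract the first gradient back out, since torch.autograd.grad does not accumulate into .grad.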

When I do:

import torch
from torch.autograd import Variable, grad

theta = Variable(torch.randn(2,2), requires_grad=True)
f_of_theta = torch.sum(theta ** 2 + theta)
vector = Variable(torch.randn(2,2))

f_of_theta.backward(create_graph=True)
gradient = theta.grad
gradient_vector_product = torch.sum(gradient * vector)
gradient_vector_product.backward(torch.ones(1))
hessian_vector_product = theta.grad - gradient

I get:

TypeError: backward() got an unexpected keyword argument ‘create_graph’

I’m still on the same PyTorch version as before. Weird. I definitely see the create_graph argument on line 46 of ./torch/autograd/__init__.py of the source I’m building off of. Not sure what to make of that.


Hard to say. Maybe try to pip uninstall torch and build it again?


Hello @apaszke,

I’m not sure whether it is relevant, but for me, fn.backward does not take create_graph either, while backward(fn, create_graph=True) works as expected.
This seems to be because, right now, Variable.backward does not take create_graph.

Best regards

Thomas

@tom You’re right! I forgot to add new arguments to Variable.backward. Thanks.

@apaszke That was probably my issue too. On a somewhat related note, is there any sense of an ETA on converting all operations to be twice differentiable?


I ran example.py from https://github.com/mjacar/pytorch-trpo and came across several problems; one is the unexpected argument create_graph.

The following changes made the TRPO cartpole example work in Python 3:

  • updating the dict iteration methods,
  • changing xrange to range, and
  • removing create_graph=True.

But I’d like to know: what does create_graph do for us? And has this argument already been removed? Thanks.


Hi, @tigerneil

Try uninstalling torch and rebuilding from source. Then it should work fine with create_graph=True.


create_graph=True is necessary for the proper second derivative approach to work. Right now the bottleneck on that is having all PyTorch operations be twice differentiable. So, in a sense, that parameter is presently useless. For what it’s worth, I plan on continuing work on that repo which will include documentation so there’s no confusion over Python 2 vs Python 3 among numerous other things.
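As a minimal sketch of what create_graph=True buys you (assuming a build where double backward works for the ops involved): it keeps the graph of the backward pass itself, so the gradient can be differentiated a second time.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative: dy/dx = 3*x**2, which is 12.0 at x = 2.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Because create_graph=True was passed, dy_dx is itself part of the
# autograd graph and can be differentiated again.
# Second derivative: d2y/dx2 = 6*x, which is 12.0 at x = 2.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 12.0 12.0
```

Without create_graph=True, the second grad call fails, because the graph of the first backward pass is not retained and dy_dx does not require grad.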


Yeah, I rebuilt from source. It works now. Thanks.


And only 0.2.0 supports that, but not 0.1.2?

Yes. As far as I understand, 0.2 will be released next week or so, but until then you need to compile from source for second derivatives.