Vectorise Grad and Hessian

Fasc · September 7, 2020, 3:57pm

Hi there,

I am trying to speed up / vectorise the following code:

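from torch import autograd
from torch.autograd.functional import hessian

def function(input_):
    # model and targets are defined elsewhere

    # Compute prediction for u, one row at a time
    for ii in range(input_.shape[0]):
        x = input_[ii, :]
        x.requires_grad = True
        y = targets[ii]
        u = model(x)

        # Gradient of u with respect to x; the graph is kept for later use
        first_derivative = autograd.grad(u, x, retain_graph=True, create_graph=True)
        # Hessian of the model output with respect to x
        hessian_mtrx = hessian(model, x, create_graph=False, strict=False)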

I am confused about the interfaces, as hessian requires “sequences” (Automatic differentiation package - torch.autograd — PyTorch 2.1 documentation), but

x_tup = tuple(x.detach()) # with and without .detach()
hessian_mtrx = hessian(model, x_tup, create_graph=False, strict=False)

does not work: I get a TypeError in which the batch size is mistaken for the dimension of the input to the model:

TypeError: forward() takes 2 positional arguments but 33 were given

I am similarly confused about the correct and efficient usage of autograd.grad, as

x = input_
x.requires_grad = True
u = model(x)
first_derivative = autograd.grad(u, x, retain_graph=True, create_graph=True)

gives the following error message:

RuntimeError: grad can be implicitly created only for scalar outputs

Any pointers or partial solutions are highly appreciated!

Thanks a lot!

model(x) takes an input of size (n,2) and returns a one-dimensional tensor of length n.

albanD · September 7, 2020, 8:19pm

Hi,

I am confused about the interfaces as the hessian requires “sequences”

Not sure what you mean here?
If your function takes a single input, you can just give that to “hessian”, and if it takes several, you can pass a tuple containing all of them. hessian(model, x) in your case should work fine, no?
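To make the two calling conventions concrete, here is a minimal sketch, assuming hessian is torch.autograd.functional.hessian. The functions f and g are hypothetical stand-ins, not the model from the question. A tuple input is unpacked into separate positional arguments, which would be consistent with the "33 were given" error above when a batch of 32 rows is turned into a tuple (32 rows plus self).

```python
import torch
from torch.autograd.functional import hessian


def f(x):
    # Hypothetical single-input function: maps a (2,) tensor to a scalar.
    return x[0] ** 2 * x[1] + x[1] ** 3


x = torch.tensor([1.0, 2.0])

# Single tensor input: pass it directly. The result has shape (2, 2).
H = hessian(f, x)


def g(a, b):
    # Hypothetical two-input function returning a scalar.
    return (a * b).sum()


# Tuple input: hessian calls g(*inputs), so each element of the tuple
# becomes its own positional argument. The result is a tuple of tuples,
# where H_blocks[i][j] is the Hessian block for inputs i and j.
H_blocks = hessian(g, (x, x.clone()))
```

So for a model that takes one tensor, the tensor itself is the input; a tuple only makes sense when the function really has several arguments.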

RuntimeError: grad can be implicitly created only for scalar outputs

autograd.grad() is just running backward mode AD. And so it will create the grad_outputs for you only if the output, u in your case, is a scalar. In other cases, you will need to provide it explicitly with the value that fits what you want to get.
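As a minimal sketch of what providing grad_outputs explicitly can look like: the small network below is a hypothetical stand-in for the model in the question, assumed to map an (n, 2) input to a length-n output and to treat each row independently (no batch norm or other cross-row coupling). Under that assumption, passing a tensor of ones yields the per-row gradients in one call.

```python
import torch
from torch import autograd

torch.manual_seed(0)

# Hypothetical stand-in network; the real model is not shown in the thread.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)


def model(x):
    # (n, 2) -> (n,)
    return net(x).squeeze(-1)


x = torch.randn(5, 2, requires_grad=True)
u = model(x)  # shape (5,), not a scalar

# grad_outputs=ones computes the gradient of u.sum(). Because row i of x
# only influences u[i], row i of the result is du[i]/dx[i].
(first_derivative,) = autograd.grad(
    u, x, grad_outputs=torch.ones_like(u), create_graph=True
)
```

Note that autograd.grad returns a tuple, hence the unpacking. If the rows are not independent, the ones vector gives the gradient of the sum and not per-row gradients.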

Fasc · September 8, 2020, 6:52am

Hi,

Thank you for taking the time to look at my problem!

Not sure what you mean here?
If your function takes a single input, you can just give that to “hessian”, and if it takes several, you can pass a tuple containing all of them. hessian(model, x) in your case should work fine, no?

I assumed that, since my network takes a 2-dimensional input, a tuple of n 1x2 tensors would work, as in the example in the documentation. As in my last comment, the batch size n is mistaken for the input dimension, in this case 2. Does that make more sense?

autograd.grad() is just running backward mode AD. And so it will create the grad_outputs for you only if the output, u in your case, is a scalar. In other cases, you will need to provide it explicitly with the value that fits what you want to get.

Ok, I think you found the critical point there. I am going to look into this and update this comment with relevant links. Thank you!

keunwoo · September 8, 2020, 8:03am

I guess ‘vectorise’ here means calculating the Hessian matrix for each row of the input.
If so, I had the same problem.
This is the issue that I posted.

I don’t think it is possible to do this efficiently with PyTorch.
So I switched to JAX.
This is the code that compares the performance of JAX and PyTorch on a Hessian diagonal calculation task. This might be what you need.

But I’m still figuring out if my approach is correct.
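For reference, one way to get a Hessian per row without looping over the batch is to call autograd.grad twice. This is only a sketch under assumptions: the network is a hypothetical stand-in, rows are assumed to be processed independently, and batched_grad_and_hessian is an illustrative name, not a PyTorch function. It still loops over the input dimension (2 here), and whether it is fast enough for a given use case is not established in this thread.

```python
import torch
from torch import autograd

torch.manual_seed(0)

# Hypothetical stand-in network; the real model is not shown in the thread.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)


def model(x):
    # (n, d) -> (n,)
    return net(x).squeeze(-1)


def batched_grad_and_hessian(model, x):
    """Return per-row gradients (n, d) and per-row Hessians (n, d, d)."""
    x = x.detach().requires_grad_(True)
    u = model(x)
    # First derivatives for all rows at once, kept in the graph.
    (grad,) = autograd.grad(
        u, x, grad_outputs=torch.ones_like(u), create_graph=True
    )
    rows = []
    for j in range(x.shape[1]):
        # Differentiating the j-th gradient component, summed over the
        # batch, gives row j of every per-row Hessian.
        (row_j,) = autograd.grad(grad[:, j].sum(), x, retain_graph=True)
        rows.append(row_j)
    return grad.detach(), torch.stack(rows, dim=1)


x = torch.randn(4, 2)
grad, hess = batched_grad_and_hessian(model, x)
```

Here hess[i] is the 2x2 Hessian of the i-th output with respect to the i-th input row. The number of backward passes grows with the input dimension, not with the batch size.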

albanD · September 8, 2020, 2:41pm

Oh, you mean you have a single argument that is a tuple? We can’t really handle that, indeed. You can wrap it in a tuple before passing it to the hessian function for this to work: hessian(model, (my_tuple_arg,)).