Simultaneous evaluation of gradient and hessian

Hi, I have a simple question: does anyone know a straightforward way to compute both the gradient and the Hessian of a scalar function simultaneously using autograd? For example, when you use `torch.func.hessian`, the gradient is computed along the way - how do I get access to it?
In the standalone autograd library (outside of PyTorch) I had to edit the source code to do this, and I was hoping I wouldn't have to do the same here!

Thank you to anyone who replies, Sean.

Hi @FOXP20,

You can use the `torch.func` package for this. Define a function that computes the first derivative and returns a second copy of it; then differentiate that function again, returning the copy of the 1st derivative as an auxiliary output via `has_aux=True`.

Here’s a minimal reproducible example:

import torch
from torch.func import jacrev  # reverse-mode AD

def func(x):
    return x**2

def jacobian_func(x):
    jac = jacrev(func, argnums=0)(x)
    return jac, jac  # return 2 copies here

def hessian_with_grad_func(x):
    # has_aux=True returns the auxiliary output (the 1st derivative)
    # alongside the 2nd derivative
    hess, jac = jacrev(jacobian_func, argnums=0, has_aux=True)(x)
    return hess, jac

x = torch.randn(1)  # if you have more samples, you'll need to use torch.func.vmap

hess, jac = hessian_with_grad_func(x)
print("input: ", x)
print("jacobian: ", jac)  # equals 2*x
print("hessian: ", hess)  # equals 2
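As the comment above notes, for batched inputs you can wrap the whole thing in `torch.func.vmap`. Here is a minimal sketch of that, assuming a per-sample scalar-valued function (the function and variable names here are my own, not from the thread above):

```python
import torch
from torch.func import jacrev, vmap

def func(x):
    return (x ** 2).sum()  # scalar output per sample

def jacobian_func(x):
    jac = jacrev(func)(x)
    return jac, jac  # second copy becomes the aux output

def hessian_with_grad_func(x):
    # returns (hessian, gradient) for a single sample
    return jacrev(jacobian_func, has_aux=True)(x)

xs = torch.randn(4, 3)  # batch of 4 samples, 3 features each
hess, jac = vmap(hessian_with_grad_func)(xs)
print(hess.shape)  # torch.Size([4, 3, 3])
print(jac.shape)   # torch.Size([4, 3])
```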

Fantastic, that was exactly the type of response I was looking for - thank you!

An extra addition for people who also want the output returned efficiently - i.e. computing the output, gradient, and Laplacian of a scalar-valued function on a batched sample in a single call. Here is a way to do that:

from typing import Callable

import torch
from torch.func import jacrev

def _return_twice(function: Callable[[torch.Tensor], torch.Tensor]) -> Callable[[torch.Tensor], tuple[torch.Tensor, torch.Tensor]]:
    def new_function(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        output = function(x)
        return output, output
    return new_function

def _return_split(function: Callable[[torch.Tensor], tuple[torch.Tensor, torch.Tensor]]) -> Callable[[torch.Tensor], tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]]:
    def new_function(x: torch.Tensor) -> tuple[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]:
        output, aux = function(x)
        return output, (output, aux)
    return new_function

def _set_output_grad_laplacian_func(function: Callable[[torch.Tensor], torch.Tensor]) -> Callable[[torch.Tensor], tuple[torch.Tensor, torch.Tensor, torch.Tensor]]:
    jacobian_func = lambda x: _return_split(jacrev(_return_twice(function), has_aux=True))(x)
    hessian_func = lambda x: jacrev(jacobian_func, has_aux=True)(x)
    get_laplacian = lambda hessian, aux: (torch.trace(hessian), *aux)
    vmappable_func = lambda x: get_laplacian(*hessian_func(x))
    return torch.vmap(vmappable_func)

# Generate test sample
batch_size = 2
input_dim = 3
samples = torch.rand((batch_size, input_dim))

# Define test function
def test_func(x: torch.Tensor) -> torch.Tensor:
    return 2 * torch.sum(torch.pow(x, 2))

# Creating function
output_grad_laplacian_calculator = _set_output_grad_laplacian_func(test_func)

# Calculate laplacian, gradient and output
laplacian, grad, output = output_grad_laplacian_calculator(samples)

print(f'Samples are {samples}')
print(f'Output is {output}')
print(f'Gradient is {grad}')
print(f'Laplacian is {laplacian}')

Note that this code can also be easily extended to include the Hessian as an output.