Could use some clarification on jacrev() etc

shubbey · May 6, 2023, 1:09am

Consider the example below. I’m trying to understand why the two calls aren’t identical (the second call returns a zero tensor instead of the proper gradients). I feel like I’m missing something very fundamental here. Thanks!

import torch
from torch.func import jacrev

def comp(x,y):
    return x*x*y

def comp2(x,y):
    return torch.sum(comp(x,y))

def comp3(x,y):
    return torch.sum(y)

R = torch.tensor([1.0,2.0,3.0],requires_grad=True)
z = torch.tensor([5.0])

print(jacrev(comp2)(R,z))
print(jacrev(comp3)(R,comp(R,z)))

shubbey · May 10, 2023, 5:48pm

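Just FYI, I was in fact missing something fundamental. jacrev() differentiates a function with respect to its arguments (the first one by default), so d/dx f(y) will always be 0 when y is passed in as an already-computed value, even if that value was computed from x: inside f, y is just an input, not a function of x. In the second call, comp3 never uses x, so the result is a zero tensor. My intention here was to efficiently compute the Hessian from a known Jacobian output (i.e., the gradient). Currently I am just calling torch.autograd.grad() on each row of the Jacobian and thought I could leverage jacrev() to do it faster.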
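
For anyone who lands here with the same goal, here is a minimal sketch of one way to get the Hessian with torch.func. It assumes the gradient can be expressed as a function of x (here by reusing comp2 from the first post) rather than passed in as a precomputed tensor. The names grad_fn, hess, hess_alt and zeros are illustrative only, and requires_grad=True is dropped on the assumption that the torch.func transforms do not need it.

```python
import torch
from torch.func import jacrev, hessian

def comp(x, y):
    return x * x * y

def comp2(x, y):
    # Scalar-valued function of x (y is broadcast against x).
    return torch.sum(comp(x, y))

def comp3(x, y):
    # Ignores x entirely, so its derivative with respect to x is zero.
    return torch.sum(y)

R = torch.tensor([1.0, 2.0, 3.0])
z = torch.tensor([5.0])

# jacrev differentiates with respect to argument 0 (x) by default.
grad_fn = jacrev(comp2)
grad = grad_fn(R, z)            # gradient of comp2 w.r.t. x, shape (3,)

# Differentiating the gradient *function* (not its value) gives the Hessian.
hess = jacrev(grad_fn)(R, z)    # shape (3, 3)

# torch.func.hessian is a convenience wrapper for the same computation.
hess_alt = hessian(comp2)(R, z)

# Passing a precomputed value as y hides its dependence on x from jacrev.
zeros = jacrev(comp3)(R, comp(R, z))

print(grad)
print(hess)
print(zeros)
```

The point of the sketch is that the dependence on x has to live inside the function being transformed. Whether this is actually faster than calling torch.autograd.grad() row by row will depend on the problem size, so it is worth timing on the real model.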