# Calculating the Jacobian matrix after calculating the gradient with autograd

Hi everyone,

I’ve been trying to calculate the Jacobian matrix, or a Jacobian-vector product, in the case where an explicit formula for the gradient is not available and I compute the gradient with autograd. Here, PyTorch’s `jacobian()` returns a zero matrix. In the matrix-vector-product version, it returns the same value as the gradient of the objective function with respect to the parameter theta. Please find a toy example below. I appreciate your guidance.

#First version
#Forming the Jacobian matrix
import torch

def objective(x, theta):
    # placeholder toy objective, standing in for my actual objective
    return (theta[0] * x + theta[1] * x ** 2).sum()

def gradient(x, theta):
    # gradient of the objective w.r.t. x, computed by autograd
    funval = objective(x, theta)
    funval.backward()
    return x.grad

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
theta = torch.tensor([-1.0, -3.0], requires_grad=True)

# Jacobian of the gradient w.r.t. theta (this comes back as all zeros)
jac = torch.autograd.functional.jacobian(lambda t: gradient(x, t), theta)
print(jac)

#Second version
#Forming the Jacobian matrix-vector product with, for example, the vector of ones
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
theta = torch.tensor([-1.0, -3.0], requires_grad=True)

funval = objective(x, theta)
# create_graph=True is assumed here so that x.grad can be differentiated again;
# some details of this version were omitted from my original post
funval.backward(create_graph=True)

# dot the gradient w.r.t. x with a vector of ones, then backpropagate again
temp = x.grad @ torch.ones(3)
temp.backward()

print(theta.grad)
#which is the same as the gradient of the objective function with respect to theta

The Jacobian of a vector-valued function of a vector argument is
the matrix of partial derivatives of each element of the function’s value
with respect to each element of the argument.
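
For concreteness, here is a minimal sketch of that definition in code, using a made-up vector-valued function and PyTorch’s `torch.autograd.functional.jacobian()`:

```python
import torch

def g(v):
    # made-up vector-valued function of a vector argument
    return torch.stack([v[0] * v[1], v[1] ** 2, v[0] + v[2]])

v = torch.tensor([1.0, 2.0, 3.0])
jac = torch.autograd.functional.jacobian(g, v)
print(jac)   # the 3 x 3 matrix of partials d g_i / d v_j
```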

In your case, your `objective()` function returns a scalar value, not a
vector, so we normally would not use the term Jacobian. (If you want
to say that your scalar result is a one-dimensional vector, you could
treat your gradient vector as a 1 x n matrix and call it the Jacobian.)

Could you clarify what you are asking here?

Note that the Hessian is the matrix of second-order mixed partial
derivatives of a scalar-valued function of a vector argument. Could
that be what you are asking about?
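
For reference, a minimal sketch of computing such a Hessian with PyTorch’s `torch.autograd.functional.hessian()`, using a made-up scalar-valued function:

```python
import torch

def f(v):
    # made-up scalar-valued function of a vector argument
    return (v ** 3).sum() + v[0] * v[1]

v = torch.tensor([1.0, 2.0, 3.0])
hess = torch.autograd.functional.hessian(f, v)
print(hess)   # the 3 x 3 matrix of second partials d^2 f / (d v_i d v_j)
```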

Best.

K. Frank

Thank you for your reply, K. Frank.

Let me clarify. Let f(x, theta) be our objective. First, I calculate its gradient with respect to x, treating theta as constant, in a function `gradient(x, theta)`, which is vector-valued. Then I want to calculate the derivative of `gradient(x, theta)` with respect to theta. The result will be the Jacobian matrix. Does that make it clearer?
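
In code, the computation I have in mind looks roughly like this (with a placeholder objective, and using `create_graph=True` so the gradient can itself be differentiated):

```python
import torch

def objective(x, theta):
    # placeholder objective, just for illustration
    return (theta[0] * x + theta[1] * x ** 2).sum()

def gradient(x, theta):
    # vector-valued gradient of the objective w.r.t. x, with the graph kept
    # so that it can be differentiated again w.r.t. theta
    funval = objective(x, theta)
    return torch.autograd.grad(funval, x, create_graph=True)[0]

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
theta = torch.tensor([-1.0, -3.0], requires_grad=True)

# derivative of gradient(x, theta) w.r.t. theta: a 3 x 2 Jacobian matrix
jac = torch.autograd.functional.jacobian(lambda t: gradient(x, t), theta)
print(jac)
```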

If you have a vector-valued function, `g()` (that happens to be the gradient
of some scalar-valued function, `f()`), then yes, the derivative of `g()` will
be `g()`'s Jacobian.

But that’s an odd way of describing it. You are, of course, computing the
second-order derivatives of `f()`, which is to say, you are computing the
Hessian of `f()`. Calling it the Jacobian (of `g()`) just obscures what is
going on.

Regardless of what you choose to call it, the computation you describe
is that of the mixed second-order partial derivatives of `f()`.
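
In symbols, if we write `g(x, theta)` for the gradient of `f()` with respect to `x`, the matrix you describe is

$$
J_{ij} \;=\; \frac{\partial g_i}{\partial \theta_j} \;=\; \frac{\partial^2 f}{\partial \theta_j \, \partial x_i},
$$

that is, the `x`-`theta` cross block of the Hessian of `f()`.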

I would suggest that you start with PyTorch’s `hessian()` functional
(`torch.autograd.functional.hessian()`) to compute this.

It is true that `hessian()` will compute the full Hessian of `f()`, whereas
in your description you only want the `x`-`theta` cross terms, so there could
be some inefficiency in that you would also compute the diagonal `x`-`x` and
`theta`-`theta` blocks that you don’t want.
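
As a rough sketch (with a made-up objective), `hessian()` will accept a tuple of inputs and return the Hessian as a tuple of tuples of blocks, from which you can pick out the `x`-`theta` block:

```python
import torch
from torch.autograd.functional import hessian

def f(x, theta):
    # made-up scalar objective, just for illustration
    return (theta[0] * x + theta[1] * x ** 2).sum()

x = torch.tensor([1.0, 2.0, 3.0])
theta = torch.tensor([-1.0, -3.0])

# hess is a 2 x 2 nested tuple of blocks:
# ((d2f/dx dx, d2f/dx dtheta), (d2f/dtheta dx, d2f/dtheta dtheta))
hess = hessian(f, (x, theta))
cross = hess[0][1]   # the x-theta cross block, shape (3, 2)
print(cross)
```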

But I would recommend `hessian()`, only moving on to something more
complicated if it proves inadequate for your needs.

Best.

K. Frank