[ 1. calculate gradient via backward() ]
The following code generates the gradient of the output of a row-vector-valued function y with respect to (w.r.t.) its row-vector input x, using the backward() function in autograd.
(Strictly speaking, x and y are both 1xN matrices, which is why the Jacobian matrix is a 1x2x1x2 tensor before it is “squeezed”, as shown below in section 2: it is the Jacobian of one matrix w.r.t. another matrix.)
import torch

x = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True)

def func(x):
    y = torch.zeros(1, 2)
    y[0, 0] = x[0, 0]**2 + 3*x[0, 1]
    y[0, 1] = x[0, 1]**2 + 2*x[0, 0]
    return y

y = func(x)
y.backward(gradient=torch.ones_like(y))  # seed backward() with a row vector of ones
x.grad
The output is:
tensor([[6., 9.]])
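As a quick check of the parenthetical above, the raw (un-squeezed) Jacobian shape can be inspected directly. A minimal sketch, reusing func and x from the snippet above:

J_raw = torch.autograd.functional.jacobian(func, x)
print(J_raw.shape)           # torch.Size([1, 2, 1, 2]): Jacobian of a 1x2 matrix w.r.t. a 1x2 matrix
print(torch.squeeze(J_raw))  # tensor([[4., 3.],
                             #         [2., 6.]])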
[ 2. calculate gradient manually via Jacobian-vector product ]
However, I’m unable to obtain the gradient of x manually using the Jacobian-vector-product method shown below, i.e. by matrix-multiplying the transpose of the Jacobian matrix with a vector of “ones” of the same shape as y (i.e. a 1x2 row vector):
x = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True)

def func(x):
    y = torch.zeros(1, 2)
    y[0, 0] = x[0, 0]**2 + 3*x[0, 1]
    y[0, 1] = x[0, 1]**2 + 2*x[0, 0]
    return y

y = func(x)
J = torch.squeeze(torch.autograd.functional.jacobian(func, x))  # squeeze the 1x2x1x2 Jacobian down to 2x2
x_grad = torch.matmul(
    torch.transpose(J, 0, 1),
    torch.ones_like(y)
)
x_grad
This is because the transposed 2x2 Jacobian matrix of y w.r.t. x cannot be matrix-multiplied by a 1x2 row vector of ones of the same shape as y, as indicated in the error message, which is understandable.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-33-3ebd031d82e3> in <module>
11 J = torch.squeeze(torch.autograd.functional.jacobian(func, x))
12
---> 13 x_grad = torch.matmul(
14 torch.transpose(J, 0, 1),
15 torch.ones_like(y)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2 and 1x2)
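To make the shapes concrete, the Jacobian can also be derived by hand and checked against the autograd value. A small sketch reusing J from the snippet above (J_manual is just my name for the hand-derived matrix):

# Hand-derived 2x2 Jacobian at x = [[2., 3.]]:
#   dy[0,0]/dx[0,0] = 2*x[0,0] = 4    dy[0,0]/dx[0,1] = 3
#   dy[0,1]/dx[0,0] = 2               dy[0,1]/dx[0,1] = 2*x[0,1] = 6
J_manual = torch.tensor([[4., 3.],
                         [2., 6.]])
print(torch.allclose(J, J_manual))                   # True: matches autograd's Jacobian
print(torch.matmul(J_manual.t(), torch.ones(2, 1)))  # tensor([[6.], [9.]]), a column vector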
[ 3. My Question ]
So, if the gradient is indeed calculated by autograd using the Jacobian-vector-product method, does this mean that internally autograd converts (transposes) the input row vector of ones into a column vector of the same length so that the matrix multiplication can be carried out correctly, and then transposes the resulting column vector back so that the final output is a row vector of the same shape as x, as shown below?
x = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True)

def func(x):
    y = torch.zeros(1, 2)
    y[0, 0] = x[0, 0]**2 + 3*x[0, 1]
    y[0, 1] = x[0, 1]**2 + 2*x[0, 0]
    return y

y = func(x)
J = torch.squeeze(torch.autograd.functional.jacobian(func, x))
x_grad = torch.transpose(torch.matmul(
    torch.transpose(J, 0, 1),
    torch.transpose(torch.ones_like(y), 0, 1)  # transpose the row vector of ones into a column vector
), 0, 1)  # transpose the result back into a row vector of the same shape as x
x_grad
… which does generate the correct results:
tensor([[6., 9.]])
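For what it’s worth, left-multiplying the Jacobian by the row vector of ones (i.e. computing v^T J rather than J^T v) also reproduces the result without any transposing. A sketch reusing J and y from the snippet above:

vJ = torch.matmul(torch.ones_like(y), J)  # (1x2 row of ones) @ (2x2 Jacobian)
print(vJ)  # tensor([[6., 9.]]), same as x.grad from backward()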