The behavior of torch.autograd.functional.jvp

I am reading the documentation for jvp. As I understand it, it computes the product of the Jacobian with v. However, I find its behavior strange when the inputs are 2-D matrices.

According to its signature, torch.autograd.functional.jvp(func, inputs, v=None), we can define func, inputs, and v as follows:

import torch

def func(x):
    return 2 * x.sum(dim=1)

x = torch.ones(3, 2, requires_grad=True)
v = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])  # same shape and dtype as x
res = torch.autograd.functional.jvp(func, x, v)

After running the above code:

res[0] = tensor([4., 4., 4.])    # which is the output of func(x)
res[1] = tensor([ 6., 14., 22.]) # which is the product between the Jacobian and v

The result of res[1] is somewhat unintuitive. Let’s define y = func(x), which means y = 2*x.sum(dim=1):

y[0] = 2 * x[0,0] + 2 * x[0,1]
y[1] = 2 * x[1,0] + 2 * x[1,1]
y[2] = 2 * x[2,0] + 2 * x[2,1]

The Jacobian matrix of y with respect to x is:

grad = torch.tensor([[[2., 2.], [0., 0.], [0., 0.]],
                     [[0., 0.], [2., 2.], [0., 0.]],
                     [[0., 0.], [0., 0.], [2., 2.]]])
grad.shape  # torch.Size([3, 3, 2])
v = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])
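As a sanity check (the variable J here is just for illustration), torch.autograd.functional.jacobian reproduces this tensor and its shape:

J = torch.autograd.functional.jacobian(func, x)
print(J.shape)               # torch.Size([3, 3, 2]) -- output shape [3] followed by input shape [3, 2]
print(torch.equal(J, grad))  # True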

According to res[1] = tensor([6., 14., 22.]), we can infer that the product is performed in the following way:
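Each element of res[1] is obtained by contracting one [3, 2] slice of the Jacobian with the whole of v over both input dimensions. A minimal sketch of that contraction (the einsum call and res1_manual are my own illustration, not part of the original code):

res1_manual = torch.einsum('ijk,jk->i', grad, v)
print(res1_manual)           # tensor([ 6., 14., 22.]) -- matches res[1]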

Such an operation is not the standard dot product between two matrices. Additionally, the output of jvp only has shape torch.Size([3]), while the inputs x and v have shape torch.Size([3, 2]). So what is the meaning of the output of jvp? What does the gradient returned in res[1] stand for, and where can we use it?

I do see the motivation for the Jacobian-vector product with 1-D inputs: it is designed to compute the partial gradients of two variables using the chain rule. In such cases res[1] also has the same shape as the input, so it is clear that res[1] represents the gradients of the inputs. However, this no longer holds when the input is a 2-D matrix, which is what confuses me.
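For reference, here is a minimal 1-D sketch of what I mean (just my own toy example, with an elementwise func so that the input and output shapes coincide):

def func1d(x):
    return 2 * x             # elementwise, so the output has the same shape as the input

x1 = torch.ones(3, requires_grad=True)
v1 = torch.tensor([1., 2., 3.])
out, jvp_out = torch.autograd.functional.jvp(func1d, x1, v1)
print(jvp_out)               # tensor([2., 4., 6.]) -- same shape as the 1-D input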

Hey!

Indeed, this definition as a dot product only really makes sense for 1D inputs.
But you can also view your 2D input as a 1D input (.view(-1)), compute the Jacobian-vector product there, and then reshape the result back to the shape of the output.
In this view, the Jacobian has size nb_elements_outputs x nb_elements_inputs, which makes the linearization explicit.
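Here is a minimal sketch of what I mean (just an illustration, reusing your x and v from above): once the input is flattened, the Jacobian is an ordinary 3 x 6 matrix and a standard matrix-vector product reproduces res[1].

def flat_func(x_flat):
    return 2 * x_flat.view(3, 2).sum(dim=1)   # same function, but written for a flat 1-D input

J_flat = torch.autograd.functional.jacobian(flat_func, x.view(-1))  # torch.Size([3, 6])
print(J_flat @ v.view(-1))                    # tensor([ 6., 14., 22.]) -- same as res[1]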
