Why use a different tensor for computing gradients of another tensor

I’m very new to PyTorch and this might be a stupid question; while going through the autograd tutorial I came across this section, where it’s written:

Now in this case y is no longer a scalar. torch.autograd could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as an argument:

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

Now, I don’t understand why we are using a different tensor v in order to compute gradients for x.


The short answer is that you can only compute a gradient with respect to a scalar quantity: the gradient tells you how much the output changes as you nudge each input. A scalar output can only be bumped up or down, but if the output y is a vector or matrix, there is no single way to "vary" it. The tensor v specifies exactly in what sense you are weighting the components of y, i.e. it defines the implicit scalar (v * y).sum(), and x.grad then holds the vector-Jacobian product vᵀJ. Autograd never materializes the full Jacobian, because in deep learning it is rarely needed; the vector-Jacobian product is enough to chain gradients backward through each operation.
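
Here is a minimal sketch of that equivalence. The definition y = x * 2 is an assumption for illustration (the tutorial builds y differently), but the point is that y.backward(v) and taking the ordinary gradient of the scalar (v * y).sum() give the same x.grad:

import torch

# Assumed toy setup: y is a non-scalar function of x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                       # y is a vector, so y.backward() alone would error

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                   # vector-Jacobian product: x.grad = v @ (dy/dx)
print(x.grad)                   # 2 * v, since dy/dx = 2 * I here

# Equivalent view: v weights y into the scalar (v * y).sum(),
# and we take the ordinary gradient of that scalar w.r.t. x.
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 * 2
(v * y2).sum().backward()
print(x2.grad)                  # same result as above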