I’m very new to PyTorch and this might be a stupid question. While going through the autograd tutorial I came to this section, which says:
Now in this case y is no longer a scalar. torch.autograd could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as an argument:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
Now, I don’t understand why we are passing a separate tensor v in order to compute gradients for x.
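The short answer is, you can only compute a gradient of a scalar quantity, because the gradient says how much the output would change if you vary the parameters: it makes sense to ask whether a scalar goes up or down, but if the output is a vector or a matrix, there is no single "up or down" to measure. The tensor v specifies exactly how the components of y are weighted into one scalar, so that you can still compute a gradient: y.backward(v) gives the gradient of the scalar sum(y * v) with respect to x, which is the vector-Jacobian product from the tutorial. backward never builds the full Jacobian, because in DL it is rarely needed: the loss is almost always a scalar, and the vector-Jacobian product is all that backpropagation requires.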
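To make this concrete, here is a minimal sketch that checks the claim numerically. The input x and the function y = x ** 2 are made up for illustration (the tutorial uses a different y); only v is taken from the question. It compares y.backward(v) against the gradient of the scalar (y * v).sum(), and against an explicitly built Jacobian from torch.autograd.functional.jacobian, which is used here only for verification.

```python
import torch

# Illustrative values: x and the function y = x ** 2 are assumptions,
# v is the tensor from the question.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)

# 1) Non-scalar output: pass v to backward to get the vector-Jacobian product.
y = x ** 2
y.backward(v)
grad_vjp = x.grad.clone()

# 2) Same result from an ordinary scalar backward on sum(y * v).
x.grad = None  # reset the accumulated gradient
y = x ** 2
(y * v).sum().backward()
grad_scalar = x.grad.clone()

# 3) For comparison only: build the full Jacobian and multiply by v.
J = torch.autograd.functional.jacobian(lambda t: t ** 2, x.detach())
grad_jac = J.T @ v

print(torch.allclose(grad_vjp, grad_scalar))
print(torch.allclose(grad_vjp, grad_jac))
```

All three routes should agree. In practice you would use the first or second; the third costs far more for large inputs, since it materializes every entry of the Jacobian.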